337 lines
16 KiB
Markdown
337 lines
16 KiB
Markdown
|
|
<p align="center"><a href="https://ideogram.ai/" target="_blank" rel="noopener noreferrer"><picture>
|
||
|
|
<source media="(prefers-color-scheme: dark)" srcset="assets/ideogram_logo_darkmode.svg">
|
||
|
|
<source media="(prefers-color-scheme: light)" srcset="assets/ideogram_logo.svg">
|
||
|
|
<img src="assets/ideogram_logo.svg" alt="Ideogram" width="500">
|
||
|
|
</picture></a></p>
|
||
|
|
|
||
|
|
<p align="center"><em>Ideogram 4: Open image model at the forefront of design</em></p>
|
||
|
|
|
||
|
|
<p align="center">
|
||
|
|
<a href="https://ideogram.ai/blog/ideogram-4.0/" target="_blank" rel="noopener noreferrer"><img src="https://img.shields.io/badge/Blog-Post-orange" alt="Blog Post"></a>
|
||
|
|
<a href="https://github.com/ideogram-oss/ideogram4" target="_blank" rel="noopener noreferrer"><img src="https://img.shields.io/badge/Code-GitHub-181717?logo=github" alt="Code"></a>
|
||
|
|
<a href="https://huggingface.co/collections/ideogram-ai/ideogram-4" target="_blank" rel="noopener noreferrer"><img src="https://img.shields.io/badge/Model-HuggingFace-blue?logo=huggingface" alt="Model"></a>
|
||
|
|
<a href="https://developer.ideogram.ai/" target="_blank" rel="noopener noreferrer"><img src="https://img.shields.io/badge/API-developer.ideogram.ai-purple" alt="API"></a>
|
||
|
|
<a href="https://ideogram.ai/" target="_blank" rel="noopener noreferrer"><img src="https://img.shields.io/badge/Official%20Site-ideogram.ai-ff69b4" alt="Official Site"></a>
|
||
|
|
</p>
|
||
|
|
|
||
|
|
<p align="center">
|
||
|
|
<img src="assets/samples/collage_landscape.jpg" alt="A collage of Ideogram 4 samples spanning photorealism, illustration, typography, and poster design">
|
||
|
|
</p>
|
||
|
|
|
||
|
|
|
||
|
|
Ideogram 4 is **[Ideogram](https://ideogram.ai)'s first open-weight text-to-image model**. It is a **state-of-the-art foundation model trained from scratch** — not a fine-tune of any existing model. It introduces a new structured JSON prompting interface, with best-in-class multilingual text rendering, deep language understanding, explicit bounding-box layout and color-palette controls, and native 2k resolution images. The easiest way to try the model is online at **[ideogram.ai](https://ideogram.ai/)**.
|
||
|
|
|
||
|
|
We believe openness drives innovation, and we invite the research community to innovate with us on the forefront of visual intelligence.
|
||
|
|
|
||
|
|
## Table of Contents
|
||
|
|
|
||
|
|
1. [News](#news)
|
||
|
|
2. [Model Zoo](#model-zoo)
|
||
|
|
3. [Performance](#performance)
|
||
|
|
4. [Quick Start](#quick-start)
|
||
|
|
5. [Model Summary](#model-summary)
|
||
|
|
6. [Prompting Guide](#prompting-guide)
|
||
|
|
7. [Documentation](#documentation)
|
||
|
|
8. [Citation](#citation)
|
||
|
|
|
||
|
|
## News
|
||
|
|
|
||
|
|
* **[2026-06-03]** **Ideogram 4 released!** Inference code and weights
|
||
|
|
are now public, and our [technical blog post](https://ideogram.ai/blog/ideogram-4.0/) is live. See the
|
||
|
|
[Quick Start](#quick-start) section to generate your first image, or try the
|
||
|
|
model online at [ideogram.ai](https://ideogram.ai/).
|
||
|
|
|
||
|
|
## Model Zoo
|
||
|
|
|
||
|
|
| Model | Params | Weight Quantization | Supported Hardware | Diffusers Support | License |
|
||
|
|
| :--- | :---: | :---: | :---: | :---: | :---: |
|
||
|
|
| **[Ideogram 4 (nf4)](https://huggingface.co/ideogram-ai/ideogram-4-nf4)** | 9.3B | nf4 | CUDA | Yes | [Ideogram 4 Non-Commercial](model_licenses/LICENSE-IDEOGRAM-4-NON-COMMERCIAL) |
|
||
|
|
| **[Ideogram 4 (fp8)](https://huggingface.co/ideogram-ai/ideogram-4-fp8)** | 9.3B | fp8 | All | No | [Ideogram 4 Non-Commercial](model_licenses/LICENSE-IDEOGRAM-4-NON-COMMERCIAL) |
|
||
|
|
|
||
|
|
We plan to support more quantizations in the future.
|
||
|
|
|
||
|
|
|
||
|
|
## Performance
|
||
|
|
|
||
|
|
We evaluate Ideogram 4 across third-party arenas and benchmarks, standard
|
||
|
|
open-source benchmarks, and our own internal human-preference benchmark. Across
|
||
|
|
all of them, **Ideogram 4 is the best open-weight image model by far, and sits
|
||
|
|
at the frontier of design.**
|
||
|
|
|
||
|
|
### Design Arena
|
||
|
|
|
||
|
|
[Design Arena](https://www.designarena.ai/) is a third-party image Elo
|
||
|
|
leaderboard focused specifically on design-oriented generation. On the overall
|
||
|
|
board, Ideogram 4 is the top-ranked open-weight model, trailing only proprietary
|
||
|
|
GPT and Gemini models:
|
||
|
|
|
||
|
|
<p align="center">
|
||
|
|
<img src="assets/benchmarks/design_arena.png" alt="Design Arena overall image Elo leaderboard with Ideogram 4.0 as the top open-weight model">
|
||
|
|
</p>
|
||
|
|
|
||
|
|
Filtered to open-weight models only, Ideogram 4 leads by a commanding margin,
|
||
|
|
well ahead of the next-best open model:
|
||
|
|
|
||
|
|
<p align="center">
|
||
|
|
<img src="assets/benchmarks/design_arena2.png" alt="Design Arena open-weight image Elo leaderboard, with Ideogram 4.0 well ahead of all other open models">
|
||
|
|
</p>
|
||
|
|
|
||
|
|
### ContraLabs
|
||
|
|
|
||
|
|
[ContraLabs](https://contralabs.com/research) ran a blind typography evaluation judged by
|
||
|
|
ten professional designers from Contra's top-earning talent. Ideogram 4 leads on
|
||
|
|
first-place win rate, picked as the best of four models 47.9% of the time
|
||
|
|
overall — well ahead of Gemini 3.1 Flash Image Preview (Nano Banana 2) at 30.0%,
|
||
|
|
FLUX.2 [max] (15.5%), and Grok Imagine 1.0 (15.0%):
|
||
|
|
|
||
|
|
<p align="center">
|
||
|
|
<img src="assets/benchmarks/contralabs_typography.png" alt="ContraLabs typography first-place win rate, with Ideogram v4 leading">
|
||
|
|
</p>
|
||
|
|
|
||
|
|
It also wins on practical usability: asked "Would you use this in real client
|
||
|
|
work?", the same designers rated Ideogram 4 highest at 3.55 / 5 — significantly
|
||
|
|
above Nano Banana 2 (2.84), Grok Imagine 1.0 (2.61), and FLUX.2 [max] (2.49):
|
||
|
|
|
||
|
|
<p align="center">
|
||
|
|
<img src="assets/benchmarks/contralabs_typography2.png" alt="ContraLabs 'would you use this in real client work?' rating, with Ideogram v4 leading">
|
||
|
|
</p>
|
||
|
|
|
||
|
|
### LMArena
|
||
|
|
|
||
|
|
On [LMArena](https://lmarena.ai/), a third-party text-to-image leaderboard that
|
||
|
|
measures general-purpose text-to-image use cases, Ideogram is the top-ranked
|
||
|
|
open-weight lab and a top-5 image generation lab overall — beaten only by giant
|
||
|
|
companies with vastly larger budgets and resources:
|
||
|
|
|
||
|
|
<p align="center">
|
||
|
|
<img src="assets/benchmarks/lmarena_benchmark.png" alt="LMArena text-to-image lab leaderboard with Ideogram">
|
||
|
|
</p>
|
||
|
|
|
||
|
|
### Ideogram internal eval
|
||
|
|
|
||
|
|
For our internal human-preference benchmark, focused on graphic design and
|
||
|
|
photography, we had graphic designers deeply familiar with professional design
|
||
|
|
work do the rating blind. Bradley-Terry scores rank Ideogram 4 #2 overall —
|
||
|
|
behind only GPT Image 2 medium — and the top open-weight model:
|
||
|
|
|
||
|
|
<p align="center">
|
||
|
|
<img src="assets/benchmarks/ideogram_benchmark.png" alt="Ideogram internal design leaderboard with Ideogram 4.0">
|
||
|
|
</p>
|
||
|
|
|
||
|
|
### Open-source benchmarks
|
||
|
|
|
||
|
|
On standard open-source benchmarks measuring core capabilities — layout control
|
||
|
|
(7Bench), spatial reasoning and object fidelity (SpatialGenEval), text rendering
|
||
|
|
(X-Omni OCR), and prompt alignment (Prism) — Ideogram 4 closes the gap to the
|
||
|
|
leading closed-source models across every axis. On layout control (7Bench), it
|
||
|
|
is significantly better than all closed-source models:
|
||
|
|
|
||
|
|
<p align="center">
|
||
|
|
<img src="assets/benchmarks/opensource.png" alt="Five-axis capability radar comparing Ideogram 4.0 to leading closed-source models on layout control, spatial reasoning, object fidelity, prompt alignment, and text rendering">
|
||
|
|
</p>
|
||
|
|
|
||
|
|
At 9.3B parameters, Ideogram 4 delivers the best text rendering of any open-weight
|
||
|
|
release we benchmarked — ahead of much larger models like Qwen-Image (20B),
|
||
|
|
FLUX.2 [dev] (32B), and HunyuanImage 3.0 (80B MoE):
|
||
|
|
|
||
|
|
<p align="center">
|
||
|
|
<img src="assets/benchmarks/opensource2.png" alt="Parameter-efficiency scatter plot showing Ideogram 4.0 at 9.3B parameters leading all other open-weight models on text rendering">
|
||
|
|
</p>
|
||
|
|
|
||
|
|
|
||
|
|
## Quick Start
|
||
|
|
|
||
|
|
### Install
|
||
|
|
|
||
|
|
```bash
|
||
|
|
pip install .
|
||
|
|
```
|
||
|
|
|
||
|
|
If you plan to modify the code, install in editable mode instead so changes
|
||
|
|
under `src/ideogram4/` take effect without reinstalling:
|
||
|
|
|
||
|
|
```bash
|
||
|
|
pip install -e .
|
||
|
|
```
|
||
|
|
|
||
|
|
### Model access
|
||
|
|
|
||
|
|
The model weights are **gated** on Hugging Face, so you must accept the gate and
|
||
|
|
authenticate before the code can download them — otherwise the download fails
|
||
|
|
with a `404` / `GatedRepoError`.
|
||
|
|
|
||
|
|
1. Open the model page — [ideogram-ai/ideogram-4-nf4](https://huggingface.co/ideogram-ai/ideogram-4-nf4)
|
||
|
|
(or [ideogram-ai/ideogram-4-fp8](https://huggingface.co/ideogram-ai/ideogram-4-fp8)) — and click
|
||
|
|
**Agree and access repository** to accept the license gate.
|
||
|
|
2. Create a Hugging Face access token at
|
||
|
|
[huggingface.co/settings/tokens](https://huggingface.co/settings/tokens) and log in so the
|
||
|
|
download is authenticated:
|
||
|
|
|
||
|
|
```bash
|
||
|
|
hf auth login
|
||
|
|
```
|
||
|
|
|
||
|
|
Alternatively, export the token directly: `export HF_TOKEN="hf_..."`.
|
||
|
|
|
||
|
|
### CLI
|
||
|
|
|
||
|
|
The plain `--prompt` is rewritten into the structured JSON caption the model
|
||
|
|
expects by a "magic prompt" LLM. By default this uses Ideogram's hosted
|
||
|
|
magic-prompt API, which is **free** and does the expansion server-side (no local
|
||
|
|
model or system prompt needed). It reads `IDEOGRAM_API_KEY` — get a key at
|
||
|
|
https://developer.ideogram.ai/:
|
||
|
|
|
||
|
|
```bash
|
||
|
|
python run_inference.py \
|
||
|
|
--prompt "a ginger cat wearing a tiny wizard hat reading a spellbook" \
|
||
|
|
--output out.png \
|
||
|
|
--quantization "nf4" \
|
||
|
|
--magic-prompt-key "$IDEOGRAM_API_KEY"
|
||
|
|
```
|
||
|
|
|
||
|
|
You can also run the expansion through your own LLM provider — one of our magic-prompt
|
||
|
|
system prompt is **open source**. See the
|
||
|
|
[Prompting Guide](docs/prompting.md#magic-prompt) for details.
|
||
|
|
|
||
|
|
For the highest-quality images, set `--height 2048 --width 2048` and
|
||
|
|
`--sampler-preset V4_QUALITY_48`.
|
||
|
|
|
||
|
|
#### Safety screening with Hive
|
||
|
|
|
||
|
|
Prompt and output safety screening is performed via [Hive](https://thehive.ai/).
|
||
|
|
Sign up and create a Text Moderation key and a Visual Content Moderation key,
|
||
|
|
then export them as `HIVE_TEXT_MODERATION_KEY` and `HIVE_VISUAL_MODERATION_KEY`
|
||
|
|
(or pass them via `--hive-text-key` / `--hive-visual-key`).
|
||
|
|
|
||
|
|
```bash
|
||
|
|
python run_inference.py \
|
||
|
|
--prompt "an isometric illustration of a tiny city floating in the clouds" \
|
||
|
|
--output out.png \
|
||
|
|
--quantization "nf4" \
|
||
|
|
--magic-prompt-key "$MAGIC_PROMPT_API_KEY" \
|
||
|
|
--hive-text-key "$HIVE_TEXT_MODERATION_KEY" \
|
||
|
|
--hive-visual-key "$HIVE_VISUAL_MODERATION_KEY"
|
||
|
|
```
|
||
|
|
|
||
|
|
For sampler presets, parameter reference, and optimization tips, see
|
||
|
|
[docs/inference.md](docs/inference.md).
|
||
|
|
|
||
|
|
## Model Summary
|
||
|
|
|
||
|
|
Ideogram 4 is a **foundation model trained entirely from scratch**, not a
|
||
|
|
fine-tune or distillation of any existing checkpoint. It is a flow-matching
|
||
|
|
text-to-image model built on a **fully single-stream** Diffusion Transformer
|
||
|
|
(DiT) architecture.
|
||
|
|
|
||
|
|
**Architecture:**
|
||
|
|
- **Fully single-stream DiT.** Text and image tokens are concatenated into one
|
||
|
|
unified sequence and processed through the same 34-layer transformer, with no
|
||
|
|
separate text or image branches. This enables deep cross-modal interaction at
|
||
|
|
every layer.
|
||
|
|
- **Vision-language model as text encoder.** Instead of a text-only encoder
|
||
|
|
like CLIP or T5, Ideogram 4 uses
|
||
|
|
[Qwen3-VL-8B-Instruct](https://huggingface.co/Qwen/Qwen3-VL-8B-Instruct),
|
||
|
|
a full vision-language model that provides far richer understanding of visual
|
||
|
|
concepts. Hidden states are extracted from **13 intermediate layers** and
|
||
|
|
concatenated, giving the model multi-scale semantic features ranging from
|
||
|
|
surface-level token information to deep compositional understanding.
|
||
|
|
- **Dual-branch classifier-free guidance.** The conditional (positive) and
|
||
|
|
unconditional (negative) branches can be independently refined, enabling
|
||
|
|
separate control over prompt adherence and image quality.
|
||
|
|
- **Flexible resolution.** Native support for any resolution from 256 to 2048
|
||
|
|
(multiples of 16), with aspect ratios up to 6:1. A single model handles
|
||
|
|
everything from square thumbnails to ultrawide banners, with the noise
|
||
|
|
schedule auto-adjusting per resolution.
|
||
|
|
|
||
|
|
**Key Capabilities:**
|
||
|
|
- **Extreme controllability.** Ideogram 4 is trained on structured JSON
|
||
|
|
captions, giving users unprecedented control over composition, style,
|
||
|
|
lighting, color palette, typography, and spatial layout, all from a single
|
||
|
|
prompt.
|
||
|
|
- **State-of-the-art text rendering.** Ideogram 4 delivers best-in-class
|
||
|
|
in-image text generation (signage, logos, captions, watermarks, multi-line
|
||
|
|
text) with high fidelity directly from the prompt.
|
||
|
|
- **Spatial layout control.** Bounding-box coordinates in the prompt allow
|
||
|
|
explicit placement of subjects, text elements, and background regions.
|
||
|
|
- **Color palette conditioning.** Specify hex colors in the prompt to steer the
|
||
|
|
image's dominant color scheme.
|
||
|
|
|
||
|
|
For full architecture details, see
|
||
|
|
[docs/model_architecture.md](docs/model_architecture.md). For a walkthrough of
|
||
|
|
how the pipeline components fit together, see
|
||
|
|
[docs/pipeline.md](docs/pipeline.md).
|
||
|
|
|
||
|
|
## Prompting Guide
|
||
|
|
|
||
|
|
Ideogram 4 is trained exclusively on **structured JSON captions**. While
|
||
|
|
plain-text prompts work, you will get the best results by providing a JSON
|
||
|
|
object that follows our caption schema.
|
||
|
|
|
||
|
|
|
||
|
|
Key points:
|
||
|
|
|
||
|
|
- **Use JSON prompts** for maximum controllability — the model was trained on
|
||
|
|
them and understands the structure natively.
|
||
|
|
- **Color palette conditioning** — specify a `colour_palette` array of hex
|
||
|
|
colors in the style description to steer the image's color scheme.
|
||
|
|
- **Aspect ratio flexibility** — Ideogram 4 supports a wide range of aspect
|
||
|
|
ratios (any multiple-of-16 resolution from 256 to 2048 on each side). This
|
||
|
|
is a key advantage for practical use: portraits, landscapes, banners,
|
||
|
|
phone wallpapers, social media formats, etc.
|
||
|
|
- **Bounding-box layout** — specify `bbox` coordinates in the prompt to
|
||
|
|
explicitly place subjects, text elements, and background regions.
|
||
|
|
- **Compositional control** — use `compositional_deconstruction` with bounding
|
||
|
|
boxes and per-element descriptions for precise spatial layout.
|
||
|
|
|
||
|
|
|
||
|
|
**Why JSON-only training?** We train exclusively on JSON so that training
|
||
|
|
and inference share a single, common prompt format. The training captions themselves are deliberately
|
||
|
|
**extremely descriptive**: each JSON exhaustively describes everything in
|
||
|
|
the image to maximize training efficiency. The more
|
||
|
|
text-to-image relationships each caption pins down, the more grounded
|
||
|
|
supervision the model extracts from a single training pair, rather than
|
||
|
|
having to infer those relationships across many sparsely-captioned samples.
|
||
|
|
|
||
|
|
**Why JSON at inference time?** Because the model was trained on captions
|
||
|
|
that name every object explicitly, the most reliable way to get every
|
||
|
|
requested object rendered is to mirror that pattern. Plain-text prompts still work, but
|
||
|
|
won't perform as well since the model was only trained on structured JSON captions.
|
||
|
|
|
||
|
|
**Don't want to write JSON by hand?** That's what *magic prompt* is for: it uses
|
||
|
|
an LLM to expand a plain-text prompt into a full structured caption before
|
||
|
|
generation, so you get JSON-quality results from a casual prompt. It runs by
|
||
|
|
default in `run_inference.py` (see the [CLI](#cli) section).
|
||
|
|
|
||
|
|
See [docs/prompting.md](docs/prompting.md) for a full guide.
|
||
|
|
|
||
|
|
## Documentation
|
||
|
|
|
||
|
|
| Document | Description |
|
||
|
|
| :------- | :---------- |
|
||
|
|
| [docs/prompting.md](docs/prompting.md) | How to write JSON prompts, color palette conditioning, aspect ratios |
|
||
|
|
| [docs/inference.md](docs/inference.md) | Sampler presets, parameter reference, resolutions, optimization tips |
|
||
|
|
| [docs/model_architecture.md](docs/model_architecture.md) | Architecture diagram, DiT spec, component details |
|
||
|
|
| [docs/pipeline.md](docs/pipeline.md) | Conceptual pipeline walkthrough — how all components fit together |
|
||
|
|
| [docs/development.md](docs/development.md) | Dev setup, pre-commit hooks, contributing |
|
||
|
|
| [docs/safety.md](docs/safety.md) | Pre-training, post-training, and inference-time safety mitigations; how to report violations |
|
||
|
|
|
||
|
|
## Citation
|
||
|
|
|
||
|
|
If you find the provided code or models useful for your research, consider citing them as:
|
||
|
|
|
||
|
|
|
||
|
|
```bibtex
|
||
|
|
@misc{ideogram-4-2026,
|
||
|
|
author={Ideogram AI},
|
||
|
|
title={{Ideogram 4}},
|
||
|
|
year={2026},
|
||
|
|
howpublished={\url{https://ideogram.ai/blog/ideogram-4.0/}},
|
||
|
|
}
|
||
|
|
```
|
||
|
|
|
||
|
|
## We're Hiring!
|
||
|
|
|
||
|
|
We're looking for **Research Scientists** and **Research Engineers** to
|
||
|
|
work on next-generation generative models and the products built on top of
|
||
|
|
them. Interested candidates please apply https://jobs.ashbyhq.com/ideogram
|