docs/README.md

<p align="center"><a href="https://ideogram.ai/" target="_blank" rel="noopener noreferrer"><picture>
  <source media="(prefers-color-scheme: dark)" srcset="assets/ideogram_logo_darkmode.svg">
  <source media="(prefers-color-scheme: light)" srcset="assets/ideogram_logo.svg">
  <img src="assets/ideogram_logo.svg" alt="Ideogram" width="500">
</picture></a></p>

<p align="center"><em>Ideogram 4: Open image model at the forefront of design</em></p>

<p align="center">
  <a href="https://ideogram.ai/blog/ideogram-4.0/" target="_blank" rel="noopener noreferrer"><img src="https://img.shields.io/badge/Blog-Post-orange" alt="Blog Post"></a>
  <a href="https://github.com/ideogram-oss/ideogram4" target="_blank" rel="noopener noreferrer"><img src="https://img.shields.io/badge/Code-GitHub-181717?logo=github" alt="Code"></a>
  <a href="https://huggingface.co/collections/ideogram-ai/ideogram-4" target="_blank" rel="noopener noreferrer"><img src="https://img.shields.io/badge/Model-HuggingFace-blue?logo=huggingface" alt="Model"></a>
  <a href="https://developer.ideogram.ai/" target="_blank" rel="noopener noreferrer"><img src="https://img.shields.io/badge/API-developer.ideogram.ai-purple" alt="API"></a>
  <a href="https://ideogram.ai/" target="_blank" rel="noopener noreferrer"><img src="https://img.shields.io/badge/Official%20Site-ideogram.ai-ff69b4" alt="Official Site"></a>
</p>

<p align="center">
  <img src="assets/samples/collage_landscape.jpg" alt="A collage of Ideogram 4 samples spanning photorealism, illustration, typography, and poster design">
</p>


Ideogram 4 is **[Ideogram](https://ideogram.ai)'s first open-weight text-to-image model**. It is a **state-of-the-art foundation model trained from scratch** — not a fine-tune of any existing model. It introduces a new structured JSON prompting interface, with best-in-class multilingual text rendering, deep language understanding, explicit bounding-box layout and color-palette controls, and native 2k resolution images. The easiest way to try the model is online at **[ideogram.ai](https://ideogram.ai/)**.

We believe openness drives innovation, and we invite the research community to innovate with us on the forefront of visual intelligence.

## Table of Contents

1. [News](#news)
2. [Model Zoo](#model-zoo)
3. [Performance](#performance)
4. [Quick Start](#quick-start)
5. [Model Summary](#model-summary)
6. [Prompting Guide](#prompting-guide)
7. [Documentation](#documentation)
8. [Citation](#citation)

## News

* **[2026-06-03]** **Ideogram 4 released!** Inference code and weights
  are now public, and our [technical blog post](https://ideogram.ai/blog/ideogram-4.0/) is live. See the
  [Quick Start](#quick-start) section to generate your first image, or try the
  model online at [ideogram.ai](https://ideogram.ai/).

## Model Zoo

| Model | Params | Weight Quantization | Supported Hardware | Diffusers Support | License |
| :---  | :---:  | :---:        | :---:   | :---:   | :---:   |
| **[Ideogram 4 (nf4)](https://huggingface.co/ideogram-ai/ideogram-4-nf4)** | 9.3B | nf4 | CUDA | Yes | [Ideogram 4 Non-Commercial](model_licenses/LICENSE-IDEOGRAM-4-NON-COMMERCIAL) |
| **[Ideogram 4 (fp8)](https://huggingface.co/ideogram-ai/ideogram-4-fp8)** | 9.3B | fp8 | All | No | [Ideogram 4 Non-Commercial](model_licenses/LICENSE-IDEOGRAM-4-NON-COMMERCIAL) |

We plan to support more quantizations in the future.


## Performance

We evaluate Ideogram 4 across third-party arenas and benchmarks, standard
open-source benchmarks, and our own internal human-preference benchmark. Across
all of them, **Ideogram 4 is the best open-weight image model by far, and sits
at the frontier of design.**

### Design Arena

[Design Arena](https://www.designarena.ai/) is a third-party image Elo
leaderboard focused specifically on design-oriented generation. On the overall
board, Ideogram 4 is the top-ranked open-weight model, trailing only proprietary
GPT and Gemini models:

<p align="center">
  <img src="assets/benchmarks/design_arena.png" alt="Design Arena overall image Elo leaderboard with Ideogram 4.0 as the top open-weight model">
</p>

Filtered to open-weight models only, Ideogram 4 leads by a commanding margin,
well ahead of the next-best open model:

<p align="center">
  <img src="assets/benchmarks/design_arena2.png" alt="Design Arena open-weight image Elo leaderboard, with Ideogram 4.0 well ahead of all other open models">
</p>

### ContraLabs

[ContraLabs](https://contralabs.com/research) ran a blind typography evaluation judged by
ten professional designers from Contra's top-earning talent. Ideogram 4 leads on
first-place win rate, picked as the best of four models 47.9% of the time
overall — well ahead of Gemini 3.1 Flash Image Preview (Nano Banana 2) at 30.0%,
FLUX.2 [max] (15.5%), and Grok Imagine 1.0 (15.0%):

<p align="center">
  <img src="assets/benchmarks/contralabs_typography.png" alt="ContraLabs typography first-place win rate, with Ideogram v4 leading">
</p>

It also wins on practical usability: asked "Would you use this in real client
work?", the same designers rated Ideogram 4 highest at 3.55 / 5 — significantly
above Nano Banana 2 (2.84), Grok Imagine 1.0 (2.61), and FLUX.2 [max] (2.49):

<p align="center">
  <img src="assets/benchmarks/contralabs_typography2.png" alt="ContraLabs 'would you use this in real client work?' rating, with Ideogram v4 leading">
</p>

### LMArena

On [LMArena](https://lmarena.ai/), a third-party text-to-image leaderboard that
measures general-purpose text-to-image use cases, Ideogram is the top-ranked
open-weight lab and a top-5 image generation lab overall — beaten only by giant
companies with vastly larger budgets and resources:

<p align="center">
  <img src="assets/benchmarks/lmarena_benchmark.png" alt="LMArena text-to-image lab leaderboard with Ideogram">
</p>

### Ideogram internal eval

For our internal human-preference benchmark, focused on graphic design and
photography, we had graphic designers deeply familiar with professional design
work do the rating blind. Bradley-Terry scores rank Ideogram 4 #2 overall —
behind only GPT Image 2 medium — and the top open-weight model:

<p align="center">
  <img src="assets/benchmarks/ideogram_benchmark.png" alt="Ideogram internal design leaderboard with Ideogram 4.0">
</p>

### Open-source benchmarks

On standard open-source benchmarks measuring core capabilities — layout control
(7Bench), spatial reasoning and object fidelity (SpatialGenEval), text rendering
(X-Omni OCR), and prompt alignment (Prism) — Ideogram 4 closes the gap to the
leading closed-source models across every axis. On layout control (7Bench), it
is significantly better than all closed-source models:

<p align="center">
  <img src="assets/benchmarks/opensource.png" alt="Five-axis capability radar comparing Ideogram 4.0 to leading closed-source models on layout control, spatial reasoning, object fidelity, prompt alignment, and text rendering">
</p>

At 9.3B parameters, Ideogram 4 delivers the best text rendering of any open-weight
release we benchmarked — ahead of much larger models like Qwen-Image (20B),
FLUX.2 [dev] (32B), and HunyuanImage 3.0 (80B MoE):

<p align="center">
  <img src="assets/benchmarks/opensource2.png" alt="Parameter-efficiency scatter plot showing Ideogram 4.0 at 9.3B parameters leading all other open-weight models on text rendering">
</p>


## Quick Start

### Install

```bash
pip install .
```

If you plan to modify the code, install in editable mode instead so changes
under `src/ideogram4/` take effect without reinstalling:

```bash
pip install -e .
```

### Model access

The model weights are **gated** on Hugging Face, so you must accept the gate and
authenticate before the code can download them — otherwise the download fails
with a `404` / `GatedRepoError`.

1. Open the model page — [ideogram-ai/ideogram-4-nf4](https://huggingface.co/ideogram-ai/ideogram-4-nf4)
   (or [ideogram-ai/ideogram-4-fp8](https://huggingface.co/ideogram-ai/ideogram-4-fp8)) — and click
   **Agree and access repository** to accept the license gate.
2. Create a Hugging Face access token at
   [huggingface.co/settings/tokens](https://huggingface.co/settings/tokens) and log in so the
   download is authenticated:

   ```bash
   hf auth login
   ```

   Alternatively, export the token directly: `export HF_TOKEN="hf_..."`.

### CLI

The plain `--prompt` is rewritten into the structured JSON caption the model
expects by a "magic prompt" LLM. By default this uses Ideogram's hosted
magic-prompt API, which is **free** and does the expansion server-side (no local
model or system prompt needed). It reads `IDEOGRAM_API_KEY` — get a key at
https://developer.ideogram.ai/:

```bash
python run_inference.py \
  --prompt "a ginger cat wearing a tiny wizard hat reading a spellbook" \
  --output out.png \
  --quantization "nf4" \
  --magic-prompt-key "$IDEOGRAM_API_KEY"
```

You can also run the expansion through your own LLM provider — one of our magic-prompt
system prompt is **open source**. See the
[Prompting Guide](docs/prompting.md#magic-prompt) for details.

For the highest-quality images, set `--height 2048 --width 2048` and
`--sampler-preset V4_QUALITY_48`.

#### Safety screening with Hive

Prompt and output safety screening is performed via [Hive](https://thehive.ai/).
Sign up and create a Text Moderation key and a Visual Content Moderation key,
then export them as `HIVE_TEXT_MODERATION_KEY` and `HIVE_VISUAL_MODERATION_KEY`
(or pass them via `--hive-text-key` / `--hive-visual-key`).

```bash
python run_inference.py \
  --prompt "an isometric illustration of a tiny city floating in the clouds" \
  --output out.png \
  --quantization "nf4" \
  --magic-prompt-key "$MAGIC_PROMPT_API_KEY" \
  --hive-text-key "$HIVE_TEXT_MODERATION_KEY" \
  --hive-visual-key "$HIVE_VISUAL_MODERATION_KEY"
```

For sampler presets, parameter reference, and optimization tips, see
[docs/inference.md](docs/inference.md).

## Model Summary

Ideogram 4 is a **foundation model trained entirely from scratch**, not a
fine-tune or distillation of any existing checkpoint. It is a flow-matching
text-to-image model built on a **fully single-stream** Diffusion Transformer
(DiT) architecture.

**Architecture:**
- **Fully single-stream DiT.** Text and image tokens are concatenated into one
  unified sequence and processed through the same 34-layer transformer, with no
  separate text or image branches. This enables deep cross-modal interaction at
  every layer.
- **Vision-language model as text encoder.** Instead of a text-only encoder
  like CLIP or T5, Ideogram 4 uses
  [Qwen3-VL-8B-Instruct](https://huggingface.co/Qwen/Qwen3-VL-8B-Instruct),
  a full vision-language model that provides far richer understanding of visual
  concepts. Hidden states are extracted from **13 intermediate layers** and
  concatenated, giving the model multi-scale semantic features ranging from
  surface-level token information to deep compositional understanding.
- **Dual-branch classifier-free guidance.** The conditional (positive) and
  unconditional (negative) branches can be independently refined, enabling
  separate control over prompt adherence and image quality.
- **Flexible resolution.** Native support for any resolution from 256 to 2048
  (multiples of 16), with aspect ratios up to 6:1. A single model handles
  everything from square thumbnails to ultrawide banners, with the noise
  schedule auto-adjusting per resolution.

**Key Capabilities:**
- **Extreme controllability.** Ideogram 4 is trained on structured JSON
  captions, giving users unprecedented control over composition, style,
  lighting, color palette, typography, and spatial layout, all from a single
  prompt.
- **State-of-the-art text rendering.** Ideogram 4 delivers best-in-class
  in-image text generation (signage, logos, captions, watermarks, multi-line
  text) with high fidelity directly from the prompt.
- **Spatial layout control.** Bounding-box coordinates in the prompt allow
  explicit placement of subjects, text elements, and background regions.
- **Color palette conditioning.** Specify hex colors in the prompt to steer the
  image's dominant color scheme.

For full architecture details, see
[docs/model_architecture.md](docs/model_architecture.md). For a walkthrough of
how the pipeline components fit together, see
[docs/pipeline.md](docs/pipeline.md).

## Prompting Guide

Ideogram 4 is trained exclusively on **structured JSON captions**. While
plain-text prompts work, you will get the best results by providing a JSON
object that follows our caption schema.


Key points:

- **Use JSON prompts** for maximum controllability — the model was trained on
  them and understands the structure natively.
- **Color palette conditioning** — specify a `colour_palette` array of hex
  colors in the style description to steer the image's color scheme.
- **Aspect ratio flexibility** — Ideogram 4 supports a wide range of aspect
  ratios (any multiple-of-16 resolution from 256 to 2048 on each side). This
  is a key advantage for practical use: portraits, landscapes, banners,
  phone wallpapers, social media formats, etc.
- **Bounding-box layout** — specify `bbox` coordinates in the prompt to
  explicitly place subjects, text elements, and background regions.
- **Compositional control** — use `compositional_deconstruction` with bounding
  boxes and per-element descriptions for precise spatial layout.


**Why JSON-only training?** We train exclusively on JSON so that training
and inference share a single, common prompt format. The training captions themselves are deliberately
**extremely descriptive**: each JSON exhaustively describes everything in
the image to maximize training efficiency. The more
text-to-image relationships each caption pins down, the more grounded
supervision the model extracts from a single training pair, rather than
having to infer those relationships across many sparsely-captioned samples.

**Why JSON at inference time?** Because the model was trained on captions
that name every object explicitly, the most reliable way to get every
requested object rendered is to mirror that pattern. Plain-text prompts still work, but
won't perform as well since the model was only trained on structured JSON captions.

**Don't want to write JSON by hand?** That's what *magic prompt* is for: it uses
an LLM to expand a plain-text prompt into a full structured caption before
generation, so you get JSON-quality results from a casual prompt. It runs by
default in `run_inference.py` (see the [CLI](#cli) section).

See [docs/prompting.md](docs/prompting.md) for a full guide.

## Documentation

| Document | Description |
| :------- | :---------- |
| [docs/prompting.md](docs/prompting.md) | How to write JSON prompts, color palette conditioning, aspect ratios |
| [docs/inference.md](docs/inference.md) | Sampler presets, parameter reference, resolutions, optimization tips |
| [docs/model_architecture.md](docs/model_architecture.md) | Architecture diagram, DiT spec, component details |
| [docs/pipeline.md](docs/pipeline.md) | Conceptual pipeline walkthrough — how all components fit together |
| [docs/development.md](docs/development.md) | Dev setup, pre-commit hooks, contributing |
| [docs/safety.md](docs/safety.md) | Pre-training, post-training, and inference-time safety mitigations; how to report violations |

## Citation

If you find the provided code or models useful for your research, consider citing them as:


```bibtex
@misc{ideogram-4-2026,
    author={Ideogram AI},
    title={{Ideogram 4}},
    year={2026},
    howpublished={\url{https://ideogram.ai/blog/ideogram-4.0/}},
}
```

## We're Hiring!

We're looking for **Research Scientists** and **Research Engineers** to
work on next-generation generative models and the products built on top of
them. Interested candidates please apply https://jobs.ashbyhq.com/ideogram
Initial commit: Ideogram 4 Prompt Builder 2026-06-13 16:36:27 +08:00			`<p align="center"><a href="https://ideogram.ai/" target="_blank" rel="noopener noreferrer"><picture>`
			`<source media="(prefers-color-scheme: dark)" srcset="assets/ideogram_logo_darkmode.svg">`
			`<source media="(prefers-color-scheme: light)" srcset="assets/ideogram_logo.svg">`
			`<img src="assets/ideogram_logo.svg" alt="Ideogram" width="500">`
			`</picture></a></p>`

			`<p align="center"><em>Ideogram 4: Open image model at the forefront of design</em></p>`

			`<p align="center">`
			`<a href="https://ideogram.ai/blog/ideogram-4.0/" target="_blank" rel="noopener noreferrer"><img src="https://img.shields.io/badge/Blog-Post-orange" alt="Blog Post"></a>`
			`<a href="https://github.com/ideogram-oss/ideogram4" target="_blank" rel="noopener noreferrer"><img src="https://img.shields.io/badge/Code-GitHub-181717?logo=github" alt="Code"></a>`
			`<a href="https://huggingface.co/collections/ideogram-ai/ideogram-4" target="_blank" rel="noopener noreferrer"><img src="https://img.shields.io/badge/Model-HuggingFace-blue?logo=huggingface" alt="Model"></a>`
			`<a href="https://developer.ideogram.ai/" target="_blank" rel="noopener noreferrer"><img src="https://img.shields.io/badge/API-developer.ideogram.ai-purple" alt="API"></a>`
			`<a href="https://ideogram.ai/" target="_blank" rel="noopener noreferrer"><img src="https://img.shields.io/badge/Official%20Site-ideogram.ai-ff69b4" alt="Official Site"></a>`
			`</p>`

			`<p align="center">`
			`<img src="assets/samples/collage_landscape.jpg" alt="A collage of Ideogram 4 samples spanning photorealism, illustration, typography, and poster design">`
			`</p>`


			Ideogram 4 is [Ideogram](https://ideogram.ai)'s first open-weight text-to-image model. It is a state-of-the-art foundation model trained from scratch — not a fine-tune of any existing model. It introduces a new structured JSON prompting interface, with best-in-class multilingual text rendering, deep language understanding, explicit bounding-box layout and color-palette controls, and native 2k resolution images. The easiest way to try the model is online at [ideogram.ai](https://ideogram.ai/).

			`We believe openness drives innovation, and we invite the research community to innovate with us on the forefront of visual intelligence.`

			`## Table of Contents`

			`1. [News](#news)`
			`2. [Model Zoo](#model-zoo)`
			`3. [Performance](#performance)`
			`4. [Quick Start](#quick-start)`
			`5. [Model Summary](#model-summary)`
			`6. [Prompting Guide](#prompting-guide)`
			`7. [Documentation](#documentation)`
			`8. [Citation](#citation)`

			`## News`

			`* [2026-06-03] Ideogram 4 released! Inference code and weights`
			`are now public, and our [technical blog post](https://ideogram.ai/blog/ideogram-4.0/) is live. See the`
			`[Quick Start](#quick-start) section to generate your first image, or try the`
			`model online at [ideogram.ai](https://ideogram.ai/).`

			`## Model Zoo`

			`\| Model \| Params \| Weight Quantization \| Supported Hardware \| Diffusers Support \| License \|`
			`\| :--- \| :---: \| :---: \| :---: \| :---: \| :---: \|`
			`\| [Ideogram 4 (nf4)](https://huggingface.co/ideogram-ai/ideogram-4-nf4) \| 9.3B \| nf4 \| CUDA \| Yes \| [Ideogram 4 Non-Commercial](model_licenses/LICENSE-IDEOGRAM-4-NON-COMMERCIAL) \|`
			`\| [Ideogram 4 (fp8)](https://huggingface.co/ideogram-ai/ideogram-4-fp8) \| 9.3B \| fp8 \| All \| No \| [Ideogram 4 Non-Commercial](model_licenses/LICENSE-IDEOGRAM-4-NON-COMMERCIAL) \|`

			`We plan to support more quantizations in the future.`


			`## Performance`

			`We evaluate Ideogram 4 across third-party arenas and benchmarks, standard`
			`open-source benchmarks, and our own internal human-preference benchmark. Across`
			`all of them, **Ideogram 4 is the best open-weight image model by far, and sits`
			`at the frontier of design.**`

			`### Design Arena`

			`[Design Arena](https://www.designarena.ai/) is a third-party image Elo`
			`leaderboard focused specifically on design-oriented generation. On the overall`
			`board, Ideogram 4 is the top-ranked open-weight model, trailing only proprietary`
			`GPT and Gemini models:`

			`<p align="center">`
			`<img src="assets/benchmarks/design_arena.png" alt="Design Arena overall image Elo leaderboard with Ideogram 4.0 as the top open-weight model">`
			`</p>`

			`Filtered to open-weight models only, Ideogram 4 leads by a commanding margin,`
			`well ahead of the next-best open model:`

			`<p align="center">`
			`<img src="assets/benchmarks/design_arena2.png" alt="Design Arena open-weight image Elo leaderboard, with Ideogram 4.0 well ahead of all other open models">`
			`</p>`

			`### ContraLabs`

			`[ContraLabs](https://contralabs.com/research) ran a blind typography evaluation judged by`
			`ten professional designers from Contra's top-earning talent. Ideogram 4 leads on`
			`first-place win rate, picked as the best of four models 47.9% of the time`
			`overall — well ahead of Gemini 3.1 Flash Image Preview (Nano Banana 2) at 30.0%,`
			`FLUX.2 [max] (15.5%), and Grok Imagine 1.0 (15.0%):`

			`<p align="center">`
			`<img src="assets/benchmarks/contralabs_typography.png" alt="ContraLabs typography first-place win rate, with Ideogram v4 leading">`
			`</p>`

			`It also wins on practical usability: asked "Would you use this in real client`
			`work?", the same designers rated Ideogram 4 highest at 3.55 / 5 — significantly`
			`above Nano Banana 2 (2.84), Grok Imagine 1.0 (2.61), and FLUX.2 [max] (2.49):`

			`<p align="center">`
			`<img src="assets/benchmarks/contralabs_typography2.png" alt="ContraLabs 'would you use this in real client work?' rating, with Ideogram v4 leading">`
			`</p>`

			`### LMArena`

			`On [LMArena](https://lmarena.ai/), a third-party text-to-image leaderboard that`
			`measures general-purpose text-to-image use cases, Ideogram is the top-ranked`
			`open-weight lab and a top-5 image generation lab overall — beaten only by giant`
			`companies with vastly larger budgets and resources:`

			`<p align="center">`
			`<img src="assets/benchmarks/lmarena_benchmark.png" alt="LMArena text-to-image lab leaderboard with Ideogram">`
			`</p>`

			`### Ideogram internal eval`

			`For our internal human-preference benchmark, focused on graphic design and`
			`photography, we had graphic designers deeply familiar with professional design`
			`work do the rating blind. Bradley-Terry scores rank Ideogram 4 #2 overall —`
			`behind only GPT Image 2 medium — and the top open-weight model:`

			`<p align="center">`
			`<img src="assets/benchmarks/ideogram_benchmark.png" alt="Ideogram internal design leaderboard with Ideogram 4.0">`
			`</p>`

			`### Open-source benchmarks`

			`On standard open-source benchmarks measuring core capabilities — layout control`
			`(7Bench), spatial reasoning and object fidelity (SpatialGenEval), text rendering`
			`(X-Omni OCR), and prompt alignment (Prism) — Ideogram 4 closes the gap to the`
			`leading closed-source models across every axis. On layout control (7Bench), it`
			`is significantly better than all closed-source models:`

			`<p align="center">`
			`<img src="assets/benchmarks/opensource.png" alt="Five-axis capability radar comparing Ideogram 4.0 to leading closed-source models on layout control, spatial reasoning, object fidelity, prompt alignment, and text rendering">`
			`</p>`

			`At 9.3B parameters, Ideogram 4 delivers the best text rendering of any open-weight`
			`release we benchmarked — ahead of much larger models like Qwen-Image (20B),`
			`FLUX.2 [dev] (32B), and HunyuanImage 3.0 (80B MoE):`

			`<p align="center">`
			`<img src="assets/benchmarks/opensource2.png" alt="Parameter-efficiency scatter plot showing Ideogram 4.0 at 9.3B parameters leading all other open-weight models on text rendering">`
			`</p>`


			`## Quick Start`

			`### Install`

			```bash
			`pip install .`
			```

			`If you plan to modify the code, install in editable mode instead so changes`
			under `src/ideogram4/` take effect without reinstalling:

			```bash
			`pip install -e .`
			```

			`### Model access`

			`The model weights are gated on Hugging Face, so you must accept the gate and`
			`authenticate before the code can download them — otherwise the download fails`
			with a `404` / `GatedRepoError`.

			`1. Open the model page — [ideogram-ai/ideogram-4-nf4](https://huggingface.co/ideogram-ai/ideogram-4-nf4)`
			`(or [ideogram-ai/ideogram-4-fp8](https://huggingface.co/ideogram-ai/ideogram-4-fp8)) — and click`
			`Agree and access repository to accept the license gate.`
			`2. Create a Hugging Face access token at`
			`[huggingface.co/settings/tokens](https://huggingface.co/settings/tokens) and log in so the`
			`download is authenticated:`

			```bash
			`hf auth login`
			```

			Alternatively, export the token directly: `export HF_TOKEN="hf_..."`.

			`### CLI`

			The plain `--prompt` is rewritten into the structured JSON caption the model
			`expects by a "magic prompt" LLM. By default this uses Ideogram's hosted`
			`magic-prompt API, which is free and does the expansion server-side (no local`
			model or system prompt needed). It reads `IDEOGRAM_API_KEY` — get a key at
			`https://developer.ideogram.ai/:`

			```bash
			`python run_inference.py \`
			`--prompt "a ginger cat wearing a tiny wizard hat reading a spellbook" \`
			`--output out.png \`
			`--quantization "nf4" \`
			`--magic-prompt-key "$IDEOGRAM_API_KEY"`
			```

			`You can also run the expansion through your own LLM provider — one of our magic-prompt`
			`system prompt is open source. See the`
			`[Prompting Guide](docs/prompting.md#magic-prompt) for details.`

			For the highest-quality images, set `--height 2048 --width 2048` and
			`--sampler-preset V4_QUALITY_48`.

			`#### Safety screening with Hive`

			`Prompt and output safety screening is performed via [Hive](https://thehive.ai/).`
			`Sign up and create a Text Moderation key and a Visual Content Moderation key,`
			then export them as `HIVE_TEXT_MODERATION_KEY` and `HIVE_VISUAL_MODERATION_KEY`
			(or pass them via `--hive-text-key` / `--hive-visual-key`).

			```bash
			`python run_inference.py \`
			`--prompt "an isometric illustration of a tiny city floating in the clouds" \`
			`--output out.png \`
			`--quantization "nf4" \`
			`--magic-prompt-key "$MAGIC_PROMPT_API_KEY" \`
			`--hive-text-key "$HIVE_TEXT_MODERATION_KEY" \`
			`--hive-visual-key "$HIVE_VISUAL_MODERATION_KEY"`
			```

			`For sampler presets, parameter reference, and optimization tips, see`
			`[docs/inference.md](docs/inference.md).`

			`## Model Summary`

			`Ideogram 4 is a foundation model trained entirely from scratch, not a`
			`fine-tune or distillation of any existing checkpoint. It is a flow-matching`
			`text-to-image model built on a fully single-stream Diffusion Transformer`
			`(DiT) architecture.`

			`Architecture:`
			`- Fully single-stream DiT. Text and image tokens are concatenated into one`
			`unified sequence and processed through the same 34-layer transformer, with no`
			`separate text or image branches. This enables deep cross-modal interaction at`
			`every layer.`
			`- Vision-language model as text encoder. Instead of a text-only encoder`
			`like CLIP or T5, Ideogram 4 uses`
			`[Qwen3-VL-8B-Instruct](https://huggingface.co/Qwen/Qwen3-VL-8B-Instruct),`
			`a full vision-language model that provides far richer understanding of visual`
			`concepts. Hidden states are extracted from 13 intermediate layers and`
			`concatenated, giving the model multi-scale semantic features ranging from`
			`surface-level token information to deep compositional understanding.`
			`- Dual-branch classifier-free guidance. The conditional (positive) and`
			`unconditional (negative) branches can be independently refined, enabling`
			`separate control over prompt adherence and image quality.`
			`- Flexible resolution. Native support for any resolution from 256 to 2048`
			`(multiples of 16), with aspect ratios up to 6:1. A single model handles`
			`everything from square thumbnails to ultrawide banners, with the noise`
			`schedule auto-adjusting per resolution.`

			`Key Capabilities:`
			`- Extreme controllability. Ideogram 4 is trained on structured JSON`
			`captions, giving users unprecedented control over composition, style,`
			`lighting, color palette, typography, and spatial layout, all from a single`
			`prompt.`
			`- State-of-the-art text rendering. Ideogram 4 delivers best-in-class`
			`in-image text generation (signage, logos, captions, watermarks, multi-line`
			`text) with high fidelity directly from the prompt.`
			`- Spatial layout control. Bounding-box coordinates in the prompt allow`
			`explicit placement of subjects, text elements, and background regions.`
			`- Color palette conditioning. Specify hex colors in the prompt to steer the`
			`image's dominant color scheme.`

			`For full architecture details, see`
			`[docs/model_architecture.md](docs/model_architecture.md). For a walkthrough of`
			`how the pipeline components fit together, see`
			`[docs/pipeline.md](docs/pipeline.md).`

			`## Prompting Guide`

			`Ideogram 4 is trained exclusively on structured JSON captions. While`
			`plain-text prompts work, you will get the best results by providing a JSON`
			`object that follows our caption schema.`


			`Key points:`

			`- Use JSON prompts for maximum controllability — the model was trained on`
			`them and understands the structure natively.`
			- Color palette conditioning — specify a `colour_palette` array of hex
			`colors in the style description to steer the image's color scheme.`
			`- Aspect ratio flexibility — Ideogram 4 supports a wide range of aspect`
			`ratios (any multiple-of-16 resolution from 256 to 2048 on each side). This`
			`is a key advantage for practical use: portraits, landscapes, banners,`
			`phone wallpapers, social media formats, etc.`
			- Bounding-box layout — specify `bbox` coordinates in the prompt to
			`explicitly place subjects, text elements, and background regions.`
			- Compositional control — use `compositional_deconstruction` with bounding
			`boxes and per-element descriptions for precise spatial layout.`


			`Why JSON-only training? We train exclusively on JSON so that training`
			`and inference share a single, common prompt format. The training captions themselves are deliberately`
			`extremely descriptive: each JSON exhaustively describes everything in`
			`the image to maximize training efficiency. The more`
			`text-to-image relationships each caption pins down, the more grounded`
			`supervision the model extracts from a single training pair, rather than`
			`having to infer those relationships across many sparsely-captioned samples.`

			`Why JSON at inference time? Because the model was trained on captions`
			`that name every object explicitly, the most reliable way to get every`
			`requested object rendered is to mirror that pattern. Plain-text prompts still work, but`
			`won't perform as well since the model was only trained on structured JSON captions.`

			`Don't want to write JSON by hand? That's what magic prompt is for: it uses`
			`an LLM to expand a plain-text prompt into a full structured caption before`
			`generation, so you get JSON-quality results from a casual prompt. It runs by`
			default in `run_inference.py` (see the [CLI](#cli) section).

			`See [docs/prompting.md](docs/prompting.md) for a full guide.`

			`## Documentation`

			`\| Document \| Description \|`
			`\| :------- \| :---------- \|`
			`\| [docs/prompting.md](docs/prompting.md) \| How to write JSON prompts, color palette conditioning, aspect ratios \|`
			`\| [docs/inference.md](docs/inference.md) \| Sampler presets, parameter reference, resolutions, optimization tips \|`
			`\| [docs/model_architecture.md](docs/model_architecture.md) \| Architecture diagram, DiT spec, component details \|`
			`\| [docs/pipeline.md](docs/pipeline.md) \| Conceptual pipeline walkthrough — how all components fit together \|`
			`\| [docs/development.md](docs/development.md) \| Dev setup, pre-commit hooks, contributing \|`
			`\| [docs/safety.md](docs/safety.md) \| Pre-training, post-training, and inference-time safety mitigations; how to report violations \|`

			`## Citation`

			`If you find the provided code or models useful for your research, consider citing them as:`


			```bibtex
			`@misc{ideogram-4-2026,`
			`author={Ideogram AI},`
			`title={{Ideogram 4}},`
			`year={2026},`
			`howpublished={\url{https://ideogram.ai/blog/ideogram-4.0/}},`
			`}`
			```

			`## We're Hiring!`

			`We're looking for Research Scientists and Research Engineers to`
			`work on next-generation generative models and the products built on top of`
			`them. Interested candidates please apply https://jobs.ashbyhq.com/ideogram`