# Prompting Guide

Ideogram 4 is trained exclusively on **structured JSON captions** (serialized as strings). While the
model can accept plain-text prompts, providing a JSON object that follows the
caption schema gives significantly better results, especially for
controllability, spatial layout, and style fidelity.

## Plain-text vs. JSON prompts

You can pass plain-text prompts directly to the model and it will work. The
sampling parameters come from a named preset in `ideogram4.PRESETS` (the same
ones `run_inference.py` exposes via `--sampler-preset`), unpacked into the
`pipe()` call:

```python
from ideogram4 import PRESETS

# `pipe` is assumed to be an already-constructed Ideogram 4 pipeline.
preset = PRESETS["V4_QUALITY_48"]
images = pipe(
    "a golden retriever on a skateboard",
    height=1024,
    width=1024,
    num_steps=preset.num_steps,
    guidance_schedule=preset.guidance_schedule,
    mu=preset.mu,
    std=preset.std,
)
```

For higher-quality generations and more control, pass a JSON string as the prompt:

```python
import json
from ideogram4 import PRESETS

caption = {
    "high_level_description": "A golden retriever riding a skateboard down a sunny sidewalk.",
    "style_description": {
        "aesthetics": "warm, playful, vibrant",
        "lighting": "bright afternoon sunlight, long soft shadows",
        "photo": "shallow depth of field, eye-level, 85mm lens",
        "medium": "photograph",
        "color_palette": ["#F5C542", "#87CEEB", "#4A4A4A", "#FFFFFF", "#2E8B57"]
    },
    "compositional_deconstruction": {
        "background": "A sun-drenched suburban sidewalk lined with green hedges and a white picket fence. Dappled light filters through overhead trees.",
        "elements": [
            {"type": "obj", "bbox": [200, 300, 800, 900], "desc": "A golden retriever with a fluffy coat, standing on a red skateboard with all four paws. Its tongue is out and ears are flapping in the wind."},
            {"type": "obj", "bbox": [250, 750, 750, 950], "desc": "A worn red skateboard with black wheels rolling along the concrete sidewalk."}
        ]
    }
}

# `pipe` is assumed to be an already-constructed Ideogram 4 pipeline.
preset = PRESETS["V4_QUALITY_48"]
images = pipe(
    # Compact separators and literal non-ASCII characters (see "Encoding" below).
    json.dumps(caption, separators=(",", ":"), ensure_ascii=False),
    height=1024,
    width=1024,
    num_steps=preset.num_steps,
    guidance_schedule=preset.guidance_schedule,
    mu=preset.mu,
    std=preset.std,
)
```

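The `bbox` values in the caption above are `[y_min, x_min, y_max, x_max]` on a normalized 0–1000 grid with the origin at the top-left. If your layout starts out in pixels, a conversion along these lines may help. This is a minimal sketch: `to_normalized_bbox` is a hypothetical helper, not part of the `ideogram4` package, and rounding to the nearest integer is an assumption.

```python
def to_normalized_bbox(left, top, right, bottom, image_width, image_height):
    """Convert a pixel-space box to [y_min, x_min, y_max, x_max] on a 0-1000 grid.

    Hypothetical helper for illustration; not part of ideogram4.
    """
    def scale(value, size):
        # Clamp to the image bounds, then map onto the 0-1000 range.
        clamped = min(max(value, 0), size)
        return round(clamped * 1000 / size)

    # Note the order: y comes before x in the caption schema.
    return [
        scale(top, image_height),
        scale(left, image_width),
        scale(bottom, image_height),
        scale(right, image_width),
    ]


# A box spanning x = 307..922 px and y = 205..819 px in a 1024x1024 image.
bbox = to_normalized_bbox(307, 205, 922, 819, image_width=1024, image_height=1024)
print(bbox)  # [200, 300, 800, 900]
```

Because the grid is normalized, the same `bbox` describes the same relative region at any output resolution.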
## Magic prompt

Writing these captions by hand is optional. *Magic prompt* uses an LLM to expand
a plain-text prompt into a full structured caption for you, so you get the
quality of a JSON prompt from a casual one. It is enabled by default in
`run_inference.py`; you can also call it directly:

```python
import os
from ideogram4 import ClaudeOpusMagicPromptV1, PRESETS

magic = ClaudeOpusMagicPromptV1(api_key=os.environ["MAGIC_PROMPT_API_KEY"])
caption = magic.expand("a golden retriever on a skateboard", aspect_ratio="1:1")

# `pipe` is assumed to be an already-constructed Ideogram 4 pipeline.
preset = PRESETS["V4_QUALITY_48"]
images = pipe(
    caption,
    height=1024,
    width=1024,
    num_steps=preset.num_steps,
    guidance_schedule=preset.guidance_schedule,
    mu=preset.mu,
    std=preset.std,
)
```

The package ships three configurations, registered by name in
`ideogram4.MAGIC_PROMPTS` (the keys `run_inference.py` accepts via
`--magic-prompt-model`):

| Config class | Registry key | Backend |
| :--- | :--- | :--- |
| `Ideogram4MagicPromptV1` | `ideogram-4-v1` | Ideogram's hosted magic-prompt API (free; reads `IDEOGRAM_API_KEY`) |
| `ClaudeOpusMagicPromptV1` | `claude-opus-v1` | [OpenRouter](https://openrouter.ai) (reads `MAGIC_PROMPT_API_KEY`) |
| `ClaudeSonnetMagicPromptV1` | `claude-sonnet-v1` | [OpenRouter](https://openrouter.ai) (reads `MAGIC_PROMPT_API_KEY`) |

`ideogram-4-v1` is the default and is **free**. It runs the expansion
server-side, so there is no local model or system prompt involved; it just needs
an Ideogram API key (get one at
[developer.ideogram.ai](https://developer.ideogram.ai)). The `claude-*`
configurations instead send one of our open-source system prompts to an OpenRouter model;
select one with `--magic-prompt-model` and export `MAGIC_PROMPT_API_KEY`:

```bash
python run_inference.py \
  --prompt "an isometric illustration of a tiny city floating in the clouds" \
  --output out.png \
  --quantization "nf4" \
  --magic-prompt-model claude-opus-v1 \
  --magic-prompt-key "$MAGIC_PROMPT_API_KEY"
```

See the README's [CLI](../README.md#cli) section for the rest of the flags.

Our magic-prompt system prompts are **open source** (they ship in
`src/ideogram4/magic_prompt_system_prompts/`), so you're also welcome to
construct the caption with any system prompt and LLM of your choosing.

**A few caveats:**

- At Ideogram we've tested this magic prompt with **Claude Opus**. You're welcome
  to implement your own `MagicPrompt` configurations and/or drive a different LLM
  with our system prompt, but those paths aren't tested by us and quality may
  vary.
- The magic prompt shipped here is **not** the same magic prompt used in
  production at [Ideogram.ai](https://ideogram.ai), so results will differ from the
  hosted product (including the `ideogram-4-v1` API).

## JSON caption schema

> **Note:** Following this schema is **not required**: the model accepts any
> string as a prompt. The schema below describes the exact structure the model
> was trained on, and matching it minimizes train/eval mismatch so the model
> generates closer to its full quality. Treat the "required" / "must" language
> in the rest of this section as the format the [`CaptionVerifier`](../src/ideogram4/caption_verifier.py)
> checks against, not as a hard pipeline constraint. Deviating from the schema
> is allowed; it just means you're sampling outside the training distribution.

The full caption schema has three top-level fields:

1. `high_level_description`: optional string, but strongly recommended.
2. `style_description`: optional object.
3. `compositional_deconstruction`: **required** object.

`compositional_deconstruction` must always be present. Within it, both
`background` and `elements` are required.

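As a rough illustration of the top-level rules above, the sketch below checks a serialized caption for unknown top-level keys and for the required `compositional_deconstruction`, `background`, and `elements` fields. `check_top_level` is a hypothetical helper written for this guide; it is not the packaged `CaptionVerifier` and makes no claim to match its behavior.

```python
import json

TOP_LEVEL_KEYS = (
    "high_level_description",
    "style_description",
    "compositional_deconstruction",
)


def check_top_level(caption_json: str) -> list[str]:
    """Return a list of problems; an empty list means the top level looks fine.

    Hypothetical helper for illustration; not part of ideogram4.
    """
    caption = json.loads(caption_json)
    problems = []

    unknown = [key for key in caption if key not in TOP_LEVEL_KEYS]
    if unknown:
        problems.append(f"unknown top-level keys: {unknown}")

    composition = caption.get("compositional_deconstruction")
    if not isinstance(composition, dict):
        problems.append("missing compositional_deconstruction object")
    else:
        # Both nested fields are required by the schema.
        for key in ("background", "elements"):
            if key not in composition:
                problems.append(f"compositional_deconstruction is missing {key!r}")
    return problems


minimal = json.dumps(
    {
        "compositional_deconstruction": {
            "background": "A plain white wall.",
            "elements": [{"type": "obj", "desc": "A red chair."}],
        }
    }
)
print(check_top_level(minimal))
print(check_top_level('{"high_level_description": "A red chair."}'))
```

The first call reports no problems; the second reports the missing `compositional_deconstruction` object.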
### `high_level_description`

A one- or two-sentence summary of the entire image. Strongly recommended in every prompt.

```json
"high_level_description": "A medium-shot photograph of a barista pouring latte art in a cozy cafe."
```

### `style_description`

Controls the visual style, lighting, medium, and color palette.

`style_description` must contain **exactly one** of:

- `photo`: for photographic captions (paired with `medium: "photograph"`).
- `art_style`: for non-photographic captions (illustration, painting, 3D render, etc.).

`aesthetics`, `lighting`, and `medium` are also required when `style_description` is present. `color_palette` is optional.

**Key order is strict** and depends on which of `photo` / `art_style` is used:

| Caption type | Required key order |
| :----------- | :----------------- |
| Photo (uses `photo`) | `aesthetics`, `lighting`, `photo`, `medium`, `color_palette` |
| Non-photo (uses `art_style`) | `aesthetics`, `lighting`, `medium`, `art_style`, `color_palette` |

`color_palette` is the only field in this list that may be omitted; if it is included it must remain in the final position.

Field descriptions:

| Field | Type | Description |
| :---- | :--- | :---------- |
| `aesthetics` | string | Aesthetic keywords (e.g. "moody, cinematic, desaturated") |
| `lighting` | string | Lighting description (e.g. "golden hour, rim light, dramatic shadows") |
| `photo` | string | Camera/lens details for photographic outputs (e.g. "35mm, f/1.4, bokeh"). Use this OR `art_style`, not both. |
| `medium` | string | Medium type: `"photograph"`, `"illustration"`, `"3d_render"`, `"painting"`, `"graphic_design"`, etc. |
| `art_style` | string | Art style description for non-photo captions (e.g. "flat vector illustration, bold outlines"). Use this OR `photo`, not both. |
| `color_palette` | list[str] | Hex color codes that steer the image's dominant colors. Up to 16 entries. |

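Since the key order differs between photo and non-photo captions, it can be convenient to build `style_description` through a small function instead of writing the dict by hand. The sketch below relies on Python dicts preserving insertion order, which `json.dumps` follows when `sort_keys` is left off. `build_style_description` is a hypothetical helper, not part of the `ideogram4` package.

```python
def build_style_description(aesthetics, lighting, medium, *,
                            photo=None, art_style=None, color_palette=None):
    """Build a style_description dict in the key order given in the table above.

    Hypothetical helper for illustration; not part of ideogram4.
    """
    # Exactly one of photo / art_style must be supplied.
    if (photo is None) == (art_style is None):
        raise ValueError("provide exactly one of photo or art_style")

    style = {"aesthetics": aesthetics, "lighting": lighting}
    if photo is not None:
        # Photo captions: photo comes before medium.
        style["photo"] = photo
        style["medium"] = medium
    else:
        # Non-photo captions: medium comes before art_style.
        style["medium"] = medium
        style["art_style"] = art_style

    if color_palette is not None:
        if len(color_palette) > 16:
            raise ValueError("style-level color_palette allows up to 16 entries")
        # color_palette, when present, stays in the final position.
        style["color_palette"] = list(color_palette)
    return style


photo_style = build_style_description(
    "moody, cinematic", "low-key, deep shadows", "photograph",
    photo="35mm, f/1.4", color_palette=["#1B1B2F", "#E43F5A"],
)
print(list(photo_style))
# ['aesthetics', 'lighting', 'photo', 'medium', 'color_palette']
```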
### `compositional_deconstruction`

Provides fine-grained spatial control over the image layout using bounding
boxes and per-element descriptions. Both fields below are required.

| Field | Type | Description |
| :---- | :--- | :---------- |
| `background` | string | Description of the background/environment (required) |
| `elements` | list[dict] | List of elements with optional bounding boxes (required) |

`background` must come before `elements`.

Each element in `elements` must follow a fixed **key order** depending on its
type. `bbox` and `color_palette` are optional within an element; if present they
must appear in the positions shown below.

| Type | Required key order |
| :--- | :----------------- |
| `"obj"` | `type`, `bbox`, `desc`, `color_palette` |
| `"text"` | `type`, `bbox`, `text`, `desc`, `color_palette` |

Field descriptions:

| Field | Type | Description |
| :---- | :--- | :---------- |
| `type` | string | `"obj"` for objects/subjects, `"text"` for in-image text |
| `bbox` | list[int] | `[y_min, x_min, y_max, x_max]` in normalized `0–1000` coordinates (origin at top-left). Optional. |
| `desc` | string | Detailed description of the element |
| `text` | string | (only for `type: "text"`) The literal text to render |
| `color_palette` | list[str] | Optional per-element palette. Up to 5 hex entries. |

**Key ordering matters.** The model was trained on JSON with a consistent key
order, so maintaining it improves generation quality. The pipeline runs
[`CaptionVerifier`](../src/ideogram4/caption_verifier.py) on every prompt and emits
warnings for unknown keys, missing required keys, or out-of-order keys.

**Hex color format.** Colors in `color_palette` must be uppercase
`#RRGGBB` strings (e.g. `#1B1B2F`, not `#1b1b2f` or `#fff`).

**Encoding.** When serializing with Python's `json` module, pass
`separators=(",", ":")` and `ensure_ascii=False`. The module's defaults pad the
separators with spaces and escape non-ASCII characters as `\uXXXX`.
`CaptionVerifier` warns when it detects `\uXXXX` escapes with no literal
non-ASCII characters in the raw text.

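To see what the encoding settings change, the short comparison below serializes the same caption with the `json` module's defaults and with the recommended arguments. Only the standard library is used; the caption content is made up for the example.

```python
import json

caption = {
    "compositional_deconstruction": {
        "background": "A café terrace at dusk.",
        "elements": [
            {"type": "text", "text": "CAFÉ", "desc": "White serif sign above the door."}
        ],
    }
}

# Defaults: ", " and ": " separators, non-ASCII escaped as \uXXXX.
default = json.dumps(caption)

# Recommended: compact separators, non-ASCII characters kept literal.
compact = json.dumps(caption, separators=(",", ":"), ensure_ascii=False)

print(default)
print(compact)
print(len(default), len(compact))
```

Both strings decode to the same object; the compact form is shorter and keeps characters such as `É` as literal text instead of `\u00c9`.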
## Color palette conditioning

One of Ideogram 4's distinctive features is **color palette control**. By
providing a `color_palette` array of hex colors in `style_description`, you
can steer the dominant colors of the generated image.

```json
"style_description": {
  "aesthetics": "moody, cinematic",
  "lighting": "low-key, deep shadows",
  "photo": "35mm, f/1.4",
  "medium": "photograph",
  "color_palette": ["#1B1B2F", "#162447", "#1F4068", "#E43F5A", "#F5F5F5"]
}
```

Tips for effective color palette use:

- **Up to 16 colors** in `style_description.color_palette` for the overall
  image palette, and **up to 5 colors** per element in
  `compositional_deconstruction.elements[*].color_palette`.
- **Include background colors.** If you want a dark background, include the
  dark hex in the palette.
- **Contrast pairs.** Include both your highlight and shadow colors for more
  controlled lighting.
- **Uppercase hex only.** Use the `#RRGGBB` form, no shorthand.

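Palettes copied from design tools often arrive in lowercase or `#RGB` shorthand. The sketch below normalizes them to the uppercase `#RRGGBB` form and enforces the entry limit. `normalize_hex` and `normalize_palette` are hypothetical helpers, not part of the `ideogram4` package, and expanding shorthand by doubling each digit follows the usual CSS convention.

```python
import re

_HEX6 = re.compile(r"#[0-9A-Fa-f]{6}")
_HEX3 = re.compile(r"#[0-9A-Fa-f]{3}")


def normalize_hex(color: str) -> str:
    """Return the color as an uppercase #RRGGBB string.

    Hypothetical helper for illustration; not part of ideogram4.
    """
    color = color.strip()
    if _HEX3.fullmatch(color):
        # Expand CSS-style shorthand: #abc -> #aabbcc.
        color = "#" + "".join(ch * 2 for ch in color[1:])
    if not _HEX6.fullmatch(color):
        raise ValueError(f"not a hex color: {color!r}")
    return color.upper()


def normalize_palette(colors, max_colors=16):
    """Normalize every entry; use max_colors=5 for per-element palettes."""
    palette = [normalize_hex(c) for c in colors]
    if len(palette) > max_colors:
        raise ValueError(f"palette has {len(palette)} entries, limit is {max_colors}")
    return palette


print(normalize_palette(["#1b1b2f", "#fff", "#E43F5A"]))
# ['#1B1B2F', '#FFFFFF', '#E43F5A']
```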
### Example: warm sunset palette

```json
{
  "high_level_description": "A lone sailboat on calm water at sunset.",
  "style_description": {
    "aesthetics": "serene, warm, golden hour",
    "lighting": "golden hour backlighting, warm atmospheric haze",
    "photo": "wide angle, f/8, long exposure",
    "medium": "photograph",
    "color_palette": ["#FF6B35", "#F7C59F", "#004E89", "#1A659E", "#2B2D42"]
  },
  "compositional_deconstruction": {
    "background": "A calm ocean stretching to a low horizon, sky washed in orange and pink with thin wisps of cloud.",
    "elements": [
      {"type": "obj", "desc": "A single sailboat with a white triangular sail, silhouetted against the setting sun."}
    ]
  }
}
```

### Example: corporate design palette

```json
{
  "high_level_description": "A clean, modern business card layout for a tech company.",
  "style_description": {
    "aesthetics": "minimal, professional, geometric",
    "lighting": "even, diffuse studio lighting",
    "medium": "graphic_design",
    "art_style": "flat vector design, generous whitespace, sans-serif typography",
    "color_palette": ["#FFFFFF", "#F0F0F0", "#333333", "#0066FF", "#00CC88"]
  },
  "compositional_deconstruction": {
    "background": "A solid off-white card surface with subtle paper texture.",
    "elements": [
      {"type": "text", "text": "ACME TECH", "desc": "Bold dark grey sans-serif company name across the upper third of the card."},
      {"type": "text", "text": "hello@acme.tech", "desc": "Small blue sans-serif contact email near the bottom of the card."}
    ]
  }
}
```

## Full example

```json
{
  "high_level_description": "A medium-shot photograph of Formula 1 driver Max Verstappen wearing his Red Bull Racing racing suit and cap, smiling as he holds his racing helmet and talks to a man in a white shirt and black vest at a race track.",
  "style_description": {
    "aesthetics": "saturated primary colors, rule of thirds, joyful and triumphant",
    "lighting": "overcast daylight, diffused, soft subtle shadows",
    "photo": "shallow depth of field, sharp focus, eye-level, telephoto",
    "medium": "photograph"
  },
  "compositional_deconstruction": {
    "background": "The background is an out-of-focus racing paddock or track environment. Several blurred figures are visible, including one in an orange shirt. A purple and white structure with a red 'F1' logo stands on the left. The scene is outdoors with daylight, though the sky is not visible.",
    "elements": [
      {"type": "obj", "bbox": [55, 642, 1000, 937], "desc": "An older man standing in profile, facing left toward Max Verstappen. He has grey hair and fair skin. He is wearing a white long-sleeved button-down shirt with a navy blue quilted vest over it. He has a slight smile."},
      {"type": "obj", "bbox": [34, 137, 1000, 617], "desc": "Max Verstappen, a fair-skinned male Formula 1 driver, positioned in the center. He is facing forward with a joyful expression and a slight smile. He wears a navy blue Red Bull Racing team uniform with numerous sponsor logos and a matching baseball cap with the number '1'. He is holding a white and red racing helmet in his hands. He has a silver watch on his left wrist."},
      {"type": "obj", "bbox": [422, 212, 792, 452], "desc": "Max Verstappen's racing helmet, held in front of his chest. It features a white, red, and yellow design with the Red Bull logo and the 'Player 0.0' branding. The visor is clear and open."},
      {"type": "text", "bbox": [657, 0, 755, 142], "text": "F1", "desc": "Large, stylized red logo on a black and purple background in the lower left."},
      {"type": "text", "bbox": [768, 0, 818, 147], "text": "Formula 1\nWorld Championship™", "desc": "Small white sans-serif text below the F1 logo on the left side."},
      {"type": "text", "bbox": [78, 447, 117, 510], "text": "ORACLE\nRed Bull\nRacing", "desc": "Very small white and orange logo on the front of the navy blue cap."},
      {"type": "text", "bbox": [78, 417, 120, 440], "text": "1", "desc": "Bold red numeral '1' on the front left side of the navy blue cap."},
      {"type": "text", "bbox": [332, 442, 363, 483], "text": "Red Bull", "desc": "Small yellow and red text logo on the collar of the uniform."},
      {"type": "text", "bbox": [373, 490, 423, 532], "text": "RAUCH", "desc": "Small yellow and blue logo on the right chest of the uniform."},
      {"type": "text", "bbox": [422, 473, 500, 532], "text": "BYBIT\nHONDA", "desc": "Medium-sized white sans-serif text on the right chest of the uniform."},
      {"type": "text", "bbox": [410, 203, 442, 257], "text": "RAUCH", "desc": "Small yellow logo on the left upper arm of the uniform."},
      {"type": "text", "bbox": [530, 448, 627, 510], "text": "Red Bull", "desc": "Medium red text logo on the right side of the torso, part of the Red Bull graphic."},
      {"type": "text", "bbox": [680, 417, 768, 523], "text": "Red Bull", "desc": "Large red text logo across the lower torso of the uniform."},
      {"type": "text", "bbox": [797, 475, 815, 518], "text": "MAX", "desc": "Small white text next to a Dutch flag on the belt area of the uniform."},
      {"type": "text", "bbox": [558, 317, 715, 355], "text": "Player 0.0", "desc": "Black sans-serif text on a white band on the racing helmet."},
      {"type": "text", "bbox": [560, 800, 582, 835], "text": "IA.COM", "desc": "Small blue sans-serif text on the right sleeve of the white shirt."},
      {"type": "text", "bbox": [968, 8, 997, 332], "text": "© Anadolu Agency via Getty Images", "desc": "Small white watermark text in the bottom left corner."}
    ]
  }
}
```

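Long element lists like the one above are easier to keep in the trained key order when each entry is produced by a function. The sketch below is one possible way to do that; `make_element` is a hypothetical helper, not part of the `ideogram4` package, and it only enforces the rules stated in this guide (key order, `text` reserved for `"text"` elements, four-value `bbox`, at most 5 palette entries).

```python
def make_element(type_, desc, *, bbox=None, text=None, color_palette=None):
    """Build one `elements` entry with keys in the documented order.

    Hypothetical helper for illustration; not part of ideogram4.
    """
    if type_ not in ("obj", "text"):
        raise ValueError(f"unknown element type: {type_!r}")
    # `text` is required for "text" elements and not used for "obj" elements.
    if (type_ == "text") != (text is not None):
        raise ValueError("pass `text` for 'text' elements only")

    element = {"type": type_}
    if bbox is not None:
        if len(bbox) != 4:
            raise ValueError("bbox must be [y_min, x_min, y_max, x_max]")
        element["bbox"] = list(bbox)
    if text is not None:
        element["text"] = text
    element["desc"] = desc
    if color_palette is not None:
        if len(color_palette) > 5:
            raise ValueError("per-element color_palette allows up to 5 entries")
        element["color_palette"] = list(color_palette)
    return element


logo = make_element(
    "text",
    "Large, stylized red logo on a black and purple background in the lower left.",
    bbox=[657, 0, 755, 142],
    text="F1",
)
print(list(logo))  # ['type', 'bbox', 'text', 'desc']
```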
## Safety filter

NSFW prompts are blocked. Instead of an image, the model returns a gray screen
with the text "Image blocked by safety filter". The filter's false-positive rate
is higher for prompts that are not JSON-formatted. We are aware of this issue and
may release an updated checkpoint to improve it.

# Congratulations!

You are now a certified Ideogram 4 prompter!

With structured JSON captions, you have fine-grained control over composition,
color palettes, typography, and spatial layout: capabilities that go far
beyond what plain-text prompts can express!

We'd love to see what you create :-)
Share your results, experiments, and creative discoveries with the community,
especially the unexpected ones. Tag us on social media or open a discussion on
the repo. Happy generating!