# Image Generation
Some providers can produce images as part of an agent's reply. Generated images are always returned the same way, as a list of BinaryResult objects on reply.files, regardless of which provider produced them.
AG2 exposes two different mechanisms, because the providers expose two different APIs:
| Provider | Mechanism | How to enable |
|---|---|---|
| OpenAI | ImageGenerationTool (server-side tool, Responses API) | Add the tool to the agent |
| Gemini | IMAGE response modality on an image model | Set response_modalities=["TEXT", "IMAGE"] |
Note
OpenAI's ImageGenerationTool is not supported on Gemini, and Gemini's image modality is not available on OpenAI. Each provider's mechanism is described in its own section below.
## Reading generated images
Every generated image is a BinaryResult on reply.files. It carries the raw bytes and a metadata dict; the image's media type is stored under the media_type key.
reply.files is empty when the model returns only text, so it is safe to iterate even when no image was produced.
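The reading pattern described above can be sketched without a live provider. This is a minimal sketch, not AG2's own code: `FakeBinaryResult` is a stand-in for BinaryResult, the `metadata` attribute name is an assumption (the `data` attribute and the `media_type` key come from this page), and `save_images` is a hypothetical helper.

```python
from dataclasses import dataclass, field
from pathlib import Path

# Small explicit mapping so file extensions are predictable.
EXTENSIONS = {"image/png": ".png", "image/jpeg": ".jpg", "image/webp": ".webp"}


@dataclass
class FakeBinaryResult:
    """Stand-in for BinaryResult; the attribute names are assumptions."""
    data: bytes
    metadata: dict = field(default_factory=dict)


def save_images(files, out_dir):
    """Write each image in `files` to `out_dir` and return the paths written."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    written = []
    for index, item in enumerate(files):
        media_type = item.metadata.get("media_type", "")
        # Unknown media types fall back to a generic extension.
        extension = EXTENSIONS.get(media_type, ".bin")
        path = out / f"image_{index}{extension}"
        path.write_bytes(item.data)
        written.append(path)
    return written
```

Because the loop body never runs on an empty list, calling `save_images(reply.files, "out")` on a text-only reply would simply return an empty list.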
## OpenAI
Add ImageGenerationTool to an agent configured with the Responses API (OpenAIResponsesConfig). The model decides when to call the tool, and the generated image is appended to reply.files.
| Parameter | Description |
|---|---|
| quality | "low", "medium", "high", or "auto" |
| size | e.g. "1024x1024", "1536x1024", or "auto" |
| background | "transparent", "opaque", or "auto" |
| output_format | "png", "jpeg", or "webp" |
| output_compression | 0–100, for jpeg/webp only |
| partial_images | 1–3, number of partial images to stream |
Warning
ImageGenerationTool requires the Responses API. Using it with the Chat Completions API (OpenAIConfig) raises an UnsupportedToolError.
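The parameter constraints in the table above can be checked before the tool is constructed. The following is a minimal sketch, assuming the options are collected in a plain dict first; `validate_image_options` is a hypothetical helper and not part of AG2 or the OpenAI SDK. `size` is not checked because the table lists only example values.

```python
# Allowed values copied from the parameter table.
ALLOWED = {
    "quality": {"low", "medium", "high", "auto"},
    "background": {"transparent", "opaque", "auto"},
    "output_format": {"png", "jpeg", "webp"},
}


def validate_image_options(options):
    """Return a list of problems found in `options`; an empty list means valid."""
    problems = []
    for name, allowed in ALLOWED.items():
        if name in options and options[name] not in allowed:
            problems.append(f"{name}={options[name]!r} is not one of {sorted(allowed)}")

    compression = options.get("output_compression")
    if compression is not None:
        if not 0 <= compression <= 100:
            problems.append("output_compression must be between 0 and 100")
        # Conservative: compression is flagged unless jpeg or webp is set explicitly.
        if options.get("output_format") not in {"jpeg", "webp"}:
            problems.append("output_compression applies only to jpeg or webp output")

    partial = options.get("partial_images")
    if partial is not None and not 1 <= partial <= 3:
        problems.append("partial_images must be between 1 and 3")
    return problems
```

Returning a list of messages rather than raising on the first failure lets a caller report every problem at once.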
## Gemini
Gemini does not use a tool for image generation. Instead, you select an image-capable model and request the IMAGE response modality via response_modalities. The model returns the image inline and AG2 surfaces it on reply.files.
Note
Image output requires an image-capable Gemini model (for example gemini-3.1-flash-image). Requesting the IMAGE modality on a text-only model returns no image. Include "TEXT" alongside "IMAGE" so the model can still return any accompanying text in reply.body.
response_modalities is also available on VertexAIConfig for Gemini models served through Vertex AI.
### Controlling size and aspect ratio
Unlike OpenAI, Gemini does not take a pixel size string. Instead, pass a types.ImageConfig through image_config to set the aspect ratio and a resolution tier:
| Field | Values |
|---|---|
| aspect_ratio | e.g. "1:1", "4:3", "3:4", "16:9", "9:16", "21:9" |
| image_size | resolution tier: "1K", "2K" (higher tiers are model-dependent) |
image_config is a full passthrough of the SDK's types.ImageConfig, so any other field it supports (such as person_generation) is available too. It is also accepted on VertexAIConfig.
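When code already works with OpenAI-style pixel sizes, one way to bridge the two conventions is to map a width and height to the nearest supported aspect ratio. This is a minimal sketch: `closest_aspect_ratio` is a hypothetical helper, and the candidate list is taken from the examples in the table above, which may not be exhaustive for every model.

```python
import math

# Candidate ratios copied from the examples in the table (an assumption,
# not a complete list of what a given model accepts).
SUPPORTED_RATIOS = ("1:1", "4:3", "3:4", "16:9", "9:16", "21:9")


def closest_aspect_ratio(width, height, candidates=SUPPORTED_RATIOS):
    """Return the candidate "W:H" string closest to width/height."""
    target = math.log(width / height)

    def distance(ratio):
        w, h = ratio.split(":")
        # Compare in log space so 2:1 and 1:2 are equally far from 1:1.
        return abs(math.log(int(w) / int(h)) - target)

    return min(candidates, key=distance)
```

The returned string could then be supplied as the aspect_ratio field, with the resolution tier chosen separately through image_size.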
## Editing an existing image
To edit an image instead of generating one from scratch, pass it in as an ImageInput alongside your instruction. The edited image is returned on reply.files, exactly like a freshly generated one, so the same image can be sent back in for further rounds of editing.
ImageInput also accepts raw bytes, as in ImageInput(data=image.data, media_type="image/png"). This lets you feed a generated image straight back in for another edit without writing it to disk.
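The multi-round editing flow can be sketched as a loop that carries the image bytes from one round to the next. This is a sketch under stated assumptions: `edit_in_rounds` is a hypothetical helper, and `edit_once` is a hypothetical callable that you would write to wrap the agent call (building the ImageInput from bytes and reading the first entry of reply.files). Neither is part of AG2.

```python
def edit_in_rounds(image_bytes, instructions, edit_once):
    """Apply each instruction in turn, feeding every result into the next round.

    `edit_once(image_bytes, instruction)` is a hypothetical callable that
    returns the edited image's bytes, or None when no image came back.
    """
    current = image_bytes
    for instruction in instructions:
        edited = edit_once(current, instruction)
        if edited is None:
            # The reply contained text only; keep the last good image and stop.
            break
        current = edited
    return current
```

Keeping the bytes in memory between rounds mirrors the raw-bytes form of ImageInput described above, so nothing has to be written to disk until the final result is ready.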