OpenAI Responses API V2 Client - Complete Guide#

This notebook demonstrates the OpenAIResponsesV2Client which implements the new OpenAI Responses API with rich UnifiedResponse objects.

Key Features#

  • Stateful Conversations: Maintain conversation context via previous_response_id
  • Built-in Tools: Web search, image generation, apply_patch
  • Rich Content Blocks: TextContent, ReasoningContent, CitationContent, ImageContent, ToolCallContent
  • Multimodal Support: Send and receive images
  • Structured Output: Pydantic models and JSON schema support
  • Cost Tracking: Token and image generation cost tracking
  • Agent Integration: Works with AG2 agents for single, two-agent, and group chat

Requirements#

AG2 requires Python>=3.10. Install the required packages:

%pip install "ag2[openai]"

Setup#

Set your OpenAI API key as an environment variable or pass it directly to the client.

# Set your API key (or use environment variable OPENAI_API_KEY)
# import os
# os.environ["OPENAI_API_KEY"] = "sk-..."

1. Basic Usage#

The OpenAIResponsesV2Client returns rich UnifiedResponse objects with typed content blocks.

from autogen.llm_clients.openai_responses_v2 import OpenAIResponsesV2Client

# Create the V2 client
client = OpenAIResponsesV2Client()

# Make a simple request
response = client.create({
    "model": "gpt-5-nano",
    "messages": [
        {"role": "user", "content": "how are you? tell me about yourself? and what is a machine? in one line"}
    ],
})

# Access the response
print(f"Response ID: {response.id}")
print(f"Model: {response.model}")
print(f"Content: {response.messages[0].get_text()}")

Understanding UnifiedResponse Structure#

The UnifiedResponse contains rich, typed content blocks:

from autogen.llm_clients.models.content_blocks import (
    ReasoningContent,
    TextContent,
)

# Inspect the response structure
print(f"Number of messages: {len(response.messages)}")
print(f"Usage: {response.usage}")
print(f"Cost: ${response.cost:.6f}")

# Iterate through content blocks
for msg in response.messages:
    print(f"\nRole: {msg.role}")
    for block in msg.content:
        if isinstance(block, TextContent):
            text = block.text[:100] + "..." if len(block.text) > 100 else block.text
            print(f"  Text: {text}")
        elif isinstance(block, ReasoningContent):
            # Note that OpenAI may not return reasoning content with its API
            print(f"  Reasoning: {block.text[:100]}...")

2. Stateful Conversations#

The Responses API is stateful: it maintains conversation context server-side using previous_response_id.

# Create a new client for stateful conversation
stateful_client = OpenAIResponsesV2Client()

# First message
response1 = stateful_client.create({
    "model": "gpt-4.1",
    "messages": [{"role": "user", "content": "My name is Alice. Remember this."}],
})
print(f"Response 1: {response1.messages[0].get_text()}")
print(f"Response ID: {response1.id}")
# Second message - the client automatically tracks state
response2 = stateful_client.create({"model": "gpt-4.1", "messages": [{"role": "user", "content": "What is my name?"}]})
print(f"Response 2: {response2.messages[0].get_text()}")
print("\nThe model remembered the context from the previous turn!")
# Reset conversation state to start fresh
stateful_client.reset_conversation()

response3 = stateful_client.create({"model": "gpt-4.1", "messages": [{"role": "user", "content": "What is my name?"}]})
print(f"After reset: {response3.messages[0].get_text()}")
print("\nThe model no longer has context from previous conversation.")

Manual State Control#

You can also manually control the conversation state:

# Get current state
current_state = stateful_client._get_previous_response_id()
print(f"Current state: {current_state}")

# Start two independent conversations and capture their response IDs
response_a = client.create({"model": "gpt-4.1", "messages": [{"role": "user", "content": "Hello, my name is Alice"}]})

response_a_id = client._get_previous_response_id()

response_b = client.create({"model": "gpt-4.1", "messages": [{"role": "user", "content": "Hello, my name is Hatter"}]})

response_b_id = client._get_previous_response_id()

# Resume conversation A by setting its response ID
client._set_previous_response_id(response_a.id)
response_a1 = client.create({
    "model": "gpt-4.1",
    "messages": [{"role": "user", "content": "What's my name?"}],
})
print(f"Conversation A: {response_a1.messages[0].get_text()}")
response_a1_id = client._get_previous_response_id()

# Resume conversation B the same way
client._set_previous_response_id(response_b.id)
response_b1 = client.create({
    "model": "gpt-4.1",
    "messages": [{"role": "user", "content": "What's my name?"}],
})
print(f"Conversation B: {response_b1.messages[0].get_text()}")
response_b1_id = client._get_previous_response_id()

print("response_a1_id", response_a1_id)
print("response_b1_id", response_b1_id)

3. Multimodal Support#

Send images in your messages using various formats.

# Create a multimodal message with an image URL
multimodal_message = OpenAIResponsesV2Client.create_multimodal_message(
    text="What do you see in this image?",
    images=["https://images.unsplash.com/photo-1587300003388-59208cc962cb?w=400"],
    role="user",
)

print("Multimodal message structure:")
print(multimodal_message)
# Send multimodal request
mm_client = OpenAIResponsesV2Client()

response = mm_client.create({
    "model": "gpt-4.1",  # Use a vision-capable model
    "messages": [multimodal_message],
})

print(f"Image description: {response.messages[0].get_text()}")
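
Local image files can also be sent by encoding them as base64 data URIs, matching the "various formats" note above. The sketch below builds such a URI; the helper name build_data_uri is illustrative, and passing a data URI in the images list (in place of an HTTP URL) is an assumption based on the data-URI handling shown in the image generation section.

```python
import base64

def build_data_uri(image_bytes: bytes, mime_type: str = "image/png") -> str:
    """Encode raw image bytes as a base64 data URI."""
    encoded = base64.b64encode(image_bytes).decode("ascii")
    return f"data:{mime_type};base64,{encoded}"

# In practice, read real bytes: open("photo.png", "rb").read()
data_uri = build_data_uri(b"\x89PNG\r\n\x1a\n")
print(data_uri[:30])
```

The resulting string would then go in the images list passed to create_multimodal_message.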

4. Built-in Tools#

The Responses API provides built-in tools that don't require function definitions. Pass an array of tool names via built_in_tools: ["web_search", "image_generation", "apply_patch", "apply_patch_async", "shell_tool"].

  • web_search - Enables the model to search the web for real-time information and returns results with citations.
  • image_generation - Allows the model to generate images from text descriptions using DALL-E or GPT-Image models.
  • apply_patch - Enables file operations (create, update, delete files) in a workspace directory with path restrictions.
  • apply_patch_async - Same as apply_patch but executes file operations asynchronously for better performance.
  • shell - Executes shell commands with configurable sandboxing, command filtering, and security restrictions.

4.1 Web Search#

# Enable web search
search_client = OpenAIResponsesV2Client()

response = search_client.create({
    "model": "gpt-4.1",
    "messages": [{"role": "user", "content": "What is the latest news about AI?"}],
    "built_in_tools": ["web_search"],
})

print(f"Response: {response.messages[0].get_text()[:500]}...")
# Extract citations from the response
citations = OpenAIResponsesV2Client.get_citations(response)

print(f"\nFound {len(citations)} citations:")
for citation in citations[:5]:  # Show first 5
    print(f"  - {citation.title}: {citation.url}")

4.2 Image Generation#

import base64

from IPython.display import Image, display

# Enable image generation
image_client = OpenAIResponsesV2Client()

# Configure image output parameters
image_client.set_image_output_params(quality="high", size="1024x1024", output_format="png")

response = image_client.create({
    "model": "gpt-4.1",
    "messages": [{"role": "user", "content": "Generate an image of a tree with fruit"}],
    "built_in_tools": ["image_generation"],
})

# Extract generated images
images = OpenAIResponsesV2Client.get_generated_images(response)
print(f"Generated {len(images)} image(s)")

if images:
    # Get the data URI
    data_uri = images[0].data_uri

    # Extract base64 data (remove the "data:image/png;base64," prefix)
    if data_uri and data_uri.startswith("data:"):
        # Split on comma to get base64 data
        base64_data = data_uri.split(",", 1)[1]

        # Display the image
        display(Image(data=base64.b64decode(base64_data)))
# Check image generation costs
print(f"Image costs: ${image_client.get_image_costs():.4f}")
print(f"Total costs: ${image_client.get_total_costs():.4f}")
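
The apply_patch tool listed in the section intro is never demonstrated above; a request for it would presumably mirror the web_search example, with only the tool name changed. The payload below is a sketch under that assumption; how the workspace directory and path restrictions are configured is not shown here.

```python
# Request payload sketch for the apply_patch built-in tool.
# Mirrors the web_search example above; only the tool name differs.
apply_patch_request = {
    "model": "gpt-4.1",
    "messages": [
        {"role": "user", "content": "Create a file hello.txt containing 'hello world'."}
    ],
    "built_in_tools": ["apply_patch"],
}
print(apply_patch_request["built_in_tools"])
```

The dict would be passed to client.create(apply_patch_request) on an OpenAIResponsesV2Client instance.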

4.3 Structured Output#

from pydantic import BaseModel

from autogen.llm_clients.openai_responses_v2 import OpenAIResponsesV2Client

# Define a Pydantic model for structured output
class Person(BaseModel):
    name: str
    age: int
    occupation: str

# Request structured output
struct_client = OpenAIResponsesV2Client()

response = struct_client.create({
    "model": "gpt-4.1",
    "messages": [{"role": "user", "content": "Generate a fictional person's profile"}],
    "response_format": Person,
})

# Get the parsed object
parsed = OpenAIResponsesV2Client.get_parsed_object(response)

if parsed:
    print(parsed)
    print("---------------------------------")
    print(f"Name: {parsed.name}")
    print(f"Age: {parsed.age}")
    print(f"Occupation: {parsed.occupation}")
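
The feature list also mentions JSON schema support alongside Pydantic models. A plain-dict schema equivalent to the Person model above would look like this; whether the client accepts it as response_format in exactly this form is an assumption, so treat it as a sketch.

```python
# JSON schema equivalent of the Person Pydantic model above
person_schema = {
    "type": "object",
    "properties": {
        "name": {"type": "string"},
        "age": {"type": "integer"},
        "occupation": {"type": "string"},
    },
    "required": ["name", "age", "occupation"],
}
print(sorted(person_schema["properties"]))
```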

5. Cost Tracking#

The V2 client tracks both token costs and image generation costs.

cost_client = OpenAIResponsesV2Client()

# Make several requests
for i in range(3):
    response = cost_client.create({"model": "gpt-4.1", "messages": [{"role": "user", "content": f"Count to {i + 1}"}]})

    # Per-request cost
    usage = OpenAIResponsesV2Client.get_usage(response)
    print(f"Request {i + 1}: {usage['total_tokens']} tokens, ${usage['cost']:.6f}")
# Get cumulative usage
cumulative = cost_client.get_cumulative_usage()
print("\nCumulative Usage:")
print(f"  Total prompt tokens: {cumulative['prompt_tokens']}")
print(f"  Total completion tokens: {cumulative['completion_tokens']}")
print(f"  Total tokens: {cumulative['total_tokens']}")
print(f"  Token cost: ${cumulative['token_cost']:.6f}")
print(f"  Image cost: ${cumulative['image_cost']:.6f}")
print(f"  Total cost: ${cumulative['total_cost']:.6f}")
# Reset cost tracking
cost_client.reset_all_costs()
print(f"After reset: ${cost_client.get_total_costs():.6f}")

6. V1 Backward Compatibility#

For code that expects ChatCompletion format, use create_v1_compatible().

v2_client = OpenAIResponsesV2Client()

# Get ChatCompletion-like response
response = v2_client.create_v1_compatible({"model": "gpt-4.1", "messages": [{"role": "user", "content": "Hello!"}]})

# Access like standard ChatCompletion
print(f"Type: {type(response).__name__}")
print(f"Content: {response.choices[0].message.content}")
print(f"Tokens: {response.usage.total_tokens}")
print(f"Cost: ${response.cost:.6f}")

7. Agent Integration#

The V2 client integrates with AG2 agents for conversational AI workflows.

7.1 Single Agent#

from autogen import ConversableAgent

# Configure LLM with Responses API
config_list = [
    {
        "model": "gpt-5-nano",
        "api_type": "responses_v2",  # Use Responses API
    }
]

llm_config = {"config_list": config_list}

def math_tool(expression: str) -> str:
    """Evaluate a mathematical expression."""
    try:
        # Note: eval on arbitrary input is unsafe; demo purposes only
        result = eval(expression)
        return str(result)
    except Exception as e:
        return f"Error: {e}"

# Create a single assistant agent
assistant = ConversableAgent(
    name="assistant",
    llm_config=llm_config,
    system_message="You are a helpful AI assistant who can do math. Use the math_tool to do math.",
    functions=[math_tool],
)

assistant.register_for_execution()(math_tool)

# Start a conversation
result = assistant.run(
    message="use a tool to perform 2+2",
    max_turns=2,
)
result.process()

7.2 Two-Agent Chat#

# Create two specialized agents
researcher = ConversableAgent(
    name="researcher",
    llm_config=llm_config,
    system_message="""You are a research assistant. Your job is to:
    1. Analyze questions thoroughly
    2. Provide detailed, factual information
    3. Cite sources when possible""",
)

critic = ConversableAgent(
    name="critic",
    llm_config=llm_config,
    system_message="""You are a critical reviewer. Your job is to:
    1. Review the researcher's findings
    2. Point out any gaps or inaccuracies
    3. Suggest improvements
    Say 'TERMINATE' when the research is satisfactory.""",
)

# Two-agent collaboration
response = researcher.run(
    critic, message="Research the benefits and drawbacks of renewable energy sources.", max_turns=2
)
response.process()

7.3 Group Chat#

# Create multiple specialized agents for group chat
planner = ConversableAgent(
    name="planner",
    llm_config=llm_config,
    system_message="""You are a project planner. Break down tasks into actionable steps.
    Focus on creating clear, organized plans. Do not do any coding.""",
    is_termination_msg=lambda x: "TERMINATE" in x.get("content", ""),
)

developer = ConversableAgent(
    name="developer",
    llm_config=llm_config,
    system_message="""You are a software developer. Implement solutions based on the plan.
    Write clean, well-documented code. Do not create plans.""",
    is_termination_msg=lambda x: "TERMINATE" in x.get("content", ""),
)

reviewer = ConversableAgent(
    name="reviewer",
    llm_config=llm_config,
    system_message="""You are a code reviewer. Review implementations for:
    1. Correctness
    2. Best practices
    3. Potential improvements
    Say 'TERMINATE' when the solution is complete and reviewed.""",
    is_termination_msg=lambda x: "TERMINATE" in x.get("content", ""),
)
# Create group chat
from autogen.agentchat import run_group_chat
from autogen.agentchat.group.patterns import AutoPattern

pattern = AutoPattern(
    initial_agent=planner,
    agents=[planner, developer, reviewer],
    group_manager_args={"llm_config": llm_config},
)

response = run_group_chat(
    pattern=pattern,
    messages="Create a Python function that calculates the Fibonacci sequence up to n terms.",
    max_rounds=4,
)

response.process()

8. Advanced: Custom Function Tools#

Combine built-in tools with custom function tools.

# Define custom tools
def get_weather(city: str) -> str:
    """Get the current weather for a city."""
    # Mock implementation
    return f"The weather in {city} is sunny, 72°F"

def calculate(expression: str) -> str:
    """Evaluate a mathematical expression."""
    try:
        # Note: eval on arbitrary input is unsafe; demo purposes only
        result = eval(expression)
        return str(result)
    except Exception as e:
        return f"Error: {e}"

# Define tool schemas
tools = [
    {
        "type": "function",
        "function": {
            "name": "get_weather",
            "description": "Get current weather for a city",
            "parameters": {
                "type": "object",
                "properties": {"city": {"type": "string", "description": "City name"}},
                "required": ["city"],
            },
        },
    },
    {
        "type": "function",
        "function": {
            "name": "calculate",
            "description": "Evaluate a math expression",
            "parameters": {
                "type": "object",
                "properties": {"expression": {"type": "string", "description": "Math expression"}},
                "required": ["expression"],
            },
        },
    },
]

print("Tools defined successfully!")
from autogen.llm_clients.openai_responses_v2 import OpenAIResponsesV2Client, TextContent, ToolCallContent

# Use custom tools with the V2 client
tools_client = OpenAIResponsesV2Client()

response = tools_client.create({
    "model": "gpt-4o-mini",
    "messages": [{"role": "user", "content": "What's 25 * 4 + 10?"}],
    "tools": tools,
})

# Check for tool calls
for msg in response.messages:
    for block in msg.content:
        if isinstance(block, ToolCallContent):
            print(f"Tool call: {block.name}({block.arguments})")
        elif isinstance(block, TextContent):
            print(f"Text: {block.text}")
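
The loop above only detects tool calls; executing them and returning results is left to the caller. One way to do that is a name-to-function dispatcher, sketched below under the assumption that block.arguments arrives as a JSON string (simulated here with a stand-in function; in the notebook, the registry would map to the get_weather and calculate functions defined above).

```python
import json

def dispatch_tool_call(name: str, arguments: str, registry: dict) -> str:
    """Look up a tool by name and call it with JSON-decoded arguments."""
    func = registry.get(name)
    if func is None:
        return f"Error: unknown tool '{name}'"
    return func(**json.loads(arguments))

# Stand-in for the calculate() function defined earlier in this notebook
registry = {"calculate": lambda expression: str(eval(expression))}

# Simulated tool call, shaped like a ToolCallContent block
print(dispatch_tool_call("calculate", '{"expression": "25 * 4 + 10"}', registry))  # → 110
```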

Summary#

The OpenAIResponsesV2Client provides:

| Feature | Description |
|---|---|
| Stateful Conversations | Automatic context tracking via previous_response_id |
| Rich Content Blocks | TextContent, ReasoningContent, CitationContent, ImageContent, ToolCallContent |
| Built-in Tools | Web search, image generation, apply_patch |
| Multimodal Support | Send and receive images |
| Structured Output | Pydantic models and JSON schemas |
| Cost Tracking | Token and image generation cost tracking |
| V1 Compatibility | create_v1_compatible() for ChatCompletion format |
| Agent Integration | Works with AG2 single, two-agent, and group chat |

For more information, see the AG2 documentation.