`OpenAICompletionsClient(api_key=None, base_url=None, timeout=60.0, **kwargs)`
Bases: ModelClient
OpenAI Chat Completions API client implementing the ModelClientV2 protocol.
This client works with OpenAI's Chat Completions API (`client.chat.completions.create`), which returns structured output with reasoning blocks (o1/o3 models), tool calls, and more.
Key Features:

- Preserves reasoning blocks as `ReasoningContent` (o1/o3 models)
- Handles tool calls and results
- Supports multimodal content
- Provides backward compatibility via `create_v1_compatible()`
Example

```python
client = OpenAICompletionsClient(api_key="...")

# Get rich response with reasoning
response = client.create({
    "model": "o1-preview",
    "messages": [{"role": "user", "content": "Explain quantum computing"}],
})

# Access reasoning blocks
for reasoning in response.reasoning:
    print(f"Reasoning: {reasoning.reasoning}")

# Get text response
print(f"Answer: {response.text}")
```
Initialize OpenAI Chat Completions API client.
| PARAMETER | DESCRIPTION |
| --- | --- |
| `api_key` | OpenAI API key (or set the `OPENAI_API_KEY` env var)<br>TYPE: `str \| None` DEFAULT: `None` |
| `base_url` | Custom base URL for the OpenAI API<br>TYPE: `str \| None` DEFAULT: `None` |
| `timeout` | Request timeout in seconds<br>TYPE: `float` DEFAULT: `60.0` |
| `**kwargs` | Additional arguments passed to the underlying OpenAI client<br>TYPE: `Any` DEFAULT: `{}` |
Source code in autogen/llm_clients/openai_completions_client.py
```python
def __init__(
    self,
    api_key: str | None = None,
    base_url: str | None = None,
    timeout: float = 60.0,
    **kwargs: Any,
):
    """
    Initialize OpenAI Chat Completions API client.

    Args:
        api_key: OpenAI API key (or set OPENAI_API_KEY env var)
        base_url: Custom base URL for OpenAI API
        timeout: Request timeout in seconds
        **kwargs: Additional arguments passed to OpenAI client
    """
    if openai_import_exception is not None:
        raise openai_import_exception

    self.client = OpenAI(api_key=api_key, base_url=base_url, timeout=timeout, **kwargs)  # type: ignore[misc]

    self._cost_per_token = {
        # GPT-5 series - Latest flagship models (per million tokens)
        "gpt-5": {"prompt": 1.25 / 1_000_000, "completion": 10.00 / 1_000_000},
        "gpt-5-mini": {"prompt": 0.25 / 1_000_000, "completion": 2.00 / 1_000_000},
        "gpt-5-nano": {"prompt": 0.05 / 1_000_000, "completion": 0.40 / 1_000_000},
        # GPT-4o series - Multimodal flagship (per million tokens)
        "gpt-4o": {"prompt": 2.50 / 1_000_000, "completion": 10.00 / 1_000_000},
        "gpt-4o-mini": {"prompt": 0.15 / 1_000_000, "completion": 0.60 / 1_000_000},
        # GPT-4 Turbo (per million tokens)
        "gpt-4-turbo": {"prompt": 10.00 / 1_000_000, "completion": 30.00 / 1_000_000},
        # GPT-4 legacy (per million tokens)
        "gpt-4": {"prompt": 10.00 / 1_000_000, "completion": 30.00 / 1_000_000},
        # GPT-3.5 Turbo (per million tokens)
        "gpt-3.5-turbo": {"prompt": 0.50 / 1_000_000, "completion": 1.50 / 1_000_000},
        # o1 series - Reasoning models (keep existing if still valid)
        "o1-preview": {"prompt": 0.015 / 1000, "completion": 0.060 / 1000},
        "o1-mini": {"prompt": 0.003 / 1000, "completion": 0.012 / 1000},
        "o3-mini": {"prompt": 0.003 / 1000, "completion": 0.012 / 1000},
    }
```
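As a usage sketch, extra keyword arguments flow straight through to the underlying OpenAI SDK client; `max_retries` is a real OpenAI SDK option, while the proxy URL below is a placeholder:

```python
# Point the client at an OpenAI-compatible endpoint and tune SDK behavior.
client = OpenAICompletionsClient(
    api_key="...",
    base_url="https://my-proxy.example.com/v1",  # placeholder URL, not a real endpoint
    timeout=30.0,
    max_retries=5,  # forwarded via **kwargs to the OpenAI SDK client
)
```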
RESPONSE_USAGE_KEYS (class attribute)

`RESPONSE_USAGE_KEYS = ['prompt_tokens', 'completion_tokens', 'total_tokens', 'cost', 'model']`

client (instance attribute)

`client = OpenAI(api_key=api_key, base_url=base_url, timeout=timeout, **kwargs)`
ModelClientResponseProtocol

Bases: Protocol

The minimal response shape expected by AG2's `ModelClient` protocol: the response exposes `choices`; each `Choice` exposes a `message`; each `Message` exposes `content`.

- `choices` (instance attribute)
- `Choice` (Bases: Protocol)
  - `message` (instance attribute)
  - `Message` (Bases: Protocol)
    - `content` (instance attribute)
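For orientation, a minimal sketch of how this nested protocol shape is conventionally declared. The nesting mirrors the entries above; the concrete field types (`Optional[str]`, `list`) are assumptions:

```python
from typing import Optional, Protocol


class ModelClientResponseProtocol(Protocol):
    class Choice(Protocol):
        class Message(Protocol):
            content: Optional[str]  # text content; exact type is an assumption

        message: "ModelClientResponseProtocol.Choice.Message"

    choices: list["ModelClientResponseProtocol.Choice"]  # list type is an assumption
```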
create

`create(params)`

Create a completion and return `UnifiedResponse` with all features preserved.

This method implements `ModelClient.create()` but returns `UnifiedResponse` instead of `ModelClientResponseProtocol`. The rich `UnifiedResponse` structure is compatible via duck typing: it has a `.model` attribute and works with `message_retrieval()`.
| PARAMETER | DESCRIPTION |
| --- | --- |
| `params` | Request parameters including:<br>- `model`: Model name (e.g., "o1-preview")<br>- `messages`: List of message dicts<br>- `temperature`: Optional temperature (not supported by o1 models)<br>- `max_tokens`: Optional max completion tokens<br>- `tools`: Optional tool definitions<br>- any other OpenAI parameters<br>TYPE: `dict[str, Any]` |

| RETURNS | DESCRIPTION |
| --- | --- |
| `UnifiedResponse` | `UnifiedResponse` with reasoning blocks, citations, and all content preserved |
Source code in autogen/llm_clients/openai_completions_client.py
```python
def create(self, params: dict[str, Any]) -> UnifiedResponse:  # type: ignore[override]
    """
    Create a completion and return UnifiedResponse with all features preserved.

    This method implements ModelClient.create() but returns UnifiedResponse instead
    of ModelClientResponseProtocol. The rich UnifiedResponse structure is compatible
    via duck typing - it has .model attribute and works with message_retrieval().

    Args:
        params: Request parameters including:
            - model: Model name (e.g., "o1-preview")
            - messages: List of message dicts
            - temperature: Optional temperature (not supported by o1 models)
            - max_tokens: Optional max completion tokens
            - tools: Optional tool definitions
            - **other OpenAI parameters

    Returns:
        UnifiedResponse with reasoning blocks, citations, and all content preserved
    """
    # Call OpenAI API
    response = self.client.chat.completions.create(**params)

    # Transform to UnifiedResponse
    return self._transform_response(response, params.get("model", "unknown"))
```
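As a usage sketch, tool definitions pass through `params` in the standard Chat Completions format; the `get_weather` schema below is hypothetical:

```python
# A hypothetical tool definition in standard Chat Completions format
weather_tool = {
    "type": "function",
    "function": {
        "name": "get_weather",  # hypothetical function name
        "description": "Get the current weather for a city",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}

response = client.create({
    "model": "gpt-4o",
    "messages": [{"role": "user", "content": "What's the weather in Paris?"}],
    "tools": [weather_tool],
})
```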
create_v1_compatible

`create_v1_compatible(params)`

Create completion in the backward-compatible ChatCompletionExtended format.

This method provides compatibility with existing AG2 code that expects the ChatCompletionExtended format. Note that reasoning blocks and citations are lost in this format.
| PARAMETER | DESCRIPTION |
| --- | --- |
| `params` | Same parameters as `create()`<br>TYPE: `dict[str, Any]` |

| RETURNS | DESCRIPTION |
| --- | --- |
| `Any` | ChatCompletionExtended-compatible dict (flattened response) |
Warning
This method loses information (reasoning blocks, citations) when converting to the legacy format. Prefer create() for new code.
Source code in autogen/llm_clients/openai_completions_client.py
```python
def create_v1_compatible(self, params: dict[str, Any]) -> Any:
    """
    Create completion in backward-compatible ChatCompletionExtended format.

    This method provides compatibility with existing AG2 code that expects
    ChatCompletionExtended format. Note that reasoning blocks and citations
    will be lost in this format.

    Args:
        params: Same parameters as create()

    Returns:
        ChatCompletionExtended-compatible dict (flattened response)

    Warning:
        This method loses information (reasoning blocks, citations) when
        converting to the legacy format. Prefer create() for new code.
    """
    # Get rich response
    unified_response = self.create(params)

    # Convert to legacy format (simplified - would need full ChatCompletionExtended in practice)
    # Extract role and convert UserRoleEnum to string
    role = unified_response.messages[0].role if unified_response.messages else UserRoleEnum.ASSISTANT
    role_str = role.value if isinstance(role, UserRoleEnum) else role

    return {
        "id": unified_response.id,
        "model": unified_response.model,
        "created": unified_response.provider_metadata.get("created"),
        "object": "chat.completion",
        "choices": [
            {
                "index": 0,
                "message": {
                    "role": role_str,
                    "content": unified_response.text,
                },
                "finish_reason": unified_response.finish_reason,
            }
        ],
        "usage": unified_response.usage,
        "cost": unified_response.cost,
    }
```
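A short usage sketch for the legacy path; the field access follows the flattened dict returned above:

```python
legacy = client.create_v1_compatible({
    "model": "gpt-4o-mini",
    "messages": [{"role": "user", "content": "Say hello"}],
})

# Legacy consumers read the flattened Chat Completions shape
print(legacy["choices"][0]["message"]["content"])
print(legacy["usage"], legacy["cost"])
```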
cost

`cost(response)`

Calculate cost from response usage.

Implements `ModelClient.cost()` but accepts `UnifiedResponse` via duck typing.
| PARAMETER | DESCRIPTION |
| --- | --- |
| `response` | UnifiedResponse with usage information<br>TYPE: `UnifiedResponse` |

| RETURNS | DESCRIPTION |
| --- | --- |
| `float` | Cost in USD for the API call |
Source code in autogen/llm_clients/openai_completions_client.py
```python
def cost(self, response: UnifiedResponse) -> float:  # type: ignore[override]
    """
    Calculate cost from response usage.

    Implements ModelClient.cost() but accepts UnifiedResponse via duck typing.

    Args:
        response: UnifiedResponse with usage information

    Returns:
        Cost in USD for the API call
    """
    if not response.usage:
        return 0.0

    model = response.model
    prompt_tokens = response.usage.get("prompt_tokens", 0)
    completion_tokens = response.usage.get("completion_tokens", 0)

    # Find pricing for model (exact match or prefix)
    pricing = None
    for model_key in self._cost_per_token:
        if model.startswith(model_key):
            pricing = self._cost_per_token[model_key]
            break

    if not pricing:
        # Unknown model - use default pricing (GPT-4 Turbo level, per million tokens)
        pricing = {"prompt": 10.00 / 1_000_000, "completion": 30.00 / 1_000_000}

    return (prompt_tokens * pricing["prompt"]) + (completion_tokens * pricing["completion"])
```
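A worked example of the formula above at the pricing table's gpt-4o rates ($2.50 prompt / $10.00 completion per million tokens). Note that matching is by prefix in insertion order, so a dated variant such as `gpt-4o-2024-08-06` resolves to the `gpt-4o` entry:

```python
# Worked cost arithmetic at gpt-4o rates
prompt_tokens = 1_000
completion_tokens = 500

cost_usd = prompt_tokens * (2.50 / 1_000_000) + completion_tokens * (10.00 / 1_000_000)
print(cost_usd)  # 0.0025 + 0.005 = 0.0075 USD
```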
get_usage (staticmethod)

`get_usage(response)`

Extract usage statistics from response.

Implements `ModelClient.get_usage()` but accepts `UnifiedResponse` via duck typing.
| PARAMETER | DESCRIPTION |
| --- | --- |
| `response` | UnifiedResponse from `create()`<br>TYPE: `UnifiedResponse` |

| RETURNS | DESCRIPTION |
| --- | --- |
| `dict[str, Any]` | Dict with keys from RESPONSE_USAGE_KEYS |
Source code in autogen/llm_clients/openai_completions_client.py
```python
@staticmethod
def get_usage(response: UnifiedResponse) -> dict[str, Any]:  # type: ignore[override]
    """
    Extract usage statistics from response.

    Implements ModelClient.get_usage() but accepts UnifiedResponse via duck typing.

    Args:
        response: UnifiedResponse from create()

    Returns:
        Dict with keys from RESPONSE_USAGE_KEYS
    """
    return {
        "prompt_tokens": response.usage.get("prompt_tokens", 0),
        "completion_tokens": response.usage.get("completion_tokens", 0),
        "total_tokens": response.usage.get("total_tokens", 0),
        "cost": response.cost or 0.0,
        "model": response.model,
    }
```
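A sketch of tallying these stats across several calls; the aggregation loop and prompts are illustrative, not part of the client API:

```python
from collections import Counter

totals: Counter = Counter()
for prompt in ["First question", "Second question"]:  # illustrative prompts
    resp = client.create({
        "model": "gpt-4o-mini",
        "messages": [{"role": "user", "content": prompt}],
    })
    usage = client.get_usage(resp)
    # Accumulate token counts and cost across calls
    totals.update({k: usage[k] for k in ("prompt_tokens", "completion_tokens", "total_tokens")})
    totals["cost"] += usage["cost"]

print(dict(totals))
```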
message_retrieval

`message_retrieval(response)`

Retrieve text content from response messages.

Implements `ModelClient.message_retrieval()` but accepts `UnifiedResponse` via duck typing.
| PARAMETER | DESCRIPTION |
| --- | --- |
| `response` | UnifiedResponse from `create()`<br>TYPE: `UnifiedResponse` |

| RETURNS | DESCRIPTION |
| --- | --- |
| `list[str]` | List of text strings from message content blocks |
Source code in autogen/llm_clients/openai_completions_client.py
```python
def message_retrieval(self, response: UnifiedResponse) -> list[str]:  # type: ignore[override]
    """
    Retrieve text content from response messages.

    Implements ModelClient.message_retrieval() but accepts UnifiedResponse via duck typing.

    Args:
        response: UnifiedResponse from create()

    Returns:
        List of text strings from message content blocks
    """
    return [msg.get_text() for msg in response.messages]
```
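A minimal usage sketch pairing `message_retrieval()` with `create()`:

```python
response = client.create({
    "model": "gpt-4o",
    "messages": [{"role": "user", "content": "List three prime numbers"}],
})

# One text string per response message
for text in client.message_retrieval(response):
    print(text)
```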