OpenTelemetry Tracing

AG2 provides built-in OpenTelemetry instrumentation for multi-agent workflows. The tracing module follows the OpenTelemetry GenAI Semantic Conventions for agent spans, giving you structured observability into conversations, LLM calls, tool executions, code execution, and human-in-the-loop interactions.

Because AG2 uses standard OpenTelemetry, traces can be exported to any compatible backend -- Jaeger, Grafana Tempo, Datadog, Honeycomb, Axiom, and many others.

Installation#

Install AG2 with the tracing extra to pull in the required OpenTelemetry SDK and exporter packages:

pip install "ag2[tracing]"

This installs opentelemetry-api, opentelemetry-sdk, and the OTLP gRPC exporter.
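
If AG2 is already installed and you prefer to manage the dependencies explicitly, you can add the same packages directly; the names below are the standard OpenTelemetry Python distributions (assuming you want the gRPC flavor of the OTLP exporter):

pip install opentelemetry-api opentelemetry-sdk opentelemetry-exporter-otlp-proto-grpc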

Quick Start#

The following example sets up tracing with a ConsoleSpanExporter so you can see spans printed directly to your terminal. In production you would replace this with an OTLP exporter pointed at your backend.

quickstart_tracing.py
from opentelemetry import trace
from opentelemetry.sdk.resources import Resource
from opentelemetry.sdk.trace import TracerProvider
from opentelemetry.sdk.trace.export import SimpleSpanProcessor, ConsoleSpanExporter

from autogen import ConversableAgent, LLMConfig
from autogen.opentelemetry import instrument_agent, instrument_llm_wrapper

# 1. Configure the TracerProvider
resource = Resource.create(attributes={"service.name": "ag2-quickstart"})
tracer_provider = TracerProvider(resource=resource)
tracer_provider.add_span_processor(SimpleSpanProcessor(ConsoleSpanExporter()))
trace.set_tracer_provider(tracer_provider)

# 2. Create agents
llm_config = LLMConfig({"model": "gpt-4o-mini"})

assistant = ConversableAgent(
    name="assistant",
    system_message="You are a helpful assistant.",
    llm_config=llm_config,
    human_input_mode="NEVER",
)

user_proxy = ConversableAgent(
    name="user_proxy",
    human_input_mode="NEVER",
    max_consecutive_auto_reply=1,
)

# 3. Instrument agents and LLM calls
instrument_llm_wrapper(tracer_provider=tracer_provider)
instrument_agent(assistant, tracer_provider=tracer_provider)
instrument_agent(user_proxy, tracer_provider=tracer_provider)

# 4. Run a chat
result = user_proxy.run(
    assistant,
    message="What is the capital of France?",
    max_turns=2,
)
result.process()

When you run this script, the ConsoleSpanExporter prints each span to stdout as it completes. You will see a conversation span wrapping the entire chat, invoke_agent spans for each generate_reply call, and chat spans for each LLM API request -- all connected by a shared trace ID.
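
For quick experiments or unit tests, you can also collect spans in memory instead of printing them. The sketch below uses the SDK's InMemorySpanExporter (part of opentelemetry-sdk); the agent setup and instrumentation are assumed to be the same as in the quickstart above:

from opentelemetry.sdk.trace import TracerProvider
from opentelemetry.sdk.trace.export import SimpleSpanProcessor
from opentelemetry.sdk.trace.export.in_memory_span_exporter import InMemorySpanExporter

# Collect finished spans in memory instead of printing them
memory_exporter = InMemorySpanExporter()
tracer_provider = TracerProvider()
tracer_provider.add_span_processor(SimpleSpanProcessor(memory_exporter))

# ... instrument the agents and run the chat exactly as in the quickstart ...

# Inspect the recorded spans, e.g. to check that the expected span names were emitted
span_names = [span.name for span in memory_exporter.get_finished_spans()]
print(span_names)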

Instrumentation API#

AG2 exposes four instrumentation functions in autogen.opentelemetry. Each one takes a tracer_provider keyword argument and patches the target object in place.

instrument_agent#

from autogen.opentelemetry import instrument_agent

instrument_agent(agent, tracer_provider=tracer_provider)

Instruments a single ConversableAgent (or any subclass) to emit spans for:

| Activity | Span name pattern | When it fires |
|---|---|---|
| Conversations | conversation {agent.name} | run / initiate_chat / a_initiate_chat / resume |
| Agent invocations | invoke_agent {agent.name} | generate_reply / a_generate_reply |
| Remote agent calls | invoke_agent {agent.name} | a_generate_remote_reply (A2A) |
| Tool execution | execute_tool {func_name} | execute_function / a_execute_function |
| Code execution | execute_code {agent.name} | Code-execution reply handler |
| Human input | await_human_input {agent.name} | get_human_input / a_get_human_input |
| Sequential / parallel chats | agent.initiate_chats | initiate_chats / a_initiate_chats |

The function returns the same agent object (modified in place), so you can chain it if you like:

agent = instrument_agent(ConversableAgent(...), tracer_provider=tracer_provider)

instrument_llm_wrapper#

from autogen.opentelemetry import instrument_llm_wrapper

instrument_llm_wrapper(tracer_provider=tracer_provider)

Instruments all LLM calls globally by patching OpenAIWrapper.create(). Each call produces a chat {model} span that captures:

  • Provider name (openai, anthropic, azure.ai.openai, etc.)
  • Model name
  • Token usage (input and output)
  • Request parameters (temperature, max_tokens, etc.)
  • Response metadata (finish reasons, cost)

LLM spans automatically become children of the active invoke_agent span through OpenTelemetry's context propagation.
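
Because this relies on standard context propagation, you can also wrap AG2 calls in spans of your own and they will appear as parents in the same trace. A minimal sketch, assuming the tracer_provider and agents from the quickstart (the span name is arbitrary):

from opentelemetry import trace

tracer = trace.get_tracer("my-app")

# Any AG2 spans emitted inside this block become children of "nightly-report"
with tracer.start_as_current_span("nightly-report"):
    result = user_proxy.run(assistant, message="Summarize today's findings.", max_turns=2)
    result.process()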

Message capture -- By default, request and response messages are not captured because they may contain sensitive data. To enable message capture for debugging, pass capture_messages=True:

instrument_llm_wrapper(tracer_provider=tracer_provider, capture_messages=True)

See the Advanced: Message Capture section for details.

instrument_pattern#

from autogen.opentelemetry import instrument_pattern

instrument_pattern(pattern, tracer_provider=tracer_provider)

Instruments a group chat Pattern (such as AutoPattern, RoundRobinPattern, etc.). This is the recommended approach for group chats because it automatically:

  • Instruments all agents in the pattern
  • Instruments the GroupChatManager
  • Wraps speaker selection to produce speaker_selection spans with candidate and selected-agent attributes

group_chat_tracing.py
from autogen import ConversableAgent, LLMConfig
from autogen.agentchat import run_group_chat
from autogen.agentchat.group.patterns import AutoPattern
from autogen.opentelemetry import instrument_llm_wrapper, instrument_pattern

llm_config = LLMConfig({"model": "gpt-4o-mini"})

researcher = ConversableAgent(
    name="researcher",
    system_message="You research topics and provide factual information. Be concise.",
    llm_config=llm_config,
    human_input_mode="NEVER",
)

writer = ConversableAgent(
    name="writer",
    system_message="You take research and write clear summaries. Say TERMINATE when done.",
    llm_config=llm_config,
    human_input_mode="NEVER",
)

user = ConversableAgent(name="user", human_input_mode="NEVER", llm_config=False)

pattern = AutoPattern(
    initial_agent=researcher,
    agents=[researcher, writer],
    user_agent=user,
    group_manager_args={"llm_config": llm_config},
)

# Instrument everything at once (tracer_provider as configured in the Quick Start)
instrument_llm_wrapper(tracer_provider=tracer_provider)
instrument_pattern(pattern, tracer_provider=tracer_provider)

result = run_group_chat(
    pattern=pattern,
    messages="What are the three laws of thermodynamics? Summarize briefly.",
    max_rounds=5,
)
result.process()

instrument_a2a_server#

from autogen.opentelemetry import instrument_a2a_server

instrument_a2a_server(server, tracer_provider=tracer_provider)

Instruments an A2aAgentServer for distributed tracing across services. This function:

  • Adds ASGI middleware that extracts W3C Trace Context headers from incoming HTTP requests
  • Instruments the server's underlying agent (calls instrument_agent internally)

When a client sends a request with a traceparent header, the server-side spans are linked to the client's trace, giving you a single end-to-end trace across services.

a2a_server_traced.py
from autogen import ConversableAgent, LLMConfig
from autogen.a2a import A2aAgentServer
from autogen.opentelemetry import instrument_a2a_server

llm_config = LLMConfig({"model": "gpt-4o-mini"})

agent = ConversableAgent(
    name="tech_agent",
    system_message="You solve technical problems.",
    llm_config=llm_config,
)

server = A2aAgentServer(agent, url="http://localhost:18123/")
# tracer_provider is the TracerProvider configured as in the Quick Start
server = instrument_a2a_server(server, tracer_provider=tracer_provider)
app = server.build()

On the client side, instrument_agent automatically injects traceparent headers into outgoing HTTP calls for A2aRemoteAgent, so no additional setup is needed on the client.
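
The header propagation itself is plain W3C Trace Context, so you can reproduce what the instrumentation does with the standard OpenTelemetry propagation API. A short illustrative sketch (the span name and carrier dict are arbitrary):

from opentelemetry import trace
from opentelemetry.propagate import inject

tracer = trace.get_tracer("client")

headers = {}
with tracer.start_as_current_span("call-remote-agent"):
    # Writes a traceparent header for the currently active span into the carrier,
    # e.g. {"traceparent": "00-<trace-id>-<span-id>-01"}
    inject(headers)
print(headers)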

Trace Hierarchy#

AG2 traces form a hierarchical tree that mirrors how agents process a conversation. Here is the typical structure for a two-agent chat:

conversation user_proxy                   # run
  |-- invoke_agent assistant              # generate_reply
  |     +-- chat gpt-4o-mini              # LLM API call
  |-- invoke_agent user_proxy             # generate_reply
  |-- invoke_agent assistant              # generate_reply
  |     |-- chat gpt-4o-mini              # LLM API call
  |     +-- execute_tool get_weather      # tool execution
  |-- invoke_agent assistant              # generate_reply
  |     +-- chat gpt-4o-mini              # LLM API call
  +-- invoke_agent user_proxy             # generate_reply

For group chats with a pattern, the tree includes speaker selection:

conversation chat_manager                 # run_chat (GroupChatManager)
  |-- speaker_selection                   # auto speaker selection
  |     +-- invoke_agent speaker_sel...   # internal LLM call to pick speaker
  |           +-- chat gpt-4o-mini
  |-- invoke_agent researcher             # selected agent generates reply
  |     +-- chat gpt-4o-mini
  |-- speaker_selection
  |     +-- invoke_agent speaker_sel...
  |           +-- chat gpt-4o-mini
  +-- invoke_agent writer
        +-- chat gpt-4o-mini

Span Types#

Every span emitted by AG2 includes an ag2.span.type attribute that identifies what the span represents:

| ag2.span.type | Operation name | Triggered by |
|---|---|---|
| conversation | conversation | run, initiate_chat, a_initiate_chat, resume, run_chat, a_run_chat |
| multi_conversation | initiate_chats | initiate_chats, a_initiate_chats (sequential or parallel) |
| agent | invoke_agent | generate_reply, a_generate_reply, a_generate_remote_reply |
| llm | chat | OpenAIWrapper.create() (every LLM API call) |
| tool | execute_tool | execute_function, a_execute_function |
| code_execution | execute_code | Code-execution reply handler |
| human_input | await_human_input | get_human_input, a_get_human_input |
| speaker_selection | speaker_selection | _auto_select_speaker, a_auto_select_speaker (group chat) |
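
The ag2.span.type attribute is also handy for trimming noisy traces before export. The sketch below is one possible approach, not an AG2 API: a thin wrapper around any SpanExporter that drops span types you do not care about.

from opentelemetry.sdk.trace import ReadableSpan
from opentelemetry.sdk.trace.export import SpanExporter, SpanExportResult

class SpanTypeFilterExporter(SpanExporter):
    """Wraps another exporter and drops spans whose ag2.span.type is excluded."""

    def __init__(self, wrapped: SpanExporter, exclude: set):
        self._wrapped = wrapped
        self._exclude = exclude

    def export(self, spans) -> SpanExportResult:
        kept = [s for s in spans if (s.attributes or {}).get("ag2.span.type") not in self._exclude]
        return self._wrapped.export(kept)

    def shutdown(self) -> None:
        self._wrapped.shutdown()

# Example: drop human-input spans, keep everything else
# exporter = SpanTypeFilterExporter(OTLPSpanExporter(endpoint="http://localhost:4317"), exclude={"human_input"})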

Semantic Attributes#

AG2 spans carry both standard OpenTelemetry GenAI attributes and AG2-specific attributes.

Standard OpenTelemetry GenAI Attributes#

These follow the OpenTelemetry GenAI Semantic Conventions:

| Attribute | Type | Description | Span types |
|---|---|---|---|
| gen_ai.operation.name | string | Operation name (conversation, invoke_agent, chat, etc.) | All |
| gen_ai.agent.name | string | Name of the agent | conversation, agent, code_execution, human_input |
| gen_ai.provider.name | string | LLM provider (openai, anthropic, azure.ai.openai, etc.) | conversation, agent, llm |
| gen_ai.request.model | string | Requested model name | conversation, agent, llm |
| gen_ai.response.model | string | Model name in the response (may differ from request) | conversation, llm |
| gen_ai.usage.input_tokens | int | Number of input/prompt tokens | conversation, llm |
| gen_ai.usage.output_tokens | int | Number of output/completion tokens | conversation, llm |
| gen_ai.usage.cost | float | Total cost of the operation | conversation, llm |
| gen_ai.request.temperature | float | Temperature parameter | llm |
| gen_ai.request.max_tokens | int | Max tokens parameter | llm |
| gen_ai.request.top_p | float | Top-p parameter | llm |
| gen_ai.input.messages | string (JSON) | Input messages in OTel format | conversation, agent, llm (opt-in) |
| gen_ai.output.messages | string (JSON) | Output messages in OTel format | conversation, agent, llm (opt-in) |
| gen_ai.response.finish_reasons | string (JSON) | Finish reasons from the LLM response | llm |
| gen_ai.tool.name | string | Tool function name | tool |
| gen_ai.tool.call.id | string | Tool call ID | tool |
| gen_ai.tool.call.arguments | string (JSON) | Tool call arguments | tool |
| gen_ai.tool.call.result | string | Tool call result | tool |
| gen_ai.conversation.id | string | Conversation/chat ID | conversation |
| gen_ai.conversation.turns | int | Number of turns in the conversation | conversation |
| gen_ai.conversation.max_turns | int | Maximum allowed turns | conversation |
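
Because token and cost figures are plain span attributes, you can also aggregate them locally with a custom SpanProcessor. A minimal sketch; the attribute names come from the table above, while the processor itself is illustrative and not part of AG2:

from opentelemetry.sdk.trace import ReadableSpan, SpanProcessor

class TokenUsageLogger(SpanProcessor):
    """Prints a running total of LLM token usage as spans finish."""

    def __init__(self) -> None:
        self.input_tokens = 0
        self.output_tokens = 0

    def on_end(self, span: ReadableSpan) -> None:
        attrs = span.attributes or {}
        if attrs.get("ag2.span.type") == "llm":
            self.input_tokens += int(attrs.get("gen_ai.usage.input_tokens", 0))
            self.output_tokens += int(attrs.get("gen_ai.usage.output_tokens", 0))
            print(f"tokens so far: in={self.input_tokens} out={self.output_tokens}")

# Register alongside your exporter:
# tracer_provider.add_span_processor(TokenUsageLogger())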

AG2-Specific Attributes#

| Attribute | Type | Description | Span types |
|---|---|---|---|
| ag2.span.type | string | AG2 span type (see Span Types table) | All |
| ag2.speaker_selection.candidates | string (JSON) | List of candidate agent names | speaker_selection |
| ag2.speaker_selection.selected | string | Name of the selected speaker | speaker_selection |
| ag2.human_input.prompt | string | Prompt shown to the human | human_input |
| ag2.human_input.response | string | Human's response | human_input |
| ag2.code_execution.exit_code | int | Exit code from code execution | code_execution |
| ag2.code_execution.output | string | Output from code execution (truncated to 4096 chars) | code_execution |
| ag2.chats.count | int | Number of chats in initiate_chats | multi_conversation |
| ag2.chats.mode | string | "sequential" or "parallel" | multi_conversation |
| ag2.chats.recipients | string (JSON) | List of recipient agent names | multi_conversation |
| gen_ai.agent.remote | bool | Whether the agent is a remote A2A agent | agent |
| server.address | string | URL of the remote A2A agent | agent |
| error.type | string | Error type on failure (ExecutionError, CodeExecutionError, etc.) | Any |

Backend Integration#

Grafana Tempo#

Grafana Tempo is an open-source distributed tracing backend that integrates with the Grafana observability stack.

tempo_setup.py
from opentelemetry import trace
from opentelemetry.exporter.otlp.proto.grpc.trace_exporter import OTLPSpanExporter
from opentelemetry.sdk.resources import Resource
from opentelemetry.sdk.trace import TracerProvider
from opentelemetry.sdk.trace.export import BatchSpanProcessor

resource = Resource.create(attributes={"service.name": "my-ag2-service"})
tracer_provider = TracerProvider(resource=resource)

# Point to your Tempo/OTel Collector OTLP gRPC endpoint
exporter = OTLPSpanExporter(endpoint="http://localhost:4317")
tracer_provider.add_span_processor(BatchSpanProcessor(exporter))
trace.set_tracer_provider(tracer_provider)

Jaeger#

Jaeger natively supports OTLP ingestion, so the setup is the same:

jaeger_setup.py
from opentelemetry.exporter.otlp.proto.grpc.trace_exporter import OTLPSpanExporter

# Jaeger's OTLP gRPC endpoint (default port 4317)
exporter = OTLPSpanExporter(endpoint="http://localhost:4317")

Commercial Backends#

Most commercial observability platforms accept OTLP traces. Typically you only need to change the endpoint and add authentication headers:

exporter = OTLPSpanExporter(
    endpoint="https://otel.vendor.com:4317",
    headers={"api-key": "YOUR_API_KEY"},
)

Consult your vendor's documentation for the exact endpoint and authentication details. Popular options include Datadog, Honeycomb, and Grafana Cloud.
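
Instead of hard-coding the endpoint and credentials, the OTLP exporter also reads the standard OpenTelemetry environment variables, which keeps secrets out of your source. A sketch, assuming your vendor accepts an api-key header as in the example above:

# Shell (values are placeholders):
#   export OTEL_EXPORTER_OTLP_ENDPOINT="https://otel.vendor.com:4317"
#   export OTEL_EXPORTER_OTLP_HEADERS="api-key=YOUR_API_KEY"
from opentelemetry.exporter.otlp.proto.grpc.trace_exporter import OTLPSpanExporter

exporter = OTLPSpanExporter()  # endpoint and headers are picked up from the environment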

Local Development Stack#

For local development and testing, you can run an OpenTelemetry Collector, Grafana Tempo, and Grafana using Docker Compose. See the Local OpenTelemetry Setup page for the full configuration files and instructions.

Once the stack is running, point your exporter at the collector:

exporter = OTLPSpanExporter(endpoint="http://localhost:14317")

Then open http://localhost:3333 in your browser to explore traces in Grafana.

Examples#

Two-Agent Chat#

A basic code-review workflow between a reviewer and a coder, with full tracing of the conversation, each agent turn, and every LLM call:

two_agent_tracing.py
from opentelemetry import trace
from opentelemetry.exporter.otlp.proto.grpc.trace_exporter import OTLPSpanExporter
from opentelemetry.sdk.resources import Resource
from opentelemetry.sdk.trace import TracerProvider
from opentelemetry.sdk.trace.export import BatchSpanProcessor

from autogen import ConversableAgent, LLMConfig
from autogen.opentelemetry import instrument_agent, instrument_llm_wrapper

# Setup tracing
resource = Resource.create(attributes={"service.name": "two-agent-chat"})
tracer_provider = TracerProvider(resource=resource)
exporter = OTLPSpanExporter(endpoint="http://localhost:14317")
tracer_provider.add_span_processor(BatchSpanProcessor(exporter))
trace.set_tracer_provider(tracer_provider)

llm_config = LLMConfig({"model": "gpt-4o-mini"})

reviewer = ConversableAgent(
    name="reviewer",
    system_message="You are a code reviewer. Provide constructive feedback.",
    llm_config=llm_config,
    human_input_mode="NEVER",
)

coder = ConversableAgent(
    name="coder",
    system_message="You are an expert Python developer.",
    llm_config=llm_config,
    human_input_mode="NEVER",
    is_termination_msg=lambda x: "LGTM" in (x.get("content") or ""),
)

# Instrument
instrument_llm_wrapper(tracer_provider=tracer_provider)
instrument_agent(reviewer, tracer_provider=tracer_provider)
instrument_agent(coder, tracer_provider=tracer_provider)

result = reviewer.run(
    coder,
    message="Write a Python function to compute the Fibonacci sequence.",
    max_turns=3,
)
result.process()

Tool Execution#

Tool calls produce execute_tool spans nested inside the agent's invoke_agent span:

tool_tracing.py
from typing import Annotated

from opentelemetry import trace
from opentelemetry.exporter.otlp.proto.grpc.trace_exporter import OTLPSpanExporter
from opentelemetry.sdk.resources import Resource
from opentelemetry.sdk.trace import TracerProvider
from opentelemetry.sdk.trace.export import BatchSpanProcessor

from autogen import ConversableAgent, LLMConfig
from autogen.opentelemetry import instrument_agent, instrument_llm_wrapper
from autogen.tools import tool

# Setup tracing
resource = Resource.create(attributes={"service.name": "tool-tracing"})
tracer_provider = TracerProvider(resource=resource)
tracer_provider.add_span_processor(
    BatchSpanProcessor(OTLPSpanExporter(endpoint="http://localhost:14317"))
)
trace.set_tracer_provider(tracer_provider)

@tool(description="Get weather information for a city")
def get_weather(city: Annotated[str, "The city name"]) -> str:
    """Get weather information for a city."""
    weather_data = {
        "new york": "Sunny, 72F",
        "london": "Cloudy, 15C",
        "tokyo": "Rainy, 18C",
    }
    return weather_data.get(city.lower(), f"Weather data not available for {city}")

llm_config = LLMConfig({"model": "gpt-4o-mini"})

weather_agent = ConversableAgent(
    name="weather",
    system_message="Use the get_weather tool to answer weather questions.",
    functions=[get_weather],
    llm_config=llm_config,
    human_input_mode="NEVER",
)

# Instrument
instrument_llm_wrapper(tracer_provider=tracer_provider)
instrument_agent(weather_agent, tracer_provider=tracer_provider)

result = weather_agent.run(message="What is the weather in Tokyo?", max_turns=2)
result.process()

Group Chat with Pattern#

Use instrument_pattern instead of instrumenting each agent individually:

group_chat_tracing.py
from autogen import ConversableAgent, LLMConfig
from autogen.agentchat import run_group_chat
from autogen.agentchat.group.patterns import AutoPattern
from autogen.opentelemetry import instrument_llm_wrapper, instrument_pattern

llm_config = LLMConfig({"model": "gpt-4o-mini"})

researcher = ConversableAgent(
    name="researcher",
    system_message="You research topics and provide factual information.",
    llm_config=llm_config,
    human_input_mode="NEVER",
)

writer = ConversableAgent(
    name="writer",
    system_message="You write clear summaries. Say TERMINATE after being asked to summarize.",
    llm_config=llm_config,
    human_input_mode="NEVER",
)

user = ConversableAgent(
    name="user",
    human_input_mode="NEVER",
    llm_config=False,
    default_auto_reply="Great, summarize for me.",
)

pattern = AutoPattern(
    initial_agent=researcher,
    agents=[researcher, writer],
    user_agent=user,
    group_manager_args={"llm_config": llm_config},
)

# Instrument everything in one call (tracer_provider as configured in the Quick Start)
instrument_llm_wrapper(tracer_provider=tracer_provider)
instrument_pattern(pattern, tracer_provider=tracer_provider)

result = run_group_chat(
    pattern=pattern,
    messages="Explain quantum computing in simple terms.",
    max_rounds=5,
)
result.process()

Distributed Tracing with A2A#

For multi-service architectures where some agents run as separate A2A servers, instrument both sides to get a single end-to-end trace:

Remote agent server:

remote_server.py
from autogen import ConversableAgent, LLMConfig
from autogen.a2a import A2aAgentServer
from autogen.opentelemetry import instrument_a2a_server

llm_config = LLMConfig({"model": "gpt-4o-mini"})

tech_agent = ConversableAgent(
    name="tech_agent",
    system_message="You solve technical problems.",
    llm_config=llm_config,
)

server = A2aAgentServer(tech_agent, url="http://localhost:18123/")
# tracer_provider is the TracerProvider configured for this service (see Backend Integration)
server = instrument_a2a_server(server, tracer_provider=tracer_provider)
app = server.build()

# Run with: uvicorn remote_server:app --port 18123

Client with remote agent in a group chat:

client_group_chat.py
import asyncio

from autogen import ConversableAgent, LLMConfig
from autogen.a2a import A2aRemoteAgent
from autogen.agentchat import a_run_group_chat
from autogen.agentchat.group.patterns import AutoPattern
from autogen.opentelemetry import instrument_pattern

llm_config = LLMConfig({"model": "gpt-4o-mini"})

triage_agent = ConversableAgent(
    name="triage_agent",
    system_message="Route technical issues to the tech agent.",
    llm_config=llm_config,
)

# Remote agent -- calls are traced across the network
tech_agent = A2aRemoteAgent(
    "http://localhost:18123/",
    name="tech_agent",
)

pattern = AutoPattern(
    initial_agent=triage_agent,
    agents=[triage_agent, tech_agent],
    group_manager_args={"llm_config": llm_config},
)

# tracer_provider is the TracerProvider configured for this client service
instrument_pattern(pattern, tracer_provider=tracer_provider)

async def main():
    result = await a_run_group_chat(
        pattern=pattern,
        messages="My application crashes on startup with a segfault.",
        max_rounds=5,
    )
    await result.process()

asyncio.run(main())

The instrument_pattern call on the client side instruments A2aRemoteAgent so that W3C traceparent headers are injected into outgoing HTTP requests. The instrument_a2a_server call on the server side extracts those headers, linking the server-side spans to the client trace.

Message and Data Capture#

AG2 tracing captures data at different levels depending on the span type:

| Span type | Data captured | Controllable? |
|---|---|---|
| conversation (run / initiate_chat) | Input/output messages | No, always captured |
| agent (generate_reply) | Input/output messages | No, always captured |
| tool (execute_function) | Tool arguments and results | No, always captured |
| human_input (get_human_input) | Prompt shown and human's response | No, always captured |
| llm (LLM API call) | Request/response messages | Yes, off by default (capture_messages) |

Conversation and agent spans always include input and output messages (gen_ai.input.messages / gen_ai.output.messages). Tool spans always capture call arguments (gen_ai.tool.call.arguments) and results (gen_ai.tool.call.result). Human input spans always capture the prompt shown (ag2.human_input.prompt) and the human's response (ag2.human_input.response). Only LLM spans require explicit opt-in via capture_messages=True; all other span types always capture their data.

Warning

If your agents process sensitive data (personal information, credentials, proprietary content), be aware that conversation messages, tool arguments and results, and human input prompts and responses will appear in your tracing backend with default settings. Ensure your backend has appropriate access controls and retention policies.

Enabling LLM Message Capture#

By default, instrument_llm_wrapper does not capture request and response messages on LLM spans. To enable this for debugging or development:

instrument_llm_wrapper(tracer_provider=tracer_provider, capture_messages=True)

When enabled, two additional attributes appear on chat (LLM) spans:

  • gen_ai.input.messages -- JSON array of input messages in OpenTelemetry GenAI format
  • gen_ai.output.messages -- JSON array of output messages in OpenTelemetry GenAI format

This includes the full request and response payloads sent to and received from the LLM provider, which can be large and may duplicate content already present on the parent agent span.
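
A pragmatic pattern is to gate message capture behind an environment variable of your own so it is only enabled in development. The variable name below is purely illustrative, not an AG2 setting; tracer_provider is assumed to be configured as shown earlier:

import os

# Enable LLM message capture only when explicitly requested, e.g. in a dev shell:
#   export TRACE_LLM_MESSAGES=1
capture = os.getenv("TRACE_LLM_MESSAGES") == "1"
instrument_llm_wrapper(tracer_provider=tracer_provider, capture_messages=capture)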