
AG2 OpenTelemetry Tracing: Full Observability for Multi-Agent Systems


Multi-agent systems are powerful -- but when something goes wrong, figuring out where and why is painful. Which agent made the bad decision? Was the LLM call slow, or did the tool fail? How many tokens did that group chat actually use?

AG2 now has built-in OpenTelemetry tracing that gives you full visibility into your multi-agent workflows. Every conversation, agent turn, LLM call, tool execution, and speaker selection is captured as a structured span -- connected by a shared trace ID and exportable to any OpenTelemetry-compatible backend.

Key highlights:

  • Four simple API functions to instrument agents, LLM calls, group chats, and A2A servers
  • Hierarchical traces that mirror how agents process conversations
  • Distributed tracing across services using W3C Trace Context propagation
  • Works with any backend -- Jaeger, Grafana Tempo, Datadog, Honeycomb, and more
  • Follows OpenTelemetry GenAI Semantic Conventions for standard interoperability

What is OpenTelemetry Tracing?#

OpenTelemetry is the industry-standard framework for observability in distributed systems. At its core, tracing captures the path of a request through your system as a tree of spans -- each span representing a unit of work (an LLM call, a tool execution, an agent turn) with timing, attributes, and parent-child relationships.

A few key concepts:

  • Trace -- A complete end-to-end record of a request, represented as a tree of spans sharing a single trace ID.
  • Span -- A single unit of work within a trace (e.g. an LLM call or a tool execution). Each span has a name, start/end timestamps, a parent span (forming the tree), and key-value attributes that carry metadata like model name, token counts, or tool arguments.
  • Span Kind -- An OpenTelemetry classification of how a span relates to the outside world. AG2 uses CLIENT for LLM spans (outgoing calls to an external API) and INTERNAL (the default) for all other spans like agent turns, tool executions, and conversations.
  • Span Type (ag2.span.type) -- AG2's own attribute that classifies what each span represents: conversation, agent, llm, tool, code_execution, human_input, or speaker_selection. This lets you filter and group spans by their role in the agent workflow.
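
If these terms are new, the short sketch below uses the plain OpenTelemetry Python API (no AG2 involved) to show how nested spans form a trace and how attributes carry metadata. The span names and attribute keys are purely illustrative, and a TracerProvider must already be configured (as in the Quick Start below) for the spans to be exported.

from opentelemetry import trace

# Acquire a tracer from the globally configured TracerProvider
tracer = trace.get_tracer("concept-demo")

# Parent span: one unit of work, the root of a small trace
with tracer.start_as_current_span("handle_request") as parent:
    parent.set_attribute("request.topic", "capital cities")  # illustrative attribute

    # Child span: a nested unit of work that shares the parent's trace ID
    with tracer.start_as_current_span("call_llm") as child:
        child.set_attribute("gen_ai.request.model", "gpt-4o-mini")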

For multi-agent systems, tracing answers questions like:

  • What was the full sequence of agent interactions in this conversation?
  • How long did each LLM call take, and how many tokens did it consume?
  • Which tool was called, with what arguments, and what did it return?
  • In a group chat, why was a particular agent selected to speak?
  • Across multiple services, how did a request flow from client to remote agent?

AG2's tracing follows the OpenTelemetry GenAI Semantic Conventions, so your agent traces use standard attribute names that any GenAI-aware observability backend can interpret.

Quick Start#

Install AG2 with the tracing extra:

pip install "ag2[tracing]"

Then set up a TracerProvider, instrument your agents, and run a chat:

from opentelemetry import trace
from opentelemetry.sdk.resources import Resource
from opentelemetry.sdk.trace import TracerProvider
from opentelemetry.sdk.trace.export import SimpleSpanProcessor, ConsoleSpanExporter

from autogen import ConversableAgent, LLMConfig
from autogen.opentelemetry import instrument_agent, instrument_llm_wrapper

# 1. Configure the TracerProvider
resource = Resource.create(attributes={"service.name": "ag2-quickstart"})
tracer_provider = TracerProvider(resource=resource)
tracer_provider.add_span_processor(SimpleSpanProcessor(ConsoleSpanExporter()))
trace.set_tracer_provider(tracer_provider)

# 2. Create agents
llm_config = LLMConfig({"model": "gpt-4o-mini"})

assistant = ConversableAgent(
    name="assistant",
    system_message="You are a helpful assistant.",
    llm_config=llm_config,
    human_input_mode="NEVER",
)

user_proxy = ConversableAgent(
    name="user_proxy",
    human_input_mode="NEVER",
    max_consecutive_auto_reply=1,
)

# 3. Instrument agents and LLM calls
instrument_llm_wrapper(tracer_provider=tracer_provider)
instrument_agent(assistant, tracer_provider=tracer_provider)
instrument_agent(user_proxy, tracer_provider=tracer_provider)

# 4. Run a chat
result = user_proxy.run(
    assistant,
    message="What is the capital of France?",
    max_turns=2,
)
result.process()

When you run this, the ConsoleSpanExporter prints each span to stdout as it completes. You'll see a conversation span wrapping the entire chat, invoke_agent spans for each agent turn, and chat spans for each LLM API call -- all connected by a shared trace ID.

To send traces to a real backend instead, swap the exporter:

from opentelemetry.exporter.otlp.proto.grpc.trace_exporter import OTLPSpanExporter
from opentelemetry.sdk.trace.export import BatchSpanProcessor

exporter = OTLPSpanExporter(endpoint="http://localhost:14317")
tracer_provider.add_span_processor(BatchSpanProcessor(exporter))

What Gets Traced#

AG2 traces form a hierarchical tree that mirrors how agents process a conversation. Here is the typical structure for a two-agent chat:

[Screenshot: trace tree for a two-agent chat]

With a tool execution span expanded to show its attributes, including the span type (ag2.span.type):

[Screenshot: tool execution span with its attributes expanded]

And with the top-level conversation span expanded to show token usage and cost:

[Screenshot: conversation span showing token usage and cost]

Every span includes an ag2.span.type attribute that identifies what it represents:

| ag2.span.type     | Span name                  | What triggers it                     |
|-------------------|----------------------------|--------------------------------------|
| conversation      | conversation {agent}       | run, run_chat, initiate_chat         |
| agent             | invoke_agent {agent}       | generate_reply, a_generate_reply     |
| llm               | chat {model}               | Every OpenAIWrapper.create() call    |
| tool              | execute_tool {func}        | execute_function, a_execute_function |
| code_execution    | execute_code {agent}       | Code-execution reply handler         |
| human_input       | await_human_input {agent}  | get_human_input, a_get_human_input   |
| speaker_selection | speaker_selection          | Group chat speaker selection         |

Spans carry rich attributes following the OpenTelemetry GenAI Semantic Conventions: model name, provider, token usage, cost, temperature, tool call arguments and results, and more. Conversation and agent spans also capture the input/output messages so you can reconstruct the full conversation flow from your tracing backend.
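
Because every span carries ag2.span.type, you can filter and aggregate by it in your backend, or even in-process. The sketch below is illustrative rather than part of AG2's API: a minimal custom SpanProcessor that prints only finished LLM spans, with the token-usage attribute keys assumed to follow the GenAI semantic conventions.

from opentelemetry.sdk.trace import ReadableSpan, SpanProcessor


class LLMSpanLogger(SpanProcessor):
    """Prints finished spans whose ag2.span.type attribute is 'llm'."""

    def on_end(self, span: ReadableSpan) -> None:
        attrs = span.attributes or {}
        if attrs.get("ag2.span.type") == "llm":
            # Attribute keys assumed from the GenAI semantic conventions;
            # adjust if your backend shows different names.
            print(
                f"LLM span {span.name}: "
                f"input_tokens={attrs.get('gen_ai.usage.input_tokens')}, "
                f"output_tokens={attrs.get('gen_ai.usage.output_tokens')}"
            )


# Register alongside the exporter's span processor
tracer_provider.add_span_processor(LLMSpanLogger())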

The Instrumentation API#

AG2 exposes four functions in autogen.opentelemetry. Each takes a tracer_provider and patches the target in place:

instrument_agent(agent, tracer_provider=...)#

Instruments a single ConversableAgent (or any subclass) to emit spans for conversations, agent turns, tool execution, code execution, human input, and remote A2A calls.

from autogen.opentelemetry import instrument_agent

instrument_agent(my_agent, tracer_provider=tracer_provider)

instrument_llm_wrapper(tracer_provider=..., capture_messages=False)#

Instruments all LLM calls globally. Each call produces a chat {model} span with provider, model, token usage, cost, and request parameters.

from autogen.opentelemetry import instrument_llm_wrapper

instrument_llm_wrapper(tracer_provider=tracer_provider)

By default, request/response messages are not captured on LLM spans to protect sensitive data. Pass capture_messages=True for debugging.
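
For example, during local debugging you might enable message capture explicitly:

# Capture prompt/response messages on LLM spans -- only do this when you're
# comfortable exporting that content to your tracing backend.
instrument_llm_wrapper(tracer_provider=tracer_provider, capture_messages=True)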

instrument_pattern(pattern, tracer_provider=...)#

Instruments a group chat Pattern (such as AutoPattern or RoundRobinPattern). This is the recommended approach for group chats because it automatically instruments all agents in the pattern, the GroupChatManager, and speaker selection.

from autogen.opentelemetry import instrument_pattern

instrument_pattern(pattern, tracer_provider=tracer_provider)

instrument_a2a_server(server, tracer_provider=...)#

Instruments an A2aAgentServer for distributed tracing across services. Adds middleware for W3C Trace Context extraction and instruments the server's agent.

from autogen.opentelemetry import instrument_a2a_server

instrument_a2a_server(server, tracer_provider=tracer_provider)

Instrumenting a Two-Agent Chat#

For a simple two-agent conversation, instrument each agent with instrument_agent and LLM calls with instrument_llm_wrapper:

from autogen import ConversableAgent, LLMConfig, UserProxyAgent
from autogen.opentelemetry import instrument_agent, instrument_llm_wrapper

llm_config = LLMConfig({"model": "gpt-4o-mini"})

assistant = ConversableAgent(
    name="assistant",
    system_message="You are a helpful assistant.",
    llm_config=llm_config,
    human_input_mode="NEVER",
)

user_proxy = UserProxyAgent(
    name="user_proxy",
)

# Instrument LLM calls and both agents
instrument_llm_wrapper(tracer_provider=tracer_provider)
instrument_agent(assistant, tracer_provider=tracer_provider)
instrument_agent(user_proxy, tracer_provider=tracer_provider)

result = user_proxy.run(
    assistant,
    message="What is the capital of France?",
    max_turns=2,
)
result.process()

This produces a trace like:

conversation user_proxy                    # the overall chat
  |-- invoke_agent assistant               # assistant generates a reply
  |     +-- chat gpt-4o-mini               # LLM call
  |-- invoke_agent user_proxy              # user_proxy takes its turn
  |     +-- await_human_input user_proxy   # human enters a reply
  +-- invoke_agent assistant               # assistant generates a reply
        +-- chat gpt-4o-mini               # LLM call

Instrumenting Group Chats#

For group chats, instrument_pattern is the one-liner that does it all. It instruments every agent in the pattern and adds speaker_selection spans that show which agents were candidates and who was selected:

from autogen import ConversableAgent, LLMConfig
from autogen.agentchat import run_group_chat
from autogen.agentchat.group.patterns import AutoPattern
from autogen.opentelemetry import instrument_llm_wrapper, instrument_pattern

llm_config = LLMConfig({"model": "gpt-4o-mini"})

researcher = ConversableAgent(
    name="researcher",
    system_message="You research topics and provide factual information.",
    llm_config=llm_config,
    human_input_mode="NEVER",
)

writer = ConversableAgent(
    name="writer",
    system_message="You write clear summaries. Say TERMINATE after being asked to summarize.",
    llm_config=llm_config,
    human_input_mode="NEVER",
)

user = ConversableAgent(
    name="user",
    human_input_mode="NEVER",
    llm_config=False,
    default_auto_reply="Great, summarize for me.",
)

pattern = AutoPattern(
    initial_agent=researcher,
    agents=[researcher, writer],
    user_agent=user,
    group_manager_args={"llm_config": llm_config},
)

# Instrument everything in one call
instrument_llm_wrapper(tracer_provider=tracer_provider)
instrument_pattern(pattern, tracer_provider=tracer_provider)

result = run_group_chat(
    pattern=pattern,
    messages="Explain quantum computing in simple terms.",
    max_rounds=5,
)
result.process()

The resulting trace tree includes speaker selection:

conversation chat_manager                 # run_chat (GroupChatManager)
  |-- speaker_selection                   # auto speaker selection
  |     +-- invoke_agent speaker_sel...   # internal LLM call to pick speaker
  |           +-- chat gpt-4o-mini
  |-- invoke_agent researcher             # selected agent generates reply
  |     +-- chat gpt-4o-mini
  |-- speaker_selection
  |     +-- invoke_agent speaker_sel...
  |           +-- chat gpt-4o-mini
  +-- invoke_agent writer
        +-- chat gpt-4o-mini

Distributed Tracing with A2A#

When agents run as separate services using the A2A (Agent-to-Agent) protocol, AG2 propagates W3C Trace Context headers across HTTP calls so that client-side and server-side spans share a single trace ID.

[Diagram: distributed trace spanning an A2A client and server]
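
If you want to see what the propagation mechanism looks like in isolation, the sketch below uses the generic OpenTelemetry propagation API to inject and extract a traceparent header by hand. It is a conceptual illustration of W3C Trace Context propagation, not AG2's internal code.

from opentelemetry import trace
from opentelemetry.propagate import extract, inject

tracer = trace.get_tracer("a2a-propagation-demo")

# Client side: inject the active span's context into outgoing HTTP headers
headers: dict[str, str] = {}
with tracer.start_as_current_span("call_remote_agent"):
    inject(headers)  # adds a 'traceparent' header

# Server side: extract the incoming context and continue the same trace
incoming = extract(headers)
with tracer.start_as_current_span("handle_a2a_request", context=incoming):
    pass  # spans created here share the client's trace ID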

Server side -- Use instrument_a2a_server to add tracing middleware and instrument the agent:

from autogen import ConversableAgent, LLMConfig
from autogen.a2a import A2aAgentServer
from autogen.opentelemetry import instrument_a2a_server, instrument_llm_wrapper

llm_config = LLMConfig({"model": "gpt-4o-mini"})

tech_agent = ConversableAgent(
    name="tech_agent",
    system_message="You solve technical problems.",
    llm_config=llm_config,
)

server = A2aAgentServer(tech_agent, url="http://localhost:18123/")
instrument_llm_wrapper(tracer_provider=tracer_provider)
instrument_a2a_server(server, tracer_provider=tracer_provider)

app = server.build()
# Run with: uvicorn server:app --port 18123

Client side -- instrument_agent (and instrument_pattern) automatically detects A2aRemoteAgent instances and injects traceparent headers into outgoing HTTP requests. No extra setup needed:

from autogen import ConversableAgent, LLMConfig
from autogen.a2a import A2aRemoteAgent
from autogen.opentelemetry import instrument_agent, instrument_llm_wrapper

user_proxy = ConversableAgent(
    name="user_proxy",
    human_input_mode="NEVER",
    max_consecutive_auto_reply=3,
)

tech_agent = A2aRemoteAgent(
    "http://localhost:18123/",
    name="tech_agent",
)

instrument_llm_wrapper(tracer_provider=tracer_provider)
instrument_agent(user_proxy, tracer_provider=tracer_provider)
instrument_agent(tech_agent, tracer_provider=tracer_provider)

Both sides' spans appear in the same trace, giving you a unified view across services.

Local Observability Stack#

For local development and testing, the documentation includes a ready-to-use Docker Compose configuration that runs an OpenTelemetry Collector, Grafana Tempo (trace storage), and Grafana (visualization). See the Local OpenTelemetry Setup page for the full configuration files and step-by-step instructions.

Once the stack is running, point your AG2 exporter at the collector:

exporter = OTLPSpanExporter(endpoint="http://localhost:14317")

Then open Grafana at http://localhost:3333, select the Tempo data source in the Explore view, and search by service name or trace ID to see your traces as a flame graph.

Connecting to Cloud Backends#

Because AG2 uses standard OpenTelemetry, you can export traces to any compatible backend. Typically you just change the exporter endpoint and add authentication headers:

Jaeger:

from opentelemetry.exporter.otlp.proto.grpc.trace_exporter import OTLPSpanExporter

exporter = OTLPSpanExporter(endpoint="http://localhost:4317")

Datadog, Honeycomb, Grafana Cloud, and others:

exporter = OTLPSpanExporter(
    endpoint="https://otel.vendor.com:4317",
    headers={"api-key": "YOUR_API_KEY"},
)

Consult your vendor's documentation for the exact endpoint and authentication details.

Getting Started#

  1. Install: pip install "ag2[tracing]"
  2. Configure: Create a TracerProvider with your chosen exporter
  3. Instrument: Call instrument_llm_wrapper() and instrument_agent() (or instrument_pattern() for group chats)
  4. Run: Execute your agent workflow as usual
  5. Observe: View traces in your backend (Grafana, Jaeger, Datadog, or the console)


Conclusion#

Observability shouldn't be an afterthought for multi-agent systems. With AG2's built-in OpenTelemetry tracing, you get structured, hierarchical traces of every conversation, agent turn, LLM call, tool execution, and speaker selection -- across local and distributed deployments.

Install ag2[tracing], add a few instrumentation calls, and start seeing exactly what your agents are doing. We'd love to hear how you use it -- share your feedback on our GitHub or Discord.