AG2 OpenTelemetry Tracing: Full Observability for Multi-Agent Systems

Multi-agent systems are powerful -- but when something goes wrong, figuring out where and why is painful. Which agent made the bad decision? Was the LLM call slow, or did the tool fail? How many tokens did that group chat actually use?

AG2 now has built-in OpenTelemetry tracing that gives you full visibility into your multi-agent workflows. Every conversation, agent turn, LLM call, tool execution, and speaker selection is captured as a structured span -- connected by a shared trace ID and exportable to any OpenTelemetry-compatible backend.

Key highlights:

  • Four simple API functions to instrument agents, LLM calls, group chats, and A2A servers
  • Hierarchical traces that mirror how agents process conversations
  • Distributed tracing across services using W3C Trace Context propagation
  • Works with any backend -- Jaeger, Grafana Tempo, Datadog, Honeycomb, and more
  • Follows OpenTelemetry GenAI Semantic Conventions for standard interoperability
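To make the W3C Trace Context bullet concrete, here is a minimal standard-library sketch of the `traceparent` header that carries a trace across service boundaries. It is not AG2's implementation. The helper names `make_traceparent` and `parse_traceparent` are hypothetical, and in practice OpenTelemetry's propagators handle this for you.

```python
import re
import secrets

# W3C Trace Context, version "00":
#   00-<32 hex trace ID>-<16 hex parent span ID>-<2 hex flags>
TRACEPARENT_RE = re.compile(r"^00-([0-9a-f]{32})-([0-9a-f]{16})-([0-9a-f]{2})$")


def make_traceparent(trace_id: str, span_id: str, sampled: bool = True) -> str:
    """Build the header value a client attaches to an outgoing request (hypothetical helper)."""
    flags = "01" if sampled else "00"
    return f"00-{trace_id}-{span_id}-{flags}"


def parse_traceparent(header: str) -> dict:
    """Extract the trace context on the receiving service; raise on malformed input."""
    match = TRACEPARENT_RE.match(header.strip())
    if match is None:
        raise ValueError(f"malformed traceparent: {header!r}")
    trace_id, parent_id, flags = match.groups()
    # All-zero IDs are invalid per the specification.
    if set(trace_id) == {"0"} or set(parent_id) == {"0"}:
        raise ValueError("trace ID and parent ID must not be all zeros")
    return {
        "trace_id": trace_id,
        "parent_id": parent_id,
        "sampled": bool(int(flags, 16) & 1),
    }


# Client side: start a trace and send the header with the request.
trace_id = secrets.token_hex(16)        # 32 hex characters
client_span_id = secrets.token_hex(8)   # 16 hex characters
header = make_traceparent(trace_id, client_span_id)

# Server side: continue the same trace, with the client span as parent.
ctx = parse_traceparent(header)
print(ctx["trace_id"] == trace_id, ctx["parent_id"] == client_span_id)
```

Because the remote service reuses the incoming trace ID and records the caller's span ID as its parent, spans from both processes end up in one tree.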

What is OpenTelemetry Tracing?

OpenTelemetry is the industry-standard framework for observability in distributed systems. At its core, tracing captures the path of a request through your system as a tree of spans -- each span representing a unit of work (an LLM call, a tool execution, an agent turn) with timing, attributes, and parent-child relationships.

A few key concepts:

  • Trace -- A complete end-to-end record of a request, represented as a tree of spans sharing a single trace ID.
  • Span -- A single unit of work within a trace (e.g. an LLM call or a tool execution). Each span has a name, start/end timestamps, a parent span (forming the tree), and key-value attributes that carry metadata like model name, token counts, or tool arguments.
  • Span Kind -- An OpenTelemetry classification of how a span relates to the outside world. AG2 uses CLIENT for LLM spans (outgoing calls to an external API) and INTERNAL (the default) for all other spans like agent turns, tool executions, and conversations.
  • Span Type (ag2.span.type) -- AG2's own attribute that classifies what each span represents: conversation, agent, llm, tool, code_execution, human_input, or speaker_selection. This lets you filter and group spans by their role in the agent workflow.
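The concepts above can be sketched with a toy data model. This is deliberately not the OpenTelemetry SDK or AG2's code: the `Span` class, the span names, and the `render` helper are all hypothetical, and only the `ag2.span.type` values and the CLIENT/INTERNAL kinds come from the description above.

```python
import itertools
from dataclasses import dataclass, field
from typing import Optional

_span_ids = itertools.count(1)  # toy ID generator; real span IDs are random 8-byte values


@dataclass
class Span:
    """Toy stand-in for a span: a name, a shared trace ID, a parent, a kind, and attributes."""
    name: str
    trace_id: str
    parent: Optional["Span"] = None
    kind: str = "INTERNAL"  # the default span kind
    attributes: dict = field(default_factory=dict)
    span_id: int = field(default_factory=lambda: next(_span_ids))

    def child(self, name: str, kind: str = "INTERNAL", attributes: Optional[dict] = None) -> "Span":
        # Children inherit the trace ID, which is what ties the tree together.
        return Span(name, self.trace_id, parent=self, kind=kind, attributes=attributes or {})

    def depth(self) -> int:
        return 0 if self.parent is None else self.parent.depth() + 1


def render(spans: list) -> str:
    """Indent each span by its depth in the tree."""
    return "\n".join(
        f"{'  ' * s.depth()}{s.name} [{s.kind}] type={s.attributes.get('ag2.span.type')}"
        for s in spans
    )


# One conversation containing an agent turn, which makes an LLM call and runs a tool.
root = Span("conversation", trace_id="trace-1", attributes={"ag2.span.type": "conversation"})
agent = root.child("agent turn", attributes={"ag2.span.type": "agent"})
llm = agent.child("llm call", kind="CLIENT", attributes={"ag2.span.type": "llm"})
tool = agent.child("tool execution", attributes={"ag2.span.type": "tool"})

print(render([root, agent, llm, tool]))
```

The LLM span is the only one marked CLIENT because it is the only one that leaves the process. Everything else stays INTERNAL.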

For multi-agent systems, tracing answers questions like:

  • What was the full sequence of agent interactions in this conversation?
  • How long did each LLM call take, and how many tokens did it consume?
  • Which tool was called, with what arguments, and what did it return?
  • In a group chat, why was a particular agent selected to speak?
  • Across multiple services, how did a request flow from client to remote agent?
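Once spans are exported, questions like these reduce to filtering and aggregating on span attributes. The sketch below assumes spans have been flattened into plain dictionaries. The sample data and helper names are invented for illustration, and the token attribute keys follow the GenAI semantic conventions (`gen_ai.usage.input_tokens`, `gen_ai.usage.output_tokens`), so check what your backend actually records.

```python
# Hypothetical exported spans, flattened to dictionaries for illustration only.
spans = [
    {"name": "conversation", "duration_ms": 5200,
     "attributes": {"ag2.span.type": "conversation"}},
    {"name": "llm planner", "duration_ms": 1800,
     "attributes": {"ag2.span.type": "llm",
                    "gen_ai.usage.input_tokens": 420,
                    "gen_ai.usage.output_tokens": 96}},
    {"name": "tool search", "duration_ms": 350,
     "attributes": {"ag2.span.type": "tool"}},
    {"name": "llm writer", "duration_ms": 2900,
     "attributes": {"ag2.span.type": "llm",
                    "gen_ai.usage.input_tokens": 610,
                    "gen_ai.usage.output_tokens": 240}},
]


def by_type(spans: list, span_type: str) -> list:
    """Select spans whose ag2.span.type matches."""
    return [s for s in spans if s["attributes"].get("ag2.span.type") == span_type]


def total_tokens(spans: list) -> int:
    """Sum input and output tokens across all LLM spans."""
    return sum(
        s["attributes"].get("gen_ai.usage.input_tokens", 0)
        + s["attributes"].get("gen_ai.usage.output_tokens", 0)
        for s in by_type(spans, "llm")
    )


def slowest(spans: list, span_type: str) -> dict:
    """Return the longest-running span of the given type."""
    return max(by_type(spans, span_type), key=lambda s: s["duration_ms"])


print("tokens used:", total_tokens(spans))
print("slowest LLM call:", slowest(spans, "llm")["name"])
```

Most tracing backends expose the same kind of query through their own UI or query language, so code like this is mainly useful for ad hoc analysis or tests.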

AG2's tracing follows the OpenTelemetry GenAI Semantic Conventions, so your agent traces use standard span and attribute names that OpenTelemetry-compatible tools can interpret.