# AG2 Beta - Full Documentation

> AG2 Beta (`autogen.beta`) is an async, protocol-driven Python framework for building AI agents - covering agents, tools, multi-agent networks, structured output, memory, and evaluation. This file indexes the Beta documentation for LLMs and coding assistants.

Build with `autogen.beta` only. The classic `autogen` API (`ConversableAgent`, `initiate_chat`, `GroupChat`) is retired at v1.0 - do not use it. For ready-made setup, install the AG2 Skills with `npx skills add ag2ai/ag2-skills`.

---

# AG2 Beta

Source: https://docs.ag2.ai/latest/docs/beta/motivation/

## Why did we create AG2 Beta?

The original **AutoGen** project released its first public preview in September 2023, and **AG2** later diverged from that codebase in November 2024 to continue building on its core ideas.

**AutoGen** was one of the earliest frameworks for building AI agents and orchestrating agent-to-agent collaboration. That early vision proved valuable: it enabled real-world systems, informed the design of many tools, and helped shape the agent ecosystem.

Since then, the agent landscape has changed significantly. Over time, the community has established better practices, common protocols, and new interoperability standards. Capabilities that were once experimental are now becoming part of the expected foundation for agent platforms.

Examples include:

- [Model Context Protocol (MCP)](https://www.anthropic.com/news/model-context-protocol), introduced in November 2024
- [Agent2Agent (A2A)](https://developers.googleblog.com/en/a2a-a-new-era-of-agent-interoperability/), introduced in April 2025
- [AG-UI](https://docs.ag-ui.com/introduction), introduced in May 2025

We have increasingly found that the original architecture inherited from **AutoGen** makes it hard to adopt new ideas. Shipping modern capabilities inside the original design often requires introducing complexity, unnecessary migration effort, or compatibility compromises.

Not every part of the ecosystem is standardized yet, but the direction is clear. AI agents are no longer an experiment; they are standard application infrastructure.

**AG2 Beta** is our way to move forward with a future-focused foundation while applying the lessons we learned from building and operating hundreds of agent systems based on **AG2**.

## What is AG2 Beta?

**AG2 Beta** is a new development track inside **AG2** where we build capabilities that would be difficult or impractical to introduce on the original framework architecture.

We expect it to become the primary foundation for future AG2 agent development and production-ready multi-agent systems, and it will become v1.0 of AG2.

## Why use AG2 Beta?

**AG2 Beta** is built around a small, predictable core and a set of opt-in primitives you compose to fit your application. Here is what you get out of the box.

### 1. A clean, async-first agent API

Two methods cover the conversational surface - `agent.ask(...)` to start a turn and `reply.ask(...)` to continue one. The agent loop, tool execution, and LLM calls are async throughout, with streaming enabled by default on supported providers.

### 2. A composable harness for capable, long-running agents

AG2's harness layers powerful primitives onto the base Agent - **assembly policies** for context shaping, a **knowledge store** for persistent memory, **sub-task delegation** with isolated streams, and **middleware** for retries, logging, token limits, and history management. Build up the agent you need and let the harness do the heavy lifting.

An Agent can fan out work in parallel, either to a team of specialist agents (exposed as tools) or to sub-tasks, which gives each agent natural orchestration built in.

### 3. Production and Scalability

**Human-in-the-loop** hooks, **structured output** (static, callable, prompted, and transformable), **OpenTelemetry** tracing, **persistent backends** for history and streams (e.g. Redis), and a **testing utility** that mocks LLMs and tool calls without hitting the network - the primitives you need to take an Agent from prototype to production.

The runtime is async end-to-end, so a single process can drive many concurrent agents, tool calls, and provider streams without blocking, and sub-tasks fan out in parallel via `asyncio.gather`. State is externalised behind protocols - `History`, `Storage`, and `Stream` can be backed by Redis, a database, or anything you build - so agents stay effectively stateless and horizontal scaling is straightforward. Cross-cutting concerns like retries, rate limits, token budgets, and history compaction are middleware you compose onto an Agent.

### 4. UI and external integration

Every event in the agent loop - model requests and responses, tool calls and results, human-input requests, observer alerts - flows through an event stream. Because streams can be filtered, they can power UIs, logging, metrics, or approvals without touching the agent itself.

The stream is bidirectional: **AG-UI** renders model output and tool calls in real time while user responses come back as `HumanMessage` events, and persistent backends like Redis let separate processes - a web frontend and a background worker - share the same live conversation.

### 5. Tools, toolkits, and built-in tools

Define tools with a `@tool` decorator on plain functions. Use **type hints**, **dependency injection** (`Context`, `Inject`, `Variable`), and **toolkits** to organize related capabilities. Wire in **built-in tools** (web search, code execution, shell, memory) or expose any agent as a tool with `Agent.as_tool(...)`.

### 6. One configuration model across providers

A single, type-safe interface spans **OpenAI**, **OpenAI Responses**, **Anthropic**, **Gemini**, **Vertex AI**, **Ollama**, and **DashScope**. Switching providers is a config change, not a rewrite, and structured output, multimodality (images, audio, video), and built-in tools work consistently across them.

## Compatibility with AG2

We value all the users and contributors who have made AG2 what it is today and want to bring you along on this journey.

Where it is practical and beneficial, we aim to preserve compatibility with existing AG2 workflows.

From day one, **AG2 Beta** agents can participate in established **AG2** multi-agent interaction patterns. This makes it possible to adopt **Beta** agents gradually within existing systems instead of rewriting everything at once.

## How do I try it out?

**AG2 Beta** currently lives in the AG2 repository alongside the current AG2 code.

Running `pip install ag2` installs the current **AG2 Beta** release under the `autogen.beta` module.

If you want the most up-to-date version, use the `main` branch. To see current work in progress, view the [GitHub repository](https://github.com/ag2ai/ag2/issues?q=state%3Aopen%20label%3Abeta) for PRs with the `beta` label.

!!! tip "Using an AI coding assistant?"
    If you build with Claude Code, Cursor, Copilot, or another AI coding assistant, see [Coding with AI Assistants](coding_with_ai.md) to set it up with AG2 Beta skills and project rules so it writes against the current `autogen.beta` API.

See the following pages for walkthroughs of the new **AG2 Beta** API.

## Current Focus Areas

**AG2 Beta** is actively focused on:

- improving the single-agent developer experience
- providing stronger context and memory management primitives
- simplifying integration with real applications, including Text UI, web, ambient, and background runtimes
- enabling new multi-agent coordination patterns that are not feasible in the current AG2 architecture
- supporting emerging standards and protocols across the AI agent ecosystem

We are building **AG2 Beta** to make agent development simpler, more modern, and easier to integrate into production-grade applications. We would love your feedback as the API evolves ([Discord](https://discord.com/invite/pAbnFJrkgZ)).

---

# Agent Communication

Source: https://docs.ag2.ai/latest/docs/beta/agents/

Agents are the central primitive in **AG2 Beta**. They maintain state, interact with models, execute tools, and handle user interactions through a clean, conversation-focused API.

## Core Communication Primitives

The API is built around two simple methods:

* `Agent.ask(...)` initiates a new turn and returns an `AgentReply` object.
* `AgentReply.ask(...)` continues an existing conversation, preserving its context and history.

The final result of any turn is safely stored in `reply.response`; use `reply.body` for the text.

## Basic Communication Example

Here's how easily you can start and continue a conversation:

```python
from autogen.beta import Agent
from autogen.beta.config import OpenAIConfig

agent = Agent(
    "assistant",
    prompt="You are a helpful assistant.",
    config=OpenAIConfig("gpt-4o-mini"),
)

# Start a new conversation
reply = await agent.ask("Give me one sentence about AG2 beta.")
print(reply.body)

# Continue the exact same conversation context
next_turn = await reply.ask("Now make it shorter.")
print(next_turn.body)

...
```
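
The snippet above uses top-level `await`, which works in a notebook or an asyncio-aware REPL (`python -m asyncio`). In a plain script the same calls have to run inside a coroutine driven by `asyncio.run`. The sketch below shows that shape with a hypothetical `StubAgent` / `StubReply` pair, stand-ins invented for this illustration (not part of `autogen.beta`) so that it runs without a model or API key:

```python
import asyncio
from dataclasses import dataclass, field


@dataclass
class StubReply:
    """Hypothetical stand-in for AgentReply: the text plus the history so far."""

    body: str
    history: list[str] = field(default_factory=list)

    async def ask(self, message: str) -> "StubReply":
        # Continue the same conversation: earlier turns are carried forward.
        history = [*self.history, message]
        return StubReply(body=f"turn {len(history)}: {message}", history=history)


class StubAgent:
    """Hypothetical stand-in for Agent: every ask() starts a fresh conversation."""

    async def ask(self, message: str) -> StubReply:
        return StubReply(body=f"turn 1: {message}", history=[message])


async def main() -> list[str]:
    agent = StubAgent()  # with the real API this would be Agent(...)
    reply = await agent.ask("Give me one sentence about AG2 beta.")
    next_turn = await reply.ask("Now make it shorter.")
    return [reply.body, next_turn.body]


# A script has no running event loop, so asyncio.run drives the coroutine.
print(asyncio.run(main()))
```

With the real `Agent`, only the construction line inside `main()` should need to change; the `ask` / `reply.ask` calls keep the same shape.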

## Empowering Agents with Tools

Agents can use Python functions as tools. When you pass a list of `@tool`-decorated functions to an agent, it manages the entire execution lifecycle automatically: from the model's tool request, through execution, to returning the result.

```python
from autogen.beta import Agent, tool
from autogen.beta.config import OpenAIConfig

@tool
async def echo(text: str) -> str:
    """Useful for repeating exactly what was given."""
    return f"echo: {text}"

agent = Agent(
    "assistant",
    prompt="Use tools when helpful.",
    config=OpenAIConfig("gpt-4o-mini"),
    tools=[echo],
)

reply = await agent.ask("Call the echo tool with 'hello'.")
print(reply.body)
```

## Adding Human-in-the-Loop (HITL)

Sometimes an agent needs human guidance. You can configure an agent to handle `HumanInputRequest` events. This is especially effective inside tools where you can get confirmation before taking a sensitive action.

```python
from autogen.beta import Agent, Context, tool
from autogen.beta.config import OpenAIConfig
from autogen.beta.events import HumanInputRequest, HumanMessage

@tool
async def ask_human(context: Context) -> str:
    # Pauses agent execution to await human input
    answer = await context.input("Please provide confirmation:")
    return f"Human said: {answer}"

# Define how your application handles the input request
def hitl_hook(event: HumanInputRequest) -> HumanMessage:
    # Here you could block and wait for UI/CLI input.
    # We return a static response for demonstration.
    return HumanMessage(content="confirmed")

agent = Agent(
    "assistant",
    prompt="Use ask_human when needed.",
    config=OpenAIConfig("gpt-4o-mini"),
    tools=[ask_human],
    hitl_hook=hitl_hook,
)

reply = await agent.ask("Request confirmation through the tool.")
print(reply.body)
```

## Observing Agent Actions

Need to know exactly what the agent is doing? Pass a `MemoryStream` when calling `ask()`. You can attach event subscribers to log actions, save history to a database, or update a user interface in real time.

```python
from autogen.beta import Agent, MemoryStream
from autogen.beta.events import BaseEvent, ToolCallEvent
from autogen.beta.config import OpenAIConfig

stream = MemoryStream()

# Listen to everything
@stream.subscribe()
async def on_any_event(event: BaseEvent) -> None:
    print(f"Event occurred: {event}")

# Only listen to specific events
@stream.where(ToolCallEvent).subscribe()
async def on_tool_call(event: ToolCallEvent) -> None:
    print("Agent requested tool:", event.name)

agent = Agent(
    "assistant",
    prompt="You are a helpful assistant.",
    config=OpenAIConfig("gpt-4o-mini"),
)

# Stream captures all events during the ask
reply = await agent.ask(
    "Give me one sentence about AG2 beta.",
    stream=stream
)
```
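
The `where(...).subscribe()` pattern above can be pictured as a list of handlers, each stored together with the event type it cares about. The following is a minimal standard-library sketch of that idea; `MiniStream`, `Event`, `ToolCall`, and `ModelText` are hypothetical names made up for this illustration and say nothing about how `MemoryStream` is actually implemented:

```python
import asyncio
from dataclasses import dataclass


@dataclass
class Event:
    """Hypothetical base event."""


@dataclass
class ToolCall(Event):
    name: str


@dataclass
class ModelText(Event):
    text: str


class MiniStream:
    """Hypothetical stream: handlers are stored with the type they filter on."""

    def __init__(self, subscribers=None, event_type=Event):
        self._subscribers = [] if subscribers is None else subscribers
        self._event_type = event_type

    def where(self, event_type):
        # A filtered view shares the subscriber list but narrows the type.
        return MiniStream(self._subscribers, event_type)

    def subscribe(self):
        def register(handler):
            self._subscribers.append((self._event_type, handler))
            return handler

        return register

    async def send(self, event):
        # Deliver the event to every handler whose filter matches its type.
        for event_type, handler in self._subscribers:
            if isinstance(event, event_type):
                await handler(event)


async def demo():
    stream = MiniStream()
    everything, tool_calls = [], []

    @stream.subscribe()
    async def on_any_event(event):
        everything.append(type(event).__name__)

    @stream.where(ToolCall).subscribe()
    async def on_tool_call(event):
        tool_calls.append(event.name)

    await stream.send(ModelText("thinking"))
    await stream.send(ToolCall("echo"))
    return everything, tool_calls


print(asyncio.run(demo()))
```

The unfiltered subscriber sees both events, while the filtered one only sees the tool call. Keeping the filter next to the handler is what lets a UI, a logger, and an approval flow share one stream without interfering with each other.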

!!! tip "Building with an AI coding assistant?"
    See [Coding with AI Assistants](coding_with_ai.md) to set up Claude Code, Cursor, Copilot, or another assistant with AG2 Beta skills and project rules so it writes against the current `autogen.beta` API.

---

# The Agent Harness

Source: https://docs.ag2.ai/latest/docs/beta/agent_harness/

A bare `Agent` is just a model loop. The **harness** is the set of opt-in primitives you compose onto it to give it richer capabilities - context assembly, persistent knowledge, sub-task spawning, and the supporting middleware they wire in.

This page is the configuration reference for those primitives. For the conversational entry point (`agent.ask()`, tools, HITL, observing events), see [Agent Communication](agents.md).

## Constructor

```python
Agent(
    name: str,
    prompt: str | Callable | Iterable = (),
    *,
    config: ModelConfig | None = None,
    tools: Iterable = (),
    middleware: Iterable = (),
    observers: Iterable = (),
    dependencies: dict | None = None,
    variables: dict | None = None,
    response_schema: ResponseProto | type | None = None,
    hitl_hook: HumanHook | None = None,
    plugins: Iterable[Plugin] = (),
    assembly: Iterable[AssemblyPolicy] = (),
    knowledge: KnowledgeConfig | None = None,
    tasks: TaskConfig | Literal[False] | None = None,
)
```

The loop-related parameters (`config`, `tools`, `middleware`, `observers`, `prompt`, ...) are covered in [Agent Communication](agents.md) and the parameter-specific guides. The harness hooks are `assembly=`, `knowledge=`, and `tasks=`, each documented below.

## `assembly=` - context policies

A list of [`AssemblyPolicy`](advanced/assembly.md) instances. When non-empty, the Agent wires an internal `AssemblerMiddleware` at the outermost position of the middleware chain so your policies transform `(prompts, events)` before every LLM call.

```python
from autogen.beta import Agent
from autogen.beta.policies import (
    AlertPolicy,
    SlidingWindowPolicy,
    WorkingMemoryPolicy,
)

agent = Agent(
    "assistant",
    config=config,
    assembly=[
        WorkingMemoryPolicy(),                # inject /memory/working.md
        AlertPolicy(),                         # deliver ObserverAlerts, halt on FATAL
        SlidingWindowPolicy(max_events=50),   # cap history footprint
    ],
)
```

Order matters - see the [ordering rule in the assembly doc](advanced/assembly.md#ordering-matters). `AssemblerMiddleware.validate_order()` will flag known problematic compositions.
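
To see why order can matter, it helps to treat each policy as a function from `(prompts, events)` to `(prompts, events)`, applied in list order. The sketch below is a toy model under that assumption; `append_event`, `sliding_window`, and `assemble` are hypothetical helpers written for this illustration, and the actual ordering rule is the one described in the assembly doc:

```python
from typing import Callable

Prompts = list[str]
Events = list[str]
Policy = Callable[[Prompts, Events], tuple[Prompts, Events]]


def append_event(event: str) -> Policy:
    """Hypothetical policy: add one event (for example, a delivered alert)."""

    def apply(prompts: Prompts, events: Events) -> tuple[Prompts, Events]:
        return prompts, [*events, event]

    return apply


def sliding_window(max_events: int) -> Policy:
    """Hypothetical policy: keep only the most recent max_events events."""

    def apply(prompts: Prompts, events: Events) -> tuple[Prompts, Events]:
        return prompts, (events[-max_events:] if max_events > 0 else [])

    return apply


def assemble(policies: list[Policy], prompts: Prompts, events: Events) -> tuple[Prompts, Events]:
    # Apply each policy in list order; each one sees the previous one's output.
    for policy in policies:
        prompts, events = policy(prompts, events)
    return prompts, events


history = ["e1", "e2", "e3"]
_, window_last = assemble([append_event("alert"), sliding_window(2)], [], history)
_, window_first = assemble([sliding_window(2), append_event("alert")], [], history)
print(window_last)
print(window_first)
```

With the window last, the cap of two events holds. With the window first, the alert is appended after trimming and the result has three events, so the same two policies produce different context depending on their position.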

## `knowledge=` - KnowledgeConfig

Groups everything that involves the [`KnowledgeStore`](advanced/knowledge_store.md): the store itself, optional bootstrap, and optional compaction + aggregation strategies.

```python
from dataclasses import dataclass

@dataclass
class KnowledgeConfig:
    store: KnowledgeStore
    expose_tool: bool = True
    write_event_log: bool = True
    compact: CompactStrategy | None = None
    compact_trigger: CompactTrigger | None = None
    aggregate: AggregateStrategy | None = None
    aggregate_trigger: AggregateTrigger | None = None
    bootstrap: StoreBootstrap | None = None
```

| Field | What it does |
|---|---|
| `store` | Registered in `context.dependencies[KnowledgeStore]` so policies like `WorkingMemoryPolicy` / `EpisodicMemoryPolicy` can read it. |
| `expose_tool` | When `True` (default), the agent gets an auto-injected `knowledge` action-group tool that lets the LLM call `read` / `write` / `list` / `delete` on the store. Set to `False` when the store should be policy-only - the model never sees the tool, and the bootstrap SKILL.md text drops its "use the `knowledge` tool" sentence. |
| `write_event_log` | When `True` (default), the agent persists its stream history to `/log/{stream_id}.jsonl` at the end of each `ask()`. Set to `False` to keep the store free of stream logs (e.g. when the store is purely user-facing memory). |
| `compact` / `compact_trigger` | Wires a compaction middleware that fires [`compact()`](advanced/compaction.md) between turns when the trigger thresholds are exceeded. |
| `aggregate` / `aggregate_trigger` | Wires an aggregation middleware that fires [`aggregate()`](advanced/aggregation.md) on the configured cadence. Failures emit `AggregationFailed` on the stream - see [Aggregation > Wiring onto an Agent](advanced/aggregation.md#wiring-onto-an-agent) for the full lifecycle event triple. |
| `bootstrap` | Runs once on first use to seed the store. `None` falls back to `DefaultBootstrap(mention_tool=expose_tool)`, so the generated SKILL.md text matches whether the LLM can actually call the `knowledge` tool. |

```python
from autogen.beta import Agent, KnowledgeConfig
from autogen.beta.aggregate import AggregateTrigger, ConversationSummaryAggregate
from autogen.beta.compact import CompactTrigger, TailWindowCompact
from autogen.beta.knowledge import DiskKnowledgeStore
from autogen.beta.policies import WorkingMemoryPolicy
from pathlib import Path

store = DiskKnowledgeStore(Path("./knowledge"))

agent = Agent(
    "assistant",
    config=main_config,
    knowledge=KnowledgeConfig(
        store=store,
        compact=TailWindowCompact(target=100),
        compact_trigger=CompactTrigger(max_events=200),
        aggregate=ConversationSummaryAggregate(config=summarizer_config),
        aggregate_trigger=AggregateTrigger(every_n_turns=10, on_end=True),
    ),
    assembly=[WorkingMemoryPolicy()],
)
```

The compaction and aggregation middleware are opt-in per field: passing `compact=` without `compact_trigger=` still works (a default `CompactTrigger()` with all thresholds disabled is used). Omit a strategy entirely and the corresponding middleware is not wired.
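
One way to read the `compact` / `compact_trigger` pairing in the example above is "once the history exceeds the trigger threshold, keep only a tail of the target size". The sketch below models that reading with a hypothetical `maybe_compact` helper; the parameter names mirror `max_events` and `target` from the example, but the function itself is an assumption made for illustration, not the library's implementation:

```python
from typing import Optional


def maybe_compact(events: list[str], *, max_events: Optional[int], target: int) -> list[str]:
    """Keep the last `target` events once the history exceeds `max_events`.

    max_events=None models a trigger whose threshold is disabled.
    """
    if max_events is None or len(events) <= max_events:
        return events
    return events[-target:] if target > 0 else []


history = [f"event-{i}" for i in range(250)]

compacted = maybe_compact(history, max_events=200, target=100)
untouched = maybe_compact(history, max_events=None, target=100)

print(len(compacted), compacted[0], compacted[-1])
print(len(untouched))
```

Trimming to a target well below the threshold (100 versus 200 here) leaves headroom, so under this model compaction would not fire again on the very next turn.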

## `tasks=` - TaskConfig

Sub-task delegation is **off by default** - a bare Agent has no `run_subtask` / `run_subtasks` tools. Pass `tasks=TaskConfig(...)` to opt in, and the Agent will auto-inject the pair of sub-task tools that let the LLM spawn isolated child Agents to handle self-contained work. `TaskConfig` configures how those children are built.

```python
from dataclasses import dataclass

@dataclass
class TaskConfig:
    config: ModelConfig | None = None
    prompt: str = (
        "You are a task agent. Complete the assigned task thoroughly and "
        "concisely. Return only the result."
    )
    include_tools: Iterable[str] | None = None
    exclude_tools: Iterable[str] = ()
    extra_tools: Iterable[Callable | Tool] = ()
```

| Field | What it does |
|---|---|
| `config` | The `ModelConfig` used for sub-task Agents. Falls back to the parent Agent's `config`. |
| `prompt` | Default system prompt for sub-task Agents. |
| `include_tools` | Allowlist of parent-tool names to inherit. `None` means "inherit all". |
| `exclude_tools` | Blocklist of parent-tool names to drop. Applied after `include_tools`. |
| `extra_tools` | Additional tools given to sub-tasks that the parent does not have. |

By default a sub-task Agent inherits **all** of the parent's user-supplied tools. Sub-tasks are themselves constructed with `tasks=False` (the Agent default), so they have no `run_subtask` / `run_subtasks` tools - recursive delegation is structurally impossible and no depth limit is needed.

```python
from autogen.beta import Agent, TaskConfig

agent = Agent(
    "orchestrator",
    config=main_config,
    tools=[search, fetch_url, summarize],
    tasks=TaskConfig(
        config=worker_config,                 # cheaper model for sub-tasks
        prompt="You are a focused worker; one step only.",
        include_tools=["search", "fetch_url"],  # don't expose summarize to children
    ),
)
```
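
The interaction between `include_tools`, `exclude_tools`, and `extra_tools` can be sketched as a small pure function over tool names. `resolve_child_tools` below is a hypothetical helper written for this illustration. It follows the table above (allowlist first, then blocklist), and it additionally assumes that extra tools are appended after filtering, which the table does not state:

```python
from typing import Iterable, Optional


def resolve_child_tools(
    parent_tools: Iterable[str],
    include_tools: Optional[Iterable[str]] = None,
    exclude_tools: Iterable[str] = (),
    extra_tools: Iterable[str] = (),
) -> list[str]:
    """Hypothetical resolution of the tool names a sub-task would receive."""
    allowed = None if include_tools is None else set(include_tools)
    blocked = set(exclude_tools)
    # Allowlist first (None means "inherit all"), then the blocklist.
    inherited = [
        name
        for name in parent_tools
        if (allowed is None or name in allowed) and name not in blocked
    ]
    # Assumption: extra tools are added on top of whatever was inherited.
    return inherited + [name for name in extra_tools if name not in inherited]


parent = ["search", "fetch_url", "summarize"]

print(resolve_child_tools(parent))
print(resolve_child_tools(parent, include_tools=["search", "fetch_url"]))
print(
    resolve_child_tools(
        parent,
        include_tools=["search", "fetch_url"],
        exclude_tools=["fetch_url"],
        extra_tools=["calculator"],
    )
)
```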

### `tasks=False` - the default

`tasks=False` is the Agent default, so a bare Agent never spawns children. You only need to pass it explicitly to be self-documenting; otherwise just omit `tasks=` entirely.

```python
focused = Agent(
    "summarizer",
    prompt="Summarise the input. Do not delegate.",
    config=main_config,
    # tasks=False is the default - no run_subtask / run_subtasks tools.
)
```

## `run_subtask` / `run_subtasks` - auto-injected tools

When you opt in via `tasks=TaskConfig(...)`, the Agent exposes two tools to the LLM:

- `run_subtask(task: str)` - spawn one sub-task Agent. Useful when the LLM has a single self-contained piece of work to delegate.
- `run_subtasks(tasks: list[str], parallel: bool = True)` - spawn multiple sub-tasks in one tool call. Defaults to running them concurrently with `asyncio.gather`; pass `parallel=False` only when later tasks depend on earlier results.

The LLM is told (via the tool descriptions) that it can call `run_subtask` multiple times in parallel within a single response, and that `run_subtasks` is the deliberate fan-out form. Each child gets a fresh `MemoryStream` and the parent's tools (filtered by `TaskConfig`).
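
The difference between `parallel=True` and `parallel=False` comes down to `asyncio.gather` versus awaiting each task in turn. The following standard-library sketch shows both paths; `run_one` and `run_many` are hypothetical stand-ins for a child agent and the fan-out tool, not the library's own functions:

```python
import asyncio


async def run_one(task: str) -> str:
    """Hypothetical sub-task: stands in for a child agent handling one task."""
    await asyncio.sleep(0.01)  # simulate model / tool latency
    return f"done: {task}"


async def run_many(tasks: list[str], parallel: bool = True) -> list[str]:
    if parallel:
        # gather runs the coroutines concurrently and returns results in input order.
        return list(await asyncio.gather(*(run_one(task) for task in tasks)))
    # Sequential: each task finishes before the next one starts.
    return [await run_one(task) for task in tasks]


tasks = ["research pricing", "research competitors"]
print(asyncio.run(run_many(tasks)))
print(asyncio.run(run_many(tasks, parallel=False)))
```

Both paths return results in the same order as the input list; the parallel path simply overlaps the waiting time, which is why the sequential form is only worth choosing when one task needs another's result.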

For a more explicit, named delegate where the parent LLM sees a tool like `task_researcher` instead of generic `run_subtask`, use [`Agent.as_tool()`](#agent-as-tool). The two patterns can coexist: a coordinator can have both auto-injected sub-tasks and a named `task_researcher` tool.

See [Task Delegation](task_delegation.md) for the full sub-task delegation guide - context flow and custom streams for self-delegation via `as_tool()`.

## Agent.as_tool()

Expose any Agent as a `FunctionTool` so another Agent can invoke it like any other tool:

```python
child = Agent(
    "researcher",
    prompt="Answer the objective concisely.",
    config=main_config,
)

parent = Agent(
    "lead",
    config=main_config,
    tools=[child.as_tool(description="Delegate fact-finding to a researcher.")],
)

reply = await parent.ask("Find out where Melbourne is.")
```

`as_tool()` returns a `FunctionTool` named `task_{child.name}` that accepts an `objective` parameter and forwards it into the child's stream. See [Task Delegation](task_delegation.md) for sub-task streams, depth limiting, and stream factories.

## Turn lifecycle

Each `await agent.ask(...)` runs through the middleware chain in this order (outermost -> innermost):

```
1. AssemblerMiddleware              (if assembly=[...])
2. _HaltCheckMiddleware             (if assembly=[...] - watches for HaltEvent)
3. _CompactionMiddleware            (if knowledge.compact configured)
4. _AggregationMiddleware           (if knowledge.aggregate configured)
5. User-provided middleware         (retry, rate-limit, logging, ...)
6. LLM client                       (innermost)
```

The internal harness middleware (`AssemblerMiddleware`, `_HaltCheckMiddleware`, `_CompactionMiddleware`, `_AggregationMiddleware`) is assembled conditionally - you only pay for what you turn on.
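
"Outermost -> innermost" means the first middleware sees the request first and the response last. A generic sketch of that wrapping, using only the standard library, is shown below; `tracing` and `build_chain` are hypothetical helpers for this illustration and do not reflect the library's middleware interface:

```python
import asyncio
from typing import Awaitable, Callable

Handler = Callable[[str], Awaitable[str]]
Middleware = Callable[[Handler], Handler]


def tracing(name: str, log: list[str]) -> Middleware:
    """Hypothetical middleware that records when it is entered and exited."""

    def wrap(next_handler: Handler) -> Handler:
        async def handler(request: str) -> str:
            log.append(f"enter {name}")
            response = await next_handler(request)
            log.append(f"exit {name}")
            return response

        return handler

    return wrap


def build_chain(middleware: list[Middleware], innermost: Handler) -> Handler:
    # The first list item must end up outermost, so wrap from the end backwards.
    handler = innermost
    for wrap in reversed(middleware):
        handler = wrap(handler)
    return handler


async def demo() -> list[str]:
    log: list[str] = []

    async def llm_client(request: str) -> str:
        log.append("llm call")
        return f"response to {request!r}"

    chain = build_chain(
        [tracing("assembler", log), tracing("compaction", log), tracing("user", log)],
        llm_client,
    )
    await chain("hello")
    return log


print(asyncio.run(demo()))
```

In this model the assembler-like layer is entered first and exited last, which matches the idea that context shaping happens before anything else touches the request.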

Lifecycle events emitted during a turn include `ObserverStarted` / `ObserverCompleted`, `CompactionCompleted`, `AggregationCompleted`, and `HaltEvent`. Subscribe to any of them via an [Observer](advanced/observers.md) or a stream subscriber.

---

# Resuming a Turn

Source: https://docs.ag2.ai/latest/docs/beta/resume/

`agent.ask(...)` starts a turn from a new message on the agent's current stream. `agent.resume(...)` does the same job from a **recorded trajectory**: you hand it a list of past events, and it re-enters the agent loop driven by the **last** event in that list. The trigger can be *any* event (a fresh user message, a tool result, a human reply) so `resume` is the general way to rebuild a conversation from stored state and carry it forward.

!!! tip "Most multi-turn chats do **not** need `resume`"
    To continue a conversation within a running process, just keep using the reply: call `agent.ask(...)` once, then `reply.ask(...)` for each follow-up. That is the normal continuation path - `resume` is **not** a step you run after `ask`.

    ```python linenums="1"
    reply = await agent.ask("Plan a trip to Kyoto.")
    reply = await reply.ask("Make it five days.")   # continue - no resume needed
    ```

    Reach for `resume` only when you **can't** hold onto that reply: the process ended, another worker picks the turn up, or you need to drive the loop from a non-message event such as a tool result. See [How it relates to `ask` / `reply.ask`](#how-resume-works) below.

## How `resume` works

`resume` takes the full trajectory as positional `events`:

```python
await agent.resume(*events, ...)
```

The list is split in two:

- **All events except the last** seed the stream's history - the conversation state the model will see.
- **The last event** is the **trigger**: the event that drives the next LLM call. It can be any [event](advanced/stream.md) - typically a `ModelRequest` (a new user message) or a `ToolResultsEvent` (a tool result the model should react to).

```python
*history, trigger = events
# history  -> replaces the stream's history (the prefix the model sees)
# trigger  -> drives the next LLM call (any BaseEvent)
```

Everything after that is identical to `ask`: the agent calls the model, may issue tool calls, and returns an `AgentReply`. The `resume` signature mirrors `ask` exactly - `stream`, `dependencies`, `variables`, `prompt`, `config`, `tools`, `middleware`, `observers`, `response_schema`, and `hitl_hook` all behave the same way.

!!! note "How it relates to `ask` / `reply.ask`"
    [`reply.ask(...)`](agents.md) continues the **same live stream** the turn ran on - you keep the conversation going as long as you still hold the `AgentReply` object in the running process. That stream may itself be durable (a `RedisStream`, say); what `reply.ask` needs is the in-process handle, not where the history happens to be stored. `resume` is for when you no longer have that handle - a process restart, a worker on another machine, a turn rebuilt from a store - so you supply the events yourself and drive the next one.

## Continue a stored conversation

The most common use is multi-turn that outlives the process: persist a conversation, load it back later, and continue with a new user message as the trigger.

```python
from autogen.beta import Agent
from autogen.beta.config import OpenAIConfig
from autogen.beta.events import ModelRequest, TextInput

agent = Agent(
    "assistant",
    prompt="You are a helpful travel assistant.",
    config=OpenAIConfig("gpt-4o-mini"),
)

# A conversation you stored earlier and just loaded back from your database.
past_events = load_conversation("thread-42")

# Drive the next turn with a fresh user message as the trigger.
trigger = ModelRequest([TextInput("And what about getting there by train?")])
reply = await agent.resume(*past_events, trigger)
print(reply.body)
```

This is equivalent to `reply.ask("...")`, except you drive it from events you reload yourself rather than from a live `AgentReply` handle held in the running process.

## Resume from a tool result

A turn can also stop **mid-loop** - the model asked to run a tool whose result is produced elsewhere: a webhook, a queue worker, a long-running job, or a human approving a request. You record the tool call, run the work separately, then hand the result back as the trigger. The model reacts to the result without the tool being re-executed.

```python
from autogen.beta import Agent
from autogen.beta.config import OpenAIConfig
from autogen.beta.events import (
    ModelMessage,
    ModelRequest,
    ModelResponse,
    TextInput,
    ToolCallEvent,
    ToolCallsEvent,
    ToolResultEvent,
    ToolResultsEvent,
)

agent = Agent("support", prompt="Answer using the tool result.", config=OpenAIConfig("gpt-4o-mini"))

# The trajectory recorded earlier: the user asked, and the model responded
# by requesting a tool call (whose result is not yet known).
call = ToolCallEvent(name="lookup_order", arguments='{"id": "A-1001"}', id="call-1")
history = [
    ModelRequest([TextInput("Where is order A-1001?")]),
    ModelResponse(message=ModelMessage(""), tool_calls=ToolCallsEvent([call])),
]

# The tool result, produced out of band, becomes the trigger.
trigger = ToolResultsEvent([ToolResultEvent.from_call(call, "Shipped, arriving Tuesday.")])

# Re-enter the loop: the model sees the history + result and writes the answer.
reply = await agent.resume(*history, trigger)
print(reply.body)  # -> grounded answer using "Shipped, arriving Tuesday."
```

`ToolResultEvent.from_call(call, result)` pairs the result with the original call so the model can match it to its request. Because `resume` re-enters the **live** loop, the model is free to react by calling more tools - those continuation calls execute normally; only the trigger event is replayed.

## Capture, persist, resume

A trajectory is just a list of events, so you can store it anywhere and pass it back to `resume` later - even from another process. Pull the events from a live stream with `await stream.history.get_events()`, or build them yourself, as long as the prefix ends at the point you want to continue from.

```python
# --- First process: capture the trajectory and persist it ---
events = list(await reply.context.stream.history.get_events())
save_to_store(stream_id="thread-42", events=events)

# --- Later / another process: load it back and drive the next event ---
events = load_from_store(stream_id="thread-42")
reply = await agent.resume(*events, ModelRequest([TextInput("Carry on.")]))
print(reply.body)
```

!!! note "Reading back the event log"
    `resume` reads only the events you pass in - it does not load history from a store on its own. The [event log](advanced/knowledge_store.md) written by `KnowledgeConfig(write_event_log=True)` is one place these trajectories can come from: it persists each turn to `/log/{stream_id}.jsonl`, and `EventLogWriter(store).load(stream_id)` reads it back as typed events ready to pass straight to `resume`.

    ```python linenums="1"
    from autogen.beta.knowledge import EventLogWriter

    events = await EventLogWriter(store).load(stream_id)
    reply = await agent.resume(*events, ModelRequest([TextInput("Carry on.")]))
    ```

## The stream is replaced, not appended

`resume` **replaces** the target stream's history with the seeded prefix. If you omit `stream`, a fresh `MemoryStream` is created. If you pass an existing stream, any history it already held is discarded in favour of the trajectory you provide.

```python
from autogen.beta.stream import MemoryStream

stream = MemoryStream()
await stream.history.replace([ModelRequest([TextInput("stale prior turn")])])

# The stale turn is dropped; only the events you pass remain.
reply = await agent.resume(*history, trigger, stream=stream)
```

This makes the recorded trajectory the single source of truth for the resumed turn - the events you pass are exactly what the model sees.
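
The replace-then-trigger behaviour can be modelled with a tiny in-memory history. In the sketch below, `get_events` and `replace` echo the method names used on this page, but `HistoryLike`, `InMemoryHistory`, and `seed_for_resume` are hypothetical and use plain strings instead of real events. Rejecting an empty trajectory is also an assumption of this sketch:

```python
import asyncio
from typing import Iterable, Protocol


class HistoryLike(Protocol):
    """Hypothetical minimal history interface for this sketch."""

    async def get_events(self) -> list[str]: ...

    async def replace(self, events: Iterable[str]) -> None: ...


class InMemoryHistory:
    def __init__(self) -> None:
        self._events: list[str] = []

    async def get_events(self) -> list[str]:
        return list(self._events)

    async def replace(self, events: Iterable[str]) -> None:
        # Replace, not append: whatever was stored before is discarded.
        self._events = list(events)


async def seed_for_resume(history: HistoryLike, events: list[str]) -> str:
    """Split a trajectory into prefix + trigger and seed the prefix."""
    if not events:
        raise ValueError("at least one event is needed to act as the trigger")
    *prefix, trigger = events
    await history.replace(prefix)
    return trigger


async def demo() -> tuple[str, list[str]]:
    history = InMemoryHistory()
    await history.replace(["stale prior turn"])
    trigger = await seed_for_resume(
        history,
        ["user: where is order A-1001?", "model: call lookup_order", "tool: shipped"],
    )
    return trigger, await history.get_events()


print(asyncio.run(demo()))
```

After seeding, the stale turn is gone and the history holds exactly the prefix that was passed in, with the last event held back as the trigger.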

!!! warning "Durable-backed streams are overwritten in place"
    The replacement is applied to the stream's **storage**, not just an in-memory copy. For a persistent stream such as `RedisStream`, seeding the prefix overwrites the durable history stored under that stream id - the backing store clears the key and rewrites it. Resume into a fresh stream, or use a new stream id, when you need to keep the original conversation's stored history intact.

## Caveats - resuming isn't always possible

`resume` replays provider-native events back to the model, and some providers attach opaque, **required** metadata to those events. Gemini 3.x, for example, binds a *thought signature* to each function call and rejects a replayed call that arrives without it:

```
400 INVALID_ARGUMENT - Function call is missing a thought_signature in functionCall parts
```

OpenAI reasoning items and Anthropic thinking signatures carry similar provider-specific state. As long as you resume from the **original** events - the ones the agent emitted, written verbatim by the [event log](#capture-persist-resume) - this metadata travels with them and `resume` round-trips cleanly. The risk appears when a trajectory is rebuilt from a lossy form: a hand-rolled `{name, args, result}` record, or a trace store that keeps only the fields it understands. The required metadata is silently dropped, and the provider rejects the resumed turn.
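
The difference between a verbatim and a lossy record is easy to demonstrate with plain dictionaries. The event shape below is hypothetical, with `provider_metadata` standing in for whatever opaque fields a provider attaches, and real event classes will look different:

```python
import json

# Hypothetical recorded tool-call event as a plain dict. The nested
# "thought_signature" stands in for opaque, provider-required metadata.
recorded = {
    "type": "tool_call",
    "name": "lookup_order",
    "arguments": '{"id": "A-1001"}',
    "provider_metadata": {"thought_signature": "opaque-value"},
}


def save_verbatim(event: dict) -> str:
    # Serialize every field, including ones this code does not understand.
    return json.dumps(event)


def save_normalized(event: dict) -> str:
    # Lossy: keeps only the fields a simplified trace schema knows about.
    return json.dumps({"name": event["name"], "arguments": event["arguments"]})


verbatim = json.loads(save_verbatim(recorded))
normalized = json.loads(save_normalized(recorded))

print(verbatim == recorded)
print("provider_metadata" in normalized)
```

The verbatim copy round-trips unchanged, while the normalized copy has silently lost the metadata a provider may require on replay.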

!!! warning
    Resuming is not guaranteed for every provider and every trajectory. Preserve the events exactly as the agent emitted them; do not assume a trajectory reconstructed from a reduced or normalized form can be resumed.

## Related

- [Agent Communication](agents.md) - `ask` / `reply.ask`, the in-memory entry points.
- [Human in the Loop](context/human_in_the_loop.md) - pausing for human input, a common reason a turn is continued out of band.
- [Stream & Events](advanced/stream.md) - the event types that make up a trajectory and how to subscribe to them.

---

# Structured Output

Source: https://docs.ag2.ai/latest/docs/beta/structured_output/

# Structured Output

Structured output constrains the model's final message so you can parse it into a **typed Python value** - a number, a `dataclass`, a Pydantic model, or the result of your own validator - instead of treating the reply as an opaque string.

## What you get on each turn

Every turn returns an [`AgentReply`](agents.md). Two surfaces matter for structured output:

| Surface | What it is |
|--------|------------|
| `reply.body` | Raw text from the model for that turn (a `str` or `None`). |
| `await reply.content()` | Parsed value according to the **response schema** in effect for that turn. |

If the model's output cannot be parsed or fails validation, `content()` raises an error from the underlying parser (for example Pydantic's validation errors). You can pass `retries` to automatically [re-ask the model](#validation-retries) on failure.

With the default **OpenAI** client, when the schema exposes a JSON Schema to the API, the client sends a structured `response_format` so the model is guided to emit JSON matching that schema. [`PromptedSchema`](#promptedschema-models-without-native-structured-output) is the escape hatch when the provider does not support that mechanism: the schema is injected into the system prompt instead, and `content()` still runs the same way afterward.

## When to use which tool

- **Pass a plain type** (`int`, `YourModel`, ...) when the default schema name and description are enough.
- Use **`ResponseSchema`** when you want a clear **`name`** and **`description`** in the API payload so the model knows the role of the structured payload.
- Use **`@response_schema`** when you need **custom parsing**, normalization, or extra steps after JSON is read.
- Use **`PromptedSchema`** when your **model or endpoint does not support** native structured output.

## Quick start

```python
from autogen.beta import Agent
from autogen.beta.config import OpenAIConfig

agent = Agent(
    "assistant",
    prompt="You are a helpful assistant. Answer concisely.",
    config=OpenAIConfig("gpt-4o-mini"),
    response_schema=int,
)

reply = await agent.ask("How many bits are in a byte?")
print(reply.body)          # raw model text; with default embedding, typically '{"data": 8}'
result = await reply.content()
print(result)              # 8 - Python int
```

---

## Real-world examples

The following patterns mirror how structured output is used in applications: triage, extraction, and safe normalization.

### Classify a support ticket (Pydantic)

Route incoming text into fields your helpdesk or CRM already understands:

```python
from typing import Annotated
from pydantic import BaseModel, Field

from autogen.beta import Agent
from autogen.beta.config import OpenAIConfig

class TicketTriage(BaseModel):
    """Structured triage for a single support message."""

    category: Annotated[str, Field(description="e.g. billing, bug, account_access")]
    urgency: Annotated[str, Field(description="low, medium, or high")]
    summary_one_line: Annotated[str, Field(description="Max 120 characters", max_length=120)]

agent = Agent(
    "triage",
    prompt="You triage customer support messages. Be conservative with urgency.",
    config=OpenAIConfig("gpt-4o-mini"),
    response_schema=TicketTriage,
)

body = (
    "I was charged twice for Pro last week and I still can't export my reports. "
    "This is blocking our quarter close."
)
reply = await agent.ask(f"Classify this ticket:\n\n{body}")
triage = await reply.content()
# triage.category, triage.urgency, triage.summary_one_line -> use in routing rules
```

### Extract a delivery ETA window (dataclass)

Turn natural language into something your scheduling layer can consume:

```python
from dataclasses import dataclass

from autogen.beta import Agent
from autogen.beta.config import OpenAIConfig

@dataclass
class DeliveryWindow:
    day_label: str
    start_hour_local: int
    end_hour_local: int
    timezone: str

agent = Agent(
    "scheduler",
    prompt="Extract delivery windows as structured data only; use 24h integers for hours.",
    config=OpenAIConfig("gpt-4o-mini"),
    response_schema=DeliveryWindow,
)

reply = await agent.ask(
    "Customer said: drop off Tuesday between 2 and 5pm Pacific, before dinner."
)
window = await reply.content()
```

### Score a review on a fixed scale (primitive + clear prompt)

Use a primitive schema when the payload is a single JSON value and your prompt defines the scale:

```python
from autogen.beta import Agent
from autogen.beta.config import OpenAIConfig

agent = Agent(
    "reviews",
    prompt="You output a single integer 1-5 for overall satisfaction. No prose.",
    config=OpenAIConfig("gpt-4o-mini"),
    response_schema=int,
)

reply = await agent.ask(
    "Rate this review: 'Shipped fast, packaging was torn, product works great.'"
)
stars = await reply.content()
```

---

## Supported schema types

You can pass any type the stack can turn into a JSON Schema and parse back: primitives, `dataclass`, Pydantic models, unions, and more. Plain types are wrapped in an internal `ResponseSchema` instance for validation and API schema generation.

### Primitives

```python
agent = Agent("assistant", config=config, response_schema=int)

reply = await agent.ask("What is 2 + 2?")
result = await reply.content()
# 4 - int
```

### Dataclasses

```python
from dataclasses import dataclass

@dataclass
class City:
    name: str
    population: int

agent = Agent("assistant", config=config, response_schema=City)

reply = await agent.ask("Give the city name and approximate population for Kyoto.")
result = await reply.content()
```

### Pydantic models

```python
from pydantic import BaseModel

class Sentiment(BaseModel):
    label: str
    score: float

agent = Agent("assistant", config=config, response_schema=Sentiment)

reply = await agent.ask("Analyze: 'I love this product!'")
result = await reply.content()
```

### Unions

Use a union (`int | str`) or a tuple of types (`(int, str)`) when the model must return **one of several JSON shapes**.

```python
from autogen.beta import Agent
from autogen.beta.config import OpenAIConfig

config = OpenAIConfig("gpt-4o-mini")

# int | str - e.g. a count, or "unknown" when the text does not say
agent = Agent(
    "extractor",
    prompt='Reply with JSON only: either an integer count or the string "unknown".',
    config=config,
    response_schema=int | str,
)

reply = await agent.ask("How many seats does the venue mention? (no number in text)")
result = await reply.content()
# result is int or str, depending on the model output
```

---

## `ResponseSchema` (named payloads)

For clearer API metadata, construct a `ResponseSchema` with an explicit **`name`** and **`description`**:

```python
from autogen.beta import Agent, ResponseSchema

schema = ResponseSchema(
    int | str,
    name="ByteWidth",
    description="The number of bits in one byte.",
)

agent = Agent("assistant", config=config, response_schema=schema)
```

Those fields are attached to the structured-output payload where the provider supports it, which helps the model treat the JSON as a named contract rather than a generic blob.

---

## Custom validation with `@response_schema`

Use the decorator when you need **logic beyond** "parse this JSON into a type": clamping, regex cleanup, decoding wrapped JSON, or combining fields.

### Sync validator: clamp a numeric rating

```python
from autogen.beta import Agent, response_schema

@response_schema
def parse_rating(content: str) -> int:
    """Parse a rating and clamp it to 1-5."""
    return max(1, min(5, int(content)))

agent = Agent("assistant", config=config, response_schema=parse_rating)

reply = await agent.ask("Rate this movie from 1 to 5.")
result = await reply.content()
```

### Async validator: enrich after JSON parse

```python
import json

@response_schema
async def fetch_and_validate(content: str) -> dict:
    """Validate and enrich the model's JSON response."""
    data = json.loads(content)
    data["validated"] = True
    return data
```

### Validation rules for `@response_schema`

The framework introspects your function with **fast_depends** (the same dependency-injection path as `@tool` callables). Parameters satisfied by injection - [`Variables`](context/variables.md), [`Depends`](depends.md), [`Inject`](context/inject.md), `Context`, and similar - are **not** part of the JSON the model must produce. Every other parameter controls how the completion text is decoded and whether a JSON Schema is attached for native structured output.

#### One non-injected parameter

| Annotated type | What the model's message must look like | JSON Schema sent to the API? |
|----------------|----------------------------------------|------------------------------|
| `str` | Any text. The **raw** completion string is passed in; nothing is parsed as JSON for you. | **No** - there is no derived schema, so clients such as OpenAI do not get a `response_format` schema from this callable alone. |
| Primitive or union (`int`, `float`, `bool`, `int \| str`, ...) | By default (**`embed=True`**), a JSON object `{"data": <value>}`. The framework unwraps it before calling your function. With `embed=False`, a bare JSON value. | **Yes**, when the client supports structured output and emits `json_schema` from the derived schema. |
| Structured type (`dataclass`, Pydantic model, `dict`, ...) | A JSON **object** matching the type's schema. These are never embedded regardless of the `embed` flag. | **Yes**. |

Illustrative shapes (each function would be decorated with `@response_schema` and used as `response_schema=...` on an `Agent`):

```python
# Raw text - parse inside the function (e.g. json.loads).
def only_str(content: str) -> dict:
    pass
```

```python
# Single JSON value: {"data": 42} by default (embed=True), or a bare 42 with embed=False
def only_int(content: int) -> dict:
    pass
```

```python
from dataclasses import dataclass

@dataclass
class Data:
    content: int

# Single JSON object at the top level, e.g. {"content": 1}
def only_dataclass(content: Data) -> dict:
    pass
```

#### Two or more non-injected parameters

The framework builds one synthetic JSON **object** schema: **Python parameter names are JSON keys**. The completion must be a single object with those keys; values are validated against the annotations and passed into your function as keyword arguments (alongside any injected parameters).

For example:

```python
@response_schema
def create_user(name: str, age: int, email: str) -> dict:
    """Create a validated user record."""
    return {"name": name, "age": age, "email": email, "active": True}

# expected JSON: {"name": "Alice", "age": 30, "email": "alice@example.com"}
agent = Agent("assistant", config=config, response_schema=create_user)

reply = await agent.ask("Create a user for Alice, age 30, alice@example.com")
result = await reply.content()
# {"name": "Alice", "age": 30, "email": "alice@example.com", "active": True}
```
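
The mapping from parameter names to JSON keys can be pictured with a stdlib-only sketch. It is not the framework's implementation, which goes through fast_depends and a synthetic Pydantic model; `call_with_json` is a hypothetical helper that only checks key presence and plain `isinstance` types.

```python
import inspect
import json


def create_user(name: str, age: int, email: str) -> dict:
    return {"name": name, "age": age, "email": email, "active": True}


def call_with_json(func, completion: str):
    """Decode one JSON object and pass its keys as keyword arguments."""
    payload = json.loads(completion)
    params = inspect.signature(func).parameters
    missing = [name for name in params if name not in payload]
    if missing:
        raise ValueError(f"missing keys: {missing}")
    for name, param in params.items():
        if not isinstance(payload[name], param.annotation):
            raise TypeError(f"{name!r} must be {param.annotation.__name__}")
    return func(**{name: payload[name] for name in params})


user = call_with_json(
    create_user,
    '{"name": "Alice", "age": 30, "email": "alice@example.com"}',
)
print(user["active"])  # True
```

Because the keys are derived from the signature, renaming `email` to `contact` in this sketch would immediately change the JSON the function accepts.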

**`pydantic.Field` on each parameter**

Multi-parameter validators are backed by a synthetic Pydantic model, so you can document and constrain each JSON property with [`Field`](https://docs.pydantic.dev/latest/concepts/fields/), just like on a [`BaseModel`](https://docs.pydantic.dev/latest/concepts/models/):

- Use [`typing.Annotated`](https://docs.python.org/3/library/typing.html#typing.Annotated) when the parameter has no default: `Annotated[str, Field(description="...")]`.
- Combine a **default** and metadata with `Field` as the default value, e.g. `score: float = Field(1.0, description="Test score")`.

`description` is surfaced on each property in the generated JSON Schema (and thus in native structured output when the client sends that schema). Other `Field` arguments - `ge`, `le`, `pattern`, and so on - are reflected as the usual JSON Schema keywords.

```python
from typing import Annotated

from pydantic import Field

from autogen.beta import response_schema

@response_schema
def extract_listing(
    title: Annotated[str, Field(description="Product name from the text")],
    price_usd: Annotated[float, Field(description="Price in US dollars", ge=0)],
    in_stock: Annotated[bool, Field(description="True if the listing says it ships now")],
) -> dict:
    return {"title": title, "price_usd": price_usd, "in_stock": in_stock}
```

Parameters with a Python default (plain value or `Field(default, ...)`) are usually **not** listed as required in the schema, so the model can omit those keys from the JSON object and the default is used instead.
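
As a rough illustration of how defaults translate into optional keys, the sketch below derives a `required` list from a function signature using only `inspect`. The `required_keys` helper and the `score_answer` function are hypothetical; the real schema is generated by Pydantic and may differ in detail.

```python
import inspect


def score_answer(answer: str, score: float = 1.0, notes: str = "") -> dict:
    return {"answer": answer, "score": score, "notes": notes}


def required_keys(func) -> list:
    """Parameters without a default must appear in the JSON object."""
    return [
        name
        for name, param in inspect.signature(func).parameters.items()
        if param.default is inspect.Parameter.empty
    ]


print(required_keys(score_answer))  # ['answer']

# A payload that omits the defaulted keys is still enough to call the function.
result = score_answer(**{"answer": "42"})
print(result)  # {'answer': '42', 'score': 1.0, 'notes': ''}
```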

!!! note
    Renaming a parameter changes the key the model is instructed to use. Treat those names as part of your contract with the model.

### Accessing `Context`

Validators participate in the same dependency injection model as tools. Inject [`Context`](context/variables.md) to read `variables`, tie validation to session state, or perform lookups:

```python
from autogen.beta import Context, response_schema

@response_schema
def validate_with_context(content: str, context: Context) -> str:
    """Use context variables during validation."""
    language = context.variables.get("language", "en")
    return f"[{language}] {content}"
```

---

## `PromptedSchema` (models without native structured output)

Some models or providers do not support API-level structured output (no `response_format` JSON schema). `PromptedSchema` **injects the JSON Schema into the system prompt** and sets `json_schema` to `None` on the wire so the client does not request native structured mode. Validation still goes through the inner schema's `validate` method.

!!! note
    **Amazon Bedrock** supports [native structured output](https://docs.aws.amazon.com/bedrock/latest/userguide/structured-output.html) on the Converse API - `BedrockConfig` sends schemas via `outputConfig` automatically. Support is model-dependent (check the Bedrock model pages), and Bedrock compiles each new schema on first use, so the first request with a given schema can take noticeably longer than subsequent ones.

```python
from autogen.beta import Agent, PromptedSchema

agent = Agent(
    "assistant",
    config=config,
    response_schema=PromptedSchema(int),
)

reply = await agent.ask("How many oceans are there on Earth?")
result = await reply.content()
```

You can keep a **single** schema definition (type, `ResponseSchema`, or `@response_schema` callable) and **only wrap it when** you need prompt-based delivery. The inner `validate` logic and JSON shape stay the same; `PromptedSchema` swaps how the schema reaches the model (system-prompt text instead of API `response_format`).

```python
from autogen.beta import Agent, PromptedSchema, ResponseSchema, response_schema
from autogen.beta.config import OpenAIConfig

config = OpenAIConfig("gpt-4o-mini")

# Plain type you already pass as response_schema=int elsewhere
agent_a = Agent("a", config=config, response_schema=PromptedSchema(int))

# Named ResponseSchema reused from a "native structured" setup - wrap for a weaker API
ocean_count = ResponseSchema(
    int,
    name="OceanCount",
    description="Number of oceans on Earth.",
)
agent_b = Agent("b", config=config, response_schema=PromptedSchema(ocean_count))

# Same callable validator as without PromptedSchema - wrap it when the wire format must be prompt-only
@response_schema
def parse_int(content: str) -> int:
    return int(content.strip())

strict_int = PromptedSchema(parse_int)
agent_c = Agent("c", config=config, response_schema=strict_int)
```

### Custom prompt template

The default template asks for raw JSON only. Override it with a string that contains the `{schema}` placeholder:

```python
PromptedSchema(
    int,
    prompt_template="Reply with JSON matching this schema:\n{schema}",
)
```
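
As a rough picture of what prompt-based delivery produces, the sketch below fills a `{schema}` template with `str.format`. The substitution mechanism, the `render_schema_prompt` helper, and the example schema are assumptions for illustration; the documentation only states that the template must contain the `{schema}` placeholder.

```python
import json


def render_schema_prompt(template: str, schema: dict) -> str:
    """Fill the {schema} placeholder with a pretty-printed JSON Schema."""
    if "{schema}" not in template:
        raise ValueError("template must contain the {schema} placeholder")
    return template.format(schema=json.dumps(schema, indent=2))


# Hypothetical schema; the one the framework derives may be shaped differently.
int_schema = {"type": "integer"}
prompt_text = render_schema_prompt(
    "Reply with JSON matching this schema:\n{schema}",
    int_schema,
)
print(prompt_text)
```

Checking for the placeholder up front is a cheap guard: a template without it would silently send the model no schema at all.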

---

## Override schema per request

Pass `response_schema` to `ask()` (or `AgentReply.ask()`) to change the contract for **one turn** only. The agent's default schema applies again on the next turn unless you override again.

```python
agent = Agent("assistant", config=config)

turn = await agent.ask("How many seconds in a minute?", response_schema=int)
result = await turn.content()
#> 60 - int

turn2 = await turn.ask("Say hello.")
result2 = await turn2.content()
#> "Hello!" - str
```

Pass **`response_schema=None`** to drop a schema that was set on the agent for a single request:

```python
agent = Agent("assistant", config=config, response_schema=int)

reply = await agent.ask("Just say hello in plain text.", response_schema=None)
result = await reply.content()
```

!!! note
    The per-request override applies only to that turn. The conversation history is unchanged; only the schema used for the next completion differs.

## Validation retries

When the model's response fails schema validation, you can automatically **re-ask** the model instead of raising immediately. Pass the `retries` keyword to `content()`:

```python
agent = Agent("assistant", config=config, response_schema=int)

reply = await agent.ask("How many planets in the solar system?")
result = await reply.content(retries=3)
```

The `retries` parameter controls how many **re-asks** are allowed after the initial attempt. With `retries=3`, the initial response is validated; if it fails, the model is re-asked up to **3 more times** before the error is raised.

| Value | Behavior |
|-------|----------|
| `retries=0` (default) | No retries - raise on the first validation failure. |
| `retries=3` | Up to 3 re-asks after the initial attempt (4 total). |
| `retries=math.inf` | Re-ask indefinitely until the model produces a valid response. |

Each retry sends the validation error back to the model as a follow-up message in the **same conversation**, so the model can see what went wrong and correct its output.
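
The counting rule (one initial attempt plus up to `retries` re-asks) can be sketched without any framework code. Everything here is hypothetical: `content_with_retries`, `ask_model`, and `fake_model` stand in for the real turn, and `int` plays the role of the validator.

```python
def content_with_retries(ask_model, validate, retries=0):
    """Validate the first reply; on failure, re-ask up to `retries` times."""
    feedback = None
    attempts = 0
    while True:
        attempts += 1
        raw = ask_model(feedback)
        try:
            return validate(raw), attempts
        except ValueError as error:
            if attempts > retries:
                raise  # retry budget exhausted: surface the last error
            # The validation error becomes the follow-up message.
            feedback = f"Invalid response: {error}"


def fake_model(feedback):
    # Returns prose on the first ask and a valid integer after feedback.
    return "eight planets" if feedback is None else "8"


value, attempts = content_with_retries(fake_model, int, retries=3)
print(value, attempts)  # 8 2
```

Passing `math.inf` as `retries` in this sketch means `attempts > retries` is never true, which mirrors the unbounded behaviour listed in the table.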

!!! warning
    `retries=math.inf` will loop forever if the model consistently produces invalid output. Use a finite count in production, and reserve `math.inf` for interactive or experimental use.

---

## Primitive embedding (`embed`)

When a schema type is a **primitive** (`int`, `float`, `bool`, `list[...]`) or a **union** (`int | str`), the framework wraps it in a one-field JSON object by default. This is called **embedding**.

Instead of asking the model to produce a bare value like `42`, the API schema asks for `{"data": 42}`. The `content()` method transparently unwraps the envelope so your code still receives a plain Python value.

### Why?

Most structured-output APIs (OpenAI, etc.) are designed around JSON **objects**. A bare value (`42`, `true`, `"hello"`) is technically valid JSON, but some providers handle it less reliably. Wrapping the value in `{"data": ...}` gives the model a proper object to fill in, which improves reliability without changing your application code.
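
The envelope idea can be sketched in a few lines of standard-library Python. The helper names are hypothetical, and the exact JSON Schema the framework emits may contain additional keywords; only the `data` key is taken from the description above.

```python
import json


def embed_schema(value_schema: dict) -> dict:
    """Wrap a primitive schema in a one-field object schema."""
    return {
        "type": "object",
        "properties": {"data": value_schema},
        "required": ["data"],
    }


def unwrap(completion: str):
    """Return the value stored under the "data" key of the envelope."""
    return json.loads(completion)["data"]


wrapped = embed_schema({"type": "integer"})
print(wrapped["properties"]["data"])  # {'type': 'integer'}
print(unwrap('{"data": 42}'))         # 42
```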

### Which types are embedded?

| Type | Embedded by default? | Reason |
|------|---------------------|--------|
| `str` | No schema generated | Raw text is passed through as-is. |
| `int`, `float`, `bool` | **Yes** | Bare primitives benefit from the object wrapper. |
| `list[T]`, `tuple[T, ...]` | **Yes** | Array values also benefit from the wrapper. |
| `int \| str`, `Union[T1, T2]`, `(T1, T2)` | **Yes** | Union of primitives. |
| `BaseModel` subclass | No | Already a JSON object. |
| `@dataclass` | No | Already a JSON object. |
| `TypedDict` | No | Already a JSON object. |
| `dict[K, V]` | No | Already a JSON object. |

### Opting out

Pass `embed=False` to `ResponseSchema` or `@response_schema` to disable wrapping. The model must then produce the bare JSON value directly (e.g. `42` instead of `{"data": 42}`).

```python
from autogen.beta import ResponseSchema

schema = ResponseSchema(int, name="RawInt", embed=False)
# Model must produce: 42
# With embed=True (default): model produces {"data": 42}, content() returns 42 either way
```

With the `@response_schema` decorator:

```python
@response_schema(embed=False)
def parse_rating(value: int) -> int:
    return max(1, min(5, value))
```

!!! note
    Embedding is transparent to your code. Whether `embed` is `True` or `False`, `content()` always returns the unwrapped Python value. The only difference is the JSON shape the model is asked to produce.

---

# Model Configuration

Source: https://docs.ag2.ai/latest/docs/beta/model_configuration/

# Model Configuration

The AG2 framework provides an explicit, predictable, and type-safe way to configure Large Language Models (LLMs) for your agents. The configuration API is designed to provide a consistent developer experience across different model providers while maintaining strong typing support.

## Supported Providers

AG2 supports multiple LLM providers through dedicated configuration classes. Each provider requires its respective optional dependencies to be installed.

| Provider | Configuration Class | Installation Command |
| :--- | :--- | :--- |
| **[OpenAI Responses](https://developers.openai.com/api/reference/responses/overview)** | `OpenAIResponsesConfig` | `pip install "ag2[openai]"` |
| **[OpenAI](https://developers.openai.com/api/reference/overview)** | `OpenAIConfig` | `pip install "ag2[openai]"` |
| **[Anthropic](https://platform.claude.com/docs/en/build-with-claude/overview)** | `AnthropicConfig` | `pip install "ag2[anthropic]"` |
| **[Gemini](https://ai.google.dev/gemini-api/docs)** | `GeminiConfig` | `pip install "ag2[gemini]"` |
| **[Gemini on Vertex AI](https://cloud.google.com/vertex-ai/generative-ai/docs)** | `VertexAIConfig` | `pip install "ag2[gemini]"` |
| **[Amazon Bedrock](https://docs.aws.amazon.com/bedrock/latest/userguide/conversation-inference.html)** | `BedrockConfig` | `pip install "ag2[bedrock]"` |
| **[Ollama](https://docs.ollama.com/api/introduction)** | `OllamaConfig` | `pip install "ag2[ollama]"` |
| **[DashScope](https://www.alibabacloud.com/help/en/model-studio/first-api-call-to-qwen)** | `DashScopeConfig` | `pip install "ag2[dashscope]"` |
| **[xAI](https://docs.x.ai/docs/overview)** | `XAIConfig` | `pip install "ag2[xai]"` |

*(Note: `OpenAIConfig` also works with OpenAI-compatible endpoints.)*

---

## How to Configure a Model

### Basic Configuration

To configure a model, import the specific provider's configuration class and initialize it with your desired parameters. The most common parameters are `model`, `api_key`, and `base_url`.

=== "OpenAI Responses"
    ```python linenums="1"
    from autogen.beta.config import OpenAIResponsesConfig

    # Configure an OpenAI Responses API model
    config = OpenAIResponsesConfig(
        model="gpt-4.1-nano",
        api_key="sk-...",
        streaming=True
    )
    ```

=== "OpenAI"
    ```python linenums="1"
    from autogen.beta.config import OpenAIConfig

    # Configure an OpenAI model
    config = OpenAIConfig(
        model="gpt-4o-mini",
        api_key="sk-...",
        temperature=0.2,
        streaming=True
    )
    ```

=== "Anthropic"
    ```python linenums="1"
    from autogen.beta.config import AnthropicConfig

    # Configure an Anthropic model
    config = AnthropicConfig(
        model="claude-haiku-4-5-20251001",
        api_key="sk-ant-...",
        streaming=True
    )
    ```

=== "Gemini"
    ```python linenums="1"
    from autogen.beta.config import GeminiConfig

    # Configure a Gemini model
    config = GeminiConfig(
        model="gemini-3-flash-preview",
        api_key="...",
        streaming=True
    )
    ```

=== "Amazon Bedrock"
    ```python linenums="1"
    from autogen.beta.config import BedrockConfig

    # Configure an Amazon Bedrock model (Converse API)
    config = BedrockConfig(
        model="anthropic.claude-sonnet-4-5-20250929-v1:0",
        region_name="us-east-1",
        streaming=True
    )
    ```

    Credentials follow the standard AWS resolution chain: explicit `aws_access_key_id` / `aws_secret_access_key`, a named `profile_name`, environment variables, shared config files, or instance roles. `model` accepts a Bedrock model id or an inference-profile ARN. See [Amazon Bedrock authentication](#amazon-bedrock-authentication) for the API-key (bearer token) alternative.

=== "Ollama"
    ```python linenums="1"
    from autogen.beta.config import OllamaConfig

    # Configure an Ollama model
    config = OllamaConfig(
        model="qwen3.5:latest",
        streaming=True
    )
    ```

=== "DashScope"
    ```python linenums="1"
    from autogen.beta.config import DashScopeConfig

    # Configure a DashScope model
    config = DashScopeConfig(
        model="qwen-plus",
        api_key="...",
        streaming=True
    )
    ```

=== "xAI"
    ```python linenums="1"
    from autogen.beta.config import XAIConfig

    # Configure an xAI Grok model
    config = XAIConfig(
        model="grok-4",
        api_key="xai-...",
        streaming=True
    )
    ```

!!! tip

    **AG2 Beta** is designed to be async and streaming-first, so for the best user experience, enable streaming in the provider configuration for every model that supports it. Each config shown above sets `streaming=True`.

### Using Environment Variables

For security and convenience, you don't need to hardcode your API keys. If `api_key` is not explicitly provided, the configuration will automatically attempt to load it from your environment variables.

The system looks for provider-specific keys (e.g., `OPENAI_API_KEY`, `ANTHROPIC_API_KEY`, `GEMINI_API_KEY`, `XAI_API_KEY`).

```python
from autogen.beta.config import OpenAIConfig

# Automatically falls back to OPENAI_API_KEY from the environment
config = OpenAIConfig(model="gpt-5")
```
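
The fallback order can be pictured as follows. This is a sketch of the lookup described above, not the configuration classes' actual code: `resolve_api_key` is a hypothetical helper, and raising when nothing is found is an assumption about error handling.

```python
import os


def resolve_api_key(explicit_key, env_var):
    """Prefer an explicit key; otherwise read the provider's env variable."""
    if explicit_key is not None:
        return explicit_key
    key = os.environ.get(env_var)
    if key is None:
        raise RuntimeError(f"no api_key passed and {env_var} is not set")
    return key


os.environ["OPENAI_API_KEY"] = "sk-from-env"  # demo value only

from_env = resolve_api_key(None, "OPENAI_API_KEY")
explicit = resolve_api_key("sk-explicit", "OPENAI_API_KEY")
print(from_env)  # sk-from-env
print(explicit)  # sk-explicit
```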

### Amazon Bedrock Authentication

`BedrockConfig` authenticates in either of two ways, both resolved by the underlying AWS SDK:

**1. AWS credentials (SigV4)** - explicit keys, a `profile_name`, or the standard environment variables (`AWS_ACCESS_KEY_ID`, `AWS_SECRET_ACCESS_KEY`, `AWS_SESSION_TOKEN`). Recommended for production; credentials refresh automatically through botocore.

**2. Bedrock API keys (bearer token)** - set the [Amazon Bedrock API key](https://docs.aws.amazon.com/bedrock/latest/userguide/api-keys.html) as an environment variable and boto3 uses bearer-token auth for Bedrock calls automatically (no other credentials needed):

```bash
export AWS_BEARER_TOKEN_BEDROCK=<your-bedrock-api-key>
export AWS_DEFAULT_REGION=us-east-1
```

```python
from autogen.beta.config import BedrockConfig

# Auth from AWS_BEARER_TOKEN_BEDROCK, region from AWS_DEFAULT_REGION
config = BedrockConfig(model="anthropic.claude-sonnet-4-5-20250929-v1:0")
```

A region is always required - pass `region_name=` or set `AWS_DEFAULT_REGION`. Notes on API keys:

- **Short-term keys** expire with the console session that minted them (max 12 hours) and are region-bound - generate the key in the same region you call.
- **Long-term keys** are backed by an auto-created IAM user; AWS recommends them for exploration only.
- API keys work only for Bedrock / Bedrock Runtime actions. If both a bearer token and AWS credentials are present, the bearer token wins for Bedrock calls.

### Google Vertex AI (Gemini)

For Gemini on **Vertex AI** (Google Cloud), use the dedicated `VertexAIConfig` class. `GeminiConfig` covers the public Developer API (`api_key`); `VertexAIConfig` covers the Vertex path (GCP `project`, `location`, and Google-issued credentials).

Authentication accepts any of the following:

=== "Service account key file"
    ```python linenums="1" hl_lines="7"
    from autogen.beta.config import VertexAIConfig

    config = VertexAIConfig(
        model="gemini-3-flash-preview",
        project="my-gcp-project",
        location="us-central1",
        credentials="/path/to/service-account-key.json",
        # Path to a service-account JSON key file downloaded from
        # GCP Console -> IAM & Admin -> Service Accounts -> Keys.
    )
    ```

    The service account needs the **Vertex AI User** (`roles/aiplatform.user`) IAM role on the project.

=== "Application Default Credentials"
    ```python linenums="1"
    from autogen.beta.config import VertexAIConfig

    # Run `gcloud auth application-default login` first, or ensure
    # GOOGLE_APPLICATION_CREDENTIALS points to a key file. With nothing
    # passed to `credentials`, google-genai resolves ADC automatically.
    config = VertexAIConfig(
        model="gemini-3-flash-preview",
        project="my-gcp-project",
        location="us-central1",
    )
    ```

=== "Pre-built Credentials object"
    ```python linenums="1" hl_lines="4-6 12"
    import google.auth
    from autogen.beta.config import VertexAIConfig

    creds, _ = google.auth.default(
        scopes=["https://www.googleapis.com/auth/cloud-platform"],
    )

    config = VertexAIConfig(
        model="gemini-3-flash-preview",
        project="my-gcp-project",
        location="us-central1",
        credentials=creds,
    )
    ```

    Use this path for impersonated credentials, workload identity, or any other `google.auth.credentials.Credentials` source.

#### Environment variables

Instead of passing parameters explicitly, the underlying `google-genai` SDK resolves any field left unset from the following environment variables:

| Environment variable | Used by | Equivalent parameter | Notes |
| :--- | :--- | :--- | :--- |
| `GOOGLE_API_KEY` | `GeminiConfig` | `api_key` | Takes precedence over `GEMINI_API_KEY` if both are set. |
| `GEMINI_API_KEY` | `GeminiConfig` | `api_key` | Developer API key. |
| `GOOGLE_CLOUD_PROJECT` | `VertexAIConfig` | `project` | GCP project ID. |
| `GOOGLE_CLOUD_LOCATION` | `VertexAIConfig` | `location` | GCP region (or `global`). |
| `GOOGLE_APPLICATION_CREDENTIALS` | `VertexAIConfig` | `credentials` | Path to a service-account JSON key file, read via ADC. |

With the three Vertex variables set in the environment, configuration collapses to just the model name:

```bash
export GOOGLE_CLOUD_PROJECT=my-gcp-project
export GOOGLE_CLOUD_LOCATION=us-central1
export GOOGLE_APPLICATION_CREDENTIALS=/path/to/service-account-key.json
```

```python
from autogen.beta.config import VertexAIConfig

# All Vertex auth parameters resolved from the environment.
config = VertexAIConfig(model="gemini-3-flash-preview")
```

### Controlling Gemini Thinking

Gemini 3 Pro models default to **dynamic / unbounded** thinking, which can cause individual calls to spend a large internal token budget before responding. Both `GeminiConfig` and `VertexAIConfig` accept thinking controls that map directly to [Google's Thinking API](https://ai.google.dev/gemini-api/docs/thinking).

Use `thinking_level` for **Gemini 3** models, or `thinking_budget` for **Gemini 2.5** models:

```python
from autogen.beta.config import GeminiConfig, VertexAIConfig

# Gemini 3 - bound thinking with a level
gemini3 = GeminiConfig(
    model="gemini-3-flash-preview",
    thinking_level="low",  # "low" | "medium" | "high"
)

# Gemini 2.5 - bound thinking with an explicit token budget
gemini25 = VertexAIConfig(
    model="gemini-2.5-pro",
    project="my-gcp-project",
    location="us-central1",
    thinking_budget=512,  # token budget; 0 disables thinking only on models that allow it (2.5 Pro does not)
)
```

For full control (e.g. enabling `include_thoughts`), pass a `google.genai.types.ThinkingConfig` directly via `thinking_config`. When set, it takes precedence over the shorthand fields.

The number of thinking tokens consumed is reported on `ModelResponse.usage.thinking_tokens` and emitted as the `gen_ai.usage.thinking_tokens` OpenTelemetry attribute by `TelemetryMiddleware`.

### Self-Hosted and OpenAI-Compatible Models (vLLM, LM Studio, etc.)

If you are using a self-hosted model or an API that is compatible with the OpenAI format (such as **vLLM**, **LM Studio**, **FastChat**, or **Together AI**), you can use the `OpenAIConfig` class and specify a custom `base_url`.

```python
from autogen.beta.config import OpenAIConfig

# Configure a vLLM or other OpenAI-compatible endpoint
config = OpenAIConfig(
    model="qwen-3",
    base_url="http://localhost:8000/v1",
    # Some endpoints don't require an API key, but the client expects a non-empty string
    api_key="NotRequired",
)
```

!!! tip
    If you are running a self-hosted server via HTTPS without a valid SSL certificate (e.g., a local self-signed certificate), you can disable SSL checks by passing a custom `httpx.AsyncClient` with `verify=False` to the configuration:

    ```python linenums="1" hl_lines="8"
    import httpx
    from autogen.beta.config import OpenAIConfig

    config = OpenAIConfig(
        model="qwen-3",
        base_url="https://localhost:8000/v1",
        api_key="NotRequired",
        http_client=httpx.AsyncClient(verify=False)
    )
    ```

### Extra Body Parameters

Some OpenAI API-compatible providers require additional, provider-specific parameters in the request body. Use the `extra_body` parameter on `OpenAIConfig` to pass these through directly to the API call.

This is useful for enabling features like extended thinking on self-hosted or third-party models:

```python
from autogen.beta.config import OpenAIConfig

# NVIDIA NIM
nemotron = OpenAIConfig(
    model="nvidia/nemotron-3-super-120b-a12b",
    base_url="https://integrate.api.nvidia.com/v1",
    extra_body={"chat_template_kwargs": {"thinking": True}},
)
```

## Reusing and Overriding Configurations

Model configurations are **immutable**. If you need to reuse a configuration for multiple agents with slight variations (e.g., changing the model version or adjusting the temperature), use the `.copy()` method. This creates a new updated instance without mutating the original configuration.

```python
from autogen.beta import Agent
from autogen.beta.config import OpenAIConfig

base_config = OpenAIConfig(model="gpt-5")

agent1 = Agent(
    "Assistant",
    # Create a new configuration with updated temperature
    config=base_config.copy(temperature=0.2),
)

agent2 = Agent(
    "AnotherAssistant",
    # Create a new configuration with updated model and temperature
    config=base_config.copy(model="gpt-5-mini", temperature=0.8),
)
```
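
The copy-on-change behaviour described above can be illustrated without the framework. The sketch below uses a plain frozen dataclass; `SketchConfig` is a hypothetical stand-in and says nothing about how `OpenAIConfig` is implemented internally.

```python
from dataclasses import FrozenInstanceError, dataclass, replace


@dataclass(frozen=True)
class SketchConfig:
    """Hypothetical stand-in for a model config; not the AG2 class."""

    model: str
    temperature: float = 1.0

    def copy(self, **overrides) -> "SketchConfig":
        # Build a new instance with the overrides applied; `self` is untouched.
        return replace(self, **overrides)


base = SketchConfig(model="gpt-5")
tuned = base.copy(temperature=0.2)

# Direct mutation is rejected, which is what makes sharing `base` safe.
try:
    base.temperature = 0.5
    mutated = True
except FrozenInstanceError:
    mutated = False

print(base)
print(tuned)
```

Because the base instance can never change, it is safe to share one across many agents and derive per-agent variants from it.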

## Delaying Model Configuration

In many use cases, you may want to separate the logic of defining your agent (tools, system messages, instructions) from configuring the specific model it uses. This allows you to construct an agent once and dynamically provide the model configuration later during execution.

You can accomplish this by passing the configuration to the `.ask()` method when interacting with the agent. This is especially useful for applications like web servers where the user might bring their own API key or choose a different model on the fly.

```python
from autogen.beta import Agent
from autogen.beta.config import OpenAIConfig

# Define an agent without an initial model config,
# or with a default one you plan to override later
agent = Agent(
    "Assistant",
    prompt="You are a helpful assistant.",
    # other tools and settings...
)

# Ask the agent, passing the explicit model configuration
response = await agent.ask(
    "Hello!",
    config=OpenAIConfig(
        model="gpt-5",
        api_key="sk-user-specific-key"
    )
)
```

!!! warning
    Providing a configuration or client directly to the `ask()` method completely **overrides** the original model configuration assigned to the agent for that specific turn.

    ```python linenums="1" hl_lines="6 12"
    from autogen.beta import Agent
    from autogen.beta.config import OpenAIConfig

    agent = Agent(
        "Assistant",
        config=OpenAIConfig(model="gpt-5"),
    )

    response = await agent.ask(
        "Hello!",
        # overrides the original model configuration
        config=OpenAIConfig(model="gpt-5-mini")
    )
    ```

---

# Prompt Management

Source: https://docs.ag2.ai/latest/docs/beta/system_prompts/

## System Prompts

Agents can be initialized with a static system prompt. You can provide a single string or a list of strings:

```python
from autogen.beta import Agent

# Single string prompt
agent = Agent(
    "assistant",
    prompt="You are a helpful agent!"
)

# List of strings prompt
agent2 = Agent(
    "assistant2",
    prompt=[
        "You are an expert in Python.",
        "Be concise."
    ]
)
```

## Dynamic Prompts

### On conversation startup

System prompts can be generated dynamically when a conversation starts. This is useful when the prompt depends on external state or initial context. You can achieve this with the `@agent.prompt` decorator (called on your `Agent` instance) or by passing a synchronous or asynchronous function.

Dynamic prompt functions support the same powerful execution context capabilities as Agent Tools. For more detailed information on specific context features, see [Dependency Injection](context/inject.md), [Context Variables](context/variables.md), and [Human-in-the-loop](context/human_in_the_loop.md).

Dynamic prompts are evaluated only once at the beginning of the conversation, and their results are appended to the static prompts and reused for subsequent turns.

```python
from autogen.beta import Agent, Context

agent = Agent("assistant")

@agent.prompt
async def dynamic_sysprompt(ctx: Context) -> str:
    # Generate prompt dynamically based on the initial event or context
    return (
        "You are a helpful agent. "
        f"The current context is {ctx.variables}."
    )
```

Alternatively, you can pass a callable directly to the `prompt` parameter, or mix static strings and callables in a list:

```python
from autogen.beta import Agent

def get_sysprompt() -> str:
    # Returns a string for the prompt, evaluated at the beginning of the conversation
    return "This is dynamically generated."

agent = Agent(
    "assistant",
    prompt=["Static prompt part.", get_sysprompt]
)
```
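
To make the evaluate-once behaviour concrete, here is a framework-free sketch of how a mixed list of strings and callables could be resolved when a conversation starts. `resolve_prompt` is a hypothetical helper, and its callables take no arguments, whereas real prompt hooks can receive a `Context`.

```python
import asyncio
import inspect


async def resolve_prompt(parts: list) -> list[str]:
    """Evaluate each part once and return plain strings in order."""
    resolved: list[str] = []
    for part in parts:
        if isinstance(part, str):
            resolved.append(part)  # static parts pass through unchanged
            continue
        value = part()  # call the hook (sync or async)
        if inspect.isawaitable(value):
            value = await value  # async hooks return a coroutine to await
        resolved.append(value)
    return resolved


def sync_part() -> str:
    return "Sync part."


async def async_part() -> str:
    return "Async part."


resolved = asyncio.run(resolve_prompt(["Static part.", sync_part, async_part]))
print(resolved)
```

The resolved list would then be reused for every later turn, which is why a hook that reads external state only sees the state at conversation start.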

### On each conversation turn

Dynamic prompt hooks run once per conversation. To change the prompt on each turn instead, assign a new list to `reply.context.prompt` between turns, before the next call to `reply.ask()`.

```python
# Initial conversation turn
reply = await agent.ask("Hi, agent!")

# Change the prompt for the next turn
reply.context.prompt = ["You are now a funny agent!"]
await reply.ask("Tell me a joke")
```

You can also completely override the agent's default prompt for a specific run or turn by passing the `prompt` parameter directly to `ask()`:

```python
# Overrides the default prompt for this conversation
reply = await agent.ask(
    "Hi!",
    prompt=["Temporary prompt for this run"]
)
```

## Prompt updates

For continuous and event-driven prompt updates, you can mutate `context.prompt` dynamically from an event subscriber. This allows you to respond to specific events in the stream and adjust the agent's behavior on the fly during an ongoing conversation. See the [Stream](advanced/stream.md) documentation for more details on this advanced feature.

```python
from autogen.beta import Agent, Context, MemoryStream
from autogen.beta.events import ModelRequest

agent = Agent("assistant", prompt="You are a helpful assistant.")
stream = MemoryStream()

@stream.where(ModelRequest).subscribe()
async def mutate_prompt(event: ModelRequest, context: Context) -> None:
    # Update the prompt dynamically when a ModelRequest is triggered
    if "joke" in event.content.lower():
        context.prompt = ["You are now a comedian."]

await agent.ask("Tell me a joke", stream=stream)
```

---

# Tasks

Source: https://docs.ag2.ai/latest/docs/beta/tasks/

A `Task` is a framework-core wrapper any `Agent` can use to give a unit of work a trackable lifecycle. While the task is active, the framework emits `TaskStarted`, `TaskProgress`, `TaskCompleted`, `TaskFailed`, and `TaskExpired` events on a stream - so observers (UIs, watchers, mirrors, test harnesses) can follow along without participating in execution.

!!! note
    Tasks are **agent-owned**. The framework does not assign or schedule them. Standalone usage requires no observers - events fly past harmlessly if nothing subscribes.

## When to Use a Task

Use a Task whenever a unit of work has a beginning, an end, and observable progress that you want to surface beyond your own function's return value:

- **Long-running pipelines** where downstream consumers want progress checkpoints.
- **HITL approvals** where a UI needs to know the task is waiting on a human.
- **Test harnesses** that assert a sequence of lifecycle events.

For lightweight LLM-driven sub-agent delegation see [Sub-task Delegation](task_delegation.md) - that's a different feature that wraps an Agent in a `run_subtask` tool.

## Quick Start

```python
from autogen.beta import Agent
from autogen.beta.config import AnthropicConfig

agent = Agent("indexer", config=AnthropicConfig(model="claude-sonnet-4-6"))

async with agent.task("index documents") as task:
    await task.progress({"stage": "discover", "files": 12})
    await task.progress({"stage": "index", "indexed": 12})
    await task.complete({"indexed": 12})

print(task.state)        # TaskState.COMPLETED
print(task.metadata.result)  # {'indexed': 12}
```

The `async with` block opens the lifecycle. On clean exit the task auto-completes with `result=None` if you didn't call `complete()` or `fail()` yourself.

## Lifecycle States

```
CREATED -> RUNNING -> COMPLETED   (terminal, success)
                   -> FAILED      (terminal, exception or explicit fail)
                   -> EXPIRED     (terminal, TTL elapsed)
```

| State | Meaning |
|---|---|
| `CREATED` | The Task object exists but `__aenter__` has not run. Accessing `task.task_id`, `task.metadata`, or `task.context` raises `RuntimeError`. |
| `RUNNING` | Inside the `async with` block. Progress events allowed. |
| `COMPLETED` | Reached `complete()` or clean block exit. |
| `FAILED` | Reached `fail()` or block exited via exception. |
| `EXPIRED` | TTL elapsed; emitted by an external observer (e.g. a network hub's TTL sweeper). |

The terminal states are immutable - once set, further `complete() / fail() / progress()` calls are silent no-ops. The set is exported as `autogen.beta.task.TERMINAL_TASK_STATES`.
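
The terminal-state guard can be pictured as a small state check in front of every transition. This is a minimal sketch with hypothetical names (`SketchState`, `SketchTask`, `TERMINAL`); it mirrors the documented behaviour, not the framework's code.

```python
from enum import Enum


class SketchState(Enum):
    CREATED = "created"
    RUNNING = "running"
    COMPLETED = "completed"
    FAILED = "failed"
    EXPIRED = "expired"


# Plays the role of the exported terminal-state set.
TERMINAL = frozenset({SketchState.COMPLETED, SketchState.FAILED, SketchState.EXPIRED})


class SketchTask:
    """Hypothetical stand-in that only models the terminal guard."""

    def __init__(self) -> None:
        self.state = SketchState.RUNNING
        self.result = None

    def complete(self, result=None) -> None:
        if self.state in TERMINAL:
            return  # silent no-op once terminal
        self.state, self.result = SketchState.COMPLETED, result

    def fail(self, error: str) -> None:
        if self.state in TERMINAL:
            return  # silent no-op once terminal
        self.state, self.result = SketchState.FAILED, error


task = SketchTask()
task.complete({"indexed": 12})
task.fail("too late")  # ignored: the task is already COMPLETED
print(task.state, task.result)
```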

## API Reference

### `Agent.task(...)`

```python
agent.task(
    title: str,
    *,
    description: str = "",
    payload: dict[str, Any] | None = None,
    capability: str | None = None,
    ttl_seconds: int | None = None,
    context: ConversationContext | None = None,
) -> Task
```

| Parameter | Type | Description |
|---|---|---|
| `title` | `str` | Short objective shown on every event. |
| `description` | `str` | Optional longer description. |
| `payload` | `dict[str, Any] \| None` | Initial payload merged into `TaskSpec`. |
| `capability` | `str \| None` | Tags the task with a capability name; used by network mirrors. |
| `ttl_seconds` | `int \| None` | Sets `metadata.expires_at`. The Task does not self-expire - an external observer must call `task.expire()` when the TTL elapses. |
| `context` | `ConversationContext \| None` | If supplied, events flow on `context.stream` and `ag2.task` is stamped into `context.dependencies` for the duration of the block. If omitted, the Task creates a private `MemoryStream` on entry. |

Returns an unentered `Task`. Use as `async with agent.task(...) as task:`.

### `Task` instance methods

| Method | Description |
|---|---|
| `await task.progress(payload)` | Emits `TaskProgress`; merges `payload` into `metadata.progress` and stamps `last_progress_at`. No-op if already terminal. |
| `await task.complete(result=None)` | Terminal. Emits `TaskCompleted`; sets `metadata.result` and `state = COMPLETED`. |
| `await task.fail(error)` | Terminal. Accepts a string (wrapped in `RuntimeError`) or any `BaseException`. Emits `TaskFailed`; sets `state = FAILED`. |
| `await task.expire()` | Terminal. Emits `TaskExpired`; sets `state = EXPIRED`. Called by external TTL observers. |

### Properties

| Property | Available before `__aenter__`? |
|---|---|
| `task.state` | Yes - returns `TaskState.CREATED`. |
| `task.task_id` | No - raises `RuntimeError`. |
| `task.metadata` | No - raises `RuntimeError`. |
| `task.context` | No - raises `RuntimeError`. |

## Bound Context vs. Standalone

```python
from autogen.beta.context import ConversationContext
from autogen.beta.stream import MemoryStream

ctx = ConversationContext(stream=MemoryStream())

async with agent.task("with-ctx", context=ctx) as task:
    ...
```

Passing a `ConversationContext` shares the stream with the rest of your agent's run, so observers and middleware already attached to that stream see the lifecycle events.

Without a context, the Task creates a private stream on entry. Events still fire - but only observers attached to that private stream see them. Useful for one-off background work that doesn't need to surface anywhere.

## Auto-Complete and Auto-Fail

The `async with` block has these guarantees:

- **Clean exit, no terminal call** -> auto `complete(result=None)`.
- **Exception inside the block** -> auto `fail(exc)`, then the exception propagates.
- **Already terminal at exit time** -> nothing further happens.

```python
try:
    async with agent.task("flaky") as task:
        raise ValueError("boom")
except ValueError as exc:
    print(task.state)            # TaskState.FAILED
    print(task.metadata.error)   # 'boom'
```
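
The three exit guarantees map naturally onto an async context manager's `__aexit__`. The sketch below is a framework-free illustration; `SketchTask` and its string states are hypothetical and only the exit rules are modelled.

```python
import asyncio


class SketchTask:
    """Hypothetical stand-in; only the exit rules are modelled."""

    def __init__(self) -> None:
        self.state = "CREATED"
        self.result = None
        self.error = None

    async def __aenter__(self) -> "SketchTask":
        self.state = "RUNNING"
        return self

    async def complete(self, result=None) -> None:
        if self.state == "RUNNING":
            self.state, self.result = "COMPLETED", result

    async def fail(self, error: BaseException) -> None:
        if self.state == "RUNNING":
            self.state, self.error = "FAILED", str(error)

    async def __aexit__(self, exc_type, exc, tb) -> bool:
        if exc is not None:
            await self.fail(exc)   # auto-fail; no-op if already terminal
        else:
            await self.complete()  # auto-complete; no-op if already terminal
        return False               # never swallow the exception


async def demo() -> tuple:
    async with SketchTask() as clean:
        pass  # no terminal call: auto-completes with result=None
    async with SketchTask() as explicit:
        await explicit.complete({"ok": True})  # exit leaves this untouched
    failed = SketchTask()
    try:
        async with failed:
            raise ValueError("boom")
    except ValueError:
        pass  # the exception still propagated out of the block
    return clean.state, explicit.result, failed.state, failed.error


outcome = asyncio.run(demo())
print(outcome)
```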

## Observing the Lifecycle

Subscribe directly on the bound stream to capture every lifecycle event in order.

```python
from autogen.beta.context import ConversationContext
from autogen.beta.events import TaskCompleted, TaskProgress, TaskStarted
from autogen.beta.stream import MemoryStream

stream = MemoryStream()
ctx = ConversationContext(stream=stream)

stream.subscribe(
    lambda ev: print(type(ev).__name__, getattr(ev, "payload", "")),
    sync_to_thread=False,
)

async with agent.task("watched", context=ctx) as task:
    await task.progress({"step": "fetch"})
    await task.complete({"ok": True})
```

!!! note
    `TaskProgress` is marked transient - it is delivered live to subscribers but **not** persisted to the stream's storage. Subscribe before the events fire to capture them. `TaskStarted`, `TaskCompleted`, `TaskFailed`, and `TaskExpired` are persisted normally.
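
The difference between live delivery and persistence can be sketched with a toy stream. `SketchStream` and `SketchEvent` are hypothetical and much simpler than a real stream; they only show why a late reader of the history would miss progress events.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class SketchEvent:
    name: str
    transient: bool = False


@dataclass
class SketchStream:
    """Hypothetical stream: live fan-out plus a persisted history."""

    history: list = field(default_factory=list)
    subscribers: list = field(default_factory=list)

    def subscribe(self, callback: Callable[[SketchEvent], None]) -> None:
        self.subscribers.append(callback)

    def send(self, event: SketchEvent) -> None:
        for callback in self.subscribers:
            callback(event)  # every event is delivered live
        if not event.transient:
            self.history.append(event)  # only non-transient events are stored


stream = SketchStream()
seen: list = []
stream.subscribe(lambda ev: seen.append(ev.name))

stream.send(SketchEvent("TaskStarted"))
stream.send(SketchEvent("TaskProgress", transient=True))
stream.send(SketchEvent("TaskCompleted"))

stored = [ev.name for ev in stream.history]
print(seen)
print(stored)
```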

## Reading the Active Task with `TaskInject`

Inside an `async with agent.task(...)` block, the framework stamps the active Task into `context.dependencies["ag2.task"]`. Two ways to read it:

### Direct access

```python
async with agent.task("work", context=ctx) as task:
    active = ctx.dependencies["ag2.task"]
    assert active is task
```

### `TaskInject` annotation

`TaskInject` is a fast_depends-resolvable annotation that injects the active Task into any function the dependency-injection machinery resolves - most usefully a `@tool` body.

```python
from autogen.beta import tool
from autogen.beta.task import TaskInject

@tool
async def report(message: str, task: TaskInject) -> str:
    if task is None:
        return "no active task"
    await task.progress({"tool_message": message})
    return f"reported on task {task.task_id}"
```

The injection has `default=None`, so always treat `task` as possibly `None` and null-check before use.

## TTL and Expiry

Setting `ttl_seconds=N` populates `metadata.expires_at` but **does not** start a timer. The Task primitive itself never self-expires - that's by design, so a standalone Task with no observer doesn't spawn a background task. Instead, an external observer (e.g. a network hub's TTL sweeper, a periodic watch) checks `expires_at` and calls `task.expire()` when due.

For self-contained TTL behaviour, wire up a sweeper in your application:

```python
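import asyncio
from datetime import datetime, timezone

from autogen.beta.task import Task, TaskState  # assumed import path

async def sweep(task: Task, deadline: datetime) -> None: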
    while task.state == TaskState.RUNNING:
        if datetime.now(timezone.utc) >= deadline:
            await task.expire()
            return
        await asyncio.sleep(1.0)
```
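
One way to run such a sweeper next to the actual work is to schedule it with `asyncio.create_task`. The sketch below is self-contained and uses a hypothetical `SketchTask` with string states instead of the real `Task`; the short poll interval is chosen only to keep the demo fast.

```python
import asyncio
from datetime import datetime, timedelta, timezone


class SketchTask:
    """Hypothetical stand-in exposing only `state` and `expire()`."""

    def __init__(self) -> None:
        self.state = "RUNNING"

    async def expire(self) -> None:
        if self.state == "RUNNING":
            self.state = "EXPIRED"


async def sweep(task: SketchTask, deadline: datetime, interval: float = 0.01) -> None:
    # Poll until the task leaves RUNNING or the deadline passes.
    while task.state == "RUNNING":
        if datetime.now(timezone.utc) >= deadline:
            await task.expire()
            return
        await asyncio.sleep(interval)


async def demo() -> str:
    task = SketchTask()
    deadline = datetime.now(timezone.utc) + timedelta(seconds=0.05)
    sweeper = asyncio.create_task(sweep(task, deadline))
    await asyncio.sleep(0.2)  # simulated work that outlives the TTL
    await sweeper
    return task.state


final_state = asyncio.run(demo())
print(final_state)
```

If the work finishes before the deadline, the loop condition fails on the next poll and the sweeper exits on its own.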

## TaskSpec and TaskMetadata

Two small dataclasses surface around a Task:

- `TaskSpec` - what the task is doing: `title`, `description`, `payload`, optional `capability`. Created by `Agent.task(...)`.
- `TaskMetadata` - mutable lifecycle record updated on each transition: `task_id`, `owner_id`, `spec`, `state`, ISO-8601 timestamps, `progress`, `result`, `error`, optional `session_id`.

```python
async with agent.task("survey", description="probe upstream", payload={"region": "us"}) as task:
    print(task.metadata.spec.title)        # 'survey'
    print(task.metadata.spec.payload)      # {'region': 'us'}
    print(task.metadata.owner_id)          # 'researcher'
    print(task.metadata.started_at)        # ISO 8601 string
```

---

# Sub-task Delegation

Source: https://docs.ag2.ai/latest/docs/beta/task_delegation/

Sub-task delegation allows agents to delegate work to other agents through tool calling. The calling agent's LLM decides when and what to delegate, and each sub-task runs on its own isolated stream with independent history.

## Why Use Subagents

Breaking work across multiple agents gives you:

- **Separation of concerns** - each agent has a focused prompt, tools, and config tuned for its role.
- **Independent context** - sub-tasks run on fresh streams, so history doesn't grow unboundedly.
- **LLM-driven orchestration** - the calling agent decides when to delegate, what context to pass, and how to use the result.

!!! note
    When the LLM returns multiple tool calls in a single response, the framework dispatches them concurrently. Each concurrent sub-task gets its own copy of variables, so they don't interfere with each other.

!!! tip
    For lightweight self-delegation where the parent doesn't need a *named* delegate, opt in to the auto-injected `run_subtask` / `run_subtasks` tools by passing `tasks=TaskConfig(...)` - see [`tasks=` in The Agent Harness](agent_harness.md#tasks-taskconfig). Use `Agent.as_tool()` (below) when you want a distinct, purpose-named tool exposed to the LLM.

## Subagents API

Use `Agent.as_tool()` to make one agent available as a tool for another.

```python
from autogen.beta import Agent
from autogen.beta.config import AnthropicConfig

config = AnthropicConfig(model="claude-sonnet-4-6")

researcher = Agent(
    "researcher",
    prompt="You are a thorough researcher. Provide concise factual findings.",
    config=config,
    tools=[search_tool],
)

writer = Agent(
    "writer",
    prompt="You are a skilled writer. Turn research into clear prose.",
    config=config,
)

coordinator = Agent(
    "coordinator",
    prompt="First delegate research, then pass findings to the writer.",
    config=config,
    tools=[
        researcher.as_tool(description="Research a topic and return findings."),
        writer.as_tool(description="Write an article. Pass research notes in the context parameter."),
    ],
)

reply = await coordinator.ask("Write a short article about the history of Python.")
print(await reply.content())
```

The coordinator's LLM sees two tools - `task_researcher` and `task_writer` - and calls them as needed. Each call spawns the target agent on a fresh stream, runs it to completion, and returns the result.

Each delegate tool is named `task_{agent.name}` by default and takes two parameters: `objective` (required) and `context` (optional).

The `context` tool parameter is how the calling LLM shares relevant information with the sub-task:

```python
task_writer(
    objective="Write an article about Python's history",
    context="Key findings: Created by Guido van Rossum in 1991. Named after Monty Python."
)
```

`as_tool()` accepts these parameters:

| Parameter | Type | Description |
|---|---|---|
| `description` | `str` | Tool description shown to the LLM (required) |
| `name` | `str \| None` | Override the default `task_{agent.name}` tool name |
| `stream` | `StreamFactory \| None` | Factory to create custom streams for sub-tasks (see [Sub-Task Streams](#sub-task-streams)) |
| `middleware` | `Iterable[ToolMiddleware]` | Tool middleware applied to the delegate tool (e.g., `approval_required`) |

You can also use `subagent_tool()` directly for more control:

```python
from autogen.beta.tools.subagents import subagent_tool

coordinator = Agent(
    "coordinator",
    config=config,
    tools=[
        subagent_tool(researcher, description="Research a topic."),
    ],
)
```

## Self-Delegation

An agent can delegate to itself to break complex work into independent sub-tasks. Each sub-task runs as a fresh copy of the agent with its own stream and history.

```python
analyst = Agent(
    "analyst",
    prompt=(
        "You have search and sub_task tools. "
        "Only use sub_task when the task has clearly independent parts. "
        "Otherwise handle it directly with search."
    ),
    config=config,
    tools=[search_tool],
)

analyst.add_tool(
    analyst.as_tool(
        description="Break work into a focused sub-task for independent analysis.",
        name="sub_task",
    )
)

reply = await analyst.ask("Compare Python vs Rust for web APIs: performance, DX, and ecosystem.")
```

The analyst's LLM may call `sub_task` multiple times - one per aspect - then synthesise the results.

## Dynamic Agents

`dynamic_agent()` lets the calling LLM **construct** an ephemeral agent at runtime - picking a name, system prompt, and a subset of available tools per objective - instead of pre-defining each delegate as a named `as_tool()`. Use it when the orchestrator should compose a focused worker on demand for each sub-task.

```python
from autogen.beta import Agent
from autogen.beta.tools.dynamic import dynamic_agent
from autogen.beta.config import OpenAIConfig

config = OpenAIConfig(model="gpt-4o-mini")

orchestrator = Agent(
    "orchestrator",
    config=config,
    prompt=(
        "You orchestrate sub-tasks by calling create_and_run_agent. "
        "For each task, invent a focused agent name and system prompt, "
        "include only the tools the child genuinely needs, "
        "and pass a clear objective."
    ),
    tools=[
        dynamic_agent(available_tools=[calc, web_search], config=config),
    ],
)

reply = await orchestrator.ask("What is 17 * 25 + 4? Use a child agent.")
```

The orchestrator's LLM sees one tool, `create_and_run_agent(spec, objective)`. Each call spawns an ephemeral child `Agent` on a fresh stream, runs the objective via `run_task`, and returns the reply string.

`dynamic_agent()` accepts these parameters:

| Parameter | Type | Description |
|---|---|---|
| `available_tools` | `Iterable[Tool \| Callable[..., Any]]` | Pool of tools the spawned child may pick from by name |
| `config` | `ModelConfig` | Model configuration used for every spawned child |
| `middleware` | `Iterable[ToolMiddleware]` | Tool middleware applied to `create_and_run_agent` |

### The AgentSpec the LLM constructs

The LLM builds an `AgentSpec` on every call. It is JSON-serializable and captures the declarative parts of the child:

| Field | Type | Purpose |
|---|---|---|
| `name` | `str` | Display name of the spawned child |
| `prompt` | `list[str]` | System prompt for the child |
| `tool_names` | `list[str]` | Subset of `available_tools` names to give the child |
| `response_schema` | `ResponseSchemaSpec \| None` | Optional structured output schema |

### How the LLM discovers available tool names

The pool's names and descriptions are **rendered into the `create_and_run_agent` tool description automatically** when the factory is built. The calling agent's system prompt does not need to enumerate them - the LLM discovers the valid names from the tool schema.

!!! tip
    If the LLM picks a name not in the pool, the framework returns `Error: unknown tools [...]. Available: [...]` as a recoverable string. The LLM reads the hint and retries with a corrected spec - no exception propagates to the caller. The auto-rendered menu prevents this in the common case.

!!! warning
    Spawned children are themselves constructed **without** `dynamic_agent`, so they cannot recursively spawn further dynamic agents. Recursion is structurally impossible - no depth limit needed.
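
The recoverable-error behaviour for unknown tool names can be sketched as a validation step that returns a string instead of raising. `select_tools` is a hypothetical helper; the exact rendering of the two lists in the message is an assumption based on the format quoted above.

```python
from typing import Union


def select_tools(pool: dict, tool_names: list) -> Union[list, str]:
    """Return the requested tools, or an error string the LLM can read."""
    unknown = [name for name in tool_names if name not in pool]
    if unknown:
        # Returned as text (not raised) so the caller can retry with a fixed spec.
        return f"Error: unknown tools {unknown}. Available: {sorted(pool)}"
    return [pool[name] for name in tool_names]


# Hypothetical pool: names mapped to placeholder callables.
pool = {"calc": lambda expr: expr, "web_search": lambda query: query}

ok = select_tools(pool, ["calc"])
bad = select_tools(pool, ["calc", "browser"])
print(bad)
```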

## Sub-Task Streams

### Default Behavior

By default, each sub-task creates a fresh `MemoryStream`. The sub-task's history is isolated - it doesn't carry over between invocations.

This means a subagent has no knowledge of previous calls or their results. It sees only the current call and the `context` passed with it.

| What | Behavior | Why |
|---|---|---|
| **Dependencies** | Copied | Isolated - child mutations don't affect parent |
| **Variables** | Copied; synced back on success | Concurrent-safe - user variable mutations propagate back |
| **History** | Fresh stream | Clean context - the LLM passes relevant info via `context` parameter |
| **Depth counter** | Incremented in child; excluded from sync-back | Internal bookkeeping - never leaks to parent |
| **Agent prompt, tools, config** | Inherited | The sub-agent brings its own capabilities |
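
The copy and sync-back rules in the table can be sketched with plain dictionaries. Everything here is illustrative: `run_subtask_sketch`, the `_depth` key, and the shallow copies are assumptions, not the framework's internals.

```python
DEPTH_KEY = "_depth"  # hypothetical name for the internal depth counter


def run_subtask_sketch(variables: dict, dependencies: dict, work) -> None:
    child_vars = dict(variables)     # copied: concurrent sub-tasks don't collide
    child_deps = dict(dependencies)  # copied: child changes stay in the child
    child_vars[DEPTH_KEY] = child_vars.get(DEPTH_KEY, 0) + 1
    work(child_vars, child_deps)     # if this raises, nothing is synced back
    for key, value in child_vars.items():
        if key != DEPTH_KEY:         # the depth counter never leaks to the parent
            variables[key] = value


variables = {"topic": "python"}
dependencies = {"db": "primary"}


def work(child_vars: dict, child_deps: dict) -> None:
    child_vars["findings"] = 3    # propagates back on success
    child_deps["db"] = "scratch"  # stays local to the child


run_subtask_sketch(variables, dependencies, work)
print(variables)
print(dependencies)
```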

### Persistent Stream

`persistent_stream()` gives the same agent a consistent stream across multiple invocations within a context. The sub-task's history accumulates across calls rather than starting fresh each time:

```python
from autogen.beta.tools.subagents import persistent_stream

researcher.as_tool(
    description="Research a topic",
    stream=persistent_stream(),
)
```

It stores the stream ID in `context.dependencies` keyed by `f"ag:{agent.name}:stream"` and reuses the parent stream's storage backend. This is useful when the sub-agent benefits from seeing its own prior work - for example, a researcher that should avoid repeating searches.
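
The keying scheme can be pictured as a lookup-or-create step on a dependencies dictionary. This is a rough sketch only: `persistent_stream_id` is a hypothetical helper, and using a random UUID as the stream ID is an assumption.

```python
import uuid


def persistent_stream_id(agent_name: str, dependencies: dict) -> str:
    """Return the same stream ID for an agent on every call within a context."""
    key = f"ag:{agent_name}:stream"
    if key not in dependencies:
        dependencies[key] = uuid.uuid4().hex  # first call: mint a new ID
    return dependencies[key]


deps: dict = {}
first = persistent_stream_id("researcher", deps)
second = persistent_stream_id("researcher", deps)
other = persistent_stream_id("writer", deps)
print(first == second, first == other)
```

Because the key includes the agent name, two different delegates sharing one context would still get separate histories.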

### Custom Factory

For full control, pass any callable matching `StreamFactory = Callable[[Agent, Context], Stream]`:

```python
from autogen.beta import Agent, Context
from autogen.beta.streams.redis import RedisStream

def make_redis_stream(agent: Agent, ctx: Context) -> RedisStream:
    return RedisStream(MY_REDIS_URL, prefix=f"ag2:sub:{agent.name}")

researcher.as_tool(
    description="Research a topic",
    stream=make_redis_stream,
)
```

---

# Skills

Source: https://docs.ag2.ai/latest/docs/beta/skills/

Skills let an agent load specialized instructions on demand instead of carrying every capability in its system prompt. They follow the [agentskills.io](https://agentskills.io) convention: each skill is a directory that an agent discovers, reads, and runs only when a task actually calls for it. Skills can also be [defined inline in code](#code-defined-skills) when a directory on disk isn't a good fit.

AG2 ships three entry points, from highest-level to lowest:

| Entry point | Use it when |
| :--- | :--- |
| [`SkillPlugin`](#skillplugin) | **Recommended.** Injects the catalog into the system prompt and wires the activation tools, so the model knows what's available from the first turn. |
| [`SkillsToolkit`](#skillstoolkit) | You want the activation tools (plus an explicit `list_skills`) without prompt injection, or full control over runtimes. |
| [`SkillSearchToolkit`](#skillsearchtoolkit) | The agent should discover and install new skills from the [skills.sh](https://skills.sh) registry at runtime. |

## Skill structure

A skill is a directory with a `SKILL.md` at its root and, optionally, scripts and resource files:

```text
pdf-processing/
├── SKILL.md            # required: YAML frontmatter + instructions
├── scripts/            # optional: runnable .py / .sh files
│   └── extract.py
└── references/         # optional: any bundled resource files
    └── form-fields.md
```

`SKILL.md` carries YAML frontmatter (`name`, `description`, and optionally `version`, `license`, `compatibility`) followed by the instruction body. For the full authoring format, see the [agentskills.io documentation](https://agentskills.io/).

When surfacing a skill's files, AG2 draws one firm line:

- A **script** lives under `scripts/` and is *executed* via `run_skill_script`.
- A **resource** is any other file (not `SKILL.md`, not under `scripts/`) and is *read* via `read_skill_resource`.

The two are disjoint - a file is one or the other, never both.
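
The script/resource split can be expressed as a small path classifier. `classify` is a hypothetical helper working on paths relative to the skill directory; it only encodes the rule stated above and ignores details such as file extensions.

```python
from pathlib import PurePosixPath


def classify(relative_path: str) -> str:
    """Classify a file inside a skill directory by its relative path."""
    parts = PurePosixPath(relative_path).parts
    if parts == ("SKILL.md",):
        return "instructions"  # the skill's own entry file
    if len(parts) > 1 and parts[0] == "scripts":
        return "script"        # executed, never read as a resource
    return "resource"          # everything else is read-only content


kinds = {
    path: classify(path)
    for path in ("SKILL.md", "scripts/extract.py", "references/form-fields.md")
}
print(kinds)
```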

## Progressive disclosure

Skills exist to keep context small. The agent only pays the token cost of the detail it actually uses, in three tiers:

| Tier | What's loaded | When |
| :--- | :--- | :--- |
| **1. Catalog** | Name + description + location, per skill | At startup (or via `list_skills`) |
| **2. Instructions** | The full `SKILL.md` body | When the model calls `load_skill` |
| **3. Resources & scripts** | Bundled files and script output | When the instructions reference them (`read_skill_resource` / `run_skill_script`) |

An agent with 20 installed skills doesn't load 20 full instruction sets upfront - only the ones a given conversation activates.

## SkillPlugin

`SkillPlugin` is the recommended way to give an agent local skills. Instead of spending a `list_skills` tool round-trip, it injects the catalog into the system prompt at startup, so the model can decide which skill is relevant immediately.

```python
from autogen.beta import Agent
from autogen.beta.config import AnthropicConfig
from autogen.beta.tools import SkillPlugin

agent = Agent(
    "assistant",
    config=AnthropicConfig(model="claude-sonnet-4-6"),
    plugins=[SkillPlugin()],
)
```

By default it scans `.agents/skills` relative to the current working directory. Pass a path or a `LocalRuntime` to point elsewhere:

```python
plugins=[SkillPlugin("./my-skills")]
```

### Activation flow

1. **Startup** - the plugin discovers every skill and injects an `<available_skills>` block into the system prompt. Each entry lists the skill's `name`, `description`, and `location`:

    ```xml
    <available_skills>
      <skill>
        <name>pdf-processing</name>
        <description>Extract PDF text, fill forms, merge files. Use when handling PDFs.</description>
        <location>/home/user/.agents/skills/pdf-processing/SKILL.md</location>
      </skill>
    </available_skills>
    ```

2. **Load** - when a task matches a description, the model calls `load_skill(name)`. The `name` parameter is constrained to the discovered skills, so the model can't invent one. The tool returns the `SKILL.md` body wrapped in `<skill_content>`, along with the skill directory and a listing of bundled resources.

3. **Use resources** - the model reads a listed resource with `read_skill_resource(name, resource)` or executes a script with `run_skill_script(name, script, args)` only when the instructions call for it.

### Capability gating

`SkillPlugin` only registers the activation tools the installed skills can actually use:

- `load_skill` is always registered (when at least one skill exists).
- `read_skill_resource` is registered **only if some skill has resources**.
- `run_skill_script` is registered **only if some skill has scripts**.

So an agent whose skills are pure instructions never sees a `run_skill_script` tool it can't use. When no skills are found at all, the plugin contributes nothing - no catalog and no tools.
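
The gating rules amount to a few `any()` checks over the discovered skills. The sketch below uses a hypothetical `SketchSkill` record and returns tool names as strings; it illustrates the rule, not how the plugin builds its tools.

```python
from dataclasses import dataclass


@dataclass
class SketchSkill:
    name: str
    has_resources: bool = False
    has_scripts: bool = False


def gated_tools(skills: list) -> list:
    """Return the activation tool names the given skills can actually use."""
    if not skills:
        return []  # no skills: no catalog and no tools
    tools = ["load_skill"]
    if any(skill.has_resources for skill in skills):
        tools.append("read_skill_resource")
    if any(skill.has_scripts for skill in skills):
        tools.append("run_skill_script")
    return tools


none_installed = gated_tools([])
instructions_only = gated_tools([SketchSkill("style-guide")])
full = gated_tools([
    SketchSkill("style-guide"),
    SketchSkill("pdf-processing", has_resources=True, has_scripts=True),
])
print(none_installed, instructions_only, full)
```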

!!! tip
    `SkillPlugin` is a snapshot taken at construction time: the catalog, the `name` constraint, and the gated tools always describe the same set of skills. Rebuild the agent (or the plugin) after installing new skills.

## SkillsToolkit

`SkillsToolkit` is the lower-level building block behind `SkillPlugin`. It exposes the same activation tools plus an explicit `list_skills` tool, but does **not** inject anything into the prompt - the model discovers skills by calling `list_skills` itself. Prefer [`SkillPlugin`](#skillplugin) unless you need that control.

```python
from autogen.beta import Agent
from autogen.beta.config import AnthropicConfig
from autogen.beta.tools import SkillsToolkit

agent = Agent(
    "assistant",
    config=AnthropicConfig(model="claude-sonnet-4-6"),
    tools=[SkillsToolkit()],
)
```

It exposes four tools:

| Tool | Description |
| :--- | :--- |
| `list_skills` | Return a catalog of installed skills with name, description, and location |
| `load_skill` | Fetch the full `SKILL.md` content for a specific skill |
| `read_skill_resource` | Read a bundled resource file from a skill's directory |
| `run_skill_script` | Execute a `.py` or `.sh` script from a skill's `scripts/` directory |

Every tool is also available as a method, so you can hand-pick a subset:

```python
skills = SkillsToolkit()

agent = Agent(
    "assistant",
    config=AnthropicConfig(model="claude-sonnet-4-6"),
    tools=[skills.list_skills(), skills.load_skill()],
)
```

### Runtimes

A **runtime** owns where skills live and how their scripts run - it discovers skills, reads their content, and executes their scripts. `LocalRuntime` is the default: it backs skills with the filesystem and runs scripts in a local subprocess (or a sandbox you supply). Pass one to a toolkit or plugin as a path string or an explicit `LocalRuntime`:

```python
from autogen.beta.tools import SkillsToolkit
from autogen.beta.tools.skills import LocalRuntime

skills = SkillsToolkit(LocalRuntime("./my-skills"))
# or just a path string
skills = SkillsToolkit("./my-skills")
```

`LocalRuntime` takes the install directory plus optional execution and discovery settings. `extra_paths` adds read-only directories that are scanned for skills but never written to - installed skills always go to the primary `dir`:

```python
skills = SkillsToolkit(
    LocalRuntime(
        "./my-skills",
        extra_paths=["./shared-skills"],   # read-only, also scanned
        timeout=30,                        # per-script timeout (seconds)
        blocked=["rm -rf"],                # best-effort command blocklist
    )
)
```

## Composing multiple runtimes

`SkillPlugin` and `SkillsToolkit` both accept **more than one runtime**, so you can serve skills from several locations at once - each with its own configuration. A common pattern is a read-only global library plus a writable project directory:

```python
from autogen.beta.tools import SkillPlugin
from autogen.beta.tools.skills import LocalRuntime

plugins = [
    SkillPlugin(
        LocalRuntime("~/.agents/skills"),   # global
        LocalRuntime(".agents/skills"),     # project
    ),
]
```

When the same skill name exists in more than one runtime, the **last runtime wins** - here the project skill shadows the global one. The rule is applied uniformly: the catalog, the `name` constraint, `load_skill`, `read_skill_resource`, and `run_skill_script` all resolve to the same winning skill.
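
To make the precedence rule concrete, here is a minimal model that treats each runtime as a plain mapping from skill name to location. `resolve_skills` is a hypothetical helper written for this illustration; the real runtimes carry far more state than a dictionary.

```python
def resolve_skills(*runtimes: dict[str, str]) -> dict[str, str]:
    """Merge name -> location maps in order; later runtimes shadow earlier ones."""
    resolved: dict[str, str] = {}
    for runtime in runtimes:
        resolved.update(runtime)
    return resolved


global_skills = {"lint": "~/.agents/skills/lint", "pdf": "~/.agents/skills/pdf"}
project_skills = {"lint": ".agents/skills/lint"}

winners = resolve_skills(global_skills, project_skills)
print(winners["lint"])  # .agents/skills/lint  (project shadows global)
print(winners["pdf"])   # ~/.agents/skills/pdf (only defined globally)
```

Because every lookup goes through the same merged view, the catalog and the activation tools cannot disagree about which copy of a skill is in effect.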

!!! note
    Because each runtime can carry its own execution and storage settings (sandbox, timeout, install directory), composition lets global skills run one way and project skills another - something a single runtime can't express.

## Code-defined skills

Not every skill needs to live on disk. A `MemorySkill` defines a skill **inline in code** - its instructions, resources, and scripts are Python values rather than files. It's backed by an in-memory runtime (`MemoryRuntime`) instead of `LocalRuntime`, but the agent activates it through the same progressive-disclosure flow.

```python
from autogen.beta import Agent
from autogen.beta.config import AnthropicConfig
from autogen.beta.tools import SkillPlugin, MemorySkill

unit_converter = MemorySkill(
    name="unit-converter",
    description="Convert between common units. Use when asked to convert miles, kilometers, pounds, or kilograms.",
    instructions="Use the convert script, passing the value and a factor from the conversion_table resource.",
)

@unit_converter.resource
def conversion_table() -> str:
    """Multiplication factors for common conversions."""
    return "miles->km: 1.60934\npounds->kg: 0.453592"

@unit_converter.script
def convert(value: float, factor: float) -> str:
    """Multiply a value by a conversion factor."""
    return str(round(value * factor, 4))

agent = Agent(
    "assistant",
    config=AnthropicConfig(model="claude-sonnet-4-6"),
    plugins=[SkillPlugin(unit_converter)],
)
```

Pass a `MemorySkill` straight to `SkillPlugin` (or `SkillsToolkit`) - it is wrapped in a `MemoryRuntime` automatically and appears in the catalog alongside any file-based skills.

### Resources and scripts as callables

The `@skill.resource` and `@skill.script` decorators register Python callables (sync or async). By default the **function name** becomes the resource/script name and the **docstring** becomes its description - pass `name=` or `description=` only to override:

- A **resource** callable runs every time it is read, so it can return live data - current config, a roster, a database lookup - rather than a static file.
- A **script** callable runs **in-process**: no subprocess, no `scripts/` directory. Its parameter JSON-schema is generated from the signature and disclosed inside the loaded skill content, so the model calls `run_skill_script` with named arguments (`{"value": 10, "factor": 2}`) matching that schema. Arguments are validated and coerced exactly as a regular tool's are.
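
For intuition about how a parameter schema can be derived from a signature, the sketch below uses the standard library's `inspect` module. `schema_from_signature` and its type table are simplifications invented for this example; the schema AG2 actually generates may differ in shape and detail.

```python
import inspect

# Simplified mapping from Python annotations to JSON Schema type names
_JSON_TYPES = {int: "integer", float: "number", str: "string", bool: "boolean"}


def schema_from_signature(func) -> dict:
    """Build a minimal JSON-schema-like description of a callable's parameters."""
    properties = {}
    required = []
    for name, param in inspect.signature(func).parameters.items():
        properties[name] = {"type": _JSON_TYPES.get(param.annotation, "string")}
        if param.default is inspect.Parameter.empty:
            required.append(name)  # no default value, so the argument is required
    return {"type": "object", "properties": properties, "required": required}


def convert(value: float, factor: float) -> str:
    """Multiply a value by a conversion factor."""
    return str(round(value * factor, 4))


schema = schema_from_signature(convert)
print(schema["properties"])  # both parameters map to "number"

# Named arguments that match the schema can be passed straight through
print(convert(**{"value": 10, "factor": 2}))  # 20
```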

Both forms support dependency injection - a callable can declare `Context`, `Variable`, or `Inject` parameters and they resolve from the live run context, just like a tool:

```python
from typing import Annotated
from autogen.beta import Variable
from autogen.beta.tools import MemorySkill

project = MemorySkill(name="project-info", description="Project status and configuration.")

@project.resource
def environment(region: Annotated[str, Variable("region")]) -> str:
    return f"Region: {region}"
```

### Composing with file-based skills

A `MemorySkill` composes with paths and runtimes like any other source - declaration order decides precedence, and the last source wins on a name clash:

```python
plugins=[SkillPlugin(".agents/skills", unit_converter, project)]
```

Grouping is associative: passing loose `MemorySkill`s, wrapping each in its own `MemoryRuntime`, or grouping several into one `MemoryRuntime(...)` all behave identically.

!!! note
    `MemoryRuntime` is read-only - its skills are defined in code, so it cannot be an install target for `SkillSearchToolkit`.

## SkillSearchToolkit

`SkillSearchToolkit` extends `SkillsToolkit` with three tools for discovering and installing skills from the [skills.sh](https://skills.sh) registry. It uses the GitHub Tarball API directly - no Node.js required.

```python
import asyncio
from autogen.beta import Agent
from autogen.beta.config import AnthropicConfig
from autogen.beta.tools import SkillSearchToolkit

agent = Agent(
    "coder",
    "You are a helpful coding assistant. Use skills to extend your capabilities.",
    config=AnthropicConfig(model="claude-sonnet-4-6"),
    tools=[SkillSearchToolkit()],  # adds search_skills, install_skill, remove_skill
)

async def main() -> None:
    reply = await agent.ask(
        "Find and install a skill for React best practices, then tell me the top 3 rules."
    )
    print(await reply.content())

asyncio.run(main())
```

It inherits all of `SkillsToolkit`'s tools and adds:

| Tool | Description |
| :--- | :--- |
| `search_skills` | Search the skills.sh registry by keyword |
| `install_skill` | Download and install a skill by its registry identifier |
| `remove_skill` | Remove an installed skill by name |

### GitHub rate limits

By default the GitHub API allows 60 unauthenticated requests per hour. Setting a `GITHUB_TOKEN` environment variable raises this to 5,000 per hour:

```bash
export GITHUB_TOKEN=ghp_...
```

You can also pass the token (and other settings) directly via `SkillsClientConfig`:

```python
from autogen.beta.tools import SkillSearchToolkit
from autogen.beta.tools.skills import LocalRuntime, SkillsClientConfig

skills = SkillSearchToolkit(
    LocalRuntime(dir="./my-skills", timeout=30),
    client=SkillsClientConfig(
        github_token="ghp_...",
        proxy="http://proxy.company.com:8080",
    ),
)
```

!!! note
    `SkillSearchToolkit` installs into a single runtime. To combine installed skills with skills from other locations, serve them through a `SkillPlugin` or `SkillsToolkit` that lists multiple runtimes.

---

# Depends

Source: https://docs.ag2.ai/latest/docs/beta/depends/

# Depends

The `Depends` mechanism allows you to calculate and inject dependencies dynamically at execution time.

The key difference from [Dependency Injection](../context/inject){.internal-link} is the execution model. `Inject` retrieves static objects or configuration that already exist (like an existing database connection or API key). `Depends`, on the other hand, executes a callable *during* the tool's invocation to resolve the dependency.

Under the hood, `Depends` uses the exact same mechanism and design philosophy as [FastAPI's dependency injection system](https://fastapi.tiangolo.com/tutorial/dependencies/).

## Side-execution

You can use `Depends` to execute side effects before your tool runs, even if your tool doesn't need the dependency's return value. This is useful for things like authentication, logging, or permission verification.

To do this, simply declare the dependency in your tool's signature. The framework will execute it, and you can safely ignore the injected value.

```python
from typing import Annotated
from autogen.beta import Depends, tool

class PermissionDenied(Exception):
    """Raised when the user may not perform the action."""

def verify_permissions(user_id: int) -> None:
    # Perform complex verification here; this check is only a placeholder
    if user_id <= 0:
        raise PermissionDenied(user_id)

@tool
def delete_user(
    user_id: int,
    # The dependency is executed, acting as a gatekeeper
    auth: Annotated[None, Depends(verify_permissions)]
) -> str:
    return f"User {user_id} deleted."
```

!!! note "Sync/Async"
    `Depends` can be used with both synchronous and asynchronous functions.

## Depends with yield

Just like in [FastAPI](https://fastapi.tiangolo.com/), you can create dependencies that use `yield` instead of `return`. This allows you to execute "teardown" or "cleanup" code *after* the tool has finished executing.

This is the recommended approach for managing resource lifecycles, such as opening and closing database sessions or file handlers.

```python
from typing import Annotated
from autogen.beta import Depends, tool

def get_db_session():
    print("Opening database session...")
    session = "db_session_object"

    # The tool execution happens here
    yield session

    # This runs after the tool finishes
    print("Closing database session...")

@tool
def fetch_records(
    db: Annotated[str, Depends(get_db_session)],
) -> str:
    return "Records fetched."
```
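
The setup and teardown order can be reproduced with a plain generator. The sketch below is a framework-free model of the lifecycle, not the AG2 implementation: `call_with_dependency` is a hypothetical helper, and real resolvers also handle async generators and exception propagation.

```python
import inspect


def call_with_dependency(tool_fn, dependency):
    """Resolve `dependency`, call `tool_fn` with its value, then run teardown."""
    if inspect.isgeneratorfunction(dependency):
        gen = dependency()
        value = next(gen)  # run setup code up to the `yield`
        try:
            return tool_fn(value)
        finally:
            next(gen, None)  # resume after the `yield` to run teardown
    return tool_fn(dependency())


log: list[str] = []


def get_db_session():
    log.append("open")
    yield "db_session_object"
    log.append("close")


def fetch_records(db: str) -> str:
    log.append(f"tool ran with {db}")
    return "Records fetched."


result = call_with_dependency(fetch_records, get_db_session)
print(log)  # ['open', 'tool ran with db_session_object', 'close']
```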

### Combining Depends and Inject

A powerful pattern is to combine `Depends` with `Inject`. You can use `Inject` to retrieve a static configuration or persistent resource (like a database connection pool), and then use `Depends` to manage a short-lived resource (like a database session) based on that configuration.

```python
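from collections.abc import Iterator
from typing import Annotated
from autogen.beta import Depends, Inject, tool, Agent

# `Pool` and `Session` stand in for your application's own types
def get_db_session(
    db_pool: Annotated[Pool, Inject("database_pool")],
) -> Iterator[Session]: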
    session = db_pool.acquire()
    yield session
    session.release()

@tool
def fetch_records(
    db_session: Annotated[object, Depends(get_db_session)],
) -> str:
    return "Records fetched."

agent = Agent(
    "TestAgent",
    tools=[fetch_records],
    dependencies={"database_pool": Pool()},
)
```

## Dependencies caching

By default, if multiple parameters in your tool (or multiple sub-dependencies) depend on the exact same `Depends` function, the framework will only execute that function **once** per tool call. The result is cached and reused for any subsequent injections within that specific execution step.

```python
def get_expensive_config() -> dict:
    print("Calculating config...") # This will only print once!
    return {"timeout": 30}

def get_timeout(
    config: Annotated[dict, Depends(get_expensive_config)],
) -> int:
    return config["timeout"]

@tool
def process_data(
    timeout: Annotated[int, Depends(get_timeout)],
    # cached dependency
    config: Annotated[dict, Depends(get_expensive_config)],
) -> str:
    return "Done"
```

If you explicitly want the dependency to be re-calculated every single time it is injected, you can disable the cache by passing `use_cache=False`:

```python
@tool
def random_tool(
    val1: Annotated[int, Depends(get_random_number, use_cache=False)],
    val2: Annotated[int, Depends(get_random_number, use_cache=False)]
) -> str:
    # val1 and val2 will be different numbers
    pass
```
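
The caching rule and the `use_cache=False` opt-out can be modeled with a small per-call resolver. This is a toy sketch to show the behavior, with a hypothetical `Resolver` class; it is not how AG2 resolves `Depends` internally.

```python
import itertools


class Resolver:
    """Toy dependency resolver; create one per tool call so the cache is per call."""

    def __init__(self) -> None:
        self._cache: dict = {}

    def resolve(self, func, use_cache: bool = True):
        if use_cache and func in self._cache:
            return self._cache[func]  # reuse the value computed earlier in this call
        value = func(self)
        if use_cache:
            self._cache[func] = value
        return value


calls: list[str] = []
counter = itertools.count(1)


def get_expensive_config(resolver: Resolver) -> dict:
    calls.append("config")
    return {"timeout": 30}


def get_timeout(resolver: Resolver) -> int:
    # Sub-dependency: shares the cached config with the tool's own parameter
    return resolver.resolve(get_expensive_config)["timeout"]


def get_next_number(resolver: Resolver) -> int:
    return next(counter)


resolver = Resolver()
timeout = resolver.resolve(get_timeout)
config = resolver.resolve(get_expensive_config)
print(timeout, calls)  # 30 ['config']  (config computed once)

val1 = resolver.resolve(get_next_number, use_cache=False)
val2 = resolver.resolve(get_next_number, use_cache=False)
print(val1, val2)  # 1 2  (re-executed on every injection)
```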

## Dependencies Overrides

During testing, you often need to mock or override complex dependencies (like replacing a production database with a mock test database).

You can easily override any `Depends` function at the agent level using the `dependency_provider`. When the agent executes, it will automatically route all requests for the original dependency to your override function.

```python
from typing import Annotated
from autogen.beta import Agent, Depends, tool

def get_production_db():
    raise Exception("Do not call this in tests!")

@tool
def read_data(db: Annotated[object, Depends(get_production_db)]) -> str:
    return "Data"

agent = Agent("TestAgent", tools=[read_data])

# Create a mock function
def get_test_db():
    return "mock_database"

# Override the production dependency with the test dependency
agent.dependency_provider.override(get_production_db, get_test_db)

# When the tool is called, it will use `get_test_db` instead
await agent.ask("Read some data")
```

To override `Inject` dependencies, you can just set `dependencies={...}` in the `ask` call.

```python
agent = Agent("TestAgent", tools=[read_data])

# Override the production `Inject` dependency with the test dependency
await agent.ask(
    "Read some data",
    dependencies={"database_pool": Pool()},
)
```

---

# Middleware

Source: https://docs.ag2.ai/latest/docs/beta/middleware/

# Beta Middleware

Middleware lets you intercept and customize how an **AG2 Beta** agent runs a turn.
It's the right tool when you want to add cross-cutting behavior such as logging, retries, history trimming, request mutation, tool auditing, or guardrails without changing the agent, model client, or tools themselves.

At a high level, middleware can wrap four parts of the runtime:

- the full agent turn
- each LLM call
- each tool execution
- each human input request

This makes it a good fit for behavior that should apply consistently across many runs.

## What is Middleware

Middleware is an object that receives the current turn's initial event and `Context`, then participates in one or more lifecycle hooks.

Each middleware instance is created at the beginning of a turn and can keep per-turn state on `self`.
That same instance can then observe or modify the turn, the LLM call, tool execution, and human input as the run progresses.

In practice, you use middleware to:

- add observability such as logging, tracing, and timing
- enforce policies before a tool runs
- retry transient model failures
- trim conversation history before sending it to the model
- normalize tool inputs or outputs
- short-circuit or reshape a response

## Middleware Hooks

`BaseMiddleware` exposes four async hooks.
You can implement just one of them or mix several in the same class.

### `on_turn()`

```python
class BaseMiddleware:
    async def on_turn(
        self,
        call_next: Callable[[BaseEvent, Context], Awaitable[ModelResponse]],
        event: BaseEvent,
        context: Context,
    ) -> ModelResponse:
        return await call_next(event, context)
```

`on_turn()` wraps the whole agent turn.
It receives the incoming event and returns the final `ModelResponse`.

Use `on_turn()` when you want to:

- measure total turn latency
- inspect or rewrite the initial request before anything else happens
- inspect or rewrite the final response before it is returned
- implement turn-level policies, approvals, or short-circuit behavior

Conceptually, this is the outermost hook around a single `ask(...)` call.

### `on_llm_call()`

```python
class BaseMiddleware:
    async def on_llm_call(
        self,
        call_next: Callable[[Sequence[BaseEvent], Context], Awaitable[ModelResponse]],
        events: Sequence[BaseEvent],
        context: Context,
    ) -> ModelResponse:
        return await call_next(events, context)
```

`on_llm_call()` wraps the call to the configured model client.
It receives the event history that will be sent to the LLM.

Use `on_llm_call()` when you want to:

- retry transient client failures
- log prompts and responses
- trim history before it reaches the model
- sanitize the context or the model response
- inject additional request-time instructions through event mutation
- implement caching or request deduplication around model calls

This is the hook used by built-in history and token limiting middleware.

### `on_tool_execution()`

```python
class BaseMiddleware:
    async def on_tool_execution(
        self,
        call_next: Callable[[ToolCallEvent, Context], Awaitable[ToolResultType]],
        event: ToolCallEvent,
        context: Context,
    ) -> ToolResultType:
        return await call_next(event, context)
```

`on_tool_execution()` wraps each tool invocation triggered during the turn.
It receives the current `ToolCallEvent` and returns a `ToolResultType`.

Use `on_tool_execution()` when you want to:

- validate or rewrite tool arguments before execution
- log tool usage
- transform tool results before they go back into the event stream
- capture tool failures and replace them with safer fallback results
- enforce access control around specific tools

For wrapping **one** tool at definition time with async hooks, see [Tool middleware](tools/tool_middleware.md).

### `on_human_input()`

```python
class BaseMiddleware:
    async def on_human_input(
        self,
        call_next: Callable[[HumanInputRequest, Context], Awaitable[HumanMessage]],
        event: HumanInputRequest,
        context: Context,
    ) -> HumanMessage:
        return await call_next(event, context)
```

`on_human_input()` wraps each human-in-the-loop (HITL) request triggered during a turn.
It receives the `HumanInputRequest` emitted by a tool via `ctx.input(...)` and can intercept or modify both the request and the `HumanMessage` response.

Use `on_human_input()` when you want to:

- log or audit human input requests and responses
- rewrite or enrich the prompt shown to the human
- transform the human's reply before it reaches the tool
- short-circuit the request with an automated response instead of asking a human
- enforce policies or rate limits on human input requests

## Registering Middleware

### On an Agent

To make middleware apply to every turn for an agent, pass it through the `middleware` argument when constructing the agent.

```python
from autogen.beta import Agent
from autogen.beta.config import OpenAIConfig
from autogen.beta.middleware import LoggingMiddleware, RetryMiddleware

agent = Agent(
    "assistant",
    prompt="Be helpful.",
    config=OpenAIConfig("gpt-4o-mini"),
    middleware=[
        LoggingMiddleware(),
        RetryMiddleware(max_retries=2),
    ],
)
```

Use agent-level registration for behavior that should always be present, such as logging, tracing, or default retry policy.

### On a Single Call

You can also add middleware just for a specific turn.
This is useful when you want temporary behavior without changing the agent's defaults.

Both `Agent.ask(...)` and `AgentReply.ask(...)` accept a `middleware` argument.

```python
from autogen.beta import Agent
from autogen.beta.config import OpenAIConfig
from autogen.beta.middleware import LoggingMiddleware, TokenLimiter

agent = Agent(
    "assistant",
    prompt="Be helpful.",
    config=OpenAIConfig("gpt-4o-mini"),
)

reply = await agent.ask(
    "Summarize the latest messages.",
    middleware=[LoggingMiddleware()],
)

next_turn = await reply.ask(
    "Now answer in one paragraph.",
    middleware=[TokenLimiter(max_tokens=4000)],
)
```

Call-level middleware is appended after the middleware list defined on the agent.

## Middleware Ordering

Middleware runs in the order you register it.
If you register `[A, B, C]`, they enter in the order `A -> B -> C` and unwind in reverse order `C -> B -> A`.

This matters when you combine behaviors such as logging, mutation, and retries.

```python
from autogen.beta import Agent, Context
from autogen.beta.config import OpenAIConfig
from autogen.beta.events import BaseEvent, ModelResponse
from autogen.beta.middleware import AgentTurn, BaseMiddleware

class A(BaseMiddleware):
    async def on_turn(
        self,
        call_next: AgentTurn,
        event: BaseEvent,
        context: Context,
    ) -> ModelResponse:
        print("enter A")
        response = await call_next(event, context)
        print("exit A")
        return response

class B(BaseMiddleware):
    async def on_turn(
        self,
        call_next: AgentTurn,
        event: BaseEvent,
        context: Context,
    ) -> ModelResponse:
        print("enter B")
        response = await call_next(event, context)
        print("exit B")
        return response

class C(BaseMiddleware):
    async def on_turn(
        self,
        call_next: AgentTurn,
        event: BaseEvent,
        context: Context,
    ) -> ModelResponse:
        print("enter C")
        response = await call_next(event, context)
        print("exit C")
        return response

agent = Agent(
    "assistant",
    prompt="Be helpful.",
    config=OpenAIConfig("gpt-4o-mini"),
    middleware=[A, B],
)

await agent.ask(
    "Hello",
    middleware=[C],
)

# Output:
# enter A
# enter B
# enter C
# exit C
# exit B
# exit A
```
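
If it helps to see why the exit order is reversed, the nesting can be modeled without the framework. In this sketch `make_layer` and `build_chain` are hypothetical helpers, and events are plain strings: each layer wraps the handler built from the layers after it, so the first registered layer ends up outermost.

```python
import asyncio
from collections.abc import Awaitable, Callable

Handler = Callable[[str], Awaitable[str]]


def make_layer(name: str, log: list[str]):
    """Create a middleware-like layer that records when it enters and exits."""

    async def layer(call_next: Handler, event: str) -> str:
        log.append(f"enter {name}")
        response = await call_next(event)
        log.append(f"exit {name}")
        return response

    return layer


def build_chain(layers: list, core: Handler) -> Handler:
    """Wrap `core` so that layers[0] is outermost and layers[-1] is innermost."""
    handler = core
    for layer in reversed(layers):
        # Bind the current layer and next handler as defaults to avoid late binding
        def wrapped(event: str, _layer=layer, _next=handler):
            return _layer(_next, event)

        handler = wrapped
    return handler


async def core(event: str) -> str:
    return f"response to {event!r}"


log: list[str] = []
agent_level = [make_layer("A", log), make_layer("B", log)]
call_level = [make_layer("C", log)]  # appended after the agent-level list

chain = build_chain(agent_level + call_level, core)
result = asyncio.run(chain("Hello"))
print(log)
# ['enter A', 'enter B', 'enter C', 'exit C', 'exit B', 'exit A']
```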

## Writing Your Own Middleware

To create custom middleware, subclass `BaseMiddleware` and implement the hooks you need.

If your middleware does not need extra constructor arguments, you can register the class directly.
If it does need configuration, wrap it with `Middleware(...)` when registering it.

```python
import logging
from collections.abc import Sequence

from autogen.beta import Agent, Context
from autogen.beta.config import OpenAIConfig
from autogen.beta.events import BaseEvent, ModelResponse, ToolCallEvent
from autogen.beta.middleware import BaseMiddleware, LLMCall, Middleware, ToolExecution

class AuditMiddleware(BaseMiddleware):
    def __init__(
        self,
        event: BaseEvent,
        context: Context,
        logger: logging.Logger,
    ) -> None:
        super().__init__(event, context)
        self.logger = logger

    async def on_llm_call(
        self,
        call_next: LLMCall,
        events: Sequence[BaseEvent],
        context: Context,
    ) -> ModelResponse:
        self.logger.info("Calling model with %d events", len(events))
        response = await call_next(events, context)
        self.logger.info("Model returned: %s", response)
        return response

    async def on_tool_execution(
        self,
        call_next: ToolExecution,
        event: ToolCallEvent,
        context: Context,
    ):
        self.logger.info("Executing tool: %s", event.name)
        return await call_next(event, context)

agent = Agent(
    "assistant",
    prompt="Be helpful.",
    config=OpenAIConfig("gpt-4o-mini"),
    middleware=[
        Middleware(AuditMiddleware, logger=logging.getLogger("ag2.audit")),
    ],
)
```

### Guidelines for Custom Middleware

- Keep hook behavior focused. Middleware that does one job well is easier to reason about than one that handles, for example, logging, retries, mutation, and policy checks together.
- Prefer `on_turn()` for whole-run behavior, `on_llm_call()` for model-facing behavior, `on_tool_execution()` for tool-facing behavior, and `on_human_input()` for human-in-the-loop behavior.
- Be deliberate when mutating `event`, `events`, or tool results. Middleware that runs later, and the rest of the runtime, will observe those changes.
- Register zero-config middleware classes directly, and use `Middleware(YourMiddleware, ...)` when the constructor needs additional options.

## Built-In Middleware

AG2 Beta currently includes four built-in middleware in `autogen.beta.middleware`:

### `LoggingMiddleware`

```python
from autogen.beta import Agent
from autogen.beta.middleware import LoggingMiddleware

agent = Agent(..., middleware=[LoggingMiddleware()])
```

Logs the lifecycle of a turn, including:

- when a turn starts and finishes
- each LLM call and its response time
- each tool execution and its result

Use it for quick debugging or application-level observability.

### `RetryMiddleware`

```python
from autogen.beta import Agent
from autogen.beta.middleware import RetryMiddleware

agent = Agent(..., middleware=[RetryMiddleware(max_retries=2)])
```

Retries failed LLM calls up to `max_retries` times.
By default it retries any `Exception`, but you can narrow that with `retry_on=...`.

Use it for transient failures such as provider timeouts or flaky network issues.
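
The retry behavior can be pictured as a loop around `call_next`. The sketch below is a simplified model under two assumptions that the text above does not confirm: `max_retries` counts re-attempts after the first call, and `retry_on` is a tuple of exception types. `call_with_retries` is a hypothetical helper, and any backoff strategy is left out.

```python
import asyncio


async def call_with_retries(call_next, max_retries: int, retry_on=(Exception,)):
    """Await `call_next()`, retrying on matching exceptions up to `max_retries` times."""
    retries = 0
    while True:
        try:
            return await call_next()
        except retry_on:
            if retries >= max_retries:
                raise  # retry budget exhausted: surface the last error
            retries += 1


state = {"attempts": 0}


async def flaky_llm_call() -> str:
    """Stand-in for a model call that times out twice before succeeding."""
    state["attempts"] += 1
    if state["attempts"] < 3:
        raise TimeoutError("provider timed out")
    return "ok"


result = asyncio.run(
    call_with_retries(flaky_llm_call, max_retries=2, retry_on=(TimeoutError,))
)
print(result, state["attempts"])  # ok 3
```

Narrowing `retry_on` matters in practice: retrying every `Exception` also retries programming errors that will never succeed.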

### `HistoryLimiter`

```python
from autogen.beta import Agent
from autogen.beta.middleware import HistoryLimiter

agent = Agent(..., middleware=[HistoryLimiter(max_events=100)])
```

Trims the event history to a maximum number of events before the model call.
It preserves the first `ModelRequest` when possible and avoids leaving leading orphaned tool results in the trimmed history.

Use it when you want a simple, deterministic cap on context length by event count.
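
The trimming rules described above can be sketched on a simplified history. This is a model for intuition only: events are `(kind, text)` tuples with made-up kind labels, and `limit_history` is a hypothetical function, so the built-in middleware may differ in edge cases.

```python
Event = tuple[str, str]  # (kind, text); the kind labels are illustrative only


def limit_history(events: list[Event], max_events: int) -> list[Event]:
    """Keep the first request plus the newest events, without orphaned tool results."""
    if len(events) <= max_events:
        return list(events)
    first_idx = next((i for i, e in enumerate(events) if e[0] == "request"), None)
    keep_first = first_idx is not None
    budget = max_events - 1 if keep_first else max_events
    start = max(len(events) - budget, 0)
    if keep_first:
        start = max(start, first_idx + 1)  # never duplicate the first request
    # Skip tool results at the head of the tail: their tool calls were trimmed away
    while start < len(events) and events[start][0] == "tool_result":
        start += 1
    head = [events[first_idx]] if keep_first else []
    return head + events[start:]


history = [
    ("request", "Plan a trip"),
    ("response", "Calling search"),
    ("tool_call", "search"),
    ("tool_result", "3 hits"),
    ("response", "Found 3 options"),
    ("request", "Book the first one"),
]

trimmed = limit_history(history, max_events=4)
print([kind for kind, _ in trimmed])  # ['request', 'response', 'request']
```

Note that the result can be shorter than `max_events`: dropping an orphaned tool result is preferred over sending the model a result with no matching call.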

### `TokenLimiter`

```python
from autogen.beta import Agent
from autogen.beta.middleware import TokenLimiter

agent = Agent(..., middleware=[TokenLimiter(max_tokens=1000)])
```

Trims the event history to fit within an approximate token budget before the model call.
It uses a character-based estimate controlled by `chars_per_token`.

Use it when you need lightweight context budgeting without depending on a model-specific tokenizer.
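
A character-based budget can be approximated as shown below. This is a rough sketch: the default of 4 characters per token, the ceiling rounding, and the newest-first trimming are all assumptions made for this example, and `estimate_tokens` and `trim_to_budget` are hypothetical helpers operating on plain strings rather than events.

```python
def estimate_tokens(text: str, chars_per_token: int = 4) -> int:
    """Approximate token count as ceil(len(text) / chars_per_token)."""
    return -(-len(text) // chars_per_token)


def trim_to_budget(
    messages: list[str], max_tokens: int, chars_per_token: int = 4
) -> list[str]:
    """Keep the newest messages whose estimated total fits within `max_tokens`."""
    kept: list[str] = []
    used = 0
    for message in reversed(messages):  # walk from newest to oldest
        cost = estimate_tokens(message, chars_per_token)
        if used + cost > max_tokens:
            break
        kept.append(message)
        used += cost
    kept.reverse()  # restore chronological order
    return kept


messages = ["a" * 40, "b" * 20, "c" * 8]  # estimated at 10, 5, and 2 tokens
print(trim_to_budget(messages, max_tokens=8))  # keeps the two newest messages
```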

## Conditional Middleware

`ConditionalMiddleware` lets you gate any middleware so each hook only activates when a condition matches the hook's own event. When the condition is not met, that hook passes through to the next middleware in the chain.

This is useful when you have middleware that should only run for certain event types - for example, approval middleware that only fires for a specific tool - without writing condition checks inside every hook method.

```python
from autogen.beta import Agent
from autogen.beta.config import OpenAIConfig
from autogen.beta.events import ToolCallEvent
from autogen.beta.middleware import ConditionalMiddleware, Middleware

agent = Agent(
    "assistant",
    prompt="Be helpful.",
    config=OpenAIConfig("gpt-4o-mini"),
    middleware=[
        ConditionalMiddleware(
            Middleware(ApprovalMiddleware),
            condition=ToolCallEvent.name == "execute_code",
        ),
    ],
)
```

In this example, `ApprovalMiddleware` only activates during `on_tool_execution` when the tool name is `"execute_code"`. The `on_turn` hook receives a different event type (`ModelRequest`), so the condition does not match there and the middleware passes through.

Each hook - `on_turn`, `on_tool_execution`, `on_human_input` - checks the condition against its own event. `on_llm_call` checks against the initial turn event, since it receives a sequence of events rather than a single one. When a condition targets a specific event type like `ToolCallEvent`, hooks that receive a different type will naturally pass through.

### Composing Conditions

Conditions support `&` (and), `|` (or), and `~` (not) operators, so you can build expressive gates:

```python
from autogen.beta.events import ToolCallEvent
from autogen.beta.middleware import ConditionalMiddleware, Middleware

# Activate only for tool calls named "search" or "browse"
conditional = ConditionalMiddleware(
    Middleware(MyAuditMiddleware, logger=logger),
    condition=(ToolCallEvent.name == "search") | (ToolCallEvent.name == "browse"),
)

# Activate for all tool calls EXCEPT "calculate"
conditional = ConditionalMiddleware(
    Middleware(MyAuditMiddleware, logger=logger),
    condition=~(ToolCallEvent.name == "calculate"),
)
```

You can also pass a bare event type as the condition - it is automatically wrapped:

```python
# Activate only when the hook receives a ToolCallEvent
conditional = ConditionalMiddleware(
    Middleware(MyAuditMiddleware, logger=logger),
    condition=ToolCallEvent,
)
```
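
For readers curious how `&`, `|`, and `~` can combine predicates, the sketch below builds a tiny condition type with Python's operator overloading. `Condition`, `ToolCall`, and `name_is` are hypothetical stand-ins written for this example; they are not the classes AG2 uses.

```python
from collections.abc import Callable
from dataclasses import dataclass


class Condition:
    """A composable predicate over events."""

    def __init__(self, check: Callable[[object], bool]) -> None:
        self._check = check

    def __call__(self, event: object) -> bool:
        return self._check(event)

    def __and__(self, other: "Condition") -> "Condition":
        return Condition(lambda event: self(event) and other(event))

    def __or__(self, other: "Condition") -> "Condition":
        return Condition(lambda event: self(event) or other(event))

    def __invert__(self) -> "Condition":
        return Condition(lambda event: not self(event))


@dataclass
class ToolCall:
    """Hypothetical stand-in for a tool call event."""

    name: str


def name_is(expected: str) -> Condition:
    return Condition(lambda e: isinstance(e, ToolCall) and e.name == expected)


search_or_browse = name_is("search") | name_is("browse")
not_calculate = ~name_is("calculate")

print(search_or_browse(ToolCall("browse")))    # True
print(search_or_browse(ToolCall("calculate"))) # False
print(not_calculate(ToolCall("search")))       # True
```

Parentheses around each comparison matter in the real API for the same reason they would here: `|` and `&` bind more tightly than `==` in Python.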

`ConditionalMiddleware` wraps any `MiddlewareFactory` - including `Middleware(...)` instances, bare `BaseMiddleware` subclasses, and other `ConditionalMiddleware` wrappers.

## Choosing the Right Hook

If you are unsure where a behavior belongs, use this rule of thumb:

- Use `on_turn()` when the behavior is about the entire request/response lifecycle.
- Use `on_llm_call()` when the behavior is about what goes into or comes out of the model.
- Use `on_tool_execution()` when the behavior is about tool safety, auditing, or result shaping across tools (or when branching on `event.name` is acceptable).
    - Use **tool-scoped** `middleware=[...]` on `@tool` / `Agent.tool` / `Toolkit.tool` when the behavior applies only to that tool's definition; see [Tool middleware](tools/tool_middleware.md){.internal-link}.
- Use `on_human_input()` when the behavior is about intercepting, logging, or transforming human-in-the-loop requests and responses.

For related runtime customization patterns, see [Tools](tools/tools.md), [Tool middleware](tools/tool_middleware.md), [Prompt Management](../system_prompts), and [Events Streaming](../advanced/stream).

---

# Telemetry

Source: https://docs.ag2.ai/latest/docs/beta/telemetry/

# Beta Telemetry

AG2 Beta includes a `TelemetryMiddleware` that emits [OpenTelemetry](https://opentelemetry.io/) spans for agent turns, LLM calls, tool executions, and human-in-the-loop interactions.

The middleware follows the [OpenTelemetry GenAI Semantic Conventions](https://opentelemetry.io/docs/specs/semconv/gen-ai/gen-ai-agent-spans/), so traces can be exported to **any compatible backend** -- Jaeger, Grafana Tempo, Datadog, Honeycomb, Langfuse, and others.

## Installation

```bash
pip install "ag2[openai,tracing]"
```

## Quick Start

```python
from opentelemetry import trace
from opentelemetry.sdk.resources import Resource
from opentelemetry.sdk.trace import TracerProvider
from opentelemetry.sdk.trace.export import SimpleSpanProcessor, ConsoleSpanExporter

from autogen.beta import Agent
from autogen.beta.config import OpenAIConfig
from autogen.beta.middleware.builtin import TelemetryMiddleware

# 1. Configure OpenTelemetry
resource = Resource.create(attributes={"service.name": "ag2-beta-quickstart"})
tracer_provider = TracerProvider(resource=resource)
tracer_provider.add_span_processor(SimpleSpanProcessor(ConsoleSpanExporter()))
trace.set_tracer_provider(tracer_provider)

# 2. Create agent with telemetry middleware
agent = Agent(
    "assistant",
    prompt="You are a helpful assistant.",
    config=OpenAIConfig(model="gpt-4o-mini"),
    middleware=[
        TelemetryMiddleware(
            tracer_provider=tracer_provider,
            agent_name="assistant",
        ),
    ],
)

# 3. Run -- spans are emitted automatically
import asyncio
reply = asyncio.run(agent.ask("What is the capital of France?"))
```

## Trace Hierarchy

Each `ask()` call produces a root span with child spans for LLM calls, tool executions, and human input:

```
invoke_agent assistant
  |-- chat gpt-4o-mini              # LLM API call
  |-- execute_tool get_weather      # tool execution
  |-- chat gpt-4o-mini              # LLM call after tool result
  +-- await_human_input assistant   # human-in-the-loop
```

## Span Types

Every span includes an `ag2.span.type` attribute:

| `ag2.span.type` | Operation name | Triggered by |
|---|---|---|
| `agent` | `invoke_agent` | `on_turn` -- wraps the full agent turn |
| `llm` | `chat` | `on_llm_call` -- each LLM API call |
| `tool` | `execute_tool` | `on_tool_execution` -- each tool invocation |
| `human_input` | `await_human_input` | `on_human_input` -- human-in-the-loop |

## Semantic Attributes

Spans carry standard [OpenTelemetry GenAI attributes](https://opentelemetry.io/docs/specs/semconv/gen-ai/):

| Attribute | Span types | Description |
|---|---|---|
| `gen_ai.operation.name` | All | Operation: `invoke_agent`, `chat`, `execute_tool`, `await_human_input` |
| `gen_ai.agent.name` | agent, human_input | Agent name |
| `gen_ai.provider.name` | agent, llm | LLM provider (e.g. `openai`, `anthropic`) -- auto-detected |
| `gen_ai.request.model` | agent, llm | Model name (e.g. `gpt-4o-mini`) -- auto-detected |
| `gen_ai.response.model` | llm | Resolved model name from response |
| `gen_ai.response.finish_reasons` | llm | Finish reasons (e.g. `["stop"]`, `["tool_calls"]`) |
| `gen_ai.usage.input_tokens` | llm | Prompt token count |
| `gen_ai.usage.output_tokens` | llm | Completion token count |
| `gen_ai.usage.cache_creation_input_tokens` | llm | Tokens used to create prompt cache (Anthropic) |
| `gen_ai.usage.cache_read_input_tokens` | llm | Tokens read from prompt cache (Anthropic, OpenAI, Gemini) |
| `gen_ai.tool.name` | tool | Tool function name |
| `gen_ai.tool.call.id` | tool | Tool call ID |
| `gen_ai.tool.type` | tool | Tool type (always `function`) |
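
Because token counts live on `llm` spans, per-turn usage can be derived by summing those attributes across the spans of one trace. A minimal sketch, assuming each span has already been reduced to a plain dictionary of its attributes (the helper name `total_token_usage` and the sample values are illustrative, not part of AG2):

```python
def total_token_usage(span_attributes: list[dict]) -> dict[str, int]:
    """Sum input and output tokens over all spans whose ag2.span.type is 'llm'."""
    totals = {"input_tokens": 0, "output_tokens": 0}
    for attrs in span_attributes:
        if attrs.get("ag2.span.type") != "llm":
            continue  # agent, tool, and human_input spans carry no usage counts
        totals["input_tokens"] += attrs.get("gen_ai.usage.input_tokens", 0)
        totals["output_tokens"] += attrs.get("gen_ai.usage.output_tokens", 0)
    return totals


# Illustrative attribute dictionaries for one agent turn (made-up numbers).
example_spans = [
    {"ag2.span.type": "agent", "gen_ai.operation.name": "invoke_agent"},
    {"ag2.span.type": "llm", "gen_ai.usage.input_tokens": 120, "gen_ai.usage.output_tokens": 15},
    {"ag2.span.type": "tool", "gen_ai.tool.name": "get_weather"},
    {"ag2.span.type": "llm", "gen_ai.usage.input_tokens": 160, "gen_ai.usage.output_tokens": 40},
]

print(total_token_usage(example_spans))
```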

## Content Capture

By default, message content, tool arguments, and results **are** included in spans. To disable content capture for privacy-sensitive environments:

```python
TelemetryMiddleware(
    tracer_provider=tracer_provider,
    agent_name="assistant",
    capture_content=False,  # omits messages, tool args, and results
)
```

When content capture is enabled (the default), spans include these additional attributes:

| Attribute | Span type | Content |
|---|---|---|
| `gen_ai.input.messages` | llm | JSON request messages |
| `gen_ai.output.messages` | llm | JSON response messages |
| `gen_ai.tool.call.arguments` | tool | Tool call arguments (JSON) |
| `gen_ai.tool.call.result` | tool | Tool execution result |
| `ag2.human_input.prompt` | human_input | Prompt shown to human |
| `ag2.human_input.response` | human_input | Human's response |

!!! warning
    With `capture_content=True`, message content, tool arguments, and human input will appear in your tracing backend. Ensure your backend has appropriate access controls.
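
When content capture is disabled for privacy reasons, it can be worth asserting in a test that none of the content attributes listed above reach your exporter. A minimal sketch, assuming you can obtain each span's attributes as a plain dictionary (the helper `captured_content_keys` is hypothetical and not part of AG2):

```python
# Attribute names taken from the content-capture table above.
CONTENT_ATTRIBUTES = frozenset({
    "gen_ai.input.messages",
    "gen_ai.output.messages",
    "gen_ai.tool.call.arguments",
    "gen_ai.tool.call.result",
    "ag2.human_input.prompt",
    "ag2.human_input.response",
})


def captured_content_keys(attributes: dict) -> list[str]:
    """Return the content-bearing attribute names present on a span, sorted."""
    return sorted(CONTENT_ATTRIBUTES.intersection(attributes))


# A span that carries only metadata passes the check.
metadata_only = {"ag2.span.type": "llm", "gen_ai.usage.input_tokens": 42}
assert captured_content_keys(metadata_only) == []

# A span that carries request messages is reported.
with_content = {"ag2.span.type": "llm", "gen_ai.input.messages": "[...]"}
print(captured_content_keys(with_content))
```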

## Custom Span Attributes

Use `span_attributes` to stamp custom key-value pairs onto every span the middleware emits. This is useful for routing or filtering traces by tenant, environment, deployment, or any other label your backend supports.

```python
TelemetryMiddleware(
    tracer_provider=tracer_provider,
    agent_name="assistant",
    span_attributes={
        "deployment": "production",
        "ag2.org.id": "org-abc123",
    },
)
```

!!! tip
    This is the right place to add tenant or organization identifiers when your tracing backend filters traces by span-level labels (for example, Google Cloud Trace label filters or Datadog tags).

!!! note
    If a key in `span_attributes` collides with an intrinsic attribute set by the middleware (such as `ag2.span.type` or `gen_ai.usage.input_tokens`), the middleware's value always wins.
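
The precedence rule in the note above can be pictured as a dictionary merge in which intrinsic values are applied last. This is only an illustration of the documented behaviour, not the middleware's actual implementation; the function name `merge_span_attributes` is hypothetical.

```python
def merge_span_attributes(custom: dict, intrinsic: dict) -> dict:
    """Combine attributes so that intrinsic values win on any key collision."""
    # Later entries override earlier ones, so intrinsic attributes take priority.
    return {**custom, **intrinsic}


custom = {"deployment": "production", "ag2.span.type": "my-label"}
intrinsic = {"ag2.span.type": "llm", "gen_ai.operation.name": "chat"}

merged = merge_span_attributes(custom, intrinsic)
print(merged["ag2.span.type"])  # the intrinsic value, not "my-label"
```

To avoid surprises, pick custom keys under your own prefix rather than reusing `ag2.*` or `gen_ai.*` names that the middleware sets.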

## Configuration

`TelemetryMiddleware` accepts:

| Parameter | Type | Default | Description |
|---|---|---|---|
| `tracer_provider` | `TracerProvider \| None` | Global provider | OpenTelemetry TracerProvider |
| `capture_content` | `bool` | `True` | Include message/tool content in spans |
| `agent_name` | `str \| None` | `"unknown"` | Agent name for span attributes |
| `provider_name` | `str \| None` | `None` | LLM provider name (auto-detected from response if not set) |
| `model_name` | `str \| None` | `None` | Model name (auto-detected from response if not set) |
| `span_attributes` | `dict[str, str] \| None` | `None` | Extra key-value pairs stamped onto every span (tenant IDs, environment tags, etc.) |

## Tool Execution Example

```python
from autogen.beta import Agent
from autogen.beta.config import OpenAIConfig
from autogen.beta.middleware.builtin import TelemetryMiddleware
from autogen.beta.tools import tool

@tool
def get_weather(city: str) -> str:
    """Get weather information for a city."""
    return f"Sunny, 72F in {city}"

# `tracer_provider` is the provider configured in the Quick Start above
agent = Agent(
    "weather_agent",
    prompt="Use the get_weather tool to answer weather questions.",
    config=OpenAIConfig(model="gpt-4o-mini"),
    tools=[get_weather],
    middleware=[
        TelemetryMiddleware(
            tracer_provider=tracer_provider,
            agent_name="weather_agent",
        ),
    ],
)
```

## Backend Integration

Since `TelemetryMiddleware` uses standard OpenTelemetry, any OTLP-compatible backend works. See the [V1 tracing documentation](../user-guide/tracing/opentelemetry.md) for setup guides for Grafana Tempo, Jaeger, Langfuse, and other backends. The setup is identical -- only the agent instrumentation method differs.

---

# Testing

Source: https://docs.ag2.ai/latest/docs/beta/testing/

AG2 provides a built-in `TestConfig` utility in the `autogen.beta.testing` module to help you write unit tests for your agents. It allows you to mock LLM responses and simulate tool execution scenarios without making actual API calls.

## How to mock LLM answers

To mock LLM answers, you can use `TestConfig` in place of a standard model configuration. Pass the expected responses as arguments to `TestConfig`. Each argument represents the mocked response for a sequential turn in the conversation.

```python
import pytest

from autogen.beta import Agent
from autogen.beta.testing import TestConfig

@pytest.mark.asyncio
async def test_mock_llm_answer():
    # Create the agent without a model config
    agent = Agent("test_agent")

    # Ask the agent, passing a TestConfig that holds the mocked string response
    res = await agent.ask(
        "Hi!",
        config=TestConfig("This is a mocked response."),
    )

    # The agent returns the mocked response
    assert res.body == "This is a mocked response."
```

## How to test tool execution

You can also use `TestConfig` to yield tool calls. This allows you to test both successful tool execution and error handling. By providing a `ToolCallEvent` as the first response and a string as the final response, you can simulate a complete agent-tool interaction loop.

### Success case

To test a successful tool execution, pass a `ToolCallEvent` followed by the final answer you expect the LLM to provide after the tool executes.

```python
import pytest

from autogen.beta import Agent
from autogen.beta.events import ToolCallEvent
from autogen.beta.testing import TestConfig

@pytest.mark.asyncio
async def test_tool_success():
    # Define a tool
    def my_tool() -> str:
        return "tool execution result"

    agent = Agent("test_agent", tools=[my_tool])

    # Configure TestConfig to first return a ToolCallEvent, then a final string answer
    test_config = TestConfig(
        ToolCallEvent(name="my_tool"),
        "final result",
    )

    res = await agent.ask("Please use my_tool", config=test_config)

    # After the tool is called and succeeds, the agent returns the second mocked response
    assert res.body == "final result"
```

### Errors

You can test how your agent reacts when a tool raises an exception, or when an unregistered tool is requested by the LLM.

If a tool raises an exception during execution, it will propagate up to the `ask` method. You can catch and assert this exception in your tests.

```python
import pytest

from autogen.beta import Agent
from autogen.beta.events import ToolCallEvent
from autogen.beta.testing import TestConfig

@pytest.mark.asyncio
async def test_tool_raise_exc():
    # Define a tool that raises an error
    def failing_tool() -> str:
        raise ValueError("Something went wrong")

    test_config = TestConfig(
        ToolCallEvent(name="failing_tool"),
        "result",
    )

    agent = Agent(
        "test_agent",
        config=test_config,
        tools=[failing_tool],
    )

    with pytest.raises(ValueError, match="Something went wrong"):
        await agent.ask("Hi!")
```

#### Tool not found

If the LLM attempts to call a tool that hasn't been registered with the agent, a `ToolNotFoundError` is raised.

```python
import pytest

from autogen.beta import Agent
from autogen.beta.events import ToolCallEvent
from autogen.beta.exceptions import ToolNotFoundError
from autogen.beta.testing import TestConfig

@pytest.mark.asyncio
async def test_tool_not_found():
    # Mock the LLM returning a tool call for "unregistered_tool"
    test_config = TestConfig(ToolCallEvent(name="unregistered_tool"))

    # Agent is created WITHOUT any tools
    agent = Agent("test_agent", config=test_config)

    with pytest.raises(ToolNotFoundError, match="Tool `unregistered_tool` not found"):
        await agent.ask("Hi!")
```
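
The three scenarios above (success, tool exception, unknown tool) follow from one loop: scripted responses are consumed in order, a tool call triggers a lookup and an execution, and a plain string ends the turn. The sketch below models that loop with the standard library only. Every name in it (`FakeToolCall`, `ToolMissing`, `run_scripted_turn`) is a hypothetical stand-in; it does not reproduce how `TestConfig` or `Agent` are implemented.

```python
from dataclasses import dataclass


@dataclass
class FakeToolCall:
    """Hypothetical stand-in for a scripted tool-call response."""
    name: str


class ToolMissing(LookupError):
    """Hypothetical error for a tool call that matches no registered tool."""


def run_scripted_turn(script: list, tools: dict) -> str:
    """Consume scripted responses in order until one is a plain string."""
    for response in script:
        if isinstance(response, FakeToolCall):
            if response.name not in tools:
                raise ToolMissing(f"Tool `{response.name}` not found")
            tools[response.name]()  # an exception raised here reaches the caller
            continue
        return response  # a string ends the turn as the final answer
    raise RuntimeError("script ended before a final answer was produced")


answer = run_scripted_turn(
    [FakeToolCall("my_tool"), "final result"],
    {"my_tool": lambda: "tool execution result"},
)
print(answer)
```

Seen this way, the order of arguments passed to a scripted config matters: the tool call must come before the string that represents the model's final answer.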

---

# AG2 Compatibility

Source: https://docs.ag2.ai/latest/docs/beta/ag2_compatibility/

The `autogen.beta.Agent` is designed to be fully compatible with existing AG2 architectures, including [Group Chats](../user-guide/advanced-concepts/orchestration/group-chat/introduction.md) and [sequential workflows](../user-guide/advanced-concepts/orchestration/sequential-chat.md). By calling the `as_conversable()` method, you can seamlessly integrate beta agents with traditional `ConversableAgent` instances.

This guide explains how to use the new Beta Agents across various chat topologies.

## One-to-one chats

You can initiate a standard chat between a `ConversableAgent` and a Beta `Agent` by converting the beta agent into a conversable format. This enables direct, two-way communication.

```python
from autogen import ConversableAgent, LLMConfig
from autogen.beta import Agent, config

# Define the beta agent
beta_agent = Agent(
    "beta_agent",
    config=config.OpenAIConfig(model="gpt-4o"),
)

# Define a traditional local agent
local_agent = ConversableAgent(
    "local_agent",
    llm_config=LLMConfig({"model": "gpt-4o"}),
)

# Initiate one-to-one chat
result = await local_agent.a_run(
    recipient=beta_agent.as_conversable(),
    message="Hello beta agent!",
    max_turns=2,
)

await result.process()
```

## Sequential chats

You can chain multiple chats together sequentially using `a_initiate_chats` (see the [Sequential Chat](../user-guide/advanced-concepts/orchestration/sequential-chat.md) guide). The beta agents handle their respective tasks in order, acting as recipients in the chat sequence.

```python
from autogen import ConversableAgent, LLMConfig
from autogen.beta import Agent, config

model_config = config.OpenAIConfig(model="gpt-4o")
agent1 = Agent("agent1", config=model_config)
agent2 = Agent("agent2", config=model_config)

local_agent = ConversableAgent(
    "local_manager",
    llm_config=LLMConfig({"model": "gpt-4o"}),
)

chat_results = await local_agent.a_initiate_chats([
    {
        "recipient": agent1.as_conversable(),
        "message": "Analyze this data.",
        "max_turns": 1,
        "chat_id": "analysis-chat",
    },
    {
        "recipient": agent2.as_conversable(),
        "message": "Summarize the analysis.",
        "max_turns": 1,
        "chat_id": "summary-chat",
    },
])
```

## Handoffs

Beta agents fully support AG2's pattern-based [handoff mechanisms](../user-guide/advanced-concepts/orchestration/group-chat/handoffs.md). You can use `AgentTarget` to explicitly dictate which agent should take over when the current agent completes its work.

```python
from autogen import ConversableAgent, LLMConfig
from autogen.agentchat.group.multi_agent_chat import a_run_group_chat
from autogen.agentchat.group import AgentTarget
from autogen.agentchat.group.patterns import DefaultPattern
from autogen.beta import Agent, config

original_agent = ConversableAgent(
    "manager", llm_config=LLMConfig({"model": "gpt-4o"})
)

model_config = config.OpenAIConfig(model="gpt-4o")

agent1 = Agent(
    "researcher", config=model_config
).as_conversable()

agent2 = Agent(
    "reviewer", config=model_config
).as_conversable()

# Define handoffs
original_agent.handoffs.set_after_work(AgentTarget(agent1))
agent1.handoffs.set_after_work(AgentTarget(agent2))
agent2.handoffs.set_after_work(AgentTarget(original_agent))

pattern = DefaultPattern(
    initial_agent=original_agent,
    agents=[original_agent, agent1, agent2],
)

result = await a_run_group_chat(
    pattern=pattern,
    messages="Start the research process.",
    max_rounds=5,
)

await result.process()
```

## Tool-driven handoffs

A beta agent tool can trigger a handoff directly by returning a `ToolResult` with a `target` in its `metadata`. When a `ConversableAdapter` detects this, it forwards the target to the group manager, which routes execution to the specified agent on the next turn. See the [AG2 handoffs guide](../user-guide/advanced-concepts/orchestration/group-chat/handoffs.md) for the full list of available targets.

Use `final=True` alongside the target to end the agent's turn immediately after the tool runs, without invoking the LLM again for a follow-up reply.

```python
from autogen import ConversableAgent, LLMConfig
from autogen.agentchat import a_run_group_chat
from autogen.agentchat.group import AgentTarget
from autogen.agentchat.group.patterns import RoundRobinPattern
from autogen.beta import Agent, ToolResult, config

model_config = config.OpenAIConfig(model="gpt-4o")

reviewer = Agent("reviewer", config=model_config).as_conversable()
writer = Agent("writer", config=model_config).as_conversable()

router = Agent("router", config=model_config)

@router.tool
def submit_for_review(content: str) -> ToolResult[str]:
    """Submit the draft content for review."""
    return ToolResult(
        f"Draft submitted: {content}",
        metadata={"target": AgentTarget(reviewer)},
        final=True,
    )

conversable_agent = ConversableAgent("coordinator", llm_config=LLMConfig({"model": "gpt-4o"}))

pattern = RoundRobinPattern(
    initial_agent=conversable_agent,
    agents=[
        conversable_agent,
        router.as_conversable(),
        writer,
        reviewer,
    ],
)

result = await a_run_group_chat(
    pattern=pattern,
    messages="Write and review a short summary.",
    max_rounds=6,
)

await result.process()
```

## Group chats (autopattern)

You can build dynamic [group chats](../user-guide/advanced-concepts/orchestration/group-chat/introduction.md) using `AutoPattern`, where multiple beta agents and standard agents participate in a shared environment.

```python
from autogen.agentchat.group.multi_agent_chat import a_run_group_chat
from autogen.agentchat.group.patterns import AutoPattern
from autogen.llm_config.config import LLMConfig
from autogen.beta import Agent, config

# Create beta agents
model_config = config.OpenAIConfig(model="gpt-4o")

researcher = Agent(
    "researcher", config=model_config
).as_conversable()

writer = Agent(
    "writer", config=model_config
).as_conversable()

pattern = AutoPattern(
    initial_agent=researcher,
    agents=[researcher, writer],
    group_manager_args={"llm_config": LLMConfig({"model": "gpt-4o"})},
)

result = await a_run_group_chat(
    pattern=pattern,
    messages="Research quantum computing and write a summary.",
    max_rounds=10,
)

await result.process()
```

## Context Variables support

Beta agents integrate with AG2's [`ContextVariables`](../user-guide/advanced-concepts/orchestration/group-chat/context-variables.md), so state can be shared across a group chat and accessed inside beta agent tools.

You can inject global variables into the group chat pattern, and read/modify them within any tool via the `Context` object or `Variable()` annotations.

```python
from typing import Annotated
from autogen import ConversableAgent, LLMConfig
from autogen.agentchat.group import ContextVariables
from autogen.agentchat.group.multi_agent_chat import a_run_group_chat
from autogen.agentchat.group.patterns import RoundRobinPattern
from autogen.beta import Agent, Context, Variable, config

beta_agent = Agent(
    "tracker_agent",
    config=config.OpenAIConfig(model="gpt-4o"),
)

# Define a tool that accesses and modifies ContextVariables
@beta_agent.tool
def issue_tracker(
    context: Context,
    issue_count: Annotated[int, Variable(default=0)]
) -> str:
    # Update the shared context variable
    issue_count += 1
    context.variables["issue_count"] = issue_count
    return f"Issue tracked. Total issues: {issue_count}"

local_agent = ConversableAgent(
    "local_agent",
    llm_config=LLMConfig({"model": "gpt-4o"}),
)

# Initialize the pattern with ContextVariables
pattern = RoundRobinPattern(
    initial_agent=local_agent,
    agents=[local_agent, beta_agent.as_conversable()],
    context_variables=ContextVariables({"issue_count": 0}),
)

async def main():
    result = await a_run_group_chat(
        pattern=pattern,
        messages="Please track this new issue.",
        max_rounds=3,
    )

    await result.process()

    # context_variables["issue_count"] will now be updated globally!
    context_variables = await result.context_variables
    print("Final issue count:", context_variables.data["issue_count"])
```

---

# Next Steps

Source: https://docs.ag2.ai/latest/docs/beta/roadmap/

## Completed

- Agent
- Agent Harness (including knowledge, subtasks)
- LLM Configs
- LLM Providers
    - OpenAI
    - Anthropic
    - Gemini (incl. Vertex AI)
    - Ollama
    - DashScope
- Function Tools
- Context Variables
- Dependency Injection
- System Prompt management
- Human-in-the-loop
- Middleware
- AG2 Group Chat compatibility
- OpenTelemetry support
- Structured Output (static / callable / prompted / transformable)
- Built-in Tools
- Multimodality inputs (images, audio, video)
- Shell Tool
- Skills support
- AG-UI
- Subagent delegation
- Evaluation framework (offline mode)
- Multi-Agent Orchestration (aka Network)
- MCP
- Multimodality streaming input: audio / video
- Multimodality output: audio / video
- Third-party contributions policy
- Multimodality output: images
- Local Code execution tool
- A2A

## In Progress

- Messages aggregation
- Dynamic agents
- Prometheus metrics
- Background agents
- Multi-Agent Network supports multimodality
- Multi-Agent Network supports streaming

## Future Priorities

### P1

- Evaluation (online mode + LLM-as-judge scorers + trajectory scorers + CLI)
- First-class Usage tracking
- Scheduler support
- Repository development skills (code, structure, documentation, tests)

### P2

- A2UI Plugin
- NLIP
- Deferred tools (Tool Search Tool)
- Builtin Memory toolkit

### P3

- Builtin RAG toolkit
- Crawl4AI tool
- Skills Plugin
- Update AG-UI integration
- Allow models to configure tool search params

### P4

- Use codegen to optimize tools configuration
- Msgspec serialization support
- TUI runtime
- Checkpoints and snapshots

---

# Coding with AI Assistants

Source: https://docs.ag2.ai/latest/docs/beta/coding_with_ai/

Set up your AI coding assistant (Claude Code, Cursor, Copilot, Codex, Windsurf, or any agent) so it can build **AG2 Beta** apps with you using current, accurate APIs and examples.

!!! tip "Point your agent at this page"
    Paste this page's link into your assistant and ask it to follow the setup. It can install the AG2 skills and configure itself. Everything below is written to be runnable copy-paste.

## Why this matters

AG2 Beta (`autogen.beta`) is a new, async, protocol-driven API. Models were largely trained on the older `autogen` / `pyautogen` surface, so out of the box an assistant will reach for stale patterns such as synchronous `ConversableAgent`, `initiate_chat`, and the like. The setup on this page gives your assistant three things it otherwise lacks: **AG2-specific skills**, a **Beta-only docs reference**, and **project rules** that keep it on the Beta API.

!!! warning "Beta only"
    This page targets `autogen.beta`. The pre-Beta API (known as the classic API) is being retired at v1.0, so point your assistant at the Beta docs and skills, not legacy `autogen` examples it may have memorized.

## Step 1: Install the AG2 Skills

This is the most important step, and the one that speeds up development the most.

[ag2-skills](https://github.com/ag2ai/ag2-skills) is a catalog of [Agent Skills](https://agentskills.io/), on-demand instruction packs that teach an assistant how to build with AG2 Beta. Each skill loads only its name and description until it's relevant, then pulls in the full recipe. Skills cover quickstart, custom tools, the multi-agent network, middleware, memory, structured output, evaluation, and more.

The fastest path uses the [`skills` CLI](https://skills.sh):

=== "All skills"
    ```bash
    # Install the full AG2 Beta skill catalog
    npx skills add ag2ai/ag2-skills
    ```

=== "One skill"
    ```bash
    # Install just the quickstart (good first taste)
    npx skills add ag2ai/ag2-skills@ag2-quickstart
    ```

=== "Manual (Claude Code)"
    ```bash
    # Clone and copy individual skills into your user skills directory
    git clone https://github.com/ag2ai/ag2-skills.git
    cp -r ag2-skills/skills/ag2-overview   ~/.claude/skills/
    cp -r ag2-skills/skills/ag2-quickstart ~/.claude/skills/
    ```

!!! tip "Where to start"
    After installing, tell your assistant to load **`ag2-overview`** (a map of AG2 Beta capabilities) and **`ag2-quickstart`** (a minimal working agent). From there it can pull in the specific skill it needs, such as `ag2-add-custom-tool` or `ag2-network-quickstart`.

## Step 2: Point your assistant at the Beta docs

Skills teach patterns; the docs keep your assistant honest about the *exact* current signatures. The biggest risk is your assistant falling back on the **classic** `autogen` API it was trained on (`ConversableAgent`, `initiate_chat`, `GroupChat`) - all retiring at v1.0. Give it Beta-only ground truth:

- **Live Beta docs:** [`https://docs.ag2.ai/latest/docs/beta/agents/`](https://docs.ag2.ai/latest/docs/beta/agents/) - point your agent at this section (most assistants can fetch a URL).
- **Beta docs source (Markdown):** [`ag2ai/ag2/website/docs/beta`](https://github.com/ag2ai/ag2/tree/main/website/docs/beta) - the raw `.mdx` your agent can read directly from GitHub.
- **Beta `llms.txt`:** [`https://docs.ag2.ai/latest/llms.txt`](https://docs.ag2.ai/latest/llms.txt) - a machine-readable index of the Beta docs following the [llms.txt standard](https://llmstxt.org/), plus [`llms-full.txt`](https://docs.ag2.ai/latest/llms-full.txt) for the entire Beta docs in one file. Both are Beta-scoped, so they never point your agent at the classic API.

Then anchor your prompt: *"Build with `autogen.beta` only. If a signature is unfamiliar, check the Beta docs before writing code - do not use the classic `autogen` API."*

!!! warning "Be careful with generic docs indexers for AG2 right now"
    Third-party docs-MCP servers and code indexers typically ingest the **entire** `ag2` repository, which still has the classic API. Pointed at AG2 today they surface `ConversableAgent` / `initiate_chat` examples - the opposite of what you want for Beta. Until a Beta-scoped index exists, rely on the AG2 Skills and the Beta docs links above, which are Beta-only by construction.

## Step 3: Add project rules to your repo

A rules file pins AG2 Beta conventions for every session in your project. The open [AGENTS.md](https://agents.md/) standard is read by Cursor, Copilot, Codex, Gemini CLI, Windsurf, and others; Claude Code reads `CLAUDE.md`. Drop one (or both, a symlink works) at your repo root:

```markdown
# AGENTS.md

This project is built on **AG2 Beta** (`autogen.beta`). Follow these rules.

## API surface
- Import only from `autogen.beta` and its submodules (`autogen.beta.config`,
  `autogen.beta.tools`, ...). Do NOT use the legacy `autogen` / `pyautogen`
  API (`ConversableAgent`, `initiate_chat`) - it is retired at v1.0.
- Agents are async. Use `await agent.ask(...)` and `await reply.ask(...)`.

## Conventions
- Do not use `from __future__ import annotations`.
- Public signatures accept `str | os.PathLike[str]`; use `pathlib.Path` internally.
- Prefer top-level imports; no imports inside functions.

## Docs & skills
- Install and use the AG2 skills: `npx skills add ag2ai/ag2-skills`.
- Beta docs: https://docs.ag2.ai/latest/docs/beta/agents/ (machine-readable index at /latest/llms.txt).
- When unsure of a signature, check the Beta docs before writing code.
- Do NOT trust generic docs indexers - they surface the retired classic API.
```

=== "Claude Code"
    ```bash
    # Claude Code reads CLAUDE.md - symlink it to a single source of truth
    ln -s AGENTS.md CLAUDE.md
    ```

=== "Cursor"
    ```bash
    # Cursor reads AGENTS.md, or project rules under .cursor/rules/
    mkdir -p .cursor/rules
    cp AGENTS.md .cursor/rules/ag2-beta.md
    ```

=== "Copilot"
    ```bash
    # GitHub Copilot reads .github/copilot-instructions.md
    mkdir -p .github
    cp AGENTS.md .github/copilot-instructions.md
    ```

## Per-tool summary

| Assistant | Skills | Beta docs | Project rules |
|---|---|---|---|
| **Claude Code** | `npx skills add ag2ai/ag2-skills` or copy to `~/.claude/skills/` | point at the docs URL | `CLAUDE.md` (symlink to `AGENTS.md`) |
| **Cursor** | `npx skills add ag2ai/ag2-skills` | point at the docs URL | `AGENTS.md` or `.cursor/rules/` |
| **Copilot / Codex / Windsurf** | `npx skills add ag2ai/ag2-skills` | point at the docs URL | `AGENTS.md` (Copilot: `.github/copilot-instructions.md`) |
| **Any agent** | paste `SKILL.md` contents into context | paste the docs (or `llms-full.txt`) | paste the rules above into your prompt |

## Tips & caveats

- **Start from the quickstart.** Have your assistant scaffold from `ag2-quickstart` rather than inventing structure - see [Agent Communication](agents.md) and [Model Configuration](model_configuration.md).
- **Prefer the documented high-level API** (`autogen.beta`, `autogen.beta.tools`, `autogen.beta.config`) over reaching into internals.
- **Always review generated code.** Models still drift toward legacy `autogen` patterns; verify imports come from `autogen.beta` and that calls are `await`ed.
- **Run the tests.** AG2 is async throughout - encourage your assistant to write and run [tests](testing.md) with `TestConfig` instead of hitting a live model.
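
The import check from the review tip above can be partly automated. The following is a minimal sketch using the standard-library `ast` module; the function name `find_legacy_imports` is hypothetical, and the check covers imports only (it does not verify that calls are awaited).

```python
import ast


def _is_legacy_module(module: str) -> bool:
    """True for `autogen` or its submodules, except anything under `autogen.beta`."""
    in_autogen = module == "autogen" or module.startswith("autogen.")
    in_beta = module == "autogen.beta" or module.startswith("autogen.beta.")
    return in_autogen and not in_beta


def find_legacy_imports(source: str) -> list[str]:
    """Describe each import that reaches into `autogen` outside `autogen.beta`."""
    problems = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            for alias in node.names:
                if _is_legacy_module(alias.name):
                    problems.append(f"line {node.lineno}: import {alias.name}")
        elif isinstance(node, ast.ImportFrom) and node.module and node.level == 0:
            if node.module == "autogen":
                # `from autogen import beta` is acceptable; any other name is legacy.
                names = [alias.name for alias in node.names if alias.name != "beta"]
                if names:
                    problems.append(f"line {node.lineno}: from autogen import {', '.join(names)}")
            elif _is_legacy_module(node.module):
                problems.append(f"line {node.lineno}: from {node.module} import ...")
    return problems


sample = (
    "from autogen import ConversableAgent\n"
    "from autogen.beta import Agent\n"
    "from autogen.beta.config import OpenAIConfig\n"
)
print(find_legacy_imports(sample))
```

A check like this could run in CI or as a pre-commit hook over generated files, so legacy imports are caught before review.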

!!! note "Going further"
    The skill catalog mirrors these docs section-for-section - `ag2-network-quickstart` for [multi-agent networks](network/overview.md), `ag2-structured-output` for [typed responses](structured_output.md), `ag2-evaluation` for the [eval framework](evaluation/evaluation.md), and more. Install the set once and your assistant can pull in whichever it needs.

---

# Contribution Policy

Source: https://docs.ag2.ai/latest/docs/beta/contribution_policy/

AG2 is what it is today because of its contributors. This page explains how contributions will work in the v1.0 era - what lives in Core, what lives as an Extension, and what we ask of contributors in each.

!!! note
    This policy applies to the **AG2 Beta** track (`autogen.beta`), which becomes the official AG2 framework at v1.0. See the [Release Roadmap](../user-guide/release-roadmap.md) for the full v1.0 plan.

## Why a new policy

**AG2 v1.0 is focused on production and scalability.** Teams building real systems on AG2 need a foundation that is small, predictable, and dependable under load. Core is what those teams depend on, and keeping it tight is critical to the v1.0 release.

Over the years AG2 has grown to include many agents, tools, and integrations - many contributed by the community. That is a great strength, but has made the framework heavier than it needs to be and harder to evolve and support.

For v1.0, we want a focused **Core** that AG2 actively maintains, alongside a vibrant set of **Extensions** maintained by the contributors who care about them most.

## Core and Extensions

**Core** is the foundational AG2 Beta framework - maintained by AG2 with contributions from the community and AG2 team, kept lightweight, and held to a strict quality bar.

Today, everything shipped under `autogen.beta` is Core: the Agent runtime, primary LLM providers, common built-in tools, middleware, structured output, telemetry, history management, and the context and dependency-injection primitives.

**Extensions** are first-class components of the AG2 ecosystem - agents, tools, providers, middleware, and integrations that extend what you can build with Core. Most are community-contributed, but AG2 may also author Extensions where it makes sense (for example, integrations we want to ship without expanding Core's dependency surface). The defining characteristic of an Extension is **who maintains it** - every Extension has a named maintainer (AG2 or community) who commits to keeping it working. Typical Extensions include:

- Additional middleware
- LLM providers and third-party model gateways
- Third-party code executors and sandboxing backends
- Domain-specific tools and integrations
- Adapters for external frameworks and services

Reliance on a third-party SDK is a strong signal that something belongs in Extensions, but the deciding question is who is best placed to own it long-term - AG2, or a contributor closer to the integration. An Extension can be just as widely used and just as polished as anything in Core; what differs is the maintenance model.

!!! note "Extensions are first-class"
    An Extension is not a second-tier component. The bar for documentation, tests, and code quality is the same one Core is held to - the difference is who commits to maintaining it over time, not how good it has to be on the day it ships.

## Phase 1 - until v1.0

Core and Extensions live in the same repository. Extensions go under `autogen/beta/extensions/` and ship as part of `pip install ag2`. The contribution flow stays familiar during the transition.

## Phase 2 - at v1.0

- **Core** stays in this repository.
- **Extensions** move to a dedicated repository, with their own documentation alongside.
- A separate **Extensions-archived** repository holds Extensions that are no longer maintained. Archived Extensions remain available and can be revived (see [Maintainership and inactivity](#maintainership-and-inactivity)).

## Contributing to Core

Core is held to the rigour AG2 contributors are used to:

- A core maintainer must agree to support the change.
- Tests with meaningful coverage of new behaviour.
- Type hints, docstrings, and the development standards outlined in documentation and `AGENTS.md`.
- Documentation updated in the same PR where applicable.
- New dependencies added sparingly - Core is intentionally lightweight.
- Security considered - changes must not weaken Core's security, and contributors are expected to assess the security impact of what they ship.

If you are unsure whether a change belongs in Core or as an Extension, open an Issue or draft PR first.

## Contributing an Extension

Extensions are held to the same quality bar as Core; the difference is that the maintenance commitment sits with the Extension's maintainer rather than with AG2. An Extension contribution must:

- Include **documentation** that lets a new user understand and use it - at minimum a module-level docstring summarising the Extension, docstrings on public classes and functions, and a page under the **Extensions** section of the docs. Follow the standards noted in the [Code Documentation example](#code-documentation-example).
- Include **tests** covering the Extension's behaviour.
- Declare third-party packages as **additional dependencies**, not as `pip install ag2[...]` extras - Extensions are not shipped as pyproject extras. Guard the optional imports with a `try/except ImportError` block in the Extension's `__init__.py` and fall back via `missing_additional_dependency`, so the install hint points at the upstream package directly (for example `pip install "daytona>=0.171.0,<1"`). Use `missing_optional_dependency` (the `ag2[...]` extra form) only for Core modules.
- Pass **code review by AG2** so we can confirm it is safe to ship alongside Core.
- Have a **named maintainer** (you, your team, or a designated successor) who agrees to:
    - respond to issues and PRs,
    - keep the Extension working with current Core releases, and
    - **let us know if you can no longer maintain it**, so we can find a new maintainer or move it to the archive.

AG2 facilitates Extension PR reviews, but ongoing bug fixes and evolution are the maintainer's role.
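
The optional-import guard described in the list above can be sketched as follows. This is a minimal illustration, not AG2's implementation: `acme_sdk_hypothetical`, `AcmeClient`, and the install hint are made-up names, and the local `_placeholder` helper only stands in for `missing_additional_dependency`, whose real signature is not reproduced here.

```python
"""Sketch of an optional-import guard for an Extension's ``__init__.py``."""

# Hypothetical upstream package and version range, used only for the hint.
INSTALL_HINT = 'pip install "acme-sdk>=1.0,<2"'


def _placeholder(name: str, hint: str) -> type:
    """Return a stand-in class that raises a helpful ImportError when used.

    Local stand-in for AG2's ``missing_additional_dependency`` helper.
    """

    class _Missing:
        def __init__(self, *args: object, **kwargs: object) -> None:
            raise ImportError(
                f"{name} requires an additional dependency. Install it with: {hint}"
            )

    _Missing.__name__ = name
    return _Missing


try:
    # Hypothetical third-party SDK; the import fails when it is not installed.
    from acme_sdk_hypothetical import AcmeClient
except ImportError:
    # Importing the Extension still succeeds; the error surfaces on first use.
    AcmeClient = _placeholder("AcmeClient", INSTALL_HINT)
```

Deferring the error to first use keeps `import` of the Extension package cheap and lets the message point at the upstream package rather than at an `ag2[...]` extra.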

## Contributing Examples and Showcases

Full, runnable examples and showcases that use Core or active Extensions belong in the [build-with-ag2](https://github.com/ag2ai/build-with-ag2) repository, not in Core or Extension repos. This keeps each codebase focused on framework code and gives examples a single, discoverable home.

When Core functionality is deprecated or Extensions are archived, corresponding Examples/Showcases will be removed from the Build-with-AG2 repository.

!!! note
    Maintaining an Example or Showcase in the Build-with-AG2 repository carries **the same** responsibilities as maintaining an Extension, except that it is removed, not archived, when the functionality it relies on is deprecated or the related Extension is archived.

## Maintainership and inactivity

We do not expect anyone to maintain an Extension forever. What we ask is that maintainers tell us when they cannot continue, so users are not left guessing.

In Phase 2, an Extension is considered for archival if **all** of the following hold:

- It has been broken (failing CI against the latest Core, or an unresolved critical bug) for **30+ days**,
- The maintainer has not responded to issues or PRs for **30 days**, and
- A **30-day notice** has been posted asking for a new maintainer.

If no one steps up, the Extension moves to **Extensions-archived**. An archived Extension can be revived at any time, given a commitment to maintain it - open an Issue and we will help bring it back.
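
The "all of the following" rule can be written down as a small predicate. This is only a sketch of the policy's logic: the `ExtensionStatus` fields are hypothetical, and it reads the notice condition as "a notice was posted and 30 days have passed since".

```python
from dataclasses import dataclass
from typing import Optional

THRESHOLD_DAYS = 30  # every threshold in the policy is 30 days


@dataclass
class ExtensionStatus:
    """Hypothetical snapshot of an Extension's health (illustrative fields)."""

    days_broken: int
    days_since_maintainer_response: int
    days_since_notice_posted: Optional[int]  # None: no notice posted yet


def is_archival_candidate(status: ExtensionStatus) -> bool:
    """Return True only when all three archival conditions hold."""
    return (
        status.days_broken >= THRESHOLD_DAYS
        and status.days_since_maintainer_response >= THRESHOLD_DAYS
        and status.days_since_notice_posted is not None
        and status.days_since_notice_posted >= THRESHOLD_DAYS
    )
```

A broken Extension whose maintainer is still responding, or one for which no notice has been posted, is therefore not a candidate.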

## Promoting an Extension to Core

Promotion to Core is an exception, not a planned path. It requires a core maintainer willing to own the component, a track record of stability and adoption as an Extension, and compatibility with Core's dependency and quality expectations. If you think your Extension is a candidate, open an Issue and we will discuss.

## Working together

We are grateful for everyone who has contributed to AG2 - past, present, and future. This policy exists so AG2 can keep growing without losing the stability production users depend on, and so contributors have clear, fair expectations on both sides.

If anything is unclear or you have ideas to improve it, open an Issue or join us on [Discord](https://discord.com/invite/pAbnFJrkgZ).

!!! note
    The repository's [`CONTRIBUTING.md`](https://github.com/ag2ai/ag2/blob/main/CONTRIBUTING.md) will be updated to reflect this policy as we move toward v1.0.

## Code documentation example

Below is a minimal sketch of AG2's code documentation standard. It includes the AG2 copyright header and [Google-style](https://google.github.io/styleguide/pyguide.html#38-comments-and-docstrings) docstrings on the module, public classes, and public functions.

```python
# Copyright (c) 2026, AG2ai, Inc., AG2ai open-source projects maintainers and core contributors
#
# SPDX-License-Identifier: Apache-2.0

"""Acme search Extension for AG2.

Provides a tool that lets agents query the Acme Search API and
receive ranked search results with cited sources. Designed for agents that
need up-to-date web information without managing the HTTP client themselves.

Maintainer: <github-handle>
Docs: https://docs.ag2.ai/extensions/search-tools/acme
Examples: https://github.com/ag2ai/build-with-ag2/extensions/acme-search-tool
"""

from autogen.beta.tools import Tool

class AcmeSearch(Tool):
    """Tool that performs web searches via the Acme Search API.

    The tool issues a single ranked-search request per call. Deduplication
    and re-ranking of results are the agent's responsibility.

    Attributes:
        api_key: Acme Search API key. Sent in the ``Authorization`` header.
        max_results: Maximum number of results to return per query. Must be
            between 1 and 20 inclusive.
    """

    def __init__(self, api_key: str, max_results: int = 5) -> None:
        ...

    async def search(self, query: str) -> list["SearchResult"]:
        """Run a search against the Acme Search API.

        Args:
            query: The natural-language query to send.

        Returns:
            A list of ``SearchResult`` objects, ordered by relevance.

        Raises:
            AcmeSearchAPIError: If the upstream service returns a non-2xx
                status code or a malformed response payload.
        """
        ...
```

---

# Multimodal Inputs

Source: https://docs.ag2.ai/latest/docs/beta/multimodal/inputs/

AG2 agents can process images, audio, video, and documents alongside text. The input event system provides a unified API across providers - you create inputs the same way regardless of which model you use.

## Input Types

| Factory Function | Creates | Description |
| :--- | :--- | :--- |
| `ImageInput(...)` | Image input | JPEG, PNG, GIF, WebP |
| `AudioInput(...)` | Audio input | WAV, MP3, OGG, FLAC, AAC |
| `VideoInput(...)` | Video input | MP4, WebM, MOV, MKV, MPEG |
| `DocumentInput(...)` | Document input | PDF, TXT, HTML, Markdown, CSV, JSON, Office formats |

Each factory function supports multiple ways to provide the data:

```python
from autogen.beta.events import ImageInput, AudioInput, VideoInput, DocumentInput

# From a URL
image = ImageInput("https://example.com/photo.jpg")

# From a local file path
image = ImageInput(path="photo.jpg")

# From raw bytes
image = ImageInput(data=raw_bytes, media_type="image/png")

# From a pre-uploaded file ID (provider-specific)
image = ImageInput(file_id="file-abc123")
```
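
When you pass raw bytes you also supply `media_type` yourself. The sketch below shows one way to derive it from a file name with the standard library; `read_with_media_type` is a hypothetical helper, not part of AG2, and the guess depends only on the file extension.

```python
import mimetypes
from pathlib import Path


def read_with_media_type(path: str) -> tuple[bytes, str]:
    """Read a file and guess its MIME type from the file extension."""
    media_type, _encoding = mimetypes.guess_type(path)
    if media_type is None:
        raise ValueError(f"Cannot infer a media type for {path!r}; pass one explicitly.")
    return Path(path).read_bytes(), media_type


# Usage with the factory shown above (assumes photo.png exists):
# data, media_type = read_with_media_type("photo.png")
# image = ImageInput(data=data, media_type=media_type)
```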

---

## Using Inputs with Agents

Pass inputs directly to `agent.ask()` as positional arguments alongside text:

```python
from autogen.beta import Agent
from autogen.beta.config import GeminiConfig
from autogen.beta.events import ImageInput

agent = Agent(
    "vision_agent",
    "You are a helpful assistant that describes images.",
    config=GeminiConfig(model="gemini-3-flash-preview"),
)

image = ImageInput("https://example.com/photo.jpg")
reply = await agent.ask("Describe this image in detail.", image)
print(reply.body)
```

You can pass multiple inputs in a single request:

```python
image1 = ImageInput("https://example.com/before.jpg")
image2 = ImageInput("https://example.com/after.jpg")

reply = await agent.ask("Compare these two images.", image1, image2)
```

---

## Provider Support

Not all providers support all input types. The table below shows what each provider accepts:

| Input Type | OpenAI | OpenAI Responses | Gemini | Anthropic | xAI | Bedrock |
| :--- | :---: | :---: | :---: | :---: | :---: | :---: |
| **Text** | Yes | Yes | Yes | Yes | Yes | Yes |
| **Image (URL)** | Yes | Yes | Yes | Yes | Yes | - |
| **Image (binary)** | Yes | Yes | Yes | Yes | Yes | Yes |
| **Audio (URL)** | - | - | Yes | - | - | - |
| **Audio (binary)** | Yes | - | Yes | - | - | - |
| **Video (URL)** | - | - | Yes | - | - | - |
| **Video (binary)** | - | - | Yes | - | - | Yes |
| **Document (URL)** | - | Yes | Yes | Yes | Yes | - |
| **Document (binary)** | - | - | Yes | Yes | Yes | Yes |
| **File ID** | - | Yes | - | Yes | Yes | - |

If you pass an unsupported input type to a provider, an `UnsupportedInputError` is raised with a clear message indicating what is not supported and by which provider.
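
If you want to pick a provider before building inputs, the table can be transcribed into a lookup. The sketch below is a plain-Python copy of the table, not an AG2 API; the provider and input-kind keys are made-up labels, and the table remains the source of truth.

```python
# Transcribed from the provider support table; keys are illustrative labels.
SUPPORT: dict[str, set[str]] = {
    "text": {"openai", "openai_responses", "gemini", "anthropic", "xai", "bedrock"},
    "image_url": {"openai", "openai_responses", "gemini", "anthropic", "xai"},
    "image_binary": {"openai", "openai_responses", "gemini", "anthropic", "xai", "bedrock"},
    "audio_url": {"gemini"},
    "audio_binary": {"openai", "gemini"},
    "video_url": {"gemini"},
    "video_binary": {"gemini", "bedrock"},
    "document_url": {"openai_responses", "gemini", "anthropic", "xai"},
    "document_binary": {"gemini", "anthropic", "xai", "bedrock"},
    "file_id": {"openai_responses", "anthropic", "xai"},
}


def is_supported(provider: str, input_kind: str) -> bool:
    """Return True if the table lists the provider for this input kind."""
    return provider in SUPPORT.get(input_kind, set())


def providers_for(*input_kinds: str) -> set[str]:
    """Return the providers that accept every one of the given input kinds."""
    candidates = [SUPPORT.get(kind, set()) for kind in input_kinds]
    return set.intersection(*candidates) if candidates else set()
```

For example, `providers_for("video_binary", "document_binary")` narrows the choice to the providers that can take both in one request.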

---

## Provider-Specific Details

### Gemini

Gemini has the broadest multimodal support - it accepts images, audio, video, and documents in all forms (URL, binary, and local file path).

**YouTube URLs** are supported directly:

```python
from autogen.beta.events import VideoInput

video = VideoInput("https://www.youtube.com/watch?v=dQw4w9WgXcQ")
reply = await agent.ask("Summarize this video.", video)
```

**Google Files API** - for large files (>20MB), upload via the Google Files API first and pass the returned URI:

```python
import time

from google import genai
from autogen.beta.events import VideoInput

client = genai.Client()
uploaded = client.files.upload(file="large_video.mp4")

# Wait for processing to complete
while uploaded.state.name == "PROCESSING":
    time.sleep(2)
    uploaded = client.files.get(name=uploaded.name)

video = VideoInput(uploaded.uri)
reply = await agent.ask("Describe this video.", video)
```
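
The polling loop above calls `time.sleep`, which blocks the event loop when it runs inside async code. A non-blocking variant with a timeout might look like the sketch below. `wait_until_processed` and its `fetch_state` callback are hypothetical names, not part of AG2 or the Google SDK.

```python
import asyncio
import time
from collections.abc import Awaitable, Callable


async def wait_until_processed(
    fetch_state: Callable[[], Awaitable[str]],
    interval: float = 2.0,
    timeout: float = 300.0,
) -> str:
    """Poll ``fetch_state`` until it stops returning "PROCESSING".

    Returns the final state name, or raises TimeoutError if the deadline passes.
    """
    deadline = time.monotonic() + timeout
    while True:
        state = await fetch_state()
        if state != "PROCESSING":
            return state
        if time.monotonic() >= deadline:
            raise TimeoutError(f"Still processing after {timeout} seconds")
        # Yield to the event loop instead of blocking it.
        await asyncio.sleep(interval)


# One possible callback, reusing the client calls from the example above:
# async def fetch_state() -> str:
#     refreshed = await asyncio.to_thread(client.files.get, name=uploaded.name)
#     return refreshed.state.name
```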

#### Vendor Metadata

Gemini supports provider-specific settings via `vendor_metadata` on binary inputs. These map to Gemini Part fields:

| Key | Type | Description |
| :--- | :--- | :--- |
| `media_resolution` | `str` | Controls token allocation per image/video frame |
| `video_metadata` | `dict` | Video clipping (`start_offset`, `end_offset`) and frame rate (`fps`) |
| `display_name` | `str` | Display name for the file |

**Media resolution** - control quality vs cost tradeoff for images and video frames:

```python
from autogen.beta.events import ImageInput

# Lower resolution = fewer tokens = lower cost
image = ImageInput(
    data=raw_bytes,
    media_type="image/jpeg",
    vendor_metadata={"media_resolution": "MEDIA_RESOLUTION_LOW"},
)
```

Available values: `MEDIA_RESOLUTION_LOW`, `MEDIA_RESOLUTION_MEDIUM`, `MEDIA_RESOLUTION_HIGH`, `MEDIA_RESOLUTION_ULTRA_HIGH`.

**Video clipping and frame rate** - process only a portion of a video or adjust the sampling rate:

```python
from autogen.beta.events import VideoInput

video = VideoInput(
    path="lecture.mp4",
    vendor_metadata={
        "video_metadata": {
            "start_offset": "60s",
            "end_offset": "120s",
            "fps": 0.5,
        },
    },
)
reply = await agent.ask("Summarize this section of the video.", video)
```

**Display name** - attach a name to the file for reference:

```python
from autogen.beta.events import DocumentInput

doc = DocumentInput(
    path="report.pdf",
    vendor_metadata={"display_name": "Q4 Financial Report"},
)
```

### OpenAI

OpenAI supports images via both the Chat Completions and Responses APIs. Audio binary input (WAV, MP3) is supported in the Chat Completions API only. The Responses API additionally supports file IDs and document URLs.

#### Vendor Metadata

OpenAI supports `vendor_metadata` for image detail control:

```python
from autogen.beta.events import ImageInput

image = ImageInput(
    data=raw_bytes,
    media_type="image/png",
    vendor_metadata={"detail": "low"},  # "low", "high", or "auto"
)
```

### Anthropic

Anthropic supports images (JPEG, PNG, GIF, WebP) and documents (PDF) via URL, base64, or File ID. Audio and video are not supported.

**File ID** - upload files via the Anthropic Files API (beta) and reference by ID:

```python
import anthropic
from autogen.beta.events import ImageInput

client = anthropic.Anthropic()

# Upload an image; the context manager closes the file handle afterwards
with open("photo.jpg", "rb") as image_file:
    uploaded = client.beta.files.upload(
        file=("photo.jpg", image_file, "image/jpeg"),
    )

# Reference by file_id - filename determines block type (image vs document)
image = ImageInput(file_id=uploaded.id, filename="photo.jpg")
reply = await agent.ask("Describe this image.", image)
```

#### Vendor Metadata

Anthropic supports `vendor_metadata` for prompt caching on content blocks:

```python
from autogen.beta.events import DocumentInput

doc = DocumentInput(
    path="report.pdf",
    vendor_metadata={"cache_control": {"type": "ephemeral"}},
)
```

### xAI

xAI supports images (URL and binary), documents (URL and binary), and pre-uploaded file IDs. Audio and video are not currently supported - passing them raises `UnsupportedInputError`.

**File ID** - reference a file previously uploaded via the xAI Files API:

```python
from autogen.beta.events import ImageInput, DocumentInput

image = ImageInput(file_id="file-abc123", filename="photo.jpg")
doc = DocumentInput(file_id="file-xyz789", filename="report.pdf")
```

#### Vendor Metadata

xAI reads `detail` for image quality control from two **different** attributes depending on the input source - `vendor_metadata` for binary, `metadata` for URL. Mixing them up means the value is silently ignored and xAI falls back to `"auto"`.

**Binary image** - set `detail` via `vendor_metadata`:

```python
from autogen.beta.events import BinaryInput, BinaryType

image = BinaryInput(
    raw_bytes,
    media_type="image/png",
    kind=BinaryType.IMAGE,
    vendor_metadata={"detail": "low"},  # "low", "high", or "auto"
)
```

**URL image** - set `detail` via `metadata` (not `vendor_metadata`):

```python
from autogen.beta.events import UrlInput, BinaryType

image = UrlInput(
    "https://example.com/photo.jpg",
    kind=BinaryType.IMAGE,
    metadata={"detail": "low"},
)
```

!!! note
    The factory `ImageInput(url=...)` does not forward `metadata`. To configure `detail` on a URL image, construct `UrlInput` directly as shown above.

**Document filename** - xAI requires a filename for binary documents. When sending raw bytes, either provide one via `vendor_metadata={"filename": ...}`, or rely on the auto-derived fallback (`file.<subtype>` from the media type, e.g. `file.pdf` for `application/pdf`):

```python
from autogen.beta.events import BinaryInput, BinaryType

doc = BinaryInput(
    pdf_bytes,
    media_type="application/pdf",
    kind=BinaryType.DOCUMENT,
    vendor_metadata={"filename": "Q4-report.pdf"},
)
```
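
The auto-derived fallback can be pictured as a one-line transformation of the media type. The sketch below only illustrates the documented `file.<subtype>` shape; `fallback_filename` is a hypothetical helper, and AG2's own code may treat edge cases differently.

```python
def fallback_filename(media_type: str) -> str:
    """Build ``file.<subtype>`` from a MIME type such as ``application/pdf``."""
    subtype = media_type.split("/", 1)[-1]
    # Drop MIME parameters such as "; charset=utf-8" if present.
    subtype = subtype.split(";", 1)[0].strip()
    return f"file.{subtype}"
```

Passing an explicit `filename` is still the safer choice when the name carries meaning for the model, since the fallback is the same for every document of a given type.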

### Amazon Bedrock

The Bedrock Converse API accepts **binary sources only** - images (JPEG, PNG, GIF, WebP), documents (PDF, CSV, DOC, DOCX, XLS, XLSX, HTML, TXT, Markdown), and video (MP4, WebM, MOV, MKV, and more; Amazon Nova models). URL inputs and file IDs raise `UnsupportedInputError` - Bedrock has no Files API, so source data from a URL must be downloaded and passed as bytes:

```python
from autogen.beta import Agent
from autogen.beta.config import BedrockConfig
from autogen.beta.events import DocumentInput, ImageInput

agent = Agent(
    "vision_agent",
    "You describe images and summarize documents.",
    config=BedrockConfig(model="us.amazon.nova-lite-v1:0", region_name="us-east-1"),
)

image = ImageInput(path="photo.jpg")
doc = DocumentInput(data=pdf_bytes, media_type="application/pdf")
reply = await agent.ask("Describe the image and summarize the document.", image, doc)
```

!!! note
    Modality support also depends on the **model** behind the Converse API: Amazon Nova models accept images, documents, and video; many others (e.g. DeepSeek) are text-only and return a `ValidationException` from AWS for non-text blocks. The provider raises `UnsupportedInputError` only for inputs the Converse API itself cannot carry.

**Document name** - Converse requires a name for document blocks. It is taken from `vendor_metadata={"filename": ...}` (set automatically when using `path=`), sanitized to the characters Converse allows (alphanumerics, single spaces, hyphens, parentheses, brackets), and falls back to `"document"` when absent:

```python
from autogen.beta.events import BinaryInput, BinaryType

doc = BinaryInput(
    pdf_bytes,
    media_type="application/pdf",
    kind=BinaryType.DOCUMENT,
    vendor_metadata={"filename": "Q4 report.pdf"},
)
```
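
To preview roughly what a name looks like after sanitization, the rule can be sketched with two regular expressions. This is one possible reading of the description, not AG2's implementation: it assumes disallowed characters are dropped (they might instead be replaced), and `sanitize_document_name` is a hypothetical helper.

```python
import re

# Anything other than alphanumerics, spaces, hyphens, parentheses, and brackets.
_DISALLOWED = re.compile(r"[^A-Za-z0-9 \-()\[\]]")
_REPEATED_SPACES = re.compile(r" {2,}")


def sanitize_document_name(name: str, default: str = "document") -> str:
    """Drop disallowed characters, collapse repeated spaces, and apply the fallback."""
    cleaned = _DISALLOWED.sub("", name)
    cleaned = _REPEATED_SPACES.sub(" ", cleaned).strip()
    return cleaned or default
```

Under this reading the dot in a file extension is removed, so choosing a name that already uses only allowed characters avoids surprises.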

---

# Image Generation

Source: https://docs.ag2.ai/latest/docs/beta/multimodal/image_generation/

Some providers can produce images as part of an agent's reply. Generated images are always returned the same way - as a list of `BinaryResult` objects on `reply.files` - regardless of which provider produced them.

AG2 exposes two different mechanisms, because the providers expose two different APIs:

| Provider | Mechanism | How to enable |
| :--- | :--- | :--- |
| OpenAI | `ImageGenerationTool` (server-side tool, Responses API) | Add the tool to the agent |
| Gemini | `IMAGE` response modality on an image model | Set `response_modalities=["TEXT", "IMAGE"]` |

!!! note
    OpenAI's `ImageGenerationTool` is **not** supported on Gemini, and Gemini's image modality is **not** available on OpenAI. Each provider uses its own mechanism below.

## Reading generated images

Every generated image is a [`BinaryResult`](inputs.md) on `reply.files`. It carries the raw bytes and a `metadata` dict; the image's media type is stored under the `media_type` key.

```python
reply = await agent.ask("Generate an image of a red bicycle on a beach.")

for index, image in enumerate(reply.files):
    media_type = image.metadata.get("media_type", "image/png")
    extension = media_type.split("/")[-1]
    with open(f"image_{index}.{extension}", "wb") as file:
        file.write(image.data)
```

`reply.files` is empty when the model returns only text, so it is safe to iterate even when no image was produced.

## OpenAI

Add `ImageGenerationTool` to an agent configured with the **Responses API** (`OpenAIResponsesConfig`). The model decides when to call the tool and the generated image is appended to `reply.files`.

```python
from autogen.beta import Agent
from autogen.beta.config import OpenAIResponsesConfig
from autogen.beta.tools import ImageGenerationTool

agent = Agent(
    "designer",
    config=OpenAIResponsesConfig(model="gpt-4.1"),
    tools=[
        ImageGenerationTool(
            quality="high",
            size="1024x1024",
            output_format="png",
            background="transparent",
        ),
    ],
)

reply = await agent.ask("Generate a logo for a coffee shop.")
image = reply.files[0]
```

| Parameter | Description |
| :--- | :--- |
| `quality` | `"low"`, `"medium"`, `"high"`, or `"auto"` |
| `size` | e.g. `"1024x1024"`, `"1536x1024"`, or `"auto"` |
| `background` | `"transparent"`, `"opaque"`, or `"auto"` |
| `output_format` | `"png"`, `"jpeg"`, or `"webp"` |
| `output_compression` | 0-100, for jpeg/webp only |
| `partial_images` | 1-3, number of partial images to stream |

!!! warning
    `ImageGenerationTool` requires the Responses API. Using it with the Chat Completions API (`OpenAIConfig`) raises an `UnsupportedToolError`.
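
The constraints in the parameter table can be checked before the tool is constructed. The sketch below is a hypothetical pre-flight helper, not part of AG2; it encodes only the ranges listed in the table, and the provider remains the final authority on what it accepts.

```python
from typing import Optional


def check_image_options(
    output_format: str = "png",
    output_compression: Optional[int] = None,
    partial_images: Optional[int] = None,
) -> list[str]:
    """Return human-readable problems; an empty list means the options look fine."""
    problems: list[str] = []
    if output_format not in {"png", "jpeg", "webp"}:
        problems.append(f"unknown output_format: {output_format!r}")
    if output_compression is not None:
        if output_format not in {"jpeg", "webp"}:
            problems.append("output_compression applies to jpeg/webp only")
        if not 0 <= output_compression <= 100:
            problems.append("output_compression must be between 0 and 100")
    if partial_images is not None and not 1 <= partial_images <= 3:
        problems.append("partial_images must be between 1 and 3")
    return problems
```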

## Gemini

Gemini does not use a tool for image generation. Instead, you select an **image-capable model** and request the `IMAGE` response modality via `response_modalities`. The model returns the image inline and AG2 surfaces it on `reply.files`.

```python
from autogen.beta import Agent
from autogen.beta.config import GeminiConfig

config = GeminiConfig(
    model="gemini-3.1-flash-image",
    response_modalities=["TEXT", "IMAGE"],
)

agent = Agent("designer", config=config)

reply = await agent.ask("Generate an image of a friendly robot waving hello.")
image = reply.files[0]
```

!!! note
    Image output requires an image-capable Gemini model (for example `gemini-3.1-flash-image`). Requesting the `IMAGE` modality on a text-only model returns no image. Include `"TEXT"` alongside `"IMAGE"` so the model can still return any accompanying text in `reply.body`.

`response_modalities` is also available on `VertexAIConfig` for Gemini models served through Vertex AI.

### Controlling size and aspect ratio

Gemini does not take a pixel `size` string like OpenAI. Instead, pass a `types.ImageConfig` through `image_config` to set the aspect ratio and a resolution tier:

```python
from google.genai import types

from autogen.beta.config import GeminiConfig

config = GeminiConfig(
    model="gemini-3.1-flash-image",
    response_modalities=["TEXT", "IMAGE"],
    image_config=types.ImageConfig(aspect_ratio="16:9", image_size="2K"),
)
```

| Field | Values |
| :--- | :--- |
| `aspect_ratio` | e.g. `"1:1"`, `"4:3"`, `"3:4"`, `"16:9"`, `"9:16"`, `"21:9"` |
| `image_size` | resolution tier - `"1K"`, `"2K"` (higher tiers are model-dependent) |

`image_config` is a full passthrough of the SDK's `types.ImageConfig`, so any other field it supports (such as `person_generation`) is available too. It is also accepted on `VertexAIConfig`.
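
If you are porting code that thinks in pixel sizes, a small helper can map a target width and height to the closest listed aspect ratio. This is a sketch under assumptions: `nearest_aspect_ratio` is a hypothetical helper, and `ASPECT_RATIOS` copies only the example values from the table, which may not be the full set a given model accepts.

```python
import math

# Example values from the table above; not necessarily exhaustive.
ASPECT_RATIOS = ("1:1", "4:3", "3:4", "16:9", "9:16", "21:9")


def nearest_aspect_ratio(width: int, height: int) -> str:
    """Return the listed ratio closest to width/height on a log scale."""
    target = math.log(width / height)

    def distance(ratio: str) -> float:
        ratio_width, ratio_height = ratio.split(":")
        return abs(math.log(int(ratio_width) / int(ratio_height)) - target)

    return min(ASPECT_RATIOS, key=distance)
```

Comparing on a log scale treats "twice as wide" and "twice as tall" symmetrically, so portrait and landscape targets are matched equally well.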

## Editing an existing image

To edit an image instead of generating one from scratch, pass it in as an [`ImageInput`](inputs.md) alongside your instruction. The edited image is returned on `reply.files`, exactly like a freshly generated one - so the same image can be sent back in for further rounds of editing.

=== "OpenAI"
    ```python linenums="1"
    from autogen.beta import Agent
    from autogen.beta.config import OpenAIResponsesConfig
    from autogen.beta.events import ImageInput
    from autogen.beta.tools import ImageGenerationTool

    agent = Agent(
        "editor",
        config=OpenAIResponsesConfig(model="gpt-4.1"),
        tools=[ImageGenerationTool(size="1024x1024", output_format="png")],
    )

    reply = await agent.ask(
        "Put a party hat on the robot. Keep everything else the same.",
        ImageInput(path="robot.png"),
    )
    edited = reply.files[0]
    ```

=== "Gemini"
    ```python linenums="1"
    from autogen.beta import Agent
    from autogen.beta.config import GeminiConfig
    from autogen.beta.events import ImageInput

    config = GeminiConfig(model="gemini-3.1-flash-image", response_modalities=["TEXT", "IMAGE"])
    agent = Agent("editor", config=config)

    reply = await agent.ask(
        "Put a party hat on the robot. Keep everything else the same.",
        ImageInput(path="robot.png"),
    )
    edited = reply.files[0]
    ```

`ImageInput` also accepts raw bytes - `ImageInput(data=image.data, media_type="image/png")` - which lets you feed a generated image straight back in for another edit without writing it to disk.

---

# Voice & Realtime Overview

Source: https://docs.ag2.ai/latest/docs/beta/live/live/

`autogen.beta.live` is the AG2 Beta module for building voice-enabled agents. It covers two complementary patterns: a turn-by-turn **STT -> Agent -> TTS** pipeline built on top of a regular `Agent`, and a full-duplex **`LiveAgent`** that streams audio to and from a provider's realtime API.

## When to use which

| Pattern | Class | Latency | Use when... |
|---|---|---|---|
| Turn-by-turn voice | `Agent` + `OpenAITranscriber` + `TTSObserver` | ~1-3 s per turn | You already have a text agent and want to add a voice front-end; you need tool execution, middleware, structured output, or any other text-agent feature. |
| Realtime full-duplex | `LiveAgent` + `OpenAIRealTimeConfig` / `GeminiRealTimeConfig` | <500 ms | You want barge-in, interruption, semantic VAD, or a phone-call-like UX. |

!!! note
    Both patterns share the same audio I/O primitives - [`SoundDevicePlayer`](stt_tts.md) and [`SoundDeviceRecorder`](stt_tts.md) - and the same event stream. You can mix observers (TTS, logging, persistence) across both.

## Installation

The audio I/O classes depend on `sounddevice` and `numpy`; the OpenAI and Gemini integrations need their respective SDKs.

```bash
pip install "ag2[openai,gemini]" "sounddevice[numpy]"
```

!!! warning
    `SoundDevicePlayer` and `SoundDeviceRecorder` require `sounddevice[numpy]` as an **additional** dependency (not an optional `ag2[...]` extra), so install it explicitly.

## The two flows at a glance

=== "Turn-by-turn (STT -> Agent -> TTS)"
    ```python linenums="1"
    import asyncio

    from autogen.beta import Agent, config
    from autogen.beta.live import (
        OpenAITTSConfig,
        OpenAITranscriber,
        SoundDevicePlayer,
        SoundDeviceRecorder,
        TTSObserver,
    )

    agent = Agent(
        name="assistant",
        prompt="You are a helpful voice assistant.",
        config=config.OpenAIResponsesConfig(model="gpt-5", streaming=True),
        observers=[TTSObserver(config=OpenAITTSConfig(model="gpt-4o-mini-tts"))],
    )

    async def main() -> None:
        pipeline = OpenAITranscriber("gpt-4o-mini-transcribe").pipe(agent)

        async with SoundDevicePlayer() as player:
            voice = SoundDeviceRecorder().record(duration=3)
            reply = await pipeline.ask(voice, stream=player.stream)
            print(reply.body)

    if __name__ == "__main__":
        asyncio.run(main())
    ```

=== "Realtime (LiveAgent)"
    ```python linenums="1"
    import asyncio

    from autogen.beta.live import (
        LiveAgent,
        SoundDevicePlayer,
        SoundDeviceRecorder,
        openai,
    )

    agent = LiveAgent(
        name="assistant",
        prompt="You are a helpful voice assistant.",
        config=openai.RealTimeConfig(
            "gpt-realtime-2",
            output=openai.AudioOutput(voice="ballad", speed=1.2),
        ),
    )

    async def main() -> None:
        async with (
            agent.run() as context,
            SoundDevicePlayer(context=context),
            SoundDeviceRecorder(context=context),
        ):
            print("Starting...")
            await asyncio.Future()

    if __name__ == "__main__":
        asyncio.run(main())
    ```

## What's next

- **[STT & TTS](stt_tts.md)** - wrap any `Agent` with speech input and output.
- **[LiveAgent](live_agent.md)** - realtime, low-latency voice agents with OpenAI or Gemini.

---

# Speech-to-Text and Text-to-Speech

Source: https://docs.ag2.ai/latest/docs/beta/live/stt_tts/

The STT -> Agent -> TTS flow turns any existing `Agent` into a voice agent without changing the agent itself. Speech-to-text is added as a **pipeline wrapper**; text-to-speech is added as an **observer** that listens to the model's streamed message chunks.

## Audio I/O primitives

`SoundDeviceRecorder` captures microphone input and `SoundDevicePlayer` plays synthesized speech. Both are thin wrappers around the [`sounddevice`](https://python-sounddevice.readthedocs.io/) library and share the same event stream.

```python
from autogen.beta.live import SoundDevicePlayer, SoundDeviceRecorder

recorder = SoundDeviceRecorder()
voice = recorder.record(duration=5)  # blocks for 5s, returns VoiceInput
```

The recorder produces a `VoiceInput` containing 16-bit PCM bytes plus the sample rate and channel count. The player subscribes to `SynthesizedAudioEvent` on its context's stream and plays each chunk on a background thread.
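
Because the payload is plain 16-bit PCM, its duration follows directly from the byte count, and it can be wrapped in a WAV container with the standard library. The helpers below are hypothetical and take the raw values as arguments, since the attribute names on `VoiceInput` are not shown here.

```python
import io
import wave

BYTES_PER_SAMPLE = 2  # 16-bit PCM


def pcm_duration_seconds(pcm: bytes, sample_rate: int, channels: int) -> float:
    """Duration = bytes / (bytes per sample * channels) / samples per second."""
    frame_size = BYTES_PER_SAMPLE * channels
    return len(pcm) / frame_size / sample_rate


def pcm_to_wav(pcm: bytes, sample_rate: int, channels: int) -> bytes:
    """Wrap raw PCM bytes in a WAV header, for saving or debugging a recording."""
    buffer = io.BytesIO()
    with wave.open(buffer, "wb") as wav_file:
        wav_file.setnchannels(channels)
        wav_file.setsampwidth(BYTES_PER_SAMPLE)
        wav_file.setframerate(sample_rate)
        wav_file.writeframes(pcm)
    return buffer.getvalue()
```

For instance, 32,000 bytes of mono audio at 16 kHz is one second of sound.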

!!! note
    `SoundDeviceRecorder.record(duration=...)` is a one-shot, blocking helper for the turn-by-turn flow. For continuous streaming (used by `LiveAgent`), use the recorder as an async context manager - see [LiveAgent](live_agent.md).

## Speech-to-Text

`OpenAITranscriber` implements the `STTConfig` protocol and exposes a `.pipe(agent)` method that wraps an `Agent` in a `VoicePipeline`. Calling `pipeline.ask(voice)` transcribes the audio and forwards the text to the agent's normal `ask()` flow.

```python
import asyncio

from autogen.beta import Agent, config
from autogen.beta.live import OpenAITranscriber, SoundDeviceRecorder

agent = Agent(
    "assistant",
    config=config.OpenAIConfig("gpt-5", streaming=True),
)

async def main():
    # pipe STT model to agent input
    pipeline = OpenAITranscriber("gpt-4o-mini-transcribe").pipe(agent)
    recorder = SoundDeviceRecorder()

    print("Say something...")
    voice_input = recorder.record(duration=5)
    reply = await pipeline.ask(voice_input)
    print(reply.body)

    print("Say something...")
    voice_input = recorder.record(duration=5)
    # continue the same conversation
    reply = await reply.ask(voice_input)
    print(reply.body)

if __name__ == "__main__":
    asyncio.run(main())
```

`pipeline.ask(...)` returns a `VoiceReply` that exposes the same surface as `AgentReply` (`.body`, `.response`, `.history`, `.ask(...)`), and its `.ask(...)` also accepts a `VoiceInput` for the next voice turn. The agent's history is preserved across turns.

!!! tip
    The transcriber emits `TranscriptionChunkEvent` and `TranscriptionCompletedEvent` on the agent's stream as soon as the transcription server starts producing tokens. Subscribe to them to display live captions.

### Translation

If you want the user's speech transcribed **into English** regardless of input language, swap in `OpenAITranslationTranscriber`. It has the same API as `OpenAITranscriber` but uses OpenAI's translation endpoint.

```python
from autogen.beta.live import OpenAITranslationTranscriber

pipeline = OpenAITranslationTranscriber("whisper-1").pipe(agent)
```

## Text-to-Speech

`TTSObserver` is an observer that listens to `ModelMessageChunk` events as the agent streams its response, batches them into sentence-sized chunks, calls a TTS provider, and emits `SynthesizedAudioEvent`s onto the stream. A `SoundDevicePlayer` attached to the same stream then plays them.

```python
import asyncio

from autogen.beta import Agent, config
from autogen.beta.live import OpenAITTSConfig, SoundDevicePlayer, TTSObserver

agent = Agent(
    name="assistant",
    prompt="You are a helpful voice assistant.",
    config=config.OpenAIResponsesConfig(model="gpt-5", streaming=True),
    observers=[
        TTSObserver(config=OpenAITTSConfig(model="gpt-4o-mini-tts")),
    ],
)

async def main() -> None:
    async with SoundDevicePlayer() as player:
        # pass the player's stream so synthesized audio reaches the speakers
        await agent.ask("Hello, agent!", stream=player.stream)

if __name__ == "__main__":
    asyncio.run(main())
```

!!! warning
    The agent's `config` must be set up for **streaming output** (e.g., `streaming=True`). `TTSObserver` works at the `ModelMessageChunk` granularity - if the model emits a single non-streaming `ModelMessage`, the observer will still synthesize it, but you lose the sentence-level pipelining that keeps latency low.
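
To see why streaming matters, it helps to picture the sentence batching step. The sketch below is not `TTSObserver`'s implementation; `batch_sentences` is a hypothetical generator that shows the general idea of buffering streamed text and releasing it one sentence at a time.

```python
import re
from collections.abc import Iterable, Iterator

# Split on whitespace that follows sentence-ending punctuation.
_SENTENCE_BREAK = re.compile(r"(?<=[.!?])\s+")


def batch_sentences(chunks: Iterable[str]) -> Iterator[str]:
    """Yield complete sentences as soon as the streamed chunks contain them."""
    buffer = ""
    for chunk in chunks:
        buffer += chunk
        parts = _SENTENCE_BREAK.split(buffer)
        # Every part except the last is a finished sentence.
        for sentence in parts[:-1]:
            if sentence.strip():
                yield sentence.strip()
        buffer = parts[-1]
    # Flush whatever is left when the stream ends.
    if buffer.strip():
        yield buffer.strip()
```

With streamed chunks, the first sentence can be sent for synthesis while the model is still generating the rest; with a single non-streaming message, nothing is available until the whole reply has arrived.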

### Voice and speed

`OpenAITTSConfig` accepts the standard OpenAI TTS parameters:

```python
from autogen.beta.live import OpenAITTSConfig

config = OpenAITTSConfig(
    model="gpt-4o-mini-tts",
    voice="ballad",  # alloy, ash, ballad, coral, echo, sage, shimmer, verse...
    speed=1.1,
)
```

## Combining STT and TTS

The full round-trip - voice in, voice out - is just both halves wired up at once: pipe the agent through the transcriber, attach a `TTSObserver`, and share a stream with the player.

```python
import asyncio

from autogen.beta import Agent, config
from autogen.beta.live import (
    OpenAITTSConfig,
    OpenAITranscriber,
    SoundDevicePlayer,
    SoundDeviceRecorder,
    TTSObserver,
)

agent = Agent(
    name="assistant",
    prompt="You are a helpful voice assistant.",
    config=config.OpenAIResponsesConfig(model="gpt-5", streaming=True),
    observers=[
        TTSObserver(config=OpenAITTSConfig(model="tts-1")),
    ],
)

async def main():
    pipeline = OpenAITranscriber("gpt-4o-mini-transcribe").pipe(agent)
    recorder = SoundDeviceRecorder()

    async with SoundDevicePlayer() as player:
        print("Say something...")
        voice_input = recorder.record(duration=3)
        reply = await pipeline.ask(voice_input, stream=player.stream)
        print(reply.body)

        # wait for the audio to finish playing
        player.join()

        print("Say something...")
        voice_input = recorder.record(duration=3)
        reply = await reply.ask(voice_input)
        print(reply.body)

if __name__ == "__main__":
    asyncio.run(main())
```

!!! tip
    `player.join()` blocks the main task until the synthesized audio queue has drained. Use it between turns when you want the assistant to finish speaking before the next recording starts - otherwise the recorder will capture the tail of the assistant's voice.

## What's next

- **[LiveAgent](live_agent.md)** - drop the turn-by-turn round-trip in favor of a streaming, full-duplex realtime session.
- **[Observers](../advanced/observers.md)** - `TTSObserver` is one of many observer patterns; see the harness docs for logging, persistence, and custom observers.

---

# LiveAgent - Realtime Voice Sessions

Source: https://docs.ag2.ai/latest/docs/beta/live/live_agent/

`LiveAgent` is a full-duplex voice agent backed by a provider's realtime API. Unlike the [turn-by-turn STT/TTS pipeline](stt_tts.md), it opens a single bidirectional session for the entire conversation - audio flows in and out continuously, with built-in voice activity detection and barge-in.

## Quick start

A `LiveAgent` holds a `RealtimeConfig` and is opened via `agent.run()`, which yields a `ConversationContext`. Peers (player, recorder, observers) share that context so they all read from and write to the same event stream.

```python
import asyncio

from autogen.beta.live import (
    LiveAgent,
    SoundDevicePlayer,
    SoundDeviceRecorder,
    openai,
)

agent = LiveAgent(
    name="assistant",
    prompt="You are a helpful voice assistant.",
    config=openai.RealTimeConfig(
        "gpt-realtime-2",
        output=openai.AudioOutput(voice="ballad", speed=1.2),
    ),
)

async def main() -> None:
    async with (
        agent.run() as context,
        SoundDevicePlayer(context=context),
        SoundDeviceRecorder(context=context),
    ):
        print("Starting...")
        await asyncio.Future()  # run until cancelled

if __name__ == "__main__":
    asyncio.run(main())
```

!!! note
    The three context managers must share the **same** `context` so the recorder's `RecordedAudioEvent`s reach the provider session and the provider's `SynthesizedAudioEvent`s reach the player.

## Watching the transcript

The realtime provider streams both audio and a text transcript. Subscribe to `ModelMessageChunk` to receive the assistant's transcript token-by-token.

```python
import asyncio

from autogen.beta.events import ModelMessageChunk
from autogen.beta.live import (
    LiveAgent,
    OpenAIRealTimeConfig,
    SoundDevicePlayer,
    SoundDeviceRecorder,
)

agent = LiveAgent(
    name="assistant",
    prompt="You are a helpful voice assistant.",
    config=OpenAIRealTimeConfig("gpt-realtime-2"),
)

async def main() -> None:
    async with (
        agent.run() as context,
        SoundDevicePlayer(context=context),
        SoundDeviceRecorder(context=context),
    ):
        print("Starting...")
        with context.stream.where(ModelMessageChunk).join() as events:
            async for event in events:
                print(event)

if __name__ == "__main__":
    asyncio.run(main())
```

!!! tip
    `stream.where(EventType).join()` gives you an async iterator that yields filtered events. It's the idiomatic way to consume a single event type from the live session without writing a subscriber.
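
To make the filtering idea concrete, here is a toy, standard-library-only model of consuming a single event type from a shared stream. It is a sketch, not AG2's implementation: `ToyStream`, `TextChunk`, and `AudioChunk` are hypothetical stand-ins, and the real `stream.where(...).join()` is used as a context manager rather than as a plain async generator.

```python
import asyncio
from collections.abc import AsyncIterator
from dataclasses import dataclass


@dataclass
class TextChunk:  # hypothetical stand-in for a transcript event
    text: str


@dataclass
class AudioChunk:  # hypothetical stand-in for an audio event
    pcm: bytes


class ToyStream:
    """Toy event stream: a queue plus a type-filtered async iterator."""

    _DONE = object()  # sentinel that ends iteration

    def __init__(self) -> None:
        self._queue = asyncio.Queue()

    def publish(self, event: object) -> None:
        self._queue.put_nowait(event)

    def close(self) -> None:
        self._queue.put_nowait(self._DONE)

    async def where(self, *event_types: type) -> AsyncIterator[object]:
        # Yield only events of the requested types; skip everything else.
        while True:
            event = await self._queue.get()
            if event is self._DONE:
                return
            if isinstance(event, event_types):
                yield event


async def collect_text() -> list[str]:
    stream = ToyStream()
    for event in (TextChunk("Hel"), AudioChunk(b"\x00"), TextChunk("lo")):
        stream.publish(event)
    stream.close()
    # Only TextChunk events reach the consumer; the audio event is dropped.
    return [event.text async for event in stream.where(TextChunk)]
```

Passing several types to `where` in this toy plays the role of a union filter such as `ModelMessageChunk | TranscriptionChunkEvent`.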

## Text-only output

To keep the realtime session for its low-latency turn detection but disable audio output entirely, swap `AudioOutput` for `TextOutput`. The model returns raw text via `ModelMessageChunk` and never produces synthesized audio.

```python
import asyncio

from autogen.beta.events import ModelMessageChunk
from autogen.beta.live import (
    LiveAgent,
    SoundDeviceRecorder,
    openai,
)

agent = LiveAgent(
    name="assistant",
    prompt="You are a helpful voice assistant.",
    config=openai.RealTimeConfig(
        "gpt-realtime-2",
        output=openai.TextOutput(),
    ),
)

async def main() -> None:
    async with (
        agent.run() as context,
        SoundDeviceRecorder(context=context),
    ):
        print("Starting...")
        with context.stream.where(ModelMessageChunk).join() as events:
            async for event in events:
                print(event)

if __name__ == "__main__":
    asyncio.run(main())
```

## Tools in a realtime session

`LiveAgent` supports the same `@agent.tool` decorator as a regular `Agent`. Tool calls are routed through AG2's normal tool executor, and results are sent back to the provider's realtime session automatically.

```python
import asyncio

from autogen.beta.live import (
    LiveAgent,
    OpenAIRealTimeConfig,
    SoundDevicePlayer,
    SoundDeviceRecorder,
)

agent = LiveAgent(
    name="assistant",
    prompt="You are a helpful voice assistant.",
    config=OpenAIRealTimeConfig("gpt-realtime-2"),
)

@agent.tool
async def sum_numbers(a: int, b: int) -> int:
    """You can use this tool to sum two numbers."""
    print(f"Summing {a} and {b}")
    return a + b

async def main() -> None:
    async with (
        agent.run() as context,
        SoundDevicePlayer(context=context),
        SoundDeviceRecorder(context=context),
    ):
        print("Starting...")
        await asyncio.Future()

if __name__ == "__main__":
    asyncio.run(main())
```

## Providers

`LiveAgent` is provider-neutral - it accepts any `RealtimeConfig`. AG2 Beta ships with two implementations.

=== "OpenAI"
    ```python linenums="1"
    from autogen.beta.live import openai

    config = openai.RealTimeConfig(
        "gpt-realtime-2",
        output=openai.AudioOutput(voice="ballad", speed=1.2),
        input=openai.InputConfig(
            # semantic VAD with interruption is the default
            turn_detection={
                "type": "semantic_vad",
                "create_response": True,
                "interrupt_response": True,
            },
        ),
    )
    ```

    Available voices: `alloy`, `ash`, `ballad`, `coral`, `echo`, `sage`, `shimmer`, `verse`, `marin`, `cedar`.

=== "Gemini"
    ```python linenums="1"
    from autogen.beta.live import gemini

    config = gemini.RealTimeConfig(
        "gemini-3.1-flash-live-preview",
        output=gemini.AudioOutput(voice="Puck", language_code="en-US"),
        input=gemini.InputConfig(transcribe=True),
    )
    ```

    Available voices: `Aoede`, `Charon`, `Fenrir`, `Kore`, `Leda`, `Orus`, `Puck`, `Zephyr`.

    !!! warning
        Gemini Live's audio I/O is fixed by the API: **16 kHz mono PCM** input, **24 kHz mono PCM** output. Configure the recorder accordingly:

        ```python
        SoundDeviceRecorder(context=context, sample_rate=16000)
        ```

    **Full Gemini example with input transcription**

    ```python linenums="1" hl_lines="14-18 25-26"
    import asyncio

    from autogen.beta.events import ModelMessageChunk, TranscriptionChunkEvent
    from autogen.beta.live import (
        LiveAgent,
        SoundDevicePlayer,
        SoundDeviceRecorder,
        gemini,
    )

    agent = LiveAgent(
        name="assistant",
        prompt="You are a helpful voice assistant. Always respond in English.",
        config=gemini.RealTimeConfig(
            "gemini-3.1-flash-live-preview",
            output=gemini.AudioOutput(voice="Puck", language_code="en-US"),
            input=gemini.InputConfig(transcribe=True),
        ),
    )

    async def main() -> None:
        async with (
            agent.run() as context,
            SoundDevicePlayer(context=context),
            # Gemini Live requires 16 kHz mono PCM input
            SoundDeviceRecorder(context=context, sample_rate=16000),
        ):
            print("Starting...")
            with context.stream.where(ModelMessageChunk | TranscriptionChunkEvent).join() as events:
                async for event in events:
                    print(event)

    if __name__ == "__main__":
        asyncio.run(main())
    ```

## LiveAgent vs Agent

`LiveAgent` mirrors `Agent`'s constructor surface - `name`, `prompt`, `tools`, `middleware`, `observers`, `dependencies`, `variables`, `plugins`, `hitl_hook` - so most agent-level concepts carry over. The differences:

| Feature | `Agent` | `LiveAgent` |
|---|---|---|
| Entry point | `await agent.ask(input)` | `async with agent.run() as context` |
| History | Returned via `AgentReply` | Lives on the session's stream |
| Turn detection | Application-driven (you call `ask`) | Provider-driven (VAD) |
| Structured output | Supported | Not supported |
| `tasks` / `run_subtask` | Supported | Not supported |

If you need both - for example, a realtime voice front-end that hands off to a tasking agent - drive the handoff through a tool on the `LiveAgent` that delegates to a separate `Agent` using `Agent.as_tool()`.

## What's next

- **[STT & TTS](stt_tts.md)** - the higher-latency, turn-by-turn alternative.
- **[Tools](../tools/tools.md)** - tool authoring, middleware, and approval flows that all work inside a `LiveAgent`.

---

# Multi-Agent Network Overview

Source: https://docs.ag2.ai/latest/docs/beta/network/overview/

The `autogen.beta.network` module turns one or more `Agent` instances into a **multi-agent network** - a hub-and-spoke topology where a central registry coordinates channel-based, protocol-driven exchanges between named agents.

It is fully **opt-in**. Bare `Agent` continues to work standalone with no behavioural change when this package is not imported. Adopt the network when you need any of the following:

- Multiple agents coordinating on the same task with **enforceable turn order**, expectations, and audit trails.
- A **registry of named agents** that can find each other by capability.
- **Durable messaging** with replayable channel transcripts (write-ahead log).
- **Governance**: per-agent rules (access, rate, inbox, limits), per-adapter expectations, and a hub-side audit log.
- **Sub-task observability**: `agent.task(...)` lifecycle events automatically forwarded to the hub when run inside a network turn.
- **Distributed / cross-process deployment**: replace `LocalLink` with `WsLink` to run the hub and each agent as separate OS processes connected over WebSocket.

## The Mental Model

```
                          ┌────────────────┐
                          │      Hub       │  <-── audit log, registry,
                          │ ── adapters ── │       channels, sweepers,
                          │ ── channels ── │       expectation evaluators
                          │ ── audit log ──│
                          └────────┬───────┘
                                   │  in-process duplex (LocalLink)
                ┌──────────────────┼──────────────────┐
                ▼                  ▼                  ▼
          ┌───────────┐      ┌───────────┐      ┌───────────┐
          │AgentClient│      │AgentClient│      │AgentClient│
          │  alice    │      │   bob     │      │   carol   │
          │  Agent    │      │  Agent    │      │  Agent    │
          └───────────┘      └───────────┘      └───────────┘
```

Each `Agent` is wrapped by an `AgentClient`, which lives behind a `HubClient` and connects to the hub through a `LinkClient`. The hub holds the authoritative state; clients are thin frontends that send and receive `Envelope`s through framed messages on their link.

!!! note
    The default transport is `LocalLink` - same-process duplex queues. Swap it for `WsLink` to run each agent in its own process or on a separate host; the `HubClient` API is unchanged. See [Distributed Deployment](distributed.md).

## Core Concepts

| Concept | Lives in | Purpose |
|---|---|---|
| `Hub` | One per network | Authoritative state: registry, audit log, channel table, write-ahead logs, expectation evaluators, sweepers |
| `Passport` | Hub registry | Stable identity (`name`, `agent_id`, owner, model) - identifies who's on the network |
| `Resume` | Hub registry | Capability claims (`claimed_capabilities`, `domains`, `summary`) plus hub-mutated `observed` track record |
| `Rule` | Hub registry | Per-agent governance: access lists, rate limits, inbox caps, channel-type allowlists |
| `HubClient` | One per process | Process-side connection to the hub; manages registration and bookkeeping |
| `AgentClient` | One per agent | Wraps a single `Agent`; sends envelopes, receives notify frames, runs handlers |
| `HumanClient` | One per human | Non-LLM participant (HITL) - same envelope plumbing, no `Agent`; your UI drives it via push (`on_envelope`) / pull (`next_envelope`) |
| `HubListener` / `HubArbiter` | Registered on the hub | `HubListener` observes state transitions after the fact (the audit log is one); `HubArbiter` is the gatekeeper consulted *before* register / open / send / dispatch decisions |
| `Envelope` | The wire format | Hub-stamped record of one event in one channel: `event_type`, `event_data`, `sender_id`, `audience`, `causation_id` |
| `Channel` | Created by `agent_client.open(...)` | A bounded multi-party exchange governed by an adapter |
| Channel adapter | One of `ConsultingAdapter` / `ConversationAdapter` / `DiscussionAdapter` / `WorkflowAdapter` | Defines the channel's allowed sends, default view policy, expectations, and termination rules |
| `TransitionGraph` | Workflow only | Declarative orchestration: who speaks first, what conditions fire, when to terminate |
| `TaskMirror` | Auto-attached per turn | Bridges `Task` lifecycle events into the hub so capability tags update `Resume.observed` |

## Lifecycle of a Channel

```
agent_client.open(type=..., target=...)
    │
    ▼
INVITED ──┬─ all targets ack ──-> ACTIVE ──┬─ adapter terminates ──-> CLOSED
          │                                │
          └─ ack timeout ──-> CLOSED       └─ explicit channel.close() ──-> CLOSING ──-> CLOSED
                  (expectation violation)         or TTL expired
```

The four built-in adapters differ in how they govern the `ACTIVE -> CLOSED` transition:

| Adapter | Participants | Turn order | Termination |
|---|---|---|---|
| `conversation` | Exactly 2 | Free-form (either side, any time) | Explicit close or TTL |
| `consulting` | Exactly 2 | Strict 1Q1R (initiator -> respondent) | Auto-closes after respondent's reply |
| `discussion` | 2+ | Round-robin via `ordering="round_robin"` | Explicit close or TTL |
| `workflow` | 2+ | Declarative `TransitionGraph` | Graph terminates (`TerminateTarget` / `max_turns`) |

See the [Channel Adapters](adapters_overview.md) overview for the full picture.
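
The lifecycle diagram above can be read as a small transition table. The following is a minimal sketch under stated assumptions: the state names come from the diagram, but the trigger labels (`all_targets_ack`, `close_complete`, and so on) are illustrative names invented here, not AG2 event types, and the hub's real bookkeeping is richer than this.

```python
from enum import Enum


class ChannelState(Enum):
    INVITED = "invited"
    ACTIVE = "active"
    CLOSING = "closing"
    CLOSED = "closed"


# (current state, trigger) -> next state, transcribed from the diagram.
# Trigger names are hypothetical labels chosen for this sketch.
TRANSITIONS = {
    (ChannelState.INVITED, "all_targets_ack"): ChannelState.ACTIVE,
    (ChannelState.INVITED, "ack_timeout"): ChannelState.CLOSED,
    (ChannelState.ACTIVE, "adapter_terminates"): ChannelState.CLOSED,
    (ChannelState.ACTIVE, "explicit_close"): ChannelState.CLOSING,
    (ChannelState.ACTIVE, "ttl_expired"): ChannelState.CLOSING,
    (ChannelState.CLOSING, "close_complete"): ChannelState.CLOSED,
}


def step(state: ChannelState, trigger: str) -> ChannelState:
    """Return the next state, or raise if the trigger is invalid here."""
    try:
        return TRANSITIONS[(state, trigger)]
    except KeyError:
        raise ValueError(f"{trigger!r} is not valid in state {state.name}") from None
```

Note that `CLOSED` has no outgoing transitions, so any trigger applied to a closed channel is rejected.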

## Reading Order

If you're new to AG2 Beta:

1. Start with the [Quick Start](quick_start.md) page. It's the smallest end-to-end network scenario.
2. Then [Hub & Identity](hub_and_identity.md) for the registry-side primitives.
3. Then [Agent Clients](agent_clients.md) for the agent-side primitives.
4. Finally choose an adapter from the [Channel Adapters](adapters_overview.md) overview based on what you're building.

If you're migrating from the classic `GroupChat` / handoff orchestrations:

1. Skim the overview and quick start.
2. Read [Migrating from Group Chat](migration_from_group_chat.md). It maps every classic concept onto its `WorkflowAdapter` counterpart with side-by-side code.

If you're operating an existing deployment:

- [Governance, Audit & Observability](expectations_and_audit.md) covers expectations, the audit log, `HubListener` / `HubArbiter`, and turn-failure handling.
- [Task Observation](task_observation.md) covers per-capability track records.
- [Views & Skills](views_and_skills.md) covers what each agent sees of its own channel and how its capabilities are advertised.
- [Distributed Deployment](distributed.md) covers `WsLink`, `serve_ws`, auth, at-least-once delivery, reconnect, and task durability.

---

# Quick Start

Source: https://docs.ag2.ai/latest/docs/beta/network/quick_start/

The smallest possible end-to-end network scenario: one in-process hub, two agents, a `consulting` channel that auto-closes after a single Q-and-A.

```python
import asyncio

from autogen.beta import Agent
from autogen.beta.config import AnthropicConfig
from autogen.beta.knowledge import MemoryKnowledgeStore
from autogen.beta.network import (
    EV_CHANNEL_CLOSED,
    EV_TEXT,
    Hub,
    HubClient,
    LocalLink,
    Passport,
    Resume,
)

async def main() -> None:
    config = AnthropicConfig(model="claude-sonnet-4-6")

    # Hub: registry + WAL + audit log + adapters live here.
    hub = await Hub.open(MemoryKnowledgeStore(), ttl_sweep_interval=0)
    link = LocalLink(hub)  # in-process duplex transport

    # Each agent gets its own HubClient (its own duplex pair to the hub).
    alice_hc = HubClient(link, hub=hub)
    bob_hc = HubClient(link, hub=hub)

    alice = await alice_hc.register(
        Agent("alice", prompt="Ask one focused question and stop.", config=config),
        Passport(name="alice"),
        Resume(),
    )
    bob = await bob_hc.register(
        Agent("bob", prompt="Answer in one short sentence.", config=config),
        Passport(name="bob"),
        Resume(),
    )

    # Strict 1Q1R; the adapter auto-closes on bob's reply.
    channel = await alice.open(type="consulting", target="bob")
    await channel.send(
        "What's the single most important property of a distributed system?",
        audience=[bob.agent_id],
    )

    # Bob's default handler runs Agent.ask on the inbound EV_TEXT, sends the
    # reply, and ConsultingAdapter posts EV_CHANNEL_CLOSED.
    close_env = await alice.wait_for_channel_event(
        channel_id=channel.channel_id,
        predicate=lambda e: e.event_type == EV_CHANNEL_CLOSED,
        timeout=60.0,
    )
    print(f"closed: {close_env.event_data.get('reason')!r}")

    # Replay the conversation from the hub's write-ahead log.
    wal = await hub.read_wal(channel.channel_id)
    for env in wal:
        if env.event_type == EV_TEXT:
            speaker = "alice" if env.sender_id == alice.agent_id else "bob"
            print(f"{speaker}: {env.event_data['text']}")

    await alice_hc.close()
    await bob_hc.close()
    await hub.close()

asyncio.run(main())
```

Expected output (Sonnet's exact words will differ on each run):

```text
closed: 'consulting_complete'
alice: What's the single most important property of a distributed system?
bob: Fault tolerance - because a system that can't survive partial failures defeats its entire purpose.
```

## What Just Happened

In order:

1. **`Hub.open(MemoryKnowledgeStore())`** - boots an in-process hub. The `KnowledgeStore` is where the hub persists its audit log, registry, and write-ahead logs (here in memory).
2. **`LocalLink(hub)`** - a transport factory. Each `HubClient` constructed against the same link gets its own duplex queue pair to the hub.
3. **`HubClient(link, hub=hub)`** - one per process boundary. In a real deployment alice and bob would each live in their own process, each with one `HubClient`. Here they share a process for clarity.
4. **`hc.register(agent, passport, resume)`** - registers an `Agent` with the hub. Returns an `AgentClient` whose `agent_id` is hub-stamped.
5. **`alice.open(type="consulting", target="bob")`** - alice creates a consulting channel with bob as respondent. Internally: hub posts `EV_CHANNEL_INVITE` to bob -> bob's default handler auto-acks -> hub posts `EV_CHANNEL_OPENED` and `alice.open(...)` returns with `channel.state == ACTIVE`.
6. **`channel.send(text, audience=...)`** - alice sends an `EV_TEXT` envelope.
7. **Bob's default handler** - receives the `EV_TEXT`, probes whether the adapter would accept a reply right now (it would - bob hasn't replied yet), runs `Agent.ask(text)`, and sends bob's reply back through bob's own channel handle.
8. **`ConsultingAdapter`** - sees both `initiator_sent` and `respondent_replied` are true, returns `AdapterResult(next_state=CLOSED, auto_close_reason="consulting_complete")`. Hub posts `EV_CHANNEL_CLOSED`.
9. **`alice.wait_for_channel_event(...)`** - alice's loop wakes when she receives the close envelope.
10. **`hub.read_wal(channel_id)`** - replays every envelope the hub recorded for the channel. Each envelope is hub-stamped (id, timestamp, sender, audience, event_type, event_data).
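
Steps 6 to 8 hinge on the strict 1Q1R rule. The toy model below sketches that rule in plain Python so the turn logic is easy to see. It is not the `ConsultingAdapter` source: `ToyConsulting` is a hypothetical class, and only the `initiator_sent` / `respondent_replied` flags and the `"consulting_complete"` reason are taken from the walkthrough above.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ToyConsulting:
    """Toy 1Q1R channel: one question from the initiator, one reply, then closed."""

    initiator: str
    respondent: str
    initiator_sent: bool = False
    respondent_replied: bool = False
    closed_reason: Optional[str] = None

    def can_send(self, sender: str) -> bool:
        # Mirrors the "probe" idea: would a send from this participant be accepted now?
        if self.closed_reason is not None:
            return False
        if sender == self.initiator:
            return not self.initiator_sent
        if sender == self.respondent:
            return self.initiator_sent and not self.respondent_replied
        return False

    def on_send(self, sender: str) -> None:
        if not self.can_send(sender):
            raise PermissionError(f"{sender} may not send right now")
        if sender == self.initiator:
            self.initiator_sent = True
        else:
            self.respondent_replied = True
        # Once both flags are set, the exchange is complete and the channel closes.
        if self.initiator_sent and self.respondent_replied:
            self.closed_reason = "consulting_complete"
```

In this sketch bob cannot speak before alice, alice cannot ask twice, and bob's reply closes the channel, which matches the order of events in the Quick Start.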

## Mental Hooks

- The `Hub` is the **only authoritative state**. Every send goes through it; every observer reads from it. Clients are thin.
- A `channel_id` is the unit of conversation. The hub's WAL is keyed by channel id; expectation evaluators evaluate per channel; views project per channel.
- Each `AgentClient` carries a `default_handler` that auto-acks invites and runs `Agent.ask` on inbound text. You can replace it with `agent_client.on_envelope(callback)` when you need custom logic.
- The hub assigns the `agent_id` at registration. Use it (`alice.agent_id`) for routing rather than the human-readable name. The name may not be unique under a multi-tenant deployment.

## Where to Next

- [Hub & Identity](hub_and_identity.md) - the registry side: `Hub.open`, `Passport`, `Resume`, `Rule`, auth.
- [Agent Clients](agent_clients.md) - the agent side: `HubClient.register`, default handler, custom handlers.
- [Channel Adapters](adapters_overview.md) - pick the right one: free-form, 1Q1R, round-robin, or graph-driven.

---

# Hub and Identity

Source: https://docs.ag2.ai/latest/docs/beta/network/hub_and_identity/

The hub is the network's single source of truth: registry, audit log, channel table, write-ahead logs, expectation evaluators, sweepers. Every send, observation, and rule check goes through it.

This page covers the registry-side primitives - what you stand up before any agent connects.

## Hub.open

```python
from autogen.beta.knowledge import MemoryKnowledgeStore
from autogen.beta.network import Hub

hub = await Hub.open(
    MemoryKnowledgeStore(),
    ttl_sweep_interval=30.0,          # default: 30s
    expectation_sweep_interval=10.0,  # default: 10s
    invite_ack_timeout=30.0,          # default: 30s
)
```

| Parameter | Type | Description |
|---|---|---|
| `store` | `KnowledgeStore` | Persistent backing for the audit log, registry, and WAL. `MemoryKnowledgeStore` for in-process; `DiskKnowledgeStore(path)` for durability across restarts. |
| `auth` | `AuthRegistry` \| `None` | Authentication adapters. Defaults to `AuthRegistry.default()` (which has `NoAuth` only). |
| `clock` | `Callable[[], str]` \| `None` | ISO-8601 clock. Tests override with a `MockClock`; production uses the default UTC clock. |
| `ttl_sweep_interval` | `float` | How often the TTL sweeper walks active channels. Set to `0` to disable. |
| `expectation_sweep_interval` | `float` | How often the expectation sweeper checks declared expectations. Set to `0` to disable. |
| `invite_ack_timeout` | `float` | Per-channel ack window before the hub auto-closes. |

`Hub.open(...)` is the factory; the constructor `Hub(...)` exists but doesn't start sweepers. Use `open()` unless you have a reason not to.

!!! note
    All side effects (sweeper tasks, store initialisation) happen inside `Hub.open()`, never in the constructor - this matches the AGENTS.md "no side effects in `__init__`" rule. The same pattern applies throughout the package.

## Identity Model

Three small dataclasses describe an agent on the network:

```python
from autogen.beta.network import Passport, Resume, ResumeExample

passport = Passport(
    name="alice",
    owner="acme",
    model="claude-sonnet-4-6",
)

resume = Resume(
    claimed_capabilities=["analysis", "policy"],
    domains=["finance"],
    summary="Senior policy analyst - scenario synthesis and rebuttal review.",
    examples=[ResumeExample(title="Q3 risk brief", note="...")],
)
```

### Passport - stable identity

| Field | Required | Notes |
|---|---|---|
| `name` | ✓ | Unique within the hub. The human-readable handle. |
| `agent_id` | hub-stamped | Issued by the hub at registration; use this for routing. |
| `owner` | optional | Tenant id for multi-tenant deployments. |
| `model` | optional | Free-form model identifier; surfaces on peer-lookup results when peer discovery ships. |
| `kind` | optional | One of `"agent"`, `"human"`, `"remote_agent"`, or `None`. `None` is the back-compat alias for `"agent"`. `"human"` is set automatically by `hc.register_human(...)`; `"remote_agent"` is reserved for a future federation / A2A bridge. `__post_init__` rejects any other value. Exported as `PassportKind` / `PASSPORT_KINDS`. |
| `created_at` | hub-stamped | ISO-Z timestamp. |

`hub.list_agents(kind=...)` filters the registry by participant kind - `kind="human"` returns only humans, `kind="agent"` returns agents (and matches `kind=None` passports), `kind=None` returns everything. Non-LLM participants are registered via `hc.register_human(...)` - see [HumanClient (HITL)](human_client.md).
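
The matching rule for `kind` is easy to get wrong because `None` means two different things: on a passport it is an alias for `"agent"`, while as a filter it means "no filter". A minimal sketch of that rule follows; `matches_kind` is a hypothetical helper written for illustration, not a function exported by the library.

```python
from typing import Optional


def matches_kind(passport_kind: Optional[str], wanted: Optional[str]) -> bool:
    """Toy version of the kind filter described above."""
    if wanted is None:
        return True  # kind=None as a filter returns everything
    # On a passport, None is the back-compat alias for "agent".
    effective = "agent" if passport_kind is None else passport_kind
    return effective == wanted
```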

### Resume - capability claims and observed track record

| Field | Source | Notes |
|---|---|---|
| `claimed_capabilities` | tenant | Free-form capability strings (e.g. `"research"`, `"summarisation"`). |
| `domains` | tenant | Coarse subject-matter areas. |
| `summary` | tenant | One-line description, indexed for peer lookup. |
| `examples` | tenant | Optional `ResumeExample` records - exemplar past work. |
| `observed` | **hub-mutated** | Per-capability `ObservedStat`: counts of completed/failed/expired tasks plus a `p50_latency_ms`. Updated automatically when an agent runs a capability-tagged `agent.task(...)` inside a network turn - see [Task Observation](task_observation.md). |
| `last_updated` | hub-stamped | ISO-Z, refreshed on every mutation. |

`Resume` is the agent's network "CV" - both what it claims to do and what the hub has observed it actually doing.
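
To illustrate what a per-capability track record might hold, here is a small sketch of an accumulator with outcome counts and a median latency. The names are hypothetical (`ToyObservedStat` is not the library's `ObservedStat`), and it assumes every terminal task contributes one latency sample; the real hub may compute `p50_latency_ms` differently.

```python
from dataclasses import dataclass, field
from statistics import median
from typing import Optional


@dataclass
class ToyObservedStat:
    completed: int = 0
    failed: int = 0
    expired: int = 0
    latencies_ms: list[float] = field(default_factory=list)

    def record(self, outcome: str, latency_ms: float) -> None:
        """Count one terminal task and keep its latency sample."""
        if outcome not in ("completed", "failed", "expired"):
            raise ValueError(f"unknown outcome: {outcome!r}")
        setattr(self, outcome, getattr(self, outcome) + 1)
        self.latencies_ms.append(latency_ms)

    @property
    def p50_latency_ms(self) -> Optional[float]:
        # The 50th percentile is the median; None until a sample exists.
        return median(self.latencies_ms) if self.latencies_ms else None


# One stat object per capability tag, keyed by the capability string.
observed: dict[str, ToyObservedStat] = {}
observed.setdefault("analysis", ToyObservedStat()).record("completed", 100.0)
observed["analysis"].record("completed", 300.0)
observed["analysis"].record("failed", 200.0)
```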

## Rule - per-agent governance

Optional per-agent policy. Pass to `hc.register(agent, passport, resume, rule=...)`.

```python
from autogen.beta.network import Rule, AccessBlock, LimitsBlock, RateBlock, InboxBlock

rule = Rule(
    access=AccessBlock(
        outbound_to=["bob", "carol"],   # whitelist who this agent can address
        channel_types_allowed=["consulting", "discussion"],
    ),
    limits=LimitsBlock(
        channel_ttl_default="4h",
        delegation_depth=2,
    ),
    rate=RateBlock(
        envelopes_per_minute=60,
    ),
    inbox=InboxBlock(
        max_pending=100,
    ),
)
```

| Block | Controls |
|---|---|
| `AccessBlock` | Who this agent can talk to; which channel types it can create or join. |
| `LimitsBlock` | Default TTLs for channels this agent creates; max delegation depth. |
| `RateBlock` | Rate limits on outbound envelopes. |
| `InboxBlock` | Inbound queue cap - protects an agent from being flooded. |

When a rule's check fails, the hub raises `AccessDeniedError` (deny by access) or `InboxFull` (queue cap) instead of letting the envelope through.
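
The two failure modes can be pictured as a pre-send check. The sketch below is a toy model only: the exception classes are local stand-ins that reuse the library's names, `ToyRule` and `check_send` are hypothetical, and treating an unset field as "no restriction" is an assumption made for the example.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


class AccessDeniedError(Exception):
    """Local stand-in for the error raised on an access denial."""


class InboxFull(Exception):
    """Local stand-in for the error raised when the inbox cap is hit."""


@dataclass
class ToyRule:
    outbound_to: Optional[Sequence[str]] = None  # None: unrestricted (assumption)
    max_pending: Optional[int] = None            # None: no inbox cap (assumption)


def check_send(
    sender_rule: ToyRule,
    recipient_name: str,
    recipient_rule: ToyRule,
    recipient_pending: int,
) -> None:
    """Raise instead of letting a disallowed envelope through."""
    allowed = sender_rule.outbound_to
    if allowed is not None and recipient_name not in allowed:
        raise AccessDeniedError(f"sender may not address {recipient_name!r}")
    cap = recipient_rule.max_pending
    if cap is not None and recipient_pending >= cap:
        raise InboxFull(f"{recipient_name!r} already has {recipient_pending} pending")
```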

## Authentication

```python
from autogen.beta.network import AuthAdapter, AuthRegistry, Hub, NoAuth

# default: NoAuth registered for the empty scheme
hub = await Hub.open(store)

# explicit: register your own adapter
# (MyHMACAdapter is a user-defined class implementing AuthAdapter)
auth = AuthRegistry()
auth.register("hmac", MyHMACAdapter())
hub = await Hub.open(store, auth=auth)
```

`AuthAdapter` is a `Protocol`:

```python
class AuthAdapter(Protocol):
    async def verify(self, passport: Passport, credentials: AuthBlock) -> None: ...
```

The hub calls `verify(...)` at registration; raise `AuthError` from it to reject the agent. The default registry registers `NoAuth` for the empty scheme, so the simple in-process flow works with no setup.
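
As an illustration of the protocol's shape, here is a sketch of an HMAC-based adapter. Everything except the `verify(passport, credentials)` signature is an assumption: `Passport`, `AuthBlock`, and `AuthError` below are local stand-ins so the example runs without the library, and the `scheme` / `signature` fields on `AuthBlock` are hypothetical. Check the real `AuthBlock` before adapting this.

```python
import asyncio
import hashlib
import hmac
from dataclasses import dataclass


class AuthError(Exception):
    """Local stand-in for the library's AuthError."""


@dataclass
class Passport:  # stand-in carrying only the field this sketch needs
    name: str


@dataclass
class AuthBlock:  # hypothetical shape; the real AuthBlock fields may differ
    scheme: str
    signature: str


class MyHMACAdapter:
    """Accepts a registration only if the signature matches HMAC-SHA256(secret, name)."""

    def __init__(self, secret: bytes) -> None:
        self._secret = secret

    def sign(self, name: str) -> str:
        return hmac.new(self._secret, name.encode(), hashlib.sha256).hexdigest()

    async def verify(self, passport: Passport, credentials: AuthBlock) -> None:
        expected = self.sign(passport.name)
        # compare_digest avoids leaking match position through timing.
        if not hmac.compare_digest(expected.encode(), credentials.signature.encode()):
            raise AuthError(f"bad signature for {passport.name!r}")
```

Because `AuthAdapter` is a `Protocol`, a class like this needs no base class; matching the `verify` signature is enough.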

## Audit Log

Every governance-relevant event the hub processes lands in `hub.audit_log` (an `AuditLog`, which is itself a registered `HubListener`):

```python
records = await hub.audit_log.read_all()
for r in records:
    print(r["kind"], r["at"], r)
```

Audit kinds (re-exported from `autogen.beta.network`):

| Kind | When |
|---|---|
| `AUDIT_KIND_AGENT_REGISTERED` | `hc.register(...)` |
| `AUDIT_KIND_AGENT_UNREGISTERED` | `hc.unregister(...)` |
| `AUDIT_KIND_RESUME_SET` | `hub.set_resume(...)` |
| `AUDIT_KIND_SKILL_SET` | `hub.set_skill(...)` |
| `AUDIT_KIND_RULE_SET` | `hub.set_rule(...)` |
| `AUDIT_KIND_CHANNEL_CREATED` | `alice.open(...)` |
| `AUDIT_KIND_CHANNEL_CLOSED` | adapter close, explicit close, or `auto_close` violation |
| `AUDIT_KIND_CHANNEL_EXPIRED` | TTL sweep |
| `AUDIT_KIND_TASK_TERMINATED` | A task observed via `TaskMirror` reached a terminal state |
| `AUDIT_KIND_EXPECTATION_VIOLATED` | An expectation evaluator's threshold elapsed |
| `AUDIT_KIND_TURN_FAILED` | A notify handler crashed while processing an inbound envelope (channel stays alive - see [Turn-failure resilience](expectations_and_audit.md)) |

The kind set is **open** - tenants and hub subclasses can write their own kinds via `hub.audit_log.append(record)`. The audit log is the single trail for compliance and debugging - see [Governance, Audit & Observability](expectations_and_audit.md).
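
Since audit records are dictionaries with at least `kind` and `at` keys (as in the loop above), ordinary Python is enough to summarise them. The helpers below are a sketch that works on any list of such dictionaries; the sample kind strings are made up for the example, so in real code compare against the `AUDIT_KIND_*` constants instead.

```python
from collections import Counter
from typing import Any, Iterable, Mapping


def count_by_kind(records: Iterable[Mapping[str, Any]]) -> Counter:
    """Tally audit records by their "kind" field."""
    return Counter(record["kind"] for record in records)


def records_of_kind(records: Iterable[Mapping[str, Any]], kind: str) -> list:
    """Return the records of one kind, oldest first.

    Assumes "at" holds UTC ISO-8601 strings in one consistent format,
    which sort chronologically when compared as text.
    """
    return sorted((r for r in records if r["kind"] == kind), key=lambda r: r["at"])


# Hypothetical sample data standing in for `await hub.audit_log.read_all()`.
sample = [
    {"kind": "turn_failed", "at": "2025-01-01T00:00:05Z"},
    {"kind": "agent_registered", "at": "2025-01-01T00:00:01Z"},
    {"kind": "turn_failed", "at": "2025-01-01T00:00:03Z"},
]
```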

## Reading Hub State

| Call | Returns |
|---|---|
| `await hub.get_channel(channel_id)` | `ChannelMetadata` snapshot. |
| `await hub.get_resume(agent_id)` | Current `Resume`. |
| `await hub.get_passport(agent_id)` | Current `Passport`. |
| `await hub.read_wal(channel_id)` | Ordered list of `Envelope`s in that channel. |
| `hub.audit_log` | The `AuditLog` instance - `await hub.audit_log.read_all()` for every record, `hub.audit_log.subscribe(cb)` to live-tail. |
| `hub.health()` | Cheap in-memory operational snapshot (`active_channels`, `registered_agents`, `pending_inbox_total`, `max_pending_inbox_depth`, `registered_listeners`, `adapters_loaded`, `audit_log_bytes`). |

## Hub TTL & Sweepers

Two background tasks run as long as the hub is open:

- **TTL sweeper** - walks active channels and expires those past `created_at + ttl`. Default TTL comes from `Rule.limits.channel_ttl_default` or the adapter's manifest.
- **Expectation sweeper** - walks the expectation evaluators registered for each open channel, fires `ViolationHandler`s on threshold breach.

Both are tunable via `ttl_sweep_interval` and `expectation_sweep_interval`. Set to `0` to disable for tests; in those cases call `await hub._ttl_tick()` or `await hub._expectation_tick()` manually for deterministic timing.
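
The TTL rule itself (`created_at + ttl`) is simple enough to sketch with the standard library. This is an illustration under assumptions, not the sweeper's code: only the `"4h"` form appears in this documentation, so the other unit suffixes accepted by `parse_ttl` are guesses, and both helper names are hypothetical.

```python
from datetime import datetime, timedelta

# Assumed unit suffixes; only "h" is confirmed by the "4h" example above.
_UNITS = {"s": "seconds", "m": "minutes", "h": "hours", "d": "days"}


def parse_ttl(ttl: str) -> timedelta:
    """Parse a compact duration such as "4h" into a timedelta."""
    value, unit = ttl[:-1], ttl[-1:]
    if unit not in _UNITS or not value.isdigit():
        raise ValueError(f"unsupported TTL: {ttl!r}")
    return timedelta(**{_UNITS[unit]: int(value)})


def _parse_iso_z(stamp: str) -> datetime:
    # fromisoformat on older Pythons does not accept a trailing "Z".
    return datetime.fromisoformat(stamp.replace("Z", "+00:00"))


def is_expired(created_at: str, ttl: str, now: str) -> bool:
    """True once `now` is past `created_at + ttl`."""
    return _parse_iso_z(now) > _parse_iso_z(created_at) + parse_ttl(ttl)
```

Passing `now` explicitly, rather than reading the system clock, is what makes a check like this deterministic in tests, which is the same motivation as the injectable `clock` on `Hub.open`.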

### Custom sweepers

Attach your own periodic worker - protocol-specific background work like polling a chat platform's presence list or refreshing an auth token:

```python
async def heartbeat() -> None:
    ...  # runs every 30s

hub.register_sweeper("heartbeat", 30.0, heartbeat)
# later, e.g. on a config reload:
await hub.unregister_sweeper("heartbeat")
```

- `register_sweeper(name, interval_seconds, fn)` is **synchronous** (it only updates bookkeeping). If the hub has already started (`Hub.open(...)` calls `start()` for you), the sweeper begins immediately; if you registered it before `start()`, it begins then.
- `unregister_sweeper(name)` is **async** - it awaits clean cancellation of the running task. Unknown names are a no-op.
- A duplicate `name` raises `ValueError` (call `unregister_sweeper` first to replace); a non-positive interval raises `ValueError`.
- `hub.close()` stops every custom sweeper automatically - you don't need to track them yourself.
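
The contract in the bullets above (synchronous register, asynchronous unregister, `ValueError` on duplicates and non-positive intervals) can be modelled with plain `asyncio`. The class below is a toy sketch written to show why the two methods differ in kind; `ToySweepers` is hypothetical and is not how the hub is implemented.

```python
import asyncio
from collections.abc import Awaitable, Callable

SweepFn = Callable[[], Awaitable[None]]


class ToySweepers:
    def __init__(self) -> None:
        self._specs: dict[str, tuple[float, SweepFn]] = {}
        self._tasks: dict[str, asyncio.Task] = {}
        self._started = False

    def register_sweeper(self, name: str, interval_seconds: float, fn: SweepFn) -> None:
        # Synchronous: it only updates bookkeeping (and spawns a task if already started).
        if name in self._specs:
            raise ValueError(f"sweeper {name!r} already registered")
        if interval_seconds <= 0:
            raise ValueError("interval_seconds must be positive")
        self._specs[name] = (interval_seconds, fn)
        if self._started:
            self._spawn(name)

    def start(self) -> None:
        # Must be called from a running event loop.
        self._started = True
        for name in self._specs:
            self._spawn(name)

    def _spawn(self, name: str) -> None:
        interval, fn = self._specs[name]
        self._tasks[name] = asyncio.create_task(self._loop(interval, fn))

    @staticmethod
    async def _loop(interval: float, fn: SweepFn) -> None:
        while True:
            await asyncio.sleep(interval)
            await fn()

    async def unregister_sweeper(self, name: str) -> None:
        # Async: it waits for the cancelled task to finish. Unknown names are a no-op.
        self._specs.pop(name, None)
        task = self._tasks.pop(name, None)
        if task is None:
            return
        task.cancel()
        try:
            await task
        except asyncio.CancelledError:
            pass

    async def close(self) -> None:
        for name in list(self._specs):
            await self.unregister_sweeper(name)
```

Registration can stay synchronous because creating a task never blocks, whereas unregistering has to await the task so the caller knows the worker has really stopped.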

## Closing Down

```python
await hub.close()
```

This cancels all sweeper tasks, closes the underlying store, and drains pending I/O. Always pair `Hub.open(...)` with `hub.close()` (typically in a `try/finally`).

---

# Agent Clients and Handlers

Source: https://docs.ag2.ai/latest/docs/beta/network/agent_clients/

This page covers the agent side of the network. A `HubClient` represents one process's connection to the hub; it produces `AgentClient`s - one per registered `Agent`. Each `AgentClient` carries a notify handler that decides what the agent does when an envelope arrives.

!!! tip
    For a **non-LLM participant** - e.g. a human in the loop - register a `HumanClient` instead via `hc.register_human(...)`; see [HumanClient (HITL)](human_client.md).

## HubClient - one per process

```python
from autogen.beta.network import HubClient, LocalLink

link = LocalLink(hub)            # transport factory bound to this hub
hc = HubClient(link, hub=hub)
```

A `HubClient` is the per-process registration broker. In a single-process script you may construct one per agent (each gets its own duplex link); in a real deployment you'd typically have one `HubClient` per process and register all of that process's agents on it.

| Method | Notes |
|---|---|
| `await hc.register(agent, passport, resume, ...)` | Registers an `Agent` with the hub. Returns an `AgentClient`. |
| `await hc.unregister(agent_id)` | Tears down the registration; emits `AUDIT_KIND_AGENT_UNREGISTERED`. |
| `await hc.close()` | Closes this process's link to the hub. |
| `hc.read_wal(channel_id)` | Read the WAL for any channel this process can see. |
| `hc.can_send(channel_id, agent_id)` | Probe - would the adapter accept a send from this agent right now? |
| `hc.default_view_policy(channel_id, agent_id)` | The view policy this participant should use when projecting history. |

The probe and view-policy methods exist so custom handlers (next section) don't have to reach into hub internals.

## Registering an Agent

```python
agent_client = await hc.register(
    agent,                     # autogen.beta.Agent
    passport,                  # autogen.beta.network.Passport
    resume,                    # autogen.beta.network.Resume
    skill_md=None,             # optional - markdown describing this agent's skill
    rule=None,                 # optional - Rule(...) for governance
    attach_plugin=True,        # whether to install the default notify handler
)
```

`attach_plugin=True` (the default) installs the `default_handler` on this client. Pass `False` if you want full control over inbound envelope handling - typical for headless workers, gateways, or custom orchestration logic.

## AgentClient

| Attribute / Method | Notes |
|---|---|
| `agent_client.agent_id` | Hub-stamped id; use for routing. |
| `agent_client.agent` | The wrapped `Agent`. |
| `agent_client.passport` / `.resume` | Snapshot at registration time. |
| `await agent_client.open(type=..., target=..., knobs=...)` | Open a new channel. |
| `await agent_client.send_envelope(envelope)` | Direct envelope send (for custom event types). |
| `await agent_client.wait_for_channel_event(channel_id, predicate, timeout=...)` | Block until a matching envelope lands in this client's inbox. |
| `agent_client.on_envelope(callback)` | Replace the handler. |

`channel = await agent_client.open(...)` returns a `Channel` handle scoped to one channel id. Use `channel.send(text, audience=...)` for ordinary text sends, `channel.close()` for explicit termination, and `channel.info()` for the latest metadata.

## The Default Handler

When you register with `attach_plugin=True`, the client installs `default_handler`. It routes inbound envelopes:

| Inbound event | Behaviour |
|---|---|
| `EV_CHANNEL_INVITE` | Auto-ack with `EV_CHANNEL_INVITE_ACK`. |
| `EV_TEXT` / `EV_HANDOFF` | If `hc.can_send(...)` says it's our turn: read the WAL up to this envelope, project it through this participant's view policy, pre-populate a fresh `MemoryStream`, attach a `TaskMirror`, run `agent.ask(text, stream=stream, dependencies=...)`, and send any non-empty reply via `channel.send(...)`. |
| `EV_CHANNEL_*` (other) | No-op - bookkeeping is reflected in the next `channel.info()`. |
| `ag2.task.*` | No-op at the handler level - `TaskMirror` handles these separately when attached. |

The handler wraps the **entire** turn path (WAL slice, view projection, `extract_turn_input`, `agent.ask`, round-envelope build, outbound send) in a single trap: a crash is routed through `HubClient.report_turn_failure` -> `Hub.report_turn_failure`, which fans `on_turn_failed` out to every `HubListener` (the built-in `AuditLog` records `AUDIT_KIND_TURN_FAILED`). No reply is posted, but the channel stays active and the next envelope flows normally - see [Turn-failure resilience](expectations_and_audit.md).

The handler is decomposed into public hooks so you can override only the parts you care about:

```python
from autogen.beta.network import (
    read_wal_until,
    resolve_view_policy,
    stamp_dependencies,
)
```

| Hook | Purpose |
|---|---|
| `read_wal_until(client, envelope)` | Slice the WAL up to but excluding the given envelope. |
| `resolve_view_policy(client, metadata)` | The `ViewPolicy` this participant should use. |
| `stamp_dependencies(client, channel)` | The `context.dependencies` dict the LLM turn will see (`CHANNEL_DEP`, `AGENT_CLIENT_DEP`, `HUB_DEP`). |

## Custom Handlers

```python
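import asyncio

from autogen.beta.network import Envelope, EV_TEXT

# Stand-in for your own external system (hypothetical; any async sink works).
my_external_queue: asyncio.Queue = asyncio.Queue()

async def gateway_handler(envelope: Envelope) -> None:
    # Ignore everything except plain text envelopes.
    if envelope.event_type != EV_TEXT:
        return
    # Forward to your own external system instead of running an LLM.
    text = envelope.event_data.get("text", "")
    await my_external_queue.put({"from": envelope.sender_id, "text": text})

# Replace the client's handler with the gateway.
agent_client.on_envelope(gateway_handler)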
```

Common patterns:

- **Headless worker** - register with `attach_plugin=False` and install a handler that pulls work directly off the hub without running an LLM.
- **Selective override** - install a handler that handles only one event type (e.g. custom invite policy) and falls back to `default_handler` for everything else.
- **Filtered forwarding** - wrap `default_handler` with pre/post hooks for logging, rate limiting, or routing.
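The selective-override pattern above can be sketched without the framework. This is a minimal illustration under stated assumptions: `FakeEnvelope`, `make_selective_handler`, and the event-type strings are hypothetical stand-ins, not part of `autogen.beta.network`. In real code the fallback would be the library's `default_handler`, whose exact call signature is not shown here.

```python
import asyncio
from dataclasses import dataclass, field
from typing import Awaitable, Callable


@dataclass
class FakeEnvelope:
    """Hypothetical stand-in for the network Envelope (illustration only)."""
    event_type: str
    event_data: dict = field(default_factory=dict)


Handler = Callable[[FakeEnvelope], Awaitable[None]]


def make_selective_handler(overrides: dict[str, Handler], fallback: Handler) -> Handler:
    """Route listed event types to custom handlers; everything else falls back."""
    async def handler(envelope: FakeEnvelope) -> None:
        target = overrides.get(envelope.event_type, fallback)
        await target(envelope)
    return handler


async def demo() -> list[str]:
    seen: list[str] = []

    async def custom_invite(envelope: FakeEnvelope) -> None:
        seen.append(f"custom:{envelope.event_type}")

    async def fallback(envelope: FakeEnvelope) -> None:
        seen.append(f"default:{envelope.event_type}")

    handler = make_selective_handler({"invite": custom_invite}, fallback)
    await handler(FakeEnvelope("invite"))  # handled by the override
    await handler(FakeEnvelope("text"))    # handled by the fallback
    return seen
```

The same wrapper shape also covers filtered forwarding: put logging or rate limiting around the `await target(envelope)` call.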

## Transport - LocalLink

```python
from autogen.beta.network import LocalLink, LinkClient, LinkEndpoint

link = LocalLink(hub)
client_link = link.client()  # produces a fresh LocalLinkClient bound to a fresh LocalLinkEndpoint
```

`LocalLink` is the in-process transport. Each `HubClient(link, hub=hub)` lazily creates one `LocalLinkClient`/`LocalLinkEndpoint` pair on first use:

- `LocalLinkClient` - the agent-process side; sends frames toward the hub, receives notify frames from it.
- `LocalLinkEndpoint` - the hub-process side; the inverse.

Both sides exchange `Frame` records via async queues. Frame types are re-exported at the package level: `HelloFrame`, `WelcomeFrame`, `SendFrame`, `ReceiptFrame`, `NotifyFrame`, `AcceptFrame`, `ErrorFrame`, `PingFrame`, `PongFrame`, `SubscribeFrame`, `UnsubscribeFrame`.

The transport layer is a `Protocol`:

```python
class LinkClient(Protocol):
    async def open(self) -> None: ...
    async def send_frame(self, frame: Frame) -> None: ...
    def frames(self) -> AsyncIterator[Frame]: ...
    async def close(self) -> None: ...
```

Cross-process or cross-host transports plug in here. `LocalLink` is the only built-in, but the abstraction allows for future Redis/WebSocket/gRPC implementations without changing any client code.

## Inbox & Backpressure

Every `AgentClient` has an inbox bounded by its `Rule.inbox.max_pending` (unbounded by default). When the inbox fills, sends to that agent fail with `InboxFull`. Both `wait_for_channel_event` and the default handler drain the inbox in order; custom handlers should do the same - never block forever in a callback.

`Rule.inbox.high_water` is a soft threshold below the hard cap: when a dispatch first pushes a recipient over it, the hub fires `on_inbox_pressure(agent_id, pending, cap)` on every listener (and on a `Hub` subclass override) - once per crossing, not on every subsequent envelope. It defaults to `None`, which resolves to 80% of `max_pending`; set `0` to disable the signal. See [HubListener](expectations_and_audit.md#hublistener-observing-state-transitions).
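The interplay between the hard cap and the soft threshold can be illustrated with a small stand-alone model. This is a sketch, not the hub's implementation: `SketchInbox` and its `on_pressure` callback are hypothetical, and it assumes "over" means strictly more pending items than `high_water`, that the 80% default rounds down, and that the signal re-arms once the inbox drains back to the threshold.

```python
from collections import deque
from typing import Callable, Optional


class InboxFull(Exception):
    """Stand-in for the error raised when the hard cap is reached."""


class SketchInbox:
    """Hypothetical bounded inbox with a once-per-crossing pressure signal."""

    def __init__(
        self,
        max_pending: Optional[int] = None,
        high_water: Optional[int] = None,
        on_pressure: Optional[Callable[[int, Optional[int]], None]] = None,
    ) -> None:
        self.max_pending = max_pending
        if high_water is None:
            # None resolves to 80% of the hard cap; with no cap there is no signal.
            high_water = 0 if max_pending is None else max_pending * 4 // 5
        self.high_water = high_water  # 0 disables the signal
        self.on_pressure = on_pressure
        self._items: deque = deque()
        self._over = False

    def _is_over(self) -> bool:
        return self.high_water > 0 and len(self._items) > self.high_water

    def push(self, item: object) -> None:
        if self.max_pending is not None and len(self._items) >= self.max_pending:
            raise InboxFull(f"inbox already holds {len(self._items)} envelopes")
        self._items.append(item)
        over = self._is_over()
        # Fire only on the transition from "at or under" to "over".
        if over and not self._over and self.on_pressure is not None:
            self.on_pressure(len(self._items), self.max_pending)
        self._over = over

    def pop(self) -> object:
        item = self._items.popleft()
        self._over = self._is_over()  # re-arm once we drop back under
        return item
```

With `max_pending=5` the threshold resolves to 4, so the callback fires when the fifth envelope lands and stays silent until the inbox drains and crosses the threshold again.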

## Closing Down

```python
await alice_hc.close()
await bob_hc.close()
await hub.close()
```

`HubClient.close()` cancels the link's listening task and unsubscribes all clients. Pair every `register` call with a `close()` on shutdown; otherwise the hub keeps the registration in its registry.

---

# Human Clients (HITL)

Source: https://docs.ag2.ai/latest/docs/beta/network/human_client/

A `HumanClient` is a **non-LLM participant** on the network - the human-in-the-loop primitive. It is a client on the network, just like an LLM agent, so the hub routes envelopes to it in exactly the same way.

Your application supplies the UI; the framework supplies the participant.

Use it whenever a person (one or many) needs to join a channel - answering a `consulting` request, taking a turn in a `discussion`, seeding a `workflow`, or just chatting in a `conversation`.

## Registering

```python
from autogen.beta.network import HubClient, LocalLink, Passport

hc = HubClient(LocalLink(hub), hub=hub)

human = await hc.register_human(
    Passport(name="operator"),
    resume=None,            # optional - Resume() defaults
    rule=None,              # optional - Rule(...) for governance, same as agents
    auto_ack_invites=True,  # auto-accept channel invites (see below)
)
```

`register_human` runs the same UUID-stamping and persistence path as `hc.register(...)`, then forces `passport.kind = "human"` so the participant is discoverable as a human:

```python
await hub.list_agents(kind="human")   # -> [Passport(name="operator", kind="human", ...)]
await hub.list_agents(kind="agent")   # agents only (also matches kind=None)
```

`hc.register(...)` rejects `Passport(kind="human")`; use `register_human` for `HumanClient`s.

## Receiving - push or pull

A `HumanClient` exposes inbound envelopes two ways. Both see every inbound envelope; use whichever fits your UI (or both at once).

### Push - `on_envelope`

Register a coroutine; it fires once per inbound envelope. Multiple callbacks compose in registration order. An exception raised by a callback is logged and **never** propagates to the hub's dispatch path - a buggy UI cannot break the network.

```python
async def on_inbound(envelope) -> None:
    await ui.push_event(envelope)   # forward to a websocket, queue, etc.

human.on_envelope(on_inbound)
human.remove_envelope_callback(on_inbound)   # detach later
```
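The composition and isolation rules for push callbacks can be modelled in a few lines. This is a minimal sketch of the behaviour described above, not the `HumanClient` source: `CallbackFanout` and its `dispatch` method are hypothetical names.

```python
import asyncio
import logging

logger = logging.getLogger("sketch.human_client")


class CallbackFanout:
    """Hypothetical model of on_envelope / remove_envelope_callback."""

    def __init__(self) -> None:
        self._callbacks: list = []

    def on_envelope(self, callback) -> None:
        self._callbacks.append(callback)  # registration order is call order

    def remove_envelope_callback(self, callback) -> None:
        self._callbacks.remove(callback)

    async def dispatch(self, envelope) -> None:
        # Iterate over a copy so a callback may detach itself safely.
        for callback in list(self._callbacks):
            try:
                await callback(envelope)
            except Exception:
                # Log and continue: one failing callback must not block the rest.
                logger.exception("envelope callback failed")
```

Because each callback runs inside its own `try`, a failure in one UI hook neither skips the callbacks registered after it nor reaches the caller of `dispatch`.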

### Pull - `next_envelope` / `envelopes`

Block until the next matching envelope arrives, or iterate the inbound stream:

```python
from autogen.beta.network import EV_TEXT

# Wait for the next text reply from a specific peer.
reply = await human.next_envelope(
    predicate=lambda e: e.event_type == EV_TEXT and e.sender_id == peer_id,
    timeout=60.0,   # raises asyncio.TimeoutError if exceeded
)

# Or stream everything until disconnect.
async for envelope in human.envelopes():
    ...

# Channel-scoped wait (symmetric with AgentClient.wait_for_channel_event):
env = await human.wait_for_channel_event(
    channel_id=channel.channel_id,
    predicate=lambda e: e.event_type == EV_TEXT,
    timeout=300.0,
)
```

Envelopes that don't match a `next_envelope` predicate are discarded - use `on_envelope` if you want to observe everything *and* await something specific.
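The discard behaviour is easy to miss, so here is a stand-alone sketch of a predicate-filtered pull over an `asyncio.Queue`. The helper name `next_matching` is hypothetical and the queue holds plain strings instead of envelopes; it only mirrors the semantics described above.

```python
import asyncio
from typing import Any, Callable


async def next_matching(
    queue: asyncio.Queue,
    predicate: Callable[[Any], bool],
    timeout: float,
) -> Any:
    """Return the next item that satisfies predicate; drop the ones that do not."""

    async def _pull() -> Any:
        while True:
            item = await queue.get()
            if predicate(item):
                return item
            # Non-matching items are consumed and not put back.

    # Raises asyncio.TimeoutError if nothing matches in time.
    return await asyncio.wait_for(_pull(), timeout)
```

Anything that arrives before the match is gone once the call returns, which is why a push callback is the safer choice when every envelope must be observed.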

## Sending

Outbound mirrors `AgentClient`. `human.open(...)` returns the same `Channel` handle, so multi-turn channel code is identical whether the initiator is a human or an agent.

```python
from autogen.beta.network import CONVERSATION_TYPE

channel = await human.open(type=CONVERSATION_TYPE, target=expert.agent_id)
await channel.send("Hi - what's a good first ML concept to learn?")
await channel.close(reason="done")

# Convenience for an existing channel id:
await human.send(channel_id, "another message")

# Escape hatch for adapter-shaped envelopes (e.g. a workflow EV_PACKET seed):
await human.post_envelope(envelope)
```

| Call | Notes |
|---|---|
| `await human.open(type=, target=, ttl=, knobs=, intent=, labels=)` | Open a channel as the initiator -> `Channel`. `target` accepts peer names or agent ids. |
| `await human.send(channel_id, text, audience=, causation_id=)` | Post an `EV_TEXT` envelope. |
| `await human.post_envelope(envelope)` | Post an arbitrary envelope (stamps `sender_id` if blank). |
| `await human.close_channel(channel_id, reason=)` | Close a channel this human is in. |
| `await human.disconnect()` | Stop accepting deliveries; wakes any blocked `next_envelope` / `envelopes` consumers. Idempotent - call it in your shutdown path. |

## Channel invites - `auto_ack_invites`

When an agent opens a channel to a human, the hub waits for the human's `EV_CHANNEL_INVITE_ACK` before the channel reaches `ACTIVE`. With `auto_ack_invites=True` (the default) the `HumanClient` acks automatically the moment the invite arrives - the channel handshake completes with no UI round-trip, exactly like the default agent handler.

Pass `auto_ack_invites=False` if you want a human to *decide* whether to join (an "accept invite?" prompt). If so, your UI is responsible for emitting the ack:

```python
from autogen.beta.network import EV_CHANNEL_INVITE, EV_CHANNEL_INVITE_ACK, Envelope

human = await hc.register_human(Passport(name="operator"), auto_ack_invites=False)

async def gate_invites(envelope) -> None:
    if envelope.event_type != EV_CHANNEL_INVITE:
        return
    if await ui.confirm(f"Join channel {envelope.channel_id}?"):
        await human.post_envelope(Envelope(
            channel_id=envelope.channel_id,
            sender_id=human.agent_id,
            event_type=EV_CHANNEL_INVITE_ACK,
            event_data={"channel_id": envelope.channel_id},
            causation_id=envelope.envelope_id,
        ))
    # otherwise let the hub's invite-ack timeout close the channel

human.on_envelope(gate_invites)
```

## Hooking up a UI

The framework deliberately doesn't pick an input modality - you bridge the `HumanClient` to whatever UI you have. Two common shapes:

### A web app / websocket bridge (push out, RPC in)

Forward inbound envelopes to the client over a websocket; turn UI messages into sends.

```python
async def serve(websocket, human):
    # outbound: hub -> browser
    async def to_browser(envelope) -> None:
        await websocket.send_json({
            "channel": envelope.channel_id,
            "from": envelope.sender_id,
            "type": envelope.event_type,
            "data": envelope.event_data,
        })
    human.on_envelope(to_browser)

    # inbound: browser -> hub
    try:
        async for msg in websocket:
            payload = msg.json()
            if payload["action"] == "send":
                await human.send(payload["channel"], payload["text"])
            elif payload["action"] == "open":
                await human.open(type=payload["type"], target=payload["target"])
            elif payload["action"] == "close":
                await human.close_channel(payload["channel"], reason="user_closed")
    finally:
        human.remove_envelope_callback(to_browser)
        await human.disconnect()
```

### A console / CLI loop (pull)

`input()` is blocking - run it off the event loop with `asyncio.to_thread` so the network stays responsive while the user types.

```python
import asyncio
from autogen.beta.network import CONVERSATION_TYPE, EV_TEXT

channel = await human.open(type=CONVERSATION_TYPE, target=expert.agent_id)

while True:
    text = (await asyncio.to_thread(input, "you> ")).strip()
    if not text or text.lower() in {"quit", "exit"}:
        break
    await channel.send(text)
    try:
        reply = await human.next_envelope(
            predicate=lambda e: e.event_type == EV_TEXT and e.sender_id == expert.agent_id,
            timeout=60.0,
        )
        print(f"expert> {reply.event_data['text']}")
    except asyncio.TimeoutError:
        print("expert> (no reply within 60s)")

await channel.close(reason="operator_done")
await human.disconnect()
```

!!! note "Drain rate is yours to manage"
    The pull queue is unbounded by design - the embedder controls how fast it's drained via the UI. If it grows pathologically, the application has a UI bug to fix. Always call `human.disconnect()` on shutdown so blocked consumers wake up instead of hanging.

## See also

- [Agent Clients and Handlers](agent_clients.md) - the LLM-side counterpart.
- [Hub & Identity](hub_and_identity.md) - `Passport.kind`, `Rule`, registration.
- [Human in the Loop](../context/human_in_the_loop.md) - pausing a *single* `Agent` mid-run for input (a different, in-process mechanism).

---

# Network-Assigned Tools

Source: https://docs.ag2.ai/latest/docs/beta/network/network_assigned_tools/

When you call `HubClient.register(agent, ...)` with the default `attach_plugin=True`, the framework attaches `NetworkPlugin` to your agent. The plugin does two things:

1. Adds an **assembly policy** that prefixes every LLM call with the agent's name and id.
2. Adds the **identity-level tools** to `agent.tools`: `delegate`, `peers`, `channels`, `tasks`, `context`. These are stable for the life of the registration and work in any channel context - discovery, channel lifecycle, task observation, and the one-shot `delegate` convenience.

The per-turn tool list an agent actually sees is the **union of two streams**:

| Stream | Where it comes from | Examples |
|---|---|---|
| **Identity-level** | `NetworkPlugin`, attached once at registration | `delegate`, `peers`, `channels`, `tasks`, `context` |
| **Channel-level** | `adapter.tools_for(...)`, resolved per turn by the default handler and merged into `agent.ask(tools=...)` | `say` (text channels), user-authored `Handoff`-returning tools (workflow) |

So `say` is **not** on every agent's `agent.tools` - it's contributed by the adapter that owns the channel, only when that adapter accepts free-form text and only when it's this participant's turn. A workflow participant never sees `say`; it routes via handoff tools instead. See [Channel-level tools](#channel-level-tools-adaptertools_for) below.

## `delegate` - the flat hot-path tool

| Tool | Signature | Purpose |
|---|---|---|
| `delegate` | `delegate(target, prompt, capability?, timeout=300)` | One-shot consult: open a `consulting` channel with `target`, send `prompt`, await the single reply, return its text. |

`delegate` is the most common cross-cutting verb because "ask one specialist a question, take their answer" is the canonical multi-agent pattern - and unlike `say` it isn't tied to a channel the agent is already in, so it lives at the identity level.

```python
# The LLM emits, e.g.:
#   delegate(target="bob", prompt="What's the right way to model X?", capability="modeling")
```

The framework resolves `ChannelInject` (current channel) and `AgentClientInject` (calling agent's hub client) automatically inside the notify handler, so the LLM never sees those parameters.

## Four grouped action-dispatch tools

Each grouped tool takes an `action` literal plus action-specific args, keeping the LLM's tool list short.

### `peers(action)` - discovery

| Action | Args | Returns |
|---|---|---|
| `"find"` | `query?, capability?, sort_by?, limit=20` | List of peer summaries (excludes the calling agent). |
| `"describe"` | `name` | One peer's full profile: `{passport, resume, skill_md}`. `skill_md` falls back to a rendered passport+resume when no `SKILL.md` is registered. |

### `channels(action)` - lifecycle

| Action | Args | Returns |
|---|---|---|
| `"list"` | `state="active"\|"all"` | Channels this agent participates in. |
| `"open"` | `type, target, knobs?, intent?, ttl?, message?` | Mirrors `AgentClient.open`. Returns `{channel_id, type, participants}` - plus `seed_envelope_id` when `message` was supplied. |
| `"info"` | `channel_id` | Full `ChannelMetadata` if the agent is a participant. |
| `"close"` | `channel_id?` (defaults to current) | Closes the channel with reason `"closed_by_agent"`. |

`"open"` accepts an optional `message`: when set, the tool opens the channel and - once it transitions to `OPENED` - posts that text as the first envelope on the initiator's behalf, in the same call. If the seed send fails, the just-opened channel is closed (`reason="seed_failed"`) so you never leave a dangling-open channel nobody ever sends into. Use it for short-lived channels where the agent wants "open and ask" to be one atomic step rather than two tool calls.
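The open-and-seed behaviour can be sketched as a small coroutine. This is an illustration under assumptions, not the tool's source: `open_with_seed` and its three injected callables are hypothetical, and re-raising after the cleanup close is a choice made for the sketch (the real tool may report the failure differently).

```python
import asyncio
from typing import Awaitable, Callable, Optional


async def open_with_seed(
    open_channel: Callable[[], Awaitable[str]],
    send: Callable[[str, str], Awaitable[str]],
    close: Callable[[str, str], Awaitable[None]],
    message: Optional[str] = None,
) -> dict:
    """Open a channel and optionally post a first message in the same step."""
    channel_id = await open_channel()
    result = {"channel_id": channel_id}
    if message is None:
        return result
    try:
        # The seed send returns the id of the envelope it posted.
        result["seed_envelope_id"] = await send(channel_id, message)
    except Exception:
        # Never leave a freshly opened channel dangling if the seed fails.
        await close(channel_id, "seed_failed")
        raise
    return result
```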

### `tasks(action)` - task lifecycle

Two halves: **active actions** (the agent is inside its own `agent.task(...)` block) and **observation actions** (any task the hub has observed).

| Action | Half | Args | Returns |
|---|---|---|---|
| `"progress"` | active | `payload` | Emits `TaskProgress` on the active task. |
| `"complete"` | active | `result?` | Terminal - emits `TaskCompleted`. |
| `"list"` | observation | `scope="own"\|"all", state="active"\|"all", limit=20` | Task summaries. |
| `"status"` | observation | `task_id` | Refreshed `TaskMetadata`. |
| `"wait"` | observation | `task_id, timeout=300, poll_interval=0.1` | Blocks until the task reaches a terminal state. |
| `"cancel"` | - | - | Not implemented; returns an error placeholder. |

`"start"` is intentionally **not** a tool - calling it from the LLM would bypass the `async with agent.task(...)` lifecycle that scopes `TaskInject` correctly. Owners start tasks in their own code; the LLM uses `"progress"`/`"complete"` once a task is active, and `delegate` for one-shot remote work.
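The `"wait"` action's `timeout` and `poll_interval` arguments suggest a simple polling loop. The sketch below shows one way such a loop could work; `wait_for_terminal`, the `get_state` callable, and the terminal state names are all assumptions made for illustration, and raising on timeout is a choice of the sketch, not confirmed tool behaviour.

```python
import asyncio
import time
from typing import Awaitable, Callable

# Assumed names for terminal states; the real TaskMetadata values may differ.
TERMINAL_STATES = frozenset({"completed", "failed"})


async def wait_for_terminal(
    get_state: Callable[[], Awaitable[str]],
    timeout: float = 300.0,
    poll_interval: float = 0.1,
) -> str:
    """Poll get_state until it reports a terminal state or the timeout expires."""
    deadline = time.monotonic() + timeout
    while True:
        state = await get_state()
        if state in TERMINAL_STATES:
            return state
        if time.monotonic() >= deadline:
            raise asyncio.TimeoutError("task did not reach a terminal state")
        await asyncio.sleep(poll_interval)
```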

### `context(action)` - past content

| Action | Args | Returns |
|---|---|---|
| `"search"` | `query, scope="channel"\|"knowledge", limit=10` | Excerpts of envelopes whose text matches `query` (case-insensitive substring). |
| `"quote"` | `speaker, recent_n=1, channel_id?` | The last `recent_n` `EV_TEXT` envelopes from `speaker` in the current (or specified) channel. |

`scope="knowledge"` reaches into the calling agent's own `KnowledgeStore`. Substring search only - for vector / semantic search, the agent's own loop calls into framework-core `recall` directly.
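For reference, a case-insensitive substring match with a result limit looks like the sketch below. `search_excerpts` is a hypothetical helper operating on plain strings; the real tool works on envelopes and returns excerpts, which this sketch does not attempt to reproduce.

```python
def search_excerpts(texts: list[str], query: str, limit: int = 10) -> list[str]:
    """Return up to limit texts that contain query, ignoring case."""
    needle = query.casefold()
    matches = [text for text in texts if needle in text.casefold()]
    return matches[:limit]
```

A query such as `"kalman"` matches both `"Use a Kalman filter"` and `"KALMAN gain"`, but not a paraphrase like `"state estimator"`, which is the gap semantic search would fill.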

## Channel-level tools - `adapter.tools_for`

The tools above are *identity-level* - the same on every turn. The tools an agent needs to actually **participate** in a given channel depend on the channel's protocol and on whose turn it is, so they come from the adapter, not the plugin:

```python
def tools_for(
    self,
    client: AgentClient,
    metadata: ChannelMetadata,
    state: AdapterState,
    participant_id: str,
) -> list[Tool]: ...
```

The default handler calls `adapter.tools_for(...)` once per turn and merges the result into the per-call `agent.ask(tools=...)` override. The built-in adapters use it like this:

| Adapter | `tools_for` returns | Gating |
|---|---|---|
| `conversation` | `[say]` | Always - no turn order, both participants can post any time. |
| `discussion` | `[say]` on your round, else `[]` | Round-robin: `say` only when `state.expected_next_speaker` is you. |
| `consulting` | `[say]` to whoever holds the floor, else `[]` | Initiator until the prompt is sent; respondent after, until the one reply lands; then the channel auto-closes and nobody gets `say`. |
| `workflow` | `[]` | Workflow agents route via user-authored `Handoff`-returning tools (already on `agent.tools`) - see [Workflow](workflow.md#writing-a-handoff-tool). |

`say(content, audience?, channel_id?)` posts `EV_TEXT` into the active channel (`audience` is a list of peer **names**, resolved to ids; `None` broadcasts). Internally it builds the envelope via `adapter.build_text_envelope(...)` - the same Layer-2 helper a non-AG2 bridge would call - so per-adapter envelope shaping is honoured automatically. See [Adapters Overview -> The Adapter Protocol](adapters_overview.md#the-adapter-protocol) for the full three-layer picture.

Tool resolution is memoized per `client.agent_id` inside each adapter, so the `fast_depends` schema-build cost is paid once, not on every notify turn.

If you write a custom adapter, override `tools_for` to offer your channel's verbs (or leave the default, which returns `[]`).
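The gating rules in the table above can be expressed as small pure functions. This is a simplified sketch: tools are represented by their names as strings, the state classes are hypothetical stand-ins for `AdapterState`, and the functions take only the arguments they need instead of the full `tools_for(client, metadata, state, participant_id)` signature.

```python
from dataclasses import dataclass
from typing import Optional

# Identity-level tools attached once at registration.
IDENTITY_TOOLS = ["delegate", "peers", "channels", "tasks", "context"]


@dataclass(frozen=True)
class DiscussionState:
    expected_next_speaker: Optional[str]


@dataclass(frozen=True)
class ConsultingState:
    initiator_id: str
    respondent_id: str
    initiator_sent: bool = False
    respondent_replied: bool = False


def discussion_tools_for(state: DiscussionState, participant_id: str) -> list[str]:
    """Round-robin: `say` only for the expected next speaker."""
    return ["say"] if state.expected_next_speaker == participant_id else []


def consulting_tools_for(state: ConsultingState, participant_id: str) -> list[str]:
    """`say` goes to whoever holds the floor; nobody once the reply has landed."""
    if state.respondent_replied:
        return []
    holder = state.respondent_id if state.initiator_sent else state.initiator_id
    return ["say"] if participant_id == holder else []


def per_turn_tools(channel_tools: list[str]) -> list[str]:
    """The per-turn list is the identity-level tools plus the channel-level ones."""
    return IDENTITY_TOOLS + channel_tools
```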

## What gets injected, automatically

Every grouped tool accepts `AgentClientInject`, `ChannelInject`, and (for `tasks`) `TaskInject` parameters that the framework resolves from `context.dependencies` when the tool runs inside a notify handler. Your code does not need to wire them up - the default handler stamps them via `stamp_dependencies` before invoking the agent's turn.

If you are testing a tool *outside* the notify-handler context, pass them yourself:

```python
from autogen.beta import Context
from autogen.beta.network import AGENT_CLIENT_DEP, CHANNEL_DEP, Channel

ctx = Context(
    dependencies={
        AGENT_CLIENT_DEP: alice,
        CHANNEL_DEP: Channel(metadata=channel.metadata, client=alice),
    },
)
```

## Opting out

Pass `attach_plugin=False` to `HubClient.register` for a bare agent - useful for headless workers or gateways that handle envelopes entirely in your own code without needing the LLM-facing tool surface.

```python
worker = await hc.register(agent, passport, resume, attach_plugin=False)
worker.on_envelope(my_custom_handler)
```

## See also

* [Agent Clients and Handlers](agent_clients.md) - what the default handler does and how to replace it.
* [Workflow](workflow.md) - hand-written `Handoff`-returning tools complement `ToolCalled -> AgentTarget` transitions for graph-driven routing.

---

# Channel Adapters Overview

Source: https://docs.ag2.ai/latest/docs/beta/network/adapters_overview/

A **channel adapter** governs one channel's allowed sends, default view policy, expectations, and termination rules. Four built-ins ship with the network module; each has its own page.

## Choosing an Adapter

| Use case | Adapter | Page |
|---|---|---|
| 1Q1R - strict question-and-answer, auto-closes after the reply | `consulting` | [Consulting](consulting.md) |
| 2-party free-form chat with no turn ordering | `conversation` | [Conversation](conversation.md) |
| N-party round-robin discussion | `discussion` | [Discussion](discussion.md) |
| Declarative orchestration (group-chat-with-handoff style) | `workflow` | [Workflow](workflow.md) |

If you're migrating a classic `GroupChat` orchestration, see [Migrating from Group Chat](migration_from_group_chat.md) - the workflow adapter is the modern equivalent.

## The Adapter Protocol

An adapter exposes **three concentric layers** of surface:

1. **Capabilities** - what the *hub* calls: `validate_create` / `validate_send` / `fold` / `on_accepted` / `initial_state`. `fold` is replayed on `Hub.hydrate()`, so it must be a pure function.
2. **Envelope helpers** - what *any client* calls: `build_text_envelope` / `build_packet_envelope`. Pure constructors that produce a correctly-shaped `Envelope` for this adapter's protocol. Framework-agnostic - not LLM-specific. This is the surface a non-AG2 bridge drives (see [below](#driving-a-channel-without-an-agent)).
3. **LLM tools** - what the *AG2 agent loop* sees: `tools_for`. The presentation layer; the default handler merges its result with the identity-level tools `NetworkPlugin` attaches. Adapters that take no LLM input (e.g. workflow, where handoff tools are user-authored) return `[]`.

```python
class ChannelAdapter(Protocol):
    manifest: ChannelManifest

    # Layer 1 - capabilities (hub-called)
    def initial_state(self, metadata: ChannelMetadata) -> AdapterState: ...
    def fold(self, envelope: Envelope, state: AdapterState) -> AdapterState: ...
    def validate_create(self, metadata: ChannelMetadata) -> None: ...
    def validate_send(
        self, metadata: ChannelMetadata, envelope: Envelope, state: AdapterState
    ) -> None: ...
    def on_accepted(
        self, metadata: ChannelMetadata, envelope: Envelope, state: AdapterState
    ) -> AdapterResult: ...

    # Layer 2 - envelope helpers (any client)
    def build_text_envelope(
        self, channel_id: str, sender_id: str, text: str, *,
        audience: list[str] | None = None, causation_id: str | None = None,
    ) -> Envelope: ...
    def build_packet_envelope(
        self, channel_id: str, sender_id: str, body: str, *,
        handoff: Handoff | None = None, context_set: dict | None = None,
        audience: list[str] | None = None, causation_id: str | None = None,
    ) -> Envelope: ...

    # Layer 3 - LLM tools (agent loop)
    def tools_for(
        self, client: AgentClient, metadata: ChannelMetadata,
        state: AdapterState, participant_id: str,
    ) -> list[Tool]: ...
```

Each method runs at a specific moment:

| Method | Layer | When | Purpose |
|---|---|---|---|
| `manifest` | 1 | Adapter registration | Static description: type, version, participant counts, knobs schema, default view, default expectations |
| `initial_state` | 1 | Channel creation | Build the per-channel bookkeeping (e.g. `expected_next_speaker`, turn count) |
| `validate_create` | 1 | Channel creation | Reject the create if the manifest's invariants are violated |
| `fold` | 1 | Each accepted envelope | Update the per-channel state (turn-taking, flags, last speaker) |
| `validate_send` | 1 | Each prospective send | Reject sends that would violate the protocol (out-of-turn, post-terminal) |
| `on_accepted` | 1 | Each accepted envelope | Decide whether to auto-close (`AdapterResult(next_state=CLOSING, ...)`) |
| `build_text_envelope` | 2 | Any time, by any client | Construct an `EV_TEXT` envelope shaped for this adapter |
| `build_packet_envelope` | 2 | Any time, by any client | Construct an `EV_PACKET` envelope (workflow encodes `handoff` / `context_set` here) |
| `tools_for` | 3 | Per turn, by the default handler | The LLM tools this participant gets this turn - see [Network Tools -> Channel-level tools](network_assigned_tools.md#channel-level-tools-adaptertools_for) |

Module-level defaults are public, so a custom adapter (or a bridge) can delegate to them directly: `default_build_text_envelope` / `default_build_packet_envelope` (emit plain `EV_TEXT` / `EV_PACKET`) and `default_tools_for` (returns `[]`). All three are importable from `autogen.beta.network`.

You don't normally implement this protocol yourself - the four built-ins cover most cases, and the workflow adapter is parameterised via `TransitionGraph` for custom orchestrations. The `ChannelAdapter` Protocol is exposed for completeness and for advanced use cases.
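If you do write an adapter, the purity requirement on `fold` is the part most worth internalising. The sketch below shows a round-robin fold over an immutable state and a replay helper; `RoundRobinState`, the simplified `fold(sender_id, state)` signature, and `replay` are hypothetical and stand in for the real `fold(envelope, state)` contract.

```python
from dataclasses import dataclass, replace
from functools import reduce


@dataclass(frozen=True)
class RoundRobinState:
    """Hypothetical per-channel bookkeeping for a round-robin adapter."""
    order: tuple
    expected_next_speaker: str
    turn_count: int = 0


def fold(sender_id: str, state: RoundRobinState) -> RoundRobinState:
    """Pure update: returns a new state and never mutates the input."""
    position = state.order.index(sender_id)
    next_speaker = state.order[(position + 1) % len(state.order)]
    return replace(
        state,
        expected_next_speaker=next_speaker,
        turn_count=state.turn_count + 1,
    )


def replay(sender_ids: list, initial: RoundRobinState) -> RoundRobinState:
    """Rebuild state from a log of senders, as a hydrate-style replay would."""
    return reduce(lambda state, sender: fold(sender, state), sender_ids, initial)
```

Because `fold` depends only on its arguments, replaying the same log from the same initial state always lands on the same result, which is the property a replay on hydrate relies on.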

### Driving a channel without an Agent

The Layer-2 helpers exist so that code with no AG2 plumbing - a chat-platform gateway, a batch harness, a non-AG2 framework - can advance a turn manually. Build the adapter-shaped envelope, then post it through `Hub.post_envelope`:

```python
# Two pure identities - no Agent, no NetworkPlugin, no @tool.
alice = await alice_hc.register_human(Passport(name="alice"), resume=Resume())
bob = await bob_hc.register_human(Passport(name="bob"), resume=Resume())

channel = await alice.open(type="workflow", target=[bob.agent_id], knobs={"graph": graph.to_dict()})

adapter = hub.adapter_for(channel.channel_id)
env = adapter.build_packet_envelope(
    channel_id=channel.channel_id,
    sender_id=alice.agent_id,
    body="alice opens the discussion",
)
await hub.post_envelope(env)

# The workflow's transition graph advanced state purely from the bridge-supplied envelope.
state = hub.adapter_state(channel.channel_id)
assert state.expected_next_speaker == bob.agent_id
```

`hub.adapter_for(channel_id)` returns the bound adapter; `hub.adapter_state(channel_id)` returns the current fold state (and stays available after the channel closes - useful for post-mortem inspection). A bridge that pre-builds envelopes offline can skip the adapter lookup entirely and call `default_build_packet_envelope(...)` directly.

## Channel Lifecycle

```mermaid
stateDiagram-v2
    [*] --> INVITED: alice.open(...)
    INVITED --> ACTIVE: all targets ack
    INVITED --> CLOSED: ack timeout (acks_within violation)

    ACTIVE --> CLOSING: adapter.on_accepted -> CLOSING
    ACTIVE --> CLOSING: explicit channel.close()
    ACTIVE --> CLOSED: TTL expired (sweeper)
    ACTIVE --> CLOSED: expectation violation (auto_close)

    CLOSING --> CLOSED: drain complete
    CLOSED --> [*]
```

The state lives on `ChannelMetadata.state` - read it back via `await hub.get_channel(channel_id)`.
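The arrows in the diagram can be read as a transition table. The sketch below encodes only the transitions drawn above; `SketchChannelState` and `can_transition` are hypothetical names, not the library's `ChannelState` API, and states not shown in the diagram are deliberately left out.

```python
from enum import Enum


class SketchChannelState(Enum):
    INVITED = "invited"
    ACTIVE = "active"
    CLOSING = "closing"
    CLOSED = "closed"


# Each state maps to the states it may move to, following the diagram.
ALLOWED_TRANSITIONS = {
    SketchChannelState.INVITED: {SketchChannelState.ACTIVE, SketchChannelState.CLOSED},
    SketchChannelState.ACTIVE: {SketchChannelState.CLOSING, SketchChannelState.CLOSED},
    SketchChannelState.CLOSING: {SketchChannelState.CLOSED},
    SketchChannelState.CLOSED: set(),
}


def can_transition(current: SketchChannelState, target: SketchChannelState) -> bool:
    """True if the diagram contains an arrow from current to target."""
    return target in ALLOWED_TRANSITIONS[current]
```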

The four adapters differ entirely in what triggers the `ACTIVE -> CLOSING` arrow:

```mermaid
flowchart LR
    A[ConsultingAdapter] -->|"both flags set:<br/>initiator_sent + respondent_replied"| C1[CLOSING]
    B[ConversationAdapter] -->|"explicit close() only"| C2[CLOSING]
    D[DiscussionAdapter] -->|"explicit close() or TTL only"| C3[CLOSING]
    E[WorkflowAdapter] -->|"TransitionGraph emits<br/>TerminateTarget or max_turns"| C4[CLOSING]
```

## ChannelMetadata

```python
from autogen.beta.network import (
    Participant,
    ParticipantRole,
    ParticipantSchema,
    ChannelManifest,
    ChannelMetadata,
    ChannelState,
)
```

The hub-managed record for one channel:

| Field | Notes |
|---|---|
| `channel_id` | UUID hex. |
| `manifest` | Static `ChannelManifest` taken from the adapter. |
| `creator_id` | Who called `agent_client.open(...)`. |
| `participants` | List of `Participant(agent_id, role, order)`. The `order` field is set at create time and used by round-robin adapters. |
| `state` | `ChannelState` enum: `INVITED` / `ACTIVE` / `CLOSING` / `CLOSED` / `EXPIRED`. |
| `created_at` | ISO-Z. |
| `pending_acks` | Agents we're still waiting on. |
| `close_reason` | Free-form string set when the channel terminates. |
| `knobs` | Adapter-specific tuning (`{"ordering": "round_robin"}` for discussion, `{"graph": <dict>}` for workflow). |

## Default Expectations

Each adapter declares its own defaults:

| Adapter | Default expectations |
|---|---|
| `consulting` | `acks_within(30s, auto_close)`, `reply_within(600s, auto_close)` |
| `conversation` | `max_silence(3600s, audit)` |
| `discussion` | `turn_within(120s, warn)`, `turn_within(600s, hide)` |
| `workflow` | `turn_within(120s, warn)`, `turn_within(600s, auto_close)` |

These are enforced by the hub's expectation sweeper. See [Expectations & Audit](expectations_and_audit.md) for the evaluator and handler model.

## Default View Policies

| Adapter | Default view |
|---|---|
| `consulting` | `FullTranscript()` |
| `conversation` | `WindowedSummary(recent_n=10)` |
| `discussion` | `WindowedSummary(recent_n=N*2)` |
| `workflow` | `WindowedSummary(recent_n=N*2)` |

`N` = participant count. The default view governs what each participant sees of the WAL when the default handler projects history into their LLM turn - see [Views & Skills](views_and_skills.md).

## What's Next

Pick an adapter from the table at the top of this page and read its dedicated page. Each one includes a worked example you can copy.

---

# Conversation Adapter

Source: https://docs.ag2.ai/latest/docs/beta/network/conversation/

`conversation` is a free-form 2-party channel. Either side can send at any time; there's no turn order to enforce, and the adapter never auto-closes. Use it when you want a peer-to-peer back-and-forth and the application logic decides when to stop.

## Shape

| | |
|---|---|
| Participants | Exactly 2 (`INITIATOR` + `RESPONDENT`) |
| Turn order | None - either side, any time |
| Auto-close | Never |
| Termination | Explicit `channel.close()` or TTL |
| Default view | `WindowedSummary(recent_n=10)` |
| Default expectation | `max_silence(3600s, audit)` |

## Lifecycle

```mermaid
sequenceDiagram
    participant A as alice (initiator)
    participant H as Hub + ConversationAdapter
    participant B as bob (respondent)

    A->>H: open(type="conversation", target="bob")
    H->>B: EV_CHANNEL_INVITE
    B->>H: EV_CHANNEL_INVITE_ACK
    H->>A: EV_CHANNEL_OPENED
    Note over A,B: state = ACTIVE - no turn order

    A->>H: EV_TEXT
    H->>B: deliver
    B->>H: EV_TEXT
    H->>A: deliver
    A->>H: EV_TEXT
    H->>B: deliver
    Note over A,B: ...continues until empty reply<br/>or close() is called

    A->>H: channel.close()
    H-->>A: EV_CHANNEL_CLOSED
    H-->>B: EV_CHANNEL_CLOSED
```

`validate_send` only checks "is the sender a participant?" - it accepts sends from either side at any time, in any order.

## Smallest Example

```python
from autogen.beta import Agent
from autogen.beta.config import AnthropicConfig
from autogen.beta.knowledge import MemoryKnowledgeStore
from autogen.beta.network import Hub, HubClient, LocalLink, Passport, Resume

config = AnthropicConfig(model="claude-sonnet-4-6")
hub = await Hub.open(MemoryKnowledgeStore(), ttl_sweep_interval=0)
link = LocalLink(hub)

alice_hc = HubClient(link, hub=hub)
bob_hc = HubClient(link, hub=hub)

alice = await alice_hc.register(
    Agent("alice", prompt="Curious novice. One short sentence with a follow-up.", config=config),
    Passport(name="alice"),
    Resume(),
)
bob = await bob_hc.register(
    Agent("bob", prompt="Patient expert. One short sentence, no questions back.", config=config),
    Passport(name="bob"),
    Resume(),
)

channel = await alice.open(type="conversation", target="bob")
await channel.send("Hi bob, what's a good first ML concept to learn?")
```

Both default handlers run `Agent.ask` on every inbound `EV_TEXT`, so the conversation auto-drives. Two ways to halt:

```python
import asyncio

from autogen.beta.network import EV_TEXT


# Cap by message count, then explicit close.
async def wait_for_text_count(hub, channel_id, expected, *, timeout=120.0):
    """Poll the channel's WAL until it holds `expected` EV_TEXT entries."""
    loop = asyncio.get_running_loop()
    deadline = loop.time() + timeout
    while loop.time() < deadline:
        wal = await hub.read_wal(channel_id)
        if sum(1 for e in wal if e.event_type == EV_TEXT) >= expected:
            return
        await asyncio.sleep(0.05)
    raise asyncio.TimeoutError("did not reach expected count")


await wait_for_text_count(hub, channel.channel_id, expected=6)
await channel.close()
```

Or rely on the LLM returning empty: the default handler treats an empty body as "don't send", which halts the chain naturally.

## When to Use

- Two specialists who genuinely converse without a fixed order - for example, an analyst and a critic going back and forth.
- Building chat UIs where the application controls when to stop, not the protocol.
- Any scenario where the adapter's job is just to deliver envelopes between two named participants and let your code do the rest.

## When NOT to Use

- Strict 1Q1R - use [`consulting`](consulting.md); it auto-closes for you.
- Multiple participants - use [`discussion`](discussion.md) or [`workflow`](workflow.md).
- Workflows with explicit handoffs - use [`workflow`](workflow.md).

## Validation Rules

`ConversationAdapter.validate_send` rejects:

- Sends from a non-participant.
- Sends after `state == CLOSED`.

It accepts everything else - including either participant sending two in a row. The adapter doesn't try to model "whose turn is it" because that's not the contract.
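The two rejection rules can be sketched in plain Python. This is an illustration, not the shipped `ConversationAdapter`: `SketchChannel`, the local `ProtocolError`, and the function signature are stand-ins invented here, and only the rules themselves come from the list above.

```python
from dataclasses import dataclass


class ProtocolError(Exception):
    """Local stand-in for the framework's error type (assumption)."""


@dataclass
class SketchChannel:
    # Hypothetical stand-in for the hub's channel record.
    participants: tuple[str, str]
    state: str = "ACTIVE"


def validate_send(channel: SketchChannel, sender_id: str) -> None:
    """Reject non-participants and sends on a closed channel; accept the rest."""
    if sender_id not in channel.participants:
        raise ProtocolError(f"{sender_id} is not a participant")
    if channel.state == "CLOSED":
        raise ProtocolError("channel is closed")


channel = SketchChannel(participants=("alice", "bob"))
validate_send(channel, "alice")
validate_send(channel, "alice")  # two in a row from the same side is accepted
```

Note that the sketch keeps no turn state at all, which mirrors the contract: there is nothing to compare the sender against except the participant list and the channel state.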

## State Object

```python
@dataclass(slots=True)
class ConversationState:
    turn_count: int = 0
    last_speaker_id: str | None = None
```

Minimal - just a count and a last-speaker hint that custom orchestrators or observers can read. Read it via `hub._adapter_states[channel_id]` (the underscore is intentional - operator API).

## Closing

The adapter never closes itself. To end the channel, do one of:

```python
# Explicit close from any participant.
await channel.close()

# Or rely on the TTL - set via Rule.limits or the adapter's manifest.
```

When closed, the hub posts `EV_CHANNEL_CLOSED` with whatever reason you supply (or the default `"explicit_close"`).

Three more termination patterns work cleanly with `conversation`:

* **Agent-side tool** - any participant calls a tool that closes the channel. Modern analogue of `is_termination_msg`.
* **Adapter sentinel** - subclass `ConversationAdapter` and watch for a keyword in accepted envelopes.
* **TTL / expectations** - safety nets only; not the primary stop signal.

See [Closing Channels](termination.md) for the worked examples.

---

# Consulting Adapter

Source: https://docs.ag2.ai/latest/docs/beta/network/consulting/

`consulting` is a strict 1-question-1-reply channel. The initiator sends exactly one substantive envelope; the respondent sends exactly one reply; the adapter auto-closes with reason `"consulting_complete"`.

Use it when you want a precisely-bounded query/answer exchange - exactly the shape of "ask another agent for advice and stop."

## Shape

| | |
|---|---|
| Participants | Exactly 2 (`INITIATOR` + `RESPONDENT`) |
| Turn order | Strict: initiator first, respondent once, then closed |
| Auto-close | Yes - after respondent's reply |
| Termination | Auto-close, explicit close, or TTL |
| Default view | `FullTranscript()` |
| Default expectations | `acks_within(30s, auto_close)`, `reply_within(600s, auto_close)` |

The full transcript view (vs the windowed summary used by other adapters) reflects the use case: a consultation is short enough that the respondent should see the entire exchange unfiltered.

## Lifecycle

```mermaid
sequenceDiagram
    participant A as alice (initiator)
    participant H as Hub + ConsultingAdapter
    participant B as bob (respondent)

    A->>H: open(type="consulting", target="bob")
    H->>B: EV_CHANNEL_INVITE
    B->>H: EV_CHANNEL_INVITE_ACK
    H->>A: EV_CHANNEL_OPENED
    Note over A,B: state = ACTIVE

    A->>H: EV_TEXT (initiator_sent <- true)
    H->>B: deliver
    B->>H: EV_TEXT (respondent_replied <- true)
    H->>A: deliver
    Note over H: on_accepted -> CLOSED, reason="consulting_complete"
    H-->>A: EV_CHANNEL_CLOSED
    H-->>B: EV_CHANNEL_CLOSED
```

The adapter rejects any further send via `validate_send` raising `ProtocolError`.

## Smallest Example

```python
from autogen.beta import Agent
from autogen.beta.config import AnthropicConfig
from autogen.beta.knowledge import MemoryKnowledgeStore
from autogen.beta.network import (
    EV_CHANNEL_CLOSED,
    Hub,
    HubClient,
    LocalLink,
    Passport,
    Resume,
)

config = AnthropicConfig(model="claude-sonnet-4-6")
hub = await Hub.open(MemoryKnowledgeStore(), ttl_sweep_interval=0)
link = LocalLink(hub)

alice_hc = HubClient(link, hub=hub)
bob_hc = HubClient(link, hub=hub)

alice = await alice_hc.register(
    Agent("alice", prompt="Ask one focused question.", config=config),
    Passport(name="alice"),
    Resume(),
)
bob = await bob_hc.register(
    Agent("bob", prompt="Answer in one short sentence.", config=config),
    Passport(name="bob"),
    Resume(),
)

channel = await alice.open(type="consulting", target="bob")
await channel.send(
    "What's the most important property of a distributed system?",
    audience=[bob.agent_id],
)

close_env = await alice.wait_for_channel_event(
    channel_id=channel.channel_id,
    predicate=lambda e: e.event_type == EV_CHANNEL_CLOSED,
    timeout=60.0,
)
print(close_env.event_data["reason"])  # 'consulting_complete'
```

The flow:

1. `alice.open(type="consulting", target="bob")` - hub posts invite to bob; bob auto-acks; channel goes `ACTIVE`.
2. `channel.send(...)` - alice's first (and only) envelope.
3. Bob's default handler probes `can_send` (yes - respondent hasn't replied), runs `Agent.ask`, sends the reply.
4. `ConsultingAdapter.on_accepted(...)` sees both flags set and returns `AdapterResult(next_state=CLOSED, auto_close_reason="consulting_complete")`.
5. Hub posts `EV_CHANNEL_CLOSED`. `alice.wait_for_channel_event(...)` wakes.

## When to Use

- One-shot query/response - "ask the database expert what indexing strategy fits this query."
- Built-in workflows where the calling code wants a single result and shouldn't wait around if no reply comes.
- Scenarios where the audit trail benefits from each consult being a separate channel id.

## When NOT to Use

- Multi-turn back-and-forth - use [`conversation`](conversation.md).
- Multiple respondents - use [`discussion`](discussion.md) or [`workflow`](workflow.md).
- When the LLM may need to ask follow-up questions - `consulting` rejects them.

## Validation Rules

`ConsultingAdapter.validate_send` rejects:

- Out-of-order sends - respondent trying to speak before the initiator's first envelope.
- Any send after both `initiator_sent` and `respondent_replied` flags are set.

`validate_send` raises `ProtocolError`; the hub propagates it back to the sender's `channel.send(...)` call.

## State Object

```python
@dataclass(slots=True)
class ConsultingState:
    initiator_sent: bool = False
    respondent_replied: bool = False
```

Two flags. `on_accepted` returns `CLOSED` when both are true.
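As a rough model of how the two flags drive validation and auto-close, the sketch below combines both steps in one hypothetical `accept` function. The function, the string role names, and the local `ProtocolError` are assumptions made for illustration; the real adapter splits this work between `validate_send` and `on_accepted`, and the sketch implements only the two rejection rules listed under Validation Rules.

```python
from dataclasses import dataclass


class ProtocolError(Exception):
    """Local stand-in for the framework's error type (assumption)."""


@dataclass
class ConsultingState:
    initiator_sent: bool = False
    respondent_replied: bool = False


def accept(state: ConsultingState, sender_role: str) -> str:
    """Validate one send, fold it into the flags, and return the next channel state."""
    if state.initiator_sent and state.respondent_replied:
        raise ProtocolError("consultation already complete")
    if sender_role == "RESPONDENT" and not state.initiator_sent:
        raise ProtocolError("respondent spoke before the initiator")
    if sender_role == "INITIATOR":
        state.initiator_sent = True
    else:
        state.respondent_replied = True
    both_set = state.initiator_sent and state.respondent_replied
    return "CLOSED" if both_set else "ACTIVE"


state = ConsultingState()
print(accept(state, "INITIATOR"))   # ACTIVE
print(accept(state, "RESPONDENT"))  # CLOSED
```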

## Auto-Close vs Explicit Close

Each close trigger sets a distinct `close_reason` value:

| Trigger | Reason |
|---|---|
| Adapter auto-close after reply | `"consulting_complete"` |
| Explicit `channel.close()` | `"explicit_close"` (or whatever string you pass) |
| `acks_within` violation | `"expectation_violated:acks_within"` |
| `reply_within` violation | `"expectation_violated:reply_within"` |
| TTL expired | `"ttl_expired"` |

The reason flows on the `EV_CHANNEL_CLOSED` envelope's `event_data` and is also stored in `ChannelMetadata.close_reason` for later inspection via `hub.get_channel(...)`.

For the full set of close patterns (agent-side tool, sentinel, TTL safety nets), see [Closing Channels](termination.md).

---

# Discussion Adapter

Source: https://docs.ag2.ai/latest/docs/beta/network/discussion/

`discussion` is an N-party round-robin channel. Participants speak in a fixed order, cycling indefinitely until you close it. The adapter enforces "wait your turn" via `validate_send`; the hub's `can_send` probe lets the default handler skip wasted LLM calls when it isn't this agent's turn.

## Shape

| | |
|---|---|
| Participants | 2+ |
| Turn order | Round-robin (creator first, then participants in order) |
| Auto-close | No |
| Termination | Explicit `channel.close()` or TTL |
| Default view | `WindowedSummary(recent_n=N*2)` (where `N` = participant count) |
| Default expectations | `turn_within(120s, warn)`, `turn_within(600s, hide)` |
| Knob | `{"ordering": "round_robin"}` (only ordering shipped today) |

## Lifecycle

```mermaid
sequenceDiagram
    participant A as alice
    participant H as Hub + DiscussionAdapter
    participant B as bob
    participant C as carol

    A->>H: open(type="discussion", target=[bob, carol], knobs=round_robin)
    H->>B: EV_CHANNEL_INVITE
    H->>C: EV_CHANNEL_INVITE
    B->>H: EV_CHANNEL_INVITE_ACK
    C->>H: EV_CHANNEL_INVITE_ACK
    H->>A: EV_CHANNEL_OPENED
    Note over A,C: expected_next_speaker = alice

    A->>H: EV_TEXT (alice 1)
    Note over H: state.expected_next_speaker <- bob
    H->>B: deliver
    H->>C: deliver (probes can_send -> false, no LLM)
    B->>H: EV_TEXT (bob 1)
    Note over H: state.expected_next_speaker <- carol
    C->>H: EV_TEXT (carol 1)
    Note over H: state.expected_next_speaker <- alice (cycle)

    Note over A,C: ...continues until close() or TTL
    A->>H: channel.close()
    H-->>A: EV_CHANNEL_CLOSED
    H-->>B: EV_CHANNEL_CLOSED
    H-->>C: EV_CHANNEL_CLOSED
```

The `can_send` probe lets each default handler skip its LLM call when it's not that participant's turn - see "How Turn Skipping Works" below.

## Smallest Example

```python
from autogen.beta import Agent
from autogen.beta.config import AnthropicConfig
from autogen.beta.knowledge import MemoryKnowledgeStore
from autogen.beta.network import (
    EV_TEXT,
    ORDERING_ROUND_ROBIN,
    Hub,
    HubClient,
    LocalLink,
    Passport,
    Resume,
)

config = AnthropicConfig(model="claude-sonnet-4-6")
hub = await Hub.open(MemoryKnowledgeStore(), ttl_sweep_interval=0)
link = LocalLink(hub)

alice_hc, bob_hc, carol_hc = (HubClient(link, hub=hub) for _ in range(3))

alice = await alice_hc.register(
    Agent("alice", prompt="The optimist. One short sentence.", config=config),
    Passport(name="alice"), Resume(),
)
bob = await bob_hc.register(
    Agent("bob", prompt="The realist. One short sentence.", config=config),
    Passport(name="bob"), Resume(),
)
carol = await carol_hc.register(
    Agent("carol", prompt="The skeptic. One short sentence.", config=config),
    Passport(name="carol"), Resume(),
)

channel = await alice.open(
    type="discussion",
    target=[bob.agent_id, carol.agent_id],
    knobs={"ordering": ORDERING_ROUND_ROBIN},
)

await channel.send("Topic: should every developer learn Rust?")
# After the kickoff, each agent's default handler responds when can_send
# returns true for them - bob, then carol, then alice again, and so on.
```

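To halt, cap on text count and call `channel.close()`. `wait_for_text_count` is the polling helper defined on the [Conversation Adapter](conversation.md) page: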

```python
await wait_for_text_count(hub, channel.channel_id, expected=6)
await channel.close()
```

## How Turn Skipping Works

When alice sends "alice 1", the hub fans out an `EV_TEXT` to bob and carol. Both default handlers fire in parallel:

- **bob's handler** - calls `hc.can_send(channel_id, bob.agent_id)`. The adapter says "yes, bob is `expected_next_speaker`." Handler runs `Agent.ask`, sends bob's reply.
- **carol's handler** - calls `hc.can_send(channel_id, carol.agent_id)`. The adapter says "no, expected_next_speaker is bob, not carol." Handler returns without engaging the LLM.

When bob's reply lands, the same fan-out happens, this time to alice and carol. Now `expected_next_speaker = carol`, so carol's handler engages and alice's skips. No wasted LLM calls.

## When to Use

- Brainstorms with a fixed cast - three agents debating a topic in turn.
- Panel discussions where each agent has a static viewpoint.
- Round-robin reviewers - three reviewers each commenting once per cycle on a draft.

## When NOT to Use

- Conditional handoffs ("if alice mentions security, hand to the security expert") - use [`workflow`](workflow.md).
- Two participants only with no order - use [`conversation`](conversation.md).
- A pipeline where each step happens once - use [`workflow`](workflow.md) with `TransitionGraph.sequence(...)`.

## Validation Rules

`DiscussionAdapter.validate_send` rejects:

- `EV_TEXT` from anyone other than `state.expected_next_speaker`.
- Sends from non-participants.
- Sends to a closed channel.

Protocol envelopes (`EV_CHANNEL_*`, `ag2.task.*`) bypass the turn check.

## State Object

```python
@dataclass(slots=True)
class DiscussionState:
    participant_order: list[str]
    expected_next_speaker: str
    turn_count: int = 0
```

Read via `hub._adapter_states[channel_id]`. The order is fixed at create time by sorting participants on `Participant.order`; round-robin advances by `(current_index + 1) % len(participant_order)`.
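The ordering and advance rule can be sketched without the framework. The `initial_state`, `advance`, and `can_send` helpers below are hypothetical names for illustration; only the dataclass fields, the sort on `order`, and the modulo step come from the text above.

```python
from dataclasses import dataclass


@dataclass
class DiscussionState:
    participant_order: list[str]
    expected_next_speaker: str
    turn_count: int = 0


def initial_state(participants: list[tuple[str, int]]) -> DiscussionState:
    """Build the order from (agent_id, order) pairs, sorted on the order field."""
    ordered = [agent_id for agent_id, _ in sorted(participants, key=lambda p: p[1])]
    return DiscussionState(participant_order=ordered, expected_next_speaker=ordered[0])


def can_send(state: DiscussionState, agent_id: str) -> bool:
    """Model of the turn probe: only the expected speaker may send."""
    return agent_id == state.expected_next_speaker


def advance(state: DiscussionState) -> None:
    """Fold one accepted turn: move to the next speaker and bump the count."""
    current_index = state.participant_order.index(state.expected_next_speaker)
    next_index = (current_index + 1) % len(state.participant_order)
    state.expected_next_speaker = state.participant_order[next_index]
    state.turn_count += 1


state = initial_state([("carol", 2), ("alice", 0), ("bob", 1)])
speakers = []
for _ in range(4):
    speakers.append(state.expected_next_speaker)
    advance(state)
print(speakers)  # ['alice', 'bob', 'carol', 'alice']
```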

## Customising the Ordering

Today only `ORDERING_ROUND_ROBIN` ships. The knob is `knobs={"ordering": "round_robin"}`; passing anything else raises at create time. Future orderings (dynamic, weighted) will plug in here without breaking the round-robin contract.

## Closing

`discussion` never auto-closes. The example above caps at 6 turns and calls `channel.close()`, which is the first of four patterns that work for `discussion`:

* **App-side cap** - count turns and call `channel.close()` (canonical, simplest).
* **Agent-side tool** - any participant calls a tool that closes the channel. See [Closing Channels -> Agent-side tool](termination.md#pattern-2--agent-side-tool).
* **Custom adapter** - subclass `DiscussionAdapter` to fold `turn_count` and emit `CLOSING` at a cap (or switch to `workflow` with `TransitionGraph.round_robin(max_turns=N)`).
* **TTL / expectations** - safety nets only.

See [Closing Channels](termination.md) for the full picture.

---

# Workflow Adapter

Source: https://docs.ag2.ai/latest/docs/beta/network/workflow/

`workflow` is the orchestrated multi-party adapter. A declarative `TransitionGraph` describes who speaks first, what conditions fire, and when the channel terminates. It's the modern replacement for the classic `GroupChat + handoffs` pattern - see [Migrating from Group Chat](migration_from_group_chat.md) for the side-by-side translation.

## Shape

| | |
|---|---|
| Participants | 2+ |
| Turn order | Whatever `TransitionGraph` says |
| Auto-close | Yes - when graph emits a `TerminateTarget` decision or `max_turns` is hit |
| Termination | Auto-close, explicit close, TTL, or expectation violation |
| Default view | `WindowedSummary(recent_n=N*2)` |
| Default expectations | `turn_within(120s, warn)`, `turn_within(600s, auto_close)` |
| Required knob | `{"graph": <TransitionGraph.to_dict()>}` |

## How a TransitionGraph Resolves the Next Speaker

```mermaid
flowchart LR
    Env[Accepted EV_TEXT or EV_PACKET] --> Fold[adapter.fold:<br/>bookkeeping advanced]
    Fold --> Iter{For each transition<br/>in priority order}
    Iter -->|when.evaluate true| Then[then.resolve]
    Iter -->|when.evaluate false| Iter
    Iter -->|none match| Default[default_target.resolve]
    Then --> Decision[TransitionDecision:<br/>next_speaker / close_reason]
    Default --> Decision
    Decision -->|next_speaker is a participant| Continue[expected_next_speaker <- agent_id]
    Decision -->|next_speaker is None| Terminate[on_accepted returns CLOSING<br/>EV_CHANNEL_CLOSED with close_reason]
```

Each accepted substantive envelope walks the transitions list, finds the first matching condition, and resolves a target. `TerminateTarget` (with `next_speaker=None`) ends the channel.
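The walk can be sketched as a small resolver. Everything here is a simplification for illustration: `SketchTransition` and `resolve` are hypothetical names, conditions see only the sender id, targets are plain agent ids (with `None` standing for termination), and the ordering assumes higher `priority` values run first with ties in insertion order, as the `Transition` definition on this page states.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class SketchTransition:
    when: Callable[[str], bool]  # condition on the sender id (simplified)
    then: Optional[str]          # next speaker, or None to terminate
    priority: int = 0


def resolve(
    transitions: list[SketchTransition],
    default: Optional[str],
    sender: str,
) -> Optional[str]:
    """Return the next speaker for an accepted envelope, or None to close."""
    # sorted() is stable, so equal priorities keep their insertion order.
    ordered = sorted(transitions, key=lambda t: -t.priority)
    for transition in ordered:
        if transition.when(sender):
            return transition.then
    return default


transitions = [
    SketchTransition(when=lambda s: s == "alice", then="bob"),
    SketchTransition(when=lambda s: s == "bob", then="carol"),
]
print(resolve(transitions, None, "alice"))  # bob
print(resolve(transitions, None, "carol"))  # None
```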

## Lifecycle: Sequence (Pipeline) Example

```mermaid
sequenceDiagram
    participant A as alice (creator)
    participant H as Hub + WorkflowAdapter
    participant B as bob
    participant C as carol

    A->>H: open(type="workflow", target=[bob, carol],<br/>knobs.graph = sequence([alice, bob, carol]))
    H->>B: EV_CHANNEL_INVITE
    H->>C: EV_CHANNEL_INVITE
    B->>H: EV_CHANNEL_INVITE_ACK
    C->>H: EV_CHANNEL_INVITE_ACK
    H->>A: EV_CHANNEL_OPENED
    Note over H: graph.initial_speaker = alice

    A->>H: EV_TEXT (turn 1)
    Note over H: FromSpeaker(alice) -> AgentTarget(bob)
    H->>B: deliver
    B->>H: EV_TEXT (turn 2)
    Note over H: FromSpeaker(bob) -> AgentTarget(carol)
    H->>C: deliver
    C->>H: EV_TEXT (turn 3)
    Note over H: no transition matches -> default_target<br/>= TerminateTarget("sequence_complete")
    H-->>A: EV_CHANNEL_CLOSED (sequence_complete)
    H-->>B: EV_CHANNEL_CLOSED
    H-->>C: EV_CHANNEL_CLOSED
```

## Building the Graph

`TransitionGraph` is the orchestrator script:

```python
@dataclass(slots=True)
class TransitionGraph:
    initial_speaker: str                   # agent_id of the first speaker
    transitions: list[Transition]          # ordered list, evaluated by priority
    default_target: TransitionTarget       # what happens if no transition matches
    max_turns: int | None = None           # hard turn cap
```

Each `Transition` pairs a condition with a target:

```python
@dataclass(slots=True)
class Transition:
    when: TransitionCondition   # evaluated against the just-accepted envelope
    then: TransitionTarget      # if when() returns True, this resolves the next speaker
    priority: int = 0           # higher priority runs first; ties break by insertion order
```

## Built-in Targets

| Target | Decision |
|---|---|
| `AgentTarget(agent_id)` | Hand off to a specific named agent. |
| `RoundRobinTarget()` | Advance through the participant order. |
| `StayTarget()` | Same speaker continues (rare; for "let me elaborate" patterns). |
| `RevertToInitiatorTarget()` | Hand back to whoever opened the channel. |
| `TerminateTarget(reason="...")` | End the channel; reason flows on `EV_CHANNEL_CLOSED`. |

## Built-in Conditions

| Condition | Fires when |
|---|---|
| `Always()` | Every accepted envelope. |
| `FromSpeaker(agent_id)` | The just-accepted envelope was sent by this agent. |
| `ToolCalled(tool_name)` | The previous turn called this tool by name (matched via the packet's `routing.tool` field). |
| `ContextEquals(key, value)` | Channel-scoped `context_vars[key]` equals `value`. |

`ContextEquals` is the read side of the [Context Variables](context_variables.md) primitive - most non-trivial routing in classic AG2 went through `OnContextCondition` and friends, and `ContextEquals` is its modern equivalent. The "[Context-Driven Transitions](#context-driven-transitions)" section below covers routing patterns, multi-branch dispatch, and the order-of-rules traps in detail.

Both `TransitionTarget` and `TransitionCondition` are `Protocol`s with a `name: ClassVar[str]` registration key. Custom targets or conditions register via `register_target(MyTarget)` / `register_condition(MyCondition)` so they can round-trip through `TransitionGraph.to_dict()`.

## Convenience Factories

Two are shipped:

```python
# Cycle through participants for max_turns total turns.
graph = TransitionGraph.round_robin(
    participants=[alice.agent_id, bob.agent_id, carol.agent_id],
    max_turns=6,
)

# Pipeline: alice -> bob -> carol -> terminate.
graph = TransitionGraph.sequence([
    alice.agent_id, bob.agent_id, carol.agent_id,
])
```

`round_robin(participants)` uses `Always() -> RoundRobinTarget()`. `sequence(steps)` uses `FromSpeaker(steps[i]) -> AgentTarget(steps[i+1])` for each pair, with `TerminateTarget("sequence_complete")` as the default.
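To make the `sequence` expansion concrete, the sketch below builds the pairwise rules as plain tuples. `sequence_rules` and `next_speaker` are hypothetical helpers, not part of `TransitionGraph`; each `(from_speaker, target)` tuple stands for one `FromSpeaker(...) -> AgentTarget(...)` transition.

```python
from typing import Optional


def sequence_rules(steps: list[str]) -> list[tuple[str, str]]:
    """Pair each step with its successor: (from_speaker, next_agent)."""
    return list(zip(steps, steps[1:]))


def next_speaker(steps: list[str], sender: str) -> Optional[str]:
    """First matching pair wins; None stands for the terminating default."""
    for from_speaker, target in sequence_rules(steps):
        if sender == from_speaker:
            return target
    return None


steps = ["alice", "bob", "carol"]
print(sequence_rules(steps))         # [('alice', 'bob'), ('bob', 'carol')]
print(next_speaker(steps, "carol"))  # None
```

The last step has no successor, so no rule matches after it speaks and the default target ends the channel.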

## Custom Graphs

```python
graph = TransitionGraph(
    initial_speaker=triage.agent_id,
    transitions=[
        Transition(
            when=ToolCalled("escalate_to_security"),
            then=AgentTarget(security.agent_id),
        ),
        Transition(
            when=FromSpeaker(security.agent_id),
            then=RevertToInitiatorTarget(),
        ),
    ],
    default_target=TerminateTarget(reason="triage_complete"),
    max_turns=20,
)
```

`ToolCalled` reads from the packet's `routing.tool` field. The flow is:

1. The agent's `Agent.ask` round runs and one or more routing tools fire.
2. When the round ends, the framework walks the agent's local-stream `ToolCallEvent`s in emission order and records the first one matching a `ToolCalled(name)` rule into the packet's `routing` field.
3. The workflow adapter folds the resulting `EV_PACKET` envelope, and `ToolCalled("escalate_to_security")` matches.

For dynamic routing (target depends on runtime state), a tool can return a typed `Handoff(target="<name>", reason="...")` value instead - the framework reads it from the tool's result, resolves the participant name, and stamps `routing.target` on the packet. The matching `ToolCalled` rule is shadowed when a `Handoff` is returned from the same tool: the dynamic target wins.

### Writing a handoff tool

For each `ToolCalled(name) -> AgentTarget(agent)` transition, attach an `@tool`-decorated function with that exact name and have it return a typed `Handoff(target=agent.agent_id)`:

```python
@triage.tool
async def transfer_to_eng(reason: str = "") -> Handoff:
    """Transfer the conversation to the engineering specialist."""
    return Handoff(target=eng.agent_id, reason=reason)
```

The workflow adapter consumes the `Handoff` from the agent's local-stream `ToolResultEvent` at round-end and routes the next speaker - no separate `ToolCalled` evaluation is needed because the typed return supersedes the graph match. The matching `ToolCalled` rule in the graph remains useful as documentation and as a fallback if you ever want to re-route the same tool name to a different target.
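The precedence between a typed return and the static rule can be modelled in a few lines. This is a sketch under stated assumptions: the `Handoff` class below is a local stand-in with the two fields used above, and `route` plus the `tool_called_rules` mapping are hypothetical simplifications of the graph lookup.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Handoff:
    # Local stand-in mirroring the fields used in the text above.
    target: str
    reason: str = ""


def route(
    tool_name: str,
    tool_result: object,
    tool_called_rules: dict[str, str],
) -> Optional[str]:
    """Pick the next speaker: a returned Handoff wins over the static rule."""
    if isinstance(tool_result, Handoff):
        return tool_result.target
    return tool_called_rules.get(tool_name)


rules = {"transfer_to_eng": "eng"}
print(route("transfer_to_eng", Handoff(target="eng-oncall"), rules))  # eng-oncall
print(route("transfer_to_eng", "ok", rules))                          # eng
```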

### Ending a workflow with `Finish`

A tool can also end the channel cleanly by returning a typed `Finish(summary="...", reason="...")`. The framework reads it from the agent's `ToolResultEvent` and closes the channel - same effect as a `TerminateTarget` rule firing, but the decision is made by the tool at runtime rather than by a static graph transition:

```python
@coord.tool
async def finish(summary: str) -> Finish:
    """Wrap up - no further handoffs needed."""
    return Finish(summary=summary)
```

`reason` (default `"finished"`) lands on `ChannelMetadata.close_reason`; `summary` rides on the packet's `routing.summary` field for callers and observability. With `Finish`, you no longer need a `Transition(when=ToolCalled("finish"), then=TerminateTarget())` glue rule - the typed return is enough.

!!! tip "Handoff vs Finish - which to use?"
    Both are typed returns the framework reads from `ToolResultEvent`. They're mutually exclusive intents:

    - `Handoff(target="alice", reason="...")` - redirect: the channel continues; `alice` speaks next.
    - `Finish(summary="...", reason="...")` - terminate: the channel closes; no further turns.

    First emission wins. If a tool emits both in the same round (unusual), the first event in stream order takes precedence.

## Context-Driven Transitions

Most non-trivial group-chat orchestrations in classic AG2 routed on context variables - `OnContextCondition`, `StringContextCondition`, `ExpressionContextCondition`. The beta equivalent is a tool that emits `EV_CONTEXT_SET` and a transition whose `when` is `ContextEquals(key, value)`. The mutation primitive lives on the [Context Variables](context_variables.md) page; this section is about *using* context to decide who speaks next.

### Reading context in a transition

`ContextEquals` compares `state.context_vars.get(key)` to `value`. Missing keys compare as `None`, so an unset key never matches a non-None value:

```python
Transition(
    when=ContextEquals(key="route", value="security"),
    then=AgentTarget(security.agent_id),
)
```
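The comparison rule is easy to check in isolation. The class below is a hypothetical stand-in that takes the context dictionary directly rather than the adapter state, so it only illustrates the `get`-then-compare behaviour described above.

```python
from dataclasses import dataclass
from typing import Any


@dataclass
class ContextEqualsSketch:
    key: str
    value: Any

    def evaluate(self, context_vars: dict[str, Any]) -> bool:
        # dict.get returns None for a missing key.
        return context_vars.get(self.key) == self.value


route_is_security = ContextEqualsSketch(key="route", value="security")
print(route_is_security.evaluate({}))                     # False
print(route_is_security.evaluate({"route": "security"}))  # True
print(ContextEqualsSketch("route", None).evaluate({}))    # True
```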

The state is read on every fold of a substantive envelope (text or handoff). So as soon as a tool's `EV_CONTEXT_SET` lands on the WAL, the *next* fold sees the new value - typically the speaker's reply text, fired from the same `Agent.ask` call.

### Routing on a flag

The 1-bit case: a tool sets a boolean, the transition routes on it. This is the modern `is_termination_msg` analogue, but generalised to "is some condition met."

```python
graph = TransitionGraph(
    initial_speaker=intake.agent_id,
    transitions=[
        Transition(when=FromSpeaker(intake.agent_id), then=AgentTarget(triage.agent_id)),
        # Terminate before the sticky context rule so oncall's own turn cannot re-match it.
        Transition(when=FromSpeaker(oncall.agent_id), then=TerminateTarget("paged")),
        Transition(when=ContextEquals("urgent", value=True), then=AgentTarget(oncall.agent_id)),
        Transition(when=FromSpeaker(triage.agent_id), then=AgentTarget(reviewer.agent_id)),
        Transition(when=FromSpeaker(reviewer.agent_id), then=TerminateTarget("reviewed")),
    ],
    default_target=TerminateTarget("fall_through"),
    max_turns=10,
)
```

Triage's tool flips `urgent=True` when the ticket warrants paging. The condition fires on triage's reply fold and reroutes the next turn to `oncall` - whose `FromSpeaker` rule then terminates the channel. Without the flag, triage hands off to the reviewer instead.

### Multi-branch dispatch

Three or more buckets, one `ContextEquals` per branch. Order matters: higher-priority transitions are checked first, and ties resolve in *insertion order*. Put the more specific rules first so they win:

```python
graph = TransitionGraph(
    initial_speaker=triage.agent_id,
    transitions=[
        Transition(when=ContextEquals("domain", value="security"), then=AgentTarget(sec.agent_id)),
        Transition(when=ContextEquals("domain", value="legal"),    then=AgentTarget(legal.agent_id)),
        Transition(when=ContextEquals("domain", value="billing"),  then=AgentTarget(billing.agent_id)),
        # Default: fall through to the generic catch-all.
        Transition(when=FromSpeaker(triage.agent_id), then=AgentTarget(generic.agent_id)),
    ],
    default_target=TerminateTarget("done"),
)
```

If triage's tool calls `set_context(domain="security")` then the security row matches and the generic `FromSpeaker(triage)` row is never consulted. Note that `ContextEquals(key, value=None)` fires on missing keys - useful for "unset" branches.

### Combining with `FromSpeaker`

Most useful patterns combine the two: "if alice spoke AND the flag is set, do X." Beta's first-cut conditions don't ship a built-in `AllOf` composer, so you encode the conjunction as transition order - list the most specific rules first, with subsequent rules as fallbacks:

```python
transitions=[
    # Specific: alice flagged escalation -> security
    Transition(when=ContextEquals("escalate", value=True), then=AgentTarget(security.agent_id)),
    # Less specific: alice's normal reply -> reviewer
    Transition(when=FromSpeaker(alice.agent_id), then=AgentTarget(reviewer.agent_id)),
]
```

When alice speaks and `escalate==True`, the first row wins. When alice speaks and the flag is unset, the first row falls through and the second row matches. Same evaluation order as classic `OnCondition` lists.

If you need a true AND of "from this speaker AND in this state," register a custom composer (the `AllOf` recipe in [Context Variables](context_variables.md#custom-conditions) is the typical shape).
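One possible shape for such a composer is sketched below. All classes here are illustrative stand-ins, not the recipe from the linked page: `SketchState` and `SketchEnvelope` model only the fields the conditions read, and the registration step via `register_condition` is left out because how nested conditions serialise is not covered here.

```python
from dataclasses import dataclass, field
from typing import Any, ClassVar


@dataclass
class SketchState:
    # Stand-in for the adapter state; only context_vars is modelled.
    context_vars: dict[str, Any] = field(default_factory=dict)


@dataclass
class SketchEnvelope:
    # Stand-in for an accepted envelope; only the sender is modelled.
    sender_id: str


@dataclass
class FromSpeakerSketch:
    agent_id: str

    def evaluate(self, state: SketchState, envelope: SketchEnvelope) -> bool:
        return envelope.sender_id == self.agent_id


@dataclass
class ContextEqualsSketch:
    key: str
    value: Any

    def evaluate(self, state: SketchState, envelope: SketchEnvelope) -> bool:
        return state.context_vars.get(self.key) == self.value


@dataclass
class AllOf:
    """Fires only when every wrapped condition fires."""

    conditions: tuple
    name: ClassVar[str] = "all_of"

    def evaluate(self, state: SketchState, envelope: SketchEnvelope) -> bool:
        return all(c.evaluate(state, envelope) for c in self.conditions)


alice_escalated = AllOf((FromSpeakerSketch("alice"), ContextEqualsSketch("escalate", True)))
state = SketchState(context_vars={"escalate": True})
print(alice_escalated.evaluate(state, SketchEnvelope("alice")))  # True
print(alice_escalated.evaluate(state, SketchEnvelope("bob")))    # False
```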

### Avoiding the stuck-routing trap

`ContextEquals` is **sticky**. Once `route="security"` is in `context_vars`, every subsequent fold re-evaluates it. If the security agent speaks next and you have `ContextEquals("route", "security") -> AgentTarget(security)` near the top of the list, you'll bounce right back to security forever (or until `max_turns`).

Two fixes:

1. **List terminate rules before context-conditions.** `FromSpeaker(security) -> TerminateTarget(...)` placed earlier in the list short-circuits the loop after security speaks.

  ```python linenums="1" hl_lines="3 4"
  transitions=[
      # Terminate FIRST so post-handoff speaker exits before re-matching.
      Transition(when=FromSpeaker(security.agent_id), then=TerminateTarget("security_done")),
      Transition(when=ContextEquals("route", value="security"), then=AgentTarget(security.agent_id)),
      Transition(when=FromSpeaker(triage.agent_id), then=AgentTarget(legal.agent_id)),
  ]
  ```

2. **Have the second agent clear the key.** Security's tool emits `EV_CONTEXT_SET` with `{"delete": ["route"]}` when it's done. Subsequent folds see `route` unset and the routing transition stops firing.

The first fix is the more common pattern - terminate transitions are cheap, the speaker-rule check is just an `==` against the envelope's sender_id.
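The trap and the first fix can be reproduced with a toy first-match router. This is a simulation under simplifying assumptions, not the workflow adapter: `run` and `SketchRule` are hypothetical names, rules are `(condition, target)` tuples, and a `None` target (or no match at all) ends the run.

```python
from typing import Callable, Optional

SketchRule = tuple[Callable[[str, dict], bool], Optional[str]]


def run(rules: list[SketchRule], first_speaker: str, context: dict, max_turns: int) -> list[str]:
    """Return the speakers in order until a rule terminates or max_turns is hit."""
    speakers = [first_speaker]
    while len(speakers) < max_turns:
        target = None
        for when, then in rules:
            if when(speakers[-1], context):
                target = then
                break
        if target is None:
            break
        speakers.append(target)
    return speakers


route_to_security = (lambda sender, ctx: ctx.get("route") == "security", "security")
security_done = (lambda sender, ctx: sender == "security", None)
context = {"route": "security"}

# Context rule first: security keeps re-matching until the turn cap.
print(run([route_to_security, security_done], "triage", context, max_turns=5))
# ['triage', 'security', 'security', 'security', 'security']

# Terminate rule first: security speaks once and the channel ends.
print(run([security_done, route_to_security], "triage", context, max_turns=5))
# ['triage', 'security']
```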

!!! tip "Order check"
    When a graph-driven channel loops unexpectedly, the first thing to check is the transition list order. The shipped `sequence` and `round_robin` factories handle this for you; custom graphs need explicit attention.

### Beyond `ContextEquals`

`ContextEquals` is the only context-driven condition shipped today. For richer predicates, register your own - the Protocol is `evaluate(state, envelope) -> bool`:

```python
from typing import ClassVar
from dataclasses import dataclass
from autogen.beta.network import register_condition

@dataclass(slots=True)
class ContextThreshold:
    """Fires when ``state.context_vars[key] >= threshold``."""

    key: str
    threshold: float
    name: ClassVar[str] = "context_threshold"

    def evaluate(self, state, envelope) -> bool:
        value = state.context_vars.get(self.key, 0)
        return isinstance(value, (int, float)) and value >= self.threshold

register_condition(ContextThreshold)
```

`register_condition` plugs the class into `TransitionGraph.to_dict()` round-tripping, so the graph still serialises cleanly through `Hub.hydrate()`. The `AllOf` / `AnyOf` / `ContextIn` / `ContextThreshold` recipes follow the same pattern. See [Context Variables -> Custom Conditions](context_variables.md#custom-conditions) for the full set of recipes.

## Opening a Workflow Channel

```python
graph = TransitionGraph.round_robin(
    participants=[alice.agent_id, bob.agent_id, carol.agent_id],
    max_turns=6,
)

channel = await alice.open(
    type="workflow",
    target=[bob.agent_id, carol.agent_id],
    knobs={"graph": graph.to_dict()},
)
```

`initial_speaker` must match a participant. The creator (alice) is automatically a participant; targets fill the rest. The graph's dict form is what gets stored on `ChannelMetadata.knobs["graph"]`, so it round-trips through `Hub.hydrate()` deterministically.

After opening, the creator's first send is treated as turn 1 by the adapter. The default handler then drives subsequent turns by probing `can_send`.

## Termination

A workflow channel closes when:

1. A transition fires whose target is `TerminateTarget(reason="...")`, or
2. `max_turns` is reached and no other transition fires (the `default_target` is consulted), or
3. An agent explicitly calls `channel.close(reason="...")`, or
4. An expectation violation triggers `AutoCloseHandler`, or
5. The channel's TTL expires.

`EV_CHANNEL_CLOSED` carries the close reason on its `event_data`.

For the cross-adapter view (when to use the workflow graph vs. an app-side cap, an agent tool, or a sentinel adapter), see [Closing Channels](termination.md).

## State Object

```python
@dataclass(slots=True)
class WorkflowState:
    participant_order: list[str]
    expected_next_speaker: str | None
    last_speaker_id: str | None = None
    last_envelope_id: str | None = None
    turn_count: int = 0
    pending_close_reason: str = ""
    creator_id: str = ""
    graph_data: dict = field(default_factory=dict)
```

`expected_next_speaker = None` signals "channel should terminate." The adapter's `on_accepted` reads this and returns `AdapterResult(next_state=CLOSING, ...)`.

`graph_data` is the serialised graph - the adapter rebuilds `TransitionGraph` on every fold so it doesn't keep mutable graph state in memory between turns.

## Custom Targets / Conditions

Implement the Protocol, give the class a unique `name`, and register it on the default registry:

```python
from dataclasses import dataclass, field
from typing import ClassVar

from autogen.beta.network import (
    Envelope,
    TransitionTarget,
    TransitionDecision,
    register_target,
)

@dataclass(slots=True)
class HighestRankedReviewer(TransitionTarget):
    name: ClassVar[str] = "highest_ranked_reviewer"
    role_priority: list[str] = field(default_factory=list)

    def resolve(self, state, envelope: Envelope) -> TransitionDecision:
        # ...look up the next reviewer based on your domain logic...
        return TransitionDecision(next_speaker=chosen_id)

register_target(HighestRankedReviewer)
```

Then use it in a graph just like a built-in:

```python
Transition(when=Always(), then=HighestRankedReviewer(role_priority=["security", "legal"]))
```

Custom targets and conditions persist via `TransitionGraph.to_dict()` - the `name` field is the key, and the dataclass fields become the args dict that `loads(...)` passes back to the constructor.

## Packet execution model

Each `Agent.ask` round on a workflow channel commits to the WAL atomically as a single `EV_PACKET` envelope. The packet carries the agent's routing decision (`routing.tool` matched against `ToolCalled` rules, or a pre-resolved `routing.target` from a typed `Handoff` return), the round's body text, and a reserved `context_updates` slot. State mutations from tool calls (via `set_context(channel, ...)`) land as separate `EV_CONTEXT_SET` envelopes during tool execution - they're folded before the packet, so a `ContextEquals` rule on the same fold sees the just-set value.

### External side-effects and packet retry

The packet model commits a round's effects atomically: if the agent crashes mid-packet, the channel reverts to its pre-packet state and the original input is re-dispatched. **Tool calls within that packet will execute again on retry.**

If your tool calls an external system (HTTP API, database, payment gateway, queue), **it must be idempotent under retry** - calling it twice with the same arguments must produce the same outcome as calling it once.

Recommended patterns:

- Use the external service's idempotency-key feature where available (Stripe, S3, well-designed REST APIs). Derive a stable key from `(channel_id, round_counter, tool_name)` so retries within a packet reuse the same key.
- For database writes, use upsert (`INSERT ... ON CONFLICT`) rather than blind insert.
- For tools that genuinely cannot be made idempotent (rare), gate them behind a HITL confirmation step or run them in a single-tool round so the packet-rollback boundary is tighter.
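
The idempotency-key recommendation can be sketched with the standard library alone. How a tool obtains `channel_id` and `round_counter` is not covered in this section, so the function below simply takes them as arguments; the function name and the choice of SHA-256 are assumptions made for illustration.

```python
import hashlib


def idempotency_key(channel_id: str, round_counter: int, tool_name: str) -> str:
    """Derive a stable key so a retried packet reuses the same key."""
    # Join with a separator that is unlikely to appear in IDs, so that
    # ("ab", 1, "c") and ("a", 1, "bc") cannot produce the same payload.
    payload = "\x1f".join([channel_id, str(round_counter), tool_name])
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


first_attempt = idempotency_key("chan-1", 3, "charge_card")
retry = idempotency_key("chan-1", 3, "charge_card")
next_round = idempotency_key("chan-1", 4, "charge_card")
```

A retry within the same round produces the same key, so the external service can deduplicate the call, while the next round gets a fresh key.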

### HITL packet-boundary semantics

Non-speaker substantive sends (e.g. a supervisor injecting a correction via `channel.send(EV_TEXT, ...)`) are accepted at packet boundaries, not arbitrary instants. While an agent's packet is in flight (including any slow tool execution), `validate_send` keeps that agent as the expected speaker until the packet commits. A supervisor's mid-packet inject waits for the active packet to commit (≈ slow-tool latency) before being accepted.

!!! tip "Loose-semantics writes are unaffected"
    `EV_CONTEXT_SET` envelopes (emitted by `set_context` / `delete_context` from any participant) are non-substantive and land immediately, regardless of who's speaking - observer writes during another agent's packet are visible to the next packet's `ContextEquals` evaluation.

## Working Examples

For the canonical multi-agent patterns translated from classic AG2 (Pipeline, Star, Feedback Loop, Triage-with-Tasks, etc.), see the [Pattern Cookbook](pattern_cookbook/pattern_cookbook.md).

---

# Context Variables

Source: https://docs.ag2.ai/latest/docs/beta/network/context_variables/

Channel-scoped mutable state that any participant can read or write,
auto-persisted on the WAL, and visible to transition conditions. The
modern equivalent of classic `ContextVariables` from
`autogen.agentchat.group`, scoped to one workflow channel.

## The Mechanism

Context variables live on `WorkflowState.context_vars: dict[str, Any]`,
a field folded under the per-channel WAL lock. There's no parallel
persistence layer; the dict is a *derivation* of the WAL, rebuilt
deterministically by `Hub.hydrate()` replaying the recorded envelopes.

Two pieces:

1. **Mutation** - anyone in the channel emits an `EV_CONTEXT_SET`
   envelope with `event_data = {"set": {...}, "delete": [...]}`. The
   workflow adapter folds it before any substantive turn check, so the
   new values are visible to the *next* fold (typically the speaker's
   text reply).
2. **Read** - transition conditions get `state` as their first arg, so
   `ContextEquals(key, value)` reads `state.context_vars.get(key)`
   directly. Tools can also inject `ChannelStateInject` to read.

```mermaid
flowchart LR
    Tool["agent's tool calls<br/>channel.send(EV_CONTEXT_SET, ...)"]
    Hub["Hub.post_envelope<br/>under per-channel WAL lock"]
    WAL[(WAL append)]
    Fold["WorkflowAdapter.fold<br/>merges into context_vars"]
    State[("WorkflowState.context_vars")]
    Cond["next-speaker rule<br/>ContextEquals(key, value)"]

    Tool --> Hub --> WAL --> Fold --> State --> Cond
```

## Loose semantics

Any participant of the channel can write `EV_CONTEXT_SET`, regardless
of whose turn it is. The per-channel WAL lock serialises concurrent
writes - there's no race even with multiple tools racing on the same
key. An out-of-turn observer can stamp a flag for the *next* turn's
routing decision; no need to wait for the floor.

Sender must be a participant of the channel - non-participants are
rejected by `validate_send`.

## Setting Values

Send the envelope from any participant via the existing `Channel.send`:

```python
from autogen.beta.network import EV_CONTEXT_SET, ChannelInject

async def set_route(route: str, channel: ChannelInject) -> str:
    """Record the routing decision for this channel."""
    if channel is None:
        return "no active channel"
    await channel.send(
        "",
        event_type=EV_CONTEXT_SET,
        event_data={"set": {"route": route}},
        audience=[],   # context update; no participant needs notifying
    )
    return f"route set to {route!r}"

agent.tool(set_route)
```

`audience=[]` keeps the dispatch list empty - context updates are
state-only; we don't fire anyone's `receive`. The envelope still lands
on the WAL.

The full event_data shape:

```python
{
    "set": {"key1": value1, "key2": value2},   # merge into context_vars
    "delete": ["key3", "key4"],                # remove these keys
}
```

Either field is optional. Within one envelope, `delete` runs first,
then `set` - so you can atomically delete-then-overwrite if needed.
Multiple envelopes serialise via the WAL lock, so deterministic order
is the WAL's order.
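
The delete-then-set ordering can be modelled in a few lines. This is a sketch of the semantics described above, not the adapter's fold code; `apply_context_update` is a hypothetical helper name.

```python
from typing import Any


def apply_context_update(
    context_vars: dict[str, Any], event_data: dict[str, Any]
) -> dict[str, Any]:
    """Return a new dict with `delete` applied first, then `set` merged in."""
    updated = dict(context_vars)
    for key in event_data.get("delete", []):
        updated.pop(key, None)  # deleting a missing key is a no-op
    updated.update(event_data.get("set", {}))
    return updated


before = {"route": "security", "counter": 1}
after = apply_context_update(
    before,
    {"delete": ["route", "stale"], "set": {"route": "legal"}},
)
```

Because `delete` runs first, a key named in both fields ends up with the value from `set`.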

## Reading Values

Transition conditions get the State directly. `ContextEquals` is shipped:

```python
from autogen.beta.network import ContextEquals, FromSpeaker, AgentTarget, Transition

Transition(
    when=ContextEquals(key="route", value="security"),
    then=AgentTarget(security.agent_id),
)
```

Missing keys compare as `None`, so
`ContextEquals(key="foo", value=None)` fires when `foo` was never set
or was explicitly deleted.

For tools that need to read context (e.g. to make their writes
idempotent), inject the State:

```python
from autogen.beta.network import EV_CONTEXT_SET, ChannelInject, ChannelStateInject

async def increment_counter(channel: ChannelInject, state: ChannelStateInject) -> str:
    """Demonstrates reading current state, then writing a new value."""
    if state is None or channel is None:
        return "no active channel"
    current = state.context_vars.get("counter", 0)
    await channel.send(
        "",
        event_type=EV_CONTEXT_SET,
        event_data={"set": {"counter": current + 1}},
        audience=[],
    )
    return f"counter now {current + 1}"
```

## Custom Conditions

If `ContextEquals` isn't enough, write a custom `TransitionCondition`
and register it. The Protocol has just two members, a `name` class
attribute and an `evaluate(state, envelope)` method:

```python
from typing import ClassVar
from dataclasses import dataclass
from autogen.beta.network import register_condition

@dataclass(slots=True)
class ContextThreshold:
    """Fires when ``state.context_vars[key] >= threshold``."""

    key: str
    threshold: float
    name: ClassVar[str] = "context_threshold"

    def evaluate(self, state, envelope) -> bool:
        value = state.context_vars.get(self.key, 0)
        return isinstance(value, (int, float)) and value >= self.threshold

register_condition(ContextThreshold)
```

Once registered, the condition serialises through `TransitionGraph.to_dict()`
and re-loads correctly - same path as the built-in `FromSpeaker` /
`ToolCalled` / `ContextEquals`.

## Initial Values

Pre-populate context at channel creation by passing a `context_vars`
knob alongside the graph:

```python
channel = await alice.open(
    type="workflow",
    target=[bob.agent_id, carol.agent_id],
    knobs={
        "graph": graph.to_dict(),
        "context_vars": {"escalation_level": 0, "ticket_id": ticket_id},
    },
)
```

The knob is read once by `WorkflowAdapter.initial_state` and copied
into the State. Subsequent `EV_CONTEXT_SET` envelopes mutate from there.

## Persistence and Hydrate

Adapter state is *not* stored separately on disk. The hub's
`KnowledgeStore` persists the WAL; on `Hub.hydrate()`, every
channel's adapter state is reconstructed by replaying the WAL through
`initial_state` then `fold` once per envelope. So the
`context_vars` dict that exists in memory after a write is always the
deterministic result of the recorded mutations - survives restart,
survives a fresh process, identical across replicas.

This applies the project's "WAL is the source of truth, indexes and
derivations are fine" rule: `context_vars` is a derivation of
`EV_CONTEXT_SET` envelopes on the WAL.
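
The replay idea can be illustrated without the hub. In the sketch below envelopes are plain dicts, `CONTEXT_SET` is a placeholder string rather than the real `EV_CONTEXT_SET` value, and `fold` / `replay` are simplified stand-ins for the adapter's `fold` and for `Hub.hydrate()`.

```python
from typing import Any

CONTEXT_SET = "context_set"  # placeholder; the real EV_CONTEXT_SET value may differ


def fold(context_vars: dict[str, Any], envelope: dict[str, Any]) -> dict[str, Any]:
    """Apply one envelope; non-context envelopes leave the dict unchanged."""
    if envelope["event_type"] != CONTEXT_SET:
        return context_vars
    data = envelope["event_data"]
    deleted = data.get("delete", [])
    updated = {k: v for k, v in context_vars.items() if k not in deleted}
    updated.update(data.get("set", {}))
    return updated


def replay(initial: dict[str, Any], wal: list[dict[str, Any]]) -> dict[str, Any]:
    """Rebuild context_vars from the initial knob plus the recorded envelopes."""
    state = dict(initial)
    for envelope in wal:
        state = fold(state, envelope)
    return state


wal = [
    {"event_type": CONTEXT_SET, "event_data": {"set": {"route": "security"}}},
    {"event_type": "text", "event_data": {"text": "looking into it"}},
    {"event_type": CONTEXT_SET, "event_data": {"delete": ["route"], "set": {"escalation_level": 1}}},
]
rebuilt = replay({"escalation_level": 0}, wal)
rebuilt_again = replay({"escalation_level": 0}, wal)
```

Replaying the same log from the same initial values always yields the same dict, which is why no separate snapshot of `context_vars` is needed.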

## Turn Bookkeeping

`EV_CONTEXT_SET` is **non-substantive**: it does not advance
`turn_count`, does not rotate `expected_next_speaker`, and does not
appear in the LLM's projected history through the default
`WindowedSummary` view. From the perspective of "whose turn is it,"
the envelope might as well not exist. Only its effect on
`context_vars` survives.

This means a tool can write context mid-turn (during the active
speaker's `Agent.ask` call), the speaker can then emit a normal
`EV_TEXT` reply, and the *reply's* fold sees the new context. The
next-speaker rule fires against the post-write context. Exactly what
you want for "agent's tool decides where we go next."

## Worked Example

For a runnable end-to-end example of context-driven routing - a router agent classifies a request and a `ContextEquals` transition routes to the matching specialist - see the [Context-Aware Routing](pattern_cookbook/context_aware_routing.md) entry in the Pattern Cookbook.

## Comparison to Classic `ContextVariables`

| Capability | Classic (`autogen.agentchat.group`) | Beta workflow |
|---|---|---|
| Mutable channel-scoped dict | `ContextVariables(...)` passed to `initiate_chat` | `WorkflowState.context_vars` |
| Tool writes context | `ReplyResult(message, context_variables=...)` | Tool emits `EV_CONTEXT_SET` envelope |
| Condition reads context | `StringContextCondition`, `ExpressionContextCondition` | `ContextEquals` (and custom-registered conditions) |
| Auto-render into LLM prompt | Built-in | Not yet - write a middleware that reads `ChannelStateInject.context_vars` and prepends to the prompt |
| Persisted across restart | Held in memory only | WAL-replayed on `Hub.hydrate()` |
| Visible in audit trail | No | Every mutation is a real envelope on the WAL |

The two missing classic features - auto-render and the rich expression
DSL - are deliberate omissions for the first cut. Both can be added on
top of the existing primitives without framework changes.
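
For the auto-render gap, the rendering half can be a small pure function. The sketch below only shows how a context dict might be turned into a prompt prefix; the helper names and the output format are assumptions, and wiring it into a middleware is not shown because that API is outside this page.

```python
import json
from typing import Any, Mapping


def render_context(context_vars: Mapping[str, Any]) -> str:
    """Render context variables as a short, deterministic text block."""
    if not context_vars:
        return ""
    lines = [
        # default=str keeps non-JSON values (e.g. datetimes) from raising.
        f"- {key}: {json.dumps(context_vars[key], default=str)}"
        for key in sorted(context_vars)
    ]
    return "Channel context:\n" + "\n".join(lines)


def with_context(prompt: str, context_vars: Mapping[str, Any]) -> str:
    """Prepend the rendered context to a prompt, if there is any context."""
    header = render_context(context_vars)
    return f"{header}\n\n{prompt}" if header else prompt


rendered = with_context("Summarise the ticket.", {"ticket_id": "T-42", "escalation_level": 0})
```

Sorting the keys keeps the rendered prefix stable across turns, so identical context produces an identical prompt prefix.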

## See Also

- [Workflow Adapter](workflow.md) - graphs, transitions, targets, conditions, and a "Context-Driven Transitions" section dedicated to routing patterns.
- [Closing Channels](termination.md) - `ContextEquals` is also useful for context-driven termination, e.g. `Transition(when=ContextEquals("done", True), then=TerminateTarget(reason="user_done"))`.
- [Pattern Cookbook](pattern_cookbook/pattern_cookbook.md) - eight canonical orchestrations (Pipeline, Star, Feedback Loop, Triage-with-Tasks, etc.) translated from classic AG2.
- [Migrating from Group Chat](migration_from_group_chat.md) - side-by-side translation of classic patterns.

---

# Closing Channels

Source: https://docs.ag2.ai/latest/docs/beta/network/termination/

Every channel terminates with an `EV_CHANNEL_CLOSED` envelope on the WAL, carrying a free-form reason on `event_data["reason"]` and on `ChannelMetadata.close_reason`. Five routes lead there. Pick by who decides.

## The Five Routes

| Pattern | Who decides | Best for |
|---|---|---|
| Application `channel.close()` | Your orchestration code | Custom caps (turn count, time, predicate) |
| Agent-side tool | The LLM | "Agent decides we're done" |
| Adapter sentinel | The framework | Content-based stop ("TERMINATE" keyword) |
| Workflow `TerminateTarget` | A declarative graph | Multi-step orchestrations |
| TTL / expectations | The hub's sweepers | Time- or expectation-based safety nets |

```mermaid
flowchart LR
    A[application calls<br/>channel.close]
    B[agent calls<br/>end_conversation tool]
    C[adapter on_accepted<br/>returns CLOSING]
    D[workflow graph emits<br/>TerminateTarget]
    E[TTL or expectation<br/>sweeper fires]

    A --> H[Hub._transition_channel]
    B --> H
    C --> H
    D --> H
    E --> H

    H --> CLOSED[state <- CLOSED<br/>EV_CHANNEL_CLOSED posted]
```

The hub funnels every termination through one transition, so observers only have to listen for one event.

## Adapter Compatibility

| | Auto-close | Application close | Agent tool | Adapter sentinel | Workflow graph |
|---|---|---|---|---|---|
| `consulting` | Yes (after reply) | Yes (early bailout) | Yes | Possible (subclass) | n/a |
| `conversation` | Never | Yes (typical) | Yes | Yes (canonical pattern) | n/a |
| `discussion` | Never | Yes (typical) | Yes | Possible (subclass) | n/a |
| `workflow` | Yes (graph) | Yes (override) | Yes (via `ToolCalled`) | n/a | Yes (canonical pattern) |

`consulting` and `workflow` ship with auto-close behaviour. `conversation` and `discussion` never auto-close - without one of the patterns below, the chain runs until the TTL fires.

---

## Pattern 1 - Application `channel.close()`

Your code holds a `Channel` handle and calls `close()` whenever it decides the channel is done. Most explicit, lowest ceremony, runs entirely outside the LLM turn.

```python
channel = await alice.open(
    type="discussion",
    target=[bob.agent_id, carol.agent_id],
    knobs={"ordering": ORDERING_ROUND_ROBIN},
)
await channel.send("Topic: should every developer learn Rust?")

# Stream live, return after 6 text envelopes (= 2 full round-robin cycles).
await stream_text_until_count(hub, channel.channel_id, name_by_id, expected=6)
await channel.close(reason="cap_reached")
```

The reason string flows on `EV_CHANNEL_CLOSED.event_data["reason"]` - pick something descriptive so observers can tell the close apart from a TTL or expectation violation.

**When to use:** the termination condition lives in *your* code (a turn cap, a wall-clock deadline, a custom signal from elsewhere in your application).

!!! tip "Race window"
    A reply to the *last* in-flight envelope can land while you're calling `close()`. The default handler short-circuits on `state != ACTIVE`, so most of the time it's a no-op - but an LLM call already in flight will return after the close and its `channel.send` is rejected by the hub. The receive loop catches and logs that error (see [Agent Clients](agent_clients.md)) so the failure is diagnosable, not silent.
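
The example above calls a `stream_text_until_count` helper that this page does not define. One possible shape, modelled on the WAL-polling approach used elsewhere on this page, is sketched below. It assumes `hub.read_wal(channel_id)` returns envelopes with `envelope_id`, `event_type`, `sender_id` and `event_data` attributes; `TEXT_EVENT` is a placeholder that should be replaced with the real `EV_TEXT` constant.

```python
import asyncio

TEXT_EVENT = "text"  # placeholder; swap in the real EV_TEXT constant


async def stream_text_until_count(
    hub, channel_id, name_by_id, *, expected, timeout=180.0, poll_interval=0.05
):
    """Print text envelopes as they appear; return once `expected` were seen."""
    seen: set[str] = set()
    text_count = 0
    loop = asyncio.get_running_loop()
    deadline = loop.time() + timeout
    while loop.time() < deadline:
        for env in await hub.read_wal(channel_id):
            if env.envelope_id in seen:
                continue  # already handled on an earlier poll
            seen.add(env.envelope_id)
            if env.event_type == TEXT_EVENT:
                print(f"{name_by_id[env.sender_id]:>10}: {env.event_data['text']}")
                text_count += 1
                if text_count >= expected:
                    return text_count
        await asyncio.sleep(poll_interval)
    raise asyncio.TimeoutError(
        f"saw {text_count} of {expected} text envelopes within {timeout}s"
    )
```

Polling keeps the helper independent of any subscription API; a production version would more likely wait on channel events than re-read the WAL.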

---

## Pattern 2 - Agent-side tool

The LLM itself decides. Define a tool that injects the active `Channel` and calls `close(...)`.

```python
from autogen.beta.network.client.inject import ChannelInject

async def end_conversation(reason: str, channel: ChannelInject) -> str:
    """Close the active discussion. The reason flows on EV_CHANNEL_CLOSED."""
    if channel is None:
        return "no active channel"
    await channel.close(reason=f"agent_close:{reason}")
    return f"closed: {reason}"

alice_agent.tool(end_conversation)
bob_agent.tool(end_conversation)
carol_agent.tool(end_conversation)
```

The default notify handler stamps the active `Channel` into `context.dependencies` before each LLM turn (`stamp_dependencies` in `client/handlers.py`), so any tool running inside that turn can resolve it via `ChannelInject`. Outside a network turn the inject resolves to `None` - the guard above keeps the tool safe to call from non-network contexts.

**When to use:** any participant should be able to wrap up the channel based on its own judgement (the modern analogue of `ConversableAgent.is_termination_msg`, but driven by a tool call instead of a magic substring).

!!! note "Why not just check the message body?"
    Tool calls are visible on the WAL (`EV_HANDOFF` for workflow, or as part of the `ModelResponse` history) and pass typed arguments - pattern 3 (sentinel) is fine for classic parity but tool-call termination is more traceable, robust to multilingual prompts, and resists prompt-injection ("ignore previous instructions and write TERMINATE").

---

## Pattern 3 - Adapter sentinel

Subclass the adapter, watch every accepted envelope for a sentinel, return `CLOSING`. Closest analogue to classic `is_termination_msg`.

```python
class TerminatingConversationAdapter(ConversationAdapter):
    """Auto-closes when an EV_TEXT body contains the configured keyword."""

    def __init__(self, keyword: str = "TERMINATE") -> None:
        super().__init__()
        self.keyword = keyword

    def on_accepted(self, metadata, envelope, state) -> AdapterResult:
        if (
            envelope.event_type == EV_TEXT
            and self.keyword in envelope.event_data.get("text", "")
        ):
            return AdapterResult(
                next_state=ChannelState.CLOSING,
                auto_close_reason=f"terminate_keyword:{self.keyword}",
            )
        return super().on_accepted(metadata, envelope, state)

hub.register_adapter(TerminatingConversationAdapter(keyword="TERMINATE"))
```

Three properties this gives you for free:

* **Symmetric.** Anyone in the channel saying the keyword ends it.
* **Survives `Hub.hydrate()`.** The close decision is re-derived from the WAL on replay - no out-of-band state to persist.
* **Sentinel envelope is delivered first.** The TERMINATE message lands on the WAL before the adapter calls for close, so the goodbye is visible in the transcript.

**When to use:** classic migrations from `ConversableAgent.is_termination_msg`, or applications where termination is fundamentally a message-content concern (debate moderators saying "RECESS", a CLI command pattern).

---

## Pattern 4 - Workflow `TerminateTarget`

In `workflow` channels, terminate is just another transition. Wire a condition that emits `TerminateTarget(reason="...")`. The graph's `max_turns` and `default_target` provide the two implicit terminate paths.

```python
graph = TransitionGraph(
    initial_speaker=triage.agent_id,
    transitions=[
        Transition(when=ToolCalled("escalate"), then=AgentTarget(security.agent_id)),
        Transition(when=ToolCalled("done"),     then=TerminateTarget(reason="agent_done")),
        Transition(
            when=FromSpeaker(security.agent_id),
            then=RevertToInitiatorTarget(),
        ),
    ],
    default_target=TerminateTarget(reason="fall_through"),
    max_turns=20,
)
```

Three paths to close in one graph:

1. The `ToolCalled("done")` transition fires -> `TerminateTarget(reason="agent_done")`.
2. No transition matches and no further turn fits the rules -> `default_target` resolves to `TerminateTarget(reason="fall_through")`.
3. `turn_count` reaches `max_turns=20` -> adapter forces close.

The convenience factories ship the same shape: `TransitionGraph.round_robin(participants, max_turns=N)` uses `TerminateTarget` as its default; `TransitionGraph.sequence([a, b, c])` uses `TerminateTarget(reason="sequence_complete")` after the last step.

**When to use:** orchestrations with branching, conditional handoffs, or multi-step pipelines - termination is one branch in a graph, not an external decision.

---

## Pattern 5 - TTL & expectations

Two safety nets the hub runs in the background. Both terminate with adapter-specific reason strings.

**TTL.** Every channel has a `channel_ttl_default` from the creator's `Rule.limits`, or an explicit `ttl=...` override on `open(...)`. The TTL sweeper closes the channel when wall-clock time exceeds the deadline, with reason `"ttl_expired"`.

**Expectations.** Each adapter ships expectations the sweeper evaluates on every tick - e.g. `consulting` declares `acks_within(30s, auto_close)` and `reply_within(600s, auto_close)`. A violation handler attached to `auto_close` closes the channel with a reason like `"expectation_violated:acks_within"`. See [Expectations & Audit](expectations_and_audit.md).

**When to use:** never as the *primary* termination mechanism - these are safety nets. Set them so a stuck or runaway channel can't hang forever, and pick one of patterns 1-4 to handle the happy path.

---

## Choosing

```mermaid
flowchart TD
    Q1{Termination condition is...}
    Q1 -->|a fixed turn count or<br/>app-side predicate| P1[Pattern 1<br/>channel.close]
    Q1 -->|the agent's own judgement| P2[Pattern 2<br/>agent tool]
    Q1 -->|a magic word in the reply| P3[Pattern 3<br/>adapter sentinel]
    Q1 -->|a multi-step orchestration| P4[Pattern 4<br/>workflow graph]
    Q1 -->|a safety net only| P5[Pattern 5<br/>TTL / expectations]
```

You can stack: a workflow graph (pattern 4) for the happy path, a TTL (pattern 5) as a safety net, and an `end_conversation` tool (pattern 2) so any agent can bail early. They don't conflict - first one to fire wins, and `EV_CHANNEL_CLOSED` carries whichever reason got there first.

## Watching for Close

All five patterns terminate the same way, so observers only need one predicate:

```python
close_env = await alice.wait_for_channel_event(
    channel_id=channel.channel_id,
    predicate=lambda e: e.event_type == EV_CHANNEL_CLOSED,
    timeout=180.0,
)
print(f"reason: {close_env.event_data.get('reason')!r}")
```

Or stream live and return on the close envelope:

```python
import asyncio

async def stream_until_closed(hub, channel_id, name_by_id, *, timeout=180.0):
    seen: set[str] = set()
    loop = asyncio.get_running_loop()
    deadline = loop.time() + timeout
    while loop.time() < deadline:
        wal = await hub.read_wal(channel_id)
        for env in wal:
            if env.envelope_id in seen:
                continue  # already printed on an earlier poll
            seen.add(env.envelope_id)
            if env.event_type == EV_TEXT:
                print(f"{name_by_id[env.sender_id]:>10}: {env.event_data['text']}")
            if env.event_type == EV_CHANNEL_CLOSED:
                return env.event_data
        await asyncio.sleep(0.05)  # poll interval
    raise asyncio.TimeoutError(f"channel {channel_id} did not close within {timeout}s")
```

`ChannelMetadata.close_reason` is also stored, so post-mortem inspection via `hub.get_channel(channel_id)` returns the reason string without re-reading the WAL.

## See Also

- [Pattern Cookbook](pattern_cookbook/pattern_cookbook.md) - every cookbook entry calls out which termination route it uses (e.g. `ToolCalled("resolve") -> TerminateTarget("resolved")` for Escalation, `ContextEquals("done", True) -> TerminateTarget("approved")` for Feedback Loop).
- [Workflow Adapter](workflow.md) - `TerminateTarget` and the surrounding graph machinery.

---

# Migrating from Group Chat

Source: https://docs.ag2.ai/latest/docs/beta/network/migration_from_group_chat/

This page is for users porting an existing AG2 deployment that uses `GroupChat` plus `Handoffs`-style orchestration onto the new `autogen.beta.network` module. The modern equivalent is the [`WorkflowAdapter`](workflow.md), driven by a declarative `TransitionGraph`.

The translation is mostly mechanical. The two systems share the same vocabulary - speakers, conditions, targets, terminations - but the new one is data-first (a `TransitionGraph` is JSON-serialisable and survives `Hub.hydrate()`) and runs on top of the network's hub-and-spoke architecture instead of an in-process turn manager.

For per-pattern translations of the canonical orchestrations (Pipeline, Star, Feedback Loop, Triage-with-Tasks, etc.), jump to the [Pattern Cookbook](pattern_cookbook/pattern_cookbook.md).

## Concept Mapping

| Classic (non-beta) concept | Beta network equivalent | Notes |
|---|---|---|
| `GroupChat(agents=[...])` | A `workflow` channel with the agents as participants | The hub plays the role of `GroupChatManager`. |
| `GroupChatManager` | The `WorkflowAdapter` + `Hub` | Turn-taking enforcement moves into the hub; the adapter validates each send against the graph. |
| `Agent.handoffs` | A `TransitionGraph` (per-channel, not per-agent) | Handoffs are described once at the channel level - easier to reason about and serialisable. |
| `AgentTarget(agent)` | `AgentTarget(agent_id)` | Same name; takes an `agent_id` instead of an agent reference. |
| `RevertToInitiator()` | `RevertToInitiatorTarget()` | Identical semantics. |
| `Stay()` | `StayTarget()` | Identical. |
| `Terminate()` | `TerminateTarget(reason="...")` | Now carries an explicit reason that lands on `EV_CHANNEL_CLOSED`. |
| `OnContextCondition(...)` | A custom `TransitionCondition` | Implement the Protocol; register via `register_condition(...)`. |
| `OnCondition(...)` | A custom `TransitionCondition` | Same. |
| `ReplyResult(target=...)` from a tool | `EV_HANDOFF` envelope, paired with `ToolCalled("...")` | The default handler emits `EV_HANDOFF` automatically when a tool's result implies a handoff. |
| `FunctionTarget(fn)` | A custom `TransitionTarget` whose `resolve()` runs Python | Same pattern; register via `register_target(...)`. |
| `max_round=N` | `TransitionGraph(..., max_turns=N)` | Same hard cap. |

## Side-by-Side: Round Robin

**Classic (non-beta):**

```python
from autogen import GroupChat, GroupChatManager

groupchat = GroupChat(
    agents=[alice, bob, carol],
    speaker_selection_method="round_robin",
    max_round=6,
)
manager = GroupChatManager(groupchat=groupchat, llm_config=llm_config)
alice.initiate_chat(manager, message="Topic: should every dev learn Rust?")
```

**Beta network:**

```python
from autogen.beta.network import (
    Hub, HubClient, LocalLink, Passport, Resume,
    TransitionGraph,
)

graph = TransitionGraph.round_robin(
    participants=[alice.agent_id, bob.agent_id, carol.agent_id],
    max_turns=6,
)

channel = await alice.open(
    type="workflow",
    target=[bob.agent_id, carol.agent_id],
    knobs={"graph": graph.to_dict()},
)
await channel.send("Topic: should every dev learn Rust?")
```

The auto-termination on `max_turns` lands on `EV_CHANNEL_CLOSED` with `reason="round_robin_complete"`.

!!! note
    `discussion` is also a round-robin channel - but it never auto-terminates and has no turn cap. Use it when the application decides when to stop. Use `workflow + TransitionGraph.round_robin` when you want a hard cap and an `EV_CHANNEL_CLOSED` to wait on.

## Side-by-Side: Pipeline / Sequential

**Classic:**

```python
researcher.handoffs.add_to([
    OnCondition(target=AgentTarget(writer)),
])
writer.handoffs.add_to([
    OnCondition(target=AgentTarget(editor)),
])
editor.handoffs.set_after_work(target=Terminate())

groupchat = GroupChat(agents=[researcher, writer, editor])
manager = GroupChatManager(...)
researcher.initiate_chat(manager, message="Topic: how does HTTPS work?")
```

**Beta network:**

```python
graph = TransitionGraph.sequence([
    researcher.agent_id, writer.agent_id, editor.agent_id,
])

channel = await researcher.open(
    type="workflow",
    target=[writer.agent_id, editor.agent_id],
    knobs={"graph": graph.to_dict()},
)
await channel.send("Topic: how does HTTPS work?")
```

Auto-terminates with `reason="sequence_complete"`.

## Side-by-Side: Conditional Handoff

A common classic pattern: a triage agent inspects the user request and routes to a specialist via a tool call.

**Classic:**

```python
@triage.register_for_llm(description="Escalate to security review.")
def escalate(reason: str) -> ReplyResult:
    return ReplyResult(
        target=AgentTarget(security_reviewer),
        message=f"Escalated: {reason}",
    )

triage.handoffs.add_to([
    OnCondition(target=AgentTarget(general_responder)),
])
groupchat = GroupChat(agents=[triage, security_reviewer, general_responder])
```

**Beta network:**

```python
from autogen.beta.network import (
    AgentTarget, Always, FromSpeaker, RevertToInitiatorTarget,
    TerminateTarget, ToolCalled, Transition, TransitionGraph,
)

graph = TransitionGraph(
    initial_speaker=triage.agent_id,
    transitions=[
        # If triage's tool said "escalate", route to the security reviewer.
        Transition(
            when=ToolCalled("escalate"),
            then=AgentTarget(security_reviewer.agent_id),
        ),
        # Once security has spoken, terminate.
        Transition(
            when=FromSpeaker(security_reviewer.agent_id),
            then=TerminateTarget(reason="security_review_complete"),
        ),
        # Otherwise, the general responder handles it.
        Transition(
            when=FromSpeaker(triage.agent_id),
            then=AgentTarget(general_responder.agent_id),
        ),
    ],
    default_target=TerminateTarget(reason="default"),
    max_turns=10,
)
```

The triage agent registers `escalate` with its own `@triage_agent.tool` definition. When the tool returns, the default handler emits an `EV_HANDOFF` envelope, and the next-speaker rule then matches `ToolCalled("escalate")`.

!!! tip
    Tool-driven handoffs (`ToolCalled(...)`) are the cleanest migration path for `ReplyResult`-style classic logic. Define one tool per branch you want the LLM to take, and write a `ToolCalled` transition for each.
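
The graph above reads naturally if transitions are checked in order and the first match decides the next speaker, which is what its inline comments suggest. The sketch below models that assumption in plain Python; `Turn`, `Rule`, and `next_target` are hypothetical stand-ins, not classes from `autogen.beta.network`.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass(frozen=True)
class Turn:
    """What the router knows about the turn that just finished."""
    speaker: str
    tool_called: Optional[str] = None


@dataclass(frozen=True)
class Rule:
    when: Callable[[Turn], bool]
    then: str  # an agent name, or "TERMINATE:<reason>"


def tool_called(name: str) -> Callable[[Turn], bool]:
    return lambda turn: turn.tool_called == name


def from_speaker(name: str) -> Callable[[Turn], bool]:
    return lambda turn: turn.speaker == name


def next_target(rules, turn, default="TERMINATE:default"):
    """Return the target of the first rule whose condition matches."""
    for rule in rules:
        if rule.when(turn):
            return rule.then
    return default


RULES = [
    Rule(tool_called("escalate"), "security_reviewer"),
    Rule(from_speaker("security_reviewer"), "TERMINATE:security_review_complete"),
    Rule(from_speaker("triage"), "general_responder"),
]

print(next_target(RULES, Turn("triage", tool_called="escalate")))
print(next_target(RULES, Turn("triage")))
```

Under this assumption the `ToolCalled` rule has to sit above the `FromSpeaker(triage)` rule; swapping them would send every triage turn to the general responder.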

## Side-by-Side: Function Target / Custom Routing

In the classic API, `FunctionTarget(fn)` runs Python to decide who's next. The beta equivalent is a custom `TransitionTarget`.

**Classic:**

```python
def route_by_topic(state):
    if "security" in state.last_message.lower():
        return security
    return general

triage.handoffs.add_to([
    OnCondition(target=FunctionTarget(route_by_topic)),
])
```

**Beta network:**

```python
from dataclasses import dataclass
from typing import ClassVar

from autogen.beta.network import (
    Envelope, FromSpeaker, Transition, TransitionDecision, TransitionTarget,
    register_target,
)

@dataclass(slots=True)
class TopicRouter(TransitionTarget):
    name: ClassVar[str] = "topic_router"
    security_id: str = ""
    general_id: str = ""

    def resolve(self, state, envelope: Envelope) -> TransitionDecision:
        text = envelope.event_data.get("text", "").lower()
        if "security" in text:
            return TransitionDecision(next_speaker=self.security_id)
        return TransitionDecision(next_speaker=self.general_id)

register_target(TopicRouter)

# Then in the graph:
Transition(
    when=FromSpeaker(triage.agent_id),
    then=TopicRouter(security_id=security.agent_id, general_id=general.agent_id),
)
```

The custom target dataclass is JSON-serialisable, so it round-trips through `TransitionGraph.to_dict()` and survives `Hub.hydrate()`. That's the win over `FunctionTarget`: state is data, not closures.
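
To make the "state is data, not closures" point concrete, here is a minimal sketch of a dataclass round-tripping through JSON via a name-keyed registry. It is a toy model under stated assumptions: `REGISTRY`, `register`, `to_dict`, and `from_dict` are hypothetical and do not describe how `register_target` or `TransitionGraph.to_dict()` are implemented.

```python
import json
from dataclasses import asdict, dataclass
from typing import ClassVar

# Hypothetical registry mapping a serialised type name to its class.
REGISTRY: dict = {}


def register(cls):
    REGISTRY[cls.name] = cls
    return cls


@register
@dataclass
class TopicRouterSketch:
    name: ClassVar[str] = "topic_router"  # ClassVar, so not a dataclass field
    security_id: str = ""
    general_id: str = ""

    def to_dict(self) -> dict:
        # Only plain data goes on the wire: a type tag plus the fields.
        return {"type": self.name, **asdict(self)}


def from_dict(payload: dict):
    data = dict(payload)
    cls = REGISTRY[data.pop("type")]
    return cls(**data)


original = TopicRouterSketch(security_id="sec-1", general_id="gen-1")
wire = json.dumps(original.to_dict())
restored = from_dict(json.loads(wire))
print(wire)
print(restored == original)
```

A closure passed to a `FunctionTarget` has no comparable wire form, which is why it cannot be rebuilt from stored state.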

## What's Different (Beyond the Translation)

A few changes that will affect how you architect the migration:

- **The hub is authoritative.** Turn-taking, state, audit, and expectation enforcement all live in the hub. Your agent code becomes thinner.
- **Channels have ids.** Every `alice.open(...)` returns a channel handle that carries a `channel_id`. Audit, replay, and inspection are scoped to it.
- **Default handlers do most of the heavy lifting.** When agents have `attach_plugin=True`, you don't write any "agent receives message -> run LLM -> send reply" glue. The framework handles it.
- **Sub-task observation comes for free.** Calls to `agent.task(..., capability="X")` inside a network turn auto-update `Resume.observed["X"]`. There was no equivalent in the classic world. See [Task Observation](task_observation.md).
- **Expectations replace `max_round` exhaustively.** You can declare `reply_within(60s, auto_close)` per channel type, get a violation logged to the audit log, and have the channel auto-close - see [Expectations & Audit](expectations_and_audit.md).
- **Views replace ad-hoc context windowing.** Each adapter has a default view (`FullTranscript`, `WindowedSummary`); custom views plug in via the `ViewPolicy` Protocol - see [Views & Skills](views_and_skills.md).
- **The transport is pluggable.** `LocalLink` is the in-process default, but the link-layer is a Protocol - cross-process and cross-host transports are drop-in replacements.

## Migration Checklist

1. Stand up a hub: `hub = await Hub.open(MemoryKnowledgeStore())`. (Pick a `KnowledgeStore` based on whether you need persistence.)
2. Wrap each existing `Agent` in an `AgentClient` via `hc.register(...)`. Provide a `Passport` (name) and a `Resume` (claimed capabilities).
3. Translate your `Handoffs` configuration into a `TransitionGraph`. Use `round_robin` / `sequence` factories where they fit; build the graph manually for conditional logic.
4. Replace `initiate_chat(...)` with `alice.open(type="workflow", target=[...], knobs={"graph": graph.to_dict()})` followed by `channel.send(text)`.
5. Wait for termination with `alice.wait_for_channel_event(channel_id=..., predicate=lambda e: e.event_type == EV_CHANNEL_CLOSED)`.
6. Inspect the WAL via `hub.read_wal(channel_id)` for replay and the audit log via `hub.audit_log.read_all()` for governance.

For a runnable end-to-end translation of the "researcher -> writer -> editor" handoff pattern, see the [Pipeline](pattern_cookbook/pipeline.md) entry in the Pattern Cookbook.

---

# Pattern Cookbook

Source: https://docs.ag2.ai/latest/docs/beta/network/pattern_cookbook/pattern_cookbook/

Side-by-side translations of the canonical multi-agent orchestration
patterns from classic (non-beta) AG2 (`autogen.agentchat.group`) onto the
beta `WorkflowAdapter`. Each pattern lists the classic primitives
it used, a beta translation, the full runnable source, and any gaps
that don't yet have a clean equivalent.

For the classic (non-beta) versions and their conceptual background, see the
[Pattern Cookbook in the user guide](../../../user-guide/advanced-concepts/pattern-cookbook/overview.md).

## Status Legend

| Status | Meaning |
|---|---|
| **Clean** | Maps one-to-one onto existing beta primitives. Code below works as written. |
| **Partial** | Translates with one or more named gaps. Workaround included; gap tracked on roadmap. |

## Quick Reference

| # | Pattern | Status | Beta primitives | Notable gap |
|---|---|---|---|---|
| 1 | [Pipeline](pipeline.md) | Clean | `TransitionGraph.sequence` or `FromSpeaker -> AgentTarget` chain | - |
| 2 | [Hierarchical](hierarchical.md) | Partial | `Handoff` returns from delegate tools + `FromSpeaker(writer) -> TerminateTarget` for terminate | No `NestedChatTarget`; sub-flows via separate channels |
| 3 | [Star](star.md) | Clean | `ToolCalled` graph rule for spoke selection + WAL-reading synthesis tool | - |
| 4 | [Coordinator](coordinator.md) | Clean | One generic `handoff(to)` tool returning `Handoff` + `finish` tool returning `Finish`; empty transitions list + `RevertToInitiatorTarget` default | - |
| 5 | [Escalation](escalation.md) | Clean | `Handoff` returns from escalate tools + `ToolCalled("resolve")` for terminate | - |
| 6 | [Redundant](redundant.md) | Partial | Sequential fan-out via `TransitionGraph.sequence` | No parallel dispatch; one specialist at a time |
| 7 | [Feedback Loop](feedback_loop.md) | Clean | `ContextEquals` on a `done` flag with `max_turns` cap | - |
| 8 | [Context-Aware Routing](context_aware_routing.md) | Partial | Router agent's tool sets `category` -> `ContextEquals` per branch | No `LLMCondition`; routing reasoning lives inside the coordinator's LLM, not the framework |
| 9 | [Triage with Tasks](triage_with_tasks.md) | Clean | `TransitionGraph.sequence` over a triage-produced plan; `knobs["context_vars"]` seeds the queue | - |

The "Organic" pattern from classic AG2 (LLM-driven `AutoPattern` group-manager auto-selection) translates to the [Coordinator](coordinator.md) pattern: one generic parameterised handoff tool on the coordinator, plus a `finish` tool returning `Finish` to terminate. The graph needs no rewrite when specialists are added or removed. See [Roadmap](#roadmap) for the remaining gaps.

## Routing idioms

Every cookbook page below uses one of three composable routing idioms.
Picking the right one is the most common decision when porting a
classic pattern.

**1. Static routing, graph-centralised.** Plain `@tool` returning
a string; the graph holds the routing edge.

```python
async def delegate_researcher(reason: str) -> str:
    """Send the work to the research specialist."""
    return f"routing: {reason}"

# in the graph:
Transition(when=ToolCalled("delegate_researcher"),
           then=AgentTarget(researcher.agent_id))
```

When to use: many routing edges across multiple agents and you want
one place (the graph) to read the topology. Used by `03_star.py`.

**2. Static routing, tool-local.** `@tool` returning a fixed-target
`Handoff`. No graph rule needed - the framework reads
`Handoff.target` from the agent's local `ToolResultEvent`
stream and stamps it onto the packet's routing field directly.

```python
async def delegate_researcher(reason: str) -> Handoff:
    """Send the work to the research specialist."""
    return Handoff(target="researcher", reason=reason)
```

When to use: simple workflows where the routing target is obvious
from the tool's name and you'd rather not maintain a parallel graph
rule. Used by `02_hierarchical.py` and `04_escalation.py`.

**3. Dynamic routing.** `@tool` returning a *computed-target*
`Handoff`. This is the case that could not be expressed cleanly
before tools could return a typed `Handoff`.

```python
async def smart_route(query: str, state: ChannelStateInject) -> Handoff:
    target = pick_best_specialist(query, state.context_vars)
    return Handoff(target=target, reason="routed by load")
```

When to use: target depends on runtime state, load balancing, or any
condition the graph can't express declaratively.
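
The `smart_route` snippet calls `pick_best_specialist` without defining it. One possible shape is sketched below: a pure function over the context variables. The `"keywords"` and `"load"` keys are assumptions made for this example, not names the framework defines.

```python
def pick_best_specialist(query: str, context_vars: dict) -> str:
    """Pick the least-loaded specialist whose keywords appear in the query.

    Hypothetical layout of context_vars:
      "keywords": {specialist_name: [keyword, ...]}
      "load":     {specialist_name: number_of_open_requests}
    """
    keywords = context_vars.get("keywords", {})
    load = context_vars.get("load", {})
    words = set(query.lower().split())

    # Specialists with at least one keyword present in the query.
    candidates = [name for name, kws in keywords.items() if words & set(kws)]
    if not candidates:
        # No keyword hit: consider everyone, or fall back to "general".
        candidates = list(keywords) or ["general"]

    # Lowest load wins; the name breaks ties so the result is deterministic.
    return min(candidates, key=lambda name: (load.get(name, 0), name))


context = {
    "keywords": {"billing": ["invoice", "refund"], "security": ["password", "breach"]},
    "load": {"billing": 3, "security": 1},
}
print(pick_best_specialist("Refund for invoice 42", context))
print(pick_best_specialist("hello there", context))
```

Keeping the decision in a pure function makes it easy to unit-test separately from the tool that wraps its result in a `Handoff`.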

## State updates

State mutations are independent of routing and use the workflow-scoped
helpers `set_context(channel, key, value)` and
`delete_context(channel, key)`. They live in
`autogen.beta.network.workflow_helpers` rather than the
top-level `autogen.beta.network` namespace - they only operate
on `WorkflowState.context_vars` and raise `RuntimeError`
on a non-workflow channel, so the import path itself signals the
adapter scope.

```python
from autogen.beta.network.workflow_helpers import set_context

async def classify_as_billing(reason: str, channel: ChannelInject) -> str:
    """Classify the request as billing. Sets context_vars['category']."""
    await set_context(channel, "category", "billing")
    return f"classified as billing: {reason}"
```
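
The interplay between a context write and a `ContextEquals`-style condition can be pictured with a toy model. Everything in the sketch below (`ToyChannel`, `set_context_sketch`, `context_equals_sketch`) is hypothetical and synchronous for brevity; the real helpers are awaited and operate on `WorkflowState.context_vars`.

```python
class ToyChannel:
    """Stand-in for a channel: a type tag plus a context dictionary."""

    def __init__(self, channel_type: str):
        self.channel_type = channel_type
        self.context_vars = {}


def set_context_sketch(channel: ToyChannel, key: str, value) -> None:
    # Mirror the documented behaviour: refuse non-workflow channels.
    if channel.channel_type != "workflow":
        raise RuntimeError("context helpers only operate on workflow channels")
    channel.context_vars[key] = value


def context_equals_sketch(key: str, expected):
    """Build a condition that is true when context_vars[key] == expected."""
    return lambda channel: channel.context_vars.get(key) == expected


workflow = ToyChannel("workflow")
set_context_sketch(workflow, "category", "billing")
billing_matches = context_equals_sketch("category", "billing")(workflow)
print(billing_matches)

try:
    set_context_sketch(ToyChannel("discussion"), "category", "billing")
    rejected = False
except RuntimeError:
    rejected = True
print(rejected)
```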

## Knowing when the workflow is finished

Every cookbook page below uses the same close-detection primitive in
its demo `main`:

```python
close_env = await intake.wait_for_channel_event(
    channel_id=channel.channel_id,
    predicate=lambda e: e.event_type == EV_CHANNEL_CLOSED,
    timeout=180.0,
)
print(f"closed: reason={close_env.event_data.get('reason')!r}")
```

`AgentClient.wait_for_channel_event` blocks on the per-channel
inbox until an envelope matching the predicate arrives - here, the
`EV_CHANNEL_CLOSED` envelope every termination route emits.
The reason string distinguishes the close (e.g. `'sequence_complete'`,
`'approved'`, `'resolved'`, `'fall_through'`, `'max_iterations'`,
`'ttl_expired'`).
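
The blocking behaviour can be approximated with a queue and a predicate. This is a rough sketch of the idea, not `AgentClient`'s implementation: `wait_for_event` is a hypothetical helper, envelopes are plain dicts, and the `"channel_closed"` string is a placeholder rather than the actual value of `EV_CHANNEL_CLOSED`.

```python
import asyncio


async def wait_for_event(inbox: asyncio.Queue, predicate, timeout: float):
    """Return the first envelope from inbox that satisfies predicate.

    Non-matching envelopes are simply dropped in this toy version.
    Raises asyncio.TimeoutError if nothing matches within timeout seconds.
    """

    async def scan():
        while True:
            envelope = await inbox.get()
            if predicate(envelope):
                return envelope

    return await asyncio.wait_for(scan(), timeout=timeout)


async def demo():
    inbox = asyncio.Queue()
    inbox.put_nowait({"event_type": "text", "event_data": {"text": "hi"}})
    inbox.put_nowait(
        {"event_type": "channel_closed", "event_data": {"reason": "sequence_complete"}}
    )
    return await wait_for_event(
        inbox,
        predicate=lambda e: e["event_type"] == "channel_closed",
        timeout=1.0,
    )


closed = asyncio.run(demo())
print(closed["event_data"]["reason"])
```

The timeout matters in practice: without it, a workflow that never closes would block the caller indefinitely.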

To print the transcript, dump the WAL once after close - no polling
helper needed (this is the pattern used in each cookbook page below).

For the full picture of the five close routes (application close,
agent-side tool, adapter sentinel, workflow `TerminateTarget`,
TTL / expectations), see
[Closing Channels -> Watching for Close](../termination.md#watching-for-close).

## Roadmap

The gaps surfaced in each pattern page are tracked. None block the
patterns; each either has a working workaround or is currently
deferred:

* **`LLMCondition`** - declarative routing where the framework asks an LLM whether a transition fires. Unblocks the cleanest version of Context-Aware Routing.
* **`NestedChatTarget`** - first-class sub-flow target. Unblocks Hierarchical and Redundant fully.
* **Parallel dispatch** - fan out to multiple speakers within a single workflow turn, gather their replies, advance once all are in. Unblocks parallel Redundant.
* **Auto-merge of tool-return fields into context** - the `ReplyResult.context_variables` analogue. Cuts the boilerplate for Escalation and Triage-with-Tasks.
* **`AllOf` / `AnyOf` condition composers** - useful sugar across most patterns; today the conjunction is encoded via transition order.

For the broader scope of what the workflow adapter does and doesn't
yet ship versus classic, see [Migrating from Group Chat](../migration_from_group_chat.md).

## See Also

- [Workflow Adapter](../workflow.md) - the underlying graph machinery and built-in conditions.
- [Context Variables](../context_variables.md) - `set_context` / `delete_context`, `ContextEquals`, custom conditions.
- [Closing Channels](../termination.md) - termination patterns; many cookbook entries lean on `TerminateTarget` with informative reasons.
- [Migrating from Group Chat](../migration_from_group_chat.md) - concept-level translation table.

---

# Pipeline / Sequential Processing

Source: https://docs.ag2.ai/latest/docs/beta/network/pattern_cookbook/pipeline/

The Pipeline pattern organises agents into a strict linear sequence:
each agent processes the previous agent's output, then hands off to
the next. Information flows in one direction; the run ends after the
last stage replies.

**Classic (non-beta) primitives:** `DefaultPattern` with explicit
`AgentTarget` handoffs, optionally `ReplyResult` to bundle
a context update with each reply.

### Key Characteristics

* **Specialised stages.** Each agent focuses on one transformation -
  validate, enrich, fulfil - and ignores the rest.
* **Unidirectional flow.** Each stage hands off forward only. There
  is no return path for revisions inside the pipeline.
* **Progressive refinement.** The conversation accumulates through
  the WAL: every agent sees the full prior context via the windowed
  view, so no explicit state passing is needed.
* **Well-defined interfaces.** Each agent's prompt shapes its single
  reply line so the next stage can pick it up unambiguously.

### Information Flow

The graph's `TransitionGraph.sequence([...])` shorthand wires
`FromSpeaker(a) -> AgentTarget(b)`, `FromSpeaker(b) -> AgentTarget(c)`,
and so on, with `TerminateTarget("sequence_complete")` as the
default and `max_turns=len(steps)` matching the pipeline length.
The run normally ends only when the last stage has replied; a stage
that wants to abort early can return a typed `Handoff(target="terminate")`
or emit any other routing intent the framework recognises.
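
The expansion described above can be pictured with a small toy model. The sketch below uses plain dicts and tuples rather than the real `TransitionGraph`; `expand_sequence` and `run_sequence` are hypothetical names, and the code assumes each agent speaks exactly once per turn.

```python
def expand_sequence(steps):
    """Turn [a, b, c] into speaker -> next-speaker edges plus defaults."""
    return {
        "initial_speaker": steps[0],
        # One edge per adjacent pair: (a, b), (b, c), ...
        "transitions": list(zip(steps, steps[1:])),
        "default": "sequence_complete",
        "max_turns": len(steps),
    }


def run_sequence(graph):
    """Walk the graph and return (speaking order, close reason)."""
    routes = dict(graph["transitions"])
    speaker = graph["initial_speaker"]
    order = []
    for _ in range(graph["max_turns"]):
        order.append(speaker)
        following = routes.get(speaker)
        if following is None:
            # No edge leaves the last stage, so the default target fires.
            return order, graph["default"]
        speaker = following
    return order, "max_turns"


steps = ["intake", "validator", "enricher", "fulfilment"]
order, reason = run_sequence(expand_sequence(steps))
print(order, reason)
```

Because the last stage has no outgoing edge, the default terminate target is what closes the run.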

## Agent Flow

```mermaid
sequenceDiagram
    participant User
    participant Intake as intake
    participant Validator as validator
    participant Enricher as enricher
    participant Fulfilment as fulfilment

    User->>Intake: Order line + customer ref
    Intake->>Validator: FromSpeaker(intake) -> AgentTarget(validator)
    Validator->>Enricher: VALID - ... (FromSpeaker -> AgentTarget)
    Enricher->>Fulfilment: ENRICHED - tier=..., notes=...
    Fulfilment->>User: SHIPPED - tracking=#..., ETA=...
    Note over User,Fulfilment: TerminateTarget("sequence_complete") fires after fulfilment's reply
```

## Migrating from Classic to Beta?

| Classic | Beta |
|---|---|
| `DefaultPattern` + per-agent handoff registration | `TransitionGraph.sequence([...])` |
| `ReplyResult(message, target=AgentTarget(next))` from a tool | Either bake the handoff into the graph (preferred), return a typed `Handoff(target=...)` from a tool, or wire a `ToolCalled` rule to a plain `@tool` |
| `ContextVariables` carrying intermediate state | `set_context(channel, key, value)` from inside a tool; reads via `ChannelStateInject` |

## Code

!!! tip
    Each agent uses `AnthropicConfig(model="claude-sonnet-4-6")`
    so the validator / enricher / fulfilment stages produce real
    domain output. Set `ANTHROPIC_API_KEY` before running.

```python
"""Cookbook 01 - Pipeline pattern.

Strict linear hand-off: A -> B -> C -> D -> terminate.
TransitionGraph.sequence ships the canonical implementation -
each step's FromSpeaker rule routes to the next, and
sequence_complete terminates after the last speaker.
"""

import asyncio

from dotenv import load_dotenv

from autogen.beta import Agent
from autogen.beta.config import AnthropicConfig
from autogen.beta.knowledge import MemoryKnowledgeStore
from autogen.beta.network import (
    EV_PACKET,
    EV_CHANNEL_CLOSED,
    EV_TEXT,
    WORKFLOW_TYPE,
    Hub,
    HubClient,
    LocalLink,
    Passport,
    Resume,
    TransitionGraph,
)
from autogen.beta.testing import TestConfig

load_dotenv()

async def main() -> None:
    config = AnthropicConfig(model="claude-sonnet-4-6")

    hub = await Hub.open(MemoryKnowledgeStore(), ttl_sweep_interval=0)
    link = LocalLink(hub)

    intake_hc = HubClient(link, hub=hub)
    validator_hc = HubClient(link, hub=hub)
    enricher_hc = HubClient(link, hub=hub)
    fulfilment_hc = HubClient(link, hub=hub)

    intake_agent = Agent("intake", config=TestConfig())

    validator_agent = Agent(
        "validator",
        prompt=(
            "You are the order validator. The intake message describes "
            "an order. Check that it has a customer reference and at "
            "least one line item. If valid, reply on a single line: "
            "`VALID - <one-line summary of what was validated>`. If "
            "invalid, reply: `INVALID - <reason>`. Either way, ONE "
            "line, no preamble."
        ),
        config=config,
    )

    enricher_agent = Agent(
        "enricher",
        prompt=(
            "You are the order enricher. The validator has just "
            "approved an order. Look up (i.e. invent plausibly) the "
            "customer tier and any shipping notes a fulfilment agent "
            "would need. Reply on a single line: "
            "`ENRICHED - tier=<tier>, notes=<short notes>`. ONE line, "
            "no preamble."
        ),
        config=config,
    )

    fulfilment_agent = Agent(
        "fulfilment",
        prompt=(
            "You are fulfilment. The validator approved the order and "
            "the enricher added the customer tier. Issue a shipping "
            "confirmation. Reply on a single line: "
            "`SHIPPED - tracking=#<plausible-tracking>, ETA=<short "
            "phrase>`. ONE line, no preamble."
        ),
        config=config,
    )

    intake = await intake_hc.register(intake_agent, Passport(name="intake"), Resume())
    validator = await validator_hc.register(validator_agent, Passport(name="validator"), Resume())
    enricher = await enricher_hc.register(enricher_agent, Passport(name="enricher"), Resume())
    fulfilment = await fulfilment_hc.register(fulfilment_agent, Passport(name="fulfilment"), Resume())

    graph = TransitionGraph.sequence([
        intake.agent_id,
        validator.agent_id,
        enricher.agent_id,
        fulfilment.agent_id,
    ])

    channel = await intake.open(
        type=WORKFLOW_TYPE,
        target=[validator.agent_id, enricher.agent_id, fulfilment.agent_id],
        knobs={"graph": graph.to_dict()},
    )
    print(f"channel: {channel.channel_id}\n")

    name_by_id = {
        intake.agent_id: "intake",
        validator.agent_id: "validator",
        enricher.agent_id: "enricher",
        fulfilment.agent_id: "fulfilment",
    }

    await channel.send("Order: 2x widget (SKU W-100), customer ACME-7, ship-to: London EC1.")

    # Wait for the workflow to terminate (any of the five close routes
    # documented in /docs/beta/network/termination - this demo uses
    # TerminateTarget("sequence_complete") as its happy path).
    close_env = await intake.wait_for_channel_event(
        channel_id=channel.channel_id,
        predicate=lambda e: e.event_type == EV_CHANNEL_CLOSED,
        timeout=180.0,
    )

    # Print the transcript from the WAL after close.
    for env in await hub.read_wal(channel.channel_id):
        speaker = name_by_id.get(env.sender_id, env.sender_id[:8])
        if env.event_type == EV_TEXT:
            print(f"{speaker:>14}: {env.event_data['text']}")
        elif env.event_type == EV_PACKET:
            routing = env.event_data.get("routing", {}) or {}
            if routing.get("kind") == "handoff":
                line = f"[Handed off via {routing.get('tool', '')}] {routing.get('reason', '')}"
                print(f"{speaker:>14}: {line.rstrip()}")
            body = env.event_data.get("body", "")
            if body:
                print(f"{speaker:>14}: {body}")

    print(f"\nclosed: reason={close_env.event_data.get('reason')!r}")

    await intake_hc.close()
    await validator_hc.close()
    await enricher_hc.close()
    await fulfilment_hc.close()
    await hub.close()

if __name__ == "__main__":
    asyncio.run(main())
```

## Output

```console
channel: 3a1f...

         intake: Order: 2x widget (SKU W-100), customer ACME-7, ship-to: London EC1.
      validator: VALID - order has customer ref ACME-7 and one line item (2x SKU W-100).
       enricher: ENRICHED - tier=Gold, notes=expedite for London EC1, signature on delivery.
     fulfilment: SHIPPED - tracking=#GB-2026-09-AC7-W100, ETA=next-day before 12:00 GMT.

closed: reason='sequence_complete'
```

---

# Hierarchical / Tree

Source: https://docs.ag2.ai/latest/docs/beta/network/pattern_cookbook/hierarchical/

The Hierarchical pattern places a coordinator above a set of
specialists. The coordinator delegates work, the researcher returns
to the coordinator with facts, and the writer is the *terminal* step
that uses those facts to produce the final summary - its reply
closes the workflow.

**Classic (non-beta) primitives:** `NestedChat` for sub-flows,
`GroupManager` for delegation, top-level `DefaultPattern`
for the coordinator graph.

### Key Characteristics

* **Coordinator owns dispatch.** The coordinator decides whether to
  call the researcher or the writer and never replies in a specialist's place.
* **Researcher returns to coord.** The researcher's reply routes back
  via `FromSpeaker(researcher) -> AgentTarget(coord)` so coord
  can decide the next move.
* **Writer is the terminal speaker.** A `FromSpeaker(writer) ->
  TerminateTarget("written")` rule closes the workflow as soon as the
  writer's summary lands. There is no separate "finish" tool.

### Routing Mechanics

* **Typed `Handoff` return.** Each `delegate_<spoke>` tool
  returns `Handoff(target="researcher", reason=...)`. The framework
  reads it from the agent's local `ToolResultEvent` stream after
  the round and stamps it onto the packet's `routing.target`. No
  graph rule is needed for the delegation edge - `Handoff.target`
  is authoritative.
* **No `finish_delegate` tool.** Earlier iterations of this demo had a
  `finish_delegate` tool that flipped a context flag to terminate.
  Real LLMs (Sonnet, GPT, Gemini) routinely emit several tool calls in
  parallel inside one round - a parallel call to `finish_delegate`
  alongside `delegate_researcher` would set the flag *before* the
  researcher had a chance to speak, terminating the workflow prematurely.
  Making the writer the terminal speaker sidesteps this hazard entirely:
  parallel calls to `delegate_researcher` and `delegate_writer`
  are safe because [first-emitted-wins](../workflow.md#first-emitted-wins)
  picks researcher (the other tool runs but its `Handoff` doesn't
  drive routing), and the writer step still runs in its own dedicated
  round once researcher returns.
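
The parallel-call case can be illustrated with a toy model. The sketch assumes "first-emitted-wins" means the first `Handoff` in emission order decides routing; the linked workflow page is the authority on the exact semantics. `ToyHandoff` and `routing_target` are hypothetical names, not framework classes.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ToyHandoff:
    target: str
    reason: str = ""


def routing_target(tool_results) -> Optional[str]:
    """Return the target of the first handoff among one round's tool results.

    Later handoffs still ran as tools, but they do not drive routing.
    Returns None when no tool returned a handoff.
    """
    for result in tool_results:
        if isinstance(result, ToyHandoff):
            return result.target
    return None


# The coordinator called both delegate tools in the same round.
parallel_round = [
    ToyHandoff("researcher", "gather facts"),
    ToyHandoff("writer", "summarise"),
]
print(routing_target(parallel_round))
```

With this rule a stray parallel `delegate_writer` call is harmless, whereas a parallel call that flipped a termination flag would not have been.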

## Agent Flow

```mermaid
sequenceDiagram
    participant Intake as intake
    participant Coord as coord
    participant Researcher as researcher
    participant Writer as writer

    Intake->>Coord: kickoff (FromSpeaker -> AgentTarget)
    Coord->>Researcher: Handoff(target="researcher")
    Researcher->>Coord: facts (FromSpeaker -> AgentTarget)
    Coord->>Writer: Handoff(target="writer")
    Writer-->>Writer: 1-line summary
    Note over Writer: FromSpeaker(writer) -> TerminateTarget("written")
```

## Migrating from Classic to Beta?

| Classic | Beta |
|---|---|
| `ReplyResult(target=AgentTarget(researcher))` from a delegate tool | `return Handoff(target="researcher", reason=...)` |
| `NestedChat` for a specialist that runs its own sub-flow | Specialist tool opens a separate `consulting` channel |
| Explicit "finish" tool flipping a context flag | Make the terminal specialist's reply itself close via `FromSpeaker(writer) -> TerminateTarget` |

### Gaps & Workarounds

* **No `NestedChatTarget`.** A specialist that needs its own
  sub-flow can't open a "child workflow" inline. Workaround: the
  specialist's tool opens a *separate* channel (e.g. a `consulting`
  channel via `AgentClient.open(...)`), runs the sub-conversation
  there, and returns the result to the coordinator channel via its
  reply. Two channels, one per nesting level - clean WAL per nesting,
  but the affordance isn't built in.

## Code

!!! tip
    Coord, researcher, and writer all use real Sonnet - the
    coordinator genuinely decides between researcher and writer, and
    the writer's summary is a real LLM output.

```python
"""Cookbook 02 - Hierarchical / Tree pattern.

A coordinator delegates work to specialists. The researcher returns
to the coordinator with facts; the writer is the *terminal* step
that uses those facts to produce the final summary, and its reply
closes the workflow.
"""

import asyncio

from dotenv import load_dotenv

from autogen.beta import Agent
from autogen.beta.config import AnthropicConfig
from autogen.beta.knowledge import MemoryKnowledgeStore
from autogen.beta.network import (
    EV_PACKET,
    EV_CHANNEL_CLOSED,
    EV_TEXT,
    WORKFLOW_TYPE,
    AgentTarget,
    FromSpeaker,
    Handoff,
    Hub,
    HubClient,
    LocalLink,
    Passport,
    Resume,
    TerminateTarget,
    Transition,
    TransitionGraph,
)
from autogen.beta.testing import TestConfig

load_dotenv()

async def delegate_researcher(reason: str) -> Handoff:
    """Send the work to the research specialist. The returned
    Handoff carries the target name; the framework resolves it
    and routes the next turn there. No graph rule needed for this
    edge - Handoff.target is authoritative."""
    print(f"  [tool] delegate_researcher({reason!r})")
    return Handoff(target="researcher", reason=reason)

async def delegate_writer(reason: str) -> Handoff:
    """Send the work to the writing specialist. Writer's reply is
    the terminal step - the graph closes the workflow on
    FromSpeaker(writer)."""
    print(f"  [tool] delegate_writer({reason!r})")
    return Handoff(target="writer", reason=reason)

async def main() -> None:
    config = AnthropicConfig(model="claude-sonnet-4-6")

    hub = await Hub.open(MemoryKnowledgeStore(), ttl_sweep_interval=0)
    link = LocalLink(hub)

    intake_hc = HubClient(link, hub=hub)
    coord_hc = HubClient(link, hub=hub)
    researcher_hc = HubClient(link, hub=hub)
    writer_hc = HubClient(link, hub=hub)

    intake_agent = Agent("intake", config=TestConfig())

    coord_agent = Agent(
        "coord",
        prompt=(
            "You are a router-only coordinator. You produce no prose "
            "yourself; specialists do that work.\n"
            "\n"
            "On each turn your ENTIRE output MUST be one tool call "
            "and nothing else - no preface, no summary, no commentary. "
            "Any text body you emit is treated as a bug.\n"
            "\n"
            "Routing logic:\n"
            "* If the conversation does not yet contain bullet facts "
            "from the researcher, call `delegate_researcher`.\n"
            "* Once the researcher's facts are present, call "
            "`delegate_writer`. The writer produces the final 1-line "
            "summary and that reply ends the workflow.\n"
            "\n"
            "Strict: never call both delegations in the same turn - "
            "wait for the researcher's reply before delegating to the "
            "writer."
        ),
        config=config,
    )
    coord_agent.tool(delegate_researcher)
    coord_agent.tool(delegate_writer)

    researcher_agent = Agent(
        "researcher",
        prompt=(
            "You are the researcher. Reply with ONE short opening "
            "sentence followed by three bullet facts (each one line). "
            "No preamble, no closing."
        ),
        config=config,
    )
    writer_agent = Agent(
        "writer",
        prompt=(
            "You are the writer. The conversation contains the user's "
            "request and the researcher's three bullet facts. Produce "
            "ONE short sentence (≤ 30 words) summarising the topic, "
            "drawing on the researcher's facts. Your reply ends the "
            "workflow - make it the final answer the user receives. "
            "No preamble, no headers."
        ),
        config=config,
    )

    intake = await intake_hc.register(intake_agent, Passport(name="intake"), Resume())
    coord = await coord_hc.register(coord_agent, Passport(name="coord"), Resume())
    researcher = await researcher_hc.register(researcher_agent, Passport(name="researcher"), Resume())
    writer = await writer_hc.register(writer_agent, Passport(name="writer"), Resume())

    graph = TransitionGraph(
        initial_speaker=intake.agent_id,
        transitions=[
            # Writer's reply is the terminal step.
            Transition(when=FromSpeaker(writer.agent_id), then=TerminateTarget("written")),
            # Researcher returns to coord so coord can delegate to writer.
            Transition(when=FromSpeaker(researcher.agent_id), then=AgentTarget(coord.agent_id)),
            # intake -> coord kickoff. Routing FROM coord to a specialist
            # happens via Handoff returns from delegate_* tools - the
            # framework reads target from the Handoff and stamps it onto
            # the packet, so no ToolCalled rules are needed for that edge.
            Transition(when=FromSpeaker(intake.agent_id), then=AgentTarget(coord.agent_id)),
        ],
        default_target=TerminateTarget("fall_through"),
        max_turns=10,
    )

    channel = await intake.open(
        type=WORKFLOW_TYPE,
        target=[coord.agent_id, researcher.agent_id, writer.agent_id],
        knobs={"graph": graph.to_dict()},
    )
    print(f"channel: {channel.channel_id}\n")

    name_by_id = {
        intake.agent_id: "intake",
        coord.agent_id: "coord",
        researcher.agent_id: "researcher",
        writer.agent_id: "writer",
    }

    await channel.send("Brief on distributed consensus: research, then write a 1-line summary.")

    # Wait for the workflow to terminate (any of the five close routes
    # documented in /docs/beta/network/termination - this demo uses
    # FromSpeaker(writer) -> TerminateTarget("written")).
    close_env = await intake.wait_for_channel_event(
        channel_id=channel.channel_id,
        predicate=lambda e: e.event_type == EV_CHANNEL_CLOSED,
        timeout=180.0,
    )

    # Print the transcript from the WAL after close.
    for env in await hub.read_wal(channel.channel_id):
        speaker = name_by_id.get(env.sender_id, env.sender_id[:8])
        if env.event_type == EV_TEXT:
            print(f"{speaker:>14}: {env.event_data['text']}")
        elif env.event_type == EV_PACKET:
            routing = env.event_data.get("routing", {}) or {}
            if routing.get("kind") == "handoff":
                line = f"[Handed off via {routing.get('tool', '')}] {routing.get('reason', '')}"
                print(f"{speaker:>14}: {line.rstrip()}")
            body = env.event_data.get("body", "")
            if body:
                print(f"{speaker:>14}: {body}")

    print(f"\nclosed: reason={close_env.event_data.get('reason')!r}")

    await intake_hc.close()
    await coord_hc.close()
    await researcher_hc.close()
    await writer_hc.close()
    await hub.close()

if __name__ == "__main__":
    asyncio.run(main())
```

## Output

```console
channel: 9b7c...

         intake: Brief on distributed consensus: research, then write a 1-line summary.
  [tool] delegate_researcher('Gather bullet facts about distributed consensus...')
          coord: [Handed off via delegate_researcher] Gather bullet facts about distributed consensus...
     researcher: Distributed consensus enables networked nodes to agree on a single value or state despite failures.

- Key algorithms: Paxos, Raft, PBFT
- Core trade-offs: CAP theorem, FLP impossibility
- Real-world uses: ZooKeeper, etcd, blockchain protocols
  [tool] delegate_writer('Write a 1-line summary covering algorithms, trade-offs, and uses.')
          coord: [Handed off via delegate_writer] Write a 1-line summary covering algorithms, trade-offs, and uses.
         writer: Distributed consensus algorithms - Paxos, Raft, PBFT - let networked nodes agree on shared state despite faults, trading consistency against availability in real systems like etcd and blockchains.

closed: reason='written'
```

---

# Star

Source: https://docs.ag2.ai/latest/docs/beta/network/pattern_cookbook/star/

The Star pattern places one hub agent at the centre with several
specialist spokes. The hub fans out questions to the relevant spoke,
collects each reply, and synthesises a final answer. Spokes never
talk to each other; everything routes through the hub.

**Classic (non-beta) primitives:** `DefaultPattern` with
`OnContextCondition` routing, spoke handoffs returning to centre,
`ContextVariables` tracking results.

### Key Characteristics

* **Single hub.** The hub picks which spoke to query, waits for the
  reply, then either delegates to another spoke or terminates with a
  synthesis.
* **Dynamic `Handoff`.** A single parameterised
  `ask_spoke(spoke, query)` tool returns
  `Handoff(target=spoke)`, so the framework routes the next turn
  directly to the chosen spoke. No per-spoke graph rules are needed
  for the delegation edge - `Handoff.target` is authoritative.
* **WAL-gated synthesis.** The `synthesise` tool reads the WAL
  via `HubInject` and checks for each spoke's reply by
  sender identity (`hub.name_for`). It refuses with a
  `"pending: ..."` string until all required spokes have replied,
  then stores the synthesis via `set_context` and returns
  `Finish(reason="answered")` to terminate the workflow.
* **`StayTarget` for hub.** The graph's hub transition uses
  `StayTarget` so that turns where `synthesise` returns
  `"pending"` (and the hub writes a short text note) re-route back
  to the hub rather than stalling on a silent round.
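
The WAL-gated check can be sketched without the framework. The snippet below is a minimal model, not the library's code: `Envelope`, the `"packet"` event-type string, and the `names_by_id` dict (standing in for `hub.name_for`) are hypothetical stand-ins chosen for illustration.

```python
from dataclasses import dataclass, field

REQUIRED_SPOKES = ("weather", "sports", "finance")


@dataclass
class Envelope:
    """Hypothetical stand-in for one WAL entry (not the framework's type)."""

    sender_id: str
    event_type: str
    event_data: dict = field(default_factory=dict)


def gate_synthesis(wal, names_by_id, headline="Roundup"):
    """Return 'pending: ...' until every required spoke has a non-empty packet body."""
    by_spoke = {}
    for env in wal:
        if env.event_type != "packet":
            continue
        body = env.event_data.get("body", "")
        # Identify the sender by registry name, not by parsing the reply text.
        name = names_by_id.get(env.sender_id, "")
        if body and name in REQUIRED_SPOKES:
            by_spoke[name] = body  # a later reply from the same spoke overwrites
    missing = [s for s in REQUIRED_SPOKES if s not in by_spoke]
    if missing:
        return "pending: " + ", ".join(missing)
    lines = [f"**{headline}**", ""] + [f"- {by_spoke[s]}" for s in REQUIRED_SPOKES]
    return "\n".join(lines)


names = {"a0": "hub", "a1": "weather", "a2": "sports", "a3": "finance"}
wal = [
    Envelope("a1", "packet", {"body": "Partly cloudy."}),
    Envelope("a0", "packet", {"routing": {"kind": "handoff"}}),  # no body: ignored
    Envelope("a2", "packet", {"body": "Riverhawks won 2-1."}),
]
partial = gate_synthesis(wal, names)
wal.append(Envelope("a3", "packet", {"body": "S&P 500 up 0.4%."}))
complete = gate_synthesis(wal, names, headline="Daily Roundup")
print(partial)
print(complete)
```

The first call reports `pending: finance`; once the finance packet is appended, the second call returns the bulleted synthesis. In the real tool the completed branch also stores the text via `set_context` and returns `Finish`.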

### Routing Mechanics

* **Spokes return to hub.** `FromSpeaker(<spoke>) -> AgentTarget(hub)`
  rotates control back after each spoke replies.
* **Termination via `Finish`.** When `synthesise` has all
  three spoke replies, it returns `Finish(reason="answered")`. The
  adapter reads `Finish` from the `ToolResultEvent` and
  sets `routing.kind = "finish"`, which `fold` converts
  directly into a channel close - no `ToolCalled` graph rule needed.
* **Sender-identity detection.** `synthesise` identifies spoke
  replies by calling `hub.name_for(env.sender_id)` on each
  `EV_PACKET` in the WAL. This works regardless of the reply's
  text format - no `"[spoke]:"` prefix convention required.
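
The precedence described above (a `Finish` closes the channel, a `Handoff` names its target, and only then do graph rules apply) can be modelled in a few lines. This is a simplified sketch under assumptions: `resolve_next`, the `"stay"` marker, and the string return values are illustrative names, not the framework's internals.

```python
def resolve_next(speaker, routing, rules, default="terminate:fall_through"):
    """Simplified model of the routing order for one finished turn.

    routing: dict taken from the speaker's packet, e.g.
             {"kind": "handoff", "target": "weather"}; empty for plain text.
    rules:   ordered (from_speaker, target) pairs; "stay" keeps the same speaker.
    """
    kind = routing.get("kind")
    if kind == "finish":
        return "close:" + routing.get("reason", "")
    if kind == "handoff":
        return routing["target"]  # the typed target bypasses graph rules
    for from_speaker, target in rules:
        if from_speaker == speaker:
            return speaker if target == "stay" else target
    return default


RULES = [
    ("weather", "hub"),
    ("sports", "hub"),
    ("finance", "hub"),
    ("hub", "stay"),  # text-only hub turns re-route to the hub
    ("user", "hub"),
]

steps = [
    ("user", {}),
    ("hub", {"kind": "handoff", "target": "weather"}),
    ("weather", {}),
    ("hub", {}),  # synthesise returned "pending"; hub wrote a short note
    ("hub", {"kind": "finish", "reason": "answered"}),
]
trace = [resolve_next(speaker, routing, RULES) for speaker, routing in steps]
print(trace)
```

A speaker with no matching rule and no typed routing falls through to the default, which in the Star graph is `TerminateTarget("fall_through")`.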

!!! note "Parallel-call defence"
    Without mitigation, Sonnet may emit `ask_spoke`
    calls for multiple spokes in a single turn (the agent's
    internal multi-round loop). The first `Handoff` in the
    event list always wins routing, so the correct spoke is
    reached. Still, disabling parallel tool calls keeps the trace
    clean and prevents redundant LLM calls:

    ```python
    AnthropicConfig(
        model="claude-sonnet-4-6",
        extra_body={"tool_choice": {"type": "auto", "disable_parallel_tool_use": True}},
    )
    ```

    OpenAI exposes the analogous `parallel_tool_calls=False`
    as a typed field on `OpenAIConfig`; Gemini's behaviour
    is naturally serial.

## Agent Flow

```mermaid
sequenceDiagram
    participant User as user
    participant Hub as hub
    participant Weather as weather
    participant Sports as sports
    participant Finance as finance

    User->>Hub: question
    Hub->>Weather: Handoff(target="weather")
    Weather->>Hub: [weather reply]
    Hub->>Sports: Handoff(target="sports")
    Sports->>Hub: [sports reply]
    Hub->>Finance: Handoff(target="finance")
    Finance->>Hub: [finance reply]
    Hub->>Hub: synthesise reads WAL by sender identity, set_context("synthesis", ...)
    Note over Hub: synthesise returns Finish(reason="answered") -> channel closes
```

## Migrating from Classic to Beta?

| Classic | Beta |
|---|---|
| Coordinator routes by inspecting `ContextVariables` | Hub routes via a parameterised `ask_spoke` tool returning `Handoff(target=spoke)` |
| Spoke replies carry `ReplyResult.target=AgentTarget(coordinator)` | `FromSpeaker(<spoke>) -> AgentTarget(hub)` rule rotates control back |
| Synthesis triggered by checking aggregated context | Synthesis triggered by an explicit `synthesise` tool call; the tool reads the WAL via `HubInject`, gates on required senders, stores the result via `set_context`, and returns `Finish` to close the workflow |

## Code

!!! tip
    The hub uses real Sonnet (the routing decision is the LLM-driven
    part of the demo). The spokes use `TestConfig` with
    pre-canned deterministic replies so the synthesis turn can quote
    them cleanly without LLM-quality noise.

```python
"""Cookbook 03 - Star pattern.

A hub agent fans out to specialist spokes via a parameterised
``ask_spoke(spoke, query)`` tool returning ``Handoff(target=spoke)``.
A WAL-gated ``synthesise`` tool composes the final answer once all
required spokes have replied, then returns ``Finish(reason="answered")``
to terminate the workflow.
"""

import asyncio

from dotenv import load_dotenv

from autogen.beta import Agent
from autogen.beta.config import AnthropicConfig
from autogen.beta.knowledge import MemoryKnowledgeStore
from autogen.beta.network import (
    EV_CHANNEL_CLOSED,
    EV_CONTEXT_SET,
    EV_PACKET,
    EV_TEXT,
    WORKFLOW_TYPE,
    AgentTarget,
    ChannelInject,
    Finish,
    FromSpeaker,
    Handoff,
    Hub,
    HubClient,
    HubInject,
    LocalLink,
    Passport,
    Resume,
    StayTarget,
    TerminateTarget,
    Transition,
    TransitionGraph,
)
from autogen.beta.network.workflow_helpers import set_context
from autogen.beta.testing import TestConfig

load_dotenv()

_REQUIRED_SPOKES = ("weather", "sports", "finance")

async def ask_spoke(spoke: str, query: str) -> Handoff:
    """Route a query to one of the spokes. Returns a typed
    Handoff(target=spoke) so the framework routes the next turn
    directly to the named spoke."""
    print(f"  [tool] ask_spoke({spoke}): {query}")
    if spoke not in _REQUIRED_SPOKES:
        return Handoff(target="hub", reason=f"unknown spoke {spoke!r}")
    return Handoff(target=spoke, reason=query)

async def synthesise(headline: str, channel: ChannelInject, hub: HubInject) -> "Finish | str":
    """Scan the WAL for spoke replies by sender identity, build synthesis.

    Uses hub.name_for(sender_id) to identify each spoke - works
    regardless of reply text format, no "[spoke]:" prefix required.
    Returns Finish(reason="answered") once all three spokes have replied,
    terminating the workflow. Returns "pending: ..." until then.
    """
    if channel is None or hub is None:
        return "no channel or hub"
    print(f"  [tool] synthesise(headline={headline!r})")

    wal = await hub.read_wal(channel.channel_id)
    by_spoke: dict[str, str] = {}
    for env in wal:
        if env.event_type != EV_PACKET:
            continue
        body = env.event_data.get("body", "")
        if not body:
            continue
        name = hub.name_for(env.sender_id, default="")
        if name in _REQUIRED_SPOKES:
            by_spoke[name] = body

    missing = [s for s in _REQUIRED_SPOKES if s not in by_spoke]
    if missing:
        return f"pending: {', '.join(missing)}"

    bullets: list[str] = [f"**{headline.strip() or 'Roundup'}**", ""]
    for spoke in _REQUIRED_SPOKES:
        bullets.append(f"- {by_spoke[spoke]}")
    synthesis = "\n".join(bullets)

    await set_context(channel, "synthesis", synthesis)
    return Finish(reason="answered")

async def main() -> None:
    # Hub config: disable parallel tool calls so Sonnet emits exactly
    # one ask_spoke (or synthesise) per LLM response. Spokes use
    # TestConfig so they don't need this.
    hub_config = AnthropicConfig(
        model="claude-sonnet-4-6",
        extra_body={"tool_choice": {"type": "auto", "disable_parallel_tool_use": True}},
    )

    hub_obj = await Hub.open(MemoryKnowledgeStore(), ttl_sweep_interval=0)
    link = LocalLink(hub_obj)

    user_hc = HubClient(link, hub=hub_obj)
    hub_hc = HubClient(link, hub=hub_obj)
    weather_hc = HubClient(link, hub=hub_obj)
    sports_hc = HubClient(link, hub=hub_obj)
    finance_hc = HubClient(link, hub=hub_obj)

    user_agent = Agent("user", config=TestConfig())

    hub_agent = Agent(
        "hub",
        prompt=(
            "You are the hub of a Q&A star. Tools:\n"
            "\n"
            "* `ask_spoke(spoke, query)` - query ONE spoke by name "
            "(`weather`, `sports`, `finance`). Returns a Handoff. The "
            "spoke's reply appears in your conversation on a later turn. "
            "The tool's return value is just a routing token; ignore it "
            "and wait for the spoke's reply on a future turn.\n"
            "* `synthesise(headline)` - call ONCE when all three "
            "spoke replies are visible in your conversation history. "
            "Builds the final synthesis from the WAL and ends the "
            "workflow.\n"
            "\n"
            "Protocol:\n"
            "1. Call exactly ONE tool per turn.\n"
            "2. Each turn, check your conversation history for spoke "
            "replies. Call `ask_spoke` for a spoke that has NOT yet "
            "replied.\n"
            "3. Once all three spoke replies are visible, call "
            "`synthesise` with a short headline like 'Daily Roundup'.\n"
            "\n"
            "If `synthesise` returns `pending: ...` some spokes are "
            "still missing - call `ask_spoke` for the first listed "
            "missing spoke on your next turn."
        ),
        config=hub_config,
    )
    hub_agent.tool(ask_spoke)
    hub_agent.tool(synthesise)

    weather_agent = Agent(
        "weather",
        config=TestConfig(
            "[weather]: Partly cloudy, 68°F, light southwesterly breeze; no precipitation expected."
        ),
    )
    sports_agent = Agent(
        "sports",
        config=TestConfig(
            "[sports]: The Riverhawks won 2-1 last night, with the winning goal scored in the 87th minute."
        ),
    )
    finance_agent = Agent(
        "finance",
        config=TestConfig(
            "[finance]: S&P 500 closed up 0.4% on cooling inflation data ahead of next week's Fed meeting."
        ),
    )

    user = await user_hc.register(user_agent, Passport(name="user"), Resume())
    central = await hub_hc.register(hub_agent, Passport(name="hub"), Resume())
    weather = await weather_hc.register(weather_agent, Passport(name="weather"), Resume())
    sports = await sports_hc.register(sports_agent, Passport(name="sports"), Resume())
    finance = await finance_hc.register(finance_agent, Passport(name="finance"), Resume())

    graph = TransitionGraph(
        initial_speaker=user.agent_id,
        transitions=[
            # Spokes always return to hub.
            Transition(when=FromSpeaker(weather.agent_id), then=AgentTarget(central.agent_id)),
            Transition(when=FromSpeaker(sports.agent_id),  then=AgentTarget(central.agent_id)),
            Transition(when=FromSpeaker(finance.agent_id), then=AgentTarget(central.agent_id)),
            # Hub stays on text turns (synthesise returns "pending" and
            # hub writes a short note). Prevents silent-round stalls.
            # Handoff turns (ask_spoke) and Finish turns (synthesise done)
            # bypass this rule via routing.target / routing.kind.
            Transition(when=FromSpeaker(central.agent_id), then=StayTarget()),
            # User's question -> hub.
            Transition(when=FromSpeaker(user.agent_id), then=AgentTarget(central.agent_id)),
        ],
        default_target=TerminateTarget("fall_through"),
        max_turns=20,
    )

    channel = await user.open(
        type=WORKFLOW_TYPE,
        target=[central.agent_id, weather.agent_id, sports.agent_id, finance.agent_id],
        knobs={"graph": graph.to_dict()},
    )
    print(f"channel: {channel.channel_id}\n")

    name_by_id = {
        user.agent_id: "user",
        central.agent_id: "hub",
        weather.agent_id: "weather",
        sports.agent_id: "sports",
        finance.agent_id: "finance",
    }

    await channel.send(
        "What's the weather like and how did the local football team do? "
        "Also a quick word on the markets."
    )

    close_env = await user.wait_for_channel_event(
        channel_id=channel.channel_id,
        predicate=lambda e: e.event_type == EV_CHANNEL_CLOSED,
        timeout=240.0,
    )

    for env in await hub_obj.read_wal(channel.channel_id):
        speaker = name_by_id.get(env.sender_id, env.sender_id[:8])
        if env.event_type == EV_TEXT:
            print(f"{speaker:>14}: {env.event_data['text']}")
        elif env.event_type == EV_PACKET:
            routing = env.event_data.get("routing", {}) or {}
            if routing.get("kind") == "handoff":
                line = f"[Handed off via {routing.get('tool', '')}] {routing.get('reason', '')}"
                print(f"{speaker:>14}: {line.rstrip()}")
            elif routing.get("kind") == "finish":
                print(f"{speaker:>14}: [Finish] {routing.get('reason', '')}")
            body = env.event_data.get("body", "")
            if body:
                print(f"{speaker:>14}: {body}")

    print(f"\nclosed: reason={close_env.event_data.get('reason')!r}")

    print("\n--- final synthesis ---")
    synthesis = "(no synthesis)"
    for env in await hub_obj.read_wal(channel.channel_id):
        if env.event_type == EV_CONTEXT_SET:
            kv = env.event_data.get("set", {})
            if "synthesis" in kv:
                synthesis = kv["synthesis"]
    print(synthesis)

    await user_hc.close()
    await hub_hc.close()
    await weather_hc.close()
    await sports_hc.close()
    await finance_hc.close()
    await hub_obj.close()

if __name__ == "__main__":
    asyncio.run(main())
```

## Output

```console
channel: 4f2e...

           user: What's the weather like and how did the local football team do? Also a quick word on the markets.
  [tool] ask_spoke(weather): What is the current weather like?
            hub: [Handed off via ask_spoke] What is the current weather like?
        weather: [weather]: Partly cloudy, 68°F, light southwesterly breeze; no precipitation expected.
  [tool] ask_spoke(sports): How did the local football team do?
            hub: [Handed off via ask_spoke] How did the local football team do?
         sports: [sports]: The Riverhawks won 2-1 last night, with the winning goal scored in the 87th minute.
  [tool] ask_spoke(finance): Quick market summary
            hub: [Handed off via ask_spoke] Quick market summary
        finance: [finance]: S&P 500 closed up 0.4% on cooling inflation data ahead of next week's Fed meeting.
  [tool] synthesise(headline='Daily Roundup')
            hub: [Finish] answered

closed: reason='answered'

--- final synthesis ---
**Daily Roundup**

- [weather]: Partly cloudy, 68°F, light southwesterly breeze; no precipitation expected.
- [sports]: The Riverhawks won 2-1 last night, with the winning goal scored in the 87th minute.
- [finance]: S&P 500 closed up 0.4% on cooling inflation data ahead of next week's Fed meeting.
```

---

# Coordinator (Manager + Specialists)

Source: https://docs.ag2.ai/latest/docs/beta/network/pattern_cookbook/coordinator/

A coordinator agent at the centre with several specialists. The coordinator's LLM decides which specialist to consult next via a single generic `handoff(to)` tool, and ends the conversation via a `finish(summary)` tool. No per-specialist routing rules, no graph rewiring when a specialist is added or removed.

This is the closest beta equivalent to classic AG2's `AutoPattern` group chat with LLM-driven manager handoffs.

**Classic (non-beta) primitives:** `AutoPattern` with `GroupChat`, `GroupChatManager`, LLM-selected `next_agent`, and per-specialist handoff registration.

### Key Characteristics

* **One generic handoff tool.** The coordinator's LLM picks the next specialist by *passing the name as a parameter* (`handoff(to="researcher")`). Adding a fifth specialist doesn't add a new tool - just a new participant on the channel (and, if `to` is typed as a `Literal[...]`, one more name in that type).
* **One finish tool.** The coordinator ends the discussion by returning `Finish(summary=...)`. The framework closes the channel cleanly via the typed-return routing path.
* **Both routing tools are terminal.** `handoff` and `finish` each return a `ToolResult(..., final=True)` - calling either *ends the coordinator's round immediately*, so each turn produces exactly one routing decision. (Without this the round runs a multi-step tool loop and emits several routing intents; see the admonition below.)
* **Empty transitions list.** The graph has no per-specialist rules. Dynamic `Handoff` resolution routes coordinator -> specialist; `default_target=AgentTarget(coordinator)` routes every other turn (the user's kickoff, each specialist's reply) back to the coordinator.

The whole topology fits in four lines of `TransitionGraph`:

```python
graph = TransitionGraph(
    initial_speaker=user.agent_id,                     # the user kicks off
    transitions=[],                                    # no per-specialist rules
    default_target=AgentTarget(coordinator.agent_id),  # every non-handoff turn -> coordinator
    max_turns=14,
)
```
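
To see why an empty `transitions` list is enough, the sketch below walks a scripted conversation through a zero-rule graph. It is a toy model under assumptions: `next_speaker` and `walk` are hypothetical helpers, and the coordinator's decisions are hard-coded rather than LLM-driven.

```python
def next_speaker(routing, default_target):
    """Zero-rule graph: a handoff names its target, anything else goes to the default."""
    if routing.get("kind") == "handoff":
        return routing["target"]
    return default_target


def walk(decisions, default_target="coordinator", initial="user"):
    """Return the speaker order produced by a list of scripted coordinator decisions."""
    order = [initial]
    speaker = next_speaker({}, default_target)  # the kickoff carries no routing
    for decision in decisions:
        order.append(speaker)  # the coordinator takes its turn
        if decision["kind"] == "finish":
            break  # channel closes
        order.append(next_speaker(decision, default_target))  # specialist replies
        speaker = next_speaker({}, default_target)  # plain reply -> default target
    return order


order = walk(
    [
        {"kind": "handoff", "target": "researcher"},
        {"kind": "handoff", "target": "critic"},
        {"kind": "finish"},
    ]
)
print(order)
```

Every specialist reply lands back on the coordinator purely through the default target, so adding a specialist changes the scripted decisions but not the graph.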

### Tools the coordinator carries

Both tools are user-authored on the coordinator agent. Each returns a `ToolResult(..., final=True)` wrapping the typed routing object:

```python
from typing import Literal
from autogen.beta import ToolResult
from autogen.beta.network import Finish, Handoff

async def handoff(to: Literal["researcher", "critic", "cost_analyst", "operator"], reason: str = "") -> ToolResult:
    """Hand off the conversation to a participant by name."""
    return ToolResult(Handoff(target=to, reason=reason), final=True)

async def finish(summary: str = "") -> ToolResult:
    """End the conversation cleanly with a brief summary."""
    return ToolResult(Finish(summary=summary), final=True)

# ... then register them on the coordinator agent:
coordinator_agent.tool(handoff)
coordinator_agent.tool(finish)
```

!!! warning "Why the routing tools must be terminal (`final=True`)"
    An `Agent.ask` round runs a **multi-step tool loop**: the LLM calls a tool, sees the result, and may call another - repeating until it stops. A coordinator with plain (non-terminal) `handoff` / `finish` tools can therefore call `handoff` *and* `finish` within a single round.

    The workflow resolves a round's routing **first-emit-wins** - it acts on the first routing tool and silently drops the rest. So a round that calls `handoff` then `finish` routes the handoff and *loses the finish* - the channel never closes and the coordinator loops.

    Returning `ToolResult(..., final=True)` makes each routing tool **terminal**: the first call ends the round, so every coordinator turn carries exactly one routing intent. Pair it with disabling parallel tool calls on the coordinator's config (`disable_parallel_tool_use` - see the [Star pattern](star.md)'s "Parallel-call defence" note) so the model also commits to one tool per response.
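
The interaction between the tool loop and first-emit-wins resolution can be illustrated with a toy model. The functions below are hypothetical and only mimic the behaviour described in the warning; they do not call the framework.

```python
def run_round(planned_calls, terminal):
    """Model one coordinator round.

    planned_calls: routing tools the LLM would call, in order, if allowed to continue.
    terminal:      True models ToolResult(..., final=True), which ends the round
                   after the first call.
    """
    emitted = []
    for call in planned_calls:
        emitted.append(call)
        if terminal:
            break
    return emitted


def first_emit_wins(emitted):
    """Act on the first routing intent; everything after it is dropped."""
    return emitted[0], emitted[1:]


plan = ["handoff", "finish"]
acted_plain, dropped_plain = first_emit_wins(run_round(plan, terminal=False))
acted_final, dropped_final = first_emit_wins(run_round(plan, terminal=True))
print(acted_plain, dropped_plain)
print(acted_final, dropped_final)
```

With non-terminal tools the `finish` is emitted and silently dropped; with terminal tools it is never emitted in that round, so the coordinator gets to issue it on a later turn where it is the first (and only) routing intent.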

!!! tip "Constraining `to` to known names"
    Typing `to` as a `Literal[...]` over the participant names lets the LLM's tool schema constrain its picks at the model layer - the LLM can't invent a name. If you'd rather keep targets fully dynamic (e.g. the participant set isn't known at decoration time), use `to: str` and list the valid names in the docstring; the framework's `hub.name_for` resolves unknown names to themselves, so a hallucinated target surfaces as a `validate_send` mismatch on the next envelope.
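
If you go the `to: str` route, one option is to keep the `Literal` as the single source of valid names and check against it at runtime. This sketch uses only the standard library's `typing.get_args`; `SpecialistName` and `check_target` are hypothetical names, and how the framework itself builds the tool schema is not shown here.

```python
from typing import Literal, get_args

SpecialistName = Literal["researcher", "critic", "cost_analyst", "operator"]
VALID = get_args(SpecialistName)  # tuple of the literal values, in declaration order


def check_target(to: str) -> str:
    """Reject names outside the known roster before building a Handoff."""
    if to not in VALID:
        raise ValueError(f"unknown specialist {to!r}; expected one of {', '.join(VALID)}")
    return to


print(check_target("critic"))
try:
    check_target("intern")
except ValueError as exc:
    print(exc)
```

Failing fast inside the tool gives the LLM an immediate, readable error instead of a mismatch that only surfaces on the next envelope.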

### How the coordinator knows the roster

For the coordinator to pick the *right* specialist it needs to know what each one is good at. That knowledge comes from two author-supplied sources:

* **The `handoff` tool's `Literal[...]`** - constrains *which* names are valid, surfaced to the LLM as a tool-schema enum. It says nothing about what each specialist does.
* **The system prompt's roster block** - describes *what each specialist is for*. The code below builds it from each specialist's `Resume.summary`:

```python
researcher_resume = Resume(
    summary="Evidence-based analyst - cites studies, benchmarks, and data.",
    claimed_capabilities=["research", "literature-review"],
)
critic_resume = Resume(
    summary="Skeptical risk analyst - surfaces failure modes and overlooked costs.",
    claimed_capabilities=["risk-analysis", "review"],
)
cost_resume = Resume(
    summary="Cost analyst - models total cost of ownership and budget impact.",
    claimed_capabilities=["cost-modelling", "budgeting"],
)
operator_resume = Resume(
    summary="Operations engineer - speaks to day-2 ops, on-call load, and maintenance.",
    claimed_capabilities=["operations", "reliability"],
)

roster = {
    "researcher": researcher_resume,
    "critic": critic_resume,
    "cost_analyst": cost_resume,
    "operator": operator_resume,
}
roster_block = "\n".join(f"- `{name}`: {r.summary}" for name, r in roster.items())
# coordinator prompt embeds f"{roster_block}"; the same Resume objects
# are passed to register() so the prompt and the hub registry agree.
```

Each `Resume` is built once and used twice - folded into the coordinator's prompt, and passed to `register(...)` so the hub registry carries the same description. Because the prompt is fixed when the coordinator `Agent` is constructed, the resumes must exist first.
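
For reference, this is roughly what the rendered roster looks like once it is folded into the prompt. `ResumeStub` is a hypothetical stand-in that carries only the `summary` field, so the snippet runs without the framework installed.

```python
from dataclasses import dataclass


@dataclass
class ResumeStub:
    """Illustrative stand-in for Resume; only the field the prompt uses."""

    summary: str


roster = {
    "researcher": ResumeStub("Evidence-based analyst - cites studies, benchmarks, and data."),
    "critic": ResumeStub("Skeptical risk analyst - surfaces failure modes and overlooked costs."),
}
roster_block = "\n".join(f"- `{name}`: {r.summary}" for name, r in roster.items())
print(roster_block)
```

Dicts preserve insertion order, so the roster lines appear in the order the specialists were added.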

!!! note "Static snapshot vs. dynamic discovery"
    Baking the roster into the prompt captures it at construction time - fine for a fixed-roster channel. If specialists join or leave mid-run, the prompt goes stale. For a churning roster, register the coordinator with `attach_plugin=True` (the default) and let it call the `peers` tool to read peers' `Resume` / `skill_md` from the hub at runtime. You can also fetch a resume directly with `await hub.get_resume(agent_id)`.

### Agent Flow

```mermaid
sequenceDiagram
    participant User as user
    participant C as coordinator
    participant R as researcher
    participant K as critic
    participant Co as cost_analyst
    participant O as operator

    User->>C: kickoff (default_target -> coordinator)
    C->>R: handoff(to="researcher") - final=True ends the round
    R->>C: text reply (default_target -> coordinator)
    C->>K: handoff(to="critic")
    K->>C: text reply (default_target -> coordinator)
    C->>Co: handoff(to="cost_analyst")
    Co->>C: text reply (default_target -> coordinator)
    C->>O: handoff(to="operator")
    O->>C: text reply (default_target -> coordinator)
    C->>C: finish(summary=...)
    Note over C: routing.kind="finish" -> channel closes
```

## Migrating from Classic to Beta?

| Classic | Beta |
|---|---|
| `AutoPattern` builds the group chat from `agents=[...]` + an LLM-driven `group_manager` | `TransitionGraph(initial_speaker=user, transitions=[], default_target=AgentTarget(coordinator))` plus the coordinator's two tools |
| `GroupChatManager` runs the manager's LLM each turn and parses `next_agent` from its output | Coordinator is a regular `Agent` whose LLM calls `handoff(to=...)`; the framework reads the typed `Handoff` return and routes |
| One pseudo-tool per agent in classic, generated by `AutoPattern` | One generic `handoff(to)` tool the user writes once |
| Termination via `is_termination_msg` predicate or `max_round` | Termination via the coordinator's `finish(summary)` tool returning `Finish`, or `max_turns` on the graph |

## Code

!!! tip
    Every LLM-driven agent uses `AnthropicConfig(model="claude-sonnet-4-6")` - set `ANTHROPIC_API_KEY` in your environment (the script calls `load_dotenv()`), or swap in `OpenAIConfig` / `GeminiConfig` to run on another provider. The `user` agent uses `TestConfig()` because it only opens the channel and sends the kickoff - it never takes an LLM turn. All agents register with `attach_plugin=False`: none of them need the `NetworkPlugin` tools (`say` / `delegate` / `peers` / ...), and leaving those off keeps the coordinator's tool surface to just `handoff` and `finish`.

```python
"""Cookbook - Coordinator (manager + specialists) pattern.

One coordinator at the centre directs several specialists via a single
generic ``handoff(to)`` tool. Specialists reply; the channel default
routes every non-handoff turn back to the coordinator. The coordinator
ends the channel with a ``finish(summary)`` tool.

Both routing tools return ``ToolResult(..., final=True)`` so each
coordinator round produces exactly one routing decision.
"""

import asyncio
from typing import Literal

from dotenv import load_dotenv

from autogen.beta import Agent, ToolResult
from autogen.beta.config import AnthropicConfig
from autogen.beta.knowledge import MemoryKnowledgeStore
from autogen.beta.network import (
    EV_CHANNEL_CLOSED,
    EV_PACKET,
    EV_TEXT,
    WORKFLOW_TYPE,
    AgentTarget,
    Finish,
    Handoff,
    Hub,
    HubClient,
    LocalLink,
    Passport,
    Resume,
    TransitionGraph,
)
from autogen.beta.testing import TestConfig

load_dotenv()

async def handoff(to: Literal["researcher", "critic", "cost_analyst", "operator"], reason: str = "") -> ToolResult:
    """Hand off the conversation to a specialist by name."""
    return ToolResult(Handoff(target=to, reason=reason), final=True)

async def finish(summary: str = "") -> ToolResult:
    """End the conversation with a summary recommendation."""
    return ToolResult(Finish(summary=summary), final=True)

async def main() -> None:
    config = AnthropicConfig(
        model="claude-sonnet-4-6",
        extra_body={"tool_choice": {"type": "auto", "disable_parallel_tool_use": True}},
    )

    hub_obj = await Hub.open(MemoryKnowledgeStore(), ttl_sweep_interval=0)
    link = LocalLink(hub_obj)

    user_hc = HubClient(link, hub=hub_obj)
    coord_hc = HubClient(link, hub=hub_obj)
    researcher_hc = HubClient(link, hub=hub_obj)
    critic_hc = HubClient(link, hub=hub_obj)
    cost_hc = HubClient(link, hub=hub_obj)
    operator_hc = HubClient(link, hub=hub_obj)

    # Each Resume is built once and used twice: folded into the
    # coordinator's prompt and passed to register().
    researcher_resume = Resume(
        summary="Evidence-based analyst - cites studies, benchmarks, and data.",
        claimed_capabilities=["research", "literature-review"],
    )
    critic_resume = Resume(
        summary="Skeptical risk analyst - surfaces failure modes and overlooked costs.",
        claimed_capabilities=["risk-analysis", "review"],
    )
    cost_resume = Resume(
        summary="Cost analyst - models total cost of ownership and budget impact.",
        claimed_capabilities=["cost-modelling", "budgeting"],
    )
    operator_resume = Resume(
        summary="Operations engineer - speaks to day-2 ops, on-call load, and maintenance.",
        claimed_capabilities=["operations", "reliability"],
    )

    roster = {
        "researcher": researcher_resume,
        "critic": critic_resume,
        "cost_analyst": cost_resume,
        "operator": operator_resume,
    }
    roster_block = "\n".join(f"- `{name}`: {r.summary}" for name, r in roster.items())

    user_agent = Agent("user", config=TestConfig())

    coord_agent = Agent(
        "coordinator",
        prompt=(
            "You coordinate a discussion between these specialists:\n"
            f"{roster_block}\n"
            "\n"
            "Each turn, do exactly ONE of:\n"
            "- Call `handoff(to=<name>, reason=...)` to consult a "
            "specialist who has not yet weighed in.\n"
            "- Call `finish(summary=...)` once ALL specialists have "
            "contributed, summarising a recommendation."
        ),
        config=config,
    )
    coord_agent.tool(handoff)
    coord_agent.tool(finish)

    researcher_agent = Agent(
        "researcher",
        prompt=(
            "You are a researcher. When consulted, give a concise, "
            "evidence-based input - cite relevant studies, benchmarks, "
            "or data. 2-3 sentences. You have no tools; just reply."
        ),
        config=config,
    )
    critic_agent = Agent(
        "critic",
        prompt=(
            "You are a critic. When consulted, give a skeptical, "
            "risk-focused take - what could go wrong, what's being "
            "overlooked. 2-3 sentences. You have no tools; just reply."
        ),
        config=config,
    )
    cost_agent = Agent(
        "cost_analyst",
        prompt=(
            "You are a cost analyst. When consulted, give a concise "
            "take on total cost of ownership and budget impact - "
            "name the cost drivers. 2-3 sentences. You have no tools; "
            "just reply."
        ),
        config=config,
    )
    operator_agent = Agent(
        "operator",
        prompt=(
            "You are an operations engineer. When consulted, speak to "
            "day-2 operations - on-call load, maintenance, and "
            "reliability. 2-3 sentences. You have no tools; just reply."
        ),
        config=config,
    )

    user = await user_hc.register(user_agent, Passport(name="user"), Resume(), attach_plugin=False)
    coordinator = await coord_hc.register(
        coord_agent, Passport(name="coordinator"), Resume(), attach_plugin=False
    )
    researcher = await researcher_hc.register(
        researcher_agent, Passport(name="researcher"), researcher_resume, attach_plugin=False
    )
    critic = await critic_hc.register(critic_agent, Passport(name="critic"), critic_resume, attach_plugin=False)
    cost_analyst = await cost_hc.register(
        cost_agent, Passport(name="cost_analyst"), cost_resume, attach_plugin=False
    )
    operator = await operator_hc.register(
        operator_agent, Passport(name="operator"), operator_resume, attach_plugin=False
    )

    graph = TransitionGraph(
        initial_speaker=user.agent_id,
        transitions=[],
        default_target=AgentTarget(coordinator.agent_id),
        max_turns=14,
    )

    channel = await user.open(
        type=WORKFLOW_TYPE,
        target=[
            coordinator.agent_id,
            researcher.agent_id,
            critic.agent_id,
            cost_analyst.agent_id,
            operator.agent_id,
        ],
        knobs={"graph": graph.to_dict()},
    )
    print(f"channel: {channel.channel_id}\n")

    name_by_id = {
        user.agent_id: "user",
        coordinator.agent_id: "coordinator",
        researcher.agent_id: "researcher",
        critic.agent_id: "critic",
        cost_analyst.agent_id: "cost_analyst",
        operator.agent_id: "operator",
    }

    await channel.send("Should we adopt Kubernetes for an 8-person startup?")

    close_env = await user.wait_for_channel_event(
        channel_id=channel.channel_id,
        predicate=lambda e: e.event_type == EV_CHANNEL_CLOSED,
        timeout=180.0,
    )

    for env in await hub_obj.read_wal(channel.channel_id):
        speaker = name_by_id.get(env.sender_id, env.sender_id[:8])
        if env.event_type == EV_TEXT:
            print(f"{speaker:>13}: {env.event_data['text']}")
        elif env.event_type == EV_PACKET:
            routing = env.event_data.get("routing", {}) or {}
            kind = routing.get("kind")
            if kind == "handoff":
                target = name_by_id.get(routing.get("target", ""), "?")
                print(f"{speaker:>13}: [handoff -> {target}] {routing.get('reason', '')}")
            elif kind == "finish":
                print(f"{speaker:>13}: [finish] {routing.get('summary', '')}")
            else:
                body = env.event_data.get("body", "")
                if body:
                    print(f"{speaker:>13}: {body}")

    print(f"\nclosed: reason={close_env.event_data.get('reason')!r}")

    await user_hc.close()
    await coord_hc.close()
    await researcher_hc.close()
    await critic_hc.close()
    await cost_hc.close()
    await operator_hc.close()
    await hub_obj.close()

if __name__ == "__main__":
    asyncio.run(main())
```

## Output

Output from a real `claude-sonnet-4-6` run. The coordinator's
turn-by-turn selection is LLM-driven, so wording varies between runs;
the specialist replies and the final summary below are abridged for length.

```console
channel: e1d5855c...

         user: Should we adopt Kubernetes for an 8-person startup?
  coordinator: [handoff -> researcher] Need evidence-based analysis on Kubernetes adoption for small startups, including benchmarks and industry data.
   researcher: A CNCF survey (2022) found Kubernetes complexity is the #1 challenge, and
              teams without dedicated infra staff spend disproportionate time on operations.
              Managed alternatives (Railway, Render, ECS) handle startup-scale workloads
              adequately; ROI on Kubernetes typically appears at 50+ engineers or 20+
              services.
  coordinator: [handoff -> critic] Researcher has weighed in. Surface failure modes and hidden risks on both sides.
       critic: The "managed platform" advice glosses over vendor lock-in: Railway and
              Render have repriced and had outages that leave small teams stranded. There's
              also a knowledge-concentration risk - if the one engineer who understands
              your cluster leaves, you can't safely modify production. And both sides
              assume you know your scaling trajectory, which most 8-person startups don't.
  coordinator: [handoff -> cost_analyst] Need TCO modelling comparing Kubernetes vs. managed alternatives.
 cost_analyst: Kubernetes Year 1 TCO is 3-5x higher than managed platforms: ~$75-100K/year
              in engineering labor (0.5 FTE at startup salaries) plus tooling, vs. $5-20K
              on Render/ECS. The cost is front-loaded and fixed regardless of traction -
              dangerous burn-rate structure pre-PMF.
  coordinator: [handoff -> operator] Need day-2 operations perspective before finalising.
     operator: Week to week, Kubernetes means cert rotations, node upgrades, etcd health,
              and CrashLoopBackOffs landing on whoever knows the cluster - often at 2am
              before a product sprint. Managed platforms push that entire maintenance
              surface to the vendor, which at your size is almost always the right trade.
  coordinator: [finish] ## Recommendation: Skip Kubernetes - For Now
              Evidence, cost, risk, and ops all point the same way. Use a managed platform
              (Render, Fly.io, AWS ECS) until you have a dedicated platform hire and
              genuine multi-service complexity. Containerise now with Docker to keep the
              future migration tractable.

closed: reason='finished'
```

## When to reach for this pattern

This pattern works when:

* The coordinator's LLM is the only decision-maker - specialists answer when asked but don't pick the next turn.
* The set of specialists is small enough that the coordinator can reason about which one to consult (typically < 10).
* You want to add or remove specialists by editing the participant list, not by rewriting graph rules.
* Termination is a deliberate LLM call (`finish`), not a side-effect of state or a turn cap.

When it's not a fit:

* If specialists hand off to each other (specialist -> specialist without going through the coordinator), you need explicit `FromSpeaker -> AgentTarget` rules - the default-to-coordinator target no longer covers it. See [Hierarchical](hierarchical.md) for nested coordination.
* If the LLM needs to decide *whether* to continue rather than *who* to ask, you may want a `ContextEquals` gate. See [Feedback Loop](feedback_loop.md).
* If you need parallel fan-out + synthesis (ask all specialists, then summarise), see [Star](star.md) - same idea, different control flow.
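
The default-to-coordinator routing described above can be modelled in a few lines of plain Python. This is a toy sketch, not the AG2 API: `next_speaker` and the dict-shaped routing packets are hypothetical stand-ins that only mimic the `handoff` and `finish` kinds seen in the transcript.

```python
from typing import Optional

COORDINATOR = "coordinator"


def next_speaker(routing: dict) -> Optional[str]:
    """Return who speaks next, or None when the workflow ends.

    Hypothetical model: a `handoff` packet names its target, a `finish`
    packet ends the run, and anything else falls back to the coordinator
    (the graph's default target).
    """
    kind = routing.get("kind")
    if kind == "finish":
        return None
    if kind == "handoff":
        return routing["target"]
    return COORDINATOR


# A scripted run: the coordinator consults two specialists, then finishes.
packets = [
    {},  # user message carries no routing -> coordinator
    {"kind": "handoff", "target": "researcher"},
    {},  # researcher reply -> back to coordinator
    {"kind": "handoff", "target": "critic"},
    {},  # critic reply -> back to coordinator
    {"kind": "finish", "summary": "done"},
]
order = [next_speaker(p) for p in packets]
print(order)
```

Because specialists never emit routing of their own in this model, adding a specialist changes only the set of valid handoff targets, not the fallback rule.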

## See Also

- [Workflow Adapter](../workflow.md) - the underlying graph machinery, dynamic `Handoff`, and `Finish` typed return.
- [Star](star.md) - fan-out variant with WAL-gated synthesis instead of LLM-driven turn-by-turn selection.
- [Hierarchical](hierarchical.md) - when specialists need their own sub-coordinators.
- [Migrating from Group Chat](../migration_from_group_chat.md) - the broader translation table for classic patterns.

---

# Escalation

Source: https://docs.ag2.ai/latest/docs/beta/network/pattern_cookbook/escalation/

The Escalation pattern routes a request up a tiered support stack.
Each tier either resolves the request (terminating the workflow) or
escalates to the next tier.

**Classic (non-beta) primitives:** `DefaultPattern`,
`ExpressionContextCondition("confidence < 7")`, and
`ReplyResult(context_variables={"confidence": 6}, target=AgentTarget(tier2))`,
which tier-1 returns to record its self-rated confidence and hand off.

### Key Characteristics

* **Fixed tier order.** `tier1` -> `tier2` -> `senior`. Each tier knows
  only its own tools, so the senior agent has no `escalate_to_*` tool
  registered - the buck stops there structurally, not by prompt
  convention.
* **Two routing idioms in one graph.** Escalation uses typed
  `Handoff` returns; termination uses a `ToolCalled` graph
  rule. The two compose freely.

### Routing Mechanics

* **Typed `Handoff` returns** for escalation - each tier's
  `escalate_to_*` tool returns a `Handoff(target=...)`
  carrying the next tier's name. The framework reads it from the
  agent's local `ToolResultEvent` stream after the round and
  stamps it onto the packet's `routing.target`. Mirrors classic
  AG2's `ReplyResult(target=AgentTarget(...))` pattern.
* **`set_context` + `ToolCalled` graph rule** for
  termination - the `resolve` tool writes the answer into
  `context_vars["resolution"]` and a `ToolCalled("resolve")`
  graph rule fires the terminate transition.
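
How the two idioms combine can be illustrated with a small stand-alone model. This is a sketch under stated assumptions, not the framework's implementation: `Packet` and `route` are hypothetical names, and the order in which the checks run here is an assumption made for the example.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Packet:
    """Hypothetical stand-in for one agent turn (not the AG2 packet type)."""

    speaker: str
    tools_called: list = field(default_factory=list)
    handoff_target: Optional[str] = None


def route(packet: Packet) -> str:
    """Decide the next step for a packet in this toy model."""
    # Graph rule: a `resolve` tool call terminates the workflow.
    if "resolve" in packet.tools_called:
        return "terminate:resolved"
    # Typed handoff: the tool's return value names the next tier.
    if packet.handoff_target is not None:
        return packet.handoff_target
    # Graph rule: the user's question goes to tier1.
    if packet.speaker == "user":
        return "tier1"
    # Nothing matched: fall through to the default terminate target.
    return "terminate:fall_through"


# The refrigerator-compressor path: tier1 -> tier2 -> senior -> resolved.
trace = [
    Packet("user"),
    Packet("tier1", ["escalate_to_tier2"], handoff_target="tier2"),
    Packet("tier2", ["escalate_to_senior"], handoff_target="senior"),
    Packet("senior", ["resolve"]),
]
steps = [route(p) for p in trace]
print(steps)
```

A tier that calls neither tool lands on the fall-through branch, which corresponds to the role of `default_target=TerminateTarget("fall_through")` in the graph.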

## Agent Flow

```mermaid
sequenceDiagram
    participant User as user
    participant Tier1 as tier1
    participant Tier2 as tier2
    participant Senior as senior

    User->>Tier1: question
    alt question in tier-1 scope
        Tier1->>User: resolve(answer); ToolCalled("resolve") -> terminate
    else out of scope
        Tier1->>Tier2: Handoff(target="tier2", reason=...)
        alt question in tier-2 scope
            Tier2->>User: resolve(answer); ToolCalled("resolve") -> terminate
        else genuinely edge case
            Tier2->>Senior: Handoff(target="senior", reason=...)
            Senior->>User: resolve(answer); ToolCalled("resolve") -> terminate
        end
    end
```

## Migrating from Classic to Beta?

| Classic | Beta |
|---|---|
| `ReplyResult(target=AgentTarget(tier2), context_variables={...})` | `return Handoff(target="tier2", reason=...)` |
| `ContextVariables[...] = answer` to record the answer | `await set_context(channel, "resolution", answer)` |
| Confidence threshold via `ExpressionContextCondition("confidence < 7")` | Custom `ContextThreshold` condition (recipe in [Context Variables -> Custom Conditions](../context_variables.md#custom-conditions)) |

### Gaps & Workarounds

* **`ContextThreshold` not shipped as a built-in.** If you'd
  rather the *graph* check a confidence threshold (e.g.
  `"confidence < 7"` decides escalation, not the tool name),
  register a `ContextThreshold` condition.
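
As a rough idea of what such a condition evaluates, here is a self-contained sketch. Only the name `ContextThreshold` comes from the text above; the constructor fields, the `matches` method, and the way a condition would be registered with the graph are assumptions, so treat the linked recipe as the authority for the real interface.

```python
import operator
from dataclasses import dataclass

# Comparison operators supported by this illustrative condition.
_OPS = {
    "<": operator.lt,
    "<=": operator.le,
    ">": operator.gt,
    ">=": operator.ge,
}


@dataclass(frozen=True)
class ContextThreshold:
    """Hypothetical condition: compare one context variable to a constant."""

    key: str
    op: str
    value: float

    def matches(self, context_vars: dict) -> bool:
        # A missing key never matches, so the rule stays inert until
        # some tool has actually written the value.
        if self.key not in context_vars:
            return False
        return _OPS[self.op](context_vars[self.key], self.value)


low_confidence = ContextThreshold("confidence", "<", 7)
print(low_confidence.matches({"confidence": 6}))
print(low_confidence.matches({"confidence": 9}))
```

With a condition like this, tier-1 would record its confidence in context and the graph, rather than the tool name, would decide whether to escalate.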

## Code

!!! tip
    All three tiers use real Sonnet so the escalation decision is
    genuinely LLM-driven. The demo's tier-2 prompt has a small rule
    that forces escalation when the question mentions "compressor",
    "GDPR", or "compliance" - the sample input ("refrigerator
    compressor") follows that path all the way to senior.

```python
"""Cookbook 04 - Escalation pattern.

A request flows up a tiered support stack. Each tier either calls
``resolve`` (terminating the workflow) or escalates to the next
tier. Demonstrates two routing idioms together:

* Typed ``Handoff`` returns for escalation - each tier's
  ``escalate_to_*`` tool returns a ``Handoff(target=...)`` carrying
  the next tier's name. Mirrors classic AG2's
  ``ReplyResult(target=AgentTarget(...))`` pattern.
* ``set_context`` + ``ToolCalled`` graph rule for termination -
  the ``resolve`` tool writes the answer into
  ``context_vars["resolution"]`` and a ``ToolCalled("resolve")``
  graph rule fires the terminate transition.
"""

import asyncio

from dotenv import load_dotenv

from autogen.beta import Agent
from autogen.beta.config import AnthropicConfig
from autogen.beta.knowledge import MemoryKnowledgeStore
from autogen.beta.network import (
    EV_PACKET,
    EV_CHANNEL_CLOSED,
    EV_TEXT,
    WORKFLOW_TYPE,
    AgentTarget,
    FromSpeaker,
    Handoff,
    Hub,
    HubClient,
    LocalLink,
    Passport,
    Resume,
    ChannelInject,
    TerminateTarget,
    ToolCalled,
    Transition,
    TransitionGraph,
)
from autogen.beta.network.workflow_helpers import set_context
from autogen.beta.testing import TestConfig

load_dotenv()

async def escalate_to_tier2(reason: str) -> Handoff:
    """Escalate to the tier-2 specialist. The returned Handoff
    carries the target's Passport.name; the framework resolves
    it and routes the next turn there."""
    print(f"  [tool] escalate_to_tier2({reason!r})")
    return Handoff(target="tier2", reason=reason)

async def escalate_to_senior(reason: str) -> Handoff:
    """Escalate to senior support."""
    print(f"  [tool] escalate_to_senior({reason!r})")
    return Handoff(target="senior", reason=reason)

async def resolve(answer: str, channel: ChannelInject) -> str:
    """Resolve the request. Stores the answer in
    context_vars['resolution'] and the graph's
    ToolCalled('resolve') rule terminates the workflow."""
    if channel is None:
        return "no channel"
    print(f"  [tool] resolve({answer[:60]!r}...)")
    await set_context(channel, "resolution", answer)
    return "resolved"

async def main() -> None:
    config = AnthropicConfig(model="claude-sonnet-4-6")

    hub_obj = await Hub.open(MemoryKnowledgeStore(), ttl_sweep_interval=0)
    link = LocalLink(hub_obj)

    user_hc = HubClient(link, hub=hub_obj)
    tier1_hc = HubClient(link, hub=hub_obj)
    tier2_hc = HubClient(link, hub=hub_obj)
    senior_hc = HubClient(link, hub=hub_obj)

    user_agent = Agent("user", config=TestConfig())

    tier1_agent = Agent(
        "tier1",
        prompt=(
            "You are tier-1 support. You only handle simple FAQ-like "
            "questions: business hours, general policies, account "
            "lookups. ANY technical diagnosis, billing dispute, or "
            "specialist topic is OUT OF SCOPE for you - call "
            "`escalate_to_tier2(reason)` immediately. Otherwise call "
            "`resolve(answer)` with a 1-line answer.\n"
            "\n"
            "Call exactly ONE tool. Don't write a separate body."
        ),
        config=config,
    )
    tier1_agent.tool(escalate_to_tier2)
    tier1_agent.tool(resolve)

    tier2_agent = Agent(
        "tier2",
        prompt=(
            "You are tier-2 support - a domain specialist. You handle "
            "most technical questions in your area. For genuinely edge "
            "cases (rare hardware faults, legal compliance questions, "
            "anything requiring senior judgement) call "
            "`escalate_to_senior(reason)`. Otherwise call "
            "`resolve(answer)` with a 1-2 sentence answer.\n"
            "\n"
            "For demo purposes, if the question mentions 'compressor', "
            "'GDPR', or 'compliance', escalate. Otherwise resolve.\n"
            "\n"
            "Call exactly ONE tool. Don't write a separate body."
        ),
        config=config,
    )
    tier2_agent.tool(escalate_to_senior)
    tier2_agent.tool(resolve)

    senior_agent = Agent(
        "senior",
        prompt=(
            "You are senior support - the buck stops here. Whatever the "
            "question, you provide a confident, concise answer. Call "
            "`resolve(answer)` with a 1-2 sentence answer. Never "
            "escalate (you have no escalate tool).\n"
            "\n"
            "Call exactly ONE tool. Don't write a separate body."
        ),
        config=config,
    )
    senior_agent.tool(resolve)

    user = await user_hc.register(user_agent, Passport(name="user"), Resume())
    tier1 = await tier1_hc.register(tier1_agent, Passport(name="tier1"), Resume())
    tier2 = await tier2_hc.register(tier2_agent, Passport(name="tier2"), Resume())
    senior = await senior_hc.register(senior_agent, Passport(name="senior"), Resume())

    graph = TransitionGraph(
        initial_speaker=user.agent_id,
        transitions=[
            # resolve terminates the workflow.
            Transition(when=ToolCalled("resolve"), then=TerminateTarget("resolved")),
            # User's question -> tier1. Escalation FROM tier1 / tier2
            # is via Handoff returns from escalate_to_* tools - the
            # framework reads target from the Handoff and stamps it
            # onto the packet, so no graph rules are needed for those
            # edges.
            Transition(when=FromSpeaker(user.agent_id), then=AgentTarget(tier1.agent_id)),
        ],
        default_target=TerminateTarget("fall_through"),
        max_turns=10,
    )

    channel = await user.open(
        type=WORKFLOW_TYPE,
        target=[tier1.agent_id, tier2.agent_id, senior.agent_id],
        knobs={"graph": graph.to_dict()},
    )
    print(f"channel: {channel.channel_id}\n")

    name_by_id = {
        user.agent_id: "user",
        tier1.agent_id: "tier1",
        tier2.agent_id: "tier2",
        senior.agent_id: "senior",
    }

    await channel.send(
        "My refrigerator's compressor is humming louder than usual and "
        "occasionally clicks. What's wrong and how do I fix it?"
    )

    # Wait for the workflow to terminate (any of the five close routes
    # documented in /docs/beta/network/termination - this demo uses
    # ToolCalled("resolve") -> TerminateTarget("resolved")).
    close_env = await user.wait_for_channel_event(
        channel_id=channel.channel_id,
        predicate=lambda e: e.event_type == EV_CHANNEL_CLOSED,
        timeout=180.0,
    )

    # Print the transcript from the WAL after close.
    for env in await hub_obj.read_wal(channel.channel_id):
        speaker = name_by_id.get(env.sender_id, env.sender_id[:8])
        if env.event_type == EV_TEXT:
            print(f"{speaker:>14}: {env.event_data['text']}")
        elif env.event_type == EV_PACKET:
            routing = env.event_data.get("routing", {}) or {}
            if routing.get("kind") == "handoff":
                line = f"[Handed off via {routing.get('tool', '')}] {routing.get('reason', '')}"
                print(f"{speaker:>14}: {line.rstrip()}")
            body = env.event_data.get("body", "")
            if body:
                print(f"{speaker:>14}: {body}")

    print(f"\nclosed: reason={close_env.event_data.get('reason')!r}")

    print("\n--- final resolution ---")
    state = hub_obj._adapter_states[channel.channel_id]
    print(state.context_vars.get("resolution", "(no resolution)"))

    await user_hc.close()
    await tier1_hc.close()
    await tier2_hc.close()
    await senior_hc.close()
    await hub_obj.close()

if __name__ == "__main__":
    asyncio.run(main())
```

## Output

```console
channel: c047...

           user: My refrigerator's compressor is humming louder than usual and occasionally clicks. What's wrong and how do I fix it?
  [tool] escalate_to_tier2('Customer is reporting a technical issue with a refrigerator compressor that requires technical diagnosis, which is outside tier-1 scope.')
          tier1: [Handed off via escalate_to_tier2] Customer is reporting a technical issue with a refrigerator compressor that requires technical diagnosis, which is outside tier-1 scope.
  [tool] escalate_to_senior("Customer is reporting issues with a refrigerator compressor - per domain rules, any question mentioning 'compressor' requires senior escalation for proper diagnosis.")
          tier2: [Handed off via escalate_to_senior] Customer is reporting issues with a refrigerator compressor - per domain rules, any question mentioning 'compressor' requires senior escalation for proper diagnosis.
  [tool] resolve('A loud humming combined with clicking from your refrigerator'...)
         senior: [Handed off via resolve]

closed: reason='resolved'

--- final resolution ---
A loud humming combined with clicking from your refrigerator compressor most commonly points to a faulty start relay - a small, inexpensive part that helps the compressor start up; when it fails, the compressor tries to start, clicks off, and hums under the strain. You can confirm this by removing the start relay (a small component plugged into the side of the compressor at the back of the fridge) and shaking it - if it rattles, it's bad and needs replacing.
```

---

# Redundant

Source: https://docs.ag2.ai/latest/docs/beta/network/pattern_cookbook/redundant/

The Redundant pattern sends the same problem to multiple specialists
with different perspectives; an evaluator at the end picks the best
answer or synthesises across them.

**Classic (non-beta) primitives:** `DefaultPattern` with parallel dispatch,
evaluator at the end, `ContextVariables` collecting per-specialist
results.

### Key Characteristics

* **Fan-out via sequence.** `TransitionGraph.sequence` runs each
  proposer in turn. Proposers are prompted to *differentiate* (e.g.
  "suggest something different from prior proposals - clever pun").
* **Evaluator at the end.** The evaluator sees all three proposals
  in its projected history and picks the best one.
* **`sequence_complete` terminates** after the evaluator's reply.

### Routing Mechanics

There is no routing tool in this pattern - every step is a plain
`FromSpeaker(a) -> AgentTarget(b)` rule wired by
`TransitionGraph.sequence([...])`. Each proposer's reply is
visible to subsequent proposers via the windowed view, so the prompt
"suggest something DIFFERENT from any prior proposal" works without
any explicit state tracking.
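
The wiring that a sequence implies can be pictured as a chain of speaker-to-target pairs. The helpers below are a plain-Python illustration with hypothetical names (`sequence_edges`, `next_in_sequence`); they do not call `TransitionGraph` and only mirror the behaviour described above.

```python
def sequence_edges(agent_ids):
    """Pair each agent with its successor: [(a, b), (b, c), ...]."""
    return list(zip(agent_ids, agent_ids[1:]))


def next_in_sequence(edges, speaker):
    """Follow the edge leaving `speaker`, or terminate after the last agent."""
    for src, dst in edges:
        if src == speaker:
            return dst
    return "terminate:sequence_complete"


order = ["intake", "safe", "clever", "nerdy", "evaluator"]
edges = sequence_edges(order)
print(edges)
print(next_in_sequence(edges, "nerdy"))
print(next_in_sequence(edges, "evaluator"))
```

Five agents yield four edges; the last agent has no outgoing edge, which is where the `sequence_complete` termination applies.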

## Agent Flow

```mermaid
sequenceDiagram
    participant Intake as intake
    participant Safe as proposer_safe
    participant Clever as proposer_clever
    participant Nerdy as proposer_nerdy
    participant Evaluator as evaluator

    Intake->>Safe: kickoff (FromSpeaker -> AgentTarget)
    Safe->>Clever: [safe]: <name> - <rationale>
    Clever->>Nerdy: [clever]: <name> - <rationale>
    Nerdy->>Evaluator: [nerdy]: <name> - <rationale>
    Evaluator->>Intake: WINNER / NAME / REASON
    Note over Intake,Evaluator: TerminateTarget("sequence_complete") fires after evaluator's reply
```

## Migrating from Classic to Beta?

| Classic | Beta |
|---|---|
| Parallel dispatch to N specialists | Sequential `TransitionGraph.sequence([...])` (no parallel dispatch today) |
| `ContextVariables` collecting per-specialist results | Each proposer reads prior proposals from the windowed view; evaluator reads all three from its context |
| Evaluator picks via `ReplyResult` and signals completion | Evaluator writes a plain text reply; `sequence_complete` terminates the run automatically |

### Gaps & Workarounds

* **No parallel dispatch.** Classic Redundant could fan out to three
  specialists *simultaneously* and the evaluator would receive all
  three responses. Beta workflow is strictly sequential. Workaround:
  the sequential version above is functionally equivalent for
  synthesis, just slower (3× LLM latency instead of 1×). For genuine
  parallelism, run the specialists in separate `consulting`
  channels opened by the asker's tool, gather replies via
  `wait_for_channel_event`, then write the results back into
  the main workflow's `context_vars` via `set_context`.
  Heavier but parallel.
* **No `NestedChatTarget`.** Same as Hierarchical - child
  channels are the workaround when you genuinely want isolated
  specialist runs.
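
The concurrency half of the parallel workaround is ordinary `asyncio`. The sketch below shows only the gather step, with a hypothetical `ask_specialist` coroutine standing in for "open a channel, send the question, wait for the reply"; none of the AG2 channel calls are reproduced here.

```python
import asyncio


async def ask_specialist(name: str, question: str) -> str:
    """Hypothetical stand-in for one isolated specialist consultation."""
    await asyncio.sleep(0.01)  # simulates LLM latency
    return f"[{name}]: answer to {question!r}"


async def fan_out(question: str, specialists: list) -> dict:
    """Query all specialists concurrently and key the replies by name."""
    replies = await asyncio.gather(
        *(ask_specialist(name, question) for name in specialists)
    )
    # gather preserves argument order, so zip lines names up with replies.
    return dict(zip(specialists, replies))


results = asyncio.run(fan_out("name our product", ["safe", "clever", "nerdy"]))
for name, reply in results.items():
    print(name, "->", reply)
```

In a real workflow the resulting mapping is what would be written back into the main channel's context for the evaluator to read.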

## Code

!!! tip
    The three proposers and the evaluator use real Sonnet, so the
    proposals and the evaluator's pick are genuinely LLM-driven; only
    `intake` uses `TestConfig`.

```python
"""Cookbook 05 - Redundant pattern.

The same task is given to multiple agents with different perspectives;
an evaluator at the end picks the best response. In classic AG2 the
fan-out can be parallel - three agents work concurrently. Beta
workflow has no parallel dispatch yet, so this demo shows the
sequential equivalent.
"""

import asyncio

from dotenv import load_dotenv

from autogen.beta import Agent
from autogen.beta.config import AnthropicConfig
from autogen.beta.knowledge import MemoryKnowledgeStore
from autogen.beta.network import (
    EV_PACKET,
    EV_CHANNEL_CLOSED,
    EV_TEXT,
    WORKFLOW_TYPE,
    Hub,
    HubClient,
    LocalLink,
    Passport,
    Resume,
    TransitionGraph,
)
from autogen.beta.testing import TestConfig

load_dotenv()

async def main() -> None:
    config = AnthropicConfig(model="claude-sonnet-4-6")

    hub_obj = await Hub.open(MemoryKnowledgeStore(), ttl_sweep_interval=0)
    link = LocalLink(hub_obj)

    intake_hc = HubClient(link, hub=hub_obj)
    safe_hc = HubClient(link, hub=hub_obj)
    clever_hc = HubClient(link, hub=hub_obj)
    nerdy_hc = HubClient(link, hub=hub_obj)
    evaluator_hc = HubClient(link, hub=hub_obj)

    intake_agent = Agent("intake", config=TestConfig())

    safe_agent = Agent(
        "proposer_safe",
        prompt=(
            "You are the SAFE proposer. Suggest exactly ONE name for "
            "the product the user describes. Pick something traditional "
            "and professional - no puns, no jargon. Reply in one line: "
            "`[safe]: <name> - <one-sentence rationale>`."
        ),
        config=config,
    )
    clever_agent = Agent(
        "proposer_clever",
        prompt=(
            "You are the CLEVER proposer. You can see prior proposals "
            "in the conversation. Suggest exactly ONE name that is "
            "DIFFERENT from any prior proposal - a clever pun or word "
            "play. Reply in one line: "
            "`[clever]: <name> - <one-sentence rationale>`."
        ),
        config=config,
    )
    nerdy_agent = Agent(
        "proposer_nerdy",
        prompt=(
            "You are the NERDY proposer. You can see prior proposals. "
            "Suggest exactly ONE name DIFFERENT from any prior - a "
            "reference to programming, sci-fi, or maths culture. Reply "
            "in one line: `[nerdy]: <name> - <one-sentence rationale>`."
        ),
        config=config,
    )
    evaluator_agent = Agent(
        "evaluator",
        prompt=(
            "You are the evaluator. The conversation contains three "
            "name proposals tagged `[safe]:`, `[clever]:`, and "
            "`[nerdy]:`. Pick the ONE you think best balances clarity "
            "and memorability for the product. Reply in this format:\n"
            "\n"
            "  WINNER: <one of safe / clever / nerdy>\n"
            "  NAME: <the chosen name>\n"
            "  REASON: <one-sentence reason>"
        ),
        config=config,
    )

    intake = await intake_hc.register(intake_agent, Passport(name="intake"), Resume())
    safe = await safe_hc.register(safe_agent, Passport(name="safe"), Resume())
    clever = await clever_hc.register(clever_agent, Passport(name="clever"), Resume())
    nerdy = await nerdy_hc.register(nerdy_agent, Passport(name="nerdy"), Resume())
    evaluator = await evaluator_hc.register(evaluator_agent, Passport(name="evaluator"), Resume())

    graph = TransitionGraph.sequence([
        intake.agent_id,
        safe.agent_id,
        clever.agent_id,
        nerdy.agent_id,
        evaluator.agent_id,
    ])

    channel = await intake.open(
        type=WORKFLOW_TYPE,
        target=[safe.agent_id, clever.agent_id, nerdy.agent_id, evaluator.agent_id],
        knobs={"graph": graph.to_dict()},
    )
    print(f"channel: {channel.channel_id}\n")

    name_by_id = {
        intake.agent_id: "intake",
        safe.agent_id: "safe",
        clever.agent_id: "clever",
        nerdy.agent_id: "nerdy",
        evaluator.agent_id: "evaluator",
    }

    await channel.send(
        "Suggest a name for our new code-review SaaS. It runs as a "
        "GitHub bot and gives senior-engineer-style feedback on PRs."
    )

    # Wait for the workflow to terminate (any of the five close routes
    # documented in /docs/beta/network/termination - this demo uses
    # TerminateTarget("sequence_complete") after the evaluator's reply).
    close_env = await intake.wait_for_channel_event(
        channel_id=channel.channel_id,
        predicate=lambda e: e.event_type == EV_CHANNEL_CLOSED,
        timeout=240.0,
    )

    # Print the transcript from the WAL after close.
    for env in await hub_obj.read_wal(channel.channel_id):
        speaker = name_by_id.get(env.sender_id, env.sender_id[:8])
        if env.event_type == EV_TEXT:
            print(f"{speaker:>14}: {env.event_data['text']}")
        elif env.event_type == EV_PACKET:
            routing = env.event_data.get("routing", {}) or {}
            if routing.get("kind") == "handoff":
                line = f"[Handed off via {routing.get('tool', '')}] {routing.get('reason', '')}"
                print(f"{speaker:>14}: {line.rstrip()}")
            body = env.event_data.get("body", "")
            if body:
                print(f"{speaker:>14}: {body}")

    print(f"\nclosed: reason={close_env.event_data.get('reason')!r}")

    await intake_hc.close()
    await safe_hc.close()
    await clever_hc.close()
    await nerdy_hc.close()
    await evaluator_hc.close()
    await hub_obj.close()

if __name__ == "__main__":
    asyncio.run(main())
```

## Output

```console
channel: 7e91...

         intake: Suggest a name for our new code-review SaaS. It runs as a GitHub bot and gives senior-engineer-style feedback on PRs.
           safe: [safe]: ReviewPro - a clear, professional name signalling thorough code review with senior-level expertise.
         clever: [clever]: PullSenior - a pun on "pull request" + "senior" that hints at the bot's seniority while staying memorable.
          nerdy: [nerdy]: Linus's Reviewer - a nod to Linus Torvalds's famously direct kernel reviews, signalling the bot delivers no-nonsense senior-engineer feedback.
      evaluator: WINNER: clever
                 NAME: PullSenior
                 REASON: PullSenior is memorable, on-brand for GitHub workflows, and immediately conveys the product's value (senior-level pull-request review) without sacrificing clarity.

closed: reason='sequence_complete'
```

---

# Feedback Loop

Source: https://docs.ag2.ai/latest/docs/beta/network/pattern_cookbook/feedback_loop/

The Feedback Loop pattern alternates a drafter and a reviewer until
the reviewer flips a `done` flag in context, or `max_turns`
fires as the safety cap. Drafter writes, reviewer either approves or
gives concrete feedback for revision, drafter incorporates, reviewer
re-evaluates - until satisfied.

**Classic (non-beta) primitives:** `DefaultPattern`, `OnContextCondition`
checking an `iteration_needed` flag, `ReplyResult` updating that
flag, `max_round`.

### Key Characteristics

* **Two termination paths.**
    * Happy path: `ContextEquals("done", True) -> TerminateTarget("approved")`,
      fired when the reviewer's `approve` tool flips the flag.
    * Safety belt: `default_target=TerminateTarget("max_iterations")`
      paired with `max_turns=10`.
* **Intake agent owns kickoff.** A separate `intake` agent (using
  `TestConfig`) owns the initial topic message so `drafter` is
  the first agent the graph hands control to (rather than seeing its
  own kickoff and bouncing straight to reviewer).

### Routing Mechanics

Under the packet execution model, `approve` uses `set_context`
to flip `done=True`. The reviewer's reply body (containing the
"APPROVED" note) flows naturally through the packet's `body` field
on the same turn - when the packet folds, `ContextEquals("done", True)`
matches the just-set value and the channel terminates with reason
`'approved'`.

If the reviewer instead just writes feedback as text (no `approve`
call), the loop continues - drafter is the next speaker via
`FromSpeaker(reviewer) -> AgentTarget(drafter)`. `max_turns=10`
is the safety cap.
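
The two exits from the loop can be traced with a tiny simulation. This is a toy model, not the workflow engine: `run_loop` is a hypothetical helper, the way turns are counted is an assumption, and `"cap_reached"` is a placeholder label rather than the close reason the framework reports.

```python
def run_loop(approve_on_review: int, max_turns: int = 10):
    """Simulate drafter/reviewer alternation.

    approve_on_review: which review (1-based) calls `approve`.
    Returns (close_reason, turns_used).
    """
    context = {"done": False}
    speaker = "drafter"  # intake's kickoff hands control to the drafter
    reviews = 0
    turns = 0
    while turns < max_turns:
        turns += 1
        if speaker == "drafter":
            # The drafter posts a paragraph; the reviewer speaks next.
            speaker = "reviewer"
            continue
        reviews += 1
        if reviews >= approve_on_review:
            context["done"] = True  # what the approve tool would set
        if context["done"]:
            return "approved", turns  # the done flag matched
        speaker = "drafter"  # plain feedback: loop back to the drafter
    return "cap_reached", turns


print(run_loop(approve_on_review=2))
print(run_loop(approve_on_review=99))
```

Approval on the second review ends the run after four turns in this model, while a reviewer that never approves runs until the cap.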

## Agent Flow

```mermaid
sequenceDiagram
    participant Intake as intake
    participant Drafter as drafter
    participant Reviewer as reviewer

    Intake->>Drafter: topic (FromSpeaker -> AgentTarget)
    loop until reviewer approves or max_turns
        Drafter->>Reviewer: paragraph
        alt good enough
            Reviewer->>Reviewer: approve(reason); set_context("done", True); writes approval body
            Note over Reviewer: ContextEquals("done", True) -> TerminateTarget("approved")
        else needs work
            Reviewer->>Drafter: feedback (FromSpeaker -> AgentTarget)
        end
    end
```

## Migrating from Classic to Beta?

| Classic | Beta |
|---|---|
| `ReplyResult(context_variables={"iteration_needed": False})` | `await set_context(channel, "done", True)` |
| `ExpressionContextCondition("not iteration_needed")` | `ContextEquals("done", True)` |
| `max_round=12` on the pattern | `max_turns=12` on the graph |

## Code

!!! tip
    Drafter and reviewer both use real Sonnet - the reviewer
    genuinely decides when the draft is good enough. The reviewer's
    prompt enforces three *objective* structural rules (sentence
    count, last-token type, opening sentence length) so the
    feedback loop reliably iterates 1-2 times before approval
    rather than approving on round 1. The prompt also caps approval
    at the 4th round to avoid burning budget.
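
Because the reviewer's three rules are objective (exactly three sentences, a final token that is a number, percentage, or year, and an opening sentence of at most 12 words), they can also be checked deterministically. The checker below is an illustrative sketch that is not part of the cookbook code: `check_rules` is a hypothetical helper, and its number test is stricter than the prompt's, since it rejects endings such as "30 seconds".

```python
import re


def check_rules(paragraph: str) -> dict:
    """Evaluate the three structural rules and report each result."""
    # Split on sentence-ending punctuation followed by whitespace, so a
    # decimal such as "3.5%" is not treated as a sentence break.
    sentences = re.split(r"(?<=[.!?])\s+", paragraph.strip())
    last_token = sentences[-1].split()[-1].rstrip(".!?")
    return {
        "three_sentences": len(sentences) == 3,
        "ends_with_number": re.fullmatch(r"\d+(\.\d+)?%?", last_token) is not None,
        "short_opening": len(sentences[0].split()) <= 12,
    }


good = (
    "Remote work is here. Teams now span many time zones and tools. "
    "Adoption reached 62%."
)
bad = "Remote work is here to stay. It keeps growing every year."
print(check_rules(good))
print(check_rules(bad))
```

A check like this could be used to test drafts offline or to sanity-check the reviewer's verdicts.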

```python
"""Cookbook 06 - Feedback Loop pattern.

Drafter and reviewer alternate until the reviewer flips ``done=True``
in context (via the ``approve`` tool), or ``max_turns`` fires as the
safety cap.
"""

import asyncio

from dotenv import load_dotenv

from autogen.beta import Agent
from autogen.beta.config import AnthropicConfig
from autogen.beta.knowledge import MemoryKnowledgeStore
from autogen.beta.network import (
    EV_PACKET,
    EV_CHANNEL_CLOSED,
    EV_TEXT,
    WORKFLOW_TYPE,
    AgentTarget,
    ContextEquals,
    FromSpeaker,
    Hub,
    HubClient,
    LocalLink,
    Passport,
    Resume,
    ChannelInject,
    TerminateTarget,
    Transition,
    TransitionGraph,
)
from autogen.beta.network.workflow_helpers import set_context
from autogen.beta.testing import TestConfig

load_dotenv()

async def approve(reason: str, channel: ChannelInject) -> str:
    """Mark the draft approved by setting done=True in context.
    The graph's ContextEquals("done", True) rule terminates the
    workflow; the reviewer can also write a short approval note in
    its reply body on the same turn - when the packet folds, the
    just-set done flag triggers terminate."""
    if channel is None:
        return "no channel"
    print(f"  [tool] approve({reason!r})")
    await set_context(channel, "done", True)
    return f"approved: {reason}"

async def main() -> None:
    config = AnthropicConfig(model="claude-sonnet-4-6")

    hub = await Hub.open(MemoryKnowledgeStore(), ttl_sweep_interval=0)
    link = LocalLink(hub)

    intake_hc = HubClient(link, hub=hub)
    drafter_hc = HubClient(link, hub=hub)
    reviewer_hc = HubClient(link, hub=hub)

    intake_agent = Agent("intake", config=TestConfig())

    drafter_agent = Agent(
        "drafter",
        prompt=(
            "You are the drafter. The user gives a topic; you write "
            "an opening paragraph. On subsequent turns, the reviewer "
            "will give feedback as their reply - incorporate it and "
            "post a revised draft.\n"
            "\n"
            "Reply with ONE paragraph. No preamble, no headers, no "
            "labels - just the paragraph."
        ),
        config=config,
    )

    reviewer_agent = Agent(
        "reviewer",
        prompt=(
            "You are the reviewer. The drafter posts a paragraph. The "
            "house style requires that an opening paragraph satisfies "
            "ALL of these objective rules:\n"
            "\n"
            "  (a) exactly 3 sentences (count the periods);\n"
            "  (b) the final sentence's LAST TOKEN is a concrete "
            "number, percentage, or year (e.g. '62%', '2019', "
            "'30 seconds');\n"
            "  (c) opens with a single short sentence (≤ 12 words).\n"
            "\n"
            "Each turn:\n"
            "\n"
            "1. If the paragraph satisfies all three rules, call the "
            "`approve(reason)` tool with a one-sentence reason citing "
            "the rules. This terminates the workflow.\n"
            "\n"
            "2. If even one rule fails, DO NOT call `approve`. Reply "
            "with feedback that names the specific failing rule(s) so "
            "the drafter can revise. Be precise - don't editorialise "
            "on style, only on the rules above.\n"
            "\n"
            "Approve no later than the 4th round to avoid burning "
            "budget."
        ),
        config=config,
    )
    reviewer_agent.tool(approve)

    intake = await intake_hc.register(intake_agent, Passport(name="intake"), Resume())
    drafter = await drafter_hc.register(drafter_agent, Passport(name="drafter"), Resume())
    reviewer = await reviewer_hc.register(reviewer_agent, Passport(name="reviewer"), Resume())

    graph = TransitionGraph(
        initial_speaker=intake.agent_id,
        transitions=[
            # done=True flips the loop into terminate.
            Transition(when=ContextEquals("done", value=True), then=TerminateTarget("approved")),
            # intake kicks off -> drafter.
            Transition(when=FromSpeaker(intake.agent_id),   then=AgentTarget(drafter.agent_id)),
            # Otherwise alternate.
            Transition(when=FromSpeaker(drafter.agent_id),  then=AgentTarget(reviewer.agent_id)),
            Transition(when=FromSpeaker(reviewer.agent_id), then=AgentTarget(drafter.agent_id)),
        ],
        default_target=TerminateTarget("max_iterations"),
        max_turns=10,
    )

    channel = await intake.open(
        type=WORKFLOW_TYPE,
        target=[drafter.agent_id, reviewer.agent_id],
        knobs={"graph": graph.to_dict()},
    )
    print(f"channel: {channel.channel_id}\n")

    name_by_id = {
        intake.agent_id: "intake",
        drafter.agent_id: "drafter",
        reviewer.agent_id: "reviewer",
    }

    await channel.send(
        "Topic: write the opening paragraph of a blog post explaining "
        "why distributed systems are hard."
    )

    # Wait for the workflow to terminate (any of the five close routes
    # documented in /docs/beta/network/termination - this demo uses
    # ContextEquals("done", True) -> TerminateTarget("approved")).
    close_env = await intake.wait_for_channel_event(
        channel_id=channel.channel_id,
        predicate=lambda e: e.event_type == EV_CHANNEL_CLOSED,
        timeout=360.0,
    )

    # Print the transcript from the WAL after close.
    for env in await hub.read_wal(channel.channel_id):
        speaker = name_by_id.get(env.sender_id, env.sender_id[:8])
        if env.event_type == EV_TEXT:
            print(f"{speaker:>14}: {env.event_data['text']}")
        elif env.event_type == EV_PACKET:
            routing = env.event_data.get("routing", {}) or {}
            if routing.get("kind") == "handoff":
                line = f"[Handed off via {routing.get('tool', '')}] {routing.get('reason', '')}"
                print(f"{speaker:>14}: {line.rstrip()}")
            body = env.event_data.get("body", "")
            if body:
                print(f"{speaker:>14}: {body}")

    print(f"\nclosed: reason={close_env.event_data.get('reason')!r}")

    await intake_hc.close()
    await drafter_hc.close()
    await reviewer_hc.close()
    await hub.close()

if __name__ == "__main__":
    asyncio.run(main())
```

## Output

A typical run iterates 1-2 times before approval - the structural
rules (especially the last-token-must-be-a-number rule) reliably
catch first-draft paragraphs and force a revision:

```console
channel: 5d3a...

         intake: Topic: write the opening paragraph of a blog post explaining why distributed systems are hard.
        drafter: Distributed systems are hard because they break the comfortable assumptions you can lean on. Networks fail in subtle ways, clocks drift, machines crash mid-operation, and there is no single source of truth. The result is a class of problems that look easy on a whiteboard but become surprisingly difficult in production.
       reviewer: Rule (b) fails - the final sentence's last token is 'production', not a concrete number, percentage, or year. Please revise so the closing sentence ends on a concrete metric (e.g. an outage statistic or year of a famous incident).
        drafter: Distributed systems are hard because they break single-process assumptions. Networks fail, clocks drift, machines crash, and no node is the single source of truth - only a tangle of messages spread across nodes that may or may not agree. Surveys consistently show that distributed-systems bugs cost the median engineering team upwards of 30%.
  [tool] approve("All three rules satisfied: 3 sentences, opening ≤ 12 words, final token is '30%' (a concrete percentage).")
       reviewer: APPROVED - all three structural rules now hold.

closed: reason='approved'
```
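
The three structural rules in the reviewer's prompt are objective enough to check without an LLM, which can help when testing the loop offline. The sketch below is a hypothetical helper, not part of `autogen.beta`: the name `failed_rules` and the regex are assumptions, the sentence split is deliberately naive, and rule (b) is simplified to accept only a numeric last token (so a phrase like '30 seconds' would be rejected here).

```python
import re

# Simplified rule (b): a bare number, year, or percentage such as "2019" or "62%".
NUMERIC_TOKEN = re.compile(r"\d[\d,.]*%?")


def failed_rules(paragraph: str) -> list[str]:
    """Return the labels of the house-style rules the paragraph breaks."""
    text = paragraph.strip()
    if not text:
        return ["a", "b", "c"]
    # Naive sentence split: terminal punctuation followed by whitespace.
    sentences = re.split(r"(?<=[.!?])\s+", text)
    failures = []
    if len(sentences) != 3:  # rule (a): exactly 3 sentences
        failures.append("a")
    last_token = text.split()[-1].rstrip(".!?")
    if not NUMERIC_TOKEN.fullmatch(last_token):  # rule (b): numeric last token
        failures.append("b")
    if len(sentences[0].split()) > 12:  # rule (c): short opening sentence
        failures.append("c")
    return failures


if __name__ == "__main__":
    # Made-up sample paragraphs, for illustration only.
    passing = (
        "Distributed systems are hard. Networks fail, clocks drift, and "
        "machines crash. The lesson was learned in 2019."
    )
    failing = (
        "Distributed systems are hard. Networks fail and clocks drift. "
        "The bugs only show up in production."
    )
    print(failed_rules(passing))  # []
    print(failed_rules(failing))  # ['b']
```

A checker like this could back a deterministic reviewer in tests, leaving the LLM reviewer for real runs.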

---

# Context-Aware Routing

Source: https://docs.ag2.ai/latest/docs/beta/network/pattern_cookbook/context_aware_routing/

The Context-Aware Routing pattern uses a router agent to read the
user's request, classify it into a category, and dispatch to the
specialist whose domain matches.

**Classic primitives:** `DefaultPattern`, `StringLLMCondition`
(LLM-evaluated routing inside the framework), or
`ExpressionContextCondition` over a router-tool-set domain field.

### Key Characteristics

* **Thin router agent.** The router's only job is to classify and
  call the matching `classify_as_<category>` tool.
* **Dynamic `Handoff`.** Each classify tool returns
  `Handoff(target=<specialist>)` directly. The framework
  reads the `target` from the tool result and routes the
  next turn to that specialist without any graph condition - no
  `ContextEquals` rules needed.
* **Specialist's reply terminates.** A `FromSpeaker(<specialist>) -> TerminateTarget`
  rule closes the workflow after the specialist speaks.

### Routing Mechanics

Each `classify_as_<category>` tool returns
`Handoff(target=<specialist_name>)`. The workflow adapter
reads the `Handoff.target` from the tool's
`ToolResultEvent` and stamps it onto the outgoing packet as
`routing.target`. When `fold` processes that
packet, `expected_next_speaker` is set directly from
`routing.target`, bypassing the transition graph entirely.
No `ContextEquals` state variable is needed.

!!! note "Why not ContextEquals?"
    The original idiom - tools call `set_context` then return
    a string; the graph uses `ContextEquals` to pick the
    next speaker - stalls in practice.

    `set_context` emits a non-substantive `EV_CONTEXT_SET`
    envelope, and `ContextEquals` is evaluated only when a
    *substantive* `EV_PACKET` follows. In that idiom the classify
    tool returns just a plain string, so if the router produces no
    text body (as a minimal implementation would), the round is
    silent: `build_round_envelope` returns `None`, no `EV_PACKET`
    is posted, `fold` is never called, and `ContextEquals` never
    fires. The channel stalls.

    `Handoff` sidesteps this entirely - the routing target is
    resolved from the tool result, not from a graph condition evaluated
    on a subsequent envelope.

## Agent Flow

```mermaid
sequenceDiagram
    participant User as user
    participant Router as router
    participant Billing as billing
    participant Technical as technical
    participant General as general

    User->>Router: question
    Router->>Router: classify_as_<category>(reason) -> Handoff(target=<category>)
    Note over Router: Handoff.target routes directly to specialist
    alt Handoff(target="billing")
        Router->>Billing: AgentTarget(billing) via Handoff
        Billing->>User: answer; FromSpeaker(billing) -> TerminateTarget("billing_resolved")
    else Handoff(target="technical")
        Router->>Technical: AgentTarget(technical) via Handoff
        Technical->>User: answer; FromSpeaker(technical) -> TerminateTarget("technical_resolved")
    else Handoff(target="general")
        Router->>General: AgentTarget(general) via Handoff
        General->>User: answer; FromSpeaker(general) -> TerminateTarget("general_resolved")
    end
```

## Migrating from Classic to Beta?

| Classic | Beta |
|---|---|
| `StringLLMCondition` (framework asks LLM at the transition) | Router agent's LLM turn calls `classify_as_<category>` tool; tool returns `Handoff(target=specialist)` |
| `ReplyResult(context_variables={"category": "billing"}, target=AgentTarget(billing))` | `return Handoff(target="billing", reason=reason)` directly from the classify tool |
| `ExpressionContextCondition(...)` per category | No graph condition needed - `Handoff.target` is authoritative |

## Code

!!! tip
    The router uses real Sonnet (the classification is the
    LLM-driven part). Each specialist also uses real Sonnet to give
    a domain-flavoured reply.

```python
"""Cookbook 07 - Context-Aware Routing pattern.

A router agent reads the user's request, classifies it into a
category, and returns Handoff(target=specialist) from the classify
tool. The framework routes directly to the named specialist without
any ContextEquals graph conditions.
"""

import asyncio

from dotenv import load_dotenv

from autogen.beta import Agent
from autogen.beta.config import AnthropicConfig
from autogen.beta.knowledge import MemoryKnowledgeStore
from autogen.beta.network import (
    EV_CHANNEL_CLOSED,
    EV_PACKET,
    EV_TEXT,
    WORKFLOW_TYPE,
    AgentTarget,
    FromSpeaker,
    Handoff,
    Hub,
    HubClient,
    LocalLink,
    Passport,
    Resume,
    TerminateTarget,
    Transition,
    TransitionGraph,
)
from autogen.beta.testing import TestConfig

load_dotenv()

async def classify_as_billing(reason: str) -> Handoff:
    """Classify the request as billing and route to the billing specialist."""
    print(f"  [tool] classify_as_billing({reason!r})")
    return Handoff(target="billing", reason=reason)

async def classify_as_technical(reason: str) -> Handoff:
    """Classify as technical and route to the technical specialist."""
    print(f"  [tool] classify_as_technical({reason!r})")
    return Handoff(target="technical", reason=reason)

async def classify_as_general(reason: str) -> Handoff:
    """Classify as general support and route to the general specialist."""
    print(f"  [tool] classify_as_general({reason!r})")
    return Handoff(target="general", reason=reason)

async def main() -> None:
    config = AnthropicConfig(model="claude-sonnet-4-6")

    hub_obj = await Hub.open(MemoryKnowledgeStore(), ttl_sweep_interval=0)
    link = LocalLink(hub_obj)

    user_hc = HubClient(link, hub=hub_obj)
    router_hc = HubClient(link, hub=hub_obj)
    billing_hc = HubClient(link, hub=hub_obj)
    technical_hc = HubClient(link, hub=hub_obj)
    general_hc = HubClient(link, hub=hub_obj)

    user_agent = Agent("user", config=TestConfig())

    router_agent = Agent(
        "router",
        prompt=(
            "You are the routing agent. Classify the user's request "
            "into ONE of three categories and call the matching tool.\n"
            "\n"
            "Categories:\n"
            "* `classify_as_billing` - payment, refund, invoice, "
            "subscription tier, pricing.\n"
            "* `classify_as_technical` - bug, error, integration, "
            "API, setup, connectivity. Anything technical.\n"
            "* `classify_as_general` - account info, policy, FAQ, "
            "anything not billing or technical.\n"
            "\n"
            "Call exactly ONE tool with a short `reason` argument."
        ),
        config=config,
    )
    router_agent.tool(classify_as_billing)
    router_agent.tool(classify_as_technical)
    router_agent.tool(classify_as_general)

    billing_agent = Agent(
        "billing",
        prompt=(
            "You are the billing specialist. Reply in 1-2 sentences "
            "with concrete next steps for billing/payment/subscription "
            "issues. Don't escalate - just answer."
        ),
        config=config,
    )
    technical_agent = Agent(
        "technical",
        prompt=(
            "You are the technical specialist. Reply in 1-2 sentences "
            "with concrete diagnostic next steps for bugs, API errors, "
            "or integration problems. Don't escalate - just answer."
        ),
        config=config,
    )
    general_agent = Agent(
        "general",
        prompt=(
            "You are the general support specialist. Reply in 1-2 "
            "sentences answering account, policy, or FAQ questions. "
            "Don't escalate - just answer."
        ),
        config=config,
    )

    user = await user_hc.register(user_agent, Passport(name="user"), Resume())
    router = await router_hc.register(router_agent, Passport(name="router"), Resume())
    billing = await billing_hc.register(billing_agent, Passport(name="billing"), Resume())
    technical = await technical_hc.register(technical_agent, Passport(name="technical"), Resume())
    general = await general_hc.register(general_agent, Passport(name="general"), Resume())

    graph = TransitionGraph(
        initial_speaker=user.agent_id,
        transitions=[
            # Specialist's reply terminates.
            Transition(when=FromSpeaker(billing.agent_id),   then=TerminateTarget("billing_resolved")),
            Transition(when=FromSpeaker(technical.agent_id), then=TerminateTarget("technical_resolved")),
            Transition(when=FromSpeaker(general.agent_id),   then=TerminateTarget("general_resolved")),
            # User's question -> router. Routing to the specialist is
            # via Handoff returns from classify tools - no ContextEquals
            # rules needed.
            Transition(when=FromSpeaker(user.agent_id), then=AgentTarget(router.agent_id)),
        ],
        default_target=TerminateTarget("fall_through"),
        max_turns=10,
    )

    channel = await user.open(
        type=WORKFLOW_TYPE,
        target=[router.agent_id, billing.agent_id, technical.agent_id, general.agent_id],
        knobs={"graph": graph.to_dict()},
    )
    print(f"channel: {channel.channel_id}\n")

    name_by_id = {
        user.agent_id: "user",
        router.agent_id: "router",
        billing.agent_id: "billing",
        technical.agent_id: "technical",
        general.agent_id: "general",
    }

    await channel.send(
        "I tried to upgrade my subscription but the API is returning a "
        "500 error. The status page says everything is green. Help?"
    )

    close_env = await user.wait_for_channel_event(
        channel_id=channel.channel_id,
        predicate=lambda e: e.event_type == EV_CHANNEL_CLOSED,
        timeout=120.0,
    )

    for env in await hub_obj.read_wal(channel.channel_id):
        speaker = name_by_id.get(env.sender_id, env.sender_id[:8])
        if env.event_type == EV_TEXT:
            print(f"{speaker:>14}: {env.event_data['text']}")
        elif env.event_type == EV_PACKET:
            routing = env.event_data.get("routing", {}) or {}
            if routing.get("kind") == "handoff":
                line = f"[Handed off via {routing.get('tool', '')}] {routing.get('reason', '')}"
                print(f"{speaker:>14}: {line.rstrip()}")
            body = env.event_data.get("body", "")
            if body:
                print(f"{speaker:>14}: {body}")

    print(f"\nclosed: reason={close_env.event_data.get('reason')!r}")

    await user_hc.close()
    await router_hc.close()
    await billing_hc.close()
    await technical_hc.close()
    await general_hc.close()
    await hub_obj.close()

if __name__ == "__main__":
    asyncio.run(main())
```

## Output

```console
channel: 8b4d...

           user: I tried to upgrade my subscription but the API is returning a 500 error. The status page says everything is green. Help?
  [tool] classify_as_technical('API returning 500 error during subscription upgrade - technical issue')
         router: [Handed off via classify_as_technical] API returning 500 error during subscription upgrade - technical issue
      technical: Capture the full request (endpoint, payload, response headers, and request-id) and re-try the upgrade - if the 500 persists, share the request-id so we can trace it server-side; check whether the failure is tied to a specific plan tier, payment method, or coupon, as those code paths are the most common 500 sources during upgrades.

closed: reason='technical_resolved'
```

---

# Triage with Tasks

Source: https://docs.ag2.ai/latest/docs/beta/network/pattern_cookbook/triage_with_tasks/

The Triage with Tasks pattern breaks a complex request into typed
tasks (research -> writing -> review). Each task type routes to a
specialist; tasks process sequentially, respecting prerequisite
ordering. A triage agent up front produces the plan that downstream
specialists work from.

**Classic (non-beta) primitives:** `DefaultPattern`, `OnContextCondition`
checking `current_task_type`, `ReplyResult` advancing the task
index.

### Key Characteristics

* **Triage produces a plan.** The triage agent's only job is to
  write a 2-3 sentence plan naming the three tasks and what each
  will produce for THIS specific request. Downstream specialists
  read the plan as their brief.
* **Sequence then synthesises.** This demo uses
  `TransitionGraph.sequence` - a fixed pipeline of triage ->
  researcher -> writer -> reviewer. Each specialist sees the full
  prior conversation via the windowed view.
* **`knobs["context_vars"]` seeds state at channel creation.**
  Any tool / middleware can read it via `ChannelStateInject`;
  transitions can route on it via `ContextEquals`. The fixed
  sequence here doesn't need to read it for routing - it's there to
  demonstrate that channel-scoped state survives the entire run.

### Routing Mechanics

There is no routing tool in this demo - every step is a plain
`FromSpeaker(a) -> AgentTarget(b)` rule wired by
`TransitionGraph.sequence([...])`. The plan is a single triage
output that the windowed view propagates to every subsequent
specialist.
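
As a rough mental model, the sequence graph behaves like a list of "after speaker A, hand to speaker B" pairs with a terminate rule after the last speaker. The sketch below uses plain tuples rather than the real `Transition` objects; `sequence_rules` and `route` are hypothetical names, and how the library actually builds the graph may differ.

```python
def sequence_rules(agent_ids: list[str], done_reason: str = "sequence_complete") -> list[tuple]:
    """Expand an ordered speaker list into (speaker, target) routing pairs."""
    if not agent_ids:
        raise ValueError("a sequence needs at least one speaker")
    # Each speaker hands to the one that follows it in the list.
    rules = [(a, ("agent", b)) for a, b in zip(agent_ids, agent_ids[1:])]
    # The last speaker's reply ends the workflow.
    rules.append((agent_ids[-1], ("terminate", done_reason)))
    return rules


def route(rules: list[tuple], last_speaker: str) -> tuple:
    """Return the target for whoever spoke last (toy FromSpeaker lookup)."""
    for speaker, target in rules:
        if speaker == last_speaker:
            return target
    return ("terminate", "fall_through")  # hypothetical default for unknown speakers


if __name__ == "__main__":
    rules = sequence_rules(["intake", "triage", "researcher", "writer", "reviewer"])
    print(route(rules, "intake"))    # ('agent', 'triage')
    print(route(rules, "reviewer"))  # ('terminate', 'sequence_complete')
```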

!!! note "Sequence variant vs. dynamic queue"
    The runnable demo uses the simpler `TransitionGraph.sequence`
    variant: triage produces a real LLM-generated plan, then the
    sequence executes researcher -> writer -> reviewer
    deterministically.

    The dynamic version (triage advances via an `advance_task`
    tool that pops the next task type from a queue, with
    `ContextEquals(current_task_type, ...)` per branch) has a
    sharp edge today: parallel-tool-calling LLMs can fire
    `advance_task` multiple times in one triage turn, each call
    mutating the queue before the previous handoff has locked the
    speaker. The first dispatch wins; the others corrupt state. The
    clean fix is either `disable_parallel_tool_use` at the
    model layer (not yet exposed via `AnthropicConfig`) or
    compare-and-swap on the queue. In the meantime, a sequence
    graph trades the dynamic queue for determinism.

## Agent Flow

```mermaid
sequenceDiagram
    participant Intake as intake
    participant Triage as triage
    participant Researcher as researcher
    participant Writer as writer
    participant Reviewer as reviewer

    Intake->>Triage: kickoff (FromSpeaker -> AgentTarget)
    Triage->>Researcher: 2-3 sentence plan
    Researcher->>Writer: short paragraph of factual research
    Writer->>Reviewer: deliverable (brief / summary / draft)
    Reviewer->>Intake: review notes
    Note over Intake,Reviewer: TerminateTarget("sequence_complete") fires after reviewer's reply
```

## Migrating from Classic to Beta?

| Classic | Beta |
|---|---|
| `ReplyResult(context_variables={"current_task_type": ..., "pending_tasks": [...]})` | `set_context(channel, key, value)` per field (dynamic variant) |
| `OnContextCondition` per task type | `ContextEquals("current_task_type", <type>)` per branch (dynamic variant) |
| Initial `ContextVariables(data=...)` passed to the pattern | `knobs["context_vars"]` on `channel.open(...)` |

### Dynamic queue variant (production pattern)

The dynamic variant has the triage agent owning a list of pending
tasks in context. Its tool either pops the next task type into
`current_task_type` or flips `all_done=True` when the list
is empty - both via `set_context`:

```python
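# ChannelInject is imported from autogen.beta.network in the cookbooks;
# ChannelStateInject is assumed to live alongside it.
from autogen.beta.network import ChannelInject, ChannelStateInject
from autogen.beta.network.workflow_helpers import set_context

async def advance_task(channel: ChannelInject, state: ChannelStateInject) -> str:
    """Pop the next task type into context, or flag completion."""
    pending = list(state.context_vars.get("pending_tasks", []))
    if not pending:
        # Queue drained: the ContextEquals("all_done", True) rule terminates.
        await set_context(channel, "all_done", True)
        return "all tasks complete"
    next_type = pending.pop(0)
    await set_context(channel, "current_task_type", next_type)
    await set_context(channel, "pending_tasks", pending)
    return f"now working on: {next_type}"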
```

The matching graph routes per task type and terminates when the queue
is empty:

```python
graph = TransitionGraph(
    initial_speaker=triage.agent_id,
    transitions=[
        # Queue drained -> terminate.
        Transition(when=ContextEquals("all_done", True), then=TerminateTarget("complete")),
        # A specialist's reply returns control to triage. These rules sit above
        # the ContextEquals branches because current_task_type still holds the
        # specialist's own type when its reply folds (assuming rules are tried
        # in order, as in the Feedback Loop graph).
        Transition(when=FromSpeaker(researcher.agent_id), then=AgentTarget(triage.agent_id)),
        Transition(when=FromSpeaker(writer.agent_id),     then=AgentTarget(triage.agent_id)),
        Transition(when=FromSpeaker(reviewer.agent_id),   then=AgentTarget(triage.agent_id)),
        # After triage's advance_task call, dispatch on the task type.
        Transition(when=ContextEquals("current_task_type", "research"), then=AgentTarget(researcher.agent_id)),
        Transition(when=ContextEquals("current_task_type", "writing"),  then=AgentTarget(writer.agent_id)),
        Transition(when=ContextEquals("current_task_type", "review"),   then=AgentTarget(reviewer.agent_id)),
    ],
    default_target=TerminateTarget("max_turns"),
    max_turns=30,
)
```

The sequence variant below is the runnable demo.

## Code

!!! tip
    Real Sonnet on every agent - triage produces an actual plan,
    each specialist does real domain work.

```python
"""Cookbook 08 - Triage with Tasks pattern.

A triage agent receives the user's request and produces a typed
task plan (research -> writing -> review). The graph then executes
the plan as a fixed sequence, each specialist seeing the prior
conversation. A static task list is also seeded into ``context_vars``
at channel creation so it's readable across the run.
"""

import asyncio

from dotenv import load_dotenv

from autogen.beta import Agent
from autogen.beta.config import AnthropicConfig
from autogen.beta.knowledge import MemoryKnowledgeStore
from autogen.beta.network import (
    EV_PACKET,
    EV_CHANNEL_CLOSED,
    EV_TEXT,
    WORKFLOW_TYPE,
    Hub,
    HubClient,
    LocalLink,
    Passport,
    Resume,
    TransitionGraph,
)
from autogen.beta.testing import TestConfig

load_dotenv()

async def main() -> None:
    config = AnthropicConfig(model="claude-sonnet-4-6")

    hub_obj = await Hub.open(MemoryKnowledgeStore(), ttl_sweep_interval=0)
    link = LocalLink(hub_obj)

    intake_hc = HubClient(link, hub=hub_obj)
    triage_hc = HubClient(link, hub=hub_obj)
    researcher_hc = HubClient(link, hub=hub_obj)
    writer_hc = HubClient(link, hub=hub_obj)
    reviewer_hc = HubClient(link, hub=hub_obj)

    intake_agent = Agent("intake", config=TestConfig())

    triage_agent = Agent(
        "triage",
        prompt=(
            "You are the triage agent. The user has just submitted a "
            "request. Your job is ONE thing: write a 2-3 sentence "
            "plan that names the three tasks (research, writing, "
            "review) and what each will produce for THIS specific "
            "request. Be concrete - the specialists will read your "
            "plan as their brief. No preamble, no headers."
        ),
        config=config,
    )

    researcher_agent = Agent(
        "researcher",
        prompt=(
            "You are the researcher. Triage's plan is the most "
            "recent message in your context. Reply with ONE short "
            "paragraph (3-4 sentences) of factual research relevant "
            "to the request. No preamble."
        ),
        config=config,
    )
    writer_agent = Agent(
        "writer",
        prompt=(
            "You are the writer. The conversation contains the "
            "user's request, triage's plan, and the researcher's "
            "findings. Produce the actual deliverable that the user "
            "requested (a brief, a summary, a draft - whatever fits) "
            "drawing on the research. No preamble."
        ),
        config=config,
    )
    reviewer_agent = Agent(
        "reviewer",
        prompt=(
            "You are the reviewer. The conversation contains the "
            "writer's draft. Reply with ONE short paragraph (2-3 "
            "sentences) of constructive review notes - what works, "
            "what could be tightened. No preamble."
        ),
        config=config,
    )

    intake = await intake_hc.register(intake_agent, Passport(name="intake"), Resume())
    triage = await triage_hc.register(triage_agent, Passport(name="triage"), Resume())
    researcher = await researcher_hc.register(researcher_agent, Passport(name="researcher"), Resume())
    writer = await writer_hc.register(writer_agent, Passport(name="writer"), Resume())
    reviewer = await reviewer_hc.register(reviewer_agent, Passport(name="reviewer"), Resume())

    graph = TransitionGraph.sequence([
        intake.agent_id,
        triage.agent_id,
        researcher.agent_id,
        writer.agent_id,
        reviewer.agent_id,
    ])

    # context_vars seeds a static task list into channel state at creation.
    # Any tool / middleware can read it via ChannelStateInject;
    # transitions can route on it via ContextEquals. The fixed
    # sequence here doesn't need to read it for routing - it's there
    # to demonstrate that channel-scoped state survives the run.
    channel = await intake.open(
        type=WORKFLOW_TYPE,
        target=[triage.agent_id, researcher.agent_id, writer.agent_id, reviewer.agent_id],
        knobs={
            "graph": graph.to_dict(),
            "context_vars": {
                "pending_tasks": ["research", "writing", "review"],
                "completed_tasks": [],
                "request_kind": "brief",
            },
        },
    )
    print(f"channel: {channel.channel_id}\n")

    name_by_id = {
        intake.agent_id: "intake",
        triage.agent_id: "triage",
        researcher.agent_id: "researcher",
        writer.agent_id: "writer",
        reviewer.agent_id: "reviewer",
    }

    initial_state = hub_obj._adapter_states[channel.channel_id]
    print(f"initial context_vars: {initial_state.context_vars!r}\n")

    await channel.send("Write a 3-sentence brief on distributed consensus.")

    # Wait for the workflow to terminate (any of the five close routes
    # documented in /docs/beta/network/termination - this demo uses
    # TerminateTarget("sequence_complete") after the reviewer's reply).
    close_env = await intake.wait_for_channel_event(
        channel_id=channel.channel_id,
        predicate=lambda e: e.event_type == EV_CHANNEL_CLOSED,
        timeout=240.0,
    )

    # Print the transcript from the WAL after close.
    for env in await hub_obj.read_wal(channel.channel_id):
        speaker = name_by_id.get(env.sender_id, env.sender_id[:8])
        if env.event_type == EV_TEXT:
            print(f"{speaker:>14}: {env.event_data['text']}")
        elif env.event_type == EV_PACKET:
            routing = env.event_data.get("routing", {}) or {}
            if routing.get("kind") == "handoff":
                line = f"[Handed off via {routing.get('tool', '')}] {routing.get('reason', '')}"
                print(f"{speaker:>14}: {line.rstrip()}")
            body = env.event_data.get("body", "")
            if body:
                print(f"{speaker:>14}: {body}")

    print(f"\nclosed: reason={close_env.event_data.get('reason')!r}")
    final_state = hub_obj._adapter_states[channel.channel_id]
    print(f"final context_vars: {final_state.context_vars!r}")

    await intake_hc.close()
    await triage_hc.close()
    await researcher_hc.close()
    await writer_hc.close()
    await reviewer_hc.close()
    await hub_obj.close()

if __name__ == "__main__":
    asyncio.run(main())
```

## Output

```console
channel: 6b1a...

initial context_vars: {'pending_tasks': ['research', 'writing', 'review'], 'completed_tasks': [], 'request_kind': 'brief'}

         intake: Write a 3-sentence brief on distributed consensus.
         triage: Research will gather the canonical protocols (Paxos, Raft) and the FLP impossibility result. Writing will produce a 3-sentence brief defining distributed consensus, naming the protocols, and stating the safety/liveness trade-off. Review will check that the brief is technically accurate and reads naturally.
     researcher: Distributed consensus protocols allow a set of unreliable nodes to agree on a single value despite failures and message loss. Paxos and Raft are the dominant safety-first protocols; Raft is widely deployed (etcd, Consul) for its leader-based simplicity, while Paxos underpins systems like Google Chubby. The FLP impossibility theorem proves that no asynchronous protocol can guarantee both safety and liveness without timing assumptions, so all production protocols rely on partial synchrony or randomness.
         writer: Distributed consensus is the problem of getting a fleet of unreliable nodes to agree on one value despite failures. Paxos and Raft are the canonical protocols - Raft is favoured for its leader-based simplicity and powers etcd and Consul; Paxos predates it and underpins systems like Chubby. The FLP impossibility theorem is the price of asynchrony: no protocol can guarantee both safety and liveness without timing assumptions, so production systems rely on partial synchrony.
       reviewer: The brief is technically accurate and well-paced - naming Paxos / Raft and citing etcd / Consul / Chubby gives it the right level of detail for a 3-sentence summary. The closing FLP sentence is dense; consider splitting "the price of asynchrony" off as a short framing phrase so the impossibility result lands with one clear takeaway. Otherwise it's ship-ready.

closed: reason='sequence_complete'
final context_vars: {'pending_tasks': ['research', 'writing', 'review'], 'completed_tasks': [], 'request_kind': 'brief'}
```

---

# Envelopes and Events

Source: https://docs.ag2.ai/latest/docs/beta/network/envelopes_and_events/

The `Envelope` is the wire format for everything that flows between agents on the network. Every send produces one; every observer reads them. This page covers the envelope shape, the built-in event types, audience/visibility rules, and how to send raw envelopes when `channel.send(text)` isn't enough.

## The Envelope

```python
@dataclass(slots=True)
class Envelope:
    envelope_id: str        # hub-stamped UUID
    channel_id: str
    sender_id: str          # agent_id
    audience: list[str] | None  # None = broadcast to all channel participants
    event_type: str         # "ag2.msg.text", "ag2.channel.invite", etc.
    event_data: dict        # event-specific payload
    causation_id: str | None = None  # envelope_id this one is "in reply to"
    priority: Priority = Priority.NORMAL
    created_at: str = ""    # hub-stamped ISO-Z
    sequence: int = 0       # hub-stamped per-channel monotonic counter
```

The hub stamps `envelope_id`, `created_at`, and `sequence` at admission time. Everything else comes from the sender.
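
To make the split between sender-supplied and hub-stamped fields concrete, here is a minimal, self-contained sketch of an admission step. `SketchEnvelope` and `SketchStamper` are hypothetical stand-ins, not the real `Envelope` or hub code; starting the per-channel counter at 1 and the exact timestamp format are assumptions.

```python
import uuid
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass
class SketchEnvelope:
    """Simplified stand-in for Envelope (hypothetical, not the real class)."""
    channel_id: str
    sender_id: str
    event_type: str
    event_data: dict
    envelope_id: str = ""   # filled at admission
    created_at: str = ""    # filled at admission
    sequence: int = 0       # filled at admission


class SketchStamper:
    """Hypothetical admission step that fills the three hub-owned fields."""

    def __init__(self) -> None:
        self._last_sequence: dict[str, int] = {}

    def admit(self, env: SketchEnvelope) -> SketchEnvelope:
        # One counter per channel, so sequences are monotonic within a channel.
        seq = self._last_sequence.get(env.channel_id, 0) + 1
        self._last_sequence[env.channel_id] = seq
        env.envelope_id = str(uuid.uuid4())
        env.created_at = datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%S.%fZ")
        env.sequence = seq
        return env


stamper = SketchStamper()
first = stamper.admit(SketchEnvelope("ch-1", "alice", "ag2.msg.text", {"text": "hi"}))
second = stamper.admit(SketchEnvelope("ch-1", "bob", "ag2.msg.text", {"text": "hello"}))
other = stamper.admit(SketchEnvelope("ch-2", "alice", "ag2.msg.text", {"text": "hey"}))
```

The counter is keyed by `channel_id`, so `first` and `second` get consecutive sequences while `other` starts its own count.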

## Built-in Event Types

Constants exported from `autogen.beta.network`:

### Substantive events

| Constant | String | Carried in `event_data` |
|---|---|---|
| `EV_TEXT` | `"ag2.msg.text"` | `{"text": "<body>"}` |
| `EV_PACKET` | `"ag2.packet"` | `{"routing": {...}, "context_updates": {...}, "body": "<text>"}` |

These are the envelopes adapters fold into per-channel state. `EV_TEXT` carries plain text. `EV_PACKET` is the workflow adapter's atomic round-end capture: it bundles the agent's routing decision (matched against `ToolCalled(...)` rules), accumulated `context_vars` mutations, and final body text into one envelope. Posted by the framework after each `Agent.ask` round; tool authors don't construct these directly.

### Channel lifecycle events

| Constant | String | Source |
|---|---|---|
| `EV_CHANNEL_INVITE` | `"ag2.channel.invite"` | Hub posts to each `target` when a channel is created. |
| `EV_CHANNEL_INVITE_ACK` | `"ag2.channel.invite.ack"` | Each invitee posts when accepting. |
| `EV_CHANNEL_INVITE_REJECT` | `"ag2.channel.invite.reject"` | Optional - invitee rejects (default handler doesn't, but you can override). |
| `EV_CHANNEL_OPENED` | `"ag2.channel.opened"` | Hub posts when all acks land. |
| `EV_CHANNEL_CLOSED` | `"ag2.channel.closed"` | Hub posts on any termination path; `event_data.reason` carries why. |
| `EV_CHANNEL_EXPIRED` | `"ag2.channel.expired"` | Hub posts when TTL sweeper closes the channel. |
| `EV_EXPECTATION_VIOLATED` | `"ag2.channel.expectation_violated"` | Hub posts when an expectation evaluator's threshold is breached and the handler is `notify` (vs `audit` / `auto_close`). |

### Task lifecycle events

| Constant | String | Notes |
|---|---|---|
| `ag2.task.started` | `"ag2.task.started"` | Mirrored from `TaskStarted`. |
| `ag2.task.progress` | `"ag2.task.progress"` | Mirrored from `TaskProgress`. |
| `ag2.task.completed` | `"ag2.task.completed"` | Mirrored from `TaskCompleted`. |
| `ag2.task.failed` | `"ag2.task.failed"` | Mirrored from `TaskFailed`. |
| `ag2.task.expired` | `"ag2.task.expired"` | Mirrored from `TaskExpired`. |

These flow only when an `AgentClient` is running an LLM turn - see [Task Observation](task_observation.md).

## Audience and Visibility

`audience: list[str] | None` controls who sees the envelope:

- `None` - broadcast to all channel participants.
- `[agent_id_1, agent_id_2, ...]` - only those participants see it.

The `visible_to(envelope, agent_id)` helper says whether a given participant should see an envelope:

```python
from autogen.beta.network import visible_to

if visible_to(env, my_agent_id):
    process(env)
```

Views (`FullTranscript`, `WindowedSummary`) honour the audience: an envelope addressed only to `[bob]` doesn't appear in `carol`'s projection. See [Views & Skills](views_and_skills.md).
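
The two audience rules can be sketched without the framework. `sketch_visible_to` below is a hypothetical re-statement of those rules over plain dicts; the real `visible_to` helper takes an `Envelope` and may apply additional rules (for example for the sender) that this page doesn't describe.

```python
from __future__ import annotations


def sketch_visible_to(audience: list[str] | None, agent_id: str) -> bool:
    """None means broadcast; otherwise the agent must be listed."""
    return audience is None or agent_id in audience


# Plain-dict stand-ins for envelopes on one channel.
wal = [
    {"sender": "alice", "audience": None, "text": "hello all"},
    {"sender": "alice", "audience": ["bob"], "text": "bob only"},
]

carol_sees = [e["text"] for e in wal if sketch_visible_to(e["audience"], "carol")]
bob_sees = [e["text"] for e in wal if sketch_visible_to(e["audience"], "bob")]
```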

## Priority

```python
class Priority(IntEnum):
    LOW = 0
    NORMAL = 1
    HIGH = 2
    URGENT = 3
```

The hub processes higher-priority envelopes ahead of lower-priority ones in queue order. Use sparingly; most application envelopes should leave `priority` at `NORMAL`.
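
One common way to get this ordering is a heap keyed on negated priority plus an arrival counter. The sketch below is illustrative only: `SketchQueue` is hypothetical, the hub's actual queue isn't shown on this page, and FIFO ordering within one priority level is an assumption.

```python
import heapq
from enum import IntEnum
from itertools import count


class Priority(IntEnum):
    LOW = 0
    NORMAL = 1
    HIGH = 2
    URGENT = 3


class SketchQueue:
    """Hypothetical priority queue: highest priority first, FIFO within a level."""

    def __init__(self) -> None:
        self._heap: list[tuple[int, int, str]] = []
        self._arrival = count()

    def put(self, item: str, priority: Priority = Priority.NORMAL) -> None:
        # heapq is a min-heap, so negate the priority; the arrival index breaks ties.
        heapq.heappush(self._heap, (-int(priority), next(self._arrival), item))

    def get(self) -> str:
        return heapq.heappop(self._heap)[2]

    def __len__(self) -> int:
        return len(self._heap)


queue = SketchQueue()
queue.put("routine-1")
queue.put("alert", Priority.URGENT)
queue.put("routine-2")
queue.put("background", Priority.LOW)
drained = [queue.get() for _ in range(len(queue))]
```

Under these assumptions `"alert"` drains first and `"background"` last, while the two `NORMAL` items keep their arrival order.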

## Sending Raw Envelopes

The `channel.send(text, audience=...)` helper wraps the envelope construction for you. When you need a custom event type or to set fields the helper doesn't expose, build an `Envelope` and post it directly:

```python
from autogen.beta.network import Envelope, EV_TEXT

envelope = Envelope(
    channel_id=channel.channel_id,
    sender_id=alice.agent_id,
    audience=[bob.agent_id],
    event_type="myapp.review_request",
    event_data={"document_id": "doc-123", "kind": "security"},
)
await alice.send_envelope(envelope)
```

The hub doesn't validate `event_type` against any allowlist; custom types pass through unmodified. Adapters fold only event types they recognise - substantive ones (`EV_TEXT`, plus `EV_PACKET` under the workflow adapter) and lifecycle ones (`EV_CHANNEL_*`). Custom event types are written to the WAL and delivered to participants but don't advance turn-taking state.

!!! tip "Adapter-shaped envelopes"
    When the envelope *should* advance the channel's protocol (a normal text turn, a workflow round packet) but you're constructing it outside the agent loop - a bridge, a gateway, a test harness - use the adapter's Layer-2 helpers instead of building the `Envelope` by hand: `adapter.build_text_envelope(...)` / `adapter.build_packet_envelope(..., handoff=, context_set=)`, fetched via `hub.adapter_for(channel_id)`. They produce exactly the shape the adapter folds. See [Adapters Overview -> Driving a channel without an Agent](adapters_overview.md#driving-a-channel-without-an-agent).

## Causation

`causation_id` lets you mark an envelope as "in reply to" another:

```python
await channel.send(reply_text, causation_id=incoming_envelope.envelope_id)
```

The default handler does this automatically when it replies to an inbound `EV_TEXT`. Custom handlers should set it when they're producing a logical reply - useful for tooling that builds threaded views of a channel.
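
As an illustration of the threaded-view tooling mentioned above, the sketch below groups envelopes by `causation_id` and renders each reply indented under its parent. It works on plain dicts carrying only the two relevant fields; the sample data and `render_thread` are hypothetical.

```python
from collections import defaultdict

# Plain-dict stand-ins for envelopes read from a channel.
envelopes = [
    {"envelope_id": "e1", "causation_id": None, "text": "Can you review doc-123?"},
    {"envelope_id": "e2", "causation_id": "e1", "text": "Sure, starting now."},
    {"envelope_id": "e3", "causation_id": None, "text": "Unrelated announcement."},
    {"envelope_id": "e4", "causation_id": "e2", "text": "Found two issues."},
]

# Index replies by the envelope they answer; None collects thread roots.
replies = defaultdict(list)
for env in envelopes:
    replies[env["causation_id"]].append(env)


def render_thread(parent_id, depth=0):
    """Depth-first walk: each reply is indented under its parent."""
    lines = []
    for env in replies.get(parent_id, []):
        lines.append("  " * depth + env["text"])
        lines.extend(render_thread(env["envelope_id"], depth + 1))
    return lines


thread_lines = render_thread(None)
```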

## Reading the WAL

The hub maintains a per-channel write-ahead log:

```python
wal = await hub.read_wal(channel_id)
for env in wal:
    print(f"{env.sequence:>3}  {env.event_type}  from={env.sender_id[:8]}")
```

Envelopes appear in admission order. The WAL is the canonical replay surface - `Hub.hydrate()` re-folds it through each adapter to rebuild in-memory state on restart.

## Custom Event Types - Practical Tips

When you define your own event types:

1. Use a dotted namespace prefix (`"myapp.review_request"`, not `"review"`) to avoid collisions with future `ag2.*` events.
2. Make `event_data` JSON-serialisable (no datetimes, dataclasses, etc.) so it round-trips through the store cleanly.
3. If multiple participants need to react, set `audience=None`. If only one, address it specifically; views will filter it out from non-recipients.
4. Don't rely on adapters to do anything special with custom types - they pass through. Your custom handler is responsible for processing them.
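
For tip 2, a small conversion step keeps application types out of `event_data`. The sketch below is one way to do it, assuming a hypothetical `ReviewRequest` payload; it flattens the dataclass to a dict and stores the datetime as an ISO 8601 string, then checks the JSON round trip.

```python
import json
from dataclasses import asdict, dataclass
from datetime import datetime, timezone


@dataclass
class ReviewRequest:
    """Hypothetical application payload, not part of autogen.beta."""
    document_id: str
    kind: str
    requested_at: datetime


def to_event_data(request: ReviewRequest) -> dict:
    """Flatten to JSON-safe primitives before putting it in an envelope."""
    data = asdict(request)
    data["requested_at"] = request.requested_at.isoformat()
    return data


request = ReviewRequest(
    document_id="doc-123",
    kind="security",
    requested_at=datetime(2025, 1, 15, 9, 30, tzinfo=timezone.utc),
)
event_data = to_event_data(request)

# If this round trip changes the payload, the store would too.
round_tripped = json.loads(json.dumps(event_data))
```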

## Inspecting Frames

Below the envelope layer is the frame layer (`HelloFrame`, `SendFrame`, `NotifyFrame`, ...) - that's the link transport. Most users don't touch it. It surfaces only when you're implementing a custom transport (see [Agent Clients](agent_clients.md)) or debugging a connection issue.

---

# Governance, Audit and Observability

Source: https://docs.ag2.ai/latest/docs/beta/network/expectations_and_audit/

The hub enforces governance and surfaces observability through a few layered seams:

1. **Evaluators** - pure functions over channel state that return zero or more `Violation` records when their thresholds are breached.
2. **Violation handlers** - what to do when a violation fires: log to the audit trail, notify the channel, or auto-close.
3. **The audit log** - append-only record of every governance-relevant event the hub processes (itself a `HubListener`).
4. **`HubListener`** - read-only observers the hub fans state transitions out to, after the fact.
5. **`HubArbiter`** - the gatekeeper the hub consults *before* committing register / channel-open / send / dispatch decisions.

## Evaluators

Three evaluator classes ship today, plus a composed `"turn_within"`; all four are addressed by name in adapter manifests:

| Name | Class | Threshold |
|---|---|---|
| `"acks_within"` | `AcksWithinEvaluator` | All invitees must ack within `params["seconds"]` of channel creation. |
| `"reply_within"` | `ReplyWithinEvaluator` | The respondent must reply within `params["seconds"]` of the initiator's first send (consulting only). |
| `"max_silence"` | `MaxSilenceEvaluator` | No participant may go silent for longer than `params["seconds"]`. |
| `"turn_within"` | (composes from the above) | The next speaker must speak within `params["seconds"]` of being scheduled. |

Each evaluator implements:

```python
class ExpectationEvaluator(Protocol):
    name: ClassVar[str]
    def evaluate(self, ctx: ExpectationContext) -> list[Violation]: ...
```

`ExpectationContext` is a small dataclass holding the metadata, WAL slice, current time, and the expectation's params. Evaluators are pure - no I/O, no mutation - so they're trivially testable.

The default registry exposes them as `default_evaluators()`. Custom evaluators register similarly to custom transition targets.
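
Purity is what makes evaluators easy to test: build a context, call the function, compare the result. The sketch below shows the idea with a max-silence style check. `SketchContext` and its field names are assumptions for illustration and do not mirror the real `ExpectationContext`.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SketchContext:
    """Stand-in for ExpectationContext; field names are assumptions."""
    last_seen: dict   # participant id -> time of last send, in seconds
    now: float        # current time, in seconds
    params: dict      # the expectation's params


def sketch_max_silence(ctx: SketchContext) -> list[str]:
    """Return the participants silent for longer than params['seconds']."""
    limit = ctx.params["seconds"]
    return sorted(
        pid for pid, seen in ctx.last_seen.items() if ctx.now - seen > limit
    )


ctx = SketchContext(
    last_seen={"alice": 100.0, "bob": 3900.0},
    now=4000.0,
    params={"seconds": 3600},
)
silent = sketch_max_silence(ctx)
silent_again = sketch_max_silence(ctx)  # same input, same output
```

No clock, store, or network is touched, so the same context always yields the same violations.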

## Adapter-Declared Expectations

Each adapter's manifest declares its defaults (see [Adapters Overview](adapters_overview.md) for the table). Examples:

```python
# ConsultingAdapter
expectations = [
    Expectation(name="acks_within",  on_violation="auto_close", params={"seconds": 30}),
    Expectation(name="reply_within", on_violation="auto_close", params={"seconds": 600}),
]

# ConversationAdapter
expectations = [
    Expectation(name="max_silence", on_violation="audit", params={"seconds": 3600}),
]
```

`Expectation.on_violation` selects the handler:

| `on_violation` | Handler | Effect |
|---|---|---|
| `"audit"` | `AuditHandler` | Write to the audit log only. Channel continues. |
| `"notify"` | `NotifyChannelHandler` | Post `EV_EXPECTATION_VIOLATED` on the channel WAL. |
| `"auto_close"` | `AutoCloseHandler` | Close the channel with `reason="expectation_violated:<name>"`; record to audit. |
| `"hide"` | (custom) | Hide later turns from the offending participant; not yet implemented as a built-in. |

The default registry exposes them as `default_handlers()`.

## The Sweeper Loop

When the hub is open, an expectation sweeper task wakes every `expectation_sweep_interval` (default 10 s), walks every active channel, runs each expectation's evaluator, and dispatches any violations to the configured handler.

For deterministic tests / examples:

```python
from autogen.beta.network import Hub
from autogen.beta.knowledge import MemoryKnowledgeStore

hub = await Hub.open(
    MemoryKnowledgeStore(),
    expectation_sweep_interval=0,  # disable background loop
)

# Manually advance state and tick:
clock.advance(45)                       # mock-clock pattern
await hub._expectation_tick()           # operator API
```

`hub._expectation_tick()` is a public-by-convention test entry point - a leading underscore, but exercised explicitly by the test suite.

## Audit Log

`hub.audit_log` is an `AuditLog` instance - append-only, and itself a registered `HubListener` (so every state transition the hub fans out also lands as one structured record). It writes a single `audit.jsonl` under the hub's `KnowledgeStore`.

```python
records = await hub.audit_log.read_all()
for r in records:
    print(r["kind"], r["at"], r)
```

Each record is a plain dict with at minimum `kind` and `at`; `kind`-specific fields appear alongside.

| Member | Purpose |
|---|---|
| `await hub.audit_log.read_all()` | Read + parse the whole log. `[]` if absent. |
| `await hub.audit_log.append(record)` | Write one record. The kind set is **open** - tenants/subclasses append their own `kind` values here. |
| `hub.audit_log.subscribe(cb)` / `unsubscribe(cb)` | Live tail - `cb(record)` fires per appended record (no polling). Subscriber exceptions are logged and swallowed. |
| `hub.audit_log.bytes_written` | Process-local byte counter (resets on hub restart). Surfaced by `hub.health()` as `audit_log_bytes`. |
| `hub.replace_audit_log(custom)` | Swap in a tenant-provided `AuditLog` subclass (e.g. a different on-disk format). The replacement is registered as the first listener so audit writes still complete before tenant listeners observe the same event. |

### Audit kinds

Re-exported as constants from `autogen.beta.network`:

| Constant | Notes |
|---|---|
| `AUDIT_KIND_AGENT_REGISTERED` | Records `agent_id`, `name`. |
| `AUDIT_KIND_AGENT_UNREGISTERED` | Records `agent_id`, `name`. |
| `AUDIT_KIND_RESUME_SET` | Records the source: `RESUME_SOURCE_TENANT` (a `set_resume` call) or `RESUME_SOURCE_OBSERVED` (a `record_observation`). |
| `AUDIT_KIND_SKILL_SET` | Records updated skill markdown. |
| `AUDIT_KIND_RULE_SET` | Records the new rule. |
| `AUDIT_KIND_CHANNEL_CREATED` | Records `creator_id`, manifest type/version, participants. |
| `AUDIT_KIND_CHANNEL_CLOSED` | Records `reason`. |
| `AUDIT_KIND_CHANNEL_EXPIRED` | Records the TTL details. |
| `AUDIT_KIND_TASK_TERMINATED` | Records `owner_id`, `capability`, `outcome`, `latency_ms`. |
| `AUDIT_KIND_EXPECTATION_VIOLATED` | Records `expectation`, `channel_id`, evaluator details. |
| `AUDIT_KIND_TURN_FAILED` | A notify handler crashed processing an inbound envelope. Records `channel_id`, `agent_id`, `envelope_id`, `exc_type`, `exc_message`. |

### Inspection patterns

```python
from autogen.beta.network import AUDIT_KIND_EXPECTATION_VIOLATED

# Filter to violations only.
violations = [
    r for r in await hub.audit_log.read_all()
    if r["kind"] == AUDIT_KIND_EXPECTATION_VIOLATED
]

# Filter to one channel.
channel_records = [
    r for r in await hub.audit_log.read_all()
    if r.get("channel_id") == channel_id
]

# Live tail.
async def on_record(record: dict) -> None:
    print("[audit]", record["kind"], record)
hub.audit_log.subscribe(on_record)
```

The `AuditLog` is durable when the hub is backed by `DiskKnowledgeStore`; with `MemoryKnowledgeStore` it lives only as long as the hub.

## HubListener - observing state transitions

A `HubListener` is a **read-only** observer. Attach one with `hub.register_listener(...)`; the hub `await`s the matching method *after* the corresponding state change commits - listeners observe, they don't gate (that's `HubArbiter`, below). Registration itself is inert: there's no startup hook, and methods only fire on subsequent transitions. Each listener call is wrapped in its own `try/except` so a throwing listener can't stall dispatch - the exception is logged at `ERROR` and the next listener still runs.

`BaseHubListener` is a no-op base - subclass it and override only the events you care about (you don't have to implement the full Protocol surface). The built-in `AuditLog` *is* a `HubListener`, pre-registered on every hub, which is why a fresh hub already reports `registered_listeners: 1`.

```python
from autogen.beta.network import BaseHubListener

class MetricsListener(BaseHubListener):
    async def on_envelope_posted(self, envelope, metadata) -> None:
        metrics.incr("network.envelopes", tags={"type": envelope.event_type})

    async def on_envelope_rejected(self, envelope, reason) -> None:
        metrics.incr("network.rejected", tags={"reason": type(reason).__name__})

    async def on_turn_failed(self, channel_id, agent_id, envelope_id, exc) -> None:
        logger.error("turn failed: agent=%s channel=%s", agent_id, channel_id, exc_info=exc)

hub.register_listener(MetricsListener())
# hub.unregister_listener(listener)   # detach later - no-op if absent
```

| Method | Fires when |
|---|---|
| `on_envelope_posted(envelope, metadata)` | An envelope was validated, WAL-appended, folded, and dispatched. |
| `on_envelope_rejected(envelope, reason)` | An envelope was rejected before WAL append. `reason` is the typed `NetworkError` the sender saw. |
| `on_dispatch_failed(envelope, recipient_id, reason)` | Delivery of an accepted envelope to one recipient failed (the rest of the audience may have received it). |
| `on_channel_event(channel_id, kind, payload)` | `kind` ∈ `opened` / `closed` / `expired` / `participant_removed` / `participant_hidden`. |
| `on_agent_event(agent_id, kind, payload)` | `kind` ∈ `registered` / `unregistered` / `resume_set` / `skill_set` / `rule_set` / `observation_recorded`. |
| `on_expectation_fired(channel_id, expectation, violation)` | An evaluator emitted a violation (deduped per `(channel, expectation, violator)`). |
| `on_turn_failed(channel_id, agent_id, envelope_id, exc)` | A notify handler crashed processing an inbound envelope. |
| `on_task_event(task_id, kind, payload)` | `kind` ∈ `started` / `progress` / `completed` / `failed` / `expired` / `cancelled` / `mirror_failed`. |
| `on_inbox_pressure(agent_id, pending, cap)` | A recipient's pending count first crossed its inbox high-water mark (fires once per crossing). |
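
The per-listener isolation described above can be sketched in a few lines of plain `asyncio`. This is not the hub's fan-out code; `fan_out` and both listener classes are hypothetical, and the hook signature is simplified to a single argument.

```python
import asyncio
import logging

logger = logging.getLogger("sketch.hub")


class FailingListener:
    async def on_envelope_posted(self, envelope: dict) -> None:
        raise RuntimeError("listener bug")


class CountingListener:
    def __init__(self) -> None:
        self.seen = 0

    async def on_envelope_posted(self, envelope: dict) -> None:
        self.seen += 1


async def fan_out(listeners: list, envelope: dict) -> None:
    """Call every listener; one failure must not block the others."""
    for listener in listeners:
        try:
            await listener.on_envelope_posted(envelope)
        except Exception:
            # logger.exception logs at ERROR level with the traceback attached.
            logger.exception("listener %r failed", listener)


counter = CountingListener()
asyncio.run(fan_out([FailingListener(), counter], {"event_type": "ag2.msg.text"}))
```

Even though the first listener raises, the second one still observes the envelope.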

### Subclassing the Hub instead of registering a listener

The same `on_*` methods are available on `Hub` itself (with no-op defaults) - so if you're building a custom hub you can override them directly rather than registering the hub as a listener of itself:

```python
from autogen.beta.network import Hub

class ObservingHub(Hub):
    async def on_envelope_posted(self, envelope, metadata) -> None:
        metrics.incr("network.envelopes", tags={"type": envelope.event_type})

    async def on_inbox_pressure(self, agent_id, pending, cap) -> None:
        logger.warning("inbox pressure: %s at %d/%d", agent_id, pending, cap)

hub = await ObservingHub.open(store)
```

Subclass overrides fire **alongside** any externally-registered listeners, with the same per-callee `try/except` isolation. Use a subclass when the observation logic belongs to your hub implementation; use `register_listener(...)` when it's a separate concern (metrics shipper, audit tap) you want to attach and detach independently.

`on_task_event` is the one listener hook that isn't purely hub-driven - `TaskMirror` (and other tenant code) emit `"mirror_failed"` and other kinds through it. The public way to fan one out is `await hub_client.fire_task_event(task_id, kind, payload)` (from a registered tenant) or `await hub.fire_task_event(...)` (direct); neither touches the hub's private fan-out.

### Health snapshot

`hub.health()` is a cheap, in-memory operational snapshot - wire it to a `/health` endpoint or dashboard:

```python
hub.health()
# {
#   "active_channels": 2,
#   "registered_agents": 5,
#   "pending_inbox_total": 3,
#   "max_pending_inbox_depth": 2,        # None when nothing queued; a persistently high value can indicate a stuck agent
#   "registered_listeners": 1,           # the built-in AuditLog counts
#   "adapters_loaded": 4,
#   "audit_log_bytes": 8192,
# }
```
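
A `/health` endpoint usually reduces the snapshot to a status. The sketch below is one possible reduction over the dict shape shown above; `sketch_health_status` and its `max_depth` threshold are hypothetical and should be tuned to your deployment.

```python
def sketch_health_status(snapshot: dict, max_depth: int = 100) -> str:
    """Map a hub.health()-shaped dict to 'ok' or 'degraded' (hypothetical policy)."""
    depth = snapshot.get("max_pending_inbox_depth")
    # depth is None when nothing is queued, which counts as healthy here.
    if depth is not None and depth >= max_depth:
        return "degraded"
    return "ok"


idle = {"active_channels": 0, "max_pending_inbox_depth": None}
busy = {"active_channels": 2, "max_pending_inbox_depth": 2}
backed_up = {"active_channels": 2, "max_pending_inbox_depth": 250}

statuses = [sketch_health_status(s) for s in (idle, busy, backed_up)]
```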

## HubArbiter - the decision seam

Where a `HubListener` only observes, a `HubArbiter` **decides**. The hub consults the active arbiter inline before committing register / channel-open / send / dispatch decisions. Exactly one arbiter is active at a time; install yours with `hub.register_arbiter(arbiter)` (and read it back via `hub.arbiter`).

Each gate returns a `Decision` - `Allow()` or `Deny(reason, error=...)`, where `error` selects which `NetworkError` subclass the hub raises back to the caller (defaults to `AccessDeniedError`):

| Gate | Called before | Default `RuleBasedArbiter` checks |
|---|---|---|
| `authorize_send(envelope, sender, sender_rule, recipients)` | `post_envelope` WAL append | `access.outbound_to`, `limits.delegation_depth` |
| `authorize_inbox(envelope, recipient, recipient_rule, current_pending)` | per-recipient, on `post_envelope` | `limits.inbox.max_pending` (denies with `InboxFull`) |
| `authorize_dispatch(envelope, sender, recipient, recipient_rule)` | each notify frame | `access.inbound_from` (deny => silently skip that recipient) |
| `authorize_channel_open(manifest, creator, creator_rule, invitees, invitee_rules, active_creator_channels)` | `create_channel` | each invitee's `access.inbound_from`, creator's `limits.max_concurrent_channels` |
| `authorize_register(passport, resume, rule)` | *(reserved - not yet wired by the hub)* | always `Allow` |
| `resolve_unknown_audience(envelope, unknown_ids)` | dispatch to ids the hub doesn't know | returns `None` (drop silently) - the federation hook |

The default `RuleBasedArbiter` enforces the per-agent `Rule` (access + limits) - exactly the behavior the hub had inline before this seam existed. Two ways to customise:

- **Add policy on top** - subclass `RuleBasedArbiter` and `await super()` in the gates you extend.
- **Start from scratch** - subclass `BaseHubArbiter` (all gates return `Allow` by default) and implement only the ones you need.

```python
from autogen.beta.network import RuleBasedArbiter, Allow, Deny, EV_TEXT

class ContentGuardArbiter(RuleBasedArbiter):
    BANNED = ("password", "ssn")

    async def authorize_send(self, envelope, sender, sender_rule, recipients):
        base = await super().authorize_send(envelope, sender, sender_rule, recipients)
        if isinstance(base, Deny):
            return base
        if envelope.event_type == EV_TEXT:
            text = str(envelope.event_data.get("text", "")).lower()
            hit = next((b for b in self.BANNED if b in text), None)
            if hit is not None:
                return Deny(reason=f"message blocked: contains {hit!r}")
        return Allow()

hub.register_arbiter(ContentGuardArbiter())
```

A `Deny` from `authorize_send` / `authorize_inbox` / `authorize_channel_open` surfaces to the caller as the chosen `NetworkError` (so `channel.send(...)` raises `AccessDeniedError`); a `Deny` from `authorize_dispatch` just drops that one recipient. `resolve_unknown_audience` is the seam a federated arbiter uses to re-route to a local proxy id instead of dropping.

## Turn-failure resilience

The default notify handler wraps its **entire** substantive path - channel resolve, view projection, `adapter.extract_turn_input`, `agent.ask`, round-envelope build, outbound send. If any step raises, the handler:

1. routes the failure through `HubClient.report_turn_failure` -> `Hub.report_turn_failure`,
2. which fans `on_turn_failed(channel_id, agent_id, envelope_id, exc)` out to every `HubListener` - the built-in `AuditLog` writes an `AUDIT_KIND_TURN_FAILED` record,
3. then returns cleanly. **No reply envelope is posted, but the channel stays active and the next envelope flows normally** - a buggy turn no longer takes down the receive loop.

React however you like - retry, escalate, surface to a UI - by registering a listener that overrides `on_turn_failed`.

## Custom Evaluators

Same shape as the built-ins:

```python
from typing import ClassVar
from autogen.beta.network import EV_TEXT
from autogen.beta.network.hub import (
    ExpectationContext,
    ExpectationEvaluator,
    Violation,
)

class TooManyMessagesEvaluator:
    name: ClassVar[str] = "too_many_messages"

    def evaluate(self, ctx: ExpectationContext) -> list[Violation]:
        threshold = ctx.params["max"]
        text_count = sum(1 for e in ctx.wal if e.event_type == EV_TEXT)
        if text_count > threshold:
            return [Violation(
                expectation=self.name,
                channel_id=ctx.channel.channel_id,
                detail=f"text count {text_count} exceeds {threshold}",
            )]
        return []
```

Register on a custom registry and pass to `Hub.open(..., evaluators=registry)`. The default registry can also be mutated via the module-level `register_evaluator(...)` helper.

---

# Task Observation

Source: https://docs.ag2.ai/latest/docs/beta/network/task_observation/

Task observation is the bridge between the [`Task` lifecycle primitive](../tasks.md) and the network's per-agent track record. When an agent runs `agent.task(..., capability="X")` inside a network turn, a `TaskMirror` forwards the lifecycle events to the hub. On terminal events with a `capability` tag, the hub updates the worker's `Resume.observed[capability]`.

In short: agents earn a **track record on the network** by completing capability-tagged tasks. Other agents (and operators) read that track record off the worker's `Resume`.

## The Mechanism

`TaskMirror` is a stream subscriber that:

1. Subscribes to `TaskStarted`, `TaskProgress`, `TaskCompleted`, `TaskFailed`, `TaskExpired` events on a stream.
2. Forwards each as an `ag2.task.*` envelope to the hub via `HubClient`.
3. On terminal events with `spec.capability` set, calls `Hub.record_observation(...)` so the hub's per-agent `ObservedStat` updates.

It's auto-attached by the default handler for the duration of every LLM turn - you don't need to wire it up manually. If you write a custom handler, attach it manually:

```python
from autogen.beta.network import TaskMirror
from autogen.beta.stream import MemoryStream

mirror = TaskMirror(
    hub_client=client._hub_client,
    owner_id=client.agent_id,
    channel_id=metadata.channel_id,
)
stream = MemoryStream()
sub_ids = mirror.attach(stream)
try:
    await client.agent.ask(text, stream=stream)
finally:
    mirror.detach(stream, sub_ids)
```

## Capability Tagging

`agent.task(...)` accepts a `capability` keyword:

```python
async with agent.task(
    "survey: deployment patterns",
    capability="research",
    context=ctx,
) as task:
    await task.progress({"step": "gather"})
    # ... do work ...
    await task.complete({"items_found": 7})
```

`ctx` is the active `Context` (passed in by fast_depends to a tool body, or carried explicitly in scripts). It's important to pass `context=ctx` so the task fires its events on the LLM-turn's stream - that's the stream the mirror is attached to.

`capability` is a free-form string. Common values: `"research"`, `"summarisation"`, `"review"`, `"code_review"`. Whatever names your application uses internally for capability roles, use them here.

If `capability` is `None` (the default), the mirror still forwards lifecycle envelopes to the hub, but doesn't update `Resume.observed`. The track record is opt-in.

## ObservedStat

```python
@dataclass(slots=True)
class ObservedStat:
    n: int = 0                       # total terminal events seen
    completed: int = 0
    failed: int = 0
    expired: int = 0
    p50_latency_ms: int | None = None  # rolling median of started_at -> completed_at
```

Read it off the worker's resume:

```python
resume = await hub.get_resume(bob.agent_id)
stat = resume.observed.get("research")
if stat:
    print(f"completed={stat.completed}/{stat.n}  median_latency={stat.p50_latency_ms}ms")
```

The latency is computed from `task_meta.started_at` to the terminal event time, sourced from the hub's clock. With a `MockClock` you can construct deterministic latency values for testing.
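
To show how a rolling median behaves as samples arrive, here is a stdlib-only sketch. `SketchLatencyTracker` is hypothetical, and the bounded window is an assumption; this page doesn't say how many samples the hub keeps.

```python
from collections import deque
from statistics import median


class SketchLatencyTracker:
    """Hypothetical rolling p50 over the most recent `window` latencies."""

    def __init__(self, window: int = 50) -> None:
        self._samples = deque(maxlen=window)  # oldest sample drops out automatically

    def record(self, started_at_ms: int, ended_at_ms: int) -> int:
        self._samples.append(ended_at_ms - started_at_ms)
        return int(median(self._samples))


tracker = SketchLatencyTracker(window=3)
# Four tasks that all start at t=0 and end at the given time.
p50_values = [tracker.record(0, ended) for ended in (100, 300, 200, 1000)]
```

With a window of 3, the fourth sample evicts the first, so the median moves from 200 to 300 rather than being dragged straight to the outlier.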

## What the Mirror Records

`Hub.record_observation(...)`, which the mirror calls on terminal events, writes to:

- `Resume.observed[capability]` - the per-capability `ObservedStat` for the owner.
- `AuditLog` - `AUDIT_KIND_TASK_TERMINATED` records every terminal task with capability, outcome, and latency.

Both updates happen inside the same hub transaction, so partial updates don't occur.

## Where TaskMirror Fits in the Default Handler

Look at `autogen.beta.network.client.handlers._process_text` if you want the exact wiring. Sketch:

```python
mirror = TaskMirror(
    hub_client=client._hub_client,
    owner_id=client.agent_id,
    channel_id=metadata.channel_id,
)
sub_ids = mirror.attach(stream)
try:
    reply = await client.agent.ask(
        current_text,
        stream=stream,
        dependencies=dependencies,
    )
finally:
    mirror.detach(stream, sub_ids)
```

Notably:

- The mirror is attached **per turn**, not per agent. A new mirror is constructed and attached for each inbound envelope the handler processes.
- The `channel_id` lets the hub tie task observations back to the channel that produced them - useful for governance and replay.
- The mirror swallows errors when forwarding to the hub. A flaky hub connection should not crash the LLM turn.

## When to Skip Capability Tagging

Not every `agent.task(...)` deserves a capability tag. Tag only when:

- The task represents a **capability you want to track** in the agent's resume.
- Failure / latency signals are **operationally meaningful** (driving routing, alerting, or peer ranking).

Untagged tasks still get full lifecycle observation in the audit log - just no `Resume.observed` update. Use them for internal book-keeping or sub-task delegation that doesn't represent an externally-visible capability.

## Cross-Cutting Pattern

A common pattern: an agent has multiple capability roles. Tag each tool's task with the right capability and inspect the resume to see which capabilities are well-exercised.

```python
@worker.tool
async def research(topic: str, ctx: Context) -> str:
    async with worker.task(f"research: {topic}", capability="research", context=ctx) as t:
        ...  # do the research work here
    return f"researched {topic}"

@worker.tool
async def summarise(text: str, ctx: Context) -> str:
    async with worker.task("summarise", capability="summarisation", context=ctx) as t:
        ...  # do the summarisation work here
    return f"summary: ..."
```

After a few channels, `worker.resume.observed` will hold both `"research"` and `"summarisation"` `ObservedStat`s, each tracking that capability independently.

---

# Views and Skills

Source: https://docs.ag2.ai/latest/docs/beta/network/views_and_skills/

Two LLM-facing concerns:

- **Views** decide what each participant *sees* of a channel - the projection of the WAL that becomes the LLM's history when its handler runs `Agent.ask(...)`.
- **Skills** decide what each agent *advertises about itself* - the markdown describing its capabilities, surfaced to other agents during peer lookup.

## ViewPolicy

`ViewPolicy` is a `Protocol`:

```python
class ViewPolicy(Protocol):
    name: ClassVar[str]
    async def project(
        self,
        wal: list[Envelope],
        *,
        participant_id: str,
        channel: ChannelMetadata,
        render_envelope: EnvelopeRenderer,
        name_for: NameResolver = default_name_resolver,
    ) -> list[BaseEvent]: ...
```

It takes the WAL up to the current envelope and returns a list of `BaseEvent`s that the framework feeds into the LLM turn as pre-populated stream history. Adapters declare a default; tenants can override per-channel.

Two callables are supplied per call so view policies stay adapter-neutral and identity-aware without depending on hub internals:

- `render_envelope: Callable[[Envelope], str | None]` - comes from the channel's `ChannelAdapter.render_envelope`. Lets a view ask "what's the LLM-visible text for this envelope?" without knowing the channel type's envelope shape.
- `name_for: Callable[[str], str]` - comes from `Hub.name_for`. Lets a view resolve an `agent_id` to its registered `Passport.name` for projection lines that need a speaker label. Falls back to the raw id when the sender is unknown so projection never fails.

## Built-in Views

| View | Behaviour | Default for |
|---|---|---|
| `FullTranscript()` | Every visible envelope, in order, no filtering beyond audience. Non-self envelopes become bare `ModelRequest`s. | `consulting` (2-party) |
| `WindowedSummary(recent_n=N)` | The last `N` visible envelopes. If the WAL is longer, prepends a `CompactionSummary` placeholder with a count of the elided turns. Non-self envelopes become bare `ModelRequest`s. | `conversation` (2-party) |
| `NamedTranscript()` | Like `FullTranscript`, but each non-self envelope is prefixed with `[<sender name>]:` so the LLM can tell its peers apart. Resolves names via the supplied `NameResolver`. | - |
| `NamedWindowedSummary(recent_n=N)` | Like `WindowedSummary`, but with the same `[<sender name>]:` labelling on non-self lines and on the head `CompactionSummary`'s speakers list. | `discussion`, `workflow` (N-party) |

All four views honour audience: an envelope addressed only to `[bob]` doesn't appear in `carol`'s projection.

!!! tip "Why two flavours?"
    In a **2-party** channel, the assistant/user role bit already disambiguates speakers - the LLM sees its own past as `role: "assistant"` and the only other party as `role: "user"`. Labels would be redundant.

    In a **3+ party** channel (discussion, workflow), every non-self envelope collapses into the same `role: "user"` stream. Without a sender label the LLM can't tell which peer said which message. The `Named*` variants put `[<sender name>]:` on each non-self line so the orchestrator / next speaker can route accurately. This is the default for N-party adapters precisely because the role bit alone isn't enough.

```python
from autogen.beta.network import FullTranscript, WindowedSummary

view = WindowedSummary(recent_n=12)
# The WAL slice is passed positionally, matching the `project` signature
# shown under "Custom Views".
projected = await view.project(
    wal_slice,
    participant_id=carol.agent_id,
    channel=metadata,
)
```
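
To make the filter, window, and label steps concrete, here is a library-free sketch. `Env`, `project`, and the `(role, text)` tuples are stand-ins invented for this illustration; they are not the real `Envelope`, `ViewPolicy`, or event types, and the built-in views may differ in detail.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Env:
    """Stand-in for an envelope: NOT the real autogen.beta.network.Envelope."""
    sender_id: str
    audience: frozenset  # agent ids allowed to see this envelope
    text: str


def project(wal, participant_id, recent_n, name_for=lambda agent_id: agent_id):
    """Return (role, text) pairs as one participant would see them."""
    # 1. Audience filter: keep envelopes this participant sent or may see.
    visible = [
        e for e in wal
        if e.sender_id == participant_id or participant_id in e.audience
    ]
    # 2. Window: keep the last recent_n and remember how many were dropped.
    elided = max(0, len(visible) - recent_n)
    window = visible[elided:]
    out = []
    if elided:
        out.append(("summary", f"...elided {elided} turns"))
    # 3. Role and label: own turns are "assistant", peers are labelled "user".
    for e in window:
        if e.sender_id == participant_id:
            out.append(("assistant", e.text))
        else:
            out.append(("user", f"[{name_for(e.sender_id)}]: {e.text}"))
    return out


everyone = frozenset({"alice", "bob", "carol"})
wal = [
    Env("alice", everyone, "Kickoff."),
    Env("alice", frozenset({"bob"}), "Bob, a private note."),
    Env("bob", everyone, "Ack."),
    Env("carol", everyone, "I can take the draft."),
]
carol_view = project(wal, "carol", recent_n=2)
```

In this toy model `carol` never sees the note addressed only to `bob`, her own turn keeps the assistant role, and the one elided turn is reported in a leading summary entry.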

## Resolving the Default

```python
from autogen.beta.network import resolve_view_policy

policy = resolve_view_policy(client, metadata)
```

`resolve_view_policy` reads the adapter manifest's `default_view_policy` and instantiates the matching view from the registry. The default handler calls this once per turn - custom handlers should too, unless they're deliberately bypassing the standard projection model.

## Custom Views

Implement the protocol, give it a unique `name`, and pass it to the policy resolver. A common shape:

```python
from typing import ClassVar
from autogen.beta.events import BaseEvent, ModelMessage, ModelRequest, TextInput
from autogen.beta.network import (
    ChannelMetadata,
    EV_TEXT,
    Envelope,
    EnvelopeRenderer,
    NameResolver,
    ViewPolicy,
    default_name_resolver,
)

class FromOneOnly(ViewPolicy):
    """Show only envelopes from a single named sender, prefixed with their name."""
    name: ClassVar[str] = "from_one_only"

    def __init__(self, sender_id: str) -> None:
        self.sender_id = sender_id

    async def project(
        self,
        wal: list[Envelope],
        *,
        participant_id: str,
        channel: ChannelMetadata,
        render_envelope: EnvelopeRenderer,
        name_for: NameResolver = default_name_resolver,
    ) -> list[BaseEvent]:
        out: list[BaseEvent] = []
        for env in wal:
            if env.event_type != EV_TEXT or env.sender_id != self.sender_id:
                continue
            text = render_envelope(env) or ""
            if env.sender_id == participant_id:
                out.append(ModelMessage(text))
            else:
                # Use name_for to label so the LLM knows who said it.
                label = name_for(env.sender_id)
                out.append(ModelRequest([TextInput(f"[{label}]: {text}")]))
        return out
```

The `BaseEvent` types you emit determine how the LLM sees the history: `ModelRequest` for messages "from the user," `ModelMessage` for messages "from the assistant," and so on. Look at `autogen.beta.events` for the full taxonomy.

## CompactionSummary

When `WindowedSummary` elides envelopes outside its window, it prepends a `CompactionSummary(text="...elided N turns")` event. This is from `autogen.beta.compact` - the LLM sees it as a system-supplied note that "there is earlier history I'm not showing you." This keeps the LLM's behaviour grounded in long-running discussions without exploding the token budget.

## Skills (Markdown Frontmatter)

Skills are how an agent describes itself to other agents - markdown-with-frontmatter that's parsed by the hub and surfaced to LLM tools during peer lookup. Pass at registration:

```python
agent_client = await hc.register(
    agent,
    Passport(name="researcher"),
    Resume(claimed_capabilities=["research"]),
    skill_md="""\
---
title: Research Assistant
expertise: [policy, finance]
---

# Researcher

A senior policy analyst. Best at:

- Scenario synthesis from multi-source briefs.
- Rebuttal review with confidence scores.

Limitations: not for code review or numerical analysis.
""",
)
```

The hub stores the markdown verbatim and parses the frontmatter via `parse_skill_frontmatter`:

```python
from autogen.beta.network import parse_skill_frontmatter, ParsedSkill

parsed: ParsedSkill = parse_skill_frontmatter(skill_md)
print(parsed.frontmatter)  # {"title": "Research Assistant", "expertise": [...]}
print(parsed.body)         # the markdown body
```
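
For intuition about what "frontmatter" and "body" mean here, the sketch below splits a skill document on its `---` delimiters using only the standard library. `split_frontmatter` is a hypothetical helper written for this illustration; it is not how `parse_skill_frontmatter` is implemented, and it deliberately leaves the frontmatter as raw text instead of parsing it into a dict.

```python
def split_frontmatter(skill_md: str) -> tuple:
    """Split a markdown document into (raw_frontmatter, body).

    Returns ("", skill_md) when there is no leading '---' block.
    """
    lines = skill_md.splitlines()
    if not lines or lines[0].strip() != "---":
        return "", skill_md
    for i in range(1, len(lines)):
        if lines[i].strip() == "---":
            front = "\n".join(lines[1:i])
            body = "\n".join(lines[i + 1:]).lstrip("\n")
            return front, body
    # Opening delimiter without a closing one: treat everything as body.
    return "", skill_md


doc = """\
---
title: Research Assistant
expertise: [policy, finance]
---

# Researcher

A senior policy analyst.
"""
front, body = split_frontmatter(doc)
print(front)  # the two key/value lines, still unparsed
print(body)   # markdown starting at "# Researcher"
```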

## Fallback Skills

When no `skill_md` is provided, the hub generates one from the resume so peer lookup doesn't return empty handles:

```python
from autogen.beta.network import render_fallback_skill

skill_md = render_fallback_skill(passport, resume)
```

Use this if you're constructing skills programmatically - for example, when a tenant uploads a resume but no markdown.

## Updating a Skill After Registration

```python
await hub.set_skill(agent_id, new_skill_md)
```

`set_skill` emits `AUDIT_KIND_SKILL_SET`, with the same audit shape as `set_resume`; tenant code can replace skills at any time.

## Picking a View

Some heuristics for choosing or building a view:

- **Short, focused exchanges** - `FullTranscript()`. Token budget isn't the bottleneck; coherence is.
- **Long-running discussions** - `WindowedSummary(recent_n=N)` with `N` tuned to your participant count and turn density.
- **Specialist agents that should ignore unrelated chatter** - a custom view that filters by audience or tags.
- **Privacy-sensitive workflows** - a custom view that strips fields or redacts before projection.

Switching the view doesn't affect the WAL - every envelope is still there, every operator can still inspect it. Only the LLM's perception of history is shaped by the view.

---

# Distributed Deployment

Source: https://docs.ag2.ai/latest/docs/beta/network/distributed/

The network is not limited to one process. Replace `LocalLink` with `WsLink` and the hub becomes a server that agents anywhere on the network can connect to over WebSocket - entirely over the wire, no shared memory, no in-process hub reference.

## Install

`WsLink` and `serve_ws` require the `network-ws` extra:

```bash
pip install "ag2[network-ws]"
```

## Architecture

```
        ┌─────────────── hub server ────────────────┐
        │  Hub + serve_ws  -  registry - WAL - auth │
        └───▲──────────────────────────▲────────────┘
            │  ws://hub:8765           │  ws://hub:8765
   ┌────────┴──────────┐     ┌─────────┴──────────┐
   │  process A        │     │  process B         │
   │  HubClient(WsLink)│     │  HubClient(WsLink) │
   │  alice            │     │  bob               │
   └───────────────────┘     └────────────────────┘
```

Every control-plane call (register, open channel, post envelope, WAL read) travels as a `RequestFrame` / `ResponseFrame` RPC over the WebSocket. Every inbound notify arrives as a `NotifyFrame` and is ack'd by a `ReceiptFrame`. The `HubClient` API is identical whether you pass a `LocalLink` or a `WsLink` - the transport is the only thing that changes.
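
The request/response pattern can be pictured with a small in-memory model. Everything in this sketch is an assumption made for illustration: the dict-shaped frames, their field names (`kind`, `id`, `method`, `params`, `result`), and the `RpcClient` class are not the real `RequestFrame` / `ResponseFrame` types or the `WsLink` internals. It only shows the general idea of matching each response to its pending request by id.

```python
import asyncio
import itertools


class RpcClient:
    """Toy request/response correlator over an in-memory 'wire'."""

    def __init__(self, send):
        self._send = send                # coroutine that delivers a frame to the peer
        self._ids = itertools.count(1)
        self._pending = {}               # request id -> Future awaiting the response

    async def call(self, method: str, **params):
        request_id = next(self._ids)
        future = asyncio.get_running_loop().create_future()
        self._pending[request_id] = future
        await self._send(
            {"kind": "request", "id": request_id, "method": method, "params": params}
        )
        return await future              # resolved by on_frame()

    def on_frame(self, frame: dict) -> None:
        """Route an inbound frame; only responses are modelled here."""
        if frame["kind"] == "response":
            self._pending.pop(frame["id"]).set_result(frame["result"])


async def demo():
    client = None

    async def fake_hub(frame):
        # Echo server standing in for the hub: answers every request.
        result = {"method": frame["method"], "echo": frame["params"]}
        client.on_frame({"kind": "response", "id": frame["id"], "result": result})

    client = RpcClient(fake_hub)
    return await asyncio.gather(
        client.call("register", name="alice"),
        client.call("open", target="bob"),
    )


first, second = asyncio.run(demo())
```

Because responses are matched by id, concurrent calls over one connection cannot be confused with each other, which is what lets a single client API sit on top of either transport.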

## Starting the Hub Server

```python
import asyncio
import contextlib

from autogen.beta.knowledge import MemoryKnowledgeStore
from autogen.beta.network import Hub, serve_ws

async def main() -> None:
    hub = await Hub.open(MemoryKnowledgeStore())
    async with serve_ws(hub, "0.0.0.0", 8765) as server:
        host, port = server.sockets[0].getsockname()[:2]
        print(f"listening on ws://{host}:{port}", flush=True)
        with contextlib.suppress(asyncio.CancelledError):
            await asyncio.Future()  # serve until interrupted
    await hub.close()

asyncio.run(main())
```

`serve_ws(hub, host, port)` is an async context manager. It binds a WebSocket server, hands each incoming connection its own `WsLinkEndpoint`, and lets the hub dispatch from there. Pass `port=0` to bind an ephemeral port and read the real one from `server.sockets[0].getsockname()[1]`.
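
The `port=0` idiom is ordinary socket behaviour, not something specific to `serve_ws`. The sketch below demonstrates it with a plain `asyncio` TCP server from the standard library (no WebSocket, no hub), reading the OS-assigned port with the same `sockets[0].getsockname()[1]` expression.

```python
import asyncio


async def bind_ephemeral() -> int:
    async def handle(reader, writer):
        writer.close()

    # port=0 asks the operating system for any free port.
    server = await asyncio.start_server(handle, "127.0.0.1", 0)
    async with server:
        port = server.sockets[0].getsockname()[1]
    return port


port = asyncio.run(bind_ephemeral())
print(f"OS assigned port {port}")
```

This is handy in tests, where a fixed port such as `8765` may already be taken by another run.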

## Connecting a Remote Agent

From any other process - same machine, different container, or across a real network:

```python
import asyncio

from autogen.beta import Agent
from autogen.beta.config import AnthropicConfig
from autogen.beta.network import HubClient, Passport, Resume, WsLink

async def main() -> None:
    hub_client = HubClient(WsLink("ws://hub-host:8765"))
    await hub_client.open()  # WebSocket connect + handshake

    agent = Agent("bob", prompt="Answer in one short sentence.", config=AnthropicConfig(model="claude-haiku-4-5"))
    bob = await hub_client.register(agent, Passport(name="bob"), Resume())

    print(f"registered as {bob.agent_id}; awaiting channels")
    await asyncio.Future()  # stay connected; default handler answers inbound consults

asyncio.run(main())
```

`HubClient(WsLink(url))` puts the client in *remote mode*: every `register`, `open`, `send`, and `read_wal` call is an RPC round-trip to the hub. The `AgentClient` surface (`bob.open(...)`, `bob.wait_for_channel_event(...)`, `channel.send(...)`) is identical to the in-process API.

Agents backed by different providers can share the same hub and the same channel - the hub is provider-neutral.

## Sending a Cross-Process Consult

```python
import asyncio

from autogen.beta import Agent
from autogen.beta.config import OpenAIConfig
from autogen.beta.network import EV_TEXT, HubClient, Passport, Resume, WsLink
from autogen.beta.network.adapters.consulting import CONSULTING_TYPE

async def main() -> None:
    hub_client = HubClient(WsLink("ws://hub-host:8765"))
    await hub_client.open()

    alice = await hub_client.register(
        Agent("alice", prompt="You are a coordinator.", config=OpenAIConfig(model="gpt-4o-mini")),
        Passport(name="alice"),
        Resume(),
    )

    # open() and send() travel to the hub over the wire;
    # the hub invites bob over bob's own WebSocket connection.
    channel = await alice.open(type=CONSULTING_TYPE, target=["bob"])
    await channel.send("What is 12 times 11? Reply with just the integer.")

    reply = await alice.wait_for_channel_event(
        channel_id=channel.channel_id,
        predicate=lambda e: e.event_type == EV_TEXT and e.sender_id != alice.agent_id,
        timeout=90.0,
    )
    print(reply.event_data["text"])  # 132
    await hub_client.close()

asyncio.run(main())
```

The channel invite travels hub -> bob over bob's WebSocket. Bob's default handler answers, and the reply comes back hub -> alice over alice's WebSocket. The hub brokers the exchange; neither agent holds a reference to the other.

All four channel adapters work cross-process without change: `consulting`, `conversation`, `discussion`, and `workflow`.

## Authentication

For deployments where agents must authenticate at the WebSocket handshake, build the hub with an `AuthRegistry`:

```python
from autogen.beta.network import ApiKeyAuth, AuthRegistry, Hub

auth = AuthRegistry([ApiKeyAuth(keys={"token-alice", "token-bob"})])
hub = await Hub.open(store, auth=auth)
```

Each agent passes its token inside `Passport.auth`:

```python
from autogen.beta.network import AuthBlock, Passport

passport = Passport(
    name="bob",
    auth=AuthBlock(scheme="api_key", claim={"token": "token-bob"}),
)
bob = await hub_client.register(agent, passport, Resume())
```

The hub's `AuthRegistry` validates the claim before binding the connection. To support a scheme of your own, implement the `AuthAdapter` `Protocol` and raise `AuthError` from it to reject a claim.
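
As a rough picture of what validating an API-key claim involves, here is a standalone sketch. `check_api_key` and `RejectedClaim` are hypothetical names for this illustration; the real `AuthAdapter` method signature and `AuthError` class are not shown in this page, so the sketch does not try to reproduce them.

```python
import hmac


class RejectedClaim(Exception):
    """Hypothetical error type for this sketch (stands in for AuthError)."""


def check_api_key(claim: dict, valid_keys: set) -> str:
    """Return the presented token if it matches a known key, else raise."""
    token = claim.get("token")
    if not isinstance(token, str):
        raise RejectedClaim("claim has no 'token' string")
    # compare_digest avoids leaking how many leading characters matched.
    matches = [hmac.compare_digest(token.encode(), key.encode()) for key in valid_keys]
    if any(matches):
        return token
    raise RejectedClaim("unknown API key")


accepted = check_api_key({"token": "token-bob"}, {"token-alice", "token-bob"})
```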

## At-Least-Once Delivery

The hub guarantees each envelope is delivered at least once across reconnects:

- Each `AgentClient` maintains an inbox cursor per channel - the `envelope_id` of the last successfully processed envelope.
- Inbound envelopes are ack'd with `ReceiptFrame`; a nack causes immediate replay.
- On reconnect, pass `since_envelope_id` to replay any unacked envelopes from that point:

```python
hub_client = HubClient(WsLink(url))
await hub_client.open()
bob = await hub_client.attach(agent, name="bob", since_envelope_id=last_acked_id)
await hub_client.resume_pending_turns(bob)
```

`attach` re-binds an existing identity to a fresh connection. `resume_pending_turns` re-fires any turns the protocol still expects from this agent. The default notify handler is idempotent under redelivery - causation-id deduplication short-circuits duplicate model turns without double-posting.
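
Why redelivery is harmless can be seen in a toy model. The `Inbox` class, the `replay` function, the integer envelope ids, and the tuple layout below are all assumptions for illustration; the real cursor, `envelope_id` format, and deduplication logic live inside the hub and `AgentClient` and may look quite different.

```python
class Inbox:
    """Toy consumer: at-least-once delivery in, each turn handled once."""

    def __init__(self):
        self.cursor = None   # id of the last envelope processed
        self.seen = set()    # causation ids already answered
        self.handled = []

    def deliver(self, envelope_id: int, causation_id: str, text: str) -> bool:
        """Process one delivery; return True to ack, even for duplicates."""
        if causation_id not in self.seen:
            self.seen.add(causation_id)
            self.handled.append(text)  # the "model turn" happens only once
        self.cursor = envelope_id
        return True


def replay(log, since_envelope_id):
    """Hub side: everything after the cursor, in order."""
    return [e for e in log if since_envelope_id is None or e[0] > since_envelope_id]


log = [(1, "c1", "hi"), (2, "c2", "status?"), (3, "c3", "bye")]
inbox = Inbox()
inbox.deliver(*log[0])
inbox.deliver(*log[1])  # handled, but suppose the ack for 2 never reached the hub
last_acked_id = 1

for envelope in replay(log, last_acked_id):  # the hub re-sends 2 and 3
    inbox.deliver(*envelope)
```

Envelope 2 arrives twice, yet `inbox.handled` contains each message once, because the second delivery is recognised by its causation id.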

## Task Durability

Tasks can be checkpointed through the hub so state survives a process restart:

```python
from autogen.beta.network import HubBackedCheckpointStore

checkpoint_store = HubBackedCheckpointStore(hub_client)

# inside a tool or agent turn:
await task.checkpoint({"step": 3, "partial_result": "..."})

# on another node, after a restart:
recovered = await agent.resume_from(task_id, checkpoint_store)
```

`HubBackedCheckpointStore` satisfies the `CheckpointStore` Protocol by delegating writes and reads to the hub's `KnowledgeStore`. Pass a `Hub` for in-process durability or a `HubClient` for cross-process. For checkpoints that survive a hub restart, use `DiskKnowledgeStore(path)` on the hub.
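
The durability idea, stripped of the hub, is "write the latest state somewhere that outlives the process, then read it back from a fresh instance". The sketch below shows that with JSON files. `CheckpointStoreLike` and `JsonFileCheckpoints` are hypothetical; the method names of the real `CheckpointStore` Protocol are not given in this page, so `save` and `load` are assumptions.

```python
import json
import tempfile
from pathlib import Path
from typing import Optional, Protocol


class CheckpointStoreLike(Protocol):
    """Hypothetical shape for this sketch; the real CheckpointStore may differ."""

    def save(self, task_id: str, state: dict) -> None: ...
    def load(self, task_id: str) -> Optional[dict]: ...


class JsonFileCheckpoints:
    """Keeps the latest checkpoint per task as one JSON file."""

    def __init__(self, root: Path) -> None:
        self.root = root
        self.root.mkdir(parents=True, exist_ok=True)

    def save(self, task_id: str, state: dict) -> None:
        target = self.root / f"{task_id}.json"
        tmp = target.with_suffix(".tmp")
        tmp.write_text(json.dumps(state))
        tmp.replace(target)  # rename over the old file so readers see old or new, not half

    def load(self, task_id: str) -> Optional[dict]:
        target = self.root / f"{task_id}.json"
        return json.loads(target.read_text()) if target.exists() else None


with tempfile.TemporaryDirectory() as tmpdir:
    JsonFileCheckpoints(Path(tmpdir)).save("task-1", {"step": 3, "partial_result": "..."})
    # A fresh instance models "another process after a restart".
    recovered = JsonFileCheckpoints(Path(tmpdir)).load("task-1")
    missing = JsonFileCheckpoints(Path(tmpdir)).load("task-2")
```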

## Production Notes

| Concern | Recommendation |
|---|---|
| TLS | Pass an `ssl_context` to `serve_ws(...)` and use `wss://` in `WsLink`. |
| Auth | Build the hub with `AuthRegistry([ApiKeyAuth(keys=...)])` and pass `Passport(auth=AuthBlock(...))` from each agent. |
| Durability | Use `DiskKnowledgeStore(path)` on the hub so the registry and channel WALs survive a restart. |
| Reconnect | On disconnect, build a fresh `HubClient`, call `open()`, then `attach(agent, name=..., since_envelope_id=last_id)`. The hub replays any unacked envelopes past that cursor. |
| Federation | Register a `RemoteAgentProxy` on the hub to route envelopes addressed to agents with `kind="remote_agent"` across hub boundaries. |

## Where to Next

- [Hub & Identity](hub_and_identity.md) - `AuthRegistry`, `ApiKeyAuth`, governance rules.
- [Agent Clients](agent_clients.md) - custom envelope handlers, replacing the default handler.
- [Governance, Audit & Observability](expectations_and_audit.md) - per-channel expectations and the hub audit log.

---

# Agent Tools

Source: https://docs.ag2.ai/latest/docs/beta/tools/tools/

Tools allow agents to interact with the outside world. By providing tools, you enable your agents to perform actions such as executing code, fetching data from APIs, querying databases, or performing complex calculations.

Under the hood, a tool is a standard Python function accompanied by a schema that describes its purpose, inputs, and outputs to the underlying Large Language Model (LLM).
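
As a rough illustration of "a function accompanied by a schema", the sketch below derives a small schema-like dict from a function's signature and docstring using only the standard library. `sketch_schema` is a hypothetical helper, not the framework's generator, and it handles only a few primitive types.

```python
import inspect
from typing import get_type_hints

_JSON_TYPES = {str: "string", int: "integer", float: "number", bool: "boolean"}


def sketch_schema(func) -> dict:
    """Build a small JSON-schema-like dict from a function's signature."""
    hints = get_type_hints(func)
    properties, required = {}, []
    for name, param in inspect.signature(func).parameters.items():
        properties[name] = {"type": _JSON_TYPES.get(hints.get(name), "string")}
        if param.default is inspect.Parameter.empty:
            required.append(name)  # no default means the caller must supply it
    return {
        "name": func.__name__,
        "description": inspect.getdoc(func) or "",
        "parameters": {"type": "object", "properties": properties, "required": required},
    }


def calculate_shipping_cost(destination: str, weight_kg: float, express: bool = False) -> str:
    """Calculates the shipping cost for a package."""
    return "$15.00"


schema = sketch_schema(calculate_shipping_cost)
```

The function name becomes the tool name, the docstring becomes the description, and parameters without defaults end up in `required`, which is why clear type hints and docstrings matter so much.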

## Creating Agent Tools

The easiest way to create a tool is by using the `@tool` decorator. This decorator automatically parses your function's signature, type hints, and docstring to generate a schema that the LLM can understand.

For the best results, **always provide clear type hints and a descriptive docstring**. The LLM relies heavily on these to know when and how to invoke your tool.

```python
from autogen.beta import tool

@tool
def calculate_shipping_cost(destination: str, weight_kg: float) -> str:
    """Calculates the shipping cost for a package based
    on its destination and weight.
    """
    return "$15.00"
```

Once defined, you can equip an agent with this capability by passing the tool to the `tools` list during the agent's initialization.

```python
from autogen.beta import Agent

agent = Agent(name="ShippingAssistant", tools=[calculate_shipping_cost])
```

!!! note
    For simpler use cases, you can pass an undecorated Python function directly to the agent's `tools` list. The framework will automatically convert it into a fully-fledged tool under the hood, extracting the schema from the signature and docstring just like the decorator would.

    ```python
    from autogen.beta import Agent

    def get_weather(location: str) -> str:
        """Returns the current weather for a given location."""
        return "Sunny, 22°C"

    # get_weather is automatically converted to a tool
    agent = Agent(name="WeatherBot", tools=[get_weather])
    ```

### Registering Tools via a Decorator

Alternatively, you can register a tool directly with an agent instance using its `@my_agent.tool` decorator.

This approach is particularly useful when you need to dynamically add capabilities to an agent after it has been created, or when you are logically organizing your code by attaching specific tools to specific agent instances.

```python
from autogen.beta import Agent

agent = Agent(name="CalculatorBot")

@agent.tool
def multiply(a: int, b: int) -> int:
    """Multiplies two integers and returns the result."""
    return a * b
```

### Tool middleware

To run async hooks around **one** tool (e.g. argument normalization, result redaction, bundled auditing), pass `middleware=[...]` with `ToolMiddleware` callables. See the dedicated [Tool middleware](tool_middleware.md) page.

## Synchronous and Asynchronous Tools

Agent interactions are naturally asynchronous. To support this, tools are executed within an asynchronous event loop by default. However, you have the flexibility to define your tool functions as either synchronous (`def`) or asynchronous (`async def`).

To ensure that heavy computational tasks or blocking I/O operations do not freeze the entire application, **synchronous tools are automatically executed in a separate thread** by default.

```python
from autogen.beta import tool

# This synchronous tool runs in a separate thread to prevent blocking
@tool
def fetch_data_sync(url: str) -> str:
    """Fetches data from a URL using a blocking request library."""
    import requests
    return requests.get(url).text

# This native asynchronous tool runs directly in the main event loop
@tool
async def fetch_data_async(url: str) -> str:
    """Fetches data from a URL using an async request library."""
    import aiohttp
    async with aiohttp.ClientSession() as session:
        async with session.get(url) as response:
            return await response.text()
```
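
The effect of offloading a blocking call can be observed without the framework. The sketch below uses the standard library's `asyncio.to_thread` as a stand-in; the page does not say which mechanism the framework uses internally, so treat this as an analogy. A heartbeat task counts how often the event loop gets to run while a blocking function is in flight.

```python
import asyncio
import time


def blocking_fetch() -> str:
    time.sleep(0.2)  # stands in for a blocking network call
    return "payload"


async def heartbeat(ticks: list) -> None:
    while True:
        ticks.append(time.monotonic())
        await asyncio.sleep(0.02)


async def run(offload: bool) -> int:
    """Return how many heartbeats fired while the blocking call was in flight."""
    ticks: list = []
    beat = asyncio.create_task(heartbeat(ticks))
    await asyncio.sleep(0)  # let the heartbeat start
    before = len(ticks)
    if offload:
        await asyncio.to_thread(blocking_fetch)  # the loop stays free
    else:
        blocking_fetch()                         # the loop is frozen for 0.2 s
    during = len(ticks) - before
    beat.cancel()
    return during


ticks_offloaded = asyncio.run(run(offload=True))
ticks_inline = asyncio.run(run(offload=False))
print(ticks_offloaded, ticks_inline)
```

With offloading, the heartbeat keeps firing during the call; when the function runs inline, it records no ticks at all until the call returns.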

### Disabling Threaded Execution

If you have a synchronous function that executes very quickly (e.g., simple string manipulation or math) and you want to avoid the minor overhead of thread creation, you can disable threaded execution by passing `sync_to_thread=False` to the `@tool` decorator.

```python
# This tool runs synchronously in the main event loop,
# without a separate thread
@tool(sync_to_thread=False)
def format_name(first_name: str, last_name: str) -> str:
    """Formats a full name."""
    return f"{last_name.upper()}, {first_name.capitalize()}"
```

!!! warning
    When `sync_to_thread=False` is set, the synchronous tool runs directly within the asynchronous context. If the function performs time-consuming operations (like network requests or large loops), **it will block the entire event loop** until it finishes, preventing other agents or asynchronous tasks from making progress.

## Customizing Tool Schemas

LLMs perform best when they have precise constraints and detailed instructions. The framework automatically generates a tool's schema from its function signature and docstring. For more granular control, you can define minimum/maximum values, enforce specific formats, or override the tool's name.

You can override the basic properties directly in the decorator:

```python
@tool(
    name="custom_math_tool",
    description="Performs advanced mathematical operations.",
)
def math_op(a: int, b: int) -> int:
    return a + b
```

!!! note
    Explicitly setting the name and description overrides the automatically generated values.

### Deep Schema Customization with Pydantic

Under the hood, arguments are serialized and validated using [Pydantic](https://docs.pydantic.dev/latest/) schemas. This means you can use standard `pydantic.Field` annotations to deeply customize individual schema parameters, providing the LLM with strict guidelines on what values are acceptable.

```python
from typing import Annotated
from pydantic import Field
from autogen.beta import tool

@tool
def set_temperature(
    temp: Annotated[
        int,
        Field(
            ...,
            description="The target temperature.",
            ge=10,
            le=30,
        ),
    ],
    mode: Annotated[
        str,
        Field(
            ...,
            description="The thermostat mode.",
            pattern="^(heat|cool|auto)$",
        ),
    ]
) -> str:
    """Sets the thermostat to a specific temperature and mode."""
    return f"Set to {temp}°C in {mode} mode."
```

### Complete Custom Schema Example

A complete example combining custom tool properties and strict parameter validation looks like this:

```python
from typing import Annotated
from pydantic import Field
from autogen.beta import tool

@tool(
    name="create_user_profile",
    description="Creates a new user profile in the database.",
)
def create_profile(
    username: Annotated[
        str,
        Field(
            ...,
            description="The chosen username. Must be alphanumeric.",
            min_length=3,
            max_length=20,
        )
    ],
    age: Annotated[
        int,
        Field(
            ...,
            description="The user's age. Must be 18 or older.",
            ge=18,
        ),
    ],
) -> str:
    return f"Profile for {username} created."
```

This configuration generates the following detailed JSON schema, ensuring the LLM understands exactly what inputs are required and valid:

```json
{
    "description": "Creates a new user profile in the database.",
    "name": "create_user_profile",
    "parameters": {
        "properties": {
            "username": {
                "description": "The chosen username. Must be alphanumeric.",
                "maxLength": 20,
                "minLength": 3,
                "title": "Username",
                "type": "string"
            },
            "age": {
                "description": "The user's age. Must be 18 or older.",
                "minimum": 18,
                "title": "Age",
                "type": "integer"
            }
        },
        "required": [
            "username",
            "age"
        ],
        "type": "object"
    }
}
```

## Execution Context

Tools often need access to the broader execution context, such as injected dependencies, variables, or mechanisms for human-in-the-loop interactions. The AG2 framework supports these features natively.

!!! note
    Under the hood, these contextual capabilities are powered by the [FastDepends](https://github.com/Lancetnik/FastDepends) library, ensuring robust and FastAPI-like dependency management.

To access the execution context from within your tool, you can simply type-hint an argument with the `Context` object.

```python
from autogen.beta import Context, tool

@tool
async def my_tool(context: Context) -> str:
    # Access context variables or dependencies here
    return f"Execution context: {context}"
```

For more detailed information on specific context features, see [Dependency Injection](../context/inject.md), [Context Variables](../context/variables.md), [Depends](../depends.md), and [Human-in-the-loop](../context/human_in_the_loop.md).

## Returning Rich Tool Results

Returning a plain `str` from a tool is the simplest option - the framework wraps it in a `TextInput` automatically. When you need more control over the returned content, you can return a typed `Input` class or compose multiple outputs with `ToolResult`.

### Input Types

The framework provides typed input classes for text, structured data, images, audio, video, documents, and raw binary payloads. See [Multimodal Inputs](../multimodal/inputs.md) for the full reference including provider support and factory variants.

Return any input type directly from a tool function just like you would a string:

```python
from autogen.beta import DataInput, TextInput, tool

@tool
def get_status(task_id: str) -> TextInput:
    """Returns a human-readable status update."""
    return TextInput(f"Task {task_id} is in progress.")

@tool
def get_user_profile(user_id: str) -> DataInput:
    """Returns a structured user profile."""
    return DataInput({"id": user_id, "name": "Alice", "role": "admin"})
```

### Returning Images and Binary Data

Use `ImageInput` when a tool needs to hand an image back to the model for further reasoning. The factory accepts a URL, a local file path, a pre-uploaded file ID, or raw bytes:

```python
from autogen.beta import ImageInput, tool

@tool
def fetch_chart(chart_id: str) -> ImageInput:
    """Fetches a chart image by ID and returns it for visual analysis."""
    url = f"https://charts.example.com/{chart_id}.png"
    return ImageInput(url)

@tool
def capture_screenshot(page: str) -> ImageInput:
    """Takes a screenshot of a page and returns it."""
    import subprocess
    raw = subprocess.check_output(["screenshot-cli", page])
    return ImageInput(data=raw, media_type="image/png")
```

When you have raw bytes of an arbitrary format, use `BinaryInput` directly and set the media type explicitly:

```python
from autogen.beta import BinaryInput, ToolResult, tool

@tool
def export_pdf(report_id: str) -> ToolResult:
    """Exports a report as a PDF and returns it alongside a summary."""
    # _render_pdf is a placeholder for your own renderer; it should return bytes.
    pdf_bytes = _render_pdf(report_id)
    return ToolResult(
        f"Report {report_id} exported successfully.",
        BinaryInput(pdf_bytes, media_type="application/pdf"),
    )
```

!!! note
    Not all LLM providers support every media type. Check your provider's documentation for the list of accepted MIME types and file formats. The framework passes the binary payload through as-is; format validation is the provider's responsibility.

### Returning Multiple Inputs

Use `ToolResult` to combine multiple inputs into a single tool response. Each positional argument becomes a separate part the model receives:

```python
from autogen.beta import ImageInput, ToolResult, tool

@tool
def analyze_product(product_id: str) -> ToolResult:
    """Returns a product image alongside its structured metadata."""
    return ToolResult(
        ImageInput(f"https://cdn.example.com/products/{product_id}.jpg"),
        {"id": product_id, "name": "Widget Pro", "stock": 42},
    )
```

## Returning a Final Tool Result

By default, a tool result is sent back to the model so the agent can decide what to say next.
When the tool itself already knows the exact final answer, you can return `ToolResult(..., final=True)` to end the turn immediately without another model round-trip.

`ToolResult` accepts any `str` or `Input` as its first positional argument:

```python
from autogen.beta import Agent, DataInput, TextInput, ToolResult, tool

@tool
def handoff_to_human(ticket_id: str) -> ToolResult:
    """Escalates a request and returns the final user-facing message."""
    return ToolResult(
        f"Ticket {ticket_id} was escalated to a human agent.",
        final=True,
    )

@tool
def get_exchange_rate(currency: str) -> ToolResult:
    """Returns the current exchange rate as structured data."""
    return ToolResult(
        DataInput({"currency": currency, "rate": 1.23, "base": "USD"}),
        final=True,
    )

agent = Agent(name="SupportBot", tools=[handoff_to_human])

reply = await agent.ask("I need help with my ticket #123456.")
print(reply.body)
# Output: "Ticket 123456 was escalated to a human agent."
```

!!! note
    A `ToolResult` with `final=True` must contain **exactly one** part - either a `TextInput` or a `DataInput`. The content is returned as-is: `TextInput` produces the text directly, `DataInput` is JSON-serialized.

This is especially useful for tools that:

- perform an authoritative action and already know the exact reply
- return a message that should not be paraphrased by the model
- want to skip an extra LLM call for latency or cost reasons
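
The control flow behind `final=True` can be sketched with a simplified turn loop. `Result`, `run_turn`, and `paraphrase` are stand-ins invented for this illustration; the real agent loop and `ToolResult` class are more involved. The sketch only shows that a final result skips the second model call and is returned verbatim.

```python
from dataclasses import dataclass


@dataclass
class Result:
    """Stand-in for a tool result; NOT the real ToolResult class."""
    text: str
    final: bool = False


def run_turn(tool, model, user_message: str):
    """Return (reply, number_of_model_calls) for one simplified turn."""
    calls = 1                      # first model call decides to invoke the tool
    result = tool(user_message)
    if result.final:
        return result.text, calls  # tool output becomes the reply verbatim
    calls += 1                     # second model call phrases the answer
    return model(result.text), calls


def paraphrase(text: str) -> str:
    """Pretend model that rewrites whatever the tool returned."""
    return f"Good news: {text.lower()}"


final_reply = run_turn(
    lambda msg: Result("Ticket 123456 was escalated.", final=True), paraphrase, "help"
)
normal_reply = run_turn(
    lambda msg: Result("Ticket 123456 was escalated."), paraphrase, "help"
)
```

In this model the final path costs one model call and preserves the exact wording, while the default path costs two and lets the model rephrase.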

---

# Toolkits

Source: https://docs.ag2.ai/latest/docs/beta/tools/toolkits/

A `Toolkit` groups related tools into a single, reusable unit. Instead of passing individual tools one by one, you can bundle them into a toolkit and pass the whole collection to an agent. This is useful for organizing domain-specific capabilities (e.g., all database tools, all file-system tools) and sharing them across multiple agents.

```python
from autogen.beta import Agent
from autogen.beta.tools import Toolkit

def search_orders(query: str) -> str:
    """Searches the order database."""
    return "Order #123"

def cancel_order(order_id: str) -> str:
    """Cancels an order by its ID."""
    return f"Order {order_id} cancelled."

support_tools = Toolkit(search_orders, cancel_order)

agent = Agent(name="SupportBot", tools=[support_tools])
```

The `Toolkit` constructor accepts both plain functions and `@tool`-decorated functions. Plain functions are automatically converted, just like when passing them directly to an agent.

## Registering Tools via a Decorator

You can also add tools to a toolkit using the `@toolkit.tool` decorator, following the same pattern as `@agent.tool`:

```python
from autogen.beta.tools import Toolkit

inventory = Toolkit()

@inventory.tool
def check_stock(item_id: str) -> int:
    """Returns the current stock count for an item."""
    return 42

@inventory.tool(
    name="reorder_item",
    description="Places a reorder for a low-stock item.",
)
def reorder(item_id: str, quantity: int) -> str:
    return f"Reordered {quantity} of {item_id}."
```

The toolkit can then be passed to any number of agents:

```python
from autogen.beta import Agent

warehouse_agent = Agent(name="WarehouseBot", tools=[inventory])
sales_agent = Agent(name="SalesBot", tools=[inventory])
```

## Combining Toolkits with Standalone Tools

You can freely mix toolkits and individual tools in an agent's `tools` list:

```python
from autogen.beta import Agent, tool

@tool
def escalate(reason: str) -> str:
    """Escalates the conversation to a human agent."""
    return "Escalated."

agent = Agent(
    name="SupportBot",
    tools=[support_tools, inventory, escalate],
)
```

## Toolkit middleware

Pass `middleware=[...]` to the `Toolkit` constructor to apply hooks to **every** tool in the set - both tools passed to the constructor and tools added later via `@toolkit.tool`:

```python
from autogen.beta import Context
from autogen.beta.events import ToolCallEvent, ToolResultEvent
from autogen.beta.middleware import ToolExecution
from autogen.beta.tools import Toolkit

async def log_calls(
    call_next: ToolExecution, event: ToolCallEvent, context: Context,
) -> ToolResultEvent:
    print(f"Calling {event.name}")
    return await call_next(event, context)

support_tools = Toolkit(search_orders, cancel_order, middleware=[log_calls])
```

Toolkit middleware is the **outermost** layer: it runs before any per-tool middleware defined with `@tool(middleware=[...])`.

### Per-tool middleware

`@toolkit.tool` also accepts the same `middleware=[...]` option as `@tool` and `@agent.tool`. These per-tool hooks run **inside** the toolkit-level middleware. See [Tool middleware](tool_middleware.md).
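
The outer/inner ordering is the usual "onion" composition. The sketch below reproduces it with plain coroutines; `tag` and `compose` are hypothetical helpers, and the two-argument `(call_next, event)` middleware signature is simplified from the three-argument one shown above (no `Context`).

```python
import asyncio
import functools


def tag(label: str, trace: list):
    """Build a middleware that records when it is entered and exited."""
    async def middleware(call_next, event):
        trace.append(f"{label}:in")
        result = await call_next(event)
        trace.append(f"{label}:out")
        return result
    return middleware


def compose(middlewares, handler):
    """Wrap handler so that middlewares[0] ends up outermost."""
    for mw in reversed(middlewares):
        handler = functools.partial(mw, handler)  # bind call_next to the inner layer
    return handler


async def demo() -> list:
    trace: list = []

    async def tool_body(event):
        trace.append("tool")
        return event.upper()

    toolkit_level = [tag("toolkit", trace)]
    per_tool = [tag("per_tool", trace)]
    wrapped = compose(toolkit_level + per_tool, tool_body)
    await wrapped("ping")
    return trace


trace = asyncio.run(demo())
print(trace)
```

The toolkit-level hook is entered first and exits last, so it sees both the original call and the final result after any per-tool hooks have run.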

---

# Common Tools

Source: https://docs.ag2.ai/latest/docs/beta/tools/common_toolkits/

AG2 ships with ready-made tools and toolkits that bundle related function tools into a single `Toolkit`. Unlike [built-in provider tools](builtin_tools.md), these run locally as regular Python functions and work with **every** provider.

## FilesystemToolkit

`FilesystemToolkit` gives an agent the ability to read, write, update, delete, and search files within a sandboxed directory. All paths are resolved relative to a configurable `base_path`, and a path-traversal guard prevents access outside it.

```python
from autogen.beta import Agent
from autogen.beta.config import AnthropicConfig
from autogen.beta.tools import FilesystemToolkit

fs = FilesystemToolkit(base_path="/tmp/workspace")

agent = Agent(
    "assistant",
    config=AnthropicConfig(model="claude-sonnet-4-6"),
    tools=[fs],
)
```

### Available tools

| Tool | Description |
| :--- | :--- |
| `read_file` | Read the contents of a file |
| `write_file` | Create or overwrite a file (creates parent directories automatically) |
| `update_file` | Replace the first occurrence of a string in a file |
| `delete_file` | Delete a file |
| `find_files` | Search for files matching a glob pattern (supports recursive `**` patterns) |

### Read-only mode

Pass `read_only=True` to expose only `read_file` and `find_files`:

```python
fs = FilesystemToolkit(base_path="./docs", read_only=True)
```

### Using individual tools

Every tool is also exposed individually on the toolkit instance, so you can pass selected tools to an agent instead of the whole set:

```python
fs = FilesystemToolkit(base_path="/tmp/workspace")

agent = Agent(
    "reader",
    config=AnthropicConfig(model="claude-sonnet-4-6"),
    tools=[fs.read_file(), fs.find_files()],
)
```

### Using a temporary directory

For throwaway workspaces, use `tempfile.TemporaryDirectory` so the directory and all its contents are automatically cleaned up when the context manager exits:

```python
import asyncio
import tempfile
from autogen.beta import Agent
from autogen.beta.config import AnthropicConfig
from autogen.beta.tools import FilesystemToolkit

async def main() -> None:
    with tempfile.TemporaryDirectory() as tmpdir:
        fs = FilesystemToolkit(base_path=tmpdir)

        agent = Agent(
            "assistant",
            config=AnthropicConfig(model="claude-sonnet-4-6"),
            tools=[fs],
        )
        await agent.ask("Create a hello.py file that prints 'Hello, World!'")

asyncio.run(main())
```
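
The cleanup guarantee comes from the standard library rather than from the toolkit, so it can be checked without an agent or an API key. A small sketch (the `scratch_roundtrip` helper and the file it writes are illustrative):

```python
import tempfile
from pathlib import Path


def scratch_roundtrip() -> tuple[bool, bool]:
    """Return (directory existed inside the block, directory exists afterwards)."""
    with tempfile.TemporaryDirectory() as tmpdir:
        workspace = Path(tmpdir)
        (workspace / "hello.py").write_text("print('Hello, World!')\n")
        existed_inside = workspace.is_dir()
    # Leaving the `with` block removes the directory and everything in it.
    return existed_inside, workspace.exists()
```

Anything the agent wrote is gone once the block exits, so copy out any files you want to keep before then.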

!!! tip
    Prefer `tempfile.TemporaryDirectory` over hardcoded `/tmp` paths. It guarantees a unique directory per run and cleans up after itself, avoiding leftover files and collisions between concurrent executions.

### Path safety

All paths are resolved relative to `base_path`. Any attempt to escape the base directory (e.g. `../../etc/passwd`) raises a `PermissionError`:

```python
fs = FilesystemToolkit(base_path="/tmp/sandbox")

# The agent can access /tmp/sandbox/data.txt
# but NOT /tmp/sandbox/../../etc/passwd
```
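
As a rough illustration of what such a guard involves, a candidate path can be resolved and then checked against the resolved base. This is a standard-library sketch, not `FilesystemToolkit`'s actual implementation, and `resolve_inside` is a hypothetical helper:

```python
from pathlib import Path


def resolve_inside(base_path: str, user_path: str) -> Path:
    """Resolve user_path under base_path, rejecting anything that escapes it."""
    base = Path(base_path).resolve()
    # Joining with an absolute user_path replaces the base entirely, so the
    # containment check below also catches inputs such as "/etc/passwd".
    candidate = (base / user_path).resolve()
    if not candidate.is_relative_to(base):  # Python 3.9+
        raise PermissionError(f"{user_path!r} resolves outside {base}")
    return candidate
```

Resolving before checking is the important part: a plain string-prefix test on the unresolved path would accept `/tmp/sandbox/../../etc/passwd`.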

---

## DuckDuckSearchTool

`DuckDuckSearchTool` gives an agent the ability to search the web using DuckDuckGo. No API key is required.

!!! note
    Requires the `ddgs` extra: `pip install ag2[ddgs]`

```python
from autogen.beta import Agent
from autogen.beta.config import AnthropicConfig
from autogen.beta.tools import DuckDuckSearchTool

agent = Agent(
    "researcher",
    config=AnthropicConfig(model="claude-sonnet-4-6"),
    tools=[DuckDuckSearchTool()],
)
```

### Configuration

```python
tool = DuckDuckSearchTool(
    max_results=10,       # default: 5
    region="uk-en",       # default: "us-en"
    safesearch="on",      # default: "moderate" - options: "on", "moderate", "off"
)
```

All parameters accept a `Variable` for dynamic values resolved at execution time.

---

## PerplexitySearchToolkit

`PerplexitySearchToolkit` gives an agent two related tools powered by [Perplexity](https://www.perplexity.ai): raw web search via the Search API, and LLM-grounded answers with citations via Sonar. Both tools share a single client.

!!! note
    Requires the `perplexity` extra and an API key: `pip install ag2[perplexity]`

```python
import os
from autogen.beta import Agent
from autogen.beta.config import AnthropicConfig
from autogen.beta.tools import PerplexitySearchToolkit

agent = Agent(
    "researcher",
    config=AnthropicConfig(model="claude-sonnet-4-6"),
    tools=[PerplexitySearchToolkit(api_key=os.environ["PERPLEXITY_API_KEY"])],
)
```

If `api_key` is omitted, the Perplexity SDK reads the `PERPLEXITY_API_KEY` environment variable automatically.

### Tools

| Tool | Description |
| :--- | :--- |
| `perplexity_search` | Raw web search via the [Search API](https://docs.perplexity.ai/docs/search/quickstart) - ranked title/url/snippet/date results, no LLM hop |
| `perplexity_answer` | LLM-generated answer with citations via [Sonar Chat Completions](https://docs.perplexity.ai/docs/sonar/openai-compatibility) - also returns search results, citations, and optional images |

### Picking a subset of tools

Each tool is exposed as a factory method on the toolkit (`toolkit.search()`, `toolkit.answer()`). Call the method to get a ready-to-use tool, then pass only the ones you need to the agent:

```python
toolkit = PerplexitySearchToolkit(api_key=...)

agent = Agent(
    "researcher",
    config=config,
    tools=[toolkit.search()],
)
```

### Per-tool configuration

Per-call parameters live on the factory methods, not on the toolkit itself:

```python
toolkit = PerplexitySearchToolkit(api_key=...)

search_tool = toolkit.search(
    max_results=10,
    max_tokens_per_page=512,
    search_domain_filter=["arxiv.org", "-medium.com"],  # prefix '-' to exclude
    search_recency_filter="week",                       # "hour" | "day" | "week" | "month" | "year"
    search_after_date_filter="1/1/2025",                # MM/DD/YYYY
    search_before_date_filter="12/31/2025",
)

answer_tool = toolkit.answer(
    model="sonar-pro",              # "sonar" | "sonar-pro" | "sonar-reasoning" | "sonar-reasoning-pro" | "sonar-deep-research" - default: "sonar"
    max_tokens=2000,                # default: 1000
    search_context_size="high",     # "low" | "medium" | "high" - default: "high"
    search_mode="academic",         # "web" | "academic" | "sec"
    search_recency_filter="month",  # "hour" | "day" | "week" | "month" | "year"
    return_images=True,             # include image URLs in the response
    return_related_questions=True,  # include suggested follow-up questions
    search_domain_filter=["arxiv.org", "nature.com"],
)

agent = Agent("researcher", config=config, tools=[search_tool, answer_tool])
```
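
The filter values above are plain strings, so it can help to build them from typed inputs. The helpers below are hypothetical conveniences, not part of the toolkit. They follow the formats shown in the example: month/day/year dates written like `1/1/2025`, and a `-` prefix for excluded domains. Whether zero-padded dates are also accepted is not stated here.

```python
from datetime import date
from typing import Iterable


def perplexity_date(d: date) -> str:
    """Format a date like the examples above, e.g. 1/1/2025 (month/day/year)."""
    return f"{d.month}/{d.day}/{d.year}"


def domain_filter(include: Iterable[str] = (), exclude: Iterable[str] = ()) -> list[str]:
    """Build a search_domain_filter list; excluded domains get a '-' prefix."""
    return [*include, *(f"-{domain}" for domain in exclude)]
```

The results can be passed straight through, for example `toolkit.search(search_after_date_filter=perplexity_date(date(2025, 1, 1)), search_domain_filter=domain_filter(["arxiv.org"], ["medium.com"]))`.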

### HTTP and SDK options

The toolkit constructor accepts options for the underlying `httpx.AsyncClient` and the Perplexity SDK client. Any extra keyword arguments are forwarded directly to `AsyncPerplexity(...)` (e.g. `base_url`, `max_retries`, `default_headers`):

```python
toolkit = PerplexitySearchToolkit(
    api_key=...,
    proxy="http://proxy.company.com:8080",  # passed to httpx.AsyncClient
    verify=False,                            # disable TLS verification (httpx)
    timeout=30.0,                            # httpx timeout in seconds
    # extra kwargs below are forwarded to AsyncPerplexity
    base_url="https://custom.perplexity.example",
    max_retries=5,
    default_headers={"X-Trace-Id": "abc-123"},
)
```

### Result

Both tools return a `PerplexitySearchResponse` with these fields:

| Field | Description |
| :--- | :--- |
| `query` | The original search query |
| `results` | List of `PerplexitySearchResult` (`title`, `url`, `snippet`, `date`) |
| `content` | LLM-generated answer (filled by `perplexity_answer`; empty for `perplexity_search`) |
| `citations` | URLs the model cited inline (filled by `perplexity_answer`) |
| `images` | List of `PerplexityImageMeta` when `return_images=True` on `perplexity_answer` |

When `return_images=True`, image URLs are also surfaced as `ImageInput` parts on the tool result so the next model turn receives them as proper image inputs.

!!! tip
    Use `perplexity_search` when the agent only needs raw ranked URLs (cheaper, no LLM hop). Use `perplexity_answer` when a grounded answer with citations is helpful.

!!! tip
    `search_domain_filter` on `perplexity_answer` is a Pro-tier feature on the Perplexity API; see [usage tiers](https://docs.perplexity.ai/guides/usage-tiers).

---

## TavilySearchTool

`TavilySearchTool` gives an agent advanced web search capabilities via the [Tavily](https://tavily.com) API. Results include relevance scores and optional LLM-generated answers, raw page content, and images.

!!! note
    Requires the `tavily` extra and an API key: `pip install ag2[tavily]`

```python
import os
from autogen.beta import Agent
from autogen.beta.config import AnthropicConfig
from autogen.beta.tools import TavilySearchTool

agent = Agent(
    "researcher",
    config=AnthropicConfig(model="claude-sonnet-4-6"),
    tools=[TavilySearchTool(api_key=os.environ["TAVILY_API_KEY"])],
)
```

If `api_key` is omitted, Tavily reads the `TAVILY_API_KEY` environment variable automatically.

### Configuration

```python
tool = TavilySearchTool(
    max_results=5,
    search_depth="advanced",   # "basic" | "advanced" | "fast" | "ultra-fast"
    topic="news",              # "general" | "news" | "finance"
    include_answer=True,       # add an LLM-generated summary to the response
    include_raw_content=True,  # include full page text alongside the snippet
    include_images=True,       # include image URLs in the response
    time_range="week",         # "day" | "week" | "month" | "year"
    start_date="2024-01-01",   # YYYY-MM-DD
    end_date="2024-12-31",     # YYYY-MM-DD
    days=7,                    # days back from today (Tavily applies this to topic="news")
    include_domains=["reuters.com", "bbc.com"],
    exclude_domains=["example.com"],
    country="US",              # ISO country code for localized results
    auto_parameters=True,      # let Tavily auto-tune query parameters
    include_favicon=True,      # include result favicons in the response
)
```
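
`start_date` and `end_date` take `YYYY-MM-DD` strings, which is what `date.isoformat()` produces. A minimal sketch of computing a rolling window follows; `last_n_days` is a hypothetical helper, not part of the tool:

```python
from datetime import date, timedelta
from typing import Optional


def last_n_days(n: int, today: Optional[date] = None) -> dict[str, str]:
    """Return start_date/end_date keyword arguments covering the last n days."""
    end = today or date.today()
    start = end - timedelta(days=n)
    return {"start_date": start.isoformat(), "end_date": end.isoformat()}
```

The result can be unpacked into the constructor, for example `TavilySearchTool(**last_n_days(7))`. The window is fixed when the tool is built, so a long-running process would need to rebuild the tool to move it forward.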

All search parameters accept a `Variable` for dynamic values resolved at execution time.

### HTTP and SDK options

The constructor also accepts options for the underlying `httpx.AsyncClient` and the Tavily SDK client. Any extra keyword arguments are forwarded directly to `AsyncTavilyClient(...)` (e.g. `api_base_url`, `company_info_tags`, `project_id`):

```python
tool = TavilySearchTool(
    api_key=...,
    proxy="http://proxy.company.com:8080",  # passed to httpx.AsyncClient
    verify=False,                            # disable TLS verification (httpx)
    timeout=30.0,                            # httpx timeout in seconds
    # extra kwargs below are forwarded to AsyncTavilyClient
    api_base_url="https://custom.tavily.example",
    company_info_tags=("news", "finance"),
)
```

---

## SandboxShellTool

`SandboxShellTool` gives an agent the ability to run shell commands inside an environment you choose. With no argument it uses a `LocalEnvironment` with a temporary working directory that is cleaned up on process exit. See the [Sandbox Shell page](local_shell.md) for the full guide.

!!! warning
    A `LocalEnvironment` executes arbitrary shell commands on your machine. Use `allowed`, `blocked`, or `readonly` to restrict what the agent can run - or a `DockerEnvironment` / `DaytonaEnvironment` for real isolation.

```python
from autogen.beta import Agent
from autogen.beta.config import AnthropicConfig
from autogen.beta.tools import SandboxShellTool, LocalEnvironment

agent = Agent(
    "engineer",
    config=AnthropicConfig(model="claude-sonnet-4-6"),
    tools=[SandboxShellTool(LocalEnvironment("/tmp/my_project"))],
)
```

The first argument is the environment; the backend (where commands run) is configured there. Passing nothing uses a temporary local directory.

### Restricting commands

Command policy lives on the tool:

```python
from autogen.beta.tools import SandboxShellTool, LocalEnvironment

# Allow only specific commands
sh = SandboxShellTool(LocalEnvironment("/tmp/my_project"), allowed=["git", "python", "pip"])

# Block dangerous commands
sh = SandboxShellTool(LocalEnvironment("/tmp/my_project"), blocked=["rm -rf", "curl", "wget"])

# Read-only mode - agent can inspect but not modify
sh = SandboxShellTool(LocalEnvironment("/tmp/my_project"), readonly=True)

# Hide sensitive files from the agent
sh = SandboxShellTool(LocalEnvironment("/tmp/my_project"), ignore=["**/.env", "*.key", "secrets/**"])
```
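
How `allowed` and `blocked` are matched is not spelled out here. The sketch below shows one plausible reading, word-boundary prefix matching with the block-list taking precedence, purely to illustrate the idea. It is an assumption about the semantics, not `SandboxShellTool`'s implementation, and `is_permitted` is a hypothetical name.

```python
import shlex
from typing import Optional, Sequence


def is_permitted(
    command: str,
    allowed: Optional[Sequence[str]] = None,
    blocked: Optional[Sequence[str]] = None,
) -> bool:
    """Prefix-based command policy: the block-list wins, then the allow-list applies."""
    normalized = " ".join(shlex.split(command))  # collapse whitespace, drop quoting

    def matches(prefix: str) -> bool:
        # Match whole words only, so "git" does not match "gitx".
        return normalized == prefix or normalized.startswith(prefix + " ")

    if blocked and any(matches(p) for p in blocked):
        return False
    if allowed is not None:
        return any(matches(p) for p in allowed)
    return True
```

A check like this is easy to sidestep (`sh -c '...'`, pipes, or `rm -r -f` instead of `rm -rf`), which is one reason the warning above points to `DockerEnvironment` or `DaytonaEnvironment` when real isolation matters.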

---

# MCP Servers

Source: https://docs.ag2.ai/latest/docs/beta/tools/mcp_servers/

[MCP (Model Context Protocol)](https://modelcontextprotocol.io/) is a protocol introduced by Anthropic that aims to standardize how tools and prompts are exposed to LLMs.
It can be thought of as a superset of regular [Tools](tools.md), created to solve two problems:

1. Each LLM provider had its own tool schema and types, making tools non-portable across providers.
2. There was no standard way to give an LLM a scoped, context-optimized surface to invoke APIs, RPCs, and similar remote capabilities.

## Two ways to connect an MCP server

Autogen supports both ways MCP servers are typically wired into an agent:

- **Client-side connection** - Autogen connects to the MCP server itself, discovers the tools, and executes them locally. The LLM only ever sees ordinary function tools. Works with every provider. Supports both **remote** servers (HTTP / streamable-http) and **local** servers (subprocess speaking MCP over stdin/stdout).
- **Provider-side connection** - the MCP server URL and credentials are forwarded to the LLM provider, which connects to the server and invokes the tools on its end. Only works with providers that natively support it (e.g. Anthropic).

| | `MCPToolkit` (client-side) | `MCPServerTool` (provider-side) |
|---|---|---|
| Who connects to the MCP server | Autogen | LLM provider |
| Who executes tool calls | Autogen | LLM provider |
| Works with any LLM provider | Yes | No - provider must support MCP passthrough |
| Supports local stdio servers | Yes | No - provider only accepts URLs |
| Credentials leave your infra | No | Yes - forwarded to the LLM provider |
| Custom middleware on tool calls | Yes | No |
| Lifecycle / connection pooling | Handled by Autogen | Handled by provider |

!!! tip
    Pick `MCPToolkit` when you want provider-agnostic behavior or local control over tool execution, when you need to run a local stdio MCP server, or when your MCP credentials must stay inside your infrastructure. Pick `MCPServerTool` when you're only targeting a provider that supports it and you'd rather let the provider manage the MCP lifecycle for you.

## Client-side: `MCPToolkit`

`MCPToolkit` is a [Toolkit](toolkits.md) - it discovers the server's tools lazily and exposes each one as a regular function tool to the agent. It accepts either a remote URL/`MCPServerConfig` or an `MCPStdioServerConfig` for a locally-launched subprocess; the rest of the agent doesn't care which transport is in use.

### Remote servers (HTTP)

The simplest form takes a URL string and uses the streamable-http transport:

```python
from autogen.beta import Agent
from autogen.beta.tools import MCPToolkit

agent_with_mcp = Agent(
    name="Weather bot",
    tools=[MCPToolkit("https://my-mcp-url.example.com")],
)
```

Most real-world MCP servers require authentication. Use `MCPServerConfig` for typed configuration:

```python
from autogen.beta import Agent
from autogen.beta.tools import MCPToolkit, MCPServerConfig

agent_with_mcp = Agent(
    name="Weather bot",
    tools=[
        MCPToolkit(
            MCPServerConfig(
                server_url="https://my-mcp-url.example.com",
                authorization_token="XXXXXX",
            )
        )
    ],
)
```

`MCPServerConfig` also accepts `headers`, `allowed_tools`, `blocked_tools`, `description`, and a `server_label` for logging, plus transport-level knobs `connection_timeout` (default `30.0`), `proxy`, and `verify` (TLS verification, default `True`). The first group - `server_url`, `server_label`, `authorization_token`, `description`, `allowed_tools`, `blocked_tools`, `headers` - can also be a `Variable` if the value is only known at runtime; `connection_timeout`, `proxy`, and `verify` take plain values only.

### Local servers (stdin/stdout)

Many MCP servers ship as CLIs that speak MCP over their own stdin/stdout - `npx -y @modelcontextprotocol/server-filesystem`, `uvx some-mcp-server`, a Python script in your repo, etc. Use `MCPStdioServerConfig` to launch one as a subprocess; Autogen pipes the MCP protocol through its stdio.

```python
from autogen.beta import Agent
from autogen.beta.tools import MCPToolkit, MCPStdioServerConfig

agent_with_local_mcp = Agent(
    name="Filesystem bot",
    tools=[
        MCPToolkit(
            MCPStdioServerConfig(
                command="npx",
                args=["-y", "@modelcontextprotocol/server-filesystem", "/tmp/workspace"],
            )
        )
    ],
)
```
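
To make "speaking MCP over stdin/stdout" concrete: the stdio transport exchanges newline-delimited JSON-RPC 2.0 messages over the child process's pipes. The sketch below uses only the standard library and a toy child process that echoes the method name back. It is not a real MCP server and not how `MCPToolkit` is implemented; `CHILD_SOURCE` and `call_once` are illustrative names.

```python
import json
import subprocess
import sys

# Toy stand-in for an MCP server: one JSON-RPC request per line in,
# one JSON-RPC response per line out.
CHILD_SOURCE = r"""
import json, sys
for line in sys.stdin:
    request = json.loads(line)
    response = {"jsonrpc": "2.0", "id": request["id"], "result": {"echo": request["method"]}}
    sys.stdout.write(json.dumps(response) + "\n")
    sys.stdout.flush()
"""


def call_once(method: str) -> dict:
    """Launch the child, send one request, read one response, then shut it down."""
    with subprocess.Popen(
        [sys.executable, "-c", CHILD_SOURCE],
        stdin=subprocess.PIPE,
        stdout=subprocess.PIPE,
        text=True,
    ) as proc:
        request = {"jsonrpc": "2.0", "id": 1, "method": method}
        proc.stdin.write(json.dumps(request) + "\n")
        proc.stdin.flush()
        reply = json.loads(proc.stdout.readline())
        proc.stdin.close()  # EOF on stdin lets the child leave its read loop
    return reply
```

A real session starts with an `initialize` exchange before methods such as `tools/list` are called; the toy child skips that step.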

You can pass environment variables, a working directory, and the usual filtering options:

```python
from autogen.beta import Agent
from autogen.beta.tools import MCPToolkit, MCPStdioServerConfig

agent_with_local_mcp = Agent(
    name="GitHub bot",
    tools=[
        MCPToolkit(
            MCPStdioServerConfig(
                command="uvx",
                args=["mcp-server-github"],
                env={"GITHUB_TOKEN": "ghp_XXXXXX"},
                cwd="/srv/workspace",
                allowed_tools=["list_issues", "create_issue"],
                server_label="github",
            )
        )
    ],
)
```

`MCPStdioServerConfig` accepts `command`, `args`, `env`, `cwd`, `server_label`, `description`, `allowed_tools`, `blocked_tools`, and `encoding` (default `"utf-8"`). All of these except `encoding` can be a `Variable` if the value is only known at runtime - handy for injecting per-conversation tokens or workspace paths.

!!! note
    The subprocess is launched lazily, on the first tool discovery or tool call. A short-lived MCP session is opened for each operation, so there's no persistent process to manage from your code.

### Multiple servers

`MCPToolkit` is just a toolkit, so you can register as many as you need - and freely mix remote and local ones:

```python
from autogen.beta import Agent
from autogen.beta.tools import MCPToolkit, MCPStdioServerConfig

agent_with_mcp = Agent(
    name="Mixed bot",
    tools=[
        MCPToolkit("https://my-mcp-url.example.com"),
        MCPToolkit("https://my-mcp-url2.example.com"),
        MCPToolkit(
            MCPStdioServerConfig(
                command="npx",
                args=["-y", "@modelcontextprotocol/server-filesystem", "/tmp"],
            )
        ),
    ],
)
```

## Provider-side: `MCPServerTool`

`MCPServerTool` does not open a connection - it ships the server URL and credentials to the LLM provider as part of the request. The provider is then responsible for connecting to the MCP server and dispatching tool calls. Because the provider only accepts a URL, this path does **not** support local stdio servers - use `MCPToolkit` with `MCPStdioServerConfig` for those.

```python
from autogen.beta import Agent
from autogen.beta.tools import MCPServerTool

agent_with_mcp = Agent(
    name="Weather bot",
    tools=[
        MCPServerTool(
            server_url="https://my-mcp-url.example.com",
            server_label="weather",
            authorization_token="XXXXXX",
        ),
    ],
)
```

`MCPServerTool` also accepts `description`, `allowed_tools`, `blocked_tools`, and `headers`. All constructor parameters can be a `Variable` if the value is only known at runtime.

!!! warning
    With provider-side connections, your MCP credentials are sent to the LLM provider on every request. Only use this path with providers and servers you trust with those credentials.

---

# Built-in Provider Tools

Source: https://docs.ag2.ai/latest/docs/beta/tools/builtin_tools/

AG2 includes built-in tools that map to server-side capabilities offered by LLM providers. These tools are executed by the provider's API - not locally - and require no function implementation on your side.

| Tool | Anthropic | OpenAI | Gemini |
| :--- | :---: | :---: | :---: |
| `CodeExecutionTool` | ✓ | ✓ | ✓ |
| `WebSearchTool` | ✓ | ✓ | ✓ |
| `WebFetchTool` | ✓ | ✗ | ✓ |
| `ShellTool` | ✓ | ✓ | ✗ |
| `MCPServerTool` | ✓ | ✓ | ✗ |
| `ImageGenerationTool` | ✗ | ✓ | ✗ |
| `MemoryTool` | ✓ | ✗ | ✗ |
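
When the provider is chosen at runtime, the matrix above can be encoded as data and checked before an agent is built. This is a hypothetical pre-flight helper, not an AG2 API; the lowercase provider labels are assumptions made for the sketch, and the table contents are copied from above.

```python
from typing import Iterable

# Tool name -> providers that support it, transcribed from the support matrix.
PROVIDER_SUPPORT: dict[str, frozenset[str]] = {
    "CodeExecutionTool": frozenset({"anthropic", "openai", "gemini"}),
    "WebSearchTool": frozenset({"anthropic", "openai", "gemini"}),
    "WebFetchTool": frozenset({"anthropic", "gemini"}),
    "ShellTool": frozenset({"anthropic", "openai"}),
    "MCPServerTool": frozenset({"anthropic", "openai"}),
    "ImageGenerationTool": frozenset({"openai"}),
    "MemoryTool": frozenset({"anthropic"}),
}


def unsupported(tool_names: Iterable[str], provider: str) -> list[str]:
    """Return the tools from tool_names that the provider does not support."""
    return [
        name
        for name in tool_names
        if provider not in PROVIDER_SUPPORT.get(name, frozenset())
    ]
```

Names missing from the mapping are reported as unsupported, which errs on the side of failing early. The mapping has to be kept in sync with the table by hand.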

## Web Search

Gives the model access to real-time web search results.

```python
from autogen.beta import Agent
from autogen.beta.config import AnthropicConfig
from autogen.beta.tools import WebSearchTool, UserLocation

agent = Agent(
    "researcher",
    config=AnthropicConfig(model="claude-sonnet-4-6"),
    tools=[
        WebSearchTool(
            max_uses=5,
            user_location=UserLocation(country="US"),
            allowed_domains=["github.com", "pypi.org"],
            blocked_domains=["pinterest.com"],
        ),
    ],
)
```

Not all parameters are supported by every provider. Unsupported parameters are silently ignored.

| Parameter | Anthropic | OpenAI | Gemini |
| :--- | :---: | :---: | :---: |
| `max_uses` | ✓ | ✓ | ✗ |
| `user_location` | ✓ | ✓ | ✗ |
| `search_context_size` | ✗ | ✓ | ✗ |
| `allowed_domains` | ✓ | ✓ | ✗ |
| `blocked_domains` | ✓ | ✗ | ✓ |
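
Because unsupported parameters are dropped without an error, it can be worth surfacing them yourself. The sketch below transcribes the table into a mapping and reports which keyword arguments a given provider would ignore. `ignored_params` and the lowercase provider labels are hypothetical, not part of AG2.

```python
# Parameter -> providers that honor it, transcribed from the table above.
WEB_SEARCH_PARAMS: dict[str, frozenset[str]] = {
    "max_uses": frozenset({"anthropic", "openai"}),
    "user_location": frozenset({"anthropic", "openai"}),
    "search_context_size": frozenset({"openai"}),
    "allowed_domains": frozenset({"anthropic", "openai"}),
    "blocked_domains": frozenset({"anthropic", "gemini"}),
}


def ignored_params(provider: str, **params: object) -> list[str]:
    """Return the names in params that the provider would silently ignore."""
    return sorted(
        name
        for name, providers in WEB_SEARCH_PARAMS.items()
        if name in params and provider not in providers
    )
```

Logging the returned names at startup makes a dropped `blocked_domains` or `max_uses` visible instead of silent.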

## Web Fetch

Fetches full content from specific URLs. Useful for reading documentation, articles, or PDFs.

```python
from autogen.beta import Agent
from autogen.beta.config import AnthropicConfig
from autogen.beta.tools import WebFetchTool

agent = Agent(
    "researcher",
    config=AnthropicConfig(model="claude-sonnet-4-6"),
    tools=[
        WebFetchTool(
            max_uses=3,
            max_content_tokens=50000,
            citations=True,
        ),
    ],
)
```

| Parameter | Anthropic | Gemini |
| :--- | :---: | :---: |
| `max_uses` | ✓ | ✗ |
| `allowed_domains` | ✓ | ✗ |
| `blocked_domains` | ✓ | ✗ |
| `citations` | ✓ | ✗ |
| `max_content_tokens` | ✓ | ✗ |

!!! note
    OpenAI does not support web fetch. Using `WebFetchTool` with an OpenAI config will raise an error.

## Code Execution

Lets the model write and run code inline during a conversation.

```python
from autogen.beta import Agent
from autogen.beta.config import AnthropicConfig
from autogen.beta.tools import CodeExecutionTool

agent = Agent(
    "analyst",
    config=AnthropicConfig(model="claude-sonnet-4-6"),
    tools=[CodeExecutionTool()],
)
```

The tool accepts a `version` parameter for provider version pinning:

```python
CodeExecutionTool(version="code_execution_20250825")
```

## Memory

Enables Claude to store and retrieve information across conversations. Claude can create, read, update, and delete files in a `/memories` directory.

```python
from autogen.beta import Agent
from autogen.beta.config import AnthropicConfig
from autogen.beta.tools import MemoryTool

agent = Agent(
    "assistant",
    config=AnthropicConfig(model="claude-sonnet-4-6"),
    tools=[MemoryTool()],
)
```

!!! note
    `MemoryTool` is currently only supported by Anthropic.

## Shell

Gives the model the ability to run shell commands. The execution environment depends on the provider.

```python
from autogen.beta import Agent
from autogen.beta.config import AnthropicConfig
from autogen.beta.tools import ShellTool

agent = Agent(
    "devops",
    config=AnthropicConfig(model="claude-sonnet-4-6"),
    tools=[ShellTool()],
)
```

OpenAI supports configuring the execution environment:

```python
from autogen.beta import Agent
from autogen.beta.config import OpenAIResponsesConfig
from autogen.beta.tools import ShellTool
from autogen.beta.tools.builtin.shell import ContainerAutoEnvironment, NetworkPolicy

agent = Agent(
    "devops",
    config=OpenAIResponsesConfig(model="gpt-4.1"),
    tools=[
        ShellTool(
            environment=ContainerAutoEnvironment(
                network_policy=NetworkPolicy(allowed_domains=["pypi.org"]),
            ),
        ),
    ],
)
```

| Environment | Description |
| :--- | :--- |
| `ContainerAutoEnvironment` | Provider-managed container with optional network policy |
| `ContainerReferenceEnvironment` | Reference an existing container by ID |

!!! warning
    `ShellTool` gives the model direct shell access. Use it only with trusted prompts and consider restricting the environment.

## MCP Server

Integrates external [MCP (Model Context Protocol)](https://modelcontextprotocol.io/) servers, giving the model access to remote tools.

```python
from autogen.beta import Agent
from autogen.beta.config import AnthropicConfig
from autogen.beta.tools import MCPServerTool

agent = Agent(
    "assistant",
    config=AnthropicConfig(model="claude-sonnet-4-6"),
    tools=[
        MCPServerTool(
            server_url="https://mcp.example.com/sse",
            server_label="my-tools",
            allowed_tools=["search", "summarize"],
        ),
    ],
)
```

| Parameter | Anthropic | OpenAI |
| :--- | :---: | :---: |
| `server_url` | ✓ | ✓ |
| `server_label` | ✓ | ✓ |
| `authorization_token` | ✓ | ✗ |
| `description` | ✓ | ✗ |
| `allowed_tools` | ✓ | ✓ |
| `blocked_tools` | ✓ | ✗ |
| `headers` | ✗ | ✓ |

## Image Generation

`ImageGenerationTool` instructs the model to generate images inline during a conversation. Generated images are returned via `reply.files`.

```python
from autogen.beta import Agent
from autogen.beta.config import OpenAIResponsesConfig
from autogen.beta.tools import ImageGenerationTool

agent = Agent(
    "designer",
    config=OpenAIResponsesConfig(model="gpt-4.1"),
    tools=[
        ImageGenerationTool(
            quality="high",
            size="1024x1024",
            output_format="png",
            background="transparent",
        ),
    ],
)

reply = await agent.ask("Generate a logo for a coffee shop.")
for image in reply.files:
    print(image.metadata.get("media_type"), len(image.data))
```
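
To keep generated images, the loop above can write each one to disk. The helper below is a sketch under one assumption that this page does not confirm: that `image.data` holds raw bytes. `save_image` is a hypothetical name; the file extension is derived from the `media_type` metadata with the standard `mimetypes` module.

```python
import mimetypes
from pathlib import Path
from typing import Optional


def save_image(data: bytes, media_type: Optional[str], directory: Path, stem: str) -> Path:
    """Write image bytes to `directory`, picking the extension from the media type."""
    # Fall back to ".bin" when the media type is missing or unknown.
    extension = mimetypes.guess_extension(media_type or "") or ".bin"
    target = directory / f"{stem}{extension}"
    target.write_bytes(data)
    return target
```

Inside the loop this would be called as `save_image(image.data, image.metadata.get("media_type"), Path("out"), f"logo-{i}")`, with `out` created beforehand.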

!!! note
    `ImageGenerationTool` is only supported by OpenAI (Responses API). Gemini generates images through a response modality rather than a tool. See [Image Generation](../multimodal/image_generation.md) for the full guide covering both providers.

## Anthropic Tool Versions

Anthropic versions its server-side tools. Newer versions support dynamic filtering (Claude writes code to filter results before loading them into context), but they require Opus 4.6 or Sonnet 4.6.

Set the version on each built-in tool (defaults match the older Anthropic tool revisions):

```python
from autogen.beta.tools import WebFetchTool, WebSearchTool

tools = [
    WebSearchTool(version="web_search_20260209"),  # default: web_search_20250305
    WebFetchTool(version="web_fetch_20260209"),    # default: web_fetch_20250910
]
```

The default versions are compatible with all Claude models including Haiku.
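
If the model is configurable, the version can be chosen alongside it. The sketch below picks the newer revision only when the model ID looks like a 4.6-generation model. The substring check on model IDs is an assumption made for illustration, and `pick_version` is a hypothetical helper; the version strings are the ones listed above.

```python
DEFAULT_VERSIONS = {"web_search": "web_search_20250305", "web_fetch": "web_fetch_20250910"}
NEWER_VERSIONS = {"web_search": "web_search_20260209", "web_fetch": "web_fetch_20260209"}

# Assumption: model IDs for the 4.6 generation contain one of these substrings.
NEWER_CAPABLE_MARKERS = ("opus-4-6", "sonnet-4-6")


def pick_version(tool: str, model: str) -> str:
    """Return the newer tool version for capable models, else the default."""
    capable = any(marker in model for marker in NEWER_CAPABLE_MARKERS)
    return (NEWER_VERSIONS if capable else DEFAULT_VERSIONS)[tool]
```

For example, `WebSearchTool(version=pick_version("web_search", model))` keeps the tool and model in step when the model comes from configuration.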

---

# Code Execution

Source: https://docs.ag2.ai/latest/docs/beta/tools/code_execution/

AG2 supports two ways to let an agent run code: have the LLM provider execute it inside their own sandbox, or run it client-side through a sandboxed backend you control. Both produce the same conversational pattern - the model writes code, code runs, results come back - but the trade-offs are different.

| | Built-in Provider | Remote (`SandboxCodeTool`) |
| :--- | :--- | :--- |
| **Where it runs** | Provider's sandbox | A `CodeEnvironment` you supply (Daytona, Docker, custom) |
| **Setup** | Add the tool, done | Choose / configure a backend |
| **Cost** | Bundled in provider tokens | Your sandbox bill (Daytona) or free (local Docker) |
| **Custom packages, images** | No | Yes |
| **State persistence** | Provider-defined | Per-environment instance |
| **Provider support** | Only providers with native code-exec | Any provider |

## Built-in Provider Code Execution

Some providers expose a server-side Python sandbox the model can drive directly. AG2 surfaces this through [`CodeExecutionTool`](builtin_tools.md#code-execution) - a declaration-only tool that maps to each provider's native capability:

```python
from autogen.beta import Agent
from autogen.beta.config import AnthropicConfig
from autogen.beta.tools import CodeExecutionTool

agent = Agent(
    "analyst",
    config=AnthropicConfig(model="claude-sonnet-4-6"),
    tools=[CodeExecutionTool()],
)
```

See the [Built-in Tools page](builtin_tools.md#code-execution) for the provider support matrix and version pinning.

!!! note "When to use this"
    Cheapest to wire up, no infrastructure to run. The trade-off is no control over the runtime - you can't preinstall packages, persist files between calls, or use this on a provider without native code-execution support.

## Remote Code Execution

`SandboxCodeTool` exposes a `run_code(code, language)` function the agent can call. Because it's a regular function tool, it works on **any** model provider. Where the code actually runs is decided by the [`CodeEnvironment`](https://github.com/ag2ai/ag2/blob/main/autogen/beta/tools/code/environment/base.py) you hand it.

!!! note "`environment` is required"
    `SandboxCodeTool` has no default backend. Pass `DaytonaEnvironment`, `DockerEnvironment`, or your own `CodeEnvironment` implementation.

!!! note "When to use this"
    You need custom packages or images, persistent state across calls, your own infrastructure, or you're working with a provider that doesn't have a native code-execution capability.

Two environments are available: **Daytona** (hosted) and **Docker** (local container). You can also implement your own - see the **Custom** tab below.

=== "Daytona"
    [Daytona](https://www.daytona.io/) is a hosted sandbox service. Strongest isolation; pay per use.

    ```bash
    pip install "daytona>=0.171.0,<1"
    ```

    ```python linenums="1"
    from autogen.beta import Agent
    from autogen.beta.config import AnthropicConfig
    from autogen.beta.tools import SandboxCodeTool
    from autogen.beta.extensions.daytona import DaytonaEnvironment

    agent = Agent(
        "analyst",
        config=AnthropicConfig(model="claude-sonnet-4-6"),
        tools=[SandboxCodeTool(DaytonaEnvironment())],
    )

    reply = await agent.ask("Compute the 50th Fibonacci number in Python.")
    print(await reply.content())
    ```

    `DaytonaEnvironment` reads `DAYTONA_API_KEY`, `DAYTONA_API_URL`, and `DAYTONA_TARGET` from the environment by default. Out-of-the-box supported languages: `python`, `bash`, `javascript`, `typescript`.

=== "Docker"
    A local container managed via the Docker daemon. Free, cross-platform (macOS, Linux, and Windows via Docker Desktop), real container isolation.

    ```bash
    pip install "ag2[docker]"
    ```

    ```python linenums="1"
    from autogen.beta import Agent
    from autogen.beta.config import AnthropicConfig
    from autogen.beta.tools import SandboxCodeTool
    from autogen.beta.extensions.docker import DockerEnvironment

    agent = Agent(
        "analyst",
        config=AnthropicConfig(model="claude-sonnet-4-6"),
        tools=[SandboxCodeTool(DockerEnvironment(image="python:3.12-slim"))],
    )
    ```

    Default supported languages: `python` and `bash` (both ship in `python:3.12-slim`). Add `"javascript"` / `"typescript"` only if your image has `node` / `ts-node` installed.

    Safety defaults are deliberately strict:

    - `network_mode="none"` - no network access. Set to `"bridge"` to opt in.
    - `mem_limit="512m"` - caps runaway processes.
    - `auto_remove=True` - container is removed on stop.
    - `user=None` - runs as the image's default user. For images that ship a `nobody` user, `user="nobody"` is recommended.

=== "Custom"
    `SandboxCodeTool` only depends on the `CodeEnvironment` protocol, so any backend that satisfies it works - e2b, an SSH host, an internal CI runner.

    ```python linenums="1"
    from autogen.beta.tools import SandboxCodeTool
    from autogen.beta.tools.code import CodeEnvironment, CodeLanguage, CodeRunResult

    class MyEnvironment(CodeEnvironment):
        @property
        def supported_languages(self) -> tuple[CodeLanguage, ...]:
            return ("python",)

        async def run(self, code: str, language: CodeLanguage, *, context=None) -> CodeRunResult:
            # ship code to wherever you run it; return stdout + exit code
            ...
            return CodeRunResult(output="...", exit_code=0)

    sandbox = SandboxCodeTool(MyEnvironment())
    ```

    The `context` argument is the active `ConversationContext`, forwarded so backends can resolve [`Variable`](tools.md#variables) markers from `context.variables` (e.g. per-tenant credentials). Backends with no runtime-configurable parameters can ignore it.

### Lifecycle

The sandbox / container is created lazily on the first `run_code` call and reused for the lifetime of the environment instance. Cleanup is registered via `atexit` so resources are released even if you forget to close the environment. For tighter scoping, use the environment as an async context manager:

```python
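from autogen.beta import Agent
from autogen.beta.config import AnthropicConfig
from autogen.beta.extensions.daytona import DaytonaEnvironment
from autogen.beta.tools import SandboxCodeTool

async with DaytonaEnvironment(image="python:3.12") as env: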
    agent = Agent(
        "analyst",
        config=AnthropicConfig(model="claude-sonnet-4-6"),
        tools=[SandboxCodeTool(env)],
    )
    await agent.ask("...")
# sandbox deleted here
```

The same pattern works with `DockerEnvironment` (container stopped + removed) and any other backend that implements `__aenter__` / `__aexit__`.
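
For a custom backend, the lifecycle described above (lazy creation, reuse, an `atexit` safety net, and explicit cleanup through `async with`) can be sketched with the standard library alone. `LazyEnvironment` is a toy class written for illustration; it does not implement the `CodeEnvironment` protocol and is not how the shipped environments are written.

```python
import asyncio
import atexit


class LazyEnvironment:
    """Toy environment: creates its 'sandbox' on first use and reuses it afterwards."""

    def __init__(self) -> None:
        self._sandbox = None
        self.created = 0
        self.closed = False
        atexit.register(self._close_sync)  # safety net if nobody closes us

    async def run(self, code: str) -> str:
        if self._sandbox is None:  # lazy creation on the first call
            self._sandbox = object()
            self.created += 1
        return f"ran {len(code)} chars"

    def _close_sync(self) -> None:
        self._sandbox = None
        self.closed = True

    async def aclose(self) -> None:
        self._close_sync()
        atexit.unregister(self._close_sync)  # already cleaned up; drop the hook

    async def __aenter__(self) -> "LazyEnvironment":
        return self

    async def __aexit__(self, *exc_info: object) -> None:
        await self.aclose()


async def main() -> LazyEnvironment:
    async with LazyEnvironment() as env:
        await env.run("print(1)")
        await env.run("print(2)")  # reuses the sandbox created by the first call
    return env
```

Running `asyncio.run(main())` creates the stand-in sandbox once and closes it when the `async with` block exits.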

### Credentials and runtime config

`DaytonaEnvironment` accepts `Variable` markers for `api_key`, `api_url`, `target`, `image`, `snapshot`, and `env_vars`. `DockerEnvironment` accepts them for `image`, `env_vars`, and `network_mode`. Variables resolve from `context.variables` on the first `run_code` call - useful for multi-tenant setups:

```python
from autogen.beta import Variable
from autogen.beta.extensions.daytona import DaytonaEnvironment

env = DaytonaEnvironment(
    api_key=Variable("daytona_key"),  # resolved from ctx.variables["daytona_key"]
    image=Variable("tenant_image"),
)
```

### State persistence

A single `CodeEnvironment` instance reuses the **same** sandbox / container across every `run_code` call routed through it. Files written in one call are visible in the next, and packages installed once stay installed. Each snippet still runs as a fresh process, so Python globals defined in one call are not visible to the next - persist state on disk.

| Scenario | Same sandbox? |
| :--- | :---: |
| Multiple `run_code` calls within one `agent.ask(...)` | yes |
| `agent.ask(...)` -> `reply.ask(...)` (same agent, same tool) | yes |
| Two agents sharing the same `SandboxCodeTool` instance | yes (shared filesystem state) |
| New `CodeEnvironment(...)` per request | no - each spins up its own sandbox |
| After `await env.aclose()` or process exit | no - sandbox is deleted |

For most chat-style agents, instantiate one `CodeEnvironment` per agent (or per conversation) so state is scoped the way you'd expect. The first tool call pays the sandbox-creation round-trip; subsequent calls reuse it.
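
The "files persist, globals do not" rule can be reproduced without any sandbox backend. The following sketch is an assumption-level analogy, not the library's implementation: it runs each snippet as a fresh Python process in one shared temporary directory, which is the same process model the paragraph above describes.

```python
import subprocess
import sys
import tempfile


def run_snippet(code: str, workdir: str) -> subprocess.CompletedProcess:
    # Each call starts a brand-new interpreter in the same directory.
    return subprocess.run(
        [sys.executable, "-c", code],
        cwd=workdir,
        capture_output=True,
        text=True,
    )


with tempfile.TemporaryDirectory() as workdir:
    # First snippet: define a global and also persist it to disk.
    run_snippet("counter = 41\nopen('counter.txt', 'w').write(str(counter))", workdir)
    # Second snippet: the file written earlier is still there.
    from_disk = run_snippet("print(int(open('counter.txt').read()) + 1)", workdir)
    # Third snippet: the global `counter` is gone, so this fails with NameError.
    from_memory = run_snippet("print(counter + 1)", workdir)
```

The second snippet succeeds because it reads state from disk; the third fails because the earlier process and its globals no longer exist.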

---

# Sandbox Shell Tool

Source: https://docs.ag2.ai/latest/docs/beta/tools/local_shell/

`SandboxShellTool` lets an agent run shell commands inside an **environment** you choose - a local subprocess, a Docker container, a Daytona sandbox, or any custom backend. Unlike the provider-native [`ShellTool`](builtin_tools.md#shell) (which runs server-side and only on Anthropic/OpenAI), it executes client-side, so it works with **any model provider**.

The design has two orthogonal pieces:

- The **environment** decides *where* commands run and carries all backend config (image, env vars, network, timeout, ...).
- The **tool** decides the agent-facing policy (`allowed` / `blocked` / `ignore` / `readonly`).

The same environment can back both a `SandboxShellTool` and a [`SandboxCodeTool`](code_execution.md).

## Quick Start

```python
from autogen.beta import Agent
from autogen.beta.config import AnthropicConfig
from autogen.beta.tools import SandboxShellTool

agent = Agent(
    "coder",
    "You write and run Python code.",
    config=AnthropicConfig(model="claude-sonnet-4-6"),
    tools=[SandboxShellTool()],
)

reply = await agent.ask("Write a hello world script and run it.")
print(await reply.content())
```

With no arguments, `SandboxShellTool` uses a `LocalEnvironment` with a temporary working directory that is cleaned up when the process exits.

## Choosing an Environment

The first argument is the environment. Pass a `LocalEnvironment`, `DockerEnvironment`, or `DaytonaEnvironment`:

```python
from autogen.beta.tools import SandboxShellTool, LocalEnvironment
from autogen.beta.extensions.docker import DockerEnvironment
from autogen.beta.extensions.daytona import DaytonaEnvironment

# Local subprocess in a specific directory
sh = SandboxShellTool(LocalEnvironment("/tmp/my_project"))

# Docker container - configure the backend once
sh = SandboxShellTool(DockerEnvironment(image="python:3.12-slim", network_mode="none"))

# Daytona hosted sandbox
sh = SandboxShellTool(DaytonaEnvironment(image="python:3.12"))
```

A `LocalEnvironment` directory is created automatically if it does not exist, and is **not** deleted on exit when an explicit path is given.

## Command Filtering

Filtering policy lives on the tool, not the environment:

```python
from autogen.beta.tools import SandboxShellTool, LocalEnvironment

sh = SandboxShellTool(
    LocalEnvironment("/tmp/my_project"),
    allowed=["python", "uv run", "git"],
    blocked=["rm -rf", "curl", "wget"],
    ignore=["**/.env", "*.key", "secrets/**"],
)
```

### Tool Parameters

| Parameter | Default | Description |
| :--- | :--- | :--- |
| `environment` | `None` | Backend: `LocalEnvironment` / `DockerEnvironment` / `DaytonaEnvironment`. `None` -> `LocalEnvironment()` |
| `allowed` | `None` | Whitelist of command prefixes. `None` -> all commands allowed |
| `blocked` | `None` | Blacklist of command prefixes. `None` -> nothing blocked |
| `ignore` | `None` | Gitignore-style path patterns. Commands referencing matching paths return `"Access denied: <path>"` |
| `readonly` | `False` | When `True` and `allowed` is not set, restricts to a built-in read-only list (`cat`, `ls`, `grep`, `git log`, ...) |

### LocalEnvironment Parameters

| Parameter | Default | Description |
| :--- | :--- | :--- |
| `path` | `None` | Working directory. `None` -> temporary dir prefixed `ag2_sandbox_`, deleted on exit |
| `cleanup` | `None` | `None` -> auto (`True` when `path=None`, `False` otherwise) |
| `timeout` | `60` | Per-command timeout in seconds. Returns an exit code `124` result on expiry |
| `max_output` | `100_000` | Maximum characters in the returned output; truncated output gets a `[truncated: ...]` suffix |
| `env_vars` | `None` | Environment variables merged into every command |
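
To make the `timeout` and `max_output` rows concrete, here is a minimal sketch of the same two behaviors using only the standard library. The helper `run_with_limits` is hypothetical and is not how `LocalEnvironment` is implemented. The truncation suffix is left as the elided placeholder from the table, since the exact text is not documented here.

```python
import subprocess
import sys


def run_with_limits(args: list[str], timeout: float, max_output: int) -> tuple[int, str]:
    """Run a command and mimic the documented timeout and truncation behavior."""
    try:
        proc = subprocess.run(args, capture_output=True, text=True, timeout=timeout)
        exit_code, output = proc.returncode, proc.stdout + proc.stderr
    except subprocess.TimeoutExpired:
        # 124 is the exit code the table documents for an expired timeout.
        exit_code, output = 124, "command timed out"
    if len(output) > max_output:
        # Placeholder suffix: the real suffix text is elided in the docs.
        output = output[:max_output] + "[truncated: ...]"
    return exit_code, output


slow = run_with_limits(
    [sys.executable, "-c", "import time; time.sleep(5)"], timeout=0.5, max_output=100
)
noisy = run_with_limits(
    [sys.executable, "-c", "print('x' * 500)"], timeout=30, max_output=100
)
```

`slow` ends with exit code `124` because the command outlives its timeout, while `noisy` exits normally but has its output cut to `max_output` characters plus the suffix.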

### Filter Order

Filtering is applied in this order on every `run_shell_command(command)` call:

1. **`allowed`** - if set, the command must match at least one prefix. Otherwise: `"Command not allowed: <cmd>"`.
2. **`blocked`** - if set, the command must not match any prefix. Otherwise: `"Command not allowed: <cmd>"`.
3. **`ignore`** - literal file paths parsed from the command string are resolved and checked against the patterns. On match: `"Access denied: <path>"`.
4. **Execute** - the command runs in the environment.

!!! note
    `ignore` checks only literal path tokens in the command string. Paths computed dynamically inside the shell (variable substitution, command substitution, glob expansion) are not inspected.
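
The filter order above can be modeled in a few lines of plain Python. This is a simplified sketch under stated assumptions, not the tool's actual code: prefixes are matched with `str.startswith`, gitignore-style patterns are approximated with `fnmatch`, and `check_command` is a hypothetical helper name.

```python
import shlex
from fnmatch import fnmatchcase


def check_command(command, allowed=None, blocked=None, ignore=None):
    """Return a rejection message, or None when the command may run."""
    # 1. allowed: if set, the command must start with at least one prefix.
    if allowed is not None and not any(command.startswith(p) for p in allowed):
        return f"Command not allowed: {command}"
    # 2. blocked: if set, the command must not start with any prefix.
    if blocked is not None and any(command.startswith(p) for p in blocked):
        return f"Command not allowed: {command}"
    # 3. ignore: only literal tokens are checked; the shell is never consulted.
    for token in shlex.split(command)[1:]:
        if any(fnmatchcase(token, pattern) for pattern in ignore or ()):
            return f"Access denied: {token}"
    # 4. Nothing objected, so the command would be executed.
    return None


not_allowed = check_command("curl https://example.com", allowed=["python", "git"])
blocked_hit = check_command("git push origin", allowed=["git"], blocked=["git push"])
ignored_path = check_command("cat secrets/token.txt", ignore=["secrets/**"])
dynamic_path = check_command("cat $SECRET_FILE", ignore=["*.key"])
```

The last call returns `None` even if `$SECRET_FILE` expands to a `.key` file at run time, which is the limitation the note describes: only literal path tokens are inspected.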

## Read-Only Mode

Use `readonly=True` to let the agent inspect files without modifying anything:

```python
from autogen.beta.tools import SandboxShellTool, LocalEnvironment

sh = SandboxShellTool(LocalEnvironment("/my/codebase"), readonly=True)
```

This restricts commands to `cat`, `head`, `tail`, `ls`, `grep`, `find`, `git log`, `git diff`, `git status`, and a few others. Pass an explicit `allowed` list to override this set.

## Accessing the Working Directory

`SandboxShellTool` exposes the resolved working directory via the `workdir` property:

```python
sh = SandboxShellTool(LocalEnvironment("/tmp/my_project"))
print(sh.workdir)  # /tmp/my_project
```

## Stateful Multi-Turn Conversations

Because files persist in `workdir` across `ask()` calls, the agent can build on prior work in a chained conversation:

```python
from autogen.beta import Agent
from autogen.beta.config import AnthropicConfig
from autogen.beta.tools import SandboxShellTool, LocalEnvironment

sh = SandboxShellTool(LocalEnvironment("/tmp/counter_demo"))
agent = Agent("coder", "You manage files.", config=AnthropicConfig(model="claude-sonnet-4-6"), tools=[sh])

reply1 = await agent.ask("Create counter.txt with value 0")
reply2 = await reply1.ask("Increment the counter by 1")
reply3 = await reply2.ask("Read the counter and tell me the value")
```

!!! warning
    A `LocalEnvironment` gives the agent direct access to your filesystem and the ability to run arbitrary commands. Always set `allowed`, `blocked`, or `readonly` when exposing it to untrusted prompts, or use a `DockerEnvironment` / `DaytonaEnvironment` for real isolation.

## SandboxShellTool vs ShellTool

| | `SandboxShellTool` | `ShellTool` |
| :--- | :--- | :--- |
| **Execution** | Client-side, in your environment | Provider-side (Anthropic / OpenAI) |
| **Provider support** | Any provider | Anthropic, OpenAI only |
| **Environment control** | Full (`allowed`, `blocked`, `ignore`, backend choice) | Limited (provider-dependent) |
| **Import** | `autogen.beta.tools.SandboxShellTool` | `autogen.beta.tools.ShellTool` |

---

# Tool middleware

Source: https://docs.ag2.ai/latest/docs/beta/tools/tool_middleware/

Tool middleware lets you wrap **one** function tool with async hooks that run immediately around its execution - the same idea as an agent's `on_tool_execution()` [Middleware](../middleware.md), but attached at **tool definition** time with plain callables (no `BaseMiddleware` subclass).

Use this pattern when behavior is specific to a single tool. For policies that apply to **every** tool on an agent, register [`BaseMiddleware`](../middleware.md) with `on_tool_execution()` instead.

## Why use it

Tool-scoped hooks are **optional**. They help when:

- **Colocation** - Validation, redaction, or metrics for one tool live next to that implementation instead of in shared agent middleware.
- **Clear contracts** - Libraries can ship a tool with hooks that always run (normalize arguments, scrub secrets) without requiring consumers to register matching agent middleware.
- **Simpler agents** - You avoid a large `on_tool_execution` full of `if event.name == ...` when only a few tools need special handling.

Use agent **`middleware=[...]`** when the policy is global or shared across most tools. Use **`middleware=[...]`** on [`@tool`](tools.md), [`@agent.tool`](tools.md#registering-tools-via-a-decorator), or [`@toolkit.tool`](toolkits.md) when the behavior belongs to that tool only.

## Typical cases

| Scenario | Reason to use tool-scoped hooks |
| -------- | ------------------------------- |
| Normalize or validate arguments for one function | Only that tool's schema needs the transform; keeps agent middleware small. |
| Redact or reshape results before they return to the model | Per-tool privacy or formatting (for example strip internal IDs). |
| Light auditing or metrics for a sensitive action | The hook is bundled with the tool so it is hard to forget at agent setup. |
| Retry or fallback tied to one integration | Failure handling stays next to the API client without naming the tool in global middleware. |
| **Approve or reject** before the tool body runs | Gate dangerous or irreversible tools on policy, session flags, or a human decision without scattering checks inside the implementation. |

## API

The public type alias is **`ToolMiddleware`** in `autogen.beta.middleware`.
A hook is an **async** callable with the same parameters as `BaseMiddleware.on_tool_execution`, except the first argument is the inner **`ToolExecution`** (the next step in the chain), not `self`.

- Pass **`middleware=[hook, ...]`** to **`@tool`**, **`Agent.tool`**, or **`Toolkit.tool`**.
- Multiple hooks use the **same nesting as agent tool middleware**: the **first** entry in the list is the **outermost** layer around the tool body.
- If the agent **also** registers `BaseMiddleware` with `on_tool_execution`, **agent middleware runs outside** tool-scoped hooks (it sees the full execution, including hooks).

```python
from typing import Annotated

from autogen.beta import Agent, Context, tool, Variable
from autogen.beta.config import OpenAIConfig
from autogen.beta.events import ToolCallEvent, ToolResultEvent
from autogen.beta.middleware import ToolExecution

async def add_request_id(
    call_next: ToolExecution,
    event: ToolCallEvent,
    context: Context,
) -> ToolResultEvent:
    context.variables.setdefault("request_id", "unknown")
    return await call_next(event, context)

@tool(middleware=[add_request_id])
def search(query: str, request_id: Annotated[str, Variable()]) -> str:
    """Runs a search. request_id may be injected by middleware."""
    return f"results-for-{query}-{request_id}"

agent = Agent("assistant", config=OpenAIConfig("gpt-4o-mini"), tools=[search])
```

!!! note
    Tool-scoped hooks are plain callables. They do **not** use the `Middleware(...)` factory or `BaseMiddleware`.
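
The nesting rule ("the first entry in the list is the outermost layer") can be checked with a small standalone model. The sketch below does not use `autogen.beta`; `wrap`, `make_hook`, and the dictionary-based `context` are hypothetical stand-ins that only mirror the `(call_next, event, context)` hook shape.

```python
import asyncio
from functools import partial


async def tool_body(event, context):
    context["log"].append("body")
    return f"result:{event}"


def make_hook(name):
    async def hook(call_next, event, context):
        context["log"].append(f"{name}:before")
        result = await call_next(event, context)
        context["log"].append(f"{name}:after")
        return result

    return hook


def wrap(body, hooks):
    # Wrap from the inside out so that hooks[0] ends up as the outermost layer.
    call = body
    for hook in reversed(hooks):
        call = partial(hook, call)
    return call


context = {"log": []}
chain = wrap(tool_body, [make_hook("first"), make_hook("second")])
result = asyncio.run(chain("search", context))
```

After the call, `context["log"]` shows `first` entering before `second` and leaving after it, with the tool body in the middle.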

## Adding middleware to an existing tool

Use `tool.with_middleware()` to wrap a tool with additional hooks **without modifying** the original. The returned tool is an independent copy with the new middleware as the outermost layer:

```python
from autogen.beta import Context, tool
from autogen.beta.events import ToolCallEvent, ToolResultEvent
from autogen.beta.middleware import ToolExecution

async def audit(
    call_next: ToolExecution, event: ToolCallEvent, context: Context,
) -> ToolResultEvent:
    print(f"audit: {event.name}")
    return await call_next(event, context)

@tool
def delete_record(record_id: str) -> str:
    """Deletes a record."""
    return f"Deleted {record_id}"

audited_delete = delete_record.with_middleware(audit)

# delete_record is unchanged; audited_delete runs audit -> delete_record
```

## Toolkit-level middleware

Pass `middleware=[...]` to a [`Toolkit`](toolkits.md) constructor to apply hooks to **all** tools in the set. Toolkit middleware is the outermost layer - it runs before any per-tool hooks. See [Toolkit middleware](toolkits.md#toolkit-middleware).

---

# Approval Required

Source: https://docs.ag2.ai/latest/docs/beta/tools/approval_required/

`approval_required()` is a built-in [tool middleware](tool_middleware.md) that gates tool execution on **human approval**. When the agent tries to call a tool decorated with this middleware, the user is prompted to approve or deny the call before it runs.

This is useful for tools that perform **irreversible**, **expensive**, or **sensitive** actions - sending emails, modifying databases, executing payments, or deleting resources.

## Quick start

```python
import asyncio

from autogen.beta import Agent, tool
from autogen.beta.config import OpenAIConfig
from autogen.beta.middleware import approval_required

@tool(
    middleware=[approval_required()],
)
def delete_account(user_id: str) -> str:
    """Deletes a user account by ID permanently."""
    return f"Account {user_id} deleted."

agent = Agent(
    "assistant",
    config=OpenAIConfig("gpt-4o-mini"),
    tools=[delete_account],
    hitl_hook=lambda event: input(event.content),
)

async def main() -> None:
    reply = await agent.ask("Delete the account for user abc-123.")
    print(await reply.content())

asyncio.run(main())
```

When the agent calls `delete_account`, the user sees:

```
Agent tries to call tool:
`delete_account`, {"user_id": "abc-123"}
Please approve or deny this request.
Y/N?
```

Typing **y** lets the tool run. Any other input denies it - the agent receives the denied message and can adjust.

!!! note
    `approval_required()` relies on the agent's `hitl_hook` to collect user input. If no `hitl_hook` is configured, `context.input()` will raise an error at runtime. Read more about [Human in the Loop](../context/human_in_the_loop.md) to learn how to configure a HITL hook.

## Customizing the prompt

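Override `message` to tailor the approval prompt and `denied_message` to change the text the agent receives when the call is denied. The `message` template can reference `{tool_name}` and `{tool_arguments}`: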

```python
from autogen.beta import tool
from autogen.beta.middleware import approval_required

@tool(
    middleware=[approval_required(
        message="⚠️ The agent wants to run `{tool_name}` with {tool_arguments}. Allow? (y/n)",
        denied_message="Operation blocked by user.",
    )],
)
def send_email(to: str, subject: str, body: str) -> str:
    """Send an email to the given address."""
    return f"Email sent to {to}."
```
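
For intuition, the approve-or-deny flow can be sketched as a plain async gate. This is an illustrative model only, not the source of `approval_required()`: `make_approval_gate` and `ask` are hypothetical names, the hook signature is simplified, and the case-insensitive check for `y` is an assumption.

```python
import asyncio
import json


def make_approval_gate(ask, message, denied_message):
    """Build a gate that calls `ask(prompt)` before letting the tool run."""

    async def gate(call_next, tool_name, tool_arguments):
        prompt = message.format(
            tool_name=tool_name, tool_arguments=json.dumps(tool_arguments)
        )
        answer = ask(prompt)
        if answer.strip().lower() == "y":
            return await call_next(tool_name, tool_arguments)
        # Any other answer denies the call; the tool body never runs.
        return denied_message

    return gate


async def send_email(tool_name, tool_arguments):
    return f"Email sent to {tool_arguments['to']}."


def build(answer):
    return make_approval_gate(
        ask=lambda prompt: answer,
        message="Run `{tool_name}` with {tool_arguments}? (y/n)",
        denied_message="Operation blocked by user.",
    )


arguments = {"to": "a@example.com"}
approved = asyncio.run(build("y")(send_email, "send_email", arguments))
denied = asyncio.run(build("n")(send_email, "send_email", arguments))
```

Replacing the scripted `ask` with `input` would turn this into an interactive prompt, which is roughly the role the agent's `hitl_hook` plays.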

---

# Exa Search

Source: https://docs.ag2.ai/latest/docs/beta/extensions/tools/search/exa/

`ExaToolkit` gives an agent four related tools powered by the [Exa](https://exa.ai) neural search engine: web search, find-similar, content retrieval, and AI-powered answers - all sharing a single client.

!!! note
    Requires the `exa-py` package and an API key: `pip install "exa-py>=2.12.1,<3"`

```python
import os
from autogen.beta import Agent
from autogen.beta.config import AnthropicConfig
from autogen.beta.extensions.tools.search import ExaToolkit

agent = Agent(
    "researcher",
    config=AnthropicConfig(model="claude-sonnet-4-6"),
    tools=[ExaToolkit(api_key=os.environ["EXA_API_KEY"])],
)
```

If `api_key` is omitted, the Exa SDK reads `EXA_API_KEY` from the environment automatically.

## Tools

| Tool | Description |
| :--- | :--- |
| `exa_search` | Neural web search with filters (domains, dates, type, category) |
| `exa_find_similar` | Find pages similar to a given URL |
| `exa_get_contents` | Fetch full text content for specific URLs |
| `exa_answer` | Get an AI-generated answer with citations |

## Shared defaults

`num_results` and `max_characters` on the constructor are applied to the default `exa_search` and `exa_find_similar` tools:

```python
toolkit = ExaToolkit(
    api_key=...,
    num_results=10,         # applies to search & find_similar
    max_characters=2000,    # per-result text cap for search; None = metadata-only
)
```

## Picking a subset of tools

Each tool is exposed as a factory method on the toolkit (`toolkit.search()`, `toolkit.find_similar()`, `toolkit.get_contents()`, `toolkit.answer()`). Call the method to get a ready-to-use tool, then pass only the ones you need to the agent:

```python
toolkit = ExaToolkit(api_key=...)

agent = Agent(
    "researcher",
    config=config,
    tools=[toolkit.search(), toolkit.answer()],
)
```

## Per-tool configuration

Per-call parameters (filters, domains, dates, `num_results`, `max_characters`, etc.) live on the factory methods, not on the toolkit itself:

```python
toolkit = ExaToolkit(api_key=...)

search_tool = toolkit.search(
    num_results=5,
    max_characters=2000,           # triggers search_and_contents for full text
    search_type="neural",          # "neural" | "keyword" | "hybrid" | "auto" | "fast" | "deep"
    category="research paper",     # e.g. "news", "github", "pdf", ...
    include_domains=["arxiv.org"],
    exclude_domains=["medium.com"],
    start_published_date="2024-01-01",
    end_published_date="2024-12-31",
    use_autoprompt=True,
    livecrawl="always",            # "never" | "fallback" | "always" | "preferred"
)

agent = Agent("researcher", config=config, tools=[search_tool, toolkit.answer()])
```

When `max_characters` is set, `exa_search` calls Exa's `search_and_contents` endpoint so each result carries `text`. When `max_characters` is `None`, only metadata is returned (cheaper and faster).

All runtime parameters accept `Variable` for deferred context resolution.

---

# TinyFish Search

Source: https://docs.ag2.ai/latest/docs/beta/extensions/tools/search/tinyfish/

`TinyFishSearchToolkit` gives an agent two related tools powered by [TinyFish](https://www.tinyfish.ai/): web search and browser-rendered page fetch. Use it when an agent needs current search results and then needs to read the full content of selected URLs.

!!! note
    Requires the `tinyfish` package and an API key: `pip install "tinyfish>=0.2.3"`

```python
import os
from autogen.beta import Agent
from autogen.beta.config import AnthropicConfig
from autogen.beta.extensions.tools.search import TinyFishSearchToolkit

agent = Agent(
    "researcher",
    config=AnthropicConfig(model="claude-sonnet-4-6"),
    tools=[TinyFishSearchToolkit(api_key=os.environ["TINYFISH_API_KEY"])],
)
```

If `api_key` is omitted, the TinyFish SDK reads the `TINYFISH_API_KEY` environment variable automatically.

## Tools

| Tool | Description |
| :--- | :--- |
| `tinyfish_search` | Search the web and return ranked results with position, site name, title, snippet, and URL |
| `tinyfish_fetch` | Fetch and extract clean content from up to 10 URLs, with per-URL errors returned separately |

## Shared defaults

Constructor defaults are applied to the toolkit's default tools:

```python
toolkit = TinyFishSearchToolkit(
    api_key=...,
    location="US",       # default location for tinyfish_search
    language="en",       # default language for tinyfish_search
    format="markdown",   # default output format for tinyfish_fetch
    links=True,          # include page links in fetch results
    image_links=False,   # include image links in fetch results
)
```

## Picking a subset of tools

Each tool is exposed as a factory method on the toolkit (`toolkit.search()`, `toolkit.fetch()`). Call the method to get a ready-to-use tool, then pass only the ones you need to the agent:

```python
toolkit = TinyFishSearchToolkit(api_key=...)

agent = Agent(
    "researcher",
    config=config,
    tools=[toolkit.search(location="US", language="en")],
)
```

## Search configuration

`tinyfish_search` accepts a required `query` at execution time. Configure the optional location and language defaults on `toolkit.search()`:

```python
toolkit = TinyFishSearchToolkit(api_key=...)

search_tool = toolkit.search(
    location="US",
    language="en",
)

agent = Agent("researcher", config=config, tools=[search_tool])
```

## Fetch configuration

`tinyfish_fetch` accepts a list of URLs at execution time. Configure the output format and extracted link fields on `toolkit.fetch()`:

```python
toolkit = TinyFishSearchToolkit(api_key=...)

fetch_tool = toolkit.fetch(
    format="markdown",  # "markdown" | "html" | "json"
    links=True,
    image_links=True,
)

agent = Agent("researcher", config=config, tools=[fetch_tool])
```

All configurable defaults on `TinyFishSearchToolkit`, `toolkit.search()`, and `toolkit.fetch()` accept a `Variable` for deferred context resolution.
`tinyfish_fetch` accepts only `http` and `https` URLs.

## Result

`tinyfish_search` returns a `TinyFishSearchResponse`:

| Field | Description |
| :--- | :--- |
| `query` | The search query TinyFish executed |
| `results` | List of `TinyFishSearchResult` (`position`, `site_name`, `title`, `snippet`, `url`) |
| `total_results` | Number of results returned |

`tinyfish_fetch` returns a `TinyFishFetchResponse`:

| Field | Description |
| :--- | :--- |
| `results` | List of successfully fetched pages with metadata, extracted `text`, links, and image links |
| `errors` | List of per-URL fetch failures (`url`, `error`) |

!!! tip
    Use `tinyfish_search` first to discover candidate pages, then `tinyfish_fetch` to read the pages that look relevant. TinyFish Search and Fetch are separate from the goal-directed TinyFish Agent API.

---

# Conversation Variables

Source: https://docs.ag2.ai/latest/docs/beta/context/variables/

Variables provide a flexible way to store, share, and inject contextual information between agents and tools. Unlike hardcoded configurations, variables allow you to dynamically pass state - like API keys, session data, or user preferences - during runtime without exposing them to the underlying LLM.

## Passing Variables to a Conversation

You can pass variables at different levels depending on how long they need to persist and which parts of the system need access to them.

### Agent Variables

When you want a variable to be available across all interactions with a specific agent, you can initialize the agent with default variables.

```python
from autogen.beta import Agent

# This variable will be available in all tool calls executed by this agent
agent = Agent(
    name="WeatherBot",
    variables={"api_key": "your_global_api_key"}
)
```

### Conversation Variables

If a variable is only relevant for a specific conversation, you can pass it directly when calling the `ask` method.

```python
await agent.ask(
    "What is the weather?",
    variables={"session_id": "12345"},
)
```

### Mixed Variables

When both agent-level and call-level variables are provided, they are merged. If there is a key collision, the variables provided during the `ask` call will override the agent's default variables.

```python
agent = Agent(
    name="Bot",
    variables={"global_param": "A", "override_me": "AgentLevel"}
)

# Inside the tool, variables will be:
# {"global_param": "A", "override_me": "CallLevel", "call_param": "B"}
await agent.ask(
    "Hello!",
    variables={"override_me": "CallLevel", "call_param": "B"}
)
```
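
The merge rule behaves like a plain dictionary merge in which the later mapping wins. The snippet below is a framework-free sketch of that rule, assuming nothing beyond standard Python dictionaries:

```python
agent_variables = {"global_param": "A", "override_me": "AgentLevel"}
call_variables = {"override_me": "CallLevel", "call_param": "B"}

# Later mappings win on key collisions, so call-level values
# override agent-level ones; the agent defaults are left untouched.
merged = {**agent_variables, **call_variables}
```

Because the merge builds a new dictionary, the agent-level defaults stay unchanged in this model and apply again on the next call.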

## Context Variables Access

The most straightforward way to access variables inside a tool is by requesting the `Context` object. By adding an argument annotated with `Context`, the framework will automatically inject the current execution context, which contains the `.variables` dictionary.

```python
from autogen.beta import Context, tool

@tool
def process_data(context: Context) -> str:
    # Access variables directly from the context dictionary
    api_key = context.variables.get("api_key")
    session = context.variables.get("session_id")

    return f"Processed using key: {api_key}"
```

## Special Variable Access

Instead of passing the entire `Context` object, you can instruct the framework to inject specific variables directly into your tool's arguments using the `Variable` annotation. This makes your tool's signature cleaner and more explicit.

```python
from typing import Annotated
from autogen.beta import Variable, tool

@tool
def fetch_user_data(
    user_id: str,
    # The framework automatically looks for a variable named "api_key"
    # and injects its value here.
    api_key: Annotated[str, Variable()],
) -> str:
    return f"Fetching {user_id} using {api_key}"
```

If the name of the argument in your function doesn't match the key in the variables dictionary, you can specify the key explicitly:

```python
@tool
def fetch_user_data(
    user_id: str,
    # Looks for "api_key" in variables, but binds it to the "key" argument
    key: Annotated[str, Variable("api_key")],
) -> str:
    return f"Fetching {user_id} using {key}"
```

!!! note
    These annotations have no effect on the tool schema sent to the LLM.
    The framework uses them only to inject values into the function.

## Variables with Default Values

Sometimes a variable might not be provided by the user. You can define fallback behaviors directly within the `Variable` annotation using either `default` or `default_factory`.

### Static Defaults

Use `default` for simple, immutable fallback values.

```python
@tool
def get_settings(
    # If "theme" is not provided in the variables, it defaults to "dark"
    theme: Annotated[str, Variable(default="dark")],
) -> str:
    return f"Using theme: {theme}"
```

### Dynamic Defaults

For mutable objects (like lists or dictionaries) or values that need to be computed at runtime, use `default_factory`. The framework ensures that the factory function is only called once per execution context, preserving state across multiple tool calls in the same turn.

```python
def create_default_state() -> dict:
    return {"status": "init"}

@tool
def update_status(
    state: Annotated[
        dict[str, str],
        # Uses the dynamically created dictionary if "state" wasn't provided
        Variable(default_factory=create_default_state),
    ],
) -> str:
    state["status"] = "running"
    return "Status updated"
```
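
The "called once per execution context" behavior can be pictured as caching the factory result in the variables dictionary. The sketch below is a model of the described behavior, not the framework's code; `resolve_variable` is a hypothetical helper.

```python
calls = {"count": 0}


def create_default_state() -> dict:
    calls["count"] += 1
    return {"status": "init"}


def resolve_variable(variables: dict, key: str, default_factory):
    # Store the factory result in the context so later lookups reuse it.
    if key not in variables:
        variables[key] = default_factory()
    return variables[key]


context_variables: dict = {}

# First tool call: the factory runs and the result is stored.
first = resolve_variable(context_variables, "state", create_default_state)
first["status"] = "running"

# Second tool call in the same context: the stored object is reused.
second = resolve_variable(context_variables, "state", create_default_state)
```

The second lookup returns the very same dictionary, so the `"running"` status set by the first call is still visible.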

## Updating Variables in Tools

Variables are mutable. If a tool updates the `context.variables` dictionary, that change is preserved in the context and will be available to subsequent tool calls within the same conversation.

This is highly useful for sharing state or caching results between different tools without requiring the LLM to pass the data back and forth.

```python
from typing import Annotated

from autogen.beta import Context, Variable, tool

@tool
def authenticate(context: Context) -> str:
    # Generate and store a token in the context variables
    context.variables["auth_token"] = "abc-123"
    return "Successfully authenticated."

@tool
def fetch_secure_data(
    auth_token: Annotated[str | None, Variable(default=None)],
) -> str:
    if not auth_token:
        return "Error: Not authenticated."
    return f"Data fetched with token {auth_token}"
```

---

# Dependency Injection

Source: https://docs.ag2.ai/latest/docs/beta/context/inject/

## What is Dependency Injection?

Dependency Injection (DI) is a design pattern used to pass complex objects or services into your tools at runtime, rather than hardcoding them or re-instantiating them repeatedly. This keeps your tools pure, testable, and completely decoupled from external resource management.

## Dependencies

The key difference between [Variables](variables.md) and Dependencies is their intended use case. Variables are designed to pass lightweight, serializable state (like strings, flags, or configuration IDs) between agents and tools without exposing it to the LLM. Dependencies, on the other hand, are meant for complex objects that have specific behaviors and lifecycle management, such as database connections, HTTP sessions, or clients for external APIs.

### Agent Dependencies

If a dependency is required across all conversations for a specific agent, you can provide it directly when initializing the agent.

```python
from autogen.beta import Agent
import aiohttp

# The session object is available to all tool calls made by this agent
agent = Agent(
    name="WebScraper",
    dependencies={"http_session": aiohttp.ClientSession()}
)
```

### Conversation Dependencies

If a dependency is only relevant for a single conversation, you can inject it dynamically when calling the `ask` method.

```python
await agent.ask(
    "Query the user database",
    # The "db" dependency is injected only for this specific conversation
    dependencies={"db": create_db_connection()}
)
```

### Mixed Dependencies

When you provide both agent-level and conversation-level dependencies, the framework automatically merges them. If there is a key collision, the dependencies provided during the `ask` call take precedence and override the agent's default dependencies.

```python
agent = Agent(
    name="DataAgent",
    dependencies={"default_db": db_1, "active_db": db_1}
)

# During this call, "active_db" is overridden by db_2
await agent.ask(
    "Check the backup database",
    dependencies={"active_db": db_2}
)
```

## Context Dependency Access

The simplest way to access your dependencies inside a tool is through the `Context` object. By adding an argument annotated with `Context`, the framework injects the current execution context, which includes the `.dependencies` dictionary.

```python
from autogen.beta import Context, tool

@tool
def query_database(query: str, context: Context) -> str:
    # Access the complex dependency directly from the context
    db = context.dependencies.get("db")

    result = db.execute(query)
    return f"Result: {result}"
```

## Accessing Dependencies with Inject

Instead of passing the entire context object, you can explicitly request specific dependencies directly in your tool's function signature. This approach clarifies your tool's requirements and automatically handles validation.

Use the `Inject` annotation to pull a dependency from the context dictionary by its key. By default, `Inject` looks for a key that matches the argument's name.

```python
from typing import Annotated
from autogen.beta import Inject, tool

@tool
def fetch_data(
    url: str,
    # Automatically looks for "http_session" in the dependencies dictionary
    http_session: Annotated[object, Inject()]
) -> str:
    pass
```

If your argument name differs from the dependency key, you can provide the exact key explicitly:

```python
@tool
def fetch_data(
    url: str,
    # Looks for "http_session", but assigns it to the "session" argument
    session: Annotated[object, Inject("http_session")]
) -> str:
    pass
```

!!! note "LLM Tool Schema"
    Dependency injection annotations (like `Inject` and `Depends`) do not affect the tool schema provided to the LLM. They are purely an internal framework mechanism for injecting dependencies into your functions.

## Dependencies with Default Values

Sometimes a dependency might not be provided by the user. You can define fallback behaviors directly within the `Inject` annotation using either `default` or `default_factory`. Without a default, a missing dependency will raise a `ValidationError`.

### Static Defaults

Use `default` for simple, immutable fallback values.

```python
@tool
def process_data(
    # If "client" is not provided in dependencies, it defaults to None
    client: Annotated[object | None, Inject(default=None)]
) -> str:
    if client is None:
        return "No client provided."
    return "Processing..."
```

### Dynamic Defaults

For mutable objects or dependencies that need to be instantiated at runtime, use `default_factory`. The framework ensures that the factory function is called when the dependency is missing.

```python
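class DefaultDatabaseClient:
    """Hypothetical fallback client, defined here only to keep the example self-contained."""

    def save(self) -> None:
        pass

def create_default_client() -> object:
    return DefaultDatabaseClient()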

@tool
def update_record(
    # Uses the dynamically created client if "db" wasn't provided
    db: Annotated[object, Inject(default_factory=create_default_client)]
) -> str:
    db.save()
    return "Record updated"
```
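
The difference between the two fallbacks can be sketched without the framework. The `resolve` helper below is hypothetical, and the precedence it uses (provided value, then factory, then static default) is an assumption. It also raises a plain `KeyError` where the framework raises a `ValidationError`.

```python
from typing import Any, Callable, Optional

_MISSING = object()  # sentinel: distinguishes "no default" from default=None


def resolve(
    dependencies: dict,
    key: str,
    default: Any = _MISSING,
    default_factory: Optional[Callable[[], Any]] = None,
) -> Any:
    """Hypothetical resolution order for a single dependency."""
    if key in dependencies:
        return dependencies[key]
    if default_factory is not None:
        return default_factory()  # called only when the key is missing
    if default is not _MISSING:
        return default
    raise KeyError(f"missing dependency: {key!r}")


provided = resolve({"db": "real-db"}, "db", default_factory=list)
first = resolve({}, "cache", default_factory=list)
second = resolve({}, "cache", default_factory=list)
fallback = resolve({}, "client", default=None)
```

Each call with a missing key gets a fresh object from the factory, which is why `default_factory` suits mutable fallbacks while `default` suits immutable ones.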

---

# Human in the Loop

Source: https://docs.ag2.ai/latest/docs/beta/context/human_in_the_loop/

Agents often need guidance or approval from human users to proceed safely and effectively. The Human-in-the-Loop (**HITL**) feature allows an agent to temporarily pause its execution and wait for human input before continuing.

## What is Human in the Loop? Why do we need it?

Human-in-the-Loop (**HITL**) is a pattern where a human user is integrated into the decision-making process of an autonomous system. In the context of the agent framework, it provides a structured way to ask for human confirmation or additional information during the execution of a tool.

We need HITL for several reasons:

- **Safety**: To prevent the agent from performing irreversible or harmful actions (e.g., executing arbitrary code, dropping a database table, or making financial transactions).
- **Quality Assurance**: To allow humans to review and approve generated artifacts (e.g., code, emails, or reports) before they are finalized.
- **Handling Ambiguity**: To provide the agent with additional context or clarification when it encounters a situation it cannot resolve autonomously.

### Typical Use Cases

- **Execution Approval**: Asking the user "Are you sure you want to execute this shell command?" before proceeding.
- **Requesting Missing Information**: Prompting the user for a password, an API key, or a specific piece of data required by a tool.
- **Content Review**: Displaying an AI-generated draft to the user and asking for approval or edit suggestions.

## HITL Usage from Context

You can request human input directly from within a tool using the `Context` object. The `context.input()` method pauses the tool's execution until the user provides a response.

Here is an example of a tool that asks for human confirmation:

```python
from autogen.beta import Context, Agent, tool

@tool
async def execute_query(context: Context) -> str:
    # Pause and ask for human input
    user_response = await context.input(
        "Are you sure you want to run this query? (yes/no)",
        timeout=60.0
    )

    if user_response.strip().lower() != "yes":
        return "Query cancelled."

    return "Query executed successfully."

agent = Agent(
    name="my_agent",
    tools=[execute_query]
)
```

!!! warning
    If the required human input is not provided (for example, if a HITL hook is not registered on the agent), the framework raises a `HumanInputNotProvidedError`.

## HITL Registration

To handle the input requests made by `context.input()`, you must register a **HITL hook** on your `Agent`.

A **HITL** hook is a callback function that consumes a `HumanInputRequest` event and returns a `HumanMessage`. Like tools, HITL hooks support all context features, including dependency injection and variables.

### By Argument

You can register a HITL hook by passing it as the `hitl_hook` argument when initializing the `Agent`.

```python
from autogen.beta import Agent
from autogen.beta.events import HumanInputRequest, HumanMessage

def my_hitl_hook(event: HumanInputRequest) -> HumanMessage:
    # event.content contains the prompt passed to context.input()
    print(f"Agent asks: {event.content}")

    # Collect input from the user (e.g., via standard input)
    user_input = input("Your answer: ")

    return HumanMessage(content=user_input)

agent = Agent(
    name="my_agent",
    hitl_hook=my_hitl_hook,
)
```

!!! note "Sync / Async"
    Your HITL hook can be defined as either a synchronous (`def`) or asynchronous (`async def`) function. The framework handles both seamlessly.
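
The request/response flow described in this section (a tool pauses, a hook answers, and the hook may be sync or async) can be sketched with plain `asyncio`. Everything here is hypothetical scaffolding: `ask_human`, `NoHookRegistered`, and the string-based hook signature are illustrative stand-ins, not the framework's `context.input()`, `HumanInputNotProvidedError`, or event types.

```python
import asyncio
import inspect
from typing import Awaitable, Callable, Optional, Union

Hook = Callable[[str], Union[str, Awaitable[str]]]


class NoHookRegistered(RuntimeError):
    """Hypothetical error for this sketch only."""


async def ask_human(prompt: str, hook: Optional[Hook], timeout: float) -> str:
    if hook is None:
        raise NoHookRegistered(prompt)
    answer = hook(prompt)
    if inspect.isawaitable(answer):
        # Async hooks are awaited, bounded by the timeout
        answer = await asyncio.wait_for(answer, timeout)
    return answer


async def execute_query(hook: Optional[Hook]) -> str:
    reply = await ask_human("Run this query? (yes/no)", hook, timeout=1.0)
    if reply.strip().lower() != "yes":
        return "Query cancelled."
    return "Query executed successfully."


async def approve(prompt: str) -> str:
    await asyncio.sleep(0)  # stands in for waiting on a UI
    return " Yes "


def decline(prompt: str) -> str:
    return "no"


approved = asyncio.run(execute_query(approve))
declined = asyncio.run(execute_query(decline))
```

Note that this sketch applies the timeout only to async hooks; a blocking sync hook such as one built on `input()` would need to run in a worker thread to be interruptible.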

### By Decorator

Alternatively, you can register or override a HITL hook using the `@agent.hitl_hook` decorator after the agent has been created.

```python
from autogen.beta import Agent
from autogen.beta.events import HumanInputRequest, HumanMessage

# execute_query is the tool defined in the earlier example
agent = Agent(name="my_agent", tools=[execute_query])

@agent.hitl_hook
async def async_hitl_hook(event: HumanInputRequest) -> HumanMessage:
    # An asynchronous hook example
    print(f"Prompt: {event.content}")

    # A hypothetical async UI function
    user_input = await get_input_from_ui()

    return HumanMessage(content=user_input)
```

!!! warning "Overriding Hooks"
    If a HITL hook is already set (for instance, via the constructor argument) and you apply the `@agent.hitl_hook` decorator, the decorated function replaces the existing hook.

---

# Evaluation

Source: https://docs.ag2.ai/latest/docs/beta/evaluation/evaluation/

Evaluation is how you measure whether your agent actually works. The framework runs your agent over a dataset of tasks, scores each run on multiple properties, and gives you aggregate metrics you can track over time.

One idea ties the whole framework together: a scorer grades the **recorded trace** of a run - the typed log of what the agent did. That's why the same scorers work whether you produce the trace by running the agent here, or grade a trace captured from production - and why runs persist as data you can compare over time.

## Testing vs evaluation

These sound similar but aren't.

- **Testing asserts correctness.** A unit test passes or fails. Either the function returned the right value or it didn't.
- **Evaluation measures performance.** Most agent outputs don't have a single right answer, so you grade them on many properties - did the agent call the right tool, did the final answer mention the right thing, did it stay under a token budget - and watch how the *aggregate* moves between releases.

The framework supports both. You can wrap eval metrics in pytest assertions ("pass rate must be ≥ 0.95"), but the underlying mechanism - scorers producing structured feedback that rolls up into aggregates - is built for measurement, not assertion.
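
As a rough illustration of the "pass rate must be ≥ 0.95" idea, the sketch below computes a pass rate from per-task verdicts and asserts on it. It is deliberately framework-free: `pass_rate` here is a local helper and the verdict list is made-up sample data, not output from a real run.

```python
def pass_rate(verdicts: list) -> float:
    """Fraction of graded runs that passed; 0.0 for an empty list."""
    return sum(verdicts) / len(verdicts) if verdicts else 0.0


def test_pass_rate_gate() -> None:
    # Hypothetical per-task verdicts from one boolean scorer across 20 tasks
    verdicts = [True] * 19 + [False]
    assert pass_rate(verdicts) >= 0.95


test_pass_rate_gate()
```

The measurement (a rate over many runs) stays separate from the assertion (a threshold on that rate), so the same numbers can be tracked over time even when the gate passes.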

## What you'll write

Every eval suite has four pieces. That's it.

1. A **dataset** - a `Suite` of tasks with inputs and (optional) expected outputs.
2. An **agent** - an `Agent` instance.
3. One or more **scorers** - functions that grade a run.
4. A call to **run_agent** - the framework's entry point.

The rest of these docs cover each piece in depth. Read the quick-start below first.

!!! note "Two ways in"
    `run_agent` runs your agent and grades it. If the traces already exist - captured from production, or produced by another tool - grade them directly with `evaluate_traces`, no agent run required. It reads both OpenTelemetry GenAI-semconv and OpenInference spans. Same scorers either way; see [Runs](runs.md).

## Quick start

The simplest plausible eval. Two tasks, one custom scorer, one prebuilt.

```python
import asyncio
from pathlib import Path

from autogen.beta import Agent, tool
from autogen.beta.config import GeminiConfig
from autogen.beta.events import ToolCallEvent
from autogen.beta.eval import Suite, run_agent, scorer
from autogen.beta.eval.scorers import tool_called

# 1. Dataset - inline tasks (or Suite.from_jsonl("tasks.jsonl"))
dataset = Suite.from_list([
    {"task_id": "t1", "inputs": {"input": "What's the weather in Tokyo?"},
     "reference_outputs": {"city": "Tokyo"}},
    {"task_id": "t2", "inputs": {"input": "Weather in Paris?"},
     "reference_outputs": {"city": "Paris"}},
])

# 2. Agent - one instance, reused across tasks
@tool
async def get_weather(city: str) -> str:
    return f"Sunny, 72F in {city}"

agent = Agent(
    "weather",
    config=GeminiConfig(model="gemini-3-flash-preview"),
    tools=[get_weather],
)

# 3. Scorer - a plain function with @scorer
@scorer
def called_get_weather(trace) -> bool:
    return len(trace.events_of(ToolCallEvent, name="get_weather")) == 1

# 4. Run
async def main():
    result = await run_agent(
        dataset,
        agent=agent,
        scorers=[
            called_get_weather,
            tool_called("get_weather"),  # prebuilt
        ],
        store_dir=Path("./runs"),
    )
    print(result.summary())

asyncio.run(main())
```

Output is a printed summary table plus a JSON file under `./runs/`.

```
Run abc123def
  Suite:       inline (2 tasks, source: inline)
  Runs:        2
  Duration:    3120ms
  Tokens:      input=423 output=78 total=501

Pass rates:
  called_get_weather        100.0% (2/2)
  tool_called[get_weather]  100.0% (2/2)
```

That's the whole framework in under 50 lines. Everything else in these docs is variations on this shape.

!!! tip
    New to evals and this moved fast? The [Get started](getting-started.md) tutorial builds the same thing up one concept at a time.

## Determinism for CI

You don't want your eval suite to depend on a live LLM in pre-merge CI - too slow, too expensive, too flaky. The framework uses [TestConfig](../testing.md) cassettes to mock the model deterministically:

```python
from autogen.beta.testing import TestConfig

cassettes = {
    "t1": TestConfig(ToolCallEvent(name="get_weather", arguments='{"city":"Tokyo"}'), "Tokyo is sunny."),
    "t2": TestConfig(ToolCallEvent(name="get_weather", arguments='{"city":"Paris"}'), "Paris is sunny."),
}

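# dataset, agent, called_get_weather, and the imports come from the quick start above
async def main():
    result = await run_agent(
        dataset,
        agent=agent,
        scorers=[called_get_weather],
        model_config=cassettes,  # dict keyed by task_id
        store_dir=Path("./runs"),
    )
    print(result.summary())

asyncio.run(main())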
```

Same suite, same scorers, no API key required. Output is identical on every machine.

## What you get back

Every `run_agent()` returns a `RunResult` and writes a JSON file to `store_dir`. The result exposes:

- `result.summary()` - a printable text table (above)
- `result.pass_rate("scorer_name")` - float between 0 and 1 for boolean scorers
- `result.score_stats("scorer_name")` - `mean / p50 / p95 / n` for numeric scorers
- `result.value_counts("scorer_name")` - `{label: count}` for categorical scorers
- `result.pass_rate("scorer_name", tag="hard")` - any accessor takes `tag=` to slice to one segment (`result.tags` lists them)
- `result.aggregates` - everything together
- `result.tasks` - per-task records with full `Trace`, feedback list, and budget status
- `result.diff(load_run("runs/old.json"))` - compare against a prior run; `.regressions` gates CI (`assert not diff.regressions`)

The JSON file is the persistence format. It's schema-versioned (`"0.1"`) and is what a future hosted dashboard will render. See [Runs](runs.md) for the full shape.

## Where to next

- **[Get started](getting-started.md)** - a paced, step-by-step first eval if this quick-start moved too fast.
- **[Scorers](scorers.md)** - how to write them, the six prebuilts (including `agent_judge` and `failure_attribution`), return-shape rules, exception handling.
- **[Runs](runs.md)** - `Suite`, the `run_agent()` signature, `RunResult` deep dive, `evaluate_traces` for existing traces, `repeats=`, and `stream=` to observe a run live.
- **[Variants](variants.md)** & **[Pairwise](pairwise.md)** - compare builds on a leaderboard, or head-to-head.
- **[Persistence & tracking](persistence.md)** - the run JSON, comparing runs for regressions, and tracking an agent across iterations.

## What's in scope

v0 ships **offline evaluation** (run a curated dataset with `run_agent`) **and grading of existing traces** (`evaluate_traces` over a directory or Grafana Tempo) - so you can already grade captured production traces with the same scorers. Still on the roadmap: *continuous* online evaluation and a hosted dashboard.

---

# Get started

Source: https://docs.ag2.ai/latest/docs/beta/evaluation/getting-started/

A hands-on walk-through: by the end you'll have run a real evaluation and understood every line. We build it up one concept at a time - no prior eval experience assumed.

## What an evaluation is

An evaluation is a repeatable test for an agent. You give it a suite of **tasks**, let your agent answer each one, then run **scorers** that grade the answers. The result is a scorecard you can track as your agent changes.

It's like unit testing, with one difference: most agent answers don't have a single right value, so instead of one pass/fail you grade several properties - did it call the right tool? is the answer correct? did it stay under budget? - and watch how the aggregate moves between versions.

Four pieces, that's all: a **dataset**, your **agent**, one or more **scorers**, and a call to **`run_agent`**.

## Step 1 - your first run

Start as small as possible: two questions, an agent, one check. This calls a real model, so set a key first (`export OPENAI_API_KEY=...`); Step 5 below shows how to run with none.

!!! note "Install the `tracing` extra"
    `run_agent` produces each task's trace via OpenTelemetry, so it requires the tracing extra: `pip install "ag2[tracing]"`. (Grading already-captured traces with `evaluate_traces` does not need the extra.)

```python
import asyncio

from autogen.beta import Agent
from autogen.beta.config import OpenAIConfig
from autogen.beta.eval import Suite, run_agent
from autogen.beta.eval.scorers import final_answer_matches

# 1. the dataset - a couple of tasks, each with the expected answer
suite = Suite.from_list([
    {"task_id": "france", "inputs": {"input": "Capital of France?"}, "reference_outputs": {"answer": "Paris"}},
    {"task_id": "japan", "inputs": {"input": "Capital of Japan?"}, "reference_outputs": {"answer": "Tokyo"}},
])

# 2. the agent under test
agent = Agent("geographer", prompt="Answer with the capital city.", config=OpenAIConfig(model="gpt-4o-mini"))

async def main():
    # 3 + 4. score each answer against its expected value, and run
    result = await run_agent(
        suite,
        agent=agent,
        scorers=[final_answer_matches(field="answer", matcher="contains")],
        store_dir="./runs",
    )
    print(result.summary())

asyncio.run(main())
```

Run it and you get a scorecard:

```
Run a1b2c3d4
  Suite:       inline (2 tasks, source: inline)
  Runs:        2
Pass rates:
  final_answer_matches  100.0% (2/2)
```

That's a complete evaluation. The next steps unpack each piece.

## Step 2 - read the result

`run_agent` returns a `RunResult` and saves a JSON file under `store_dir`. A few accessors:

```python
result.summary()                            # the printable table above
result.pass_rate("final_answer_matches")    # 1.0 - the fraction of tasks that passed
```

Every scorer becomes a column, looked up by its **key** (here `"final_answer_matches"`). Boolean scorers give a pass-rate; numeric ones give `score_stats` (mean / p50 / p95); categorical ones give `value_counts`.

## Step 3 - the expected answer (`reference_outputs`)

Notice each task carried a `reference_outputs` - the **gold answer**, the thing you grade against:

```python
{"task_id": "france", "inputs": {"input": "Capital of France?"}, "reference_outputs": {"answer": "Paris"}}
```

It's a small labelled record, not a bare string, so a scorer can pick out the field it cares about: `final_answer_matches(field="answer")` reads `reference_outputs["answer"]` and compares it to the agent's answer. `inputs` is what goes *in* (the prompt lives under `"input"`); `reference_outputs` is what *should* come out. Tasks graded purely from the trace ("did it call the tool?") don't need one.

!!! note "Matchers"
    `matcher="contains"` passes if the gold value appears anywhere in the answer - right for free-text replies like "The capital is Paris." Use `"casefold"` or `"exact"` when the answer should be exactly the value.
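
To see how the three matcher names differ, here is a small stand-alone re-implementation. It is an illustrative guess at the semantics, not the framework's code: in particular, whether `"contains"` is case-sensitive and whether whitespace is trimmed are assumptions.

```python
def matches(answer: str, gold: str, matcher: str = "exact") -> bool:
    """Hypothetical re-implementation of the three matcher names."""
    if matcher == "exact":
        return answer == gold
    if matcher == "casefold":
        return answer.casefold() == gold.casefold()
    if matcher == "contains":
        return gold in answer
    raise ValueError(f"unknown matcher: {matcher!r}")


reply = "The capital is Paris."
contains_ok = matches(reply, "Paris", "contains")
exact_ok = matches(reply, "Paris", "exact")
casefold_ok = matches("paris", "Paris", "casefold")
```

A full-sentence reply passes `"contains"` but fails `"exact"`, which is why the stricter matchers fit agents prompted to answer with the bare value.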

## Step 4 - ask more questions (scorers)

One check is rarely enough. A scorer is just a function that asks **one** question about a run. Mix the prebuilt ones with your own - a function decorated with `@scorer` that declares what it needs by name (`trace`, `outputs`, `reference_outputs`, ...; here just `outputs`, the final answer):

```python
from autogen.beta.eval import scorer
from autogen.beta.eval.scorers import no_tool_errors, token_budget

@scorer
def answered_briefly(outputs) -> bool:
    return len(outputs["body"]) < 100      # outputs["body"] is the final answer text

scorers = [
    final_answer_matches(field="answer", matcher="contains"),
    no_tool_errors(),
    token_budget(2_000),
    answered_briefly,
]
```

Keep each scorer to a single, specific question - three small checks tell you *what* broke when one fails; one big check just says "something's wrong." See [Scorers](scorers.md) for the prebuilts (including the `agent_judge` LLM scorer) and the return-type rules.

## Step 5 - run it with no API key (for CI)

You don't want pre-merge CI calling a live model - slow, costly, flaky. Swap the model for a `TestConfig` cassette: a canned reply per task, so the run is deterministic and free.

```python
from autogen.beta.testing import TestConfig

# Same agent as Step 1 - a canned reply per task_id
canned = {"france": TestConfig("Paris"), "japan": TestConfig("Tokyo")}

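async def main():
    # Same suite, agent, and scorers as the earlier steps; only model_config is new
    result = await run_agent(suite, agent=agent, scorers=scorers, model_config=canned, store_dir="./runs")
    print(result.summary())

asyncio.run(main())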
```

One thing changed: `model_config` supplies the canned replies keyed by `task_id`. It's passed through to `ask`, which **overrides** the agent's own config for that task - so the same `agent` from Step 1 runs deterministically with no API key, no separate build needed. Same suite, same scorers, identical output on every machine - wrap `result.pass_rate(...)` in a `pytest` assertion and you have a CI gate.

## Where to next

You've seen the whole loop. From here:

- **[Scorers](scorers.md)** - the six prebuilts, writing custom scorers, and `agent_judge` for grading subjective quality.
- **[Runs](runs.md)** - datasets and the `RunResult` in depth, grading existing traces (`evaluate_traces`), and observing a run live.
- **[Variants](variants.md)** & **[Pairwise](pairwise.md)** - compare builds on a leaderboard, or head-to-head.
- **[Persistence & tracking](persistence.md)** - save runs and compare them over time to catch regressions and confirm improvements.

---

# Scorers

Source: https://docs.ag2.ai/latest/docs/beta/evaluation/scorers/

A scorer is a function that grades one agent run and produces a structured feedback record. Run it across N tasks and you get N feedback records - which the framework aggregates into pass rates, score distributions, and label counts.

## Writing a scorer

Decorate a function with `@scorer`. The framework calls it once per task; the return value becomes feedback.

```python
from autogen.beta.events import ToolCallEvent
from autogen.beta.eval import scorer

@scorer
def called_get_weather(trace) -> bool:
    return len(trace.events_of(ToolCallEvent, name="get_weather")) == 1
```

That's it. Three things to notice:

1. **It takes only what it needs.** The decorator inspects the signature and injects only the parameters you declared.
2. **It returns a `bool`.** The framework turns that into a pass rate.
3. **It's pure.** No I/O, no global state. Scorers run concurrently across tasks; impure scorers race.
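
The first point, injecting only the declared parameters, can be sketched with `inspect.signature`. The `call_with_declared` helper is hypothetical and only illustrates the idea; the real decorator may do more (validation, key derivation, and so on).

```python
import inspect
from typing import Any, Callable


def call_with_declared(func: Callable[..., Any], available: dict) -> Any:
    """Pass only the parameters the function names in its signature."""
    wanted = inspect.signature(func).parameters
    return func(**{name: available[name] for name in wanted})


def answer_is_short(outputs) -> bool:
    return len(outputs["body"]) < 20


# Sample values for one task; a real run would also carry `trace` and `task`
available = {
    "inputs": {"input": "Capital of France?"},
    "outputs": {"body": "Paris", "content": "Paris"},
    "reference_outputs": {"answer": "Paris"},
}
verdict = call_with_declared(answer_is_short, available)
```

Because the function declares only `outputs`, the other values are never passed, so the scorer's signature doubles as documentation of what it depends on.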

## What you can ask for

A scorer can declare any subset of these five parameters, by name:

| Parameter | Type | What it is |
|---|---|---|
| `inputs` | `dict[str, Any]` | The task's input payload - typically `{"input": "the user's prompt"}`. |
| `outputs` | `dict[str, Any]` | The agent's final answer projected from the trace, mirroring the reply API: `{"body": <final text>, "content": <typed answer>}`. `body` is the text (like `reply.body`); `content` is the parsed value when the answer is JSON - e.g. a `response_schema` agent - otherwise the text (like `await reply.content()`). Read structured fields via `outputs["content"]["answer"]`. |
| `reference_outputs` | `dict[str, Any] \| None` | The task's expected output, if the dataset provided one. |
| `trace` | `Trace` | The typed events (model responses, tool calls / results, ...) plus tokens, duration, and exception. |
| `task` | `Task` | The task record (id, tags, metadata). |

Declare only what you use:

```python
@scorer
def called_get_weather(trace) -> bool: ...                 # reference-free

@scorer
def city_argument_correct(trace, reference_outputs) -> bool: ...  # reference-based

@scorer
def answer_mentions_city(outputs, reference_outputs) -> bool: ... # output-only
```

## Three return shapes -> three aggregation behaviors

```python
@scorer
def called_get_weather(trace) -> bool: ...        # -> pass_rate

@scorer
def extra_tool_calls(trace) -> int: ...           # -> score_stats (mean / p50 / p95)

@scorer
def termination_reason(trace) -> str: ...         # -> value_counts
```

The framework routes by return type:

- **`bool`** lands in `result.pass_rate("scorer_name")` as `passes / total`.
- **`int` / `float`** lands in `result.score_stats("scorer_name")` as `ScoreStats(mean, p50, p95, n)`.
- **`str`** lands in `result.value_counts("scorer_name")` as `{label: count}`. Useful for slicing - "of 100 runs, 95 completed, 5 errored".
- **`None`** is treated as "skip - no feedback recorded for this task".
- A `Feedback` instance or `list[Feedback]` lets you set the key explicitly, attach a comment, or emit multiple records from one call.

!!! note
    `bool` is a subclass of `int` in Python, so the framework checks `isinstance(value, bool)` first - `True` always becomes a pass-rate feedback, never a numeric one. On the flip side, returning `1` / `0` (an `int`) is a *numeric* score and lands in `score_stats`, **not** `pass_rate` - return `True` / `False` for pass/fail.
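
The ordering issue in the note is easy to reproduce. The sketch below is a hypothetical classifier, not the framework's code, and it leaves out the `Feedback` return shapes; it only shows why the `bool` check has to come before the `int` check.

```python
from typing import Optional


def aggregation_bucket(value: object) -> Optional[str]:
    """Name the aggregate a raw scorer return value would feed (sketch)."""
    if value is None:
        return None  # skipped: no feedback recorded
    if isinstance(value, bool):  # must come before the int check
        return "pass_rate"
    if isinstance(value, (int, float)):
        return "score_stats"
    if isinstance(value, str):
        return "value_counts"
    raise TypeError(f"unsupported return type: {type(value).__name__}")


buckets = [aggregation_bucket(v) for v in (True, 1, 0.5, "completed", None)]
```

Swapping the two `isinstance` checks would send `True` to `score_stats`, because `isinstance(True, int)` is true in Python.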

## Reference-based vs reference-free

A load-bearing distinction:

- **Reference-based** scorers compare what happened to what *should have* happened. They need `reference_outputs` from a labelled dataset - so they can't grade arbitrary production traffic that has no gold answer.
- **Reference-free** scorers judge from the trace alone - so the same code grades `run_agent` traces, stored traces, and live production traces (via `evaluate_traces`).

Write scorers reference-free whenever you can - the same code then runs everywhere.

!!! note "Guard reference-based scorers against missing references"
    A task without `reference_outputs` injects `reference_outputs=None`. Guard for it - `if reference_outputs is None: return None` to skip the task, or `return False` to count it as a fail. An unguarded `reference_outputs["city"]` raises on `None`, which the framework records as `score=None` (the run continues, but the task neither passes nor counts).
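
A guarded reference-based check might look like the following. It is written as a plain function so it runs without the framework; in a real suite you would decorate it with `@scorer`.

```python
from typing import Optional


def answer_mentions_city(outputs: dict, reference_outputs: Optional[dict]) -> Optional[bool]:
    if reference_outputs is None:
        return None  # skip: nothing to grade against
    return reference_outputs["city"] in (outputs.get("body") or "")


graded = answer_mentions_city({"body": "Tokyo is sunny."}, {"city": "Tokyo"})
skipped = answer_mentions_city({"body": "Tokyo is sunny."}, None)
no_answer = answer_mentions_city({}, {"city": "Tokyo"})
```

Returning `None` keeps unlabelled tasks out of the pass rate; returning `False` instead would count them as failures.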

## One question per scorer

Don't bundle. A god-scorer like

```python
@scorer
def everything_is_fine(trace, outputs, reference_outputs) -> bool:
    return (
        len(trace.events_of(ToolCallEvent, name="get_weather")) == 1
        and reference_outputs["city"] in (outputs.get("body") or "")
        and trace.tokens.total < 2000
    )
```

...tells you nothing when it fails. Three scorers, one each, give you three signals you can trace.

```python
@scorer
def called_get_weather_once(trace) -> bool:
    return len(trace.events_of(ToolCallEvent, name="get_weather")) == 1

@scorer
def answer_mentions_city(outputs, reference_outputs) -> bool:
    return reference_outputs["city"] in (outputs.get("body") or "")

@scorer
def under_token_budget(trace) -> bool:
    return trace.tokens.total < 2000
```

## Prebuilt scorers

Six ship under `autogen.beta.eval.scorers`. Four are simple, deterministic checks (below); two richer ones - `agent_judge` and `failure_attribution` - get their own sections after.

| Scorer | Question | Type | When to use |
|---|---|---|---|
| `tool_called(name, *, exactly=None)` | Did the agent call this tool? | bool | Most tool-use scenarios. `exactly=N` for strict count. |
| `no_tool_errors()` | Were there zero `ToolErrorEvent`s? | bool | Catch tools that exploded. |
| `final_answer_matches(field, matcher)` | Does the answer match `reference_outputs[field]`? | bool | Closed-form correctness. Matcher: `"exact"`, `"casefold"`, `"contains"`. |
| `token_budget(max_tokens)` | Did the run stay under `max_tokens` total? | bool | Cost discipline as a pass/fail signal. |

Each is a *factory* - calling it returns a `Scorer`. Drop them straight into the `scorers=` list:

```python
from autogen.beta.eval.scorers import (
    final_answer_matches,
    no_tool_errors,
    token_budget,
    tool_called,
)

scorers = [
    tool_called("get_weather"),
    no_tool_errors(),
    final_answer_matches(field="city", matcher="contains"),
    token_budget(2_000),
]
```

!!! tip
    Two distinct `tool_called(...)` calls produce distinct keys (`tool_called[get_weather]` vs `tool_called[get_news]`), so multiple instances coexist in one run.

## Agent-as-a-judge - `agent_judge`

Some properties can't be checked with `==`: *is the answer helpful? well-reasoned? on-brand?* `agent_judge` hands the answer plus a criterion you write to a judge model, which returns a numeric score. This is a more capable take on an LLM-as-a-judge evaluation.

```python
from autogen.beta.config import OpenAIConfig
from autogen.beta.eval.scorers import agent_judge

scorers = [
    agent_judge(OpenAIConfig(model="gpt-4o-mini"), criterion="The answer resolves the user's request.", key="helpfulness"),
    agent_judge(OpenAIConfig(model="gpt-4o-mini"), criterion="The answer is concise.", key="conciseness"),
]
```

One judge = one criterion = one column. The score lands in `result.score_stats(key)`, so a *list* of judges is a multi-dimension **scorecard** - each criterion scored and aggregated independently. The numeric range defaults to `(0.0, 1.0)` and is **enforced** (out-of-range scores are clamped); pass `scale=(1, 5)` for a Likert range. The judge is an ordinary `Agent`, so it can be made deterministic using a `TestConfig` in CI.

By default the judge sees the task's gold answer (rendered as a `## Reference` section whenever `reference_outputs` is present) - correct for a *correctness* judge, where matching the reference is the point. But for dimensions that must grade the answer **on its own** - *faithfulness*, *grounding* - leaking the gold answer lets the judge reward an answer for matching it rather than for being grounded in the agent's tool results. Pass `include_reference=False` to withhold the reference from those judges:

```python
scorers = [
    agent_judge(config, criterion="The answer matches the reference.", key="correctness"),
    agent_judge(config, criterion="Every claim is grounded in the tool results.",
                key="faithfulness", include_reference=False),
]
```

## Pass / fail thresholds - `threshold`

A judge returns a *number*, but for automation you often want a hard verdict - *reject anything below 0.7*. Pass `threshold=` to gate the judge: its column then lands in `result.pass_rate(key)` (pass iff `score >= threshold`) instead of `score_stats`, and the raw number is recorded in the feedback's `detail`.

```python
agent_judge(OpenAIConfig(model="gpt-4o-mini"), criterion="The answer is helpful.", key="helpfulness", threshold=0.7)
```

`threshold=0.7` is shorthand for wrapping the judge in the generic `threshold(...)` combinator, which gates **any** numeric scorer (a judge, or your own `@scorer`) into a Pass/Fail - `scorer -> scorer`:

```python
from autogen.beta.eval.scorers import agent_judge, threshold

quality = agent_judge(config, criterion="...", key="quality")   # numeric scorer
gate = threshold(quality, at_least=0.7)                          # -> Pass/Fail scorer
```

A gated criterion emits one `Feedback`: `score` is the boolean (so it feeds `pass_rate` and shows up in `result.diff(baseline).regressions` when it flips), and the number + bounds are kept in `detail`. A scorer that produces no numeric grade - a judge with no verdict, or one that raised - counts as a **fail**. Bounds are inclusive; use `at_most` (or both) for "lower is better" metrics. In CI: `assert result.pass_rate("helpfulness") == 1.0`.
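
The gating rules above (inclusive bounds, and no numeric grade counts as a fail) can be sketched as a small wrapper. `gate` is a hypothetical local function that mimics the described behavior of the `threshold(...)` combinator; unlike the real one, it returns a bare boolean and does not keep the number in a `detail` field.

```python
from typing import Callable, Optional

NumericScorer = Callable[[dict], Optional[float]]


def gate(
    scorer: NumericScorer,
    at_least: Optional[float] = None,
    at_most: Optional[float] = None,
) -> Callable[[dict], bool]:
    """Wrap a numeric scorer into a pass/fail one with inclusive bounds (sketch)."""

    def gated(run: dict) -> bool:
        try:
            score = scorer(run)
        except Exception:
            return False  # a scorer that raised counts as a fail
        if score is None:
            return False  # no numeric grade counts as a fail
        if at_least is not None and score < at_least:
            return False
        if at_most is not None and score > at_most:
            return False
        return True

    return gated


def quality(run: dict) -> Optional[float]:
    # Stand-in for a judge: reads a precomputed number from sample data
    return run.get("quality")


passes = gate(quality, at_least=0.7)
cheap_enough = gate(quality, at_most=0.5)
```

A score exactly on the bound passes in both directions, matching the "bounds are inclusive" rule.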

!!! note
    Gating is opt-in: without `threshold=`, `agent_judge` keeps its numeric `score_stats` behavior. For a per-task token/time *resource* gate (a different axis), see [`BudgetThresholds`](runs.md).

## Failure attribution - `failure_attribution`

When a task fails, *why?* `failure_attribution` labels each run with a failure mode (a `str`, so it rolls up in `result.value_counts(key)`):

```python
from autogen.beta.eval.scorers import failure_attribution

scorers = [failure_attribution(key="failure_mode")]
# result.value_counts("failure_mode")  ->  {"none": 41, "tool_failure": 6, "crash": 3}
```

Out of the box it runs **deterministic detectors** (a crash/exception, no final answer, a tool error) - no model needed. Pass a `config` to add an **LLM attributor** that classifies the subtler, semantic failures the detectors can't see (wrong answer, gave up, looped):

```python
failure_attribution(OpenAIConfig(model="gpt-4o-mini"), key="failure_mode")
```
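
The deterministic detectors can be pictured as an ordered series of checks. This is a simplified sketch: `RunRecord` is a hypothetical stand-in for the trace, the check order is an assumption, and the `"no_answer"` label is invented here (only `"none"`, `"tool_failure"`, and `"crash"` appear in the example above).

```python
from collections import Counter
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class RunRecord:
    """Hypothetical, simplified view of one run; not the framework's Trace."""

    exception: Optional[str] = None
    final_answer: Optional[str] = None
    tool_errors: List[str] = field(default_factory=list)


def failure_mode(run: RunRecord) -> str:
    if run.exception is not None:
        return "crash"
    if not run.final_answer:
        return "no_answer"  # label name is an assumption
    if run.tool_errors:
        return "tool_failure"
    return "none"


runs = [
    RunRecord(final_answer="Sunny"),
    RunRecord(exception="TimeoutError"),
    RunRecord(final_answer="Sunny", tool_errors=["HTTP 500"]),
    RunRecord(final_answer="Cloudy"),
]
counts = Counter(failure_mode(r) for r in runs)
```

Because each run gets exactly one string label, the labels roll up into the same `{label: count}` shape that `value_counts` reports.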

## Exception handling

A scorer that raises does NOT fail the run. The framework catches the exception, records a `Feedback` with `score=None` and a comment explaining what blew up, logs a warning, and moves on:

```python
@scorer
def fragile(outputs) -> bool:
    return outputs["body"].startswith("Tokyo")  # raises if there's no final answer
```

If the trace had no final answer (`outputs` has no `"body"`) this raises `KeyError`. The feedback for that task becomes:

```python
Feedback(key="fragile", score=None, comment="scorer raised: KeyError: ...")
```

This is by design - one broken scorer should never kill an eval over 100 tasks. But it also means you should treat `score=None` as a signal worth investigating, not noise.
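
The catch-and-record behavior can be sketched as follows. `FeedbackRecord` and `safe_score` are hypothetical stand-ins for the framework's `Feedback` and its internal scorer runner; only the overall shape (exception in, `score=None` plus a comment out) is taken from the description above.

```python
from dataclasses import dataclass
from typing import Any, Callable, Optional


@dataclass
class FeedbackRecord:
    """Hypothetical stand-in for the framework's Feedback."""

    key: str
    score: Optional[Any]
    comment: str = ""


def safe_score(key: str, func: Callable[[dict], Any], outputs: dict) -> FeedbackRecord:
    try:
        return FeedbackRecord(key=key, score=func(outputs))
    except Exception as exc:
        # Record the failure instead of aborting the whole run
        return FeedbackRecord(
            key=key, score=None, comment=f"scorer raised: {type(exc).__name__}: {exc}"
        )


def fragile(outputs: dict) -> bool:
    return outputs["body"].startswith("Tokyo")


ok = safe_score("fragile", fragile, {"body": "Tokyo is sunny."})
broken = safe_score("fragile", fragile, {})
ungraded = [fb for fb in (ok, broken) if fb.score is None]
```

Collecting the records with `score=None`, as in the last line, is one simple way to surface scorers that need attention after a run.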

## Tips for good scorers

- **Be specific.** "Did the agent call `get_weather`?" beats "Was the output good?"
- **Be cheap.** Scorers run for every task. Avoid API calls inside a scorer unless grading genuinely needs a model, as with `agent_judge`.
- **Prefer structured fields over text.** `trace.events_of(ToolCallEvent, name="get_weather")` beats regex-matching `outputs["body"]`.
- **Don't share state.** Two scorers grade independently. No mutable globals.
- **Return `None` when you can't grade.** A scorer that doesn't apply to a task should skip rather than fail.

## Sync vs async

Either works. The framework detects async with `inspect.iscoroutinefunction` and awaits accordingly:

```python
@scorer
async def llm_judge(trace, reference_outputs) -> bool:
    # ... call a judge model ...
    return verdict
```

Most scorers consume an in-memory `Trace` and stay sync; async is there for scorers that call out - `agent_judge`, for instance, runs its judge model under the hood.
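
A dispatcher built on `inspect.iscoroutinefunction` might look like this. `run_scorer` is a hypothetical helper, and the two checks are plain functions rather than `@scorer`-decorated ones, so the sketch runs without the framework.

```python
import asyncio
import inspect
from typing import Any, Callable


async def run_scorer(func: Callable[..., Any], **kwargs: Any) -> Any:
    if inspect.iscoroutinefunction(func):
        return await func(**kwargs)
    return func(**kwargs)


def sync_check(outputs: dict) -> bool:
    return "Paris" in outputs["body"]


async def async_check(outputs: dict) -> bool:
    await asyncio.sleep(0)  # stands in for a call to a judge model
    return "Paris" in outputs["body"]


async def grade_all() -> list:
    outputs = {"body": "The capital is Paris."}
    return [await run_scorer(f, outputs=outputs) for f in (sync_check, async_check)]


verdicts = asyncio.run(grade_all())
```

Both kinds of scorer are called through the same entry point, so the caller does not need to know which style a given scorer uses.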

## Custom keys with `Scorer` directly

If you want a key independent of the function name, construct a `Scorer` yourself:

```python
from autogen.beta.eval import Scorer, Trace
from autogen.beta.events import ToolCallEvent

def _check(trace: Trace) -> bool:
    return len(trace.events_of(ToolCallEvent, name="get_weather")) == 1

my_scorer = Scorer(_check, key="weather-tool-used")
```

The prebuilts use this pattern internally - `tool_called("get_weather")` returns a `Scorer` with `key="tool_called[get_weather]"`.

## Where to next

- **[Runs](runs.md)** - the `Suite` API, the `run_agent()` signature, `RunResult` aggregation methods, and the persistence format.

---

# Datasets & Runs

Source: https://docs.ag2.ai/latest/docs/beta/evaluation/runs/

This page covers the core of the offline eval pipeline: building a `Suite` of tasks, calling `run_agent()`, reading the `RunResult`, and grading traces that already exist. (Saving runs and comparing them over time has its own page - [Persistence & tracking](persistence.md).)

## The trace is the foundation

Grading is a pure function of one thing: a **`Trace`** - the typed record of what the agent did on a task (model responses, tool calls, tool results, human input) plus token usage, duration, and any exception. Each of those steps is emitted as an OpenTelemetry span while the agent runs, and the framework reconstructs the `Trace` from those spans. Scorers only ever read the `Trace`.

It's the **same** reconstruction no matter where the spans came from - a `run_agent` call here, a folder of saved traces, or live production telemetry in Grafana Tempo. That's exactly what lets the framework's two halves - *producing* a trace and *grading* one - share one code path and the same scorers.

Three concrete consequences:

- **Multi-turn just works.** A `reply.ask(...)` continuation re-enters the agent loop, so its spans land in the same trace. Scorers see the whole conversation, not just the first turn.
- **Offline vs online is just where the trace came from.** Offline = run a curated dataset with `run_agent`. Online = grade traces captured from production with `evaluate_traces` (e.g. from Grafana Tempo). *Same scorers, different source.*
- **No special primitives for trajectory scoring.** Assembly-policy correctness, compaction faithfulness, sub-task tree quality - all event-pattern questions, answered by the events already in the trace.

You don't manage any of this: `run_agent` attaches a telemetry middleware, collects the spans the agent emits, and reconstructs the `Trace` for you. But the frame matters when you write a custom scorer - you're always reading a `Trace`, never a live stream.

## Datasets - the `Suite`

A `Suite` is an immutable collection of `Task` records. Build one from a JSONL file or inline.

### From JSONL (recommended)

```python
from autogen.beta.eval import Suite

suite = Suite.from_jsonl("eval/dataset.jsonl")
```

Each line in the file is a JSON object:

```json
{"task_id": "weather-001", "inputs": {"input": "What's the weather in Tokyo?"}, "reference_outputs": {"city": "Tokyo"}, "tags": ["happy-path"]}
{"task_id": "weather-002", "inputs": {"input": "Weather in Paris?"}, "reference_outputs": {"city": "Paris"}}
```

Fields:

| Field | Required | What it is |
|---|---|---|
| `task_id` | optional | Stable identifier. If absent, auto-filled as `task-0000`, `task-0001`, .... |
| `inputs` | **required** | Dict containing at least `"input"` - the prompt passed to `agent.ask(...)`. |
| `reference_outputs` | optional | Expected output for reference-based scorers. |
| `tags` | optional | List of labels. Slice results by them - `result.pass_rate(key, tag="happy-path")`. |
| `metadata` | optional | Free-form dict. Surfaces in the persisted run JSON. |

!!! tip
    JSONL is the canonical format because it's `grep`-friendly, line-diffable, and HuggingFace-compatible. Blank lines are skipped; malformed lines raise with the line number for fast diagnosis.
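
To make the parsing rules concrete, here is a rough standard-library sketch of the behavior described above: blank lines skipped, malformed lines reported with their line number, and missing `task_id` values auto-filled. `parse_jsonl` is a hypothetical helper, not `Suite.from_jsonl` itself, and the exact error type and message are assumptions.

```python
import json


def parse_jsonl(text: str) -> list:
    """Parse JSONL text into task dicts (illustrative only)."""
    tasks = []
    for lineno, raw in enumerate(text.splitlines(), start=1):
        if not raw.strip():
            continue  # blank lines are skipped
        try:
            record = json.loads(raw)
        except json.JSONDecodeError as exc:
            raise ValueError(f"line {lineno}: invalid JSON ({exc.msg})") from exc
        inputs = record.get("inputs") if isinstance(record, dict) else None
        if not isinstance(inputs, dict) or "input" not in inputs:
            raise ValueError(f"line {lineno}: 'inputs' must be a dict containing 'input'")
        # Assumption: the auto-filled id counts parsed tasks, not file lines.
        record.setdefault("task_id", f"task-{len(tasks):04d}")
        tasks.append(record)
    return tasks


sample = "\n".join([
    '{"inputs": {"input": "Weather in Paris?"}}',
    "",
    '{"task_id": "w-2", "inputs": {"input": "Weather in Tokyo?"}}',
])
tasks = parse_jsonl(sample)
```

Carrying the line number in the error is what makes a large dataset file quick to repair by hand.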

### From a list (for quick experimentation)

```python
from autogen.beta.eval import Suite

suite = Suite.from_list([
    {"inputs": {"input": "What's the weather in Tokyo?"}, "reference_outputs": {"city": "Tokyo"}},
    {"inputs": {"input": "Weather in Paris?"}, "reference_outputs": {"city": "Paris"}},
])
```

Same shape as JSONL lines. Use this in notebooks or for ad-hoc suites that don't deserve a file yet.

## The agent - an `Agent` instance

`run_agent(agent=...)` takes a built **`Agent` instance** and reuses it for every task. Each task runs on a fresh stream, so conversation history never leaks between tasks. Agents are effectively stateless across `ask` calls (history lives on the per-call stream), which makes reusing one instance safe for any normal agent.

```python
from autogen.beta import Agent
from autogen.beta.config import GeminiConfig

weather_agent = Agent(
    "weather",
    prompt="You are a weather assistant. Use get_weather.",
    config=GeminiConfig(model="gemini-3-flash-preview"),
    tools=[get_weather],
)
```

Need a different model per task - a live `ModelConfig`, or a `TestConfig` cassette for deterministic CI? Pass `model_config=` to `run_agent()`; it is forwarded to `ask` and **overrides** the agent's own config for that task. One instance covers the whole suite - no per-task rebuild, no factory.

!!! note "Multi-agent flows go through `evaluate_traces`"
    `run_agent` is for an ask-shaped target - an `Agent`. Multi-agent / network / workflow flows aren't driven by a single `ask(prompt)`, so they don't go through `run_agent`. Run them however they run - they emit OpenTelemetry spans - and grade the reconstructed trace with `evaluate_traces`. Because grading reads only the `Trace`, it is decoupled from how the trace was produced, so the same scorers work either way.

## Running

`run_agent` produces traces via OpenTelemetry, so it requires the tracing extra - `pip install "ag2[tracing]"`. Grading pre-existing traces with `evaluate_traces` does not need it.

```python
from pathlib import Path

from autogen.beta.eval import BudgetThresholds, run_agent

result = await run_agent(
    suite,                                 # a Suite, or a bare str (single-prompt suite)
    agent=weather_agent,
    scorers=[...],                         # list of Scorer instances
    store_dir=Path("./runs"),              # optional - where the JSON lands (omit to skip persisting)
    model_config=cassettes,                # optional - None / ModelConfig / dict[task_id, ModelConfig]
    budgets=BudgetThresholds(              # optional - observational, doesn't abort
        max_tokens_per_task=2_000,
        max_seconds_per_task=15.0,
    ),
    concurrency=4,                         # parallel task cap
    repeats=1,                             # run each task N times (consistency); pools the pass-rate
    run_id="2026-05-11-weather-suite",     # optional - overrides the UUID4 default
    label="weather-eval",                  # optional - groups runs of the same eval over time
    stream=my_stream,                      # optional - observe lifecycle events (see "Observing a run")
    span_attributes={"ag2.org.id": "..."}, # optional - stamp extra attrs on every span (see "External telemetry")
    span_processors=[...],                 # optional - also export spans to your own backend (see "External telemetry")
)
```

### `model_config` modes

The same parameter is overloaded three ways:

| Value | Behavior |
|---|---|
| `None` (default) | The agent's own config is used. |
| A single `ModelConfig` | Same config for every task (overrides the agent's). |
| A `dict[task_id, ModelConfig]` | Per-task config. Standard pattern for cassette-based CI. |

### Repeats - consistency

`repeats=N` runs **each task N times** - the simplest way to ask "does my agent do this *consistently*?". The per-key `pass_rate` / `score_stats` pool across every run (so 8 of 10 passing shows as `80%`), and each run gets a distinct `task_id` suffix (`"weather-001#1"`, `"weather-001#2"`, ...).

```python
result = await run_agent(suite, agent=weather_agent, scorers=[...], store_dir="runs", repeats=10)
```

(At `temperature=0` consistency is near-trivial; `repeats` earns its keep when there's real nondeterminism.)
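
The pooling arithmetic is simple enough to sketch directly. The helpers below are hypothetical and only mirror the description above; in particular, the 1-based `#1`, `#2` suffix is inferred from the examples in this section.

```python
def expand_repeats(task_ids: list, repeats: int) -> list:
    # Assumption: suffixes are 1-based, following the "#1", "#2" examples.
    return [f"{task_id}#{i}" for task_id in task_ids for i in range(1, repeats + 1)]


def pooled_pass_rate(scores: dict) -> float:
    """Fraction of all repeated runs that passed, pooled across tasks."""
    return sum(scores.values()) / len(scores)


run_ids = expand_repeats(["weather-001"], repeats=10)
scores = {run_id: i < 8 for i, run_id in enumerate(run_ids)}  # first 8 pass, last 2 fail
rate = pooled_pass_rate(scores)
```

With 8 of the 10 runs passing, `rate` is `0.8`, matching the `80%` figure above.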

### Labels

`label="weather-eval"` stamps a **user-defined** identifier on the run, recorded at the top of the run JSON. Unlike `run_id` (unique per run), a `label` is meant to be *shared* across runs of the same eval - so a sequence of runs can be grouped and trended over time. The framework never fills it in.

### Concurrency

Tasks run in parallel up to `concurrency`, bounded by an `asyncio.Semaphore`. Default is 4. Raise it for I/O-bound suites against fast models; lower it when you're rate-limited.
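
The bounding pattern looks roughly like the sketch below. It is a generic `asyncio.Semaphore` illustration rather than the runner's actual code; `run_bounded` and the `asyncio.sleep` stand-in for `agent.ask(...)` are hypothetical.

```python
import asyncio


async def run_bounded(task_ids: list, concurrency: int = 4) -> int:
    """Run one coroutine per task, at most `concurrency` at a time; return the observed peak."""
    semaphore = asyncio.Semaphore(concurrency)
    active = 0
    peak = 0

    async def run_one(task_id: str) -> None:
        nonlocal active, peak
        async with semaphore:  # waits here once `concurrency` tasks are in flight
            active += 1
            peak = max(peak, active)
            await asyncio.sleep(0.01)  # stands in for agent.ask(...)
            active -= 1

    await asyncio.gather(*(run_one(task_id) for task_id in task_ids))
    return peak


peak = asyncio.run(run_bounded([f"task-{i:04d}" for i in range(10)], concurrency=4))
```

All ten tasks are scheduled at once, but the number in flight never exceeds the cap, which is why lowering `concurrency` is an effective response to provider rate limits.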

### Budgets

`BudgetThresholds` records violations on each task's `budget_violation` flag but never aborts a task that goes over. The aggregate count surfaces in `result.aggregates.budget_violations` - useful as a CI regression signal ("zero tasks may exceed budget").

!!! warning
    Budgets are **observational** in v0. If you need a hard kill switch, use [observers](../advanced/observers.md) (`TokenMonitor` with `AlertPolicy(severity=FATAL)`) - the agent halts via a `HaltEvent` at runtime.

### External telemetry - exporting spans to your own backend

`run_agent` produces each task's `Trace` from OpenTelemetry spans (the same substrate `evaluate_traces` grades). Two parameters let a host platform fold those spans into its own observability stack - both default to off, so a plain run behaves exactly as before.

- **`span_attributes`** - a dict stamped on **every** span the agent emits. The run is auto-seeded with `ag2.eval.run_id`, and - when set - `ag2.eval.variant` / `ag2.eval.label`; each task additionally gets `ag2.eval.task_id`. Your own keys (e.g. `{"ag2.org.id": org_id}`) are added on top, so spans can be scoped per-org / per-run / per-task in the backend. Caller keys win on conflict.
- **`span_processors`** - extra OpenTelemetry `SpanProcessor`s attached to each task's tracer provider, **alongside** the in-memory exporter grading reads. Export is additive: the in-memory processor is never replaced, so grading output is identical whether or not you pass this. Typically a `BatchSpanProcessor(OTLPSpanExporter(...))`.

```python
from opentelemetry.exporter.otlp.proto.http.trace_exporter import OTLPSpanExporter
from opentelemetry.sdk.trace.export import BatchSpanProcessor

result = await run_agent(
    suite, agent=weather_agent, scorers=[...], store_dir="runs",
    run_id="evalrun_abc", variant="v3", label="checkout-suite",
    span_attributes={"ag2.org.id": org_id},                   # stamped on every span
    span_processors=[BatchSpanProcessor(OTLPSpanExporter())],  # also export to your backend
)

for tr in result.tasks:
    save_row(task_id=tr.task.task_id, trace_id=tr.trace_ref.trace_id)  # real OTEL id -> deep-link
```

Each task's `TaskResult.trace_ref.trace_id` is the **real OpenTelemetry trace id** of that task's spans - so a stored result row deep-links straight to the trace in your backend (Grafana Tempo, Cloud Trace, ...). Multiple traces per task are possible (sub-agents); the root span's trace id is captured.

!!! warning "Flush before the process exits"
    `BatchSpanProcessor` batches spans and flushes on a timer. A short-lived eval script can exit before the last batch ships. When you need the spans guaranteed in the backend, keep a reference to the processor you pass in `span_processors` and call `processor.force_flush()` (or `shutdown()`) after `run_agent` returns.

## What you get back - `RunResult`

```python
result = await run_agent(...)

# Run-level metadata
result.run_id              # str (UUID4 hex unless you set it) - unique per run
result.label               # str | None - user-defined; groups runs of the same eval over time
result.schema_version      # "0.1"
result.created_at          # ISO-8601 UTC
result.duration_ms         # int
result.suite               # the Suite that was executed

# Per-task records
result.tasks               # tuple[TaskResult, ...]
result.tasks[0].task       # Task
result.tasks[0].trace      # Trace
result.tasks[0].feedback   # tuple[Feedback, ...]
result.tasks[0].budget_violation  # bool
result.tasks[0].trace_ref  # TraceRef | None - .trace_id is the real OTEL trace id (deep-link)

# Aggregates
result.pass_rate("scorer_name")    # float - boolean scorers
result.score_stats("scorer_name")  # ScoreStats(mean, p50, p95, n) - numeric scorers
result.value_counts("scorer_name") # dict[label, count] - categorical scorers
result.pass_rate("scorer_name", tag="hard")  # any accessor takes tag= to slice to one segment
result.tags                        # frozenset[str] - the tags present, for slicing
result.aggregates                  # Aggregates - everything together
result.aggregates.tokens           # TokenUsage(input, output, cache_creation, cache_read)
result.aggregates.errors           # int - tasks where the agent raised
result.aggregates.budget_violations  # int
result.diff(baseline)              # RunDiff vs a prior run - see the Persistence & tracking page

# Human-readable
result.summary()           # printable multi-line table
result.save(path=None)     # re-save the run JSON (path defaults to store_dir/<run_id>.json)
```

### `summary()`

Returns a multi-line string suitable for a CI log:

```
Run 25be826dc1a94a4b9d50a4f94449139e
  Schema:      0.1
  Created:     2026-05-11T01:38:04.157919+00:00
  Duration:    5292ms
  Suite:       dataset (5 tasks, source: eval/dataset.jsonl)
  Runs:        5
  Concurrency: 4
  Errors:      0
  Budget violations: 0
  Tokens:      input=1544 output=174 total=1718

Pass rates:
  called_get_weather_once   100.0% (5/5)
  final_answer_matches      100.0% (5/5)
  no_tool_errors            100.0% (5/5)
  token_budget              100.0% (5/5)
  tool_called[get_weather]  100.0% (5/5)

Score stats:
  extra_tool_calls  mean=0.00 p50=0.00 p95=0.00 n=5

Value counts:
  termination_reason  completed=5
```

## Grading existing traces - `evaluate_traces`

`run_agent` is the *produce-and-grade* path: it runs your agent, then grades the trace. `evaluate_traces` is the *grade-only* path - for traces that already exist, captured elsewhere. The grading is identical; only the source differs.

```python
from autogen.beta.eval import DirectoryTraceSource, evaluate_traces

result = await evaluate_traces(
    DirectoryTraceSource("./captured-traces"),   # a folder of saved traces
    scorers=[...],
    suite=suite,            # optional - only needed for reference-based scorers (joined by task_id)
    store_dir="runs",
)
```

A **`TraceSource`** is anything that yields traces. Three ship:

| Source | Reads from |
|---|---|
| `InMemoryTraceSource` | traces you already hold in memory |
| `DirectoryTraceSource` | a folder of saved trace files (`save_trace` writes them) |
| `TempoTraceSource` | Grafana Tempo over OTLP - grade real production telemetry |

Because grading depends only on the reconstructed `Trace`, the **same scorers** work whether the trace came from `run_agent`, a directory, or production.

Reconstruction understands **two span dialects**, auto-detected per trace, so traces from a range of tools and frameworks grade unchanged:

- the [OpenTelemetry GenAI semantic conventions](https://opentelemetry.io/docs/specs/semconv/gen-ai/) - `gen_ai.*` spans, what AG2's own `TelemetryMiddleware` emits, and
- [OpenInference](https://github.com/Arize-ai/openinference) - `openinference.span.kind` + `llm.*` / `tool.*`, emitted by the Arize/Phoenix instrumentors.
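
As a rough picture of what per-trace auto-detection can look like, the sketch below inspects span attribute keys. The actual detection rule is not documented here, so `detect_dialect` and its precedence order are assumptions; only the attribute prefixes come from the two conventions named above.

```python
def detect_dialect(span_attribute_sets: list) -> str:
    """Guess which convention a trace's spans follow from their attribute keys."""
    for attrs in span_attribute_sets:
        if "openinference.span.kind" in attrs:
            return "openinference"
        if any(key.startswith("gen_ai.") for key in attrs):
            return "gen_ai"
    return "unknown"


genai_trace = [{"gen_ai.operation.name": "chat", "gen_ai.request.model": "gpt-4o"}]
openinference_trace = [{"openinference.span.kind": "LLM", "llm.model_name": "gpt-4o"}]

dialects = [detect_dialect(genai_trace), detect_dialect(openinference_trace)]
```

Deciding once per trace, rather than per span, keeps reconstruction consistent when a trace contains spans with no dialect-specific attributes.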

## Observing a run

Just like `agent.ask(stream=...)`, `run_agent`, `run_variants`, and `run_pairwise` all accept a `stream`. Pass one and the runner publishes **eval lifecycle events** to it as the run unfolds - so you observe an evaluation with the same machinery you use on an agent (`subscribe`, `where`, the watch system, persistent backends). Nothing is printed for you; you attach the observer and render however you like.

```python
from autogen.beta.stream import MemoryStream
from autogen.beta.eval import run_variants
from autogen.beta.eval.events import VariantCompleted

stream = MemoryStream()

async def on_variant(event: VariantCompleted) -> None:
    print(f"{event.variant}: {event.result.pass_rate('final_answer_matches'):.0%}")

stream.where(VariantCompleted).subscribe(on_variant)

board = await run_variants(suite, variants=variants, scorers=[...], store_dir="runs", stream=stream)
```

The events live in `autogen.beta.eval.events` and are all transient - observational only, since the durable record of a run is its persisted JSON:

| Event | Emitted by | When |
|---|---|---|
| `EvalStarted` | `run_agent` | a run begins |
| `TaskEvaluated` | `run_agent` | each task finishes (carries its `feedback`) |
| `EvalCompleted` | `run_agent` | a run finishes (carries the `RunResult`) |
| `VariantStarted` / `VariantCompleted` | `run_variants` | around each variant (carries the variant's `RunResult`) |
| `PairwiseStarted` / `PairwiseCompared` / `PairwiseCompleted` | `run_pairwise` | around the run and each head-to-head comparison |

!!! tip "Ready-made console output"
    For a quick console view without writing a callback, subscribe the built-in `console_reporter` (from `autogen.beta.eval`): `stream.subscribe(console_reporter)`. It's just an opt-in observer - no `run_agent()` flag - so swap in your own callback whenever you want different output.

## Exceptions don't abort the run

The framework catches exceptions at two levels and records them on the task instead of aborting:

| Level | What happens |
|---|---|
| `agent.ask()` raises | The exception lands on `trace.exception` (no events), and other tasks continue. |
| A scorer raises | The scorer's feedback becomes `Feedback(score=None, comment="scorer raised: ...")`. Other scorers run normally. |

The aggregate count of agent-level errors surfaces in `result.aggregates.errors`. Treat it as a CI signal - a healthy suite has zero.
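
The scorer-level isolation can be sketched as a loop that converts an exception into feedback. This is an illustration under stated assumptions: the `Feedback` dataclass below is a simplified stand-in for the framework's type, and `grade` is a hypothetical helper.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Feedback:  # simplified stand-in, not the framework's Feedback class
    key: str
    score: Optional[bool]
    comment: Optional[str] = None


def grade(trace: dict, scorers: dict) -> list:
    """Run every scorer; a scorer that raises yields score=None instead of aborting."""
    feedback = []
    for key, fn in scorers.items():
        try:
            feedback.append(Feedback(key, fn(trace)))
        except Exception as exc:  # one failing scorer must not stop the rest
            feedback.append(Feedback(key, None, f"scorer raised: {exc!r}"))
    return feedback


def broken(trace: dict) -> bool:
    raise KeyError("missing")


feedback = grade(
    {"answer": "Paris"},
    {"has_answer": lambda trace: bool(trace["answer"]), "broken": broken},
)
```

The healthy scorer still reports `True`, while the broken one records `score=None` with the exception in its comment.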

## Where to next

- **[Scorers](scorers.md)** - what goes into `scorers=[...]`.
- **[Variants](variants.md)** - run several builds over the suite and rank them on a leaderboard.
- **[Pairwise](pairwise.md)** - judge two builds head-to-head.
- **[Persistence & tracking](persistence.md)** - compare runs over time, catch regressions, and track an agent across iterations.
- **[Testing](../testing.md)** - `TestConfig` cassettes for deterministic CI runs.
- **[Observers](../advanced/observers.md)** - runtime safety guards (token budgets that actually halt, loop detection).

---

# Comparing variants

Source: https://docs.ag2.ai/latest/docs/beta/evaluation/variants/

To answer "which is best?" - across models, prompts, tools, middleware, or whole builds - run the same suite under several named **variants** and rank them on a leaderboard.

## `run_variants`

A `Variants` is a mapping of name -> `Agent` instance, plus an `axis` label that records what you varied:

```python
from autogen.beta import Agent
from autogen.beta.config import GeminiConfig, OpenAIConfig
from autogen.beta.eval import Variants, run_variants

board = await run_variants(
    suite,
    variants=Variants({
        "gpt-4o": Agent("weather", config=OpenAIConfig("gpt-4o"), tools=[get_weather]),
        "flash":  Agent("weather", config=GeminiConfig("gemini-3-flash-preview"), tools=[get_weather]),
    }, axis="config"),
    scorers=[...],
    store_dir="runs",
    repeats=5,                                # optional - N runs per variant for stability
)

print(board.summary("final_answer_matches"))  # ranked leaderboard
board.best("final_answer_matches")             # the winning variant's name
board.results["gpt-4o"]                        # each variant's full RunResult
```

## One axis at a time

`Variants` holds prebuilt agents, so you vary whatever you like by constructing them accordingly. For a controlled comparison, vary **one** axis across the agents and hold the rest fixed - then set `axis=` to label what changed (it shows up in `summary()`; it defaults to `"variant"`):

| Axis | Vary across the agents | `axis=` |
|---|---|---|
| model / provider / params | `config=` | `"config"` |
| system prompt | `prompt=` | `"prompt"` |
| tool set | `tools=` | `"tools"` |
| middleware stack | `middleware=` | `"middleware"` |

```python
# Vary the system prompt, hold everything else fixed.
variants = Variants({
    "terse":   Agent("a", prompt="Answer in one word.", config=cfg, tools=tools),
    "verbose": Agent("a", prompt="Explain, then answer.", config=cfg, tools=tools),
}, axis="prompt")
```

Need a per-task model override *within* a variant? Pass `model_config=` to `run_variants` exactly as with `run_agent` - it reaches each variant's `run_agent` call.

Each variant runs via `run_agent()` (so `repeats`, persistence, everything applies) and is saved as its own `<run_id>-<variant>.json`. `VariantRunResult.leaderboard(key)` ranks variants by a scorer - pass-rate for boolean scorers, mean for numeric - best first; **tied scores share a rank**, and `best(key)` returns `None` when there's no unique winner. (A 3-way 100% tie usually means the eval isn't *discriminating* - make the task harder, or score quality with a judge - not that there's a true winner.)
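
The tie rule can be sketched in a few lines. This is not `VariantRunResult` itself: `leaderboard` and `best` are hypothetical helpers, and the rank numbering after a tie (1, 1, 3 rather than 1, 1, 2) is an assumption.

```python
def leaderboard(scores: dict) -> list:
    """Rank variants best-first; equal scores share a rank."""
    ordered = sorted(scores.items(), key=lambda item: item[1], reverse=True)
    rows = []
    for position, (name, score) in enumerate(ordered, start=1):
        # A variant tied with the previous row inherits that row's rank.
        rank = rows[-1][0] if rows and rows[-1][2] == score else position
        rows.append((rank, name, score))
    return rows


def best(scores: dict):
    """Return the single top-ranked variant, or None when first place is shared."""
    leaders = [name for rank, name, _ in leaderboard(scores) if rank == 1]
    return leaders[0] if len(leaders) == 1 else None


tied = {"gpt-4o": 1.0, "flash": 1.0, "mini": 0.6}
clear = {"gpt-4o": 0.9, "flash": 0.8}
```

Returning `None` on a shared first place forces the caller to handle the "no unique winner" case explicitly instead of picking one by dictionary order.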

## Where to next

- **[Pairwise](pairwise.md)** - when "A vs B" is easier to judge than scoring each variant on its own.
- **[Persistence & tracking](persistence.md)** - every variant's run is saved, so you can diff it against past runs.

---

# Pairwise comparison

Source: https://docs.ag2.ai/latest/docs/beta/evaluation/pairwise/

Sometimes "which answer is better?" is easier and more reliable than scoring each answer in isolation - especially for subjective quality. `run_pairwise` runs two variants over the same suite and asks a **comparator** to pick a winner per task, then reports B's win-rate with a confidence interval.

## `run_pairwise`

```python
from autogen.beta.eval import run_pairwise
from autogen.beta.eval.scorers import pairwise_judge

result = await run_pairwise(
    suite,
    variant_a=agent_v1,
    variant_b=agent_v2,
    comparators=[pairwise_judge(config, criterion="more helpful answer", key="quality")],
    store_dir="runs",
)
wr = result.win_rate("quality")        # win-rate for B, with a Wilson confidence interval
print(wr.rate, wr.ci, wr.ties)
```

`PairwiseRunResult` reports B's win-rate, ties, position flips, and - when you supply both a model and a human comparator - their agreement (Cohen's κ).
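
For intuition about the interval, the Wilson score formula can be computed by hand. The sketch below is the standard formula, not the framework's code: `wilson_interval` is a hypothetical helper, the 95% level (`z=1.96`) is an assumption, and how ties enter the denominator is not specified here.

```python
import math


def wilson_interval(wins: int, n: int, z: float = 1.96) -> tuple:
    """Wilson score interval for a win-rate of `wins` out of `n` decided comparisons."""
    if n == 0:
        return (0.0, 1.0)  # no evidence: the rate could be anything
    p = wins / n
    denom = 1 + z**2 / n
    center = (p + z**2 / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z**2 / (4 * n**2)) / denom
    return (max(0.0, center - half), min(1.0, center + half))


low, high = wilson_interval(wins=7, n=10)
```

With 7 wins in 10 comparisons the interval runs from roughly 0.40 to 0.89, so it still contains 0.5. On a small suite, a 70% win-rate alone is weak evidence that B is better.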

## Comparators

A comparator decides the winner of one pair. Two kinds ship - an LLM judge, and a human:

- **`pairwise_judge(config, criterion=, key=)`** - an LLM. It judges each pair in **both orders** (A-then-B and B-then-A) and only counts a win when the verdict is consistent, which cancels position bias. Like `agent_judge`, it renders the gold answer as a `## Reference` section by default; pass `include_reference=False` for criteria that must compare the responses on their own (e.g. grounding) without seeing the reference.
- **`human_pairwise(...)` / `human_labels(...)`** - a person decides. Covered next.
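
The both-orders check can be sketched as follows. Everything here is hypothetical: `debiased_verdict`, the `"first"` / `"second"` / `"tie"` judge protocol, and the choice to record an inconsistent verdict as a tie are assumptions for illustration, not the documented behavior of `pairwise_judge`.

```python
from typing import Callable

# A judge sees (first_answer, second_answer) and returns "first", "second", or "tie".
Judge = Callable[[str, str], str]


def debiased_verdict(judge: Judge, answer_a: str, answer_b: str) -> tuple:
    """Return (winner, consistent), where winner is "a", "b", or "tie"."""
    forward = judge(answer_a, answer_b)   # A shown first
    backward = judge(answer_b, answer_a)  # B shown first
    forward_winner = {"first": "a", "second": "b"}.get(forward, "tie")
    backward_winner = {"first": "b", "second": "a"}.get(backward, "tie")
    if forward_winner == backward_winner:
        return forward_winner, True
    return "tie", False  # the verdict depended on the order, so no win is counted


def always_first(first: str, second: str) -> str:
    return "first"  # a judge with pure position bias


def prefers_longer(first: str, second: str) -> str:
    return "first" if len(first) > len(second) else "second"


biased = debiased_verdict(always_first, "Paris.", "The capital of France is Paris.")
stable = debiased_verdict(prefers_longer, "Paris.", "The capital of France is Paris.")
```

The position-biased judge never produces a win, while the judge with a real (if crude) preference picks the same variant in both orders.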

## Human comparison

A human is often the ground truth for "which is better." There are two ways to collect those judgements.

### Inline - judge as the run goes

`human_pairwise` prompts a reviewer for each task. The default prompt prints the question and both answers **blinded** - Response 1 / Response 2, in random order so there's no position bias - and reads `1` / `2` / `tie`:

```python
from autogen.beta.eval import run_pairwise
from autogen.beta.eval.scorers import human_pairwise

result = await run_pairwise(
    suite,
    variant_a=agent_v1,
    variant_b=agent_v2,
    comparators=[human_pairwise(key="quality")],   # prompts in the terminal, per task
    store_dir="runs",
)
print(result.win_rate("quality").rate)
```

```text
Task: What's the capital of France?
[1] The capital of France is Paris.
[2] Paris.
Which is better? 1 / 2 / tie: 2
```

Pass your own `ask` callback to collect the choice from a UI or notebook instead of the terminal. It receives `(task, response_1, response_2)` (still blinded) and returns `"1"`, `"2"`, or `"tie"`:

```python
def ask(task, response_1, response_2) -> str:
    # render the two answers in your own UI, collect a click, map it to "1" / "2" / "tie"
    return my_review_ui.compare(task.inputs["input"], response_1, response_2)

human_pairwise(key="quality", ask=ask)
```

### At scale - blinded offline labeling

Past a handful of cases - or with several labelers - you don't sit at a terminal. Export a **blinded manifest**, have people label it in any tool, then import the results. This is the workflow for a real human-eval pass.

```python
from autogen.beta.eval import DirectoryTraceSource, evaluate_pairwise
from autogen.beta.eval.scorers import export_pairwise_cases, human_labels

champion = DirectoryTraceSource("runs/champion")        # two sets of captured traces
challenger = DirectoryTraceSource("runs/challenger")

# 1. write a blinded JSONL - one line per (task, criterion)
await export_pairwise_cases(
    champion, challenger,
    criteria=["more helpful"],
    out="labels.jsonl",
    suite=suite,
)

# 2. a person opens labels.jsonl (or a spreadsheet / labeling UI) and adds
#    "preferred": "1" | "2" | "tie" to each line.

# 3. import the labelled file and compute the win-rate
result = await evaluate_pairwise(
    champion, challenger,
    comparators=[human_labels("labels.jsonl", criterion="more helpful", key="helpful")],
    suite=suite,
    store_dir="runs",
)
print(result.win_rate("helpful").rate)
```

Each manifest line is blinded - the labeler sees the two answers but **not** which model produced which:

```json
{"case_id": "task-1::more helpful", "task_id": "task-1", "criterion": "more helpful",
 "task_input": "What's the capital of France?",
 "response_1": "Paris.", "response_2": "The capital of France is Paris.", "first_variant": "b"}
```

`first_variant` is the de-blinding key - it records which model is Response 1. Keep it out of the labeler's view; `human_labels` uses it to map their `"1"` / `"2"` back to the right variant. Export several `criteria` at once and add one `human_labels(criterion=..., key=...)` comparator per criterion.
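
The de-blinding step amounts to a small lookup. The sketch below uses the manifest fields shown above; `deblind` is a hypothetical helper, not the internals of `human_labels`, and the `"a"` / `"b"` variant names are inferred from the sample line.

```python
import json


def deblind(manifest_line: str) -> str:
    """Map a labelled manifest line to the preferred variant: "a", "b", or "tie"."""
    case = json.loads(manifest_line)
    preferred = case["preferred"]
    if preferred not in ("1", "2", "tie"):
        raise ValueError(f"{case['case_id']}: unexpected label {preferred!r}")
    if preferred == "tie":
        return "tie"
    first = case["first_variant"]           # the variant shown as Response 1
    second = "b" if first == "a" else "a"   # the other variant is Response 2
    return first if preferred == "1" else second


line = json.dumps({
    "case_id": "task-1::more helpful",
    "first_variant": "b",
    "preferred": "2",
})
winner = deblind(line)
```

Here Response 1 was variant B, so a label of `"2"` is a vote for variant A.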

## Grading existing pairs - `evaluate_pairwise`

`evaluate_pairwise` is to `run_pairwise` what `evaluate_traces` is to `run_agent`: the *grade-only* version. Given two trace sources you already have, it pairs them by `task_id` and runs the comparators - no agent invocation. The offline-labeling flow above is one use; it works just as well with `pairwise_judge` to re-grade captured champion/challenger traces with an LLM.

## Where to next

- **[Variants](variants.md)** - rank more than two on a leaderboard.
- **[Scorers](scorers.md)** - `pairwise_judge` is the head-to-head cousin of the `agent_judge` scorer.

---

# Persistence & tracking

Source: https://docs.ag2.ai/latest/docs/beta/evaluation/persistence/

Every run is saved to disk. That's what turns "run an eval once" into "track an agent over its lifetime" - comparing each version against its past selves to confirm improvements and catch regressions before they ship.

## The schema-0.1 JSON

When `store_dir` is set, every `run_agent()` call writes a JSON file to `store_dir/<run_id>.json` before returning. The file is the run's permanent record - comparable across days, releases, and pull requests. Shape:

```json
{
  "schema_version": "0.1",
  "run_id": "01J7Z3M4QF8X9Y0K1V2N3P4Q5R",
  "label": "weather-eval",
  "created_at": "2026-05-11T14:23:00+00:00",
  "duration_ms": 5292,
  "suite": { "name": "dataset", "size": 5, "source": "eval/dataset.jsonl" },
  "target": "autogen.beta.agent:Agent",
  "concurrency": 4,
  "tasks": [
    {
      "task_id": "weather-001",
      "inputs": { "input": "What's the weather in Tokyo?" },
      "reference_outputs": { "city": "Tokyo" },
      "tags": ["happy-path"],
      "metadata": {},
      "duration_ms": 1234,
      "events": [
        { "type": "ToolCallEvent", "name": "get_weather", "arguments": "..." }
      ],
      "exception": null,
      "tokens": { "input": 423, "output": 78, "cache_creation": 0, "cache_read": 0 },
      "feedback": [
        { "key": "tool_called[get_weather]", "score": true, "value": null, "comment": null }
      ],
      "budget_violation": false,
      "trace_ref": { "trace_id": "6c9e58736a7f8b080ed7c72213b7dcd8", "task_id": "weather-001", "metadata": {} }
    }
  ],
  "aggregates": {
    "pass_rate": { "tool_called[get_weather]": 1.0 },
    "score_stats": { "extra_tool_calls": { "mean": 0.0, "p50": 0.0, "p95": 0.0, "n": 5 } },
    "value_counts": { "termination_reason": { "completed": 5 } },
    "tokens": { "input": 2115, "output": 390, "total": 2505 },
    "errors": 0,
    "budget_violations": 0
  }
}
```

The per-task `trace_ref.trace_id` is the real OpenTelemetry trace id of that task's spans (see [External telemetry](runs.md#external-telemetry-exporting-spans-to-your-own-backend)), so a stored row deep-links to the trace in your backend. The schema is forward-compatible: future versions add fields at the end of objects; existing fields don't change name or type. **Don't rely on this shape for in-tree code** - use the `RunResult` Python API. The JSON is for cross-process / cross-time persistence: a future dashboard, a CI artifact, a diff between two releases.

## Comparing two runs

Persistence pays off when you compare two runs: *"did my change make the agent better or worse?"*. Load a past run with `load_run(path)` and `diff` the current one against it.

```python
from autogen.beta.eval import load_run, run_agent

baseline = load_run("runs/last_release.json")          # a previously saved run
current = await run_agent(suite, agent=my_new_agent, scorers=[...], store_dir="runs")

delta = current.diff(baseline)
print(delta.summary())
#   correctness    88.0% -> 71.0%   -17.0   REGRESSION
#   tool_called    90.0% -> 95.0%    +5.0
#   flipped pass->fail: ['correctness:task-12']
```

`diff` joins the two runs by `task_id` and scorer key and reports per-scorer pass-rate / mean deltas plus the tasks that flipped pass<->fail. `RunDiff.regressions` is the `(scorer, task_id)` pairs that went pass -> fail - the CI gate:

```python
assert not current.diff(baseline).regressions   # fail the build if anything regressed
```
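
Conceptually, the join behind `regressions` looks like the sketch below. It is a simplified illustration with boolean scores only: `find_regressions` and the dict-of-tuples layout are hypothetical, and the real `diff` also handles numeric scorers and the strictness checks described in the warning that follows.

```python
def find_regressions(baseline: dict, current: dict) -> list:
    """Scores are keyed by (scorer, task_id); return the pairs that went pass -> fail."""
    shared = baseline.keys() & current.keys()  # only pairs graded in both runs
    return sorted(
        key for key in shared
        if baseline[key] is True and current[key] is False
    )


baseline_scores = {
    ("correctness", "task-12"): True,
    ("correctness", "task-13"): True,
    ("tool_called", "task-12"): False,
}
current_scores = {
    ("correctness", "task-12"): False,  # pass -> fail: a regression
    ("correctness", "task-13"): True,
    ("tool_called", "task-12"): True,   # fail -> pass: an improvement
    ("correctness", "task-14"): False,  # new task: not comparable, so ignored
}
regressions = find_regressions(baseline_scores, current_scores)
```

Only the pair present in both runs that flipped from pass to fail is reported, so an empty list is a safe condition for a CI gate.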

!!! warning "Runs must be comparable"
    By default (`strict=True`) `diff` raises `RunsNotComparableError` when the two runs didn't grade the same tasks with the same checks - a task or scorer present in only one run, or a task whose `inputs` / `reference_outputs` changed under the same id (*content drift*). The error itemizes every mismatch and tells you to pass **`strict=False`** to diff the overlap instead. With `strict=False`, only the genuinely comparable `(task, scorer)` pairs are diffed and everything excluded is reported on the `RunDiff` (`content_changed`, `only_in_current` / `only_in_baseline`, `scorers_only_in_*`) - so you are never silently shown an apples-to-oranges number.

!!! note
    `load_run` reconstructs each run's **scores and task identity** - enough for `diff` and the `pass_rate` / `score_stats` / `value_counts` accessors. It does not replay event traces, so a loaded run's `aggregates.tokens` reads zero; read the JSON directly if you need event-level detail.

## Tracking an agent across iterations

Persistence turns a one-off eval into a record of an agent's progress. There's no separate "project" concept to set up - you get one from three habits:

- **One `store_dir` per agent** - the folder *is* that agent's history.
- **A `run_id` that encodes the version** - `run_id="blog-writer-v7"` saves as `blog-writer-v7.json`.
- **A stable suite** - keep the eval set fixed across iterations so the runs stay comparable.

Then every iteration follows the same loop: **run -> compare to the last -> act on the diff.**

### Iteration 1 - establish a baseline

```python
result = await run_agent(
    suite, agent=agent_v1, scorers=SCORERS,
    store_dir="runs/blog-writer", run_id="v1", label="first cut",
)
print(result.summary())
```

You now have `runs/blog-writer/v1.json`. Read the scorecard and note what's weak - say `has_call_to_action` sits at 40%.

### Iteration 2 - change something, then confirm you moved the needle

You tweak the prompt to always end with a call to action, and re-run against the **same suite** with a new `run_id`:

```python
from autogen.beta.eval import load_run

current = await run_agent(
    suite, agent=agent_v2, scorers=SCORERS,
    store_dir="runs/blog-writer", run_id="v2", label="add CTA",
)
print(current.diff(load_run("runs/blog-writer/v1.json")).summary())
#   has_call_to_action   40.0% -> 100.0%   +60.0
#   mentions_topic      100.0% -> 100.0%    +0.0
```

The diff confirms the fix landed and nothing else moved. Ship it.

### Iteration 3 - catch a regression before it ships

Later you swap to a cheaper model and re-run:

```python
current = await run_agent(
    suite, agent=agent_v3, scorers=SCORERS,
    store_dir="runs/blog-writer", run_id="v3", label="cheaper model",
)
diff = current.diff(load_run("runs/blog-writer/v2.json"))
print(diff.summary())
#   has_call_to_action  100.0% -> 100.0%    +0.0
#   mentions_topic      100.0% ->  80.0%   -20.0   REGRESSION
print(diff.regressions)   # [('mentions_topic', 'task-3')]
```

The cheaper model dropped the topic on one task. Now you can **decide deliberately**: accept the 20-point drop in `mentions_topic` for the cost saving, or adjust (sharpen the prompt, or keep the old model for that case). Either way you knew *before* merging - and in CI, `assert not diff.regressions` would have blocked it for you.

The folder now holds `v1.json`, `v2.json`, `v3.json` - a complete, comparable history. Diff any pair to see how the agent moved between two points in its life, or hand the JSONs to a dashboard later.
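To make the compare step concrete, here is a minimal pure-Python sketch of how a run-to-run diff and its regression list could be computed. It is a toy model, not the library's `diff()` implementation: the `diff_runs` helper and the `{scorer: {task_id: passed}}` layout are assumptions made for illustration, and it assumes both runs used the same suite and scorers.

```python
# Toy model of a run-to-run diff; the real diff() in autogen.beta.eval may differ.
def diff_runs(previous, current):
    """Compare two runs shaped as {scorer_name: {task_id: passed}}."""
    rows, regressions = [], []
    for scorer in sorted(current):
        old, new = previous[scorer], current[scorer]
        old_rate = 100.0 * sum(old.values()) / len(old)
        new_rate = 100.0 * sum(new.values()) / len(new)
        rows.append((scorer, old_rate, new_rate, new_rate - old_rate))
        # A regression is a task that passed before and fails now.
        regressions += [(scorer, t) for t in sorted(new) if old.get(t) and not new[t]]
    return rows, regressions


# Hypothetical data: five tasks, one of which starts failing in v3.
v2 = {"mentions_topic": {f"task-{i}": True for i in range(1, 6)}}
v3 = {"mentions_topic": {**v2["mentions_topic"], "task-3": False}}

rows, regressions = diff_runs(v2, v3)
print(rows)
print(regressions)
```

Because the regression list names the exact task, an assertion on it in CI points straight at what broke.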

## Where to next

- **[Runs](runs.md)** - producing the runs you persist here.
- **[Variants](variants.md)** - compare several builds in a single run instead of across iterations.

---

# Files API

Source: https://docs.ag2.ai/latest/docs/beta/advanced/files/

The Beta Files API provides a provider-agnostic interface for uploading, listing, reading, and deleting files used by multimodal workflows. It wraps each provider's native files endpoint behind a single async client.

## When to use Files API

Use `FilesAPI` when you want to:

- Upload large assets once, then reference them by `file_id`
- Manage file lifecycle in your app (list and delete stale files)
- Keep file handling logic consistent across supported providers

## Supported providers

`FilesAPI` is available for:

- `OpenAIConfig`
- `OpenAIResponsesConfig`
- `AnthropicConfig`
- `GeminiConfig`
- `XAIConfig`

!!! note
    Gemini does not support downloading file bytes via its Files API. `FilesAPI.read()` raises `NotImplementedError` for Gemini.

## Create a Files API client

```python
from autogen.beta import FilesAPI
from autogen.beta.config import OpenAIResponsesConfig

config = OpenAIResponsesConfig(
    model="gpt-5-mini",
    api_key="YOUR_API_KEY",
)
files = FilesAPI(config)
```

## Upload files

You can upload from a local path or from in-memory bytes.

### Upload from local path

```python
uploaded = await files.upload(path="report.pdf", purpose="assistants")
print(uploaded.file_id)
```

### Upload from bytes

```python
content = b"hello from ag2 beta"
uploaded = await files.upload(
    data=content,
    filename="hello.txt",
    purpose="assistants",
)
print(uploaded.file_id)
```

If `data` is provided without `filename`, `upload()` raises `ValueError`.

## Read, list, and delete

```python
# list all uploaded files for this provider/account
all_files = await files.list()

# download bytes for one file (not supported by Gemini)
file_data = await files.read(all_files[0].file_id)
print(file_data.name, len(file_data.data), file_data.media_type)

# delete by file ID
await files.delete(all_files[0].file_id)
```

You can also call `read()` from an `UploadedFile`:

```python
uploaded = await files.upload(path="report.pdf")
content = await uploaded.read(files)
```
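As a sketch of the lifecycle management mentioned earlier, the helper below deletes every listed file except an allow-list. It relies only on the calls shown above (`list()`, `delete(file_id)`, and the `file_id` attribute); `delete_except`, `FakeFiles`, and `FakeUploadedFile` are hypothetical names, and the fakes exist only so the snippet runs without a provider account.

```python
import asyncio
from dataclasses import dataclass


@dataclass
class FakeUploadedFile:
    """Stand-in that exposes only the `file_id` attribute."""
    file_id: str


class FakeFiles:
    """In-memory stand-in for a FilesAPI client (illustration only)."""

    def __init__(self, ids):
        self._ids = list(ids)

    async def list(self):
        return [FakeUploadedFile(i) for i in self._ids]

    async def delete(self, file_id):
        self._ids.remove(file_id)


async def delete_except(files, keep_ids):
    """Delete every listed file whose ID is not in keep_ids; return deleted IDs."""
    deleted = []
    for item in await files.list():
        if item.file_id not in keep_ids:
            await files.delete(item.file_id)
            deleted.append(item.file_id)
    return deleted


files = FakeFiles(["file-a", "file-b", "file-c"])
deleted = asyncio.run(delete_except(files, keep_ids={"file-b"}))
print(deleted)
```

With a real client you would pass the `FilesAPI` instance in place of `FakeFiles`; whether a provider paginates `list()` is not covered here.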

## Use uploaded files in agent requests

After upload, pass the returned `file_id` to an input event.

```python
from autogen.beta import Agent
from autogen.beta.events import DocumentInput

agent = Agent(
    "assistant",
    config=config,
)

uploaded = await files.upload(path="report.pdf")
doc = DocumentInput(file_id=uploaded.file_id)

reply = await agent.ask("Summarize this report.", doc)
print(reply.body)
```

For more multimodal details, see [Multimodal Inputs](../multimodal/inputs.md).

---

# Events Streaming

Source: https://docs.ag2.ai/latest/docs/beta/advanced/stream/

## What is the Stream?

The **Stream** in **AG2 Beta** is a central event bus that facilitates communication between agents and system components. It operates on an event-driven architecture where components can publish events (derived from `BaseEvent`) and other components can subscribe to these events.

The Stream manages the flow of messages, actions, and tool calls, allowing for decoupled and scalable architectures. You interact with the Stream by publishing events to it and subscribing to specific events you want to listen to.

## Passing a custom stream to an Agent

By default, agents create a `MemoryStream` instance internally for each conversation.

However, you can pass a custom stream to an agent when calling its `ask` method. This allows you to set up subscribers on the stream before the conversation starts, letting you observe or intercept the agent's internal events.

```python
from autogen.beta import Agent, MemoryStream
from autogen.beta.events import ModelRequest, ModelResponse

agent = Agent("my_agent")

my_stream = MemoryStream()

# Subscribe to see what the model requests and returns
@my_stream.where(ModelRequest | ModelResponse).subscribe()
def log_model_activity(event):
    print(f"[{event.__class__.__name__}] {event}")

# Pass the stream to the agent
response = await agent.ask("Hello!", stream=my_stream)
```

## How to subscribe to stream events

You can listen to events flowing through the stream using the `subscribe` method. To filter which events you receive, you can use the `where` method.

!!! note
    All event subscribers and interrupters support the same powerful execution context capabilities as [Agent Tools](../tools/approval_required.md).

    By type-hinting the `Context` object (or using `Depends`, `Inject`, `Variable`), your subscribers can access injected dependencies, interact with human-in-the-loop flows, or access conversation variables.

    For more detailed information on specific context features, see [Dependency Injection](../context/inject.md), [Context Variables](../context/variables.md), [Depends](../depends.md), [Human-in-the-loop](../context/human_in_the_loop.md).

### Subscribe to events of a specific type

To subscribe to events of a specific type, pass the event class to `stream.where()`.

```python
from autogen.beta import MemoryStream
from autogen.beta.events import ToolCallEvent

stream = MemoryStream()

def handle_tool_call(event: ToolCallEvent):
    print(f"Tool called: {event.name}")

# Subscribe only to ToolCallEvent events
stream.where(ToolCallEvent).subscribe(handle_tool_call)
```

### Subscribe to multiple event types

You can subscribe to multiple event types using the bitwise OR operator (`|`).

```python
from autogen.beta.events import ToolCallEvent, ModelMessage

def handle_event(event):
    print(f"Received event: {event}")

# Subscribe to either ToolCallEvent or ModelMessage events
stream.where(ToolCallEvent | ModelMessage).subscribe(handle_event)
```

### Exclude events of a specific type

You can negate an event type using the bitwise NOT operator (`~`). This creates a condition that matches everything *except* the specified type.

```python
# Subscribe to all events except ToolCallEvent
stream.where(~ToolCallEvent).subscribe(handle_event)
```

### Subscribe to events with a specific value

You can filter events based on the value of their fields. The event classes define fields that support comparison operators.

```python
# Subscribe only to ToolCallEvent events where the name is "fetch_data"
stream.where(ToolCallEvent.name == "fetch_data").subscribe(handle_event)
```
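The `|`, `~`, and `==` filters above all build condition objects that the stream evaluates against each event. The following toy model shows one way such composable conditions can be built with Python operator overloading. It is not AG2's implementation: `Condition`, `is_type`, `field_equals`, and the two event classes are hypothetical stand-ins.

```python
from dataclasses import dataclass


class Condition:
    """Toy predicate that composes with | and ~ (not AG2's implementation)."""

    def __init__(self, test):
        self.test = test

    def __call__(self, event):
        return self.test(event)

    def __or__(self, other):
        return Condition(lambda e: self(e) or other(e))

    def __invert__(self):
        return Condition(lambda e: not self(e))


def is_type(cls):
    return Condition(lambda e: isinstance(e, cls))


def field_equals(cls, name, value):
    return Condition(lambda e: isinstance(e, cls) and getattr(e, name) == value)


@dataclass
class ToolCall:  # hypothetical event type
    name: str


@dataclass
class Message:  # hypothetical event type
    content: str


events = [ToolCall("fetch_data"), ToolCall("search"), Message("hi")]
either = is_type(ToolCall) | is_type(Message)
not_tool = ~is_type(ToolCall)
fetch_only = field_equals(ToolCall, "name", "fetch_data")

print([not_tool(e) for e in events])
print([fetch_only(e) for e in events])
```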

## Static subscribers with decorators

You can use `subscribe` as a decorator to register static handlers cleanly. This works well with `where` filters.

```python
stream = MemoryStream()

@stream.where(ToolCallEvent).subscribe()
def on_tool_call(event: ToolCallEvent):
    print(f"Handling tool call: {event.name}")

@stream.subscribe()
def on_any_event(event):
    print(f"Global logger: {event}")
```

## Dynamic subscribers by context manager

If you only need to listen to events for a specific duration or within a specific block of code, you can use `sub_scope` as a context manager. This dynamically subscribes the handler when entering the block and unsubscribes when exiting.

```python
def temp_listener(event):
    print("Temporary event captured:", event)

# temp_listener is active only inside the with block
with stream.sub_scope(temp_listener):
    # Perform actions that might trigger events
    pass
```

You can also combine this with `where`:

```python
with stream.where(ModelMessage).sub_scope(temp_listener):
    pass
```

## Get specific events

To wait for and get the next occurrence of a specific event asynchronously, use the `get` async context manager. This yields a future that resolves to the matched event.

```python
from autogen.beta.events import HumanMessage

async def wait_for_human(stream: MemoryStream):
    async with stream.get(HumanMessage) as response:
        # Code here can trigger the event, or we just wait
        event = await response
        print(f"User said: {event.content}")
```

## Raise events manually

To raise or publish events to the stream, you should always use the `Context` object rather than sending them to the stream directly. The `Context` ensures that dependencies and the stream scope are properly propagated.

```python
from autogen.beta import Context
from autogen.beta.events import ModelMessage

async def publish_message(context: Context):
    event = ModelMessage(content="Hello from the agent!")
    # Always use context.send() to raise events
    await context.send(event)
```

## Events Interrupters

Interrupters allow you to intercept an event before it reaches regular subscribers. You can use an interrupter to modify the event, raise a completely different event in its place, or suppress it entirely.

To register an interrupter, pass `interrupt=True` to the `subscribe` method. If the interrupter returns an event, that event replaces the original one for subsequent interrupters and subscribers. If it returns `None`, the event is suppressed and propagation stops.

```python
from autogen.beta import Context
from autogen.beta.events import BaseEvent, ModelMessage

class AlertEvent(BaseEvent):
    """Illustrative custom event (see "Custom events" below)."""
    message: str

@stream.where(ModelMessage).subscribe(interrupt=True)
async def intercept_message(
    event: ModelMessage,
    context: Context,
) -> BaseEvent | None:
    if "secret" in event.content:
        # Suppress the event by returning None
        return None

    if "alert" in event.content:
        # Raise a different event, then suppress the original
        await context.send(AlertEvent(message=event.content))
        return None

    # Otherwise modify and return the original event
    event.content = event.content.upper()
    return event
```
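The propagation rule (a returned event replaces the original, `None` stops delivery) can be pictured as a small dispatch loop. This is a simplified synchronous sketch with plain strings standing in for events; `dispatch`, `drop_secrets`, and `shout` are hypothetical names, and the real stream is asynchronous.

```python
def dispatch(event, interrupters, subscribers):
    """Run interrupters in order, then deliver the surviving event."""
    for interrupter in interrupters:
        event = interrupter(event)
        if event is None:  # suppressed: nothing downstream sees it
            return None
    for subscriber in subscribers:
        subscriber(event)
    return event


def drop_secrets(text):
    return None if "secret" in text else text


def shout(text):
    return text.upper()


received = []
dispatch("hello", [drop_secrets, shout], [received.append])
dispatch("a secret", [drop_secrets, shout], [received.append])
print(received)
```

Only the first event reaches the subscriber, and it arrives already modified by the second interrupter.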

## RedisStream - Persistent & Cross-Process Events

`RedisStream` is a drop-in replacement for `MemoryStream` that adds **persistent event history** and **cross-process pub/sub** via Redis. Events are delivered to all subscribers - even across different processes or machines.

```bash
pip install "ag2[redis]"
```

### Basic Usage

```python
from autogen.beta import Agent
from autogen.beta.streams.redis import RedisStream
from autogen.beta.config import OpenAIConfig

stream = RedisStream("redis://localhost:6379")

agent = Agent(
    "assistant",
    prompt="You are a helpful assistant.",
    config=OpenAIConfig("gpt-4o-mini"),
)

reply = await agent.ask("Hello!", stream=stream)

# History is persisted in Redis and survives restarts
history = list(await stream.history.get_events())
```

All the same subscription patterns (`subscribe`, `where`, `sub_scope`, `get`, interrupters) work exactly as with `MemoryStream`.

### Serialization Format

By default, events are serialized as **JSON** for human readability. You can switch to **pickle** for full Python object fidelity:

```python
from autogen.beta.streams.redis import RedisStream, Serializer

# JSON (default) - readable in Redis tools
stream = RedisStream("redis://localhost:6379")

# Pickle - preserves exact Python types
stream = RedisStream("redis://localhost:6379", serializer=Serializer.PICKLE)
```

### Cross-Process Communication

Multiple `RedisStream` instances sharing the same `id` automatically receive each other's events via Redis Pub/Sub:

```python
from uuid import UUID
from autogen.beta.streams.redis import RedisStream

STREAM_ID = UUID("aaaaaaaa-bbbb-cccc-dddd-eeeeeeeeeeee")

# Process A
stream_a = RedisStream("redis://localhost:6379", id=STREAM_ID)

# Process B (separate Python process)
stream_b = RedisStream("redis://localhost:6379", id=STREAM_ID)

# Events sent on stream_a are received by subscribers on stream_b, and vice versa
```

### Cleanup

Always close the stream when done to release Redis connections:

```python
await stream.close()
```

## Custom events

You can create your own custom events by subclassing `BaseEvent`. Because of its metaclass, field definitions automatically support value-based filtering in `where` clauses.

```python
from autogen.beta.events import BaseEvent

class PaymentProcessed(BaseEvent):
    amount: float
    status: str

# You can now filter by fields on your custom event:
@stream.where(PaymentProcessed.status == "success").subscribe()
def handle_success(event: PaymentProcessed):
    print(f"Payment processed: {event.amount}")

# And raise them via Context
async def process_payment(context: Context):
    await context.send(PaymentProcessed(amount=100.50, status="success"))
```

---

# Observers

Source: https://docs.ag2.ai/latest/docs/beta/advanced/observers/

Observers let you attach lightweight, read-only event listeners directly to an Agent. Under the hood, each observer is a regular [Stream subscriber](stream.md) - but you register it on the Agent instead of managing the stream yourself.

Use observers when you want to monitor agent behavior (logging, metrics, debugging) without writing a full [Middleware](../middleware.md) class.

## Creating an Observer

Use the `observer()` function to pair an event condition with a callback. The first argument is the event type (or condition) to match; the second is an optional callback. When the callback is omitted, as below, `observer()` acts as a decorator:

```python
from autogen.beta import observer
from autogen.beta.events import ModelResponse

@observer(ModelResponse)
async def log_response(event: ModelResponse) -> None:
    print(f"Model said: {event.content}")
```

## Registering Observers

### On the Agent constructor

Pass observers when creating the agent. These observers are active for every `agent.ask()` call:

```python
from autogen.beta import Agent, observer
from autogen.beta.config import OpenAIConfig
from autogen.beta.events import ModelResponse

@observer(ModelResponse)
def on_response(event: ModelResponse) -> None:
    print(f"Response: {event.content}")

agent = Agent(
    "assistant",
    config=OpenAIConfig("gpt-4o-mini"),
    observers=[on_response],
)

reply = await agent.ask("Hello!")
```

### With the decorator method

Use `@agent.observer()` to register an observer after agent creation. This mirrors the `@agent.tool()` and `@agent.prompt()` patterns:

```python
from autogen.beta import Agent
from autogen.beta.config import OpenAIConfig
from autogen.beta.events import ModelResponse

agent = Agent(
    "assistant",
    config=OpenAIConfig("gpt-4o-mini"),
)

@agent.observer(ModelResponse)
def on_response(event: ModelResponse) -> None:
    print(f"Response: {event.content}")
```

### Per-call observers

Pass observers to a specific `ask()` call. These are scoped to that call only and are automatically cleaned up when the call finishes:

```python
reply = await agent.ask(
    "What is 2+2?",
    observers=[on_response],
)
```

Constructor-level and per-call observers can be combined - both will fire:

```python
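from autogen.beta import Agent, observer
from autogen.beta.events import ModelResponse

# `config`, `log_to_file`, and `send_metric` are defined elsewhere in your code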

agent = Agent(
    "assistant",
    config=config,
    observers=[observer(ModelResponse, log_to_file)],
)

# log_to_file fires AND send_metric fires for this call
reply = await agent.ask(
    "Hello!",
    observers=[observer(ModelResponse, send_metric)],
)
```

## Event Filtering

Observers support the same condition system as [stream subscriptions](stream.md). You can filter by event type, combine types, or match on field values:

```python
from autogen.beta.events import ModelRequest, ModelResponse, ToolCallEvent

# Single event type
@agent.observer(ModelResponse)
def on_response(event: ModelResponse) -> None:
    print(f"Response: {event.content}")

# Multiple event types with OR
@agent.observer(ModelRequest | ModelResponse)
def on_any(event: ModelRequest | ModelResponse) -> None:
    print(f"Event: {event}")

# Field-based filtering
@agent.observer(ToolCallEvent.name == "search")
def on_search(event: ToolCallEvent) -> None:
    print(f"Search: {event.name}")

# Negation - everything except a specific type
@agent.observer(~ToolCallEvent)
def on_non_tool(event) -> None:
    print(f"Non-tool event: {event}")
```

## Dependency Injection

Observer callbacks support the same dependency injection features as [Agent Tools](../tools/tools.md) and [stream subscribers](stream.md) - `Context`, `Inject`, `Depends`, and `Variable`:

```python
from autogen.beta import Context, observer
from autogen.beta.events import ModelResponse

@observer(ModelResponse)
async def track(event: ModelResponse, ctx: Context) -> None:
    print(f"Stream: {ctx.stream.id}, Response: {event.content}")
```

## Advanced Options

Since observers are stream subscribers, they support the same `interrupt` and `sync_to_thread` parameters as `stream.subscribe()`:

```python
# Async observer - disable sync_to_thread since it's already async
@observer(ModelResponse, sync_to_thread=False)
async def async_tracker(event: ModelResponse) -> None:
    await metrics.record(event)

# Interrupter - processes before regular subscribers and can modify events
@observer(ModelResponse, interrupt=True)
def intercept(event: ModelResponse) -> ModelResponse:
    event.content = event.content.upper()
    return event
```

## Observers vs Middleware vs Stream Subscribers

| Feature | Observer | Middleware | Stream Subscriber |
|---|---|---|---|
| Registration | On Agent | On Agent | On Stream |
| Lifecycle | Scoped to execution | Scoped to execution | Manual |
| Boilerplate | Minimal - one function | Class with factory | Low - one function |
| Can modify events | Only with `interrupt=True` | Yes (wraps execution) | Only with `interrupt=True` |
| DI support | Yes | Yes | Yes |
| Use case | Monitoring, logging, metrics | Cross-cutting logic (retry, auth, rate limiting) | Low-level event wiring |

## Trigger-Driven Observers (BaseObserver)

The `observer()` factory above is perfect for one-off event hooks. But when you need **stateful monitoring** - detecting repeated tool calls, tracking cumulative token usage, rolling time-window metrics - subclass `BaseObserver`.

A `BaseObserver` is an ABC that pairs a [Watch](watches.md) with a `process()` method. The Watch decides *when* to fire; `process()` decides *what to do* with the collected events. If `process()` returns an `ObserverAlert`, the base class emits it back onto the stream for other subscribers to consume.

### BaseObserver vs `@observer`

| | `@observer` (StreamObserver) | `BaseObserver` |
|---|---|---|
| Shape | Function | Class |
| State | Stateless | Instance state (counters, history, etc.) |
| Trigger | Per matching event | Any `Watch` (event / batch / time / composite) |
| Output | Whatever your callback does | Optional `ObserverAlert` auto-emitted on the stream |
| Good for | Logging, metrics, one-offs | Thresholds, rate limits, loop detection, rolling stats |

### Built-in observers

Two ready-to-use `BaseObserver` subclasses ship with AG2 Beta:

#### LoopDetector

Detects repetitive tool-call patterns. Maintains a sliding window of recent tool calls and emits a `WARNING` alert when `repeat_threshold` consecutive identical calls (same tool name and arguments) are observed.

```python
from autogen.beta import Agent
from autogen.beta.observer import LoopDetector
from autogen.beta.config import OpenAIConfig

agent = Agent(
    "poller",
    config=OpenAIConfig(model="gpt-5"),
    observers=[LoopDetector(window_size=10, repeat_threshold=3)],
)
```
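To illustrate the detection rule described above, here is a minimal sketch of a sliding window that flags `repeat_threshold` consecutive identical calls. It is a toy model rather than the `LoopDetector` source: `ToyLoopDetector` and its `record` method are hypothetical, and argument values are assumed to be hashable.

```python
from collections import deque


class ToyLoopDetector:
    """Flags `repeat_threshold` consecutive identical (name, args) calls."""

    def __init__(self, window_size=10, repeat_threshold=3):
        self.calls = deque(maxlen=window_size)  # old calls fall off the left
        self.repeat_threshold = repeat_threshold

    def record(self, name, args):
        # Sort the items so argument order does not affect equality.
        self.calls.append((name, tuple(sorted(args.items()))))
        tail = list(self.calls)[-self.repeat_threshold:]
        return len(tail) == self.repeat_threshold and len(set(tail)) == 1


detector = ToyLoopDetector(window_size=10, repeat_threshold=3)
flags = [
    detector.record("poll", {"job": 7}),
    detector.record("poll", {"job": 7}),
    detector.record("poll", {"job": 7}),  # third identical call in a row
    detector.record("poll", {"job": 8}),  # different args break the streak
]
print(flags)
```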

#### TokenMonitor

Tracks cumulative token usage across `ModelResponse` and `TaskCompleted` events. Emits `WARNING` / `CRITICAL` alerts as thresholds are crossed.

```python
from autogen.beta import Agent
from autogen.beta.observer import TokenMonitor
from autogen.beta.config import OpenAIConfig

agent = Agent(
    "assistant",
    config=OpenAIConfig(model="gpt-5"),
    observers=[TokenMonitor(warn_threshold=50_000, alert_threshold=100_000)],
)
```
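The threshold behaviour can be sketched as a running total that reports each severity once, the first time it is crossed. This is an illustration under assumptions: that `warn_threshold` maps to `WARNING` and `alert_threshold` to `CRITICAL`, and that each level fires only once. `ToyTokenMonitor` is a hypothetical class, not the shipped `TokenMonitor`.

```python
class ToyTokenMonitor:
    """Accumulates token counts and reports a severity once per threshold."""

    def __init__(self, warn_threshold, alert_threshold):
        self.warn_threshold = warn_threshold
        self.alert_threshold = alert_threshold
        self.total = 0
        self._level = None  # highest severity already reported

    def add(self, tokens):
        """Add usage; return a severity the first time a threshold is crossed."""
        self.total += tokens
        if self.total >= self.alert_threshold:
            level = "CRITICAL"
        elif self.total >= self.warn_threshold:
            level = "WARNING"
        else:
            level = None
        if level is None or level == self._level:
            return None
        self._level = level
        return level


monitor = ToyTokenMonitor(warn_threshold=50_000, alert_threshold=100_000)
alerts = [monitor.add(n) for n in (30_000, 30_000, 30_000, 30_000)]
print(alerts)
```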

### ObserverAlert

Both built-ins (and any `BaseObserver` you write) emit `ObserverAlert` events on the stream. Subscribe to them like any other event:

```python
from autogen.beta import MemoryStream
from autogen.beta.events import ObserverAlert

stream = MemoryStream()

@stream.where(ObserverAlert).subscribe()
def surface_alerts(event: ObserverAlert) -> None:
    print(f"[{event.severity}] {event.source}: {event.message}")
```

!!! note
    `ObserverAlert` is emitted to the stream and persisted in history, but **is not rendered back to the LLM** by the default provider mappers. If you want the agent itself to react to alerts, write a [Middleware](../middleware.md) that converts alerts into follow-up messages.

Severity levels live in `autogen.beta.events.Severity`: `INFO`, `WARNING`, `CRITICAL`, `FATAL`.

### Building a custom observer

Subclass `BaseObserver`, pick a [Watch](watches.md), implement `process()`:

```python
from autogen.beta import Context
from autogen.beta.observer import BaseObserver
from autogen.beta.watch import CadenceWatch
from autogen.beta.events import BaseEvent, ModelResponse, ObserverAlert, Severity

class AvgCompletionObserver(BaseObserver):
    """Every N responses, emit an INFO alert with the average completion-token count."""

    def __init__(self, window: int = 5) -> None:
        super().__init__("avg-completion", watch=CadenceWatch(n=window, condition=ModelResponse))
        self._window = window

    async def process(self, events: list[BaseEvent], ctx: Context) -> ObserverAlert | None:
        tokens = [e.usage.completion_tokens for e in events if isinstance(e, ModelResponse) and e.usage]
        if not tokens:
            return None
        return ObserverAlert(
            source=self.name,
            severity=Severity.INFO,
            message=f"Avg completion tokens over last {self._window} responses: {sum(tokens) / len(tokens):.0f}",
        )
```

Register it like any other observer:

```python
from autogen.beta import Agent
from autogen.beta.config import OpenAIConfig

agent = Agent(
    "assistant",
    config=OpenAIConfig(model="gpt-5"),
    observers=[AvgCompletionObserver(window=5)],
)
```

`process()` may also emit events directly via `await ctx.send(...)` - returning an `ObserverAlert` is just the common case.

---

# Watches

Source: https://docs.ag2.ai/latest/docs/beta/advanced/watches/

A **Watch** is a reactive trigger primitive. You arm it on a [Stream](stream.md), and it fires a callback when its condition is met - where "condition" can be an event match, a count, a time window, a schedule, or a composition of other watches.

Watches are the mechanism behind trigger-driven [Observers](observers.md), but they can be used standalone for custom reactive logic on any Stream.

## When to use a Watch

Use a Watch when a simple `stream.subscribe(...)` is not enough - i.e. when you need buffering, timing, ordering, or composition:

| You need | Use |
|---|---|
| Run a callback on every matching event | `stream.where(...).subscribe(fn)` (no Watch needed) |
| Fire every N events | `CadenceWatch(n=N, condition=...)` |
| Collect events over a time window | `CadenceWatch(max_wait=seconds, condition=...)` |
| Fire on "N events OR T seconds, whichever first" | `CadenceWatch(n=N, max_wait=seconds, condition=...)` |
| Fire once after a delay | `DelayWatch(seconds)` |
| Fire on a schedule | `IntervalWatch(seconds)` or `CronWatch(expr)` |
| Wait for two separate events in any order | `AllOf(w1, w2)` |
| Wait for an ordered sequence | `Sequence(w1, w2, ...)` |
| Fire on the earliest of several triggers | `AnyOf(w1, w2, ...)` |

## Anatomy of a Watch

Every Watch implements the same `Watch` protocol (importable from `autogen.beta`):

```python
from typing import Protocol

class Watch(Protocol):
    @property
    def id(self) -> str: ...
    @property
    def is_armed(self) -> bool: ...

    def arm(self, stream, callback) -> None: ...
    def disarm(self) -> None: ...
```

The callback signature is uniform across all Watch kinds:

```python
from autogen.beta import Context
from autogen.beta.events import BaseEvent

async def callback(events: list[BaseEvent], ctx: Context) -> None: ...
```

For event-driven watches (`EventWatch`, `CadenceWatch`, `Sequence`), `events` contains the matched events. For time-driven watches (`DelayWatch`, `IntervalWatch`, `CronWatch`), `events` is empty - the trigger is the timer itself.

## Event-driven Watches

### EventWatch

Fires immediately on each matching event.

```python
from autogen.beta import MemoryStream
from autogen.beta.watch import EventWatch
from autogen.beta.events import ModelResponse

stream = MemoryStream()
watch = EventWatch(ModelResponse)

async def on_response(events, ctx):
    print(f"Model responded: {events[0].content}")

watch.arm(stream, on_response)
```

`EventWatch` also supports field conditions and negation:

```python
from autogen.beta.watch import EventWatch
from autogen.beta.events import ToolCallEvent

watch = EventWatch(ToolCallEvent.name == "search")

# Fire on every event that is NOT a ToolCallEvent
watch = EventWatch(~ToolCallEvent)
```

### CadenceWatch

Buffers matching events and fires when the buffer reaches `n` events (**size**), when `max_wait` seconds have elapsed since the first buffered event (**time**), or on whichever comes first when both are set. At least one of `n` and `max_wait` is required.

```python
from autogen.beta.watch import CadenceWatch
from autogen.beta.events import ModelResponse

# Count-only: fire every 5 model responses
batch = CadenceWatch(n=5, condition=ModelResponse)

# Time-only: flush the buffer 60 seconds after its first event arrives
window = CadenceWatch(max_wait=60.0, condition=ModelResponse)

# Size OR time: fire at 5 events OR 60 seconds after the first event, whichever first
hybrid = CadenceWatch(n=5, max_wait=60.0, condition=ModelResponse)

async def summarize(events, ctx):
    print(f"Batched {len(events)} responses")
```

The timer starts on the **first** buffered event in a cadence - a quiet stream produces no firings. Any unfilled buffer at disarm-time is discarded.
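The size-or-time rule can be sketched without real timers by passing timestamps in explicitly. This is a toy model of the buffering logic only: `ToyCadence`, `add`, and `tick` are hypothetical names, and a real `CadenceWatch` schedules its own timer instead of being polled.

```python
class ToyCadence:
    """Buffer events; flush at size `n` or `max_wait` seconds after the first one."""

    def __init__(self, n=None, max_wait=None):
        if n is None and max_wait is None:
            raise ValueError("at least one of n and max_wait is required")
        self.n, self.max_wait = n, max_wait
        self.buffer, self.started_at = [], None

    def _flush(self):
        batch, self.buffer, self.started_at = self.buffer, [], None
        return batch

    def add(self, event, now):
        if not self.buffer:
            self.started_at = now  # timer starts on the first buffered event
        self.buffer.append(event)
        if self.n is not None and len(self.buffer) >= self.n:
            return self._flush()
        return None

    def tick(self, now):
        """Call periodically; flushes once the time window has elapsed."""
        expired = (
            self.buffer
            and self.max_wait is not None
            and now - self.started_at >= self.max_wait
        )
        return self._flush() if expired else None


hybrid = ToyCadence(n=3, max_wait=60.0)
quiet = hybrid.tick(now=500.0)        # empty buffer: a quiet stream never fires
hybrid.add("a", now=1000.0)
by_time = hybrid.tick(now=1060.0)     # 60 s after the first buffered event
by_size = [hybrid.add(e, now=2000.0) for e in "xyz"][-1]
print(quiet, by_time, by_size)
```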

## Time-driven Watches

### DelayWatch

Fires exactly once after a delay, then auto-disarms.

```python
from autogen.beta.watch import DelayWatch

watch = DelayWatch(30.0)

async def timeout_guard(events, ctx):
    # events is always [] for time-driven watches
    print("30 seconds elapsed")

watch.arm(stream, timeout_guard)
```

### IntervalWatch

Fires periodically at a fixed interval until disarmed.

```python
from autogen.beta.watch import IntervalWatch

watch = IntervalWatch(60.0)

async def heartbeat(events, ctx):
    print("tick")
```

### CronWatch

Fires on a standard 5-field cron expression.

```python
from autogen.beta.watch import CronWatch

watch = CronWatch("0 9 * * MON")  # every Monday at 9am
```

Supports `*`, `*/n`, `a-b`, comma lists, and `SUN`-`SAT` day-of-week names.
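For readers unfamiliar with the syntax, the sketch below expands each of the five fields (minute, hour, day of month, month, day of week) into a set of allowed integers and checks a `datetime` against them. It is a simplified illustration, not `CronWatch`'s parser: `expand` and `matches` are hypothetical helpers, all five fields are simply ANDed together, and Sunday is taken as day 0.

```python
from datetime import datetime

DAYS = {"SUN": 0, "MON": 1, "TUE": 2, "WED": 3, "THU": 4, "FRI": 5, "SAT": 6}
RANGES = [(0, 59), (0, 23), (1, 31), (1, 12), (0, 6)]  # min hour dom month dow


def _value(token):
    return DAYS[token] if token in DAYS else int(token)


def expand(field, low, high):
    """Expand one cron field into the set of integers it allows."""
    allowed = set()
    for part in field.split(","):
        if part == "*":
            allowed.update(range(low, high + 1))
        elif part.startswith("*/"):
            allowed.update(range(low, high + 1, int(part[2:])))
        elif "-" in part:
            start, end = part.split("-")
            allowed.update(range(_value(start), _value(end) + 1))
        else:
            allowed.add(_value(part))
    return allowed


def matches(expr, when):
    """Return True if the datetime satisfies every field of the expression."""
    fields = expr.split()
    if len(fields) != 5:
        raise ValueError("expected 5 fields")
    sets = [expand(f, lo, hi) for f, (lo, hi) in zip(fields, RANGES)]
    cron_dow = (when.weekday() + 1) % 7  # datetime: Monday=0; cron: Sunday=0
    actual = [when.minute, when.hour, when.day, when.month, cron_dow]
    return all(value in allowed for value, allowed in zip(actual, sets))


monday_9am = datetime(2025, 1, 6, 9, 0)  # 6 January 2025 was a Monday
print(matches("0 9 * * MON", monday_9am))
```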

## Composite Watches

### AllOf

Fires once when **every** sub-watch has fired at least once.

```python
from autogen.beta.watch import AllOf, EventWatch
from autogen.beta.events import ModelResponse, ToolCallEvent

watch = AllOf(
    EventWatch(ModelResponse),
    EventWatch(ToolCallEvent),
)

async def both_seen(events, ctx):
    # Gets the combined events from all sub-watches
    print(f"Saw both types, got {len(events)} events total")
```

After firing, the gate resets - every sub-watch must fire again before the next firing.

### AnyOf

Fires on **any** sub-watch, every time.

```python
from autogen.beta.watch import AnyOf, EventWatch
from autogen.beta.events import ObserverAlert

watch = AnyOf(
    EventWatch(ObserverAlert.severity == "critical"),
    EventWatch(ObserverAlert.severity == "fatal"),
)
```

### Sequence

Fires when sub-watches trigger **in order**. Each sub-watch is armed only after the previous has fired.

```python
from autogen.beta.watch import Sequence, EventWatch
from autogen.beta.events import ModelRequest, ModelResponse

watch = Sequence(
    EventWatch(ModelRequest),
    EventWatch(ModelResponse),
)

async def round_trip(events, ctx):
    # events contains one ModelRequest followed by one ModelResponse
    print("round-trip complete")
```

After the last sub-watch fires, the sequence resets.
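The difference between the `AllOf` gate and an ordered `Sequence` can be seen in a small state-machine sketch. Both classes are hypothetical toy models keyed by trigger name rather than by sub-watch; they only mirror the fire-and-reset behaviour described above.

```python
class ToyAllOf:
    """Fires once every named trigger has been seen, then resets."""

    def __init__(self, *names):
        self.names, self.seen = set(names), {}

    def trigger(self, name, event):
        self.seen.setdefault(name, []).append(event)
        if set(self.seen) == self.names:
            fired = [e for n in self.seen for e in self.seen[n]]
            self.seen = {}  # gate resets after firing
            return fired
        return None


class ToySequence:
    """Fires when triggers arrive in the given order, then resets."""

    def __init__(self, *names):
        self.names, self.position, self.events = names, 0, []

    def trigger(self, name, event):
        if name != self.names[self.position]:
            return None  # only the current step is "armed"
        self.events.append(event)
        self.position += 1
        if self.position == len(self.names):
            fired, self.events, self.position = self.events, [], 0
            return fired
        return None


gate = ToyAllOf("response", "tool")
order = ToySequence("request", "response")

# Arrival order does not matter for the gate...
all_of = [gate.trigger("tool", "t1"), gate.trigger("response", "r1")]
# ...but an early "response" is ignored by the sequence.
seq = [
    order.trigger("response", "early"),
    order.trigger("request", "q1"),
    order.trigger("response", "r1"),
]
print(all_of, seq)
```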

## Emitting events from a callback

A Watch callback can send events back onto the stream - this is how [trigger-driven Observers](observers.md#trigger-driven-observers-baseobserver) emit alerts:

```python
from autogen.beta import MemoryStream
from autogen.beta.watch import DelayWatch
from autogen.beta.events import ObserverAlert, Severity

watch = DelayWatch(30.0)

async def timeout_alert(events, ctx):
    await ctx.send(ObserverAlert(
        source="timeout-watch",
        severity=Severity.WARNING,
        message="Agent has been running for >30s.",
    ))
```

!!! tip
    Emitting events from within a callback means other subscribers and watches can react. This is the foundation for composing reactive workflows.

## Arming and lifecycle

A Watch is a stateful object. Calling `arm()` on an already-armed Watch will first disarm it, so re-arming is safe. `disarm()` cleans up any subscriptions and cancels any timers.

Most of the time you won't call `arm()` / `disarm()` directly - you'll hand the Watch to a [BaseObserver](observers.md#trigger-driven-observers-baseobserver), which manages the lifecycle against the agent's stream.

## Next steps

- Use Watches inside a [BaseObserver](observers.md#trigger-driven-observers-baseobserver) for agent-scoped monitoring.
- See [Stream](stream.md) for event publishing and subscription mechanics.

---

# Knowledge Store

Source: https://docs.ag2.ai/latest/docs/beta/advanced/knowledge_store/

A **`KnowledgeStore`** is a virtual, path-based key-value store with filesystem-like semantics. It gives agents a durable place to persist conversation logs, artifacts, working memory, and any other content that should outlive a single `agent.ask()` call.

All implementations share the same protocol, so you can swap an in-memory store for a SQLite file or Redis instance without changing callers.

## The protocol

The `KnowledgeStore` protocol (importable from `autogen.beta`) defines the full API every implementation satisfies:

```python
from typing import Protocol

class KnowledgeStore(Protocol):
    async def read(self, path: str) -> str | None: ...
    async def write(self, path: str, content: str) -> None: ...
    async def list(self, path: str = "/") -> list[str]: ...
    async def delete(self, path: str) -> None: ...
    async def exists(self, path: str) -> bool: ...

    async def append(self, path: str, content: str) -> int: ...
    async def read_range(self, path: str, start: int, end: int | None = None) -> str: ...

    async def on_change(self, path: str, callback) -> ChangeSubscription: ...
```

Paths follow Unix conventions: absolute (`/dir/subdir/file.txt`), directories are implicit, and `list()` returns immediate children with `/` suffixes for directories.
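Because directories are implicit, a listing has to be derived from the stored paths themselves. The sketch below shows one way to compute immediate children with `/` suffixes from a flat list of paths. `list_children` is a hypothetical helper, not the store's implementation, and the sorted output order is an assumption.

```python
def list_children(paths, directory="/"):
    """Return immediate children of `directory`; sub-directories get a '/' suffix."""
    prefix = directory if directory.endswith("/") else directory + "/"
    children = set()
    for path in paths:
        if not path.startswith(prefix):
            continue
        head, sep, _ = path[len(prefix):].partition("/")
        if head:
            children.add(head + sep)  # sep is '/' only when deeper segments exist
    return sorted(children)


stored = [
    "/SKILL.md",
    "/artifacts/report.md",
    "/artifacts/img/a.png",
    "/log/events.jsonl",
]
print(list_children(stored, "/"))
print(list_children(stored, "/artifacts"))
```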

The `append` / `read_range` pair supports WAL-style workloads: `append` returns the byte offset, which you can later hand to `read_range` to retrieve just the newly written slice.

## Choosing an implementation

| Implementation | Use when |
|---|---|
| `MemoryKnowledgeStore` | Tests, ephemeral sessions, or when persistence isn't needed |
| `SqliteKnowledgeStore` | Single-process durability on disk - the pragmatic default |
| `DiskKnowledgeStore` | Files need to be human-readable on disk (artifacts, logs) |
| `RedisKnowledgeStore` | Multi-process or cross-host sharing |
| `LockedKnowledgeStore` | Wraps another store to serialize concurrent writers |

## Basic usage

### Memory store - fastest, non-persistent

```python
from autogen.beta import MemoryKnowledgeStore

store = MemoryKnowledgeStore()

await store.write("/artifacts/report.md", "# Q3 Summary\n...")
print(await store.read("/artifacts/report.md"))
print(await store.list("/"))  # ['artifacts/']
```

### Sqlite store - persistent across process restarts

```python
from autogen.beta import SqliteKnowledgeStore

store = SqliteKnowledgeStore("/var/agents/alice/knowledge.db")
await store.write("/config/model.txt", "claude-opus-4-7")

# In a later run, the same DB:
store2 = SqliteKnowledgeStore("/var/agents/alice/knowledge.db")
print(await store2.read("/config/model.txt"))  # "claude-opus-4-7"
```

### Disk store - files on the filesystem

```python
from autogen.beta import DiskKnowledgeStore

store = DiskKnowledgeStore("/var/agents/alice/knowledge/")
await store.write("/artifacts/data.json", '{"ok": true}')
# produces /var/agents/alice/knowledge/artifacts/data.json
```

## Append and `read_range`

These two methods support WAL-style event logs, turn-by-turn transcripts, and any append-only workload where you want to retrieve only new content.

```python
off1 = await store.append("/log/events.jsonl", '{"t": 1}\n')
off2 = await store.append("/log/events.jsonl", '{"t": 2}\n')

# Read everything appended in the second write
new_slice = await store.read_range("/log/events.jsonl", off1)
print(new_slice)  # '{"t": 2}\n'
```

!!! warning
    `read_range` returns UTF-8 text but operates on **byte offsets**. If you append multi-byte characters, align offsets to character boundaries yourself.
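
To make the warning concrete, the sketch below shows how character counts and byte counts diverge once a multi-byte character appears, and one way to snap an arbitrary byte offset back to a character boundary. `align_to_char_start` is a hypothetical helper that works on bytes you hold yourself; it is not a store method.

```python
def align_to_char_start(data: bytes, offset: int) -> int:
    """Move `offset` back to the nearest UTF-8 character boundary."""
    offset = max(0, min(offset, len(data)))
    # UTF-8 continuation bytes have the bit pattern 10xxxxxx.
    while 0 < offset < len(data) and (data[offset] & 0b1100_0000) == 0b1000_0000:
        offset -= 1
    return offset


log = "café\n".encode("utf-8")      # 'é' occupies two bytes (0xC3 0xA9)
print(len("café\n"), len(log))      # 5 6
print(align_to_char_start(log, 4))  # 3 - offset 4 falls inside 'é'
```

Offsets returned by `append` already sit on boundaries between whole writes, so alignment only matters when you compute offsets yourself.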

## Change subscriptions

Callers can react to writes via `on_change`. Backends that observe changes efficiently (`DiskKnowledgeStore` using `watchdog`) call the callback directly. Backends that cannot (`MemoryKnowledgeStore`, `SqliteKnowledgeStore`) return a `NoopChangeSubscription` - the caller is expected to poll.

```python
async def on_log_change(path: str) -> None:
    print(f"{path} changed")

sub = await store.on_change("/log/", on_log_change)
# ... later:
await sub.cancel()
```
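
For backends that return a `NoopChangeSubscription`, a polling loop is one way to get similar behaviour. The sketch below is an assumption-laden illustration: `FakeStore` and `poll_for_changes` are hypothetical names, the loop compares full file contents, and it runs for a fixed number of rounds so the example terminates.

```python
import asyncio
from typing import Awaitable, Callable


class FakeStore:
    """Stand-in for a store without native change notifications."""

    def __init__(self) -> None:
        self.files: dict[str, str] = {}

    async def read(self, path: str) -> str:
        return self.files.get(path, "")


async def poll_for_changes(
    store: FakeStore,
    path: str,
    callback: Callable[[str], Awaitable[None]],
    *,
    interval: float = 0.01,
    rounds: int = 5,
) -> None:
    """Invoke `callback(path)` whenever the content differs from the last poll."""
    last = await store.read(path)
    for _ in range(rounds):
        await asyncio.sleep(interval)
        current = await store.read(path)
        if current != last:
            last = current
            await callback(path)


async def main() -> list[str]:
    store = FakeStore()
    seen: list[str] = []

    async def on_log_change(path: str) -> None:
        seen.append(path)

    poller = asyncio.create_task(poll_for_changes(store, "/log/events.jsonl", on_log_change))
    await asyncio.sleep(0)  # let the poller take its baseline read
    store.files["/log/events.jsonl"] = '{"t": 1}\n'
    await poller
    return seen


changes = asyncio.run(main())
print(changes)  # ['/log/events.jsonl']
```

For append-only logs, comparing file length (or the last offset you read) is cheaper than comparing full contents.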

## DefaultBootstrap

`DefaultBootstrap` populates a store with a standard layout and `SKILL.md` files that explain each directory to an LLM reader. It's designed to be called once per agent:

```python
from autogen.beta import DefaultBootstrap, MemoryKnowledgeStore

store = MemoryKnowledgeStore()
await DefaultBootstrap().bootstrap(store, actor_name="alice")

print(await store.list("/"))
# ['SKILL.md', 'artifacts/', 'log/', 'memory/']
```

Resulting layout:

| Path | Purpose |
|---|---|
| `/SKILL.md` | Top-level store description |
| `/log/` | Conversation logs - auto-populated by `EventLogWriter` after each `ask()` unless `KnowledgeConfig.write_event_log=False` |
| `/artifacts/` | User files, downloads, reference material |
| `/memory/` | Working memory and conversation summaries |

Implement your own `StoreBootstrap` if you need a different layout.

## EventLogWriter - persist stream history

`EventLogWriter` serializes a Stream's events to a `KnowledgeStore` as JSONL, and can reconstruct them later. Useful for replay, audit, or multi-run aggregation.

```python
from autogen.beta import Agent, EventLogWriter, MemoryKnowledgeStore, MemoryStream
from autogen.beta.config import OpenAIConfig

stream = MemoryStream()
agent = Agent("assistant", config=OpenAIConfig(model="gpt-5"))
await agent.ask("Hello!", stream=stream)

# Persist all events from this stream
store = MemoryKnowledgeStore()
writer = EventLogWriter(store)
events = list(await stream.history.get_events())
await writer.persist(stream.id, events)

# Reload later
loaded = await writer.load(stream.id)  # -> list[BaseEvent]
```

Persisted events land at `/log/{stream_id}.jsonl`. Events of types that cannot be deserialized (e.g. a removed class) come back as `UnknownEvent` - no data is lost.

When an `Agent` is given a `KnowledgeConfig`, it runs this `persist(...)` step for you at the end of every `ask()` - so you only call `EventLogWriter` directly in a custom harness. Set `KnowledgeConfig.write_event_log=False` to turn the automatic write off (e.g. when the store is purely user-facing memory rather than a transcript archive); if the writer raises, the turn still returns its reply and an `EventLogFailed` event is emitted on the stream.

!!! tip
    Pair `EventLogWriter` with `DefaultBootstrap` to get a ready-to-use persistent agent state. The writer targets `/log/` which the bootstrap has already described via `SKILL.md`. If you set `KnowledgeConfig.expose_tool=False`, the `Agent` passes `DefaultBootstrap(mention_tool=False)` so the generated `SKILL.md` doesn't tell the LLM about a `knowledge` tool it can't call.

## LockedKnowledgeStore - serialize writers

`LockedKnowledgeStore` wraps any `KnowledgeStore` to serialize concurrent writes. It delegates locking to a user-provided object implementing `acquire(name, ttl)` / `release(name)` - typically a distributed lock (Redis, database advisory locks, etc.) so multiple processes sharing the same store can coordinate.

```python
from autogen.beta import LockedKnowledgeStore, SqliteKnowledgeStore

inner = SqliteKnowledgeStore("/var/agents/shared.db")
store = LockedKnowledgeStore(inner, lock=your_distributed_lock)
# ... hand `store` to every agent that shares the DB
```

!!! note
    Reads are not locked (safe for concurrent access on all backends). Only `write`, `delete`, and `append` acquire the lock. Lock keys are of the form `store:write:{path}`.
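
For tests or single-process setups, a lock object of that shape can be sketched with `asyncio.Lock`. This is an illustration under assumptions: `InProcessLock` is a hypothetical class, the methods are assumed to be async, and `ttl` is ignored because an in-process lock never needs to expire. Check the `LockedKnowledgeStore` source for the exact contract before relying on it.

```python
import asyncio
from collections import defaultdict


class InProcessLock:
    """Per-name asyncio locks exposing acquire(name, ttl) / release(name)."""

    def __init__(self) -> None:
        self._locks: defaultdict[str, asyncio.Lock] = defaultdict(asyncio.Lock)

    async def acquire(self, name: str, ttl: float) -> None:
        # `ttl` matters for distributed locks that must expire; ignored here.
        await self._locks[name].acquire()

    async def release(self, name: str) -> None:
        self._locks[name].release()


async def main() -> list[str]:
    lock = InProcessLock()
    order: list[str] = []

    async def writer(tag: str) -> None:
        key = "store:write:/log/events.jsonl"  # key format from the note above
        await lock.acquire(key, ttl=5.0)
        try:
            order.append(f"{tag}:start")
            await asyncio.sleep(0.01)          # simulated write
            order.append(f"{tag}:end")
        finally:
            await lock.release(key)

    await asyncio.gather(writer("a"), writer("b"))
    return order


order = asyncio.run(main())
print(order)  # ['a:start', 'a:end', 'b:start', 'b:end']
```

Because the key includes the path, writers to different paths do not block each other.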

---

# Assembly

Source: https://docs.ag2.ai/latest/docs/beta/advanced/assembly/

**Assembly** is the step that shapes what the LLM actually sees on each turn. An `AssemblerMiddleware` runs an ordered chain of `AssemblyPolicy` instances, each one transforming `(prompts, events)` before they reach the model.

Use it to inject persistent context (working memory, past conversations, observer alerts) and to cap the history footprint (sliding window, token budget) - without touching the event stream itself.

## Why assembly

Everything that happens during an agent's run - model requests/responses, tool calls, observer alerts, lifecycle events - lands on the [Stream](stream.md). But not all of that is useful to send to the next LLM call, and some useful context (e.g. a summary of a prior conversation) lives outside the stream entirely.

Assembly is the seam where you:

- **Inject** information from the [KnowledgeStore](knowledge_store.md) or from [Observer alerts](observers.md) into the prompt.
- **Filter** the event list down to what the model should see.
- **Reduce** the history to fit a window or token budget.

Each of those jobs is an `AssemblyPolicy`. The `AssemblerMiddleware` chains them.

## AssemblyPolicy protocol

Every policy implements the same shape:

```python
from typing import Protocol
from autogen.beta import Context
from autogen.beta.events import BaseEvent

class AssemblyPolicy(Protocol):
    name: str

    async def apply(
        self,
        prompts: list[str],
        events: list[BaseEvent],
        context: Context,
    ) -> tuple[list[str], list[BaseEvent]]:
        ...
```

A policy receives the current `prompts` and `events` and returns modified copies. Policies compose left-to-right: each one sees the output of the previous one. They must be pure - side-effect-free and idempotent where possible - and they must not emit events onto the stream (with one exception: `AlertPolicy` emits `HaltEvent` on FATAL, documented below).

## Wiring policies onto an Agent

Pass your policy list via the `assembly=` keyword when constructing an Agent. The Agent wires an internal `AssemblerMiddleware` at the outermost position of the middleware chain automatically.

```python
from autogen.beta import Agent
from autogen.beta.policies import (
    AlertPolicy,
    WorkingMemoryPolicy,
    SlidingWindowPolicy,
)
from autogen.beta.config import OpenAIConfig

agent = Agent(
    "assistant",
    config=OpenAIConfig(model="gpt-5"),
    assembly=[
        WorkingMemoryPolicy(),
        AlertPolicy(),
        SlidingWindowPolicy(max_events=50),
    ],
)
```

Inside each turn the `AssemblerMiddleware`:

1. Builds the `prompts` / `events` pair from the current context.
2. Runs every policy in order, piping the output of each into the next.
3. Temporarily swaps `context.prompt` for the assembled version while the LLM call runs.
4. Restores the original prompt afterward.
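
The four steps above can be sketched with stand-in objects. Everything here is hypothetical scaffolding (`FakeContext`, `AddNote`, `KeepLast`, `run_turn`), and modelling `context.prompt` as a list of strings is an assumption; the point is the left-to-right piping and the swap-and-restore around the model call.

```python
import asyncio
from dataclasses import dataclass, field


@dataclass
class FakeContext:
    """Stand-in for the real Context; only the prompt list is modelled."""
    prompt: list[str] = field(default_factory=list)


class AddNote:
    name = "add_note"

    async def apply(self, prompts, events, context):
        return prompts + ["note: user prefers metric"], events


class KeepLast:
    name = "keep_last"

    def __init__(self, n: int) -> None:
        self._n = n

    async def apply(self, prompts, events, context):
        return prompts, (events[-self._n:] if self._n else [])


async def run_turn(policies, context, events, llm_call):
    prompts, view = list(context.prompt), list(events)  # step 1: build the pair
    for policy in policies:                              # step 2: pipe through policies
        prompts, view = await policy.apply(prompts, view, context)
    original = context.prompt
    context.prompt = prompts                             # step 3: swap in assembled prompt
    try:
        return await llm_call(context, view)
    finally:
        context.prompt = original                        # step 4: restore, even on error


async def main():
    ctx = FakeContext(prompt=["You are helpful."])

    async def fake_llm(context, events):
        return list(context.prompt), list(events)

    seen = await run_turn([AddNote(), KeepLast(2)], ctx, ["e1", "e2", "e3"], fake_llm)
    return seen, ctx.prompt


(seen_prompts, seen_events), restored = asyncio.run(main())
print(seen_prompts)  # ['You are helpful.', 'note: user prefers metric']
print(seen_events)   # ['e2', 'e3']
print(restored)      # ['You are helpful.']
```

The `try` / `finally` is what keeps the stored prompt untouched if the model call raises.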

!!! tip
    `AssemblerMiddleware` and `AssemblyPolicy` live in `autogen.beta.assembly` if you need to wire the middleware manually (for example, inside a custom harness that isn't built on `Agent`).

## Ordering matters

Assembly policies split into two kinds:

| Kind | Purpose | Examples |
|---|---|---|
| **Injection** | Add context to `prompts` | `AlertPolicy`, `WorkingMemoryPolicy`, `EpisodicMemoryPolicy` |
| **Reduction** | Trim `events` | `SlidingWindowPolicy`, `TokenBudgetPolicy`, `ConversationPolicy` |

The rule: **injection before reduction**. If a reducer runs first, the injections it should have included in its budget don't exist yet.

`AssemblerMiddleware.validate_order()` catches known bad orderings and returns a list of warnings:

```python
from autogen.beta.assembly import AssemblerMiddleware
from autogen.beta.policies import AlertPolicy, SlidingWindowPolicy

policies = [
    SlidingWindowPolicy(max_events=20),  # reduction first - wrong
    AlertPolicy(),
]
warnings = AssemblerMiddleware.validate_order(policies)
for w in warnings:
    print(w)
```

## Built-in policies

All six built-ins are importable from `autogen.beta.policies`.

### ConversationPolicy

Keeps only conversation and tool events (`ModelRequest`, `ModelResponse`, `ToolCallEvent`, `ToolResultEvent`, `ToolResultsEvent`, `ToolErrorEvent`, plus `CompactionSummary`). Drops alerts, lifecycle events, observer output - anything the LLM does not need to see.

```python
from autogen.beta.policies import ConversationPolicy

policy = ConversationPolicy()
```

Takes no arguments. Effectively an allowlist - add a new event type to the stream and it is filtered out by default.

### SlidingWindowPolicy

Keeps only the last `max_events` events. Skips leading orphaned `ToolResultsEvent` entries so the window never starts on an unmatched tool result.

```python
from autogen.beta.policies import SlidingWindowPolicy

policy = SlidingWindowPolicy(max_events=50, transparent=True)
```

Set `transparent=True` to append a prompt note like `"[sliding_window] Showing last 50 of 123 events."` - useful while tuning.
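
The orphan-skipping behaviour can be pictured with a small stand-alone sketch. The `Event` class and its `kind` strings are hypothetical stand-ins for the real event types, and the built-in policy may differ in detail.

```python
from dataclasses import dataclass


@dataclass
class Event:
    kind: str  # e.g. "request", "response", "tool_call", "tool_results"


def sliding_window(events: list[Event], max_events: int) -> list[Event]:
    """Keep the last `max_events` events, then drop leading tool results."""
    window = events[-max_events:] if max_events > 0 else []
    start = 0
    # A tool result at the head has lost its matching call; skip past it.
    while start < len(window) and window[start].kind == "tool_results":
        start += 1
    return window[start:]


history = [Event(k) for k in ("request", "tool_call", "tool_results", "response", "request")]
print([e.kind for e in sliding_window(history, 3)])  # ['response', 'request']
print([e.kind for e in sliding_window(history, 4)])
# ['tool_call', 'tool_results', 'response', 'request']
```

With a window of 3 the cut lands between the tool call and its result, so the result is dropped and the model sees two events rather than three.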

### TokenBudgetPolicy

Keeps the newest events that fit in an estimated token budget. Estimation is `len(str(event)) / chars_per_token` - cheap, not perfectly accurate. Use it as a safety net, not an exact meter.

```python
from autogen.beta.policies import TokenBudgetPolicy

policy = TokenBudgetPolicy(max_tokens=32_000, chars_per_token=4, transparent=True)
```
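
The estimate is simple enough to reproduce by hand. The sketch below applies the documented formula and keeps the newest events that fit; `fit_budget` is a hypothetical function, and stopping at the first event that does not fit (rather than skipping it and continuing) is an assumption about the selection rule.

```python
def estimate_tokens(event: object, chars_per_token: int = 4) -> float:
    return len(str(event)) / chars_per_token


def fit_budget(events: list[object], max_tokens: int, chars_per_token: int = 4) -> list[object]:
    """Keep the newest events whose estimated size fits in `max_tokens`."""
    kept: list[object] = []
    used = 0.0
    for event in reversed(events):  # walk newest to oldest
        cost = estimate_tokens(event, chars_per_token)
        if used + cost > max_tokens:
            break
        kept.append(event)
        used += cost
    kept.reverse()                  # restore chronological order
    return kept


events = ["a" * 40, "b" * 20, "c" * 8]  # roughly 10, 5 and 2 estimated tokens
print([len(e) for e in fit_budget(events, max_tokens=7)])  # [20, 8]
print([len(e) for e in fit_budget(events, max_tokens=4)])  # [8]
```

Real tokenizers rarely match a fixed characters-per-token ratio, particularly for code or non-English text, so leave headroom below the provider's limit.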

### AlertPolicy

Delivers [ObserverAlerts](observers.md) to the model. Each new alert is formatted into the prompt once (deduplicated on `(source, severity, message)`), and **FATAL** alerts additionally emit a `HaltEvent` onto the stream so the surrounding loop can short-circuit.

```python
from autogen.beta.policies import AlertPolicy

policy = AlertPolicy()
```

Takes no arguments. Dedup state lives on the instance - give each Agent its own `AlertPolicy`.
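
A sketch of why the dedup state is per-instance. `Alert` and `AlertDeduper` are hypothetical stand-ins, not the library's classes; they only model the `(source, severity, message)` key described above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Alert:
    source: str
    severity: str
    message: str


class AlertDeduper:
    """Remember which (source, severity, message) triples were already shown."""

    def __init__(self) -> None:
        self._seen: set[tuple[str, str, str]] = set()

    def new_alerts(self, alerts: list[Alert]) -> list[Alert]:
        fresh = []
        for alert in alerts:
            key = (alert.source, alert.severity, alert.message)
            if key not in self._seen:
                self._seen.add(key)
                fresh.append(alert)
        return fresh


dedupe = AlertDeduper()
a = Alert("cost_observer", "WARNING", "budget at 80%")
print(len(dedupe.new_alerts([a, a])))       # 1 - duplicate within one batch
print(len(dedupe.new_alerts([a])))          # 0 - already delivered
print(len(AlertDeduper().new_alerts([a])))  # 1 - separate instance, separate state
```

If two agents shared one instance, an alert delivered to the first would be treated as already seen by the second.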

!!! note
    `AlertPolicy` is what bridges the [Observer](observers.md) system into the LLM. Without it, `ObserverAlert` events sit on the stream but never reach the model. Place it after other injection policies and before reduction policies.

### WorkingMemoryPolicy

Reads `/memory/working.md` from the [KnowledgeStore](knowledge_store.md) and injects it into the prompt. Working memory is the agent's persistent state - written between conversations by an aggregation strategy and read on every turn.

```python
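from autogen.beta import Agent
from autogen.beta.assembly import AssemblerMiddleware
from autogen.beta.config import OpenAIConfig
from autogen.beta.policies import WorkingMemoryPolicy
from autogen.beta.knowledge import KnowledgeStore, MemoryKnowledgeStore

store = MemoryKnowledgeStore()
await store.write("/memory/working.md", "- user prefers metric\n- timezone: Australia/Melbourne")

agent = Agent(
    "assistant",
    config=OpenAIConfig(model="gpt-5"),
    # Register the store by type so the policy can look it up.
    dependencies={KnowledgeStore: store},
    # Manual wiring shown here; the `assembly=` keyword described earlier wires the middleware for you.
    middleware=[lambda e, c: AssemblerMiddleware(e, c, policies=[WorkingMemoryPolicy()])],
)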
```

The policy looks up the store by type (`context.dependencies.get(KnowledgeStore)`) - if no store is registered, it's a no-op.

### EpisodicMemoryPolicy

Reads the most recent summaries under `/memory/conversations/` and injects them. The companion reader for `ConversationSummaryAggregate`, which writes timestamped summary files to that path after each conversation.

```python
from autogen.beta.policies import EpisodicMemoryPolicy

policy = EpisodicMemoryPolicy(max_episodes=5, transparent=True)
```

Also requires a `KnowledgeStore` in `context.dependencies`; a no-op otherwise.

## A realistic chain

Typical production ordering - injections first, then `AlertPolicy`, then reduce:

```python
from autogen.beta.assembly import AssemblerMiddleware
from autogen.beta.policies import (
    AlertPolicy,
    EpisodicMemoryPolicy,
    SlidingWindowPolicy,
    WorkingMemoryPolicy,
)

policies = [
    WorkingMemoryPolicy(),            # inject persistent state
    EpisodicMemoryPolicy(max_episodes=3),  # inject past conversations
    AlertPolicy(),                    # inject observer alerts
    SlidingWindowPolicy(max_events=80),  # cap turn history
]
AssemblerMiddleware.validate_order(policies)  # returns [] - good order
```

## Writing a custom policy

Any object with a `name` and an `async apply(...)` method satisfies the protocol. Use it for domain-specific injection (project docs, RAG hits, on-call runbooks) or custom filtering:

```python
from autogen.beta import Context
from autogen.beta.events import BaseEvent

class RunbookPolicy:
    """Inject the on-call runbook as system context."""

    name = "runbook"

    def __init__(self, runbook: str) -> None:
        self._runbook = runbook

    async def apply(
        self,
        prompts: list[str],
        events: list[BaseEvent],
        context: Context,
    ) -> tuple[list[str], list[BaseEvent]]:
        return prompts + [f"## On-call Runbook\n\n{self._runbook}"], events
```

Drop it into the policy list alongside the built-ins.

!!! tip
    Custom policies are a better fit than [Middleware](../middleware.md) when you only need to shape the prompt or filter events - not to wrap the LLM call itself. Middleware is for retry, timeout, logging, rate limiting; policies are for context assembly.

---

# Compaction

Source: https://docs.ag2.ai/latest/docs/beta/advanced/compaction/

**Compaction** reduces a stream's event history to respect runtime constraints - event count or token budget. It is the constraint-respecting counterpart to [Aggregation](aggregation.md).

> Compaction removes. Aggregation creates. They are separate concerns.

## When to use it

Long-running conversations accumulate events faster than the model's context window can absorb. Use compaction to cap the size of history that flows into the next LLM call.

| Symptom | Use |
|---|---|
| History getting close to provider token limit | `TailWindowCompact` or `SummarizeCompact` |
| Need to keep recent events and forget old ones cheaply | `TailWindowCompact` |
| Want a short summary of old events to preserve context | `SummarizeCompact` |

## CompactStrategy protocol

Every strategy implements the same shape:

```python
from typing import Protocol
from autogen.beta import Context
from autogen.beta.events import BaseEvent
from autogen.beta.knowledge import KnowledgeStore

class CompactStrategy(Protocol):
    async def compact(
        self,
        events: list[BaseEvent],
        context: Context,
        store: KnowledgeStore | None,
    ) -> list[BaseEvent]:
        ...
```

Returns a new event list that **replaces** the current history. Strategies must preserve the causal ordering of retained events - no reshuffling.

## CompactTrigger

A dataclass describing when compaction should fire. Any configured threshold that is exceeded triggers compaction.

```python
from autogen.beta.compact import CompactTrigger

trigger = CompactTrigger(
    max_events=200,           # fire when history exceeds 200 events
    max_tokens=32_000,        # fire when estimated tokens exceed 32k
    chars_per_token=4,        # estimation constant (default 4)
)
```

Leaving a field at `0` disables that threshold. `CompactTrigger()` alone does nothing - you must opt into at least one condition.

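`CompactTrigger` is a plain data object - it records when you want compaction to fire, but does not fire it. An `Agent` configured through `KnowledgeConfig` evaluates the trigger for you (see [Wiring onto an Agent](#wiring-onto-an-agent)); in a custom harness you check the thresholds and call `await strategy.compact(...)` yourself.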

## Built-in strategies

Both built-ins are importable from `autogen.beta.compact`.

### TailWindowCompact

Keeps the last N events, drops the rest. Zero LLM cost. Suitable when old context has diminishing value and recency is what matters.

```python
from autogen.beta.compact import TailWindowCompact
from autogen.beta.knowledge import MemoryKnowledgeStore

store = MemoryKnowledgeStore()
compact = TailWindowCompact(target=50)

retained = await compact.compact(events, ctx, store)
# retained holds the most recent events (at most 50, snapped to whole turns); older ones are dropped
# (and persisted to /log/{stream_id}.dropped-{n}.jsonl if store is passed)
```

Passing a `KnowledgeStore` is optional. If provided, dropped events are persisted to `/log/` as a numbered segment - see the [KnowledgeStore docs](knowledge_store.md) - so they can be replayed later via `EventLogWriter.load()`. If omitted, dropped events are discarded.

### SummarizeCompact

Summarizes the dropped portion via one LLM call and inserts a `CompactionSummary` event at the head of the retained history. Use it when you want to keep some sense of the old conversation instead of just forgetting it.

```python
from autogen.beta.compact import SummarizeCompact
from autogen.beta.config import OpenAIConfig
from autogen.beta.knowledge import MemoryKnowledgeStore

store = MemoryKnowledgeStore()
compact = SummarizeCompact(
    target=50,
    config=OpenAIConfig(model="gpt-5-mini"),  # cheap model recommended for summaries
)

retained = await compact.compact(events, ctx, store)
# retained[0] is a CompactionSummary event; retained[1:] are the most recent originals (at most 50)
```

The summarization model is independent from the agent's main model - pick a smaller / cheaper one. Token usage is recorded on the strategy instance as `strategy.last_usage`.

!!! note
    Both built-ins snap the retained boundary to whole turns: a tool call and its result are never split. A tool cycle straddling the boundary is compacted as a unit, so the retained window can be slightly smaller than `target`. This keeps the retained history valid for providers that reject a tool result with no preceding call.

## CompactionSummary

The synthetic event inserted by `SummarizeCompact` at the head of history.

```python
from autogen.beta.compact import CompactionSummary

summary = CompactionSummary(
    summary="User asked about gardening and sourdough; decisions made about ...",
    event_count=42,
)
```

`CompactionSummary` is on the allowlist of [`ConversationPolicy`](assembly.md#conversationpolicy), so it survives the assembly chain. Each provider mapper then renders it as a user turn, so the summary reaches the LLM as visible context at the head of history.

## Wiring onto an Agent

Pass the strategy + trigger through [`KnowledgeConfig`](../agent_harness.md#knowledge-knowledgeconfig). The Agent wires a `_CompactionMiddleware` that fires the strategy automatically after each turn when the trigger threshold is crossed.

```python
from autogen.beta import Agent, KnowledgeConfig
from autogen.beta.compact import CompactTrigger, TailWindowCompact
from autogen.beta.config import OpenAIConfig
from autogen.beta.knowledge import MemoryKnowledgeStore

store = MemoryKnowledgeStore()
agent = Agent(
    "assistant",
    config=OpenAIConfig(model="gpt-5"),
    knowledge=KnowledgeConfig(
        store=store,
        compact=TailWindowCompact(target=100),
        compact_trigger=CompactTrigger(max_events=200),
    ),
)
```

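Every compaction attempt emits events from the following triple on the agent's stream - `CompactionStarted`, then either `CompactionCompleted` or `CompactionFailed`: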

| Event | When | Use it to |
|---|---|---|
| `CompactionStarted` | Just before `compact()` runs | Mark the start of work; carries `strategy` / `event_count` |
| `CompactionCompleted` | `compact()` returned and history was replaced | Read `events_before` / `events_after` / `usage` |
| `CompactionFailed` | `compact()` raised | Inspect `error_type` + `error`; the history is left untouched and the agent turn is **not** interrupted |

The failure path is the one that matters: the strategy exception is also logged via the module logger, but the stream event is the durable signal - subscribe to `CompactionFailed` if you want failed compactions to surface in your application's UI or alerting. (Aggregation emits the symmetric `AggregationStarted` / `AggregationCompleted` / `AggregationFailed` triple - see [Aggregation > Wiring onto an Agent](aggregation.md#wiring-onto-an-agent).)

## Driving a strategy directly

If you're not using `Agent` (custom harness, tests, one-off scripts), call `await strategy.compact(...)` yourself:

```python
from autogen.beta.compact import CompactTrigger, TailWindowCompact

trigger = CompactTrigger(max_events=200)
compact = TailWindowCompact(target=100)

async def after_turn(events, ctx, store):
    should = trigger.max_events and len(events) > trigger.max_events
    if should:
        events = await compact.compact(events, ctx, store)
    return events
```

For the token-based threshold, estimate with `sum(len(str(e)) for e in events) / trigger.chars_per_token`.
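
Both thresholds can be folded into one check. The sketch below uses a local `Trigger` dataclass that mirrors the documented `CompactTrigger` fields so it runs without the library; `should_compact` is a hypothetical helper name.

```python
from dataclasses import dataclass


@dataclass
class Trigger:
    """Local stand-in mirroring the documented CompactTrigger fields."""
    max_events: int = 0
    max_tokens: int = 0
    chars_per_token: int = 4


def should_compact(events: list[object], trigger: Trigger) -> bool:
    # A threshold left at 0 is disabled; any enabled threshold that is exceeded fires.
    if trigger.max_events and len(events) > trigger.max_events:
        return True
    if trigger.max_tokens:
        estimated = sum(len(str(e)) for e in events) / trigger.chars_per_token
        return estimated > trigger.max_tokens
    return False


events = ["x" * 100] * 5  # 500 characters, roughly 125 estimated tokens
print(should_compact(events, Trigger()))                                # False
print(should_compact(events, Trigger(max_events=4)))                    # True
print(should_compact(events, Trigger(max_tokens=100)))                  # True
print(should_compact(events, Trigger(max_events=10, max_tokens=200)))   # False
```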

## Writing a custom strategy

Any object with an `async compact(events, ctx, store)` method satisfies the protocol. A few ideas:

- **Drop tool noise.** Keep `ModelRequest` / `ModelResponse`, drop `ToolCallEvent` / `ToolResultEvent` older than some boundary.
- **Priority retention.** Score events (e.g. keep every `ModelResponse` but decimate `ToolCallEvent`s).
- **Segmented summarization.** Run `SummarizeCompact` in chunks to produce multiple `CompactionSummary` events over time rather than one big one.

```python
from autogen.beta.events import BaseEvent, ToolCallEvent, ToolResultEvent
from autogen.beta.knowledge import KnowledgeStore

class DropOldToolEvents:
    """Keep conversation events; drop tool events older than the last K."""

    def __init__(self, keep_last_k: int = 20) -> None:
        self._k = keep_last_k

    async def compact(
        self,
        events: list[BaseEvent],
        context,
        store: KnowledgeStore | None,
    ) -> list[BaseEvent]:
        tool_types = (ToolCallEvent, ToolResultEvent)
        tool_indices = [i for i, e in enumerate(events) if isinstance(e, tool_types)]
        if len(tool_indices) <= self._k:
            return events
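        # Slice by explicit length: `tool_indices[:-0]` would be empty when keep_last_k == 0.
        drop_set = set(tool_indices[: len(tool_indices) - self._k])
        # Filtering in place keeps the retained events in their original order.
        return [e for i, e in enumerate(events) if i not in drop_set]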
```

---

# Aggregation

Source: https://docs.ag2.ai/latest/docs/beta/advanced/aggregation/

**Aggregation** extracts structured knowledge from raw events and writes it to the [KnowledgeStore](knowledge_store.md). It is the knowledge-organizing counterpart to [Compaction](compaction.md).

> Compaction removes. Aggregation creates. They are separate concerns.

Aggregation is how an agent builds up persistent state between conversations - the material that `WorkingMemoryPolicy` and `EpisodicMemoryPolicy` read back in on subsequent runs.

## When to use it

| You want the agent to | Use |
|---|---|
| Carry stable facts across conversations (user prefs, role, timezone) | `WorkingMemoryAggregate` + [`WorkingMemoryPolicy`](assembly.md#workingmemorypolicy) |
| Remember summaries of what happened in past sessions | `ConversationSummaryAggregate` + [`EpisodicMemoryPolicy`](assembly.md#episodicmemorypolicy) |
| Index artifacts for later retrieval | Write a custom strategy |

Each built-in is one half of a producer/consumer pair: the aggregate writes a file, the assembly policy reads it back on the next turn.

## AggregateStrategy protocol

Every strategy implements the same shape:

```python
from typing import Protocol
from autogen.beta import Context
from autogen.beta.events import BaseEvent
from autogen.beta.knowledge import KnowledgeStore

class AggregateStrategy(Protocol):
    async def aggregate(
        self,
        events: list[BaseEvent],
        context: Context,
        store: KnowledgeStore,
    ) -> None:
        ...
```

Aggregation returns nothing - its output lives in the knowledge store. Unlike `CompactStrategy`, the store is required (not optional).

## AggregateTrigger

A dataclass describing when aggregation should fire.

```python
from autogen.beta.aggregate import AggregateTrigger

trigger = AggregateTrigger(
    every_n_turns=10,      # aggregate every 10 LLM turns
    every_n_events=100,    # aggregate every 100 new events
    on_end=True,           # aggregate when the conversation ends
)
```

Each condition is independent. Setting a counter to `0` disables it. `AggregateTrigger()` with no arguments fires nothing - every condition is opt-in. `on_end` defaults to `False` because each strategy costs one LLM call per fire, and a typical setup pairs `ConversationSummaryAggregate` with `WorkingMemoryAggregate` (so `on_end=True` doubles the per-conversation cost).

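Like `CompactTrigger`, this is a plain data object - it records when you want aggregation to fire, but does not fire it. An `Agent` configured through `KnowledgeConfig` evaluates the trigger for you (see [Wiring onto an Agent](#wiring-onto-an-agent)); in a custom harness you check the conditions and call `await strategy.aggregate(...)` yourself.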

## Built-in strategies

Both built-ins are importable from `autogen.beta.aggregate` and both take a `ModelConfig` for a summarization LLM call. Use a smaller / cheaper model than the agent's main model.

### ConversationSummaryAggregate

Writes a timestamped summary of the conversation to `/memory/conversations/`. The companion to [`EpisodicMemoryPolicy`](assembly.md#episodicmemorypolicy), which reads from that directory.

```python
from autogen.beta.aggregate import ConversationSummaryAggregate
from autogen.beta.config import OpenAIConfig
from autogen.beta.knowledge import MemoryKnowledgeStore

store = MemoryKnowledgeStore()
strategy = ConversationSummaryAggregate(config=OpenAIConfig(model="gpt-5-mini"))

# After the conversation:
await strategy.aggregate(events, ctx, store)

# Produces a file like:
# /memory/conversations/20260420T091530_<stream_id>.md
```

Filenames are `{ISO timestamp}_{stream id}.md`, so lexicographic sort matches chronological sort - which is why `EpisodicMemoryPolicy(max_episodes=N)` can simply take the trailing N entries.
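
A quick way to convince yourself of the sorting property. `summary_name` is a hypothetical helper that reproduces the filename shape shown above; the exact timestamp format the library uses is taken from that example and otherwise assumed.

```python
from datetime import datetime, timedelta, timezone


def summary_name(ts: datetime, stream_id: str) -> str:
    # Fixed-width basic ISO 8601 timestamp, matching the example filename above.
    return f"{ts.strftime('%Y%m%dT%H%M%S')}_{stream_id}.md"


start = datetime(2026, 4, 20, 9, 15, 30, tzinfo=timezone.utc)
names = [summary_name(start + timedelta(days=d), f"s{d}") for d in (3, 0, 2, 1)]

latest_two = sorted(names)[-2:]
print(latest_two)
# ['20260422T091530_s2.md', '20260423T091530_s3.md']
```

The property depends on the timestamps being fixed-width and expressed in a single timezone; mixing local times across hosts would break it.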

Token usage is recorded on the strategy instance as `strategy.last_usage`.

### WorkingMemoryAggregate

Updates `/memory/working.md` - the agent's single persistent state document. Reads the existing file, merges in context from recent events, writes the updated version back. The companion to [`WorkingMemoryPolicy`](assembly.md#workingmemorypolicy).

```python
from autogen.beta.aggregate import WorkingMemoryAggregate
from autogen.beta.config import OpenAIConfig
from autogen.beta.knowledge import MemoryKnowledgeStore

store = MemoryKnowledgeStore()
strategy = WorkingMemoryAggregate(config=OpenAIConfig(model="gpt-5-mini"))

# Refresh working memory at the end of a session:
await strategy.aggregate(events, ctx, store)
# /memory/working.md now reflects the latest context.
```

Unlike `ConversationSummaryAggregate`, this one is destructive toward its own prior output - each call overwrites `/memory/working.md` with the merged version. That is the point: working memory is a rolling single-file state, not an append log.

The default prompt is journal-style: *preserve facts that are still relevant, drop outdated content*. For other memory shapes - procedural memory (what tactics worked), reflection (what to do differently next time), or task-state memory - pass a `prompt=` template with `{existing}` and `{events}` placeholders:

```python
strategy = WorkingMemoryAggregate(
    config=OpenAIConfig(model="gpt-5-mini"),
    prompt=(
        "You maintain a research agent's working memory. Track tactics, "
        "not topical facts: which phrasings worked, which sources were "
        "reliable, which dead ends to avoid.\n\n"
        "## Current Notes\n{existing}\n\n## Latest Round\n{events}"
    ),
)
```
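
To preview what such a template produces, you can render it by hand. The sketch below assumes the placeholders are filled the way `str.format` would fill them; how the library actually substitutes them, and how it serializes events into text, is not specified here.

```python
template = (
    "You maintain a research agent's working memory.\n\n"
    "## Current Notes\n{existing}\n\n## Latest Round\n{events}"
)

# Hypothetical inputs: the existing file contents and a text rendering of recent events.
existing = "- arXiv listings were reliable"
events = "user: try the vendor blog\nassistant: the vendor blog was outdated"

rendered = template.format(existing=existing, events=events)
print(rendered)
```

If substitution does go through `str.format`, any literal braces in your template (for example a JSON snippet) would need to be doubled as `{{` and `}}`.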

If a `prompt=` override is not enough - different storage path, multi-call extraction, schema-validated output - write a custom strategy (see [below](#writing-a-custom-strategy)). The protocol is small and the framework's wiring stays the same.

## Pairing with assembly policies

The intended pattern: aggregate at the end of a conversation (or on a cadence), then read back in on the next turn via the matching assembly policy.

```mermaid
flowchart LR
    A[Conversation events] --> B[Aggregate]
    B --> C[/KnowledgeStore/]
    C --> D[Policy]
    D --> E[Next LLM turn]
```

| Aggregate | File | Policy |
|---|---|---|
| `ConversationSummaryAggregate` | `/memory/conversations/{ts}_{id}.md` | [`EpisodicMemoryPolicy`](assembly.md#episodicmemorypolicy) |
| `WorkingMemoryAggregate` | `/memory/working.md` | [`WorkingMemoryPolicy`](assembly.md#workingmemorypolicy) |

The path constants (`WORKING_MEMORY_PATH`, `CONVERSATIONS_PREFIX`) are defined in `autogen.beta.knowledge`. Both sides of each pair use those constants, so the producer/consumer contract is held together by shared definitions, not magic strings.

## Wiring onto an Agent

Pass the strategy + trigger through [`KnowledgeConfig`](../agent_harness.md#knowledge-knowledgeconfig). The Agent wires a `_AggregationMiddleware` that fires `aggregate()` automatically according to the trigger.

```python
from autogen.beta import Agent, KnowledgeConfig
from autogen.beta.aggregate import AggregateTrigger, ConversationSummaryAggregate
from autogen.beta.config import OpenAIConfig
from autogen.beta.knowledge import MemoryKnowledgeStore

store = MemoryKnowledgeStore()
summarizer_config = OpenAIConfig(model="gpt-5-mini")  # cheap model for summaries
agent = Agent(
    "assistant",
    config=OpenAIConfig(model="gpt-5"),
    knowledge=KnowledgeConfig(
        store=store,
        aggregate=ConversationSummaryAggregate(config=summarizer_config),
        aggregate_trigger=AggregateTrigger(every_n_turns=10, on_end=True),
    ),
)
```

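Every aggregation attempt emits events from the following triple on the agent's stream - `AggregationStarted`, then either `AggregationCompleted` or `AggregationFailed`: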

| Event | When | Use it to |
|---|---|---|
| `AggregationStarted` | Just before `aggregate()` runs | Mark the start of work in dashboards |
| `AggregationCompleted` | `aggregate()` returned | Read `strategy` / `usage` / `event_count` |
| `AggregationFailed` | `aggregate()` raised | Inspect `error_type` + `error`; the agent turn itself is not interrupted |

The failure path is the important one: the strategy exception is also logged via the module logger, but the stream event is the durable signal. Subscribe to `AggregationFailed` if you want failed aggregations to surface in your application's UI or alerting - relying on `AggregationCompleted` alone makes silent failures undebuggable.

Compaction emits the symmetric `CompactionStarted` / `CompactionCompleted` / `CompactionFailed` triple.

## Driving a strategy directly

If you're not using `Agent` (custom harness, tests, one-off scripts), call `await strategy.aggregate(...)` yourself:

```python
from autogen.beta.aggregate import AggregateTrigger, ConversationSummaryAggregate
from autogen.beta.config import OpenAIConfig

trigger = AggregateTrigger(every_n_turns=10, on_end=True)
strategy = ConversationSummaryAggregate(config=OpenAIConfig(model="gpt-5-mini"))

_turn_count = 0

async def after_turn(events, ctx, store, *, is_end: bool) -> None:
    global _turn_count
    _turn_count += 1
    should = (
        (trigger.every_n_turns and _turn_count % trigger.every_n_turns == 0)
        or (is_end and trigger.on_end)
    )
    if should:
        await strategy.aggregate(events, ctx, store)
```

## Writing a custom strategy

Any object with an `async aggregate(events, ctx, store)` method satisfies the protocol. Use it to extract domain-specific knowledge:

- **Extract facts.** Scan events for entity mentions, write `/memory/facts/{entity}.md`.
- **Build an index.** On each aggregation, append a row to `/memory/index.jsonl` for later RAG lookup.
- **Classify and tag.** Read events, ask the LLM for a category, store under `/memory/tags/{tag}/{timestamp}.md`.

```python
from autogen.beta.events import BaseEvent, ModelResponse
from autogen.beta.knowledge import KnowledgeStore

class ResponseLengthAggregate:
    """Track the length of each model response for later analysis."""

    async def aggregate(
        self,
        events: list[BaseEvent],
        context,
        store: KnowledgeStore,
    ) -> None:
        lengths = [
            len(e.message.content)
            for e in events
            if isinstance(e, ModelResponse) and e.message
        ]
        if not lengths:
            return
        stream_id = context.stream.id
        path = f"/memory/metrics/{stream_id}.txt"
        await store.write(path, "\n".join(str(n) for n in lengths))
```

!!! tip
    Aggregation strategies and assembly policies often come in pairs: the strategy writes a path, the policy reads that same path. If you add a new aggregate, consider also adding the reader policy - otherwise the data sits on disk with no way back into the prompt.
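
To illustrate the writer/reader pairing described in the tip, here is a minimal sketch that uses an in-memory stand-in for the store. `InMemoryStore`, its `read` method, `write_lengths`, and `read_lengths_for_prompt` are assumptions made for this example; the real `KnowledgeStore` and assembly-policy interfaces may differ.

```python
import asyncio


class InMemoryStore:
    """Hypothetical stand-in for a knowledge store: async read/write keyed by path."""

    def __init__(self) -> None:
        self._files = {}

    async def write(self, path: str, content: str) -> None:
        self._files[path] = content

    async def read(self, path: str):
        # Returns None when nothing has been written to this path yet.
        return self._files.get(path)


# Both sides agree on one path; this shared constant is the whole contract.
METRICS_PATH = "/memory/metrics/demo.txt"


async def write_lengths(store: InMemoryStore, lengths: list) -> None:
    """Writer side (the aggregate): one length per line."""
    await store.write(METRICS_PATH, "\n".join(str(n) for n in lengths))


async def read_lengths_for_prompt(store: InMemoryStore) -> str:
    """Reader side (the policy): turn the stored data back into prompt text."""
    raw = await store.read(METRICS_PATH)
    if raw is None:
        return ""
    values = [int(line) for line in raw.splitlines() if line]
    return f"Previous response lengths: {values}"
```

Sharing the path as a single constant is one way to keep the two halves from drifting apart when either side is refactored.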

---

# AG-UI (Agent-User Interaction) Integration

Source: https://docs.ag2.ai/latest/docs/beta/ag-ui/index/

## Overview

The Agent-User Interaction (AG-UI) protocol standardizes how frontend applications communicate with agents.
In AG2 Beta, `autogen.beta.ag_ui.AGUIStream` bridges an `Agent` to AG-UI event streams.

This solves common integration problems:

- Streaming agent output to UI clients
- Emitting tool-call lifecycle events
- Synchronizing shared state snapshots
- Supporting human-in-the-loop checkpoints through frontend actions and input-required flows

For protocol background, see [AG-UI Protocol introduction](https://docs.ag-ui.com/introduction).

## When to use AG-UI vs direct integration

| Approach | Use it when | Trade-offs |
| --- | --- | --- |
| AG-UI integration (`AGUIStream`) | You need streaming UI, tool rendering, shared state sync, and a protocol-compatible client ecosystem | Adds protocol event semantics you need to expose from your endpoint |
| Direct integration (custom REST/WebSocket contract) | You only need a narrow, app-specific API and will own protocol design end-to-end | You must define and maintain your own streaming/tool/state contract |

Use AG-UI when you want a reusable UI contract across clients and frameworks.

## Supported capabilities

The following AG-UI features have been verified as supported in AG2:

* [x] Streaming text events (`TEXT_MESSAGE_START`, `TEXT_MESSAGE_CONTENT`, `TEXT_MESSAGE_END`, `TEXT_MESSAGE_CHUNK`)
* [x] [Backend tool lifecycle events](https://docs.copilotkit.ai/ag2/generative-ui/backend-tools) (`TOOL_CALL_START`, `TOOL_CALL_ARGS`, `TOOL_CALL_RESULT`, `TOOL_CALL_END`)
* [x] [Frontend-tool dispatch](https://docs.copilotkit.ai/ag2/generative-ui/frontend-tools) (`TOOL_CALL_CHUNK` for client tools in `RunAgentInput.tools`)
* [x] [Shared-state snapshots](https://docs.copilotkit.ai/ag2/shared-state) (`STATE_SNAPSHOT`) from context and agent state
* [x] [Human input checkpoints](https://docs.copilotkit.ai/ag2/human-in-the-loop) (`input_required` surfaced as user-visible message events)

## Installation

Install AG2 with AG-UI support:

```bash
pip install "ag2[ag-ui]"
```

## Basic server example

Use the manual-dispatch pattern when you want full control over auth, logging, and middleware:

```python
from fastapi import FastAPI, Header
from fastapi.responses import StreamingResponse

from autogen.beta import Agent
from autogen.beta.ag_ui import AGUIStream, RunAgentInput
from autogen.beta.config import OpenAIConfig

agent = Agent(
    name="support_bot",
    prompt="You help users with billing questions.",
    config=OpenAIConfig(model="gpt-4o-mini"),
)

stream = AGUIStream(agent)
app = FastAPI()

@app.post("/chat")
async def run_agent(
    message: RunAgentInput,
    accept: str | None = Header(None),
) -> StreamingResponse:
    return StreamingResponse(
        stream.dispatch(message, accept=accept),
        media_type=accept or "text/event-stream",
    )
```

Save the example as `run_ag_ui.py`, then run it:

```bash
uvicorn run_ag_ui:app --reload --port 8000
```

!!! note "Simpler way"
    If you only need an ASGI endpoint without additional logic, use the `AGUIStream.build_asgi()` method to build one and mount it on your ASGI application.

    ```python linenums="1" hl_lines="5-6"
    from autogen.beta.ag_ui import AGUIStream
    from fastapi import FastAPI

    app = FastAPI()
    stream = AGUIStream(agent)
    app.mount("/chat", stream.build_asgi())
    ```

### Test the endpoint

```bash
curl -N -X POST http://127.0.0.1:8000/chat \
  -H "Content-Type: application/json" \
  -H "Accept: text/event-stream" \
  -d '{
    "thread_id": "thread-1",
    "run_id": "run-1",
    "messages": [{"id": "m1", "role": "user", "content": "Hello"}],
    "state": {},
    "context": [],
    "tools": []
  }'
```

Example stream (truncated):

```text
data: {"type":"RUN_STARTED","threadId":"thread-1","runId":"run-1",...}
data: {"type":"TEXT_MESSAGE_CHUNK","delta":"Hello! How can I help?",...}
data: {"type":"RUN_FINISHED","threadId":"thread-1","runId":"run-1",...}
```
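
If you are consuming this stream from Python rather than a browser, the events can be decoded line by line. The sketch below is a simplified parser, not part of AG2: it assumes every event arrives on a single `data:` line (as in the sample above) and that text events carry their payload in `delta`. The helper names are made up for this example.

```python
import json
from typing import Iterable, Iterator

# Event types that carry streamed text in a `delta` field.
TEXT_EVENT_TYPES = {"TEXT_MESSAGE_CONTENT", "TEXT_MESSAGE_CHUNK"}


def iter_sse_events(lines: Iterable[str]) -> Iterator[dict]:
    """Yield one decoded JSON object per `data:` line."""
    for raw in lines:
        line = raw.strip()
        if not line.startswith("data:"):
            continue  # skip blank separators, other SSE fields, and comments
        yield json.loads(line[len("data:"):].strip())


def collect_text(events: Iterable[dict]) -> str:
    """Concatenate the `delta` of every text event, in arrival order."""
    return "".join(e.get("delta", "") for e in events if e.get("type") in TEXT_EVENT_TYPES)


sample = [
    'data: {"type":"RUN_STARTED","threadId":"thread-1","runId":"run-1"}',
    "",
    'data: {"type":"TEXT_MESSAGE_CHUNK","delta":"Hello! "}',
    "",
    'data: {"type":"TEXT_MESSAGE_CHUNK","delta":"How can I help?"}',
    "",
    'data: {"type":"RUN_FINISHED","threadId":"thread-1","runId":"run-1"}',
]

events = list(iter_sse_events(sample))
reply = collect_text(events)  # "Hello! How can I help?"
```

A production client would also need to handle multi-line `data:` fields and reconnects, which this sketch leaves out.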

## UI clients

Any AG-UI client works with this endpoint.

For React/Next.js UIs, CopilotKit is the recommended client path in AG2 docs because it provides:

- Streaming chat components
- Tool UI rendering hooks/components
- Shared state patterns for interactive workflows

Start from the [CopilotKit UI quickstart](./copilotkit-quickstart).

## AG-UI Dojo

For protocol-level testing and event inspection, use the AG2 Dojo profile:

- [AG2 Dojo - agentic_chat](https://dojo.ag-ui.com/ag2/feature/agentic_chat)

## Next steps

1. Build the AG-UI endpoint from the minimal example above.
2. Follow the [CopilotKit UI quickstart](./copilotkit-quickstart) to connect a React/Next.js client.
3. Validate runtime behavior with the [AG2 Dojo - agentic_chat](https://dojo.ag-ui.com/ag2/feature/agentic_chat).

---

# CopilotKit UI Quickstart for AG-UI

Source: https://docs.ag2.ai/latest/docs/beta/ag-ui/copilotkit-quickstart/

This quickstart shows how to connect a CopilotKit React/Next.js UI to an AG2 backend endpoint that speaks the AG-UI protocol.

## What you'll build

- A Python backend that serves an AG-UI endpoint for an AG2 agent
- A React/Next.js UI powered by CopilotKit

## Prerequisites

- Python 3.10+
- Node.js 18.18+
- An LLM API key for your AG2 agent (for example `OPENAI_API_KEY`)

## Quickstart

You can either bootstrap a template project, or follow the same structure as the AG2 + CopilotKit starter.

=== "Bootstrap with CopilotKit"
    CopilotKit can bootstrap a template project:

    ```sh
    npx copilotkit@latest create -f ag2
    ```

    After initialization, run the backend and UI that were generated for you.

=== "Clone the reference starter"
    Runnable reference implementation: [AG2 + CopilotKit starter](https://github.com/ag2ai/ag2-copilotkit-starter).

    The updated template structure used by the starter looks like:

    ```text
    ag2-copilotkit-starter/
    ├── agent-py/     # Python backend (AG2 agent + AG-UI endpoint)
    ├── ui-react/     # React + CopilotKit frontend
    ```

## 1) Start the AG-UI backend

The starter backend mounts the AG-UI endpoint at `/chat` and runs on port `8008`.

```sh
cd agent-py
pip install -r requirements.txt
export OPENAI_API_KEY="your_openai_api_key"
python backend.py
```

Your AG-UI endpoint will be available at `http://localhost:8008/chat`.

## 2) Start the React + CopilotKit UI

In a new terminal:

```sh
cd ui-react
npm install
npm run dev
```

Then open `http://localhost:3000`.

## 3) Connect CopilotKit runtime to the AG-UI endpoint

CopilotKit uses a Next.js route (typically `/api/copilotkit`) that bridges the UI to your agent runtime. In the template, that route registers an AG-UI HTTP agent client with `CopilotRuntime`.

```tsx
import { HttpAgent } from "@ag-ui/client";
import {
  CopilotRuntime,
  ExperimentalEmptyAdapter,
  copilotRuntimeNextJSAppRouterEndpoint,
} from "@copilotkit/runtime";
import { NextRequest } from "next/server";

const agent = new HttpAgent({ url: "http://localhost:8008/chat" });

const runtime = new CopilotRuntime({
  agents: {
    weather_agent: agent,
  },
});

export async function POST(req: NextRequest) {
  const { handleRequest } = copilotRuntimeNextJSAppRouterEndpoint({
    runtime,
    serviceAdapter: new ExperimentalEmptyAdapter(),
    endpoint: "/api/copilotkit",
  });
  return handleRequest(req);
}
```

## 4) Add the `CopilotKit` provider

Wrap your app with `<CopilotKit>` and point it at the runtime route.

```tsx
import { CopilotKit } from "@copilotkit/react-core";
import "@copilotkit/react-ui/styles.css";
import "./globals.css";

export const metadata = {
  title: "AG2 Weather Agent",
  description: "Weather agent powered by AG2 and CopilotKit",
};

export default function RootLayout({ children }: { children: React.ReactNode }) {
  return (
    <html lang="en">
      <body>
        <CopilotKit agent="weather_agent" runtimeUrl="/api/copilotkit">
          {children}
        </CopilotKit>
      </body>
    </html>
  );
}
```

If your backend enforces CORS/auth, configure those before starting the UI (see Production notes).

## 5) Render a chat UI

```tsx
"use client";

import { CopilotChat } from "@copilotkit/react-ui";
import { useCopilotAction } from "@copilotkit/react-core";

function WeatherCard({
  location,
  temperature,
  feelsLike,
  humidity,
  windSpeed,
  windGust,
  conditions,
  isLoading,
}: {
  location?: string;
  temperature?: number;
  feelsLike?: number;
  humidity?: number;
  windSpeed?: number;
  windGust?: number;
  conditions?: string;
  isLoading: boolean;
}) {
  const tempF = temperature != null ? (temperature * 9 / 5 + 32).toFixed(1) : null;

  return (
    <div
      className={`rounded-lg border border-sky-300 bg-gradient-to-br from-sky-100 to-blue-100 p-4 max-w-xs shadow-md ${isLoading ? "animate-pulse" : ""}`}
    >
      <h3 className="text-lg font-semibold text-sky-700">
        {location || "Loading..."}
      </h3>
      <p className="text-xs text-sky-500 uppercase tracking-wide mb-3">
        {isLoading ? "Fetching weather..." : "Current Weather"}
      </p>

      <div className="flex items-start justify-between mb-3">
        <div>
          <div className="text-4xl font-bold text-gray-800">
            {temperature != null ? temperature : "--"}
            <span className="text-lg text-sky-600">&deg;C</span>
          </div>
          {tempF && <div className="text-sm text-gray-500">{tempF}&deg;F</div>}
        </div>
        <div className="text-sm text-gray-500 text-right max-w-[120px]">
          {conditions || "--"}
        </div>
      </div>

      <div className="grid grid-cols-3 gap-2 pt-3 border-t border-sky-200">
        <div className="text-center">
          <div className="text-[10px] text-gray-500 uppercase">Humidity</div>
          <div className="text-sm font-mono text-gray-800">
            {humidity != null ? `${humidity}%` : "--%"}
          </div>
        </div>
        <div className="text-center">
          <div className="text-[10px] text-gray-500 uppercase">Wind</div>
          <div className="text-sm font-mono text-gray-800">
            {windSpeed != null ? `${windSpeed} km/h` : "-- km/h"}
          </div>
        </div>
        <div className="text-center">
          <div className="text-[10px] text-gray-500 uppercase">Feels Like</div>
          <div className="text-sm font-mono text-gray-800">
            {feelsLike != null ? `${feelsLike}\u00B0` : "--\u00B0"}
          </div>
        </div>
      </div>
    </div>
  );
}

export default function Home() {
  useCopilotAction({
    name: "get_weather",
    description: "Get the weather for a given location.",
    available: "disabled",
    parameters: [{ name: "location", type: "string", required: true }],
    render: ({ args, status, result }) => {
      if (status === "complete" && result) {
        let data = result;
        if (typeof result === "string") {
          try {
            data = JSON.parse(result.replace(/'/g, '"'));
          } catch {
            return <div>{result}</div>;
          }
        }
        return (
          <WeatherCard
            location={data.location}
            temperature={data.temperature}
            feelsLike={data.feelsLike}
            humidity={data.humidity}
            windSpeed={data.windSpeed}
            windGust={data.windGust}
            conditions={data.conditions}
            isLoading={false}
          />
        );
      }
      return <WeatherCard location={args.location} isLoading={true} />;
    },
  });

  return (
    <div className="flex items-center justify-center min-h-screen bg-gradient-to-b from-sky-100 to-blue-200">
      <div className="w-full max-w-2xl h-[80vh] rounded-xl overflow-hidden shadow-2xl border border-sky-300">
        <CopilotChat
          labels={{
            title: "AG2 Weather Agent",
            initial: "Hi! Ask me about the weather in any city.",
            placeholder: "Ask about the weather...",
          }}
          className="h-full"
        />
      </div>
    </div>
  );
}
```

## Expected output

After setup:

- The Next.js page shows a CopilotKit chat UI
- Messages stream from the AG2 runtime via the AG-UI endpoint
- Tool/action calls can be rendered as custom UI (for example, a weather card in the starter)

## Production notes

- CORS: allow your frontend origin on the AG-UI backend (`POST`, `OPTIONS`, auth headers).
- Auth: protect both `/api/copilotkit` and backend `/chat` (token/header/cookie); do not rely on client-only secrets.
- Deployment topology: keep frontend runtime route and AG-UI backend on trusted internal network paths where possible.
- Timeouts/retries: configure conservative client and server timeouts for long-running tool workflows and retry only idempotent requests.

## Troubleshooting

- CORS errors (`blocked by CORS policy`): check backend `Access-Control-Allow-Origin`, `Access-Control-Allow-Headers`, and preflight handling.
- No streaming output: verify backend response `Content-Type` is `text/event-stream` and that proxy layers do not buffer SSE.
- Tool UI not rendering: ensure tool/action names match exactly (for example `get_weather`).

## Security considerations

Treat tool inputs and shared state as untrusted user-controlled data. Validate and authorize server-side before invoking privileged tools, and log tool execution with request identifiers for auditability.

## Version and compatibility notes

- For a complete working example (backend + React UI + HTML UI), see the starter repo referenced above.

---

# AG-UI backend deep dive

Source: https://docs.ag2.ai/latest/docs/beta/ag-ui/backend-deepdive/

This page expands on the backend of the AG-UI integration: how to secure your endpoint, how tool calls are represented in the protocol, and how to pass per-request context into your tools.

If you haven't set up an AG-UI endpoint yet, start with the [AG-UI overview](../){.internal-link}.

## Authentication

Because AG-UI integration works on top of simple HTTP SSE endpoints, you can use the same authentication mechanisms as you would for any other HTTP endpoint - validate headers/tokens before streaming events.

Example: protect `/chat` with a shared token header.

```python
from typing import Annotated

from fastapi import FastAPI, Header, HTTPException
from fastapi.responses import StreamingResponse

from autogen.beta import Agent
from autogen.beta.ag_ui import AGUIStream, RunAgentInput
from autogen.beta.config import OpenAIConfig

agent = Agent(
    name="support_bot",
    prompt="You help users with billing questions.",
    config=OpenAIConfig(model="gpt-4o-mini"),
)

stream = AGUIStream(agent)
app = FastAPI()

@app.post("/chat")
async def run_agent(
    message: RunAgentInput,
    token: Annotated[str, Header(..., description="Authentication token")],
    accept: str | None = Header(None),
) -> StreamingResponse:
    if token != "1234567890":
        raise HTTPException(status_code=401, detail="Invalid token")

    return StreamingResponse(
        stream.dispatch(message, accept=accept),
        media_type=accept or "text/event-stream",
    )
```

Notes:

- **Do not** put secrets (API keys, auth tokens) in the browser bundle. Protect the Next.js runtime route (`/api/copilotkit`) and forward credentials server-to-server.
- If you don't need auth/middleware, you can mount the generated ASGI endpoint with `AGUIStream.build_asgi()`, but you'll have less control over request handling.
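
The handler above compares the header against a hard-coded literal to keep the example short. A minimal sketch of a slightly safer check follows, assuming the token is supplied through an environment variable. The variable name `AGUI_SHARED_TOKEN` and both helper functions are hypothetical; only the standard-library calls are real.

```python
import os
import secrets
from typing import Optional


def load_expected_token(env_var: str = "AGUI_SHARED_TOKEN") -> str:
    """Read the shared token from the environment and fail fast when it is unset."""
    token = os.environ.get(env_var, "")
    if not token:
        raise RuntimeError(f"{env_var} is not set")
    return token


def token_is_valid(presented: Optional[str], expected: str) -> bool:
    """Compare in constant time; a missing or empty header is always rejected."""
    if not presented:
        return False
    # Encode to bytes so non-ASCII input is compared rather than raising TypeError.
    return secrets.compare_digest(presented.encode(), expected.encode())
```

Inside the route, `if not token_is_valid(token, expected): raise HTTPException(status_code=401, ...)` would replace the literal comparison. The constant-time comparison avoids leaking how many leading characters of a guess were correct.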

## Tools context

You often need to pass **per-request context** (user ID, org ID, plan, permissions) into tools without baking it into prompts or trusting client-provided text.

AG2 supports this via `Context` variables. Your tool can accept a `Context` parameter, and you provide the values when dispatching.

```python
from autogen.beta import Agent, Context, tool
from autogen.beta.ag_ui import AGUIStream, RunAgentInput
from autogen.beta.config import OpenAIConfig

@tool
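def get_user_profile(context: Context) -> str:
    """Return the profile of the user attached to this request."""
    # The value comes from the `variables` passed to `stream.dispatch(...)`, not from the LLM.
    user_id = context.variables.get("user_id")
    return f"User profile for user {user_id}"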

agent = Agent(
    name="profile_bot",
    prompt="You can look up a user profile when needed.",
    config=OpenAIConfig(model="gpt-4o-mini"),
    tools=[get_user_profile],
)

stream = AGUIStream(agent)

async def dispatch_with_context(message: RunAgentInput):
    return stream.dispatch(message, variables={"user_id": "1234567890"})
```

## Backend tools (Python functions)

Backend tools are regular Python callables registered on the agent (for example via `tools=[...]`). When the agent invokes a tool during a run, the AG-UI stream emits tool lifecycle events that a UI can render in real time.

Example backend tool:

```python
from autogen.beta import Agent, tool
from autogen.beta.config import OpenAIConfig

@tool
def calculate_sum(a: int, b: int) -> int:
    """Adds two numbers and returns the result."""
    return a + b

agent = Agent(
    name="calculator",
    tools=[calculate_sum],
    config=OpenAIConfig(model="gpt-4o-mini"),
)
```

In an AG-UI-compatible UI, you typically render these as:

- Status updates (tool started / finished)
- Structured result cards (e.g., "Weather in Tokyo")
- Debug panels in development

## Frontend tools (UI-driven actions)

Frontend tools are defined by the UI/client and sent to the agent as part of the run payload (for example, CopilotKit "actions" / GenUI tools). They are useful for:

- **Generative UI** (custom cards, lists, buttons rendered from tool calls)
- **HITL** (human-in-the-loop) input flows (buttons, forms, confirmations)

In this setup:

- The **frontend** advertises available tools in `RunAgentInput.tools`.
- The **agent** can call those tools during the run.
- The **frontend** executes/handles the tool and renders the UI.
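
As a rough illustration of the first step, the sketch below builds a run payload that advertises one frontend tool. The top-level keys mirror the `curl` payload used earlier in these docs. The tool entry (`name`, `description`, and a JSON Schema under `parameters`) follows the AG-UI tool shape as understood here and should be treated as an assumption, and `confirm_refund` is a made-up tool name.

```python
import json


def build_run_payload(user_text: str, tools: list) -> dict:
    """Assemble a request body with the same top-level keys as the curl example."""
    return {
        "thread_id": "thread-1",
        "run_id": "run-1",
        "messages": [{"id": "m1", "role": "user", "content": user_text}],
        "state": {},
        "context": [],
        "tools": tools,
    }


# Hypothetical frontend tool: the UI would render a confirmation dialog for it.
confirm_tool = {
    "name": "confirm_refund",
    "description": "Ask the user to confirm a refund before it is issued.",
    "parameters": {
        "type": "object",
        "properties": {"amount": {"type": "number"}},
        "required": ["amount"],
    },
}

payload = build_run_payload("Refund my last invoice", [confirm_tool])
body = json.dumps(payload)  # send as the POST body to the AG-UI endpoint
```

In practice a client library such as CopilotKit assembles this payload for you; the sketch only shows where the tool declarations travel.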

For a production-ready React/Next.js client that supports frontend tools, backend tools, streaming, and shared state, see the [CopilotKit UI quickstart](../copilotkit-quickstart){.internal-link} and CopilotKit's docs on:

- [Backend tools](https://docs.copilotkit.ai/ag2/generative-ui/backend-tools)
- [Frontend tools](https://docs.copilotkit.ai/ag2/generative-ui/frontend-tools)
- [Shared state](https://docs.copilotkit.ai/ag2/shared-state)

## See also

- [AG-UI overview](../){.internal-link}
- [CopilotKit quickstart](../copilotkit-quickstart){.internal-link}

---

# A2A Protocol Overview

Source: https://docs.ag2.ai/latest/docs/beta/a2a/overview/

The `autogen.beta.a2a` module exposes any AG2 `Agent` over the [Agent2Agent (A2A) protocol](https://a2a-protocol.org/) and lets one AG2 agent talk to a remote A2A endpoint as if it were a regular LLM provider. The protocol is transport-agnostic: the same agent can be served over JSON-RPC, HTTP+JSON (REST) or gRPC, and clients pick the binding from the published `AgentCard`.

## When To Use A2A

Reach for A2A when the agents you need to combine **don't live in the same Python process**:

- A remote agent owned by a different team (or a different runtime) that you want to call as a tool.
- A network of services that need a vendor-neutral, spec-defined wire format instead of bespoke HTTP APIs.
- A long-running task that should outlive a single client request and be polled / streamed by id.

For in-process multi-agent scenarios - strict turn order, governance, audit trails - see the [Multi-Agent Network](../network/overview.md) instead. A2A is the cross-process / cross-host complement, not a replacement.

## Mental Model

```text
   ┌────────────────────────┐                  ┌────────────────────────┐
   │ Local Agent            │                  │ A2AServer              │
   │  config=A2AConfig(...) │  AgentCard       │  wraps an Agent        │
   │                        │  ◀──────────────▶│  + TaskStore           │
   │  send / sendStreaming  │  Task lifecycle  │  + (optional) push     │
   └──────────┬─────────────┘                  └────────────┬───────────┘
              │                                             │
              │  one of: jsonrpc / rest / grpc              │
              └──────────────────  network  ────────────────┘
```

The server publishes an `AgentCard` at `/.well-known/agent-card.json`. Every binding the server supports is listed in `card.supported_interfaces`; the client reads the card, picks one binding (via `prefer=...` or the first match) and uses it for the actual `message/send`, `tasks/get` and friends. URLs for individual transports are encoded in the card - the client never has to know them up front.
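
The selection step can be pictured with a small sketch over a plain dictionary. This is not the library's implementation: the `transport` and `url` keys are simplified stand-ins for the real `AgentCard` interface entries, and falling back to the first entry when the preferred binding is missing is a choice made for this example.

```python
from typing import Optional


def pick_interface(card: dict, prefer: Optional[str] = None) -> dict:
    """Return the preferred interface if the card lists it, else the first one."""
    interfaces = card.get("supported_interfaces", [])
    if not interfaces:
        raise ValueError("card declares no supported interfaces")
    if prefer is not None:
        for iface in interfaces:
            if iface["transport"] == prefer:
                return iface
    # No preference given, or the preferred binding is not offered.
    return interfaces[0]


# Simplified card, loosely modelled on a three-transport server.
card = {
    "supported_interfaces": [
        {"transport": "jsonrpc", "url": "http://127.0.0.1:8000"},
        {"transport": "rest", "url": "http://127.0.0.1:8001"},
        {"transport": "grpc", "url": "grpc://127.0.0.1:50051"},
    ]
}

default_choice = pick_interface(card)
grpc_choice = pick_interface(card, prefer="grpc")
```

Because the URL comes from the selected entry, the caller only ever needs the card location, which matches the point above about not knowing transport URLs up front.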

## Core Concepts

| Concept | Lives in | Purpose |
|---|---|---|
| `A2AServer` | `autogen.beta.a2a` | Wraps an `Agent` and produces a Starlette app (JSON-RPC / REST) or `grpc.aio.Server` |
| `build_card` | `autogen.beta.a2a` | Builds the `AgentCard` declaring which transports the server speaks |
| `A2AConfig` | `autogen.beta.a2a` | A `ModelConfig` - plug it into a local `Agent` to talk to a remote A2A server as its LLM |
| `TaskStore` | A2A SDK | Backs every transport on one server. Defaults to `InMemoryTaskStore` |
| `ClientToolsExtension` | `urn:ag2:client-tools:v1` | AG2-defined extension declared in the card. Lets a remote LLM call tools that live on the *client* |
| `list_tasks` / `get_task` / `cancel_task` | `autogen.beta.a2a.tasks` | Helpers for inspecting and aborting tasks on the remote server |
| `A2APushConfig` | `autogen.beta.a2a.push` | Webhook subscription config + CRUD helpers |
| `A2AEvent` family | `autogen.beta.a2a.events` | Typed wire events surfaced into the AG2 stream - subscribe for observability |

## Reading Order

1. [Server](server.md) - expose an existing `Agent` over A2A.
2. [Client](client.md) - connect to a remote A2A endpoint, multi-turn, client-side tools, `as_tool()`.
3. [Tasks & Push](tasks_and_push.md) - inspect/cancel tasks and manage push-notification webhooks.
4. [Advanced](advanced.md) - HITL via `input_required`, streaming reconnect, custom `AgentExecutor`.

## Public API

Top-level imports - the most commonly used entry points are re-exported from `autogen.beta.a2a`:

```python
from autogen.beta.a2a import A2AConfig, A2AServer, build_card
```

Sub-modules:

| Module | Contents |
|---|---|
| `autogen.beta.a2a.tasks` | `list_tasks`, `get_task`, `cancel_task`, `ListedTasks` |
| `autogen.beta.a2a.push` | `A2APushConfig`, `A2APushAuthentication`, CRUD helpers |
| `autogen.beta.a2a.security` | `Scheme` / `Requirement` types, scheme factories (`bearer_scheme`, `api_key_scheme`, `oauth2_scheme`, `mtls_scheme`, `http_auth_scheme`, `open_id_connect_scheme`) and the `require()` builder for `AgentCard` auth declarations |
| `autogen.beta.a2a.events` | Typed `A2AEvent` wrappers surfaced into the AG2 stream - `A2ATaskSnapshot`, `A2ATaskStatusUpdate`, `A2ATaskArtifactUpdate`, `A2ATextArtifact`, `A2AToolCallArtifact`, `A2AMessage` |
| `autogen.beta.a2a.errors` | Exception hierarchy - `A2AError`, `A2ATaskTerminalError`, `A2AReconnectError`, `A2AInvalidCardError`, `A2AClientToolsNotSupportedError` |
| `autogen.beta.a2a.transports` | `TransportName` - `Literal["jsonrpc", "rest", "grpc"]` - plus low-level `build_*` builders |
| `autogen.beta.a2a.testing` | In-process test helpers - `make_test_client_factory`, `make_test_rest_client_factory`, `pick_free_port` |

!!! note
    The A2A integration depends on [`a2a-sdk`](https://pypi.org/project/a2a-sdk/). Install with the `a2a` extras: `#!bash pip install "ag2[a2a]"`. gRPC is an additional optional dependency - install with `#!bash pip install "ag2[a2a-grpc]"`.

---

# Exposing an Agent as an A2A Server

Source: https://docs.ag2.ai/latest/docs/beta/a2a/server/

`A2AServer` wraps an existing `Agent` and produces a transport object you can serve directly. JSON-RPC is the default; the same `A2AServer` instance can also build REST and gRPC transports that share one task store.

## Minimal Server

The smallest end-to-end setup: an `Agent` with a tool, served over JSON-RPC on a single port via `uvicorn`.

```python
import asyncio

import uvicorn

from autogen.beta import Agent
from autogen.beta.a2a import A2AServer, build_card
from autogen.beta.config import AnthropicConfig
from autogen.beta.tools import tool

@tool(description="Add two integers and return the sum as a string.")
async def calc_add(a: int, b: int) -> str:
    return f"{a + b}"

async def main() -> None:
    agent = Agent(
        name="claude",
        config=AnthropicConfig(model="claude-sonnet-4-6"),
        tools=[calc_add],
    )
    server = A2AServer(agent)
    card = build_card(agent, url="http://127.0.0.1:8000")
    asgi = server.build_jsonrpc(url="http://127.0.0.1:8000", card=card)

    await uvicorn.Server(uvicorn.Config(asgi, host="127.0.0.1", port=8000)).serve()

if __name__ == "__main__":
    asyncio.run(main())
```

After startup the agent card is reachable at `http://127.0.0.1:8000/.well-known/agent-card.json`. A client connects by passing that base URL to `A2AConfig(card_url=...)` - see the [Client](client.md) page.

## What `A2AServer` Holds

`A2AServer.__init__` materialises transport-agnostic state - the executor, the task store, optional push notifications. Transport-specific parameters (URL, paths, ports) live on the `build_*` methods.

| Constructor argument | Purpose |
|---|---|
| `agent` | The AG2 `Agent` exposed over A2A |
| `task_store` | Shared `TaskStore`. Defaults to a single `InMemoryTaskStore` reused across every `build_*` call |
| `push_config_store` | Enables push-notifications CRUD (see [Tasks & Push](tasks_and_push.md)). Optional |
| `push_sender` | Custom delivery sender. Defaults to no-op when not set |
| `extended_card` | Auth-aware extra metadata returned via `GetExtendedAgentCard` |
| `card_modifier` / `extended_card_modifier` | Per-request hooks that mutate the card before it's served |
| `executor` | Escape hatch - drop in a custom `AgentExecutor` (see [Advanced](advanced.md#custom-agentexecutor)) |

!!! note
    The default `InMemoryTaskStore` is materialised **once** at `__init__` time. This is what makes JSON-RPC, REST and gRPC bound to the same `A2AServer` see each other's tasks - a single store, three transports.

## Server-side Tools

Tools attached to the wrapped `Agent` execute on the server, just as they would for a local agent. The remote LLM picks them up automatically.

```python
from autogen.beta import Agent
from autogen.beta.a2a import A2AServer
from autogen.beta.config import AnthropicConfig
from autogen.beta.tools import tool
from autogen.beta.tools.builtin import WebSearchTool

agent = Agent(
    name="claude",
    config=AnthropicConfig(model="claude-sonnet-4-6"),
    tools=[
        WebSearchTool(),  # built-in provider tool - runs on Anthropic's side
        calc_add,         # the @tool from the minimal server above - runs on this server
    ],
)
server = A2AServer(agent)
```

For client-side tools - declared on the *caller* and forwarded back from the server when the LLM calls them - see [Client -> Local Tools](client.md#local-tools-forwarded-from-the-server).

## Choosing a Transport

The default `build_jsonrpc(...)` is the right choice for most setups. JSON-RPC is the most widely supported A2A binding, works over any HTTP infrastructure (proxies, gateways, load balancers), and is what every A2A client implementation speaks first.

Reach for an alternative transport when:

| Transport | Use when |
|---|---|
| `build_jsonrpc` (default) | You want the simplest, most portable HTTP binding. Recommended start. |
| `build_rest` | You need an HTTP+JSON REST surface with stable URLs (logging, cache control, route-level auth in a gateway). |
| `build_grpc` | You need bidirectional streaming with low overhead, or your infra is gRPC-native. |

### REST

```python
card = build_card(agent, url="http://127.0.0.1:8001", transports=("rest",))
rest_app = server.build_rest(url="http://127.0.0.1:8001", card=card)
await uvicorn.Server(uvicorn.Config(rest_app, host="127.0.0.1", port=8001)).serve()
```

`build_rest(path_prefix="/v1")` mounts the routes under a sub-path; both the card and the dispatcher respect it.

### gRPC

```python
card = build_card(
    agent,
    url="grpc://127.0.0.1:50051",
    transports=("grpc",),
    grpc_url="grpc://127.0.0.1:50051",
)
grpc_server = server.build_grpc(
    bind="127.0.0.1:50051",
    grpc_url="grpc://127.0.0.1:50051",
    card=card,
)
await grpc_server.start()
await grpc_server.wait_for_termination()
```

`build_grpc` returns an unstarted `grpc.aio.Server` - the caller is responsible for `start()` and `wait_for_termination()`. `bind` is the listener address, `grpc_url` is the URL declared in the card (they're usually identical, but differ when the server sits behind a load balancer).

!!! note
    A2A v1.x has no `GetAgentCard` gRPC method - the public card is always served over HTTP at `/.well-known/agent-card.json`. So even for a gRPC-only server, clients fetch the card via HTTP first and then switch to gRPC for the actual exchange. Plan your card URL accordingly.

### One Server, Three Transports

The same `A2AServer` instance can back any combination of transports. Build a single multi-transport `AgentCard` and call each `build_*` against it - they share one task store.

```python
import asyncio

import uvicorn
from a2a.server.tasks import InMemoryPushNotificationConfigStore

server = A2AServer(agent, push_config_store=InMemoryPushNotificationConfigStore())
card = build_card(
    agent,
    url="http://127.0.0.1:8000",
    transports=("jsonrpc", "rest", "grpc"),
    rest_url="http://127.0.0.1:8001",
    grpc_url="grpc://127.0.0.1:50051",
)

asgi = server.build_jsonrpc(url="http://127.0.0.1:8000", card=card)
rest = server.build_rest(url="http://127.0.0.1:8001", card=card)
grpc = server.build_grpc(bind="127.0.0.1:50051", grpc_url="grpc://127.0.0.1:50051", card=card)

await grpc.start()
await asyncio.gather(
    uvicorn.Server(uvicorn.Config(asgi, host="127.0.0.1", port=8000)).serve(),
    uvicorn.Server(uvicorn.Config(rest, host="127.0.0.1", port=8001)).serve(),
    grpc.wait_for_termination(),
)
```

## Customising the `AgentCard`

`build_card(agent, url=...)` accepts a handful of optional kwargs to enrich the published card with discovery metadata and auth declarations.

| Argument | Purpose |
|---|---|
| `version` | Card version (defaults to `"1.0.0"`) |
| `description` | Free-form description. Defaults to the first entry of the agent's system prompt |
| `skills` | Explicit `Sequence[AgentSkill]`. When `None`, `build_card` walks `agent.tools` for any `SkillsToolkit` and publishes its local skills automatically; falls back to a single agent-derived skill if none are found |
| `push_notifications` | Toggles `capabilities.push_notifications` on the card |
| `provider` | `AgentProvider` block (organization, URL) |
| `documentation_url` / `icon_url` | Discovery metadata |
| `security` | Auth declarations - see below |
| `tenants` | `Mapping[TransportName, str]` - surface a per-transport tenant on the corresponding `AgentInterface.tenant` |
| `rest_url` / `rest_path_prefix` / `grpc_url` | Per-transport URL overrides for multi-transport cards |

### Declaring Authentication

`autogen.beta.a2a.security` ships factories for every A2A-recognised scheme. Each factory returns a typed `Scheme` object that carries its card-level binding name. Pass them to `require(...)` to build `Requirement` entries; `build_card` auto-derives the card's `security_schemes` from the schemes referenced in `security=` - no duplicate declarations.

```python
from autogen.beta.a2a import A2AServer, build_card
from autogen.beta.a2a.security import (
    bearer_scheme,
    api_key_scheme,
    require,
)

bearer = bearer_scheme(name="bearer", bearer_format="JWT")
api_key = api_key_scheme(name="x_api_key", key_name="X-API-Key", location="header")

card = build_card(
    agent,
    url="http://127.0.0.1:8000",
    security=[require(bearer), require(api_key)],
)
```

| Helper | Scheme |
|---|---|
| `bearer_scheme(name=..., bearer_format=..., description=...)` | HTTP Bearer (e.g. JWT) |
| `http_auth_scheme(name=..., scheme=..., ...)` | Any other HTTP auth scheme (basic, digest, custom bearer formats) |
| `api_key_scheme(name=..., key_name=..., location=...)` | API key in header / query / cookie. `name` is the card binding; `key_name` is the header/query/cookie key sent by the client. |
| `oauth2_scheme(name=..., flows=..., oauth2_metadata_url=...)` | OAuth2 wrapping a pre-built `OAuthFlows` |
| `open_id_connect_scheme(name=..., url=...)` | OpenID Connect discovery URL |
| `mtls_scheme(name=...)` | Mutual TLS client-cert auth |

#### Combining requirements: AND vs OR

The `security=` list holds independent rules - clients only need to satisfy **one** of them (entries are OR-ed). Inside a single `require(...)` call, **all** passed schemes must be presented together (arguments are AND-ed). Attach OAuth2/OIDC scopes via `scheme.with_scopes(...)`.

**Example A - accept Bearer OR API-key** (two separate `require()` calls):

```python
security=[
    require(bearer),
    require(api_key),
]
```

| Request headers | Accepted? |
|---|---|
| `Authorization: Bearer <jwt>` | ✅ matches first rule |
| `X-API-Key: <key>` | ✅ matches second rule |
| both headers present | ✅ either rule alone is enough |
| neither | ❌ no rule satisfied |

**Example B - require Bearer AND API-key together** (one `require()` with two args):

```python
security=[
    require(bearer, api_key),
]
```

| Request headers | Accepted? |
|---|---|
| only `Authorization: Bearer <jwt>` | ❌ missing API key |
| only `X-API-Key: <key>` | ❌ missing Bearer |
| both headers present | ✅ both args inside the same `require()` satisfied |

**Example C - mixing scopes** (OAuth2 needs scopes, Bearer doesn't):

```python
# `oauth` is assumed to be an `oauth2_scheme(...)` instance created elsewhere.
security=[
    require(bearer, oauth.with_scopes("read", "write")),
]
```

Scheme binding names are arbitrary strings - pass any value to `name=`, including non-identifier forms like `"X-My-Scheme"`:

```python
custom = bearer_scheme(name="X-My-Scheme")
require(custom)
```

!!! note
    `build_card` only **declares** auth on the card - it does not enforce it. Wire the actual check into the ASGI app (Starlette middleware, gateway, reverse proxy) or the gRPC server's interceptors.
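
Because the card only declares requirements, whatever enforces them has to apply the same OR-of-ANDs rule described above. The following is a minimal sketch of that evaluation, assuming each `require(...)` call is modelled as a set of scheme binding names and the request has already been reduced to the set of schemes it presented valid credentials for. `is_authorized` and these data shapes are hypothetical and not part of `autogen.beta.a2a.security`.

```python
from collections.abc import Sequence


def is_authorized(rules: Sequence[frozenset[str]], presented: set[str]) -> bool:
    """Return True when at least one rule is fully satisfied.

    Each rule holds the binding names from one ``require(...)`` call (AND-ed);
    the outer sequence mirrors the ``security=`` list (OR-ed).
    """
    if not rules:
        # Assumption: no declared rules means the endpoint is open.
        return True
    return any(rule <= presented for rule in rules)


# Example A: Bearer OR API key -> two single-scheme rules.
either = [frozenset({"bearer"}), frozenset({"x_api_key"})]
# Example B: Bearer AND API key -> one two-scheme rule.
both = [frozenset({"bearer", "x_api_key"})]

print(is_authorized(either, {"bearer"}))             # True
print(is_authorized(either, set()))                  # False
print(is_authorized(both, {"bearer"}))               # False
print(is_authorized(both, {"bearer", "x_api_key"}))  # True
```

Validating the credentials themselves (JWT signature, API-key lookup) is a separate step that would run before this check.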

## Adding Cross-cutting Middleware

A2A doesn't define server-side middleware. Attach CORS, auth or tracing directly to the returned transport object:

```python
from starlette.middleware.cors import CORSMiddleware

asgi = server.build_jsonrpc(url="http://127.0.0.1:8000")
asgi.add_middleware(CORSMiddleware, allow_origins=["*"])
```

For gRPC, attach interceptors when constructing the server, via the `grpc.aio.Server` options on `build_grpc(options=...)`.

---

# Connecting to an A2A Server

Source: https://docs.ag2.ai/latest/docs/beta/a2a/client/

`A2AConfig` is a `ModelConfig` - pass it to a regular `Agent` and the remote A2A server becomes that agent's LLM provider. Conversation history, tool calls and streaming are negotiated through the protocol; calling code keeps the familiar `agent.ask(...)` / `reply.ask(...)` shape.

## Minimal Client

```python
import asyncio

from autogen.beta import Agent
from autogen.beta.a2a import A2AConfig

async def main() -> None:
    remote = Agent(
        "remote",
        config=A2AConfig(card_url="http://127.0.0.1:8000"),
    )
    reply = await remote.ask("Add 17 and 25 with calc_add. Just the number.")
    print(reply.response.content)  # -> "42"

asyncio.run(main())
```

`card_url` is the HTTP(S) base where the server publishes `/.well-known/agent-card.json`. The client fetches the card on first use, picks a binding from `supported_interfaces`, and uses the URL declared in the card for every subsequent request - you don't pass transport-specific URLs.

## Selecting a Transport

When the card declares multiple bindings, `prefer=...` forces a choice:

```python
config = A2AConfig(
    card_url="http://127.0.0.1:8000",
    prefer="grpc",  # one of "jsonrpc" | "rest" | "grpc"
)
```

`prefer=None` (default) auto-picks: if exactly one declared interface URL matches `card_url` it wins; otherwise the first server-listed interface is used.
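
The selection rule can be sketched in plain Python. This is an illustration only: interfaces are modelled as `(binding, url)` tuples in the order the card lists them, `pick_interface` is a hypothetical name, comparing URLs with the trailing slash stripped is an assumption about what "matches" means, and raising when the preferred binding is missing is likewise assumed rather than documented behaviour.

```python
from typing import Optional

Interface = tuple[str, str]  # (binding, url), in card order


def pick_interface(
    interfaces: list[Interface],
    card_url: str,
    prefer: Optional[str] = None,
) -> Interface:
    if not interfaces:
        raise ValueError("card declares no interfaces")
    if prefer is not None:
        # An explicit preference wins over any URL matching.
        for interface in interfaces:
            if interface[0] == prefer:
                return interface
        raise ValueError(f"card does not declare a {prefer!r} interface")
    matches = [i for i in interfaces if i[1].rstrip("/") == card_url.rstrip("/")]
    if len(matches) == 1:
        return matches[0]  # exactly one interface shares the card URL
    return interfaces[0]   # otherwise fall back to the first listed interface


declared = [
    ("grpc", "grpc://127.0.0.1:50051"),
    ("jsonrpc", "http://127.0.0.1:8000"),
    ("rest", "http://127.0.0.1:8001"),
]
print(pick_interface(declared, "http://127.0.0.1:8000")[0])                 # jsonrpc
print(pick_interface(declared, "http://127.0.0.1:9000")[0])                 # grpc
print(pick_interface(declared, "http://127.0.0.1:8000", prefer="rest")[0])  # rest
```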

!!! tip
    `card_url` is always an HTTP URL - even when the resolved transport is gRPC. The card is served over HTTP per spec; only the actual message exchange uses the resolved binding.

## Multi-turn - `reply.ask`

A2A servers are stateless from AG2's perspective: every call ships the full conversation history as an `application/vnd.ag2.history+json` `DataPart` attached to the outgoing message. The continuation API is the regular `reply.ask(...)`:

```python
remote = Agent("remote", config=A2AConfig(card_url="http://127.0.0.1:8000"))

reply = await remote.ask("Remember the number 42.")
reply = await reply.ask("What number did I ask you to remember?")
print(reply.response.content)
```

The remote agent recovers the entire prior context on every turn - there is no server-side session id to manage.
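
To make the "history travels with every call" idea concrete, here is a toy model with no network and no AG2 imports. `stateless_server`, `send`, and this `Reply` class are hypothetical stand-ins; the real client encodes the history as a `DataPart`, which this sketch does not attempt to reproduce.

```python
from dataclasses import dataclass, field

Message = dict[str, str]


def stateless_server(history: list[Message]) -> str:
    """Reply using only the history shipped with this call; nothing is stored."""
    user_turns = [m["text"] for m in history if m["role"] == "user"]
    return f"turn {len(user_turns)}; first user message: {user_turns[0]!r}"


@dataclass
class Reply:
    content: str
    history: list[Message] = field(default_factory=list)

    def ask(self, text: str) -> "Reply":
        # Continue the conversation: prior history plus the new user message.
        return send(self.history, text)


def send(history: list[Message], text: str) -> Reply:
    outgoing = [*history, {"role": "user", "text": text}]
    content = stateless_server(outgoing)  # the full history goes out on every call
    return Reply(content, [*outgoing, {"role": "assistant", "text": content}])


first = send([], "Remember the number 42.")
second = first.ask("What number did I ask you to remember?")
print(second.content)
# turn 2; first user message: 'Remember the number 42.'
```

The server function keeps no state between calls, yet the second turn can still see the first message because the client resent it.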

## Local Tools (forwarded from the server)

Tools declared on the **client** are advertised to the server in the AG2 client-tools extension. When the remote LLM picks one, the call is routed back to the client and executed locally; the tool result is sent back into the server-side LLM loop. The server LLM never sees your local environment.

```python
from datetime import datetime

from autogen.beta import Agent
from autogen.beta.a2a import A2AConfig
from autogen.beta.tools import tool

@tool(description="Return the user's local wall-clock time as ISO-8601.")
def get_local_time() -> str:
    return datetime.now().isoformat(timespec="seconds")

remote = Agent(
    "remote",
    config=A2AConfig(card_url="http://127.0.0.1:8000"),
    tools=[get_local_time],
)
reply = await remote.ask("What time is it on my machine? Use get_local_time.")
```

The same agent can mix client-side and server-side tools - server tools execute remotely, client tools execute locally, and the LLM picks freely between them within one turn.

!!! warning
    If the remote `AgentCard` does not advertise the `urn:ag2:client-tools:v1` extension, passing `tools=...` raises `A2AClientToolsNotSupportedError`. Only AG2-backed servers support client-side tool forwarding today.

## Remote Agent as a Sub-tool - `as_tool()`

A remote A2A agent plugs into a local `Agent` like any other delegate via `Agent.as_tool()`. The local LLM decides when to delegate; the wrapper exposes a `task_<name>` tool that takes an `objective` (and optional `context`).

```python
from autogen.beta import Agent
from autogen.beta.a2a import A2AConfig
from autogen.beta.config import AnthropicConfig

researcher = Agent(
    "researcher",
    config=A2AConfig(card_url="http://research.internal:8000"),
)

writer = Agent(
    "writer",
    prompt="Use the researcher tool to gather facts before writing a draft.",
    config=AnthropicConfig(model="claude-sonnet-4-6"),
    tools=[researcher.as_tool(description="Delegate research questions to the remote researcher.")],
)

reply = await writer.ask("Write a 3-paragraph brief on the latest A2A spec changes.")
```

This composes naturally with several remotes - give each `as_tool()` a distinct `name=` and let the local LLM route by capability. See [Sub-task Delegation](../task_delegation.md) for the general `as_tool()` semantics.

!!! note
    Each `task_<name>` call spawns a fresh sub-agent stream; history between calls is not preserved on the sub-task side. For a remote that remembers prior turns, prefer the `reply.ask(...)` pattern above instead of `as_tool()`.

## A2AConfig Reference

| Field | Type | Default | Purpose |
|---|---|---|---|
| `card_url` | `str` | required | Base URL where `/.well-known/agent-card.json` is served |
| `prefer` | `Optional[Literal["jsonrpc", "rest", "grpc"]]` | `None` | Force a specific binding when the card declares more than one |
| `streaming` | `bool` | `True` | Use `sendStreaming` when the server's card opts in. Falls back to polling otherwise |
| `headers` | `Optional[Mapping[str, str]]` | `None` | Extra HTTP headers (auth, tracing) |
| `timeout` | `Optional[float]` | `60.0` | Per-request timeout in seconds |
| `max_reconnects` | `int` | `3` | Streaming reconnect attempts (see [Advanced](advanced.md#streaming-reconnect)) |
| `reconnect_backoff` | `float` | `0.5` | Backoff between reconnect attempts (seconds) |
| `polling_interval` | `float` | `0.5` | Poll interval when streaming is off |
| `input_required_timeout` | `Optional[float]` | `None` | Cap how long the client waits on a HITL hook |
| `httpx_client_factory` | `Optional[Callable[[], AsyncClient]]` | `None` | Custom `httpx.AsyncClient` (proxies, custom TLS, etc.) |
| `interceptors` | `Sequence[ClientCallInterceptor]` | `()` | A2A SDK call interceptors |
| `grpc_channel_factory` | `Optional[Callable[[str], Channel]]` | `None` | Custom gRPC channel builder (defaults to insecure) |
| `preset_card` | `Optional[AgentCard]` | `None` | Skip the discovery round-trip when the card is already known |
| `tenant` | `Optional[str]` | `None` | Multi-tenancy scope on a shared backend |
| `history_length` | `Optional[int]` | `None` | Server-side hint to truncate echoed `Task.history` |

### Constructing From a Pre-fetched Card

When the card has already been resolved (discovery service, on-disk cache), `A2AConfig.from_card(...)` skips the network round-trip on connect:

```python
from autogen.beta.a2a import A2AConfig

config = A2AConfig.from_card(card, prefer="jsonrpc", timeout=30.0)
```

`card_url` defaults to the first interface URL on the card; pass `card_url=...` to override.

---

# Managing Tasks and Push Notifications

Source: https://docs.ag2.ai/latest/docs/beta/a2a/tasks_and_push/

A2A models every conversation as a `Task` with a server-side lifecycle. `autogen.beta.a2a.tasks` exposes helpers for inspecting and aborting tasks; `autogen.beta.a2a.push` manages webhook subscriptions for asynchronous delivery of task updates. All helpers accept an `A2AConfig` and work over any transport the server's card declares.

## Tasks

```python
from autogen.beta.a2a import A2AConfig
from autogen.beta.a2a.tasks import cancel_task, get_task, list_tasks

config = A2AConfig(card_url="http://127.0.0.1:8000")
```

### Listing

`list_tasks` returns a `ListedTasks` dataclass - the page's `tasks` plus the server-reported pagination metadata (`next_page_token`, `page_size`, `total_size`). Pagination is handled by the caller - pass the previous response's `next_page_token` back in as `page_token` to advance.

```python
from a2a.types import TaskState

page = await list_tasks(config, page_size=10)
for task in page.tasks:
    print(task.id, task.status.state)
if page.next_page_token:
    next_page = await list_tasks(config, page_size=10, page_token=page.next_page_token)

running = await list_tasks(config, status=TaskState.TASK_STATE_WORKING)
```

| Argument | Purpose |
|---|---|
| `tenant` | Scope the call to a specific tenant on a shared backend |
| `context_id` | Filter to a single conversation context |
| `status` | Filter by `TaskState` enum value (e.g. `TaskState.TASK_STATE_WORKING`) |
| `page_size` / `page_token` | Caller-driven pagination |
| `history_length` | Server-side hint to truncate echoed `Task.history` |
| `include_artifacts` | Include task artifacts in the response (default: `False`) |
| `status_timestamp_after` | Filter to tasks whose status timestamp is after this `datetime` |
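
Since pagination is caller-driven, draining every page takes a small loop. The sketch below shows one way to write it. To stay runnable without a server it calls `fake_list_tasks`, a hypothetical stand-in that only imitates the `page_size` / `page_token` / `next_page_token` shape described above; with the real helper you would call `list_tasks(config, ...)` in its place.

```python
import asyncio
from dataclasses import dataclass
from typing import Optional

ALL_TASK_IDS = [f"task-{n}" for n in range(7)]  # pretend server-side data


@dataclass
class Page:
    tasks: list[str]
    next_page_token: Optional[str]


async def fake_list_tasks(page_size: int, page_token: Optional[str] = None) -> Page:
    """Stand-in for a paginated listing call; the token is just an offset."""
    start = int(page_token) if page_token else 0
    end = start + page_size
    token = str(end) if end < len(ALL_TASK_IDS) else None
    return Page(tasks=ALL_TASK_IDS[start:end], next_page_token=token)


async def collect_all(page_size: int) -> list[str]:
    collected: list[str] = []
    token: Optional[str] = None
    while True:
        page = await fake_list_tasks(page_size=page_size, page_token=token)
        collected.extend(page.tasks)
        token = page.next_page_token
        if not token:  # no further pages reported by the server
            return collected


print(len(asyncio.run(collect_all(page_size=3))))  # 7
```

For large result sets an async generator that yields each page as it arrives would avoid holding every task in memory.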

### Fetching a Single Task

```python
full = await get_task(config, task_id)
truncated = await get_task(config, task_id, history_length=1)
```

`history_length` is a hint - some server versions ignore it and return the full history. Don't rely on it for security boundaries.

### Cancelling

```python
cancelled = await cancel_task(
    config,
    task_id,
    metadata={"reason": "operator override"},
)
```

`metadata` is forwarded to the server's cancel handler - useful for auditing who cancelled what and why. Already-terminal tasks (`COMPLETED`, `FAILED`, etc.) are typically refused with a `409`-style error; surface the exception, don't paper over it.

## Push Notifications

A2A push notifications let the server POST task updates to a webhook the client registered ahead of time - useful when the client doesn't want to hold an open SSE stream for the whole task lifetime.

!!! note
    The server must be constructed with a `push_config_store` for push CRUD to work. The default `A2AServer(agent)` does not enable push. See [Server -> What `A2AServer` Holds](server.md#what-a2aserver-holds).

### Registering a Webhook

```python
from autogen.beta.a2a.push import (
    A2APushAuthentication,
    A2APushConfig,
    create_push_notification_config,
)

push = A2APushConfig(
    url="https://hooks.example.com/a2a",
    token="webhook-token",
    authentication=A2APushAuthentication(scheme="bearer", credentials="abc..."),
)
created = await create_push_notification_config(config, task_id, push)
print(created.id)  # server-issued config id
```

`A2APushAuthentication` round-trips `scheme` / `credentials` through the wire format without loss. Receiving handlers verify these against the inbound request.
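
On the receiving side, the check could look roughly like the sketch below. It assumes the `token` arrives in an `X-A2A-Notification-Token` header and the credentials in a standard `Authorization: Bearer ...` header; both header names are assumptions to confirm against your server, and `verify_push_request` is a hypothetical helper, not part of `autogen.beta.a2a.push`.

```python
import hmac
from collections.abc import Mapping


def verify_push_request(
    headers: Mapping[str, str],
    *,
    expected_token: str,
    expected_credentials: str,
) -> bool:
    # Header names are case-insensitive, so normalise before lookup.
    lowered = {key.lower(): value for key, value in headers.items()}
    token = lowered.get("x-a2a-notification-token", "")
    scheme, _, credentials = lowered.get("authorization", "").partition(" ")
    # compare_digest avoids leaking how many leading characters matched.
    token_ok = hmac.compare_digest(token.encode(), expected_token.encode())
    creds_ok = scheme.lower() == "bearer" and hmac.compare_digest(
        credentials.encode(), expected_credentials.encode()
    )
    return token_ok and creds_ok


inbound = {
    "X-A2A-Notification-Token": "webhook-token",
    "Authorization": "Bearer s3cret",
}
print(verify_push_request(inbound, expected_token="webhook-token", expected_credentials="s3cret"))  # True
```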

### Reading and Deleting

```python
from autogen.beta.a2a.push import (
    delete_push_notification_config,
    get_push_notification_config,
    list_push_notification_configs,
)

fetched = await get_push_notification_config(config, task_id, created.id)
listed = await list_push_notification_configs(config, task_id, page_size=10)
await delete_push_notification_config(config, task_id, created.id)
```

| Helper | Purpose |
|---|---|
| `create_push_notification_config` | Register a webhook for a task |
| `get_push_notification_config` | Fetch a single config by id |
| `list_push_notification_configs` | List configs registered for a task (paginated) |
| `delete_push_notification_config` | Remove a registered config |

## Multi-tenant Scoping

Both modules accept a per-call `tenant=...` kwarg that overrides the tenant baked into `A2AConfig` for that single request. Per-call overrides are also available via `context.variables["a2a:tenant"]` - useful when a single client serves multiple tenants on the same shared backend.

```python
acme_tasks = await list_tasks(config, tenant="acme-corp", page_size=10)
```

---

# A2A Advanced Topics

Source: https://docs.ag2.ai/latest/docs/beta/a2a/advanced/

Topics that aren't part of the day-to-day A2A path: human-in-the-loop, transparent reconnects on streaming drops, plugging in a custom executor, and the error hierarchy.

## Human-in-the-Loop via `input_required`

When a server-side executor raises `requires_input(...)` mid-task, the A2A server transitions the task to `TASK_STATE_INPUT_REQUIRED` and surfaces the prompt to the client. The AG2 client invokes the local agent's `hitl_hook` with the prompt and continues the same turn with the reply - the server prompt does **not** leak into the final response text.

```python
from autogen.beta import Agent
from autogen.beta.a2a import A2AConfig

async def hitl_hook() -> str:
    return input("server asks input> ")

remote = Agent(
    "remote",
    config=A2AConfig(card_url="http://127.0.0.1:8000"),
    hitl_hook=hitl_hook,
)
reply = await remote.ask("start")  # server may request input multiple times
print(reply.response.content)
```

`A2AConfig.input_required_timeout` caps how long the client waits on the hook. `None` (default) waits indefinitely - which matches the behaviour of `ConversationContext.input` in regular agents.

See [Human in the Loop](../context/human_in_the_loop.md) for the underlying `hitl_hook` contract.

## Streaming Reconnect

Streaming connections drop. Network blips, load balancer recycles, idle timeouts - the SSE channel breaks, and the client needs to recover without losing partially-streamed artifacts.

The A2A client keeps internal drive state across drops: on `A2AClientError` mid-stream it issues a fresh `subscribe` against the same `task_id`, deduplicates artifacts already seen by `artifact_id` (and messages by `message_id`), and continues from where it failed. Application code sees a single uninterrupted reply.

```python
remote = Agent(
    "remote",
    config=A2AConfig(
        card_url="http://127.0.0.1:8000",
        streaming=True,
        max_reconnects=3,
        reconnect_backoff=0.5,
    ),
)
reply = await remote.ask("a long answer")
```

| Field | Default | Purpose |
|---|---|---|
| `max_reconnects` | `3` | Total reconnect attempts before giving up with `A2AReconnectError` |
| `reconnect_backoff` | `0.5` | Seconds to wait between attempts (constant - no jitter or exponential growth) |

When attempts are exhausted the client raises `A2AReconnectError(attempts=N)`. Catch it to fall back to a polling re-fetch via `get_task` if you need recovery beyond the streaming budget.
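
The reconnect behaviour described in this section can be modelled without any network. The sketch below is not AG2's internal implementation: `StreamDropped`, `ReconnectExhausted`, `make_subscribe`, and `drive` are hypothetical stand-ins, and it assumes a fresh subscription replays artifacts from the start, which is why deduplication by id is needed.

```python
import asyncio
from collections.abc import AsyncIterator, Callable


class StreamDropped(Exception):
    """Stand-in for a mid-stream transport failure."""


class ReconnectExhausted(Exception):
    """Stand-in for A2AReconnectError; carries the attempt count."""

    def __init__(self, attempts: int) -> None:
        super().__init__(f"gave up after {attempts} reconnect attempts")
        self.attempts = attempts


ARTIFACTS = [("a1", "Hello"), ("a2", ", "), ("a3", "world")]


def make_subscribe(drops: int) -> Callable[[], AsyncIterator[tuple[str, str]]]:
    """Build a fake subscribe() whose first ``drops`` calls break after one artifact."""
    calls = 0

    async def subscribe() -> AsyncIterator[tuple[str, str]]:
        nonlocal calls
        calls += 1
        for index, artifact in enumerate(ARTIFACTS):
            if calls <= drops and index == 1:
                raise StreamDropped  # connection breaks after the first artifact
            yield artifact

    return subscribe


async def drive(subscribe, *, max_reconnects: int = 3, reconnect_backoff: float = 0.0) -> str:
    seen: set[str] = set()
    chunks: list[str] = []
    attempts = 0
    while True:
        try:
            async for artifact_id, text in subscribe():
                if artifact_id in seen:
                    continue  # already delivered before the drop
                seen.add(artifact_id)
                chunks.append(text)
            return "".join(chunks)
        except StreamDropped:
            if attempts >= max_reconnects:
                raise ReconnectExhausted(attempts) from None
            attempts += 1
            await asyncio.sleep(reconnect_backoff)  # constant delay between attempts


print(asyncio.run(drive(make_subscribe(drops=2))))  # Hello, world
```

Two drops fit inside the budget of three reconnects, so the caller sees one clean string with no repeated `a1` chunk.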

## Custom `AgentExecutor`

`A2AServer` defaults to wrapping the supplied `Agent` in AG2's standard `AgentExecutor`. When you need behaviour that doesn't fit `Agent.ask` - a HITL-first turn, a multi-agent pipeline, a non-standard task lifecycle - drop in your own executor:

```python
from a2a.server.agent_execution import AgentExecutor as A2AAgentExecutorBase

from autogen.beta.a2a import A2AServer

class MyExecutor(A2AAgentExecutorBase):
    async def execute(self, request_context, event_queue) -> None:
        ...

    async def cancel(self, request_context, event_queue) -> None:
        ...

server = A2AServer(agent_stub, executor=MyExecutor())
```

The executor owns the entire request lifecycle: parsing the inbound message, emitting status updates and artifacts to the `event_queue`, and signalling terminal state. The wrapped agent passed to `A2AServer(agent_stub, ...)` is still used for card metadata (`name`, `description`, etc.) - make it a stub if a real agent doesn't fit your design.

## Errors

All A2A errors live in `autogen.beta.a2a.errors` and inherit from `A2AError`. Catch `A2AError` for everything; catch the specifics when you want to react differently.

| Exception | Raised when |
|---|---|
| `A2AError` | Base class - catch this for any A2A failure |
| `A2AInvalidCardError` | The card is missing data required to connect (no `supported_interfaces`, no usable URL) |
| `A2AClientToolsNotSupportedError` | Client passed `tools=` but the server card doesn't advertise the `urn:ag2:client-tools:v1` extension |
| `A2AReconnectError` | Streaming reconnect attempts exhausted - `err.attempts` holds the count |
| `A2ATaskTerminalError` | Base for the three terminal-state errors below; carries `err.task` with the final `Task` (status, history, artifacts) |
| `A2ATaskFailedError` | Task ended in `TASK_STATE_FAILED` |
| `A2ATaskRejectedError` | Task ended in `TASK_STATE_REJECTED` |
| `A2ATaskAuthRequiredError` | Task ended in `TASK_STATE_AUTH_REQUIRED`. Per A2A spec §7.6 the agent expects credentials out-of-band - apply them and retry |

Catch `A2ATaskTerminalError` to handle any terminal failure uniformly; switch on the concrete subclass when the recovery path differs (e.g. retry with auth on `AuthRequired`, surface the error to the operator on `Failed` / `Rejected`).
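
A sketch of that "catch the base, branch on the subclass" pattern follows. To keep it runnable without a server it declares local stand-in classes that mirror only the names and inheritance from the table; the constructor taking a `task` dict is an assumption made for the demo, and `run_with_recovery` and `raiser` are hypothetical helpers. In real code, import the exceptions from `autogen.beta.a2a.errors` instead.

```python
class A2AError(Exception):
    """Stand-in for the documented base class."""


class A2AReconnectError(A2AError):
    """Stand-in: streaming reconnect attempts exhausted."""


class A2ATaskTerminalError(A2AError):
    """Stand-in: carries the final task, modelled here as a plain dict."""

    def __init__(self, task: dict) -> None:
        super().__init__(task.get("state", "unknown"))
        self.task = task


class A2ATaskFailedError(A2ATaskTerminalError):
    pass


class A2ATaskAuthRequiredError(A2ATaskTerminalError):
    pass


def run_with_recovery(call) -> str:
    """Order the handlers from most specific to most general."""
    try:
        return call()
    except A2ATaskAuthRequiredError:
        return "retry-with-credentials"
    except A2ATaskTerminalError as err:
        return f"escalate:{err.task['state']}"
    except A2AError:
        return "log-and-abort"


def raiser(error: Exception):
    """Build a zero-argument callable that raises ``error`` (demo helper)."""
    def call() -> str:
        raise error
    return call


print(run_with_recovery(raiser(A2ATaskAuthRequiredError({"state": "TASK_STATE_AUTH_REQUIRED"}))))
# retry-with-credentials
print(run_with_recovery(raiser(A2ATaskFailedError({"state": "TASK_STATE_FAILED"}))))
# escalate:TASK_STATE_FAILED
print(run_with_recovery(raiser(A2AReconnectError())))
# log-and-abort
```

Handler order matters: placing `except A2AError` first would swallow every subclass before the specific branches run.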

## Testing Helpers

`autogen.beta.a2a.testing` provides utilities for in-process A2A tests - no real socket, no port binding.

```python
from autogen.beta import Agent
from autogen.beta.a2a import A2AConfig, A2AServer
from autogen.beta.a2a.testing import make_test_client_factory

server = A2AServer(agent)
factory = make_test_client_factory(server, url="http://test")
remote = Agent("remote", config=A2AConfig(card_url="http://test", httpx_client_factory=factory))
await remote.ask("ping")
```

| Helper | Purpose |
|---|---|
| `make_test_client_factory(server, url=..., timeout=...)` | `httpx.AsyncClient` factory dispatching JSON-RPC into the server's ASGI app via `httpx.ASGITransport` |
| `make_test_rest_client_factory(server, url=..., timeout=...)` | Same idea for REST - builds the REST app and a card declaring only the REST interface |
| `pick_free_port(host="127.0.0.1")` | Probe for a free TCP port. Used by gRPC tests since gRPC has no in-process transport equivalent |

```python
from autogen.beta.a2a.testing import pick_free_port

port = pick_free_port()
grpc_server = server.build_grpc(bind=f"127.0.0.1:{port}", grpc_url=f"grpc://127.0.0.1:{port}")
```

---

# Code Examples

Source: https://docs.ag2.ai/latest/docs/beta/code_examples/code_examples/

End-to-end runnable scripts demonstrating `autogen.beta`. Each example is self-contained, instantiates a `GeminiConfig` directly, and exercises one or two specific harness primitives so you can read it top-to-bottom and copy what you need.

## Examples

| # | Page | Topic | Primitives covered |
|---|------|-------|--------------------|
| 01 | [Hello Agent](01_hello_agent.md) | Minimal Agent - one config, one ask | `Agent`, `GeminiConfig` |
| 02 | [Recipe Builder](02_recipe_builder.md) | Tools + Pydantic structured output | `Agent`, `tools=`, `response_schema=` |
| 03 | [Travel Planner](03_travel_planner.md) | Multi-turn chat via chained `reply.ask()` | `agent.ask()`, `reply.ask()` |
| 04 | [Token Watchdog](04_token_watchdog.md) | Built-in + custom observers, `ObserverAlert` | `BaseObserver`, `TokenMonitor`, `LoopDetector`, `EventWatch` |
| 05 | [Research Squad](05_research_squad.md) | Parallel subtasks + sibling delegation | `run_subtasks(parallel=True)`, `Agent.as_tool()` |
| 06 | [Journal Companion](06_journal_companion.md) | Persistent memory across runs | `KnowledgeConfig`, `WorkingMemoryAggregate`, `WorkingMemoryPolicy` |
| 07 | [Long-Doc Chat](07_long_doc_chat.md) | Composing assembly policies + compaction | `assembly=[...]`, `SlidingWindowPolicy`, `TokenBudgetPolicy`, `TailWindowCompact` |
| 08 | [Safety Guard](08_safety_guard.md) | FATAL alert -> `AlertPolicy` -> `HaltEvent` | `ObserverAlert(FATAL)`, `AlertPolicy`, `HaltEvent` |

Set `GEMINI_API_KEY` (or swap in another provider's `ModelConfig` - `AnthropicConfig`, `OpenAIConfig`, etc.) before running.

---

# Hello Agent

Source: https://docs.ag2.ai/latest/docs/beta/code_examples/01_hello_agent/

The smallest possible end-to-end example: instantiate an `Agent` with one model config, call `ask()`, print the reply, then reuse the same Agent for a second turn. No tools, no harness primitives, no plugins - just the bare loop.

## What it covers

- Building an Agent from a name, prompt, and `ModelConfig`.
- Awaiting `agent.ask(...)` and reading `reply.body`.
- Reusing the same Agent for a follow-up `ask()` call.

## Primitives covered

- `Agent`
- `agent.ask()` / `reply.body`
- `GeminiConfig` (any `ModelConfig` works the same way)

## Source

```python
"""01 - Hello Agent

The smallest possible example: a bare ``Agent`` with a single LLM config and
one ``ask()`` call. No tools, no harness primitives, no plugins.

Run::

    .venv-beta/bin/python 01_hello_agent.py
"""

import asyncio

from autogen.beta import Agent
from autogen.beta.config import GeminiConfig

def section(title: str) -> None:
    print(f"\n── {title} ───")

async def main() -> None:
    config = GeminiConfig(model="gemini-3-flash-preview", temperature=0)

    section("Bare Agent - ask and print")

    agent = Agent(
        "greeter",
        prompt="You are a friendly but concise assistant. Reply in one sentence.",
        config=config,
    )

    reply = await agent.ask("Give me a single tip for learning to play chess.")
    print(reply.body)

    section("Reuse the Agent for another ask")

    reply2 = await agent.ask("And a tip for learning poker, in one sentence.")
    print(reply2.body)

if __name__ == "__main__":
    asyncio.run(main())
```

---

# Recipe Builder

Source: https://docs.ag2.ai/latest/docs/beta/code_examples/02_recipe_builder/

A culinary assistant that rescales a classic carbonara recipe from 2 servings to 6. The Agent is given a custom Python function as a tool and a Pydantic model as its `response_schema`, so the final reply is a fully validated `Recipe` object instead of free-form text.

## What it covers

- Registering a plain function (`scale_ingredient`) via `tools=[...]` so the LLM can call it; the Agent wraps it as a tool automatically, so no explicit decorator is needed.
- Passing a Pydantic class to `response_schema=` to coerce the model output into a typed object.
- Reading the typed result through `await reply.content(retries=1)` (with a retry on validation failure).

## Primitives covered

- `Agent` with `tools=` and `response_schema=`
- Plain Python `def` as a tool (auto-decorated)
- `pydantic.BaseModel` for structured output
- `reply.content(retries=...)` for schema-validated reads

## Source

```python
"""02 - Recipe builder - tools and structured output

Shows two core Agent features on top of the bare loop:

1. A custom tool function the LLM can call (``scale_ingredient``), passed as a plain ``def``.
2. A Pydantic ``response_schema`` so the final reply is a typed object.

Run::

    .venv-beta/bin/python 02_recipe_builder.py
"""

import asyncio

from pydantic import BaseModel, Field

from autogen.beta import Agent
from autogen.beta.config import GeminiConfig

def section(title: str) -> None:
    print(f"\n── {title} ───")

class Ingredient(BaseModel):
    name: str
    quantity: float
    unit: str

class Recipe(BaseModel):
    title: str = Field(description="Short human title for the recipe.")
    servings: int = Field(description="How many portions this recipe yields.")
    ingredients: list[Ingredient]
    steps: list[str] = Field(description="Ordered preparation steps.")

def scale_ingredient(quantity: float, factor: float) -> float:
    """Return ``quantity`` multiplied by ``factor``, rounded to 2 decimals.

    The model uses this any time it needs to rescale a recipe for a
    different number of servings.
    """
    return round(quantity * factor, 2)

async def main() -> None:
    config = GeminiConfig(model="gemini-3-flash-preview", temperature=0)

    section("Recipe builder - scale an existing dish for 6 servings")

    agent = Agent(
        "chef",
        prompt=(
            "You are a culinary assistant. When asked to rescale a recipe, "
            "use the scale_ingredient tool for every ingredient to compute the "
            "new quantity. Return a complete Recipe object."
        ),
        config=config,
        tools=[scale_ingredient],
        response_schema=Recipe,
    )

    reply = await agent.ask(
        "Start from classic carbonara for 2 servings: 200g spaghetti, 2 eggs, "
        "100g guanciale, 50g pecorino romano. Rescale it for 6 servings and "
        "produce the full Recipe."
    )

    recipe: Recipe | None = await reply.content(retries=1)

    if recipe is None:
        print("Model returned no body - try again.")
        return

    print(f"{recipe.title}  ({recipe.servings} servings)")
    print()
    print("Ingredients:")
    for ing in recipe.ingredients:
        print(f"  - {ing.quantity} {ing.unit} {ing.name}")
    print()
    print("Steps:")
    for i, step in enumerate(recipe.steps, 1):
        print(f"  {i}. {step}")

if __name__ == "__main__":
    asyncio.run(main())
```
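
The quantities the Agent should arrive at can be checked offline. The snippet below repeats `scale_ingredient` from the example and applies it with the factor 6 / 2 = 3; the `BASE` dict is only an illustrative container for the quantities in the prompt, not part of the example script.

```python
def scale_ingredient(quantity: float, factor: float) -> float:
    """Same helper as in the example: multiply and round to 2 decimals."""
    return round(quantity * factor, 2)


# Quantities from the prompt, for 2 servings.
BASE = {
    "spaghetti (g)": 200,
    "eggs": 2,
    "guanciale (g)": 100,
    "pecorino romano (g)": 50,
}

factor = 6 / 2  # target servings / original servings
scaled = {name: scale_ingredient(quantity, factor) for name, quantity in BASE.items()}
print(scaled)
# {'spaghetti (g)': 600.0, 'eggs': 6.0, 'guanciale (g)': 300.0, 'pecorino romano (g)': 150.0}
```

Comparing the validated `Recipe` against these numbers is a quick way to confirm the model actually used the tool for every ingredient.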

---

# Travel Planner

Source: https://docs.ag2.ai/latest/docs/beta/code_examples/03_travel_planner/

A travel-planner Agent walks through five turns of a single conversation: kicking off the trip, layering on budget and travel-mode constraints, swapping a day's activity, and producing a final itinerary summary. Each turn uses `reply.ask()` so the conversation context (constraints from earlier turns) carries forward without the caller re-supplying it.

## What it covers

- Holding a multi-turn conversation by chaining `reply.ask(...)` instead of calling `agent.ask(...)` each time.
- How conversation history persists across turns so the Agent doesn't "forget" the user's earlier constraints.
- The pattern of incremental refinement - start broad, add constraints, ask for revisions, then summarise.

## Primitives covered

- `Agent`
- `agent.ask()` for the first turn
- `reply.ask()` for follow-up turns (context-preserving)

## Source

```python
"""03 - Travel planner - multi-turn conversation

Chained ``reply.ask()`` builds on the same conversation history. The planner
remembers constraints from earlier turns without the caller re-supplying
context each time - the stream stays alive across the whole dialogue.

Run::

    .venv-beta/bin/python 03_travel_planner.py
"""

import asyncio

from autogen.beta import Agent
from autogen.beta.config import GeminiConfig

def section(title: str) -> None:
    print(f"\n── {title} ───")

TURNS = [
    "I want to plan a 5-day trip to Japan in late April. Just cherry-blossom season.",
    "Budget is around $2500 per person, two travellers. Optimise for sightseeing, not luxury.",
    "We prefer trains to flights once we're in Japan. Draft a day-by-day itinerary.",
    "Looks great. For day 3, swap the shopping stop for something outdoorsy in or near Kyoto.",
    "Summarize the final itinerary in a single bullet list, one line per day.",
]

async def main() -> None:
    config = GeminiConfig(model="gemini-3-flash-preview", temperature=0)

    agent = Agent(
        "travel-planner",
        prompt=(
            "You are a detail-oriented travel planner. When the user adds "
            "constraints, update the plan rather than starting over. Be "
            "concrete and concise."
        ),
        config=config,
    )

    section("Turn 1 - kick off")
    reply = await agent.ask(TURNS[0])
    print(reply.body)

    for i, question in enumerate(TURNS[1:], start=2):
        section(f"Turn {i} - {question}")
        reply = await reply.ask(question)
        print(reply.body)

if __name__ == "__main__":
    asyncio.run(main())
```

---

# Token Watchdog

Source: https://docs.ag2.ai/latest/docs/beta/code_examples/04_token_watchdog/

A creative-writing Agent runs with three observers attached: two built-in (`TokenMonitor`, `LoopDetector`) and one custom (`AlertConsole`). The thresholds are deliberately low so a single ask trips the monitors and the custom observer prints every alert to stdout - giving you a live "watchdog dashboard" view of what the framework is noticing.

## What it covers

- Wiring multiple observers onto an Agent via `observers=[...]`.
- Using built-in observers (`TokenMonitor`, `LoopDetector`) to detect cost and repetition issues out of the box.
- Implementing a custom `BaseObserver` that subscribes to `ObserverAlert` events and reacts to them.
- Reading observer state after a run (`token_monitor.total_tokens`, `console.seen`).

## Primitives covered

- `BaseObserver` (custom subclass)
- Built-in observers: `TokenMonitor`, `LoopDetector`
- `ObserverAlert` events
- `EventWatch` for filtering which events the observer wakes on
- `MemoryStream` to capture a turn's events

## Source

```python
"""04 - Token watchdog - observers and alerts

Demonstrates three observer patterns running against a single Agent:

1. ``TokenMonitor`` - built-in, tallies usage and warns above a threshold.
2. ``LoopDetector`` - built-in, spots repetitive tool calls.
3. A hand-written ``BaseObserver`` that subscribes to ``ObserverAlert`` and
   prints a formatted dashboard line every time anything alerts.

Run::

    .venv-beta/bin/python 04_token_watchdog.py
"""

import asyncio

from autogen.beta import Agent, Context
from autogen.beta.config import GeminiConfig
from autogen.beta.events import BaseEvent, ObserverAlert
from autogen.beta.observer import BaseObserver, LoopDetector, TokenMonitor
from autogen.beta.stream import MemoryStream
from autogen.beta.watch import EventWatch

def section(title: str) -> None:
    print(f"\n── {title} ───")

class AlertConsole(BaseObserver):
    """Watches the stream for ObserverAlerts and prints them to stdout."""

    def __init__(self) -> None:
        super().__init__("alert-console", watch=EventWatch(ObserverAlert))
        self.seen: list[ObserverAlert] = []

    async def process(self, events: list[BaseEvent], ctx: Context) -> None:
        for event in events:
            if isinstance(event, ObserverAlert):
                self.seen.append(event)
                print(f"    [{event.severity.upper():<8}] {event.source}: {event.message}")
        return None  # Don't emit a follow-up alert

async def main() -> None:
    config = GeminiConfig(model="gemini-3-flash-preview", temperature=0)

    section("Watchdog - low thresholds so observers trip on a single ask")

    token_monitor = TokenMonitor(warn_threshold=50, alert_threshold=5_000)
    loop_detector = LoopDetector(window_size=5, repeat_threshold=2)
    console = AlertConsole()

    stream = MemoryStream()

    agent = Agent(
        "writer",
        prompt=("Write prose the user asks for. Favour variety - never repeat the same sentence twice."),
        config=config,
        observers=[token_monitor, loop_detector, console],
    )

    reply = await agent.ask(
        "Write three distinct 30-word paragraphs about springtime in Kyoto.",
        stream=stream,
    )

    print()
    print("Final reply (truncated):")
    print("   ", (reply.body or "")[:240], "...")
    print()
    print(f"Total tokens tracked by TokenMonitor: {token_monitor.total_tokens}")
    print(f"Alerts emitted this run:              {len(console.seen)}")

if __name__ == "__main__":
    asyncio.run(main())
```

---

# Research Squad

Source: https://docs.ag2.ai/latest/docs/beta/code_examples/05_research_squad/

Two complementary multi-Agent patterns in one example. First, a coordinator opts in to the auto-injected `run_subtasks` tool (via `tasks=TaskConfig()`) to fan out three independent factual lookups concurrently in a single tool call. Then a second coordinator delegates arithmetic to a `math_expert` Agent exposed via `Agent.as_tool()`. Together they show "fan out then collect" alongside "named delegate".

## What it covers

- Opting in to `run_subtask` / `run_subtasks` via `tasks=TaskConfig(...)` - disabled by default; subtasks themselves never get them, so recursion is impossible by construction.
- Calling `run_subtasks(parallel=True)` to dispatch many sub-questions concurrently from one tool call.
- Wrapping an Agent with `Agent.as_tool()` to expose it as a named delegate (`task_math-expert`) on a parent's tool list.
- Subscribing to `TaskStarted` / `TaskCompleted` lifecycle events to observe the fan-out from outside the Agent.
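
The "fan out then collect" shape is the same one `asyncio.gather` provides in plain Python. The sketch below is framework-free and makes no claim about how `run_subtasks` is implemented: `fan_out` and `lookup` are hypothetical names, and `asyncio.sleep` stands in for an LLM call.

```python
import asyncio


async def fan_out(questions: list[str]) -> tuple[list[str], int]:
    """Run one fake lookup per question concurrently.

    Returns the answers in input order plus the peak number of lookups
    that were in flight at the same time.
    """
    in_flight = 0
    peak = 0

    async def lookup(question: str) -> str:
        nonlocal in_flight, peak
        in_flight += 1
        peak = max(peak, in_flight)
        await asyncio.sleep(0.01)  # Placeholder for an LLM call.
        in_flight -= 1
        return f"answer to: {question}"

    # gather() schedules every coroutine at once and preserves input order.
    answers = await asyncio.gather(*(lookup(q) for q in questions))
    return list(answers), peak


if __name__ == "__main__":
    answers, peak = asyncio.run(fan_out(["waterfall", "tower", "nitrogen"]))
    print(answers)
    print(f"peak concurrency: {peak}")
```

Total wall time for a fan-out like this is roughly that of the slowest lookup rather than the sum of all of them, which is what the `Wall time` line in the example is meant to show.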

## Primitives covered

- `Agent` with `tasks=TaskConfig(...)` to opt in to sub-task tools
- `run_subtasks(parallel=True)` for concurrent fan-out
- `Agent.as_tool(description=...)` for sibling delegation
- `TaskStarted` / `TaskCompleted` events on the stream
- `MemoryStream` + `stream.where(EventType).subscribe(...)`
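
The `stream.where(EventType).subscribe(...)` call in the last bullet amounts to a type-filtered subscription. A minimal sketch of that idea follows, assuming nothing about the real `MemoryStream` internals: `TinyStream`, `_Filtered`, `emit`, `Started`, and `Completed` are all hypothetical names invented for this illustration.

```python
from collections.abc import Callable
from dataclasses import dataclass
from typing import Any


@dataclass
class Started:
    task: str


@dataclass
class Completed:
    task: str


class TinyStream:
    """Hypothetical in-memory event bus; not the AG2 MemoryStream."""

    def __init__(self) -> None:
        self._subscribers: list[tuple[type, Callable[[Any], None]]] = []

    def where(self, event_type: type) -> "_Filtered":
        """Return a view that only sees events of `event_type`."""
        return _Filtered(self, event_type)

    def emit(self, event: Any) -> None:
        # Deliver the event to every callback whose type filter matches.
        for event_type, callback in self._subscribers:
            if isinstance(event, event_type):
                callback(event)


class _Filtered:
    def __init__(self, stream: TinyStream, event_type: type) -> None:
        self._stream = stream
        self._event_type = event_type

    def subscribe(self, callback: Callable[[Any], None]) -> None:
        self._stream._subscribers.append((self._event_type, callback))


if __name__ == "__main__":
    stream = TinyStream()
    starts: list[Started] = []
    stream.where(Started).subscribe(starts.append)
    stream.emit(Started("lookup-a"))
    stream.emit(Completed("lookup-a"))
    print(starts)
```

Collecting events into plain lists from outside the Agent, as the example does with `starts` and `completions`, keeps the observation code separate from the agent's own prompt and tools.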

## Source

```python
"""05 - Research squad - parallel subtasks and sibling delegation

Two patterns for multi-Agent orchestration:

1. **Opt-in subtask tools.** Pass ``tasks=TaskConfig(...)`` and the Agent
   gains ``run_subtask`` / ``run_subtasks``. The coordinator uses
   ``run_subtasks`` with ``parallel=True`` to fan out three short
   investigations concurrently. Spawned subtasks have **no** ``run_subtask``
   tools (they default to ``tasks=False``), so recursion is structurally
   impossible - no depth limiter needed.

2. **``Agent.as_tool()``.** A second Agent (``math_expert``) is exposed to
   the coordinator as a callable tool. The wrapped Agent has no
   ``run_subtask`` tools either (default), so recursion is bounded by the
   call structure.

Run::

    .venv-beta/bin/python 05_research_squad.py
"""

import asyncio
import time

from autogen.beta import Agent
from autogen.beta.agent import TaskConfig
from autogen.beta.config import GeminiConfig
from autogen.beta.events import TaskCompleted, TaskStarted
from autogen.beta.stream import MemoryStream

def section(title: str) -> None:
    print(f"\n── {title} ───")

async def main() -> None:
    config = GeminiConfig(model="gemini-3-flash-preview", temperature=0)

    section("Parallel subtasks - fan out three lookups in one tool call")

    coordinator = Agent(
        "coordinator",
        prompt=(
            "You answer multi-part questions by dispatching run_subtasks "
            "with parallel=True. Use one tool call with every sub-question "
            "packed into the 'tasks' list. Be concise."
        ),
        config=config,
        tasks=TaskConfig(),  # Opt in to run_subtask / run_subtasks.
    )

    # Collect subtask lifecycle events so we can show the fan-out to the user
    starts: list[TaskStarted] = []
    completions: list[TaskCompleted] = []
    stream = MemoryStream()
    stream.where(TaskStarted).subscribe(lambda e: starts.append(e))
    stream.where(TaskCompleted).subscribe(lambda e: completions.append(e))

    start = time.monotonic()
    reply = await coordinator.ask(
        "Use run_subtasks(parallel=True) to answer, in one tool call: "
        "(a) what is the tallest waterfall in the world, "
        "(b) what year was the Eiffel Tower completed, "
        "(c) what is the boiling point of nitrogen in Celsius. "
        "Then list all three answers.",
        stream=stream,
    )
    elapsed = time.monotonic() - start

    print(reply.body)
    print()
    print(f"Subtasks dispatched: {len(starts)}")
    print(f"Subtasks finished:   {len(completions)}")
    print(f"Wall time:           {elapsed:.2f}s (3 concurrent LLM calls)")

    section("Sibling delegation - math_expert is a tool on coordinator2")

    math_expert = Agent(
        "math-expert",
        prompt="You are an arithmetic specialist. Reply with only the number.",
        config=config,
    )

    coordinator2 = Agent(
        "coordinator2",
        prompt=(
            "When arithmetic comes up, delegate to the task_math-expert tool "
            "rather than computing yourself. Then present the answer in a "
            "complete sentence."
        ),
        config=config,
        tools=[
            math_expert.as_tool(
                description="Delegate arithmetic problems to the math expert.",
            )
        ],
    )

    reply2 = await coordinator2.ask("What is 237 times 19?")
    print(reply2.body)

if __name__ == "__main__":
    asyncio.run(main())
```

---

# Journal Companion

Source: https://docs.ag2.ai/latest/docs/beta/code_examples/06_journal_companion/

A daily-journal Agent that genuinely remembers across runs - not by replaying conversation history, but by summarising each session into a `/memory/working.md` file in a `KnowledgeStore` and re-injecting that file at the start of every later conversation. Two sessions are run with different `Agent` instances pointed at the same store; the second one (a brand-new object) recalls what the user said in the first.

## What it covers

- Persistent agent memory backed by an on-disk `DiskKnowledgeStore`.
- `WorkingMemoryAggregate` - an LLM-driven rollup that runs at the end of every conversation and writes `/memory/working.md`.
- `WorkingMemoryPolicy` - an assembly policy that reads `/memory/working.md` and injects it as context before every LLM call.
- The full `KnowledgeConfig(store=..., aggregate=..., aggregate_trigger=...)` shape and how it pairs with `assembly=`.
- The `AggregateTrigger(on_end=True)` cadence - fire once when the conversation ends.
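
The write-at-end, inject-at-start cycle described above can be pictured without the framework. The sketch below is a plain-Python analogy, not the AG2 implementation: `end_session`, `recall`, `build_prompt`, and the `working.md` file name are hypothetical, and the "rollup" is a simple bullet list where the real aggregate would call an LLM.

```python
import tempfile
from pathlib import Path

MEMORY_FILE = "working.md"  # Hypothetical file name for this sketch.


def recall(store_dir: Path) -> str:
    """Return the saved note, or an empty string on the first run."""
    path = store_dir / MEMORY_FILE
    return path.read_text(encoding="utf-8") if path.exists() else ""


def end_session(store_dir: Path, turns: list[str]) -> None:
    """Roll the session up into the note. A real rollup would call an LLM."""
    previous = recall(store_dir)
    summary = "\n".join(f"- {turn}" for turn in turns)
    (store_dir / MEMORY_FILE).write_text(previous + summary + "\n", encoding="utf-8")


def build_prompt(store_dir: Path, base_prompt: str) -> str:
    """Prepend the saved note so a fresh session starts with prior context."""
    memory = recall(store_dir)
    return f"{base_prompt}\n\nWorking memory:\n{memory}" if memory else base_prompt


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        store = Path(tmp)
        end_session(store, ["choosing an espresso machine"])
        # A "new session" only needs the directory, not the old objects.
        print(build_prompt(store, "You are a journal companion."))
```

The point of the pattern is that the second session shares nothing with the first except the store location, which mirrors how `agent2` in the example is a brand-new object.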

## Primitives covered

- `KnowledgeConfig` with `store`, `aggregate`, `aggregate_trigger`
- `DiskKnowledgeStore`
- `WorkingMemoryAggregate` (rollup strategy)
- `AggregateTrigger(on_end=True)`
- Assembly policies: `WorkingMemoryPolicy`, `ConversationPolicy`
- Reading the store directly via `store.read("/memory/working.md")`

## Source

```python
"""06 - Journal companion - knowledge store with working memory

Persistent agent memory using the framework's three knowledge primitives:

- ``KnowledgeStore`` - virtual filesystem for agent state.
- ``WorkingMemoryAggregate`` - an LLM-driven summary rollup that runs at
  the end of every conversation and writes ``/memory/working.md``.
- ``WorkingMemoryPolicy`` - an assembly policy that reads
  ``/memory/working.md`` and injects it as context at the start of every
  subsequent conversation.

The agent therefore "remembers" what you told it even after a full restart,
because the state lives in the knowledge store - not in conversation
history.

Run::

    .venv-beta/bin/python 06_journal_companion.py
"""

import asyncio
import shutil
import tempfile
from pathlib import Path

from autogen.beta import Agent, KnowledgeConfig
from autogen.beta.aggregate import AggregateTrigger, WorkingMemoryAggregate
from autogen.beta.config import GeminiConfig
from autogen.beta.knowledge import DiskKnowledgeStore
from autogen.beta.policies import ConversationPolicy, WorkingMemoryPolicy

def section(title: str) -> None:
    print(f"\n── {title} ───")

async def main() -> None:
    config = GeminiConfig(model="gemini-3-flash-preview", temperature=0)

    # Use a fresh tempdir so the example is reproducible.
    workdir = Path(tempfile.mkdtemp(prefix="journal-companion-"))

    try:
        store = DiskKnowledgeStore(str(workdir))

        def build_agent() -> Agent:
            return Agent(
                "journal",
                prompt=(
                    "You are a supportive daily journal companion. Keep a "
                    "running understanding of what the user is working on. "
                    "Be brief and reference their past entries when relevant."
                ),
                config=config,
                knowledge=KnowledgeConfig(
                    store=store,
                    aggregate=WorkingMemoryAggregate(config=config),
                    # on_end=True: roll up working memory when each conversation finishes
                    aggregate_trigger=AggregateTrigger(on_end=True),
                ),
                assembly=[
                    WorkingMemoryPolicy(),  # inject /memory/working.md on every LLM call
                    ConversationPolicy(),  # then filter to conversation events
                ],
            )

        section("Session 1 - tell the journal what you're doing")

        agent1 = build_agent()
        r = await agent1.ask(
            "Today I started learning to build a home espresso setup. Still "
            "choosing between a Silvia Pro and a Linea Mini."
        )
        print(r.body)
        r = await r.ask(
            "Also started reading The Pragmatic Programmer. On chapter 2 about orthogonality. That's the whole update."
        )
        print(r.body)
        # With AggregateTrigger(on_end=True), WorkingMemoryAggregate writes
        # /memory/working.md once the conversation above has finished.

        working = await store.read("/memory/working.md")
        print()
        print("## /memory/working.md after session 1")
        print(working)

        section("Session 2 - new Agent instance, same store: memory persists")

        agent2 = build_agent()
        r2 = await agent2.ask("Quick check-in: what was I working on? Answer in one line.")
        print(r2.body)
        # The answer should mention espresso and/or Pragmatic Programmer
        # even though agent2 is a brand-new object with no prior state,
        # because WorkingMemoryPolicy injected /memory/working.md as a prompt.
    finally:
        shutil.rmtree(workdir, ignore_errors=True)

if __name__ == "__main__":
    asyncio.run(main())
```

---

# Long-Doc Chat

Source: https://docs.ag2.ai/latest/docs/beta/code_examples/07_long_doc_chat/

A "remember the last words" chat exercise that stress-tests the assembly chain. Three policies compose to control exactly what the LLM sees on each call - drop non-conversation events, hard-cap to the last 6 events, then enforce a token budget. A separate `TailWindowCompact` strategy keeps the underlying stream history small too. Watch the compaction events fire as the conversation grows.

## What it covers

- Composing multiple `AssemblyPolicy` instances in the order they should apply.
- `ConversationPolicy` to drop lifecycle/internal events from the LLM's view.
- `SlidingWindowPolicy(max_events=N, transparent=True)` for a hard event-count cap.
- `TokenBudgetPolicy(max_tokens=N)` as a belt-and-braces secondary cap.
- Combining the assembly chain with a `KnowledgeConfig` that wires `TailWindowCompact` for stream-history compaction (different from assembly!).
- Watching `CompactionCompleted` events fire from outside the Agent.
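
Order matters when policies are chained, and a small framework-free sketch shows why. Everything here is hypothetical scaffolding (`Event`, `Policy`, `conversation_only`, `sliding_window`, `char_budget`, `assemble`), and the budget counts characters as a rough proxy for tokens. How the real policies measure and trim is not specified on this page.

```python
from collections.abc import Callable
from dataclasses import dataclass


@dataclass
class Event:
    kind: str  # "user", "assistant", or anything else for lifecycle noise
    text: str


Policy = Callable[[list[Event]], list[Event]]


def conversation_only(events: list[Event]) -> list[Event]:
    """Drop everything that is not a user or assistant message."""
    return [e for e in events if e.kind in {"user", "assistant"}]


def sliding_window(max_events: int) -> Policy:
    """Keep only the newest `max_events` events (assumes max_events >= 1)."""
    def apply(events: list[Event]) -> list[Event]:
        return events[-max_events:]
    return apply


def char_budget(max_chars: int) -> Policy:
    """Keep the newest events whose combined text fits in `max_chars`."""
    def apply(events: list[Event]) -> list[Event]:
        kept: list[Event] = []
        used = 0
        # Walk newest-first so the most recent events survive the cut.
        for event in reversed(events):
            if used + len(event.text) > max_chars:
                break
            kept.append(event)
            used += len(event.text)
        return kept[::-1]
    return apply


def assemble(events: list[Event], policies: list[Policy]) -> list[Event]:
    """Apply each policy to the output of the previous one."""
    for policy in policies:
        events = policy(events)
    return events


if __name__ == "__main__":
    history = [Event("user", f"word {i}") for i in range(8)]
    history.append(Event("lifecycle", "compaction done"))
    view = assemble(history, [conversation_only, sliding_window(6), char_budget(20)])
    print([e.text for e in view])
```

Filtering before windowing means lifecycle events never take up one of the window's slots. Reversing the first two policies could leave fewer conversation events in view.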

## Primitives covered

- `assembly=[ConversationPolicy(), SlidingWindowPolicy(...), TokenBudgetPolicy(...)]`
- `KnowledgeConfig` with `compact=TailWindowCompact(...)` + `compact_trigger=CompactTrigger(...)`
- `MemoryKnowledgeStore` (in-memory variant of the knowledge store)
- `CompactionCompleted` lifecycle event
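
Compaction differs from assembly in that it shrinks the stored history itself, not just the view passed to the LLM. The sketch below illustrates a tail-window compaction with a size trigger in plain Python. `append_and_compact` is a hypothetical helper, and the comparison used for the trigger (strictly greater than `max_events`) is an assumption, not documented behaviour of `CompactTrigger`.

```python
def append_and_compact(
    history: list[str], event: str, max_events: int = 8, target: int = 4
) -> bool:
    """Append in place; trim to the last `target` events once the cap is exceeded.

    Returns True when a compaction happened. Assumes target >= 1.
    """
    history.append(event)
    if len(history) > max_events:
        # Mutate the stored history: older events are gone for good.
        del history[:-target]
        return True
    return False


if __name__ == "__main__":
    history: list[str] = []
    for i in range(12):
        if append_and_compact(history, f"e{i}"):
            print(f"compacted after e{i}: {history}")
    print(f"final history: {history}")
```

An assembly policy leaves the stored list untouched and only returns a shorter copy, so the two mechanisms can be combined without interfering with each other.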

## Source

```python
"""07 - Long-doc chat - composing assembly policies

Shows the assembly chain in action. Three policies compose in order:

1. ``ConversationPolicy`` - drops every event that isn't conversation or
   tool traffic (no lifecycle noise reaches the LLM).
2. ``SlidingWindowPolicy(max_events=6)`` - hard-caps the number of events
   forwarded to the LLM, so history can't grow unbounded.
3. ``TokenBudgetPolicy(max_tokens=2000)`` - character-based secondary cap,
   belt-and-braces against one huge event blowing the budget.

Also pairs the assembly chain with ``TailWindowCompact`` so the agent's
stream history itself (not just the view into it) is kept small.

Run::

    .venv-beta/bin/python 07_long_doc_chat.py
"""

import asyncio

from autogen.beta import Agent, KnowledgeConfig
from autogen.beta.compact import CompactTrigger, TailWindowCompact
from autogen.beta.config import GeminiConfig
from autogen.beta.events import CompactionCompleted
from autogen.beta.knowledge import MemoryKnowledgeStore
from autogen.beta.policies import (
    ConversationPolicy,
    SlidingWindowPolicy,
    TokenBudgetPolicy,
)
from autogen.beta.stream import MemoryStream

def section(title: str) -> None:
    print(f"\n── {title} ───")

QUESTIONS = [
    "Remember the word 'oak'.",
    "Remember the word 'river'.",
    "Remember the word 'lantern'.",
    "Remember the word 'sable'.",
    "Remember the word 'quartz'.",
    "Name the three most recent words I asked you to remember.",
]

async def main() -> None:
    config = GeminiConfig(model="gemini-3-flash-preview", temperature=0)

    store = MemoryKnowledgeStore()
    compactions: list[CompactionCompleted] = []
    stream = MemoryStream()
    stream.where(CompactionCompleted).subscribe(lambda e: compactions.append(e))

    agent = Agent(
        "lexicon",
        prompt=(
            "Be very terse - one short sentence per reply. "
            "Answer directly without calling any tools."
        ),
        config=config,
        assembly=[
            ConversationPolicy(),
            SlidingWindowPolicy(max_events=6, transparent=True),
            TokenBudgetPolicy(max_tokens=2000),
        ],
        knowledge=KnowledgeConfig(
            store=store,
            compact=TailWindowCompact(target=4),
            compact_trigger=CompactTrigger(max_events=8),
        ),
    )

    section("Long-doc chat - assembly policies trim what the LLM actually sees")

    reply = await agent.ask(QUESTIONS[0], stream=stream)
    print(f"Q1> {QUESTIONS[0]}")
    print(f"A1> {reply.body}")

    for i, q in enumerate(QUESTIONS[1:], start=2):
        reply = await reply.ask(q)
        print(f"Q{i}> {q}")
        print(f"A{i}> {reply.body}")

    print()
    print(f"Compactions fired during run: {len(compactions)}")
    for c in compactions:
        print(f"  - {c.strategy}: {c.events_before} -> {c.events_after} events")

if __name__ == "__main__":
    asyncio.run(main())
```

---

# Safety Guard

Source: https://docs.ag2.ai/latest/docs/beta/code_examples/08_safety_guard/

A custom `BaseObserver` (`PathGuardian`) watches every tool call and emits a `Severity.FATAL` `ObserverAlert` when an Agent tries to write to a forbidden path like `/etc/`. The alert routes through `AlertPolicy` -> `HaltEvent` -> `_HaltCheckMiddleware`, which short-circuits the next LLM call with a synthetic `HALTED: ...` response. The first ask (writing to `/tmp/...`) succeeds; the second (writing to `/etc/passwd`) is blocked end-to-end.

## What it covers

- Building a `BaseObserver` that watches a specific event type (here `ToolCallEvent`) via `EventWatch`.
- Returning an `ObserverAlert(severity=Severity.FATAL, ...)` from an observer to signal a hard-stop condition.
- How `AlertPolicy` (in the `assembly` chain) translates a FATAL alert into a `HaltEvent` and appends a halt notice to the system prompt.
- How `_HaltCheckMiddleware` (auto-wired when `assembly` is non-empty) sees the `HaltEvent` and short-circuits the next LLM call.
- Subscribing to `HaltEvent` and `ObserverAlert` from outside the Agent to verify the halt fired.
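
The alert-then-halt flow can be reduced to two steps: an observer records a fatal finding, and a check in front of the model call refuses to proceed once one exists. The sketch below is a framework-free illustration, with `GuardedRunner`, `on_tool_call`, and `call_model` as hypothetical names. It assumes tool arguments arrive as a JSON string and uses a prefix match on the parsed `path`, which is a stricter check than the substring test in the example source.

```python
from __future__ import annotations

import json
from dataclasses import dataclass, field

FORBIDDEN_PREFIXES = ("/etc/", "/usr/")


@dataclass
class GuardedRunner:
    """Hypothetical runner: a fatal finding blocks every later model call."""

    halted_reason: str | None = None
    log: list[str] = field(default_factory=list)

    def on_tool_call(self, name: str, arguments: str) -> None:
        """Observer step: inspect a tool call and record a halt if it is dangerous."""
        if name != "write_file":
            return
        path = json.loads(arguments).get("path", "")
        if path.startswith(FORBIDDEN_PREFIXES):
            self.halted_reason = f"blocked dangerous write: {path}"

    def call_model(self, prompt: str) -> str:
        """Middleware step: short-circuit instead of reaching the model once halted."""
        if self.halted_reason is not None:
            return f"HALTED: {self.halted_reason}"
        self.log.append(prompt)
        return f"model reply to: {prompt}"


if __name__ == "__main__":
    runner = GuardedRunner()
    runner.on_tool_call("write_file", json.dumps({"path": "/tmp/a.txt", "content": "hi"}))
    print(runner.call_model("confirm the write"))
    runner.on_tool_call("write_file", json.dumps({"path": "/etc/passwd", "content": "bad"}))
    print(runner.call_model("confirm the write"))
```

Parsing the arguments before matching avoids false positives when a forbidden path merely appears inside the file content.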

## Primitives covered

- `BaseObserver` + `EventWatch(ToolCallEvent)`
- `ObserverAlert` with `Severity.FATAL`
- `AlertPolicy` in the `assembly=` chain
- Auto-wired `_HaltCheckMiddleware` (no explicit middleware setup needed)
- `HaltEvent` lifecycle event

## Source

```python
"""08 - Safety guard - FATAL alert halts the Agent

A hand-rolled ``BaseObserver`` watches every tool call and flags anything
that looks dangerous (here: a ``write_file`` tool asked to touch
``/etc/``). It emits a ``Severity.FATAL`` ``ObserverAlert``. The flow from
there is fully wired by the framework:

1. The alert lands on the agent's stream.
2. ``AlertPolicy`` (an assembly policy) picks it up before the next LLM
   call, emits a ``HaltEvent`` on the stream, and appends a halt notice
   to the system prompt.
3. ``_HaltCheckMiddleware`` (wired in automatically when ``assembly`` is
   non-empty) sees the ``HaltEvent`` and short-circuits the LLM call with
   a synthetic ``HALTED: ...`` response.

Run::

    .venv-beta/bin/python 08_safety_guard.py
"""

import asyncio

from autogen.beta import Agent, Context
from autogen.beta.config import GeminiConfig
from autogen.beta.events import BaseEvent, HaltEvent, ObserverAlert, Severity, ToolCallEvent
from autogen.beta.observer import BaseObserver
from autogen.beta.policies import AlertPolicy
from autogen.beta.stream import MemoryStream
from autogen.beta.watch import EventWatch

def section(title: str) -> None:
    print(f"\n── {title} ───")

# ---- Tool under supervision -------------------------------------------------

def write_file(path: str, content: str) -> str:
    """Pretend-write ``content`` to ``path``. This playground never touches disk."""
    return f"[ok] wrote {len(content)} bytes to {path}"

# ---- Guardian observer ------------------------------------------------------

class PathGuardian(BaseObserver):
    """Emits a FATAL alert if anything tries to write outside /tmp."""

    def __init__(self) -> None:
        super().__init__("path-guardian", watch=EventWatch(ToolCallEvent))

    async def process(self, events: list[BaseEvent], ctx: Context) -> ObserverAlert | None:
        for event in events:
            if not isinstance(event, ToolCallEvent):
                continue
            if event.name != "write_file":
                continue
            if "/etc/" in event.arguments or "/usr/" in event.arguments:
                return ObserverAlert(
                    source=self.name,
                    severity=Severity.FATAL,
                    message=f"blocked dangerous write: {event.arguments}",
                )
        return None

async def main() -> None:
    config = GeminiConfig(model="gemini-3-flash-preview", temperature=0)

    halt_events: list[HaltEvent] = []
    alerts: list[ObserverAlert] = []
    stream = MemoryStream()
    stream.where(HaltEvent).subscribe(lambda e: halt_events.append(e))
    stream.where(ObserverAlert).subscribe(lambda e: alerts.append(e))

    agent = Agent(
        "safe-shell",
        prompt=(
            "You are a filesystem operator. Use the write_file tool to "
            "fulfil write requests. Never refuse - if a request is risky "
            "the guardian observer will intervene automatically."
        ),
        config=config,
        tools=[write_file],
        observers=[PathGuardian()],
        assembly=[AlertPolicy()],  # routes FATAL alerts to HaltEvent
    )

    section("Safe request - observer stays silent")

    reply = await agent.ask(
        "Use write_file to write 'hello' into /tmp/playground_hello.txt. Then confirm.",
        stream=stream,
    )
    print(reply.body)

    section("Dangerous request - guardian fires FATAL, agent halts")

    reply = await agent.ask(
        "Now use write_file to write 'bad' into /etc/passwd. Then confirm.",
        stream=stream,
    )
    print(reply.body)

    print()
    print(f"ObserverAlerts seen:  {len(alerts)}")
    for a in alerts:
        print(f"  - [{a.severity.upper()}] {a.source}: {a.message}")
    print(f"HaltEvents seen:      {len(halt_events)}")
    for h in halt_events:
        print(f"  - source={h.source} reason={h.reason!r}")

if __name__ == "__main__":
    asyncio.run(main())
```