Beta Middleware#
Middleware lets you intercept and customize how an AG2 Beta agent runs a turn. It's the right tool when you want to add cross-cutting behavior such as logging, retries, history trimming, request mutation, tool auditing, or guardrails without changing the agent, model client, or tools themselves.
At a high level, middleware can wrap three parts of the runtime:
- the full agent turn
- each LLM call
- each tool execution
This makes it a good fit for behavior that should apply consistently across many runs.
What is Middleware#
Middleware is an object that receives the current turn's initial event and Context, then participates in one or more lifecycle hooks.
Each middleware instance is created at the beginning of a turn and can keep per-turn state on self. That same instance can then observe or modify the turn, the LLM call, and tool execution as the run progresses.
In practice, you use middleware to:
- add observability such as logging, tracing, and timing
- enforce policies before a tool runs
- retry transient model failures
- trim conversation history before sending it to the model
- normalize tool inputs or outputs
- short-circuit or reshape a response
Middleware Hooks#
BaseMiddleware exposes three async hooks. You can implement just one of them or mix several in the same class.
on_turn()#
on_turn() wraps the whole agent turn. It receives the incoming event and the final ModelResponse.
Use on_turn() when you want to:
- measure total turn latency
- inspect or rewrite the initial request before anything else happens
- inspect or rewrite the final response before it is returned
- implement turn-level policies, approvals, or short-circuit behavior
Conceptually, this is the outermost hook around a single ask(...) call.
on_llm_call()#
on_llm_call() wraps the call to the configured model client. It receives the event history that will be sent to the LLM.
Use on_llm_call() when you want to:
- retry transient client failures
- log prompts and responses
- trim history before it reaches the model
- sanitize the context or the model response
- inject additional request-time instructions through event mutation
- implement caching or request deduplication around model calls
This is the hook used by built-in history and token limiting middleware.
on_tool_execution()#
on_tool_execution() wraps each tool invocation triggered during the turn. It receives the current ToolCall and can return a modified ToolResult.
Use on_tool_execution() when you want to:
- validate or rewrite tool arguments before execution
- log tool usage
- transform tool results before they go back into the event stream
- capture tool failures and replace them with safer fallback results
- enforce access control around specific tools
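The wrapping pattern behind a tool-facing hook can be sketched without the library. The example below is a minimal, self-contained illustration and not the AG2 API: ToolCall and ToolResult here are toy stand-ins that only borrow the names, and guard_tool_execution, call_next, run_tool, and BLOCKED_TOOLS are hypothetical. The real on_tool_execution() signature may differ.

```python
import asyncio
from dataclasses import dataclass, field
from typing import Awaitable, Callable


# Toy stand-ins for the runtime's tool types (assumed shapes, not the AG2 classes).
@dataclass
class ToolCall:
    name: str
    arguments: dict = field(default_factory=dict)


@dataclass
class ToolResult:
    name: str
    content: str


ToolExecutor = Callable[[ToolCall], Awaitable[ToolResult]]

# Hypothetical policy: tools that must never run.
BLOCKED_TOOLS = {"delete_file"}


async def guard_tool_execution(call_next: ToolExecutor, call: ToolCall) -> ToolResult:
    """Block disallowed tools and replace failures with a fallback result."""
    if call.name in BLOCKED_TOOLS:
        # Short-circuit: the real tool is never invoked.
        return ToolResult(call.name, "blocked by policy")
    try:
        return await call_next(call)
    except Exception as exc:
        # Return a safer result instead of letting the failure propagate.
        return ToolResult(call.name, f"tool failed: {exc}")


async def run_tool(call: ToolCall) -> ToolResult:
    """Toy tool executor used only for this demonstration."""
    if call.name == "divide":
        value = call.arguments["a"] / call.arguments["b"]
        return ToolResult(call.name, str(value))
    return ToolResult(call.name, "ok")
```

Calling asyncio.run(guard_tool_execution(run_tool, ToolCall("delete_file"))) returns the policy result without ever reaching run_tool, which is the access-control case from the list above.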
Registering Middleware#
On an Agent#
To make middleware apply to every turn for an agent, pass it through the middleware argument when constructing the agent.
Use agent-level registration for behavior that should always be present, such as logging, tracing, or default retry policy.
On a Single Call#
You can also add middleware just for a specific turn. This is useful when you want temporary behavior without changing the agent's defaults.
Both Agent.ask(...) and AgentReply.ask(...) accept a middleware argument.
Call-level middleware is appended after the middleware list defined on the agent.
Middleware Ordering#
Middleware runs in the order you register it. If you register [A, B, C], the middleware enters in the order A -> B -> C and unwinds in the reverse order, C -> B -> A.
This matters when you combine behaviors such as logging, mutation, and retries.
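The enter and unwind order can be reproduced with plain async functions. This is a minimal sketch of the general wrapping ("onion") pattern, not AG2's internal implementation. The names make_layer, compose, core, and run_demo are hypothetical.

```python
import asyncio
from typing import Awaitable, Callable

Handler = Callable[[str], Awaitable[str]]
Layer = Callable[[Handler], Handler]


def make_layer(name: str, log: list[str]) -> Layer:
    """Build a wrapper that records when it is entered and exited."""

    def wrap(call_next: Handler) -> Handler:
        async def handler(request: str) -> str:
            log.append(f"enter {name}")
            response = await call_next(request)
            log.append(f"exit {name}")
            return response

        return handler

    return wrap


def compose(layers: list[Layer], core: Handler) -> Handler:
    """Wrap core so that the first registered layer is the outermost one."""
    handler = core
    for layer in reversed(layers):
        handler = layer(handler)
    return handler


async def run_demo() -> list[str]:
    """Run one request through layers A, B, C and return the recorded order."""
    log: list[str] = []

    async def core(request: str) -> str:
        log.append("core")
        return request.upper()

    pipeline = compose([make_layer(n, log) for n in ("A", "B", "C")], core)
    await pipeline("hello")
    return log
```

Running asyncio.run(run_demo()) records the A, B, C entries, then the core call, then the exits for C, B, A. In this pattern a retry layer registered before a logging layer re-enters the logger on every attempt, while the opposite order logs only once around all attempts.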
Writing Your Own Middleware#
To create custom middleware, subclass BaseMiddleware and implement the hooks you need.
If your middleware does not need extra constructor arguments, you can register the class directly. If it does need configuration, wrap it with Middleware(...) when registering it.
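One way to see why configuration is bound at registration time: the runtime creates a fresh instance for every turn, so what you register is a class or a factory and never a live instance. The sketch below illustrates that idea only. CallCounter, Factory, and start_turn are hypothetical stand-ins, and the constructor signature (event, context, plus options) is an assumption rather than the documented BaseMiddleware or Middleware(...) interface.

```python
from typing import Any


class CallCounter:
    """Toy middleware that counts calls seen during one turn."""

    def __init__(self, event: Any, context: Any, label: str = "default") -> None:
        self.label = label
        self.calls = 0  # per-turn state lives on the instance

    def record(self) -> None:
        self.calls += 1


class Factory:
    """Hypothetical wrapper that binds constructor options ahead of time."""

    def __init__(self, cls: type, **options: Any) -> None:
        self.cls = cls
        self.options = options

    def __call__(self, event: Any, context: Any) -> Any:
        # A new instance per turn, so state never leaks between turns.
        return self.cls(event, context, **self.options)


def start_turn(registered: list, event: Any, context: Any) -> list:
    """Instantiate every registered class or factory for one turn."""
    return [entry(event, context) for entry in registered]
```

Registering [CallCounter, Factory(CallCounter, label="audit")] and calling start_turn twice yields two independent sets of instances, so a count recorded in the first turn is not visible in the second.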
Guidelines for Custom Middleware#
- Keep hook behavior focused. Middleware that does one job well is easier to reason about than one that handles, for example, logging, retries, mutation, and policy checks together.
- Prefer on_turn() for whole-run behavior, on_llm_call() for model-facing behavior, and on_tool_execution() for tool-facing behavior.
- Be deliberate when mutating event, events, or tool results. Middleware that executes later and the rest of the runtime will observe those changes.
- Register zero-config middleware classes directly, and use Middleware(YourMiddleware, ...) when the constructor needs additional options.
Built-In Middleware#
AG2 Beta currently includes four built-in middleware classes in autogen.beta.middleware:
LoggingMiddleware#
Logs the lifecycle of a turn, including:
- when a turn starts and finishes
- each LLM call and its response time
- each tool execution and its result
Use it for quick debugging or application-level observability.
RetryMiddleware#
Retries failed LLM calls up to max_retries times. By default it retries any Exception, but you can narrow that with retry_on=....
Use it for transient failures such as provider timeouts or flaky network issues.
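The retry behavior can be approximated in a few lines of plain asyncio. This is a sketch of the general technique, not RetryMiddleware's source. The function call_with_retries and the FlakyClient class are hypothetical, the parameter names max_retries and retry_on are borrowed from the description above, and the sketch assumes the first attempt does not count as a retry. It also retries immediately, with no delay or backoff.

```python
import asyncio
from typing import Awaitable, Callable, TypeVar

T = TypeVar("T")


async def call_with_retries(
    call: Callable[[], Awaitable[T]],
    max_retries: int = 3,
    retry_on: tuple[type[BaseException], ...] = (Exception,),
) -> T:
    """Run call(); on a matching exception, retry up to max_retries more times."""
    attempt = 0
    while True:
        try:
            return await call()
        except retry_on:
            if attempt >= max_retries:
                raise  # retries exhausted: surface the last error
            attempt += 1


class FlakyClient:
    """Toy client that fails a fixed number of times before succeeding."""

    def __init__(self, failures: int) -> None:
        self.failures = failures
        self.attempts = 0

    async def complete(self) -> str:
        self.attempts += 1
        if self.attempts <= self.failures:
            raise TimeoutError("simulated provider timeout")
        return "ok"
```

Narrowing retry_on, for example to (TimeoutError,), keeps programming errors such as a ValueError from being retried pointlessly.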
HistoryLimiter#
Trims the event history to a maximum number of events before the model call. It preserves the first ModelRequest when possible and avoids leaving leading orphaned tool results in the trimmed history.
Use it when you want a simple, deterministic cap on context length by event count.
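The trimming rule described above can be sketched as follows. This is an illustrative approximation, not HistoryLimiter's actual code. The Event dataclass, its kind labels, and limit_history are hypothetical, and real AG2 events are richer objects.

```python
from dataclasses import dataclass


@dataclass
class Event:
    kind: str  # "request", "response", or "tool_result" (hypothetical labels)
    text: str


def limit_history(events: list[Event], max_events: int) -> list[Event]:
    """Keep at most max_events events: the first request plus the newest events."""
    if max_events < 1:
        raise ValueError("max_events must be at least 1")
    if len(events) <= max_events:
        return list(events)

    # Preserve the first request when there is room for it plus one newer event.
    keep_first = events[0].kind == "request" and max_events > 1
    head = [events[0]] if keep_first else []
    tail = events[-(max_events - len(head)):]

    # A leading tool result has lost the call that produced it, so drop it.
    while tail and tail[0].kind == "tool_result":
        tail = tail[1:]
    return head + tail
```

Dropping an orphaned tool result means the output can be shorter than max_events, which is why the cap is a maximum and not an exact length.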
TokenLimiter#
Trims the event history to fit within an approximate token budget before the model call. It uses a character-based estimate controlled by chars_per_token.
Use it when you need lightweight context budgeting without depending on a model-specific tokenizer.
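A character-based budget can be sketched like this. It is a minimal illustration of the estimate, not TokenLimiter's implementation. The functions estimate_tokens and limit_by_tokens are hypothetical, the history is simplified to plain strings, and the default of 4 characters per token is an assumption, not a documented default.

```python
import math


def estimate_tokens(text: str, chars_per_token: float = 4.0) -> int:
    """Approximate the token count from character length, rounding up."""
    return math.ceil(len(text) / chars_per_token)


def limit_by_tokens(
    messages: list[str],
    max_tokens: int,
    chars_per_token: float = 4.0,
) -> list[str]:
    """Keep the newest messages whose estimated total fits within max_tokens."""
    kept: list[str] = []
    budget = max_tokens
    # Walk from newest to oldest and stop at the first message that does not fit.
    for message in reversed(messages):
        cost = estimate_tokens(message, chars_per_token)
        if cost > budget:
            break
        kept.append(message)
        budget -= cost
    kept.reverse()  # restore chronological order
    return kept
```

Because the estimate ignores the model's real tokenizer, a conservative chars_per_token (a smaller value) leaves more headroom for text that tokenizes poorly, such as code or non-English content.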
Choosing the Right Hook#
If you are unsure where a behavior belongs, use this rule of thumb:
- Use on_turn() when the behavior is about the entire request/response lifecycle.
- Use on_llm_call() when the behavior is about what goes into or comes out of the model.
- Use on_tool_execution() when the behavior is about tool safety, auditing, or result shaping.
For related runtime customization patterns, see Tools, Prompt Management, and Events Streaming.