Assembly
Assembly is the step that shapes what the LLM actually sees on each turn. An AssemblerMiddleware runs an ordered chain of AssemblyPolicy instances, each one transforming (prompts, events) before they reach the model.
Use it to inject persistent context (working memory, past conversations, observer alerts) and to cap the history footprint (sliding window, token budget) — without touching the event stream itself.
Why assembly#
Everything that happens during an agent's run — model requests/responses, tool calls, observer alerts, lifecycle events — lands on the Stream. But not all of that is useful to send to the next LLM call, and some useful context (e.g. a summary of a prior conversation) lives outside the stream entirely.
Assembly is the seam where you:
- Inject information from the KnowledgeStore or from Observer alerts into the prompt.
- Filter the event list down to what the model should see.
- Reduce the history to fit a window or token budget.
Each of those jobs is an AssemblyPolicy. The AssemblerMiddleware chains them.
AssemblyPolicy protocol#
Every policy implements the same shape:
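The exact signature is not reproduced on this page; a minimal sketch of the shape, with parameter names assumed rather than taken from the library:

```python
from typing import Protocol, Sequence, runtime_checkable


@runtime_checkable
class AssemblyPolicy(Protocol):
    """Sketch of the protocol described on this page: a name plus an
    async apply() that returns modified copies of (prompts, events)."""

    name: str

    async def apply(
        self, prompts: Sequence[str], events: Sequence[object]
    ) -> tuple[Sequence[str], Sequence[object]]:
        ...
```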
A policy receives the current prompts and events, returns modified copies. Policies compose left-to-right: each one sees the output of the previous. They must be pure — side-effect-free, idempotent where possible, and they must not emit events onto the stream (with one exception: AlertPolicy emits HaltEvent on FATAL, documented below).
AssemblerMiddleware#
Wire the chain onto an Agent by constructing an AssemblerMiddleware with your policy list:
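For example (constructor and keyword names here are assumptions for illustration, not a verified API):

```python
from autogen.beta import (
    Agent,
    AssemblerMiddleware,
    ConversationPolicy,
    SlidingWindowPolicy,
)

# Hypothetical wiring: the policy-list and middleware arguments are
# assumptions based on the description on this page.
assembler = AssemblerMiddleware(policies=[
    ConversationPolicy(),
    SlidingWindowPolicy(max_events=50),
])
agent = Agent(middleware=[assembler])
```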
AssemblerMiddleware sits at the outermost position in the middleware chain, before any user-provided middleware or the LLM client itself. Inside each turn it:
- Builds the prompts/events pair from the current context.
- Runs every policy in order, piping the output of each into the next.
- Temporarily swaps context.prompt for the assembled version while the LLM call runs.
- Restores the original prompt afterward.
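The pipe-through step can be sketched in isolation, using a toy policy (the chaining logic below is inferred from this description, not lifted from the library):

```python
import asyncio


class NotePolicy:
    """Toy injection policy: appends a note to the prompts."""

    name = "note"

    async def apply(self, prompts, events):
        # Pure: return new copies, leave the inputs untouched.
        return prompts + ["[note] injected context"], events


async def assemble(policies, prompts, events):
    # Run every policy in order, piping each output into the next.
    for policy in policies:
        prompts, events = await policy.apply(prompts, events)
    return prompts, events


prompts, events = asyncio.run(assemble([NotePolicy()], ["system"], [{"e": 1}]))
```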
Ordering matters#
Assembly policies split into two kinds:
| Kind | Purpose | Examples |
|---|---|---|
| Injection | Add context to prompts | AlertPolicy, WorkingMemoryPolicy, EpisodicMemoryPolicy |
| Reduction | Trim events | SlidingWindowPolicy, TokenBudgetPolicy, ConversationPolicy |
The rule: injection before reduction. If a reducer runs first, the injections it should have included in its budget don't exist yet.
AssemblerMiddleware.validate_order() catches known bad orderings and returns a list of warnings:
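The warning format is not shown here; a sketch of the kind of check it might perform, with the injection/reduction classification taken from the table above:

```python
# Classification from the table on this page.
INJECTION = {"AlertPolicy", "WorkingMemoryPolicy", "EpisodicMemoryPolicy"}
REDUCTION = {"SlidingWindowPolicy", "TokenBudgetPolicy", "ConversationPolicy"}


def validate_order(policy_names: list[str]) -> list[str]:
    """Warn for each injection policy that runs after a reduction policy."""
    warnings = []
    first_reducer = None
    for name in policy_names:
        if name in REDUCTION and first_reducer is None:
            first_reducer = name
        elif name in INJECTION and first_reducer is not None:
            warnings.append(
                f"{name} runs after {first_reducer}; its injected context "
                "was not counted by the reducer."
            )
    return warnings
```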
Built-in policies#
All six built-ins are importable from autogen.beta.
ConversationPolicy#
Keeps only conversation and tool events (ModelRequest, ModelResponse, ToolCallEvent, ToolResultEvent, ToolResultsEvent, ToolErrorEvent, plus CompactionSummary). Drops alerts, lifecycle events, observer output — anything the LLM does not need to see.
Takes no arguments. Effectively an allowlist — add a new event type to the stream and it is filtered out by default.
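The allowlist behavior, sketched with the event names listed above (the real policy presumably matches on types, not names; this is an illustration):

```python
# Allowlist taken from the event names on this page.
ALLOWED = {
    "ModelRequest", "ModelResponse", "ToolCallEvent", "ToolResultEvent",
    "ToolResultsEvent", "ToolErrorEvent", "CompactionSummary",
}


def keep_conversation(events):
    """Keep only events on the allowlist; drop everything else."""
    return [e for e in events if type(e).__name__ in ALLOWED]
```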
SlidingWindowPolicy#
Keeps only the last max_events events. Skips leading orphaned ToolResultsEvent entries so the window never starts on an unmatched tool result.
Set transparent=True to append a prompt note like "[sliding_window] Showing last 50 of 123 events." — useful while tuning.
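The orphan-skipping behavior can be sketched like this (logic inferred from the description above):

```python
def sliding_window(events, max_events):
    """Keep the last max_events, then drop leading orphaned tool results."""
    window = events[-max_events:]
    # An unmatched ToolResultsEvent at the head means its tool call fell
    # outside the window; skip it so the window never starts on a result.
    while window and type(window[0]).__name__ == "ToolResultsEvent":
        window = window[1:]
    return window
```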
TokenBudgetPolicy#
Keeps the newest events that fit in an estimated token budget. Estimation is len(str(event)) / chars_per_token — cheap, not perfectly accurate. Use it as a safety net, not an exact meter.
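A sketch of that estimation loop (the default chars_per_token value here is an assumption):

```python
def token_budget(events, max_tokens, chars_per_token=4):
    """Keep the newest events whose estimated token cost fits the budget."""
    kept, used = [], 0
    for event in reversed(events):
        # Cheap estimate from the page: len(str(event)) / chars_per_token.
        cost = len(str(event)) / chars_per_token
        if used + cost > max_tokens:
            break
        kept.append(event)
        used += cost
    kept.reverse()  # restore chronological order
    return kept
```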
AlertPolicy#
Delivers ObserverAlerts to the model. Each new alert is formatted into the prompt once (deduplicated on (source, severity, message)), and FATAL alerts additionally emit a HaltEvent onto the stream so the surrounding loop can short-circuit.
Takes no arguments. Dedup state lives on the instance — give each Agent its own AlertPolicy.
Note
AlertPolicy is what bridges the Observer system into the LLM. Without it, ObserverAlert events sit on the stream but never reach the model. Place it after other injection policies and before reduction policies.
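The per-instance dedup can be sketched in isolation (the prompt formatting below is illustrative, only the (source, severity, message) key comes from the description above):

```python
class AlertDeduper:
    """Sketch of AlertPolicy's dedup: each (source, severity, message)
    triple is formatted into the prompt at most once per instance."""

    def __init__(self):
        self._seen = set()

    def new_alerts(self, alerts):
        fresh = []
        for source, severity, message in alerts:
            key = (source, severity, message)
            if key not in self._seen:
                self._seen.add(key)
                fresh.append(f"[{severity}] {source}: {message}")
        return fresh
```

Because the state lives on `self._seen`, sharing one instance across agents would suppress alerts that a second agent has never seen, hence the one-policy-per-Agent advice.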
WorkingMemoryPolicy#
Reads /memory/working.md from the KnowledgeStore and injects it into the prompt. Working memory is the actor's persistent state — written between conversations by an aggregation strategy and read on every turn.
The policy looks up the store by type (context.dependencies.get(KnowledgeStore)) — if no store is registered, it's a no-op.
EpisodicMemoryPolicy#
Reads the most recent summaries under /memory/conversations/ and injects them. The companion reader for ConversationSummaryAggregate, which writes timestamped summary files to that path after each conversation.
Also requires a KnowledgeStore in context.dependencies; otherwise it is a no-op.
A realistic chain#
Typical production ordering — injections first, then AlertPolicy, then reduce:
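As an illustrative fragment (constructor arguments are assumptions, not a verified API; the ordering follows the rule above):

```python
from autogen.beta import (
    AlertPolicy,
    ConversationPolicy,
    EpisodicMemoryPolicy,
    SlidingWindowPolicy,
    TokenBudgetPolicy,
    WorkingMemoryPolicy,
)

# Injection first, AlertPolicy last among the injectors, then reduction.
policies = [
    WorkingMemoryPolicy(),         # inject persistent actor state
    EpisodicMemoryPolicy(),        # inject past-conversation summaries
    AlertPolicy(),                 # inject observer alerts
    ConversationPolicy(),          # filter to conversation/tool events
    SlidingWindowPolicy(max_events=50),
    TokenBudgetPolicy(max_tokens=8000),  # safety net, not an exact meter
]
```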
Writing a custom policy#
Any object with a name and an async apply(...) method satisfies the protocol. Use it for domain-specific injection (project docs, RAG hits, on-call runbooks) or custom filtering:
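A sketch of such a policy (the apply signature is assumed from the protocol description above; RunbookPolicy is a hypothetical name):

```python
import asyncio


class RunbookPolicy:
    """Hypothetical custom policy: injects an on-call runbook snippet."""

    name = "runbook"

    def __init__(self, runbook_text: str):
        self._runbook = runbook_text

    async def apply(self, prompts, events):
        # Pure: return new copies, never mutate inputs or emit events.
        return list(prompts) + [f"[runbook]\n{self._runbook}"], list(events)


prompts, events = asyncio.run(
    RunbookPolicy("Restart the ingest worker first.").apply(["system"], [])
)
```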
Drop it into the policy list alongside the built-ins.
Tip
Custom policies are a better fit than Middleware when you only need to shape the prompt or filter events — not to wrap the LLM call itself. Middleware is for retry, timeout, logging, rate limiting; policies are for context assembly.