Compaction
Compaction reduces a stream's event history to respect runtime constraints — event count or token budget. It is the constraint-respecting counterpart to Aggregation.
Compaction removes. Aggregation creates. They are separate concerns.
When to use it
Long-running conversations accumulate events faster than the model's context window can absorb. Use compaction to cap the size of history that flows into the next LLM call.
| Symptom | Use |
|---|---|
| History getting close to provider token limit | TailWindowCompact or SummarizeCompact |
| Need to keep recent events and forget old ones cheaply | TailWindowCompact |
| Want a short summary of old events to preserve context | SummarizeCompact |
CompactStrategy protocol
Every strategy implements the same shape: an async compact(events, ctx, store) method.
It returns a new event list that replaces the current history. Strategies must preserve the causal ordering of retained events; reshuffling is not allowed.
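As a rough illustration of that shape, the sketch below expresses it as a `typing.Protocol`. The class names `CompactStrategyLike` and `KeepEverything` are hypothetical, and the type annotations are assumptions; only the `compact(events, ctx, store)` method name and argument order come from this page.

```python
import asyncio
from typing import Any, Protocol, Sequence, runtime_checkable


@runtime_checkable
class CompactStrategyLike(Protocol):
    """Hypothetical stand-in for the CompactStrategy protocol."""

    async def compact(self, events: Sequence[Any], ctx: Any, store: Any) -> list[Any]:
        """Return the event list that replaces the current history."""
        ...


class KeepEverything:
    """Trivial strategy: retains every event in its original order."""

    async def compact(self, events: Sequence[Any], ctx: Any, store: Any) -> list[Any]:
        return list(events)


strategy = KeepEverything()
# runtime_checkable only verifies that a `compact` attribute exists.
assert isinstance(strategy, CompactStrategyLike)
result = asyncio.run(strategy.compact(["a", "b", "c"], None, None))
```

Because the check is structural, a strategy does not need to inherit from anything; it only needs a matching `compact` method.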
CompactTrigger
A dataclass describing when compaction should fire. Any configured threshold that is exceeded triggers compaction.
Leaving a field at 0 disables that threshold. CompactTrigger() alone does nothing — you must opt into at least one condition.
CompactTrigger is a plain data object — it records when you want compaction to fire, but does not fire it. Strategies are invoked explicitly via await strategy.compact(...).
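The "any exceeded threshold fires, 0 disables" rule can be sketched with a stand-in dataclass. The field names `max_events` and `max_tokens` and the helper `should_compact` are assumptions made for illustration; the real CompactTrigger fields may be named differently.

```python
from dataclasses import dataclass


@dataclass
class TriggerSketch:
    """Hypothetical stand-in; the real CompactTrigger field names may differ."""

    max_events: int = 0  # assumed name; 0 disables the event-count threshold
    max_tokens: int = 0  # assumed name; 0 disables the token threshold


def should_compact(trigger: TriggerSketch, event_count: int, est_tokens: float) -> bool:
    """True when any enabled threshold is exceeded."""
    over_events = trigger.max_events > 0 and event_count > trigger.max_events
    over_tokens = trigger.max_tokens > 0 and est_tokens > trigger.max_tokens
    return over_events or over_tokens


# A default trigger has every threshold disabled, so it never fires.
never = should_compact(TriggerSketch(), event_count=10_000, est_tokens=1e9)
by_count = should_compact(TriggerSketch(max_events=50), event_count=51, est_tokens=0)
by_tokens = should_compact(TriggerSketch(max_tokens=1000), event_count=3, est_tokens=1500)
```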
Built-in strategies
Both built-ins are importable from autogen.beta.compact.
TailWindowCompact
Keeps the last N events, drops the rest. Zero LLM cost. Suitable when old context has diminishing value and recency is what matters.
Passing a KnowledgeStore is optional. If provided, dropped events are persisted to /log/ as a numbered segment — see the KnowledgeStore docs — so they can be replayed later via EventLogWriter.load(). If omitted, dropped events are discarded.
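The tail-window idea itself is small enough to sketch in a few lines. `TailWindowSketch` below is a stand-in written for this page, not the library class: the `keep_last` parameter name is an assumption, and an in-memory `dropped` list stands in for persisting to a KnowledgeStore.

```python
import asyncio
from typing import Any, Sequence


class TailWindowSketch:
    """Stand-in illustrating the tail-window idea; not the library class."""

    def __init__(self, keep_last: int) -> None:
        if keep_last < 0:
            raise ValueError("keep_last must be non-negative")
        self.keep_last = keep_last
        self.dropped: list[Any] = []  # stands in for persisting to a store

    async def compact(self, events: Sequence[Any], ctx: Any = None, store: Any = None) -> list[Any]:
        # Compute an explicit cut index; slicing with events[-0:] would keep everything.
        cut = max(len(events) - self.keep_last, 0)
        self.dropped = list(events[:cut])
        return list(events[cut:])


events = [f"e{i}" for i in range(6)]
strategy = TailWindowSketch(keep_last=2)
retained = asyncio.run(strategy.compact(events))
```

Slicing keeps the retained events in their original order, which satisfies the causal-ordering requirement without extra work.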
SummarizeCompact
Summarizes the dropped portion via one LLM call and inserts a CompactionSummary event at the head of the retained history. Use it when you want to keep some sense of the old conversation instead of just forgetting it.
The summarization model is independent from the agent's main model — pick a smaller / cheaper one. Token usage is recorded on the strategy instance as strategy.last_usage.
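A minimal sketch of the summarize-then-prepend flow, assuming a plain async callable in place of the LLM call. `SummarizeSketch`, `SummarySketch`, `fake_summarizer`, and the `keep_last` parameter are all hypothetical names; the real SummarizeCompact constructor and CompactionSummary fields are not shown on this page.

```python
import asyncio
from dataclasses import dataclass
from typing import Any, Awaitable, Callable, Sequence


@dataclass
class SummarySketch:
    """Stand-in for CompactionSummary; the real event's fields may differ."""

    text: str


class SummarizeSketch:
    def __init__(self, keep_last: int, summarize: Callable[[Sequence[Any]], Awaitable[str]]) -> None:
        self.keep_last = keep_last
        self.summarize = summarize  # stands in for the single LLM call

    async def compact(self, events: Sequence[Any], ctx: Any = None, store: Any = None) -> list[Any]:
        cut = max(len(events) - self.keep_last, 0)
        dropped, retained = list(events[:cut]), list(events[cut:])
        if not dropped:
            return retained  # nothing to summarize, so no model call
        summary = SummarySketch(text=await self.summarize(dropped))
        return [summary, *retained]  # summary sits at the head of history


async def fake_summarizer(dropped: Sequence[Any]) -> str:
    return f"{len(dropped)} earlier events omitted"


history = ["e0", "e1", "e2", "e3", "e4"]
compacted = asyncio.run(SummarizeSketch(2, fake_summarizer).compact(history))
```

Skipping the model call when nothing is dropped is a design choice of this sketch; it avoids paying for a summary of an empty span.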
CompactionSummary
The synthetic event inserted by SummarizeCompact at the head of history.
CompactionSummary is on the allowlist of ConversationPolicy, so it passes through the assembly chain and reaches the LLM as context — without requiring special handling elsewhere.
Wiring onto an Agent
Pass the strategy + trigger through KnowledgeConfig. The Agent wires a _CompactionMiddleware that fires the strategy automatically after each turn when the trigger threshold is crossed.
Every compaction attempt draws on a triple of events on the agent's stream: CompactionStarted first, then either CompactionCompleted or CompactionFailed.
| Event | When | Use it to |
|---|---|---|
| CompactionStarted | Just before compact() runs | Mark the start of work; carries strategy / event_count |
| CompactionCompleted | compact() returned and history was replaced | Read events_before / events_after / usage |
| CompactionFailed | compact() raised | Inspect error_type + error; the history is left untouched and the agent turn is not interrupted |
The failure path is the one that matters: the strategy exception is also logged via the module logger, but the stream event is the durable signal — subscribe to CompactionFailed if you want failed compactions to surface in your application's UI or alerting. (Aggregation emits the symmetric AggregationStarted / AggregationCompleted / AggregationFailed triple — see Aggregation › Wiring onto an Agent.)
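The control flow behind these events might look roughly like the sketch below. It is not the library's _CompactionMiddleware: `run_compaction` is a hypothetical helper, and plain tuples appended to a list stand in for stream events. It only illustrates the contract described above, where a failure is logged, reported, and leaves history untouched.

```python
import asyncio
import logging
from typing import Any

logger = logging.getLogger(__name__)


async def run_compaction(strategy: Any, history: list[Any], emitted: list[tuple]) -> list[Any]:
    """Hypothetical helper: returns the history to use after the attempt."""
    emitted.append(("CompactionStarted", len(history)))
    try:
        new_history = await strategy.compact(history, None, None)
    except Exception as exc:
        logger.warning("compaction failed: %s", exc)
        emitted.append(("CompactionFailed", type(exc).__name__, str(exc)))
        return history  # untouched; the turn carries on
    emitted.append(("CompactionCompleted", len(history), len(new_history)))
    return new_history


class Broken:
    async def compact(self, events, ctx, store):
        raise RuntimeError("summarizer unavailable")


class LastTwo:
    async def compact(self, events, ctx, store):
        return list(events[-2:])


events_ok: list[tuple] = []
events_bad: list[tuple] = []
after_ok = asyncio.run(run_compaction(LastTwo(), ["a", "b", "c"], events_ok))
after_bad = asyncio.run(run_compaction(Broken(), ["a", "b", "c"], events_bad))
```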
Driving a strategy directly
If you're not using Agent (custom harness, tests, one-off scripts), call await strategy.compact(...) yourself.
For the token-based threshold, estimate with sum(len(str(e)) for e in events) / trigger.chars_per_token.
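Putting the estimate and the explicit call together, a harness could look something like this. `TriggerSketch`, its `max_tokens` field, the default of 4 characters per token, the `LastN` strategy, and the `maybe_compact` helper are all assumptions for illustration; passing `None` for `ctx` and `store` is also an assumption that the real strategies may not accept.

```python
import asyncio
from dataclasses import dataclass
from typing import Any


@dataclass
class TriggerSketch:
    max_tokens: int = 0           # assumed field name; 0 disables the threshold
    chars_per_token: float = 4.0  # attribute named in the text; default assumed


class LastN:
    """Stand-in strategy keeping the last n events."""

    def __init__(self, n: int) -> None:
        self.n = n

    async def compact(self, events, ctx, store):
        return list(events[max(len(events) - self.n, 0):])


async def maybe_compact(events: list[Any], trigger: TriggerSketch, strategy: Any) -> list[Any]:
    # Character-based token estimate, as described above.
    est_tokens = sum(len(str(e)) for e in events) / trigger.chars_per_token
    if trigger.max_tokens > 0 and est_tokens > trigger.max_tokens:
        return await strategy.compact(events, None, None)
    return events


history = ["x" * 40 for _ in range(10)]  # 400 chars, about 100 estimated tokens
trigger = TriggerSketch(max_tokens=50)
history = asyncio.run(maybe_compact(history, trigger, LastN(3)))
```

The character count is only a proxy; a provider tokenizer would give a tighter estimate at higher cost.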
Writing a custom strategy
Any object with an async compact(events, ctx, store) method satisfies the protocol. A couple of ideas:
- Drop tool noise. Keep ModelRequest / ModelResponse, drop ToolCallEvent / ToolResultEvent older than some boundary.
- Priority retention. Score events (e.g. keep every ModelResponse but decimate ToolCallEvents).
- Segmented summarization. Run SummarizeCompact in chunks to produce multiple CompactionSummary events over time rather than one big one.
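As one possible take on the "drop tool noise" idea, the sketch below filters old tool events while keeping every model message. `ModelMsg` and `ToolNoise` are hypothetical stand-ins for the library's ModelRequest / ModelResponse and ToolCallEvent / ToolResultEvent types, and the `recent` boundary parameter is an assumption.

```python
import asyncio
from dataclasses import dataclass
from typing import Any, Sequence


@dataclass
class ModelMsg:  # stand-in for ModelRequest / ModelResponse
    text: str


@dataclass
class ToolNoise:  # stand-in for ToolCallEvent / ToolResultEvent
    name: str


class DropOldToolEvents:
    """Keeps every model message; drops tool events outside the last `recent` events."""

    def __init__(self, recent: int) -> None:
        self.recent = recent

    async def compact(self, events: Sequence[Any], ctx: Any = None, store: Any = None) -> list[Any]:
        boundary = max(len(events) - self.recent, 0)
        # A filtering comprehension preserves the original order.
        return [
            e for i, e in enumerate(events)
            if i >= boundary or not isinstance(e, ToolNoise)
        ]


history = [ModelMsg("q1"), ToolNoise("search"), ModelMsg("a1"), ToolNoise("calc"), ModelMsg("a2")]
compacted = asyncio.run(DropOldToolEvents(recent=2).compact(history))
```

Filtering in place of rebuilding the list means retained events never change relative order, which is the one hard requirement the protocol places on a strategy.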