07 · Long-Doc Chat

A "remember the last words" chat exercise that stress-tests the assembly chain. Three policies compose to control exactly what the LLM sees on each call — drop non-conversation events, hard-cap to the last 6 events, then enforce a token budget. A separate TailWindowCompact strategy keeps the underlying stream history small too. Watch the compaction events fire as the conversation grows.
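Conceptually, each assembly policy is a transform over the list of events the LLM will see, applied left to right. A minimal pure-Python sketch of that pipeline (illustrative only, not the library's implementation; the dict-based event shape and the character-count token estimate are assumptions):

```python
# Illustrative sketch of an assembly chain: each policy maps a list of
# events to a (possibly smaller) list, and the policies apply in order.

def conversation_policy(events):
    # Keep only conversation/tool traffic; drop lifecycle noise.
    return [e for e in events if e["kind"] in {"user", "assistant", "tool"}]

def sliding_window_policy(events, max_events=6):
    # Hard cap: only the most recent `max_events` events survive.
    return events[-max_events:]

def token_budget_policy(events, max_tokens=2000):
    # Character-based secondary cap: walk backwards from the newest event,
    # keeping events while the rough budget holds.
    kept, used = [], 0
    for e in reversed(events):
        cost = len(e["text"])
        if used + cost > max_tokens:
            break
        kept.append(e)
        used += cost
    return list(reversed(kept))

def assemble(events, policies):
    for policy in policies:
        events = policy(events)
    return events

history = (
    [{"kind": "lifecycle", "text": "agent_started"}]
    + [{"kind": "user", "text": f"Remember word {i}"} for i in range(8)]
)
view = assemble(
    history, [conversation_policy, sliding_window_policy, token_budget_policy]
)
print(len(view))  # 6 — the lifecycle event is dropped, then the rest capped to 6
```

Order matters: filtering before windowing means the window counts only events the LLM could actually see, and the token budget then acts as a final safety net.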

What it covers

  • Composing multiple AssemblyPolicy instances in the order they should apply.
  • ConversationPolicy to drop lifecycle/internal events from the LLM's view.
  • SlidingWindowPolicy(max_events=N, transparent=True) for a hard event-count cap.
  • TokenBudgetPolicy(max_tokens=N) as a belt-and-braces secondary cap.
  • Combining the assembly chain with a KnowledgeConfig that wires TailWindowCompact for stream-history compaction (different from assembly!).
  • Watching CompactionCompleted events fire from outside the Agent.

Primitives covered

  • assembly=[ConversationPolicy(), SlidingWindowPolicy(...), TokenBudgetPolicy(...)]
  • KnowledgeConfig with compact=TailWindowCompact(...) + compact_trigger=CompactTrigger(...)
  • MemoryKnowledgeStore (in-memory variant of the knowledge store)
  • CompactionCompleted lifecycle event
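Assembly only trims the per-call view; compaction rewrites the stored stream history itself. With the config used below (trigger at 8 events, target of 4), the behaviour is roughly the following (a sketch under assumed event and record shapes, not the library's code):

```python
# Sketch of tail-window compaction: once the stored history exceeds the
# trigger threshold, rewrite it down to the newest `target` events and
# report what happened — analogous to a CompactionCompleted event.

def tail_window_compact(history, max_events=8, target=4):
    """Return (new_history, compaction_record_or_None)."""
    if len(history) <= max_events:
        return history, None  # under the trigger: nothing to do
    record = {
        "strategy": "TailWindowCompact",
        "events_before": len(history),
        "events_after": target,
    }
    return history[-target:], record

history = list(range(9))  # 9 events exceeds the trigger of 8
history, record = tail_window_compact(history)
print(history)  # [5, 6, 7, 8]
print(record["events_before"], record["events_after"])  # 9 4
```

This is why the demo can subscribe to `CompactionCompleted` from outside the Agent: compaction changes shared state (the stream history), whereas assembly policies are pure per-call views and emit nothing.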

Source

"""07 · Long-doc chat — composing assembly policies

Shows the assembly chain in action. Three policies compose in order:

1. ``ConversationPolicy`` — drops every event that isn't conversation or
   tool traffic (no lifecycle noise reaches the LLM).
2. ``SlidingWindowPolicy(max_events=6)`` — hard-caps the number of events
   forwarded to the LLM, so history can't grow unbounded.
3. ``TokenBudgetPolicy(max_tokens=2000)`` — character-based secondary cap,
   belt-and-braces against one huge event blowing the budget.

Also pairs the assembly chain with ``TailWindowCompact`` so the agent's
stream history itself (not just the view into it) is kept small.

Run::

    .venv-beta/bin/python 07_long_doc_chat.py
"""

import asyncio

from autogen.beta import Agent, KnowledgeConfig
from autogen.beta.compact import CompactTrigger, TailWindowCompact
from autogen.beta.config import GeminiConfig
from autogen.beta.events.lifecycle import CompactionCompleted
from autogen.beta.knowledge import MemoryKnowledgeStore
from autogen.beta.policies import (
    ConversationPolicy,
    SlidingWindowPolicy,
    TokenBudgetPolicy,
)
from autogen.beta.stream import MemoryStream

def section(title: str) -> None:
    print(f"\n── {title} ───")

QUESTIONS = [
    "Remember the word 'oak'.",
    "Remember the word 'river'.",
    "Remember the word 'lantern'.",
    "Remember the word 'sable'.",
    "Remember the word 'quartz'.",
    "Name the three most recent words I asked you to remember.",
]

async def main() -> None:
    config = GeminiConfig(model="gemini-3-flash-preview", temperature=0)

    store = MemoryKnowledgeStore()
    compactions: list[CompactionCompleted] = []
    stream = MemoryStream()
    stream.where(CompactionCompleted).subscribe(lambda e: compactions.append(e))

    agent = Agent(
        "lexicon",
        prompt=(
            "Be very terse — one short sentence per reply. "
            "Answer directly without calling any tools."
        ),
        config=config,
        tasks=False,  # No subtask tools — keeps the demo focused on assembly + compaction.
        assembly=[
            ConversationPolicy(),
            SlidingWindowPolicy(max_events=6, transparent=True),
            TokenBudgetPolicy(max_tokens=2000),
        ],
        knowledge=KnowledgeConfig(
            store=store,
            compact=TailWindowCompact(target=4),
            compact_trigger=CompactTrigger(max_events=8),
        ),
    )

    section("Long-doc chat — assembly policies trim what the LLM actually sees")

    reply = await agent.ask(QUESTIONS[0], stream=stream)
    print(f"Q1> {QUESTIONS[0]}")
    print(f"A1> {reply.body}")

    for i, q in enumerate(QUESTIONS[1:], start=2):
        reply = await reply.ask(q)
        print(f"Q{i}> {q}")
        print(f"A{i}> {reply.body}")

    print()
    print(f"Compactions fired during run: {len(compactions)}")
    for c in compactions:
        print(f"  - {c.strategy}: {c.events_before} → {c.events_after} events")

if __name__ == "__main__":
    asyncio.run(main())