Veronica Core: Circuit Breaker for AG2 Agents#

Production multi-agent systems fail in two distinct ways:

  • Individual agent failure - one LLM endpoint degrades while others are healthy.
  • System-wide emergency - something is deeply wrong and every agent must stop immediately.

The veronica-core library handles both with a single CircuitBreakerCapability that attaches to any AG2 agent via the standard add_to_agent() pattern. This notebook walks through three demos:

  1. Basic circuit breaker - an agent trips after repeated failures; callers receive None instead of hanging.
  2. System-wide SAFE_MODE - a shared VeronicaIntegration blocks all agents instantly on anomaly detection, then recovers in two steps.
  3. Per-agent isolation - a broken agent’s open circuit does not affect healthy agents sharing the same capability instance.

Installation#

pip install -U "autogen[openai]" veronica-core

Imports#

# Copyright (c) 2023 - 2026, AG2ai, Inc., AG2ai open-source projects maintainers and core contributors
# SPDX-License-Identifier: Apache-2.0

from veronica_core import (
    CircuitBreakerCapability,
    MemoryBackend,
    VeronicaIntegration,
    VeronicaState,
)

from autogen import ConversableAgent

Demo 1: Basic Circuit Breaker#

A CircuitBreakerCapability wraps agent.generate_reply() transparently. When an agent returns None (the AG2 convention for “I have no reply”), the breaker counts it as a failure. After failure_threshold consecutive failures the circuit opens, and subsequent calls return None immediately without invoking the agent.

# An agent whose backend is completely broken (always returns None)
planner = ConversableAgent("planner", llm_config=False)
planner.register_reply(
    trigger=lambda _: True,
    reply_func=lambda agent, messages, sender, config: (True, None),
    position=0,
    remove_other_reply_funcs=True,
)

cap = CircuitBreakerCapability(failure_threshold=3)
cap.add_to_agent(planner)

breaker = cap.get_breaker("planner")
print(f"initial state  : {breaker.state}")  # CircuitState.CLOSED

msg = [{"role": "user", "content": "test"}]

# Three None replies trip the circuit
for _ in range(3):
    planner.generate_reply(msg)

print(f"after 3 failures: {breaker.state}")  # CircuitState.OPEN
print(f"failure count   : {breaker.failure_count}")  # 3

# Subsequent calls are short-circuited -- the agent is never invoked
reply = planner.generate_reply(msg)
print(f"reply when open : {reply!r}")  # None
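The counting rule behind Demo 1 can be made concrete without either library installed. The following is a minimal sketch, not veronica-core's implementation: `SketchBreaker`, `SketchState`, and `broken_backend` are hypothetical names, and resetting the count after a successful reply is an assumption drawn from the word "consecutive" above. Recovery of an open circuit is left out because the text does not describe it.

```python
from enum import Enum
from typing import Callable, List, Optional


class SketchState(Enum):
    CLOSED = "closed"
    OPEN = "open"


class SketchBreaker:
    """Hypothetical stand-in for illustration; not veronica-core's CircuitBreaker."""

    def __init__(self, failure_threshold: int) -> None:
        self.failure_threshold = failure_threshold
        self.failure_count = 0
        self.state = SketchState.CLOSED

    def call(self, reply_fn: Callable[[], Optional[str]]) -> Optional[str]:
        if self.state is SketchState.OPEN:
            return None  # short-circuit: reply_fn is never invoked
        reply = reply_fn()
        if reply is None:
            # A None reply counts as one failure.
            self.failure_count += 1
            if self.failure_count >= self.failure_threshold:
                self.state = SketchState.OPEN
        else:
            # Assumption: a successful reply resets the consecutive-failure streak.
            self.failure_count = 0
        return reply


invocations: List[int] = []


def broken_backend() -> Optional[str]:
    """Records that it was invoked, then fails by returning None."""
    invocations.append(1)
    return None


sketch = SketchBreaker(failure_threshold=3)
for _ in range(4):
    sketch.call(broken_backend)

print(sketch.state)          # SketchState.OPEN
print(sketch.failure_count)  # 3
print(len(invocations))      # 3 -- the fourth call was short-circuited
```

The fourth call never reaches `broken_backend`, which is the property that keeps callers from waiting on a backend already known to be failing.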

Demo 2: System-wide SAFE_MODE#

When multiple agents share a single VeronicaIntegration, any component can trigger a system-wide halt by transitioning to SAFE_MODE. All agents are blocked immediately, with no code changes at call sites.

Recovery requires two explicit transitions (SAFE_MODE → IDLE → SCREENING); transitioning directly from SAFE_MODE to SCREENING is not valid.

def _always_ok(agent, messages, sender, config):
    return True, f"{agent.name}: ok"

# MemoryBackend keeps state in-process -- no files written during the demo
veronica = VeronicaIntegration(backend=MemoryBackend())
cap2 = CircuitBreakerCapability(failure_threshold=5, veronica=veronica)

msg = [{"role": "user", "content": "test"}]

planner2 = ConversableAgent("planner", llm_config=False)
executor2 = ConversableAgent("executor", llm_config=False)
for agent in (planner2, executor2):
    agent.register_reply(
        trigger=lambda _: True,
        reply_func=_always_ok,
        position=0,
        remove_other_reply_funcs=True,
    )
    cap2.add_to_agent(agent)

# Both agents are healthy
print(planner2.generate_reply(msg))  # planner: ok
print(executor2.generate_reply(msg))  # executor: ok

# Anomaly detected -- halt everything immediately
# VeronicaIntegration starts in SCREENING, so SCREENING -> SAFE_MODE is valid
veronica.state.transition(VeronicaState.SAFE_MODE, reason="anomaly detected")
print(planner2.generate_reply(msg))  # None -- blocked by SAFE_MODE
print(executor2.generate_reply(msg))  # None -- blocked by SAFE_MODE

# Two-step recovery: confirm stability (IDLE), then resume screening
veronica.state.transition(VeronicaState.IDLE, reason="anomaly resolved")
veronica.state.transition(VeronicaState.SCREENING, reason="resuming")
print(planner2.generate_reply(msg))  # planner: ok
print(executor2.generate_reply(msg))  # executor: ok
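The transition rules used in Demo 2 can be sketched as a small allow-list state machine. This is an illustrative model only: `SketchMode`, `SketchStateMachine`, and `guarded_reply` are hypothetical names. The allow-list contains only the three transitions named in the text, and raising `ValueError` on an invalid transition is an assumption. The real library may define more states and may report invalid transitions differently.

```python
from enum import Enum
from typing import Optional


class SketchMode(Enum):
    SCREENING = "screening"
    SAFE_MODE = "safe_mode"
    IDLE = "idle"


# Only the transitions named in the text; the real state machine may allow more.
ALLOWED = {
    (SketchMode.SCREENING, SketchMode.SAFE_MODE),
    (SketchMode.SAFE_MODE, SketchMode.IDLE),
    (SketchMode.IDLE, SketchMode.SCREENING),
}


class SketchStateMachine:
    """Hypothetical shared state object; starts in SCREENING as in the demo."""

    def __init__(self) -> None:
        self.current = SketchMode.SCREENING

    def transition(self, target: SketchMode) -> None:
        if (self.current, target) not in ALLOWED:
            # Assumption: invalid transitions raise; the library may behave differently.
            raise ValueError(
                f"invalid transition: {self.current.name} -> {target.name}"
            )
        self.current = target


def guarded_reply(machine: SketchStateMachine, agent_name: str) -> Optional[str]:
    """Every agent consults the same machine; SAFE_MODE blocks all of them."""
    if machine.current is SketchMode.SAFE_MODE:
        return None
    return f"{agent_name}: ok"


machine = SketchStateMachine()
machine.transition(SketchMode.SAFE_MODE)
blocked = [guarded_reply(machine, name) for name in ("planner", "executor")]

try:
    machine.transition(SketchMode.SCREENING)  # skipping IDLE
    skip_accepted = True
except ValueError:
    skip_accepted = False

machine.transition(SketchMode.IDLE)
machine.transition(SketchMode.SCREENING)
resumed = [guarded_reply(machine, name) for name in ("planner", "executor")]

print(blocked)        # [None, None]
print(skip_accepted)  # False
print(resumed)        # ['planner: ok', 'executor: ok']
```

Because both agents read the same `machine`, a single transition is enough to block or release all of them.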

Demo 3: Per-agent Isolation#

Each call to add_to_agent() creates an independent CircuitBreaker for that agent. A broken agent’s circuit opening does not affect any other agent, even when they share the same CircuitBreakerCapability instance.

cap3 = CircuitBreakerCapability(failure_threshold=2)

healthy = ConversableAgent("healthy", llm_config=False)
healthy.register_reply(
    trigger=lambda _: True,
    reply_func=lambda agent, messages, sender, config: (True, "healthy: ok"),
    position=0,
    remove_other_reply_funcs=True,
)

broken = ConversableAgent("broken", llm_config=False)
broken.register_reply(
    trigger=lambda _: True,
    reply_func=lambda agent, messages, sender, config: (True, None),
    position=0,
    remove_other_reply_funcs=True,
)

cap3.add_to_agent(healthy)
cap3.add_to_agent(broken)

msg = [{"role": "user", "content": "test"}]

# Trip the broken agent's circuit
broken.generate_reply(msg)
broken.generate_reply(msg)
print(f"broken  state: {cap3.get_breaker('broken').state}")  # CircuitState.OPEN

# The healthy agent is completely unaffected -- same cap, independent breaker
print(f"healthy reply: {healthy.generate_reply(msg)!r}")  # 'healthy: ok'
print(f"healthy state: {cap3.get_breaker('healthy').state}")  # CircuitState.CLOSED
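One way to picture the isolation shown in Demo 3 is a shared object that keeps a separate failure counter per agent name. The sketch below is a simplified model under that assumption. `SketchCapability`, `attach`, `record`, and `is_open` are hypothetical names and say nothing about how veronica-core stores its breakers internally.

```python
from typing import Dict, Optional


class SketchCapability:
    """Hypothetical stand-in: one shared object, one failure counter per agent name."""

    def __init__(self, failure_threshold: int) -> None:
        self.failure_threshold = failure_threshold
        self._failures: Dict[str, int] = {}

    def attach(self, agent_name: str) -> None:
        # Each attached agent gets its own counter, starting at zero.
        self._failures[agent_name] = 0

    def is_open(self, agent_name: str) -> bool:
        return self._failures[agent_name] >= self.failure_threshold

    def record(self, agent_name: str, reply: Optional[str]) -> None:
        if reply is None:
            self._failures[agent_name] += 1
        else:
            # Assumption: a successful reply resets that agent's streak.
            self._failures[agent_name] = 0


shared = SketchCapability(failure_threshold=2)
shared.attach("healthy")
shared.attach("broken")

# Two failures from one agent touch only that agent's counter.
shared.record("broken", None)
shared.record("broken", None)
shared.record("healthy", "healthy: ok")

print(shared.is_open("broken"))   # True
print(shared.is_open("healthy"))  # False
```

Keying state by agent name means sharing the capability shares only configuration, such as the threshold, while failure history stays per agent.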

Summary#

| Feature | API |
| --- | --- |
| Protect an agent | cap.add_to_agent(agent) |
| Inspect circuit state | cap.get_breaker(agent.name).state |
| System-wide halt | veronica.state.transition(VeronicaState.SAFE_MODE, ...) |
| Recovery | SAFE_MODE -> IDLE -> SCREENING (two explicit transitions) |
| Backend for demos | MemoryBackend() (no file I/O) |

Existing agent.generate_reply(messages) calls need no changes.
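Call sites can stay unchanged if the guard replaces the agent's method in place, so callers keep using the same attribute name. The sketch below shows that general wrapping technique in plain Python. It is not a description of how veronica-core hooks into AG2, and `SketchAgent`, `add_guard`, and the `halted` flag are hypothetical.

```python
from typing import Callable, Dict, List, Optional


class SketchAgent:
    """Hypothetical minimal agent with a generate_reply method."""

    def __init__(self, name: str) -> None:
        self.name = name

    def generate_reply(self, messages: List[Dict[str, str]]) -> Optional[str]:
        return f"{self.name}: ok"


def add_guard(agent: SketchAgent, is_blocked: Callable[[], bool]) -> None:
    """Wrap agent.generate_reply so it returns None while is_blocked() is true."""
    original = agent.generate_reply  # capture the bound method before replacing it

    def guarded(messages: List[Dict[str, str]]) -> Optional[str]:
        if is_blocked():
            return None
        return original(messages)

    # The instance attribute shadows the class method; callers see no difference.
    agent.generate_reply = guarded  # type: ignore[method-assign]


halted = {"value": False}
agent = SketchAgent("planner")
add_guard(agent, lambda: halted["value"])

msg = [{"role": "user", "content": "test"}]
first = agent.generate_reply(msg)
halted["value"] = True
second = agent.generate_reply(msg)

print(first)   # planner: ok
print(second)  # None
```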