Veronica Core: Circuit Breaker for AG2 Agents
Production multi-agent systems fail in two distinct ways:
- Individual agent failure - one LLM endpoint degrades while others are healthy.
- System-wide emergency - something is deeply wrong and every agent must stop immediately.
The veronica-core library handles both with a single CircuitBreakerCapability that attaches to any AG2 agent via the standard add_to_agent() pattern.
The demos below cover three scenarios:
- Basic circuit breaker - an agent's circuit trips after repeated failures; callers receive None instead of hanging.
- System-wide SAFE_MODE - a shared VeronicaIntegration blocks all agents instantly on anomaly detection, then recovers in two steps.
- Per-agent isolation - a broken agent’s open circuit does not affect healthy agents sharing the same capability instance.
Installation
Imports
# Copyright (c) 2023 - 2026, AG2ai, Inc., AG2ai open-source projects maintainers and core contributors
# SPDX-License-Identifier: Apache-2.0
from veronica_core import (
    CircuitBreakerCapability,
    MemoryBackend,
    VeronicaIntegration,
    VeronicaState,
)
from autogen import ConversableAgent
Demo 1: Basic Circuit Breaker
A CircuitBreakerCapability wraps agent.generate_reply() transparently. When an agent returns None (the AG2 convention for “I have no reply”), the breaker counts it as a failure. After failure_threshold consecutive failures the circuit opens, and subsequent calls return None immediately without invoking the agent.
# An agent whose backend is completely broken (always returns None)
planner = ConversableAgent("planner", llm_config=False)
planner.register_reply(
    trigger=lambda _: True,
    reply_func=lambda agent, messages, sender, config: (True, None),
    position=0,
    remove_other_reply_funcs=True,
)
cap = CircuitBreakerCapability(failure_threshold=3)
cap.add_to_agent(planner)
breaker = cap.get_breaker("planner")
print(f"initial state : {breaker.state}") # CircuitState.CLOSED
msg = [{"role": "user", "content": "test"}]
# Three None replies trip the circuit
for _ in range(3):
    planner.generate_reply(msg)
print(f"after 3 failures: {breaker.state}") # CircuitState.OPEN
print(f"failure count : {breaker.failure_count}") # 3
# Subsequent calls are short-circuited -- the agent is never invoked
reply = planner.generate_reply(msg)
print(f"reply when open : {reply!r}") # None
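To make the counting rule concrete, here is a minimal pure-Python sketch of a consecutive-failure breaker. It is not veronica-core's implementation: `SketchBreaker`, `SketchState`, and `call()` are hypothetical names, and resetting the count on a successful reply is an assumption drawn from the word "consecutive" above.

```python
from enum import Enum
from typing import Callable, List, Optional


class SketchState(Enum):
    CLOSED = "closed"
    OPEN = "open"


class SketchBreaker:
    """Illustrative consecutive-failure breaker (hypothetical, not library code)."""

    def __init__(self, failure_threshold: int) -> None:
        self.failure_threshold = failure_threshold
        self.failure_count = 0
        self.state = SketchState.CLOSED

    def call(self, fn: Callable[[], Optional[str]]) -> Optional[str]:
        # Open circuit: return None without invoking the wrapped function.
        if self.state is SketchState.OPEN:
            return None
        reply = fn()
        if reply is None:
            # A None reply counts as one failure.
            self.failure_count += 1
            if self.failure_count >= self.failure_threshold:
                self.state = SketchState.OPEN
        else:
            # Assumption: any real reply resets the consecutive-failure streak.
            self.failure_count = 0
        return reply


invocations: List[int] = []


def always_none() -> Optional[str]:
    # Record each real invocation so short-circuited calls can be told apart.
    invocations.append(1)
    return None


sketch = SketchBreaker(failure_threshold=3)
for _ in range(4):
    sketch.call(always_none)

print(sketch.state, sketch.failure_count, len(invocations))
```

The fourth call never reaches `always_none`, which mirrors the short-circuit behaviour described for the open circuit in Demo 1.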
Demo 2: System-wide SAFE_MODE
When multiple agents share a single VeronicaIntegration, any component can trigger a system-wide halt by transitioning to SAFE_MODE. All agents are blocked immediately — no code changes at call sites.
Recovery requires two explicit transitions (SAFE_MODE → IDLE → SCREENING) — skipping straight to SCREENING isn’t valid.
def _always_ok(agent, messages, sender, config):
    return True, f"{agent.name}: ok"
# MemoryBackend keeps state in-process -- no files written during the demo
veronica = VeronicaIntegration(backend=MemoryBackend())
cap2 = CircuitBreakerCapability(failure_threshold=5, veronica=veronica)
msg = [{"role": "user", "content": "test"}]
planner2 = ConversableAgent("planner", llm_config=False)
executor2 = ConversableAgent("executor", llm_config=False)
for agent in (planner2, executor2):
    agent.register_reply(
        trigger=lambda _: True,
        reply_func=_always_ok,
        position=0,
        remove_other_reply_funcs=True,
    )
    cap2.add_to_agent(agent)
# Both agents are healthy
print(planner2.generate_reply(msg)) # planner: ok
print(executor2.generate_reply(msg)) # executor: ok
# Anomaly detected -- halt everything immediately
# VeronicaIntegration starts in SCREENING, so SCREENING -> SAFE_MODE is valid
veronica.state.transition(VeronicaState.SAFE_MODE, reason="anomaly detected")
print(planner2.generate_reply(msg)) # None -- blocked by SAFE_MODE
print(executor2.generate_reply(msg)) # None -- blocked by SAFE_MODE
# Two-step recovery: confirm stability (IDLE), then resume screening
veronica.state.transition(VeronicaState.IDLE, reason="anomaly resolved")
veronica.state.transition(VeronicaState.SCREENING, reason="resuming")
print(planner2.generate_reply(msg)) # planner: ok
print(executor2.generate_reply(msg)) # executor: ok
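The two-step recovery rule can be pictured as a small table of allowed transitions. The sketch below is a hypothetical, stand-alone model, not the library's state machine: `SketchMode`, `SketchStateMachine`, and `ALLOWED_TRANSITIONS` are invented names, the table lists only the three transitions named in this demo, and returning `False` for a rejected transition is an assumption (the real library may raise or log instead).

```python
from enum import Enum


class SketchMode(Enum):
    SCREENING = "screening"
    SAFE_MODE = "safe_mode"
    IDLE = "idle"


# Only the transitions the text names; the real library may permit others.
ALLOWED_TRANSITIONS = {
    (SketchMode.SCREENING, SketchMode.SAFE_MODE),
    (SketchMode.SAFE_MODE, SketchMode.IDLE),
    (SketchMode.IDLE, SketchMode.SCREENING),
}


class SketchStateMachine:
    """Hypothetical state holder that validates each transition."""

    def __init__(self):
        # The demo states that the integration starts in SCREENING.
        self.current = SketchMode.SCREENING
        self.history = []

    def transition(self, target, reason):
        # Reject any (current, target) pair that is not in the table.
        if (self.current, target) not in ALLOWED_TRANSITIONS:
            return False
        self.history.append((self.current, target, reason))
        self.current = target
        return True


machine = SketchStateMachine()
halted = machine.transition(SketchMode.SAFE_MODE, reason="anomaly detected")
shortcut = machine.transition(SketchMode.SCREENING, reason="skipping IDLE")
idle = machine.transition(SketchMode.IDLE, reason="anomaly resolved")
resumed = machine.transition(SketchMode.SCREENING, reason="resuming")

print(halted, shortcut, idle, resumed, machine.current)
```

Keeping the allowed pairs in one set makes the "no shortcut from SAFE_MODE to SCREENING" rule a data decision rather than branching logic.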
Demo 3: Per-agent Isolation
Each call to add_to_agent() creates an independent CircuitBreaker for that agent. A broken agent’s circuit opening does not affect any other agent, even when they share the same CircuitBreakerCapability instance.
cap3 = CircuitBreakerCapability(failure_threshold=2)
healthy = ConversableAgent("healthy", llm_config=False)
healthy.register_reply(
    trigger=lambda _: True,
    reply_func=lambda agent, messages, sender, config: (True, "healthy: ok"),
    position=0,
    remove_other_reply_funcs=True,
)
broken = ConversableAgent("broken", llm_config=False)
broken.register_reply(
    trigger=lambda _: True,
    reply_func=lambda agent, messages, sender, config: (True, None),
    position=0,
    remove_other_reply_funcs=True,
)
cap3.add_to_agent(healthy)
cap3.add_to_agent(broken)
msg = [{"role": "user", "content": "test"}]
# Trip the broken agent's circuit
broken.generate_reply(msg)
broken.generate_reply(msg)
print(f"broken state: {cap3.get_breaker('broken').state}") # CircuitState.OPEN
# The healthy agent is completely unaffected -- same cap, independent breaker
print(f"healthy reply: {healthy.generate_reply(msg)!r}") # 'healthy: ok'
print(f"healthy state: {cap3.get_breaker('healthy').state}") # CircuitState.CLOSED
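One way to picture the isolation is a shared object that keeps a separate counter per agent name. The following is a rough sketch under that assumption; `SketchCounter`, `SketchCapability`, and `attach()` are hypothetical names, and nothing here is taken from veronica-core's internals.

```python
from typing import Dict, Optional


class SketchCounter:
    """Per-agent failure bookkeeping (hypothetical, for illustration only)."""

    def __init__(self, failure_threshold: int) -> None:
        self.failure_threshold = failure_threshold
        self.failure_count = 0

    @property
    def is_open(self) -> bool:
        return self.failure_count >= self.failure_threshold

    def record(self, reply: Optional[str]) -> None:
        # None counts as a failure; any other reply resets the streak.
        self.failure_count = self.failure_count + 1 if reply is None else 0


class SketchCapability:
    """One shared object that hands out one independent counter per agent name."""

    def __init__(self, failure_threshold: int) -> None:
        self.failure_threshold = failure_threshold
        self._breakers: Dict[str, SketchCounter] = {}

    def attach(self, name: str) -> SketchCounter:
        # Create a counter only if this name has not been attached before.
        if name not in self._breakers:
            self._breakers[name] = SketchCounter(self.failure_threshold)
        return self._breakers[name]

    def get_breaker(self, name: str) -> SketchCounter:
        return self._breakers[name]


shared = SketchCapability(failure_threshold=2)
shared.attach("healthy")
shared.attach("broken")

# Two failures for "broken"; one good reply for "healthy".
for _ in range(2):
    shared.get_breaker("broken").record(None)
shared.get_breaker("healthy").record("healthy: ok")

print(shared.get_breaker("broken").is_open, shared.get_breaker("healthy").is_open)
```

Because each name maps to its own counter, failures recorded for one agent cannot move another agent's count.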
Summary
| Feature | API |
|---|---|
| Protect an agent | cap.add_to_agent(agent) |
| Inspect circuit state | cap.get_breaker(agent.name).state |
| System-wide halt | veronica.state.transition(VeronicaState.SAFE_MODE, ...) |
| Recovery | SAFE_MODE -> IDLE -> SCREENING (two explicit transitions) |
| Backend for demos | MemoryBackend() (no file I/O) |
Existing agent.generate_reply(messages) calls need no changes.
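How a capability can leave call sites untouched is not spelled out here, but one common technique is to replace the method on the agent instance with a guarded wrapper. The sketch below illustrates that idea only; `EchoAgent`, `add_guard`, and the `is_blocked` callback are hypothetical, and veronica-core may hook into the agent differently.

```python
import functools
from typing import Callable


class EchoAgent:
    """Stand-in for an agent; not an AG2 class."""

    def __init__(self, name: str) -> None:
        self.name = name

    def generate_reply(self, messages):
        return f"{self.name}: {messages[-1]['content']}"


def add_guard(agent: EchoAgent, is_blocked: Callable[[], bool]) -> None:
    """Wrap agent.generate_reply so blocked calls return None."""
    original = agent.generate_reply  # bound method captured before patching

    @functools.wraps(original)
    def guarded(*args, **kwargs):
        if is_blocked():
            return None  # short-circuit: the original method is not called
        return original(*args, **kwargs)

    # The instance attribute shadows the class method, so callers keep
    # writing agent.generate_reply(messages) exactly as before.
    agent.generate_reply = guarded


flags = {"halted": False}
echo = EchoAgent("planner")
add_guard(echo, lambda: flags["halted"])

msg = [{"role": "user", "content": "test"}]
before = echo.generate_reply(msg)
flags["halted"] = True
during = echo.generate_reply(msg)

print(repr(before), repr(during))
```

Patching the instance rather than the class keeps the guard scoped to agents that were explicitly wrapped.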