Exponential Backoff and Retry Configuration with Amazon Bedrock in AG2#
Author: Priyanshu Deshmukh
This notebook demonstrates how to configure exponential backoff and retry behavior for Amazon Bedrock API calls in AG2. Proper retry configuration helps handle transient errors, rate limits, and network issues gracefully.
What are Retry Configurations?#
Retry configurations enable you to:
- Handle transient errors: Automatically retry requests that failed due to temporary network issues
- Manage rate limits: Use exponential backoff to respect API rate limits
- Improve reliability: Keep your applications resilient to temporary failures
- Control retry behavior: Fine-tune how many retries are made and which strategy is used
How Bedrock Implements Retries#
AG2's Bedrock client builds on boto3's retry configuration system (the retries option of botocore.config.Config), which supports:
- Total Max Attempts: Maximum number of total attempts (initial + retries)
- Max Attempts: Legacy parameter for the maximum number of retry attempts (not counting the initial request)
- Retry Modes: Different strategies for handling retries
  - legacy: Pre-existing retry behavior
  - standard: Standardized retry rules (defaults to 3 total attempts)
  - adaptive: Retries with additional client-side throttling
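The following sketch shows what exponential backoff computes. It models the standard-mode behavior described in the boto3 retries guide, which uses a base factor of 2 and a maximum backoff of 20 seconds. The full-jitter formula and the backoff_delay helper are assumptions for illustration, not botocore's source.

```python
import random
from typing import Optional


def backoff_delay(
    retry_number: int,
    base: float = 2.0,
    max_backoff: float = 20.0,
    rng: Optional[random.Random] = None,
) -> float:
    """Return a jittered delay in seconds before the given retry (1-based).

    Assumption: "full jitter", i.e. a uniform random fraction of the capped
    exponential value base ** (retry_number - 1).
    """
    if retry_number < 1:
        raise ValueError("retry_number must be >= 1")
    if rng is None:
        rng = random.Random()
    cap = min(base ** (retry_number - 1), max_backoff)
    return rng.random() * cap


# Caps for retries 1..6 are 1, 2, 4, 8, 16 and 20 seconds; the actual sleep is random below the cap
rng = random.Random(0)
for n in range(1, 7):
    print(f"retry {n}: sleep {backoff_delay(n, rng=rng):.2f}s")
```

The jitter spreads retries from many clients over time, so they do not all hit the service again at the same moment.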
Requirements#
- Python >= 3.10
- AG2 installed with the bedrock extra: pip install ag2[bedrock]
- AWS credentials configured (via environment variables, an IAM role, or the AWS credentials file)
Retry Configuration Parameters#
Key Parameters#
- total_max_attempts (int): Maximum number of total attempts (initial + retries)
  - Preferred over max_attempts
  - Maps to the AWS_MAX_ATTEMPTS environment variable
  - Example: 5 means 1 initial attempt + 4 retries = 5 total attempts
- max_attempts (int): Maximum number of retry attempts, not counting the initial request (legacy)
  - Example: 2 means 2 retries after the initial request; 0 means no retries
  - If not specified at the boto3 level, botocore uses the service model's default (typically 4 retries); AG2's Bedrock client supplies its own defaults (see Part 1)
- mode (str): Retry strategy mode
  - "legacy": Pre-existing retry behavior
  - "standard": Standardized retry rules (defaults to 3 total attempts)
  - "adaptive": Retries with client-side throttling (best for rate limits)
Important Notes#
- If both total_max_attempts and max_attempts are provided, total_max_attempts takes precedence
- total_max_attempts is preferred because it maps to the AWS_MAX_ATTEMPTS environment variable and the max_attempts setting in the AWS config file
- adaptive mode is recommended for handling rate limits and throttling
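The precedence rule and the two counting conventions fit in a few lines. This sketch follows the documented semantics and is not botocore's code. effective_total_attempts is a hypothetical helper, and default_total=3 mirrors the standard-mode default.

```python
from typing import Optional


def effective_total_attempts(
    total_max_attempts: Optional[int] = None,
    max_attempts: Optional[int] = None,
    default_total: int = 3,
) -> int:
    """Return the total number of attempts (initial request + retries).

    - total_max_attempts already includes the initial request and takes precedence.
    - max_attempts counts retries only, so one is added for the initial request.
    - default_total stands in for whatever the SDK or mode would otherwise use.
    """
    if total_max_attempts is not None:
        if total_max_attempts < 1:
            raise ValueError("total_max_attempts must be >= 1")
        return total_max_attempts
    if max_attempts is not None:
        if max_attempts < 0:
            raise ValueError("max_attempts must be >= 0")
        return max_attempts + 1
    return default_total


print(effective_total_attempts(total_max_attempts=7, max_attempts=3))  # 7: total wins
print(effective_total_attempts(max_attempts=2))  # 3: 1 initial + 2 retries
print(effective_total_attempts(max_attempts=0))  # 1: no retries
```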
Installation#
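Install required packages if not already installed (python-dotenv is used below to load AWS credentials from a .env file):
pip install ag2[bedrock] python-dotenv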
Setup: Import Libraries and Configure AWS Credentials#
import os
from dotenv import load_dotenv
from autogen import ConversableAgent, LLMConfig
load_dotenv()
print("Libraries imported successfully!")
Part 1: Basic Retry Configuration#
Let’s start with a simple configuration using default retry settings:
# Basic configuration with default retry settings
llm_config_default = LLMConfig(
config_list={
"api_type": "bedrock",
"model": "qwen.qwen3-coder-480b-a35b-v1:0",
"aws_region": os.getenv("AWS_REGION", "eu-north-1"),
"aws_access_key": os.getenv("AWS_ACCESS_KEY"),
"aws_secret_key": os.getenv("AWS_SECRET_ACCESS_KEY"),
"aws_profile_name": os.getenv("AWS_PROFILE"),
# Default retry: total_max_attempts=5, max_attempts=5, mode="standard"
},
)
print("Default retry configuration created!")
print("Default settings:")
print(" - total_max_attempts: 5")
print(" - max_attempts: 5")
print(" - mode: standard")
Part 2: Custom Retry Configuration - Total Max Attempts#
Configure the total number of attempts (initial + retries):
# Configuration with custom total_max_attempts
llm_config_custom_attempts = LLMConfig(
config_list={
"api_type": "bedrock",
"model": "qwen.qwen3-coder-480b-a35b-v1:0",
"aws_region": os.getenv("AWS_REGION", "eu-north-1"),
"aws_access_key": os.getenv("AWS_ACCESS_KEY"),
"aws_secret_key": os.getenv("AWS_SECRET_ACCESS_KEY"),
"total_max_attempts": 10, # 1 initial + 9 retries = 10 total attempts
"mode": "standard",
},
)
print("Custom retry configuration created!")
print("Settings:")
print(" - total_max_attempts: 10 (1 initial + 9 retries)")
print(" - mode: standard")
Part 3: Retry Modes Comparison#
Mode 1: Legacy Mode#
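Uses the pre-existing retry behavior. The Pydantic model defined first is not part of the retry setup. The adaptive-mode example later in this part uses it as response_format: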
# Define structured output model for math problem solving
from pydantic import BaseModel


class Step(BaseModel):
    """Represents a single step in solving a math problem."""

    explanation: str  # What operation or reasoning is being performed
    output: str  # The result of this step


class MathReasoning(BaseModel):
    """Complete structured response for a math problem solution."""

    steps: list[Step]  # List of all steps taken
    final_answer: str  # The final answer

    def format(self) -> str:
        """Format the structured output for human-readable display."""
        steps_output = "\n".join(
            f"Step {i + 1}: {step.explanation}\n Output: {step.output}" for i, step in enumerate(self.steps)
        )
        return f"{steps_output}\n\nFinal Answer: {self.final_answer}"
# Legacy retry mode
llm_config_legacy = LLMConfig(
config_list={
"api_type": "bedrock",
"model": "qwen.qwen3-coder-480b-a35b-v1:0",
"aws_region": os.getenv("AWS_REGION", "eu-north-1"),
"aws_access_key": os.getenv("AWS_ACCESS_KEY"),
"aws_secret_key": os.getenv("AWS_SECRET_ACCESS_KEY"),
"total_max_attempts": 5,
"mode": "legacy", # Pre-existing retry behavior
},
)
print("Legacy mode configuration created!")
Mode 2: Standard Mode (Recommended)#
Standardized retry rules. If no attempt count is given, boto3's standard mode defaults to 3 total attempts. This example sets total_max_attempts to 5:
# Standard retry mode (default)
llm_config_standard = LLMConfig(
config_list={
"api_type": "bedrock",
"model": "qwen.qwen3-coder-480b-a35b-v1:0",
"aws_region": os.getenv("AWS_REGION", "eu-north-1"),
"aws_access_key": os.getenv("AWS_ACCESS_KEY"),
"aws_secret_key": os.getenv("AWS_SECRET_ACCESS_KEY"),
"total_max_attempts": 5,
"mode": "standard", # Standardized retry rules
},
)
print("Standard mode configuration created!")
Mode 3: Adaptive Mode (Best for Rate Limits)#
Retries with additional client-side throttling:
# Adaptive retry mode (best for handling rate limits)
llm_config_adaptive = LLMConfig(
config_list={
"api_type": "bedrock",
"model": "qwen.qwen3-coder-480b-a35b-v1:0",
"aws_region": os.getenv("AWS_REGION", "eu-north-1"),
"aws_access_key": os.getenv("AWS_ACCESS_KEY"),
"aws_secret_key": os.getenv("AWS_SECRET_ACCESS_KEY"),
"total_max_attempts": 8,
"mode": "adaptive", # Retries with client-side throttling
"response_format": MathReasoning,
},
)
print("Adaptive mode configuration created!")
print("Adaptive mode is recommended for:")
print(" - Handling rate limits")
print(" - Managing throttling")
print(" - High-throughput scenarios")
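Adaptive mode adds client-side rate limiting on top of standard mode. After throttling responses, the client slows the rate at which it sends requests. The simplified token bucket below illustrates the idea only, and botocore's actual algorithm is more involved. The TokenBucket class, its halving policy, and its parameters are assumptions.

```python
import time
from typing import Callable


class TokenBucket:
    """Simplified client-side rate limiter (illustration only).

    Tokens refill at `rate` per second up to `capacity`. Each request consumes
    one token; after a throttling response, on_throttled() halves the refill rate.
    """

    def __init__(self, rate: float, capacity: float, clock: Callable[[], float] = time.monotonic):
        self.rate = rate
        self.capacity = capacity
        self.tokens = capacity
        self._clock = clock
        self._last = clock()

    def _refill(self) -> None:
        now = self._clock()
        self.tokens = min(self.capacity, self.tokens + (now - self._last) * self.rate)
        self._last = now

    def try_acquire(self) -> bool:
        """Consume one token if available; False means the caller should wait before sending."""
        self._refill()
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False

    def on_throttled(self, min_rate: float = 0.5) -> None:
        """Slow down after a throttling response (assumed halving policy with a floor)."""
        self.rate = max(min_rate, self.rate / 2)


# Usage with a fake clock so the behavior is deterministic
now = [0.0]
bucket = TokenBucket(rate=2.0, capacity=2.0, clock=lambda: now[0])
print([bucket.try_acquire() for _ in range(3)])  # [True, True, False]
now[0] += 0.5  # half a second refills one token at 2 tokens/s
print(bucket.try_acquire())  # True
bucket.on_throttled()
print(bucket.rate)  # 1.0
```

Injecting the clock keeps the example deterministic. With the default time.monotonic, the bucket tracks real elapsed time.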
Part 4: Complete Retry Configuration Examples#
Example 1: High-Reliability Configuration#
For critical applications that need maximum retry attempts:
# High-reliability configuration
llm_config_high_reliability = LLMConfig(
config_list={
"api_type": "bedrock",
"model": "qwen.qwen3-coder-480b-a35b-v1:0",
"aws_region": os.getenv("AWS_REGION", "eu-north-1"),
"aws_access_key": os.getenv("AWS_ACCESS_KEY"),
"aws_secret_key": os.getenv("AWS_SECRET_ACCESS_KEY"),
"total_max_attempts": 10, # More retries for reliability
"mode": "adaptive", # Standard retry rules plus client-side rate limiting
},
)
print("High-reliability configuration created!")
Example 2: Fast-Fail Configuration#
For applications that need quick failure detection:
# Fast-fail configuration
llm_config_fast_fail = LLMConfig(
config_list={
"api_type": "bedrock",
"model": "qwen.qwen3-coder-480b-a35b-v1:0",
"aws_region": os.getenv("AWS_REGION", "eu-north-1"),
"aws_access_key": os.getenv("AWS_ACCESS_KEY"),
"aws_secret_key": os.getenv("AWS_SECRET_ACCESS_KEY"),
"total_max_attempts": 2, # 1 initial attempt + 1 retry for fast failure
"mode": "standard",
},
)
print("Fast-fail configuration created!")
Example 3: Rate-Limit Optimized Configuration#
For handling rate limits and throttling:
# Rate-limit optimized configuration
llm_config_rate_limit = LLMConfig(
config_list={
"api_type": "bedrock",
"model": "qwen.qwen3-coder-480b-a35b-v1:0",
"aws_region": os.getenv("AWS_REGION", "eu-north-1"),
"aws_access_key": os.getenv("AWS_ACCESS_KEY"),
"aws_secret_key": os.getenv("AWS_SECRET_ACCESS_KEY"),
"total_max_attempts": 8,
"mode": "adaptive", # Best for rate limit handling
},
)
print("Rate-limit optimized configuration created!")
Part 5: Creating Agents with Retry Configuration#
Create agents with different retry configurations:
# Agent with adaptive retry mode
agent_adaptive = ConversableAgent(
name="adaptive_agent",
llm_config=llm_config_adaptive,
system_message="You are a helpful assistant.",
max_consecutive_auto_reply=1,
human_input_mode="NEVER",
)
print(f"Agent '{agent_adaptive.name}' created with adaptive retry mode!")
# Agent with high-reliability configuration
agent_reliable = ConversableAgent(
name="reliable_agent",
llm_config=llm_config_high_reliability,
system_message="You are a reliable assistant that handles errors gracefully.",
max_consecutive_auto_reply=1,
human_input_mode="NEVER",
)
print(f"Agent '{agent_reliable.name}' created with high-reliability retry config!")
Part 6: Testing Retry Behavior#
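Run a request through the adaptive agent. A successful call does not exercise the retry path. Retries happen only when Bedrock returns a retryable error, such as a throttling response or a transient server error: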
# Test with adaptive retry mode
print("=== Testing Adaptive Retry Mode ===")
result = agent_adaptive.run(
message="What is 2 + 2?",
max_turns=1,
).process()
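A successful call like the one above never reaches the retry path. The sketch below simulates the retry loop against a hypothetical flaky function, so the behavior is visible without provoking real Bedrock errors. The ThrottledError exception, the call_with_retries and make_flaky helpers, and the injected sleep are assumptions for illustration. botocore runs the equivalent loop internally.

```python
import random
from typing import Callable, List, Optional, TypeVar

T = TypeVar("T")


class ThrottledError(Exception):
    """Stand-in for a retryable error such as a throttling response."""


def call_with_retries(
    fn: Callable[[], T],
    total_max_attempts: int,
    sleep: Callable[[float], None],
    rng: Optional[random.Random] = None,
) -> T:
    """Call fn, retrying on ThrottledError until total_max_attempts attempts have been made."""
    if total_max_attempts < 1:
        raise ValueError("total_max_attempts must be >= 1")
    if rng is None:
        rng = random.Random()
    for attempt in range(1, total_max_attempts + 1):
        try:
            return fn()
        except ThrottledError:
            if attempt == total_max_attempts:
                raise  # attempts exhausted: surface the last error to the caller
            # Jittered exponential backoff capped at 20 s (assumed, modeled on standard mode)
            sleep(rng.random() * min(2 ** (attempt - 1), 20))
    raise AssertionError("unreachable")


def make_flaky(failures: int) -> Callable[[], str]:
    """Return a hypothetical callable that raises ThrottledError `failures` times, then succeeds."""
    calls = {"n": 0}

    def fn() -> str:
        calls["n"] += 1
        if calls["n"] <= failures:
            raise ThrottledError(f"throttled on call {calls['n']}")
        return "ok"

    return fn


sleeps: List[float] = []  # record sleeps instead of actually waiting
print(call_with_retries(make_flaky(2), total_max_attempts=5, sleep=sleeps.append))  # ok
print(len(sleeps))  # 2: one backoff before each retry
```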
Part 7: Inspecting Retry Configuration#
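Instantiate BedrockClient directly to inspect the retry settings it stores. The underscore-prefixed attributes below are internal to AG2 and may change between versions: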
from autogen.oai.bedrock import BedrockClient
# Create a client to inspect retry config
client = BedrockClient(
aws_region=os.getenv("AWS_REGION", "us-east-1"),
aws_access_key=os.getenv("AWS_ACCESS_KEY"),
aws_secret_key=os.getenv("AWS_SECRET_ACCESS_KEY"),
total_max_attempts=7,
max_attempts=3,
mode="adaptive",
)
print("Retry Configuration:")
print(f" - total_max_attempts: {client._total_max_attempts}")
print(f" - max_attempts: {client._max_attempts}")
print(f" - mode: {client._mode}")
print(f" - retry_config dict: {client._retry_config}")
# Note: when both total_max_attempts and max_attempts are provided,
# botocore uses total_max_attempts and ignores max_attempts
Part 8: Environment Variable Configuration#
You can also configure retries through environment variables that botocore reads, such as AWS_MAX_ATTEMPTS (total attempts) and AWS_RETRY_MODE (retry mode):
# Set environment variables for retry configuration
# Note: These are boto3/botocore environment variables
os.environ["AWS_MAX_ATTEMPTS"] = "10" # Maps to total_max_attempts
print("Environment variable configured:")
print(f" AWS_MAX_ATTEMPTS: {os.environ.get('AWS_MAX_ATTEMPTS')}")
# When using environment variables, you don't need to specify
# total_max_attempts in the config_list
llm_config_env = LLMConfig(
config_list={
"api_type": "bedrock",
"model": "qwen.qwen3-coder-480b-a35b-v1:0",
"aws_region": os.getenv("AWS_REGION", "eu-north-1"),
"aws_access_key": os.getenv("AWS_ACCESS_KEY"),
"aws_secret_key": os.getenv("AWS_SECRET_ACCESS_KEY"),
# total_max_attempts omitted: botocore falls back to AWS_MAX_ATTEMPTS only if the client passes no attempt count of its own
"mode": "adaptive",
},
)
print("Configuration using environment variable created!")
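botocore uses AWS_MAX_ATTEMPTS only as a fallback. According to the boto3 configuration guide, values passed in a Config object take precedence over environment variables. The sketch below models that lookup order, and resolve_total_attempts is a hypothetical helper. If AG2's Bedrock client passes its own default attempt count to boto3, that count may shadow the environment variable. Check the effective value with the inspection code from Part 7.

```python
import os
from typing import Mapping, Optional


def resolve_total_attempts(
    retries: Optional[Mapping[str, int]] = None,
    environ: Mapping[str, str] = os.environ,
    default_total: int = 3,
) -> int:
    """Model of the lookup order: explicit retry config, then AWS_MAX_ATTEMPTS, then a default."""
    retries = retries or {}
    if "total_max_attempts" in retries:
        return retries["total_max_attempts"]
    if "max_attempts" in retries:
        return retries["max_attempts"] + 1  # retries only, plus the initial request
    if "AWS_MAX_ATTEMPTS" in environ:
        return int(environ["AWS_MAX_ATTEMPTS"])  # the env var already counts total attempts
    return default_total


env = {"AWS_MAX_ATTEMPTS": "10"}
print(resolve_total_attempts({}, env))  # 10: falls back to the environment variable
print(resolve_total_attempts({"total_max_attempts": 5}, env))  # 5: explicit config wins
```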
Part 9: Best Practices#
1. Choose the Right Retry Mode#
- Use standard for most applications (default, well-tested)
- Use adaptive when dealing with rate limits or high-throughput scenarios
- Use legacy only for backward compatibility
2. Set Appropriate Total Max Attempts#
- Low (2-3): For fast-fail scenarios or when errors are likely permanent
- Medium (5-7): For most applications (good balance)
- High (10+): For critical applications or when dealing with unreliable networks
3. Prefer total_max_attempts over max_attempts#
- total_max_attempts is the preferred parameter
- It maps to the AWS_MAX_ATTEMPTS environment variable and the max_attempts setting in the AWS config file
- It's more intuitive (total attempts rather than retry attempts)
4. Combine with Timeout Configuration#
# Combine retry config with timeout
llm_config_with_timeout = LLMConfig(
config_list={
"api_type": "bedrock",
"model": "qwen.qwen3-coder-480b-a35b-v1:0",
"aws_region": os.getenv("AWS_REGION", "eu-north-1"),
"aws_access_key": os.getenv("AWS_ACCESS_KEY"),
"aws_secret_key": os.getenv("AWS_SECRET_ACCESS_KEY"),
"total_max_attempts": 5,
"mode": "adaptive",
"timeout": 60, # 60 seconds timeout per request
},
)
print("Configuration with timeout created!")
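Each attempt can run up to the timeout, so retries multiply the worst-case latency. A rough upper bound is attempts × timeout plus the maximum backoff sleeps between attempts. This sketch makes two assumptions that you should check against your AG2 and botocore versions. First, the timeout applies per attempt. Second, the backoff cap is 2^(n-1) seconds up to 20 seconds, as in standard mode. worst_case_seconds is a hypothetical helper.

```python
def worst_case_seconds(total_max_attempts: int, timeout: float, max_backoff: float = 20.0) -> float:
    """Upper bound on wall-clock time if every attempt times out and every sleep hits its cap."""
    if total_max_attempts < 1:
        raise ValueError("total_max_attempts must be >= 1")
    attempts_time = total_max_attempts * timeout
    # One sleep between consecutive attempts, capped exponential: 1, 2, 4, ... up to max_backoff
    sleeps = sum(min(2 ** (n - 1), max_backoff) for n in range(1, total_max_attempts))
    return float(attempts_time + sleeps)


print(worst_case_seconds(5, 60))  # 315.0 = 5 * 60 + (1 + 2 + 4 + 8)
print(worst_case_seconds(10, 60))  # 711.0 = 10 * 60 + (1 + 2 + 4 + 8 + 16 + 4 * 20)
```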
Part 10: Error Handling with Retries#
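Retries help only with transient failures such as throttling, timeouts, and temporary service errors. Errors that would fail on every attempt, such as invalid credentials or a malformed request, are raised without retrying. Wrap calls in try/except to handle any error that remains after the retries: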
# Agent with comprehensive retry configuration
agent_with_retries = ConversableAgent(
name="retry_agent",
llm_config=llm_config_adaptive,
system_message="You are a helpful assistant.",
max_consecutive_auto_reply=1,
human_input_mode="NEVER",
)
def test_with_error_handling():
    """Run a single turn and report any error that persists after retries."""
    try:
        result = agent_with_retries.run(
            message="Hello, how are you?",
            max_turns=1,
        ).process()
        return result
    except Exception as e:
        print(f"Error after retries: {type(e).__name__}: {e}")
        # Retryable errors (e.g. throttling) reach this point only after botocore has
        # used up all configured attempts; non-retryable errors are raised without retrying
        raise
# Test the error handling
print("=== Testing Error Handling ===")
result = test_with_error_handling()
print("Test completed!")
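A retry policy has to decide whether another attempt might succeed. The sketch below shows that classification. The code set is a small illustrative subset chosen as an assumption, since botocore's own retryable list is longer. is_retryable is a hypothetical helper. With a real botocore ClientError, you would read the code from e.response["Error"]["Code"] and the status from e.response["ResponseMetadata"]["HTTPStatusCode"].

```python
from typing import Optional

# Illustrative subset (assumption): throttling and transient codes that a retry policy typically retries
RETRYABLE_CODES = {"ThrottlingException", "ServiceUnavailableException", "RequestTimeout"}
RETRYABLE_STATUS = {500, 502, 503, 504}


def is_retryable(error_code: str, http_status: Optional[int] = None) -> bool:
    """Return True if another attempt might succeed."""
    if error_code in RETRYABLE_CODES:
        return True
    return http_status in RETRYABLE_STATUS


print(is_retryable("ThrottlingException", 429))  # True: rate limited, back off and retry
print(is_retryable("ValidationException", 400))  # False: the request itself is wrong
print(is_retryable("AccessDeniedException", 403))  # False: fix credentials or permissions
```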
Part 11: Comparison Table#
| Configuration | total_max_attempts | mode | Use Case |
|---|---|---|---|
| Default | 5 | standard | General purpose |
| High Reliability | 10 | adaptive | Critical applications |
| Fast Fail | 2 | standard | Quick failure detection |
| Rate Limit Optimized | 8 | adaptive | High-throughput scenarios |
| Legacy | 5 | legacy | Backward compatibility |
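The comparison table can also live in code, so each preset is defined once and merged into config_list entries. RETRY_PRESETS and retry_settings are hypothetical names, and the values copy the table rows.

```python
from typing import Dict, Union

# Presets mirroring the comparison table (hypothetical names)
RETRY_PRESETS: Dict[str, Dict[str, Union[int, str]]] = {
    "default": {"total_max_attempts": 5, "mode": "standard"},
    "high_reliability": {"total_max_attempts": 10, "mode": "adaptive"},
    "fast_fail": {"total_max_attempts": 2, "mode": "standard"},
    "rate_limit_optimized": {"total_max_attempts": 8, "mode": "adaptive"},
    "legacy": {"total_max_attempts": 5, "mode": "legacy"},
}


def retry_settings(preset: str) -> Dict[str, Union[int, str]]:
    """Return a copy of a preset's retry keys, ready to merge into a config_list entry."""
    if preset not in RETRY_PRESETS:
        raise ValueError(f"unknown preset {preset!r}; choose from {sorted(RETRY_PRESETS)}")
    return dict(RETRY_PRESETS[preset])


entry = {"api_type": "bedrock", "model": "qwen.qwen3-coder-480b-a35b-v1:0", **retry_settings("fast_fail")}
print(entry["total_max_attempts"], entry["mode"])  # 2 standard
```

Returning a copy means that changes to one entry cannot alter the shared preset.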
Part 12: Advanced: Custom Retry Configuration per Agent#
Create multiple agents with different retry configurations:
# Agent 1: Fast responses (fewer retries)
fast_agent = ConversableAgent(
name="fast_agent",
llm_config=LLMConfig(
config_list={
"api_type": "bedrock",
"model": "qwen.qwen3-coder-480b-a35b-v1:0",
"aws_region": os.getenv("AWS_REGION", "eu-north-1"),
"aws_access_key": os.getenv("AWS_ACCESS_KEY"),
"aws_secret_key": os.getenv("AWS_SECRET_ACCESS_KEY"),
"total_max_attempts": 3,
"mode": "standard",
},
),
system_message="You provide quick responses.",
max_consecutive_auto_reply=1,
)
# Agent 2: Reliable responses (more retries)
reliable_agent = ConversableAgent(
name="reliable_agent",
llm_config=LLMConfig(
config_list={
"api_type": "bedrock",
"model": "qwen.qwen3-coder-480b-a35b-v1:0",
"aws_region": os.getenv("AWS_REGION", "eu-north-1"),
"aws_access_key": os.getenv("AWS_ACCESS_KEY"),
"aws_secret_key": os.getenv("AWS_SECRET_ACCESS_KEY"),
"total_max_attempts": 10,
"mode": "adaptive",
},
),
system_message="You provide reliable responses with retries.",
max_consecutive_auto_reply=1,
)
print("Multiple agents created with different retry configurations!")
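Part 12 repeats the full config_list entry for each agent. A small factory leaves the retry settings as the only part that varies. bedrock_entry and VALID_MODES are hypothetical names. The validation rules follow from the parameter descriptions above: mode must be one of the three boto3 modes, and total attempts must be at least 1.

```python
import os
from typing import Any, Dict, Mapping, Optional

VALID_MODES = {"legacy", "standard", "adaptive"}


def bedrock_entry(
    total_max_attempts: int,
    mode: str,
    model: str = "qwen.qwen3-coder-480b-a35b-v1:0",
    environ: Optional[Mapping[str, str]] = None,
) -> Dict[str, Any]:
    """Build one Bedrock config_list entry with the given retry settings."""
    if mode not in VALID_MODES:
        raise ValueError(f"mode must be one of {sorted(VALID_MODES)}, got {mode!r}")
    if total_max_attempts < 1:
        raise ValueError("total_max_attempts must be >= 1 (it includes the initial request)")
    env = os.environ if environ is None else environ
    return {
        "api_type": "bedrock",
        "model": model,
        "aws_region": env.get("AWS_REGION", "eu-north-1"),
        "aws_access_key": env.get("AWS_ACCESS_KEY"),
        "aws_secret_key": env.get("AWS_SECRET_ACCESS_KEY"),
        "total_max_attempts": total_max_attempts,
        "mode": mode,
    }


fast_entry = bedrock_entry(3, "standard", environ={})
print(fast_entry["aws_region"], fast_entry["total_max_attempts"], fast_entry["mode"])  # eu-north-1 3 standard
```

An agent would then take llm_config=LLMConfig(config_list=bedrock_entry(10, "adaptive")), following the same LLMConfig call style as the other examples in this notebook.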
Summary#
In this notebook, we’ve learned:
- ✅ How to configure retry behavior for Bedrock API calls
- ✅ Understanding the total_max_attempts, max_attempts, and mode parameters
- ✅ Different retry modes: legacy, standard, and adaptive
- ✅ How to combine retry config with timeout settings
- ✅ Error handling strategies with retries
- ✅ Environment variable configuration options
Key Takeaways#
- Use total_max_attempts (preferred over max_attempts)
- Use adaptive mode for rate limit handling
- Use standard mode for general-purpose applications
- Set appropriate attempt counts based on your reliability needs
- Combine with timeout for better control
Next Steps#
- Experiment with different retry configurations for your use case
- Monitor retry behavior in production
- Adjust retry settings based on error patterns
- Consider using adaptive mode for high-throughput scenarios