Exponential Backoff and Retry Configuration with Amazon Bedrock in AG2#

Author: Priyanshu Deshmukh

This notebook demonstrates how to configure exponential backoff and retry behavior for Amazon Bedrock API calls in AG2. Proper retry configuration helps handle transient errors, rate limits, and network issues gracefully.

What are Retry Configurations?#

Retry configurations enable you to:
- Handle transient errors: Automatically retry failed requests due to temporary network issues
- Manage rate limits: Use exponential backoff to respect API rate limits
- Improve reliability: Ensure your applications are resilient to temporary failures
- Control retry behavior: Fine-tune how many retries and what strategy to use

How Bedrock Implements Retries#

AG2’s Bedrock client uses boto3’s retry configuration system, which supports:

Total Max Attempts: Maximum number of total attempts (initial + retries)
Max Attempts: Legacy parameter for maximum retry attempts
Retry Modes: Different strategies for handling retries
- legacy: Pre-existing retry behavior
- standard: Standardized retry rules (defaults to 3 max attempts)
- adaptive: Retries with additional client-side throttling
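
The retry modes above all space out retries with exponential backoff. As a rough illustration of the idea only (not botocore's actual implementation), the sketch below computes the wait before each retry. The function name `backoff_delays` and the `base` and `cap` values are hypothetical choices made for this example.

```python
import random


def backoff_delays(total_max_attempts, base=1.0, cap=20.0, rng=random.random):
    """Return the wait (in seconds) before each retry.

    With N total attempts there are N - 1 retries, hence N - 1 waits.
    `base` and `cap` are illustrative values, not library defaults.
    """
    delays = []
    for retry in range(total_max_attempts - 1):
        ceiling = min(cap, base * 2 ** retry)  # doubles each retry, up to the cap
        delays.append(rng() * ceiling)         # jitter: a random fraction of the ceiling
    return delays


# Disabling jitter (rng always returns 1.0) exposes the ceilings themselves.
print(backoff_delays(6, rng=lambda: 1.0))  # [1.0, 2.0, 4.0, 8.0, 16.0]
```

The jitter spreads retries from many clients over time, so they are less likely to hit the service again at the same instant.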

Requirements#

Python >= 3.10
AG2 installed with bedrock extra: pip install "ag2[bedrock]"
AWS credentials configured (via environment variables, IAM role, or AWS credentials file)

Retry Configuration Parameters#

Key Parameters#

total_max_attempts (int): Maximum number of total attempts (initial + retries)
- Preferred over max_attempts
- Maps to the AWS_MAX_ATTEMPTS environment variable
- Example: 5 means 1 initial attempt + 4 retries = 5 total attempts
max_attempts (int): Maximum number of retry attempts (legacy)
- Example: 2 means 2 retries after the initial request
- 0 means no retries
- In botocore, falls back to the service model's value (typically 4 retries) if not specified
mode (str): Retry strategy mode
- "legacy": Pre-existing retry behavior
- "standard": Standardized retry rules (defaults to 3 max attempts)
- "adaptive": Retries with client-side throttling (best for rate limits)

Important Notes#

If both total_max_attempts and max_attempts are provided, total_max_attempts takes precedence
total_max_attempts is preferred because it aligns with AWS environment variables
adaptive mode is recommended for handling rate limits and throttling
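
The precedence rule above can be sketched as a small helper. This is a minimal illustration of the documented behavior, not code from AG2 or botocore; the function name `effective_total_attempts` is hypothetical.

```python
def effective_total_attempts(total_max_attempts=None, max_attempts=None):
    """Resolve the two parameters into a single total-attempt count.

    Returns None when neither is set, leaving the choice to the library default.
    """
    if total_max_attempts is not None:
        return total_max_attempts   # takes precedence when both are given
    if max_attempts is not None:
        return max_attempts + 1     # retries plus the initial request
    return None


print(effective_total_attempts(total_max_attempts=7, max_attempts=3))  # 7
print(effective_total_attempts(max_attempts=2))                        # 3
print(effective_total_attempts(max_attempts=0))                        # 1
```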

Installation#

Install required packages if not already installed:

%pip install ag2[bedrock] --upgrade

Setup: Import Libraries and Configure AWS Credentials#

import os

from dotenv import load_dotenv

from autogen import ConversableAgent, LLMConfig

load_dotenv()

print("Libraries imported successfully!")

Part 1: Basic Retry Configuration#

Let’s start with a simple configuration using default retry settings:

# Basic configuration with default retry settings
llm_config_default = LLMConfig(
    config_list={
        "api_type": "bedrock",
        "model": "qwen.qwen3-coder-480b-a35b-v1:0",
        "aws_region": os.getenv("AWS_REGION", "eu-north-1"),
        "aws_access_key": os.getenv("AWS_ACCESS_KEY"),
        "aws_secret_key": os.getenv("AWS_SECRET_ACCESS_KEY"),
        "aws_profile_name": os.getenv("AWS_PROFILE"),
        # Default retry: total_max_attempts=5, max_attempts=5, mode="standard"
    },
)

print("Default retry configuration created!")
print("Default settings:")
print("  - total_max_attempts: 5")
print("  - max_attempts: 5")
print("  - mode: standard")

Part 2: Custom Retry Configuration - Total Max Attempts#

Configure the total number of attempts (initial + retries):

# Configuration with custom total_max_attempts
llm_config_custom_attempts = LLMConfig(
    config_list={
        "api_type": "bedrock",
        "model": "qwen.qwen3-coder-480b-a35b-v1:0",
        "aws_region": os.getenv("AWS_REGION", "eu-north-1"),
        "aws_access_key": os.getenv("AWS_ACCESS_KEY"),
        "aws_secret_key": os.getenv("AWS_SECRET_ACCESS_KEY"),
        "total_max_attempts": 10,  # 1 initial + 9 retries = 10 total attempts
        "mode": "standard",
    },
)

print("Custom retry configuration created!")
print("Settings:")
print("  - total_max_attempts: 10 (1 initial + 9 retries)")
print("  - mode: standard")

Part 3: Retry Modes Comparison#

Mode 1: Legacy Mode#

Uses the pre-existing retry behavior:

# Define structured output model for math problem solving
# (used as response_format in the adaptive-mode configuration)
from pydantic import BaseModel

class Step(BaseModel):
    """Represents a single step in solving a math problem."""

    explanation: str  # What operation or reasoning is being performed
    output: str  # The result of this step

class MathReasoning(BaseModel):
    """Complete structured response for a math problem solution."""

    steps: list[Step]  # List of all steps taken
    final_answer: str  # The final answer

    def format(self) -> str:
        """Format the structured output for human-readable display."""
        steps_output = "\n".join(
            f"Step {i + 1}: {step.explanation}\n  Output: {step.output}" for i, step in enumerate(self.steps)
        )
        return f"{steps_output}\n\nFinal Answer: {self.final_answer}"

# Legacy retry mode
llm_config_legacy = LLMConfig(
    config_list={
        "api_type": "bedrock",
        "model": "qwen.qwen3-coder-480b-a35b-v1:0",
        "aws_region": os.getenv("AWS_REGION", "eu-north-1"),
        "aws_access_key": os.getenv("AWS_ACCESS_KEY"),
        "aws_secret_key": os.getenv("AWS_SECRET_ACCESS_KEY"),
        "total_max_attempts": 5,
        "mode": "legacy",  # Pre-existing retry behavior
    },
)

print("Legacy mode configuration created!")

Mode 2: Standard Mode (Recommended)#

Standardized retry rules (3 max attempts by default; this example sets total_max_attempts to 5):

# Standard retry mode (default)
llm_config_standard = LLMConfig(
    config_list={
        "api_type": "bedrock",
        "model": "qwen.qwen3-coder-480b-a35b-v1:0",
        "aws_region": os.getenv("AWS_REGION", "eu-north-1"),
        "aws_access_key": os.getenv("AWS_ACCESS_KEY"),
        "aws_secret_key": os.getenv("AWS_SECRET_ACCESS_KEY"),
        "total_max_attempts": 5,
        "mode": "standard",  # Standardized retry rules
    },
)

print("Standard mode configuration created!")

Mode 3: Adaptive Mode (Best for Rate Limits)#

Retries with additional client-side throttling:

# Adaptive retry mode (best for handling rate limits)
llm_config_adaptive = LLMConfig(
    config_list={
        "api_type": "bedrock",
        "model": "qwen.qwen3-coder-480b-a35b-v1:0",
        "aws_region": os.getenv("AWS_REGION", "eu-north-1"),
        "aws_access_key": os.getenv("AWS_ACCESS_KEY"),
        "aws_secret_key": os.getenv("AWS_SECRET_ACCESS_KEY"),
        "total_max_attempts": 8,
        "mode": "adaptive",  # Retries with client-side throttling
        "response_format": MathReasoning,
    },
)

print("Adaptive mode configuration created!")
print("Adaptive mode is recommended for:")
print("  - Handling rate limits")
print("  - Managing throttling")
print("  - High-throughput scenarios")
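
To give an intuition for what "client-side throttling" means, here is a generic token-bucket sketch. It is an assumption-laden illustration of the general technique, not botocore's adaptive algorithm, which is more involved. The class name `TokenBucket`, its methods, and the halving rule in `on_throttled` are all hypothetical.

```python
class TokenBucket:
    """Allow roughly `rate` requests per second, with bursts up to `capacity`."""

    def __init__(self, rate, capacity):
        self.rate = rate
        self.capacity = capacity
        self.tokens = capacity  # start with a full bucket
        self.last = 0.0         # time of the previous refill, in seconds

    def try_acquire(self, now):
        """Return True if a request may be sent at time `now` (seconds)."""
        elapsed = now - self.last
        self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False

    def on_throttled(self):
        """Halve the send rate after the service reports throttling."""
        self.rate = max(0.1, self.rate / 2)


bucket = TokenBucket(rate=2.0, capacity=2)
print([bucket.try_acquire(now=0.0) for _ in range(3)])  # [True, True, False]
print(bucket.try_acquire(now=0.5))                       # True
```

The clock is passed in explicitly so the behavior is deterministic; a real limiter would read a monotonic clock and wait instead of returning False.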

Part 4: Complete Retry Configuration Examples#

Example 1: High-Reliability Configuration#

For critical applications that need maximum retry attempts:

# High-reliability configuration
llm_config_high_reliability = LLMConfig(
    config_list={
        "api_type": "bedrock",
        "model": "qwen.qwen3-coder-480b-a35b-v1:0",
        "aws_region": os.getenv("AWS_REGION", "eu-north-1"),
        "aws_access_key": os.getenv("AWS_ACCESS_KEY"),
        "aws_secret_key": os.getenv("AWS_SECRET_ACCESS_KEY"),
        "total_max_attempts": 10,  # More retries for reliability
        "mode": "adaptive",  # Best for handling various error types
    },
)

print("High-reliability configuration created!")

Example 2: Fast-Fail Configuration#

For applications that need quick failure detection:

# Fast-fail configuration
llm_config_fast_fail = LLMConfig(
    config_list={
        "api_type": "bedrock",
        "model": "qwen.qwen3-coder-480b-a35b-v1:0",
        "aws_region": os.getenv("AWS_REGION", "eu-north-1"),
        "aws_access_key": os.getenv("AWS_ACCESS_KEY"),
        "aws_secret_key": os.getenv("AWS_SECRET_ACCESS_KEY"),
        "total_max_attempts": 2,  # Minimal retries for fast failure
        "mode": "standard",
    },
)

print("Fast-fail configuration created!")

Example 3: Rate-Limit Optimized Configuration#

For handling rate limits and throttling:

# Rate-limit optimized configuration
llm_config_rate_limit = LLMConfig(
    config_list={
        "api_type": "bedrock",
        "model": "qwen.qwen3-coder-480b-a35b-v1:0",
        "aws_region": os.getenv("AWS_REGION", "eu-north-1"),
        "aws_access_key": os.getenv("AWS_ACCESS_KEY"),
        "aws_secret_key": os.getenv("AWS_SECRET_ACCESS_KEY"),
        "total_max_attempts": 8,
        "mode": "adaptive",  # Best for rate limit handling
    },
)

print("Rate-limit optimized configuration created!")

Part 5: Creating Agents with Retry Configuration#

Create agents with different retry configurations:

# Agent with adaptive retry mode
agent_adaptive = ConversableAgent(
    name="adaptive_agent",
    llm_config=llm_config_adaptive,
    system_message="You are a helpful assistant.",
    max_consecutive_auto_reply=1,
    human_input_mode="NEVER",
)

print(f"Agent '{agent_adaptive.name}' created with adaptive retry mode!")

# Agent with high-reliability configuration
agent_reliable = ConversableAgent(
    name="reliable_agent",
    llm_config=llm_config_high_reliability,
    system_message="You are a reliable assistant that handles errors gracefully.",
    max_consecutive_auto_reply=1,
    human_input_mode="NEVER",
)

print(f"Agent '{agent_reliable.name}' created with high-reliability retry config!")

Part 6: Testing Retry Behavior#

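Run a request through the adaptive-mode agent. Retries happen inside the Bedrock client, so a successful call looks the same whether or not a retry occurred: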

# Test with adaptive retry mode
print("=== Testing Adaptive Retry Mode ===")

# run() returns a response object; process() prints its events to the console
response = agent_adaptive.run(
    message="What is 2 + 2?",
    max_turns=1,
)
response.process()

Part 7: Inspecting Retry Configuration#

Inspect the actual retry configuration used by the client:

from autogen.oai.bedrock import BedrockClient

# Create a client to inspect retry config
client = BedrockClient(
    aws_region=os.getenv("AWS_REGION", "eu-north-1"),
    aws_access_key=os.getenv("AWS_ACCESS_KEY"),
    aws_secret_key=os.getenv("AWS_SECRET_ACCESS_KEY"),
    total_max_attempts=7,
    max_attempts=3,
    mode="adaptive",
)

print("Retry Configuration:")
print(f"  - total_max_attempts: {client._total_max_attempts}")
print(f"  - max_attempts: {client._max_attempts}")
print(f"  - mode: {client._mode}")
print(f"  - retry_config dict: {client._retry_config}")

# Note: When both total_max_attempts and max_attempts are provided,
# total_max_attempts takes precedence and max_attempts is ignored

Part 8: Environment Variable Configuration#

You can also configure retries via environment variables:

# Set environment variables for retry configuration
# Note: These are boto3/botocore environment variables
os.environ["AWS_MAX_ATTEMPTS"] = "10"  # Maps to total_max_attempts

print("Environment variable configured:")
print(f"  AWS_MAX_ATTEMPTS: {os.environ.get('AWS_MAX_ATTEMPTS')}")

# When using environment variables, you don't need to specify
# total_max_attempts in the config_list
llm_config_env = LLMConfig(
    config_list={
        "api_type": "bedrock",
        "model": "qwen.qwen3-coder-480b-a35b-v1:0",
        "aws_region": os.getenv("AWS_REGION", "eu-north-1"),
        "aws_access_key": os.getenv("AWS_ACCESS_KEY"),
        "aws_secret_key": os.getenv("AWS_SECRET_ACCESS_KEY"),
        # total_max_attempts will be read from AWS_MAX_ATTEMPTS env var
        "mode": "adaptive",
    },
)

print("Configuration using environment variable created!")
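
botocore reads AWS_MAX_ATTEMPTS and AWS_RETRY_MODE from the environment, and values set explicitly in the client configuration take precedence over them. The sketch below models that precedence with a hypothetical helper, `resolve_retry_settings`. Whether AG2 leaves total_max_attempts unset so that the environment variable applies is an assumption worth checking with the inspection approach from Part 7.

```python
import os

VALID_MODES = {"legacy", "standard", "adaptive"}


def resolve_retry_settings(total_max_attempts=None, mode=None, env=None):
    """Explicit arguments win; otherwise fall back to environment variables."""
    env = os.environ if env is None else env
    if total_max_attempts is None and "AWS_MAX_ATTEMPTS" in env:
        total_max_attempts = int(env["AWS_MAX_ATTEMPTS"])
    if mode is None:
        mode = env.get("AWS_RETRY_MODE")
    if mode is not None and mode not in VALID_MODES:
        raise ValueError(f"unknown retry mode: {mode!r}")
    return {"total_max_attempts": total_max_attempts, "mode": mode}


# A plain dict stands in for os.environ so the example has no side effects.
fake_env = {"AWS_MAX_ATTEMPTS": "10", "AWS_RETRY_MODE": "standard"}
print(resolve_retry_settings(env=fake_env))
# {'total_max_attempts': 10, 'mode': 'standard'}
print(resolve_retry_settings(total_max_attempts=3, mode="adaptive", env=fake_env))
# {'total_max_attempts': 3, 'mode': 'adaptive'}
```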

Part 9: Best Practices#

1. Choose the Right Retry Mode#

Use standard for most applications (default, well-tested)
Use adaptive when dealing with rate limits or high-throughput scenarios
Use legacy only for backward compatibility

2. Set Appropriate Total Max Attempts#

Low (2-3): For fast-fail scenarios or when errors are likely permanent
Medium (5-7): For most applications (good balance)
High (10+): For critical applications or when dealing with unreliable networks
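
One way to reason about these ranges is to estimate how likely it is that every attempt fails. The sketch below assumes each attempt fails independently with the same probability, which is a simplification (real outages are often correlated). The function names are hypothetical.

```python
import math


def failure_probability(p_fail, total_max_attempts):
    """Probability that all attempts fail, assuming independent attempts."""
    return p_fail ** total_max_attempts


def attempts_needed(p_fail, target):
    """Smallest attempt count whose overall failure probability is <= target."""
    if not (0 < p_fail < 1 and 0 < target < 1):
        raise ValueError("p_fail and target must be strictly between 0 and 1")
    return math.ceil(math.log(target) / math.log(p_fail))


# If 20% of calls fail transiently, how many attempts keep overall failures
# at or below 0.1%?
print(attempts_needed(0.2, 0.001))  # 5
```

Under these assumptions, raising total_max_attempts beyond the computed value mostly adds latency in the failure case without a meaningful gain in reliability.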

3. Prefer `total_max_attempts` over `max_attempts`#

total_max_attempts is the preferred parameter
It aligns with AWS environment variables
It’s more intuitive (total attempts vs retry attempts)

4. Combine with Timeout Configuration#

# Combine retry config with timeout
llm_config_with_timeout = LLMConfig(
    config_list={
        "api_type": "bedrock",
        "model": "qwen.qwen3-coder-480b-a35b-v1:0",
        "aws_region": os.getenv("AWS_REGION", "eu-north-1"),
        "aws_access_key": os.getenv("AWS_ACCESS_KEY"),
        "aws_secret_key": os.getenv("AWS_SECRET_ACCESS_KEY"),
        "total_max_attempts": 5,
        "mode": "adaptive",
        "timeout": 60,  # 60 seconds timeout per request
    },
)

print("Configuration with timeout created!")
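
Retries and timeouts multiply, so it can help to estimate the worst-case wait before a call finally fails. The sketch below assumes every attempt runs into the per-request timeout and that backoff waits double from a hypothetical `base` up to a hypothetical `cap`. These backoff values are illustrative, not library defaults.

```python
def worst_case_seconds(total_max_attempts, timeout, base=1.0, cap=20.0):
    """Upper bound on wall-clock time if every attempt times out.

    `base` and `cap` describe an assumed backoff schedule (no jitter).
    """
    waits = sum(min(cap, base * 2 ** retry) for retry in range(total_max_attempts - 1))
    return total_max_attempts * timeout + waits


# 5 attempts x 60 s timeout, plus waits of 1 + 2 + 4 + 8 seconds between them
print(worst_case_seconds(5, 60))  # 315.0
```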

Part 10: Error Handling with Retries#

Retries handle transient errors automatically, but a request can still fail once all attempts are used up. Wrap the call so the final error is reported:

# Agent with comprehensive retry configuration
agent_with_retries = ConversableAgent(
    name="retry_agent",
    llm_config=llm_config_adaptive,
    system_message="You are a helpful assistant.",
    max_consecutive_auto_reply=1,
    human_input_mode="NEVER",
)

def test_with_error_handling():
    """Run one request and report any error that survives the retries."""
    try:
        response = agent_with_retries.run(
            message="Hello, how are you?",
            max_turns=1,
        )
        response.process()  # prints events to the console
        return response
    except Exception as e:
        # Retryable errors reach this point only after the configured
        # attempts are exhausted; non-retryable errors are raised immediately
        print(f"Error after retries: {type(e).__name__}: {e}")
        raise

# Test the error handling
print("=== Testing Error Handling ===")
result = test_with_error_handling()
print("Test completed!")
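
Not every error benefits from a retry. When an exception does surface, it can be useful to tell transient failures from permanent ones. The sketch below inspects an error dictionary shaped like the `response` attribute of botocore's `ClientError`. The set of codes in `RETRYABLE_CODES` is an illustrative assumption, not an exhaustive or authoritative list, and `is_retryable` is a hypothetical helper.

```python
# Illustrative set of error codes that usually indicate a transient problem.
RETRYABLE_CODES = {
    "ThrottlingException",
    "ServiceUnavailableException",
    "InternalServerException",
    "ModelTimeoutException",
}


def is_retryable(error_response):
    """Return True if the error code suggests that retrying might succeed."""
    code = error_response.get("Error", {}).get("Code", "")
    return code in RETRYABLE_CODES


throttled = {"Error": {"Code": "ThrottlingException", "Message": "Too many requests"}}
invalid = {"Error": {"Code": "ValidationException", "Message": "Malformed input"}}
print(is_retryable(throttled))  # True
print(is_retryable(invalid))    # False
```

Errors such as validation or access failures will not succeed on a later attempt, so surfacing them quickly is usually better than retrying.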

Part 11: Comparison Table#

Configuration	total_max_attempts	mode	Use Case
Default	5	standard	General purpose
High Reliability	10	adaptive	Critical applications
Fast Fail	2	standard	Quick failure detection
Rate Limit Optimized	8	adaptive	High-throughput scenarios
Legacy	5	legacy	Backward compatibility
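
The table can be captured as a set of named presets to avoid repeating the same dictionary in every configuration. This is a convenience sketch only; `RETRY_PRESETS` and `bedrock_config` are hypothetical names, not part of AG2.

```python
# Presets mirroring the comparison table above.
RETRY_PRESETS = {
    "default": {"total_max_attempts": 5, "mode": "standard"},
    "high_reliability": {"total_max_attempts": 10, "mode": "adaptive"},
    "fast_fail": {"total_max_attempts": 2, "mode": "standard"},
    "rate_limit": {"total_max_attempts": 8, "mode": "adaptive"},
    "legacy": {"total_max_attempts": 5, "mode": "legacy"},
}


def bedrock_config(model, preset="default", **overrides):
    """Merge base settings, a retry preset, and any overrides into one dict."""
    return {"api_type": "bedrock", "model": model, **RETRY_PRESETS[preset], **overrides}


config = bedrock_config(
    "qwen.qwen3-coder-480b-a35b-v1:0",
    preset="fast_fail",
    aws_region="eu-north-1",
)
print(config["total_max_attempts"], config["mode"])  # 2 standard
```

The resulting dictionary can be passed as `config_list` to `LLMConfig`, in the same way as the hand-written configurations in the earlier parts.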

Part 12: Advanced: Custom Retry Configuration per Agent#

Create multiple agents with different retry configurations:

# Agent 1: Fast responses (fewer retries)
fast_agent = ConversableAgent(
    name="fast_agent",
    llm_config=LLMConfig(
        config_list={
            "api_type": "bedrock",
            "model": "qwen.qwen3-coder-480b-a35b-v1:0",
            "aws_region": os.getenv("AWS_REGION", "eu-north-1"),
            "aws_access_key": os.getenv("AWS_ACCESS_KEY"),
            "aws_secret_key": os.getenv("AWS_SECRET_ACCESS_KEY"),
            "total_max_attempts": 3,
            "mode": "standard",
        },
    ),
    system_message="You provide quick responses.",
    max_consecutive_auto_reply=1,
)

# Agent 2: Reliable responses (more retries)
reliable_agent = ConversableAgent(
    name="reliable_agent",
    llm_config=LLMConfig(
        config_list={
            "api_type": "bedrock",
            "model": "qwen.qwen3-coder-480b-a35b-v1:0",
            "aws_region": os.getenv("AWS_REGION", "eu-north-1"),
            "aws_access_key": os.getenv("AWS_ACCESS_KEY"),
            "aws_secret_key": os.getenv("AWS_SECRET_ACCESS_KEY"),
            "total_max_attempts": 10,
            "mode": "adaptive",
        },
    ),
    system_message="You provide reliable responses with retries.",
    max_consecutive_auto_reply=1,
)

print("Multiple agents created with different retry configurations!")

Summary#

In this notebook, we’ve learned:

✅ How to configure retry behavior for Bedrock API calls
✅ What the total_max_attempts, max_attempts, and mode parameters do
✅ How the retry modes differ: legacy, standard, and adaptive
✅ Best practices for choosing retry configurations
✅ How to combine retry config with timeout settings
✅ Error handling strategies with retries
✅ Environment variable configuration options

Key Takeaways#

Use total_max_attempts (preferred over max_attempts)
Use adaptive mode for rate limit handling
Use standard mode for general-purpose applications
Set appropriate attempt counts based on your reliability needs
Combine with timeout for better control

Next Steps#

Experiment with different retry configurations for your use case
Monitor retry behavior in production
Adjust retry settings based on error patterns
Consider using adaptive mode for high-throughput scenarios