AG2 Shell Tool Integration: Command Execution and Multi-Tool DevOps Orchestration

AG2's shell tool integration with OpenAI's Responses API lets agents run shell commands: the model requests the commands and AG2 executes them on the local machine. This enables automation of filesystem operations, build processes, and system diagnostics. Combined with the apply_patch tool, you can orchestrate complete DevOps pipelines, from project creation to deployment validation, within a single agent workflow.

This article explores how to leverage AG2's built-in tools for command execution, file operations, and multi-tool orchestration, with practical examples for automating development workflows and building production-ready DevOps pipelines.

Traditional agent workflows often require custom integrations for every system operation—file creation, command execution, testing, and deployment each need separate tooling. AG2's built-in tools eliminate this complexity by providing native support for:

Shell command execution: Run any shell command through a controlled interface
Structured file operations: Create, update, and delete files with precise control
Multi-tool orchestration: Combine tools seamlessly in agent workflows
Security controls: Built-in protection against dangerous commands

Key Features:

Native Shell Integration: The model issues shell commands through OpenAI's Responses API, and AG2 executes them locally
Multi-Tool Support: Use apply_patch and shell tools together in the same workflow
Security Controls: Configure dangerous patterns, allowed commands, and denied commands
DevOps Automation: Complete pipeline orchestration from code to deployment
Concurrent Execution: Multiple commands can run simultaneously within a single shell call
Production Ready: Built-in safeguards and validation for production deployments

Why This Matters:

Building automated development workflows traditionally requires complex integrations, custom scripts, and manual coordination between different tools. AG2's built-in tools provide a unified interface that enables agents to handle the entire software development lifecycle—from initial project setup through testing, building, and deployment—with intelligent routing and error handling.

When to Use Built-in Tools:

Use AG2's built-in tools when you need:

Filesystem Operations: Create, modify, or delete files programmatically
Command Execution: Run tests, builds, deployments, or system diagnostics
DevOps Automation: Orchestrate complete CI/CD pipelines
Multi-Step Workflows: Chain file operations and commands together
Development Automation: Automate repetitive development tasks

Don't use built-in tools for simple text generation or when you need custom tool integrations that aren't supported.

Understanding Built-in Tools

AG2 provides two powerful built-in tools that work seamlessly with OpenAI's Responses API:

1. Shell Tool: Executes shell commands through your system's command-line interface
   - Supports concurrent command execution
   - Includes timeout and output length limits
   - Provides security controls for dangerous commands
   - Works on Mac/Linux and Windows

2. Apply Patch Tool: Performs structured file operations
   - Create new files with content
   - Update existing files using unified diff format
   - Delete files when needed
   - Maintains file structure and formatting

Together, these tools enable complete automation of software development workflows.

Shell Tool Architecture

The shell tool is built on the ShellExecutor class, which provides multiple layers of security and control:

Security Layers

The shell executor implements four layers of security protection:

Command Pattern Filtering: Blocks dangerous commands using regex patterns
Working Directory Restriction: Limits command execution to a specified workspace (chroot-like behavior)
Allowed/Denied Command Lists: Whitelist and blacklist for command control
Path Restrictions: Limits file system access to allowed paths within the workspace

Core Components

ShellExecutor provides the following key capabilities:

Timeout Management: Configurable timeout for command execution (default: 60 seconds)
Concurrent Execution: Execute multiple commands simultaneously via run_commands()
Path Validation: Ensures commands only access allowed paths within the workspace
Command Validation: Multi-stage validation before command execution
Error Handling: Graceful handling of timeouts and security violations
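As a rough illustration of the timeout, output-limit, and error-handling items above, the sketch below runs one command with Python's standard subprocess module. It is not AG2's ShellExecutor: the function name run_with_timeout, the max_output_length cap, and the result keys (chosen to mirror the stdout, stderr, exit_code, and timed_out fields listed under Monitoring Command Execution) are assumptions for illustration.

```python
import subprocess
import sys

DEFAULT_TIMEOUT = 60  # seconds, the documented default_timeout


def run_with_timeout(command, cwd=".", timeout=None, max_output_length=10_000):
    """Run one shell command and return stdout, stderr, exit_code and timed_out."""
    timeout = DEFAULT_TIMEOUT if timeout is None else timeout  # per-command override
    try:
        proc = subprocess.run(
            command, shell=True, cwd=cwd, capture_output=True, text=True, timeout=timeout
        )
    except subprocess.TimeoutExpired as exc:
        # subprocess.run kills the child before re-raising; keep any partial output
        partial = exc.stdout or ""
        if isinstance(partial, bytes):  # on POSIX this may be bytes even in text mode
            partial = partial.decode(errors="replace")
        return {"stdout": partial[:max_output_length], "stderr": "",
                "exit_code": None, "timed_out": True}
    return {
        "stdout": proc.stdout[:max_output_length],  # cap output length
        "stderr": proc.stderr[:max_output_length],
        "exit_code": proc.returncode,
        "timed_out": False,
    }


if __name__ == "__main__":
    print(run_with_timeout(f'"{sys.executable}" --version'))
```

Returning a timeout as a result instead of raising keeps one slow command from aborting a whole batch, which is the kind of graceful handling described above.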

Default Dangerous Patterns

The shell tool includes comprehensive protection against dangerous commands by default:

DEFAULT_DANGEROUS_PATTERNS = [
    # Critical: Root filesystem deletion
    (r"\brm\s+-rf\s+/\s*$", "Deletion of root filesystem (rm -rf /) is not allowed."),
    (r"\brm\s+-rf\s+/\s+", "Deletion starting from root (rm -rf / ...) is not allowed."),
    # Critical: Home directory deletion
    (r"\brm\s+-rf\s+~\s*$", "Deletion of entire home directory (rm -rf ~) is not allowed."),
    (r"\brm\s+-rf\s+~\s+", "Deletion starting from home (rm -rf ~ ...) is not allowed."),
    # Critical system directories - block deletion
    (r"\brm\s+-rf\s+/(?:etc|usr|bin|sbin|lib|lib64|boot|root|sys|proc|dev)\b",
     "Deletion of critical system directories is not allowed."),
    # Critical: Direct disk block device operations
    (r">\s*/dev/sd[a-z][0-9]*\s*$", "Direct disk block device overwrite is not allowed."),
    (r">\s*/dev/hd[a-z][0-9]*\s*$", "Direct disk block device overwrite is not allowed."),
    (r">\s*/dev/nvme\d+n\d+p\d+\s*$", "Direct NVMe disk overwrite is not allowed."),
    # Critical: dd to disk devices
    (r"\bdd\b.*\bof=/dev/(?:sd|hd|nvme)", "Writing to disk devices with dd is not allowed."),
    # Critical: Fork bombs
    (r":\(\)\s*\{\s*:\s*\|\s*:\s*&\s*\}\s*;", "Fork bombs are not allowed."),
    # Critical: Filesystem formatting
    (r"\bmkfs\.(?:ext[234]|xfs|btrfs|ntfs|vfat|fat)\s+/dev/", "Formatting filesystems is not allowed."),
    # Windows: Format drives
    (r"\bformat\s+[A-Z]:\s*$", "Formatting Windows drives is not allowed."),
    (r"\bformat\s+[A-Z]:\s+/", "Formatting Windows drives is not allowed."),
    # Windows: System directory deletion
    (r"\bdel\s+/[sS]\s+C:\\Windows", "Deletion of Windows system directory is not allowed."),
    (r"\bdel\s+/[sS]\s+C:\\Program\s+Files", "Deletion of Windows Program Files is not allowed."),
    (r"\brmdir\s+/[sS]\s+C:\\Windows", "Deletion of Windows system directory is not allowed."),
    # Dangerous: Mass deletion with wildcards in system paths
    (r"\brm\s+-rf\s+/\*\s*$", "Mass deletion of root directory contents is not allowed."),
    (r"\brm\s+-rf\s+~\*\s*$", "Mass deletion of home directory contents is not allowed."),
    # Dangerous: Overwriting critical system files
    (r">\s*/etc/(?:passwd|shadow|hosts|fstab)", "Overwriting critical system files is not allowed."),
    (r">\s*/boot/", "Overwriting boot files is not allowed."),
]

These patterns protect against:
- Filesystem destruction commands
- System directory deletion
- Direct disk operations
- Fork bombs (one form of resource exhaustion)
- Critical system file modification
- Cross-platform threats (Linux/Unix and Windows)
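Because each pattern is a regular expression matched against the command text, it blocks the spellings it was written for and not necessarily equivalent ones. The self-contained check below copies three entries from the list above (no AG2 import is needed) and shows that "rm -rf /" is caught while "rm -fr /" and "rm -r -f /" are not matched by them. This is a property of regex filtering in general, and it is why the pattern layer works best combined with allowed_commands and a workspace directory.

```python
import re

# Three entries copied from DEFAULT_DANGEROUS_PATTERNS above
PATTERNS = [
    (r"\brm\s+-rf\s+/\s*$", "Deletion of root filesystem (rm -rf /) is not allowed."),
    (r"\brm\s+-rf\s+/\s+", "Deletion starting from root (rm -rf / ...) is not allowed."),
    (r":\(\)\s*\{\s*:\s*\|\s*:\s*&\s*\}\s*;", "Fork bombs are not allowed."),
]


def first_match(command):
    """Return the error message of the first matching pattern, or None."""
    for pattern, message in PATTERNS:
        if re.search(pattern, command):
            return message
    return None


for cmd in ["rm -rf /", "rm -fr /", "rm -r -f /", ":(){ :|:& };:", "rm -rf ./build"]:
    print(f"{cmd!r:20} -> {first_match(cmd) or 'not matched by these patterns'}")
```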

Command Execution Flow

When a command is executed, the shell tool follows this validation flow:

Command Parsing: Extract command name and arguments
Whitelist Check: If allowed_commands is set, verify command is in the list
Blacklist Check: Verify command is not in denied_commands
Pattern Matching: Check against dangerous patterns (if enabled)
Path Validation: Ensure all paths in the command are within allowed paths
Execution: Run command in the restricted workspace directory
Result Handling: Return stdout, stderr, exit code, and timeout status
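The validation steps above can be condensed into one function. This is a simplified sketch, not AG2's implementation: validate_command is a hypothetical name, shlex parsing is assumed for step 1, and treating any argument that contains a slash or starts with .. as a path is an illustrative heuristic for step 5. Execution and result handling (steps 6 and 7) are left out. Because the deny check runs on its own after the allow check, a command listed in both is rejected, consistent with denied_commands taking precedence.

```python
import re
import shlex
from pathlib import Path


def validate_command(command, workspace, allowed=None, denied=(), patterns=()):
    """Return (ok, reason) after the parse/allow/deny/pattern/path checks."""
    # 1. Parse into a program name and arguments
    try:
        argv = shlex.split(command)
    except ValueError as exc:
        return False, f"Could not parse command: {exc}"
    if not argv:
        return False, "Empty command"
    program, args = argv[0], argv[1:]
    # 2. Allow-list check (None means every command is eligible)
    if allowed is not None and program not in allowed:
        return False, f"'{program}' is not in allowed_commands"
    # 3. Deny-list check; a command in both lists is still rejected here
    if program in denied:
        return False, f"'{program}' is in denied_commands"
    # 4. Dangerous-pattern check against the full command string
    for pattern, message in patterns:
        if re.search(pattern, command):
            return False, message
    # 5. Path check: arguments that look like paths must stay inside the workspace
    root = Path(workspace).resolve()
    for arg in args:
        if arg.startswith("-") or not ("/" in arg or arg.startswith("..")):
            continue  # flags and bare words are not treated as paths
        target = (root / arg).resolve()  # an absolute arg replaces root in the join
        if not target.is_relative_to(root):  # Python 3.9+
            return False, f"Path '{arg}' is outside the workspace"
    return True, "ok"
```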

Workspace Directory Isolation

The shell tool enforces workspace isolation:

Automatic Creation: Workspace directory is created if it doesn't exist
Path Resolution: All relative paths are resolved relative to the workspace
Access Control: Commands cannot access files outside the workspace (unless explicitly allowed)
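Chroot-like Behavior: Aims for isolation similar to chroot without requiring root privileges. Because the restriction comes from validating commands and the paths they mention rather than from the operating system, it is weaker than a real chroot or container: a permitted program can still open files whose paths never appear in the command string.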
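A minimal sketch of the creation and resolution rules above, using pathlib (Python 3.9+ for is_relative_to). ensure_workspace, resolve_in_workspace, and demo are hypothetical helpers, not AG2 functions. Resolving before comparing collapses .. segments and follows symlinks, so a link inside the workspace that points outside it is rejected as well.

```python
import os
import tempfile
from pathlib import Path


def ensure_workspace(workspace_dir):
    """Create the workspace if needed and return its resolved absolute path."""
    root = Path(workspace_dir).resolve()
    root.mkdir(parents=True, exist_ok=True)
    return root


def resolve_in_workspace(root, path):
    """Resolve `path` relative to the workspace; raise if it escapes."""
    # resolve() normalizes '..' and follows symlinks before the comparison
    target = (root / path).resolve()
    if not target.is_relative_to(root):
        raise PermissionError(f"{path!r} resolves outside the workspace")
    return target


def demo():
    """Return which sample paths are blocked in a throwaway workspace."""
    blocked = []
    with tempfile.TemporaryDirectory() as tmp:
        root = ensure_workspace(os.path.join(tmp, "safe_workspace"))
        # A symlink inside the workspace that points at its parent directory
        (root / "escape").symlink_to(tmp, target_is_directory=True)
        for candidate in ["src/app.py", "../other.txt", "/etc/passwd", "escape/other.txt"]:
            try:
                resolve_in_workspace(root, candidate)
            except PermissionError:
                blocked.append(candidate)
    return blocked


if __name__ == "__main__":
    print(demo())
```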

Basic Setup

The simplest way to use the shell tool is to configure it in your LLM configuration:

import os
from autogen import ConversableAgent, LLMConfig

# Configure the LLM with Responses API and shell tool
llm_config = LLMConfig(
    config_list={
        "api_type": "responses_v2",
        "model": "gpt-5.1",
        "api_key": os.getenv("OPENAI_API_KEY"),
        "built_in_tools": ["shell"],
    },
)

# Create the assistant agent
assistant = ConversableAgent(
    name="Assistant",
    system_message="""You are a helpful assistant with access to shell commands.
    You can use the shell tool to execute commands and interact with the filesystem.
    The local shell environment is on Mac/Linux.
    Keep your responses concise and include command output when helpful.
    """,
    llm_config=llm_config,
    human_input_mode="NEVER",
)

This configuration lets the agent run shell commands under the default security settings: the built-in dangerous patterns are active, but no workspace directory or command allow/deny lists are set.

Advanced Configuration

Customizing Security Settings

You can customize the shell tool's security behavior through LLM configuration:

# Custom dangerous patterns
dangerous_patterns = [
    (r"\brm\s+-rf\s+/\s*$", "Deletion of root filesystem (rm -rf /) is not allowed."),
    (r"\brm\s+-rf\s+/\s+", "Deletion starting from root (rm -rf / ...) is not allowed."),
]

# Configure with custom security settings
llm_config = LLMConfig(
    config_list={
        "api_type": "responses_v2",
        "model": "gpt-5.1",
        "api_key": os.getenv("OPENAI_API_KEY"),
        "built_in_tools": ["shell"],
        "dangerous_patterns": dangerous_patterns,
        "allowed_commands": ["ls", "cat", "grep", "pytest"],
        "denied_commands": ["rm", "dd", "format"],
        "workspace_dir": "./safe_workspace",
    },
)

Configuration Parameters

workspace_dir: Directory where all commands execute
- Commands run within this directory
- Paths are resolved relative to this directory
- Defaults to the current working directory if not specified

allowed_commands: Whitelist of allowed commands
- If provided, only commands in this list can execute
- None = allow all commands (subject to other restrictions)
- Example: ["ls", "cat", "grep", "pytest"]

denied_commands: Blacklist of denied commands
- Commands in this list are always blocked
- Takes precedence over allowed_commands
- Example: ["rm", "dd", "format"]

dangerous_patterns: Custom regex patterns to block
- List of tuples: (pattern, error_message)
- Checked against the full command string
- None = use default dangerous patterns

default_timeout: Timeout in seconds for command execution
- Commands exceeding this timeout are killed
- Default: 60 seconds
- Can be overridden per command
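To make the defaults and precedence rules above concrete, here is a hypothetical settings object. ShellSettings and its methods are illustrative names, not part of AG2, and DEFAULT_PATTERNS is a one-entry stand-in for the real default list.

```python
from dataclasses import dataclass, field
from pathlib import Path
from typing import Optional

# Stand-in for the real default list (one entry copied from above)
DEFAULT_PATTERNS = [(r"\brm\s+-rf\s+/\s*$", "Deletion of root filesystem (rm -rf /) is not allowed.")]


@dataclass
class ShellSettings:
    workspace_dir: Optional[str] = None
    allowed_commands: Optional[list[str]] = None
    denied_commands: list[str] = field(default_factory=list)
    dangerous_patterns: Optional[list[tuple[str, str]]] = None
    default_timeout: float = 60

    def workspace(self) -> Path:
        # Unset workspace_dir falls back to the current working directory
        return Path(self.workspace_dir or Path.cwd()).resolve()

    def patterns(self):
        # None means "use the defaults"; any list (even empty) is used as given
        return DEFAULT_PATTERNS if self.dangerous_patterns is None else self.dangerous_patterns

    def timeout_for(self, override=None):
        # A per-command timeout wins over default_timeout
        return self.default_timeout if override is None else override

    def is_permitted(self, program):
        # The deny list is checked first, so it takes precedence over the allow list
        if program in self.denied_commands:
            return False
        return self.allowed_commands is None or program in self.allowed_commands
```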

Concurrent Command Execution

The shell tool supports executing multiple commands concurrently:

# Multiple commands can be executed in a single shell_call
# The model can generate multiple commands that run simultaneously
result = assistant.run(
    message="""
    Please execute these commands concurrently:
    1. List files in current directory
    2. Check Python version
    3. Show disk usage
    """,
    max_turns=2,
).process()

Each command runs independently and returns its own result, allowing for efficient parallel execution of independent operations.
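On the executor side, running independent commands in parallel can be sketched with a thread pool from the standard library. run_concurrently is a hypothetical stand-in for ShellExecutor.run_commands(), whose actual signature is not shown in this article; each command gets its own subprocess, and results come back in input order.

```python
import subprocess
import sys
from concurrent.futures import ThreadPoolExecutor


def run_one(command, timeout=60):
    """Run one command; return (command, exit_code, stripped stdout)."""
    try:
        proc = subprocess.run(command, shell=True, capture_output=True, text=True, timeout=timeout)
    except subprocess.TimeoutExpired:
        return command, None, ""  # None marks a timed-out command
    return command, proc.returncode, proc.stdout.strip()


def run_concurrently(commands, timeout=60):
    """Run independent commands in parallel threads; results keep input order."""
    # Each command is its own subprocess, so the threads mostly wait on I/O
    with ThreadPoolExecutor(max_workers=max(len(commands), 1)) as pool:
        return list(pool.map(lambda cmd: run_one(cmd, timeout), commands))


if __name__ == "__main__":
    py = f'"{sys.executable}"'
    for cmd, code, out in run_concurrently([f"{py} --version", "echo hello", f'{py} -c "print(6 * 7)"']):
        print(code, cmd, "->", out)
```

Threads are a reasonable fit here because the Python side only waits on child processes; the actual work happens in separate OS processes.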

Practical Examples

Example 1: Filesystem Diagnostics

The shell tool excels at automating filesystem and process diagnostics:

# Example: Find files and show processes
result = assistant.run(
    message="""
    Please help me with the following tasks:
    1. ls to show files in current directory
    2. Show me information about running Python processes
    """,
    max_turns=2,
).process()

This enables agents to:
- List directory contents
- Check running processes
- Analyze system resources
- Diagnose filesystem issues

Example 2: Extending Capabilities with UNIX Utilities

The shell tool extends model capabilities by allowing access to UNIX utilities, Python runtime, and other CLIs:

# Example: Use UNIX utilities and Python CLI
result = assistant.run(
    message="""
    Please help me:
    1. Check the current Python version using the python CLI
    2. Get system information like disk usage and memory
    3. Create a simple text file and then use grep to search within it
    """,
    max_turns=6,
).process()

This pattern enables:
- System information gathering
- File manipulation with standard tools
- Integration with existing CLI tools
- Cross-platform compatibility

Example 3: Multi-Step Build and Test Flows

The shell tool can also drive multi-step build and test flows:

# Example: Multi-step build and test flow
result = assistant.run(
    message="""
    Please help me set up a simple Python project:
    1. Create a directory called 'test_project'
    2. Create a simple Python module with a function to test
    3. Create a test file using pytest format
    4. Install pytest if needed
    5. Run the tests and show me the results
    """,
    max_turns=3,
).process()

This demonstrates:
- Sequential command execution
- Dependency management
- Test automation
- Project setup automation

Security Best Practices

1. Always Configure Workspace Directory

Always specify a workspace directory for production use:

# ✅ Good: Workspace directory specified
llm_config = LLMConfig(
    config_list={
        "api_type": "responses_v2",
        "model": "gpt-5.1",
        "api_key": os.getenv("OPENAI_API_KEY"),
        "built_in_tools": ["shell"],
        "workspace_dir": "./isolated_workspace",
    },
)

# ❌ Bad: No workspace directory
llm_config = LLMConfig(
    config_list={
        "api_type": "responses_v2",
        "model": "gpt-5.1",
        "api_key": os.getenv("OPENAI_API_KEY"),
        "built_in_tools": ["shell"],
    },
)  # No workspace_dir: commands run from the current working directory, not a dedicated sandbox

2. Use Command Restrictions

Implement command allow-lists or deny-lists:

# ✅ Good: Command restrictions configured
# Note: allowing an interpreter such as "python" lets the agent run arbitrary code
llm_config = LLMConfig(
    config_list={
        "api_type": "responses_v2",
        "model": "gpt-5.1",
        "api_key": os.getenv("OPENAI_API_KEY"),
        "built_in_tools": ["shell"],
        "allowed_commands": ["ls", "cat", "grep", "pytest", "python"],
        "denied_commands": ["rm", "dd", "format", "mkfs"],
    },
)

# ❌ Bad: No command restrictions
llm_config = LLMConfig(
    config_list={
        "api_type": "responses_v2",
        "model": "gpt-5.1",
        "api_key": os.getenv("OPENAI_API_KEY"),
        "built_in_tools": ["shell"],
    },
)  # All commands allowed (dangerous!)

3. Customize Dangerous Patterns

Add custom patterns for your specific use case:

# ✅ Good: Custom dangerous patterns
custom_patterns = [
    (r"\brm\s+-rf\s+/\s*$", "Root deletion blocked"),
    (r"\bformat\s+[A-Z]:", "Drive formatting blocked"),
    # Add domain-specific patterns
    (r"\bdrop\s+database", "Database deletion blocked"),
]

llm_config = LLMConfig(
    config_list={
        "api_type": "responses_v2",
        "model": "gpt-5.1",
        "api_key": os.getenv("OPENAI_API_KEY"),
        "built_in_tools": ["shell"],
        "dangerous_patterns": custom_patterns,
    },
)
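The "None = use default dangerous patterns" wording under Configuration Parameters suggests that a supplied list replaces the defaults rather than extending them; check this against your AG2 version. If it does, custom_patterns above would drop most of the built-in protection. A cautious approach, sketched below, is to append your patterns to the defaults and compile each regex up front so a typo fails at startup. DEFAULTS is a two-entry stand-in so the snippet runs without AG2 (in practice you would start from ShellExecutor.DEFAULT_DANGEROUS_PATTERNS, imported as in the Troubleshooting section), and merge_patterns is a hypothetical helper.

```python
import re

# Stand-in for ShellExecutor.DEFAULT_DANGEROUS_PATTERNS (two entries copied from above)
DEFAULTS = [
    (r"\brm\s+-rf\s+/\s*$", "Deletion of root filesystem (rm -rf /) is not allowed."),
    (r">\s*/etc/(?:passwd|shadow|hosts|fstab)", "Overwriting critical system files is not allowed."),
]

custom_patterns = [
    (r"\bdrop\s+database", "Database deletion blocked"),
]


def merge_patterns(defaults, extra):
    """Return defaults + extra without duplicates, failing fast on invalid regexes."""
    merged, seen = [], set()
    for pattern, message in [*defaults, *extra]:
        re.compile(pattern)  # raises re.error for a malformed pattern
        if pattern not in seen:
            seen.add(pattern)
            merged.append((pattern, message))
    return merged


# Pass this list as "dangerous_patterns" in the config_list entry
dangerous_patterns = merge_patterns(DEFAULTS, custom_patterns)
```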

4. Monitor Command Execution

Implement logging and monitoring:

# Log all executed commands for audit purposes
# The shell tool returns detailed results including:
# - stdout: Command output
# - stderr: Error output
# - exit_code: Command exit status
# - timed_out: Whether command exceeded timeout

result = assistant.run(message="Execute command", max_turns=2).process()
# Log result for security auditing
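The comments above list the fields worth auditing. One way to record them is to wrap command execution so every call emits one JSON log line. audited_run and the record layout below are hypothetical and use only the standard library; they stand in for whatever hook your deployment uses to observe shell tool calls.

```python
import json
import logging
import subprocess
import time

audit_log = logging.getLogger("shell_audit")


def audited_run(command, cwd=".", timeout=60):
    """Run a command and write one JSON audit record per call."""
    started = time.monotonic()
    record = {"command": command, "cwd": cwd, "timed_out": False}
    try:
        proc = subprocess.run(
            command, shell=True, cwd=cwd, capture_output=True, text=True, timeout=timeout
        )
        record.update(stdout=proc.stdout, stderr=proc.stderr, exit_code=proc.returncode)
    except subprocess.TimeoutExpired:
        record.update(stdout="", stderr="", exit_code=None, timed_out=True)
    record["duration_s"] = round(time.monotonic() - started, 3)
    audit_log.info(json.dumps(record))  # one line per command, easy to ship to a SIEM
    return record


if __name__ == "__main__":
    logging.basicConfig(level=logging.INFO, format="%(asctime)s %(name)s %(message)s")
    audited_run("echo audit-me")
```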

Troubleshooting

Common Issues

1. Commands Blocked by Security

If commands are being blocked, check your security configuration:

import re

from autogen.tools.experimental.shell.shell_tool import ShellExecutor

# Review dangerous patterns
for pattern, message in ShellExecutor.DEFAULT_DANGEROUS_PATTERNS:
    print(f"Pattern: {pattern}")
    print(f"Message: {message}\n")

# Check if a command matches any pattern
command = "rm -rf /tmp/test"
for pattern, message in ShellExecutor.DEFAULT_DANGEROUS_PATTERNS:
    if re.search(pattern, command, re.IGNORECASE):
        print(f"Blocked: {message}")
        break
else:
    # No pattern matched; look at allowed/denied commands or path checks instead
    print(f"Not matched by any default pattern: {command}")

2. Workspace Directory Issues

Verify workspace directory setup:

import os
from pathlib import Path

workspace_dir = "./project_dir"
workspace_path = Path(workspace_dir).resolve()

# Check if directory exists
if not workspace_path.exists():
    print(f"Creating workspace directory: {workspace_path}")
    workspace_path.mkdir(parents=True, exist_ok=True)

# Verify write permissions
test_file = workspace_path / "test.txt"
try:
    test_file.write_text("test")
    test_file.unlink()
    print("Workspace directory is writable")
except Exception as e:
    print(f"Workspace directory not writable: {e}")

3. Timeout Issues

Adjust timeout for long-running commands:

# Configure longer timeout for build commands
llm_config = LLMConfig(
    config_list={
        "api_type": "responses_v2",
        "model": "gpt-5.1",
        "api_key": os.getenv("OPENAI_API_KEY"),
        "built_in_tools": ["shell"],
        "default_timeout": 300,  # 5 minutes for builds
    },
)

Benefits Summary

Multi-Layer Security: Four layers of protection against dangerous commands
Workspace Isolation: Commands execute in isolated directories
Concurrent Execution: Multiple commands can run simultaneously
Flexible Configuration: Customize security settings for your use case
Production Ready: Built-in safeguards and validation
Cross-Platform: Works on Mac/Linux and Windows
Comprehensive Protection: Default patterns protect against common threats

Getting Started

Install AG2 with OpenAI support:

pip install "ag2[openai]"

Configure LLM with shell tool:

from autogen import ConversableAgent, LLMConfig
import os

llm_config = LLMConfig(
    config_list={
        "api_type": "responses_v2",
        "model": "gpt-5.1",
        "api_key": os.getenv("OPENAI_API_KEY"),
        "built_in_tools": ["shell"],
        "workspace_dir": "./safe_workspace",
    },
)

Create an agent:

assistant = ConversableAgent(
    name="Assistant",
    system_message="You are a helpful assistant with shell access.",
    llm_config=llm_config,
    human_input_mode="NEVER",
)

Execute commands:

result = assistant.run(
    message="List files in the current directory",
    max_turns=2,
).process()

Add security controls:

llm_config = LLMConfig(
    config_list={
        "api_type": "responses_v2",
        "model": "gpt-5.1",
        "api_key": os.getenv("OPENAI_API_KEY"),
        "built_in_tools": ["shell"],
        "allowed_commands": ["ls", "cat", "grep"],
        "denied_commands": ["rm", "dd"],
        "workspace_dir": "./safe_workspace",
    },
)

Review the documentation: OpenAI Responses API

Additional Resources

AG2's shell tool provides command execution with multiple layers of security protection. By combining workspace isolation, command filtering, and pattern matching, you can enable shell access in your agent workflows while keeping control over which commands can execute. Start building your automated workflows with these layers configured to match the level of access each agent actually needs.