AG2 Shell Tool Integration: Command Execution and Multi-Tool DevOps Orchestration

AG2's shell tool integration with OpenAI's Responses API lets agents execute shell commands directly for filesystem operations, build processes, and system diagnostics. Combined with the apply_patch tool, a single agent workflow can cover a DevOps pipeline from project creation to deployment validation.
This article explains how to use AG2's built-in tools for command execution, file operations, and multi-tool orchestration, with practical examples for automating development workflows.
Traditional agent workflows often require a custom integration for every system operation: file creation, command execution, testing, and deployment each need separate tooling. AG2's built-in tools reduce this overhead by providing native support for:
- Shell command execution: Run any shell command through a controlled interface
- Structured file operations: Create, update, and delete files with precise control
- Multi-tool orchestration: Combine tools seamlessly in agent workflows
- Security controls: Built-in protection against dangerous commands
Key Features:
- Native Shell Integration: Execute shell commands directly through OpenAI's Responses API
- Multi-Tool Support: Use apply_patch and shell tools together in the same workflow
- Security Controls: Configure dangerous patterns, allowed commands, and denied commands
- DevOps Automation: Complete pipeline orchestration from code to deployment
- Concurrent Execution: Multiple commands can run simultaneously within a single shell call
- Production Ready: Built-in safeguards and validation for production deployments
Why This Matters:
Building automated development workflows traditionally requires complex integrations, custom scripts, and manual coordination between different tools. AG2's built-in tools provide a unified interface that enables agents to handle the entire software development lifecycle—from initial project setup through testing, building, and deployment—with intelligent routing and error handling.
When to Use Built-in Tools:
Use AG2's built-in tools when you need:
- Filesystem Operations: Create, modify, or delete files programmatically
- Command Execution: Run tests, builds, deployments, or system diagnostics
- DevOps Automation: Orchestrate complete CI/CD pipelines
- Multi-Step Workflows: Chain file operations and commands together
- Development Automation: Automate repetitive development tasks
Don't use built-in tools for simple text generation or when you need custom tool integrations that aren't supported.
Understanding Built-in Tools
AG2 provides two powerful built-in tools that work seamlessly with OpenAI's Responses API:
1. Shell Tool: Executes shell commands through your system's command-line interface
  - Supports concurrent command execution
  - Includes timeout and output length limits
  - Provides security controls for dangerous commands
  - Works on Mac/Linux and Windows
2. Apply Patch Tool: Performs structured file operations
  - Create new files with content
  - Update existing files using unified diff format
  - Delete files when needed
  - Maintains file structure and formatting
Together, these tools enable complete automation of software development workflows.
Shell Tool Architecture
The shell tool is built on the ShellExecutor class, which provides multiple layers of security and control:
Security Layers
The shell executor implements four layers of security protection:
- Command Pattern Filtering: Blocks dangerous commands using regex patterns
- Working Directory Restriction: Limits command execution to a specified workspace (chroot-like behavior)
- Allowed/Denied Command Lists: Whitelist and blacklist for command control
- Path Restrictions: Limits file system access to allowed paths within the workspace
Core Components
ShellExecutor provides the following key capabilities:
- Timeout Management: Configurable timeout for command execution (default: 60 seconds)
- Concurrent Execution: Execute multiple commands simultaneously via run_commands()
- Path Validation: Ensures commands only access allowed paths within the workspace
- Command Validation: Multi-stage validation before command execution
- Error Handling: Graceful handling of timeouts and security violations
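As a rough illustration of the timeout handling listed above, the following sketch uses only Python's standard `subprocess` module. It is not ShellExecutor's implementation: the function name `run_with_timeout` and the shape of the returned dictionary are assumptions, chosen to mirror the stdout, stderr, exit code, and timeout status that the tool reports.

```python
import subprocess
import sys


def run_with_timeout(argv, timeout=60):
    """Run argv and return stdout, stderr, exit_code, and timed_out (hypothetical helper)."""
    try:
        proc = subprocess.run(argv, capture_output=True, text=True, timeout=timeout)
    except subprocess.TimeoutExpired:
        # subprocess.run kills the child process before raising TimeoutExpired.
        return {"stdout": "", "stderr": "", "exit_code": None, "timed_out": True}
    return {
        "stdout": proc.stdout,
        "stderr": proc.stderr,
        "exit_code": proc.returncode,
        "timed_out": False,
    }


# The current Python interpreter is used as the command so the sketch is portable.
fast = run_with_timeout([sys.executable, "-c", "print('ok')"])
slow = run_with_timeout([sys.executable, "-c", "import time; time.sleep(5)"], timeout=0.5)
print(fast)
print(slow)
```

Returning a result with `timed_out` set, rather than letting the exception propagate, keeps a timeout on the same code path as any other command result.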
Default Dangerous Patterns
The shell tool includes comprehensive protection against dangerous commands by default:
DEFAULT_DANGEROUS_PATTERNS = [
    # Critical: Root filesystem deletion
    (r"\brm\s+-rf\s+/\s*$", "Deletion of root filesystem (rm -rf /) is not allowed."),
    (r"\brm\s+-rf\s+/\s+", "Deletion starting from root (rm -rf / ...) is not allowed."),
    # Critical: Home directory deletion
    (r"\brm\s+-rf\s+~\s*$", "Deletion of entire home directory (rm -rf ~) is not allowed."),
    (r"\brm\s+-rf\s+~\s+", "Deletion starting from home (rm -rf ~ ...) is not allowed."),
    # Critical system directories - block deletion
    (
        r"\brm\s+-rf\s+/(?:etc|usr|bin|sbin|lib|lib64|boot|root|sys|proc|dev)\b",
        "Deletion of critical system directories is not allowed.",
    ),
    # Critical: Direct disk block device operations
    (r">\s*/dev/sd[a-z][0-9]*\s*$", "Direct disk block device overwrite is not allowed."),
    (r">\s*/dev/hd[a-z][0-9]*\s*$", "Direct disk block device overwrite is not allowed."),
    (r">\s*/dev/nvme\d+n\d+p\d+\s*$", "Direct NVMe disk overwrite is not allowed."),
    # Critical: dd to disk devices
    (r"\bdd\b.*\bof=/dev/(?:sd|hd|nvme)", "Writing to disk devices with dd is not allowed."),
    # Critical: Fork bombs
    (r":\(\)\s*\{\s*:\s*\|\s*:\s*&\s*\}\s*;", "Fork bombs are not allowed."),
    # Critical: Filesystem formatting
    (r"\bmkfs\.(?:ext[234]|xfs|btrfs|ntfs|vfat|fat)\s+/dev/", "Formatting filesystems is not allowed."),
    # Windows: Format drives
    (r"\bformat\s+[A-Z]:\s*$", "Formatting Windows drives is not allowed."),
    (r"\bformat\s+[A-Z]:\s+/", "Formatting Windows drives is not allowed."),
    # Windows: System directory deletion
    (r"\bdel\s+/[sS]\s+C:\\Windows", "Deletion of Windows system directory is not allowed."),
    (r"\bdel\s+/[sS]\s+C:\\Program\s+Files", "Deletion of Windows Program Files is not allowed."),
    (r"\brmdir\s+/[sS]\s+C:\\Windows", "Deletion of Windows system directory is not allowed."),
    # Dangerous: Mass deletion with wildcards in system paths
    (r"\brm\s+-rf\s+/\*\s*$", "Mass deletion of root directory contents is not allowed."),
    (r"\brm\s+-rf\s+~\*\s*$", "Mass deletion of home directory contents is not allowed."),
    # Dangerous: Overwriting critical system files
    (r">\s*/etc/(?:passwd|shadow|hosts|fstab)", "Overwriting critical system files is not allowed."),
    (r">\s*/boot/", "Overwriting boot files is not allowed."),
]
These patterns protect against:
- Filesystem destruction commands
- System directory deletion
- Direct disk operations
- Fork bombs and resource exhaustion attacks
- Critical system file modification
- Cross-platform threats (Linux/Unix and Windows)
Command Execution Flow
When a command is executed, the shell tool follows this validation flow:
- Command Parsing: Extract command name and arguments
- Whitelist Check: If allowed_commands is set, verify command is in the list
- Blacklist Check: Verify command is not in denied_commands
- Pattern Matching: Check against dangerous patterns (if enabled)
- Path Validation: Ensure all paths in the command are within allowed paths
- Execution: Run command in the restricted workspace directory
- Result Handling: Return stdout, stderr, exit code, and timeout status
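The first four steps of this flow can be sketched in plain Python. This is an illustration of the ordering only, not AG2's code: the function `validate_command` and its return shape are hypothetical, case-insensitive matching is an assumption, and path validation and execution are left out.

```python
import re
import shlex


def validate_command(command, allowed=None, denied=None, patterns=None):
    """Return (ok, reason), checking in the order: parse, allow-list, deny-list, patterns."""
    tokens = shlex.split(command)  # step 1: split into command name and arguments
    if not tokens:
        return False, "empty command"
    name = tokens[0]
    if allowed is not None and name not in allowed:  # step 2: allow-list
        return False, f"'{name}' is not in allowed_commands"
    if denied and name in denied:  # step 3: deny-list
        return False, f"'{name}' is in denied_commands"
    for pattern, message in patterns or []:  # step 4: dangerous patterns
        if re.search(pattern, command, re.IGNORECASE):
            return False, message
    return True, "ok"


patterns = [(r"\brm\s+-rf\s+/\s*$", "Deletion of root filesystem (rm -rf /) is not allowed.")]
print(validate_command("pytest -q", allowed=["ls", "pytest"], denied=["rm"]))
print(validate_command("curl example.com", allowed=["ls", "pytest"]))
print(validate_command("rm -rf build", denied=["rm"]))
print(validate_command("sudo rm -rf /", patterns=patterns))
```

The last call shows why the pattern check runs against the full command string: the command name is `sudo`, so a name-based list alone would not catch it.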
Workspace Directory Isolation
The shell tool enforces workspace isolation:
- Automatic Creation: Workspace directory is created if it doesn't exist
- Path Resolution: All relative paths are resolved relative to the workspace
- Access Control: Commands cannot access files outside the workspace (unless explicitly allowed)
- Chroot-like Behavior: Provides similar isolation to chroot without requiring root privileges
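One way to picture the path resolution and access control rules is a containment check built on `pathlib`. The sketch below is an assumption about how such a check could look, not the shell tool's actual logic; `resolve_in_workspace` is a hypothetical helper.

```python
import tempfile
from pathlib import Path


def resolve_in_workspace(workspace, candidate):
    """Resolve candidate relative to workspace and reject paths that escape it."""
    root = Path(workspace).resolve()
    # Joining with an absolute candidate discards root, so absolute paths are caught too.
    target = (root / candidate).resolve()
    if target != root and root not in target.parents:
        raise PermissionError(f"{candidate!r} resolves outside {root}")
    return target


workspace = tempfile.mkdtemp(prefix="workspace_")
inside = resolve_in_workspace(workspace, "src/app.py")
print(inside)

for bad in ["../secrets.txt", "/etc/passwd"]:
    try:
        resolve_in_workspace(workspace, bad)
    except PermissionError as exc:
        print(f"blocked: {exc}")
```

Resolving both paths before comparing them means `..` segments and symlinks are normalized first, so a path that only looks like it sits inside the workspace is still rejected.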
Basic Setup
The simplest way to use the shell tool is to configure it in your LLM configuration:
import os
from autogen import ConversableAgent, LLMConfig
# Configure the LLM with Responses API and shell tool
llm_config = LLMConfig(
    config_list={
        "api_type": "responses",
        "model": "gpt-5.1",
        "api_key": os.getenv("OPENAI_API_KEY"),
        "built_in_tools": ["shell"],
    },
)
# Create the assistant agent
assistant = ConversableAgent(
    name="Assistant",
    system_message="""You are a helpful assistant with access to shell commands.
You can use the shell tool to execute commands and interact with the filesystem.
The local shell environment is on Mac/Linux.
Keep your responses concise and include command output when helpful.
""",
    llm_config=llm_config,
    human_input_mode="NEVER",
)
This configuration enables the agent to execute shell commands directly with default security settings.
Advanced Configuration
Customizing Security Settings
You can customize the shell tool's security behavior through LLM configuration:
# Custom dangerous patterns
dangerous_patterns = [
    (r"\brm\s+-rf\s+/\s*$", "Deletion of root filesystem (rm -rf /) is not allowed."),
    (r"\brm\s+-rf\s+/\s+", "Deletion starting from root (rm -rf / ...) is not allowed."),
]
# Configure with custom security settings
llm_config = LLMConfig(
    config_list={
        "api_type": "responses",
        "model": "gpt-5.1",
        "api_key": os.getenv("OPENAI_API_KEY"),
        "built_in_tools": ["shell"],
        "dangerous_patterns": dangerous_patterns,
        "allowed_commands": ["ls", "cat", "grep", "pytest"],
        "denied_commands": ["rm", "dd", "format"],
        "workspace_dir": "./safe_workspace",
    },
)
Configuration Parameters
workspace_dir: Directory where all commands execute
- Commands run within this directory
- Paths are resolved relative to this directory
- Defaults to current working directory if not specified
allowed_commands: Whitelist of allowed commands
- If provided, only commands in this list can execute
- None = allow all commands (subject to other restrictions)
- Example: ["ls", "cat", "grep", "pytest"]
denied_commands: Blacklist of denied commands
- Commands in this list are always blocked
- Takes precedence over allowed_commands
- Example: ["rm", "dd", "format"]
dangerous_patterns: Custom regex patterns to block
- List of tuples: (pattern, error_message)
- Checked against full command string
- None = use default dangerous patterns
default_timeout: Timeout in seconds for command execution
- Commands exceeding this timeout are killed
- Default: 60 seconds
- Can be overridden per command
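Because these parameters interact, it can help to sanity-check a configuration before handing it to LLMConfig. The helper below is a hypothetical sketch, not part of AG2: it only assembles a plain dictionary using the keys described above, verifies that each regex compiles, and, as a choice of this sketch, rejects commands that appear in both lists, since denied_commands takes precedence and such a command could never run.

```python
import re


def build_shell_config(model, api_key, workspace_dir, allowed=None,
                       denied=None, patterns=None, timeout=60):
    """Assemble a shell-tool config entry after basic sanity checks (hypothetical helper)."""
    for pattern, _message in patterns or []:
        re.compile(pattern)  # raises re.error if the regex is malformed
    overlap = sorted(set(allowed or []) & set(denied or []))
    if overlap:
        raise ValueError(f"commands listed as both allowed and denied: {overlap}")
    config = {
        "api_type": "responses",
        "model": model,
        "api_key": api_key,
        "built_in_tools": ["shell"],
        "workspace_dir": workspace_dir,
        "default_timeout": timeout,
    }
    # Optional keys are only set when given, so the tool's defaults apply otherwise.
    if allowed is not None:
        config["allowed_commands"] = list(allowed)
    if denied is not None:
        config["denied_commands"] = list(denied)
    if patterns is not None:
        config["dangerous_patterns"] = list(patterns)
    return config


config = build_shell_config(
    model="gpt-5.1",
    api_key="sk-placeholder",  # placeholder value, not a real key
    workspace_dir="./safe_workspace",
    allowed=["ls", "cat", "grep", "pytest"],
    denied=["rm", "dd", "format"],
    timeout=120,
)
print(sorted(config))
```

Leaving `dangerous_patterns` out of the dictionary when no patterns are passed keeps the documented behavior in which None means the default patterns are used.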
Concurrent Command Execution
The shell tool supports executing multiple commands concurrently:
# Multiple commands can be executed in a single shell_call
# The model can generate multiple commands that run simultaneously
result = assistant.run(
    message="""
Please execute these commands concurrently:
1. List files in current directory
2. Check Python version
3. Show disk usage
""",
    max_turns=2,
).process()
Each command runs independently and returns its own result, allowing for efficient parallel execution of independent operations.
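To make the idea of independent, per-command results concrete, here is a minimal sketch using a thread pool from the standard library. It does not reproduce ShellExecutor's `run_commands()`; the helpers `run_one` and `run_concurrently` and the result fields are assumptions for illustration.

```python
import subprocess
import sys
from concurrent.futures import ThreadPoolExecutor


def run_one(argv):
    """Run a single command and capture its own result."""
    proc = subprocess.run(argv, capture_output=True, text=True, timeout=60)
    return {"argv": argv, "stdout": proc.stdout.strip(), "exit_code": proc.returncode}


def run_concurrently(commands):
    """Run all commands in parallel; results are returned in input order."""
    with ThreadPoolExecutor(max_workers=len(commands)) as pool:
        return list(pool.map(run_one, commands))


# The current Python interpreter stands in for real shell commands.
commands = [
    [sys.executable, "-c", "print('files')"],
    [sys.executable, "-c", "import sys; print(sys.version_info.major)"],
    [sys.executable, "-c", "import sys; sys.exit(3)"],
]
results = run_concurrently(commands)
for item in results:
    print(item["exit_code"], item["stdout"])
```

A failing command (the third one exits with status 3) does not affect the others, which is the property that makes this approach suitable only for operations that do not depend on each other.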
Practical Examples
Example 1: Filesystem Diagnostics
The shell tool excels at automating filesystem and process diagnostics:
# Example: Find files and show processes
result = assistant.run(
    message="""
Please help me with the following tasks:
1. ls to show files in current directory
2. Show me information about running Python processes
""",
    max_turns=2,
).process()
This enables agents to:
- List directory contents
- Check running processes
- Analyze system resources
- Diagnose filesystem issues
Example 2: Extending Capabilities with UNIX Utilities
The shell tool extends model capabilities by allowing access to UNIX utilities, Python runtime, and other CLIs:
# Example: Use UNIX utilities and Python CLI
result = assistant.run(
    message="""
Please help me:
1. Check the current Python version using the python CLI
2. Get system information like disk usage and memory
3. Create a simple text file and then use grep to search within it
""",
    max_turns=6,
).process()
This pattern enables:
- System information gathering
- File manipulation with standard tools
- Integration with existing CLI tools
- Cross-platform compatibility
Example 3: Multi-Step Build and Test Flows
The shell tool excels at running multi-step build and test flows:
# Example: Multi-step build and test flow
result = assistant.run(
    message="""
Please help me set up a simple Python project:
1. Create a directory called 'test_project'
2. Create a simple Python module with a function to test
3. Create a test file using pytest format
4. Install pytest if needed
5. Run the tests and show me the results
""",
    max_turns=3,
).process()
This demonstrates:
- Sequential command execution
- Dependency management
- Test automation
- Project setup automation
Security Best Practices
1. Always Configure Workspace Directory
Always specify a workspace directory for production use:
# ✅ Good: Workspace directory specified
llm_config = LLMConfig(
    config_list={
        "api_type": "responses",
        "model": "gpt-5.1",
        "api_key": os.getenv("OPENAI_API_KEY"),
        "built_in_tools": ["shell"],
        "workspace_dir": "./isolated_workspace",
    },
)
# ❌ Bad: No workspace directory
llm_config = LLMConfig(
    config_list={
        "api_type": "responses",
        "model": "gpt-5.1",
        "api_key": os.getenv("OPENAI_API_KEY"),
        "built_in_tools": ["shell"],
    },
)  # Commands can access entire filesystem
2. Use Command Restrictions
Implement command allow-lists or deny-lists:
# ✅ Good: Command restrictions configured
llm_config = LLMConfig(
    config_list={
        "api_type": "responses",
        "model": "gpt-5.1",
        "api_key": os.getenv("OPENAI_API_KEY"),
        "built_in_tools": ["shell"],
        "allowed_commands": ["ls", "cat", "grep", "pytest", "python"],
        "denied_commands": ["rm", "dd", "format", "mkfs"],
    },
)
# ❌ Bad: No command restrictions
llm_config = LLMConfig(
    config_list={
        "api_type": "responses",
        "model": "gpt-5.1",
        "api_key": os.getenv("OPENAI_API_KEY"),
        "built_in_tools": ["shell"],
    },
)  # All commands allowed (dangerous!)
3. Customize Dangerous Patterns
Add custom patterns for your specific use case:
# ✅ Good: Custom dangerous patterns
custom_patterns = [
    (r"\brm\s+-rf\s+/\s*$", "Root deletion blocked"),
    (r"\bformat\s+[A-Z]:", "Drive formatting blocked"),
    # Add domain-specific patterns
    (r"\bdrop\s+database", "Database deletion blocked"),
]
llm_config = LLMConfig(
    config_list={
        "api_type": "responses",
        "model": "gpt-5.1",
        "api_key": os.getenv("OPENAI_API_KEY"),
        "built_in_tools": ["shell"],
        "dangerous_patterns": custom_patterns,
    },
)
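Custom patterns are easy to get subtly wrong, so it may be worth checking them against sample commands before deploying them. The sketch below is a small table-driven check using only the `re` module; the helper `blocked_by` and the sample commands are illustrative assumptions, and whether the shell tool itself matches case-insensitively should be confirmed against its source.

```python
import re

custom_patterns = [
    (r"\brm\s+-rf\s+/\s*$", "Root deletion blocked"),
    (r"\bformat\s+[A-Z]:", "Drive formatting blocked"),
    (r"\bdrop\s+database", "Database deletion blocked"),
]


def blocked_by(command, patterns, flags=0):
    """Return the messages of every pattern that matches the command."""
    return [message for pattern, message in patterns if re.search(pattern, command, flags)]


# Each sample command maps to whether it is expected to be blocked.
cases = {
    "rm -rf /": True,
    "format C:": True,
    "psql -c 'DROP DATABASE prod'": True,
    "ls -la": False,
}
for command, should_block in cases.items():
    hits = blocked_by(command, custom_patterns, re.IGNORECASE)
    status = "ok" if bool(hits) == should_block else "UNEXPECTED"
    print(f"{status}: {command!r} -> {hits}")

# Without re.IGNORECASE the lowercase pattern misses the uppercase SQL statement.
print(blocked_by("psql -c 'DROP DATABASE prod'", custom_patterns))
```

The final line prints an empty list: the `drop\s+database` pattern is written in lowercase, so it only catches `DROP DATABASE` when matching ignores case.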
4. Monitor Command Execution
Implement logging and monitoring:
# Log all executed commands for audit purposes
# The shell tool returns detailed results including:
# - stdout: Command output
# - stderr: Error output
# - exit_code: Command exit status
# - timed_out: Whether command exceeded timeout
result = assistant.run(message="Execute command", max_turns=2).process()
# Log result for security auditing
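A possible shape for such an audit entry is sketched below with the standard `logging` and `json` modules. It assumes the command result is already available as a dictionary with the four fields listed above; how to extract that dictionary from an AG2 run is not shown here, and `audit_record` is a hypothetical helper.

```python
import json
import logging

logging.basicConfig(level=logging.INFO, format="%(asctime)s %(name)s %(message)s")
audit_log = logging.getLogger("shell_audit")


def audit_record(command, result, max_chars=200):
    """Build a single-line JSON audit entry from a command and its result dict."""
    record = {
        "command": command,
        "exit_code": result.get("exit_code"),
        "timed_out": result.get("timed_out", False),
        # Output is truncated so one noisy command cannot flood the audit log.
        "stdout": (result.get("stdout") or "")[:max_chars],
        "stderr": (result.get("stderr") or "")[:max_chars],
    }
    return json.dumps(record, sort_keys=True)


# Example result dictionary with made-up values for illustration.
sample_result = {"stdout": "total 0\n", "stderr": "", "exit_code": 0, "timed_out": False}
line = audit_record("ls -la", sample_result)
audit_log.info(line)
```

Writing one JSON object per line keeps the log easy to filter later, for example by exit code or by the `timed_out` flag.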
Troubleshooting
Common Issues
1. Commands Blocked by Security
If commands are being blocked, check your security configuration:
import re
from autogen.tools.experimental.shell.shell_tool import ShellExecutor
# Review the default dangerous patterns
for pattern, message in ShellExecutor.DEFAULT_DANGEROUS_PATTERNS:
    print(f"Pattern: {pattern}")
    print(f"Message: {message}\n")
# Check whether a command matches any pattern
command = "rm -rf /tmp/test"
for pattern, message in ShellExecutor.DEFAULT_DANGEROUS_PATTERNS:
    if re.search(pattern, command, re.IGNORECASE):
        print(f"Blocked: {message}")
2. Workspace Directory Issues
Verify workspace directory setup:
from pathlib import Path
workspace_dir = "./project_dir"
workspace_path = Path(workspace_dir).resolve()
# Create the directory if it does not exist yet
if not workspace_path.exists():
    print(f"Creating workspace directory: {workspace_path}")
    workspace_path.mkdir(parents=True, exist_ok=True)
# Verify write permissions by creating and removing a scratch file
test_file = workspace_path / "test.txt"
try:
    test_file.write_text("test")
    test_file.unlink()
    print("Workspace directory is writable")
except Exception as e:
    print(f"Workspace directory not writable: {e}")
3. Timeout Issues
Adjust timeout for long-running commands:
# Configure longer timeout for build commands
llm_config = LLMConfig(
    config_list={
        "api_type": "responses",
        "model": "gpt-5.1",
        "api_key": os.getenv("OPENAI_API_KEY"),
        "built_in_tools": ["shell"],
        "default_timeout": 300,  # 5 minutes for builds
    },
)
Benefits Summary
- Multi-Layer Security: Four layers of protection against dangerous commands
- Workspace Isolation: Commands execute in isolated directories
- Concurrent Execution: Multiple commands can run simultaneously
- Flexible Configuration: Customize security settings for your use case
- Production Ready: Built-in safeguards and validation
- Cross-Platform: Works on Mac/Linux and Windows
- Comprehensive Protection: Default patterns protect against common threats
Getting Started
- Install AG2 with OpenAI support:
pip install "ag2[openai]"
- Configure LLM with shell tool:
import os
from autogen import ConversableAgent, LLMConfig
llm_config = LLMConfig(
    config_list={
        "api_type": "responses",
        "model": "gpt-5.1",
        "api_key": os.getenv("OPENAI_API_KEY"),
        "built_in_tools": ["shell"],
        "workspace_dir": "./safe_workspace",
    },
)
- Create an agent:
assistant = ConversableAgent(
    name="Assistant",
    system_message="You are a helpful assistant with shell access.",
    llm_config=llm_config,
    human_input_mode="NEVER",
)
- Execute commands:
result = assistant.run(message="List the files in the current directory", max_turns=2).process()
- Add security controls:
llm_config = LLMConfig(
    config_list={
        "api_type": "responses",
        "model": "gpt-5.1",
        "api_key": os.getenv("OPENAI_API_KEY"),
        "built_in_tools": ["shell"],
        "allowed_commands": ["ls", "cat", "grep"],
        "denied_commands": ["rm", "dd"],
        "workspace_dir": "./safe_workspace",
    },
)
- Review the documentation: OpenAI Responses API
Additional Resources
- AG2 OpenAI Responses Documentation
- Shell Tool Example Notebook
- OpenAI Responses API Reference
- AG2 Agent Chat Documentation
AG2's shell tool provides command execution guarded by multiple layers of security protection. By combining workspace isolation, command filtering, and pattern matching, you can enable shell access in your agent workflows while keeping strict control over which commands can execute.