Code Execution
AG2 supports two ways to let an agent run code: have the LLM provider execute it inside their own sandbox, or run it client-side through a sandboxed backend you control. Both produce the same conversational pattern — the model writes code, code runs, results come back — but the trade-offs are different.
| | Built-in Provider | Remote (SandboxCodeTool) |
|---|---|---|
| Where it runs | Provider's sandbox | A CodeEnvironment you supply (Daytona, Docker, custom) |
| Setup | Add the tool, done | Choose / configure a backend |
| Cost | Bundled in provider tokens | Your sandbox bill (Daytona) or free (local Docker) |
| Custom packages, images | No | Yes |
| State persistence | Provider-defined | Per-environment instance |
| Provider support | Only providers with native code-exec | Any provider |
Built-in Provider Code Execution
Some providers expose a server-side Python sandbox the model can drive directly. AG2 surfaces this through CodeExecutionTool, a declaration-only tool that maps to each provider's native capability.
See the Built-in Tools page for the provider support matrix and version pinning.
When to use this
Cheapest to wire up, no infrastructure to run. The trade-off is no control over the runtime — you can't preinstall packages, persist files between calls, or use this on a provider without native code-execution support.
Remote Code Execution
SandboxCodeTool exposes a run_code(code, language) function the agent can call. Because it's a regular function tool, it works on any model provider. Where the code actually runs is decided by the CodeEnvironment you hand it.
environment is required
SandboxCodeTool has no default backend. Pass DaytonaCodeEnvironment, DockerCodeEnvironment, or your own CodeEnvironment implementation.
When to use this
You need custom packages or images, persistent state across calls, your own infrastructure, or you're working with a provider that doesn't have a native code-execution capability.
Two environments are available: Daytona (hosted) and Docker (local container). You can also implement your own; custom backends are covered after the two built-in ones.
Daytona is a hosted sandbox service. Strongest isolation; pay per use.
DaytonaCodeEnvironment reads DAYTONA_API_KEY, DAYTONA_API_URL, and DAYTONA_TARGET from the environment by default. Out-of-the-box supported languages: python, bash, javascript, typescript.
Docker runs code in a local container managed via the Docker daemon. It is free, cross-platform (macOS/Linux/Windows via Docker Desktop), and provides real container isolation.
Default supported languages: python and bash (both ship in python:3.12-slim). Add "javascript" / "typescript" only if your image has node / ts-node installed.
Safety defaults are deliberately strict:
- network_mode="none": no network access. Set to "bridge" to opt in.
- mem_limit="512m": caps runaway processes.
- auto_remove=True: the container is removed on stop.
- user=None: runs as the image's default user. For images that ship a nobody user, user="nobody" is recommended.
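For readers who want to see what these defaults mean at the Docker level, here is a minimal sketch that builds keyword arguments for the Docker SDK for Python's `containers.run` call. The helper name `strict_run_kwargs` and the keep-alive command are hypothetical, and whether DockerCodeEnvironment passes exactly these arguments internally is an assumption; only the four option names and values come from the list above.

```python
from typing import Any, Dict, Optional


def strict_run_kwargs(
    image: str = "python:3.12-slim",
    *,
    allow_network: bool = False,
    user: Optional[str] = None,
) -> Dict[str, Any]:
    """Build kwargs mirroring the strict defaults described above.

    Hypothetical helper: the result is shaped for
    docker.from_env().containers.run(**kwargs) from the Docker SDK for Python.
    """
    kwargs: Dict[str, Any] = {
        "image": image,
        "command": "sleep infinity",  # assumed keep-alive command, not from the docs
        "detach": True,               # return a container handle instead of blocking
        "network_mode": "bridge" if allow_network else "none",
        "mem_limit": "512m",          # cap memory for runaway processes
        "auto_remove": True,          # remove the container once it stops
    }
    if user is not None:
        # Only override the image's default user when asked to.
        kwargs["user"] = user
    return kwargs


default_kwargs = strict_run_kwargs()
hardened_kwargs = strict_run_kwargs(user="nobody", allow_network=True)
```

Leaving `user` out of the dictionary when it is `None` keeps the image's own default user, which matches the behaviour described for `user=None`.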
For a custom backend, SandboxCodeTool only depends on the CodeEnvironment protocol, so any backend that satisfies it works: e2b, an SSH host, or an internal CI runner.
The context argument is the active ConversationContext, forwarded so backends can resolve Variable markers from context.variables (e.g. per-tenant credentials). Backends with no runtime-configurable parameters can ignore it.
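The shape of a custom backend can be sketched with the standard library alone. Everything below is illustrative: the class name `LocalSubprocessEnvironment`, the `run` method, and the `RunResult` type are hypothetical stand-ins, not the actual CodeEnvironment protocol, so check the AG2 source for the real method names and return type. The sketch also provides no isolation, which a production backend must.

```python
import asyncio
import subprocess
import sys
import tempfile
from dataclasses import dataclass
from typing import Any, Optional


@dataclass
class RunResult:
    """Hypothetical result type; the real protocol may return something else."""
    exit_code: int
    output: str


class LocalSubprocessEnvironment:
    """Illustrative backend that runs snippets in a local subprocess.

    NOT sandboxed. Method names are assumptions, not the AG2 protocol.
    """

    supported_languages = ("python",)

    def __init__(self) -> None:
        self._workdir: Optional[tempfile.TemporaryDirectory] = None

    async def run(self, code: str, language: str, context: Any = None) -> RunResult:
        # This backend has no runtime-configurable parameters, so it ignores context.
        if language not in self.supported_languages:
            return RunResult(exit_code=1, output=f"unsupported language: {language}")
        if self._workdir is None:
            # Created lazily on the first call and reused afterwards.
            self._workdir = tempfile.TemporaryDirectory()
        # subprocess.run blocks, so hand it to a worker thread.
        proc = await asyncio.to_thread(
            subprocess.run,
            [sys.executable, "-c", code],
            cwd=self._workdir.name,
            capture_output=True,
            text=True,
            timeout=30,
        )
        return RunResult(proc.returncode, proc.stdout + proc.stderr)

    async def aclose(self) -> None:
        # Delete the working directory and forget it.
        if self._workdir is not None:
            self._workdir.cleanup()
            self._workdir = None
```

A quick check of the sketch would be `asyncio.run(LocalSubprocessEnvironment().run("print(6 * 7)", "python"))`, which returns a `RunResult` carrying the snippet's exit code and combined output.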
Lifecycle
The sandbox / container is created lazily on the first run_code call and reused for the lifetime of the environment instance. Cleanup is registered via atexit so resources are released even if you forget to close the environment. For tighter scoping, use the environment as an async context manager.
The same pattern works with DockerCodeEnvironment (container stopped + removed) and any other backend that implements __aenter__ / __aexit__.
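The lifecycle described above (lazy creation, reuse, an atexit safety net, and scoped cleanup through `async with`) can be illustrated with a small stand-in class. `LazyEnvironment` and its counters are hypothetical and exist only to make the behaviour observable; this is not how DaytonaCodeEnvironment or DockerCodeEnvironment are implemented.

```python
import asyncio
import atexit


class LazyEnvironment:
    """Stand-in illustrating lazy creation, reuse, and scoped cleanup."""

    def __init__(self) -> None:
        self.created = 0  # how many times a sandbox was provisioned
        self.closed = 0   # how many times one was torn down
        self._sandbox = None
        # Safety net: release resources at interpreter exit if never closed.
        atexit.register(self._close_sync)

    def _ensure_sandbox(self) -> object:
        if self._sandbox is None:
            self._sandbox = object()  # placeholder for a real sandbox handle
            self.created += 1
        return self._sandbox

    async def run_code(self, code: str, language: str = "python") -> str:
        self._ensure_sandbox()
        return f"ran {len(code)} chars of {language}"

    def _close_sync(self) -> None:
        if self._sandbox is not None:
            self._sandbox = None
            self.closed += 1

    async def aclose(self) -> None:
        self._close_sync()
        # Already cleaned up, so the exit hook is no longer needed.
        atexit.unregister(self._close_sync)

    async def __aenter__(self) -> "LazyEnvironment":
        return self

    async def __aexit__(self, exc_type, exc, tb) -> None:
        await self.aclose()


async def demo():
    async with LazyEnvironment() as env:
        before = env.created  # nothing is provisioned until the first call
        await env.run_code("print(1)")
        await env.run_code("print(2)")
        during = env.created  # the second call reused the first sandbox
    return before, during, env.closed


lifecycle_counts = asyncio.run(demo())
```

Scoping the environment with `async with` ties teardown to the block rather than to process exit, which matters in long-running services where atexit may not fire for a long time.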
Credentials and runtime config
DaytonaCodeEnvironment accepts Variable markers for api_key, api_url, target, image, snapshot, and env_vars. DockerCodeEnvironment accepts them for image, env_vars, and network_mode. Variables resolve from context.variables on the first run_code call, which is useful for multi-tenant setups where each tenant supplies its own credentials.
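To make the resolution step concrete, the following sketch shows one way marker substitution could work. The `Variable` dataclass here is a hypothetical stand-in with an assumed `name` field, and `resolve` is an illustrative helper; the real AG2 Variable class and its constructor may differ.

```python
from dataclasses import dataclass
from typing import Any, Mapping


@dataclass(frozen=True)
class Variable:
    """Hypothetical marker naming a key to look up at run time."""
    name: str


def resolve(value: Any, variables: Mapping[str, Any]) -> Any:
    """Replace Variable markers with values from variables, recursing into dicts."""
    if isinstance(value, Variable):
        if value.name not in variables:
            raise KeyError(f"missing context variable: {value.name}")
        return variables[value.name]
    if isinstance(value, dict):
        return {key: resolve(item, variables) for key, item in value.items()}
    return value  # plain values pass through unchanged


# One shared configuration, resolved differently per tenant.
config = {
    "api_key": Variable("tenant_api_key"),
    "image": "python:3.12-slim",
    "env_vars": {"TENANT_ID": Variable("tenant_id")},
}
tenant_a = resolve(config, {"tenant_api_key": "key-a", "tenant_id": "a"})
tenant_b = resolve(config, {"tenant_api_key": "key-b", "tenant_id": "b"})
```

Because the configuration only holds markers, the same environment definition can be declared once and bound to different credentials when each conversation makes its first call.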
State persistence
A single CodeEnvironment instance reuses the same sandbox / container across every run_code call routed through it. Files written in one call are visible in the next, and packages installed once stay installed. Each snippet still runs as a fresh process, so Python globals defined in one call are not visible to the next — persist state on disk.
| Scenario | Same sandbox? |
|---|---|
| Multiple run_code calls within one agent.ask(...) | yes |
| agent.ask(...) → reply.ask(...) (same agent, same tool) | yes |
| Two agents sharing the same SandboxCodeTool instance | yes (shared filesystem state) |
| New CodeEnvironment(...) per request | no (each spins up its own sandbox) |
| After await env.aclose() or process exit | no (sandbox is deleted) |
For most chat-style agents, instantiate one CodeEnvironment per agent (or per conversation) so state is scoped the way you'd expect. The first tool call pays the sandbox-creation round-trip; subsequent calls reuse it.
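The difference between filesystem state and in-process state can be reproduced locally without any sandbox. The sketch below uses a temporary directory as a stand-in for the shared sandbox filesystem and a fresh Python process per snippet; `run_snippet` is a hypothetical helper, not part of AG2.

```python
import subprocess
import sys
import tempfile


def run_snippet(code: str, workdir: str) -> subprocess.CompletedProcess:
    """Run one snippet as a fresh Python process inside a shared directory."""
    return subprocess.run(
        [sys.executable, "-c", code],
        cwd=workdir,
        capture_output=True,
        text=True,
    )


with tempfile.TemporaryDirectory() as workdir:
    # Call 1: define a global and write it to disk.
    first = run_snippet(
        "counter = 41\nopen('state.txt', 'w').write(str(counter))", workdir
    )
    # Call 2: the file written by call 1 is still there.
    second = run_snippet("print(int(open('state.txt').read()) + 1)", workdir)
    # Call 3: the global from call 1 is gone, so this raises NameError.
    third = run_snippet("print(counter)", workdir)
```

The second call succeeds because it reads from disk, while the third fails because `counter` lived only in the first process. Anything an agent needs across calls should therefore be written to a file.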