Code Execution
AG2 agents can execute code from a message passed to them (e.g., a message containing code blocks) and output a message with the results of the execution for the next agent to interpret.
There are two types of built-in code executors: the command line code executor, which runs code in a command line environment such as a macOS or Linux shell, and the Jupyter executor, which runs code in an interactive Jupyter kernel.
For each type of executor, AG2 provides two ways to execute code: locally and in a Docker container. Running code locally, i.e., on the same host platform where AG2 is running, is convenient for development and testing but is not recommended for production. For better isolation, execute code in a Docker container instead. The table below shows the combinations of code executors and execution environments.
| Code Executor (`autogen.coding`) | Environment | Platform |
| --- | --- | --- |
| `LocalCommandLineCodeExecutor` | Shell | Local |
| `DockerCommandLineCodeExecutor` | Shell | Docker |
| `jupyter.JupyterCodeExecutor` | Jupyter Kernel (e.g., python3) | Local/Docker |
Local Execution
The figure below shows the architecture of the local command line code executor (`autogen.coding.LocalCommandLineCodeExecutor`).
:::danger
Executing LLM-generated code poses a security risk to your host environment.
:::
Upon receiving a message with a code block, the local command line code executor first writes the code block to a code file, then starts a new subprocess to execute the code file. The executor reads the console output of the code execution and sends it back as a reply message.
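To make the mechanics concrete, here is a minimal sketch (not from the original example) that calls the executor directly, without an agent. The `coding` working directory and the 10-second timeout are arbitrary choices for illustration:

```python
from pathlib import Path

from autogen.coding import CodeBlock, LocalCommandLineCodeExecutor

# Directory where the executor writes the generated code files and their output.
work_dir = Path("coding")
work_dir.mkdir(exist_ok=True)

# Each call writes the code block to a file and runs it in a new subprocess.
executor = LocalCommandLineCodeExecutor(timeout=10, work_dir=work_dir)
result = executor.execute_code_blocks(
    code_blocks=[CodeBlock(language="python", code="print('hello world')")]
)
print(result.exit_code, result.output)
```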
Here is an example of using the code executor to run a Python code block that plots random numbers. Before running this example, make sure `matplotlib` and `numpy` are installed.
First we create an agent with the code executor that uses a temporary directory to store the code files. We specify `human_input_mode="ALWAYS"` so we can manually validate the safety of the code being executed.
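A sketch of this setup, assuming the agent name `code_executor_agent` and a 10-second timeout (adjust both as needed), might look like this:

```python
import tempfile

from autogen import ConversableAgent
from autogen.coding import LocalCommandLineCodeExecutor

# Temporary directory to store the code files written by the executor.
temp_dir = tempfile.TemporaryDirectory()

# Local executor: runs each received code block in a new subprocess.
executor = LocalCommandLineCodeExecutor(
    timeout=10,              # maximum seconds allowed per code execution
    work_dir=temp_dir.name,  # where code files and outputs are written
)

# This agent has no LLM; it only executes code, and a human approves each run.
code_executor_agent = ConversableAgent(
    "code_executor_agent",
    llm_config=False,
    code_execution_config={"executor": executor},
    human_input_mode="ALWAYS",
)
```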
Now we have the agent generate a reply given a message with a Python code block. While the reply is being generated, a human input is requested, giving us an opportunity to intercept the code execution. In this case, we choose to continue the execution, and the agent's reply contains the output of the code execution.
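A sketch of this step, using an illustrative message whose code block plots random numbers with `numpy` and `matplotlib` (the exact code block is up to you), could look like this:

````python
message_with_code_block = """This is a message with a code block.
The code block is below:
```python
import numpy as np
import matplotlib.pyplot as plt

x = np.random.randint(0, 100, 100)
y = np.random.randint(0, 100, 100)
plt.scatter(x, y)
plt.savefig("scatter.png")
print("Scatter plot saved to scatter.png")
```
This is the end of the message.
"""

# With human_input_mode="ALWAYS", you are prompted before the code block runs.
reply = code_executor_agent.generate_reply(
    messages=[{"role": "user", "content": message_with_code_block}]
)
print(reply)
````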
We can take a look at the generated plot in the temporary directory.
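For example, listing the working directory shows the file saved by the executed code block (the filename `scatter.png` comes from the illustrative message above):

```python
import os

# The plot was written to the executor's working directory.
print(os.listdir(temp_dir.name))
```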
Docker Execution
To mitigate the security risk of running LLM-generated code locally, we can use the Docker command line code executor (`autogen.coding.DockerCommandLineCodeExecutor`) to execute code in a Docker container. This way, the generated code can only access resources that are explicitly given to it.
The figure below illustrates how Docker execution works. Similar to the local command line code executor, the Docker executor extracts code blocks from input messages and writes them to code files. For each code file, it starts a Docker container to execute it and reads the console output of the code execution.
To use Docker execution, you need to install Docker on your machine. Once you have Docker installed and running, you can set up your code executor agent as follows:
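Here is a sketch of such an agent, reusing the temporary-directory pattern from the local example; the `python:3-slim` image and the 10-second timeout are illustrative choices:

```python
import tempfile

from autogen import ConversableAgent
from autogen.coding import DockerCommandLineCodeExecutor

temp_dir = tempfile.TemporaryDirectory()

# Docker executor: each code block is run inside a container based on `image`.
executor = DockerCommandLineCodeExecutor(
    image="python:3-slim",   # image used to run the code
    timeout=10,              # maximum seconds allowed per code execution
    work_dir=temp_dir.name,  # host directory mounted into the container
)

code_executor_agent_using_docker = ConversableAgent(
    "code_executor_agent_docker",
    llm_config=False,
    code_execution_config={"executor": executor},
    human_input_mode="ALWAYS",
)

# Stop the container when you are done with code execution.
# executor.stop()
```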
The `work_dir` in the constructor points to a local file system directory, just as in the local execution case. The Docker container mounts this directory, and the executor writes code files and output to it.
Use Code Execution in Conversation
Writing and executing code is necessary for many tasks such as data analysis, machine learning, and mathematical modeling. In AG2, coding can be a conversation between a code writer agent and a code executor agent, mirroring the interaction between a programmer and a code interpreter.
The code writer agent can be powered by any LLM with code-writing capability, while the code executor agent is powered by a code executor. The following is an agent with a code writer role specified using `system_message`. The system message contains important instructions on how to use the code executor in the code executor agent.
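A condensed sketch of such an agent is shown below. The system message here is an abbreviated example, and the `llm_config` assumes an OpenAI-compatible model configured via the `OPENAI_API_KEY` environment variable; substitute your own configuration:

```python
import os

from autogen import ConversableAgent

# Abbreviated example system message: instruct the LLM to reply with code in
# markdown code blocks so the code executor agent can pick them up and run them.
code_writer_system_message = (
    "You are a helpful AI assistant. Solve tasks using your coding skills. "
    "When you need to collect information or perform an action, suggest Python "
    "code in a markdown code block for the user to execute, and use print "
    "statements for any output you need to see. Reply 'TERMINATE' when the "
    "task is done."
)

code_writer_agent = ConversableAgent(
    "code_writer_agent",
    system_message=code_writer_system_message,
    llm_config={"config_list": [{"model": "gpt-4", "api_key": os.environ["OPENAI_API_KEY"]}]},
    code_execution_config=False,  # this agent only writes code; it never executes it
)
```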
Now we can try a more complex example that involves using external packages to fetch data. Let's say we want to get the year-to-date stock price gains for Tesla and Meta (formerly Facebook). This task uses the same two agents, this time over several iterations of conversation.
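One way to kick off such a conversation is sketched below, with the code executor agent from the local example initiating the chat; the task wording is illustrative:

```python
import datetime

today = datetime.datetime.now().strftime("%Y-%m-%d")

# The executor agent starts the chat; whenever the writer replies with a code
# block, the executor runs it and sends the console output back to the writer.
chat_result = code_executor_agent.initiate_chat(
    code_writer_agent,
    message=f"Today is {today}. Write Python code to plot the year-to-date "
            "stock price gains for TSLA and META, and save the plot to a file.",
)
```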
In the previous conversation, the code writer agent generated a code block to install the necessary packages and another code block with a script that fetches the stock prices and calculates the year-to-date gains for Tesla and Meta. The code executor agent installed the packages, executed the script, and returned the results.
Let’s take a look at the chart that was generated.
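Since the generated script saved its output into the executor's working directory, listing that directory reveals the chart file. The filename below is hypothetical; use whatever name the generated code actually chose:

```python
import os

from IPython.display import Image

# Find the file written by the generated script.
print(os.listdir(temp_dir.name))

# Display the chart; replace "stock_gains.png" with the actual filename.
# Image(os.path.join(temp_dir.name, "stock_gains.png"))
```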
Command Line or Jupyter Code Executor?
The command line code executor does not keep any state in memory between executions of different code blocks it receives, as it writes each code block to a separate file and executes the code block in a new process.
In contrast to the command line code executor, the Jupyter code executor runs all code blocks in the same Jupyter kernel, which keeps state in memory between executions.
The choice between command line and Jupyter code executor depends on the nature of the code blocks in agents’ conversation. If each code block is a “script” that does not use variables from previous code blocks, the command line code executor is a good choice. If some code blocks contain expensive computations (e.g., training a machine learning model and loading a large amount of data), and you want to keep the state in memory to avoid repeated computations, the Jupyter code executor is a better choice.
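The difference is easy to see with a small sketch using the command line executor: a variable defined in one code block is gone by the time the next block runs, because each block executes in a fresh process.

```python
from autogen.coding import CodeBlock, LocalCommandLineCodeExecutor

executor = LocalCommandLineCodeExecutor(work_dir=".")

# The first block defines a variable...
executor.execute_code_blocks([CodeBlock(language="python", code="x = 42")])

# ...but the second block runs in a brand-new process, so `x` no longer exists.
result = executor.execute_code_blocks([CodeBlock(language="python", code="print(x)")])
print(result.exit_code)  # non-zero exit code: the second block raises a NameError
```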
More Code Execution examples
- Task Solving with Code Generation, Execution, and Debugging
- Auto-Generated Agent Chat: Task Solving with Code Gen, Execution, Debugging & Human Feedback