Tool Use
In the previous chapter, we explored code executors, which give agents the superpower of programming. Letting agents write arbitrary code is useful; however, controlling what code an agent writes can be challenging. This is where tools come in.
Tools are pre-defined functions that agents can use. Instead of writing arbitrary code, agents can call tools to perform actions, such as searching the web, performing calculations, reading files, or calling remote APIs. Because you can control what tools are available to an agent, you can control what actions an agent can perform.
Creating Tools
Tools can be created as regular Python functions. For example, let’s create a calculator tool which can only perform a single operation at a time.
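A minimal version of such a calculator might look like the following sketch. The exact names are illustrative; note that the operator is restricted to the four basic operations, and division truncates to an integer:

```python
from typing import Annotated, Literal

Operator = Literal["+", "-", "*", "/"]


def calculator(a: int, b: int, operator: Annotated[Operator, "operator"]) -> int:
    """A simple calculator that applies a single operator to two integers."""
    if operator == "+":
        return a + b
    elif operator == "-":
        return a - b
    elif operator == "*":
        return a * b
    elif operator == "/":
        # Truncate the quotient to an integer.
        return int(a / b)
    else:
        raise ValueError("Invalid operator")
```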
The above function takes three arguments: a and b are the integer numbers to be operated on; operator is the operation to be performed. We used type hints to define the types of the arguments and the return value.
Registering Tools
Once you have created a tool, you can register it with the agents that are involved in conversation.
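As a sketch, registration might look like this. The agent names, system messages, and model configuration are illustrative assumptions; it uses the calculator function defined earlier and expects an OpenAI API key in the environment:

```python
import os

from autogen import ConversableAgent

# The assistant agent suggests tool calls, so it needs an LLM config.
assistant = ConversableAgent(
    name="Assistant",
    system_message="You are a helpful AI assistant. "
    "You can help with simple calculations. "
    "Return 'TERMINATE' when the task is done.",
    llm_config={"config_list": [{"model": "gpt-4", "api_key": os.environ["OPENAI_API_KEY"]}]},
)

# The user proxy agent executes the tool calls, so it needs no LLM.
user_proxy = ConversableAgent(
    name="User",
    llm_config=False,
    is_termination_msg=lambda msg: msg.get("content") is not None
    and "TERMINATE" in msg["content"],
    human_input_mode="NEVER",
)

# Register the tool signature with the assistant agent.
assistant.register_for_llm(name="calculator", description="A simple calculator")(calculator)

# Register the tool function object with the user proxy agent.
user_proxy.register_for_execution(name="calculator")(calculator)
```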
In the above code, we registered the calculator function as a tool with the assistant and user proxy agents. We also provided a name and a description for the tool so that the assistant agent can understand its usage.
Similar to code executors, a tool must be registered with at least two agents for it to be useful in conversation. The agent registered with the tool’s signature through register_for_llm can call the tool; the agent registered with the tool’s function object through register_for_execution can execute the tool’s function.
Alternatively, you can use the autogen.register_function function to register a tool with both agents at once.
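A sketch of the one-call form, assuming the calculator function, assistant, and user proxy agents created earlier in this chapter:

```python
from autogen import register_function

# Register the calculator with both agents at once.
register_function(
    calculator,
    caller=assistant,     # the assistant agent can suggest calls to the calculator
    executor=user_proxy,  # the user proxy agent can execute the calculator calls
    name="calculator",
    description="A simple calculator",
)
```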
Using the Tool
Once the tool is registered, we can use it in conversation. In the code below, we ask the assistant to perform some arithmetic calculation using the calculator tool.
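For example, the conversation might be started like this (the arithmetic question is illustrative; it assumes the agents set up earlier in this chapter):

```python
# Kick off a conversation; the assistant will suggest calculator calls
# and the user proxy will execute them.
chat_result = user_proxy.initiate_chat(
    assistant,
    message="What is (44232 + 13312 / (232 - 32)) * 5?",
)
```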
Let’s verify the answer:
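Assuming the question asked was, for example, "What is (44232 + 13312 / (232 - 32)) * 5?", we can check the result with plain Python, remembering that the calculator truncates division to an integer:

```python
# Reproduce the calculation the agent performed step by step,
# using integer (truncating) division like the calculator tool.
result = (44232 + int(13312 / (232 - 32))) * 5
print(result)  # 221490
```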
The answer is correct. You can see that the assistant is able to understand the tool’s usage and perform the calculation correctly.
Tool Schema
If you are familiar with OpenAI’s tool use API, you might be wondering why we didn’t create a tool schema. In fact, the tool schema is automatically generated from the function signature and the type hints. You can see the tool schema by inspecting the llm_config attribute of the agent.
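As a sketch, assuming an assistant agent that has the calculator registered through register_for_llm as described above, the generated schema can be inspected like so:

```python
import json

# The auto-generated tool schema is stored under the "tools" key
# of the agent's llm_config.
print(json.dumps(assistant.llm_config["tools"], indent=2))
```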
You can see the tool schema has been automatically generated from the function signature and the type hints, as well as the description. This is why it is important to use type hints and provide a clear description for the tool, as the LLM uses them to understand the tool’s usage.
You can also use a Pydantic model in the type hints to provide a more complex type schema. In the example below, we use a Pydantic model to define the calculator input.
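A sketch of this variant, where the field names and descriptions are illustrative, might look like the following. The function now takes a single model instance instead of three separate arguments:

```python
from typing import Annotated, Literal

from pydantic import BaseModel, Field

Operator = Literal["+", "-", "*", "/"]


class CalculatorInput(BaseModel):
    a: Annotated[int, Field(description="The first number.")]
    b: Annotated[int, Field(description="The second number.")]
    operator: Annotated[Operator, Field(description="The operator.")]


def calculator(input: Annotated[CalculatorInput, "Input to the calculator."]) -> int:
    """Apply a single operator to the two numbers in the input model."""
    if input.operator == "+":
        return input.a + input.b
    elif input.operator == "-":
        return input.a - input.b
    elif input.operator == "*":
        return input.a * input.b
    elif input.operator == "/":
        # Truncate the quotient to an integer.
        return int(input.a / input.b)
    else:
        raise ValueError("Invalid operator")
```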
Same as before, we register the tool with the agents using the name "calculator".
You can see the tool schema has been updated to reflect the new type schema.
Let’s use the tool in conversation.
Let’s verify the answer:
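Assuming the question asked was, for example, "What is (1423 - 123) / 3 + (32 + 23) * 5?", we can again check the result with plain Python, keeping the calculator's truncating division:

```python
# Reproduce the nested calculation with integer (truncating) division.
result = int((1423 - 123) / 3) + (32 + 23) * 5
print(result)  # 708
```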
Again, the answer is correct. You can see that the assistant is able to understand the new tool schema and perform the calculation correctly.
How to hide tool usage and code execution within a single agent?
Sometimes it is preferable to hide the tool usage inside a single agent, i.e., the tool call and tool response messages are kept invisible from outside of the agent, and the agent responds to outside messages with tool usages as “internal monologues”. For example, you might want to build an agent similar to OpenAI’s Assistant, which executes built-in tools internally.
To achieve this, you can use nested chats. Nested chats allow you to create “internal monologues” within an agent to call and execute tools. This works for code execution as well. See nested chats for tool use for an example.
Summary
In this chapter, we showed you how to create, register, and use tools. Tools allow agents to perform actions without writing arbitrary code. In the next chapter, we will introduce conversation patterns and show how to use the result of a conversation.