Ollama
Ollama is a local inference engine that enables you to run open-weight LLMs in your environment. It has native support for a large number of models, such as Google’s Gemma, Meta’s Llama 2/3/3.1, Microsoft’s Phi 3, Mistral AI’s Mistral/Mixtral, and Cohere’s Command R models.
Note: Previously, using Ollama with AutoGen required LiteLLM. Now Ollama can be used directly and supports tool calling.
Features
When using this Ollama client class, messages are tailored to accommodate the specific requirements of Ollama’s API, including message role sequences, support for function/tool calling, and token usage.
Installing Ollama
For Mac and Windows, download Ollama from the Ollama website.
For Linux:
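Ollama provides an install script for Linux (see the Ollama site for the current command):

```bash
curl -fsSL https://ollama.com/install.sh | sh
```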
Downloading models for Ollama
Ollama has a library of models to choose from; see the Ollama model library for the full list.
Before you can use a model, you need to download it (using the name of the model from the library):
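For example, to download Meta’s Llama 3.1 (the model used later in this guide):

```bash
ollama pull llama3.1
```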
To view the models you have downloaded and can use:
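```bash
ollama list
```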
Getting started with AutoGen and Ollama
When installing AutoGen, you need to install the pyautogen package with the Ollama library:
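```bash
pip install pyautogen[ollama]
```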
See the sample OAI_CONFIG_LIST below showing how the Ollama client class is used by specifying the api_type as ollama.
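A minimal sketch of such an entry (the llama3.1 model name is illustrative; use any model you have pulled):

```python
[
    {
        "model": "llama3.1",
        "api_type": "ollama"
    }
]
```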
If you need to specify the URL for your Ollama install, use the client_host key in your config as per the below example:
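(The host address and model name below are illustrative; 11434 is Ollama’s default port.)

```python
[
    {
        "model": "llama3.1",
        "api_type": "ollama",
        "client_host": "http://192.168.0.1:11434"
    }
]
```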
API parameters
The following Ollama parameters can be added to your config. See the Ollama documentation for further information on them.
- num_predict (integer): -1 is infinite, -2 is fill context, 128 is default
- repeat_penalty (float)
- seed (integer)
- stream (boolean)
- temperature (float)
- top_k (integer)
- top_p (float)
Example:
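A sketch of a config entry using these parameters (all values below are illustrative, not recommendations):

```python
[
    {
        "model": "llama3.1:8b",
        "api_type": "ollama",
        "num_predict": -1,
        "repeat_penalty": 1.1,
        "seed": 42,
        "stream": False,
        "temperature": 1,
        "top_k": 50,
        "top_p": 0.8
    }
]
```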
Two-Agent Coding Example
In this example, we run a two-agent chat with an AssistantAgent (primarily a coding agent) to generate code that counts the number of prime numbers between 1 and 10,000; the code is then executed.
We’ll use Meta’s Llama 3.1 model, which is suitable for coding.
In this example we will specify the URL for the Ollama installation using client_host.
Importantly, we have tweaked the system message so that the model doesn’t return the termination keyword (which we’ve changed to FINISH) in the same message as a code block.
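A sketch of the setup, assuming a local Ollama host, the llama3.1:8b model tag, and illustrative system message wording:

```python
from pathlib import Path

from autogen import AssistantAgent, ConversableAgent
from autogen.coding import LocalCommandLineCodeExecutor

config_list = [
    {
        "model": "llama3.1:8b",  # illustrative model tag, use one you have pulled
        "api_type": "ollama",
        "client_host": "http://192.168.0.1:11434",  # replace with your Ollama host
    }
]

# Executor that runs the generated code in a local working directory.
workdir = Path("coding")
workdir.mkdir(exist_ok=True)
code_executor = LocalCommandLineCodeExecutor(work_dir=workdir)

# Agent that executes the code blocks and returns the output.
user_proxy_agent = ConversableAgent(
    name="User",
    code_execution_config={"executor": code_executor},
    is_termination_msg=lambda msg: "FINISH" in (msg.get("content") or ""),
    human_input_mode="NEVER",
    llm_config=False,
)

# Illustrative system message: the model should only say FINISH on its own,
# never in the same message as a code block.
system_message = (
    "You are a helpful AI assistant who writes code and the user executes it. "
    "Solve the task using your coding skills and provide complete Python code "
    "in a python code block for the user to run. When the task is done and the "
    "result verified, reply with the single word FINISH and nothing else. "
    "Never include FINISH in the same message as a code block."
)

assistant_agent = AssistantAgent(
    name="Ollama Assistant",
    system_message=system_message,
    llm_config={"config_list": config_list},
)
```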
We can now start the chat.
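For example (the task wording is illustrative):

```python
chat_result = user_proxy_agent.initiate_chat(
    assistant_agent,
    message="Count how many prime numbers there are between 1 and 10000.",
)
```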
Tool Calling - Native vs Manual
Ollama supports native tool calling (Ollama v0.3.1 library onward). If you install AutoGen with pip install pyautogen[ollama] you will be able to use native tool calling.
The parameter native_tool_calls in your configuration allows you to specify if you want to use Ollama’s native tool calling (default) or manual tool calling.
Native tool calling only works with certain models and an exception will be thrown if you try to use it with an unsupported model.
Manual tool calling allows you to use tool calling with any Ollama model. It injects tool calling guidance messages into the prompt that walk the LLM through the process of selecting a tool and then evaluating the result of the tool. As is to be expected, the ability to follow instructions and return formatted JSON is highly dependent on the model.
You can tailor the manual tool calling messages by adding these parameters to your configuration:
- manual_tool_call_instruction
- manual_tool_call_step1
- manual_tool_call_step2
To use manual tool calling, set native_tool_calls to False.
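For example (model name illustrative):

```python
[
    {
        "model": "llama3.1",
        "api_type": "ollama",
        "native_tool_calls": False
    }
]
```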
Reducing repetitive tool calls
When tools are incorporated into a conversation, LLMs can often continually recommend calling them, even after they’ve been called and a result returned. This can lead to a never-ending cycle of tool calls.
To remove the chance of an LLM recommending a tool call, an additional parameter called hide_tools can be used to specify when tools are hidden from the LLM. The string values for the parameter are:
- ‘never’: tools are never hidden
- ‘if_all_run’: tools are hidden if all tools have been called
- ‘if_any_run’: tools are hidden if any tool has been called
This can be used with native or manual tool calling; an example configuration is shown below.
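For example, hiding the tools once any of them has been called (model name illustrative):

```python
[
    {
        "model": "llama3.1",
        "api_type": "ollama",
        "hide_tools": "if_any_run"
    }
]
```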
Tool Call Example
In this example, instead of writing code, we will have an agent assist with some trip planning using multiple tool calls.
Again, we’ll use Meta’s versatile Llama 3.1.
Native Ollama tool calling will be used and we’ll utilise the hide_tools parameter to hide the tools once all have been called.
We’ll create our agents. Importantly, we’re using native Ollama tool calling, and to help guide it we add the expected JSON format to the system_message so that the number fields aren’t wrapped in quotes (becoming strings).
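A sketch of the agent setup, assuming a local Ollama host; the model tag, system message wording, and termination phrase are illustrative:

```python
from autogen import ConversableAgent

config_list = [
    {
        "model": "llama3.1:8b",  # illustrative model tag
        "api_type": "ollama",
        "client_host": "http://192.168.0.1:11434",  # replace with your Ollama host
        "native_tool_calls": True,
        "hide_tools": "if_all_run",  # hide the tools once they have all been called
    }
]

# The assistant suggests tool calls; the example JSON in the system message
# nudges the model to emit numbers without surrounding quotes.
chatbot = ConversableAgent(
    name="chatbot",
    system_message=(
        "For currency exchange and weather forecasting tasks, only use the "
        "functions you have been provided with. Example of the return JSON is: "
        '{"parameter_1_name": 100.00, "parameter_2_name": "ABC", '
        '"parameter_3_name": "DEF"}. Output "HAVE FUN!" when an answer has been '
        "provided."
    ),
    llm_config={"config_list": config_list},
)

# The user proxy executes the suggested tool calls and returns the results.
user_proxy = ConversableAgent(
    name="user_proxy",
    is_termination_msg=lambda msg: "HAVE FUN!" in (msg.get("content") or ""),
    human_input_mode="NEVER",
    max_consecutive_auto_reply=1,
    llm_config=False,
)
```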
Create and register our functions (tools). See the tutorial chapter on tool use for more information.
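A sketch of two hypothetical trip-planning tools, a currency converter and a weather lookup, registered for execution on the user proxy and for LLM suggestion on the chatbot; the function names, signatures, and canned return values are illustrative:

```python
from typing import Annotated, Literal

CurrencySymbol = Literal["USD", "EUR"]


@user_proxy.register_for_execution()
@chatbot.register_for_llm(description="Currency exchange calculator.")
def currency_calculator(
    base_amount: Annotated[float, "Amount of currency in base_currency"],
    base_currency: Annotated[CurrencySymbol, "Base currency"] = "USD",
    quote_currency: Annotated[CurrencySymbol, "Quote currency"] = "EUR",
) -> str:
    # Fixed illustrative exchange rate.
    rate = 1.1 if base_currency == "USD" else 1 / 1.1
    return f"{base_amount * rate:.2f} {quote_currency}"


@user_proxy.register_for_execution()
@chatbot.register_for_llm(description="Weather forecast for US cities.")
def weather_forecast(
    city: Annotated[str, "City name"],
) -> str:
    # Canned illustrative forecast.
    return f"{city} will be sunny and 25 degrees Celsius."
```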
And run it!
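For example (the prompt and summary method are illustrative):

```python
res = user_proxy.initiate_chat(
    chatbot,
    message=(
        "I'm going to New York in March. How much is 123.45 EUR in USD, "
        "and what will the weather be like?"
    ),
    summary_method="reflection_with_llm",
)
print(res.summary)
```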
Great, we can see that Llama 3.1 chose the right functions and their parameters, and then summarised the results for us.