LiteLLM with Ollama
LiteLLM is an open-source, locally run proxy server that provides an OpenAI-compatible API. It interfaces with a large number of providers that perform the inference. A popular open-source inference engine for running models locally is Ollama.
As not all proxy servers support OpenAI's function calling (usable with AutoGen), LiteLLM together with Ollama enables this useful feature.
Running this stack requires the installation of:
- AutoGen (installation instructions)
- LiteLLM
- Ollama
Note: We recommend using a virtual environment for your stack, see this article for guidance.
Installing LiteLLM
Install LiteLLM with the proxy server functionality:
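The proxy server components ship as an optional extra, so a typical install looks like:

```bash
pip install 'litellm[proxy]'
```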
Note: If using Windows, run LiteLLM and Ollama within WSL2.
Installing Ollama
For Mac and Windows, download Ollama.
For Linux:
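Ollama publishes an install script for Linux:

```bash
curl -fsSL https://ollama.com/install.sh | sh
```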
Downloading models
Ollama has a library of models to choose from; see them here.
Before you can use a model, you need to download it (using the name of the model from the library):
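For example, to download Meta's Llama3 (the model used later in this guide):

```bash
ollama pull llama3
```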
To view the models you have downloaded and can use:
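```bash
ollama list
```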
Running LiteLLM proxy server
To run LiteLLM with the model you have downloaded, in your terminal:
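For example, to serve the Llama3 model downloaded earlier (the `ollama/` prefix tells LiteLLM to route requests to Ollama):

```bash
litellm --model ollama/llama3
```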
This will run the proxy server, and it will be available at `http://0.0.0.0:4000/`.
Using LiteLLM+Ollama with AutoGen
Now that we have the URL for the LiteLLM proxy server, we can use it within AutoGen in the same way as OpenAI or other cloud-based inference endpoints.
As you are running this proxy server locally, no API key is required.
Additionally, as the model is set when running the LiteLLM command, no model name needs to be configured in AutoGen. However, `model` and `api_key` are mandatory fields for configurations within AutoGen, so we put dummy values in them, as per the example below.

An additional configuration setting is `price`, which can be used to set the pricing of tokens. As we're running locally, we'll set our costs to zero. Using this setting will also avoid a prompt being shown when the price can't be determined.
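A configuration sketch along these lines, followed by a simple two-agent chat (the `base_url` assumes the default LiteLLM address shown above, and the agent names and message are illustrative):

```python
import autogen

local_llm_config = {
    "config_list": [
        {
            "model": "NotRequired",             # dummy value; the model is set by the LiteLLM command
            "api_key": "NotRequired",           # dummy value; no API key needed for a local server
            "base_url": "http://0.0.0.0:4000",  # the LiteLLM proxy server URL from above
            "price": [0, 0],                    # zero cost per 1k prompt/completion tokens
        }
    ],
    "cache_seed": None,  # disable response caching (optional)
}

# A simple two-agent chat against the local model
assistant = autogen.AssistantAgent("agent", llm_config=local_llm_config)
user_proxy = autogen.UserProxyAgent("user", code_execution_config=False)
user_proxy.initiate_chat(assistant, message="Why is the sky blue?")
```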
Example with Function Calling
Function calling (also known as tool calling) is a feature of OpenAI's API that AutoGen, LiteLLM, and Ollama all support.
Below is an example of using function calling with LiteLLM and Ollama, based on this currency conversion notebook.
LiteLLM is loaded in the same way as in the previous example, and we'll continue to use Meta's Llama3 model, as it is good at constructing the required function-calling messages.
Note: LiteLLM version 1.41.27, or later, is required (to support function calling natively using Ollama).
In your terminal:
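```bash
litellm --model ollama_chat/llama3
```

The `ollama_chat/` prefix (rather than `ollama/`) routes requests through LiteLLM's Ollama chat provider, which is the path that supports function calling.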
Then we run our program with function calling.
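The program below is a condensed sketch in the spirit of that notebook; the fixed exchange rates, agent names, and prompts are illustrative. The tool is registered with both agents: the chatbot suggests calls to it, and the user proxy executes them.

```python
from typing import Literal

from typing_extensions import Annotated

import autogen

# Configuration as in the previous section (dummy values; the model is set by the LiteLLM command)
local_llm_config = {
    "config_list": [
        {
            "model": "NotRequired",
            "api_key": "NotRequired",
            "base_url": "http://0.0.0.0:4000",
            "price": [0, 0],
        }
    ],
    "cache_seed": None,
}

# The agent that suggests function calls
chatbot = autogen.AssistantAgent(
    name="chatbot",
    system_message=(
        "For currency exchange tasks, only use the functions you have been provided with. "
        "Output 'TERMINATE' when an answer has been provided."
    ),
    llm_config=local_llm_config,
)

# The agent that executes the suggested function calls
user_proxy = autogen.UserProxyAgent(
    name="user_proxy",
    is_termination_msg=lambda x: x.get("content", "") and "TERMINATE" in x.get("content", ""),
    human_input_mode="NEVER",
    max_consecutive_auto_reply=1,
)

CurrencySymbol = Literal["USD", "EUR"]


def exchange_rate(base_currency: CurrencySymbol, quote_currency: CurrencySymbol) -> float:
    # Illustrative fixed rates only
    if base_currency == quote_currency:
        return 1.0
    elif base_currency == "USD" and quote_currency == "EUR":
        return 1 / 1.1
    elif base_currency == "EUR" and quote_currency == "USD":
        return 1.1
    else:
        raise ValueError(f"Unknown currencies {base_currency}, {quote_currency}")


# Register the tool with both agents: suggested by the chatbot, executed by the user proxy
@user_proxy.register_for_execution()
@chatbot.register_for_llm(description="Currency exchange calculator.")
def currency_calculator(
    base_amount: Annotated[float, "Amount of currency in base_currency"],
    base_currency: Annotated[CurrencySymbol, "Base currency"] = "USD",
    quote_currency: Annotated[CurrencySymbol, "Quote currency"] = "EUR",
) -> str:
    quote_amount = exchange_rate(base_currency, quote_currency) * base_amount
    return f"{quote_amount:.2f} {quote_currency}"


user_proxy.initiate_chat(chatbot, message="How much is 123.45 EUR in USD?")
```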
We can see that the currency conversion function was called with the correct values and a result was generated.