Enhanced Support for Non-OpenAI Models
TL;DR
- AutoGen has expanded integrations with a variety of cloud-based model providers beyond OpenAI.
- Leverage models and platforms from Gemini, Anthropic, Mistral AI, Together.AI, and Groq for your AutoGen agents.
- Utilize models specifically for chat, language, image, and coding.
- LLM provider diversification can provide cost and resilience benefits.
In addition to the recently released AutoGen Google Gemini client, new client classes for Mistral AI, Anthropic, Together.AI, and Groq enable you to utilize over 75 different large language models in your AutoGen agent workflow.
These new client classes tailor AutoGen’s underlying messages to each provider’s unique requirements and remove that complexity from the developer, who can then focus on building their AutoGen workflow.
Using them is as simple as installing the client-specific library and updating your LLM config with the relevant `api_type` and `model`. We’ll demonstrate how to use them below.
The community is continuing to enhance and build new client classes as cloud-based inference providers arrive. So, watch this space, and feel free to discuss or develop another one.
Benefits of choice
The need to use only the best models to overcome workflow-breaking LLM inconsistency has diminished considerably over the last 12 months.
These new classes provide access to the very largest trillion-parameter models from OpenAI, Google, and Anthropic, continuing to provide the most consistent and competent agent experiences. However, it’s worth trying smaller models from the likes of Meta, Mistral AI, Microsoft, Qwen, and many others. Perhaps they are capable enough for a task, or sub-task, or even better suited (such as a coding model)!
Smaller models bring cost benefits, and they also let you test models that you could run locally, helping you determine whether you can remove cloud inference costs altogether or even run an AutoGen workflow offline.
On the topic of cost, these client classes also include provider-specific token cost calculations so you can monitor the cost impact of your workflows. With costs per million tokens as low as 10 cents (and some are even free!), cost savings can be noticeable.
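As a minimal sketch of checking that cost impact (the model name and key are placeholders, and this assumes the `ChatResult` returned by `initiate_chat` exposes a `cost` attribute as in recent AutoGen releases):

```python
import autogen

# Placeholder Anthropic config; model name and key are illustrative.
config_list = [{"api_type": "anthropic", "model": "claude-3-5-sonnet-20240620", "api_key": "your_key"}]

assistant = autogen.AssistantAgent("assistant", llm_config={"config_list": config_list})
user = autogen.UserProxyAgent("user", human_input_mode="NEVER", code_execution_config=False)

result = user.initiate_chat(assistant, message="Say hello.", max_turns=1)

# The client classes include per-provider token pricing, surfaced on the result.
print(result.cost)
```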
Mix and match
How does Google’s Gemini 1.5 Pro model stack up against Anthropic’s Opus or Meta’s Llama 3?
Now you have the ability to quickly change your agent configs and find out. If you want to run all three in one workflow, AutoGen’s ability to associate a specific configuration with each agent means you can select the best LLM for each agent, as sketched below.
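A rough sketch of what that mixing could look like, with each agent simply getting its own `llm_config` (model names and keys are placeholders):

```python
import autogen

# One config per provider; swap in your own keys and preferred model names.
gemini_config = {"config_list": [{"api_type": "google", "model": "gemini-1.5-pro-001", "api_key": "your_gemini_key"}]}
anthropic_config = {"config_list": [{"api_type": "anthropic", "model": "claude-3-opus-20240229", "api_key": "your_anthropic_key"}]}
together_config = {"config_list": [{"api_type": "together", "model": "meta-llama/Llama-3-70b-chat-hf", "api_key": "your_together_key"}]}

# Each agent can use a different provider within the same workflow.
planner = autogen.AssistantAgent("planner", llm_config=gemini_config)
writer = autogen.AssistantAgent("writer", llm_config=anthropic_config)
coder = autogen.AssistantAgent("coder", llm_config=together_config)
```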
Capabilities
The common requirements of text generation and function/tool calling are supported by these client classes.
Multi-modal support, such as for image/audio/video, is an area of active development. The Google Gemini client class can be used to create a multimodal agent.
Tips
Here are some tips when working with these client classes:
- Most to least capable - start with larger models and get your workflow working, then iteratively try smaller models.
- Right model - choose one that’s suited to your task, whether it’s coding, function calling, knowledge, or creative writing.
- Agent names - these cloud providers do not use the `name` field on a message, so be sure to use your agent’s name in their `system_message` and `description` fields, as well as instructing the LLM to ‘act as’ them. This is particularly important for “auto” speaker selection in group chats, as we need to guide the LLM to choose the next agent based on a name, so tweak `select_speaker_message_template`, `select_speaker_prompt_template`, and `select_speaker_auto_multiple_template` with more guidance.
- Context length - as your conversation gets longer, models need to support larger context lengths. Be mindful of what the model supports and consider using Transform Messages to manage context size (see the sketch after this list).
- Provider parameters - providers have parameters you can set such as temperature, maximum tokens, top-k, top-p, and safety. See each client class in AutoGen’s API Reference or documentation for details.
- Prompts - prompt engineering is critical in guiding smaller LLMs to do what you need. ConversableAgent, GroupChat, UserProxyAgent, and AssistantAgent all have customizable prompt attributes that you can tailor. Here are some prompting tips from Anthropic (+ Library), Mistral AI, Together.AI, and Meta.
- Help! - reach out on the AutoGen Discord or log an issue if you need help with or can help improve these client classes.
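Following on from the context length tip above, here is a rough sketch of attaching AutoGen’s Transform Messages capability to an agent to keep conversations within a model’s limits (the limits, model name, and key are illustrative):

```python
import autogen
from autogen.agentchat.contrib.capabilities import transform_messages, transforms

# Placeholder Mistral config; model name and key are illustrative.
config_list = [{"api_type": "mistral", "model": "mistral-large-latest", "api_key": "your_key"}]

assistant = autogen.AssistantAgent("assistant", llm_config={"config_list": config_list})

# Keep only recent messages and cap tokens before each call to the LLM.
context_handling = transform_messages.TransformMessages(
    transforms=[
        transforms.MessageHistoryLimiter(max_messages=10),
        transforms.MessageTokenLimiter(max_tokens=4000),
    ]
)
context_handling.add_to_agent(assistant)
```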
Now it’s time to try them out.
Quickstart
Installation
Install the appropriate client based on the model you wish to use.
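Assuming the package extras follow the provider names (adjust to the package version you have installed), something like:

```bash
# Install only the extras for the providers you plan to use.
pip install "pyautogen[gemini]"
pip install "pyautogen[anthropic]"
pip install "pyautogen[mistral]"
pip install "pyautogen[together]"
pip install "pyautogen[groq]"
```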
Configuration Setup
Add your model configurations to the `OAI_CONFIG_LIST`. Ensure you specify the `api_type` to initialize the respective client (Anthropic, Mistral, or Together).
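A sample `OAI_CONFIG_LIST` along these lines (model names and keys are placeholders):

```json
[
    {
        "api_type": "anthropic",
        "model": "claude-3-5-sonnet-20240620",
        "api_key": "your_anthropic_api_key"
    },
    {
        "api_type": "mistral",
        "model": "mistral-large-latest",
        "api_key": "your_mistral_api_key"
    },
    {
        "api_type": "together",
        "model": "meta-llama/Llama-3-70b-chat-hf",
        "api_key": "your_together_api_key"
    }
]
```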
Usage
The [config_list_from_json](https://ag2ai.github.io/ag2/docs/reference/oai/openai_utils/#config_list_from_json) function loads a list of configurations from an environment variable or a json file.
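For example, filtering the list down to the Anthropic entry (the filter is illustrative):

```python
import autogen

# Load configurations from the OAI_CONFIG_LIST environment variable or file,
# keeping only the Anthropic entry for this example.
config_list = autogen.config_list_from_json(
    env_or_file="OAI_CONFIG_LIST",
    filter_dict={"api_type": ["anthropic"]},
)
```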
Construct Agents
Construct a simple conversation between a User proxy and an Assistant agent.
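A minimal pairing could look like this (the system message and termination check are illustrative):

```python
assistant = autogen.AssistantAgent(
    name="assistant",
    system_message="You are a helpful assistant. Reply 'TERMINATE' when the task is complete.",
    llm_config={"config_list": config_list},
)

user_proxy = autogen.UserProxyAgent(
    name="user_proxy",
    human_input_mode="NEVER",
    code_execution_config=False,
    is_termination_msg=lambda msg: "TERMINATE" in (msg.get("content") or ""),
)
```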
Start chat
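Kicking off the conversation is a single call; the task message here is just an example:

```python
chat_result = user_proxy.initiate_chat(
    assistant,
    message="Write a four-line poem about large language models.",
)
```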
NOTE: To integrate this setup into GroupChat, follow the tutorial with the same config as above.
Function Calls
Now, let’s look at how Anthropic’s Sonnet 3.5 is able to suggest multiple function calls in a single response.
This example is a simple travel agent setup with an agent for function calling and a user proxy agent for executing the functions.
One thing you’ll note here is that Anthropic’s models are more verbose than OpenAI’s and will typically provide chain-of-thought or general verbiage when replying. Therefore, we provide more explicit instructions to `functionbot` to not reply with more than necessary. Even so, it can’t always help itself!
Let’s start with setting up our configuration and agents.
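A sketch of that setup, with the model name, key, and system messages as placeholders:

```python
import autogen

# Placeholder Anthropic config; model name and key are illustrative.
config_list = [
    {
        "api_type": "anthropic",
        "model": "claude-3-5-sonnet-20240620",
        "api_key": "your_anthropic_api_key",
    }
]

functionbot = autogen.AssistantAgent(
    name="functionbot",
    system_message=(
        "For currency exchange and weather forecasting tasks, only use the functions "
        "provided. Do not reply with anything more than is necessary. "
        "Reply 'TERMINATE' when the task is complete."
    ),
    llm_config={"config_list": config_list},
)

user_proxy = autogen.UserProxyAgent(
    name="user_proxy",
    human_input_mode="NEVER",
    code_execution_config=False,
    is_termination_msg=lambda msg: "TERMINATE" in (msg.get("content") or ""),
)
```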
We define the two functions.
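A rough sketch of the two functions, a currency converter and a weather lookup, with hard-coded values purely for illustration:

```python
from typing import Annotated

def currency_calculator(
    base_amount: Annotated[float, "Amount in the base currency"],
    base_currency: Annotated[str, "Base currency code"] = "EUR",
    quote_currency: Annotated[str, "Quote currency code"] = "USD",
) -> str:
    # Fixed rate for illustration only; a real function would call an FX API.
    rate = 1.1 if (base_currency, quote_currency) == ("EUR", "USD") else 1.0
    return f"{base_amount * rate:.2f} {quote_currency}"

def get_current_weather(
    location: Annotated[str, "City and state/country"],
) -> str:
    # Canned forecast for illustration only.
    return f"The weather in {location} is sunny with a high of 25C."
```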
And then associate them with the `user_proxy` for execution and `functionbot` for the LLM to consider using them.
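Using AutoGen’s `register_function` helper, for example (the descriptions are illustrative):

```python
# user_proxy executes the functions; functionbot's LLM can propose calling them.
autogen.register_function(
    currency_calculator,
    caller=functionbot,
    executor=user_proxy,
    description="Currency exchange calculator from one currency to another",
)

autogen.register_function(
    get_current_weather,
    caller=functionbot,
    executor=user_proxy,
    description="Get the current weather for a given location",
)
```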
Finally, we start the conversation with a request for help from our customer on their upcoming trip to New York and the Euros they would like exchanged to USD.
Importantly, we’re also using Anthropic’s Sonnet to provide a summary through the `summary_method`. Using `summary_prompt`, we guide Sonnet to give us an email output.
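A sketch of that kick-off, with the customer request and summary prompt as illustrative placeholders:

```python
res = user_proxy.initiate_chat(
    functionbot,
    message=(
        "My customer is travelling to New York next week and would like 830 Euros "
        "exchanged to US Dollars. Can you also tell them what the weather will be "
        "like there?"
    ),
    summary_method="reflection_with_llm",
    summary_args={
        "summary_prompt": (
            "Answer the customer's request as a short email, using only information "
            "from the conversation."
        )
    },
)
```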
After the conversation has finished, we’ll print out the summary.
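The summary produced by `reflection_with_llm` is available on the returned result, so printing it is a one-liner:

```python
print(res.summary)
```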
Here’s the resulting output.
So we can see how Anthropic’s Sonnet is able to suggest multiple tools in a single response, with AutoGen executing them both and providing the results back to Sonnet. Sonnet then finishes with a nice email summary that can be the basis for continued real-life conversation with the customer.
More tips and tricks
For an interesting chess game between Anthropic’s Sonnet and Mistral’s Mixtral, we’ve put together a sample notebook that highlights some of the tips and tricks for working with non-OpenAI LLMs. See the notebook here.