Groq
Groq is a cloud-based platform serving a number of popular open-weight models at high inference speeds, including Meta’s Llama 3, Mistral AI’s Mixtral, and Google’s Gemma.
Groq’s API aligns well with OpenAI’s, which is the native API used by AG2. In addition, this library provides the ability to set Groq-specific parameters as well as track API costs.
You will need a Groq account and an API key. See their website for further details.
Groq serves a number of models. See the list of models here (requires login).
See the sample `OAI_CONFIG_LIST` below showing how the Groq client class is used by specifying the `api_type` as `groq`.
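A minimal entry might look like the following sketch; the model name is illustrative, and any model Groq serves can be used:

```python
[
    {
        # Illustrative model name; substitute any model Groq serves
        "model": "llama3-8b-8192",
        "api_key": "your Groq API Key goes here",
        "api_type": "groq"
    }
]
```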
As an alternative to the `api_key` key and value in the config, you can set the environment variable `GROQ_API_KEY` to your Groq key.
Linux/Mac:
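```bash
# Replace with your actual Groq API key
export GROQ_API_KEY="your_groq_api_key_here"
```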
Windows:
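```
rem Replace with your actual Groq API key
set GROQ_API_KEY=your_groq_api_key_here
```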
API parameters
The following parameters can be added to your config for the Groq API. See this link for further information on them.
- `frequency_penalty` (number 0..1)
- `max_tokens` (integer >= 0)
- `presence_penalty` (number -2..2)
- `seed` (integer)
- `temperature` (number 0..2)
- `top_p` (number)
Example:
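A config entry using all of these parameters might look like the sketch below; the model name and parameter values are purely illustrative:

```python
[
    {
        "model": "llama3-8b-8192",  # illustrative model name
        "api_key": "your Groq API Key goes here",
        "api_type": "groq",
        "frequency_penalty": 0.5,
        "max_tokens": 2048,
        "presence_penalty": 0.2,
        "seed": 42,
        "temperature": 1,
        "top_p": 0.2
    }
]
```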
Two-Agent Coding Example
In this example, we run a two-agent chat in which an AssistantAgent (primarily a coding agent) generates code to count the number of prime numbers between 1 and 10,000, and the other agent executes it.
We’ll use Meta’s Llama 3 model, which is suitable for coding.
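A minimal sketch of such a chat is shown below, assuming the autogen package is installed and `GROQ_API_KEY` is set; the agent names, system message, termination phrase, and working directory are our own choices:

```python
import os
from pathlib import Path

from autogen import AssistantAgent, UserProxyAgent
from autogen.coding import LocalCommandLineCodeExecutor

# Groq config; the model name is illustrative
config_list = [
    {
        "model": "llama3-8b-8192",
        "api_key": os.environ.get("GROQ_API_KEY"),
        "api_type": "groq",
    }
]

# Execute generated code in a local working directory
workdir = Path("coding")
workdir.mkdir(exist_ok=True)
code_executor = LocalCommandLineCodeExecutor(work_dir=workdir)

# The agent that runs the code and ends the chat when it sees "FINISH"
user_proxy_agent = UserProxyAgent(
    name="User",
    code_execution_config={"executor": code_executor},
    is_termination_msg=lambda msg: "FINISH" in (msg.get("content") or ""),
)

# The coding agent, backed by Llama 3 on Groq
assistant_agent = AssistantAgent(
    name="Groq_Assistant",
    system_message=(
        "You are a helpful AI assistant who writes code for the user to execute. "
        "After the user has run your code and reported the result, reply with the word FINISH."
    ),
    llm_config={"config_list": config_list},
)

chat_result = user_proxy_agent.initiate_chat(
    assistant_agent,
    message="Provide code to count the number of prime numbers from 1 to 10000.",
)
```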
Tool Call Example
In this example, instead of writing code, we show how Meta’s Llama 3 model can perform parallel tool calling on Groq’s cloud inference, recommending more than one tool call at a time.
We’ll use a simple travel agent assistant program where we have a couple of tools for weather and currency conversion.
We start by importing libraries and setting up our configuration to use Meta’s Llama 3 model and the `groq` client class.
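A sketch of the program follows, under the same setup assumptions as above; the tool bodies (hard-coded exchange rates and a canned weather response) are illustrative stand-ins for real services:

```python
import json
import os
from typing import Annotated, Literal

from autogen import AssistantAgent, UserProxyAgent

# Groq config; the model name is illustrative
config_list = [
    {
        "model": "llama3-70b-8192",
        "api_key": os.environ.get("GROQ_API_KEY"),
        "api_type": "groq",
    }
]

# The LLM-backed agent that decides which tools to call
chatbot = AssistantAgent(
    name="chatbot",
    system_message=(
        "For currency exchange and weather forecasting tasks, only use the functions "
        "you have been provided with. Output HAVE FUN! when an answer has been provided."
    ),
    llm_config={"config_list": config_list},
)

# The agent that executes the recommended tool calls
user_proxy = UserProxyAgent(
    name="user_proxy",
    is_termination_msg=lambda msg: "HAVE FUN!" in (msg.get("content") or ""),
    human_input_mode="NEVER",
    max_consecutive_auto_reply=1,
)

CurrencySymbol = Literal["USD", "EUR"]

# Register each tool with the LLM (for recommending) and the user proxy (for executing)
@user_proxy.register_for_execution()
@chatbot.register_for_llm(description="Currency exchange calculator.")
def currency_calculator(
    base_amount: Annotated[float, "Amount of currency in base_currency"],
    base_currency: Annotated[CurrencySymbol, "Base currency"] = "USD",
    quote_currency: Annotated[CurrencySymbol, "Quote currency"] = "EUR",
) -> str:
    # Hard-coded rates for illustration; a real tool would query an FX service
    rates = {("USD", "EUR"): 1 / 1.1, ("EUR", "USD"): 1.1}
    rate = 1.0 if base_currency == quote_currency else rates[(base_currency, quote_currency)]
    return f"{base_amount * rate:.2f} {quote_currency}"

@user_proxy.register_for_execution()
@chatbot.register_for_llm(description="Weather forecast for US cities.")
def get_current_weather(location: Annotated[str, "City name"]) -> str:
    # Canned response for illustration; a real tool would call a weather API
    return json.dumps({"location": location, "temperature": "64", "unit": "fahrenheit"})

res = user_proxy.initiate_chat(
    chatbot,
    message=(
        "What's the weather in New York and how much is 123.45 EUR in USD "
        "so I can spend it on my holiday?"
    ),
)
```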
Using its fast inference, Groq required less than 2 seconds for the whole chat!
Additionally, Llama 3 was able to call both tools and pass through the right parameters. The `user_proxy` then executed them, and the results were passed back for Llama 3 to summarise the whole conversation.