RealtimeAgent in a Swarm Orchestration
AG2 supports RealtimeAgent, an agent type that connects to the Gemini Multimodal Live API. With RealtimeAgent, you can add voice input and output to your swarms so that users can talk to them naturally.
AG2 provides an intuitive programming interface to build and orchestrate swarms of agents. With RealtimeAgent, you can enhance swarm functionality, integrating real-time interactions alongside task automation. Check the Documentation and Blog for further insights.
In this notebook, we implement OpenAI’s airline customer service example in AG2 using the RealtimeAgent for enhanced interaction.
Note: This notebook cannot be run in Google Colab because it depends on local JavaScript files and HTML templates. To execute the notebook successfully, run it locally within the cloned project so that the notebooks/agentchat_realtime_websocket/static and notebooks/agentchat_realtime_websocket/templates folders are available in the correct relative paths.
Install ag2:
Install AG2 with fastapi and uvicorn dependencies
To use the realtime agent, we will connect it to a local FastAPI service. We have prepared a WebSocketAudioAdapter that lets you connect your realtime agent to that service.
To run this notebook, you will need to install ag2 with additional dependencies.
Import the dependencies
Prepare your llm_config and realtime_llm_config
The config_list_from_json function loads a list of configurations from an environment variable or a JSON file.
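The lookup described above can be pictured with a small standard-library sketch. This is not AG2's implementation: load_config_list is a hypothetical stand-in, the DEMO_CONFIG_LIST name and the model names are placeholders, and the sketch assumes the environment variable holds the JSON text itself.

```python
import json
import os
import tempfile
from pathlib import Path


def load_config_list(env_or_file: str, file_location: str = ".") -> list[dict]:
    """Hypothetical stand-in: read a JSON list from an env var or a file."""
    raw = os.environ.get(env_or_file)
    if raw is not None:
        # Assumption: the environment variable holds the JSON text itself.
        return json.loads(raw)
    # Otherwise fall back to a JSON file with the same name.
    return json.loads((Path(file_location) / env_or_file).read_text())


# Placeholder entries; real ones would carry your model name and API key.
example = [
    {"model": "text-model", "api_key": "<key>"},
    {"model": "realtime-model", "api_key": "<key>"},
]

# Source 1: the environment variable.
os.environ["DEMO_CONFIG_LIST"] = json.dumps(example)
from_env = load_config_list("DEMO_CONFIG_LIST")

# Source 2: a JSON file, used when the variable is not set.
del os.environ["DEMO_CONFIG_LIST"]
with tempfile.TemporaryDirectory() as tmp:
    (Path(tmp) / "DEMO_CONFIG_LIST").write_text(json.dumps(example))
    from_file = load_config_list("DEMO_CONFIG_LIST", file_location=tmp)

# Each config dictionary wraps a (here filtered) configuration list.
llm_config = {
    "config_list": [c for c in from_env if c["model"] == "text-model"]
}
realtime_llm_config = {
    "config_list": [c for c in from_file if c["model"] == "realtime-model"]
}
```

Keeping both models in one list and filtering per agent type means a single file or variable can serve both llm_config and realtime_llm_config.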
Prompts & Utility Functions
The prompts and utility functions remain unchanged from the original example.
Define Agents and register functions
Register Handoffs
Now we register the handoffs for the agents. Note that you don’t need to define the transfer functions and pass them in. Instead, you can directly register the handoffs using the ON_CONDITION class.
Before you start the server
To run the uvicorn server inside the notebook, you will need nest_asyncio. Jupyter already runs an asyncio event loop, and uvicorn tries to start its own. nest_asyncio patches asyncio so that uvicorn can run inside Jupyter.
Please install nest_asyncio by running the following cell.
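The conflict can be reproduced with the standard library alone. The sketch below does not use uvicorn or Jupyter: the outer asyncio.run() stands in for the loop Jupyter already has running, and try_nested_run is a hypothetical helper that attempts to start a second loop from inside it.

```python
import asyncio


async def try_nested_run():
    """Attempt to start a second event loop while one is already running."""
    inner = asyncio.sleep(0)
    try:
        asyncio.run(inner)  # roughly what a server launcher would attempt
    except RuntimeError as exc:
        inner.close()  # avoid a "coroutine was never awaited" warning
        return str(exc)
    return None


# asyncio.run() here plays the role of Jupyter's already-running loop.
error_message = asyncio.run(try_nested_run())
print(error_message)
```

By default asyncio refuses to start a loop inside a running one. Calling nest_asyncio.apply() before starting the server is what lifts this restriction in a notebook.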
Define basic FastAPI app
- Define Port: Sets the PORT variable to 5050, which will be used for the server.
- Initialize FastAPI App: Creates a FastAPI instance named app, which serves as the main application.
- Define Root Endpoint: Adds a GET endpoint at the root URL (/). When accessed, it returns a JSON response with the message "Websocket Audio Stream Server is running!".
This sets up a basic FastAPI server and provides a simple health-check endpoint to confirm that the server is operational.
Prepare start-chat endpoint
- Set the Working Directory: Define notebook_path as the current working directory using os.getcwd().
- Mount Static Files: Mount the static directory (inside agentchat_realtime_websocket) to serve JavaScript, CSS, and other static assets under the /static path.
- Set Up Templates: Configure Jinja2 to render HTML templates from the templates directory within agentchat_realtime_websocket.
- Create the /start-chat/ Endpoint: Define a GET route that serves the chat.html template. Pass the client’s request and the port variable to the template for rendering a dynamic page for the audio chat interface.
This code sets up static file handling, template rendering, and a dedicated endpoint to deliver the chat interface.
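Because the static and templates folders are resolved relative to the working directory, it can help to confirm they exist before mounting them. The following is a minimal sketch of such a check and is not part of the original notebook; missing_asset_dirs, ASSET_ROOT, and REQUIRED_DIRS are hypothetical names introduced here.

```python
import os
import tempfile
from pathlib import Path

ASSET_ROOT = "agentchat_realtime_websocket"
REQUIRED_DIRS = ("static", "templates")


def missing_asset_dirs(notebook_path: str) -> list[str]:
    """Return the required asset folders that are absent under notebook_path."""
    root = Path(notebook_path) / ASSET_ROOT
    return [name for name in REQUIRED_DIRS if not (root / name).is_dir()]


# In the notebook, the check would run against the working directory.
missing_here = missing_asset_dirs(os.getcwd())

# Demonstration in a throwaway directory, creating the folders one at a time.
with tempfile.TemporaryDirectory() as demo_path:
    before = missing_asset_dirs(demo_path)
    (Path(demo_path) / ASSET_ROOT / "static").mkdir(parents=True)
    after_static = missing_asset_dirs(demo_path)
    (Path(demo_path) / ASSET_ROOT / "templates").mkdir()
    after_both = missing_asset_dirs(demo_path)
```

An empty result means both folders are in place. A non-empty result usually means the notebook was started from a different directory than the one containing agentchat_realtime_websocket.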
Prepare endpoint for conversation audio stream
- Set Up the WebSocket Endpoint: Define the /media-stream WebSocket route to handle audio streaming.
- Accept WebSocket Connections: Accept incoming WebSocket connections from clients.
- Initialize Logger: Retrieve a logger instance for logging purposes.
- Configure Audio Adapter: Instantiate a WebSocketAudioAdapter, connecting the WebSocket to handle audio streaming with logging.
- Set Up Realtime Agent: Create a RealtimeAgent with the following:
  - Name: Airline_Realtime_Agent.
  - System Message: Introduces the AI assistant and its capabilities.
  - LLM Configuration: Uses realtime_llm_config for language model settings.
  - Audio Adapter: Leverages the previously created audio_adapter.
  - Logger: Logs activities for debugging and monitoring.
- Register a swarm: Register a swarm with the RealtimeAgent, enabling it to respond to basic flight queries.
- Run the Agent: Start the realtime_agent to handle interactions in real time.