Skip to content

Realtime API#

RealtimeAgent with Gemini API

Realtime agent communication with Gemini live API

TL;DR:

Why is this important?

We previously supported a Realtime Agent powered by OpenAI. In December 2024, Google rolled out Gemini 2.0, which includes the multi-modal live APIs. These APIs enable advanced capabilities such as real-time processing of audio inputs in live conversational settings. To ensure developers can fully leverage the capabilities of the latest LLMs, we now also support a RealtimeAgent powered by Gemini.

Real-Time Voice Interactions with the WebSocket Audio Adapter

Realtime agent communication over websocket

TL;DR: - Demo implementation: Implement a website using websockets and communicate using voice with the RealtimeAgent - Introducing WebSocketAudioAdapter: Stream audio directly from your browser using WebSockets. - Simplified Development: Connect to real-time agents quickly and effortlessly with minimal setup.

Realtime over WebSockets

In our previous blog post, we introduced a way to interact with the RealtimeAgent using TwilioAudioAdapter. While effective, this approach required a setup-intensive process involving Twilio integration, account configuration, number forwarding, and other complexities. Today, we're excited to introduce theWebSocketAudioAdapter, a streamlined approach to real-time audio streaming directly via a web browser.

This post explores the features, benefits, and implementation of the WebSocketAudioAdapter, showing how it transforms the way we connect with real-time agents.

Introducing RealtimeAgent Capabilities in AG2

TL;DR: - RealtimeAgent is coming in the AG2 0.6 release, enabling real-time conversational AI. - Features include real-time voice interactions, seamless task delegation to Swarm teams, and Twilio-based telephony integration. - Learn how to integrate Twilio and RealtimeAgent into your swarm in this blogpost.

Realtime API Support: What's New?

We're thrilled to announce the release of RealtimeAgent, extending AG2's capabilities to support real-time conversational AI tasks. This new experimental feature makes it possible for developers to build agents capable of handling voice-based interactions with minimal latency, integrating OpenAI’s Realtime API, Twilio for telephony, and AG2’s Swarm orchestration.