We previously supported a Realtime Agent powered by OpenAI. In December 2024, Google rolled out Gemini 2.0, which includes the multi-modal live APIs. These APIs enable advanced capabilities such as real-time processing of audio inputs in live conversational settings. To ensure developers can fully leverage the capabilities of the latest LLMs, we now also support a RealtimeAgent powered by Gemini.
In this post, we’ll build upon the concepts introduced in our previous blog on Tools with Dependency Injection. We’ll take a deeper look at how ChatContext can be used to manage the flow of conversations in a more structured and secure way.
By using ChatContext, we can track and control the sequence of function calls during a conversation. This is particularly useful in situations where one task must be completed before another — for example, ensuring that a user logs in before they can check their account balance. This approach helps to prevent errors and enhances the security of the system.
Benefits of Using ChatContext: - Flow Control: Ensures tasks are performed in the correct order, reducing the chance of mistakes. - Enhanced Security: Prevents unauthorized actions, such as accessing sensitive data before authentication. - Simplified Debugging: Logs the conversation history, making it easier to trace and resolve issues.
Note
This blog builds on the concepts shared in the notebook.
TL;DR: - Build a real-time voice application using WebRTC and connect it with the RealtimeAgent. Demo implementation. - Optimized for Real-Time Interactions: Experience seamless voice communication with minimal latency and enhanced reliability.
TL;DR: - Demo implementation: Implement a website using websockets and communicate using voice with the RealtimeAgent - Introducing WebSocketAudioAdapter: Stream audio directly from your browser using WebSockets. - Simplified Development: Connect to real-time agents quickly and effortlessly with minimal setup.
In our previous blog post, we introduced a way to interact with the RealtimeAgent using TwilioAudioAdapter. While effective, this approach required a setup-intensive process involving Twilio integration, account configuration, number forwarding, and other complexities. Today, we're excited to introduce theWebSocketAudioAdapter, a streamlined approach to real-time audio streaming directly via a web browser.
This post explores the features, benefits, and implementation of the WebSocketAudioAdapter, showing how it transforms the way we connect with real-time agents.
Dependency Injection is a secure way to connect external functions to agents without exposing sensitive data such as passwords, tokens, or personal information. This approach ensures that sensitive information remains protected while still allowing agents to perform their tasks effectively, even when working with large language models (LLMs).
In this guide, we’ll explore how to build secure workflows that handle sensitive data safely.
As an example, we’ll create an agent that retrieves user's account balance. The best part is that sensitive data like username and password are never shared with the LLM. Instead, it’s securely injected directly into the function at runtime, keeping it safe while maintaining seamless functionality.