
2025

RealtimeAgent with Gemini API

Realtime agent communication with Gemini live API

TL;DR:

Why is this important?

We previously supported a Realtime Agent powered by OpenAI. In December 2024, Google rolled out Gemini 2.0, which includes the multi-modal live APIs. These APIs enable advanced capabilities such as real-time processing of audio inputs in live conversational settings. To ensure developers can fully leverage the capabilities of the latest LLMs, we now also support a RealtimeAgent powered by Gemini.

Tools with ChatContext Dependency Injection

Introduction

In this post, we’ll build upon the concepts introduced in our previous blog on Tools with Dependency Injection. We’ll take a deeper look at how ChatContext can be used to manage the flow of conversations in a more structured and secure way.

By using ChatContext, we can track and control the sequence of function calls during a conversation. This is particularly useful in situations where one task must be completed before another — for example, ensuring that a user logs in before they can check their account balance. This approach helps to prevent errors and enhances the security of the system.

Benefits of Using ChatContext:

  • Flow Control: Ensures tasks are performed in the correct order, reducing the chance of mistakes.
  • Enhanced Security: Prevents unauthorized actions, such as accessing sensitive data before authentication.
  • Simplified Debugging: Logs the conversation history, making it easier to trace and resolve issues.
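To make the flow-control idea concrete, here is a minimal sketch in plain Python. It does not use the library's ChatContext API: `ConversationLog`, `login`, and `get_balance` are hypothetical names invented for this illustration, and the hard-coded credentials and balance are placeholders. The point is only to show a function refusing to run until an earlier call appears in the conversation history.

```python
from dataclasses import dataclass, field


@dataclass
class ConversationLog:
    """Hypothetical stand-in for the call history a chat context exposes."""

    calls: list[str] = field(default_factory=list)

    def record(self, name: str) -> None:
        self.calls.append(name)

    def has_called(self, name: str) -> bool:
        return name in self.calls


def login(log: ConversationLog, username: str, password: str) -> str:
    # Toy credential check (assumption); a real system would query a user store.
    if username == "alice" and password == "s3cret":
        log.record("login")
        return "Login successful."
    return "Login failed."


def get_balance(log: ConversationLog) -> str:
    # Refuse to run unless a successful login was recorded earlier.
    if not log.has_called("login"):
        return "Please log in first."
    log.record("get_balance")
    return "Your balance is $300."
```

Calling `get_balance` on a fresh log returns the "Please log in first." message; after a successful `login` on the same log, it returns the balance. The recorded call list also doubles as a simple trace for debugging.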

Note

This blog builds on the concepts shared in the notebook.

Real-Time Voice Interactions with the WebSocket Audio Adapter

Realtime agent communication over websocket

TL;DR:

  • Demo Implementation: Build a website that uses WebSockets and talk to the RealtimeAgent by voice.
  • Introducing WebSocketAudioAdapter: Stream audio directly from your browser using WebSockets.
  • Simplified Development: Connect to real-time agents quickly with minimal setup.

Realtime over WebSockets

In our previous blog post, we introduced a way to interact with the RealtimeAgent using TwilioAudioAdapter. While effective, this approach required a setup-intensive process involving Twilio integration, account configuration, number forwarding, and other complexities. Today, we're excited to introduce the WebSocketAudioAdapter, a streamlined approach to real-time audio streaming directly via a web browser.

This post explores the features, benefits, and implementation of the WebSocketAudioAdapter, showing how it transforms the way we connect with real-time agents.

Tools Dependency Injection

Dependency Injection is a secure way to connect external functions to agents without exposing sensitive data such as passwords, tokens, or personal information. This approach ensures that sensitive information remains protected while still allowing agents to perform their tasks effectively, even when working with large language models (LLMs).

In this guide, we’ll explore how to build secure workflows that handle sensitive data safely.

As an example, we’ll create an agent that retrieves a user's account balance. The best part is that sensitive data such as the username and password is never shared with the LLM. Instead, it is injected directly into the function at runtime, keeping it safe while maintaining seamless functionality.
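The underlying pattern can be sketched in plain Python without the library. The sketch below is an illustration only, not the library's dependency injection API: `Account`, `_BALANCES`, and `make_balance_tool` are hypothetical names, and the credentials and balance are made-up placeholders. It shows credentials being bound to a function at runtime so that the signature an LLM would see carries no sensitive parameters.

```python
import inspect
from dataclasses import dataclass


@dataclass(frozen=True)
class Account:
    """Hypothetical container for data that must never reach the LLM."""

    username: str
    password: str
    currency: str = "USD"


# Toy balance store keyed by credentials (assumption for this sketch).
_BALANCES = {("alice", "s3cret"): 300.0}


def _get_balance(account: Account) -> str:
    balance = _BALANCES.get((account.username, account.password))
    if balance is None:
        return "Authentication failed."
    return f"Your balance is {balance:.2f} {account.currency}"


def make_balance_tool(account: Account):
    """Bind the account at runtime and return a function with no sensitive parameters."""

    def get_balance() -> str:
        """Return the balance of the current user's account."""
        return _get_balance(account)

    return get_balance


tool = make_balance_tool(Account("alice", "s3cret"))
# The exposed signature has no parameters, so nothing sensitive
# would appear in a tool schema derived from it.
print(inspect.signature(tool))
print(tool())
```

A closure is used here for simplicity; the key design choice is that the model only ever sees the parameterless `get_balance`, while the credentials stay inside application code.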

Why Dependency Injection Is Essential

Here’s why dependency injection is a game-changer for secure LLM workflows:

  • Enhanced Security: Your sensitive data is never directly exposed to the LLM.
  • Simplified Development: Secure data can be seamlessly accessed by functions without requiring complex configurations.
  • Unmatched Flexibility: It supports safe integration of diverse workflows, allowing you to scale and adapt with ease.

In this guide, we’ll explore how to set up dependency injection and build secure workflows. Let’s dive in!

Note: This blog builds upon the concepts covered in the following notebook.