Blog

January 29, 2025
in Realtime API, Non-OpenAI Models
2 min read

RealtimeAgent with Gemini API

Realtime agent communication with Gemini live API

TL;DR:

RealtimeAgent now supports Gemini Multimodal Live API

We previously supported a RealtimeAgent powered by OpenAI. In December 2024, Google rolled out Gemini 2.0, which includes the Multimodal Live API. This API enables advanced capabilities such as real-time processing of audio inputs in live conversational settings. To ensure developers can fully leverage the capabilities of the latest LLMs, we now also support a RealtimeAgent powered by Gemini.

January 22, 2025
in Tools, Dependency Injection
4 min read

Tools with ChatContext Dependency Injection

Introduction

In this post, we’ll build upon the concepts introduced in our previous blog on Tools with Dependency Injection. We’ll take a deeper look at how ChatContext can be used to manage the flow of conversations in a more structured and secure way.

By using ChatContext, we can track and control the sequence of function calls during a conversation. This is particularly useful when one task must be completed before another, for example ensuring that a user logs in before they can check their account balance. This approach helps prevent errors and enhances the security of the system.

Benefits of Using ChatContext:

- Flow Control: Ensures tasks are performed in the correct order, reducing the chance of mistakes.
- Enhanced Security: Prevents unauthorized actions, such as accessing sensitive data before authentication.
- Simplified Debugging: Logs the conversation history, making it easier to trace and resolve issues.
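The flow-control idea can be sketched in plain Python. This is a minimal illustration under stated assumptions, not AG2's ChatContext API: `ConversationState`, `login`, and `get_balance` are hypothetical names, and the credential check is a toy.

```python
from dataclasses import dataclass, field


@dataclass
class ConversationState:
    """Hypothetical stand-in for conversation context: a log of completed calls."""
    call_log: list[str] = field(default_factory=list)


def login(state: ConversationState, username: str, password: str) -> str:
    # Toy credential check; a real system would verify against a user store.
    if password != "secret":
        return "Login failed"
    state.call_log.append("login")
    return f"{username} logged in"


def get_balance(state: ConversationState) -> str:
    # Flow control: refuse to run unless a successful login came first.
    if "login" not in state.call_log:
        return "Please log in first"
    state.call_log.append("get_balance")
    return "Balance: 100"


state = ConversationState()
before = get_balance(state)  # called too early, so it is rejected
login_result = login(state, "alice", "secret")
after = get_balance(state)  # allowed now that login is in the log
```

The same log that enforces ordering also serves as a trace of what happened, which is the debugging benefit listed above.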

Note

This blog builds on the concepts shared in the notebook.

January 10, 2025
in Structured messages
5 min read

Streaming input and output using WebSockets

Structured messages with websockets client

TL;DR

- Learn how to build an agent chat application using WebSockets and IOStream.
- Explore a hands-on example of connecting a web application to a responsive chat with agents over WebSockets.
- Streamlined Real-Time Interactions: WebSockets offer a low-latency, persistent connection for sending and receiving data in real time.

January 9, 2025
in Realtime API
6 min read

Real-Time Voice Interactions over WebRTC

Realtime agent communication over WebRTC

TL;DR:

- Build a real-time voice application using WebRTC and connect it with the RealtimeAgent. Demo implementation.
- Optimized for Real-Time Interactions: Experience seamless voice communication with minimal latency and enhanced reliability.

January 8, 2025
in Realtime API
6 min read

Real-Time Voice Interactions with the WebSocket Audio Adapter

Realtime agent communication over websocket

TL;DR:

- Demo implementation: Implement a website using WebSockets and communicate using voice with the RealtimeAgent.
- Introducing WebSocketAudioAdapter: Stream audio directly from your browser using WebSockets.
- Simplified Development: Connect to real-time agents quickly and effortlessly with minimal setup.

Realtime over WebSockets

In our previous blog post, we introduced a way to interact with the RealtimeAgent using TwilioAudioAdapter. While effective, this approach required a setup-intensive process involving Twilio integration, account configuration, number forwarding, and other complexities. Today, we're excited to introduce the WebSocketAudioAdapter, a streamlined approach to real-time audio streaming directly via a web browser.

This post explores the features, benefits, and implementation of the WebSocketAudioAdapter, showing how it transforms the way we connect with real-time agents.

January 7, 2025
in Tools
6 min read

Tools Dependency Injection

Dependency Injection is a secure way to connect external functions to agents without exposing sensitive data such as passwords, tokens, or personal information. This approach ensures that sensitive information remains protected while still allowing agents to perform their tasks effectively, even when working with large language models (LLMs).

In this guide, we’ll explore how to build secure workflows that handle sensitive data safely.

As an example, we’ll create an agent that retrieves a user's account balance. The best part is that sensitive data such as the username and password is never shared with the LLM. Instead, it is injected directly into the function at runtime, keeping it safe while maintaining seamless functionality.
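The underlying idea can be illustrated with the standard library alone. This is a minimal sketch, not AG2's dependency injection mechanism: `Account` and `get_balance` are hypothetical, and `functools.partial` stands in for whatever the framework does to bind the sensitive argument before the function is described to the model.

```python
import inspect
from dataclasses import dataclass
from functools import partial


@dataclass(frozen=True)
class Account:
    username: str
    password: str


def get_balance(account: Account, currency: str) -> str:
    # Toy lookup; the credentials are used here and never returned.
    if account.password != "s3cret":
        return "Authentication failed"
    return f"{account.username}: 100 {currency}"


# Bind the sensitive argument at registration time, outside the conversation.
tool = partial(get_balance, Account("alice", "s3cret"))

# Only the remaining parameters would be described to the model.
visible_params = list(inspect.signature(tool).parameters)

# The model supplies only the non-sensitive argument.
result = tool(currency="USD")
```

Because the bound `account` no longer appears in the callable's signature, a schema generated from that signature would have no field in which the model could see or supply credentials.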

Why Dependency Injection Is Essential

Here’s why dependency injection is a game-changer for secure LLM workflows:

Enhanced Security: Your sensitive data is never directly exposed to the LLM.
Simplified Development: Secure data can be seamlessly accessed by functions without requiring complex configurations.
Unmatched Flexibility: It supports safe integration of diverse workflows, allowing you to scale and adapt with ease.

In this guide, we’ll explore how to set up dependency injection and build secure workflows. Let’s dive in!

Note: This blog builds upon the concepts covered in the following notebook.

December 20, 2024
in Realtime API, Swarm
11 min read

Introducing RealtimeAgent Capabilities in AG2

TL;DR:

- RealtimeAgent is coming in the AG2 0.6 release, enabling real-time conversational AI.
- Features include real-time voice interactions, seamless task delegation to Swarm teams, and Twilio-based telephony integration.
- Learn how to integrate Twilio and RealtimeAgent into your swarm in this blog post.

Realtime API Support: What's New?

We're thrilled to announce the release of RealtimeAgent, extending AG2's capabilities to support real-time conversational AI tasks. This new experimental feature makes it possible for developers to build agents capable of handling voice-based interactions with minimal latency, integrating OpenAI’s Realtime API, Twilio for telephony, and AG2’s Swarm orchestration.

December 20, 2024
in Research, Tutorial
5 min read

ReasoningAgent Update - Beam Search, MCTS, and LATS for LLM Reasoning

Key Updates in this Release:

- Configuration Changes
  - All reasoning parameters are now configured through a single reason_config dictionary.
  - Breaking Change: Parameters like max_depth, beam_size, and answer_approach have moved from constructor arguments into reason_config.
- New Search Strategies
  - Added Monte Carlo Tree Search (MCTS) as an alternative to Beam Search.
  - Introduced Language Agent Tree Search (LATS), an enhancement to MCTS that incorporates reflection prior to the next round of simulation.
- Enhanced Features
  - New forest_size parameter enables maintaining multiple independent reasoning trees.
  - Support for ground truth answers in prompts to generate training data for LLM fine-tuning.
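As a rough illustration of the breaking change, the helper below moves the legacy constructor arguments into a single dictionary. It is a hypothetical migration sketch, not part of AG2: `to_reason_config` is an invented name, and the values shown are placeholders. Only the key names max_depth, beam_size, answer_approach, and forest_size come from the release notes above.

```python
# Keys named in the release notes as having moved into reason_config.
MOVED_KEYS = ("max_depth", "beam_size", "answer_approach")


def to_reason_config(old_kwargs: dict) -> tuple[dict, dict]:
    """Split legacy constructor kwargs into (remaining kwargs, reason_config)."""
    reason_config = {k: v for k, v in old_kwargs.items() if k in MOVED_KEYS}
    remaining = {k: v for k, v in old_kwargs.items() if k not in MOVED_KEYS}
    return remaining, reason_config


# Illustrative legacy arguments; the values are assumptions, not defaults.
legacy = {"name": "reasoner", "max_depth": 3, "beam_size": 2, "answer_approach": "pool"}
remaining, reason_config = to_reason_config(legacy)

# New options such as forest_size would live in the same dictionary.
reason_config["forest_size"] = 2
```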

Tree of Thoughts

Introduction

In our previous post, we introduced the ReasoningAgent, which utilized Beam Search for systematic reasoning. Today, we include MCTS (Monte Carlo Tree Search) and Language Agent Tree Search (LATS) as alternative search strategies, which present advantages in different scenarios.
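For readers new to the baseline strategy, here is a generic beam search on a toy problem. It is a sketch of the technique only, not the ReasoningAgent implementation: the `expand` and `score` functions are hypothetical stand-ins for generating and rating reasoning steps with an LLM.

```python
import heapq
from typing import Callable


def beam_search(
    start: str,
    expand: Callable[[str], list[str]],
    score: Callable[[str], float],
    beam_size: int,
    max_depth: int,
) -> str:
    """Keep the `beam_size` best partial paths at each depth; return the best one."""
    beam = [start]
    for _ in range(max_depth):
        candidates = [child for state in beam for child in expand(state)]
        if not candidates:
            break
        # Prune: only the highest-scoring candidates survive to the next depth.
        beam = heapq.nlargest(beam_size, candidates, key=score)
    return max(beam, key=score)


# Toy problem: build a 3-character string over "ab" with as many 'b's as possible.
def expand(state: str) -> list[str]:
    return [state + ch for ch in "ab"]


def score(state: str) -> float:
    return float(state.count("b"))


best = beam_search("", expand, score, beam_size=2, max_depth=3)
```

Beam search commits to the top candidates at every level and never revisits pruned branches. Tree-search strategies such as MCTS differ in that they can return to less-explored branches across simulations.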

Our previous ReasoningAgent draws inspiration from OpenAI's 2023 paper, Let's Verify Step by Step, as well as the 2024 o1 feature. Related contemporary research includes notable works such as DeepSeek-R1, Marco-o1, and OpenR.

December 20, 2024
in Tools, Non-OpenAI Models
12 min read

Cross-Framework LLM Tool Integration with AG2

TL;DR: AG2 lets you bring in Tools from different frameworks like LangChain, CrewAI, and PydanticAI.

LangChain Tools: Useful for tasks like API querying and web scraping.
CrewAI Tools: Offers a variety of tools for web scraping, search, and more.
PydanticAI Tools: Adds context-driven tools and structured data processing.

December 6, 2024
in RAG, Swarm
5 min read

Knowledgeable Agents with FalkorDB Graph RAG

FalkorDB Web

TL;DR:

* We introduce a new ability for AG2 agents, Graph RAG with FalkorDB, providing the power of knowledge graphs.
* Structured outputs, using OpenAI models, provide strict adherence to data models to improve reliability and agentic flows.
* Nested chats are now available with a Swarm.

FalkorDB Graph RAG

Typically, RAG uses vector databases, which store information as embeddings, mathematical representations of data points. When a query is received, it's also converted into an embedding, and the vector database retrieves the most similar embeddings based on distance metrics.
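The retrieval step described above can be sketched in a few lines. This is a toy illustration with hand-written three-dimensional vectors and cosine similarity as the similarity measure. Real systems use an embedding model and an indexed vector database, and the document names and vectors here are invented.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_k(query: list[float], store: dict[str, list[float]], k: int) -> list[str]:
    """Return the k document keys whose embeddings are most similar to the query."""
    ranked = sorted(
        store,
        key=lambda doc: cosine_similarity(query, store[doc]),
        reverse=True,
    )
    return ranked[:k]


# Hypothetical embeddings; a real pipeline would compute these with a model.
store = {
    "cats": [0.9, 0.1, 0.0],
    "dogs": [0.8, 0.3, 0.0],
    "stocks": [0.0, 0.1, 0.9],
}
nearest = top_k([1.0, 0.0, 0.0], store, k=2)
```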

Graph-based RAG, on the other hand, leverages graph databases, which represent knowledge as a network of interconnected entities and relationships. When a query is received, Graph RAG traverses the graph to find relevant information based on the query's structure and semantics.
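By contrast, graph retrieval follows relationships outward from an entity. The sketch below uses an in-memory list of triples and a breadth-first traversal. It is an assumption-laden illustration of the idea rather than how FalkorDB or AG2 execute queries, and the entities and relations are invented.

```python
from collections import deque

# Hypothetical knowledge graph as (subject, relation, object) triples.
TRIPLES = [
    ("Alice", "works_at", "Acme"),
    ("Acme", "located_in", "Berlin"),
    ("Bob", "works_at", "Globex"),
]


def neighbors(entity: str) -> list[tuple[str, str]]:
    return [(rel, obj) for subj, rel, obj in TRIPLES if subj == entity]


def collect_facts(start: str, max_hops: int) -> list[str]:
    """Breadth-first traversal from `start`, returning facts within `max_hops`."""
    facts: list[str] = []
    seen = {start}
    queue = deque([(start, 0)])
    while queue:
        entity, hops = queue.popleft()
        if hops == max_hops:
            continue  # do not expand beyond the hop limit
        for relation, obj in neighbors(entity):
            facts.append(f"{entity} {relation} {obj}")
            if obj not in seen:
                seen.add(obj)
                queue.append((obj, hops + 1))
    return facts


facts = collect_facts("Alice", max_hops=2)
```

A two-hop traversal from "Alice" surfaces where her employer is located, a connection that a pure similarity lookup over isolated text chunks may not make explicit.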