Blog
Get Communicating with Discord, Slack, and Telegram
Welcome DiscordAgent, SlackAgent, and TelegramAgent
We want to help you focus on building workflows and enhancing agents, so we're building reference agents to get you going quicker.
Say hello to three new AG2 communication agents: DiscordAgent, SlackAgent, and TelegramAgent, here so that you can use an agentic application to send and retrieve messages from messaging platforms.
Riding the Web with WebSurferAgent
Introduction
In our Adding Browsing Capabilities to AG2 guide, we explored how to build agents with basic web surfing capabilities. Now, let's take it to the next level with WebSurferAgent, a powerful agent that comes with built-in web browsing tools right out of the box!
With WebSurferAgent, your agents can seamlessly browse the web, retrieve real-time information, and interact with web pages, all with minimal setup.
Adding Browsing Capabilities to AG2
Introduction
Previously, in our Cross-Framework LLM Tool Integration guide, we combined tools from frameworks like LangChain, CrewAI, and PydanticAI to enhance AG2.
Now, we’re taking AG2 even further by integrating Browser Use and Crawl4AI, enabling agents to navigate websites, extract dynamic content, and interact with web pages. This unlocks new possibilities for automated data collection, web automation, and more.
RealtimeAgent with Gemini API
TL;DR:
- RealtimeAgent now supports Gemini Multimodal Live API
Why is this important?
We previously supported a Realtime Agent powered by OpenAI. In December 2024, Google rolled out Gemini 2.0, which includes the multi-modal live APIs. These APIs enable advanced capabilities such as real-time processing of audio inputs in live conversational settings. To ensure developers can fully leverage the capabilities of the latest LLMs, we now also support a RealtimeAgent powered by Gemini.
Tools with ChatContext Dependency Injection
Introduction
In this post, we’ll build upon the concepts introduced in our previous blog on Tools with Dependency Injection. We’ll take a deeper look at how ChatContext can be used to manage the flow of conversations in a more structured and secure way.
By using ChatContext, we can track and control the sequence of function calls during a conversation. This is particularly useful in situations where one task must be completed before another, for example, ensuring that a user logs in before they can check their account balance. This approach helps to prevent errors and enhances the security of the system.
Benefits of Using ChatContext:
- Flow Control: Ensures tasks are performed in the correct order, reducing the chance of mistakes.
- Enhanced Security: Prevents unauthorized actions, such as accessing sensitive data before authentication.
- Simplified Debugging: Logs the conversation history, making it easier to trace and resolve issues.
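The login-before-balance rule described above can be illustrated with a minimal plain-Python sketch. It does not use AG2's ChatContext API; the ConversationLog class, the function names, and the credentials are all hypothetical stand-ins for a call history that the tools can inspect.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class ConversationLog:
    """Hypothetical stand-in for the call history a chat context exposes."""

    called: List[str] = field(default_factory=list)


def login(log: ConversationLog, username: str, password: str) -> str:
    # Made-up credentials, for illustration only.
    if (username, password) == ("alice", "s3cret"):
        log.called.append("login")
        return "Logged in"
    return "Login failed"


def get_balance(log: ConversationLog) -> str:
    # Refuse to run until a successful login appears in the history.
    if "login" not in log.called:
        return "Please log in first"
    log.called.append("get_balance")
    return "Balance: 120.50"
```

Calling get_balance on a fresh log returns the "Please log in first" message; after a successful login the same call returns the balance, and the log records both calls in order.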
Note
This blog builds on the concepts shared in the notebook.
Streaming input and output using WebSockets
TL;DR
- Learn how to build an agent chat application using WebSockets and IOStream
- Explore a hands-on example of connecting a web application to a responsive chat with agents over WebSockets.
- Streamlined Real-Time Interactions: WebSockets offer a low-latency, persistent connection for sending and receiving data in real time.
Real-Time Voice Interactions over WebRTC
TL;DR:
- Build a real-time voice application using WebRTC and connect it with the RealtimeAgent. Demo implementation.
- Optimized for Real-Time Interactions: Experience seamless voice communication with minimal latency and enhanced reliability.
Real-Time Voice Interactions with the WebSocket Audio Adapter
TL;DR:
- Demo implementation: Implement a website using WebSockets and communicate using voice with the RealtimeAgent
- Introducing WebSocketAudioAdapter: Stream audio directly from your browser using WebSockets.
- Simplified Development: Connect to real-time agents quickly and effortlessly with minimal setup.
Realtime over WebSockets
In our previous blog post, we introduced a way to interact with the RealtimeAgent using TwilioAudioAdapter. While effective, this approach required a setup-intensive process involving Twilio integration, account configuration, number forwarding, and other complexities. Today, we're excited to introduce the WebSocketAudioAdapter, a streamlined approach to real-time audio streaming directly via a web browser.
This post explores the features, benefits, and implementation of the WebSocketAudioAdapter, showing how it transforms the way we connect with real-time agents.
Tools Dependency Injection
Dependency Injection is a secure way to connect external functions to agents without exposing sensitive data such as passwords, tokens, or personal information. This approach ensures that sensitive information remains protected while still allowing agents to perform their tasks effectively, even when working with large language models (LLMs).
In this guide, we’ll explore how to build secure workflows that handle sensitive data safely.
As an example, we’ll create an agent that retrieves a user's account balance. The best part is that sensitive data such as the username and password is never shared with the LLM. Instead, it is injected directly into the function at runtime, keeping it safe while maintaining seamless functionality.
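To make the idea concrete, here is a minimal sketch of the pattern in plain Python. It uses a closure to bind the credentials, not AG2's own dependency injection helpers; the function names, account data, and credentials are hypothetical and exist only for illustration.

```python
import inspect


def get_balance(account_id: str, username: str, password: str) -> str:
    """Hypothetical balance lookup; credentials are checked locally."""
    accounts = {("alice", "s3cret"): {"acc-1": 120.50}}  # made-up demo data
    balances = accounts.get((username, password))
    if balances is None:
        return "Authentication failed"
    return f"Balance for {account_id}: {balances.get(account_id, 0.0):.2f}"


def make_balance_tool(username: str, password: str):
    """Bind the credentials so the exposed tool only takes account_id."""

    def get_balance_tool(account_id: str) -> str:
        # Credentials come from the enclosing scope, not from the caller.
        return get_balance(account_id, username, password)

    return get_balance_tool


# The tool a model would be allowed to call.
tool = make_balance_tool("alice", "s3cret")

# Only account_id appears in the signature that would be described to a model.
visible_params = list(inspect.signature(tool).parameters)
```

Here visible_params contains only account_id, so a tool description generated from this signature would never mention the username or password.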
Why Dependency Injection Is Essential
Here’s why dependency injection is a game-changer for secure LLM workflows:
- Enhanced Security: Your sensitive data is never directly exposed to the LLM.
- Simplified Development: Secure data can be seamlessly accessed by functions without requiring complex configurations.
- Unmatched Flexibility: It supports safe integration of diverse workflows, allowing you to scale and adapt with ease.
In this guide, we’ll explore how to set up dependency injection and build secure workflows. Let’s dive in!
Note: This blog builds upon the concepts covered in the following notebook.