Blog

Get Communicating with Discord, Slack, and Telegram

Welcome DiscordAgent, SlackAgent, and TelegramAgent

We want to help you focus on building workflows and enhancing agents, so we're building reference agents to get you going more quickly.

Say hello to three new AG2 communication agents - DiscordAgent, SlackAgent, and TelegramAgent, here so that you can use an agentic application to send and retrieve messages from messaging platforms.

Riding the Web with WebSurferAgent

Introduction

In our Adding Browsing Capabilities to AG2 guide, we explored how to build agents with basic web surfing capabilities. Now, let's take it to the next level with WebSurferAgent—a powerful agent that comes with built-in web browsing tools right out of the box!

With WebSurferAgent, your agents can seamlessly browse the web, retrieve real-time information, and interact with web pages—all with minimal setup.

WebSurferAgent Example

Adding Browsing Capabilities to AG2

Introduction

Previously, in our Cross-Framework LLM Tool Integration guide, we combined tools from frameworks like LangChain, CrewAI, and PydanticAI to enhance AG2.

Now, we’re taking AG2 even further by integrating Browser Use and Crawl4AI, enabling agents to navigate websites, extract dynamic content, and interact with web pages. This unlocks new possibilities for automated data collection, web automation, and more.

Browser Use Example

RealtimeAgent with Gemini API

Realtime agent communication with the Gemini Live API

TL;DR:

Why is this important?

We previously supported a RealtimeAgent powered by OpenAI. In December 2024, Google rolled out Gemini 2.0, which includes the Multimodal Live API. This API enables advanced capabilities such as real-time processing of audio inputs in live conversational settings. To ensure developers can fully leverage the capabilities of the latest LLMs, we now also support a RealtimeAgent powered by Gemini.

Tools with ChatContext Dependency Injection

Introduction

In this post, we’ll build upon the concepts introduced in our previous blog on Tools with Dependency Injection. We’ll take a deeper look at how ChatContext can be used to manage the flow of conversations in a more structured and secure way.

By using ChatContext, we can track and control the sequence of function calls during a conversation. This is particularly useful in situations where one task must be completed before another — for example, ensuring that a user logs in before they can check their account balance. This approach helps to prevent errors and enhances the security of the system.

Benefits of Using ChatContext:

  • Flow Control: Ensures tasks are performed in the correct order, reducing the chance of mistakes.
  • Enhanced Security: Prevents unauthorized actions, such as accessing sensitive data before authentication.
  • Simplified Debugging: Logs the conversation history, making it easier to trace and resolve issues.
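The flow-control idea can be sketched in plain Python. Note that the `ChatContext` class, `login`, and `get_balance` below are hypothetical stand-ins for illustration, not AG2's actual API: a context object records which tools have already run, so a later tool can refuse to execute until its prerequisite has completed.

```python
# Conceptual sketch of ChatContext-style flow control (plain Python, not the
# AG2 API): a context records which tools have run in the conversation, so
# later tools can require that prerequisites were completed first.

class ChatContext:
    """Hypothetical stand-in that tracks the sequence of tool calls."""
    def __init__(self):
        self.called_tools: list[str] = []

def login(context: ChatContext, username: str, password: str) -> str:
    # In a real system the credentials would be verified here.
    context.called_tools.append("login")
    return f"User {username} logged in."

def get_balance(context: ChatContext, username: str) -> str:
    # Flow control: refuse to run until login has happened in this conversation.
    if "login" not in context.called_tools:
        raise PermissionError("Please log in before checking your balance.")
    context.called_tools.append("get_balance")
    return f"Balance for {username}: 100 USD"

ctx = ChatContext()
try:
    get_balance(ctx, "alice")          # fails: not logged in yet
except PermissionError as e:
    print(e)
print(login(ctx, "alice", "s3cret"))   # prerequisite completed
print(get_balance(ctx, "alice"))       # now allowed
```

In AG2, the same ordering check lives inside tools that receive the chat context via dependency injection, so the LLM cannot skip the prerequisite step.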

Note

This blog builds on the concepts shared in the notebook.

Real-Time Voice Interactions with the WebSocket Audio Adapter

Realtime agent communication over websocket

TL;DR:

  • Demo implementation: Implement a website using websockets and communicate using voice with the RealtimeAgent.
  • Introducing WebSocketAudioAdapter: Stream audio directly from your browser using WebSockets.
  • Simplified Development: Connect to real-time agents quickly and effortlessly with minimal setup.

Realtime over WebSockets

In our previous blog post, we introduced a way to interact with the RealtimeAgent using TwilioAudioAdapter. While effective, this approach required a setup-intensive process involving Twilio integration, account configuration, number forwarding, and other complexities. Today, we're excited to introduce the WebSocketAudioAdapter, a streamlined approach to real-time audio streaming directly via a web browser.

This post explores the features, benefits, and implementation of the WebSocketAudioAdapter, showing how it transforms the way we connect with real-time agents.

Tools Dependency Injection

Dependency Injection is a secure way to connect external functions to agents without exposing sensitive data such as passwords, tokens, or personal information. This approach ensures that sensitive information remains protected while still allowing agents to perform their tasks effectively, even when working with large language models (LLMs).

In this guide, we’ll explore how to build secure workflows that handle sensitive data safely.

As an example, we’ll create an agent that retrieves a user's account balance. The best part is that sensitive data like the username and password is never shared with the LLM. Instead, it's securely injected directly into the function at runtime, keeping it safe while maintaining seamless functionality.

Why Dependency Injection Is Essential

Here’s why dependency injection is a game-changer for secure LLM workflows:

  • Enhanced Security: Your sensitive data is never directly exposed to the LLM.
  • Simplified Development: Secure data can be seamlessly accessed by functions without requiring complex configurations.
  • Unmatched Flexibility: It supports safe integration of diverse workflows, allowing you to scale and adapt with ease.

In this guide, we’ll explore how to set up dependency injection and build secure workflows. Let’s dive in!
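Before diving into AG2's own mechanism, here is the underlying pattern in plain Python. The `inject` decorator and `Account` class below are illustrative stand-ins, not AG2's `Depends` API: the dependency is bound into the function at registration time, so the tool signature the LLM sees carries no credentials.

```python
# Minimal dependency-injection sketch in plain Python (a conceptual stand-in,
# not AG2's actual API): credentials live in a context object that is bound
# into the function when the tool is registered, so the LLM-facing tool
# signature never includes the username or password.

import functools
from dataclasses import dataclass

@dataclass
class Account:
    username: str
    password: str      # never exposed to the LLM
    balance: float

def inject(dependency):
    """Bind a dependency into a tool, hiding it from the exposed signature."""
    def decorator(func):
        @functools.wraps(func)
        def tool(*args, **kwargs):
            # The dependency is supplied here at runtime, not by the caller.
            return func(dependency, *args, **kwargs)
        return tool
    return decorator

alice = Account(username="alice", password="s3cret", balance=250.0)

@inject(alice)
def get_balance(account: Account) -> str:
    # From the LLM's point of view, this is a zero-argument tool; the
    # account object (with its password) is injected behind the scenes.
    return f"Your balance is {account.balance} USD"

print(get_balance())
```

AG2 generalizes this pattern: tools declare their dependencies in their signatures, and the framework resolves and injects them at call time while stripping them from the schema sent to the LLM.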

Note: This blog builds upon the concepts covered in the following notebook.