2025

April 16, 2025
in Research, Reasoning
7 min read

The Myth of Reasoning

threads

TL;DR

Human reasoning is often mischaracterized as purely logical; it's iterative, intuitive, and driven by communication needs.
AI can be a valuable partner in augmenting human reasoning.
Viewing AI as a system of components, rather than a monolithic model, may better capture the iterative nature of reasoning.

One major criticism of AI today, including the state-of-the-art LLMs, is that they fall short in reasoning capability compared to humans. This criticism often stems from a fundamental misunderstanding of how human reasoning actually works. We tend to hold up an idealized image of human thought – rational, logical, step-by-step – and judge AI against this standard. But is this image accurate?

February 13, 2025
in Tools, AG2 Agents
3 min read

DeepResearchAgent - Your Shortcut for Faster Research

DeepResearchAgent workflow

February 5, 2025
in AG2 Agents, Tools
9 min read

Get Communicating with Discord, Slack, and Telegram

Welcome DiscordAgent, SlackAgent, and TelegramAgent

We want to help you focus on building workflows and enhancing agents, so we're building reference agents to get you going quicker.

Say hello to three new AG2 communication agents: DiscordAgent, SlackAgent, and TelegramAgent. They let an agentic application send messages to, and retrieve messages from, messaging platforms.

January 31, 2025
in Tools, AG2 Agents
15 min read

Riding the Web with WebSurferAgent

Introduction

In our Adding Browsing Capabilities to AG2 guide, we explored how to build agents with basic web surfing capabilities. Now, let's take it to the next level with WebSurferAgent—a powerful agent that comes with built-in web browsing tools right out of the box!

With WebSurferAgent, your agents can seamlessly browse the web, retrieve real-time information, and interact with web pages—all with minimal setup.

WebSurferAgent Example

January 31, 2025
in Tools
15 min read

Adding Browsing Capabilities to AG2

Introduction

Previously, in our Cross-Framework LLM Tool Integration guide, we combined tools from frameworks like LangChain, CrewAI, and PydanticAI to enhance AG2.

Now, we’re taking AG2 even further by integrating Browser Use and Crawl4AI, enabling agents to navigate websites, extract dynamic content, and interact with web pages. This unlocks new possibilities for automated data collection, web automation, and more.

Browser Use Example

January 29, 2025
in Realtime API, Non-OpenAI Models
2 min read

RealtimeAgent with Gemini API

Realtime agent communication with Gemini live API

TL;DR:

RealtimeAgent now supports Gemini Multimodal Live API

Why is this important?

We previously supported a Realtime Agent powered by OpenAI. In December 2024, Google rolled out Gemini 2.0, which includes the multi-modal live APIs. These APIs enable advanced capabilities such as real-time processing of audio inputs in live conversational settings. To ensure developers can fully leverage the capabilities of the latest LLMs, we now also support a RealtimeAgent powered by Gemini.

January 22, 2025
in Tools, Dependency Injection
4 min read

Tools with ChatContext Dependency Injection

Introduction

In this post, we’ll build upon the concepts introduced in our previous blog on Tools with Dependency Injection. We’ll take a deeper look at how ChatContext can be used to manage the flow of conversations in a more structured and secure way.

By using ChatContext, we can track and control the sequence of function calls during a conversation. This is particularly useful in situations where one task must be completed before another — for example, ensuring that a user logs in before they can check their account balance. This approach helps to prevent errors and enhances the security of the system.

Benefits of Using ChatContext:

- Flow Control: Ensures tasks are performed in the correct order, reducing the chance of mistakes.
- Enhanced Security: Prevents unauthorized actions, such as accessing sensitive data before authentication.
- Simplified Debugging: Logs the conversation history, making it easier to trace and resolve issues.
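To make the login-before-balance ordering concrete, here is a minimal sketch in plain Python. It does not use AG2's actual ChatContext API: `ConversationLog`, `login`, and `get_balance` are hypothetical names invented for illustration, and the hard-coded password is a toy placeholder. The idea it shows is only that a function can inspect the record of earlier calls and refuse to run until its prerequisite has succeeded.

```python
from dataclasses import dataclass, field


@dataclass
class ConversationLog:
    """Hypothetical stand-in for a chat context: records successful function calls."""

    calls: list[str] = field(default_factory=list)

    def record(self, name: str) -> None:
        self.calls.append(name)

    def has_called(self, name: str) -> bool:
        return name in self.calls


def login(log: ConversationLog, username: str, password: str) -> str:
    # Toy credential check (assumption); a real system would query a user store.
    if password != "secret":
        return "Login failed"
    log.record("login")
    return f"Welcome, {username}"


def get_balance(log: ConversationLog, username: str) -> str:
    # Flow control: refuse to run unless a successful login was recorded earlier.
    if not log.has_called("login"):
        return "Please log in first"
    log.record("get_balance")
    return f"Balance for {username}: $100"
```

Calling `get_balance` on a fresh `ConversationLog` returns the "Please log in first" message; after a successful `login` on the same log, it returns the balance. The same log doubles as a trace of what happened, which is the debugging benefit listed above.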

Note

This blog builds on the concepts shared in the notebook.

January 10, 2025
in Structured messages
5 min read

Streaming input and output using WebSockets

Structured messages with websockets client

TL;DR

Learn how to build an agent chat application using WebSockets and IOStream
Explore a hands-on example of connecting a web application to a responsive chat with agents over WebSockets.
Streamlined Real-Time Interactions: WebSockets offer a low-latency, persistent connection for sending and receiving data in real time.

January 9, 2025
in Realtime API
6 min read

Real-Time Voice Interactions over WebRTC

Realtime agent communication over WebRTC

TL;DR:

- Build a real-time voice application using WebRTC and connect it with the RealtimeAgent. Demo implementation.
- Optimized for Real-Time Interactions: Experience seamless voice communication with minimal latency and enhanced reliability.

January 8, 2025
in Realtime API
6 min read

Real-Time Voice Interactions with the WebSocket Audio Adapter

Realtime agent communication over websocket

TL;DR:

- Demo implementation: Implement a website using WebSockets and communicate using voice with the RealtimeAgent.
- Introducing WebSocketAudioAdapter: Stream audio directly from your browser using WebSockets.
- Simplified Development: Connect to real-time agents quickly and effortlessly with minimal setup.

Realtime over WebSockets

In our previous blog post, we introduced a way to interact with the RealtimeAgent using TwilioAudioAdapter. While effective, this approach required a setup-intensive process involving Twilio integration, account configuration, number forwarding, and other complexities. Today, we're excited to introduce the WebSocketAudioAdapter, a streamlined approach to real-time audio streaming directly via a web browser.

This post explores the features, benefits, and implementation of the WebSocketAudioAdapter, showing how it transforms the way we connect with real-time agents.