Skip to content

LLM#

ReasoningAgent Update - Beam Search, MCTS, and LATS for LLM Reasoning

Key Updates in this Release:

  1. Configuration Changes
  2. All reasoning parameters are now configured through a single reason_config dictionary
  3. Breaking Change: Parameters like max_depth, beam_size, and answer_approach have moved from constructor arguments into reason_config

  4. New Search Strategies

  5. Added Monte Carlo Tree Search (MCTS) as an alternative to Beam Search
  6. Introduced Language Agent Tree Search (LATS) - an enhancement to MCTS that incorporates reflection prior to the next round of simulation.

  7. Enhanced Features

  8. New forest_size parameter enables maintaining multiple independent reasoning trees
  9. Support for ground truth answers in prompts to generate training data for LLM fine-tuning

Tree of Thoughts

Introduction

In our previous post, we introduced the ReasoningAgent, which utilized Beam Search for systematic reasoning. Today, we include MCTS (Monte Carlo Tree Search) and Language Agent Tree Search (LATS) as alternative search strategies, which present advantages in different scenarios.

Our previous ReasoningAgent draws inspiration from OpenAI's 2023 paper, Let's Verify Step by Step, as well as the 2024 O1 feature. The landscape of contemporary research is rich, with notable works such as DeepSeek-R1, Macro-O1, and OpenR.

Cross-Framework LLM Tool Integration with AG2

TL;DR AG2 lets you bring in Tools from different frameworks like LangChain, CrewAI, and PydanticAI.

  • LangChain Tools: Useful for tasks like API querying and web scraping.
  • CrewAI Tools: Offers a variety of tools for web scraping, search, and more.
  • PydanticAI Tools: Adds context-driven tools and structured data processing.

With AG2, you can combine these tools and enhance your agents' capabilities.

In this post, we’ll walk through how to integrate tools from various frameworks—like LangChain Tools, CrewAI Tools, and PydanticAI Tools—into AG2.

Because, really, the magic happens when you combine them all. This allows you to use tools from different frameworks within AG2, giving your agents more power and flexibility. This blog builds upon the concepts covered in the Tool Integration notebook.

In this post, you will understand how to configure agents, adapt these tools for use in AG2, and validate the integration through practical examples.

LangChain is a popular framework with lots of tools for working with LLMs. It's got a range of tools that can be easily integrated into AG2. If you want to see the full list, check out the LangChain Community Tools. You can quickly add things like API queries, web scraping, and text generation to your AG2 setup.

Installation

To get LangChain tools working with AG2, you’ll need to install a couple of dependencies:

pip install ag2[openai,interop-langchain]

If you have been using autogen or pyautogen, all you need to do is upgrade it using:

pip install -U autogen[openai,interop-langchain]
or
pip install -U pyautogen[openai,interop-langchain]
as pyautogen, autogen, and ag2 are aliases for the same PyPI package.

Also, we’ll use LangChain’s Wikipedia Tool, which needs the wikipedia package. Install it like this:

pip install wikipedia

Imports

Now, let’s import the necessary modules and tools.

import os

from langchain_community.tools import WikipediaQueryRun
from langchain_community.utilities import WikipediaAPIWrapper

from autogen import AssistantAgent, UserProxyAgent, LLMConfig
from autogen.interop import Interoperability

Agent Configuration

Let’s set up the agents for interaction. - config_list is where you define the LLM configuration, like the model and API key. - UserProxyAgent simulates user inputs without requiring actual human interaction (set to NEVER). - AssistantAgent represents the AI agent, configured with the LLM settings.

llm_config = LLMConfig(api_type="openai", model="gpt-4o", api_key=os.environ["OPENAI_API_KEY"])
user_proxy = UserProxyAgent(
    name="User",
    human_input_mode="NEVER",
)

with llm_config:
    chatbot = AssistantAgent(name="chatbot")

Tool Integration

Here’s where we connect everything. - First, we set up WikipediaAPIWrapper, which fetches the top Wikipedia result (with a character limit). - Then, we use WikipediaQueryRun to perform Wikipedia queries. - Interoperability helps convert the LangChain tool to AG2’s format. - Finally, we register the tool for use with both the user_proxy and chatbot.

api_wrapper = WikipediaAPIWrapper(top_k_results=1, doc_content_chars_max=1000)
langchain_tool = WikipediaQueryRun(api_wrapper=api_wrapper)

interop = Interoperability()
ag2_tool = interop.convert_tool(tool=langchain_tool, type="langchain")

ag2_tool.register_for_execution(user_proxy)
ag2_tool.register_for_llm(chatbot)

Initiating the Chat

Once everything’s set up, we can send a message to the chatbot, and it’ll use the Wikipedia tool to fetch the relevant information.

message = "Tell me about the history of the United States"
user_proxy.initiate_chat(recipient=chatbot, message=message, max_turns=2)

Output

When the chat is initiated, here’s the output you’ll see:

User (to chatbot):

Tell me about the history of the United States

--------------------------------------------------------------------------------
chatbot (to User):

***** Suggested tool call (call_hhy2G43ymytUFmJlDsK9J0tk): wikipedia *****
Arguments:
{"tool_input":{"query":"history of the United States"}}
**************************************************************************

--------------------------------------------------------------------------------

>>>>>>>> EXECUTING FUNCTION wikipedia...
User (to chatbot):

***** Response from calling tool (call_hhy2G43ymytUFmJlDsK9J0tk) *****
Page: History of the United States
Summary: The history of the lands that became the United States began with the arrival of the first people in the Americas around 15,000 BC. After European colonization of North America began in the late 15th century, wars and epidemics decimated Indigenous societies. By the 1760s, the thirteen British colonies were established. The Southern Colonies built an agricultural system on slave labor, enslaving millions from Africa. After defeating France, the British Parliament imposed a series of taxes; resistance to these taxes, especially the Boston Tea Party in 1773, led to Parliament issuing the Intolerable Acts designed to end self-government.
In 1776, the United States declared its independence. Led by General George Washington, it won the Revolutionary War in 1783. The Constitution was adopted in 1789, and a Bill of Rights was added in 1791 to guarantee inalienable rights. Washington, the first president, and his adviser Alexander Hamilton created a
**********************************************************************

--------------------------------------------------------------------------------
chatbot (to User):

The history of the United States begins with the arrival of the first peoples in the Americas around 15,000 BC. This pre-Columbian era was followed by European colonization, beginning in the late 15th century, which dramatically altered the indigenous societies through wars and epidemics.

By the 1760s, thirteen British colonies were established along the Atlantic seaboard. In the Southern Colonies, an agricultural economy heavily reliant on enslaved labor from Africa was developed. The British victory over France in the Seven Years' War led Parliament to impose various taxes on the colonies. Resistance to these taxes, exemplified by the Boston Tea Party in 1773, prompted the Parliament to enact the Intolerable Acts, seeking to curtail colonial self-governance.

The United States declared independence in 1776. Under the leadership of General George Washington, the American Revolutionary War concluded successfully in 1783. Subsequently, the U.S. Constitution was adopted in 1789, with the Bill of Rights added in 1791 to ensure inalienable rights. During this early period, President George Washington and his advisor Alexander Hamilton played significant roles in forming the young nation's governmental and economic foundations.

This overview covers the early formation and foundational moments of what became the United States, setting the stage for the country's subsequent expansion and development. TERMINATE

--------------------------------------------------------------------------------

CrewAI provides a variety of powerful tools designed for tasks such as web scraping, search, code interpretation, and more. These tools are easy to integrate into the AG2 framework, allowing you to enhance your agents with advanced capabilities. You can explore the full list of available tools in the CrewAI Tools repository.

Installation

At the moment CrewAI tool work for python < 3.13, so to use them, please make sure you are using an appropriate version of python.

Install the required packages for integrating CrewAI tools into the AG2 framework. This ensures all dependencies for both frameworks are installed.

pip install ag2[openai,interop-crewai]

Imports

Import necessary modules and tools. - ScrapeWebsiteTool: A CrewAI tool for web scraping. - AssistantAgent and UserProxyAgent: Core AG2 classes. - Interoperability: This module acts as a bridge, making it easier to integrate CrewAI tools with AG2’s architecture.

import os

from crewai_tools import ScrapeWebsiteTool

from autogen import AssistantAgent, UserProxyAgent, LLMConfig
from autogen.interop import Interoperability

Agent Configuration

Configure the agents for the interaction. - config_list defines the LLM configurations, including the model and API key. - UserProxyAgent simulates user inputs without requiring actual human interaction (set to NEVER). - AssistantAgent represents the AI agent, configured with the LLM settings.

llm_config = LLMConfig(api_type="openai", model="gpt-4o", api_key=os.environ["OPENAI_API_KEY"])
user_proxy = UserProxyAgent(
    name="User",
    human_input_mode="NEVER",
)

with llm_config:
    chatbot = AssistantAgent(name="chatbot")

Tool Integration

Integrate the CrewAI tool with AG2. - Interoperability converts the CrewAI tool to a format compatible with AG2. - ScrapeWebsiteTool is used for web scraping tasks. - Register the tool for both execution and interaction with LLMs.

interop = Interoperability()
crewai_tool = ScrapeWebsiteTool()
ag2_tool = interop.convert_tool(tool=crewai_tool, type="crewai")

ag2_tool.register_for_execution(user_proxy)
ag2_tool.register_for_llm(chatbot)

Initiating the chat

Initiate the conversation between the UserProxyAgent and the AssistantAgent to utilize the CrewAI tool.

message = "Scrape the website https://ag2.ai/"
chat_result = user_proxy.initiate_chat(recipient=chatbot, message=message, max_turns=2)

Output

The chatbot provides results based on the web scraping operation:

User (to chatbot):

Scrape the website https://ag2.ai/

--------------------------------------------------------------------------------
chatbot (to User):

***** Suggested tool call (call_ZStuwmexfN7j56uJKOi6BCid): Read_website_content *****
Arguments:
{"args":{"website_url":"https://ag2.ai/"}}
*************************************************************************************

--------------------------------------------------------------------------------

>>>>>>>> EXECUTING FUNCTION Read_website_content...
Using Tool: Read website content
User (to chatbot):

***** Response from calling tool (call_ZStuwmexfN7j56uJKOi6BCid) *****

AgentOS
Join our growing community of over 20,000 agent builders Join our growing community of over 20,000 agent builders The Open-Source AgentOS Build production-ready multi-agent systems in minutes, not months. Github Discord The End-to-End Platform for Multi-Agent Automation The End-to-End Platform for Multi-Agent Automation Flexible Agent Construction and Orchestration Create specialized agents that work together seamlessly. AG2 makes it easy to define roles, configure behaviors, and orchestrate collaboration - all through simple, intuitive code. → Assistant agents for problem-solving → Executor agents for taking action → Critic agents for validation → Group chat managers for coordination Built-in Conversation Patterns Built-in Conversation Patterns Stop wrestling with agent coordination. AG2 handles message routing, state management, and conversation flow automatically. → Two-agent conversations → Group chats with dynamic speaker selection → Sequential chats with context carryover → Nested conversations for modularity Seamless Human-AI collaboration Seamless Human-AI collaboration Seamlessly integrate human oversight and input into your agent workflows. → Configurable human input modes → Flexible intervention points → Optional human approval workflows → Interactive conversation interfaces → Context-aware human handoff Roadmap AG2 STUDIO → Visual agent system design → Real-time testing and debugging → One-click deployment to production → Perfect for prototyping and MVPs AG2 STUDIO → Visual agent system design → Real-time testing and debugging → One-click deployment to production → Perfect for prototyping and MVPs AG2 STUDIO → Visual agent system design → Real-time testing and debugging → One-click deployment to production → Perfect for prototyping and MVPs AG2 MARKETPLACE → Share and monetize your agents → Discover pre-built solution templates → Quick-start your agent development → Connect with other builders AG2 MARKETPLACE → Share and monetize your agents → Discover pre-built solution templates → Quick-start your agent development → Connect with other builders AG2 MARKETPLACE → Share and monetize your agents → Discover pre-built solution templates → Quick-start your agent development → Connect with other builders SCALING TOOLS → Zero to production deployment guides → Usage analytics and cost optimization → Team collaboration features → Enterprise-ready security controls SCALING TOOLS → Zero to production deployment guides → Usage analytics and cost optimization → Team collaboration features → Enterprise-ready security controls SCALING TOOLS → Zero to production deployment guides → Usage analytics and cost optimization → Team collaboration features → Enterprise-ready security controls AG2 STUDIO → Visual agent system design → Real-time testing and debugging → One-click deployment to production → Perfect for prototyping and MVPs AG2 STUDIO → Visual agent system design → Real-time testing and debugging → One-click deployment to production → Perfect for prototyping and MVPs AG2 MARKETPLACE → Share and monetize your agents → Discover pre-built solution templates → Quick-start your agent development → Connect with other builders AG2 MARKETPLACE → Share and monetize your agents → Discover pre-built solution templates → Quick-start your agent development → Connect with other builders SCALING TOOLS → Zero to production deployment guides → Usage analytics and cost optimization → Team collaboration features → Enterprise-ready security controls SCALING TOOLS → Zero to production deployment guides → Usage analytics and cost optimization → Team collaboration features → Enterprise-ready security controls Whether you're a solo founder prototyping the next big AI product, or an enterprise team deploying at scale we're building AG2 for you. This is AgentOS - making multi-agent development accessible to everyone. Github Join Our Growing Community Join Our Growing Community → 20,000+ active agent builders → Daily technical discussions → Weekly community calls → Open RFC process → Regular contributor events (Coming soon) Discord Problem Features Roadmap Community Documentation Problem Features Roadmap Community Documentation Problem Features Roadmap Community Documentation

**********************************************************************

--------------------------------------------------------------------------------
chatbot (to User):

The website "https://ag2.ai/" promotes a platform named AgentOS, which is designed for building multi-agent systems efficiently. Key highlights from the website are:

- **Community**: They have a growing community of over 20,000 agent builders.

- **End-to-End Platform**: AG2 is described as an end-to-end platform for multi-agent automation. It supports flexible agent construction and orchestration, helping to define roles, configure behaviors, and orchestrate collaboration.

- **Agent Types**: It includes assistant agents for problem-solving, executor agents for taking action, critic agents for validation, and group chat managers for coordination.

- **Built-in Conversation Patterns**: AG2 offers capabilities for message routing, state management, and conversation flow management, supporting various conversation types like two-agent conversations, group chats, and nested conversations.

- **Human-AI Collaboration**: The platform facilitates seamless integration of human oversight and input, with options for human intervention and approval workflows.

- **AG2 Studio**: This feature provides visual agent system design, real-time testing, debugging, and one-click deployment, suited for prototyping and MVPs.

- **AG2 Marketplace**: Provides a place to share, monetize agents, discover pre-built solution templates, and connect with other builders.

- **Scaling Tools**: Includes guides for deployment, analytics, cost optimization, team collaboration features, and enterprise-ready security controls.

- **Community and Documentation**: They encourage connecting through GitHub and Discord and have regular community calls and events planned.

This comprehensive platform seems to aim at both individual developers and enterprise teams looking to deploy multi-agent systems effectively and collaboratively. TERMINATE

--------------------------------------------------------------------------------

You can also access a detailed summary of the interaction:

print(chat_result.summary)
The website "https://ag2.ai/" promotes a platform named AgentOS, which is designed for building multi-agent systems efficiently. Key highlights from the website are:

- **Community**: They have a growing community of over 20,000 agent builders.

- **End-to-End Platform**: AG2 is described as an end-to-end platform for multi-agent automation. It supports flexible agent construction and orchestration, helping to define roles, configure behaviors, and orchestrate collaboration.

- **Agent Types**: It includes assistant agents for problem-solving, executor agents for taking action, critic agents for validation, and group chat managers for coordination.

- **Built-in Conversation Patterns**: AG2 offers capabilities for message routing, state management, and conversation flow management, supporting various conversation types like two-agent conversations, group chats, and nested conversations.

- **Human-AI Collaboration**: The platform facilitates seamless integration of human oversight and input, with options for human intervention and approval workflows.

- **AG2 Studio**: This feature provides visual agent system design, real-time testing, debugging, and one-click deployment, suited for prototyping and MVPs.

- **AG2 Marketplace**: Provides a place to share, monetize agents, discover pre-built solution templates, and connect with other builders.

- **Scaling Tools**: Includes guides for deployment, analytics, cost optimization, team collaboration features, and enterprise-ready security controls.

- **Community and Documentation**: They encourage connecting through GitHub and Discord and have regular community calls and events planned.

This comprehensive platform seems to aim at both individual developers and enterprise teams looking to deploy multi-agent systems effectively and collaboratively.

PydanticAI is a newer framework that brings powerful features for working with LLMs. Although it doesn't yet have a collection of pre-built tools like other frameworks, it offers useful capabilities such as dependency injection. This feature allows you to inject a "Context" into tools, which can help pass parameters or manage state without relying on LLMs. Though it's still evolving, you can easily integrate PydanticAI tools into AG2 to boost agent capabilities, particularly for tasks that involve structured data and context-driven logic.

Installation

To get PydanticAI tools working with AG2, install the necessary dependencies:

pip install ag2[openai,interop-pydantic-ai]

Imports

Import necessary modules and tools. - BaseModel: Used to define data structures for tool inputs and outputs. - RunContext: Provides context during the execution of tools. - PydanticAITool: Represents a tool in the PydanticAI framework. - AssistantAgent and UserProxyAgent: Agents that facilitate communication in the AG2 framework. - Interoperability: This module acts as a bridge, making it easier to integrate PydanticAI tools with AG2’s architecture.

import os
from typing import Optional

from pydantic import BaseModel
from pydantic_ai import RunContext
from pydantic_ai.tools import Tool as PydanticAITool

from autogen import AssistantAgent, UserProxyAgent, LLMConfig
from autogen.interop import Interoperability

Agent Configuration

Configure the agents for the interaction. - config_list defines the LLM configurations, including the model and API key. - UserProxyAgent simulates user inputs without requiring actual human interaction (set to NEVER). - AssistantAgent represents the AI agent, configured with the LLM settings.

llm_config = LLMConfig(api_type="openai", model="gpt-4o", api_key=os.environ["OPENAI_API_KEY"])
user_proxy = UserProxyAgent(
    name="User",
    human_input_mode="NEVER",
)

with llm_config:
    chatbot = AssistantAgent(name="chatbot")

Tool Integration

To integrate a PydanticAI tool into AG2:

  • First, define a Player model using BaseModel to structure the input data.
  • Use RunContext to inject dependencies (like the Player instance) securely into the tool.
  • The get_player function defines the tool’s functionality, retrieving injected data through ctx.deps.
  • Then, convert the tool into an AG2-compatible format with Interoperability.
  • Register the tool for execution and interaction with both the user_proxy and chatbot.
class Player(BaseModel):
    name: str
    age: int


def get_player(ctx: RunContext[Player], additional_info: Optional[str] = None) -> str:  # type: ignore[valid-type]
    """Get the player's name.

    Args:
        additional_info: Additional information which can be used.
    """
    return f"Name: {ctx.deps.name}, Age: {ctx.deps.age}, Additional info: {additional_info}"  # type: ignore[attr-defined]


interop = Interoperability()
pydantic_ai_tool = PydanticAITool(get_player, takes_ctx=True)

# player will be injected as a dependency
player = Player(name="Luka", age=25)
ag2_tool = interop.convert_tool(tool=pydantic_ai_tool, type="pydanticai", deps=player)

ag2_tool.register_for_execution(user_proxy)
ag2_tool.register_for_llm(chatbot)

Initiating the chat

Now that everything is set up, you can initiate a chat between the UserProxyAgent and the AssistantAgent:

  • The user_proxy sends a message to the chatbot.
  • The user requests player information, and includes "goal keeper" as additional context.
  • The Player data is securely injected into the tool, and the chatbot can access and use it during the chat.
user_proxy.initiate_chat(
    recipient=chatbot, message="Get player, for additional information use 'goal keeper'", max_turns=3
)

Output

User (to chatbot):

Get player, for additional information use 'goal keeper'

--------------------------------------------------------------------------------
chatbot (to User):

***** Suggested tool call (call_lPXIohFiJfnjmgwDnNFPQCzc): get_player *****
Arguments:
{"additional_info":"goal keeper"}
***************************************************************************

--------------------------------------------------------------------------------

>>>>>>>> EXECUTING FUNCTION get_player...
User (to chatbot):

***** Response from calling tool (call_lPXIohFiJfnjmgwDnNFPQCzc) *****
Name: Luka, Age: 25, Additional info: goal keeper
**********************************************************************

--------------------------------------------------------------------------------
chatbot (to User):

The player's name is Luka, who is a 25-year-old goalkeeper. TERMINATE

LangChain Tools Integration

ReasoningAgent - Tree of Thoughts with Beam Search in AG2

TL;DR: * We introduce ReasoningAgent, an AG2 agent that implements tree-of-thought reasoning with beam search to solve complex problems. * ReasoningAgent explores multiple reasoning paths in parallel and uses a grader agent to evaluate and select the most promising paths. * The exploration trajectory and thought tree can be saved locally for further analysis. These logs can even be saved as SFT dataset and preference dataset for DPO and PPO training.

Tree of Thoughts

Introduction

Large language models (LLMs) have shown impressive capabilities in various tasks, but they can still struggle with complex reasoning problems that require exploring multiple solution paths. To address this limitation, we introduce ReasoningAgent, an AG2 agent that implements tree-of-thought reasoning with beam search.

The key idea behind ReasoningAgent is to: 1. Generate multiple possible reasoning steps at each point 2. Evaluate these steps using a grader agent 3. Keep track of the most promising paths using beam search 4. Continue exploring those paths while pruning less promising ones

This approach allows the agent to systematically explore different reasoning strategies while managing computational resources efficiently.

Agentic testing for prompt leakage security

Prompt leakage social img

Introduction

As Large Language Models (LLMs) become increasingly integrated into production applications, ensuring their security has never been more crucial. One of the most pressing security concerns for these models is prompt injection, specifically prompt leakage.

LLMs often rely on system prompts (also known as system messages), which are internal instructions or guidelines that help shape their behavior and responses. These prompts can sometimes contain sensitive information, such as confidential details or internal logic, that should never be exposed to external users. However, with careful probing and targeted attacks, there is a risk that this sensitive information can be unintentionally revealed.

To address this issue, we have developed the Prompt Leakage Probing Framework, a tool designed to probe LLM agents for potential prompt leakage vulnerabilities. This framework serves as a proof of concept (PoC) for creating and testing various scenarios to evaluate how easily system prompts can be exposed. By automating the detection of such vulnerabilities, we aim to provide a powerful tool for testing the security of LLMs in real-world applications.

Introducing CaptainAgent for Adaptive Team Building

TL;DR - We introduce CaptainAgent, an agent equipped with the capability to adaptively assemble a team of agents through retrieval-selection-generation process to handle complex tasks via the nested chat conversation pattern in AG2. - CaptainAgent supports all types of ConversableAgents implemented in AG2.

Illustration of how CaptainAgent build a team

Introduction

Given an ad-hoc task, dynamically assembling a group of agents capable of effectively solving the problem is a complex challenge. In many cases, we manually design and select the agents involved. In this blog, we introduce CaptainAgent, an intelligent agent that can autonomously assemble a team of agents tailored to meet diverse and complex task requirements. CaptainAgent iterates over the following two steps until the problem is successfully solved. - (Step 1) CaptainAgent will break down the task, recommend several roles needed for each subtask, and then create a team of agents accordingly. Each agent in the team is either generated from scratch or retrieved and selected from an agent library if provided. Each of them will also be equipped with predefined tools retrieved from a tool library if provided. Building workflow - (Step 2) For each subtask, the corresponding team of agents will jointly solve it. Once it's done, a summarization and reflection step will be triggered to generate a report based on the multi-agent conversation history. Based on the report, CaptainAgent will decide whether to adjust the subtasks and corresponding team (go to Step 1) or to terminate and output the results. Building workflow

The design of CaptainAgent allows it to leverage agents and tools from a pre-specified agent library and tool library. In the following section, we demonstrate how to use CaptainAgent with or without the provided library.

AgentOps, the Best Tool for AutoGen Agent Observability

AgentOps and AutoGen

TL;DR

  • AutoGen® offers detailed multi-agent observability with AgentOps.
  • AgentOps offers the best experience for developers building with AutoGen in just two lines of code.
  • Enterprises can now trust AutoGen in production with detailed monitoring and logging from AgentOps.

AutoGen is excited to announce an integration with AgentOps, the industry leader in agent observability and compliance. Back in February, Bloomberg declared 2024 the year of AI Agents. And it's true! We've seen AI transform from simplistic chatbots to autonomously making decisions and completing tasks on a user's behalf.

However, as with most new technologies, companies and engineering teams can be slow to develop processes and best practices. One part of the agent workflow we're betting on is the importance of observability. Letting your agents run wild might work for a hobby project, but if you're building enterprise-grade agents for production, it's crucial to understand where your agents are succeeding and failing. Observability isn't just an option; it's a requirement.

As agents evolve into even more powerful and complex tools, you should view them increasingly as tools designed to augment your team's capabilities. Agents will take on more prominent roles and responsibilities, take action, and provide immense value. However, this means you must monitor your agents the same way a good manager maintains visibility over their personnel. AgentOps offers developers observability for debugging and detecting failures. It provides the tools to monitor all the key metrics your agents use in one easy-to-read dashboard. Monitoring is more than just a “nice to have”; it's a critical component for any team looking to build and scale AI agents.

AgentEval: A Developer Tool to Assess Utility of LLM-powered Applications

Fig.1: An AgentEval framework with verification step

Fig.1 illustrates the general flow of AgentEval with verification step

TL;DR: * As a developer, how can you assess the utility and effectiveness of an LLM-powered application in helping end users with their tasks? * To shed light on the question above, we previously introduced AgentEval — a framework to assess the multi-dimensional utility of any LLM-powered application crafted to assist users in specific tasks. We have now embedded it as part of the AutoGen library to ease developer adoption. * Here, we introduce an updated version of AgentEval that includes a verification process to estimate the robustness of the QuantifierAgent. More details can be found in this paper.

Introduction

Previously introduced AgentEval is a comprehensive framework designed to bridge the gap in assessing the utility of LLM-powered applications. It leverages recent advancements in LLMs to offer a scalable and cost-effective alternative to traditional human evaluations. The framework comprises three main agents: CriticAgent, QuantifierAgent, and VerifierAgent, each playing a crucial role in assessing the task utility of an application.

AutoDefense - Defend against jailbreak attacks with AutoGen

architecture

TL;DR

  • We propose AutoDefense, a multi-agent defense framework using AutoGen to protect LLMs from jailbreak attacks.
  • AutoDefense employs a response-filtering mechanism with specialized LLM agents collaborating to analyze potentially harmful responses.
  • Experiments show our three-agents (consisting of an intention analyzer, a prompt analyzer, and a judge) defense agency with LLaMA-2-13B effectively reduces jailbreak attack success rate while maintaining low false positives on normal user requests.

StateFlow - Build State-Driven Workflows with Customized Speaker Selection in GroupChat

TL;DR: Introduce Stateflow, a task-solving paradigm that conceptualizes complex task-solving processes backed by LLMs as state machines. Introduce how to use GroupChat to realize such an idea with a customized speaker selection function.

Introduction

It is a notable trend to use Large Language Models (LLMs) to tackle complex tasks, e.g., tasks that require a sequence of actions and dynamic interaction with tools and external environments. In this paper, we propose StateFlow, a novel LLM-based task-solving paradigm that conceptualizes complex task-solving processes as state machines. In StateFlow, we distinguish between "process grounding” (via state and state transitions) and "sub-task solving” (through actions within a state), enhancing control and interpretability of the task-solving procedure. A state represents the status of a running process. The transitions between states are controlled by heuristic rules or decisions made by the LLM, allowing for a dynamic and adaptive progression. Upon entering a state, a series of actions is executed, involving not only calling LLMs guided by different prompts, but also the utilization of external tools as needed.

AgentOptimizer - An Agentic Way to Train Your LLM Agent

Overall structure of AgentOptimizer

TL;DR: Introducing AgentOptimizer, a new class for training LLM agents in the era of LLMs as a service. AgentOptimizer is able to prompt LLMs to iteratively optimize function/skills of AutoGen agents according to the historical conversation and performance.

More information could be found in:

Paper: https://arxiv.org/abs/2402.11359.

Notebook: https://github.com/ag2ai/ag2/blob/main/notebook/agentchat_agentoptimizer.ipynb.

Introduction

In the traditional ML pipeline, we train a model by updating its weights according to the loss on the training set, while in the era of LLM agents, how should we train an agent? Here, we take an initial step towards the agent training. Inspired by the function calling capabilities provided by OpenAI, we draw an analogy between model weights and agent functions/skills, and update an agent’s functions/skills based on its historical performance on a training set. Specifically, we propose to use the function calling capabilities to formulate the actions that optimize the agents’ functions as a set of function calls, to support iteratively adding, revising, and removing existing functions. We also include two strategies, roll-back, and early-stop, to streamline the training process to overcome the performance-decreasing problem when training. As an agentic way of training an agent, our approach helps enhance the agents’ abilities without requiring access to the LLM's weights.