
Blog

All About Agent Descriptions

TL;DR

AutoGen 0.2.2 introduces a description field to ConversableAgent (and all subclasses), and changes GroupChat so that it uses agent descriptions rather than system_messages when choosing which agents should speak next.

This is expected to simplify GroupChat’s job, improve orchestration, and make it easier to implement new GroupChat or GroupChat-like alternatives.

If you are a developer, and things were already working well for you, no action is needed -- backward compatibility is ensured because the description field defaults to the system_message when no description is provided.

However, if you were struggling with getting GroupChat to work, you can now try updating the description field.
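
For example, here is a minimal sketch (assuming AutoGen 0.2.2+ and an llm_config defined elsewhere) of giving agents short, third-person descriptions that GroupChat can use when selecting the next speaker:

from autogen import AssistantAgent, GroupChat, GroupChatManager, UserProxyAgent

# llm_config is assumed to be defined elsewhere, e.g. {"config_list": [...]}
coder = AssistantAgent(
    name="coder",
    system_message="You are a careful Python programmer. Write complete, runnable code.",
    # Short, third-person summary used by GroupChat when choosing the next speaker.
    description="A Python programmer who writes and fixes code.",
    llm_config=llm_config,
)

critic = AssistantAgent(
    name="critic",
    system_message="You review code for bugs and style issues.",
    description="A reviewer who critiques the coder's code and suggests fixes.",
    llm_config=llm_config,
)

user_proxy = UserProxyAgent(
    name="user_proxy",
    human_input_mode="NEVER",
    code_execution_config=False,
    description="A proxy that relays the user's task to the group.",
)

group_chat = GroupChat(agents=[user_proxy, coder, critic], messages=[], max_round=8)
manager = GroupChatManager(groupchat=group_chat, llm_config=llm_config)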

AgentOptimizer - An Agentic Way to Train Your LLM Agent

Overall structure of AgentOptimizer

TL;DR: Introducing AgentOptimizer, a new class for training LLM agents in the era of LLMs as a service. AgentOptimizer is able to prompt LLMs to iteratively optimize the functions/skills of AutoGen agents according to their historical conversations and performance.

More information can be found in:

Paper: https://arxiv.org/abs/2402.11359.

Notebook: https://github.com/ag2ai/ag2/blob/main/notebook/agentchat_agentoptimizer.ipynb.

Introduction

In the traditional ML pipeline, we train a model by updating its weights according to the loss on the training set. But in the era of LLM agents, how should we train an agent? Here, we take an initial step toward agent training. Inspired by the function calling capabilities provided by OpenAI, we draw an analogy between model weights and agent functions/skills, and update an agent's functions/skills based on its historical performance on a training set. Specifically, we propose to use function calling to formulate the actions that optimize the agent's functions as a set of function calls, supporting iteratively adding, revising, and removing existing functions. We also include two strategies, roll-back and early-stop, to streamline the training process and mitigate the performance decrease that can occur during training. As an agentic way of training an agent, our approach enhances the agents' abilities without requiring access to the LLM's weights.
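
The training loop below is a rough sketch of how AgentOptimizer can be driven, following the pattern in the linked notebook; check_answer is a hypothetical helper, and exact method names and signatures may differ across AutoGen versions:

from autogen.agentchat.contrib.agent_optimizer import AgentOptimizer

# llm_config, assistant, user_proxy, and train_set are assumed to be defined elsewhere.
optimizer = AgentOptimizer(max_actions_per_step=3, llm_config=llm_config)

for epoch in range(3):  # a few optimization epochs over the training set
    for problem, answer in train_set:
        user_proxy.initiate_chat(assistant, message=problem)
        is_correct = check_answer(answer, assistant, user_proxy)  # hypothetical success check

        # Record the conversation history and whether the agent solved the problem.
        optimizer.record_one_conversation(
            assistant.chat_messages_for_summary(user_proxy), is_satisfied=is_correct
        )

    # Ask the optimizer for add/revise/remove actions on the agent's functions, then apply them.
    register_for_llm, register_for_executor = optimizer.step()
    for action in register_for_llm:
        assistant.update_function_signature(**action)
    if register_for_executor:
        user_proxy.register_function(function_map=register_for_executor)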

AutoGen Studio: Interactively Explore Multi-Agent Workflows

AutoGen Studio Playground View: solving a task with multiple agents that generate a PDF document with images.

TL;DR

To help you rapidly prototype multi-agent solutions for your tasks, we are introducing AutoGen Studio, an interface powered by AutoGen. It allows you to:

  • Declaratively define and modify agents and multi-agent workflows through a point-and-click, drag-and-drop interface (e.g., you can select the parameters of two agents that will communicate to solve your task).
  • Use our UI to create chat sessions with the specified agents and view results (e.g., view chat history, generated files, and time taken).
  • Explicitly add skills to your agents and accomplish more tasks.
  • Publish your sessions to a local gallery.

See the official AutoGen Studio documentation here for more details.

AutoGen Studio is open source (code here) and can be installed via pip. Give it a try!

pip install autogenstudio
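
Once installed, the UI can be started from the command line (the port flag is optional, and the exact options may vary by version):

autogenstudio ui --port 8081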

Agent AutoBuild - Automatically Building Multi-agent Systems

Overall structure of AutoBuild

TL;DR: Introducing AutoBuild, a pipeline for building multi-agent systems for complex tasks automatically, quickly, and easily, with minimal user prompts required, powered by a newly designed class, AgentBuilder. AgentBuilder also supports open-source LLMs by leveraging vLLM and FastChat. Check out the example notebooks and source code for reference.

Introduction

In this blog, we introduce AutoBuild, a pipeline that can automatically build multi-agent systems for complex tasks. Specifically, we design a new class called AgentBuilder, which automatically generates the participant expert agents and constructs the group chat after the user provides descriptions of a building task and an execution task.

AgentBuilder supports open-source models on Hugging Face powered by vLLM and FastChat. If the user chooses to use an open-source LLM, AgentBuilder will set up an endpoint server automatically without any further user involvement.
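
A minimal usage sketch, following the example notebook (the model names, task description, and config file are placeholders, and exact parameters may differ across versions):

from autogen.agentchat.contrib.agent_builder import AgentBuilder

# The building task is a plain-language description of what the agent team should do.
building_task = "Find a recent arXiv paper about LLM agents and summarize its key ideas."

builder = AgentBuilder(
    config_file_or_env="OAI_CONFIG_LIST",  # points at your model credentials
    builder_model="gpt-4",                 # model that designs the expert agents
    agent_model="gpt-4",                   # model the generated agents run on
)

# build() returns the generated expert agents plus their configs, which can be saved and reused.
agent_list, agent_configs = builder.build(building_task, {"temperature": 0})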

How to Assess Utility of LLM-powered Applications?

Fig. 1: A verification framework illustrating the general flow of AgentEval.

TL;DR:

  • As a developer of an LLM-powered application, how can you assess the utility it brings to end users while helping them with their tasks?
  • To shed light on the question above, we introduce AgentEval, the first version of a framework to assess the utility of any LLM-powered application crafted to assist users in specific tasks. AgentEval aims to simplify the evaluation process by automatically proposing a set of criteria tailored to the unique purpose of your application. This allows for a comprehensive assessment, quantifying the utility of your application against the suggested criteria.
  • We demonstrate how AgentEval works using a math problems dataset as an example in the following notebook. Any feedback would be useful for future development. Please contact us on our Discord.

Introduction

AutoGen aims to simplify the development of LLM-powered multi-agent systems for various applications, ultimately making end users' lives easier by assisting with their tasks. Next, we all yearn to understand how our developed systems perform, their utility for users, and, perhaps most crucially, how we can enhance them. Directly evaluating multi-agent systems poses challenges as current approaches predominantly rely on success metrics – essentially, whether the agent accomplishes tasks. However, comprehending user interaction with a system involves far more than success alone. Take math problems, for instance; it's not merely about the agent solving the problem. Equally significant is its ability to convey solutions based on various criteria, including completeness, conciseness, and the clarity of the provided explanation. Furthermore, success isn't always clearly defined for every task.

Rapid advances in LLMs and multi-agent systems have brought forth many emerging capabilities that we're keen on translating into tangible utilities for end users. We introduce the first version of the AgentEval framework - a tool crafted to empower developers in swiftly gauging the utility of LLM-powered applications designed to help end users accomplish their desired tasks.

Fig. 2: An overview of the tasks taxonomy.

Let's first look at an overview of the suggested task taxonomy that a multi-agent system can be designed for. In general, the tasks can be split into two types:

  • Success is not clearly defined - refers to instances when users utilize a system in an assistive manner, seeking suggestions rather than expecting the system to solve the task. For example, a user might request the system to generate an email. In many cases, this generated content serves as a template that the user will later edit. However, defining success precisely for such tasks is relatively complex.
  • Success is clearly defined - refers to instances where we can clearly define whether a system solved the task or not. Consider agents that assist in accomplishing household tasks, where the definition of success is clear and measurable. This category can be further divided into two subcategories:
      • An optimal solution exists - these are tasks where only one solution is possible. For example, if you ask your assistant to turn on the light, the success of this task is clearly defined, and there is only one way to accomplish it.
      • Multiple solutions exist - increasingly, we observe situations where multiple trajectories of agent behavior can lead to either success or failure. In such cases, it is crucial to differentiate between the various successful and unsuccessful trajectories. For example, when you ask the agent to suggest a food recipe or tell you a joke.

In our AgentEval framework, we are currently focusing on tasks where Success is clearly defined. Next, we will introduce the suggested framework.
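
Conceptually, AgentEval runs in two stages: a critic proposes task-specific criteria, and a quantifier scores a given solution against each criterion. The snippet below is only an illustrative sketch of that idea using plain AutoGen agents; it is not the AgentEval API itself, and the agent names and prompts are made up:

from autogen import AssistantAgent, UserProxyAgent

# llm_config is assumed to be defined elsewhere.
critic = AssistantAgent(
    name="critic",
    system_message=(
        "Given a task description and an example solution, propose a JSON list of "
        "evaluation criteria, each with a name, a description, and accepted values."
    ),
    llm_config=llm_config,
)
quantifier = AssistantAgent(
    name="quantifier",
    system_message=(
        "Given a task, a list of criteria, and one solution trace, return a JSON object "
        "mapping each criterion name to an assessed value."
    ),
    llm_config=llm_config,
)
user = UserProxyAgent(
    name="user",
    human_input_mode="NEVER",
    max_consecutive_auto_reply=0,
    code_execution_config=False,
)

# Stage 1: propose criteria for the task (e.g., solving math word problems).
user.initiate_chat(critic, message="Task: solve math word problems. Example solution: ...")
criteria = user.last_message(critic)["content"]

# Stage 2: quantify one solution trace against those criteria.
user.initiate_chat(quantifier, message=f"Criteria: {criteria}\nSolution trace: ...")
scores = user.last_message(quantifier)["content"]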

EcoAssistant - Using LLM Assistants More Accurately and Affordably

EcoAssistant system overview

TL;DR:

  • Introducing the EcoAssistant, which is designed to solve user queries more accurately and affordably.
  • We show how to let the LLM assistant agent leverage external APIs to solve user queries.
  • We show how to reduce the cost of using GPT models via Assistant Hierarchy.
  • We show how to leverage the idea of Retrieval-Augmented Generation (RAG) to improve the success rate via Solution Demonstration.

EcoAssistant

In this blog, we introduce the EcoAssistant, a system built upon AutoGen with the goal of solving user queries more accurately and affordably.
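
To make the Assistant Hierarchy concrete, here is an illustrative sketch (not EcoAssistant's actual implementation): a query is first attempted with a cheaper model and escalated to a more expensive one only if it is not solved. The model names and the is_query_solved check are assumptions:

from autogen import AssistantAgent, UserProxyAgent

# Cheaper model first, more expensive model as a fallback.
tiers = [{"model": "gpt-3.5-turbo"}, {"model": "gpt-4"}]

user_proxy = UserProxyAgent(
    name="user_proxy",
    human_input_mode="NEVER",
    code_execution_config={"work_dir": "coding", "use_docker": False},
)

def solve_with_hierarchy(query: str) -> bool:
    for llm_config in tiers:
        assistant = AssistantAgent(name=f"assistant_{llm_config['model']}", llm_config=llm_config)
        user_proxy.initiate_chat(assistant, message=query)
        if is_query_solved(user_proxy, assistant):  # hypothetical check, e.g. user feedback
            return True  # stop early; the cheaper model was sufficient
    return False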

Multimodal with GPT-4V and LLaVA

LMM Teaser

In Brief:

  • Introducing the Multimodal Conversable Agent and the LLaVA Agent to enhance LMM functionalities.
  • Users can input text and images simultaneously using the <img img_path> tag to specify image loading.
  • Demonstrated through the GPT-4V notebook.
  • Demonstrated through the LLaVA notebook.

Introduction

Large multimodal models (LMMs) augment large language models (LLMs) with the ability to process multi-sensory data.

This blog post and the latest AutoGen update concentrate on visual comprehension. Users can input images, pose questions about them, and receive text-based responses from these LMMs. We currently support the gpt-4-vision-preview model from OpenAI and the LLaVA model from Microsoft.

Here, we emphasize the Multimodal Conversable Agent and the LLaVA Agent due to their growing popularity. GPT-4V represents the forefront in image comprehension, while LLaVA is an efficient model, fine-tuned from LLaMA-2.
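
A minimal sketch following the GPT-4V notebook (the config variable and image URL are placeholders, and an OpenAI key with vision access is assumed):

from autogen import UserProxyAgent
from autogen.agentchat.contrib.multimodal_conversable_agent import MultimodalConversableAgent

# config_list_4v is assumed to contain a gpt-4-vision-preview configuration.
image_agent = MultimodalConversableAgent(
    name="image-explainer",
    llm_config={"config_list": config_list_4v, "max_tokens": 300},
)

user_proxy = UserProxyAgent(
    name="user_proxy",
    human_input_mode="NEVER",
    max_consecutive_auto_reply=0,
    code_execution_config=False,
)

# The <img ...> tag interleaves an image (local path or URL) with the text prompt.
user_proxy.initiate_chat(
    image_agent,
    message="What's in this image? <img https://example.com/photo.png>",
)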

AutoGen's Teachable Agents

Teachable Agent Architecture

TL;DR:

  • We introduce Teachable Agents so that users can teach their LLM-based assistants new facts, preferences, and skills.
  • We showcase examples of teachable agents learning and later recalling facts, preferences, and skills in subsequent chats.

Introduction

Conversational assistants based on LLMs can remember the current chat with the user, and can also demonstrate in-context learning of user teachings during the conversation. But the assistant's memories and learnings are lost once the chat is over, or when a single chat grows too long for the LLM to handle effectively. Then in subsequent chats the user is forced to repeat any necessary instructions over and over.

Teachability addresses these limitations by persisting user teachings across chat boundaries in long-term memory implemented as a vector database. Instead of copying all of memory into the context window, which would eat up valuable space, individual memories (called memos) are retrieved into context as needed. This allows the user to teach frequently used facts and skills to the teachable agent just once, and have it recall them in later chats.

Any instantiated agent that inherits from ConversableAgent can be made teachable by instantiating a Teachability object and calling its add_to_agent(agent) method. In order to make effective decisions about memo storage and retrieval, the Teachability object calls an instance of TextAnalyzerAgent (another AutoGen agent) to identify and reformulate text as needed for remembering facts, preferences, and skills. Note that this adds extra LLM calls involving a relatively small number of tokens, which can add a few seconds to the time a user waits for each response.
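
A minimal sketch of making an agent teachable, assuming an llm_config is defined elsewhere (the database path is a placeholder, and parameter names may differ slightly across versions):

from autogen import ConversableAgent, UserProxyAgent
from autogen.agentchat.contrib.capabilities.teachability import Teachability

teachable_agent = ConversableAgent(name="teachable_agent", llm_config=llm_config)

# Memos persist in a local vector database across chat sessions.
teachability = Teachability(
    reset_db=False,  # keep memories from earlier sessions
    path_to_db_dir="./tmp/teachability_db",
)
teachability.add_to_agent(teachable_agent)

user = UserProxyAgent(
    name="user",
    human_input_mode="NEVER",
    max_consecutive_auto_reply=0,
    code_execution_config=False,
)
user.initiate_chat(teachable_agent, message="Remember that my favorite plotting library is matplotlib.")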

Retrieval-Augmented Generation (RAG) Applications with AutoGen

Last update: August 14, 2024; AutoGen version: v0.2.35

RAG Architecture

TL;DR:

  • We introduce RetrieveUserProxyAgent, the RAG agent of AutoGen that enables retrieval-augmented generation, and its basic usage.
  • We showcase customizations of RAG agents, such as customizing the embedding function, the text split function, and the vector database.
  • We also showcase two advanced usages of RAG agents: integrating with group chat and building a chat application with Gradio.
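
A minimal usage sketch, following the RAG notebooks for AutoGen v0.2.3x (docs_path and the question are placeholders, and an llm_config is assumed to be defined elsewhere):

from autogen import AssistantAgent
from autogen.agentchat.contrib.retrieve_user_proxy_agent import RetrieveUserProxyAgent

assistant = AssistantAgent(
    name="assistant",
    system_message="You are a helpful assistant.",
    llm_config=llm_config,
)

ragproxyagent = RetrieveUserProxyAgent(
    name="ragproxyagent",
    human_input_mode="NEVER",
    retrieve_config={
        "task": "qa",  # question answering over the indexed documents
        "docs_path": "https://raw.githubusercontent.com/microsoft/autogen/main/README.md",
        "chunk_token_size": 2000,
    },
    code_execution_config=False,
)

# The proxy retrieves relevant chunks and injects them into the prompt before querying the assistant.
ragproxyagent.initiate_chat(
    assistant,
    message=ragproxyagent.message_generator,
    problem="What is AutoGen?",
)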