Skip to content

Tutorial#

ReasoningAgent Update - Beam Search, MCTS, and LATS for LLM Reasoning

Key Updates in this Release:

  1. Configuration Changes
  2. All reasoning parameters are now configured through a single reason_config dictionary
  3. Breaking Change: Parameters like max_depth, beam_size, and answer_approach have moved from constructor arguments into reason_config

  4. New Search Strategies

  5. Added Monte Carlo Tree Search (MCTS) as an alternative to Beam Search
  6. Introduced Language Agent Tree Search (LATS) - an enhancement to MCTS that incorporates reflection prior to the next round of simulation.

  7. Enhanced Features

  8. New forest_size parameter enables maintaining multiple independent reasoning trees
  9. Support for ground truth answers in prompts to generate training data for LLM fine-tuning

Tree of Thoughts

Introduction

In our previous post, we introduced the ReasoningAgent, which utilized Beam Search for systematic reasoning. Today, we include MCTS (Monte Carlo Tree Search) and Language Agent Tree Search (LATS) as alternative search strategies, which present advantages in different scenarios.

Our previous ReasoningAgent draws inspiration from OpenAI's 2023 paper, Let's Verify Step by Step, as well as the 2024 O1 feature. The landscape of contemporary research is rich, with notable works such as DeepSeek-R1, Macro-O1, and OpenR.

Agents in AutoGen

agents

TL;DR

  • AutoGen agents unify different agent definitions.
  • When talking about multi vs. single agents, it is beneficial to clarify whether we refer to the interface or the architecture.

I often get asked two common questions: 1. What's an agent? 1. What are the pros and cons of multi vs. single agent?

This blog collects my thoughts from several interviews and recent learnings.

What's an agent?

There are many different types of definitions of agents. When building AutoGen, I was looking for the most generic notion that can incorporate all these different types of definitions. And to do that we really need to think about the minimal set of concepts that are needed.

In AutoGen, we think about the agent as an entity that can act on behalf of human intent. They can send messages, receive messages, respond to other agents after taking actions and interact with other agents. We think it's a minimal set of capabilities that an agent needs to have underneath. They can have different types of backends to support them to perform actions and generate replies. Some of the agents can use AI models to generate replies. Some other agents can use functions underneath to generate tool-based replies and other agents can use human input as a way to reply to other agents. And you can also have agents that mix these different types of backends or have more complex agents that have internal conversations among multiple agents. But on the surface, other agents still perceive it as a single entity to communicate to.

With this definition, we can incorporate both very simple agents that can solve simple tasks with a single backend, but also we can have agents that are composed of multiple simpler agents. One can recursively build up more powerful agents. The agent concept in AutoGen can cover all these different complexities.

What's New in AutoGen?

autogen is loved

TL;DR

  • AutoGen has received tremendous interest and recognition.
  • AutoGen has many exciting new features and ongoing research.

Five months have passed since the initial spinoff of AutoGen from FLAML. What have we learned since then? What are the milestones achieved? What's next?

Background

AutoGen was motivated by two big questions:

  • What are future AI applications like?
  • How do we empower every developer to build them?

Last year, I worked with my colleagues and collaborators from Penn State University and University of Washington, on a new multi-agent framework, to enable the next generation of applications powered by large language models. We have been building AutoGen, as a programming framework for agentic AI, just like PyTorch for deep learning. We developed AutoGen in an open source project FLAML: a fast library for AutoML and tuning. After a few studies like EcoOptiGen and MathChat, in August, we published a technical report about the multi-agent framework. In October, we moved AutoGen from FLAML to a standalone repo on GitHub, and published an updated technical report.

FSM Group Chat -- User-specified agent transitions

FSM Group Chat

Finite State Machine (FSM) Group Chat allows the user to constrain agent transitions.

TL;DR

Recently, FSM Group Chat is released that allows the user to input a transition graph to constrain agent transitions. This is useful as the number of agents increases because the number of transition pairs (N choose 2 combinations) increases exponentially increasing the risk of sub-optimal transitions, which leads to wastage of tokens and/or poor outcomes.

AutoGen with Custom Models: Empowering Users to Use Their Own Inference Mechanism

TL;DR

AutoGen now supports custom models! This feature empowers users to define and load their own models, allowing for a more flexible and personalized inference mechanism. By adhering to a specific protocol, you can integrate your custom model for use with AutoGen and respond to prompts any way needed by using any model/API call/hardcoded response you want.

NOTE: Depending on what model you use, you may need to play with the default prompts of the Agent's

Code execution is now by default inside docker container

TL;DR

AutoGen 0.2.8 enhances operational safety by making 'code execution inside a Docker container' the default setting, focusing on informing users about its operations and empowering them to make informed decisions regarding code execution.

The new release introduces a breaking change where the use_docker argument is set to True by default in code executing agents. This change underscores our commitment to prioritizing security and safety in AutoGen.

All About Agent Descriptions

TL;DR

AutoGen 0.2.2 introduces a description field to ConversableAgent (and all subclasses), and changes GroupChat so that it uses agent descriptions rather than system_messages when choosing which agents should speak next.

This is expected to simplify GroupChat’s job, improve orchestration, and make it easier to implement new GroupChat or GroupChat-like alternatives.

If you are a developer, and things were already working well for you, no action is needed -- backward compatibility is ensured because the description field defaults to the system_message when no description is provided.

However, if you were struggling with getting GroupChat to work, you can now try updating the description field.

AutoGen's Teachable Agents

Teachable Agent Architecture

TL;DR:

  • We introduce Teachable Agents so that users can teach their LLM-based assistants new facts, preferences, and skills.
  • We showcase examples of teachable agents learning and later recalling facts, preferences, and skills in subsequent chats.

Introduction

Conversational assistants based on LLMs can remember the current chat with the user, and can also demonstrate in-context learning of user teachings during the conversation. But the assistant's memories and learnings are lost once the chat is over, or when a single chat grows too long for the LLM to handle effectively. Then in subsequent chats the user is forced to repeat any necessary instructions over and over.

Teachability addresses these limitations by persisting user teachings across chat boundaries in long-term memory implemented as a vector database. Instead of copying all of memory into the context window, which would eat up valuable space, individual memories (called memos) are retrieved into context as needed. This allows the user to teach frequently used facts and skills to the teachable agent just once, and have it recall them in later chats.

Any instantiated agent that inherits from ConversableAgent can be made teachable by instantiating a Teachability object and calling its add_to_agent(agent) method. In order to make effective decisions about memo storage and retrieval, the Teachability object calls an instance of TextAnalyzerAgent (another AutoGen agent) to identify and reformulate text as needed for remembering facts, preferences, and skills. Note that this adds extra LLM calls involving a relatively small number of tokens, which can add a few seconds to the time a user waits for each response.