Agents in AutoGen
TL;DR
- AutoGen agents unify different agent definitions.
- When discussing multi- vs. single-agent systems, it helps to clarify whether we mean the interface or the architecture.
I often get asked two common questions:
- What’s an agent?
- What are the pros and cons of multi vs. single agent?
This blog collects my thoughts from several interviews and recent learnings.
What’s an agent?
There are many different definitions of agents. When building AutoGen, I was looking for the most generic notion that could incorporate all of them. To do that, we really need to think about the minimal set of concepts required.
In AutoGen, we think of an agent as an entity that can act on behalf of human intent: it can send messages, receive messages, respond to other agents after taking actions, and interact with other agents. We consider this the minimal set of capabilities an agent needs. Underneath, agents can have different types of backends that support them in performing actions and generating replies. Some agents use AI models to generate replies; others use functions underneath to generate tool-based replies; still others use human input as the way to reply to other agents. An agent can also mix these different backends, or be a more complex agent that holds internal conversations among multiple sub-agents. On the surface, other agents still perceive it as a single entity to communicate with.
With this definition, we can cover both very simple agents that solve simple tasks with a single backend and agents composed of multiple simpler agents. One can recursively build up more and more powerful agents. The agent concept in AutoGen covers all these levels of complexity.
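To make this concrete, here is a minimal sketch of three backend types behind the same agent interface, assuming the classic AutoGen (v0.2) / AG2 Python API; the model name, API key, agent names, and task message are placeholders, and exact parameters may differ across versions.

```python
# Minimal sketch: three agent backends behind the same interface.
# Assumes the classic AutoGen (v0.2) / AG2 API; details may vary by version.
from autogen import AssistantAgent, ConversableAgent, UserProxyAgent

llm_config = {"config_list": [{"model": "gpt-4", "api_key": "sk-..."}]}  # placeholder

# 1. LLM-backed agent: replies are generated by a model.
assistant = AssistantAgent(name="assistant", llm_config=llm_config)

# 2. Human-backed agent: replies come from human input.
human = ConversableAgent(name="human", human_input_mode="ALWAYS", llm_config=False)

# 3. Execution-backed agent: replies come from running the code it receives.
executor = UserProxyAgent(
    name="executor",
    human_input_mode="NEVER",
    code_execution_config={"work_dir": "coding", "use_docker": False},
)

# Whatever the backend, every agent exposes the same surface:
# it can send, receive, and generate a reply.
human.initiate_chat(assistant, message="Draft a plan for analyzing sales data.")
```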
What are the pros and cons of multi vs. single agent?
This question can be asked in a variety of ways.
- Why should I use multiple agents instead of a single agent?
- Why think about multi-agent systems when we don’t yet have a strong single agent?
- Do multiple agents increase complexity, latency, and cost?
- When should I use multiple agents vs. a single agent?
When we use the terms ‘multi-agent’ and ‘single-agent’, there are at least two different dimensions to consider.
- Interface. From the user’s point of view, do they interact with the system at a single interaction point, or do they explicitly see multiple agents working and need to interact with several of them?
- Architecture. Are there multiple agents running in the backend?
A particular system can have a single-agent interface and a multi-agent architecture, but the users don’t need to know that.
Interface
A single interaction point makes many applications’ user experience more straightforward, but there are also cases where it is not the best solution. For example, when the application is about having multiple agents debate a subject, the users need to see what each agent says, so it is beneficial for them to observe the multi-agent behavior directly. Social simulation experiments are another example: people also want to see the behavior of each agent.
Architecture
A multi-agent architecture is often easier to maintain, understand, and extend than a single-agent system. Even for a single-agent-based interface, a multi-agent implementation can make the system more modular and easier for developers to add or remove components of functionality. It is important to recognize that a multi-agent architecture is a good way to build a single agent. While not obvious, this idea has roots in Marvin Minsky’s Society of Mind theory (1986): starting from simple agents, one can compose and coordinate them effectively to exhibit a higher level of intelligence.
We don’t yet have a single agent that can do everything we want. Why is that? It could be because we haven’t figured out the right way of composing multiple agents to build this powerful single agent. But first, we need a framework that allows easy experimentation with these different ways of combining models and agents. For example:
- Different conversation patterns such as sequential chats, group chat, constrained group chat, customized group chat, nested chat, and recursive compositions of them (see the group-chat sketch after this list).
- Different prompting and reasoning techniques such as reflection.
- Tool use and code execution, as well as their combination (see the tool-use sketch after this list).
- Planning and task decomposition.
- Retrieval augmentation.
- Integrating multiple models, APIs, modalities and memories.
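As one illustration of the first item, here is a hedged sketch of the group-chat pattern in the classic AutoGen / AG2 API; the agent roles and the task message are made up for illustration.

```python
# Sketch of the group-chat pattern (classic AutoGen / AG2 API).
# Roles and the task message are illustrative, not from a real system.
from autogen import AssistantAgent, UserProxyAgent, GroupChat, GroupChatManager

llm_config = {"config_list": [{"model": "gpt-4", "api_key": "sk-..."}]}  # placeholder

planner = AssistantAgent("planner", llm_config=llm_config,
                         system_message="Break the task into steps.")
coder = AssistantAgent("coder", llm_config=llm_config,
                       system_message="Write code for each step.")
critic = AssistantAgent("critic", llm_config=llm_config,
                        system_message="Review the plan and the code.")
user = UserProxyAgent("user", human_input_mode="NEVER",
                      code_execution_config=False)

# The manager selects the next speaker each round, coordinating the group.
groupchat = GroupChat(agents=[user, planner, coder, critic],
                      messages=[], max_round=8)
manager = GroupChatManager(groupchat=groupchat, llm_config=llm_config)

user.initiate_chat(manager, message="Summarize last quarter's sales trends.")
```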
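And for tool use, a sketch of registering a function as a tool shared between a caller agent and an executor agent, again assuming the classic AutoGen / AG2 API; `get_weather` is a hypothetical tool added purely for illustration.

```python
# Sketch of tool use (classic AutoGen / AG2 API): one agent proposes the
# call, another executes it. The tool itself is hypothetical.
from autogen import AssistantAgent, UserProxyAgent, register_function

llm_config = {"config_list": [{"model": "gpt-4", "api_key": "sk-..."}]}  # placeholder

assistant = AssistantAgent("assistant", llm_config=llm_config)
user = UserProxyAgent("user", human_input_mode="NEVER",
                      code_execution_config=False)

def get_weather(city: str) -> str:
    """Hypothetical tool: return the weather for a city."""
    return f"It is sunny in {city}."

register_function(
    get_weather,
    caller=assistant,   # the LLM agent that decides to call the tool
    executor=user,      # the agent that actually runs it
    description="Get the current weather for a city.",
)

user.initiate_chat(assistant, message="What's the weather in Seattle?")
```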
My own experience is that people who practice using multiple agents to solve a problem often reach a solution faster. I have high hopes that this practice can lead to a robust way of building a complex, multi-faceted single agent. Otherwise, the design space for building such a single agent is simply too large; without good modularity, the system is prone to hitting a complexity limit beyond which it becomes hard to maintain and modify.
On the other hand, we don’t have to stop there. We can also think of a multi-agent system as a way to multiply the power of a single agent: connect it with other agents to accomplish bigger goals.
Benefits of multi-agents
There are at least two types of applications that benefit from multi-agent systems.
- Single-agent interface. Developers often find that they need to extend the system with different capabilities, tools, etc. If they implement that single-agent interface with a multi-agent architecture, they can often increase the system’s capability to handle more complex tasks or improve the quality of its responses. One example is complex data analytics, which often requires agents of different roles to solve a task: some agents are good at retrieving data and presenting it to others; some are good at running deep analytics and providing insights; others can critique and suggest more actions, or do planning; and so on. Usually, a complex task can be accomplished by building these agents with different roles, as sketched in the code example after this list.
An example of a real-world production use case:
If you don’t know about Chi Wang and Microsoft Research’s work, please check it out. I want to give a real-world production use case: Skypoint AI platform client Tabor AI (https://tabor.ai), an AI Copilot for Medicare brokers. Selecting a health plan every year is a cumbersome and frustrating task for seniors (65 million seniors have to do this every year in the US). This process used to take hours of human research; now, with AI agents, it takes 5 to 10 minutes without compromising the quality or accuracy of the results. It’s fun to see agents doing retail shopping etc., where accuracy is not that mission critical. AI in regulated industries like healthcare, the public sector, and financial services is a different beast; that is the Skypoint AI platform (AIP) focus.
Tisson Mathew, CEO @ Skypoint
- Multi-agent interface. For example, a chess game needs at least two different players, and a football game involves even more entities. Multi-agent debates and social simulations are good examples, too.
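Below is a hedged sketch of the data-analytics example above: one entry-point agent drives role-specialized agents through sequential chats (classic AutoGen / AG2 API), so the user still sees a single interface. The roles, messages, and agent names are illustrative assumptions.

```python
# Sketch: single-agent interface, multi-agent architecture, via sequential
# chats (classic AutoGen / AG2 API). Roles and messages are illustrative.
from autogen import AssistantAgent, UserProxyAgent

llm_config = {"config_list": [{"model": "gpt-4", "api_key": "sk-..."}]}  # placeholder

retriever = AssistantAgent("retriever", llm_config=llm_config,
                           system_message="Fetch and present the relevant data.")
analyst = AssistantAgent("analyst", llm_config=llm_config,
                         system_message="Run deep analysis and report insights.")
critic = AssistantAgent("critic", llm_config=llm_config,
                        system_message="Critique the analysis and suggest actions.")
entry_point = UserProxyAgent("entry_point", human_input_mode="NEVER",
                             code_execution_config=False)

# Each chat's summary is carried forward as context for the next one,
# so the user only ever interacts with the single entry point.
entry_point.initiate_chats([
    {"recipient": retriever, "message": "Pull last quarter's sales figures.",
     "max_turns": 2, "summary_method": "last_msg"},
    {"recipient": analyst, "message": "Analyze the retrieved data.",
     "max_turns": 2, "summary_method": "reflection_with_llm"},
    {"recipient": critic, "message": "Review the analysis.",
     "max_turns": 1, "summary_method": "last_msg"},
])
```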
Cost of multi-agents
Very complex multi-agent systems built with leading frontier models are expensive to run, but compared to having humans accomplish the same task, they can be dramatically more affordable.
While not inexpensive to operate, our multi-agent powered venture analysis system at BetterFutureLabs is far more affordable and exponentially faster than human analysts performing a comparable depth of analysis.
Justin Trugman, Cofounder & Head of Technology at BetterFutureLabs
Will using multiple agents always increase the cost, latency, and chance of failure compared to using a single agent? It depends on how the multi-agent system is designed, and surprisingly, the answer can actually be the opposite.
- Even if the performance of a single agent is good enough, you may want that agent to teach other, relatively cheaper agents so that they can become better at low cost. EcoAssistant is a good example of combining GPT-4 and GPT-3.5 agents to reduce cost while improving performance, even compared to using a single GPT-4 agent.
- A recent use case reports that using multiple agents with a cheap model can sometimes outperform a single agent with an expensive model:
Our research group at Tufts University continues to make important improvements in addressing the challenges students face when transitioning from undergraduate to graduate-level courses, particularly in the Doctor of Physical Therapy program at the School of Medicine. With the ongoing support from the Data Intensive Studies Center (DISC) and our collaboration with Chi Wang’s team at Microsoft, we are now leveraging StateFlow with Autogen to create even more effective assessments tailored to course content. This State-driven workflow approach complements our existing work using multiple agents in sequential chat, teachable agents, and round-robin style debate formats… By combining StateFlow with multiple agents it’s possible to maintain high-quality results/output while using more cost-effective language models (GPT 3.5). This cost savings, coupled with the increased relevance and accuracy of our results, has really demonstrated for us Autogen’s immense potential for developing efficient and scalable educational solutions that can be adapted to various contexts and budgets.
Benjamin D Stern, MS, DPT, Assistant Professor, Doctor of Physical Therapy Program, Tufts University School of Medicine
- AutoDefense demonstrates that using multiple agents reduces the risk of jailbreak attacks.
There are certainly tradeoffs to make. The large design space of multi-agents offers these tradeoffs and opens up new opportunities for optimization.
Over a year since the debut of Ask AT&T, the generative AI platform to which we’ve onboarded over 80,000 users, AT&T has been enhancing its capabilities by incorporating ‘AI Agents’. These agents, powered by the Autogen framework pioneered by Microsoft (https://docs.ag2.ai/blog/2023-12-01-AutoGenStudio/), are designed to tackle complicated workflows and tasks that traditional language models find challenging. To drive collaboration, AT&T is contributing back to the open-source project by introducing features that facilitate enhanced security and role-based access for various projects and data.
Andy Markus, Chief Data Officer at AT&T
Watch/read the interviews/articles
- Interview with Joanne Chen from Foundation Capital in AI in the Real World (Forbes article, YouTube).
- Interview with Arthur Holland Michel from The Economist in Today’s AI models are impressive. Teams of them will be formidable.
- Interview with Thomas Maybrier from Valory in AI Frontiers.
Do you find this note helpful? Would you like to share your thoughts, use cases, findings? Please join our Discord server for discussion.
Acknowledgements
This blogpost is revised based on feedback from Wael Karkoub, Mark Sze, Justin Trugman, Eric Zhu.