Skip to content

Blog#

AutoGen Studio: Interactively Explore Multi-Agent Workflows

AutoGen Studio Playground View: Solving a task with multiple agents that generate a pdf document with images.

AutoGen Studio: Solving a task with multiple agents that generate a pdf document with images.

TL;DR

To help you rapidly prototype multi-agent solutions for your tasks, we are introducing AutoGen Studio, an interface powered by AutoGen. It allows you to:

  • Declaratively define and modify agents and multi-agent workflows through a point and click, drag and drop interface (e.g., you can select the parameters of two agents that will communicate to solve your task).
  • Use our UI to create chat sessions with the specified agents and view results (e.g., view chat history, generated files, and time taken).
  • Explicitly add skills to your agents and accomplish more tasks.
  • Publish your sessions to a local gallery.

See the official AutoGen Studio documentation for more details.

AutoGen Studio is open source code here, and can be installed via pip. Give it a try!

pip install autogenstudio

Agent AutoBuild - Automatically Building Multi-agent Systems

Overall structure of AutoBuild

TL;DR: Introducing AutoBuild, building multi-agent system automatically, fast, and easily for complex tasks with minimal user prompt required, powered by a new designed class AgentBuilder. AgentBuilder also supports open-source LLMs by leveraging vLLM and FastChat. Checkout example notebooks and source code for reference:

Introduction

In this blog, we introduce AutoBuild, a pipeline that can automatically build multi-agent systems for complex tasks. Specifically, we design a new class called AgentBuilder, which will complete the generation of participant expert agents and the construction of group chat automatically after the user provides descriptions of a building task and an execution task.

AgentBuilder supports open-source models on Hugging Face powered by vLLM and FastChat. Once the user chooses to use open-source LLM, AgentBuilder will set up an endpoint server automatically without any user participation.

EcoAssistant - Using LLM Assistants More Accurately and Affordably

system

TL;DR: * Introducing the EcoAssistant, which is designed to solve user queries more accurately and affordably. * We show how to let the LLM assistant agent leverage external API to solve user query. * We show how to reduce the cost of using GPT models via Assistant Hierarchy. * We show how to leverage the idea of Retrieval-augmented Generation (RAG) to improve the success rate via Solution Demonstration.

EcoAssistant

In this blog, we introduce the EcoAssistant, a system built upon AutoGen with the goal of solving user queries more accurately and affordably.

MathChat - An Conversational Framework to Solve Math Problems

MathChat WorkFlow TL;DR:

  • We introduce MathChat, a conversational framework leveraging Large Language Models (LLMs), specifically GPT-4, to solve advanced mathematical problems.
  • MathChat improves LLM's performance on challenging math problem-solving, outperforming basic prompting and other strategies by about 6%. The improvement was especially notable in the Algebra category, with a 15% increase in accuracy.
  • Despite the advancement, GPT-4 still struggles to solve very challenging math problems, even with effective prompting strategies. Further improvements are needed, such as the development of more specific assistant models or the integration of new tools and prompts.

Achieve More, Pay Less - Use GPT-4 Smartly

An adaptive way of using GPT-3.5 and GPT-4 outperforms GPT-4 in both coding success rate and inference cost

TL;DR:

  • A case study using the HumanEval benchmark shows that an adaptive way of using multiple GPT models can achieve both much higher accuracy (from 68% to 90%) and lower inference cost (by 18%) than using GPT-4 for coding.

GPT-4 is a big upgrade of foundation model capability, e.g., in code and math, accompanied by a much higher (more than 10x) price per token to use over GPT-3.5-Turbo. On a code completion benchmark, HumanEval, developed by OpenAI, GPT-4 can successfully solve 68% tasks while GPT-3.5-Turbo does 46%. It is possible to increase the success rate of GPT-4 further by generating multiple responses or making multiple calls. However, that will further increase the cost, which is already nearly 20 times of using GPT-3.5-Turbo and with more restricted API call rate limit. Can we achieve more with less?

In this blog post, we will explore a creative, adaptive way of using GPT models which leads to a big leap forward.

Does Model and Inference Parameter Matter in LLM Applications? - A Case Study for MATH

level 2 algebra

TL;DR: * Just by tuning the inference parameters like model, number of responses, temperature etc. without changing any model weights or prompt, the baseline accuracy of untuned gpt-4 can be improved by 20% in high school math competition problems. * For easy problems, the tuned gpt-3.5-turbo model vastly outperformed untuned gpt-4 in accuracy (e.g., 90% vs. 70%) and cost efficiency. For hard problems, the tuned gpt-4 is much more accurate (e.g., 35% vs. 20%) and less expensive than untuned gpt-4. * AutoGen can help with model selection, parameter tuning, and cost-saving in LLM applications.