TL;DR:

  • Introducing the EcoAssistant, which is designed to solve user queries more accurately and affordably.
  • We show how to let the LLM assistant agent leverage external API to solve user query.
  • We show how to reduce the cost of using GPT models via Assistant Hierarchy.
  • We show how to leverage the idea of Retrieval-augmented Generation (RAG) to improve the success rate via Solution Demonstration.

EcoAssistant

In this blog, we introduce the EcoAssistant, a system built upon AutoGen with the goal of solving user queries more accurately and affordably.

Problem setup

Recently, users have been using conversational LLMs such as ChatGPT for various queries. Reports indicate that 23% of ChatGPT user queries are for knowledge extraction purposes. Many of these queries require knowledge that is external to the information stored within any pre-trained large language models (LLMs). These tasks can only be completed by generating code to fetch necessary information via external APIs that contain the requested information. In the table below, we show three types of user queries that we aim to address in this work.

DatasetAPIExample query
PlacesGoogle PlacesI’m looking for a 24-hour pharmacy in Montreal, can you find one for me?
WeatherWeather APIWhat is the current cloud coverage in Mumbai, India?
StockAlpha Vantage Stock APICan you give me the opening price of Microsoft for the month of January 2023?

Leveraging external APIs

To address these queries, we first build a two-agent system based on AutoGen, where the first agent is a LLM assistant agent (AssistantAgent in AutoGen) that is responsible for proposing and refining the code and the second agent is a code executor agent (UserProxyAgent in AutoGen) that would extract the generated code and execute it, forwarding the output back to the LLM assistant agent. A visualization of the two-agent system is shown below.

To instruct the assistant agent to leverage external APIs, we only need to add the API name/key dictionary at the beginning of the initial message. The template is shown below, where the red part is the information of APIs and black part is user query.

Importantly, we don’t want to reveal our real API key to the assistant agent for safety concerns. Therefore, we use a fake API key to replace the real API key in the initial message. In particular, we generate a random token (e.g., 181dbb37) for each API key and replace the real API key with the token in the initial message. Then, when the code executor execute the code, the fake API key would be automatically replaced by the real API key.

Solution Demonstration

In most practical scenarios, queries from users would appear sequentially over time. Our EcoAssistant leverages past success to help the LLM assistants address future queries via Solution Demonstration. Specifically, whenever a query is deemed successfully resolved by user feedback, we capture and store the query and the final generated code snippet. These query-code pairs are saved in a specialized vector database. When new queries appear, EcoAssistant retrieves the most similar query from the database, which is then appended with the associated code to the initial prompt for the new query, serving as a demonstration. The new template of initial message is shown below, where the blue part corresponds to the solution demonstration.

We found that this utilization of past successful query-code pairs improves the query resolution process with fewer iterations and enhances the system’s performance.

Assistant Hierarchy

LLMs usually have different prices and performance, for example, GPT-3.5-turbo is much cheaper than GPT-4 but also less accurate. Thus, we propose the Assistant Hierarchy to reduce the cost of using LLMs. The core idea is that we use the cheaper LLMs first and only use the more expensive LLMs when necessary. By this way, we are able to reduce the reliance on expensive LLMs and thus reduce the cost. In particular, given multiple LLMs, we initiate one assistant agent for each and start the conversation with the most cost-effective LLM assistant. If the conversation between the current LLM assistant and the code executor concludes without successfully resolving the query, EcoAssistant would then restart the conversation with the next more expensive LLM assistant in the hierarchy. We found that this strategy significantly reduces costs while still effectively addressing queries.

A Synergistic Effect

We found that the Assistant Hierarchy and Solution Demonstration of EcoAssistant have a synergistic effect. Because the query-code database is shared by all LLM assistants, even without specialized design, the solution from more powerful LLM assistant (e.g., GPT-4) could be later retrieved to guide weaker LLM assistant (e.g., GPT-3.5-turbo). Such a synergistic effect further improves the performance and reduces the cost of EcoAssistant.

Experimental Results

We evaluate EcoAssistant on three datasets: Places, Weather, and Stock. When comparing it with a single GPT-4 assistant, we found that EcoAssistant achieves a higher success rate with a lower cost as shown in the figure below. For more details about the experimental results and other experiments, please refer to our paper.

Further reading

Please refer to our paper and codebase for more details about EcoAssistant.

If you find this blog useful, please consider citing:

@article{zhang2023ecoassistant,
  title={EcoAssistant: Using LLM Assistant More Affordably and Accurately},
  author={Zhang, Jieyu and Krishna, Ranjay and Awadallah, Ahmed H and Wang, Chi},
  journal={arXiv preprint arXiv:2310.03046},
  year={2023}
}