TL;DR:

* We introduce a new ability for AG2 agents, Graph RAG with FalkorDB, providing the power of knowledge graphs
* Structured outputs, using OpenAI models, provide strict adherence to data models to improve reliability and agentic flows
* Nested chats are now available with a Swarm
Typically, RAG uses vector databases, which store information as embeddings, mathematical representations of data points. When a query is received, it's also converted into an embedding, and the vector database retrieves the most similar embeddings based on distance metrics.
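To make this concrete, here is a minimal, self-contained sketch of the vector-based retrieval step. The tiny hand-written vectors stand in for embeddings produced by a real embedding model, and the `retrieve` helper is purely illustrative, not part of any library: documents and the query live in the same vector space, and the closest documents by cosine similarity are returned as context.

```python
import numpy as np

# Toy corpus: each document is represented by a precomputed embedding vector.
# In practice these come from an embedding model and are stored in a vector database.
doc_embeddings = {
    "doc_a": np.array([0.9, 0.1, 0.0]),
    "doc_b": np.array([0.1, 0.8, 0.3]),
    "doc_c": np.array([0.0, 0.2, 0.9]),
}

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def retrieve(query_embedding: np.ndarray, k: int = 2) -> list:
    """Return the ids of the k documents most similar to the query."""
    scored = sorted(
        doc_embeddings.items(),
        key=lambda item: cosine_similarity(query_embedding, item[1]),
        reverse=True,
    )
    return [doc_id for doc_id, _ in scored[:k]]

# The query is embedded the same way, then matched against the stored vectors.
print(retrieve(np.array([0.8, 0.2, 0.1])))  # -> ['doc_a', 'doc_b']
```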
Graph-based RAG, on the other hand, leverages graph databases, which represent knowledge as a network of interconnected entities and relationships. When a query is received, Graph RAG traverses the graph to find relevant information based on the query's structure and semantics.
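By contrast, a graph-based retriever walks a network of entities and relationships. The sketch below is only illustrative: a toy knowledge graph stored as a Python dictionary and a small traversal that collects the triples reachable from an entity mentioned in the query. A real Graph RAG setup (for example, with FalkorDB) would store the graph in a graph database and query it with a graph query language rather than a dictionary lookup.

```python
# Toy knowledge graph: entities are nodes, labelled edges are relationships.
knowledge_graph = {
    "AG2": [("supports", "Graph RAG"), ("integrates_with", "FalkorDB")],
    "Graph RAG": [("retrieves_from", "knowledge graph")],
    "FalkorDB": [("is_a", "graph database")],
}

def traverse(start: str, depth: int = 2) -> list:
    """Collect (subject, relation, object) triples reachable from a start entity."""
    triples, frontier = [], [start]
    for _ in range(depth):
        next_frontier = []
        for node in frontier:
            for relation, target in knowledge_graph.get(node, []):
                triples.append((node, relation, target))
                next_frontier.append(target)
        frontier = next_frontier
    return triples

# Entities mentioned in the query seed the traversal; the returned triples
# become the context handed to the LLM.
print(traverse("AG2"))
```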
TL;DR:

* Introducing the EcoAssistant, which is designed to solve user queries more accurately and affordably.
* We show how to let the LLM assistant agent leverage external APIs to solve user queries.
* We show how to reduce the cost of using GPT models via Assistant Hierarchy.
* We show how to leverage the idea of Retrieval-Augmented Generation (RAG) to improve the success rate via Solution Demonstration.
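As a rough illustration of the Assistant Hierarchy idea, the toy sketch below tries a cheaper model first and escalates to a more expensive one only when the cheaper attempt fails. The `Assistant` class, the model names, the costs, and the success check are all placeholders for the real EcoAssistant machinery (LLM calls plus code execution and verification), not its actual implementation.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

@dataclass
class Assistant:
    model: str
    cost_per_call: float

    def try_solve(self, query: str) -> Optional[str]:
        # Placeholder for a real LLM call plus a success check (in EcoAssistant,
        # success is judged by executing the generated code and asking the user).
        if self.model == "expensive-model" or "easy" in query:
            return f"answer from {self.model}"
        return None

# Cheapest assistant first; model names and per-call costs are made-up placeholders.
hierarchy = [Assistant("cheap-model", 0.002), Assistant("expensive-model", 0.06)]

def answer(query: str) -> Tuple[str, float]:
    spent = 0.0
    for assistant in hierarchy:
        spent += assistant.cost_per_call
        result = assistant.try_solve(query)
        if result is not None:  # stop escalating once the query is solved
            return result, spent
    return "unsolved", spent

print(answer("an easy question"))  # solved by the cheap model
print(answer("a hard question"))   # escalated to the expensive model
```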
Last update: August 14, 2024; AutoGen version: v0.2.35
TL;DR:

* We introduce RetrieveUserProxyAgent, the RAG agent of AutoGen that enables retrieval-augmented generation, and its basic usage.
* We showcase customizations of RAG agents, such as customizing the embedding function, the text split function, and the vector database.
* We also showcase two advanced usages of RAG agents: integrating with group chat and building a chat application with Gradio.
Retrieval augmentation has emerged as a practical and effective approach for mitigating the intrinsic limitations of LLMs by incorporating external documents. In this blog post, we introduce the RAG agents of AutoGen, which enable retrieval-augmented generation. The system consists of two agents: a Retrieval-Augmented User Proxy agent, called RetrieveUserProxyAgent, and an Assistant agent, called RetrieveAssistantAgent. RetrieveUserProxyAgent extends the built-in agents of AutoGen, while RetrieveAssistantAgent can be any conversable agent with an LLM configured. The overall architecture of the RAG agents is shown in the figure above.
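Below is a minimal sketch of how the two agents can be constructed, along the lines of the AutoGen documentation; the `config_list` source and the `docs_path` value are assumptions you would replace with your own model configuration and document collection.

```python
import autogen
from autogen.agentchat.contrib.retrieve_assistant_agent import RetrieveAssistantAgent
from autogen.agentchat.contrib.retrieve_user_proxy_agent import RetrieveUserProxyAgent

# Load the LLM configuration (an OAI_CONFIG_LIST file is assumed here).
config_list = autogen.config_list_from_json("OAI_CONFIG_LIST")

# The Assistant agent: any conversable agent with an LLM configured.
assistant = RetrieveAssistantAgent(
    name="assistant",
    system_message="You are a helpful assistant.",
    llm_config={"config_list": config_list},
)

# The Retrieval-Augmented User Proxy: points at the document collection
# that will be chunked, embedded, and stored in the vector database.
ragproxyagent = RetrieveUserProxyAgent(
    name="ragproxyagent",
    human_input_mode="NEVER",
    retrieve_config={
        "task": "qa",
        "docs_path": "./docs",  # path (or URL) to your documents -- an assumption
    },
)
```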
To use Retrieval-Augmented Chat, one needs to initialize two agents: the Retrieval-Augmented User Proxy and the Retrieval-Augmented Assistant. Initializing the Retrieval-Augmented User Proxy necessitates specifying a path to the document collection. The Retrieval-Augmented User Proxy can then download the documents, segment them into chunks of a specific size, compute embeddings, and store them in a vector database. Once a chat is initiated, the agents collaboratively engage in code generation or question answering, adhering to the procedure outlined below:

1. The Retrieval-Augmented User Proxy retrieves document chunks based on embedding similarity and sends them, along with the question, to the Retrieval-Augmented Assistant.
2. The Retrieval-Augmented Assistant employs an LLM to generate code or text as an answer based on the question and the context provided. If the LLM is unable to produce a satisfactory response, it is instructed to reply with “Update Context” to the Retrieval-Augmented User Proxy.
3. If a response includes code blocks, the Retrieval-Augmented User Proxy executes the code and sends the output as feedback. If there are no code blocks or instructions to update the context, it terminates the conversation. Otherwise, it updates the context and forwards the question, along with the new context, to the Retrieval-Augmented Assistant. Note that if human input solicitation is enabled, individuals can proactively send any feedback, including “Update Context”, to the Retrieval-Augmented Assistant.
4. If the Retrieval-Augmented Assistant receives “Update Context”, it requests the next most similar chunks of documents as new context from the Retrieval-Augmented User Proxy. Otherwise, it generates new code or text based on the feedback and chat history. If the LLM fails to generate an answer, it replies with “Update Context” again. This process can be repeated several times. The conversation terminates when no more documents are available for the context.
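With the two agents in place, a chat that follows the steps above can be kicked off with a single call. The question below is just an example, and the exact keyword arguments may differ slightly across AutoGen versions.

```python
# Reset the assistant and start the retrieval-augmented chat.
assistant.reset()
ragproxyagent.initiate_chat(
    assistant,
    problem="Which agents are used in Retrieval-Augmented Chat?",  # example question
)
```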