DocAgent
In AI and automation, handling documents and extracting information efficiently is essential. DocAgent introduces an agentic solution to this problem: it handles document ingestion and query tasks from natural language instructions, using an internal swarm of agents to streamline document processing and information retrieval.
Tip
DocAgent is new and keen to support your RAG needs. In its first iterations it has been kept simple and uses either a local Chroma vector database (default) or an in-memory option (InMemoryQueryEngine).
DocAgent will continue to be developed to a production-ready standard, and your feedback and contributions are most welcome!
Installation#
Install AG2 with the rag extra to install the packages DocAgent needs.
Capabilities#
The document agent can perform the following tasks:
1. Ingest documents from a local file or URL. Supported formats:
- PDF
- DOCX (DOCX, DOTX, DOCM, DOTM)
- XLSX
- PPTX (PPTX, POTX, PPSX, PPTM, POTM, PPSM)
- HTML
- ASCIIDOC (ADOC, ASCIIDOC, ASC)
- MD (MD, MARKDOWN)
- XML (XML, NXML)
- TXT
- JSON
- CSV
- IMAGE (BMP, JPG, JPEG, PNG, TIFF, TIF)
2. Answer questions with RAG capability
Tip
DocAgent only answers questions related to ingested documents. Even though it uses an LLM, it won't answer general knowledge questions.
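As an illustration of the supported formats listed above, the following sketch maps a path or URL to its format group by file extension. The helper name `detect_format` and the extension-based lookup are assumptions made for this example; DocAgent's own format detection may work differently.

```python
from pathlib import PurePosixPath
from typing import Optional
from urllib.parse import urlparse

# Format groups and extensions copied from the list above.
SUPPORTED_FORMATS = {
    "PDF": {"pdf"},
    "DOCX": {"docx", "dotx", "docm", "dotm"},
    "XLSX": {"xlsx"},
    "PPTX": {"pptx", "potx", "ppsx", "pptm", "potm", "ppsm"},
    "HTML": {"html"},
    "ASCIIDOC": {"adoc", "asciidoc", "asc"},
    "MD": {"md", "markdown"},
    "XML": {"xml", "nxml"},
    "TXT": {"txt"},
    "JSON": {"json"},
    "CSV": {"csv"},
    "IMAGE": {"bmp", "jpg", "jpeg", "png", "tiff", "tif"},
}


def detect_format(path_or_url: str) -> Optional[str]:
    """Hypothetical helper: return the format group, or None if the extension is not listed."""
    # urlparse drops any query string or fragment, so "report.pdf?x=1" still resolves to "pdf".
    path = urlparse(path_or_url).path
    extension = PurePosixPath(path).suffix.lower().lstrip(".")
    for group, extensions in SUPPORTED_FORMATS.items():
        if extension in extensions:
            return group
    return None
```

A check like this can be run before handing a file to the agent, so unsupported inputs are caught early.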
Internal Swarm#
DocAgent leverages a Swarm of internal agents to handle complex document processing tasks fluidly and efficiently.
Here's a breakdown of how the internal swarm chat is used within the DocAgent:
Swarm Agents#
DocAgent orchestrates the following swarm agents:
- Triage Agent: Decides what type of task to perform from user requests.
- Task Manager Agent: Manages the tasks and initiates actions.
- Data Ingestion Agent: Ingests the documents.
- Query Agent: Answers user questions based on ingested documents.
- Error Agent: If anything fails, the error agent will report the problem back.
- Summary Agent: Generates a summary of the completed tasks.
Workflow#
- Initialization: The DocAgent initializes the swarm agents and sets up the context variables.
- Triage User Requests: The Triage Agent categorizes the tasks into ingestions and queries.
- Task Management: The Task Manager Agent manages the tasks and ensures they are executed in the correct sequence.
- Data Ingestion: The Data Ingestion Agent processes the documents.
- Query Execution: The Query Agent answers the user's questions.
- Summary Generation: The Summary Agent generates a summary of the completed tasks.
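The ordering described in the workflow above can be sketched in plain Python. This is a simplified simulation, not DocAgent's implementation: `run_workflow` and `summarize` are hypothetical names, the `ingest` and `answer` callables stand in for the Data Ingestion and Query agents, and the context variable names mirror those printed in the example output later on this page.

```python
from typing import Any, Callable, Dict, List


def run_workflow(
    ingestions: List[str],
    queries: List[str],
    ingest: Callable[[str], Any],
    answer: Callable[[str], str],
) -> Dict[str, Any]:
    """Simulate the ingest-then-query ordering with plain callables."""
    # Initialization: context variables shared across the agents.
    context: Dict[str, Any] = {
        "CompletedTaskCount": 0,
        "DocumentsToIngest": [],
        "QueriesToRun": [],
        "QueryResults": [],
    }

    # Triage and task management: queue the categorized work.
    context["DocumentsToIngest"] = [{"path_or_url": p} for p in ingestions]
    context["QueriesToRun"] = [{"query_type": "RAG_QUERY", "query": q} for q in queries]

    # Data ingestion: every document is processed before any query runs.
    while context["DocumentsToIngest"]:
        document = context["DocumentsToIngest"].pop(0)
        ingest(document["path_or_url"])
        context["CompletedTaskCount"] += 1

    # Query execution: answer each queued question and record the result.
    while context["QueriesToRun"]:
        query = context["QueriesToRun"].pop(0)
        result = answer(query["query"])
        context["QueryResults"].append({"query": query["query"], "result": result})
        context["CompletedTaskCount"] += 1

    return context


def summarize(context: Dict[str, Any]) -> str:
    """Summary generation: report what was completed."""
    lines = [f"Completed tasks: {context['CompletedTaskCount']}"]
    lines += [f"- {r['query']}: {r['result']}" for r in context["QueryResults"]]
    return "\n".join(lines)
```

Running ingestion to completion before any query is the point of the task manager's sequencing: a query issued earlier would not see the new documents.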
Parsing documents and web pages#
Internally, DocAgent uses Docling to convert documents into Markdown files, which are then ingested into a data store that is used for querying.
For files (whether local files or URLs that point to files), DocAgent will download those files for Docling to convert. If it's a web page, DocAgent will use Selenium to browse to the page and extract the content.
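The distinction drawn above between files and web pages can be sketched as a small classifier. This is an assumption-laden illustration: `classify_source`, the returned labels, and the short extension list are hypothetical, and DocAgent's actual checks may differ.

```python
from pathlib import PurePosixPath
from urllib.parse import urlparse

# A small, illustrative subset of file extensions (assumption, not DocAgent's list).
FILE_EXTENSIONS = {".pdf", ".docx", ".xlsx", ".pptx", ".md", ".csv", ".json", ".txt", ".jpg", ".png"}


def classify_source(path_or_url: str) -> str:
    """Return 'url_file', 'web_page', or 'local_file' for the given input."""
    parsed = urlparse(path_or_url)
    if parsed.scheme in ("http", "https") and parsed.netloc:
        # A URL whose path ends in a known file extension points to a downloadable file.
        suffix = PurePosixPath(parsed.path).suffix.lower()
        return "url_file" if suffix in FILE_EXTENSIONS else "web_page"
    # Anything that is not an http(s) URL is treated as a local path.
    return "local_file"
```

Under this sketch, a `url_file` or `local_file` would go to the document converter, while a `web_page` would be fetched with a browser first.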
Data Store and Query Engine#
DocAgent can currently use two different data stores and query engines.
Vector Store: Chroma database#
Chroma is a popular, open-source vector database. DocAgent embeds the Markdown and stores it in the vector database.
This requires the OPENAI_API_KEY environment variable, as OpenAI's GPT-4o model is used to create the embeddings.
DocAgent will use LlamaIndex's vector store library to embed and query the Chroma database.
This is the default data store and query engine, so you do not need to create it and pass it to the DocAgent.
See the section on Collections below to understand how to use them effectively.
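Because the default data store depends on the OPENAI_API_KEY environment variable, it can help to fail early with a clear message when it is missing. The following is a minimal sketch; `require_openai_key` is a hypothetical helper and not part of AG2.

```python
import os
from typing import Mapping, Optional


def require_openai_key(env: Optional[Mapping[str, str]] = None) -> str:
    """Return the OpenAI API key, or raise a clear error if it is not set."""
    # Default to the real process environment; a mapping can be passed in for testing.
    env = os.environ if env is None else env
    key = env.get("OPENAI_API_KEY", "").strip()
    if not key:
        raise RuntimeError("OPENAI_API_KEY is not set; export it before creating a DocAgent.")
    return key
```

Calling `require_openai_key()` before constructing the agent surfaces a configuration problem up front rather than partway through ingestion.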
In-memory#
DocAgent can also store the Docling Markdown in memory when ingesting documents.
When it comes time to query, DocAgent will inject the Markdown into the system message of an internal query agent and use an LLM to respond to the question.
Important notes about using the in-memory query engine:
- The full Markdown content of the ingested documents is put into the context window, so be careful not to ingest more content than the LLM's context window can hold.
- The Markdown is put into the start of the system message to maximize the chance of utilizing the LLM's caching ability and reduce cost. However, adding more documents in subsequent messages will change the system message and the cache will not be effective, so try to ingest all the documents before querying.
- As it's in-memory, ingestion content will be lost when the agent is destroyed.
- LLMs can be better at answering some queries than the vector store as they are processing all of the context to answer the query. However, this does come at the cost of more token usage.
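The notes above can be illustrated with a small sketch that places the document Markdown at the start of a system message and roughly estimates its size. The function names, the message layout, and the four-characters-per-token heuristic are all assumptions for illustration; they are not the InMemoryQueryEngine's actual implementation.

```python
from typing import Dict


def build_system_message(documents: Dict[str, str], instructions: str) -> str:
    """Put the document Markdown first and the instructions last."""
    # Keeping the documents at the start means an unchanged document set
    # produces an identical prefix, which is what prompt caching relies on.
    parts = [f"--- Document: {name} ---\n{markdown}" for name, markdown in documents.items()]
    return "\n\n".join(parts + [instructions])


def rough_token_estimate(text: str) -> int:
    """Very rough size estimate, assuming about four characters per token."""
    return len(text) // 4


def fits_context(text: str, context_limit_tokens: int) -> bool:
    """Check the rough estimate against a caller-supplied context window size."""
    return rough_token_estimate(text) <= context_limit_tokens
```

Building the message once, after all documents are ingested, keeps it identical across queries; ingesting another document between queries changes the message and loses that benefit.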
To use the in-memory query engine, create it and pass it to the DocAgent, as in the following example:
from autogen import LLMConfig
from autogen.agents.experimental import DocAgent, InMemoryQueryEngine
# Example LLM config
llm_config = LLMConfig(api_type="openai", model="gpt-4o-mini", cache_seed=None)
# Create our in-memory query engine
inmemory_qe = InMemoryQueryEngine(llm_config=llm_config)
# Include the query engine when creating your DocAgent
with llm_config:
    doc_agent = DocAgent(
        name="doc_agent",
        query_engine=inmemory_qe,
    )
...
Example#
Tip
The internal ingestion of documents requires an LLM and uses OpenAI's GPT-4o. Please ensure you have the OPENAI_API_KEY environment variable set.
Warning
This agent is currently in our experimental namespace, indicating that we have tested the functionality but the agent's interface may change. Please use it with that in mind.
If you do find any bugs please log an issue in the AG2 repository.
In the following simple example we ask the DocAgent to ingest a document and then provide a financial summary.
Note that the request is handled in natural language, and the output shows the internal agents working together to understand, classify, ingest, and query.
from autogen import LLMConfig
from autogen.agents.experimental import DocAgent
llm_config = LLMConfig(api_type="openai", model="gpt-4o")
# Create our DocAgent
with llm_config:
    document_agent = DocAgent()
# Update this path to suit your environment
response = document_agent.run(
    "Can you ingest ../test/agentchat/contrib/graph_rag/Toast_financial_report.pdf and tell me the fiscal year 2024 financial summary?",
    max_turns=1,
)
# Iterate through the chat automatically with console output
response.process()
user (to Document_Agent):
Can you ingest ../test/agentchat/contrib/graph_rag/Toast_financial_report.pdf and tell me the fiscal year 2024 financial summary?
--------------------------------------------------------------------------------
_User (to chat_manager):
Can you ingest ../test/agentchat/contrib/graph_rag/Toast_financial_report.pdf and tell me the fiscal year 2024 financial summary?
--------------------------------------------------------------------------------
Next speaker: DocumentTriageAgent
DocumentTriageAgent (to chat_manager):
{"ingestions":[{"path_or_url":"../test/agentchat/contrib/graph_rag/Toast_financial_report.pdf"}],"queries":[{"query_type":"RAG_QUERY","query":"What is the fiscal year 2024 financial summary?"}]}
--------------------------------------------------------------------------------
Next speaker: TaskManagerAgent
context_variables {'CompletedTaskCount': 0, 'DocumentsToIngest': [], 'QueriesToRun': [], 'QueryResults': []}
context_variables {'CompletedTaskCount': 0, 'DocumentsToIngest': [], 'QueriesToRun': [], 'QueryResults': []}
>>>>>>>> USING AUTO REPLY...
TaskManagerAgent (to chat_manager):
***** Suggested tool call (call_NG9mq8dtBEthy8YDsREjiRER): initiate_tasks *****
Arguments:
{"ingestions": [{"path_or_url": "../test/agentchat/contrib/graph_rag/Toast_financial_report.pdf"}], "queries": [{"query_type": "RAG_QUERY", "query": "What is the fiscal year 2024 financial summary?"}]}
*******************************************************************************
--------------------------------------------------------------------------------
Next speaker: _Swarm_Tool_Executor
>>>>>>>> EXECUTING FUNCTION initiate_tasks...
Call ID: call_NG9mq8dtBEthy8YDsREjiRER
Input arguments: {'ingestions': [{'path_or_url': '../test/agentchat/contrib/graph_rag/Toast_financial_report.pdf'}], 'queries': [{'query_type': 'RAG_QUERY', 'query': 'What is the fiscal year 2024 financial summary?'}], 'context_variables': {'CompletedTaskCount': 0, 'DocumentsToIngest': [], 'QueriesToRun': [], 'QueryResults': []}}
initiate_tasks context_variables {'CompletedTaskCount': 0, 'DocumentsToIngest': [], 'QueriesToRun': [], 'QueryResults': []}
_Swarm_Tool_Executor (to chat_manager):
***** Response from calling tool (call_NG9mq8dtBEthy8YDsREjiRER) *****
Updated context variables with task decisions
**********************************************************************
--------------------------------------------------------------------------------
Next speaker: TaskManagerAgent
context_variables {'CompletedTaskCount': 0, 'DocumentsToIngest': [{'path_or_url': '../test/agentchat/contrib/graph_rag/Toast_financial_report.pdf'}], 'QueriesToRun': [{'query_type': 'RAG_QUERY', 'query': 'What is the fiscal year 2024 financial summary?'}], 'QueryResults': [], 'TaskInitiated': True}
context_variables {'CompletedTaskCount': 0, 'DocumentsToIngest': [{'path_or_url': '../test/agentchat/contrib/graph_rag/Toast_financial_report.pdf'}], 'QueriesToRun': [{'query_type': 'RAG_QUERY', 'query': 'What is the fiscal year 2024 financial summary?'}], 'QueryResults': [], 'TaskInitiated': True}
>>>>>>>> USING AUTO REPLY...
TaskManagerAgent (to chat_manager):
***** Suggested tool call (call_I30PrRbDngJPmidOOKutGhsa): transfer_TaskManagerAgent_to_DoclingDocIngestAgent *****
Arguments:
{}
*******************************************************************************************************************
--------------------------------------------------------------------------------
Next speaker: _Swarm_Tool_Executor
>>>>>>>> EXECUTING FUNCTION transfer_TaskManagerAgent_to_DoclingDocIngestAgent...
Call ID: call_I30PrRbDngJPmidOOKutGhsa
Input arguments: {}
_Swarm_Tool_Executor (to chat_manager):
***** Response from calling tool (call_I30PrRbDngJPmidOOKutGhsa) *****
Swarm agent --> DoclingDocIngestAgent
**********************************************************************
--------------------------------------------------------------------------------
Next speaker: DoclingDocIngestAgent
>>>>>>>> USING AUTO REPLY...
DoclingDocIngestAgent (to chat_manager):
***** Suggested tool call (call_zloTpZZrxiNcvyflE0AHSjAw): data_ingest_task *****
Arguments:
{}
*********************************************************************************
--------------------------------------------------------------------------------
Next speaker: _Swarm_Tool_Executor
>>>>>>>> EXECUTING FUNCTION data_ingest_task...
Call ID: call_zloTpZZrxiNcvyflE0AHSjAw
Input arguments: {'context_variables': {'CompletedTaskCount': 0, 'DocumentsToIngest': [{'path_or_url': '../test/agentchat/contrib/graph_rag/Toast_financial_report.pdf'}], 'QueriesToRun': [{'query_type': 'RAG_QUERY', 'query': 'What is the fiscal year 2024 financial summary?'}], 'QueryResults': [], 'TaskInitiated': True}}
INFO:autogen.agents.experimental.document_agent.document_utils:Error when checking if ../test/agentchat/contrib/graph_rag/Toast_financial_report.pdf is a valid URL: Invalid URL.
INFO:autogen.agents.experimental.document_agent.document_utils:Detected file. Returning file path...
INFO:docling.document_converter:Going to convert document batch...
/home/vscode/.local/lib/python3.10/site-packages/docling/models/easyocr_model.py:58: UserWarning: Deprecated field. Better to set the `accelerator_options.device` in `pipeline_options`. When `use_gpu and accelerator_options.device == AcceleratorDevice.CUDA` the GPU is used to run EasyOCR. Otherwise, EasyOCR runs in CPU.
warnings.warn(
INFO:docling.utils.accelerator_utils:Accelerator device: 'cpu'
INFO:docling.utils.accelerator_utils:Accelerator device: 'cpu'
INFO:docling.pipeline.base_pipeline:Processing document Toast_financial_report.pdf
INFO:docling.document_converter:Finished converting document Toast_financial_report.pdf in 16.33 sec.
INFO:autogen.agents.experimental.document_agent.parser_utils:Document converted in 16.33 seconds.
INFO:autogen.agents.experimental.document_agent.docling_query_engine:Collection docling-parsed-docs was created in the database.
INFO:autogen.agents.experimental.document_agent.docling_query_engine:Loading input doc: /workspaces/ag2/notebook/parsed_docs/Toast_financial_report.md
INFO:autogen.agents.experimental.document_agent.docling_query_engine:Documents are loaded successfully.
INFO:autogen.agents.experimental.document_agent.docling_query_engine:VectorDB index was created with input documents
docling ingest: {'CompletedTaskCount': 1, 'DocumentsToIngest': [], 'QueriesToRun': [{'query_type': 'RAG_QUERY', 'query': 'What is the fiscal year 2024 financial summary?'}], 'QueryResults': [], 'TaskInitiated': True}
{'CompletedTaskCount': 1, 'DocumentsToIngest': [], 'QueriesToRun': [{'query_type': 'RAG_QUERY', 'query': 'What is the fiscal year 2024 financial summary?'}], 'QueryResults': [], 'TaskInitiated': True}
_Swarm_Tool_Executor (to chat_manager):
***** Response from calling tool (call_zloTpZZrxiNcvyflE0AHSjAw) *****
Data Ingestion Task Completed for ../test/agentchat/contrib/graph_rag/Toast_financial_report.pdf
**********************************************************************
--------------------------------------------------------------------------------
Next speaker: TaskManagerAgent
context_variables {'CompletedTaskCount': 1, 'DocumentsToIngest': [], 'QueriesToRun': [{'query_type': 'RAG_QUERY', 'query': 'What is the fiscal year 2024 financial summary?'}], 'QueryResults': [], 'TaskInitiated': True}
context_variables {'CompletedTaskCount': 1, 'DocumentsToIngest': [], 'QueriesToRun': [{'query_type': 'RAG_QUERY', 'query': 'What is the fiscal year 2024 financial summary?'}], 'QueryResults': [], 'TaskInitiated': True}
>>>>>>>> USING AUTO REPLY...
TaskManagerAgent (to chat_manager):
***** Suggested tool call (call_jDHDy8MHHqvucS89X62yUpFE): transfer_TaskManagerAgent_to_QueryAgent *****
Arguments:
{}
********************************************************************************************************
--------------------------------------------------------------------------------
Next speaker: _Swarm_Tool_Executor
>>>>>>>> EXECUTING FUNCTION transfer_TaskManagerAgent_to_QueryAgent...
Call ID: call_jDHDy8MHHqvucS89X62yUpFE
Input arguments: {}
_Swarm_Tool_Executor (to chat_manager):
***** Response from calling tool (call_jDHDy8MHHqvucS89X62yUpFE) *****
Swarm agent --> QueryAgent
**********************************************************************
--------------------------------------------------------------------------------
Next speaker: QueryAgent
>>>>>>>> USING AUTO REPLY...
QueryAgent (to chat_manager):
***** Suggested tool call (call_YFLZPfxUvG0cBqWw0dnH2keA): execute_rag_query *****
Arguments:
{}
**********************************************************************************
--------------------------------------------------------------------------------
Next speaker: _Swarm_Tool_Executor
>>>>>>>> EXECUTING FUNCTION execute_rag_query...
Call ID: call_YFLZPfxUvG0cBqWw0dnH2keA
Input arguments: {'context_variables': {'CompletedTaskCount': 1, 'DocumentsToIngest': [], 'QueriesToRun': [{'query_type': 'RAG_QUERY', 'query': 'What is the fiscal year 2024 financial summary?'}], 'QueryResults': [], 'TaskInitiated': True}}
_Swarm_Tool_Executor (to chat_manager):
***** Response from calling tool (call_YFLZPfxUvG0cBqWw0dnH2keA) *****
For the fiscal year 2024, Toast, Inc. reported total assets of $2,227 million as of September 30, 2024, compared to $1,958 million as of December 31, 2023. The total liabilities were $807 million, and stockholders' equity was $1,420 million. The company achieved total revenue of $3,622 million for the nine months ended September 30, 2024, with a gross profit of $857 million. Operating expenses totaled $873 million, resulting in a loss from operations of $16 million. The net loss for the period was $13 million, with basic and diluted loss per share of $0.02.
**********************************************************************
--------------------------------------------------------------------------------
Next speaker: QueryAgent
>>>>>>>> USING AUTO REPLY...
QueryAgent (to chat_manager):
For the fiscal year 2024, Toast, Inc. reported the following financial summary:
- Total assets: $2,227 million as of September 30, 2024, compared to $1,958 million as of December 31, 2023.
- Total liabilities: $807 million.
- Stockholders' equity: $1,420 million.
- Total revenue: $3,622 million for the nine months ended September 30, 2024.
- Gross profit: $857 million.
- Operating expenses: $873 million, leading to a loss from operations of $16 million.
- Net loss for the period: $13 million.
- Basic and diluted loss per share: $0.02.
--------------------------------------------------------------------------------
Next speaker: TaskManagerAgent
context_variables {'CompletedTaskCount': 2, 'DocumentsToIngest': [], 'QueriesToRun': [], 'QueryResults': [{'query': 'What is the fiscal year 2024 financial summary?', 'result': "For the fiscal year 2024, Toast, Inc. reported total assets of $2,227 million as of September 30, 2024, compared to $1,958 million as of December 31, 2023. The total liabilities were $807 million, and stockholders' equity was $1,420 million. The company achieved total revenue of $3,622 million for the nine months ended September 30, 2024, with a gross profit of $857 million. Operating expenses totaled $873 million, resulting in a loss from operations of $16 million. The net loss for the period was $13 million, with basic and diluted loss per share of $0.02."}], 'TaskInitiated': True}
context_variables {'CompletedTaskCount': 2, 'DocumentsToIngest': [], 'QueriesToRun': [], 'QueryResults': [{'query': 'What is the fiscal year 2024 financial summary?', 'result': "For the fiscal year 2024, Toast, Inc. reported total assets of $2,227 million as of September 30, 2024, compared to $1,958 million as of December 31, 2023. The total liabilities were $807 million, and stockholders' equity was $1,420 million. The company achieved total revenue of $3,622 million for the nine months ended September 30, 2024, with a gross profit of $857 million. Operating expenses totaled $873 million, resulting in a loss from operations of $16 million. The net loss for the period was $13 million, with basic and diluted loss per share of $0.02."}], 'TaskInitiated': True}
>>>>>>>> USING AUTO REPLY...
TaskManagerAgent (to chat_manager):
***** Suggested tool call (call_3XuWepUFg5FPt6C6JtzrxbFJ): transfer_TaskManagerAgent_to_SummaryAgent *****
Arguments:
{}
**********************************************************************************************************
--------------------------------------------------------------------------------
Next speaker: _Swarm_Tool_Executor
>>>>>>>> EXECUTING FUNCTION transfer_TaskManagerAgent_to_SummaryAgent...
Call ID: call_3XuWepUFg5FPt6C6JtzrxbFJ
Input arguments: {}
_Swarm_Tool_Executor (to chat_manager):
***** Response from calling tool (call_3XuWepUFg5FPt6C6JtzrxbFJ) *****
Swarm agent --> SummaryAgent
**********************************************************************
--------------------------------------------------------------------------------
Next speaker: SummaryAgent
>>>>>>>> USING AUTO REPLY...
SummaryAgent (to chat_manager):
The fiscal year 2024 financial summary for Toast, Inc. is as follows:
- Total assets increased to $2,227 million by September 30, 2024, from $1,958 million at the end of 2023.
- The company's total liabilities stood at $807 million.
- Stockholders' equity was reported at $1,420 million.
- Toast, Inc. generated total revenue of $3,622 million for the nine months ending September 30, 2024.
- The company recorded a gross profit of $857 million.
- Operating expenses amounted to $873 million, resulting in an operating loss of $16 million.
- There was a net loss of $13 million for the period, with a basic and diluted loss per share of $0.02.
--------------------------------------------------------------------------------
Document_Agent (to user):
The fiscal year 2024 financial summary for Toast, Inc. is as follows:
- Total assets increased to $2,227 million by September 30, 2024, from $1,958 million at the end of 2023.
- The company's total liabilities stood at $807 million.
- Stockholders' equity was reported at $1,420 million.
- Toast, Inc. generated total revenue of $3,622 million for the nine months ending September 30, 2024.
- The company recorded a gross profit of $857 million.
- Operating expenses amounted to $873 million, resulting in an operating loss of $16 million.
- There was a net loss of $13 million for the period, with a basic and diluted loss per share of $0.02.
--------------------------------------------------------------------------------
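In the output above, the triage agent emits a JSON object with `ingestions` and `queries` lists. The following sketch parses that structure into a small Python object. The field names are taken from the log; `TriageResult` and `parse_triage_output` are hypothetical names introduced here and are not part of AG2.

```python
import json
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class TriageResult:
    ingestions: List[str]            # paths or URLs to ingest
    queries: List[Tuple[str, str]]   # (query_type, query) pairs


def parse_triage_output(raw: str) -> TriageResult:
    """Parse the triage agent's JSON message into ingestions and queries."""
    data = json.loads(raw)
    ingestions = [item["path_or_url"] for item in data.get("ingestions", [])]
    queries = [(item["query_type"], item["query"]) for item in data.get("queries", [])]
    return TriageResult(ingestions=ingestions, queries=queries)


# The triage message copied from the example output above.
example_message = (
    '{"ingestions":[{"path_or_url":"../test/agentchat/contrib/graph_rag/Toast_financial_report.pdf"}],'
    '"queries":[{"query_type":"RAG_QUERY","query":"What is the fiscal year 2024 financial summary?"}]}'
)
triage = parse_triage_output(example_message)
```

Here `triage.ingestions` holds the single PDF path and `triage.queries` holds one `RAG_QUERY` entry, matching what the Task Manager Agent receives in the log.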
DocAgent Performance#
How does it perform? See our DocAgent Performance page for the full breakdown on how it handles different tasks.
If you have tasks you'd like DocAgent to perform, please share them as an Issue on our GitHub repo.
Summary:
# | Task | Ingested | In-memory Query Engine | Chroma Query Engine |
---|---|---|---|---|
1 | URL to Markdown file, query to summarize | ✅ | ✅ | ✅ |
2 | URL to Microsoft Word document, query highlights | ✅ | ✅ | ✅ |
3 | URL to PDF annual report, query specific figures | ✅ | ✅ | ✅ |
4 | URL to PDF document, query to explain | ✅ | ✅ | ✅ |
5 | Local file, PDF, query to explain | ✅ | ✅ | ✅ |
6 | URL to JPG of scanned invoice, query a figure | ❌ | 🔶 | 🔶 |
7 | Local file, PNG of scanned invoice, query a figure | ❌ | ❌ | ❌ |
8 | URL to XLSX using a redirect URL, query table | ✅ | 🔶 | 🔶 |
9 | URL to XLSX, query data | ❌ | 🔶 | ✅ |
10 | URL to CSV, query a figure | ❌ | N/A | N/A |
11 | URL to CSV, query to summarize | ✅ | ✅ | ✅ |
12 | URL with CSV, query unrelated | ✅ | ✅ | ✅ |
13 | Local files, 2 x Markdowns, Query to compare | ✅ | ✅ | ✅ |
14 | Local file, Markdown, unrelated query | ✅ | ✅ | ✅ |
15 | Local file, Markdown, unrelated query but general knowledge | ✅ | ✅ | ✅ |
16 | No files to ingest but has query | N/A | ✅ | ✅ |
17 | Local file, PDF of annual report, query a figure | ✅ | ✅ | ✅ |
18 | Local file, Microsoft Word, query a figure | ✅ | ✅ | ❌ |
19 | URL to web page with query to summarize | ✅ | ✅ | ✅ |
20a | Local files, PDF and DOCX, one query to cover both | ✅ | ✅ | ✅ |
20b | Follow-up query to DocAgent | N/A | ✅ | ❌ |
Collections#
By default, DocAgent will ingest documents into the same collection. Every time you run the agent it will utilise this collection, enabling you to keep documents ingested across different runs.
However, if you want to run multiple DocAgents or want to ingest into a clean or specific vector store collection, you can use the collection_name parameter when creating the agent to set a unique collection name.
from autogen import LLMConfig
from autogen.agents.experimental import DocAgent
llm_config = LLMConfig(api_type="openai", model="gpt-4o")
# Create our DocAgents with their own collections
# so that their ingested data and queries will be unique to them
with llm_config:
    document_agent_apple = DocAgent(
        collection_name="apple_financials"
    )
    document_agent_nvidia = DocAgent(
        collection_name="nvidia_financials"
    )
...
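When many agents each need their own collection, generating names from a label keeps them consistent. The sketch below is a hypothetical helper, not part of AG2: it lowercases the label, replaces other characters with underscores, and caps the length at 63 characters on the assumption that the vector store restricts collection name length and characters. Check your vector store's naming rules before relying on it.

```python
import re


def make_collection_name(label: str, suffix: str = "financials") -> str:
    """Build a lowercase, underscore-separated collection name from a label."""
    # Collapse every run of non-alphanumeric characters into a single underscore.
    slug = re.sub(r"[^a-z0-9]+", "_", label.lower()).strip("_")
    name = f"{slug}_{suffix}" if slug else suffix
    # Assumed length cap; strip again in case truncation leaves a trailing underscore.
    return name[:63].strip("_")
```

For example, `make_collection_name("Apple")` yields `apple_financials`, the collection name used in the example above.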
Further examples#
See this notebook for more examples of using DocAgent.