In the realm of AI and automation, handling documents and extracting information efficiently is of utmost importance. DocAgent introduces an agentic solution to this problem. It handles document ingestion and query tasks seamlessly, and with natural language instructions, by leveraging an internal swarm of agents to streamline document processing and information retrieval.

DocAgent is new and keen to support your RAG needs. However, as a first iteration it has been kept simple and uses a local vector database (Chroma).

DocAgent will continue to be developed to a production-ready standard and your feedback and contributions are most welcome!

Installation

Install AG2 with the rag extra to install the necessary packages for the DocAgent.

pip install ag2[rag]

Capabilities

The document agent can perform the following tasks:

  1. Ingest documents from a local file or URL. Supported formats:
    • PDF
    • IMAGE
    • DOCX
    • HTML
    • PPTX
    • ASCIIDOC
    • MD
  2. Answer questions with RAG capability

Currently, DocAgent answers questions only related to ingested documents. If it can’t find relevant information it will respond with that note.

Internal Swarm

DocAgent leverages a Swarm of internal agents to handle the complex document processing tasks fluidly and efficiently.

Here’s a breakdown of how the internal swarm chat is used within the DocAgent:

Swarm Agents

DocAgent orchestrates the following swarm agents:

  • Triage Agent: Decides what type of task to perform from user requests.
  • Task Manager Agent: Manages the tasks and initiates actions.
  • Data Ingestion Agent: Ingests the documents.
  • Query Agent: Answers user questions based on ingested documents.
  • Error Agent: If anything fails, the error agent will report the problem back.
  • Summary Agent: Generates a summary of the completed tasks.

Workflow

  1. Initialization: The DocAgent initializes the swarm agents and sets up the context variables.
  2. Triage User Requests: The Triage Agent categorizes the tasks into ingestions and queries.
  3. Task Management: The Task Manager Agent manages the tasks and ensures they are executed in the correct sequence.
  4. Data Ingestion: The Data Ingestion Agent processes the documents.
  5. Query Execution: The Query Agent answers the user’s questions.
  6. Summary Generation: The Summary Agent generates a summary of the completed tasks.

Example

The internal ingestation of documents requires an LLM and it will use OpenAI’s GPT-4o. Please ensure you have an OPENAI_API_KEY environment variable set.

This agent is currently in our experimental namespace, indicating that we have tested the functionality but the agent’s interface may change. Please use it with that in mind.

If you do find any bugs please log an issue in the AG2 repository.

In the following simple example we ask the DocAgent to ingest a document and then provide a financial summary.

Note that the request is handled in natural language and the output will show the internal agents working together to understand, classify, ingest, and query.

from autogen.agents.experimental import DocAgent

llm_config = {"api_type": "openai", "model": "gpt-4o"}

# Create our DocAgent
document_agent = DocAgent(llm_config=llm_config)

# Update this path to suit your environment
document_agent.run(
    "Can you ingest ../test/agentchat/contrib/graph_rag/Toast_financial_report.pdf and tell me the fiscal year 2024 financial summary?",
    max_turns=1
)
user (to Document_Agent):

Can you ingest ../test/agentchat/contrib/graph_rag/Toast_financial_report.pdf and tell me the fiscal year 2024 financial summary?

--------------------------------------------------------------------------------
_User (to chat_manager):

Can you ingest ../test/agentchat/contrib/graph_rag/Toast_financial_report.pdf and tell me the fiscal year 2024 financial summary?

--------------------------------------------------------------------------------

Next speaker: DocumentTriageAgent

DocumentTriageAgent (to chat_manager):

{"ingestions":[{"path_or_url":"../test/agentchat/contrib/graph_rag/Toast_financial_report.pdf"}],"queries":[{"query_type":"RAG_QUERY","query":"What is the fiscal year 2024 financial summary?"}]}

--------------------------------------------------------------------------------

Next speaker: TaskManagerAgent

context_variables {'CompletedTaskCount': 0, 'DocumentsToIngest': [], 'QueriesToRun': [], 'QueryResults': []}
context_variables {'CompletedTaskCount': 0, 'DocumentsToIngest': [], 'QueriesToRun': [], 'QueryResults': []}

>>>>>>>> USING AUTO REPLY...
TaskManagerAgent (to chat_manager):

***** Suggested tool call (call_NG9mq8dtBEthy8YDsREjiRER): initiate_tasks *****
Arguments:
{"ingestions": [{"path_or_url": "../test/agentchat/contrib/graph_rag/Toast_financial_report.pdf"}], "queries": [{"query_type": "RAG_QUERY", "query": "What is the fiscal year 2024 financial summary?"}]}
*******************************************************************************

--------------------------------------------------------------------------------

Next speaker: _Swarm_Tool_Executor


>>>>>>>> EXECUTING FUNCTION initiate_tasks...
Call ID: call_NG9mq8dtBEthy8YDsREjiRER
Input arguments: {'ingestions': [{'path_or_url': '../test/agentchat/contrib/graph_rag/Toast_financial_report.pdf'}], 'queries': [{'query_type': 'RAG_QUERY', 'query': 'What is the fiscal year 2024 financial summary?'}], 'context_variables': {'CompletedTaskCount': 0, 'DocumentsToIngest': [], 'QueriesToRun': [], 'QueryResults': []}}
initiate_tasks context_variables {'CompletedTaskCount': 0, 'DocumentsToIngest': [], 'QueriesToRun': [], 'QueryResults': []}
_Swarm_Tool_Executor (to chat_manager):

***** Response from calling tool (call_NG9mq8dtBEthy8YDsREjiRER) *****
Updated context variables with task decisions
**********************************************************************

--------------------------------------------------------------------------------

Next speaker: TaskManagerAgent

context_variables {'CompletedTaskCount': 0, 'DocumentsToIngest': [{'path_or_url': '../test/agentchat/contrib/graph_rag/Toast_financial_report.pdf'}], 'QueriesToRun': [{'query_type': 'RAG_QUERY', 'query': 'What is the fiscal year 2024 financial summary?'}], 'QueryResults': [], 'TaskInitiated': True}
context_variables {'CompletedTaskCount': 0, 'DocumentsToIngest': [{'path_or_url': '../test/agentchat/contrib/graph_rag/Toast_financial_report.pdf'}], 'QueriesToRun': [{'query_type': 'RAG_QUERY', 'query': 'What is the fiscal year 2024 financial summary?'}], 'QueryResults': [], 'TaskInitiated': True}

>>>>>>>> USING AUTO REPLY...
TaskManagerAgent (to chat_manager):

***** Suggested tool call (call_I30PrRbDngJPmidOOKutGhsa): transfer_TaskManagerAgent_to_DoclingDocIngestAgent *****
Arguments:
{}
*******************************************************************************************************************

--------------------------------------------------------------------------------

Next speaker: _Swarm_Tool_Executor


>>>>>>>> EXECUTING FUNCTION transfer_TaskManagerAgent_to_DoclingDocIngestAgent...
Call ID: call_I30PrRbDngJPmidOOKutGhsa
Input arguments: {}
_Swarm_Tool_Executor (to chat_manager):

***** Response from calling tool (call_I30PrRbDngJPmidOOKutGhsa) *****
Swarm agent --> DoclingDocIngestAgent
**********************************************************************

--------------------------------------------------------------------------------

Next speaker: DoclingDocIngestAgent


>>>>>>>> USING AUTO REPLY...
DoclingDocIngestAgent (to chat_manager):

***** Suggested tool call (call_zloTpZZrxiNcvyflE0AHSjAw): data_ingest_task *****
Arguments:
{}
*********************************************************************************

--------------------------------------------------------------------------------

Next speaker: _Swarm_Tool_Executor


>>>>>>>> EXECUTING FUNCTION data_ingest_task...
Call ID: call_zloTpZZrxiNcvyflE0AHSjAw
Input arguments: {'context_variables': {'CompletedTaskCount': 0, 'DocumentsToIngest': [{'path_or_url': '../test/agentchat/contrib/graph_rag/Toast_financial_report.pdf'}], 'QueriesToRun': [{'query_type': 'RAG_QUERY', 'query': 'What is the fiscal year 2024 financial summary?'}], 'QueryResults': [], 'TaskInitiated': True}}
INFO:autogen.agents.experimental.document_agent.document_utils:Error when checking if ../test/agentchat/contrib/graph_rag/Toast_financial_report.pdf is a valid URL: Invalid URL.
INFO:autogen.agents.experimental.document_agent.document_utils:Detected file. Returning file path...
INFO:docling.document_converter:Going to convert document batch...
/home/vscode/.local/lib/python3.10/site-packages/docling/models/easyocr_model.py:58: UserWarning: Deprecated field. Better to set the `accelerator_options.device` in `pipeline_options`. When `use_gpu and accelerator_options.device == AcceleratorDevice.CUDA` the GPU is used to run EasyOCR. Otherwise, EasyOCR runs in CPU.
  warnings.warn(
INFO:docling.utils.accelerator_utils:Accelerator device: 'cpu'
INFO:docling.utils.accelerator_utils:Accelerator device: 'cpu'
INFO:docling.pipeline.base_pipeline:Processing document Toast_financial_report.pdf
INFO:docling.document_converter:Finished converting document Toast_financial_report.pdf in 16.33 sec.
INFO:autogen.agents.experimental.document_agent.parser_utils:Document converted in 16.33 seconds.
INFO:autogen.agents.experimental.document_agent.docling_query_engine:Collection docling-parsed-docs was created in the database.
INFO:autogen.agents.experimental.document_agent.docling_query_engine:Loading input doc: /workspaces/ag2/notebook/parsed_docs/Toast_financial_report.md
INFO:autogen.agents.experimental.document_agent.docling_query_engine:Documents are loaded successfully.
INFO:autogen.agents.experimental.document_agent.docling_query_engine:VectorDB index was created with input documents
docling ingest: {'CompletedTaskCount': 1, 'DocumentsToIngest': [], 'QueriesToRun': [{'query_type': 'RAG_QUERY', 'query': 'What is the fiscal year 2024 financial summary?'}], 'QueryResults': [], 'TaskInitiated': True}
 {'CompletedTaskCount': 1, 'DocumentsToIngest': [], 'QueriesToRun': [{'query_type': 'RAG_QUERY', 'query': 'What is the fiscal year 2024 financial summary?'}], 'QueryResults': [], 'TaskInitiated': True}
_Swarm_Tool_Executor (to chat_manager):

***** Response from calling tool (call_zloTpZZrxiNcvyflE0AHSjAw) *****
Data Ingestion Task Completed for ../test/agentchat/contrib/graph_rag/Toast_financial_report.pdf
**********************************************************************

--------------------------------------------------------------------------------

Next speaker: TaskManagerAgent

context_variables {'CompletedTaskCount': 1, 'DocumentsToIngest': [], 'QueriesToRun': [{'query_type': 'RAG_QUERY', 'query': 'What is the fiscal year 2024 financial summary?'}], 'QueryResults': [], 'TaskInitiated': True}
context_variables {'CompletedTaskCount': 1, 'DocumentsToIngest': [], 'QueriesToRun': [{'query_type': 'RAG_QUERY', 'query': 'What is the fiscal year 2024 financial summary?'}], 'QueryResults': [], 'TaskInitiated': True}

>>>>>>>> USING AUTO REPLY...
TaskManagerAgent (to chat_manager):

***** Suggested tool call (call_jDHDy8MHHqvucS89X62yUpFE): transfer_TaskManagerAgent_to_QueryAgent *****
Arguments:
{}
********************************************************************************************************

--------------------------------------------------------------------------------

Next speaker: _Swarm_Tool_Executor


>>>>>>>> EXECUTING FUNCTION transfer_TaskManagerAgent_to_QueryAgent...
Call ID: call_jDHDy8MHHqvucS89X62yUpFE
Input arguments: {}
_Swarm_Tool_Executor (to chat_manager):

***** Response from calling tool (call_jDHDy8MHHqvucS89X62yUpFE) *****
Swarm agent --> QueryAgent
**********************************************************************

--------------------------------------------------------------------------------

Next speaker: QueryAgent


>>>>>>>> USING AUTO REPLY...
QueryAgent (to chat_manager):

***** Suggested tool call (call_YFLZPfxUvG0cBqWw0dnH2keA): execute_rag_query *****
Arguments:
{}
**********************************************************************************

--------------------------------------------------------------------------------

Next speaker: _Swarm_Tool_Executor


>>>>>>>> EXECUTING FUNCTION execute_rag_query...
Call ID: call_YFLZPfxUvG0cBqWw0dnH2keA
Input arguments: {'context_variables': {'CompletedTaskCount': 1, 'DocumentsToIngest': [], 'QueriesToRun': [{'query_type': 'RAG_QUERY', 'query': 'What is the fiscal year 2024 financial summary?'}], 'QueryResults': [], 'TaskInitiated': True}}
_Swarm_Tool_Executor (to chat_manager):

***** Response from calling tool (call_YFLZPfxUvG0cBqWw0dnH2keA) *****
For the fiscal year 2024, Toast, Inc. reported total assets of $2,227 million as of September 30, 2024, compared to $1,958 million as of December 31, 2023. The total liabilities were $807 million, and stockholders' equity was $1,420 million. The company achieved total revenue of $3,622 million for the nine months ended September 30, 2024, with a gross profit of $857 million. Operating expenses totaled $873 million, resulting in a loss from operations of $16 million. The net loss for the period was $13 million, with basic and diluted loss per share of $0.02.
**********************************************************************

--------------------------------------------------------------------------------

Next speaker: QueryAgent


>>>>>>>> USING AUTO REPLY...
QueryAgent (to chat_manager):

For the fiscal year 2024, Toast, Inc. reported the following financial summary:
- Total assets: $2,227 million as of September 30, 2024, compared to $1,958 million as of December 31, 2023.
- Total liabilities: $807 million.
- Stockholders' equity: $1,420 million.
- Total revenue: $3,622 million for the nine months ended September 30, 2024.
- Gross profit: $857 million.
- Operating expenses: $873 million, leading to a loss from operations of $16 million.
- Net loss for the period: $13 million.
- Basic and diluted loss per share: $0.02.

--------------------------------------------------------------------------------

Next speaker: TaskManagerAgent

context_variables {'CompletedTaskCount': 2, 'DocumentsToIngest': [], 'QueriesToRun': [], 'QueryResults': [{'query': 'What is the fiscal year 2024 financial summary?', 'result': "For the fiscal year 2024, Toast, Inc. reported total assets of $2,227 million as of September 30, 2024, compared to $1,958 million as of December 31, 2023. The total liabilities were $807 million, and stockholders' equity was $1,420 million. The company achieved total revenue of $3,622 million for the nine months ended September 30, 2024, with a gross profit of $857 million. Operating expenses totaled $873 million, resulting in a loss from operations of $16 million. The net loss for the period was $13 million, with basic and diluted loss per share of $0.02."}], 'TaskInitiated': True}
context_variables {'CompletedTaskCount': 2, 'DocumentsToIngest': [], 'QueriesToRun': [], 'QueryResults': [{'query': 'What is the fiscal year 2024 financial summary?', 'result': "For the fiscal year 2024, Toast, Inc. reported total assets of $2,227 million as of September 30, 2024, compared to $1,958 million as of December 31, 2023. The total liabilities were $807 million, and stockholders' equity was $1,420 million. The company achieved total revenue of $3,622 million for the nine months ended September 30, 2024, with a gross profit of $857 million. Operating expenses totaled $873 million, resulting in a loss from operations of $16 million. The net loss for the period was $13 million, with basic and diluted loss per share of $0.02."}], 'TaskInitiated': True}

>>>>>>>> USING AUTO REPLY...
TaskManagerAgent (to chat_manager):

***** Suggested tool call (call_3XuWepUFg5FPt6C6JtzrxbFJ): transfer_TaskManagerAgent_to_SummaryAgent *****
Arguments:
{}
**********************************************************************************************************

--------------------------------------------------------------------------------

Next speaker: _Swarm_Tool_Executor


>>>>>>>> EXECUTING FUNCTION transfer_TaskManagerAgent_to_SummaryAgent...
Call ID: call_3XuWepUFg5FPt6C6JtzrxbFJ
Input arguments: {}
_Swarm_Tool_Executor (to chat_manager):

***** Response from calling tool (call_3XuWepUFg5FPt6C6JtzrxbFJ) *****
Swarm agent --> SummaryAgent
**********************************************************************

--------------------------------------------------------------------------------

Next speaker: SummaryAgent


>>>>>>>> USING AUTO REPLY...
SummaryAgent (to chat_manager):

The fiscal year 2024 financial summary for Toast, Inc. is as follows:

- Total assets increased to $2,227 million by September 30, 2024, from $1,958 million at the end of 2023.
- The company's total liabilities stood at $807 million.
- Stockholders' equity was reported at $1,420 million.
- Toast, Inc. generated total revenue of $3,622 million for the nine months ending September 30, 2024.
- The company recorded a gross profit of $857 million.
- Operating expenses amounted to $873 million, resulting in an operating loss of $16 million.
- There was a net loss of $13 million for the period, with a basic and diluted loss per share of $0.02.

--------------------------------------------------------------------------------
Document_Agent (to user):

The fiscal year 2024 financial summary for Toast, Inc. is as follows:

- Total assets increased to $2,227 million by September 30, 2024, from $1,958 million at the end of 2023.
- The company's total liabilities stood at $807 million.
- Stockholders' equity was reported at $1,420 million.
- Toast, Inc. generated total revenue of $3,622 million for the nine months ending September 30, 2024.
- The company recorded a gross profit of $857 million.
- Operating expenses amounted to $873 million, resulting in an operating loss of $16 million.
- There was a net loss of $13 million for the period, with a basic and diluted loss per share of $0.02.

--------------------------------------------------------------------------------

Collections

By default, DocAgent will ingest documents into the same collection. Every time you run the agent it will utilise this collection, enabling you to keep documents ingested across different runs.

However, if you want to run multiple DocAgents or want to ingest into a clean or specific vector store collection, you can use the collection_name parameter when creating the agent to set a unique collection name.

from autogen.agents.experimental import DocAgent

llm_config = {"api_type": "openai", "model": "gpt-4o"}

# Create our DocAgents with their own collections
# so that their ingested data and queries will be unique to them
document_agent_apple = DocAgent(
    llm_config=llm_config,
    collection_name="apple_financials"
    )

document_agent_nvidia = DocAgent(
    llm_config=llm_config,
    collection_name="nvidia_financials"
    )

...

Further examples

See this notebook for more examples of using document agent.