Use ChromaDBQueryEngine to query Markdown files
ChromaDB Query Engine for document queries
This notebook demonstrates the use of the ChromaDBQueryEngine for
retrieval-augmented question answering over documents. It shows how to
set up the engine with Docling-parsed Markdown files and execute
natural language queries against the indexed data.
The ChromaDBQueryEngine integrates persistent ChromaDB vector storage
with LlamaIndex for efficient document retrieval.
Once created, a ChromaDBQueryEngine can be added to DocAgent for use.
Load LLM configuration
This demonstration requires an OPENAI_API_KEY to be set in your
environment variables. See our documentation for guidance.
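A quick pre-flight check can catch a missing key before any engine is created. The following is a minimal sketch using only the standard library; `require_openai_key` is a hypothetical helper name chosen for illustration and is not part of any library.

```python
import os
from typing import Mapping, Optional


def require_openai_key(env: Optional[Mapping[str, str]] = None) -> str:
    """Return the OPENAI_API_KEY value, or raise if it is missing or blank.

    `env` defaults to the process environment; passing a plain dict
    makes the check easy to exercise without touching real variables.
    """
    env = os.environ if env is None else env
    key = env.get("OPENAI_API_KEY", "").strip()
    if not key:
        raise RuntimeError("OPENAI_API_KEY is not set in the environment")
    return key
```

Calling `require_openai_key()` in the first cell fails fast with a readable message instead of surfacing an authentication error later, during ingestion or querying.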
Refer to this link for running ChromaDB in a Docker container. If the host and port are not provided, the engine will create an in-memory ChromaDB client.
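The fallback rule described above can be sketched as a small decision function. This is an illustration of the stated behavior, not the engine's actual implementation, and `pick_client_mode` is a hypothetical name. With the `chromadb` package, the two modes would typically correspond to `chromadb.HttpClient(host=..., port=...)` and `chromadb.EphemeralClient()`; the engine's internal choice of client may differ.

```python
from typing import Optional, Tuple


def pick_client_mode(host: Optional[str], port: Optional[int]) -> Tuple[str, Optional[str]]:
    """Decide between a remote server and an in-memory client.

    Returns ("http", "host:port") when both values are given,
    otherwise ("in-memory", None).
    """
    if host and port is not None:
        return "http", f"{host}:{port}"
    return "in-memory", None


mode, address = pick_client_mode("localhost", 8000)
print(mode, address)
```

Keep in mind that an in-memory client does not persist data: anything ingested is gone once the process ends, so a Docker-hosted server is the better fit when the collection should survive a notebook restart.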
Here we can see the default collection name in the vector store. This
is where all documents will be ingested. When creating the
ChromaDBQueryEngine, you can specify a collection_name to ingest into.
Let’s ingest a document and query it.
If you haven’t ingested your documents yet, follow the next two
cells. Otherwise, skip to the connect_db cell.
init_db will overwrite the existing collection with the same name.
If the given collection already has the document you need, you can use
connect_db to avoid overwriting the existing collection.
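The choice between the two calls can be made explicit in a small wrapper. The sketch below is duck-typed: it works with any object exposing `init_db` and `connect_db`. The helper name `prepare_collection`, the stand-in `RecordingEngine`, and the keyword `new_doc_paths_or_urls` are assumptions made for illustration; check the engine's signature before relying on them.

```python
from typing import Any, List, Sequence, Tuple


def prepare_collection(engine: Any, docs: Sequence[str], overwrite: bool) -> str:
    """Call init_db (destructive) or connect_db (non-destructive).

    Returns the name of the method that was invoked.
    """
    if overwrite:
        # Assumed keyword name; replaces any collection with the same name.
        engine.init_db(new_doc_paths_or_urls=list(docs))
        return "init_db"
    # Reuses the existing collection and leaves its contents untouched.
    engine.connect_db()
    return "connect_db"


class RecordingEngine:
    """Hypothetical stand-in that only records which method was called."""

    def __init__(self) -> None:
        self.calls: List[Tuple[str, Any]] = []

    def init_db(self, new_doc_paths_or_urls=None) -> None:
        self.calls.append(("init_db", new_doc_paths_or_urls))

    def connect_db(self) -> None:
        self.calls.append(("connect_db", None))


engine = RecordingEngine()
print(prepare_collection(engine, ["report.md"], overwrite=False))
```

Defaulting to `overwrite=False` in calling code is the safer habit, since an accidental init_db discards whatever was previously ingested under that collection name.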
Great, we got the data we needed. Now, let’s add another document.
Then query the same database again, this time about another corporate entity.
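The add-then-query step can be sketched in the same duck-typed style. The method names `add_docs` and `query`, the keyword `new_doc_paths_or_urls`, and the text return value are assumptions not confirmed by this page; `add_and_query` and `EchoEngine` are hypothetical names used only so the example runs without a vector store.

```python
from typing import Any, List, Sequence


def add_and_query(engine: Any, new_docs: Sequence[str], question: str) -> str:
    """Add documents to the existing collection, then ask a question."""
    engine.add_docs(new_doc_paths_or_urls=list(new_docs))  # assumed method and keyword
    return engine.query(question)  # assumed to return the answer as text


class EchoEngine:
    """Hypothetical stand-in that tracks documents and echoes the question."""

    def __init__(self) -> None:
        # Pretend one document was ingested earlier.
        self.docs: List[str] = ["first.md"]

    def add_docs(self, new_doc_paths_or_urls=None) -> None:
        self.docs.extend(new_doc_paths_or_urls or [])

    def query(self, question: str) -> str:
        return f"{question} (searched {len(self.docs)} documents)"


print(add_and_query(EchoEngine(), ["second.md"], "What changed?"))
```

The point of the pattern is that adding a document extends the existing collection rather than replacing it, so earlier documents remain available to later queries.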