VectorDB

VectorDB(*args, **kwargs)

Abstract class for a vector database. A vector database is responsible for storing and retrieving documents.

Parameters:
    *args
    **kwargs

Class Attributes

active_collection

embedding_function

type

Instance Methods

create_collection

create_collection(
    self,
    collection_name: str,
    overwrite: bool = False,
    get_or_create: bool = True
) -> Any

Create a collection in the vector database.
Case 1. If the collection does not exist, create it.
Case 2. If the collection exists and overwrite is True, overwrite it.
Case 3. If the collection exists and overwrite is False: if get_or_create is True, return the existing collection; otherwise raise a ValueError.

Parameters:
    collection_name (str): The name of the collection.
    overwrite (bool): Whether to overwrite the collection if it exists. Default: False.
    get_or_create (bool): Whether to get the collection if it exists. Default: True.

Returns:
    Any: The collection object.
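
The three cases described for create_collection can be sketched with a plain dictionary standing in for the database. This is only an illustration under stated assumptions, not the library's implementation: `InMemoryCollections` and its `_collections` store are hypothetical names, and a collection is modelled as an empty dict.

```python
from typing import Any


class InMemoryCollections:
    """Hypothetical stand-in that mimics the create_collection contract."""

    def __init__(self) -> None:
        # Maps collection name -> collection object (a plain dict here).
        self._collections: dict = {}

    def create_collection(
        self,
        collection_name: str,
        overwrite: bool = False,
        get_or_create: bool = True,
    ) -> Any:
        exists = collection_name in self._collections
        if not exists or overwrite:
            # Case 1 (missing) and Case 2 (exists, overwrite=True): start fresh.
            self._collections[collection_name] = {}
            return self._collections[collection_name]
        if get_or_create:
            # Case 3, get_or_create=True: hand back the existing collection.
            return self._collections[collection_name]
        # Case 3, get_or_create=False: refuse to touch the existing collection.
        raise ValueError(f"Collection {collection_name!r} already exists.")
```

Note that overwrite is checked before get_or_create, so overwrite=True replaces an existing collection regardless of the get_or_create value.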

delete_collection

delete_collection(self, collection_name: str) -> Any

Delete the collection from the vector database.

Parameters:
    collection_name (str): The name of the collection.

Returns:
    Any

delete_docs

delete_docs(
    self,
    ids: list[str | int],
    collection_name: str = None,
    **kwargs
) -> None

Delete documents from the collection of the vector database.

Parameters:
    ids (list[str | int]): A list of document ids. Each id is an ItemID (str or int).
    collection_name (str): The name of the collection. Default: None.
    **kwargs

Returns:
    None

get_collection

get_collection(self, collection_name: str = None) -> Any

Get the collection from the vector database.

Parameters:
    collection_name (str): The name of the collection. If None, return the current active collection. Default: None.

Returns:
    Any: The collection object.
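
The fallback to the active collection when collection_name is None can be illustrated as follows. `CollectionRegistry` and its `add` helper are hypothetical, and both error cases (no active collection, unknown name) are assumptions, since the description above does not say what happens then.

```python
class CollectionRegistry:
    """Hypothetical holder used only to illustrate the None fallback."""

    def __init__(self):
        self._collections = {}
        self.active_collection = None

    def add(self, name):
        # Assumption: the most recently created collection becomes active.
        self._collections[name] = {"name": name}
        self.active_collection = self._collections[name]
        return self.active_collection

    def get_collection(self, collection_name=None):
        if collection_name is None:
            # Documented behaviour: fall back to the active collection.
            if self.active_collection is None:
                raise ValueError("No collection is active.")  # assumption
            return self.active_collection
        if collection_name not in self._collections:
            raise ValueError(f"Unknown collection {collection_name!r}.")  # assumption
        return self._collections[collection_name]
```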

get_docs_by_ids

get_docs_by_ids(
    self,
    ids: list[str | int] = None,
    collection_name: str = None,
    include: list[str] | None = None,
    **kwargs: Any
) -> list[Document]

Retrieve documents from the collection of the vector database based on the ids.

Parameters:
    ids (list[str | int]): A list of document ids. If None, all documents are returned. Default: None.
    collection_name (str): The name of the collection. Default: None.
    include (list[str] | None): The fields to include. If None, ["metadatas", "documents"] are included; ids are always included. This may differ depending on the implementation. Default: None.
    **kwargs (Any)

Returns:
    list[Document]: The results.
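
The defaults for ids and include can be sketched as below. This is a simplified illustration, not a real backend: `_STORE`, the "embeddings" field, and the choice to skip unknown ids silently are assumptions, and plain dicts are returned in place of Document objects.

```python
# Hypothetical stored records, keyed by document id.
_STORE = {
    1: {"documents": "alpha", "metadatas": {"lang": "en"}, "embeddings": [0.1, 0.2]},
    "b": {"documents": "beta", "metadatas": {"lang": "de"}, "embeddings": [0.3, 0.4]},
}


def get_docs_by_ids(ids=None, include=None):
    # ids=None means "all documents", per the parameter description.
    wanted = list(_STORE) if ids is None else ids
    # include=None falls back to the documented default field set.
    fields = ["metadatas", "documents"] if include is None else include
    results = []
    for doc_id in wanted:
        if doc_id not in _STORE:
            continue  # Assumption: unknown ids are skipped silently.
        record = _STORE[doc_id]
        # The id is always present, whatever include says.
        results.append({"id": doc_id, **{field: record[field] for field in fields}})
    return results
```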

insert_docs

insert_docs(
    self,
    docs: list[Document],
    collection_name: str = None,
    upsert: bool = False,
    **kwargs
) -> None

Insert documents into the collection of the vector database.

Parameters:
    docs (list[Document]): A list of documents. Each document is a Document TypedDict.
    collection_name (str): The name of the collection. Default: None.
    upsert (bool): Whether to update the document if it exists. Default: False.
    **kwargs

Returns:
    None
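
The effect of upsert can be sketched with a dict keyed by document id. The `Document` fields shown here (`id`, `content`) are an assumed minimal shape, the `store` argument is hypothetical, and raising ValueError on a duplicate id when upsert is False is an assumption; a concrete backend may behave differently.

```python
from typing import TypedDict, Union


class Document(TypedDict):
    """Assumed minimal shape; the real Document type may define other fields."""

    id: Union[str, int]
    content: str


def insert_docs(store: dict, docs: list, upsert: bool = False) -> None:
    """Insert docs into store (a dict keyed by id), optionally overwriting."""
    if not upsert:
        # Assumption: duplicates are rejected up front, so a failed call
        # leaves the store unchanged.
        duplicates = [doc["id"] for doc in docs if doc["id"] in store]
        if duplicates:
            raise ValueError(f"Documents already exist: {duplicates}")
    for doc in docs:
        store[doc["id"]] = doc


store = {}
insert_docs(store, [Document(id=1, content="first draft")])
# With upsert=True the existing document with id 1 is replaced.
insert_docs(store, [Document(id=1, content="revised")], upsert=True)
```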

retrieve_docs

retrieve_docs(
    self,
    queries: list[str],
    collection_name: str = None,
    n_results: int = 10,
    distance_threshold: float = -1,
    **kwargs: Any
) -> list[list[tuple[Document, float]]]

Retrieve documents from the collection of the vector database based on the queries.

Parameters:
    queries (list[str]): A list of queries. Each query is a string.
    collection_name (str): The name of the collection. Default: None.
    n_results (int): The number of relevant documents to return. Default: 10.
    distance_threshold (float): The threshold for the distance score; only documents with a distance smaller than it are returned. Don't filter with it if < 0. Default: -1.
    **kwargs (Any)

Returns:
    list[list[tuple[Document, float]]] (QueryResults): The query results, one list per query. Each inner list contains tuples of the document and its distance.
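
The nested result shape and the roles of n_results and distance_threshold can be sketched as a post-processing step over raw hits. `filter_results` is a hypothetical helper, documents are represented by plain strings, and two behaviours are assumptions: hits are ordered by ascending distance, and any non-positive threshold (which covers the default of -1) disables filtering.

```python
def filter_results(results, n_results=10, distance_threshold=-1):
    """Keep the closest hits per query, optionally below a distance threshold."""
    filtered = []
    for hits in results:  # one list of (document, distance) tuples per query
        # Assumption: closest documents come first.
        hits = sorted(hits, key=lambda pair: pair[1])
        if distance_threshold > 0:
            # Only distances strictly smaller than the threshold survive.
            hits = [(doc, dist) for doc, dist in hits if dist < distance_threshold]
        filtered.append(hits[:n_results])
    return filtered


# Two queries: the outer list has one entry per query.
raw = [[("a", 0.5), ("b", 0.1), ("c", 0.9)], [("d", 0.3)]]
close_hits = filter_results(raw, distance_threshold=0.4)
```

The outer list always has one entry per query, even when filtering leaves a query with no hits.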

update_docs

update_docs(
    self,
    docs: list[Document],
    collection_name: str = None,
    **kwargs
) -> None

Update documents in the collection of the vector database.

Parameters:
    docs (list[Document]): A list of documents.
    collection_name (str): The name of the collection. Default: None.
    **kwargs

Returns:
    None