Skip to content

Chroma Core Concepts

Tenancy and DB Hierarchies

The following picture illustrates the tenancy and DB hierarchy in Chroma:

Tenancy and DB Hierarchy

Storage

In Chroma single-node, all data about tenancy, databases, collections and documents is stored in a single SQLite database.

Tenants

A tenant is a logical grouping for a set of databases. A tenant is designed to model a single organization or user. A tenant can have multiple databases.

Databases

A database is a logical grouping for a set of collections. A database is designed to model a single application or project. A database can have multiple collections.

Collections

Collections are the grouping mechanism for embeddings, documents, and metadata.

Documents

Chunks of text

Documents in ChromaDB lingo are chunks of text that fits within the embedding model's context window. Unlike other frameworks that use the term "document" to mean a file, ChromaDB uses the term "document" to mean a chunk of text.

Documents are raw chunks of text that are associated with an embedding. Documents are stored in the database and can be queried for.

Metadata

Metadata is a dictionary of key-value pairs that can be associated with an embedding. Metadata is stored in the database and can be queried for.

Metadata values can be of the following types:

  • strings
  • integers
  • floats (float32)
  • booleans

Embedding Function

Also referred to as embedding model, embedding functions in ChromaDB are wrappers that expose a consistent interface for generating embedding vectors from documents or text queries.

For a list of supported embedding functions see Chroma's official documentation.

Distance Function

Distance functions help in calculating the difference (distance) between two embedding vectors. ChromaDB supports the following distance functions:

  • Cosine - Useful for text similarity
  • Euclidean (L2) - Useful for text similarity, more sensitive to noise than cosine
  • Inner Product (IP) - Recommender systems

Embedding Vector

A representation of a document in the embedding space in te form of a vector, list of 32-bit floats (or ints).

Embedding Model

Document and Metadata Index

The document and metadata index is stored in SQLite database.

Vector Index (HNSW Index)

Under the hood (ca. v0.4.22) Chroma uses its own fork HNSW lib for indexing and searching vectors.

In a single-node mode, Chroma will create a single HNSW index for each collection. The index is stored in a subdir of your persistent dir, named after the collection id (UUID-based).

The HNSW lib uses fast ANN algo to search the vectors in the index.