Frequently Asked Questions and Commonly Encountered Issues

This section provides answers to frequently asked questions and information on commonly encountered problems when working with Chroma. The information below is based on interactions with the Chroma community.

Frequently Asked Questions

What does Chroma use to index embedding vectors?

Chroma uses its own fork of hnswlib for indexing and searching embeddings.

Alternative Questions:

  • What library does Chroma use for vector index and search?
  • What algorithm does Chroma use for vector search?

How is the dimensionality of my collections set?

When creating a collection, its dimensionality is determined by the dimensionality of the first embedding added to it. Once the dimensionality is set, it cannot be changed. Therefore, it is important to consistently use embeddings of the same dimensionality when adding or querying a collection.

Example:

import chromadb

client = chromadb.Client()

collection = client.create_collection("name")  # dimensionality is not set yet

# add an embedding to the collection
collection.add(ids=["id1"], embeddings=[[1, 2, 3]])  # dimensionality is set to 3

Alternative Questions:

  • Can I change the dimensionality of a collection?

Can I use transformers models with Chroma?

Generally, yes, you can use transformers models with Chroma. Although Chroma does not provide a dedicated wrapper for this, you can use SentenceTransformerEmbeddingFunction to achieve the same result. The sentence-transformers library will implicitly apply mean pooling to the last hidden layer, and you'll get a warning about it: No sentence-transformers model found with name [model name]. Creating a new one with MEAN pooling.

Example:

from chromadb.utils.embedding_functions import SentenceTransformerEmbeddingFunction

ef = SentenceTransformerEmbeddingFunction(model_name="FacebookAI/xlm-roberta-large-finetuned-conll03-english")

print(ef(["test"]))

Warning

Not all models will work with the above method. Also, mean pooling may not be the best strategy for a given model. Read the model card and try to understand what pooling, if any, the creators recommend. You may also want to normalize the embeddings before adding them to Chroma (pass normalize_embeddings=True to the SentenceTransformerEmbeddingFunction constructor).
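To make the implicit behavior concrete, here is a minimal NumPy sketch of what mean pooling over the last hidden layer followed by L2 normalization looks like. The `hidden_states` and `attention_mask` arrays are dummy stand-ins for what a transformer model would actually produce; this is illustrative only, not sentence-transformers' implementation.

```python
import numpy as np

# Dummy "last hidden layer" output: 4 tokens, hidden size 8.
# In practice this tensor comes from the transformer model.
hidden_states = np.arange(32, dtype=float).reshape(4, 8)
attention_mask = np.array([1, 1, 1, 0])  # last token is padding

# Mean pooling: average token vectors, ignoring padded positions
mask = attention_mask[:, None]
pooled = (hidden_states * mask).sum(axis=0) / mask.sum()

# L2 normalization, the effect of normalize_embeddings=True
embedding = pooled / np.linalg.norm(pooled)  # unit-length vector
```

The result is a single fixed-size vector per input text, regardless of the number of tokens, which is what Chroma expects from an embedding function.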

Commonly Encountered Problems

Collection Dimensionality Mismatch

Symptoms:

This problem usually manifests as the following error message:

chromadb.errors.InvalidDimensionException: Embedding dimension XXX does not match collection dimensionality YYY

Context:

When adding to, upserting into, or querying a Chroma collection. This error is more visible/pronounced when using the Python APIs, but it will also surface in other clients.

Cause:

You are trying to add or query a collection with vectors of a different dimensionality than the collection was created with.

Explanation/Solution:

When you first create a collection with client.create_collection("name"), the collection has no knowledge of its dimensionality, which allows you to add vectors of any dimensionality to it. However, once the first batch of embeddings is added, the collection is locked to that dimensionality. Any subsequent query or add operation must use embeddings of the same dimensionality. The dimensionality of the embeddings is a characteristic of the embedding model (EmbeddingFunction) used to generate them; therefore, it is important to consistently use the same EmbeddingFunction when adding to or querying a collection.
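The locking behavior can be sketched with a toy stand-in. `TinyCollection` below is a hypothetical class written for illustration; it is not Chroma's actual implementation, but it mimics the check that produces InvalidDimensionException:

```python
class TinyCollection:
    """Toy model of Chroma's dimensionality lock (illustrative only)."""

    def __init__(self, name):
        self.name = name
        self.dim = None  # unknown until the first add
        self.vectors = {}

    def add(self, id_, embedding):
        if self.dim is None:
            self.dim = len(embedding)  # first add locks the dimensionality
        elif len(embedding) != self.dim:
            raise ValueError(
                f"Embedding dimension {len(embedding)} does not match "
                f"collection dimensionality {self.dim}"
            )
        self.vectors[id_] = embedding


col = TinyCollection("name")
col.add("id1", [1, 2, 3])   # dimensionality locked to 3
# col.add("id2", [1, 2])    # would raise: dimension 2 does not match 3
```

Queries in Chroma go through the same gate, which is why embedding your query text with a different model than the one used at insert time also triggers the error.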

Tip

If you do not specify an embedding_function when creating (client.create_collection) or getting (client.get_or_create_collection) a collection, Chroma will use its default embedding function.

Large Distances in Search Results

Symptoms:

When querying a collection, you get back distances that are in the tens or hundreds.

Context:

Frequently occurs when using your own embedding function.

Cause:

The embeddings are not normalized.

Explanation/Solution:

L2 (Euclidean distance) and IP (inner product) distance metrics are sensitive to the magnitude of the vectors. Chroma uses L2 by default. Therefore, it is recommended to normalize the embeddings before adding them to Chroma.

Here is an example of how to normalize embeddings using the L2 norm:

import numpy as np


def normalize_L2(vector):
    """Normalizes a vector to unit length using L2 norm."""
    norm = np.linalg.norm(vector)
    if norm == 0:
        return vector
    return vector / norm
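The effect of normalization on distances can be checked numerically. For unit-length vectors, the squared L2 distance equals 2 − 2·(cosine similarity), so it is bounded by [0, 4] no matter how large the original vectors were. The vectors below are arbitrary illustrative values:

```python
import numpy as np

a = np.array([3.0, 4.0])
b = np.array([300.0, -400.0])  # very different magnitude

# Unnormalized: the L2 distance is dominated by vector magnitude (~500 here)
raw_distance = np.linalg.norm(a - b)

# Normalized to unit length
a_n = a / np.linalg.norm(a)
b_n = b / np.linalg.norm(b)

# For unit vectors: squared L2 distance == 2 - 2 * cosine similarity,
# so it always falls in [0, 4]
d2 = np.sum((a_n - b_n) ** 2)
cos = np.dot(a_n, b_n)
```

This is why normalized embeddings give small, comparable distances, while unnormalized ones can produce results in the tens or hundreds.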

OperationalError: no such column: collections.topic

Symptoms:

The error OperationalError: no such column: collections.topic is raised when trying to access Chroma locally or remotely.

Context:

After upgrading to Chroma 0.5.0 or accessing your Chroma persistent data with Chroma client version 0.5.0.

Cause:

In version 0.5.x, Chroma made some SQLite3 schema changes that are not backwards compatible with previous versions. Once you access your persistent data on the server or locally with the new Chroma version, it is automatically migrated to the new schema. This operation is not reversible.

Explanation/Solution:

To resolve this issue, upgrade all clients accessing the Chroma data to version 0.5.x.

Here's a link to the migration performed by Chroma - https://github.com/chroma-core/chroma/blob/main/chromadb/migrations/sysdb/00005-remove-topic.sqlite.sql
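Because the migration is not reversible, it is prudent to copy your persist directory before upgrading. The path below (`./chroma`) is illustrative; point it at your actual persist_directory. The `mkdir -p` line only exists so the snippet runs as-is; your directory will already exist:

```shell
# Back up the Chroma persist directory before upgrading.
# Replace ./chroma with your actual persist_directory.
PERSIST_DIR=./chroma
mkdir -p "$PERSIST_DIR"               # demo safety; yours already exists
cp -r "$PERSIST_DIR" "${PERSIST_DIR}-backup"
```

If the upgrade goes wrong, you can restore the backup and keep using your old client version until all clients are ready to move to 0.5.x together.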