Image Search (Multimodal Retrieval)

This page covers local/OSS image search with Chroma using OpenCLIP embeddings. You can index image URIs and query by:

  • text (query_texts) for text-to-image retrieval
  • image URI (query_uris) for image-to-image retrieval

This is useful when you need one collection to support both semantic text prompts and visual similarity search.

Language Scope

This strategy currently ships with a runnable Python example. The OpenCLIP + ImageLoader workflow demonstrated here is covered only in Python in this cookbook.

Quick Start Pattern

Use this default flow:

  1. Create a collection with OpenCLIPEmbeddingFunction and ImageLoader.
  2. Add local image URIs via collection.add(..., uris=[...]).
  3. Query by text for discovery (query_texts=["a photo of a dog"]).
  4. Query by image URI for visual nearest-neighbor lookup (query_uris=[...]).
  5. Add metadata filters (where) when you need constrained retrieval by source, category, or tenant.

Short Snippet

Concept Snippet

This snippet is intentionally small and may omit setup. Use the runnable example for an end-to-end script.

import chromadb
from chromadb.utils.data_loaders import ImageLoader
from chromadb.utils.embedding_functions import OpenCLIPEmbeddingFunction

# Any Chroma client works here; PersistentClient stores data on local disk.
client = chromadb.PersistentClient(path="chroma_data")

collection = client.create_collection(
    name="multimodal_collection",
    embedding_function=OpenCLIPEmbeddingFunction(),
    data_loader=ImageLoader(),
)

collection.add(
    ids=["cat", "dog"],
    uris=["/path/to/cat.png", "/path/to/dog.jpg"],
    metadatas=[{"label": "cat"}, {"label": "dog"}],
)

text_results = collection.query(
    query_texts=["a photo of a dog"],
    n_results=2,
    include=["uris", "metadatas", "distances"],
)

image_results = collection.query(
    query_uris=["/path/to/dog.jpg"],
    n_results=2,
    include=["uris", "metadatas", "distances"],
)

End-to-End Python Walkthrough

The runnable example uses:

  • local persistence (PersistentClient)
  • bundled sample images (cat, dog, dog-with-person)
  • OpenCLIP embeddings generated by Chroma

Run:

cd examples/image-search/python
python3 -m venv .venv
source .venv/bin/activate
pip install -r requirements.txt
python image_search.py

Expected output includes:

  • ranked text-to-image results for "a photo of a dog"
  • ranked text-to-image results for "a photo of a cat"
  • ranked image-to-image results using dog-with-person.jpg as query
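Chroma query results come back as parallel lists, with one inner list per query, so "ranked results" in the output above means iterating the first inner list of each field in order. A minimal sketch of printing such a ranking, using a made-up result payload (the ids, paths, and distances below are placeholders, not real output):

```python
# Placeholder payload in Chroma's query-result shape: one inner list per query.
results = {
    "ids": [["dog", "dog-with-person"]],
    "uris": [["/path/to/dog.jpg", "/path/to/dog-with-person.jpg"]],
    "distances": [[0.41, 0.47]],
}

# Walk the first query's results in rank order (lower distance = closer match).
formatted = [
    f"{rank}. {id_} ({uri}) distance={dist:.2f}"
    for rank, (id_, uri, dist) in enumerate(
        zip(results["ids"][0], results["uris"][0], results["distances"][0]),
        start=1,
    )
]
print("\n".join(formatted))
```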

Operational Notes

  • The first run downloads OpenCLIP model weights, so startup is slower initially.
  • CPU-only execution works but is slower than GPU.
  • Keep image files accessible on disk if you store local URIs.
  • Use metadata (where) to scope search in multi-tenant or multi-source pipelines.
  • Example data is written under examples/image-search/python/chroma_data/image_search_example.

Core References

Full Runnable Example