Image Search (Multimodal Retrieval)¶
This page covers local/OSS image search with Chroma using OpenCLIP embeddings. You can index image URIs and query by:
- text (`query_texts`) for text-to-image retrieval
- image URI (`query_uris`) for image-to-image retrieval
This is useful when you need one collection to support both semantic text prompts and visual similarity search.
Language Scope
This strategy currently ships with a Python runnable example; the OpenCLIP + `ImageLoader` workflow shown here is Python-centric in this cookbook.
Quick Start Pattern¶
Use this default flow:
- Create a collection with `OpenCLIPEmbeddingFunction` and `ImageLoader`.
- Add local image URIs via `collection.add(..., uris=[...])`.
- Query by text for discovery (`query_texts=["a photo of a dog"]`).
- Query by image URI for visual nearest-neighbor lookup (`query_uris=[...]`).
- Add metadata filters (`where`) when you need constrained retrieval by source, category, or tenant.
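The metadata-filter step above can be sketched as follows. The `label` and `tenant` keys are illustrative assumptions (not part of any real dataset), and the filter uses Chroma's `$and`/`$eq` operator syntax:

```python
# Hypothetical "where" filter: restrict results to dog images
# belonging to one tenant. Keys are illustrative only.
where = {
    "$and": [
        {"label": {"$eq": "dog"}},
        {"tenant": {"$eq": "acme"}},
    ]
}

# Applied at query time (collection setup omitted here):
# results = collection.query(
#     query_texts=["a photo of a dog"],
#     n_results=5,
#     where=where,
# )
```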
Short Snippet¶
Concept Snippet
This snippet is intentionally small and may omit setup. Use the runnable example for an end-to-end script.
```python
import chromadb
from chromadb.utils.data_loaders import ImageLoader
from chromadb.utils.embedding_functions import OpenCLIPEmbeddingFunction

# In-memory client for the concept snippet; use chromadb.PersistentClient(path=...)
# for local persistence, as in the runnable example below.
client = chromadb.Client()

collection = client.create_collection(
    name="multimodal_collection",
    embedding_function=OpenCLIPEmbeddingFunction(),
    data_loader=ImageLoader(),
)

collection.add(
    ids=["cat", "dog"],
    uris=["/path/to/cat.png", "/path/to/dog.jpg"],
    metadatas=[{"label": "cat"}, {"label": "dog"}],
)

# Text-to-image retrieval
text_results = collection.query(
    query_texts=["a photo of a dog"],
    n_results=2,
    include=["uris", "metadatas", "distances"],
)

# Image-to-image retrieval
image_results = collection.query(
    query_uris=["/path/to/dog.jpg"],
    n_results=2,
    include=["uris", "metadatas", "distances"],
)
```
End-to-End Python Walkthrough¶
The runnable example uses:
- local persistence (`PersistentClient`)
- bundled sample images (`cat`, `dog`, `dog-with-person`)
- OpenCLIP embeddings generated by Chroma
Run:
```shell
cd examples/image-search/python
python3 -m venv .venv
source .venv/bin/activate
pip install -r requirements.txt
python image_search.py
```
Expected output includes:
- ranked text-to-image results for `"a photo of a dog"`
- ranked text-to-image results for `"a photo of a cat"`
- ranked image-to-image results using `dog-with-person.jpg` as the query
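Query results come back as parallel nested lists, one inner list per query. A sketch of unpacking them (the URIs and distances below are invented placeholders for illustration, not actual script output):

```python
# Illustrative result shape for a single query; smaller distance
# means a closer match. Values here are made up.
results = {
    "ids": [["dog", "dog-with-person"]],
    "uris": [["images/dog.jpg", "images/dog-with-person.jpg"]],
    "distances": [[0.21, 0.35]],
}

# Index 0 selects the result list for the first (only) query.
for rank, (uri, dist) in enumerate(
    zip(results["uris"][0], results["distances"][0]), start=1
):
    print(f"{rank}. {uri} (distance={dist})")
```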
Operational Notes¶
- First run downloads model weights; startup is slower initially.
- CPU-only execution works but is slower than GPU.
- Keep image files accessible on disk if you store local URIs.
- Use metadata (`where`) to scope search in multi-tenant or multi-source pipelines.
- Example data is written under `examples/image-search/python/chroma_data/image_search_example`.
Core References¶
- Collections API (`add`, `query`, result shapes)
- Embedding Models (overview of supported models)
- Embedding Functions GPU Support (OpenCLIP)