Filters¶
Chroma provides two types of filters:
- Metadata - filter documents based on metadata using the where clause in either Collection.query() or Collection.get()
- Document - filter documents based on document content using where_document in either Collection.query() or Collection.get()
Those familiar with MongoDB queries will find Chroma's filters very similar.
Runnable Examples
Complete, runnable filtering examples for each language are available in the examples/filtering directory:
Metadata Filters¶
Schema¶
Filter Schema vs Record Metadata Schema
The JSON schema below validates where filter expressions, not the metadata contract of records you ingest.
For application-layer metadata validation/enforcement patterns, see Metadata Schema Validation.
You can use the following JSON schema to validate your where filters:
{
"$schema": "https://json-schema.org/draft/2020-12/schema#",
"title": "Chroma Metadata Where Filter Schema",
"description": "Schema for Chroma metadata filters used in where clauses",
"oneOf": [
{
"type": "object",
"patternProperties": {
"^[^$].*$": {
"oneOf": [
{
"type": ["string", "number", "boolean"]
},
{
"type": "object",
"properties": {
"$eq": {"type": ["string", "number", "boolean"]},
"$ne": {"type": ["string", "number", "boolean"]},
"$gt": {"type": "number"},
"$gte": {"type": "number"},
"$lt": {"type": "number"},
"$lte": {"type": "number"},
"$in": {
"oneOf": [
{
"type": "array",
"items": { "type": "string" },
"minItems": 1
},
{
"type": "array",
"items": { "type": "number" },
"minItems": 1
},
{
"type": "array",
"items": { "type": "boolean" },
"minItems": 1
}
]
},
"$nin": {
"oneOf": [
{
"type": "array",
"items": { "type": "string" },
"minItems": 1
},
{
"type": "array",
"items": { "type": "number" },
"minItems": 1
},
{
"type": "array",
"items": { "type": "boolean" },
"minItems": 1
}
]
},
"$contains": {"type": ["string", "number", "boolean"]},
"$not_contains": {"type": ["string", "number", "boolean"]}
},
"additionalProperties": false,
"minProperties": 1,
"maxProperties": 1
}
]
}
},
"additionalProperties": false,
"minProperties": 1
},
{
"type": "object",
"properties": {
"$and": {
"type": "array",
"items": {"$ref": "#"},
"minItems": 2
},
"$or": {
"type": "array",
"items": {"$ref": "#"},
"minItems": 2
}
},
"additionalProperties": false,
"minProperties": 1,
"maxProperties": 1
}
]
}
Equality ($eq)¶
This filter matches attribute values that are equal to a specified string, boolean, integer or float value. The value check is case-sensitive.
Supported value types are: string, boolean, integer or float (or number in JS/TS)
Simple equality:
Single condition
If you are using simple equality expression {"metadata_field": "is_equal_to_this"}, you can only specify a single condition.
Alternative syntax:
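With the Python client, both forms are plain dicts passed as where=... to Collection.get() or Collection.query(). A minimal sketch; the to_explicit helper is hypothetical and only illustrates that the shorthand is equivalent to the $eq form:

```python
# Two equivalent ways to write "genre equals 'fiction'" as a where filter.
# With the Python client these dicts would be passed as where=... to
# Collection.get() or Collection.query().
simple_filter = {"genre": "fiction"}
explicit_filter = {"genre": {"$eq": "fiction"}}


def to_explicit(where: dict) -> dict:
    """Hypothetical helper: rewrite {"field": value} into {"field": {"$eq": value}}.

    Operator expressions and logical operators ($and/$or) are returned unchanged.
    """
    if len(where) != 1:
        # The shorthand form carries exactly one condition per dict.
        raise ValueError("expected exactly one condition")
    key, value = next(iter(where.items()))
    if key.startswith("$") or isinstance(value, dict):
        return where
    return {key: {"$eq": value}}


print(to_explicit(simple_filter) == explicit_filter)  # True
```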
Validation Failures
When validation fails, Chroma is expected to return a message similar to this one: ValueError: Expected where value to be a str, int, float, or operator expression, got X in get. Here X refers to the inferred type of the data.
Inequality ($ne)¶
This filter matches attribute values that are not equal to a specified string, boolean, integer or float value. The value check is case-sensitive.
Supported value types are: string, boolean, integer or float (or number in JS/TS)
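Because the check is case-sensitive, excluding "Draft" does not exclude "draft". The following is a minimal local sketch of that behavior, not Chroma's implementation; matches_scalar is a hypothetical helper, and every sample record has the field, so the behavior for records missing the key is not covered here:

```python
# Minimal local sketch of $eq/$ne semantics (not Chroma's implementation):
# string comparison is case-sensitive, so "Draft" and "draft" are different values.
records = [
    {"id": "a", "status": "Draft"},
    {"id": "b", "status": "draft"},
    {"id": "c", "status": "published"},
]

where = {"status": {"$ne": "Draft"}}


def matches_scalar(metadata: dict, field: str, op: str, value) -> bool:
    """Hypothetical helper evaluating a single $eq or $ne condition."""
    actual = metadata.get(field)
    if op == "$eq":
        return actual == value
    if op == "$ne":
        return actual != value
    raise ValueError(f"unsupported operator {op}")


field, expr = next(iter(where.items()))
op, value = next(iter(expr.items()))
print([r["id"] for r in records if matches_scalar(r, field, op, value)])  # ['b', 'c']
```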
Greater Than ($gt)¶
This filter matches attribute values that are strictly greater than a specified numeric (integer or float) value.
Greater Than
The $gt operator is only supported for numerical values - int or float values.
Greater Than or Equal ($gte)¶
This filter matches attribute values that are greater than or equal to a specified numeric (integer or float) value.
Greater Than or Equal
The $gte operator is only supported for numerical values - int or float values.
Less Than ($lt)¶
This filter matches attribute values that are strictly less than a specified numeric (integer or float) value.
Supported values: integer or float
Less Than or Equal ($lte)¶
This filter matches attribute values that are less than or equal to a specified numeric (integer or float) value.
Supported values: integer or float
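In the filter schema above, each operator object may hold only one operator ("maxProperties": 1), so a numeric range such as 2020 <= year < 2024 is written as two conditions joined with $and. The sketch below builds that filter and evaluates it locally; matches_range and NUMERIC_OPS are hypothetical, and treating a missing or non-numeric field as a non-match is an assumption of the sketch:

```python
# A range is two single-operator conditions joined with $and, rather than
# {"year": {"$gte": 2020, "$lt": 2024}}, which the schema rejects.
range_filter = {
    "$and": [
        {"year": {"$gte": 2020}},
        {"year": {"$lt": 2024}},
    ]
}

# Minimal local sketch of the numeric comparison operators (not Chroma's code).
NUMERIC_OPS = {
    "$gt": lambda a, b: a > b,
    "$gte": lambda a, b: a >= b,
    "$lt": lambda a, b: a < b,
    "$lte": lambda a, b: a <= b,
}


def matches_range(metadata: dict, where: dict) -> bool:
    """Hypothetical helper: evaluate a $and list of single numeric conditions."""
    for condition in where["$and"]:
        field, expr = next(iter(condition.items()))
        op, bound = next(iter(expr.items()))
        value = metadata.get(field)
        # Assumption: missing or non-numeric values (including bool) never match.
        if isinstance(value, bool) or not isinstance(value, (int, float)):
            return False
        if not NUMERIC_OPS[op](value, bound):
            return False
    return True


years = [2019, 2020, 2023, 2024]
print([y for y in years if matches_range({"year": y}, range_filter)])  # [2020, 2023]
```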
In ($in)¶
This filter matches attribute values that are in the given list of values.
Supported value types are: string, boolean, integer or float (or number in JS/TS)
In
The $in operator is only supported for a list of values of the same type.
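A client-side check for this rule can be sketched by mirroring the $in/$nin part of the filter schema above: a non-empty list whose items are all strings, all numbers, or all booleans. The check_in_list helper is hypothetical and not part of the Chroma client:

```python
def check_in_list(values: list) -> None:
    """Hypothetical pre-flight check for $in/$nin lists, mirroring the schema above."""
    if not isinstance(values, list) or not values:
        raise ValueError("$in/$nin expects a non-empty list")

    def kind(v):
        # bool is a subclass of int in Python, so test it first.
        if isinstance(v, bool):
            return "boolean"
        if isinstance(v, (int, float)):
            return "number"
        if isinstance(v, str):
            return "string"
        raise ValueError(f"unsupported value type: {type(v).__name__}")

    kinds = {kind(v) for v in values}
    if len(kinds) != 1:
        raise ValueError(f"$in/$nin values must share one type, got {sorted(kinds)}")


check_in_list(["draft", "review"])  # ok
check_in_list([1, 2.5])  # ok: the schema treats ints and floats as "number"
try:
    check_in_list(["draft", 1])
except ValueError as err:
    print(err)  # $in/$nin values must share one type, got ['number', 'string']
```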
Not In ($nin)¶
This filter matches records that either do not have the given key or whose value for that key is not in the given list of values.
Supported value types are: string, boolean, integer or float (or number in JS/TS)
Not In
The $nin operator is only supported for a list of values of the same type.
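The missing-key behavior described above means a record without the field passes a $nin filter. A minimal local sketch of that semantics, not Chroma's implementation; matches_nin is a hypothetical helper:

```python
# Minimal local sketch of $nin: a record matches if it lacks the key entirely,
# or if its value for the key is not in the given list.
records = [
    {"id": "a", "status": "draft"},
    {"id": "b", "status": "published"},
    {"id": "c"},  # no "status" key
]

where = {"status": {"$nin": ["draft", "archived"]}}


def matches_nin(metadata: dict, field: str, values: list) -> bool:
    """Hypothetical helper for a single $nin condition."""
    if field not in metadata:
        return True
    return metadata[field] not in values


field, expr = next(iter(where.items()))
print([r["id"] for r in records if matches_nin(r, field, expr["$nin"])])  # ['b', 'c']
```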
Array Metadata¶
Chroma >= 1.5.0
Array metadata and the $contains/$not_contains operators require Chroma 1.5.0 or later.
Chroma supports storing arrays in metadata fields. All elements in an array must be of the same type.
Supported array element types: string, integer, float, boolean
Constraints:
- Empty arrays are not allowed
- Nested arrays (arrays of arrays) are not supported
- All elements must be the same type (no mixed-type arrays)
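The constraints above can be checked before ingesting records. A minimal sketch; validate_array_metadata is a hypothetical helper, not part of the Chroma client, and it reads "same type" literally, so integers and floats are not mixed:

```python
def validate_array_metadata(value: list) -> None:
    """Hypothetical pre-ingest check for the array metadata constraints listed above."""
    if len(value) == 0:
        raise ValueError("empty arrays are not allowed")
    if any(isinstance(v, list) for v in value):
        raise ValueError("nested arrays are not supported")
    # type() (not isinstance) keeps bool distinct from int, and int distinct from float.
    element_types = {type(v) for v in value}
    if len(element_types) != 1:
        raise ValueError("all elements must be the same type")
    if element_types.pop() not in (str, int, float, bool):
        raise ValueError("elements must be string, integer, float, or boolean")


validate_array_metadata(["nlp", "transformers"])  # ok
validate_array_metadata([8, 7, 9])  # ok
for bad in ([], [[1, 2]], [1, "two"], [1, 2.0]):
    try:
        validate_array_metadata(bad)
    except ValueError as err:
        print(bad, "->", err)
```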
Runnable Array Examples
End-to-end array metadata examples are included in the filtering examples:
Storing Array Metadata¶
Here is an example of a research paper collection using array metadata to store multi-value fields like topics, authors, and review scores:
import chromadb
client = chromadb.Client()
collection = client.create_collection("research_papers")
collection.add(
ids=["paper-1", "paper-2", "paper-3"],
documents=[
"We introduce a transformer-based architecture for low-resource language translation.",
"A study on the effects of soil microbiome diversity on crop yield in arid climates.",
"Applying reinforcement learning to optimize energy consumption in smart grid networks.",
],
metadatas=[
{
"authors": ["Chen", "Okafor", "Müller"],
"topics": ["nlp", "transformers", "low-resource"],
"review_scores": [8, 7, 9],
"year": 2024,
},
{
"authors": ["Patel", "Johansson"],
"topics": ["agriculture", "microbiome", "climate"],
"review_scores": [6, 7, 7],
"year": 2023,
},
{
"authors": ["Chen", "Williams"],
"topics": ["reinforcement-learning", "energy", "smart-grid"],
"review_scores": [9, 8, 9],
"year": 2024,
},
],
)
import { ChromaClient } from "chromadb";
const client = new ChromaClient();
const collection = await client.createCollection({ name: "research_papers" });
await collection.add({
ids: ["paper-1", "paper-2", "paper-3"],
documents: [
"We introduce a transformer-based architecture for low-resource language translation.",
"A study on the effects of soil microbiome diversity on crop yield in arid climates.",
"Applying reinforcement learning to optimize energy consumption in smart grid networks.",
],
metadatas: [
{
authors: ["Chen", "Okafor", "Müller"],
topics: ["nlp", "transformers", "low-resource"],
review_scores: [8, 7, 9],
year: 2024,
},
{
authors: ["Patel", "Johansson"],
topics: ["agriculture", "microbiome", "climate"],
review_scores: [6, 7, 7],
year: 2023,
},
{
authors: ["Chen", "Williams"],
topics: ["reinforcement-learning", "energy", "smart-grid"],
review_scores: [9, 8, 9],
year: 2024,
},
],
});
Go Client
Array metadata support in the Go client is pending. See the chroma-go repository for updates.
let mut metadata1 = Metadata::new();
metadata1.insert("authors".into(), vec!["Chen", "Okafor", "Müller"].into());
metadata1.insert("topics".into(), vec!["nlp", "transformers", "low-resource"].into());
metadata1.insert("review_scores".into(), vec![8i64, 7, 9].into());
metadata1.insert("year".into(), MetadataValue::Int(2024));
let mut metadata2 = Metadata::new();
metadata2.insert("authors".into(), vec!["Patel", "Johansson"].into());
metadata2.insert("topics".into(), vec!["agriculture", "microbiome", "climate"].into());
metadata2.insert("review_scores".into(), vec![6i64, 7, 7].into());
metadata2.insert("year".into(), MetadataValue::Int(2023));
let mut metadata3 = Metadata::new();
metadata3.insert("authors".into(), vec!["Chen", "Williams"].into());
metadata3.insert("topics".into(), vec!["reinforcement-learning", "energy", "smart-grid"].into());
metadata3.insert("review_scores".into(), vec![9i64, 8, 9].into());
metadata3.insert("year".into(), MetadataValue::Int(2024));
collection.add(
vec!["paper-1".into(), "paper-2".into(), "paper-3".into()],
vec![embed1, embed2, embed3],
Some(vec![
Some("We introduce a transformer-based architecture for low-resource language translation.".into()),
Some("A study on the effects of soil microbiome diversity on crop yield in arid climates.".into()),
Some("Applying reinforcement learning to optimize energy consumption in smart grid networks.".into()),
]),
None,
Some(vec![Some(metadata1), Some(metadata2), Some(metadata3)]),
).await?;
Contains ($contains)¶
Returns records where an array metadata field includes a specific value. The filter value must be a scalar matching the array's element type.
Go Client
Array metadata $contains in the Go client is pending. See the chroma-go repository for updates.
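The following minimal local sketch shows the $contains semantics over a subset of the research_papers metadata stored earlier. It is not Chroma's implementation, and matches_contains is a hypothetical helper. With the Python client, the same dict would be passed as collection.get(where=where):

```python
papers = {
    "paper-1": {"authors": ["Chen", "Okafor", "Müller"], "review_scores": [8, 7, 9]},
    "paper-2": {"authors": ["Patel", "Johansson"], "review_scores": [6, 7, 7]},
    "paper-3": {"authors": ["Chen", "Williams"], "review_scores": [9, 8, 9]},
}

# The filter value is a scalar of the array's element type, not a list.
where = {"authors": {"$contains": "Chen"}}


def matches_contains(metadata: dict, field: str, value) -> bool:
    """Hypothetical helper: True if the array field includes value."""
    array = metadata.get(field)
    return isinstance(array, list) and value in array


field, expr = next(iter(where.items()))
print([pid for pid, md in papers.items() if matches_contains(md, field, expr["$contains"])])
# ['paper-1', 'paper-3']
print([pid for pid, md in papers.items() if matches_contains(md, "review_scores", 6)])
# ['paper-2']
```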
Not Contains ($not_contains)¶
Returns records where an array metadata field does not include a specific value.
Go Client
Array metadata $not_contains in the Go client is pending. See the chroma-go repository for updates.
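A matching local sketch for $not_contains over the topics arrays from the research_papers example. It is not Chroma's implementation, and matches_not_contains is hypothetical. Every sample record has the field, so this sketch does not cover how Chroma treats records that lack it:

```python
papers = {
    "paper-1": {"topics": ["nlp", "transformers", "low-resource"]},
    "paper-2": {"topics": ["agriculture", "microbiome", "climate"]},
    "paper-3": {"topics": ["reinforcement-learning", "energy", "smart-grid"]},
}

where = {"topics": {"$not_contains": "nlp"}}


def matches_not_contains(metadata: dict, field: str, value) -> bool:
    """Hypothetical helper: True if the array field does not include value."""
    return value not in metadata[field]


field, expr = next(iter(where.items()))
print([pid for pid, md in papers.items() if matches_not_contains(md, field, expr["$not_contains"])])
# ['paper-2', 'paper-3']
```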
Combining Array Filters¶
Array filters work with $and/$or logical operators and can be mixed with scalar filters:
Go Client
Array metadata $contains in the Go client is pending. See the chroma-go repository for updates.
// Papers by "Chen" published in 2024 that cover "energy" (Search API - Chroma Cloud)
let results = collection.search(vec![
SearchPayload::default()
.r#where(
(Key::field("authors").contains("Chen"))
& (Key::field("topics").contains("energy"))
& (Key::field("year").eq(2024)),
)
.rank(knn_query.clone())
.limit(Some(10), 0),
]).await?;
// Returns paper-3
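With the Python client, the same combination is a single where dict (array operators require Chroma 1.5.0 or later). The sketch below evaluates it locally over the research_papers metadata. The evaluate function is hypothetical and supports only $and, $eq and $contains, so it illustrates the semantics rather than Chroma's query engine:

```python
# Papers by "Chen" published in 2024 that cover "energy", as a where dict.
# With the Python client: collection.get(where=where)
where = {
    "$and": [
        {"authors": {"$contains": "Chen"}},
        {"topics": {"$contains": "energy"}},
        {"year": {"$eq": 2024}},
    ]
}

papers = {
    "paper-1": {"authors": ["Chen", "Okafor", "Müller"], "topics": ["nlp", "transformers", "low-resource"], "year": 2024},
    "paper-2": {"authors": ["Patel", "Johansson"], "topics": ["agriculture", "microbiome", "climate"], "year": 2023},
    "paper-3": {"authors": ["Chen", "Williams"], "topics": ["reinforcement-learning", "energy", "smart-grid"], "year": 2024},
}


def evaluate(metadata: dict, where: dict) -> bool:
    """Hypothetical local evaluator supporting only $and, $eq and $contains."""
    key, expr = next(iter(where.items()))
    if key == "$and":
        return all(evaluate(metadata, sub) for sub in expr)
    op, value = next(iter(expr.items()))
    actual = metadata.get(key)
    if op == "$eq":
        return actual == value
    if op == "$contains":
        return isinstance(actual, list) and value in actual
    raise ValueError(f"unsupported operator {op}")


print([pid for pid, md in papers.items() if evaluate(md, where)])  # ['paper-3']
```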
Logical Operator: And ($and)¶
The $and logical operator joins two or more simple ($eq, $ne, $gt etc.) filters together and matches records for which all of the conditions in the list are satisfied.
let results = collection.get(
None,
Some(Where::Composite(CompositeExpression {
operator: BooleanOperator::And,
children: vec![
Where::Metadata(MetadataExpression {
key: "metadata_field1".to_string(),
comparison: MetadataComparison::Primitive(
PrimitiveOperator::Equal,
MetadataValue::Str("value1".to_string()),
),
}),
Where::Metadata(MetadataExpression {
key: "metadata_field2".to_string(),
comparison: MetadataComparison::Primitive(
PrimitiveOperator::Equal,
MetadataValue::Str("value2".to_string()),
),
}),
],
})),
None, None, None,
).await?;
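In Python, the same $and filter is a plain dict. The and_ builder below is a hypothetical convenience that enforces the schema's "minItems": 2 for $and lists:

```python
# Python equivalent of the Rust $and example above.
# With the Python client it would be passed as collection.get(where=where_and).
where_and = {
    "$and": [
        {"metadata_field1": {"$eq": "value1"}},
        {"metadata_field2": {"$eq": "value2"}},
    ]
}


def and_(*conditions: dict) -> dict:
    """Hypothetical builder enforcing the schema's "minItems": 2 for $and."""
    if len(conditions) < 2:
        raise ValueError("$and needs at least two conditions")
    return {"$and": list(conditions)}


built = and_({"metadata_field1": {"$eq": "value1"}}, {"metadata_field2": {"$eq": "value2"}})
print(built == where_and)  # True
```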
Logical Operator: Or ($or)¶
The $or logical operator joins two or more simple ($eq, $ne, $gt etc.) filters together and matches records for which at least one of the conditions in the list is satisfied.
let results = collection.get(
None,
Some(Where::Composite(CompositeExpression {
operator: BooleanOperator::Or,
children: vec![
Where::Metadata(MetadataExpression {
key: "metadata_field1".to_string(),
comparison: MetadataComparison::Primitive(
PrimitiveOperator::Equal,
MetadataValue::Str("value1".to_string()),
),
}),
Where::Metadata(MetadataExpression {
key: "metadata_field2".to_string(),
comparison: MetadataComparison::Primitive(
PrimitiveOperator::Equal,
MetadataValue::Str("value2".to_string()),
),
}),
],
})),
None, None, None,
).await?;
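The Python equivalent is again a plain dict. The local check below illustrates the "at least one condition" semantics on three sample records. It is a sketch, not Chroma's implementation, and matches_or is hypothetical:

```python
# Python equivalent of the Rust $or example above.
where_or = {
    "$or": [
        {"metadata_field1": {"$eq": "value1"}},
        {"metadata_field2": {"$eq": "value2"}},
    ]
}

records = {
    "r1": {"metadata_field1": "value1", "metadata_field2": "other"},
    "r2": {"metadata_field1": "other", "metadata_field2": "value2"},
    "r3": {"metadata_field1": "other", "metadata_field2": "other"},
}


def matches_or(metadata: dict, where: dict) -> bool:
    """Hypothetical local check: at least one $eq condition in the $or list holds."""
    for condition in where["$or"]:
        field, expr = next(iter(condition.items()))
        if metadata.get(field) == expr["$eq"]:
            return True
    return False


print([rid for rid, md in records.items() if matches_or(md, where_or)])  # ['r1', 'r2']
```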
Logical Operator Nesting¶
Logical operators can be nested: any item in an $and or $or list can itself be an $and or $or expression.
let results = collection.get(
None,
Some(Where::Composite(CompositeExpression {
operator: BooleanOperator::And,
children: vec![
Where::Metadata(MetadataExpression {
key: "metadata_field1".to_string(),
comparison: MetadataComparison::Primitive(
PrimitiveOperator::Equal,
MetadataValue::Str("value1".to_string()),
),
}),
Where::Composite(CompositeExpression {
operator: BooleanOperator::And,
children: vec![
Where::Metadata(MetadataExpression {
key: "metadata_field2".to_string(),
comparison: MetadataComparison::Primitive(
PrimitiveOperator::Equal,
MetadataValue::Str("value2".to_string()),
),
}),
Where::Metadata(MetadataExpression {
key: "metadata_field3".to_string(),
comparison: MetadataComparison::Primitive(
PrimitiveOperator::Equal,
MetadataValue::Str("value3".to_string()),
),
}),
],
}),
],
})),
None, None, None,
).await?;
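In Python, the nested example is an $and whose second child is itself an $and. The recursive evaluator below is a hypothetical local sketch over $eq leaves. It shows that nesting is just recursion over the same dict shape:

```python
# Python equivalent of the nested Rust example above.
where_nested = {
    "$and": [
        {"metadata_field1": {"$eq": "value1"}},
        {
            "$and": [
                {"metadata_field2": {"$eq": "value2"}},
                {"metadata_field3": {"$eq": "value3"}},
            ]
        },
    ]
}


def evaluate(metadata: dict, where: dict) -> bool:
    """Hypothetical recursive evaluator for $and/$or over $eq leaves (local sketch only)."""
    key, expr = next(iter(where.items()))
    if key == "$and":
        return all(evaluate(metadata, child) for child in expr)
    if key == "$or":
        return any(evaluate(metadata, child) for child in expr)
    return metadata.get(key) == expr["$eq"]


full = {"metadata_field1": "value1", "metadata_field2": "value2", "metadata_field3": "value3"}
partial = {"metadata_field1": "value1", "metadata_field2": "value2"}
print(evaluate(full, where_nested), evaluate(partial, where_nested))  # True False
```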
Document Filters¶
Rust: Search API Required
In the Rust client, document filters use the Search API (collection.search()) with the Key::Document builder. The Search API requires Chroma Cloud.
Schema¶
You can use the following JSON schema to validate where_document expressions:
{
"$schema": "https://json-schema.org/draft/2020-12/schema#",
"title": "Chroma Document Filter Schema",
"description": "Schema for Chroma document filters used in where_document clauses",
"type": "object",
"oneOf": [
{
"properties": {
"$contains": {
"type": "string"
}
},
"required": ["$contains"],
"additionalProperties": false
},
{
"properties": {
"$not_contains": {
"type": "string"
}
},
"required": ["$not_contains"],
"additionalProperties": false
},
{
"properties": {
"$regex": {
"type": "string"
}
},
"required": ["$regex"],
"additionalProperties": false
},
{
"properties": {
"$not_regex": {
"type": "string"
}
},
"required": ["$not_regex"],
"additionalProperties": false
},
{
"properties": {
"$and": {
"type": "array",
"items": {"$ref": "#"},
"minItems": 2
}
},
"required": ["$and"],
"additionalProperties": false
},
{
"properties": {
"$or": {
"type": "array",
"items": {"$ref": "#"},
"minItems": 2
}
},
"required": ["$or"],
"additionalProperties": false
}
]
}
Contains ($contains)¶
Case-Sensitive
The $contains document filter performs a case-sensitive full-text search. For example, {"$contains": "Hello"} will not match a document containing only "hello".
Not Contains ($not_contains)¶
Matches documents whose content does not contain the given string.
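The two substring filters can be sketched locally as follows. This is not Chroma's implementation, and matches_document is hypothetical. With the Python client, the dict would be passed as collection.get(where_document=where_document):

```python
# Minimal local sketch of the where_document substring filters.
# Matching is case-sensitive, so "Hello" does not match "hello".
documents = {
    "d1": "Hello, world",
    "d2": "hello, world",
}

where_document = {"$contains": "Hello"}


def matches_document(text: str, where_document: dict) -> bool:
    """Hypothetical helper for a single $contains or $not_contains expression."""
    op, needle = next(iter(where_document.items()))
    if op == "$contains":
        return needle in text
    if op == "$not_contains":
        return needle not in text
    raise ValueError(f"unsupported operator {op}")


print([d for d, t in documents.items() if matches_document(t, where_document)])  # ['d1']
print([d for d, t in documents.items() if matches_document(t, {"$not_contains": "Hello"})])  # ['d2']
```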
Regex ($regex)¶
Matches documents whose content matches the given regular expression pattern.
Not Regex ($not_regex)¶
Matches documents whose content does not match the given regular expression pattern.
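A minimal local sketch of $regex and $not_regex using Python's re module. Two assumptions apply: the pattern may match anywhere in the content, as with re.search, and Chroma, which evaluates patterns server-side, may use a regex dialect that differs from Python's. The matches_regex helper is hypothetical:

```python
import re

documents = {
    "d1": "Order #1234 shipped",
    "d2": "Order pending",
}


def matches_regex(text: str, where_document: dict) -> bool:
    """Hypothetical helper for a single $regex or $not_regex expression."""
    op, pattern = next(iter(where_document.items()))
    # Assumption: a match anywhere in the content counts (re.search, not re.fullmatch).
    found = re.search(pattern, text) is not None
    if op == "$regex":
        return found
    if op == "$not_regex":
        return not found
    raise ValueError(f"unsupported operator {op}")


print([d for d, t in documents.items() if matches_regex(t, {"$regex": r"#\d+"})])  # ['d1']
print([d for d, t in documents.items() if matches_regex(t, {"$not_regex": r"#\d+"})])  # ['d2']
```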
Logical Operator: And ($and)¶
The $and operator matches documents that satisfy all of the listed where_document expressions. Logical operators can be nested.
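Logical Operator: Or ($or)¶
The $or operator matches documents that satisfy at least one of the listed where_document expressions.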
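As an illustration, the following where_document expression nests $or inside $and; the schema above requires at least two items in each list. The recursive evaluator is a hypothetical local sketch that uses plain substring matching in place of Chroma's server-side evaluation:

```python
where_document = {
    "$and": [
        {"$contains": "Chroma"},
        {"$or": [{"$contains": "filter"}, {"$contains": "query"}]},
    ]
}


def evaluate_document(text: str, expr: dict) -> bool:
    """Hypothetical recursive evaluator for $and/$or/$contains/$not_contains."""
    op, arg = next(iter(expr.items()))
    if op == "$and":
        return all(evaluate_document(text, child) for child in arg)
    if op == "$or":
        return any(evaluate_document(text, child) for child in arg)
    if op == "$contains":
        return arg in text
    if op == "$not_contains":
        return arg not in text
    raise ValueError(f"unsupported operator {op}")


docs = ["Chroma filter guide", "Chroma query tips", "Chroma install notes", "filter basics"]
print([t for t in docs if evaluate_document(t, where_document)])
# ['Chroma filter guide', 'Chroma query tips']
```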
Pagination¶
Collection.get() lets you paginate results with the limit and offset parameters.
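A paging loop advances offset by the size of each page until an empty page comes back. The sketch below uses a hypothetical iter_pages helper and a stand-in fake_get so that it runs without a server. With the Python client, passing collection.get (or functools.partial(collection.get, where={...}) for a filtered scan) should work the same way, assuming results come back in a stable order between calls:

```python
from typing import Callable, Iterator


def iter_pages(get: Callable[..., dict], page_size: int = 100) -> Iterator[list]:
    """Yield successive pages of ids, advancing offset by the size of each page.

    Hypothetical helper: get is expected to behave like Collection.get(limit=..., offset=...),
    returning a dict with an "ids" list; an empty page ends the loop.
    """
    offset = 0
    while True:
        page = get(limit=page_size, offset=offset)
        ids = page["ids"]
        if not ids:
            return
        yield ids
        offset += len(ids)


# Stand-in for a collection with seven records, so the sketch runs without a server.
all_ids = [f"id-{i}" for i in range(7)]


def fake_get(limit: int, offset: int) -> dict:
    return {"ids": all_ids[offset:offset + limit]}


print(list(iter_pages(fake_get, page_size=3)))
# [['id-0', 'id-1', 'id-2'], ['id-3', 'id-4', 'id-5'], ['id-6']]
```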
Interactive Filter Playground (Cloud + Local)¶
Use this interactive sandbox to sketch a filter payload before running Chroma. Switch between Cloud and Local tabs to see how the client code changes while the filter shape remains consistent. For nested logic and full schema control, switch either section to JSON mode.
Playground scope: this is a learning aid for composing filter payloads and starter client code; it is not a full schema validator.