Keyword Search¶
This page covers local/OSS keyword filtering for collection get() and query() calls.
Use document filters (where_document) with $contains / $not_contains / $regex / $not_regex, optionally combined with metadata where, when you need semantic retrieval plus lexical constraints.
In Rust, the same behavior is expressed through the unified Where AST (for example Where::Document(...)).
Full-text matching is case-sensitive. For Cloud Search API hybrid ranking/fusion workflows, use the Cloud Search docs.
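Because matching is case-sensitive, a filter for technology will not match Technology. One way to cover a few known casings is to OR several $contains clauses. The helper below is a minimal sketch: contains_any_case is a hypothetical name, not part of any Chroma client, and it only builds a plain filter dictionary.

```python
def contains_any_case(term: str) -> dict:
    """Build a where_document filter that matches common casings of `term`.

    Hypothetical helper (not part of the Chroma client). It returns a plain
    dict built from the $or and $contains operators described above.
    """
    # Deduplicate while preserving order: original, lower, UPPER, Capitalized.
    variants = list(
        dict.fromkeys([term, term.lower(), term.upper(), term.capitalize()])
    )
    # A single variant needs no $or wrapper.
    if len(variants) == 1:
        return {"$contains": variants[0]}
    return {"$or": [{"$contains": v} for v in variants]}
```

The result can be passed as where_document=contains_any_case("technology") to get() or query(). Mixed casings such as tEchnology are not covered by this approach.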
Short Snippets¶
Concept Snippets
The snippets below are intentionally small and may omit surrounding setup. Use the runnable examples for end-to-end scripts.
Rust uses a unified Where AST for both metadata and document filters.
let where_clause = Where::Composite(CompositeExpression {
    operator: BooleanOperator::Or,
    children: vec![
        Where::Document(DocumentExpression {
            operator: DocumentOperator::Contains,
            pattern: "technology".to_string(),
        }),
        Where::Document(DocumentExpression {
            operator: DocumentOperator::Contains,
            pattern: "freak".to_string(),
        }),
    ],
});
let results = collection
    .query(
        vec![vec![0.15, 0.85, 0.25]],
        Some(3),
        Some(where_clause),
        None,
        None,
    )
    .await?;
Regex Filters¶
Use regex operators when substring matching is not expressive enough.
TypeScript:
const regexResults = await collection.query({
  queryEmbeddings: [[0.15, 0.85, 0.25]],
  nResults: 3,
  whereDocument: { $regex: "technology.*pace" },
});
const notRegexResults = await collection.query({
  queryEmbeddings: [[0.15, 0.85, 0.25]],
  nResults: 3,
  whereDocument: { $not_regex: "Innovation.*topic" },
});
Go:
regexResults, err := collection.Query(ctx,
	chroma.WithQueryEmbeddings(
		embeddings.NewEmbeddingFromFloat32([]float32{0.15, 0.85, 0.25}),
	),
	chroma.WithNResults(3),
	chroma.WithWhereDocument(chroma.Regex("technology.*pace")),
)
notRegexResults, err := collection.Query(ctx,
	chroma.WithQueryEmbeddings(
		embeddings.NewEmbeddingFromFloat32([]float32{0.15, 0.85, 0.25}),
	),
	chroma.WithNResults(3),
	chroma.WithWhereDocument(chroma.NotRegex("Innovation.*topic")),
)
Rust:
let regex_results = collection
    .query(
        vec![vec![0.15, 0.85, 0.25]],
        Some(3),
        Some(Where::Document(DocumentExpression {
            operator: DocumentOperator::Regex,
            pattern: "technology.*pace".to_string(),
        })),
        None,
        None,
    )
    .await?;
let not_regex_results = collection
    .query(
        vec![vec![0.15, 0.85, 0.25]],
        Some(3),
        Some(Where::Document(DocumentExpression {
            operator: DocumentOperator::NotRegex,
            pattern: "Innovation.*topic".to_string(),
        })),
        None,
        None,
    )
    .await?;
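For comparison, the same two regex queries might look like this with the Python client. This is a sketch under assumptions: collection is taken to be an existing Chroma collection holding 3-dimensional embeddings like the ones used above, and regex_queries is a hypothetical wrapper name chosen only for illustration.

```python
def regex_queries(collection):
    """Run one $regex and one $not_regex query against `collection`.

    `collection` is assumed to be an existing Chroma collection object;
    this wrapper function itself is hypothetical.
    """
    embedding = [[0.15, 0.85, 0.25]]
    # Keep only documents whose text matches the pattern.
    regex_results = collection.query(
        query_embeddings=embedding,
        n_results=3,
        where_document={"$regex": "technology.*pace"},
    )
    # Keep only documents whose text does NOT match the pattern.
    not_regex_results = collection.query(
        query_embeddings=embedding,
        n_results=3,
        where_document={"$not_regex": "Innovation.*topic"},
    )
    return regex_results, not_regex_results
```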
Composing Complex Queries¶
Use this canonical filter shape when you need semantic retrieval with richer document-text constraints:
{
  "where_document": {
    "$and": [
      {
        "$or": [
          { "$contains": "technology" },
          { "$regex": "\\bAI\\b" }
        ]
      },
      {
        "$or": [
          { "$contains": "LLM" },
          { "$regex": "(GPU|CUDA)" }
        ]
      },
      { "$not_contains": "deprecated" },
      { "$not_regex": "(draft|obsolete)" }
    ]
  }
}
This expresses:
- technology substring OR exact-word AI regex.
- LLM substring OR GPU/CUDA regex.
- Excludes documents containing deprecated.
- Excludes documents matching draft or obsolete patterns.
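To sanity-check how the four clauses combine, the filter can be evaluated locally against a few sample strings. The evaluator below is an illustrative sketch only, not Chroma's implementation: matches is a hypothetical function, the sample documents are made up, and it relies on Python's re module, whose regex dialect may differ from the engine Chroma uses.

```python
import re


def matches(doc: str, flt: dict) -> bool:
    """Evaluate a where_document-style filter against one string, locally."""
    [(op, arg)] = flt.items()  # each filter node carries exactly one operator
    if op == "$and":
        return all(matches(doc, child) for child in arg)
    if op == "$or":
        return any(matches(doc, child) for child in arg)
    if op == "$contains":
        return arg in doc  # case-sensitive substring test
    if op == "$not_contains":
        return arg not in doc
    if op == "$regex":
        return re.search(arg, doc) is not None
    if op == "$not_regex":
        return re.search(arg, doc) is None
    raise ValueError(f"unsupported operator: {op}")


where_document = {
    "$and": [
        {"$or": [{"$contains": "technology"}, {"$regex": r"\bAI\b"}]},
        {"$or": [{"$contains": "LLM"}, {"$regex": "(GPU|CUDA)"}]},
        {"$not_contains": "deprecated"},
        {"$not_regex": "(draft|obsolete)"},
    ]
}

docs = [
    "AI workloads on GPU clusters",     # satisfies all four clauses
    "LLM technology overview (draft)",  # rejected by the $not_regex clause
    "PAID plans for CUDA users",        # "AI" is not a whole word here
]
kept = [d for d in docs if matches(d, where_document)]
```

Only the first sample string survives all four clauses, so kept holds a single entry.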
Client mapping:
- Python: pass this object as where_document=....
- TypeScript: pass as whereDocument.
- Go: map to chroma.WithWhereDocument(...).
- Rust: express this with document nodes in the unified Where tree.
Hints:
- If you also pass metadata where, it is combined with where_document using an implicit AND.
- Use nested $and/$or in either clause to model richer logic.
- For debugging candidate selection, run get(where=..., where_document=..., include=[]) first to inspect matching IDs before ranking.
- Keep include narrow to reduce payload size and response time.
- Full-text operators include contains/not-contains and regex/not-regex. In Rust these map to DocumentOperator::{Contains, NotContains, Regex, NotRegex}.
- Prefer anchored and specific regex patterns to avoid broad scans.
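The debugging hint above can be wrapped in a small helper. This is a sketch, assuming collection is an existing Chroma collection from the Python client; matching_ids is a hypothetical name introduced here for illustration.

```python
def matching_ids(collection, where=None, where_document=None):
    """Return the IDs that pass the given filters, with no ranking applied.

    `collection` is assumed to be an existing Chroma collection; this helper
    is hypothetical and simply forwards to its get() method.
    """
    # include=[] requests no documents, metadatas, or embeddings,
    # so the response carries little beyond the matching IDs.
    result = collection.get(
        where=where,
        where_document=where_document,
        include=[],
    )
    return result["ids"]
```

Comparing these IDs with the IDs returned by a later query() call using the same filters shows whether a missing result was dropped by the filters or ranked out by the embedding search.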
Core References¶
- Filters (where and where_document operators)
- Collections (query result shape, include, ID-constrained query)
- Concepts (search stages and query flow)
- Advanced Queries (query-stage semantics and tradeoffs)
- Official Full Text Search docs
Full Runnable Examples¶
All runnable examples assume a local Chroma server: