Skip to content

Maintenance

This section describes maintenance tooling and procedures for running your Chroma database.

Chroma Ops (Tooling)

Chroma Ops is a maintenance CLI for Chroma. It provides a set of commands for inspecting, configuring and improving the performance of your Chroma database.

Use Cases

Chroma Ops is designed to help you maintain a healthy Chroma database. It can also be used for inspecting the state of your database. The following use cases are supported:

  • πŸ“¦ Database Maintenance
  • db info - gathers general information about your Chroma persistent database
  • db clean - cleans up the database from unused files (for now only orphanated HNSW segment directories)
  • πŸ“ Write-Ahead Log (WAL) Maintenance
  • wal info - gathers information about the Write-Ahead Log (WAL)
  • wal commit - commits the WAL to all collections with outstanding changes
  • wal export - exports the WAL to a jsonl file. This can be used for debugging and for auditing.
  • wal config - allows you to configure the WAL for your Chroma database.
  • wal clean - cleans up the WAL from old, committed transactions.
  • πŸ” Full Text Search (FTS) Maintenance
  • fts rebuild - rebuilds the FTS index for all collections or change the tokenizer.
  • 🧬 Vector Index (HNSW) Maintenance
  • hnsw info - gathers information about the HNSW index for a given collection
  • hnsw rebuild - rebuilds the HNSW index for a given collection and allows the modification of otherwise immutable (construction-only) parameters. Useful command to keep your HNSW index healthy and prevent fragmentation.
  • hnsw config - allows you to configure the HNSW index for your Chroma database.
  • πŸ“Έ Collection Maintenance
  • collection snapshot - creates a snapshot of a collection. The snapshots are self-contained and are meant to be used for backup and restore.

Need help/Need more?

If you need help or need more features, please join the Discord server and let us know. Or just do a pull request on GitHub.

Installation

Chroma Ops can be installed using pip:

pip install --upgrade chromadb-ops

Usage

Database Maintenance

Database Info

What it does: Gathers general information about your Chroma persistent database (works only for local persistent databases).

Why it's useful: Run this command to better understand the current state of your database. It can provide you with invaluable information about any potential issues and also helps us help you in debugging issues.

chops db info /path/to/persist_dir

Options:

  • --skip-collection-names (-s) - to skip specific collections
  • --privacy-mode (-p) - privacy mode hides paths and collection names so that the output can be shared without exposing sensitive information

When sharing larger outputs consider storing the output in a file:

chops db info /path/to/persist_dir -p > chroma_info.txt

Example output:

chops db info smallc

                                 General Info
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┓
┃                    Property ┃ Value                                          ┃
┑━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┩
β”‚              Chroma Version β”‚ 0.5.5                                          β”‚
β”‚        Number of Collection β”‚ 1                                              β”‚
β”‚           Persist Directory β”‚ /tmp/tmp9l3ceuvp                               β”‚
β”‚      Persist Directory Size β”‚ 142.2MiB                                       β”‚
β”‚              SystemDB size: β”‚ 81.6MiB (/tmp/tmp9l3ceuvp/chroma.sqlite3)      β”‚
β”‚     Orphan HNSW Directories β”‚ []                                             β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”΄β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜
───────────────────────────────── Collections ──────────────────────────────────
───────────────────────────────────── test ─────────────────────────────────────
                             'test' Collection Data
┏━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┓
┃         Table Data ┃ Value                                                   ┃
┑━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┩
β”‚                 ID β”‚ 9e80e4fd-fd4b-47b8-810c-e8ffa57c1912                    β”‚
β”‚               Name β”‚ test                                                    β”‚
β”‚           Metadata β”‚ None                                                    β”‚
β”‚          Dimension β”‚ 1536                                                    β”‚
β”‚             Tenant β”‚ default_tenant                                          β”‚
β”‚           Database β”‚ default_database                                        β”‚
β”‚            Records β”‚ 10,000                                                  β”‚
β”‚        WAL Entries β”‚ 10,000                                                  β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”΄β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜
─────────────────────────────────── Segments ───────────────────────────────────
                            Metadata Segment (test)
┏━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┓
┃                Property ┃ Value                                              ┃
┑━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┩
β”‚              Segment ID β”‚ 832fa2cd-6c40-4eee-ad7d-35f260acaaaa               β”‚
β”‚                    Type β”‚ urn:chroma:segment/metadata/sqlite                 β”‚
β”‚                   Scope β”‚ METADATA                                           β”‚
β”‚        SysDB Max Seq ID β”‚ 10,000                                             β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”΄β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜
                              HNSW Segment (test)
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┓
┃                     Property ┃ Value                                         ┃
┑━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┩
β”‚                   Segment ID β”‚ 13609103-d317-4556-a744-008c96229b72          β”‚
β”‚                         Type β”‚ urn:chroma:segment/vector/hnsw-local-persist… β”‚
β”‚                        Scope β”‚ VECTOR                                        β”‚
β”‚                         Path β”‚ /tmp/tmp9l3ceuvp/13609103-d317-4556-a744-008… β”‚
β”‚             SysDB Max Seq ID β”‚ 0                                             β”‚
β”‚                HNSW Dir Size β”‚ 60.6MiB                                       β”‚
β”‚     HNSW Metadata Max Seq ID β”‚ 10,000                                        β”‚
β”‚   HNSW Metadata Total Labels β”‚ 10,000                                        β”‚
β”‚                      WAL Gap β”‚ 0                                             β”‚
β”‚ HNSW Raw Total Active Labels β”‚ 10,000                                        β”‚
β”‚    HNSW Raw Allocated Labels β”‚ 10,000                                        β”‚
β”‚           HNSW Orphan Labels β”‚ set()                                         β”‚
β”‚          Fragmentation Level β”‚ 0.0                                           β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”΄β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜

⚠️ Interesting things to look for:

  • Fragmentation Level - the higher the value the more unnecessary memory and performance hits your HNSW index suffers. It needs to be rebuilt.
  • Orphan HNSW Directories - these are directories that are not associated with any collection. They can be safely deleted.
  • WAL Entries - high values usually means that you need prune your WAL. Use either this tool or the official Chroma CLI.
  • HNSW Orphan Labels - this must always be empty set, if you see anything else report it in Discord @taz.

How to Read the output

General Info

This section presents general Chroma persistent dir info.

  • Chroma Version - the currently installed Chroma version.
  • Number of Collection - the number of collections in the persistent dir.
  • Persist Directory - the path to the persistent dir (if privacy mode is off).
  • Persist Directory Size - the size of the persistent dir.
  • SystemDB size - the size of the system database (if privacy mode is off the full path to the sqlite3 file is shown).
  • Orphan HNSW Directories - a list of orphan HNSW directories. These directories are present in the persistent dir but are not associated with any collection.

Collections

  • ID - the collection ID.
  • Name - the collection name.
  • Metadata - the metadata associated with the collection.
  • Dimension - the dimension of the embeddings in the collection. (this can be None in case no vectors are present and the collection is newly created).
  • Tenant - the tenant of the collection.
  • Database - the database of the collection.
  • Records - the number of records in the collection.
  • WAL Entries - the number of WAL entries in the collection (as of 0.5.5 for new instances Chroma will clean WAL for each collection periodically).

Metadata Segment

  • Segment ID - the segment ID.
  • Type - the segment type.
  • Scope - the segment scope.
  • SysDB Max Seq ID - the maximum sequence ID in the system database.

HNSW Segment

  • Segment ID - the segment ID.
  • Type - the segment type.
  • Scope - the segment scope.
  • Path - the path to the HNSW directory.
  • SysDB Max Seq ID - the maximum sequence ID in the system database.
  • HNSW Dir Size - the size of the HNSW directory.
  • HNSW Metadata Max Seq ID - the maximum sequence ID in the HNSW metadata.
  • HNSW Metadata Total Labels - the total number of labels in the HNSW metadata.
  • WAL Gap - the difference between the maximum sequence ID in the system database and the maximum sequence ID in the HNSW metadata. The gap usually represents the number of WAL entries that are not committed to the HNSW index.
  • HNSW Raw Total Active Labels - the total number of active labels in the HNSW index.
  • HNSW Raw Allocated Labels - the total number of allocated labels in the HNSW index.
  • HNSW Orphan Labels - a set of orphan labels in the HNSW index. These are labels in the HNSW index that are not visible to Chroma as they are not part of the metadata. This set should always be empty, if not please report it!!!
  • Fragmentation Level - the fragmentation level of the HNSW index.
Database Clean

What it does: Cleans up the database from unused files. It will remove all orphanated HNSW segment directories.

Why it's useful: Orphanated HNSW segment directories sometimes are the byproduct of a filesystem failure to remove the HNSW segment directory, most commonly encountered on Windows systems, but any type of file loocking or disk operation failure can cause Chroma to leave behind these directories.

chops db clean /path/to/persist_dir

Supported options are:

  • --dry-run (-d) - to see what would be deleted without actually deleting anything.

Example output:

chops db clean smallc
ChromaDB version: 0.6.2
Cleaning up orphanated segment dirs...

                             Orphanated HNSW segment dirs                             
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┓
┃ Segment ID                           ┃ Path                                        ┃
┑━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┩
β”‚ 2E9021A8-A767-4339-B2C2-2F4B22C05F1D β”‚ smallc/2E9021A8-A767-4339-B2C2-2F4B22C05F1D β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”΄β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜

Are you sure you want to delete these segment dirs? [y/N]: 

WAL Maintenance

WAL Info

What it does: Gathers information about the Write-Ahead Log (WAL).

Why it's useful: Run this command to better understand the current state of the Write-Ahead Log (WAL). It can provide you with invaluable information about any potential issues and also helps us help you in debugging issues.

chops wal info /path/to/persist_dir

Example output:

chops wal info smallc
ChromaDB version: 0.6.2

WAL config is set to: auto purge.
                                         WAL Info                                         
┏━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━┓
┃ Collection ┃ Topic                                                             ┃ Count ┃
┑━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━┩
β”‚ test       β”‚ persistent://default/default/97f5234e-d02a-43b8-9909-99447950c949 β”‚ 20    β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”΄β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”΄β”€β”€β”€β”€β”€β”€β”€β”˜
WAL Export

What it does: Exports the Write-Ahead Log (WAL) to a jsonl file. This can be used for debugging and for auditing.

Why it's useful: This command is useful for exporting the Write-Ahead Log (WAL) to a jsonl file. This can be used for debugging and for auditing.

chops wal export /path/to/persist_dir

Example output:

chops wal export smallc --out wal.jsonl
ChromaDB version: 0.6.2
       Exporting WAL        
┏━━━━━━━━━━━━┳━━━━━━━━━━━━━┓
┃ Collection ┃ WAL Entries ┃
┑━━━━━━━━━━━━╇━━━━━━━━━━━━━┩
β”‚ test       β”‚ 20          β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”΄β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜

Are you sure you want to export the WAL? [y/N]: y
Exported 20 rows
WAL Commit

What it does: Commits the Write-Ahead Log (WAL) to all collections with outstanding changes.

Why it's useful: This command is useful for committing the Write-Ahead Log (WAL) to all collections with outstanding changes.

chops wal commit /path/to/persist_dir

Options:

  • --skip (-s) - skip certain collections by running chops wal commit /path/to/persist_dir --skip <collection_name>
  • --yes (-y) - skip confirmation prompt (default: False, prompt will be shown)

Example output:

chops wal commit smallc
ChromaDB version: 0.6.2
     WAL Commit Summary     
┏━━━━━━━━━━━━┳━━━━━━━━━━━━━┓
┃ Collection ┃ WAL Entries ┃
┑━━━━━━━━━━━━╇━━━━━━━━━━━━━┩
β”‚ test       β”‚ 20          β”‚
β”‚ test1      β”‚ 0           β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”΄β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜
   Skipped    
 Collections  
┏━━━━━━━━━━━━┓
┃ Collection ┃
┑━━━━━━━━━━━━┩
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜

Are you sure you want to commit the WAL in smallc? As part of the WAL commit action your database will be migrated to currently installed version 0.6.2. [y/N]: y
Processing index for collection test (0137d64b-8d71-42f5-b0d9-28716647b068) - total vectors in index 20
WAL commit completed.
WAL Clean

What it does: Cleans up the Write-Ahead Log (WAL) from committed transactions. Recent Chroma version automatically prune the WAL so this is not needed unless you have older version of Chroma or disabled automatic WAL pruning.

Why it's useful: Keep your WAL in check so it doesn't grow too large (in case automatic WAL pruning is disabled).

chops wal clean /path/to/persist_dir

Options:

  • --yes (-y) - skip confirmation prompt (default: False, prompt will be shown)

Example output:

chops wal clean smallc                                                                                                                                                                                                                                                                        11:33:36  ☁  main β˜‚ ⚑ ✭
ChromaDB version: 0.6.2
Size before: 429596

Are you sure you want to clean up the WAL in smallc? This action will delete all WAL entries that are not committed to the HNSW index. [y/N]: y
Cleaning up WAL
WAL cleaned up. Size after: 388636
WAL Configuration

What it does: Configures the Write-Ahead Log (WAL) for your Chroma database.

Why it's useful: This command is useful for configuring the Write-Ahead Log (WAL) for your Chroma database.

chops wal config /path/to/persist_dir --purge off

Options:

  • --purge option can be set to auto (automatically purge the WAL when the number of records in the collection exceeds the number of records in the WAL) or off (disable automatic purge of the WAL). Automatic WAL purge is enabled by default. The automatic purge keeps your slite3 file smaller and faster, but it makes it hard or impossible to restore Chroma.
  • --yes option can be set to true (skip confirmation prompt) or false (show confirmation prompt). The default is false.

Example output:

chops wal config smallc --purge off
ChromaDB version: 0.6.2
                           Current WAL config                            
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━┓
┃ Config key                                ┃ Config Change             ┃
┑━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━┩
β”‚ Automatically purge (automatically_purge) β”‚ True (old) -> False (new) β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”΄β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜

Are you sure you want to update the WAL config? [y/N]: y
WAL config updated successfully!

Full Text Search (FTS) Maintenance

FTS Rebuild

What it does: Rebuilds the Full Text Search (FTS) index for all collections.

Why it's useful: This command is useful for rebuilding the Full Text Search (FTS) index for all collections.

chops fts rebuild /path/to/persist_dir

Additional options:

  • --yes (-y) - skip confirmation prompt (default: False, prompt will be shown)
  • --tokenizer (-t) - the tokenizer to use for the index.

Example output:

chops fts rebuild --tokenizer unicode61 smallc
ChromaDB version: 0.6.2

Are you sure you want to rebuild the FTS index in smallc? This action will drop the existing FTS index and create a new one. [y/N]: y
Rebuilt FTS. Will try to start your Chroma now.
NOTE: Depending on the size of your documents in Chroma it may take a while for Chroma to start up again.
Chroma started successfully. FTS rebuilt.

HNSW Maintenance

HNSW Info

What it does: Gathers information about the HNSW index for a given collection.

Why it's useful: This command is useful for gathering information about the HNSW index for a given collection.

chops hnsw info /path/to/persist_dir

Additional options:

  • --collection (-c) - the collection name
  • --verbose (-v) - If specified, the HNSW index will be loaded for more accurate fragmentation level reporting.

Example output:

chops hnsw info smallc -c test
ChromaDB version: 0.6.2
    HNSW details for collection test in default_database database    
┏━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┓
┃ Metric              ┃ Value                                       ┃
┑━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┩
β”‚ Space               β”‚ cosine                                      β”‚
β”‚ Dimensions          β”‚ 384                                         β”‚
β”‚ EF Construction     β”‚ 200                                         β”‚
β”‚ EF Search           β”‚ 100                                         β”‚
β”‚ M                   β”‚ 64                                          β”‚
β”‚ Number of threads   β”‚ 16                                          β”‚
β”‚ Resize factor       β”‚ 1.2                                         β”‚
β”‚ Batch size          β”‚ 100                                         β”‚
β”‚ Sync threshold      β”‚ 1000                                        β”‚
β”‚ Segment ID          β”‚ 0137d64b-8d71-42f5-b0d9-28716647b068        β”‚
β”‚ Path                β”‚ smallc/0137d64b-8d71-42f5-b0d9-28716647b068 β”‚
β”‚ Has metadata        β”‚ True                                        β”‚
β”‚ Number of elements  β”‚ 20                                          β”‚
β”‚ Collection ID       β”‚ 97f5234e-d02a-43b8-9909-99447950c949        β”‚
β”‚ Index size          β”‚ 41.6KiB                                     β”‚
β”‚ Fragmentation level β”‚ 0.00% (estimated)                           β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”΄β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜
HNSW Rebuild

What it does: Rebuilds the HNSW index for a given collection and allows the modification of otherwise immutable (construction-only) parameters.

Why it's useful: This command is useful for rebuilding the HNSW index for a given collection and allows the modification of otherwise immutable (construction-only) parameters.

chops hnsw rebuild /path/to/persist_dir

Options:

  • --backup (-b) - backup the old index. At the end of the rebuild process the location of the backed up index will be printed out. (default: True)
  • --database (-d) - the database name (default: default_database)
  • --yes (-y) - skip confirmation prompt (default: False, prompt will be shown)
  • --space (-s) - the distance metric to use for the index.
  • --construction-ef (-c) - the construction ef to use for the index.
  • --search-ef (-e) - the search ef to use for the index.
  • --m (-m) - the m to use for the index.
  • --num-threads (-t) - the number of threads to use for the index.
  • --resize-factor (-r) - the resize factor to use for the index.
  • --batch-size (-b) - the batch size to use for the index.
  • --sync-threshold (-s) - the sync threshold to use for the index.

Unchanged options will be skipped

All the HNSW index options default to None which means no changes will be made if the parameter is not specified. Additionally, any options provided that are identical to the current index configuration will be skipped.

Example output:

chops hnsw rebuild smallc -c test --m 64 --construction-ef 200
ChromaDB version: 0.6.2
    HNSW details for collection test in default_database database    
┏━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┓
┃ Metric              ┃ Value                                       ┃
┑━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┩
β”‚ Space               β”‚ cosine                                      β”‚
β”‚ Dimensions          β”‚ 384                                         β”‚
β”‚ EF Construction     β”‚ 200                                         β”‚
β”‚ EF Search           β”‚ 100                                         β”‚
β”‚ M                   β”‚ 64                                          β”‚
β”‚ Number of threads   β”‚ 16                                          β”‚
β”‚ Resize factor       β”‚ 1.2                                         β”‚
β”‚ Batch size          β”‚ 100                                         β”‚
β”‚ Sync threshold      β”‚ 1000                                        β”‚
β”‚ Segment ID          β”‚ 0137d64b-8d71-42f5-b0d9-28716647b068        β”‚
β”‚ Path                β”‚ smallc/0137d64b-8d71-42f5-b0d9-28716647b068 β”‚
β”‚ Has metadata        β”‚ True                                        β”‚
β”‚ Number of elements  β”‚ 20                                          β”‚
β”‚ Collection ID       β”‚ 97f5234e-d02a-43b8-9909-99447950c949        β”‚
β”‚ Index size          β”‚ 47.6KiB                                     β”‚
β”‚ Fragmentation level β”‚ 0.00% (estimated)                           β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”΄β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜
    HNSW segment config changes     
┏━━━━━━━━━━━━━━━━━━━━━━┳━━━━━┳━━━━━┓
┃ Config Key           ┃ Old ┃ New ┃
┑━━━━━━━━━━━━━━━━━━━━━━╇━━━━━╇━━━━━┩
β”‚ hnsw:construction_ef β”‚ 100 β”‚ 200 β”‚
β”‚ hnsw:M               β”‚ 102 β”‚ 64  β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”΄β”€β”€β”€β”€β”€β”΄β”€β”€β”€β”€β”€β”˜

Are you sure you want to rebuild this index? [y/N]: y
Backup of old index created at smallc/0137d64b-8d71-42f5-b0d9-28716647b068_backup_20250208100514
    HNSW details for collection test in default_database database    
┏━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┓
┃ Metric              ┃ Value                                       ┃
┑━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┩
β”‚ Space               β”‚ cosine                                      β”‚
β”‚ Dimensions          β”‚ 384                                         β”‚
β”‚ EF Construction     β”‚ 200                                         β”‚
β”‚ EF Search           β”‚ 100                                         β”‚
β”‚ M                   β”‚ 64                                          β”‚
β”‚ Number of threads   β”‚ 16                                          β”‚
β”‚ Resize factor       β”‚ 1.2                                         β”‚
β”‚ Batch size          β”‚ 100                                         β”‚
β”‚ Sync threshold      β”‚ 1000                                        β”‚
β”‚ Segment ID          β”‚ 0137d64b-8d71-42f5-b0d9-28716647b068        β”‚
β”‚ Path                β”‚ smallc/0137d64b-8d71-42f5-b0d9-28716647b068 β”‚
β”‚ Has metadata        β”‚ True                                        β”‚
β”‚ Number of elements  β”‚ 20                                          β”‚
β”‚ Collection ID       β”‚ 97f5234e-d02a-43b8-9909-99447950c949        β”‚
β”‚ Index size          β”‚ 41.6KiB                                     β”‚
β”‚ Fragmentation level β”‚ 0.00%                                       β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”΄β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜
HNSW Configuration

What it does: Configures the HNSW index for your Chroma database.

Why it's useful: This command is useful for configuring the HNSW index for your Chroma database.

chops hnsw config /path/to/persist_dir --collection <collection_name>

Options:

  • --search-ef (-e) - the search ef to use for the index.
  • --num-threads (-t) - the number of threads to use for the index.
  • --resize-factor (-r) - the resize factor to use for the index.
  • --batch-size (-b) - the batch size to use for the index.
  • --sync-threshold (-s) - the sync threshold to use for the index.

Example output:

chops hnsw config smallc -c test --search-ef 100
ChromaDB version: 0.6.2
 HNSW segment config changes  
┏━━━━━━━━━━━━━━━━┳━━━━━┳━━━━━┓
┃ Config Key     ┃ Old ┃ New ┃
┑━━━━━━━━━━━━━━━━╇━━━━━╇━━━━━┩
β”‚ hnsw:search_ef β”‚ 110 β”‚ 100 β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”΄β”€β”€β”€β”€β”€β”΄β”€β”€β”€β”€β”€β”˜

Are you sure you want to apply these changes? [y/N]: y
HNSW index configuration modified successfully

Collection Maintenance

Collection Snapshot

What it does: Creates a snapshot of a collection. The snapshots are self-contained sqlite3 files.

Why it's useful: The command is useful if you want to create a backup or a point-in-time copy of a collection in its entirety. The snapshot files are self-contained and use sqlite3 as a storage engine. You can use sqlite3 commands to inspect the snapshot files.

chops collection snapshot /path/to/persist_dir --collection <collection_name> -o /path/to/snapshot.sqlite3

Additional options:

  • --yes (-y) - skip confirmation prompt (default: False, prompt will be shown)
  • --collection (-c) - the collection name
  • --output (-o) - the path to the output snapshot file

Example output:

chops collection snapshot ./smallc --collection test -o snapshot.sqlite3
ChromaDB version: 0.6.2

Are you sure you want to overwrite /Users/tazarov/experiments/chroma/chromadb-ops/snapshot.sqlite3 file? [y/N]: y
Bootstrapping snapshot database...
Snapshot database bootstrapped in /Users/tazarov/experiments/chroma/chromadb-ops/snapshot.sqlite3
Copying collection test to snapshot database...
  Copying collection to snapshot   
            database...            
┏━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━┓
┃ Table                   ┃ Count ┃
┑━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━┩
β”‚ Embeddings Queue        β”‚ 20    β”‚
β”‚ Max Seq ID              β”‚ 1     β”‚
β”‚ Embeddings              β”‚ 20    β”‚
β”‚ Embedding Metadata      β”‚ 20    β”‚
β”‚ Segments                β”‚ 2     β”‚
β”‚ Segment Metadata        β”‚ 3     β”‚
β”‚ Collections             β”‚ 1     β”‚
β”‚ Collection Metadata     β”‚ 0     β”‚
β”‚ HNSW Segment Data Files β”‚ 5     β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”΄β”€β”€β”€β”€β”€β”€β”€β”˜

Are you sure you want to copy this collection to the snapshot database? [y/N]: y
Collection test copied to snapshot database in /Users/tazarov/experiments/chroma/chromadb-ops/snapshot.sqlite3