How to persist Chroma database in python
Quick answer
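To persist a Chroma database in Python, create the client with chromadb.PersistentClient(path=...) pointing to a local folder. On chromadb 0.4 and later, changes are saved to that folder automatically, so there is no separate save step. To reload the data later, create a PersistentClient with the same path. Releases before 0.4 used chromadb.Client(Settings(chroma_db_impl="duckdb+parquet", persist_directory=...)) together with an explicit client.persist() call instead.
Prerequisites
- Python 3.8+
- pip install chromadb
- Basic knowledge of vector embeddings and RAG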
Setup
Install the chromadb Python package. Local persistence requires no API key; you only need one if you generate embeddings with a hosted provider.
pip install chromadb
Output (abridged):
Collecting chromadb
 Downloading chromadb-0.4.0-py3-none-any.whl (50 kB)
Installing collected packages: chromadb
Successfully installed chromadb-0.4.0
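Because the persistence API changed in chromadb 0.4, it can help to check which version is installed before choosing between PersistentClient and the legacy client.persist() flow. Below is a minimal sketch using only the standard library. The helper names are hypothetical, and the 0.4 cutoff assumes the version boundary described in Chroma's 0.4 migration notes.

```python
import re
from importlib.metadata import PackageNotFoundError, version
from typing import Optional, Tuple


def parse_major_minor(version_string: str) -> Tuple[int, int]:
    """Extract (major, minor) from a version string such as '0.4.0' or '1.0.12'."""
    match = re.match(r"(\d+)\.(\d+)", version_string)
    if match is None:
        raise ValueError(f"Unrecognized version string: {version_string!r}")
    return int(match.group(1)), int(match.group(2))


def has_automatic_persistence(version_string: str) -> bool:
    """True for chromadb 0.4+, where PersistentClient saves writes automatically (assumed cutoff)."""
    return parse_major_minor(version_string) >= (0, 4)


def installed_chromadb_version() -> Optional[str]:
    """Return the installed chromadb version string, or None if chromadb is not installed."""
    try:
        return version("chromadb")
    except PackageNotFoundError:
        return None


if __name__ == "__main__":
    installed = installed_chromadb_version()
    if installed is None:
        print("chromadb is not installed")
    elif has_automatic_persistence(installed):
        print(f"chromadb {installed}: use chromadb.PersistentClient(path=...)")
    else:
        print(f"chromadb {installed}: legacy API, call client.persist() after writes")
```

The version is read from package metadata rather than by importing chromadb, so the check works even if the import itself fails.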
Step by step
This example creates a persistent Chroma client, adds documents, and reloads the data from the same directory. It uses the chromadb 0.4+ API, where data is written to disk automatically.
import chromadb
# Create a client that stores its data in ./chroma_db (chromadb 0.4+).
# Writes are saved to this directory automatically; no persist() call is needed.
client = chromadb.PersistentClient(path="./chroma_db")
# Create or get a collection
collection = client.get_or_create_collection(name="my_collection")
# Add documents with precomputed embeddings (dummy 3-dimensional vectors).
# upsert() instead of add() keeps the script safe to re-run with the same IDs.
collection.upsert(
    documents=["Hello world", "Chroma persistence example"],
    embeddings=[[0.1, 0.2, 0.3], [0.4, 0.5, 0.6]],
    ids=["doc1", "doc2"],
)
# Later (for example, in a new Python process), reopen the same directory
client_reloaded = chromadb.PersistentClient(path="./chroma_db")
collection_reloaded = client_reloaded.get_collection(name="my_collection")
# Query to verify that the data is still there
results = collection_reloaded.query(query_embeddings=[[0.1, 0.2, 0.3]], n_results=1)
print(results)
Output (abridged; the exact set of keys depends on the chromadb version):
{'ids': [['doc1']], 'distances': [[0.0]], 'documents': [['Hello world']], ...}
Common variations
- Use different embedding models to generate vectors before adding to Chroma.
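- Run Chroma as a server (for example, chroma run --path ./chroma_db) and connect to it with chromadb.HttpClient instead of embedding the database in your process.
- When integrating with an async framework, connect to a Chroma server through the async client (chromadb.AsyncHttpClient in recent releases).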
Troubleshooting
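- If data is missing after a restart, make sure the client was created with chromadb.PersistentClient(path=...) (on 0.4+, a plain chromadb.Client() is in-memory only) and that the directory is writable.
- If collections are missing on reload, check that the path matches exactly. A relative path such as ./chroma_db resolves against the current working directory, so running the script from another folder opens a different, empty database.
- Check the installed chromadb version. Release 0.4 moved the on-disk format from DuckDB/Parquet to SQLite, so a directory written by 0.3.x cannot be opened directly by 0.4+ without migrating it.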
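The writability and path checks above can be combined into a small helper that runs before the client is created. This is a standard-library sketch; prepare_persist_directory() is a hypothetical name, and you would pass its return value as the path argument to chromadb.PersistentClient.

```python
import os
from pathlib import Path


def prepare_persist_directory(path: str) -> str:
    """Create the directory if needed, confirm it is writable, and return its absolute path.

    Relative paths such as "./chroma_db" resolve against the current working
    directory, so printing the absolute path makes an unexpected location visible.
    """
    directory = Path(path).expanduser().resolve()
    directory.mkdir(parents=True, exist_ok=True)
    if not os.access(directory, os.W_OK):
        raise PermissionError(f"Persist directory is not writable: {directory}")
    return str(directory)


if __name__ == "__main__":
    resolved = prepare_persist_directory("./chroma_db")
    print(f"Chroma data will be stored in: {resolved}")
```

Logging the resolved path once at startup is usually enough to catch the "empty database after reload" case caused by a different working directory.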
Key Takeaways
- Create the client with chromadb.PersistentClient(path=...) to enable persistence (on chromadb < 0.4, use chromadb.Client(Settings(chroma_db_impl="duckdb+parquet", persist_directory=...))).
- On chromadb 0.4+, changes are saved automatically; only the legacy client needs an explicit client.persist() call after adding or modifying data.
- Reload the database by creating a client with the same path.
- Ensure directory permissions and consistent paths to avoid data loss or loading errors.