How to use Chroma DB in Python
Direct answer
Use the chromadb Python client to create a collection, add documents with embeddings, and query them for retrieval-augmented generation workflows.
Setup
Install
pip install chromadb openai
Env vars
OPENAI_API_KEY
Imports
import os
import chromadb
from openai import OpenAI
Examples
in: Add documents about AI and query for 'What is RAG?'
out: Returns relevant documents explaining Retrieval-Augmented Generation.
in: Insert product descriptions and query 'Best laptops under $1000'
out: Returns documents describing laptops priced under $1000.
in: Add empty documents and query 'Hello'
out: Returns no results or an empty list.
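The empty-document case is easiest to handle before anything reaches the collection. The following is a minimal sketch, assuming a hypothetical helper named `prepare_documents` (it is not part of chromadb) that drops empty or whitespace-only strings and assigns ids to what remains:

```python
def prepare_documents(raw_docs, id_prefix="doc"):
    """Drop empty or whitespace-only documents and pair the rest with ids.

    Hypothetical helper for illustration; not part of the chromadb API.
    """
    cleaned = [doc.strip() for doc in raw_docs if doc and doc.strip()]
    ids = [f"{id_prefix}{i}" for i in range(len(cleaned))]
    return ids, cleaned


ids, docs = prepare_documents(["", "  ", "Chroma DB stores embeddings."])
print(ids, docs)
```

If the cleaned list comes back empty, skipping the add step entirely is probably the simplest option.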
Integration steps
- Install the chromadb and openai Python packages.
- Initialize the Chroma client with desired settings.
- Create or get a collection to store documents and embeddings.
- Generate embeddings for your documents using an embedding model (e.g., OpenAI).
- Add documents and their embeddings to the collection.
- Query the collection with an embedding of the user query to retrieve relevant documents.
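To make the final step concrete, here is a pure-Python sketch of what "retrieve relevant documents" means: rank stored vectors by their distance to the query vector and keep the closest ones. This is an illustration only, not Chroma's implementation; Chroma uses its own index and configured distance metric. The function names and toy vectors are assumptions made for the example.

```python
import math


def cosine_distance(a, b):
    """Return 1 - cosine similarity for two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


def top_k(query_vec, stored, k=2):
    """Rank (doc_id, vector) pairs by distance to the query, closest first."""
    ranked = sorted(stored, key=lambda item: cosine_distance(query_vec, item[1]))
    return [doc_id for doc_id, _ in ranked[:k]]


# Toy 3-dimensional vectors standing in for real embeddings (assumed values).
stored = [
    ("doc0", [1.0, 0.0, 0.0]),
    ("doc1", [0.0, 1.0, 0.0]),
    ("doc2", [0.9, 0.1, 0.0]),
]
print(top_k([1.0, 0.0, 0.0], stored))
```

A brute-force scan like this is fine for a handful of vectors; a vector database exists so that the same lookup stays fast as the collection grows.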
Full code
import os
import chromadb
from openai import OpenAI

# Initialize OpenAI client for embeddings
openai_client = OpenAI(api_key=os.environ["OPENAI_API_KEY"])

# Initialize a persistent Chroma client. chromadb 0.4+ replaced the legacy
# Settings(chroma_db_impl="duckdb+parquet", ...) configuration with PersistentClient.
client = chromadb.PersistentClient(path="./chroma_db")

# Create or get collection
collection = client.get_or_create_collection(name="my_collection")

# Sample documents
documents = [
    "Retrieval-Augmented Generation (RAG) combines retrieval with language models.",
    "Chroma DB is an open-source vector database for embeddings.",
    "Python is a popular programming language for AI development.",
]

# Function to get embeddings from OpenAI
# Using text-embedding-3-large as example
def get_embedding(text):
    response = openai_client.embeddings.create(
        model="text-embedding-3-large",
        input=text,
    )
    return response.data[0].embedding

# Generate embeddings for documents
embeddings = [get_embedding(doc) for doc in documents]

# Add documents and embeddings to collection
collection.add(
    documents=documents,
    embeddings=embeddings,
    ids=[f"doc{i}" for i in range(len(documents))],
)

# Query example
query = "What is RAG?"
query_embedding = get_embedding(query)
results = collection.query(
    query_embeddings=[query_embedding],
    n_results=2,
)

print("Query Results:")
for doc in results['documents'][0]:
    print(f"- {doc}")
output
Query Results:
- Retrieval-Augmented Generation (RAG) combines retrieval with language models.
- Chroma DB is an open-source vector database for embeddings.
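The query result is nested: `results['documents'][0]` holds the documents for the first query embedding, and the `ids` and `distances` entries follow the same shape. A small sketch of pairing them up, assuming a hypothetical helper `unpack_results` and a hand-written sample dict in place of a live query:

```python
def unpack_results(results, query_index=0):
    """Pair ids, documents, and distances for one query in a result dict."""
    ids = results["ids"][query_index]
    documents = results["documents"][query_index]
    distances = results["distances"][query_index]
    return list(zip(ids, documents, distances))


# Hand-written sample with the same nesting as a single-query result
# (the values are made up for illustration).
sample = {
    "ids": [["doc0", "doc1"]],
    "documents": [[
        "RAG combines retrieval with language models.",
        "Chroma DB is a vector database.",
    ]],
    "distances": [[0.21, 0.58]],
}
for doc_id, doc, dist in unpack_results(sample):
    print(f"{doc_id} ({dist:.2f}): {doc}")
```

Keeping the distance next to each document makes it easy to drop weak matches before they reach a prompt.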
API trace
Request
{"model": "text-embedding-3-large", "input": "<text>"} for embedding generation; {"collection_name": "my_collection", "documents": [...], "embeddings": [...], "ids": [...]} for adding; {"query_embeddings": [...], "n_results": 2} for querying
Response
{"data": [{"embedding": [...] }]} for embeddings; {"ids": [...], "documents": [...], "distances": [...]} for query results
Extract
For embeddings: response.data[0].embedding; for query results: results['documents'][0]
Variants
Streaming query results
Use when you want to process or display query results incrementally for better UX.
import os
import chromadb
from openai import OpenAI

openai_client = OpenAI(api_key=os.environ["OPENAI_API_KEY"])
# PersistentClient replaces the legacy Settings(chroma_db_impl=...) configuration (chromadb 0.4+)
client = chromadb.PersistentClient(path="./chroma_db")
collection = client.get_or_create_collection(name="my_collection")

def get_embedding(text):
    response = openai_client.embeddings.create(model="text-embedding-3-large", input=text)
    return response.data[0].embedding

query = "Explain Chroma DB"
query_embedding = get_embedding(query)

# Streaming is not natively supported by chromadb but can be simulated by chunked queries or async calls
results = collection.query(query_embeddings=[query_embedding], n_results=3)
for doc in results['documents'][0]:
    print(doc)
Async version using asyncio
Use for concurrent embedding requests or when integrating into async Python apps.
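As a rough illustration of the concurrency this enables, the sketch below embeds several texts at once with `asyncio.gather`. The `fake_embed` coroutine is a hypothetical stand-in for a real awaitable embedding call, so the example runs without an API key:

```python
import asyncio


async def fake_embed(text):
    """Stand-in for an awaitable embedding call (hypothetical; returns a toy vector)."""
    await asyncio.sleep(0.01)  # simulate network latency
    return [float(len(text)), 0.0]


async def embed_all(texts, embed):
    """Run one embedding coroutine per text concurrently, preserving input order."""
    return await asyncio.gather(*(embed(t) for t in texts))


vectors = asyncio.run(embed_all(["What is RAG?", "Explain Chroma DB"], fake_embed))
print(vectors)
```

Because `asyncio.gather` returns results in the order the coroutines were passed, the vectors line up with the input texts.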
import os
import asyncio
import chromadb
from openai import AsyncOpenAI

# The openai v1 client has no embeddings.acreate; AsyncOpenAI provides awaitable methods instead
openai_client = AsyncOpenAI(api_key=os.environ["OPENAI_API_KEY"])
client = chromadb.PersistentClient(path="./chroma_db")
collection = client.get_or_create_collection(name="my_collection")

async def get_embedding_async(text):
    response = await openai_client.embeddings.create(model="text-embedding-3-large", input=text)
    return response.data[0].embedding

async def main():
    query = "What is RAG?"
    query_embedding = await get_embedding_async(query)
    # collection.query is a synchronous (blocking) call
    results = collection.query(query_embeddings=[query_embedding], n_results=2)
    print("Async Query Results:")
    for doc in results['documents'][0]:
        print(f"- {doc}")

asyncio.run(main())
Alternative embedding model
Use when you want to reduce cost or speed up embedding generation at some accuracy tradeoff.
import os
import chromadb
from openai import OpenAI

openai_client = OpenAI(api_key=os.environ["OPENAI_API_KEY"])
client = chromadb.PersistentClient(path="./chroma_db")
# text-embedding-3-small returns 1536-dimensional vectors by default (3-large returns 3072),
# so keep these embeddings in a separate collection from the main example
collection = client.get_or_create_collection(name="my_collection_small")

def get_embedding(text):
    # Use a smaller or cheaper embedding model
    response = openai_client.embeddings.create(model="text-embedding-3-small", input=text)
    return response.data[0].embedding

# Rest of the code same as main example
Performance
Latency: ~800ms for embedding generation + ~100ms for Chroma DB query
Cost: ~$0.0004 per 1K tokens for OpenAI embeddings (text-embedding-3-large)
Rate limits: OpenAI default: 350 RPM, 60,000 TPM; Chroma DB is local and limited by disk/CPU
- Batch multiple texts in one embedding request to reduce overhead.
- Use smaller embedding models for less critical queries.
- Cache embeddings locally to avoid repeated API calls.
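The batching and caching tips can be combined in one small wrapper. The sketch below assumes a hypothetical `CachedEmbedder` class and takes the batch embedding function as a parameter; `fake_embed_batch` stands in for a single request that embeds several inputs at once (the OpenAI embeddings endpoint accepts a list for `input`).

```python
def batched(items, batch_size):
    """Yield consecutive slices of at most batch_size items."""
    for start in range(0, len(items), batch_size):
        yield items[start:start + batch_size]


class CachedEmbedder:
    """Wrap a batch embedding function and remember vectors already computed."""

    def __init__(self, embed_batch, batch_size=64):
        self.embed_batch = embed_batch  # callable: list[str] -> list[list[float]]
        self.batch_size = batch_size
        self.cache = {}

    def embed(self, texts):
        # dict.fromkeys de-duplicates while keeping first-seen order.
        missing = [t for t in dict.fromkeys(texts) if t not in self.cache]
        for batch in batched(missing, self.batch_size):
            for text, vector in zip(batch, self.embed_batch(batch)):
                self.cache[text] = vector
        return [self.cache[t] for t in texts]


calls = []  # records each simulated request so the batching is visible


def fake_embed_batch(batch):
    """Stand-in for one embeddings request covering several inputs (hypothetical)."""
    calls.append(list(batch))
    return [[float(len(t))] for t in batch]


embedder = CachedEmbedder(fake_embed_batch, batch_size=2)
embedder.embed(["a", "bb", "ccc"])
embedder.embed(["a", "dddd"])  # "a" is served from the cache
print(calls)
```

The cache here lives in memory only; writing it to disk would be a natural extension for scripts that run repeatedly.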
| Approach | Latency | Cost/call | Best for |
|---|---|---|---|
| Standard sync | ~900ms | ~$0.0004 | Simple scripts and demos |
| Async embedding calls | ~700ms | ~$0.0004 | Concurrent or high-throughput apps |
| Smaller embedding model | ~600ms | ~$0.0001 | Cost-sensitive or fast prototyping |
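The latency figures above are approximate and will differ by network, model, and machine. A minimal timing harness for measuring your own numbers, assuming a hypothetical `time_call` helper and a placeholder `fake_query` function where a real embedding call or collection query would go:

```python
import time


def time_call(fn, *args, repeats=3):
    """Return (last result, best wall-clock time in ms) over several runs."""
    best = float("inf")
    result = None
    for _ in range(repeats):
        start = time.perf_counter()
        result = fn(*args)
        best = min(best, (time.perf_counter() - start) * 1000)
    return result, best


def fake_query(text):
    """Placeholder for an embedding call or collection query (hypothetical)."""
    time.sleep(0.01)
    return text.upper()


result, elapsed_ms = time_call(fake_query, "what is rag?")
print(f"{result!r} took about {elapsed_ms:.1f} ms")
```

Taking the best of several runs reduces noise from one-off delays such as connection setup.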
Quick tip
Persist your Chroma DB collection to disk to avoid re-indexing embeddings on every run.
Common mistake
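Adding documents without supplying embeddings. Chroma then falls back to the collection's default embedding function, so the stored vectors will not match query embeddings generated with an OpenAI model, which typically shows up as a dimension mismatch at query time.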
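A cheap guard against this is to validate the vectors before calling `collection.add`. The following is a sketch, assuming a hypothetical `check_dimensions` helper and an expected dimension that you supply for your chosen embedding model:

```python
def check_dimensions(embeddings, expected_dim):
    """Raise ValueError if any vector is missing, empty, or the wrong length."""
    for i, vector in enumerate(embeddings):
        if not vector:
            raise ValueError(f"embedding {i} is empty")
        if len(vector) != expected_dim:
            raise ValueError(
                f"embedding {i} has {len(vector)} dimensions, expected {expected_dim}"
            )
    return True


print(check_dimensions([[0.1, 0.2, 0.3], [0.4, 0.5, 0.6]], expected_dim=3))
```

Running the same check on the query embedding catches a model mismatch before it reaches the database.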