How to use Chroma with OpenAI embeddings
Quick answer
Use OpenAIEmbeddings from langchain_openai to generate vector embeddings with OpenAI models, then store and query those vectors in Chroma via langchain_community.vectorstores. This enables efficient semantic search and retrieval for RAG pipelines.

Prerequisites

- Python 3.8+
- An OpenAI API key
- pip install openai>=1.0 langchain langchain-openai langchain_community chromadb
Setup
Install required packages and set your OpenAI API key as an environment variable.
pip install openai langchain langchain-openai langchain_community chromadb

Step by step
This example shows how to embed documents using OpenAI embeddings and store them in Chroma for semantic search.
```python
import os

from langchain_openai import OpenAIEmbeddings
from langchain_community.vectorstores import Chroma

# Set your OpenAI API key as an environment variable before running:
# export OPENAI_API_KEY='your_api_key'

# Initialize OpenAI embeddings
embeddings = OpenAIEmbeddings(openai_api_key=os.environ["OPENAI_API_KEY"])

# Sample documents to embed
texts = [
    "Chroma is a vector database for embeddings.",
    "OpenAI provides powerful embedding models.",
    "Retrieval-augmented generation improves LLM responses.",
]

# Create a Chroma vector store and add the documents
vectordb = Chroma.from_texts(
    texts=texts,
    embedding=embeddings,
    collection_name="example_collection",
)

# Query the vector store with a semantic search
query = "What is Chroma?"
results = vectordb.similarity_search(query, k=2)

print("Top results:")
for i, doc in enumerate(results, 1):
    print(f"{i}. {doc.page_content}")
```

Output

Top results:
1. Chroma is a vector database for embeddings.
2. Retrieval-augmented generation improves LLM responses.
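Under the hood, similarity_search embeds the query text and ranks the stored document vectors by their similarity to it. The ranking step can be illustrated with a pure-Python sketch using cosine similarity and toy 3-dimensional vectors (real OpenAI embeddings have on the order of 1,500+ dimensions; the numbers here are made up for illustration):

```python
import math

def cosine(a, b):
    # Cosine similarity between two equal-length vectors
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

# Toy "embeddings" standing in for what the model would return
stored = {
    "Chroma is a vector database for embeddings.": [0.9, 0.1, 0.0],
    "OpenAI provides powerful embedding models.": [0.2, 0.9, 0.1],
    "Retrieval-augmented generation improves LLM responses.": [0.7, 0.2, 0.3],
}

# Stands in for the embedded query "What is Chroma?"
query_vector = [1.0, 0.0, 0.1]

# Rank stored documents by similarity to the query, keep the top 2 (k=2)
top2 = sorted(stored, key=lambda t: cosine(stored[t], query_vector), reverse=True)[:2]
print(top2)
```

The vector store does the same thing at scale, with an index instead of a linear scan.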
Common variations
- Use a different OpenAI embedding model by passing model to OpenAIEmbeddings, e.g. OpenAIEmbeddings(model='text-embedding-3-large').
- Use async calls with LangChain's async support for embeddings and Chroma.
- Switch to other vector stores like FAISS by changing the import and initialization.
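Swapping models or stores works because LangChain vector stores only ever call two methods on the embeddings object: embed_documents (for a batch of texts) and embed_query (for a single query). A minimal stdlib sketch of that interface, using a hypothetical deterministic stand-in class rather than a real model:

```python
import hashlib

class FakeDeterministicEmbeddings:
    """Hypothetical stand-in (not part of LangChain) implementing the two
    methods LangChain vector stores call on any embeddings object."""

    def __init__(self, dim=8):
        self.dim = dim

    def _embed(self, text):
        # Hash the text into a deterministic pseudo-vector of `dim` floats in [0, 1]
        digest = hashlib.sha256(text.encode()).digest()
        return [digest[i] / 255.0 for i in range(self.dim)]

    def embed_documents(self, texts):
        return [self._embed(t) for t in texts]

    def embed_query(self, text):
        return self._embed(text)

emb = FakeDeterministicEmbeddings()
vectors = emb.embed_documents(["hello", "world"])
print(len(vectors), len(vectors[0]))  # 2 8
```

Any object with these two methods can be passed wherever the examples above pass embeddings, which is why switching between OpenAIEmbeddings and other providers is mostly a one-line change.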
Troubleshooting
- If you get authentication errors, verify that your OPENAI_API_KEY environment variable is set correctly.
- If Chroma fails to start, ensure chromadb is installed and compatible with your Python version.
- For slow queries, check your embedding model choice and batch your embedding requests.
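Batching simply means splitting a large list of texts into fixed-size chunks before embedding them, rather than sending one request per text. A stdlib sketch of that chunking step (the batch size of 3 is arbitrary; LangChain's OpenAIEmbeddings also exposes its own chunk_size parameter for this):

```python
def batched(items, batch_size):
    # Yield successive fixed-size batches from a list
    for i in range(0, len(items), batch_size):
        yield items[i:i + batch_size]

texts = [f"document {n}" for n in range(7)]

# Each batch would be passed to embeddings.embed_documents(batch) in one call
batches = list(batched(texts, 3))
print([len(b) for b in batches])  # [3, 3, 1]
```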
Key Takeaways
- Use OpenAIEmbeddings to generate embeddings compatible with the Chroma vector store.
- Store and query documents in Chroma for efficient semantic search in RAG applications.
- Set your OpenAI API key in environment variables to authenticate embedding requests.
- You can customize embedding models and switch vector stores easily with LangChain.
- Troubleshoot common issues by verifying API keys and package installations.