How to use Chroma with OpenAI embeddings
Quick answer
Use OpenAIEmbeddings from langchain_openai to generate vector embeddings with OpenAI models, then store and query those vectors in Chroma via langchain_community.vectorstores. This enables efficient semantic search and retrieval for RAG pipelines.

Prerequisites

- Python 3.8+
- An OpenAI API key
- pip install openai>=1.0 langchain langchain-openai langchain_community chromadb
Setup
Install required packages and set your OpenAI API key as an environment variable.
pip install openai langchain langchain-openai langchain_community chromadb

Step by step
This example shows how to embed documents using OpenAI embeddings and store them in Chroma for semantic search.
```python
import os

from langchain_openai import OpenAIEmbeddings
from langchain_community.vectorstores import Chroma

# Set your OpenAI API key as an environment variable before running:
# export OPENAI_API_KEY='your_api_key'

# Initialize OpenAI embeddings
embeddings = OpenAIEmbeddings(openai_api_key=os.environ["OPENAI_API_KEY"])

# Sample documents to embed
texts = [
    "Chroma is a vector database for embeddings.",
    "OpenAI provides powerful embedding models.",
    "Retrieval-augmented generation improves LLM responses.",
]

# Create a Chroma vector store and add the documents
vectordb = Chroma.from_texts(
    texts=texts,
    embedding=embeddings,
    collection_name="example_collection",
)

# Query the vector store with a semantic search
query = "What is Chroma?"
results = vectordb.similarity_search(query, k=2)

print("Top results:")
for i, doc in enumerate(results, 1):
    print(f"{i}. {doc.page_content}")
```

Output

Top results:
1. Chroma is a vector database for embeddings.
2. Retrieval-augmented generation improves LLM responses.
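Under the hood, similarity_search embeds the query text and ranks the stored document vectors by their similarity to it. The ranking step can be illustrated with a pure-Python sketch using cosine similarity and toy 3-dimensional vectors (real OpenAI embeddings have on the order of 1,500+ dimensions; the numbers here are made up for illustration):

```python
import math

def cosine(a, b):
    # Cosine similarity between two equal-length vectors
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

# Toy "embeddings" standing in for what the model would return
stored = {
    "Chroma is a vector database for embeddings.": [0.9, 0.1, 0.0],
    "OpenAI provides powerful embedding models.": [0.2, 0.9, 0.1],
    "Retrieval-augmented generation improves LLM responses.": [0.7, 0.2, 0.3],
}

# Stands in for the embedded query "What is Chroma?"
query_vector = [1.0, 0.0, 0.1]

# Rank stored documents by similarity to the query, keep the top 2 (k=2)
top2 = sorted(stored, key=lambda t: cosine(stored[t], query_vector), reverse=True)[:2]
print(top2)
```

The vector store does the same thing at scale, with an index instead of a linear scan.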
Common variations
- Use a different OpenAI embedding model by passing model to OpenAIEmbeddings, e.g. OpenAIEmbeddings(model='text-embedding-3-large').
- Use async calls with LangChain's async support for embeddings and Chroma.
- Switch to other vector stores like FAISS by changing the import and initialization.
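Swapping models or stores works because LangChain vector stores only ever call two methods on the embeddings object: embed_documents (for a batch of texts) and embed_query (for a single query). A minimal stdlib sketch of that interface, using a hypothetical deterministic stand-in class rather than a real model:

```python
import hashlib

class FakeDeterministicEmbeddings:
    """Hypothetical stand-in (not part of LangChain) implementing the two
    methods LangChain vector stores call on any embeddings object."""

    def __init__(self, dim=8):
        self.dim = dim

    def _embed(self, text):
        # Hash the text into a deterministic pseudo-vector of `dim` floats in [0, 1]
        digest = hashlib.sha256(text.encode()).digest()
        return [digest[i] / 255.0 for i in range(self.dim)]

    def embed_documents(self, texts):
        return [self._embed(t) for t in texts]

    def embed_query(self, text):
        return self._embed(text)

emb = FakeDeterministicEmbeddings()
vectors = emb.embed_documents(["hello", "world"])
print(len(vectors), len(vectors[0]))  # 2 8
```

Any object with these two methods can be passed wherever the examples above pass embeddings, which is why switching between OpenAIEmbeddings and other providers is mostly a one-line change.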
Troubleshooting
- If you get authentication errors, verify that your OPENAI_API_KEY environment variable is set correctly.
- If Chroma fails to start, ensure chromadb is installed and compatible with your Python version.
- For slow queries, check your embedding model choice and batch your embedding requests.
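Batching simply means splitting a large list of texts into fixed-size chunks before embedding them, rather than sending one request per text. A stdlib sketch of that chunking step (the batch size of 3 is arbitrary; LangChain's OpenAIEmbeddings also exposes its own chunk_size parameter for this):

```python
def batched(items, batch_size):
    # Yield successive fixed-size batches from a list
    for i in range(0, len(items), batch_size):
        yield items[i:i + batch_size]

texts = [f"document {n}" for n in range(7)]

# Each batch would be passed to embeddings.embed_documents(batch) in one call
batches = list(batched(texts, 3))
print([len(b) for b in batches])  # [3, 3, 1]
```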
Key Takeaways
- Use OpenAIEmbeddings to generate embeddings compatible with the Chroma vector store.
- Store and query documents in Chroma for efficient semantic search in RAG applications.
- Set your OpenAI API key in environment variables to authenticate embedding requests.
- You can customize embedding models and switch vector stores easily with LangChain.
- Troubleshoot common issues by verifying API keys and package installations.