How to create embeddings for documents in LangChain
Quick answer
Use OpenAIEmbeddings from langchain_openai to create embeddings for documents in LangChain. Load your documents with a loader such as TextLoader, then call embedder.embed_documents() on the document texts to get one embedding vector per text.
Prerequisites
- Python 3.8+
- OpenAI API key (free tier works)
- pip install langchain_openai langchain_community
Setup
Install the required packages and set your OpenAI API key as an environment variable.
- Run pip install langchain_openai langchain_community
- Set the environment variable OPENAI_API_KEY to your OpenAI API key
pip install langchain_openai langchain_community
Step by step
This example loads a text document, creates embeddings with OpenAI's text-embedding-3-large model via LangChain, and prints the number of vectors along with the first five values of the first one.
import os
from langchain_openai import OpenAIEmbeddings
from langchain_community.document_loaders import TextLoader
# Load document text
loader = TextLoader("example.txt")
docs = loader.load()
# Initialize OpenAI embeddings client
embedder = OpenAIEmbeddings(api_key=os.environ["OPENAI_API_KEY"], model="text-embedding-3-large")
# Extract texts from documents
texts = [doc.page_content for doc in docs]
# Generate embeddings
embeddings = embedder.embed_documents(texts)
print(f"Generated {len(embeddings)} embeddings.")
print("First embedding vector sample:", embeddings[0][:5])
Output
Generated 1 embeddings.
First embedding vector sample: [0.0023, -0.0017, 0.0045, 0.0031, -0.0028]
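embed_documents() returns a list with one vector per input text, in the same order as the inputs. A small check that pairs each text with its vector and confirms that all vectors share one dimension can catch loading problems early. The sketch below assumes a hypothetical helper, pair_texts_with_vectors, which is not part of LangChain. The toy vectors stand in for real embeddings so that the snippet runs without an API key.

```python
from typing import List, Tuple


def pair_texts_with_vectors(
    texts: List[str], vectors: List[List[float]]
) -> List[Tuple[str, List[float]]]:
    """Pair each text with its embedding and sanity-check the shapes.

    Hypothetical helper for illustration; not part of LangChain.
    """
    if len(texts) != len(vectors):
        raise ValueError(f"Expected {len(texts)} vectors, got {len(vectors)}")
    # Vectors produced by a single model should all have the same length.
    dimensions = {len(vector) for vector in vectors}
    if len(dimensions) > 1:
        raise ValueError(f"Inconsistent vector dimensions: {sorted(dimensions)}")
    return list(zip(texts, vectors))


# Toy data standing in for `texts` and `embedder.embed_documents(texts)`.
sample_texts = ["first document", "second document"]
sample_vectors = [[0.1, 0.2, 0.3], [0.4, 0.5, 0.6]]

pairs = pair_texts_with_vectors(sample_texts, sample_vectors)
for text, vector in pairs:
    print(f"{text!r}: {len(vector)} dimensions")
```

With the real client, the same call would presumably look like pair_texts_with_vectors(texts, embedder.embed_documents(texts)).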
Common variations
- Use other loaders, such as PyPDFLoader for PDFs.
- Switch to embedder.embed_query() for single text queries.
- Use a different embedding model by changing the model parameter.
- For asynchronous calls, use the async counterparts aembed_documents() and aembed_query().
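embed_query() returns a single vector, while embed_documents() returns a list of vectors. One common way to use the two together is to rank documents by cosine similarity to the query. The following is a minimal sketch under that assumption. The helpers cosine_similarity and rank_by_similarity are hypothetical names, not LangChain functions, and the two-dimensional toy vectors stand in for real embeddings.

```python
import math
from typing import List, Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("Vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # Convention chosen here for zero vectors
    return dot / (norm_a * norm_b)


def rank_by_similarity(
    query_vector: Sequence[float], document_vectors: Sequence[Sequence[float]]
) -> List[int]:
    """Return document indices ordered from most to least similar."""
    scores = [cosine_similarity(query_vector, v) for v in document_vectors]
    return sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)


# Toy vectors standing in for embedder.embed_query(...) and
# embedder.embed_documents(...); real embeddings have many more dimensions.
query_vector = [1.0, 0.0]
document_vectors = [[0.0, 1.0], [0.9, 0.1], [0.5, 0.5]]

ranking = rank_by_similarity(query_vector, document_vectors)
print(ranking)  # [1, 2, 0]
```

In practice, query_vector would come from embedder.embed_query("some question") and document_vectors from embedder.embed_documents(texts).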
Troubleshooting
- If you get authentication errors, verify that your OPENAI_API_KEY environment variable is set correctly.
- For rate limit errors, consider retrying after some time or reducing batch size.
- If embeddings are empty, check that documents are loaded properly and contain text.
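The checks above can be sketched as two small helpers: one that fails fast when the API key is missing, and one that embeds texts in small batches and retries a failed batch with exponential backoff. Both require_api_key and embed_in_batches are hypothetical names introduced for illustration. The sketch catches every exception for brevity, whereas real code would likely catch only the client's rate limit error. The demo uses a fake embedding function, so it runs without an API key.

```python
import os
import time
from typing import Callable, List, Sequence


def require_api_key(name: str = "OPENAI_API_KEY") -> str:
    """Return the key from the environment, or fail with a clear message."""
    key = os.environ.get(name, "").strip()
    if not key:
        raise RuntimeError(f"Environment variable {name} is not set or is empty")
    return key


def embed_in_batches(
    embed_fn: Callable[[List[str]], List[List[float]]],
    texts: Sequence[str],
    batch_size: int = 16,
    max_retries: int = 3,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> List[List[float]]:
    """Embed texts in small batches, retrying each failed batch with backoff."""
    if not any(text.strip() for text in texts):
        raise ValueError("No non-empty texts to embed; check the loader output")
    vectors: List[List[float]] = []
    for start in range(0, len(texts), batch_size):
        batch = list(texts[start:start + batch_size])
        for attempt in range(max_retries + 1):
            try:
                vectors.extend(embed_fn(batch))
                break
            except Exception:
                # Out of retries: surface the original error to the caller.
                if attempt == max_retries:
                    raise
                # Wait 1x, 2x, 4x, ... base_delay before the next attempt.
                sleep(base_delay * (2 ** attempt))
    return vectors


# Demo with a fake embedding function that fails once, then succeeds.
calls = {"count": 0}


def flaky_embed(batch: List[str]) -> List[List[float]]:
    calls["count"] += 1
    if calls["count"] == 1:
        raise RuntimeError("simulated rate limit")
    return [[float(len(text))] for text in batch]


result = embed_in_batches(
    flaky_embed, ["a", "bb", "ccc"], batch_size=2, sleep=lambda seconds: None
)
print(result)  # [[1.0], [2.0], [3.0]]
```

With the real client, a call might look like embed_in_batches(embedder.embed_documents, texts) after require_api_key() has succeeded.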
Key Takeaways
- Use OpenAIEmbeddings from langchain_openai to generate document embeddings easily.
- Load documents with loaders like TextLoader or PyPDFLoader before embedding.
- Pass a list of document texts to embed_documents() for batch embedding.
- Set your OpenAI API key in os.environ["OPENAI_API_KEY"] to authenticate.
- You can switch embedding models by changing the model parameter.