How to create embeddings for documents in LangChain

Quick answer
Use OpenAIEmbeddings from langchain_openai to create embeddings for documents in LangChain. Load your documents with a loader like TextLoader, then call embedder.embed_documents() on the document texts to get vector embeddings.

Prerequisites

  • Python 3.8+
  • OpenAI API key (embedding requests are billed per token, so check that the account has available quota)
  • pip install langchain_openai langchain_community

Setup

Install the required packages and set your OpenAI API key as an environment variable.

  • Run pip install langchain_openai langchain_community
  • Set environment variable OPENAI_API_KEY with your OpenAI API key
bash
pip install langchain_openai langchain_community
export OPENAI_API_KEY="your-api-key"
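
Before creating the embeddings client, it can help to fail fast when the key is missing rather than wait for an authentication error from the API. A minimal sketch, assuming a hypothetical helper named require_api_key (it is not part of LangChain):

```python
import os


def require_api_key(env=None, name="OPENAI_API_KEY"):
    """Return the API key from the environment or raise a clear error.

    Hypothetical helper for illustration; not a LangChain function.
    """
    # Default to the real process environment; a plain dict can be passed for testing.
    env = os.environ if env is None else env
    key = env.get(name, "").strip()
    if not key:
        raise RuntimeError(f"{name} is not set; export it before creating embeddings.")
    return key
```

Calling require_api_key() at the top of a script and passing the result as api_key keeps the failure message close to its cause.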

Step by step

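This example loads a text document, creates embeddings using OpenAI's text-embedding-3-large model via LangChain, and prints the number of vectors along with the first five values of the first one. TextLoader returns the whole file as a single document, so the result contains one vector.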

python
import os
from langchain_openai import OpenAIEmbeddings
from langchain_community.document_loaders import TextLoader

# Load document text
loader = TextLoader("example.txt")
docs = loader.load()

# Initialize the OpenAI embeddings client with an explicit model
embedder = OpenAIEmbeddings(
    api_key=os.environ["OPENAI_API_KEY"],
    model="text-embedding-3-large",
)

# Extract texts from documents
texts = [doc.page_content for doc in docs]

# Generate embeddings
embeddings = embedder.embed_documents(texts)

print(f"Generated {len(embeddings)} embeddings.")
print("First embedding vector sample:", embeddings[0][:5])
output
Generated 1 embeddings.
First embedding vector sample: [0.0023, -0.0017, 0.0045, 0.0031, -0.0028]
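
A quick sanity check on the result can catch problems early: there should be one vector per input text, and every vector should have the same length. The sketch below assumes a hypothetical helper named check_embeddings and uses small hand-written vectors in place of real API output.

```python
def check_embeddings(texts, embeddings):
    """Return the shared vector length after basic sanity checks.

    Hypothetical helper: verifies one vector per text and a consistent dimension.
    """
    if len(embeddings) != len(texts):
        raise ValueError(f"expected {len(texts)} vectors, got {len(embeddings)}")
    dims = {len(vec) for vec in embeddings}
    if len(dims) != 1:
        raise ValueError(f"inconsistent or missing vector sizes: {sorted(dims)}")
    return dims.pop()


# Tiny hand-written vectors stand in for real API output.
sample_texts = ["first chunk", "second chunk"]
sample_vectors = [[0.1, 0.2, 0.3], [0.4, 0.5, 0.6]]
dimension = check_embeddings(sample_texts, sample_vectors)
print(f"{len(sample_vectors)} vectors of dimension {dimension}")
# prints: 2 vectors of dimension 3
```

With real data, the same call would be check_embeddings(texts, embeddings) right after embed_documents().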

Common variations

  • Use other loaders like PyPDFLoader for PDFs.
  • Switch to embedder.embed_query() for single text queries.
  • Use a different embedding model by changing the model argument, for example text-embedding-3-small.
  • For async code, await embedder.aembed_documents() or embedder.aembed_query() instead of the synchronous methods.
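
The document and query methods return different shapes, which is easy to trip over: embed_documents() returns a list of vectors, while embed_query() returns a single vector. The sketch below illustrates this offline with StubEmbedder, a hypothetical stand-in that only mirrors the method names of a LangChain embeddings class and produces toy two-value vectors. Replace it with OpenAIEmbeddings for real embeddings.

```python
import asyncio


class StubEmbedder:
    """Offline stand-in (hypothetical) that mirrors LangChain's embedding method names."""

    def embed_documents(self, texts):
        # One toy vector per input text: [character count, word count].
        return [[float(len(t)), float(len(t.split()))] for t in texts]

    def embed_query(self, text):
        # A single vector, not a list of vectors.
        return self.embed_documents([text])[0]

    async def aembed_documents(self, texts):
        return self.embed_documents(texts)

    async def aembed_query(self, text):
        return self.embed_query(text)


embedder = StubEmbedder()  # assumption: swap in OpenAIEmbeddings(...) for real vectors

doc_vectors = embedder.embed_documents(["hello world", "embeddings"])
query_vector = embedder.embed_query("hello world")
async_vectors = asyncio.run(embedder.aembed_documents(["hello world"]))

print(doc_vectors)    # [[11.0, 2.0], [10.0, 1.0]]
print(query_vector)   # [11.0, 2.0]
print(async_vectors)  # [[11.0, 2.0]]
```

Inside an existing event loop, such as a web framework handler, await the async methods directly instead of calling asyncio.run().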

Troubleshooting

  • If you get authentication errors, verify your OPENAI_API_KEY environment variable is set correctly.
  • For rate limit errors, retry after a short delay or send fewer texts per request, for example by lowering the chunk_size argument of OpenAIEmbeddings.
  • If embeddings are empty, check that documents are loaded properly and contain text.
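
The last two points can be combined into one wrapper that drops blank texts, sends small batches, and retries a failed batch with exponential backoff. This is a sketch under stated assumptions: embed_in_batches is a hypothetical helper, and it treats every exception as retryable, which real code should narrow to the client's rate limit error.

```python
import time


def embed_in_batches(embed_fn, texts, batch_size=16, max_attempts=3, base_delay=1.0):
    """Embed non-empty texts in small batches, retrying failed batches with backoff.

    Hypothetical helper; `embed_fn` is any callable that maps a list of strings
    to a list of vectors, such as `embedder.embed_documents`.
    """
    # Skip blank texts so empty documents are never sent for embedding.
    clean = [t for t in texts if t and t.strip()]
    vectors = []
    for start in range(0, len(clean), batch_size):
        batch = clean[start:start + batch_size]
        for attempt in range(1, max_attempts + 1):
            try:
                vectors.extend(embed_fn(batch))
                break
            except Exception:
                # Assumption: any exception is retryable. Narrow this in real code.
                if attempt == max_attempts:
                    raise
                # Wait base_delay, then 2x, 4x, ... before the next attempt.
                time.sleep(base_delay * 2 ** (attempt - 1))
    # Return the kept texts too, so vectors can be matched back to their inputs.
    return clean, vectors
```

A call might look like kept_texts, embeddings = embed_in_batches(embedder.embed_documents, texts, batch_size=8). Returning the filtered texts alongside the vectors keeps the two lists aligned after blanks are removed.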

Key takeaways

  • Use OpenAIEmbeddings from langchain_openai to generate document embeddings easily.
  • Load documents with loaders like TextLoader or PyPDFLoader before embedding.
  • Pass a list of document texts to embed_documents() for batch embedding.
  • Set the OPENAI_API_KEY environment variable to authenticate.
  • Switch embedding models by changing the model parameter.
Verified 2026-04 · text-embedding-3-large