How to create embeddings for documents in Python
Direct answer
Use the OpenAIEmbeddings class from langchain_openai or the client.embeddings.create method from the OpenAI SDK to convert documents into vector embeddings in Python.
Setup
Install
pip install openai langchain-openai langchain-community faiss-cpu
Env vars
OPENAI_API_KEY (plus VOYAGE_API_KEY if you use the Voyage AI variant below)
Imports
import os
from langchain_openai import OpenAIEmbeddings
from langchain_community.vectorstores import FAISS
from langchain_community.document_loaders import TextLoader
Examples
In: A short text document: 'Python is a popular programming language.'
Out: A vector embedding array representing the semantic content of the text.
In: Multiple documents loaded from a folder with TextLoader
Out: A FAISS index containing embeddings for all documents, ready for similarity search.
In: Empty or very short document text
Out: Empty strings are rejected by the OpenAI API with a validation error; very short text still returns a full-dimensional vector, just with little semantic signal.
Integration steps
- Install the required packages and set your OPENAI_API_KEY environment variable (plus VOYAGE_API_KEY if you use the Voyage AI variant)
- Load your documents using a loader like TextLoader or read raw text
- Initialize the OpenAIEmbeddings client from langchain_openai
- Generate embeddings by passing document texts to the embedding client
- Optionally, store embeddings in a vector store like FAISS for efficient retrieval
Full code
import os
from langchain_openai import OpenAIEmbeddings
from langchain_community.vectorstores import FAISS
from langchain_community.document_loaders import TextLoader
# Load documents from a local text file
loader = TextLoader("./documents/sample.txt")
docs = loader.load()
# Initialize embeddings client (model matches the API trace below)
embeddings = OpenAIEmbeddings(model="text-embedding-3-large")
# Create embeddings for each document
texts = [doc.page_content for doc in docs]
vectors = embeddings.embed_documents(texts)
# Build a FAISS vector store from the precomputed vectors;
# FAISS.from_texts(texts, embeddings) would re-embed and double the API cost
index = FAISS.from_embeddings(list(zip(texts, vectors)), embeddings)
print(f"Created embeddings for {len(texts)} documents.")
print(f"Sample embedding vector (first document): {vectors[0][:5]}...")
Output
Created embeddings for 3 documents.
Sample embedding vector (first document): [0.0123, -0.0456, 0.0789, -0.0345, 0.0567]...
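Once the documents are in FAISS, retrieval ranks them by vector similarity. The metric itself is easy to exercise offline; this sketch uses toy 3-dimensional vectors (real OpenAI embeddings have 1,536 or 3,072 dimensions) and plain cosine similarity:

```python
import math

def cosine_similarity(a, b):
    # Dot product divided by the product of the vector magnitudes
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# Toy "embeddings": the closer the score is to 1, the more similar
doc_vec = [0.12, -0.04, 0.56]
query_vec = [0.10, -0.05, 0.55]
print(round(cosine_similarity(doc_vec, query_vec), 4))
```

Vectors pointing in the same direction score 1.0 and orthogonal vectors score 0.0, which is why near-duplicate documents cluster at the top of a similarity search.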
API trace
Request
{"model": "text-embedding-3-large", "input": ["document text 1", "document text 2"]}
Response
{"data": [{"embedding": [0.01, -0.02, ...]}, {"embedding": [0.03, 0.04, ...]}], "usage": {"total_tokens": 50}}
Extract
response.data[0].embedding for the first document embedding vector
Variants
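The extract step can be exercised offline against the response shape shown in the trace. The literal values below are stand-ins copied from the trace, not real model output; the live SDK returns typed objects with the same field names:

```python
# A response dict mirroring the JSON shape in the trace above
response = {
    "data": [
        {"embedding": [0.01, -0.02]},
        {"embedding": [0.03, 0.04]},
    ],
    "usage": {"total_tokens": 50},
}

# One vector per input document, in input order
vectors = [item["embedding"] for item in response["data"]]
print(len(vectors), vectors[0])  # prints: 2 [0.01, -0.02]
```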
Batched embeddings generation ›
Use when embedding large document sets. There is no streaming API for embeddings; instead, OpenAIEmbeddings splits the input into per-request batches via its chunk_size parameter.
import os
from langchain_openai import OpenAIEmbeddings
# chunk_size caps how many texts go into a single API request
embeddings = OpenAIEmbeddings(chunk_size=100)
texts = ["Doc 1 text", "Doc 2 text"]
vectors = embeddings.embed_documents(texts)
print(f"First embedding vector: {vectors[0][:5]}...")
Async embeddings generation ›
Use in asynchronous Python applications to embed documents concurrently.
import os
import asyncio
from langchain_openai import OpenAIEmbeddings
async def main():
embeddings = OpenAIEmbeddings()
texts = ["Async doc 1", "Async doc 2"]
vectors = await embeddings.aembed_documents(texts)
print(f"Async embeddings: {vectors[0][:5]}...")
asyncio.run(main())
Using Voyage AI embeddings (Anthropic's recommended provider) ›
Use if you want embeddings alongside Claude: Anthropic does not offer a first-party embeddings API and recommends Voyage AI instead (the model name below is per Voyage's docs at the time of writing).
import os
import voyageai
client = voyageai.Client(api_key=os.environ["VOYAGE_API_KEY"])
response = client.embed(
    ["Document text to embed"],
    model="voyage-3",
)
print(f"Embedding vector: {response.embeddings[0][:5]}...")
Performance
Latency: roughly 500ms per 512 tokens for embedding generation with OpenAI
Cost: ~$0.13 per 1M tokens (~$0.00013 per 1K) with OpenAI text-embedding-3-large; check current pricing
Rate limits: vary by usage tier; check the rate-limit page for your OpenAI account before sizing batch jobs
- Split large documents into smaller chunks before embedding
- Remove unnecessary boilerplate or metadata from text
- Batch multiple texts in a single API call to optimize usage
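The batching advice above can be sketched as a small helper; the batch size of 100 is an illustrative choice, not an API limit:

```python
def batched(items, size):
    # Yield successive chunks of at most `size` items each
    for i in range(0, len(items), size):
        yield items[i:i + size]

# 250 hypothetical document texts -> three API calls instead of 250
texts = [f"doc {n}" for n in range(250)]
batch_sizes = [len(batch) for batch in batched(texts, 100)]
print(batch_sizes)  # prints: [100, 100, 50]
```

Each chunk would then be passed as the `input` list of a single embeddings request.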
| Approach | Latency | Cost/call | Best for |
|---|---|---|---|
| OpenAIEmbeddings (batch) | ~500ms | ~$0.13 per 1M tokens | General purpose, easy integration |
| OpenAIEmbeddings with chunk_size | Same per request | Same as batch | Large datasets, bounded request size |
| Voyage AI embeddings | Comparable | See Voyage AI pricing | Pairing embeddings with Claude |
Quick tip
Batch multiple documents in one embedding API call to reduce latency and cost.
Common mistake
Passing raw documents without splitting them first can exceed the embedding model's token limit (8,191 tokens for OpenAI's text-embedding models) and cause API errors.
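A minimal word-based splitter illustrates the fix. Production code should split on tokens rather than words (e.g. with LangChain's RecursiveCharacterTextSplitter or tiktoken), so the word count here is only a rough proxy for the real token limit:

```python
def split_text(text, max_words=200):
    # Naive splitter: break on word boundaries so no chunk
    # exceeds max_words (a stand-in for a real token limit)
    words = text.split()
    return [" ".join(words[i:i + max_words])
            for i in range(0, len(words), max_words)]

chunks = split_text("word " * 450, max_words=200)
print([len(c.split()) for c in chunks])  # prints: [200, 200, 50]
```

Each resulting chunk can then be embedded independently and stored with a reference back to its source document.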