How to update embeddings in a vector store
Quick answer
To update embeddings in a vector store, first generate new embeddings for the updated or new documents using an embedding model like text-embedding-3-small. Then replace or upsert these new vectors into the vector store, removing or overwriting the old embeddings for the same documents so the index stays consistent.
Prerequisites
- Python 3.8+
- OpenAI API key (free tier works)
- pip install "openai>=1.0"
- pip install faiss-cpu (or another vector store client)
Setup
Install the required Python packages and set your OpenAI API key as an environment variable.
pip install openai faiss-cpu
Step by step
This example shows how to update embeddings in a FAISS vector store by re-embedding documents and replacing old vectors.
import os

import faiss
import numpy as np
from openai import OpenAI

# Initialize the OpenAI client
client = OpenAI(api_key=os.environ["OPENAI_API_KEY"])

# Example documents keyed by ID
documents = {
    "doc1": "The quick brown fox jumps over the lazy dog.",
    "doc2": "Artificial intelligence is transforming software development.",
}


def get_embeddings(texts):
    """Return one embedding per input text using text-embedding-3-small."""
    response = client.embeddings.create(
        model="text-embedding-3-small",
        input=texts,
    )
    return [data.embedding for data in response.data]


# Create the FAISS index (text-embedding-3-small returns 1536 dimensions by default)
dim = 1536
index = faiss.IndexFlatL2(dim)  # simple flat index

# Map document IDs to index positions
id_to_pos = {}

# Initial embedding and indexing
texts = list(documents.values())
embeddings = get_embeddings(texts)
for i, (doc_id, embedding) in enumerate(zip(documents.keys(), embeddings)):
    vector = np.array(embedding, dtype=np.float32).reshape(1, -1)
    index.add(vector)
    id_to_pos[doc_id] = i
print(f"Indexed {index.ntotal} vectors.")

# --- Update embeddings for doc1 ---
updated_documents = {
    "doc1": "The quick brown fox jumps over the very lazy dog."
}

# Overwrite the old text for each updated ID (new IDs are simply added)
documents.update(updated_documents)

# A plain IndexFlatL2 has no in-place update, so this example
# re-embeds all documents and rebuilds the index from scratch
all_texts = list(documents.values())
all_embeddings = get_embeddings(all_texts)

# Rebuild the index and the ID-to-position map
index.reset()
id_to_pos.clear()
for i, (doc_id, embedding) in enumerate(zip(documents.keys(), all_embeddings)):
    vector = np.array(embedding, dtype=np.float32).reshape(1, -1)
    index.add(vector)
    id_to_pos[doc_id] = i
print(f"Updated index now has {index.ntotal} vectors.")
Output
Indexed 2 vectors.
Updated index now has 2 vectors.
Common variations
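You can use other vector stores such as Chroma, or keep FAISS and persist the index to disk. For async embedding calls, the OpenAI Python SDK (openai>=1.0) provides an AsyncOpenAI client. Different embedding models can be used depending on your accuracy and speed needs, but the index dimension must match the model; text-embedding-3-large, for example, returns 3072 dimensions by default.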
Troubleshooting
- If your vector store does not support direct updates, rebuild the index after removing old vectors.
- Ensure embedding dimensions match the vector store index dimension.
- Check API rate limits when embedding many documents.
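The last two points can be sketched as a small helper that embeds texts in batches, retries a failed batch with exponential backoff, and checks vector dimensions before anything is added to the index. This is a minimal sketch, not a definitive implementation: embed_in_batches, check_dimension, and flaky_embed are hypothetical names, and the broad except is a simplification. With the OpenAI SDK you would more likely catch openai.RateLimitError specifically.

```python
import time


def embed_in_batches(texts, embed_fn, batch_size=100, max_retries=3, base_delay=1.0):
    """Embed texts in batches, retrying each failed batch with exponential backoff.

    embed_fn is any callable that takes a list of strings and returns a list of vectors.
    """
    vectors = []
    for start in range(0, len(texts), batch_size):
        batch = texts[start:start + batch_size]
        for attempt in range(max_retries + 1):
            try:
                vectors.extend(embed_fn(batch))
                break
            except Exception:  # simplification; narrow this in real code
                if attempt == max_retries:
                    raise
                time.sleep(base_delay * 2 ** attempt)
    return vectors


def check_dimension(vectors, expected_dim):
    """Raise ValueError if any vector does not match the index dimension."""
    for i, vec in enumerate(vectors):
        if len(vec) != expected_dim:
            raise ValueError(f"vector {i} has dimension {len(vec)}, expected {expected_dim}")


# Stand-in embedding function that fails once to simulate a rate limit.
calls = {"n": 0}


def flaky_embed(batch):
    calls["n"] += 1
    if calls["n"] == 1:
        raise RuntimeError("simulated rate limit")
    return [[float(len(text)), 0.0, 0.0] for text in batch]


vectors = embed_in_batches(["a", "bb", "ccc"], flaky_embed, batch_size=2, base_delay=0.01)
check_dimension(vectors, expected_dim=3)
print(len(vectors))
```

In the FAISS example above, get_embeddings could be passed as embed_fn and dim as expected_dim.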
Key Takeaways
- Always generate new embeddings for updated or new documents before updating the vector store.
- Most vector stores require rebuilding or upserting vectors to update embeddings; direct in-place updates are rare.
- Keep track of document IDs to correctly replace or remove old embeddings.
- Match embedding model dimension with your vector store index dimension to avoid errors.
- Use batching and handle API rate limits when embedding large document sets.
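To re-embed only what actually changed, one common approach is to store a content hash per document ID and compare it on each run. The sketch below assumes hashes from the previous indexing run are available in a dictionary; content_hash and plan_updates are hypothetical helper names, and the extra documents are illustrative.

```python
import hashlib


def content_hash(text):
    """Return a stable fingerprint of a document's text."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def plan_updates(documents, stored_hashes):
    """Return (to_embed, to_delete) for the current documents.

    to_embed: new or changed documents that need fresh embeddings.
    to_delete: IDs that were indexed before but no longer exist.
    """
    to_embed = {
        doc_id: text
        for doc_id, text in documents.items()
        if stored_hashes.get(doc_id) != content_hash(text)
    }
    to_delete = [doc_id for doc_id in stored_hashes if doc_id not in documents]
    return to_embed, to_delete


# Hashes recorded when the index was last built.
old_docs = {
    "doc1": "The quick brown fox jumps over the lazy dog.",
    "doc2": "Artificial intelligence is transforming software development.",
    "doc3": "This document will be removed.",
}
stored_hashes = {doc_id: content_hash(text) for doc_id, text in old_docs.items()}

# Current state: doc1 changed, doc2 unchanged, doc3 removed, doc4 added.
current_docs = {
    "doc1": "The quick brown fox jumps over the very lazy dog.",
    "doc2": "Artificial intelligence is transforming software development.",
    "doc4": "A newly added document.",
}

to_embed, to_delete = plan_updates(current_docs, stored_hashes)
print(sorted(to_embed), to_delete)
```

Only the documents in to_embed need an embedding call, and the IDs in to_delete mark vectors to remove from the store.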