How-to · Beginner · 3 min read

How to add files to OpenAI vector store

Quick answer
To add files to an OpenAI vector store, first load and split the file content, then generate embeddings using OpenAIEmbeddings, and finally index these embeddings with a vector store like FAISS or Chroma. Use the langchain_openai and langchain_community libraries to streamline this process.

PREREQUISITES

  • Python 3.8+
  • OpenAI API key (free tier works)
  • pip install "openai>=1.0" langchain_openai langchain_community langchain-text-splitters faiss-cpu

Setup

Install the required Python packages and set your OpenAI API key as an environment variable.

  • Install packages: pip install openai langchain_openai langchain_community langchain-text-splitters faiss-cpu
  • Set environment variable: export OPENAI_API_KEY='your_api_key' (Linux/macOS) or setx OPENAI_API_KEY "your_api_key" (Windows; takes effect in new terminal sessions)
bash
pip install openai langchain_openai langchain_community langchain-text-splitters faiss-cpu
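Before embedding anything, it can help to fail fast if the key is missing. A minimal stdlib check (the masking helper below is illustrative, not part of any library):

```python
import os

def check_api_key(env=os.environ):
    """Return a masked preview of OPENAI_API_KEY, or raise if it is missing."""
    key = env.get("OPENAI_API_KEY")
    if not key:
        raise RuntimeError("OPENAI_API_KEY is not set; export it before running.")
    # Show only a short prefix so the full key never lands in logs
    return key[:7] + "..." if len(key) > 10 else "***"
```

Masking the key before printing it keeps it out of shell history and log files.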

Step by step

This example shows how to load a text file, split it into chunks, generate embeddings with OpenAI's text-embedding-3-small model, and add them to a FAISS vector store.

python
import os
from langchain_openai import OpenAIEmbeddings
from langchain_community.vectorstores import FAISS
from langchain_community.document_loaders import TextLoader
from langchain_text_splitters import RecursiveCharacterTextSplitter

# Confirm the OpenAI API key is set (raises KeyError if missing)
_ = os.environ["OPENAI_API_KEY"]

# Load and split the file
loader = TextLoader("example.txt")
docs = loader.load()
text_splitter = RecursiveCharacterTextSplitter(chunk_size=500, chunk_overlap=50)
split_docs = text_splitter.split_documents(docs)

# Generate embeddings
embeddings = OpenAIEmbeddings(model="text-embedding-3-small")

# Create FAISS vector store from documents
vector_store = FAISS.from_documents(split_docs, embeddings)

# Save the vector store locally
vector_store.save_local("faiss_index")

print(f"Added {len(split_docs)} chunks to the vector store and saved to 'faiss_index' folder.")
output
Added 10 chunks to the vector store and saved to 'faiss_index' folder.
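To build intuition for the chunk_size and chunk_overlap parameters used above, here is a simplified fixed-window chunker in plain Python. The real RecursiveCharacterTextSplitter is smarter, preferring to break on paragraph and sentence boundaries, but the windowing idea is the same:

```python
def chunk_text(text, chunk_size=500, chunk_overlap=50):
    """Slice text into windows of chunk_size characters that overlap by chunk_overlap."""
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    step = chunk_size - chunk_overlap  # advance 450 chars per window
    return [text[i:i + chunk_size] for i in range(0, len(text), step)]

chunks = chunk_text("".join(str(i % 10) for i in range(1200)))
print(len(chunks), [len(c) for c in chunks])  # 3 [500, 500, 300]
```

The last 50 characters of each chunk repeat as the first 50 of the next, which preserves context that would otherwise be cut mid-sentence.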

Common variations

  • Use Chroma instead of FAISS for persistent vector storage with a database backend.
  • For PDFs, use PyPDFLoader from langchain_community.document_loaders instead of TextLoader.
  • Use text-embedding-3-large for higher-quality embeddings, or stay with text-embedding-3-small for lower cost.
  • For async pipelines, OpenAIEmbeddings supports aembed_documents and aembed_query.
python
from langchain_community.document_loaders import PyPDFLoader

loader = PyPDFLoader("example.pdf")
docs = loader.load()
# Then proceed with splitting and embedding as above

Troubleshooting

  • If you get authentication errors, verify your OPENAI_API_KEY environment variable is set correctly.
  • If embeddings fail, check your network connection and API usage limits.
  • For large files, increase chunk_size or reduce chunk_overlap: fewer chunks means fewer embedding API calls.
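Transient rate-limit or network errors during embedding usually resolve on retry. A generic exponential-backoff wrapper, stdlib only and illustrative (the openai client also has built-in retries):

```python
import random
import time

def with_retries(fn, max_attempts=4, base_delay=1.0):
    """Call fn, retrying with jittered exponential backoff on any exception."""
    for attempt in range(max_attempts):
        try:
            return fn()
        except Exception:
            if attempt == max_attempts - 1:
                raise  # out of attempts; surface the original error
            time.sleep(base_delay * (2 ** attempt) * random.uniform(0.5, 1.0))

# Usage: vector_store = with_retries(lambda: FAISS.from_documents(split_docs, embeddings))
```

The random jitter spreads retries out so many clients hitting the same rate limit do not all retry at the same instant.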

Key Takeaways

  • Use TextLoader or PyPDFLoader to load files before embedding.
  • Split large documents into chunks with RecursiveCharacterTextSplitter for better embedding quality.
  • Generate embeddings with OpenAIEmbeddings using a current embedding model such as text-embedding-3-small.
  • Store embeddings in vector stores like FAISS or Chroma for efficient similarity search.
  • Set your API key in the OPENAI_API_KEY environment variable so requests are authenticated.
Verified 2026-04