How to build a document QA system with OpenAI
Quick answer
Use OpenAI embeddings to vectorize document chunks and store them in a vector database like FAISS. Then query the vector store with user questions and pass the retrieved context to a `gpt-4o` chat completion for accurate, document-grounded answers.
Prerequisites
- Python 3.8+
- OpenAI API key
- pip install openai faiss-cpu numpy
Setup
Install required packages and set your OpenAI API key as an environment variable.
- Install packages:

```shell
pip install openai faiss-cpu numpy
```

- Set your API key as an environment variable in your shell:

```shell
export OPENAI_API_KEY='your_api_key_here'
```

Step by step
This example loads a text document, splits it into chunks, creates embeddings with OpenAI's `text-embedding-3-small` model, indexes them with FAISS, and answers questions by retrieving relevant chunks and querying `gpt-4o`.
```python
import os

import faiss
import numpy as np
from openai import OpenAI

# Initialize OpenAI client
client = OpenAI(api_key=os.environ["OPENAI_API_KEY"])

# Sample document text
document_text = """OpenAI develops powerful AI models. Document question answering allows users to query large texts efficiently. This example shows how to build a QA system."""

# Split document into chunks (simple split by sentences here)
chunks = [chunk.strip() for chunk in document_text.split('.') if chunk.strip()]

# Create embeddings for each chunk using OpenAI's embedding model
embeddings = []
for chunk in chunks:
    response = client.embeddings.create(
        model="text-embedding-3-small",
        input=chunk
    )
    embeddings.append(response.data[0].embedding)

embeddings = np.array(embeddings).astype('float32')

# Build FAISS index
dimension = len(embeddings[0])
index = faiss.IndexFlatL2(dimension)
index.add(embeddings)

# Function to answer questions
def answer_question(question):
    # Embed the question with the same model used for the chunks
    q_embedding_resp = client.embeddings.create(
        model="text-embedding-3-small",
        input=question
    )
    q_embedding = np.array(q_embedding_resp.data[0].embedding).astype('float32')
    q_embedding = np.expand_dims(q_embedding, axis=0)

    # Search FAISS for the top 2 relevant chunks
    D, I = index.search(q_embedding, k=2)
    relevant_chunks = [chunks[i] for i in I[0]]

    # Prepare prompt with context
    context = "\n".join(relevant_chunks)
    messages = [
        {"role": "system", "content": "You are a helpful assistant answering questions based on provided document context."},
        {"role": "user", "content": f"Context:\n{context}\n\nQuestion: {question}"}
    ]

    # Query gpt-4o for the answer
    response = client.chat.completions.create(
        model="gpt-4o",
        messages=messages
    )
    return response.choices[0].message.content

# Example usage
question = "What does OpenAI develop?"
answer = answer_question(question)
print("Q:", question)
print("A:", answer)
```

Output

```
Q: What does OpenAI develop?
A: OpenAI develops powerful AI models.
```
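The example above ranks chunks by L2 distance. A common alternative, not something the code above does, is cosine similarity: normalize every vector to unit length, after which an inner-product search (e.g., `faiss.IndexFlatIP`) ranks by cosine. A minimal NumPy-only sketch with tiny made-up 4-dimensional vectors (real OpenAI embeddings have 1536+ dimensions):

```python
import numpy as np

# Hypothetical 4-dim "embeddings"; real OpenAI embeddings are much larger.
emb = np.array([[1.0, 0.0, 0.0, 0.0],
                [0.9, 0.1, 0.0, 0.0],
                [0.0, 0.0, 1.0, 0.0]], dtype="float32")
emb /= np.linalg.norm(emb, axis=1, keepdims=True)  # normalize each row to unit length

q = np.array([1.0, 0.05, 0.0, 0.0], dtype="float32")
q /= np.linalg.norm(q)                             # normalize the query too

scores = emb @ q          # inner product == cosine similarity after normalization
top = int(np.argmax(scores))
```

With FAISS, the same effect comes from calling `faiss.normalize_L2` on the vectors and using `IndexFlatIP` instead of `IndexFlatL2`.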
Common variations
You can enhance this system by:
- Using async calls with `asyncio` and OpenAI's `AsyncOpenAI` client.
- Streaming partial answers from `gpt-4o` for faster UX.
- Switching embedding models (e.g., `text-embedding-3-large` for higher quality).
- Using other vector stores like Chroma or Pinecone for scalability.
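The streaming variation can be sketched as follows. `collect_stream` is a hypothetical helper name; it assumes the chunk shape OpenAI's chat completions API returns when `stream=True` (`chunk.choices[0].delta.content`, which may be `None` on the final chunk):

```python
def collect_stream(stream):
    """Print text deltas from a chat-completions stream as they arrive,
    then return the full concatenated answer.

    Works with any iterable of chunks shaped like OpenAI streaming
    chunks: chunk.choices[0].delta.content (None for non-text chunks).
    """
    parts = []
    for chunk in stream:
        delta = chunk.choices[0].delta.content
        if delta:
            print(delta, end="", flush=True)  # show the partial answer immediately
            parts.append(delta)
    print()
    return "".join(parts)

# In answer_question, the final call would become (sketch, untested):
# stream = client.chat.completions.create(model="gpt-4o", messages=messages, stream=True)
# return collect_stream(stream)
```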
Troubleshooting
If you get empty or irrelevant answers:
- Check your document chunking strategy; too large or too small chunks reduce retrieval quality.
- Verify your API key is set correctly in the `OPENAI_API_KEY` environment variable.
- Ensure you use the correct model names: `text-embedding-3-small` for embeddings and `gpt-4o` for chat completions.
- Monitor API usage limits and errors in your console or logs.
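On the chunking point: a fixed-size word window with overlap usually beats the naive sentence split used in the example, because overlap keeps sentences that straddle a boundary retrievable from at least one chunk. The sizes below (`size=200`, `overlap=50`) are illustrative assumptions, not values from the original example:

```python
def chunk_words(text, size=200, overlap=50):
    """Split text into overlapping word-window chunks."""
    words = text.split()
    step = size - overlap  # how far each window advances
    chunks = []
    for start in range(0, len(words), step):
        chunk = " ".join(words[start:start + size])
        if chunk:
            chunks.append(chunk)
        if start + size >= len(words):
            break  # last window already covers the tail
    return chunks

# Example: 10 words, size=6, overlap=2 -> windows [0:6] and [4:10]
print(chunk_words("a b c d e f g h i j", size=6, overlap=2))
# -> ['a b c d e f', 'e f g h i j']
```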
Key Takeaways
- Use OpenAI embeddings to vectorize document chunks for efficient retrieval.
- Combine FAISS vector search with `gpt-4o` chat completions for accurate answers.
- Always split documents into meaningful chunks to improve context relevance.
- Keep API keys secure and use environment variables for all calls.
- Experiment with different embedding models and vector stores for scalability.