How-to · Intermediate · 3 min read

How to build a document QA system with OpenAI

Quick answer
Use OpenAI embeddings to vectorize document chunks and store them in a vector index such as FAISS. Then search the index with the user's question and pass the retrieved chunks as context to a gpt-4o chat completion for accurate, document-grounded answers.

Prerequisites

  • Python 3.8+
  • OpenAI API key
  • pip install openai faiss-cpu numpy

Setup

Install required packages and set your OpenAI API key as an environment variable.

  • Install packages: pip install openai faiss-cpu numpy
  • Set environment variable in your shell: export OPENAI_API_KEY='your_api_key_here'
```bash
pip install openai faiss-cpu numpy
export OPENAI_API_KEY='your_api_key_here'
```
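
If you hit authentication errors later, it usually means the variable was not exported in the shell that launched Python. A small sketch that fails fast with a clear message (`require_api_key` is a hypothetical helper, not part of the SDK):

```python
import os

def require_api_key() -> str:
    """Return the OpenAI API key, or raise a clear error if it is unset."""
    key = os.environ.get("OPENAI_API_KEY")
    if not key:
        raise RuntimeError("Set OPENAI_API_KEY before running the examples below.")
    return key
```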

Step by step

This example loads a text document, splits it into chunks, creates embeddings with OpenAI's text-embedding-3-small model, indexes them with FAISS, and answers questions by retrieving the most relevant chunks and querying gpt-4o.

```python
import os

import faiss
import numpy as np
from openai import OpenAI

# Initialize the OpenAI client from the environment variable set above
client = OpenAI(api_key=os.environ["OPENAI_API_KEY"])

# Sample document text
document_text = """OpenAI develops powerful AI models. Document question answering allows users to query large texts efficiently. This example shows how to build a QA system."""

# Split the document into chunks (naive sentence split; see Troubleshooting)
chunks = [chunk.strip() for chunk in document_text.split('.') if chunk.strip()]

# Create an embedding for each chunk with OpenAI's embedding model
embeddings = []
for chunk in chunks:
    response = client.embeddings.create(
        model="text-embedding-3-small",
        input=chunk
    )
    embeddings.append(response.data[0].embedding)

embeddings = np.array(embeddings, dtype="float32")

# Build a FAISS index over the chunk embeddings
dimension = embeddings.shape[1]
index = faiss.IndexFlatL2(dimension)
index.add(embeddings)

def answer_question(question: str) -> str:
    # Embed the question with the same model used for the chunks
    q_embedding_resp = client.embeddings.create(
        model="text-embedding-3-small",
        input=question
    )
    q_embedding = np.array(q_embedding_resp.data[0].embedding, dtype="float32")
    q_embedding = np.expand_dims(q_embedding, axis=0)

    # Search FAISS for the top 2 most relevant chunks
    D, I = index.search(q_embedding, k=2)
    relevant_chunks = [chunks[i] for i in I[0]]

    # Build the prompt with the retrieved context
    context = "\n".join(relevant_chunks)
    messages = [
        {"role": "system", "content": "You are a helpful assistant answering questions based on provided document context."},
        {"role": "user", "content": f"Context:\n{context}\n\nQuestion: {question}"}
    ]

    # Ask gpt-4o to answer from the retrieved context
    response = client.chat.completions.create(
        model="gpt-4o",
        messages=messages
    )
    return response.choices[0].message.content

# Example usage
question = "What does OpenAI develop?"
answer = answer_question(question)
print("Q:", question)
print("A:", answer)
```

Output:

```text
Q: What does OpenAI develop?
A: OpenAI develops powerful AI models.
```
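
A note on the distance metric: IndexFlatL2 ranks by Euclidean distance, while embedding similarity is usually discussed in terms of cosine similarity. Because OpenAI's text-embedding-3 vectors come back unit-normalized, the two rankings agree: for unit vectors, squared L2 distance is 2 minus twice the cosine similarity. A minimal numpy check of that identity (the vectors here are made up for illustration):

```python
import numpy as np

def l2_sq(a: np.ndarray, b: np.ndarray) -> float:
    """Squared Euclidean distance between two vectors."""
    return float(np.sum((a - b) ** 2))

def cosine(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine similarity between two vectors."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# Two unit-length vectors: ||a - b||^2 == 2 - 2 * cos(a, b)
a = np.array([1.0, 0.0])
b = np.array([0.6, 0.8])
assert abs(l2_sq(a, b) - (2 - 2 * cosine(a, b))) < 1e-9
```

This is why switching to faiss.IndexFlatIP (inner product) on normalized vectors gives the same neighbors, just sorted by similarity instead of distance.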

Common variations

You can enhance this system by:

  • Using async calls with asyncio and OpenAI's async client.
  • Streaming partial answers from gpt-4o for faster UX.
  • Switching embedding models (e.g., text-embedding-3-large for higher quality).
  • Using other vector stores like Chroma or Pinecone for scalability.

Troubleshooting

If you get empty or irrelevant answers:

  • Check your document chunking strategy; too large or too small chunks reduce retrieval quality.
  • Verify your API key is set correctly in os.environ["OPENAI_API_KEY"].
  • Ensure you use valid model names: text-embedding-3-small for embeddings and gpt-4o for chat completions.
  • Monitor API usage limits and errors in your console or logs.
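
A common fix for the first point is fixed-size chunks with overlap, so a sentence cut at one boundary still appears intact in the neighboring chunk. A minimal character-based sketch (`chunk_text` and its default sizes are illustrative; a production system would usually count tokens, e.g. with tiktoken):

```python
def chunk_text(text: str, size: int = 500, overlap: int = 100) -> list[str]:
    """Split text into fixed-size character windows that overlap."""
    if overlap >= size:
        raise ValueError("overlap must be smaller than size")
    chunks = []
    start = 0
    while start < len(text):
        chunks.append(text[start:start + size])
        start += size - overlap  # step forward, keeping `overlap` chars shared
    return chunks
```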

Key takeaways

  • Use OpenAI embeddings to vectorize document chunks for efficient retrieval.
  • Combine FAISS vector search with gpt-4o chat completions for accurate answers.
  • Always split documents into meaningful chunks to improve context relevance.
  • Keep API keys secure and use environment variables for all calls.
  • Experiment with different embedding models and vector stores for scalability.
Verified 2026-04 · gpt-4o, text-embedding-3-small