Intermediate · 4 min read

How to build a document QA system

Quick answer
Build a document QA system by embedding your documents with OpenAI embeddings, storing the vectors in a vector store such as FAISS, and answering questions with a chat model such as gpt-4o-mini grounded in the retrieved context. The OpenAI SDK provides both the embeddings and chat completions needed to implement retrieval-augmented generation (RAG).

PREREQUISITES

  • Python 3.8+
  • OpenAI API key (free tier works)
  • pip install "openai>=1.0" langchain-openai langchain-community faiss-cpu

Setup

Install required Python packages and set your OPENAI_API_KEY environment variable.

  • Install the required packages with pip (command below).
  • Export your API key: export OPENAI_API_KEY='your_api_key' (Linux/macOS) or set it in your environment variables on Windows.

```bash
pip install "openai>=1.0" langchain-openai langchain-community faiss-cpu
```
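Before making any API calls, it helps to confirm the key is actually visible to Python. A minimal sketch; the `check_api_key` helper is illustrative, not part of the SDK:

```python
import os

def check_api_key() -> bool:
    """Return True if OPENAI_API_KEY is set and non-empty."""
    return bool(os.environ.get("OPENAI_API_KEY", ""))

if __name__ == "__main__":
    if not check_api_key():
        raise SystemExit("OPENAI_API_KEY is not set; export it before running the examples.")
    print("Environment looks good.")
```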

Step by step

This example loads documents, creates embeddings with OpenAI, indexes them with FAISS, and queries gpt-4o-mini to generate answers.

```python
import os

from openai import OpenAI
from langchain_openai import OpenAIEmbeddings
from langchain_community.vectorstores import FAISS
from langchain_community.document_loaders import TextLoader

# Initialize the OpenAI client
client = OpenAI(api_key=os.environ["OPENAI_API_KEY"])

# Load documents from text files
loader = TextLoader("./docs/example.txt")
docs = loader.load()

# Create embeddings (the LangChain wrapper reads OPENAI_API_KEY from the environment)
embeddings = OpenAIEmbeddings(model="text-embedding-3-small")

# Build a FAISS index over the document embeddings
index = FAISS.from_documents(docs, embeddings)

# Define a function to answer questions

def answer_question(question: str) -> str:
    # Retrieve the most relevant documents for the question
    results = index.similarity_search(question, k=3)
    context = "\n".join(doc.page_content for doc in results)

    # Prepare chat messages with the retrieved context
    messages = [
        {"role": "system", "content": "You are a helpful assistant answering questions based on the provided context."},
        {"role": "user", "content": f"Context:\n{context}\n\nQuestion: {question}"},
    ]

    # Query the chat completion endpoint
    response = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=messages,
    )
    return response.choices[0].message.content

# Example usage
question = "What is the main topic of the document?"
answer = answer_question(question)
print("Answer:", answer)
```

Expected output:

```text
Answer: The main topic of the document is ...
```
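Under the hood, `similarity_search` ranks document vectors by their similarity to the query vector. A minimal sketch of cosine-similarity ranking in plain Python (FAISS uses optimized index structures, but the principle is the same; `top_k` here is illustrative, not a FAISS API):

```python
import math

def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

def top_k(query_vec: list[float], doc_vecs: list[list[float]], k: int = 3) -> list[int]:
    """Return indices of the k document vectors most similar to the query."""
    ranked = sorted(range(len(doc_vecs)),
                    key=lambda i: cosine(query_vec, doc_vecs[i]),
                    reverse=True)
    return ranked[:k]

# Toy 2-D "embeddings": doc 0 points the same way as the query, doc 2 is close
docs_2d = [[1.0, 0.0], [0.0, 1.0], [0.7, 0.7]]
print(top_k([1.0, 0.0], docs_2d, k=2))  # → [0, 2]
```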

Common variations

You can adapt the system by:

  • Using async calls with asyncio and AsyncOpenAI (await async_client.chat.completions.create(...)).
  • Switching to streaming responses for real-time output.
  • Using different embedding models like text-embedding-3-large for better accuracy.
  • Replacing FAISS with other vector stores like Chroma or Pinecone.
```python
import asyncio

from openai import AsyncOpenAI

# The async client exposes the same API surface with awaitable methods
async_client = AsyncOpenAI(api_key=os.environ["OPENAI_API_KEY"])

async def async_answer_question(question: str) -> str:
    results = index.similarity_search(question, k=3)
    context = "\n".join(doc.page_content for doc in results)
    messages = [
        {"role": "system", "content": "You are a helpful assistant answering questions based on the provided context."},
        {"role": "user", "content": f"Context:\n{context}\n\nQuestion: {question}"},
    ]
    # stream=True yields chunks as they are generated
    stream = await async_client.chat.completions.create(
        model="gpt-4o-mini",
        messages=messages,
        stream=True,
    )
    answer = ""
    async for chunk in stream:
        # The final chunk's delta.content may be None, hence the `or ""`
        answer += chunk.choices[0].delta.content or ""
    return answer

# Run async example
question = "Summarize the document."
answer = asyncio.run(async_answer_question(question))
print("Async answer:", answer)
```

Expected output:

```text
Async answer: The document summarizes ...
```
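For long documents, it also helps to split the text into overlapping chunks before embedding, so each retrieved passage fits comfortably in the prompt. A minimal sketch without library helpers (LangChain's `RecursiveCharacterTextSplitter` offers a more robust, separator-aware version):

```python
def chunk_text(text: str, chunk_size: int = 500, overlap: int = 50) -> list[str]:
    """Split text into fixed-size chunks that overlap, so ideas aren't cut off at boundaries."""
    if overlap >= chunk_size:
        raise ValueError("overlap must be smaller than chunk_size")
    chunks = []
    start = 0
    while start < len(text):
        chunks.append(text[start:start + chunk_size])
        start += chunk_size - overlap
    return chunks
```

Each chunk can then be wrapped in a LangChain Document and indexed with FAISS.from_documents as in the main example.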

Troubleshooting

  • If you get empty or irrelevant answers, ensure your documents are properly loaded and indexed.
  • Check that your OPENAI_API_KEY environment variable is set correctly.
  • For large documents, increase k in similarity_search to retrieve more context.
  • If you hit rate limits, consider adding retry logic or upgrading your API plan.
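The retry logic can be a small wrapper around any call. A minimal sketch with exponential backoff and jitter; in production you would catch openai.RateLimitError specifically rather than bare Exception:

```python
import random
import time

def with_retries(fn, max_attempts: int = 5, base_delay: float = 1.0):
    """Call fn(), retrying with exponential backoff plus jitter on failure."""
    for attempt in range(max_attempts):
        try:
            return fn()
        except Exception:
            if attempt == max_attempts - 1:
                raise  # out of attempts: re-raise the last error
            # Wait base_delay * 2^attempt, scaled by up to 2x random jitter
            time.sleep(base_delay * (2 ** attempt) * (1 + random.random()))

# Example usage:
# answer = with_retries(lambda: answer_question(question))
```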

Key Takeaways

  • Use OpenAIEmbeddings to vectorize documents for semantic search.
  • Store embeddings in a vector database like FAISS for fast retrieval.
  • Query with gpt-4o-mini chat completions using retrieved context for accurate answers.
  • Async and streaming APIs improve responsiveness for interactive applications.
  • Proper environment setup and document preprocessing are critical for success.
Verified 2026-04 · gpt-4o-mini, text-embedding-3-small