Intermediate · 4 min read

How to build a document QA system

Quick answer
Build a document QA system by embedding your documents with OpenAI embeddings, storing the vectors in a vector store such as FAISS, and answering questions with a chat model such as gpt-4o-mini grounded in the retrieved context. The OpenAI SDK provides both the embeddings and chat completions needed to implement retrieval-augmented generation (RAG).

PREREQUISITES

  • Python 3.8+
  • OpenAI API key (free tier works)
  • pip install "openai>=1.0" langchain-openai langchain-community faiss-cpu

Setup

Install required Python packages and set your OPENAI_API_KEY environment variable.

  • Install the required packages with pip (command below).
  • Export your API key: export OPENAI_API_KEY='your_api_key' (Linux/macOS) or set it in your environment variables on Windows.

```bash
pip install "openai>=1.0" langchain-openai langchain-community faiss-cpu
```
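Before making any API calls, it helps to confirm the key is actually visible to Python. A minimal sketch; the `check_api_key` helper is illustrative, not part of the SDK:

```python
import os

def check_api_key() -> bool:
    """Return True if OPENAI_API_KEY is set and non-empty."""
    return bool(os.environ.get("OPENAI_API_KEY", ""))

if __name__ == "__main__":
    if not check_api_key():
        raise SystemExit("OPENAI_API_KEY is not set; export it before running the examples.")
    print("Environment looks good.")
```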

Step by step

This example loads documents, creates embeddings with OpenAI, indexes them with FAISS, and queries gpt-4o-mini to generate answers.

```python
import os

from openai import OpenAI
from langchain_openai import OpenAIEmbeddings
from langchain_community.vectorstores import FAISS
from langchain_community.document_loaders import TextLoader

# Initialize the OpenAI client
client = OpenAI(api_key=os.environ["OPENAI_API_KEY"])

# Load documents from text files
loader = TextLoader("./docs/example.txt")
docs = loader.load()

# Create embeddings (the LangChain wrapper reads OPENAI_API_KEY from the environment)
embeddings = OpenAIEmbeddings(model="text-embedding-3-small")

# Build a FAISS index over the document embeddings
index = FAISS.from_documents(docs, embeddings)

# Define a function to answer questions

def answer_question(question: str) -> str:
    # Retrieve the most relevant documents for the question
    results = index.similarity_search(question, k=3)
    context = "\n".join(doc.page_content for doc in results)

    # Prepare chat messages with the retrieved context
    messages = [
        {"role": "system", "content": "You are a helpful assistant answering questions based on the provided context."},
        {"role": "user", "content": f"Context:\n{context}\n\nQuestion: {question}"},
    ]

    # Query the chat completion endpoint
    response = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=messages,
    )
    return response.choices[0].message.content

# Example usage
question = "What is the main topic of the document?"
answer = answer_question(question)
print("Answer:", answer)
```

Expected output:

```text
Answer: The main topic of the document is ...
```
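Under the hood, `similarity_search` ranks document vectors by their similarity to the query vector. A minimal sketch of cosine-similarity ranking in plain Python (FAISS uses optimized index structures, but the principle is the same; `top_k` here is illustrative, not a FAISS API):

```python
import math

def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

def top_k(query_vec: list[float], doc_vecs: list[list[float]], k: int = 3) -> list[int]:
    """Return indices of the k document vectors most similar to the query."""
    ranked = sorted(range(len(doc_vecs)),
                    key=lambda i: cosine(query_vec, doc_vecs[i]),
                    reverse=True)
    return ranked[:k]

# Toy 2-D "embeddings": doc 0 points the same way as the query, doc 2 is close
docs_2d = [[1.0, 0.0], [0.0, 1.0], [0.7, 0.7]]
print(top_k([1.0, 0.0], docs_2d, k=2))  # → [0, 2]
```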

Common variations

You can adapt the system by:

  • Using async calls with asyncio and AsyncOpenAI (await async_client.chat.completions.create(...)).
  • Switching to streaming responses for real-time output.
  • Using different embedding models like text-embedding-3-large for better accuracy.
  • Replacing FAISS with other vector stores like Chroma or Pinecone.
```python
import asyncio

from openai import AsyncOpenAI

# The async client exposes the same API surface with awaitable methods
async_client = AsyncOpenAI(api_key=os.environ["OPENAI_API_KEY"])

async def async_answer_question(question: str) -> str:
    results = index.similarity_search(question, k=3)
    context = "\n".join(doc.page_content for doc in results)
    messages = [
        {"role": "system", "content": "You are a helpful assistant answering questions based on the provided context."},
        {"role": "user", "content": f"Context:\n{context}\n\nQuestion: {question}"},
    ]
    # stream=True yields chunks as they are generated
    stream = await async_client.chat.completions.create(
        model="gpt-4o-mini",
        messages=messages,
        stream=True,
    )
    answer = ""
    async for chunk in stream:
        # The final chunk's delta.content may be None, hence the `or ""`
        answer += chunk.choices[0].delta.content or ""
    return answer

# Run async example
question = "Summarize the document."
answer = asyncio.run(async_answer_question(question))
print("Async answer:", answer)
```

Expected output:

```text
Async answer: The document summarizes ...
```
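For long documents, it also helps to split the text into overlapping chunks before embedding, so each retrieved passage fits comfortably in the prompt. A minimal sketch without library helpers (LangChain's `RecursiveCharacterTextSplitter` offers a more robust, separator-aware version):

```python
def chunk_text(text: str, chunk_size: int = 500, overlap: int = 50) -> list[str]:
    """Split text into fixed-size chunks that overlap, so ideas aren't cut off at boundaries."""
    if overlap >= chunk_size:
        raise ValueError("overlap must be smaller than chunk_size")
    chunks = []
    start = 0
    while start < len(text):
        chunks.append(text[start:start + chunk_size])
        start += chunk_size - overlap
    return chunks
```

Each chunk can then be wrapped in a LangChain Document and indexed with FAISS.from_documents as in the main example.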

Troubleshooting

  • If you get empty or irrelevant answers, ensure your documents are properly loaded and indexed.
  • Check that your OPENAI_API_KEY environment variable is set correctly.
  • For large documents, increase k in similarity_search to retrieve more context.
  • If you hit rate limits, consider adding retry logic or upgrading your API plan.
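The retry logic can be a small wrapper around any call. A minimal sketch with exponential backoff and jitter; in production you would catch openai.RateLimitError specifically rather than bare Exception:

```python
import random
import time

def with_retries(fn, max_attempts: int = 5, base_delay: float = 1.0):
    """Call fn(), retrying with exponential backoff plus jitter on failure."""
    for attempt in range(max_attempts):
        try:
            return fn()
        except Exception:
            if attempt == max_attempts - 1:
                raise  # out of attempts: re-raise the last error
            # Wait base_delay * 2^attempt, scaled by up to 2x random jitter
            time.sleep(base_delay * (2 ** attempt) * (1 + random.random()))

# Example usage:
# answer = with_retries(lambda: answer_question(question))
```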

Key Takeaways

  • Use OpenAIEmbeddings to vectorize documents for semantic search.
  • Store embeddings in a vector database like FAISS for fast retrieval.
  • Query with gpt-4o-mini chat completions using retrieved context for accurate answers.
  • Async and streaming APIs improve responsiveness for interactive applications.
  • Proper environment setup and document preprocessing are critical for success.
Verified 2026-04 · gpt-4o-mini, text-embedding-3-small