How to · Intermediate · 4 min read

How to build a question answering system over documents

Quick answer
Build a question answering system over documents by first converting the documents into vector embeddings with an embedding model (for example via LangChain's OpenAIEmbeddings wrapper), then storing them in a vector database such as FAISS. To answer a query, embed the question, retrieve the most relevant document chunks, and pass them as context to an LLM such as gpt-4o to generate the answer.

PREREQUISITES

  • Python 3.8+
  • OpenAI API key (free tier works)
  • pip install openai langchain langchain-openai langchain-community faiss-cpu

Setup

Install required Python packages and set your OpenAI API key as an environment variable.

bash
pip install openai langchain langchain-openai langchain-community faiss-cpu

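The API key can then be exported in your shell before running any of the code below (the key value shown is a placeholder; use your own from the OpenAI dashboard):

shell
# Set the key for the current shell session; add to ~/.bashrc to persist
export OPENAI_API_KEY="sk-your-key-here"
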
Step by step

This example loads text documents, creates embeddings, stores them in a FAISS vector store, and queries with an LLM to answer questions based on document content.

python
import os
from langchain_openai import OpenAIEmbeddings
from langchain_community.vectorstores import FAISS
from langchain_community.document_loaders import TextLoader
from openai import OpenAI

# Load documents
loader = TextLoader("./docs/sample.txt")
docs = loader.load()

# Create embeddings
embeddings = OpenAIEmbeddings(openai_api_key=os.environ["OPENAI_API_KEY"])

# Build FAISS vector store
vectorstore = FAISS.from_documents(docs, embeddings)

# Initialize OpenAI client
client = OpenAI(api_key=os.environ["OPENAI_API_KEY"])

# Query function: retrieve relevant chunks, then ask the LLM
def answer_question(question: str) -> str:
    # Embed question and retrieve relevant docs
    relevant_docs = vectorstore.similarity_search(question, k=3)
    context = "\n\n".join([doc.page_content for doc in relevant_docs])

    # Prepare prompt
    prompt_template = """You are a helpful assistant. Use the following context to answer the question.

Context:
{context}

Question: {question}
Answer:"""
    prompt = prompt_template.format(context=context, question=question)

    # Call LLM
    response = client.chat.completions.create(
        model="gpt-4o",
        messages=[{"role": "user", "content": prompt}]
    )
    return response.choices[0].message.content

# Example usage
question = "What is the main topic of the documents?"
answer = answer_question(question)
print("Q:", question)
print("A:", answer)

output
Q: What is the main topic of the documents?
A: [LLM-generated answer based on document content]

Common variations

  • Swap in Anthropic's claude-3-5-sonnet-20241022 (via the Anthropic SDK) if its reasoning or coding behavior suits your documents better.
  • Implement async calls for higher throughput in production.
  • Use chunking and overlap strategies to improve retrieval quality on large documents.
  • Swap FAISS with other vector stores like Chroma or Weaviate depending on scale and features.
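
The chunking-and-overlap idea above can be sketched with a plain sliding window. In practice LangChain's RecursiveCharacterTextSplitter does this with smarter boundary handling, but the core logic looks like the following (chunk_text and its parameters are illustrative, not part of any library):

python
from typing import List

def chunk_text(text: str, chunk_size: int = 200, overlap: int = 50) -> List[str]:
    """Split text into fixed-size chunks, each repeating the last
    `overlap` characters of the previous chunk."""
    assert 0 <= overlap < chunk_size, "overlap must be smaller than chunk_size"
    chunks = []
    start = 0
    while start < len(text):
        chunks.append(text[start:start + chunk_size])
        start += chunk_size - overlap
    return chunks

chunks = chunk_text("".join(str(i % 10) for i in range(500)))
# Adjacent chunks share a 50-character window, so a sentence cut at a
# chunk boundary still appears whole in one of the two chunks.
print(len(chunks), chunks[0][-50:] == chunks[1][:50])  # → 4 True

The overlap is what keeps retrieval robust: without it, a fact split across a chunk boundary may never match the question's embedding well.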

Troubleshooting

  • If retrieval returns irrelevant results, increase k in similarity_search or improve document chunking.
  • If API calls fail, verify your OPENAI_API_KEY environment variable is set correctly.
  • For slow responses, consider caching embeddings or using smaller models like gpt-4o-mini.
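
Caching embeddings, as suggested above, avoids re-embedding identical text across runs or repeated chunks. A minimal in-memory sketch follows; embed here is a hypothetical stand-in for a real embedding API call, made deterministic so the example runs offline:

python
import hashlib
from typing import Dict, List

def embed(text: str) -> List[float]:
    # Hypothetical stand-in for a real embedding call (e.g. OpenAI's API)
    return [b / 255 for b in hashlib.sha256(text.encode()).digest()[:4]]

_cache: Dict[str, List[float]] = {}

def cached_embed(text: str) -> List[float]:
    # Key on a content hash so identical chunks are embedded only once
    key = hashlib.sha256(text.encode()).hexdigest()
    if key not in _cache:
        _cache[key] = embed(text)
    return _cache[key]

v1 = cached_embed("hello world")
v2 = cached_embed("hello world")
print(v1 is v2)  # → True: the second call returns the cached vector

For persistence across runs, a FAISS vector store can also be saved to disk with vectorstore.save_local(...) and reloaded with FAISS.load_local(...), which skips re-embedding entirely.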

Key Takeaways

  • Convert documents into vector embeddings to enable semantic search.
  • Use a vector database like FAISS to efficiently retrieve relevant document chunks.
  • Feed retrieved context plus the question to an LLM for accurate answers.
  • Experiment with different models and vector stores to optimize performance.
  • Proper chunking and overlap improve retrieval relevance and answer quality.
Verified 2026-04 · gpt-4o, claude-3-5-sonnet-20241022