How to · Intermediate · 4 min read

How to build a question answering system over documents

Quick answer
Build a question answering system over documents by first converting the documents into vector embeddings with an embedding model (for example via LangChain's OpenAIEmbeddings wrapper), then storing them in a vector database such as FAISS. To answer a query, embed the question, retrieve the most relevant document chunks, and pass them as context to an LLM such as gpt-4o to generate the answer.

PREREQUISITES

  • Python 3.8+
  • OpenAI API key (free tier works)
  • pip install openai langchain langchain-openai langchain-community faiss-cpu

Setup

Install required Python packages and set your OpenAI API key as an environment variable.

bash
pip install openai langchain langchain-openai langchain-community faiss-cpu

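The API key can then be exported in your shell before running any of the code below (the key value shown is a placeholder; use your own from the OpenAI dashboard):

shell
# Set the key for the current shell session; add to ~/.bashrc to persist
export OPENAI_API_KEY="sk-your-key-here"
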
Step by step

This example loads text documents, creates embeddings, stores them in a FAISS vector store, and queries with an LLM to answer questions based on document content.

python
import os
from langchain_openai import OpenAIEmbeddings
from langchain_community.vectorstores import FAISS
from langchain_community.document_loaders import TextLoader
from openai import OpenAI

# Load documents
loader = TextLoader("./docs/sample.txt")
docs = loader.load()

# Create embeddings
embeddings = OpenAIEmbeddings(openai_api_key=os.environ["OPENAI_API_KEY"])

# Build FAISS vector store
vectorstore = FAISS.from_documents(docs, embeddings)

# Initialize OpenAI client
client = OpenAI(api_key=os.environ["OPENAI_API_KEY"])

# Query function: retrieve relevant chunks, then ask the LLM
def answer_question(question: str) -> str:
    # Embed question and retrieve relevant docs
    relevant_docs = vectorstore.similarity_search(question, k=3)
    context = "\n\n".join([doc.page_content for doc in relevant_docs])

    # Prepare prompt
    prompt_template = """You are a helpful assistant. Use the following context to answer the question.

Context:
{context}

Question: {question}
Answer:"""
    prompt = prompt_template.format(context=context, question=question)

    # Call LLM
    response = client.chat.completions.create(
        model="gpt-4o",
        messages=[{"role": "user", "content": prompt}]
    )
    return response.choices[0].message.content

# Example usage
question = "What is the main topic of the documents?"
answer = answer_question(question)
print("Q:", question)
print("A:", answer)

output
Q: What is the main topic of the documents?
A: [LLM-generated answer based on document content]

Common variations

  • Swap in Anthropic's claude-3-5-sonnet-20241022 (via the Anthropic SDK) if its reasoning or coding behavior suits your documents better.
  • Implement async calls for higher throughput in production.
  • Use chunking and overlap strategies to improve retrieval quality on large documents.
  • Swap FAISS with other vector stores like Chroma or Weaviate depending on scale and features.
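
The chunking-and-overlap idea above can be sketched with a plain sliding window. In practice LangChain's RecursiveCharacterTextSplitter does this with smarter boundary handling, but the core logic looks like the following (chunk_text and its parameters are illustrative, not part of any library):

python
from typing import List

def chunk_text(text: str, chunk_size: int = 200, overlap: int = 50) -> List[str]:
    """Split text into fixed-size chunks, each repeating the last
    `overlap` characters of the previous chunk."""
    assert 0 <= overlap < chunk_size, "overlap must be smaller than chunk_size"
    chunks = []
    start = 0
    while start < len(text):
        chunks.append(text[start:start + chunk_size])
        start += chunk_size - overlap
    return chunks

chunks = chunk_text("".join(str(i % 10) for i in range(500)))
# Adjacent chunks share a 50-character window, so a sentence cut at a
# chunk boundary still appears whole in one of the two chunks.
print(len(chunks), chunks[0][-50:] == chunks[1][:50])  # → 4 True

The overlap is what keeps retrieval robust: without it, a fact split across a chunk boundary may never match the question's embedding well.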

Troubleshooting

  • If retrieval returns irrelevant results, increase k in similarity_search or improve document chunking.
  • If API calls fail, verify your OPENAI_API_KEY environment variable is set correctly.
  • For slow responses, consider caching embeddings or using smaller models like gpt-4o-mini.
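
Caching embeddings, as suggested above, avoids re-embedding identical text across runs or repeated chunks. A minimal in-memory sketch follows; embed here is a hypothetical stand-in for a real embedding API call, made deterministic so the example runs offline:

python
import hashlib
from typing import Dict, List

def embed(text: str) -> List[float]:
    # Hypothetical stand-in for a real embedding call (e.g. OpenAI's API)
    return [b / 255 for b in hashlib.sha256(text.encode()).digest()[:4]]

_cache: Dict[str, List[float]] = {}

def cached_embed(text: str) -> List[float]:
    # Key on a content hash so identical chunks are embedded only once
    key = hashlib.sha256(text.encode()).hexdigest()
    if key not in _cache:
        _cache[key] = embed(text)
    return _cache[key]

v1 = cached_embed("hello world")
v2 = cached_embed("hello world")
print(v1 is v2)  # → True: the second call returns the cached vector

For persistence across runs, a FAISS vector store can also be saved to disk with vectorstore.save_local(...) and reloaded with FAISS.load_local(...), which skips re-embedding entirely.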

Key Takeaways

  • Convert documents into vector embeddings to enable semantic search.
  • Use a vector database like FAISS to efficiently retrieve relevant document chunks.
  • Feed retrieved context plus the question to an LLM for accurate answers.
  • Experiment with different models and vector stores to optimize performance.
  • Proper chunking and overlap improve retrieval relevance and answer quality.
Verified 2026-04 · gpt-4o, claude-3-5-sonnet-20241022