How-to · Intermediate · 4 min read

How to build a study assistant with RAG

Quick answer
Build a study assistant with RAG by combining a vector database for document retrieval with a large language model (LLM) such as gpt-4o or gpt-4o-mini. First, embed your study materials with OpenAI embeddings; at query time, retrieve the most relevant passages and pass them to the LLM as context so it can give accurate, grounded answers.

PREREQUISITES

  • Python 3.8+
  • OpenAI API key (free tier works)
  • pip install "openai>=1.0"
  • pip install faiss-cpu
  • pip install numpy

Setup

Install required Python packages and set your OPENAI_API_KEY environment variable.

  • Use openai SDK v1+ for LLM and embeddings.
  • Use faiss-cpu for vector similarity search.
bash
pip install openai faiss-cpu numpy
output
Collecting openai...
Collecting faiss-cpu...
Collecting numpy...
Successfully installed openai faiss-cpu numpy
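
Before running anything, it can help to confirm that Python can actually see the key. This is just a minimal sanity check, not part of the assistant itself.

python
import os

# Fail fast if the API key is missing, instead of failing at the first API call
if not os.environ.get("OPENAI_API_KEY"):
    raise RuntimeError("OPENAI_API_KEY is not set; export it before running the examples")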

Step by step

This example shows how to embed study documents, build a vector index with FAISS, and answer questions with gpt-4o-mini using the retrieved context.

python
import os
import numpy as np
import faiss
from openai import OpenAI

# Initialize OpenAI client
client = OpenAI(api_key=os.environ["OPENAI_API_KEY"])

# Sample study documents
documents = [
    "Photosynthesis is the process by which green plants convert sunlight into energy.",
    "The mitochondria is the powerhouse of the cell.",
    "Newton's second law states that Force equals mass times acceleration.",
    "The capital of France is Paris."
]

# Step 1: Embed documents
response = client.embeddings.create(
    model="text-embedding-3-small",
    input=documents
)
embeddings = np.array([data.embedding for data in response.data]).astype('float32')

# Step 2: Build FAISS index
index = faiss.IndexFlatL2(embeddings.shape[1])
index.add(embeddings)

# Step 3: Query function

def query_study_assistant(question, k=2):
    # Embed question
    q_resp = client.embeddings.create(model="text-embedding-3-small", input=[question])
    q_emb = np.array(q_resp.data[0].embedding).astype('float32').reshape(1, -1)

    # Search top k relevant docs
    distances, indices = index.search(q_emb, k)
    context = "\n".join([documents[i] for i in indices[0]])

    # Compose prompt with context
    prompt = f"Use the following study notes to answer the question.\n\n{context}\n\nQuestion: {question}\nAnswer:" 

    # Call LLM
    chat_resp = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[{"role": "user", "content": prompt}]
    )
    return chat_resp.choices[0].message.content

# Example usage
question = "What is the role of mitochondria in cells?"
answer = query_study_assistant(question)
print("Q:", question)
print("A:", answer)
output
Q: What is the role of mitochondria in cells?
A: The mitochondria is the powerhouse of the cell, responsible for producing energy through cellular respiration.

Common variations

  • Use the AsyncOpenAI client and await client.chat.completions.create(...) for concurrent requests (see the sketch after this list).
  • Use gpt-4o when answer quality matters most and gpt-4o-mini when cost matters most.
  • Use other vector stores such as Chroma, or faiss-gpu, for scalability.
  • Incorporate chunking and metadata for large documents.
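
For the async variation, the v1 SDK provides an AsyncOpenAI client whose methods are awaited directly. The sketch below reuses the same prompt shape as the main example; the questions and hard-coded contexts are illustrative placeholders rather than retrieved passages.

python
import asyncio
from openai import AsyncOpenAI

async_client = AsyncOpenAI()  # reads OPENAI_API_KEY from the environment

async def answer(question, context):
    # Same prompt shape as the synchronous example above
    prompt = f"Use the following study notes to answer the question.\n\n{context}\n\nQuestion: {question}\nAnswer:"
    resp = await async_client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[{"role": "user", "content": prompt}],
    )
    return resp.choices[0].message.content

async def main():
    # Answer several questions concurrently; contexts here are placeholders
    questions = ["What is photosynthesis?", "State Newton's second law."]
    contexts = [
        "Photosynthesis is the process by which green plants convert sunlight into energy.",
        "Newton's second law states that Force equals mass times acceleration.",
    ]
    answers = await asyncio.gather(*(answer(q, c) for q, c in zip(questions, contexts)))
    for q, a in zip(questions, answers):
        print("Q:", q)
        print("A:", a)

asyncio.run(main())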

Troubleshooting

  • If embeddings are empty or errors occur, verify your OPENAI_API_KEY is set correctly.
  • If FAISS index search returns no results, check embedding dimensions match.
  • For incomplete answers, increase k to retrieve more context.
  • Watch the token limit of the prompt; chunk large documents accordingly (a chunking sketch follows this list).
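
For the chunking advice above, a simple fixed-size splitter is often enough to start with. The chunk_text helper, the 500/50 character sizes, and the file name below are illustrative, not part of the original example.

python
def chunk_text(text, max_chars=500, overlap=50):
    # Split a long document into overlapping character windows so each
    # retrieved chunk stays comfortably inside the prompt's token budget
    chunks = []
    start = 0
    while start < len(text):
        end = min(start + max_chars, len(text))
        chunks.append(text[start:end])
        if end == len(text):
            break
        start = end - overlap
    return chunks

# Example: chunk a long source before embedding, then index the chunks
# exactly as in Step 1 and Step 2 above
long_doc = open("biology_chapter_3.txt").read()  # illustrative file name
documents = chunk_text(long_doc)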

Key Takeaways

  • Use OpenAI embeddings to convert study materials into vectors for retrieval.
  • Combine vector search with LLM prompts to provide context-aware study answers.
  • Adjust retrieval count and model size to balance cost and accuracy.
  • Ensure environment variables and embedding dimensions are consistent.
  • Chunk large documents to stay within token limits for LLM input.
Verified 2026-04 · gpt-4o, gpt-4o-mini, text-embedding-3-small