How-to · Intermediate · 4 min read

How to build a study assistant with RAG

Quick answer
Build a study assistant with RAG by combining a vector database for document retrieval with a large language model (LLM) such as gpt-4o or gpt-4o-mini. First, embed your study materials with OpenAI embeddings; at query time, retrieve the most relevant passages and pass them to the LLM as context so it can give accurate, grounded answers.

PREREQUISITES

  • Python 3.8+
  • OpenAI API key (free tier works)
  • pip install "openai>=1.0"
  • pip install faiss-cpu
  • pip install numpy

Setup

Install required Python packages and set your OPENAI_API_KEY environment variable.

  • Use openai SDK v1+ for LLM and embeddings.
  • Use faiss-cpu for vector similarity search.
bash
pip install openai faiss-cpu numpy
output
Collecting openai...
Collecting faiss-cpu...
Collecting numpy...
Successfully installed openai faiss-cpu numpy
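
Before running anything, it can help to confirm that Python can actually see the key. This is just a minimal sanity check, not part of the assistant itself.

python
import os

# Fail fast if the API key is missing, instead of failing at the first API call
if not os.environ.get("OPENAI_API_KEY"):
    raise RuntimeError("OPENAI_API_KEY is not set; export it before running the examples")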

Step by step

This example shows how to embed study documents, build a vector index with FAISS, and answer questions with gpt-4o-mini using the retrieved context.

python
import os
import numpy as np
import faiss
from openai import OpenAI

# Initialize OpenAI client
client = OpenAI(api_key=os.environ["OPENAI_API_KEY"])

# Sample study documents
documents = [
    "Photosynthesis is the process by which green plants convert sunlight into energy.",
    "The mitochondria is the powerhouse of the cell.",
    "Newton's second law states that Force equals mass times acceleration.",
    "The capital of France is Paris."
]

# Step 1: Embed documents
response = client.embeddings.create(
    model="text-embedding-3-small",
    input=documents
)
embeddings = np.array([data.embedding for data in response.data]).astype('float32')

# Step 2: Build FAISS index
index = faiss.IndexFlatL2(embeddings.shape[1])
index.add(embeddings)

# Step 3: Query function

def query_study_assistant(question, k=2):
    # Embed question
    q_resp = client.embeddings.create(model="text-embedding-3-small", input=[question])
    q_emb = np.array(q_resp.data[0].embedding).astype('float32').reshape(1, -1)

    # Search top k relevant docs
    distances, indices = index.search(q_emb, k)
    context = "\n".join([documents[i] for i in indices[0]])

    # Compose prompt with context
    prompt = f"Use the following study notes to answer the question.\n\n{context}\n\nQuestion: {question}\nAnswer:" 

    # Call LLM
    chat_resp = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[{"role": "user", "content": prompt}]
    )
    return chat_resp.choices[0].message.content

# Example usage
question = "What is the role of mitochondria in cells?"
answer = query_study_assistant(question)
print("Q:", question)
print("A:", answer)
output
Q: What is the role of mitochondria in cells?
A: The mitochondria is the powerhouse of the cell, responsible for producing energy through cellular respiration.

Common variations

  • Use the AsyncOpenAI client and await client.chat.completions.create(...) for concurrent requests (see the sketch after this list).
  • Use gpt-4o when answer quality matters most and gpt-4o-mini when cost matters most.
  • Use other vector stores such as Chroma, or faiss-gpu, for scalability.
  • Incorporate chunking and metadata for large documents.
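
For the async variation, the v1 SDK provides an AsyncOpenAI client whose methods are awaited directly. The sketch below reuses the same prompt shape as the main example; the questions and hard-coded contexts are illustrative placeholders rather than retrieved passages.

python
import asyncio
from openai import AsyncOpenAI

async_client = AsyncOpenAI()  # reads OPENAI_API_KEY from the environment

async def answer(question, context):
    # Same prompt shape as the synchronous example above
    prompt = f"Use the following study notes to answer the question.\n\n{context}\n\nQuestion: {question}\nAnswer:"
    resp = await async_client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[{"role": "user", "content": prompt}],
    )
    return resp.choices[0].message.content

async def main():
    # Answer several questions concurrently; contexts here are placeholders
    questions = ["What is photosynthesis?", "State Newton's second law."]
    contexts = [
        "Photosynthesis is the process by which green plants convert sunlight into energy.",
        "Newton's second law states that Force equals mass times acceleration.",
    ]
    answers = await asyncio.gather(*(answer(q, c) for q, c in zip(questions, contexts)))
    for q, a in zip(questions, answers):
        print("Q:", q)
        print("A:", a)

asyncio.run(main())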

Troubleshooting

  • If embeddings are empty or errors occur, verify your OPENAI_API_KEY is set correctly.
  • If FAISS index search returns no results, check embedding dimensions match.
  • For incomplete answers, increase k to retrieve more context.
  • Watch the token limit of the prompt; chunk large documents accordingly (a chunking sketch follows this list).
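
For the chunking advice above, a simple fixed-size splitter is often enough to start with. The chunk_text helper, the 500/50 character sizes, and the file name below are illustrative, not part of the original example.

python
def chunk_text(text, max_chars=500, overlap=50):
    # Split a long document into overlapping character windows so each
    # retrieved chunk stays comfortably inside the prompt's token budget
    chunks = []
    start = 0
    while start < len(text):
        end = min(start + max_chars, len(text))
        chunks.append(text[start:end])
        if end == len(text):
            break
        start = end - overlap
    return chunks

# Example: chunk a long source before embedding, then index the chunks
# exactly as in Step 1 and Step 2 above
long_doc = open("biology_chapter_3.txt").read()  # illustrative file name
documents = chunk_text(long_doc)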

Key Takeaways

  • Use OpenAI embeddings to convert study materials into vectors for retrieval.
  • Combine vector search with LLM prompts to provide context-aware study answers.
  • Adjust retrieval count and model size to balance cost and accuracy.
  • Ensure environment variables and embedding dimensions are consistent.
  • Chunk large documents to stay within token limits for LLM input.
Verified 2026-04 · gpt-4o, gpt-4o-mini, text-embedding-3-small