Intermediate · 4 min read

How to build a legal RAG system

Quick answer
Build a legal RAG system by pairing a vector database that stores embeddings of your legal documents with a large language model (LLM) such as gpt-4o that generates answers from retrieved context. Use OpenAI embeddings to convert legal texts into vectors, query those vectors to find the most relevant documents, and feed them as context to the LLM for grounded legal responses.

PREREQUISITES

  • Python 3.8+
  • OpenAI API key (free tier works)
  • pip install "openai>=1.0"
  • pip install faiss-cpu or chromadb

Setup

Install the required Python packages and set the OPENAI_API_KEY environment variable.

  • Use faiss-cpu or chromadb for vector search.
  • Set OPENAI_API_KEY in your environment.
bash
pip install openai faiss-cpu
output
Collecting openai
Collecting faiss-cpu
Successfully installed openai-1.x faiss-cpu-1.x

Step by step

This example shows how to embed legal documents, store them in a vector index, query relevant documents, and generate answers using gpt-4o.

python
import os
from openai import OpenAI
import faiss
import numpy as np

# Initialize OpenAI client
client = OpenAI(api_key=os.environ["OPENAI_API_KEY"])

# Sample legal documents
legal_docs = [
    "Section 1: Contract terms and obligations.",
    "Section 2: Intellectual property rights.",
    "Section 3: Liability and indemnification.",
    "Section 4: Termination clauses and conditions."
]

# Step 1: Create embeddings for legal docs
response = client.embeddings.create(
    model="text-embedding-3-small",
    input=legal_docs
)
embeddings = [data.embedding for data in response.data]

# Step 2: Build FAISS index
dimension = len(embeddings[0])
index = faiss.IndexFlatL2(dimension)
index.add(np.array(embeddings).astype('float32'))

# Step 3: Query embedding for user question
query = "What are the termination conditions in the contract?"
query_response = client.embeddings.create(
    model="text-embedding-3-small",
    input=[query]
)
query_embedding = np.array(query_response.data[0].embedding).astype('float32')

# Step 4: Search top 2 relevant docs
k = 2
D, I = index.search(np.array([query_embedding]), k)
relevant_docs = [legal_docs[i] for i in I[0]]

# Step 5: Generate answer with context
context = "\n".join(relevant_docs)
prompt = f"You are a legal assistant. Use the following context to answer the question.\nContext:\n{context}\nQuestion: {query}\nAnswer:"

chat_response = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": prompt}]
)
answer = chat_response.choices[0].message.content
print("Answer:", answer)
output
Answer: The termination conditions in the contract include the clauses outlined in Section 4, which specify the conditions under which the contract may be terminated by either party.
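
IndexFlatL2 ranks by Euclidean distance. Embeddings are often compared by cosine similarity instead; for unit-normalized vectors (which OpenAI's embedding models return) the two rankings agree, but it is worth understanding the cosine version explicitly. Here is a minimal sketch in plain NumPy, using toy 4-dimensional vectors as stand-ins for real embeddings:

```python
import numpy as np

def cosine_top_k(query_vec, doc_vecs, k=2):
    """Rank documents by cosine similarity to the query vector."""
    q = query_vec / np.linalg.norm(query_vec)
    d = doc_vecs / np.linalg.norm(doc_vecs, axis=1, keepdims=True)
    sims = d @ q  # cosine similarity of each doc against the query
    return np.argsort(-sims)[:k]  # indices of the k most similar docs

# Toy vectors standing in for real embedding output
docs = np.array([[1.0, 0.0, 0.0, 0.0],
                 [0.0, 1.0, 0.0, 0.0],
                 [0.7, 0.7, 0.0, 0.0]])
query = np.array([1.0, 0.1, 0.0, 0.0])
print(cosine_top_k(query, docs))  # doc 0 is closest, then doc 2
```

To get the same behavior inside FAISS, normalize your vectors and use faiss.IndexFlatIP (inner product) instead of IndexFlatL2.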

Common variations

You can enhance your legal RAG system by:

  • Using chromadb instead of FAISS for easier setup and persistence.
  • Switching to gpt-4o-mini for cost-effective inference.
  • Implementing async calls with asyncio for scalable querying.
  • Adding document chunking and metadata filtering for more precise retrieval.
python
import asyncio
import os

from openai import AsyncOpenAI

async def async_legal_rag():
    # AsyncOpenAI mirrors the OpenAI client, but its methods are awaitable
    client = AsyncOpenAI(api_key=os.environ["OPENAI_API_KEY"])
    query = "Explain liability clauses."
    embed_resp = await client.embeddings.create(
        model="text-embedding-3-small",
        input=[query]
    )
    query_emb = embed_resp.data[0].embedding
    # Run your vector search here (FAISS calls stay synchronous)
    # Then generate the answer asynchronously
    prompt = f"Use legal docs context to answer: {query}"
    chat_resp = await client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[{"role": "user", "content": prompt}]
    )
    print(chat_resp.choices[0].message.content)

asyncio.run(async_legal_rag())
output
Liability clauses define the responsibilities and limits of each party in case of damages or losses, protecting parties from excessive claims.

Troubleshooting

  • If embeddings return errors, verify your OPENAI_API_KEY and model name.
  • If vector search returns irrelevant results, increase the number of retrieved documents or improve document chunking.
  • If the LLM output is vague, provide clearer context or use system prompts to instruct the model.
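
Chunking, mentioned in the last two points, can start as simple fixed-size splitting with overlap so context is not lost at chunk boundaries. A minimal word-based sketch (production systems often split on sentence or section boundaries instead):

```python
def chunk_text(text, chunk_size=50, overlap=10):
    """Split text into overlapping chunks of roughly chunk_size words."""
    words = text.split()
    step = chunk_size - overlap  # how far the window advances each time
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        if start + chunk_size >= len(words):
            break  # final chunk already covers the end of the text
    return chunks

doc = " ".join(f"word{i}" for i in range(120))
chunks = chunk_text(doc, chunk_size=50, overlap=10)
print(len(chunks))  # 3 chunks: words 0-49, 40-89, 80-119
```

Each chunk would then be embedded and indexed individually, with metadata (e.g. section number) attached so retrieval can be filtered before it reaches the LLM.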

Key Takeaways

  • Use OpenAI embeddings to vectorize legal documents for semantic search.
  • Combine vector search results as context to guide the LLM for accurate legal answers.
  • Choose models like gpt-4o or gpt-4o-mini, balancing cost and performance.
  • Implement async and chunking for scalable, precise legal RAG systems.
  • Validate API keys and model names to avoid common errors.
Verified 2026-04 · gpt-4o, gpt-4o-mini, text-embedding-3-small