How-to · Intermediate · 3 min read

How to reduce hallucinations in RAG

Quick answer
To reduce hallucinations in RAG, ensure high-quality, relevant document retrieval and use precise prompt engineering to ground the model's responses. Additionally, implement verification steps like cross-checking retrieved data or using confidence scoring to filter unreliable outputs.

PREREQUISITES

  • Python 3.8+
  • OpenAI API key (free tier works)
  • pip install "openai>=1.0" (quoted so the shell does not treat > as redirection)
  • Basic knowledge of RAG architecture

Setup

Install the openai Python package and set your API key as an environment variable for secure access.

bash
pip install "openai>=1.0"

Step by step

This example demonstrates a simple RAG pipeline using OpenAI's gpt-4o model with a mock retrieval step. It reduces hallucinations by grounding the LLM's response in the retrieved documents.

python
import os
from openai import OpenAI

# Initialize OpenAI client
client = OpenAI(api_key=os.environ["OPENAI_API_KEY"])

# Mock retrieval function returning relevant documents
# In production, use a vector search or database query

def retrieve_documents(query):
    docs = [
        "Document 1: The Eiffel Tower is located in Paris, France.",
        "Document 2: The Eiffel Tower was completed in 1889."
    ]
    return docs

# Compose prompt with retrieved documents to ground the answer
query = "Where is the Eiffel Tower located?"
docs = retrieve_documents(query)
docs_text = "\n".join(docs)
prompt = f"Use the following documents to answer the question accurately:\n{docs_text}\nQuestion: {query}\nAnswer:" 

# Call the LLM with grounded prompt
response = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": prompt}]
)

print("Answer:", response.choices[0].message.content.strip())

output
Answer: The Eiffel Tower is located in Paris, France.

Common variations

To further reduce hallucinations, you can:

  • Use more advanced retrieval methods like vector similarity search with FAISS or Chroma.
  • Incorporate confidence scoring or answer verification by querying multiple documents.
  • Try a different model, such as claude-3-5-sonnet-20241022, and compare factual accuracy on your own retrieval corpus.
  • Implement asynchronous calls or streaming for real-time applications.
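
As an illustration of the first variation, here is a minimal sketch of similarity-based retrieval. It uses toy bag-of-words vectors and cosine similarity so it runs with no extra dependencies; a production pipeline would use learned embeddings with a vector store such as FAISS or Chroma. The helper names (`embed`, `retrieve_top_k`) are illustrative, not part of any library.

```python
import math
from collections import Counter

def embed(text):
    # Toy bag-of-words "embedding"; real systems use learned dense embeddings
    return Counter(text.lower().split())

def cosine(a, b):
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def retrieve_top_k(query, corpus, k=2):
    # Rank documents by similarity to the query and keep the top k
    q = embed(query)
    return sorted(corpus, key=lambda d: cosine(q, embed(d)), reverse=True)[:k]

corpus = [
    "The Eiffel Tower is located in Paris, France.",
    "The Louvre is a museum in Paris.",
    "Mount Fuji is the tallest mountain in Japan.",
]
print(retrieve_top_k("Where is the Eiffel Tower located?", corpus, k=1))
```

Swapping this for the mock `retrieve_documents` above lets the pipeline return the most relevant passages instead of a fixed list, which directly improves grounding quality.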

Troubleshooting

If the model hallucinates despite grounding, try these steps:

  • Verify the retrieval step returns relevant and up-to-date documents.
  • Increase the context window or chunk size of retrieved documents.
  • Use explicit instructions in the prompt to "only answer based on the provided documents."
  • Check for API errors or rate limits that might truncate responses.
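
The third bullet can be made concrete with a small prompt-building helper. The function name and exact wording below are illustrative assumptions, not part of the OpenAI API; the point is the explicit refusal instruction.

```python
def build_grounded_prompt(query, docs):
    # Explicitly constrain the model to the retrieved context,
    # with an instructed fallback when the context is insufficient
    docs_text = "\n".join(docs)
    return (
        "Answer ONLY using the documents below. "
        'If the answer is not in them, reply "I don\'t know."\n\n'
        f"Documents:\n{docs_text}\n\n"
        f"Question: {query}\nAnswer:"
    )

print(build_grounded_prompt(
    "Where is the Eiffel Tower located?",
    ["The Eiffel Tower is located in Paris, France."],
))
```

This drop-in replacement for the prompt string in the main example gives the model a sanctioned way to decline rather than invent an answer.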

Key Takeaways

  • Ground LLM responses on high-quality, relevant retrieved documents to reduce hallucinations.
  • Use clear prompt instructions to constrain the model to the provided context.
  • Implement verification techniques like cross-checking or confidence scoring for reliability.
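
One simple, admittedly crude, form of the verification mentioned above is a lexical-overlap check that flags answers whose content words do not appear in the retrieved documents. Real systems would use entailment models or citation checking; the `support_score` helper below is a hypothetical sketch.

```python
import re

def support_score(answer, docs):
    # Fraction of the answer's content words (4+ letters)
    # that appear somewhere in the retrieved documents
    words = set(re.findall(r"[a-z]{4,}", answer.lower()))
    if not words:
        return 0.0
    doc_text = " ".join(docs).lower()
    supported = sum(1 for w in words if w in doc_text)
    return supported / len(words)

docs = ["The Eiffel Tower is located in Paris, France."]
print(support_score("The Eiffel Tower is in Paris.", docs))       # → 1.0
print(support_score("It was designed by aliens in 1920.", docs))  # → 0.0
```

Answers scoring below a threshold (say 0.5) can be rejected or rerouted for a second retrieval pass instead of being shown to the user.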
Verified 2026-04 · gpt-4o, claude-3-5-sonnet-20241022