How-to · Intermediate · 3 min read

How to reduce hallucinations in RAG

Quick answer
To reduce hallucinations in RAG, ensure high-quality, relevant document retrieval and use precise prompt engineering to ground the model's responses. Additionally, implement verification steps like cross-checking retrieved data or using confidence scoring to filter unreliable outputs.

PREREQUISITES

  • Python 3.8+
  • OpenAI API key (free tier works)
  • pip install "openai>=1.0" (quoted so the shell does not treat > as redirection)
  • Basic knowledge of RAG architecture

Setup

Install the openai Python package and set your API key as an environment variable for secure access.

bash
pip install "openai>=1.0"

Step by step

This example demonstrates a simple RAG pipeline using OpenAI's gpt-4o model with a mock retrieval step. It reduces hallucinations by grounding the LLM's response in the retrieved documents.

python
import os
from openai import OpenAI

# Initialize OpenAI client
client = OpenAI(api_key=os.environ["OPENAI_API_KEY"])

# Mock retrieval function returning relevant documents
# In production, use a vector search or database query

def retrieve_documents(query):
    docs = [
        "Document 1: The Eiffel Tower is located in Paris, France.",
        "Document 2: The Eiffel Tower was completed in 1889."
    ]
    return docs

# Compose prompt with retrieved documents to ground the answer
query = "Where is the Eiffel Tower located?"
docs = retrieve_documents(query)
docs_text = "\n".join(docs)
prompt = f"Use the following documents to answer the question accurately:\n{docs_text}\nQuestion: {query}\nAnswer:" 

# Call the LLM with grounded prompt
response = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": prompt}]
)

print("Answer:", response.choices[0].message.content.strip())

output
Answer: The Eiffel Tower is located in Paris, France.

Common variations

To further reduce hallucinations, you can:

  • Use more advanced retrieval methods like vector similarity search with FAISS or Chroma.
  • Incorporate confidence scoring or answer verification by querying multiple documents.
  • Try a different model, such as claude-3-5-sonnet-20241022, and compare factual accuracy on your own retrieval corpus.
  • Implement asynchronous calls or streaming for real-time applications.
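
As an illustration of the first variation, here is a minimal sketch of similarity-based retrieval. It uses toy bag-of-words vectors and cosine similarity so it runs with no extra dependencies; a production pipeline would use learned embeddings with a vector store such as FAISS or Chroma. The helper names (`embed`, `retrieve_top_k`) are illustrative, not part of any library.

```python
import math
from collections import Counter

def embed(text):
    # Toy bag-of-words "embedding"; real systems use learned dense embeddings
    return Counter(text.lower().split())

def cosine(a, b):
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def retrieve_top_k(query, corpus, k=2):
    # Rank documents by similarity to the query and keep the top k
    q = embed(query)
    return sorted(corpus, key=lambda d: cosine(q, embed(d)), reverse=True)[:k]

corpus = [
    "The Eiffel Tower is located in Paris, France.",
    "The Louvre is a museum in Paris.",
    "Mount Fuji is the tallest mountain in Japan.",
]
print(retrieve_top_k("Where is the Eiffel Tower located?", corpus, k=1))
```

Swapping this for the mock `retrieve_documents` above lets the pipeline return the most relevant passages instead of a fixed list, which directly improves grounding quality.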

Troubleshooting

If the model hallucinates despite grounding, try these steps:

  • Verify the retrieval step returns relevant and up-to-date documents.
  • Increase the context window or chunk size of retrieved documents.
  • Use explicit instructions in the prompt to "only answer based on the provided documents."
  • Check for API errors or rate limits that might truncate responses.
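
The third bullet can be made concrete with a small prompt-building helper. The function name and exact wording below are illustrative assumptions, not part of the OpenAI API; the point is the explicit refusal instruction.

```python
def build_grounded_prompt(query, docs):
    # Explicitly constrain the model to the retrieved context,
    # with an instructed fallback when the context is insufficient
    docs_text = "\n".join(docs)
    return (
        "Answer ONLY using the documents below. "
        'If the answer is not in them, reply "I don\'t know."\n\n'
        f"Documents:\n{docs_text}\n\n"
        f"Question: {query}\nAnswer:"
    )

print(build_grounded_prompt(
    "Where is the Eiffel Tower located?",
    ["The Eiffel Tower is located in Paris, France."],
))
```

This drop-in replacement for the prompt string in the main example gives the model a sanctioned way to decline rather than invent an answer.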

Key Takeaways

  • Ground LLM responses on high-quality, relevant retrieved documents to reduce hallucinations.
  • Use clear prompt instructions to constrain the model to the provided context.
  • Implement verification techniques like cross-checking or confidence scoring for reliability.
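
One simple, admittedly crude, form of the verification mentioned above is a lexical-overlap check that flags answers whose content words do not appear in the retrieved documents. Real systems would use entailment models or citation checking; the `support_score` helper below is a hypothetical sketch.

```python
import re

def support_score(answer, docs):
    # Fraction of the answer's content words (4+ letters)
    # that appear somewhere in the retrieved documents
    words = set(re.findall(r"[a-z]{4,}", answer.lower()))
    if not words:
        return 0.0
    doc_text = " ".join(docs).lower()
    supported = sum(1 for w in words if w in doc_text)
    return supported / len(words)

docs = ["The Eiffel Tower is located in Paris, France."]
print(support_score("The Eiffel Tower is in Paris.", docs))       # → 1.0
print(support_score("It was designed by aliens in 1920.", docs))  # → 0.0
```

Answers scoring below a threshold (say 0.5) can be rejected or rerouted for a second retrieval pass instead of being shown to the user.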
Verified 2026-04 · gpt-4o, claude-3-5-sonnet-20241022