How to reduce hallucinations in RAG
Quick answer
To reduce hallucinations in RAG, ensure high-quality, relevant document retrieval and use precise prompt engineering to ground the model's responses. Additionally, implement verification steps like cross-checking retrieved data or using confidence scoring to filter unreliable outputs.
Prerequisites
- Python 3.8+
- OpenAI API key (free tier works)
- pip install "openai>=1.0"
- Basic knowledge of RAG architecture
Setup
Install the openai Python package and set your API key as an environment variable for secure access.
pip install "openai>=1.0"
Step by step
This example demonstrates a simple RAG pipeline using OpenAI's gpt-4o model with a mock retrieval step. It shows how to reduce hallucinations by grounding the LLM's response on retrieved documents.
import os
from openai import OpenAI

# Initialize OpenAI client
client = OpenAI(api_key=os.environ["OPENAI_API_KEY"])

# Mock retrieval function returning relevant documents
# In production, use a vector search or database query
def retrieve_documents(query):
    docs = [
        "Document 1: The Eiffel Tower is located in Paris, France.",
        "Document 2: The Eiffel Tower was completed in 1889.",
    ]
    return docs

# Compose prompt with retrieved documents to ground the answer
query = "Where is the Eiffel Tower located?"
docs = retrieve_documents(query)
docs_text = "\n".join(docs)
prompt = (
    "Use the following documents to answer the question accurately:\n"
    f"{docs_text}\nQuestion: {query}\nAnswer:"
)

# Call the LLM with the grounded prompt
response = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": prompt}],
)
print("Answer:", response.choices[0].message.content.strip())

Output
Answer: The Eiffel Tower is located in Paris, France.
Common variations
To further reduce hallucinations, you can:
- Use more advanced retrieval methods like vector similarity search with FAISS or Chroma.
- Incorporate confidence scoring or answer verification by querying multiple documents.
- Try different models such as claude-3-5-sonnet-20241022, which excels at coding and factual accuracy.
- Implement asynchronous calls or streaming for real-time applications.
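To illustrate the first variation, here is a minimal sketch of vector-style retrieval using bag-of-words cosine similarity. The helpers (tokenize, cosine_similarity, retrieve_top_k) and the sample corpus are illustrative, not from any library; in practice FAISS or Chroma with learned embeddings would replace this toy ranking.

```python
import re
from collections import Counter
from math import sqrt

def tokenize(text: str) -> Counter:
    """Bag-of-words vector: counts of lowercase alphanumeric tokens."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))

def cosine_similarity(a: Counter, b: Counter) -> float:
    """Cosine similarity between two bag-of-words vectors."""
    dot = sum(count * b[token] for token, count in a.items())
    norm_a = sqrt(sum(v * v for v in a.values()))
    norm_b = sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def retrieve_top_k(query: str, corpus: list[str], k: int = 2) -> list[str]:
    """Return the k documents most similar to the query."""
    q_vec = tokenize(query)
    ranked = sorted(
        corpus,
        key=lambda doc: cosine_similarity(q_vec, tokenize(doc)),
        reverse=True,
    )
    return ranked[:k]

corpus = [
    "The Eiffel Tower is located in Paris, France.",
    "The Great Wall of China is over 13,000 miles long.",
    "The Eiffel Tower was completed in 1889.",
]
top_docs = retrieve_top_k("Where is the Eiffel Tower located?", corpus, k=2)
```

Only the top-ranked documents would then be inserted into the prompt, which keeps irrelevant text out of the model's context.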
Troubleshooting
If the model hallucinates despite grounding, try these steps:
- Verify the retrieval step returns relevant and up-to-date documents.
- Increase the context window or chunk size of retrieved documents.
- Use explicit instructions in the prompt to "only answer based on the provided documents."
- Check for API errors or rate limits that might truncate responses.
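The explicit-instruction tip above can be sketched as a small prompt builder that tells the model to refuse when the documents lack the answer. The function name build_grounded_prompt and the exact wording are illustrative assumptions, not a standard API.

```python
def build_grounded_prompt(query: str, docs: list[str]) -> str:
    """Build a prompt that restricts the model to the retrieved documents."""
    docs_text = "\n".join(docs)
    return (
        "Answer only based on the provided documents. "
        "If the documents do not contain the answer, reply exactly: "
        '"I don\'t know based on the provided documents."\n\n'
        f"Documents:\n{docs_text}\n\nQuestion: {query}\nAnswer:"
    )

prompt = build_grounded_prompt(
    "Where is the Eiffel Tower located?",
    ["Document 1: The Eiffel Tower is located in Paris, France."],
)
```

Passing this prompt to chat.completions.create instead of the unconstrained one gives the model an explicit escape hatch, which tends to reduce fabricated answers when retrieval misses.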
Key takeaways
- Ground LLM responses on high-quality, relevant retrieved documents to reduce hallucinations.
- Use clear prompt instructions to constrain the model to the provided context.
- Implement verification techniques like cross-checking or confidence scoring for reliability.
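As a rough sketch of the verification takeaway, the heuristic below scores how much of an answer's vocabulary is supported by the retrieved documents. The support_score function is an illustrative lexical proxy, not a library call; production systems typically use an entailment model or an LLM judge instead.

```python
import re

def support_score(answer: str, docs: list[str]) -> float:
    """Fraction of answer tokens that appear in the retrieved documents.

    A crude lexical proxy for groundedness; low scores flag answers
    that may not be supported by the retrieved context.
    """
    tokenize = lambda text: set(re.findall(r"[a-z0-9]+", text.lower()))
    answer_tokens = tokenize(answer)
    if not answer_tokens:
        return 0.0
    doc_tokens = set().union(*(tokenize(d) for d in docs))
    return len(answer_tokens & doc_tokens) / len(answer_tokens)

docs = ["The Eiffel Tower is located in Paris, France."]
grounded = support_score("The Eiffel Tower is in Paris.", docs)
ungrounded = support_score("The tower is in Berlin, Germany.", docs)
```

Answers scoring below a chosen threshold can be discarded or routed to a stricter re-prompt, turning the score into a simple hallucination filter.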