Concept Intermediate · 3 min read

What are retrieval precision and recall in RAG?

Quick answer
In Retrieval-Augmented Generation (RAG), retrieval precision measures the proportion of retrieved documents that are relevant to the query, while retrieval recall measures the proportion of all relevant documents that were successfully retrieved. Together, these metrics evaluate how accurately and completely the retrieval component finds the context used to ground the language model's generated answers.

How it works

Retrieval precision and recall assess the quality of the retrieval step in a RAG pipeline, where a retriever fetches documents from a knowledge base to support a language model's answer generation. Precision is like checking how many of the documents you picked are actually useful, while recall checks how many of all the useful documents you managed to find.

Imagine searching for books in a library: precision is the fraction of books you picked that are truly about your topic, and recall is the fraction of all relevant books in the library that you found. High precision means fewer irrelevant documents, high recall means fewer missed relevant documents.
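In set terms, if you treat the retrieved documents and the ground-truth relevant documents as sets, both metrics reduce to the size of their intersection. A minimal sketch (the function names here are illustrative, not a standard API):

```python
def retrieval_precision(retrieved: set, relevant: set) -> float:
    # Fraction of retrieved documents that are relevant
    return len(retrieved & relevant) / len(retrieved) if retrieved else 0.0

def retrieval_recall(retrieved: set, relevant: set) -> float:
    # Fraction of relevant documents that were retrieved
    return len(retrieved & relevant) / len(relevant) if relevant else 0.0
```

Both functions share the same numerator (the intersection) and differ only in the denominator, which is what makes the two metrics pull in opposite directions as you retrieve more documents.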

Concrete example

Suppose a RAG system retrieves 5 documents for a query, and 2 of them are relevant. There are 4 relevant documents total in the knowledge base.

python
retrieved_docs = ['doc1', 'doc2', 'doc3', 'doc4', 'doc5']  # what the retriever returned
relevant_docs = {'doc1', 'doc3', 'doc6', 'doc7'}  # ground-truth relevant docs

# Documents that are both retrieved and relevant: doc1 and doc3
retrieved_relevant = [doc for doc in retrieved_docs if doc in relevant_docs]

precision = len(retrieved_relevant) / len(retrieved_docs)  # 2 / 5
recall = len(retrieved_relevant) / len(relevant_docs)      # 2 / 4

print(f"Precision: {precision:.2f}")
print(f"Recall: {recall:.2f}")
output
Precision: 0.40
Recall: 0.50

When to use it

Use retrieval precision and recall to evaluate and tune the retriever in a RAG system, especially when you have labeled relevant documents for queries. High precision is critical when irrelevant documents can mislead the language model, while high recall is important when missing relevant context reduces answer quality.
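A common way to tune this tradeoff is to sweep the number of retrieved documents k and compute precision@k and recall@k at each cutoff. A minimal sketch, reusing the document IDs from the example above:

```python
def precision_recall_at_k(ranked_docs: list, relevant: set, k: int):
    """Precision@k and recall@k for a ranked retrieval list (best first)."""
    top_k = set(ranked_docs[:k])
    hits = len(top_k & relevant)
    return hits / k, hits / len(relevant)

ranked = ['doc1', 'doc2', 'doc3', 'doc4', 'doc5']  # retriever output, best first
relevant = {'doc1', 'doc3', 'doc6', 'doc7'}

for k in (1, 3, 5):
    p, r = precision_recall_at_k(ranked, relevant, k)
    print(f"k={k}: precision={p:.2f}, recall={r:.2f}")
```

Raising k typically lifts recall (more relevant documents are caught) while lowering precision (more irrelevant ones come along), which is exactly the tradeoff to tune against your LLM's tolerance for noisy context.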

Do not rely solely on these metrics if you lack ground truth relevance labels or if the language model can compensate for some retrieval errors.

Key terms

Retrieval precision: The fraction of retrieved documents that are relevant.
Retrieval recall: The fraction of all relevant documents that are retrieved.
RAG: Retrieval-Augmented Generation, combining retrieval with language models.
Retriever: The component that fetches documents from a knowledge base based on a query.

Key Takeaways

  • Retrieval precision measures how many retrieved documents are relevant to the query.
  • Retrieval recall measures how many relevant documents are successfully retrieved.
  • Balancing precision and recall is crucial for effective RAG system performance.
  • Use these metrics when you have ground truth relevance labels for evaluation.
  • High precision reduces noise; high recall ensures comprehensive context for generation.
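The precision/recall balance mentioned in the takeaways is often summarized with a single number, the F1 score (their harmonic mean). A minimal sketch:

```python
def f1_score(precision: float, recall: float) -> float:
    # Harmonic mean of precision and recall; defined as 0 when both are 0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

# Using the precision and recall from the concrete example above
print(f"F1: {f1_score(0.40, 0.50):.2f}")
```

Because the harmonic mean is dominated by the smaller value, a retriever cannot score well on F1 by maximizing only one of the two metrics.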
Verified 2026-04 · gpt-4o, claude-3-5-sonnet-20241022