How to use RAGAS for RAG evaluation
Quick answer
Use the ragas Python library to evaluate Retrieval-Augmented Generation (RAG) pipelines by collecting your questions, generated answers, retrieved contexts, and reference answers into a dataset, then scoring it with ragas metrics such as faithfulness, answer relevancy, context precision, and context recall. Install ragas via pip, prepare your data (JSONL works well), and call ragas.evaluate() for comprehensive RAG evaluation.
PREREQUISITES
- Python 3.8+
- pip install ragas (the datasets library is pulled in as a dependency)
- An OpenAI API key (the default ragas metrics are LLM-based and call OpenAI unless you configure another LLM)
- Basic knowledge of RAG systems and the JSONL data format
Setup
Install the ragas library using pip and prepare your environment.
pip install ragas
output
Collecting ragas
  Downloading ragas-0.3.0-py3-none-any.whl (15 kB)
Installing collected packages: ragas
Successfully installed ragas-0.3.0
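If your pipeline does not already write its traces to disk, a minimal standard-library sketch for saving RAG records as JSONL follows. The filename rag_records.jsonl and the field names (question, answer, contexts, ground_truth) are illustrative; use whatever your evaluation step expects.

```python
import json

# Hypothetical RAG traces; in practice these come from your pipeline.
records = [
    {
        "question": "What is the capital of France?",
        "answer": "The capital of France is Paris.",
        "contexts": ["Paris is the capital and most populous city of France."],
        "ground_truth": "Paris is the capital of France.",
    },
]

# JSONL: one JSON object per line.
with open("rag_records.jsonl", "w", encoding="utf-8") as f:
    for rec in records:
        f.write(json.dumps(rec) + "\n")

# Read the file back to verify the records round-trip.
with open("rag_records.jsonl", encoding="utf-8") as f:
    loaded = [json.loads(line) for line in f]
print(loaded[0]["question"])  # prints "What is the capital of France?"
```

Keeping contexts as a list of strings (even with a single passage) matches the row-per-example shape used in the evaluation examples below.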
Step by step
Collect your questions, generated answers, retrieved contexts, and reference answers into a Hugging Face Dataset, then pass it to ragas.evaluate() with the metrics you want. The snippet below follows the ragas quickstart API; metric and column names have shifted across releases, so check the documentation for your installed version.
from datasets import Dataset

from ragas import evaluate
from ragas.metrics import (
    faithfulness,
    answer_relevancy,
    context_precision,
    context_recall,
)

# Each row: the question, the generated answer, the retrieved contexts
# (a list of strings), and the reference ("ground truth") answer.
data = {
    "question": ["When was the first Super Bowl played?"],
    "answer": ["The first Super Bowl was played on January 15, 1967."],
    "contexts": [[
        "The first AFL-NFL World Championship Game, later known as Super Bowl I, was played on January 15, 1967.",
    ]],
    "ground_truth": ["January 15, 1967"],
}
dataset = Dataset.from_dict(data)

# Run evaluation (requires OPENAI_API_KEY for the default LLM-based metrics)
results = evaluate(
    dataset,
    metrics=[faithfulness, answer_relevancy, context_precision, context_recall],
)
print("Evaluation results:", results)
output (scores are illustrative and will vary between runs)
Evaluation results: {'faithfulness': 0.9000, 'answer_relevancy': 0.8700, 'context_precision': 0.8500, 'context_recall': 0.8800}
Common variations
You can customize the evaluation by selecting a subset of metrics, building the dataset from Python lists, or loading it from a JSONL file with Dataset.from_json("rag_records.jsonl").
from datasets import Dataset
from ragas import evaluate
from ragas.metrics import faithfulness, answer_relevancy

# Build the dataset from in-memory lists instead of a file
data = {
    "question": ["What is the capital of France?", "What is Python?"],
    "answer": ["The capital of France is Paris.", "Python is a programming language."],
    "contexts": [
        ["Paris is the capital and most populous city of France."],
        ["Python is a high-level, general-purpose programming language."],
    ],
    "ground_truth": ["Paris is the capital of France.", "Python is used for programming."],
}
dataset = Dataset.from_dict(data)

results = evaluate(dataset, metrics=[faithfulness, answer_relevancy])
print(results)
output (illustrative)
{'faithfulness': 1.0000, 'answer_relevancy': 0.9500}
Troubleshooting
- If you get a FileNotFoundError, verify your JSONL file paths are correct.
- If metrics return zero or NaN, check that your dataset columns use the names ragas expects (question, answer, contexts, ground_truth) and that answers and references align row by row.
- If the LLM-based metrics error out, confirm that your OPENAI_API_KEY (or other configured LLM) is set.
- For large datasets, evaluate a small sample first to catch data problems before paying for a full LLM-scored run.
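When scores look suspicious, it helps to sanity-check row alignment with cheap string metrics before re-running an LLM-based evaluation. A minimal sketch, not part of ragas, that computes exact match and token-level F1 over paired answers and references:

```python
from collections import Counter

def exact_match(prediction: str, reference: str) -> float:
    """1.0 if the normalized strings are identical, else 0.0."""
    return float(prediction.strip().lower() == reference.strip().lower())

def token_f1(prediction: str, reference: str) -> float:
    """Harmonic mean of token-overlap precision and recall."""
    pred_tokens = prediction.lower().split()
    ref_tokens = reference.lower().split()
    common = Counter(pred_tokens) & Counter(ref_tokens)
    overlap = sum(common.values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_tokens)
    recall = overlap / len(ref_tokens)
    return 2 * precision * recall / (precision + recall)

answers = ["The capital of France is Paris.", "Python is a programming language."]
references = ["The capital of France is Paris.", "Python is used for programming."]
assert len(answers) == len(references), "outputs and references are misaligned"

for ans, ref in zip(answers, references):
    print(f"EM={exact_match(ans, ref):.2f}  F1={token_f1(ans, ref):.2f}")
```

Identical pairs score EM=1.00, F1=1.00; if every pair scores near zero, your rows are probably shifted out of alignment rather than genuinely wrong.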
Key Takeaways
- Install ragas via pip to evaluate RAG systems efficiently.
- Prepare your questions, answers, retrieved contexts, and references as a Dataset, built from Python lists or a JSONL file.
- Use built-in metrics such as faithfulness, answer relevancy, context precision, and context recall for comprehensive scoring.
- Customize the evaluation by selecting different metrics or input formats.
- Check file paths, column names, and row alignment to avoid common errors.