How to use RAGAS for RAG evaluation
Quick answer
Use the ragas Python library to evaluate Retrieval-Augmented Generation (RAG) pipelines by collecting your questions, generated answers, retrieved contexts, and reference answers into a dataset, then scoring it with ragas metrics such as faithfulness, answer relevancy, context precision, and context recall. Install ragas via pip, prepare your data (JSONL works well), and call ragas.evaluate() for comprehensive RAG evaluation.
PREREQUISITES
- Python 3.8+
- pip install ragas (the datasets library is pulled in as a dependency)
- An OpenAI API key (the default ragas metrics are LLM-based and call OpenAI unless you configure another LLM)
- Basic knowledge of RAG systems and the JSONL data format
Setup
Install the ragas library using pip and prepare your environment.
pip install ragas
output
Collecting ragas
  Downloading ragas-0.3.0-py3-none-any.whl (15 kB)
Installing collected packages: ragas
Successfully installed ragas-0.3.0
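If your pipeline does not already write its traces to disk, a minimal standard-library sketch for saving RAG records as JSONL follows. The filename rag_records.jsonl and the field names (question, answer, contexts, ground_truth) are illustrative; use whatever your evaluation step expects.

```python
import json

# Hypothetical RAG traces; in practice these come from your pipeline.
records = [
    {
        "question": "What is the capital of France?",
        "answer": "The capital of France is Paris.",
        "contexts": ["Paris is the capital and most populous city of France."],
        "ground_truth": "Paris is the capital of France.",
    },
]

# JSONL: one JSON object per line.
with open("rag_records.jsonl", "w", encoding="utf-8") as f:
    for rec in records:
        f.write(json.dumps(rec) + "\n")

# Read the file back to verify the records round-trip.
with open("rag_records.jsonl", encoding="utf-8") as f:
    loaded = [json.loads(line) for line in f]
print(loaded[0]["question"])  # prints "What is the capital of France?"
```

Keeping contexts as a list of strings (even with a single passage) matches the row-per-example shape used in the evaluation examples below.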
Step by step
Collect your questions, generated answers, retrieved contexts, and reference answers into a Hugging Face Dataset, then pass it to ragas.evaluate() with the metrics you want. The snippet below follows the ragas quickstart API; metric and column names have shifted across releases, so check the documentation for your installed version.
from datasets import Dataset

from ragas import evaluate
from ragas.metrics import (
    faithfulness,
    answer_relevancy,
    context_precision,
    context_recall,
)

# Each row: the question, the generated answer, the retrieved contexts
# (a list of strings), and the reference ("ground truth") answer.
data = {
    "question": ["When was the first Super Bowl played?"],
    "answer": ["The first Super Bowl was played on January 15, 1967."],
    "contexts": [[
        "The first AFL-NFL World Championship Game, later known as Super Bowl I, was played on January 15, 1967.",
    ]],
    "ground_truth": ["January 15, 1967"],
}
dataset = Dataset.from_dict(data)

# Run evaluation (requires OPENAI_API_KEY for the default LLM-based metrics)
results = evaluate(
    dataset,
    metrics=[faithfulness, answer_relevancy, context_precision, context_recall],
)
print("Evaluation results:", results)
output (scores are illustrative and will vary between runs)
Evaluation results: {'faithfulness': 0.9000, 'answer_relevancy': 0.8700, 'context_precision': 0.8500, 'context_recall': 0.8800}
Common variations
You can customize the evaluation by selecting a subset of metrics, building the dataset from Python lists, or loading it from a JSONL file with Dataset.from_json("rag_records.jsonl").
from datasets import Dataset
from ragas import evaluate
from ragas.metrics import faithfulness, answer_relevancy

# Build the dataset from in-memory lists instead of a file
data = {
    "question": ["What is the capital of France?", "What is Python?"],
    "answer": ["The capital of France is Paris.", "Python is a programming language."],
    "contexts": [
        ["Paris is the capital and most populous city of France."],
        ["Python is a high-level, general-purpose programming language."],
    ],
    "ground_truth": ["Paris is the capital of France.", "Python is used for programming."],
}
dataset = Dataset.from_dict(data)

results = evaluate(dataset, metrics=[faithfulness, answer_relevancy])
print(results)
output (illustrative)
{'faithfulness': 1.0000, 'answer_relevancy': 0.9500}
Troubleshooting
- If you get a FileNotFoundError, verify your JSONL file paths are correct.
- If metrics return zero or NaN, check that your dataset columns use the names ragas expects (question, answer, contexts, ground_truth) and that answers and references align row by row.
- If the LLM-based metrics error out, confirm that your OPENAI_API_KEY (or other configured LLM) is set.
- For large datasets, evaluate a small sample first to catch data problems before paying for a full LLM-scored run.
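When scores look suspicious, it helps to sanity-check row alignment with cheap string metrics before re-running an LLM-based evaluation. A minimal sketch, not part of ragas, that computes exact match and token-level F1 over paired answers and references:

```python
from collections import Counter

def exact_match(prediction: str, reference: str) -> float:
    """1.0 if the normalized strings are identical, else 0.0."""
    return float(prediction.strip().lower() == reference.strip().lower())

def token_f1(prediction: str, reference: str) -> float:
    """Harmonic mean of token-overlap precision and recall."""
    pred_tokens = prediction.lower().split()
    ref_tokens = reference.lower().split()
    common = Counter(pred_tokens) & Counter(ref_tokens)
    overlap = sum(common.values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_tokens)
    recall = overlap / len(ref_tokens)
    return 2 * precision * recall / (precision + recall)

answers = ["The capital of France is Paris.", "Python is a programming language."]
references = ["The capital of France is Paris.", "Python is used for programming."]
assert len(answers) == len(references), "outputs and references are misaligned"

for ans, ref in zip(answers, references):
    print(f"EM={exact_match(ans, ref):.2f}  F1={token_f1(ans, ref):.2f}")
```

Identical pairs score EM=1.00, F1=1.00; if every pair scores near zero, your rows are probably shifted out of alignment rather than genuinely wrong.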
Key Takeaways
- Install ragas via pip to evaluate RAG systems efficiently.
- Prepare your questions, answers, retrieved contexts, and references as a Dataset, built from Python lists or a JSONL file.
- Use built-in metrics such as faithfulness, answer relevancy, context precision, and context recall for comprehensive scoring.
- Customize the evaluation by selecting different metrics or input formats.
- Check file paths, column names, and row alignment to avoid common errors.