What is model calibration in AI?
How it works
Model calibration adjusts the confidence scores output by AI models to better match real-world probabilities. Imagine a weather app that says there's a 70% chance of rain. If it rains roughly 7 out of 10 times when the app predicts 70%, the model is well calibrated. Calibration uses techniques like Platt scaling or isotonic regression to correct overconfident or underconfident predictions, ensuring the predicted probability aligns with observed outcomes.
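As a sketch of how such a correction is applied in practice, the snippet below uses scikit-learn's `CalibratedClassifierCV` with `method="sigmoid"`, which implements Platt scaling by fitting a logistic regression on a model's raw decision scores. The dataset and model choice here are illustrative assumptions, not part of the original example.

```python
import numpy as np
from sklearn.calibration import CalibratedClassifierCV
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.svm import LinearSVC

# Synthetic binary classification data (illustrative only)
X, y = make_classification(n_samples=2000, n_features=20, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.5, random_state=0
)

# LinearSVC outputs raw decision scores, not probabilities.
# Platt scaling (method="sigmoid") fits a logistic regression on those
# scores so the final outputs behave like calibrated probabilities.
base = LinearSVC(random_state=0)
calibrated = CalibratedClassifierCV(base, method="sigmoid", cv=3)
calibrated.fit(X_train, y_train)

# Calibrated probabilities for the positive class, all in [0, 1]
probs = calibrated.predict_proba(X_test)[:, 1]
print(probs[:5])
```

Swapping `method="sigmoid"` for `method="isotonic"` applies isotonic regression instead; the sigmoid variant is generally preferred when calibration data is scarce, since it fits only two parameters.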
Think of calibration like tuning a musical instrument: the model's raw predictions are the strings, and calibration tightens or loosens them so the notes (probabilities) sound true to reality.
Concrete example
Suppose a binary classifier outputs probabilities for positive class predictions. We test it on 100 samples where it predicts 0.8 probability. If the model is calibrated, about 80 of those samples should actually be positive.
```python
import numpy as np
import matplotlib.pyplot as plt
from sklearn.calibration import calibration_curve

# Simulated predicted probabilities and true labels:
# 100 samples predicted at 0.8 (80 actually positive),
# 100 samples predicted at 0.3 (30 actually positive)
pred_probs = np.array([0.8]*100 + [0.3]*100)
true_labels = np.array([1]*80 + [0]*20 + [1]*30 + [0]*70)

# Compute calibration curve with two bins
prob_true, prob_pred = calibration_curve(true_labels, pred_probs, n_bins=2)
print(f"Predicted probabilities bins: {prob_pred}")
print(f"True outcome frequencies: {prob_true}")

# Plot calibration curve against the perfect-calibration diagonal
plt.plot(prob_pred, prob_true, marker='o')
plt.plot([0, 1], [0, 1], linestyle='--')
plt.xlabel('Mean predicted probability')
plt.ylabel('Fraction of positives')
plt.title('Calibration curve example')
plt.show()
```

Output:

```
Predicted probabilities bins: [0.3 0.8]
True outcome frequencies: [0.3 0.8]
```

The predicted probabilities match the observed frequencies exactly, so this simulated model is perfectly calibrated: its curve lies on the diagonal.
When to use it
Use model calibration when your AI system's probability outputs drive critical decisions, such as medical diagnosis, credit scoring, or weather forecasting. Calibration improves trust by making confidence scores meaningful and comparable. Avoid relying on raw model probabilities when they are known to be biased or overconfident, especially in imbalanced datasets or when models are trained with surrogate losses that do not optimize probability accuracy.
Key terms
| Term | Definition |
|---|---|
| Model calibration | Adjusting predicted probabilities to match true outcome frequencies. |
| Platt scaling | A logistic regression method to calibrate model outputs. |
| Isotonic regression | A non-parametric calibration technique that fits a monotonic function. |
| Calibration curve | A plot comparing predicted probabilities to observed frequencies. |
| Overconfidence | When predicted probabilities are systematically higher than actual outcomes. |
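To make the isotonic regression entry concrete, the sketch below fits scikit-learn's `IsotonicRegression` on a small set of hypothetical raw scores and binary outcomes. It learns a non-decreasing step function mapping scores to calibrated probabilities.

```python
import numpy as np
from sklearn.isotonic import IsotonicRegression

# Hypothetical raw model scores and observed binary outcomes
raw_scores = np.array([0.1, 0.2, 0.3, 0.4, 0.6, 0.7, 0.8, 0.9])
outcomes   = np.array([0,   0,   1,   0,   1,   1,   1,   1])

# Fit a monotonic (non-decreasing) mapping from scores to probabilities;
# out_of_bounds="clip" handles unseen scores outside the training range
iso = IsotonicRegression(out_of_bounds="clip")
calibrated = iso.fit_transform(raw_scores, outcomes)
print(calibrated)  # [0.  0.  0.5 0.5 1.  1.  1.  1. ]
```

Note how the conflicting pair (a positive at score 0.3, a negative at 0.4) is pooled to 0.5 so that the fitted function stays monotonic: higher raw scores never map to lower calibrated probabilities.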
Key takeaways
- Model calibration ensures predicted probabilities reflect real-world likelihoods accurately.
- Use calibration techniques like Platt scaling or isotonic regression to fix biased confidence scores.
- Calibration is critical for AI applications where probability estimates guide decisions.
- Calibration curves visually assess how well model probabilities align with true outcomes.