Concept Beginner · 3 min read

What is perplexity in language models?

Quick answer
Perplexity is a metric for evaluating language models that measures how well a model predicts a sample of text. It is the exponentiation of the average negative log-likelihood the model assigns to the true tokens, so it quantifies the model's uncertainty: lower perplexity means better prediction.

How it works

Perplexity measures the uncertainty of a language model when predicting the next word in a sequence. Imagine reading a sentence and guessing the next word; if you are very confident, your perplexity is low. If you are unsure, perplexity is high. Mathematically, it is the exponentiation of the average negative log probability the model assigns to the true next tokens. Lower perplexity means the model is better at predicting the text, similar to how a skilled reader anticipates words more accurately.
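
Written out, for a sequence of \( N \) tokens the definition above is:

\( \mathrm{PPL} = \exp\!\left( -\frac{1}{N} \sum_{i=1}^{N} \log p(x_i \mid x_{<i}) \right) \)

where \( p(x_i \mid x_{<i}) \) is the probability the model assigns to the true token \( x_i \) given the preceding tokens. Because the exponential undoes the average log, perplexity can be read as an effective branching factor: a perplexity of \( k \) means the model is, on average, as uncertain as if it were choosing uniformly among \( k \) words.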

Concrete example

Suppose a language model predicts the next word probabilities for a 3-word sequence as follows:

  • Word 1 probability: 0.5
  • Word 2 probability: 0.25
  • Word 3 probability: 0.125

The average negative log-likelihood is:

\( -\frac{1}{3} (\log 0.5 + \log 0.25 + \log 0.125) \)

With natural logarithms this average is \( \ln 4 \approx 1.386 \), and exponentiating gives a perplexity of exactly 4: the model is, on average, as uncertain as if it were choosing uniformly among four equally likely words.

Here's Python code to compute it:

```python
import math

# Probabilities the model assigned to the true next tokens
probabilities = [0.5, 0.25, 0.125]

neg_log_likelihood = -sum(math.log(p) for p in probabilities) / len(probabilities)
perplexity = math.exp(neg_log_likelihood)

print(f"Perplexity: {perplexity:.3f}")
```

Output:

```
Perplexity: 4.000
```
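
In practice, models usually return log-probabilities directly, and it is safer to stay in log space: multiplying many small probabilities underflows for long sequences, while summing their logs does not. A minimal sketch of the same calculation from log-probabilities (the values here are the same illustrative ones as above):

```python
import math

# Per-token log-probabilities, as a model would typically return them
log_probs = [math.log(0.5), math.log(0.25), math.log(0.125)]

# Average negative log-likelihood, then exponentiate once at the end
avg_nll = -sum(log_probs) / len(log_probs)
perplexity = math.exp(avg_nll)

print(f"Perplexity: {perplexity:.3f}")  # prints Perplexity: 4.000
```

The single final `exp` is the only step that leaves log space, so intermediate values stay well within floating-point range no matter how long the sequence is.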

When to use it

Use perplexity to evaluate and compare language models during training or benchmarking, especially for tasks like language modeling and text generation. It helps identify which model predicts text more confidently. However, do not rely on perplexity alone for downstream tasks like classification or summarization, where task-specific metrics are more relevant.
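
To compare two models fairly, evaluate both on the same held-out text with the same tokenization and pick the one with lower perplexity. A sketch of such a comparison, using made-up per-token probabilities purely for illustration:

```python
import math

def perplexity(probs):
    """Perplexity from the probabilities a model assigned to the true tokens."""
    avg_nll = -sum(math.log(p) for p in probs) / len(probs)
    return math.exp(avg_nll)

# Hypothetical per-token probabilities from two models on the same text
model_a = [0.5, 0.25, 0.125]  # less confident predictions
model_b = [0.6, 0.4, 0.3]     # more confident predictions

print(f"Model A perplexity: {perplexity(model_a):.3f}")
print(f"Model B perplexity: {perplexity(model_b):.3f}")  # lower, so B predicts better
```

Note that perplexity values are only comparable when the models share a vocabulary and tokenizer; a model with a smaller vocabulary has an easier prediction task, which deflates its perplexity.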

Key terms

  • Perplexity: A metric measuring how well a language model predicts a sequence; lower means better prediction.
  • Log-likelihood: The logarithm of the probability the model assigns to the true next token.
  • Negative log-likelihood: The log-likelihood with its sign flipped; its average over tokens is what perplexity exponentiates.
  • Exponentiation: The operation that converts the average negative log-likelihood into perplexity, interpretable as an effective branching factor.

Key Takeaways

  • Perplexity quantifies a language model's uncertainty in predicting text sequences.
  • Lower perplexity indicates better predictive performance.
  • Calculate perplexity by exponentiating the average negative log-likelihood of predicted tokens.
  • Use perplexity primarily for evaluating language models, not for all NLP tasks.
Verified 2026-04 · gpt-4o, claude-3-5-sonnet-20241022