Concept Beginner · 3 min read

What is perplexity in language models?

Quick answer
Perplexity is a metric for evaluating language models that measures how well a model predicts a sample of text. It is the exponentiation of the average negative log-likelihood the model assigns to the true tokens, so it quantifies the model's uncertainty: lower perplexity means better prediction.

How it works

Perplexity measures the uncertainty of a language model when predicting the next word in a sequence. Imagine reading a sentence and guessing the next word; if you are very confident, your perplexity is low. If you are unsure, perplexity is high. Mathematically, it is the exponentiation of the average negative log probability the model assigns to the true next tokens. Lower perplexity means the model is better at predicting the text, similar to how a skilled reader anticipates words more accurately.
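
Written out, for a sequence of \( N \) tokens the definition above is:

\( \mathrm{PPL} = \exp\!\left( -\frac{1}{N} \sum_{i=1}^{N} \log p(x_i \mid x_{<i}) \right) \)

where \( p(x_i \mid x_{<i}) \) is the probability the model assigns to the true token \( x_i \) given the preceding tokens. Because the exponential undoes the average log, perplexity can be read as an effective branching factor: a perplexity of \( k \) means the model is, on average, as uncertain as if it were choosing uniformly among \( k \) words.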

Concrete example

Suppose a language model predicts the next word probabilities for a 3-word sequence as follows:

  • Word 1 probability: 0.5
  • Word 2 probability: 0.25
  • Word 3 probability: 0.125

The average negative log-likelihood is:

\( -\frac{1}{3} (\log 0.5 + \log 0.25 + \log 0.125) \)

With natural logarithms this average is \( \ln 4 \approx 1.386 \), and exponentiating gives a perplexity of exactly 4: the model is, on average, as uncertain as if it were choosing uniformly among four equally likely words.

Here's Python code to compute it:

```python
import math

# Probabilities the model assigned to the true next tokens
probabilities = [0.5, 0.25, 0.125]

neg_log_likelihood = -sum(math.log(p) for p in probabilities) / len(probabilities)
perplexity = math.exp(neg_log_likelihood)

print(f"Perplexity: {perplexity:.3f}")
```

Output:

```
Perplexity: 4.000
```
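
In practice, models usually return log-probabilities directly, and it is safer to stay in log space: multiplying many small probabilities underflows for long sequences, while summing their logs does not. A minimal sketch of the same calculation from log-probabilities (the values here are the same illustrative ones as above):

```python
import math

# Per-token log-probabilities, as a model would typically return them
log_probs = [math.log(0.5), math.log(0.25), math.log(0.125)]

# Average negative log-likelihood, then exponentiate once at the end
avg_nll = -sum(log_probs) / len(log_probs)
perplexity = math.exp(avg_nll)

print(f"Perplexity: {perplexity:.3f}")  # prints Perplexity: 4.000
```

The single final `exp` is the only step that leaves log space, so intermediate values stay well within floating-point range no matter how long the sequence is.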

When to use it

Use perplexity to evaluate and compare language models during training or benchmarking, especially for tasks like language modeling and text generation. It helps identify which model predicts text more confidently. However, do not rely on perplexity alone for downstream tasks like classification or summarization, where task-specific metrics are more relevant.
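
To compare two models fairly, evaluate both on the same held-out text with the same tokenization and pick the one with lower perplexity. A sketch of such a comparison, using made-up per-token probabilities purely for illustration:

```python
import math

def perplexity(probs):
    """Perplexity from the probabilities a model assigned to the true tokens."""
    avg_nll = -sum(math.log(p) for p in probs) / len(probs)
    return math.exp(avg_nll)

# Hypothetical per-token probabilities from two models on the same text
model_a = [0.5, 0.25, 0.125]  # less confident predictions
model_b = [0.6, 0.4, 0.3]     # more confident predictions

print(f"Model A perplexity: {perplexity(model_a):.3f}")
print(f"Model B perplexity: {perplexity(model_b):.3f}")  # lower, so B predicts better
```

Note that perplexity values are only comparable when the models share a vocabulary and tokenizer; a model with a smaller vocabulary has an easier prediction task, which deflates its perplexity.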

Key terms

  • Perplexity: A metric measuring how well a language model predicts a sequence; lower means better prediction.
  • Log-likelihood: The logarithm of the probability the model assigns to the true next token.
  • Negative log-likelihood: The log-likelihood with its sign flipped; its average over tokens is what perplexity exponentiates.
  • Exponentiation: The operation that converts the average negative log-likelihood into perplexity, interpretable as an effective branching factor.

Key Takeaways

  • Perplexity quantifies a language model's uncertainty in predicting text sequences.
  • Lower perplexity indicates better predictive performance.
  • Calculate perplexity by exponentiating the average negative log-likelihood of predicted tokens.
  • Use perplexity primarily for evaluating language models, not for all NLP tasks.
Verified 2026-04 · gpt-4o, claude-3-5-sonnet-20241022