How to measure quantization accuracy loss
Quick answer
Measure quantization accuracy loss by comparing the outputs of the original and quantized models on a representative dataset using metrics such as accuracy, mean squared error (MSE), or perplexity. This involves running inference on both models with the same inputs and quantifying the difference in their predictions or probabilities.
Prerequisites
- Python 3.8+
- pip install torch transformers numpy scikit-learn
- Basic understanding of model quantization and evaluation metrics
Setup
Install necessary Python packages for model loading, quantization, and evaluation.
pip install torch transformers numpy scikit-learn

Step by step
This example shows how to measure quantization accuracy loss by comparing the original and quantized model outputs on a classification task using accuracy and mean squared error (MSE).
import torch
import numpy as np
from transformers import AutoModelForSequenceClassification, AutoTokenizer
from sklearn.metrics import accuracy_score, mean_squared_error

# Load the original model and tokenizer
model_name = "distilbert-base-uncased-finetuned-sst-2-english"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name)
model.eval()

# Dummy dataset
texts = ["I love this movie!", "This is terrible.", "Not bad, could be better."]
labels = [1, 0, 1]  # 1=positive, 0=negative

# Tokenize inputs
inputs = tokenizer(texts, padding=True, truncation=True, return_tensors="pt")

# Get original model predictions
with torch.no_grad():
    outputs = model(**inputs)
orig_logits = outputs.logits
orig_preds = torch.argmax(orig_logits, dim=1).numpy()

# Quantize the model to 8-bit (PyTorch dynamic quantization as an example)
quantized_model = torch.quantization.quantize_dynamic(
    model, {torch.nn.Linear}, dtype=torch.qint8
)
quantized_model.eval()

# Get quantized model predictions
with torch.no_grad():
    quant_outputs = quantized_model(**inputs)
quant_logits = quant_outputs.logits
quant_preds = torch.argmax(quant_logits, dim=1).numpy()

# Calculate accuracy loss
orig_acc = accuracy_score(labels, orig_preds)
quant_acc = accuracy_score(labels, quant_preds)
acc_loss = orig_acc - quant_acc

# Calculate MSE between logits
mse_loss = mean_squared_error(orig_logits.numpy(), quant_logits.numpy())

print(f"Original accuracy: {orig_acc:.3f}")
print(f"Quantized accuracy: {quant_acc:.3f}")
print(f"Accuracy loss due to quantization: {acc_loss:.3f}")
print(f"Mean squared error between logits: {mse_loss:.6f}")

Output
Original accuracy: 1.000
Quantized accuracy: 1.000
Accuracy loss due to quantization: 0.000
Mean squared error between logits: 0.000123
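Beyond top-line accuracy, it can be useful to quantify how much the two models' output distributions diverge, since logits can drift noticeably before any prediction flips. The sketch below uses toy logit arrays standing in for real model outputs; the `quantization_drift` helper is our own name, not a library function. It computes prediction agreement and the mean KL divergence between the softmax distributions:

```python
import numpy as np

def softmax(logits):
    # Numerically stable softmax over the last axis
    z = logits - logits.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def quantization_drift(orig_logits, quant_logits):
    """Prediction agreement and mean KL(p_orig || p_quant) between two models."""
    p = softmax(orig_logits)
    q = softmax(quant_logits)
    agreement = float((p.argmax(axis=-1) == q.argmax(axis=-1)).mean())
    # Per-example KL divergence, averaged; small epsilon avoids log(0)
    kl = float((p * (np.log(p + 1e-12) - np.log(q + 1e-12))).sum(axis=-1).mean())
    return agreement, kl

# Toy logits standing in for the original and quantized models' outputs
orig = np.array([[2.0, -1.0], [-0.5, 1.5], [0.2, 0.1]])
quant = np.array([[1.9, -0.9], [-0.4, 1.4], [0.1, 0.2]])  # slightly perturbed
agreement, kl = quantization_drift(orig, quant)
print(f"Prediction agreement: {agreement:.3f}")
print(f"Mean KL divergence:   {kl:.6f}")
```

Agreement tells you how often the quantized model makes the same call; KL divergence catches distribution shift even when predictions still match.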
Common variations
You can measure quantization accuracy loss using other metrics like perplexity for language models or BLEU for translation tasks. For large models, use a representative validation set to get statistically meaningful results. Also, consider using bitsandbytes or transformers libraries for advanced quantization methods.
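For language models, the comparison is usually expressed as a perplexity gap: compute perplexity for both models on the same held-out text and report the relative increase. The sketch below works from per-token logits so it stays self-contained; the `perplexity` helper and the simulated quantization noise are illustrative, not outputs of a real model:

```python
import numpy as np

def perplexity(logits, targets):
    """Perplexity from per-token logits and target ids: exp(mean negative log-likelihood)."""
    z = logits - logits.max(axis=-1, keepdims=True)
    log_probs = z - np.log(np.exp(z).sum(axis=-1, keepdims=True))
    nll = -log_probs[np.arange(len(targets)), targets]
    return float(np.exp(nll.mean()))

# Toy next-token logits for 4 positions over a 5-token vocabulary
rng = np.random.default_rng(0)
orig_logits = rng.normal(size=(4, 5))
quant_logits = orig_logits + rng.normal(scale=0.05, size=(4, 5))  # simulated quantization noise
targets = np.array([1, 3, 0, 2])

ppl_orig = perplexity(orig_logits, targets)
ppl_quant = perplexity(quant_logits, targets)
print(f"Original perplexity:  {ppl_orig:.3f}")
print(f"Quantized perplexity: {ppl_quant:.3f}")
print(f"Relative increase:    {100 * (ppl_quant / ppl_orig - 1):.2f}%")
```

In practice you would replace the toy arrays with logits from the float and quantized models over a real evaluation corpus; the relative perplexity increase is the number typically reported.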
Troubleshooting
- If quantized model outputs are drastically different, check if the quantization method is compatible with your model architecture.
- Ensure the evaluation dataset is representative and large enough that the measured accuracy loss is statistically meaningful.
- Use torch.no_grad() during inference to prevent unwanted gradient computations.
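When quantized outputs are drastically different, comparing layer-by-layer outputs helps localize which layer introduces the error. The sketch below uses a small toy network (the architecture and error threshold are illustrative) and forward hooks to record each Linear layer's output in the float and dynamically quantized models:

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Small stand-in network; the same idea applies to any Linear-heavy model
model = nn.Sequential(nn.Linear(16, 32), nn.ReLU(), nn.Linear(32, 8))
model.eval()
qmodel = torch.quantization.quantize_dynamic(model, {nn.Linear}, dtype=torch.qint8)

# Record each Linear layer's output in both models via forward hooks;
# quantize_dynamic preserves module names, so we can match layers by name
linear_names = [n for n, m in model.named_modules() if isinstance(m, nn.Linear)]
fp_out, q_out = {}, {}
hooks = []
for name in linear_names:
    f_mod = dict(model.named_modules())[name]
    q_mod = dict(qmodel.named_modules())[name]
    hooks.append(f_mod.register_forward_hook(
        lambda m, i, o, n=name: fp_out.__setitem__(n, o.detach())))
    hooks.append(q_mod.register_forward_hook(
        lambda m, i, o, n=name: q_out.__setitem__(n, o.detach())))

x = torch.randn(4, 16)
with torch.no_grad():
    model(x)
    qmodel(x)
for h in hooks:
    h.remove()

# The layer with the largest deviation is where to focus debugging
for name in linear_names:
    err = (fp_out[name] - q_out[name]).abs().max().item()
    print(f"layer {name}: max abs error = {err:.6f}")
```

A sudden jump in per-layer error points at a layer that is sensitive to quantization; such layers are candidates for being excluded from quantization or kept at higher precision.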
Key Takeaways
- Compare original and quantized model outputs on the same dataset to measure accuracy loss.
- Use metrics like accuracy, MSE, or perplexity depending on the task.
- Ensure the quantization method is compatible with the model architecture to avoid large accuracy drops.