High severity · Intermediate · Fix: 5-10 min

RuntimeError

Python built-in RuntimeError (raised from PyTorch's C++ core)

What this error means
This error occurs when you quantize or run an int8 model on a CPU that lacks the instructions the quantization engine needs, or when PyTorch's quantization engine (torch.backends.quantized.engine) is not configured for the platform.

Stack trace

```text
Traceback (most recent call last):
  File "model_quant.py", line 42, in <module>
    model = torch.quantization.quantize_dynamic(model, {torch.nn.Linear}, dtype=torch.qint8)  # triggers error
  File "/usr/local/lib/python3.9/site-packages/torch/quantization/quantize.py", line 123, in quantize_dynamic
    raise RuntimeError("int8 quantization CPU not supported error")
RuntimeError: int8 quantization is not supported on this CPU or the quantization backend is not set correctly.
```
QUICK FIX
Set torch.backends.quantized.engine = 'fbgemm' before quantizing to enable CPU int8 quantization support on x86 CPUs.

Why it happens

Int8 quantization runs on engine-specific CPU kernels. On x86, the FBGEMM engine needs at least AVX2; VNNI instructions speed up int8 math but are not required. If the CPU lacks the required instructions, or PyTorch's quantization engine is not set to one that supports the platform (e.g., fbgemm for x86), this error is raised. It can also happen if the PyTorch build does not include quantization support.

Detection

Before running quantized models, check the CPU's instruction flags (with lscpu, or in /proc/cpuinfo on Linux; look for avx2) and inspect torch.backends.quantized.supported_engines and torch.backends.quantized.engine to catch unsupported configurations early.
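A minimal sketch of that check, assuming a Linux host: it reads /proc/cpuinfo, which macOS and Windows do not provide. cpu_flags and int8_readiness are illustrative helpers, not PyTorch APIs; the only PyTorch names used are torch.backends.quantized.supported_engines and torch.backends.quantized.engine, and torch is imported only if it is installed.

```python
import platform
from pathlib import Path
from typing import Dict, Set


def cpu_flags(cpuinfo_text: str) -> Set[str]:
    """Collect instruction-set flags from /proc/cpuinfo-style text (Linux)."""
    flags: Set[str] = set()
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        # x86 kernels list flags under "flags"; ARM kernels use "Features".
        if sep and key.strip() in ("flags", "Features"):
            flags.update(value.split())
    return flags


def int8_readiness(flags: Set[str]) -> Dict[str, bool]:
    """Report the x86 features relevant to FBGEMM int8 kernels."""
    return {
        "avx2": "avx2" in flags,  # required by FBGEMM on x86
        "vnni": bool({"avx512_vnni", "avx_vnni"} & flags),  # optional speed-up
    }


if __name__ == "__main__":
    print("machine:", platform.machine())
    cpuinfo = Path("/proc/cpuinfo")
    if cpuinfo.exists():  # Linux only
        print(int8_readiness(cpu_flags(cpuinfo.read_text())))
    try:
        import torch
    except ImportError:
        print("torch is not installed")
    else:
        print("supported engines:", torch.backends.quantized.supported_engines)
        print("current engine:", torch.backends.quantized.engine)
```

If "avx2" is False on an x86 machine, or the supported engines list contains nothing but 'none' and 'qnnpack', the FBGEMM fix below will not work on that host.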

Causes & fixes

1

The CPU lacks the instructions the int8 engine needs (e.g., an x86 CPU without AVX2)

✓ Fix

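Run the model on a CPU that supports these instructions (an x86 CPU with AVX2, or an ARM CPU using the 'qnnpack' engine). Moving to a GPU does not help here: the int8 kernels that quantize_dynamic produces run on the CPU.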

2

PyTorch's quantization engine is unset or wrong for the platform (e.g., it is 'none', or 'qnnpack' on an x86 CPU)

✓ Fix

Set the engine for your architecture before quantizing: assign torch.backends.quantized.engine = 'fbgemm' on x86 CPUs, or 'qnnpack' on ARM CPUs.

3

PyTorch installation does not include quantization support or is outdated

✓ Fix

Upgrade to PyTorch 1.10 or later (this page was tested on 2.0.1) using an official build, then confirm that torch.backends.quantized.supported_engines lists an engine other than 'none' for your CPU.
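Fixes 2 and 3 can be combined into one startup step that picks an engine from what the running build actually supports. This is a sketch under stated assumptions: pick_qengine and configure_qengine are hypothetical helpers, and the architecture-to-engine mapping simply follows the guidance on this page ('fbgemm' for x86, 'qnnpack' for ARM). torch is imported lazily so the selection logic can be tested without it.

```python
import platform
from typing import Sequence

# Illustrative architecture names as reported by platform.machine().
_X86 = ("x86_64", "amd64", "i386", "i686")
_ARM = ("arm64", "aarch64")


def pick_qengine(supported: Sequence[str], machine: str) -> str:
    """Choose an int8 engine for this architecture from the build's supported engines."""
    arch = machine.lower()
    if arch in _X86:
        preferred = ["fbgemm"]  # FBGEMM: x86 CPUs with AVX2
    elif arch in _ARM or arch.startswith("arm"):
        preferred = ["qnnpack"]  # QNNPACK: ARM CPUs
    else:
        preferred = []
    for engine in preferred:
        if engine in supported:
            return engine
    # Nothing usable: the CPU lacks AVX2 (cause 1) or the build lacks the engine (cause 3).
    raise RuntimeError(
        f"no int8 engine for {machine!r}; this build supports {list(supported)}"
    )


def configure_qengine() -> str:
    """Apply pick_qengine() to the running PyTorch build; call before quantize_dynamic."""
    import torch  # lazy import keeps pick_qengine usable without torch

    engine = pick_qengine(torch.backends.quantized.supported_engines, platform.machine())
    torch.backends.quantized.engine = engine
    return engine
```

Calling configure_qengine() once before quantization turns a confusing failure deep inside quantize_dynamic into an early error that names the architecture and the engines the build offers.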

Code: broken vs fixed

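Broken - triggers the error
```python
import torch

model = torch.nn.Sequential(torch.nn.Linear(10, 5))
# Raises RuntimeError when no usable int8 engine is active
# (unsupported CPU, or the engine left as 'none')
quantized_model = torch.quantization.quantize_dynamic(model, {torch.nn.Linear}, dtype=torch.qint8)
```
Fixed - works correctly
```python
import torch

# Select the FBGEMM int8 engine (x86 CPUs with AVX2) before quantizing.
# The assignment itself raises RuntimeError if this machine or build
# does not list 'fbgemm' in torch.backends.quantized.supported_engines.
torch.backends.quantized.engine = 'fbgemm'

# quantize_dynamic swaps child modules, so wrap the layer in a container;
# a bare top-level nn.Linear would come back unquantized.
model = torch.nn.Sequential(torch.nn.Linear(10, 5))
quantized_model = torch.quantization.quantize_dynamic(model, {torch.nn.Linear}, dtype=torch.qint8)  # fixed
print(quantized_model)  # the inner layer is now a DynamicQuantizedLinear
```
Setting torch.backends.quantized.engine = 'fbgemm' selects the x86 int8 engine on CPUs that support it, which prevents the runtime error. Wrapping the layer in nn.Sequential ensures quantize_dynamic actually replaces it with its quantized counterpart.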

Workaround

If you cannot change the hardware or the engine, run the model unquantized in float32 as a temporary fallback.
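A minimal sketch of that fallback, assuming the quantization call is passed in as a function. quantize_with_fallback is a hypothetical helper, not a PyTorch API. It relies on quantize_dynamic working on a copy by default (inplace=False), so the original float32 model is still intact if quantization fails.

```python
import logging
from typing import Any, Callable, Tuple

logger = logging.getLogger(__name__)


def quantize_with_fallback(model: Any, quantize_fn: Callable[[Any], Any]) -> Tuple[Any, bool]:
    """Try int8 quantization; on RuntimeError keep the float32 model.

    Returns (model_to_use, is_quantized).
    """
    try:
        return quantize_fn(model), True
    except RuntimeError as err:
        logger.warning("int8 quantization unavailable, using float32: %s", err)
        return model, False


if __name__ == "__main__":
    try:
        import torch
    except ImportError:
        print("torch is not installed")
    else:
        float_model = torch.nn.Sequential(torch.nn.Linear(10, 5))
        model, is_int8 = quantize_with_fallback(
            float_model,
            # inplace=False (the default) leaves float_model untouched on failure
            lambda m: torch.quantization.quantize_dynamic(m, {torch.nn.Linear}, dtype=torch.qint8),
        )
        print("int8" if is_int8 else "float32")
```

Returning the is_quantized flag lets the caller log or report which precision is actually serving, so the fallback does not silently become permanent.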

Prevention

Verify CPU instruction support and explicitly set the quantization backend in your environment before deploying quantized models to avoid unsupported runtime errors.
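One way to make the engine an explicit deployment setting is to read it from the environment and fail fast at startup. This is a sketch under stated assumptions: the QUANT_ENGINE variable name and the resolve_engine / configure_from_env helpers are hypothetical. PyTorch already raises when you assign an unsupported engine; the helper adds a clearer message and also rejects 'none'.

```python
import os
from typing import Mapping, Sequence

# Hypothetical environment variable name; use whatever your deployment defines.
ENGINE_VAR = "QUANT_ENGINE"


def resolve_engine(env: Mapping[str, str], supported: Sequence[str], default: str = "fbgemm") -> str:
    """Return the configured engine, or fail fast if this host cannot run it."""
    engine = env.get(ENGINE_VAR, default)
    if engine == "none" or engine not in supported:
        raise RuntimeError(
            f"{ENGINE_VAR}={engine!r} is not usable here; supported engines: {list(supported)}"
        )
    return engine


def configure_from_env() -> str:
    """Call once at startup, before any quantized model is built or loaded."""
    import torch  # lazy import keeps resolve_engine testable without torch

    engine = resolve_engine(os.environ, torch.backends.quantized.supported_engines)
    torch.backends.quantized.engine = engine
    return engine
```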

Python 3.7+ · torch >=1.10.0 · tested on 2.0.1
Verified 2026-04