High severity intermediate · Fix: 5-10 min

RuntimeError

torch._C._RuntimeError

What this error means

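A dtype mismatch occurs when a quantized model receives inputs or weights in a data type other than the one its kernels expect, most often a float32 tensor passed to a layer that requires a quantized tensor. PyTorch then raises a runtime error during inference.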

Stack trace

traceback

RuntimeError: Expected tensor dtype torch.qint8 but got torch.float32 during quantized model inference.

QUICK FIX

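Quantize float32 input tensors with torch.quantize_per_tensor before inference, using the dtype, scale, and zero point the model expects. A plain dtype cast is not enough, because a quantized tensor also carries a scale and a zero point.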

Why it happens

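Quantized models store weights and activations in low-precision quantized dtypes. In PyTorch's eager-mode static quantization these are typically qint8 for weights and quint8 for activations. If an input arrives as float32, or the weights were never converted, a quantized kernel receives a tensor it cannot process and PyTorch raises a dtype mismatch error during inference. Dynamically quantized models (torch.quantization.quantize_dynamic) are the exception on the input side: they take float32 inputs and quantize activations internally.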

Detection

Monitor model input and weight tensor dtypes before inference; assert that all tensors match the expected quantized dtype to catch mismatches early.
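A minimal sketch of such a check, relying only on Tensor.is_quantized and Tensor.dtype. The helper name assert_input_matches is hypothetical (not a PyTorch API), and the scale and zero point are placeholder values chosen for the example:

```python
import torch


def assert_input_matches(x: torch.Tensor, expect_quantized: bool) -> None:
    """Raise early if the tensor kind differs from what the model expects.

    Hypothetical helper for illustration; not part of PyTorch.
    """
    if x.is_quantized != expect_quantized:
        expected = "a quantized tensor" if expect_quantized else "a float tensor"
        raise TypeError(f"model expects {expected}, got dtype {x.dtype}")


float_input = torch.randn(1, 10)
# scale and zero_point are placeholders, not calibrated values
quantized_input = torch.quantize_per_tensor(
    float_input, scale=0.05, zero_point=128, dtype=torch.quint8
)

assert_input_matches(float_input, expect_quantized=False)     # passes silently
assert_input_matches(quantized_input, expect_quantized=True)  # passes silently
print(float_input.dtype, quantized_input.dtype)
```

Calling the helper right before model(x) turns a failure deep inside a kernel into a short, readable message at the call site.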

Causes & fixes

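Input tensors are float32 but the quantized model expects a quantized tensor (for example quint8 activations in a statically quantized model).

✓ Fix

Quantize input tensors with torch.quantize_per_tensor, passing the scale, zero point, and dtype the model expects, or add a QuantStub at the model entry so the conversion happens inside the model. A plain dtype cast does not produce a quantized tensor.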

Model weights were not properly quantized or loaded with the correct dtype.

✓ Fix

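Load quantized weights into a model that has already gone through the same quantization steps (for example prepare and convert, or quantize_dynamic) as the model that produced them. A quantized state_dict does not match the parameter layout of the original float model.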
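One way to keep weights and model structure in sync, sketched here with dynamic quantization: build the float model, quantize it, and only then load the quantized state_dict. The factory build_quantized_model and the layer sizes are assumptions made for this example, and running it requires a PyTorch build with a quantized CPU engine:

```python
import torch
from torch import nn
from torch.ao.quantization import quantize_dynamic


def build_quantized_model() -> nn.Module:
    """Hypothetical factory: float model first, then dynamic quantization."""
    float_model = nn.Sequential(nn.Linear(10, 5))
    return quantize_dynamic(float_model, {nn.Linear}, dtype=torch.qint8)


source = build_quantized_model()
quantized_state = source.state_dict()  # holds packed qint8 weights

# Loading into the unquantized architecture fails: the keys do not match
float_model = nn.Sequential(nn.Linear(10, 5))
try:
    float_model.load_state_dict(quantized_state)
    load_failed = False
except RuntimeError:
    load_failed = True

# Loading into an identically quantized model works
target = build_quantized_model()
target.load_state_dict(quantized_state)

x = torch.randn(1, 10)  # dynamic quantization takes float32 inputs
same_output = torch.allclose(source(x), target(x))
print(load_failed, same_output)
```

The same ordering applies to statically quantized models: repeat the prepare and convert steps on the fresh model before calling load_state_dict.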

Mixing quantized and non-quantized layers or tensors in the model pipeline.

✓ Fix

Verify that every boundary between quantized and float components converts explicitly: quantize tensors before they enter a quantized layer and call dequantize() (or use a DeQuantStub) before they reach a float layer.
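For eager-mode static quantization, one common way to keep these boundaries consistent is to wrap the quantized region in QuantStub and DeQuantStub, so callers pass and receive float32 tensors. A minimal sketch follows; the SmallNet module, the layer sizes, and the random calibration data are assumptions for illustration only:

```python
import torch
from torch import nn
from torch.ao.quantization import (
    DeQuantStub,
    QuantStub,
    convert,
    get_default_qconfig,
    prepare,
)


class SmallNet(nn.Module):
    def __init__(self) -> None:
        super().__init__()
        self.quant = QuantStub()      # float32 -> quantized at the entry
        self.fc = nn.Linear(10, 5)
        self.dequant = DeQuantStub()  # quantized -> float32 at the exit

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.dequant(self.fc(self.quant(x)))


model = SmallNet().eval()
model.qconfig = get_default_qconfig(torch.backends.quantized.engine)
prepared = prepare(model)         # insert observers
prepared(torch.randn(32, 10))     # calibrate (random data, illustrative only)
quantized = convert(prepared)     # swap in quantized modules

output = quantized(torch.randn(1, 10))  # float32 in, float32 out
print(output.dtype, output.is_quantized)
```

With the stubs in place, the scale and zero point come from calibration rather than from values hard-coded at the call site.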

Code: broken vs fixed

Broken - triggers the error

python

import torch

# A statically quantized layer expects a quantized input tensor.
# (Models from quantize_dynamic accept float32 inputs and do not hit this error.)
model = torch.nn.quantized.Linear(10, 5)
input_tensor = torch.randn(1, 10)  # float32 tensor
output = model(input_tensor)  # RuntimeError: float32 input reaches a quantized kernel

Fixed - works correctly

python

import torch

model = torch.nn.quantized.Linear(10, 5)  # statically quantized layer
input_tensor = torch.randn(1, 10)
# scale and zero_point are placeholders; use the values from calibration
input_quantized = torch.quantize_per_tensor(input_tensor, scale=0.05, zero_point=128, dtype=torch.quint8)
output = model(input_quantized)  # input is now a quantized tensor
print(output.dequantize())  # back to float32 for downstream code

Quantized the float32 input to quint8 with torch.quantize_per_tensor before passing it to the statically quantized layer, so the input matches what the layer's kernel expects. The output is itself a quantized tensor, so it is dequantized before printing.

⚠

Workaround

Catch the RuntimeError and manually convert input tensors to the expected quantized dtype using torch.quantize_per_tensor before retrying inference.
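The retry pattern could look roughly like the sketch below. The helper run_with_quantized_retry is hypothetical, and it assumes the caller already knows the scale and zero point the model's first quantized layer expects:

```python
import torch


def run_with_quantized_retry(model, x, scale, zero_point):
    """Run the model; on a RuntimeError with a float input, quantize and retry.

    Hypothetical helper for illustration. scale and zero_point must match
    what the model was calibrated for; they are passed in, not guessed.
    """
    try:
        return model(x)
    except RuntimeError:
        if x.is_quantized:
            raise  # input was already quantized, so the cause lies elsewhere
        x_q = torch.quantize_per_tensor(x, scale, zero_point, torch.quint8)
        return model(x_q)
```

Because the except clause catches any RuntimeError, this is best treated as a stopgap: it can mask unrelated failures on the first attempt, and the retry doubles the cost of every mismatched call.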

✓

Prevention

Use consistent quantization-aware training and inference pipelines that enforce dtype compatibility, and validate tensor dtypes at each stage to avoid mismatches.

Python 3.9+ · torch >=1.12.0 · tested on 2.1.0

Verified 2026-04
