Critical severity intermediate · Fix: 5-15 min

RuntimeError

RuntimeError: CUDA kernel not compiled

What this error means

This error occurs when the GPTQ CUDA kernel required for quantization is missing or not compiled, preventing GPU acceleration.

Stack trace

traceback

Traceback (most recent call last):
  File "quantize.py", line 42, in quantize_model
    gptq_cuda_kernel()
  File "gptq_cuda.py", line 10, in gptq_cuda_kernel
    raise RuntimeError("CUDA kernel not compiled")
RuntimeError: CUDA kernel not compiled. Please compile the CUDA kernels before running quantization.

QUICK FIX

From the GPTQ library's source directory, run its CUDA kernel compilation step (e.g., python setup.py build_ext --inplace) before quantization.

Why it happens

The GPTQ quantization library relies on custom CUDA kernels for GPU acceleration. If these kernels are not compiled during installation or build, the runtime cannot find them and raises this error. This typically happens when the build step was skipped, the CUDA toolkit is missing, or the installed PyTorch build targets a different CUDA version than the toolkit used to compile the kernels.

Detection

Check for this error during model quantization initialization. Monitor logs for 'CUDA kernel not compiled' RuntimeError before GPU operations start.
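
As a minimal sketch of the log monitoring described above, the helper below scans log lines for the error message. The names MARKER and find_kernel_errors are hypothetical and chosen for illustration; only the message text comes from the stack trace.

```python
MARKER = "CUDA kernel not compiled"  # message text taken from the stack trace


def find_kernel_errors(log_lines):
    """Return the 1-based numbers of log lines that contain the error message."""
    return [number for number, line in enumerate(log_lines, start=1) if MARKER in line]


if __name__ == "__main__":
    sample = [
        "loading model",
        "RuntimeError: CUDA kernel not compiled",
        "exiting",
    ]
    print(find_kernel_errors(sample))
```

In practice the list of lines would come from the quantization job's log file rather than an in-memory sample.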

Causes & fixes

CUDA kernels were not compiled during GPTQ installation

✓ Fix

Run the setup script or build commands provided by the GPTQ library to compile CUDA kernels before usage.

CUDA toolkit or compiler (nvcc) is missing or not in PATH

✓ Fix

Install the CUDA toolkit matching your PyTorch CUDA version and ensure nvcc is accessible in your system PATH.
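
One way to confirm that nvcc is reachable before building is sketched below, using only the Python standard library. The function name nvcc_version_text is hypothetical.

```python
import shutil
import subprocess


def nvcc_version_text():
    """Return the output of `nvcc --version`, or None if nvcc is not on PATH."""
    nvcc_path = shutil.which("nvcc")
    if nvcc_path is None:
        return None
    result = subprocess.run(
        [nvcc_path, "--version"], capture_output=True, text=True, check=False
    )
    return result.stdout


if __name__ == "__main__":
    text = nvcc_version_text()
    if text is None:
        print("nvcc not found on PATH; install the CUDA toolkit or update PATH")
    else:
        print(text)
```

If this reports that nvcc is missing even though the toolkit is installed, the toolkit's bin directory is probably not on PATH.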

PyTorch CUDA version mismatch with compiled kernels

✓ Fix

Use a PyTorch version compatible with your CUDA toolkit and recompile the GPTQ kernels to match.
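
The comparison can be sketched as follows. It assumes the nvcc --version output contains a "release X.Y" fragment and that the PyTorch side comes from torch.version.cuda, which is None on CPU-only builds. The helper names are hypothetical, and requiring an exact major.minor match is a deliberately conservative assumption, not a documented requirement of the GPTQ library.

```python
import re


def parse_nvcc_release(version_text):
    """Extract (major, minor) from `nvcc --version` output, or None if absent."""
    match = re.search(r"release (\d+)\.(\d+)", version_text)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


def versions_match(torch_cuda, nvcc_release):
    """Compare torch.version.cuda (e.g. "12.1") with a parsed nvcc release."""
    if torch_cuda is None or nvcc_release is None:
        return False  # CPU-only PyTorch build, or nvcc output not recognised
    major, minor = torch_cuda.split(".")[:2]
    return (int(major), int(minor)) == nvcc_release


if __name__ == "__main__":
    sample = "Cuda compilation tools, release 12.1, V12.1.105"
    print(versions_match("12.1", parse_nvcc_release(sample)))
```

To use it against a real environment, pass torch.version.cuda as the first argument and the parsed output of nvcc --version as the second. If the two differ, recompile the kernels after aligning the versions.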

Running in a CPU-only environment without GPU support

✓ Fix

Either switch to a GPU-enabled environment or use CPU-only quantization methods that do not require CUDA kernels.
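
A quick way to tell which case applies is to ask PyTorch whether it can see a GPU. The sketch below uses torch.cuda.is_available() and treats a missing PyTorch install as CPU-only. The function name pick_device is hypothetical.

```python
import importlib.util


def pick_device():
    """Return "cuda" if PyTorch is installed and sees a GPU, otherwise "cpu"."""
    if importlib.util.find_spec("torch") is None:
        return "cpu"  # PyTorch is not installed in this environment
    import torch

    return "cuda" if torch.cuda.is_available() else "cpu"


if __name__ == "__main__":
    print(pick_device())
```

A result of "cpu" on a machine that has a GPU usually points to a CPU-only PyTorch build or a driver problem rather than to the GPTQ kernels themselves.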

Code: broken vs fixed

Broken - triggers the error

python

from gptq import quantize_model

quantize_model(model)  # RuntimeError: CUDA kernel not compiled

Fixed - works correctly

python

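import subprocess
import sys

# Compile the CUDA kernels first (run from the directory that contains setup.py)
subprocess.run([sys.executable, "setup.py", "build_ext", "--inplace"], check=True)

# Import only after the build so the freshly compiled extension can be found
from gptq import quantize_model

quantize_model(model)  # `model` is the already-loaded model to quantize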

The fixed version adds an explicit CUDA kernel compilation step before quantization runs, so the kernels are available at runtime.

⚠

Workaround

Catch the RuntimeError and fall back to CPU quantization mode, or skip GPU acceleration until the kernels are compiled.
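
A minimal sketch of this fallback is shown below. The gpu_quantize and cpu_quantize arguments are hypothetical callables supplied by the caller; the source does not name a CPU quantization function, so none is assumed here. Only the missing-kernel error triggers the fallback, and any other RuntimeError is re-raised.

```python
def quantize_with_fallback(model, gpu_quantize, cpu_quantize):
    """Try the GPU path first; use the CPU path only if the kernels are missing."""
    try:
        return gpu_quantize(model)
    except RuntimeError as exc:
        if "CUDA kernel not compiled" not in str(exc):
            raise  # unrelated failure: do not hide it behind the fallback
        return cpu_quantize(model)


if __name__ == "__main__":
    def fake_gpu(model):
        # Stand-in that simulates the missing-kernel error
        raise RuntimeError("CUDA kernel not compiled")

    def fake_cpu(model):
        # Stand-in for a CPU quantization routine
        return f"{model} quantized on CPU"

    print(quantize_with_fallback("demo-model", fake_gpu, fake_cpu))
```

Matching on the message text keeps genuine GPU failures, such as out-of-memory errors, visible instead of silently routing them to the slower path.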

✓

Prevention

Automate CUDA kernel compilation as part of your deployment pipeline and verify CUDA toolkit compatibility with PyTorch and GPTQ versions.

Python 3.9+ · gptq-for-llm >=0.1.0 · tested on 0.1.x

Verified 2026-04
