RuntimeError
RuntimeError: CUDA kernel not compiled
Stack trace
Traceback (most recent call last):
  File "quantize.py", line 42, in quantize_model
    gptq_cuda_kernel()
  File "gptq_cuda.py", line 10, in gptq_cuda_kernel
    raise RuntimeError("CUDA kernel not compiled. Please compile the CUDA kernels before running quantization.")
RuntimeError: CUDA kernel not compiled. Please compile the CUDA kernels before running quantization.
Why it happens
The GPTQ quantization library relies on custom CUDA kernels for GPU acceleration. If these kernels were not compiled during installation or build, the library cannot load them at runtime and raises this error. Common reasons are a skipped build step, a missing CUDA toolkit, or incompatible PyTorch and CUDA versions.
Detection
Check for this error while the quantization step initializes. Look in the logs for a RuntimeError with the message 'CUDA kernel not compiled'; it is raised before any GPU operations start.
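A minimal preflight sketch that makes this check explicit: it tries to import the compiled extension at startup and raises the same error early, before any GPU work. `extension_loads` and `preflight` are hypothetical helpers, and the module name you pass is an assumption; substitute whatever extension your GPTQ package actually builds.

```python
import importlib


def extension_loads(module_name: str) -> bool:
    """Return True if the compiled extension can actually be imported.

    Importing (rather than only locating the file) also catches builds that
    exist on disk but fail to load, e.g. because of a PyTorch/CUDA mismatch.
    """
    try:
        importlib.import_module(module_name)
    except ImportError:  # includes ModuleNotFoundError
        return False
    return True


def preflight(module_name: str) -> None:
    """Fail fast during initialization, before any GPU operation starts."""
    if not extension_loads(module_name):
        raise RuntimeError(
            f"CUDA kernel not compiled: cannot import '{module_name}'"
        )
```

Call `preflight(...)` once at process start, before the model is moved to the GPU, so the failure shows up in the logs at a predictable point.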
Causes & fixes
CUDA kernels were not compiled during GPTQ installation
Run the setup script or build commands provided by the GPTQ library to compile CUDA kernels before usage.
CUDA toolkit or compiler (nvcc) is missing or not in PATH
Install the CUDA toolkit matching your PyTorch CUDA version and ensure nvcc is accessible in your system PATH.
PyTorch CUDA version mismatch with compiled kernels
Use a PyTorch version compatible with your CUDA toolkit and recompile the GPTQ kernels to match.
Running on CPU-only environment without GPU support
Either switch to a GPU-enabled environment or use CPU-only quantization methods that do not require CUDA kernels.
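The environment-level causes above can be checked in one pass. In this sketch `diagnose` and `nvcc_release` are hypothetical helpers; `torch.cuda.is_available()` and `torch.version.cuda` are real PyTorch APIs, and PyTorch is imported lazily so the script still runs where it is not installed. Treating any major.minor difference as a mismatch is a simplification, not the exact compatibility rule PyTorch enforces.

```python
import re
import shutil
import subprocess
from typing import List, Optional


def nvcc_release(version_text: str) -> Optional[str]:
    """Extract 'major.minor' from `nvcc --version` output, e.g. '12.1'."""
    match = re.search(r"release (\d+\.\d+)", version_text)
    return match.group(1) if match else None


def diagnose() -> List[str]:
    """Check the causes listed above; return one message per problem found."""
    findings: List[str] = []

    # Cause: CUDA toolkit / nvcc missing or not on PATH.
    nvcc_path = shutil.which("nvcc")
    nvcc_ver: Optional[str] = None
    if nvcc_path is None:
        findings.append("nvcc not found on PATH; install the CUDA toolkit")
    else:
        out = subprocess.run(
            [nvcc_path, "--version"], capture_output=True, text=True
        ).stdout
        nvcc_ver = nvcc_release(out)

    try:
        import torch  # lazy import so the check also runs without PyTorch
    except ImportError:
        findings.append("PyTorch is not installed")
        return findings

    # Cause: CPU-only environment without a usable GPU.
    if not torch.cuda.is_available():
        findings.append("no usable CUDA GPU; use a GPU host or a CPU-only method")

    # Cause: PyTorch's CUDA version vs. the local toolkit (major.minor only).
    torch_cuda = torch.version.cuda  # None for CPU-only PyTorch builds
    if torch_cuda is None:
        findings.append("PyTorch is a CPU-only build")
    elif nvcc_ver is not None and torch_cuda.split(".")[:2] != nvcc_ver.split("."):
        findings.append(f"PyTorch targets CUDA {torch_cuda} but nvcc is {nvcc_ver}")
    return findings
```

An empty list only means none of these environment checks failed; it does not prove the GPTQ kernels themselves were built.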
Code: broken vs fixed
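# Broken
from gptq import quantize_model
quantize_model(model)  # RuntimeError: CUDA kernel not compiled

# Fixed
import subprocess
import sys

# Compile the CUDA kernels *before* importing gptq (run from the GPTQ source
# directory). check=True stops on a failed build instead of ignoring it.
subprocess.run([sys.executable, "setup.py", "build_ext", "--inplace"], check=True)

from gptq import quantize_model
quantize_model(model)  # now finds the compiled kernels
Workaround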
Catch the RuntimeError and fall back to a CPU quantization path, or skip GPU acceleration until the kernels are compiled.
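A sketch of this workaround that does not assume any particular GPTQ API: the GPU and CPU quantization functions are passed in as callables. Both are assumptions on your side, for example `quantize_model` as the GPU path and a CPU-only quantizer of your choice; `quantize_with_fallback` itself is a hypothetical helper.

```python
import logging
from typing import Any, Callable

log = logging.getLogger(__name__)


def quantize_with_fallback(
    model: Any,
    gpu_quantize: Callable[[Any], Any],
    cpu_quantize: Callable[[Any], Any],
) -> Any:
    """Try GPU quantization; fall back to the CPU path if kernels are missing."""
    try:
        return gpu_quantize(model)
    except RuntimeError as err:
        # Only swallow the missing-kernel error; re-raise anything else
        # (e.g. CUDA out-of-memory) so real failures stay visible.
        if "CUDA kernel not compiled" not in str(err):
            raise
        log.warning("%s; falling back to CPU quantization", err)
        return cpu_quantize(model)
```

Matching on the message text is brittle, but the error here is a plain `RuntimeError`, so the message is the only thing that distinguishes it from other CUDA failures.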
Prevention
Automate CUDA kernel compilation as part of your deployment pipeline and verify CUDA toolkit compatibility with PyTorch and GPTQ versions.
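For the pipeline step, a sketch that compiles and then confirms the extension actually imports, so a broken build fails the job instead of surfacing later at quantization time. `build_and_verify` is hypothetical, and the build command and module name are placeholders for your GPTQ package's own.

```python
import importlib
import subprocess
import sys
from typing import Sequence


def build_and_verify(build_cmd: Sequence[str], module_name: str) -> None:
    """Compile the kernels, then prove the extension imports, or fail."""
    # check=True raises CalledProcessError if the compile step fails.
    subprocess.run(list(build_cmd), check=True)
    importlib.invalidate_caches()  # let the import system see new files
    importlib.import_module(module_name)  # raises ImportError if still missing
```

In CI this might be called as `build_and_verify([sys.executable, "setup.py", "build_ext", "--inplace"], "<your extension module>")` from the package's source directory, with the job failing on any exception.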