RuntimeError
torch.cuda.RuntimeError
Stack trace
RuntimeError: bitsandbytes requires CUDA 11.7 or higher and a compatible GPU for 4bit quantization. Detected CUDA version: 11.3. Please upgrade your CUDA toolkit and bitsandbytes package.
Why it happens
QLoRA relies on bitsandbytes for 4-bit quantization, which requires CUDA 11.7 or newer and compatible GPU drivers. If the environment has an older CUDA version or an outdated bitsandbytes release, model loading fails with this runtime error.
Detection
Check the system CUDA toolkit version with 'nvcc --version', the CUDA version PyTorch was built against with 'torch.version.cuda', and the bitsandbytes version with 'pip show bitsandbytes'. Verify GPU compatibility before running QLoRA to catch this error early.
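The checks above can be gathered into one small report. This is a minimal sketch: the helper names installed_version and collect_environment are illustrative, not part of any library, and the script only reads what is installed without loading a model.

```python
"""Report the versions relevant to 4-bit loading (illustrative sketch)."""
from importlib import metadata


def installed_version(package):
    """Return the installed version string of a package, or None if absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


def collect_environment():
    """Gather CUDA and package versions without failing if something is missing."""
    info = {
        "bitsandbytes": installed_version("bitsandbytes"),
        "torch": installed_version("torch"),
        "torch_cuda": None,      # CUDA version PyTorch was built against
        "gpu_available": False,  # whether PyTorch can see a CUDA device
    }
    try:
        import torch
    except ImportError:
        return info  # PyTorch is not installed; keep the defaults
    info["torch_cuda"] = torch.version.cuda  # None on CPU-only builds
    info["gpu_available"] = torch.cuda.is_available()
    return info


if __name__ == "__main__":
    for key, value in collect_environment().items():
        print(f"{key}: {value}")
```

A value of None for torch_cuda usually means a CPU-only PyTorch build, which matches the CPU-only cause listed under Causes & fixes.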
Causes & fixes
Cause: The installed CUDA version is older than 11.7 and incompatible with bitsandbytes 4-bit quantization.
Fix: Upgrade your CUDA toolkit to version 11.7 or higher and make sure your GPU drivers are up to date.
Cause: The bitsandbytes package is outdated and lacks support for 4-bit quantization.
Fix: Upgrade bitsandbytes to the latest version (>=0.39.0, the first release with 4-bit support) using pip install --upgrade bitsandbytes.
Cause: The code runs in a CPU-only environment or on a GPU that does not support the 4-bit CUDA kernels.
Fix: Run QLoRA on a compatible NVIDIA GPU with CUDA support; CPU-only setups cannot use 4-bit CUDA quantization.
Cause: The CUDA version PyTorch was built against does not match the system CUDA toolkit, causing a runtime incompatibility.
Fix: Install a PyTorch build whose CUDA version matches your system CUDA toolkit.
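The mismatch between the PyTorch CUDA version and the system toolkit can be checked programmatically. The sketch below assumes nvcc prints a line containing "release X.Y" (as in "Cuda compilation tools, release 11.3, V11.3.109"); the function names are hypothetical helpers written for this example.

```python
import re
import shutil
import subprocess


def parse_nvcc_release(output):
    """Extract 'major.minor' from nvcc --version text, or None if not found."""
    match = re.search(r"release (\d+)\.(\d+)", output)
    return f"{match.group(1)}.{match.group(2)}" if match else None


def toolkit_cuda_version():
    """Return the toolkit version reported by nvcc, or None if nvcc is absent."""
    if shutil.which("nvcc") is None:
        return None
    result = subprocess.run(
        ["nvcc", "--version"], capture_output=True, text=True, check=False
    )
    return parse_nvcc_release(result.stdout)


def versions_match(torch_cuda, toolkit_cuda):
    """Compare two 'major.minor' strings; unknown (None) values never match."""
    if torch_cuda is None or toolkit_cuda is None:
        return False
    return torch_cuda.split(".")[:2] == toolkit_cuda.split(".")[:2]


# Example with the versions from the error message in this article:
sample = "Cuda compilation tools, release 11.3, V11.3.109"
print(versions_match("11.7", parse_nvcc_release(sample)))
```

In practice the first argument would be torch.version.cuda and the second the result of toolkit_cuda_version().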
Code: broken vs fixed
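# Broken: fails when the environment does not meet the CUDA requirement
from transformers import AutoModelForCausalLM
import bitsandbytes as bnb

model = AutoModelForCausalLM.from_pretrained(
    'model-name',
    load_in_4bit=True,  # triggers quantization setup
    device_map='auto'
)  # RuntimeError occurs here due to CUDA incompatibility

# Fixed: validate the environment before loading
import os
os.environ['CUDA_VISIBLE_DEVICES'] = '0'  # Ensure GPU is visible; set before importing torch

import bitsandbytes as bnb
import torch
from packaging import version  # numeric comparison; plain strings misorder '11.10' and '11.7'
from transformers import AutoModelForCausalLM

# Confirm CUDA version and bitsandbytes version before loading
# torch.version.cuda is None on CPU-only PyTorch builds
assert torch.version.cuda is not None, 'PyTorch was built without CUDA support'
assert version.parse(torch.version.cuda) >= version.parse('11.7'), 'CUDA 11.7+ required'
# 0.39.0 is the first bitsandbytes release with 4-bit support
assert version.parse(bnb.__version__) >= version.parse('0.39.0'), 'Update bitsandbytes to >=0.39.0'

model = AutoModelForCausalLM.from_pretrained(
    'model-name',
    load_in_4bit=True,  # fixed: environment meets requirements
    device_map='auto'
)
print('Model loaded with 4-bit quantization successfully')
Workaround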
If upgrading CUDA or the GPU is not possible, disable 4-bit quantization by setting load_in_4bit=False and fall back to full precision or to 8-bit quantization (load_in_8bit=True).
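The fallback described above can be made automatic by choosing the loading arguments from the detected versions. This is a sketch under stated assumptions: choose_load_kwargs and version_tuple are hypothetical helpers, the CUDA threshold is the one given in this article, the bitsandbytes minimum is 0.39.0 (the release that added 4-bit quantization), and the choice of 8-bit as the middle tier follows the workaround text rather than any library rule.

```python
import re

MIN_CUDA = (11, 7, 0)   # CUDA threshold stated in this article
MIN_BNB = (0, 39, 0)    # first bitsandbytes release with 4-bit support


def version_tuple(text):
    """Turn '11.7' or '0.41.0.post1' into a 3-int tuple for numeric comparison."""
    numbers = [int(n) for n in re.findall(r"\d+", text)[:3]]
    numbers += [0] * (3 - len(numbers))  # pad so '11.7' compares as (11, 7, 0)
    return tuple(numbers)


def choose_load_kwargs(torch_cuda, bnb_version):
    """Pick from_pretrained keyword arguments for the detected environment."""
    if torch_cuda is None or bnb_version is None:
        return {}  # no CUDA build or no bitsandbytes: load without quantization
    if version_tuple(torch_cuda) >= MIN_CUDA and version_tuple(bnb_version) >= MIN_BNB:
        return {"load_in_4bit": True, "device_map": "auto"}
    return {"load_in_8bit": True, "device_map": "auto"}  # assumed fallback tier


# Intended use (not executed here because it downloads a model):
#   kwargs = choose_load_kwargs(torch.version.cuda, bnb.__version__)
#   model = AutoModelForCausalLM.from_pretrained('model-name', **kwargs)
print(choose_load_kwargs("11.3", "0.41.0"))
```

Returning plain keyword arguments keeps the decision separate from the model load, so the logic can be tested on machines without a GPU.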
Prevention
Use containerized environments with compatible CUDA and bitsandbytes versions pre-installed, or run an automated environment validation script before starting QLoRA, to avoid setup errors.
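An automated validation step might look like the following sketch, which could run as the first command in a container entrypoint or CI job. The names REQUIREMENTS, find_problems, and preflight are illustrative, and the detected versions are passed in as a plain dictionary so the check itself has no GPU dependency.

```python
import re
import sys

# Minimums assumed for this sketch: CUDA from this article, bitsandbytes 0.39.0
# as the first release with 4-bit support.
REQUIREMENTS = {"cuda": "11.7", "bitsandbytes": "0.39.0"}


def as_tuple(text):
    """Convert a version string to a padded 3-int tuple for numeric comparison."""
    numbers = [int(n) for n in re.findall(r"\d+", text)[:3]]
    return tuple(numbers + [0] * (3 - len(numbers)))


def find_problems(found, required=None):
    """Return one message per requirement that is missing or too old."""
    required = REQUIREMENTS if required is None else required
    problems = []
    for name, minimum in required.items():
        actual = found.get(name)
        if actual is None:
            problems.append(f"{name}: not detected (need >= {minimum})")
        elif as_tuple(actual) < as_tuple(minimum):
            problems.append(f"{name}: found {actual}, need >= {minimum}")
    return problems


def preflight(found):
    """Print every problem to stderr and return a process exit code."""
    problems = find_problems(found)
    for line in problems:
        print("FAIL", line, file=sys.stderr)
    return 1 if problems else 0


if __name__ == "__main__":
    # Hypothetical detected values matching the error in this article.
    sys.exit(preflight({"cuda": "11.3", "bitsandbytes": "0.41.0"}))
```

Reporting all problems at once, instead of stopping at the first failed assertion, shortens the fix-and-retry loop when several components are out of date.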