Critical severity · Intermediate · Fix: 5-10 min

RuntimeError

torch.cuda.RuntimeError

What this error means

This error occurs when QLoRA's 4-bit quantization setup fails because the CUDA or bitsandbytes version is incompatible, or because no supported GPU is available.

Stack trace

traceback

RuntimeError: bitsandbytes requires CUDA 11.7 or higher and a compatible GPU for 4bit quantization. Detected CUDA version: 11.3. Please upgrade your CUDA toolkit and bitsandbytes package.

QUICK FIX

Upgrade the CUDA toolkit to 11.7 or later, update bitsandbytes to the latest version, and run on a compatible NVIDIA GPU.

Why it happens

QLoRA relies on bitsandbytes for 4-bit quantization. According to the error message above, bitsandbytes needs CUDA 11.7 or later, along with a compatible GPU and driver. If the environment has an older CUDA version or an outdated bitsandbytes build, quantization setup fails with this runtime error.

Detection

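Check the system toolkit version with 'nvcc --version', the CUDA version PyTorch was built against with 'torch.version.cuda', and the installed bitsandbytes version with 'pip show bitsandbytes'. The two CUDA numbers can differ, so check both. Also confirm that a supported GPU is visible (for example with 'torch.cuda.is_available()') before running QLoRA, so the problem is caught early.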
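The checks above can be gathered into one small report. This is a minimal sketch, not an official diagnostic: 'collect_environment' is a hypothetical helper name, and it reads the bitsandbytes version from package metadata rather than importing the library, on the assumption that the import itself may fail on a broken CUDA setup.

```python
import importlib.metadata
import importlib.util


def collect_environment():
    """Return a dict describing the CUDA / bitsandbytes setup."""
    info = {
        "torch_cuda": None,          # CUDA version PyTorch was built against
        "gpu_available": False,      # whether PyTorch can see a CUDA device
        "compute_capability": None,  # (major, minor) of device 0, if any
        "bitsandbytes": None,        # installed bitsandbytes version, if any
    }

    # Read the version from package metadata instead of importing bitsandbytes.
    try:
        info["bitsandbytes"] = importlib.metadata.version("bitsandbytes")
    except importlib.metadata.PackageNotFoundError:
        pass

    # torch is treated as optional so the report still works without it.
    if importlib.util.find_spec("torch") is not None:
        import torch

        info["torch_cuda"] = torch.version.cuda  # None on CPU-only builds
        info["gpu_available"] = torch.cuda.is_available()
        if info["gpu_available"]:
            info["compute_capability"] = torch.cuda.get_device_capability(0)

    return info


for key, value in collect_environment().items():
    print(f"{key}: {value}")
```

A value of None for 'torch_cuda' means a CPU-only PyTorch build (or no PyTorch at all), which rules out 4-bit CUDA quantization regardless of the other values.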

Causes & fixes

The installed CUDA version is older than 11.7, which is incompatible with bitsandbytes 4-bit quantization.

✓ Fix

Upgrade your CUDA toolkit to version 11.7 or higher and ensure your GPU drivers are up to date.

The installed bitsandbytes version is outdated and lacks support for 4-bit quantization.

✓ Fix

Upgrade bitsandbytes to the latest version with 'pip install --upgrade bitsandbytes'. 4-bit quantization was added in bitsandbytes 0.39.0, so any older release will fail.

The code runs in a CPU-only environment or on a GPU that does not support the 4-bit CUDA kernels.

✓ Fix

Run QLoRA on a compatible NVIDIA GPU with CUDA support; CPU-only setups cannot use 4-bit CUDA quantization.

The CUDA version PyTorch was built against does not match the system CUDA toolkit, causing a runtime incompatibility.

✓ Fix

Install a PyTorch build whose CUDA version matches your system CUDA toolkit. Compare 'torch.version.cuda' with the output of 'nvcc --version' to confirm.

Code: broken vs fixed

Broken - triggers the error

python

from transformers import AutoModelForCausalLM
import bitsandbytes as bnb

model = AutoModelForCausalLM.from_pretrained(
    'model-name',
    load_in_4bit=True,  # triggers quantization setup
    device_map='auto'
)  # RuntimeError occurs here due to CUDA incompatibility

Fixed - works correctly

python

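import os

# Select the GPU before importing torch so the setting is in place when CUDA initializes
os.environ['CUDA_VISIBLE_DEVICES'] = '0'

import bitsandbytes as bnb
import torch
from packaging.version import Version  # installed as a dependency of transformers
from transformers import AutoModelForCausalLM

# Confirm GPU, CUDA version, and bitsandbytes version before loading.
# Versions are parsed first: plain string comparison misorders '11.10' and '11.7'.
assert torch.version.cuda is not None and torch.cuda.is_available(), 'CUDA GPU required'
assert Version(torch.version.cuda) >= Version('11.7'), 'CUDA 11.7+ required'
assert Version(bnb.__version__) >= Version('0.39.0'), 'Update bitsandbytes to >=0.39.0'

model = AutoModelForCausalLM.from_pretrained(
    'model-name',
    load_in_4bit=True,  # fixed: environment meets requirements
    device_map='auto'
)
print('Model loaded with 4-bit quantization successfully')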

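The fixed version adds environment checks that confirm the CUDA and bitsandbytes versions meet the minimum requirements before 4-bit quantization is set up, so an incompatible environment fails with a clear message instead of a runtime error during loading.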
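Recent transformers releases document an explicit 'BitsAndBytesConfig' passed through 'quantization_config' as the way to request 4-bit loading. The sketch below shows that form under a few assumptions: 'load_4bit_model' is a hypothetical helper name, the NF4 and double-quantization settings follow the usual QLoRA setup, and bfloat16 compute assumes a GPU that supports it.

```python
def load_4bit_model(model_name):
    """Load a causal LM in 4-bit using an explicit BitsAndBytesConfig."""
    # Imports live inside the function so this module can still be imported
    # on a machine without the GPU stack installed.
    import torch
    from transformers import AutoModelForCausalLM, BitsAndBytesConfig

    quant_config = BitsAndBytesConfig(
        load_in_4bit=True,
        bnb_4bit_quant_type="nf4",              # NormalFloat4, the type QLoRA uses
        bnb_4bit_compute_dtype=torch.bfloat16,  # assumption: GPU supports bfloat16
        bnb_4bit_use_double_quant=True,         # also quantize the quantization constants
    )
    return AutoModelForCausalLM.from_pretrained(
        model_name,
        quantization_config=quant_config,
        device_map="auto",
    )
```

Calling 'load_4bit_model("model-name")' still requires a working CUDA and bitsandbytes setup; the config only makes the quantization settings explicit.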

Workaround

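If upgrading CUDA or the GPU is not possible, disable 4-bit quantization by setting load_in_4bit=False and fall back to full precision or 8-bit quantization. Keep in mind that 8-bit loading (load_in_8bit=True) also goes through bitsandbytes and typically needs a CUDA GPU, so on a CPU-only machine full precision is the realistic fallback.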
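One way to encode this fallback is a small function that picks the 'from_pretrained' keyword arguments from what the machine supports. This is an illustrative sketch: 'choose_load_kwargs' and its three flags are hypothetical names, and the caller is assumed to have determined the flags with its own environment checks.

```python
def choose_load_kwargs(gpu_available, four_bit_ok, eight_bit_ok=False):
    """Pick from_pretrained keyword arguments for the best supported mode."""
    if gpu_available and four_bit_ok:
        return {"load_in_4bit": True, "device_map": "auto"}
    if gpu_available and eight_bit_ok:
        # 8-bit loading also relies on bitsandbytes and a CUDA GPU.
        return {"load_in_8bit": True, "device_map": "auto"}
    if gpu_available:
        # Full precision on GPU; load_in_4bit defaults to False when omitted.
        return {"device_map": "auto"}
    # CPU-only: full precision with default placement.
    return {}


print(choose_load_kwargs(gpu_available=True, four_bit_ok=False, eight_bit_ok=True))
```

The returned dict can be unpacked into the call, for example 'AutoModelForCausalLM.from_pretrained("model-name", **kwargs)'.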

Prevention

Use containerized environments with pre-installed, mutually compatible CUDA and bitsandbytes versions, or run an environment validation script before each QLoRA job so that setup problems surface before model loading starts.
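A validation script of this kind can be reduced to a pure function that compares version strings numerically. The sketch below is one possible shape, with hypothetical names ('parse_version', 'find_problems'). The CUDA minimum comes from the error message in this article; the bitsandbytes minimum of 0.39 is an assumption based on the release that introduced 4-bit support.

```python
import re

MIN_CUDA = (11, 7)           # from the error message quoted in this article
MIN_BITSANDBYTES = (0, 39)   # assumption: first release with 4-bit support


def parse_version(text):
    """Turn '11.7' or '0.39.0' into a tuple of ints for numeric comparison."""
    return tuple(int(part) for part in re.findall(r"\d+", text)[:3])


def find_problems(cuda_version, bnb_version, gpu_available):
    """Return a list of human-readable problems; an empty list means OK."""
    problems = []
    if not gpu_available:
        problems.append("no CUDA-capable GPU detected")
    if cuda_version is None:
        problems.append("PyTorch was built without CUDA")
    elif parse_version(cuda_version) < MIN_CUDA:
        problems.append(f"CUDA {cuda_version} is older than 11.7")
    if bnb_version is None:
        problems.append("bitsandbytes is not installed")
    elif parse_version(bnb_version) < MIN_BITSANDBYTES:
        problems.append(f"bitsandbytes {bnb_version} is older than 0.39")
    return problems


# Values taken from the failing environment in the stack trace.
for problem in find_problems("11.3", "0.37.0", gpu_available=True):
    print("FAIL:", problem)
```

Comparing integer tuples avoids the string-ordering trap where '11.10' sorts before '11.7'. In a container build or CI job, a non-empty result would typically be turned into a non-zero exit code.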

Python 3.9+ · bitsandbytes >=0.39.0 · tested on 0.39.0

Verified 2026-04
