Critical severity · Intermediate · Fix: 5-15 min

RuntimeError

bitsandbytes.cuda_setup.RuntimeError

What this error means

This error occurs when bitsandbytes cannot initialize CUDA for 4-bit quantization because the NVIDIA GPU driver or CUDA toolkit is missing or incompatible.

Stack trace

traceback

RuntimeError: CUDA setup failed: bitsandbytes requires a compatible NVIDIA GPU with CUDA installed and properly configured.
  File "/usr/local/lib/python3.10/site-packages/bitsandbytes/cuda_setup.py", line 45, in initialize_cuda
    raise RuntimeError("CUDA setup failed: incompatible or missing CUDA environment.")

QUICK FIX

Confirm that PyTorch can see the GPU with torch.cuda.is_available(), then upgrade bitsandbytes to the latest version.

Why it happens

bitsandbytes relies on CUDA to run 4-bit quantization efficiently on NVIDIA GPUs. The error is raised when the CUDA driver or toolkit is missing, incompatible, or misconfigured, so bitsandbytes cannot initialize GPU acceleration.

Detection

Check for RuntimeError exceptions during model loading or quantization initialization; verify CUDA availability with torch.cuda.is_available() before using bitsandbytes.
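The availability check described above can be wrapped in a small preflight function. This is a minimal sketch, not part of bitsandbytes: the function name cuda_preflight is hypothetical, and it only reports whether PyTorch and bitsandbytes are installed and whether PyTorch can see a CUDA device.

```python
import importlib.util


def cuda_preflight() -> list[str]:
    """Return a list of human-readable problems; an empty list means the checks passed."""
    problems = []
    # Without PyTorch there is nothing further to check
    if importlib.util.find_spec("torch") is None:
        problems.append("PyTorch is not installed")
        return problems

    import torch

    if not torch.cuda.is_available():
        problems.append("torch.cuda.is_available() returned False")
    # find_spec locates the package without importing it,
    # so bitsandbytes' own CUDA setup is not triggered here
    if importlib.util.find_spec("bitsandbytes") is None:
        problems.append("bitsandbytes is not installed")
    return problems


if __name__ == "__main__":
    for problem in cuda_preflight():
        print("preflight:", problem)
```

Running it before model loading turns a late RuntimeError into an early, readable list of what is missing.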

Causes & fixes

The CUDA driver or toolkit is not installed, or is incompatible with bitsandbytes requirements.

✓ Fix

Install the correct NVIDIA CUDA driver and CUDA toolkit version compatible with your GPU and bitsandbytes version.
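Before installing anything, it can help to see which versions are currently present. The sketch below assumes nvidia-smi is on PATH when a driver is installed; the helper names driver_version and torch_cuda_version are hypothetical. It reads the driver version from nvidia-smi and the CUDA version that PyTorch was built against from torch.version.cuda.

```python
import shutil
import subprocess


def driver_version():
    """Return the NVIDIA driver version reported by nvidia-smi, or None if unavailable."""
    if shutil.which("nvidia-smi") is None:
        return None
    result = subprocess.run(
        ["nvidia-smi", "--query-gpu=driver_version", "--format=csv,noheader"],
        capture_output=True,
        text=True,
    )
    if result.returncode != 0:
        return None
    lines = result.stdout.strip().splitlines()
    return lines[0].strip() if lines else None


def torch_cuda_version():
    """Return the CUDA version PyTorch was built against, or None."""
    try:
        import torch
    except ImportError:
        return None
    return torch.version.cuda  # None for CPU-only PyTorch builds


if __name__ == "__main__":
    print("NVIDIA driver:", driver_version())
    print("PyTorch CUDA build:", torch_cuda_version())
```

A None from either helper points at the corresponding missing piece: no usable driver, or a CPU-only (or absent) PyTorch build.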

No NVIDIA GPU is detected, or the GPU is not supported by bitsandbytes 4-bit quantization.

✓ Fix

Ensure your system has a supported NVIDIA GPU; bitsandbytes 4-bit quantization requires a CUDA-capable GPU.
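To see which GPUs PyTorch detects, a short listing such as the following may be useful. It is a sketch with a hypothetical helper name (list_gpus); it reports each device's name and compute capability but does not decide whether bitsandbytes supports the device, since that depends on the bitsandbytes version.

```python
def list_gpus():
    """Return (name, compute capability) for each CUDA device PyTorch can see."""
    try:
        import torch
    except ImportError:
        return []
    if not torch.cuda.is_available():
        return []
    gpus = []
    for index in range(torch.cuda.device_count()):
        major, minor = torch.cuda.get_device_capability(index)
        gpus.append((torch.cuda.get_device_name(index), f"{major}.{minor}"))
    return gpus


if __name__ == "__main__":
    detected = list_gpus()
    if not detected:
        print("No CUDA-capable GPU detected by PyTorch")
    for name, capability in detected:
        print(f"{name} (compute capability {capability})")
```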

The installed bitsandbytes version is outdated and incompatible with the current CUDA setup.

✓ Fix

Upgrade bitsandbytes to the latest version using pip install --upgrade bitsandbytes.
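To confirm which version is installed before and after the upgrade, the standard library can read the package metadata without importing bitsandbytes, which avoids triggering its CUDA setup. The helper name installed_version is hypothetical.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(package: str = "bitsandbytes"):
    """Return the installed version string of a package, or None if it is not installed."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


if __name__ == "__main__":
    print("bitsandbytes:", installed_version())
    print("torch:", installed_version("torch"))
```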

Environment variables like CUDA_HOME or PATH do not point to the correct CUDA installation.

✓ Fix

Set environment variables CUDA_HOME and PATH to the correct CUDA toolkit installation directories.
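A quick way to inspect the current values is sketched below. The function name cuda_env_report is hypothetical, and looking for nvcc on PATH is an assumption about how a toolkit installation is usually laid out; it is not something bitsandbytes itself requires.

```python
import os
import shutil
from pathlib import Path


def cuda_env_report(env=None):
    """Summarize the CUDA-related environment variables of the given mapping."""
    env = os.environ if env is None else env
    cuda_home = env.get("CUDA_HOME")
    return {
        "CUDA_HOME": cuda_home,
        # True only if CUDA_HOME is set and points at an existing directory
        "CUDA_HOME exists": bool(cuda_home) and Path(cuda_home).is_dir(),
        # Full path to nvcc if it is found on PATH, otherwise None
        "nvcc on PATH": shutil.which("nvcc", path=env.get("PATH", "")),
    }


if __name__ == "__main__":
    for key, value in cuda_env_report().items():
        print(f"{key}: {value}")
```

If CUDA_HOME is unset or points at a missing directory, export the correct values in the shell before starting Python so that every imported library sees them.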

Code: broken vs fixed

Broken - triggers the error

python

from transformers import AutoModelForCausalLM
import bitsandbytes as bnb

model = AutoModelForCausalLM.from_pretrained(
    "gpt2",
    load_in_4bit=True,  # triggers bitsandbytes 4-bit quantization
    device_map='auto'
)  # RuntimeError: CUDA setup failed here

Fixed - works correctly

python

import os

# Set CUDA environment variables before importing torch or bitsandbytes:
# bitsandbytes runs its CUDA setup when it is first imported.
# '/usr/local/cuda' is a common default; adjust it to your installation.
os.environ['CUDA_HOME'] = '/usr/local/cuda'
os.environ['PATH'] = os.environ.get('PATH', '') + os.pathsep + '/usr/local/cuda/bin'

import torch
from transformers import AutoModelForCausalLM

# Fail early with a clear message if PyTorch cannot see a GPU
if not torch.cuda.is_available():
    raise RuntimeError("CUDA is not available. Please install CUDA and NVIDIA drivers.")

model = AutoModelForCausalLM.from_pretrained(
    "gpt2",
    load_in_4bit=True,  # bitsandbytes 4-bit quantization
    device_map='auto'
)
print("Model loaded with 4-bit quantization on CUDA successfully.")

The CUDA environment variables are now set before torch and bitsandbytes are imported, and a torch.cuda.is_available() check confirms that CUDA is usable before the model is loaded with bitsandbytes 4-bit quantization.

⚠

Workaround

If the CUDA setup cannot be fixed immediately, disable 4-bit quantization by setting load_in_4bit=False, or run the model on CPU, to avoid the CUDA setup error temporarily.
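One way to apply this workaround without editing the loading code each time is to choose the keyword arguments at runtime. This is a sketch under stated assumptions: load_kwargs is a hypothetical helper, and the CPU branch simply turns quantization off and leaves the device at its default.

```python
def load_kwargs(cuda_available: bool) -> dict:
    """Choose from_pretrained keyword arguments based on CUDA availability."""
    if cuda_available:
        # GPU path: 4-bit quantization through bitsandbytes
        return {"load_in_4bit": True, "device_map": "auto"}
    # Fallback path: no quantization, model stays on the default (CPU) device
    return {"load_in_4bit": False}


if __name__ == "__main__":
    print(load_kwargs(True))
    print(load_kwargs(False))
```

It would be used as AutoModelForCausalLM.from_pretrained("gpt2", **load_kwargs(torch.cuda.is_available())). Expect higher memory use and slower inference on the fallback path, since the model is loaded unquantized.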

✓

Prevention

Use containerized environments with pre-installed compatible CUDA drivers and bitsandbytes versions, and automate CUDA compatibility checks during deployment to prevent setup errors.

Python 3.9+ · bitsandbytes >=0.39.0 · tested on 0.39.x

Verified 2026-04
