RuntimeError
bitsandbytes.cuda_setup.RuntimeError
Stack trace
RuntimeError: CUDA setup failed: bitsandbytes requires a compatible NVIDIA GPU with CUDA installed and properly configured.
  File "/usr/local/lib/python3.10/site-packages/bitsandbytes/cuda_setup.py", line 45, in initialize_cuda
    raise RuntimeError("CUDA setup failed: incompatible or missing CUDA environment.")
Why it happens
bitsandbytes relies on CUDA to perform efficient 4-bit quantization on NVIDIA GPUs. This error occurs when the CUDA driver or toolkit is missing, incompatible, or misconfigured, which prevents bitsandbytes from initializing GPU acceleration.
Detection
Check for RuntimeError exceptions during model loading or quantization initialization; verify CUDA availability with torch.cuda.is_available() before using bitsandbytes.
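The check described above can be sketched as a small preflight function. This is a minimal sketch, not part of bitsandbytes: the name cuda_preflight and the report keys are hypothetical, and it assumes PyTorch is the framework in use. It only calls torch.version.cuda, torch.cuda.is_available() and torch.cuda.get_device_name(), and it returns an empty report instead of failing when PyTorch is not installed.

```python
import importlib.util


def cuda_preflight():
    """Return a small report on whether PyTorch can see a CUDA device.

    Hypothetical helper for illustration; the keys are not a library API.
    """
    report = {
        "torch_installed": False,
        "cuda_available": False,
        "torch_cuda_version": None,
        "device_name": None,
    }
    # Look for torch without importing it, so this runs on any machine.
    if importlib.util.find_spec("torch") is None:
        return report

    import torch

    report["torch_installed"] = True
    # None on CPU-only PyTorch builds.
    report["torch_cuda_version"] = torch.version.cuda
    if torch.cuda.is_available():
        report["cuda_available"] = True
        report["device_name"] = torch.cuda.get_device_name(0)
    return report


if __name__ == "__main__":
    print(cuda_preflight())
```

Running this before loading a quantized model shows which layer is missing: no PyTorch at all, a CPU-only PyTorch build, or a CUDA build that cannot see a device.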
Causes & fixes
The CUDA driver or toolkit is not installed, or is incompatible with bitsandbytes requirements.
Install an NVIDIA driver and CUDA toolkit version compatible with both your GPU and your bitsandbytes version.
No NVIDIA GPU is detected, or the GPU is not supported by bitsandbytes 4-bit quantization.
Ensure your system has a supported NVIDIA GPU; bitsandbytes 4-bit quantization requires a CUDA-capable GPU.
The installed bitsandbytes version is outdated and incompatible with the current CUDA setup.
Upgrade bitsandbytes to the latest version using pip install --upgrade bitsandbytes.
Environment variables such as CUDA_HOME or PATH do not point to the correct CUDA installation.
Set CUDA_HOME to the CUDA toolkit installation directory and add its bin directory to PATH.
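The environment-variable cause can be checked with the standard library alone. The sketch below is illustrative: check_cuda_env is a hypothetical helper, and it assumes that finding nvcc (the compiler shipped with the CUDA toolkit) on PATH is a reasonable sign that the toolkit's bin directory is exported. It does not verify version compatibility.

```python
import os
import shutil


def check_cuda_env(environ=None):
    """Return a list of problems found with CUDA_HOME and PATH.

    Hypothetical helper for illustration. Pass a dict to test a specific
    environment; by default the current process environment is used.
    """
    env = os.environ if environ is None else environ
    problems = []

    cuda_home = env.get("CUDA_HOME")
    if not cuda_home:
        problems.append("CUDA_HOME is not set")
    elif not os.path.isdir(cuda_home):
        problems.append(f"CUDA_HOME points to a missing directory: {cuda_home}")

    # Assumption: nvcc on PATH suggests the toolkit's bin/ directory is exported.
    if shutil.which("nvcc", path=env.get("PATH", "")) is None:
        problems.append("nvcc was not found on PATH")

    return problems


if __name__ == "__main__":
    for problem in check_cuda_env():
        print("warning:", problem)
```

An empty list means both variables look plausible; it does not prove that bitsandbytes will initialize.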
Code: broken vs fixed
# Broken: fails on a machine without a working CUDA setup
from transformers import AutoModelForCausalLM
import bitsandbytes as bnb

model = AutoModelForCausalLM.from_pretrained(
    "gpt2",
    load_in_4bit=True,  # triggers bitsandbytes 4-bit quantization
    device_map='auto'
)  # RuntimeError: CUDA setup failed here

# Fixed: point at the CUDA installation and check availability before loading
import os

# Set the CUDA environment variables before importing torch or bitsandbytes,
# so they are already in place when those libraries look for CUDA
os.environ['CUDA_HOME'] = '/usr/local/cuda'
os.environ['PATH'] += os.pathsep + '/usr/local/cuda/bin'

import torch
from transformers import AutoModelForCausalLM
import bitsandbytes as bnb

# Fail early with a clear message instead of inside bitsandbytes
if not torch.cuda.is_available():
    raise RuntimeError("CUDA is not available. Please install CUDA and NVIDIA drivers.")

model = AutoModelForCausalLM.from_pretrained(
    "gpt2",
    load_in_4bit=True,  # bitsandbytes 4-bit quantization
    device_map='auto'
)
print("Model loaded with 4bit quantization on CUDA successfully.")  # Fixed CUDA setup
Workaround
If the CUDA setup cannot be fixed immediately, disable 4-bit quantization by setting load_in_4bit=False, or run the model on CPU, to avoid the CUDA setup error temporarily.
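One way to apply this workaround without editing the loading code by hand is to choose the from_pretrained arguments at runtime. The sketch below is a minimal illustration: choose_load_kwargs is a hypothetical helper, and it reuses only the load_in_4bit and device_map arguments already shown in the example above.

```python
def choose_load_kwargs(cuda_available):
    """Pick from_pretrained keyword arguments based on CUDA availability.

    Hypothetical helper for illustration, not a transformers API.
    """
    if cuda_available:
        # Same arguments as the fixed example: 4-bit quantization on the GPU.
        return {"load_in_4bit": True, "device_map": "auto"}
    # Fallback: no quantization, so the bitsandbytes CUDA path is not used.
    return {"load_in_4bit": False}


# Usage sketch (requires torch and transformers to be installed):
#   import torch
#   from transformers import AutoModelForCausalLM
#   kwargs = choose_load_kwargs(torch.cuda.is_available())
#   model = AutoModelForCausalLM.from_pretrained("gpt2", **kwargs)
```

The fallback loads the model unquantized, so it needs more memory than the 4-bit path and is best treated as a temporary measure.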
Prevention
Use containerized environments with pre-installed compatible CUDA drivers and bitsandbytes versions, and automate CUDA compatibility checks during deployment to prevent setup errors.
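An automated check during deployment could be as simple as asking the driver for its version and failing the step when nothing answers. The following is a sketch under stated assumptions: query_driver_version and deployment_check are hypothetical names, and the check assumes the nvidia-smi tool that ships with the NVIDIA driver is the right probe for your environment. It confirms that a driver responds; it does not compare versions against bitsandbytes requirements.

```python
import shutil
import subprocess


def query_driver_version():
    """Return the NVIDIA driver version reported by nvidia-smi, or None.

    Hypothetical helper for illustration.
    """
    if shutil.which("nvidia-smi") is None:
        return None  # no NVIDIA driver tools found on this machine
    try:
        result = subprocess.run(
            ["nvidia-smi", "--query-gpu=driver_version", "--format=csv,noheader"],
            capture_output=True,
            text=True,
            timeout=30,
            check=True,
        )
    except (subprocess.SubprocessError, OSError):
        return None  # the tool exists but could not talk to a driver
    lines = result.stdout.strip().splitlines()
    return lines[0].strip() if lines else None


def deployment_check():
    """Return 0 if a driver responded, 1 otherwise (usable as an exit code)."""
    return 0 if query_driver_version() else 1


if __name__ == "__main__":
    raise SystemExit(deployment_check())
```

Run as a script in a container build or startup step, a non-zero exit code stops the deployment before the model load can fail.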