RuntimeError
bitsandbytes.cuda_setup.RuntimeError: CUDA setup not initialized
Stack trace
RuntimeError: CUDA setup not initialized
File "/usr/local/lib/python3.9/site-packages/bitsandbytes/cuda_setup.py", line 45, in <module>
    raise RuntimeError("CUDA setup not initialized")
File "/app/model_quantize.py", line 12, in <module>
    import bitsandbytes as bnb  # triggers CUDA init error
Why it happens
bitsandbytes requires a properly configured CUDA environment to perform GPU quantization. This error occurs when the CUDA drivers are missing or incompatible, or when the GPU is not accessible to the process, so bitsandbytes cannot initialize a CUDA context.
Detection
Watch for a RuntimeError with the message 'CUDA setup not initialized' when bitsandbytes is imported or a quantization call runs, and monitor logs for CUDA driver or device-availability errors.
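The log check described above can be sketched as a small scanner. This is a minimal illustration, not part of bitsandbytes: the function name find_cuda_setup_errors and the sample log lines are assumptions made for this example, and only the error message itself comes from the stack trace above.

```python
# Message text taken from the stack trace in this entry.
ERROR_MESSAGE = "CUDA setup not initialized"


def find_cuda_setup_errors(log_lines):
    """Return (line_number, text) for every log line containing the error message.

    Hypothetical helper for illustration; line numbers are 1-based.
    """
    return [
        (number, line.rstrip())
        for number, line in enumerate(log_lines, start=1)
        if ERROR_MESSAGE in line
    ]


# Sample input (made up for the example, except the error line itself).
sample_log = [
    "INFO starting quantization job",
    "RuntimeError: CUDA setup not initialized",
    "INFO shutting down",
]
hits = find_cuda_setup_errors(sample_log)
for number, text in hits:
    print(f"line {number}: {text}")
```

The same function accepts an open file object, since iterating a file yields its lines.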
Causes & fixes
Cause: CUDA drivers are not installed, or are incompatible with the GPU hardware.
Fix: Install a CUDA driver version that is compatible with both your GPU and bitsandbytes, then reboot the system.
Cause: bitsandbytes was installed without GPU support, or is running in a CPU-only environment.
Fix: Ensure bitsandbytes is installed with GPU support and run it on a machine with a CUDA-capable GPU.
Cause: Environment variables such as CUDA_VISIBLE_DEVICES are misconfigured and hide the GPUs.
Fix: Unset CUDA_VISIBLE_DEVICES, or set it correctly, so the GPU devices are exposed to bitsandbytes.
Cause: The code runs inside a container without GPU passthrough, or the NVIDIA runtime is missing.
Fix: Configure the container with the NVIDIA Container Toolkit and enable GPU passthrough so CUDA can initialize.
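Two of the causes above (a misconfigured CUDA_VISIBLE_DEVICES and a missing NVIDIA runtime) can be checked from Python with the standard library alone. The sketch below is illustrative: describe_visible_devices and gpu_environment_report are hypothetical helper names, and treating "-1" as "hide everything" is a common convention assumed here, not something stated in this entry. Finding nvidia-smi on the PATH is only a hint that the driver tooling is present; it does not prove CUDA will initialize.

```python
import os
import shutil


def describe_visible_devices(value):
    """Interpret a CUDA_VISIBLE_DEVICES value (None means the variable is unset)."""
    if value is None:
        return "unset: no GPUs are hidden by this variable"
    if value.strip() in ("", "-1"):
        # An empty value (and, by common convention, -1) hides every GPU.
        return "hides every GPU"
    return "restricts visibility to: " + value


def gpu_environment_report(environ=None):
    """Collect basic hints about GPU visibility for the current process."""
    if environ is None:
        environ = os.environ
    return {
        "CUDA_VISIBLE_DEVICES": describe_visible_devices(
            environ.get("CUDA_VISIBLE_DEVICES")
        ),
        # Inside a container without GPU passthrough, nvidia-smi is usually absent.
        "nvidia-smi on PATH": shutil.which("nvidia-smi") is not None,
    }


for key, value in gpu_environment_report().items():
    print(f"{key}: {value}")
```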
Code: broken vs fixed
Broken:
import bitsandbytes as bnb  # triggers RuntimeError: CUDA setup not initialized
quantized_model = bnb.nn.Linear8bitLt(768, 768)  # example quantization call
Fixed:
import os
os.environ['CUDA_VISIBLE_DEVICES'] = '0'  # Ensure GPU 0 is visible
import bitsandbytes as bnb  # CUDA initializes correctly now
quantized_model = bnb.nn.Linear8bitLt(768, 768)  # quantization works
print('Quantization initialized successfully')
Workaround
If the CUDA environment cannot be fixed immediately, avoid the error by falling back to a CPU-only code path without quantization, or by temporarily disabling the bitsandbytes GPU features.
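One way to sketch such a fallback is to guard the import and switch to a plain PyTorch layer when bitsandbytes cannot be loaded. This assumes PyTorch is installed for the fallback path; optional_import and make_linear are hypothetical names used only for this example, and the fallback layer is an ordinary unquantized torch.nn.Linear, so memory use will be higher than with 8-bit weights.

```python
import importlib


def optional_import(name):
    """Import a module by name, returning None if the import fails.

    RuntimeError is caught as well because the failure described in this
    entry is raised as a RuntimeError at import time.
    """
    try:
        return importlib.import_module(name)
    except (ImportError, RuntimeError) as exc:
        print(f"{name} unavailable, using fallback: {exc}")
        return None


bnb = optional_import("bitsandbytes")


def make_linear(in_features, out_features):
    """Build an 8-bit layer when bitsandbytes loaded, else a plain layer."""
    if bnb is not None:
        return bnb.nn.Linear8bitLt(in_features, out_features)
    import torch  # imported lazily; only needed on the fallback path

    return torch.nn.Linear(in_features, out_features)
```

A call such as make_linear(768, 768) then works in both environments, which keeps the rest of the model code unchanged while the CUDA setup is being repaired.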
Prevention
Use container images or environments that come preconfigured with compatible CUDA drivers and the NVIDIA runtime, and validate GPU access before running any quantization code.
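Validating GPU access up front could look like the preflight check below, run once at startup before any quantization code. It is a sketch under the assumption that PyTorch is the framework in use; preflight_gpu_check is a hypothetical helper name, while torch.cuda.is_available() and torch.cuda.device_count() are standard PyTorch calls.

```python
def preflight_gpu_check():
    """Return (ok, message) describing whether a CUDA device is usable."""
    try:
        import torch
    except ImportError:
        return False, "PyTorch is not installed"
    if not torch.cuda.is_available():
        return False, "torch.cuda.is_available() returned False"
    return True, f"{torch.cuda.device_count()} CUDA device(s) visible"


ok, message = preflight_gpu_check()
print("GPU preflight:", "passed" if ok else "failed", "-", message)
```

A startup script can stop early when ok is False, so the failure is reported with a clear message instead of surfacing later as an import-time RuntimeError.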