RuntimeError
bitsandbytes.nn.Linear8bitLt.RuntimeError
Stack trace
RuntimeError: bitsandbytes linear layer not supported on this device or CUDA version
File "/app/model_quant.py", line 42, in load_quantized_model
model = bitsandbytes.nn.Linear8bitLt(...)
File "/usr/local/lib/python3.9/site-packages/bitsandbytes/nn/modules.py", line 123, in __init__
raise RuntimeError("bitsandbytes linear layer not supported on this device or CUDA version")
Why it happens
bitsandbytes implements its optimized 8-bit linear layers with CUDA kernels that are built for specific CUDA versions and GPU architectures. If the environment's CUDA version, GPU driver, or hardware does not meet these requirements, or if the installed bitsandbytes release is outdated, the layer raises this error instead of attempting an unsupported operation.
Detection
Check for RuntimeError exceptions during model loading or quantization steps, specifically looking for messages about bitsandbytes linear layer support. Logging the CUDA version and GPU info at startup helps preempt this error.
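The detection steps above can be sketched roughly as follows. This is a minimal example, not part of the original page: the function names `describe_cuda_environment` and `is_bnb_support_error` are hypothetical, and the message match assumes the wording shown in the stack trace. PyTorch is imported lazily so the helper still returns a result on machines where it is missing.

```python
def describe_cuda_environment():
    """Collect CUDA and GPU details as plain values suitable for logging."""
    info = {
        "torch_installed": False,
        "cuda_available": False,
        "cuda_version": None,
        "devices": [],
    }
    try:
        import torch
    except ImportError:
        return info  # no PyTorch, so nothing more can be queried
    info["torch_installed"] = True
    info["cuda_version"] = torch.version.cuda  # None on CPU-only builds
    if torch.cuda.is_available():
        info["cuda_available"] = True
        for index in range(torch.cuda.device_count()):
            major, minor = torch.cuda.get_device_capability(index)
            info["devices"].append({
                "index": index,
                "name": torch.cuda.get_device_name(index),
                "compute_capability": f"{major}.{minor}",
            })
    return info


def is_bnb_support_error(exc):
    """True if exc looks like the 'not supported' RuntimeError from the stack trace.

    Assumption: the message contains both 'bitsandbytes' and 'not supported'.
    """
    message = str(exc)
    return (
        isinstance(exc, RuntimeError)
        and "bitsandbytes" in message
        and "not supported" in message
    )
```

At startup, something like `logging.info("CUDA environment: %s", describe_cuda_environment())` records the values needed to diagnose the error later, and `is_bnb_support_error(exc)` can be used inside an `except RuntimeError as exc:` block around model loading.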
Causes & fixes
CUDA version installed is incompatible with bitsandbytes linear layer requirements
Upgrade or downgrade your CUDA toolkit to a version compatible with your bitsandbytes version, typically CUDA 11.7 or 11.8 for bitsandbytes 0.39+.
GPU hardware does not support the required features for bitsandbytes 8-bit linear layers
Use a GPU architecture that your bitsandbytes release supports (e.g., NVIDIA Ampere or newer), or fall back to CPU quantization if the GPU is unsupported.
bitsandbytes package version is outdated or improperly installed
Upgrade bitsandbytes to the latest stable release using pip and ensure it is installed with CUDA support.
Running in an environment without GPU or CUDA drivers (e.g., CPU-only or incompatible container)
Run the quantization code on a machine with a compatible GPU and properly installed CUDA drivers.
Code: broken vs fixed
import bitsandbytes as bnb
# This line triggers the error if environment is incompatible
model = bnb.nn.Linear8bitLt(768, 768)  # positional: input features, output features
print("Model loaded")

import os
# Set CUDA_VISIBLE_DEVICES before importing bitsandbytes so it is in place when CUDA is initialized
os.environ["CUDA_VISIBLE_DEVICES"] = "0"
import bitsandbytes as bnb
# Ensure bitsandbytes is installed with CUDA support and the environment is compatible
model = bnb.nn.Linear8bitLt(768, 768)  # Fixed: run on a supported CUDA/GPU combination
print("Model loaded successfully")
Workaround
Catch the RuntimeError and fall back to standard PyTorch linear layers or CPU-only quantization when bitsandbytes linear layers are unsupported.
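One way to structure that fallback is sketched below. The helper names (`build_linear`, `make_bnb_layer`, `make_torch_layer`) are hypothetical, and catching `ImportError` alongside `RuntimeError` is an added assumption so that a missing bitsandbytes install takes the same fallback path. The layer constructors are passed in as callables, which keeps the fallback logic independent of either library.

```python
import logging

logger = logging.getLogger(__name__)


def build_linear(in_features, out_features, make_quantized, make_fallback):
    """Try the 8-bit layer first; use the fallback if the environment rejects it.

    Returns (layer, quantized), where quantized is False if the fallback was used.
    """
    try:
        return make_quantized(in_features, out_features), True
    except (RuntimeError, ImportError) as exc:
        logger.warning("8-bit layer unavailable (%s); using fallback layer", exc)
        return make_fallback(in_features, out_features), False


def make_bnb_layer(in_features, out_features):
    # Imported lazily so a missing or broken install surfaces inside the try block
    import bitsandbytes as bnb
    return bnb.nn.Linear8bitLt(in_features, out_features)


def make_torch_layer(in_features, out_features):
    # Standard full-precision PyTorch layer used as the fallback
    import torch.nn as nn
    return nn.Linear(in_features, out_features)
```

A call such as `layer, quantized = build_linear(768, 768, make_bnb_layer, make_torch_layer)` then yields a usable layer either way, and the `quantized` flag lets the caller log or report that the model is running without 8-bit weights.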
Prevention
Validate CUDA version, GPU architecture, and bitsandbytes installation during deployment. Automate environment checks to prevent unsupported quantization attempts.
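A deployment-time check along these lines might look like the sketch below. The function names are hypothetical, and the default `min_capability` of `(7, 5)` is an assumed threshold rather than a value taken from this page; set it from the documentation for the bitsandbytes release you deploy.

```python
from importlib import metadata


def installed_version(package):
    """Return the installed version string, or None if the package is absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


def find_problems(bnb_version, cuda_version, compute_capability,
                  min_capability=(7, 5)):
    """Return a list of problems; an empty list means every check passed.

    min_capability is an assumption, not a documented requirement.
    """
    problems = []
    if bnb_version is None:
        problems.append("bitsandbytes is not installed")
    if cuda_version is None:
        problems.append("no CUDA runtime detected (CPU-only build or missing drivers)")
    if compute_capability is None:
        problems.append("no CUDA-capable GPU is visible")
    elif tuple(compute_capability) < tuple(min_capability):
        problems.append(
            f"GPU compute capability {tuple(compute_capability)} "
            f"is below the required {tuple(min_capability)}"
        )
    return problems


def preflight():
    """Gather values from the running environment and validate them."""
    cuda_version, capability = None, None
    try:
        import torch
        cuda_version = torch.version.cuda
        if torch.cuda.is_available():
            capability = torch.cuda.get_device_capability(0)
    except ImportError:
        pass  # without PyTorch, the CUDA checks are reported as failed
    return find_problems(installed_version("bitsandbytes"), cuda_version, capability)
```

Running `preflight()` in a container entrypoint or CI step and failing the deployment when the returned list is non-empty stops an unsupported quantization attempt before the model is ever loaded.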