High severity · Intermediate · Fix: 5-10 min

RuntimeError

bitsandbytes.nn.Linear8bitLt.RuntimeError

What this error means
This error occurs when bitsandbytes tries to create an 8-bit linear layer (Linear8bitLt) on a device, CUDA version, or bitsandbytes build that does not support it.

Stack trace

traceback
RuntimeError: bitsandbytes linear layer not supported on this device or CUDA version
  File "/app/model_quant.py", line 42, in load_quantized_model
    model = bitsandbytes.nn.Linear8bitLt(...)
  File "/usr/local/lib/python3.9/site-packages/bitsandbytes/nn/modules.py", line 123, in __init__
    raise RuntimeError("bitsandbytes linear layer not supported on this device or CUDA version")
QUICK FIX
Ensure your environment has a compatible CUDA version and GPU, then upgrade bitsandbytes to the latest version with CUDA support.

Why it happens

bitsandbytes requires specific CUDA versions and GPU architectures for its optimized 8-bit linear layers. If the CUDA version, GPU driver, or hardware does not meet these requirements, or if bitsandbytes is outdated, it raises this error rather than run unsupported operations.

Detection

Check for RuntimeError exceptions during model loading or quantization steps, specifically looking for messages about bitsandbytes linear layer support. Logging the CUDA version and GPU info at startup helps preempt this error.
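The startup logging described above could look like the following minimal sketch. It assumes PyTorch is the source of CUDA information (bitsandbytes depends on it); the function name collect_gpu_info and the logger name are illustrative, not part of any library. Note that torch.version.cuda reports the CUDA version PyTorch was built against, which can differ from the toolkit installed on the host.

```python
import importlib.util
import logging

logging.basicConfig(level=logging.INFO)
log = logging.getLogger("startup")  # hypothetical logger name


def collect_gpu_info():
    """Return a dict describing the CUDA environment as PyTorch sees it."""
    info = {
        "torch_installed": False,
        "cuda_available": False,
        "cuda_version": None,
        "devices": [],
    }
    # Degrade gracefully when PyTorch is missing instead of crashing at startup
    if importlib.util.find_spec("torch") is None:
        return info
    import torch

    info["torch_installed"] = True
    info["cuda_available"] = torch.cuda.is_available()
    # CUDA version PyTorch was built with; None on CPU-only builds
    info["cuda_version"] = torch.version.cuda
    if info["cuda_available"]:
        for index in range(torch.cuda.device_count()):
            major, minor = torch.cuda.get_device_capability(index)
            info["devices"].append(
                {
                    "name": torch.cuda.get_device_name(index),
                    "compute_capability": f"{major}.{minor}",
                }
            )
    return info


gpu_info = collect_gpu_info()
log.info("GPU environment: %s", gpu_info)
```

Logging this once at process start gives a record to compare against when the RuntimeError appears later during model loading.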

Causes & fixes

1

CUDA version installed is incompatible with bitsandbytes linear layer requirements

✓ Fix

Upgrade or downgrade your CUDA toolkit to a version compatible with your bitsandbytes version, typically CUDA 11.7 or 11.8 for bitsandbytes 0.39+.

2

GPU hardware does not support the required features for bitsandbytes 8-bit linear layers

✓ Fix

Use a supported GPU architecture (e.g., NVIDIA Turing, Ampere, or newer) or switch to a CPU quantization fallback if the GPU is unsupported.

3

bitsandbytes package version is outdated or improperly installed

✓ Fix

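Upgrade bitsandbytes to the latest stable release using pip (pip install --upgrade bitsandbytes) and ensure it is installed with CUDA support. Running python -m bitsandbytes prints the library's own diagnostic of the CUDA setup it detects.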

4

Running in an environment without GPU or CUDA drivers (e.g., CPU-only or incompatible container)

✓ Fix

Run the quantization code on a machine with a compatible GPU and properly installed CUDA drivers.
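To tell causes 2 and 4 apart from the others, a small check of the GPU's compute capability can help. This is a sketch under stated assumptions: the threshold (7, 5) corresponds to Turing, which the bitsandbytes hardware notes have listed as the oldest architecture for LLM.int8(); confirm it against the release you use. The helper names are hypothetical.

```python
import importlib.util

# Assumption: (7, 5) is Turing, taken as the minimum for 8-bit linear layers.
# Adjust this value to match the requirements of your bitsandbytes release.
MIN_INT8_CAPABILITY = (7, 5)


def supports_int8_linear(capability, minimum=MIN_INT8_CAPABILITY):
    """Return True if a (major, minor) compute capability meets the minimum."""
    return tuple(capability) >= tuple(minimum)


def current_device_capability():
    """Return the (major, minor) capability of GPU 0, or None without CUDA."""
    if importlib.util.find_spec("torch") is None:
        return None
    import torch

    if not torch.cuda.is_available():
        return None
    return torch.cuda.get_device_capability(0)


capability = current_device_capability()
if capability is None:
    print("No CUDA device visible: cause 4 applies")
elif not supports_int8_linear(capability):
    print(f"Compute capability {capability} is below {MIN_INT8_CAPABILITY}: cause 2 applies")
else:
    print(f"Compute capability {capability} looks sufficient; check causes 1 and 3")
```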

Code: broken vs fixed

Broken - triggers the error
python
import bitsandbytes as bnb

# This line raises the error if the environment is incompatible.
# Sizes are passed positionally: Linear8bitLt names them input_features/output_features.
model = bnb.nn.Linear8bitLt(768, 768)
print("Model loaded")
Fixed - works correctly
python
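import os

# Select the GPU before importing libraries that initialize CUDA
os.environ["CUDA_VISIBLE_DEVICES"] = "0"

import torch
import bitsandbytes as bnb

# Fail early with a clear message if no CUDA device is visible
if not torch.cuda.is_available():
    raise SystemExit("No CUDA device available; bitsandbytes 8-bit layers need a supported GPU")

# Sizes are passed positionally: Linear8bitLt names them input_features/output_features
model = bnb.nn.Linear8bitLt(768, 768)
print("Model loaded successfully")
The fixed version selects the GPU before CUDA is initialized and checks that a CUDA device is visible before building the layer. The code alone does not repair an incompatible environment: it still has to run on a compatible CUDA-enabled GPU with a bitsandbytes build that supports it.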

Workaround

Catch the RuntimeError and fall back to standard PyTorch linear layers or CPU-only quantization when bitsandbytes linear layers are unsupported.
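One way to structure that fallback is sketched below. The helper build_linear and the two stand-in factories are hypothetical and exist only so the example runs without a GPU; in real code the factories would be the bitsandbytes and PyTorch layer classes.

```python
def build_linear(in_features, out_features, primary, fallback):
    """Try the primary layer factory; on RuntimeError use the fallback."""
    try:
        return primary(in_features, out_features)
    except RuntimeError as exc:
        print(f"8-bit layer unavailable ({exc}); using fallback layer")
        return fallback(in_features, out_features)


def unsupported_layer(in_features, out_features):
    # Stand-in that mimics the failure described in this article
    raise RuntimeError(
        "bitsandbytes linear layer not supported on this device or CUDA version"
    )


def plain_layer(in_features, out_features):
    # Stand-in for a regular full-precision layer
    return ("plain", in_features, out_features)


layer = build_linear(768, 768, unsupported_layer, plain_layer)
```

With the real libraries the call would be build_linear(768, 768, bnb.nn.Linear8bitLt, torch.nn.Linear). Catching every RuntimeError can hide unrelated failures such as out-of-memory errors, so it may be worth checking the exception message before falling back.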

Prevention

Validate CUDA version, GPU architecture, and bitsandbytes installation during deployment. Automate environment checks to prevent unsupported quantization attempts.
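Part of that validation can be automated with a version check run at deployment time. The sketch below reads the installed bitsandbytes version from package metadata and compares it with the minimum stated in the version note on this page (0.39.0). The function names and the simplified version parsing are assumptions for illustration; a project already using the packaging library could rely on its Version class instead.

```python
import re
from importlib import metadata

MIN_BNB_VERSION = (0, 39, 0)  # minimum taken from this page's version note


def parse_version(text):
    """Turn '0.39.1' or '0.41.0.post2' into a 3-tuple of leading integers."""
    # Drop any local build suffix such as '+cu118' before extracting numbers
    numbers = [int(n) for n in re.findall(r"\d+", text.split("+")[0])]
    return tuple((numbers + [0, 0, 0])[:3])


def check_bitsandbytes(minimum=MIN_BNB_VERSION):
    """Return (ok, message) describing the installed bitsandbytes version."""
    try:
        installed = metadata.version("bitsandbytes")
    except metadata.PackageNotFoundError:
        return False, "bitsandbytes is not installed"
    if parse_version(installed) < minimum:
        return False, f"bitsandbytes {installed} is older than the required minimum"
    return True, f"bitsandbytes {installed} meets the minimum"


ok, message = check_bitsandbytes()
print(message)
```

A deployment script can exit with a nonzero status when ok is False so the rollout stops before any quantization is attempted.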

Python 3.9+ · bitsandbytes >=0.39.0 · tested on 0.39.x
Verified 2026-04