RuntimeError
RuntimeError (Python built-in exception type)
Stack trace
RuntimeError: QLoRA compute dtype float16 and bfloat16 are incompatible on this device or configuration. Please set a consistent dtype for compute and model parameters.
Why it happens
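QLoRA fine-tuning expects one consistent compute dtype across three settings: the dtype the 4-bit layers compute in (bnb_4bit_compute_dtype), the dtype of the non-quantized modules (torch_dtype), and the trainer's mixed-precision mode (fp16 or bf16). Mixing float16 and bfloat16 between these settings, or requesting bfloat16 on hardware that does not support it, leads to runtime errors when tensors of different dtypes meet in the same operation.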
Detection
Watch the logs for dtype mismatch errors during model loading and the first training steps. Before training starts, assert that the model's floating-point parameters use the intended compute dtype and that the trainer's fp16/bf16 flag matches it.
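One way to implement the pre-training check is to walk the model's parameters and group the floating-point ones by dtype. This is a sketch assuming a PyTorch module (anything with named_parameters()); the helper names float_param_dtypes and assert_float_dtypes are hypothetical. Quantized bitsandbytes weights are stored as integer tensors, so they are skipped. Some QLoRA setups intentionally keep layer norms or adapters in float32, so include torch.float32 in `allowed` if yours does.

```python
from collections import Counter


def float_param_dtypes(model):
    """Count floating-point parameter elements per dtype.

    Quantized weights (e.g. bitsandbytes 4-bit/8-bit storage) are integer
    tensors, so is_floating_point() is False and they are skipped.
    """
    counts = Counter()
    for _, param in model.named_parameters():
        if param.is_floating_point():
            counts[param.dtype] += param.numel()
    return counts


def assert_float_dtypes(model, allowed):
    """Raise TypeError listing floating-point parameters whose dtype is not in `allowed`."""
    offenders = [
        (name, param.dtype)
        for name, param in model.named_parameters()
        if param.is_floating_point() and param.dtype not in allowed
    ]
    if offenders:
        # Show at most five offenders to keep the message readable.
        preview = ", ".join(f"{name}: {dtype}" for name, dtype in offenders[:5])
        raise TypeError(
            f"{len(offenders)} parameter(s) outside {sorted(map(str, allowed))}: {preview}"
        )
```

For example, print(float_param_dtypes(model)) right after loading, then call assert_float_dtypes(model, {torch.float16, torch.float32}) before handing the model to the trainer.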
Causes & fixes
Model parameters are loaded in float16 but the 4-bit compute dtype or training precision is bfloat16 (or vice versa)
Set bnb_4bit_compute_dtype (in BitsAndBytesConfig) and torch_dtype (in from_pretrained) to the same dtype, float16 or bfloat16, and enable the matching fp16 or bf16 flag in the training arguments.
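A minimal sketch of keeping these settings consistent by deriving all of them from one dtype name. The helpers qlora_precision_settings and load_qlora_model are hypothetical; BitsAndBytesConfig, AutoModelForCausalLM.from_pretrained and the fp16/bf16 flags of TrainingArguments are the transformers APIs the returned values are meant for. The loader imports torch and transformers lazily, so the settings helper runs without a GPU stack.

```python
SUPPORTED = ("float16", "bfloat16")


def qlora_precision_settings(compute_dtype: str) -> dict:
    """Derive every dtype-related setting from a single dtype name."""
    if compute_dtype not in SUPPORTED:
        raise ValueError(f"compute_dtype must be one of {SUPPORTED}, got {compute_dtype!r}")
    return {
        "bnb_4bit_compute_dtype": compute_dtype,  # BitsAndBytesConfig: dtype of the 4-bit matmuls
        "torch_dtype": compute_dtype,             # from_pretrained: dtype of non-quantized modules
        "fp16": compute_dtype == "float16",       # TrainingArguments mixed-precision flags
        "bf16": compute_dtype == "bfloat16",
    }


def load_qlora_model(model_name: str, compute_dtype: str):
    """Load a 4-bit model whose compute and module dtypes agree.

    Requires torch, transformers and bitsandbytes; imported here so the
    settings helper above can be used without them.
    """
    import torch
    from transformers import AutoModelForCausalLM, BitsAndBytesConfig

    settings = qlora_precision_settings(compute_dtype)
    dtype = getattr(torch, settings["torch_dtype"])  # e.g. "float16" -> torch.float16
    bnb_config = BitsAndBytesConfig(
        load_in_4bit=True,
        bnb_4bit_quant_type="nf4",
        bnb_4bit_compute_dtype=dtype,
    )
    model = AutoModelForCausalLM.from_pretrained(
        model_name,
        quantization_config=bnb_config,
        torch_dtype=dtype,
        device_map="auto",
    )
    return model, settings
```

Pass settings["fp16"] and settings["bf16"] to TrainingArguments(fp16=..., bf16=...) so the trainer's mixed-precision mode matches the dtype the model was loaded with.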
The GPU or CUDA build does not support bfloat16 compute
Use float16 on GPUs older than Ampere (compute capability below 8.0, e.g. T4 or V100). Use bfloat16 only on Ampere or newer GPUs with a CUDA 11+ build of PyTorch, or upgrade hardware and drivers accordingly.
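A sketch of choosing the dtype from the GPU's compute capability, assuming bfloat16 is only wanted where it is natively supported (compute capability 8.0 and up, i.e. Ampere or newer). pick_compute_dtype and detect_compute_dtype are hypothetical names; torch.cuda.is_available and torch.cuda.get_device_capability are real PyTorch calls. The capability logic is kept in a pure function so it can be tested without a GPU.

```python
def pick_compute_dtype(capability):
    """Return 'bfloat16' for compute capability >= 8.0 (Ampere and newer), else 'float16'.

    `capability` is a (major, minor) tuple, the shape returned by
    torch.cuda.get_device_capability().
    """
    major, _minor = capability
    return "bfloat16" if major >= 8 else "float16"


def detect_compute_dtype(device: int = 0) -> str:
    """Query the GPU through PyTorch and pick a compute dtype for it."""
    import torch  # imported lazily so pick_compute_dtype works without torch

    if not torch.cuda.is_available():
        raise RuntimeError("no CUDA device found")
    return pick_compute_dtype(torch.cuda.get_device_capability(device))
```

On a T4 (7.5) or V100 (7.0) this picks float16; on an A100 (8.0) or H100 (9.0) it picks bfloat16.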
BitsAndBytes or Transformers library version mismatch causing dtype incompatibility
Upgrade bitsandbytes and transformers to compatible versions that support consistent dtype usage in QLoRA.
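Before upgrading, it helps to see what is actually installed. This sketch reads versions with importlib.metadata from the standard library and compares them against minimums. The thresholds in MIN_VERSIONS are an assumption (the releases in which 4-bit support first shipped), not a compatibility guarantee; check both projects' release notes for your setup. check_versions and release_tuple are hypothetical helpers.

```python
from importlib.metadata import PackageNotFoundError, version
from itertools import takewhile

# Assumed minimums: releases in which 4-bit (QLoRA) support first shipped.
MIN_VERSIONS = {"bitsandbytes": (0, 39, 0), "transformers": (4, 30, 0)}


def release_tuple(ver: str) -> tuple:
    """Turn '4.44.0.dev0' or '2.3.0+cu121' into a tuple of leading integers."""
    parts = []
    for piece in ver.split("+")[0].split("."):
        digits = "".join(takewhile(str.isdigit, piece))
        if not digits:
            break  # stop at suffixes such as 'dev0'
        parts.append(int(digits))
    return tuple(parts)


def check_versions(minimums=MIN_VERSIONS) -> dict:
    """Map each package to (installed version or None, meets_minimum)."""
    report = {}
    for name, minimum in minimums.items():
        try:
            installed = version(name)
        except PackageNotFoundError:
            report[name] = (None, False)
            continue
        current = release_tuple(installed)
        current += (0,) * (len(minimum) - len(current))  # treat '0.39' as '0.39.0'
        report[name] = (installed, current >= minimum)
    return report
```

For example: for name, (installed, ok) in check_versions().items(): print(name, installed, "ok" if ok else "upgrade").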
Code: broken vs fixed
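Broken

```python
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

# The 4-bit matmuls compute in bfloat16 while the non-quantized modules are
# loaded in float16. This triggers the dtype conflict during QLoRA fine-tuning.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_compute_dtype=torch.bfloat16,
)
model = AutoModelForCausalLM.from_pretrained(
    'model-name',
    quantization_config=bnb_config,
    torch_dtype=torch.float16,
    device_map='auto',
)
```

Fixed

```python
import os

# Optional: pin training to one GPU (set before CUDA is initialized).
os.environ['CUDA_VISIBLE_DEVICES'] = '0'

import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

compute_dtype = torch.float16  # one dtype for every component
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type='nf4',
    bnb_4bit_compute_dtype=compute_dtype,  # dtype of the 4-bit matmuls
)
model = AutoModelForCausalLM.from_pretrained(
    'model-name',
    quantization_config=bnb_config,
    torch_dtype=compute_dtype,  # dtype of the non-quantized modules
    device_map='auto',
)
print(f'Model loaded with consistent compute dtype {compute_dtype}')
```

Workaround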
Catch the RuntimeError and reload the model with torch_dtype and bnb_4bit_compute_dtype both set to float16, or both set to bfloat16, depending on what your hardware supports.
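The workaround can be wrapped in a small retry helper. This is a sketch: load_with_dtype_fallback is a hypothetical name, and load_fn stands for your own function that loads the model with torch_dtype and bnb_4bit_compute_dtype both set to the dtype name it is given. Only errors whose message mentions "dtype" trigger a retry (a heuristic); anything else, such as an out-of-memory error, is re-raised unchanged.

```python
def load_with_dtype_fallback(load_fn, dtypes=("bfloat16", "float16")):
    """Call load_fn(dtype) for each dtype name in order.

    Returns (model, dtype) for the first dtype that loads. Retries only on
    RuntimeErrors whose message mentions 'dtype'; other errors propagate.
    """
    last_error = None
    for dtype in dtypes:
        try:
            return load_fn(dtype), dtype
        except RuntimeError as err:
            if "dtype" not in str(err).lower():
                raise  # unrelated failure: do not mask it
            last_error = err
    raise RuntimeError(f"model failed to load with every dtype in {dtypes}") from last_error
```

Trying bfloat16 first and falling back to float16 follows the hardware guidance above. If a failed attempt left GPU memory allocated, calling gc.collect() and torch.cuda.empty_cache() between attempts can help.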
Prevention
Choose the compute dtype once, pass it explicitly to every component (quantization config, model loading, trainer precision flags), and verify the resulting settings before training starts.
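When the dtype settings come from separate places (CLI flags, a YAML config, a launcher script), a pre-flight validator can catch disagreements before any weights are loaded. This is a sketch with a hypothetical precision_problems helper. Dtypes are passed as names, and bf16_supported is whatever your hardware check returns (for instance, a compute-capability test).

```python
def precision_problems(compute_dtype, torch_dtype, fp16, bf16, bf16_supported):
    """Return a list of inconsistencies in a planned QLoRA precision setup.

    An empty list means the settings agree. Dtype arguments are names such
    as 'float16' or 'bfloat16'; fp16/bf16 are the trainer's precision flags.
    """
    problems = []
    if torch_dtype != compute_dtype:
        problems.append(f"torch_dtype {torch_dtype} != bnb_4bit_compute_dtype {compute_dtype}")
    if fp16 and bf16:
        problems.append("fp16 and bf16 are both enabled")
    if fp16 and compute_dtype != "float16":
        problems.append(f"fp16 training with {compute_dtype} compute dtype")
    if bf16 and compute_dtype != "bfloat16":
        problems.append(f"bf16 training with {compute_dtype} compute dtype")
    if "bfloat16" in (compute_dtype, torch_dtype) and not bf16_supported:
        problems.append("bfloat16 requested but the hardware does not support it")
    return problems
```

Call it at startup and abort if the returned list is non-empty, printing each entry so the conflicting setting is obvious.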