RuntimeError
RuntimeError: Expected all tensors to be on the same device
Stack trace
Traceback (most recent call last):
File "generate.py", line 45, in <module>
images = pipe(prompt).images
File "/usr/local/lib/python3.9/site-packages/diffusers/pipelines/stable_diffusion/pipeline_stable_diffusion.py", line 412, in __call__
latents = self.scheduler.scale_model_input(latents, timestep)
File "/usr/local/lib/python3.9/site-packages/diffusers/schedulers/scheduling_ddim.py", line 124, in scale_model_input
return model_input * self.sigmas[timestep].to(device=model_input.device, dtype=model_input.dtype)
RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cuda:0 and cpu!
Why it happens
PyTorch cannot run an operation on tensors that live on different devices. Passing torch_dtype=torch.float16 or torch.bfloat16 to from_pretrained() only sets the dtype of the loaded weights; it does not choose a device, so the pipeline stays on the CPU until you move it. The error appears when part of the pipeline is on the GPU (device='cuda') while something else it touches is still on the CPU: a model component such as the safety checker or text encoder, a scheduler tensor, or an input tensor you created yourself. A dtype mismatch (float32 against float16) is a related but separate problem that usually surfaces as a different error message; it tends to occur in the same setups, so check both device and dtype.
Detection
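Check your pipeline's device and dtype before calling it: add `print(next(pipe.unet.parameters()).device, next(pipe.unet.parameters()).dtype)` after initialization, and repeat the check for the other model components (for example `pipe.vae` and `pipe.text_encoder`). They should all report the same device. The scheduler is not a torch module, so inspect its tensors directly, for example `print(pipe.scheduler.sigmas.device)` on schedulers that expose a `sigmas` attribute.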
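The per-component check above can be wrapped in a small helper. The following is a minimal sketch, not part of diffusers: `report_placement` and `devices_in_use` are hypothetical names, and the code assumes only that each model component exposes a `parameters()` method, as torch modules do.

```python
from collections import namedtuple

# Hypothetical record type for this sketch; not a diffusers class.
Placement = namedtuple("Placement", ["name", "device", "dtype"])


def report_placement(components):
    """List device and dtype for each component that exposes parameters()."""
    rows = []
    for name, component in components.items():
        parameters = getattr(component, "parameters", None)
        if not callable(parameters):
            continue  # tokenizers, schedulers, and None entries have no parameters
        first = next(iter(parameters()), None)
        if first is None:
            continue  # module without any weights
        rows.append(Placement(name, str(first.device), str(first.dtype)))
    return rows


def devices_in_use(rows):
    """Return the sorted list of distinct devices found by report_placement."""
    return sorted({row.device for row in rows})
```

With a loaded pipeline, `report_placement(pipe.components)` gives one row per model component; more than one entry in `devices_in_use(...)` points at the component that was left behind.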
Causes & fixes
torch_dtype specified in from_pretrained() but pipeline not moved to GPU device
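Add .to('cuda') or .to(device, dtype=torch.float16) after loading so that every model component (unet, vae, text_encoder) moves together with a matching dtype. The scheduler is not a torch module and is not moved by .to(); its tensors are normally placed when the pipeline calls set_timesteps() with the target device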
Scheduler has float32 sigmas/timesteps but model weights are float16
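Recreate the scheduler after pipeline creation: `pipe.scheduler = pipe.scheduler.__class__.from_config(pipe.scheduler.config, dtype=torch.float16)`. Note that PyTorch scheduler constructors generally do not declare a dtype argument, so the extra keyword may be ignored in your diffusers version; if the mismatch persists, cast at the point of use, as the `.to(device=..., dtype=...)` call in the stack trace does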
Safety checker or text encoder on CPU while unet on GPU
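Move the whole pipeline in one call, `pipe = pipe.to('cuda')` or `pipe.to(device, dtype=torch.float16)`, rather than moving components one by one. `pipe.enable_attention_slicing()` lowers memory use but does not change where tensors live, so it does not fix this error on its own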
Using bfloat16 on non-Ampere GPU or when CUDA doesn't support it
Switch to torch.float16 instead: `pipe = StableDiffusionXLPipeline.from_pretrained(..., torch_dtype=torch.float16).to('cuda')`. float16 has wider hardware support than bfloat16; you can check bfloat16 support at runtime with `torch.cuda.is_bf16_supported()`
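The device and dtype choices in the list above can be made in one place. This is a sketch under stated assumptions: `pick_dtype_name` and `detect_device_and_dtype` are hypothetical helpers, and falling back to float32 on CPU is an assumption made here because half-precision support on CPU is limited. Only `torch.cuda.is_available()` and `torch.cuda.is_bf16_supported()` are real PyTorch calls.

```python
def pick_dtype_name(cuda_available, bf16_supported, prefer_bf16=False):
    """Choose a dtype name that the current hardware is expected to support."""
    if not cuda_available:
        return "float32"  # assumption: full precision is the safe choice on CPU
    if prefer_bf16 and bf16_supported:
        return "bfloat16"
    return "float16"  # wider GPU support than bfloat16


def detect_device_and_dtype(prefer_bf16=False):
    """Query PyTorch for the device and a matching dtype (needs torch installed)."""
    import torch  # imported lazily so pick_dtype_name stays usable without torch

    cuda = torch.cuda.is_available()
    bf16 = cuda and torch.cuda.is_bf16_supported()
    name = pick_dtype_name(cuda, bf16, prefer_bf16)
    return ("cuda" if cuda else "cpu"), getattr(torch, name)
```

The returned pair can be passed straight through, for example as `torch_dtype=dtype` in from_pretrained() followed by `.to(device)`, so the dtype and device are decided once rather than at each call site.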
Code: broken vs fixed
import os
import torch
from diffusers import StableDiffusionXLPipeline
model_id = 'stabilityai/stable-diffusion-xl-base-1.0'
token = os.environ.get('HF_TOKEN')
# BROKEN: torch_dtype set but not moved to GPU
pipe = StableDiffusionXLPipeline.from_pretrained(
    model_id,
    torch_dtype=torch.float16,
    use_auth_token=token
)
# Missing: pipe = pipe.to('cuda')
prompt = 'A serene landscape with mountains and lakes'
images = pipe(prompt).images # ❌ RuntimeError: tensors on different devices
images[0].save('output.png')

import os
import torch
from diffusers import StableDiffusionXLPipeline
model_id = 'stabilityai/stable-diffusion-xl-base-1.0'
token = os.environ.get('HF_TOKEN')
# FIXED: Load with dtype and move to GPU with matching dtype
pipe = StableDiffusionXLPipeline.from_pretrained(
    model_id,
    torch_dtype=torch.float16,
    use_auth_token=token
)
# FIX: Move entire pipeline to GPU with matching dtype
pipe = pipe.to('cuda', dtype=torch.float16)
prompt = 'A serene landscape with mountains and lakes'
images = pipe(prompt).images
images[0].save('output.png')
print('✓ Generated image successfully without device mismatch')
Workaround
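If you cannot upgrade diffusers, explicitly move and cast the scheduler sigmas before they are used: `pipe.scheduler.sigmas = pipe.scheduler.sigmas.to('cuda', dtype=torch.float16)`. Because set_timesteps() typically rebuilds sigmas, the cast has to run after that call, which makes this workaround most useful in a custom denoising loop. `pipe.enable_attention_slicing()` can reduce memory pressure, but it does not change device or dtype placement.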
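Before casting anything by hand, it helps to see which scheduler attributes are tensors and where they live. The sketch below is an illustration, not a diffusers API: `scheduler_tensors` and `off_device` are hypothetical names, and a value is treated as a tensor simply because it has `device` and `dtype` attributes.

```python
def scheduler_tensors(scheduler):
    """Map attribute name -> (device, dtype) for tensor-like scheduler attributes."""
    found = {}
    for name, value in vars(scheduler).items():
        # Duck-typed check: torch tensors expose both .device and .dtype.
        if hasattr(value, "device") and hasattr(value, "dtype"):
            found[name] = (str(value.device), str(value.dtype))
    return found


def off_device(scheduler, expected="cuda"):
    """Return the names of tensor attributes that are not on the expected device."""
    return sorted(
        name
        for name, (device, _dtype) in scheduler_tensors(scheduler).items()
        if not device.startswith(expected)
    )
```

Calling `off_device(pipe.scheduler)` after set_timesteps() lists the candidates for a manual `.to('cuda')`. A CPU tensor in that list is not automatically a bug, since some schedulers may keep small tensors on the CPU on purpose; it only matters if that tensor is the one named in your stack trace.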
Prevention
Always pair torch_dtype with explicit .to(device, dtype) immediately after from_pretrained(). Use a helper function: `def load_pipeline(model_id, dtype=torch.float16): pipe = StableDiffusionXLPipeline.from_pretrained(model_id, torch_dtype=dtype); return pipe.to('cuda', dtype=dtype)` to enforce consistency across your codebase.