High severity · Intermediate · Fix: 2-5 min

RuntimeError

RuntimeError: Expected all tensors to be on the same device

What this error means

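Some tensors in a Diffusers pipeline (model weights, inputs, or scheduler tensors) are on the GPU while others are on the CPU, so a single tensor operation receives operands from two devices. A float16/bfloat16 versus float32 mismatch is a related but separate problem that PyTorch reports with a different message.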

Stack trace

traceback

Traceback (most recent call last):
  File "generate.py", line 45, in <module>
    images = pipe(prompt).images
  File "/usr/local/lib/python3.9/site-packages/diffusers/pipelines/stable_diffusion/pipeline_stable_diffusion.py", line 412, in __call__
    latents = self.scheduler.scale_model_input(latents, timestep)
  File "/usr/local/lib/python3.9/site-packages/diffusers/schedulers/scheduling_ddim.py", line 124, in scale_model_input
    return model_input * self.sigmas[timestep].to(device=model_input.device, dtype=model_input.dtype)
RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cuda:0 and cpu!

QUICK FIX

Call `pipe = pipe.to('cuda', dtype=torch.float16)` immediately after loading the pipeline to ensure all weights and buffers are on the same device with matching dtype.

Why it happens

Passing torch_dtype=torch.float16 or torch.bfloat16 to from_pretrained() only casts the model weights to that dtype. It does not choose a device, so the pipeline stays on CPU until you move it. The error appears when some tensors end up on the GPU while others remain on CPU: for example, after moving a single component with `pipe.unet.to('cuda')`, or when the safety checker, text encoder, or scheduler tensors are left behind. PyTorch cannot run an operation whose operands live on different devices. A pure dtype mismatch (float16 weights with float32 inputs) can also fail, but with a different error message.
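The message comes from PyTorch itself rather than from diffusers. A minimal sketch that reproduces it by multiplying a GPU tensor with a CPU tensor is shown here; the function name `device_mismatch_message` is illustrative, and the function returns `None` on machines without CUDA:

```python
import torch


def device_mismatch_message():
    """Return the error text PyTorch raises for operands on different devices.

    Returns None when no CUDA device is available (nothing to demonstrate).
    """
    if not torch.cuda.is_available():
        return None
    weights = torch.ones(2, device="cuda")  # stands in for model weights on the GPU
    inputs = torch.ones(2)                  # tensors are created on CPU by default
    try:
        weights * inputs
    except RuntimeError as err:
        return str(err)
    return None


if __name__ == "__main__":
    print(device_mismatch_message())
```

Moving `inputs` with `inputs.to(weights.device)` before the multiplication removes the error, which is what moving the whole pipeline does for every component at once.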

Detection

Check your pipeline's device and dtype before calling it: add `print(next(pipe.unet.parameters()).device, next(pipe.unet.parameters()).dtype)` after initialization, then repeat the check for `pipe.vae` and `pipe.text_encoder`. Every model component should report the same device. `pipe.device` gives a quick one-line summary, but a component-by-component check is what reveals a module that was left behind.
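The per-component check can be wrapped in a small helper. This is a sketch, not part of diffusers: the name `devices_by_component` is hypothetical, and it only assumes that model components expose `parameters()` the way torch modules do. Components without parameters (schedulers, tokenizers, or `None`) are skipped.

```python
from collections import defaultdict


def devices_by_component(components):
    """Group component names by the device of their first parameter.

    `components` is any mapping of name -> object, e.g. `pipe.components`.
    Objects without a callable `parameters()` are ignored.
    """
    found = defaultdict(list)
    for name, module in components.items():
        params = getattr(module, "parameters", None)
        if not callable(params):
            continue  # scheduler, tokenizer, or a disabled (None) component
        first = next(iter(params()), None)
        if first is not None:
            found[str(first.device)].append(name)
    return dict(found)
```

Calling `devices_by_component(pipe.components)` on a healthy pipeline should return a single key such as `'cuda:0'`; more than one key points at the components that still need to be moved.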

Causes & fixes

torch_dtype specified in from_pretrained() but pipeline not moved to GPU device

✓ Fix

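Add .to('cuda') or .to(device, dtype=torch.float16) after loading so that all model components (unet, vae, text_encoder) move together with a matching dtype. The scheduler is not a torch module, so .to() does not move it; the pipeline places the scheduler's tensors on the execution device when it calls set_timesteps().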

Scheduler has float32 sigmas/timesteps but model weights are float16

✓ Fix

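Schedulers generally have no `dtype` config option, so recreating one with `from_config(pipe.scheduler.config, dtype=torch.float16)` is unlikely to change anything. The pipeline itself calls `pipe.scheduler.set_timesteps(num_inference_steps, device=...)` on every run; in a custom denoising loop, pass `device=` yourself and cast the scheduler's outputs to the dtype of your latents.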

Safety checker or text encoder on CPU while unet on GPU

✓ Fix

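Move the whole pipeline in one call: `pipe = pipe.to('cuda')` or `pipe = pipe.to(device, dtype=torch.float16)`. If you moved components one by one, move the remaining ones the same way, e.g. `pipe.text_encoder.to('cuda')`. Note that `pipe.enable_attention_slicing()` reduces memory use but has no effect on device placement.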

Using bfloat16 on non-Ampere GPU or when CUDA doesn't support it

✓ Fix

Switch to torch.float16 instead: `pipe = StableDiffusionXLPipeline.from_pretrained(..., torch_dtype=torch.float16).to('cuda')`. float16 has wider hardware support than bfloat16.

Code: broken vs fixed

Broken - triggers the error

python

import os
import torch
from diffusers import StableDiffusionXLPipeline

model_id = 'stabilityai/stable-diffusion-xl-base-1.0'
token = os.environ.get('HF_TOKEN')

# BROKEN: only the UNet is moved to the GPU; the VAE and text encoders stay on CPU
pipe = StableDiffusionXLPipeline.from_pretrained(
    model_id,
    torch_dtype=torch.float16,
    use_auth_token=token
)
pipe.unet.to('cuda')
# Missing: pipe = pipe.to('cuda') to move the whole pipeline

prompt = 'A serene landscape with mountains and lakes'
images = pipe(prompt).images  # ❌ typically fails: tensors on different devices
images[0].save('output.png')

Fixed - works correctly

python

import os
import torch
from diffusers import StableDiffusionXLPipeline

model_id = 'stabilityai/stable-diffusion-xl-base-1.0'
token = os.environ.get('HF_TOKEN')

# FIXED: Load with dtype and move to GPU with matching dtype
pipe = StableDiffusionXLPipeline.from_pretrained(
    model_id,
    torch_dtype=torch.float16,
    use_auth_token=token
)
# FIX: Move entire pipeline to GPU with matching dtype
pipe = pipe.to('cuda', dtype=torch.float16)

prompt = 'A serene landscape with mountains and lakes'
images = pipe(prompt).images
images[0].save('output.png')
print('✓ Generated image successfully without device mismatch')

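The .to('cuda', dtype=torch.float16) call moves all model components (unet, vae, text encoders) to the GPU with a matching float16 dtype. The scheduler is not a torch module; the pipeline places its tensors on the execution device at call time, so every operation sees tensors on the same device.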

⚠ Workaround

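If you cannot upgrade diffusers, move the scheduler tensors yourself before they are used: `pipe.scheduler.sigmas = pipe.scheduler.sigmas.to('cuda', dtype=torch.float16)`. This applies only to schedulers that expose a `sigmas` attribute, and `set_timesteps()` rebuilds that attribute, so in a custom denoising loop do the cast after calling `set_timesteps()`. `pipe.enable_attention_slicing()` can reduce memory pressure, but it does not change where tensors live or what dtype they have.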

✓ Prevention

Always pair torch_dtype with explicit .to(device, dtype) immediately after from_pretrained(). Use a helper function: `def load_pipeline(model_id, dtype=torch.float16): pipe = StableDiffusionXLPipeline.from_pretrained(model_id, torch_dtype=dtype); return pipe.to('cuda', dtype=dtype)` to enforce consistency across your codebase.

Python 3.9+ · diffusers >=0.21.0 · tested on 0.27.x

Verified 2026-04 · stabilityai/stable-diffusion-xl-base-1.0, stabilityai/stable-diffusion-v1-5
