How-to · Beginner to intermediate · 3 min read

Fix slow image generation in Stable Diffusion

Quick answer
Fix slow generation in Stable Diffusion by enabling GPU acceleration with CUDA or ROCm, loading the model in half precision (torch_dtype=torch.float16), and reducing num_inference_steps. The diffusers library makes this straightforward, and quantized or distilled model variants can speed things up further.

PREREQUISITES

  • Python 3.8+
  • pip install torch torchvision diffusers transformers accelerate
  • NVIDIA GPU with CUDA or AMD GPU with ROCm (optional but recommended)

Setup environment

Install the necessary Python packages and ensure your GPU drivers and CUDA toolkit are properly installed for hardware acceleration.

bash
pip install torch torchvision diffusers transformers accelerate
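Before loading any model, it is worth confirming that PyTorch can actually see your GPU. A minimal sketch (the helper name gpu_available is just for illustration) that works whether or not torch is installed:

```python
import importlib.util


def gpu_available() -> bool:
    """Return True if torch is installed and reports a usable CUDA device."""
    # Check for torch without raising ImportError on a fresh environment
    if importlib.util.find_spec("torch") is None:
        return False
    import torch

    return torch.cuda.is_available()


print("GPU acceleration available:", gpu_available())
```

If this prints False on a machine with an NVIDIA GPU, you likely installed a CPU-only torch wheel; reinstall a CUDA-enabled build before continuing.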

Step-by-step speedup

Use the diffusers library with GPU and mixed precision to speed up generation. Reduce num_inference_steps and use a smaller or optimized model.

python
import torch
from diffusers import StableDiffusionPipeline

# Load model with GPU and half precision
pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5",
    torch_dtype=torch.float16
).to("cuda")

# Slice attention to reduce VRAM usage (can cost a little speed);
# skip this if you have plenty of memory
pipe.enable_attention_slicing()

prompt = "A futuristic cityscape at sunset"

# Generate image with fewer steps for speed
image = pipe(prompt, num_inference_steps=20).images[0]

image.save("output.png")
print("Image generated and saved as output.png")
output
Image generated and saved as output.png
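The scheduler (sampler) also affects how many steps you need. As a variation on the script above, this sketch swaps in diffusers' DPMSolverMultistepScheduler, which typically converges in roughly 20 steps; the function name and output filename are just examples:

```python
def build_fast_pipeline(model_id: str = "runwayml/stable-diffusion-v1-5"):
    """Load the pipeline in fp16 on the GPU and swap in a DPM-Solver scheduler."""
    import torch
    from diffusers import DPMSolverMultistepScheduler, StableDiffusionPipeline

    pipe = StableDiffusionPipeline.from_pretrained(
        model_id, torch_dtype=torch.float16
    ).to("cuda")
    # Reuse the existing scheduler config so model-specific settings carry over
    pipe.scheduler = DPMSolverMultistepScheduler.from_config(pipe.scheduler.config)
    return pipe


# Usage (requires a CUDA GPU and a model download):
# pipe = build_fast_pipeline()
# image = pipe("A futuristic cityscape at sunset", num_inference_steps=20).images[0]
# image.save("output_dpm.png")
```

Building the new scheduler from the old scheduler's config is the idiomatic diffusers pattern: it keeps the pipeline's noise schedule settings intact while changing only the solver.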

Common variations

  • Use distilled variants such as SD Turbo or SDXL Turbo, which generate usable images in 1–4 steps, for a better speed-quality tradeoff.
  • Try quantized models (4-bit or 8-bit) with bitsandbytes for lower VRAM and faster inference.
  • Use accelerate to optimize device placement and mixed precision automatically.
  • Run inference asynchronously or batch multiple prompts for throughput.
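For the batching variation, diffusers pipelines accept a list of prompts in a single call, which amortizes per-call overhead. A minimal sketch (chunk_prompts and generate_batched are hypothetical helper names; pick a batch_size that fits your VRAM):

```python
def chunk_prompts(prompts: list[str], batch_size: int) -> list[list[str]]:
    """Split prompts into batches small enough to fit in GPU memory."""
    return [prompts[i : i + batch_size] for i in range(0, len(prompts), batch_size)]


def generate_batched(pipe, prompts: list[str], batch_size: int = 4, steps: int = 20):
    """Run the pipeline once per batch; each call returns a list of images."""
    images = []
    for batch in chunk_prompts(prompts, batch_size):
        images.extend(pipe(batch, num_inference_steps=steps).images)
    return images


# Usage (with a loaded pipeline):
# images = generate_batched(pipe, ["a red car", "a blue car", "a green car"])
```

Larger batches improve throughput up to the point where VRAM runs out; if you hit out-of-memory errors, lower batch_size rather than disabling batching entirely.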

Troubleshooting tips

  • If generation is still slow, verify GPU usage with nvidia-smi or system monitors.
  • Check that torch is installed with CUDA support: torch.cuda.is_available() should return True.
  • Make sure the pipeline is actually on the GPU: call pipe.to("cuda") explicitly rather than relying on defaults. A model that silently falls back to CPU will be an order of magnitude slower.
  • Reduce image resolution or batch size to improve speed.

Key Takeaways

  • Always enable GPU acceleration with CUDA or ROCm for Stable Diffusion inference.
  • Use optimized models and reduce num_inference_steps to speed up generation.
  • Leverage mixed precision (float16) and attention slicing to lower memory and increase speed.
Verified 2026-04 · runwayml/stable-diffusion-v1-5, Stable Diffusion XL