Fix Stable Diffusion slow generation
Quick answer
Fix slow generation in Stable Diffusion by enabling GPU acceleration with CUDA or ROCm, using optimized models such as Stable Diffusion XL or quantized versions, and reducing num_inference_steps. Also, use an efficient pipeline such as diffusers with half-precision weights (torch.float16).
Prerequisites
- Python 3.8+
- pip install torch torchvision diffusers transformers accelerate
- NVIDIA GPU with CUDA or AMD GPU with ROCm (optional but recommended)
Setup environment
Install the necessary Python packages and ensure your GPU drivers and CUDA toolkit are properly installed for hardware acceleration.
pip install torch torchvision diffusers transformers accelerate
Step by step speedup
Use the diffusers library with GPU and mixed precision to speed up generation. Reduce num_inference_steps and use a smaller or optimized model.
import torch
from diffusers import StableDiffusionPipeline
# Load model with GPU and half precision
pipe = StableDiffusionPipeline.from_pretrained(
"runwayml/stable-diffusion-v1-5",
torch_dtype=torch.float16
).to("cuda")
# Enable attention slicing to reduce peak VRAM usage
pipe.enable_attention_slicing()
prompt = "A futuristic cityscape at sunset"
# Generate image with fewer steps for speed
image = pipe(prompt, num_inference_steps=20).images[0]
image.save("output.png")
print("Image generated and saved as output.png")
Output
Image generated and saved as output.png
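To see whether a tweak (fewer steps, half precision, lower resolution) actually helps, it is worth timing each run. A minimal sketch using a hypothetical `timed` helper, not part of diffusers:

```python
import time
from contextlib import contextmanager

@contextmanager
def timed(label):
    # Print wall-clock time for the wrapped block
    start = time.perf_counter()
    yield
    print(f"{label}: {time.perf_counter() - start:.2f}s")

# Usage with the pipeline above (assumes `pipe` and `prompt` exist):
# with timed("20 steps"):
#     pipe(prompt, num_inference_steps=20)
```

Compare timings at, say, 50 vs. 20 steps to quantify the speedup before trading away quality.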
Common variations
- Use Stable Diffusion XL for a better speed-quality tradeoff.
- Try quantized models (4-bit or 8-bit) with bitsandbytes for lower VRAM and faster inference.
- Use accelerate to optimize device placement and mixed precision automatically.
- Run inference asynchronously or batch multiple prompts for throughput.
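Batching several prompts into one pipeline call amortizes per-call overhead: a diffusers pipeline accepts a list of prompts and returns one image per prompt. A sketch, where `chunk_prompts` is a hypothetical helper for splitting a large prompt list into VRAM-sized batches:

```python
from typing import List

def chunk_prompts(prompts: List[str], batch_size: int) -> List[List[str]]:
    # Split prompts into fixed-size batches so each pipeline call
    # fits in VRAM while still processing several images at once
    return [prompts[i:i + batch_size] for i in range(0, len(prompts), batch_size)]

# Usage with the pipeline above (assumes `pipe` exists):
# for batch in chunk_prompts(all_prompts, batch_size=4):
#     for img in pipe(batch, num_inference_steps=20).images:
#         ...  # save or post-process each image
```

Pick the largest batch size that fits in VRAM; beyond that point, out-of-memory errors or swapping will erase the throughput gain.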
Troubleshooting tips
- If generation is still slow, verify GPU usage with nvidia-smi or system monitors.
- Check that torch is installed with CUDA support: torch.cuda.is_available() should return True.
- Avoid silent CPU fallback by explicitly moving the pipeline to the GPU with .to("cuda").
- Reduce image resolution or batch size to improve speed.
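The CUDA check above can be wrapped in a small diagnostic to run before loading any model. `describe_device` is a hypothetical helper, not part of torch or diffusers:

```python
def describe_device() -> str:
    # Report what PyTorch can see before blaming the model for slowness
    try:
        import torch
    except ImportError:
        return "torch not installed"
    if torch.cuda.is_available():
        return f"cuda: {torch.cuda.get_device_name(0)}"
    return "cpu only - expect very slow generation"

print(describe_device())
```

If this reports CPU only on a machine with an NVIDIA GPU, the usual culprit is a CPU-only torch wheel; reinstall a CUDA-enabled build.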
Key Takeaways
- Always enable GPU acceleration with CUDA or ROCm for Stable Diffusion inference.
- Use optimized models and reduce num_inference_steps to speed up generation.
- Leverage mixed precision (float16) and attention slicing to lower memory use and increase speed.