How-to · Intermediate · 3 min read

How to use ControlNet with Diffusers

Quick answer
Use the diffusers library with the ControlNetModel class to load a ControlNet model and combine it with a Stable Diffusion pipeline. Provide conditioning inputs such as edge maps or poses to guide image generation. This enables precise control over the output while leveraging Diffusers' flexible API.

PREREQUISITES

  • Python 3.8+
  • pip install "diffusers>=0.19.0" transformers accelerate
  • pip install opencv-python (for image preprocessing)
  • Hugging Face Hub access token (optional for private models)

Setup

Install the required Python packages. You need diffusers for the pipeline and ControlNet classes, transformers for the CLIP text encoder and tokenizer, accelerate for faster model loading, and opencv-python for the Canny edge preprocessing. Note the quotes around the version specifier, which keep the shell from interpreting >= as a redirection.

bash
pip install "diffusers>=0.19.0" transformers accelerate opencv-python
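Before going further, you can confirm the installed diffusers version meets the 0.19.0 minimum. A minimal sketch using plain dotted-version comparison (`meets_minimum` is a hypothetical helper; `packaging.version.parse` is the robust alternative for pre-release strings like `0.19.0.dev0`):

```python
def meets_minimum(version: str, minimum: str = "0.19.0") -> bool:
    """Compare dotted version strings numerically.

    Simple sketch: assumes plain numeric components, so pre-release
    suffixes (e.g. "0.19.0.dev0") would need packaging.version instead.
    """
    def parse(v: str):
        return tuple(int(part) for part in v.split(".")[:3])
    return parse(version) >= parse(minimum)

print(meets_minimum("0.21.4"))  # True
print(meets_minimum("0.18.2"))  # False
```

In practice you would call it as `meets_minimum(diffusers.__version__)` after importing diffusers.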

Step by step

This example demonstrates loading a ControlNet model and a Stable Diffusion pipeline, preparing a conditioning image, and generating an image guided by ControlNet.

python
import cv2
import torch
import numpy as np
from PIL import Image
from diffusers import StableDiffusionControlNetPipeline, ControlNetModel
from diffusers.utils import load_image

# Load ControlNet model (e.g., for Canny edge detection conditioning)
controlnet = ControlNetModel.from_pretrained(
    "lllyasviel/sd-controlnet-canny", torch_dtype=torch.float16
)

# Load Stable Diffusion pipeline with ControlNet
pipe = StableDiffusionControlNetPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5",
    controlnet=controlnet,
    torch_dtype=torch.float16
)
pipe = pipe.to("cuda")

# Load and preprocess conditioning image (Canny edges)
input_image = load_image("./input.jpg")  # Replace with your image path
image = np.array(input_image)
edges = cv2.Canny(image, 100, 200)               # single-channel edge map
edges = cv2.cvtColor(edges, cv2.COLOR_GRAY2RGB)  # replicate to 3 channels
edges_image = Image.fromarray(edges)             # pipeline expects a PIL image

# Generate image with prompt and conditioning
prompt = "A fantasy landscape, detailed, vibrant colors"
output = pipe(prompt=prompt, image=edges_image, num_inference_steps=20, guidance_scale=7.5)

# Save output image
output.images[0].save("./output.png")
print("Image saved as output.png")
output
Image saved as output.png
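ControlNet conditioning images are conventionally passed as 3-channel RGB PIL images, which is why the single-channel Canny output needs to be replicated across channels. A minimal sketch of that conversion in isolation (`to_controlnet_image` is a hypothetical helper; the dummy array stands in for real cv2.Canny output):

```python
import numpy as np
from PIL import Image

def to_controlnet_image(edges: np.ndarray) -> Image.Image:
    """Convert a single-channel (H, W) uint8 edge map into the
    3-channel RGB PIL image the ControlNet pipeline expects."""
    if edges.ndim == 2:
        edges = np.stack([edges] * 3, axis=-1)  # replicate to (H, W, 3)
    return Image.fromarray(edges.astype(np.uint8))

# Dummy 64x64 edge map standing in for cv2.Canny output
dummy = (np.random.rand(64, 64) > 0.9).astype(np.uint8) * 255
img = to_controlnet_image(dummy)
print(img.size, img.mode)  # (64, 64) RGB
```

Saving this intermediate image (`img.save("edges_debug.png")`) is also a quick way to check that the edge map actually captures the structure you want to preserve.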

Common variations

  • Use different ControlNet models for other conditioning types like pose, depth, or scribbles by changing the model ID in ControlNetModel.from_pretrained().
  • Control how strongly the conditioning image constrains generation with the controlnet_conditioning_scale argument (default 1.0; lower values weaken the constraint).
  • Adjust num_inference_steps and guidance_scale for quality and creativity trade-offs.
  • If GPU memory is tight, call pipe.enable_attention_slicing() or pipe.enable_xformers_memory_efficient_attention(); to run on CPU instead, load with torch_dtype=torch.float32, since float16 is poorly supported on CPU.
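Swapping conditioning types amounts to swapping the checkpoint passed to ControlNetModel.from_pretrained() and preprocessing the input accordingly. A small sketch of how you might organize that choice (the registry and `checkpoint_for` are hypothetical; the model IDs are the public lllyasviel SD 1.5 checkpoints on the Hub):

```python
# Hypothetical registry mapping conditioning types to Hub checkpoints
CONTROLNET_CHECKPOINTS = {
    "canny": "lllyasviel/sd-controlnet-canny",
    "depth": "lllyasviel/sd-controlnet-depth",
    "openpose": "lllyasviel/sd-controlnet-openpose",
    "scribble": "lllyasviel/sd-controlnet-scribble",
}

def checkpoint_for(conditioning: str) -> str:
    """Look up the ControlNet checkpoint for a conditioning type."""
    try:
        return CONTROLNET_CHECKPOINTS[conditioning]
    except KeyError:
        raise ValueError(f"No ControlNet checkpoint registered for {conditioning!r}")

print(checkpoint_for("depth"))  # lllyasviel/sd-controlnet-depth
```

The result would then feed straight into ControlNetModel.from_pretrained(checkpoint_for("depth"), torch_dtype=torch.float16). Remember that each checkpoint expects a matching conditioning image (a depth map for depth, a skeleton rendering for openpose, and so on).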

Troubleshooting

  • If you get CUDA out-of-memory errors, reduce the output resolution, call pipe.enable_attention_slicing(), or switch to CPU with pipe.to("cpu") (much slower).
  • If the conditioning image is not influencing output, verify preprocessing matches the ControlNet model requirements (e.g., Canny edges for sd-controlnet-canny).
  • Ensure diffusers version is 0.19.0 or higher for ControlNet support.
  • For slow generation, enable pipe.enable_xformers_memory_efficient_attention() if your GPU supports it.

Key Takeaways

  • Use ControlNetModel with StableDiffusionControlNetPipeline from diffusers to guide image generation.
  • Preprocess conditioning images (edges, poses) to match the ControlNet model's expected input format.
  • Adjust inference steps and guidance scale to balance quality and speed.
  • Update diffusers to version 0.19.0+ for full ControlNet compatibility.
  • Use a GPU with half-precision weights (torch_dtype=torch.float16) for faster, memory-efficient generation.
Verified 2026-04 · lllyasviel/sd-controlnet-canny, runwayml/stable-diffusion-v1-5