ControlNet depth control explained
Quick answer
ControlNet depth control uses a depth map as a conditioning input to guide Stable Diffusion models, enabling precise control over image structure and perspective. It works by feeding depth information extracted from an input image into a ControlNet model branch, which influences the generation process to respect the depth cues.
Prerequisites
- Python 3.8+
- pip install diffusers controlnet_aux transformers torch
- Basic knowledge of Stable Diffusion and ControlNet
Setup
Install the necessary Python packages to use ControlNet with depth control in Stable Diffusion. You need diffusers for the model, controlnet_aux for depth map extraction, and torch for PyTorch support.
pip install diffusers controlnet_aux transformers torch
Step by step
This example shows how to load a depth control ControlNet model, extract a depth map from an input image, and generate a new image with Stable Diffusion guided by the depth map.
from PIL import Image
import torch
from diffusers import StableDiffusionControlNetPipeline, ControlNetModel
from controlnet_aux import MidasDetector
# Load ControlNet depth model
controlnet = ControlNetModel.from_pretrained(
"lllyasviel/sd-controlnet-depth",
torch_dtype=torch.float16
)
# Load Stable Diffusion pipeline with ControlNet
pipe = StableDiffusionControlNetPipeline.from_pretrained(
"runwayml/stable-diffusion-v1-5",
controlnet=controlnet,
torch_dtype=torch.float16
)
pipe.to("cuda")
# Optional: reduces memory use if the xformers package is installed
pipe.enable_xformers_memory_efficient_attention()
# Load input image for depth extraction (force 3-channel RGB)
input_image = Image.open("input.jpg").convert("RGB")
# Extract depth map with MiDaS (downloads annotator weights on first use)
depth_detector = MidasDetector.from_pretrained("lllyasviel/Annotators")
depth_map = depth_detector(input_image)
# Generate image conditioned on depth map
prompt = "A futuristic cityscape with neon lights"
output = pipe(prompt=prompt, image=depth_map, num_inference_steps=20, guidance_scale=7.5)
# Save output image
output.images[0].save("output_depth_control.png")
Output
Saves generated image as output_depth_control.png
Common variations
You can swap in other detectors from controlnet_aux, such as NormalBaeDetector (surface normals) or PidiNetDetector (soft edges), depending on your input image and the kind of structure you want to preserve; note that each detector pairs with its own ControlNet checkpoint, not the depth one. To control how strongly the depth map constrains generation, adjust controlnet_conditioning_scale; guidance_scale controls adherence to the text prompt, not to the depth map.
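The distinction between the two scales can be sketched as a set of generation keyword arguments (the specific values here are illustrative, and the commented-out call assumes the pipe and depth_map objects from the step-by-step example above):

```python
# Hypothetical sketch: separate the two knobs that are easy to confuse.
# controlnet_conditioning_scale scales the ControlNet residuals added to
# the UNet (1.0 = full strength; lower values loosen the depth constraint),
# while guidance_scale is classifier-free guidance for the text prompt.
gen_kwargs = dict(
    prompt="A futuristic cityscape with neon lights",
    num_inference_steps=20,
    guidance_scale=7.5,                 # prompt adherence
    controlnet_conditioning_scale=0.8,  # depth-map influence
)

# output = pipe(image=depth_map, **gen_kwargs)  # assumes pipe/depth_map exist
```

Lowering controlnet_conditioning_scale toward 0.5 gives the model more freedom to deviate from the source geometry; values near 1.0 follow it closely.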
Troubleshooting
- If the output image ignores depth structure, increase controlnet_conditioning_scale or verify the depth map quality.
- If you get CUDA out-of-memory errors, reduce the image resolution, keep torch_dtype=torch.float16, or call pipe.enable_attention_slicing() or pipe.enable_model_cpu_offload() (float32 roughly doubles memory use, so it will make the problem worse).
- Ensure input images are RGB and properly sized (e.g., 512x512) for best results.
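A small preprocessing helper can rule out the RGB-mode and sizing issues above before an image ever reaches the detector or pipeline (the helper name and the 512x512 target are illustrative, matching SD 1.5's native resolution):

```python
from PIL import Image


def prepare_control_image(path: str, size: int = 512) -> Image.Image:
    """Load an image, force 3-channel RGB, and resize to a square
    resolution suitable for SD 1.5 ControlNet conditioning."""
    img = Image.open(path).convert("RGB")  # detectors expect RGB, not RGBA/L
    return img.resize((size, size), Image.LANCZOS)
```

Run the input through this helper before passing it to the depth detector; it guarantees the mode and dimensions the rest of the pipeline assumes.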
Key Takeaways
- ControlNet depth control guides Stable Diffusion using depth maps for structural accuracy.
- Use MidasDetector or similar to extract depth maps from input images.
- Adjust controlnet_conditioning_scale to balance how strongly the depth map constrains generation.
- Use torch_dtype=torch.float16 on GPU for faster generation and lower memory use, and keep the ControlNet and base model versions compatible (both SD 1.5 here).