ControlNet depth control explained
Quick answer
ControlNet depth control uses a depth map as a conditioning input to guide Stable Diffusion models, enabling precise control over image structure and perspective. It works by feeding depth information extracted from an input image into a ControlNet model branch, which influences the generation process to respect the depth cues.
Prerequisites
- Python 3.8+
- pip install diffusers controlnet_aux transformers torch
- Basic knowledge of Stable Diffusion and ControlNet
Setup
Install the necessary Python packages to use ControlNet with depth control in Stable Diffusion. You need diffusers for the model, controlnet_aux for depth map extraction, and torch for PyTorch support.
pip install diffusers controlnet_aux transformers torch
Step by step
This example shows how to load a depth control ControlNet model, extract a depth map from an input image, and generate a new image with Stable Diffusion guided by the depth map.
from PIL import Image
import torch
from diffusers import StableDiffusionControlNetPipeline, ControlNetModel
from controlnet_aux import MidasDetector
# Load ControlNet depth model
controlnet = ControlNetModel.from_pretrained(
"lllyasviel/sd-controlnet-depth",
torch_dtype=torch.float16
)
# Load Stable Diffusion pipeline with ControlNet
pipe = StableDiffusionControlNetPipeline.from_pretrained(
"runwayml/stable-diffusion-v1-5",
controlnet=controlnet,
torch_dtype=torch.float16
)
pipe.to("cuda")
# Optional: reduces memory use if the xformers package is installed
pipe.enable_xformers_memory_efficient_attention()
# Load input image for depth extraction (force 3-channel RGB)
input_image = Image.open("input.jpg").convert("RGB")
# Extract depth map with MiDaS (downloads annotator weights on first use)
depth_detector = MidasDetector.from_pretrained("lllyasviel/Annotators")
depth_map = depth_detector(input_image)
# Generate image conditioned on depth map
prompt = "A futuristic cityscape with neon lights"
output = pipe(prompt=prompt, image=depth_map, num_inference_steps=20, guidance_scale=7.5)
# Save output image
output.images[0].save("output_depth_control.png")
Output
Saves generated image as output_depth_control.png
Common variations
You can swap in other detectors from controlnet_aux, such as NormalBaeDetector (surface normals) or PidiNetDetector (soft edges), depending on your input image and the kind of structure you want to preserve; note that each detector pairs with its own ControlNet checkpoint, not the depth one. To control how strongly the depth map constrains generation, adjust controlnet_conditioning_scale; guidance_scale controls adherence to the text prompt, not to the depth map.
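The distinction between the two scales can be sketched as a set of generation keyword arguments (the specific values here are illustrative, and the commented-out call assumes the pipe and depth_map objects from the step-by-step example above):

```python
# Hypothetical sketch: separate the two knobs that are easy to confuse.
# controlnet_conditioning_scale scales the ControlNet residuals added to
# the UNet (1.0 = full strength; lower values loosen the depth constraint),
# while guidance_scale is classifier-free guidance for the text prompt.
gen_kwargs = dict(
    prompt="A futuristic cityscape with neon lights",
    num_inference_steps=20,
    guidance_scale=7.5,                 # prompt adherence
    controlnet_conditioning_scale=0.8,  # depth-map influence
)

# output = pipe(image=depth_map, **gen_kwargs)  # assumes pipe/depth_map exist
```

Lowering controlnet_conditioning_scale toward 0.5 gives the model more freedom to deviate from the source geometry; values near 1.0 follow it closely.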
Troubleshooting
- If the output image ignores depth structure, increase controlnet_conditioning_scale or verify the depth map quality.
- If you get CUDA out-of-memory errors, reduce the image resolution, keep torch_dtype=torch.float16, or call pipe.enable_attention_slicing() or pipe.enable_model_cpu_offload() (float32 roughly doubles memory use, so it will make the problem worse).
- Ensure input images are RGB and properly sized (e.g., 512x512) for best results.
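A small preprocessing helper can rule out the RGB-mode and sizing issues above before an image ever reaches the detector or pipeline (the helper name and the 512x512 target are illustrative, matching SD 1.5's native resolution):

```python
from PIL import Image


def prepare_control_image(path: str, size: int = 512) -> Image.Image:
    """Load an image, force 3-channel RGB, and resize to a square
    resolution suitable for SD 1.5 ControlNet conditioning."""
    img = Image.open(path).convert("RGB")  # detectors expect RGB, not RGBA/L
    return img.resize((size, size), Image.LANCZOS)
```

Run the input through this helper before passing it to the depth detector; it guarantees the mode and dimensions the rest of the pipeline assumes.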
Key Takeaways
- ControlNet depth control guides Stable Diffusion using depth maps for structural accuracy.
- Use MidasDetector or similar to extract depth maps from input images.
- Adjust controlnet_conditioning_scale to balance how strongly the depth map constrains generation.
- Use torch_dtype=torch.float16 on GPU for faster generation and lower memory use, and keep the ControlNet and base model versions compatible (both SD 1.5 here).