How-to · Intermediate · 3 min read

ControlNet pose control explained

Quick answer
ControlNet pose control is a technique that extends Stable Diffusion by conditioning the model on human pose data, enabling precise control over generated images based on pose skeleton inputs. It uses a pretrained pose estimator to extract keypoints and feeds this structured pose information into the diffusion process via a ControlNet model.
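The core of the conditioning step is simply scaling each normalized keypoint to pixel coordinates before drawing it onto a blank canvas. A minimal sketch in plain Python (the keypoint names and values below are hypothetical, not real detector output):

```python
# Hypothetical keypoints in MediaPipe's normalized [0, 1] coordinate system
keypoints = {"nose": (0.50, 0.20), "left_wrist": (0.30, 0.55), "right_ankle": (0.62, 0.95)}

def to_pixels(norm_xy, width, height):
    """Scale one normalized landmark to integer pixel coordinates."""
    x, y = norm_xy
    return int(x * width), int(y * height)

# Map every landmark onto a 512x512 conditioning canvas
pixel_points = {name: to_pixels(p, 512, 512) for name, p in keypoints.items()}
print(pixel_points)  # {'nose': (256, 102), 'left_wrist': (153, 281), 'right_ankle': (317, 486)}
```

This is exactly what the full script below does with MediaPipe's landmark list before drawing the skeleton.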

PREREQUISITES

  • Python 3.8+
  • pip install "diffusers>=0.18.0" (quoted so the shell does not treat >= as redirection)
  • pip install opencv-python
  • pip install mediapipe
  • Access to a Stable Diffusion model with ControlNet support

Setup

Install necessary Python packages for pose detection and ControlNet integration with Stable Diffusion. You need diffusers for Stable Diffusion and ControlNet, mediapipe for pose estimation, and opencv-python for image processing.

bash
pip install diffusers[torch] mediapipe opencv-python

Step by step

This example shows how to use ControlNet pose control to generate an image conditioned on a human pose skeleton extracted from an input image.

python
import cv2
import mediapipe as mp
import numpy as np
import torch
from diffusers import StableDiffusionControlNetPipeline, ControlNetModel

# Load pose estimation model from MediaPipe
mp_pose = mp.solutions.pose
pose = mp_pose.Pose(static_image_mode=True, min_detection_confidence=0.5)

# Load ControlNet pose model and Stable Diffusion pipeline
controlnet = ControlNetModel.from_pretrained(
    "lllyasviel/sd-controlnet-openpose", torch_dtype=torch.float16
)
pipeline = StableDiffusionControlNetPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", controlnet=controlnet, torch_dtype=torch.float16
)
pipeline.to("cuda")

# Load input image and detect pose
image_path = "input_pose.jpg"  # Replace with your pose image path
image = cv2.imread(image_path)
image_rgb = cv2.cvtColor(image, cv2.COLOR_BGR2RGB)
results = pose.process(image_rgb)

# Create blank image for pose skeleton
pose_image = np.zeros_like(image_rgb)

if results.pose_landmarks:
    h, w = image.shape[:2]
    points = [(int(lm.x * w), int(lm.y * h)) for lm in results.pose_landmarks.landmark]
    # Draw limb connections first, then joints, so ControlNet sees a full skeleton.
    # Note: the openpose ControlNet was trained on colored OpenPose renders, so a
    # white skeleton is an approximation that usually still works.
    for start, end in mp_pose.POSE_CONNECTIONS:
        cv2.line(pose_image, points[start], points[end], (255, 255, 255), 2)
    for x, y in points:
        cv2.circle(pose_image, (x, y), 5, (255, 255, 255), -1)

# Convert pose image to PIL Image
from PIL import Image
pose_pil = Image.fromarray(pose_image)

# Generate image with ControlNet pose conditioning
prompt = "A dancer in a dynamic pose, digital art"
output = pipeline(prompt=prompt, image=pose_pil, num_inference_steps=30, guidance_scale=7.5)

# Save output
output.images[0].save("output_pose_control.png")
print("Image saved as output_pose_control.png")
output
Image saved as output_pose_control.png
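One preprocessing detail worth handling before the pipeline call: Stable Diffusion v1.5 works best near 512 px, and its UNet requires image dimensions divisible by 8. A sketch of a resizing helper (`fit_for_sd` is a hypothetical name, not part of diffusers):

```python
from PIL import Image

def fit_for_sd(img: Image.Image, max_side: int = 768) -> Image.Image:
    # Downscale large conditioning images and snap both sides to
    # multiples of 8, which the Stable Diffusion UNet requires.
    w, h = img.size
    scale = min(max_side / max(w, h), 1.0)
    w, h = int(w * scale) // 8 * 8, int(h * scale) // 8 * 8
    return img.resize((w, h))

pose_ready = fit_for_sd(Image.new("RGB", (1000, 600)))
print(pose_ready.size)  # (768, 456)
```

Pass the resized image to the pipeline in place of `pose_pil`; the output image takes its dimensions from the conditioning image.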

Common variations

Diffusers pipelines are synchronous, but you can run generation off the event loop with asyncio. You can also adjust num_inference_steps to trade speed for quality, or swap the ControlNet model for other conditioning types like depth or scribble. Finally, you can preprocess pose images differently or use other pose estimation libraries (for example, the controlnet_aux package's OpenposeDetector, which renders the colored OpenPose skeletons the model was trained on).

python
import asyncio

async def generate_async():
    # Diffusers pipelines are synchronous, so run the call in a worker thread
    # to avoid blocking the event loop
    output = await asyncio.to_thread(
        pipeline,
        prompt="A ballet dancer on stage",
        image=pose_pil,
        num_inference_steps=20,
        guidance_scale=8.0,
    )
    output.images[0].save("output_async.png")
    print("Async image saved as output_async.png")

asyncio.run(generate_async())
output
Async image saved as output_async.png
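Swapping the conditioning type is a matter of loading a different ControlNet checkpoint and passing the matching conditioning image (a depth map, a scribble, and so on) instead of a pose skeleton. A sketch; the checkpoint IDs below are the standard SD 1.5 ControlNet releases:

```python
# Checkpoint IDs for other SD 1.5 ControlNet conditioning types
CONTROLNET_VARIANTS = {
    "openpose": "lllyasviel/sd-controlnet-openpose",
    "depth": "lllyasviel/sd-controlnet-depth",
    "scribble": "lllyasviel/sd-controlnet-scribble",
}

def load_variant(kind: str):
    # Imported lazily so the table above is usable without torch installed
    import torch
    from diffusers import ControlNetModel
    return ControlNetModel.from_pretrained(
        CONTROLNET_VARIANTS[kind], torch_dtype=torch.float16
    )

# controlnet = load_variant("depth")  # then pass a depth map, not a pose skeleton
```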

Troubleshooting

  • If pose landmarks are not detected, ensure the input image clearly shows a full human figure and is well-lit.
  • If GPU memory errors occur, enable attention slicing or model CPU offload on the pipeline, or generate at a lower resolution (num_inference_steps affects speed, not memory).
  • For installation issues, verify package versions and Python environment compatibility.
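For the GPU memory bullet above, diffusers ships helpers that trade speed for VRAM; a sketch (model CPU offload additionally requires the accelerate package, and `reduce_memory` is a hypothetical wrapper name):

```python
def reduce_memory(pipeline):
    # Compute attention in slices to lower peak VRAM (slower, same output)
    pipeline.enable_attention_slicing()
    # Keep idle submodules on the CPU, moving each to the GPU only when
    # needed; this replaces pipeline.to("cuda") and requires accelerate
    pipeline.enable_model_cpu_offload()
```

Call it once, right after constructing the pipeline and instead of `pipeline.to("cuda")`.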

Key Takeaways

  • Use ControlNet pose control to guide Stable Diffusion image generation with human pose skeleton inputs.
  • Extract pose keypoints using MediaPipe or similar libraries to create conditioning images.
  • Adjust inference steps and guidance scale to balance quality and speed.
  • ControlNet pose models require pose skeleton images as input, not raw photos.
  • Troubleshoot pose detection by ensuring clear, well-lit input images.
Verified 2026-04 · lllyasviel/sd-controlnet-openpose, runwayml/stable-diffusion-v1-5