Code beginner · 3 min read

How to build an image generation app with Python

Direct answer
Use the diffusers library's StableDiffusionPipeline in Python to build an image generation app that takes text prompts as input and returns images.

Setup

Install
bash
pip install diffusers[torch] transformers accelerate scipy safetensors
Env vars
HUGGINGFACE_TOKEN
Imports
python
from diffusers import StableDiffusionPipeline
import torch
import os

Examples

In: A futuristic cityscape at sunset
Out: Generates a high-resolution image depicting a futuristic cityscape with warm sunset colors.
In: A cute cat wearing a wizard hat
Out: Produces an image of an adorable cat dressed as a wizard with a pointed hat and magical background.
In: An astronaut riding a horse on Mars
Out: Creates a surreal image of an astronaut on horseback on the red Martian surface.

Integration steps

  1. Install the required Python packages including diffusers and torch.
  2. Set the Hugging Face API token in the environment variable HUGGINGFACE_TOKEN.
  3. Import StableDiffusionPipeline and initialize it with the model and token.
  4. Use the pipeline to generate images by passing text prompts.
  5. Save or display the generated images in your app interface.

Full code

python
import os
from diffusers import StableDiffusionPipeline
import torch

hf_token = os.environ.get("HUGGINGFACE_TOKEN")
if not hf_token:
    raise ValueError("Set the HUGGINGFACE_TOKEN environment variable.")

device = "cuda" if torch.cuda.is_available() else "cpu"
pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5",
    # float16 is only reliable on GPU; fall back to float32 on CPU
    torch_dtype=torch.float16 if device == "cuda" else torch.float32,
    token=hf_token,
)
pipe = pipe.to(device)

def generate_image(prompt: str, output_path: str = "output.png"):
    image = pipe(prompt).images[0]
    image.save(output_path)
    print(f"Image saved to {output_path}")
    return image  # returned so an app layer can display it directly

generate_image("A futuristic cityscape at sunset")
output
Image saved to output.png
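The full code above overwrites output.png on every call. If your app keeps each result, a small helper (a hypothetical stdlib-only sketch, not part of diffusers) can derive a unique, filesystem-safe filename from the prompt:

```python
import re
from pathlib import Path

def unique_output_path(prompt: str, directory: str = "outputs") -> Path:
    """Build a filesystem-safe, non-clobbering path for a prompt."""
    Path(directory).mkdir(exist_ok=True)
    # Slugify the prompt: keep alphanumerics, collapse everything else to hyphens.
    slug = re.sub(r"[^a-z0-9]+", "-", prompt.lower()).strip("-")[:50] or "image"
    path = Path(directory) / f"{slug}.png"
    counter = 1
    while path.exists():  # append a numeric suffix instead of overwriting
        path = Path(directory) / f"{slug}-{counter}.png"
        counter += 1
    return path

path = unique_output_path("A futuristic cityscape at sunset")
```

Pass the result into the generation function, e.g. generate_image(prompt, str(unique_output_path(prompt))).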

API trace

Request
json
{"model_id": "runwayml/stable-diffusion-v1-5", "prompt": "A futuristic cityscape at sunset"}
Response
json
{"images": ["<PIL.Image object>"]}
Extract: pipe(prompt).images[0]

Variants

Generation with progress callback

Use when you want to show progress to users during long image generation.

python
import os
from diffusers import StableDiffusionPipeline
import torch

hf_token = os.environ.get("HUGGINGFACE_TOKEN")
pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5",
    # float16 is only reliable on GPU; fall back to float32 on CPU
    torch_dtype=torch.float16 if torch.cuda.is_available() else torch.float32,
    token=hf_token,
)
pipe = pipe.to("cuda" if torch.cuda.is_available() else "cpu")

# Legacy callback API; newer diffusers versions use callback_on_step_end instead.
def progress_callback(step: int, timestep: int, latents):
    print(f"Step {step} at timestep {timestep}")

image = pipe("A cute cat wearing a wizard hat", callback=progress_callback).images[0]
image.save("cat_wizard.png")
Async generation with asyncio

Use in async Python applications to avoid blocking the event loop.

python
import os
import asyncio
from diffusers import StableDiffusionPipeline
import torch

async def async_generate():
    hf_token = os.environ.get("HUGGINGFACE_TOKEN")
    pipe = StableDiffusionPipeline.from_pretrained(
        "runwayml/stable-diffusion-v1-5",
        # float16 is only reliable on GPU; fall back to float32 on CPU
        torch_dtype=torch.float16 if torch.cuda.is_available() else torch.float32,
        token=hf_token,
    )
    pipe = pipe.to("cuda" if torch.cuda.is_available() else "cpu")
    # Run the blocking pipeline call in a worker thread to keep the event loop free.
    result = await asyncio.to_thread(pipe, "An astronaut riding a horse on Mars")
    result.images[0].save("astronaut_mars.png")

asyncio.run(async_generate())
Use SD 2 base for faster generation

Use SD 2 base for faster inference with slightly lower fidelity.

python
import os
from diffusers import StableDiffusionPipeline
import torch

hf_token = os.environ.get("HUGGINGFACE_TOKEN")
pipe = StableDiffusionPipeline.from_pretrained(
    "stabilityai/stable-diffusion-2-base",
    # float16 is only reliable on GPU; fall back to float32 on CPU
    torch_dtype=torch.float16 if torch.cuda.is_available() else torch.float32,
    token=hf_token,
)
pipe = pipe.to("cuda" if torch.cuda.is_available() else "cpu")
image = pipe("A beautiful mountain landscape").images[0]
image.save("mountain.png")

Performance

Latency: ~5-15 seconds per 512x512 image on a modern GPU
Cost: Free for local runs; cloud GPU costs vary by provider
Rate limits: None when running locally
  • Cache the pipeline object to avoid repeated model loading.
  • Use torch_dtype=torch.float16 for faster GPU inference.
  • Use lower resolution or smaller models for faster results.
Approach | Latency | Cost/call | Best for
Local GPU with diffusers | ~5-15s | Free (hardware cost) | Full control, no API limits
Hugging Face Inference API | ~3-10s | Paid per call | Quick setup, no local GPU needed
Lightweight models (SD 2 base) | ~2-5s | Free or cheaper | Faster generation, lower quality
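For the Hugging Face Inference API row, a request can be built with the standard library alone. This sketch only constructs the request; the endpoint URL and the {"inputs": ...} payload follow the Inference API's documented shape, and actually sending it requires a valid token and network access:

```python
import json
import os
import urllib.request

def build_inference_request(prompt: str,
                            model_id: str = "runwayml/stable-diffusion-v1-5") -> urllib.request.Request:
    """Prepare a POST request for the hosted Inference API; the response body is image bytes."""
    url = f"https://api-inference.huggingface.co/models/{model_id}"
    payload = json.dumps({"inputs": prompt}).encode("utf-8")
    headers = {
        "Authorization": f"Bearer {os.environ.get('HUGGINGFACE_TOKEN', '')}",
        "Content-Type": "application/json",
    }
    return urllib.request.Request(url, data=payload, headers=headers)

# To actually generate: urllib.request.urlopen(build_inference_request("...")).read()
```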

Quick tip

Cache the pipeline object outside your generation function to avoid reloading the model on every call.
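The caching tip can be implemented with functools.lru_cache. In this sketch the expensive from_pretrained call is replaced by a counting stand-in, so the pattern is visible without loading a real model:

```python
from functools import lru_cache

load_count = 0  # tracks how many times the "model" is actually loaded

@lru_cache(maxsize=1)
def get_pipeline(model_id: str):
    """In a real app this body would call StableDiffusionPipeline.from_pretrained."""
    global load_count
    load_count += 1
    return f"pipeline for {model_id}"  # stand-in for the pipeline object

get_pipeline("runwayml/stable-diffusion-v1-5")
get_pipeline("runwayml/stable-diffusion-v1-5")  # served from cache, no reload
```

Inside a generation function, call get_pipeline(model_id) instead of keeping a module-level global; repeated calls with the same model ID reuse the cached object.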

Common mistake

Forgetting to set HUGGINGFACE_TOKEN, or generating before moving the pipeline to a device with pipe.to(), causes authentication or device errors.
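A small guard (a sketch, not part of diffusers) makes the missing-token mistake fail fast with a clear message instead of a confusing authentication error later:

```python
import os

def require_token() -> str:
    """Fail early with a clear message if the Hugging Face token is missing."""
    token = os.environ.get("HUGGINGFACE_TOKEN")
    if not token:
        raise RuntimeError(
            "HUGGINGFACE_TOKEN is not set; export it before loading the pipeline."
        )
    return token
```

Call require_token() once at startup and pass the result to from_pretrained.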

Verified 2026-04 · runwayml/stable-diffusion-v1-5, stabilityai/stable-diffusion-2-base