How to build an image generation app with Python
Direct answer

Use the diffusers library with StableDiffusionPipeline in Python to build an image generation app that takes text prompts as input and outputs images.

Setup
Install

```shell
pip install "diffusers[torch]" transformers accelerate scipy safetensors
```

Env vars

HUGGINGFACE_TOKEN

Imports

```python
from diffusers import StableDiffusionPipeline
import torch
import os
```

Examples
in: A futuristic cityscape at sunset
out: Generates a high-resolution image depicting a futuristic cityscape with warm sunset colors.
in: A cute cat wearing a wizard hat
out: Produces an image of an adorable cat dressed as a wizard with a pointed hat and magical background.
in: An astronaut riding a horse on Mars
out: Creates a surreal image of an astronaut on horseback on the red Martian surface.
Integration steps
- Install the required Python packages including diffusers and torch.
- Set the Hugging Face API token in the HUGGINGFACE_TOKEN environment variable.
- Import StableDiffusionPipeline and initialize it with the model and token.
- Use the pipeline to generate images by passing text prompts.
- Save or display the generated images in your app interface.
Full code

```python
import os

import torch
from diffusers import StableDiffusionPipeline

hf_token = os.environ.get("HUGGINGFACE_TOKEN")
if not hf_token:
    raise ValueError("Set the HUGGINGFACE_TOKEN environment variable.")

# float16 speeds up GPU inference; fall back to full precision on CPU.
device = "cuda" if torch.cuda.is_available() else "cpu"
dtype = torch.float16 if device == "cuda" else torch.float32

# Load the pipeline once at startup; pass the token so gated models authenticate.
pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5",
    torch_dtype=dtype,
    token=hf_token,
)
pipe = pipe.to(device)

def generate_image(prompt: str, output_path: str = "output.png"):
    image = pipe(prompt).images[0]
    image.save(output_path)
    print(f"Image saved to {output_path}")

generate_image("A futuristic cityscape at sunset")
```

output

```
Image saved to output.png
```
API trace

Request

```json
{"model_id": "runwayml/stable-diffusion-v1-5", "prompt": "A futuristic cityscape at sunset"}
```

Response

```json
{"images": ["<PIL.Image object>"]}
```

Extract

```python
pipe(prompt).images[0]
```

Variants
Generation with progress callback ›

Use when you want to show progress to users during long image generation.

```python
import os

import torch
from diffusers import StableDiffusionPipeline

hf_token = os.environ.get("HUGGINGFACE_TOKEN")

pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5",
    torch_dtype=torch.float16,
    token=hf_token,
)
pipe = pipe.to("cuda" if torch.cuda.is_available() else "cpu")

def progress_callback(step: int, timestep: int, latents):
    print(f"Step {step} at timestep {timestep}")

# Note: newer diffusers releases deprecate `callback` in favor of `callback_on_step_end`.
image = pipe("A cute cat wearing a wizard hat", callback=progress_callback).images[0]
image.save("cat_wizard.png")
```

Async generation with asyncio ›
Use in async Python applications to avoid blocking the event loop.

```python
import asyncio
import os

import torch
from diffusers import StableDiffusionPipeline

async def async_generate():
    hf_token = os.environ.get("HUGGINGFACE_TOKEN")
    pipe = StableDiffusionPipeline.from_pretrained(
        "runwayml/stable-diffusion-v1-5",
        torch_dtype=torch.float16,
        token=hf_token,
    )
    pipe = pipe.to("cuda" if torch.cuda.is_available() else "cpu")
    # Run the blocking pipeline call in a worker thread (Python 3.9+).
    result = await asyncio.to_thread(pipe, "An astronaut riding a horse on Mars")
    result.images[0].save("astronaut_mars.png")

asyncio.run(async_generate())
```

Use SD 2 base for faster generation ›
Use SD 2 base for faster inference with slightly lower fidelity.

```python
import os

import torch
from diffusers import StableDiffusionPipeline

hf_token = os.environ.get("HUGGINGFACE_TOKEN")

pipe = StableDiffusionPipeline.from_pretrained(
    "stabilityai/stable-diffusion-2-base",
    torch_dtype=torch.float16,
    token=hf_token,
)
pipe = pipe.to("cuda" if torch.cuda.is_available() else "cpu")

image = pipe("A beautiful mountain landscape").images[0]
image.save("mountain.png")
```

Performance
Latency: ~5-15 seconds per 512x512 image on a modern GPU
Cost: Free for local runs; cloud GPU costs vary by provider
Rate limits: None when running locally
- Cache the pipeline object to avoid repeated model loading.
- Use torch_dtype=torch.float16 for faster GPU inference.
- Use lower resolution or smaller models for faster results.
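The caching tip above can be sketched with functools.lru_cache. Here get_pipeline is an illustrative name and a plain dict stands in for the pipeline so the pattern runs without a GPU; in a real app the function body would contain the from_pretrained / .to(device) setup shown earlier.

```python
from functools import lru_cache

@lru_cache(maxsize=1)
def get_pipeline():
    # In a real app this body would build and return the
    # StableDiffusionPipeline; a plain dict stands in here.
    print("loading model...")
    return {"model": "runwayml/stable-diffusion-v1-5"}

first = get_pipeline()   # prints "loading model..." once
second = get_pipeline()  # cache hit: no reload
print(first is second)   # → True
```

Because maxsize=1 keeps exactly one cached result, every call after the first returns the same already-loaded object.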
| Approach | Latency | Cost/call | Best for |
|---|---|---|---|
| Local GPU with diffusers | ~5-15s | Free (hardware cost) | Full control, no API limits |
| Hugging Face Inference API | ~3-10s | Paid per call | Quick setup, no local GPU needed |
| Lightweight models (SD 2 base) | ~2-5s | Free or cheaper | Faster generation, lower quality |
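The hosted-API row in the table could look roughly like the sketch below. The endpoint follows Hugging Face's Inference API convention (https://api-inference.huggingface.co/models/&lt;model-id&gt; with a Bearer token and an "inputs" payload), and build_request / generate_remote are illustrative names, not part of any library.

```python
import os

API_URL = "https://api-inference.huggingface.co/models/runwayml/stable-diffusion-v1-5"

def build_request(prompt: str, token: str):
    """Assemble the URL, auth header, and JSON payload for one call."""
    headers = {"Authorization": f"Bearer {token}"}
    payload = {"inputs": prompt}
    return API_URL, headers, payload

def generate_remote(prompt: str, output_path: str = "remote.png"):
    import requests  # third-party; pip install requests
    url, headers, payload = build_request(prompt, os.environ["HUGGINGFACE_TOKEN"])
    resp = requests.post(url, headers=headers, json=payload, timeout=120)
    resp.raise_for_status()
    # The API responds with raw image bytes for text-to-image models.
    with open(output_path, "wb") as f:
        f.write(resp.content)
```

This trades local GPU requirements for per-call latency and cost, matching the middle row of the comparison table.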
Quick tip
Cache the pipeline object outside your generation function to avoid reloading the model on every call.
Common mistake
Forgetting to set HUGGINGFACE_TOKEN, or skipping the pipe.to() call before generating, causes authentication or device errors.
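A small fail-fast guard for both mistakes can look like this; resolve_config and SD_DEVICE are illustrative names, and the device choice is sketched via an environment variable so the snippet runs even where torch is not installed (with torch, the commented line applies instead).

```python
import os

def resolve_config():
    """Fail fast on the two setup mistakes: missing token, unset device."""
    token = os.environ.get("HUGGINGFACE_TOKEN")
    if not token:
        raise ValueError("Set the HUGGINGFACE_TOKEN environment variable.")
    # With torch installed this line would be:
    #   device = "cuda" if torch.cuda.is_available() else "cpu"
    device = os.environ.get("SD_DEVICE", "cpu")
    return token, device
```

Calling this once at startup surfaces configuration problems before the first slow generation call, instead of midway through a request.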