How-to · Beginner · 4 min read

How to deploy a FastAPI LLM app with Docker

Quick answer
Use FastAPI to build your LLM app and containerize it with a Dockerfile that installs dependencies and exposes the app port. Then build and run the Docker image locally or deploy it to a cloud service for scalable LLM API hosting.

Prerequisites

  • Python 3.8+
  • Docker installed on your machine
  • OpenAI API key with available credit
  • pip install fastapi uvicorn openai

Setup

Install required Python packages and set your environment variable for the OpenAI API key.

  • Install FastAPI, Uvicorn, and the OpenAI SDK:

```bash
pip install fastapi uvicorn openai
```
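The app reads the key from the environment, so export it in your shell before starting the server. The value below is a placeholder — substitute your real key:

```shell
# Lasts for the current shell session only; replace with your real key
export OPENAI_API_KEY="sk-your-key-here"
```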

Step by step

Create a file named main.py containing a simple FastAPI app that calls the OpenAI gpt-4o model to generate text completions. Then write a Dockerfile to containerize the app.

```python
import os
from fastapi import FastAPI
from pydantic import BaseModel
from openai import OpenAI

app = FastAPI()
# The client reads the key from the OPENAI_API_KEY environment variable
client = OpenAI(api_key=os.environ["OPENAI_API_KEY"])

class PromptRequest(BaseModel):
    prompt: str

@app.post("/generate")
def generate_text(request: PromptRequest):
    # A plain def endpoint runs in FastAPI's thread pool, so this
    # synchronous OpenAI call does not block the event loop
    response = client.chat.completions.create(
        model="gpt-4o",
        messages=[{"role": "user", "content": request.prompt}]
    )
    return {"response": response.choices[0].message.content}
```

To run locally:

```bash
uvicorn main:app --host 0.0.0.0 --port 8000
```

Dockerfile (place it next to main.py):

```dockerfile
FROM python:3.10-slim
WORKDIR /app
COPY requirements.txt ./
RUN pip install --no-cache-dir -r requirements.txt
COPY . .
EXPOSE 8000
CMD ["uvicorn", "main:app", "--host", "0.0.0.0", "--port", "8000"]
```
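The Dockerfile copies a requirements.txt that isn't shown above, so create a minimal one next to main.py (versions are left unpinned here for brevity; pin them for production builds):

```text
fastapi
uvicorn
openai
```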

Common variations

You can use asynchronous calls with FastAPI, switch to other models such as gpt-4.1, or stream responses token by token. You can also deploy the container to cloud platforms such as AWS ECS, Google Cloud Run, or Azure Container Instances.

```python
# Async example with FastAPI and OpenAI
import os
from fastapi import FastAPI
from pydantic import BaseModel
from openai import AsyncOpenAI

app = FastAPI()
client = AsyncOpenAI(api_key=os.environ["OPENAI_API_KEY"])

class PromptRequest(BaseModel):
    prompt: str

@app.post("/generate-async")
async def generate_text_async(request: PromptRequest):
    # AsyncOpenAI has the same interface as OpenAI, but its calls are awaitable
    response = await client.chat.completions.create(
        model="gpt-4.1",
        messages=[{"role": "user", "content": request.prompt}]
    )
    return {"response": response.choices[0].message.content}
```

To build the Docker image:

```bash
docker build -t fastapi-llm-app .
```

To run the container, passing the API key through as an environment variable:

```bash
docker run -d -p 8000:8000 -e OPENAI_API_KEY="$OPENAI_API_KEY" fastapi-llm-app
```
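Once the container is up, you can exercise the endpoint from Python's standard library. This is a sketch that assumes the container is listening on localhost:8000; build_generate_request is a hypothetical helper name, not part of the app above:

```python
import json
import urllib.request

def build_generate_request(prompt: str, base_url: str = "http://localhost:8000"):
    """Build a POST request for the /generate endpoint."""
    return urllib.request.Request(
        f"{base_url}/generate",
        data=json.dumps({"prompt": prompt}).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )

req = build_generate_request("Say hello in one sentence")
# Send it while the container is running:
# with urllib.request.urlopen(req) as resp:
#     print(json.loads(resp.read())["response"])
```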

Troubleshooting

  • If you get ModuleNotFoundError, ensure every dependency is listed in requirements.txt and installed in the Dockerfile.
  • If the app fails to connect to OpenAI, verify your OPENAI_API_KEY environment variable is set correctly in Docker.
  • For port conflicts, confirm Docker port mapping matches your FastAPI uvicorn port.

Key Takeaways

  • Use a minimal Dockerfile to containerize your FastAPI LLM app with dependencies.
  • Pass your OpenAI API key securely via environment variables when running the Docker container.
  • Test your app locally with uvicorn before containerizing to catch errors early.
Verified 2026-04 · gpt-4o, gpt-4.1