How-to · Beginner · 4 min read

How to deploy a FastAPI LLM app with Docker

Quick answer
Use FastAPI to build your LLM app and containerize it with a Dockerfile that installs dependencies and exposes the app port. Then build and run the Docker image locally or deploy it to a cloud service for scalable LLM API hosting.

Prerequisites

  • Python 3.8+
  • Docker installed on your machine
  • OpenAI API key with available credit
  • pip install fastapi uvicorn openai

Setup

Install required Python packages and set your environment variable for the OpenAI API key.

  • Install FastAPI, Uvicorn, and the OpenAI SDK:

```bash
pip install fastapi uvicorn openai
```
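The app reads the key from the environment, so export it in your shell before starting the server. The value below is a placeholder — substitute your real key:

```shell
# Lasts for the current shell session only; replace with your real key
export OPENAI_API_KEY="sk-your-key-here"
```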

Step by step

Create a file named main.py containing a simple FastAPI app that calls the OpenAI gpt-4o model to generate text completions. Then write a Dockerfile to containerize the app.

```python
import os
from fastapi import FastAPI
from pydantic import BaseModel
from openai import OpenAI

app = FastAPI()
# The client reads the key from the OPENAI_API_KEY environment variable
client = OpenAI(api_key=os.environ["OPENAI_API_KEY"])

class PromptRequest(BaseModel):
    prompt: str

@app.post("/generate")
def generate_text(request: PromptRequest):
    # A plain def endpoint runs in FastAPI's thread pool, so this
    # synchronous OpenAI call does not block the event loop
    response = client.chat.completions.create(
        model="gpt-4o",
        messages=[{"role": "user", "content": request.prompt}]
    )
    return {"response": response.choices[0].message.content}
```

To run locally:

```bash
uvicorn main:app --host 0.0.0.0 --port 8000
```

Dockerfile (place it next to main.py):

```dockerfile
FROM python:3.10-slim
WORKDIR /app
COPY requirements.txt ./
RUN pip install --no-cache-dir -r requirements.txt
COPY . .
EXPOSE 8000
CMD ["uvicorn", "main:app", "--host", "0.0.0.0", "--port", "8000"]
```
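The Dockerfile copies a requirements.txt that isn't shown above, so create a minimal one next to main.py (versions are left unpinned here for brevity; pin them for production builds):

```text
fastapi
uvicorn
openai
```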

Common variations

You can use asynchronous calls with FastAPI, switch to other models such as gpt-4.1, or stream responses token by token. You can also deploy the container to cloud platforms such as AWS ECS, Google Cloud Run, or Azure Container Instances.

```python
# Async example with FastAPI and OpenAI
import os
from fastapi import FastAPI
from pydantic import BaseModel
from openai import AsyncOpenAI

app = FastAPI()
client = AsyncOpenAI(api_key=os.environ["OPENAI_API_KEY"])

class PromptRequest(BaseModel):
    prompt: str

@app.post("/generate-async")
async def generate_text_async(request: PromptRequest):
    # AsyncOpenAI has the same interface as OpenAI, but its calls are awaitable
    response = await client.chat.completions.create(
        model="gpt-4.1",
        messages=[{"role": "user", "content": request.prompt}]
    )
    return {"response": response.choices[0].message.content}
```

To build the Docker image:

```bash
docker build -t fastapi-llm-app .
```

To run the container, passing the API key through as an environment variable:

```bash
docker run -d -p 8000:8000 -e OPENAI_API_KEY="$OPENAI_API_KEY" fastapi-llm-app
```
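Once the container is up, you can exercise the endpoint from Python's standard library. This is a sketch that assumes the container is listening on localhost:8000; build_generate_request is a hypothetical helper name, not part of the app above:

```python
import json
import urllib.request

def build_generate_request(prompt: str, base_url: str = "http://localhost:8000"):
    """Build a POST request for the /generate endpoint."""
    return urllib.request.Request(
        f"{base_url}/generate",
        data=json.dumps({"prompt": prompt}).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )

req = build_generate_request("Say hello in one sentence")
# Send it while the container is running:
# with urllib.request.urlopen(req) as resp:
#     print(json.loads(resp.read())["response"])
```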

Troubleshooting

  • If you get ModuleNotFoundError, ensure every dependency is listed in requirements.txt and installed in the Dockerfile.
  • If the app fails to connect to OpenAI, verify your OPENAI_API_KEY environment variable is set correctly in Docker.
  • For port conflicts, confirm Docker port mapping matches your FastAPI uvicorn port.

Key Takeaways

  • Use a minimal Dockerfile to containerize your FastAPI LLM app with dependencies.
  • Pass your OpenAI API key securely via environment variables when running the Docker container.
  • Test your app locally with uvicorn before containerizing to catch errors early.
Verified 2026-04 · gpt-4o, gpt-4.1