How-to · Intermediate · 3 min read

How to deploy a FastAPI LLM app on GCP

Quick answer
Containerize your FastAPI app with Docker, then deploy it to Google Cloud Run (or Google Kubernetes Engine if you need more control). Store your OpenAI API key in an environment variable or Secret Manager, and let Cloud Run scale the service automatically.

PREREQUISITES

  • Python 3.8+
  • Docker installed
  • Google Cloud SDK installed and configured
  • GCP project with billing enabled
  • OpenAI API key
  • pip install fastapi uvicorn openai

Setup

Install required Python packages and Google Cloud SDK. Set environment variables for your API keys and GCP project.

bash
pip install fastapi uvicorn openai

# Install Google Cloud SDK from https://cloud.google.com/sdk/docs/install

# Authenticate with GCP
gcloud auth login
gcloud config set project YOUR_GCP_PROJECT_ID

# Set environment variable for your OpenAI API key
export OPENAI_API_KEY="your-openai-api-key"
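The Dockerfile used later in this guide copies a requirements.txt into the image. A minimal file matching the packages installed above (pin exact versions for reproducible builds) looks like this:

```text
fastapi
uvicorn
openai
```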

Step by step

Create a FastAPI app that calls the OpenAI API using the official openai SDK. Containerize the app with Docker and deploy it to Google Cloud Run.

python
import os
from fastapi import FastAPI
from pydantic import BaseModel
from openai import OpenAI

app = FastAPI()
client = OpenAI(api_key=os.environ["OPENAI_API_KEY"])

class PromptRequest(BaseModel):
    prompt: str

@app.post("/generate")
def generate_text(request: PromptRequest):
    # The openai client call is synchronous; declaring the endpoint with a
    # plain `def` lets FastAPI run it in a threadpool instead of blocking
    # the event loop.
    response = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[{"role": "user", "content": request.prompt}]
    )
    return {"response": response.choices[0].message.content}

Create a Dockerfile next to main.py:

dockerfile
FROM python:3.10-slim
WORKDIR /app
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt
COPY . .
# Cloud Run expects the container to listen on port 8080
CMD ["uvicorn", "main:app", "--host", "0.0.0.0", "--port", "8080"]

Build the image, push it to Container Registry, and deploy:

bash
# Let Docker authenticate to gcr.io with your gcloud credentials
gcloud auth configure-docker
docker build -t gcr.io/YOUR_GCP_PROJECT_ID/fastapi-llm-app .
docker push gcr.io/YOUR_GCP_PROJECT_ID/fastapi-llm-app
gcloud run deploy fastapi-llm-app --image gcr.io/YOUR_GCP_PROJECT_ID/fastapi-llm-app --platform managed --region us-central1 --allow-unauthenticated
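The deploy command above never passes the OpenAI key to the running service, so the container will crash at startup when it reads OPENAI_API_KEY. A sketch of wiring the key in and smoke-testing the endpoint; the secret name openai-api-key and the service URL are placeholders you must substitute:

```bash
# Option A: plain environment variable (simpler, less secure)
gcloud run services update fastapi-llm-app --region us-central1 \
  --set-env-vars OPENAI_API_KEY="your-openai-api-key"

# Option B: mount a Secret Manager secret
# (assumes a secret named openai-api-key already exists)
gcloud run services update fastapi-llm-app --region us-central1 \
  --set-secrets OPENAI_API_KEY=openai-api-key:latest

# Smoke test: substitute the URL printed by `gcloud run deploy`
curl -X POST https://YOUR-SERVICE-URL/generate \
  -H "Content-Type: application/json" \
  -d '{"prompt": "Say hello"}'
```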

Common variations

  • Use async calls with httpx for non-blocking OpenAI requests.
  • Deploy on Google Kubernetes Engine (GKE) for more control over scaling.
  • Switch to a larger model such as gpt-4o when output quality matters more than cost (the example already uses gpt-4o-mini for cost savings).
  • Store API keys in GCP Secret Manager rather than plain environment variables for better security.

Troubleshooting

  • If deployment fails with permission errors, verify your GCP IAM roles include Cloud Run Admin and Storage Admin.
  • If API calls time out, check network egress settings and increase Cloud Run timeout.
  • For authentication errors, confirm OPENAI_API_KEY is correctly set in Cloud Run environment variables.

Key Takeaways

  • Containerize your FastAPI LLM app with Docker for seamless GCP deployment.
  • Use Google Cloud Run for scalable, serverless hosting with minimal management.
  • Secure API keys using environment variables or GCP Secret Manager.
  • Adjust model choice and deployment platform based on cost and scale needs.