How-to · Intermediate · 3 min read

How to deploy a FastAPI LLM app on GCP

Quick answer
Containerize your FastAPI app with Docker, then deploy it to Google Cloud Run (or Google Kubernetes Engine if you need more control). Store your OpenAI API key in an environment variable or Secret Manager, and let Cloud Run scale the service automatically.

PREREQUISITES

  • Python 3.8+
  • Docker installed
  • Google Cloud SDK installed and configured
  • GCP project with billing enabled
  • OpenAI API key
  • pip install fastapi uvicorn openai

Setup

Install required Python packages and Google Cloud SDK. Set environment variables for your API keys and GCP project.

bash
pip install fastapi uvicorn openai

# Install Google Cloud SDK from https://cloud.google.com/sdk/docs/install

# Authenticate with GCP
gcloud auth login
gcloud config set project YOUR_GCP_PROJECT_ID

# Set environment variable for your OpenAI API key
export OPENAI_API_KEY="your-openai-api-key"
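The Dockerfile used later in this guide copies a requirements.txt into the image. A minimal file matching the packages installed above (pin exact versions for reproducible builds) looks like this:

```text
fastapi
uvicorn
openai
```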

Step by step

Create a FastAPI app that calls the OpenAI API using the official openai SDK. Containerize the app with Docker and deploy it to Google Cloud Run.

python
import os
from fastapi import FastAPI
from pydantic import BaseModel
from openai import OpenAI

app = FastAPI()
client = OpenAI(api_key=os.environ["OPENAI_API_KEY"])

class PromptRequest(BaseModel):
    prompt: str

@app.post("/generate")
def generate_text(request: PromptRequest):
    # The openai client call is synchronous; declaring the endpoint with a
    # plain `def` lets FastAPI run it in a threadpool instead of blocking
    # the event loop.
    response = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[{"role": "user", "content": request.prompt}]
    )
    return {"response": response.choices[0].message.content}

Create a Dockerfile next to main.py:

dockerfile
FROM python:3.10-slim
WORKDIR /app
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt
COPY . .
# Cloud Run expects the container to listen on port 8080
CMD ["uvicorn", "main:app", "--host", "0.0.0.0", "--port", "8080"]

Build the image, push it to Container Registry, and deploy:

bash
# Let Docker authenticate to gcr.io with your gcloud credentials
gcloud auth configure-docker
docker build -t gcr.io/YOUR_GCP_PROJECT_ID/fastapi-llm-app .
docker push gcr.io/YOUR_GCP_PROJECT_ID/fastapi-llm-app
gcloud run deploy fastapi-llm-app --image gcr.io/YOUR_GCP_PROJECT_ID/fastapi-llm-app --platform managed --region us-central1 --allow-unauthenticated
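The deploy command above never passes the OpenAI key to the running service, so the container will crash at startup when it reads OPENAI_API_KEY. A sketch of wiring the key in and smoke-testing the endpoint; the secret name openai-api-key and the service URL are placeholders you must substitute:

```bash
# Option A: plain environment variable (simpler, less secure)
gcloud run services update fastapi-llm-app --region us-central1 \
  --set-env-vars OPENAI_API_KEY="your-openai-api-key"

# Option B: mount a Secret Manager secret
# (assumes a secret named openai-api-key already exists)
gcloud run services update fastapi-llm-app --region us-central1 \
  --set-secrets OPENAI_API_KEY=openai-api-key:latest

# Smoke test: substitute the URL printed by `gcloud run deploy`
curl -X POST https://YOUR-SERVICE-URL/generate \
  -H "Content-Type: application/json" \
  -d '{"prompt": "Say hello"}'
```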

Common variations

  • Use async calls with httpx for non-blocking OpenAI requests.
  • Deploy on Google Kubernetes Engine (GKE) for more control over scaling.
  • Switch to a larger model such as gpt-4o when output quality matters more than cost (the example already uses gpt-4o-mini for cost savings).
  • Store API keys in GCP Secret Manager rather than plain environment variables for better security.

Troubleshooting

  • If deployment fails with permission errors, verify your GCP IAM roles include Cloud Run Admin and Storage Admin.
  • If API calls time out, check network egress settings and increase Cloud Run timeout.
  • For authentication errors, confirm OPENAI_API_KEY is correctly set in Cloud Run environment variables.

Key Takeaways

  • Containerize your FastAPI LLM app with Docker for seamless GCP deployment.
  • Use Google Cloud Run for scalable, serverless hosting with minimal management.
  • Secure API keys using environment variables or GCP Secret Manager.
  • Adjust model choice and deployment platform based on cost and scale needs.