How to deploy a FastAPI LLM app on GCP
Quick answer
Deploy a FastAPI app integrated with an LLM by containerizing it with Docker, then deploying it on Google Cloud Run or Google Kubernetes Engine. Supply your OpenAI API key through environment variables and let the service scale automatically on GCP.

Prerequisites

- Python 3.8+
- Docker installed
- Google Cloud SDK installed and configured
- GCP project with billing enabled
- OpenAI API key
Setup
Install required Python packages and Google Cloud SDK. Set environment variables for your API keys and GCP project.
pip install fastapi uvicorn openai
# Install Google Cloud SDK from https://cloud.google.com/sdk/docs/install
# Authenticate with GCP
gcloud auth login
gcloud config set project YOUR_GCP_PROJECT_ID
# Set environment variable for OpenAI API key
export OPENAI_API_KEY="your-openai-api-key"

Step by step
Create a FastAPI app that calls the OpenAI API using the official openai SDK. Containerize the app with Docker and deploy it to Google Cloud Run.
# main.py — the Dockerfile's CMD below expects this module name
import os

from fastapi import FastAPI
from pydantic import BaseModel
from openai import OpenAI

app = FastAPI()
client = OpenAI(api_key=os.environ["OPENAI_API_KEY"])

class PromptRequest(BaseModel):
    prompt: str

@app.post("/generate")
async def generate_text(request: PromptRequest):
    response = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[{"role": "user", "content": request.prompt}],
    )
    return {"response": response.choices[0].message.content}
# Dockerfile
FROM python:3.10-slim
WORKDIR /app
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt
COPY . .
CMD ["uvicorn", "main:app", "--host", "0.0.0.0", "--port", "8080"]
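The Dockerfile copies a requirements.txt that the article never shows. A minimal one, matching the pip install from Setup, might look like this (pinning exact versions is a good idea in practice):

```
fastapi
uvicorn
openai
```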
# Build and deploy commands
docker build -t gcr.io/YOUR_GCP_PROJECT_ID/fastapi-llm-app .
docker push gcr.io/YOUR_GCP_PROJECT_ID/fastapi-llm-app
gcloud run deploy fastapi-llm-app \
  --image gcr.io/YOUR_GCP_PROJECT_ID/fastapi-llm-app \
  --platform managed \
  --region us-central1 \
  --allow-unauthenticated

Common variations
- Use async calls with httpx for non-blocking OpenAI requests.
- Deploy on Google Kubernetes Engine (GKE) for more control over scaling.
- Adjust the model choice: the example already uses gpt-4o-mini, the cheaper option; swap in a larger model if you need higher quality.
- Store the API key in GCP Secret Manager instead of plain environment variables for better security.
Troubleshooting
- If deployment fails with permission errors, verify your GCP IAM roles include Cloud Run Admin and Storage Admin.
- If API calls time out, check network egress settings and increase Cloud Run timeout.
- For authentication errors, confirm OPENAI_API_KEY is correctly set in the Cloud Run environment variables.
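One way to avoid those authentication pitfalls is to keep the key in Secret Manager and let Cloud Run inject it. A sketch, assuming the service from the deploy step above; the secret name openai-api-key is an arbitrary choice, and the service's runtime service account needs the Secret Manager Secret Accessor role:

```
# Store the key in Secret Manager and expose it to Cloud Run as an env var.
gcloud secrets create openai-api-key --replication-policy=automatic
printf '%s' "$OPENAI_API_KEY" | gcloud secrets versions add openai-api-key --data-file=-

# Mount the secret as the OPENAI_API_KEY environment variable.
gcloud run services update fastapi-llm-app \
  --region us-central1 \
  --set-secrets OPENAI_API_KEY=openai-api-key:latest
```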
Key Takeaways
- Containerize your FastAPI LLM app with Docker for seamless GCP deployment.
- Use Google Cloud Run for scalable, serverless hosting with minimal management.
- Secure API keys using environment variables or GCP Secret Manager.
- Adjust model choice and deployment platform based on cost and scale needs.