How-to · Beginner · 3 min read

How to add a health check to a FastAPI LLM app

Quick answer
Add a dedicated health check endpoint in your FastAPI app that returns a simple status response. Optionally, perform a lightweight test call to your LLM API (e.g., OpenAI) to verify connectivity and readiness.

Prerequisites

  • Python 3.8+
  • FastAPI
  • Uvicorn
  • OpenAI API key (free tier works)
  • pip install fastapi uvicorn "openai>=1.0" (quote the version spec so the shell doesn't interpret >)

Setup

Install FastAPI and Uvicorn for the web server, and the openai package for LLM API calls.

Set your OpenAI API key as an environment variable OPENAI_API_KEY.

bash
pip install fastapi uvicorn "openai>=1.0"
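
Then export the key in your shell (macOS/Linux shown; `sk-...` is a placeholder for your actual key from the OpenAI dashboard):

```bash
# Make the key available to the app via the environment
export OPENAI_API_KEY="sk-..."  # placeholder; substitute your real key
```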

Step by step

Create a FastAPI app with a /health endpoint that returns {"status": "ok"}. Add an optional lightweight LLM API call to verify service readiness.

python
import os
from fastapi import FastAPI, HTTPException
from openai import OpenAI

app = FastAPI()
# The client would read OPENAI_API_KEY from the environment anyway; passing it
# explicitly makes the dependency obvious and fails fast if the variable is missing.
client = OpenAI(api_key=os.environ["OPENAI_API_KEY"])

@app.get("/health")
def health_check():
    # A plain `def` endpoint runs in FastAPI's threadpool, so the blocking
    # OpenAI client call does not stall the event loop.
    try:
        # Lightweight test call to the LLM API to check connectivity
        response = client.chat.completions.create(
            model="gpt-4o-mini",
            messages=[{"role": "user", "content": "Ping"}],
            max_tokens=1,
        )
    except Exception as e:
        raise HTTPException(status_code=503, detail=f"LLM API error: {e}")
    # Raise outside the try block so this 503 isn't caught and re-wrapped above
    if not response.choices:
        raise HTTPException(status_code=503, detail="LLM API returned no response")
    return {"status": "ok"}

# To run: uvicorn filename:app --reload

Common variations

  • If you prefer an async endpoint, use the AsyncOpenAI client and await the call so the blocking request doesn't stall the event loop.
  • Change the model to another like gpt-4.1 or claude-3-5-sonnet-20241022 depending on your provider.
  • Implement a simple /ready endpoint that only checks internal app state without calling the LLM API.

Troubleshooting

  • If the health check returns 503, verify your API key and network connectivity to the LLM provider.
  • Check for rate limits or quota exhaustion on your LLM API account.
  • Use logging inside the health check to capture exceptions for debugging.
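
For the last point, one way to capture failures is to move the probe into a small helper that logs the full traceback before reporting failure. This is an illustrative sketch; `check_llm` is a hypothetical helper name, not something from the code above.

```python
import logging

logger = logging.getLogger("health")


def check_llm(client, model="gpt-4o-mini"):
    """Return True if a minimal LLM call succeeds; log the failure otherwise."""
    try:
        client.chat.completions.create(
            model=model,
            messages=[{"role": "user", "content": "Ping"}],
            max_tokens=1,
        )
        return True
    except Exception:
        # logger.exception records the full traceback, so the 503 response
        # can stay terse while the logs keep the details for debugging
        logger.exception("LLM health check failed")
        return False
```

The endpoint can then return 503 when `check_llm(client)` is False, and the reason (auth error, timeout, rate limit) lives in the server logs rather than in the response body.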

Key Takeaways

  • Implement a dedicated /health endpoint in FastAPI returning simple JSON status.
  • Perform a minimal LLM API call inside the health check to verify external service availability.
  • Use HTTP 503 status code to indicate LLM API connectivity issues.
  • Keep health checks lightweight to avoid unnecessary API usage and latency.
Verified 2026-04 · gpt-4o-mini, gpt-4.1, claude-3-5-sonnet-20241022