How-to · Beginner · 3 min read

How to add a health check to a FastAPI LLM app

Quick answer
Add a dedicated health check endpoint in your FastAPI app that returns a simple status response. Optionally, perform a lightweight test call to your LLM API (e.g., OpenAI) to verify connectivity and readiness.

Prerequisites

  • Python 3.8+
  • FastAPI
  • Uvicorn
  • OpenAI API key (free tier works)
  • pip install fastapi uvicorn "openai>=1.0" (quote the version spec so the shell doesn't interpret >)

Setup

Install FastAPI and Uvicorn for the web server, and the openai package for LLM API calls.

Set your OpenAI API key as an environment variable OPENAI_API_KEY.

bash
pip install fastapi uvicorn "openai>=1.0"
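
Then export the key in your shell (macOS/Linux shown; `sk-...` is a placeholder for your actual key from the OpenAI dashboard):

```bash
# Make the key available to the app via the environment
export OPENAI_API_KEY="sk-..."  # placeholder; substitute your real key
```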

Step by step

Create a FastAPI app with a /health endpoint that returns {"status": "ok"}. Add an optional lightweight LLM API call to verify service readiness.

python
import os
from fastapi import FastAPI, HTTPException
from openai import OpenAI

app = FastAPI()
# The client would read OPENAI_API_KEY from the environment anyway; passing it
# explicitly makes the dependency obvious and fails fast if the variable is missing.
client = OpenAI(api_key=os.environ["OPENAI_API_KEY"])

@app.get("/health")
def health_check():
    # A plain `def` endpoint runs in FastAPI's threadpool, so the blocking
    # OpenAI client call does not stall the event loop.
    try:
        # Lightweight test call to the LLM API to check connectivity
        response = client.chat.completions.create(
            model="gpt-4o-mini",
            messages=[{"role": "user", "content": "Ping"}],
            max_tokens=1,
        )
    except Exception as e:
        raise HTTPException(status_code=503, detail=f"LLM API error: {e}")
    # Raise outside the try block so this 503 isn't caught and re-wrapped above
    if not response.choices:
        raise HTTPException(status_code=503, detail="LLM API returned no response")
    return {"status": "ok"}

# To run: uvicorn filename:app --reload

Common variations

  • If you prefer an async endpoint, use the AsyncOpenAI client and await the call so the blocking request doesn't stall the event loop.
  • Change the model to another like gpt-4.1 or claude-3-5-sonnet-20241022 depending on your provider.
  • Implement a simple /ready endpoint that only checks internal app state without calling the LLM API.

Troubleshooting

  • If the health check returns 503, verify your API key and network connectivity to the LLM provider.
  • Check for rate limits or quota exhaustion on your LLM API account.
  • Use logging inside the health check to capture exceptions for debugging.
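
For the last point, one way to capture failures is to move the probe into a small helper that logs the full traceback before reporting failure. This is an illustrative sketch; `check_llm` is a hypothetical helper name, not something from the code above.

```python
import logging

logger = logging.getLogger("health")


def check_llm(client, model="gpt-4o-mini"):
    """Return True if a minimal LLM call succeeds; log the failure otherwise."""
    try:
        client.chat.completions.create(
            model=model,
            messages=[{"role": "user", "content": "Ping"}],
            max_tokens=1,
        )
        return True
    except Exception:
        # logger.exception records the full traceback, so the 503 response
        # can stay terse while the logs keep the details for debugging
        logger.exception("LLM health check failed")
        return False
```

The endpoint can then return 503 when `check_llm(client)` is False, and the reason (auth error, timeout, rate limit) lives in the server logs rather than in the response body.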

Key Takeaways

  • Implement a dedicated /health endpoint in FastAPI returning simple JSON status.
  • Perform a minimal LLM API call inside the health check to verify external service availability.
  • Use HTTP 503 status code to indicate LLM API connectivity issues.
  • Keep health checks lightweight to avoid unnecessary API usage and latency.
Verified 2026-04 · gpt-4o-mini, gpt-4.1, claude-3-5-sonnet-20241022