
How to set up LLM quality alerts

Quick answer

Set up LLM quality alerts by defining metrics such as response accuracy, latency, and hallucination rate, then automating checks against your LLM's API and routing failures to an alerting channel such as email or Slack. Use the OpenAI or Anthropic SDK to fetch model outputs, compare them against expected benchmarks, and trigger an alert when a threshold is breached.

Prerequisites

Python 3.8+
OpenAI API key (free tier works) or Anthropic API key
pip install "openai>=1.0" or pip install "anthropic>=0.20" (quote the version specifier so the shell does not treat > as a redirect)

Setup

Install the required Python SDK for your LLM provider and set your API key as an environment variable for secure access.

bash

pip install "openai>=1.0"
export OPENAI_API_KEY="your-api-key"  # placeholder; substitute your own key
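
Before running the examples, it can help to confirm that the variables they read are actually set. The following is a minimal sketch, assuming the variable names used by the email example in this guide; `missing_env_vars` is a hypothetical helper name, not part of any SDK.

```python
import os

# Variable names read by the examples in this guide.
REQUIRED_VARS = [
    "OPENAI_API_KEY",
    "ALERT_EMAIL_FROM",
    "ALERT_EMAIL_TO",
    "ALERT_EMAIL_PASSWORD",
]


def missing_env_vars(names, env=None):
    """Return the names that are unset or empty in the given environment."""
    env = os.environ if env is None else env
    return [name for name in names if not env.get(name)]


if __name__ == "__main__":
    missing = missing_env_vars(REQUIRED_VARS)
    if missing:
        print("Missing environment variables:", ", ".join(missing))
    else:
        print("All required environment variables are set.")
```

Passing a plain dictionary as `env` makes the helper easy to exercise without touching the real environment.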

Step by step

This example uses the OpenAI SDK to query gpt-4o, checks output quality by testing whether the response contains every expected keyword, and sends an email alert when the check fails. The email step reads ALERT_EMAIL_FROM, ALERT_EMAIL_TO, and ALERT_EMAIL_PASSWORD from the environment.

python

import os
from openai import OpenAI
import smtplib
from email.message import EmailMessage

# Initialize OpenAI client
client = OpenAI(api_key=os.environ["OPENAI_API_KEY"])

# Keywords a passing response must contain (case-sensitive substring match)
EXPECTED_KEYWORDS = ["def", "return"]


def check_quality(response_text):
    return all(keyword in response_text for keyword in EXPECTED_KEYWORDS)


def send_alert(subject, body):
    msg = EmailMessage()
    msg.set_content(body)
    msg["Subject"] = subject
    msg["From"] = os.environ["ALERT_EMAIL_FROM"]
    msg["To"] = os.environ["ALERT_EMAIL_TO"]

    with smtplib.SMTP_SSL("smtp.gmail.com", 465) as smtp:
        smtp.login(os.environ["ALERT_EMAIL_FROM"], os.environ["ALERT_EMAIL_PASSWORD"])
        smtp.send_message(msg)


# Query the LLM
response = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "Write a Python function that returns the square of a number."}]
)

output_text = response.choices[0].message.content
print("LLM output:", output_text)

# Check quality
if not check_quality(output_text):
    send_alert(
        subject="LLM Quality Alert: Output check failed",
        body=f"The LLM output did not meet quality standards:\n\n{output_text}"
    )
    print("Alert sent due to quality failure.")
else:
    print("Output passed quality check.")

output

LLM output: def square(x):
    return x * x
Output passed quality check.
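
A single failed response is a noisy signal. To alert on a threshold, as the quick answer describes, one option is to run the check over a batch of responses and alert only when the pass rate falls below a chosen value. The sketch below assumes a threshold of 0.8, a keyword list chosen to match the sample strings, and hypothetical helper names (`pass_rate`, `should_alert`). The sample responses are hard-coded so it runs without an API call.

```python
EXPECTED_KEYWORDS = ["def", "return"]  # assumed list for this sketch
PASS_RATE_THRESHOLD = 0.8  # assumed value; tune for your workload


def check_quality(response_text):
    """Keyword check: every expected keyword must appear in the text."""
    return all(keyword in response_text for keyword in EXPECTED_KEYWORDS)


def pass_rate(responses):
    """Fraction of responses that pass check_quality.

    An empty batch counts as 0.0 so that a silent pipeline is not
    mistaken for a healthy one.
    """
    if not responses:
        return 0.0
    passed = sum(1 for text in responses if check_quality(text))
    return passed / len(responses)


def should_alert(responses, threshold=PASS_RATE_THRESHOLD):
    """True when the batch pass rate is below the threshold."""
    return pass_rate(responses) < threshold


# Hard-coded sample outputs standing in for real model responses.
sample_responses = [
    "def square(x):\n    return x * x",
    "def cube(x):\n    return x ** 3",
    "I cannot help with that.",
]

rate = pass_rate(sample_responses)
print(f"Pass rate: {rate:.2f}")
if should_alert(sample_responses):
    print("Pass rate below threshold; send an alert.")
```

In a real monitor, the list would be filled from scheduled API calls and the final branch would call an alerting function such as the email helper in the main example.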

Common variations

You can extend quality alerts by integrating with Slack or PagerDuty for real-time notifications, using async SDK calls for high throughput, or switching to another model such as claude-3-5-sonnet-20241022 through the Anthropic SDK, as in the example that follows. Quality checks can also cover response latency, token usage, or semantic similarity computed from embeddings.

python

import anthropic
import os

client = anthropic.Anthropic(api_key=os.environ["ANTHROPIC_API_KEY"])

response = client.messages.create(
    model="claude-3-5-sonnet-20241022",
    max_tokens=500,
    system="You are a helpful assistant.",
    messages=[{"role": "user", "content": "Explain recursion in simple terms."}]
)

print("Claude output:", response.content[0].text)

output

Claude output: Recursion is when a function calls itself to solve smaller instances of a problem until it reaches a base case.
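
The latency and Slack variations mentioned above can be combined in a short sketch. It assumes a 5-second latency threshold, a `SLACK_WEBHOOK_URL` environment variable, and hypothetical helper names (`timed_call`, `build_slack_payload`, `send_slack_alert`); none of these come from the original example. It relies on Slack incoming webhooks accepting a JSON body with a `text` field, and uses a stand-in function instead of a real SDK call so it runs offline.

```python
import json
import os
import time
import urllib.request

LATENCY_THRESHOLD_SECONDS = 5.0  # assumed value; tune for your workload


def timed_call(fn, *args, **kwargs):
    """Run fn and return (result, elapsed_seconds)."""
    start = time.perf_counter()
    result = fn(*args, **kwargs)
    return result, time.perf_counter() - start


def build_slack_payload(message):
    """Encode the message as the JSON body a Slack incoming webhook expects."""
    return json.dumps({"text": message}).encode("utf-8")


def send_slack_alert(message, webhook_url):
    """POST the alert to a Slack incoming webhook and return the HTTP status."""
    request = urllib.request.Request(
        webhook_url,
        data=build_slack_payload(message),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request, timeout=10) as response:
        return response.status


def fake_llm_call(prompt):
    """Stand-in for a real SDK call so the sketch runs without network access."""
    time.sleep(0.01)
    return f"echo: {prompt}"


result, elapsed = timed_call(fake_llm_call, "Explain recursion in simple terms.")
print(f"Latency: {elapsed:.3f}s")

if elapsed > LATENCY_THRESHOLD_SECONDS:
    # SLACK_WEBHOOK_URL is an assumed variable name for this sketch.
    send_slack_alert(
        f"LLM latency alert: {elapsed:.2f}s exceeded {LATENCY_THRESHOLD_SECONDS}s",
        os.environ["SLACK_WEBHOOK_URL"],
    )
```

To time a real request, pass the SDK method and its arguments to `timed_call` in place of `fake_llm_call`.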

Troubleshooting

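If alerts are not sent, verify your SMTP credentials and the ALERT_EMAIL_FROM, ALERT_EMAIL_TO, and ALERT_EMAIL_PASSWORD environment variables. Gmail accounts generally need an app password rather than the regular account password.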
If quality checks are too strict or too loose, adjust keyword lists or implement more advanced NLP metrics like BLEU or ROUGE.
For API rate limits, implement exponential backoff or batch queries.
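
The exponential backoff mentioned above can be sketched as a small retry helper. This is an illustration under stated assumptions, not a definitive implementation: `retry_with_backoff` and its parameters are hypothetical names, and the delay values are arbitrary starting points.

```python
import random
import time


def retry_with_backoff(fn, max_attempts=5, base_delay=1.0, max_delay=30.0,
                       retry_on=(Exception,), sleep=time.sleep):
    """Call fn(); on failure wait base_delay * 2**attempt (capped) and retry.

    A small random jitter is added to each delay so that concurrent
    clients do not all retry at the same moment.
    """
    for attempt in range(max_attempts):
        try:
            return fn()
        except retry_on:
            if attempt == max_attempts - 1:
                raise  # out of attempts; surface the last error
            delay = min(max_delay, base_delay * (2 ** attempt))
            sleep(delay + random.uniform(0, delay * 0.1))


# Simulated call that fails twice before succeeding.
attempts = {"count": 0}


def flaky_call():
    attempts["count"] += 1
    if attempts["count"] < 3:
        raise RuntimeError("simulated rate limit")
    return "ok"


print(retry_with_backoff(flaky_call, base_delay=0.01))
```

With the OpenAI SDK, you would typically wrap the `client.chat.completions.create(...)` call in a function and narrow `retry_on` to the SDK's rate-limit exception (`openai.RateLimitError` in version 1.x) so that other errors are not retried.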


Key Takeaways

Define clear, measurable quality metrics for LLM outputs to trigger alerts effectively.
Automate monitoring by integrating LLM API calls with alerting channels like email or Slack.
Use SDKs from OpenAI or Anthropic with environment-secured API keys for reliable querying.
Customize quality checks beyond keywords using latency, token usage, or semantic similarity.
Handle failures gracefully with retries and clear troubleshooting steps for alert delivery.

Verified 2026-04 · gpt-4o, claude-3-5-sonnet-20241022
