
How to implement content filtering for AI apps

Quick answer

Implement content filtering in AI apps by checking text with a dedicated moderation API, such as OpenAI's moderation endpoint, or by prompting a model such as Claude to classify content, and then blocking anything that is flagged. Combine these checks with prompt engineering and custom rule-based filters, and apply them to model outputs as well as user inputs, to help keep responses within your safety and compliance requirements.
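As a rough sketch of how these layers can fit together, the pipeline below runs a cheap rule-based check first, then a moderation check, then generation, and finally re-checks the model's reply before returning it. The moderate and generate callables, the blocklist pattern, and the stub functions are hypothetical placeholders so the example runs offline; in a real app the callables would wrap calls such as client.moderations.create and client.chat.completions.create.

```python
import re
from typing import Callable

# Hypothetical blocklist; replace with rules that match your own policy.
BLOCKED_PATTERNS = [re.compile(r"\bcredit card dump\b", re.IGNORECASE)]

def rule_filter(text: str) -> bool:
    """Return True if any local rule matches (cheap check, no API call)."""
    return any(p.search(text) for p in BLOCKED_PATTERNS)

def filtered_reply(
    user_input: str,
    moderate: Callable[[str], bool],  # hypothetical: True if the text is flagged
    generate: Callable[[str], str],   # hypothetical: returns the model's reply
) -> str:
    # 1. Rule-based filter on the input
    if rule_filter(user_input):
        return "Blocked by local rules."
    # 2. Moderation check on the input
    if moderate(user_input):
        return "Blocked by moderation (input)."
    # 3. Generate a reply, then 4. check the output before showing it
    reply = generate(user_input)
    if rule_filter(reply) or moderate(reply):
        return "Blocked by moderation (output)."
    return reply

# Offline stubs so the sketch runs without an API key
def fake_moderate(text: str) -> bool:
    return "attack" in text.lower()

def fake_generate(text: str) -> str:
    return f"Echo: {text}"

print(filtered_reply("Tell me a joke.", fake_moderate, fake_generate))
# prints: Echo: Tell me a joke.
```

Checking the output as well as the input matters because a harmless prompt can still produce a reply that violates your policy.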

PREREQUISITES

Python 3.8+
OpenAI API key (the moderation endpoint is free to use; chat completion calls are billed)
pip install "openai>=1.0"

Setup

Install the OpenAI Python SDK and set your API key as an environment variable to access moderation endpoints.

bash

pip install "openai>=1.0"
export OPENAI_API_KEY="your-api-key"
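Reading the key with os.environ["OPENAI_API_KEY"] raises a bare KeyError when the variable is missing. A small helper like the one below fails fast with a clearer message; the name require_env is our own and not part of the OpenAI SDK. You could call it once at startup and pass the result to OpenAI(api_key=...).

```python
import os

def require_env(name: str) -> str:
    """Return the value of an environment variable or raise a clear error."""
    value = os.environ.get(name, "").strip()
    if not value:
        raise RuntimeError(
            f"{name} is not set. Export it in your shell before running, "
            f"e.g. export {name}=..."
        )
    return value

if __name__ == "__main__":
    # Typical use: api_key = require_env("OPENAI_API_KEY")
    try:
        require_env("OPENAI_API_KEY")
        print("OPENAI_API_KEY is set.")
    except RuntimeError as err:
        print(err)
```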

Step by step

This example shows how to call OpenAI's moderation endpoint to filter user input before sending it to a chat model.

python

import os
from openai import OpenAI

client = OpenAI(api_key=os.environ["OPENAI_API_KEY"])

# User input to check
user_input = "I want to do something illegal."

# Call moderation endpoint
response = client.moderations.create(
    model="omni-moderation-latest",
    input=user_input
)

# Check if flagged
moderation_result = response.results[0]
if moderation_result.flagged:
    print("Content flagged by moderation. Blocking request.")
else:
    # Proceed with chat completion
    chat_response = client.chat.completions.create(
        model="gpt-4o",
        messages=[{"role": "user", "content": user_input}]
    )
    print("AI response:", chat_response.choices[0].message.content)

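Whether a borderline input like this one is flagged depends on the moderation model's scores, so your result may differ.

output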

Content flagged by moderation. Blocking request.
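The flagged field reflects OpenAI's default decision. If you need stricter or looser behavior for particular categories, one option is to compare each category score against your own thresholds. The sketch below works on a plain dict mapping category name to score; the threshold values and key names are illustrative assumptions. In practice you might build the dict from moderation_result.category_scores (for example via the SDK model's model_dump() method) and adjust the keys to match what it returns.

```python
from typing import Dict, List

# Illustrative per-category thresholds (assumptions; tune on your own data).
THRESHOLDS: Dict[str, float] = {
    "violence": 0.5,
    "harassment": 0.6,
    "self_harm": 0.2,  # stricter threshold for a sensitive category
}
DEFAULT_THRESHOLD = 0.8  # used for categories not listed above

def categories_over_threshold(scores: Dict[str, float]) -> List[str]:
    """Return the categories whose score meets or exceeds its threshold, sorted."""
    return sorted(
        category
        for category, score in scores.items()
        if score >= THRESHOLDS.get(category, DEFAULT_THRESHOLD)
    )

# Example with made-up scores
example_scores = {"violence": 0.12, "harassment": 0.05, "self_harm": 0.31, "sexual": 0.01}
hits = categories_over_threshold(example_scores)
print("Block" if hits else "Allow", hits)
# prints: Block ['self_harm']
```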

Common variations

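You can make the calls asynchronous, use an Anthropic model such as claude-3-5-sonnet-20241022 as a prompted content classifier (Anthropic's API does not expose a standalone moderation endpoint like OpenAI's), or run custom keyword filters before calling the API. The example below shows the asynchronous version.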

python

import asyncio
import os
from openai import AsyncOpenAI

# The async client exposes the same methods as OpenAI, but each call must be awaited
client = AsyncOpenAI(api_key=os.environ["OPENAI_API_KEY"])

async def moderate_and_chat(user_input: str) -> str:
    # Check the input with the moderation endpoint first
    moderation_response = await client.moderations.create(
        model="omni-moderation-latest",
        input=user_input
    )
    if moderation_response.results[0].flagged:
        return "Content flagged by moderation."
    # Only unflagged input reaches the chat model
    chat_response = await client.chat.completions.create(
        model="gpt-4o",
        messages=[{"role": "user", "content": user_input}]
    )
    return chat_response.choices[0].message.content

async def main():
    result = await moderate_and_chat("Tell me a joke.")
    print(result)

asyncio.run(main())

output

Why did the scarecrow win an award? Because he was outstanding in his field!
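For the custom keyword filter mentioned above, a minimal sketch might normalize the text (Unicode NFKC normalization, lowercasing, collapsing whitespace) and then match whole words or phrases from a blocklist. The blocklist entries here are placeholders. Plain keyword matching is easy to evade and prone to false positives, so it works best as a cheap first pass in front of the moderation endpoint rather than a replacement for it.

```python
import re
import unicodedata
from typing import Iterable, List

def normalize(text: str) -> str:
    """Apply NFKC normalization, lowercase, and collapse runs of whitespace."""
    text = unicodedata.normalize("NFKC", text).lower()
    return re.sub(r"\s+", " ", text).strip()

def build_matcher(terms: Iterable[str]) -> re.Pattern:
    """Compile one regex that matches any term as a whole word or phrase."""
    # Longest terms first so longer phrases win over their prefixes
    escaped = sorted((re.escape(normalize(t)) for t in terms), key=len, reverse=True)
    return re.compile(r"\b(?:" + "|".join(escaped) + r")\b")

def keyword_hits(text: str, matcher: re.Pattern) -> List[str]:
    """Return the blocklisted terms found in the normalized text."""
    return matcher.findall(normalize(text))

# Placeholder blocklist; replace with terms from your own policy
matcher = build_matcher(["forbidden phrase", "badword"])
print(keyword_hits("This has a  FORBIDDEN   phrase and a badword.", matcher))
# prints: ['forbidden phrase', 'badword']
print(keyword_hits("Nothing to see here (badwords is a different word).", matcher))
# prints: []
```

NFKC normalization also folds look-alike forms such as fullwidth letters into plain ASCII, which closes one common evasion trick.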

Troubleshooting

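If safe content is flagged (false positives), inspect the per-category category_scores on the moderation result and apply your own thresholds instead of relying only on flagged, or route borderline cases to a human review step. If harmful content is not flagged, add custom keyword or rule-based filters, or escalate to human review. If the script fails with a KeyError or an authentication error, check that OPENAI_API_KEY is exported in the shell that runs the script and that the key is valid.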
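One way to add the review step described above is a three-way triage: allow clearly safe content, block clearly harmful content, and queue the uncertain middle band for a human. The sketch below assumes you already have the highest category score for a piece of content; the two cut-off values and the in-memory review queue are illustrative assumptions, not OpenAI recommendations.

```python
from collections import deque
from dataclasses import dataclass
from typing import Deque

# Illustrative cut-offs (assumptions); tune them against labeled examples.
REVIEW_AT = 0.4
BLOCK_AT = 0.8

@dataclass
class ReviewItem:
    text: str
    score: float

# Stand-in for a real ticketing or message-queue system
review_queue: Deque[ReviewItem] = deque()

def triage(text: str, max_score: float) -> str:
    """Return 'allow', 'review', or 'block', enqueuing uncertain items for humans."""
    if max_score >= BLOCK_AT:
        return "block"
    if max_score >= REVIEW_AT:
        review_queue.append(ReviewItem(text, max_score))
        return "review"
    return "allow"

print(triage("What's the weather?", 0.02))      # allow
print(triage("borderline message", 0.55))       # review
print(triage("clearly harmful message", 0.93))  # block
print(len(review_queue))                        # 1
```

Widening the review band catches more mistakes at the cost of more human workload, so the cut-offs are a trade-off to set per application.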

Key Takeaways

Use dedicated moderation endpoints like OpenAI's omni-moderation-latest to detect harmful content before generating AI responses.
Combine automated filtering with prompt engineering and custom keyword checks for robust content safety.
Implement async calls for scalable moderation and generation workflows.
Always handle flagged content gracefully by blocking or escalating to human review.
Validate environment setup and API keys to avoid runtime errors in filtering pipelines.

Verified 2026-04 · gpt-4o, omni-moderation-latest, claude-3-5-sonnet-20241022
