How to implement content filtering for AI apps
PREREQUISITES
- Python 3.8+
- OpenAI API key (free tier works)
- pip install "openai>=1.0"
Setup
Install the OpenAI Python SDK and set your API key as an environment variable to access moderation endpoints.
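# Quote the requirement so the shell does not treat ">" as a redirect
pip install "openai>=1.0"
# macOS/Linux; replace the placeholder with your own key
export OPENAI_API_KEY="your-api-key"
Step by step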
This example shows how to call OpenAI's moderation endpoint to filter user input before sending it to a chat model.
import os
from openai import OpenAI

client = OpenAI(api_key=os.environ["OPENAI_API_KEY"])

# User input to check
user_input = "I want to do something illegal."

# Call moderation endpoint
response = client.moderations.create(
    model="omni-moderation-latest",
    input=user_input,
)

# Check if flagged
moderation_result = response.results[0]
if moderation_result.flagged:
    print("Content flagged by moderation. Blocking request.")
else:
    # Proceed with chat completion
    chat_response = client.chat.completions.create(
        model="gpt-4o",
        messages=[{"role": "user", "content": user_input}],
    )
    print("AI response:", chat_response.choices[0].message.content)

Example output when the input is flagged:
Content flagged by moderation. Blocking request.
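The moderate-then-generate flow above can be wrapped in a small helper so the blocking decision lives in one place. This is a minimal sketch, not part of the OpenAI SDK: `guarded_reply`, `FilterOutcome`, and the `moderate`/`generate` callables are hypothetical names introduced here. Passing the two steps in as callables keeps the helper testable without network access. It also assumes a fail-closed policy (block when the moderation call itself raises), which may or may not suit your application.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class FilterOutcome:
    """Result of one pass through the filter (hypothetical helper type)."""
    allowed: bool
    reply: Optional[str] = None
    reason: Optional[str] = None


def guarded_reply(
    user_input: str,
    moderate: Callable[[str], bool],  # returns True when the input is flagged
    generate: Callable[[str], str],   # returns the model's reply
) -> FilterOutcome:
    """Run moderation first and only call the generator for unflagged input."""
    try:
        flagged = moderate(user_input)
    except Exception as exc:
        # Assumed policy: fail closed when moderation is unavailable.
        return FilterOutcome(allowed=False, reason=f"moderation error: {exc}")
    if flagged:
        return FilterOutcome(allowed=False, reason="flagged by moderation")
    return FilterOutcome(allowed=True, reply=generate(user_input))


# Stand-in callables for illustration; in the example above they would wrap
# client.moderations.create(...) and client.chat.completions.create(...).
def fake_moderate(text: str) -> bool:
    return "illegal" in text.lower()


def fake_generate(text: str) -> str:
    return f"echo: {text}"


print(guarded_reply("Tell me a joke.", fake_moderate, fake_generate))
print(guarded_reply("I want to do something illegal.", fake_moderate, fake_generate))
```

Returning a structured outcome instead of printing lets the caller decide whether to show a refusal message, log the reason, or escalate to human review.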
Common variations
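Common variations include making the calls asynchronously, using another provider's model for generation (for example Anthropic's claude-3-5-sonnet-20241022, together with that provider's own safety tooling), or running custom keyword filters before calling the API. The asynchronous version looks like this: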
import asyncio
import os
from openai import AsyncOpenAI

# The v1 SDK has no acreate(); use AsyncOpenAI and await the regular create() methods
client = AsyncOpenAI(api_key=os.environ["OPENAI_API_KEY"])

async def moderate_and_chat(user_input: str) -> str:
    moderation_response = await client.moderations.create(
        model="omni-moderation-latest",
        input=user_input,
    )
    if moderation_response.results[0].flagged:
        return "Content flagged by moderation."
    chat_response = await client.chat.completions.create(
        model="gpt-4o",
        messages=[{"role": "user", "content": user_input}],
    )
    return chat_response.choices[0].message.content

async def main():
    result = await moderate_and_chat("Tell me a joke.")
    print(result)

asyncio.run(main())

Example output (model responses vary):
Why did the scarecrow win an award? Because he was outstanding in his field!
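The custom keyword filter mentioned among the variations could run before any API call. The sketch below is one possible approach using only the standard library. `compile_blocklist`, `find_blocked_terms`, and the placeholder terms are hypothetical and would need to be replaced with an application-specific list. It matches whole words case-insensitively, so it will not catch misspellings or obfuscated text.

```python
import re
from typing import Iterable, List, Pattern


def compile_blocklist(terms: Iterable[str]) -> Pattern[str]:
    """Build one case-insensitive regex that matches any term as a whole word."""
    # Longest terms first so multi-word phrases win over shorter overlaps.
    escaped = sorted((re.escape(t) for t in terms), key=len, reverse=True)
    return re.compile(r"\b(?:" + "|".join(escaped) + r")\b", re.IGNORECASE)


def find_blocked_terms(text: str, pattern: Pattern[str]) -> List[str]:
    """Return the blocked terms found in text, lowercased, in order of appearance."""
    return [m.group(0).lower() for m in pattern.finditer(text)]


# Placeholder terms for illustration only; a real list is application-specific.
BLOCKLIST = compile_blocklist(["badword", "forbidden phrase"])

print(find_blocked_terms("This has a BadWord in it.", BLOCKLIST))  # ['badword']
print(find_blocked_terms("Nothing to see here.", BLOCKLIST))       # []
```

A non-empty result can short-circuit the request before the moderation endpoint is called, which saves a network round trip for obvious cases.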
Troubleshooting
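If safe content is flagged (false positives), add a review step or compare the per-category scores in the moderation result against your own thresholds instead of relying only on the flagged field. If harmful content still passes, add custom keyword filters or escalate to human review. If requests fail with authentication errors, confirm that OPENAI_API_KEY is set in the environment and that the key is valid.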
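One way to add a review step is to triage on per-category scores rather than a single yes/no flag. The sketch below is an assumption-laden illustration: `triage` and both threshold values are hypothetical and would need tuning on your own traffic. It expects a plain mapping of category name to score, which you would build from `response.results[0].category_scores` in the moderation response.

```python
from typing import Mapping


def triage(
    scores: Mapping[str, float],
    block_at: float = 0.8,   # assumed threshold, not an official value
    review_at: float = 0.4,  # assumed threshold, not an official value
) -> str:
    """Map per-category scores to 'block', 'review', or 'allow'."""
    top = max(scores.values(), default=0.0)
    if top >= block_at:
        return "block"
    if top >= review_at:
        return "review"
    return "allow"


# Made-up scores for illustration; real values come from the moderation result.
print(triage({"violence": 0.05, "harassment": 0.02}))  # allow
print(triage({"violence": 0.55, "harassment": 0.10}))  # review
print(triage({"violence": 0.93, "harassment": 0.10}))  # block
```

Sending the middle band to a human reviewer trades some latency for fewer false positives and fewer misses at the edges.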
Key Takeaways
- Use dedicated moderation endpoints like OpenAI's omni-moderation-latest to detect harmful content before generating AI responses.
- Combine automated filtering with prompt engineering and custom keyword checks for robust content safety.
- Implement async calls for scalable moderation and generation workflows.
- Always handle flagged content gracefully by blocking or escalating to human review.
- Validate environment setup and API keys to avoid runtime errors in filtering pipelines.