How to use AI for content moderation
Quick answer
Use AI models such as gpt-4o or claude-3-5-sonnet-20241022 to analyze user-generated content, either through dedicated moderation endpoints or prompt-based classification. These models can detect hate speech, spam, adult content, and other policy violations automatically.
Prerequisites
- Python 3.8+
- OpenAI API key (free tier works)
- pip install openai>=1.0
Setup
Install the OpenAI Python SDK and set your API key as an environment variable to authenticate requests.
pip install openai>=1.0
Step by step
This example uses gpt-4o to classify text for moderation by prompting the model to identify if content violates policies such as hate speech or adult content.
import os
from openai import OpenAI

client = OpenAI(api_key=os.environ["OPENAI_API_KEY"])

content_to_moderate = "I hate all people from group X!"

messages = [
    {"role": "system", "content": "You are a content moderation assistant. Classify if the following text contains hate speech, adult content, spam, or is safe."},
    {"role": "user", "content": content_to_moderate},
]

response = client.chat.completions.create(
    model="gpt-4o",
    messages=messages,
)

print("Moderation result:", response.choices[0].message.content)
Output
Moderation result: The text contains hate speech and violates content policies.
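In production you usually need a structured decision, not free text. The helper below is a minimal sketch of one way to post-process the model's reply; the category keywords and action names are illustrative assumptions, not part of any API:

```python
# Minimal sketch: map a free-text moderation verdict to an action.
# The keyword-to-action mapping here is an illustrative assumption.
VIOLATION_ACTIONS = {
    "hate speech": "block",
    "adult content": "block",
    "spam": "flag",
}

def verdict_to_action(verdict: str) -> str:
    """Return 'block', 'flag', or 'allow' based on the model's reply."""
    lowered = verdict.lower()
    for category, action in VIOLATION_ACTIONS.items():
        if category in lowered:
            return action
    return "allow"

print(verdict_to_action("The text contains hate speech and violates content policies."))
# prints "block" for the sample verdict above
```

Keyword matching like this is brittle; asking the model for structured output (see Troubleshooting) is more robust for real pipelines.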
Common variations
You can use asynchronous calls for higher throughput, or switch to claude-3-5-sonnet-20241022 (via Anthropic's SDK) if it classifies your content more accurately. Streaming responses help with real-time moderation dashboards.
import asyncio
import os
from openai import AsyncOpenAI

# Asynchronous requests require the AsyncOpenAI client; the synchronous
# OpenAI client has no acreate method in openai>=1.0.
client = AsyncOpenAI(api_key=os.environ["OPENAI_API_KEY"])

async def moderate_async(text):
    messages = [
        {"role": "system", "content": "You are a content moderation assistant."},
        {"role": "user", "content": text},
    ]
    response = await client.chat.completions.create(
        model="gpt-4o",
        messages=messages,
    )
    return response.choices[0].message.content

async def main():
    result = await moderate_async("This is spam content!")
    print("Async moderation result:", result)

asyncio.run(main())
Output
Async moderation result: The text contains spam and should be flagged.
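To actually gain throughput, run many classifications concurrently with asyncio.gather. The sketch below uses a stub classifier so it runs offline; in practice you would substitute your real async API call (such as a moderate_async function like the one above):

```python
import asyncio

# Sketch of a batched moderation pipeline. classify_stub stands in for a
# real async API call; swap in your actual moderation function in practice.
async def classify_stub(text: str) -> str:
    await asyncio.sleep(0)  # placeholder for network latency
    return "spam" if "spam" in text.lower() else "safe"

async def moderate_batch(texts):
    # gather runs all coroutines concurrently and preserves input order.
    return await asyncio.gather(*(classify_stub(t) for t in texts))

results = asyncio.run(moderate_batch(["Buy cheap spam now!", "Hello, world"]))
print(results)  # ['spam', 'safe']
```

With a real API behind it, consider wrapping the gather call with a semaphore to cap concurrent requests and stay within rate limits.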
Troubleshooting
If you receive unexpected or vague moderation results, refine your system prompt to be more explicit about categories and actions. Also, ensure your API key is valid and you are using the latest SDK version.
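One way to make results less ambiguous is to instruct the model to reply only with JSON and parse the reply defensively. The prompt wording and category list below are assumptions for illustration; the parser handles the common case where a model wraps its JSON in a markdown code fence:

```python
import json

# Example of an explicit system prompt; the category list is an assumption.
SYSTEM_PROMPT = (
    "You are a content moderation assistant. Respond ONLY with JSON of the form "
    '{"category": "hate_speech" | "adult" | "spam" | "safe", "reason": "<short explanation>"}.'
)

def parse_verdict(reply: str) -> dict:
    """Parse the model's reply, tolerating markdown code fences around the JSON."""
    cleaned = reply.strip()
    if cleaned.startswith("```"):
        cleaned = cleaned.strip("`")
        # drop an optional language tag such as "json"
        if cleaned.startswith("json"):
            cleaned = cleaned[len("json"):]
    try:
        return json.loads(cleaned)
    except json.JSONDecodeError:
        # Fall back to an explicit "unknown" verdict rather than crashing.
        return {"category": "unknown", "reason": reply}

print(parse_verdict('{"category": "spam", "reason": "promotional link"}'))
```

Pair this with a low temperature and an explicit "if unsure, choose safe and explain why" instruction to further reduce vague outputs.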
Key Takeaways
- Use prompt engineering to tailor AI models for precise content moderation tasks.
- Leverage asynchronous API calls for scalable moderation pipelines.
- Test with diverse content types to improve detection accuracy.
- Keep system prompts explicit to reduce ambiguous moderation outputs.