How to use AI for content moderation
Quick answer
Use AI models such as gpt-4o or claude-3-5-sonnet-20241022 to analyze user-generated content, either through dedicated moderation endpoints or prompt-based classification. These models can detect hate speech, spam, adult content, and other policy violations automatically.
Prerequisites
- Python 3.8+
- OpenAI API key (free tier works)
- pip install openai>=1.0
Setup
Install the OpenAI Python SDK and set your API key as an environment variable to authenticate requests.
pip install openai>=1.0
Step by step
This example uses gpt-4o to classify text for moderation by prompting the model to identify if content violates policies such as hate speech or adult content.
import os
from openai import OpenAI

client = OpenAI(api_key=os.environ["OPENAI_API_KEY"])

content_to_moderate = "I hate all people from group X!"

messages = [
    {"role": "system", "content": "You are a content moderation assistant. Classify if the following text contains hate speech, adult content, spam, or is safe."},
    {"role": "user", "content": content_to_moderate},
]

response = client.chat.completions.create(
    model="gpt-4o",
    messages=messages,
)

print("Moderation result:", response.choices[0].message.content)
Output
Moderation result: The text contains hate speech and violates content policies.
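In production you usually need a structured decision, not free text. The helper below is a minimal sketch of one way to post-process the model's reply; the category keywords and action names are illustrative assumptions, not part of any API:

```python
# Minimal sketch: map a free-text moderation verdict to an action.
# The keyword-to-action mapping here is an illustrative assumption.
VIOLATION_ACTIONS = {
    "hate speech": "block",
    "adult content": "block",
    "spam": "flag",
}

def verdict_to_action(verdict: str) -> str:
    """Return 'block', 'flag', or 'allow' based on the model's reply."""
    lowered = verdict.lower()
    for category, action in VIOLATION_ACTIONS.items():
        if category in lowered:
            return action
    return "allow"

print(verdict_to_action("The text contains hate speech and violates content policies."))
# prints "block" for the sample verdict above
```

Keyword matching like this is brittle; asking the model for structured output (see Troubleshooting) is more robust for real pipelines.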
Common variations
You can use asynchronous calls for higher throughput, or switch to claude-3-5-sonnet-20241022 (via Anthropic's SDK) if it classifies your content more accurately. Streaming responses help with real-time moderation dashboards.
import asyncio
import os
from openai import AsyncOpenAI

# Asynchronous requests require the AsyncOpenAI client; the synchronous
# OpenAI client has no acreate method in openai>=1.0.
client = AsyncOpenAI(api_key=os.environ["OPENAI_API_KEY"])

async def moderate_async(text):
    messages = [
        {"role": "system", "content": "You are a content moderation assistant."},
        {"role": "user", "content": text},
    ]
    response = await client.chat.completions.create(
        model="gpt-4o",
        messages=messages,
    )
    return response.choices[0].message.content

async def main():
    result = await moderate_async("This is spam content!")
    print("Async moderation result:", result)

asyncio.run(main())
Output
Async moderation result: The text contains spam and should be flagged.
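To actually gain throughput, run many classifications concurrently with asyncio.gather. The sketch below uses a stub classifier so it runs offline; in practice you would substitute your real async API call (such as a moderate_async function like the one above):

```python
import asyncio

# Sketch of a batched moderation pipeline. classify_stub stands in for a
# real async API call; swap in your actual moderation function in practice.
async def classify_stub(text: str) -> str:
    await asyncio.sleep(0)  # placeholder for network latency
    return "spam" if "spam" in text.lower() else "safe"

async def moderate_batch(texts):
    # gather runs all coroutines concurrently and preserves input order.
    return await asyncio.gather(*(classify_stub(t) for t in texts))

results = asyncio.run(moderate_batch(["Buy cheap spam now!", "Hello, world"]))
print(results)  # ['spam', 'safe']
```

With a real API behind it, consider wrapping the gather call with a semaphore to cap concurrent requests and stay within rate limits.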
Troubleshooting
If you receive unexpected or vague moderation results, refine your system prompt to be more explicit about categories and actions. Also, ensure your API key is valid and you are using the latest SDK version.
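One way to make results less ambiguous is to instruct the model to reply only with JSON and parse the reply defensively. The prompt wording and category list below are assumptions for illustration; the parser handles the common case where a model wraps its JSON in a markdown code fence:

```python
import json

# Example of an explicit system prompt; the category list is an assumption.
SYSTEM_PROMPT = (
    "You are a content moderation assistant. Respond ONLY with JSON of the form "
    '{"category": "hate_speech" | "adult" | "spam" | "safe", "reason": "<short explanation>"}.'
)

def parse_verdict(reply: str) -> dict:
    """Parse the model's reply, tolerating markdown code fences around the JSON."""
    cleaned = reply.strip()
    if cleaned.startswith("```"):
        cleaned = cleaned.strip("`")
        # drop an optional language tag such as "json"
        if cleaned.startswith("json"):
            cleaned = cleaned[len("json"):]
    try:
        return json.loads(cleaned)
    except json.JSONDecodeError:
        # Fall back to an explicit "unknown" verdict rather than crashing.
        return {"category": "unknown", "reason": reply}

print(parse_verdict('{"category": "spam", "reason": "promotional link"}'))
```

Pair this with a low temperature and an explicit "if unsure, choose safe and explain why" instruction to further reduce vague outputs.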
Key Takeaways
- Use prompt engineering to tailor AI models for precise content moderation tasks.
- Leverage asynchronous API calls for scalable moderation pipelines.
- Test with diverse content types to improve detection accuracy.
- Keep system prompts explicit to reduce ambiguous moderation outputs.