How-to · Beginner · 3 min read

How to add content moderation to a chatbot

Quick answer
Use the OpenAI Moderations endpoint (client.moderations.create in the Python SDK) to check user input for harmful or disallowed content before sending it to the chat model. Reject or filter flagged input to enforce content moderation in your chatbot.

Prerequisites

  • Python 3.8+
  • OpenAI API key (the Moderations endpoint is free to use; chat completions are billed)
  • pip install "openai>=1.0" (quoted so the shell does not treat >= as a redirect)

Setup

Install the official openai Python package and set your API key as an environment variable.

  • Run pip install openai to install the SDK.
  • Set your API key in your shell: export OPENAI_API_KEY='your_api_key' (Linux/macOS) or setx OPENAI_API_KEY "your_api_key" (Windows; setx takes effect in newly opened terminals).
```bash
pip install openai
```

Step by step

This example shows how to moderate user input before passing it to a chatbot using gpt-4o. If the input is flagged, the chatbot rejects it.

```python
import os
from openai import OpenAI

client = OpenAI(api_key=os.environ["OPENAI_API_KEY"])

def moderate_content(text: str) -> bool:
    """Return True if the Moderations API flags the text."""
    response = client.moderations.create(
        model="omni-moderation-latest",
        input=text,
    )
    # Each result carries a top-level `flagged` boolean plus per-category details
    return response.results[0].flagged

def chat_with_moderation(user_input: str) -> str:
    if moderate_content(user_input):
        return "Your message was flagged by content moderation and cannot be processed."

    response = client.chat.completions.create(
        model="gpt-4o",
        messages=[{"role": "user", "content": user_input}],
    )
    return response.choices[0].message.content

if __name__ == "__main__":
    user_text = input("Enter your message: ")
    reply = chat_with_moderation(user_text)
    print("Chatbot reply:", reply)
```

Example output:

```text
Enter your message: Hello, how are you?
Chatbot reply: I'm doing great, thank you! How can I assist you today?
```

Common variations

For web apps, use asyncio with the SDK's AsyncOpenAI client so moderation and chat calls don't block the event loop. You can also moderate chatbot outputs the same way to keep responses safe, and you can swap models to fit your needs, for example gpt-4o-mini for cheaper chat completions while omni-moderation-latest handles moderation.

```python
import os
import asyncio
from openai import AsyncOpenAI

# Async calls require the AsyncOpenAI client; its methods are awaited
# (the synchronous OpenAI client has no acreate-style methods).
client = AsyncOpenAI(api_key=os.environ["OPENAI_API_KEY"])

async def moderate_content_async(text: str) -> bool:
    response = await client.moderations.create(
        model="omni-moderation-latest",
        input=text,
    )
    return response.results[0].flagged

async def chat_with_moderation_async(user_input: str) -> str:
    if await moderate_content_async(user_input):
        return "Your message was flagged by content moderation and cannot be processed."

    response = await client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[{"role": "user", "content": user_input}],
    )
    return response.choices[0].message.content

async def main():
    user_text = input("Enter your message: ")
    reply = await chat_with_moderation_async(user_text)
    print("Chatbot reply:", reply)

if __name__ == "__main__":
    asyncio.run(main())
```

Example output:

```text
Enter your message: This is a test.
Chatbot reply: This is a test.
```
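Output moderation follows the same guardrail pattern as input moderation. Here is a minimal sketch of that logic with the moderation check and the generation call passed in as plain functions, so it can be exercised without network calls; safe_chat, REFUSAL, and WITHHELD are illustrative names, not part of the SDK:

```python
from typing import Callable

REFUSAL = "Your message was flagged by content moderation and cannot be processed."
WITHHELD = "The generated reply was withheld by content moderation."

def safe_chat(user_input: str,
              is_flagged: Callable[[str], bool],
              generate: Callable[[str], str]) -> str:
    # Guardrail 1: check the user's message before calling the model
    if is_flagged(user_input):
        return REFUSAL
    reply = generate(user_input)
    # Guardrail 2: check the model's reply before showing it to the user
    if is_flagged(reply):
        return WITHHELD
    return reply
```

In the scripts above you would pass moderate_content (or its async counterpart) as is_flagged and wrap the chat.completions.create call as generate.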

Troubleshooting

  • If you get an authentication error, verify your OPENAI_API_KEY environment variable is set correctly.
  • If moderation flags too many false positives, review the flagged categories in the response to customize your filtering logic.
  • For rate limits, implement exponential backoff retries or upgrade your API plan.
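To tune false positives, inspect the per-category results instead of relying only on the top-level flagged boolean. A sketch assuming the v1 SDK's pydantic result objects (which expose model_dump()); triggered_categories, should_block, and the allowlist are illustrative helpers, not SDK APIs:

```python
def triggered_categories(result) -> list:
    """Names of the moderation categories that fired for one result.

    `result` is one element of response.results from client.moderations.create();
    its `categories` field is a pydantic model of per-category booleans.
    """
    return [name for name, hit in result.categories.model_dump().items() if hit]

def should_block(result, allowed=frozenset()) -> bool:
    """Block only when a category outside your allowlist fired.

    `allowed` is a set of category names you have chosen to tolerate.
    """
    return any(cat not in allowed for cat in triggered_categories(result))
```

This lets you log which categories drove a rejection and relax individual categories without disabling moderation entirely.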
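An exponential-backoff wrapper can be placed around any SDK call. This is a minimal generic sketch; retry_with_backoff is an illustrative helper, not part of the SDK (in practice you would pass the SDK's openai.RateLimitError, which it raises on 429 responses):

```python
import random
import time

def retry_with_backoff(call, retry_on=Exception, max_retries=5, base_delay=1.0):
    """Invoke call(); on a retry_on exception, sleep base_delay * 2**attempt
    seconds (plus a little jitter) and retry, re-raising after max_retries."""
    for attempt in range(max_retries):
        try:
            return call()
        except retry_on:
            if attempt == max_retries - 1:
                raise
            time.sleep(base_delay * (2 ** attempt) + random.uniform(0, 0.1))

# Hypothetical usage with the chatbot from the main example:
# reply = retry_with_backoff(
#     lambda: chat_with_moderation(user_text),
#     retry_on=openai.RateLimitError,
# )
```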

Key takeaways

  • Use the moderations.create endpoint to pre-check user inputs for harmful content.
  • Reject or sanitize flagged inputs before sending them to the chatbot model to enforce guardrails.
  • You can moderate both user inputs and chatbot outputs for safer conversations.
  • Async API calls improve performance in real-time applications.
  • Always handle API errors and review moderation categories to fine-tune your filters.
Verified 2026-04 · gpt-4o, gpt-4o-mini, omni-moderation-latest