How to block toxic content in a chatbot
Quick answer
Use a content moderation API such as OpenAI's moderation endpoint, or implement prompt-based guardrails, to detect and block toxic content in chatbots. Combine automated filtering with system prompts that instruct the model to refuse harmful or toxic requests.

Prerequisites
- Python 3.8+
- OpenAI API key (free tier works)
- pip install "openai>=1.0"
Setup
Install the openai Python package and set your API key as an environment variable:

```shell
pip install "openai>=1.0"
```

Step by step
This example shows how to use OpenAI's moderation API to detect toxic content before sending user input to the chatbot. If the input is flagged, the bot refuses to respond.
```python
import os
from openai import OpenAI

client = OpenAI(api_key=os.environ["OPENAI_API_KEY"])

# Function to check if content is toxic
def is_toxic(text: str) -> bool:
    response = client.moderations.create(
        model="omni-moderation-latest",
        input=text,
    )
    # The moderation response sets flagged=True if any toxicity category triggers
    results = response.results[0]
    return results.flagged

# Chatbot interaction with toxic content blocking
def chatbot_response(user_input: str) -> str:
    if is_toxic(user_input):
        return "Your message was flagged as inappropriate and cannot be processed."
    # Use a system prompt to reinforce guardrails
    messages = [
        {"role": "system", "content": "You are a helpful assistant that refuses to generate or engage with toxic or harmful content."},
        {"role": "user", "content": user_input},
    ]
    response = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=messages,
    )
    return response.choices[0].message.content

# Example usage
if __name__ == "__main__":
    user_text = input("User: ")
    reply = chatbot_response(user_text)
    print("Bot:", reply)
```

Output:

```
User: You are stupid
Bot: Your message was flagged as inappropriate and cannot be processed.
```
Common variations
- Use async calls with asyncio for non-blocking moderation and chat requests.
- Switch models to match your cost/quality trade-off, e.g. gpt-4o-mini for cost efficiency.
- Add explicit refusal instructions to the system prompt for an extra layer of prompt-based guardrails.

Here is an async version of the same flow:
```python
import os
import asyncio
from openai import AsyncOpenAI

# Async requests use the AsyncOpenAI client; its methods have the same
# names as the sync client's but must be awaited.
client = AsyncOpenAI(api_key=os.environ["OPENAI_API_KEY"])

async def is_toxic_async(text: str) -> bool:
    response = await client.moderations.create(
        model="omni-moderation-latest",
        input=text,
    )
    return response.results[0].flagged

async def chatbot_response_async(user_input: str) -> str:
    if await is_toxic_async(user_input):
        return "Your message was flagged as inappropriate and cannot be processed."
    messages = [
        {"role": "system", "content": "You are a helpful assistant that refuses to generate or engage with toxic or harmful content."},
        {"role": "user", "content": user_input},
    ]
    response = await client.chat.completions.create(
        model="gpt-4o-mini",
        messages=messages,
    )
    return response.choices[0].message.content

async def main():
    user_text = "You are dumb"
    reply = await chatbot_response_async(user_text)
    print("Bot:", reply)

if __name__ == "__main__":
    asyncio.run(main())
```

Output:

```
Bot: Your message was flagged as inappropriate and cannot be processed.
```
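If the blanket flagged boolean is too coarse, another variation is to threshold individual category scores yourself. The moderation result exposes per-category probabilities via results[0].category_scores (calling .model_dump() on it yields a plain dict). The sketch below shows only the decision logic on a hand-made scores dict; the category names follow the moderation API, but the threshold values are illustrative assumptions you would tune for your application:

```python
# Illustrative thresholds -- assumed values, tune per application
THRESHOLDS = {"harassment": 0.5, "hate": 0.4, "violence": 0.6}

def is_toxic_by_scores(category_scores: dict) -> bool:
    """Return True if any monitored category meets or exceeds its threshold."""
    return any(
        category_scores.get(cat, 0.0) >= limit
        for cat, limit in THRESHOLDS.items()
    )

# In production the dict would come from the moderation response, e.g.:
#   scores = response.results[0].category_scores.model_dump()
# Here we pass hand-made scores to demonstrate the decision logic:
print(is_toxic_by_scores({"harassment": 0.72, "hate": 0.05}))  # True
print(is_toxic_by_scores({"harassment": 0.10, "hate": 0.05}))  # False
```

Thresholding per category lets you be strict about, say, harassment while tolerating borderline scores in categories that rarely matter for your use case.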
Troubleshooting
- If the moderation API returns false positives, adjust your prompt to clarify context or maintain an allowlist of known-safe phrases.
- If you hit API rate limit errors, retry with exponential backoff.
- Ensure the OPENAI_API_KEY environment variable is set correctly to avoid authentication errors.
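The exponential-backoff retry mentioned above can be sketched as a small wrapper. This is a generic helper, not part of the OpenAI SDK; in production you would catch openai.RateLimitError specifically rather than the broad Exception used here for illustration:

```python
import random
import time

def with_backoff(fn, max_retries=5, base_delay=1.0):
    """Call fn(), retrying with exponential backoff plus jitter on failure."""
    for attempt in range(max_retries):
        try:
            return fn()
        except Exception:  # in production, catch openai.RateLimitError
            if attempt == max_retries - 1:
                raise  # give up after the final attempt
            # Wait 1s, 2s, 4s, ... plus jitter to avoid synchronized retries
            time.sleep(base_delay * (2 ** attempt) + random.uniform(0, 0.1))

# Usage with the moderation call from the examples above, e.g.:
#   result = with_backoff(lambda: client.moderations.create(
#       model="omni-moderation-latest", input=user_text))
```

The jitter term spreads retries out when many clients back off at once, which matters under shared rate limits.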
Key takeaways
- Use content moderation APIs to pre-filter user inputs for toxic content before chatbot processing.
- Combine automated filtering with system prompt guardrails instructing the model to refuse harmful content.
- Implement async calls and error handling for robust, scalable chatbot deployments.