SAFETY
google.generativeai.types.HarmBlockedError or finish_reason=SAFETY
Stack trace
google.generativeai.types.HarmBlockedError: The response was blocked by the safety filter.
Response finish_reason: SAFETY
Blocked reason categories: [HarmCategory.HARM_CATEGORY_HATE_SPEECH, HarmCategory.HARM_CATEGORY_DANGEROUS_CONTENT]
At generation.py:345 in generate_content
response = self.generate_content(
^
Why it happens
Google's safety classifiers score both user prompts and model-generated text against four harm categories: hate speech, dangerous content, sexually explicit content, and harassment. When the model's response is rated at or above the configured blocking threshold for any category, the safety filter withholds it and returns finish_reason=SAFETY instead of the generated content. This can happen even if your input prompt was benign, because the model's output itself triggered the filter. The threshold depends on the safety settings you configure: stricter settings (BLOCK_LOW_AND_ABOVE) block more content, while permissive settings (BLOCK_NONE) allow more.
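To make the threshold behaviour concrete, here is a minimal sketch of how a threshold maps to the probability levels it blocks. It uses the threshold names defined by the SDK's HarmBlockThreshold enum (BLOCK_LOW_AND_ABOVE is the strictest) and the four probability levels the API reports. The function would_block and both lookup tables are hypothetical names introduced for illustration. This is a simplified model of the documented semantics, not the service's actual filtering logic.

```python
# Probability levels ordered from least to most likely harmful.
PROBABILITY_ORDER = ["NEGLIGIBLE", "LOW", "MEDIUM", "HIGH"]

# Lowest probability level that each threshold blocks (None means never block).
THRESHOLD_FLOOR = {
    "BLOCK_LOW_AND_ABOVE": "LOW",
    "BLOCK_MEDIUM_AND_ABOVE": "MEDIUM",
    "BLOCK_ONLY_HIGH": "HIGH",
    "BLOCK_NONE": None,
}


def would_block(probability: str, threshold: str) -> bool:
    """Return True if a rating at `probability` is blocked under `threshold`."""
    floor = THRESHOLD_FLOOR[threshold]
    if floor is None:
        return False
    return PROBABILITY_ORDER.index(probability) >= PROBABILITY_ORDER.index(floor)


# A MEDIUM rating is blocked at the medium threshold but passes under BLOCK_ONLY_HIGH.
print(would_block("MEDIUM", "BLOCK_MEDIUM_AND_ABOVE"))
print(would_block("MEDIUM", "BLOCK_ONLY_HIGH"))
```

Read this way, moving from BLOCK_LOW_AND_ABOVE toward BLOCK_ONLY_HIGH shrinks the set of ratings that end in finish_reason=SAFETY.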
Detection
Always check the finish_reason field on the response's candidates (for example, response.candidates[0].finish_reason). If it equals SAFETY, the content was blocked. Log the prompt and the flagged harm categories to identify patterns: are your prompts asking for violent content, explicit instructions for harm, or other policy violations? Monitor for repeated blocks, which indicate systematic issues with your prompt design or model selection.
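The check described above might look like the following sketch. It assumes the response exposes candidates, each with a finish_reason and a list of safety_ratings carrying category and blocked fields. The helper names blocked_categories and _name are hypothetical, and the stub objects stand in for a real response so the example runs without an API key.

```python
from types import SimpleNamespace


def _name(value):
    """Return an enum member's name, or the value itself if it is a plain string."""
    return getattr(value, "name", value)


def blocked_categories(response):
    """Collect harm-category names from candidates that finished with SAFETY.

    Returns an empty list when no candidate was blocked.
    """
    flagged = []
    for candidate in getattr(response, "candidates", None) or []:
        if _name(candidate.finish_reason) != "SAFETY":
            continue
        for rating in getattr(candidate, "safety_ratings", None) or []:
            if getattr(rating, "blocked", False):
                flagged.append(_name(rating.category))
    return flagged


# Stand-in for a blocked response (assumption: mirrors the real object's shape).
stub = SimpleNamespace(candidates=[SimpleNamespace(
    finish_reason="SAFETY",
    safety_ratings=[
        SimpleNamespace(category="HARM_CATEGORY_HATE_SPEECH", blocked=True),
        SimpleNamespace(category="HARM_CATEGORY_HARASSMENT", blocked=False),
    ],
)])
print(blocked_categories(stub))  # ['HARM_CATEGORY_HATE_SPEECH']
```

Logging the returned list alongside the prompt gives you the per-category pattern data that the audit step needs.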
Causes & fixes
Prompt explicitly asks for harmful content (instructions for violence, illegal activities, explicit sexual content, or hate speech)
Reframe your prompt to request the information in a safety-compliant way. Instead of 'write instructions for making explosives', ask 'explain the chemistry of combustion reactions in academic terms'. Use neutral, educational framing.
Safety settings are too strict (BLOCK_LOW_AND_ABOVE) for your use case, blocking legitimate content
Relax the blocking threshold by passing safety_settings with HarmBlockThreshold.BLOCK_MEDIUM_AND_ABOVE or BLOCK_ONLY_HIGH. Use this for content moderation, creative writing, or discussing sensitive topics academically.
Model is generating toxic follow-up text on its own due to prompt context or topic sensitivity
Add explicit safety instructions in your system prompt: 'Generate helpful, respectful content. Avoid hate speech, violence, explicit content, and harmful instructions.' Use a model with better safety alignment like gemini-2.0-flash.
Using a model that's overly sensitive or misclassifying benign content due to keyword triggers
Switch to gemini-2.0-flash, which has improved safety accuracy. If the blocked content is truly benign, set the threshold to BLOCK_ONLY_HIGH in safety_settings for the specific harm category causing issues, and leave the other categories unchanged.
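Relaxing a single category, as the last fix suggests, can be sketched with a small helper that rewrites one entry in a list of settings. The function relax_category is a hypothetical name. Categories and thresholds are shown as plain strings so the example runs on its own. In real use, substitute the HarmCategory and HarmBlockThreshold enum members from the SDK.

```python
def relax_category(settings, category, threshold):
    """Return a copy of `settings` with one category's threshold replaced.

    The category is appended if it was not present; the input is not mutated.
    """
    updated = [dict(entry) for entry in settings if entry["category"] != category]
    updated.append({"category": category, "threshold": threshold})
    return updated


base = [
    {"category": "HARM_CATEGORY_HATE_SPEECH", "threshold": "BLOCK_MEDIUM_AND_ABOVE"},
    {"category": "HARM_CATEGORY_DANGEROUS_CONTENT", "threshold": "BLOCK_MEDIUM_AND_ABOVE"},
]

# Only the category that produced false positives is relaxed.
relaxed = relax_category(base, "HARM_CATEGORY_DANGEROUS_CONTENT", "BLOCK_ONLY_HIGH")
for entry in relaxed:
    print(entry["category"], "->", entry["threshold"])
```

Keeping the change scoped to one category avoids loosening filters that were not causing false positives.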
Code: broken vs fixed
import os
import google.generativeai as genai

genai.configure(api_key=os.environ['GOOGLE_API_KEY'])
model = genai.GenerativeModel('gemini-2.0-flash')

# BROKEN: this prompt is too explicit and triggers the safety filter
prompt = "Write detailed instructions on how to make a molotov cocktail"
try:
    response = model.generate_content(prompt)  # ← BLOCKED by safety filter
    print(response.text)
except Exception as e:
    print(f"Error: {e}")
# Error message: 'The response was blocked by the safety filter.'
# finish_reason='SAFETY'

import os
import google.generativeai as genai
from google.generativeai.types import HarmCategory, HarmBlockThreshold

genai.configure(api_key=os.environ['GOOGLE_API_KEY'])
model = genai.GenerativeModel('gemini-2.0-flash')

# FIXED: Reframe prompt educationally and set appropriate safety thresholds
prompt = "Explain the historical context and chemistry of combustion reactions, focusing on their legitimate industrial and scientific applications."

# Set safety settings to allow legitimate educational content
safety_settings = [
    {
        "category": HarmCategory.HARM_CATEGORY_HATE_SPEECH,
        "threshold": HarmBlockThreshold.BLOCK_ONLY_HIGH
    },
    {
        "category": HarmCategory.HARM_CATEGORY_DANGEROUS_CONTENT,
        "threshold": HarmBlockThreshold.BLOCK_MEDIUM_AND_ABOVE
    },
    {
        # The enum member is SEXUALLY_EXPLICIT; there is no SEXUAL_CONTENT member
        "category": HarmCategory.HARM_CATEGORY_SEXUALLY_EXPLICIT,
        "threshold": HarmBlockThreshold.BLOCK_ONLY_HIGH
    },
    {
        "category": HarmCategory.HARM_CATEGORY_HARASSMENT,
        "threshold": HarmBlockThreshold.BLOCK_ONLY_HIGH
    }
]

response = None  # stays None if generate_content() itself raises
try:
    response = model.generate_content(
        prompt,
        safety_settings=safety_settings  # ← ADDED: Configured thresholds
    )
    print(f"Response: {response.text}")
    print(f"Finish reason: {response.candidates[0].finish_reason}")
except Exception as e:
    print(f"Error: {e}")
    # Guard against a missing response or an empty candidate list
    candidates = response.candidates if response is not None else []
    print(f"Finish reason: {candidates[0].finish_reason if candidates else 'UNKNOWN'}")

Workaround
If you cannot modify your prompt or safety settings, wrap the generate_content() call in a try/except block that catches the blocking error (the HarmBlockedError shown in the stack trace; in the google-generativeai SDK, reading response.text on a response with no returned text raises ValueError), then retry with explicit instructions to avoid the flagged categories. Log the finish_reason and flagged categories, and use human review as a fallback before returning an error to the user. If the retry is also blocked, return a user-friendly message such as 'Your request involves sensitive content. Please rephrase and try again.'
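A minimal sketch of this workaround follows. It takes the generation call as an injected callable so the retry logic can be exercised without network access. The names generate_with_fallback, SAFE_PREFIX, and FALLBACK_MESSAGE are hypothetical, and the broad except clause is a placeholder that should be narrowed to the blocking exception your SDK version actually raises.

```python
import logging

logger = logging.getLogger("safety_fallback")

FALLBACK_MESSAGE = "Your request involves sensitive content. Please rephrase and try again."
SAFE_PREFIX = (
    "Respond helpfully and respectfully. Avoid hate speech, violence, "
    "explicit content, and harmful instructions.\n\n"
)


def generate_with_fallback(generate, prompt):
    """Try the prompt, retry once with safety instructions, then fall back.

    `generate` is any callable that takes a prompt string and returns text,
    raising an exception when the response is blocked.
    """
    for attempt, text in enumerate((prompt, SAFE_PREFIX + prompt), start=1):
        try:
            return generate(text)
        except Exception as exc:  # assumption: narrow to the SDK's blocking error
            logger.warning("attempt %d blocked: %s", attempt, exc)
    return FALLBACK_MESSAGE


def always_blocked(text):
    """Stand-in generator that simulates a blocked response."""
    raise ValueError("blocked by safety filter")


print(generate_with_fallback(always_blocked, "example prompt"))
```

With a configured model, the callable could be something like lambda p: model.generate_content(p).text. The warning log is the natural hook for routing blocked requests to human review.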
Prevention
Design prompts with safety in mind from the start: use educational framing, avoid explicit requests for harmful content, and test with gemini-2.0-flash, which has improved safety accuracy. Set safety_settings based on your use case: use BLOCK_ONLY_HIGH for content moderation and for sensitive but legitimate topics, and reserve the strictest threshold, BLOCK_LOW_AND_ABOVE, for applications that need conservative output. Implement monitoring to track finish_reason values; if SAFETY blocks exceed 5% of requests, audit your prompt design. Consider using Anthropic's Claude for sensitive applications where you need finer-grained safety control.
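The 5% monitoring rule can be sketched as a small in-memory counter. FinishReasonMonitor and its methods are hypothetical names. A production setup would more likely emit these counts to an existing metrics system than hold them in process memory.

```python
from collections import Counter


class FinishReasonMonitor:
    """Track finish_reason values and flag when SAFETY blocks exceed a ratio."""

    def __init__(self, alert_ratio=0.05):
        self.alert_ratio = alert_ratio
        self.counts = Counter()

    def record(self, finish_reason):
        # Accept either an enum member (use its name) or a plain string.
        self.counts[getattr(finish_reason, "name", finish_reason)] += 1

    def safety_ratio(self):
        total = sum(self.counts.values())
        return self.counts["SAFETY"] / total if total else 0.0

    def needs_audit(self):
        return self.safety_ratio() > self.alert_ratio


monitor = FinishReasonMonitor()
for reason in ["STOP"] * 94 + ["SAFETY"] * 6:
    monitor.record(reason)

# 6 of 100 requests were blocked, which is above the 5% line.
print(monitor.safety_ratio(), monitor.needs_audit())
```

Calling record() with each response's finish_reason and checking needs_audit() on a schedule is enough to trigger the prompt-design audit described above.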