ModelOutputPIILeakError
ai_security.errors.ModelOutputPIILeakError
Stack trace
Traceback (most recent call last):
  File "app.py", line 42, in generate_response
    response = client.chat.completions.create(...)
  File "ai_security/monitor.py", line 88, in check_pii
    raise ModelOutputPIILeakError("Detected PII in model output")
ai_security.errors.ModelOutputPIILeakError: Detected PII in model output: 'SSN: 123-45-6789'
Why it happens
AI models sometimes generate outputs containing sensitive PII. This happens either because the prompt or supplied context contains such data, or because the model memorized it during training and reproduces it. Without explicit filtering or redaction, that PII reaches users or downstream systems as an accidental data leak.
Detection
Scan every model output for PII before returning it to users. Use pattern matching (for example, regexes for SSNs, email addresses, and phone numbers) or a dedicated PII detection library.
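A minimal sketch of regex-based detection, assuming US-formatted SSNs and phone numbers. `find_pii` and `PII_PATTERNS` are hypothetical names, and the patterns are illustrative rather than exhaustive: they miss names, street addresses, and many international formats, which is where a dedicated PII detection library does better.

```python
import re

# Illustrative patterns only (assumption: US-style SSNs and phone numbers).
# Regexes miss many PII forms; a dedicated detection library is more thorough.
PII_PATTERNS = {
    "SSN": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "EMAIL": re.compile(r"\b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}\b"),
    "PHONE": re.compile(r"(?<!\d)(?:\(\d{3}\)\s?|\d{3}[-.\s])\d{3}[-.\s]\d{4}(?!\d)"),
}


def find_pii(text: str) -> list[tuple[str, str]]:
    """Return (pii_type, matched_text) pairs for every pattern hit in text."""
    findings = []
    for pii_type, pattern in PII_PATTERNS.items():
        for match in pattern.finditer(text):
            findings.append((pii_type, match.group()))
    return findings


if __name__ == "__main__":
    sample = "Contact jane@example.com or (555) 123-4567. SSN: 123-45-6789"
    for pii_type, value in find_pii(sample):
        print(pii_type, value)
```

An empty result means no pattern matched, not that the text is guaranteed PII-free.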
Causes & fixes
The model memorized sensitive PII from its training data and reproduces it verbatim.
Remove or mask PII in training datasets before training or fine-tuning, and use privacy-preserving training techniques such as differentially private fine-tuning.
The prompt includes user data or context that contains unredacted PII.
Sanitize and redact all user inputs and context data before passing them to the model.
No output filtering or PII detection is applied to model responses.
Integrate an automated PII detection and redaction step on every model output before it is used or displayed.
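The input-side fixes above amount to redacting text before the model ever sees it. A minimal sketch, assuming chat-format message dicts like those passed to `client.chat.completions.create`; `redact_text`, `redact_messages`, and the `REDACTIONS` rules are hypothetical and cover only SSNs and email addresses. The same function could plausibly be run over chat-format fine-tuning records, which also store each example as a list of messages.

```python
import copy
import re

# Hypothetical redaction rules for illustration; extend them (or swap in a
# PII detection library) to cover the data your application actually handles.
REDACTIONS = [
    (re.compile(r"\b\d{3}-\d{2}-\d{4}\b"), "[REDACTED_SSN]"),
    (re.compile(r"\b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}\b"), "[REDACTED_EMAIL]"),
]


def redact_text(text: str) -> str:
    """Apply every redaction rule to a single string."""
    for pattern, placeholder in REDACTIONS:
        text = pattern.sub(placeholder, text)
    return text


def redact_messages(messages: list[dict]) -> list[dict]:
    """Return a copy of chat-format messages with string content redacted.

    The input list is not mutated, so the caller still holds the raw data
    (for example, for an access-controlled audit log).
    """
    cleaned = copy.deepcopy(messages)
    for message in cleaned:
        if isinstance(message.get("content"), str):
            message["content"] = redact_text(message["content"])
    return cleaned


if __name__ == "__main__":
    context = [
        {"role": "system", "content": "Customer record: SSN 123-45-6789, email jane@example.com"},
        {"role": "user", "content": "Summarize this customer's account."},
    ]
    print(redact_messages(context))
```

Copying rather than mutating in place keeps the redaction step side-effect free, so it can be dropped into an existing request path without changing what other code sees.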
Code: broken vs fixed
# Broken: the model's reply is printed without any PII check
import os
from openai import OpenAI
client = OpenAI(api_key=os.environ["OPENAI_API_KEY"])
response = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "Provide user info including SSN."}],
)
print(response.choices[0].message.content)  # This may leak PII

# Fixed: SSNs are redacted before the output is used or displayed
import os
import re
from openai import OpenAI
client = OpenAI(api_key=os.environ["OPENAI_API_KEY"])


def redact_pii(text):
    # Simple regex that redacts US SSNs (NNN-NN-NNNN) as an example; other PII types pass through
    return re.sub(r"\b\d{3}-\d{2}-\d{4}\b", "[REDACTED_SSN]", text)


response = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "Provide user info including SSN."}],
)
# message.content may be None (for example, when the model returns a tool call)
output = response.choices[0].message.content or ""
safe_output = redact_pii(output)  # Redact PII before use
print(safe_output)  # Output with SSNs redacted
Workaround
Wrap model output handling in a try/except. Run a PII scan (regex or a PII detection library) that raises an error when it finds a match, catch that error, and redact or block the output instead of returning it unchanged.
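A minimal sketch of that workaround. Because the `ai_security` package's API is not documented here, `ModelOutputPIILeakError` is redefined locally as a stand-in, and `check_pii` and `guard_output` are hypothetical names that only mirror the stack trace; detection is limited to SSN-shaped strings.

```python
import re

SSN_PATTERN = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")
BLOCKED_MESSAGE = "The response was withheld because it contained sensitive data."


class ModelOutputPIILeakError(Exception):
    """Local stand-in for ai_security.errors.ModelOutputPIILeakError (assumption)."""


def check_pii(text: str) -> str:
    """Raise if the text contains an SSN-shaped string; otherwise return it unchanged."""
    if SSN_PATTERN.search(text):
        # The message deliberately omits the matched value so logs stay PII-free.
        raise ModelOutputPIILeakError("Detected PII in model output")
    return text


def guard_output(text: str, block: bool = False) -> str:
    """Return the original text, a redacted copy, or a block notice."""
    try:
        return check_pii(text)
    except ModelOutputPIILeakError:
        if block:
            return BLOCKED_MESSAGE
        return SSN_PATTERN.sub("[REDACTED_SSN]", text)


if __name__ == "__main__":
    print(guard_output("SSN: 123-45-6789"))
    print(guard_output("SSN: 123-45-6789", block=True))
```

With `block=True` the caller returns a fixed notice instead of a redacted copy, which is the more cautious choice when the detection patterns are known to be incomplete. Note that an exception message which embeds the matched value, as in the stack trace above, would itself copy PII into logs.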
Prevention
Adopt a privacy-first architecture by removing PII from training data, sanitizing inputs, and enforcing output filtering with automated PII detection before any user-facing output.