SystemPromptLeakError
ai_security.errors.SystemPromptLeakError
Stack trace
ai_security.errors.SystemPromptLeakError: Detected system prompt content leaked in LLM response or logs
    at ai_security.monitoring.detect_prompt_leak (monitoring.py:45)
    at ai_security.llm_client.call (llm_client.py:112)
    at app.main (main.py:78)
Why it happens
This error is raised when the system prompt, which often contains sensitive instructions or context, appears in the model's output or in logs. Common causes are prompt construction that mixes system text into user-visible strings, logging of raw prompt data, and a model that echoes its system instructions.
Detection
Scan LLM outputs and log messages for substrings that match the system prompt, and raise an alert or exception as soon as a leak is detected, before the text is processed further.
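The substring scan described above could be sketched roughly as follows. This is a minimal illustration, not the library's implementation: the exception class is a local stand-in for ai_security.errors.SystemPromptLeakError, and the function body, the whitespace normalization, and the min_chars parameter are assumptions introduced here.

```python
class SystemPromptLeakError(Exception):
    """Local stand-in for ai_security.errors.SystemPromptLeakError (assumption)."""


def _normalize(text: str) -> str:
    # Lowercase and collapse whitespace so trivial reformatting does not hide a leak.
    return " ".join(text.lower().split())


def detect_prompt_leak(output: str, system_prompt: str, min_chars: int = 20) -> None:
    """Raise if any min_chars-long window of the system prompt appears in output.

    min_chars is a hypothetical tuning knob: smaller values catch shorter
    fragments but produce more false positives on common phrases.
    """
    haystack = _normalize(output)
    needle = _normalize(system_prompt)
    if not needle:
        return
    # Prompts shorter than min_chars are matched as a whole.
    window = min(min_chars, len(needle))
    for start in range(len(needle) - window + 1):
        if needle[start:start + window] in haystack:
            raise SystemPromptLeakError(
                "Detected system prompt content leaked in LLM response or logs"
            )
```

Calling detect_prompt_leak(response_text, system_prompt) before a response is returned or logged stops leaked text at the boundary. Exact substring matching will not catch paraphrased or translated leaks, so treat this as a first line of defense.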
Causes & fixes
System prompt text is concatenated directly into user-visible output without redaction
Keep the system prompt separate from the user prompt, and make sure only the user prompt and the model output are exposed or logged.
Logging raw prompt data including system prompt without filtering
Filter or redact system prompt content from logs before writing to persistent storage or monitoring tools.
Model echoes system prompt instructions verbatim in its response
Prefer instruction-tuned models that are less prone to echoing their prompt, and add an explicit instruction not to repeat system prompt text.
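For the logging cause, one possible approach is a filter from Python's standard logging module that masks the system prompt before a record reaches any handler. The class name SystemPromptRedactor and the mask text are hypothetical, introduced only for this sketch; nothing here is part of ai_security.

```python
import logging


class SystemPromptRedactor(logging.Filter):
    """Mask a known system prompt in log messages (illustrative sketch)."""

    def __init__(self, system_prompt: str, mask: str = "[REDACTED SYSTEM PROMPT]"):
        super().__init__()
        self._secret = system_prompt
        self._mask = mask

    def filter(self, record: logging.LogRecord) -> bool:
        # Render the message with its arguments first, so a prompt passed
        # as a %s argument is masked too.
        message = record.getMessage()
        if self._secret and self._secret in message:
            record.msg = message.replace(self._secret, self._mask)
            record.args = ()  # message is already fully rendered
        return True  # keep the record; only its text is changed
```

Attach it with handler.addFilter(SystemPromptRedactor(system_prompt)). Attaching to a handler rather than a logger also covers records propagated from child loggers.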
Code: broken vs fixed
# Broken
import os
from ai_security import LLMClient
client = LLMClient(api_key=os.environ['AI_SECURITY_API_KEY'])
response = client.call(system_prompt='[SECRET INSTRUCTIONS]', user_prompt='Hello')  # triggers leak error
print(response)

# Fixed
import os
from ai_security import LLMClient
client = LLMClient(api_key=os.environ['AI_SECURITY_API_KEY'])
# Separate system prompt from user prompt and enable leak detection
response = client.call(user_prompt='Hello', system_prompt='[SECRET INSTRUCTIONS]', leak_detection=True)  # fixed
print(response)
Workaround
Wrap calls in try/except SystemPromptLeakError and sanitize outputs by removing or masking any detected system prompt substrings before retrying or logging.
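The workaround might look something like the sketch below. To stay runnable without the library, it defines a local stand-in for SystemPromptLeakError and takes the LLM call as a plain callable. The helper names mask_prompt and call_with_retry and the max_attempts parameter are assumptions made for illustration.

```python
from typing import Callable, Optional


class SystemPromptLeakError(Exception):
    """Local stand-in for ai_security.errors.SystemPromptLeakError (assumption)."""


def mask_prompt(text: str, system_prompt: str, mask: str = "[REDACTED]") -> str:
    # Replace verbatim occurrences of the system prompt with a mask.
    return text.replace(system_prompt, mask) if system_prompt else text


def call_with_retry(call: Callable[[], str], system_prompt: str,
                    max_attempts: int = 2) -> str:
    """Run call(), retrying when a leak is reported, and mask the result."""
    last_error: Optional[Exception] = None
    for _ in range(max_attempts):
        try:
            # Mask defensively even on success, before the text is logged or shown.
            return mask_prompt(call(), system_prompt)
        except SystemPromptLeakError as exc:
            last_error = exc  # leak reported: discard this response and retry
    raise RuntimeError("response withheld after repeated prompt leaks") from last_error
```

With the client from the example above, the callable could be a lambda wrapping client.call(...). Bounding the retries keeps a model that echoes its prompt on every attempt from looping indefinitely.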
Prevention
Architect your prompt handling to strictly separate system and user prompts, avoid logging raw system prompts, and use models and instructions that minimize prompt echoing to prevent leaks.