Debug Fix intermediate · 3 min read

How to handle PII in LLM applications

Quick answer
Handle Personally Identifiable Information (PII) in LLM applications by minimizing data collection, anonymizing or pseudonymizing inputs, and encrypting data both in transit and at rest. Implement strict access controls and obtain explicit user consent to comply with privacy regulations.
ERROR TYPE model_behavior
⚡ QUICK FIX
Implement input filtering to redact or anonymize PII before sending data to the LLM API.

Why this happens

LLM applications often process user-generated text that may contain PII such as names, addresses, or Social Security numbers. Without safeguards, this sensitive data can be inadvertently stored, exposed through model outputs, or retained in provider-side request logs. For example, sending raw user input to an LLM API without filtering can lead to privacy breaches and regulatory violations.

Typical triggering code might look like this:

python
from openai import OpenAI
import os

client = OpenAI(api_key=os.environ["OPENAI_API_KEY"])

user_input = "My SSN is 123-45-6789"
response = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": user_input}]
)
print(response.choices[0].message.content)

output
Your SSN is 123-45-6789

This code sends raw PII directly to the model: the model can echo it back in its output, and the raw input may be retained in API or provider logs.

The fix

Filter and redact PII before sending data to the LLM. Use regexes or a specialized detection library to find sensitive values and replace them with placeholders or anonymized tokens, so the sensitive data never reaches the model's context or the provider's logs.

Example fixed code:

python
from openai import OpenAI
import os
import re

client = OpenAI(api_key=os.environ["OPENAI_API_KEY"])

def redact_pii(text):
    # Simple regex to redact SSN format
    redacted = re.sub(r"\b\d{3}-\d{2}-\d{4}\b", "[REDACTED_SSN]", text)
    return redacted

user_input = "My SSN is 123-45-6789"
clean_input = redact_pii(user_input)

response = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": clean_input}]
)
print(response.choices[0].message.content)
output
[REDACTED_SSN]
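A single SSN pattern rarely suffices in practice. The redaction function can be extended to a small table of patterns; the regexes below are illustrative sketches, not exhaustive detectors, and production systems typically rely on a dedicated PII-detection library instead:

```python
import re

# Illustrative patterns only; real PII detection needs a dedicated library.
PII_PATTERNS = {
    "[REDACTED_SSN]": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "[REDACTED_EMAIL]": re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.]+\b"),
    "[REDACTED_PHONE]": re.compile(r"\b\d{3}[-.\s]\d{3}[-.\s]\d{4}\b"),
}

def redact_pii(text: str) -> str:
    """Replace each known PII pattern with its placeholder token."""
    for placeholder, pattern in PII_PATTERNS.items():
        text = pattern.sub(placeholder, text)
    return text

print(redact_pii("Email me at jane@example.com or call 555-123-4567"))
# Email me at [REDACTED_EMAIL] or call [REDACTED_PHONE]
```

The pattern table makes it easy to review and extend what gets redacted, which helps with the rule-review practice described below.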

Preventing it in production

  • Implement automated PII detection and redaction pipelines before data reaches the LLM.
  • Encrypt data in transit (TLS) and at rest to protect stored logs or cached inputs.
  • Use access controls and audit logs to monitor who accesses sensitive data.
  • Obtain explicit user consent for data processing and clearly disclose PII handling policies.
  • Regularly review and update PII detection rules to cover new data types.
  • Consider on-device or edge processing to minimize sending PII to cloud APIs.
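The pseudonymization approach mentioned in the quick answer can be sketched as a reversible mapping: each PII value is swapped for a unique token before the API call, and the mapping (kept only on your side) restores the original values in the response. A minimal sketch, with illustrative function names and token format:

```python
import re

SSN_RE = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")

def pseudonymize(text: str):
    """Swap each SSN for a unique token; return the clean text and the mapping."""
    mapping = {}

    def _replace(match):
        token = f"<PII_{len(mapping)}>"
        mapping[token] = match.group(0)  # original value stays local only
        return token

    return SSN_RE.sub(_replace, text), mapping

def restore(text: str, mapping: dict) -> str:
    """Put the original values back into text derived from pseudonymized input."""
    for token, original in mapping.items():
        text = text.replace(token, original)
    return text

clean, mapping = pseudonymize("SSNs: 123-45-6789 and 987-65-4321")
# clean == "SSNs: <PII_0> and <PII_1>"; only tokens leave your system
assert restore(clean, mapping) == "SSNs: 123-45-6789 and 987-65-4321"
```

Unlike plain redaction, this keeps model responses usable when they need to refer back to the original values, while the raw PII never leaves your infrastructure.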

Key Takeaways

  • Always filter and redact PII before sending data to LLM APIs to prevent leaks.
  • Encrypt data in transit and at rest, and enforce strict access controls for sensitive information.
  • Obtain explicit user consent and minimize PII collection to comply with privacy laws.
Verified 2026-04 · gpt-4o