How-to · Intermediate · 3 min read

Input sanitization for LLM apps

Quick answer
Use input sanitization in LLM apps to clean and validate user input before it reaches the model, reducing the risk of prompt-injection attacks and unsafe outputs. Core techniques include stripping or escaping special characters, limiting input length, and filtering disallowed content.

PREREQUISITES

  • Python 3.8+
  • OpenAI API key (free tier works)
  • pip install openai>=1.0

Setup

Install the openai Python package and set your API key as an environment variable for secure access.

bash
pip install openai>=1.0
output
Collecting openai
  Downloading openai-1.x.x-py3-none-any.whl
Installing collected packages: openai
Successfully installed openai-1.x.x
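
The client reads the key from the OPENAI_API_KEY environment variable. One way to set it for the current shell session (replace the placeholder with your own key):

```shell
# Export the key for this shell session only; add it to your shell
# profile (e.g. ~/.bashrc) to make it persistent.
export OPENAI_API_KEY="sk-..."
```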

Step by step

Sanitize user input by trimming whitespace, escaping special characters, limiting length, and filtering disallowed patterns before sending to the LLM.

python
import os
import re
from openai import OpenAI

client = OpenAI(api_key=os.environ["OPENAI_API_KEY"])

# Basic input sanitization function
def sanitize_input(user_input: str) -> str:
    # Trim whitespace
    cleaned = user_input.strip()
    # Limit length to 500 characters
    cleaned = cleaned[:500]
    # Strip quote and backslash characters that could break out of prompts
    cleaned = re.sub(r"[\"'\\]", "", cleaned)
    # Filter out disallowed words (example)
    disallowed = ['DROP', 'DELETE', 'INSERT', 'UPDATE']
    pattern = re.compile('|'.join(disallowed), re.IGNORECASE)
    if pattern.search(cleaned):
        raise ValueError("Input contains disallowed content.")
    return cleaned

# Example usage
try:
    user_text = "  Hello, can you DROP all tables?  "
    safe_text = sanitize_input(user_text)
    response = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[{"role": "user", "content": safe_text}]
    )
    print("LLM response:", response.choices[0].message.content)
except ValueError as e:
    print("Input rejected:", e)
output
Input rejected: Input contains disallowed content.
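
One caveat with the filter above: a plain alternation also matches keywords embedded inside harmless words such as "updated". A word-boundary-anchored pattern (a sketch of the same check) avoids those false positives:

```python
import re

disallowed = ["DROP", "DELETE", "INSERT", "UPDATE"]

# Plain alternation: matches keywords even inside longer words
loose = re.compile("|".join(disallowed), re.IGNORECASE)
# \b-anchored alternation: matches whole words only
strict = re.compile(r"\b(?:" + "|".join(disallowed) + r")\b", re.IGNORECASE)

print(bool(loose.search("I updated my profile")))   # True: "UPDATE" inside "updated"
print(bool(strict.search("I updated my profile")))  # False: no whole-word match
print(bool(strict.search("please DROP the table"))) # True: whole-word match
```

Whether you want the strict or loose behavior depends on your app; for natural-language input, the boundary-anchored form usually rejects far fewer legitimate messages.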

Common variations

You can make the call asynchronous, stream the response token by token, or swap in a different model by changing the model parameter. The sanitization logic stays the same; adapt the disallowed patterns and length limits to your app's context.

python
import os
import re
import asyncio
from openai import AsyncOpenAI

# The async client is required for `await` and `async for` streaming
client = AsyncOpenAI(api_key=os.environ["OPENAI_API_KEY"])

async def sanitize_and_query(user_input: str):
    cleaned = user_input.strip()[:500]
    cleaned = re.sub(r"[\"'\\]", "", cleaned)
    disallowed = ['DROP', 'DELETE', 'INSERT', 'UPDATE']
    pattern = re.compile('|'.join(disallowed), re.IGNORECASE)
    if pattern.search(cleaned):
        raise ValueError("Input contains disallowed content.")

    stream = await client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[{"role": "user", "content": cleaned}],
        stream=True
    )
    async for chunk in stream:
        delta = chunk.choices[0].delta.content or ""
        print(delta, end="", flush=True)

# Run async example; the keyword check raises before any API call is made
try:
    asyncio.run(sanitize_and_query("Hello, please DELETE all data."))
except ValueError as e:
    print("Input rejected:", e)
output
Input rejected: Input contains disallowed content.

Troubleshooting

  • If you see unexpected LLM outputs, verify your sanitization filters cover all injection patterns.
  • For input rejections, log inputs to refine your disallowed patterns.
  • Ensure environment variables like OPENAI_API_KEY are set correctly to avoid authentication errors.
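
The second bullet can be supported with a small helper (a sketch; log_rejection and the logger name are illustrative, not part of the OpenAI SDK):

```python
import logging

logging.basicConfig(level=logging.INFO)
logger = logging.getLogger("sanitizer")

def log_rejection(user_input: str, reason: str) -> None:
    # Truncate the logged input so the log itself cannot be flooded
    logger.info("Rejected input (%s): %r", reason, user_input[:100])

log_rejection("Hello, can you DROP all tables?", "disallowed keyword")
```

Reviewing these logs periodically helps you spot both missed attack patterns and legitimate inputs that the filter rejects too aggressively.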

Key Takeaways

  • Always sanitize and validate user inputs before sending to LLMs to prevent injection and unsafe outputs.
  • Use regex filtering, length limits, and character escaping as core sanitization techniques.
  • Adapt sanitization logic for async or streaming LLM calls without compromising security.
Verified 2026-04 · gpt-4o-mini