
How to use regex constraints for LLM output

Quick answer
Apply regex constraints in two places: state the required format explicitly in the prompt, then validate the model's output with Python's re module. Some APIs also accept output format instructions, but prompting alone does not guarantee the format. Validating with regex after generation is what lets you detect, reject, or retry output that does not match.

Prerequisites

  • Python 3.8+
  • OpenAI API key (free tier works)
  • pip install "openai>=1.0" (quote the requirement so the shell does not treat > as a redirect)

Setup

Install the openai Python SDK and set your API key as an environment variable.

  • Install SDK: pip install "openai>=1.0"
  • Set environment variable in your shell: export OPENAI_API_KEY='your_api_key'

bash
pip install "openai>=1.0"
export OPENAI_API_KEY='your_api_key'

Step by step

This example shows how to prompt gpt-4o to produce output matching a regex pattern and then validate it in Python.

python
import os
import re
from openai import OpenAI

client = OpenAI(api_key=os.environ["OPENAI_API_KEY"])

# Define the regex pattern for a date in YYYY-MM-DD format
pattern = r"^\d{4}-\d{2}-\d{2}$"

prompt = (
    "Generate a date string in the format YYYY-MM-DD, e.g., 2026-04-15."
)

response = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": prompt}]
)

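# message.content can be None, so fall back to an empty string before stripping
output = (response.choices[0].message.content or "").strip()

# Validate output against regex; fullmatch requires the whole string to match
if re.fullmatch(pattern, output):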
    print(f"Valid date output: {output}")
else:
    print(f"Output does not match regex: {output}")
output
Valid date output: 2026-04-15
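
Models sometimes wrap the requested value in extra prose, in which case a whole-string check fails even though a usable date is present. One possible fallback, sketched below, is to search for the pattern inside the reply instead of rejecting it outright. The helper name extract_date and the sample replies are hypothetical and hard-coded for illustration; no API call is made.

```python
import re
from typing import Optional

# Unanchored pattern: finds a YYYY-MM-DD substring anywhere in the text.
# \b keeps it from matching inside a longer run of digits.
DATE_IN_TEXT = re.compile(r"\b\d{4}-\d{2}-\d{2}\b")


def extract_date(text: str) -> Optional[str]:
    """Return the first YYYY-MM-DD substring in text, or None if absent."""
    match = DATE_IN_TEXT.search(text)
    return match.group(0) if match else None


# Hypothetical model replies, hard-coded for illustration.
print(extract_date("Sure! Here is a date: 2026-04-15."))  # 2026-04-15
print(extract_date("I cannot provide a date."))  # None
```

Extraction is more forgiving than strict validation, so it suits cases where a stray sentence around the value is acceptable; keep the strict check when the output must contain nothing else.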

Common variations

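The same approach works with other models, such as Anthropic's claude-3-5-sonnet-20241022, and with streaming or async calls, as long as you validate the complete text once generation has finished. Post-processing with regex only needs the returned string, so it works with any SDK. The example below additionally requires pip install anthropic and an ANTHROPIC_API_KEY environment variable.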

python
import os
import re
from anthropic import Anthropic

client = Anthropic(api_key=os.environ["ANTHROPIC_API_KEY"])

pattern = r"^\d{4}-\d{2}-\d{2}$"

system_prompt = "You are a helpful assistant that outputs a date in YYYY-MM-DD format."
user_prompt = "Provide a date string."

response = client.messages.create(
    model="claude-3-5-sonnet-20241022",
    max_tokens=50,
    system=system_prompt,
    messages=[{"role": "user", "content": user_prompt}]
)

output = response.content[0].text.strip()

# fullmatch requires the whole string to match the pattern
if re.fullmatch(pattern, output):
    print(f"Valid date output: {output}")
else:
    print(f"Output does not match regex: {output}")
output
Valid date output: 2026-04-15
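
Because validation only needs the returned string, it can be separated from the SDK entirely. The sketch below assumes a hypothetical generate callable that wraps whichever client you use and returns the model's text; generate_until_valid and the canned replies are illustrative names and data, not part of either SDK.

```python
import re
from typing import Callable, Optional

DATE_PATTERN = re.compile(r"\d{4}-\d{2}-\d{2}")


def generate_until_valid(
    generate: Callable[[], str],
    pattern: re.Pattern,
    max_attempts: int = 3,
) -> Optional[str]:
    """Call generate() until its stripped output fully matches pattern.

    Returns the first matching output, or None after max_attempts failures.
    """
    for _ in range(max_attempts):
        candidate = generate().strip()
        if pattern.fullmatch(candidate):
            return candidate
    return None


# Stand-in for a real model call: returns canned replies in order.
replies = iter(["Here you go: 2026-04-15", "2026-04-15\n"])
result = generate_until_valid(lambda: next(replies), DATE_PATTERN)
print(result)  # 2026-04-15
```

In real use, generate would be a small function that calls client.chat.completions.create or client.messages.create and returns the text, so the retry logic stays identical across providers. Capping attempts keeps cost bounded when a prompt never produces valid output.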

Troubleshooting

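  • If your output does not match the regex, refine your prompt to state the pattern explicitly and to ask for the value only, with no surrounding text.
  • Use post-processing validation to catch unexpected formats, such as extra sentences or quotes around the value.
  • For complex patterns, consider multiple regex checks or parsing libraries. A regex checks shape only: \d{4}-\d{2}-\d{2} also accepts 2026-13-45.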
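
As one way to combine a regex check with a parsing library, the sketch below first checks the shape with a regex and then asks the standard library's datetime module whether the string names a real calendar date. The helper name is_real_date is hypothetical.

```python
import re
from datetime import datetime

DATE_SHAPE = re.compile(r"\d{4}-\d{2}-\d{2}")


def is_real_date(text: str) -> bool:
    """True only if text looks like YYYY-MM-DD and names a real calendar date."""
    # Shape check: exactly four, two, and two digits separated by hyphens.
    if not DATE_SHAPE.fullmatch(text):
        return False
    # Semantic check: strptime raises ValueError for impossible dates.
    try:
        datetime.strptime(text, "%Y-%m-%d")
    except ValueError:
        return False
    return True


print(is_real_date("2026-04-15"))  # True
print(is_real_date("2026-13-45"))  # False
print(is_real_date("2026-4-15"))   # False
```

The two checks cover different failures: the regex rejects unpadded values such as 2026-4-15, which strptime alone would accept, while strptime rejects well-formed but impossible dates.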

Key Takeaways

  • Always validate LLM output with regex post-generation for reliable structured data.
  • Explicitly instruct the model in prompts to follow regex-like formats for better compliance.
  • Post-generation regex validation works with OpenAI, Anthropic, and other LLM SDKs because it operates only on the returned text.
  • Use Python's re module to implement regex validation simply and effectively.
Verified 2026-04 · gpt-4o, claude-3-5-sonnet-20241022