How to use regex constraints for LLM output
Quick answer
Use regex constraints by post-processing LLM output with Python's re module or by prompting the model to follow a regex pattern explicitly. Some APIs support output format instructions, but a prompt alone does not guarantee compliance; validating with regex after generation lets you detect and reject output that does not match.
Prerequisites
- Python 3.8+
- OpenAI API key (free tier works)
- pip install "openai>=1.0"
Setup
Install the openai Python SDK and set your API key as an environment variable.
- Install SDK:
pip install openai
- Set environment variable in your shell:
export OPENAI_API_KEY='your_api_key'
Step by step
This example shows how to prompt gpt-4o to produce output matching a regex pattern and then validate it in Python.
import os
import re
from openai import OpenAI
# Read the API key from the environment rather than hard-coding it
client = OpenAI(api_key=os.environ["OPENAI_API_KEY"])
# Define the regex pattern for a date in YYYY-MM-DD format
pattern = r"^\d{4}-\d{2}-\d{2}$"
prompt = (
    "Generate a date string in the format YYYY-MM-DD, e.g., 2026-04-15."
)
response = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": prompt}],
)
# message.content can be None, so fall back to an empty string before stripping
output = (response.choices[0].message.content or "").strip()
# Validate output against regex
if re.match(pattern, output):
    print(f"Valid date output: {output}")
else:
    print(f"Output does not match regex: {output}")
Example output (the exact date varies between runs)
Valid date output: 2026-04-15
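When validation fails, one common follow-up is to ask again a bounded number of times. The sketch below illustrates that idea under stated assumptions: generate_until_valid is a hypothetical helper (not part of any SDK), and the model call is replaced by a stand-in callable so the example runs offline. In real use, the callable would wrap the client.chat.completions.create call shown above.

```python
import re
from typing import Callable, Optional

DATE_PATTERN = re.compile(r"\d{4}-\d{2}-\d{2}")

def generate_until_valid(
    generate: Callable[[], str],
    pattern: re.Pattern,
    max_attempts: int = 3,
) -> Optional[str]:
    """Call generate() until its stripped output fully matches pattern."""
    for _ in range(max_attempts):
        candidate = generate().strip()
        # fullmatch requires the entire string to match, so no ^ or $ is needed
        if pattern.fullmatch(candidate):
            return candidate
    return None  # the caller decides how to handle repeated failures

# Stand-in for a real model call (an assumption made so the sketch runs offline):
# the first reply is chatty and fails validation, the second one passes.
_fake_replies = iter(["Sure! The date is 2026-04-15.", "2026-04-15\n"])
result = generate_until_valid(lambda: next(_fake_replies), DATE_PATTERN)
print(result)  # 2026-04-15
```

Capping the attempts keeps cost and latency bounded; returning None instead of raising leaves the failure policy to the caller.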
Common variations
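The same approach works with other models, such as claude-3-5-sonnet-20241022 from Anthropic, and with streaming or async calls. Because the regex check runs on the returned text, it does not depend on which SDK produced it. With streaming, validate only after the full response has been assembled.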
import os
import re
from anthropic import Anthropic
# Read the API key from the environment rather than hard-coding it
client = Anthropic(api_key=os.environ["ANTHROPIC_API_KEY"])
pattern = r"^\d{4}-\d{2}-\d{2}$"
system_prompt = "You are a helpful assistant that outputs a date in YYYY-MM-DD format."
user_prompt = "Provide a date string."
response = client.messages.create(
    model="claude-3-5-sonnet-20241022",
    max_tokens=50,
    system=system_prompt,
    messages=[{"role": "user", "content": user_prompt}],
)
# The reply text is in the first content block
output = response.content[0].text.strip()
# Validate output against regex
if re.match(pattern, output):
    print(f"Valid date output: {output}")
else:
    print(f"Output does not match regex: {output}")
Example output (the exact date varies between runs)
Valid date output: 2026-04-15
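For the streaming variation, an anchored pattern can only be checked once the text is complete, since every partial prefix of a valid date fails the match. A minimal sketch of that idea follows; validate_streamed_text is a hypothetical helper, and the list of fragments stands in for the text deltas a streaming response would yield (no SDK streaming call is shown here).

```python
import re
from typing import Iterable, Optional

DATE_PATTERN = re.compile(r"\d{4}-\d{2}-\d{2}")

def validate_streamed_text(chunks: Iterable[str], pattern: re.Pattern) -> Optional[str]:
    """Join streamed text fragments, then validate the complete string."""
    full_text = "".join(chunks).strip()
    return full_text if pattern.fullmatch(full_text) else None

# Hypothetical fragments, standing in for text deltas from a streaming response.
print(validate_streamed_text(["2026", "-04-", "15\n"], DATE_PATTERN))  # 2026-04-15
print(validate_streamed_text(["2026", "-04"], DATE_PATTERN))  # None
```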
Troubleshooting
- If your output does not match the regex, refine your prompt to explicitly instruct the model to follow the pattern.
- Use post-processing validation to catch unexpected formats.
- For complex patterns, consider multiple regex checks or parsing libraries.
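The last point matters even for the date example: the pattern \d{4}-\d{2}-\d{2} checks only the shape of the string, so a value like 2026-13-45 passes the regex without being a real date. One possible way to layer a second check, sketched here with a hypothetical parse_iso_date helper, is to run the regex first and then let the standard library's datetime parser reject impossible dates.

```python
import re
from datetime import date, datetime
from typing import Optional

DATE_SHAPE = re.compile(r"\d{4}-\d{2}-\d{2}")

def parse_iso_date(text: str) -> Optional[date]:
    """Return a date if text has the YYYY-MM-DD shape and is a real calendar date."""
    candidate = text.strip()
    # Check 1: the string has the expected shape
    if not DATE_SHAPE.fullmatch(candidate):
        return None
    # Check 2: the values form a valid calendar date
    try:
        return datetime.strptime(candidate, "%Y-%m-%d").date()
    except ValueError:
        return None

print(parse_iso_date("2026-04-15"))      # 2026-04-15
print(parse_iso_date("2026-13-45"))      # None
print(parse_iso_date("April 15, 2026"))  # None
```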
Key Takeaways
- Always validate LLM output with regex post-generation for reliable structured data.
- Explicitly instruct the model in prompts to follow regex-like formats for better compliance.
- Regex constraints work universally across OpenAI, Anthropic, and other LLM SDKs.
- Use Python's re module to implement regex validation simply and effectively.