How to integrate Llama Guard in Python
Direct answer
Use the guardrails Python SDK to define and enforce guardrails on AI model outputs by wrapping your LLM calls with Guard and specifying a Rail schema.
Setup
Install
pip install guardrails
Env vars
OPENAI_API_KEY
Imports
import os
from guardrails import Guard
from openai import OpenAI
Examples
In: Generate a polite email response to a customer complaint.
Out: Subject: Apology and Resolution for Your Recent Experience
Dear Customer,
Thank you for reaching out and sharing your concerns. We sincerely apologize for any inconvenience caused and are committed to resolving this promptly...
In: Extract the user's name and age from the text: 'Alice is 28 years old.'
Out:
{
  "name": "Alice",
  "age": 28
}
In: Summarize the following text with a maximum of 50 words.
Out: This text provides a concise overview of the main points, highlighting key information while maintaining brevity and clarity.
Integration steps
- Install the guardrails Python package via pip.
- Import Guard from guardrails and initialize your LLM client (e.g., OpenAI).
- Define a guardrail schema in YAML or JSON that specifies the expected output format and constraints.
- Create a Guard instance with the schema and the LLM client as the provider.
- Call the guard's generate method with your prompt to get validated, safe outputs.
- Handle the structured output or errors according to your application logic.
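The error handling in the last step can be sketched independently of the SDK. Here `guard_generate` is a hypothetical stand-in for a guarded call that raises when validation fails; the exact exception type depends on the guardrails version you use.

```python
def guard_generate(prompt: str) -> dict:
    # Hypothetical stand-in for guard.generate(): raises when validation fails
    if not prompt.strip():
        raise ValueError("validation failed: empty prompt")
    return {"subject": "Re: your complaint", "body": "We apologize for the delay."}

def respond(prompt: str) -> dict:
    """Handle the structured output or errors according to application logic."""
    try:
        return guard_generate(prompt)
    except ValueError:
        # Fall back to a safe canned reply when the guard rejects the output
        return {"subject": "We received your message",
                "body": "A support agent will follow up shortly."}

print(respond("Shipment was late")["subject"])  # Re: your complaint
print(respond("")["subject"])                   # We received your message
```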
Full code
import os
from guardrails import Guard
from openai import OpenAI
# Initialize OpenAI client with API key from environment
client = OpenAI(api_key=os.environ["OPENAI_API_KEY"])
# Define a simple guardrail schema as a YAML string
schema = """
version: 1
rails:
- name: polite_email
type: completion
prompt: |
Generate a polite email response to the customer complaint below:
{input}
output:
type: object
properties:
subject:
type: string
description: Email subject line
body:
type: string
description: Email body content
required: [subject, body]
"""
# Create a Guard instance with the schema and OpenAI client
guard = Guard.from_rail_string(schema, llm=client)
# Input prompt
prompt = "The customer is unhappy about a delayed shipment."
# Generate output with guardrails enforcement
response = guard.generate(input=prompt)
# Print structured output
print("Subject:", response.subject)
print("Body:", response.body) output
Subject: Apology and Resolution for Your Recent Experience Body: Dear Customer,\n\nThank you for reaching out and sharing your concerns. We sincerely apologize for the delay in your shipment and are working to resolve this promptly. We appreciate your patience and value your business.
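If you work with the raw provider response rather than the guard's structured object, the message content is itself a JSON string and needs a second parse. This sketch uses a hard-coded dict shaped like the API trace below; no live call is made.

```python
import json

# Response shaped like the provider's chat-completion reply (no live call)
response = {
    "choices": [
        {"message": {"content":
            '{"subject": "Apology and Resolution for Your Recent Experience", '
            '"body": "Dear Customer, ..."}'}}
    ]
}

# The message content is a JSON string, so parse it to reach the fields
content = response["choices"][0]["message"]["content"]
parsed = json.loads(content)

print("Subject:", parsed["subject"])
print("Body:", parsed["body"])
```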
API trace
Request
{"model": "gpt-4o", "messages": [{"role": "user", "content": "Generate a polite email response to the customer complaint below: The customer is unhappy about a delayed shipment."}]} Response
{"choices": [{"message": {"content": "{\"subject\": \"Apology and Resolution for Your Recent Experience\", \"body\": \"Dear Customer, ...\"}"}}]} Extract
response.choices[0].message.content parsed as JSON to access 'subject' and 'body'Variants
Streaming output with Llama Guard
Use streaming when you want to display the AI's response incrementally for better user experience.
import os
from guardrails import Guard
from openai import OpenAI
client = OpenAI(api_key=os.environ["OPENAI_API_KEY"])
schema = """
version: 1
rails:
- name: polite_email
type: completion
prompt: |
Generate a polite email response to the customer complaint below:
{input}
output:
type: object
properties:
subject:
type: string
body:
type: string
streaming: true
"""
guard = Guard.from_rail_string(schema, llm=client)
prompt = "The customer is unhappy about a delayed shipment."
for chunk in guard.stream_generate(input=prompt):
    print(chunk, end='')
Async integration with Llama Guard
Use async when integrating into asynchronous Python applications or frameworks for concurrency.
import os
import asyncio
from guardrails import Guard
from openai import OpenAI
async def main():
    client = OpenAI(api_key=os.environ["OPENAI_API_KEY"])
    schema = """
    version: 1
    rails:
      - name: polite_email
        type: completion
        prompt: |
          Generate a polite email response to the customer complaint below:
          {input}
        output:
          type: object
          properties:
            subject:
              type: string
            body:
              type: string
    """
    guard = Guard.from_rail_string(schema, llm=client)
    prompt = "The customer is unhappy about a delayed shipment."
    response = await guard.agenerate(input=prompt)
    print("Subject:", response.subject)
    print("Body:", response.body)

asyncio.run(main())
Using a JSON schema file with Llama Guard
Use external schema files to manage complex or reusable guardrail definitions separately from code.
import os
from guardrails import Guard
from openai import OpenAI
client = OpenAI(api_key=os.environ["OPENAI_API_KEY"])
# Load schema from external JSON file
with open("email_schema.json", "r") as f:
schema_json = f.read()
guard = Guard.from_rail_string(schema_json, llm=client)
prompt = "The customer is unhappy about a delayed shipment."
response = guard.generate(input=prompt)
print("Subject:", response.subject)
print("Body:", response.body) Performance
Latency: ~800ms for a typical non-streaming OpenAI GPT-4o call with guardrails
Cost: ~$0.002 per 500 tokens exchanged when using OpenAI GPT-4o
Rate limits: Depends on the underlying LLM provider; OpenAI's default tier is 3500 RPM / 90K TPM
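When the provider rate-limits you, retrying with exponential backoff is the usual remedy. This is a minimal sketch: `with_backoff` and `flaky_call` are illustrative names, and `RuntimeError` stands in for a provider-specific rate-limit exception.

```python
import time
import random

def with_backoff(call, max_retries=5, base_delay=0.01):
    # Retry with exponential backoff plus jitter on rate-limit errors;
    # RuntimeError stands in for a provider-specific RateLimitError
    for attempt in range(max_retries):
        try:
            return call()
        except RuntimeError:
            time.sleep(base_delay * 2 ** attempt + random.uniform(0, base_delay))
    raise RuntimeError("still rate limited after retries")

# Stub that fails twice before succeeding, to exercise the retry loop
attempts = {"n": 0}
def flaky_call():
    attempts["n"] += 1
    if attempts["n"] < 3:
        raise RuntimeError("rate limited")
    return "ok"

print(with_backoff(flaky_call))  # ok
```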
- Keep prompts concise to reduce token usage.
- Use guardrails to enforce structured outputs, avoiding costly retries.
- Cache frequent prompts and responses when possible.
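The caching tip can be sketched with `functools.lru_cache`; here `generate_reply` is a hypothetical stand-in for an expensive guarded LLM call, so identical prompts hit the model only once.

```python
from functools import lru_cache

llm_calls = 0

@lru_cache(maxsize=256)
def generate_reply(prompt: str) -> str:
    # Stand-in for an expensive guarded LLM call
    global llm_calls
    llm_calls += 1
    return f"Polite reply to: {prompt}"

generate_reply("Delayed shipment complaint")
generate_reply("Delayed shipment complaint")  # second call served from cache
print(llm_calls)  # 1
```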
| Approach | Latency | Cost/call | Best for |
|---|---|---|---|
| Standard guardrails with OpenAI GPT-4o | ~800ms | ~$0.002 | Reliable structured output with safety |
| Streaming guardrails output | ~600ms initial + streaming | ~$0.002 | Better UX for long responses |
| Async guardrails integration | ~800ms | ~$0.002 | Concurrent applications and web servers |
Quick tip
Define clear output schemas in your guardrails to catch and handle unexpected AI responses early.
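The same fail-fast idea can be applied in plain Python before your application trusts a reply; `REQUIRED_FIELDS` and `validate_reply` are illustrative names, not part of the guardrails API.

```python
import json

REQUIRED_FIELDS = {"subject": str, "body": str}

def validate_reply(raw: str) -> dict:
    # Fail fast when the model reply is missing fields or has wrong types
    data = json.loads(raw)
    for field, expected_type in REQUIRED_FIELDS.items():
        if not isinstance(data.get(field), expected_type):
            raise ValueError(f"bad or missing field: {field}")
    return data

ok = validate_reply('{"subject": "Hi", "body": "Thanks for writing in."}')
print(ok["subject"])  # Hi

try:
    validate_reply('{"subject": "Hi"}')  # missing body: caught early
except ValueError as err:
    print("caught:", err)
```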
Common mistake
Forgetting to pass the LLM client to Guard means the guardrails can neither generate outputs nor enforce the schema.