Code intermediate · 3 min read

How to integrate the guardrails SDK in Python

Direct answer
Use the guardrails Python SDK to define and enforce guardrails on AI model outputs by wrapping your LLM calls with Guard and specifying a Rail schema.

Setup

Install
bash
pip install guardrails-ai
Env vars
OPENAI_API_KEY
Imports
python
import os
from guardrails import Guard
from openai import OpenAI
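Before constructing the client, it helps to fail fast when OPENAI_API_KEY is unset rather than hitting a KeyError or an authentication error mid-request. A minimal sketch (require_env is a hypothetical helper, not part of any SDK):

```python
import os

def require_env(name: str) -> str:
    """Return an environment variable's value, failing fast if it is unset."""
    value = os.environ.get(name)
    if not value:
        raise RuntimeError(f"Missing required environment variable: {name}")
    return value

# Demonstration only: set a placeholder so the check passes
os.environ.setdefault("OPENAI_API_KEY", "sk-placeholder")
api_key = require_env("OPENAI_API_KEY")
```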

Examples

In: Generate a polite email response to a customer complaint.
Out: Subject: Apology and Resolution for Your Recent Experience Dear Customer, Thank you for reaching out and sharing your concerns. We sincerely apologize for any inconvenience caused and are committed to resolving this promptly...
In: Extract the user's name and age from the text: 'Alice is 28 years old.'
Out: { "name": "Alice", "age": 28 }
In: Summarize the following text with a maximum of 50 words.
Out: This text provides a concise overview of the main points, highlighting key information while maintaining brevity and clarity.

Integration steps

  1. Install the guardrails Python package via pip.
  2. Import Guard from guardrails and initialize your LLM client (e.g., OpenAI).
  3. Define a guardrail schema in YAML or JSON that specifies the expected output format and constraints.
  4. Create a Guard instance with the schema and the LLM client as the provider.
  5. Call the guard's generate method with your prompt to get validated, safe outputs.
  6. Handle the structured output or errors according to your application logic.

Full code

The example below assumes an SDK version that exposes Guard.from_rail_string and generate; method names can vary between guardrails releases, so check the API of your installed version.

python
import os
from guardrails import Guard
from openai import OpenAI

# Initialize OpenAI client with API key from environment
client = OpenAI(api_key=os.environ["OPENAI_API_KEY"])

# Define a simple guardrail schema as a YAML string
schema = """
version: 1

rails:
  - name: polite_email
    type: completion
    prompt: |
      Generate a polite email response to the customer complaint below:
      {input}
    output:
      type: object
      properties:
        subject:
          type: string
          description: Email subject line
        body:
          type: string
          description: Email body content
      required: [subject, body]
"""

# Create a Guard instance with the schema and OpenAI client
guard = Guard.from_rail_string(schema, llm=client)

# Input prompt
prompt = "The customer is unhappy about a delayed shipment."

# Generate output with guardrails enforcement
response = guard.generate(input=prompt)

# Print structured output
print("Subject:", response.subject)
print("Body:", response.body)
output
Subject: Apology and Resolution for Your Recent Experience
Body: Dear Customer,

Thank you for reaching out and sharing your concerns. We sincerely apologize for the delay in your shipment and are working to resolve this promptly. We appreciate your patience and value your business.

API trace

Request
json
{"model": "gpt-4o", "messages": [{"role": "user", "content": "Generate a polite email response to the customer complaint below: The customer is unhappy about a delayed shipment."}]}
Response
json
{"choices": [{"message": {"content": "{\"subject\": \"Apology and Resolution for Your Recent Experience\", \"body\": \"Dear Customer, ...\"}"}}]}
Extract: response.choices[0].message.content, parsed as JSON to access 'subject' and 'body'
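The extract step can be sketched in plain Python with the standard json module; the content string below is an abbreviated stand-in for a real API response:

```python
import json

# Stand-in for response.choices[0].message.content from the trace above
content = '{"subject": "Apology and Resolution for Your Recent Experience", "body": "Dear Customer, ..."}'

parsed = json.loads(content)
print("Subject:", parsed["subject"])
print("Body:", parsed["body"])
```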

Variants

Streaming output with guardrails

Use streaming when you want to display the AI's response incrementally for better user experience.

python
import os
from guardrails import Guard
from openai import OpenAI

client = OpenAI(api_key=os.environ["OPENAI_API_KEY"])

schema = """
version: 1
rails:
  - name: polite_email
    type: completion
    prompt: |
      Generate a polite email response to the customer complaint below:
      {input}
    output:
      type: object
      properties:
        subject:
          type: string
        body:
          type: string
    streaming: true
"""

guard = Guard.from_rail_string(schema, llm=client)
prompt = "The customer is unhappy about a delayed shipment."

for chunk in guard.stream_generate(input=prompt):
    print(chunk, end='')
Async integration with guardrails

Use async when integrating into asynchronous Python applications or frameworks for concurrency.

python
import os
import asyncio
from guardrails import Guard
from openai import AsyncOpenAI

async def main():
    client = AsyncOpenAI(api_key=os.environ["OPENAI_API_KEY"])

    schema = """
version: 1
rails:
  - name: polite_email
    type: completion
    prompt: |
      Generate a polite email response to the customer complaint below:
      {input}
    output:
      type: object
      properties:
        subject:
          type: string
        body:
          type: string
    """

    guard = Guard.from_rail_string(schema, llm=client)
    prompt = "The customer is unhappy about a delayed shipment."

    response = await guard.agenerate(input=prompt)
    print("Subject:", response.subject)
    print("Body:", response.body)

asyncio.run(main())
Using a JSON schema file with guardrails

Use external schema files to manage complex or reusable guardrail definitions separately from code.

python
import os
from guardrails import Guard
from openai import OpenAI

client = OpenAI(api_key=os.environ["OPENAI_API_KEY"])

# Load schema from external JSON file
with open("email_schema.json", "r") as f:
    schema_json = f.read()

guard = Guard.from_rail_string(schema_json, llm=client)

prompt = "The customer is unhappy about a delayed shipment."
response = guard.generate(input=prompt)
print("Subject:", response.subject)
print("Body:", response.body)

Performance

Latency: ~800ms for a typical non-streaming OpenAI GPT-4o call with guardrails
Cost: ~$0.002 per 500 tokens exchanged when using OpenAI GPT-4o
Rate limits: Depends on the underlying LLM provider; OpenAI's default tier is 3500 RPM / 90K TPM
  • Keep prompts concise to reduce token usage.
  • Use guardrails to enforce structured outputs, avoiding costly retries.
  • Cache frequent prompts and responses when possible.
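The caching tip can be sketched with an in-memory dict keyed by a hash of the prompt. cached_generate and the fake_generate stub are illustrative helpers, not part of the guardrails SDK:

```python
import hashlib

_cache = {}

def cached_generate(prompt, generate_fn):
    """Serve repeated prompts from an in-memory cache instead of re-calling the LLM."""
    key = hashlib.sha256(prompt.encode("utf-8")).hexdigest()
    if key not in _cache:
        _cache[key] = generate_fn(prompt)
    return _cache[key]

# Stub LLM call that counts how often it is actually invoked
calls = {"n": 0}
def fake_generate(prompt):
    calls["n"] += 1
    return {"subject": "Re: " + prompt, "body": "..."}

cached_generate("delayed shipment", fake_generate)
cached_generate("delayed shipment", fake_generate)
print(calls["n"])  # → 1: the second call is served from the cache
```

In production you would also bound the cache size and expire entries, since identical prompts can legitimately warrant fresh responses.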
Approach | Latency | Cost/call | Best for
Standard guardrails with OpenAI GPT-4o | ~800ms | ~$0.002 | Reliable structured output with safety
Streaming guardrails output | ~600ms initial + streaming | ~$0.002 | Better UX for long responses
Async guardrails integration | ~800ms | ~$0.002 | Concurrent applications and web servers

Quick tip

Define clear output schemas in your guardrails to catch and handle unexpected AI responses early.
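One lightweight way to apply this tip, independent of any SDK: check the model's parsed output against the fields you expect before using it. validate_output and email_schema below are hypothetical helpers, not guardrails APIs:

```python
def validate_output(data, required):
    """Return a list of problems; an empty list means the output matches the schema."""
    problems = []
    for field, expected_type in required.items():
        if field not in data:
            problems.append(f"missing field: {field}")
        elif not isinstance(data[field], expected_type):
            problems.append(f"wrong type for {field}: expected {expected_type.__name__}")
    return problems

email_schema = {"subject": str, "body": str}
good = {"subject": "Apology", "body": "Dear Customer, ..."}
bad = {"subject": "Apology"}  # body is missing

print(validate_output(good, email_schema))  # → []
print(validate_output(bad, email_schema))   # → ['missing field: body']
```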

Common mistake

Forgetting to pass the LLM client when constructing the Guard leaves the guardrails with no model to call, so outputs are neither generated nor validated.

Verified 2026-04 · gpt-4o