Code intermediate · 3 min read

How to integrate the guardrails SDK in Python

Direct answer
Use the guardrails Python SDK to define and enforce guardrails on AI model outputs by wrapping your LLM calls with Guard and specifying a Rail schema.

Setup

Install
bash
pip install guardrails-ai
Env vars
OPENAI_API_KEY
Imports
python
import os
from guardrails import Guard
from openai import OpenAI
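Before constructing the client, it helps to fail fast when OPENAI_API_KEY is unset rather than hitting a KeyError or an authentication error mid-request. A minimal sketch (require_env is a hypothetical helper, not part of any SDK):

```python
import os

def require_env(name: str) -> str:
    """Return an environment variable's value, failing fast if it is unset."""
    value = os.environ.get(name)
    if not value:
        raise RuntimeError(f"Missing required environment variable: {name}")
    return value

# Demonstration only: set a placeholder so the check passes
os.environ.setdefault("OPENAI_API_KEY", "sk-placeholder")
api_key = require_env("OPENAI_API_KEY")
```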

Examples

In: Generate a polite email response to a customer complaint.
Out: Subject: Apology and Resolution for Your Recent Experience Dear Customer, Thank you for reaching out and sharing your concerns. We sincerely apologize for any inconvenience caused and are committed to resolving this promptly...
In: Extract the user's name and age from the text: 'Alice is 28 years old.'
Out: { "name": "Alice", "age": 28 }
In: Summarize the following text with a maximum of 50 words.
Out: This text provides a concise overview of the main points, highlighting key information while maintaining brevity and clarity.

Integration steps

  1. Install the guardrails Python package via pip.
  2. Import Guard from guardrails and initialize your LLM client (e.g., OpenAI).
  3. Define a guardrail schema in YAML or JSON that specifies the expected output format and constraints.
  4. Create a Guard instance with the schema and the LLM client as the provider.
  5. Call the guard's generate method with your prompt to get validated, safe outputs.
  6. Handle the structured output or errors according to your application logic.

Full code

The example below assumes an SDK version that exposes Guard.from_rail_string and generate; method names can vary between guardrails releases, so check the API of your installed version.

python
import os
from guardrails import Guard
from openai import OpenAI

# Initialize OpenAI client with API key from environment
client = OpenAI(api_key=os.environ["OPENAI_API_KEY"])

# Define a simple guardrail schema as a YAML string
schema = """
version: 1

rails:
  - name: polite_email
    type: completion
    prompt: |
      Generate a polite email response to the customer complaint below:
      {input}
    output:
      type: object
      properties:
        subject:
          type: string
          description: Email subject line
        body:
          type: string
          description: Email body content
      required: [subject, body]
"""

# Create a Guard instance with the schema and OpenAI client
guard = Guard.from_rail_string(schema, llm=client)

# Input prompt
prompt = "The customer is unhappy about a delayed shipment."

# Generate output with guardrails enforcement
response = guard.generate(input=prompt)

# Print structured output
print("Subject:", response.subject)
print("Body:", response.body)
output
Subject: Apology and Resolution for Your Recent Experience
Body: Dear Customer,

Thank you for reaching out and sharing your concerns. We sincerely apologize for the delay in your shipment and are working to resolve this promptly. We appreciate your patience and value your business.

API trace

Request
json
{"model": "gpt-4o", "messages": [{"role": "user", "content": "Generate a polite email response to the customer complaint below: The customer is unhappy about a delayed shipment."}]}
Response
json
{"choices": [{"message": {"content": "{\"subject\": \"Apology and Resolution for Your Recent Experience\", \"body\": \"Dear Customer, ...\"}"}}]}
Extract: response.choices[0].message.content, parsed as JSON to access 'subject' and 'body'
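The extract step can be sketched in plain Python with the standard json module; the content string below is an abbreviated stand-in for a real API response:

```python
import json

# Stand-in for response.choices[0].message.content from the trace above
content = '{"subject": "Apology and Resolution for Your Recent Experience", "body": "Dear Customer, ..."}'

parsed = json.loads(content)
print("Subject:", parsed["subject"])
print("Body:", parsed["body"])
```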

Variants

Streaming output with guardrails

Use streaming when you want to display the AI's response incrementally for better user experience.

python
import os
from guardrails import Guard
from openai import OpenAI

client = OpenAI(api_key=os.environ["OPENAI_API_KEY"])

schema = """
version: 1
rails:
  - name: polite_email
    type: completion
    prompt: |
      Generate a polite email response to the customer complaint below:
      {input}
    output:
      type: object
      properties:
        subject:
          type: string
        body:
          type: string
    streaming: true
"""

guard = Guard.from_rail_string(schema, llm=client)
prompt = "The customer is unhappy about a delayed shipment."

for chunk in guard.stream_generate(input=prompt):
    print(chunk, end='')
Async integration with guardrails

Use async when integrating into asynchronous Python applications or frameworks for concurrency.

python
import os
import asyncio
from guardrails import Guard
from openai import AsyncOpenAI

async def main():
    client = AsyncOpenAI(api_key=os.environ["OPENAI_API_KEY"])

    schema = """
version: 1
rails:
  - name: polite_email
    type: completion
    prompt: |
      Generate a polite email response to the customer complaint below:
      {input}
    output:
      type: object
      properties:
        subject:
          type: string
        body:
          type: string
    """

    guard = Guard.from_rail_string(schema, llm=client)
    prompt = "The customer is unhappy about a delayed shipment."

    response = await guard.agenerate(input=prompt)
    print("Subject:", response.subject)
    print("Body:", response.body)

asyncio.run(main())
Using a JSON schema file with guardrails

Use external schema files to manage complex or reusable guardrail definitions separately from code.

python
import os
from guardrails import Guard
from openai import OpenAI

client = OpenAI(api_key=os.environ["OPENAI_API_KEY"])

# Load schema from external JSON file
with open("email_schema.json", "r") as f:
    schema_json = f.read()

guard = Guard.from_rail_string(schema_json, llm=client)

prompt = "The customer is unhappy about a delayed shipment."
response = guard.generate(input=prompt)
print("Subject:", response.subject)
print("Body:", response.body)

Performance

Latency: ~800ms for a typical non-streaming OpenAI GPT-4o call with guardrails
Cost: ~$0.002 per 500 tokens exchanged when using OpenAI GPT-4o
Rate limits: Depends on the underlying LLM provider; OpenAI's default tier is 3500 RPM / 90K TPM
  • Keep prompts concise to reduce token usage.
  • Use guardrails to enforce structured outputs, avoiding costly retries.
  • Cache frequent prompts and responses when possible.
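The caching tip can be sketched with an in-memory dict keyed by a hash of the prompt. cached_generate and the fake_generate stub are illustrative helpers, not part of the guardrails SDK:

```python
import hashlib

_cache = {}

def cached_generate(prompt, generate_fn):
    """Serve repeated prompts from an in-memory cache instead of re-calling the LLM."""
    key = hashlib.sha256(prompt.encode("utf-8")).hexdigest()
    if key not in _cache:
        _cache[key] = generate_fn(prompt)
    return _cache[key]

# Stub LLM call that counts how often it is actually invoked
calls = {"n": 0}
def fake_generate(prompt):
    calls["n"] += 1
    return {"subject": "Re: " + prompt, "body": "..."}

cached_generate("delayed shipment", fake_generate)
cached_generate("delayed shipment", fake_generate)
print(calls["n"])  # → 1: the second call is served from the cache
```

In production you would also bound the cache size and expire entries, since identical prompts can legitimately warrant fresh responses.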
Approach | Latency | Cost/call | Best for
Standard guardrails with OpenAI GPT-4o | ~800ms | ~$0.002 | Reliable structured output with safety
Streaming guardrails output | ~600ms initial + streaming | ~$0.002 | Better UX for long responses
Async guardrails integration | ~800ms | ~$0.002 | Concurrent applications and web servers

Quick tip

Define clear output schemas in your guardrails to catch and handle unexpected AI responses early.
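One lightweight way to apply this tip, independent of any SDK: check the model's parsed output against the fields you expect before using it. validate_output and email_schema below are hypothetical helpers, not guardrails APIs:

```python
def validate_output(data, required):
    """Return a list of problems; an empty list means the output matches the schema."""
    problems = []
    for field, expected_type in required.items():
        if field not in data:
            problems.append(f"missing field: {field}")
        elif not isinstance(data[field], expected_type):
            problems.append(f"wrong type for {field}: expected {expected_type.__name__}")
    return problems

email_schema = {"subject": str, "body": str}
good = {"subject": "Apology", "body": "Dear Customer, ..."}
bad = {"subject": "Apology"}  # body is missing

print(validate_output(good, email_schema))  # → []
print(validate_output(bad, email_schema))   # → ['missing field: body']
```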

Common mistake

Forgetting to pass the LLM client when constructing the Guard leaves the guardrails with no model to call, so outputs are neither generated nor validated.

Verified 2026-04 · gpt-4o