How to use Llama Guard for content moderation
Quick answer
Use Llama Guard by defining moderation guardrails as YAML or JSON schemas and integrating them with your LLM calls to automatically detect and block unsafe content. The llama_guard Python package enforces these content policies during generation.

Prerequisites
- Python 3.8+
- pip install llama_guard
- An LLM API key (e.g., OpenAI or Anthropic)
- Basic knowledge of YAML or JSON
Setup
Install the llama_guard package and prepare your environment. You need Python 3.8 or higher and an API key for your LLM provider (OpenAI, Anthropic, etc.).
- Install via pip:

```
pip install llama_guard
```

Step by step
Define a guardrail schema to specify moderation rules, then use llama_guard to enforce these rules on your LLM's output. Below is a complete example using OpenAI's gpt-4o model.
```python
import os

from llama_guard import Guard
from openai import OpenAI

# Define a simple guardrail schema in YAML format
guard_yaml = '''
- id: content_moderation
  prompt: |
    You are a content moderator. Block any text containing hate speech, violence, or adult content.
  output:
    type: bool
    description: Whether the content is safe (true) or unsafe (false).
'''

# Initialize the guard
guard = Guard.from_yaml(guard_yaml)

# Initialize the OpenAI client
client = OpenAI(api_key=os.environ["OPENAI_API_KEY"])

# Input prompt
user_prompt = "Write a story about kindness and friendship."

# Generate a completion
response = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": user_prompt}],
)
output_text = response.choices[0].message.content

# Run the guardrail check
result = guard.check(output_text)

print("Generated text:", output_text)
print("Content safe:", result.is_valid)
if not result.is_valid:
    print("Blocked content due to policy violation.")
```

Output:

```
Generated text: Once upon a time, two friends showed kindness to everyone they met...
Content safe: True
```
Common variations
You can use llama_guard with different LLM providers by adapting the client initialization. Guard schemas can be more complex, including regex patterns or structured outputs. Async usage is possible by integrating with async LLM clients.
```python
import asyncio
import os

from llama_guard import Guard
from openai import AsyncOpenAI

async def main():
    guard_yaml = '''
- id: content_moderation
  prompt: |
    Block any text with hate speech, violence, or adult content.
  output:
    type: bool
    description: Whether content is safe.
'''
    guard = Guard.from_yaml(guard_yaml)

    # Use the async client so the request can be awaited
    client = AsyncOpenAI(api_key=os.environ["OPENAI_API_KEY"])

    user_prompt = "Tell me a joke about cats."
    response = await client.chat.completions.create(
        model="gpt-4o",
        messages=[{"role": "user", "content": user_prompt}],
    )
    output_text = response.choices[0].message.content

    result = guard.check(output_text)
    print("Generated text:", output_text)
    print("Content safe:", result.is_valid)

asyncio.run(main())
```

Output:

```
Generated text: Why did the cat sit on the computer? Because it wanted to keep an eye on the mouse!
Content safe: True
```
Troubleshooting
- If guard.check() always returns invalid, verify your guard schema syntax and ensure the output matches the expected types.
- For API errors, confirm the OPENAI_API_KEY environment variable is set correctly.
- Use logging or verbose mode in llama_guard to debug rule matching.
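One way to get that extra visibility is through Python's standard logging module. This is a sketch, assuming llama_guard emits its rule-matching decisions via a logger named after the package; that naming is an assumption, so check the package's own docs for its actual verbose option.

```python
import logging

# Assumption: llama_guard logs through Python's standard logging module
# under the "llama_guard" logger name. If so, raising the level to DEBUG
# surfaces its rule-matching decisions on the console.
logging.basicConfig(level=logging.DEBUG)
logging.getLogger("llama_guard").setLevel(logging.DEBUG)
```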
Key Takeaways
- Use llama_guard to define and enforce content moderation rules declaratively.
- Integrate guard checks immediately after LLM output to block unsafe content.
- Guard schemas are flexible and support complex validation logic.
- Works with any LLM client by adapting the input/output handling.
- Async and sync usage patterns are both supported for modern Python apps.
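The "check immediately after generation" pattern from the takeaways can be sketched without any real provider or API key. In the sketch below, SimpleGuard and FakeClient are hypothetical stand-ins (not part of llama_guard or any SDK) that only mimic the Guard.check / result.is_valid interface shown earlier; swap in a real client and guard for production use.

```python
# Hypothetical sketch of the moderate-after-generation pattern.
# SimpleGuard and FakeClient are stand-ins, not real llama_guard APIs.
from dataclasses import dataclass

@dataclass
class CheckResult:
    is_valid: bool

class SimpleGuard:
    """Stand-in guard: flags text containing any banned term."""
    def __init__(self, banned):
        self.banned = [term.lower() for term in banned]

    def check(self, text):
        lowered = text.lower()
        return CheckResult(not any(term in lowered for term in self.banned))

class FakeClient:
    """Stand-in for an arbitrary LLM client: returns a canned completion."""
    def complete(self, prompt):
        return "Once upon a time, two friends were kind to everyone."

def moderated_complete(client, guard, prompt, fallback="[blocked]"):
    text = client.complete(prompt)
    result = guard.check(text)  # check immediately after generation
    return text if result.is_valid else fallback

guard = SimpleGuard(banned=["violence", "hate"])
print(moderated_complete(FakeClient(), guard, "Write a kind story."))
# → Once upon a time, two friends were kind to everyone.
```

Because moderated_complete only assumes a `complete(prompt)` method on the client and a `check(text)` method on the guard, the same wrapper adapts to any provider by wrapping its SDK call in that interface.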