How to use Llama Guard for content moderation
Quick answer
Use Llama Guard by defining moderation guardrails as YAML or JSON schemas and integrating them with your LLM calls to automatically detect and block unsafe content. The llama_guard Python package enforces these content policies during generation.

Prerequisites
- Python 3.8+
- pip install llama_guard
- An LLM API key (e.g., OpenAI or Anthropic)
- Basic knowledge of YAML or JSON
Setup
Install the llama_guard package and prepare your environment. You need Python 3.8 or higher and an API key for your LLM provider (OpenAI, Anthropic, etc.).
- Install via pip:

```
pip install llama_guard
```

Step by step
Define a guardrail schema to specify moderation rules, then use llama_guard to enforce these rules on your LLM's output. Below is a complete example using OpenAI's gpt-4o model.
```python
import os

from llama_guard import Guard
from openai import OpenAI

# Define a simple guardrail schema in YAML format
guard_yaml = '''
- id: content_moderation
  prompt: |
    You are a content moderator. Block any text containing hate speech, violence, or adult content.
  output:
    type: bool
    description: Whether the content is safe (true) or unsafe (false).
'''

# Initialize the guard
guard = Guard.from_yaml(guard_yaml)

# Initialize the OpenAI client
client = OpenAI(api_key=os.environ["OPENAI_API_KEY"])

# Input prompt
user_prompt = "Write a story about kindness and friendship."

# Generate a completion
response = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": user_prompt}],
)
output_text = response.choices[0].message.content

# Run the guardrail check
result = guard.check(output_text)

print("Generated text:", output_text)
print("Content safe:", result.is_valid)
if not result.is_valid:
    print("Blocked content due to policy violation.")
```

Output:

```
Generated text: Once upon a time, two friends showed kindness to everyone they met...
Content safe: True
```
Common variations
You can use llama_guard with different LLM providers by adapting the client initialization. Guard schemas can be more complex, including regex patterns or structured outputs. Async usage is possible by integrating with async LLM clients.
```python
import asyncio
import os

from llama_guard import Guard
from openai import AsyncOpenAI

async def main():
    guard_yaml = '''
- id: content_moderation
  prompt: |
    Block any text with hate speech, violence, or adult content.
  output:
    type: bool
    description: Whether content is safe.
'''
    guard = Guard.from_yaml(guard_yaml)

    # Use the async client so the request can be awaited
    client = AsyncOpenAI(api_key=os.environ["OPENAI_API_KEY"])

    user_prompt = "Tell me a joke about cats."
    response = await client.chat.completions.create(
        model="gpt-4o",
        messages=[{"role": "user", "content": user_prompt}],
    )
    output_text = response.choices[0].message.content

    result = guard.check(output_text)
    print("Generated text:", output_text)
    print("Content safe:", result.is_valid)

asyncio.run(main())
```

Output:

```
Generated text: Why did the cat sit on the computer? Because it wanted to keep an eye on the mouse!
Content safe: True
```
Troubleshooting
- If guard.check() always returns invalid, verify your guard schema syntax and ensure the output matches the expected types.
- For API errors, confirm the OPENAI_API_KEY environment variable is set correctly.
- Use logging or verbose mode in llama_guard to debug rule matching.
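One way to get that extra visibility is through Python's standard logging module. This is a sketch, assuming llama_guard emits its rule-matching decisions via a logger named after the package; that naming is an assumption, so check the package's own docs for its actual verbose option.

```python
import logging

# Assumption: llama_guard logs through Python's standard logging module
# under the "llama_guard" logger name. If so, raising the level to DEBUG
# surfaces its rule-matching decisions on the console.
logging.basicConfig(level=logging.DEBUG)
logging.getLogger("llama_guard").setLevel(logging.DEBUG)
```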
Key Takeaways
- Use llama_guard to define and enforce content moderation rules declaratively.
- Integrate guard checks immediately after LLM output to block unsafe content.
- Guard schemas are flexible and support complex validation logic.
- Works with any LLM client by adapting the input/output handling.
- Async and sync usage patterns are both supported for modern Python apps.
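The "check immediately after generation" pattern from the takeaways can be sketched without any real provider or API key. In the sketch below, SimpleGuard and FakeClient are hypothetical stand-ins (not part of llama_guard or any SDK) that only mimic the Guard.check / result.is_valid interface shown earlier; swap in a real client and guard for production use.

```python
# Hypothetical sketch of the moderate-after-generation pattern.
# SimpleGuard and FakeClient are stand-ins, not real llama_guard APIs.
from dataclasses import dataclass

@dataclass
class CheckResult:
    is_valid: bool

class SimpleGuard:
    """Stand-in guard: flags text containing any banned term."""
    def __init__(self, banned):
        self.banned = [term.lower() for term in banned]

    def check(self, text):
        lowered = text.lower()
        return CheckResult(not any(term in lowered for term in self.banned))

class FakeClient:
    """Stand-in for an arbitrary LLM client: returns a canned completion."""
    def complete(self, prompt):
        return "Once upon a time, two friends were kind to everyone."

def moderated_complete(client, guard, prompt, fallback="[blocked]"):
    text = client.complete(prompt)
    result = guard.check(text)  # check immediately after generation
    return text if result.is_valid else fallback

guard = SimpleGuard(banned=["violence", "hate"])
print(moderated_complete(FakeClient(), guard, "Write a kind story."))
# → Once upon a time, two friends were kind to everyone.
```

Because moderated_complete only assumes a `complete(prompt)` method on the client and a `check(text)` method on the guard, the same wrapper adapts to any provider by wrapping its SDK call in that interface.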