Langfuse LLM-as-judge setup
Quick answer
Use the Langfuse Python SDK to initialize a client with your API keys, then run a model as a judge by wrapping your model calls with the @observe decorator or using the Langfuse client directly. This setup enables automatic tracing and evaluation of LLM outputs in your AI workflows.

Prerequisites
- Python 3.8+
- OpenAI API key (or another LLM provider's API key)
- pip install langfuse openai
Setup
Install the langfuse Python package and set your environment variables for Langfuse and your LLM provider (e.g., OpenAI). This enables tracing and evaluation features.
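For example, the required keys can be exported as environment variables before running the snippets below. The values here are placeholders (Langfuse public keys typically start with pk-lf- and secret keys with sk-lf-):

```shell
# Placeholder keys -- replace with your own values
# from the Langfuse project settings and your LLM provider.
export LANGFUSE_PUBLIC_KEY="pk-lf-..."
export LANGFUSE_SECRET_KEY="sk-lf-..."
export OPENAI_API_KEY="sk-..."
```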
pip install langfuse openai

Step by step
Initialize the Langfuse client with your public and secret keys, then define a function that calls your LLM, wrapped with the @observe decorator so each judging call is traced.
import os
from langfuse import Langfuse
from langfuse.decorators import observe
from openai import OpenAI
# Initialize Langfuse client
langfuse = Langfuse(
    public_key=os.environ["LANGFUSE_PUBLIC_KEY"],
    secret_key=os.environ["LANGFUSE_SECRET_KEY"],
    host="https://cloud.langfuse.com"
)
# Initialize OpenAI client
client = OpenAI(api_key=os.environ["OPENAI_API_KEY"])
@observe()
def llm_judge(prompt: str) -> str:
    response = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[{"role": "user", "content": prompt}]
    )
    return response.choices[0].message.content
if __name__ == "__main__":
    prompt = "Judge if the following statement is true: The sky is green."
    result = llm_judge(prompt)
    print("Judge output:", result)

Output
Judge output: The statement is false. The sky is blue under normal conditions.
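To use the verdict downstream, the judge's free-text answer can be converted into a numeric score. The sketch below is an assumption about the judge's phrasing (it keys on the words "true"/"false"), not part of the Langfuse API; the resulting value could then be attached to the trace, e.g. via the SDK's score method:

```python
def verdict_to_score(verdict: str) -> float:
    """Map the judge's free-text verdict to a numeric score.

    The true/false keyword check is an assumption about how the
    judge phrases its answer; adjust it to your judging prompt.
    """
    text = verdict.lower()
    if "false" in text:
        return 0.0
    if "true" in text:
        return 1.0
    return 0.5  # no clear verdict in the judge's reply

# The value can then be recorded against the trace, e.g. with
# langfuse.score(trace_id=..., name="correctness", value=score).

print(verdict_to_score("The statement is false. The sky is blue."))  # 0.0
```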
Common variations
- Use async functions with @observe() for asynchronous LLM calls.
- Swap the OpenAI client for another supported LLM client (e.g., Anthropic, Mistral), initializing Langfuse the same way.
- Stream responses by integrating Langfuse with streaming LLM calls and observing partial outputs.
import asyncio
import os

from langfuse import Langfuse
from langfuse.decorators import observe
from openai import AsyncOpenAI  # the async client is required for awaited calls

langfuse = Langfuse(
    public_key=os.environ["LANGFUSE_PUBLIC_KEY"],
    secret_key=os.environ["LANGFUSE_SECRET_KEY"],
    host="https://cloud.langfuse.com"
)

client = AsyncOpenAI(api_key=os.environ["OPENAI_API_KEY"])

@observe()
async def async_llm_judge(prompt: str) -> str:
    response = await client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[{"role": "user", "content": prompt}]
    )
    return response.choices[0].message.content

async def main():
    prompt = "Judge if the following statement is true: Water boils at 100°C."
    result = await async_llm_judge(prompt)
    print("Async judge output:", result)

if __name__ == "__main__":
    asyncio.run(main())

Output
Async judge output: The statement is true. Water boils at 100°C at standard atmospheric pressure.
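The streaming variation can be sketched as follows. This is an illustrative sketch rather than an official Langfuse recipe: join_deltas is a hypothetical helper for stitching streamed chunks back together, and the traced call is shown in the comment so the helper stands on its own.

```python
def join_deltas(deltas):
    """Concatenate text deltas from a streamed chat completion,
    skipping the None deltas that role/finish chunks carry."""
    return "".join(d for d in deltas if d)

# Sketch of wiring it up (assumes the same env vars as above):
#
#     from langfuse.decorators import observe
#     from openai import OpenAI
#
#     client = OpenAI()
#
#     @observe()  # the trace records the fully joined output
#     def streaming_judge(prompt: str) -> str:
#         stream = client.chat.completions.create(
#             model="gpt-4o-mini",
#             messages=[{"role": "user", "content": prompt}],
#             stream=True,
#         )
#         return join_deltas(chunk.choices[0].delta.content for chunk in stream)

print(join_deltas(["The statement ", None, "is true."]))  # The statement is true.
```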
Troubleshooting
- If you see authentication errors, verify that your LANGFUSE_PUBLIC_KEY and LANGFUSE_SECRET_KEY environment variables are set correctly; calling langfuse.auth_check() can confirm your credentials.
- For missing API key errors, ensure your LLM provider's key (e.g., OPENAI_API_KEY) is set in your environment.
- If @observe does not trace calls, confirm you installed the latest langfuse package and imported the decorator properly.
Key Takeaways
- Use the Langfuse Python SDK with @observe to enable LLM judging and tracing.
- Always set the LANGFUSE_PUBLIC_KEY and LANGFUSE_SECRET_KEY environment variables for authentication.
- You can use any OpenAI-compatible LLM client with Langfuse for judging AI outputs.
- Async and streaming LLM calls are supported with Langfuse's decorator.
- Troubleshoot by verifying environment variables and package versions.