Langfuse LLM-as-judge setup
Quick answer
Use the Langfuse Python SDK to initialize a client with your API keys, then run a model as a judge by wrapping your model calls with the @observe decorator or using the Langfuse client directly. This setup enables automatic tracing and evaluation of LLM outputs in your AI workflows.

Prerequisites
- Python 3.8+
- OpenAI API key (or another LLM provider's API key)
- pip install langfuse openai
Setup
Install the langfuse Python package and set your environment variables for Langfuse and your LLM provider (e.g., OpenAI). This enables tracing and evaluation features.
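For example, the required keys can be exported as environment variables before running the snippets below. The values here are placeholders (Langfuse public keys typically start with pk-lf- and secret keys with sk-lf-):

```shell
# Placeholder keys -- replace with your own values
# from the Langfuse project settings and your LLM provider.
export LANGFUSE_PUBLIC_KEY="pk-lf-..."
export LANGFUSE_SECRET_KEY="sk-lf-..."
export OPENAI_API_KEY="sk-..."
```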
pip install langfuse openai

Step by step
Initialize the Langfuse client with your public and secret keys, then define a function that calls your LLM, wrapped with the @observe decorator so each judging call is traced.
import os
from langfuse import Langfuse
from langfuse.decorators import observe
from openai import OpenAI
# Initialize Langfuse client
langfuse = Langfuse(
    public_key=os.environ["LANGFUSE_PUBLIC_KEY"],
    secret_key=os.environ["LANGFUSE_SECRET_KEY"],
    host="https://cloud.langfuse.com"
)
# Initialize OpenAI client
client = OpenAI(api_key=os.environ["OPENAI_API_KEY"])
@observe()
def llm_judge(prompt: str) -> str:
    response = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[{"role": "user", "content": prompt}]
    )
    return response.choices[0].message.content
if __name__ == "__main__":
    prompt = "Judge if the following statement is true: The sky is green."
    result = llm_judge(prompt)
    print("Judge output:", result)

Output
Judge output: The statement is false. The sky is blue under normal conditions.
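To use the verdict downstream, the judge's free-text answer can be converted into a numeric score. The sketch below is an assumption about the judge's phrasing (it keys on the words "true"/"false"), not part of the Langfuse API; the resulting value could then be attached to the trace, e.g. via the SDK's score method:

```python
def verdict_to_score(verdict: str) -> float:
    """Map the judge's free-text verdict to a numeric score.

    The true/false keyword check is an assumption about how the
    judge phrases its answer; adjust it to your judging prompt.
    """
    text = verdict.lower()
    if "false" in text:
        return 0.0
    if "true" in text:
        return 1.0
    return 0.5  # no clear verdict in the judge's reply

# The value can then be recorded against the trace, e.g. with
# langfuse.score(trace_id=..., name="correctness", value=score).

print(verdict_to_score("The statement is false. The sky is blue."))  # 0.0
```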
Common variations
- Use async functions with @observe() for asynchronous LLM calls.
- Swap the OpenAI client for another supported LLM client (e.g., Anthropic, Mistral), initializing Langfuse the same way.
- Stream responses by integrating Langfuse with streaming LLM calls and observing partial outputs.
import asyncio
import os

from langfuse import Langfuse
from langfuse.decorators import observe
from openai import AsyncOpenAI  # the async client is required for awaited calls

langfuse = Langfuse(
    public_key=os.environ["LANGFUSE_PUBLIC_KEY"],
    secret_key=os.environ["LANGFUSE_SECRET_KEY"],
    host="https://cloud.langfuse.com"
)

client = AsyncOpenAI(api_key=os.environ["OPENAI_API_KEY"])

@observe()
async def async_llm_judge(prompt: str) -> str:
    response = await client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[{"role": "user", "content": prompt}]
    )
    return response.choices[0].message.content

async def main():
    prompt = "Judge if the following statement is true: Water boils at 100°C."
    result = await async_llm_judge(prompt)
    print("Async judge output:", result)

if __name__ == "__main__":
    asyncio.run(main())

Output
Async judge output: The statement is true. Water boils at 100°C at standard atmospheric pressure.
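The streaming variation can be sketched as follows. This is an illustrative sketch rather than an official Langfuse recipe: join_deltas is a hypothetical helper for stitching streamed chunks back together, and the traced call is shown in the comment so the helper stands on its own.

```python
def join_deltas(deltas):
    """Concatenate text deltas from a streamed chat completion,
    skipping the None deltas that role/finish chunks carry."""
    return "".join(d for d in deltas if d)

# Sketch of wiring it up (assumes the same env vars as above):
#
#     from langfuse.decorators import observe
#     from openai import OpenAI
#
#     client = OpenAI()
#
#     @observe()  # the trace records the fully joined output
#     def streaming_judge(prompt: str) -> str:
#         stream = client.chat.completions.create(
#             model="gpt-4o-mini",
#             messages=[{"role": "user", "content": prompt}],
#             stream=True,
#         )
#         return join_deltas(chunk.choices[0].delta.content for chunk in stream)

print(join_deltas(["The statement ", None, "is true."]))  # The statement is true.
```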
Troubleshooting
- If you see authentication errors, verify that your LANGFUSE_PUBLIC_KEY and LANGFUSE_SECRET_KEY environment variables are set correctly; calling langfuse.auth_check() can confirm your credentials.
- For missing API key errors, ensure your LLM provider's key (e.g., OPENAI_API_KEY) is set in your environment.
- If @observe does not trace calls, confirm you installed the latest langfuse package and imported the decorator properly.
Key Takeaways
- Use the Langfuse Python SDK with @observe to enable LLM judging and tracing.
- Always set the LANGFUSE_PUBLIC_KEY and LANGFUSE_SECRET_KEY environment variables for authentication.
- You can use any OpenAI-compatible LLM client with Langfuse for judging AI outputs.
- Async and streaming LLM calls are supported with Langfuse's decorator.
- Troubleshoot by verifying environment variables and package versions.