How to evaluate LLM outputs with Langfuse
Quick answer
Use the Langfuse Python SDK to instrument your LLM calls by wrapping them with the @observe() decorator or manual tracking methods. This automatically captures inputs, outputs, and metadata, enabling detailed evaluation and monitoring of LLM outputs in your applications.

Prerequisites
- Python 3.8+
- pip install langfuse
- An OpenAI API key (or another LLM provider's API key)
- Langfuse public and secret keys
Setup
Install the langfuse Python package and set environment variables for your Langfuse public and secret keys. These credentials let the SDK send traces securely to Langfuse Cloud.
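The environment variables can be set in your shell before running the examples below; the key values here are placeholders for your own project credentials:

```shell
# Placeholder credentials: replace with the keys from your Langfuse project settings
export LANGFUSE_PUBLIC_KEY="pk-lf-your-public-key"
export LANGFUSE_SECRET_KEY="sk-lf-your-secret-key"
# Optional: override the Langfuse host (https://cloud.langfuse.com is the default)
export LANGFUSE_HOST="https://cloud.langfuse.com"
```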
pip install langfuse

Step by step
Initialize the Langfuse client in Python, then decorate your LLM call function with @observe() to automatically capture inputs and outputs. Call the function with a prompt to see tracked results.
```python
import os

from langfuse import Langfuse
from langfuse.decorators import observe
from openai import OpenAI

# Initialize the Langfuse client with your keys
langfuse = Langfuse(
    public_key=os.environ["LANGFUSE_PUBLIC_KEY"],
    secret_key=os.environ["LANGFUSE_SECRET_KEY"],
    host="https://cloud.langfuse.com",
)

# Initialize the OpenAI client
client = OpenAI(api_key=os.environ["OPENAI_API_KEY"])

@observe()
def generate_text(prompt: str) -> str:
    # The decorator captures the prompt, the returned text, and timing metadata
    response = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[{"role": "user", "content": prompt}],
    )
    return response.choices[0].message.content

if __name__ == "__main__":
    output = generate_text("Explain Langfuse observability.")
    print("LLM output:", output)
```

Output
LLM output: Langfuse provides automatic tracing and observability for your LLM calls, capturing inputs, outputs, and metadata for evaluation.
Common variations
- Use @observe() on async functions for asynchronous LLM calls.
- Instrument calls manually with the low-level client, e.g. creating spans with langfuse.span() and closing them with span.end(), for custom instrumentation.
- Integrate with other LLM providers by wrapping their client calls similarly.
```python
import asyncio

from langfuse.decorators import observe

@observe()
async def async_generate_text(prompt: str) -> str:
    # Example async LLM call (replace with an actual async client)
    await asyncio.sleep(0.1)
    return f"Async response for: {prompt}"

async def main():
    output = await async_generate_text("Async Langfuse example.")
    print(output)

if __name__ == "__main__":
    asyncio.run(main())
```

Output
Async response for: Async Langfuse example.
Troubleshooting
- If you see no data in the Langfuse dashboard, verify that your LANGFUSE_PUBLIC_KEY and LANGFUSE_SECRET_KEY environment variables are set correctly.
- Ensure network connectivity to https://cloud.langfuse.com (or your self-hosted instance).
- Check that your LLM calls are wrapped with @observe() or instrumented manually.
- In short-lived scripts, flush the client before exit; events are sent in batches from a background thread and can be lost if the process terminates first.
Key Takeaways
- Use the Langfuse Python SDK and @observe() decorator to automatically capture LLM inputs and outputs.
- Initialize Langfuse with your public and secret keys to enable secure tracking.
- You can manually instrument calls for custom evaluation scenarios.
- Async and sync LLM calls are both supported with Langfuse decorators.
- Verify environment variables and network access if tracking data does not appear.