How to evaluate LLM outputs with Langfuse
Quick answer
Use the Langfuse Python SDK to instrument your LLM calls by wrapping them with the @observe() decorator or manual tracking methods. This automatically captures inputs, outputs, and metadata, enabling detailed evaluation and monitoring of LLM outputs in your applications.

Prerequisites
- Python 3.8+
- pip install langfuse
- An OpenAI API key (or another LLM provider's API key)
- Langfuse public and secret keys
Setup
Install the langfuse Python package and set environment variables for your Langfuse public and secret keys. These credentials let the SDK send traces securely to Langfuse Cloud.
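The environment variables can be set in your shell before running the examples below; the key values here are placeholders for your own project credentials:

```shell
# Placeholder credentials: replace with the keys from your Langfuse project settings
export LANGFUSE_PUBLIC_KEY="pk-lf-your-public-key"
export LANGFUSE_SECRET_KEY="sk-lf-your-secret-key"
# Optional: override the Langfuse host (https://cloud.langfuse.com is the default)
export LANGFUSE_HOST="https://cloud.langfuse.com"
```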
pip install langfuse

Step by step
Initialize the Langfuse client in Python, then decorate your LLM call function with @observe() to automatically capture inputs and outputs. Call the function with a prompt to see tracked results.
```python
import os

from langfuse import Langfuse
from langfuse.decorators import observe
from openai import OpenAI

# Initialize the Langfuse client with your keys
langfuse = Langfuse(
    public_key=os.environ["LANGFUSE_PUBLIC_KEY"],
    secret_key=os.environ["LANGFUSE_SECRET_KEY"],
    host="https://cloud.langfuse.com",
)

# Initialize the OpenAI client
client = OpenAI(api_key=os.environ["OPENAI_API_KEY"])

@observe()
def generate_text(prompt: str) -> str:
    # The decorator captures the prompt, the returned text, and timing metadata
    response = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[{"role": "user", "content": prompt}],
    )
    return response.choices[0].message.content

if __name__ == "__main__":
    output = generate_text("Explain Langfuse observability.")
    print("LLM output:", output)
```

Output
LLM output: Langfuse provides automatic tracing and observability for your LLM calls, capturing inputs, outputs, and metadata for evaluation.
Common variations
- Use @observe() on async functions for asynchronous LLM calls.
- Instrument calls manually with the low-level client, e.g. creating spans with langfuse.span() and closing them with span.end(), for custom instrumentation.
- Integrate with other LLM providers by wrapping their client calls similarly.
```python
import asyncio

from langfuse.decorators import observe

@observe()
async def async_generate_text(prompt: str) -> str:
    # Example async LLM call (replace with an actual async client)
    await asyncio.sleep(0.1)
    return f"Async response for: {prompt}"

async def main():
    output = await async_generate_text("Async Langfuse example.")
    print(output)

if __name__ == "__main__":
    asyncio.run(main())
```

Output
Async response for: Async Langfuse example.
Troubleshooting
- If you see no data in the Langfuse dashboard, verify that your LANGFUSE_PUBLIC_KEY and LANGFUSE_SECRET_KEY environment variables are set correctly.
- Ensure network connectivity to https://cloud.langfuse.com (or your self-hosted instance).
- Check that your LLM calls are wrapped with @observe() or instrumented manually.
- In short-lived scripts, flush the client before exit; events are sent in batches from a background thread and can be lost if the process terminates first.
Key Takeaways
- Use the Langfuse Python SDK and @observe() decorator to automatically capture LLM inputs and outputs.
- Initialize Langfuse with your public and secret keys to enable secure tracking.
- You can manually instrument calls for custom evaluation scenarios.
- Async and sync LLM calls are both supported with Langfuse decorators.
- Verify environment variables and network access if tracking data does not appear.