ScoreNotFoundError
langfuse.evaluation.errors.ScoreNotFoundError
Stack trace
langfuse.evaluation.errors.ScoreNotFoundError: Could not find evaluation score in the provided data or LLM output
File "/usr/local/lib/python3.9/site-packages/langfuse/evaluation/evaluator.py", line 112, in evaluate
raise ScoreNotFoundError("Score not found in evaluation output")
File "/usr/local/lib/python3.9/site-packages/langfuse/evaluation/evaluator.py", line 85, in _extract_score
raise ScoreNotFoundError("Score key missing in evaluation response")
Why it happens
Langfuse expects the evaluation output to contain a specific score key or field. This error is raised when neither the evaluation data nor the LLM output includes that score, typically because the prompt never asks for it, the output is parsed incorrectly, or the evaluation setup is misconfigured.
Detection
Wrap evaluation calls in try/except ScoreNotFoundError and log the raw evaluation output before the error propagates, so you can check whether the score field is missing or malformed.
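A minimal sketch of this detection pattern using only the standard library. `ScoreNotFoundError` here is a local stand-in for the Langfuse exception, and `extract_score` is a hypothetical extractor that reads a JSON key named `score`; neither is Langfuse's actual implementation. The wrapper logs the raw output and then re-raises, so the error still propagates.

```python
import json
import logging

logging.basicConfig(level=logging.INFO)
logger = logging.getLogger("evaluation")


class ScoreNotFoundError(Exception):
    """Local stand-in for langfuse.evaluation.errors.ScoreNotFoundError (assumption)."""


def extract_score(raw_output: str, key: str = "score") -> float:
    """Hypothetical extractor: parse the judge output as JSON and read `key`."""
    try:
        data = json.loads(raw_output)
    except json.JSONDecodeError as exc:
        raise ScoreNotFoundError("Evaluation output is not valid JSON") from exc
    if not isinstance(data, dict) or key not in data:
        raise ScoreNotFoundError(f"Score key {key!r} missing in evaluation response")
    try:
        return float(data[key])
    except (TypeError, ValueError) as exc:
        raise ScoreNotFoundError(f"Score value {data[key]!r} is not numeric") from exc


def extract_score_logged(raw_output: str, key: str = "score") -> float:
    """Extract the score; on failure, log the raw output, then re-raise."""
    try:
        return extract_score(raw_output, key)
    except ScoreNotFoundError:
        # Record exactly what the judge returned so you can tell whether the
        # key is absent, renamed, or buried in free-form prose.
        logger.exception("Score not found; raw evaluation output: %r", raw_output)
        raise


print(extract_score_logged('{"score": 7}'))  # prints 7.0
```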
Causes & fixes
The evaluation prompt does not instruct the LLM to return a score field.
Update the evaluation prompt to explicitly request a numeric score under a clearly named key that matches Langfuse's expected schema.
The output parser or evaluation extractor is misconfigured and cannot find the score key.
Ensure the output parser schema matches the exact key name and format of the score in the LLM response or evaluation data.
The LLM ignores or only partially follows the evaluation prompt instructions.
Switch to an instruction-tuned model that reliably follows evaluation prompts, or strengthen the prompt (for example, by requiring a fixed JSON format) so the score is always present in the output.
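The three fixes above can be combined in one place: name the key in the prompt, parse exactly that key, and re-prompt if the model does not comply. This is a sketch under assumptions: `call_llm` is a hypothetical function that sends a prompt to your judge model and returns its text, and the key name `score` and the 1-10 integer range are illustrative choices, not Langfuse requirements.

```python
import json
from typing import Callable

SCORE_KEY = "score"  # must match the key your extractor or parser reads


def build_judge_prompt(response: str, key: str = SCORE_KEY) -> str:
    """Judge prompt that names the score key and pins the output format."""
    return (
        "Rate the response below from 1 to 10.\n"
        f'Reply with JSON only, exactly in this form: {{"{key}": <integer 1-10>}}\n'
        f"Response: {response}"
    )


def parse_score(raw: str, key: str = SCORE_KEY) -> int:
    """Read `key` from the judge's JSON reply; raise ValueError if absent or invalid."""
    data = json.loads(raw)  # json.JSONDecodeError is a subclass of ValueError
    if not isinstance(data, dict) or key not in data:
        raise ValueError(f"expected a JSON object with key {key!r}, got {raw!r}")
    value = data[key]
    if isinstance(value, bool) or not isinstance(value, int) or not 1 <= value <= 10:
        raise ValueError(f"{key!r} must be an integer from 1 to 10, got {value!r}")
    return value


def judge_with_retry(call_llm: Callable[[str], str], response: str, attempts: int = 2) -> int:
    """Ask the judge for a score, re-prompting if the reply does not contain one."""
    prompt = build_judge_prompt(response)
    raw = ""
    for _ in range(attempts):
        raw = call_llm(prompt)
        try:
            return parse_score(raw)
        except ValueError:
            # Remind a non-compliant model of the required format and try again.
            prompt += f"\nYour last reply was {raw!r}. Return only the JSON object."
    raise ValueError(f"no valid score after {attempts} attempts; last reply: {raw!r}")


# Usage with a fake judge that ignores the format once, then complies.
replies = iter(["Score: 8", '{"score": 8}'])
print(judge_with_retry(lambda prompt: next(replies), "Good answer."))  # prints 8
```

Keeping the key name in a single constant means the prompt and the parser cannot drift apart, which is the mismatch described in the second cause.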
Code: broken vs fixed
import os
from langfuse import EvaluationClient
client = EvaluationClient(api_key=os.environ['LANGFUSE_API_KEY'])
# Broken: the prompt never asks the judge for a 'score' key
result = client.evaluate(prompt="Rate this response from 1 to 10.", response="Good answer.")  # triggers ScoreNotFoundError
print(result.score)

import os
from langfuse import EvaluationClient
client = EvaluationClient(api_key=os.environ['LANGFUSE_API_KEY'])
# Added explicit score instruction in prompt
prompt = "Rate this response from 1 to 10 and return JSON with key 'score'."
response = "Good answer."
result = client.evaluate(prompt=prompt, response=response)
print(result.score)  # fixed: score found and printed
Workaround
Wrap the evaluation call in try/except ScoreNotFoundError and, as a fallback, parse the raw LLM output yourself with JSON parsing or a regex to extract the score.
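A minimal fallback parser, assuming you still have the raw judge output as a string (for example, from your own LLM call or from a log line captured while debugging). It tries strict JSON first and then a loose `score: <number>` regex; the key name and the pattern are assumptions to adapt to your own prompt.

```python
import json
import re
from typing import Optional

# Loose pattern for replies like 'Score: 8', 'score = 7.5' or '"score": 9'.
SCORE_PATTERN = re.compile(r'"?score"?\s*[:=]\s*(-?\d+(?:\.\d+)?)', re.IGNORECASE)


def fallback_extract_score(raw_output: str) -> Optional[float]:
    """Best-effort score recovery; returns None when nothing score-like is found."""
    # 1. Strict path: the whole reply is a JSON object containing "score".
    try:
        data = json.loads(raw_output)
        if isinstance(data, dict) and "score" in data:
            return float(data["score"])
    except (TypeError, ValueError):  # json.JSONDecodeError is a ValueError subclass
        pass
    # 2. Loose path: a "score: <number>" fragment anywhere in free-form text.
    match = SCORE_PATTERN.search(raw_output)
    if match:
        return float(match.group(1))
    return None


# Hypothetical integration with the client call from the code example above:
# try:
#     score = client.evaluate(prompt=prompt, response=response).score
# except ScoreNotFoundError:
#     score = fallback_extract_score(raw_output)  # raw_output obtained separately
```

Returning None instead of raising keeps the fallback from masking the original error: the caller decides whether a missing score should skip the item or fail the run.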
Prevention
Use structured output formats and strict prompt templates so that evaluation scores are reliably present, or leverage Langfuse's built-in schema validation to catch missing scores early.
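One way to catch missing scores early, sketched with only the standard library: define the expected structure once and validate every judge reply against it before recording the score. The `EvalResult` fields (`score`, `reasoning`) and the 1-10 range are illustrative assumptions; a schema-validation library would apply the same checks.

```python
import json
from dataclasses import dataclass


@dataclass(frozen=True)
class EvalResult:
    """Expected shape of a judge reply (illustrative schema)."""
    score: float
    reasoning: str = ""


REQUIRED_FIELDS = {"score"}
ALLOWED_FIELDS = {"score", "reasoning"}


def validate_eval_output(raw: str, low: float = 1, high: float = 10) -> EvalResult:
    """Parse a judge reply and fail fast if the score is missing or out of range."""
    data = json.loads(raw)  # raises json.JSONDecodeError (a ValueError) on non-JSON
    if not isinstance(data, dict):
        raise ValueError(f"expected a JSON object, got {type(data).__name__}")
    missing = REQUIRED_FIELDS - set(data)
    if missing:
        raise ValueError(f"missing required field(s): {sorted(missing)}")
    unexpected = set(data) - ALLOWED_FIELDS
    if unexpected:
        raise ValueError(f"unexpected field(s): {sorted(unexpected)}")
    score = data["score"]
    if isinstance(score, bool) or not isinstance(score, (int, float)):
        raise ValueError(f"score must be a number, got {score!r}")
    if not low <= score <= high:
        raise ValueError(f"score {score} outside [{low}, {high}]")
    return EvalResult(score=float(score), reasoning=str(data.get("reasoning", "")))


print(validate_eval_output('{"score": 9, "reasoning": "clear and correct"}'))
```

Rejecting a reply at this point, with a message that names the missing field, surfaces the problem where it originates rather than as a ScoreNotFoundError further down the pipeline.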