How to create custom DSPy metrics
Quick answer
In DSPy, a metric is just a Python function with the signature `metric(example, prediction, trace=None)` that returns a score (a bool, int, or float). Write the scoring logic you need, then pass the function to `dspy.Evaluate` or to an optimizer to evaluate the outputs of your `dspy.Predict` or `dspy.ChainOfThought` programs.
Prerequisites
- Python 3.8+
- OpenAI API key (free tier works)
- `pip install dspy openai>=1.0`
Setup
Install the dspy package and set your OpenAI API key as an environment variable.
- Install DSPy and the OpenAI SDK:

```shell
pip install dspy "openai>=1.0"
```

Step by step
Define a custom metric as a plain function that takes a `dspy.Example` and a model prediction and returns a score, then use it to evaluate a DSPy program's outputs.
```python
import os

import dspy

# Initialize DSPy with OpenAI GPT-4o-mini
lm = dspy.LM("openai/gpt-4o-mini", api_key=os.environ["OPENAI_API_KEY"])
dspy.configure(lm=lm)

# A DSPy metric is a plain function: (example, prediction, trace=None) -> score
def exact_match_metric(example, prediction, trace=None):
    # Return 1.0 if the predicted answer exactly matches the reference, else 0.0
    return 1.0 if prediction.answer.strip() == example.answer.strip() else 0.0

# Define a DSPy signature for QA
class QA(dspy.Signature):
    question: str = dspy.InputField()
    answer: str = dspy.OutputField()

qa = dspy.Predict(QA)

# Build a labeled example; with_inputs marks which fields are model inputs
example = dspy.Example(question="What is 2 + 2?", answer="4").with_inputs("question")

# Run the program and score its output with the custom metric
prediction = qa(question=example.question)
print("Prediction:", prediction.answer)
print("Exact match score:", exact_match_metric(example, prediction))
```

Output

```
Prediction: 4
Exact match score: 1.0
```
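Because the metric is a plain function that only reads `.answer` attributes, you can unit-test it offline with lightweight stand-in objects before wiring it into DSPy — a sketch, with `SimpleNamespace` used purely as a stub:

```python
from types import SimpleNamespace

def exact_match_metric(example, prediction, trace=None):
    # Strict string equality after trimming whitespace
    return 1.0 if prediction.answer.strip() == example.answer.strip() else 0.0

# No model calls needed: any objects with an .answer attribute will do
reference = SimpleNamespace(answer="4")
assert exact_match_metric(reference, SimpleNamespace(answer=" 4 ")) == 1.0
assert exact_match_metric(reference, SimpleNamespace(answer="5")) == 0.0
print("metric unit tests passed")
```

This keeps the feedback loop fast: you validate the scoring logic before spending any API calls.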
Common variations
Because a metric is an ordinary function, it can return graded scores rather than just 0/1, and it can be as simple or as elaborate as you like (including calling another DSPy program as an LLM judge). Swap models by changing the `dspy.LM` initialization. To score a whole dev set instead of a single example, pass the metric to `dspy.Evaluate`.
A graded metric, for example, can reward partial answers by length:

```python
def length_ratio_metric(example, prediction, trace=None):
    # Graded score: ratio of prediction length to reference length, capped at 1.0
    ratio = len(prediction.answer) / max(len(example.answer), 1)
    return min(ratio, 1.0)
```

Usage is the same: pass the function as the `metric` argument of `dspy.Evaluate` or of an optimizer.

Troubleshooting
- If your custom metric is not being called, check that it is passed via the `metric` argument of `dspy.Evaluate` or your optimizer, and that its signature matches `metric(example, prediction, trace=None)`.
- If you get attribute errors inside the metric, make sure the field names you read (such as `prediction.answer`) match your signature's output fields.
- If you get type errors, verify your DSPy and OpenAI SDK versions are up to date.
- For environment variable issues, confirm `OPENAI_API_KEY` is set correctly in your shell.
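A small helper can surface key problems early instead of deep inside the first LM call. This is a hypothetical utility (the name `check_api_key` is ours, not DSPy's), and it assumes OpenAI keys begin with `sk-`:

```python
import os

def check_api_key(env=None):
    # Return a diagnostic string rather than raising inside an API call
    env = os.environ if env is None else env
    key = env.get("OPENAI_API_KEY", "")
    if not key:
        return "OPENAI_API_KEY is not set"
    if not key.startswith("sk-"):
        return "OPENAI_API_KEY does not look like an OpenAI key"
    return "OK"

print(check_api_key({"OPENAI_API_KEY": "sk-example"}))  # prints OK
```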
Key Takeaways
- Create custom DSPy metrics as plain functions with the signature `metric(example, prediction, trace=None)` that return a bool, int, or float.
- Use custom metrics with `dspy.Evaluate` or an optimizer to evaluate and tune `dspy.Predict` or `dspy.ChainOfThought` programs.
- Return a graded float for evaluation; when `trace` is not None (during optimization), return a strict bool.
- Always configure DSPy with a valid `dspy.LM` instance and an API key from environment variables.
- Check the metric signature and environment setup if custom metrics do not run as expected.