How-to · Intermediate · 3 min read

How to create custom DSPy metrics

Quick answer
To create a custom metric in DSPy, write a plain Python function that takes a gold example, a prediction, and an optional trace argument, and returns a score (a float, or a bool during optimization). Pass that function to dspy.Evaluate or to an optimizer to score the outputs of modules such as dspy.Predict or dspy.ChainOfThought.

PREREQUISITES

  • Python 3.8+
  • OpenAI API key (free tier works)
  • pip install dspy "openai>=1.0"

Setup

Install the dspy package and set your OpenAI API key as an environment variable.

  • Install DSPy and OpenAI SDK:
bash
pip install dspy "openai>=1.0"
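
If your key is not already set, export it in the same shell before running the examples (the value below is a placeholder, not a real key):

```shell
export OPENAI_API_KEY="sk-your-key-here"
```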

Step by step

Define a custom metric as a plain function that takes a gold example, a prediction, and an optional trace, then use it to score model predictions.

python
import os

import dspy

# Initialize DSPy with OpenAI's gpt-4o-mini
lm = dspy.LM("openai/gpt-4o-mini", api_key=os.environ["OPENAI_API_KEY"])
dspy.configure(lm=lm)

# Define a DSPy signature for QA
class QA(dspy.Signature):
    question: str = dspy.InputField()
    answer: str = dspy.OutputField()

# A custom metric is a plain function: it receives the gold example,
# the model's prediction, and an optional trace, and returns a score
def exact_match_metric(example, prediction, trace=None) -> float:
    return 1.0 if prediction.answer.strip() == example.answer.strip() else 0.0

# Run a prediction
qa = dspy.Predict(QA)
gold = dspy.Example(question="What is 2 + 2?", answer="4").with_inputs("question")
prediction = qa(question=gold.question)

# Score the prediction with the custom metric
print("Prediction:", prediction.answer)
print("Exact match score:", exact_match_metric(gold, prediction))
output
Prediction: 4
Exact match score: 1.0

Common variations

Because a DSPy metric is just a function, it can compute any score you like: exact match, token overlap, numeric tolerance, or even an LLM-as-judge call. During optimization DSPy passes a non-None trace, and the usual convention is to return a graded float when trace is None (evaluation) and a pass/fail bool otherwise. To score an entire dev set, wrap the metric in dspy.Evaluate. To use a different model, change the LM initialization.

python
import dspy

# The same metric can serve evaluation and optimization: by convention,
# return a float when trace is None and a pass/fail bool otherwise
def validate_answer(example, prediction, trace=None):
    score = float(prediction.answer.strip() == example.answer.strip())
    if trace is None:
        return score      # evaluation: a graded score
    return score >= 1.0   # optimization: pass/fail

# Score a whole dev set with dspy.Evaluate
devset = [dspy.Example(question="What is 2 + 2?", answer="4").with_inputs("question")]
evaluator = dspy.Evaluate(devset=devset, metric=validate_answer, display_progress=False)
evaluator(qa)  # qa is the Predict module from the previous step

Troubleshooting

  • If your custom metric is never called or raises a TypeError, check that it is a plain function accepting three arguments, metric(example, prediction, trace=None) — DSPy passes a trace during optimization.
  • If you get type errors, verify your DSPy and OpenAI SDK versions are up to date.
  • For environment variable issues, confirm OPENAI_API_KEY is set correctly in your shell.
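
To rule out metric bugs before involving the LM at all, you can exercise the metric directly with lightweight stand-ins; here types.SimpleNamespace plays the role of dspy.Example and dspy.Prediction, since the metric only needs attribute access:

```python
from types import SimpleNamespace

def exact_match_metric(example, prediction, trace=None) -> float:
    return 1.0 if prediction.answer.strip() == example.answer.strip() else 0.0

# Stand-ins expose .answer just like dspy.Example / dspy.Prediction
gold = SimpleNamespace(question="What is 2 + 2?", answer="4")
pred = SimpleNamespace(answer="4")
print(exact_match_metric(gold, pred))  # prints 1.0
```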

Key Takeaways

  • Create custom DSPy metrics as plain functions with the signature metric(example, prediction, trace=None).
  • Score the outputs of modules like dspy.Predict or dspy.ChainOfThought by calling the metric directly or by wrapping it in dspy.Evaluate.
  • The same metric can drive optimizers: return a graded float when trace is None and a pass/fail bool otherwise.
  • Always configure DSPy with a valid LM instance and API key from environment variables.
  • Check method signatures and environment setup if custom metrics do not run as expected.
Verified 2026-04 · gpt-4o-mini