How-to · Intermediate · 3 min read

How to create custom DSPy metrics

Quick answer
To create a custom metric in DSPy, write a plain Python function that takes a gold example, a prediction, and an optional trace argument, and returns a score (a float, or a bool during optimization). Pass that function to dspy.Evaluate or to an optimizer to score the outputs of modules such as dspy.Predict or dspy.ChainOfThought.

PREREQUISITES

  • Python 3.8+
  • OpenAI API key (free tier works)
  • pip install dspy "openai>=1.0"

Setup

Install the dspy package and set your OpenAI API key as an environment variable.

  • Install DSPy and OpenAI SDK:
bash
pip install dspy "openai>=1.0"
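
If your key is not already set, export it in the same shell before running the examples (the value below is a placeholder, not a real key):

```shell
export OPENAI_API_KEY="sk-your-key-here"
```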

Step by step

Define a custom metric as a plain function that takes a gold example, a prediction, and an optional trace, then use it to score model predictions.

python
import os

import dspy

# Initialize DSPy with OpenAI's gpt-4o-mini
lm = dspy.LM("openai/gpt-4o-mini", api_key=os.environ["OPENAI_API_KEY"])
dspy.configure(lm=lm)

# Define a DSPy signature for QA
class QA(dspy.Signature):
    question: str = dspy.InputField()
    answer: str = dspy.OutputField()

# A custom metric is a plain function: it receives the gold example,
# the model's prediction, and an optional trace, and returns a score
def exact_match_metric(example, prediction, trace=None) -> float:
    return 1.0 if prediction.answer.strip() == example.answer.strip() else 0.0

# Run a prediction
qa = dspy.Predict(QA)
gold = dspy.Example(question="What is 2 + 2?", answer="4").with_inputs("question")
prediction = qa(question=gold.question)

# Score the prediction with the custom metric
print("Prediction:", prediction.answer)
print("Exact match score:", exact_match_metric(gold, prediction))
output
Prediction: 4
Exact match score: 1.0

Common variations

Because a DSPy metric is just a function, it can compute any score you like: exact match, token overlap, numeric tolerance, or even an LLM-as-judge call. During optimization DSPy passes a non-None trace, and the usual convention is to return a graded float when trace is None (evaluation) and a pass/fail bool otherwise. To score an entire dev set, wrap the metric in dspy.Evaluate. To use a different model, change the LM initialization.

python
import dspy

# The same metric can serve evaluation and optimization: by convention,
# return a float when trace is None and a pass/fail bool otherwise
def validate_answer(example, prediction, trace=None):
    score = float(prediction.answer.strip() == example.answer.strip())
    if trace is None:
        return score      # evaluation: a graded score
    return score >= 1.0   # optimization: pass/fail

# Score a whole dev set with dspy.Evaluate
devset = [dspy.Example(question="What is 2 + 2?", answer="4").with_inputs("question")]
evaluator = dspy.Evaluate(devset=devset, metric=validate_answer, display_progress=False)
evaluator(qa)  # qa is the Predict module from the previous step

Troubleshooting

  • If your custom metric is never called or raises a TypeError, check that it is a plain function accepting three arguments, metric(example, prediction, trace=None) — DSPy passes a trace during optimization.
  • If you get type errors, verify your DSPy and OpenAI SDK versions are up to date.
  • For environment variable issues, confirm OPENAI_API_KEY is set correctly in your shell.
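
To rule out metric bugs before involving the LM at all, you can exercise the metric directly with lightweight stand-ins; here types.SimpleNamespace plays the role of dspy.Example and dspy.Prediction, since the metric only needs attribute access:

```python
from types import SimpleNamespace

def exact_match_metric(example, prediction, trace=None) -> float:
    return 1.0 if prediction.answer.strip() == example.answer.strip() else 0.0

# Stand-ins expose .answer just like dspy.Example / dspy.Prediction
gold = SimpleNamespace(question="What is 2 + 2?", answer="4")
pred = SimpleNamespace(answer="4")
print(exact_match_metric(gold, pred))  # prints 1.0
```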

Key Takeaways

  • Create custom DSPy metrics as plain functions with the signature metric(example, prediction, trace=None).
  • Score the outputs of modules like dspy.Predict or dspy.ChainOfThought by calling the metric directly or by wrapping it in dspy.Evaluate.
  • The same metric can drive optimizers: return a graded float when trace is None and a pass/fail bool otherwise.
  • Always configure DSPy with a valid LM instance and API key from environment variables.
  • Check method signatures and environment setup if custom metrics do not run as expected.
Verified 2026-04 · gpt-4o-mini