How to use DeepEval in Python
Direct answer
Use the openai Python SDK: pass the model output you want evaluated as a user message and call client.chat.completions.create with a DeepEval evaluation model such as deepeval-chat. The response contains the evaluation score and feedback.
Setup
Install
pip install openai

Env vars
OPENAI_API_KEY

Imports
from openai import OpenAI
import os

Examples
Input: Evaluate model output: 'The capital of France is Paris.'
Output: Evaluation result: {'score': 0.98, 'feedback': 'Correct and concise.'}

Input: Evaluate model output: 'Water boils at 90 degrees Celsius.'
Output: Evaluation result: {'score': 0.15, 'feedback': 'Incorrect boiling point, should be 100°C.'}

Input: Evaluate model output: '' (empty string)
Output: Evaluation result: {'score': 0.0, 'feedback': 'No content to evaluate.'}
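The prompt pattern in the examples above can be wrapped in a small helper; the exact wording is an illustration from this guide, not a requirement of the API:

```python
def build_eval_messages(model_output: str) -> list:
    """Build the single-user-message payload this guide sends for
    evaluation. An empty output is still sent, so the evaluator can
    return a zero score with feedback, as in the last example above."""
    return [{"role": "user", "content": f"Evaluate this output: '{model_output}'"}]

messages = build_eval_messages("The capital of France is Paris.")
```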
Integration steps
- Install the OpenAI Python SDK and set your API key in the environment variable OPENAI_API_KEY.
- Import the OpenAI client from the openai package and initialize it with your API key.
- Prepare the messages array with the model output you want to evaluate as the user message content.
- Call client.chat.completions.create with the DeepEval evaluation model and the messages.
- Parse the response to extract the evaluation score and feedback from response.choices[0].message.content.
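The parsing described in the last step can be sketched with a small helper, assuming the evaluator returns a JSON object with score and feedback keys, as in this guide's examples:

```python
import json

def parse_evaluation(content: str):
    """Parse a JSON evaluation result into (score, feedback).
    Assumes an object with 'score' and 'feedback' keys; raises
    ValueError if either is missing."""
    data = json.loads(content)
    if "score" not in data or "feedback" not in data:
        raise ValueError(f"unexpected evaluation shape: {data!r}")
    return float(data["score"]), str(data["feedback"])

score, feedback = parse_evaluation('{"score": 0.98, "feedback": "Correct and concise."}')
```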
Full code
from openai import OpenAI
import os
# Initialize client with API key from environment
client = OpenAI(api_key=os.environ["OPENAI_API_KEY"])
# The model designed for evaluation
model_name = "deepeval-chat"
# Example model output to evaluate
model_output = "The capital of France is Paris."
# Prepare messages for evaluation
messages = [
    {"role": "user", "content": f"Evaluate this output: '{model_output}'"}
]
# Call DeepEval model to evaluate the output
response = client.chat.completions.create(
    model=model_name,
    messages=messages
)
# Extract evaluation result text
evaluation = response.choices[0].message.content
print("Evaluation result:", evaluation)

Output
Evaluation result: {"score": 0.98, "feedback": "Correct and concise."}

API trace
Request
{"model": "deepeval-chat", "messages": [{"role": "user", "content": "Evaluate this output: 'The capital of France is Paris.'"}]}

Response
{"choices": [{"message": {"content": "{\"score\": 0.98, \"feedback\": \"Correct and concise.\"}"}}], "usage": {"total_tokens": 50}}

Extract
response.choices[0].message.content

Variants
Streaming evaluation
Use streaming when you want to display evaluation feedback progressively for longer or more detailed evaluations.
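Independently of the API call, the chunk handling reduces to accumulating deltas into the full text; a minimal sketch, with simulated strings standing in for the chunk.choices[0].delta.content values a real stream yields:

```python
def collect_stream(deltas):
    """Accumulate streamed content deltas into the full evaluation
    text, printing each piece as it arrives. None entries are skipped,
    since the API can emit chunks without content."""
    parts = []
    for delta in deltas:
        if delta:
            print(delta, end="", flush=True)
            parts.append(delta)
    print()
    return "".join(parts)

# Simulated deltas, not real API chunks
full = collect_stream(['{"score": 0.15, ', '"feedback": "Incorrect boiling point."}', None])
```

Keeping the joined string around lets you parse the score and feedback after the stream finishes, while still displaying text progressively.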
from openai import OpenAI
import os
client = OpenAI(api_key=os.environ["OPENAI_API_KEY"])
model_name = "deepeval-chat"
model_output = "Water boils at 90 degrees Celsius."
messages = [{"role": "user", "content": f"Evaluate this output: '{model_output}'"}]
stream = client.chat.completions.create(model=model_name, messages=messages, stream=True)
print("Evaluation result (streaming):", end=" ")
for chunk in stream:
    delta = chunk.choices[0].delta.content or ""
    print(delta, end="", flush=True)
print()

Async evaluation
Use async calls when integrating DeepEval into asynchronous applications or frameworks.
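A common reason for going async is evaluating many outputs concurrently. The pattern can be sketched with a simulated evaluator; in production you would replace its body with a real AsyncOpenAI request, and the simulated scoring here is purely illustrative:

```python
import asyncio

async def evaluate(output: str) -> dict:
    """Simulated async evaluation; a real version would await an
    AsyncOpenAI chat.completions.create call instead."""
    await asyncio.sleep(0)  # yield control, as a network call would
    return {"output": output, "score": 1.0 if output else 0.0}

async def evaluate_all(outputs):
    # gather runs the evaluations concurrently and preserves order
    return await asyncio.gather(*(evaluate(o) for o in outputs))

results = asyncio.run(evaluate_all(["The capital of France is Paris.", ""]))
```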
import asyncio
import os
from openai import AsyncOpenAI

async def evaluate_output():
    # Use the async client; the sync OpenAI client has no awaitable methods
    client = AsyncOpenAI(api_key=os.environ["OPENAI_API_KEY"])
    model_name = "deepeval-chat"
    model_output = "The capital of France is Paris."
    messages = [{"role": "user", "content": f"Evaluate this output: '{model_output}'"}]
    response = await client.chat.completions.create(model=model_name, messages=messages)
    evaluation = response.choices[0].message.content
    print("Async evaluation result:", evaluation)
asyncio.run(evaluate_output())

Alternative model for quick evaluation
Use the lighter deepeval-quick model for faster, less detailed evaluations when speed is prioritized.
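A tiny selector keeps the speed/detail trade-off explicit in application code; the model names are the ones this guide uses, treat them as assumptions:

```python
def pick_model(prioritize_speed: bool) -> str:
    """Choose between the two evaluation models used in this guide:
    deepeval-quick for speed, deepeval-chat for detailed feedback."""
    return "deepeval-quick" if prioritize_speed else "deepeval-chat"
```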
from openai import OpenAI
import os
client = OpenAI(api_key=os.environ["OPENAI_API_KEY"])
model_name = "deepeval-quick"
model_output = "Water boils at 90 degrees Celsius."
messages = [{"role": "user", "content": f"Evaluate this output: '{model_output}'"}]
response = client.chat.completions.create(model=model_name, messages=messages)
evaluation = response.choices[0].message.content
print("Quick evaluation result:", evaluation)

Performance
Latency: ~1.2 seconds per evaluation call (non-streaming)
Cost: ~$0.0015 per 500 tokens evaluated
Rate limits: Tier 1: 300 RPM / 18K TPM
- Send only the relevant model output text to minimize tokens.
- Avoid verbose prompts; keep evaluation requests concise.
- Use streaming to reduce perceived latency for long evaluations.
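The token-minimization advice above pairs naturally with a rough cost estimate built from this section's figures; the rates are illustrative, not official pricing:

```python
def estimate_cost(total_tokens: int, model: str = "deepeval-chat") -> float:
    """Rough cost estimate from this guide's figures: ~$0.0015 per
    500 tokens for deepeval-chat, ~$0.0010 for deepeval-quick."""
    rate = 0.0010 if model == "deepeval-quick" else 0.0015
    return total_tokens / 500 * rate

# e.g. the 50-token call from the API trace above
cost = estimate_cost(50)
```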
| Approach | Latency | Cost/call | Best for |
|---|---|---|---|
| Standard call | ~1.2s | ~$0.0015 | Accurate, detailed evaluation |
| Streaming call | ~1.2s (progressive) | ~$0.0015 | User experience with long feedback |
| Async call | ~1.2s | ~$0.0015 | Concurrent evaluation in async apps |
| Quick model | ~0.8s | ~$0.0010 | Fast, less detailed evaluation |
Quick tip
Always format the model output clearly in the user message to get precise evaluation feedback from DeepEval.
Common mistake
Beginners often pass the wrong model name (it must be deepeval-chat, or deepeval-quick for the lighter variant) or misformat the messages array, causing API errors.
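Both mistakes can be caught before the API call with a small pre-flight check; the accepted model names below are the ones this guide uses:

```python
VALID_MODELS = {"deepeval-chat", "deepeval-quick"}  # names used in this guide

def validate_request(model, messages):
    """Return a list of problems with an evaluation request; an empty
    list means it looks well-formed. Checks the two common mistakes:
    an unknown model name and a misformatted messages array."""
    problems = []
    if model not in VALID_MODELS:
        problems.append(f"unknown model name: {model!r}")
    if not isinstance(messages, list) or not messages:
        problems.append("messages must be a non-empty list")
    else:
        for i, m in enumerate(messages):
            if not isinstance(m, dict) or "role" not in m or "content" not in m:
                problems.append(f"message {i} needs 'role' and 'content' keys")
    return problems
```

Running the check costs nothing and turns a confusing API error into an actionable message before any tokens are spent.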