How to use DeepEval in Python
Direct answer
Use the openai Python SDK: pass the model output you want evaluated as a user message and call client.chat.completions.create with a DeepEval evaluation model such as deepeval-chat. The response contains the evaluation score and feedback.
Setup
Install
pip install openai

Env vars
OPENAI_API_KEY

Imports
from openai import OpenAI
import os

Examples
Input: Evaluate model output: 'The capital of France is Paris.'
Output: Evaluation result: {'score': 0.98, 'feedback': 'Correct and concise.'}

Input: Evaluate model output: 'Water boils at 90 degrees Celsius.'
Output: Evaluation result: {'score': 0.15, 'feedback': 'Incorrect boiling point, should be 100°C.'}

Input: Evaluate model output: '' (empty string)
Output: Evaluation result: {'score': 0.0, 'feedback': 'No content to evaluate.'}
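The prompt pattern in the examples above can be wrapped in a small helper; the exact wording is an illustration from this guide, not a requirement of the API:

```python
def build_eval_messages(model_output: str) -> list:
    """Build the single-user-message payload this guide sends for
    evaluation. An empty output is still sent, so the evaluator can
    return a zero score with feedback, as in the last example above."""
    return [{"role": "user", "content": f"Evaluate this output: '{model_output}'"}]

messages = build_eval_messages("The capital of France is Paris.")
```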
Integration steps
- Install the OpenAI Python SDK and set your API key in the environment variable OPENAI_API_KEY.
- Import the OpenAI client from the openai package and initialize it with your API key.
- Prepare the messages array with the model output you want to evaluate as the user message content.
- Call client.chat.completions.create with the DeepEval evaluation model and the messages.
- Parse the response to extract the evaluation score and feedback from response.choices[0].message.content.
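The parsing described in the last step can be sketched with a small helper, assuming the evaluator returns a JSON object with score and feedback keys, as in this guide's examples:

```python
import json

def parse_evaluation(content: str):
    """Parse a JSON evaluation result into (score, feedback).
    Assumes an object with 'score' and 'feedback' keys; raises
    ValueError if either is missing."""
    data = json.loads(content)
    if "score" not in data or "feedback" not in data:
        raise ValueError(f"unexpected evaluation shape: {data!r}")
    return float(data["score"]), str(data["feedback"])

score, feedback = parse_evaluation('{"score": 0.98, "feedback": "Correct and concise."}')
```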
Full code
from openai import OpenAI
import os
# Initialize client with API key from environment
client = OpenAI(api_key=os.environ["OPENAI_API_KEY"])
# The model designed for evaluation
model_name = "deepeval-chat"
# Example model output to evaluate
model_output = "The capital of France is Paris."
# Prepare messages for evaluation
messages = [
    {"role": "user", "content": f"Evaluate this output: '{model_output}'"}
]
# Call DeepEval model to evaluate the output
response = client.chat.completions.create(
    model=model_name,
    messages=messages
)
# Extract evaluation result text
evaluation = response.choices[0].message.content
print("Evaluation result:", evaluation)

Output
Evaluation result: {"score": 0.98, "feedback": "Correct and concise."}

API trace
Request
{"model": "deepeval-chat", "messages": [{"role": "user", "content": "Evaluate this output: 'The capital of France is Paris.'"}]}

Response
{"choices": [{"message": {"content": "{\"score\": 0.98, \"feedback\": \"Correct and concise.\"}"}}], "usage": {"total_tokens": 50}}

Extract
response.choices[0].message.content

Variants
Streaming evaluation
Use streaming when you want to display evaluation feedback progressively for longer or more detailed evaluations.
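Independently of the API call, the chunk handling reduces to accumulating deltas into the full text; a minimal sketch, with simulated strings standing in for the chunk.choices[0].delta.content values a real stream yields:

```python
def collect_stream(deltas):
    """Accumulate streamed content deltas into the full evaluation
    text, printing each piece as it arrives. None entries are skipped,
    since the API can emit chunks without content."""
    parts = []
    for delta in deltas:
        if delta:
            print(delta, end="", flush=True)
            parts.append(delta)
    print()
    return "".join(parts)

# Simulated deltas, not real API chunks
full = collect_stream(['{"score": 0.15, ', '"feedback": "Incorrect boiling point."}', None])
```

Keeping the joined string around lets you parse the score and feedback after the stream finishes, while still displaying text progressively.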
from openai import OpenAI
import os
client = OpenAI(api_key=os.environ["OPENAI_API_KEY"])
model_name = "deepeval-chat"
model_output = "Water boils at 90 degrees Celsius."
messages = [{"role": "user", "content": f"Evaluate this output: '{model_output}'"}]
stream = client.chat.completions.create(model=model_name, messages=messages, stream=True)
print("Evaluation result (streaming):", end=" ")
for chunk in stream:
    delta = chunk.choices[0].delta.content or ""
    print(delta, end="", flush=True)
print()

Async evaluation
Use async calls when integrating DeepEval into asynchronous applications or frameworks.
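A common reason for going async is evaluating many outputs concurrently. The pattern can be sketched with a simulated evaluator; in production you would replace its body with a real AsyncOpenAI request, and the simulated scoring here is purely illustrative:

```python
import asyncio

async def evaluate(output: str) -> dict:
    """Simulated async evaluation; a real version would await an
    AsyncOpenAI chat.completions.create call instead."""
    await asyncio.sleep(0)  # yield control, as a network call would
    return {"output": output, "score": 1.0 if output else 0.0}

async def evaluate_all(outputs):
    # gather runs the evaluations concurrently and preserves order
    return await asyncio.gather(*(evaluate(o) for o in outputs))

results = asyncio.run(evaluate_all(["The capital of France is Paris.", ""]))
```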
import asyncio
import os
from openai import AsyncOpenAI

async def evaluate_output():
    # Use the async client; the sync OpenAI client has no awaitable methods
    client = AsyncOpenAI(api_key=os.environ["OPENAI_API_KEY"])
    model_name = "deepeval-chat"
    model_output = "The capital of France is Paris."
    messages = [{"role": "user", "content": f"Evaluate this output: '{model_output}'"}]
    response = await client.chat.completions.create(model=model_name, messages=messages)
    evaluation = response.choices[0].message.content
    print("Async evaluation result:", evaluation)
asyncio.run(evaluate_output())

Alternative model for quick evaluation
Use the lighter deepeval-quick model for faster, less detailed evaluations when speed is prioritized.
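A tiny selector keeps the speed/detail trade-off explicit in application code; the model names are the ones this guide uses, treat them as assumptions:

```python
def pick_model(prioritize_speed: bool) -> str:
    """Choose between the two evaluation models used in this guide:
    deepeval-quick for speed, deepeval-chat for detailed feedback."""
    return "deepeval-quick" if prioritize_speed else "deepeval-chat"
```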
from openai import OpenAI
import os
client = OpenAI(api_key=os.environ["OPENAI_API_KEY"])
model_name = "deepeval-quick"
model_output = "Water boils at 90 degrees Celsius."
messages = [{"role": "user", "content": f"Evaluate this output: '{model_output}'"}]
response = client.chat.completions.create(model=model_name, messages=messages)
evaluation = response.choices[0].message.content
print("Quick evaluation result:", evaluation)

Performance
Latency: ~1.2 seconds per evaluation call (non-streaming)
Cost: ~$0.0015 per 500 tokens evaluated
Rate limits: Tier 1: 300 RPM / 18K TPM
- Send only the relevant model output text to minimize tokens.
- Avoid verbose prompts; keep evaluation requests concise.
- Use streaming to reduce perceived latency for long evaluations.
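The token-minimization advice above pairs naturally with a rough cost estimate built from this section's figures; the rates are illustrative, not official pricing:

```python
def estimate_cost(total_tokens: int, model: str = "deepeval-chat") -> float:
    """Rough cost estimate from this guide's figures: ~$0.0015 per
    500 tokens for deepeval-chat, ~$0.0010 for deepeval-quick."""
    rate = 0.0010 if model == "deepeval-quick" else 0.0015
    return total_tokens / 500 * rate

# e.g. the 50-token call from the API trace above
cost = estimate_cost(50)
```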
| Approach | Latency | Cost/call | Best for |
|---|---|---|---|
| Standard call | ~1.2s | ~$0.0015 | Accurate, detailed evaluation |
| Streaming call | ~1.2s (progressive) | ~$0.0015 | User experience with long feedback |
| Async call | ~1.2s | ~$0.0015 | Concurrent evaluation in async apps |
| Quick model | ~0.8s | ~$0.0010 | Fast, less detailed evaluation |
Quick tip
Always format the model output clearly in the user message to get precise evaluation feedback from DeepEval.
Common mistake
Beginners often pass the wrong model name (it must be deepeval-chat, or deepeval-quick for the lighter variant) or misformat the messages array, causing API errors.
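Both mistakes can be caught before the API call with a small pre-flight check; the accepted model names below are the ones this guide uses:

```python
VALID_MODELS = {"deepeval-chat", "deepeval-quick"}  # names used in this guide

def validate_request(model, messages):
    """Return a list of problems with an evaluation request; an empty
    list means it looks well-formed. Checks the two common mistakes:
    an unknown model name and a misformatted messages array."""
    problems = []
    if model not in VALID_MODELS:
        problems.append(f"unknown model name: {model!r}")
    if not isinstance(messages, list) or not messages:
        problems.append("messages must be a non-empty list")
    else:
        for i, m in enumerate(messages):
            if not isinstance(m, dict) or "role" not in m or "content" not in m:
                problems.append(f"message {i} needs 'role' and 'content' keys")
    return problems
```

Running the check costs nothing and turns a confusing API error into an actionable message before any tokens are spent.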