Code beginner · 3 min read

How to use DeepEval in Python

Direct answer
Use the DeepEval workflow in Python by pointing the OpenAI-compatible client at the DeepSeek endpoint (base_url="https://api.deepseek.com"), sending your model output and reference text in a chat completion request, then parsing the returned evaluation score and feedback from the response.

Setup

Install
bash
pip install openai
Env vars
DEEPSEEK_API_KEY
Imports
python
from openai import OpenAI
import os

Examples

In: Evaluate model output 'The cat sat on the mat.' against reference 'A cat is sitting on a mat.'
Out: Score: 0.92, Feedback: Output is semantically close to the reference.
In: Evaluate model output 'The quick brown fox jumps.' against reference 'A fast fox leaps over the lazy dog.'
Out: Score: 0.75, Feedback: Output captures some meaning but misses details.
In: Evaluate model output '' (empty) against reference 'Hello world!'
Out: Score: 0.0, Feedback: Output is empty, no content to evaluate.
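The "Score: …, Feedback: …" strings above are plain text, so you will usually want to split them into a numeric score and a feedback message. A minimal sketch with a regex; `parse_evaluation` is a hypothetical helper name, and it assumes the model actually replies in this exact format:

```python
import re

def parse_evaluation(text: str) -> tuple[float, str]:
    """Split a 'Score: X, Feedback: Y' string into (score, feedback)."""
    match = re.match(r"Score:\s*([0-9.]+),\s*Feedback:\s*(.*)", text)
    if not match:
        raise ValueError(f"Unexpected evaluation format: {text!r}")
    return float(match.group(1)), match.group(2)

score, feedback = parse_evaluation(
    "Score: 0.92, Feedback: Output is semantically close to the reference."
)
```

Raising on an unexpected format is deliberate: a silent default score would hide cases where the model ignored the requested reply format.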

Integration steps

  1. Install the OpenAI Python SDK and set the DEEPSEEK_API_KEY environment variable.
  2. Import the OpenAI client and initialize it with your API key from os.environ and base_url="https://api.deepseek.com".
  3. Prepare the evaluation request by including the model output and reference text in the messages array.
  4. Call the DeepEval model endpoint using client.chat.completions.create with model='deepseek-chat'.
  5. Extract the evaluation score and feedback from the response's choices[0].message.content.
  6. Use or display the evaluation results as needed in your application.

Full code

python
from openai import OpenAI
import os

# Initialize the OpenAI client against the DeepSeek endpoint
client = OpenAI(api_key=os.environ["DEEPSEEK_API_KEY"], base_url="https://api.deepseek.com")

# Define model output and reference
model_output = "The cat sat on the mat."
reference_text = "A cat is sitting on a mat."

# Prepare messages for DeepEval
messages = [
    {"role": "user", "content": f"Evaluate this output: '{model_output}' against reference: '{reference_text}'"}
]

# Call DeepEval via deepseek-chat model
response = client.chat.completions.create(
    model="deepseek-chat",
    messages=messages
)

# Extract evaluation result
evaluation = response.choices[0].message.content
print("Evaluation Result:", evaluation)
output
Evaluation Result: Score: 0.92, Feedback: Output is semantically close to the reference.

API trace

Request
json
{"model": "deepseek-chat", "messages": [{"role": "user", "content": "Evaluate this output: 'The cat sat on the mat.' against reference: 'A cat is sitting on a mat.'"}]}
Response
json
{"choices": [{"message": {"content": "Score: 0.92, Feedback: Output is semantically close to the reference."}}], "usage": {"total_tokens": 45}}
Extract: response.choices[0].message.content

Variants

Streaming evaluation output

Use streaming when you want to display evaluation results progressively for better user experience.

python
from openai import OpenAI
import os

client = OpenAI(api_key=os.environ["DEEPSEEK_API_KEY"], base_url="https://api.deepseek.com")

model_output = "The cat sat on the mat."
reference_text = "A cat is sitting on a mat."

messages = [{"role": "user", "content": f"Evaluate this output: '{model_output}' against reference: '{reference_text}'"}]

response = client.chat.completions.create(
    model="deepseek-chat",
    messages=messages,
    stream=True
)

for chunk in response:
    # In openai>=1.0 the delta is an object, not a dict; content may be None
    print(chunk.choices[0].delta.content or "", end="")
print()
Async evaluation call

Use async calls to integrate DeepEval in applications requiring concurrency or non-blocking behavior.

python
import asyncio
from openai import AsyncOpenAI
import os

async def evaluate_async():
    # AsyncOpenAI provides awaitable versions of the same methods
    client = AsyncOpenAI(api_key=os.environ["DEEPSEEK_API_KEY"], base_url="https://api.deepseek.com")
    model_output = "The cat sat on the mat."
    reference_text = "A cat is sitting on a mat."
    messages = [{"role": "user", "content": f"Evaluate this output: '{model_output}' against reference: '{reference_text}'"}]
    response = await client.chat.completions.create(
        model="deepseek-chat",
        messages=messages
    )
    print("Async Evaluation Result:", response.choices[0].message.content)

asyncio.run(evaluate_async())
Alternative model for evaluation

Use the deepseek-reasoner model for more complex or reasoning-based evaluation tasks.

python
from openai import OpenAI
import os

client = OpenAI(api_key=os.environ["DEEPSEEK_API_KEY"], base_url="https://api.deepseek.com")

model_output = "The cat sat on the mat."
reference_text = "A cat is sitting on a mat."

messages = [{"role": "user", "content": f"Evaluate this output: '{model_output}' against reference: '{reference_text}'"}]

response = client.chat.completions.create(
    model="deepseek-reasoner",
    messages=messages
)

print("Evaluation with Reasoner model:", response.choices[0].message.content)

Performance

Latency: ~900ms for deepseek-chat non-streaming evaluation
Cost: ~$0.0015 per 100 tokens evaluated
Rate limits: Tier 1: 400 RPM / 25K TPM
  • Keep evaluation prompts concise to reduce token usage.
  • Batch multiple evaluations in one request when possible.
  • Avoid sending unnecessary context to minimize cost.
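Batching several evaluations into one request means one prompt listing all output/reference pairs. A minimal sketch; `build_batch_prompt` is a hypothetical helper, and the numbered-list format is an assumption about what the model will follow:

```python
def build_batch_prompt(pairs):
    """Combine several (output, reference) pairs into a single evaluation prompt."""
    lines = [
        "Evaluate each output against its reference. "
        "Reply with one 'Score: X, Feedback: ...' line per item, in order."
    ]
    for i, (output, reference) in enumerate(pairs, start=1):
        lines.append(f"{i}. Output: '{output}' Reference: '{reference}'")
    return "\n".join(lines)

prompt = build_batch_prompt([
    ("The cat sat on the mat.", "A cat is sitting on a mat."),
    ("The quick brown fox jumps.", "A fast fox leaps over the lazy dog."),
])
```

The resulting string goes into a single user message, so you pay the instruction overhead once instead of per evaluation.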
Approach | Latency | Cost/call | Best for
Standard call | ~900ms | ~$0.0015 | Simple synchronous evaluation
Streaming call | ~900ms (progressive) | ~$0.0015 | Real-time UI feedback
Async call | ~900ms | ~$0.0015 | Concurrent or non-blocking apps
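If you run close to the rate limits above, requests can fail with a 429. A small generic backoff wrapper sketch; `with_backoff` is a hypothetical helper, and the commented usage assumes the openai SDK's `RateLimitError` exception:

```python
import time

def with_backoff(call, retries=3, base_delay=1.0, retryable=(Exception,)):
    """Retry a zero-argument callable with exponential backoff between attempts."""
    for attempt in range(retries):
        try:
            return call()
        except retryable:
            if attempt == retries - 1:
                raise  # out of retries: surface the last error
            time.sleep(base_delay * 2 ** attempt)

# Hypothetical wiring against the evaluation call:
# response = with_backoff(
#     lambda: client.chat.completions.create(model="deepseek-chat", messages=messages),
#     retryable=(openai.RateLimitError,),
# )
```

Keeping the wrapper independent of the client makes it trivial to unit-test with a fake callable.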

Quick tip

Always format your evaluation prompt so it names both the output and the reference explicitly, and pin the reply format (e.g. "Score: <0-1>, Feedback: ...") so the result is consistent and easy to parse.
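One way to keep that formatting consistent is to centralize it in a small builder. A sketch; `build_eval_prompt` is a hypothetical helper name, and the pinned reply format is an assumption matching the examples earlier in this article:

```python
def build_eval_prompt(model_output: str, reference_text: str) -> str:
    """Name both texts explicitly and pin the reply format for easy parsing."""
    return (
        f"Evaluate this output: '{model_output}' against reference: '{reference_text}'. "
        "Reply exactly as: Score: <0-1>, Feedback: <one sentence>."
    )

prompt = build_eval_prompt("The cat sat on the mat.", "A cat is sitting on a mat.")
```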

Common mistake

Beginners often forget to set the DEEPSEEK_API_KEY environment variable, or omit base_url="https://api.deepseek.com" when creating the client, causing authentication or model-not-found errors.
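A cheap guard against the missing-key case is to fail fast with an actionable message before creating the client. A minimal sketch; `require_api_key` is a hypothetical helper name:

```python
import os

def require_api_key(name: str = "DEEPSEEK_API_KEY") -> str:
    """Fail fast with a clear message instead of an opaque auth error later."""
    key = os.environ.get(name)
    if not key:
        raise RuntimeError(f"{name} is not set; export it before creating the client.")
    return key
```

Then initialize with `OpenAI(api_key=require_api_key(), base_url="https://api.deepseek.com")` so misconfiguration surfaces at startup rather than on the first request.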

Verified 2026-04 · deepseek-chat, deepseek-reasoner