How to measure LLM response latency
Quick answer
Measure LLM response latency by recording the time immediately before sending a request and immediately after receiving the response, using Python's time module. Use an SDK such as openai or anthropic to send a prompt, then take the difference to get the latency in seconds or milliseconds.
Prerequisites
- Python 3.8+
- OpenAI API key (free tier works) or Anthropic API key
- pip install openai>=1.0 or pip install anthropic>=0.20
Setup
Install the required SDK and set your API key as an environment variable for secure authentication.
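For example, on macOS or Linux (the key value below is a placeholder; substitute your own):

```shell
export OPENAI_API_KEY="your-api-key-here"
```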
pip install openai
Step by step
This example uses the openai SDK to measure latency by capturing timestamps before and after the API call.
import os
import time
from openai import OpenAI
client = OpenAI(api_key=os.environ["OPENAI_API_KEY"])
start_time = time.perf_counter()
response = client.chat.completions.create(
model="gpt-4o-mini",
messages=[{"role": "user", "content": "Hello, how are you?"}]
)
end_time = time.perf_counter()
latency_seconds = end_time - start_time
print(f"LLM response latency: {latency_seconds:.3f} seconds")
print("Response content:", response.choices[0].message.content)
Output
LLM response latency: 1.234 seconds
Response content: I'm doing well, thank you! How can I assist you today?
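A single measurement is noisy, since network jitter and server load vary between requests. Averaging over several trials gives a more stable number. The sketch below times an arbitrary callable; the `slow_call` stand-in (which just sleeps) is a placeholder for your real `client.chat.completions.create(...)` call.

```python
import time
import statistics

def measure_latency(call, trials=5):
    """Time call() several times and return (mean, min, max) in seconds."""
    samples = []
    for _ in range(trials):
        start = time.perf_counter()
        call()  # swap in your real API request here
        samples.append(time.perf_counter() - start)
    return statistics.mean(samples), min(samples), max(samples)

def slow_call():
    # Stand-in for an LLM request; replace with the OpenAI or Anthropic call.
    time.sleep(0.05)

mean_s, min_s, max_s = measure_latency(slow_call, trials=3)
print(f"mean={mean_s:.3f}s min={min_s:.3f}s max={max_s:.3f}s")
```

Reporting min and max alongside the mean helps spot outliers caused by cold starts or rate limiting.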
Common variations
You can measure latency asynchronously with asyncio, or test other models such as claude-3-5-haiku-20241022 with the Anthropic SDK. For streaming responses, measure the time until the first token arrives (time to first token) rather than the full response time.
import os
import time
from anthropic import Anthropic
client = Anthropic(api_key=os.environ["ANTHROPIC_API_KEY"])
start_time = time.perf_counter()
message = client.messages.create(
model="claude-3-5-haiku-20241022",
max_tokens=100,
system="You are a helpful assistant.",
messages=[{"role": "user", "content": "Hello, how are you?"}]
)
end_time = time.perf_counter()
latency_seconds = end_time - start_time
print(f"Claude LLM response latency: {latency_seconds:.3f} seconds")
print("Response content:", message.content[0].text)
Output
Claude LLM response latency: 1.567 seconds
Response content: I'm doing well, thank you! How can I assist you today?
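The asynchronous variant mentioned above can be sketched with asyncio. The `fake_request` coroutine here (which just sleeps) is a placeholder for an awaited SDK call such as one made with AsyncOpenAI or AsyncAnthropic; the timing pattern is the same either way.

```python
import asyncio
import time

async def timed(coro):
    """Await coro and return (result, latency_seconds)."""
    start = time.perf_counter()
    result = await coro
    return result, time.perf_counter() - start

async def fake_request(prompt):
    # Placeholder for e.g. an awaited async chat-completion call.
    await asyncio.sleep(0.05)
    return f"echo: {prompt}"

async def main():
    # Launch two requests concurrently and time each one individually.
    results = await asyncio.gather(
        timed(fake_request("Hello")),
        timed(fake_request("How are you?")),
    )
    for text, latency in results:
        print(f"{latency:.3f}s -> {text}")

asyncio.run(main())
```

Because the requests run concurrently, total wall-clock time stays close to the slowest single request rather than the sum of all of them.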
Troubleshooting
- If latency seems unusually high, check your network connection and API endpoint region.
- Ensure your API key is valid and has sufficient quota.
- For streaming APIs, measure latency until the first token to avoid inflated times.
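For streaming, the metric that matters most is usually time to first token (TTFT), since it reflects perceived responsiveness. The sketch below times a generator; `fake_stream` (yielding tokens with small sleeps) stands in for a real streaming iterator such as the one returned when you pass stream=True to the SDK.

```python
import time

def fake_stream():
    # Placeholder for a real streaming response iterator.
    for token in ["I'm", " doing", " well", "!"]:
        time.sleep(0.02)
        yield token

start = time.perf_counter()
first_token_latency = None
tokens = []
for token in fake_stream():
    if first_token_latency is None:
        # Time to first token: what the user perceives as responsiveness.
        first_token_latency = time.perf_counter() - start
    tokens.append(token)
total_latency = time.perf_counter() - start

print(f"TTFT: {first_token_latency:.3f}s, total: {total_latency:.3f}s")
print("Response:", "".join(tokens))
```

Report TTFT and total latency separately: a model can stream its first token quickly yet still take several seconds to finish a long response.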
Key takeaways
- Use a high-resolution timer like time.perf_counter() to measure latency accurately.
- Measure latency from just before the API call to immediately after receiving the full response.
- Adjust measurement method for streaming responses by timing until the first token arrives.