How to measure LLM response latency
Quick answer
Measure LLM response latency by recording the time immediately before sending a request and immediately after receiving the response, using Python's time module. Use an SDK such as openai or anthropic to send a prompt, then take the difference to get the latency in seconds or milliseconds.
Prerequisites
- Python 3.8+
- OpenAI API key (free tier works) or Anthropic API key
- pip install openai>=1.0 or pip install anthropic>=0.20
Setup
Install the required SDK and set your API key as an environment variable for secure authentication.
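For example, on macOS or Linux (the key value below is a placeholder; substitute your own):

```shell
export OPENAI_API_KEY="your-api-key-here"
```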
pip install openai
Step by step
This example uses the openai SDK to measure latency by capturing timestamps before and after the API call.
import os
import time
from openai import OpenAI
client = OpenAI(api_key=os.environ["OPENAI_API_KEY"])
start_time = time.perf_counter()
response = client.chat.completions.create(
model="gpt-4o-mini",
messages=[{"role": "user", "content": "Hello, how are you?"}]
)
end_time = time.perf_counter()
latency_seconds = end_time - start_time
print(f"LLM response latency: {latency_seconds:.3f} seconds")
print("Response content:", response.choices[0].message.content)
Output
LLM response latency: 1.234 seconds
Response content: I'm doing well, thank you! How can I assist you today?
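A single measurement is noisy, since network jitter and server load vary between requests. Averaging over several trials gives a more stable number. The sketch below times an arbitrary callable; the `slow_call` stand-in (which just sleeps) is a placeholder for your real `client.chat.completions.create(...)` call.

```python
import time
import statistics

def measure_latency(call, trials=5):
    """Time call() several times and return (mean, min, max) in seconds."""
    samples = []
    for _ in range(trials):
        start = time.perf_counter()
        call()  # swap in your real API request here
        samples.append(time.perf_counter() - start)
    return statistics.mean(samples), min(samples), max(samples)

def slow_call():
    # Stand-in for an LLM request; replace with the OpenAI or Anthropic call.
    time.sleep(0.05)

mean_s, min_s, max_s = measure_latency(slow_call, trials=3)
print(f"mean={mean_s:.3f}s min={min_s:.3f}s max={max_s:.3f}s")
```

Reporting min and max alongside the mean helps spot outliers caused by cold starts or rate limiting.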
Common variations
You can measure latency asynchronously with asyncio, or test other models such as claude-3-5-haiku-20241022 with the Anthropic SDK. For streaming responses, measure the time until the first token arrives (time to first token) rather than the full response time.
import os
import time
from anthropic import Anthropic
client = Anthropic(api_key=os.environ["ANTHROPIC_API_KEY"])
start_time = time.perf_counter()
message = client.messages.create(
model="claude-3-5-haiku-20241022",
max_tokens=100,
system="You are a helpful assistant.",
messages=[{"role": "user", "content": "Hello, how are you?"}]
)
end_time = time.perf_counter()
latency_seconds = end_time - start_time
print(f"Claude LLM response latency: {latency_seconds:.3f} seconds")
print("Response content:", message.content[0].text)
Output
Claude LLM response latency: 1.567 seconds
Response content: I'm doing well, thank you! How can I assist you today?
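The asynchronous variant mentioned above can be sketched with asyncio. The `fake_request` coroutine here (which just sleeps) is a placeholder for an awaited SDK call such as one made with AsyncOpenAI or AsyncAnthropic; the timing pattern is the same either way.

```python
import asyncio
import time

async def timed(coro):
    """Await coro and return (result, latency_seconds)."""
    start = time.perf_counter()
    result = await coro
    return result, time.perf_counter() - start

async def fake_request(prompt):
    # Placeholder for e.g. an awaited async chat-completion call.
    await asyncio.sleep(0.05)
    return f"echo: {prompt}"

async def main():
    # Launch two requests concurrently and time each one individually.
    results = await asyncio.gather(
        timed(fake_request("Hello")),
        timed(fake_request("How are you?")),
    )
    for text, latency in results:
        print(f"{latency:.3f}s -> {text}")

asyncio.run(main())
```

Because the requests run concurrently, total wall-clock time stays close to the slowest single request rather than the sum of all of them.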
Troubleshooting
- If latency seems unusually high, check your network connection and API endpoint region.
- Ensure your API key is valid and has sufficient quota.
- For streaming APIs, measure latency until the first token to avoid inflated times.
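For streaming, the metric that matters most is usually time to first token (TTFT), since it reflects perceived responsiveness. The sketch below times a generator; `fake_stream` (yielding tokens with small sleeps) stands in for a real streaming iterator such as the one returned when you pass stream=True to the SDK.

```python
import time

def fake_stream():
    # Placeholder for a real streaming response iterator.
    for token in ["I'm", " doing", " well", "!"]:
        time.sleep(0.02)
        yield token

start = time.perf_counter()
first_token_latency = None
tokens = []
for token in fake_stream():
    if first_token_latency is None:
        # Time to first token: what the user perceives as responsiveness.
        first_token_latency = time.perf_counter() - start
    tokens.append(token)
total_latency = time.perf_counter() - start

print(f"TTFT: {first_token_latency:.3f}s, total: {total_latency:.3f}s")
print("Response:", "".join(tokens))
```

Report TTFT and total latency separately: a model can stream its first token quickly yet still take several seconds to finish a long response.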
Key takeaways
- Use a high-resolution timer like time.perf_counter() to measure latency accurately.
- Measure latency from just before the API call to immediately after receiving the full response.
- Adjust measurement method for streaming responses by timing until the first token arrives.