How to measure LangChain performance
Measure LangChain performance by timing execution with Python's time or timeit modules, profiling memory usage with memory_profiler, and logging token usage from model responses. Combine these to benchmark and optimize your chains effectively.
Prerequisites
- Python 3.8+
- pip install langchain langchain-openai openai memory_profiler
- An OpenAI API key (free tier works)
Setup
Install necessary packages and set environment variables for API keys.
- Install LangChain and the OpenAI integration:
pip install langchain langchain-openai openai
- Install memory_profiler for memory usage tracking:
pip install memory_profiler
- Set your OpenAI API key as an environment variable:
export OPENAI_API_KEY='your_api_key' (Linux/macOS) or setx OPENAI_API_KEY "your_api_key" (Windows)
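Before running any chains, it can save a confusing failure later to confirm the key is actually visible to your Python process. A minimal check:

```python
import os

# The LangChain OpenAI integration reads OPENAI_API_KEY from the environment;
# verify it is set before running any chains.
key = os.getenv("OPENAI_API_KEY")
print("OPENAI_API_KEY is set" if key else "OPENAI_API_KEY is missing")
```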
Step by step
Use Python's time module to measure runtime, memory_profiler to track memory, and extract token usage from LangChain's OpenAI responses for cost and efficiency insights.
import time
from memory_profiler import memory_usage
from langchain_core.prompts import ChatPromptTemplate
from langchain_openai import ChatOpenAI

# Define a simple prompt template
prompt = ChatPromptTemplate.from_template("Tell me a joke about {topic}.")

# Initialize the OpenAI chat model (OPENAI_API_KEY is read from the environment)
chat = ChatOpenAI(model="gpt-4o", temperature=0.7)

# Compose the prompt and model into a runnable chain
chain = prompt | chat

def run_chain(topic: str):
    return chain.invoke({"topic": topic})

# Measure runtime and memory in a single pass; retval=True also captures
# the chain's return value so we don't pay for a second API call
start_time = time.time()
mem_usage, result = memory_usage((run_chain, ("computers",)), interval=0.1, retval=True)
end_time = time.time()

runtime = end_time - start_time
peak_memory = max(mem_usage) - min(mem_usage)

print(f"Runtime: {runtime:.2f} seconds")
print(f"Peak memory usage: {peak_memory:.2f} MiB")

# Token usage is attached to the AIMessage the chat model returns
usage = result.usage_metadata
print(f"Prompt tokens: {usage['input_tokens']}")
print(f"Completion tokens: {usage['output_tokens']}")
print(f"Total tokens: {usage['total_tokens']}")

Sample output:

Runtime: 3.45 seconds
Peak memory usage: 15.20 MiB
Prompt tokens: 12
Completion tokens: 45
Total tokens: 57
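Single measurements are noisy, so the timeit module mentioned above is useful for averaging several runs. The sketch below times a hypothetical stand-in function so it runs without API calls; substitute a real chain invocation, keeping in mind that each real run costs tokens:

```python
import timeit

def run_chain(topic: str) -> str:
    # Hypothetical stand-in for the real chain call so this example is
    # self-contained; replace with your actual LangChain invocation.
    return f"A joke about {topic}"

# Total wall-clock time over 5 runs, then the per-run average
n_runs = 5
total = timeit.timeit(lambda: run_chain("computers"), number=n_runs)
print(f"Average runtime over {n_runs} runs: {total / n_runs:.6f} seconds")
```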
Common variations
You can measure performance asynchronously using asyncio with LangChain's async methods, or use streaming responses to track latency per token. Switching models (e.g., gpt-4o-mini) affects speed and cost, so measure accordingly.
import asyncio
from langchain_openai import ChatOpenAI

async def async_run():
    chat = ChatOpenAI(model="gpt-4o-mini", temperature=0.7)
    # ainvoke is the async counterpart of invoke and returns an AIMessage
    response = await chat.ainvoke("Hello async LangChain!")
    print(response.content)

asyncio.run(async_run())

Sample output:

Hello async LangChain!
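The per-token latency tracking mentioned above follows a simple pattern: record the time to the first chunk separately from the total time. This sketch simulates the stream with a hypothetical generator; in practice you would iterate over the chunks from chat.stream(...) instead:

```python
import time

def simulated_stream():
    # Hypothetical stand-in for a streaming model response; yields chunks
    # with a small artificial delay to mimic network latency.
    for token in ["Hello", " async", " LangChain", "!"]:
        time.sleep(0.01)
        yield token

start = time.time()
first_token_latency = None
num_chunks = 0
for chunk in simulated_stream():
    if first_token_latency is None:
        first_token_latency = time.time() - start  # time to first token
    num_chunks += 1
total_time = time.time() - start

print(f"Time to first token: {first_token_latency:.3f}s")
print(f"Chunks: {num_chunks}, total: {total_time:.3f}s")
```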
Troubleshooting
If you see inconsistent timing results, ensure no other heavy processes run concurrently. For memory profiling errors, verify memory_profiler is installed and run your script with python -m memory_profiler your_script.py. If token usage is missing, confirm you are using the latest OpenAI SDK v1+ and accessing response.usage.
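One way to smooth out run-to-run variance is to time several runs and report the median and spread rather than a single number. A minimal sketch with a hypothetical stand-in workload:

```python
import statistics
import time

def workload():
    # Hypothetical stand-in for a real chain call; replace with your own
    # LangChain invocation when benchmarking for real.
    time.sleep(0.01)

# Collect several timings with the high-resolution perf_counter clock
timings = []
for _ in range(5):
    start = time.perf_counter()
    workload()
    timings.append(time.perf_counter() - start)

print(f"Median: {statistics.median(timings):.4f}s")
print(f"Stdev:  {statistics.stdev(timings):.4f}s")
```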
Key Takeaways
- Use Python's time and memory_profiler to measure LangChain runtime and memory.
- Extract token usage from model responses to monitor cost and efficiency.
- Test different models and async calls to understand performance trade-offs.