How-to · beginner to intermediate · 3 min read

How to measure LangChain performance

Quick answer
Measure LangChain performance by timing execution with Python's time or timeit modules, profiling memory usage via memory_profiler, and logging token usage from model responses. Combine these to benchmark and optimize your chains effectively.

PREREQUISITES

  • Python 3.8+
  • pip install langchain langchain-openai memory_profiler
  • OpenAI API key (free tier works)

Setup

Install necessary packages and set environment variables for API keys.

  • Install LangChain and its OpenAI integration: pip install langchain langchain-openai
  • Install memory profiler for memory usage: pip install memory_profiler
  • Set your OpenAI API key in environment variables: export OPENAI_API_KEY='your_api_key' (Linux/macOS) or setx OPENAI_API_KEY "your_api_key" (Windows)
bash
pip install langchain langchain-openai memory_profiler

Step by step

Use Python's time module to measure runtime, memory_profiler to track memory, and extract token usage from LangChain's OpenAI responses for cost and efficiency insights.

python
import time
from memory_profiler import memory_usage
from langchain_openai import ChatOpenAI
from langchain_core.prompts import ChatPromptTemplate

# Expects OPENAI_API_KEY to be set in your environment (see Setup)

# Define a simple prompt template
prompt = ChatPromptTemplate.from_template("Tell me a joke about {topic}.")

# Initialize the OpenAI chat model
chat = ChatOpenAI(model="gpt-4o", temperature=0.7)

# Compose prompt and model into a runnable chain (LLMChain is deprecated)
chain = prompt | chat

def run_chain(topic: str):
    return chain.invoke({"topic": topic})

# Measure runtime and memory
start_time = time.time()
mem_usage = memory_usage((run_chain, ("computers",)), interval=0.1)
end_time = time.time()

runtime = end_time - start_time
peak_memory = max(mem_usage) - min(mem_usage)

print(f"Runtime: {runtime:.2f} seconds")
print(f"Peak memory usage: {peak_memory:.2f} MiB")

# To get token usage, read the metadata LangChain attaches to the response
response = chat.invoke("Tell me a joke about computers.")

usage = response.response_metadata["token_usage"]
print(f"Prompt tokens: {usage['prompt_tokens']}")
print(f"Completion tokens: {usage['completion_tokens']}")
print(f"Total tokens: {usage['total_tokens']}")
output
Runtime: 3.45 seconds
Peak memory usage: 15.20 MiB
Prompt tokens: 12
Completion tokens: 45
Total tokens: 57
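The quick answer also mentions timeit: a single wall-clock measurement can be noisy, so averaging several runs gives a steadier number. A minimal sketch, using a stand-in function (`run_chain_stub`, a hypothetical placeholder) instead of a live chain so it runs without an API key:

```python
import timeit

def run_chain_stub():
    # Stand-in for a real chain.invoke(...) call, so this runs offline
    return sum(i * i for i in range(10_000))

runs = 5
total = timeit.timeit(run_chain_stub, number=runs)
avg = total / runs
print(f"Average runtime over {runs} runs: {avg:.4f} seconds")
```

With a real chain, keep `number` small: every run is a billed API call.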

Common variations

You can measure performance asynchronously using asyncio with LangChain's async methods, or use streaming responses to track latency per token. Switching models (e.g., gpt-4o-mini) affects speed and cost, so measure accordingly.

python
import asyncio
from langchain_openai import ChatOpenAI

async def async_run():
    chat = ChatOpenAI(model="gpt-4o-mini", temperature=0.7)
    response = await chat.ainvoke("Hello async LangChain!")
    print(response.content)

asyncio.run(async_run())
output
Hello async LangChain!
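For streaming responses, per-token latency matters more than total runtime. A sketch of measuring time-to-first-token and average inter-chunk latency, using a simulated stream so it runs offline (`fake_stream` and its delays are placeholders; with a live model you would iterate over `chat.stream(...)` instead):

```python
import time

def fake_stream():
    # Simulates chunks arriving from chat.stream(...); delays are made up
    for chunk in ["Why", " did", " the", " computer", "..."]:
        time.sleep(0.05)
        yield chunk

start = time.perf_counter()
first_token_latency = None
chunk_count = 0
for chunk in fake_stream():
    if first_token_latency is None:
        first_token_latency = time.perf_counter() - start
    chunk_count += 1
total = time.perf_counter() - start

print(f"Time to first token: {first_token_latency:.3f} s")
print(f"Chunks: {chunk_count}, avg per chunk: {total / chunk_count:.3f} s")
```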

Troubleshooting

If you see inconsistent timing results, make sure no other heavy processes are running concurrently and average over several runs. For memory profiling errors, verify memory_profiler is installed and run your script with python -m memory_profiler your_script.py. If token usage is missing, confirm you are on a recent OpenAI SDK (v1+) and langchain-openai release, and inspect the usage metadata attached to the response.
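If memory_profiler itself is the problem, Python's standard-library tracemalloc is a dependency-free fallback for tracking peak allocations. A sketch with a stand-in workload (`run_chain_stub` is a placeholder for your chain call):

```python
import tracemalloc

def run_chain_stub():
    # Placeholder for chain.invoke(...); allocates something measurable
    return ["joke"] * 100_000

tracemalloc.start()
result = run_chain_stub()
current, peak = tracemalloc.get_traced_memory()
tracemalloc.stop()

print(f"Peak traced memory: {peak / 1024 / 1024:.2f} MiB")
```

Note that tracemalloc only counts Python-level allocations, whereas memory_profiler reports the whole process's memory, so the two numbers will differ.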

Key Takeaways

  • Use Python's time and memory_profiler to measure LangChain runtime and memory.
  • Extract token usage from OpenAI SDK responses to monitor cost and efficiency.
  • Test different models and async calls to understand performance trade-offs.
Verified 2026-04 · gpt-4o, gpt-4o-mini