How to measure LangChain performance
Measure LangChain performance by timing execution with Python's time or timeit modules, profiling memory usage with memory_profiler, and logging token usage from model responses. Combine these to benchmark and optimize your chains effectively.
Prerequisites
- Python 3.8+
- pip install langchain langchain-openai openai memory_profiler
- An OpenAI API key (free tier works)
Setup
Install necessary packages and set environment variables for API keys.
- Install LangChain and the OpenAI integration:
pip install langchain langchain-openai openai
- Install memory_profiler for memory usage tracking:
pip install memory_profiler
- Set your OpenAI API key as an environment variable:
export OPENAI_API_KEY='your_api_key' (Linux/macOS) or setx OPENAI_API_KEY "your_api_key" (Windows)
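Before running any chains, it can save a confusing failure later to confirm the key is actually visible to your Python process. A minimal check:

```python
import os

# The LangChain OpenAI integration reads OPENAI_API_KEY from the environment;
# verify it is set before running any chains.
key = os.getenv("OPENAI_API_KEY")
print("OPENAI_API_KEY is set" if key else "OPENAI_API_KEY is missing")
```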
Step by step
Use Python's time module to measure runtime, memory_profiler to track memory, and extract token usage from LangChain's OpenAI responses for cost and efficiency insights.
import time
from memory_profiler import memory_usage
from langchain_core.prompts import ChatPromptTemplate
from langchain_openai import ChatOpenAI

# Define a simple prompt template
prompt = ChatPromptTemplate.from_template("Tell me a joke about {topic}.")

# Initialize the OpenAI chat model (OPENAI_API_KEY is read from the environment)
chat = ChatOpenAI(model="gpt-4o", temperature=0.7)

# Compose the prompt and model into a runnable chain
chain = prompt | chat

def run_chain(topic: str):
    return chain.invoke({"topic": topic})

# Measure runtime and memory in a single pass; retval=True also captures
# the chain's return value so we don't pay for a second API call
start_time = time.time()
mem_usage, result = memory_usage((run_chain, ("computers",)), interval=0.1, retval=True)
end_time = time.time()

runtime = end_time - start_time
peak_memory = max(mem_usage) - min(mem_usage)

print(f"Runtime: {runtime:.2f} seconds")
print(f"Peak memory usage: {peak_memory:.2f} MiB")

# Token usage is attached to the AIMessage the chat model returns
usage = result.usage_metadata
print(f"Prompt tokens: {usage['input_tokens']}")
print(f"Completion tokens: {usage['output_tokens']}")
print(f"Total tokens: {usage['total_tokens']}")

Sample output:

Runtime: 3.45 seconds
Peak memory usage: 15.20 MiB
Prompt tokens: 12
Completion tokens: 45
Total tokens: 57
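Single measurements are noisy, so the timeit module mentioned above is useful for averaging several runs. The sketch below times a hypothetical stand-in function so it runs without API calls; substitute a real chain invocation, keeping in mind that each real run costs tokens:

```python
import timeit

def run_chain(topic: str) -> str:
    # Hypothetical stand-in for the real chain call so this example is
    # self-contained; replace with your actual LangChain invocation.
    return f"A joke about {topic}"

# Total wall-clock time over 5 runs, then the per-run average
n_runs = 5
total = timeit.timeit(lambda: run_chain("computers"), number=n_runs)
print(f"Average runtime over {n_runs} runs: {total / n_runs:.6f} seconds")
```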
Common variations
You can measure performance asynchronously using asyncio with LangChain's async methods, or use streaming responses to track latency per token. Switching models (e.g., gpt-4o-mini) affects speed and cost, so measure accordingly.
import asyncio
from langchain_openai import ChatOpenAI

async def async_run():
    chat = ChatOpenAI(model="gpt-4o-mini", temperature=0.7)
    # ainvoke is the async counterpart of invoke and returns an AIMessage
    response = await chat.ainvoke("Hello async LangChain!")
    print(response.content)

asyncio.run(async_run())

Sample output:

Hello async LangChain!
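The per-token latency tracking mentioned above follows a simple pattern: record the time to the first chunk separately from the total time. This sketch simulates the stream with a hypothetical generator; in practice you would iterate over the chunks from chat.stream(...) instead:

```python
import time

def simulated_stream():
    # Hypothetical stand-in for a streaming model response; yields chunks
    # with a small artificial delay to mimic network latency.
    for token in ["Hello", " async", " LangChain", "!"]:
        time.sleep(0.01)
        yield token

start = time.time()
first_token_latency = None
num_chunks = 0
for chunk in simulated_stream():
    if first_token_latency is None:
        first_token_latency = time.time() - start  # time to first token
    num_chunks += 1
total_time = time.time() - start

print(f"Time to first token: {first_token_latency:.3f}s")
print(f"Chunks: {num_chunks}, total: {total_time:.3f}s")
```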
Troubleshooting
If you see inconsistent timing results, ensure no other heavy processes run concurrently. For memory profiling errors, verify memory_profiler is installed and run your script with python -m memory_profiler your_script.py. If token usage is missing, confirm you are using the latest OpenAI SDK v1+ and accessing response.usage.
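One way to smooth out run-to-run variance is to time several runs and report the median and spread rather than a single number. A minimal sketch with a hypothetical stand-in workload:

```python
import statistics
import time

def workload():
    # Hypothetical stand-in for a real chain call; replace with your own
    # LangChain invocation when benchmarking for real.
    time.sleep(0.01)

# Collect several timings with the high-resolution perf_counter clock
timings = []
for _ in range(5):
    start = time.perf_counter()
    workload()
    timings.append(time.perf_counter() - start)

print(f"Median: {statistics.median(timings):.4f}s")
print(f"Stdev:  {statistics.stdev(timings):.4f}s")
```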
Key Takeaways
- Use Python's time and memory_profiler to measure LangChain runtime and memory.
- Extract token usage from model responses to monitor cost and efficiency.
- Test different models and async calls to understand performance trade-offs.