Comparison · Beginner · 3 min read

ConversationBufferMemory vs ConversationSummaryMemory

Quick answer
ConversationBufferMemory stores the full chat history in memory, preserving all messages for context. ConversationSummaryMemory compresses past interactions into a summary, reducing memory usage while retaining key information.

Verdict

Use ConversationBufferMemory for detailed context retention in short to medium conversations; use ConversationSummaryMemory to efficiently manage long conversations with limited memory.
| Memory Type | Storage Method | Context Detail | Memory Usage | Best for | API Access |
|---|---|---|---|---|---|
| ConversationBufferMemory | Stores full message history | High (all messages) | High (grows with conversation) | Short/medium chats needing full context | LangChain, custom implementations |
| ConversationSummaryMemory | Stores summarized history | Medium (key points only) | Low (fixed summary size) | Long chats with memory constraints | LangChain, custom implementations |
| ConversationBufferMemory | Immediate recall of all messages | Exact past context | Consumes more tokens | Debugging and detailed context | LangChain |
| ConversationSummaryMemory | Uses an LLM to generate a summary | Abstracted context | Saves tokens and compute | Scaling long conversations | LangChain |

Key differences

ConversationBufferMemory retains the entire conversation history verbatim, providing exact context but increasing memory and token usage as the chat grows. ConversationSummaryMemory uses an LLM to generate a concise summary of past messages, reducing token usage and memory footprint but potentially losing fine-grained details.

The buffer memory is straightforward and ideal for short to medium conversations, while summary memory is optimized for long-running chats where token limits and efficiency matter.
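
The trade-off can be sketched without LangChain at all. In the toy sketch below, `summarize` is a hypothetical stand-in for the LLM call that ConversationSummaryMemory makes, kept deliberately crude so the size behavior is visible: the buffer grows with every message, the summary stays bounded.

```python
# Toy illustration (not LangChain code): full-history storage vs a
# rolling summary. summarize() is a hypothetical stand-in for an LLM.

def summarize(previous_summary: str, new_message: str) -> str:
    """Stand-in for an LLM summarizer: keep only the last few words."""
    combined = f"{previous_summary} {new_message}".strip()
    return " ".join(combined.split()[-12:])  # bounded size, details lost

buffer_memory = []   # stores every message verbatim
summary_memory = ""  # stores one compressed summary

for i in range(100):
    msg = f"message {i}"
    buffer_memory.append(msg)                        # grows linearly
    summary_memory = summarize(summary_memory, msg)  # stays bounded

print(len(buffer_memory))           # 100 messages retained
print(len(summary_memory.split()))  # never more than 12 words
```

The real ConversationSummaryMemory replaces the word-truncation hack with an LLM prompt, but the shape of the trade-off is the same: bounded context size in exchange for lost detail.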

Side-by-side example: ConversationBufferMemory

This example shows how to use ConversationBufferMemory in a LangChain chat application to keep full chat history.

```python
from langchain_openai import ChatOpenAI
from langchain.memory import ConversationBufferMemory
from langchain.chains import ConversationChain
import os

llm = ChatOpenAI(model="gpt-4o-mini", temperature=0, api_key=os.environ["OPENAI_API_KEY"])
memory = ConversationBufferMemory()  # keeps every message verbatim

chain = ConversationChain(llm=llm, memory=memory)

response1 = chain.invoke({"input": "Hello, who won the 2024 Olympics?"})
response2 = chain.invoke({"input": "What about the previous games?"})

print("Full chat history:")
print(memory.buffer)
```

Output:

```
Full chat history:
Human: Hello, who won the 2024 Olympics?
AI: [LLM response]
Human: What about the previous games?
AI: [LLM response]
```

Equivalent example: ConversationSummaryMemory

This example uses ConversationSummaryMemory to summarize past conversation, keeping context concise for long chats.

```python
from langchain_openai import ChatOpenAI
from langchain.memory import ConversationSummaryMemory
from langchain.chains import ConversationChain
import os

llm = ChatOpenAI(model="gpt-4o-mini", temperature=0, api_key=os.environ["OPENAI_API_KEY"])
memory = ConversationSummaryMemory(llm=llm)  # the llm writes the running summary

chain = ConversationChain(llm=llm, memory=memory)

response1 = chain.invoke({"input": "Hello, who won the 2024 Olympics?"})
response2 = chain.invoke({"input": "What about the previous games?"})

print("Summary of chat history:")
print(memory.buffer)  # ConversationSummaryMemory stores the summary in .buffer
```

Output:

```
Summary of chat history:
The user asked about the winners of the 2024 Olympics and previous games. The assistant provided relevant information.
```

Note that `max_token_limit` is a parameter of ConversationSummaryBufferMemory, a hybrid class; plain ConversationSummaryMemory does not accept it.

When to use each

Use ConversationBufferMemory when you need exact recall of all messages, such as debugging, short conversations, or when token limits are not a concern.

Use ConversationSummaryMemory for long conversations where token limits and cost are critical, and a high-level summary suffices to maintain context.
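
One way to make the token-limit point concrete is a back-of-envelope calculation. The window and per-turn sizes below are assumptions for illustration, not measurements:

```python
# Back-of-envelope sketch (assumed numbers): when does full-history
# memory overflow a model's context window?

CONTEXT_WINDOW = 4_000    # tokens available for history (assumption)
TOKENS_PER_EXCHANGE = 50  # rough size of one user+AI turn (assumption)
SUMMARY_CAP = 200         # fixed summary size (assumption)

# Buffer memory: history grows linearly with turns.
turns_until_overflow = CONTEXT_WINDOW // TOKENS_PER_EXCHANGE
print(turns_until_overflow)  # 80 turns before the window fills

# Summary memory: history stays at the summary cap regardless of turns.
print(SUMMARY_CAP <= CONTEXT_WINDOW)  # True: never overflows
```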

| Use case | Recommended memory | Reason |
|---|---|---|
| Short chats with detailed context | ConversationBufferMemory | Preserves full message history for accuracy |
| Long-running conversations | ConversationSummaryMemory | Reduces token usage by summarizing context |
| Debugging or audit trails | ConversationBufferMemory | Complete history needed for traceability |
| Cost-sensitive applications | ConversationSummaryMemory | Minimizes token consumption and cost |

Pricing and access

Both memory types are implemented in LangChain and require an LLM API key (e.g., OpenAI). The cost depends on the underlying LLM usage, with ConversationSummaryMemory typically reducing token usage and cost.
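
A rough cost sketch makes the difference tangible. The per-token price below is an assumed figure for illustration (check your provider's current pricing), and it ignores the extra tokens ConversationSummaryMemory spends on its own summarization calls:

```python
# Illustrative cost comparison. PRICE_PER_TOKEN, TOKENS_PER_TURN, and
# SUMMARY_SIZE are assumptions for the sketch, not real measurements.

PRICE_PER_TOKEN = 0.15 / 1_000_000  # assumed input price per token
TURNS = 200
TOKENS_PER_TURN = 50                # assumed tokens added per exchange
SUMMARY_SIZE = 200                  # assumed fixed summary length

# Buffer: turn n re-sends all n * 50 prior tokens -> quadratic total.
buffer_tokens = sum(n * TOKENS_PER_TURN for n in range(TURNS))
# Summary: each turn re-sends only the fixed-size summary.
summary_tokens = TURNS * SUMMARY_SIZE

print(f"buffer:  {buffer_tokens} tokens, ${buffer_tokens * PRICE_PER_TOKEN:.4f}")
print(f"summary: {summary_tokens} tokens, ${summary_tokens * PRICE_PER_TOKEN:.4f}")
```

Under these assumptions the buffer re-sends roughly 25× more history tokens over 200 turns, which is why the article recommends summary memory for cost-sensitive, long-running chats.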

| Option | Free | Paid | API access |
|---|---|---|---|
| ConversationBufferMemory | Yes (LangChain open source) | LLM API usage costs apply | Yes (LangChain + LLM API) |
| ConversationSummaryMemory | Yes (LangChain open source) | LLM API usage costs apply, usually lower | Yes (LangChain + LLM API) |

Key Takeaways

  • ConversationBufferMemory stores full chat history, ideal for short or detailed conversations.
  • ConversationSummaryMemory compresses history into summaries, saving tokens for long chats.
  • Choose memory type based on conversation length, token limits, and cost sensitivity.
Verified 2026-04 · gpt-4o-mini