Comparison · Beginner · 3 min read

ConversationBufferMemory vs ConversationSummaryMemory

Quick answer
ConversationBufferMemory stores the full chat history in memory, preserving all messages for context. ConversationSummaryMemory compresses past interactions into a summary, reducing memory usage while retaining key information.

Verdict

Use ConversationBufferMemory for detailed context retention in short to medium conversations; use ConversationSummaryMemory to efficiently manage long conversations with limited memory.
| Memory Type | Storage Method | Context Detail | Memory Usage | Best for | API Access |
|---|---|---|---|---|---|
| ConversationBufferMemory | Stores full message history | High (all messages) | High (grows with conversation) | Short/medium chats needing full context | LangChain, custom implementations |
| ConversationSummaryMemory | Stores summarized history | Medium (key points only) | Low (fixed summary size) | Long chats with memory constraints | LangChain, custom implementations |
| ConversationBufferMemory | Immediate recall of all messages | Exact past context | Consumes more tokens | Debugging and detailed context | LangChain |
| ConversationSummaryMemory | Uses an LLM to generate a summary | Abstracted context | Saves tokens and compute | Scaling long conversations | LangChain |

Key differences

ConversationBufferMemory retains the entire conversation history verbatim, providing exact context but increasing memory and token usage as the chat grows. ConversationSummaryMemory uses an LLM to generate a concise summary of past messages, reducing token usage and memory footprint but potentially losing fine-grained details.

The buffer memory is straightforward and ideal for short to medium conversations, while summary memory is optimized for long-running chats where token limits and efficiency matter.
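
The trade-off can be sketched without LangChain at all. In the toy sketch below, `summarize` is a hypothetical stand-in for the LLM call that ConversationSummaryMemory makes, kept deliberately crude so the size behavior is visible: the buffer grows with every message, the summary stays bounded.

```python
# Toy illustration (not LangChain code): full-history storage vs a
# rolling summary. summarize() is a hypothetical stand-in for an LLM.

def summarize(previous_summary: str, new_message: str) -> str:
    """Stand-in for an LLM summarizer: keep only the last few words."""
    combined = f"{previous_summary} {new_message}".strip()
    return " ".join(combined.split()[-12:])  # bounded size, details lost

buffer_memory = []   # stores every message verbatim
summary_memory = ""  # stores one compressed summary

for i in range(100):
    msg = f"message {i}"
    buffer_memory.append(msg)                        # grows linearly
    summary_memory = summarize(summary_memory, msg)  # stays bounded

print(len(buffer_memory))           # 100 messages retained
print(len(summary_memory.split()))  # never more than 12 words
```

The real ConversationSummaryMemory replaces the word-truncation hack with an LLM prompt, but the shape of the trade-off is the same: bounded context size in exchange for lost detail.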

Side-by-side example: ConversationBufferMemory

This example shows how to use ConversationBufferMemory in a LangChain chat application to keep full chat history.

```python
from langchain_openai import ChatOpenAI
from langchain.memory import ConversationBufferMemory
from langchain.chains import ConversationChain
import os

llm = ChatOpenAI(model="gpt-4o-mini", temperature=0, api_key=os.environ["OPENAI_API_KEY"])
memory = ConversationBufferMemory()  # keeps every message verbatim

chain = ConversationChain(llm=llm, memory=memory)

response1 = chain.invoke({"input": "Hello, who won the 2024 Olympics?"})
response2 = chain.invoke({"input": "What about the previous games?"})

print("Full chat history:")
print(memory.buffer)
```

Output:

```
Full chat history:
Human: Hello, who won the 2024 Olympics?
AI: [LLM response]
Human: What about the previous games?
AI: [LLM response]
```

Equivalent example: ConversationSummaryMemory

This example uses ConversationSummaryMemory to summarize past conversation, keeping context concise for long chats.

```python
from langchain_openai import ChatOpenAI
from langchain.memory import ConversationSummaryMemory
from langchain.chains import ConversationChain
import os

llm = ChatOpenAI(model="gpt-4o-mini", temperature=0, api_key=os.environ["OPENAI_API_KEY"])
memory = ConversationSummaryMemory(llm=llm)  # the llm writes the running summary

chain = ConversationChain(llm=llm, memory=memory)

response1 = chain.invoke({"input": "Hello, who won the 2024 Olympics?"})
response2 = chain.invoke({"input": "What about the previous games?"})

print("Summary of chat history:")
print(memory.buffer)  # ConversationSummaryMemory stores the summary in .buffer
```

Output:

```
Summary of chat history:
The user asked about the winners of the 2024 Olympics and previous games. The assistant provided relevant information.
```

Note that `max_token_limit` is a parameter of ConversationSummaryBufferMemory, a hybrid class; plain ConversationSummaryMemory does not accept it.

When to use each

Use ConversationBufferMemory when you need exact recall of all messages, such as debugging, short conversations, or when token limits are not a concern.

Use ConversationSummaryMemory for long conversations where token limits and cost are critical, and a high-level summary suffices to maintain context.
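
One way to make the token-limit point concrete is a back-of-envelope calculation. The window and per-turn sizes below are assumptions for illustration, not measurements:

```python
# Back-of-envelope sketch (assumed numbers): when does full-history
# memory overflow a model's context window?

CONTEXT_WINDOW = 4_000    # tokens available for history (assumption)
TOKENS_PER_EXCHANGE = 50  # rough size of one user+AI turn (assumption)
SUMMARY_CAP = 200         # fixed summary size (assumption)

# Buffer memory: history grows linearly with turns.
turns_until_overflow = CONTEXT_WINDOW // TOKENS_PER_EXCHANGE
print(turns_until_overflow)  # 80 turns before the window fills

# Summary memory: history stays at the summary cap regardless of turns.
print(SUMMARY_CAP <= CONTEXT_WINDOW)  # True: never overflows
```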

| Use case | Recommended memory | Reason |
|---|---|---|
| Short chats with detailed context | ConversationBufferMemory | Preserves full message history for accuracy |
| Long-running conversations | ConversationSummaryMemory | Reduces token usage by summarizing context |
| Debugging or audit trails | ConversationBufferMemory | Complete history needed for traceability |
| Cost-sensitive applications | ConversationSummaryMemory | Minimizes token consumption and cost |

Pricing and access

Both memory types are implemented in LangChain and require an LLM API key (e.g., OpenAI). The cost depends on the underlying LLM usage, with ConversationSummaryMemory typically reducing token usage and cost.
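
A rough cost sketch makes the difference tangible. The per-token price below is an assumed figure for illustration (check your provider's current pricing), and it ignores the extra tokens ConversationSummaryMemory spends on its own summarization calls:

```python
# Illustrative cost comparison. PRICE_PER_TOKEN, TOKENS_PER_TURN, and
# SUMMARY_SIZE are assumptions for the sketch, not real measurements.

PRICE_PER_TOKEN = 0.15 / 1_000_000  # assumed input price per token
TURNS = 200
TOKENS_PER_TURN = 50                # assumed tokens added per exchange
SUMMARY_SIZE = 200                  # assumed fixed summary length

# Buffer: turn n re-sends all n * 50 prior tokens -> quadratic total.
buffer_tokens = sum(n * TOKENS_PER_TURN for n in range(TURNS))
# Summary: each turn re-sends only the fixed-size summary.
summary_tokens = TURNS * SUMMARY_SIZE

print(f"buffer:  {buffer_tokens} tokens, ${buffer_tokens * PRICE_PER_TOKEN:.4f}")
print(f"summary: {summary_tokens} tokens, ${summary_tokens * PRICE_PER_TOKEN:.4f}")
```

Under these assumptions the buffer re-sends roughly 25× more history tokens over 200 turns, which is why the article recommends summary memory for cost-sensitive, long-running chats.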

| Option | Free | Paid | API access |
|---|---|---|---|
| ConversationBufferMemory | Yes (LangChain open source) | LLM API usage costs apply | Yes (LangChain + LLM API) |
| ConversationSummaryMemory | Yes (LangChain open source) | LLM API usage costs apply, usually lower | Yes (LangChain + LLM API) |

Key Takeaways

  • ConversationBufferMemory stores full chat history, ideal for short or detailed conversations.
  • ConversationSummaryMemory compresses history into summaries, saving tokens for long chats.
  • Choose memory type based on conversation length, token limits, and cost sensitivity.
Verified 2026-04 · gpt-4o-mini