ConversationBufferMemory vs ConversationSummaryMemory
VERDICT
| Memory Type | Storage Method | Context Detail | Memory/Token Usage | Best For | API Access |
|---|---|---|---|---|---|
| ConversationBufferMemory | Stores the full message history verbatim | High (exact recall of every message) | High; grows with the conversation | Short/medium chats, debugging, detailed context | LangChain, custom implementations |
| ConversationSummaryMemory | Uses an LLM to maintain a running summary | Medium (key points only) | Low; roughly fixed summary size | Long chats where tokens and memory are constrained | LangChain, custom implementations |
Key differences
ConversationBufferMemory retains the entire conversation history verbatim, providing exact context but increasing memory and token usage as the chat grows. ConversationSummaryMemory uses an LLM to generate a concise summary of past messages, reducing token usage and memory footprint but potentially losing fine-grained details.
The buffer memory is straightforward and ideal for short to medium conversations, while summary memory is optimized for long-running chats where token limits and efficiency matter.
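The difference between the two strategies can be illustrated without LangChain at all. The sketch below is plain Python: `BufferMemory` and `SummaryMemory` are hypothetical stand-ins for the two LangChain classes, and `summarize` crudely caps the summary length where a real implementation would call an LLM.

```python
def summarize(previous_summary: str, new_lines: str) -> str:
    # Stand-in for the LLM summarization call: a real implementation would
    # prompt a model to condense the text; here we simply cap the length.
    return (previous_summary + " " + new_lines).strip()[-200:]

class BufferMemory:
    """Keeps every message verbatim, like ConversationBufferMemory."""
    def __init__(self) -> None:
        self.messages: list[str] = []

    def save(self, human: str, ai: str) -> None:
        self.messages += [f"Human: {human}", f"AI: {ai}"]

    def context(self) -> str:
        return "\n".join(self.messages)  # grows with every turn

class SummaryMemory:
    """Keeps one rolling summary, like ConversationSummaryMemory."""
    def __init__(self) -> None:
        self.buffer = ""

    def save(self, human: str, ai: str) -> None:
        # Re-summarize after each turn, folding new lines into the old summary.
        self.buffer = summarize(self.buffer, f"Human: {human}. AI: {ai}.")

    def context(self) -> str:
        return self.buffer  # stays roughly constant in size

buf, summ = BufferMemory(), SummaryMemory()
for i in range(50):
    buf.save(f"question {i}", f"answer {i}")
    summ.save(f"question {i}", f"answer {i}")

# After 50 turns, the buffer context is far larger than the summary context.
print(len(buf.context()), len(summ.context()))
```

After 50 turns the buffer holds 100 verbatim messages while the summary stays under its cap, which is exactly the trade-off the real classes make: exact recall versus bounded size.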
Side-by-side example: ConversationBufferMemory
This example shows how to use ConversationBufferMemory in a LangChain chat application to keep full chat history.
```python
from langchain_openai import ChatOpenAI
from langchain.memory import ConversationBufferMemory
from langchain.chains import ConversationChain
import os

llm = ChatOpenAI(model="gpt-4o-mini", temperature=0, api_key=os.environ["OPENAI_API_KEY"])
memory = ConversationBufferMemory()
chain = ConversationChain(llm=llm, memory=memory)

response1 = chain.invoke({"input": "Hello, who won the 2024 Olympics?"})
response2 = chain.invoke({"input": "What about the previous games?"})

print("Full chat history:")
print(memory.buffer)
```

Illustrative output (actual AI replies depend on the model):

```text
Full chat history:
Human: Hello, who won the 2024 Olympics?
AI: [LLM response]
Human: What about the previous games?
AI: [LLM response]
```
Equivalent example: ConversationSummaryMemory
This example uses ConversationSummaryMemory to summarize past conversation, keeping context concise for long chats.
```python
from langchain_openai import ChatOpenAI
from langchain.memory import ConversationSummaryMemory
from langchain.chains import ConversationChain
import os

llm = ChatOpenAI(model="gpt-4o-mini", temperature=0, api_key=os.environ["OPENAI_API_KEY"])
# The memory makes its own LLM call after each turn to regenerate the summary.
memory = ConversationSummaryMemory(llm=llm)
chain = ConversationChain(llm=llm, memory=memory)

response1 = chain.invoke({"input": "Hello, who won the 2024 Olympics?"})
response2 = chain.invoke({"input": "What about the previous games?"})

print("Summary of chat history:")
print(memory.buffer)  # ConversationSummaryMemory stores the running summary in .buffer
```

Note that `ConversationSummaryMemory` does not take a `max_token_limit` argument; if you want a rolling summary plus a window of recent verbatim messages capped by `max_token_limit`, use `ConversationSummaryBufferMemory` instead.

Illustrative output (the actual summary depends on the model):

```text
Summary of chat history:
The human asked who won the 2024 Olympics and about the previous games; the AI provided the relevant results.
```
When to use each
Use ConversationBufferMemory when you need exact recall of all messages, such as debugging, short conversations, or when token limits are not a concern.
Use ConversationSummaryMemory for long conversations where token limits and cost are critical, and a high-level summary suffices to maintain context.
| Use case | Recommended Memory | Reason |
|---|---|---|
| Short chats with detailed context | ConversationBufferMemory | Preserves full message history for accuracy |
| Long-running conversations | ConversationSummaryMemory | Reduces token usage by summarizing context |
| Debugging or audit trails | ConversationBufferMemory | Complete history needed for traceability |
| Cost-sensitive applications | ConversationSummaryMemory | Minimizes token consumption and cost |
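The cost difference is easy to see with back-of-envelope arithmetic. The numbers below are hypothetical (an assumed 50 tokens per message and a fixed 150-token summary): with buffer memory, the prompt for each turn resends all prior messages, so cumulative prompt tokens grow quadratically with the number of turns, while summary memory keeps each prompt roughly constant.

```python
# Back-of-envelope prompt-token arithmetic (hypothetical sizes, illustration only).
TOKENS_PER_MESSAGE = 50   # assumed average size of one human or AI message
SUMMARY_TOKENS = 150      # assumed fixed size of the rolling summary

def buffer_prompt_tokens(turn: int) -> int:
    # Turn `turn` resends all prior messages (2 per completed turn) plus the new input.
    return 2 * (turn - 1) * TOKENS_PER_MESSAGE + TOKENS_PER_MESSAGE

def summary_prompt_tokens(turn: int) -> int:
    # Each prompt is the fixed summary plus the new input, regardless of turn.
    return SUMMARY_TOKENS + TOKENS_PER_MESSAGE

turns = 100
buffer_total = sum(buffer_prompt_tokens(t) for t in range(1, turns + 1))
summary_total = sum(summary_prompt_tokens(t) for t in range(1, turns + 1))
print(buffer_total, summary_total)  # → 500000 20000
```

Under these assumptions, 100 turns cost 500,000 prompt tokens with a full buffer versus 20,000 with a fixed summary, a 25x difference, though the summary approach adds one summarization call per turn, which partially offsets the savings in short conversations.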
Pricing and access
Both memory types are part of open-source LangChain and require an API key for the underlying LLM (e.g., OpenAI). Cost is driven by LLM token usage: ConversationSummaryMemory sends smaller prompts but adds one summarization call per turn, so its net savings show up mainly in long conversations. Note that recent LangChain releases (0.3+) deprecate these memory classes in favor of LangGraph persistence, though they remain available.
| Option | Free | Paid | API access |
|---|---|---|---|
| ConversationBufferMemory | Yes (LangChain open source) | LLM API usage costs apply | Yes (LangChain + LLM API) |
| ConversationSummaryMemory | Yes (LangChain open source) | LLM API usage costs apply, usually lower | Yes (LangChain + LLM API) |
Key Takeaways
- ConversationBufferMemory stores full chat history, ideal for short or detailed conversations.
- ConversationSummaryMemory compresses history into summaries, saving tokens for long chats.
- Choose memory type based on conversation length, token limits, and cost sensitivity.