How to add memory to LlamaIndex chat
Quick answer
To add memory to LlamaIndex chat, use a vector store retriever like FAISS or Chroma to persist and retrieve conversation context. Integrate this retriever with LLMPredictor and ServiceContext to enable chat memory across sessions.

Prerequisites
- Python 3.8+
- OpenAI API key (free tier works)
- pip install llama-index faiss-cpu openai
Setup
Install the required packages and set your OpenAI API key as an environment variable.
pip install llama-index faiss-cpu openai

Step by step
This example demonstrates adding memory to LlamaIndex chat by using a FAISS vector store retriever to store and retrieve conversation context. It uses the legacy llama_index API (pre-0.10), in which LLMPredictor and ServiceContext are still available.
import os

import faiss
from llama_index import (
    GPTVectorStoreIndex,
    LLMPredictor,
    ServiceContext,
    SimpleDirectoryReader,
    StorageContext,
)
from llama_index.retrievers import VectorIndexRetriever
from llama_index.vector_stores import FaissVectorStore
from langchain_openai import ChatOpenAI

# Set your OpenAI API key as an environment variable before running:
# export OPENAI_API_KEY="sk-..."

# Initialize the LLM predictor with an OpenAI chat model
llm_predictor = LLMPredictor(
    llm=ChatOpenAI(model_name="gpt-4o", temperature=0)
)
service_context = ServiceContext.from_defaults(llm_predictor=llm_predictor)

# Back the index with a FAISS vector store
# (1536 is the dimension of OpenAI's default embeddings)
faiss_index = faiss.IndexFlatL2(1536)
vector_store = FaissVectorStore(faiss_index=faiss_index)
storage_context = StorageContext.from_defaults(vector_store=vector_store)

# Load documents from a directory
documents = SimpleDirectoryReader("./data").load_data()

# Create the vector store index
index = GPTVectorStoreIndex.from_documents(
    documents, storage_context=storage_context, service_context=service_context
)

# Persist the index to disk so memory carries across sessions (optional)
index.storage_context.persist(persist_dir="./storage")

# Create a retriever from the index
retriever = VectorIndexRetriever(index=index)

def chat_with_memory(query: str) -> str:
    # Retrieve relevant context from the vector store
    relevant_nodes = retriever.retrieve(query)
    context = "\n\n".join(node.get_content() for node in relevant_nodes)

    # Pass the retrieved context to the LLM along with the user query
    response = llm_predictor.llm.invoke([
        ("system", "You are a helpful assistant. Use this context:\n" + context),
        ("human", query),
    ])
    return response.content

# Example usage
if __name__ == "__main__":
    user_input = "Explain the main points from the documents."
    answer = chat_with_memory(user_input)
    print("Assistant:", answer)

Output
Assistant: The main points from the documents are ...
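Persisting the index covers document memory; if you also want the dialogue itself to survive restarts, one lightweight option is to append each turn to a JSON file and replay it into the message list on startup. This is a plain-Python sketch, not a LlamaIndex API; the helper names and file path are illustrative:

```python
import json
from pathlib import Path
from typing import Dict, List

def load_history(path: Path) -> List[Dict[str, str]]:
    # Return prior chat turns if the file exists, else start a fresh session
    if path.exists():
        return json.loads(path.read_text())
    return []

def append_turn(path: Path, role: str, content: str) -> None:
    # Record one turn and write immediately so memory survives restarts
    history = load_history(path)
    history.append({"role": role, "content": content})
    path.write_text(json.dumps(history, indent=2))

# Usage: replay prior turns into the message list before calling the LLM
# history = load_history(Path("./storage/chat_history.json"))
```

On the next run, prepend the reloaded turns to the messages you send so the model sees the earlier conversation.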
Common variations
- Use Chroma instead of FAISS for vector storage.
- Use async methods if your environment supports asynchronous calls.
- Switch to other OpenAI models like gpt-4o-mini by changing model_name in ChatOpenAI; non-OpenAI models such as gemini-1.5-pro require their provider's own chat class instead.
Troubleshooting
- If you get "API key missing" errors, ensure OPENAI_API_KEY is set in your environment.
- If retrieval returns empty results, verify your documents are loaded and indexed correctly.
- For slow responses, check your network and consider reducing max_tokens or using a smaller model.
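For the first bullet, failing fast before any network call gives a clearer error than a stack trace from deep inside the client. A small guard, with an illustrative helper name:

```python
import os

def require_env(var: str = "OPENAI_API_KEY") -> str:
    # Raise a readable error up front instead of a failure mid-request
    value = os.environ.get(var)
    if not value:
        raise RuntimeError(f"{var} is not set; export it before running the script.")
    return value
```

Call require_env() once at the top of the script, before constructing any clients.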
Key Takeaways
- Use vector stores like FAISS or Chroma to add persistent memory to LlamaIndex chat.
- Integrate the retriever with your LLM predictor to enable context-aware responses.
- Persist your index to disk to maintain memory across sessions.
- Adjust model and vector store choices based on your latency and accuracy needs.
- Always set your API keys securely via environment variables.