How to use a chat engine in LlamaIndex
Quick answer

Use the `llama_index` library to build conversational AI: load your documents into an index, then create a chat engine from it with `index.as_chat_engine()`. Configure an OpenAI LLM (e.g. `gpt-4o`) to handle the chat completions.

Prerequisites

- Python 3.8+
- OpenAI API key (free tier works)
- `pip install llama-index openai`
Setup

Install the llama-index and openai Python packages and set your OpenAI API key as an environment variable.

- Run `pip install llama-index openai`
- Set the environment variable `OPENAI_API_KEY` to your OpenAI API key

```shell
pip install llama-index openai
```

Step by step
This example shows how to create a simple chat engine using llama_index with an OpenAI LLM. It loads documents, builds a vector index, and then uses a chat engine created from that index to answer queries conversationally. The imports below assume llama-index >= 0.10, where the core API lives under `llama_index.core`.
```python
import os

from llama_index.core import Settings, SimpleDirectoryReader, VectorStoreIndex
from llama_index.llms.openai import OpenAI

# Configure the LLM used for chat completions
Settings.llm = OpenAI(model="gpt-4o", api_key=os.environ["OPENAI_API_KEY"])

# Load documents from a directory
documents = SimpleDirectoryReader("data").load_data()

# Build a vector index over the documents
index = VectorStoreIndex.from_documents(documents)

# Create a chat engine from the index
chat_engine = index.as_chat_engine(chat_mode="condense_question")

# Chat with the engine
response = chat_engine.chat("What is the main topic of the documents?")
print("Response:", response.response)
```

Output:

```
Response: The main topic of the documents is LlamaIndex usage and chat engine integration.
```
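Conceptually, a chat engine couples a retriever with running conversation history. The toy sketch below illustrates that loop in plain Python; it is not the library's implementation, and the keyword matching merely stands in for real vector retrieval:

```python
# Toy sketch of a retrieval-backed chat loop; NOT llama_index internals.
class ToyChatEngine:
    def __init__(self, documents):
        self.documents = documents  # stand-in for a vector index
        self.history = []           # accumulated (role, message) turns

    def _retrieve(self, query):
        # Naive keyword overlap; a real engine uses embedding similarity.
        words = set(query.lower().split())
        return [d for d in self.documents if words & set(d.lower().split())]

    def chat(self, message):
        self.history.append(("user", message))
        context = self._retrieve(message)
        # A real engine would send history + retrieved context to an LLM here.
        reply = f"Found {len(context)} relevant document(s)."
        self.history.append(("assistant", reply))
        return reply

engine = ToyChatEngine(["llamaindex builds indexes", "chat engines answer queries"])
print(engine.chat("how do chat engines work"))  # Found 1 relevant document(s).
```

The point of the sketch is the statefulness: unlike a query engine, every `chat` call appends to `history`, which is what lets follow-up questions refer back to earlier turns.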
Common variations

- Use a different model by changing the `model` parameter passed to `OpenAI`.
- Use asynchronous calls (`achat`) if your environment supports async.
- Customize chat engine behavior by passing parameters such as `chat_mode` or `llm` to `as_chat_engine`.
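To see what a mode like `condense_question` does conceptually: each follow-up is rewritten into a standalone question before retrieval, so the retriever sees the full context. The toy condenser below just splices in the previous turn; the real mode asks the LLM to perform the rewrite:

```python
def condense_question(chat_history, follow_up):
    # Toy rewrite: fold the last turn into the follow-up so it stands alone.
    # llama_index's condense_question mode uses an LLM prompt for this step.
    if not chat_history:
        return follow_up
    return f"Regarding '{chat_history[-1]}': {follow_up}"

history = ["What is the main topic of the documents?"]
print(condense_question(history, "Can you give more detail?"))
```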
```python
import asyncio
import os

from llama_index.core import Settings, SimpleDirectoryReader, VectorStoreIndex
from llama_index.llms.openai import OpenAI

Settings.llm = OpenAI(model="gpt-4o", api_key=os.environ["OPENAI_API_KEY"])

documents = SimpleDirectoryReader("data").load_data()
index = VectorStoreIndex.from_documents(documents)

# Async example
async def async_chat():
    chat_engine = index.as_chat_engine()
    response = await chat_engine.achat("Explain the LlamaIndex chat engine.")
    print("Async response:", response.response)

asyncio.run(async_chat())
```

Output:

```
Async response: The LlamaIndex chat engine enables conversational querying over your document index using LLMs.
```
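The async pattern also lets you issue several queries concurrently with `asyncio.gather`. The sketch below uses a stand-in coroutine (`fake_achat` is hypothetical, not a library function) so the pattern is visible without network calls:

```python
import asyncio

async def fake_achat(question):
    # Stand-in for chat_engine.achat(); a real call awaits network I/O.
    await asyncio.sleep(0)
    return f"answer to: {question}"

async def main():
    questions = ["What is LlamaIndex?", "What is a chat engine?"]
    # All queries run concurrently instead of one after another.
    return await asyncio.gather(*(fake_achat(q) for q in questions))

answers = asyncio.run(main())
print(answers)
```

Because the real `achat` spends most of its time waiting on the API, gathering queries like this can cut total latency roughly to that of the slowest single call.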
Troubleshooting

- If you get authentication errors, verify that the `OPENAI_API_KEY` environment variable is set correctly.
- If the chat engine returns empty or irrelevant responses, confirm the documents loaded correctly and the index was built over them.
- For rate limits or timeouts, consider using smaller documents or batching queries.
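A quick preflight check can surface a missing or malformed key before it fails deep inside an API call. A minimal helper sketch (the `sk-` check is only a heuristic, based on the convention that OpenAI keys start with `sk-`):

```python
import os

def check_openai_key(env):
    # Return a diagnostic string instead of letting the SDK fail later.
    key = env.get("OPENAI_API_KEY", "")
    if not key:
        return "OPENAI_API_KEY is not set"
    if not key.startswith("sk-"):
        return "OPENAI_API_KEY looks malformed (expected an 'sk-' prefix)"
    return "ok"

print(check_openai_key(os.environ))
```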
Key Takeaways

- Create a chat engine from a `llama_index` index with `index.as_chat_engine()` to enable conversational AI over document indexes.
- Configure the OpenAI LLM globally via `Settings.llm`, or pass `llm` to `as_chat_engine`.
- Load documents and build an index before creating the chat engine.
- Customize chat behavior with `chat_mode` and different LLM models.
- Check environment variables and document loading if responses are incorrect.