How-to · Beginner · 3 min read

How to use chat engine in LlamaIndex

Quick answer
Use the llama_index library to build conversational AI by loading documents into a vector index and turning it into a chat engine with index.as_chat_engine(). Configure an OpenAI LLM such as gpt-4o to handle the chat completions.

PREREQUISITES

  • Python 3.8+
  • OpenAI API key (free tier works)
  • pip install llama-index

Setup

Install the llama-index Python package (the OpenAI LLM integration is bundled with it) and set your OpenAI API key as an environment variable.

  • Run pip install llama-index
  • Set the OPENAI_API_KEY environment variable to your OpenAI API key
bash
pip install llama-index
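The key can be exported in the shell before running your script (the value shown is a placeholder, not a real key):

```shell
# Make the key available to the current shell session.
# Replace the placeholder with your real OpenAI API key.
export OPENAI_API_KEY="sk-your-key-here"
```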

Step by step

This example shows how to create a simple chat engine using llama_index with an OpenAI LLM. It loads documents, builds a vector index, and then turns the index into a chat engine that answers queries conversationally.

python
from llama_index.core import SimpleDirectoryReader, VectorStoreIndex, Settings
from llama_index.llms.openai import OpenAI

# Configure the LLM; the client reads OPENAI_API_KEY from the environment
Settings.llm = OpenAI(model="gpt-4o")

# Load documents from a directory
documents = SimpleDirectoryReader("data").load_data()

# Create a vector index from the documents
index = VectorStoreIndex.from_documents(documents)

# Create a chat engine from the index
chat_engine = index.as_chat_engine()

# Chat with the engine
response = chat_engine.chat("What is the main topic of the documents?")
print("Response:", response.response)
output
Response: The main topic of the documents is about LlamaIndex usage and chat engine integration.
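What distinguishes a chat engine from a plain query engine is that it carries conversation history into each call, so follow-up questions are understood in context. A minimal, illustrative sketch of that idea (not the actual llama_index implementation; the class and method names here are invented for explanation):

```python
class SimpleChatMemory:
    """Toy illustration of the history a chat engine keeps between turns."""

    def __init__(self):
        self.history = []  # list of {"role": ..., "content": ...} messages

    def add(self, role, content):
        self.history.append({"role": role, "content": content})

    def build_prompt(self, question):
        # Prior turns are prepended so the LLM can resolve references
        # like "it" or "that" in follow-up questions.
        lines = [f"{m['role']}: {m['content']}" for m in self.history]
        lines.append(f"user: {question}")
        return "\n".join(lines)

memory = SimpleChatMemory()
memory.add("user", "What is LlamaIndex?")
memory.add("assistant", "A framework for building LLM apps over your data.")
print(memory.build_prompt("How do I install it?"))
```

The real chat engine manages this memory for you; each call to chat() both uses and extends the stored history.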

Common variations

  • Use a different model by changing the model parameter passed to OpenAI.
  • Use asynchronous calls (achat) if your environment supports async.
  • Customize the chat engine behavior by passing parameters like chat_mode to as_chat_engine.
python
import asyncio

from llama_index.core import SimpleDirectoryReader, VectorStoreIndex, Settings
from llama_index.llms.openai import OpenAI

Settings.llm = OpenAI(model="gpt-4o")

documents = SimpleDirectoryReader("data").load_data()
index = VectorStoreIndex.from_documents(documents)

# Async example: achat is the awaitable counterpart of chat
async def async_chat():
    chat_engine = index.as_chat_engine(chat_mode="condense_question")
    response = await chat_engine.achat("Explain LlamaIndex chat engine.")
    print("Async response:", response.response)

asyncio.run(async_chat())
output
Async response: The LlamaIndex chat engine enables conversational querying over your document index using LLMs.

Troubleshooting

  • If you get authentication errors, verify your OPENAI_API_KEY environment variable is set correctly.
  • If the chat engine returns empty or irrelevant responses, ensure your documents are loaded properly and the index is built.
  • For rate limits or timeouts, consider using smaller documents or batching queries.
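As a quick first check for authentication errors, you can verify the key is actually visible to your process before the first API call. The sk- prefix check below is only a heuristic for catching copy-paste mistakes, not an official format guarantee:

```python
import os

def openai_key_looks_configured(env=os.environ):
    """Heuristic check that an OpenAI API key is set and plausibly shaped."""
    key = env.get("OPENAI_API_KEY", "")
    return key.startswith("sk-") and len(key) > 20

if not openai_key_looks_configured():
    print("OPENAI_API_KEY is missing or looks malformed")
```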

Key Takeaways

  • Turn a document index into conversational AI with index.as_chat_engine() from llama_index.
  • Configure an OpenAI LLM (e.g. gpt-4o) via Settings.llm to handle chat completions.
  • Load documents and build an index before creating the chat engine.
  • Customize chat behavior with chat_mode and different LLM models.
  • Check environment variables and document loading if responses are incorrect.
Verified 2026-04 · gpt-4o, OpenAI