How-to · Intermediate · 4 min read

How to build conversational RAG with LlamaIndex

Quick answer
Use LlamaIndex to build a vector index over your documents, then wrap it in a chat engine backed by a model like gpt-4o. Each user message retrieves the most relevant document chunks, and the model answers using that context plus the conversation history, giving you a conversational Retrieval-Augmented Generation (RAG) system.

PREREQUISITES

  • Python 3.8+
  • OpenAI API key (free tier works)
  • pip install llama-index openai

Setup

Install the llama-index and openai Python packages, and set your OpenAI API key as an environment variable.

bash
pip install llama-index openai
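Then expose your API key to the process (POSIX shell shown; `your-key-here` is a placeholder for your real key):

```shell
# Placeholder value -- substitute your actual OpenAI API key
export OPENAI_API_KEY="your-key-here"
```

On Windows, use `setx OPENAI_API_KEY "your-key-here"` in a command prompt instead.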

Step by step

This example shows how to build a conversational RAG system using LlamaIndex with OpenAI's gpt-4o model. It loads documents, creates an index, and then runs a chat loop that queries the index and generates conversational answers.

python
import os
from llama_index.core import VectorStoreIndex, SimpleDirectoryReader, Settings
from llama_index.llms.openai import OpenAI

# Use gpt-4o for response generation (reads the key from the environment)
Settings.llm = OpenAI(model="gpt-4o", api_key=os.environ["OPENAI_API_KEY"])

# Load documents from a directory
documents = SimpleDirectoryReader("./docs").load_data()

# Create the vector index
index = VectorStoreIndex.from_documents(documents)

# A chat engine keeps conversation history across turns,
# condensing each follow-up into a standalone retrieval query
chat_engine = index.as_chat_engine(chat_mode="condense_question")

# Conversational query loop
print("Start chatting with your documents (type 'exit' to quit):")
while True:
    query = input("You: ")
    if query.lower() == "exit":
        break
    response = chat_engine.chat(query)
    print(f"Bot: {response.response}")
output
Start chatting with your documents (type 'exit' to quit):
You: What is LlamaIndex?
Bot: LlamaIndex is a Python library that helps you build retrieval-augmented generation systems by creating indices over your documents and querying them with language models.
You: exit
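Conceptually, the retrieval step finds the document chunks whose embeddings are closest to the query embedding, and the model answers using those chunks. A minimal sketch of that step, using hypothetical bag-of-words count vectors in place of real embeddings:

```python
import math
from collections import Counter

def embed(text):
    """Toy embedding: a bag-of-words count vector (stand-in for a real embedding model)."""
    return Counter(text.lower().split())

def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[w] * b[w] for w in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

chunks = [
    "LlamaIndex builds indices over documents",
    "gpt-4o is a chat model from OpenAI",
    "RAG combines retrieval with generation",
]

def retrieve(query, top_k=1):
    """Return the top_k chunks most similar to the query."""
    q = embed(query)
    return sorted(chunks, key=lambda c: cosine(q, embed(c)), reverse=True)[:top_k]
```

A real vector index works the same way, just with dense model-produced embeddings and an approximate nearest-neighbor search instead of a full sort.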

Common variations

  • Use the asynchronous OpenAI client (or LlamaIndex's async chat methods) for better throughput in web apps.
  • Switch to other LlamaIndex index types like TreeIndex or KeywordTableIndex for different retrieval strategies.
  • Use other chat models such as claude-3-5-sonnet-20241022 by swapping in a different LLM (for example, via the llama-index-llms-anthropic integration).

Troubleshooting

  • If you get authentication errors, verify your OPENAI_API_KEY environment variable is set correctly.
  • If document loading fails, check the path and file formats supported by SimpleDirectoryReader.
  • For slow queries, consider limiting max_tokens or using smaller models like gpt-4o-mini.
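The first item above can be checked before starting the app. A small helper (hypothetical, plain Python, no llama-index required) that verifies the variable is present:

```python
import os

def check_api_key(env=os.environ):
    """Return True if OPENAI_API_KEY is set and non-empty, else print a hint."""
    if not env.get("OPENAI_API_KEY", ""):
        print("OPENAI_API_KEY is not set; export it before running the example.")
        return False
    return True
```

Calling it at startup turns a cryptic authentication error into an actionable message.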

Key Takeaways

  • Use LlamaIndex to build a vector index over your documents for retrieval.
  • Integrate the index with OpenAI's gpt-4o model to generate conversational answers.
  • Customize index types and models to optimize retrieval and generation for your use case.
Verified 2026-04 · gpt-4o, claude-3-5-sonnet-20241022, gpt-4o-mini