How-to · Intermediate · 3 min read

How to use CondensePlusContextChatEngine in LlamaIndex

Quick answer
Use CondensePlusContextChatEngine from LlamaIndex to build chatbots that condense each follow-up question (plus the chat history) into a standalone query, retrieve relevant context for it, and answer with that context. Build it with CondensePlusContextChatEngine.from_defaults, passing a retriever and an LLM, then call chat with user input to get context-aware answers.

PREREQUISITES

  • Python 3.8+
  • OpenAI API key (free tier works)
  • pip install llama-index openai

Setup

Install the llama-index package and set your OpenAI API key as an environment variable.

  • Run pip install llama-index openai
  • Set environment variable OPENAI_API_KEY with your OpenAI API key
bash
pip install llama-index openai
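Before making any API calls, it can help to confirm the key is actually visible to your Python process. A quick check (the helper name api_key_status is just for illustration):

```python
import os

def api_key_status(env=None):
    # Return "set" if OPENAI_API_KEY is present, without printing the key itself
    env = os.environ if env is None else env
    return "set" if env.get("OPENAI_API_KEY") else "missing"

print("OPENAI_API_KEY is", api_key_status())
```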

Step by step

This example builds a CondensePlusContextChatEngine over a simple document index, using OpenAI's gpt-4o model for chat completions. It uses the current llama_index.core API (llama-index 0.10+); the older ServiceContext, LLMPredictor, and PromptHelper interfaces have been deprecated in favor of the global Settings object.

python
import os
from llama_index.core import Settings, SimpleDirectoryReader, VectorStoreIndex
from llama_index.core.chat_engine import CondensePlusContextChatEngine
from llama_index.llms.openai import OpenAI

# Configure the LLM; OpenAI(...) reads OPENAI_API_KEY from the environment
# if api_key is not passed explicitly
Settings.llm = OpenAI(model="gpt-4o", api_key=os.environ["OPENAI_API_KEY"])

# Load documents from a directory (replace 'docs/' with your folder)
documents = SimpleDirectoryReader("docs/").load_data()

# Create a vector store index over the documents
index = VectorStoreIndex.from_documents(documents)

# Create the chat engine: it condenses each follow-up question (plus chat
# history) into a standalone query, retrieves matching context, and answers
chat_engine = CondensePlusContextChatEngine.from_defaults(
    retriever=index.as_retriever(),
    llm=Settings.llm,
)

# Chat with context
response = chat_engine.chat("What are the main points from the documents?")
print("Response:", response.response)
output
Response: <context-aware answer based on documents>
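Under the hood, each turn runs two stages: first the question and chat history are condensed into a standalone query, then context is retrieved for that query and handed to the answering LLM. A pure-Python sketch of that flow, with stub functions standing in for the real LLM and retriever (not the actual LlamaIndex internals):

```python
def condense_plus_context_turn(question, history, condense, retrieve, answer):
    # Stage 1: rewrite the follow-up question + history into a standalone query
    standalone = condense(history, question)
    # Stage 2: retrieve context for that query and answer with it
    context = retrieve(standalone)
    return answer(context, standalone)

# Stub components illustrating the flow
history = ["User: Who maintains the docs?", "Assistant: The data team."]
condense = lambda h, q: "When did the data team last update the docs?"
retrieve = lambda q: ["The docs were last updated in March."]
answer = lambda ctx, q: f"{q} -> {ctx[0]}"

print(condense_plus_context_turn("When did they last update them?", history,
                                 condense, retrieve, answer))
```

The stub makes the benefit visible: the ambiguous follow-up "When did they last update them?" only retrieves useful context because it is first rewritten into a self-contained question.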

Common variations

You can customize the CondensePlusContextChatEngine by using different LLM models, async calls, or streaming responses.

  • Use gpt-4o or gpt-4o-mini by changing the model passed to OpenAI(...).
  • Use chat_engine.achat(...) for async and chat_engine.stream_chat(...) for streaming.
  • Adjust Settings.chunk_size and Settings.chunk_overlap, or the retriever's similarity_top_k, to tune how much context is retrieved.
python
import asyncio

async def async_chat():
    # achat runs the same condense-retrieve-answer flow as chat,
    # awaiting the LLM calls (reuses chat_engine from the example above)
    response = await chat_engine.achat("Summarize the documents in one sentence.")
    print(response.response)

asyncio.run(async_chat())
output
<context-aware summary of the documents>
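For streaming, chat_engine.stream_chat(...) returns a response whose response_gen yields tokens as they arrive. Consuming such a generator looks like this (a stub token iterator stands in for a live engine):

```python
def print_stream(token_gen):
    # Print tokens as they arrive and return the assembled text
    pieces = []
    for token in token_gen:
        print(token, end="", flush=True)
        pieces.append(token)
    print()
    return "".join(pieces)

# Stub standing in for chat_engine.stream_chat(...).response_gen
demo_tokens = iter(["The ", "main ", "points ", "are ..."])
full_text = print_stream(demo_tokens)
```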

Troubleshooting

  • If you get API key missing errors, ensure OPENAI_API_KEY is set in your environment.
  • If responses are empty or irrelevant, check your document loading path and indexing.
  • For rate limits, slow your request rate, retrieve fewer chunks per turn (lower similarity_top_k), or upgrade your OpenAI plan.
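For transient rate-limit errors, a simple exponential-backoff wrapper around the chat call is often enough. A generic sketch (in practice you would catch OpenAI's RateLimitError rather than bare Exception):

```python
import time

def with_retries(fn, max_attempts=3, base_delay=1.0):
    # Call fn(), retrying with exponential backoff; re-raise after the last attempt
    for attempt in range(max_attempts):
        try:
            return fn()
        except Exception:
            if attempt == max_attempts - 1:
                raise
            time.sleep(base_delay * (2 ** attempt))

# Usage sketch: with_retries(lambda: chat_engine.chat("What changed in March?"))
```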

Key Takeaways

  • Use CondensePlusContextChatEngine to combine query condensation and context retrieval for better chat responses.
  • Initialize it via from_defaults with a retriever built from your document index to enable context-aware chat.
  • Customize LLM models and prompt parameters to optimize performance and cost.
  • Always set your OpenAI API key in environment variables to avoid authentication errors.
Verified 2026-04 · gpt-4o