How-to · Intermediate · 3 min read

How to use CondensePlusContextChatEngine in LlamaIndex

Quick answer
Use CondensePlusContextChatEngine from LlamaIndex to build chatbots that condense each follow-up question (plus the chat history) into a standalone query, retrieve relevant context for it, and answer with that context. Build it with CondensePlusContextChatEngine.from_defaults, passing a retriever and an LLM, then call chat with user input to get context-aware answers.

PREREQUISITES

  • Python 3.8+
  • OpenAI API key (free tier works)
  • pip install llama-index openai

Setup

Install the llama-index package and set your OpenAI API key as an environment variable.

  • Run pip install llama-index openai
  • Set environment variable OPENAI_API_KEY with your OpenAI API key
bash
pip install llama-index openai
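Before making any API calls, it can help to confirm the key is actually visible to your Python process. A quick check (the helper name api_key_status is just for illustration):

```python
import os

def api_key_status(env=None):
    # Return "set" if OPENAI_API_KEY is present, without printing the key itself
    env = os.environ if env is None else env
    return "set" if env.get("OPENAI_API_KEY") else "missing"

print("OPENAI_API_KEY is", api_key_status())
```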

Step by step

This example builds a CondensePlusContextChatEngine over a simple document index, using OpenAI's gpt-4o model for chat completions. It uses the current llama_index.core API (llama-index 0.10+); the older ServiceContext, LLMPredictor, and PromptHelper interfaces have been deprecated in favor of the global Settings object.

python
import os
from llama_index.core import Settings, SimpleDirectoryReader, VectorStoreIndex
from llama_index.core.chat_engine import CondensePlusContextChatEngine
from llama_index.llms.openai import OpenAI

# Configure the LLM; OpenAI(...) reads OPENAI_API_KEY from the environment
# if api_key is not passed explicitly
Settings.llm = OpenAI(model="gpt-4o", api_key=os.environ["OPENAI_API_KEY"])

# Load documents from a directory (replace 'docs/' with your folder)
documents = SimpleDirectoryReader("docs/").load_data()

# Create a vector store index over the documents
index = VectorStoreIndex.from_documents(documents)

# Create the chat engine: it condenses each follow-up question (plus chat
# history) into a standalone query, retrieves matching context, and answers
chat_engine = CondensePlusContextChatEngine.from_defaults(
    retriever=index.as_retriever(),
    llm=Settings.llm,
)

# Chat with context
response = chat_engine.chat("What are the main points from the documents?")
print("Response:", response.response)
output
Response: <context-aware answer based on documents>
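Under the hood, each turn runs two stages: first the question and chat history are condensed into a standalone query, then context is retrieved for that query and handed to the answering LLM. A pure-Python sketch of that flow, with stub functions standing in for the real LLM and retriever (not the actual LlamaIndex internals):

```python
def condense_plus_context_turn(question, history, condense, retrieve, answer):
    # Stage 1: rewrite the follow-up question + history into a standalone query
    standalone = condense(history, question)
    # Stage 2: retrieve context for that query and answer with it
    context = retrieve(standalone)
    return answer(context, standalone)

# Stub components illustrating the flow
history = ["User: Who maintains the docs?", "Assistant: The data team."]
condense = lambda h, q: "When did the data team last update the docs?"
retrieve = lambda q: ["The docs were last updated in March."]
answer = lambda ctx, q: f"{q} -> {ctx[0]}"

print(condense_plus_context_turn("When did they last update them?", history,
                                 condense, retrieve, answer))
```

The stub makes the benefit visible: the ambiguous follow-up "When did they last update them?" only retrieves useful context because it is first rewritten into a self-contained question.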

Common variations

You can customize the CondensePlusContextChatEngine by using different LLM models, async calls, or streaming responses.

  • Use gpt-4o or gpt-4o-mini by changing the model passed to OpenAI(...).
  • Use chat_engine.achat(...) for async and chat_engine.stream_chat(...) for streaming.
  • Adjust Settings.chunk_size and Settings.chunk_overlap, or the retriever's similarity_top_k, to tune how much context is retrieved.
python
import asyncio

async def async_chat():
    # achat runs the same condense-retrieve-answer flow as chat,
    # awaiting the LLM calls (reuses chat_engine from the example above)
    response = await chat_engine.achat("Summarize the documents in one sentence.")
    print(response.response)

asyncio.run(async_chat())
output
<context-aware summary of the documents>
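For streaming, chat_engine.stream_chat(...) returns a response whose response_gen yields tokens as they arrive. Consuming such a generator looks like this (a stub token iterator stands in for a live engine):

```python
def print_stream(token_gen):
    # Print tokens as they arrive and return the assembled text
    pieces = []
    for token in token_gen:
        print(token, end="", flush=True)
        pieces.append(token)
    print()
    return "".join(pieces)

# Stub standing in for chat_engine.stream_chat(...).response_gen
demo_tokens = iter(["The ", "main ", "points ", "are ..."])
full_text = print_stream(demo_tokens)
```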

Troubleshooting

  • If you get API key missing errors, ensure OPENAI_API_KEY is set in your environment.
  • If responses are empty or irrelevant, check your document loading path and indexing.
  • For rate limits, slow your request rate, retrieve fewer chunks per turn (lower similarity_top_k), or upgrade your OpenAI plan.
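For transient rate-limit errors, a simple exponential-backoff wrapper around the chat call is often enough. A generic sketch (in practice you would catch OpenAI's RateLimitError rather than bare Exception):

```python
import time

def with_retries(fn, max_attempts=3, base_delay=1.0):
    # Call fn(), retrying with exponential backoff; re-raise after the last attempt
    for attempt in range(max_attempts):
        try:
            return fn()
        except Exception:
            if attempt == max_attempts - 1:
                raise
            time.sleep(base_delay * (2 ** attempt))

# Usage sketch: with_retries(lambda: chat_engine.chat("What changed in March?"))
```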

Key Takeaways

  • Use CondensePlusContextChatEngine to combine query condensation and context retrieval for better chat responses.
  • Initialize it via from_defaults with a retriever built from your document index to enable context-aware chat.
  • Customize LLM models and prompt parameters to optimize performance and cost.
  • Always set your OpenAI API key in environment variables to avoid authentication errors.
Verified 2026-04 · gpt-4o