How-to · Intermediate · 3 min read

How to use response synthesizer in LlamaIndex

Quick answer
Use LlamaIndex's get_response_synthesizer() to combine multiple retrieved document chunks into a single coherent answer. Build a synthesizer with your LLM and a response mode (e.g. "compact" or "refine"), then call response_synthesizer.synthesize(query, nodes=nodes) on the nodes returned by a retriever.

Prerequisites

  • Python 3.8+
  • OpenAI API key
  • pip install llama-index openai

Setup

Install llama-index and set your OpenAI API key as an environment variable.

  • Run pip install llama-index openai
  • Set your API key: export OPENAI_API_KEY='your_api_key' (Linux/macOS) or setx OPENAI_API_KEY "your_api_key" (Windows)
bash
pip install llama-index openai
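
Before running the examples, it can help to confirm the key is actually visible to Python. A minimal check (the helper name check_api_key is just for illustration, not part of any library):

```python
import os

def check_api_key(env=os.environ):
    """Return True when OPENAI_API_KEY is present and non-empty."""
    return bool(env.get("OPENAI_API_KEY"))

if __name__ == "__main__":
    print("Key found" if check_api_key() else "Set OPENAI_API_KEY first")
```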

Step by step

This example loads documents, retrieves the most relevant chunks for a query, and uses a response synthesizer to combine them into one answer with OpenAI's gpt-4o model.

python
import os
from llama_index.core import (
    SimpleDirectoryReader,
    VectorStoreIndex,
    get_response_synthesizer,
)
from llama_index.llms.openai import OpenAI

# Configure the LLM (OPENAI_API_KEY is read from the environment)
llm = OpenAI(model="gpt-4o")

# Load documents from a directory
documents = SimpleDirectoryReader("./data").load_data()

# Build a vector index over the documents
index = VectorStoreIndex.from_documents(documents)

# Retrieve the most relevant chunks for the query
query = "Explain the benefits of renewable energy."
retriever = index.as_retriever(similarity_top_k=4)
nodes = retriever.retrieve(query)

# Combine the retrieved chunks into one synthesized answer
response_synthesizer = get_response_synthesizer(llm=llm, response_mode="compact")
final_response = response_synthesizer.synthesize(query, nodes=nodes)

print("Synthesized response:\n", final_response)
output
Synthesized response:
 Renewable energy offers numerous benefits including reducing greenhouse gas emissions, decreasing dependence on fossil fuels, and promoting sustainable development.
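
Under the hood, refine-style synthesis is essentially a fold over the retrieved chunks: the first chunk produces a draft answer, and each later chunk is used to revise it. A minimal conceptual sketch, where the llm argument stands in for any completion function (this is not the LlamaIndex implementation, just the idea behind it):

```python
def refine_synthesize(llm, query, chunks):
    """Fold chunks into one answer: each step refines the previous draft."""
    if not chunks:
        return ""
    draft = llm(f"Answer the question '{query}' using only: {chunks[0]}")
    for chunk in chunks[1:]:
        draft = llm(
            f"Given the draft answer '{draft}' to '{query}', "
            f"refine it using this additional context: {chunk}"
        )
    return draft

# A deterministic stand-in LLM shows the control flow: one call per chunk
calls = []
fake_llm = lambda prompt: (calls.append(prompt) or f"draft-{len(calls)}")
answer = refine_synthesize(fake_llm, "Why renewables?", ["solar", "wind", "hydro"])
print(answer, len(calls))  # → draft-3 3
```

The "compact" mode used above is a cheaper variant of the same idea: it packs as many chunks as fit into each prompt, so fewer LLM calls are made.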

Common variations

You can customize synthesis by passing a custom prompt template, or by switching to a cheaper model such as gpt-4o-mini for faster, lower-cost synthesis. Response modes ("compact", "refine", "tree_summarize") trade off cost against quality, and async usage is supported via the synthesizer's asynthesize method.

python
from llama_index.core import PromptTemplate, get_response_synthesizer
from llama_index.llms.openai import OpenAI

llm = OpenAI(model="gpt-4o-mini")

# Custom QA prompt; LlamaIndex fills in {context_str} and {query_str}
custom_prompt = PromptTemplate(
    "You are a helpful assistant that synthesizes the context below "
    "into one concise answer.\n"
    "Context:\n{context_str}\n"
    "Question: {query_str}\n"
    "Answer:"
)

response_synthesizer = get_response_synthesizer(
    llm=llm,
    response_mode="compact",
    text_qa_template=custom_prompt,
)

# Use as before with response_synthesizer.synthesize(query, nodes=nodes)
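
In LlamaIndex's standard QA templates, the {context_str} and {query_str} placeholders behave like ordinary Python format fields. A plain-Python sketch of how a template gets filled in before the LLM call (render is a hypothetical helper, not LlamaIndex API):

```python
QA_TEMPLATE = (
    "Context:\n{context_str}\n"
    "Question: {query_str}\n"
    "Answer:"
)

def render(template, context_str, query_str):
    """Substitute the two standard QA fields into a prompt template."""
    return template.format(context_str=context_str, query_str=query_str)

prompt = render(QA_TEMPLATE, "Wind power is cheap and clean.", "Why use wind power?")
print(prompt)
```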

Troubleshooting

  • If you get authentication errors, verify that the OPENAI_API_KEY environment variable is set in the shell that runs your script.
  • If synthesis results are poor, try a different response mode (e.g. "refine" or "tree_summarize") or tighten your prompt template.
  • For slow or costly runs, use a smaller model such as gpt-4o-mini, or lower similarity_top_k so fewer chunks are synthesized.

Key takeaways

  • Use get_response_synthesizer() to combine retrieved document chunks into a single coherent answer.
  • Customize synthesis with prompts and different LLM models for cost and speed trade-offs.
  • Always set your API key in environment variables to avoid authentication issues.
Verified 2026-04 · gpt-4o, gpt-4o-mini