How-to · Intermediate · 3 min read

How to use response synthesizer in LlamaIndex

Quick answer
Use LlamaIndex's get_response_synthesizer() to combine multiple retrieved document chunks into a single coherent answer. Build a synthesizer with your LLM and a response mode (e.g. "compact" or "refine"), then call response_synthesizer.synthesize(query, nodes=nodes) on the nodes returned by a retriever.

Prerequisites

  • Python 3.8+
  • OpenAI API key
  • pip install llama-index openai

Setup

Install llama-index and set your OpenAI API key as an environment variable.

  • Run pip install llama-index openai
  • Set your API key: export OPENAI_API_KEY='your_api_key' (Linux/macOS) or setx OPENAI_API_KEY "your_api_key" (Windows)
bash
pip install llama-index openai
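
Before running the examples, it can help to confirm the key is actually visible to Python. A minimal check (the helper name check_api_key is just for illustration, not part of any library):

```python
import os

def check_api_key(env=os.environ):
    """Return True when OPENAI_API_KEY is present and non-empty."""
    return bool(env.get("OPENAI_API_KEY"))

if __name__ == "__main__":
    print("Key found" if check_api_key() else "Set OPENAI_API_KEY first")
```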

Step by step

This example loads documents, retrieves the most relevant chunks for a query, and uses a response synthesizer to combine them into one answer with OpenAI's gpt-4o model.

python
import os
from llama_index.core import (
    SimpleDirectoryReader,
    VectorStoreIndex,
    get_response_synthesizer,
)
from llama_index.llms.openai import OpenAI

# Configure the LLM (OPENAI_API_KEY is read from the environment)
llm = OpenAI(model="gpt-4o")

# Load documents from a directory
documents = SimpleDirectoryReader("./data").load_data()

# Build a vector index over the documents
index = VectorStoreIndex.from_documents(documents)

# Retrieve the most relevant chunks for the query
query = "Explain the benefits of renewable energy."
retriever = index.as_retriever(similarity_top_k=4)
nodes = retriever.retrieve(query)

# Combine the retrieved chunks into one synthesized answer
response_synthesizer = get_response_synthesizer(llm=llm, response_mode="compact")
final_response = response_synthesizer.synthesize(query, nodes=nodes)

print("Synthesized response:\n", final_response)
output
Synthesized response:
 Renewable energy offers numerous benefits including reducing greenhouse gas emissions, decreasing dependence on fossil fuels, and promoting sustainable development.
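
Under the hood, refine-style synthesis is essentially a fold over the retrieved chunks: the first chunk produces a draft answer, and each later chunk is used to revise it. A minimal conceptual sketch, where the llm argument stands in for any completion function (this is not the LlamaIndex implementation, just the idea behind it):

```python
def refine_synthesize(llm, query, chunks):
    """Fold chunks into one answer: each step refines the previous draft."""
    if not chunks:
        return ""
    draft = llm(f"Answer the question '{query}' using only: {chunks[0]}")
    for chunk in chunks[1:]:
        draft = llm(
            f"Given the draft answer '{draft}' to '{query}', "
            f"refine it using this additional context: {chunk}"
        )
    return draft

# A deterministic stand-in LLM shows the control flow: one call per chunk
calls = []
fake_llm = lambda prompt: (calls.append(prompt) or f"draft-{len(calls)}")
answer = refine_synthesize(fake_llm, "Why renewables?", ["solar", "wind", "hydro"])
print(answer, len(calls))  # → draft-3 3
```

The "compact" mode used above is a cheaper variant of the same idea: it packs as many chunks as fit into each prompt, so fewer LLM calls are made.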

Common variations

You can customize synthesis by passing a custom prompt template, or by switching to a cheaper model such as gpt-4o-mini for faster, lower-cost synthesis. Response modes ("compact", "refine", "tree_summarize") trade off cost against quality, and async usage is supported via the synthesizer's asynthesize method.

python
from llama_index.core import PromptTemplate, get_response_synthesizer
from llama_index.llms.openai import OpenAI

llm = OpenAI(model="gpt-4o-mini")

# Custom QA prompt; LlamaIndex fills in {context_str} and {query_str}
custom_prompt = PromptTemplate(
    "You are a helpful assistant that synthesizes the context below "
    "into one concise answer.\n"
    "Context:\n{context_str}\n"
    "Question: {query_str}\n"
    "Answer:"
)

response_synthesizer = get_response_synthesizer(
    llm=llm,
    response_mode="compact",
    text_qa_template=custom_prompt,
)

# Use as before with response_synthesizer.synthesize(query, nodes=nodes)
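
In LlamaIndex's standard QA templates, the {context_str} and {query_str} placeholders behave like ordinary Python format fields. A plain-Python sketch of how a template gets filled in before the LLM call (render is a hypothetical helper, not LlamaIndex API):

```python
QA_TEMPLATE = (
    "Context:\n{context_str}\n"
    "Question: {query_str}\n"
    "Answer:"
)

def render(template, context_str, query_str):
    """Substitute the two standard QA fields into a prompt template."""
    return template.format(context_str=context_str, query_str=query_str)

prompt = render(QA_TEMPLATE, "Wind power is cheap and clean.", "Why use wind power?")
print(prompt)
```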

Troubleshooting

  • If you get authentication errors, verify that the OPENAI_API_KEY environment variable is set in the shell that runs your script.
  • If synthesis results are poor, try a different response mode (e.g. "refine" or "tree_summarize") or tighten your prompt template.
  • For slow or costly runs, use a smaller model such as gpt-4o-mini, or lower similarity_top_k so fewer chunks are synthesized.

Key takeaways

  • Use get_response_synthesizer() to combine retrieved document chunks into a single coherent answer.
  • Customize synthesis with prompts and different LLM models for cost and speed trade-offs.
  • Always set your API key in environment variables to avoid authentication issues.
Verified 2026-04 · gpt-4o, gpt-4o-mini