How to use response synthesizer in LlamaIndex
Quick answer
Use a response synthesizer in LlamaIndex to combine retrieved text chunks into a single coherent answer. Create one with get_response_synthesizer(), optionally choosing a response_mode such as "compact" or "tree_summarize", then call response_synthesizer.synthesize(query, nodes=nodes) with the retrieved nodes.
Prerequisites
- Python 3.8+
- OpenAI API key (free tier works)
- pip install llama-index openai
Setup
Install llama-index and set your OpenAI API key as an environment variable.
- Run pip install llama-index openai
- Set your API key: export OPENAI_API_KEY='your_api_key' (Linux/macOS) or setx OPENAI_API_KEY "your_api_key" (Windows)
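Before running the examples, you can quickly confirm the key is visible to Python (an optional sanity check, not required by LlamaIndex):

```python
import os

# Prints True if OPENAI_API_KEY is set in this shell's environment
print("OPENAI_API_KEY" in os.environ)
```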
Step by step
This example loads documents, retrieves the most relevant chunks for a query, and uses a response synthesizer to combine them into one answer with OpenAI's GPT-4o model (llama-index v0.10+ API).
import os
from llama_index.core import (
    SimpleDirectoryReader,
    VectorStoreIndex,
    Settings,
    get_response_synthesizer,
)
from llama_index.llms.openai import OpenAI
# Use GPT-4o for indexing queries and synthesis
Settings.llm = OpenAI(model="gpt-4o", api_key=os.environ["OPENAI_API_KEY"])
# Load documents from a directory
documents = SimpleDirectoryReader("./data").load_data()
# Build a vector index over the documents
index = VectorStoreIndex.from_documents(documents)
# Retrieve the most relevant chunks for the query
query = "Explain the benefits of renewable energy."
retriever = index.as_retriever(similarity_top_k=4)
nodes = retriever.retrieve(query)
# Combine the retrieved chunks into a single synthesized answer
response_synthesizer = get_response_synthesizer(response_mode="compact")
final_response = response_synthesizer.synthesize(query, nodes=nodes)
print("Synthesized response:\n", final_response)
Output
Synthesized response: Renewable energy offers numerous benefits including reducing greenhouse gas emissions, decreasing dependence on fossil fuels, and promoting sustainable development.
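A response synthesizer's job is to turn many retrieved chunks into one answer; in "refine" mode, for example, it drafts an answer from the first chunk and revises it once per remaining chunk. The sketch below is a simplified, library-free illustration of that idea, not LlamaIndex's actual implementation; call_llm and fake_llm are hypothetical stand-ins for a real model call.

```python
def refine_synthesize(query, chunks, call_llm):
    """Refine-style synthesis sketch: draft an answer from the first
    chunk, then revise it once per remaining chunk."""
    answer = call_llm(f"Answer '{query}' using:\n{chunks[0]}")
    for chunk in chunks[1:]:
        answer = call_llm(
            f"Improve the answer '{answer}' to '{query}' using:\n{chunk}"
        )
    return answer

# Hypothetical stand-in for a real model call
def fake_llm(prompt):
    return f"answer based on {len(prompt)} prompt chars"

print(refine_synthesize("Why renewables?", ["chunk one", "chunk two"], fake_llm))
```

Each chunk triggers one LLM call here, which is why modes like "compact" (which batch chunks together) are cheaper when many chunks are retrieved.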
Common variations
You can customize synthesis by choosing a different response_mode (for example "refine", "compact", or "tree_summarize"), supplying a custom prompt template, or swapping in a cheaper model like gpt-4o-mini for faster, lower-cost synthesis. Async usage is also supported via the synthesizer's asynthesize() method.
import os
from llama_index.core import PromptTemplate, get_response_synthesizer
from llama_index.llms.openai import OpenAI
# Cheaper, faster model for synthesis
llm = OpenAI(model="gpt-4o-mini", api_key=os.environ["OPENAI_API_KEY"])
# Custom QA prompt; the synthesizer fills in {context_str} and {query_str}
custom_prompt = PromptTemplate(
    "You are a helpful assistant that synthesizes the context below "
    "into one concise answer.\n"
    "Context:\n{context_str}\n"
    "Question: {query_str}\n"
    "Answer: "
)
response_synthesizer = get_response_synthesizer(
    llm=llm,
    response_mode="compact",
    text_qa_template=custom_prompt,
)
# Use as before with response_synthesizer.synthesize(query, nodes=nodes)
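LlamaIndex's response modes differ mainly in how chunks reach the LLM: "compact" packs as many chunks as fit into each call rather than calling once per chunk. Below is a simplified, library-free sketch of that packing step, where max_chars stands in for the model's context window (the real library measures tokens, not characters):

```python
def pack_chunks(chunks, max_chars):
    """Greedily pack chunks into batches no longer than max_chars,
    so the LLM is called once per batch instead of once per chunk."""
    batches, current = [], ""
    for chunk in chunks:
        candidate = (current + "\n" + chunk).strip()
        if current and len(candidate) > max_chars:
            batches.append(current)
            current = chunk
        else:
            current = candidate
    if current:
        batches.append(current)
    return batches

print(pack_chunks(["aaaa", "bbbb", "cc"], max_chars=9))
# → ['aaaa\nbbbb', 'cc']  (two LLM calls instead of three)
```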
Troubleshooting
- If you get authentication errors, verify that your OPENAI_API_KEY environment variable is set correctly.
- If synthesis results are poor, try a different response mode (for example "tree_summarize" when many chunks are retrieved) or refine your prompt template.
- For slow responses, consider using a smaller model such as gpt-4o-mini.
Key Takeaways
- Use get_response_synthesizer() to combine retrieved document chunks into a single coherent answer.
- Customize synthesis with response modes, prompt templates, and different LLM models to trade off quality, cost, and speed.
- Set your API key in an environment variable to avoid authentication issues.