Ordering and chaining postprocessors
Why this matters
Raw retrieval often returns noisy or redundant results. Postprocessors let you filter, rerank, deduplicate, and transform documents in a controlled pipeline, which can improve final LLM output quality without retraining any model.
Explanation
Postprocessors are transformations applied to retrieved documents after retrieval but before LLM ingestion. They filter, rank, deduplicate, or restructure results. Ordering matters: a deduplicator should run before a reranker (to avoid redundant scoring), and filters should run early (to reduce downstream cost).
Mechanically, postprocessors are chained via a list passed to the query engine (the node_postprocessors argument). Each processor receives the output of the previous one, transforms it, and passes it to the next, so the order of the list is the execution order. Built-in postprocessors include SimilarityPostprocessor (threshold filtering), LLMRerank (LLM-based ranking), and FixedRecencyPostprocessor. MetadataFilters, by contrast, is applied at retrieval time (it is passed to the retriever or vector store query) rather than in the postprocessor list; metadata-based filtering inside the chain needs a custom postprocessor.
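The chaining behavior can be sketched in plain Python, independent of llama-index. The `run_chain` helper and the two toy processors below are hypothetical names used only for illustration, not library APIs; the sketch assumes a node is just a dict with a `score`. The point is that each stage receives the previous stage's output, so list order is execution order.

```python
from typing import Any, Callable, Dict, List

Node = Dict[str, Any]  # toy stand-in for a retrieved node
Processor = Callable[[List[Node]], List[Node]]


def run_chain(nodes: List[Node], processors: List[Processor]) -> List[Node]:
    """Apply each processor in list order, feeding its output to the next."""
    for process in processors:
        nodes = process(nodes)
    return nodes


def drop_low_scores(nodes: List[Node]) -> List[Node]:
    """Keep nodes whose score is at least 0.5 (hypothetical threshold)."""
    return [n for n in nodes if n["score"] >= 0.5]


def keep_top_two(nodes: List[Node]) -> List[Node]:
    """Sort by score, highest first, and keep the first two."""
    return sorted(nodes, key=lambda n: n["score"], reverse=True)[:2]


retrieved = [
    {"text": "a", "score": 0.9},
    {"text": "b", "score": 0.4},
    {"text": "c", "score": 0.7},
    {"text": "d", "score": 0.6},
]

chained = run_chain(retrieved, [drop_low_scores, keep_top_two])
print([n["text"] for n in chained])
```

An empty processor list returns the input unchanged, which makes it easy to switch stages on and off while debugging a chain.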
Use ordered chaining when you need to combine multiple strategies: first filter by metadata, then deduplicate by content hash, then rerank by relevance, then limit to top-k. This is especially valuable in multi-source retrieval where document quality varies.
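The four-stage strategy above (filter, deduplicate, rerank, limit) might look like the following in plain Python. This is a minimal sketch under stated assumptions: nodes are dicts with hypothetical `text`, `source`, and `score` keys, "reranking" is just a sort on an existing score, and duplicates are detected by hashing whitespace-trimmed, lowercased text. None of these helpers are llama-index APIs.

```python
import hashlib
from typing import Any, Dict, List, Set

Node = Dict[str, Any]


def filter_by_source(nodes: List[Node], allowed: Set[str]) -> List[Node]:
    """Stage 1: drop nodes whose source is not in the allowed set."""
    return [n for n in nodes if n["source"] in allowed]


def dedup_by_content_hash(nodes: List[Node]) -> List[Node]:
    """Stage 2: keep the first node for each normalized-text hash."""
    seen: Set[str] = set()
    unique: List[Node] = []
    for n in nodes:
        normalized = n["text"].strip().lower()
        digest = hashlib.sha256(normalized.encode("utf-8")).hexdigest()
        if digest not in seen:
            seen.add(digest)
            unique.append(n)
    return unique


def rerank_by_score(nodes: List[Node]) -> List[Node]:
    """Stage 3: order by score, highest first (stand-in for a real reranker)."""
    return sorted(nodes, key=lambda n: n["score"], reverse=True)


def limit(nodes: List[Node], k: int) -> List[Node]:
    """Stage 4: keep the first k nodes."""
    return nodes[:k]


nodes = [
    {"text": "ML is a subset of AI.", "source": "wiki", "score": 0.80},
    {"text": "ml is a subset of ai. ", "source": "textbook", "score": 0.78},
    {"text": "Transformers changed NLP.", "source": "research", "score": 0.85},
    {"text": "Buy cheap GPUs now!", "source": "ads", "score": 0.90},
    {"text": "Neural networks mimic brains.", "source": "research", "score": 0.60},
]

stage1 = filter_by_source(nodes, {"wiki", "textbook", "research"})
stage2 = dedup_by_content_hash(stage1)
stage3 = rerank_by_score(stage2)
result = limit(stage3, 2)
print([n["text"] for n in result])
```

Running the filter first means the high-scoring but irrelevant "ads" entry never reaches the ranking stage, and deduplicating before ranking keeps the near-identical textbook copy from taking one of the two final slots.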
Analogy
Think of it like an assembly line for documents. The raw metal (retrieval results) enters first; station 1 (filter) removes defects; station 2 (dedup) melts down duplicates; station 3 (rerank) sorts by quality; station 4 (trim) keeps only the best pieces. Each station depends on the output of the prior one.
Code
import os
from typing import List, Optional

from llama_index.core import Settings, VectorStoreIndex
from llama_index.core.postprocessor import SimilarityPostprocessor
from llama_index.core.postprocessor.types import BaseNodePostprocessor
from llama_index.core.schema import Document, NodeWithScore, QueryBundle
from llama_index.llms.openai import OpenAI
from llama_index.postprocessor.cohere_rerank import CohereRerank

# Placeholder keys; in practice, set these in your shell or a secrets manager.
os.environ["OPENAI_API_KEY"] = "sk-your-key"
os.environ["COHERE_API_KEY"] = "your-cohere-key"
Settings.llm = OpenAI(model="gpt-4-turbo")

documents = [
    Document(text="Machine learning is a subset of AI.", metadata={"source": "wiki", "year": 2023}),
    Document(text="Machine learning uses statistical techniques.", metadata={"source": "textbook", "year": 2022}),
    Document(text="Neural networks mimic brain structures.", metadata={"source": "research", "year": 2024}),
    Document(text="Deep learning is a subset of machine learning.", metadata={"source": "wiki", "year": 2023}),
    Document(text="Transformers revolutionized NLP tasks.", metadata={"source": "research", "year": 2024}),
]
index = VectorStoreIndex.from_documents(documents)


# MetadataFilters is a retrieval-time filter (llama_index.core.vector_stores),
# not a node postprocessor, so the year filter is written here as a small
# custom postprocessor to keep it inside the chain.
class MinYearPostprocessor(BaseNodePostprocessor):
    min_year: int = 2023

    def _postprocess_nodes(
        self, nodes: List[NodeWithScore], query_bundle: Optional[QueryBundle] = None
    ) -> List[NodeWithScore]:
        # Keep nodes whose "year" metadata is at least min_year.
        return [n for n in nodes if n.node.metadata.get("year", 0) >= self.min_year]


similarity_filter = SimilarityPostprocessor(similarity_cutoff=0.5)
year_filter = MinYearPostprocessor(min_year=2023)
rerank = CohereRerank(top_n=3, model="rerank-english-v2.0")

# List order is execution order: cheap filters first, reranker last.
postprocessors = [similarity_filter, year_filter, rerank]
query_engine = index.as_query_engine(
    similarity_top_k=10,
    node_postprocessors=postprocessors,
)
response = query_engine.query("What is machine learning?")
print(f"Response: {response}")
print(f"\nNumber of postprocessors chained: {len(postprocessors)}")
for i, node in enumerate(response.source_nodes, 1):
    print(f"  {i}. {node.text[:60]}... (score: {node.score:.3f})")

Output (illustrative; exact wording, ordering, and scores depend on the models and library versions used)
Response: Machine learning is a subset of artificial intelligence that uses statistical techniques and algorithms to enable systems to learn from data without explicit programming. It underlies many modern AI applications and forms the foundation for more advanced techniques like deep learning.

Number of postprocessors chained: 3
  1. Deep learning is a subset of machine learning.... (score: 0.856)
  2. Machine learning is a subset of AI.... (score: 0.798)
What just happened?
The code created a vector index from 5 documents, then built a query engine with a chain of 3 postprocessors: (1) SimilarityPostprocessor dropped nodes scoring below the 0.5 similarity cutoff, (2) the year filter removed documents dated before 2023, and (3) CohereRerank reordered the survivors by relevance and kept at most the top 3. The query returned the LLM's synthesis plus the final reranked source nodes. Because similarity_top_k=10 exceeds the 5 documents in the index, retrieval hands all of them to the chain.
Common gotcha
The most common mistake is applying postprocessors in the wrong order. For example, if you rerank *before* filtering, you waste compute scoring documents you will later discard. The correct order is usually: (1) metadata/content filters (eliminate early), (2) deduplicators (reduce noise), (3) rerankers (fine-grained scoring on a smaller set), (4) limiters (top-k trim). Also, when combining a similarity cutoff with a reranker, remember that rerankers typically overwrite each node's score with their own relevance score. A cutoff placed before the reranker thresholds on embedding similarity; one placed after it thresholds on rerank scores, which are on a different scale. Use *both* only if you understand that interaction.
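The cost of a wrong order can be made concrete with a toy experiment. The sketch below is plain Python, not llama-index code: `CountingReranker` is a hypothetical stand-in that simply counts how many nodes it is asked to score, and the 50 nodes and their `year` values are made up for illustration.

```python
from typing import Any, Dict, List

Node = Dict[str, Any]


class CountingReranker:
    """Toy reranker that records how many nodes it had to score."""

    def __init__(self, top_n: int) -> None:
        self.top_n = top_n
        self.scored = 0

    def __call__(self, nodes: List[Node]) -> List[Node]:
        self.scored += len(nodes)
        ranked = sorted(nodes, key=lambda n: n["score"], reverse=True)
        return ranked[: self.top_n]


def year_filter(nodes: List[Node]) -> List[Node]:
    """Keep nodes dated 2023 or later."""
    return [n for n in nodes if n["year"] >= 2023]


# 50 toy nodes; years cycle through 2020..2024, so 20 of them pass the filter.
nodes = [{"id": i, "year": 2020 + i % 5, "score": i / 50} for i in range(50)]

# Filter first, then rerank: the reranker sees only the survivors.
filter_first = CountingReranker(top_n=3)
good = filter_first(year_filter(nodes))

# Rerank first, then filter: the reranker sees everything.
rerank_first = CountingReranker(top_n=3)
bad = year_filter(rerank_first(nodes))

print("filter -> rerank:", filter_first.scored, "scored,", len(good), "returned")
print("rerank -> filter:", rerank_first.scored, "scored,", len(bad), "returned")
```

In this toy setup the rerank-first order scores more than twice as many nodes and also returns fewer than `top_n` results, because one of its top three is removed by the filter afterwards.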
Error recovery
ImportError: cannot import name 'CohereRerank'
ValueError: node_postprocessors must be a list
KeyError on metadata filter
CohereRerank requires cohere_api_key
Experienced dev note
In production, measure the latency cost of each postprocessor in your chain. Reranking is expensive (an additional model or API call per query, with cost that grows with the number of candidate documents); filtering is cheap. If your retrieval is already tight (similarity_top_k ≤ 5), consider skipping reranking entirely, since the marginal gain often does not justify the latency. Also, postprocessors are applied *per query*, not at indexing time, so their cost scales with query volume; order the chain so that expensive operations run late, on as few documents as possible.
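One lightweight way to take that measurement is to wrap each stage with a timer. The sketch below is plain Python under stated assumptions: `timed_chain`, `cheap_filter`, and `slow_rerank` are hypothetical helpers, and the reranker's cost is simulated with a sleep of roughly 1 ms per candidate rather than a real API call.

```python
import time
from typing import Any, Callable, Dict, List, Tuple

Node = Dict[str, Any]
Stage = Callable[[List[Node]], List[Node]]


def timed_chain(
    nodes: List[Node], stages: List[Tuple[str, Stage]]
) -> Tuple[List[Node], Dict[str, float]]:
    """Run stages in order and record wall-clock seconds spent in each."""
    timings: Dict[str, float] = {}
    for name, stage in stages:
        start = time.perf_counter()
        nodes = stage(nodes)
        timings[name] = time.perf_counter() - start
    return nodes, timings


def cheap_filter(nodes: List[Node]) -> List[Node]:
    """Drop nodes scoring below 0.5."""
    return [n for n in nodes if n["score"] >= 0.5]


def slow_rerank(nodes: List[Node]) -> List[Node]:
    """Simulate a reranker whose latency grows with the candidate count."""
    time.sleep(0.001 * len(nodes))
    return sorted(nodes, key=lambda n: n["score"], reverse=True)[:3]


nodes = [{"id": i, "score": i / 20} for i in range(20)]
final, timings = timed_chain(
    nodes, [("filter", cheap_filter), ("rerank", slow_rerank)]
)
for name, seconds in timings.items():
    print(f"{name}: {seconds * 1000:.2f} ms")
```

Per-stage timings like these make it easier to decide whether a reranker earns its place in the chain for a given workload.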
Check your understanding
You have 50 retrieved documents. Your chain is: [CohereRerank(top_n=3), MetadataFilter, SimilarityPostprocessor]. Why is this order problematic, and what would you reorder?
Answer hint
A correct answer recognizes that CohereRerank(top_n=3) runs *first*: the expensive rerank operation is applied to all 50 documents, and only then do the cheap filters run, on just 3 results. That wastes compute on documents the filters would have discarded anyway, and it can leave fewer than 3 results if any of the top 3 fail a filter. You should move the metadata and similarity filters *before* the rerank to reduce the input set first.