Code Intermediate medium · 6 min

Ordering and chaining postprocessors

What you will learn

Apply multiple document transformations in sequence to refine retrieval results before feeding them to your LLM.

Why this matters

Raw retrieval often returns noisy or redundant results. Postprocessors let you filter, rerank, deduplicate, and transform documents in a controlled pipeline, which can improve final LLM output quality without retraining any model.

Skip if: Don't chain postprocessors if you retrieve fewer than roughly 5-10 documents, or if a single postprocessor already solves your problem. Over-chaining adds latency and can remove relevant context. Also skip this if your retrieval is already highly precise (e.g., an exact keyword match against a database).

Explanation

Postprocessors are transformations applied to retrieved documents after retrieval but before LLM ingestion. They filter, rank, deduplicate, or restructure results. Ordering matters: a deduplicator should run before a reranker (to avoid redundant scoring), and filters should run early (to reduce downstream cost).

Mechanically, postprocessors are chained via a list passed to the query engine as node_postprocessors. Each processor receives the output of the previous one, transforms it, and passes it to the next. In current llama-index the list itself is the chain: processors run in list order, which makes execution order deterministic. Built-in postprocessors include SimilarityPostprocessor (threshold filtering), KeywordNodePostprocessor (keyword include/exclude), LLMRerank (LLM-based ranking), and FixedRecencyPostprocessor. MetadataFilters is not a postprocessor: it is a vector-store filter applied at retrieval time through the filters argument.
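The hand-off between processors can be sketched in plain Python. This is a minimal illustration of list-ordered chaining, not llama-index code: ScoredDoc, run_chain, drop_low_scores, and keep_top_two are hypothetical names invented for the example.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class ScoredDoc:
    """Hypothetical stand-in for a retrieved node with a relevance score."""
    text: str
    score: float


# In this sketch a postprocessor is just a function: docs in, docs out.
Postprocessor = Callable[[List[ScoredDoc]], List[ScoredDoc]]


def run_chain(docs: List[ScoredDoc], chain: List[Postprocessor]) -> List[ScoredDoc]:
    """Apply each postprocessor in list order, feeding it the previous output."""
    for step in chain:
        docs = step(docs)
    return docs


def drop_low_scores(docs: List[ScoredDoc]) -> List[ScoredDoc]:
    """Threshold filter: keep documents scoring at least 0.5."""
    return [d for d in docs if d.score >= 0.5]


def keep_top_two(docs: List[ScoredDoc]) -> List[ScoredDoc]:
    """Limiter: keep the two highest-scoring documents."""
    return sorted(docs, key=lambda d: d.score, reverse=True)[:2]


docs = [ScoredDoc("a", 0.9), ScoredDoc("b", 0.4), ScoredDoc("c", 0.7), ScoredDoc("d", 0.6)]
result = run_chain(docs, [drop_low_scores, keep_top_two])
print([d.text for d in result])  # ['a', 'c']
```

Because the chain is only a list, reordering or removing a stage is a one-line change, which is the same property the node_postprocessors list gives you.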

Use ordered chaining when you need to combine multiple strategies: first filter by metadata, then deduplicate by content hash, then rerank by relevance, then limit to top-k. This is especially valuable in multi-source retrieval where document quality varies.
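The four-step order named above (filter by metadata, deduplicate by content hash, rerank, limit to top-k) might look like this as a toy pipeline. Everything here is an assumption made for illustration: the documents are plain dicts rather than llama-index nodes, and the word-overlap reranker is a stand-in for a real reranking model.

```python
import hashlib
import re
from typing import Dict, List

# Hypothetical retrieved results (plain dicts, not llama-index nodes).
retrieved: List[Dict] = [
    {"text": "Machine learning is a subset of AI.", "year": 2023},
    {"text": "Machine learning is a subset of AI.", "year": 2023},  # exact duplicate
    {"text": "Machine learning uses statistical techniques.", "year": 2022},
    {"text": "Deep learning is a subset of machine learning.", "year": 2023},
    {"text": "Transformers revolutionized NLP tasks.", "year": 2024},
]


def filter_by_year(docs: List[Dict], min_year: int) -> List[Dict]:
    """Step 1: cheap metadata filter."""
    return [d for d in docs if d["year"] >= min_year]


def dedupe_by_content_hash(docs: List[Dict]) -> List[Dict]:
    """Step 2: keep the first document for each normalized-text hash."""
    seen, unique = set(), []
    for d in docs:
        digest = hashlib.sha256(d["text"].strip().lower().encode("utf-8")).hexdigest()
        if digest not in seen:
            seen.add(digest)
            unique.append(d)
    return unique


def tokens(text: str) -> set:
    return set(re.findall(r"[a-z]+", text.lower()))


def rerank_by_overlap(docs: List[Dict], query: str) -> List[Dict]:
    """Step 3: toy reranker, ordering by number of words shared with the query."""
    q = tokens(query)
    return sorted(docs, key=lambda d: len(q & tokens(d["text"])), reverse=True)


def top_k(docs: List[Dict], k: int) -> List[Dict]:
    """Step 4: trim to the k best."""
    return docs[:k]


stage1 = filter_by_year(retrieved, 2023)
stage2 = dedupe_by_content_hash(stage1)
stage3 = rerank_by_overlap(stage2, "What is deep learning?")
final = top_k(stage3, 2)
print([d["text"] for d in final])
```

Each stage hands a smaller or better-ordered list to the next, so the reranker, normally the costliest step, only sees documents that passed the filter and the deduplicator.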

Analogy

Think of it like an assembly line for documents. The raw metal (retrieval results) enters first; station 1 (filter) removes defects; station 2 (dedup) melts down duplicates; station 3 (rerank) sorts by quality; station 4 (trim) keeps only the best pieces. Each station depends on the output of the prior one.

Code

Illustrative only - not runnable without a valid API key

python

import os

from llama_index.core import Document, Settings, VectorStoreIndex
from llama_index.core.postprocessor import SimilarityPostprocessor
from llama_index.core.vector_stores import FilterOperator, MetadataFilter, MetadataFilters
from llama_index.llms.openai import OpenAI
from llama_index.postprocessor.cohere_rerank import CohereRerank

# Placeholder keys: replace them, or export the variables in your shell instead.
os.environ["OPENAI_API_KEY"] = "sk-your-key"
os.environ["COHERE_API_KEY"] = "your-cohere-key"

Settings.llm = OpenAI(model="gpt-4-turbo")

documents = [
    Document(text="Machine learning is a subset of AI.", metadata={"source": "wiki", "year": 2023}),
    Document(text="Machine learning uses statistical techniques.", metadata={"source": "textbook", "year": 2022}),
    Document(text="Neural networks mimic brain structures.", metadata={"source": "research", "year": 2024}),
    Document(text="Deep learning is a subset of machine learning.", metadata={"source": "wiki", "year": 2023}),
    Document(text="Transformers revolutionized NLP tasks.", metadata={"source": "research", "year": 2024}),
]

index = VectorStoreIndex.from_documents(documents)

# Retrieval-time filter: MetadataFilters is a vector-store filter, not a node
# postprocessor, so it goes in `filters=` rather than in `node_postprocessors`.
# Comparison operators need a llama-index-core version whose vector store supports them.
metadata_filter = MetadataFilters(
    filters=[
        MetadataFilter(key="year", value=2023, operator=FilterOperator.GTE),
    ]
)

# Postprocessors run in list order: cheap threshold filter first, reranker last.
similarity_filter = SimilarityPostprocessor(similarity_cutoff=0.5)
rerank = CohereRerank(top_n=3, model="rerank-english-v2.0")
postprocessors = [similarity_filter, rerank]

query_engine = index.as_query_engine(
    similarity_top_k=10,
    filters=metadata_filter,
    node_postprocessors=postprocessors,
)

response = query_engine.query("What is machine learning?")
print(f"Response: {response}")
print(f"\nNumber of postprocessors chained: {len(postprocessors)}")
for i, node in enumerate(response.source_nodes, 1):
    print(f"  {i}. {node.text[:60]} (score: {node.score:.3f})")

Output

Response: Machine learning is a subset of artificial intelligence that uses statistical techniques and algorithms to enable systems to learn from data without explicit programming. It underlies many modern AI applications and forms the foundation for more advanced techniques like deep learning.

Number of postprocessors chained: 2
  1. Deep learning is a subset of machine learning. (score: 0.856)
  2. Machine learning is a subset of AI. (score: 0.798)

What just happened?

The code created a vector index from 5 documents, then built a query engine with three refinement stages. (1) At retrieval time, the MetadataFilters passed through filters= excluded documents from before 2023, so the 2022 textbook entry never reached the postprocessors. (2) SimilarityPostprocessor dropped any node whose embedding similarity fell below 0.5. (3) CohereRerank rescored the survivors and kept at most the top 3; top_n is an upper bound, so fewer nodes come back when the earlier stages leave fewer candidates. Stages 2 and 3 are the postprocessor chain proper and run in list order. The query returned the LLM's synthesis plus the final reranked source nodes, whose scores are the reranker's relevance scores rather than the original embedding similarities.

Common gotcha

The most common mistake is applying postprocessors in the wrong order. For example, if you rerank *before* filtering, you waste compute scoring documents you will later discard. The usual order is: (1) metadata/content filters (eliminate early), (2) deduplicators (reduce noise), (3) rerankers (fine-grained scoring on a smaller set), (4) limiters (top-k trim). Also, a reranker replaces each node's score with its own relevance score, so a similarity cutoff placed *before* the reranker filters on embedding similarity, while one placed *after* it filters on reranker scores. Combine the two only if you know which score each stage sees.
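To make the cost of a wrong order concrete, here is a small sketch that counts how many documents a reranker has to score under each ordering. CountingReranker and similarity_cutoff are hypothetical helpers, and the toy reranker reuses the retrieval score instead of calling a model, so only the counts are meaningful.

```python
from typing import List, Tuple

# Hypothetical (doc_id, score) pairs; scores are integer percentages 0..98.
candidates: List[Tuple[str, int]] = [(f"doc-{i}", i * 2) for i in range(50)]


class CountingReranker:
    """Toy reranker that records how many documents it was asked to score."""

    def __init__(self, top_n: int) -> None:
        self.top_n = top_n
        self.scored = 0

    def __call__(self, docs: List[Tuple[str, int]]) -> List[Tuple[str, int]]:
        self.scored += len(docs)  # each document would cost reranker work
        return sorted(docs, key=lambda d: d[1], reverse=True)[: self.top_n]


def similarity_cutoff(docs: List[Tuple[str, int]], cutoff: int) -> List[Tuple[str, int]]:
    """Cheap threshold filter."""
    return [d for d in docs if d[1] >= cutoff]


# Order A: cheap filter first, expensive rerank second.
rerank_a = CountingReranker(top_n=3)
result_a = rerank_a(similarity_cutoff(candidates, 80))

# Order B: rerank first, filter second.
rerank_b = CountingReranker(top_n=3)
result_b = similarity_cutoff(rerank_b(candidates), 80)

print(rerank_a.scored, rerank_b.scored)  # 10 50
print(result_a == result_b)              # True
```

Both orders return the same three documents here, but order B makes the reranker score five times as many candidates. With a real reranker the results can also differ, because the later filter would see reranker scores.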

Error recovery

ImportError: cannot import name 'CohereRerank'

Install the Cohere rerank integration: pip install llama-index-postprocessor-cohere-rerank. Or remove CohereRerank and use built-in postprocessors only.

ValueError: node_postprocessors must be a list

Pass postprocessors as a list even if you have only one: node_postprocessors=[similarity_filter]. A single bare postprocessor object is not a valid value.

KeyError on metadata filter

Ensure the metadata key exists in your documents. If some documents lack the key, give them a default value in the Document metadata dict at ingestion time so that every node can be compared against the filter.
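One way to supply defaults is to normalize the metadata before building Document objects. The sketch below works on plain dicts; with_default_metadata and the sentinel year 0 are assumptions made for illustration, not part of llama-index.

```python
from typing import Any, Dict, List


def with_default_metadata(
    records: List[Dict[str, Any]], defaults: Dict[str, Any]
) -> List[Dict[str, Any]]:
    """Return copies of the records whose metadata contains every default key."""
    return [
        {**record, "metadata": {**defaults, **record.get("metadata", {})}}
        for record in records
    ]


raw_records = [
    {"text": "Machine learning is a subset of AI.", "metadata": {"source": "wiki", "year": 2023}},
    {"text": "Undated note about neural networks.", "metadata": {"source": "notes"}},
    {"text": "Record with no metadata at all."},
]

# Sentinel year 0 means "unknown"; a filter such as year >= 2023 will exclude it.
normalized = with_default_metadata(raw_records, {"source": "unknown", "year": 0})
for record in normalized:
    print(record["metadata"])
```

Existing values win over defaults, so only missing keys are filled in. Pick a sentinel that behaves the way you want under your filter operator.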

CohereRerank requires cohere_api_key

Set os.environ['COHERE_API_KEY'], or pass the key directly: CohereRerank(api_key='...').

Experienced dev note

In production, measure the latency cost of each postprocessor in your chain. Reranking is expensive (CohereRerank adds a network round trip to a hosted model on every query, and LLMRerank makes LLM calls over batches of nodes); filtering is cheap. If your retrieval is already tight (similarity_top_k ≤ 5), consider skipping reranking: the marginal gain may not justify the latency. Also, postprocessors are applied *per query*, not at indexing time, so their cost scales with query volume; order the chain so that expensive operations run last, on the smallest set.
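A simple way to measure per-stage latency is to wrap each stage in a timer. The sketch below uses plain functions as stand-ins for postprocessors; run_timed, cheap_filter, and slow_rerank are hypothetical, and the sleep only simulates a cost that grows with the number of documents.

```python
import time
from typing import Callable, Dict, List, Sequence, Tuple

Stage = Tuple[str, Callable[[List[str]], List[str]]]


def run_timed(docs: List[str], stages: Sequence[Stage]) -> Tuple[List[str], Dict[str, float]]:
    """Run stages in order and record the wall-clock seconds each one took."""
    timings: Dict[str, float] = {}
    for name, stage in stages:
        start = time.perf_counter()
        docs = stage(docs)
        timings[name] = time.perf_counter() - start
    return docs, timings


def cheap_filter(docs: List[str]) -> List[str]:
    """Keep only documents that mention 'learning'."""
    return [d for d in docs if "learning" in d]


def slow_rerank(docs: List[str]) -> List[str]:
    """Simulated reranker whose cost grows with the number of documents."""
    time.sleep(0.01 * len(docs))
    return sorted(docs, key=len)


docs = ["machine learning basics", "cooking pasta", "deep learning"]
result, timings = run_timed(docs, [("filter", cheap_filter), ("rerank", slow_rerank)])
for name, seconds in timings.items():
    print(f"{name}: {seconds * 1000:.1f} ms")
```

Logging these numbers per query shows which stage dominates and whether moving a filter earlier actually shrinks the expensive stage's input.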

Check your understanding

You have 50 retrieved documents. Your chain is: [CohereRerank(top_n=3), SimilarityPostprocessor]. Why is this order problematic, and what would you reorder?

Show answer hint

A correct answer recognizes that CohereRerank(top_n=3) runs *first*, so the expensive rerank operation scores all 50 documents, and the cheap similarity filter then sees only its 3 survivors, which now carry reranker scores rather than embedding similarities. You should move the similarity filter (along with any metadata filter or a lower top-k limit) *before* the rerank so that it scores a smaller input set.

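VERSION Before 0.10.0, llama-index shipped as a single package: global LLM and embedding configuration went through ServiceContext, and CohereRerank was bundled with the main package. From 0.10.0 on, the core lives in llama-index-core, Settings replaces ServiceContext, and CohereRerank ships in the separate llama-index-postprocessor-cohere-rerank package (imported as llama_index.postprocessor.cohere_rerank). In all of these versions, pass node_postprocessors directly to as_query_engine().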

Next, explore custom postprocessors: how to extend BaseNodePostprocessor to implement domain-specific document filtering or ranking logic for your retrieval pipeline.
