Intermediate Course

LlamaIndex Intermediate

49 lessons across 7 chapters. Every lesson is standalone — start anywhere.

49 lessons 7 chapters

1 Response Synthesizers 7 lessons

What response synthesis does Response synthesis combines retrieved context chunks into a single coherent answer using an LLM.

get_response_synthesizer(): the factory get_response_synthesizer() is a factory function that creates the right response generation strategy for your query engine without forcing you to know the implementation details.

Refine mode: iterative answer building Refine mode builds answers by synthesizing context from multiple document chunks sequentially, useful when you need the LLM to reconcile conflicting information or build understanding progressively.

Compact mode: fitting context efficiently Compact mode intelligently selects the most relevant nodes from your index to fit within token limits without losing answer quality.

Tree summarize: hierarchical synthesis Tree summarize recursively condenses documents by building a hierarchy of summaries, letting you process hundreds of pages without token explosion.

No text mode: retrieval only Configure LlamaIndex to retrieve documents without synthesizing text responses, useful when you need raw search results or custom post-processing.

Choosing the right synthesizer for your use case Different synthesizers transform retrieved context into responses differently: pick the one that matches your latency, accuracy, and cost constraints.

2 Advanced Retrieval 7 lessons

BM25Retriever: keyword-based retrieval BM25Retriever performs keyword-based document ranking without embeddings, using statistical term relevance scoring.

Hybrid retrieval: combining BM25 and vector Hybrid retrieval combines keyword-based (BM25) and semantic (vector) search to get the best of both ranking strategies.

HyDE: hypothetical document embeddings HyDE generates synthetic documents from queries before embedding them, improving retrieval by bridging the vocabulary gap between questions and stored documents.

Auto-merging retrieval: parent-child nodes Split documents into small chunks for retrieval but automatically merge them with parent context during answer generation for better relevance.

Sentence window retrieval: surrounding context Retrieve not just the matching sentence, but the sentences immediately before and after it to give the LLM richer context.

Recursive retrieval: following node relationships Use recursive retrieval to automatically follow parent-child node relationships and fetch additional context during query answering.

Ensemble retrieval: voting across multiple retrievers Combine results from multiple retrievers and use voting or ranking to return the best candidates, improving recall and relevance.

3 Node Postprocessors 7 lessons

What node postprocessors do Node postprocessors filter, rerank, and transform retrieved nodes before they reach your LLM.

SimilarityPostprocessor: filtering by score threshold Filter retrieved nodes by their similarity score to eliminate low-confidence matches before passing them to the LLM.

KeywordNodePostprocessor: keyword filtering Filter retrieved nodes by requiring or excluding specific keywords to improve relevance without re-querying.

LLMRerank: LLM-based reranking Use an LLM to intelligently re-order retrieved documents by relevance before passing them to your final query.

SentenceTransformerRerank: embedding-based reranking Re-rank retrieved documents using semantic similarity instead of relying on initial retrieval scores.

Ordering and chaining postprocessors Apply multiple document transformations in sequence to refine retrieval results before feeding them to your LLM.

Building a custom postprocessor Postprocessors filter, rank, or transform retrieved nodes before they reach your LLM, letting you enforce quality gates on RAG results.

4 Chat Engines 7 lessons

Chat engine vs query engine: stateful vs stateless QueryEngine answers single questions statelessly; ChatEngine maintains conversation history for multi-turn dialogue.

as_chat_engine(): the simple pattern Convert any index into a conversational interface that maintains chat history automatically.

CondenseQuestionChatEngine: reformulating follow-ups Automatically rewrite user follow-up questions to include context from the conversation history so your retriever understands what 'it' refers to.

ContextChatEngine: with memory and retrieval ContextChatEngine combines a retrieval index with multi-turn conversation memory to answer questions grounded in your documents.

Managing chat history size Control memory usage and API costs in conversational RAG by truncating or summarizing chat history before each query.

Resetting a conversation Clear conversation history and memory in a chat context to prevent context drift or token waste on irrelevant prior exchanges.

Streaming chat responses Stream LLM responses token-by-token instead of waiting for the complete response, enabling real-time output to users.

5 Ingestion Pipelines 7 lessons

What an ingestion pipeline solves An ingestion pipeline transforms raw documents into indexed, queryable vectors by automating parsing, chunking, embedding, and storage in a single reproducible workflow.

IngestionPipeline: the processing graph IngestionPipeline lets you chain together document transformations (chunking, embedding, cleaning) once and reuse them on any document batch.

Transformations: splitters, embedders, metadata Transformations are pipeline stages that process documents before indexing: splitting text, computing embeddings, and attaching metadata to control retrieval behavior.

Caching transformations: avoiding re-processing Use llama-index transformation caching to skip redundant document processing and cut embedding/parsing costs.

Async ingestion for large document sets Use async document loading and batch processing to ingest thousands of documents without blocking your application.

Deduplication: handling duplicate documents Remove duplicate documents from your index before ingestion to avoid wasting vector storage and degrading retrieval quality.

Pipeline persistence and reuse Save and reload your entire index pipeline: documents, embeddings, retrievers: as a single serializable artifact to avoid recomputing embeddings and rebuilding indexes.

6 Router Query Engines 7 lessons

What routing solves: multi-index queries Routing automatically directs queries to the right index when you have multiple specialized data sources.

RouterQueryEngine: query routing pattern Route different query types to specialized query engines based on the question content.

LLM-based selector: choosing by reasoning Use an LLM to intelligently select which tool or data source to use by reasoning about the query, rather than using keyword matching or fixed rules.

Embedding-based selector: choosing by similarity Use embedding similarity to automatically select the best query engine or retriever from multiple options based on what your query is actually about.

Summary vs detailed index routing Choose between summarizing all documents into a single index node or keeping detailed per-document metadata to route queries intelligently.

Tool descriptions: guiding routing decisions Write precise tool descriptions to guide an agent's decision about which tool to call, not just what the tool does.

Fallback routing when no match Use fallback logic to handle queries that don't match any routing condition in a LlamaIndex selector.

7 Evaluation 7 lessons

Why evaluating RAG pipelines matters RAG systems can silently fail by retrieving irrelevant documents or hallucinating answers: you need metrics to catch these before production.

FaithfulnessEvaluator: hallucination detection Use FaithfulnessEvaluator to detect when an LLM generates plausible-sounding answers that contradict the source documents.

RelevancyEvaluator: context relevance check Use RelevancyEvaluator to programmatically score whether retrieved documents actually answer your query.

CorrectnessEvaluator: answer accuracy CorrectnessEvaluator measures whether your RAG retrieval answers are factually accurate by comparing generated responses against reference answers.

Dataset generation for evaluation Generate synthetic query-answer pairs from your documents to evaluate RAG system quality without manual labeling.

Batch evaluation across many queries Process multiple evaluation queries at once using BatchEvalRunner to measure retrieval and generation quality efficiently across your entire dataset.

Using evaluation results to improve retrieval Use retrieval evaluators to identify weak queries and iteratively improve your RAG system's ranking and filtering strategy.