Intermediate Course
LlamaIndex Intermediate
49 lessons across 7 chapters. Every lesson is standalone — start anywhere.
Chapter 1: Response Synthesizers (7 lessons)
1. What response synthesis does
   Response synthesis uses an LLM to combine retrieved context chunks into a single coherent answer.
2. get_response_synthesizer(): the factory
   get_response_synthesizer() is a factory function that returns a response synthesizer for the response_mode you choose. You can switch strategies without instantiating the synthesizer classes yourself.
3. Refine mode: iterative answer building
   Refine mode drafts an answer from the first chunk. It then makes one LLM call per remaining chunk to refine that draft. This suits detailed answers that must reconcile information spread across chunks, but it costs more LLM calls.
4. Compact mode: fitting context efficiently
   Compact mode packs as many retrieved chunks as will fit into each prompt, then refines across those packed prompts. It makes fewer LLM calls than plain refine and still uses every retrieved chunk.
5. Tree summarize: hierarchical synthesis
   Tree summarize first answers over groups of chunks. It then recursively summarizes those intermediate answers until a single response remains, so each prompt stays within the context window even when there are many chunks.
6. No text mode: retrieval only
   The no_text mode retrieves nodes without calling the LLM to synthesize an answer. Use it when you need raw search results or want to post-process them yourself.
7. Choosing the right synthesizer for your use case
   Each synthesizer turns retrieved context into a response differently. Pick the one that matches your latency, accuracy, and cost constraints.
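The main difference between refine, compact, and tree summarize is how many LLM calls each one makes. The minimal, library-free Python model below sketches that difference. It assumes a stub `llm` callable in place of a real model and a character budget in place of the token limit. The function names (`refine`, `pack`, `compact`, `tree_summarize`) are illustrative only and are not LlamaIndex's API. In LlamaIndex itself you choose a mode with `get_response_synthesizer(response_mode=...)`.

```python
from typing import Callable, List

# A stub LLM: any function from prompt text to completion text (assumption, not a real model).
LLM = Callable[[str], str]


def refine(query: str, chunks: List[str], llm: LLM) -> str:
    """Draft an answer from the first chunk, then make one refine call per remaining chunk."""
    answer = llm(f"Context: {chunks[0]}\nQuestion: {query}\nAnswer:")
    for chunk in chunks[1:]:
        answer = llm(
            f"Existing answer: {answer}\nNew context: {chunk}\n"
            f"Refine the answer to: {query}"
        )
    return answer


def pack(chunks: List[str], budget: int) -> List[str]:
    """Greedily join chunks so each packed block stays within `budget` characters.

    A single chunk longer than the budget becomes its own block.
    """
    packed: List[str] = []
    current = ""
    for chunk in chunks:
        candidate = f"{current}\n\n{chunk}" if current else chunk
        if current and len(candidate) > budget:
            packed.append(current)
            current = chunk
        else:
            current = candidate
    if current:
        packed.append(current)
    return packed


def compact(query: str, chunks: List[str], llm: LLM, budget: int) -> str:
    """Pack chunks into as few prompts as fit, then refine across the packed blocks."""
    return refine(query, pack(chunks, budget), llm)


def tree_summarize(query: str, chunks: List[str], llm: LLM, budget: int,
                   num_children: int = 2) -> str:
    """Answer over packed blocks, then combine answers num_children at a time until one remains."""
    if num_children < 2:
        raise ValueError("num_children must be at least 2 for the tree to shrink")
    level = [llm(f"Context: {b}\nQuestion: {query}\nAnswer:") for b in pack(chunks, budget)]
    while len(level) > 1:
        groups = [level[i:i + num_children] for i in range(0, len(level), num_children)]
        level = [llm(f"Summaries: {' | '.join(g)}\nQuestion: {query}\nAnswer:") for g in groups]
    return level[0]
```

Take four 10-character chunks and a 25-character budget. Refine makes four LLM calls and compact makes two. Tree summarize makes three: two leaf answers plus one call that combines them.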
Chapter 2: Advanced Retrieval (7 lessons)
1. BM25Retriever: keyword-based retrieval
   BM25Retriever ranks documents by keyword relevance without embeddings. It scores each document by how often the query terms appear in it and how rare those terms are across the corpus.
2. Hybrid retrieval: combining BM25 and vector
   Hybrid retrieval runs keyword-based (BM25) and semantic (vector) search and fuses their rankings, so both exact-term matches and paraphrases can surface.
3. HyDE: hypothetical document embeddings
   HyDE asks an LLM to write a hypothetical answer to the query and embeds that text, optionally alongside the original query. This bridges the vocabulary gap between short questions and the documents that answer them.
4. Auto-merging retrieval: parent-child nodes
   Split documents into a hierarchy of parent and child chunks and retrieve the small child chunks. When enough of a parent's children are retrieved, merge them into the parent chunk so the LLM sees fuller context.
5. Sentence window retrieval: surrounding context
   Match individual sentences, then replace each match with a window of the sentences immediately before and after it. This gives the LLM richer context than the single sentence.
6. Recursive retrieval: following node relationships
   Use recursive retrieval to follow references from retrieved nodes to other nodes, retrievers, or query engines, and fetch the additional context they point to.
7. Ensemble retrieval: voting across multiple retrievers
   Combine results from multiple retrievers and merge them by voting or rank fusion to return the best candidates. This improves both recall and relevance.
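BM25 scoring and the fusion step behind hybrid and ensemble retrieval can both be sketched in a few lines. The block below is a hypothetical, stdlib-only illustration and not LlamaIndex's BM25Retriever. It assumes whitespace tokenization, the common defaults k1 = 1.5 and b = 0.75, and reciprocal rank fusion (RRF) as the fusion method. RRF is one standard choice among several. In a real hybrid setup the second ranking would come from a vector retriever, and here any ranked list of IDs stands in for it.

```python
import math
from collections import Counter
from typing import Dict, List, Sequence


def bm25_scores(query: str, docs: Sequence[str], k1: float = 1.5, b: float = 0.75) -> List[float]:
    """Okapi BM25 score of every document for the query (naive whitespace tokenization)."""
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    avg_len = sum(len(tokens) for tokens in tokenized) / n_docs
    # Document frequency: the number of documents each term appears in.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    scores = []
    for tokens in tokenized:
        tf = Counter(tokens)
        score = 0.0
        for term in query.lower().split():
            if term not in tf:
                continue
            # Rarer terms get a higher inverse document frequency.
            idf = math.log(1 + (n_docs - df[term] + 0.5) / (df[term] + 0.5))
            freq = tf[term]
            # k1 saturates repeated terms; b normalizes for document length.
            norm = freq + k1 * (1 - b + b * len(tokens) / avg_len)
            score += idf * freq * (k1 + 1) / norm
        scores.append(score)
    return scores


def rank(scores: Sequence[float], ids: Sequence[str]) -> List[str]:
    """Document IDs ordered from highest to lowest score."""
    return [doc_id for _, doc_id in sorted(zip(scores, ids), key=lambda p: p[0], reverse=True)]


def reciprocal_rank_fusion(rankings: Sequence[Sequence[str]], k: int = 60) -> List[str]:
    """Fuse ranked lists: each list adds 1 / (k + rank) to a document's fused score."""
    fused: Dict[str, float] = {}
    for ranking in rankings:
        for position, doc_id in enumerate(ranking, start=1):
            fused[doc_id] = fused.get(doc_id, 0.0) + 1.0 / (k + position)
    return sorted(fused, key=fused.get, reverse=True)
```

RRF uses only rank positions, not raw scores. That avoids having to calibrate BM25 scores against cosine similarities before merging them.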
Chapter 3: Node Postprocessors (7 lessons)
1. What node postprocessors do
   Node postprocessors filter, rerank, and transform retrieved nodes before they reach your LLM.
2. SimilarityPostprocessor: filtering by score threshold
   Drop retrieved nodes whose similarity score falls below a cutoff, removing low-confidence matches before they reach the LLM.
3. KeywordNodePostprocessor: keyword filtering
   Keep only nodes that contain the required keywords and drop nodes that contain excluded ones. This improves relevance without re-querying.
4. LLMRerank: LLM-based reranking
   Ask an LLM to reorder retrieved nodes by relevance to the query, and keep only the top ones before response synthesis.
5. SentenceTransformerRerank: cross-encoder reranking
   Rerank retrieved nodes with a sentence-transformers cross-encoder, which scores each query-node pair directly instead of relying on the initial retrieval scores.
6. Ordering and chaining postprocessors
   Apply several postprocessors in sequence. Each one receives the previous one's output, so put cheap filters before expensive rerankers.
7. Building a custom postprocessor
   Subclass BaseNodePostprocessor and implement _postprocess_nodes() to filter, rank, or transform retrieved nodes with your own logic. This lets you enforce quality gates on RAG results.
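Chaining postprocessors means each stage sees only what the previous stage kept. The sketch below models that as plain functions over a hypothetical `ScoredNode` dataclass. In LlamaIndex you would pass postprocessor objects such as SimilarityPostprocessor and KeywordNodePostprocessor through `node_postprocessors`. The helper names here (`similarity_cutoff`, `keyword_filter`, `top_n`, `apply_chain`) are illustrative assumptions. The `top_n` stage is only a stand-in for a reranker's final cut.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Sequence


@dataclass
class ScoredNode:
    """Hypothetical stand-in for a retrieved node and its similarity score."""
    text: str
    score: Optional[float] = None


# In this sketch a postprocessor is just a function from nodes to nodes.
Postprocessor = Callable[[List[ScoredNode]], List[ScoredNode]]


def similarity_cutoff(cutoff: float) -> Postprocessor:
    """Drop nodes that have no score or a score below the cutoff."""
    def run(nodes: List[ScoredNode]) -> List[ScoredNode]:
        return [n for n in nodes if n.score is not None and n.score >= cutoff]
    return run


def keyword_filter(required: Sequence[str] = (), excluded: Sequence[str] = ()) -> Postprocessor:
    """Keep nodes with every required keyword and none of the excluded ones (case-insensitive)."""
    def run(nodes: List[ScoredNode]) -> List[ScoredNode]:
        kept = []
        for n in nodes:
            text = n.text.lower()
            has_required = all(k.lower() in text for k in required)
            has_excluded = any(k.lower() in text for k in excluded)
            if has_required and not has_excluded:
                kept.append(n)
        return kept
    return run


def top_n(n: int) -> Postprocessor:
    """Keep the n highest-scoring nodes (placeholder for a reranker's final cut)."""
    def run(nodes: List[ScoredNode]) -> List[ScoredNode]:
        return sorted(nodes, key=lambda node: node.score or 0.0, reverse=True)[:n]
    return run


def apply_chain(nodes: List[ScoredNode], chain: Sequence[Postprocessor]) -> List[ScoredNode]:
    """Run postprocessors in list order; each sees the previous one's output."""
    for postprocessor in chain:
        nodes = postprocessor(nodes)
    return nodes
```

Putting the score cutoff and keyword filter first shrinks the list before any expensive reranking step runs.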
Chapter 4: Chat Engines (7 lessons)
1. Chat engine vs query engine: stateful vs stateless
   A query engine answers each question independently. A chat engine keeps conversation history for multi-turn dialogue.
2. as_chat_engine(): the simple pattern
   Call as_chat_engine() on an index to get a conversational interface that tracks chat history automatically.
3. CondenseQuestionChatEngine: reformulating follow-ups
   Rewrite each follow-up question, together with the conversation history, into a standalone question so the retriever knows what "it" refers to.
4. ContextChatEngine: with memory and retrieval
   ContextChatEngine retrieves context for each user message and combines it with multi-turn conversation memory to ground its answers in your documents.
5. Managing chat history size
   Control memory usage and API costs in conversational RAG by truncating or summarizing chat history before each LLM call.
6. Resetting a conversation
   Clear a chat engine's history with reset() to prevent context drift and avoid spending tokens on irrelevant earlier exchanges.
7. Streaming chat responses
   Stream LLM responses token by token instead of waiting for the complete response, so users see output as it is generated.
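Truncating chat history to a token budget and resetting a conversation can be modeled with a small buffer. The class below is a hypothetical sketch, not LlamaIndex's ChatMemoryBuffer (configured there via `ChatMemoryBuffer.from_defaults(token_limit=...)`). It assumes a crude token count of one token per whitespace-separated word. `get()` returns the newest messages that fit the budget, and older messages stay stored but are not sent.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

Message = Tuple[str, str]  # (role, content)


def count_tokens(text: str) -> int:
    """Crude token estimate: one token per whitespace-separated word (an assumption)."""
    return len(text.split())


@dataclass
class ChatHistory:
    """Hypothetical chat memory that returns only recent messages within a token budget."""
    token_limit: int
    messages: List[Message] = field(default_factory=list)

    def put(self, role: str, content: str) -> None:
        self.messages.append((role, content))

    def get(self) -> List[Message]:
        """Walk backwards from the newest message until the budget is spent."""
        kept: List[Message] = []
        used = 0
        for role, content in reversed(self.messages):
            cost = count_tokens(content)
            if used + cost > self.token_limit:
                break
            kept.append((role, content))
            used += cost
        return list(reversed(kept))

    def reset(self) -> None:
        """Forget the whole conversation, e.g. when the user switches topics."""
        self.messages.clear()
```

Walking backwards keeps the most recent turns, which matter most for resolving follow-up questions. A summarizing memory would instead condense the dropped turns rather than discard them.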
Chapter 5: Ingestion Pipelines (7 lessons)
1. What an ingestion pipeline solves
   An ingestion pipeline turns raw documents into indexed, queryable nodes. It runs parsing, chunking, embedding, and storage as a single reproducible workflow.
2. IngestionPipeline: the processing graph
   IngestionPipeline lets you define a sequence of document transformations (chunking, cleaning, embedding) once and reuse it on any batch of documents.
3. Transformations: splitters, embedders, metadata
   Transformations are the pipeline's stages. They split text, compute embeddings, and attach metadata that later controls retrieval behavior.
4. Caching transformations: avoiding re-processing
   Cache the output of each transformation so that rerunning the pipeline on unchanged input skips redundant parsing and embedding, cutting cost.
5. Async ingestion for large document sets
   Run ingestion asynchronously and in batches to process thousands of documents without blocking your application.
6. Deduplication: handling duplicate documents
   Attach a document store so the pipeline skips documents it has already ingested unchanged. This avoids wasting vector storage and keeps duplicate results from degrading retrieval quality.
7. Pipeline persistence and reuse
   Persist the pipeline's cache and document store and reload them later, so later runs do not recompute embeddings for documents that have not changed.
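Transformation caching and docstore-style deduplication both rely on hashing. The `MiniPipeline` class below is a hypothetical, stdlib-only sketch of those ideas, not LlamaIndex's IngestionPipeline. It treats a transformation as a function over a list of text pieces. It caches each step by a hash of the step name plus its inputs, and it skips a document when its ID and content hash match one it has already ingested.

```python
import hashlib
from typing import Callable, Dict, List, Sequence

# Simplification: a transformation maps a list of text pieces to a new list.
Transform = Callable[[List[str]], List[str]]


def split_sentences(texts: List[str]) -> List[str]:
    """Naive splitter: break on periods and drop empty pieces."""
    return [s.strip() for t in texts for s in t.split(".") if s.strip()]


def lowercase(texts: List[str]) -> List[str]:
    """Toy cleaning step."""
    return [t.lower() for t in texts]


def _digest(*parts: str) -> str:
    return hashlib.sha256("\x00".join(parts).encode("utf-8")).hexdigest()


class MiniPipeline:
    """Run transformations in order, caching each step and skipping unchanged documents."""

    def __init__(self, transforms: Sequence[Transform]) -> None:
        self.transforms = list(transforms)
        self.cache: Dict[str, List[str]] = {}  # hash(step name + inputs) -> outputs
        self.seen: Dict[str, str] = {}         # doc_id -> hash of content already ingested
        self.cache_hits = 0

    def _apply(self, transform: Transform, texts: List[str]) -> List[str]:
        key = _digest(transform.__name__, *texts)
        if key in self.cache:
            self.cache_hits += 1
            return list(self.cache[key])
        out = transform(texts)
        self.cache[key] = list(out)
        return out

    def run(self, docs: Dict[str, str]) -> List[str]:
        nodes: List[str] = []
        for doc_id, text in docs.items():
            content_hash = _digest(text)
            if self.seen.get(doc_id) == content_hash:
                continue  # same ID and same content: a duplicate, so skip it
            self.seen[doc_id] = content_hash
            pieces = [text]
            for transform in self.transforms:
                pieces = self._apply(transform, pieces)
            nodes.extend(pieces)
        return nodes
```

Running the same documents twice yields nothing new on the second run. A new document ID with identical content is not skipped by deduplication, but it still reuses both cached steps. Persisting `cache` and `seen` to disk is what lets these savings carry over between runs.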
Chapter 6: Router Query Engines (7 lessons)
1. What routing solves: multi-index queries
   When you have multiple specialized data sources, routing automatically directs each query to the right index.
2. RouterQueryEngine: query routing pattern
   RouterQueryEngine uses a selector to send each question to the query engine best suited to its content.
3. LLM-based selector: choosing by reasoning
   Let an LLM choose which tool or data source to use by reasoning about the query, rather than relying on keyword matching or fixed rules.
4. Embedding-based selector: choosing by similarity
   Select the query engine or retriever whose description embedding is most similar to the query embedding.
5. Summary vs detailed index routing
   Decide whether a query should go to a summary index, which serves broad questions about whole documents, or to a detailed index of fine-grained chunks for specific lookups.
6. Tool descriptions: guiding routing decisions
   Write precise tool descriptions that tell the selector which questions each tool should handle, not just what data it contains.
7. Fallback routing when no match
   Add fallback logic for queries that match none of your routes, for example by sending them to a default query engine.
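An embedding-based selector with a fallback route can be sketched as follows: compare the query with each tool description, and fall back when nothing is similar enough. The block below is a hypothetical, stdlib-only illustration, not LlamaIndex's selector API. It uses bag-of-words counts as a toy "embedding" in place of a real embedding model. The 0.2 similarity threshold is an arbitrary assumption.

```python
import math
from collections import Counter
from typing import Dict


def embed(text: str) -> Counter:
    """Toy 'embedding': bag-of-words counts, standing in for a real embedding model."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(count * b[term] for term, count in a.items())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def route(query: str, tools: Dict[str, str], fallback: str, threshold: float = 0.2) -> str:
    """Pick the tool whose description is most similar to the query.

    If no description scores above `threshold`, return the fallback route instead.
    """
    query_vec = embed(query)
    best_name, best_score = fallback, threshold
    for name, description in tools.items():
        score = cosine(query_vec, embed(description))
        if score > best_score:
            best_name, best_score = name, score
    return best_name
```

Because routing compares the query against tool descriptions, the wording of those descriptions directly determines which engine is chosen. Vague descriptions push more queries toward the fallback.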
Chapter 7: Evaluation (7 lessons)
1. Why evaluating RAG pipelines matters
   RAG systems can fail silently by retrieving irrelevant documents or hallucinating answers, so you need metrics that catch these failures before production.
2. FaithfulnessEvaluator: hallucination detection
   Use FaithfulnessEvaluator to detect plausible-sounding answers that the retrieved source context does not support.
3. RelevancyEvaluator: context relevance check
   Use RelevancyEvaluator to check programmatically whether a response and its retrieved context actually address your query.
4. CorrectnessEvaluator: answer accuracy
   CorrectnessEvaluator uses an LLM judge to score how accurate a generated response is compared with a reference answer.
5. Dataset generation for evaluation
   Generate synthetic query-answer pairs from your documents to evaluate RAG quality without manual labeling.
6. Batch evaluation across many queries
   Use BatchEvalRunner to run evaluators over many queries concurrently, measuring retrieval and generation quality across your entire dataset.
7. Using evaluation results to improve retrieval
   Use retrieval evaluation metrics to find queries whose relevant context is not retrieved, then iteratively adjust your ranking and filtering strategy.
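To find weak queries you need per-query retrieval metrics aggregated over a dataset. The sketch below computes hit rate and mean reciprocal rank (MRR), two standard metrics that LlamaIndex's RetrieverEvaluator can also report. It is a hypothetical stdlib-only illustration, not that evaluator. It assumes each test case gives the query, the retrieved document IDs in rank order, and the IDs of the relevant documents.

```python
from typing import List, Sequence, Tuple

# (query, retrieved IDs in rank order, IDs of the relevant documents)
EvalCase = Tuple[str, Sequence[str], Sequence[str]]


def hit_rate(retrieved: Sequence[str], expected: Sequence[str]) -> float:
    """1.0 if any relevant document was retrieved, else 0.0."""
    return 1.0 if any(doc in expected for doc in retrieved) else 0.0


def reciprocal_rank(retrieved: Sequence[str], expected: Sequence[str]) -> float:
    """1 / rank of the first relevant document, or 0.0 if none was retrieved."""
    for position, doc in enumerate(retrieved, start=1):
        if doc in expected:
            return 1.0 / position
    return 0.0


def evaluate_batch(cases: Sequence[EvalCase]) -> Tuple[float, float, List[str]]:
    """Average hit rate and MRR over all cases, plus the queries that missed entirely."""
    hits = [hit_rate(r, e) for _, r, e in cases]
    ranks = [reciprocal_rank(r, e) for _, r, e in cases]
    weak = [q for (q, _, _), h in zip(cases, hits) if h == 0.0]
    return sum(hits) / len(cases), sum(ranks) / len(cases), weak
```

Queries that land in the weak list show where to look first, for example at chunking, top-k, or the filters applied before synthesis. A low MRR with a high hit rate points instead at ranking: the relevant document was retrieved but sat too far down the list.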