Code Beginner easy · 8 min

Core concepts: index, query engine, retriever

What you will learn

An <code>Index</code> stores your data, a <code>Retriever</code> finds relevant pieces, and a <code>QueryEngine</code> chains them together to answer questions.

Why this matters

These three components are the foundation of every RAG system you'll build with LlamaIndex. Understanding how they fit together is essential before you can customize retrieval, add filters, or optimize for production.

Skip if: You're only building a vector search API with no LLM reasoning. In that case you don't need a separate query engine; use the retriever alone. You also don't need LlamaIndex at all if your data fits in a single context window and you're not doing retrieval.

Explanation

What it is: LlamaIndex provides three core abstractions. An Index is a data structure that organizes and stores your documents for efficient retrieval. A Retriever extracts the most relevant chunks from that index in response to a query. A QueryEngine wraps a retriever and an LLM together: it retrieves relevant context and then uses the LLM to synthesize an answer.

How it works mechanically: You feed documents into an Index (typically a VectorStoreIndex). When you call .as_retriever() on that index, you get a retriever that can find semantically similar chunks. When you call .as_query_engine(), LlamaIndex internally creates a retriever and pairs it with an LLM, so a single query call performs retrieval and LLM synthesis automatically.
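To make the mechanics concrete, here is a toy, dependency-free model of the three roles. It is not LlamaIndex's implementation: ToyIndex, ToyRetriever, ToyQueryEngine, embed, and fake_llm are hypothetical names invented for this sketch, and the "embedding" is a plain word-count vector rather than a neural model.

```python
import math
import re
from collections import Counter


def embed(text):
    """Toy 'embedding': a bag-of-words count vector (real systems use a neural model)."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[word] * b[word] for word in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


class ToyIndex:
    """Stores each chunk next to its vector."""

    def __init__(self, chunks):
        self.entries = [(chunk, embed(chunk)) for chunk in chunks]

    def as_retriever(self, similarity_top_k=2):
        return ToyRetriever(self, similarity_top_k)

    def as_query_engine(self, llm, similarity_top_k=2):
        # Builds a retriever internally, then pairs it with the LLM.
        return ToyQueryEngine(self.as_retriever(similarity_top_k), llm)


class ToyRetriever:
    """Ranks chunks by similarity to the query. No LLM involved."""

    def __init__(self, index, top_k):
        self.index, self.top_k = index, top_k

    def retrieve(self, query):
        q = embed(query)
        scored = [(cosine(q, vec), chunk) for chunk, vec in self.index.entries]
        scored.sort(key=lambda pair: pair[0], reverse=True)
        return scored[: self.top_k]


class ToyQueryEngine:
    """Retrieves context, then hands context plus question to the LLM."""

    def __init__(self, retriever, llm):
        self.retriever, self.llm = retriever, llm

    def query(self, question):
        context = "\n".join(chunk for _, chunk in self.retriever.retrieve(question))
        return self.llm(f"Context:\n{context}\n\nQuestion: {question}")


def fake_llm(prompt):
    """Stand-in for a real model: reports how much prompt it received."""
    return f"(answer synthesized from a {len(prompt)}-character prompt)"


index = ToyIndex([
    "Machine learning is a subset of artificial intelligence.",
    "Supervised learning uses labeled data to train models.",
    "Bread dough needs time to rise before baking.",
])
hits = index.as_retriever(similarity_top_k=2).retrieve("What is machine learning?")
answer = index.as_query_engine(fake_llm, similarity_top_k=2).query("What is machine learning?")
```

Here hits is a ranked list of (score, chunk) pairs, while answer is whatever the LLM stand-in returns. The unrelated bread chunk never reaches the prompt because it falls outside the top two.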

When to use each: Use a retriever directly when you only need ranked search results. Use a query engine when you need the LLM to reason over those results and produce prose answers. Think of the retriever as the "search" layer and the query engine as the "reasoning" layer.

Analogy

Imagine a library system. The <code>Index</code> is the catalog and organized shelves. The <code>Retriever</code> is the librarian who finds the most relevant books on your topic. The <code>QueryEngine</code> is the librarian who also reads those books and writes a custom report answering your specific question.

Code

Illustrative only - not runnable without a valid API key

python

import os
from llama_index.core import VectorStoreIndex, SimpleDirectoryReader, Settings
from llama_index.embeddings.openai import OpenAIEmbedding
from llama_index.llms.openai import OpenAI

# Fail early with a clear message if the key is missing.
api_key = os.getenv('OPENAI_API_KEY')
if not api_key:
    raise ValueError('OPENAI_API_KEY environment variable not set')

# Global defaults: the LLM used for synthesis and the model used for embeddings.
Settings.llm = OpenAI(api_key=api_key, model='gpt-4.1')
Settings.embed_model = OpenAIEmbedding(api_key=api_key, model='text-embedding-3-small')

# Load every supported file under ./data and embed it into a vector index.
documents = SimpleDirectoryReader('data').load_data()
index = VectorStoreIndex.from_documents(documents)

# Two views of the same index: search only vs. search plus LLM synthesis.
retriever = index.as_retriever(similarity_top_k=2)
query_engine = index.as_query_engine(similarity_top_k=2)

# Retriever: returns the two highest-scoring chunks; no LLM call is made.
retrieval_result = retriever.retrieve('What is machine learning?')
print('Retriever returned nodes:')
for node in retrieval_result:
    print(f'  - {node.get_content()[:60]}...')

# Query engine: retrieves chunks, then asks the LLM to answer from them.
query_result = query_engine.query('What is machine learning?')
print(f'\nQuery engine response:\n{query_result}')

Output

Retriever returned nodes:
  - Machine learning is a subset of artificial intell...
  - Supervised learning uses labeled data to train m...

Query engine response:
Machine learning is a computational approach where...

What just happened?

The code built an index from the documents in the 'data' directory, then created two things from it: a retriever (which only fetches relevant chunks) and a query engine (which retrieves chunks and passes them to an LLM). The retriever printed raw node content; the query engine ran its own retrieval with the same settings and printed an LLM response synthesized from those chunks.

Common gotcha

Developers often assume the retriever and query engine both call the LLM. They don't: the retriever returns raw chunks with no LLM processing. Only the query engine calls the LLM. If you use a retriever expecting natural language answers, you'll get unprocessed text snippets instead.

Error recovery

ImportError: cannot import name 'VectorStoreIndex'

This most likely means llama-index < 0.10.x is installed while your code uses the newer import path. Update to llama-index-core >= 0.12.x and keep 'from llama_index.core import VectorStoreIndex'.

ValueError: Directory data does not exist.

SimpleDirectoryReader could not find the folder. You must create a 'data' directory with .txt or .pdf files before running this code. The path is relative to the directory you run the script from, not to the script's own location.
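Because the path is resolved against the current working directory, it can help to check it yourself and report the absolute path that was actually inspected. A minimal sketch using only the standard library; require_data_dir is a hypothetical helper name, not part of LlamaIndex:

```python
from pathlib import Path


def require_data_dir(path="data"):
    """Return the resolved directory, or raise with the absolute path that was checked."""
    data_dir = Path(path).resolve()  # resolved against the current working directory
    if not data_dir.is_dir():
        raise FileNotFoundError(f"Expected a directory at {data_dir}; create it and add files.")
    if not any(p.is_file() for p in data_dir.iterdir()):
        raise FileNotFoundError(f"{data_dir} exists but contains no files.")
    return data_dir
```

Calling require_data_dir('data') before constructing the reader turns a confusing relative-path failure into a message that names the exact location.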

RateLimitError from OpenAI

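Your API quota is exhausted or you hit a rate limit. Reducing similarity_top_k shrinks the prompt sent to the LLM, which helps with token-based limits; adding a delay between calls helps with request-based limits. This is not a LlamaIndex bug.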
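One common way to "add a delay between calls" is retrying with exponential backoff. The helper below is a generic sketch, not a LlamaIndex or OpenAI API: with_backoff and its parameters are hypothetical names, and the sleep function is injectable only so the behavior can be tested without waiting.

```python
import random
import time


def with_backoff(call, retry_on=(Exception,), max_attempts=5, base_delay=1.0, sleep=time.sleep):
    """Run call(); on a retryable error wait base_delay * 2**attempt (plus jitter) and retry."""
    for attempt in range(max_attempts):
        try:
            return call()
        except retry_on:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the original error
            # Small random jitter keeps parallel clients from retrying in lockstep.
            sleep(base_delay * (2 ** attempt) + random.uniform(0, 0.1))
```

In the lesson's code this might wrap the query call, for example with_backoff(lambda: query_engine.query('What is machine learning?'), retry_on=(openai.RateLimitError,)), assuming the error reaches your code as openai.RateLimitError. Retrying does not help if the quota itself is exhausted.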

Experienced dev note

The query engine is the convenience layer, but it hides what's actually happening. For production systems, separate your retriever and LLM calls explicitly: it gives you finer control over cost (you might not need to call the LLM every time), error handling, and observability. Also: similarity_top_k=2 is tiny for real data; start with 5-10 and measure quality vs. cost.
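As a rough sketch of what "separate your retriever and LLM calls" can look like, the function below retrieves first and only calls the LLM when the best hit clears a score threshold. Everything here is an assumption for illustration: answer, min_score, and the (score, text) shape are hypothetical, and 0.3 is an arbitrary cutoff. With LlamaIndex, retrieve could be an adapter around retriever.retrieve that maps each result to (node.score, node.get_content()), and complete could wrap Settings.llm.complete.

```python
def answer(question, retrieve, complete, min_score=0.3):
    """Retrieve first; call the LLM only if the best chunk clears min_score."""
    hits = retrieve(question)  # expected shape: list of (score, text), best first
    if not hits or hits[0][0] < min_score:
        # Nothing relevant enough: skip the LLM call (and its cost) entirely.
        return {"answer": None, "sources": [], "llm_called": False}
    context = "\n\n".join(text for _, text in hits)
    prompt = f"Answer using only this context:\n{context}\n\nQuestion: {question}"
    return {"answer": complete(prompt), "sources": hits, "llm_called": True}


# Stubs standing in for a real retriever and LLM.
calls = []


def stub_complete(prompt):
    calls.append(prompt)
    return "stub answer"


relevant = answer("What is ML?", lambda q: [(0.8, "ML is a subset of AI.")], stub_complete)
off_topic = answer("Best pizza?", lambda q: [(0.05, "ML is a subset of AI.")], stub_complete)
```

Keeping the two steps apart also gives you natural places to log retrieved sources, time each stage, and handle retrieval and LLM failures differently.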

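VERSION In llama-index < 0.10.0, the import was 'from llama_index import VectorStoreIndex' (older releases called the class GPTVectorStoreIndex). In 0.10.0+, it became 'from llama_index.core import VectorStoreIndex', and global configuration moved from ServiceContext to Settings. If you don't set Settings.llm, LlamaIndex falls back to its OpenAI defaults, so set the LLM explicitly rather than relying on them.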

Next, learn how to customize what the retriever returns using filters and metadata, so you can retrieve by date, category, or other fields alongside semantic similarity.
