Code Beginner easy · 4 min

What a query engine does

What you will learn

A query engine is the interface that takes your question and searches your indexed documents to find and synthesize the answer.

Why this matters

Before you can ask questions of your documents, you need to understand what component actually retrieves and processes the answer: that's the query engine. It's the bridge between your code and the data.

Skip if: you only store or index documents and never ask questions of them, or you query a database directly with hand-written SQL, which is a different retrieval pattern.

Explanation

A query engine is the runtime object that executes retrieval and answer synthesis over your indexed data. After you've built an index from documents, the query engine takes a user question, finds the most relevant chunks in that index, and (in the default configuration) sends them to an LLM along with the question to synthesize a final answer. It orchestrates the entire request-to-response pipeline.

Mechanically, for a VectorStoreIndex, calling query_engine.query() embeds the question, retrieves the highest-scoring chunks from the index (two by default), optionally passes them through node postprocessors such as re-rankers or filters, hands the remaining chunks to a response synthesizer that prompts the LLM, and returns a Response object. In llama-index, index.as_query_engine() builds this by pairing a retriever over your index with a response synthesizer. Use a query engine whenever you have indexed documents and want natural-language access to that data.
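To make the retrieve-then-synthesize flow concrete, here is a toy query engine in plain Python. It is a sketch under stated assumptions, not llama-index code: ToyQueryEngine, embed, and cosine are hypothetical names, word counts stand in for a real embedding model, and the "synthesis" step just joins the retrieved chunks where a real engine would prompt an LLM.

```python
import math
from collections import Counter


def embed(text: str) -> Counter:
    # Stand-in for an embedding model: a bag-of-words count vector.
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    # Cosine similarity between two sparse count vectors.
    dot = sum(count * b[word] for word, count in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


class ToyQueryEngine:
    """Hypothetical stand-in for a query engine: retrieve top-k chunks, then synthesize."""

    def __init__(self, chunks: list[str], top_k: int = 2):
        # "Building the index": embed every chunk once, up front.
        self.index = [(chunk, embed(chunk)) for chunk in chunks]
        self.top_k = top_k

    def retrieve(self, question: str) -> list[tuple[str, float]]:
        # Score every chunk against the question and keep the top_k best.
        q = embed(question)
        scored = [(chunk, cosine(q, vec)) for chunk, vec in self.index]
        scored.sort(key=lambda pair: pair[1], reverse=True)
        return scored[: self.top_k]

    def synthesize(self, question: str, hits: list[tuple[str, float]]) -> str:
        # A real engine would send the question plus these chunks to an LLM;
        # joining the text keeps this sketch runnable offline.
        context = " | ".join(chunk for chunk, _ in hits)
        return f"Answer to {question!r} based on: {context}"

    def query(self, question: str) -> dict:
        hits = self.retrieve(question)
        return {"response": self.synthesize(question, hits), "source_nodes": hits}


engine = ToyQueryEngine([
    "vector embeddings map text to numbers",
    "semantic search compares embeddings",
    "the cafeteria opens at noon",
])
result = engine.query("how does semantic search use embeddings")
print(result["response"])
```

The unrelated cafeteria chunk scores zero and never reaches the synthesis step, which is the point of retrieving before generating: the LLM only sees the chunks the retriever ranked highest.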

Analogy

Think of the query engine like a research librarian. You ask a question, the librarian searches the card catalog (the index) to find relevant books (document chunks), reads through them, and gives you a synthesized answer. The index is the catalog; the query engine is the librarian.

Code

Illustrative only - not runnable without a valid API key

python

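from llama_index.core import VectorStoreIndex, SimpleDirectoryReader, Settings
from llama_index.embeddings.openai import OpenAIEmbedding
from llama_index.llms.openai import OpenAI
import os

# Placeholder key; in real code, export OPENAI_API_KEY in your shell instead of hard-coding it
os.environ['OPENAI_API_KEY'] = 'your-api-key-here'

# Global defaults: the LLM that writes answers and the model that embeds chunks and questions
Settings.llm = OpenAI(model='gpt-4.1')
Settings.embed_model = OpenAIEmbedding(model='text-embedding-3-small')

# Load every file in ./data and build an in-memory vector index (calls the embedding API)
documents = SimpleDirectoryReader('data').load_data()
index = VectorStoreIndex.from_documents(documents)

# Wrap the index in a query engine: a retriever plus a response synthesizer
query_engine = index.as_query_engine()

# Retrieve the most relevant chunks, then have the LLM answer from them
response = query_engine.query('What are the main topics discussed?')

# get_formatted_sources() summarizes the retrieved chunks; .response is the answer text
print(f'Sources: {response.get_formatted_sources()}')
print(f'Answer: {response.response}')

Output

Sources: > Source (Doc id: ...): ...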
Answer: The main topics discussed include vector embeddings, semantic search, and retrieval-augmented generation approaches for building knowledge systems.

What just happened?

The code created an index from the documents in the data folder, then called .as_query_engine() to wrap that index with a retriever and an LLM-backed response synthesizer. When .query() was called, the engine embedded the question, retrieved the most similar chunks from the index, sent them to OpenAI's GPT-4.1 together with your question, and returned a Response object containing both the answer and references to the source chunks.

Common gotcha

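Developers often assume the query engine returns a string. It doesn't: it returns a Response object. Read .response to get the answer text, call .get_formatted_sources() for a readable summary of the source chunks, or use .source_nodes for the raw list. print(response) and str(response) do show the answer text, because Response defines __str__, but the object itself is not a str: string methods such as .upper() raise AttributeError, json.dumps(response) raises TypeError, and inspecting it in a REPL shows the full structure rather than just the answer.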
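A small helper that pulls plain values out of a response, assuming the attribute names of llama-index's Response (response, source_nodes) and NodeWithScore (score, node.node_id). The describe_response name is hypothetical, and the SimpleNamespace object is an offline stand-in with the same shape, not a real Response.

```python
from types import SimpleNamespace


def describe_response(response) -> dict:
    """Return the answer text plus (node_id, score) for each retrieved chunk.

    Assumes llama-index attribute names: response.response (str or None),
    response.source_nodes (list of NodeWithScore), and each item's
    .score and .node.node_id.
    """
    return {
        "answer": response.response or "",  # response.response may be None
        "sources": [
            {"node_id": sn.node.node_id, "score": sn.score}
            for sn in response.source_nodes
        ],
    }


# Hypothetical offline stand-in so the helper runs without an API key.
fake = SimpleNamespace(
    response="Embeddings and semantic search.",
    source_nodes=[SimpleNamespace(node=SimpleNamespace(node_id="node-1"), score=0.82)],
)
info = describe_response(fake)
print(info["answer"])
print(info["sources"])
```

With a real engine you would pass the object returned by query_engine.query() instead of fake; returning plain dicts also makes the result safe to log or serialize as JSON.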

Error recovery

AttributeError: 'Response' object has no attribute 'answer'

The property is .response, not .answer. Write response.response to get the string.

ModuleNotFoundError: No module named 'llama_index.core'

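This usually means a llama-index release older than 0.10.0 is installed (the llama_index.core package layout arrived in 0.10.0), or llama-index isn't installed at all. Upgrade with: pip install --upgrade "llama-index-core>=0.12.0" llama-index-llms-openai llama-index-embeddings-openai. Quote the version specifier; unquoted, most shells treat > as output redirection.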

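TypeError: as_query_engine() got an unexpected keyword argument 'similarity_top_k'

similarity_top_k (how many chunks to retrieve) is forwarded by VectorStoreIndex.as_query_engine() to the index's retriever. If the call rejects it, confirm that you are calling it on a VectorStoreIndex, since other index types build retrievers with different arguments, and that your installed llama-index-core is 0.12.x or later.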

Experienced dev note

The query engine keeps no state between calls, so you can call it repeatedly with different questions without rebuilding the index. Build the index once (expensive: every chunk is embedded), create the query engine once (cheap), then reuse it for thousands of queries; each query still costs an embedding call for the question plus at least one LLM call. The Response object also carries metadata: response.source_nodes holds the retrieved chunks with their relevance scores (.score) and node IDs (.node.node_id). Use these for debugging and transparency, not just the .response text. Finally, query engines don't cache results; if the same questions recur in production, add your own caching layer.
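One way to add that caching layer is a thin wrapper around anything with a .query(str) method. This is a minimal sketch, not a llama-index feature: CachedQueryEngine and CountingEngine are hypothetical names, keys are the question lowercased with whitespace collapsed (an assumption; exact-match keys are stricter), and old entries are evicted least-recently-used once max_size is reached.

```python
from collections import OrderedDict


class CachedQueryEngine:
    """Hypothetical LRU cache around any object exposing .query(question)."""

    def __init__(self, engine, max_size: int = 256):
        self.engine = engine
        self.max_size = max_size
        self._cache = OrderedDict()

    def query(self, question: str):
        key = " ".join(question.lower().split())  # normalize case and whitespace
        if key in self._cache:
            self._cache.move_to_end(key)  # mark as most recently used
            return self._cache[key]
        result = self.engine.query(question)  # cache miss: run the real query
        self._cache[key] = result
        if len(self._cache) > self.max_size:
            self._cache.popitem(last=False)  # evict the least recently used entry
        return result


class CountingEngine:
    """Offline stand-in that counts how often it is really queried."""

    def __init__(self):
        self.calls = 0

    def query(self, question):
        self.calls += 1
        return f"answer #{self.calls}"


inner = CountingEngine()
cached = CachedQueryEngine(inner, max_size=2)
cached.query("What are the main topics?")
cached.query("what are   the main topics?")  # same key after normalization
print(inner.calls)  # 1
```

In production you would wrap the real engine, for example CachedQueryEngine(index.as_query_engine()), and clear the cache whenever the index is rebuilt, since cached answers can go stale.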

Check your understanding

If you have an indexed set of 1000 documents and you call query_engine.query() three times with different questions, how many times does the index get searched, and why doesn't the query engine need to rebuild the index each time?

Answer hint

A correct answer explains that the index is searched three times (once per query), but the index itself is already built and stored in memory or persistent storage: the query engine reuses the same index object for all three queries without reconstruction.
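The hint can be checked with a toy model that counts builds and searches separately. It is a sketch under stated assumptions: build_index and make_query_engine are hypothetical plain-Python stand-ins (keyword overlap instead of embeddings), but they show the same split between a one-time build and a per-question search.

```python
stats = {"builds": 0, "searches": 0}


def build_index(docs):
    # Expensive step in real life (every chunk gets embedded); here it runs once.
    stats["builds"] += 1
    return {i: set(doc.lower().split()) for i, doc in enumerate(docs)}


def make_query_engine(index):
    # Cheap step: the engine just keeps a reference to the already-built index.
    def query(question):
        stats["searches"] += 1  # every question triggers one search
        words = set(question.lower().split())
        return [i for i, doc_words in index.items() if words & doc_words]
    return query


docs = [f"document number {i}" for i in range(1000)]
index = build_index(docs)
query = make_query_engine(index)
for question in ["Which document is first?", "Which is last?", "How many are there?"]:
    query(question)
print(stats)  # {'builds': 1, 'searches': 3}
```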

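VERSION llama-index-core 0.12.x (April 2026) uses Settings-based configuration. Versions before 0.10.0 configured the LLM and embedding model through ServiceContext, which was deprecated in 0.10.0 and has since been removed, so ServiceContext code will not run on current releases. Ensure you're targeting 0.12.x or later.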

Next, you'll learn how to customize what the query engine retrieves and how it synthesizes answers: this means configuring retrieval parameters and understanding different query engine modes.
