Code Beginner easy · 4 min

as_query_engine(): the simplest pattern

What you will learn

Convert a LlamaIndex document index into a question-answering interface in one method call.

Why this matters

This is the most direct path from 'I have documents' to 'I can ask questions about them': no complex chains, no manual retrieval setup, just one line that handles the entire Q&A flow.

Skip if: you need fine-grained control over retrieval settings, custom post-processing of results, or several chained retrieval steps. In those cases, skip <code>as_query_engine()</code> and assemble the retriever and a <code>RetrieverQueryEngine</code> explicitly.

Explanation

What it is: as_query_engine() is a convenience method on a LlamaIndex index that converts it into a query engine: an object that accepts natural-language questions and returns answers by retrieving relevant chunks and synthesizing a response from them.

How it works: Calling as_query_engine() wraps the index you built in a retrieval-and-synthesis pipeline and returns a query engine object. When you call .query("your question") on it, the engine retrieves the most relevant chunks from the index and passes them, together with your question, to an LLM that generates the answer.

When to use it: Use this for straightforward question answering over documents, such as customer support FAQs, documentation search, and knowledge base queries. It's the entry point most developers should start with before optimizing.
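To make the retrieve-then-synthesize flow concrete, here is a toy sketch in plain Python. It is not LlamaIndex code: `ToyIndex`, `ToyQueryEngine`, `ToyResponse`, and `fake_llm` are hypothetical stand-ins, word overlap replaces embedding similarity, and the "LLM" only echoes the retrieved text. Only the shape of the flow mirrors what is described above.

```python
from dataclasses import dataclass, field


@dataclass
class ToyResponse:
    """Stand-in response object: answer text plus the chunks that were used."""
    response: str
    source_nodes: list = field(default_factory=list)

    def __str__(self) -> str:
        return self.response


def fake_llm(question: str, context: list) -> str:
    """Placeholder for the LLM call: it only echoes the retrieved context."""
    return f"Based on {len(context)} chunk(s): " + " ".join(context)


class ToyQueryEngine:
    def __init__(self, chunks: list, top_k: int = 2):
        self.chunks = chunks
        self.top_k = top_k

    def _score(self, question: str, chunk: str) -> int:
        # Crude relevance: number of lowercase words shared with the question.
        return len(set(question.lower().split()) & set(chunk.lower().split()))

    def query(self, question: str) -> ToyResponse:
        # Step 1: retrieve the top_k highest-scoring chunks.
        ranked = sorted(self.chunks, key=lambda c: self._score(question, c), reverse=True)
        retrieved = ranked[: self.top_k]
        # Step 2: synthesize an answer from the retrieved chunks.
        return ToyResponse(fake_llm(question, retrieved), retrieved)


class ToyIndex:
    def __init__(self, chunks: list):
        self.chunks = chunks

    def as_query_engine(self, top_k: int = 2) -> ToyQueryEngine:
        # One call wires retrieval and synthesis together.
        return ToyQueryEngine(self.chunks, top_k)


index = ToyIndex([
    "refunds are processed within five days",
    "shipping takes two weeks",
    "our office is closed on sunday",
])
response = index.as_query_engine(top_k=1).query("how long do refunds take")
print(response)
```

The caller never touches the retrieval or synthesis steps directly, which is the convenience that the real method offers.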

Analogy

Think of <code>as_query_engine()</code> as turning a library (your index) into a librarian (the query engine). You don't manage the librarian's every step: you just ask a question and get an answer back.

Code

Illustrative only - not runnable without a valid API key

python

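from llama_index.core import VectorStoreIndex, SimpleDirectoryReader, Settings
from llama_index.llms.openai import OpenAI

# Use GPT-4.1 for answer synthesis (reads OPENAI_API_KEY from the environment)
Settings.llm = OpenAI(model='gpt-4.1')

# Load every supported file in ./data and embed it into a vector index
documents = SimpleDirectoryReader('data').load_data()
index = VectorStoreIndex.from_documents(documents)

# One call wraps the index in a retrieve-then-synthesize pipeline
query_engine = index.as_query_engine()
response = query_engine.query('What is the main topic of these documents?')

# response is a Response object; the f-string calls str() on it
print(f'Answer: {response}')
print(f'Type: {type(response)}')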

Output

Answer: The main topic of these documents is artificial intelligence and machine learning fundamentals.
Type: <class 'llama_index.core.base.response.schema.Response'>

What just happened?

The code created a vector store index from the documents in the 'data' folder, then called <code>as_query_engine()</code> to convert it into a query engine. When <code>.query()</code> was invoked with a question, the engine retrieved the top-k most similar chunks (two by default), passed them to GPT-4.1 along with the question, and returned a <code>Response</code> object whose text is the synthesized answer.
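What "top-k" means here can be sketched with a few made-up vectors. This is a toy illustration, not LlamaIndex internals: the 2-D "embeddings", the chunk names, and the `cosine` and `top_k` helpers are all hypothetical. In LlamaIndex the number of retrieved chunks is governed by the `similarity_top_k` parameter mentioned at the end of this lesson.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_k(query_vec, chunk_vecs, k):
    """Return the names of the k chunks most similar to the query vector."""
    scored = sorted(chunk_vecs.items(), key=lambda kv: cosine(query_vec, kv[1]), reverse=True)
    return [name for name, _ in scored[:k]]


# Made-up 2-D "embeddings"; real ones have hundreds or thousands of dimensions.
chunks = {
    "refund_policy": [0.9, 0.1],
    "shipping_times": [0.6, 0.4],
    "office_hours": [0.1, 0.9],
}
query = [1.0, 0.0]

print(top_k(query, chunks, 1))  # only the closest chunk
print(top_k(query, chunks, 2))  # the two closest chunks
```

A larger k gives the LLM more context to work with, at the cost of a longer prompt and a higher chance of including loosely related chunks.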

Common gotcha

Developers often assume response is a plain string and try response.split() or len(response), but response is a Response object. Use str(response) or response.response to get the answer text as a string.
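The gotcha can be reproduced without an API key using a small stand-in. `FakeResponse` below is hypothetical and mimics only the two behaviours named above: the answer is stored in `.response`, and `str()` returns it.

```python
from dataclasses import dataclass


@dataclass
class FakeResponse:
    """Hypothetical stand-in that keeps the answer text in .response."""
    response: str

    def __str__(self) -> str:
        return self.response


resp = FakeResponse("LlamaIndex connects documents to LLMs")

length_error = None
try:
    len(resp)  # fails: the object is not a string and defines no __len__
except TypeError as exc:
    length_error = exc

text = str(resp)      # convert to a string first...
words = text.split()  # ...then string methods work as expected
print(type(length_error).__name__, len(words))
```

Converting once, right after the query, keeps the rest of the code working with an ordinary string.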

Error recovery

FileNotFoundError

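The 'data' directory you specified doesn't exist or contains no files. Create the directory and add document files (PDF, TXT, MD) to it, or change the path to where your documents actually are. Depending on the version, <code>SimpleDirectoryReader</code> may report this as a <code>ValueError</code> (for example, that the directory does not exist) rather than a <code>FileNotFoundError</code>; the fix is the same.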

ValueError: Model must be specified

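No LLM was configured by the time the query engine was created. Add <code>Settings.llm = OpenAI(model='gpt-4.1')</code> before calling <code>as_query_engine()</code>. Setting it at the top of the script, before <code>from_documents()</code>, is the simplest habit, although building the index itself relies on the embedding model rather than the LLM.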

AuthenticationError

Your <code>OPENAI_API_KEY</code> environment variable is not set or invalid. Run <code>export OPENAI_API_KEY='sk-...'</code> in your shell before running the script.
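Two of the failures above, a missing or empty data directory and an unset API key, can be caught before any LlamaIndex call is made. The helper below is a minimal sketch using only the standard library; `preflight` is a hypothetical name, and the check looks only at the top level of the directory and only at whether the key is set, not whether it is valid.

```python
import os
from pathlib import Path


def preflight(data_dir: str = "data", key_var: str = "OPENAI_API_KEY") -> list:
    """Return a list of human-readable problems; an empty list means ready."""
    problems = []
    path = Path(data_dir)
    if not path.is_dir():
        problems.append(f"Directory '{data_dir}' does not exist.")
    elif not any(p.is_file() for p in path.iterdir()):
        problems.append(f"Directory '{data_dir}' contains no files.")
    if not os.environ.get(key_var):
        problems.append(f"Environment variable {key_var} is not set.")
    return problems


for problem in preflight():
    print(f"Fix before running: {problem}")
```

Running this at the top of a script turns a stack trace into a short list of things to fix.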

Experienced dev note

The Response object contains metadata beyond the answer text. response.source_nodes gives you the actual document chunks that were used to generate the answer, letting you cite sources or debug retrieval quality. Always inspect .source_nodes in development: it reveals cases where retrieval returned nothing useful and the LLM answered from its own knowledge instead of your documents.

Check your understanding

If you called as_query_engine() on an index but the answers seem generic or don't reference your documents, what could be wrong and how would you verify it?

Answer hint

The answer should involve checking <code>response.source_nodes</code> to confirm the retriever is actually pulling relevant documents. If <code>source_nodes</code> is empty or contains irrelevant chunks, the problem is retrieval quality (embedding model mismatch, too few documents, or bad similarity thresholds), not the query engine itself.

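VERSION Before 0.10.0, LlamaIndex shipped as a single llama-index package: imports came from the top-level llama_index module, global configuration used ServiceContext, and older releases named the class GPTVectorStoreIndex. From 0.10.0 onwards, import from llama_index.core and use VectorStoreIndex and Settings instead; as_query_engine() is available on the index either way.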

Next, learn how to customize retrieval behavior by passing parameters to <code>as_query_engine()</code>, for example setting <code>similarity_top_k</code> to control how many chunks are retrieved per query.
