Code Beginner easy · 4 min

retrieve(): getting nodes for a query

What you will learn

retrieve() returns the raw document nodes that match your query before LLM synthesis, letting you inspect what data your index actually found.

Why this matters

Understanding what nodes your index retrieves is critical for debugging why an LLM gives wrong answers: the answer is only as good as the retrieved context, and retrieve() lets you verify that before the LLM ever sees it.

Skip if: you only care about the final LLM-synthesized answer and don't need to debug or validate intermediate retrieval results. In that case, call query_engine.query() directly: it's simpler and does retrieval and synthesis in one call.

Explanation

retrieve() is a method on retrievers (such as the one returned by index.as_retriever()) that returns the Node objects (text chunks) that best match your query, without passing them to an LLM for synthesis. It's the retrieval half of the retrieval-augmented generation (RAG) pipeline, separated from the generation half.

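Mechanically, when you call retriever.retrieve(query_str) on a vector index retriever, the retriever embeds the query with the configured embedding model, searches the vector store for the nodes whose embeddings are most similar, applies any metadata filters you've set, ranks the matches by similarity score, and returns a list of NodeWithScore objects, each pairing a node (its text and metadata) with its score. Other retrievers, such as a BM25 keyword retriever, score nodes differently but return the same NodeWithScore shape. The embedding model is called, but no LLM is involved at this stage.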
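To make those steps concrete, here is a toy, dependency-free sketch of what a dense vector retriever does: score every stored chunk against the query vector with cosine similarity, drop chunks that fail a metadata filter, rank, and keep the top k. It is not LlamaIndex's implementation. ScoredNode, toy_retrieve, and the hand-written 3-dimensional vectors are hypothetical stand-ins; a real retriever gets its vectors from the embedding model and returns NodeWithScore objects.

```python
import math
from dataclasses import dataclass, field


@dataclass
class ScoredNode:
    """Hypothetical stand-in for LlamaIndex's NodeWithScore: text, metadata, score."""
    text: str
    metadata: dict = field(default_factory=dict)
    score: float = 0.0


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def toy_retrieve(query_vec, store, top_k=3, metadata_filter=None):
    """Score every (text, metadata, vector) entry, filter, rank, and keep top_k."""
    results = []
    for text, metadata, vec in store:
        # Filtering happens before ranking, so it can shrink the result list.
        if metadata_filter and any(metadata.get(k) != v for k, v in metadata_filter.items()):
            continue
        results.append(ScoredNode(text, metadata, cosine(query_vec, vec)))
    results.sort(key=lambda n: n.score, reverse=True)
    return results[:top_k]


# Hand-written vectors standing in for real embeddings.
store = [
    ("Backpropagation adjusts weights.", {"file_name": "a.pdf"}, [0.9, 0.1, 0.0]),
    ("Adam is an optimizer.", {"file_name": "b.pdf"}, [0.7, 0.3, 0.1]),
    ("Bread needs yeast.", {"file_name": "c.pdf"}, [0.0, 0.2, 0.9]),
]
query_vec = [1.0, 0.2, 0.0]

for n in toy_retrieve(query_vec, store, top_k=2):
    print(f"{n.score:.3f} {n.text}")
```

Note that the unrelated "Bread" chunk still receives a score; it is simply ranked last and cut by top_k.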

Use retrieve() when you need to inspect what data the index found before an LLM processes it, or when you want to build custom logic on top of retrieval without full RAG synthesis.
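As one example of custom logic on top of retrieval, the sketch below keeps only results at or above a score threshold and assembles them into a source-labeled context string, for instance to feed your own prompt. RetrievedChunk is a hypothetical stand-in for NodeWithScore so the code runs without an index or API key. The values min_score=0.8 and max_chars=2000 are arbitrary, not LlamaIndex defaults, and the chunk scores are made up.

```python
from dataclasses import dataclass, field


@dataclass
class RetrievedChunk:
    """Hypothetical stand-in for NodeWithScore (text, metadata, score)."""
    text: str
    metadata: dict = field(default_factory=dict)
    score: float = 0.0


def build_context(chunks, min_score=0.8, max_chars=2000):
    """Join high-scoring chunks, best first, each prefixed with its source file.

    Stops before the combined text would exceed max_chars.
    """
    parts, used = [], 0
    for chunk in sorted(chunks, key=lambda c: c.score, reverse=True):
        if chunk.score < min_score:
            continue
        source = chunk.metadata.get("file_name", "unknown")
        part = f"[{source}] {chunk.text}"
        if used + len(part) > max_chars:
            break
        parts.append(part)
        used += len(part)
    return "\n".join(parts)


chunks = [
    RetrievedChunk("Backpropagation adjusts weights.", {"file_name": "neural_basics.pdf"}, 0.845),
    RetrievedChunk("The learning rate sets the step size.", {"file_name": "hyperparameters.pdf"}, 0.778),
]
print(build_context(chunks))
```

With real results, you would read the same fields from each NodeWithScore: node.node.get_content(), node.node.metadata, and node.score.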

Analogy

retrieve() is like asking a librarian for all books matching your topic: she hands you the actual books ranked by relevance. query_engine.query() is like asking her to read those books and write you a summary. retrieve() gives you the raw materials; query() gives you the finished product.

Code

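Illustrative only: not runnable without a valid OpenAI API key and a data/ folder containing your documents.

```python
import os

from llama_index.core import VectorStoreIndex, SimpleDirectoryReader, Settings
from llama_index.embeddings.openai import OpenAIEmbedding
from llama_index.llms.openai import OpenAI

# Read the key from the environment instead of hard-coding it in source.
if 'OPENAI_API_KEY' not in os.environ:
    raise RuntimeError('Set OPENAI_API_KEY before running this example.')

# The LLM is not used by retrieve(); it is only needed later for query().
Settings.llm = OpenAI(model='gpt-4')
# The embedding model embeds the document chunks at build time and the query at retrieval time.
Settings.embed_model = OpenAIEmbedding(model='text-embedding-3-small')

# Load every file in ./data, split it into nodes, and embed the nodes.
documents = SimpleDirectoryReader('data').load_data()
index = VectorStoreIndex.from_documents(documents)

# A retriever that returns the 3 most similar nodes per query.
retriever = index.as_retriever(similarity_top_k=3)

query = 'how do neural networks learn?'
nodes = retriever.retrieve(query)  # list of NodeWithScore

print(f'Retrieved {len(nodes)} nodes:')
for i, node in enumerate(nodes):
    # node is a NodeWithScore: the chunk is node.node, its similarity is node.score.
    print(f'\nNode {i+1} (score: {node.score:.3f})')
    print(f'Text: {node.node.get_content()[:100]}...')
    print(f'Metadata: {node.node.metadata}')
```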

Output (illustrative: the text, scores, and metadata depend on your documents)

Retrieved 3 nodes:

Node 1 (score: 0.845)
Text: Neural networks learn through a process called backpropagation, which adjusts weights based on prediction error. During forward pass...
Metadata: {'file_name': 'neural_basics.pdf', 'page_label': '12'}

Node 2 (score: 0.812)
Text: Learning involves computing the gradient of the loss function with respect to each weight, then using optimization algorithms like SGD or Adam...
Metadata: {'file_name': 'optimization.pdf', 'page_label': '5'}

Node 3 (score: 0.778)
Text: The learning rate controls how large each weight update is. Too high and the network overshoots the minimum; too low and training stalls...
Metadata: {'file_name': 'hyperparameters.pdf', 'page_label': '8'}

What just happened?

The code loaded the files in data/, built a VectorStoreIndex from them (embedding each chunk), converted the index to a retriever, and called retrieve() with a query string. The retriever embedded the query with the configured embedding model, searched the vector index for the 3 most similar nodes, and returned them as a list of NodeWithScore objects, each holding the original node (text and metadata) and its similarity score. No LLM was invoked: this is pure retrieval.

Common gotcha

Developers often assume that retrieve() returns the final answer, or that a high score guarantees relevance. Neither is true: retrieve() returns raw nodes ranked by embedding similarity, which can be noisy or miss semantic nuance. A node scoring 0.85 can still contain irrelevant text if your documents are poorly chunked, and typical score ranges differ between embedding models, so there is no universal "good" score. Always read the retrieved text, not just the scores.
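If you retrieve many nodes, a rough triage heuristic can tell you which ones to read first: flag results that score high but share almost no words with the query. This is a hypothetical helper, not a LlamaIndex feature, and it is no substitute for reading the text, since a genuine semantic match can share no words with the query. The thresholds and the (text, score) pairs below are made up for illustration.

```python
import re


def content_words(text):
    """Lowercased words of 3+ letters; a crude stand-in for real tokenization."""
    return set(re.findall(r"[a-z]{3,}", text.lower()))


def keyword_overlap(query, text):
    """Fraction of the query's content words that also appear in text."""
    q = content_words(query)
    return len(q & content_words(text)) / len(q) if q else 0.0


def flag_for_review(query, results, min_score=0.8, min_overlap=0.25):
    """Return texts that scored high but share few words with the query.

    results holds (text, score) pairs. Both thresholds are arbitrary examples.
    """
    return [
        text
        for text, score in results
        if score >= min_score and keyword_overlap(query, text) < min_overlap
    ]


query = "how do neural networks learn?"
results = [
    ("Neural networks learn through backpropagation.", 0.85),
    ("Page 12 of 40. Copyright notice and table of contents.", 0.83),
]
print(flag_for_review(query, results))
```

Here the boilerplate-looking chunk is flagged despite its high score, which is exactly the kind of chunking problem the scores alone would hide.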

Error recovery

Index not built or loaded

You called retrieve() before an index existed in this session. Build one with VectorStoreIndex.from_documents(), or load a previously persisted index from disk with load_index_from_storage(), then call as_retriever() on it.

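Embedding call fails

The embedding model (set in Settings.embed_model) raised an error, usually because the OpenAI API key is missing or invalid, or the model name is wrong. The exact exception type depends on your library versions and where the call failed (often it is raised by the OpenAI SDK), so read the message rather than matching on a class name. Verify OPENAI_API_KEY and use a valid embedding model name such as 'text-embedding-3-small'.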

AttributeError: 'NodeWithScore' object has no attribute 'text'

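NodeWithScore wraps the actual node: the chunk lives in .node and its similarity score in .score. Reading content through the wrapped node (node.node.get_content(), node.node.metadata) works across versions. Recent llama-index-core releases also provide shortcuts on NodeWithScore itself, such as get_content(), .text, and .metadata, so if you hit this error, check your installed llama-index version and confirm the object really is a NodeWithScore.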

similarity_top_k returns fewer nodes than requested

Your index holds fewer nodes (chunks) than similarity_top_k, or metadata filters excluded some candidates. This is normal: retrieve() returns what exists and never pads the results.
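The arithmetic behind this: a top-k search can return at most min(similarity_top_k, number of nodes left after filtering). A stdlib sketch with made-up scores; the top_k helper and its keep predicate are hypothetical stand-ins for the retriever and a metadata filter, not LlamaIndex APIs.

```python
import heapq


def top_k(scored_nodes, k, keep=None):
    """Return up to k (score, node_id) pairs, best first.

    scored_nodes: (score, node_id) pairs for every node in a toy index.
    keep: optional predicate standing in for a metadata filter.
    """
    candidates = [item for item in scored_nodes if keep is None or keep(item)]
    # nlargest never pads: it returns min(k, len(candidates)) items.
    return heapq.nlargest(k, candidates)


# A toy index with only three nodes and made-up similarity scores.
scored_nodes = [(0.84, "n1"), (0.81, "n2"), (0.77, "n3")]

print(len(top_k(scored_nodes, k=5)))                                     # 3
print(len(top_k(scored_nodes, k=5, keep=lambda item: item[1] != "n2")))  # 2
```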

Experienced dev note

retrieve() is your debugging superpower. When an LLM gives a wrong answer in production, your first instinct should be to check what retrieve() returned: in practice, many RAG failures are retrieval failures (the right chunk never reached the LLM), not generation failures. Inspect the nodes before blaming the LLM. Also note that each retrieve() call on a vector index embeds the query again, which costs an embedding API call; if you retrieve repeatedly for the same query, cache the query embedding or the results yourself.
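One way to avoid paying twice for the same query embedding is to memoize it. A minimal sketch using functools.lru_cache, where embed_query is a hypothetical stand-in for a paid embedding API call (its fake vector is meaningless) so the example runs offline:

```python
from functools import lru_cache

CALLS = {"embed": 0}


def embed_query(text):
    """Hypothetical stand-in for a paid embedding API call."""
    CALLS["embed"] += 1
    # Fake deterministic "embedding": character codes folded into 4 buckets.
    vec = [0.0] * 4
    for i, ch in enumerate(text):
        vec[i % 4] += ord(ch)
    return tuple(vec)  # a tuple is immutable, so sharing the cached value is safe


@lru_cache(maxsize=256)
def cached_embed_query(text):
    """Embed each distinct query string once; repeats are served from the cache."""
    return embed_query(text)


for _ in range(5):
    cached_embed_query("how do neural networks learn?")
print(CALLS["embed"])  # 1
```

The same pattern can wrap whole retrieve() results keyed on the query string, provided the index does not change between calls; call cache_clear() after re-indexing so stale results are not served.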

Check your understanding

If you call retrieve() with similarity_top_k=5 and it returns only 3 nodes with high scores, then you query the same index with a different query and get 5 nodes back, what does this tell you about your document chunks and why might this happen?

Answer hint

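A correct answer recognizes that a plain dense vector retriever scores every node, so without filters it returns min(similarity_top_k, number of nodes) results for any query. Getting only 3 nodes back for one query but 5 for another means something limited the candidates for the first query: either (a) metadata filters excluded nodes for that query but not the second, or (b) the vector store backend applied its own cutoff, such as a similarity threshold, which some backends support. It's not a bug: it's how filtered similarity search works. A complete answer also notes that your chunking strategy determines how many nodes exist and how densely they cover a topic, which shapes what the top-k results contain.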

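VERSION The imports in this lesson (llama_index.core plus the separate llama-index-embeddings-openai and llama-index-llms-openai packages) and the global Settings object follow the package layout introduced in llama-index 0.10; earlier releases imported from llama_index directly and configured models through ServiceContext. The retriever API shown here is current as of the 0.12.x series.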

Next, you'll learn to synthesize these retrieved nodes into a final answer using query_engine.query(), which wraps retrieve() and adds the LLM synthesis step.
