retrieve(): getting nodes for a query
Why this matters
Understanding what nodes your index retrieves is critical for debugging why an LLM gives wrong answers: the answer is only as good as the retrieved context, and retrieve() lets you verify that before the LLM ever sees it.
Explanation
retrieve() is a method on retrievers, such as the one returned by index.as_retriever(), that returns the raw Node objects that matched your query, without passing them to an LLM for synthesis. It's the retrieval half of the retrieval-augmented generation (RAG) pipeline, separated from the generation half.
Mechanically, when you call retriever.retrieve(query_str), the retriever searches the index (its embeddings for a vector index, or a BM25 index for keyword retrieval) for nodes similar to your query, applies any filters you've set, ranks the matches by relevance score, and returns a list of NodeWithScore objects, each containing the actual text, metadata, and similarity score. No LLM is involved at this stage.
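The steps above can be sketched in plain Python. This is a toy illustration, not LlamaIndex's implementation: ToyNode, ToyNodeWithScore, embed, cosine, and toy_retrieve are hypothetical names, and the "embedding" is a simple bag-of-words count rather than a learned model.

```python
import math
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class ToyNode:
    """Hypothetical stand-in for a document node: text plus metadata."""
    text: str
    metadata: dict = field(default_factory=dict)


@dataclass
class ToyNodeWithScore:
    """Hypothetical stand-in for NodeWithScore: a node and its score."""
    node: ToyNode
    score: float


def embed(text: str) -> Counter:
    # Toy "embedding": bag-of-words counts (assumption, not a real model).
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def toy_retrieve(query, nodes, top_k=2, file_filter=None):
    query_vec = embed(query)  # 1. embed the query
    candidates = [  # 2. apply the (optional) metadata filter
        n for n in nodes
        if file_filter is None or n.metadata.get("file_name") == file_filter
    ]
    scored = [  # 3. score every remaining node against the query
        ToyNodeWithScore(n, cosine(query_vec, embed(n.text))) for n in candidates
    ]
    scored.sort(key=lambda s: s.score, reverse=True)  # 4. rank by score
    return scored[:top_k]  # 5. keep at most top_k results


corpus = [
    ToyNode("neural networks learn by adjusting weights", {"file_name": "a.pdf"}),
    ToyNode("the learning rate controls weight updates", {"file_name": "b.pdf"}),
    ToyNode("bananas are rich in potassium", {"file_name": "c.pdf"}),
]

results = toy_retrieve("how do neural networks learn", corpus, top_k=2)
for r in results:
    print(f"{r.score:.3f}  {r.node.text}")
```

Notice that nothing in toy_retrieve generates text: it only scores, sorts, and slices, which is the point of separating retrieval from generation.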
Use retrieve() when you need to inspect what data the index found before an LLM processes it, or when you want to build custom logic on top of retrieval without full RAG synthesis.
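As one example of custom logic on top of retrieval, here is a minimal sketch that turns retrieved nodes into a de-duplicated list of source citations, with no LLM involved. StubNode and StubNodeWithScore are hypothetical stand-ins that mimic only the attributes used in this lesson (.score, .node.metadata, .node.get_content()); unique_sources is an illustrative helper, not part of any library, and the fourth result is made up to show de-duplication.

```python
from dataclasses import dataclass, field


@dataclass
class StubNode:
    """Hypothetical stand-in exposing get_content() and metadata."""
    text: str
    metadata: dict = field(default_factory=dict)

    def get_content(self) -> str:
        return self.text


@dataclass
class StubNodeWithScore:
    """Hypothetical stand-in for a scored retrieval result."""
    node: StubNode
    score: float


def unique_sources(results, min_score=0.0):
    """Return (file_name, page_label) pairs in rank order, skipping duplicates."""
    seen, sources = set(), []
    for r in results:
        if r.score < min_score:
            continue  # ignore weak matches
        key = (r.node.metadata.get("file_name"), r.node.metadata.get("page_label"))
        if key not in seen:
            seen.add(key)
            sources.append(key)
    return sources


# Hand-built results standing in for what retriever.retrieve(query) returns.
results = [
    StubNodeWithScore(StubNode("Backpropagation...", {"file_name": "neural_basics.pdf", "page_label": "12"}), 0.845),
    StubNodeWithScore(StubNode("Gradients...", {"file_name": "optimization.pdf", "page_label": "5"}), 0.812),
    StubNodeWithScore(StubNode("Learning rate...", {"file_name": "hyperparameters.pdf", "page_label": "8"}), 0.778),
    StubNodeWithScore(StubNode("Forward pass...", {"file_name": "neural_basics.pdf", "page_label": "12"}), 0.700),
]

for file_name, page in unique_sources(results):
    print(f"{file_name}, p. {page}")
```

With a real retriever you would pass its results to unique_sources() in place of the hand-built list.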
Analogy
retrieve() is like asking a librarian for all books matching your topic: she hands you the actual books ranked by relevance. query_engine.query() is like asking her to read those books and write you a summary. retrieve() gives you the raw materials; query() gives you the finished product.
Code
import os

from llama_index.core import VectorStoreIndex, SimpleDirectoryReader, Settings
from llama_index.embeddings.openai import OpenAIEmbedding
from llama_index.llms.openai import OpenAI

# Placeholder key; in a real project, set OPENAI_API_KEY in your environment.
os.environ['OPENAI_API_KEY'] = 'sk-your-key-here'
# The LLM is configured here, but retrieve() never calls it.
Settings.llm = OpenAI(model='gpt-4')
# The embedding model embeds both the documents and the query.
Settings.embed_model = OpenAIEmbedding(model='text-embedding-3-small')

# Load files from ./data and build a vector index over them.
documents = SimpleDirectoryReader('data').load_data()
index = VectorStoreIndex.from_documents(documents)

# Ask for at most the 3 most similar nodes.
retriever = index.as_retriever(similarity_top_k=3)
query = 'how do neural networks learn?'
nodes = retriever.retrieve(query)

print(f'Retrieved {len(nodes)} nodes:')
for i, node in enumerate(nodes):
    # Each item is a NodeWithScore: a score plus the wrapped node.
    print(f'\nNode {i+1} (score: {node.score:.3f})')
    print(f'Text: {node.node.get_content()[:100]}...')
    print(f'Metadata: {node.node.metadata}')

Output
Retrieved 3 nodes:
Node 1 (score: 0.845)
Text: Neural networks learn through a process called backpropagation, which adjusts weights based on prediction error. During forward pass...
Metadata: {'file_name': 'neural_basics.pdf', 'page_label': '12'}
Node 2 (score: 0.812)
Text: Learning involves computing the gradient of the loss function with respect to each weight, then using optimization algorithms like SGD or Adam...
Metadata: {'file_name': 'optimization.pdf', 'page_label': '5'}
Node 3 (score: 0.778)
Text: The learning rate controls how large each weight update is. Too high and the network overshoots the minimum; too low and training stalls...
Metadata: {'file_name': 'hyperparameters.pdf', 'page_label': '8'}
What just happened?
The code created a VectorStoreIndex from sample documents, converted it to a retriever, then called retrieve() with a query string. The retriever embedded the query using the configured embedding model, searched the vector index for the 3 most similar nodes, and returned them as a list of NodeWithScore objects, each with a similarity score and the original node text and metadata. No LLM was invoked: this is pure retrieval.
Common gotcha
Developers often assume that retrieve() returns the final answer, or that a high score guarantees relevance. Neither is true: retrieve() returns raw nodes ranked by embedding similarity, which can be noisy or miss semantic nuance. A node with score 0.85 might still contain irrelevant text if your documents are poorly chunked. Always inspect the actual retrieved text, not just the scores.
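One way to make "inspect the text" a habit is a crude sanity check that flags results whose text shares few words with the query, so you know which ones to read first. This is a rough sketch: keyword_overlap and flag_for_review are hypothetical helpers, the stopword list is arbitrary, and the (score, text) pairs are shortened from the sample output above. A flag means "read this one", not "discard it".

```python
STOPWORDS = frozenset({"how", "do", "does", "the", "a", "an", "of", "is", "to", "in", "and"})


def _terms(text: str) -> set:
    # Lowercase and strip basic punctuation; a deliberately crude tokenizer.
    return {w.strip("?.,!;:").lower() for w in text.split()} - {""}


def keyword_overlap(query: str, text: str) -> float:
    """Fraction of non-stopword query terms that appear verbatim in text."""
    query_terms = _terms(query) - STOPWORDS
    if not query_terms:
        return 0.0
    return len(query_terms & _terms(text)) / len(query_terms)


def flag_for_review(query, scored_texts, min_overlap=0.5):
    """Return the (score, text) pairs whose keyword overlap is below min_overlap."""
    return [(s, t) for s, t in scored_texts if keyword_overlap(query, t) < min_overlap]


query = "how do neural networks learn?"
retrieved = [
    (0.845, "Neural networks learn through a process called backpropagation."),
    (0.778, "The learning rate controls how large each weight update is."),
]

for score, text in flag_for_review(query, retrieved):
    print(f"check manually (score {score:.3f}): {text}")
```

Here the second chunk is flagged even though it is arguably on topic, which shows why neither scores nor keyword checks replace reading the retrieved text yourself.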
Error recovery
IndexNotInitializedException
EmbeddingError
AttributeError: 'NodeWithScore' object has no attribute 'text'
similarity_top_k returns fewer than requested nodes
Experienced dev note
retrieve() is your debugging superpower. When an LLM gives a wrong answer in production, your first instinct should be to check what retrieve() returned: in practice, a large share of RAG failures are retrieval failures, not generation failures. A senior developer always inspects nodes before blaming the LLM. Also, a vector retriever calls the embedding model on every retrieve() call, so if you're calling it in a loop for the same query, cache the results (or put a caching layer in front of the retriever) to avoid unnecessary API costs.
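A minimal sketch of the caching idea, using the standard library's functools.lru_cache. fake_retrieve is a hypothetical stand-in for retriever.retrieve() that simply counts how often it runs; in real code you would call your retriever inside cached_retrieve instead.

```python
from functools import lru_cache

embed_calls = 0  # counts simulated embedding/API calls


def fake_retrieve(query: str) -> tuple:
    """Hypothetical stand-in for retriever.retrieve(); one 'API call' per run."""
    global embed_calls
    embed_calls += 1
    return (f"node for: {query}",)


@lru_cache(maxsize=256)
def cached_retrieve(query: str) -> tuple:
    # Return a tuple so callers cannot mutate the cached value in place.
    return fake_retrieve(query)


for _ in range(5):
    cached_retrieve("how do neural networks learn?")
cached_retrieve("what is a learning rate?")

print(embed_calls)  # prints 2: one call per distinct query
```

The cache is keyed on the exact query string, and it goes stale if the index changes, so call cached_retrieve.cache_clear() after re-indexing.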
Check your understanding
If you call retrieve() with similarity_top_k=5 and it returns only 3 nodes with high scores, then you query the same index with a different query and get 5 nodes back, what does this tell you about your document chunks and why might this happen?
Answer hint
A correct answer recognizes that similarity_top_k is an upper bound, not a guarantee: for a fixed index and configuration, retrieve() deterministically returns up to k of the most similar nodes. If the count varies between queries, it means either (a) the second query matched more nodes than the first, as can happen with keyword-based retrieval or a similarity cutoff, or (b) your filters removed some results in the first query but not the second. It's not a bug: it's how similarity search works. A complete answer also identifies that chunking strategy affects overlap and retrieval density.
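To see how the count can change from one query to the next, here is a toy sketch with a score cutoff applied before the top-k slice. The function top_k_above_cutoff, the cutoff value, and both score lists are made up for illustration; they are not output from a real index.

```python
def top_k_above_cutoff(scores, k, cutoff):
    """Return up to k scores, highest first, dropping any below cutoff."""
    kept = sorted((s for s in scores if s >= cutoff), reverse=True)
    return kept[:k]


# Made-up similarity scores for two different queries against the same index.
scores_query_a = [0.84, 0.81, 0.78, 0.52, 0.47, 0.31]
scores_query_b = [0.88, 0.86, 0.83, 0.80, 0.76, 0.74]

print(len(top_k_above_cutoff(scores_query_a, k=5, cutoff=0.7)))  # 3
print(len(top_k_above_cutoff(scores_query_b, k=5, cutoff=0.7)))  # 5
```

Both calls ask for 5 results, but only the second query has 5 candidates above the cutoff, so k acts as a ceiling rather than a promise.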