Empty index error: no documents loaded
Why this matters
A common mistake when starting with LlamaIndex is creating an index without documents, then spending 30 minutes debugging why queries return nothing. Understanding this upfront saves frustration and teaches the correct initialization pattern.
Explanation
An empty index is a VectorStoreIndex created with zero documents. When you call VectorStoreIndex.from_documents([]) or create an index without ever adding documents to it, the index has no embeddings to search against. Mechanically, the index is initialized, but its vector store contains no vectors. When you query it, the retriever finds zero context chunks, and what happens next depends on the engine. The response synthesizer can return LlamaIndex's placeholder text "Empty Response" without calling the LLM at all. In setups that still send the empty context to the LLM, the model either answers from its training data alone or states that it has no relevant context. In every case, nothing from your documents reaches the answer. Always load documents into your index before querying it. The pattern is: (1) read documents, (2) create the index from those documents, (3) query the index. Skipping step 1, or building the index without the documents from step 1, results in an empty index.
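The three steps can be sketched with a guard between steps 1 and 2, so that a load that returns nothing fails loudly instead of producing an empty index. This is a minimal sketch, not a LlamaIndex API. build_index_or_fail is a hypothetical helper. The loader and the index builder are passed in as callables, so you can try the guard without documents on disk or an API key.

```python
from typing import Any, Callable, Sequence


def build_index_or_fail(
    load_documents: Callable[[], Sequence[Any]],
    build_index: Callable[[Sequence[Any]], Any],
) -> Any:
    """Hypothetical helper: run steps 1 and 2, refusing to build from zero documents."""
    docs = load_documents()  # step 1: read documents
    if len(docs) == 0:
        # Fail loudly here; from_documents([]) would succeed and return an empty index.
        raise ValueError("No documents loaded; refusing to build an empty index.")
    return build_index(docs)  # step 2: create the index from those documents
```

With LlamaIndex, the call might look like index = build_index_or_fail(lambda: SimpleDirectoryReader("./data").load_data(), VectorStoreIndex.from_documents), where "./data" is a placeholder path. Step 3 is then index.as_query_engine().query(...).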
Analogy
An empty index is like opening a library that has shelves built but no books on them. The search system works perfectly: it just has nothing to search through.
Code
from llama_index.core import VectorStoreIndex, SimpleDirectoryReader, Document, Settings
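from llama_index.llms.openai import OpenAI  # LlamaIndex LLM wrapper (pip install llama-index-llms-openai), not the openai SDK client
# Needs OPENAI_API_KEY; the default embedding model used by VectorStoreIndex also calls OpenAI
Settings.llm = OpenAI(model='gpt-4.1')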
print("\n--- Case 1: Empty index (wrong) ---")
empty_index = VectorStoreIndex.from_documents([])
print(f"Index node count: {len(empty_index.docstore.docs)}")
query_engine = empty_index.as_query_engine()
response = query_engine.query("What is machine learning?")
print(f"Response: {response}")
print(f"Source nodes found: {len(response.source_nodes)}")
print("\n--- Case 2: Index with documents (correct) ---")
docs = [
    Document(text="Machine learning is a subset of artificial intelligence that enables systems to learn from data."),
    Document(text="Deep learning uses neural networks with multiple layers to process complex patterns."),
]
proper_index = VectorStoreIndex.from_documents(docs)
print(f"Index node count: {len(proper_index.docstore.docs)}")
query_engine = proper_index.as_query_engine()
response = query_engine.query("What is machine learning?")
print(f"Response: {response}")
print(f"Source nodes found: {len(response.source_nodes)}")
Expected output (abridged; the Case 2 answer is generated by the LLM, so its wording will vary):
--- Case 1: Empty index (wrong) ---
Index node count: 0
Response: Empty Response
Source nodes found: 0
--- Case 2: Index with documents (correct) ---
Index node count: 2
Response: According to the provided context, machine learning is a subset of artificial intelligence that enables systems to learn from data.
Source nodes found: 2
What just happened?
The code created two indexes: one from an empty documents list and one from two actual documents. The empty index's docstore has zero nodes, so the retriever returns zero source nodes. With nothing to synthesize from, the query engine skips the LLM call and returns LlamaIndex's placeholder text "Empty Response". The proper index has 2 nodes. Because the default similarity_top_k is 2, the retriever returns both, the LLM answers using them as context, and both appear in source_nodes. The difference is immediately visible in the node counts and the source_nodes lists.
Common gotcha
The empty index doesn't crash: it succeeds silently. You can call .as_query_engine().query() on an empty index and get a response back instead of an exception. With the default query engine, that response is just the placeholder text "Empty Response", which is easy to overlook once it is passed through to a UI or a log. With a context chat engine, or any other pipeline that still sends the empty context to the LLM, the answer can look plausible because the model is drawing on its training data. Either way, you won't realize the index is empty until you check len(response.source_nodes) and see 0, or run len(index.docstore.docs) and discover the documents were never loaded.
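One way to turn this silent failure into a loud one is to reject any response that arrived without retrieved context. The sketch assumes a response object that exposes a source_nodes list, as LlamaIndex's query-engine responses do. require_grounded and UngroundedResponseError are hypothetical names, not part of LlamaIndex.

```python
from typing import Any


class UngroundedResponseError(RuntimeError):
    """Raised when an answer came back with no retrieved context behind it."""


def require_grounded(response: Any) -> Any:
    """Hypothetical guard: return the response only if retrieval found something."""
    # getattr with a default tolerates objects that have no source_nodes attribute
    source_nodes = getattr(response, "source_nodes", None) or []
    if len(source_nodes) == 0:
        raise UngroundedResponseError(
            "Query returned no source nodes; the index may be empty."
        )
    return response
```

In application code this would wrap each query, for example answer = require_grounded(query_engine.query("What is machine learning?")). A dedicated exception type lets callers tell "retrieval found nothing" apart from network or LLM errors.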
Error recovery
- Empty docstore (check node count)
- Query returns no source nodes
- File path not found
Experienced dev note
In production, always verify that your index is non-empty before deploying. A simple health check is assert len(index.docstore.docs) > 0, 'Index is empty: documents were not loaded'. In deployed code, prefer an explicit raise, because Python strips assert statements when run with the -O flag. This prevents silent failures where your RAG system appears to work but is actually answering without any retrieved knowledge. A response that looks correct but has zero source nodes is harder to debug than a crash.
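The health check above can be extended to cover the failure modes listed under Error recovery. This is a minimal sketch with hypothetical helper names. check_data_dir catches a missing or empty data folder before SimpleDirectoryReader runs. check_index_nonempty probes the built index through a retriever instead of counting index.docstore.docs. The probe is a deliberate design choice: when the vector store keeps node text itself, as many external vector databases do, VectorStoreIndex leaves the docstore empty by default. In that case, a docstore count can read 0 on an index that is actually populated.

```python
from pathlib import Path
from typing import Any


def check_data_dir(path: str) -> int:
    """Hypothetical pre-flight check: the data directory exists and holds loadable files."""
    root = Path(path)
    if not root.is_dir():
        raise FileNotFoundError(f"Data directory not found: {root.resolve()}")
    # Assumption: roughly mirrors SimpleDirectoryReader defaults
    # (top level only, hidden files skipped); adjust if you pass recursive=True.
    files = [p for p in root.iterdir() if p.is_file() and not p.name.startswith(".")]
    if not files:
        raise RuntimeError(f"Data directory has no loadable files: {root.resolve()}")
    return len(files)


def check_index_nonempty(retriever: Any, probe: str = "health check") -> int:
    """Hypothetical probe: an empty index retrieves nothing, whatever the query."""
    hits = retriever.retrieve(probe)
    if len(hits) == 0:
        # Explicit raise instead of assert, so the check survives `python -O`.
        raise RuntimeError("Index is empty: documents were not loaded")
    return len(hits)
```

At startup, this could run as check_data_dir("./data") before SimpleDirectoryReader("./data").load_data(), then as check_index_nonempty(index.as_retriever(similarity_top_k=1)) once the index is built. Here "./data" is a placeholder. The probe costs one query embedding per check.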
Check your understanding
You create an index from an empty list of documents and query it. The LLM returns a well-formatted, confident answer that even appears to cite sources. Why is this dangerous, and how would you detect it in code?
Answer hint
A correct answer explains that: (1) querying an empty index still produces a response instead of an error, and when the engine sends the empty context to the LLM, that response comes from the model's training data, not from RAG context; (2) you detect this by checking that response.source_nodes is non-empty, or by inspecting len(index.docstore.docs) before querying; (3) the danger is that users trust the response when it is actually a hallucination, not retrieved knowledge.
Older tutorials use from llama_index import VectorStoreIndex and GPTVectorStoreIndex. This item uses the modern 0.12.x API: from llama_index.core import VectorStoreIndex, with Settings for configuration.