Code Beginner easy · 3 min

Empty index error: no documents loaded

What you will learn

A VectorStoreIndex needs documents loaded before you query it. An empty index does not raise an error: queries simply come back with no retrieved context.

Why this matters

The most common mistake when starting with LlamaIndex is creating an index without documents, then spending 30 minutes debugging why queries return nothing. Understanding this upfront saves frustration and teaches the correct initialization pattern.

Skip if: you are using a pre-built index that was serialized and loaded from disk, since that index already contains documents, or you are building a chat application that doesn't use vector search (e.g., pure LLM conversation).

Explanation

An empty index is a VectorStoreIndex created with zero documents. When you call VectorStoreIndex.from_documents([]), or create an index and never add documents to it, the index is initialized but its vector store contains no vectors, so there are no embeddings to search against. When you query it, the retriever finds zero context chunks, so the LLM either generates a response from its training data alone or explicitly states it has no relevant context.

Always load documents into your index before querying it. The pattern is: (1) read documents, (2) create the index from those documents, (3) query the index. Skipping step 1, or passing an empty list in step 2, results in an empty index.
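The read, build, query pattern can be guarded with a small wrapper. The sketch below is a hypothetical helper (build_index_checked is not part of LlamaIndex) that refuses to build an index from an empty document list. The index_factory parameter is assumed to be a callable such as VectorStoreIndex.from_documents; the stand-in factory exists only so the example runs without an API key.

```python
from typing import Any, Callable, Sequence


def build_index_checked(
    documents: Sequence[Any],
    index_factory: Callable[[Sequence[Any]], Any],
) -> Any:
    """Build an index only if at least one document was loaded.

    `index_factory` is assumed to be something like
    VectorStoreIndex.from_documents; it is passed in so this
    sketch stays runnable without LlamaIndex installed.
    """
    if len(documents) == 0:
        raise ValueError("No documents loaded: refusing to build an empty index")
    return index_factory(documents)


# Stand-in factory for demonstration only (hypothetical, not a LlamaIndex API).
def fake_factory(docs: Sequence[Any]) -> dict:
    return {"node_count": len(docs)}


index = build_index_checked(["doc one", "doc two"], fake_factory)
print(index["node_count"])

try:
    build_index_checked([], fake_factory)
except ValueError as exc:
    print(f"Caught: {exc}")
```

In a real project you would pass VectorStoreIndex.from_documents as the factory, so the failure surfaces at build time instead of at query time.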

Analogy

An empty index is like opening a library that has shelves built but no books on them. The search system works perfectly: it just has nothing to search through.

Code

Illustrative only - not runnable without a valid API key

python

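from llama_index.core import Document, Settings, VectorStoreIndex
from llama_index.llms.openai import OpenAI  # requires the llama-index-llms-openai package

# Settings.llm expects a LlamaIndex LLM wrapper, not the raw openai client
Settings.llm = OpenAI(model='gpt-4.1')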

print("\n--- Case 1: Empty index (wrong) ---")
empty_index = VectorStoreIndex.from_documents([])
print(f"Index node count: {len(empty_index.docstore.docs)}")
query_engine = empty_index.as_query_engine()
response = query_engine.query("What is machine learning?")
print(f"Response: {response}")
print(f"Source nodes found: {len(response.source_nodes)}")

print("\n--- Case 2: Index with documents (correct) ---")
docs = [
    Document(text="Machine learning is a subset of artificial intelligence that enables systems to learn from data."),
    Document(text="Deep learning uses neural networks with multiple layers to process complex patterns.")
]
proper_index = VectorStoreIndex.from_documents(docs)
print(f"Index node count: {len(proper_index.docstore.docs)}")
query_engine = proper_index.as_query_engine()
response = query_engine.query("What is machine learning?")
print(f"Response: {response}")
print(f"Source nodes found: {len(response.source_nodes)}")

Output

--- Case 1: Empty index (wrong) ---
Index node count: 0
Response: I don't have specific information about machine learning in the provided context. However, I can share that machine learning is a field of artificial intelligence that focuses on the development of algorithms and statistical models that enable computers to improve their performance on tasks through experience without being explicitly programmed.
Source nodes found: 0

--- Case 2: Index with documents (correct) ---
Index node count: 2
Response: According to the provided context, machine learning is a subset of artificial intelligence that enables systems to learn from data.
Source nodes found: 2

What just happened?

The code created two indexes: one with an empty documents list and one with two actual documents. The empty index's docstore has zero nodes, so when queried, it finds zero source nodes and the LLM generates a response from general knowledge alone. The proper index has 2 nodes, finds relevant context when queried, and returns that context in source_nodes. The difference is immediate and visible in the node counts and source_node lists.

Common gotcha

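The empty index doesn't crash: it succeeds silently. You can call .as_query_engine().query() on an empty index and get a response back. The response can look plausible because the LLM is answering from its training data rather than from your documents. (Depending on the llama-index version and response synthesizer, the query engine may instead return a literal placeholder such as "Empty Response" without calling the LLM.) Either way, you won't realize the index is empty until you check len(response.source_nodes) and see 0, or run len(index.docstore.docs) and find that the index never loaded your documents.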
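A minimal sketch of that check follows. The helper name is_grounded is hypothetical, and the response objects are stand-ins built with SimpleNamespace so the example runs offline; the only assumption about a real response is that it exposes a source_nodes list, as the responses in this lesson do.

```python
from types import SimpleNamespace
from typing import Any


def is_grounded(response: Any) -> bool:
    """Return True if the response cites at least one retrieved node.

    A missing or empty `source_nodes` attribute counts as ungrounded.
    """
    source_nodes = getattr(response, "source_nodes", None) or []
    return len(source_nodes) > 0


# Stand-in response objects (hypothetical) so the sketch runs without an API key.
empty_response = SimpleNamespace(response="Plausible text", source_nodes=[])
grounded_response = SimpleNamespace(response="From context", source_nodes=["node-1"])

print(is_grounded(empty_response))
print(is_grounded(grounded_response))
```

Calling a check like this on every response lets you log or reject answers that were produced without any retrieved context.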

Error recovery

Empty docstore (check node count)

Run print(len(index.docstore.docs)). If it's 0, you never loaded documents. Call docs = SimpleDirectoryReader('./data').load_data() first, then index = VectorStoreIndex.from_documents(docs).

Query returns no source nodes

Check if len(response.source_nodes) == 0. This means the index is empty or the query didn't match any documents. Verify documents were loaded by inspecting index.docstore.docs.

File path not found

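If SimpleDirectoryReader finds no files, you have nothing to index. Depending on the version, it may raise an error (for example a ValueError for a missing or empty directory) rather than return an empty list. Ensure the directory path is correct and contains .txt, .pdf, or other supported files. To debug, resolve the path to an absolute one and check that it exists: from pathlib import Path; p = Path('./data').resolve(); print(p, p.exists()).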
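The path check can be extended into a small pre-flight helper. This is a sketch using only the standard library; summarize_data_dir is a hypothetical name, and it counts top-level regular files without checking whether a reader actually supports their types.

```python
import tempfile
from pathlib import Path


def summarize_data_dir(path: str) -> dict:
    """Report the resolved path, whether it is a directory, and its top-level file count."""
    resolved = Path(path).expanduser().resolve()
    if not resolved.is_dir():
        return {"path": str(resolved), "is_dir": False, "file_count": 0}
    # Count only top-level regular files; subdirectories are not descended into.
    file_count = sum(1 for entry in resolved.iterdir() if entry.is_file())
    return {"path": str(resolved), "is_dir": True, "file_count": file_count}


# Demonstration against a throwaway directory so the sketch runs anywhere.
with tempfile.TemporaryDirectory() as tmp:
    print(summarize_data_dir(tmp)["file_count"])
    (Path(tmp) / "notes.txt").write_text("Machine learning learns from data.")
    print(summarize_data_dir(tmp)["file_count"])
```

Printing the resolved path also shows which working directory a relative path such as './data' was interpreted against.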

Experienced dev note

In production, always assert that your index is non-empty before deploying. Add a simple health check: assert len(index.docstore.docs) > 0, 'Index is empty: documents were not loaded'. This prevents silent failures where your RAG system appears to work but is actually generating hallucinations from an empty knowledge base. A response that looks correct but has zero source nodes is harder to debug than a crash.
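One way to package that health check is sketched below. EmptyIndexError and require_non_empty_index are hypothetical names, and the index objects are stand-ins so the example runs offline; the only assumption about a real index is that index.docstore.docs is a mapping of stored nodes, as used earlier in this lesson.

```python
from types import SimpleNamespace
from typing import Any


class EmptyIndexError(RuntimeError):
    """Raised when an index holds fewer nodes than required (hypothetical name)."""


def require_non_empty_index(index: Any, min_nodes: int = 1) -> int:
    """Return the node count, or raise if the index is below `min_nodes`."""
    node_count = len(index.docstore.docs)
    if node_count < min_nodes:
        raise EmptyIndexError(
            f"Index has {node_count} node(s), expected at least {min_nodes}: "
            "documents were not loaded"
        )
    return node_count


# Stand-in index objects (hypothetical) so the check can be exercised offline.
empty_index = SimpleNamespace(docstore=SimpleNamespace(docs={}))
loaded_index = SimpleNamespace(
    docstore=SimpleNamespace(docs={"id-1": "node", "id-2": "node"})
)

print(require_non_empty_index(loaded_index))

try:
    require_non_empty_index(empty_index)
except EmptyIndexError as exc:
    print(f"Health check failed: {exc}")
```

An explicit raise is used here instead of assert because Python strips assert statements when run with the -O flag, which would silently disable the check.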

Check your understanding

You create an index from an empty list of documents and query it. The LLM returns a well-formatted, confident answer. Why is this dangerous, and how would you detect it in code?

Answer hint

A correct answer explains that: (1) An empty index will still generate responses because the LLM uses its training data, not RAG context; (2) You detect this by checking that response.source_nodes is non-empty or by inspecting len(index.docstore.docs) before querying; (3) The danger is that users trust the response when it's actually a hallucination, not retrieved knowledge.

VERSION Before 0.10.0, imports came from the top-level package: from llama_index import VectorStoreIndex (older releases also exposed it as GPTVectorStoreIndex). This item uses the 0.12.x API: from llama_index.core import VectorStoreIndex, with Settings for configuration.

Next, learn how to load documents from files using SimpleDirectoryReader and understand what document splitting does to an index.
