Code Intermediate medium · 8 min

Pipeline persistence and reuse

What you will learn

Save and reload your entire index pipeline (documents, embeddings, and retrievers) as a single serializable artifact, so you avoid recomputing embeddings and rebuilding indexes.

Why this matters

In production, re-embedding 10,000 documents every time your app restarts costs money, time, and API quota. Persistence lets you build once and serve many times. For a deployed RAG system with a stable corpus, this is effectively mandatory.

Skip if: your document corpus changes frequently (for example, daily refreshes); use event-driven rebuilds instead. Also skip persistence during rapid prototyping, where rebuild speed doesn't matter and iterating on code is faster than save/load cycles.

Explanation

Pipeline persistence means serializing your index (the vector store, document store, and index metadata) to disk or cloud storage, then deserializing it later without re-embedding. llama-index achieves this through StorageContext, which manages where and how your index state is stored. When you persist, you save the document nodes, their computed embeddings, and the index metadata; when you load, you reconstruct that state without re-embedding any documents (queries still call the embedding model to embed the query text). The persist_dir parameter is your entry point: build the index, call index.storage_context.persist(persist_dir=...), then on restart create a context with StorageContext.from_defaults(persist_dir=...) and pass it to load_index_from_storage(). Under the hood, the default in-memory stores are written to that directory as JSON files (such as docstore.json, index_store.json, and a vector store file). Loading reads them back into memory and reconstructs the searchable index, which is fast compared with re-embedding.
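
The build-once, load-later flow described above can be sketched without any llama-index dependency. This is a toy illustration, not llama-index's implementation: fake_embed, build_index, and load_or_build are hypothetical names, and the "index" is just a JSON file of vectors. The counter shows that the second call never touches the embedding function.

```python
import json
import tempfile
from pathlib import Path

EMBED_CALLS = 0  # counts how often the pretend embedding model is invoked


def fake_embed(text: str) -> list[float]:
    """Stand-in for a paid embedding API call (hypothetical, deterministic)."""
    global EMBED_CALLS
    EMBED_CALLS += 1
    return [float(len(text)), float(sum(map(ord, text)) % 97)]


def build_index(docs: list[str]) -> dict:
    """Embed every document; this is the expensive step persistence avoids."""
    return {"docs": docs, "vectors": [fake_embed(d) for d in docs]}


def load_or_build(persist_dir: Path, docs: list[str]) -> dict:
    """Load the saved index if present, otherwise build it and save it."""
    index_file = persist_dir / "index.json"
    if index_file.exists():
        return json.loads(index_file.read_text())
    index = build_index(docs)
    persist_dir.mkdir(parents=True, exist_ok=True)
    index_file.write_text(json.dumps(index))
    return index


docs = ["alpha", "beta", "gamma"]
with tempfile.TemporaryDirectory() as tmp:
    store = Path(tmp) / "index_storage"
    first = load_or_build(store, docs)   # builds: one embed call per document
    calls_after_build = EMBED_CALLS
    second = load_or_build(store, docs)  # loads: no further embed calls
    calls_after_load = EMBED_CALLS
```

The design point is that the existence check happens before any embedding work, so a restart with intact storage pays only the cost of reading files.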

Analogy

Think of persistence like taking a snapshot of your entire database. Instead of replaying every INSERT statement every time you restart (re-embedding), you just load the snapshot. Your RAG app becomes a photo album viewer instead of a photo developer.

Code

Illustrative only - not runnable without a valid API key

python

import os
from llama_index.core import (
    Settings,
    SimpleDirectoryReader,
    StorageContext,
    VectorStoreIndex,
    load_index_from_storage,
)
from llama_index.embeddings.openai import OpenAIEmbedding
from llama_index.llms.openai import OpenAI

# Placeholder only; in real code, set OPENAI_API_KEY in your environment instead.
os.environ["OPENAI_API_KEY"] = "sk-your-key-here"

Settings.llm = OpenAI(model="gpt-4.1")
Settings.embed_model = OpenAIEmbedding(model="text-embedding-3-small")

persist_dir = "./index_storage"

if not os.path.exists(persist_dir):
    # First run: read documents, embed them, and save the index to disk.
    print("Building index for the first time...")
    documents = SimpleDirectoryReader(input_dir="./documents").load_data()
    index = VectorStoreIndex.from_documents(documents)
    index.storage_context.persist(persist_dir=persist_dir)
    print(f"Index persisted to {persist_dir}")
else:
    # Later runs: reconstruct the index from the saved files, with no re-embedding.
    print("Loading persisted index...")
    storage_context = StorageContext.from_defaults(persist_dir=persist_dir)
    index = load_index_from_storage(storage_context)
    print(f"Index loaded from {persist_dir}")

query_engine = index.as_query_engine()
response = query_engine.query("What are the key topics?")
print(f"Query response: {response}")

Output

Building index for the first time...
Index persisted to ./index_storage
Query response: Based on the documents provided, the key topics include...

What just happened?

The code checked whether a persisted index already exists. On the first run, it loaded documents, built the index with embeddings, and saved everything to disk using persist(). On subsequent runs, it skips all of that and loads the pre-built index from disk via StorageContext.from_defaults(), then queries without re-embedding any documents (only the query text itself is embedded). The query engine works identically in both cases: it has no way to know whether the index was just built or loaded from disk.

Common gotcha

Developers often forget to call .persist() after building the index and then wonder why a fresh run still re-embeds everything. Also: if you change your embedding model (e.g., switch from text-embedding-3-small to text-embedding-3-large), your persisted embeddings become stale and mismatched. Always rebuild the index when your embedding model changes, otherwise your vector search will compare apples to oranges.
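
One way to catch a stale index early is to record the embedding model name next to the persisted files and compare it on load. This is a minimal sketch under assumptions: the sidecar file name embed_model.json and the helpers write_model_marker and check_model_marker are hypothetical and not part of llama-index.

```python
import json
import tempfile
from pathlib import Path

MARKER_NAME = "embed_model.json"  # hypothetical sidecar file, not a llama-index file


def write_model_marker(persist_dir: Path, model_name: str) -> None:
    """Record which embedding model produced the vectors in persist_dir."""
    persist_dir.mkdir(parents=True, exist_ok=True)
    (persist_dir / MARKER_NAME).write_text(json.dumps({"embed_model": model_name}))


def check_model_marker(persist_dir: Path, model_name: str) -> bool:
    """Return True only if the persisted index was built with model_name."""
    marker = persist_dir / MARKER_NAME
    if not marker.exists():
        return False  # unknown provenance: treat as stale and rebuild
    return json.loads(marker.read_text()).get("embed_model") == model_name


with tempfile.TemporaryDirectory() as tmp:
    store = Path(tmp) / "index_storage"
    write_model_marker(store, "text-embedding-3-small")
    same_model_ok = check_model_marker(store, "text-embedding-3-small")
    changed_model_ok = check_model_marker(store, "text-embedding-3-large")
    missing_marker_ok = check_model_marker(Path(tmp) / "other", "text-embedding-3-small")
```

In use, you would write the marker right after persisting and rebuild whenever the check returns False.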

Error recovery

FileNotFoundError when loading

Your persist_dir doesn't exist or the path is wrong. Check the exact directory path and make sure the index was persisted first with index.storage_context.persist(persist_dir=...).

Deserialization mismatch error

Your llama-index version changed between persist and load, or the storage format is corrupted. Delete the persist_dir and rebuild from scratch.

Index initialized but no documents found

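You're loading the storage context, but its document store is empty. With the default local stores, document nodes are persisted along with the index, so this usually points to a wrong persist_dir; it can also happen in setups where an external vector store holds the node text and the docstore is left empty. If you need access to the raw documents, load them again with SimpleDirectoryReader alongside the loaded storage context.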
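
The recovery advice above (rebuild when the persisted state is missing or unreadable) can be wrapped in a small helper. This is a sketch, assuming the caller supplies two callables: load_fn and build_fn are hypothetical placeholders for whatever loads and rebuilds your index. Which exceptions a real loader raises depends on the library and its version, so the except clauses here are assumptions to adjust.

```python
import shutil
import tempfile
from pathlib import Path
from typing import Any, Callable


def load_with_fallback(
    persist_dir: Path,
    load_fn: Callable[[Path], Any],
    build_fn: Callable[[Path], Any],
) -> tuple[Any, str]:
    """Try to load from persist_dir; on failure, rebuild.

    Returns the index and a label saying which path was taken.
    """
    try:
        return load_fn(persist_dir), "loaded"
    except FileNotFoundError:
        reason = "rebuilt (nothing persisted)"
    except (ValueError, KeyError):
        # Assumed to signal corrupted or incompatible persisted data.
        reason = "rebuilt (unreadable data)"
        shutil.rmtree(persist_dir, ignore_errors=True)  # discard the bad state
    return build_fn(persist_dir), reason


def demo_load(path: Path) -> str:
    """Toy loader: fails if the file is missing or has an unknown format."""
    text = (path / "index.txt").read_text()  # raises FileNotFoundError if absent
    if not text.startswith("v1:"):
        raise ValueError("unrecognized format")
    return text


def demo_build(path: Path) -> str:
    """Toy builder: writes a fresh, valid index file."""
    path.mkdir(parents=True, exist_ok=True)
    (path / "index.txt").write_text("v1:index")
    return "v1:index"


with tempfile.TemporaryDirectory() as tmp:
    store = Path(tmp) / "index_storage"
    _, first_run = load_with_fallback(store, demo_load, demo_build)
    _, second_run = load_with_fallback(store, demo_load, demo_build)
    (store / "index.txt").write_text("garbage")  # simulate corruption
    _, corrupted_run = load_with_fallback(store, demo_load, demo_build)
```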

Experienced dev note

In production, don't persist to local disk; persist to cloud storage (S3, GCS, Azure Blob) using a custom storage context or llama-index's cloud integrations. Local disk persistence is a liability in containerized or serverless environments where the filesystem is ephemeral. Also, version your persist_dir by including the embedding model name or a hash in the path, e.g., persist_dir = f"./index_{EMBEDDING_MODEL}_{INDEX_VERSION}". This saves you from subtle bugs where old and new embeddings get mixed in the same index.
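
The versioned-directory idea can be sketched as a small helper. The function name versioned_persist_dir, the sanitizing rule, and the 8-character hash suffix are illustrative assumptions rather than a llama-index convention.

```python
import hashlib
import re


def versioned_persist_dir(base: str, embed_model: str, index_version: str) -> str:
    """Build a directory name that changes whenever the model or version does."""
    # Replace characters that are awkward in paths with underscores.
    safe_model = re.sub(r"[^A-Za-z0-9._-]", "_", embed_model)
    # A short hash keeps names distinct even if sanitizing makes two models collide.
    digest = hashlib.sha256(f"{embed_model}|{index_version}".encode()).hexdigest()[:8]
    return f"{base}/index_{safe_model}_{index_version}_{digest}"


small_dir = versioned_persist_dir("./storage", "text-embedding-3-small", "v1")
large_dir = versioned_persist_dir("./storage", "text-embedding-3-large", "v1")
```

Because each model and version maps to its own directory, switching models triggers a fresh build instead of loading vectors from the wrong embedding space.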

Check your understanding

If you persist an index built with text-embedding-3-small, then later load it and query using text-embedding-3-large, what happens to the query results and why?

Answer hint

A correct answer explains that the query embedding comes from a different model (text-embedding-3-large) than the persisted embeddings (text-embedding-3-small), so vector similarities become meaningless because the vectors live in incompatible embedding spaces. At best the results are essentially random; because these two models also produce vectors of different default sizes (1536 vs. 3072 dimensions), the query may instead fail with a dimension mismatch. Either way, this shows understanding that persistence ties you to a specific embedding model.

VERSION StorageContext API is stable in llama-index-core 0.12.x. In versions < 0.10.0, use ServiceContext instead of Settings; this code requires 0.10.0 or later.

Learn how to build custom storage backends to persist indexes to your own database or cloud storage instead of the default local filesystem.
