What an ingestion pipeline solves
Why this matters
Without a pipeline, you chain document loading → text splitting → embedding → storage by hand. That glue code tends to be brittle and hard to reproduce, and swapping a component (e.g., a different embedding model or vector store) means rewriting it. Pipelines address this 'integration nightmare' of RAG systems by defining the whole chain in one declarative place.
Explanation
What it is: An ingestion pipeline in LlamaIndex (IngestionPipeline) is a declarative workflow that takes loaded documents and turns them into retrieval-ready nodes. It applies parsing/chunking and embedding transformations and, if you give it a vector store, inserts the results. It's the ordered chain of steps between "files on disk" and "queryable index".
How it works mechanically: You load documents first (e.g., with SimpleDirectoryReader), then give IngestionPipeline an ordered list of transformations (e.g., SentenceSplitter → OpenAIEmbedding) and, optionally, a vector store such as PineconeVectorStore. Each transformation takes a list of nodes and returns a list of nodes, which becomes the input to the next one. When you call pipeline.run(), it applies the transformations in that order. The key insight: the pipeline itself is data-agnostic and reusable. You write it once, run it on different document sets, or swap out components without touching the orchestration logic.
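To make the "list of nodes in, list of nodes out" contract concrete, here is a dependency-free toy model. ToyNode, ToyPipeline, split_on_periods, and length_embedding are hypothetical names invented for illustration, not LlamaIndex APIs. In LlamaIndex itself, custom transformations are typically written by subclassing TransformComponent and implementing __call__.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class ToyNode:
    """Hypothetical stand-in for a LlamaIndex node: text plus an optional embedding."""
    text: str
    embedding: List[float] = field(default_factory=list)


# A transformation is any callable that maps a list of nodes to a list of nodes.
Transform = Callable[[List[ToyNode]], List[ToyNode]]


class ToyPipeline:
    """Hypothetical, dependency-free model of an ordered transformation chain."""

    def __init__(self, transformations: List[Transform]) -> None:
        self.transformations = transformations

    def run(self, nodes: List[ToyNode]) -> List[ToyNode]:
        # Apply each transformation in list order, feeding its output to the next.
        for transform in self.transformations:
            nodes = transform(nodes)
        return nodes


def split_on_periods(nodes: List[ToyNode]) -> List[ToyNode]:
    # Toy "splitter": one node per sentence ending in a period.
    out: List[ToyNode] = []
    for node in nodes:
        for sentence in node.text.split("."):
            if sentence.strip():
                out.append(ToyNode(text=sentence.strip() + "."))
    return out


def length_embedding(nodes: List[ToyNode]) -> List[ToyNode]:
    # Toy "embedder": a 2-dimensional vector of (character count, word count).
    for node in nodes:
        node.embedding = [float(len(node.text)), float(len(node.text.split()))]
    return nodes


pipeline = ToyPipeline([split_on_periods, length_embedding])
result = pipeline.run([ToyNode("Pipelines chain steps. Each step is swappable.")])
for node in result:
    print(node.text, node.embedding)
```

Swapping length_embedding for any other function with the same signature changes the output without touching ToyPipeline.run, which is the reuse property described above.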
When to use it: Use pipelines for any RAG system where documents may be re-indexed, where you need reproducibility, or where multiple team members need to ingest data consistently. For quick prototypes with static data, a manual chain is acceptable; for anything shipping to production or with frequent document updates, pipelines are non-negotiable.
Analogy
A factory assembly line. Raw materials (documents) enter, travel through stations (parser → chunker → embedder), and exit as finished products (indexed vectors). You design the line once, then feed it different raw materials without redesigning the stations.
Code
import os

from llama_index.core import Document
from llama_index.core.ingestion import IngestionPipeline
from llama_index.core.node_parser import SentenceSplitter
from llama_index.embeddings.openai import OpenAIEmbedding

# OpenAIEmbedding reads the key from the environment. A placeholder such as
# "sk-test-key" would fail at the first API call, so require a real key.
if not os.environ.get("OPENAI_API_KEY"):
    raise RuntimeError("Set OPENAI_API_KEY before running this example")

docs = [
    Document(text="LlamaIndex is a framework for building RAG applications."),
    Document(text="Ingestion pipelines automate the document-to-vector workflow."),
    Document(text="You can chain multiple nodes in a pipeline for complex workflows."),
]

# Transformations run in list order: split into chunks, then embed each chunk.
pipeline = IngestionPipeline(
    transformations=[
        SentenceSplitter(chunk_size=512, chunk_overlap=20),  # sizes are in tokens
        OpenAIEmbedding(model="text-embedding-3-small"),
    ]
)

# run() returns the transformed nodes; nothing is persisted without a vector_store.
nodes = pipeline.run(documents=docs)

print(f"Created {len(nodes)} nodes")
print(f"First node text: {nodes[0].get_content()[:80]}...")
print(f"First node embedding length: {len(nodes[0].embedding)}")
print(f"First node has metadata: {nodes[0].metadata is not None}")
Output
Created 3 nodes
First node text: LlamaIndex is a framework for building RAG applications....
First node embedding length: 1536
First node has metadata: True
What just happened?
The pipeline took 3 documents and produced 3 nodes. No splitting occurred because each document is far shorter than SentenceSplitter's chunk_size of 512 tokens (the limit is measured in tokens, not characters). It then embedded each node with OpenAI's text-embedding-3-small model, which produces 1536-dimensional vectors by default, and returned a list of nodes with embeddings attached. Each node kept the original text and metadata of its source document. The transformations ran in list order: SentenceSplitter first, then OpenAIEmbedding.
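Why do 3 short documents yield exactly 3 nodes? A rough sketch of sentence packing helps. This is a simplified greedy packer, not SentenceSplitter's actual algorithm: it approximates tokens with whitespace-separated words (an assumption; real splitters use a tokenizer), ignores chunk_overlap, and does not further split a single oversized sentence. pack_sentences is a hypothetical name.

```python
from typing import List


def pack_sentences(sentences: List[str], chunk_size: int) -> List[str]:
    """Greedily pack sentences into chunks of at most chunk_size 'tokens'.

    Simplifications (assumptions): one token = one whitespace-separated word,
    no overlap between chunks, and oversized sentences are not split further.
    """
    chunks: List[str] = []
    current: List[str] = []
    current_len = 0
    for sentence in sentences:
        n = len(sentence.split())
        # Start a new chunk when adding this sentence would exceed the budget.
        if current and current_len + n > chunk_size:
            chunks.append(" ".join(current))
            current, current_len = [], 0
        current.append(sentence)
        current_len += n
    if current:
        chunks.append(" ".join(current))
    return chunks


doc = ["LlamaIndex is a framework for building RAG applications."]
print(len(pack_sentences(doc, chunk_size=512)))  # one short sentence -> 1 chunk

long_doc = ["one two three."] * 5
print(pack_sentences(long_doc, chunk_size=6))  # 3-word sentences, 6-word budget
```

With a budget far larger than the document, everything lands in one chunk, so one document maps to one node.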
Common gotcha
Developers often assume that calling pipeline.run(documents=docs) always stores data in a vector store. It does so only if you passed a vector_store when constructing the IngestionPipeline; in the example above, it just returns nodes with embeddings and nothing is persisted. You then either build an index from those nodes (e.g., VectorStoreIndex(nodes)) or insert them into your vector store yourself. Without a vector store attached, the pipeline is the transformation layer, not the persistence layer. This distinction trips up people used to the older GPTVectorStoreIndex.from_documents() pattern, which did both.
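A toy model of the distinction: the function below always returns embeddings and writes to a store only when one is passed in, which mirrors passing vector_store to IngestionPipeline. ToyVectorStore, embed, and run_pipeline are hypothetical names for illustration, not LlamaIndex APIs.

```python
from typing import Dict, List, Optional


class ToyVectorStore:
    """Hypothetical in-memory store mapping node id -> embedding."""

    def __init__(self) -> None:
        self.rows: Dict[str, List[float]] = {}

    def add(self, items: Dict[str, List[float]]) -> None:
        self.rows.update(items)


def embed(texts: Dict[str, str]) -> Dict[str, List[float]]:
    # Toy embedder: a 1-dimensional vector holding the text length.
    return {node_id: [float(len(text))] for node_id, text in texts.items()}


def run_pipeline(
    texts: Dict[str, str], vector_store: Optional[ToyVectorStore] = None
) -> Dict[str, List[float]]:
    embedded = embed(texts)  # transformation layer: always runs
    if vector_store is not None:  # persistence layer: only when a store is given
        vector_store.add(embedded)
    return embedded


store = ToyVectorStore()
run_pipeline({"a": "hello"})  # returns vectors, stores nothing
print(len(store.rows))  # 0
run_pipeline({"a": "hello"}, vector_store=store)  # returns vectors and inserts them
print(len(store.rows))  # 1
```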
Error recovery
MissingOpenAIKeyError
AttributeError: 'NoneType' object has no attribute 'embedding'
TypeError: 'IngestionPipeline' object is not callable
Experienced dev note
The real power of pipelines emerges when you need to re-index after document updates or A/B test different chunking strategies. Many teams write a one-off ingestion script, then realize months later that they can't reproduce it or swap the embedding model because the logic is buried in Jupyter notebooks. Build the pipeline abstraction from day one: it costs a few extra lines up front and can save hours of debugging later. Pipelines also compose well with caching and deduplication. IngestionPipeline caches transformation outputs (in memory by default, persistable with pipeline.persist()), so re-running on identical inputs skips repeated work. Attaching a docstore lets it skip documents whose content hasn't changed, so you can re-run on partially new data without re-embedding everything.
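Re-running on partially new data without re-embedding everything depends on recognizing which documents are unchanged. In LlamaIndex this is what attaching a docstore to IngestionPipeline is for. Below is a hypothetical, dependency-free sketch of the idea (a map from document ID to content hash), not LlamaIndex's implementation; ToyDocstore, fake_embed, and ingest are invented names. It also shows why stable document IDs matter; for files, SimpleDirectoryReader(filename_as_id=True) is one way to get them.

```python
import hashlib
from typing import Dict, List, Tuple


class ToyDocstore:
    """Hypothetical dedup store: remembers doc_id -> hash of the content."""

    def __init__(self) -> None:
        self.hashes: Dict[str, str] = {}

    def needs_processing(self, doc_id: str, text: str) -> bool:
        digest = hashlib.sha256(text.encode("utf-8")).hexdigest()
        if self.hashes.get(doc_id) == digest:
            return False  # same id, same content: skip
        self.hashes[doc_id] = digest  # new id or changed content: (re)process
        return True


embed_calls: List[str] = []


def fake_embed(text: str) -> List[float]:
    # Stand-in for a paid embedding API call; records every call it receives.
    embed_calls.append(text)
    return [float(len(text))]


def ingest(docs: List[Tuple[str, str]], docstore: ToyDocstore) -> int:
    """Embed only new or changed documents; return how many were embedded."""
    embedded = 0
    for doc_id, text in docs:
        if docstore.needs_processing(doc_id, text):
            fake_embed(text)
            embedded += 1
    return embedded


store = ToyDocstore()
print(ingest([("a.txt", "alpha"), ("b.txt", "beta")], store))  # 2: both are new
# Second run: a.txt unchanged (skipped), b.txt changed, c.txt new.
print(ingest([("a.txt", "alpha"), ("b.txt", "beta v2"), ("c.txt", "gamma")], store))  # 2
```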
Check your understanding
You have 10,000 documents already indexed in Pinecone with OpenAI embeddings. Your team wants to switch to a different embedding model (e.g., Cohere) for better performance. Explain what parts of your ingestion pipeline you would change and what you would NOT need to change.
Answer hint
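A correct answer identifies that you'd replace the OpenAIEmbedding transformation with a Cohere one (e.g., CohereEmbedding) in the transformations list, then re-run the pipeline over all 10,000 documents. Document loading and chunking stay the same: only the embedding step changes. Two caveats: vectors from different models aren't comparable, so queries must use the new model too; and if the new model's output dimension differs from the old 1536, you'd need a Pinecone index created with the matching dimension rather than re-inserting into the existing one. This is why pipelines are powerful: components can be swapped without changing the orchestration logic.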
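A toy sketch of the swap: the chunking function is shared, only the embedder argument changes, and a dimension guard models the fact that a vector index (such as a Pinecone index) stores vectors of one fixed size. split, old_embed, new_embed, and embed_chunks are hypothetical stand-ins, and their 4- and 3-dimensional outputs are arbitrary choices for illustration.

```python
from typing import Callable, List

Embedder = Callable[[str], List[float]]


def split(text: str) -> List[str]:
    # Shared chunking step: one chunk per sentence; unchanged by the swap.
    return [s.strip() + "." for s in text.split(".") if s.strip()]


def old_embed(text: str) -> List[float]:
    # Hypothetical "old" model with 4-dimensional output.
    return [float(len(text))] * 4


def new_embed(text: str) -> List[float]:
    # Hypothetical "new" model with 3-dimensional output.
    return [float(len(text.split()))] * 3


def embed_chunks(text: str, embedder: Embedder, index_dim: int) -> List[List[float]]:
    vectors = [embedder(chunk) for chunk in split(text)]
    # The target index holds vectors of a single fixed size; reject mismatches early.
    for vector in vectors:
        if len(vector) != index_dim:
            raise ValueError(f"embedding dim {len(vector)} != index dim {index_dim}")
    return vectors


text = "Alpha beta. Gamma."
print(len(embed_chunks(text, old_embed, index_dim=4)))  # 2 chunks
try:
    embed_chunks(text, new_embed, index_dim=4)  # new model, old index
except ValueError as err:
    print(err)
print(embed_chunks(text, new_embed, index_dim=3))  # new model, matching new index
```

Only the embedder argument differs between runs; split is untouched, which is the "change one component" property the question is probing.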