How to use LlamaIndex node parser
Quick answer
Use the LlamaIndex node parser to split documents into manageable chunks (nodes) for indexing and retrieval. Instantiate a parser such as SimpleNodeParser, then call get_nodes_from_documents() on your documents to obtain parsed nodes ready for indexing or further processing.
Prerequisites
- Python 3.8+
- pip install llama-index
- Basic knowledge of document processing in Python
Setup
Install the llama-index package and prepare your environment.
pip install llama-index
Step by step
This example shows how to parse text documents into nodes using SimpleNodeParser from llama_index.node_parser (the import paths here apply to llama-index 0.9.x and earlier; in 0.10+ the equivalents live under llama_index.core, where SentenceSplitter is the preferred parser). Nodes represent chunks of text suitable for indexing or further processing.
from llama_index import Document
from llama_index.node_parser import SimpleNodeParser
# Sample documents
documents = [
Document(text="This is the first document. It has multiple sentences."),
Document(text="Here is the second document. It also has several sentences.")
]
# Initialize the node parser (from_defaults works across recent versions)
parser = SimpleNodeParser.from_defaults()
# Parse documents into nodes
nodes = parser.get_nodes_from_documents(documents)
# Print node texts
for i, node in enumerate(nodes):
print(f"Node {i+1} text:", node.get_text())
Output
Node 1 text: This is the first document. It has multiple sentences.
Node 2 text: Here is the second document. It also has several sentences.
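Conceptually, a sentence-aware node parser splits text on sentence boundaries and packs consecutive sentences into chunks up to a size limit. The following plain-Python sketch illustrates that packing idea only; it is not the library's implementation, and it measures size in characters rather than tokens for simplicity:

```python
# Illustrative only: pack sentences into chunks of at most max_chars characters.
def chunk_sentences(text, max_chars=60):
    # Naive sentence split on periods; real parsers use proper sentence tokenizers.
    sentences = [s.strip() + "." for s in text.split(".") if s.strip()]
    chunks, current = [], ""
    for sentence in sentences:
        # Start a new chunk when adding this sentence would exceed the limit.
        if current and len(current) + 1 + len(sentence) > max_chars:
            chunks.append(current)
            current = sentence
        else:
            current = (current + " " + sentence).strip()
    if current:
        chunks.append(current)
    return chunks

text = "This is the first document. It has multiple sentences."
print(chunk_sentences(text, max_chars=40))
# → ['This is the first document.', 'It has multiple sentences.']
```

With a limit of 40 characters, the two sentences no longer fit together, so each becomes its own chunk, mirroring how a smaller chunk_size yields more, finer-grained nodes.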
Common variations
You can customize chunking by choosing a different node parser, such as MarkdownNodeParser for Markdown files, or by passing parameters to SimpleNodeParser to control chunk size and overlap. Async parsing is rarely needed, since parsing runs locally and is fast. Combine node parsing with llama-index indexing for retrieval tasks.
from llama_index.node_parser import SimpleNodeParser
# Customize chunk size (in tokens) and overlap between adjacent chunks
parser = SimpleNodeParser.from_defaults(chunk_size=512, chunk_overlap=50)
# Use with documents as before
nodes = parser.get_nodes_from_documents(documents)
for node in nodes:
print(node.get_text())
Output
This is the first document. It has multiple sentences.
Here is the second document. It also has several sentences.
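chunk_overlap carries a few tokens from the end of one chunk into the start of the next, so context is not lost at chunk boundaries. A plain sliding-window sketch of the idea (again illustrative, not the library's algorithm):

```python
# Illustrative only: token windows of `size` tokens, overlapping by `overlap` tokens.
def overlapping_windows(tokens, size, overlap):
    step = size - overlap  # how far each window advances
    return [tokens[i:i + size] for i in range(0, max(len(tokens) - overlap, 1), step)]

tokens = "a b c d e f g h".split()
for window in overlapping_windows(tokens, size=4, overlap=2):
    print(window)
# → ['a', 'b', 'c', 'd']
#   ['c', 'd', 'e', 'f']
#   ['e', 'f', 'g', 'h']
```

Each consecutive pair of windows shares two tokens, which is what a nonzero chunk_overlap buys you: a query matching text near a chunk boundary can still retrieve a chunk containing its surrounding context.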
Troubleshooting
- If you get import errors, ensure llama-index is installed and up to date.
- If nodes are too large or too small, adjust the chunk_size and chunk_overlap parameters in the parser.
- Ensure your documents are instances of llama_index.Document with text content.
Key Takeaways
- Use SimpleNodeParser to chunk documents into nodes for indexing.
- Adjust chunk_size and chunk_overlap to control chunk granularity.
- Nodes are the fundamental units for retrieval and processing in llama-index.
- Always pass llama_index.Document objects to the parser.
- Parsing is synchronous and fast; async usage is generally unnecessary.