How to load multiple files with LlamaIndex
Quick answer
Use LlamaIndex's SimpleDirectoryReader or Document loaders to load multiple files, either by pointing at a directory or by iterating over file paths. Then build an index with GPTVectorStoreIndex or another index class to process all loaded documents together.
Prerequisites
- Python 3.8+
- pip install "llama-index>=0.6.0"
- OpenAI API key (free tier works)
- Set environment variable OPENAI_API_KEY
Setup
Install llama-index via pip and set your OpenAI API key in the environment.
pip install "llama-index>=0.6.0"
Step by step
This example loads multiple text files from a directory using SimpleDirectoryReader, then creates a vector index with GPTVectorStoreIndex for querying.
import os
from llama_index import GPTVectorStoreIndex, SimpleDirectoryReader
# Ensure your OpenAI API key is set in the environment before running, e.g. in your shell:
#   export OPENAI_API_KEY="<your key>"
# Load all documents from a directory
loader = SimpleDirectoryReader('data') # 'data' folder with multiple text files
documents = loader.load_data()
# Create an index from the loaded documents
index = GPTVectorStoreIndex.from_documents(documents)
# Query the index
query = "What is the main topic of these documents?"
# Since llama-index 0.6.0, queries go through a query engine
response = index.as_query_engine().query(query)
print(response.response)
Output
The main topic of these documents is ... (depends on your files)
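To try the example without real data, a small 'data' folder can be created first. The following is a minimal sketch using only the standard library; the helper name make_sample_data and the sample sentences are hypothetical and not part of LlamaIndex.

```python
from pathlib import Path


def make_sample_data(data_dir="data"):
    """Create a folder with two small UTF-8 text files and return their paths."""
    root = Path(data_dir)
    root.mkdir(parents=True, exist_ok=True)
    # Hypothetical sample content; replace with your own files
    samples = {
        "file1.txt": "Sample document one: notes about apples.\n",
        "file2.txt": "Sample document two: notes about oranges.\n",
    }
    paths = []
    for name, text in samples.items():
        path = root / name
        path.write_text(text, encoding="utf-8")
        paths.append(str(path))
    return paths
```

Calling make_sample_data() once before running the example should leave a 'data' folder that SimpleDirectoryReader('data') can read.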
Common variations
- Use the Document class to load files individually and combine them into a list.
- Load PDFs or other formats with specialized loaders like PDFReader.
- Use async methods if supported for large datasets.
- Switch index types, e.g., GPTListIndex or GPTTreeIndex, depending on use case.
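from llama_index import Document, GPTVectorStoreIndex
file_paths = ['data/file1.txt', 'data/file2.txt']
documents = []
for path in file_paths:
    # Read each file explicitly so the encoding is under your control
    with open(path, 'r', encoding='utf-8') as f:
        text = f.read()
    # Use the file path as the document ID
    documents.append(Document(text=text, doc_id=path))
index = GPTVectorStoreIndex.from_documents(documents)
# Since llama-index 0.6.0, queries go through a query engine
response = index.as_query_engine().query("Summarize the content.")
print(response.response)
Output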
Summary of the content ...
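When loading files individually, the list of paths does not have to be hard-coded. The sketch below shows one way to collect it with pathlib; the helper name collect_files and its parameters are assumptions for illustration, not a LlamaIndex API.

```python
from pathlib import Path


def collect_files(root, suffixes=(".txt",), recursive=False):
    """Return a sorted list of file paths under `root` whose suffix matches."""
    root_path = Path(root)
    if not root_path.is_dir():
        raise FileNotFoundError(f"Directory not found: {root_path}")
    # "**/*" also descends into subfolders; "*" stays at the top level
    pattern = "**/*" if recursive else "*"
    wanted = {s.lower() for s in suffixes}
    matches = [
        p for p in root_path.glob(pattern)
        if p.is_file() and p.suffix.lower() in wanted
    ]
    # Sort so the load order is stable across runs and platforms
    return [str(p) for p in sorted(matches)]
```

The returned list can stand in for file_paths in the loop above, for example file_paths = collect_files('data').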
Troubleshooting
- If you see FileNotFoundError, verify your directory path and file names.
- For encoding errors, ensure files are UTF-8 encoded or specify encoding explicitly.
- If the index creation fails, check your OpenAI API key and network connectivity.
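The checks above can be run before loading anything. The following is a rough sketch, assuming the folder holds plain-text files only (binary formats such as PDF would be reported as non-UTF-8); the function name diagnose is hypothetical, and it only tests that the key variable is set, not that the key is valid.

```python
import os
from pathlib import Path


def diagnose(data_dir, env_var="OPENAI_API_KEY"):
    """Return a list of readable problems; an empty list means none were found."""
    problems = []
    root = Path(data_dir)
    if not root.is_dir():
        problems.append(f"Directory not found: {root}")
    else:
        files = sorted(p for p in root.iterdir() if p.is_file())
        if not files:
            problems.append(f"No files in directory: {root}")
        for path in files:
            try:
                # Decode the raw bytes to detect files that are not UTF-8
                path.read_bytes().decode("utf-8")
            except UnicodeDecodeError:
                problems.append(f"Not valid UTF-8: {path.name}")
    if not os.environ.get(env_var):
        problems.append(f"Environment variable {env_var} is not set")
    return problems
```

A typical use is to print each entry of diagnose('data') and fix what it reports before building the index.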
Key Takeaways
- Use SimpleDirectoryReader to load multiple files from a folder easily.
- Combine loaded documents into a list and create an index with GPTVectorStoreIndex.
- You can load files individually with Document for more control.
- Check file paths and encoding to avoid common loading errors.
- Choose the index type based on your querying needs.