FileNotFoundError
langchain.document_loaders.pdf.PyPDFLoader.FileNotFoundError
Stack trace
Traceback (most recent call last):
  File "app.py", line 12, in <module>
    loader = PyPDFLoader("./docs/sample.pdf")  # triggers error
  File "/usr/local/lib/python3.9/site-packages/langchain/document_loaders/pdf.py", line 45, in __init__
    with open(file_path, "rb") as f:
FileNotFoundError: [Errno 2] No such file or directory: './docs/sample.pdf'
Why it happens
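PyPDFLoader tries to open the PDF at the given path and fails because nothing exists there: the file is missing, the path is misspelled, or a relative path is being resolved against a different working directory than expected. Python then raises its built-in FileNotFoundError, a subclass of OSError (IOError is an alias of OSError in Python 3). A file that exists but is corrupted or unreadable fails differently, typically with a PDF parsing error or a PermissionError.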
Detection
Catch FileNotFoundError when constructing PyPDFLoader or calling load(), and log the path before processing, ideally as an absolute path together with the current working directory, so that a wrong path is obvious from the log.
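The logging step can be sketched as a small helper. This is a minimal illustration, not part of LangChain: describe_path is a hypothetical name, and it only gathers the facts that usually explain the error.

```python
import logging
import os
from pathlib import Path

logger = logging.getLogger(__name__)


def describe_path(path: str) -> dict:
    """Collect and log the facts that usually explain a FileNotFoundError.

    Hypothetical helper for illustration; not a LangChain API.
    """
    p = Path(path)
    info = {
        "given": path,
        "cwd": os.getcwd(),               # relative paths are resolved against this
        "resolved": str(p.resolve()),     # absolute form; the file need not exist
        "exists": os.path.exists(path),
        "is_file": os.path.isfile(path),
    }
    logger.info("PDF path check: %s", info)
    return info
```

Calling describe_path("./docs/sample.pdf") just before constructing the loader shows which absolute location the process is actually looking at.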
Causes & fixes
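The path passed to PyPDFLoader does not exist or is misspelled.
Verify that the file exists at that location and that the path is spelled correctly. Relative paths are resolved against the process's current working directory, not the script's location, so use an absolute path or build the path from the script's directory.
The PDF file exists but is corrupted or cannot be parsed. This usually surfaces as a parsing error from the underlying PDF library (pypdf) rather than FileNotFoundError.
Open the PDF manually with a PDF reader to confirm its integrity. Replace or repair the file if it is corrupted.
The process lacks file system permissions to read the PDF file. Python normally reports this as PermissionError ([Errno 13]) rather than FileNotFoundError.
Ensure the running process has read permission on the PDF file and search (execute) permission on its parent directories.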
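To tell these causes apart before loading, a rough pre-check along the following lines may help. It is a sketch under stated assumptions: diagnose_pdf and its labels are hypothetical, and the header test is only a heuristic (a file that starts with %PDF- can still be damaged further in).

```python
import os


def diagnose_pdf(path: str) -> str:
    """Return a short label for the most likely reason a PDF cannot be loaded.

    Hypothetical helper for illustration; not a LangChain API.
    """
    if not os.path.exists(path):
        # Also reported when a parent directory cannot be searched.
        return "missing"
    if not os.path.isfile(path):
        return "not_a_file"      # for example, a directory
    if not os.access(path, os.R_OK):
        return "unreadable"      # opening it would raise PermissionError
    with open(path, "rb") as f:
        header = f.read(5)
    if header != b"%PDF-":
        return "not_a_pdf"       # heuristic: the usual PDF header is absent
    return "ok"
```

Only the "missing" result corresponds to the FileNotFoundError shown in the stack trace; the other labels point to the second and third causes.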
Code: broken vs fixed
# Broken: raises FileNotFoundError if the file is missing
from langchain.document_loaders import PyPDFLoader
loader = PyPDFLoader("./docs/sample.pdf")
pages = loader.load()
print(pages)

# Fixed: take the path from configuration and confirm the file exists before loading
import os
from langchain.document_loaders import PyPDFLoader
# PDF_PATH overrides the default; abspath makes the location explicit in error messages
pdf_path = os.path.abspath(os.environ.get("PDF_PATH", "./docs/sample.pdf"))
if not os.path.isfile(pdf_path):
    raise SystemExit(f"PDF not found: {pdf_path}")
loader = PyPDFLoader(pdf_path)
pages = loader.load()
print(pages)
Workaround
Wrap PyPDFLoader initialization in try/except FileNotFoundError, log the missing path, and fall back to a default path or prompt the user for a valid one.
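One way to sketch this workaround is a helper that tries candidate paths in order. The function name load_first_available and the make_loader parameter are assumptions made for illustration; make_loader is any callable that takes a path and returns an object with a load() method, such as the PyPDFLoader class itself. If your installed LangChain version reports a bad path with a different exception type, widen the except clause accordingly.

```python
import logging
from typing import Callable, Sequence

logger = logging.getLogger(__name__)


def load_first_available(paths: Sequence[str], make_loader: Callable[[str], object]) -> list:
    """Return the documents from the first candidate path that loads.

    Hypothetical helper for illustration; not a LangChain API.
    """
    for path in paths:
        try:
            # Construction and load() are both inside the try block,
            # since either step may be the one that opens the file.
            return make_loader(path).load()
        except FileNotFoundError:
            logger.warning("PDF not found, trying next candidate: %s", path)
    raise FileNotFoundError(f"None of the candidate PDFs exist: {list(paths)}")
```

A call such as load_first_available([user_path, "./docs/default.pdf"], PyPDFLoader) would try the user's path first and then the default, where user_path and ./docs/default.pdf are placeholder names.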
Prevention
Manage PDF file paths through configuration management or environment variables, and validate that each file exists and is readable at startup, before any loader is constructed.
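A startup check along these lines could look like the sketch below. It assumes the path comes from a PDF_PATH environment variable with docs/sample.pdf as the default; resolve_pdf_path and its parameters are hypothetical names. Relative paths are anchored to a caller-supplied base directory so the result does not depend on the current working directory.

```python
import os
from pathlib import Path
from typing import Mapping, Optional


def resolve_pdf_path(
    base_dir: str,
    default: str = "docs/sample.pdf",
    env: Optional[Mapping[str, str]] = None,
    var: str = "PDF_PATH",
) -> Path:
    """Resolve and validate the configured PDF path, failing early if it is unusable.

    Hypothetical helper for illustration; not a LangChain API.
    """
    env = os.environ if env is None else env
    candidate = Path(env.get(var, default))
    if not candidate.is_absolute():
        # Anchor relative paths to a known directory instead of the cwd.
        candidate = Path(base_dir) / candidate
    candidate = candidate.resolve()
    if not os.path.isfile(candidate):
        raise FileNotFoundError(f"PDF not found: {candidate} (from {var} or its default)")
    if not os.access(candidate, os.R_OK):
        raise PermissionError(f"PDF is not readable: {candidate}")
    return candidate
```

In a script, base_dir would typically be os.path.dirname(os.path.abspath(__file__)), and the returned path can be passed to the loader as PyPDFLoader(str(pdf_path)).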