FileNotFoundError
langchain_community.document_loaders.pdf.PyPDFLoader: FileNotFoundError: [Errno 2] No such file or directory
Stack trace
FileNotFoundError: [Errno 2] No such file or directory: '/path/to/document.pdf'
  File "/usr/local/lib/python3.11/site-packages/langchain_community/document_loaders/pdf.py", line 45, in load
    with open(self.file_path, 'rb') as f:
FileNotFoundError: [Errno 2] No such file or directory: '/path/to/document.pdf'
Why it happens
PyPDFLoader attempts to open a PDF file at the specified path using Python's open() function. If the path doesn't exist, is misspelled, uses a relative path that doesn't resolve from the current working directory, or the file was moved/deleted between script start and load time, Python raises FileNotFoundError. This is especially common when mixing absolute and relative paths, or when running scripts from different directories than expected.
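The working-directory dependence can be reproduced without any PDF library. The following is a minimal sketch that builds a throwaway directory tree and checks the same relative path from two different working directories; the project/ layout and the placeholder file contents are invented for illustration.

```python
import os
import tempfile
from pathlib import Path

# Hypothetical relative path, used only for illustration.
relative = Path("documents/report.pdf")

original_cwd = Path.cwd()
with tempfile.TemporaryDirectory() as tmp:
    # Create <tmp>/project/documents/report.pdf as a stand-in for a real PDF.
    project = Path(tmp).resolve() / "project"
    (project / "documents").mkdir(parents=True)
    (project / "documents" / "report.pdf").write_bytes(b"%PDF-1.4\n")

    try:
        # From the project root the relative path points at the file.
        os.chdir(project)
        found_from_project = relative.is_file()

        # From a subdirectory the same string points at
        # project/documents/documents/report.pdf, which does not exist.
        os.chdir(project / "documents")
        found_from_subdir = relative.is_file()
    finally:
        # Leave the temporary tree before it is deleted.
        os.chdir(original_cwd)

print(found_from_project, found_from_subdir)  # prints: True False
```

The path string never changes; only the directory it is resolved against does, which is why the same script can work in one shell and fail in another.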
Detection
Before calling loader.load(), verify the file exists using os.path.isfile(file_path) or pathlib.Path(file_path).is_file(). os.path.exists() also returns True for directories, so it is the weaker check. Log the working directory and the resolved absolute path to catch path resolution issues early in your pipeline.
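One way to package that check is a small guard function. This is a sketch, not part of LangChain: the name require_pdf and the message format are hypothetical, and it uses only the standard library.

```python
import logging
import os
from pathlib import Path

logger = logging.getLogger(__name__)


def require_pdf(file_path):
    """Return the resolved path if it is an existing file, else raise.

    Hypothetical helper: it logs the working directory and the resolved
    path so a failure shows where the lookup actually happened.
    """
    resolved = Path(file_path).expanduser().resolve()
    logger.debug("cwd=%s resolved=%s", os.getcwd(), resolved)
    if not resolved.is_file():
        raise FileNotFoundError(
            f"PDF not found: {str(file_path)!r} resolved to {resolved} "
            f"(cwd: {os.getcwd()})"
        )
    return resolved
```

A call site would then look like loader = PyPDFLoader(str(require_pdf('documents/report.pdf'))), so a bad path fails with the resolved location in the message instead of only the string that was passed in.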
Causes & fixes
Using a relative file path when the current working directory is different from the script location
Use an absolute path constructed from __file__: from pathlib import Path; pdf_path = Path(__file__).parent / 'documents' / 'file.pdf'
File path contains a typo or the file doesn't exist at that location
Verify the path exists before loading: if not Path(pdf_path).is_file(): raise FileNotFoundError(f'PDF not found at {pdf_path}')
File is located in a different directory (e.g., data/ or downloads/) but path assumes it's in the current directory
Use absolute paths or construct paths relative to the script: pdf_path = Path(__file__).parent / 'data' / 'documents' / 'file.pdf'
Running the script from a different directory than where the relative path assumes (e.g., from project root vs subdirectory)
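Prefer absolute paths. If you must rely on relative paths, set the working directory explicitly with os.chdir(), keeping in mind that it changes the directory for the whole process: os.chdir(Path(__file__).parent); loader = PyPDFLoader('./documents/file.pdf')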
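Because os.chdir() affects every later relative path in the process, it can help to scope the change. The context manager below is a hypothetical sketch built on os.getcwd() and os.chdir(); Python 3.11 and later also ship contextlib.chdir, which serves the same purpose.

```python
import os
from contextlib import contextmanager
from pathlib import Path


@contextmanager
def working_directory(path):
    """Temporarily switch the working directory, restoring it on exit."""
    previous = Path.cwd()
    os.chdir(path)
    try:
        yield Path(path)
    finally:
        # Runs even if the body raises, so later code sees the old cwd.
        os.chdir(previous)
```

Usage would be along the lines of: with working_directory(Path(__file__).parent): docs = PyPDFLoader('./documents/file.pdf').load(). The directory change is undone as soon as the block ends, so unrelated code is not affected.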
Code: broken vs fixed
from langchain_community.document_loaders import PyPDFLoader

# BROKEN: relative path that fails if the working directory is wrong
loader = PyPDFLoader('documents/report.pdf')  # FileNotFoundError if not run from the correct directory
docs = loader.load()
print(f'Loaded {len(docs)} pages')

from pathlib import Path
from langchain_community.document_loaders import PyPDFLoader
# FIXED: absolute path built from __file__, so it works from any directory
script_dir = Path(__file__).parent
pdf_path = script_dir / 'documents' / 'report.pdf'
# Verify file exists before loading
if not pdf_path.is_file():
    raise FileNotFoundError(f'PDF not found at {pdf_path}')
loader = PyPDFLoader(str(pdf_path))
docs = loader.load()
print(f'Loaded {len(docs)} pages from {pdf_path}')
Workaround
If you can't immediately fix the path resolution, wrap the load in try/except and catch FileNotFoundError. In the handler, log the attempted path together with os.getcwd() (the directory relative paths resolve against) and sys.argv[0] (the script path as it was invoked) to see where the script is actually running from. Alternatively, accept the PDF path as a command-line argument: pdf_path = sys.argv[1] if len(sys.argv) > 1 else 'documents/report.pdf'
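The workaround might be sketched as follows. The helper names pick_pdf_path and load_or_report are hypothetical, and the loader is passed in as a plain callable so the snippet depends only on the standard library.

```python
import logging
import os
import sys
from pathlib import Path

logger = logging.getLogger(__name__)

DEFAULT_PDF = "documents/report.pdf"  # fallback path taken from the text above


def pick_pdf_path(argv):
    """Use the first command-line argument if present, else the default."""
    return argv[1] if len(argv) > 1 else DEFAULT_PDF


def load_or_report(pdf_path, load):
    """Call load(pdf_path); on a missing file, log context and return None."""
    try:
        return load(pdf_path)
    except FileNotFoundError:
        logger.error(
            "Could not open %r (resolved: %s, cwd: %s, script: %s)",
            pdf_path,
            Path(pdf_path).resolve(),
            os.getcwd(),
            sys.argv[0],
        )
        return None
```

With LangChain this would be called as docs = load_or_report(pick_pdf_path(sys.argv), lambda p: PyPDFLoader(p).load()). If your installed version reports a missing path with a different exception type, widen the except clause accordingly.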
Prevention
Always use absolute paths derived from __file__ for file operations in Python packages and command-line tools. For web applications, accept file paths as configuration (environment variables or config files) rather than hardcoding them. For data pipelines, make the input file path an explicit parameter (function argument or CLI flag) rather than assuming a fixed location.
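As an illustration of treating the input path as explicit configuration, the sketch below reads it from a CLI flag first and an environment variable second. The flag name --pdf, the variable name PDF_PATH, and both function names are assumptions made for this example.

```python
import argparse
import os
from pathlib import Path

ENV_VAR = "PDF_PATH"  # hypothetical environment variable name


def resolve_input_path(cli_value=None, env=None):
    """Pick the PDF path from an explicit argument, then the environment."""
    env = os.environ if env is None else env
    raw = cli_value or env.get(ENV_VAR)
    if not raw:
        raise ValueError(f"No PDF path given: pass --pdf or set {ENV_VAR}")
    # Resolve once at the boundary so the rest of the code sees an absolute path.
    return Path(raw).expanduser().resolve()


def build_parser():
    """Build a parser with a single optional --pdf flag."""
    parser = argparse.ArgumentParser(description="Load a PDF from an explicit path")
    parser.add_argument("--pdf", help=f"path to the PDF (falls back to ${ENV_VAR})")
    return parser
```

A script would call args = build_parser().parse_args() and then pdf_path = resolve_input_path(args.pdf) before constructing the loader. Failing early with a clear message when no path is configured avoids silently falling back to a location that only exists on one machine.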