TesseractNotFoundError
pytesseract.pytesseract.TesseractNotFoundError
Stack trace
pytesseract.pytesseract.TesseractNotFoundError: tesseract is not installed or it's not in your PATH. See README file for more information.
Why it happens
Pytesseract is only a Python wrapper around the Tesseract OCR engine. The actual OCR engine (tesseract binary) must be installed separately on your operating system. When pytesseract tries to run image text extraction, it looks for the tesseract executable in your system PATH. If it's not installed or not in PATH, pytesseract raises TesseractNotFoundError. This is common when you pip install pytesseract but forget to install the system-level Tesseract binary.
Detection
Check the system PATH first: run `which tesseract` (macOS/Linux) or `where tesseract` (Windows) in your terminal. If the command prints no path, Tesseract is not in PATH. Then confirm that pytesseract can reach the binary with `python -c "import pytesseract; print(pytesseract.get_tesseract_version())"`, which prints the version on success and raises TesseractNotFoundError otherwise.
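The same check can be done from Python using only the standard library. The following is a minimal sketch, not part of pytesseract: the helper names `find_tesseract` and `tesseract_version` are made up for illustration, and the code assumes the binary prints its version when called with `--version`.

```python
import shutil
import subprocess
from typing import Optional


def find_tesseract(cmd: str = "tesseract") -> Optional[str]:
    """Return the path of the executable if it is found on PATH, else None."""
    return shutil.which(cmd)


def tesseract_version(cmd: str = "tesseract") -> Optional[str]:
    """Return the first line printed by `<cmd> --version`, or None if it cannot be run."""
    path = shutil.which(cmd)
    if path is None:
        return None  # not on PATH: the situation that leads to TesseractNotFoundError
    try:
        result = subprocess.run(
            [path, "--version"],
            capture_output=True,
            text=True,
            timeout=10,
            check=False,
        )
    except (OSError, subprocess.TimeoutExpired):
        return None
    # Some builds write the version banner to stderr instead of stdout.
    output = (result.stdout or result.stderr).strip()
    return output.splitlines()[0] if output else None
```

Calling `find_tesseract()` returns `None` in the same situation where `which tesseract` prints nothing, so it can serve as a guard before any OCR call.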
Causes & fixes
Tesseract OCR engine is not installed on your system at all
Install Tesseract with your OS package manager: on macOS run `brew install tesseract`; on Ubuntu/Debian run `sudo apt-get install tesseract-ocr`; on Windows download the installer from https://github.com/UB-Mannheim/tesseract/wiki and run the .exe.
Tesseract is installed but not in your system PATH environment variable
On Windows, add the Tesseract install directory (default C:\Program Files\Tesseract-OCR) to PATH in System Environment Variables, or set the path explicitly in Python with pytesseract.pytesseract.tesseract_cmd = r'C:\Program Files\Tesseract-OCR\tesseract.exe' after importing pytesseract and before calling any of its functions.
Tesseract was installed in a custom location and pytesseract doesn't know where to find it
Explicitly configure pytesseract with the full path to your tesseract binary: pytesseract.pytesseract.tesseract_cmd = '/custom/path/to/tesseract' on Unix or pytesseract.pytesseract.tesseract_cmd = r'C:\custom\path\tesseract.exe' on Windows. Add this line after importing pytesseract and BEFORE calling any pytesseract functions.
Using WSL (Windows Subsystem for Linux) and tesseract is installed on Windows but not in WSL environment
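Install tesseract inside your WSL environment separately with `sudo apt-get install tesseract-ocr`, even if it is already installed on Windows. WSL is its own Linux environment, and pytesseract running there looks for a Linux `tesseract` executable, not the Windows `tesseract.exe`.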
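The path-related fixes above can be combined into one lookup. This is a sketch under stated assumptions: the environment variable name `TESSERACT_CMD`, the helper `resolve_tesseract`, and the list of candidate locations are illustrative choices, not something pytesseract defines.

```python
import os
import shutil
from typing import Iterable, Optional

# Common install locations. These are assumptions; adjust them for your machines.
DEFAULT_CANDIDATES = (
    r"C:\Program Files\Tesseract-OCR\tesseract.exe",
    "/usr/bin/tesseract",
    "/usr/local/bin/tesseract",
    "/opt/homebrew/bin/tesseract",
)


def resolve_tesseract(
    env_var: str = "TESSERACT_CMD",
    candidates: Iterable[str] = DEFAULT_CANDIDATES,
) -> Optional[str]:
    """Return a path to a tesseract executable, or None if none is found."""
    # 1. An explicit override from the (hypothetical) environment variable.
    override = os.environ.get(env_var)
    if override and os.path.isfile(override):
        return override
    # 2. Whatever is already reachable through PATH.
    on_path = shutil.which("tesseract")
    if on_path:
        return on_path
    # 3. Fall back to well-known install locations.
    for candidate in candidates:
        if os.path.isfile(candidate):
            return candidate
    return None
```

The result would then be assigned once at startup, for example `path = resolve_tesseract()` followed by `pytesseract.pytesseract.tesseract_cmd = path` when `path` is not `None`. Checking the override first lets a deployment pin a specific binary without touching PATH.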
Code: broken vs fixed
# BROKEN: assumes the tesseract binary is in PATH
import pytesseract
from PIL import Image

# This line will fail with TesseractNotFoundError if tesseract is not in PATH
text = pytesseract.image_to_string(Image.open('document.png'))
print(text)

# FIXED: explicitly set the tesseract path before using pytesseract
import os
import sys

import pytesseract
from PIL import Image

# If tesseract is installed but not in PATH, point directly to the binary
if os.name == 'nt':  # Windows
    pytesseract.pytesseract.tesseract_cmd = r'C:\Program Files\Tesseract-OCR\tesseract.exe'
else:  # macOS/Linux; the path varies, e.g. /usr/bin/tesseract (apt) or /opt/homebrew/bin/tesseract (Homebrew on Apple Silicon)
    pytesseract.pytesseract.tesseract_cmd = '/usr/local/bin/tesseract'

# Validate installation
try:
    version = pytesseract.get_tesseract_version()
    print(f'Tesseract version: {version}')
except pytesseract.TesseractNotFoundError as e:
    print(f'ERROR: {e}')
    sys.exit(1)

# Now safe to use pytesseract
text = pytesseract.image_to_string(Image.open('document.png'))
print(f'Extracted text: {text}')
Workaround
If you cannot install system Tesseract, use AWS Textract (cloud-based OCR) via boto3: create a client with `textract = boto3.client('textract')`, then call `response = textract.detect_document_text(Document={'S3Object': {'Bucket': bucket, 'Name': key}})`. This needs AWS credentials but no local Tesseract installation. Alternatively, use EasyOCR, which installs with pip and needs no Tesseract binary (it depends on PyTorch and downloads its recognition models on first use): `import easyocr; reader = easyocr.Reader(['en']); result = reader.readtext('image.png')`.
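The Textract call above returns a structured response rather than plain text. A minimal sketch of pulling the recognized lines out of it follows; the helper names `lines_from_textract` and `ocr_s3_document` are hypothetical, and the code assumes the response carries a `Blocks` list whose `LINE` entries hold a `Text` field.

```python
from typing import Any, Dict, List


def lines_from_textract(response: Dict[str, Any]) -> List[str]:
    """Collect the text of every LINE block, in the order the response lists them."""
    return [
        block["Text"]
        for block in response.get("Blocks", [])
        if block.get("BlockType") == "LINE" and "Text" in block
    ]


def ocr_s3_document(bucket: str, key: str) -> str:
    """Run Textract on an S3 object and return the recognized lines as one string."""
    import boto3  # imported lazily so the parser above works without boto3 installed

    textract = boto3.client("textract")
    response = textract.detect_document_text(
        Document={"S3Object": {"Bucket": bucket, "Name": key}}
    )
    return "\n".join(lines_from_textract(response))
```

Keeping the parsing separate from the network call means the parsing logic can be exercised with a hand-written dictionary, without AWS credentials.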
Prevention
In production, use Docker with Tesseract pre-installed: on a Debian- or Ubuntu-based image, add `RUN apt-get update && apt-get install -y tesseract-ocr` to your Dockerfile so the binary is present in every container. For local development, document the setup in the README with copy-paste install commands for each OS. Add a setup validation script that runs on startup: write a `verify_ocr_setup.py` that calls pytesseract.get_tesseract_version() and fails loudly if Tesseract is missing, and run it in your CI/CD pipeline as well.
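One possible shape for `verify_ocr_setup.py` is sketched below. The function names are illustrative, and the version check is passed in as a callable so the logic can be tried without Tesseract installed; only `pytesseract.get_tesseract_version` comes from the library.

```python
import sys
from typing import Callable


def verify_ocr_setup(get_version: Callable[[], object]) -> int:
    """Return 0 if the version check succeeds, 1 otherwise (usable as an exit code)."""
    try:
        version = get_version()
    except Exception as exc:  # pytesseract raises TesseractNotFoundError when the binary is missing
        print(f"OCR setup check FAILED: {exc}", file=sys.stderr)
        return 1
    print(f"OCR setup OK, Tesseract version: {version}")
    return 0


def main() -> int:
    import pytesseract  # imported lazily; requires `pip install pytesseract`

    return verify_ocr_setup(pytesseract.get_tesseract_version)
```

As a script, the file would end with `sys.exit(main())`, so a missing binary produces a non-zero exit code and fails the CI step or container startup.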