TesseractNotFoundError or language pack missing
pytesseract.TesseractNotFoundError: tesseract is not installed or it's not in your PATH
Stack trace
Traceback (most recent call last):
  File "/usr/local/lib/python3.11/site-packages/pytesseract/pytesseract.py", line 258, in run_tesseract
    proc = subprocess.Popen(cmd, **subprocess_args())
  ...
FileNotFoundError: [Errno 2] No such file or directory: 'tesseract'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "<your_script>.py", line 42, in <module>
    text = pytesseract.image_to_string(img, lang='fra')
  File "/usr/local/lib/python3.11/site-packages/pytesseract/pytesseract.py", line 394, in image_to_string
    return run_tesseract(image, 'txt', config=config, nice=nice)
  ...
pytesseract.TesseractNotFoundError: tesseract is not installed or it's not in your PATH. See README file for more information.
Why it happens
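pytesseract is a thin Python wrapper around the Tesseract OCR engine, a separate C++ program that must be installed on your system and that pytesseract runs as a subprocess. If the tesseract binary cannot be found, subprocess.Popen raises FileNotFoundError, and pytesseract re-raises it as TesseractNotFoundError, which is why both exceptions appear in the trace. Tesseract also needs a trained data file for each language (eng.traineddata, fra.traineddata, etc.) in its tessdata directory. When you call image_to_string(img, lang='fra'), Tesseract looks for fra.traineddata; if that file is missing, the binary starts but exits with an error, and pytesseract raises TesseractError (not TesseractNotFoundError) carrying Tesseract's own message, typically reporting that it failed to load the language. Both problems are especially common in fresh Docker containers, CI/CD runners, and on Windows, where Tesseract has to be installed manually.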
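To see why the trace shows two exceptions, here is a simplified model of a wrapper that shells out to an OCR binary. It is not pytesseract's source; `run_ocr_binary` and `OcrBinaryNotFound` are hypothetical names used only to illustrate the pattern. Raising a new exception inside an `except FileNotFoundError` block, without `from`, chains the original error as `__context__`, and that implicit chaining is what prints "During handling of the above exception, another exception occurred".

```python
import subprocess


class OcrBinaryNotFound(Exception):
    """Hypothetical stand-in for pytesseract.TesseractNotFoundError."""


def run_ocr_binary(cmd: str, *args: str) -> str:
    """Simplified model of a wrapper that runs an OCR binary as a subprocess."""
    try:
        proc = subprocess.run([cmd, *args], capture_output=True, text=True)
    except FileNotFoundError:
        # No `from` clause: Python records the FileNotFoundError as __context__,
        # so the traceback shows both errors joined by "During handling of ...".
        raise OcrBinaryNotFound(f"{cmd} is not installed or it's not in your PATH")
    return proc.stdout


if __name__ == "__main__":
    try:
        run_ocr_binary("tesseract-does-not-exist", "--version")
    except OcrBinaryNotFound as exc:
        print(type(exc).__name__, "raised while handling", type(exc.__context__).__name__)
```

Run as a script, the demo prints the wrapper's exception name together with the underlying `FileNotFoundError`, mirroring the two halves of the stack trace above.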
Detection
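Before calling image_to_string(), confirm that Tesseract runs with subprocess.run(['tesseract', '--version'], capture_output=True, text=True), and list the installed language packs with subprocess.run(['tesseract', '--list-langs'], capture_output=True, text=True). If you have set pytesseract.pytesseract.tesseract_cmd, pass that path instead of 'tesseract' so the check exercises the same binary pytesseract will use. Log both results, and catch TesseractNotFoundError (binary missing) and TesseractError (for example, a missing language pack) explicitly so the error message can include install instructions.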
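A minimal sketch of that pre-flight check, assuming the Tesseract 4+/5 `--list-langs` output format (a "List of available languages ..." header followed by one code per line). The helper names `parse_list_langs` and `check_tesseract` are hypothetical. Parsing into a set avoids the false positives of a plain substring test such as `'fra' in result.stdout`.

```python
import subprocess


def parse_list_langs(output: str) -> set[str]:
    """Turn `tesseract --list-langs` output into a set of language codes.

    Assumes the Tesseract 4+/5 format: a "List of available languages ..."
    header line followed by one code per line (e.g. eng, fra, osd).
    """
    langs = set()
    for line in output.splitlines():
        code = line.strip()
        if code and not code.startswith("List of available languages"):
            langs.add(code)
    return langs


def check_tesseract(required: list[str], cmd: str = "tesseract") -> set[str]:
    """Return the installed languages, or raise RuntimeError with install hints."""
    try:
        result = subprocess.run(
            [cmd, "--list-langs"], capture_output=True, text=True, check=True
        )
    except FileNotFoundError:
        raise RuntimeError(
            f"Tesseract not found at {cmd!r}. Install it (Ubuntu/Debian: "
            "'sudo apt-get install tesseract-ocr') or set "
            "pytesseract.pytesseract.tesseract_cmd."
        ) from None
    available = parse_list_langs(result.stdout)
    missing = [lang for lang in required if lang not in available]
    if missing:
        # Debian/Ubuntu package names usually use hyphens: chi_sim -> tesseract-ocr-chi-sim
        packages = " ".join("tesseract-ocr-" + lang.replace("_", "-") for lang in missing)
        raise RuntimeError(
            f"Missing language packs {missing}. "
            f"Ubuntu/Debian: 'sudo apt-get install {packages}'"
        )
    return available
```

In an application you might call `check_tesseract(['fra'], cmd=pytesseract.pytesseract.tesseract_cmd)` once at startup, so a missing binary or language pack fails fast with an actionable message instead of deep inside an OCR call.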
Causes & fixes
Tesseract OCR binary is not installed on the system or not in PATH
Install Tesseract: Ubuntu/Debian: 'sudo apt-get install tesseract-ocr'; macOS: 'brew install tesseract'; Windows: download the installer from https://github.com/UB-Mannheim/tesseract/wiki or run 'choco install tesseract'
Language pack (e.g., eng.traineddata, fra.traineddata) is missing from Tesseract's tessdata directory
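Install language packs: Ubuntu/Debian: 'sudo apt-get install tesseract-ocr-fra tesseract-ocr-deu' (substitute the language codes you need); macOS: 'brew install tesseract-lang' installs all languages; Windows: select additional languages in the UB-Mannheim installer (https://github.com/UB-Mannheim/tesseract/wiki). Alternatively, download the <lang>.traineddata file from https://github.com/tesseract-ocr/tessdata and place it in the tessdata folder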
pytesseract cannot find Tesseract executable because PATH is not set or Tesseract is in a non-standard location
Tell pytesseract explicitly where Tesseract is installed by setting pytesseract.pytesseract.tesseract_cmd = '/path/to/tesseract' (on Windows, typically r'C:\Program Files\Tesseract-OCR\tesseract.exe') before calling image_to_string()
Running in Docker/container without installing Tesseract in the image
Add 'RUN apt-get update && apt-get install -y tesseract-ocr tesseract-ocr-eng tesseract-ocr-fra' to your Dockerfile. Slim Python base images such as 'python:3.11-slim' do not include Tesseract, so it must be installed explicitly (or use a base image you have built with it preinstalled)
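For the PATH and language-pack causes, a small lookup helper can resolve the binary before configuring pytesseract. This is a sketch under stated assumptions: `find_tesseract` and `missing_traineddata` are hypothetical helpers, the `TESSERACT_CMD` environment variable is a made-up convention (neither Tesseract nor pytesseract reads it), and the candidate paths are common defaults that may not match your machines.

```python
import os
import shutil
from pathlib import Path

# Common install locations (assumptions; adjust for your environment).
DEFAULT_CANDIDATES = (
    r"C:\Program Files\Tesseract-OCR\tesseract.exe",  # UB-Mannheim Windows installer
    "/opt/homebrew/bin/tesseract",                    # Homebrew on Apple Silicon
    "/usr/local/bin/tesseract",                       # Homebrew on Intel Macs, source builds
    "/usr/bin/tesseract",                             # apt / dnf packages
)


def _is_executable(path: str) -> bool:
    return os.path.isfile(path) and os.access(path, os.X_OK)


def find_tesseract(env=None, candidates=DEFAULT_CANDIDATES, use_path=True):
    """Return a usable Tesseract path, or None.

    Lookup order: the hypothetical TESSERACT_CMD environment variable,
    then PATH (if use_path is True), then the hard-coded candidates.
    """
    env = os.environ if env is None else env
    override = env.get("TESSERACT_CMD")
    if override and _is_executable(override):
        return override
    if use_path:
        found = shutil.which("tesseract")
        if found:
            return found
    for candidate in candidates:
        if _is_executable(candidate):
            return candidate
    return None


def missing_traineddata(tessdata_dir, langs):
    """List languages whose <lang>.traineddata file is absent from tessdata_dir."""
    tessdata = Path(tessdata_dir)
    return [lang for lang in langs if not (tessdata / f"{lang}.traineddata").is_file()]
```

Typical use: `path = find_tesseract()`, then `pytesseract.pytesseract.tesseract_cmd = path` when `path` is not None; otherwise report the install command for your platform. `missing_traineddata` is handy when you know which tessdata directory your Tesseract build reads from.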
Code: broken vs fixed
# Broken version
import pytesseract
from PIL import Image
# Load image
img = Image.open('french_document.png')
# This fails if Tesseract or the French language pack is not installed
text = pytesseract.image_to_string(img, lang='fra')  # ← ERROR: tesseract not found or language pack missing
print(text)

# Fixed version
import subprocess

import pytesseract
from PIL import Image

# Step 1: Point pytesseract at the Tesseract binary if it is not on PATH
# (common on Windows or non-standard installs). Uncomment the line for your setup:
# pytesseract.pytesseract.tesseract_cmd = r'C:\Program Files\Tesseract-OCR\tesseract.exe'  # Windows
# pytesseract.pytesseract.tesseract_cmd = '/opt/homebrew/bin/tesseract'  # Homebrew on Apple Silicon
tesseract_cmd = pytesseract.pytesseract.tesseract_cmd  # defaults to 'tesseract', looked up on PATH

# Step 2: Verify the same binary runs and the French language pack is installed
try:
    result = subprocess.run([tesseract_cmd, '--list-langs'], capture_output=True, text=True, check=True)
except FileNotFoundError:
    raise RuntimeError('Tesseract not installed. Install: sudo apt-get install tesseract-ocr') from None
# Skip the "List of available languages ..." header and keep one code per line
available = {line.strip() for line in result.stdout.splitlines()[1:] if line.strip()}
if 'fra' not in available:
    raise RuntimeError('French language pack not found. Install: sudo apt-get install tesseract-ocr-fra')
print('Available languages:', sorted(available))

# Step 3: Load image and extract text
img = Image.open('french_document.png')
text = pytesseract.image_to_string(img, lang='fra')  # ← FIXED: binary and language pack verified
print('Extracted text:', text)
Workaround
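If you cannot install Tesseract, a cloud OCR service avoids the local binary entirely. AWS Textract can be called through boto3 with client.detect_document_text(Document={'Bytes': image_bytes}), but its text detection supports only a small set of languages (English, French, German, Italian, Portuguese, and Spanish at the time of writing), so check its documentation for your language. Google Cloud Vision API (google-cloud-vision) and Azure Computer Vision are also hosted services; they manage the language models on the provider's side and cover many more languages.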
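A minimal Textract sketch, assuming boto3 is installed and AWS credentials and a region are already configured. `ocr_with_textract` and `lines_from_textract` are hypothetical helper names. The parser keeps only `LINE` blocks from the `DetectDocumentText` response, which is usually enough to approximate `image_to_string` output.

```python
from typing import Optional


def lines_from_textract(response: dict) -> list[str]:
    """Collect the text of LINE blocks from a DetectDocumentText response."""
    return [
        block["Text"]
        for block in response.get("Blocks", [])
        if block.get("BlockType") == "LINE"
    ]


def ocr_with_textract(path: str, region_name: Optional[str] = None) -> str:
    """Send an image file to Textract and return its lines joined by newlines."""
    import boto3  # imported lazily so the parser above works without boto3

    with open(path, "rb") as fh:
        image_bytes = fh.read()
    # region_name=None lets boto3 fall back to your configured default region
    client = boto3.client("textract", region_name=region_name)
    response = client.detect_document_text(Document={"Bytes": image_bytes})
    return "\n".join(lines_from_textract(response))
```

Keeping the response parsing in a separate pure function makes it easy to unit-test with a canned response, without network access or AWS credentials.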
Prevention
In production, use a Docker image with Tesseract baked in (RUN apt-get update && apt-get install -y tesseract-ocr tesseract-ocr-eng tesseract-ocr-fra), or migrate to a cloud OCR service (AWS Textract, Google Cloud Vision, Azure Computer Vision), which removes the local dependency entirely. For development, document the Tesseract installation in the README and add a CI/CD step (GitHub Actions, GitLab CI) that installs Tesseract and the required language packs before tests run.