High severity · Intermediate · Fix: 5-15 min

TesseractNotFoundError or language pack missing

pytesseract.TesseractNotFoundError: tesseract is not installed or it's not in your PATH

What this error means

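pytesseract's image_to_string() fails because pytesseract cannot launch the Tesseract OCR executable: either Tesseract is not installed or its binary is not on PATH. A missing language data file (like fra.traineddata) is a related but separate failure: Tesseract starts, cannot load the language, and pytesseract raises TesseractError instead.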

Stack trace

traceback

pytesseract.TesseractNotFoundError: tesseract is not installed or it's not in your PATH. See README file for more information.

During handling of the above exception, another exception occurred:
Traceback (most recent call last):
  File "<your_script>.py", line 42, in <module>
    text = pytesseract.image_to_string(img, lang='fra')
  File "/usr/local/lib/python3.11/site-packages/pytesseract/pytesseract.py", line 394, in image_to_string
    return run_tesseract(image, 'txt', config=config, nice=nice)
  File "/usr/local/lib/python3.11/site-packages/pytesseract/pytesseract.py", line 258, in run_tesseract
    proc = subprocess.Popen(cmd, **subprocess_args())
FileNotFoundError: [Errno 2] No such file or directory: 'tesseract'

QUICK FIX

Install the Tesseract system binary and language packs (sudo apt-get install tesseract-ocr tesseract-ocr-eng). On Windows or with a non-standard install, also set pytesseract.pytesseract.tesseract_cmd to the full path of the executable.

Why it happens

pytesseract is a Python wrapper around the Tesseract OCR engine, which is a separate C++ binary that must be installed on your system. Tesseract also requires trained language data files (eng.traineddata, fra.traineddata, etc.) in its tessdata directory to recognize text in specific languages. When you call image_to_string(lang='fra'), pytesseract runs the tesseract executable with the language set to fra. If the executable cannot be found, the call raises TesseractNotFoundError; if the executable runs but fra.traineddata is missing, Tesseract exits with an error that pytesseract surfaces as TesseractError. Both failures are especially common in fresh Docker containers, CI/CD pipelines, and Windows systems where Tesseract installation is manual.

Detection

Before calling image_to_string(), check if Tesseract is installed and accessible via subprocess.run(['tesseract', '--version'], capture_output=True). Log the result and available language packs via subprocess.run(['tesseract', '--list-langs'], capture_output=True). Catch TesseractNotFoundError explicitly and provide install instructions in the error message.
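A minimal sketch of that pre-flight check, using only the standard library. The helper names (tesseract_version, parse_langs, available_languages) are illustrative and not part of pytesseract. The parser assumes that --list-langs prints a header line starting with "List of available languages" followed by one language code per line.

```python
import subprocess


def tesseract_version(cmd="tesseract"):
    """Return the first line of `<cmd> --version`, or None if cmd cannot be run or prints nothing."""
    try:
        proc = subprocess.run([cmd, "--version"], capture_output=True, text=True)
    except OSError:  # covers FileNotFoundError and PermissionError
        return None
    output = (proc.stdout or proc.stderr).strip()
    return output.splitlines()[0] if output else None


def parse_langs(output):
    """Extract language codes from `tesseract --list-langs` output, skipping the header line."""
    lines = (line.strip() for line in output.splitlines())
    return [line for line in lines if line and not line.startswith("List of")]


def available_languages(cmd="tesseract"):
    """Return installed language codes, or an empty list if cmd cannot be run."""
    try:
        proc = subprocess.run([cmd, "--list-langs"], capture_output=True, text=True)
    except OSError:
        return []
    # Assumption: some older releases print the list to stderr, so both streams are parsed.
    # Any warning lines Tesseract prints would also end up in the result.
    return parse_langs(proc.stdout + "\n" + proc.stderr)


if __name__ == "__main__":
    version = tesseract_version()
    if version is None:
        print("Tesseract not found. Install it, e.g.: sudo apt-get install tesseract-ocr")
    else:
        print(version, "| languages:", available_languages())
```

If you have set a custom executable path for pytesseract, pass that same path as cmd so the check exercises the binary pytesseract will actually call.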

Causes & fixes

Tesseract OCR binary is not installed on the system or not in PATH

✓ Fix

Install Tesseract. Ubuntu/Debian: 'sudo apt-get install tesseract-ocr'; macOS: 'brew install tesseract'; Windows: download the installer from https://github.com/UB-Mannheim/tesseract/wiki or run 'choco install tesseract'.

Language pack (e.g., eng.traineddata, fra.traineddata) is missing from Tesseract's tessdata directory

✓ Fix

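Install language packs. Ubuntu/Debian: 'sudo apt-get install tesseract-ocr-fra tesseract-ocr-deu' (one package per language code). Alternatively, download the .traineddata files manually from https://github.com/tesseract-ocr/tessdata and place them in Tesseract's tessdata folder. On Windows, the UB-Mannheim installer (https://github.com/UB-Mannheim/tesseract/wiki) lets you select additional languages during setup.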
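To confirm which data files a given lang value needs, a small check like the following may help. It is a sketch under two assumptions: that a combined value such as 'eng+fra' needs one .traineddata file per code, and that you know where your tessdata directory is. The function names and the default path are hypothetical; the real directory depends on your platform and Tesseract version.

```python
import os


def required_traineddata(lang):
    """Map a lang string such as 'eng+fra' to the data files Tesseract needs."""
    return [f"{code}.traineddata" for code in lang.split("+") if code]


def missing_traineddata(lang, tessdata_dir):
    """Return the required .traineddata files that are absent from tessdata_dir."""
    return [
        name
        for name in required_traineddata(lang)
        if not os.path.isfile(os.path.join(tessdata_dir, name))
    ]


if __name__ == "__main__":
    # Hypothetical default location; adjust it to match your install.
    tessdata_dir = os.environ.get("TESSDATA_PREFIX", "/usr/share/tesseract-ocr/5/tessdata")
    print("Missing:", missing_traineddata("eng+fra", tessdata_dir))
```

Running 'tesseract --list-langs' remains the more reliable check, since it reports what the installed binary actually sees.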

pytesseract cannot find Tesseract executable because PATH is not set or Tesseract is in a non-standard location

✓ Fix

Explicitly tell pytesseract where Tesseract is installed by setting pytesseract.pytesseract.tesseract_cmd = '/path/to/tesseract' before calling image_to_string()
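One way to avoid hard-coding a single path is to probe a few likely locations first. This is a sketch, not pytesseract functionality: find_tesseract, the TESSERACT_CMD environment variable, and the candidate list are all assumptions chosen for illustration.

```python
import os
import shutil

# Common install locations (assumed; extend for your environment).
COMMON_PATHS = (
    r"C:\Program Files\Tesseract-OCR\tesseract.exe",
    "/usr/bin/tesseract",
    "/usr/local/bin/tesseract",
    "/opt/homebrew/bin/tesseract",
)


def find_tesseract(env_var="TESSERACT_CMD", name="tesseract", candidates=COMMON_PATHS):
    """Return a path to the Tesseract executable, or None if none is found.

    Order: explicit override from the environment, then PATH, then known locations.
    """
    override = os.environ.get(env_var)
    if override and os.path.isfile(override):
        return override
    on_path = shutil.which(name)
    if on_path:
        return on_path
    for path in candidates:
        if os.path.isfile(path):
            return path
    return None


# Usage (assumes pytesseract is installed):
#   import pytesseract
#   path = find_tesseract()
#   if path is not None:
#       pytesseract.pytesseract.tesseract_cmd = path
```

Checking the environment variable first lets a deployment override the location without a code change.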

Running in Docker/container without installing Tesseract in the image

✓ Fix

Add 'RUN apt-get update && apt-get install -y tesseract-ocr tesseract-ocr-eng tesseract-ocr-fra' to your Dockerfile. Stock images such as 'python:3.11-slim' do not include Tesseract, so either install it in your own image or start from a base image that already bundles it.

Code: broken vs fixed

Broken - triggers the error

python

import pytesseract
from PIL import Image

# Load image
img = Image.open('french_document.png')

# This will fail if Tesseract or French language pack not installed
text = pytesseract.image_to_string(img, lang='fra')  # ← ERROR: language pack missing or tesseract not found
print(text)

Fixed - works correctly

python

import subprocess

import pytesseract
from PIL import Image

# Step 1: Set the Tesseract path (only needed on Windows or non-standard installs).
# With apt or Homebrew installs, tesseract is normally on PATH and no override is needed.
# Windows: pytesseract.pytesseract.tesseract_cmd = r'C:\Program Files\Tesseract-OCR\tesseract.exe'
# Apple Silicon Homebrew: pytesseract.pytesseract.tesseract_cmd = '/opt/homebrew/bin/tesseract'

# Step 2: Verify Tesseract is installed and check available languages,
# using the same executable that pytesseract will call
tesseract_cmd = pytesseract.pytesseract.tesseract_cmd
try:
    result = subprocess.run([tesseract_cmd, '--list-langs'], capture_output=True, text=True)
except FileNotFoundError:
    raise RuntimeError('Tesseract not installed. Install: sudo apt-get install tesseract-ocr') from None

# Parse one language code per line, skipping the "List of available languages" header.
# Both streams are read because some older releases print the list to stderr.
output = result.stdout + '\n' + result.stderr
langs = [line.strip() for line in output.splitlines()
         if line.strip() and not line.startswith('List of')]
for code in ('eng', 'fra'):
    if code not in langs:
        raise RuntimeError(f'{code} language pack not found. Install: sudo apt-get install tesseract-ocr-{code}')
print('Available languages:', ', '.join(langs))

# Step 3: Load image and extract text
img = Image.open('french_document.png')
text = pytesseract.image_to_string(img, lang='fra')  # ← FIXED: Tesseract and language packs verified
print('Extracted text:', text)

Added explicit Tesseract path configuration, pre-flight verification of Tesseract installation and language pack availability via subprocess, and clear error messages guiding users to install missing components before calling image_to_string().

⚠

Workaround

If you cannot install Tesseract system-wide, use AWS Textract (managed OCR) via boto3: client.detect_document_text(Document={'Bytes': image_bytes}) bypasses Tesseract entirely. Alternatively, use Google Cloud Vision API (google-cloud-vision) or Azure Computer Vision, which are cloud-hosted and need no local language packs. Language coverage differs between these services, so check that your target languages are supported before migrating.
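A minimal sketch of the Textract route, assuming boto3 is installed and AWS credentials are configured. The function names are illustrative. The parser relies on the response containing a "Blocks" list whose LINE entries carry a "Text" field.

```python
def lines_from_textract(response):
    """Collect the text of LINE blocks from a detect_document_text response."""
    return [
        block["Text"]
        for block in response.get("Blocks", [])
        if block.get("BlockType") == "LINE" and "Text" in block
    ]


def ocr_with_textract(image_path, region_name=None):
    """Send a local image to Textract and return the recognized lines of text.

    boto3 is imported lazily so the parser above stays usable without it.
    """
    import boto3

    client = boto3.client("textract", region_name=region_name)
    with open(image_path, "rb") as f:
        response = client.detect_document_text(Document={"Bytes": f.read()})
    return lines_from_textract(response)
```

Keeping the response parsing separate from the network call makes it possible to unit-test the parsing with a stubbed response and no AWS access.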

✓

Prevention

In production, use Docker with Tesseract baked into the image (RUN apt-get update && apt-get install -y tesseract-ocr tesseract-ocr-eng tesseract-ocr-fra), or migrate to cloud OCR services (AWS Textract, Google Cloud Vision, Azure Computer Vision), which remove the local dependency entirely. For development, document the Tesseract installation steps in the README and add a CI/CD step (GitHub Actions, GitLab CI) that installs Tesseract and the required language packs before tests run.
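A fail-fast check at application or CI startup can turn a missing language pack into an actionable message. The sketch below is illustrative: the function names are hypothetical, and the package naming (tesseract-ocr-<code>, with underscores replaced by hyphens) is an assumption about Debian/Ubuntu repositories that you should verify for your distribution.

```python
def apt_package(code):
    """Assumed Debian/Ubuntu naming: tesseract-ocr-<code>, underscores become hyphens."""
    return "tesseract-ocr-" + code.replace("_", "-")


def install_hint(required, available):
    """Return an apt-get command for the missing language codes, or None if all are present."""
    missing = [code for code in required if code not in available]
    if not missing:
        return None
    return "sudo apt-get install -y " + " ".join(apt_package(c) for c in missing)


def check_languages(required, available):
    """Exit with an install hint when any required language is unavailable."""
    hint = install_hint(required, available)
    if hint is not None:
        raise SystemExit(f"Missing Tesseract language data. Try: {hint}")
```

Here available would be the list of codes reported by 'tesseract --list-langs' on the machine running the check.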

Python 3.8+ · pytesseract >=0.3.0 · tested on 0.3.13

Verified 2026-04
