
How to use pdfplumber in Python

Q: How to use pdfplumber in Python

Use the pdfplumber library in Python to open PDF files and extract text or tables by iterating over pages and calling methods like page.extract_text() or page.extract_table().

Setup

Install

bash

pip install pdfplumber

Imports

python

import pdfplumber

Examples

in: Extract text from a single-page PDF named 'sample.pdf'.

out: Extracted text from the first page of 'sample.pdf', printed.

in: Extract all text from a multi-page PDF 'report.pdf'.

out: Concatenated text from all pages of 'report.pdf', printed.

in: Extract tables from 'data.pdf' and print them as lists.

out: The tables extracted from each page, printed as lists of rows.

Integration steps

Install pdfplumber via pip.
Import pdfplumber in your Python script.
Open the PDF file using pdfplumber.open().
Iterate over pages to extract text or tables.
Process or print the extracted content.
Close the PDF file after extraction; a with block does this automatically.

Full code

python

import pdfplumber

pdf_path = "sample.pdf"

with pdfplumber.open(pdf_path) as pdf:
    # Extract text from all pages
    full_text = ""
    for page in pdf.pages:
        text = page.extract_text()
        if text:
            full_text += text + "\n"

print("Extracted Text:")
print(full_text)

# Example: Extract tables from first page
with pdfplumber.open(pdf_path) as pdf:
    first_page = pdf.pages[0]
    tables = first_page.extract_tables()
    print("Extracted Tables:")
    for table in tables:
        for row in table:
            print(row)

output

Extracted Text:
This is the text content extracted from sample.pdf.

Extracted Tables:
['Header1', 'Header2', 'Header3']
['Row1Col1', 'Row1Col2', 'Row1Col3']
['Row2Col1', 'Row2Col2', 'Row2Col3']

API trace

Request

No API request; pdfplumber is a local Python library that reads PDF files directly.

Response

Returns Python objects: extract_text() returns a string, and extract_tables() returns a list of tables, where each table is a list of rows and each row is a list of cell values.

Extract: Call page.extract_text() for text or page.extract_tables() for tables on each pdfplumber page object.
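Because extract_tables() returns plain nested lists, a table can be reshaped with ordinary Python. The sketch below assumes the first row of the table is a header row, which is not guaranteed for every PDF. The name rows_to_dicts is a hypothetical helper, not part of pdfplumber, and the sample data is made up to mirror the shape of one extracted table.

```python
def rows_to_dicts(table):
    """Turn one extracted table (a list of rows) into a list of dicts.

    Assumption: the first row holds the column headers. Cells may be None
    when a cell is empty, so they are normalised to "".
    """
    if not table:
        return []
    header = [(cell or "").strip() for cell in table[0]]
    records = []
    for row in table[1:]:
        cells = [(cell or "").strip() for cell in row]
        records.append(dict(zip(header, cells)))
    return records


# Made-up sample shaped like one entry of page.extract_tables()
sample_table = [
    ["Header1", "Header2"],
    ["Row1Col1", None],
]
records = rows_to_dicts(sample_table)
print(records)  # [{'Header1': 'Row1Col1', 'Header2': ''}]
```

With a real PDF, the same helper would be called on each table returned by page.extract_tables().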

Variants

Extract text from a specific page only

When you only need text from a specific page instead of the entire document.

python

import pdfplumber

pdf_path = "sample.pdf"
page_number = 2  # zero-based index, so this selects the third page

with pdfplumber.open(pdf_path) as pdf:
    if page_number < len(pdf.pages):
        page = pdf.pages[page_number]
        text = page.extract_text()
        print(f"Text from page {page_number + 1}:")
        print(text)
    else:
        print("Page number out of range.")

Extract tables from all pages

When you want to extract and process tables from every page in a PDF.

python

import pdfplumber

pdf_path = "data.pdf"

with pdfplumber.open(pdf_path) as pdf:
    for i, page in enumerate(pdf.pages):
        tables = page.extract_tables()
        print(f"Tables on page {i + 1}:")
        for table in tables:
            for row in table:
                print(row)
            print("---")
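Printing rows is handy for a quick look, but extracted tables are often saved for later use. The following is a minimal sketch that writes one table to CSV with the standard csv module. write_table_csv is a hypothetical helper name, the sample table is invented, and an in-memory buffer stands in for a real output file.

```python
import csv
import io


def write_table_csv(table, out_file):
    """Write one extracted table (a list of rows) to an open text file as CSV."""
    writer = csv.writer(out_file)
    for row in table:
        # Empty cells may be None; write them as empty strings.
        writer.writerow(["" if cell is None else cell for cell in row])


# Made-up sample shaped like one table from page.extract_tables()
sample_table = [["Name", "Qty"], ["Widget", None]]

buffer = io.StringIO()  # stands in for open("tables.csv", "w", newline="")
write_table_csv(sample_table, buffer)
csv_text = buffer.getvalue()
print(csv_text)
```

In the loop above, the same function could be called once per table with a file opened in text mode using newline="".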

Extract text with layout information

When you need detailed character-level layout or position data from the PDF.

python

import pdfplumber

pdf_path = "sample.pdf"

with pdfplumber.open(pdf_path) as pdf:
    page = pdf.pages[0]
    chars = page.chars  # list of character dicts with position info
    for char in chars:
        print(f"Char: {char['text']} at ({char['x0']}, {char['top']})")
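To illustrate what the position data makes possible, here is a naive sketch that groups character dicts into lines by their 'top' coordinate and orders each line by 'x0'. The helper name group_chars_into_lines and the tolerance of 2 points are assumptions chosen for illustration, and the sample characters are invented. For real documents, pdfplumber's own extract_text() and extract_words() are likely to be more robust than this banding approach.

```python
from collections import defaultdict


def group_chars_into_lines(chars, tolerance=2.0):
    """Group character dicts into text lines using their 'top' coordinate.

    `tolerance` is an assumed band height: characters whose 'top' value
    rounds into the same band are treated as belonging to one line.
    """
    bands = defaultdict(list)
    for char in chars:
        band = round(char["top"] / tolerance)
        bands[band].append(char)

    lines = []
    for band in sorted(bands):  # top of the page first
        ordered = sorted(bands[band], key=lambda c: c["x0"])  # left to right
        lines.append("".join(c["text"] for c in ordered))
    return lines


# Made-up sample using the same keys as entries in page.chars
sample_chars = [
    {"text": "i", "x0": 16.0, "top": 10.2},
    {"text": "H", "x0": 10.0, "top": 10.0},
    {"text": "!", "x0": 10.0, "top": 30.0},
]
lines = group_chars_into_lines(sample_chars)
print(lines)  # ['Hi', '!']
```

Characters that sit near a band boundary can be split across two lines, which is one reason this is only a sketch.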

Performance

Latency: ~100-500 ms per page, depending on PDF complexity and system speed

Cost: Free, open-source library with no API usage costs

Rate limits: None; runs locally without network calls

Extract only needed pages to reduce processing time.
Avoid extracting images or complex objects if only text is needed.
Cache extracted text if processing the same PDF multiple times.
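The caching tip can be sketched as a small in-memory cache keyed by file path and modification time, so a changed PDF is extracted again. Both cached_extract and extract_with_pdfplumber are hypothetical helper names, and the cache lives only for the lifetime of the process. pdfplumber is imported lazily here so the cache helper itself has no dependency on it.

```python
import os

_text_cache = {}


def cached_extract(path, extract_fn):
    """Return extract_fn(path), reusing the result while the file is unchanged.

    The cache key includes the file's modification time, so editing or
    replacing the PDF leads to a fresh extraction.
    """
    key = (os.path.abspath(path), os.path.getmtime(path))
    if key not in _text_cache:
        _text_cache[key] = extract_fn(path)
    return _text_cache[key]


def extract_with_pdfplumber(path):
    """Extract text from every page; requires pdfplumber to be installed."""
    import pdfplumber  # lazy import: only needed when this function runs

    with pdfplumber.open(path) as pdf:
        return "\n".join(page.extract_text() or "" for page in pdf.pages)
```

A call such as cached_extract("report.pdf", extract_with_pdfplumber) would then parse the file once and serve later calls from memory.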

Approach	Latency	Cost/call	Best for
Full document text extraction	~300ms per page	Free	Complete text extraction
Single page extraction	~100ms	Free	Quick access to specific page text
Table extraction	~400ms per page	Free	Extracting structured tables from PDFs

Quick tip

Use the with pdfplumber.open(...) context manager to ensure files are properly closed after extraction.

Common mistake

Forgetting to handle pages with no extractable text. Depending on the pdfplumber version, page.extract_text() returns None or an empty string for such pages, so check the result before using it.
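One way to guard against pages with no text is to fall back to an empty string before using the result. The sketch below uses a hypothetical helper, join_page_texts, and a FakePage stand-in class so the guard can be shown without a real PDF. With pdfplumber, the pages argument would be pdf.pages.

```python
def join_page_texts(pages):
    """Join text from page-like objects, skipping pages with no text.

    `pages` is any iterable whose items have an extract_text() method,
    such as pdf.pages from an open pdfplumber PDF.
    """
    parts = []
    for page in pages:
        text = page.extract_text() or ""  # covers both None and ""
        if text.strip():
            parts.append(text)
    return "\n".join(parts)


class FakePage:
    """Stand-in for a pdfplumber page, used only to demonstrate the guard."""

    def __init__(self, text):
        self._text = text

    def extract_text(self):
        return self._text


demo = join_page_texts([FakePage("Page one"), FakePage(None), FakePage("")])
print(demo)  # Page one
```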

Verified 2026-04
