
How to use Docling for document parsing

Quick answer
Install the docling Python package, set your API key in an environment variable, and call DoclingClient.parse_document() with the path to your file. The call returns the extracted text, and any structured fields it finds, from PDFs, images, and scanned documents.

Prerequisites

  • Python 3.8+
  • Docling API key
  • pip install docling

Setup

Install the docling Python package and set your API key as an environment variable for authentication.

bash
pip install docling
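
This guide reads the key from the DOCLING_API_KEY environment variable but does not show how to handle a missing value. A minimal sketch follows, assuming that variable name; load_api_key is a hypothetical helper and not part of any SDK. It fails early with a readable message instead of a bare KeyError.

```python
import os


def load_api_key(var_name: str = "DOCLING_API_KEY") -> str:
    """Return the API key stored in the given environment variable.

    load_api_key is a hypothetical helper, not part of any SDK.
    """
    # Treat a missing variable and a whitespace-only value the same way
    key = os.environ.get(var_name, "").strip()
    if not key:
        raise RuntimeError(
            f"{var_name} is not set. Export it in your shell before running, "
            f"for example: export {var_name}=your-key"
        )
    return key
```

The returned string can then be passed wherever the key is needed, for example as the api_key argument used in this guide.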

Step by step

Use the DoclingClient to parse a document file and extract structured content. The example below shows how to parse a PDF and print the extracted text.

python
import os
from docling import DoclingClient

# Set your Docling API key in environment variable DOCLING_API_KEY
client = DoclingClient(api_key=os.environ["DOCLING_API_KEY"])

# Path to your document file
file_path = "sample_document.pdf"

# Parse the document
response = client.parse_document(file_path)

# Print extracted text content
print("Extracted text:")
print(response.text)

# Optionally, access structured fields
if hasattr(response, 'fields'):
    print("Extracted fields:")
    for field, value in response.fields.items():
        print(f"{field}: {value}")
output
Extracted text:
This is the text content extracted from the document.
Extracted fields:
InvoiceNumber: 12345
Date: 2026-04-01
TotalAmount: $250.00
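
If the DoclingClient import fails in your environment, note that the open-source docling package on PyPI documents a different entry point: a DocumentConverter class that runs locally and takes no API key. The sketch below assumes that API. convert_to_markdown is a hypothetical wrapper name, and the docling import is deferred so that the path check runs even when the package is not installed.

```python
from pathlib import Path


def convert_to_markdown(source: str) -> str:
    """Convert a local document to Markdown with docling's DocumentConverter.

    convert_to_markdown is a hypothetical wrapper, not a docling function.
    """
    path = Path(source)
    if not path.is_file():
        raise FileNotFoundError(f"No such document: {path}")

    # Deferred import: the check above works without docling installed
    from docling.document_converter import DocumentConverter

    converter = DocumentConverter()
    result = converter.convert(path)
    # result.document holds the parsed document; export it as Markdown text
    return result.document.export_to_markdown()
```

Calling convert_to_markdown("sample_document.pdf") would return the document as a Markdown string. The first conversion may take a while, since docling downloads its models on first use.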

Common variations

The same call handles other document types, such as images and scanned PDFs: pass either a file path or the raw bytes. Docling also supports asynchronous parsing, shown in the example that follows, and batch processing of multiple documents.

python
import asyncio
import os
from docling import DoclingClient

async def async_parse():
    # Read the API key from the DOCLING_API_KEY environment variable
    client = DoclingClient(api_key=os.environ["DOCLING_API_KEY"])
    # Await the asynchronous parse of a scanned image
    response = await client.parse_document_async("invoice_scan.jpg")
    print("Async extracted text:", response.text)

asyncio.run(async_parse())
output
Async extracted text: This is the text content extracted from the scanned image.
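
Asynchronous parsing only pays off when several documents are in flight at once. The sketch below shows one way to do that with asyncio.gather and a semaphore that caps concurrency. parse_many and max_concurrent are hypothetical names, and the helper accepts any coroutine function that takes a path, so it makes no assumptions about the client itself.

```python
import asyncio
from typing import Awaitable, Callable, Iterable, List, TypeVar

T = TypeVar("T")


async def parse_many(
    parse_one: Callable[[str], Awaitable[T]],
    paths: Iterable[str],
    max_concurrent: int = 4,
) -> List[T]:
    """Run parse_one over every path, at most max_concurrent at a time.

    Results are returned in the same order as the input paths.
    """
    semaphore = asyncio.Semaphore(max_concurrent)

    async def guarded(path: str) -> T:
        # The semaphore limits how many parses run at the same time
        async with semaphore:
            return await parse_one(path)

    return await asyncio.gather(*(guarded(p) for p in paths))
```

With the client from the example above, this could be called inside a coroutine as await parse_many(client.parse_document_async, ["a.pdf", "b.pdf"]). A cap on concurrency helps you avoid flooding a remote service or hitting rate limits.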

Troubleshooting

  • If you get authentication errors, verify your DOCLING_API_KEY environment variable is set correctly.
  • For file not found errors, check the file path and permissions.
  • If parsing results are empty, ensure the document is supported and not corrupted.
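
The first two checks in this list can be automated before any parsing call is made. The following is a minimal sketch using only the standard library. preflight is a hypothetical helper name, and DOCLING_API_KEY is the variable name used in this guide.

```python
import os
from pathlib import Path
from typing import List


def preflight(file_path: str, key_var: str = "DOCLING_API_KEY") -> List[str]:
    """Return a list of human-readable problems; an empty list means OK."""
    problems = []

    # Authentication: the key must be present and non-blank
    if not os.environ.get(key_var, "").strip():
        problems.append(f"environment variable {key_var} is not set")

    # File checks: existence, read permission, and non-zero size
    path = Path(file_path)
    if not path.is_file():
        problems.append(f"file not found: {path}")
    elif not os.access(path, os.R_OK):
        problems.append(f"file is not readable: {path}")
    elif path.stat().st_size == 0:
        problems.append(f"file is empty: {path}")

    return problems
```

An empty list means the basic checks passed. It does not prove the document is in a supported format or free of corruption, so the third item in the list still needs a manual look.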

Key Takeaways

  • Install the docling package and set your API key via environment variables before use.
  • Use DoclingClient.parse_document() for synchronous parsing of PDFs and images.
  • Use parse_document_async() when you need to parse several files concurrently rather than one after another.
  • Check for authentication and file path issues if parsing fails or returns empty results.
Verified 2026-04 · DoclingClient