How to extract text from image with GPT-4o

Quick answer

Send the image to gpt-4o through the Chat Completions API as an image_url content part in the user message, using either a public URL or a base64-encoded data URL (for example data:image/png;base64,...). There is no separate image_base64 field. The model reads the image and returns the text it finds in the content of the response message.

PREREQUISITES

Python 3.8+
OpenAI API key (free tier works)
pip install "openai>=1.0"

Setup

Install the official OpenAI Python SDK and set your API key as an environment variable.

bash

pip install "openai>=1.0"
export OPENAI_API_KEY="your-api-key"
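
Before making any request, it can help to confirm that the key is actually visible to Python. The following is a minimal sketch; the helper name api_key_is_set is hypothetical and not part of the OpenAI SDK.

```python
import os


def api_key_is_set(env=os.environ) -> bool:
    """Return True if OPENAI_API_KEY is present and non-empty in the given mapping.

    Hypothetical helper for illustration; it only inspects the environment
    and never contacts the API.
    """
    return bool(env.get("OPENAI_API_KEY", "").strip())


print("OPENAI_API_KEY set:", api_key_is_set())
```

If this prints False, export the variable in the same shell session that runs your script.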

Step by step

This example shows how to send an image file as base64 to gpt-4o for text extraction. The response contains the extracted text.

python

import os
import base64
from openai import OpenAI

client = OpenAI(api_key=os.environ["OPENAI_API_KEY"])

# Load image and encode as base64
with open("image.png", "rb") as image_file:
    image_base64 = base64.b64encode(image_file.read()).decode("utf-8")

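# Image input goes in a content list: one text part plus one image_url part.
# A local file is passed as a base64 data URL (adjust the MIME type for JPEG etc.).
messages = [
    {
        "role": "user",
        "content": [
            {"type": "text", "text": "Extract the text from this image."},
            {"type": "image_url", "image_url": {"url": f"data:image/png;base64,{image_base64}"}},
        ],
    }
]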

response = client.chat.completions.create(
    model="gpt-4o",
    messages=messages
)

print("Extracted text:", response.choices[0].message.content)

output

Extracted text: "OpenAI GPT-4o multimodal text extraction example."
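
The encoding step can be wrapped in small reusable helpers. The sketch below assumes the file extension reflects the real image type; the names image_to_data_url and build_ocr_messages are hypothetical helpers, not SDK functions. It builds the request payload only and makes no API call.

```python
import base64
import mimetypes


def image_to_data_url(path: str) -> str:
    """Read an image file and return it as a base64 data URL.

    The MIME type is guessed from the file extension (an assumption:
    the extension is trusted, the bytes are not inspected).
    """
    mime_type, _ = mimetypes.guess_type(path)
    if mime_type is None or not mime_type.startswith("image/"):
        raise ValueError(f"Cannot determine an image MIME type for {path!r}")
    with open(path, "rb") as image_file:
        encoded = base64.b64encode(image_file.read()).decode("utf-8")
    return f"data:{mime_type};base64,{encoded}"


def build_ocr_messages(data_url: str,
                       prompt: str = "Extract the text from this image.") -> list:
    """Build a Chat Completions messages list with one text part and one image part."""
    return [
        {
            "role": "user",
            "content": [
                {"type": "text", "text": prompt},
                {"type": "image_url", "image_url": {"url": data_url}},
            ],
        }
    ]
```

The returned list can be passed as the messages argument of client.chat.completions.create, for example build_ocr_messages(image_to_data_url("image.png")).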

Common variations

If the image is hosted online, pass its public URL in the image_url part instead of a base64 data URL.
Use gpt-4o-mini for a smaller, faster model that also accepts image input.
For async usage, use the SDK's AsyncOpenAI client and await client.chat.completions.create(...) inside an asyncio event loop.
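
The variations above can be combined in one sketch: a hosted image URL, the gpt-4o-mini model, and the async client. The function names and the example.com URL are placeholders for illustration. The SDK import sits inside the async function so the payload helper works without the SDK installed, and the real API call is left commented out.

```python
import asyncio


def build_url_messages(image_url: str,
                       prompt: str = "Extract the text from this image.") -> list:
    """Build a messages list that points at a hosted image (hypothetical helper)."""
    return [
        {
            "role": "user",
            "content": [
                {"type": "text", "text": prompt},
                {"type": "image_url", "image_url": {"url": image_url}},
            ],
        }
    ]


async def extract_text_async(image_url: str, model: str = "gpt-4o-mini") -> str:
    """Send one image URL to the model and return the text of the reply."""
    from openai import AsyncOpenAI  # imported here so the helper above has no SDK dependency

    client = AsyncOpenAI()  # reads OPENAI_API_KEY from the environment
    response = await client.chat.completions.create(
        model=model,
        messages=build_url_messages(image_url),
    )
    return response.choices[0].message.content


# Example call (performs a real API request, so it is left commented out):
# print(asyncio.run(extract_text_async("https://example.com/receipt.png")))
```

The URL must be reachable by OpenAI's servers; for private or local files, a base64 data URL is the safer choice.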

Troubleshooting

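If you get an error about an unsupported image format, make sure the image is PNG, JPEG, WEBP, or non-animated GIF, and that the MIME type in the data URL matches the actual file.
If the response is empty or unrelated, check that the base64 string is intact and sent as a data URL inside an image_url content part, and that the model is a vision-capable one such as gpt-4o.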
Check your API key and usage limits if requests fail.
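
A format problem can be caught locally before sending anything. The sketch below identifies the image type from its leading bytes instead of trusting the file extension. The function name detect_image_format is hypothetical, and the set of formats it recognizes (PNG, JPEG, GIF, WEBP) is an assumption based on the formats commonly listed for vision input; it does not check whether a GIF is animated.

```python
from typing import Optional


def detect_image_format(data: bytes) -> Optional[str]:
    """Return 'png', 'jpeg', 'gif', or 'webp' based on magic bytes, else None."""
    if data.startswith(b"\x89PNG\r\n\x1a\n"):
        return "png"
    if data.startswith(b"\xff\xd8\xff"):
        return "jpeg"
    if data[:6] in (b"GIF87a", b"GIF89a"):
        return "gif"
    # WEBP files are RIFF containers with 'WEBP' at byte offset 8.
    if data[:4] == b"RIFF" and data[8:12] == b"WEBP":
        return "webp"
    return None


def mime_type_for(data: bytes) -> str:
    """Return the MIME type to use in a data URL, or raise if the format is unknown."""
    image_format = detect_image_format(data)
    if image_format is None:
        raise ValueError("Unrecognized image format; convert the file to PNG or JPEG first")
    return f"image/{image_format}"
```

Calling mime_type_for on the raw file bytes before encoding gives a MIME type that matches the actual content, which avoids a mislabeled data URL.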

Key Takeaways

Send images to gpt-4o as image_url content parts in the user message, using either a base64 data URL or a hosted URL.
The model returns extracted text in the content field of the response message.
You can switch between base64 or URL image inputs depending on your use case.
Ensure your environment has the OpenAI SDK installed and API key set via environment variables.
Check image format and encoding if extraction results are incorrect or errors occur.

Verified 2026-04 · gpt-4o, gpt-4o-mini
