How to extract text from an image with GPT-4o
Quick answer
Use the gpt-4o model's multimodal input: send the image inside the user message's content list as an image_url part, either as a public URL or as a base64-encoded data URL (for example, data:image/png;base64,...). The model processes the image and returns the extracted text in the response message's content.
Prerequisites
- Python 3.8+
- OpenAI API key (free tier works)
- pip install "openai>=1.0"
Setup
Install the official OpenAI Python SDK and set your API key as an environment variable.
pip install "openai>=1.0"
Step by step
This example shows how to send an image file as base64 to gpt-4o for text extraction. The response contains the extracted text.
import os
import base64
from openai import OpenAI

client = OpenAI(api_key=os.environ["OPENAI_API_KEY"])

# Load image and encode as base64
with open("image.png", "rb") as image_file:
    image_base64 = base64.b64encode(image_file.read()).decode("utf-8")

# The image goes inside the content list as an image_url part;
# base64 data is wrapped in a data URL with the matching MIME type.
messages = [
    {
        "role": "user",
        "content": [
            {"type": "text", "text": "Extract the text from this image."},
            {
                "type": "image_url",
                "image_url": {"url": f"data:image/png;base64,{image_base64}"},
            },
        ],
    }
]

response = client.chat.completions.create(
    model="gpt-4o",
    messages=messages,
)
print("Extracted text:", response.choices[0].message.content)
Output
Extracted text: "OpenAI GPT-4o multimodal text extraction example."
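When a local file is sent as base64, it is passed as a data URL whose MIME type should match the file. The following is a minimal sketch of a helper that builds such a URL. The function name image_to_data_url is hypothetical (it is not part of the OpenAI SDK), and it assumes the file extension reflects the real image type.

```python
import base64
import mimetypes


def image_to_data_url(path: str) -> str:
    """Encode a local image file as a base64 data URL (hypothetical helper)."""
    # Guess the MIME type from the file extension, e.g. ".png" -> "image/png".
    mime_type, _ = mimetypes.guess_type(path)
    if mime_type is None or not mime_type.startswith("image/"):
        raise ValueError(f"Cannot determine an image MIME type for {path!r}")
    with open(path, "rb") as image_file:
        encoded = base64.b64encode(image_file.read()).decode("utf-8")
    return f"data:{mime_type};base64,{encoded}"
```

The returned string can be used as the "url" value of an image_url content part, so the same request code works for PNG and JPEG files without editing the MIME type by hand.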
Common variations
- You can also pass a public image URL in the image_url part instead of a base64 data URL if the image is hosted online.
- Use gpt-4o-mini for a faster, lower-cost model that also accepts image input.
- For async usage, use asyncio with the SDK's AsyncOpenAI client.
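The variations above can be combined in one sketch: a hosted image URL, gpt-4o-mini, and the async client. The helper names build_url_messages and extract_text_async are hypothetical, and the sketch assumes OPENAI_API_KEY is set and that the image URL is publicly reachable.

```python
import asyncio
import os


def build_url_messages(image_url: str, prompt: str = "Extract the text from this image.") -> list:
    """Build a chat message that references a hosted image by URL."""
    return [
        {
            "role": "user",
            "content": [
                {"type": "text", "text": prompt},
                {"type": "image_url", "image_url": {"url": image_url}},
            ],
        }
    ]


async def extract_text_async(image_url: str, model: str = "gpt-4o-mini") -> str:
    """Send the image URL to the model using the SDK's async client."""
    # Imported here so build_url_messages stays usable without the SDK installed.
    from openai import AsyncOpenAI

    client = AsyncOpenAI(api_key=os.environ["OPENAI_API_KEY"])
    response = await client.chat.completions.create(
        model=model,
        messages=build_url_messages(image_url),
    )
    return response.choices[0].message.content
```

To run it from a script, call asyncio.run(extract_text_async("https://example.com/receipt.png")), replacing the placeholder URL with a real one. Several images can be processed concurrently by passing multiple coroutines to asyncio.gather.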
Troubleshooting
- If you get an error about an unsupported image format, ensure your image is PNG, JPEG, or another supported format.
- If the response is empty or unrelated, verify that the base64 encoding is valid, that the data URL's MIME type matches the file, and that the model is one that accepts image input, such as gpt-4o.
- Check your API key and usage limits if requests fail.
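The format and encoding checks above can be run locally before calling the API. The sketch below guesses the image type from the file's leading bytes and confirms that a string is valid base64. The function names are hypothetical, and the list of recognized types (PNG, JPEG, GIF, WebP) is an assumption about what the API accepts, so check the current documentation.

```python
import base64
import binascii
from typing import Optional


def sniff_image_format(data: bytes) -> Optional[str]:
    """Return a MIME type guessed from the file's leading bytes, or None."""
    if data.startswith(b"\x89PNG\r\n\x1a\n"):
        return "image/png"
    if data.startswith(b"\xff\xd8\xff"):
        return "image/jpeg"
    if data.startswith((b"GIF87a", b"GIF89a")):
        return "image/gif"
    # WebP files are RIFF containers with "WEBP" at byte offset 8.
    if data[:4] == b"RIFF" and data[8:12] == b"WEBP":
        return "image/webp"
    return None


def is_valid_base64(encoded: str) -> bool:
    """Check that a string decodes as strict base64."""
    try:
        base64.b64decode(encoded, validate=True)
        return True
    except (binascii.Error, ValueError):
        return False
```

If sniff_image_format returns None for the raw file bytes, the file is probably not an image type the sketch recognizes, and converting it to PNG or JPEG first is a reasonable next step.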
Key Takeaways
- Use gpt-4o's multimodal input by sending images as base64 data URLs or hosted URLs in image_url content parts within messages.
- The model returns extracted text in the content field of the response message.
- You can switch between base64 and URL image inputs depending on your use case.
- Ensure your environment has the OpenAI SDK installed and API key set via environment variables.
- Check image format and encoding if extraction results are incorrect or errors occur.