Code beginner · 3 min read

How to use GPT-4o vision in Python

Direct answer
Send a chat completion request with the OpenAI Python SDK, using model gpt-4o and a messages array whose user message contains both text and image parts. GPT-4o reads the image alongside the text and answers in a single response.

Setup

Install
bash
pip install openai
Env vars
OPENAI_API_KEY
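On macOS/Linux you can set the key for the current shell session (the sk-... value below is a placeholder for your real key):

```shell
# Export the API key so the SDK can read it from the environment
export OPENAI_API_KEY="sk-your-key-here"
```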
Imports
python
import os
from openai import OpenAI

Examples

In: Analyze the content of this image and describe it.
Out: The image shows a scenic mountain landscape with a clear blue sky and a lake in the foreground.
In: What objects are in this photo?
Out: The photo contains a dog playing with a ball in a grassy park.
In: Is there any text in this image? If yes, transcribe it.
Out: Yes, the image contains the text 'Welcome to AI Conference 2026'.

Integration steps

  1. Install the OpenAI Python SDK and set your API key in the environment variable OPENAI_API_KEY.
  2. Import OpenAI from the openai package and initialize the client with your API key.
  3. Prepare the messages list including a user message with text and an image URL or base64-encoded image data.
  4. Call client.chat.completions.create with model='gpt-4o' and the prepared messages.
  5. Extract the response text from response.choices[0].message.content to get the model's interpretation of the image.
  6. Handle or display the multimodal output as needed in your application.
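Step 3 mentions base64-encoded image data as an alternative to a URL. A minimal sketch of building such a message, using stand-in bytes so the snippet runs as-is (reading a local file like photo.jpg, shown commented out, is an assumption about your setup):

```python
import base64

# In practice you would read a real file:
# image_bytes = open("photo.jpg", "rb").read()
image_bytes = b"\xff\xd8\xff\xe0stand-in-jpeg-bytes"  # placeholder, not a real image

# Base64-encode the bytes and wrap them in a data URL
b64 = base64.b64encode(image_bytes).decode("utf-8")
data_url = f"data:image/jpeg;base64,{b64}"

# Same message shape as the URL example; only the url value changes
messages = [
    {
        "role": "user",
        "content": [
            {"type": "text", "text": "Describe the content of this image."},
            {"type": "image_url", "image_url": {"url": data_url}},
        ],
    }
]
```

The resulting messages list is passed to client.chat.completions.create exactly as in the full code below; the API accepts either an HTTPS URL or a data URL in the same image_url field.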

Full code

python
import os
from openai import OpenAI

# Initialize client with API key from environment
client = OpenAI(api_key=os.environ["OPENAI_API_KEY"])

# Example image URL to analyze
image_url = "https://images.unsplash.com/photo-1506744038136-46273834b3fb"

messages = [
    {
        "role": "user",
        "content": [
            {"type": "text", "text": "Describe the content of this image."},
            {"type": "image_url", "image_url": {"url": image_url}}
        ]
    }
]

response = client.chat.completions.create(
    model="gpt-4o",
    messages=messages
)

print("Model response:")
print(response.choices[0].message.content)

API trace

Request
json
{"model": "gpt-4o", "messages": [{"role": "user", "content": [{"type": "text", "text": "Describe the content of this image."}, {"type": "image_url", "image_url": {"url": "https://images.unsplash.com/photo-1506744038136-46273834b3fb"}}]}]}
Response
json
{"choices": [{"message": {"content": "The image depicts a beautiful mountain landscape with a clear blue sky..."}}], "usage": {"total_tokens": 150}}
Extract: response.choices[0].message.content

Variants

Streaming GPT-4o Vision Response
python
import os
from openai import OpenAI

client = OpenAI(api_key=os.environ["OPENAI_API_KEY"])

image_url = "https://images.unsplash.com/photo-1506744038136-46273834b3fb"

messages = [
    {
        "role": "user",
        "content": [
            {"type": "text", "text": "Describe the content of this image."},
            {"type": "image_url", "image_url": {"url": image_url}}
        ]
    }
]

stream = client.chat.completions.create(
    model="gpt-4o",
    messages=messages,
    stream=True
)

print("Streaming response:")
for chunk in stream:
    delta = chunk.choices[0].delta.content or ""
    print(delta, end="", flush=True)
print()
Async GPT-4o Vision Call
python
import os
import asyncio
from openai import AsyncOpenAI

async def main():
    # AsyncOpenAI exposes the same interface as OpenAI, with awaitable methods
    client = AsyncOpenAI(api_key=os.environ["OPENAI_API_KEY"])
    image_url = "https://images.unsplash.com/photo-1506744038136-46273834b3fb"
    messages = [
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "Describe the content of this image."},
                {"type": "image_url", "image_url": {"url": image_url}}
            ]
        }
    ]
    response = await client.chat.completions.create(
        model="gpt-4o",
        messages=messages
    )
    print("Async response:")
    print(response.choices[0].message.content)

asyncio.run(main())
Use GPT-4o-mini Vision Model
python
import os
from openai import OpenAI

client = OpenAI(api_key=os.environ["OPENAI_API_KEY"])

image_url = "https://images.unsplash.com/photo-1506744038136-46273834b3fb"

messages = [
    {
        "role": "user",
        "content": [
            {"type": "text", "text": "Summarize this image."},
            {"type": "image_url", "image_url": {"url": image_url}}
        ]
    }
]

response = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=messages
)

print("Mini model response:")
print(response.choices[0].message.content)

Performance

Latency: ~1.2 seconds per request for typical image + text input on gpt-4o
Cost: ~$0.003 per 1K tokens plus a small surcharge for image input on gpt-4o
Rate limits: Tier 1: 300 RPM / 18K TPM for gpt-4o
  • Compress or resize images before encoding to reduce payload size.
  • Use concise prompts to minimize token usage.
  • Cache frequent image analyses to avoid repeated calls.
| Approach | Latency | Cost/call | Best for |
|---|---|---|---|
| Standard gpt-4o vision call | ~1.2s | ~$0.003 per 1K tokens + image | High-quality multimodal understanding |
| Streaming response | ~1.2s start + incremental | Same as standard | Interactive apps needing token-by-token output |
| Async call | ~1.2s concurrent | Same as standard | Concurrent or async frameworks |
| gpt-4o-mini vision | ~0.6s | ~$0.001 per 1K tokens + image | Cost-sensitive or lower-latency use cases |
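The caching tip above can be sketched with a small in-memory cache keyed on the image bytes and prompt. Note that analyze_image and call_model are illustrative names, not part of the SDK; call_model would wrap the client.chat.completions.create call shown earlier:

```python
import hashlib

_cache = {}

def analyze_image(image_bytes, prompt, call_model):
    """Return a cached result when the same image + prompt was seen before."""
    # Hash image bytes together with the prompt to form a stable cache key
    key = hashlib.sha256(image_bytes + prompt.encode("utf-8")).hexdigest()
    if key not in _cache:
        # Only pay for an API call on a cache miss
        _cache[key] = call_model(image_bytes, prompt)
    return _cache[key]

# Demo with a counting stub standing in for a real API call
calls = {"n": 0}
def fake_model(image_bytes, prompt):
    calls["n"] += 1
    return "description"

analyze_image(b"img", "Describe", fake_model)
analyze_image(b"img", "Describe", fake_model)  # second call served from cache
```

For production use you would bound the cache size (for example with an LRU policy) so repeated-but-varied traffic does not grow memory without limit.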

Quick tip

Always include the image as a structured part in the <code>messages</code> array with <code>"type": "image_url"</code>. The <code>url</code> value can be either an HTTPS link or a base64 data URL (<code>data:image/jpeg;base64,...</code>); there is no separate base64 content type.

Common mistake

Beginners often send images as plain text URLs instead of embedding them as structured message content with the correct <code>type</code> field, causing the model to ignore the image.
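A minimal illustration of the difference (example.com is a placeholder URL): in the broken version the URL is ordinary text, so no image is ever sent; in the correct version it travels as a typed image_url part:

```python
# Broken: the URL is just part of the prompt text, so the model sees no image
wrong = [{"role": "user",
          "content": "Describe this image: https://example.com/photo.jpg"}]

# Correct: the image is a structured content part with its own type field
right = [{"role": "user",
          "content": [
              {"type": "text", "text": "Describe this image."},
              {"type": "image_url",
               "image_url": {"url": "https://example.com/photo.jpg"}},
          ]}]
```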

Verified 2026-04 · gpt-4o, gpt-4o-mini