Fix vision model giving wrong description
Quick answer
To fix a vision model giving wrong descriptions, ensure you use a capable multimodal model like
gpt-4o with image input support, provide clear and specific prompts, and preprocess images for clarity. Also, verify the model's context window and update to the latest model version for improved accuracy.

Prerequisites
- Python 3.8+
- OpenAI API key (free tier works)
- pip install openai>=1.0
- Basic knowledge of image processing
Setup
Install the openai Python package and set your API key as an environment variable to access the latest multimodal models.
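For example, the key can be exported in the shell before running any scripts (the value below is a placeholder, not a real key):

```shell
# Placeholder value; substitute your actual API key
export OPENAI_API_KEY="your-key-here"
```

Exporting in the shell keeps the key out of your source code; the Python client then reads it from the environment.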
pip install "openai>=1.0"

Step by step
Use the gpt-4o model with image input. Preprocess the image to improve quality, encode it as a base64 data URL, and send it alongside a clear prompt asking for a description. Check the response for accuracy.
import base64
import os
from openai import OpenAI
from PIL import Image

client = OpenAI(api_key=os.environ["OPENAI_API_KEY"])

# Load and preprocess the image
image = Image.open("input.jpg").convert("RGB")
# Optional: resize or enhance the image for clarity
image = image.resize((512, 512))
image.save("processed.jpg")

# Encode the image as base64 for a data URL
with open("processed.jpg", "rb") as f:
    b64_image = base64.b64encode(f.read()).decode("utf-8")

# Send the image as an image_url content part alongside the text prompt
response = client.chat.completions.create(
    model="gpt-4o",
    messages=[
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "Describe the content of this image accurately."},
                {
                    "type": "image_url",
                    "image_url": {"url": f"data:image/jpeg;base64,{b64_image}"},
                },
            ],
        }
    ],
)
print(response.choices[0].message.content)

Output
A detailed and accurate description of the image content.
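To sanity-check the encoding step independently of the API, a small helper can build the data URL from any image file (the name to_data_url is ours for illustration, not part of the SDK):

```python
import base64


def to_data_url(path: str, mime: str = "image/jpeg") -> str:
    """Read a file and return it as a base64 data URL suitable for image_url."""
    with open(path, "rb") as f:
        encoded = base64.b64encode(f.read()).decode("utf-8")
    return f"data:{mime};base64,{encoded}"
```

If the returned string does not start with data:image/..., the model will reject or silently ignore the image input, which often shows up as a generic or wrong description.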
Common variations
- Use gpt-4o-mini for faster but less detailed descriptions.
- Try asynchronous calls with asyncio for batch processing.
- Use streaming to get partial descriptions as the model generates them.
import asyncio
import base64
import os
from openai import AsyncOpenAI  # the async client is required for awaitable calls

async def async_describe_image():
    client = AsyncOpenAI(api_key=os.environ["OPENAI_API_KEY"])
    with open("processed.jpg", "rb") as f:
        b64_image = base64.b64encode(f.read()).decode("utf-8")
    response = await client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[
            {
                "role": "user",
                "content": [
                    {"type": "text", "text": "Describe this image."},
                    {
                        "type": "image_url",
                        "image_url": {"url": f"data:image/jpeg;base64,{b64_image}"},
                    },
                ],
            }
        ],
    )
    print(response.choices[0].message.content)

asyncio.run(async_describe_image())

Output
A concise description of the image content.
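The streaming variation mentioned above can be sketched as follows, assuming the standard stream=True flag of the Chat Completions API; the helper name build_stream_request is our own illustrative choice:

```python
import base64
import os


def build_stream_request(b64_image: str) -> dict:
    """Assemble kwargs for a streaming chat completion with one image."""
    return {
        "model": "gpt-4o",
        "stream": True,  # ask the API to return incremental chunks
        "messages": [
            {
                "role": "user",
                "content": [
                    {"type": "text", "text": "Describe this image."},
                    {
                        "type": "image_url",
                        "image_url": {"url": f"data:image/jpeg;base64,{b64_image}"},
                    },
                ],
            }
        ],
    }


if os.environ.get("OPENAI_API_KEY"):  # only call the API when a key is configured
    from openai import OpenAI

    client = OpenAI()
    with open("processed.jpg", "rb") as f:
        b64 = base64.b64encode(f.read()).decode("utf-8")
    # Each chunk carries a partial delta; print the pieces as they arrive
    for chunk in client.chat.completions.create(**build_stream_request(b64)):
        if chunk.choices and chunk.choices[0].delta.content:
            print(chunk.choices[0].delta.content, end="", flush=True)
    print()
```

Streaming does not improve accuracy, but it lets you show partial descriptions immediately, which helps when gpt-4o takes several seconds on detailed images.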
Troubleshooting
- If descriptions are inaccurate, improve image quality by preprocessing (resize, enhance contrast).
- Make prompts more specific, e.g., "Describe the objects and colors in this image."
- Ensure you use the latest gpt-4o or an equivalent multimodal model with image input support.
- Check for API errors or token limits that might truncate responses.
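The "enhance contrast" suggestion above can be done with Pillow's ImageEnhance module; a minimal sketch, assuming Pillow is installed (the function name enhance_for_vision is ours):

```python
from PIL import Image, ImageEnhance


def enhance_for_vision(path: str, out_path: str, contrast: float = 1.5) -> None:
    """Boost contrast before sending an image to the model; factor 1.0 = unchanged."""
    image = Image.open(path).convert("RGB")
    image = ImageEnhance.Contrast(image).enhance(contrast)
    image.save(out_path)
```

A factor between 1.2 and 1.8 is usually enough; pushing contrast much higher can clip highlights and shadows and make descriptions worse, so compare outputs before settling on a value.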
Key Takeaways
- Use a capable multimodal model like gpt-4o for accurate image descriptions.
- Preprocess images to improve clarity before sending them to the model.
- Craft clear, specific prompts to guide the model's description output.
- Update to the latest model versions to benefit from improved vision capabilities.