How to build a vision chatbot with GPT-4o
Quick answer
Use the gpt-4o model from the OpenAI API with chat.completions.create to build a vision chatbot by sending messages that include both text and image inputs. The model supports multimodal inputs, allowing you to pass images as URLs or base64-encoded data alongside text prompts for interactive vision-based conversations.
Prerequisites
- Python 3.8+
- OpenAI API key (free tier works)
- pip install openai>=1.0
Setup
Install the OpenAI Python SDK and set your API key as an environment variable to authenticate requests.
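Before running the examples, it can help to confirm the key is actually visible to Python. A minimal sketch (the helper name require_api_key is an invention here, not part of the SDK):

```python
import os


def require_api_key(env=os.environ):
    """Return the API key, or raise a clear error if it is missing."""
    key = env.get("OPENAI_API_KEY")
    if not key:
        raise RuntimeError("Set OPENAI_API_KEY before running the examples.")
    return key
```

Calling require_api_key() at startup fails fast with a readable message instead of a KeyError deep inside a request.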
pip install openai>=1.0
Step by step
This example shows how to send a text message with an image URL to gpt-4o for a vision chatbot interaction.
import os
from openai import OpenAI

client = OpenAI(api_key=os.environ["OPENAI_API_KEY"])

# Text and image go together as content parts of a single user message.
messages = [
    {
        "role": "user",
        "content": [
            {"type": "text", "text": "What is in this image?"},
            {
                "type": "image_url",
                "image_url": {"url": "https://example.com/image.jpg"},
            },
        ],
    }
]

response = client.chat.completions.create(
    model="gpt-4o",
    messages=messages,
)
print(response.choices[0].message.content)
Output
A detailed description of the image content, objects, and context.
Common variations
- Use base64-encoded images by passing a data URL (data:image/jpeg;base64,<data>) in the image_url field; there is no separate image_base64 type.
- Stream responses by adding stream=True to chat.completions.create and iterating over the returned chunks.
- Use other multimodal-capable models such as gpt-4o-mini for a smaller footprint.
import base64
import os
from openai import OpenAI

client = OpenAI(api_key=os.environ["OPENAI_API_KEY"])

# Example with base64 image input: embed the encoded bytes in a data URL.
with open("image.jpg", "rb") as f:
    image_b64 = base64.b64encode(f.read()).decode("utf-8")

messages = [
    {
        "role": "user",
        "content": [
            {"type": "text", "text": "Describe this image."},
            {
                "type": "image_url",
                "image_url": {"url": f"data:image/jpeg;base64,{image_b64}"},
            },
        ],
    }
]

response = client.chat.completions.create(
    model="gpt-4o",
    messages=messages,
)
print(response.choices[0].message.content)
Output
A detailed description of the base64-encoded image content.
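The stream=True variation mentioned under Common variations can be sketched as follows. Since a live stream needs an API key, the chunk objects here are stand-ins built with SimpleNamespace that mimic the shape the SDK yields (choices[0].delta.content); the loop body is the same one you would run over a real streamed response.

```python
from types import SimpleNamespace


def collect_stream(chunks):
    """Join the text deltas of a streamed chat completion into one string."""
    parts = []
    for chunk in chunks:
        delta = chunk.choices[0].delta.content
        if delta:  # the final chunk's delta content is typically None
            parts.append(delta)
    return "".join(parts)


# Stand-in chunks mimicking what `for chunk in response:` yields with stream=True.
fake_chunks = [
    SimpleNamespace(choices=[SimpleNamespace(delta=SimpleNamespace(content=c))])
    for c in ["A photo ", "of a cat.", None]
]
print(collect_stream(fake_chunks))  # A photo of a cat.
```

In a real chatbot you would print each delta as it arrives instead of collecting them, which is what makes streaming feel responsive.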
Troubleshooting
- If you get an authentication error, verify your OPENAI_API_KEY environment variable is set correctly.
- If the model does not recognize the image input, ensure the content part uses "type": "image_url" and, for base64 data, a data: URL in the url field.
- For large images, consider resizing or compressing before encoding to avoid request size limits.
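For the size check in the last point, a small helper can measure the base64 payload before sending. The function names and the 20 MB ceiling are assumptions for illustration; check the current API documentation for the real per-image limit.

```python
import base64


def b64_payload_size(image_bytes: bytes) -> int:
    """Number of bytes the base64 encoding of this image will occupy."""
    return len(base64.b64encode(image_bytes))


def fits_image_limit(image_bytes: bytes, limit_bytes: int = 20 * 1024 * 1024) -> bool:
    # 20 MB is an assumed ceiling here, not a documented guarantee.
    return b64_payload_size(image_bytes) <= limit_bytes


print(b64_payload_size(b"abc"))  # 4  (base64 encodes 3 bytes as 4 characters)
```

Base64 inflates data by roughly a third, so an image near the limit on disk may exceed it once encoded; this check catches that before the request fails.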
Key Takeaways
- Use gpt-4o with multimodal message content to build vision chatbots.
- Pass images as URLs or base64-encoded data (via a data: URL) in the messages array alongside text.
- Set OPENAI_API_KEY in your environment and install the latest openai SDK.
- Streaming and smaller models like gpt-4o-mini offer flexible interaction modes.
- Validate image input format and size to avoid API errors.
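The message shape used throughout can be wrapped in a small builder so both examples share one code path. This is a sketch; vision_message is a name invented here, not an SDK function.

```python
def vision_message(text: str, image_ref: str) -> dict:
    """Build a user message pairing a text prompt with an image reference.

    image_ref may be an https URL or a data: URL carrying base64 image data.
    """
    return {
        "role": "user",
        "content": [
            {"type": "text", "text": text},
            {"type": "image_url", "image_url": {"url": image_ref}},
        ],
    }


msg = vision_message("What is in this image?", "https://example.com/image.jpg")
print(msg["content"][0]["text"])  # What is in this image?
```

The same builder works for both input styles, since a data: URL drops into the url field exactly like an https URL.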