How to build a vision chatbot with GPT-4o
Quick answer
Use the gpt-4o model from the OpenAI API with chat.completions.create to build a vision chatbot by sending messages that include both text and image inputs. The model supports multimodal inputs, allowing you to pass images as URLs or base64-encoded data alongside text prompts for interactive vision-based conversations.
Prerequisites
- Python 3.8+
- OpenAI API key (free tier works)
- pip install openai>=1.0
Setup
Install the OpenAI Python SDK and set your API key as an environment variable to authenticate requests.
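Before running the examples, it can help to confirm the key is actually visible to Python. A minimal sketch (the helper name require_api_key is an invention here, not part of the SDK):

```python
import os


def require_api_key(env=os.environ):
    """Return the API key, or raise a clear error if it is missing."""
    key = env.get("OPENAI_API_KEY")
    if not key:
        raise RuntimeError("Set OPENAI_API_KEY before running the examples.")
    return key
```

Calling require_api_key() at startup fails fast with a readable message instead of a KeyError deep inside a request.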
pip install openai>=1.0
Step by step
This example shows how to send a text message with an image URL to gpt-4o for a vision chatbot interaction.
import os
from openai import OpenAI

client = OpenAI(api_key=os.environ["OPENAI_API_KEY"])

# Text and image go together as content parts of a single user message.
messages = [
    {
        "role": "user",
        "content": [
            {"type": "text", "text": "What is in this image?"},
            {
                "type": "image_url",
                "image_url": {"url": "https://example.com/image.jpg"},
            },
        ],
    }
]

response = client.chat.completions.create(
    model="gpt-4o",
    messages=messages,
)
print(response.choices[0].message.content)
Output
A detailed description of the image content, objects, and context.
Common variations
- Use base64-encoded images by passing a data URL (data:image/jpeg;base64,<data>) in the image_url field; there is no separate image_base64 type.
- Stream responses by adding stream=True to chat.completions.create and iterating over the returned chunks.
- Use other multimodal-capable models such as gpt-4o-mini for a smaller footprint.
import base64
import os
from openai import OpenAI

client = OpenAI(api_key=os.environ["OPENAI_API_KEY"])

# Example with base64 image input: embed the encoded bytes in a data URL.
with open("image.jpg", "rb") as f:
    image_b64 = base64.b64encode(f.read()).decode("utf-8")

messages = [
    {
        "role": "user",
        "content": [
            {"type": "text", "text": "Describe this image."},
            {
                "type": "image_url",
                "image_url": {"url": f"data:image/jpeg;base64,{image_b64}"},
            },
        ],
    }
]

response = client.chat.completions.create(
    model="gpt-4o",
    messages=messages,
)
print(response.choices[0].message.content)
Output
A detailed description of the base64-encoded image content.
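The stream=True variation mentioned under Common variations can be sketched as follows. Since a live stream needs an API key, the chunk objects here are stand-ins built with SimpleNamespace that mimic the shape the SDK yields (choices[0].delta.content); the loop body is the same one you would run over a real streamed response.

```python
from types import SimpleNamespace


def collect_stream(chunks):
    """Join the text deltas of a streamed chat completion into one string."""
    parts = []
    for chunk in chunks:
        delta = chunk.choices[0].delta.content
        if delta:  # the final chunk's delta content is typically None
            parts.append(delta)
    return "".join(parts)


# Stand-in chunks mimicking what `for chunk in response:` yields with stream=True.
fake_chunks = [
    SimpleNamespace(choices=[SimpleNamespace(delta=SimpleNamespace(content=c))])
    for c in ["A photo ", "of a cat.", None]
]
print(collect_stream(fake_chunks))  # A photo of a cat.
```

In a real chatbot you would print each delta as it arrives instead of collecting them, which is what makes streaming feel responsive.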
Troubleshooting
- If you get an authentication error, verify your OPENAI_API_KEY environment variable is set correctly.
- If the model does not recognize the image input, ensure the content part uses "type": "image_url" and, for base64 data, a data: URL in the url field.
- For large images, consider resizing or compressing before encoding to avoid request size limits.
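For the size check in the last point, a small helper can measure the base64 payload before sending. The function names and the 20 MB ceiling are assumptions for illustration; check the current API documentation for the real per-image limit.

```python
import base64


def b64_payload_size(image_bytes: bytes) -> int:
    """Number of bytes the base64 encoding of this image will occupy."""
    return len(base64.b64encode(image_bytes))


def fits_image_limit(image_bytes: bytes, limit_bytes: int = 20 * 1024 * 1024) -> bool:
    # 20 MB is an assumed ceiling here, not a documented guarantee.
    return b64_payload_size(image_bytes) <= limit_bytes


print(b64_payload_size(b"abc"))  # 4  (base64 encodes 3 bytes as 4 characters)
```

Base64 inflates data by roughly a third, so an image near the limit on disk may exceed it once encoded; this check catches that before the request fails.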
Key Takeaways
- Use gpt-4o with multimodal message content to build vision chatbots.
- Pass images as URLs or base64-encoded data (via a data: URL) in the messages array alongside text.
- Set OPENAI_API_KEY in your environment and install the latest openai SDK.
- Streaming and smaller models like gpt-4o-mini offer flexible interaction modes.
- Validate image input format and size to avoid API errors.
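The message shape used throughout can be wrapped in a small builder so both examples share one code path. This is a sketch; vision_message is a name invented here, not an SDK function.

```python
def vision_message(text: str, image_ref: str) -> dict:
    """Build a user message pairing a text prompt with an image reference.

    image_ref may be an https URL or a data: URL carrying base64 image data.
    """
    return {
        "role": "user",
        "content": [
            {"type": "text", "text": text},
            {"type": "image_url", "image_url": {"url": image_ref}},
        ],
    }


msg = vision_message("What is in this image?", "https://example.com/image.jpg")
print(msg["content"][0]["text"])  # What is in this image?
```

The same builder works for both input styles, since a data: URL drops into the url field exactly like an https URL.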