How to Intermediate · 3 min read

Multimodal AI for product support

Quick answer
Use a multimodal model such as gpt-4o to handle support queries that include screenshots or photos alongside text. The model reads the question and the image together, so its troubleshooting advice can draw on what the customer actually sees.

PREREQUISITES

  • Python 3.8+
  • OpenAI API key (confirm your account has access to gpt-4o; free-tier availability is not guaranteed)
  • pip install "openai>=1.0" (quote the requirement so the shell does not treat > as a redirect)

Setup

Install the openai Python package and set your OpenAI API key as an environment variable for secure access.

bash
pip install "openai>=1.0"
export OPENAI_API_KEY="your-api-key"
output
Collecting openai
  Downloading openai-1.x.x-py3-none-any.whl (xx kB)
Installing collected packages: openai
Successfully installed openai-1.x.x
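
Before making any calls, it can help to fail fast when the key is missing rather than hit an authentication error later. The following is a minimal sketch; the helper name require_api_key is hypothetical and not part of the openai package.

```python
import os


def require_api_key(env=None, name="OPENAI_API_KEY"):
    """Return the API key from the environment, or raise a clear error.

    Hypothetical helper for illustration; not part of the openai package.
    """
    env = os.environ if env is None else env
    key = env.get(name, "").strip()
    if not key:
        raise RuntimeError(f"{name} is not set; export it before running the examples.")
    return key


if __name__ == "__main__":
    try:
        require_api_key()
        print("API key found.")
    except RuntimeError as err:
        print(err)
```

The returned value can be passed to OpenAI(api_key=...) in place of reading os.environ directly.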

Step by step

This example shows how to send a text query along with an image URL to gpt-4o for multimodal product support. The model can analyze the image and provide relevant troubleshooting advice.

python
import os
from openai import OpenAI

client = OpenAI(api_key=os.environ["OPENAI_API_KEY"])

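# One user message carries both parts: content is a list of typed items,
# a text part for the question and an image_url part for the screenshot.
messages = [
    {
        "role": "user",
        "content": [
            {"type": "text", "text": "My product screen shows an error. See the attached image."},
            {"type": "image_url", "image_url": {"url": "https://example.com/error_screenshot.png"}},
        ],
    }
]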

response = client.chat.completions.create(
    model="gpt-4o",
    messages=messages
)

print("Response:", response.choices[0].message.content)
output
Response: The error on your product screen indicates a connectivity issue. Please check your network settings and restart the device.
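
When several support flows send screenshots, the text-plus-image message can be built by a small helper. This is a sketch only: build_support_message is a hypothetical name, and the optional detail field is passed through as the API's image detail hint ("low", "high", or "auto") on the assumption that the chosen model honors it.

```python
def build_support_message(question, image_urls, detail=None):
    """Build one user message: a text part plus one image_url part per URL.

    Hypothetical helper for illustration; not part of the openai package.
    """
    if not question.strip():
        raise ValueError("question must not be empty")
    parts = [{"type": "text", "text": question}]
    for url in image_urls:
        image = {"url": url}
        if detail is not None:
            image["detail"] = detail  # optional resolution hint for the image
        parts.append({"type": "image_url", "image_url": image})
    return {"role": "user", "content": parts}


if __name__ == "__main__":
    import json

    message = build_support_message(
        "My product screen shows an error. What should I check?",
        ["https://example.com/error_screenshot.png"],
    )
    print(json.dumps(message, indent=2))
```

The result is passed as messages=[message] to client.chat.completions.create.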

Common variations

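You can use asynchronous calls to serve many requests concurrently, or stream responses for real-time feedback. Other multimodal models such as gemini-2.5-pro are not served by the OpenAI API, so changing the model parameter alone is not enough: they need their provider's own SDK, or an OpenAI-compatible endpoint with a different base_url and API key.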

python
import asyncio
import os
from openai import AsyncOpenAI

async def async_multimodal_support():
    # AsyncOpenAI exposes awaitable calls; the synchronous OpenAI client cannot be awaited.
    client = AsyncOpenAI(api_key=os.environ["OPENAI_API_KEY"])
    messages = [
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "Help with product error, see image."},
                {"type": "image_url", "image_url": {"url": "https://example.com/error.png"}},
            ],
        }
    ]

    response = await client.chat.completions.create(
        model="gpt-4o",
        messages=messages
    )
    print("Async response:", response.choices[0].message.content)

asyncio.run(async_multimodal_support())
output
Async response: The image shows a hardware fault. Please contact support with the error code displayed.
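
Streaming is mentioned above but not shown. A minimal sketch follows, assuming a synchronous openai.OpenAI client is passed in; the function name stream_reply is hypothetical. It requests the reply with stream=True and prints each text fragment as it arrives.

```python
def stream_reply(client, messages, model="gpt-4o"):
    """Print a reply as it arrives and return the full text.

    `client` is assumed to be an openai.OpenAI instance (or any object with
    the same chat.completions.create interface). Hypothetical helper.
    """
    stream = client.chat.completions.create(
        model=model,
        messages=messages,
        stream=True,  # ask the API to send the reply in incremental chunks
    )
    pieces = []
    for chunk in stream:
        # Some chunks carry no choices or no text, so skip those.
        if not chunk.choices:
            continue
        text = chunk.choices[0].delta.content
        if text:
            print(text, end="", flush=True)
            pieces.append(text)
    print()
    return "".join(pieces)


# Usage (requires the openai package and a valid key):
#   from openai import OpenAI
#   client = OpenAI(api_key=os.environ["OPENAI_API_KEY"])
#   stream_reply(client, messages)
```

Taking the client as a parameter keeps the helper easy to exercise with a stub object, without a network call.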

Troubleshooting

  • If the model does not recognize the image, ensure the image URL is publicly accessible (no login or session required) and in a supported format such as JPEG or PNG.
  • For authentication errors, verify your OPENAI_API_KEY environment variable is set correctly.
  • If responses are cut off, check whether finish_reason on the returned choice is "length"; if so, increase max_tokens in the API call.
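
Two of the checks above can be sketched in code. The first helper sidesteps URL accessibility problems by embedding a local image as a base64 data URL, which OpenAI's vision documentation describes as an alternative to a public link. The second reports whether a reply stopped at the token limit. Both function names are hypothetical.

```python
import base64
import mimetypes
from pathlib import Path


def image_file_to_data_url(path):
    """Encode a local image file as a base64 data URL for an image_url part."""
    path = Path(path)
    mime, _ = mimetypes.guess_type(path.name)  # guessed from the file extension
    if mime is None or not mime.startswith("image/"):
        raise ValueError(f"Cannot determine an image type for {path.name}")
    encoded = base64.b64encode(path.read_bytes()).decode("ascii")
    return f"data:{mime};base64,{encoded}"


def was_truncated(response):
    """Return True if the first choice stopped because it hit the token limit."""
    return response.choices[0].finish_reason == "length"
```

The data URL goes wherever a regular link would: {"type": "image_url", "image_url": {"url": image_file_to_data_url("screenshot.png")}}. Keep in mind that large images increase request size.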

Key Takeaways

  • Use gpt-4o for multimodal product support combining text and images.
  • Send images as image_url parts inside a user message's content list, next to the text part, for visual context.
  • Async calls let one process serve many requests concurrently; streaming shows the reply as it is generated.
  • Ensure image URLs are accessible and API keys are properly configured.
  • Adjust max_tokens to control response length for detailed support.
Verified 2026-04 · gpt-4o, gemini-2.5-pro