How to use Llama multimodal

Quick answer

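Use the OpenAI Python SDK against an OpenAI-compatible endpoint that hosts a vision-capable Llama model, and send chat.completions.create requests whose message content is a list of text and image parts. Pass each image as an image_url part containing either an HTTPS URL or a base64 data URL. Note that llama-3.3-70b-versatile is a text-only model, so substitute the ID of a vision-capable Llama model (for example a Llama 3.2 Vision or Llama 4 variant) exactly as your provider lists it.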

PREREQUISITES

Python 3.8+
API key for an OpenAI-compatible provider that hosts Llama models (OpenAI's own API does not serve Llama)
pip install "openai>=1.0"

Setup

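Install the official openai Python package, version 1.0 or higher, and set your API key as an environment variable. Quote the version specifier so the shell does not treat > as output redirection. Because OpenAI's own endpoint does not serve Llama models, also set OPENAI_BASE_URL to your provider's OpenAI-compatible endpoint; the SDK reads both variables automatically.

Install package: pip install "openai>=1.0"
Set environment variable: export OPENAI_API_KEY='your_api_key' (Linux/macOS) or setx OPENAI_API_KEY "your_api_key" (Windows; takes effect in new terminal sessions only)

bash

pip install "openai>=1.0"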

Step by step

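Use the OpenAI SDK to send a chat completion request to a vision-capable Llama model. Put the text and the image in a single user message whose content is a list of parts: a {"type": "text", "text": ...} part and a {"type": "image_url", "image_url": {"url": ...}} part. The url value is either an HTTPS URL or a base64 data URL such as data:image/png;base64,...; the Chat Completions format has no image_base64 part type. The model ID in the example, llama-3.3-70b-versatile, names a text-only model, so replace it with a vision-capable Llama model ID from your provider.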

python

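import os
from openai import OpenAI

# The SDK targets api.openai.com unless a base URL is given; point
# OPENAI_BASE_URL at your provider's OpenAI-compatible endpoint.
client = OpenAI(
    api_key=os.environ["OPENAI_API_KEY"],
    base_url=os.environ.get("OPENAI_BASE_URL"),
)

# Example base64 data URL placeholder (replace with an actual base64 string)
image_data_url = "data:image/png;base64,iVBORw0KGgoAAAANSUhEUgAAAAUA..."

# One user message carrying both parts: the text prompt and the image
messages = [
    {
        "role": "user",
        "content": [
            {"type": "text", "text": "Describe this image and answer my question."},
            {"type": "image_url", "image_url": {"url": image_data_url}},
        ],
    }
]

response = client.chat.completions.create(
    # Text-only ID kept from the original example; replace it with a
    # vision-capable Llama model ID listed by your provider.
    model="llama-3.3-70b-versatile",
    messages=messages,
)

print(response.choices[0].message.content)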

output

A detailed description of the image and answer to the question.
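
The base64 placeholder in the example has to be produced from a real image. As a minimal sketch, assuming a local image file, the helpers below build a data URL with the standard library and wrap it in the content-parts structure of the Chat Completions format. The names image_to_data_url and build_vision_message are hypothetical and not part of the OpenAI SDK.

```python
import base64
import mimetypes
from pathlib import Path


def image_to_data_url(path):
    """Return the file at `path` as a base64 data URL (hypothetical helper)."""
    mime, _ = mimetypes.guess_type(str(path))
    if mime is None or not mime.startswith("image/"):
        raise ValueError(f"cannot infer an image MIME type for {path}")
    encoded = base64.b64encode(Path(path).read_bytes()).decode("ascii")
    return f"data:{mime};base64,{encoded}"


def build_vision_message(prompt, image_url):
    """Build one user message with a text part and an image part.

    `image_url` may be an HTTPS URL or a data URL from image_to_data_url.
    """
    return {
        "role": "user",
        "content": [
            {"type": "text", "text": prompt},
            {"type": "image_url", "image_url": {"url": image_url}},
        ],
    }
```

To use it, pass messages=[build_vision_message("Describe this image.", image_to_data_url("photo.png"))] to chat.completions.create, where photo.png is a placeholder path.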

Common variations

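Use image URLs: Instead of a base64 data URL, send a hosted image with {"type": "image_url", "image_url": {"url": "https://example.com/image.png"}}.
Streaming: Pass stream=True to chat.completions.create and read the text from each chunk's choices[0].delta.content.
Async calls: Use the AsyncOpenAI client and await client.chat.completions.create(...); openai>=1.0 has no acreate method.
Different models: Llama 3.3 70B and Llama 3.1 8B Instruct are text-only. Vision-capable families include Llama 3.2 Vision (11B, 90B) and Llama 4 (Scout, Maverick); use the exact model ID your provider lists.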

python

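import asyncio
import os
from openai import AsyncOpenAI

async def main():
    # AsyncOpenAI exposes the same methods as OpenAI, but awaitable
    client = AsyncOpenAI(
        api_key=os.environ["OPENAI_API_KEY"],
        base_url=os.environ.get("OPENAI_BASE_URL"),
    )
    messages = [
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "Analyze this image."},
                {"type": "image_url", "image_url": {"url": "https://example.com/image.png"}},
            ],
        }
    ]
    stream = await client.chat.completions.create(
        model="llama-3.3-70b-versatile",  # replace with a vision-capable model ID
        messages=messages,
        stream=True,
    )
    async for chunk in stream:
        # Streamed chunks carry incremental text in delta, not message
        if chunk.choices and chunk.choices[0].delta.content:
            print(chunk.choices[0].delta.content, end="")

asyncio.run(main())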

output

Partial streamed output printed progressively.
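
When consuming a stream, some chunks carry no text: a chunk can arrive with an empty choices list, or with a delta whose content is None. The sketch below accumulates the text while skipping those cases. collect_stream_text and fake_chunk are hypothetical names, and the SimpleNamespace objects only stand in for the SDK's chunk objects so the logic can be tried without a network call.

```python
from types import SimpleNamespace


def collect_stream_text(chunks):
    """Concatenate the incremental text from an iterable of streamed chunks."""
    pieces = []
    for chunk in chunks:
        if not chunk.choices:  # some chunks carry no choices
            continue
        text = chunk.choices[0].delta.content
        if text:  # content can be None, e.g. on role-only or final chunks
            pieces.append(text)
    return "".join(pieces)


def fake_chunk(text):
    """Stand-in with the attribute shape of a streamed chunk (illustrative only)."""
    delta = SimpleNamespace(content=text)
    return SimpleNamespace(choices=[SimpleNamespace(delta=delta)])


stream = [
    fake_chunk(None),
    fake_chunk("A red "),
    fake_chunk("bicycle."),
    SimpleNamespace(choices=[]),
]
print(collect_stream_text(stream))  # prints: A red bicycle.
```

With the synchronous client, the object returned by chat.completions.create(..., stream=True) is iterable and can be passed to collect_stream_text directly.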

Troubleshooting

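If you get a BadRequestError (named InvalidRequestError before openai 1.0) about message format, ensure content is a list of parts and each image is wrapped as {"type": "image_url", "image_url": {"url": ...}}.
If the model is not found, verify that the model name is correct, that the client points at the provider hosting it, and that your API key has access to Llama multimodal models.
For large images, reduce the image size or pass an HTTPS URL instead of a base64 data URL to avoid payload limits; base64 encoding inflates the data by about a third.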
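
To judge whether an image is likely to hit a payload limit before sending it, the encoded size can be computed ahead of time: padded base64 emits 4 characters for every 3 input bytes, rounded up. The sketch below assumes a hypothetical 4 MB budget (MAX_REQUEST_BYTES); real limits vary by provider, so check its documentation. base64_length and fits_inline are illustrative names.

```python
import base64
import math

# Hypothetical budget for illustration; real limits depend on the provider.
MAX_REQUEST_BYTES = 4 * 1024 * 1024


def base64_length(raw_size):
    """Length in characters of the padded base64 encoding of raw_size bytes."""
    return 4 * math.ceil(raw_size / 3)


def fits_inline(raw_size, limit=MAX_REQUEST_BYTES):
    """Return True if an image of raw_size bytes fits the budget once encoded."""
    return base64_length(raw_size) <= limit


# Cross-check the formula against the real encoder.
assert base64_length(10) == len(base64.b64encode(b"x" * 10))

print(fits_inline(1_000_000))  # True: about 1.33 MB once encoded
print(fits_inline(4_000_000))  # False: about 5.33 MB once encoded
```

If fits_inline returns False for os.path.getsize(path), downscale the image or host it and send the URL instead.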

Key Takeaways

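Use the OpenAI Python SDK, pointed at an OpenAI-compatible endpoint, with a vision-capable Llama model for multimodal input; llama-3.3-70b-versatile itself is text-only.
Include images as image_url content parts holding an HTTPS URL or a base64 data URL.
Streaming returns output incrementally, and the async client lets multiple multimodal requests run concurrently.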

Verified 2026-04 · llama-3.3-70b-versatile, llama-3.1-8b-instruct
