
How to use Llama multimodal models

Quick answer
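Use the OpenAI Python SDK with a vision-capable Llama model by sending chat.completions.create requests whose user message content is a list of text and image parts. Pass each image as an image_url part whose url is either an HTTPS link or a base64 data URL. Set model to a Llama model ID that accepts image input: Llama 3.3 70B, the model behind llama-3.3-70b-versatile, is published as a text-only model, so check your provider's model list before relying on that ID.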

PREREQUISITES

  • Python 3.8+
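  • API key for an OpenAI-compatible endpoint that serves Llama models, stored in OPENAI_API_KEY (a free tier works where the provider offers one)
  • pip install "openai>=1.0"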

Setup

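Install the official openai Python package, version 1.0 or higher, and set your API key as an environment variable. Llama models are served by OpenAI-compatible providers rather than by api.openai.com, so also point the SDK at your provider's endpoint.

  • Install package: pip install "openai>=1.0" (the quotes stop the shell from treating > as a redirect)
  • Set environment variable: export OPENAI_API_KEY='your_api_key' (Linux/macOS) or setx OPENAI_API_KEY "your_api_key" (Windows; takes effect in newly opened terminals)
  • Set endpoint: export OPENAI_BASE_URL='your_provider_base_url'; the SDK reads this variable automatically
bash
pip install "openai>=1.0"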
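
A missing key otherwise surfaces as a bare KeyError at client construction, so it can help to check the environment first. The following is a minimal sketch; load_api_key is a hypothetical helper name, not part of the openai package.

```python
import os


def load_api_key(env_var: str = "OPENAI_API_KEY") -> str:
    """Return the API key stored in env_var, or raise a descriptive error.

    load_api_key is a hypothetical helper for this guide, not an SDK function.
    """
    key = os.environ.get(env_var, "").strip()
    if not key:
        raise RuntimeError(
            f"{env_var} is not set; export it before running the examples."
        )
    return key
```

Calling load_api_key() before constructing the client turns a missing variable into a readable message instead of a stack trace.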

Step by step

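Use the OpenAI SDK to send a chat completion request to a Llama model that accepts image input. Put the text and the image in a single user message whose content is a list of parts: a {"type": "text", "text": ...} part for the question and a {"type": "image_url", "image_url": {"url": ...}} part for the image. The url value is either an HTTPS link or a base64 data URL such as data:image/png;base64,...; the chat completions format has no separate image_base64 part type.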

python
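import os
from openai import OpenAI

# Llama models are served by OpenAI-compatible providers, not api.openai.com.
# base_url=None falls back to OPENAI_BASE_URL, then to the SDK default.
client = OpenAI(
    api_key=os.environ["OPENAI_API_KEY"],
    base_url=os.environ.get("OPENAI_BASE_URL"),
)

# Example base64 data URL placeholder (replace with an actual base64 string)
image_data_url = "data:image/png;base64,iVBORw0KGgoAAAANSUhEUgAAAAUA..."

# One user message whose content is a list of parts: the question, then the image
messages = [
    {
        "role": "user",
        "content": [
            {"type": "text", "text": "Describe this image and answer my question."},
            {"type": "image_url", "image_url": {"url": image_data_url}},
        ],
    }
]

response = client.chat.completions.create(
    model="llama-3.3-70b-versatile",  # replace with a model ID that accepts image input
    messages=messages,
)

print(response.choices[0].message.content)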
output
A detailed description of the image and answer to the question.
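
To send a local file as base64, the image first has to become a data URL. The sketch below shows one way to do that with the standard library and then wrap it in the content-parts layout of the chat completions format. The helper names image_to_data_url and build_image_message are illustrative assumptions, not SDK functions.

```python
import base64
import mimetypes
from pathlib import Path


def image_to_data_url(path: str) -> str:
    """Read a local image file and return it as a base64 data URL."""
    mime, _ = mimetypes.guess_type(path)
    if mime is None or not mime.startswith("image/"):
        raise ValueError(f"Cannot infer an image MIME type from {path!r}")
    encoded = base64.b64encode(Path(path).read_bytes()).decode("ascii")
    return f"data:{mime};base64,{encoded}"


def build_image_message(question: str, image_url: str) -> dict:
    """Build one user message holding a text part and an image_url part."""
    return {
        "role": "user",
        "content": [
            {"type": "text", "text": question},
            # image_url accepts an HTTPS link or a data URL
            {"type": "image_url", "image_url": {"url": image_url}},
        ],
    }
```

A call such as build_image_message("What is in this photo?", image_to_data_url("photo.png")) yields a dictionary that can be placed directly in the messages list; the file name here is a placeholder.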

Common variations

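  • Use image URLs: Instead of base64, send a hosted image with {"type": "image_url", "image_url": {"url": "https://example.com/image.png"}}.
  • Streaming: Pass stream=True to chat.completions.create and read each chunk's choices[0].delta.content to receive the response incrementally.
  • Async calls: Use the AsyncOpenAI client and await client.chat.completions.create(...) to run requests concurrently.
  • Different models: Change the model argument to any other vision-capable Llama model your provider lists. Text-only models, which include the Llama 3.1 8B behind llama-3.1-8b-instruct, do not accept image parts.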
python
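import asyncio
import os
from openai import AsyncOpenAI

async def main():
    # AsyncOpenAI mirrors the OpenAI client, but its methods are awaitable
    client = AsyncOpenAI(
        api_key=os.environ["OPENAI_API_KEY"],
        base_url=os.environ.get("OPENAI_BASE_URL"),  # None falls back to the SDK default
    )
    messages = [
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "Analyze this image."},
                {"type": "image_url", "image_url": {"url": "https://example.com/image.png"}},
            ],
        }
    ]
    stream = await client.chat.completions.create(
        model="llama-3.3-70b-versatile",  # replace with a model ID that accepts image input
        messages=messages,
        stream=True,
    )
    async for chunk in stream:
        # Streamed chunks carry incremental text in delta.content, which may be None
        if chunk.choices and chunk.choices[0].delta.content:
            print(chunk.choices[0].delta.content, end="", flush=True)
    print()

asyncio.run(main())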
output
Partial streamed output printed progressively.
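
When the full text is needed after streaming finishes, the deltas have to be joined. Streamed chunks expose incremental text as choices[0].delta.content, which can be None, and some providers may send chunks with an empty choices list. The sketch below handles both cases; collect_stream and fake_chunk are hypothetical names, and the fake chunks only imitate the attribute layout of real ones.

```python
from types import SimpleNamespace
from typing import Iterable


def collect_stream(chunks: Iterable) -> str:
    """Join the text deltas from an iterable of chat completion chunks."""
    parts = []
    for chunk in chunks:
        # Skip chunks that carry no choices at all
        if not chunk.choices:
            continue
        piece = chunk.choices[0].delta.content
        # delta.content is None on chunks that carry no text
        if piece:
            parts.append(piece)
    return "".join(parts)


def fake_chunk(text):
    """Hypothetical stand-in for a streamed chunk, used only in this demo."""
    delta = SimpleNamespace(content=text)
    return SimpleNamespace(choices=[SimpleNamespace(delta=delta)])


demo = [
    fake_chunk(None),
    fake_chunk("Hello, "),
    fake_chunk("world"),
    SimpleNamespace(choices=[]),
]
print(collect_stream(demo))  # prints: Hello, world
```

The same function works on a synchronous stream returned by chat.completions.create(..., stream=True); an async stream would need an async for loop instead.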

Troubleshooting

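  • If you get a BadRequestError (HTTP 400; openai>=1.0 no longer raises InvalidRequestError) about message format, make sure content is a list of parts and each image is wrapped as {"type": "image_url", "image_url": {"url": ...}}.
  • If the model is not found (NotFoundError), verify that the model name is correct, that your API key has access to it, and that the client's base URL points at the provider that serves it. If the request is rejected because of the image, the model is probably text-only.
  • Base64 encoding adds roughly a third to the payload size, so for large images resize before encoding or host the image and send its URL to avoid payload limits.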
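
The size trade-off in the last point can be estimated before any request is sent. The sketch below computes the exact base64 length for a given file size and compares it with a limit. The 4 MiB default is a hypothetical placeholder, not a documented limit of any provider, and both function names are illustrative.

```python
import math


def base64_length(num_bytes: int) -> int:
    """Return the length of the padded base64 encoding of num_bytes bytes."""
    # Base64 emits 4 output characters for every 3 input bytes, rounded up
    return 4 * math.ceil(num_bytes / 3)


def should_send_inline(num_bytes: int, max_inline_bytes: int = 4 * 1024 * 1024) -> bool:
    """Decide whether an image is small enough to embed as a data URL.

    max_inline_bytes is an assumed limit; replace it with your provider's value.
    """
    return base64_length(num_bytes) <= max_inline_bytes


print(base64_length(3_000_000))       # prints: 4000000
print(should_send_inline(3_000_000))  # prints: True
```

Images that fail the check are candidates for resizing or for hosting and sending as a URL.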

Key Takeaways

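  • Use the OpenAI Python SDK's chat.completions.create with a Llama model ID that accepts image input; confirm that the ID you choose, such as "llama-3.3-70b-versatile", is vision-capable on your provider.
  • Include images as image_url content parts, using either an HTTPS URL or a base64 data URL.
  • Streaming shows output as it is generated, and the async client lets several multimodal requests run concurrently.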
Verified 2026-04 · llama-3.3-70b-versatile, llama-3.1-8b-instruct