How to use Llama multimodal
Quick answer
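Use the OpenAI Python SDK with a multimodal-capable Llama model, served through an OpenAI-compatible endpoint, by sending chat.completions.create requests whose user message contains both text and image parts. Pass each image as an https URL or a base64 data URL inside an image_url content part. The examples use llama-3.3-70b-versatile as the model ID; confirm that the model you pick accepts image input, because many Llama variants are text-only.
Prerequisites
- Python 3.8+
- OpenAI API key (free tier works)
- pip install "openai>=1.0"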
Setup
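Install the official openai Python package, version 1.0 or higher, and set your API key as an environment variable. The examples read the key from OPENAI_API_KEY. If your Llama model is served by an OpenAI-compatible provider rather than by OpenAI itself, also pass that provider's endpoint as base_url when you create the client.
- Install package: pip install "openai>=1.0" (the quotes stop the shell from treating >= as a redirect)
- Set environment variable: export OPENAI_API_KEY='your_api_key' (Linux/macOS) or setx OPENAI_API_KEY "your_api_key" (Windows)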
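A missing environment variable otherwise surfaces as a bare KeyError at client creation. The sketch below is one way to fail earlier with a clearer message; the helper name require_api_key is hypothetical and not part of the openai package.

```python
import os


def require_api_key(var_name="OPENAI_API_KEY"):
    """Return the API key from the environment, or raise a readable error."""
    key = os.environ.get(var_name, "").strip()
    if not key:
        raise RuntimeError(
            f"{var_name} is not set; export it before running the examples."
        )
    return key
```

The returned value can then be passed as api_key when constructing the client.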
Step by step
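Use the OpenAI SDK to send a chat completion request to a Llama multimodal model (the example uses llama-3.3-70b-versatile as the model ID; substitute one your provider lists as vision-capable). Put the text and the image in a single user message whose content is a list of parts: a {"type": "text"} part for the prompt and a {"type": "image_url"} part for the image. The image_url value is an object with a url key that holds either an https URL or a base64 data URL.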
import os
from openai import OpenAI

client = OpenAI(api_key=os.environ["OPENAI_API_KEY"])

# Example base64 data URL placeholder (replace with an actual base64 string)
image_base64 = "data:image/png;base64,iVBORw0KGgoAAAANSUhEUgAAAAUA..."

# Text and image travel together in one user message as a list of content parts
messages = [
    {
        "role": "user",
        "content": [
            {"type": "text", "text": "Describe this image and answer my question."},
            {"type": "image_url", "image_url": {"url": image_base64}},
        ],
    }
]

response = client.chat.completions.create(
    model="llama-3.3-70b-versatile",
    messages=messages,
)
print(response.choices[0].message.content)
Output
A detailed description of the image and answer to the question.
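The placeholder string in the example has to be replaced with a real encoded image. As a minimal sketch, assuming the image is a local file and that the endpoint accepts OpenAI-style content parts, the two helpers below build the data URL and the user message. The names image_to_data_url and build_user_message are hypothetical, not SDK functions.

```python
import base64
import mimetypes
from pathlib import Path


def image_to_data_url(path):
    """Read a local image file and return it as a base64 data URL."""
    mime, _ = mimetypes.guess_type(str(path))
    if mime is None or not mime.startswith("image/"):
        raise ValueError(f"Cannot infer an image MIME type for {path!r}")
    encoded = base64.b64encode(Path(path).read_bytes()).decode("ascii")
    return f"data:{mime};base64,{encoded}"


def build_user_message(prompt, image_url):
    """Combine a text prompt and one image reference into a single user message."""
    return {
        "role": "user",
        "content": [
            {"type": "text", "text": prompt},
            {"type": "image_url", "image_url": {"url": image_url}},
        ],
    }
```

The same build_user_message call works with a plain https URL in place of the data URL, so switching between the two image sources does not change the request code.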
Common variations
- Use image URLs: Instead of base64, send images as URLs with {"type": "image_url", "image_url": {"url": "https://example.com/image.png"}}.
- Streaming: Use stream=True in chat.completions.create to receive partial responses.
- Async calls: Use async/await with the SDK's AsyncOpenAI client for concurrency.
- Different models: Swap in another Llama model ID if your provider offers one with vision support. Check before switching, because some Llama models are text-only; Llama 3.1 8B Instruct (llama-3.1-8b-instruct), for example, does not accept image input.
import asyncio
import os
from openai import AsyncOpenAI

async def main():
    # AsyncOpenAI mirrors the OpenAI client, but its methods are awaitable
    client = AsyncOpenAI(api_key=os.environ["OPENAI_API_KEY"])
    messages = [
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "Analyze this image."},
                {"type": "image_url", "image_url": {"url": "https://example.com/image.png"}},
            ],
        }
    ]
    stream = await client.chat.completions.create(
        model="llama-3.3-70b-versatile",
        messages=messages,
        stream=True,
    )
    async for chunk in stream:
        # Streamed text arrives in delta.content, which can be None
        if chunk.choices and chunk.choices[0].delta.content:
            print(chunk.choices[0].delta.content, end="")

asyncio.run(main())
Output
Partial streamed output printed progressively.
Troubleshooting
- If you get a BadRequestError (named InvalidRequestError in SDK versions before 1.0) about message format, ensure the message content is a list of parts and each image is wrapped in a {"type": "image_url", "image_url": {"url": "<image URL or data URL>"}} object.
- If the model is not found, verify your API key has access to Llama multimodal models and the model name is correct.
- For large images, reduce size or use URLs instead of base64 to avoid payload limits.
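Base64 encoding emits 4 characters for every 3 input bytes, so an inline image grows by about a third before it is sent. The sketch below estimates the encoded size so you can decide between inlining and a URL. The 4 MiB budget is an assumed placeholder, not a documented limit; check your provider's actual request-size limit.

```python
import math

# Hypothetical request-size budget; real limits vary by provider.
MAX_PAYLOAD_BYTES = 4 * 1024 * 1024


def base64_length(raw_bytes):
    """Return how many characters base64 needs to encode raw_bytes bytes."""
    return 4 * math.ceil(raw_bytes / 3)


def fits_inline(raw_bytes, limit=MAX_PAYLOAD_BYTES):
    """Return True if the base64-encoded image stays within the budget."""
    return base64_length(raw_bytes) <= limit
```

For a file on disk, pass os.path.getsize(path) as raw_bytes. The estimate ignores the rest of the JSON body, so leave some headroom.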
Key Takeaways
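- Use the OpenAI Python SDK with a vision-capable Llama model for multimodal input; llama-3.3-70b-versatile stands in for the model ID in the examples.
- Include images as URLs or base64 data URLs in image_url content parts, alongside a text part, in a single user message.
- Streaming lets you show output as it is generated, and async calls let you run several multimodal requests concurrently.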