How-to · Beginner · 3 min read

How to use Together AI with LiteLLM

Quick answer
Use the openai Python SDK with base_url set to Together AI's endpoint and your API key loaded from an environment variable. Instantiate the OpenAI client with base_url="https://api.together.xyz/v1", then call chat.completions.create with a Together AI model name to generate completions. LiteLLM wraps the same OpenAI-compatible API, routing models prefixed with together_ai/ through its unified completion interface, so the same pattern applies there.

PREREQUISITES

  • Python 3.8+
  • Together AI API key
  • pip install "openai>=1.0"

Setup

Install the openai Python package (version 1.0 or higher) and set your Together AI API key as an environment variable TOGETHER_API_KEY. This setup uses the OpenAI-compatible SDK to interact with Together AI's API endpoint.

bash
pip install "openai>=1.0"
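
On macOS or Linux you can export the key for the current shell session. The value below is a placeholder; substitute your actual Together AI key:

```shell
export TOGETHER_API_KEY="your-api-key-here"
```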

Step by step

Use the OpenAI SDK with base_url set to Together AI's API endpoint. Create a client, then call chat.completions.create with the Together AI model and your prompt. The example below shows a complete runnable script.

python
import os
from openai import OpenAI

# Initialize client with Together AI base URL and API key from environment
client = OpenAI(
    api_key=os.environ["TOGETHER_API_KEY"],
    base_url="https://api.together.xyz/v1",
)

# Define the model and messages
model_name = "meta-llama/Llama-3.3-70B-Instruct-Turbo"
messages = [{"role": "user", "content": "Explain how to use Together AI with LiteLLM."}]

# Create chat completion
response = client.chat.completions.create(model=model_name, messages=messages)

# Extract and print the response text
print(response.choices[0].message.content)
output
Explain how to use Together AI with LiteLLM by configuring the OpenAI-compatible SDK with Together's API endpoint and your API key. Instantiate the client, specify the model, and send your prompt to receive completions.

Common variations

You can make asynchronous calls with the AsyncOpenAI client if your environment supports async/await. You can also switch models by changing the model parameter to any other Together AI model, and stream responses by passing stream=True in the request. The example below combines async and streaming.

python
import os
import asyncio
from openai import AsyncOpenAI

async def main():
    # Use the async client so the stream can be consumed with `async for`
    client = AsyncOpenAI(
        api_key=os.environ["TOGETHER_API_KEY"],
        base_url="https://api.together.xyz/v1",
    )
    model_name = "meta-llama/Llama-3.3-70B-Instruct-Turbo"
    messages = [{"role": "user", "content": "Stream a response from Together AI."}]

    # Streaming example: the create call must be awaited, then iterated chunk by chunk
    stream = await client.chat.completions.create(model=model_name, messages=messages, stream=True)
    async for chunk in stream:
        delta = chunk.choices[0].delta.content or ""
        print(delta, end="", flush=True)

if __name__ == "__main__":
    asyncio.run(main())
output
Streaming AI response text printed token by token in real time...

Troubleshooting

  • If you get authentication errors, verify your TOGETHER_API_KEY environment variable is set correctly.
  • If you receive model not found errors, confirm the model name matches Together AI's current offerings.
  • For network issues, check your internet connection and firewall settings.
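
The first checklist item can be verified with a short script before running any of the examples; a minimal sketch:

```python
import os

# Report whether the API key is available without printing its value
key = os.environ.get("TOGETHER_API_KEY")
if key:
    print(f"TOGETHER_API_KEY is set ({len(key)} characters)")
else:
    print("TOGETHER_API_KEY is not set; export it before running the examples")
```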

Key Takeaways

  • Use the OpenAI SDK with base_url="https://api.together.xyz/v1" to connect to Together AI.
  • Always load your API key from environment variables for security.
  • Together AI supports streaming and async calls via the OpenAI-compatible SDK.
  • Model names must match Together AI's current catalog exactly.
  • Check environment and network settings if you encounter authentication or connectivity errors.
Verified 2026-04 · meta-llama/Llama-3.3-70B-Instruct-Turbo