How-to · Beginner · 3 min read

How to use Together AI with LiteLLM

Quick answer
Use the openai Python SDK with base_url set to Together AI's endpoint and your API key loaded from an environment variable. Instantiate the OpenAI client with base_url="https://api.together.xyz/v1", then call chat.completions.create with a Together AI model name to generate completions. LiteLLM wraps the same OpenAI-compatible API, routing models prefixed with together_ai/ through its unified completion interface, so the same pattern applies there.

PREREQUISITES

  • Python 3.8+
  • Together AI API key
  • pip install "openai>=1.0"

Setup

Install the openai Python package (version 1.0 or higher) and set your Together AI API key as an environment variable TOGETHER_API_KEY. This setup uses the OpenAI-compatible SDK to interact with Together AI's API endpoint.

bash
pip install "openai>=1.0"
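
On macOS or Linux you can export the key for the current shell session. The value below is a placeholder; substitute your actual Together AI key:

```shell
export TOGETHER_API_KEY="your-api-key-here"
```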

Step by step

Use the OpenAI SDK with base_url set to Together AI's API endpoint. Create a client, then call chat.completions.create with the Together AI model and your prompt. The example below shows a complete runnable script.

python
import os
from openai import OpenAI

# Initialize client with Together AI base URL and API key from environment
client = OpenAI(
    api_key=os.environ["TOGETHER_API_KEY"],
    base_url="https://api.together.xyz/v1",
)

# Define the model and messages
model_name = "meta-llama/Llama-3.3-70B-Instruct-Turbo"
messages = [{"role": "user", "content": "Explain how to use Together AI with LiteLLM."}]

# Create chat completion
response = client.chat.completions.create(model=model_name, messages=messages)

# Extract and print the response text
print(response.choices[0].message.content)
output
Explain how to use Together AI with LiteLLM by configuring the OpenAI-compatible SDK with Together's API endpoint and your API key. Instantiate the client, specify the model, and send your prompt to receive completions.

Common variations

You can make asynchronous calls with the AsyncOpenAI client if your environment supports async/await. You can also switch models by changing the model parameter to any other Together AI model, and stream responses by passing stream=True in the request. The example below combines async and streaming.

python
import os
import asyncio
from openai import AsyncOpenAI

async def main():
    # Use the async client so the stream can be consumed with `async for`
    client = AsyncOpenAI(
        api_key=os.environ["TOGETHER_API_KEY"],
        base_url="https://api.together.xyz/v1",
    )
    model_name = "meta-llama/Llama-3.3-70B-Instruct-Turbo"
    messages = [{"role": "user", "content": "Stream a response from Together AI."}]

    # Streaming example: the create call must be awaited, then iterated chunk by chunk
    stream = await client.chat.completions.create(model=model_name, messages=messages, stream=True)
    async for chunk in stream:
        delta = chunk.choices[0].delta.content or ""
        print(delta, end="", flush=True)

if __name__ == "__main__":
    asyncio.run(main())
output
Streaming AI response text printed token by token in real time...

Troubleshooting

  • If you get authentication errors, verify your TOGETHER_API_KEY environment variable is set correctly.
  • If you receive model not found errors, confirm the model name matches Together AI's current offerings.
  • For network issues, check your internet connection and firewall settings.
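
The first checklist item can be verified with a short script before running any of the examples; a minimal sketch:

```python
import os

# Report whether the API key is available without printing its value
key = os.environ.get("TOGETHER_API_KEY")
if key:
    print(f"TOGETHER_API_KEY is set ({len(key)} characters)")
else:
    print("TOGETHER_API_KEY is not set; export it before running the examples")
```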

Key Takeaways

  • Use the OpenAI SDK with base_url="https://api.together.xyz/v1" to connect to Together AI.
  • Always load your API key from environment variables for security.
  • Together AI supports streaming and async calls via the OpenAI-compatible SDK.
  • Model names must match Together AI's current catalog exactly.
  • Check environment and network settings if you encounter authentication or connectivity errors.
Verified 2026-04 · meta-llama/Llama-3.3-70B-Instruct-Turbo