How to · Intermediate · 4 min read

How to use Together AI with LlamaIndex

Quick answer
Install llama-index plus the llama-index-llms-openai-like integration, create an OpenAILike LLM pointed at Together AI's OpenAI-compatible api_base with your API key, and register it as the default LLM via LlamaIndex's Settings. LlamaIndex then builds and queries your index with Together AI models.

PREREQUISITES

  • Python 3.8+
  • Together AI API key
  • pip install llama-index llama-index-llms-openai-like

Setup

Install the required packages and export your Together AI API key as the TOGETHER_API_KEY environment variable. LlamaIndex's OpenAILike LLM speaks the OpenAI wire format, so it only needs Together's api_base and your key.

bash
pip install llama-index llama-index-llms-openai-like
export TOGETHER_API_KEY="your-api-key"

Step by step

This example configures an OpenAILike LLM for Together AI, registers it as LlamaIndex's default via Settings, builds a vector store index from local documents, and queries it through a query engine.

python
import os

from llama_index.core import Settings, SimpleDirectoryReader, VectorStoreIndex
from llama_index.llms.openai_like import OpenAILike

# Point LlamaIndex at Together AI's OpenAI-compatible endpoint
Settings.llm = OpenAILike(
    model="meta-llama/Llama-3.3-70B-Instruct-Turbo",
    api_base="https://api.together.xyz/v1",
    api_key=os.environ["TOGETHER_API_KEY"],
    is_chat_model=True,
)

# Load documents from the local data directory
documents = SimpleDirectoryReader("./data").load_data()

# Build the vector store index (embeddings default to OpenAI's unless
# you configure Settings.embed_model separately)
index = VectorStoreIndex.from_documents(documents)

# Query the index through a query engine
query_engine = index.as_query_engine()
response = query_engine.query("Explain the benefits of AI integration.")
print(response.response)
output
AI integration enables automation, improved decision-making, and enhanced user experiences by leveraging advanced machine learning models.
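If the full pipeline misbehaves, it can help to hit Together's OpenAI-compatible endpoint directly, independent of LlamaIndex. A minimal standard-library sketch (the model name and prompt are placeholders; the request shape follows the OpenAI chat completions wire format, which Together's /v1 endpoint accepts):

```python
import json
import os
import urllib.request

BASE_URL = "https://api.together.xyz/v1"

def build_chat_request(model: str, prompt: str, api_key: str) -> urllib.request.Request:
    """Build an OpenAI-compatible chat completion request for Together AI."""
    payload = {"model": model, "messages": [{"role": "user", "content": prompt}]}
    return urllib.request.Request(
        f"{BASE_URL}/chat/completions",
        data=json.dumps(payload).encode(),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
    )

# Only call out to the network when a key is actually configured
if os.environ.get("TOGETHER_API_KEY"):
    req = build_chat_request(
        "meta-llama/Llama-3.3-70B-Instruct-Turbo",
        "Say hello.",
        os.environ["TOGETHER_API_KEY"],
    )
    with urllib.request.urlopen(req) as resp:
        print(json.load(resp)["choices"][0]["message"]["content"])
```

If this direct call works but the LlamaIndex pipeline does not, the problem is in the index configuration rather than the credentials or endpoint.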

Common variations

  • Use a different Together AI model by changing the model parameter on OpenAILike.
  • For async usage, await the query engine's aquery method from an async function.
  • Stream responses by creating the query engine with streaming=True and iterating over response_gen.
python
import asyncio
import os

from llama_index.core import Settings, SimpleDirectoryReader, VectorStoreIndex
from llama_index.llms.openai_like import OpenAILike

async def async_query():
    Settings.llm = OpenAILike(
        model="meta-llama/Llama-3.3-70B-Instruct-Turbo",
        api_base="https://api.together.xyz/v1",
        api_key=os.environ["TOGETHER_API_KEY"],
        is_chat_model=True,
    )
    documents = SimpleDirectoryReader("./data").load_data()
    index = VectorStoreIndex.from_documents(documents)
    query_engine = index.as_query_engine()
    response = await query_engine.aquery("What is the future of AI?")
    print(response.response)

asyncio.run(async_query())
output
The future of AI includes more seamless integration into daily life, improved natural language understanding, and expanded automation capabilities.
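Under the hood, streaming from Together's OpenAI-compatible endpoint arrives as Server-Sent Events when "stream": true is set in the request. A standard-library sketch of the token parsing (the line format is assumed to follow the OpenAI chat streaming convention: `data:` lines carrying JSON chunks, terminated by `data: [DONE]`):

```python
import json
import os
import urllib.request

def parse_sse_line(line: bytes):
    """Return the token carried by one OpenAI-style `data:` SSE line, or None."""
    if not line.startswith(b"data: "):
        return None
    data = line[len(b"data: "):].strip()
    if data == b"[DONE]":
        return None
    chunk = json.loads(data)
    return chunk["choices"][0]["delta"].get("content")

# Only stream from the live endpoint when a key is configured
if os.environ.get("TOGETHER_API_KEY"):
    payload = {
        "model": "meta-llama/Llama-3.3-70B-Instruct-Turbo",
        "messages": [{"role": "user", "content": "Say hello."}],
        "stream": True,
    }
    req = urllib.request.Request(
        "https://api.together.xyz/v1/chat/completions",
        data=json.dumps(payload).encode(),
        headers={
            "Authorization": f"Bearer {os.environ['TOGETHER_API_KEY']}",
            "Content-Type": "application/json",
        },
    )
    with urllib.request.urlopen(req) as resp:
        for line in resp:
            token = parse_sse_line(line)
            if token:
                print(token, end="", flush=True)
    print()
```

In practice LlamaIndex's streaming query engine does this parsing for you; the sketch is only meant to show what the wire traffic looks like when debugging.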

Troubleshooting

  • If you get authentication errors, verify that the TOGETHER_API_KEY environment variable is set correctly.
  • Ensure api_base is exactly https://api.together.xyz/v1 to avoid connection errors.
  • VectorStoreIndex also needs an embedding model, which defaults to OpenAI's; set Settings.embed_model (for example, a local or Together-hosted embedding model) if you don't want calls to OpenAI.
  • If LlamaIndex fails to load documents, check the ./data directory path and file formats.
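A quick standard-library check for the first two items above (listing /v1/models is assumed to be part of Together's OpenAI-compatible surface; the probe only runs when a key is present):

```python
import os
import urllib.error
import urllib.request

def key_is_set() -> bool:
    """Check whether the Together AI key is present in the environment."""
    return bool(os.environ.get("TOGETHER_API_KEY"))

print("TOGETHER_API_KEY set:", key_is_set())

# Only probe the endpoint when a key is configured
if key_is_set():
    req = urllib.request.Request(
        "https://api.together.xyz/v1/models",
        headers={"Authorization": f"Bearer {os.environ['TOGETHER_API_KEY']}"},
    )
    try:
        with urllib.request.urlopen(req) as resp:
            print("Endpoint reachable, HTTP", resp.status)
    except urllib.error.HTTPError as err:
        print("Request rejected, HTTP", err.code, "- check the key and base URL")
```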

Key Takeaways

  • Point an OpenAILike LLM at Together AI's api_base (https://api.together.xyz/v1) with your API key.
  • Register the LLM (and, if needed, an embedding model) as LlamaIndex defaults via Settings, then build and query the index as usual.
  • Add async and streaming support through the query engine's aquery method and streaming=True option.
Verified 2026-04 · meta-llama/Llama-3.3-70B-Instruct-Turbo