How to · Intermediate · 4 min read

How to use Together AI with LlamaIndex

Quick answer
Install llama-index plus the llama-index-llms-openai-like integration, create an OpenAILike LLM pointed at Together AI's OpenAI-compatible api_base with your API key, and register it as the default LLM via LlamaIndex's Settings. LlamaIndex then builds and queries your index with Together AI models.

PREREQUISITES

  • Python 3.8+
  • Together AI API key
  • pip install llama-index llama-index-llms-openai-like

Setup

Install the required packages and export your Together AI API key as the TOGETHER_API_KEY environment variable. LlamaIndex's OpenAILike LLM speaks the OpenAI wire format, so it only needs Together's api_base and your key.

bash
pip install llama-index llama-index-llms-openai-like
export TOGETHER_API_KEY="your-api-key"

Step by step

This example configures an OpenAILike LLM for Together AI, registers it as LlamaIndex's default via Settings, builds a vector store index from local documents, and queries it through a query engine.

python
import os

from llama_index.core import Settings, SimpleDirectoryReader, VectorStoreIndex
from llama_index.llms.openai_like import OpenAILike

# Point LlamaIndex at Together AI's OpenAI-compatible endpoint
Settings.llm = OpenAILike(
    model="meta-llama/Llama-3.3-70B-Instruct-Turbo",
    api_base="https://api.together.xyz/v1",
    api_key=os.environ["TOGETHER_API_KEY"],
    is_chat_model=True,
)

# Load documents from the local data directory
documents = SimpleDirectoryReader("./data").load_data()

# Build the vector store index (embeddings default to OpenAI's unless
# you configure Settings.embed_model separately)
index = VectorStoreIndex.from_documents(documents)

# Query the index through a query engine
query_engine = index.as_query_engine()
response = query_engine.query("Explain the benefits of AI integration.")
print(response.response)
output
AI integration enables automation, improved decision-making, and enhanced user experiences by leveraging advanced machine learning models.
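If the full pipeline misbehaves, it can help to hit Together's OpenAI-compatible endpoint directly, independent of LlamaIndex. A minimal standard-library sketch (the model name and prompt are placeholders; the request shape follows the OpenAI chat completions wire format, which Together's /v1 endpoint accepts):

```python
import json
import os
import urllib.request

BASE_URL = "https://api.together.xyz/v1"

def build_chat_request(model: str, prompt: str, api_key: str) -> urllib.request.Request:
    """Build an OpenAI-compatible chat completion request for Together AI."""
    payload = {"model": model, "messages": [{"role": "user", "content": prompt}]}
    return urllib.request.Request(
        f"{BASE_URL}/chat/completions",
        data=json.dumps(payload).encode(),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
    )

# Only call out to the network when a key is actually configured
if os.environ.get("TOGETHER_API_KEY"):
    req = build_chat_request(
        "meta-llama/Llama-3.3-70B-Instruct-Turbo",
        "Say hello.",
        os.environ["TOGETHER_API_KEY"],
    )
    with urllib.request.urlopen(req) as resp:
        print(json.load(resp)["choices"][0]["message"]["content"])
```

If this direct call works but the LlamaIndex pipeline does not, the problem is in the index configuration rather than the credentials or endpoint.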

Common variations

  • Use a different Together AI model by changing the model parameter on OpenAILike.
  • For async usage, await the query engine's aquery method from an async function.
  • Stream responses by creating the query engine with streaming=True and iterating over response_gen.
python
import asyncio
import os

from llama_index.core import Settings, SimpleDirectoryReader, VectorStoreIndex
from llama_index.llms.openai_like import OpenAILike

async def async_query():
    Settings.llm = OpenAILike(
        model="meta-llama/Llama-3.3-70B-Instruct-Turbo",
        api_base="https://api.together.xyz/v1",
        api_key=os.environ["TOGETHER_API_KEY"],
        is_chat_model=True,
    )
    documents = SimpleDirectoryReader("./data").load_data()
    index = VectorStoreIndex.from_documents(documents)
    query_engine = index.as_query_engine()
    response = await query_engine.aquery("What is the future of AI?")
    print(response.response)

asyncio.run(async_query())
output
The future of AI includes more seamless integration into daily life, improved natural language understanding, and expanded automation capabilities.
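Under the hood, streaming from Together's OpenAI-compatible endpoint arrives as Server-Sent Events when "stream": true is set in the request. A standard-library sketch of the token parsing (the line format is assumed to follow the OpenAI chat streaming convention: `data:` lines carrying JSON chunks, terminated by `data: [DONE]`):

```python
import json
import os
import urllib.request

def parse_sse_line(line: bytes):
    """Return the token carried by one OpenAI-style `data:` SSE line, or None."""
    if not line.startswith(b"data: "):
        return None
    data = line[len(b"data: "):].strip()
    if data == b"[DONE]":
        return None
    chunk = json.loads(data)
    return chunk["choices"][0]["delta"].get("content")

# Only stream from the live endpoint when a key is configured
if os.environ.get("TOGETHER_API_KEY"):
    payload = {
        "model": "meta-llama/Llama-3.3-70B-Instruct-Turbo",
        "messages": [{"role": "user", "content": "Say hello."}],
        "stream": True,
    }
    req = urllib.request.Request(
        "https://api.together.xyz/v1/chat/completions",
        data=json.dumps(payload).encode(),
        headers={
            "Authorization": f"Bearer {os.environ['TOGETHER_API_KEY']}",
            "Content-Type": "application/json",
        },
    )
    with urllib.request.urlopen(req) as resp:
        for line in resp:
            token = parse_sse_line(line)
            if token:
                print(token, end="", flush=True)
    print()
```

In practice LlamaIndex's streaming query engine does this parsing for you; the sketch is only meant to show what the wire traffic looks like when debugging.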

Troubleshooting

  • If you get authentication errors, verify that the TOGETHER_API_KEY environment variable is set correctly.
  • Ensure api_base is exactly https://api.together.xyz/v1 to avoid connection errors.
  • VectorStoreIndex also needs an embedding model, which defaults to OpenAI's; set Settings.embed_model (for example, a local or Together-hosted embedding model) if you don't want calls to OpenAI.
  • If LlamaIndex fails to load documents, check the ./data directory path and file formats.
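A quick standard-library check for the first two items above (listing /v1/models is assumed to be part of Together's OpenAI-compatible surface; the probe only runs when a key is present):

```python
import os
import urllib.error
import urllib.request

def key_is_set() -> bool:
    """Check whether the Together AI key is present in the environment."""
    return bool(os.environ.get("TOGETHER_API_KEY"))

print("TOGETHER_API_KEY set:", key_is_set())

# Only probe the endpoint when a key is configured
if key_is_set():
    req = urllib.request.Request(
        "https://api.together.xyz/v1/models",
        headers={"Authorization": f"Bearer {os.environ['TOGETHER_API_KEY']}"},
    )
    try:
        with urllib.request.urlopen(req) as resp:
            print("Endpoint reachable, HTTP", resp.status)
    except urllib.error.HTTPError as err:
        print("Request rejected, HTTP", err.code, "- check the key and base URL")
```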

Key Takeaways

  • Point an OpenAILike LLM at Together AI's api_base (https://api.together.xyz/v1) with your API key.
  • Register the LLM (and, if needed, an embedding model) as LlamaIndex defaults via Settings, then build and query the index as usual.
  • Add async and streaming support through the query engine's aquery method and streaming=True option.
Verified 2026-04 · meta-llama/Llama-3.3-70B-Instruct-Turbo