How to use Together AI with LlamaIndex
Quick answer
Configure a LlamaIndex OpenAI-compatible LLM (such as OpenAILike) with Together AI's base URL and API key, set it as the default LLM, then build and query your index as usual. Together AI exposes an OpenAI-compatible endpoint at https://api.together.xyz/v1, so no Together-specific client code is required.
Prerequisites
- Python 3.8+
- Together AI API key
- pip install llama-index llama-index-llms-openai-like
Setup
Install the required packages and export your Together AI API key as an environment variable. LlamaIndex's OpenAI-compatible LLM wrapper speaks the OpenAI wire protocol, so it only needs Together's base URL and your key.
pip install llama-index llama-index-llms-openai-like
Step by step
This example configures an OpenAI-compatible LLM pointed at Together AI, sets it as LlamaIndex's default LLM, builds a simple vector index from local documents, and queries it.
import os

from llama_index.core import Settings, SimpleDirectoryReader, VectorStoreIndex
from llama_index.llms.openai_like import OpenAILike

# Point LlamaIndex at Together AI's OpenAI-compatible endpoint
llm = OpenAILike(
    model="meta-llama/Llama-3.3-70B-Instruct-Turbo",
    api_base="https://api.together.xyz/v1",
    api_key=os.environ["TOGETHER_API_KEY"],
    is_chat_model=True,
)

# Make this the default LLM for all LlamaIndex components
Settings.llm = llm

# Load documents from a local directory
documents = SimpleDirectoryReader("./data").load_data()

# Build the vector store index
# Note: embeddings default to OpenAI's embedding API (requires OPENAI_API_KEY)
# unless you point Settings.embed_model at another provider.
index = VectorStoreIndex.from_documents(documents)

# Query the index
query_engine = index.as_query_engine()
response = query_engine.query("Explain the benefits of AI integration.")
print(response.response)
Output
AI integration enables automation, improved decision-making, and enhanced user experiences by leveraging advanced machine learning models.
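To keep the whole pipeline on Together AI, you can also route embeddings through Together instead of the OpenAI default. This sketch assumes the llama-index-embeddings-together integration package and its TogetherEmbedding class; the model name is one example of a Together-hosted embedding model:

```python
import os

from llama_index.core import Settings
from llama_index.embeddings.together import TogetherEmbedding

# Use Together AI for embeddings as well, so no OpenAI key is needed.
# The model name below is an example of a Together-hosted embedding model.
Settings.embed_model = TogetherEmbedding(
    model_name="togethercomputer/m2-bert-80M-8k-retrieval",
    api_key=os.environ["TOGETHER_API_KEY"],
)
```

Set this before calling VectorStoreIndex.from_documents so the index is built with Together embeddings.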
Common variations
- Use a different Together AI model by changing the model parameter when constructing the LLM.
- For async usage, use async LlamaIndex methods such as the query engine's aquery.
- Stream responses by enabling streaming on the query engine (or the underlying client) and handling the streamed tokens.
import asyncio
import os

from llama_index.core import Settings, SimpleDirectoryReader, VectorStoreIndex
from llama_index.llms.openai_like import OpenAILike

async def async_query():
    # Same OpenAI-compatible setup as above
    Settings.llm = OpenAILike(
        model="meta-llama/Llama-3.3-70B-Instruct-Turbo",
        api_base="https://api.together.xyz/v1",
        api_key=os.environ["TOGETHER_API_KEY"],
        is_chat_model=True,
    )
    documents = SimpleDirectoryReader("./data").load_data()
    index = VectorStoreIndex.from_documents(documents)
    query_engine = index.as_query_engine()
    # aquery is the async counterpart of query
    response = await query_engine.aquery("What is the future of AI?")
    print(response.response)

asyncio.run(async_query())
Output
The future of AI includes more seamless integration into daily life, improved natural language understanding, and expanded automation capabilities.
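For the streaming variation, the streamed tokens can be handled with a small accumulator over OpenAI-style stream chunks. This is a sketch assuming chunks shaped like the openai SDK's chat-completion chunks (choices[0].delta.content); collect_stream is our own helper name, not a library function:

```python
def collect_stream(chunks):
    """Join the text deltas of an OpenAI-style chat completion stream."""
    parts = []
    for chunk in chunks:
        delta = chunk.choices[0].delta.content
        if delta:  # the final chunk's delta is typically empty or None
            parts.append(delta)
    return "".join(parts)
```

With the openai SDK you would pass in the iterator returned by a chat completion created with stream=True; a LlamaIndex streaming query engine instead exposes the response tokens directly.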
Troubleshooting
- If you get authentication errors, verify that your TOGETHER_API_KEY environment variable is set correctly.
- Ensure the API base URL is exactly https://api.together.xyz/v1 to avoid connection issues.
- If LlamaIndex fails to load documents, check your data directory path and file formats.
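The first two troubleshooting items can be automated with a small pre-flight check. check_config is a hypothetical helper of our own, not a library function; it validates settings from a plain dict before any network call is made:

```python
EXPECTED_BASE_URL = "https://api.together.xyz/v1"

def check_config(env, base_url=EXPECTED_BASE_URL):
    """Return a list of configuration problems; an empty list means OK."""
    problems = []
    if not env.get("TOGETHER_API_KEY"):
        problems.append("TOGETHER_API_KEY is not set")
    # Tolerate a trailing slash, but flag any other deviation
    if base_url.rstrip("/") != EXPECTED_BASE_URL:
        problems.append("unexpected base URL: " + base_url)
    return problems
```

Run check_config(dict(os.environ)) before building the index; any returned messages correspond to the first two issues above.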
Key Takeaways
- Together AI exposes an OpenAI-compatible endpoint, so a LlamaIndex OpenAI-compatible LLM configured with Together's base URL and API key is all the integration requires.
- Set the configured LLM as LlamaIndex's default, then build and query indexes exactly as you would with any other provider.
- Async and streaming work with the same setup via the corresponding LlamaIndex methods.