How to install LlamaIndex in Python
Direct answer
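Install <code>llama-index</code> in Python with <code>pip install llama-index</code> to integrate LlamaIndex into your AI projects. This package is a starter bundle: it installs <code>llama-index-core</code> plus a set of integrations, including the OpenAI LLM and embedding packages used below.
Setup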
Install
```bash
pip install llama-index
```
Imports (llama-index 0.10 and later)
```python
from llama_index.core import SimpleDirectoryReader, VectorStoreIndex
from llama_index.llms.openai import OpenAI
```
Examples
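In:
```bash
pip install llama-index
```
Out: pip finishes with a <code>Successfully installed ...</code> line listing llama-index and its dependencies.
In:
```python
from llama_index.core import SimpleDirectoryReader, VectorStoreIndex
documents = SimpleDirectoryReader("data").load_data()
index = VectorStoreIndex.from_documents(documents)
```
Out: nothing is printed; <code>index</code> now holds a VectorStoreIndex built from the files in <code>data/</code> (embedding them requires <code>OPENAI_API_KEY</code>).
In:
```python
from importlib.metadata import version
print(version("llama-index"))
```
Out: the installed version string (0.10 or later is needed for the <code>llama_index.core</code> imports used here).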
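A setup script may need to check not just that the package is installed but that it is new enough for the <code>llama_index.core</code> import layout introduced in 0.10. A minimal stdlib-only sketch; the helper names <code>installed_version</code> and <code>supports_core_layout</code> are hypothetical, not part of LlamaIndex:

```python
import re
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(dist_name: str = "llama-index") -> Optional[str]:
    """Return the installed version of a pip distribution, or None if it is absent."""
    try:
        return version(dist_name)
    except PackageNotFoundError:
        return None


def supports_core_layout(version_str: str) -> bool:
    """True for 0.10 or newer, where imports live under llama_index.core."""
    match = re.match(r"(\d+)\.(\d+)", version_str)
    if match is None:
        return False
    major, minor = int(match.group(1)), int(match.group(2))
    return (major, minor) >= (0, 10)


v = installed_version()
if v is None:
    print("llama-index is not installed; run: pip install llama-index")
elif not supports_core_layout(v):
    print(f"llama-index {v} predates the llama_index.core layout; upgrade it")
else:
    print(f"llama-index {v} is ready")
```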
Integration steps
- Run the pip install command to add llama-index to your Python environment
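- Import the needed classes from <code>llama_index.core</code> and the OpenAI LLM from <code>llama_index.llms.openai</code>
- Set your OpenAI API key in the <code>OPENAI_API_KEY</code> environment variable
- Load or create documents to index
- Create an index with <code>VectorStoreIndex.from_documents()</code> or another index class
- Create a query engine with <code>index.as_query_engine()</code> and use it to query or retrieve information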
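To make the index-then-query steps concrete, here is a toy, stdlib-only sketch of the retrieval idea behind a vector index: turn each document into a vector, then return the documents most similar to the query. This is not how <code>VectorStoreIndex</code> is implemented (it uses a real embedding model and a vector store); the word-count "embedding" and the <code>ToyVectorIndex</code> class are hypothetical stand-ins:

```python
import math
import re
from collections import Counter
from typing import List, Tuple


def embed(text: str) -> Counter:
    """Toy 'embedding': lowercase word counts (a real index uses an embedding model)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[w] * b[w] for w in a if w in b)
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


class ToyVectorIndex:
    """Hypothetical stand-in for the build-then-retrieve flow of a vector index."""

    def __init__(self, documents: List[str]) -> None:
        # "Indexing": store each document next to its vector
        self.entries: List[Tuple[str, Counter]] = [(d, embed(d)) for d in documents]

    def retrieve(self, query: str, top_k: int = 2) -> List[str]:
        # Rank stored documents by similarity to the query and keep the top_k
        q = embed(query)
        ranked = sorted(self.entries, key=lambda e: cosine(q, e[1]), reverse=True)
        return [doc for doc, _ in ranked[:top_k]]


docs = [
    "LlamaIndex connects external data to large language models.",
    "Bananas are a good source of potassium.",
    "A vector index retrieves the chunks most similar to a query.",
]
index = ToyVectorIndex(docs)
print(index.retrieve("How does LlamaIndex connect data to language models?", top_k=1))
```

In LlamaIndex, the retrieved chunks are then passed to the LLM along with the question to produce the final answer.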
Full code
```python
from llama_index.core import SimpleDirectoryReader, VectorStoreIndex
from llama_index.llms.openai import OpenAI

# Set your OpenAI API key in the shell before running, e.g.:
#   export OPENAI_API_KEY="sk-..."

# Load every supported file in the ./data directory as Document objects
documents = SimpleDirectoryReader("data").load_data()

# Build a vector index (this step calls the OpenAI embeddings API)
index = VectorStoreIndex.from_documents(documents)

# LLM used to synthesize answers; it reads OPENAI_API_KEY from the environment
llm = OpenAI(model="gpt-4o")

# Query through a query engine rather than the index itself
query_engine = index.as_query_engine(llm=llm)
query = "What is LlamaIndex?"
response = query_engine.query(query)
print("Query:", query)
print("Response:", response.response)
```
Output (the LLM's wording varies between runs)
```
Query: What is LlamaIndex?
Response: LlamaIndex is a data framework that helps you connect your external data to large language models for efficient retrieval and querying.
```
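Under the hood, the query engine sends the retrieved chunks and the question to the LLM as a chat request, then reads the answer text out of the reply. A stdlib-only sketch of that request/response shape; <code>PROMPT_TEMPLATE</code>, <code>build_chat_request</code>, and <code>extract_answer</code> are hypothetical, and LlamaIndex's real prompt template is worded differently:

```python
import json
from typing import Dict, List

# Hypothetical template; LlamaIndex ships its own QA prompt with different wording
PROMPT_TEMPLATE = (
    "Context:\n"
    "{context}\n"
    "Answer the question using only the context above.\n"
    "Question: {question}\n"
    "Answer:"
)


def build_chat_request(question: str, chunks: List[str], model: str = "gpt-4o") -> Dict:
    """Assemble a Chat Completions-style payload from retrieved chunks."""
    prompt = PROMPT_TEMPLATE.format(context="\n\n".join(chunks), question=question)
    return {"model": model, "messages": [{"role": "user", "content": prompt}]}


def extract_answer(response_json: str) -> str:
    """Pull the answer text out of a Chat Completions-style JSON response."""
    data = json.loads(response_json)
    return data["choices"][0]["message"]["content"]


request = build_chat_request(
    "What is LlamaIndex?", ["LlamaIndex is a data framework for LLM apps."]
)
print(json.dumps(request, indent=2))

sample = '{"choices": [{"message": {"content": "LlamaIndex is a data framework."}}]}'
print(extract_answer(sample))
```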
API trace
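Request (simplified: LlamaIndex sends the question inside a prompt template together with the retrieved document chunks, not the bare question)
```json
{"model": "gpt-4o", "messages": [{"role": "user", "content": "What is LlamaIndex?"}]}
```
Response
```json
{"choices": [{"message": {"content": "LlamaIndex is a data framework that helps you connect your external data to large language models."}}]}
```
Extract (with the raw OpenAI Python SDK; through LlamaIndex you read <code>response.response</code> instead)
```python
response.choices[0].message.content
```
Variants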
Async version
Use when integrating LlamaIndex in asynchronous Python applications for non-blocking calls.
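The benefit of non-blocking calls shows up when several queries run at once: while one waits on the network, the others proceed. A stdlib-only sketch of that pattern with <code>asyncio.gather</code>; <code>fake_aquery</code> is a hypothetical stand-in that sleeps instead of calling an API, where real code would await <code>query_engine.aquery()</code>:

```python
import asyncio
import time
from typing import List


async def fake_aquery(question: str, delay: float = 0.2) -> str:
    """Hypothetical stand-in for query_engine.aquery(); sleeps instead of calling an API."""
    await asyncio.sleep(delay)
    return f"answer to: {question}"


async def run_concurrently(questions: List[str]) -> List[str]:
    # gather() runs all coroutines concurrently and returns results in input order
    return await asyncio.gather(*(fake_aquery(q) for q in questions))


start = time.perf_counter()
answers = asyncio.run(run_concurrently(["q1", "q2", "q3"]))
elapsed = time.perf_counter() - start
print(answers)
print(f"3 queries took {elapsed:.2f}s (about one delay, not three)")
```

Keep in mind that API rate limits still apply, so very large batches may need a cap such as <code>asyncio.Semaphore</code>.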
```python
import asyncio

from llama_index.core import SimpleDirectoryReader, VectorStoreIndex
from llama_index.llms.openai import OpenAI


async def main():
    # Loading and indexing are synchronous here; only the query is awaited
    documents = SimpleDirectoryReader("data").load_data()
    index = VectorStoreIndex.from_documents(documents)
    query_engine = index.as_query_engine(llm=OpenAI(model="gpt-4o"))

    query = "Explain LlamaIndex asynchronously."
    # aquery() is the async counterpart of query()
    response = await query_engine.aquery(query)
    print("Async Query:", query)
    print("Async Response:", response.response)


if __name__ == "__main__":
    asyncio.run(main())
```
Streaming response version
Use streaming to provide real-time token-by-token output for better user experience.
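Consuming a stream amounts to iterating a generator and writing each chunk as soon as it arrives. A stdlib-only sketch; <code>fake_token_stream</code> is a hypothetical stand-in for the generator a streaming LlamaIndex response exposes, and <code>consume_stream</code> also keeps the full text for logging or caching:

```python
import sys
from typing import Iterable, Iterator


def fake_token_stream(text: str) -> Iterator[str]:
    """Hypothetical stand-in for a streaming response generator: yields word-sized chunks."""
    for i, word in enumerate(text.split(" ")):
        yield word if i == 0 else " " + word


def consume_stream(tokens: Iterable[str]) -> str:
    """Print each chunk as it arrives and return the assembled text."""
    parts = []
    for token in tokens:
        sys.stdout.write(token)
        sys.stdout.flush()  # show the chunk immediately instead of buffering
        parts.append(token)
    sys.stdout.write("\n")
    return "".join(parts)


full = consume_stream(fake_token_stream("LlamaIndex streams answers chunk by chunk."))
```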
```python
from llama_index.core import SimpleDirectoryReader, VectorStoreIndex
from llama_index.llms.openai import OpenAI

# Load documents and build the index
documents = SimpleDirectoryReader("data").load_data()
index = VectorStoreIndex.from_documents(documents)

# streaming=True makes query() return a streaming response
query_engine = index.as_query_engine(llm=OpenAI(model="gpt-4o"), streaming=True)

query = "Describe LlamaIndex with streaming."
streaming_response = query_engine.query(query)

# response_gen yields text chunks as the LLM produces them
for token in streaming_response.response_gen:
    print(token, end="", flush=True)
print()
```
Alternative model usage
Use when you want to reduce cost or latency by using a smaller model.
```python
from llama_index.core import SimpleDirectoryReader, VectorStoreIndex
from llama_index.llms.openai import OpenAI

# Load documents and build the index
documents = SimpleDirectoryReader("data").load_data()
index = VectorStoreIndex.from_documents(documents)

# Choose the model on the LLM object, not as an argument to query()
llm = OpenAI(model="gpt-4o-mini")
query_engine = index.as_query_engine(llm=llm)

response = query_engine.query("What is LlamaIndex?")
print("Response:", response.response)
```
Performance
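Latency: roughly 800 ms for a typical VectorStoreIndex query with gpt-4o (varies with prompt size, answer length, and network conditions)
Cost: roughly $0.002 per 500 tokens with gpt-4o; input and output tokens are billed at different rates, and a query's input includes the retrieved chunks, so check OpenAI's pricing page for current rates
Rate limits: depend on your OpenAI usage tier (for example, 500 requests and 30K tokens per minute on a low tier)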
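Because input and output tokens are priced separately, a per-query estimate is simple arithmetic. A minimal sketch; the prices below are placeholders for illustration, not quoted rates, and the token counts are assumed values for a query whose input includes retrieved chunks:

```python
from dataclasses import dataclass


@dataclass
class ModelPrice:
    """USD per one million tokens; fill in current rates from OpenAI's pricing page."""
    input_per_million: float
    output_per_million: float


def estimate_cost(input_tokens: int, output_tokens: int, price: ModelPrice) -> float:
    """Cost of one call: input and output tokens are billed at different rates."""
    return (
        input_tokens * price.input_per_million
        + output_tokens * price.output_per_million
    ) / 1_000_000


# Placeholder prices for illustration only
example_price = ModelPrice(input_per_million=2.50, output_per_million=10.00)

# Assumed sizes: question plus retrieved chunks in, a short answer out
cost = estimate_cost(input_tokens=2000, output_tokens=300, price=example_price)
print(f"~${cost:.4f} per query")  # prints ~$0.0080 per query
```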
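- Trim or filter documents before indexing to reduce embedding tokens
- Use smaller models like gpt-4o-mini for cheaper queries
- Persist the built index with <code>index.storage_context.persist()</code> so documents are not re-embedded on every run, and cache answers to repeated queries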
| Approach | Latency | Cost/call | Best for |
|---|---|---|---|
| Standard VectorStoreIndex query | ~800ms | ~$0.002 | Balanced accuracy and cost |
| Streaming query | First tokens arrive sooner; total time similar | ~$0.002 | Interactive applications |
| Async query | ~800ms, non-blocking | ~$0.002 | Concurrent workflows |
| gpt-4o-mini model | ~400ms | Over 10x cheaper per token than gpt-4o | Cost-sensitive or faster responses |
Quick tip
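Set your OpenAI API key in the <code>OPENAI_API_KEY</code> environment variable before using LlamaIndex. Its default LLM and embedding model both come from OpenAI, so a missing key causes authentication errors as soon as you index or query.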
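A script can fail fast with a clear message instead of surfacing an authentication error deep inside a call. A minimal sketch; <code>require_api_key</code> is a hypothetical helper, and passing a mapping in place of <code>os.environ</code> is only there to make it easy to test:

```python
import os
from typing import Mapping, Optional


def require_api_key(env: Optional[Mapping[str, str]] = None, name: str = "OPENAI_API_KEY") -> str:
    """Return the key, or raise a clear error before any LlamaIndex call is made."""
    env = os.environ if env is None else env
    key = env.get(name, "").strip()
    if not key:
        raise RuntimeError(f"{name} is not set. Export it in your shell, e.g. export {name}=...")
    return key


try:
    require_api_key({})  # simulate an environment without the key
except RuntimeError as err:
    print(err)
```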
Common mistake
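Beginners often try <code>pip install llamaindex</code> (no separator), which is not the official package name; the correct name is <code>llama-index</code>, imported in code as <code>llama_index</code>. Another frequent error is copying pre-0.10 imports such as <code>from llama_index import GPTVectorStoreIndex</code>, which fail on current releases; use <code>from llama_index.core import VectorStoreIndex</code> instead.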
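Why <code>llama_index</code> works on the pip command line but <code>llamaindex</code> does not: pip compares project names after PEP 503 normalization, which treats runs of <code>-</code>, <code>_</code>, and <code>.</code> as one separator and ignores case, but does not invent a missing separator. A stdlib-only sketch of that rule; <code>normalize_dist_name</code> is a hypothetical helper name:

```python
import re


def normalize_dist_name(name: str) -> str:
    """PEP 503 name normalization: collapse -, _ and . runs to '-' and lowercase."""
    return re.sub(r"[-_.]+", "-", name).lower()


target = normalize_dist_name("llama-index")
for candidate in ["llama-index", "llama_index", "Llama.Index", "llamaindex"]:
    normalized = normalize_dist_name(candidate)
    print(f"{candidate!r:16} -> {normalized!r:16} same project: {normalized == target}")
```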