Code beginner · 3 min read

How to install LlamaIndex in Python

Direct answer
Run pip install llama-index to add LlamaIndex to your Python environment, then import it in your AI projects.

Setup

Install
bash
pip install llama-index
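
To confirm that the install worked, a minimal sketch using only the standard library can look up the installed distribution by its pip name. The helper name installed_version is illustrative and not part of LlamaIndex.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(package: str) -> Optional[str]:
    """Return the installed version of a pip distribution, or None if it is missing."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


if __name__ == "__main__":
    found = installed_version("llama-index")
    if found is None:
        print("llama-index is not installed; run: pip install llama-index")
    else:
        print("llama-index version:", found)
```

The lookup uses the distribution name (llama-index, with a hyphen), which is the name pip knows, rather than the import name (llama_index).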
Imports
python
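# llama-index 0.10 and later expose the core classes under llama_index.core;
# releases before 0.10 import them from llama_index directly.
import os
from llama_index.core import SimpleDirectoryReader, VectorStoreIndex, Settings
from llama_index.llms.openai import OpenAI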

Examples

in:  pip install llama-index
out: Successfully installed llama-index-0.6.15

in:  # 0.6.x import path; releases from 0.10 on use llama_index.core
     from llama_index import GPTVectorStoreIndex
     index = GPTVectorStoreIndex.from_documents(documents)
out: Index created successfully from documents

in:  import llama_index
     print(llama_index.__version__)
out: 0.6.15

Integration steps

  1. Run the pip install command to add llama-index to your Python environment
  2. Import the document reader, index, and LLM classes you need
  3. Set your OpenAI API key in the OPENAI_API_KEY environment variable
  4. Load or create documents to index
  5. Create an index with VectorStoreIndex (named GPTVectorStoreIndex in older releases) or another index class
  6. Query the index to retrieve information

Full code

python
import os
from llama_index.core import SimpleDirectoryReader, VectorStoreIndex, Settings
from llama_index.llms.openai import OpenAI

# Written against llama-index 0.10+ (core classes live under llama_index.core).
# Set your OpenAI API key in the shell before running:
#   export OPENAI_API_KEY=<your-openai-api-key>

# Load documents from the local "data" directory
documents = SimpleDirectoryReader('data').load_data()

# Configure the LLM that answers queries
Settings.llm = OpenAI(model='gpt-4o', api_key=os.environ['OPENAI_API_KEY'])

# Create an index from documents (embeddings use the default OpenAI embedding model)
index = VectorStoreIndex.from_documents(documents)

# Query the index through a query engine
query = "What is LlamaIndex?"
response = index.as_query_engine().query(query)

print("Query:", query)
print("Response:", response.response)
output
Query: What is LlamaIndex?
Response: LlamaIndex is a data framework that helps you connect your external data to large language models for efficient retrieval and querying.

API trace

Request
json
{"model": "gpt-4o", "messages": [{"role": "user", "content": "What is LlamaIndex?"}]}
Response
json
{"choices": [{"message": {"content": "LlamaIndex is a data framework that helps you connect your external data to large language models."}}]}
Extract: response.choices[0].message.content
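
As a small illustration of that extraction step, the sketch below parses the response body from the trace as raw JSON and reads the same path with dictionary keys. The function name extract_content is illustrative. A client library that returns response objects would use attribute access (response.choices[0].message.content) instead.

```python
import json

# Response body copied from the trace above
raw = (
    '{"choices": [{"message": {"content": "LlamaIndex is a data framework '
    'that helps you connect your external data to large language models."}}]}'
)


def extract_content(body: str) -> str:
    """Return the text of the first choice in a chat completion response body."""
    data = json.loads(body)
    return data["choices"][0]["message"]["content"]


content = extract_content(raw)
print(content)
```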

Variants

Async version

Use when integrating LlamaIndex in asynchronous Python applications for non-blocking calls.

python
import os
import asyncio
from llama_index.core import SimpleDirectoryReader, VectorStoreIndex, Settings
from llama_index.llms.openai import OpenAI

# Written against llama-index 0.10+ (core classes live under llama_index.core).

async def main():
    documents = SimpleDirectoryReader('data').load_data()
    Settings.llm = OpenAI(model='gpt-4o', api_key=os.environ['OPENAI_API_KEY'])
    index = VectorStoreIndex.from_documents(documents)
    query_engine = index.as_query_engine()
    query = "Explain LlamaIndex asynchronously."
    # aquery is the awaitable counterpart of query
    response = await query_engine.aquery(query)
    print("Async Query:", query)
    print("Async Response:", response.response)

if __name__ == '__main__':
    asyncio.run(main())
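
The benefit of non-blocking calls shows up when several queries run at once. The sketch below illustrates the pattern with asyncio.gather. Here fake_aquery is a hypothetical stand-in for an awaitable query call, so the example runs without an API key or an index.

```python
import asyncio
from typing import List


async def fake_aquery(question: str) -> str:
    """Hypothetical stand-in for an awaitable query call."""
    await asyncio.sleep(0.01)  # simulates network latency
    return f"answer to: {question}"


async def ask_all(questions: List[str]) -> List[str]:
    # gather schedules every coroutine before waiting, so the waits overlap
    results = await asyncio.gather(*(fake_aquery(q) for q in questions))
    return list(results)


answers = asyncio.run(ask_all(["What is LlamaIndex?", "How do I install it?"]))
for answer in answers:
    print(answer)
```

asyncio.gather returns results in the order the coroutines were passed in, so each answer lines up with its question.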
Streaming response version

Use streaming to provide real-time token-by-token output for better user experience.

python
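import os
from llama_index.core import SimpleDirectoryReader, VectorStoreIndex, Settings
from llama_index.llms.openai import OpenAI

# Written against llama-index 0.10+ (core classes live under llama_index.core).

# Load documents
documents = SimpleDirectoryReader('data').load_data()

# Configure the LLM that answers queries
Settings.llm = OpenAI(model='gpt-4o', api_key=os.environ['OPENAI_API_KEY'])

# Create index
index = VectorStoreIndex.from_documents(documents)

# Streaming is enabled on the query engine, not on the LLM client
query_engine = index.as_query_engine(streaming=True)

# Query with streaming and print tokens as they arrive
query = "Describe LlamaIndex with streaming."
streaming_response = query_engine.query(query)
for token in streaming_response.response_gen:
    print(token, end='', flush=True)
print()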
Alternative model usage

Use when you want to reduce cost or latency by using a smaller model.

python
import os
from llama_index.core import SimpleDirectoryReader, VectorStoreIndex, Settings
from llama_index.llms.openai import OpenAI

# Written against llama-index 0.10+ (core classes live under llama_index.core).

# Load documents
documents = SimpleDirectoryReader('data').load_data()

# Use a smaller, cheaper model to answer queries
Settings.llm = OpenAI(model='gpt-4o-mini', api_key=os.environ['OPENAI_API_KEY'])

# Create index; queries against it now run on gpt-4o-mini
index = VectorStoreIndex.from_documents(documents)

query = "What is LlamaIndex?"
response = index.as_query_engine().query(query)
print("Response:", response.response)

Performance

Latency: ~800ms for typical GPTVectorStoreIndex query with gpt-4o
Cost: ~$0.002 per 500 tokens for gpt-4o model usage
Rate limits: Depends on OpenAI API tier; typically 500 RPM and 30K TPM
  • Limit document size before indexing to reduce tokens
  • Use smaller models like gpt-4o-mini for cheaper queries
  • Cache index results to avoid repeated calls
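
The caching tip can be sketched as a thin wrapper that remembers answers per question. This is an in-memory illustration only. The names make_cached_query and fake_query are hypothetical, and fake_query stands in for a real query call so the example runs offline.

```python
from typing import Callable, Dict, List


def make_cached_query(run_query: Callable[[str], str]) -> Callable[[str], str]:
    """Wrap a query function so repeated questions reuse the stored answer."""
    cache: Dict[str, str] = {}

    def cached(question: str) -> str:
        # Normalize case and surrounding whitespace so near-identical questions share a key
        key = question.strip().lower()
        if key not in cache:
            cache[key] = run_query(question)
        return cache[key]

    return cached


# Hypothetical stand-in for a real query call; records how often it runs
calls: List[str] = []


def fake_query(question: str) -> str:
    calls.append(question)
    return f"answer to: {question}"


ask = make_cached_query(fake_query)
first = ask("What is LlamaIndex?")
second = ask("what is llamaindex?  ")  # served from the cache
print(len(calls), "underlying call(s) made")
```

A cache like this lives only as long as the process does. Whether to normalize questions this aggressively depends on how much variation your application expects.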

Approach | Latency | Cost/call | Best for
Standard GPTVectorStoreIndex | ~800ms | ~$0.002 | Balanced accuracy and cost
Streaming Query | Token-by-token output | ~$0.002 | Interactive applications
Async Query | ~800ms but non-blocking | ~$0.002 | Concurrent workflows
gpt-4o-mini Model | ~400ms | ~$0.001 | Cost-sensitive or faster responses
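
To see what the per-call figures mean at volume, here is a rough back-of-the-envelope sketch. The per-call costs are the approximate values from the table above, not current pricing. The function name estimated_cost and the 10,000-call volume are illustrative.

```python
# Approximate per-call costs in USD, copied from the table above
COST_PER_CALL = {
    "gpt-4o": 0.002,
    "gpt-4o-mini": 0.001,
}


def estimated_cost(model: str, calls: int) -> float:
    """Return the approximate cost in USD for a number of calls to one model."""
    return COST_PER_CALL[model] * calls


volume = 10_000  # illustrative number of queries
for model in COST_PER_CALL:
    print(f"{model}: ${estimated_cost(model, volume):.2f} for {volume} calls")

saving = estimated_cost("gpt-4o", volume) - estimated_cost("gpt-4o-mini", volume)
print(f"Estimated saving with gpt-4o-mini: ${saving:.2f}")
```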

Quick tip

Always set your OpenAI API key in environment variables before using LlamaIndex to avoid authentication errors.
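
One way to act on this tip is to fail fast with a clear message before any API call is made. The sketch below uses only the standard library. The helper name require_env is illustrative.

```python
import os


def require_env(name: str) -> str:
    """Return the value of an environment variable, or raise a clear error if it is unset."""
    value = os.environ.get(name, "").strip()
    if not value:
        raise RuntimeError(f"{name} is not set; export it before running your script")
    return value
```

Calling require_env("OPENAI_API_KEY") at the top of a script turns a missing key into an immediate, readable error instead of an authentication failure later on.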

Common mistake

Beginners often get the package name wrong and try to install llamaindex, which does not exist. The correct package name is llama-index.

Verified 2026-04 · gpt-4o, gpt-4o-mini