Code beginner · 3 min read

How to install LlamaIndex in Python

Q: How to install LlamaIndex in Python

Install LlamaIndex with pip by running pip install llama-index, then import it in your Python code to use it in your AI projects.

Setup

Install

bash

pip install llama-index

Imports

python

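# Core classes live in llama_index.core in current releases (0.10 and later)
from llama_index.core import SimpleDirectoryReader, VectorStoreIndex
# The OpenAI LLM wrapper ships with the llama-index starter bundle
from llama_index.llms.openai import OpenAI
import os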

Examples

in: pip install llama-index

out: Successfully installed llama-index-<version>

in:
from llama_index.core import VectorStoreIndex
index = VectorStoreIndex.from_documents(documents)

out: (nothing is printed; index now holds a vector index built from documents)

in:
from importlib.metadata import version
print(version("llama-index"))

out: <version>
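
A script can also confirm the installation without importing the library itself. The following is a minimal sketch using only the standard-library importlib.metadata module; the helper name installed_version is illustrative and not part of LlamaIndex.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(dist_name: str = "llama-index"):
    """Return the installed version of a pip distribution, or None if it is absent."""
    try:
        return version(dist_name)
    except PackageNotFoundError:
        return None


found = installed_version()
if found is None:
    print("llama-index is not installed; run: pip install llama-index")
else:
    print("llama-index version:", found)
```

Looking up the distribution metadata avoids import errors that depend on how a particular release lays out its modules.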

Integration steps

1. Run pip install llama-index to add LlamaIndex to your Python environment.
2. Import the classes you need from llama_index.core and llama_index.llms.openai.
3. Set your OpenAI API key in the OPENAI_API_KEY environment variable.
4. Load or create the documents you want to index.
5. Create an index with VectorStoreIndex or another index class.
6. Build a query engine from the index and use it to query or retrieve information.

Full code

python

import os
from llama_index.core import SimpleDirectoryReader, VectorStoreIndex
from llama_index.llms.openai import OpenAI

# The OpenAI API key must be set in the environment before running, e.g.
#   export OPENAI_API_KEY="<your key>"

# Load documents from the local 'data' directory
documents = SimpleDirectoryReader('data').load_data()

# Initialize the LLM that will answer queries
llm = OpenAI(model="gpt-4o", api_key=os.environ['OPENAI_API_KEY'])

# Create a vector index from the documents
# (embedding uses the default OpenAI embedding model and the same API key)
index = VectorStoreIndex.from_documents(documents)

# Build a query engine and query the index
query_engine = index.as_query_engine(llm=llm)
query = "What is LlamaIndex?"
response = query_engine.query(query)

print("Query:", query)
print("Response:", response.response)

output

Query: What is LlamaIndex?
Response: LlamaIndex is a data framework that helps you connect your external data to large language models for efficient retrieval and querying.

API trace

Request

json

{"model": "gpt-4o", "messages": [{"role": "user", "content": "What is LlamaIndex?"}]}

Response

json

{"choices": [{"message": {"content": "LlamaIndex is a data framework that helps you connect your external data to large language models."}}]}

Extract: response.choices[0].message.content
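
When the response arrives as raw JSON text rather than as a client object, the same field can be read with plain dictionary and list indexing. This is a small sketch over the sample response shown above; the helper name extract_content is illustrative.

```python
import json

# Sample response body copied from the API trace above
raw_response = (
    '{"choices": [{"message": {"content": "LlamaIndex is a data framework '
    'that helps you connect your external data to large language models."}}]}'
)


def extract_content(raw: str) -> str:
    """Parse a chat completion response and return the first choice's text."""
    payload = json.loads(raw)
    return payload["choices"][0]["message"]["content"]


content = extract_content(raw_response)
print(content)
```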

Variants

Async version

Use when integrating LlamaIndex in asynchronous Python applications for non-blocking calls.

python

import os
import asyncio
from llama_index.core import SimpleDirectoryReader, VectorStoreIndex
from llama_index.llms.openai import OpenAI

async def main():
    # Loading and indexing are still synchronous; only the query is awaited
    documents = SimpleDirectoryReader('data').load_data()
    llm = OpenAI(model="gpt-4o", api_key=os.environ['OPENAI_API_KEY'])
    index = VectorStoreIndex.from_documents(documents)
    query_engine = index.as_query_engine(llm=llm)
    query = "Explain LlamaIndex asynchronously."
    # aquery is the awaitable counterpart of query
    response = await query_engine.aquery(query)
    print("Async Query:", query)
    print("Async Response:", response.response)

if __name__ == '__main__':
    asyncio.run(main())
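
The benefit of awaitable queries shows up when several of them run concurrently. The sketch below illustrates the pattern with asyncio.gather; fake_aquery is a hypothetical stand-in that sleeps instead of calling a real query engine, so the example runs without an API key.

```python
import asyncio


async def fake_aquery(question: str, delay: float = 0.1) -> str:
    """Hypothetical stand-in for an awaitable query; sleeps instead of calling an LLM."""
    await asyncio.sleep(delay)
    return f"answer to: {question}"


async def run_all(questions):
    # gather schedules all coroutines at once and returns results in input order
    return await asyncio.gather(*(fake_aquery(q) for q in questions))


questions = ["What is LlamaIndex?", "How do I install it?", "What is an index?"]
answers = asyncio.run(run_all(questions))
for question, answer in zip(questions, answers):
    print(question, "->", answer)
```

In a real application, the stand-in would be replaced by the awaited query call, and the three requests would overlap instead of running one after another.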

Streaming response version

Use streaming to provide real-time token-by-token output for better user experience.

python

import os
from llama_index.core import SimpleDirectoryReader, VectorStoreIndex
from llama_index.llms.openai import OpenAI

# Load documents
documents = SimpleDirectoryReader('data').load_data()

# Initialize the LLM
llm = OpenAI(model="gpt-4o", api_key=os.environ['OPENAI_API_KEY'])

# Create index
index = VectorStoreIndex.from_documents(documents)

# Build a query engine with streaming enabled
query_engine = index.as_query_engine(llm=llm, streaming=True)

# Query with streaming: response_gen yields text chunks as they arrive
query = "Describe LlamaIndex with streaming."
streaming_response = query_engine.query(query)
for token in streaming_response.response_gen:
    print(token, end='', flush=True)
print()

Alternative model usage

Use when you want to reduce cost or latency by using a smaller model.

python

import os
from llama_index.core import SimpleDirectoryReader, VectorStoreIndex
from llama_index.llms.openai import OpenAI

# Load documents
documents = SimpleDirectoryReader('data').load_data()

# Use a smaller, cheaper model; the model is chosen on the LLM object
llm = OpenAI(model="gpt-4o-mini", api_key=os.environ['OPENAI_API_KEY'])

# Create index
index = VectorStoreIndex.from_documents(documents)

# Query the index with gpt-4o-mini
query_engine = index.as_query_engine(llm=llm)
query = "What is LlamaIndex?"
response = query_engine.query(query)
print("Response:", response.response)

Performance

Latency: ~800ms for a typical VectorStoreIndex query with gpt-4o

Cost: ~$0.002 per 500 tokens for gpt-4o model usage

Rate limits: depend on your OpenAI API tier; typically 500 requests per minute (RPM) and 30K tokens per minute (TPM)

Limit document size before indexing to reduce tokens
Use smaller models like gpt-4o-mini for cheaper queries
Cache index results to avoid repeated calls

Approach	Latency	Cost/call	Best for
Standard VectorStoreIndex	~800ms	~$0.002	Balanced accuracy and cost
Streaming Query	Token-by-token output	~$0.002	Interactive applications
Async Query	~800ms but non-blocking	~$0.002	Concurrent workflows
gpt-4o-mini Model	~400ms	~$0.001	Cost-sensitive or faster responses
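
The approximate figures above can be turned into a rough budget estimate. The sketch below takes the article's numbers (about $0.002 per 500 tokens, 500 RPM, 30K TPM) as defaults; they are illustrative assumptions, not current OpenAI pricing or limits, and the function names are made up for this example.

```python
def estimate_cost(tokens: int, usd_per_500_tokens: float = 0.002) -> float:
    """Estimate cost in USD for a given token count at a flat per-500-token rate."""
    return tokens / 500 * usd_per_500_tokens


def max_queries_per_minute(tokens_per_query: int, rpm: int = 500, tpm: int = 30_000) -> int:
    """Queries per minute allowed by whichever limit (requests or tokens) is tighter."""
    return min(rpm, tpm // tokens_per_query)


cost = estimate_cost(30_000)
throughput = max_queries_per_minute(500)
print(f"Estimated cost for 30,000 tokens: ${cost:.3f}")
print("Max queries per minute at 500 tokens each:", throughput)
```

With these assumed numbers, the token limit rather than the request limit is the bottleneck once a query uses more than 60 tokens.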

✓

Quick tip

Always set your OpenAI API key in environment variables before using LlamaIndex to avoid authentication errors.
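
A short check at startup can turn a missing key into a clear error message instead of a failed API call later. This is a minimal sketch using only the standard library; the helper name get_openai_api_key is illustrative and not part of LlamaIndex.

```python
import os


def get_openai_api_key(env=os.environ) -> str:
    """Return the OpenAI API key from the given mapping, or raise a clear error."""
    key = env.get("OPENAI_API_KEY", "").strip()
    if not key:
        raise RuntimeError(
            "OPENAI_API_KEY is not set; export it before running LlamaIndex code."
        )
    return key


# Demonstration with explicit mappings instead of the real environment
print(get_openai_api_key({"OPENAI_API_KEY": "example-key"}))
try:
    get_openai_api_key({})
except RuntimeError as error:
    print("Error:", error)
```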

⚠

Common mistake

Beginners often try to install llamaindex by mistake. The correct package name is llama-index.
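
Package indexes compare project names after normalization (PEP 503): runs of hyphens, underscores, and dots are treated as the same separator, and case is ignored. The sketch below applies that rule to show why llama_index resolves to the same project as llama-index while llamaindex remains a different name; the function name normalize is illustrative.

```python
import re


def normalize(name: str) -> str:
    """Normalize a project name following the PEP 503 rule."""
    return re.sub(r"[-_.]+", "-", name).lower()


for candidate in ["llama-index", "llama_index", "Llama.Index", "llamaindex"]:
    same = normalize(candidate) == normalize("llama-index")
    print(f"{candidate!r} -> {normalize(candidate)!r} (matches llama-index: {same})")
```

Note that the import name in Python code is still llama_index, because module names cannot contain hyphens.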

Verified 2026-04 · gpt-4o, gpt-4o-mini
