How to create a VectorStoreIndex in LlamaIndex
Quick answer
Use VectorStoreIndex from llama_index by first loading your documents, then configuring an embedding model, and finally initializing the index with those documents. This enables efficient semantic search over your data using vector similarity.
Prerequisites
- Python 3.8+
- OpenAI API key (free tier works)
- pip install llama-index openai
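The phrase "vector similarity" refers to comparing embedding vectors numerically, most commonly by cosine similarity. A minimal sketch in plain Python (no LlamaIndex required; the toy vectors here are made up for illustration, real OpenAI embeddings have ~1536 dimensions):

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two vectors: 1.0 = same direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# Toy 3-dimensional "embeddings"
query_vec = [0.9, 0.1, 0.0]
doc_a = [0.8, 0.2, 0.1]   # similar direction -> high score
doc_b = [0.0, 0.1, 0.9]   # different direction -> low score

print(cosine_similarity(query_vec, doc_a))  # close to 1.0
print(cosine_similarity(query_vec, doc_b))  # close to 0.0
```

The index stores one such vector per document chunk and ranks chunks by this score at query time.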
Setup
Install the llama-index package and set your OpenAI API key as an environment variable to enable embedding generation.
pip install llama-index openai
Step by step
This example loads simple text documents, creates a VectorStoreIndex using OpenAI embeddings, and queries the index.
import os
from llama_index import VectorStoreIndex, SimpleDirectoryReader, ServiceContext
from llama_index.embeddings.openai import OpenAIEmbedding
# Set your OpenAI API key before running, e.g. in your shell:
#   export OPENAI_API_KEY="sk-..."
# Load documents from a directory (replace 'data' with your folder)
docs = SimpleDirectoryReader('data').load_data()
# Initialize embedding model
embedding_model = OpenAIEmbedding(api_key=os.environ["OPENAI_API_KEY"])
# Create service context with embedding model
service_context = ServiceContext.from_defaults(embed_model=embedding_model)
# Create VectorStoreIndex with documents and service context
index = VectorStoreIndex.from_documents(docs, service_context=service_context)
# Query the index
query = "What is the main topic of the documents?"
query_engine = index.as_query_engine()
response = query_engine.query(query)
print(response.response)
Output
The main topic of the documents is ... (depends on your data)
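Conceptually, the query step embeds the question, scores every stored document vector against it, and hands the best matches to the LLM. A toy top-k retrieval loop in plain Python (hypothetical embeddings and file names; not the actual LlamaIndex internals):

```python
import math

def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(x * x for x in b)))

def top_k(query_vec, doc_vecs, k=2):
    """Return the k document ids most similar to the query vector."""
    scored = [(cosine_similarity(query_vec, vec), doc_id)
              for doc_id, vec in doc_vecs.items()]
    scored.sort(reverse=True)
    return [doc_id for _, doc_id in scored[:k]]

# Made-up embeddings standing in for real ones
doc_vecs = {
    "intro.txt":   [0.9, 0.1, 0.0],
    "pricing.txt": [0.1, 0.9, 0.1],
    "faq.txt":     [0.8, 0.2, 0.2],
}
print(top_k([1.0, 0.0, 0.0], doc_vecs))  # the docs whose vectors point the same way
```

LlamaIndex performs this ranking for you and then synthesizes an answer from the retrieved chunks.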
Common variations
- Use different embedding models by swapping OpenAIEmbedding for another supported embedding class.
- Create the index asynchronously using async methods if your environment supports it.
- Integrate with other vector stores such as FAISS or Pinecone by customizing the vector store backend.
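The "swap the embedding model" variation works because the index only needs something that maps text to a vector. A schematic of that pluggability in plain Python (the class names here are hypothetical illustrations, not LlamaIndex classes):

```python
from typing import Dict, List, Protocol

class Embedder(Protocol):
    def embed(self, text: str) -> List[float]: ...

class FakeLengthEmbedder:
    """Stand-in model: 'embeds' text via a few cheap surface features."""
    def embed(self, text: str) -> List[float]:
        return [float(len(text)), float(text.count(" ")), float(text.count("e"))]

def build_index(docs: List[str], embedder: Embedder) -> Dict[str, List[float]]:
    # Any object with an .embed() method can be dropped in,
    # just as OpenAIEmbedding can be replaced in LlamaIndex
    return {doc: embedder.embed(doc) for doc in docs}

index = build_index(["hello world", "vector stores"], FakeLengthEmbedder())
```

Swapping models changes only the object you pass in; the indexing and query flow stays the same.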
Troubleshooting
- If you get authentication errors, verify that your OPENAI_API_KEY environment variable is set correctly.
- If documents fail to load, check the path and the file formats supported by SimpleDirectoryReader.
- For slow queries, consider reducing document size or using a smaller embedding model.
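For the authentication point, a fail-fast check you can run before building the index (the helper name is mine, not a LlamaIndex function):

```python
import os

def check_openai_key():
    """Raise a clear error early if the API key is missing or empty."""
    key = os.environ.get("OPENAI_API_KEY", "").strip()
    if not key:
        raise RuntimeError(
            "OPENAI_API_KEY is not set. Export it first, e.g. "
            'export OPENAI_API_KEY="sk-..."'
        )
    return key
```

Calling this at the top of your script turns a confusing mid-run authentication failure into an immediate, readable error.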
Key Takeaways
- Use VectorStoreIndex.from_documents with a ServiceContext carrying your embedding model to create the index.
- Load documents with SimpleDirectoryReader or other loaders compatible with LlamaIndex.
- Set your OpenAI API key in os.environ["OPENAI_API_KEY"] before running the code.
- You can customize embeddings and vector stores for different use cases and performance needs.