How to integrate LlamaIndex with FastAPI
Quick answer
Use LlamaIndex to build an index over your documents and expose it via a FastAPI endpoint. Initialize the index when the app starts, then create a route that queries the index with user input and returns the AI-generated response.

Prerequisites
- Python 3.8+
- OpenAI API key (free tier works)
- pip install llama-index fastapi uvicorn openai
Setup
Install the required packages and set your OpenAI API key as an environment variable.
pip install llama-index fastapi uvicorn openai

Step by step
This example shows how to create a simple FastAPI app that loads documents into a LlamaIndex index and exposes a query endpoint.
import os

from fastapi import FastAPI, HTTPException
from pydantic import BaseModel
# Recent llama-index releases (0.10+) import from llama_index.core
from llama_index.core import SimpleDirectoryReader, VectorStoreIndex

# Fail fast if the OpenAI API key is missing from the environment
if not os.getenv("OPENAI_API_KEY"):
    raise RuntimeError("Set the OPENAI_API_KEY environment variable")

app = FastAPI()

# Load documents from ./data and build the index once at startup
documents = SimpleDirectoryReader("data").load_data()
index = VectorStoreIndex.from_documents(documents)
query_engine = index.as_query_engine()

class QueryRequest(BaseModel):
    query: str

@app.post("/query")
async def query_index(request: QueryRequest):
    try:
        response = query_engine.query(request.query)
        return {"response": str(response)}
    except Exception as e:
        raise HTTPException(status_code=500, detail=str(e))
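Once the server is running, the endpoint can be exercised from a short client script. This is a sketch using only the standard library; build_query_request is an illustrative helper (not part of FastAPI or LlamaIndex), and http://localhost:8000 is uvicorn's default address.

```python
import json
import urllib.request

# Illustrative helper: build the POST request the /query endpoint expects.
def build_query_request(query: str, url: str = "http://localhost:8000/query") -> urllib.request.Request:
    payload = json.dumps({"query": query}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )

# With the server running, send the request and read the JSON response:
# with urllib.request.urlopen(build_query_request("What is in my documents?")) as resp:
#     print(json.loads(resp.read())["response"])
```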
# To run: uvicorn main:app --reload

Common variations
- Use SummaryIndex or TreeIndex (the successors to GPTListIndex and GPTTreeIndex) for different indexing strategies.
- Switch to async endpoints in FastAPI if your index supports async queries.
- Use other LLM providers by configuring LlamaIndex with their API keys.
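The async variation can be sketched as follows: if the query engine only offers a blocking query() call, it can be moved off the event loop with asyncio.to_thread (Python 3.9+). Here blocking_query is a stand-in for the real index call, not a LlamaIndex API; recent llama-index query engines also expose a native aquery() coroutine.

```python
import asyncio

# Stand-in for a blocking index/query-engine call (illustrative only).
def blocking_query(text: str) -> str:
    return f"answer to: {text}"

# Async endpoint body: run the blocking call in a worker thread so it
# does not stall the FastAPI event loop (asyncio.to_thread needs 3.9+).
async def query_index(query: str) -> dict:
    answer = await asyncio.to_thread(blocking_query, query)
    return {"response": answer}

print(asyncio.run(query_index("What is LlamaIndex?")))
# → {'response': 'answer to: What is LlamaIndex?'}
```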
Troubleshooting
- If you get API key errors, ensure OPENAI_API_KEY is set correctly in your environment.
- For slow responses, check your document size and consider using smaller indexes or caching.
- If FastAPI returns 500 errors, inspect the exception message in the detail field for specifics.
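The caching suggestion above can be sketched with functools.lru_cache so that repeated identical queries skip the expensive LLM round trip. expensive_query and cached_query are illustrative names, not LlamaIndex APIs, and caching like this only makes sense while the underlying documents are static.

```python
from functools import lru_cache

calls = 0

# Stand-in for the expensive index.query / LLM round trip (illustrative).
def expensive_query(text: str) -> str:
    global calls
    calls += 1
    return f"answer to: {text}"

# Cache up to 128 distinct queries; repeated queries are served from memory.
@lru_cache(maxsize=128)
def cached_query(text: str) -> str:
    return expensive_query(text)

cached_query("what is an index?")
cached_query("what is an index?")  # second call hits the cache
print(calls)  # → 1
```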
Key Takeaways
- Initialize LlamaIndex with your documents before starting the FastAPI app.
- Expose a POST endpoint that accepts queries and returns index responses.
- Set your OpenAI API key in the environment to enable LLM calls.
- Choose the index type based on your document structure and query needs.
- Handle exceptions in FastAPI to provide clear error messages.