How to integrate LlamaIndex with FastAPI
Quick answer
Use LlamaIndex to build an index over your documents and expose it via a FastAPI endpoint. Initialize the index when the app starts, then create a route that queries the index with user input and returns the AI-generated response.

Prerequisites
- Python 3.8+
- OpenAI API key (free tier works)
- pip install llama-index fastapi uvicorn openai
Setup
Install the required packages and set your OpenAI API key as an environment variable.
pip install llama-index fastapi uvicorn openai

Step by step
This example shows how to create a simple FastAPI app that loads documents into a LlamaIndex index and exposes a query endpoint.
import os

from fastapi import FastAPI, HTTPException
from pydantic import BaseModel
# Recent llama-index releases (0.10+) import from llama_index.core
from llama_index.core import SimpleDirectoryReader, VectorStoreIndex

# Fail fast if the OpenAI API key is missing from the environment
if not os.getenv("OPENAI_API_KEY"):
    raise RuntimeError("Set the OPENAI_API_KEY environment variable")

app = FastAPI()

# Load documents from ./data and build the index once at startup
documents = SimpleDirectoryReader("data").load_data()
index = VectorStoreIndex.from_documents(documents)
query_engine = index.as_query_engine()

class QueryRequest(BaseModel):
    query: str

@app.post("/query")
async def query_index(request: QueryRequest):
    try:
        response = query_engine.query(request.query)
        return {"response": str(response)}
    except Exception as e:
        raise HTTPException(status_code=500, detail=str(e))
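Once the server is running, the endpoint can be exercised from a short client script. This is a sketch using only the standard library; build_query_request is an illustrative helper (not part of FastAPI or LlamaIndex), and http://localhost:8000 is uvicorn's default address.

```python
import json
import urllib.request

# Illustrative helper: build the POST request the /query endpoint expects.
def build_query_request(query: str, url: str = "http://localhost:8000/query") -> urllib.request.Request:
    payload = json.dumps({"query": query}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )

# With the server running, send the request and read the JSON response:
# with urllib.request.urlopen(build_query_request("What is in my documents?")) as resp:
#     print(json.loads(resp.read())["response"])
```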
# To run: uvicorn main:app --reload

Common variations
- Use SummaryIndex or TreeIndex (the successors to GPTListIndex and GPTTreeIndex) for different indexing strategies.
- Switch to async endpoints in FastAPI if your index supports async queries.
- Use other LLM providers by configuring LlamaIndex with their API keys.
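The async variation can be sketched as follows: if the query engine only offers a blocking query() call, it can be moved off the event loop with asyncio.to_thread (Python 3.9+). Here blocking_query is a stand-in for the real index call, not a LlamaIndex API; recent llama-index query engines also expose a native aquery() coroutine.

```python
import asyncio

# Stand-in for a blocking index/query-engine call (illustrative only).
def blocking_query(text: str) -> str:
    return f"answer to: {text}"

# Async endpoint body: run the blocking call in a worker thread so it
# does not stall the FastAPI event loop (asyncio.to_thread needs 3.9+).
async def query_index(query: str) -> dict:
    answer = await asyncio.to_thread(blocking_query, query)
    return {"response": answer}

print(asyncio.run(query_index("What is LlamaIndex?")))
# → {'response': 'answer to: What is LlamaIndex?'}
```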
Troubleshooting
- If you get API key errors, ensure OPENAI_API_KEY is set correctly in your environment.
- For slow responses, check your document size and consider using smaller indexes or caching.
- If FastAPI returns 500 errors, inspect the exception message in the detail field for specifics.
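The caching suggestion above can be sketched with functools.lru_cache so that repeated identical queries skip the expensive LLM round trip. expensive_query and cached_query are illustrative names, not LlamaIndex APIs, and caching like this only makes sense while the underlying documents are static.

```python
from functools import lru_cache

calls = 0

# Stand-in for the expensive index.query / LLM round trip (illustrative).
def expensive_query(text: str) -> str:
    global calls
    calls += 1
    return f"answer to: {text}"

# Cache up to 128 distinct queries; repeated queries are served from memory.
@lru_cache(maxsize=128)
def cached_query(text: str) -> str:
    return expensive_query(text)

cached_query("what is an index?")
cached_query("what is an index?")  # second call hits the cache
print(calls)  # → 1
```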
Key Takeaways
- Initialize LlamaIndex with your documents before starting the FastAPI app.
- Expose a POST endpoint that accepts queries and returns index responses.
- Set your OpenAI API key in the environment to enable LLM calls.
- Choose the index type based on your document structure and query needs.
- Handle exceptions in FastAPI to provide clear error messages.