
How to use LlamaIndex with Ollama

Quick answer
Configure LlamaIndex to call Ollama through its local HTTP API. LlamaIndex handles document loading, indexing, and retrieval, while Ollama serves the model for embedding or completion tasks, so the whole pipeline runs on your machine without cloud API calls.

PREREQUISITES

  • Python 3.8+
  • pip install llama-index llama-index-embeddings-ollama ollama
  • Ollama installed and running locally (ollama serve), with at least one model pulled (e.g. ollama pull llama2)
  • Basic knowledge of Python and LLM usage

Setup

Install the required Python packages and make sure the Ollama server is running locally (it listens on http://localhost:11434 by default). Ollama serves large language models on your own machine, with no cloud dependencies.

bash
pip install llama-index llama-index-embeddings-ollama ollama

Step by step

This example builds a simple LlamaIndex vector index over a directory of documents, retrieves the chunks most relevant to a question, and generates the answer with a local model through the ollama Python client.

python
from llama_index.core import VectorStoreIndex, SimpleDirectoryReader, Settings
from llama_index.embeddings.ollama import OllamaEmbedding
import ollama

# Use a local Ollama embedding model so indexing stays offline
# (pull it first: ollama pull nomic-embed-text)
Settings.embed_model = OllamaEmbedding(model_name="nomic-embed-text")

# Define a function to query an Ollama model
# Replace 'llama2' with your local Ollama model name
def query_ollama(prompt: str) -> str:
    response = ollama.chat(model="llama2", messages=[{"role": "user", "content": prompt}])
    # The ollama client returns the reply under ['message']['content'],
    # not the OpenAI-style ['choices'][0]
    return response['message']['content']

# Load documents from a directory
documents = SimpleDirectoryReader('data').load_data()

# Create a vector index over the documents
index = VectorStoreIndex.from_documents(documents)

# Retrieve the chunks most relevant to the question
query = "What is the main topic of the documents?"
nodes = index.as_retriever().retrieve(query)
context = "\n\n".join(n.node.get_content() for n in nodes)

# Use the Ollama model to answer based on the retrieved context
answer = query_ollama(f"Context:\n{context}\n\nQuestion: {query}")
print("Ollama model answer:", answer)
output
Ollama model answer: The main topic of the documents is ...
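To make the answer reflect the indexed documents rather than the model's general knowledge, the retrieved chunks are typically stitched into the prompt sent to Ollama. The sketch below isolates that packing step; build_rag_prompt is an illustrative helper name, not a LlamaIndex or ollama API.

```python
from typing import List

def build_rag_prompt(question: str, chunks: List[str]) -> str:
    """Combine retrieved document chunks and a user question into one prompt."""
    context = "\n\n".join(chunks)
    return (
        f"Context:\n{context}\n\n"
        f"Question: {question}\n"
        "Answer using only the context above."
    )

# Example input; in the full pipeline the chunks come from index retrieval
prompt = build_rag_prompt(
    "What is the main topic?",
    ["LlamaIndex connects your data to LLMs.", "Ollama runs models locally."],
)
print(prompt)
```

The resulting string is what you would pass as the user message to ollama.chat().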

Common variations

  • Use different Ollama models by changing the model parameter in ollama.chat().
  • Integrate Ollama directly into LlamaIndex's query pipeline with the native Ollama LLM class (pip install llama-index-llms-ollama) instead of calling the client yourself.
  • Use ollama.AsyncClient to issue requests concurrently when handling many queries.

Troubleshooting

  • If you get connection errors, make sure the Ollama server is running (ollama serve) and reachable at http://localhost:11434.
  • Verify the model name matches one listed by ollama list.
  • Check that your llama-index and ollama package versions are compatible (pip show llama-index ollama).
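You can rule the connection issue in or out programmatically by probing Ollama's default endpoint (a running server answers HTTP GET on http://localhost:11434 with "Ollama is running"). A minimal sketch using only the standard library; the helper name is illustrative:

```python
import urllib.request
import urllib.error

def ollama_is_running(url: str = "http://localhost:11434", timeout: float = 2.0) -> bool:
    """Return True if an Ollama server responds at the given URL."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status == 200
    except (urllib.error.URLError, OSError):
        return False

print("Ollama reachable:", ollama_is_running())
```

Run this before your indexing script to fail fast with a clear message instead of a mid-pipeline traceback.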

Key Takeaways

  • Use Ollama's local API to integrate LlamaIndex with local LLMs without cloud calls.
  • Install both llama-index and ollama Python packages for seamless integration.
  • Customize Ollama model and query logic to fit your specific AI application needs.
Verified 2026-04 · llama2, gpt-4o, claude-3-5-sonnet-20241022