How-to · Beginner · 3 min read

How to integrate Ollama into web app

Quick answer
To integrate Ollama into a web app, use its local API by sending HTTP requests to the Ollama server running on your machine or server. You can call the API from Python or JavaScript by posting prompts and receiving AI-generated completions in JSON format.

PREREQUISITES

  • Python 3.8+
  • Ollama installed and running locally
  • pip install requests
  • Basic knowledge of HTTP APIs

Setup

Install Ollama on your local machine from https://ollama.com and ensure the Ollama daemon is running. Install the requests Python package to make HTTP calls.

Run this command to install requests:

bash
pip install requests

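The daemon alone is not enough: the model you request must also be pulled into Ollama's local store, or the API will return an error. A quick check, assuming the llama2 model used in the examples below:

```shell
# Download the model used in the examples (one-time, several GB)
ollama pull llama2

# List locally installed models to confirm the pull succeeded
ollama list
```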
Step by step

Use Python to send a POST request to Ollama's generate endpoint at http://localhost:11434/api/generate with your prompt and model name. Include "stream": false so the API returns the full completion in a single JSON response; by default, Ollama streams the output as newline-delimited JSON chunks.

python
import requests

# Ollama's generate endpoint (the daemon listens on port 11434 by default)
OLLAMA_API_URL = "http://localhost:11434/api/generate"

# Example prompt
prompt = "Write a short poem about spring."

# Model to use (e.g., 'llama2')
model = "llama2"

# Prepare the payload; stream=False returns one JSON object
# instead of newline-delimited streaming chunks
payload = {
    "model": model,
    "prompt": prompt,
    "stream": False
}

# Send POST request to Ollama API
response = requests.post(OLLAMA_API_URL, json=payload)

# Check response status
if response.status_code == 200:
    data = response.json()
    # The generated text is returned in the "response" field
    generated_text = data.get("response", "")
    print("Generated text:\n", generated_text)
else:
    print(f"Error: {response.status_code} - {response.text}")
output
Generated text:
Spring whispers softly, blooms awake,
Colors dance on every lake.
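If you omit "stream": false, Ollama instead returns newline-delimited JSON, one chunk per generated fragment. A minimal sketch of assembling such a stream into the full text, demonstrated here against a canned sample rather than a live server:

```python
import json

def assemble_stream(ndjson_lines):
    """Concatenate the 'response' fragments from Ollama-style NDJSON chunks."""
    parts = []
    for line in ndjson_lines:
        if not line.strip():
            continue  # skip blank keep-alive lines
        chunk = json.loads(line)
        parts.append(chunk.get("response", ""))
        if chunk.get("done"):
            break  # final chunk carries done=true
    return "".join(parts)

# Canned sample mimicking Ollama's streaming format
sample = [
    '{"response": "Spring ", "done": false}',
    '{"response": "whispers", "done": false}',
    '{"response": ".", "done": true}',
]
print(assemble_stream(sample))  # Spring whispers.
```

Against a live server you would pass response.iter_lines(decode_unicode=True) from requests.post(..., stream=True) as ndjson_lines.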

Common variations

You can integrate Ollama into JavaScript web apps by calling the same local API with fetch. You can also specify different models, or pass sampling parameters such as temperature through the request's "options" object.

javascript
async function callOllama(prompt) {
  const response = await fetch('http://localhost:11434/api/generate', {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify({ model: 'llama2', prompt: prompt, stream: false })
  });
  if (response.ok) {
    const data = await response.json();
    console.log('Generated text:', data.response);
  } else {
    console.error('Error:', response.status, await response.text());
  }
}

callOllama('Explain quantum computing in simple terms.');
output
Generated text: Quantum computing uses quantum bits to perform complex calculations faster than classical computers.
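Sampling parameters go in an "options" object alongside the model and prompt. A sketch of building such a payload; the helper function and the particular parameter values are illustrative, not part of the Ollama API:

```python
import json

def build_payload(model, prompt, temperature=0.7, num_predict=128):
    """Build a /api/generate request body with sampling options."""
    return {
        "model": model,
        "prompt": prompt,
        "stream": False,
        # Sampling knobs live under "options" in the Ollama API
        "options": {
            "temperature": temperature,  # higher = more random output
            "num_predict": num_predict,  # cap on generated tokens
        },
    }

payload = build_payload("llama2", "Write a haiku about rain.", temperature=0.2)
print(json.dumps(payload, indent=2))
```

The resulting dict can be passed directly as the json= argument of requests.post.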

Troubleshooting

  • If you get connection errors, ensure the Ollama daemon is running locally and listening on port 11434.
  • Check firewall or network settings that might block localhost requests.
  • Verify the model name is correct and installed in Ollama.
  • Use curl to test the API endpoint manually for debugging.
bash
curl -X POST http://localhost:11434/api/generate -H "Content-Type: application/json" -d '{"model":"llama2","prompt":"Hello","stream":false}'
output
{
  "response": "Hello! How can I assist you today?"
}
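The checks above can be rolled into a small pre-flight function. This sketch probes the daemon's /api/tags endpoint, which lists installed models, and folds every failure mode (daemon down, wrong port, network blocked) into a plain False; the function name and structure are illustrative:

```python
import requests

def ollama_ready(model, base_url="http://localhost:11434", timeout=3):
    """Return True if the Ollama daemon is reachable and `model` is installed."""
    try:
        resp = requests.get(f"{base_url}/api/tags", timeout=timeout)
        resp.raise_for_status()
    except requests.RequestException:
        return False  # daemon not running, wrong port, or request blocked
    names = [m.get("name", "") for m in resp.json().get("models", [])]
    # Installed tags carry a version suffix, e.g. "llama2:latest"
    return any(n == model or n.split(":")[0] == model for n in names)

if not ollama_ready("llama2"):
    print("Ollama is not reachable or llama2 is not pulled.")
```

Run this before wiring the API into your app to fail fast with a clear message instead of a raw connection error.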

Key Takeaways

  • Use Ollama's local HTTP API at http://localhost:11434/api/generate to integrate AI into your web app.
  • Send POST requests with JSON payloads specifying the model, the prompt, and "stream": false to receive a single completion.
  • Test connectivity and model availability before integrating into production.
  • You can call Ollama from Python, JavaScript, or any HTTP-capable client.
  • Ensure Ollama daemon is running locally and accessible on the expected port.