How-to · Beginner · 3 min read

How to integrate Ollama into web app

Quick answer
To integrate Ollama into a web app, use its local API by sending HTTP requests to the Ollama server running on your machine or server. You can call the API from Python or JavaScript by posting prompts and receiving AI-generated completions in JSON format.

PREREQUISITES

  • Python 3.8+
  • Ollama installed and running locally
  • pip install requests
  • Basic knowledge of HTTP APIs

Setup

Install Ollama on your local machine from https://ollama.com and ensure the Ollama daemon is running. Install the requests Python package to make HTTP calls.

Run this command to install requests:

bash
pip install requests

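The daemon alone is not enough: the model you request must also be pulled into Ollama's local store, or the API will return an error. A quick check, assuming the llama2 model used in the examples below:

```shell
# Download the model used in the examples (one-time, several GB)
ollama pull llama2

# List locally installed models to confirm the pull succeeded
ollama list
```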
Step by step

Use Python to send a POST request to Ollama's generate endpoint at http://localhost:11434/api/generate with your prompt and model name. Include "stream": false so the API returns the full completion in a single JSON response; by default, Ollama streams the output as newline-delimited JSON chunks.

python
import requests

# Ollama's generate endpoint (the daemon listens on port 11434 by default)
OLLAMA_API_URL = "http://localhost:11434/api/generate"

# Example prompt
prompt = "Write a short poem about spring."

# Model to use (e.g., 'llama2')
model = "llama2"

# Prepare the payload; stream=False returns one JSON object
# instead of newline-delimited streaming chunks
payload = {
    "model": model,
    "prompt": prompt,
    "stream": False
}

# Send POST request to Ollama API
response = requests.post(OLLAMA_API_URL, json=payload)

# Check response status
if response.status_code == 200:
    data = response.json()
    # The generated text is returned in the "response" field
    generated_text = data.get("response", "")
    print("Generated text:\n", generated_text)
else:
    print(f"Error: {response.status_code} - {response.text}")
output
Generated text:
Spring whispers softly, blooms awake,
Colors dance on every lake.
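If you omit "stream": false, Ollama instead returns newline-delimited JSON, one chunk per generated fragment. A minimal sketch of assembling such a stream into the full text, demonstrated here against a canned sample rather than a live server:

```python
import json

def assemble_stream(ndjson_lines):
    """Concatenate the 'response' fragments from Ollama-style NDJSON chunks."""
    parts = []
    for line in ndjson_lines:
        if not line.strip():
            continue  # skip blank keep-alive lines
        chunk = json.loads(line)
        parts.append(chunk.get("response", ""))
        if chunk.get("done"):
            break  # final chunk carries done=true
    return "".join(parts)

# Canned sample mimicking Ollama's streaming format
sample = [
    '{"response": "Spring ", "done": false}',
    '{"response": "whispers", "done": false}',
    '{"response": ".", "done": true}',
]
print(assemble_stream(sample))  # Spring whispers.
```

Against a live server you would pass response.iter_lines(decode_unicode=True) from requests.post(..., stream=True) as ndjson_lines.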

Common variations

You can integrate Ollama into JavaScript web apps by calling the same local API with fetch. You can also specify different models, or pass sampling parameters such as temperature through the request's "options" object.

javascript
async function callOllama(prompt) {
  const response = await fetch('http://localhost:11434/api/generate', {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify({ model: 'llama2', prompt: prompt, stream: false })
  });
  if (response.ok) {
    const data = await response.json();
    console.log('Generated text:', data.response);
  } else {
    console.error('Error:', response.status, await response.text());
  }
}

callOllama('Explain quantum computing in simple terms.');
output
Generated text: Quantum computing uses quantum bits to perform complex calculations faster than classical computers.
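Sampling parameters go in an "options" object alongside the model and prompt. A sketch of building such a payload; the helper function and the particular parameter values are illustrative, not part of the Ollama API:

```python
import json

def build_payload(model, prompt, temperature=0.7, num_predict=128):
    """Build a /api/generate request body with sampling options."""
    return {
        "model": model,
        "prompt": prompt,
        "stream": False,
        # Sampling knobs live under "options" in the Ollama API
        "options": {
            "temperature": temperature,  # higher = more random output
            "num_predict": num_predict,  # cap on generated tokens
        },
    }

payload = build_payload("llama2", "Write a haiku about rain.", temperature=0.2)
print(json.dumps(payload, indent=2))
```

The resulting dict can be passed directly as the json= argument of requests.post.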

Troubleshooting

  • If you get connection errors, ensure the Ollama daemon is running locally and listening on port 11434.
  • Check firewall or network settings that might block localhost requests.
  • Verify the model name is correct and installed in Ollama.
  • Use curl to test the API endpoint manually for debugging.
bash
curl -X POST http://localhost:11434/api/generate -H "Content-Type: application/json" -d '{"model":"llama2","prompt":"Hello","stream":false}'
output
{
  "response": "Hello! How can I assist you today?"
}
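The checks above can be rolled into a small pre-flight function. This sketch probes the daemon's /api/tags endpoint, which lists installed models, and folds every failure mode (daemon down, wrong port, network blocked) into a plain False; the function name and structure are illustrative:

```python
import requests

def ollama_ready(model, base_url="http://localhost:11434", timeout=3):
    """Return True if the Ollama daemon is reachable and `model` is installed."""
    try:
        resp = requests.get(f"{base_url}/api/tags", timeout=timeout)
        resp.raise_for_status()
    except requests.RequestException:
        return False  # daemon not running, wrong port, or request blocked
    names = [m.get("name", "") for m in resp.json().get("models", [])]
    # Installed tags carry a version suffix, e.g. "llama2:latest"
    return any(n == model or n.split(":")[0] == model for n in names)

if not ollama_ready("llama2"):
    print("Ollama is not reachable or llama2 is not pulled.")
```

Run this before wiring the API into your app to fail fast with a clear message instead of a raw connection error.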

Key Takeaways

  • Use Ollama's local HTTP API at http://localhost:11434/api/generate to integrate AI into your web app.
  • Send POST requests with JSON payloads specifying the model, the prompt, and "stream": false to receive a single completion.
  • Test connectivity and model availability before integrating into production.
  • You can call Ollama from Python, JavaScript, or any HTTP-capable client.
  • Ensure Ollama daemon is running locally and accessible on the expected port.