How to run Phi-3 locally
Quick answer
To run Phi-3 locally, install the Ollama CLI and pull the model with ollama pull phi3 (the model is named phi3 in Ollama's library). Then run it with ollama run phi3, or integrate it via the local Ollama HTTP API.

Prerequisites

- macOS or Linux machine
- Ollama CLI installed (https://ollama.com/docs/installation)
- Python 3.8+ for API integration
- pip install requests
Setup Ollama CLI
Install the Ollama CLI on your local machine to manage and run models like Phi-3. Ollama supports macOS and Linux: on macOS you can install it with Homebrew, and on Linux the official install script (curl -fsSL https://ollama.com/install.sh | sh) is the usual route.
Visit the official installation guide at https://ollama.com/docs/installation for the latest instructions.
brew install ollama

output:
==> Downloading https://github.com/ollama/ollama/releases/download/vX.Y.Z/ollama-darwin.zip
==> Installing Ollama CLI
==> Installation successful
Step by step to run Phi-3 locally
After installing Ollama, pull the phi3 model, then run it locally via the CLI or the HTTP API.
ollama pull phi3
ollama run phi3 "Hello, how are you?"

output:
Pulling phi3 model...
Model phi3 downloaded successfully.
> Hello, how are you?
Hello! I'm Phi-3, your local AI assistant. How can I help you today?
Run Phi-3 locally via Python API
Use Python requests to call the local Ollama API endpoint while the server (ollama serve) is running.
import requests

# Ensure the Ollama server is running first: ollama serve
url = "http://localhost:11434/api/generate"
headers = {"Content-Type": "application/json"}
data = {
    "model": "phi3",
    "prompt": "Write a short poem about AI.",
    "stream": False,  # return one JSON object instead of a token stream
    "options": {"num_predict": 100},  # cap the number of generated tokens
}
response = requests.post(url, json=data, headers=headers)
print(response.json()["response"])

output:
AI whispers softly,
In circuits and in code,
Dreams of silicon skies,
Where thoughts freely flow.
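When "stream" is left at its default, /api/generate instead streams its reply as newline-delimited JSON chunks, each carrying a "response" fragment and a final "done": true marker. Below is a minimal sketch of assembling a streamed reply; the join_stream helper and the sample chunks are ours, while the endpoint and field names follow Ollama's API.

```python
import json

def join_stream(lines):
    """Concatenate the "response" fragments from Ollama's
    newline-delimited JSON stream into one string."""
    parts = []
    for line in lines:
        chunk = json.loads(line)
        parts.append(chunk.get("response", ""))
        if chunk.get("done"):
            break
    return "".join(parts)

# Against a live server (ollama serve), the stream can be read like this:
# import urllib.request
# req = urllib.request.Request(
#     "http://localhost:11434/api/generate",
#     data=json.dumps({"model": "phi3", "prompt": "Hi"}).encode(),
#     headers={"Content-Type": "application/json"},
# )
# with urllib.request.urlopen(req) as resp:
#     print(join_stream(resp))

# Offline demonstration with two sample chunks:
sample = [
    '{"response": "Hello", "done": false}',
    '{"response": " there.", "done": true}',
]
print(join_stream(sample))  # Hello there.
```

Streaming lets you display tokens as they are generated instead of waiting for the whole completion.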
Common variations
- Run the Ollama server in the background with ollama serve for API access.
- Use different prompts, or adjust options.num_predict (the max-tokens setting) in API calls.
- Run the model interactively via the CLI with ollama run phi3.
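Another variation worth knowing: Ollama also exposes a chat-style endpoint, /api/chat, which takes a list of role-tagged messages instead of a flat prompt; with "stream": false the answer arrives in the reply's message.content field. A hedged sketch of building such a payload (the build_chat_payload helper name is our own, not part of Ollama):

```python
import json

def build_chat_payload(model, user_text, system_text=None):
    """Build a request body for Ollama's /api/chat endpoint.
    The helper is ours; the field layout follows Ollama's API."""
    messages = []
    if system_text:
        # Optional system message steers the model's behavior.
        messages.append({"role": "system", "content": system_text})
    messages.append({"role": "user", "content": user_text})
    return {"model": model, "messages": messages, "stream": False}

payload = build_chat_payload("phi3", "Summarize Phi-3 in one line.",
                             system_text="Answer briefly.")
print(json.dumps(payload, indent=2))

# POST it with requests once the server is running:
# reply = requests.post("http://localhost:11434/api/chat", json=payload).json()
# print(reply["message"]["content"])
```

The chat endpoint is the natural fit for multi-turn conversations, since prior exchanges are simply appended to the messages list.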
Troubleshooting
- If ollama pull phi3 fails, check your internet connection and Ollama version.
- If API calls to localhost:11434 fail, ensure the server is running (ollama serve).
- For permission errors, run CLI commands with appropriate user privileges.
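Before digging into failed API calls, it helps to confirm the server is actually listening: a plain GET on http://localhost:11434 returns a short status body when Ollama is up. Here is a sketch of a reachability check using only the standard library (the function name is ours):

```python
import urllib.error
import urllib.request

def server_reachable(base_url="http://localhost:11434", timeout=2.0):
    """Return True if an HTTP server answers at base_url."""
    try:
        with urllib.request.urlopen(base_url, timeout=timeout) as resp:
            return 200 <= resp.status < 300
    except (urllib.error.URLError, OSError):
        # Connection refused, DNS failure, or timeout: server not reachable.
        return False

if server_reachable():
    print("Ollama server is up at localhost:11434")
else:
    print("No server at localhost:11434 -- start it with: ollama serve")
```

Running this before your API script turns a confusing connection traceback into a clear "start the server" message.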
Key Takeaways
- Install Ollama CLI to manage and run Phi-3 locally.
- Use ollama pull phi3 to download the model before running it.
- Run the model via the CLI, or call the local API at http://localhost:11434.
- Keep ollama serve running for API access.
- Use Python requests to integrate Phi-3 into your applications.