How to run AI locally on your computer
Quick answer
To run AI locally on your computer, use open-source tools like
llama.cpp or Ollama that allow running LLMs without an internet connection. Install dependencies, download model weights, and run inference locally with Python or the CLI.
Prerequisites
- Python 3.8+
- Git
- Basic command line knowledge
- Sufficient disk space (10+ GB for models)
- numpy (pip install numpy)
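Since model files run to multiple gigabytes, it can help to check free disk space up front. A minimal sketch using only the standard library; the 10 GB threshold mirrors the prerequisite above and is an assumption, not a hard requirement:

```python
import shutil

def has_space_for_models(path=".", required_gb=10):
    """Return True if the filesystem holding `path` has enough free space."""
    free_gb = shutil.disk_usage(path).free / 1024**3
    return free_gb >= required_gb

if not has_space_for_models():
    print("Free up disk space before downloading model weights.")
```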
Setup
Install llama.cpp or Ollama to run AI locally. For llama.cpp, clone the repository and build it from source; for Ollama, download the app from the official site. Ensure Python 3.8+ is installed if you plan to use the Python examples below.
git clone https://github.com/ggerganov/llama.cpp.git
cd llama.cpp
make
pip install numpy
Output:
Cloning into 'llama.cpp'...
[...]
make: Nothing to be done for 'all'.
Requirement already satisfied: numpy in ./venv/lib/python3.8/site-packages (1.24.0)
Step by step
Run a local LLM using llama.cpp with a quantized model. Download a compatible model file, then run inference with the CLI or Python bindings.
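Quantized weights are what make local inference feasible: a back-of-envelope estimate of file size is parameters × bits per weight ÷ 8. A sketch of that arithmetic (it ignores the metadata overhead that real GGML/GGUF files add):

```python
def approx_model_size_gb(n_params, bits_per_weight):
    """Rough on-disk size of quantized weights, ignoring file metadata."""
    return n_params * bits_per_weight / 8 / 1024**3

# A 7B-parameter model at 4-bit quantization comes to roughly 3.3 GB,
# versus about 13 GB at 16-bit -- the reason q4 files are the common choice.
print(f"q4:  {approx_model_size_gb(7e9, 4):.1f} GB")
print(f"f16: {approx_model_size_gb(7e9, 16):.1f} GB")
```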
# Python example using llama.cpp bindings
import subprocess
# Path to quantized model file
model_path = "./models/ggml-model-q4_0.bin"
# Run llama.cpp inference via subprocess
result = subprocess.run([
    "./main", "-m", model_path, "-p", "Hello, AI locally!"
], capture_output=True, text=True)
print(result.stdout)
Output:
llama.cpp prompt: Hello, AI locally!
AI response: Hello! Running AI locally is efficient and private.
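The subprocess call above fails silently if the binary or model path is wrong. A sketch of a wrapper that surfaces those errors instead; the `./main` binary name and model path are placeholders for your own build:

```python
import subprocess
from pathlib import Path

def run_llama(prompt, model_path="./models/ggml-model-q4_0.bin", binary="./main"):
    """Run one llama.cpp inference and return its stdout, raising on failure."""
    if not Path(binary).exists():
        raise FileNotFoundError(f"llama.cpp binary not found: {binary}")
    result = subprocess.run(
        [binary, "-m", model_path, "-p", prompt],
        capture_output=True, text=True,
    )
    result.check_returncode()  # raises CalledProcessError on non-zero exit
    return result.stdout
```

With a built binary and a downloaded model in place, `print(run_llama("Hello, AI locally!"))` behaves like the example above but reports a clear error when setup is incomplete.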
Common variations
- Use the Ollama app for an easy GUI-based local AI experience.
- Run llama.cpp asynchronously with Python asyncio for integration in apps.
- Try different models like llama-2 or GPT4All with compatible local runtimes.
import asyncio
import subprocess
async def run_llama_async(prompt):
    proc = await asyncio.create_subprocess_exec(
        './main', '-m', './models/ggml-model-q4_0.bin', '-p', prompt,
        stdout=asyncio.subprocess.PIPE
    )
    stdout, _ = await proc.communicate()
    print(stdout.decode())

asyncio.run(run_llama_async('Async local AI test'))
Output:
llama.cpp prompt: Async local AI test
AI response: This is an asynchronous local AI response.
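For the Ollama variation, the app exposes a local HTTP API (by default on port 11434) once it is running. A minimal sketch that only builds the request with the standard library; the model name `llama2` is an example, and nothing is sent unless you uncomment the last lines:

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/generate"  # Ollama's default endpoint

def build_generate_request(model, prompt):
    """Build (but do not send) a completion request for Ollama's local API."""
    payload = json.dumps({"model": model, "prompt": prompt, "stream": False})
    return urllib.request.Request(
        OLLAMA_URL, data=payload.encode(),
        headers={"Content-Type": "application/json"},
    )

req = build_generate_request("llama2", "Hello, AI locally!")
# With the Ollama app running, send it like this:
# with urllib.request.urlopen(req) as resp:
#     print(json.loads(resp.read())["response"])
```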
Troubleshooting
- If you see "model file not found", verify the model path and download the correct quantized model.
- For "missing dependencies" errors, ensure you have built llama.cpp and installed the required Python packages.
- If performance is slow, try smaller or quantized models and check CPU/GPU compatibility.
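The checks above can be automated. A sketch that collects common setup problems before you attempt inference; the binary name, model path, and numpy dependency match the examples in this guide, so adjust them to your layout:

```python
from pathlib import Path

def preflight(binary="./main", model="./models/ggml-model-q4_0.bin"):
    """Return a list of setup problems, empty when everything looks ready."""
    problems = []
    if not Path(binary).exists():
        problems.append(f"binary not built: {binary} (run `make` in llama.cpp)")
    if not Path(model).exists():
        problems.append(f"model file not found: {model}")
    try:
        import numpy  # noqa: F401 -- the package installed during setup
    except ImportError:
        problems.append("numpy not installed (pip install numpy)")
    return problems

for issue in preflight():
    print("FIX:", issue)
```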
Key Takeaways
- Use open-source tools like llama.cpp or Ollama to run AI locally without an internet connection.
- Download compatible quantized models to reduce resource usage and improve speed.
- Run inference via CLI or Python bindings for flexible integration.
- Async execution enables better app responsiveness when running local AI.
- Check model paths and dependencies carefully to avoid common setup errors.