How-to · Beginner · 3 min read

How to use GGUF models with Ollama

Quick answer
To use a GGUF model with Ollama, first download or convert your model into the GGUF format. Then write a Modelfile whose FROM line points at the .gguf file and import it with ollama create <model-name> -f Modelfile. Finally, use ollama run <model-name> to interact with the quantized model efficiently on your machine.

PREREQUISITES

  • Ollama installed (https://ollama.com)
  • GGUF model file or compatible model converted to GGUF format
  • macOS, Linux, or Windows system (Ollama supports all three)
  • Basic terminal/command line usage knowledge

Set up Ollama

Install Ollama on your local machine by downloading it from the official site, via Homebrew on macOS, or with the official install script on Linux (curl -fsSL https://ollama.com/install.sh | sh). Ensure you have terminal access.

bash
brew install ollama
output
==> Downloading ollama
==> Installing ollama
==> Installation successful

Step-by-step usage

1. Obtain a GGUF model file. You can download pre-quantized GGUF models (for example, from Hugging Face) or convert existing models with the conversion scripts in llama.cpp.
2. Create a Modelfile containing a FROM line that points at the .gguf file, then import it with ollama create <model-name> -f Modelfile.
3. Run the model locally with ollama run <model-name> to start an interactive session.

bash
echo 'FROM ./my-gguf-model.gguf' > Modelfile
ollama create my-gguf-model -f Modelfile
ollama run my-gguf-model
output
>>> Hello, Ollama!
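Ollama imports local GGUF files through a Modelfile, a small config whose FROM line points at the weights. Beyond a bare FROM line, a Modelfile can also set inference parameters and a system prompt. A minimal sketch (the parameter values here are illustrative, not recommendations):

```
# Modelfile (illustrative values)
FROM ./my-gguf-model.gguf
PARAMETER temperature 0.7
PARAMETER num_ctx 4096
SYSTEM """You are a concise, helpful assistant."""
```

After editing the Modelfile, rebuild the model with ollama create my-gguf-model -f Modelfile to pick up the changes.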

Common variations

You can choose a quantization level when converting models to GGUF format (e.g., 4-bit, 8-bit) to trade accuracy against speed and memory usage. Ollama can keep multiple GGUF models installed side by side. For scripting, pass the prompt as an argument (ollama run <model-name> "Your prompt") to print the response and exit instead of entering interactive mode.

bash
ollama run my-gguf-model "Explain quantization"
output
Quantization reduces model size by encoding weights with fewer bits, improving speed and lowering memory use.
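As a rule of thumb, the memory needed for model weights scales with bits per weight. A back-of-the-envelope estimate (a Python sketch that ignores KV cache and runtime overhead; the 7B parameter count is purely illustrative):

```python
def weight_memory_gb(n_params, bits_per_weight):
    """Rough size of model weights in GB: params * bits -> bits, /8 -> bytes, /1e9 -> GB."""
    return n_params * bits_per_weight / 8 / 1e9

n = 7e9  # illustrative 7B-parameter model
for bits in (16, 8, 4):
    print(f"{bits:>2}-bit: ~{weight_memory_gb(n, bits):.1f} GB")
# 16-bit: ~14.0 GB
#  8-bit: ~7.0 GB
#  4-bit: ~3.5 GB
```

This is why a 4-bit GGUF of a 7B model fits comfortably on machines where the full-precision weights would not.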

Troubleshooting

If importing the model fails, verify the model path and that the file is a valid GGUF. Check your Ollama version with ollama --version and upgrade if it is old. For performance issues, try a lower-bit quantized GGUF model or close other memory-heavy applications. Detailed errors appear in the server log (~/.ollama/logs/server.log on macOS, or journalctl -u ollama on Linux under systemd).

bash
ollama --version
output
ollama version is 0.0.12
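One quick format check: GGUF files begin with the 4-byte ASCII magic "GGUF" followed by a little-endian uint32 version field. A minimal Python sketch to sanity-check a file before importing it (the demo writes a synthetic header; a real model file continues with metadata and tensor data):

```python
import os
import struct
import tempfile

def gguf_version(path):
    """Return the GGUF format version if the file starts with a valid
    GGUF header (magic b"GGUF" + little-endian uint32 version), else None."""
    with open(path, "rb") as f:
        header = f.read(8)
    if len(header) < 8 or header[:4] != b"GGUF":
        return None
    return struct.unpack("<I", header[4:8])[0]

# Demo on a synthetic header only; real .gguf files carry much more data.
demo_path = os.path.join(tempfile.mkdtemp(), "demo.gguf")
with open(demo_path, "wb") as f:
    f.write(b"GGUF" + struct.pack("<I", 3))
print(gguf_version(demo_path))  # 3
```

If this returns None for a file you downloaded, the file is truncated, corrupted, or not GGUF at all.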

Key Takeaways

  • Use Ollama to run GGUF quantized models locally without cloud dependencies.
  • Import GGUF files with ollama create and run them interactively or with a one-shot prompt.
  • Adjust quantization levels during conversion for optimal speed and memory trade-offs.
Verified 2026-04 · gguf, ollama