How-to · Beginner · 3 min read

How to use GGUF models with Ollama

Quick answer
To use a GGUF model with Ollama, first download or convert your model into the GGUF format. Then write a Modelfile whose FROM line points at the .gguf file and import it with ollama create <model-name> -f Modelfile. Finally, use ollama run <model-name> to interact with the quantized model efficiently on your machine.

PREREQUISITES

  • Ollama installed (https://ollama.com)
  • GGUF model file or compatible model converted to GGUF format
  • macOS, Linux, or Windows system (Ollama supports all three)
  • Basic terminal/command line usage knowledge

Set up Ollama

Install Ollama on your local machine by downloading it from the official site, via Homebrew on macOS, or with the official install script on Linux (curl -fsSL https://ollama.com/install.sh | sh). Ensure you have terminal access.

bash
brew install ollama
output
==> Downloading ollama
==> Installing ollama
==> Installation successful

Step-by-step usage

1. Obtain a GGUF model file. You can download pre-quantized GGUF models (for example, from Hugging Face) or convert existing models with the conversion scripts in llama.cpp.
2. Create a Modelfile containing a FROM line that points at the .gguf file, then import it with ollama create <model-name> -f Modelfile.
3. Run the model locally with ollama run <model-name> to start an interactive session.

bash
echo 'FROM ./my-gguf-model.gguf' > Modelfile
ollama create my-gguf-model -f Modelfile
ollama run my-gguf-model
output
>>> Hello, Ollama!
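Ollama imports local GGUF files through a Modelfile, a small config whose FROM line points at the weights. Beyond a bare FROM line, a Modelfile can also set inference parameters and a system prompt. A minimal sketch (the parameter values here are illustrative, not recommendations):

```
# Modelfile (illustrative values)
FROM ./my-gguf-model.gguf
PARAMETER temperature 0.7
PARAMETER num_ctx 4096
SYSTEM """You are a concise, helpful assistant."""
```

After editing the Modelfile, rebuild the model with ollama create my-gguf-model -f Modelfile to pick up the changes.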

Common variations

You can choose a quantization level when converting models to GGUF format (e.g., 4-bit, 8-bit) to trade accuracy against speed and memory usage. Ollama can keep multiple GGUF models installed side by side. For scripting, pass the prompt as an argument (ollama run <model-name> "Your prompt") to print the response and exit instead of entering interactive mode.

bash
ollama run my-gguf-model "Explain quantization"
output
Quantization reduces model size by encoding weights with fewer bits, improving speed and lowering memory use.
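As a rule of thumb, the memory needed for model weights scales with bits per weight. A back-of-the-envelope estimate (a Python sketch that ignores KV cache and runtime overhead; the 7B parameter count is purely illustrative):

```python
def weight_memory_gb(n_params, bits_per_weight):
    """Rough size of model weights in GB: params * bits -> bits, /8 -> bytes, /1e9 -> GB."""
    return n_params * bits_per_weight / 8 / 1e9

n = 7e9  # illustrative 7B-parameter model
for bits in (16, 8, 4):
    print(f"{bits:>2}-bit: ~{weight_memory_gb(n, bits):.1f} GB")
# 16-bit: ~14.0 GB
#  8-bit: ~7.0 GB
#  4-bit: ~3.5 GB
```

This is why a 4-bit GGUF of a 7B model fits comfortably on machines where the full-precision weights would not.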

Troubleshooting

If importing the model fails, verify the model path and that the file is a valid GGUF. Check your Ollama version with ollama --version and upgrade if it is old. For performance issues, try a lower-bit quantized GGUF model or close other memory-heavy applications. Detailed errors appear in the server log (~/.ollama/logs/server.log on macOS, or journalctl -u ollama on Linux under systemd).

bash
ollama --version
output
ollama version is 0.0.12
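One quick format check: GGUF files begin with the 4-byte ASCII magic "GGUF" followed by a little-endian uint32 version field. A minimal Python sketch to sanity-check a file before importing it (the demo writes a synthetic header; a real model file continues with metadata and tensor data):

```python
import os
import struct
import tempfile

def gguf_version(path):
    """Return the GGUF format version if the file starts with a valid
    GGUF header (magic b"GGUF" + little-endian uint32 version), else None."""
    with open(path, "rb") as f:
        header = f.read(8)
    if len(header) < 8 or header[:4] != b"GGUF":
        return None
    return struct.unpack("<I", header[4:8])[0]

# Demo on a synthetic header only; real .gguf files carry much more data.
demo_path = os.path.join(tempfile.mkdtemp(), "demo.gguf")
with open(demo_path, "wb") as f:
    f.write(b"GGUF" + struct.pack("<I", 3))
print(gguf_version(demo_path))  # 3
```

If this returns None for a file you downloaded, the file is truncated, corrupted, or not GGUF at all.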

Key Takeaways

  • Use Ollama to run GGUF quantized models locally without cloud dependencies.
  • Import GGUF files with ollama create and run them interactively or with a one-shot prompt.
  • Adjust quantization levels during conversion for optimal speed and memory trade-offs.
Verified 2026-04 · gguf, ollama