How-to · beginner to intermediate · 4 min read

How to compare open source LLMs

Quick answer
Use Ollama to run and compare open source LLMs locally by loading different models and evaluating their outputs on the same prompts. Compare metrics like response quality, latency, and resource usage programmatically or manually to select the best fit for your application.

PREREQUISITES

  • Python 3.8+
  • Ollama installed (https://ollama.com/docs/installation)
  • Basic command line usage knowledge
  • pip install requests

Setup Ollama and environment

Install Ollama on your machine following the official instructions, ensure Python 3.8+ is available, and set up a virtual environment. Install requests to interact with Ollama's local API, then pull each model you plan to compare with the ollama pull command (for example, ollama pull llama2).

bash
pip install requests

Step by step comparison code

This example shows how to query two different open source LLMs hosted locally by Ollama and compare their outputs on the same prompt.

python
import requests

# Ollama local API endpoint
OLLAMA_API_URL = "http://localhost:11434"

# Models to compare (pull each one first with `ollama pull <name>`)
models = ["llama2", "mistral-large"]

# Prompt to test
prompt = "Explain the benefits of using open source LLMs."

# Query an Ollama model and return its completion text
def query_ollama(model_name, prompt):
    url = f"{OLLAMA_API_URL}/api/generate"
    data = {
        "model": model_name,
        "prompt": prompt,
        "stream": False,  # return a single JSON object instead of a stream
        "options": {"num_predict": 150}  # cap the number of generated tokens
    }
    response = requests.post(url, json=data)
    response.raise_for_status()
    return response.json()["response"]

# Run comparison
for model in models:
    print(f"\nModel: {model}")
    output = query_ollama(model, prompt)
    print(output)

output
Model: llama2
Open source LLMs provide transparency, flexibility, and community-driven improvements...

Model: mistral-large
Using open source LLMs allows developers to customize models, reduce costs, and foster innovation...

Common variations

  • Use different models by changing the models list.
  • Adjust max_tokens or add parameters like temperature for output diversity.
  • Use asynchronous requests with httpx for faster batch comparisons.
  • Integrate evaluation metrics like BLEU or ROUGE for automated quality scoring.
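As a sketch of the temperature variation above, generation parameters go inside the request's options object rather than at the top level. The helper below only builds the request body (the function name and defaults are illustrative, not part of Ollama's API):

```python
def build_payload(model, prompt, temperature=0.8, num_predict=150):
    """Build a non-streaming request body for Ollama's /api/generate.

    Sampling parameters such as temperature and the output token cap
    (num_predict) belong inside the "options" object.
    """
    return {
        "model": model,
        "prompt": prompt,
        "stream": False,
        "options": {"temperature": temperature, "num_predict": num_predict},
    }

# A higher temperature encourages more diverse output across runs
payload = build_payload("llama2", "Explain open source LLMs.", temperature=1.2)
print(payload["options"])
```

Pass the resulting dict as the json argument to requests.post, exactly as in the comparison script above.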

Troubleshooting common issues

  • If you get connection errors, ensure Ollama is running locally on port 11434.
  • Check model names with ollama list CLI command to confirm availability.
  • For slow responses, verify system resource usage and consider smaller models.
  • Ensure your firewall or antivirus is not blocking local API calls.
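To check the first point programmatically, a running Ollama server answers plain GET requests on its base URL. This is a minimal health-check sketch; the function name is illustrative:

```python
import requests

def ollama_is_running(base_url="http://localhost:11434", timeout=2):
    """Return True if an Ollama server answers at base_url, False otherwise."""
    try:
        return requests.get(base_url, timeout=timeout).ok
    except requests.RequestException:
        # Covers connection refused, DNS failures, and timeouts
        return False

# An unreachable address yields False instead of raising
print(ollama_is_running("http://127.0.0.1:9", timeout=1))
```

Call this before the comparison loop to fail fast with a clear message instead of a raw connection error.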

Key Takeaways

  • Use Ollama's local API to run and compare multiple open source LLMs on identical prompts.
  • Evaluate models based on output quality, latency, and resource consumption for informed selection.
  • Automate comparisons with scripts and integrate standard NLP metrics for objective scoring.
Verified 2026-04 · llama2, mistral-large