Structured outputs with Ollama
Quick answer
Use ollama.chat with a prompt that instructs the model to output structured data such as JSON, then parse the response["message"]["content"] string in Python to extract the structured output reliably.

Prerequisites
- Python 3.8+
- The ollama Python package (pip install ollama)
- The llama3.2 model pulled locally
Setup
Install the ollama Python package and ensure you have the llama3.2 model downloaded locally (ollama pull llama3.2). Ollama runs locally without API keys.
Install with:

```shell
pip install ollama
```

Step by step
Call ollama.chat with a prompt that instructs the model to respond in JSON format. Then parse the JSON string from the response content.
```python
import json
import ollama

prompt = '''Generate a JSON object with keys "name" and "age" only.
Respond ONLY with the JSON.
'''

response = ollama.chat(model="llama3.2", messages=[{"role": "user", "content": prompt}])
json_str = response["message"]["content"]

try:
    data = json.loads(json_str)
    print("Parsed JSON output:", data)
except json.JSONDecodeError:
    print("Failed to parse JSON. Raw output:", json_str)
```

Output
```
Parsed JSON output: {'name': 'Alice', 'age': 30}
```

Common variations
- Use other structured formats like XML or CSV by changing the prompt instructions.
- Use larger Ollama models such as llama3.3:70b for a bigger context window or better accuracy.
- Integrate with async Python by running ollama.chat in a thread or async wrapper, since the default Ollama client is synchronous.
- Recent Ollama versions also accept a format="json" argument to ollama.chat, which constrains the model to emit valid JSON.
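The async variation above can be sketched with asyncio.to_thread, which runs the blocking client call in a worker thread. Here chat_fn is a hypothetical injection point standing in for ollama.chat, so the pattern runs without a live daemon; swap in ollama.chat itself in real use:

```python
import asyncio
import json

async def chat_json_async(chat_fn, model: str, prompt: str) -> dict:
    # Run the synchronous client call in a worker thread so it does not
    # block the event loop, then parse the JSON content.
    response = await asyncio.to_thread(
        chat_fn, model=model, messages=[{"role": "user", "content": prompt}]
    )
    return json.loads(response["message"]["content"])

def fake_chat(model, messages):
    # Stand-in for ollama.chat that returns the same response shape.
    return {"message": {"content": '{"name": "Alice", "age": 30}'}}

data = asyncio.run(chat_json_async(fake_chat, "llama3.2", "Return name and age as JSON."))
print(data)  # {'name': 'Alice', 'age': 30}
```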
Troubleshooting
- If JSON parsing fails, verify the prompt strictly instructs the model to output only JSON without extra text.
- Check that the Ollama daemon is running locally on port 11434.
- Use print(response) to debug the raw model output.
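When parsing fails because the model wrapped the JSON in prose or a code fence, a best-effort extraction helper can often recover it. This is a sketch of a fallback, not part of the ollama API:

```python
import json
import re

def extract_json(raw: str) -> dict:
    """Parse raw directly; on failure, pull the first {...} span out of
    surrounding prose or a markdown code fence."""
    try:
        return json.loads(raw)
    except json.JSONDecodeError:
        pass
    match = re.search(r"\{.*\}", raw, re.DOTALL)
    if match:
        return json.loads(match.group(0))
    raise ValueError(f"No JSON object found in: {raw!r}")

messy = 'Sure! Here is the JSON you asked for: {"name": "Alice", "age": 30}'
print(extract_json(messy))  # {'name': 'Alice', 'age': 30}
```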
Key Takeaways
- Use explicit prompt instructions to get structured outputs from Ollama models.
- Parse the response["message"]["content"] string as JSON (or your target format) in Python.
- Ollama runs locally with zero authentication, simplifying integration.
- Test and debug raw outputs to ensure the model follows structured output constraints.