How-to · Intermediate · 3 min read

How to use AutoGen with Ollama

Quick answer
Point AutoGen at a locally running Ollama server through Ollama's REST API (by default at http://localhost:11434). AutoGen then sends prompts to a local model and receives the generated responses, so you can orchestrate AI workflows entirely on your own machine.

PREREQUISITES

  • Python 3.8+
  • Ollama installed and running locally
  • pip install pyautogen
  • pip install requests

Setup

Install the required packages with pip and make sure Ollama is installed and running locally. You need Python 3.8 or higher. If you have not already downloaded a model, run `ollama pull llama2` first.

```bash
pip install pyautogen requests
```

Step by step

This example shows how to configure AutoGen to use Ollama as a local AI model backend by calling its REST API. The code sends a prompt to an Ollama model and prints the generated response.

```python
import requests

class OllamaClient:
    """Minimal client for Ollama's /api/generate endpoint."""

    def __init__(self, model_name='llama2', base_url='http://localhost:11434'):
        self.model_name = model_name
        self.base_url = base_url

    def generate(self, prompt):
        url = f"{self.base_url}/api/generate"
        payload = {
            "model": self.model_name,
            "prompt": prompt,
            # Ollama streams by default; disable streaming to get one JSON object
            "stream": False,
            # The token limit is passed via "options" as "num_predict"
            "options": {"num_predict": 256}
        }
        response = requests.post(url, json=payload, timeout=120)
        response.raise_for_status()
        data = response.json()
        # The generated text is in the top-level "response" field
        return data.get('response', '')

# Example usage
if __name__ == '__main__':
    client = OllamaClient(model_name='llama2')
    prompt = "Explain how AutoGen integrates with Ollama."
    output = client.generate(prompt)
    print("Ollama response:", output)
```
Output (the actual text will vary by model and run):

```text
Ollama response: AutoGen integrates with Ollama by sending prompts to the local Ollama model server and receiving generated text responses, enabling local AI orchestration.
```
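Beyond raw HTTP calls, Ollama also exposes an OpenAI-compatible endpoint at /v1, so AutoGen agents can usually be pointed at it directly through their LLM configuration. A minimal sketch, assuming pyautogen's config_list format (the api_key value is a placeholder — Ollama ignores it):

```python
# Sketch: pointing an AutoGen agent at Ollama's OpenAI-compatible endpoint.
# Assumes the pyautogen package and a running Ollama server; the agent
# creation itself is left commented out so the snippet needs no live server.
config_list = [
    {
        "model": "llama2",                        # any model pulled into Ollama
        "base_url": "http://localhost:11434/v1",  # Ollama's OpenAI-compatible API
        "api_key": "ollama",                      # placeholder; Ollama ignores the key
    }
]

llm_config = {"config_list": config_list}

# With pyautogen installed, an agent would be created like:
# from autogen import ConversableAgent
# agent = ConversableAgent("assistant", llm_config=llm_config)
print(llm_config["config_list"][0]["base_url"])
```

This avoids writing a custom HTTP client at all when the OpenAI-compatible route fits your workflow.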

Common variations

  • Use different Ollama models by changing the model_name parameter.
  • Integrate AutoGen with asynchronous calls using httpx instead of requests.
  • Combine AutoGen orchestration with other AI APIs like OpenAI or Anthropic for hybrid workflows.

```python
import asyncio
import httpx

class AsyncOllamaClient:
    """Async variant of the Ollama client, using httpx."""

    def __init__(self, model_name='llama2', base_url='http://localhost:11434'):
        self.model_name = model_name
        self.base_url = base_url

    async def generate(self, prompt):
        url = f"{self.base_url}/api/generate"
        payload = {
            "model": self.model_name,
            "prompt": prompt,
            # Disable streaming so the server returns a single JSON object
            "stream": False,
            "options": {"num_predict": 256}
        }
        async with httpx.AsyncClient(timeout=120) as client:
            response = await client.post(url, json=payload)
            response.raise_for_status()
            data = response.json()
            # The generated text is in the top-level "response" field
            return data.get('response', '')

async def main():
    client = AsyncOllamaClient(model_name='llama2')
    prompt = "Async call to Ollama with AutoGen."
    output = await client.generate(prompt)
    print("Async Ollama response:", output)

if __name__ == '__main__':
    asyncio.run(main())
```
Output (the actual text will vary by model and run):

```text
Async Ollama response: Async call to Ollama with AutoGen enables non-blocking AI orchestration for improved performance.
```

Troubleshooting

  • If you get connection errors, verify Ollama is running locally on port 11434 (ollama serve starts the server).
  • Check your firewall or network settings to allow local API calls.
  • If the response is empty, confirm the model name is correct and available in Ollama (ollama list shows installed models).
  • If response.json() fails because the body contains multiple JSON objects, the server is streaming; set "stream": False in the request payload.
  • Use response.raise_for_status() to catch HTTP errors early.
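As a quick connectivity check, you can query Ollama's /api/tags endpoint, which lists locally installed models. A small helper sketch that returns False when the server is unreachable (model_available is a hypothetical name for illustration):

```python
import requests

def model_available(model_name, base_url='http://localhost:11434'):
    """Return True if the named model is installed on a reachable Ollama server."""
    try:
        # /api/tags lists the models currently available locally
        response = requests.get(f"{base_url}/api/tags", timeout=5)
        response.raise_for_status()
        models = response.json().get('models', [])
        # Model names may carry a tag suffix, e.g. "llama2:latest"
        return any(m.get('name', '').split(':')[0] == model_name for m in models)
    except requests.RequestException:
        # Server not running, wrong port, or blocked by a firewall
        return False

# Example: an unreachable port simply reports the model as unavailable
print(model_available('llama2', base_url='http://127.0.0.1:9'))
```

Running this before any generate call turns a vague connection error into a clear yes/no answer about server and model availability.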

Key Takeaways

  • Use AutoGen with Ollama by calling Ollama's local REST API for model inference.
  • Customize model selection and request parameters to fit your AI orchestration needs.
  • Implement async calls for better performance in production environments.
Verified 2026-04 · llama2