How-to · Beginner · 3 min read

How to use CrewAI with local LLMs

Quick answer
Install the crewai Python package and point its LLM class at a locally served model, such as one running under Ollama, llama.cpp's llama-server, or GPT4All. Because inference happens on your own machine, your agents run offline with no external API calls.

PREREQUISITES

  • Python 3.10+
  • pip install crewai
  • A local LLM server installed and running (e.g., Ollama, llama.cpp's llama-server, GPT4All)
  • Basic knowledge of Python

Setup

Install the crewai Python package and make sure your local LLM server is running and reachable. Set environment variables if your setup needs them.

bash
pip install crewai
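If your tooling reads the standard OpenAI environment variables (CrewAI's LiteLLM backend does for OpenAI-compatible endpoints), you can point them at a local server. The URL below assumes llama.cpp's llama-server on its default port; the key is a placeholder, since local servers usually ignore it:

```shell
# Assumed default: llama.cpp's llama-server on port 8080; adjust to your setup
export OPENAI_API_BASE="http://localhost:8080/v1"
export OPENAI_API_KEY="sk-local-placeholder"  # most local servers ignore the key
```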

Step by step

Here is a complete example that defines a single-agent crew backed by a local model (served here through Ollama) and runs it:

python
from crewai import Agent, Task, Crew, LLM

# Point CrewAI at a locally served model; any OpenAI-compatible
# endpoint (llama.cpp's llama-server, GPT4All) works the same way
local_llm = LLM(
    model="ollama/llama3",
    base_url="http://localhost:11434",
)

# A single agent backed by the local model
writer = Agent(
    role="Technical writer",
    goal="Explain concepts clearly and concisely",
    backstory="You write short, accurate technical summaries.",
    llm=local_llm,
)

task = Task(
    description="Explain the benefits of using local LLMs with CrewAI.",
    expected_output="A short paragraph.",
    agent=writer,
)

crew = Crew(agents=[writer], tasks=[task])
result = crew.kickoff()
print(result)
output
Using local LLMs with CrewAI allows you to run AI models offline, ensuring data privacy, reducing latency, and avoiding API costs while maintaining flexible integration.

(Exact wording varies by model.)

Common variations

  • Use a different local model by changing the model string in LLM (e.g., ollama/mistral), or a different server by changing base_url.
  • llama.cpp's llama-server and GPT4All expose OpenAI-compatible endpoints; point base_url at them and prefix the model name with openai/.
  • Swap in a hosted provider by passing its model name and API key to LLM; the rest of the crew code is unchanged.
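Each local server listens on its own default port. The helper below is illustrative, not part of CrewAI; the model strings and ports are assumed common defaults (Ollama 11434, llama-server 8080, GPT4All 4891) that you should check against your own install:

```python
# Illustrative helper, not part of CrewAI: map a local backend name to the
# (model, base_url) pair you would pass to crewai.LLM. Ports are assumed
# common defaults for each server; adjust if you changed them.
DEFAULT_ENDPOINTS = {
    "ollama": ("ollama/llama3", "http://localhost:11434"),
    "llama.cpp": ("openai/local-model", "http://localhost:8080/v1"),
    "gpt4all": ("openai/local-model", "http://localhost:4891/v1"),
}

def local_llm_config(backend: str) -> tuple[str, str]:
    """Return an assumed (model, base_url) pair for a local backend."""
    try:
        return DEFAULT_ENDPOINTS[backend]
    except KeyError:
        raise ValueError(f"Unknown backend: {backend!r}") from None

print(local_llm_config("ollama"))
```

Keeping the endpoint choice in one place makes it easy to switch backends without touching the agent or task definitions.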
For non-blocking runs, Crew also exposes kickoff_async:

python
import asyncio

from crewai import Agent, Task, Crew, LLM

async def main():
    local_llm = LLM(model="ollama/llama3", base_url="http://localhost:11434")
    agent = Agent(
        role="Assistant",
        goal="Answer in one short sentence",
        backstory="You are brief and precise.",
        llm=local_llm,
    )
    task = Task(
        description="Say hello from an async local LLM.",
        expected_output="One sentence.",
        agent=agent,
    )
    crew = Crew(agents=[agent], tasks=[task])
    result = await crew.kickoff_async()
    print(result)

asyncio.run(main())
output
Hello from an async local LLM! (exact wording varies by model)

Troubleshooting

  • For Connection refused errors, confirm the local server is actually running and that base_url points at the right host and port.
  • If you see Model not found, check that the model name in LLM matches one your server has pulled or loaded.
  • For Permission denied errors, check file permissions on the local model files.
  • If generation is slow, make sure your hardware meets the model's requirements (enough RAM, and ideally GPU offload).
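A quick way to rule out connection problems before starting a crew is to probe the server's base URL. This standalone check uses only the standard library; the URL in the example is an assumed Ollama default:

```python
import urllib.request
import urllib.error

def server_reachable(base_url: str, timeout: float = 2.0) -> bool:
    """Return True if an HTTP server answers at base_url, False otherwise."""
    try:
        urllib.request.urlopen(base_url, timeout=timeout)
        return True
    except urllib.error.HTTPError:
        # The server answered, even if with an error status code
        return True
    except (urllib.error.URLError, OSError):
        return False

# Assumed Ollama default; swap in your own base_url
print(server_reachable("http://localhost:11434"))
```

If this prints False, fix the server before debugging anything on the CrewAI side.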

Key Takeaways

  • Install CrewAI and a local LLM server to run agent workflows fully offline.
  • Configure CrewAI's LLM class with the local model name and base_url.
  • Use Crew.kickoff_async when generation must not block other async work.
  • If loading or connecting fails, check the model name, server status, and file permissions.
Verified 2026-04 · llama.cpp, GPT4All