How-to · Beginner · 3 min read

How to use CrewAI with local LLMs

Quick answer
Install the crewai Python package and point its LLM class at a locally served model, such as one running under Ollama, llama.cpp's llama-server, or GPT4All. Because inference happens on your own machine, your agents run offline with no external API calls.

PREREQUISITES

  • Python 3.10+
  • pip install crewai
  • A local LLM server installed and running (e.g., Ollama, llama.cpp's llama-server, GPT4All)
  • Basic knowledge of Python

Setup

Install the crewai Python package and make sure your local LLM server is running and reachable. Set environment variables if your setup needs them.

bash
pip install crewai
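If your tooling reads the standard OpenAI environment variables (CrewAI's LiteLLM backend does for OpenAI-compatible endpoints), you can point them at a local server. The URL below assumes llama.cpp's llama-server on its default port; the key is a placeholder, since local servers usually ignore it:

```shell
# Assumed default: llama.cpp's llama-server on port 8080; adjust to your setup
export OPENAI_API_BASE="http://localhost:8080/v1"
export OPENAI_API_KEY="sk-local-placeholder"  # most local servers ignore the key
```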

Step by step

Here is a complete example that defines a single-agent crew backed by a local model (served here through Ollama) and runs it:

python
from crewai import Agent, Task, Crew, LLM

# Point CrewAI at a locally served model; any OpenAI-compatible
# endpoint (llama.cpp's llama-server, GPT4All) works the same way
local_llm = LLM(
    model="ollama/llama3",
    base_url="http://localhost:11434",
)

# A single agent backed by the local model
writer = Agent(
    role="Technical writer",
    goal="Explain concepts clearly and concisely",
    backstory="You write short, accurate technical summaries.",
    llm=local_llm,
)

task = Task(
    description="Explain the benefits of using local LLMs with CrewAI.",
    expected_output="A short paragraph.",
    agent=writer,
)

crew = Crew(agents=[writer], tasks=[task])
result = crew.kickoff()
print(result)
output
Using local LLMs with CrewAI allows you to run AI models offline, ensuring data privacy, reducing latency, and avoiding API costs while maintaining flexible integration.

(Exact wording varies by model.)

Common variations

  • Use a different local model by changing the model string in LLM (e.g., ollama/mistral), or a different server by changing base_url.
  • llama.cpp's llama-server and GPT4All expose OpenAI-compatible endpoints; point base_url at them and prefix the model name with openai/.
  • Swap in a hosted provider by passing its model name and API key to LLM; the rest of the crew code is unchanged.
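Each local server listens on its own default port. The helper below is illustrative, not part of CrewAI; the model strings and ports are assumed common defaults (Ollama 11434, llama-server 8080, GPT4All 4891) that you should check against your own install:

```python
# Illustrative helper, not part of CrewAI: map a local backend name to the
# (model, base_url) pair you would pass to crewai.LLM. Ports are assumed
# common defaults for each server; adjust if you changed them.
DEFAULT_ENDPOINTS = {
    "ollama": ("ollama/llama3", "http://localhost:11434"),
    "llama.cpp": ("openai/local-model", "http://localhost:8080/v1"),
    "gpt4all": ("openai/local-model", "http://localhost:4891/v1"),
}

def local_llm_config(backend: str) -> tuple[str, str]:
    """Return an assumed (model, base_url) pair for a local backend."""
    try:
        return DEFAULT_ENDPOINTS[backend]
    except KeyError:
        raise ValueError(f"Unknown backend: {backend!r}") from None

print(local_llm_config("ollama"))
```

Keeping the endpoint choice in one place makes it easy to switch backends without touching the agent or task definitions.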
For non-blocking runs, Crew also exposes kickoff_async:

python
import asyncio

from crewai import Agent, Task, Crew, LLM

async def main():
    local_llm = LLM(model="ollama/llama3", base_url="http://localhost:11434")
    agent = Agent(
        role="Assistant",
        goal="Answer in one short sentence",
        backstory="You are brief and precise.",
        llm=local_llm,
    )
    task = Task(
        description="Say hello from an async local LLM.",
        expected_output="One sentence.",
        agent=agent,
    )
    crew = Crew(agents=[agent], tasks=[task])
    result = await crew.kickoff_async()
    print(result)

asyncio.run(main())
output
Hello from an async local LLM! (exact wording varies by model)

Troubleshooting

  • For Connection refused errors, confirm the local server is actually running and that base_url points at the right host and port.
  • If you see Model not found, check that the model name in LLM matches one your server has pulled or loaded.
  • For Permission denied errors, check file permissions on the local model files.
  • If generation is slow, make sure your hardware meets the model's requirements (enough RAM, and ideally GPU offload).
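A quick way to rule out connection problems before starting a crew is to probe the server's base URL. This standalone check uses only the standard library; the URL in the example is an assumed Ollama default:

```python
import urllib.request
import urllib.error

def server_reachable(base_url: str, timeout: float = 2.0) -> bool:
    """Return True if an HTTP server answers at base_url, False otherwise."""
    try:
        urllib.request.urlopen(base_url, timeout=timeout)
        return True
    except urllib.error.HTTPError:
        # The server answered, even if with an error status code
        return True
    except (urllib.error.URLError, OSError):
        return False

# Assumed Ollama default; swap in your own base_url
print(server_reachable("http://localhost:11434"))
```

If this prints False, fix the server before debugging anything on the CrewAI side.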

Key Takeaways

  • Install CrewAI and a local LLM server to run agent workflows fully offline.
  • Configure CrewAI's LLM class with the local model name and base_url.
  • Use Crew.kickoff_async when generation must not block other async work.
  • If loading or connecting fails, check the model name, server status, and file permissions.
Verified 2026-04 · llama.cpp, GPT4All