How-to · Intermediate · 3 min read

How to run DeepSeek-R1 locally

Quick answer
You can run DeepSeek-R1 locally by hosting the model behind an OpenAI-compatible inference server on your machine. Then use the openai Python SDK with base_url set to your local server's address and send requests to the locally hosted deepseek-reasoner model.

PREREQUISITES

  • Python 3.8+
  • pip install "openai>=1.0" (quoted so the shell does not treat >= as a redirection)
  • DeepSeek-R1 model files and local serving environment
  • Basic knowledge of running local AI model servers

Setup local server

To run DeepSeek-R1 locally, you first need to set up a local server that hosts the model. This typically involves downloading the model weights and running a compatible inference server that exposes an OpenAI-compatible API endpoint.

Ensure you have the model files and a serving framework like vLLM or a DeepSeek-provided server that supports deepseek-reasoner. The server should listen on a local port (e.g., http://localhost:8000).
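Before wiring up the SDK, it helps to confirm the server is actually reachable. Here is a minimal stdlib check, assuming your server exposes the standard OpenAI-compatible /v1/models endpoint (vLLM and most compatible servers do):

```python
import urllib.request
import urllib.error

def server_is_up(base_url: str, timeout: float = 2.0) -> bool:
    """Return True if an OpenAI-compatible server answers at base_url."""
    try:
        with urllib.request.urlopen(f"{base_url}/models", timeout=timeout) as resp:
            return resp.status == 200
    except (urllib.error.URLError, OSError):
        return False

# Usage once your server is running:
# server_is_up("http://localhost:8000/v1")
```

If this returns False, fix the server before debugging the SDK side.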

bash
pip install openai

Step by step usage

Once your local server is running and serving deepseek-reasoner, use the openai Python SDK to query it by specifying the base_url parameter pointing to your local server.

This example shows how to send a prompt to the locally hosted deepseek-reasoner model and print the response.

python
import os
from openai import OpenAI

# Initialize client with local server URL
client = OpenAI(
    api_key=os.environ.get("OPENAI_API_KEY", "EMPTY"),  # most local servers ignore the key, but the SDK requires a non-empty value
    base_url="http://localhost:8000/v1"
)

response = client.chat.completions.create(
    model="deepseek-reasoner",
    messages=[{"role": "user", "content": "Explain the benefits of using reasoning models."}]
)

print(response.choices[0].message.content)
output
The benefits of using reasoning models include improved problem-solving capabilities, better understanding of complex queries, and enhanced decision-making support.
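DeepSeek-R1 is a reasoning model, and some servers return the chain of thought in a separate reasoning_content field alongside the final answer (the hosted DeepSeek API does this; whether your local server does depends on its configuration). A defensive accessor, sketched against a stub object since field names vary by server:

```python
from types import SimpleNamespace

def split_reasoning(message) -> tuple[str, str]:
    """Return (reasoning, answer); reasoning is "" if the server doesn't expose it."""
    reasoning = getattr(message, "reasoning_content", None) or ""
    return reasoning, message.content

# Stub standing in for response.choices[0].message:
msg = SimpleNamespace(reasoning_content="Let me think...", content="Paris.")
thinking, answer = split_reasoning(msg)
print(answer)  # Paris.
```

With a real response, pass response.choices[0].message instead of the stub.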

Common variations

  • Async calls: Use async Python clients or frameworks to query the local server for non-blocking calls.
  • Streaming: If your local server supports streaming, enable streaming in the SDK to receive partial outputs.
  • Different models: You can switch the model parameter to other DeepSeek models such as deepseek-chat if they are served locally.
python
import asyncio
from openai import AsyncOpenAI

async def async_query():
    client = AsyncOpenAI(api_key="EMPTY", base_url="http://localhost:8000/v1")
    response = await client.chat.completions.create(
        model="deepseek-reasoner",
        messages=[{"role": "user", "content": "What is the capital of France?"}]
    )
    print(response.choices[0].message.content)

asyncio.run(async_query())
output
Paris is the capital of France.
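For the streaming variation, the call itself is client.chat.completions.create(..., stream=True); the part worth getting right is accumulating the text deltas. A helper sketch, exercised here against fake chunks since it depends only on the chunk shape:

```python
from types import SimpleNamespace

def collect_stream(chunks) -> str:
    """Concatenate the text deltas from a chat-completions stream."""
    parts = []
    for chunk in chunks:
        delta = chunk.choices[0].delta
        if getattr(delta, "content", None):
            parts.append(delta.content)
    return "".join(parts)

# Fake chunks mimicking the stream's shape (the final delta is often empty):
def fake(text):
    return SimpleNamespace(choices=[SimpleNamespace(delta=SimpleNamespace(content=text))])

print(collect_stream([fake("Hel"), fake("lo"), fake(None)]))  # Hello
```

With a real server, pass the iterator returned by the stream=True call directly to collect_stream.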

Troubleshooting

  • If you get connection errors, verify your local server is running and accessible at the specified base_url.
  • If authentication fails, check if your local server requires an API key or token and provide it accordingly.
  • For model loading errors, ensure the deepseek-reasoner model files are correctly installed and compatible with your serving framework.
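Transient connection errors are common while a local model is still loading into memory. A simple retry-with-backoff wrapper (a generic sketch, not part of the openai SDK) smooths this over:

```python
import time

def with_retries(fn, attempts=3, base_delay=0.5):
    """Call fn(), retrying on ConnectionError with exponential backoff."""
    for i in range(attempts):
        try:
            return fn()
        except ConnectionError:
            if i == attempts - 1:
                raise
            time.sleep(base_delay * 2 ** i)

# Usage: with_retries(lambda: client.chat.completions.create(...))
```

Note that the openai SDK also retries some failures on its own; this wrapper is only for errors that surface before a request succeeds.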

Key Takeaways

  • Run DeepSeek-R1 locally by hosting the model on a local server exposing an OpenAI-compatible API.
  • Use the OpenAI Python SDK with the base_url parameter pointed to your local server for inference.
  • Ensure your local server is properly configured with model files and network accessibility.
  • Async and streaming calls are supported if your local server implements those features.
  • Troubleshoot connection, authentication, and model loading issues by verifying server status and configuration.
Verified 2026-04 · deepseek-reasoner, deepseek-chat