How-to · Intermediate · 3 min read

How to run DeepSeek-R1 locally

Quick answer
You can run DeepSeek-R1 locally by hosting the model behind an OpenAI-compatible inference server on your machine. Then use the openai Python SDK with base_url set to your local server's address and send requests to the locally hosted deepseek-reasoner model.

PREREQUISITES

  • Python 3.8+
  • pip install "openai>=1.0" (quoted so the shell does not treat >= as a redirection)
  • DeepSeek-R1 model files and local serving environment
  • Basic knowledge of running local AI model servers

Setup local server

To run DeepSeek-R1 locally, you first need to set up a local server that hosts the model. This typically involves downloading the model weights and running a compatible inference server that exposes an OpenAI-compatible API endpoint.

Ensure you have the model files and a serving framework like vLLM or a DeepSeek-provided server that supports deepseek-reasoner. The server should listen on a local port (e.g., http://localhost:8000).
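Before wiring up the SDK, it helps to confirm the server is actually reachable. Here is a minimal stdlib check, assuming your server exposes the standard OpenAI-compatible /v1/models endpoint (vLLM and most compatible servers do):

```python
import urllib.request
import urllib.error

def server_is_up(base_url: str, timeout: float = 2.0) -> bool:
    """Return True if an OpenAI-compatible server answers at base_url."""
    try:
        with urllib.request.urlopen(f"{base_url}/models", timeout=timeout) as resp:
            return resp.status == 200
    except (urllib.error.URLError, OSError):
        return False

# Usage once your server is running:
# server_is_up("http://localhost:8000/v1")
```

If this returns False, fix the server before debugging the SDK side.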

bash
pip install openai

Step by step usage

Once your local server is running and serving deepseek-reasoner, use the openai Python SDK to query it by specifying the base_url parameter pointing to your local server.

This example shows how to send a prompt to the locally hosted deepseek-reasoner model and print the response.

python
import os
from openai import OpenAI

# Initialize client with local server URL
client = OpenAI(
    api_key=os.environ.get("OPENAI_API_KEY", "EMPTY"),  # most local servers ignore the key, but the SDK requires a non-empty value
    base_url="http://localhost:8000/v1"
)

response = client.chat.completions.create(
    model="deepseek-reasoner",
    messages=[{"role": "user", "content": "Explain the benefits of using reasoning models."}]
)

print(response.choices[0].message.content)
output
The benefits of using reasoning models include improved problem-solving capabilities, better understanding of complex queries, and enhanced decision-making support.
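DeepSeek-R1 is a reasoning model, and some servers return the chain of thought in a separate reasoning_content field alongside the final answer (the hosted DeepSeek API does this; whether your local server does depends on its configuration). A defensive accessor, sketched against a stub object since field names vary by server:

```python
from types import SimpleNamespace

def split_reasoning(message) -> tuple[str, str]:
    """Return (reasoning, answer); reasoning is "" if the server doesn't expose it."""
    reasoning = getattr(message, "reasoning_content", None) or ""
    return reasoning, message.content

# Stub standing in for response.choices[0].message:
msg = SimpleNamespace(reasoning_content="Let me think...", content="Paris.")
thinking, answer = split_reasoning(msg)
print(answer)  # Paris.
```

With a real response, pass response.choices[0].message instead of the stub.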

Common variations

  • Async calls: Use async Python clients or frameworks to query the local server for non-blocking calls.
  • Streaming: If your local server supports streaming, enable streaming in the SDK to receive partial outputs.
  • Different models: You can switch the model parameter to other DeepSeek models such as deepseek-chat if they are served locally.
python
import asyncio
from openai import AsyncOpenAI

async def async_query():
    client = AsyncOpenAI(api_key="EMPTY", base_url="http://localhost:8000/v1")
    response = await client.chat.completions.create(
        model="deepseek-reasoner",
        messages=[{"role": "user", "content": "What is the capital of France?"}]
    )
    print(response.choices[0].message.content)

asyncio.run(async_query())
output
Paris is the capital of France.
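For the streaming variation, the call itself is client.chat.completions.create(..., stream=True); the part worth getting right is accumulating the text deltas. A helper sketch, exercised here against fake chunks since it depends only on the chunk shape:

```python
from types import SimpleNamespace

def collect_stream(chunks) -> str:
    """Concatenate the text deltas from a chat-completions stream."""
    parts = []
    for chunk in chunks:
        delta = chunk.choices[0].delta
        if getattr(delta, "content", None):
            parts.append(delta.content)
    return "".join(parts)

# Fake chunks mimicking the stream's shape (the final delta is often empty):
def fake(text):
    return SimpleNamespace(choices=[SimpleNamespace(delta=SimpleNamespace(content=text))])

print(collect_stream([fake("Hel"), fake("lo"), fake(None)]))  # Hello
```

With a real server, pass the iterator returned by the stream=True call directly to collect_stream.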

Troubleshooting

  • If you get connection errors, verify your local server is running and accessible at the specified base_url.
  • If authentication fails, check if your local server requires an API key or token and provide it accordingly.
  • For model loading errors, ensure the deepseek-reasoner model files are correctly installed and compatible with your serving framework.
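Transient connection errors are common while a local model is still loading into memory. A simple retry-with-backoff wrapper (a generic sketch, not part of the openai SDK) smooths this over:

```python
import time

def with_retries(fn, attempts=3, base_delay=0.5):
    """Call fn(), retrying on ConnectionError with exponential backoff."""
    for i in range(attempts):
        try:
            return fn()
        except ConnectionError:
            if i == attempts - 1:
                raise
            time.sleep(base_delay * 2 ** i)

# Usage: with_retries(lambda: client.chat.completions.create(...))
```

Note that the openai SDK also retries some failures on its own; this wrapper is only for errors that surface before a request succeeds.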

Key Takeaways

  • Run DeepSeek-R1 locally by hosting the model on a local server exposing an OpenAI-compatible API.
  • Use the OpenAI Python SDK with the base_url parameter pointed to your local server for inference.
  • Ensure your local server is properly configured with model files and network accessibility.
  • Async and streaming calls are supported if your local server implements those features.
  • Troubleshoot connection, authentication, and model loading issues by verifying server status and configuration.
Verified 2026-04 · deepseek-reasoner, deepseek-chat