How to use the LM Studio API with Python
Direct answer
LM Studio serves loaded models through a local, OpenAI-compatible API (default base URL http://localhost:1234/v1). Point the openai Python client at that URL and call client.chat.completions.create() with your model name and messages.
Setup
Install
pip install openai
Imports
from openai import OpenAI
Examples
In: Hello, how are you?
Out: I'm doing great, thanks for asking! How can I assist you today?
In: Summarize the benefits of AI.
Out: AI improves efficiency, automates tasks, enhances decision-making, and enables new innovations across industries.
In: (empty prompt)
Out: Please provide a prompt to generate a response.
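The empty-prompt case above is client-side validation rather than server behavior; a minimal sketch of that guard (the respond function and its fallback reply are illustrative assumptions):

```python
def respond(prompt: str) -> str:
    """Return a validation message for empty input instead of calling the server."""
    if not prompt.strip():
        return "Please provide a prompt to generate a response."
    # A real implementation would forward the prompt to the LM Studio server here.
    return f"(model reply to: {prompt})"

print(respond(""))  # → Please provide a prompt to generate a response.
```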
Integration steps
- Install the openai Python package with pip.
- Start the LM Studio local server (Developer tab in the app); it listens on http://localhost:1234 by default and performs no authentication, so no real API key is needed.
- Create an OpenAI client with base_url="http://localhost:1234/v1" and call client.chat.completions.create() with the model name and your messages.
- Read the generated text from response.choices[0].message.content.
Full code
from openai import OpenAI

# Point the client at LM Studio's local server; the API key can be any placeholder string
client = OpenAI(base_url="http://localhost:1234/v1", api_key="lm-studio")

# Define the prompt and model (use the identifier of a model loaded in LM Studio)
prompt = "Hello, how are you?"
model = "llama2"

# Call the LM Studio chat completions endpoint
response = client.chat.completions.create(
    model=model,
    messages=[{"role": "user", "content": prompt}],
)

# Print the generated response
print("Response:", response.choices[0].message.content)

Output
Response: I'm doing great, thanks for asking! How can I assist you today?
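Because the server speaks plain HTTP, the same call can be made without any SDK. A standard-library sketch, assuming LM Studio's default port; build_payload, extract_content, and chat are illustrative helper names:

```python
import json
import urllib.request

BASE_URL = "http://localhost:1234/v1"  # LM Studio's default local address

def build_payload(model: str, prompt: str) -> dict:
    """Build the JSON body for a /v1/chat/completions request."""
    return {"model": model, "messages": [{"role": "user", "content": prompt}]}

def extract_content(response_json: dict) -> str:
    """Pull the generated text out of an OpenAI-style response."""
    return response_json["choices"][0]["message"]["content"]

def chat(model: str, prompt: str) -> str:
    """POST the payload to the local server and return the reply text."""
    req = urllib.request.Request(
        f"{BASE_URL}/chat/completions",
        data=json.dumps(build_payload(model, prompt)).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return extract_content(json.load(resp))
```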
API trace
Request
POST http://localhost:1234/v1/chat/completions
{"model": "llama2", "messages": [{"role": "user", "content": "Hello, how are you?"}]}
Response
{"choices": [{"message": {"content": "I'm doing great, thanks for asking! How can I assist you today?"}}]}
Extract
response.choices[0].message.content
Variants
Streaming response ›
Use streaming to display partial results immediately, which improves the user experience for long outputs.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:1234/v1", api_key="lm-studio")
prompt = "Tell me a story about a robot."
model = "llama2"

# Stream the response tokens as they arrive
stream = client.chat.completions.create(
    model=model,
    messages=[{"role": "user", "content": prompt}],
    stream=True,
)
for chunk in stream:
    print(chunk.choices[0].delta.content or "", end="", flush=True)
print()
Async version ›
Use async calls to handle multiple concurrent requests efficiently in asynchronous applications.
import asyncio
from openai import AsyncOpenAI

client = AsyncOpenAI(base_url="http://localhost:1234/v1", api_key="lm-studio")

async def main():
    prompt = "Explain quantum computing in simple terms."
    model = "llama2"
    response = await client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": prompt}],
    )
    print("Response:", response.choices[0].message.content)

asyncio.run(main())
Alternative model ›
Use a chat-tuned model for conversational or dialogue-based tasks.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:1234/v1", api_key="lm-studio")
prompt = "Write a poem about spring."
model = "llama2-chat"  # use the identifier of a chat-tuned model loaded in LM Studio
response = client.chat.completions.create(model=model, messages=[{"role": "user", "content": prompt}])
print("Response:", response.choices[0].message.content)
Performance
Latency: ~500 ms to 1 s per request, depending on model size, hardware, and prompt length.
Cost: LM Studio runs models locally, so no per-request usage costs apply.
Rate limits: none; the server runs on your own hardware.
- Keep prompts concise to reduce token usage.
- Use smaller or more heavily quantized models for faster responses.
- Cache frequent queries to avoid repeated calls.
| Approach | Latency | Cost/call | Best for |
|---|---|---|---|
| Standard sync call | ~500ms-1s | Free | Simple synchronous use cases |
| Streaming response | Starts immediately, total ~1s | Free | Long outputs with better UX |
| Async call | ~500ms-1s | Free | Concurrent requests in async apps |
Quick tip
LM Studio's local server performs no authentication; the openai client requires an api_key argument, but any placeholder string (for example "lm-studio") works, and no environment variable needs to be set.
Common mistake
Beginners often forget to override <code>base_url</code>, so their requests go to api.openai.com instead of the local server. Always pass <code>base_url="http://localhost:1234/v1"</code> when constructing the client; no real API key is needed.
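When a request fails, first check that the server is actually up. A quick way is to list the models the server exposes via the OpenAI-compatible /v1/models endpoint (model_ids and list_models are illustrative helper names; the port is LM Studio's default):

```python
import json
import urllib.request

def model_ids(models_json: dict) -> list:
    """Extract model identifiers from an OpenAI-style /v1/models response."""
    return [m["id"] for m in models_json.get("data", [])]

def list_models(base_url: str = "http://localhost:1234/v1") -> list:
    """Return the identifiers of models the local LM Studio server exposes."""
    with urllib.request.urlopen(f"{base_url}/models") as resp:
        return model_ids(json.load(resp))
```

If list_models() raises a connection error, the server is not running or is listening on a different port.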