Code beginner · 3 min read

How to build a chatbot with OpenAI Assistants API

Direct answer

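Use the OpenAI SDK to create a client, then call client.chat.completions.create with the gpt-4o model and a messages array. This is the Chat Completions endpoint; the Assistants API proper is a separate interface (client.beta.assistants and threads) that this guide does not use.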

Setup

Install

bash

pip install openai

Env vars

OPENAI_API_KEY

Imports

python

import os
from openai import OpenAI

Examples

in: Hello, who are you?
out: I am an AI assistant powered by OpenAI. How can I help you today?

in: Can you help me write a Python function?
out: Sure! What kind of function do you need help with?

in: Tell me a joke about computers.
out: Why do programmers prefer dark mode? Because light attracts bugs!

Integration steps

Import the OpenAI SDK and initialize the client with the API key from os.environ.
Construct a messages list with user input and optionally system or assistant messages.
Call client.chat.completions.create with the model 'gpt-4o' and the messages array.
Extract the chatbot's reply from response.choices[0].message.content.
Display or process the chatbot's response as needed.
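
The second step above (building the messages list) can be sketched as a small helper. This is a minimal sketch rather than part of the original guide: `build_messages`, its parameters, and the system prompt text are hypothetical names chosen for illustration. It makes no API call, so it runs without a key.

```python
from typing import Optional

def build_messages(user_input: str,
                   history: Optional[list[dict]] = None,
                   system_prompt: Optional[str] = None) -> list[dict]:
    """Assemble a messages list: optional system prompt, prior turns, new user turn.

    Hypothetical helper; the dict shape ({"role", "content"}) matches the
    messages used in the guide's code.
    """
    messages = []
    if system_prompt:
        messages.append({"role": "system", "content": system_prompt})
    messages.extend(history or [])
    messages.append({"role": "user", "content": user_input})
    return messages

example = build_messages(
    "Tell me a joke about computers.",
    history=[
        {"role": "user", "content": "Hello, who are you?"},
        {"role": "assistant", "content": "I am an AI assistant powered by OpenAI. How can I help you today?"},
    ],
    system_prompt="You are a concise, friendly assistant.",  # placeholder text
)
print([m["role"] for m in example])  # ['system', 'user', 'assistant', 'user']
```

The returned list can be passed as the messages argument of client.chat.completions.create.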

Full code

python

import os
from openai import OpenAI

# Initialize OpenAI client with API key from environment
client = OpenAI(api_key=os.environ["OPENAI_API_KEY"])

def chat_with_openai(user_input: str) -> str:
    messages = [
        {"role": "user", "content": user_input}
    ]
    response = client.chat.completions.create(
        model="gpt-4o",
        messages=messages
    )
    return response.choices[0].message.content

if __name__ == "__main__":
    user_message = input("You: ")
    bot_reply = chat_with_openai(user_message)
    print(f"Assistant: {bot_reply}")

output

You: Hello, who are you?
Assistant: I am an AI assistant powered by OpenAI. How can I help you today?

API trace

Request

json

{"model": "gpt-4o", "messages": [{"role": "user", "content": "Hello, who are you?"}]}

Response

json

{"choices": [{"message": {"content": "I am an AI assistant powered by OpenAI. How can I help you today?"}}], "usage": {"total_tokens": 20}}

Extract: response.choices[0].message.content
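
If you work with the raw JSON instead of the SDK's response object, the same path applies with dictionary and list indexing. The sketch below parses the trimmed response shown in the trace above; the variable names are illustrative, and a real response carries more fields than this excerpt.

```python
import json

# The trimmed response body from the API trace above.
raw = (
    '{"choices": [{"message": {"content": "I am an AI assistant powered by OpenAI. '
    'How can I help you today?"}}], "usage": {"total_tokens": 20}}'
)

payload = json.loads(raw)
reply = payload["choices"][0]["message"]["content"]  # same path as the SDK attributes
total_tokens = payload["usage"]["total_tokens"]      # token count, useful for cost tracking

print(reply)
print(total_tokens)  # 20
```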

Variants

Streaming Chatbot

Use streaming to provide real-time token-by-token responses for better user experience in chat interfaces.

python

import os
from openai import OpenAI

client = OpenAI(api_key=os.environ["OPENAI_API_KEY"])

def chat_stream(user_input: str):
    messages = [{"role": "user", "content": user_input}]
    response = client.chat.completions.create(
        model="gpt-4o",
        messages=messages,
        stream=True
    )
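    for chunk in response:
        # delta is an object, not a dict; content may be None and choices may be empty
        if chunk.choices and chunk.choices[0].delta.content:
            print(chunk.choices[0].delta.content, end='', flush=True)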

if __name__ == "__main__":
    user_message = input("You: ")
    print("Assistant: ", end='')
    chat_stream(user_message)
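
A streaming chatbot usually also needs the complete reply, for example to append it to the conversation history. The following is a minimal sketch of accumulating the streamed pieces. `collect_stream` and `fake_chunk` are hypothetical names, and the fake chunks only imitate the `choices[0].delta.content` shape so the example runs without an API call.

```python
from types import SimpleNamespace

def collect_stream(chunks) -> str:
    """Join the text pieces of a streamed response into one string."""
    parts = []
    for chunk in chunks:
        if not chunk.choices:          # some chunks may carry no choices
            continue
        piece = chunk.choices[0].delta.content
        if piece:                      # content can be None
            parts.append(piece)
    return "".join(parts)

def fake_chunk(text):
    # Stand-in that mimics the attribute layout of a streamed chunk (assumption).
    return SimpleNamespace(choices=[SimpleNamespace(delta=SimpleNamespace(content=text))])

demo = [fake_chunk("Hel"), fake_chunk("lo"), fake_chunk(None), SimpleNamespace(choices=[])]
print(collect_stream(demo))  # Hello
```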

Async Chatbot

Use async calls when integrating the chatbot into asynchronous applications or frameworks to improve concurrency.

python

import os
import asyncio
from openai import AsyncOpenAI

# AsyncOpenAI exposes the same methods as OpenAI, but as coroutines
client = AsyncOpenAI(api_key=os.environ["OPENAI_API_KEY"])

async def chat_async(user_input: str) -> str:
    messages = [{"role": "user", "content": user_input}]
    response = await client.chat.completions.create(
        model="gpt-4o",
        messages=messages
    )
    return response.choices[0].message.content

async def main():
    user_message = input("You: ")
    bot_reply = await chat_async(user_message)
    print(f"Assistant: {bot_reply}")

if __name__ == "__main__":
    asyncio.run(main())
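
The concurrency benefit shows up when several requests are in flight at once. Below is a rough sketch using asyncio.gather. `answer_all` is a hypothetical helper, and `fake_chat` is a stand-in coroutine so the example runs offline; in practice you would pass a coroutine function such as chat_async from the code above.

```python
import asyncio
from typing import Awaitable, Callable

async def answer_all(prompts: list[str],
                     chat_fn: Callable[[str], Awaitable[str]]) -> list[str]:
    """Run one chat coroutine per prompt concurrently; results keep prompt order."""
    return await asyncio.gather(*(chat_fn(p) for p in prompts))

async def fake_chat(prompt: str) -> str:
    # Stand-in for a real API call (assumption), with a short simulated delay.
    await asyncio.sleep(0.01)
    return f"echo: {prompt}"

results = asyncio.run(answer_all(["hi", "bye"], fake_chat))
print(results)  # ['echo: hi', 'echo: bye']
```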

Use a Smaller Model for Cost Efficiency

Use smaller models like gpt-4o-mini to reduce cost and latency when high fidelity is not critical.

python

import os
from openai import OpenAI

client = OpenAI(api_key=os.environ["OPENAI_API_KEY"])

def chat_with_smaller_model(user_input: str) -> str:
    messages = [{"role": "user", "content": user_input}]
    response = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=messages
    )
    return response.choices[0].message.content

if __name__ == "__main__":
    user_message = input("You: ")
    bot_reply = chat_with_smaller_model(user_message)
    print(f"Assistant: {bot_reply}")

Performance

Latency: ~800ms for gpt-4o non-streaming calls

Cost: ~$0.003 per 500 tokens exchanged with gpt-4o

Rate limits: Tier 1: 500 requests per minute / 30,000 tokens per minute

Limit conversation history to recent relevant messages to reduce tokens.
Use smaller models like gpt-4o-mini for less critical tasks.
Avoid unnecessary system messages or verbose prompts.
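
One simple way to limit conversation history is to keep any system messages plus only the most recent turns. This is a sketch under that assumption: `trim_history` and `max_messages` are hypothetical names, and counting messages is a crude proxy for counting tokens.

```python
def trim_history(messages: list[dict], max_messages: int = 6) -> list[dict]:
    """Keep all system messages and the last `max_messages` other messages."""
    system = [m for m in messages if m["role"] == "system"]
    rest = [m for m in messages if m["role"] != "system"]
    if max_messages <= 0:
        return system
    return system + rest[-max_messages:]

# Build a sample history: one system prompt followed by ten alternating turns.
history = [{"role": "system", "content": "You are a helpful assistant."}]
for i in range(10):
    role = "user" if i % 2 == 0 else "assistant"
    history.append({"role": role, "content": f"message {i}"})

trimmed = trim_history(history, max_messages=4)
print(len(trimmed))  # 5
```

Trimming on a message count is the easiest rule to reason about; a token-based budget would be more precise but needs a tokenizer.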

Approach	Latency	Cost/call	Best for
Standard Chat (gpt-4o)	~800ms	~$0.003	High-quality chatbot responses
Streaming Chat	Starts immediately, ~800ms total	~$0.003	Real-time user interaction
Async Chat	~800ms	~$0.003	Concurrent or async app integration
Smaller Model (gpt-4o-mini)	~400ms	~$0.001	Cost-sensitive or lightweight tasks
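
To turn the per-call figures in the table into a budget, scale them by token count and call volume. The sketch below uses this article's approximate rate (~$0.003 per 500 tokens for gpt-4o) as a default; `estimate_cost` is a hypothetical helper, and the rate is an assumption taken from the table, not current pricing.

```python
def estimate_cost(total_tokens: int, usd_per_500_tokens: float = 0.003) -> float:
    """Approximate cost in USD, scaling linearly from a per-500-token rate."""
    return total_tokens / 500 * usd_per_500_tokens

# The 20-token exchange from the API trace.
trace_cost = estimate_cost(20)

# 10,000 calls of roughly 500 tokens each.
monthly_cost = estimate_cost(500) * 10_000

print(f"{trace_cost:.5f}")
print(f"{monthly_cost:.2f}")
```

Passing usd_per_500_tokens=0.001 gives the corresponding estimate for the gpt-4o-mini row.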

✓

Quick tip

Always include a clear user message role and keep conversation history concise to optimize token usage and context relevance.

⚠

Common mistake

Beginners often forget to set OPENAI_API_KEY in the environment, or they copy pre-1.0 SDK calls such as openai.ChatCompletion.create(), which no longer work in openai 1.0 and later. Use client.chat.completions.create instead.
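
A small check at startup turns the missing-key mistake into a clear error instead of a bare KeyError. This is a minimal sketch: `get_api_key` is a hypothetical helper, and the `env` parameter exists only so the function can be exercised without touching the real environment.

```python
import os

def get_api_key(env=os.environ) -> str:
    """Return OPENAI_API_KEY from the given mapping, or fail with a readable message."""
    key = env.get("OPENAI_API_KEY", "").strip()
    if not key:
        raise RuntimeError(
            "OPENAI_API_KEY is not set; export it before creating the client."
        )
    return key

# Demonstration with stand-in mappings rather than the real environment.
print(get_api_key({"OPENAI_API_KEY": "sk-test"}))  # sk-test
try:
    get_api_key({})
except RuntimeError as err:
    print(err)
```

In the guide's code, the result would be passed as OpenAI(api_key=get_api_key()).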

Verified 2026-04 · gpt-4o, gpt-4o-mini
