How to use the OpenAI API with Flask in Python
Direct answer
Use the openai SDK v1+ with Flask by initializing the OpenAI client with your API key from os.environ, then calling client.chat.completions.create inside a Flask route to handle requests.
Setup
Install
pip install flask openai
Env vars
OPENAI_API_KEY
Imports
import os
from flask import Flask, request, jsonify
from openai import OpenAI
Examples
In: User sends POST request to /chat with JSON {"message": "Hello, AI!"}
Out: {"response": "Hello! How can I assist you today?"}
In: User sends POST request to /chat with JSON {"message": "Write a haiku about spring."}
Out: {"response": "Spring breeze softly blows,\nCherry blossoms dance in light,\nNature's breath renewed."}
In: User sends POST request to /chat with JSON {"message": ""}
Out: {"error": "No message provided"}
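The exchanges above can be reproduced with a small client script. This is a minimal sketch using Python's standard-library urllib; it assumes the Flask app shown later in this answer is running locally on port 5000.

```python
import json
import urllib.request

def build_payload(message: str) -> bytes:
    # Encode the chat message as the JSON body the /chat route expects
    return json.dumps({"message": message}).encode("utf-8")

def send_chat(message: str, url: str = "http://127.0.0.1:5000/chat") -> dict:
    # POST the message and decode the JSON reply from the Flask app
    req = urllib.request.Request(
        url,
        data=build_payload(message),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read().decode("utf-8"))

# Example: send_chat("Hello, AI!")  (requires the Flask server to be running)
```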
Integration steps
- Install Flask and OpenAI SDK using pip
- Set your OpenAI API key in the environment variable OPENAI_API_KEY
- Import Flask and OpenAI client in your Python script
- Initialize the OpenAI client with the API key from os.environ
- Create a Flask app and define a route to accept user input
- Inside the route, call client.chat.completions.create with the user message
- Return the AI-generated response as JSON to the client
Full code
import os
from flask import Flask, request, jsonify
from openai import OpenAI

app = Flask(__name__)
client = OpenAI(api_key=os.environ["OPENAI_API_KEY"])

@app.route('/chat', methods=['POST'])
def chat():
    data = request.get_json(force=True)
    user_message = data.get('message', '').strip()
    if not user_message:
        return jsonify({"error": "No message provided"}), 400
    response = client.chat.completions.create(
        model="gpt-4o",
        messages=[{"role": "user", "content": user_message}]
    )
    ai_text = response.choices[0].message.content
    return jsonify({"response": ai_text})

if __name__ == '__main__':
    app.run(debug=True, port=5000)

Output
* Serving Flask app 'app' (lazy loading)
* Environment: development
* Debug mode: on
* Running on http://127.0.0.1:5000/ (Press CTRL+C to quit)
# Example POST request to http://127.0.0.1:5000/chat with JSON {"message": "Hello, AI!"}
# Response:
{"response": "Hello! How can I assist you today?"}
API trace
Request
{"model": "gpt-4o", "messages": [{"role": "user", "content": "<user message>"}]}
Response
{"choices": [{"message": {"content": "<AI response>"}}], "usage": {"total_tokens": 42}}
Extract
response.choices[0].message.content
Variants
Streaming response with Flask ›
Use streaming to send partial AI responses in real time, improving perceived latency on long outputs.
import os
from flask import Flask, request, Response
from openai import OpenAI

app = Flask(__name__)
client = OpenAI(api_key=os.environ["OPENAI_API_KEY"])

@app.route('/chat_stream', methods=['POST'])
def chat_stream():
    data = request.get_json(force=True)
    user_message = data.get('message', '').strip()
    if not user_message:
        return Response("No message provided", status=400)
    def generate():
        stream = client.chat.completions.create(
            model="gpt-4o",
            messages=[{"role": "user", "content": user_message}],
            stream=True
        )
        for chunk in stream:
            # In SDK v1, delta is an object, not a dict; content may be None
            yield chunk.choices[0].delta.content or ''
    return Response(generate(), mimetype='text/plain')

if __name__ == '__main__':
    app.run(debug=True, port=5000)

Async Flask with OpenAI ›
Use async Flask routes (requires installing flask[async]) to handle multiple concurrent AI requests efficiently.
import os
from flask import Flask, request, jsonify
from openai import AsyncOpenAI

app = Flask(__name__)
# AsyncOpenAI exposes awaitable methods; acreate() does not exist in SDK v1
client = AsyncOpenAI(api_key=os.environ["OPENAI_API_KEY"])

@app.route('/chat_async', methods=['POST'])
async def chat_async():
    # request.get_json() is synchronous in Flask and must not be awaited
    data = request.get_json(force=True)
    user_message = data.get('message', '').strip()
    if not user_message:
        return jsonify({"error": "No message provided"}), 400
    response = await client.chat.completions.create(
        model="gpt-4o",
        messages=[{"role": "user", "content": user_message}]
    )
    ai_text = response.choices[0].message.content
    return jsonify({"response": ai_text})

if __name__ == '__main__':
    app.run(debug=True, port=5000)

Use Gemini-1.5-pro model instead of GPT-4o ›
Use gemini-1.5-pro for a balance of speed and cost; the OpenAI SDK can target it through Google's OpenAI-compatible endpoint with a Gemini API key.
import os
from flask import Flask, request, jsonify
from openai import OpenAI

app = Flask(__name__)
# Gemini models are not served by the OpenAI API: point the client at
# Google's OpenAI-compatible endpoint and use a Gemini API key instead
client = OpenAI(
    api_key=os.environ["GEMINI_API_KEY"],
    base_url="https://generativelanguage.googleapis.com/v1beta/openai/"
)

@app.route('/chat', methods=['POST'])
def chat():
    data = request.get_json(force=True)
    user_message = data.get('message', '').strip()
    if not user_message:
        return jsonify({"error": "No message provided"}), 400
    response = client.chat.completions.create(
        model="gemini-1.5-pro",
        messages=[{"role": "user", "content": user_message}]
    )
    ai_text = response.choices[0].message.content
    return jsonify({"response": ai_text})

if __name__ == '__main__':
    app.run(debug=True, port=5000)

Performance
Latency: ~800ms for gpt-4o non-streaming calls
Cost: ~$0.002 per 500 tokens exchanged with gpt-4o
Rate limits: Tier 1: 500 requests per minute / 30,000 tokens per minute
- Keep user messages concise to reduce token usage
- Use shorter system prompts or none if possible
- Cache frequent responses to avoid repeated calls
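The caching tip above can be sketched as a small in-memory layer. This is a minimal sketch assuming exact-match messages; call_model is a hypothetical helper standing in for the real client.chat.completions.create call.

```python
from functools import lru_cache

def call_model(message: str) -> str:
    # Hypothetical placeholder for the real OpenAI API call
    return f"reply to: {message}"

@lru_cache(maxsize=1024)
def cached_chat(message: str) -> str:
    # Identical messages hit the cache instead of triggering a paid API call
    return call_model(message)

cached_chat("Hello, AI!")   # first call: goes to the model
cached_chat("Hello, AI!")   # repeat: served from the cache
print(cached_chat.cache_info().hits)  # 1
```

lru_cache only works for exact repeats; for near-duplicate prompts or multi-process Flask deployments, an external store such as Redis keyed on a hash of the message is a more robust choice.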
| Approach | Latency | Cost/call | Best for |
|---|---|---|---|
| Standard sync call | ~800ms | ~$0.002 | Simple Flask apps with moderate traffic |
| Streaming response | Starts immediately, total varies | Similar | Apps needing real-time partial output |
| Async calls | ~800ms (concurrent) | ~$0.002 | High concurrency Flask apps |
Quick tip
Always load your OpenAI API key securely from environment variables and never hardcode it in your Flask app.
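The tip above can be made concrete with a small helper that fails fast when the key is missing; a minimal sketch, nothing here is OpenAI-specific.

```python
import os

def require_env(name: str) -> str:
    # Read a required environment variable, failing fast with a clear message
    value = os.environ.get(name)
    if not value:
        raise RuntimeError(f"{name} is not set; export it before starting the app")
    return value

# Usage: client = OpenAI(api_key=require_env("OPENAI_API_KEY"))
```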
Common mistake
Beginners often forget to set the API key in the environment, or call openai.ChatCompletion.create(), which was removed in SDK v1 in favor of client.chat.completions.create().