How-to · Intermediate · 3 min read

How to count tokens in streamed response

Quick answer
To count tokens in a streamed response, accumulate the text from each chunk's delta.content and re-encode the accumulated text with a tokenizer such as tiktoken. This gives you a live, client-side count of completion tokens while streaming with the OpenAI Python SDK.

PREREQUISITES

  • Python 3.8+
  • OpenAI API key (free tier works)
  • pip install "openai>=1.0" tiktoken (quote the version specifier so the shell does not treat >= as a redirection)

Setup

Install the required packages and set your OpenAI API key as an environment variable.

  • Install packages: pip install openai tiktoken
  • Set environment variable: export OPENAI_API_KEY='your_api_key' (Linux/macOS) or setx OPENAI_API_KEY "your_api_key" (Windows)
bash
pip install openai tiktoken
output
Collecting openai
Collecting tiktoken
Successfully installed openai-1.x.x tiktoken-x.x.x

Step by step

This example streams a chat completion from gpt-4o, accumulates the streamed text, and counts tokens in real time with tiktoken, printing the partial text and the running token count after each chunk. Note that this is a client-side estimate of completion tokens only; prompt tokens are not included.

python
import os
from openai import OpenAI
import tiktoken

client = OpenAI(api_key=os.environ["OPENAI_API_KEY"])

def count_tokens(text: str, encoding_name: str = "o200k_base") -> int:
    """Return the number of tokens in text under the given tiktoken encoding."""
    # gpt-4o uses o200k_base; gpt-4 and gpt-3.5-turbo use cl100k_base.
    encoding = tiktoken.get_encoding(encoding_name)
    return len(encoding.encode(text))

messages = [{"role": "user", "content": "Explain the benefits of AI streaming."}]

stream = client.chat.completions.create(
    model="gpt-4o",
    messages=messages,
    stream=True
)

accumulated_text = ""
for chunk in stream:
    delta = chunk.choices[0].delta
    # delta is a typed object, not a dict: content is None on metadata-only chunks
    content = delta.content or ""
    accumulated_text += content
    token_count = count_tokens(accumulated_text)
    print(f"Partial response: {content}")
    print(f"Tokens so far: {token_count}\n")
output
Partial response: AI streaming offers real-time
Tokens so far: 5

Partial response:  benefits such as lower latency,
Tokens so far: 11

Partial response:  improved user experience, and
Tokens so far: 17

Partial response:  efficient resource utilization.
Tokens so far: 23

Common variations

You can adapt token counting to other models by changing the encoding passed to tiktoken.get_encoding(): gpt-4o uses o200k_base, gpt-4 and gpt-3.5-turbo use cl100k_base, and older GPT-3 models use r50k_base. If you are unsure, tiktoken.encoding_for_model() resolves the correct encoding from a model name.

For asynchronous streaming, use the AsyncOpenAI client and iterate the chunks with async for.

python
import os
import asyncio
from openai import AsyncOpenAI
import tiktoken

# The async client (AsyncOpenAI) is required to use `async for` over the stream.
client = AsyncOpenAI(api_key=os.environ["OPENAI_API_KEY"])

async def count_tokens_async():
    encoding = tiktoken.get_encoding("o200k_base")  # gpt-4o's encoding
    messages = [{"role": "user", "content": "Explain token counting asynchronously."}]
    stream = await client.chat.completions.create(
        model="gpt-4o",
        messages=messages,
        stream=True
    )
    accumulated_text = ""
    async for chunk in stream:
        content = chunk.choices[0].delta.content or ""
        accumulated_text += content
        token_count = len(encoding.encode(accumulated_text))
        print(f"Partial response: {content}")
        print(f"Tokens so far: {token_count}\n")

asyncio.run(count_tokens_async())
output
Partial response: Token counting asynchronously
Tokens so far: 4

Partial response:  allows real-time monitoring
Tokens so far: 9

Partial response:  of usage during streaming.
Tokens so far: 14
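Recent SDK versions can also report exact server-side counts: pass stream_options={"include_usage": True} to create(), and the final chunk carries a usage object alongside an empty choices list. The sketch below mocks the chunk shapes with SimpleNamespace rather than calling the API; in real use you would pass the SDK's stream to the same helper.

```python
from types import SimpleNamespace

def consume_stream(stream):
    """Accumulate streamed text and capture the server-reported usage,
    which arrives on a final, choices-less chunk when the request sets
    stream_options={"include_usage": True}."""
    text, usage = "", None
    for chunk in stream:
        if getattr(chunk, "usage", None) is not None:
            usage = chunk.usage  # exact server-side token counts
        if chunk.choices and chunk.choices[0].delta.content:
            text += chunk.choices[0].delta.content
    return text, usage

# Mocked chunks standing in for the SDK's stream objects (shapes assumed):
fake_stream = [
    SimpleNamespace(choices=[SimpleNamespace(delta=SimpleNamespace(content="Hi"))], usage=None),
    SimpleNamespace(choices=[SimpleNamespace(delta=SimpleNamespace(content=" there"))], usage=None),
    SimpleNamespace(choices=[], usage=SimpleNamespace(prompt_tokens=9, completion_tokens=2, total_tokens=11)),
]

text, usage = consume_stream(fake_stream)
print(text, usage.total_tokens)  # Hi there 11
```

The server-reported count is authoritative; tiktoken remains useful when you need a running estimate before the stream finishes.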

Troubleshooting

  • If token counts seem off: Make sure the tiktoken encoding matches your model (o200k_base for gpt-4o; cl100k_base for gpt-4 and gpt-3.5-turbo), or resolve it with tiktoken.encoding_for_model().
  • If streaming hangs or errors: Check your API key and network connection; streaming requires stable connectivity.
  • If a chunk has no content: delta.content can be None on metadata-only chunks; treat it as an empty string (or skip the chunk) when accumulating text.
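The last point reduces to a simple guard. Here the chunks are mocked with SimpleNamespace to stand in for the SDK's objects; the first one mimics a metadata-only chunk whose content is None:

```python
from types import SimpleNamespace

# Mocked chunks: the first carries no text, like the initial role announcement.
chunks = [
    SimpleNamespace(choices=[SimpleNamespace(delta=SimpleNamespace(content=None))]),
    SimpleNamespace(choices=[SimpleNamespace(delta=SimpleNamespace(content="Hello"))]),
    SimpleNamespace(choices=[SimpleNamespace(delta=SimpleNamespace(content=", world"))]),
]

accumulated = ""
for chunk in chunks:
    content = chunk.choices[0].delta.content
    if content is None:  # skip metadata-only chunks
        continue
    accumulated += content

print(accumulated)  # Hello, world
```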

Key Takeaways

  • Use tiktoken to count tokens incrementally on streamed content.
  • Accumulate delta.content from each chunk to track full response tokens.
  • Match the tokenizer encoding to your model for accurate token counts.
  • Streaming requires handling chunks without content gracefully.
  • Async streaming token counting uses async for with the AsyncOpenAI client.
Verified 2026-04 · gpt-4o