How-to · Intermediate · 3 min read

How to count tokens in streamed response

Quick answer
To count tokens in a streamed response, accumulate the text from each chunk's delta.content and re-encode the accumulated text with a tokenizer such as tiktoken. This gives you a live, client-side count of completion tokens while streaming with the OpenAI Python SDK.

PREREQUISITES

  • Python 3.8+
  • OpenAI API key (free tier works)
  • pip install "openai>=1.0" tiktoken (quote the version specifier so the shell does not treat >= as a redirection)

Setup

Install the required packages and set your OpenAI API key as an environment variable.

  • Install packages: pip install openai tiktoken
  • Set environment variable: export OPENAI_API_KEY='your_api_key' (Linux/macOS) or setx OPENAI_API_KEY "your_api_key" (Windows)
bash
pip install openai tiktoken
output
Collecting openai
Collecting tiktoken
Successfully installed openai-1.x.x tiktoken-x.x.x

Step by step

This example streams a chat completion from gpt-4o, accumulates the streamed text, and counts tokens in real time with tiktoken, printing the partial text and the running token count after each chunk. Note that this is a client-side estimate of completion tokens only; prompt tokens are not included.

python
import os
from openai import OpenAI
import tiktoken

client = OpenAI(api_key=os.environ["OPENAI_API_KEY"])

def count_tokens(text: str, encoding_name: str = "o200k_base") -> int:
    """Return the number of tokens in text under the given tiktoken encoding."""
    # gpt-4o uses o200k_base; gpt-4 and gpt-3.5-turbo use cl100k_base.
    encoding = tiktoken.get_encoding(encoding_name)
    return len(encoding.encode(text))

messages = [{"role": "user", "content": "Explain the benefits of AI streaming."}]

stream = client.chat.completions.create(
    model="gpt-4o",
    messages=messages,
    stream=True
)

accumulated_text = ""
for chunk in stream:
    delta = chunk.choices[0].delta
    # delta is a typed object, not a dict: content is None on metadata-only chunks
    content = delta.content or ""
    accumulated_text += content
    token_count = count_tokens(accumulated_text)
    print(f"Partial response: {content}")
    print(f"Tokens so far: {token_count}\n")
output
Partial response: AI streaming offers real-time
Tokens so far: 5

Partial response:  benefits such as lower latency,
Tokens so far: 11

Partial response:  improved user experience, and
Tokens so far: 17

Partial response:  efficient resource utilization.
Tokens so far: 23

Common variations

You can adapt token counting to other models by changing the encoding passed to tiktoken.get_encoding(): gpt-4o uses o200k_base, gpt-4 and gpt-3.5-turbo use cl100k_base, and older GPT-3 models use r50k_base. If you are unsure, tiktoken.encoding_for_model() resolves the correct encoding from a model name.

For asynchronous streaming, use the AsyncOpenAI client and iterate the chunks with async for.

python
import os
import asyncio
from openai import AsyncOpenAI
import tiktoken

# The async client (AsyncOpenAI) is required to use `async for` over the stream.
client = AsyncOpenAI(api_key=os.environ["OPENAI_API_KEY"])

async def count_tokens_async():
    encoding = tiktoken.get_encoding("o200k_base")  # gpt-4o's encoding
    messages = [{"role": "user", "content": "Explain token counting asynchronously."}]
    stream = await client.chat.completions.create(
        model="gpt-4o",
        messages=messages,
        stream=True
    )
    accumulated_text = ""
    async for chunk in stream:
        content = chunk.choices[0].delta.content or ""
        accumulated_text += content
        token_count = len(encoding.encode(accumulated_text))
        print(f"Partial response: {content}")
        print(f"Tokens so far: {token_count}\n")

asyncio.run(count_tokens_async())
output
Partial response: Token counting asynchronously
Tokens so far: 4

Partial response:  allows real-time monitoring
Tokens so far: 9

Partial response:  of usage during streaming.
Tokens so far: 14
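Recent SDK versions can also report exact server-side counts: pass stream_options={"include_usage": True} to create(), and the final chunk carries a usage object alongside an empty choices list. The sketch below mocks the chunk shapes with SimpleNamespace rather than calling the API; in real use you would pass the SDK's stream to the same helper.

```python
from types import SimpleNamespace

def consume_stream(stream):
    """Accumulate streamed text and capture the server-reported usage,
    which arrives on a final, choices-less chunk when the request sets
    stream_options={"include_usage": True}."""
    text, usage = "", None
    for chunk in stream:
        if getattr(chunk, "usage", None) is not None:
            usage = chunk.usage  # exact server-side token counts
        if chunk.choices and chunk.choices[0].delta.content:
            text += chunk.choices[0].delta.content
    return text, usage

# Mocked chunks standing in for the SDK's stream objects (shapes assumed):
fake_stream = [
    SimpleNamespace(choices=[SimpleNamespace(delta=SimpleNamespace(content="Hi"))], usage=None),
    SimpleNamespace(choices=[SimpleNamespace(delta=SimpleNamespace(content=" there"))], usage=None),
    SimpleNamespace(choices=[], usage=SimpleNamespace(prompt_tokens=9, completion_tokens=2, total_tokens=11)),
]

text, usage = consume_stream(fake_stream)
print(text, usage.total_tokens)  # Hi there 11
```

The server-reported count is authoritative; tiktoken remains useful when you need a running estimate before the stream finishes.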

Troubleshooting

  • If token counts seem off: Make sure the tiktoken encoding matches your model (o200k_base for gpt-4o; cl100k_base for gpt-4 and gpt-3.5-turbo), or resolve it with tiktoken.encoding_for_model().
  • If streaming hangs or errors: Check your API key and network connection; streaming requires stable connectivity.
  • If a chunk has no content: delta.content can be None on metadata-only chunks; treat it as an empty string (or skip the chunk) when accumulating text.
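The last point reduces to a simple guard. Here the chunks are mocked with SimpleNamespace to stand in for the SDK's objects; the first one mimics a metadata-only chunk whose content is None:

```python
from types import SimpleNamespace

# Mocked chunks: the first carries no text, like the initial role announcement.
chunks = [
    SimpleNamespace(choices=[SimpleNamespace(delta=SimpleNamespace(content=None))]),
    SimpleNamespace(choices=[SimpleNamespace(delta=SimpleNamespace(content="Hello"))]),
    SimpleNamespace(choices=[SimpleNamespace(delta=SimpleNamespace(content=", world"))]),
]

accumulated = ""
for chunk in chunks:
    content = chunk.choices[0].delta.content
    if content is None:  # skip metadata-only chunks
        continue
    accumulated += content

print(accumulated)  # Hello, world
```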

Key Takeaways

  • Use tiktoken to count tokens incrementally on streamed content.
  • Accumulate delta.content from each chunk to track full response tokens.
  • Match the tokenizer encoding to your model for accurate token counts.
  • Streaming requires handling chunks without content gracefully.
  • Async streaming token counting uses async for with the AsyncOpenAI client.
Verified 2026-04 · gpt-4o