If you work with AI APIs like OpenAI, Anthropic, or Google, you've seen the word "tokens" everywhere. Tokens directly affect how much you pay, how fast your responses come, and whether your prompt fits within the model's context window. This guide explains everything a developer needs to know.
A token is a chunk of text that an AI model processes as a single unit. It's not exactly a word — it's closer to a syllable or a common character sequence. The tokenizer breaks your text into these chunks before the model sees it.
For example, with OpenAI's cl100k_base tokenizer:
- "Hello world" → 2 tokens (`Hello`, ` world`)
- "tokenization" → 1 token (common word, single token)
- "supercalifragilistic" → 5 tokens (rare word, split into pieces)
- `{"key": "value"}` → 7 tokens (JSON syntax uses many)

Why do tokens matter? Three reasons: they determine what you pay, how fast responses stream back, and whether your prompt fits the model's context window. Here's how pricing compares across popular models:
| Model | Input (per 1M tokens) | Output (per 1M tokens) | Context |
|---|---|---|---|
| GPT-4o | $5.00 | $15.00 | 128K |
| GPT-4o mini | $0.15 | $0.60 | 128K |
| Claude 3.5 Sonnet | $3.00 | $15.00 | 200K |
| Gemini 1.5 Flash | $0.075 | $0.30 | 1M |
| Llama 3.1 70B | $0.88 | $0.88 | 128K |
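To make the pricing concrete, here's a minimal sketch of a cost estimator built from the table above. The dictionary keys are illustrative names, not official API model identifiers, and the prices are the USD-per-1M-token figures from the table:

```python
# Prices in USD per 1M tokens: (input, output), taken from the table above.
# Keys are illustrative labels, not official API model identifiers.
PRICES = {
    "gpt-4o": (5.00, 15.00),
    "gpt-4o-mini": (0.15, 0.60),
    "claude-3.5-sonnet": (3.00, 15.00),
    "gemini-1.5-flash": (0.075, 0.30),
    "llama-3.1-70b": (0.88, 0.88),
}

def estimate_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Estimate the USD cost of a single request."""
    in_price, out_price = PRICES[model]
    return (input_tokens * in_price + output_tokens * out_price) / 1_000_000

# A 2,000-token prompt with a 500-token reply on GPT-4o:
# 2000 * 5/1e6 + 500 * 15/1e6 = 0.01 + 0.0075 = $0.0175
print(estimate_cost("gpt-4o", 2000, 500))
```

Note that output tokens usually cost several times more than input tokens, which is why verbose completions dominate the bill for long-form generation.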
The most accurate way to count tokens is to use your provider's own tokenizer library. For OpenAI models, that's tiktoken:

```python
import tiktoken

enc = tiktoken.get_encoding("cl100k_base")
tokens = enc.encode("Your prompt here")
print(len(tokens))  # exact token count
```
Or use WeighMyPrompt — it runs the same tiktoken tokenizer directly in your browser via WebAssembly. No data is sent anywhere.
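Once you have a count, a common follow-up is checking whether the prompt (plus room reserved for the reply) fits the model's context window. A minimal sketch, using the context limits from the table above as hypothetical constants:

```python
# Context window sizes in tokens, from the comparison table above.
# Keys are illustrative labels, not official API model identifiers.
CONTEXT_LIMITS = {
    "gpt-4o": 128_000,
    "claude-3.5-sonnet": 200_000,
    "gemini-1.5-flash": 1_000_000,
}

def fits_context(prompt_tokens: int, max_output_tokens: int, model: str) -> bool:
    """True if the prompt plus the reserved output budget fits the window."""
    return prompt_tokens + max_output_tokens <= CONTEXT_LIMITS[model]

print(fits_context(100_000, 20_000, "gpt-4o"))   # 120K <= 128K
print(fits_context(120_000, 20_000, "gpt-4o"))   # 140K > 128K
```

Reserving an explicit output budget matters because the context window is shared: a prompt that technically fits can still leave the model no room to answer.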
Each provider uses a different tokenizer:
- OpenAI: cl100k_base (GPT-3.5, GPT-4 Turbo) or o200k_base (GPT-4o, o1)

This means the same text has a different token count on each provider. WeighMyPrompt uses exact counting for OpenAI and smart approximation for others.
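When an exact tokenizer isn't available, a character-based heuristic gets you in the right ballpark. The sketch below uses the commonly cited rule of thumb that English text averages roughly 4 characters per token; this is an assumption for illustration, not any provider's actual approximation method, and it drifts badly for code, non-English text, and JSON:

```python
def approx_tokens(text: str) -> int:
    """Rough token estimate: ~4 characters per token for English prose.

    This heuristic is an assumption, not a provider's real tokenizer;
    expect it to undercount for code, JSON, and non-English languages.
    """
    return max(1, round(len(text) / 4))

print(approx_tokens("Hello world"))  # 11 chars / 4 ≈ 3 tokens
```

For billing or context-limit decisions, always prefer the exact tokenizer; use heuristics only for quick, order-of-magnitude estimates.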
Try WeighMyPrompt — paste your prompt, see exact tokens, compare costs across 16 models, and optimize with one click. 100% free, 100% private.