If you work with AI APIs like OpenAI, Anthropic, or Google, you've seen the word "tokens" everywhere. Tokens directly affect how much you pay, how fast your responses come, and whether your prompt fits within the model's context window. This guide explains everything a developer needs to know.
A token is a chunk of text that an AI model processes as a single unit. It's not exactly a word — it's closer to a syllable or a common character sequence. The tokenizer breaks your text into these chunks before the model sees it.
For example, with OpenAI's cl100k_base tokenizer:
- "Hello world" → 2 tokens (`Hello`, ` world`)
- "tokenization" → 1 token (common word, single token)
- "supercalifragilistic" → 5 tokens (rare word, split into pieces)
- `{"key": "value"}` → 7 tokens (JSON syntax uses many)

Why do tokens matter? Three reasons: they determine what you pay, how fast responses stream back, and whether your prompt fits the model's context window. Here's how pricing compares across popular models:
| Model | Input (per 1M tokens) | Output (per 1M tokens) | Context |
|---|---|---|---|
| GPT-4o | $5.00 | $15.00 | 128K |
| GPT-4o mini | $0.15 | $0.60 | 128K |
| Claude 3.5 Sonnet | $3.00 | $15.00 | 200K |
| Gemini 1.5 Flash | $0.075 | $0.30 | 1M |
| Llama 3.1 70B | $0.88 | $0.88 | 128K |
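To make the pricing concrete, here's a minimal sketch of a cost estimator built from the table above. The dictionary keys are illustrative names, not official API model identifiers, and the prices are the USD-per-1M-token figures from the table:

```python
# Prices in USD per 1M tokens: (input, output), taken from the table above.
# Keys are illustrative labels, not official API model identifiers.
PRICES = {
    "gpt-4o": (5.00, 15.00),
    "gpt-4o-mini": (0.15, 0.60),
    "claude-3.5-sonnet": (3.00, 15.00),
    "gemini-1.5-flash": (0.075, 0.30),
    "llama-3.1-70b": (0.88, 0.88),
}

def estimate_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Estimate the USD cost of a single request."""
    in_price, out_price = PRICES[model]
    return (input_tokens * in_price + output_tokens * out_price) / 1_000_000

# A 2,000-token prompt with a 500-token reply on GPT-4o:
# 2000 * 5/1e6 + 500 * 15/1e6 = 0.01 + 0.0075 = $0.0175
print(estimate_cost("gpt-4o", 2000, 500))
```

Note that output tokens usually cost several times more than input tokens, which is why verbose completions dominate the bill for long-form generation.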
The most accurate way to count tokens is to use your provider's own tokenizer library. For OpenAI models, that's tiktoken:

```python
import tiktoken

enc = tiktoken.get_encoding("cl100k_base")
tokens = enc.encode("Your prompt here")
print(len(tokens))  # exact token count
```
Or use WeighMyPrompt — it runs the same tiktoken tokenizer directly in your browser via WebAssembly. No data is sent anywhere.
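Once you have a count, a common follow-up is checking whether the prompt (plus room reserved for the reply) fits the model's context window. A minimal sketch, using the context limits from the table above as hypothetical constants:

```python
# Context window sizes in tokens, from the comparison table above.
# Keys are illustrative labels, not official API model identifiers.
CONTEXT_LIMITS = {
    "gpt-4o": 128_000,
    "claude-3.5-sonnet": 200_000,
    "gemini-1.5-flash": 1_000_000,
}

def fits_context(prompt_tokens: int, max_output_tokens: int, model: str) -> bool:
    """True if the prompt plus the reserved output budget fits the window."""
    return prompt_tokens + max_output_tokens <= CONTEXT_LIMITS[model]

print(fits_context(100_000, 20_000, "gpt-4o"))   # 120K <= 128K
print(fits_context(120_000, 20_000, "gpt-4o"))   # 140K > 128K
```

Reserving an explicit output budget matters because the context window is shared: a prompt that technically fits can still leave the model no room to answer.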
Each provider uses a different tokenizer:
- OpenAI: cl100k_base (GPT-3.5, GPT-4 Turbo) or o200k_base (GPT-4o, o1)

This means the same text has a different token count on each provider. WeighMyPrompt uses exact counting for OpenAI and smart approximation for others.
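When an exact tokenizer isn't available, a character-based heuristic gets you in the right ballpark. The sketch below uses the commonly cited rule of thumb that English text averages roughly 4 characters per token; this is an assumption for illustration, not any provider's actual approximation method, and it drifts badly for code, non-English text, and JSON:

```python
def approx_tokens(text: str) -> int:
    """Rough token estimate: ~4 characters per token for English prose.

    This heuristic is an assumption, not a provider's real tokenizer;
    expect it to undercount for code, JSON, and non-English languages.
    """
    return max(1, round(len(text) / 4))

print(approx_tokens("Hello world"))  # 11 chars / 4 ≈ 3 tokens
```

For billing or context-limit decisions, always prefer the exact tokenizer; use heuristics only for quick, order-of-magnitude estimates.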
Try WeighMyPrompt — paste your prompt, see exact tokens, compare costs across 16 models, and optimize with one click. 100% free, 100% private.