Tokens: Why 'Hello World' Doesn't Cost the Same Everywhere
If you were to send the same prompt to three different LLM providers, you'd expect to pay roughly the same amount, right? After all, it's the same text, same length, same intent.
Try it. Send this to three different APIs:
hello world
Check your usage. You'll see three different numbers:
- Provider A: 3 input tokens
- Provider B: 5 input tokens
- Provider C: 11 input tokens
Same prompt. Different bill. Over thousands of API calls, "3 vs 11 tokens" becomes a real line item on your cloud costs.
This post is for developers who call LLM APIs but don't fully understand why prompts cost what they cost. We'll build from first principles using simple examples, not heavy math.
What Models Actually See
At a low level, a language model doesn't understand text. It only understands sequences of integers.
When you send "hello world" to an API, the provider doesn't feed those characters directly to the model. Instead:
- Your text gets split into chunks (tokens)
- Each chunk maps to a unique integer in the model's vocabulary
- That integer sequence is what the model actually processes
For example, "hello world" might become:
"hello world" → ["hello", " world"] → [1234, 5678]
The model never sees the string. It only sees [1234, 5678].
Those integers are the currency the model works with—and the same currency you're billed for.
The Pipeline: Text to Numbers and Back
Here's the full flow:
Your prompt → Tokenizer → Integer IDs → Model → Integer IDs → Detokenizer → Response
In TypeScript terms, you can think of it like:
function encode(text: string): number[] {
// Split text into tokens, map to IDs
return [1234, 5678];
}
function decode(tokens: number[]): string {
// Map IDs back to text
return "hello world";
}
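A quick sanity check on that mental model: encoding and decoding are inverses, so decoding the IDs you were billed for reproduces the text you sent. Real tokenizers guarantee this for any input (up to minor normalization); the toy stubs above only handle one string:

const ids = encode("hello world"); // [1234, 5678]: what the model sees, and what you pay for
console.log(decode(ids));          // "hello world"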
The tokenizer is the part that decides how to split your text. Different providers use different tokenizers, which means the same text can split into different tokens.
That's the first reason why token counts vary.
Character-Level Tokenizers (and Why They're Expensive)
The simplest tokenizer splits text into individual characters:
"cat" → ["c", "a", "t"]
One character equals one token.
For a sentence like "cat sat on the mat" (including spaces):
c, a, t, ␣, s, a, t, ␣, o, n, ␣, t, h, e, ␣, m, a, t
That's 18 tokens for a very short sentence.
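To see how little machinery this takes, here's a minimal sketch of a character-level tokenizer with a made-up 27-symbol vocabulary (lowercase letters plus space); a real one would also need digits, punctuation, and everything else:

// Tiny made-up vocabulary: one ID per character.
const charVocab = new Map<string, number>(
  [..."abcdefghijklmnopqrstuvwxyz "].map((ch, i) => [ch, i] as [string, number])
);

function charEncode(text: string): number[] {
  // Every character becomes exactly one token ID; -1 stands in for "unknown".
  return [...text.toLowerCase()].map((ch) => charVocab.get(ch) ?? -1);
}

console.log(charEncode("cat sat on the mat").length); // 18: one token per character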
Character tokenizers are:
- Easy to implement
- Very flexible (they can handle any text)
- Expensive: they produce a lot of tokens
More tokens means:
- Higher API costs (you pay per token)
- Shorter context windows (models have token limits)
- Slower responses (more tokens to process)
No modern LLM uses pure character-level tokenization for this reason.
Subword Tokenizers: The Modern Approach
Modern LLMs use subword tokenizers like BPE (Byte Pair Encoding) or SentencePiece.
The idea: instead of splitting by character or word, learn common chunks from training data. Frequent sequences like "ing", "tion", "under" become single tokens.
For example:
- "understanding" → ["under", "stand", "ing"]
- "tokenization" → ["token", "ization"]
- "the" → ["the"] (common word, single token)
This leads to two important implications:
- Fewer tokens per sentence (better for users)
- Larger vocabulary (the model needs more embeddings)
There's a tradeoff here:
- Small vocabulary → more tokens per sentence → cheaper model, more expensive prompts
- Large vocabulary → fewer tokens per sentence → larger model, cheaper prompts
Different providers make different choices, which affects how many tokens your prompt becomes.
Rare Words, Code, and Non-English Text
Tokenizers are data-driven. They learn frequent chunks from their training data, which leads to some non-obvious behavior:
Common English words tend to be single tokens:
"the" → ["the"]
"function" → ["function"]
Rare or made-up words get split into multiple pieces:
"frabjous" → ["fr", "abj", "ous"]
Code with uncommon identifiers can split unexpectedly:
const functionName = "getUserProfileById";
// might tokenize as: ["get", "User", "Profile", "By", "Id"]
Languages underrepresented in the tokenizer's training data often get broken into more tokens than English. A Chinese prompt might use significantly more tokens than an English prompt with the same meaning.
Two strings with the same character length can have very different token counts. A prompt full of rare terms or low-resource languages will be more "expensive" in tokens than plain English.
Why Providers Disagree on Token Counts
Back to our opening mystery: why does "hello world" produce different token counts across providers?
Several reasons:
1. Different vocabularies
Each provider trains their own tokenizer on different data. One might have "hello" as a single token; another might split it as ["hel", "lo"].
2. Different tokenization algorithms
Even similar algorithms differ in how they handle whitespace, punctuation, and normalization.
3. Hidden prompt wrappers
You send "hello world", but the provider might silently add system prompts, role markers, and separators. What you see isn't what gets tokenized (the sketch after this list shows the idea).
4. Billing rules
Providers charge different rates for input vs output tokens. Some round up; others count special tokens differently.
The result: text + tokenizer + wrappers + billing = different counts for the same visible string
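To make the wrapper point concrete, here's a purely illustrative sketch. The role markers and system prompt below are invented; every provider uses its own (usually hidden) chat template:

function wrapForChat(userText: string): string {
  // Hypothetical format: real templates vary by provider and model.
  const systemPrompt = "You are a helpful assistant.";
  return ["<|system|>", systemPrompt, "<|user|>", userText, "<|assistant|>"].join("\n");
}

// The provider tokenizes something like wrapForChat("hello world"),
// not just the 11 characters you typed, so the billed input count is higher.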
What You Can Do About It
Now that you understand what's happening, here's what you can actually do.
Use Your Provider's Tokenizer
Most providers offer tokenizer tools (CLI, web UI, or API). Use them to inspect how your prompts split.
Treat tokens per request as a first-class metric, not an afterthought.
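For example, one way to inspect counts locally is the js-tiktoken package (one option among several; your provider may ship its own tool), shown here with the cl100k_base encoding. The exact numbers it prints only apply to that encoding, but it makes the rare-word, code, and non-English effects from earlier easy to see:

import { getEncoding } from "js-tiktoken";

const enc = getEncoding("cl100k_base");

const samples = ["hello world", "frabjous", "getUserProfileById", "你好，世界"];
for (const text of samples) {
  const ids = enc.encode(text);
  console.log(`${JSON.stringify(text)} -> ${ids.length} tokens`);
}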
Estimate Costs in Your Code
For English, a rough heuristic:
- 1 token ≈ 3–4 characters
- 1 token ≈ ¾ of a word
You can implement a quick estimator:
export function roughTokenEstimate(text: string): number {
return Math.ceil(text.length / 4);
}
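To turn that estimate into a dollar figure, fold in your provider's rate (the price below is a placeholder, not any provider's real pricing):

// Placeholder rate; look up the real per-1K-token input price for your model.
const INPUT_PRICE_PER_1K_TOKENS_USD = 0.0005;

export function roughCostEstimateUSD(text: string): number {
  // (estimated tokens ÷ 1,000) × price per 1K tokens
  return (roughTokenEstimate(text) / 1000) * INPUT_PRICE_PER_1K_TOKENS_USD;
}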
Use the real tokenizer for billing-critical paths. Use the rough estimate for fast UI feedback.
Design Prompts with Tokens in Mind
- De-duplicate boilerplate system prompts across requests
- Prefer concise formatting over decorative formatting
- Keep few-shot examples short when possible
- Avoid pasting large code blocks if a short description would work
Monitor Token Usage in Production
Treat tokens like any other resource:
- Log input tokens, output tokens, and total cost per request
- Add alerts when usage spikes or average tokens per request drifts upward
- Correlate tokens with latency and error rate
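As a sketch, assuming an OpenAI-style usage object in the API response (field names vary by provider) and placeholder prices:

interface TokenUsage {
  prompt_tokens: number;
  completion_tokens: number;
  total_tokens: number;
}

// Placeholder rates; substitute your provider's real per-1K-token prices.
const INPUT_PRICE_PER_1K_USD = 0.0005;
const OUTPUT_PRICE_PER_1K_USD = 0.0015;

function logTokenUsage(requestId: string, usage: TokenUsage, latencyMs: number): void {
  const costUSD =
    (usage.prompt_tokens / 1000) * INPUT_PRICE_PER_1K_USD +
    (usage.completion_tokens / 1000) * OUTPUT_PRICE_PER_1K_USD;
  // One structured log line per request makes it easy to aggregate, alert on
  // spikes, and correlate token counts with latency later.
  console.log(JSON.stringify({ requestId, ...usage, latencyMs, costUSD }));
}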
Checklist: Before Your Next API Call
- ✓ Check your provider's tokenizer docs
- ✓ Measure token counts on real prompts
- ✓ Estimate cost: (total tokens ÷ 1,000) × price per 1K tokens
- ✓ Simplify wording where possible
- ✓ Monitor input/output tokens in production
- ✓ Experiment with alternate phrasings
Tokens are the hidden currency of LLMs. Once you understand how they're created and billed, you can design prompts and systems that aren't just smarter—they're faster and cheaper too.