Tokens: Why 'Hello World' Doesn't Cost the Same Everywhere
If you were to send the same prompt to three different LLM providers, you'd expect to pay roughly the same amount, right? After all, it's the same text, same length, same intent.
Try it. Send this to three different APIs:
hello world
Check your usage. You'll see three different numbers:
- Provider A: 3 input tokens
- Provider B: 5 input tokens
- Provider C: 11 input tokens
Same prompt. Different bill. Over thousands of API calls, "3 vs 11 tokens" becomes a real line item on your cloud costs.
This post is for developers who call LLM APIs but don't fully understand why prompts cost what they cost. We'll build from first principles using simple examples, not heavy math.
What Models Actually See
At a low level, a language model doesn't understand text. It only understands sequences of integers.
When you send "hello world" to an API, the provider doesn't feed those characters directly to the model. Instead:
- Your text gets split into chunks (tokens)
- Each chunk maps to a unique integer in the model's vocabulary
- That integer sequence is what the model actually processes
For example, "hello world" might become:
"hello world" → ["hello", " world"] → [1234, 5678]
The model never sees the string. It only sees [1234, 5678].
Those integers are the currency the model works with—and the same currency you're billed for.
The Pipeline: Text to Numbers and Back
Here's the full flow:
Your prompt → Tokenizer → Integer IDs → Model → Integer IDs → Detokenizer → Response
In TypeScript terms, you can think of it like:
function encode(text: string): number[] {
// Split text into tokens, map to IDs
return [1234, 5678];
}
function decode(tokens: number[]): string {
// Map IDs back to text
return "hello world";
}
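A quick sanity check on that mental model: encoding and decoding are inverses, so decoding the IDs you were billed for reproduces the text you sent. Real tokenizers guarantee this for any input (up to minor normalization); the toy stubs above only handle one string:

const ids = encode("hello world"); // [1234, 5678]: what the model sees, and what you pay for
console.log(decode(ids));          // "hello world"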
The tokenizer is the part that decides how to split your text. Different providers use different tokenizers, which means the same text can split into different tokens.
That's the first reason why token counts vary.
Character-Level Tokenizers (and Why They're Expensive)
The simplest tokenizer splits text into individual characters:
"cat" → ["c", "a", "t"]
One character equals one token.
For a sentence like "cat sat on the mat" (including spaces):
c, a, t, ␣, s, a, t, ␣, o, n, ␣, t, h, e, ␣, m, a, t
That's 18 tokens for a very short sentence.
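To see how little machinery this takes, here's a minimal sketch of a character-level tokenizer with a made-up 27-symbol vocabulary (lowercase letters plus space); a real one would also need digits, punctuation, and everything else:

// Tiny made-up vocabulary: one ID per character.
const charVocab = new Map<string, number>(
  [..."abcdefghijklmnopqrstuvwxyz "].map((ch, i) => [ch, i] as [string, number])
);

function charEncode(text: string): number[] {
  // Every character becomes exactly one token ID; -1 stands in for "unknown".
  return [...text.toLowerCase()].map((ch) => charVocab.get(ch) ?? -1);
}

console.log(charEncode("cat sat on the mat").length); // 18: one token per character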
Character tokenizers are:
- Easy to implement
- Very flexible (they can handle any text)
- Expensive: they produce a lot of tokens
More tokens means:
- Higher API costs (you pay per token)
- Shorter context windows (models have token limits)
- Slower responses (more tokens to process)
No modern LLM uses pure character-level tokenization for this reason.
Subword Tokenizers: The Modern Approach
Modern LLMs use subword tokenizers like BPE (Byte Pair Encoding) or SentencePiece.
The idea: instead of splitting by character or word, learn common chunks from training data. Frequent sequences like "ing", "tion", "under" become single tokens.
For example:
- "understanding" → ["under", "stand", "ing"]
- "tokenization" → ["token", "ization"]
- "the" → ["the"] (common word, single token)
This leads to two important implications:
- Fewer tokens per sentence (better for users)
- Larger vocabulary (the model needs more embeddings)
There's a tradeoff here:
- Small vocabulary → more tokens per sentence → cheaper model, more expensive prompts
- Large vocabulary → fewer tokens per sentence → larger model, cheaper prompts
Different providers make different choices, which affects how many tokens your prompt becomes.
Rare Words, Code, and Non-English Text
Tokenizers are data-driven. They learn frequent chunks from their training data, which leads to some non-obvious behavior:
Common English words tend to be single tokens:
"the" → ["the"]
"function" → ["function"]
Rare or made-up words get split into multiple pieces:
"frabjous" → ["fr", "abj", "ous"]
Code with uncommon identifiers can split unexpectedly:
const functionName = "getUserProfileById";
// might tokenize as: ["get", "User", "Profile", "By", "Id"]
Languages underrepresented in the tokenizer's training data often get broken into more tokens than English. A Chinese prompt might use significantly more tokens than an English prompt with the same meaning.
Two strings with the same character length can have very different token counts. A prompt full of rare terms or low-resource languages will be more "expensive" in tokens than plain English.
Why Providers Disagree on Token Counts
Back to our opening mystery: why does "hello world" produce different token counts across providers?
Several reasons:
1. Different vocabularies
Each provider trains their own tokenizer on different data. One might have "hello" as a single token; another might split it as ["hel", "lo"].
2. Different tokenization algorithms
Even similar algorithms differ in how they handle whitespace, punctuation, and normalization.
3. Hidden prompt wrappers
You send "hello world", but the provider might silently add system prompts, role markers, and separators. What you see isn't what gets tokenized (the sketch after this list shows the idea).
4. Billing rules
Providers charge different rates for input vs output tokens. Some round up; others count special tokens differently.
The result: text + tokenizer + wrappers + billing = different counts for the same visible string
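To make the wrapper point concrete, here's a purely illustrative sketch. The role markers and system prompt below are invented; every provider uses its own (usually hidden) chat template:

function wrapForChat(userText: string): string {
  // Hypothetical format: real templates vary by provider and model.
  const systemPrompt = "You are a helpful assistant.";
  return ["<|system|>", systemPrompt, "<|user|>", userText, "<|assistant|>"].join("\n");
}

// The provider tokenizes something like wrapForChat("hello world"),
// not just the 11 characters you typed, so the billed input count is higher.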
What You Can Do About It
Now that you understand what's happening, here's what you can actually do.
Use Your Provider's Tokenizer
Most providers offer tokenizer tools (CLI, web UI, or API). Use them to inspect how your prompts split.
Treat tokens per request as a first-class metric, not an afterthought.
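For example, one way to inspect counts locally is the js-tiktoken package (one option among several; your provider may ship its own tool), shown here with the cl100k_base encoding. The exact numbers it prints only apply to that encoding, but it makes the rare-word, code, and non-English effects from earlier easy to see:

import { getEncoding } from "js-tiktoken";

const enc = getEncoding("cl100k_base");

const samples = ["hello world", "frabjous", "getUserProfileById", "你好，世界"];
for (const text of samples) {
  const ids = enc.encode(text);
  console.log(`${JSON.stringify(text)} -> ${ids.length} tokens`);
}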
Estimate Costs in Your Code
For English, a rough heuristic:
- 1 token ≈ 3–4 characters
- 1 token ≈ ¾ of a word
You can implement a quick estimator:
export function roughTokenEstimate(text: string): number {
return Math.ceil(text.length / 4);
}
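To turn that estimate into a dollar figure, fold in your provider's rate (the price below is a placeholder, not any provider's real pricing):

// Placeholder rate; look up the real per-1K-token input price for your model.
const INPUT_PRICE_PER_1K_TOKENS_USD = 0.0005;

export function roughCostEstimateUSD(text: string): number {
  // (estimated tokens ÷ 1,000) × price per 1K tokens
  return (roughTokenEstimate(text) / 1000) * INPUT_PRICE_PER_1K_TOKENS_USD;
}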
Use the real tokenizer for billing-critical paths. Use the rough estimate for fast UI feedback.
Design Prompts with Tokens in Mind
- De-duplicate boilerplate system prompts across requests
- Prefer concise formatting over decorative formatting
- Keep few-shot examples short when possible
- Avoid pasting large code blocks if a short description would work
Monitor Token Usage in Production
Treat tokens like any other resource:
- Log input tokens, output tokens, and total cost per request
- Add alerts when usage spikes or average tokens per request drifts upward
- Correlate tokens with latency and error rate
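As a sketch, assuming an OpenAI-style usage object in the API response (field names vary by provider) and placeholder prices:

interface TokenUsage {
  prompt_tokens: number;
  completion_tokens: number;
  total_tokens: number;
}

// Placeholder rates; substitute your provider's real per-1K-token prices.
const INPUT_PRICE_PER_1K_USD = 0.0005;
const OUTPUT_PRICE_PER_1K_USD = 0.0015;

function logTokenUsage(requestId: string, usage: TokenUsage, latencyMs: number): void {
  const costUSD =
    (usage.prompt_tokens / 1000) * INPUT_PRICE_PER_1K_USD +
    (usage.completion_tokens / 1000) * OUTPUT_PRICE_PER_1K_USD;
  // One structured log line per request makes it easy to aggregate, alert on
  // spikes, and correlate token counts with latency later.
  console.log(JSON.stringify({ requestId, ...usage, latencyMs, costUSD }));
}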
Checklist: Before Your Next API Call
- ✓ Check your provider's tokenizer docs
- ✓ Measure token counts on real prompts
- ✓ Estimate cost: (total tokens ÷ 1,000) × price per 1K tokens
- ✓ Simplify wording where possible
- ✓ Monitor input/output tokens in production
- ✓ Experiment with alternate phrasings
Tokens are the hidden currency of LLMs. Once you understand how they're created and billed, you can design prompts and systems that aren't just smarter—they're faster and cheaper too.