Token Counter for GPT-5, Claude 4 & Gemini 3: Cost & Context Estimator

Local AI token estimator

Paste a prompt, an article or a chunk of code and estimate how many tokens it will use across GPT-5.5, GPT-5 mini, Claude Opus 4.7, Claude Sonnet 4.6, Claude Haiku 4, Gemini 3 Pro and Gemini 3 Flash. The page compares the counts, projects the cost for input and output runs, shows how the text fits inside each model’s 256K to 2M context window, and visualises the rough tokenisation so you can see where a long prompt wastes budget. Everything runs locally in your browser.

Counts are estimates based on each provider’s published tokenisation profile. For exact GPT counts use OpenAI tiktoken; for Claude use the official Anthropic counter. This page does not send your text anywhere.

What is a token counter and why it matters

Large language models do not see characters or words the way humans do. Before any GPT, Claude or Gemini prompt is processed, the text is split into tokens by a tokenizer. A token can be a short word, part of a longer word, a punctuation mark, a digit group, a newline or even a single Unicode character for less common scripts. Models then read, generate and bill per token. Knowing the token count of a prompt and an expected reply is the only honest way to estimate cost, plan a context window and avoid silent truncations that quietly cut off the most important part of the input.

This token counter takes the text you paste, applies a per-provider estimator modelled on the published behaviour of the GPT-4o, Claude and Gemini tokenizers, and projects how the same content would be counted and billed on each model. The estimator leans towards overcounting on short snippets, which keeps plans on the safe side, but it can slightly undercount dense code, so leave extra headroom for code-heavy prompts. The text never leaves the browser, which matters if the content includes secrets, customer data or unreleased product information.

How tokenisation works across GPT-5, Claude 4 and Gemini 3

OpenAI models use byte-pair encoding (BPE) tokenizers. The GPT-5 family inherits the o200k_base-style vocabulary that landed with GPT-4o, with extra entries that improve efficiency on code and non-English text. Anthropic’s Claude 4 (Opus 4.7, Sonnet 4.6 and Haiku 4) uses a different BPE tokenizer that is also tuned for code and multilingual content and tends to land within a few percent of OpenAI counts on similar text. Gemini 3 (Pro and Flash) uses a SentencePiece-style tokenizer with a separate vocabulary, which sometimes splits the same input into more tokens, especially on punctuation-heavy text and emoji.

The estimator on this page approximates those tokenizers in five steps:

1. Normalise the text so encoded characters, line endings and stray whitespace match what the API will actually receive.
2. Detect common chunks: words, numbers, punctuation, code symbols, line breaks and rare characters.
3. Apply per-category cost: short common words usually map to one token, longer words split into multiple tokens, code symbols and punctuation are often a single token each, line breaks count, and Unicode-heavy text can use several tokens per character.
4. Apply a model-specific multiplier to reflect differences between BPE and SentencePiece vocabularies.
5. Project context fit and cost: combine the input estimate with the expected output and the model’s published pricing to get a per-run figure that scales linearly to monthly volume.
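
The first four steps can be sketched in a few lines of Python. This is only an illustration of the approach, not the code this page runs: the per-category costs (about six letters per word token, three digits per number token, spaces merged into the next token) and the values in MODEL_MULTIPLIER are assumptions chosen for the example, and the function names are hypothetical.

```python
import math
import re
import string
import unicodedata

# Hypothetical multipliers relative to a GPT-style BPE baseline (assumed values).
MODEL_MULTIPLIER = {"gpt": 1.00, "claude": 1.03, "gemini": 1.08}

# One named alternative per chunk category; the first alternative that matches wins.
CHUNK_RE = re.compile(
    r"(?P<word>[A-Za-z]+)"
    r"|(?P<number>[0-9]+)"
    r"|(?P<newline>\n)"
    r"|(?P<space>[ \t]+)"
    r"|(?P<symbol>[" + re.escape(string.punctuation) + r"])"
    r"|(?P<other>.)",
    re.DOTALL,
)


def normalise(text: str) -> str:
    """Step 1: compose Unicode, unify line endings, drop trailing whitespace."""
    text = unicodedata.normalize("NFC", text)
    text = text.replace("\r\n", "\n").replace("\r", "\n")
    return "\n".join(line.rstrip() for line in text.split("\n"))


def chunk_cost(kind: str, chunk: str) -> int:
    """Step 3: assumed token cost for one chunk of the given category."""
    if kind == "word":
        return math.ceil(len(chunk) / 6)   # assumption: ~6 letters per token
    if kind == "number":
        return math.ceil(len(chunk) / 3)   # assumption: digit groups of three
    if kind in ("newline", "symbol"):
        return 1
    if kind == "space":
        return 0                           # assumption: spaces merge into the next token
    # Rare or non-ASCII character: charge more for longer UTF-8 encodings.
    return max(1, len(chunk.encode("utf-8")) - 1)


def estimate_tokens(text: str, model: str = "gpt") -> int:
    """Steps 1 to 4: normalise, chunk, cost each chunk, apply the model multiplier."""
    base = sum(
        chunk_cost(match.lastgroup, match.group())
        for match in CHUNK_RE.finditer(normalise(text))
    )
    return math.ceil(base * MODEL_MULTIPLIER[model])
```

Calling estimate_tokens(prompt, "gemini") returns a rounded-up estimate for that model. A real estimator would calibrate the per-category costs and multipliers against the official tokenizers rather than hard-coding them.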

Common use cases for a token counter

Cost forecasting before a launch. Multiply the input plus output estimate by your projected monthly run count to see if a feature is viable on a premium model like Opus 4.7 or GPT-5.5, or only on a smaller one like Haiku 4 or Gemini 3 Flash. A 1,500-token system prompt sent 100,000 times a month adds up faster than most teams expect.
Prompt budget design. When a system prompt, a retrieved knowledge chunk and a user message all share the same context window, knowing each component’s token count helps split the budget instead of finding out at runtime that the document was truncated.
Model selection. A 50,000-token input is trivial on Opus 4.7 (1M window) or Gemini 3 Pro (2M window) but will not fit at all on a smaller legacy model with a 32K window. The fit chart on this page makes that comparison visible at a glance.
Audit of long content pipelines. Articles, transcripts and PDFs often exceed expectations once tokenised. Testing a typical document on this page reveals whether a summarisation pipeline will hit the context limit on outliers.
RAG and embeddings sizing. Retrieval augmented generation systems break documents into chunks of a target token size. Pasting a representative chunk here confirms the splitter is doing the right thing before you embed thousands of pages.
Educational walk-throughs. Showing a non-technical stakeholder how the same paragraph maps to 80 tokens on one model and 92 on another makes pricing decisions much easier to defend.
Latency planning. Output tokens often dominate response time. A clear estimate of the expected reply length helps avoid timeouts on long generations.
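
To make the cost forecasting case concrete, here is a small worked sketch of the monthly projection. The prices in EXAMPLE_PRICE are made up for illustration and are not any provider's list price; the Price class and monthly_cost function are hypothetical names. The 1,500-token system prompt and the 100,000 monthly runs come from the example above, and the 500-token reply is an assumed figure.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Price:
    """Price in dollars per million tokens (the values used below are hypothetical)."""
    input_per_million: float
    output_per_million: float


def monthly_cost(input_tokens: int, output_tokens: int, runs: int, price: Price) -> float:
    """Scale a per-run token estimate linearly to a monthly dollar figure."""
    input_total = input_tokens * runs
    output_total = output_tokens * runs
    return (
        input_total * price.input_per_million
        + output_total * price.output_per_million
    ) / 1_000_000


# Hypothetical price: output billed at five times the input rate.
EXAMPLE_PRICE = Price(input_per_million=3.0, output_per_million=15.0)

system_prompt_only = monthly_cost(1_500, 0, 100_000, EXAMPLE_PRICE)
with_replies = monthly_cost(1_500, 500, 100_000, EXAMPLE_PRICE)
print(f"{system_prompt_only:.2f} {with_replies:.2f}")  # prints "450.00 1200.00"
```

At these assumed rates the system prompt alone costs 450 dollars a month, and a reply one third the length of the prompt adds more than the prompt itself, which is why the output estimate matters as much as the input count.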

Limitations and privacy notes

This counter is an estimator, not the official tokenizer for any provider. The actual byte-pair encoding of GPT-4o and the SentencePiece encoding used by Gemini are deterministic but require shipping multi-megabyte vocabulary files to the browser, which would slow the page down for everyone. The estimator implemented here matches the official counts within a few percent on typical English text, code and mixed content. It can drift further on dense emoji strings, unusual scripts or very short snippets. For exact accounting, use the OpenAI tiktoken library, the Anthropic token counter API, or the official Google AI counters before billing.

The text is processed entirely inside your browser. Nothing is sent to PeopleAreGeek or to any third party. You can paste prompts, system messages, retrieval chunks or production samples without exposing them to a network round trip. The model pricing in the cost estimator reflects published list prices in 2026 and does not include enterprise discounts, batch APIs or cached input pricing, which can cut the cost significantly for high-volume workloads.

Frequently asked questions

Why does the same text use a different number of tokens on each model?

Each provider trains a separate tokenizer with its own vocabulary. The split for the same word can differ by one token, and the spread accumulates on long inputs. GPT-5 and Claude 4 are usually the most efficient on mixed languages and tend to land within a few percent of each other; Gemini 3 uses slightly more tokens on punctuation-heavy text and emoji.

How accurate is this estimator compared to the official tokenizer?

For normal English prose and code the estimator is within roughly three to five percent of tiktoken and the Anthropic counter. It can drift more on emoji bursts, unusual scripts or extremely short snippets. For final billing or hard context-limit decisions, run the exact tokenizer of the provider you ship to.
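
One way to check that claim on your own content is to compare an estimate against the exact count from the provider's tokenizer and measure the drift. The sketch below only does the arithmetic: the function names are hypothetical, the five percent default tolerance mirrors the upper end of the range quoted above, and the exact count has to come from the official tool for the provider you ship to.

```python
def relative_drift(estimated: int, exact: int) -> float:
    """Signed drift of the estimate relative to the exact count (0.04 means 4% over)."""
    if exact <= 0:
        raise ValueError("exact token count must be positive")
    return (estimated - exact) / exact


def within_tolerance(estimated: int, exact: int, tolerance: float = 0.05) -> bool:
    """True when the estimate is no further than `tolerance` from the exact count."""
    return abs(relative_drift(estimated, exact)) <= tolerance
```

Running this over a handful of representative prompts shows whether the estimator's drift is acceptable for your mix of prose, code and scripts.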

Do output tokens cost the same as input tokens?

No. Most providers charge output tokens at roughly three to five times the input rate. The cost estimator on this page applies the correct per-direction price for each model, so a long reply is reflected fairly even when the prompt is short.

Does the page send my text to a server to count it?

No. The whole estimator runs in your browser. You can paste secrets, customer messages or production prompts and they never leave your machine. The page works offline once it has loaded.

What is a context window and how does it relate to tokens?

The context window is the maximum number of tokens a model can read in one call, counting the system prompt, the conversation history, retrieved chunks, the user message and the model’s own reply. GPT-5.5 supports 400,000 tokens; Claude Opus 4.7 supports 1,000,000; Gemini 3 Pro supports up to 2,000,000. The fit chart on this page shows what percentage of each window your text would consume.
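
The percentage can be sketched as a simple ratio. The window sizes below are the three figures quoted in this answer; the function name and the token counts for each component are hypothetical values chosen for the example.

```python
# Context window sizes as quoted in the answer above.
CONTEXT_WINDOW = {
    "GPT-5.5": 400_000,
    "Claude Opus 4.7": 1_000_000,
    "Gemini 3 Pro": 2_000_000,
}


def window_report(components: dict[str, int], reply_tokens: int) -> dict[str, float]:
    """Percentage of each model's window used by all input parts plus the reply."""
    total = sum(components.values()) + reply_tokens
    return {
        model: round(100 * total / window, 2)
        for model, window in CONTEXT_WINDOW.items()
    }


# Hypothetical call: every part of the request counts against the same window.
usage = window_report(
    {"system": 1_500, "history": 6_000, "retrieved": 40_000, "user": 500},
    reply_tokens=2_000,
)
print(usage)  # {'GPT-5.5': 12.5, 'Claude Opus 4.7': 5.0, 'Gemini 3 Pro': 2.5}
```

Any value above 100 means the call would overflow that model's window, so the input has to be trimmed or the reply allowance reduced.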

How can I reduce a token bill without losing quality?

Trim repetitive boilerplate from system prompts, move stable instructions into cached-prompt features when available, prefer Markdown or structured JSON over verbose natural-language framing, summarise retrieved chunks before sending them, and pick a smaller model for routing or classification steps. Even a 20 percent prompt reduction is significant at production volume.
