AI Token Counter
Count tokens and estimate cost for GPT-5, Claude 4 and Gemini 3 before you call the API.
This AI token counter estimates how many tokens your text becomes across GPT-5.5, GPT-5 mini, Claude Opus 4.7, Claude Sonnet 4.6, Claude Haiku 4, Gemini 3 Pro and Gemini 3 Flash, then projects the cost per run and shows how the input fits inside each context window. It is the tool to keep open while you design a prompt, because tokens are the unit in which every model reads and generates text and every provider bills, and a system prompt that looks short can cost real money at production volume. Paste a prompt, an article or a chunk of code, set the expected reply length, and compare the models side by side. Everything runs in your browser, so secrets and customer data never leave the page.
100% in your browser. Nothing you type ever leaves this page.
Local AI token estimator
Paste a prompt, an article or a chunk of code and estimate how many tokens it will use across GPT-5.5, GPT-5 mini, Claude Opus 4.7, Claude Sonnet 4.6, Claude Haiku 4, Gemini 3 Pro and Gemini 3 Flash. The page compares the counts, projects the input and output cost per run, shows how the text fits inside each model's context window (256K to 2M tokens, depending on the model), and visualises the rough tokenisation so you can see where a long prompt wastes budget. Everything runs locally in your browser.
Counts are estimates based on each provider's published tokenisation profile. For exact GPT counts use OpenAI tiktoken; for Claude use the official Anthropic counter. This page does not send your text anywhere.
What an AI token counter does
An AI token counter takes the text you would send to a model and estimates how many tokens it becomes, because models read and generate text, and providers bill, per token rather than per word or character. A token can be a short word, part of a longer word, a punctuation mark, a digit group, a newline or a single Unicode character. Knowing the token count of a prompt and its expected reply is the most reliable way to estimate cost, plan a context window and avoid silent truncation that cuts off the most important part of the input. This tool applies a per-provider estimate for GPT-5.5, GPT-5 mini, Claude Opus 4.7, Claude Sonnet 4.6, Claude Haiku 4, Gemini 3 Pro and Gemini 3 Flash, then projects how the same content is billed on each one and how much of each context window it occupies.
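As a rough illustration of the idea, the simplest kind of estimator divides the character count by an average characters-per-token ratio. The sketch below assumes that approach. The profile names and ratios are hypothetical placeholders, not the tuned profiles this page uses and not official provider figures.

```python
import math

# Hypothetical characters-per-token ratios, for illustration only.
# Real tokenizers vary with language, code density and punctuation.
CHARS_PER_TOKEN = {
    "profile-a": 4.0,
    "profile-b": 3.5,
}


def estimate_tokens(text: str, chars_per_token: float) -> int:
    """Estimate a token count from text length using a fixed ratio."""
    if chars_per_token <= 0:
        raise ValueError("chars_per_token must be positive")
    if not text:
        return 0
    # Round up: a partial token still counts as a whole token.
    return math.ceil(len(text) / chars_per_token)


def compare_profiles(text: str) -> dict:
    """Return the estimated token count for every profile."""
    return {
        name: estimate_tokens(text, ratio)
        for name, ratio in CHARS_PER_TOKEN.items()
    }
```

A fixed ratio is only a first approximation. It does not model how a real vocabulary splits rare words, emoji or non-Latin scripts, which is where estimates of this kind drift the most.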
When to use it
- Cost forecasting before a launch. Multiply the input plus output estimate by your projected run count to see whether a feature is viable on a premium model like Opus 4.7 or GPT-5.5, or only on a smaller one like Haiku 4 or Gemini 3 Flash. A 1,500-token system prompt sent 100,000 times a month is 150 million input tokens before a single reply is generated, which adds up faster than most teams expect.
- Prompt budget design. When a system prompt, a retrieved chunk and a user message share one context window, knowing each part's token count lets you split the budget instead of finding out at runtime that the document was truncated.
- Model selection. A 50,000-token input is trivial on Opus 4.7 or Gemini 3 Pro but uncomfortable on a smaller legacy model. The fit chart makes that comparison visible at a glance.
- RAG and embeddings sizing. Paste a representative chunk to confirm your splitter targets the right token size before you embed thousands of pages.
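The budgeting and sizing checks above can be sketched in a few lines. This is a minimal example, assuming the context window is shared between input and reply. The class and function names are hypothetical, and the token counts stand in for whatever your estimator or the provider's exact tokenizer reports.

```python
from dataclasses import dataclass
from typing import Mapping, Tuple


@dataclass
class PromptBudget:
    context_window: int  # total tokens the model accepts (assumed shared by input and reply)
    reply_reserve: int   # tokens held back for the expected reply

    def input_budget(self) -> int:
        """Tokens available for everything sent to the model."""
        return self.context_window - self.reply_reserve


def fits(budget: PromptBudget, part_tokens: Mapping[str, int]) -> Tuple[bool, int]:
    """Return (fits, remaining). A negative remaining is the overflow in tokens."""
    used = sum(part_tokens.values())
    remaining = budget.input_budget() - used
    return remaining >= 0, remaining


def max_chunks(remaining_tokens: int, chunk_tokens: int) -> int:
    """How many retrieved chunks of a given size still fit in the remaining budget."""
    if chunk_tokens <= 0:
        raise ValueError("chunk_tokens must be positive")
    return max(0, remaining_tokens // chunk_tokens)


# Example: a 256K window with 4,000 tokens reserved for the reply.
budget = PromptBudget(context_window=256_000, reply_reserve=4_000)
ok, remaining = fits(budget, {"system": 1_500, "retrieved": 50_000, "user": 500})
```

Running the check per part, rather than on the concatenated prompt, shows which component to trim when the total overflows.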
Accuracy and privacy
This is an estimator, not the official tokenizer for any provider. Shipping the real byte-pair and SentencePiece vocabularies to the browser would mean multi-megabyte downloads that slow the page down for everyone, so the estimate here is tuned to match official counts within a few percent on typical English prose, code and mixed content. It can drift further on dense emoji strings, unusual scripts or very short snippets, so for final billing or a hard context-limit decision, run the OpenAI tiktoken library, the Anthropic token counter or Google's token counter for Gemini.
The text is processed entirely inside your browser. Nothing is sent to PeopleAreGeek or any third party, so you can paste prompts, system messages or production samples and no network request carries them off the page.
The prices reflect published 2026 list rates and exclude batch APIs, cached input and enterprise discounts.
Frequently asked questions
Why does the same text use a different number of tokens on each model?
Each provider trains a separate tokenizer with its own vocabulary. The split for the same word can differ by one token, and the spread adds up on long inputs. GPT-5 and Claude 4 are usually the most efficient on mixed languages and tend to land within a few percent of each other. Gemini 3 uses slightly more tokens on punctuation-heavy text and emoji.
How accurate is this estimator compared to the official tokenizer?
For normal English prose and code the estimate is within roughly three to five percent of tiktoken and the Anthropic counter. It can drift more on emoji bursts, unusual scripts or extremely short snippets. For final billing or hard context-limit decisions, run the exact tokenizer of the provider you ship to.
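To check the drift on your own content, compare the estimate with the exact count from the provider's tokenizer and compute the relative error. The sketch below assumes you already have both numbers. The function names and the 5 percent default are illustrative choices, not part of this page.

```python
def percent_error(estimated: int, exact: int) -> float:
    """Absolute error of the estimate as a percentage of the exact count."""
    if exact <= 0:
        raise ValueError("exact count must be positive")
    return abs(estimated - exact) * 100.0 / exact


def within_tolerance(estimated: int, exact: int, tolerance_pct: float = 5.0) -> bool:
    """True when the estimate is within tolerance_pct of the exact count."""
    return percent_error(estimated, exact) <= tolerance_pct
```

Running this over a handful of representative prompts, including short snippets and emoji-heavy text, shows whether the estimate is close enough for your content or whether you need the exact tokenizer earlier in the workflow.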
Do output tokens cost the same as input tokens?
No. Most providers charge for output tokens at roughly three to five times the input rate. The cost estimator on this page applies separate input and output prices for each model, so a long reply is reflected fairly even when the prompt is short.
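The per-direction arithmetic is simple enough to sketch. The prices below are hypothetical placeholders expressed in dollars per million tokens, with output priced at five times input. Substitute the list rates of the model you actually use.

```python
def run_cost(
    input_tokens: int,
    output_tokens: int,
    input_price_per_m: float,
    output_price_per_m: float,
) -> float:
    """Cost in dollars, with input and output tokens priced separately."""
    total = input_tokens * input_price_per_m + output_tokens * output_price_per_m
    return total / 1_000_000


def monthly_cost(
    input_tokens: int,
    output_tokens: int,
    runs_per_month: int,
    input_price_per_m: float,
    output_price_per_m: float,
) -> float:
    """Project the cost of one run across a month of identical runs."""
    return run_cost(
        input_tokens * runs_per_month,
        output_tokens * runs_per_month,
        input_price_per_m,
        output_price_per_m,
    )


# Hypothetical rates: $3 per million input tokens, $15 per million output tokens.
per_run = run_cost(1_500, 300, 3.0, 15.0)
per_month = monthly_cost(1_500, 300, 100_000, 3.0, 15.0)
```

With these placeholder rates, a 300-token reply costs as much as the 1,500-token prompt that produced it, which is why the expected reply length matters as much as the prompt size.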
Does the page send my text to a server to count it?
No. The whole estimator runs in your browser. You can paste secrets, customer messages or production prompts and they never leave your machine. The page works offline once it has loaded.