LLM output hallucination risk estimator
Paste a Large Language Model response (and optionally the original prompt) and the estimator scores how likely the answer contains hallucinated facts. Twelve heuristics scan for unsupported citations, precise dates and numbers, confident absolutes without hedging, named-person attributions, dollar figures, percentages, URLs that look fabricated, and other typical patterns that correlate with model errors. The result is a 0-100 risk score, a per-heuristic breakdown, the exact spans that triggered each flag, and an actionable list of things to verify before shipping the answer.
No data leaves your browser. The estimator runs locally and is heuristic; treat the score as a triage signal, not as ground truth.
Why a hallucination risk estimator matters in 2026
Large Language Models still hallucinate. Even the strongest 2026 models (GPT-5.5, Claude Opus 4.7, Gemini 3 Pro) produce confidently wrong outputs when the user asks a precise factual question without grounding context, when the prompt requires recalling a specific date or statistic, or when the question lies near the training cut-off. Hallucinations are rarely random: they follow predictable patterns. Models invent plausible-looking URLs that return a 404, attribute quotes to people who never said them, cite studies that do not exist, fabricate percentages that round nicely, and double down with confident absolutes when they are least sure. A pre-shipping triage step that flags those patterns before human review can catch many of these errors before they are published.
This estimator runs twelve fast heuristics on the model response you paste in. Each heuristic looks for a known hallucination signal and adds a weighted contribution to the overall risk score. The result is not a probability that the answer is wrong; it is a triage indicator that tells you which sentences deserve manual verification. A score below 25 suggests the answer is mostly safe to ship; 25-55 means a few claims should be verified; above 55 means the answer is risky and should not be shipped without a careful fact-check.
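The thresholds above can be written as a small lookup. This is a minimal sketch in Python, not the estimator's own code (which runs as JavaScript in the browser); the function name `risk_band` is hypothetical, and the colour names follow the green / amber / red bands the estimator renders.

```python
def risk_band(score):
    """Map a 0-100 risk score to a triage band.

    Boundaries follow the text: below 25, 25-55, above 55.
    """
    if not 0 <= score <= 100:
        raise ValueError("score must be between 0 and 100")
    if score < 25:
        return "green"  # mostly safe to ship
    if score <= 55:
        return "amber"  # verify a few specific claims
    return "red"        # fact-check carefully before shipping


print(risk_band(40))
```

Treating 25 and 55 as amber is one reading of "25-55"; the real tool may place the boundaries slightly differently.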
How the estimator scores a response
- Tokenise the response into sentences and rough phrases.
- Run the heuristics in parallel: precise dates, precise numbers and percentages, currency figures, named entities, “according to” attributions without a URL, URLs that look fabricated (e.g. arxiv.org links with non-existent IDs), confident absolutes (always, never, all, none), invented book or paper titles, lab studies without a citation, absence of hedging, and prompt-grounded versus ungrounded claims.
- Compute a weighted sum. Each heuristic carries a per-occurrence weight (e.g. an unsupported percentage is worth 6 risk points, a fabricated-looking URL 12). The total is capped at 100.
- Apply the prompt-grounding bonus: if the original prompt was provided and contains the same facts, the risk drops because the model is repeating, not inventing.
- Render the verdict: green / amber / red bands, list of triggering spans, and remediation suggestions ranked by impact.
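The steps above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the estimator's actual implementation: only the weights 6 (percentage) and 12 (URL) come from the text, the other weights and all regular expressions are placeholders, every URL is treated as suspicious for simplicity, and the grounding bonus is approximated by skipping any flagged span that appears verbatim in the prompt.

```python
import re

# Hypothetical heuristic table: name -> (pattern, per-occurrence weight).
# Only the percentage (6) and URL (12) weights are taken from the text.
HEURISTICS = {
    "percentage": (re.compile(r"\b\d+(?:\.\d+)?\s?%"), 6),
    "url": (re.compile(r"https?://\S+"), 12),
    "absolute": (re.compile(r"\b(?:always|never|all|none)\b", re.IGNORECASE), 3),
    "year": (re.compile(r"\b(?:19|20)\d{2}\b"), 2),
}


def score_response(response, prompt=""):
    """Return (risk score capped at 100, list of (heuristic, span) flags)."""
    flagged = []
    total = 0
    for name, (pattern, weight) in HEURISTICS.items():
        for match in pattern.finditer(response):
            span = match.group(0)
            # Simplified grounding bonus: a span already present in the
            # prompt is treated as repeated, not invented.
            if prompt and span in prompt:
                continue
            flagged.append((name, span))
            total += weight
    return min(total, 100), flagged


score, flags = score_response(
    "Sales always grow 40% per year, see https://example.com/report."
)
print(score, flags)
```

Returning the flagged spans alongside the score mirrors the per-heuristic breakdown described above, so a reviewer can see why a response landed in a given band.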
Common use cases for the estimator
- Pre-publish review of AI-assisted content. Blog posts, marketing copy, internal docs that are drafted by an LLM benefit from a quick triage before a human signs off.
- RAG pipeline QA. Even with retrieval, the model can drift away from the source. Running the estimator on a sample of generations catches drift and lets you tune the retriever or the system prompt.
- Customer-facing chatbot guardrail. When the chatbot is about to send a long answer, run the estimator first; if the score is above 55, surface a confidence warning or route to a human.
- Educational use. Show students or new team members the typical hallucination patterns so they learn to spot them naturally.
- Compliance and risk. In regulated industries, a documented triage step is part of the audit trail for AI-generated content.
- Prompt iteration. Compare the risk score of two prompt variants on the same question and pick the one with the lower score; usually that means adding hedging instructions or grounding context.
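The chatbot guardrail use case can be sketched as a thin wrapper around any scoring function. This is a hypothetical illustration: `guard_reply`, the `estimate_risk` callable, and the returned dictionary shape are all assumptions, and only the threshold of 55 comes from the text.

```python
def guard_reply(reply, estimate_risk, threshold=55):
    """Decide whether to send a chatbot reply or escalate it.

    estimate_risk is any callable that maps text to a 0-100 risk score.
    """
    score = estimate_risk(reply)
    if score > threshold:
        # Above the threshold: hold the reply and route to a human
        # (or attach a confidence warning instead).
        return {"action": "escalate", "score": score, "reply": None}
    return {"action": "send", "score": score, "reply": reply}


# Stub scorer for demonstration only: pretends longer answers are riskier.
def demo_scorer(text):
    return min(len(text), 100)


print(guard_reply("Short answer.", demo_scorer)["action"])
```

Passing the scorer in as a parameter keeps the guardrail independent of how the score is computed, which also makes it easy to compare two prompt variants with the same scorer.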
Limitations and honesty notes
The estimator is a heuristic. It does not check facts against the real world, does not run web searches, and does not consult a vector database. It looks only at surface patterns in the response. A fluent, well-hedged answer that is completely wrong will pass with a low score. A correct answer that quotes lots of dates will score amber. The output is a triage signal that tells humans where to look, not a verdict on truth. Two patterns are particularly hard to catch without a fact-check oracle: invented citations that look real (a plausible journal and year combination), and confident statements about niche topics where the model has little or no training data yet offers no hedge. Both deserve manual review regardless of the estimator score.
The whole analysis runs in your browser. Prompt, response and intermediate scoring data never reach the PeopleAreGeek server. The estimator is safe to use with confidential drafts, customer transcripts and proprietary content. The patterns and weights are documented in the Heuristics breakdown tab and you can audit the source code by viewing the page source.
Frequently asked questions
How accurate is the estimator?
It is heuristic, not a fact-checker. The correlation between high estimator scores and real hallucinations on the test set we built (around 400 LLM responses, 50/50 ground-truth right/wrong) is roughly 0.7. Good enough as a triage signal; not enough to skip a human review on critical content.
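The text does not say which correlation coefficient was used. One plausible reading is a Pearson correlation between the risk score and a 0/1 label (1 meaning the response was wrong), which is the point-biserial correlation. The sketch below shows that calculation in Python; the six data points are made up purely to demonstrate the call and are not from the test set.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / math.sqrt(var_x * var_y)


# Illustrative toy data only: risk scores and whether the answer was wrong.
toy_scores = [10, 20, 30, 60, 70, 80]
toy_wrong = [0, 0, 1, 0, 1, 1]
print(round(pearson(toy_scores, toy_wrong), 2))
```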
Why does my hedged answer still score amber?
Hedging reduces the score but does not zero it out. If the response also contains precise dates, percentages or named entities, those still trigger heuristics. The amber band tells you the answer is acceptable but a few specific claims still deserve a check.
Can I use this on responses in other languages?
Yes, but the heuristics are tuned for English and French. Patterns like precise dates, percentages and URLs are detected regardless of language. Hedging-word detection and citation-attribution patterns are less accurate in other languages, so the score may slightly under-count risk.
What is the difference between an “unsupported claim” and a “hedged claim”?
An unsupported claim states a fact without qualifier: “France has 67.4 million inhabitants.” A hedged claim signals uncertainty: “France has approximately 68 million inhabitants as of the most recent estimate.” The figure is nearly the same, but the hedge tells the reader to verify. The estimator rewards hedging.
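The distinction can be approximated with two regular expressions: one for a numeric claim and one for a hedge word. The sketch below is a simplified Python illustration; the hedge word list and the function name `classify_claim` are hypothetical, and the estimator's real lexicon is not given in the text.

```python
import re

# Hypothetical hedge lexicon, English only, for illustration.
HEDGES = re.compile(
    r"\b(?:approximately|about|around|roughly|estimated?|likely|as of|may|might)\b",
    re.IGNORECASE,
)
NUMBER = re.compile(r"\b\d+(?:\.\d+)?\b")


def classify_claim(sentence):
    """Label a sentence as 'hedged', 'unsupported', or 'no numeric claim'."""
    if not NUMBER.search(sentence):
        return "no numeric claim"
    return "hedged" if HEDGES.search(sentence) else "unsupported"


print(classify_claim("France has 67.4 million inhabitants."))
print(classify_claim(
    "France has approximately 68 million inhabitants as of the most recent estimate."
))
```

A keyword match like this cannot tell whether the hedge actually applies to the number in question, which is one reason the score stays a triage signal.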
Should I run the estimator before or after my RAG retrieval?
After. The retrieved context is part of the prompt; once the model has produced its final response, run the estimator on the response. Optionally include the retrieved context in the prompt field so the estimator applies its grounding bonus.
Is my input stored?
No. The estimator runs entirely in your browser. The prompt and response you paste are processed locally with JavaScript. No HTTP request is sent during the analysis. Refresh the page and the inputs are gone.













