AI Crawler Blocker

Generate a robots.txt that blocks GPTBot, ClaudeBot and 20 other AI bots, plus Nginx and Apache rules.

This AI crawler blocker lets you block AI crawlers from your site by building the robots.txt rules, plus matching Nginx and Apache directives, that tell each bot to stay out. You toggle individual bots or pick a preset: block training crawlers like GPTBot, ClaudeBot, Google-Extended and CCBot, block in-product fetchers like ChatGPT-User and PerplexityBot, or apply the EU AI Act Article 4(3) opt-out. We leave search engines like Googlebot and Bingbot untouched by default, so your organic ranking stays intact. Add your sitemap, custom Disallow paths and any existing rules to merge, then copy or download the file. Everything runs in your browser, so nothing you paste is uploaded.

100% in your browser. Nothing you type ever leaves this page.

Local robots.txt generator for AI crawlers

This AI crawler blocker builds a robots.txt file that blocks the AI training crawlers and in-product fetchers used by ChatGPT, Claude, Gemini, Perplexity, Meta AI, Apple Intelligence, Common Crawl and 15 other AI bots. You choose which ones to block, keep search engines like Googlebot and Bingbot crawling, and add your sitemap URL and custom paths. We then generate the file for you to copy or download, along with matching Apache and Nginx rules and an EU AI Act opt-out preset. It helps any site owner who wants to stay out of AI training datasets without touching their search ranking. Everything runs in your browser, so nothing you paste is uploaded.

Sitemap URL (optional)

Crawl-delay (optional)

Additional Disallow paths (one per line, applied to ALL user-agents)

Paste your existing robots.txt to merge (optional)

robots.txt is voluntary. Reputable AI vendors (OpenAI, Anthropic, Google, Perplexity, Apple, Meta) honour it. For hard enforcement, also add the server-level blocks (.htaccess or Nginx) generated above.

What an AI crawler blocker does for your site

An AI crawler blocker lets you block AI crawlers from harvesting your pages by generating the robots.txt rules and server directives that tell each bot to stay out. Generative AI systems collect web content in two distinct ways. Training crawlers, like OpenAI's GPTBot, Anthropic's ClaudeBot, Google-Extended and Common Crawl's CCBot, sweep the open web to build the datasets that future model versions will learn from. In-product agents, like ChatGPT-User, Perplexity-User and the Claude web fetcher, do live retrieval when a chatbot needs an up-to-date page to answer a user prompt. Each kind of bot has its own user-agent name and its own purpose, and many of them now honour robots.txt as the standard opt-out signal.

This generator builds a robots.txt file that addresses every documented AI bot we know about as of 2026: GPTBot, ChatGPT-User, OAI-SearchBot, ClaudeBot, claude-web, anthropic-ai, Google-Extended, Applebot-Extended, PerplexityBot, Perplexity-User, Meta-ExternalAgent, FacebookBot, Bytespider, CCBot, Amazonbot, Diffbot, omgilibot, YouBot, Kagibot, Cohere-AI, Timpibot and a few legacy variants. You toggle each bot individually or use a preset like Block AI training or EU AI Act Article 4(3) opt-out. The tool keeps Googlebot, Bingbot and other search crawlers untouched by default so your organic visibility stays intact.

How AI crawler blocking with robots.txt works

The Robots Exclusion Standard, originally drafted in 1994 and formalised in RFC 9309 in 2022, defines a plain-text file at /robots.txt on the root of a domain. The file lists one or more User-agent blocks, each followed by Allow and Disallow directives. When a crawler arrives, it fetches the file, finds the block that matches its user-agent name, and obeys the rules. The mechanism is voluntary: there is no technical enforcement, only a public convention that reputable crawler operators follow. The largest AI vendors publish their bot names and have committed to honouring robots.txt.
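The matching step described above can be sketched with Python's standard-library `urllib.robotparser`. The sample file, the `example.com` URLs and the helper name `can_fetch` are assumptions made for illustration, not output from this tool, and a real crawler may implement matching differently from this parser.

```python
from urllib.robotparser import RobotFileParser

# Illustrative robots.txt (an assumption, not this tool's output):
# one bot blocked site-wide, one blocked on a sub-path, everyone else allowed.
ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /

User-agent: ClaudeBot
Disallow: /archive/

User-agent: *
Allow: /
"""


def can_fetch(robots_txt: str, user_agent: str, url: str) -> bool:
    """Parse robots.txt text and report whether user_agent may fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    base = "https://example.com"  # placeholder domain
    for agent in ("GPTBot", "ClaudeBot", "Googlebot"):
        for path in ("/", "/archive/2020/", "/blog/post"):
            verdict = "allowed" if can_fetch(ROBOTS_TXT, agent, base + path) else "blocked"
            print(f"{agent:10} {path:15} {verdict}")
```

In this sketch GPTBot is blocked everywhere, ClaudeBot is blocked only under /archive/, and Googlebot falls through to the wildcard block and stays allowed.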

Pick the bots you want to block. Training crawlers are the obvious target if you want to opt out of being used as model fuel. In-product fetchers matter if you do not want your live pages summarised by other people's chatbots.
Decide on the path scope. Disallow: / blocks the entire site for that user-agent. You can also block sub-paths only, for example /blog/ or /archive/, and leave the rest crawlable.
Keep search engines crawling by not adding their user-agents to the block list. Googlebot, Bingbot, DuckDuckBot, and others should remain explicitly or implicitly allowed.
Add server-level enforcement when the legal or commercial stakes are high. The Nginx and Apache blocks generated by this tool reject the listed user-agents with HTTP 403 even if they ignore robots.txt.
Deploy and verify by uploading the file to your web root and fetching https://yourdomain.com/robots.txt. The impact panel suggests the exact lines to check.
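The steps above can be sketched as a small generator. This is a minimal illustration under stated assumptions, not the code this tool runs: the function names, the three-bot list and the exact shape of the Nginx and Apache rules are hypothetical, and the server snippets would need to be placed in the right context (an Nginx server block, or an Apache config or .htaccess with mod_rewrite enabled).

```python
import re

# Hypothetical short list for illustration; the tool's real table is larger.
BOTS = ["GPTBot", "ClaudeBot", "CCBot"]

_SAFE_NAME = re.compile(r"^[A-Za-z0-9-]+$")


def _checked(bots):
    """Reject names that are not plain letters, digits or hyphens."""
    for bot in bots:
        if not _SAFE_NAME.match(bot):
            raise ValueError(f"unexpected characters in bot name: {bot!r}")
    return list(bots)


def build_robots_txt(bots, paths=("/",), sitemap=None):
    """One User-agent block per bot, each with the same Disallow paths."""
    lines = []
    for bot in _checked(bots):
        lines.append(f"User-agent: {bot}")
        lines.extend(f"Disallow: {path}" for path in paths)
        lines.append("")  # blank line separates blocks
    if sitemap:
        lines.append(f"Sitemap: {sitemap}")
    return "\n".join(lines).rstrip("\n") + "\n"


def build_nginx_rule(bots):
    """Case-insensitive user-agent match that returns HTTP 403."""
    pattern = "|".join(_checked(bots))
    return f'if ($http_user_agent ~* "({pattern})") {{\n    return 403;\n}}\n'


def build_apache_rule(bots):
    """mod_rewrite equivalent: [NC] ignores case, [F] answers 403."""
    pattern = "|".join(_checked(bots))
    return (
        "RewriteEngine On\n"
        f"RewriteCond %{{HTTP_USER_AGENT}} ({pattern}) [NC]\n"
        "RewriteRule .* - [F,L]\n"
    )


if __name__ == "__main__":
    # Placeholder sitemap URL; search engines are simply left out of BOTS.
    print(build_robots_txt(BOTS, sitemap="https://example.com/sitemap.xml"))
    print(build_nginx_rule(BOTS))
    print(build_apache_rule(BOTS))
```

Passing `paths=["/blog/", "/archive/"]` instead of the default narrows the block to those sub-paths and leaves the rest of the site crawlable for the listed bots.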

Common use cases for blocking AI crawlers

Publisher protecting editorial work. If your business model depends on visits to your articles, having a language model that summarises everything you publish without sending traffic back is a long-term threat. Blocking training crawlers is the cleanest signal that your content is not free training material.
SaaS hiding paid documentation. Knowledge bases that sit behind a paywall or a login should not show up in scraped training corpora. Blocking GPTBot and CCBot reduces the risk that paying customers' answers leak into a public model.
EU rightholder exercising Article 4(3) opt-out. Article 4(3) of the 2019 EU Copyright Directive lets rightholders reserve their content from text and data mining through a machine-readable signal, and the EU AI Act requires providers of general-purpose AI models to respect that reservation. The EU AI Act opt-out preset on this page produces the user-agent blocks that match what major AI vendors have announced they will respect.
Brand keeping AI answers consistent. Some brands prefer that AI products reference their authoritative help centre through search rather than caching their content. Blocking in-product fetchers, while also keeping training crawlers out, is one signal that says: use my official channels.
Internal staging or low-quality content. A draft blog or staging environment should not leak into model training. Adding the AI bot block on top of a generic Disallow: / keeps things tidy even if the staging environment becomes public by accident.
Compliance with internal policy. Some organisations require a documented opt-out signal as part of their data governance. Even if enforcement is imperfect, having the file is part of meeting the policy.

Limitations and privacy notes

Blocking AI crawlers with robots.txt is the right first step, but it is not bulletproof. The file is a public request, not a technical barrier. Bots that ignore the standard, scrape through residential proxies, or download content via a non-AI intermediary (like Common Crawl on behalf of a downstream model) will still see your pages. Old training datasets that already contain your content are not affected by a future robots.txt change. Different AI vendors interpret the same user-agent name slightly differently, and some tokens, like Google-Extended, only opt the site out of AI training without removing it from Google Search.
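To make the limitation concrete, here is a rough sketch of the kind of user-agent match a server-level block performs. The bot list, the function name `status_for` and the sample header strings are assumptions for illustration, not real crawler headers or this tool's output.

```python
import re

# Hypothetical block list; names are matched case-insensitively anywhere
# in the User-Agent header, as a typical server-level rule would do.
BLOCKED_AGENTS = ("GPTBot", "ClaudeBot", "CCBot")
_PATTERN = re.compile("|".join(re.escape(name) for name in BLOCKED_AGENTS), re.IGNORECASE)


def status_for(user_agent_header: str) -> int:
    """Return 403 when the header names a blocked bot, otherwise 200."""
    return 403 if _PATTERN.search(user_agent_header or "") else 200


if __name__ == "__main__":
    # Illustrative header values, not copied from any vendor documentation.
    print(status_for("Mozilla/5.0 (compatible; GPTBot/1.0)"))        # self-identified bot
    print(status_for("Mozilla/5.0 (Windows NT 10.0; Win64; x64)"))   # browser-like header
```

A crawler that announces itself is rejected, while a scraper that sends a browser-like header passes the same check. That is why user-agent rules, in robots.txt or on the server, only cover bots that identify themselves honestly.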

This tool runs in your browser. The user-agent list, the sitemap URL, the custom paths and the existing robots.txt you paste are processed locally. Nothing is sent to PeopleAreGeek or to any third party. The bot reference table is shipped with the page; we update it when vendors publish new user-agents.

Frequently asked questions

Will blocking GPTBot hurt my Google ranking?

No. GPTBot is OpenAI's training crawler, separate from Googlebot. Blocking GPTBot only opts you out of being used in future OpenAI model training. Your Google Search ranking depends on Googlebot, which you should leave allowed. Google-Extended is a separate Google user-agent token that controls AI training opt-out without affecting search visibility.

Do AI vendors actually honour robots.txt?

The major reputable vendors do. OpenAI, Anthropic, Google, Apple, Perplexity, Meta and Common Crawl have all publicly committed to honouring the file. They have business reasons to comply: a reputation for ignoring robots.txt would invite litigation and breach contractual commitments to publishers. Smaller or anonymous scraper operators are a different story.

What is the difference between training crawlers and in-product fetchers?

Training crawlers gather pages to build datasets that future model versions learn from. They visit at scale, follow links, and respect crawl rates. In-product fetchers like ChatGPT-User or Perplexity-User fetch a single page on demand when a user asks an AI for the contents of a specific URL. Both can be blocked independently using their separate user-agent names.
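As a small illustration of blocking the two groups independently, the sketch below blocks only a training group and then checks an in-product fetcher against the result. The two groupings and the helper names are assumptions for this example, and the check uses Python's standard `urllib.robotparser`, which may not match every crawler's own matching logic.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical groupings for illustration; the tool's presets may differ.
TRAINING = ("GPTBot", "ClaudeBot", "CCBot")
IN_PRODUCT = ("ChatGPT-User", "Perplexity-User")


def block_all(bots):
    """Build a site-wide Disallow block for each named bot."""
    return "".join(f"User-agent: {bot}\nDisallow: /\n\n" for bot in bots)


def allowed(robots_txt, agent, path="/"):
    """Check one agent against the generated rules."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, path)


if __name__ == "__main__":
    training_only = block_all(TRAINING)
    for agent in TRAINING + IN_PRODUCT:
        state = "allowed" if allowed(training_only, agent) else "blocked"
        print(f"{agent}: {state}")
```

With only the training group listed, the in-product fetchers have no matching block and stay allowed. Passing `TRAINING + IN_PRODUCT` to `block_all` blocks both groups.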

What is the EU AI Act Article 4(3) opt-out about?

The EU AI Act incorporates the text-and-data-mining opt-out from Article 4(3) of the 2019 EU Copyright Directive. Rightholders can express their opt-out in a machine-readable way, and robots.txt user-agent blocks are the most widely supported expression. Major AI vendors that train models on EU data have committed to honouring user-agent-based opt-outs as one valid signal.

Should I also block Common Crawl (CCBot)?

If your goal is to opt out of AI training, yes. Common Crawl is a free public dataset that many open-source AI projects use as a base. Blocking CCBot prevents your content from ending up in that dataset. Common Crawl honours robots.txt and excludes blocked sites from future crawls.