Robots.txt Generator

Generate a robots.txt draft, test your real paths, compare the live file, and review crawl warnings before you publish.

This robots.txt generator builds a clean draft for WordPress, generic public sites, ecommerce sections, staging blocks, or AI crawler controls, then does the part most generators skip. It runs your real paths through the rules you wrote, so you can see which URLs end up allowed or blocked and which directive wins. It fetches the live robots.txt you already serve and diffs it against the draft, so nothing changes by surprise. It builds the sitemap line, flags the usual mistakes like a stray full-site Disallow or a blocked assets folder, and hands you an install checklist plus a copyable report. Drop the file at the root of each host, then fetch it back and test the paths that matter before you trust it.

Queries run through the PeopleAreGeek lookup service. We log nothing.

Robots.txt generator, live compare, path tester and WordPress crawl rule planner

Honestly, robots.txt trips people up more than it should. This builds you a clean draft for WordPress, public sites, ecommerce, staging, or AI crawler rules. Then it does the part most generators skip: it tests your real paths against that draft, pulls down the live file you already have, and shows you exactly what would change. You get sitemap lines and crawl warnings in an output you can actually hand to someone for review.

Site URL

Template

Main user-agent

Sitemap URL or path

Crawl-delay

Live compare host

Disallow paths

Allow paths

Extra groups or comments

Paths to test against generated rules

Quick reminder: robots.txt controls crawling. It doesn't keep anything secret, and it neither guarantees nor prevents indexing. After you publish at the root of the exact host, go test the live behavior.

A robots.txt generator should produce a file you can audit

This robots.txt generator builds a draft you can read, test, and defend before it ever reaches a live host. The file is tiny. The blast radius is not. One stray Disallow: / on a live domain and you've told crawlers to leave everything alone. Forget the sitemap line after a migration and discovery just crawls along slower than it should. On WordPress the job is usually pretty boring, which is good: keep admin paths out, let the AJAX endpoint through because some plugins genuinely need it, leave the public stuff open, and hand search engines your sitemap.
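As a rough illustration of that boring WordPress pattern, here is a minimal Python sketch that assembles such a draft. The function name build_wordpress_robots and the sitemap URL are assumptions made up for this example. They do not describe how the tool itself is implemented.

```python
def build_wordpress_robots(sitemap_url: str) -> str:
    """Return a minimal WordPress-style robots.txt draft as text.

    Hypothetical helper for illustration only.
    """
    lines = [
        "User-agent: *",
        "Disallow: /wp-admin/",             # keep admin paths out
        "Allow: /wp-admin/admin-ajax.php",  # some plugins need this endpoint
        "",
        f"Sitemap: {sitemap_url}",          # absolute URL of the XML sitemap
    ]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # The sitemap location is an assumed example; use your site's real one.
    print(build_wordpress_robots("https://example.com/sitemap.xml"), end="")
```

Keeping the draft this short makes it easy to read line by line before it goes anywhere near a live host.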

That boring workflow is the whole point. The generator spins up templates for WordPress, generic public sites, ecommerce sections, staging blocks, and AI crawler controls. It runs your important paths through the generated rules and flags the usual crawl mistakes. It'll also grab the live robots.txt you've got right now so you can see what actually changes before you upload anything.
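To make "which directive wins" concrete, here is a simplified Python sketch of longest-match precedence as described in the Robots Exclusion Protocol (RFC 9309): the matching rule with the longest pattern wins, and Allow wins a tie. The names check_path and _pattern_to_regex are hypothetical. The sketch skips user-agent group selection and percent-encoding normalization, so treat it as an approximation, not as the tool's actual matcher.

```python
import re


def _pattern_to_regex(pattern: str) -> re.Pattern:
    """Translate a robots.txt path pattern into a regex.

    '*' matches any run of characters; a trailing '$' anchors the end.
    """
    anchored = pattern.endswith("$")
    body = pattern[:-1] if anchored else pattern
    regex = ".*".join(re.escape(part) for part in body.split("*"))
    return re.compile(regex + ("$" if anchored else ""))


def check_path(rules, path):
    """Decide whether `path` is allowed under one group's rules.

    rules: list of (directive, pattern), directive is 'allow' or 'disallow'.
    Returns (allowed, winning_rule); winning_rule is None if nothing matched.
    """
    best = None
    for directive, pattern in rules:
        if not pattern:  # an empty pattern matches nothing
            continue
        if _pattern_to_regex(pattern).match(path):
            # Longer pattern is more specific; on a tie, allow beats disallow.
            candidate = (len(pattern), directive == "allow", directive, pattern)
            if best is None or candidate[:2] > best[:2]:
                best = candidate
    if best is None:
        return True, None  # no rule matched, so crawling is allowed
    return best[2] == "allow", (best[2], best[3])


if __name__ == "__main__":
    rules = [("disallow", "/wp-admin/"), ("allow", "/wp-admin/admin-ajax.php")]
    for path in ("/wp-admin/admin-ajax.php", "/wp-admin/options.php", "/blog/"):
        print(path, check_path(rules, path))
```

Returning the winning rule alongside the verdict is what makes a result auditable: you can see not just that a path is blocked, but which line blocked it.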

How to use generated robots.txt safely

Drop the file at the root of the host, so https://example.com/robots.txt, or set it through whichever SEO plugin or hosting layer owns your virtual robots output. Then pull the live file back down and poke at the stuff that matters: public pages, admin areas, search pages, sitemap URLs, anything Search Console keeps yelling is blocked. One thing that bites people: these rules are host-specific. If both www and non-www resolve, you have to check both. Yes, both.
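Once you have pulled the live file back down, comparing it with the draft can be sketched with Python's standard difflib module. The function name diff_robots is an assumption for this example, and fetching the live text is left out on purpose. Run the comparison once per host if both www and non-www resolve.

```python
import difflib


def diff_robots(live_text: str, draft_text: str) -> list[str]:
    """Return unified-diff lines showing what the draft would change.

    Hypothetical helper; an empty list means the two files are identical.
    """
    return list(
        difflib.unified_diff(
            live_text.splitlines(),
            draft_text.splitlines(),
            fromfile="live",
            tofile="draft",
            lineterm="",
        )
    )


if __name__ == "__main__":
    # Both texts are made-up examples, not output from a real host.
    live = "User-agent: *\nDisallow: /\n"
    draft = "User-agent: *\nDisallow: /wp-admin/\n"
    print("\n".join(diff_robots(live, draft)))
```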

Use robots.txt for crawl access. It was never meant to hide private content.
Use noindex on a fetchable page when what you really want is the page gone from results.
Keep sitemap lines absolute and actually current.
Do not block CSS or JavaScript that your public pages need to render. Crawlers see a broken page otherwise.
Compare the live file again after any cache purge, plugin, or hosting change. That's usually when surprises sneak in.

Common robots.txt mistakes

The classic disaster is shipping a staging block to production and quietly blocking the entire site. Happens more than anyone admits. Blocking /wp-content/ is sneakier, because the site looks fine to you while crawlers can't fetch the CSS, JavaScript, and images they need to render it. And here's the one people forget: if a page needs a noindex signal but you've blocked crawling, crawlers never fetch the page, so they never read that signal. Robots.txt is public too, so those "secret" folder names you disallowed are now a published list. Honestly the safest file is just short, deliberate, and tested against paths you actually care about.
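A few of these mistakes are mechanical enough to check in code. The following Python sketch flags a full-site Disallow, a blocked /wp-content/ folder, and a missing or relative Sitemap line. The function name lint_robots and the warning wording are assumptions for illustration, and the checks are deliberately narrow. They are not the tool's actual warning logic.

```python
def lint_robots(text: str) -> list[str]:
    """Return human-readable warnings for a few common robots.txt mistakes."""
    warnings = []
    has_sitemap = False
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and whitespace
        if ":" not in line:
            continue
        field, value = (part.strip() for part in line.split(":", 1))
        field = field.lower()
        if field == "disallow" and value == "/":
            warnings.append("Disallow: / blocks the whole site for this group")
        elif field == "disallow" and value.startswith("/wp-content"):
            warnings.append("Blocking /wp-content can hide assets pages need to render")
        elif field == "sitemap":
            has_sitemap = True
            if not value.lower().startswith(("http://", "https://")):
                warnings.append("Sitemap line is not an absolute URL: " + value)
    if not has_sitemap:
        warnings.append("No Sitemap line found")
    return warnings


if __name__ == "__main__":
    # A staging-style file that should never reach production.
    for message in lint_robots("User-agent: *\nDisallow: /\n"):
        print(message)
```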

Frequently asked questions

Does robots.txt guarantee that a URL will not be indexed?

Nope. All it controls is crawling, and a blocked URL can still show up in results if other pages link to it. If you actually want a URL out of the index, reach for noindex on a fetchable page, or redirects, or a canonical cleanup, or the right status code. Which one depends on what you are trying to do.

Should every WordPress site block wp-admin?

Most public ones do. The common pattern is blocking /wp-admin/ while leaving /wp-admin/admin-ajax.php open. Just test your theme and plugins once you have changed it, since some of them lean on that endpoint.

Can AI crawler rules go in robots.txt?

They can. A fair number of AI crawlers do read user-agent groups in robots.txt. But treat that as stating a preference out loud, not as a wall. Anything that ignores the rules will just ignore them, so do not mistake it for a security boundary.
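For illustration, per-crawler groups could be generated with a small Python helper like the one below. The function name build_agent_groups is hypothetical, and the two user-agent tokens are only examples. Check each crawler's own published documentation for the token it currently honors.

```python
def build_agent_groups(agents, disallow="/"):
    """Build one robots.txt group per user-agent token.

    Hypothetical helper; every group gets the same single Disallow rule.
    """
    blocks = [f"User-agent: {agent}\nDisallow: {disallow}" for agent in agents]
    return "\n\n".join(blocks) + "\n"


if __name__ == "__main__":
    # Example tokens only; verify them against each crawler's documentation.
    print(build_agent_groups(["GPTBot", "CCBot"]), end="")
```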

What should a basic robots.txt contain?

Not much, really. A User-agent line, whatever Disallow or Allow rules you need, then a Sitemap line pointing at your XML sitemap. For a lot of sites the honest default is: allow everything, add the sitemap line, done.

Where do I put robots.txt?

Root of each host, exactly at /robots.txt. Tuck it in a subfolder and crawlers just ignore it. And every subdomain counts as its own host here, so each one needs its own file. The shop and the blog on separate subdomains? Two files.
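The one-file-per-host rule can be sketched in Python with the standard urllib.parse module: given any page URL, the robots.txt that applies to it lives at the root of that same scheme and host. The function name robots_url_for and the example subdomains are assumptions for illustration.

```python
from urllib.parse import urlsplit, urlunsplit


def robots_url_for(page_url: str) -> str:
    """Return the robots.txt URL that governs the given page URL."""
    parts = urlsplit(page_url)
    # Keep scheme and host (including any port); drop path, query, fragment.
    return urlunsplit((parts.scheme, parts.netloc, "/robots.txt", "", ""))


if __name__ == "__main__":
    # Hypothetical shop and blog subdomains: two hosts, two files.
    print(robots_url_for("https://shop.example.com/cart/item?id=3"))
    print(robots_url_for("https://blog.example.com/2024/post"))
```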