Robots.txt generator, live compare, path tester and WordPress crawl rule planner
Honestly, robots.txt trips people up more than it should. This builds you a clean draft for WordPress, public sites, ecommerce, staging, or AI crawler rules. Then it does the part most generators skip: it tests your real paths against that draft, pulls down the live file you already have, and shows you exactly what would change. You get sitemap lines and crawl warnings in an output you can actually hand to someone for review.
Quick reminder: robots.txt controls crawling. It doesn’t keep anything secret, and it can’t guarantee that a URL gets indexed or stays out of the index. After you publish at the root of the exact host, go test the live behavior.
A robots.txt generator should produce a file you can audit
The file is tiny. The blast radius is not. One stray Disallow: / on a live domain and you’ve told crawlers to leave everything alone. Forget the sitemap line after a migration and discovery just crawls (sorry) along slower than it should. On WordPress the job is usually pretty boring, which is good: keep admin paths out, let the AJAX endpoint through because some plugins genuinely need it, leave the public stuff open, and hand search engines your sitemap.
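To make that boring WordPress pattern concrete, here is a minimal Python sketch that builds the draft as text. The function name wordpress_robots and the sitemap URL are made up for illustration; your real sitemap location depends on your setup, so check it before copying anything.

```python
def wordpress_robots(sitemap_url: str) -> str:
    """Build a minimal WordPress-style robots.txt draft as text.

    Hypothetical helper: the sitemap URL is an assumption you must verify.
    """
    lines = [
        "User-agent: *",
        "Disallow: /wp-admin/",             # keep admin paths out
        "Allow: /wp-admin/admin-ajax.php",  # some plugins need this endpoint
        "",
        f"Sitemap: {sitemap_url}",          # absolute URL
    ]
    return "\n".join(lines) + "\n"


draft = wordpress_robots("https://example.com/sitemap.xml")
print(draft)
```

Nothing else is disallowed here, so the public stuff stays open by default.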
That boring workflow is the whole point of this thing. It spins up templates for WordPress, generic public sites, ecommerce sections, staging blocks, and AI crawler controls. It runs your important paths through the generated rules and flags the usual crawl mistakes. It’ll also grab the live robots.txt you’ve got right now so you can see what actually changes before you upload anything or go poking around in plugin settings.
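If you want a feel for what running paths through the rules means, here is a rough Python sketch. It is not this tool's actual code. It assumes the longest-match rule that major search engines document (the most specific matching pattern wins, and Allow wins a tie), and it only reads groups that name the given user agent exactly. Percent-encoding and fancier agent matching are left out, and parse_rules, pattern_matches, and is_allowed are hypothetical names.

```python
import re


def parse_rules(robots_txt: str, agent: str = "*") -> list[tuple[str, str]]:
    """Collect (directive, pattern) pairs from groups naming `agent` exactly."""
    rules, group_agents, in_rules = [], [], False
    for raw in robots_txt.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments
        if ":" not in line:
            continue
        key, value = (part.strip() for part in line.split(":", 1))
        key = key.lower()
        if key == "user-agent":
            if in_rules:  # a User-agent line after rules starts a new group
                group_agents, in_rules = [], False
            group_agents.append(value.lower())
        elif key in ("allow", "disallow"):
            in_rules = True
            if agent.lower() in group_agents and value:
                rules.append((key, value))
    return rules


def pattern_matches(pattern: str, path: str) -> bool:
    """Prefix match with `*` as a wildcard and a trailing `$` as an end anchor."""
    anchored = pattern.endswith("$")
    body = pattern[:-1] if anchored else pattern
    regex = ".*".join(re.escape(piece) for piece in body.split("*"))
    if anchored:
        regex += "$"
    return re.match(regex, path) is not None


def is_allowed(rules: list[tuple[str, str]], path: str) -> bool:
    """Longest matching pattern wins; Allow wins a tie; no match means allowed."""
    best = ("allow", "")
    for directive, pattern in rules:
        if not pattern_matches(pattern, path):
            continue
        longer = len(pattern) > len(best[1])
        tie_allow = len(pattern) == len(best[1]) and directive == "allow"
        if longer or tie_allow:
            best = (directive, pattern)
    return best[0] == "allow"


DRAFT = "User-agent: *\nDisallow: /wp-admin/\nAllow: /wp-admin/admin-ajax.php\n"
rules = parse_rules(DRAFT)
for path in ("/wp-admin/admin-ajax.php", "/wp-admin/options.php", "/blog/hello-world/"):
    print(path, "allowed" if is_allowed(rules, path) else "blocked")
```

One caveat: not every parser resolves conflicts the same way. Some apply the first matching rule in file order, so treat any local check as a hint and confirm against the crawler you actually care about.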
How to use generated robots.txt safely
Drop the file at the root of the host, so https://example.com/robots.txt, or set it through whichever SEO plugin or hosting layer owns your virtual robots output. Then pull the live file back down and check the stuff that matters: public pages, admin areas, search pages, sitemap URLs, anything Search Console keeps reporting as blocked. One thing that bites people: these rules are host-specific. If both www and non-www resolve, you have to check both. Yes, both.
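A tiny Python sketch of the "check both" habit: given any full URL on your site, list the robots.txt locations for the www and non-www hosts. The helper name robots_urls_to_check is hypothetical, and the sketch assumes you pass a URL with a scheme and no custom port.

```python
from urllib.parse import urlsplit


def robots_urls_to_check(site_url: str) -> list[str]:
    """Return the robots.txt URLs for the bare and www variants of a host."""
    parts = urlsplit(site_url)
    host = parts.hostname or ""
    bare = host[4:] if host.startswith("www.") else host
    return [f"{parts.scheme}://{h}/robots.txt" for h in (bare, f"www.{bare}")]


print(robots_urls_to_check("https://www.example.com/shop/"))
```

If one variant simply redirects to the other, you still want to know that, because it tells you which file crawlers end up reading.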
- Use robots.txt for crawl access. It was never meant to hide private content.
- Use noindex on a fetchable page when what you really want is the page gone from results.
- Keep Sitemap lines as absolute URLs and, you know, actually current.
- Do not block CSS or JavaScript that your public pages need to render. Crawlers see a broken page otherwise.
- Compare the live file again after any cache purge, plugin, or hosting change. That’s usually when surprises sneak in.
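For that last point, comparing the live file with a draft can be as simple as a line diff. Here is a sketch using Python's standard difflib. It assumes you already have both files as text (fetching the live one is left out), and robots_diff is a hypothetical helper name.

```python
import difflib


def robots_diff(live_text: str, draft_text: str) -> list[str]:
    """Return unified-diff lines showing what the draft would change."""
    return list(
        difflib.unified_diff(
            live_text.splitlines(),
            draft_text.splitlines(),
            fromfile="live robots.txt",
            tofile="draft robots.txt",
            lineterm="",
        )
    )


live = "User-agent: *\nDisallow: /\n"
draft_text = "User-agent: *\nDisallow: /wp-admin/\n"
for line in robots_diff(live, draft_text):
    print(line)
```

An empty result means the two files match line for line, which is exactly what you want to see after a cache purge or a plugin change.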
Common robots.txt mistakes
The classic disaster is shipping a staging block to production and quietly blocking the entire site. Happens more than anyone admits. Blocking /wp-content/ is sneakier, because the site looks fine to you while crawlers can’t fetch the CSS, JavaScript, and images they need to render it. And here’s the one people forget: if a page needs a noindex signal but you’ve blocked crawling, crawlers never fetch the page, so they never read that signal. Oh, and robots.txt is public, so those “secret” folder names you disallowed? Now they’re a published list. Honestly the safest file is just short, deliberate, and tested against paths you actually care about.
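A few of these mistakes are easy to catch mechanically. The Python sketch below is a deliberately small linter, not a full parser: it flags a whole-site block, a /wp-content block, a Sitemap line that is not an absolute URL, and a missing Sitemap line. The function name lint_robots and the warning wording are hypothetical.

```python
def lint_robots(robots_txt: str) -> list[str]:
    """Return human-readable warnings for a few common robots.txt mistakes."""
    warnings = []
    has_sitemap = False
    for raw in robots_txt.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments
        if ":" not in line:
            continue
        key, value = (part.strip() for part in line.split(":", 1))
        key = key.lower()
        if key == "disallow" and value == "/":
            warnings.append("Disallow: / blocks the whole site for that group")
        elif key == "disallow" and value.startswith("/wp-content"):
            warnings.append(f"Disallow: {value} may block assets needed for rendering")
        elif key == "sitemap":
            has_sitemap = True
            if not value.lower().startswith(("http://", "https://")):
                warnings.append(f"Sitemap: {value} is not an absolute URL")
    if not has_sitemap:
        warnings.append("no Sitemap line found")
    return warnings


staging = "User-agent: *\nDisallow: /\n"
for warning in lint_robots(staging):
    print(warning)
```

A whole-site block is sometimes exactly what you want on staging, so read these as prompts to double-check rather than as errors.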
Common questions
Does robots.txt guarantee that a URL will not be indexed?
Nope. All it controls is crawling. If you actually want a URL out of the index, reach for noindex on a fetchable page, a redirect, a canonical cleanup, or the right status code (404 or 410 for something that’s really gone). Depends what you’re trying to do.
Should every WordPress site block wp-admin?
Most public ones do. The common pattern is blocking /wp-admin/ while leaving /wp-admin/admin-ajax.php open. Just test your theme and plugins once you’ve changed it, since some of them lean on that endpoint.
Can AI crawler rules go in robots.txt?
They can. A fair number of AI crawlers do read user-agent groups in robots.txt. But I’d treat that as stating a preference out loud, not as a wall. Anything that ignores the rules will just ignore them, so don’t mistake it for a security boundary.
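As a sketch of what those user-agent groups look like, this Python snippet writes one group per crawler token. GPTBot, CCBot, and Google-Extended are used as example tokens only; tokens change over time, so verify the current ones in each operator's own documentation. The helper name ai_crawler_groups is hypothetical.

```python
def ai_crawler_groups(agents: list[str], rule: str = "Disallow: /") -> str:
    """Write one user-agent group per crawler token, each with the same rule."""
    groups = [f"User-agent: {agent}\n{rule}" for agent in agents]
    return "\n\n".join(groups) + "\n"


# Example tokens only; confirm them before publishing.
print(ai_crawler_groups(["GPTBot", "CCBot", "Google-Extended"]))
```

These groups sit alongside your general `User-agent: *` group and only speak to crawlers that choose to read them.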
What should a basic robots.txt contain?
Not much, really. A User-agent line, whatever Disallow or Allow rules you need, then a Sitemap line pointing at your XML sitemap. For a lot of sites the honest default is: allow everything, add the sitemap line, done.
Does disallowing a path hide it from Google?
No, and this one surprises people. Disallow blocks crawling, sure, but the URL can still show up in results if other pages link to it, just without a snippet because Google never got to read the page. Want it truly gone? Do the opposite of what feels right: allow crawling, and put a noindex directive on the page so crawlers can actually read it.
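For reference, noindex usually travels as a robots meta tag in the page (content="noindex") or as an X-Robots-Tag: noindex response header. The Python sketch below flags the conflict described above: paths you plan to noindex that your Disallow rules would stop crawlers from fetching. It uses plain prefix matching as a simplification, and unreadable_noindex is a hypothetical name.

```python
def unreadable_noindex(disallow_prefixes: list[str], noindex_paths: list[str]) -> list[str]:
    """Return noindex paths that a Disallow prefix would keep crawlers from reading."""
    return [
        path
        for path in noindex_paths
        if any(path.startswith(prefix) for prefix in disallow_prefixes)
    ]


blocked = unreadable_noindex(["/search/"], ["/search/old-results", "/thank-you/"])
print(blocked)
```

Anything this returns needs a decision: lift the Disallow so the noindex can be read, or accept that the URL may linger in results.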
Where do I put robots.txt?
Root of each host, exactly at /robots.txt. Tuck it in a subfolder and crawlers just ignore it. And every subdomain counts as its own host here, so each one needs its own file. The shop and the blog on separate subdomains? Two files.
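Here is a short Python sketch of that rule: work out which robots.txt governs a given page, and check whether a candidate location is actually at the root. Both helper names and the example hosts are hypothetical.

```python
from urllib.parse import urlsplit


def robots_url_for(page_url: str) -> str:
    """Return the robots.txt URL that applies to the host serving this page."""
    parts = urlsplit(page_url)
    return f"{parts.scheme}://{parts.netloc}/robots.txt"


def is_root_robots(url: str) -> bool:
    """True only when the file sits exactly at /robots.txt on its host."""
    return urlsplit(url).path == "/robots.txt"


pages = ["https://shop.example.com/cart/", "https://blog.example.com/2024/post/"]
files = sorted({robots_url_for(page) for page in pages})
print(files)
print(is_root_robots("https://example.com/blog/robots.txt"))
```

Two subdomains in, two robots.txt URLs out, and the subfolder copy fails the root check.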













