Index control and preview directive audit
Paste a public URL. This reads the robots meta tags and the X-Robots-Tag header, lines up the generic directives against the crawler-specific ones, and keeps index control separate from snippet control (people blur those two constantly). Non-HTML files, mostly PDFs, get reviewed too, through their response headers.
Meta tags only live in HTML. The X-Robots-Tag header works on any response, HTML or not, which makes it the only way to reach files that can't hold a meta tag in the first place.
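A rough sketch of the meta-tag half of that read, using Python's standard-library `html.parser`. The `robots_meta()` helper and the fixed set of names it collects (`robots`, `googlebot`, `bingbot`) are assumptions for illustration, not how this checker is actually built; fetching the page is left out.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collects <meta name="..."> robots-style tags, keyed by lowercased name.

    Assumption: only the names in ROBOTS_NAMES matter here; a real checker
    might accept other crawler names too.
    """

    ROBOTS_NAMES = {"robots", "googlebot", "bingbot"}

    def __init__(self) -> None:
        super().__init__()
        self.tags: dict[str, list[str]] = {}

    def handle_starttag(self, tag, attrs):
        # Self-closing <meta ... /> also arrives here via handle_startendtag.
        if tag != "meta":
            return
        attr_map = {k.lower(): (v or "") for k, v in attrs}
        name = attr_map.get("name", "").strip().lower()
        if name in self.ROBOTS_NAMES:
            # A page can repeat the same meta name; keep every occurrence.
            self.tags.setdefault(name, []).append(attr_map.get("content", ""))


def robots_meta(html: str) -> dict[str, list[str]]:
    """Return robots-style meta contents found in an HTML document."""
    parser = RobotsMetaParser()
    parser.feed(html)
    parser.close()
    return parser.tags
```

Name matching is case-insensitive because crawlers treat `GoogleBot` and `googlebot` the same; the content strings are returned untouched so a later step can split them.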
What robots meta and X-Robots-Tag actually decide
Most people flatten robots directives down to one word. Indexable, or blocked. That's too coarse for a real audit. A noindex signal decides whether a fetched URL is allowed to show up in results at all. nofollow is a different thing entirely: it's about how links get treated. And the preview directives, max-snippet or max-image-preview or nosnippet, only govern how much a search engine shows around your result. Three separate decisions wearing the same coat.
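Those three decisions can be kept apart mechanically. This is a minimal sketch that buckets a comma-separated directive string; the `classify()` name and the bucket sets are illustrative assumptions and cover only common directives, not everything any engine supports.

```python
# Assumption: common directives only, not an exhaustive vendor list.
INDEX_CONTROL = {"index", "noindex", "none", "all"}
LINK_CONTROL = {"follow", "nofollow", "none", "all"}
PREVIEW_CONTROL = {"nosnippet", "max-snippet", "max-image-preview", "max-video-preview"}


def classify(directives: str) -> dict[str, list[str]]:
    """Split a robots directive string into index, link and preview decisions."""
    buckets: dict[str, list[str]] = {"index": [], "links": [], "preview": [], "other": []}
    # Naive comma split: a date inside unavailable_after could contain a comma.
    for raw in directives.split(","):
        token = raw.strip().lower()
        if not token:
            continue
        # "max-snippet:50" is keyed by the part before the colon.
        name = token.split(":", 1)[0].strip()
        hits = [
            bucket
            for bucket, names in (
                ("index", INDEX_CONTROL),
                ("links", LINK_CONTROL),
                ("preview", PREVIEW_CONTROL),
            )
            if name in names
        ]
        # "none" and "all" land in both index and links; unknown names go to "other".
        for bucket in hits or ["other"]:
            buckets[bucket].append(token)
    return buckets
```

Running it on `"noindex, follow, max-snippet:50"` puts one token in each of the three buckets, which is the whole point: one string, three independent answers.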
So this checker reads the generic robots meta tag, then the crawler-specific ones if it can actually pull the HTML, plus the X-Robots-Tag header. On WordPress that combination bites people. Your SEO plugin writes page-level tags into the HTML, and meanwhile the host or a CDN quietly bolts on a header you never asked for. For a PDF or an image it’s worse, or rather, simpler in a frustrating way: headers are basically the only knob you’ve got.
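To see why that combination bites, here is a sketch of how the layers might be merged for one crawler. It assumes restrictive rules from every applicable source simply add up (roughly how Google describes conflicting rules: the more restrictive one wins). The key `"robots"` stands for "every crawler" in both sources, so an un-prefixed X-Robots-Tag value is filed under it too; `effective_directives()` and `is_indexable()` are hypothetical helpers.

```python
def effective_directives(crawler: str, *sources: dict[str, set[str]]) -> set[str]:
    """Merge the directives that apply to one crawler.

    Each source (meta tags, X-Robots-Tag header, ...) maps a target to a set of
    lowercase directive tokens. "robots" means every crawler; any other key
    names a single crawler. Assumption: restrictive rules accumulate.
    """
    crawler = crawler.lower()
    merged: set[str] = set()
    for source in sources:
        merged |= source.get("robots", set())
        merged |= source.get(crawler, set())
    # Expand the shorthand so later checks only look for one spelling.
    if "none" in merged:
        merged |= {"noindex", "nofollow"}
    return merged


def is_indexable(directives: set[str]) -> bool:
    # No directives at all means the defaults apply: index, follow.
    # If both "index" and "noindex" are present, the restrictive one wins.
    return "noindex" not in directives
```

With a plugin writing `googlebot: noindex` into the HTML and a CDN adding `nofollow` in the header, Googlebot ends up with both, while Bingbot only inherits the header's `nofollow` and stays indexable.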
Robots meta is not robots.txt
Here's the trap. A robots.txt rule gates crawl access to a path. A page-level directive, though, has to be fetched first: a crawler can't read a tag on a page it was told never to open. So if you block a URL in robots.txt and then slap a noindex on it, that noindex might never get seen. Honestly, that's the mistake I see most. Moved something? Read the redirects. Got duplicate public content floating around? Read the canonicals. And for anything you genuinely want kept out of search, put the index-control directive somewhere a crawler can actually reach it.
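That trap is easy to flag automatically. A minimal sketch using Python's standard `urllib.robotparser`, with a hypothetical `unreachable_noindex()` helper that takes the robots.txt text and the directives you already found. Note that the stdlib parser matches paths more simply than Google does (no `*` or `$` wildcards), so treat its verdict as a first pass.

```python
from urllib.robotparser import RobotFileParser


def unreachable_noindex(robots_txt: str, url: str, directives: set[str],
                        user_agent: str = "Googlebot") -> bool:
    """True when a noindex exists but robots.txt stops the crawler reaching it."""
    parser = RobotFileParser()
    # parse() takes an iterable of lines and marks the file as read.
    parser.parse(robots_txt.splitlines())
    blocked = not parser.can_fetch(user_agent, url)
    return blocked and "noindex" in directives
```

A `True` here means the noindex is decorative: either lift the robots.txt block so the directive can be read, or accept that the URL may still appear without a crawled snippet.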
- robots is your generic HTML meta directive, the catch-all every crawler reads.
- googlebot and bingbot let you override that catch-all for one named crawler when you need to.
- X-Robots-Tag rides in the HTTP headers, and it’s what saves you on non-HTML responses.
- Snippet directives trim the preview without necessarily pulling the URL out of search. Worth remembering.
- The Expected outcome setting tells the tool a deliberate noindex isn't a bug, so it won't grade your private page like a screwup.
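The Expected outcome idea boils down to grading against intent rather than against a fixed rule. Here is a sketch under stated assumptions: the `Expected` enum, the `grade()` function and its result strings are hypothetical, not the tool's real labels.

```python
from enum import Enum


class Expected(Enum):
    INDEXABLE = "indexable"
    NOINDEX = "noindex"


def grade(expected: Expected, directives: set[str]) -> str:
    """Compare the effective index state against what the site owner intended."""
    noindexed = bool({"noindex", "none"} & directives)
    if expected is Expected.NOINDEX:
        # A deliberate noindex is the goal here, not a defect.
        return "ok" if noindexed else "missing noindex"
    # Preview directives such as nosnippet do not count against indexability.
    return "unexpected noindex" if noindexed else "ok"
```

Notice that `nosnippet` alone never trips the indexable check: it trims the preview, it doesn't pull the URL.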
A practical robots directive workflow
- Check the exact public URL that actually showed up, the one from the sitemap or the Search Console report or wherever the ticket pointed you.
- Read the response status and the content type first. Before you touch the tags.
- Put the generic meta, the crawler-specific tags and the X-Robots-Tag header side by side and compare them.
- When the signals disagree, pair your noindex finding with canonical, redirect and robots.txt checks. Don’t trust one in isolation.
- Retest after anything changes: theme, SEO plugin, cache, CDN, a tweaked server header.
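The "status and content type first" step decides where directives can even live. A small sketch, assuming a hypothetical `where_to_look()` helper and the simplifying rule that only 2xx responses get a tag audit (redirects and errors go to the redirect and robots.txt checks instead).

```python
def where_to_look(status: int, content_type: str) -> list[str]:
    """List the directive sources that can exist on this response."""
    if not 200 <= status < 300:
        # Assumption: audit the redirect target or fix the error before reading tags.
        return []
    # Drop parameters such as "; charset=utf-8" before comparing.
    media_type = content_type.split(";", 1)[0].strip().lower()
    sources = ["x-robots-tag"]  # any response can carry the header
    if media_type in ("text/html", "application/xhtml+xml"):
        sources.append("meta")  # only HTML has a <head> for meta tags
    return sources
```

A PDF comes back with only `x-robots-tag` on the list, which is exactly why the header check matters for those files.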
Common questions
Is nofollow the same as noindex?
No, and the mix-up costs people. Noindex is about whether the URL shows up in results. Nofollow just changes how links get handled. Blur the two and you’ll spend a week chasing the wrong reason a page isn’t performing.
Why check X-Robots-Tag on a PDF?
Because a PDF has no HTML head. There’s nowhere to drop a robots meta tag. The response header is the only spot left to apply indexing or preview controls to that file, so that’s where you look.
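Reading that header takes a little care, because a value can start with a crawler name (`googlebot: noindex`) and some directives carry their own colon (`max-snippet: 50`). This sketch assumes at most one user-agent prefix per header value, at the very start; `parse_x_robots_tag()` and `VALUE_DIRECTIVES` are hypothetical names. Un-prefixed values are filed under `"robots"`, meaning every crawler.

```python
# Directives whose own syntax includes a colon, so "name:" is not a crawler prefix.
VALUE_DIRECTIVES = {"max-snippet", "max-image-preview", "max-video-preview", "unavailable_after"}


def parse_x_robots_tag(values: list[str]) -> dict[str, set[str]]:
    """Group X-Robots-Tag header values by target crawler."""
    rules: dict[str, set[str]] = {}
    for value in values:
        target = "robots"
        head, sep, rest = value.partition(":")
        name = head.strip().lower()
        # "googlebot: nosnippet" has a prefix; "noindex, max-snippet: 50" does not.
        if sep and name not in VALUE_DIRECTIVES and "," not in head:
            target, value = name, rest
        for token in value.split(","):
            token = token.strip().lower()
            if token:
                rules.setdefault(target, set()).add(token)
    return rules
```

With `urllib.request`, the raw values come from `response.headers.get_all("X-Robots-Tag") or []`, since a server may send the header more than once.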
Does a missing robots meta tag mean a public page is broken?
No. A page indexes just fine with no robots meta tag at all; that's the default (index, follow). What actually breaks things is a restrictive directive you didn't expect, control layers fighting each other, or a response crawlers can't fetch or read.
What is the difference between the robots meta tag and robots.txt?
robots.txt handles crawling, site-wide: who's allowed to fetch what. The robots meta tag (and the X-Robots-Tag header) handles indexing, page by page. Catch is, the page has to be crawlable for anyone to read its noindex in the first place.
What does noindex, follow mean?
Keep this page out of the index, but still crawl its links so the targets get found and link value keeps flowing through. Since follow is the default, a plain noindex behaves the same way; spelling out follow just makes the intent explicit. You'll see it a lot on paginated or thin pages, the ones you don't want ranking themselves but still want passing equity onward.
Can I set robots directives in an HTTP header?
Yep. The X-Robots-Tag header takes the same directives, and for non-HTML files like PDFs or images it’s your only option. They’ve got nowhere to park a meta tag, so the header does all the work.
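On most sites that header comes from the web server or CDN config, but the mechanics are the same anywhere. A minimal sketch as a plain WSGI app in Python; `make_pdf_app()` and the default directive string are hypothetical placeholders, not a recommended setting.

```python
def make_pdf_app(pdf_bytes: bytes, directives: str = "noindex, nosnippet"):
    """Return a minimal WSGI app that serves one PDF with an X-Robots-Tag header."""

    def app(environ, start_response):
        headers = [
            ("Content-Type", "application/pdf"),
            ("Content-Length", str(len(pdf_bytes))),
            # The PDF has no <head>, so the header carries the directives.
            ("X-Robots-Tag", directives),
        ]
        start_response("200 OK", headers)
        return [pdf_bytes]

    return app
```

To try it locally, pass the app to `wsgiref.simple_server.make_server("", 8000, app)` and call `serve_forever()`, then run the URL through the checker and confirm the header shows up on the PDF response.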