The meta robots tag is an HTML <meta> element placed in the <head> section of a web page. It provides instructions to search engine crawlers about how to treat that specific page. Unlike robots.txt, which controls crawling at the site or directory level, the meta robots tag controls indexing behavior for individual pages.
Every page on your site can have its own meta robots tag. This granular control lets you allow crawling of a page while preventing it from appearing in search results, or allow indexing while preventing crawlers from following links on that page.
Common meta robots directives
| Directive | Meaning | Use case |
|---|---|---|
| noindex | Do not include this page in the search index | Thank you pages, admin panels, duplicate content |
| nofollow | Do not follow links on this page | User-generated content, untrusted links, login-required pages |
| noarchive | Do not show a cached copy of this page | Frequently updated content where a cached version would be misleading |
| nosnippet | Do not show a text snippet or video preview | Pages where previews might reveal sensitive information |
| noimageindex | Do not index images on this page | Photo galleries where you want text indexed but not images |
| notranslate | Do not offer translation of this page | Pages with technical terminology that should not be machine-translated |
| unavailable_after | Remove this page from the index after a specified date | Time-limited offers, event pages, seasonal content |
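Most directives in the table are bare keywords, but unavailable_after takes a date value. As a minimal sketch, assuming an ISO 8601 date (one of the widely adopted formats Google documents as acceptable), a template helper could build the tag like this. The function name is hypothetical.

```python
from datetime import date


def unavailable_after_tag(expiry):
    """Build a meta robots tag that expires the page on the given date.

    Assumption: the date is written in ISO 8601 form (YYYY-MM-DD).
    """
    return f'<meta name="robots" content="unavailable_after: {expiry.isoformat()}">'


print(unavailable_after_tag(date(2030, 1, 31)))
# <meta name="robots" content="unavailable_after: 2030-01-31">
```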
Multiple directives can be combined in a single tag:

    <meta name="robots" content="noindex, nofollow">

This tells search engines not to index the page and not to follow any links found on it.
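To audit which directives a page actually carries, the content attribute can be split on commas. The following is a rough sketch using only Python's standard html.parser module. The class and function names are illustrative, and the sketch assumes directives are compared case-insensitively.

```python
from html.parser import HTMLParser


class MetaRobotsParser(HTMLParser):
    """Collect directives from every <meta name="robots"> tag in a document."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr = {key: (value or "") for key, value in attrs}
        if attr.get("name", "").strip().lower() != "robots":
            return
        # The content attribute is a comma-separated list of directives.
        for part in attr.get("content", "").split(","):
            part = part.strip().lower()
            if part:
                self.directives.add(part)


def meta_robots_directives(html):
    """Return the set of robots directives found in the given HTML."""
    parser = MetaRobotsParser()
    parser.feed(html)
    return parser.directives


page = '<head><meta name="robots" content="noindex, nofollow"></head>'
print(sorted(meta_robots_directives(page)))  # ['nofollow', 'noindex']
```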
Targeting specific crawlers
The name attribute can target specific user agents rather than all crawlers:

    <meta name="googlebot" content="noindex">
    <meta name="bingbot" content="noindex">

This blocks only Google and Bing while allowing other crawlers to index the page. Use this sparingly, as most site owners want consistent behavior across all search engines.
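The effect of crawler-specific tags can be reasoned about with a small helper. This sketch assumes a crawler honours both the generic robots tag and any tag addressed to it by name, and applies the union of the two. That is an assumption about crawler behavior, not something every search engine guarantees. The function names are hypothetical.

```python
def parse_directives(content):
    """Split a content attribute into a set of lowercase directives."""
    return {part.strip().lower() for part in content.split(",") if part.strip()}


def directives_for(meta_tags, crawler):
    """Return the directives that apply to one crawler.

    meta_tags is a list of (name, content) pairs taken from the page's
    <meta> elements. Assumption: generic and crawler-specific tags combine.
    """
    applicable = {"robots", crawler.lower()}
    result = set()
    for name, content in meta_tags:
        if name.strip().lower() in applicable:
            result |= parse_directives(content)
    return result


tags = [("googlebot", "noindex"), ("bingbot", "noindex")]
print(directives_for(tags, "googlebot"))    # {'noindex'}
print(directives_for(tags, "duckduckbot"))  # set()
```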
Meta robots vs X-Robots-Tag
The same directives can be delivered via an HTTP header called X-Robots-Tag. This is useful for non-HTML files like PDFs, images, or video where a <meta> tag cannot be embedded. Both methods are equally valid, but X-Robots-Tag is less commonly used for HTML pages.
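On the server side, the header is usually attached by a rule keyed on the response type. The sketch below illustrates that idea in plain Python, independent of any web framework. The set of types to exclude is a hypothetical policy, and the function name is made up for this example.

```python
# Hypothetical policy: media types that should stay out of the index.
NOINDEX_MEDIA_TYPES = {"application/pdf"}


def with_robots_header(headers):
    """Return a copy of the response headers, adding X-Robots-Tag where the policy applies."""
    result = dict(headers)
    content_type = result.get("Content-Type", "")
    # Drop parameters such as "; charset=utf-8" before comparing.
    media_type = content_type.split(";", 1)[0].strip().lower()
    if media_type in NOINDEX_MEDIA_TYPES:
        result["X-Robots-Tag"] = "noindex"
    return result


print(with_robots_header({"Content-Type": "application/pdf"}))
# {'Content-Type': 'application/pdf', 'X-Robots-Tag': 'noindex'}
```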
Example HTTP response header:

    X-Robots-Tag: noindex, nofollow

You can also apply it only to specific content types, such as PDFs:

    X-Robots-Tag: noindex
    Content-Type: application/pdf

Common mistakes
- Applying noindex to staging or development sites and forgetting to remove it before launch. This is one of the most common causes of a new site not appearing in search results.
- Using noindex on important pages like product categories or blog posts. Always double-check which template applies the tag.
- Conflicting directives between meta tags and HTTP headers. If the meta tag says index but the header says noindex, different crawlers may behave differently.
- Assuming noindex also prevents crawling. It does not. Use a robots.txt Disallow rule if you want to block crawling entirely, but note that a crawler blocked by robots.txt never fetches the page and so never sees its noindex.
- Using nofollow site-wide, which blocks link equity flow throughout the site and can prevent crawlers from discovering new pages.
- Applying noindex to paginated series, which can prevent search engines from understanding the relationship between pages.
- Forgetting that noindex eventually leads to nofollow behavior. If a page is not indexed, links from it may not pass equity even without an explicit nofollow.
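Two of these mistakes can be checked mechanically: a meta tag and header that disagree about indexing, and a noindex that crawlers cannot read because robots.txt blocks the URL (a blocked crawler never fetches the page). Here is a minimal sketch using the standard library's urllib.robotparser. The function name, argument layout, and warning texts are illustrative, not taken from any particular tool.

```python
from urllib.robotparser import RobotFileParser


def parse_directives(value):
    """Split a comma-separated directive string into a lowercase set."""
    return {part.strip().lower() for part in value.split(",") if part.strip()}


def audit_page(url, meta_content, header_value, robots_txt, agent="*"):
    """Return a list of warnings for one page."""
    warnings = []
    meta = parse_directives(meta_content)
    header = parse_directives(header_value)

    # Both sources are present but disagree on whether the page is indexable.
    if meta and header and ("noindex" in meta) != ("noindex" in header):
        warnings.append("meta tag and X-Robots-Tag disagree about noindex")

    # A noindex is only effective if the crawler is allowed to fetch the page.
    rules = RobotFileParser()
    rules.parse(robots_txt.splitlines())
    if "noindex" in (meta | header) and not rules.can_fetch(agent, url):
        warnings.append("noindex is hidden because robots.txt disallows this URL")

    return warnings


robots = "User-agent: *\nDisallow: /private/\n"
print(audit_page("https://example.com/private/a", "noindex", "", robots))
# ['noindex is hidden because robots.txt disallows this URL']
```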
Meta robots and SEO strategy
Strategic use of meta robots tags helps focus crawl budget on valuable pages. Page types to review include:
- Thin pages - Login forms, search results, tag archives with few posts
- Duplicate content - Print-friendly versions, mobile-specific URLs, parameter-based sorting
- Private content - Account dashboards, checkout flows, internal documentation
- Temporary pages - Campaign landing pages, expired promotions, maintenance notices
- Paginated content - Use rel="canonical" pointing to the first page rather than noindex on subsequent pages
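One way to keep such a policy auditable is to define it in a single place that every template consults. The sketch below assumes a site labels its templates with page-type strings. The labels and the function name are hypothetical, and pagination is deliberately left out because it is handled through canonical links rather than noindex.

```python
# Hypothetical page-type labels; a real site would use its own template names.
NOINDEX_PAGE_TYPES = {
    "login",
    "internal_search",
    "print_view",
    "account_dashboard",
    "checkout",
    "expired_promotion",
}


def robots_meta_for(page_type):
    """Return the meta robots tag for a page type, or None to leave the page indexable."""
    if page_type in NOINDEX_PAGE_TYPES:
        return '<meta name="robots" content="noindex">'
    return None


print(robots_meta_for("checkout"))  # <meta name="robots" content="noindex">
print(robots_meta_for("product"))   # None
```

Keeping the list in one module makes it easier to answer the question of which template applies the tag.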
How crawler.sh checks meta robots tags
crawler.sh extracts and reports meta robots directives for every crawled page. The crawler seo command provides:
- A count of pages with noindex directives
- A list of noindex pages so you can audit them individually
- Detection of nofollow at the page level
- The raw meta robots content for each page in the crawl output
- Identification of conflicting directives between meta tags and HTTP headers
This helps catch accidental blocking of content that should be indexed. The CSV export lets you sort and filter by directive type, making it easy to spot patterns like an entire directory that was accidentally noindexed.
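The directory-level pattern mentioned above can be surfaced with a few lines of Python once the export is on disk. This is only a sketch: the column names url and meta_robots are assumptions, the real export may use different headers, and the sample rows are made up for illustration.

```python
import csv
import io
from collections import Counter
from urllib.parse import urlparse


def noindex_by_first_segment(csv_text, url_col="url", robots_col="meta_robots"):
    """Count noindexed pages per first path segment (for example /blog)."""
    counts = Counter()
    for row in csv.DictReader(io.StringIO(csv_text)):
        raw = row.get(robots_col) or ""
        directives = {part.strip().lower() for part in raw.split(",")}
        if "noindex" not in directives:
            continue
        path = urlparse(row[url_col]).path.strip("/")
        segment = "/" + path.split("/", 1)[0] if path else "/"
        counts[segment] += 1
    return counts


# Made-up sample rows with assumed column names.
sample = """url,meta_robots
https://example.com/blog/a,"noindex, follow"
https://example.com/blog/b,noindex
https://example.com/about,
"""
print(noindex_by_first_segment(sample).most_common())  # [('/blog', 2)]
```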