Indexability is the quality of a web page that determines whether a search engine can add it to its search index. A page must be both crawlable and permitted by directives before it can be indexed. Even if a search engine can access a page, it will not index it if the page signals it should not be included.
When Googlebot visits a page, it first checks whether it is allowed to crawl the URL, then examines the page content and directives to decide whether to store it in the index. Indexability is the gate at the end of this process. A page can pass the crawlability check and still fail the indexability check.
What affects indexability
- `noindex` meta tag - `<meta name="robots" content="noindex">` tells search engines not to index the page. This is the most common blocking method.
- `noindex` HTTP header - `X-Robots-Tag: noindex` in the response headers achieves the same result. Used when you cannot edit HTML, such as with PDFs or API responses.
- Canonical tags - A canonical pointing to a different URL may cause the non-canonical version to be excluded from the index. Search engines prefer the canonical target.
- Quality signals - Search engines may choose not to index thin or duplicate content even without explicit directives. A page with only a few sentences or a direct copy of another page may be de-indexed.
- Robots.txt - While robots.txt blocks crawling, it does not directly prevent indexing if other pages link to the blocked URL. Google may still index the URL based on anchor text from external links, showing a result with no snippet.
- Authentication requirements - Pages behind login walls can only be fetched by a crawler that holds credentials. Search engine crawlers do not log in, so content that is not publicly accessible is typically not indexed.
- Legal or manual actions - Search engines may remove pages from the index due to DMCA complaints, court orders, or manual spam penalties.
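To make the two `noindex` mechanisms from the list above concrete, here is a minimal Python sketch that looks for a blocking directive in a page's HTML and in an `X-Robots-Tag` header value. The names `MetaRobotsParser`, `split_directives`, and `has_noindex` are illustrative, not part of any tool mentioned here. The sketch only handles the generic `robots` meta name and ignores bot-specific variants such as `googlebot:` prefixes in the header.

```python
from html.parser import HTMLParser

# Tokens that keep a page out of the index.
# "none" is shorthand for "noindex, nofollow".
BLOCKING = {"noindex", "none"}


def split_directives(value):
    """Turn 'NoIndex, nofollow' into ['noindex', 'nofollow']."""
    return [part.strip().lower() for part in value.split(",") if part.strip()]


class MetaRobotsParser(HTMLParser):
    """Collects the directives of every <meta name="robots"> tag."""

    def __init__(self):
        super().__init__()
        self.directives = []

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        # Attribute values can be None for valueless attributes.
        attr = {name: (value or "") for name, value in attrs}
        if attr.get("name", "").strip().lower() == "robots":
            self.directives.extend(split_directives(attr.get("content", "")))


def has_noindex(html, x_robots_tag=None):
    """Return True if the meta tag or the header carries a blocking directive."""
    parser = MetaRobotsParser()
    parser.feed(html)
    header = split_directives(x_robots_tag or "")
    return any(d in BLOCKING for d in parser.directives + header)
```

For a PDF or API response there is no HTML to parse, so you would pass an empty string as `html` and rely on the header value alone.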
Indexability vs crawlability
Crawlability is about access: can a search engine bot reach the page? Indexability is about permission: is the page allowed to be stored in the search index? A page can be crawlable but not indexable (for example, a page with a noindex tag). A page can also be indexable in principle but not crawlable if robots.txt blocks it, though search engines may still index it based on external signals.
The relationship looks like this:
1. Crawler discovers the URL via sitemap, internal link, or external reference
2. Crawler checks robots.txt for crawl permission
3. Crawler fetches the page HTML
4. Crawler parses meta robots tags and HTTP headers
5. Crawler evaluates content quality and canonical status
6. If all checks pass, the page enters the index
Failing at step 2 means the page is not crawlable. Failing at step 4 or 5 means the page is crawlable but not indexable, so it never reaches step 6.
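The sequence above can be sketched as a small decision function. This is a simplified model under stated assumptions, not how any search engine actually implements it: `PageSignals` and `classify` are hypothetical names, content quality (part of step 5) is not modeled, and a canonical pointing elsewhere is treated as a definite exclusion even though in practice it is only a strong hint. The robots.txt check uses Python's standard `urllib.robotparser`.

```python
from dataclasses import dataclass
from typing import Optional
from urllib.robotparser import RobotFileParser


@dataclass
class PageSignals:
    url: str
    robots_txt: str                  # raw robots.txt body for the site
    meta_robots: str = ""            # content of <meta name="robots">, if any
    x_robots_tag: str = ""           # value of the X-Robots-Tag header, if any
    canonical: Optional[str] = None  # href of <link rel="canonical">, if any


def _has_noindex(value: str) -> bool:
    tokens = {token.strip().lower() for token in value.split(",")}
    return bool(tokens & {"noindex", "none"})


def classify(page: PageSignals, user_agent: str = "Googlebot") -> str:
    # Step 2: robots.txt decides whether the URL may be crawled at all.
    rules = RobotFileParser()
    rules.parse(page.robots_txt.splitlines())
    if not rules.can_fetch(user_agent, page.url):
        return "not crawlable"

    # Step 4: a noindex in either the meta tag or the header blocks indexing.
    if _has_noindex(page.meta_robots) or _has_noindex(page.x_robots_tag):
        return "not indexable: noindex"

    # Step 5 (canonical part only): assume a canonical that points to a
    # different URL means this version is left out of the index.
    if page.canonical and page.canonical.rstrip("/") != page.url.rstrip("/"):
        return "not indexable: canonical points elsewhere"

    # Step 6: nothing blocked the page.
    return "indexable"
```

Note that a "not crawlable" result here does not rule out the URL appearing in search results, since a blocked URL can still be indexed from external signals.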
Common indexability issues
- Important pages accidentally tagged with `noindex` because a CMS template includes it by default
- Conflicting directives between meta tags and HTTP headers, such as a meta tag saying `index` while the header says `noindex`
- `noindex` applied site-wide via a global template or plugin setting, blocking the entire site
- Pages with valuable content left out of XML sitemaps, making them harder to discover
- Thin or duplicate content that search engines choose to de-index rather than display
- Canonical tags pointing to non-existent or irrelevant URLs
- Staging or development sites exposed to crawlers with `noindex` tags that were never removed before launch
- JavaScript-rendered content where the `noindex` directive is only added after page load, causing a race condition
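The conflicting-directives case from the list above is easy to detect mechanically. The sketch below flags a page where one source explicitly allows indexing while the other blocks it; `find_conflict` is a hypothetical helper name. Google's documentation states that the more restrictive rule applies when robots rules conflict, so a page in this state should be assumed to be excluded until the conflict is resolved.

```python
ALLOW = {"index", "all"}
BLOCK = {"noindex", "none"}


def _tokens(value):
    """Split a directive string into a set of lowercase tokens."""
    return {t.strip().lower() for t in (value or "").split(",") if t.strip()}


def find_conflict(meta_content, header_value):
    """Describe a meta/header conflict, or return None if there is none."""
    meta, header = _tokens(meta_content), _tokens(header_value)
    if meta & ALLOW and header & BLOCK:
        return "meta tag allows indexing but the X-Robots-Tag header blocks it"
    if header & ALLOW and meta & BLOCK:
        return "X-Robots-Tag header allows indexing but the meta tag blocks it"
    return None
```

The function deliberately reports nothing when only one source is present, since a lone `noindex` is a clear instruction rather than a conflict.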
Checking indexability
You can check indexability using several methods:
- Google Search Console URL Inspection - Shows whether a URL is indexed and what the blocking reason is
- Site: search operator - `site:example.com/page-url` shows if the page appears in Google’s index
- Crawler tools - Tools that extract meta robots tags and HTTP headers across an entire site
- Browser developer tools - Check the Network tab for `X-Robots-Tag` headers and the Elements tab for meta robots tags
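The header check you would do in the Network tab can also be scripted. The following is a minimal sketch using only Python's standard library: it sends a `HEAD` request and returns every `X-Robots-Tag` value in the response. The function name `fetch_x_robots_tag` and the `User-Agent` string are placeholders chosen for this example, and some servers answer `HEAD` differently from `GET`, so treat the result as a first check rather than proof.

```python
import urllib.error
import urllib.request


def fetch_x_robots_tag(url, timeout=10):
    """Send a HEAD request and return all X-Robots-Tag header values."""
    request = urllib.request.Request(
        url,
        method="HEAD",
        headers={"User-Agent": "indexability-check/0.1"},  # placeholder UA
    )
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            headers = response.headers
    except urllib.error.HTTPError as err:
        # 4xx and 5xx responses still carry headers worth inspecting.
        headers = err.headers
    # A response may repeat the header, so collect every occurrence.
    return headers.get_all("X-Robots-Tag") or []
```

Calling `fetch_x_robots_tag("https://example.com/report.pdf")` returns a list of header values, which is empty when the server sends no `X-Robots-Tag` at all.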
How crawler.sh checks indexability
crawler.sh analyzes indexability as part of its SEO audit. The `crawler seo` command reports:
- Pages with `noindex` in meta robots tags
- Pages with `noindex` in `X-Robots-Tag` headers
- The overall count and percentage of noindex pages on the site
- Pages that are orphaned or have no internal links (indirect indexability signal, since orphan pages are harder to discover)
- Conflicting directives between meta tags and headers
These checks help ensure that valuable content is not accidentally hidden from search engines. The CSV export makes it easy to share indexability reports with clients or team members.