What is Indexability in SEO

Indexability is the quality of a web page that determines whether a search engine can add it to its search index. For its content to be indexed, a page must be crawlable and must not be excluded by directives such as noindex. Even if a search engine can access a page, it will not index it if the page signals that it should not be included.

When Google crawls a page, Googlebot first checks robots.txt to see whether it may fetch the URL; after fetching, Google examines the page content and directives to decide whether to store the page in the index. Indexability is the gate at the end of this process. A page can pass the crawlability check and still fail the indexability check.

What affects indexability

noindex meta tag - <meta name="robots" content="noindex"> tells search engines not to index the page. This is the most common blocking method.
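noindex HTTP header - X-Robots-Tag: noindex in the HTTP response headers has the same effect. It is the option for non-HTML resources such as PDFs or API responses, which have no HTML head to hold a meta tag, and for pages whose HTML you cannot edit.
Canonical tags - A rel=canonical link pointing to a different URL may cause the non-canonical version to be excluded from the index. Search engines treat the canonical as a strong hint rather than a directive and usually index the canonical target instead.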
Quality signals - Search engines may choose not to index thin or duplicate content even without explicit directives. A page with only a few sentences or a direct copy of another page may be de-indexed.
Robots.txt - robots.txt blocks crawling, not indexing. If other pages link to a blocked URL, Google may still index the URL based on the anchor text of those links and show it in results without a snippet. Because the page is never fetched, any noindex directive on it is never seen either.
Authentication requirements - Search engine crawlers do not log in, so content behind a login wall is generally not indexed; the crawler typically receives only the login page or an error status such as 401 or 403.
Legal or manual actions - Search engines may remove pages from the index due to DMCA complaints, court orders, or manual spam penalties.
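As a rough illustration, the signals that can be read from a fetched page (the meta robots tag, the X-Robots-Tag header, and a rel=canonical link) can be extracted with Python's standard library alone. This is a minimal sketch with hypothetical function names; it assumes headers arrive as (name, value) pairs, compares URLs as exact strings, and leaves out quality signals, robots.txt, authentication, and legal removals, which cannot be read from the page itself.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class RobotsSignalParser(HTMLParser):
    """Collects meta robots directives and the first rel=canonical href."""

    def __init__(self):
        super().__init__()
        self.meta_directives = []
        self.canonical = None

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names; values may be None.
        attrs = {name: (value or "") for name, value in attrs}
        if tag == "meta" and attrs.get("name", "").lower() in ("robots", "googlebot"):
            for directive in attrs.get("content", "").split(","):
                if directive.strip():
                    self.meta_directives.append(directive.strip().lower())
        elif tag == "link" and "canonical" in attrs.get("rel", "").lower().split():
            if self.canonical is None:
                self.canonical = attrs.get("href") or None


def header_directives(header_pairs):
    """Collect X-Robots-Tag directives from (name, value) header pairs.

    Simplification: user-agent-prefixed values such as "googlebot: noindex"
    are not recognized.
    """
    directives = []
    for name, value in header_pairs:
        if name.lower() == "x-robots-tag":
            directives += [d.strip().lower() for d in value.split(",") if d.strip()]
    return directives


def indexability_signals(url, header_pairs, html):
    """Report on-page signals that can keep an already-fetched page out of the index."""
    parser = RobotsSignalParser()
    parser.feed(html)
    parser.close()
    blocking = {"noindex", "none"}  # "none" is shorthand for noindex, nofollow
    # Resolve a relative canonical href against the page URL.
    canonical = urljoin(url, parser.canonical) if parser.canonical else None
    return {
        "meta_noindex": bool(blocking & set(parser.meta_directives)),
        "header_noindex": bool(blocking & set(header_directives(header_pairs))),
        "canonical": canonical,
        # Exact string comparison; a real tool would normalize URLs first.
        "canonical_elsewhere": canonical is not None and canonical != url,
    }


html = (
    '<head><meta name="robots" content="noindex, follow">'
    '<link rel="canonical" href="/guide/"></head>'
)
signals = indexability_signals(
    "https://example.com/guide/?ref=nav", [("X-Robots-Tag", "nofollow")], html
)
print(signals)
```

Accepting headers as a list of pairs rather than a dict keeps repeated X-Robots-Tag headers, which a server may send more than once for the same response.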

Indexability vs crawlability

Crawlability is about access: can a search engine bot reach the page? Indexability is about permission: is the page allowed to be stored in the search index? A page can be crawlable but not indexable (for example, a page with a noindex tag). A page can also be indexable in principle but not crawlable if robots.txt blocks it, though search engines may still index it based on external signals.

The relationship looks like this:

1. Crawler discovers the URL via sitemap, internal link, or external reference
2. Crawler checks robots.txt for crawl permission
3. Crawler fetches the page HTML
4. Crawler parses meta robots tags and HTTP headers
5. Crawler evaluates content quality and canonical status
6. If all checks pass, the page enters the index

Failing at step 2 or 3 (robots.txt blocks the URL, or the fetch fails) means the page is not crawlable. Failing at step 4 or 5 means the page is crawlable but not indexable.
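The steps above can be modeled as a small function that reports where a URL drops out. This is a simplified sketch, not how any search engine is implemented: classify_url is a hypothetical name, the noindex flag and canonical URL are assumed to come from an earlier parse of the page, status_code is assumed to be the final status after redirects, and content quality is not modeled. Python's urllib.robotparser does plain prefix matching and does not support the * and $ path wildcards that Google honors.

```python
from urllib.robotparser import RobotFileParser


def classify_url(url, robots_txt, status_code, has_noindex, canonical, user_agent="Googlebot"):
    """Report the first step at which a URL drops out of the simplified pipeline.

    Step 1 (discovery) is assumed done. status_code is assumed to be the final
    status after redirects; has_noindex and canonical come from parsing the page.
    """
    rules = RobotFileParser()
    rules.parse(robots_txt.splitlines())

    # Step 2: robots.txt must allow this user agent to fetch the URL.
    if not rules.can_fetch(user_agent, url):
        return "not crawlable: blocked by robots.txt"
    # Step 3: the fetch must succeed.
    if status_code != 200:
        return f"not crawlable: fetch returned HTTP {status_code}"
    # Step 4: a noindex in the meta robots tag or X-Robots-Tag header.
    if has_noindex:
        return "not indexable: noindex directive"
    # Step 5: a canonical pointing elsewhere is a hint, so exclusion is likely, not certain.
    # Content-quality judgments are not modeled here.
    if canonical is not None and canonical != url:
        return f"likely excluded: canonical points to {canonical}"
    # Step 6: every modeled check passed.
    return "indexable"


robots_txt = "User-agent: *\nDisallow: /private/\n"
print(classify_url("https://example.com/private/report", robots_txt, 200, False, None))
print(classify_url("https://example.com/blog/post", robots_txt, 200, True, None))
```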

Common indexability issues

Important pages accidentally tagged with noindex because a CMS template includes it by default
Conflicting directives between meta tags and HTTP headers, such as a meta tag saying index while the header says noindex; Google applies the more restrictive directive, so the page stays out of the index
noindex applied site-wide via a global template or plugin setting, blocking the entire site
Pages with valuable content left out of XML sitemaps, making them harder to discover
Thin or duplicate content that search engines choose to de-index rather than display
Canonical tags pointing to non-existent or irrelevant URLs
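noindex tags from a staging or development site that were never removed at launch, so the live site stays out of the index
noindex directives that JavaScript adds or changes after page load, which search engines may or may not see depending on whether and when the page is rendered; Google skips rendering a page whose initial HTML already contains noindex, so removing the tag with JavaScript does not work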
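Google's documentation states that when robots directives conflict, the more restrictive one applies. The following is a minimal sketch of that resolution rule for a page's meta robots and X-Robots-Tag directives, which also flags the conflicting pairs so they can be fixed at the source. The function names are hypothetical, and directives scoped to a specific crawler (such as a googlebot meta tag) are not distinguished from generic ones.

```python
# Each pair lists the permissive directive first and the restrictive one second.
PAIRS = (("index", "noindex"), ("follow", "nofollow"))


def normalize(directives):
    """Lowercase directives and expand the shorthands "none" and "all"."""
    found = {d.strip().lower() for d in directives if d.strip()}
    if "none" in found:
        found |= {"noindex", "nofollow"}
    if "all" in found:
        found |= {"index", "follow"}
    return found


def resolve_directives(meta, header):
    """Combine meta robots and X-Robots-Tag directives for one page.

    Returns the effective settings and a list of conflicting pairs.
    """
    combined = normalize(meta) | normalize(header)
    effective, conflicts = {}, []
    for permissive, restrictive in PAIRS:
        # The more restrictive directive wins whenever it is present.
        effective[permissive] = restrictive not in combined
        if permissive in combined and restrictive in combined:
            conflicts.append(f"{permissive} vs {restrictive}")
    return effective, conflicts


effective, conflicts = resolve_directives(["index", "follow"], ["noindex"])
print(effective, conflicts)
```

Because index and follow are defaults, only the restrictive directives change the outcome; explicit index or follow values matter here only for spotting conflicts.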

Checking indexability

You can check indexability using several methods:

Google Search Console URL Inspection - Shows whether a URL is indexed and, if it is not, the reason Google gives
site: search operator - Searching for site:example.com/page-url shows whether the page appears in Google’s index, though site: results are not guaranteed to be complete
Crawler tools - Extract meta robots tags and X-Robots-Tag headers from every page across an entire site
Browser developer tools - Check the Network tab for X-Robots-Tag headers and the Elements tab for meta robots tags

How crawler.sh checks indexability

crawler.sh analyzes indexability as part of its SEO audit. The crawler seo command reports:

Pages with noindex in meta robots tags
Pages with noindex in X-Robots-Tag headers
The overall count and percentage of noindex pages on the site
Orphan pages, which no other page on the site links to (an indirect indexability signal, since orphan pages are harder for crawlers to discover)
Conflicting directives between meta tags and headers

These checks help ensure that valuable content is not accidentally hidden from search engines. The CSV export makes it easy to share indexability reports with clients or team members.
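The summary figures in such a report can be computed from a crawl's page records. The following is a minimal Python sketch, not crawler.sh's implementation or output format: the page-record shape, the function names build_report and to_csv, and the CSV columns are hypothetical. It assumes the URL list includes pages found outside internal links, for example from an XML sitemap, since a crawl that only follows links never reaches orphan pages.

```python
import csv
import io


def build_report(pages, start_url):
    """Summarize noindex usage and orphan pages.

    pages maps each URL to {"noindex": bool, "links": [internal URLs it links to]}.
    """
    # Every URL that some other page links to (self-links are ignored).
    linked = {
        target
        for source, info in pages.items()
        for target in info["links"]
        if target != source
    }
    rows = [
        {
            "url": url,
            "noindex": info["noindex"],
            # The start URL is the crawl entry point, so it is never counted as an orphan.
            "orphan": url != start_url and url not in linked,
        }
        for url, info in sorted(pages.items())
    ]
    noindex_count = sum(1 for row in rows if row["noindex"])
    percentage = 100.0 * noindex_count / len(rows) if rows else 0.0
    return rows, noindex_count, percentage


def to_csv(rows):
    """Render report rows as CSV text for sharing."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=["url", "noindex", "orphan"])
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


# Hypothetical sample crawl data.
pages = {
    "https://example.com/": {"noindex": False, "links": ["https://example.com/about"]},
    "https://example.com/about": {"noindex": False, "links": ["https://example.com/"]},
    "https://example.com/old-promo": {"noindex": True, "links": []},
    "https://example.com/staging-copy": {"noindex": True, "links": ["https://example.com/"]},
}
rows, count, percentage = build_report(pages, "https://example.com/")
print(f"{count} noindex pages ({percentage:.1f}%)")  # prints: 2 noindex pages (50.0%)
print(to_csv(rows))
```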