How to Find Noindex Pages with CLI
Learn how to detect pages blocked from indexing with noindex directives using crawler.sh CLI. Ensure important pages are not accidentally hidden.
A noindex directive tells search engines not to include a page in their index. This is useful for pages like login screens or internal search results, but accidentally noindexing important pages makes them completely invisible in search results. A single misplaced noindex tag can remove a high-traffic page from Google overnight.
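For reference, the directive can appear in two places: as a robots meta tag in the page's HTML head, or as an HTTP response header.

```html
<head>
  <meta name="robots" content="noindex">
</head>
```

```http
X-Robots-Tag: noindex
```

Either one alone is enough to keep the page out of the index, so a page can look clean in its HTML while still being blocked at the header level.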
This guide shows you how to find every noindexed page on your website using the crawler.sh CLI.
Step 1: Install crawler.sh CLI
Install the CLI with a single command:
```
curl -fsSL https://install.crawler.sh | sh
```

This downloads the correct binary for your operating system and architecture, places it in `~/.crawler/bin/`, and adds it to your PATH. Restart your terminal or run `source ~/.bashrc` (or `~/.zshrc`) to pick up the new PATH entry.
Verify the installation:
```
crawler --version
```

Step 2: Crawl the target website
Run a full crawl of the website you want to audit:
```
crawler crawl https://example.com
```

The crawler checks both the `<meta name="robots">` tag and the `X-Robots-Tag` HTTP header for noindex directives. Results are saved as an NDJSON file (`.crawl`) in the current directory.
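Because the crawl file is NDJSON (one JSON object per line), you can also pre-filter it yourself. The field names below (`url`, `noindex`) are assumptions for illustration only; check the actual keys in your `.crawl` file before relying on them:

```shell
# Hypothetical .crawl lines -- the real schema is crawler.sh's own,
# and these field names are assumed for the sake of the example.
cat > example-com.crawl <<'EOF'
{"url":"https://example.com/","noindex":false}
{"url":"https://example.com/login","noindex":true}
{"url":"https://example.com/pricing","noindex":true}
EOF

# Crude string match; jq is more robust if you have it installed:
#   jq -r 'select(.noindex) | .url' example-com.crawl
grep '"noindex":true' example-com.crawl | sed -E 's/.*"url":"([^"]*)".*/\1/'
```

This prints only the URLs flagged as noindexed, which is handy for piping into other scripts or a ticket.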
Step 3: Run SEO audit
Run the SEO analysis on your crawl data:
```
crawler seo example-com.crawl
```

The noindex pages check flags every page that contains a noindex directive, whether in the HTML meta tag or the HTTP header.
Step 4: Identify noindex pages
Look for the Noindex Pages section in the SEO report. Review each flagged page to determine if the noindex is intentional or accidental. Common causes of accidental noindexing:
- Staging environment settings left in place after going live
- CMS “discourage search engines” checkbox forgotten after development
- Plugin or theme updates that reset indexing settings
- Blanket noindex rules in robots meta that are too broad
- A/B testing tools that add noindex to test variants
Step 5: Fix and re-crawl
For each flagged page:
- Remove noindex from pages that should appear in search results
- Keep noindex on pages that should not be indexed (login, admin, thank-you pages, duplicate content)
- Check both sources: the meta tag in the HTML and the X-Robots-Tag HTTP header
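To spot-check a single page by hand, you can inspect both sources directly. In real use you would fetch them with curl (`curl -s URL > page.html` for the body, `curl -sI URL > headers.txt` for the headers); the fixture files below stand in for that output:

```shell
# Stand-ins for curl output (in practice: curl -s URL / curl -sI URL)
cat > page.html <<'EOF'
<html><head><meta name="robots" content="noindex, nofollow"></head><body></body></html>
EOF
cat > headers.txt <<'EOF'
HTTP/1.1 200 OK
X-Robots-Tag: noindex
EOF

# Source 1: the robots meta tag in the HTML
grep -io '<meta[^>]*name="robots"[^>]*>' page.html
# Source 2: the X-Robots-Tag response header
grep -i '^x-robots-tag' headers.txt
```

If either grep prints a line containing noindex, the page is blocked; remember that fixing only one source is not enough when both carry the directive.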
After fixing, re-crawl to verify:
```
crawler crawl https://example.com
crawler seo example-com.crawl
```

Why noindex pages matter for SEO
A noindex directive is absolute: if it is present, the page will not appear in search results regardless of its content quality or backlinks. That makes accidental noindexing one of the most damaging SEO mistakes, and regular audits catch these issues before they cost you traffic. Even intentionally noindexed pages are worth reviewing periodically, as business needs change and pages that were once internal may now be valuable to index.