How to Find Orphan Pages on a Website with CLI
Learn how to detect orphan pages with zero incoming internal links using crawler.sh CLI. Identify isolated pages and fix your internal linking.
Orphan pages are pages on your site that no other page links to. They are invisible to users browsing your site and difficult for search engines to discover. Without incoming internal links, these pages receive no link equity and may never get indexed, no matter how good the content is.
This guide shows you how to find every orphan page on a website using the crawler.sh CLI.
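Conceptually, orphan detection is a set difference over the site's link graph: every crawled page, minus every page that appears as a link target. A minimal sketch with plain shell tools (the file names and the /home start page are illustrative, not part of the crawler.sh output):

```shell
# Illustrative data: a crawl yields a list of pages and a list of
# "source target" internal-link pairs (file names are hypothetical).
cat > pages.txt <<'EOF'
/home
/about
/blog
/post-1
/old-landing
EOF
cat > links.txt <<'EOF'
/home /about
/home /blog
/blog /post-1
EOF
# A page is an orphan if it never appears as a link target.
# The start page (/home here) is excluded, since it is reached directly.
awk 'NR==FNR { linked[$2] = 1; next } !($1 in linked) && $1 != "/home"' links.txt pages.txt
# prints /old-landing
```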
Step 1: Install crawler.sh CLI
Install the CLI with a single command:
```shell
curl -fsSL https://install.crawler.sh | sh
```

This downloads the correct binary for your operating system and architecture, places it in ~/.crawler/bin/, and adds it to your PATH. Restart your terminal or run source ~/.bashrc (or ~/.zshrc) to pick up the new PATH entry.
Verify the installation:
```shell
crawler --version
```

You need version 0.6.2 or later for orphan page detection. Run crawler update to get the latest version.
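If you run this in a script or CI pipeline, you can guard on the version before crawling. This is only a sketch: the sort -V comparison assumes GNU sort, and the hard-coded current value stands in for parsing the real crawler --version output.

```shell
# Sketch: fail if the installed version is older than 0.6.2.
# In a real script, derive "current" from the CLI instead, e.g.:
#   current=$(crawler --version | grep -oE '[0-9]+\.[0-9]+\.[0-9]+')
required="0.6.2"
current="0.6.3"   # hard-coded here so the sketch runs without the CLI
# sort -V orders version strings; if "required" sorts first (or equal),
# the installed version is new enough.
if [ "$(printf '%s\n%s\n' "$required" "$current" | sort -V | head -n 1)" = "$required" ]; then
  echo "crawler version OK"
else
  echo "crawler too old, run: crawler update" >&2
  exit 1
fi
```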
Step 2: Crawl the target website
Run a full crawl of the website you want to check for orphan pages:
```shell
crawler crawl https://example.com
```

The crawler follows every internal link it discovers, recording which pages link to which. Results are saved as an NDJSON file (.crawl) in the current directory. For larger sites, increase the page limit:
```shell
crawler crawl https://example.com --max-pages 5000
```

During the crawl, the crawler collects all internal links found on every page. This data powers the orphan page detection: it builds a map of which pages receive incoming links and which do not.
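Because the output is NDJSON, you can also inspect the link data directly with jq. Note that the field names below ("url", "links") are assumptions about the .crawl schema, not documented fields; check a real file first with head -n 1 example-com.crawl | jq .

```shell
# Hypothetical two-page crawl file; the real schema may differ.
cat > sample.crawl <<'EOF'
{"url": "https://example.com/", "links": ["https://example.com/blog"]}
{"url": "https://example.com/blog", "links": []}
EOF
# Every distinct internal-link target recorded during the crawl:
jq -r '.links[]' sample.crawl | sort -u
```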
Step 3: Run SEO analysis for orphan pages
Run the seo command to get a full SEO report including orphan page detection:
```shell
crawler seo example-com.crawl
```

The SEO report runs all 24 automated checks. The Orphan pages section lists every page that has zero incoming internal links from any other crawled page. The start URL (typically the homepage) is excluded since it is reached directly.
Step 4: Review the results
Each orphan page in the report shows the URL of the isolated page. Common patterns to look for:
- Old blog posts that were never linked from an index or category page
- Landing pages created for ad campaigns that were never integrated into the site
- Pages from a previous site structure that lost their links during a redesign
- Accidentally published drafts that went live without being linked from anywhere
Step 5: Export the orphan pages report
Export the results to a file for sharing with your team or tracking fixes:
```shell
crawler seo example-com.crawl --format csv --output orphan-pages.csv
```

You can also export as plain text:
```shell
crawler seo example-com.crawl --format txt --output orphan-pages.txt
```

The CSV format is ideal for importing into a spreadsheet, where you can categorize orphan pages by type and assign fixes to team members.
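Before reaching for a spreadsheet, the exported file can also be sliced with standard shell tools to spot patterns. The layout below (a header line, then one orphan URL per row) is an assumption about the CSV export; adjust the field numbers to match the actual file.

```shell
# Hypothetical export with the orphan URL in the first column.
cat > orphan-pages.csv <<'EOF'
url
https://example.com/blog/old-post
https://example.com/lp/campaign-2019
https://example.com/blog/another-post
EOF
# Group orphans by top-level path segment to see which site sections
# accumulate them (field 4 when splitting a URL on "/").
# Here: blog appears twice, lp once.
tail -n +2 orphan-pages.csv | awk -F'/' '{ print $4 }' | sort | uniq -c | sort -rn
```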
How to fix orphan pages
Once you have your list of orphan pages, here is how to address them:
- Add internal links from relevant pages. Find pages on your site that cover related topics and add contextual links to the orphan page. This is the best fix for pages that should remain live and visible.
- Add to navigation or hub pages. If the orphan page belongs in a category, tag archive, or sidebar, add it there so users and search engines can find it.
- Redirect to a relevant page. If the orphan content is outdated or duplicated, set up a 301 redirect to the most relevant existing page.
- Delete and return 410. If the page is no longer needed, remove it and return a 410 status code so search engines drop it from the index.
- Re-crawl after fixing. Run crawler crawl again, then re-run crawler seo on the new .crawl file, to confirm the orphan pages now have incoming internal links.
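After re-crawling, it helps to diff the orphan lists from before and after your fixes. A sketch using comm (the file names are placeholders for exported orphan URL lists, and comm requires sorted input):

```shell
# Placeholder orphan lists exported before and after fixing (kept sorted).
cat > orphans-before.txt <<'EOF'
https://example.com/a
https://example.com/b
EOF
cat > orphans-after.txt <<'EOF'
https://example.com/b
EOF
# Lines only in the "before" list were fixed; common lines are still orphaned.
comm -23 orphans-before.txt orphans-after.txt   # fixed: https://example.com/a
comm -12 orphans-before.txt orphans-after.txt   # still orphaned: https://example.com/b
```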