March 7, 2026

How to Find Orphan Pages on a Website with CLI

Learn how to detect orphan pages with zero incoming internal links using crawler.sh CLI. Identify isolated pages and fix your internal linking.

Mehmet Kose
4 mins read

Orphan pages are pages on your site that no other page links to. They are invisible to users browsing your site and difficult for search engines to discover. Without incoming internal links, these pages receive no link equity and may never get indexed, no matter how good the content is.

This guide shows you how to find every orphan page on a website using the crawler.sh CLI.

Step 1: Install crawler.sh CLI

Install the CLI with a single command:

curl -fsSL https://install.crawler.sh | sh

This downloads the correct binary for your operating system and architecture, places it in ~/.crawler/bin/, and adds it to your PATH. Restart your terminal or run source ~/.bashrc (or ~/.zshrc) to pick up the new PATH entry.

Verify the installation:

crawler --version

You need version 0.6.2 or later for orphan page detection. Run crawler update to get the latest version.
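If you script the setup, you can gate on the version programmatically. The sketch below only assumes the output of crawler --version contains an x.y.z version string somewhere; the exact output format is an assumption, not documented behavior:

```python
import re

def meets_minimum(version_output: str, minimum=(0, 6, 2)) -> bool:
    """Return True if the first x.y.z version found in the CLI output
    is at least the minimum required for orphan page detection.
    The output format of `crawler --version` is assumed, not documented."""
    match = re.search(r"(\d+)\.(\d+)\.(\d+)", version_output)
    if not match:
        return False
    return tuple(int(part) for part in match.groups()) >= minimum

# Example: capture the real output with
#   crawler --version
# and pass the string to meets_minimum().
print(meets_minimum("crawler 0.6.2"))  # True
```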

Step 2: Crawl the target website

Run a full crawl of the website you want to check for orphan pages:

crawler crawl https://example.com

The crawler follows every internal link it discovers, recording which pages link to which. Results are saved as an NDJSON file (.crawl) in the current directory. For larger sites, increase the page limit:

crawler crawl https://example.com --max-pages 5000

During the crawl, the crawler collects all internal links found on every page. This data is what powers orphan page detection: it builds a map of which pages receive incoming links and which do not.
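Conceptually, the detection reduces to a set difference: pages that were crawled minus pages that anything links to. Here is a minimal Python sketch of that idea, assuming each NDJSON record carries a "url" field and a "links" array of outgoing internal links (the real .crawl schema may differ):

```python
import json

def find_orphans(ndjson_lines, start_url):
    """Sketch of orphan detection over a crawl file. The field names
    ("url" for the page, "links" for its outgoing internal links) are
    assumptions, not the documented .crawl schema."""
    pages, linked = set(), set()
    for line in ndjson_lines:
        record = json.loads(line)
        pages.add(record["url"])
        linked.update(record.get("links", []))
    # A page is an orphan if it was crawled but nothing links to it;
    # the start URL is excluded because it is reached directly.
    return sorted(pages - linked - {start_url})

records = [
    '{"url": "https://example.com/", "links": ["https://example.com/about"]}',
    '{"url": "https://example.com/about", "links": []}',
    '{"url": "https://example.com/old-post", "links": []}',
]
print(find_orphans(records, "https://example.com/"))
# ['https://example.com/old-post']
```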

Step 3: Run SEO analysis for orphan pages

Run the seo command to get a full SEO report including orphan page detection:

crawler seo example-com.crawl

The SEO report runs all 24 automated checks. The Orphan pages section lists every page that has zero incoming internal links from any other crawled page. The start URL (typically the homepage) is excluded since it is reached directly.

Step 4: Review the results

The report lists the URL of each isolated page. Common patterns to look for:

  • Old blog posts that were never linked from an index or category page
  • Landing pages created for ad campaigns that were never integrated into the site
  • Pages from a previous site structure that lost their links during a redesign
  • Accidentally published drafts that went live without being linked from anywhere

Step 5: Export the orphan pages report

Export the results to a file for sharing with your team or tracking fixes:

crawler seo example-com.crawl --format csv --output orphan-pages.csv

You can also export as plain text:

crawler seo example-com.crawl --format txt --output orphan-pages.txt

The CSV format is ideal for importing into a spreadsheet where you can categorize orphan pages by type and assign fixes to team members.
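Once you have the CSV, a few lines of Python can bucket the orphans by site section before you assign fixes. This sketch assumes the export has a column named "url", which may not match the actual header:

```python
import csv
from collections import Counter
from urllib.parse import urlparse

def orphans_by_section(csv_path):
    """Group orphan URLs by their first path segment so fixes can be
    assigned per site section. Assumes the export has a "url" column,
    which is a guess at the real header name."""
    counts = Counter()
    with open(csv_path, newline="") as f:
        for row in csv.DictReader(f):
            path = urlparse(row["url"]).path
            section = path.strip("/").split("/")[0] or "(root)"
            counts[section] += 1
    return counts.most_common()
```

Running this over orphan-pages.csv gives a quick per-section count, e.g. twelve orphans under /blog/ versus two under /landing/, which makes it obvious where to start.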

How to fix orphan pages

Once you have your list of orphan pages, here is how to address them:

  • Add internal links from relevant pages. Find pages on your site that cover related topics and add contextual links to the orphan page. This is the best fix for pages that should remain live and visible.
  • Add to navigation or hub pages. If the orphan page belongs in a category, tag archive, or sidebar, add it there so users and search engines can find it.
  • Redirect to a relevant page. If the orphan content is outdated or duplicated, set up a 301 redirect to the most relevant existing page.
  • Delete and return 410. If the page is no longer needed, remove it and return a 410 status code so search engines drop it from the index.
  • Re-crawl after fixing. Run crawler crawl again to confirm the orphan pages now have incoming internal links.
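After the re-crawl, exporting a second report and diffing the two URL lists shows exactly what got fixed, what is still orphaned, and what newly appeared. A small Python sketch of that comparison:

```python
def compare_orphan_lists(before, after):
    """Diff two orphan URL lists (e.g. from exports taken before and
    after adding internal links) into fixed / still orphaned / new."""
    before, after = set(before), set(after)
    return {
        "fixed": sorted(before - after),
        "still_orphaned": sorted(before & after),
        "new": sorted(after - before),
    }

result = compare_orphan_lists(
    ["https://example.com/old-post", "https://example.com/landing"],
    ["https://example.com/landing"],
)
print(result["fixed"])  # ['https://example.com/old-post']
```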