March 6, 2026

How to Find Non-Self Canonicals with CLI

Learn how to detect non-self canonical tags using the crawler.sh CLI. Find pages whose canonical URL points to a different page and audit your canonicalization strategy.

Mehmet Kose

A canonical tag tells search engines which version of a page is the “official” one. When a page’s canonical URL points to a different page, it signals that the current page is a duplicate: search engines should index the canonical target instead and consolidate all ranking signals there. This is correct for true duplicates, but an incorrect canonical tag can push an important page out of the index.

This guide shows you how to find every page with a non-self canonical tag using the crawler.sh CLI.

Step 1: Install crawler.sh CLI

Install the CLI with a single command:

curl -fsSL https://install.crawler.sh | sh

This downloads the correct binary for your operating system and architecture, places it in ~/.crawler/bin/, and adds it to your PATH. Restart your terminal or run source ~/.bashrc (or ~/.zshrc) to pick up the new PATH entry.

Verify the installation:

crawler --version

Step 2: Crawl the target website

Run a full crawl of the website you want to audit:

crawler crawl https://example.com

The crawler records the canonical URL for every page it visits. Results are saved as an NDJSON file (.crawl) in the current directory. For larger sites, set the page limit with --max-pages:

crawler crawl https://example.com --max-pages 5000
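Because the crawl output is NDJSON (one JSON object per line), you can inspect it with a few lines of standard-library Python. The sketch below is a minimal example, not part of crawler.sh. The helper names load_ndjson and field_names are made up for illustration, and it makes no assumption about which fields the file contains; it only lists the ones it finds.

```python
import json
from pathlib import Path


def load_ndjson(path):
    """Parse an NDJSON file: one JSON object per non-empty line."""
    records = []
    with Path(path).open(encoding="utf-8") as handle:
        for line in handle:
            line = line.strip()
            if line:  # skip blank lines
                records.append(json.loads(line))
    return records


def field_names(records):
    """Collect every top-level key that appears in any record."""
    names = set()
    for record in records:
        if isinstance(record, dict):
            names.update(record)
    return sorted(names)
```

Calling field_names(load_ndjson("example-com.crawl")) should show which keys hold the page URL and the canonical URL in your version of the file.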

Step 3: Run SEO audit

Run the SEO analysis on your crawl data:

crawler seo example-com.crawl

The non-self canonicals check flags every page where the canonical URL differs from the page’s own URL.
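To make the idea concrete, here is a rough Python sketch of such a comparison. It illustrates the concept only and is not the crawler's actual implementation. The normalization rules (lowercasing the scheme and host, ignoring fragments, treating an empty path as "/") are assumptions, and the CLI may compare URLs more strictly or more loosely.

```python
from urllib.parse import urljoin, urlsplit, urlunsplit


def normalize(url):
    """Lowercase scheme and host, drop the fragment, default an empty path to '/'."""
    parts = urlsplit(url)
    return urlunsplit((
        parts.scheme.lower(),
        parts.netloc.lower(),
        parts.path or "/",
        parts.query,
        "",  # fragments are never sent to the server
    ))


def is_non_self_canonical(page_url, canonical):
    """Return True when a declared canonical resolves to a different URL."""
    if not canonical:
        return False  # no canonical declared, nothing to compare
    resolved = urljoin(page_url, canonical)  # canonicals may be relative
    return normalize(resolved) != normalize(page_url)
```

Under these assumptions, https://example.com/a?page=2 with a canonical of https://example.com/a counts as non-self, while a relative canonical of /a on https://example.com/a does not.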

Step 4: Identify non-self canonicals

Look for the Non-Self Canonicals section in the SEO report. Each entry shows the page URL and the canonical URL it points to. Common scenarios:

  • Correct usage: Paginated pages canonicalizing to page 1, HTTP pages canonicalizing to HTTPS, pages with query parameters canonicalizing to the clean URL
  • Incorrect usage: Unique pages accidentally pointing to a different page, all pages pointing to the homepage, canonical URLs pointing to 404 pages
  • CMS issues: Plugins that set incorrect canonicals, template errors that hardcode a single canonical across multiple pages
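One quick way to spot the "all pages pointing to the homepage" and hardcoded-template patterns is to count how many flagged pages share the same canonical target. The following Python sketch assumes you have already collected the flagged entries as (page URL, canonical URL) pairs, for example by copying them from the report. The function name top_canonical_targets is hypothetical.

```python
from collections import Counter


def top_canonical_targets(pairs, limit=5):
    """Count how many flagged pages point at each canonical target.

    `pairs` is an iterable of (page_url, canonical_url) tuples.
    """
    counts = Counter(target for _page, target in pairs)
    return counts.most_common(limit)
```

A single target that collects a large share of otherwise unrelated pages is worth checking first, since it often indicates a template or plugin problem and not a deliberate choice.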

Step 5: Fix and re-crawl

For each flagged page:

  • Verify that intentional canonicals point to the correct target and that the target returns 200 OK
  • Fix incorrect canonicals by updating them to self-referencing canonicals or the correct target URL
  • Remove canonicals that point to non-existent pages (404s) or irrelevant pages
  • Check for canonical chains, where page A canonicalizes to B, which in turn canonicalizes to C, and point A directly at the final target
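Canonical chains are tedious to trace by hand, so a small script can help. The sketch below is one possible approach in Python, assuming you have built a dictionary that maps each page URL to its declared canonical URL. The names resolve_chain and find_chains are illustrative and not part of crawler.sh.

```python
def resolve_chain(start, canonical_of, max_hops=10):
    """Follow canonicals from `start` and return the list of URLs visited.

    `canonical_of` maps a page URL to its declared canonical URL.
    """
    path = [start]
    current = start
    for _ in range(max_hops):
        target = canonical_of.get(current)
        if not target or target == current:
            break  # self-referencing or unknown page: the chain ends here
        if target in path:
            path.append(target)  # loop detected, record it and stop
            break
        path.append(target)
        current = target
    return path


def find_chains(canonical_of):
    """Return {page: path} for pages whose canonical needs more than one hop."""
    chains = {}
    for page in canonical_of:
        path = resolve_chain(page, canonical_of)
        if len(path) > 2:
            chains[page] = path
    return chains
```

Each reported path ends at the URL that the first page should probably point to directly. A path that revisits a URL indicates a canonical loop.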

After fixing, re-crawl to verify:

crawler crawl https://example.com
crawler seo example-com.crawl
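If you keep the list of flagged pages from before the fix, you can compare it with the new list to confirm what changed. This is a minimal Python sketch, assuming both lists are available as (page URL, canonical URL) pairs. The function name compare_runs is hypothetical.

```python
def compare_runs(before, after):
    """Compare two collections of (page_url, canonical_url) pairs."""
    before, after = set(before), set(after)
    return {
        "resolved": sorted(before - after),   # flagged earlier, gone now
        "remaining": sorted(before & after),  # still flagged, unchanged
        "new": sorted(after - before),        # flagged only after the fix
    }
```

Entries under "new" deserve attention, since they can point to a regression introduced by the fix itself.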

Why non-self canonicals matter for SEO

Canonical tags are a strong signal to search engines. An incorrect canonical can effectively remove a page from search results, even if the page has valuable, unique content. Regularly auditing non-self canonicals helps you catch mistakes early, before they affect your organic traffic. This check is especially important after site migrations, CMS updates, or URL structure changes.
