How to Find Non-Self Canonicals with CLI
Learn how to detect non-self canonical tags using crawler.sh CLI. Find pages pointing canonical URLs to different pages and audit your strategy.
A canonical tag tells search engines which version of a page is the “official” one. When a page’s canonical URL points to a different page, it signals that the current page is a duplicate that should not be indexed, and that all ranking signals should be consolidated on the canonical target. This is correct for true duplicates, but an incorrect canonical tag can accidentally deindex important pages.
This guide shows you how to find every page with a non-self canonical tag using the crawler.sh CLI.
Step 1: Install crawler.sh CLI
Install the CLI with a single command:
curl -fsSL https://install.crawler.sh | sh
This downloads the correct binary for your operating system and architecture, places it in ~/.crawler/bin/, and adds it to your PATH. Restart your terminal or run source ~/.bashrc (or ~/.zshrc) to pick up the new PATH entry.
Verify the installation:
crawler --version
Step 2: Crawl the target website
Run a full crawl of the website you want to audit:
crawler crawl https://example.com
The crawler records the canonical URL for every page it visits. Results are saved as an NDJSON file (.crawl) in the current directory. For larger sites:
crawler crawl https://example.com --max-pages 5000
Step 3: Run SEO audit
Run the SEO analysis on your crawl data:
crawler seo example-com.crawl
The non-self canonicals check flags every page where the canonical URL differs from the page’s own URL.
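To make the comparison concrete, here is a minimal sketch of the same rule applied outside the tool. It assumes you have copied page/canonical pairs into a tab-separated file; the file name pairs.tsv and the function name non_self_canonicals are hypothetical and not part of crawler.sh. It uses exact string matching, which may be stricter or looser than what the CLI does.

```shell
# Hypothetical input: pairs.tsv, one "page URL<TAB>canonical URL" pair per line.
# Prints every pair whose canonical differs from the page's own URL.
non_self_canonicals() {
  awk -F '\t' 'NF == 2 && $1 != $2 { print $1 " -> " $2 }' "$1"
}

# Usage (assumed file name):
#   non_self_canonicals pairs.tsv
```

Because the match is exact, a trailing slash or a difference in letter case counts as a non-self canonical in this sketch.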
Step 4: Identify non-self canonicals
Look for the Non-Self Canonicals section in the SEO report. Each entry shows the page URL and the canonical URL it points to. Common scenarios:
- Correct usage: Paginated pages canonicalizing to page 1, HTTP pages canonicalizing to HTTPS, pages with query parameters canonicalizing to the clean URL
- Incorrect usage: Unique pages accidentally pointing to a different page, all pages pointing to the homepage, canonical URLs pointing to 404 pages
- CMS issues: Plugins that set incorrect canonicals, template errors that hardcode a single canonical across multiple pages
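One pattern from the list above, many pages sharing a single canonical target, can be spotted by counting. The sketch below assumes a hypothetical tab-separated file of page/canonical pairs (one pair per line, each page listed once) that you have assembled from the report; the function name shared_targets and the default threshold of 3 are illustrative choices, not crawler.sh features.

```shell
# Hypothetical input: "page URL<TAB>canonical URL" per line.
# Usage: shared_targets pairs.tsv [min_pages]
# Prints "count<TAB>target" for canonical targets that at least
# min_pages other pages point to, highest count first.
shared_targets() {
  awk -F '\t' -v min="${2:-3}" '
    NF == 2 && $1 != $2 { count[$2]++ }   # ignore self-referencing pairs
    END {
      for (t in count)
        if (count[t] >= min) print count[t] "\t" t
    }
  ' "$1" | sort -rn
}
```

A high count is not automatically a problem: paginated series and parameterized URLs legitimately share a target. A homepage or a single template URL at the top of the list is the case worth checking first.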
Step 5: Fix and re-crawl
For each flagged page:
- Verify intentional canonicals are pointing to the correct target and that the target returns 200 OK
- Fix incorrect canonicals by updating them to self-referencing canonicals or the correct target URL
- Remove canonicals that point to non-existent pages (404s) or irrelevant pages
- Check for canonical chains, where page A canonicalizes to B, which in turn canonicalizes to C
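Canonical chains are tedious to find by eye, so a small script can help. This is a sketch under the same assumption as before: a hypothetical tab-separated file of page/canonical pairs, with URLs compared as exact strings. The function name canonical_chains is illustrative and not part of crawler.sh.

```shell
# Hypothetical input: "page URL<TAB>canonical URL" per line.
# Prints "A -> B -> C" whenever A's canonical target B is itself a page
# with a non-self canonical pointing to C.
canonical_chains() {
  awk -F '\t' '
    NF == 2 && $1 != $2 { canon[$1] = $2 }   # keep only non-self canonicals
    END {
      for (a in canon) {
        b = canon[a]
        if (b in canon) print a " -> " b " -> " canon[b]
      }
    }
  ' "$1" | sort
}
```

Loops (A canonicalizes to B and B back to A) also appear in the output, once from each side. Longer chains show up as overlapping three-URL segments.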
After fixing, re-crawl to verify:
crawler crawl https://example.com
crawler seo example-com.crawl
Why non-self canonicals matter for SEO
Canonical tags are a strong signal to search engines. An incorrect canonical can effectively remove a page from search results, even if the page has valuable, unique content. Regularly auditing non-self canonicals helps you catch mistakes early, before they affect your organic traffic. This check is especially important after site migrations, CMS updates, or URL structure changes.