canonical URL
A canonical URL is the preferred URL for a page. It is declared to search engines through an HTML element so they know which version of the page to index.
A canonical URL is specified using the <link rel="canonical" href="..."> tag in the <head> section of a page. It tells search engines which URL should be treated as the authoritative version when the same or similar content is accessible at multiple URLs.
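To show how the tag is read in practice, here is a minimal sketch using Python's standard html.parser. It returns the href of the first link element whose rel includes "canonical". The names CanonicalParser and extract_canonical are hypothetical. For simplicity, the sketch does not check that the tag actually sits inside <head>.

```python
from html.parser import HTMLParser
from typing import Optional


class CanonicalParser(HTMLParser):
    """Collects the href of the first <link rel="canonical"> tag (hypothetical helper)."""

    def __init__(self) -> None:
        super().__init__()
        self.canonical: Optional[str] = None

    def handle_starttag(self, tag, attrs):
        # Only the first canonical link is kept; later ones are ignored.
        if tag != "link" or self.canonical is not None:
            return
        attributes = dict(attrs)
        # rel may contain several space-separated tokens.
        rel_tokens = (attributes.get("rel") or "").lower().split()
        if "canonical" in rel_tokens and attributes.get("href"):
            self.canonical = attributes["href"].strip()


def extract_canonical(html: str) -> Optional[str]:
    """Return the canonical href declared in the HTML, or None if there is none."""
    parser = CanonicalParser()
    parser.feed(html)
    parser.close()
    return parser.canonical


page = '<html><head><link rel="canonical" href="https://example.com/shoes/"></head><body></body></html>'
print(extract_canonical(page))  # https://example.com/shoes/
```

The href may be relative (for example "/shoes/"). A checker should resolve it against the page URL before comparing the two.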
Why canonical URLs matter
Duplicate content is common on the web. The same page might be accessible with and without trailing slashes, with different query parameters, or through HTTP and HTTPS. Without a canonical tag, search engines must guess which version to index. This can split ranking signals across multiple URLs, weakening the page’s search performance.
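The sketch below illustrates how several of these variants can collapse to a single page. It is written in Python and assumes a few hypothetical normalization rules: HTTPS is preferred, trailing slashes are dropped, hostnames are case-insensitive, and an assumed list of tracking parameters does not change the content. Real sites differ. Normalizing URLs this way only helps you spot duplicates. It does not replace a canonical tag on the page.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Hypothetical query parameters assumed not to change page content.
IGNORED_PARAMS = {"utm_source", "utm_medium", "utm_campaign", "sessionid"}


def normalize(url: str) -> str:
    """Map a URL variant onto one comparison key under the assumed rules."""
    parts = urlsplit(url)
    scheme = "https"  # assumption: HTTPS is the preferred scheme
    host = parts.netloc.lower()
    path = parts.path or "/"
    if path != "/" and path.endswith("/"):
        path = path.rstrip("/")  # assumption: the non-slash form is preferred
    # Drop ignored parameters and sort the rest so their order does not matter.
    query = urlencode(sorted(
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key.lower() not in IGNORED_PARAMS
    ))
    return urlunsplit((scheme, host, path, query, ""))  # fragment dropped


variants = [
    "http://Example.com/shoes/",
    "https://example.com/shoes",
    "https://example.com/shoes?utm_source=newsletter",
]
print({normalize(u) for u in variants})  # {'https://example.com/shoes'}
```

Three distinct URLs map to the same key here. Each of them is a separate URL that a search engine could index independently unless a canonical tag points it at the preferred one.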
A self-referencing canonical (where the canonical URL points to the page itself) is the most common and recommended setup. It confirms to search engines that this URL is the preferred version.
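A minimal Python sketch of a self-referencing check follows. It resolves the canonical href against the page URL and compares scheme, host, path and query, ignoring the fragment. The function name is_self_canonical is hypothetical. The comparison is deliberately strict: a canonical that differs only by a trailing slash or a query string counts as non-self, because it points at a different URL.

```python
from urllib.parse import urljoin, urlsplit


def is_self_canonical(page_url: str, canonical_href: str) -> bool:
    """Return True if the canonical href resolves to the page's own URL (hypothetical helper)."""
    # Resolve relative hrefs such as "/shoes/" against the page URL.
    target = urljoin(page_url, canonical_href)
    page, canon = urlsplit(page_url), urlsplit(target)
    # Compare every component except the fragment, which is never sent to the server.
    return (
        page.scheme == canon.scheme
        and page.netloc.lower() == canon.netloc.lower()
        and page.path == canon.path
        and page.query == canon.query
    )


print(is_self_canonical("https://example.com/shoes/", "/shoes/"))                          # True
print(is_self_canonical("https://example.com/shoes/?color=red", "https://example.com/shoes/"))  # False
```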
Non-self canonicals
A non-self canonical occurs when a page’s canonical tag points to a different URL. This is sometimes intentional (for example, syndicated content pointing back to the original) but often accidental. Common causes include:
- A CMS generating incorrect canonical URLs after a site migration
- Parameter URLs canonicalizing to incorrect base URLs
- Staging or development URLs leaking into canonical tags
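The causes above can be triaged with simple heuristics. The Python sketch below takes (page URL, canonical URL) pairs, skips self-referencing ones and attaches a rough hint about the likely cause. It assumes both URLs are already absolute and normalized, so an exact string match counts as self-referencing. The names Finding, likely_cause and audit, and the STAGING_MARKERS list, are hypothetical illustrations. They are not how crawler.sh works internally.

```python
from dataclasses import dataclass
from typing import List, Tuple
from urllib.parse import urlsplit

# Hypothetical hostname fragments that suggest a staging or development environment.
STAGING_MARKERS = ("staging", "dev.", "localhost", "127.0.0.1")


@dataclass
class Finding:
    page_url: str
    canonical_url: str
    hint: str


def likely_cause(page_url: str, canonical_url: str) -> str:
    """Guess why a canonical points elsewhere; a heuristic, not a diagnosis."""
    page, canon = urlsplit(page_url), urlsplit(canonical_url)
    host = canon.hostname or ""  # .hostname is already lowercased
    if any(marker in host for marker in STAGING_MARKERS):
        return "canonical points at a staging or development host"
    if page.hostname != canon.hostname:
        return "canonical points at a different host (syndication or migration?)"
    if page.query and canon.path != page.path:
        return "parameter URL canonicalizes to a different path"
    return "canonical points at another URL on the same host"


def audit(pages: List[Tuple[str, str]]) -> List[Finding]:
    """Return one Finding per page whose canonical is not the page itself."""
    findings = []
    for page_url, canonical_url in pages:
        if canonical_url == page_url:
            continue  # self-referencing: nothing to report
        findings.append(Finding(page_url, canonical_url, likely_cause(page_url, canonical_url)))
    return findings


pages = [
    ("https://example.com/a", "https://example.com/a"),
    ("https://example.com/b", "https://staging.example.com/b"),
    ("https://example.com/list?page=2", "https://example.com/"),
]
for f in audit(pages):
    print(f.page_url, "->", f.canonical_url, ":", f.hint)
```

A hint is only a starting point. An intentional non-self canonical, such as syndicated content pointing back to the original, will still be listed and needs a human decision.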
How crawler.sh helps
The crawler seo command detects non-self canonical URLs across all crawled pages. Each flagged page shows both the page URL and the canonical target so you can verify whether the canonical is correct or needs fixing. Regular audits catch canonical issues before they impact your search visibility.