Sitemap XML

A sitemap XML file lists the pages on a website to help search engines discover and crawl content more efficiently.

A sitemap XML file is a structured document that lists the URLs on a website, each with optional metadata such as last modification date, change frequency, and priority. It follows the Sitemap protocol published at sitemaps.org and usually sits at the site root, for example https://example.com/sitemap.xml.

Why sitemaps matter

Sitemaps help search engines discover pages that might be hard to find through normal crawling. Search engines follow links to discover content, so pages with few internal links, or pages that sit deep in the site structure, can be missed. Listing these pages in a sitemap tells crawlers that they exist, although it does not guarantee that they will be crawled or indexed.

Sitemaps are especially useful for:

  • Large sites with thousands of pages
  • New sites with few external backlinks
  • Sites with content behind JavaScript rendering
  • Pages that are not well-linked in the site navigation

What a sitemap contains

A basic sitemap lists URLs wrapped in XML tags. Each URL entry can include optional metadata:

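  • <loc> - The absolute URL of the page (required)
  • <lastmod> - The date the page was last modified, in W3C Datetime format (for example 2024-01-15)
  • <changefreq> - How often the page is expected to change (a hint only; Google ignores it)
  • <priority> - Priority relative to other pages on the same site, from 0.0 to 1.0 with a default of 0.5 (Google ignores this as well)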
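
To make the structure concrete, here is a minimal sketch in Python that builds a sitemap from a list of pages using only the standard library. The function name build_sitemap and the sample URLs and dates are illustrative assumptions, not part of any tool described here. ET.indent requires Python 3.9 or later.

```python
import xml.etree.ElementTree as ET

# Namespace defined by the Sitemap protocol (sitemaps.org).
SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(pages):
    """Return sitemap XML for a list of page dicts.

    Each dict needs a 'loc' key; 'lastmod', 'changefreq', and
    'priority' are optional. (Hypothetical helper for illustration.)
    """
    # Serialize the sitemap namespace as the default xmlns.
    ET.register_namespace("", SITEMAP_NS)
    urlset = ET.Element(f"{{{SITEMAP_NS}}}urlset")
    for page in pages:
        url = ET.SubElement(urlset, f"{{{SITEMAP_NS}}}url")
        # Emit child tags in the order the protocol documents them.
        for tag in ("loc", "lastmod", "changefreq", "priority"):
            if tag in page:
                child = ET.SubElement(url, f"{{{SITEMAP_NS}}}{tag}")
                child.text = str(page[tag])
    ET.indent(urlset)  # pretty-print; Python 3.9+
    body = ET.tostring(urlset, encoding="unicode")
    return '<?xml version="1.0" encoding="UTF-8"?>\n' + body


if __name__ == "__main__":
    print(build_sitemap([
        {"loc": "https://example.com/", "lastmod": "2024-01-15", "priority": 0.8},
        {"loc": "https://example.com/about"},
    ]))
```

Only <loc> is mandatory, so the second entry in the usage example produces a <url> element with a single child.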

How crawler.sh helps

To crawl a site and generate a valid XML sitemap containing only pages that return a 200 status code, run:

  crawler crawl https://example.com -f sitemap

You can also crawl first with the default NDJSON format and export to a sitemap later:

  crawler export example-com.crawl -f sitemap

The crawler also detects existing sitemaps during the crawl and reports them in the site info.
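
Whichever tool produced it, a sitemap can be spot-checked by parsing it and listing its <loc> values. The sketch below is independent of crawler.sh and uses only the Python standard library. The function name sitemap_urls is an assumption made for illustration, and the caller is expected to read the sitemap text from wherever the file was saved.

```python
import xml.etree.ElementTree as ET

# Prefix mapping for the Sitemap protocol namespace (sitemaps.org).
NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}


def sitemap_urls(xml_text):
    """Return the <loc> values from sitemap XML text.

    Hypothetical helper: it handles a plain <urlset> sitemap only,
    not a sitemap index file.
    """
    # Parse from bytes so an encoding declaration in the text is honored.
    root = ET.fromstring(xml_text.strip().encode("utf-8"))
    return [
        loc.text.strip()
        for loc in root.findall("sm:url/sm:loc", NS)
        if loc.text
    ]
```

Comparing the length of the returned list with the number of pages you expect is a quick way to notice missing or unexpected entries before submitting the file to a search engine.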
