March 4, 2026

v0.4.0: Better Content Extraction

This release significantly improves content extraction accuracy and adds richer article metadata: site name, lead image, and publish and modified dates.

Mehmet Kose

What’s New in v0.4.0

Richer Article Metadata

Crawl results now include additional metadata when available:

Site Name - the publication or site name (e.g. “The New York Times”)
Lead Image - the article’s primary image URL
Publish Date - when the article was originally published
Modified Date - when the article was last updated

These fields appear automatically in your .crawl files and JSON exports whenever the page provides them. No configuration changes are needed.
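
Because the new fields are only present "when available", code that consumes a JSON export should treat each of them as optional. Below is a minimal sketch of that pattern in Python. The key names (site_name, lead_image, published_at, modified_at), the ISO 8601 date format, and the sample record are all assumptions made for illustration; they are not taken from the crawler.sh export schema, so check a real export for the actual names.

```python
import json
from datetime import datetime
from typing import Optional

# Hypothetical key names, assumed for illustration only.
# Inspect an actual JSON export to find the real ones.
METADATA_KEYS = ("site_name", "lead_image", "published_at", "modified_at")


def parse_date(value: Optional[str]) -> Optional[datetime]:
    """Parse an ISO 8601 timestamp; return None if it is absent or malformed."""
    if not value:
        return None
    try:
        return datetime.fromisoformat(value)
    except (ValueError, TypeError):
        return None


def article_metadata(record: dict) -> dict:
    """Collect the optional metadata fields, using None for anything missing."""
    meta = {key: record.get(key) for key in METADATA_KEYS}
    meta["published_at"] = parse_date(meta["published_at"])
    meta["modified_at"] = parse_date(meta["modified_at"])
    return meta


# A made-up record with no modified date, to exercise the missing-field path.
sample = json.loads("""
{
  "url": "https://example.com/post",
  "site_name": "Example News",
  "lead_image": "https://example.com/img/lead.jpg",
  "published_at": "2026-03-01T09:30:00+00:00"
}
""")

meta = article_metadata(sample)
print(meta["site_name"])
print(meta["published_at"])
print(meta["modified_at"])
```

Returning None for missing or unparseable values keeps downstream code from failing on pages that expose only some of the metadata.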

Who Benefits

Content teams using Content Archive export get cleaner markdown with less noise from page chrome
SEO professionals analyzing content across large sites get more reliable word counts and excerpts
Developers building on crawl data get structured article metadata without additional scraping

About crawler.sh

crawler.sh is a fast Rust-based web crawler and SEO auditing tool that runs entirely on your own machine. Use the CLI for automation, scripts, and CI pipelines, or the desktop app for a visual dashboard with live crawl progress, SEO issue charts, and one-click exports.

Every release ships across both the CLI and the desktop app. Download the latest version or run crawler update from the terminal to upgrade an existing install.
