v0.7.3: Content Freshness Signals
New SEO check that flags missing, stale, and inconsistent dateModified / datePublished metadata across JSON-LD, Open Graph, and HTTP Last-Modified.
What’s New in v0.7.3
Content Freshness Signals (check #24)
crawler.sh now audits the date metadata on every crawled page. Search engines and AI answer engines weight recency, and pages without clear publication and modification dates lose visibility in regular search, Google Discover, News, and AI citations. The new check catches the entire failure mode: missing dates, stale content, conflicting sources, and bad data formats.
The audit runs on every crawl, not just --content runs. It flags six distinct issues:
- Missing freshness signals - the page exposes no date in any source.
- Stale content - the most recent valid date is older than 730 days (configurable).
- Inconsistent freshness signals - two same-kind dates disagree by more than 7 days.
- Invalid date format - a non-empty date string fails to parse.
- dateModified before datePublished - a logical impossibility, usually a CMS bug.
- Missing structured data dates - the page has Open Graph or HTTP dates but no JSON-LD datePublished/dateModified, weakening rich-result eligibility.
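The six checks above can be sketched as a single classifier. This is a minimal illustration, not the tool's actual Rust implementation; the function name freshness_issues, its argument shape (parsed dates grouped by source, plus a list of unparseable raw strings), and the exact flagging order are assumptions for the sake of the example. The 730-day and 7-day thresholds come from the release notes.

```python
from datetime import datetime, timezone

STALE_AFTER_DAYS = 730   # default --stale-after-days
DISAGREEMENT_DAYS = 7    # tolerance between same-kind dates

def freshness_issues(jsonld, og, raw_invalid, now=None):
    """Classify one page's date metadata into the issue types above.

    jsonld / og: dicts with optional 'published' and 'modified'
    datetime values; raw_invalid: raw date strings that failed to
    parse. Returns a list of issue labels.
    """
    now = now or datetime.now(timezone.utc)
    issues = []
    all_dates = [d for d in (*jsonld.values(), *og.values()) if d]

    if raw_invalid:
        issues.append("invalid date format")
    if not all_dates and not raw_invalid:
        issues.append("missing freshness signals")
    if all_dates and (now - max(all_dates)).days > STALE_AFTER_DAYS:
        issues.append("stale content")
    for kind in ("published", "modified"):
        a, b = jsonld.get(kind), og.get(kind)
        if a and b and abs((a - b).days) > DISAGREEMENT_DAYS:
            issues.append("inconsistent freshness signals")
    if jsonld.get("modified") and jsonld.get("published") \
            and jsonld["modified"] < jsonld["published"]:
        issues.append("dateModified before datePublished")
    if all_dates and not any(jsonld.values()):
        issues.append("missing structured data dates")
    return issues
```

A page can trip several checks at once, e.g. a stale page whose only date lives in an Open Graph tag is flagged both stale and missing structured data dates.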
Where the Dates Come From
For every HTML page, crawler.sh reads up to five date sources:
- JSON-LD datePublished/dateModified inside Article, BlogPosting, NewsArticle, WebPage, Report, and related types (including @graph entries).
- Open Graph <meta property="article:published_time"> and <meta property="article:modified_time">.
- HTTP Last-Modified response header.
- Readability-extracted dates (when content extraction is enabled).
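The Open Graph and HTTP sources are plain meta tags and a response header, so reading them needs nothing exotic. Here is a standard-library sketch; the function name og_and_header_dates and the headers-as-dict interface are assumptions for illustration, not crawler.sh internals.

```python
from email.utils import parsedate_to_datetime
from html.parser import HTMLParser

OG_PROPS = ("article:published_time", "article:modified_time")

class OGDateParser(HTMLParser):
    """Collect article:published_time / article:modified_time meta tags."""
    def __init__(self):
        super().__init__()
        self.dates = {}

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        a = dict(attrs)
        if a.get("property") in OG_PROPS:
            self.dates[a["property"]] = a.get("content")

def og_and_header_dates(html, headers):
    """Return Open Graph dates from the markup plus the HTTP
    Last-Modified header, normalized to ISO 8601."""
    parser = OGDateParser()
    parser.feed(html)
    last_modified = headers.get("Last-Modified")
    if last_modified:
        # RFC 9110 date format, e.g. "Wed, 01 May 2024 00:00:00 GMT"
        parser.dates["Last-Modified"] = parsedate_to_datetime(last_modified).isoformat()
    return parser.dates
```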
JSON-LD parsing is bounded for safety: up to 8 script tags per page, each capped at 256 KB. Malformed JSON is skipped silently, so adversarial pages cannot break the crawl.
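The bounded JSON-LD pass can be sketched like this. The caps (8 scripts, 256 KB each) and the skip-on-malformed behavior are from the release notes; the function name jsonld_dates, the set of recognized types, and the first-match-wins merge are assumptions made for the example.

```python
import json

MAX_SCRIPTS = 8                  # per-page cap on JSON-LD script tags
MAX_SCRIPT_BYTES = 256 * 1024    # per-script size cap

DATED_TYPES = {"Article", "BlogPosting", "NewsArticle", "WebPage", "Report"}

def jsonld_dates(script_bodies):
    """Pull datePublished/dateModified out of raw JSON-LD script bodies,
    honoring the bounds above; malformed JSON is skipped silently so
    adversarial pages cannot break the crawl."""
    found = {}
    for body in script_bodies[:MAX_SCRIPTS]:
        if len(body.encode("utf-8")) > MAX_SCRIPT_BYTES:
            continue
        try:
            data = json.loads(body)
        except ValueError:
            continue  # malformed JSON: skip, never crash
        nodes = list(data) if isinstance(data, list) else [data]
        # Unwrap @graph entries into the candidate list
        for node in list(nodes):
            if isinstance(node, dict):
                nodes.extend(n for n in node.get("@graph", []) if isinstance(n, dict))
        for node in nodes:
            if isinstance(node, dict) and node.get("@type") in DATED_TYPES:
                for key in ("datePublished", "dateModified"):
                    if key in node and key not in found:
                        found[key] = node[key]
    return found
```

Note that real-world JSON-LD can also carry @type as an array; a production extractor would need to handle that case too.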
CLI Usage
```shell
# Run the SEO audit with the default 730-day staleness threshold
crawler seo example-com.crawl

# Tighten the threshold to one year
crawler seo example-com.crawl --stale-after-days 365

# Export including freshness rows
crawler seo example-com.crawl --export csv --output report.csv
```

The freshness rows appear in the same crawler seo output, alongside the existing 23 categories. CSV and TXT exports include them automatically.
Desktop App
Two updates land in the desktop app:
- SEO Issues card now lists the same six freshness rows alongside the existing checks.
- Content Freshness card is a new dashboard card that summarizes the median page age, the percentage of pages updated in the last 90 days and the last year, and shows a per-page list with color-coded “updated X ago” badges (green up to 90 days, amber up to a year, orange up to two years, red beyond).
The Content Freshness card replaces the Newsletter card slot.
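The card's badge thresholds and summary stats are simple to express in code. A minimal sketch, assuming the function names badge_color and freshness_summary (the thresholds and the 90-day/1-year percentages are the ones described above; everything else is illustrative):

```python
from statistics import median

def badge_color(age_days):
    """Map an 'updated X ago' age to the card's badge color:
    green up to 90 days, amber up to a year, orange up to two
    years, red beyond."""
    if age_days <= 90:
        return "green"
    if age_days <= 365:
        return "amber"
    if age_days <= 730:
        return "orange"
    return "red"

def freshness_summary(ages_days):
    """Summarize per-page ages the way the card does: median age
    plus the share of pages updated within 90 days and within a year."""
    return {
        "median_age_days": median(ages_days),
        "pct_updated_90d": 100 * sum(a <= 90 for a in ages_days) / len(ages_days),
        "pct_updated_1y": 100 * sum(a <= 365 for a in ages_days) / len(ages_days),
    }
```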
Who Benefits
- SEO professionals see exactly which pages are missing the date metadata search engines need for freshness signals.
- Content teams catch CMS bugs that cause dateModified to drift away from the on-page date.
- AEO practitioners identify pages that AI engines are likely to skip due to missing or conflicting recency information.
- News and blog publishers make sure every article ships with valid Article schema dates before publication.
About crawler.sh
crawler.sh is a fast Rust-based web crawler and SEO auditing tool that runs entirely on your own machine. Use the CLI for automation, scripts, and CI pipelines, or the desktop app for a visual dashboard with live crawl progress, SEO issue charts, and one-click exports.
Every release ships across both the CLI and the desktop app.
Download the latest version, or run crawler update from the terminal to upgrade an existing install.