v0.7.3: Content Freshness Signals
New SEO check that flags missing, stale, and inconsistent dateModified / datePublished metadata across JSON-LD, Open Graph, and HTTP Last-Modified.
What’s New in v0.7.3
Content Freshness Signals (check #24)
crawler.sh now audits the date metadata on every crawled page. Search engines and AI answer engines weight recency, and pages without clear publication and modification dates lose visibility in regular search, Google Discover, News, and AI citations. The new check catches the entire failure mode: missing dates, stale content, conflicting sources, and bad data formats.
The audit runs on every crawl, not just --content runs. It flags six distinct issues:
- Missing freshness signals - the page exposes no date in any source.
- Stale content - the most recent valid date is older than 730 days (configurable).
- Inconsistent freshness signals - two same-kind dates disagree by more than 7 days.
- Invalid date format - a non-empty date string fails to parse.
- dateModified before datePublished - a logical impossibility, usually a CMS bug.
- Missing structured data dates - the page has Open Graph or HTTP dates but no JSON-LD datePublished/dateModified, weakening rich-result eligibility.
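The six checks above can be sketched as a single classifier. This is a minimal illustration, not the tool's actual Rust implementation; the function name freshness_issues, its argument shape (parsed dates grouped by source, plus a list of unparseable raw strings), and the exact flagging order are assumptions for the sake of the example. The 730-day and 7-day thresholds come from the release notes.

```python
from datetime import datetime, timezone

STALE_AFTER_DAYS = 730   # default --stale-after-days
DISAGREEMENT_DAYS = 7    # tolerance between same-kind dates

def freshness_issues(jsonld, og, raw_invalid, now=None):
    """Classify one page's date metadata into the issue types above.

    jsonld / og: dicts with optional 'published' and 'modified'
    datetime values; raw_invalid: raw date strings that failed to
    parse. Returns a list of issue labels.
    """
    now = now or datetime.now(timezone.utc)
    issues = []
    all_dates = [d for d in (*jsonld.values(), *og.values()) if d]

    if raw_invalid:
        issues.append("invalid date format")
    if not all_dates and not raw_invalid:
        issues.append("missing freshness signals")
    if all_dates and (now - max(all_dates)).days > STALE_AFTER_DAYS:
        issues.append("stale content")
    for kind in ("published", "modified"):
        a, b = jsonld.get(kind), og.get(kind)
        if a and b and abs((a - b).days) > DISAGREEMENT_DAYS:
            issues.append("inconsistent freshness signals")
    if jsonld.get("modified") and jsonld.get("published") \
            and jsonld["modified"] < jsonld["published"]:
        issues.append("dateModified before datePublished")
    if all_dates and not any(jsonld.values()):
        issues.append("missing structured data dates")
    return issues
```

A page can trip several checks at once, e.g. a stale page whose only date lives in an Open Graph tag is flagged both stale and missing structured data dates.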
Where the Dates Come From
For every HTML page, crawler.sh reads up to five date sources:
- JSON-LD datePublished/dateModified inside Article, BlogPosting, NewsArticle, WebPage, Report, and related types (including @graph entries).
- Open Graph <meta property="article:published_time"> and <meta property="article:modified_time">.
- HTTP Last-Modified response header.
- Readability-extracted dates (when content extraction is enabled).
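The Open Graph and HTTP sources are plain meta tags and a response header, so reading them needs nothing exotic. Here is a standard-library sketch; the function name og_and_header_dates and the headers-as-dict interface are assumptions for illustration, not crawler.sh internals.

```python
from email.utils import parsedate_to_datetime
from html.parser import HTMLParser

OG_PROPS = ("article:published_time", "article:modified_time")

class OGDateParser(HTMLParser):
    """Collect article:published_time / article:modified_time meta tags."""
    def __init__(self):
        super().__init__()
        self.dates = {}

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        a = dict(attrs)
        if a.get("property") in OG_PROPS:
            self.dates[a["property"]] = a.get("content")

def og_and_header_dates(html, headers):
    """Return Open Graph dates from the markup plus the HTTP
    Last-Modified header, normalized to ISO 8601."""
    parser = OGDateParser()
    parser.feed(html)
    last_modified = headers.get("Last-Modified")
    if last_modified:
        # RFC 9110 date format, e.g. "Wed, 01 May 2024 00:00:00 GMT"
        parser.dates["Last-Modified"] = parsedate_to_datetime(last_modified).isoformat()
    return parser.dates
```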
JSON-LD parsing is bounded for safety: up to 8 script tags per page, each capped at 256 KB. Malformed JSON is skipped silently, so adversarial pages cannot break the crawl.
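The bounded JSON-LD pass can be sketched like this. The caps (8 scripts, 256 KB each) and the skip-on-malformed behavior are from the release notes; the function name jsonld_dates, the set of recognized types, and the first-match-wins merge are assumptions made for the example.

```python
import json

MAX_SCRIPTS = 8                  # per-page cap on JSON-LD script tags
MAX_SCRIPT_BYTES = 256 * 1024    # per-script size cap

DATED_TYPES = {"Article", "BlogPosting", "NewsArticle", "WebPage", "Report"}

def jsonld_dates(script_bodies):
    """Pull datePublished/dateModified out of raw JSON-LD script bodies,
    honoring the bounds above; malformed JSON is skipped silently so
    adversarial pages cannot break the crawl."""
    found = {}
    for body in script_bodies[:MAX_SCRIPTS]:
        if len(body.encode("utf-8")) > MAX_SCRIPT_BYTES:
            continue
        try:
            data = json.loads(body)
        except ValueError:
            continue  # malformed JSON: skip, never crash
        nodes = list(data) if isinstance(data, list) else [data]
        # Unwrap @graph entries into the candidate list
        for node in list(nodes):
            if isinstance(node, dict):
                nodes.extend(n for n in node.get("@graph", []) if isinstance(n, dict))
        for node in nodes:
            if isinstance(node, dict) and node.get("@type") in DATED_TYPES:
                for key in ("datePublished", "dateModified"):
                    if key in node and key not in found:
                        found[key] = node[key]
    return found
```

Note that real-world JSON-LD can also carry @type as an array; a production extractor would need to handle that case too.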
CLI Usage
```shell
# Run the SEO audit with the default 730-day staleness threshold
crawler seo example-com.crawl

# Tighten the threshold to one year
crawler seo example-com.crawl --stale-after-days 365

# Export including freshness rows
crawler seo example-com.crawl --export csv --output report.csv
```

The freshness rows appear in the same crawler seo output, alongside the existing 23 categories. CSV and TXT exports include them automatically.
Desktop App
Two updates land in the desktop app:
- SEO Issues card now lists the same six freshness rows alongside the existing checks.
- Content Freshness card is a new dashboard card that summarizes the median page age, the percentage of pages updated in the last 90 days and the last year, and shows a per-page list with color-coded “updated X ago” badges (green up to 90 days, amber up to a year, orange up to two years, red beyond).
The Content Freshness card replaces the Newsletter card slot.
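The card's badge thresholds and summary stats are simple to express in code. A minimal sketch, assuming the function names badge_color and freshness_summary (the thresholds and the 90-day/1-year percentages are the ones described above; everything else is illustrative):

```python
from statistics import median

def badge_color(age_days):
    """Map an 'updated X ago' age to the card's badge color:
    green up to 90 days, amber up to a year, orange up to two
    years, red beyond."""
    if age_days <= 90:
        return "green"
    if age_days <= 365:
        return "amber"
    if age_days <= 730:
        return "orange"
    return "red"

def freshness_summary(ages_days):
    """Summarize per-page ages the way the card does: median age
    plus the share of pages updated within 90 days and within a year."""
    return {
        "median_age_days": median(ages_days),
        "pct_updated_90d": 100 * sum(a <= 90 for a in ages_days) / len(ages_days),
        "pct_updated_1y": 100 * sum(a <= 365 for a in ages_days) / len(ages_days),
    }
```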
Who Benefits
- SEO professionals see exactly which pages are missing the date metadata search engines need for freshness signals.
- Content teams catch CMS bugs that cause dateModified to drift away from the on-page date.
- AEO practitioners identify pages that AI engines are likely to skip due to missing or conflicting recency information.
- News and blog publishers make sure every article ships with valid Article schema dates before publication.
About crawler.sh
crawler.sh is a fast Rust-based web crawler and SEO auditing tool that runs entirely on your own machine. Use the CLI for automation, scripts, and CI pipelines, or the desktop app for a visual dashboard with live crawl progress, SEO issue charts, and one-click exports.
Every release ships across both the CLI and the desktop app.
Download the latest version, or run crawler update from the terminal to upgrade an existing install.