Changelog
March 5, 2026

v0.5.0: Automatic Keyword Extraction

Automatic keyword extraction from crawled pages using RAKE, with XSS/injection sanitization for safer content handling.

Mehmet Kose

What’s New in v0.5.0

Automatic Keyword Extraction

Every crawled page now includes automatically extracted keywords using the RAKE (Rapid Automatic Keyword Extraction) algorithm. Keywords are ranked by relevance and included in your crawl results - no configuration needed.

This gives you instant insight into the topical focus of each page without manual analysis. Keywords appear in .crawl files, JSON exports, and the desktop app’s content view.

RAKE works by splitting the page content into candidate phrases at stop words and punctuation, then scoring each word by how often it appears and how many other words it co-occurs with inside those phrases. A phrase's score is the sum of its word scores. The result is a ranked list of multi-word keyword phrases that reflect the core topics covered on each page.
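To make the scoring concrete, here is a minimal Python sketch of the classic RAKE approach. It is an illustration only, not Crawler's actual implementation: the stop-word list is a tiny placeholder, and the names `candidate_phrases` and `rake` are hypothetical.

```python
import re
from collections import defaultdict

# Tiny illustrative stop-word list (an assumption; real lists are far longer).
STOP_WORDS = {
    "a", "an", "and", "the", "of", "for", "in", "on", "to", "is", "are",
    "with", "by", "from", "that", "this", "it", "as", "at", "or", "be",
}


def candidate_phrases(text):
    """Split text into candidate phrases at punctuation and stop words."""
    phrases = []
    for fragment in re.split(r"[^\w\s'-]+|\n+", text.lower()):
        current = []
        for word in fragment.split():
            if word in STOP_WORDS:
                if current:
                    phrases.append(tuple(current))
                current = []
            else:
                current.append(word)
        if current:
            phrases.append(tuple(current))
    return phrases


def rake(text, top_n=5):
    """Return the top_n (phrase, score) pairs, highest score first."""
    phrases = candidate_phrases(text)
    freq = defaultdict(int)    # how often each word appears
    degree = defaultdict(int)  # co-occurrence count, including the word itself
    for phrase in phrases:
        for word in phrase:
            freq[word] += 1
            degree[word] += len(phrase)
    word_score = {w: degree[w] / freq[w] for w in freq}
    # A phrase's score is the sum of its word scores.
    scores = {" ".join(p): sum(word_score[w] for w in p) for p in set(phrases)}
    # Sort by score descending, then alphabetically for stable output.
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))[:top_n]


sample = (
    "Keyword extraction for crawled pages. "
    "Keyword extraction is automatic and the crawled pages are ranked."
)
for phrase, score in rake(sample):
    print(f"{score:.1f}  {phrase}")
```

Because each word's score grows with the length of the phrases it appears in, multi-word phrases such as "keyword extraction" rank above single words such as "automatic".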

Practical Use Cases

Keyword extraction opens up several workflows that were previously manual or required separate tools. You can compare the extracted keywords across pages to find content overlap and cannibalization. You can verify that target keywords actually appear in the body content, not just the title and meta tags. And you can export crawl results as JSON to feed into content planning spreadsheets or reporting dashboards.
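As a rough illustration of the overlap check, the Python sketch below compares keyword sets between pages using Jaccard similarity. The input shape (a mapping from page URL to a list of keywords) is an assumption made for the example, not the documented JSON export schema, and `keyword_overlap` is a hypothetical helper.

```python
from itertools import combinations


def keyword_overlap(pages, threshold=0.5):
    """Return page pairs whose keyword sets overlap at or above threshold.

    `pages` maps a URL to its list of keywords (assumed shape).
    Each result is (url_a, url_b, jaccard_score, shared_keywords).
    """
    results = []
    for (url_a, kw_a), (url_b, kw_b) in combinations(sorted(pages.items()), 2):
        a = {k.lower() for k in kw_a}
        b = {k.lower() for k in kw_b}
        union = a | b
        if not union:
            continue  # neither page has keywords, nothing to compare
        score = len(a & b) / len(union)
        if score >= threshold:
            results.append((url_a, url_b, round(score, 2), sorted(a & b)))
    return sorted(results, key=lambda r: -r[2])


# Hypothetical data standing in for keywords pulled from a crawl export.
pages = {
    "/pricing": ["pricing plans", "free tier", "seo spider"],
    "/plans": ["pricing plans", "free tier", "team seats"],
    "/blog": ["keyword extraction"],
}
for url_a, url_b, score, shared in keyword_overlap(pages):
    print(url_a, url_b, score, shared)
```

Pairs with a high score are candidates for consolidation or clearer differentiation. The 0.5 threshold is arbitrary and worth tuning per site.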

Content Sanitization

All extracted content is now sanitized against XSS and injection attacks before being stored. This protects downstream tools and workflows that consume crawl data, especially when crawling untrusted or third-party sites. The sanitization layer strips dangerous HTML constructs - including script tags, event handlers, and encoded payloads - while preserving the meaningful text content. This runs automatically on all extracted Markdown, keywords, titles, and descriptions.
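The general idea can be sketched with Python's standard library. This is a simplified illustration under stated assumptions, not Crawler's actual sanitization layer: it drops all markup (so event-handler attributes disappear with their tags), skips the contents of script and style elements, and escapes whatever text remains so that entity-encoded payloads stay inert. The names `TextOnly` and `sanitize` are hypothetical.

```python
from html import escape
from html.parser import HTMLParser


class TextOnly(HTMLParser):
    """Collect text content, skipping anything inside script/style."""

    SKIP = {"script", "style"}

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.parts = []
        self.skip_depth = 0

    def handle_starttag(self, tag, attrs):
        # Attributes (including onclick and friends) are never copied out.
        if tag in self.SKIP:
            self.skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIP and self.skip_depth:
            self.skip_depth -= 1

    def handle_data(self, data):
        if not self.skip_depth:
            self.parts.append(data)


def sanitize(fragment):
    """Return plain, HTML-escaped text extracted from an untrusted fragment."""
    parser = TextOnly()
    parser.feed(fragment)
    parser.close()  # flush any buffered text
    text = " ".join("".join(parser.parts).split())  # normalize whitespace
    # Escape so decoded entities such as &lt;script&gt; cannot become markup.
    return escape(text, quote=True)


print(sanitize('<p onclick="steal()">Hello <b>world</b></p><script>alert(1)</script>'))
```

A production sanitizer would typically rely on a maintained, allowlist-based library rather than a hand-rolled parser like this one.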

Who Benefits

  • SEO professionals can quickly identify keyword gaps and topical alignment across large sites
  • Content teams get automatic keyword analysis without running separate tools
  • Developers consuming crawl data get safer, sanitized output by default

Wrap-up

A CMS shouldn't slow you down. Crawler aims to fit into your workflow, whether you're coding content models, collaborating on product copy, or launching updates at 2am.

If that sounds like the kind of tooling you want to use, try Crawler.
