Clean Markdown from any website. Without the cloud bill.

crawler.sh is a local crawler that turns websites into RAG-ready Markdown for AI training, fine-tuning, and agent context. It renders JavaScript with a custom engine, respects robots.txt, and runs on your laptop. No headless Chrome, no per-page fees, no API quota.


Markdown for AI

RAG-ready Markdown from any page.

Extract the main article content from any page as clean Markdown, ready for RAG pipelines, fine-tuning corpora, or agent context. Every page ships with word count, author byline, language, and excerpt. Bulk export the entire site as a Markdown archive.


JavaScript Rendering

SPAs rendered without headless Chrome.

A custom JavaScript render engine handles React, Vue, Next, and other SPAs without spinning up headless Chrome. A Chrome 131 TLS fingerprint and a shared cookie jar mean session-walled pages render with the right state. Rendering is auto-detected per site, or you can force it on or off.


Polite Crawling

robots.txt and adaptive pacing by default.

Respects Disallow, Allow, and Crawl-delay out of the box. Adapts per-host pacing on 429 and 403 responses with exponential backoff, and slows down automatically on protected sites. Important when you are building an AI dataset and the source matters.
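The behavior described above can be sketched in a few lines of Python. This is only an illustration, not crawler.sh's actual implementation: it uses the standard library's `urllib.robotparser`, and the robots.txt content, the `example-bot` user agent, the `next_delay` helper, and the 60-second cap are all assumptions made for the example.

```python
import urllib.robotparser

# Hypothetical robots.txt content; a real crawler would fetch it from the host.
# Python's parser applies the first matching rule, so Allow is listed first.
ROBOTS_TXT = """\
User-agent: *
Allow: /private/public-page
Disallow: /private/
Crawl-delay: 2
"""


def build_parser(robots_txt: str) -> urllib.robotparser.RobotFileParser:
    """Parse robots.txt text into a parser that answers can_fetch() queries."""
    parser = urllib.robotparser.RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


def next_delay(base_delay: float, consecutive_throttles: int, cap: float = 60.0) -> float:
    """Double the per-host delay for each consecutive 429/403 response, up to a cap."""
    return min(cap, base_delay * (2 ** consecutive_throttles))
```

A crawler built this way would call `can_fetch` before every request, start from the host's Crawl-delay, and feed the count of consecutive 429 or 403 responses into `next_delay` to decide how long to wait before the next request to that host.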


SEO Analysis

Automated checks across every page.

Detect missing titles, duplicate meta descriptions, noindex directives, thin content, broken links, long URLs, content freshness signals, and more. Useful before you ship a site, or before you train on one. Export issues as CSV or TXT.
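To make a few of these checks concrete, here is a minimal Python sketch. It is not how crawler.sh implements them: the page dictionary fields (`url`, `title`, `meta_description`, `robots_meta`, `word_count`), the issue names, and both thresholds are hypothetical and chosen only for illustration.

```python
from collections import defaultdict

THIN_WORDS = 200   # assumed threshold for "thin content"
MAX_URL_LEN = 115  # assumed threshold for "long URL"


def audit(pages):
    """Return a list of (url, issue) tuples for a list of crawled page dicts."""
    issues = []
    by_description = defaultdict(list)
    for page in pages:
        url = page["url"]
        if not page.get("title", "").strip():
            issues.append((url, "missing_title"))
        if "noindex" in page.get("robots_meta", "").lower():
            issues.append((url, "noindex"))
        if page.get("word_count", 0) < THIN_WORDS:
            issues.append((url, "thin_content"))
        if len(url) > MAX_URL_LEN:
            issues.append((url, "long_url"))
        description = page.get("meta_description", "").strip()
        if description:
            by_description[description].append(url)
    # A description shared by more than one URL is flagged on every URL using it.
    for urls in by_description.values():
        if len(urls) > 1:
            issues.extend((u, "duplicate_meta_description") for u in urls)
    return issues
```

Per-page checks run in a single pass, while the duplicate check needs the whole crawl, which is why descriptions are grouped first and evaluated at the end.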


Workflow Examples

From quick crawl to full pipeline


Built for Every Workflow

Content Archiving

Extract readable content from any website as clean Markdown. Perfect for backups, migrations, or feeding content into other tools.

SEO Auditing

Run 24 automated checks across every page: find missing titles, duplicate descriptions, thin content, content freshness signals, and more before they hurt your rankings.

Sitemap Generation

Generate Sitemap XML from a live crawl, following the sitemaps.org protocol. Keep your sitemaps accurate and up to date without manual maintenance.
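As a rough illustration of what turning a crawl into a sitemap involves, the Python sketch below builds a `urlset` document with the standard library. The page dictionary fields (`url`, `status`, `lastmod`) and the rule of keeping only 200 responses are assumptions for this example, not a description of crawler.sh's output.

```python
import xml.etree.ElementTree as ET

# Namespace defined by the Sitemaps protocol.
SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(pages):
    """Build Sitemap XML from crawled pages, keeping only pages that returned 200."""
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for page in pages:
        if page["status"] != 200:
            continue  # skip broken or redirected pages
        url = ET.SubElement(urlset, "url")
        ET.SubElement(url, "loc").text = page["url"]
        if page.get("lastmod"):
            ET.SubElement(url, "lastmod").text = page["lastmod"]
    # Serialize as UTF-8 bytes with an XML declaration, then decode to str.
    return ET.tostring(urlset, encoding="utf-8", xml_declaration=True).decode("utf-8")
```

ElementTree escapes characters such as `&` in URLs automatically, which is one of the details that hand-written sitemaps tend to get wrong.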

Site Monitoring

Crawl your site regularly to catch broken links, missing pages, and status code changes before your visitors do.
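One simple way to picture this kind of monitoring is a comparison between two crawl snapshots. The Python sketch below assumes each crawl has been reduced to a hypothetical mapping of URL to HTTP status code; the `diff_crawls` name and the data shape are illustrative only.

```python
def diff_crawls(previous, current):
    """Compare two {url: status_code} snapshots and list what changed.

    Returns (url, old_status, new_status) tuples; new_status is None when a
    URL from the previous crawl was not found in the current one.
    """
    changes = []
    for url, old_status in previous.items():
        new_status = current.get(url)
        if new_status is None:
            changes.append((url, old_status, None))
        elif new_status != old_status:
            changes.append((url, old_status, new_status))
    return changes
```

Running a comparison like this after each scheduled crawl surfaces pages that started returning errors or disappeared since the last run.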

Crawl any website, find every issue, and export the data you need, all from your own machine.

crawler.sh: Fast, local-first, and privacy-friendly