
Crawler Cloud - Hosted Crawling API

Crawler Cloud is a hosted version of crawler.sh that runs the same Rust crawling and SEO engine you use locally, but on managed infrastructure. Instead of starting a crawl from your laptop or a CI runner, you submit jobs to an API endpoint, and Crawler Cloud handles fetching, JavaScript rendering, content extraction, SEO analysis, and result storage. The desktop app and CLI continue to work the same way; Crawler Cloud is an additional surface for teams that need scheduled crawls, larger page budgets, and a shared dashboard across an organization.

The goal is to keep the developer ergonomics of the CLI (NDJSON output, deterministic SEO checks, the same .crawl file format) while removing the operational work of running crawls on your own machines. If you have ever wanted to crawl a marketing site every night and diff the results, fan out hundreds of small crawls across a portfolio of domains, or pipe SEO issues into a notification channel, Crawler Cloud is being built for those workflows.

REST API

Start crawls, check status, and retrieve results via a simple REST API. It works from any language or platform that can make HTTP requests.

Scheduled Crawls

Set up recurring crawls at daily, weekly, or custom intervals, and track changes over time.

Webhooks

Receive real-time notifications when crawls complete. Integrate with your existing pipelines.

Dashboard

Monitor crawl jobs, browse results, and manage your account through a web dashboard.

The API will follow a standard job model. You POST a crawl request with a target URL and configuration (max pages, max depth, concurrency, JavaScript rendering mode, content extraction toggle), get back a job id, and poll a status endpoint or wait for a webhook. Completed jobs return a downloadable .crawl NDJSON file plus pre-computed SEO summaries, so you can either consume results directly or pipe them into the local crawler info, crawler export, and crawler seo subcommands. Authentication uses bearer tokens scoped to an organization.
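
The job model described above can be sketched in Python. The API has not been published yet, so everything specific here is an assumption made for illustration: the base URL, the `/crawls` path, the JSON field names, and the `state` values are all hypothetical. The sketch only shows the shape of the flow: build a POST with a bearer token, then poll until the job reaches a terminal state.

```python
import json
import time
from typing import Callable

# Hypothetical base URL; the real endpoint has not been announced.
API_BASE = "https://api.example.invalid/v1"


def build_crawl_request(url: str, token: str, max_pages: int = 100,
                        max_depth: int = 3, concurrency: int = 4,
                        render_js: bool = False,
                        extract_content: bool = True) -> dict:
    """Describe the POST that would start a crawl job.

    All field names are assumptions that mirror the options listed in
    the text (max pages, max depth, concurrency, rendering, extraction).
    """
    return {
        "method": "POST",
        "url": f"{API_BASE}/crawls",  # hypothetical path
        "headers": {
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        "body": json.dumps({
            "url": url,
            "max_pages": max_pages,
            "max_depth": max_depth,
            "concurrency": concurrency,
            "render_js": render_js,
            "extract_content": extract_content,
        }),
    }


def wait_for_job(job_id: str, get_status: Callable[[str], dict],
                 interval: float = 5.0, timeout: float = 600.0,
                 sleep: Callable[[float], None] = time.sleep) -> dict:
    """Poll get_status(job_id) until the job is completed or failed.

    get_status is injected so any HTTP client can be plugged in; the
    "state" key and its values are assumed, not documented.
    """
    deadline = time.monotonic() + timeout
    while True:
        status = get_status(job_id)
        if status.get("state") in {"completed", "failed"}:
            return status
        if time.monotonic() >= deadline:
            raise TimeoutError(f"job {job_id} did not finish in {timeout}s")
        sleep(interval)
```

Passing the status function in as an argument keeps the polling loop independent of the HTTP client, and makes it easy to swap polling for a webhook handler later.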

Recurring crawls are defined per project with cron-style expressions, sensible defaults (daily at 03:00 UTC), and configurable retry policies for transient network failures or rate limiting. Each scheduled run is stored as an immutable artifact, which makes it straightforward to diff two crawls and surface what changed: new pages, removed pages, status code regressions, broken outgoing links, missing canonicals, drifting titles, or content freshness signals that have gone stale. This is the workflow that drives the dashboard’s change view.
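
To illustrate the kind of diff described above, here is a minimal sketch that compares two crawl results stored as NDJSON. The record layout is an assumption: each line is taken to be a JSON object with hypothetical `url`, `status`, and `title` fields, which may not match the actual .crawl schema. It covers only new pages, removed pages, status changes, and title changes.

```python
import json


def load_pages(ndjson_text: str) -> dict:
    """Index NDJSON records by URL (assumes a "url" field on each record)."""
    pages = {}
    for line in ndjson_text.splitlines():
        line = line.strip()
        if not line:
            continue  # tolerate blank lines between records
        record = json.loads(line)
        pages[record["url"]] = record
    return pages


def diff_crawls(old: dict, new: dict) -> dict:
    """Summarize what changed between two crawls of the same site."""
    old_urls, new_urls = set(old), set(new)
    common = sorted(old_urls & new_urls)
    return {
        "added": sorted(new_urls - old_urls),
        "removed": sorted(old_urls - new_urls),
        # (url, old_status, new_status) for pages whose status code changed
        "status_changed": [
            (u, old[u].get("status"), new[u].get("status"))
            for u in common
            if old[u].get("status") != new[u].get("status")
        ],
        # URLs whose title differs between the two runs
        "title_changed": [
            u for u in common
            if old[u].get("title") != new[u].get("title")
        ],
    }
```

Because each scheduled run is kept as an immutable artifact, a comparison like this can be rerun against any two runs without worrying that the inputs have changed.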

When a crawl finishes, Crawler Cloud delivers a signed webhook with the job id, target URL, page count, status distribution, top SEO issue categories, and a short-lived download URL for the full result file. Deliveries use exponential backoff on non-2xx responses, and every payload is signed with an HMAC header so you can verify authenticity before triggering downstream work like opening tickets, posting to Slack, or updating a content audit dashboard.
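
Verifying the signature on the receiving side might look like the sketch below. The text says only that payloads carry an HMAC header, so the details here are assumptions: SHA-256 as the hash, a hex-encoded digest, and a signature computed over the raw request body. The header name is left out because it has not been specified.

```python
import hashlib
import hmac


def sign_payload(secret: bytes, body: bytes) -> str:
    """Compute a hex HMAC-SHA256 digest of the raw body (assumed scheme)."""
    return hmac.new(secret, body, hashlib.sha256).hexdigest()


def verify_signature(secret: bytes, body: bytes, signature: str) -> bool:
    """Return True if the received signature matches the expected one.

    hmac.compare_digest performs a constant-time comparison, which
    avoids leaking how many leading characters matched.
    """
    expected = sign_payload(secret, body)
    return hmac.compare_digest(expected.encode(), signature.encode())
```

Verify against the raw bytes of the request body, before any JSON parsing, since re-serializing the payload can change whitespace or key order and invalidate the signature. Responding with a 2xx status only after verification succeeds lets the retry behavior described above cover transient failures on your side.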

The web dashboard surfaces the same views as the desktop app (Live Feed, SEO Issues, Page Status, Redirects, Content Freshness) but across every crawl in your organization, with per-project history and team access controls. You can browse extracted Markdown, download exports, share read-only links with stakeholders, and manage scheduled jobs without touching the API.

We are actively developing Crawler Cloud. Sign up for our newsletter on the crawler.sh homepage to get notified when the private beta opens, and email hello@crawler.sh if you want to be considered for early access.