All frequently asked questions

Getting Started

How do I install crawler.sh?

Install the CLI with a single command:

    curl -fsSL https://install.crawler.sh | sh

The desktop app is available as a direct download from the product page. See the documentation for full details.

What platforms does crawler.sh support?

The CLI runs on macOS (Apple Silicon and Intel) and Linux (x86_64). The desktop app is available for macOS (Apple Silicon and Intel) and Linux. Windows support is on the roadmap.

Do I need to create an account?

No. You can use both the CLI and the desktop app without an account. Signing in raises the page limit from 50 pages per crawl (anonymous use) to 400. Everything runs locally on your machine.

CLI & Usage

What subcommands does the CLI provide?

The CLI has four subcommands: crawl (crawl a website and save the results), info (inspect and analyze a .crawl file), export (convert a .crawl file to JSON or Sitemap XML), and seo (analyze SEO issues across 24 check categories).

What output formats are supported?

The default crawl output is NDJSON (.crawl extension), a streamable format with one JSON object per line. You can also export to a JSON array, W3C-compliant Sitemap XML, SEO CSV (two columns: Issue Type and URL), and SEO TXT (a human-readable report grouped by issue).
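Because a .crawl file holds one JSON object per line, it can be read as a stream without loading the whole file. The following is a minimal Python sketch of that idea, not an official client. It makes no assumption about the field names inside each record, and the file name site.crawl in the usage note is hypothetical.

```python
import json
from typing import Iterable, Iterator


def iter_records(lines: Iterable[str]) -> Iterator[dict]:
    """Yield one parsed JSON object per non-empty NDJSON line."""
    for line in lines:
        line = line.strip()
        if not line:
            continue  # tolerate blank lines between records
        yield json.loads(line)


def count_records(path: str) -> int:
    """Stream a .crawl file and count its records one line at a time."""
    with open(path, encoding="utf-8") as handle:
        return sum(1 for _ in iter_records(handle))
```

For example, count_records("site.crawl") would return the number of records in that file. To see which fields a record actually contains, print the keys of the first object yielded by iter_records.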

How does content extraction work?

When enabled (on by default), the crawler extracts the main article content from HTML pages and converts it to clean Markdown. The result includes the Markdown text, word count, author byline, and excerpt. Use --no-extract to disable it for faster crawls.

Does crawler.sh support JavaScript-rendered pages?

Yes. crawler.sh includes a built-in JavaScript rendering engine. In the default Auto mode, it samples the first few pages and automatically enables rendering when it detects client-side JavaScript frameworks. You can also force rendering with --render or disable it with --no-render.
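For scripted runs, the rendering and extraction switches can be assembled programmatically. The sketch below is a hypothetical Python helper, not part of crawler.sh. The flags --render, --no-render, and --no-extract come from this FAQ; the binary name crawler and the URL as a positional argument after the crawl subcommand are assumptions, so check the documentation for the exact syntax.

```python
from typing import List, Optional


def build_crawl_argv(url: str, render: Optional[bool] = None,
                     extract: bool = True) -> List[str]:
    """Build an argument list for a crawl run.

    render=None leaves the default Auto mode untouched,
    True forces rendering, False disables it.
    """
    # Assumption: the binary is named "crawler" and takes the URL positionally.
    argv = ["crawler", "crawl", url]
    if render is True:
        argv.append("--render")
    elif render is False:
        argv.append("--no-render")
    if not extract:
        argv.append("--no-extract")  # skip Markdown extraction for speed
    return argv
```

The resulting list can be passed to subprocess.run(argv, check=True). Building a list rather than a shell string avoids quoting problems with URLs that contain characters such as & or ?.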

What does the SEO analysis check?

The seo subcommand checks 24 categories: missing/short/long titles, missing/short/long meta descriptions, missing/short/long content, long URLs, noindex pages, nofollow pages, non-self canonicals, paginated pages, duplicate titles, duplicate descriptions, missing/multiple/empty/long/short/duplicate H1 tags, broken outgoing links, and content freshness signals (missing, stale, or inconsistent dateModified / datePublished across JSON-LD, Open Graph, and HTTP Last-Modified). Only 2xx HTML pages are analyzed.
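The SEO CSV export has two columns, Issue Type and URL, which makes it easy to regroup in a script. The following Python sketch groups URLs by issue type. It assumes the file starts with a header row using exactly those two column names, which this FAQ does not confirm, and the issue labels in the usage note are made up for illustration.

```python
import csv
from collections import defaultdict
from typing import Dict, List, TextIO


def group_by_issue(handle: TextIO) -> Dict[str, List[str]]:
    """Map each issue type to the list of URLs flagged with it."""
    grouped: Dict[str, List[str]] = defaultdict(list)
    # Assumption: the first row is a header with "Issue Type" and "URL".
    for row in csv.DictReader(handle):
        grouped[row["Issue Type"]].append(row["URL"])
    return dict(grouped)
```

With a file opened via open("seo.csv", newline="", encoding="utf-8"), the result is a dictionary such as {"Missing Title": [...], "Long URL": [...]}, where the keys are whatever labels the export actually uses.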

Desktop App

What does the desktop app include?

The desktop app features 9 interactive dashboard cards: Live Feed (real-time crawled URLs), SEO Issues (automated analysis), Page Status (status code charts), Redirects (redirect chain audit), Settings (crawl configuration), Downloads (export results), Site Content (Markdown viewer), Account (sign-in and subscription), and Content Freshness (per-page age summary).

How is the desktop app different from the CLI?

Both share the same crawling engine. The CLI is designed for automation and scripting (pipe output, run in CI, etc.), while the desktop app provides a visual dashboard with charts, interactive cards, and real-time crawl monitoring.

What is included in the premium tier?

The premium tier ($99 per year) unlocks the Content Archive export: full page content as clean Markdown files in a ZIP archive. All other features (crawling, SEO analysis, JSON and Sitemap export) are available in the free tier.

Troubleshooting

My crawl stops early or hits the page limit before finishing.

Without signing in, crawls are capped at 50 pages per session. Signing in raises the limit to 400 pages on the free tier, and the Pro plan raises it to 10,000. If your crawl hits the cap before reaching the pages you care about, raise the limit, narrow the start URL to a sub-section of the site, or lower --max-depth so the crawler stays closer to the entry point.

JavaScript-rendered pages show up as nearly empty in the results.

In the default Auto mode, crawler.sh samples the first few pages and only enables JavaScript rendering when it detects a client-side framework or an empty body shell. If your site is heavy on client-side rendering but the sample missed it, force rendering with --render (CLI) or set the JS Render mode to Always in the desktop Settings card. If the issue is bot protection (Cloudflare, DataDome, PerimeterX, and similar), the SEO report will flag it and rendering alone will not bypass it.

I get permission errors or the CLI binary will not run on macOS or Linux.

On macOS, if Gatekeeper blocks the binary, run xattr -d com.apple.quarantine /usr/local/bin/crawler (or the path where the installer placed it) and try again. On Linux, make sure the binary is executable with chmod +x and that the install directory (~/.crawler/bin by default) is on your PATH. If the install script could not write to ~/.crawler/bin, set CRAWLER_INSTALL_DIR to a writable location and re-run the installer.
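The two Linux checks above (install directory on PATH, binary marked executable) can be scripted. The following is a small Python sketch under stated assumptions: the default directory ~/.crawler/bin comes from this FAQ, while the binary file name crawler inside it is an assumption based on the macOS path shown above.

```python
import os
from pathlib import Path


def dir_on_path(directory: str, path_env: str) -> bool:
    """Return True if directory appears as an entry in a PATH-style string."""
    target = Path(directory).expanduser().resolve()
    for entry in path_env.split(os.pathsep):
        if entry and Path(entry).expanduser().resolve() == target:
            return True
    return False


def is_executable(binary: str) -> bool:
    """Return True if binary exists as a file and has the execute bit set."""
    path = Path(binary).expanduser()
    return path.is_file() and os.access(path, os.X_OK)
```

For example, dir_on_path("~/.crawler/bin", os.environ.get("PATH", "")) reports whether the default install directory is reachable, and is_executable("~/.crawler/bin/crawler") reports whether chmod +x is still needed.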

Community & Support

Where can I get help with crawler.sh?

Reach out via email at hello@crawler.sh for any questions or support. The documentation covers installation, usage, and all CLI options.

Frequently Asked Questions

Everything you need to know about installing, configuring, and using crawler.sh. Browse by category to find answers about the CLI tool, the desktop app, output formats, SEO analysis, content extraction, and subscription tiers. Can't find what you're looking for? Reach out on our contact page.
