Crawler CLI Overview

Crawler CLI is a command-line web crawler. It crawls websites concurrently, extracts content as Markdown, analyzes SEO issues across 16 categories, and exports results in multiple formats - all from your terminal.

# Install
curl -fsSL https://install.crawler.sh | sh
# Verify
crawler --help

See the Installation guide for all installation methods and uninstall instructions.

The CLI follows a four-step workflow: crawl, inspect, analyze, and export.

  1. crawler crawl https://example.com -p 200

    Crawls up to 200 pages and saves results to example-com.crawl.

  2. crawler info example-com.crawl

    Displays domain, page count, file size, HTTP status distribution, and response time statistics (average, fastest, slowest).

  3. crawler seo example-com.crawl

    Runs 16 automated SEO checks and displays issues grouped by category with affected URLs.

  4. crawler seo example-com.crawl --export csv
    crawler export example-com.crawl -f json

    Export SEO issues as CSV or TXT, and convert crawl data to JSON or Sitemap XML.

# Crawl with defaults (100 pages, depth 10, 5 concurrent)
crawler crawl https://example.com
# URL prefix is optional
crawler crawl example.com
# Large site: 500 pages, depth 5, 10 concurrent requests
crawler crawl -p 500 -d 5 -c 10 https://example.com
# Quick surface crawl: 20 pages, depth 2
crawler crawl -p 20 -d 2 https://example.com
# Default NDJSON format (creates example-com.crawl)
crawler crawl https://example.com
# JSON output
crawler crawl -f json https://example.com
# Sitemap XML
crawler crawl -f sitemap https://example.com
# Custom output path
crawler crawl -f json -o site-data.json https://example.com
# Disable content extraction for speed
crawler crawl --no-extract https://example.com
# Combine for maximum speed
crawler crawl --no-extract --delay 50 -c 10 -p 500 https://example.com
# Verbose: show detailed logging
crawler crawl -v https://example.com
# Quiet: only print errors and the output file path
crawler crawl -q https://example.com
# View summary statistics
crawler info example-com.crawl

Output includes domain, total page count, file size, HTTP status code distribution, and response time stats (average, fastest, slowest with page paths).
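If you want the same numbers programmatically, the default `.crawl` file is NDJSON (one JSON object per line), so it can be tallied with a short script. A minimal sketch; the `response_ms` field name is an assumption for illustration, not a documented part of the format:

```python
import json
from collections import Counter

def summarize(crawl_path):
    """Tally HTTP status codes and response times from an NDJSON .crawl file."""
    statuses = Counter()
    times = []  # (response_ms, url) pairs
    with open(crawl_path) as f:
        for line in f:
            if not line.strip():
                continue
            page = json.loads(line)
            statuses[page["status"]] += 1
            # "response_ms" is a hypothetical field name; check your crawl file
            if "response_ms" in page:
                times.append((page["response_ms"], page["url"]))
    summary = {"pages": sum(statuses.values()), "statuses": dict(statuses)}
    if times:
        summary["avg_ms"] = sum(t for t, _ in times) / len(times)
        summary["fastest"] = min(times)
        summary["slowest"] = max(times)
    return summary
```

This mirrors what `crawler info` reports (page count, status distribution, fastest/slowest pages) without depending on the CLI's terminal output.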

# Display SEO issues in terminal
crawler seo example-com.crawl
# Export as CSV (two columns: Issue Type, URL)
crawler seo example-com.crawl --export csv
# Export as human-readable TXT
crawler seo example-com.crawl --export txt
# Custom export path
crawler seo example-com.crawl --export txt -o report.txt
# Convert .crawl to JSON
crawler export example-com.crawl -f json
# Convert .crawl to sitemap XML
crawler export example-com.crawl -f sitemap
# Custom output path
crawler export example-com.crawl -f sitemap -o sitemap.xml
# Full workflow: crawl, inspect, analyze SEO, export
crawler crawl https://example.com -p 200
crawler info example-com.crawl
crawler seo example-com.crawl
crawler seo example-com.crawl --export csv
# Quick crawl + sitemap generation
crawler crawl https://example.com -f sitemap
# Deep crawl with content extraction, then convert
crawler crawl https://example.com -p 1000 -d 20 -c 10
crawler export example-com.crawl -f json
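Conceptually, the NDJSON-to-JSON conversion that `crawler export -f json` performs is just collecting one object per line into an array. A rough sketch of that transformation, assuming each `.crawl` line is a single JSON object (this is an illustration, not the CLI's implementation):

```python
import json

def ndjson_to_json(src, dst):
    """Read one JSON object per line and write them out as a single JSON array."""
    with open(src) as f:
        pages = [json.loads(line) for line in f if line.strip()]
    with open(dst, "w") as out:
        json.dump(pages, out, indent=2)
    return len(pages)
```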
To generate a sitemap that reflects the full site structure:

  1. crawler crawl https://example.com -p 500 -d 20

    Crawl up to 500 pages with a max depth of 20 to capture the full site structure.

  2. crawler export example-com.crawl -f sitemap

    Creates example-com-sitemap.xml containing all pages with 2xx status codes.
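The same 2xx filter can be sketched in a few lines if you prefer to build the sitemap yourself from the JSON export. A minimal illustration, assuming the page objects carry `url` and `status` fields as shown in the JSON export example below:

```python
import json
from xml.sax.saxutils import escape

def build_sitemap(pages):
    """Render sitemap XML from page dicts, keeping only 2xx responses."""
    urls = [p["url"] for p in pages if 200 <= p["status"] < 300]
    entries = "\n".join(f"  <url><loc>{escape(u)}</loc></url>" for u in urls)
    return ('<?xml version="1.0" encoding="UTF-8"?>\n'
            '<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">\n'
            f"{entries}\n</urlset>")
```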

To run a full SEO audit:

  1. crawler crawl https://example.com -p 200

    Content extraction is enabled by default, which is required for word count and content-related SEO checks.

  2. crawler seo example-com.crawl

    Displays all 16 SEO check categories with affected URLs in the terminal.

  3. crawler seo example-com.crawl --export csv

    Creates example-com-seo.csv with two columns: Issue Type and URL.

  4. crawler seo example-com.crawl --export txt -o seo-report.txt

    Creates a human-readable report grouped by issue category with indented URLs.
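Because the CSV has just two columns, post-processing it is straightforward. For example, a small sketch that counts affected URLs per issue type, assuming the documented `Issue Type` and `URL` headers:

```python
import csv
from collections import Counter

def issue_counts(csv_path):
    """Count rows per issue type in the exported SEO CSV (columns: Issue Type, URL)."""
    counts = Counter()
    with open(csv_path, newline="") as f:
        for row in csv.DictReader(f):
            counts[row["Issue Type"]] += 1
    return counts
```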

To extract page content as Markdown and work with it as JSON:

  1. crawler crawl https://example.com -p 100

    By default, extract_content is enabled. Each crawled page includes a markdown field with the extracted content.

  2. crawler export example-com.crawl -f json

    The JSON output includes markdown, word_count, byline, and excerpt fields for each page where content was extracted.

  3. The JSON file contains an array of page objects:

    [
      {
        "url": "https://example.com/article",
        "title": "Example Article",
        "status": 200,
        "markdown": "# Example Article\n\nContent here...",
        "word_count": 350,
        "byline": "Author Name",
        "excerpt": "A brief summary..."
      }
    ]
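With that structure, downstream analysis is simple. A small sketch, assuming the JSON export shape shown above, that totals word counts and flags pages without extracted content:

```python
import json

def content_stats(json_path):
    """Return total extracted word count and URLs lacking a markdown field."""
    with open(json_path) as f:
        pages = json.load(f)
    total_words = sum(p.get("word_count", 0) for p in pages)
    missing = [p["url"] for p in pages if "markdown" not in p]
    return total_words, missing
```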

Use --no-extract to skip content extraction. This significantly reduces per-page processing time and output file size when you only need URLs and metadata.

crawler crawl --no-extract https://example.com

Increase --concurrency for faster crawling on sites that can handle it:

crawler crawl -c 10 -p 500 https://example.com

Lower --delay for faster crawling (use responsibly):

crawler crawl --delay 50 https://example.com
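As a back-of-the-envelope guide, you can estimate wall-clock crawl time from these knobs. The sketch below assumes `--delay` is a per-request pause in milliseconds and that each worker fetches pages serially; the average response time is a guess, not something the CLI reports up front:

```python
def estimate_seconds(pages, concurrency, delay_ms, avg_response_ms=200):
    """Rough lower bound on crawl wall-clock time.

    Assumes each worker serially fetches pages, pausing delay_ms between
    requests; avg_response_ms is an assumed average response time.
    """
    per_page_ms = avg_response_ms + delay_ms
    return pages / concurrency * per_page_ms / 1000
```

For 500 pages at `-c 10 --delay 50` with a ~200 ms average response, this works out to roughly 12.5 seconds of fetch time, ignoring parsing and content-extraction overhead.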