# Crawler CLI Overview

## What is Crawler CLI?

Crawler CLI is a command-line web crawler. It crawls websites concurrently, extracts content as Markdown, analyzes SEO issues across 16 categories, and exports results in multiple formats, all from your terminal.

```sh
# Install
curl -fsSL https://install.crawler.sh | sh

# Verify
crawler --help
```

See the Installation guide for all installation methods and uninstall instructions.
## Standard Workflow

The CLI follows a four-step workflow: crawl, inspect, analyze, and export.

1. **Crawl a website**

   ```sh
   crawler crawl https://example.com -p 200
   ```

   Crawls up to 200 pages and saves results to `example-com.crawl`.

2. **Inspect the results**

   ```sh
   crawler info example-com.crawl
   ```

   Displays domain, page count, file size, HTTP status distribution, and response time statistics (average, fastest, slowest).

3. **Analyze SEO issues**

   ```sh
   crawler seo example-com.crawl
   ```

   Runs 16 automated SEO checks and displays issues grouped by category with affected URLs.

4. **Export results**

   ```sh
   crawler seo example-com.crawl --export csv
   crawler export example-com.crawl -f json
   ```

   Export SEO issues as CSV or TXT, and convert crawl data to JSON or Sitemap XML.
## Command Examples

### Basic Crawl

```sh
# Crawl with defaults (100 pages, depth 10, 5 concurrent)
crawler crawl https://example.com

# URL prefix is optional
crawler crawl example.com
```

### Custom Limits

```sh
# Large site: 500 pages, depth 5, 10 concurrent requests
crawler crawl -p 500 -d 5 -c 10 https://example.com

# Quick surface crawl: 20 pages, depth 2
crawler crawl -p 20 -d 2 https://example.com
```

### Output Formats

```sh
# Default NDJSON format (creates example-com.crawl)
crawler crawl https://example.com

# JSON output
crawler crawl -f json https://example.com

# Sitemap XML
crawler crawl -f sitemap https://example.com

# Custom output path
crawler crawl -f json -o site-data.json https://example.com
```

### Fast Crawling

```sh
# Disable content extraction for speed
crawler crawl --no-extract https://example.com

# Combine for maximum speed
crawler crawl --no-extract --delay 50 -c 10 -p 500 https://example.com
```

### Verbose and Quiet Modes

```sh
# Verbose: show detailed logging
crawler crawl -v https://example.com

# Quiet: only print errors and the output file path
crawler crawl -q https://example.com
```

### Inspecting Crawl Files

```sh
# View summary statistics
crawler info example-com.crawl
```

Output includes domain, total page count, file size, HTTP status code distribution, and response time stats (average, fastest, slowest with page paths).
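Because the default `.crawl` format is NDJSON (one JSON object per line), you can also compute simple statistics yourself without the CLI. A minimal sketch, assuming each line is a JSON object with `url` and `status` fields — the exact field names are an assumption for illustration, not documented here:

```python
import json
from collections import Counter

def status_distribution(ndjson_text: str) -> Counter:
    """Count HTTP status codes across the pages in an NDJSON crawl dump."""
    counts = Counter()
    for line in ndjson_text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        page = json.loads(line)
        counts[page["status"]] += 1
    return counts

# Inline sample standing in for the contents of example-com.crawl.
sample = "\n".join([
    '{"url": "https://example.com/", "status": 200}',
    '{"url": "https://example.com/about", "status": 200}',
    '{"url": "https://example.com/old", "status": 404}',
])

print(status_distribution(sample))  # Counter({200: 2, 404: 1})
```

This is also a convenient escape hatch when you need a statistic `crawler info` doesn't report.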
## SEO Analysis

```sh
# Display SEO issues in terminal
crawler seo example-com.crawl

# Export as CSV (two columns: Issue Type, URL)
crawler seo example-com.crawl --export csv

# Export as human-readable TXT
crawler seo example-com.crawl --export txt

# Custom export path
crawler seo example-com.crawl --export txt -o report.txt
```

## Export and Convert
```sh
# Convert .crawl to JSON
crawler export example-com.crawl -f json

# Convert .crawl to sitemap XML
crawler export example-com.crawl -f sitemap

# Custom output path
crawler export example-com.crawl -f sitemap -o sitemap.xml
```

## Example Workflows
```sh
# Full workflow: crawl, inspect, analyze SEO, export
crawler crawl https://example.com -p 200
crawler info example-com.crawl
crawler seo example-com.crawl
crawler seo example-com.crawl --export csv

# Quick crawl + sitemap generation
crawler crawl https://example.com -f sitemap

# Deep crawl with content extraction, then convert
crawler crawl https://example.com -p 1000 -d 20 -c 10
crawler export example-com.crawl -f json
```

## Tutorials
### Generate a Sitemap from a Live Site

1. **Crawl the site**

   ```sh
   crawler crawl https://example.com -p 500 -d 20
   ```

   Crawl up to 500 pages with a max depth of 20 to capture the full site structure.

2. **Export as sitemap**

   ```sh
   crawler export example-com.crawl -f sitemap
   ```

   Creates `example-com-sitemap.xml` containing all pages with 2xx status codes.
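To sanity-check the generated file, you can list its URLs with Python's standard library. This sketch assumes the output follows the usual sitemap.org layout (`<urlset>` with `<url><loc>` entries under the `http://www.sitemaps.org/schemas/sitemap/0.9` namespace):

```python
import xml.etree.ElementTree as ET

NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}

def sitemap_urls(xml_text: str) -> list[str]:
    """Return every <loc> URL from a sitemap.org-style sitemap."""
    root = ET.fromstring(xml_text)
    return [loc.text for loc in root.findall("sm:url/sm:loc", NS)]

# A tiny hand-written sitemap standing in for example-com-sitemap.xml.
sample = """<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>https://example.com/</loc></url>
  <url><loc>https://example.com/about</loc></url>
</urlset>"""

print(sitemap_urls(sample))  # ['https://example.com/', 'https://example.com/about']
```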
### Analyze SEO Issues and Export a Report

1. **Crawl with content extraction**

   ```sh
   crawler crawl https://example.com -p 200
   ```

   Content extraction is enabled by default, which is required for word count and content-related SEO checks.

2. **View SEO analysis**

   ```sh
   crawler seo example-com.crawl
   ```

   Displays all 16 SEO check categories with affected URLs in the terminal.

3. **Export as CSV**

   ```sh
   crawler seo example-com.crawl --export csv
   ```

   Creates `example-com-seo.csv` with two columns: `Issue Type` and `URL`.

4. **Export as readable report**

   ```sh
   crawler seo example-com.crawl --export txt -o seo-report.txt
   ```

   Creates a human-readable report grouped by issue category with indented URLs.
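Because the CSV has just the two columns described above (`Issue Type`, `URL`), downstream processing is straightforward. A sketch that counts affected URLs per issue type, using an inline sample in place of `example-com-seo.csv` (the issue names in the sample are made up for illustration):

```python
import csv
import io
from collections import Counter

def issue_counts(csv_text: str) -> Counter:
    """Count affected URLs per issue type in a two-column SEO export."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return Counter(row["Issue Type"] for row in reader)

# Inline sample mimicking the exported CSV.
sample = """Issue Type,URL
Missing title,https://example.com/a
Missing title,https://example.com/b
Thin content,https://example.com/c
"""

print(issue_counts(sample).most_common())  # [('Missing title', 2), ('Thin content', 1)]
```

The same two-column shape also loads cleanly into a spreadsheet for manual triage.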
### Extract Content as Markdown

1. **Crawl with content extraction enabled**

   ```sh
   crawler crawl https://example.com -p 100
   ```

   By default, `extract_content` is enabled. Each crawled page includes a `markdown` field with the extracted content.

2. **Export to JSON**

   ```sh
   crawler export example-com.crawl -f json
   ```

   The JSON output includes `markdown`, `word_count`, `byline`, and `excerpt` fields for each page where content was extracted.

3. **Inspect the output**

   The JSON file contains an array of page objects:

   ```json
   [
     {
       "url": "https://example.com/article",
       "title": "Example Article",
       "status": 200,
       "markdown": "# Example Article\n\nContent here...",
       "word_count": 350,
       "byline": "Author Name",
       "excerpt": "A brief summary..."
     }
   ]
   ```
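Once exported, the JSON can be filtered and summarized with ordinary tooling. A sketch computing the average `word_count` across pages that had content extracted, using the page-object shape shown above (pages without extracted content are assumed to simply lack the field):

```python
import json

def average_word_count(pages_json: str) -> float:
    """Average word_count across pages that include extracted content."""
    pages = json.loads(pages_json)
    counts = [p["word_count"] for p in pages if "word_count" in p]
    return sum(counts) / len(counts) if counts else 0.0

# Inline sample standing in for the exported JSON file.
sample = json.dumps([
    {"url": "https://example.com/article", "status": 200, "word_count": 350},
    {"url": "https://example.com/about", "status": 200, "word_count": 150},
    {"url": "https://example.com/old", "status": 404},
])

print(average_word_count(sample))  # 250.0
```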
## Performance Tips

### Disable content extraction

Use `--no-extract` to skip content extraction. This significantly reduces per-page processing time and output file size when you only need URLs and metadata.

```sh
crawler crawl --no-extract https://example.com
```

### Tune concurrency

Increase `--concurrency` for faster crawling on sites that can handle it:

```sh
crawler crawl -c 10 -p 500 https://example.com
```

### Reduce delay

Lower `--delay` for faster crawling (use responsibly):

```sh
crawler crawl --delay 50 https://example.com
```
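When tuning `--concurrency` and `--delay`, a back-of-envelope model helps: wall-clock time is roughly pages × (delay + average response time) / concurrency. This is only an approximation that assumes all workers stay busy and the server keeps up — it is not how the crawler schedules requests internally:

```python
def estimated_crawl_seconds(pages: int, delay_ms: float,
                            avg_response_ms: float, concurrency: int) -> float:
    """Rough lower bound on wall-clock crawl time, in seconds."""
    per_page_ms = delay_ms + avg_response_ms
    return pages * per_page_ms / concurrency / 1000.0

# 500 pages, 50 ms delay, ~200 ms responses, 10 concurrent workers:
print(estimated_crawl_seconds(500, 50, 200, 10))  # 12.5
```

Use the estimate to decide whether raising `-c` or lowering `--delay` is the bigger win for a given site before burning a long crawl on bad settings.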