# Crawler CLI Overview

## What is Crawler CLI?

Crawler CLI is a command-line web crawler. It crawls websites concurrently, extracts content as Markdown, analyzes SEO issues across 16 categories, and exports results in multiple formats, all from your terminal.

```sh
# Install
curl -fsSL https://install.crawler.sh | sh

# Verify
crawler --help
```

See the Installation guide for all installation methods and uninstall instructions.
## Standard Workflow

The CLI follows a four-step workflow: crawl, inspect, analyze, and export.

1. **Crawl a website**

   ```sh
   crawler crawl https://example.com -p 200
   ```

   Crawls up to 200 pages and saves results to `example-com.crawl`.

2. **Inspect the results**

   ```sh
   crawler info example-com.crawl
   ```

   Displays domain, page count, file size, HTTP status distribution, and response time statistics (average, fastest, slowest).

3. **Analyze SEO issues**

   ```sh
   crawler seo example-com.crawl
   ```

   Runs 16 automated SEO checks and displays issues grouped by category with affected URLs.

4. **Export results**

   ```sh
   crawler seo example-com.crawl --export csv
   crawler export example-com.crawl -f json
   ```

   Export SEO issues as CSV or TXT, and convert crawl data to JSON or Sitemap XML.
## Command Examples

### Basic Crawl

```sh
# Crawl with defaults (100 pages, depth 10, 5 concurrent)
crawler crawl https://example.com

# URL prefix is optional
crawler crawl example.com
```

### Custom Limits

```sh
# Large site: 500 pages, depth 5, 10 concurrent requests
crawler crawl -p 500 -d 5 -c 10 https://example.com

# Quick surface crawl: 20 pages, depth 2
crawler crawl -p 20 -d 2 https://example.com
```

### Output Formats

```sh
# Default NDJSON format (creates example-com.crawl)
crawler crawl https://example.com

# JSON output
crawler crawl -f json https://example.com

# Sitemap XML
crawler crawl -f sitemap https://example.com

# Custom output path
crawler crawl -f json -o site-data.json https://example.com
```

### Fast Crawling

```sh
# Disable content extraction for speed
crawler crawl --no-extract https://example.com

# Combine for maximum speed
crawler crawl --no-extract --delay 50 -c 10 -p 500 https://example.com
```

### Verbose and Quiet Modes

```sh
# Verbose: show detailed logging
crawler crawl -v https://example.com

# Quiet: only print errors and the output file path
crawler crawl -q https://example.com
```

### Inspecting Crawl Files

```sh
# View summary statistics
crawler info example-com.crawl
```

Output includes domain, total page count, file size, HTTP status code distribution, and response time stats (average, fastest, slowest with page paths).
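Because the default `.crawl` format is NDJSON (one JSON object per line), you can also compute simple statistics yourself without the CLI. A minimal sketch, assuming each line is a JSON object with `url` and `status` fields — the exact field names are an assumption for illustration, not documented here:

```python
import json
from collections import Counter

def status_distribution(ndjson_text: str) -> Counter:
    """Count HTTP status codes across the pages in an NDJSON crawl dump."""
    counts = Counter()
    for line in ndjson_text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        page = json.loads(line)
        counts[page["status"]] += 1
    return counts

# Inline sample standing in for the contents of example-com.crawl.
sample = "\n".join([
    '{"url": "https://example.com/", "status": 200}',
    '{"url": "https://example.com/about", "status": 200}',
    '{"url": "https://example.com/old", "status": 404}',
])

print(status_distribution(sample))  # Counter({200: 2, 404: 1})
```

This is also a convenient escape hatch when you need a statistic `crawler info` doesn't report.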
## SEO Analysis

```sh
# Display SEO issues in terminal
crawler seo example-com.crawl

# Export as CSV (two columns: Issue Type, URL)
crawler seo example-com.crawl --export csv

# Export as human-readable TXT
crawler seo example-com.crawl --export txt

# Custom export path
crawler seo example-com.crawl --export txt -o report.txt
```

## Export and Convert
```sh
# Convert .crawl to JSON
crawler export example-com.crawl -f json

# Convert .crawl to sitemap XML
crawler export example-com.crawl -f sitemap

# Custom output path
crawler export example-com.crawl -f sitemap -o sitemap.xml
```

## Example Workflows
```sh
# Full workflow: crawl, inspect, analyze SEO, export
crawler crawl https://example.com -p 200
crawler info example-com.crawl
crawler seo example-com.crawl
crawler seo example-com.crawl --export csv

# Quick crawl + sitemap generation
crawler crawl https://example.com -f sitemap

# Deep crawl with content extraction, then convert
crawler crawl https://example.com -p 1000 -d 20 -c 10
crawler export example-com.crawl -f json
```

## Tutorials
### Generate a Sitemap from a Live Site

1. **Crawl the site**

   ```sh
   crawler crawl https://example.com -p 500 -d 20
   ```

   Crawl up to 500 pages with a max depth of 20 to capture the full site structure.

2. **Export as sitemap**

   ```sh
   crawler export example-com.crawl -f sitemap
   ```

   Creates `example-com-sitemap.xml` containing all pages with 2xx status codes.
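To sanity-check the generated file, you can list its URLs with Python's standard library. This sketch assumes the output follows the usual sitemap.org layout (`<urlset>` with `<url><loc>` entries under the `http://www.sitemaps.org/schemas/sitemap/0.9` namespace):

```python
import xml.etree.ElementTree as ET

NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}

def sitemap_urls(xml_text: str) -> list[str]:
    """Return every <loc> URL from a sitemap.org-style sitemap."""
    root = ET.fromstring(xml_text)
    return [loc.text for loc in root.findall("sm:url/sm:loc", NS)]

# A tiny hand-written sitemap standing in for example-com-sitemap.xml.
sample = """<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>https://example.com/</loc></url>
  <url><loc>https://example.com/about</loc></url>
</urlset>"""

print(sitemap_urls(sample))  # ['https://example.com/', 'https://example.com/about']
```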
### Analyze SEO Issues and Export a Report

1. **Crawl with content extraction**

   ```sh
   crawler crawl https://example.com -p 200
   ```

   Content extraction is enabled by default, which is required for word count and content-related SEO checks.

2. **View SEO analysis**

   ```sh
   crawler seo example-com.crawl
   ```

   Displays all 16 SEO check categories with affected URLs in the terminal.

3. **Export as CSV**

   ```sh
   crawler seo example-com.crawl --export csv
   ```

   Creates `example-com-seo.csv` with two columns: `Issue Type` and `URL`.

4. **Export as readable report**

   ```sh
   crawler seo example-com.crawl --export txt -o seo-report.txt
   ```

   Creates a human-readable report grouped by issue category with indented URLs.
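Because the CSV has just the two columns described above (`Issue Type`, `URL`), downstream processing is straightforward. A sketch that counts affected URLs per issue type, using an inline sample in place of `example-com-seo.csv` (the issue names in the sample are made up for illustration):

```python
import csv
import io
from collections import Counter

def issue_counts(csv_text: str) -> Counter:
    """Count affected URLs per issue type in a two-column SEO export."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return Counter(row["Issue Type"] for row in reader)

# Inline sample mimicking the exported CSV.
sample = """Issue Type,URL
Missing title,https://example.com/a
Missing title,https://example.com/b
Thin content,https://example.com/c
"""

print(issue_counts(sample).most_common())  # [('Missing title', 2), ('Thin content', 1)]
```

The same two-column shape also loads cleanly into a spreadsheet for manual triage.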
### Extract Content as Markdown

1. **Crawl with content extraction enabled**

   ```sh
   crawler crawl https://example.com -p 100
   ```

   By default, `extract_content` is enabled. Each crawled page includes a `markdown` field with the extracted content.

2. **Export to JSON**

   ```sh
   crawler export example-com.crawl -f json
   ```

   The JSON output includes `markdown`, `word_count`, `byline`, and `excerpt` fields for each page where content was extracted.

3. **Inspect the output**

   The JSON file contains an array of page objects:

   ```json
   [
     {
       "url": "https://example.com/article",
       "title": "Example Article",
       "status": 200,
       "markdown": "# Example Article\n\nContent here...",
       "word_count": 350,
       "byline": "Author Name",
       "excerpt": "A brief summary..."
     }
   ]
   ```
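Once exported, the JSON can be filtered and summarized with ordinary tooling. A sketch computing the average `word_count` across pages that had content extracted, using the page-object shape shown above (pages without extracted content are assumed to simply lack the field):

```python
import json

def average_word_count(pages_json: str) -> float:
    """Average word_count across pages that include extracted content."""
    pages = json.loads(pages_json)
    counts = [p["word_count"] for p in pages if "word_count" in p]
    return sum(counts) / len(counts) if counts else 0.0

# Inline sample standing in for the exported JSON file.
sample = json.dumps([
    {"url": "https://example.com/article", "status": 200, "word_count": 350},
    {"url": "https://example.com/about", "status": 200, "word_count": 150},
    {"url": "https://example.com/old", "status": 404},
])

print(average_word_count(sample))  # 250.0
```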
## Performance Tips

### Disable content extraction

Use `--no-extract` to skip content extraction. This significantly reduces per-page processing time and output file size when you only need URLs and metadata.

```sh
crawler crawl --no-extract https://example.com
```

### Tune concurrency

Increase `--concurrency` for faster crawling on sites that can handle it:

```sh
crawler crawl -c 10 -p 500 https://example.com
```

### Reduce delay

Lower `--delay` for faster crawling (use responsibly):

```sh
crawler crawl --delay 50 https://example.com
```
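When tuning `--concurrency` and `--delay`, a back-of-envelope model helps: wall-clock time is roughly pages × (delay + average response time) / concurrency. This is only an approximation that assumes all workers stay busy and the server keeps up — it is not how the crawler schedules requests internally:

```python
def estimated_crawl_seconds(pages: int, delay_ms: float,
                            avg_response_ms: float, concurrency: int) -> float:
    """Rough lower bound on wall-clock crawl time, in seconds."""
    per_page_ms = delay_ms + avg_response_ms
    return pages * per_page_ms / concurrency / 1000.0

# 500 pages, 50 ms delay, ~200 ms responses, 10 concurrent workers:
print(estimated_crawl_seconds(500, 50, 200, 10))  # 12.5
```

Use the estimate to decide whether raising `-c` or lowering `--delay` is the bigger win for a given site before burning a long crawl on bad settings.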