Guides
March 6, 2026

How to Find Long Content with CLI

Learn how to detect pages with over 5,000 words using the crawler.sh CLI. Find excessively long pages that may need to be split for better user experience and SEO.

Mehmet Kose
3 mins read

Pages with more than 5,000 words can overwhelm readers and may indicate content that should be split into multiple focused pages. While long-form content can rank well, excessively long pages often suffer from topic drift, slower load times, and poor user engagement metrics like high bounce rates.

This guide shows you how to find every excessively long page on your website using the crawler.sh CLI.

Step 1: Install crawler.sh CLI

Install the CLI with a single command:

curl -fsSL https://install.crawler.sh | sh

This downloads the correct binary for your operating system and architecture, places it in ~/.crawler/bin/, and adds it to your PATH. Restart your terminal or run source ~/.bashrc (or ~/.zshrc) to pick up the new PATH entry.

Verify the installation:

crawler --version

Step 2: Crawl the target website

Run a crawl with content extraction enabled:

crawler crawl https://example.com --extract-content

The --extract-content flag enables word count analysis for each page. Results are saved as an NDJSON file (.crawl) in the current directory.
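Before running the audit, you can peek at the NDJSON file directly to see what was extracted. The snippet below is only a sketch: it fabricates a tiny sample file so the commands are runnable, and the field names (url, word_count) are assumptions rather than the documented .crawl schema, so inspect a record from your own file to see the real fields.

```shell
# NDJSON means one JSON object per line, one line per crawled page.
# The records below are fabricated; url and word_count are assumed
# field names -- check your own .crawl file for the actual schema.
cat > example-com.crawl <<'EOF'
{"url":"https://example.com/","word_count":800}
{"url":"https://example.com/guide","word_count":6200}
EOF

# Pretty-print the first record to see which fields are available.
head -n 1 example-com.crawl | jq '.'
```

Against a real crawl, replace the fabricated file with the .crawl file the crawler wrote to your current directory.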

Step 3: Run SEO audit

Run the SEO analysis on your crawl data:

crawler seo example-com.crawl

The long content check flags any page with more than 5,000 words of extracted content.
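If you want the raw list of offenders yourself, a jq filter over the NDJSON reproduces the same 5,000-word threshold. This is a hedged sketch: the sample records and the word_count field name are assumptions, not the documented .crawl schema.

```shell
# Fabricated sample data; word_count is an assumed field name --
# verify it against a record from your own .crawl file first.
cat > example-com.crawl <<'EOF'
{"url":"https://example.com/","word_count":800}
{"url":"https://example.com/guide","word_count":6200}
EOF

# Keep only pages over 5,000 words, printing count and URL.
jq -r 'select(.word_count > 5000) | "\(.word_count)\t\(.url)"' example-com.crawl
```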

Step 4: Identify long content

Look for the Long Content section in the SEO report. Common types of pages that exceed 5,000 words:

  • Documentation pages that cover multiple topics
  • “Ultimate guide” articles that try to cover everything
  • Pages that accumulate content over time (changelogs, FAQs)
  • Auto-generated pages that aggregate content from multiple sources
  • Legal pages with extensive terms and conditions

Step 5: Fix and re-crawl

For each flagged page, evaluate whether the length is justified:

  • Split into multiple pages if the content covers distinct subtopics, so each page can target one search intent
  • Add a table of contents if the length is justified (comprehensive guides, tutorials)
  • Remove outdated or redundant sections that no longer add value
  • Keep as-is if the content is genuinely comprehensive and engagement metrics are strong

After reviewing, re-crawl to verify:

crawler crawl https://example.com --extract-content
crawler seo example-com.crawl

Why long content matters for SEO

While long-form content is not inherently bad, pages over 5,000 words may signal structural issues. Search engines prefer focused content that matches specific search intents. A single page trying to rank for many different queries often ranks poorly for all of them. Splitting long content into focused pages gives each topic the best chance of ranking while improving the user experience.
