How to Find Long Content with CLI
Learn how to detect pages with over 5,000 words using crawler.sh CLI. Find excessively long pages that may need to be split for better user experience and SEO.
Pages with more than 5,000 words can overwhelm readers and may indicate content that should be split into multiple focused pages. While long-form content can rank well, excessively long pages often suffer from topic drift, slower load times, and poor user engagement metrics like high bounce rates.
This guide shows you how to find every excessively long page on your website using the crawler.sh CLI.
Step 1: Install crawler.sh CLI
Install the CLI with a single command:
curl -fsSL https://install.crawler.sh | sh
This downloads the correct binary for your operating system and architecture, places it in ~/.crawler/bin/, and adds it to your PATH. Restart your terminal or run source ~/.bashrc (or ~/.zshrc) to pick up the new PATH entry.
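If the crawler command is not found after installation, the usual cause is that the new PATH entry has not been picked up yet. The sketch below is one way to check this from Python; the helper name dir_on_path is hypothetical and not part of crawler.sh, and the only detail taken from this guide is the install directory ~/.crawler/bin.

```python
import os
from pathlib import Path


def dir_on_path(directory, path_value=None):
    """Return True if `directory` is one of the entries in a PATH-style string.

    `path_value` defaults to the current process's PATH. This is a
    hypothetical helper for illustration, not part of the crawler.sh CLI.
    """
    if path_value is None:
        path_value = os.environ.get("PATH", "")
    target = Path(directory).expanduser()
    # Skip empty entries (e.g. from a trailing separator) and compare the rest.
    return any(
        entry and Path(entry).expanduser() == target
        for entry in path_value.split(os.pathsep)
    )
```

Calling dir_on_path("~/.crawler/bin") in a fresh terminal should return True once the shell profile has been reloaded; if it returns False, re-run the source command above or open a new terminal.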
Verify the installation:
crawler --version
Step 2: Crawl the target website
Run a crawl with content extraction enabled:
crawler crawl https://example.com --extract-content
The --extract-content flag enables word count analysis for each page. Results are saved as an NDJSON file (.crawl) in the current directory.
Step 3: Run SEO audit
Run the SEO analysis on your crawl data:
crawler seo example-com.crawl
The long content check flags any page with more than 5,000 words of extracted content.
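Because the crawl output is NDJSON (one JSON object per line), the same threshold can also be applied outside the CLI, for example to feed the list of long pages into another script. The following is a minimal sketch, not the crawler.sh implementation: the field names url and word_count are assumptions, so inspect one line of your .crawl file and adjust them to match the actual record layout.

```python
import json

WORD_LIMIT = 5000  # threshold used by the long content check in this guide


def find_long_pages(crawl_path, limit=WORD_LIMIT,
                    url_key="url", count_key="word_count"):
    """Return (url, word_count) pairs for pages whose count exceeds `limit`.

    `url_key` and `count_key` are ASSUMED field names; they are not confirmed
    by the crawler.sh documentation and may differ in real crawl files.
    """
    long_pages = []
    with open(crawl_path, encoding="utf-8") as handle:
        for line in handle:
            line = line.strip()
            if not line:
                continue  # tolerate blank lines between records
            record = json.loads(line)
            if not isinstance(record, dict):
                continue  # skip anything that is not a JSON object
            count = record.get(count_key)
            # Records without a numeric word count are ignored.
            if isinstance(count, (int, float)) and count > limit:
                long_pages.append((record.get(url_key), count))
    # Longest pages first, so the worst offenders appear at the top.
    long_pages.sort(key=lambda pair: pair[1], reverse=True)
    return long_pages
```

For example, find_long_pages("example-com.crawl") would return the flagged pages for the crawl above, assuming the file uses those field names. The comparison is strictly greater than the limit, matching the "more than 5,000 words" wording of the check.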
Step 4: Identify long content
Look for the Long Content section in the SEO report. Page types that commonly exceed 5,000 words include:
- Documentation pages that cover multiple topics
- “Ultimate guide” articles that try to cover everything
- Pages that accumulate content over time (changelogs, FAQs)
- Auto-generated pages that aggregate content from multiple sources
- Legal pages with extensive terms and conditions
Step 5: Fix and re-crawl
For each flagged page, evaluate whether the length is justified:
- Split into multiple pages if the content covers distinct subtopics; each page should focus on one search intent
- Add a table of contents if the length is justified (comprehensive guides, tutorials)
- Remove outdated or redundant sections that no longer add value
- Keep as-is if the content is genuinely comprehensive and engagement metrics are strong
After reviewing, re-crawl to verify:
crawler crawl https://example.com --extract-content
crawler seo example-com.crawl
Why long content matters for SEO
While long-form content is not inherently bad, pages over 5,000 words may signal structural issues. Search engines prefer focused content that matches specific search intents. A single page trying to rank for many different queries often ranks poorly for all of them. Splitting long content into focused pages gives each topic the best chance of ranking while improving the user experience.