How to Find Missing Content with CLI
Learn how to detect pages with no extractable content using crawler.sh CLI. Find empty or content-less pages that offer no value to search engines or visitors.
Pages with no extractable content are effectively invisible to search engines. If a page contains only navigation, footers, or JavaScript-rendered placeholders and no text of its own, search engines have nothing to index or rank. These pages waste crawl budget and can drag down your site’s overall quality signals.
This guide shows you how to find every page with missing content using the crawler.sh CLI.
Step 1: Install crawler.sh CLI
Install the CLI with a single command:
curl -fsSL https://install.crawler.sh | sh

This downloads the correct binary for your operating system and architecture, places it in ~/.crawler/bin/, and adds it to your PATH. Restart your terminal or run source ~/.bashrc (or ~/.zshrc) to pick up the new PATH entry.
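If the crawler command is still not found after installing, a quick check of the current session's PATH can help. The following is a minimal sketch, assuming the installer used the ~/.crawler/bin location described above; the helper name path_contains is illustrative and not part of crawler.sh, and a PATH entry written with a trailing slash would not match.

```shell
# Sketch: confirm the install directory is on PATH for the current session.
# Assumption: the installer used the location named above, ~/.crawler/bin.
CRAWLER_BIN="$HOME/.crawler/bin"

# Returns 0 if directory $1 appears as an entry in the PATH-style string $2.
path_contains() {
  case ":$2:" in
    *":$1:"*) return 0 ;;
    *) return 1 ;;
  esac
}

if path_contains "$CRAWLER_BIN" "$PATH"; then
  echo "$CRAWLER_BIN is already on PATH"
else
  # Affects only this shell session; new sessions rely on the profile entry.
  PATH="$CRAWLER_BIN:$PATH"
  export PATH
  echo "Added $CRAWLER_BIN to PATH for this session"
fi

# Show where the shell resolves the binary, if it is installed.
command -v crawler || echo "crawler not found; check the installer output"
```

The fallback branch only changes the running shell, so it is a stopgap until the terminal is restarted.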
Verify the installation:
crawler --version

Step 2: Crawl the target website
Run a crawl with content extraction enabled:
crawler crawl https://example.com --extract-content

The --extract-content flag tells the crawler to extract the main content from each page and calculate word counts. Without this flag, content checks will not run. Results are saved as an NDJSON file (.crawl) in the current directory.
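Because NDJSON stores one JSON object per line, a line count gives a quick sanity check that the crawl produced output. This is a rough sketch: the helper name count_records is illustrative, the file name example-com.crawl is taken from the next step, and the exact fields inside each record are not assumed here.

```shell
# Sketch: count the records in an NDJSON crawl file.
# Each non-empty line is one JSON object, so non-empty lines equal records.
count_records() {
  grep -c . "$1"
}

# Assumption: the crawl of https://example.com was saved as example-com.crawl.
crawl_file="example-com.crawl"

if [ -f "$crawl_file" ]; then
  echo "Records in $crawl_file: $(count_records "$crawl_file")"
  # Peek at the first record to see which fields the crawler wrote.
  head -n 1 "$crawl_file"
else
  echo "$crawl_file not found in the current directory"
fi
```

A count far below the number of pages you expect suggests the crawl stopped early or was limited in scope.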
Step 3: Run SEO audit
Run the SEO analysis on your crawl data:
crawler seo example-com.crawl

The missing content check flags every page where the content extractor found no meaningful text content.
Step 4: Identify missing content
Look for the Missing Content section in the SEO report. Pages with no content often include:
- Login and authentication pages
- Redirect placeholder pages
- Pages that load content entirely via JavaScript after page load
- Empty category or tag pages with no posts
- PDF or image viewers with no surrounding text
- Pages under construction
Step 5: Fix and re-crawl
For each flagged page, decide whether it should have content:
- Add content to pages that should rank in search (product pages, articles, landing pages)
- Add a noindex directive to pages that intentionally have no content (login, utility pages) so they stay out of the search index
- Fix JavaScript rendering if content exists but is not in the initial HTML response
- Remove or redirect truly empty pages that serve no purpose
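Two of these fixes can be spot-checked from the terminal. Since curl does not execute JavaScript, a phrase that is visible in the browser but absent from the curl response is probably rendered client-side. The sketch below is illustrative only: the helper names html_has_phrase and html_has_noindex, the sample markup, and the /pricing and /login paths are assumptions, not part of crawler.sh. The noindex check is deliberately simple: it looks for a robots meta tag with name before content on a single line and ignores the X-Robots-Tag HTTP header.

```shell
# Returns 0 if the HTML on stdin contains the literal phrase $1 (case-insensitive).
html_has_phrase() {
  grep -qiF -- "$1"
}

# Returns 0 if the HTML on stdin has a robots meta tag whose content includes "noindex".
# Simplification: expects name= before content= within one line.
html_has_noindex() {
  grep -qiE "<meta[^>]*name=[\"']robots[\"'][^>]*content=[\"'][^\"']*noindex"
}

# Illustrative sample: an app shell whose visible text would be injected by JavaScript.
sample='<html><body><div id="root"></div><script src="/app.js"></script></body></html>'

if printf '%s\n' "$sample" | html_has_phrase "Pricing plans"; then
  echo "phrase found in initial HTML"
else
  echo "phrase missing from initial HTML; content may be rendered by JavaScript"
fi

# Against a live page (hypothetical URLs), pipe the raw response into the helpers:
#   curl -fsSL https://example.com/pricing | html_has_phrase "Pricing plans"
#   curl -fsSL https://example.com/login | html_has_noindex
```

Both helpers report their result through the exit status, so they can be combined with && and || or used in an if statement as shown.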
After fixing, re-crawl to verify:
crawler crawl https://example.com --extract-content
crawler seo example-com.crawl

Why missing content matters for SEO
Search engines rank pages based on their content. A page with no content has nothing to match against search queries, making it essentially invisible in search results. Worse, a high percentage of content-less pages signals to search engines that your site may be low quality. Ensuring every indexable page has meaningful content is fundamental to SEO performance.