Thin content

Thin content refers to web pages with little or no substantive content, offering minimal value to users and search engines.

A thin page has too little meaningful content to satisfy the query that brought a visitor to it. Search engines aim to surface pages that thoroughly answer user queries, so thin pages fall short of that standard and can drag down a site's search rankings.

Types of thin content

  • Pages with very few words - Pages under 200 words often lack the depth needed to be useful. Not every page needs to be long, but pages targeting competitive queries need substance.
  • Pages with no extractable content - Some pages return HTML but contain no readable text. This can happen when a page relies entirely on images, videos, or JavaScript-rendered content that the crawler cannot extract.
  • Duplicate or near-duplicate content - Pages that repeat content found elsewhere on the site without adding new value.
  • Auto-generated pages - Tag pages, search results pages, or faceted navigation pages with no unique content.
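
As a rough illustration of how near-duplicate pages might be spotted, the sketch below compares two pieces of page text by the overlap of their three-word sequences (Jaccard similarity on word shingles). This is a generic technique, not a description of how crawler.sh or any search engine detects duplicates. The function names and the shingle size of 3 are assumptions made for this example.

```python
def shingles(text: str, size: int = 3) -> set:
    """Return the set of overlapping word sequences of length `size`."""
    words = text.lower().split()
    if not words:
        return set()
    if len(words) < size:
        # Text shorter than one shingle: treat the whole text as one shingle.
        return {tuple(words)}
    return {tuple(words[i:i + size]) for i in range(len(words) - size + 1)}


def jaccard_similarity(a: str, b: str, size: int = 3) -> float:
    """Share of shingles the two texts have in common, from 0.0 to 1.0."""
    set_a, set_b = shingles(a, size), shingles(b, size)
    union = set_a | set_b
    if not union:
        # Neither text has any words, so there is nothing to compare.
        return 0.0
    return len(set_a & set_b) / len(union)


if __name__ == "__main__":
    page_a = "the quick brown fox jumps"
    page_b = "the quick brown fox leaps"
    print(jaccard_similarity(page_a, page_b))  # 0.5
```

A score close to 1.0 suggests two pages say nearly the same thing. Where to draw the line is a judgment call that depends on the site.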

Why thin content matters

Search engines may demote sites with large amounts of thin content. A few thin pages are unlikely to cause harm, but a pattern repeated across many pages signals low overall quality. In severe cases, this can trigger a manual penalty (Google calls these manual actions).

How crawler.sh helps

Run crawler crawl --extract-content to extract and measure content on every page. The crawler seo command flags pages with no content (missing content) and pages with fewer than 200 words (short content). It also flags pages over 5,000 words (long content) that may need splitting. Review thin pages and either add meaningful content, consolidate them with other pages, or apply noindex to keep them out of search results.
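
The word-count rules above can be sketched in a few lines. This is a minimal illustration of the thresholds described in the text (no words, fewer than 200 words, more than 5,000 words), not crawler.sh's actual implementation. It does not read crawler.sh output; the function name, the "ok" label, and the whitespace-based word count are assumptions made for this example.

```python
SHORT_THRESHOLD = 200    # fewer words than this: "short content"
LONG_THRESHOLD = 5000    # more words than this: "long content"


def classify_page(text: str) -> str:
    """Label extracted page text using the word-count thresholds above."""
    # Assumption: words are counted by splitting on whitespace.
    word_count = len(text.split())
    if word_count == 0:
        return "missing content"
    if word_count < SHORT_THRESHOLD:
        return "short content"
    if word_count > LONG_THRESHOLD:
        return "long content"
    return "ok"


if __name__ == "__main__":
    # Hypothetical extracted text keyed by URL path.
    pages = {
        "/about": "We build tools. " * 10,
        "/gallery": "",
        "/guide": "word " * 1200,
    }
    for url, text in pages.items():
        print(url, classify_page(text))
```

Pages labeled "missing content" or "short content" are the candidates to expand, consolidate, or noindex.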
