JavaScript rendering

What is JavaScript Rendering in Web Crawling

JavaScript rendering is the process of executing JavaScript on a web page to build the final DOM before content extraction.

Many modern websites ship a minimal HTML shell and use JavaScript to load content dynamically. Without JavaScript rendering, a crawler sees only that initial empty or incomplete HTML, not the page users see.

When you view a page in your browser, the JavaScript engine runs each script, fetches data from APIs, manipulates the DOM, and produces the finished page. A basic HTTP client that only downloads the raw HTML skips this entire step. For client-side rendered sites built with React, Vue, Angular, or similar frameworks, this means the crawler misses almost all of the actual content.

Static HTML vs rendered HTML

| Aspect               | Static HTML                   | JavaScript-rendered HTML         |
|----------------------|-------------------------------|----------------------------------|
| Content source       | Server sends complete HTML    | Browser executes JS to build DOM |
| What crawlers see    | Full content immediately      | Empty shell or placeholder       |
| Examples             | Traditional blogs, docs sites | React, Vue, Angular apps         |
| Crawling requirement | Standard HTTP request         | JavaScript execution engine      |
| Performance          | Fast, low overhead            | Slower, more memory              |

Consider a typical React app. The raw HTML might look like this:

<div id="root"></div>
<script src="/app.js"></script>

A crawler without JS rendering sees only the empty div. After rendering, that same div contains the full article, navigation, and footer.
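
To make the difference concrete, here is a minimal sketch of what a text extractor recovers from each version, using only Python's standard library. `RAW_SHELL` is the snippet above; `RENDERED` is a made-up stand-in for the post-render DOM, not output from any real site.

```python
from html.parser import HTMLParser


class VisibleTextParser(HTMLParser):
    """Collects text nodes, ignoring anything inside <script> or <style>."""

    def __init__(self):
        super().__init__()
        self._skip_depth = 0
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Keep only non-empty text that is outside script/style blocks.
        if self._skip_depth == 0 and data.strip():
            self.chunks.append(data.strip())


def visible_text(html: str) -> str:
    parser = VisibleTextParser()
    parser.feed(html)
    parser.close()
    return " ".join(parser.chunks)


# The raw shell from the example above.
RAW_SHELL = '<div id="root"></div>\n<script src="/app.js"></script>'
# Hypothetical rendered DOM, for illustration only.
RENDERED = '<div id="root"><nav>Home</nav><article>Full article text</article></div>'

print(repr(visible_text(RAW_SHELL)))  # ''
print(visible_text(RENDERED))         # Home Full article text
```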

Why JavaScript rendering matters for crawlers

When a crawler fetches a page, it receives the raw HTML from the server. If that HTML contains <script> tags that populate the page, the crawler must execute those scripts to access the actual content. This is critical for:

  • Content extraction - Reading article text, product descriptions, and metadata
  • Link discovery - Finding navigation links that are injected by JavaScript
  • SEO analysis - Checking titles, descriptions, and headings that may be set by JS
  • Single-page applications - Crawling apps where every route is rendered client-side
  • Lazy-loaded content - Images, comments, or related articles that load on scroll
  • Dynamic meta tags - Open Graph tags or canonical URLs set by JavaScript after page load
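
Link discovery is a good illustration of the list above. The sketch below collects `<a href>` targets from an HTML string with the standard library; run against a raw shell it finds nothing, while the same function on rendered markup finds the injected links. The helper name `discover_links` and the example URLs are assumptions made for this sketch.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkCollector(HTMLParser):
    """Collects absolute URLs from <a href="..."> tags."""

    def __init__(self, base_url: str):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                # Resolve relative links against the page URL.
                self.links.append(urljoin(self.base_url, href))


def discover_links(html: str, base_url: str) -> list:
    collector = LinkCollector(base_url)
    collector.feed(html)
    collector.close()
    return collector.links


raw = '<div id="root"></div><script src="/app.js"></script>'
rendered = '<div id="root"><nav><a href="/pricing">Pricing</a></nav></div>'

print(discover_links(raw, "https://example.com/"))       # []
print(discover_links(rendered, "https://example.com/"))  # ['https://example.com/pricing']
```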

Search engines like Google can render JavaScript, but rendering happens in a separate queue with limited capacity. A page that relies heavily on JavaScript may be indexed later, or with outdated content, if the rendering pipeline lags behind the initial crawl.

Approaches to JavaScript rendering

  • Headless browsers - Full Chromium or WebKit engines that run the page like a real browser. These are accurate but resource-intensive. Each page consumes significant memory and CPU.
  • Lightweight JS engines - Renderers built on an embeddable engine such as QuickJS that execute JavaScript without the full browser overhead. Faster and lighter, though they may lack some browser APIs.
  • Hybrid approaches - Detecting whether a page needs JS rendering and applying it selectively. This avoids wasting resources on static pages while ensuring dynamic content is captured.
  • Prerendering services - External services that render pages on demand and cache the result. Useful for large sites, but they add latency and cost.
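
The "render on demand and cache the result" idea behind prerendering can be sketched in a few lines. This is an illustration under stated assumptions, not how any particular service works: `RenderCache` is a hypothetical name, and the `render` callable stands in for whatever actually produces rendered HTML (a headless browser, a lightweight engine, or a remote service).

```python
import time
from typing import Callable, Dict, Tuple


class RenderCache:
    """Caches rendered HTML per URL for a fixed time-to-live."""

    def __init__(self, render: Callable[[str], str], ttl_seconds: float,
                 clock: Callable[[], float] = time.monotonic):
        self._render = render
        self._ttl = ttl_seconds
        self._clock = clock
        self._entries: Dict[str, Tuple[float, str]] = {}

    def get(self, url: str) -> str:
        now = self._clock()
        entry = self._entries.get(url)
        if entry is not None and now - entry[0] < self._ttl:
            return entry[1]  # Fresh cached copy: skip the expensive render.
        html = self._render(url)  # Cache miss or stale entry: render again.
        self._entries[url] = (now, html)
        return html


def fake_render(url: str) -> str:
    # Placeholder renderer so the sketch runs without a browser.
    return f"<html><body>rendered {url}</body></html>"


cache = RenderCache(fake_render, ttl_seconds=300.0)
first = cache.get("https://example.com/")
second = cache.get("https://example.com/")  # Served from the cache.
print(first == second)
```

The trade-off mentioned above shows up directly: a longer TTL saves render cost but raises the chance of serving stale content.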

When you need JavaScript rendering

You need JS rendering if your target site shows any of these characteristics:

  • The raw HTML contains little or no visible text
  • Content loads after an initial spinner or skeleton UI
  • Navigation uses client-side routing without full page reloads
  • Product listings or search results come from API calls
  • Meta tags or canonical URLs are set by JavaScript
  • The site uses a modern frontend framework with client-side rendering

You can skip JS rendering if the site serves complete HTML server-side, as most blogs, documentation sites, and traditional CMS-driven pages do.
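
Two of these signals, thin visible text and an empty mount point, can be checked on the raw HTML alone. The following is a rough heuristic sketch, not a definitive detector: the 200-character threshold, the `root`/`app` ids, and the function names are all assumptions chosen for illustration, and regex-based tag stripping is only an approximation of real HTML parsing.

```python
import re

# Matches whole <script>...</script> and <style>...</style> blocks.
SCRIPT_OR_STYLE = re.compile(
    r"<(script|style)\b[^>]*>.*?</\1\s*>", re.IGNORECASE | re.DOTALL
)
TAG = re.compile(r"<[^>]+>")
# An empty <div id="root"> or <div id="app">, a common client-side mount point.
EMPTY_MOUNT = re.compile(
    r"""<div[^>]*\bid=["'](?:root|app)["'][^>]*>\s*</div>""", re.IGNORECASE
)


def visible_text_length(raw_html: str) -> int:
    """Approximate count of visible characters after stripping tags and scripts."""
    without_scripts = SCRIPT_OR_STYLE.sub(" ", raw_html)
    text = TAG.sub(" ", without_scripts)
    return len(" ".join(text.split()))


def probably_needs_js(raw_html: str, min_text_chars: int = 200) -> bool:
    """Guess whether a page relies on JavaScript for its content."""
    has_scripts = re.search(r"<script\b", raw_html, re.IGNORECASE) is not None
    empty_mount = EMPTY_MOUNT.search(raw_html) is not None
    thin = visible_text_length(raw_html) < min_text_chars
    # Without any scripts, rendering cannot add content, however thin the page.
    return has_scripts and (empty_mount or thin)


shell = '<div id="root"></div><script src="/app.js"></script>'
static_page = "<html><body><article>" + "word " * 100 + "</article></body></html>"

print(probably_needs_js(shell))        # True
print(probably_needs_js(static_page))  # False
```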

How crawler.sh handles JavaScript rendering

crawler.sh includes a built-in JavaScript rendering engine based on QuickJS. It executes JavaScript, builds the DOM, and extracts the rendered HTML. The engine implements browser APIs like document.querySelector, window.history, setTimeout, URL, Blob, and FileReader so that most scripts run without modification.

The crawler offers three modes:

  • Auto-detect - The site profiler analyzes the first few pages and enables JS rendering only when needed. It looks for empty body shells, JavaScript framework markers, and script-to-text ratios.
  • Always - JS rendering is applied to every page. Best when you know the entire site requires it.
  • Never - Only the raw HTML is used. Fastest option for static sites.
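
The three modes map naturally onto a small decision function. The sketch below is an illustration only and does not reflect crawler.sh internals: `RenderMode`, `should_render`, the `looks_dynamic` detector, and the majority-vote rule for auto-detect are all assumptions.

```python
from enum import Enum
from typing import Callable, Iterable


class RenderMode(Enum):
    AUTO = "auto"
    ALWAYS = "always"
    NEVER = "never"


def should_render(mode: RenderMode,
                  sample_pages: Iterable[str],
                  looks_dynamic: Callable[[str], bool]) -> bool:
    """Decide whether to enable JS rendering for a site."""
    if mode is RenderMode.ALWAYS:
        return True
    if mode is RenderMode.NEVER:
        return False
    # AUTO: profile a few sampled pages and enable rendering when
    # more than half of them look JavaScript-dependent (assumed rule).
    pages = list(sample_pages)
    if not pages:
        return False
    dynamic = sum(1 for page in pages if looks_dynamic(page))
    return dynamic * 2 > len(pages)


def has_empty_root(html: str) -> bool:
    # Toy detector: flags pages containing an empty root div.
    return '<div id="root"></div>' in html


samples = [
    '<div id="root"></div><script src="/app.js"></script>',
    '<div id="root"></div><script src="/app.js"></script>',
    "<article>Server-rendered about page</article>",
]
print(should_render(RenderMode.AUTO, samples, has_empty_root))   # True
print(should_render(RenderMode.NEVER, samples, has_empty_root))  # False
```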

This selective approach avoids the performance penalty of rendering static pages while ensuring dynamic content is captured. The profiler also sets an appropriate crawl posture for JavaScript-heavy sites, adding extra drain time and retry attempts to handle slower-loading content.
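
A crawl posture can be thought of as a small bundle of settings chosen from the profiler's verdict. The sketch below is hypothetical: the `CrawlPosture` fields and every numeric value are invented for illustration and are not crawler.sh's actual configuration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CrawlPosture:
    render_js: bool
    drain_seconds: float  # Extra wait for late-loading content to settle.
    max_retries: int      # Attempts before giving up on a page.


def posture_for(js_heavy: bool) -> CrawlPosture:
    """Pick crawl settings from the profiler's verdict (made-up values)."""
    if js_heavy:
        return CrawlPosture(render_js=True, drain_seconds=2.0, max_retries=3)
    return CrawlPosture(render_js=False, drain_seconds=0.0, max_retries=1)


print(posture_for(True))
print(posture_for(False))
```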
