NDJSON

NDJSON (Newline Delimited JSON) is a data format where each line is a valid JSON object, ideal for streaming and processing large datasets.

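NDJSON (Newline Delimited JSON) is a text-based data format where each line contains one complete, valid JSON value, usually an object, and lines are separated by newline characters. Because a newline ends the record, line breaks inside string values must be escaped as \n. Unlike a standard JSON document, which wraps everything in a single top-level array or object, NDJSON lets each line be parsed independently.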
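As a rough illustration of the difference, the sketch below serializes the same records both ways using Python's standard json module. The records and their field names (url, status) are made up for the example and are not taken from any real tool's output.

```python
import json

# Hypothetical sample records; the field names are illustrative only.
records = [
    {"url": "https://example.com/", "status": 200},
    {"url": "https://example.com/about", "status": 404},
]

# Standard JSON: one top-level array holding every record.
json_text = json.dumps(records)

# NDJSON: one compact JSON value per line, each terminated by "\n".
# json.dumps without an indent never emits a raw newline, so every
# record stays on a single line.
ndjson_text = "".join(json.dumps(record) + "\n" for record in records)

print(json_text)
print(ndjson_text, end="")
```

The JSON output is a single value that only makes sense as a whole, while each NDJSON line can be handed to json.loads on its own.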

Why NDJSON is useful

A standard JSON document is a single value, so most parsers read the entire file before any data can be processed. With NDJSON, each line can be read and processed independently. This makes it ideal for:

  • Streaming data - Process records as they arrive without waiting for the full dataset
  • Large datasets - No need to load the entire file into memory
  • Append-friendly - New records can be added to the end of the file without modifying existing data
  • Line-by-line processing - Standard Unix tools like grep, head, and tail work directly on NDJSON files
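The points above can be sketched as a small generator that reads one line at a time, so only the current record is held in memory. The helper name iter_records and the sample fields (id, ok) are hypothetical, and an in-memory io.StringIO stands in for a real file or network stream.

```python
import io
import json

def iter_records(lines):
    """Yield one parsed record per non-empty line of an NDJSON stream."""
    for line_number, line in enumerate(lines, start=1):
        line = line.strip()
        if not line:
            continue  # tolerate blank lines, e.g. a trailing newline
        try:
            yield json.loads(line)
        except json.JSONDecodeError as exc:
            # Report which line failed; earlier lines were already yielded.
            raise ValueError(f"line {line_number}: invalid JSON: {exc}") from exc

# Any iterable of lines works here, including an open file object.
sample = io.StringIO('{"id": 1, "ok": true}\n{"id": 2, "ok": false}\n')
ok_ids = [record["id"] for record in iter_records(sample) if record["ok"]]
print(ok_ids)
```

Because the generator is lazy, a consumer can stop early (for example after the first matching record) without reading the rest of the input.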

NDJSON vs JSON

A JSON file containing crawl results wraps all pages in an array. To add a new page, you typically have to parse the entire file, add the entry, and rewrite the whole file. With NDJSON, you simply append a new line. This makes NDJSON the better choice for incremental data collection like web crawling.
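A minimal sketch of that difference, assuming plain files on disk: the first helper has to read and rewrite the whole array, while the second only opens the file in append mode. The function names, file names, and the page field are hypothetical and used purely for illustration.

```python
import json
import os
import tempfile

def append_to_json_array(path, record):
    """Add a record to a JSON array file: read all, modify, rewrite all."""
    with open(path, "r", encoding="utf-8") as f:
        data = json.load(f)
    data.append(record)
    with open(path, "w", encoding="utf-8") as f:
        json.dump(data, f)

def append_to_ndjson(path, record):
    """Add a record to an NDJSON file: write one new line at the end."""
    with open(path, "a", encoding="utf-8") as f:
        f.write(json.dumps(record) + "\n")

with tempfile.TemporaryDirectory() as tmp:
    json_path = os.path.join(tmp, "pages.json")
    ndjson_path = os.path.join(tmp, "pages.ndjson")
    with open(json_path, "w", encoding="utf-8") as f:
        json.dump([], f)  # the JSON file must start as a valid empty array
    for n in range(3):
        page = {"page": n}
        append_to_json_array(json_path, page)
        append_to_ndjson(ndjson_path, page)
    with open(ndjson_path, encoding="utf-8") as f:
        ndjson_line_count = sum(1 for _ in f)

print(ndjson_line_count)
```

The cost of the JSON version grows with the size of the file on every append, while the NDJSON version does the same small write each time.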

How crawler.sh uses NDJSON

crawler.sh uses NDJSON as its default output format. When you run crawler crawl, each crawled page is written as a single JSON line to a .crawl file. This allows the crawler to write results in real time as pages are crawled, without buffering the entire dataset in memory. You can analyze .crawl files with crawler info and convert them to other formats with crawler export.
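Since each page is described above as a single JSON line, a .crawl file can in principle be read with any NDJSON-aware code. The sketch below assumes the file is plain UTF-8 text with one JSON object per line and makes no assumption about which fields crawler.sh writes: it only counts records and tallies the top-level keys it finds. The function name summarize_crawl is hypothetical and is not part of crawler.sh.

```python
import json
from collections import Counter

def summarize_crawl(path):
    """Return (record_count, Counter of top-level keys) for an NDJSON file.

    Assumes UTF-8 text with one JSON value per line; this is an
    illustration, not crawler.sh's own analysis logic.
    """
    count = 0
    keys = Counter()
    with open(path, "r", encoding="utf-8") as f:
        for line in f:
            if not line.strip():
                continue  # skip blank lines
            record = json.loads(line)
            count += 1
            if isinstance(record, dict):
                keys.update(record.keys())
    return count, keys
```

Calling summarize_crawl on the path of a .crawl file would give a quick view of how many pages it holds and which fields appear in them; for real analysis, the crawler info and crawler export commands mentioned above are the supported route.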
