# Desktop Reference

The Settings card provides these parameters:

| Parameter | Range | Default | Description |
|---|---|---|---|
| Max Pages | 1 – 1,000 | 20 | Maximum number of pages to crawl |
| Max Depth | 1 – 50 | 10 | Maximum crawl depth from the start URL |
| Concurrency | 1 – 20 | 5 | Number of concurrent HTTP requests |
| Delay (ms) | 0 – 5,000 | 50 | Delay between requests in milliseconds |
| Extract Content | on / off | on | Enable content extraction to Markdown |

The desktop app also includes a Dark Mode toggle in the Settings card, persisted across sessions.
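The parameters and ranges above can be sketched as a small settings object. This is an illustrative sketch only; the `CrawlSettings` class and `clamped` helper are hypothetical names, while the field names, defaults, and ranges come from the table.

```python
from dataclasses import dataclass


def clamp(value: int, lo: int, hi: int) -> int:
    """Pin a value into an inclusive range."""
    return max(lo, min(hi, value))


@dataclass
class CrawlSettings:
    # Defaults as listed in the Settings table.
    max_pages: int = 20
    max_depth: int = 10
    concurrency: int = 5
    delay_ms: int = 50
    extract_content: bool = True

    def clamped(self) -> "CrawlSettings":
        """Return a copy with every numeric field pinned to its documented range."""
        return CrawlSettings(
            max_pages=clamp(self.max_pages, 1, 1_000),
            max_depth=clamp(self.max_depth, 1, 50),
            concurrency=clamp(self.concurrency, 1, 20),
            delay_ms=clamp(self.delay_ms, 0, 5_000),
            extract_content=self.extract_content,
        )


settings = CrawlSettings(max_pages=5_000).clamped()
```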

Each crawled page contains the following fields, serialized to JSON in the output formats.

| Field | Type | Description |
|---|---|---|
| `url` | String | Full URL of the page |
| `status_code` | Number | HTTP status code |
| `content_type` | String? | Content-Type header value |
| `title` | String? | HTML `<title>` text |
| `meta_description` | String? | `<meta name="description">` content |
| `canonical_url` | String? | `<link rel="canonical">` href |
| `discovered_from` | String? | Parent URL that linked to this page |
| `links_found` | Number | Number of new same-domain links discovered |
| `depth` | Number | Crawl depth from the start URL |
| `response_time_ms` | Number | HTTP response time in milliseconds |
| `markdown` | String? | Extracted content as Markdown (when enabled) |
| `word_count` | Number? | Word count of extracted content |
| `byline` | String? | Author byline from content extraction |
| `excerpt` | String? | Article excerpt from content extraction |
| `meta_robots` | String? | `<meta name="robots">` content |
| `x_robots_tag` | String? | X-Robots-Tag HTTP header value |
| `rel_next` | String? | `<link rel="next">` href (pagination) |
| `rel_prev` | String? | `<link rel="prev">` href (pagination) |

Fields marked with ? are optional and may be absent depending on the page.
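In practice that means a consumer should not assume optional fields exist. A minimal sketch, assuming the record arrives as a plain JSON object (the sample values here are invented):

```python
# A minimal crawled-page record: only the non-optional fields are guaranteed.
page = {
    "url": "https://example.com/about",
    "status_code": 200,
    "links_found": 12,
    "depth": 1,
    "response_time_ms": 87,
}

# Optional ("?") fields may be absent, so read them defensively with .get().
title = page.get("title")               # None when the field is missing
word_count = page.get("word_count", 0)  # e.g. when content extraction is off
```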

The crawler emits real-time events during a crawl session, used to display live progress in the dashboard.

| Event | Data | Description |
|---|---|---|
| `Started` | `url` | Crawl has begun |
| `Discovered` | `url`, `depth` | New URL found and enqueued |
| `PageCrawled` | Page object | Page successfully fetched and parsed |
| `PageError` | `url`, `error` | Failed to fetch a page |
| `Progress` | `crawled`, `total_discovered` | Periodic progress update |
| `Completed` | `total_pages`, `total_errors` | Crawl finished |
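A dashboard consumer dispatches on the event name and reads the listed data fields. The handler below is a hypothetical sketch, not the app's actual code; only the event names and payload keys are taken from the table.

```python
def handle_event(name: str, data: dict) -> str:
    """Format one crawl event as a status line for a progress display."""
    if name == "Progress":
        return f"{data['crawled']}/{data['total_discovered']} pages"
    if name == "PageError":
        return f"error fetching {data['url']}: {data['error']}"
    if name == "Completed":
        return f"done: {data['total_pages']} pages, {data['total_errors']} errors"
    # Started / Discovered / PageCrawled: just echo the event name here.
    return name
```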

Complete crawl data as an array of page objects:

```json
[
  {
    "url": "https://example.com",
    "title": "Example",
    "meta_description": "An example site",
    "canonical_url": "https://example.com",
    "discovered_from": null,
    "status_code": 200,
    "markdown": "# Example\n\nContent here...",
    "word_count": 150,
    "byline": "Author Name",
    "excerpt": "A brief summary..."
  }
]
```

Content fields (markdown, word_count, byline, excerpt) are only included when present.
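Because the export is a plain JSON array, any JSON library can consume it. An illustrative sketch (the trimmed sample data and variable names are invented):

```python
import json

# A trimmed stand-in for the exported array of page objects.
export_text = (
    '[{"url": "https://example.com", "title": "Example", "status_code": 200},'
    ' {"url": "https://example.com/a", "status_code": 404}]'
)

pages = json.loads(export_text)

# Pages that returned a 2xx status.
ok = [p["url"] for p in pages if 200 <= p["status_code"] < 300]

# Pages lacking the optional title field.
missing_title = [p["url"] for p in pages if not p.get("title")]
```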

Standard urlset document. Only includes pages with 2xx status codes. Special characters (&, <, >) are escaped in <loc> elements.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url>
    <loc>https://example.com</loc>
    <lastmod>2024-01-01</lastmod>
  </url>
</urlset>
```
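The escaping of `&`, `<`, and `>` in `<loc>` values can be reproduced with a standard XML escape helper. A sketch, not the app's implementation; the example URL is invented:

```python
from xml.sax.saxutils import escape

# A URL containing "&" must be escaped before it is placed in <loc>.
loc = escape("https://example.com/search?q=a&b")
entry = f"<url><loc>{loc}</loc></url>"
```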

Full page content as clean Markdown files bundled in a .zip archive.

Two columns: Issue Type and URL. One row per affected URL, with every value enclosed in double quotes.

```csv
"Issue Type","URL"
"Missing titles","https://example.com/page-1"
"Short content","https://example.com/page-2"
```
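The quoted format is standard CSV, so the file can be grouped back by issue type with an ordinary CSV reader. An illustrative sketch using the sample rows above:

```python
import csv
import io
from collections import defaultdict

report = (
    '"Issue Type","URL"\n'
    '"Missing titles","https://example.com/page-1"\n'
    '"Short content","https://example.com/page-2"\n'
)

reader = csv.reader(io.StringIO(report))
next(reader)  # skip the header row

# Collect affected URLs under each issue type.
by_issue = defaultdict(list)
for issue, url in reader:
    by_issue[issue].append(url)
```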

Human-readable report grouped by issue category with indented URLs.

```text
Missing titles (2)
  https://example.com/page-1
  https://example.com/page-2
Short content (1)
  https://example.com/page-3
```

Desktop exports follow the pattern:

```text
{domain}-{label}-{timestamp}.{ext}
```

- `domain`: hostname with dots replaced by dashes (e.g., `example-com`)
- `label`: export type (`crawl-results`, `sitemap`, `seo-issues`)
- `timestamp`: ISO 8601 date/time formatted with dashes (e.g., `2026-02-22-14-30-00`)

On macOS, the app supports both Apple Silicon (M1/M2/M3/M4) and Intel; the DMG is a universal binary that runs natively on both architectures.
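The naming rules above can be assembled in a few lines. A hypothetical helper, shown only to make the pattern concrete; the function name is not part of the app:

```python
from urllib.parse import urlparse


def export_filename(start_url: str, label: str, timestamp: str, ext: str) -> str:
    """Build the {domain}-{label}-{timestamp}.{ext} export name."""
    # Hostname with dots replaced by dashes, per the pattern description.
    domain = urlparse(start_url).hostname.replace(".", "-")
    return f"{domain}-{label}-{timestamp}.{ext}"


name = export_filename("https://example.com", "sitemap", "2026-02-22-14-30-00", "xml")
```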