Crawler MCP Tool Reference

The Crawler MCP server exposes three tools. All three honor robots.txt by default and refuse redirects to private addresses unless you opt in.
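
To make "private addresses" concrete, here is a minimal sketch of the kind of check a client could run on a redirect target before deciding whether to opt in. It uses only Python's standard `ipaddress` module. The function name `is_private_target` is hypothetical, and this is not the server's actual logic, which is not shown in this reference.

```python
import ipaddress

def is_private_target(host: str) -> bool:
    """Return True if host is localhost or an IP literal in a loopback,
    link-local, or private (for IPv4, RFC1918) range."""
    try:
        addr = ipaddress.ip_address(host)
    except ValueError:
        # Not an IP literal. A real check would resolve the hostname first;
        # this sketch only recognizes the literal name "localhost".
        return host.lower() == "localhost"
    return addr.is_loopback or addr.is_link_local or addr.is_private

for host in ["127.0.0.1", "169.254.169.254", "10.0.0.5", "93.184.216.34"]:
    print(host, is_private_target(host))
```

Only the last address in the loop is publicly routable, so it is the only one for which the function returns False.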

The crawl_site tool crawls a website, staying on the start URL's host, and returns enriched data for each page: Markdown, SEO metadata, headings, and links.

| Parameter | Type | Required | Default | Description |
|---|---|---|---|---|
| url | string | yes | - | Start URL. |
| max_pages | integer | no | 50 | Hard cap on pages crawled. Server enforces a tier cap on top of this. |
| max_depth | integer | no | 10 | BFS depth limit. |
| render_js | string | no | "auto" | JavaScript rendering mode: "auto", "always", or "never". |
| respect_robots | boolean | no | true | Honor robots.txt Disallow rules. |
| concurrency | integer | no | 5 | Parallel request count. |
| delay_ms | integer | no | 50 | Inter-request delay in milliseconds. |
| user_agent | string | no | auto | Custom User-Agent string. |
| check_outgoing_links | boolean | no | true | Validate outgoing links for 404s and other errors. |
| allow_private_redirects | boolean | no | false | Allow redirects to localhost, link-local, and RFC1918 addresses. |
| extract_content | boolean | no | true | Return Markdown content for each page. Set to false for faster link discovery only. |
| include_html | boolean | no | false | Include the raw (post-JS-render) HTML body of each HTML page as an html field. Significantly increases response size on large crawls. |
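
The interaction of max_depth, max_pages, and the same-host rule can be pictured as a bounded breadth-first walk. The sketch below runs over an in-memory link graph rather than the network. The function `bounded_bfs`, the `tier_cap` parameter, and the `SITE` graph are all hypothetical, and taking the smaller of max_pages and the tier cap is an assumption about how the two caps combine, not documented server behavior.

```python
from collections import deque
from urllib.parse import urljoin, urlparse

def bounded_bfs(start_url, link_graph, max_pages=50, max_depth=10, tier_cap=50):
    """Breadth-first walk limited by depth, page cap, and start host."""
    cap = min(max_pages, tier_cap)  # assumption: the smaller cap wins
    host = urlparse(start_url).netloc
    seen = {start_url}
    queue = deque([(start_url, 0)])
    visited = []
    while queue and len(visited) < cap:
        url, depth = queue.popleft()
        visited.append((url, depth))
        if depth >= max_depth:
            continue  # do not follow links beyond the depth limit
        for href in link_graph.get(url, []):
            target = urljoin(url, href)
            if urlparse(target).netloc != host or target in seen:
                continue  # skip other hosts and already-queued pages
            seen.add(target)
            queue.append((target, depth + 1))
    return visited

# Hypothetical site: each URL maps to the hrefs found on that page.
SITE = {
    "https://example.com/": ["/about", "/blog", "https://other.example.org/"],
    "https://example.com/about": ["/team"],
    "https://example.com/blog": ["/blog/post-1", "/about"],
    "https://example.com/team": [],
    "https://example.com/blog/post-1": [],
}

for url, depth in bounded_bfs("https://example.com/", SITE, max_depth=1):
    print(depth, url)
```

With max_depth=1 the walk stops at the pages linked directly from the start URL, and the link to the other host is never queued.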

Response:

{
  "pages": [
    {
      "url": "https://example.com/page",
      "status_code": 200,
      "content_type": "text/html; charset=utf-8",
      "title": "Page Title",
      "meta_description": "A concise summary of this page.",
      "h1_text": "Main Heading",
      "h1_count": 1,
      "word_count": 1200,
      "lang": "en",
      "byline": "Author Name",
      "excerpt": "First paragraph of the article...",
      "site_name": "Example",
      "image": "https://example.com/og-card.png",
      "markdown": "# Main Heading\n\nArticle body in Markdown...",
      "canonical_url": "https://example.com/page",
      "meta_robots": "index, follow",
      "x_robots_tag": "index, follow",
      "keywords": ["keyword1", "keyword2"],
      "published_time": "2026-01-15T10:00:00Z",
      "modified_time": "2026-03-20T14:30:00Z",
      "rel_next": "https://example.com/page?p=2",
      "rel_prev": null,
      "links_found": 42,
      "internal_links": ["/about", "/contact"],
      "outgoing_link_errors": [
        {
          "url": "https://example.com/broken",
          "status_code": 404,
          "error": "Not Found",
          "link_type": "a"
        }
      ],
      "redirect_chain": [
        {
          "url": "http://example.com/page",
          "status_code": 301,
          "location": "https://example.com/page"
        }
      ],
      "redirect_loop_detected": false,
      "response_time_ms": 234,
      "depth": 0,
      "html": null
    }
  ],
  "page_count": 1,
  "cap_applied": 50,
  "cap_max": 50
}
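
The per-page fields in the response above lend themselves to a quick site audit. As a rough sketch, the hypothetical helper `audit_pages` below collects broken outgoing links and pages whose h1_count is not exactly 1. The `sample` dictionary is trimmed, made-up data in the documented shape, not real crawl output.

```python
def audit_pages(response):
    """Collect broken outgoing links and heading problems from a
    crawl_site-style response dictionary."""
    broken, heading_issues = [], []
    for page in response.get("pages", []):
        # outgoing_link_errors may be missing or null when link checks are off
        for err in page.get("outgoing_link_errors") or []:
            broken.append((page["url"], err["url"], err.get("status_code")))
        if page.get("h1_count") != 1:
            heading_issues.append((page["url"], page.get("h1_count")))
    return {"broken_links": broken, "heading_issues": heading_issues}

# Trimmed, made-up sample in the documented response shape.
sample = {
    "pages": [
        {
            "url": "https://example.com/page",
            "h1_count": 1,
            "outgoing_link_errors": [
                {"url": "https://example.com/broken", "status_code": 404,
                 "error": "Not Found", "link_type": "a"}
            ],
        },
        {"url": "https://example.com/empty", "h1_count": 0,
         "outgoing_link_errors": []},
    ],
    "page_count": 2,
}

report = audit_pages(sample)
print(report["broken_links"])
print(report["heading_issues"])
```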

Example request:

{
  "url": "https://example.com",
  "max_pages": 20,
  "delay_ms": 500,
  "check_outgoing_links": false
}
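
For readers wiring this up by hand, the arguments above travel inside a standard MCP `tools/call` request, which is a JSON-RPC 2.0 message. The sketch below only builds and prints that envelope. The helper `tool_call` is hypothetical, and the transport (stdio or HTTP) and session initialization are left out because this reference does not describe them.

```python
import json
from itertools import count

_ids = count(1)  # simple monotonically increasing JSON-RPC request ids

def tool_call(name, arguments):
    """Wrap tool arguments in a JSON-RPC 2.0 'tools/call' request."""
    return {
        "jsonrpc": "2.0",
        "id": next(_ids),
        "method": "tools/call",
        "params": {"name": name, "arguments": arguments},
    }

request = tool_call("crawl_site", {
    "url": "https://example.com",
    "max_pages": 20,
    "delay_ms": 500,
    "check_outgoing_links": False,
})
print(json.dumps(request, indent=2))
```

Parameters left out of `arguments` fall back to the defaults listed in the parameter table.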

Fetch a single page and return its full enriched data. Use this for one-shot URL lookups when you do not need to walk the site.

| Parameter | Type | Required | Default | Description |
|---|---|---|---|---|
| url | string | yes | - | Page URL. |
| render_js | string | no | "auto" | JavaScript rendering mode: "auto", "always", or "never". |
| user_agent | string | no | auto | Custom User-Agent string. |
| respect_robots | boolean | no | true | Honor robots.txt Disallow rules. |
| allow_private_redirects | boolean | no | false | Allow redirects to localhost, link-local, and RFC1918 addresses. |
| extract_content | boolean | no | true | Return Markdown content. Set to false for metadata-only lookup. |
| include_html | boolean | no | false | Include the raw HTML body of the response as an html field. |

A single page object with the same schema as crawl_site page entries.

Example request:

{
  "url": "https://example.com/page",
  "render_js": "never",
  "extract_content": false
}
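
A metadata-only lookup like this one is enough for a basic indexability check, since the page object carries status_code, meta_robots, x_robots_tag, and canonical_url. The following is a minimal sketch under that assumption. The helper `indexability` and its rules are illustrative, not something the server computes.

```python
def indexability(page):
    """Return a list of reasons a page may not be indexed (empty if none)."""
    reasons = []
    if page.get("status_code") != 200:
        reasons.append(f"status {page.get('status_code')}")
    for field in ("meta_robots", "x_robots_tag"):
        # Directives are comma-separated; the field may be missing or null.
        directives = {d.strip().lower() for d in (page.get(field) or "").split(",")}
        if "noindex" in directives or "none" in directives:
            reasons.append(f"{field} blocks indexing")
    canonical = page.get("canonical_url")
    if canonical and canonical != page.get("url"):
        reasons.append("canonical points elsewhere")
    return reasons

# Made-up page object with only the fields this check reads.
page = {
    "url": "https://example.com/page",
    "status_code": 200,
    "meta_robots": "noindex, follow",
    "x_robots_tag": None,
    "canonical_url": "https://example.com/page",
}
print(indexability(page))
```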

Crawl shallowly and return discovered URLs with optional titles. Use this to understand a site’s structure before reading its content.

| Parameter | Type | Required | Default | Description |
|---|---|---|---|---|
| url | string | yes | - | Start URL. |
| max_depth | integer | no | 2 | BFS depth limit. |
| max_pages | integer | no | tier cap | Hard cap on pages. |
| concurrency | integer | no | 5 | Parallel request count. |
| delay_ms | integer | no | 50 | Inter-request delay in milliseconds. |
| user_agent | string | no | auto | Custom User-Agent string. |
| respect_robots | boolean | no | true | Honor robots.txt Disallow rules. |
| allow_private_redirects | boolean | no | false | Allow redirects to localhost, link-local, and RFC1918 addresses. |
| extract_content | boolean | no | true | Include page titles in results. Set to false for URL-only discovery. |

Response:

{
  "links": [
    {
      "url": "https://example.com/about",
      "title": "About Us",
      "depth": 1,
      "status_code": 200
    },
    {
      "url": "https://example.com/blog",
      "title": "Blog",
      "depth": 1,
      "status_code": 200
    }
  ],
  "count": 2
}

When extract_content is false, the title field is omitted from each link.
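
Because title may be absent, code that consumes this response should not index it directly. One way to handle that, sketched with a hypothetical helper `outline` and made-up sample data, is to group links by depth and fall back to the URL when no title is present.

```python
from collections import defaultdict

def outline(response):
    """Group discovered links by depth, labelling each by title or URL."""
    by_depth = defaultdict(list)
    for link in response.get("links", []):
        # title is omitted when extract_content is false, so fall back to url
        label = link.get("title") or link["url"]
        by_depth[link["depth"]].append(label)
    return dict(sorted(by_depth.items()))

# Made-up sample: the second link has no title field.
sample_links = {
    "links": [
        {"url": "https://example.com/about", "title": "About Us",
         "depth": 1, "status_code": 200},
        {"url": "https://example.com/blog", "depth": 1, "status_code": 200},
        {"url": "https://example.com/blog/post-1", "title": "Post 1",
         "depth": 2, "status_code": 200},
    ],
    "count": 3,
}

for depth, labels in outline(sample_links).items():
    print(depth, labels)
```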

Example request:

{
  "url": "https://example.com",
  "max_depth": 1,
  "extract_content": true
}