How is discover_links different from crawl_site?

discover_links returns only URLs and titles, not full page content, so it is faster and lighter. crawl_site returns the full Markdown and metadata for every page. Use discover_links for reconnaissance and crawl_site for deep reading.

Can I use discover_links on a JavaScript-heavy site?

Yes. If the site requires JavaScript to render navigation links, set render_js to always or auto in the tool call. The crawler will render the page before extracting links.
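
An MCP client sends each tool call as a JSON-RPC "tools/call" request. The sketch below builds one for discover_links with render_js set. Only render_js and its values come from this guide. The url and depth argument names are assumptions, so check the server's tools/list output for its actual input schema.

```python
import json


def discover_links_request(url: str, depth: int, render_js: str, request_id: int = 1) -> dict:
    """Build an MCP JSON-RPC "tools/call" request for discover_links.

    "render_js" comes from the guide; the "url" and "depth" argument
    names are assumptions, not confirmed parts of crawler-mcp's schema.
    """
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {
            "name": "discover_links",
            "arguments": {"url": url, "depth": depth, "render_js": render_js},
        },
    }


# Links one hop from the homepage, with render_js set to "auto".
print(json.dumps(discover_links_request("https://example.com", 1, "auto"), indent=2))
```

In practice the agent builds this message for you from a plain-language prompt. Seeing the raw shape mainly helps when you are debugging a client.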

What depth should I use for site mapping?

Depth 1 maps every page linked directly from the homepage. Depth 2 adds the pages linked from those pages, which typically covers subsections. Depth 3 is usually enough for a complete site map without excessive requests.
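
To make the depth numbers concrete, here is a breadth-first sketch over a hypothetical in-memory link graph. It assumes depth means the number of link hops from the start page, as described above. It illustrates the concept only and is not crawler-mcp's implementation.

```python
from collections import deque


def pages_within_depth(links: dict[str, list[str]], start: str, max_depth: int) -> dict[str, int]:
    """Map each page reachable within max_depth hops to its fewest hops from start.

    Depth 0 is the start page; depth 1 is every page it links to, and so on.
    """
    depth = {start: 0}
    queue = deque([start])
    while queue:
        page = queue.popleft()
        if depth[page] == max_depth:
            continue  # do not follow links past the depth limit
        for target in links.get(page, []):
            if target not in depth:
                depth[target] = depth[page] + 1
                queue.append(target)
    return depth


# Hypothetical toy site: homepage -> sections -> subsection pages.
site = {
    "/": ["/products", "/blog"],
    "/products": ["/products/a", "/products/b"],
    "/blog": ["/blog/post-1"],
    "/products/a": ["/products/a/specs"],
}
print(pages_within_depth(site, "/", 2))
```

Each extra level follows every link found at the previous level. This is why request counts can grow quickly past depth 3 on a large site.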

How to Discover a Site's Structure Before Scraping with MCP

Crawling a large site blind can waste time and requests. With discover_links, you can map the structure first: see which sections exist, how deep they go, and which URLs matter, all before deciding what to crawl in depth.

This guide shows how to use discover_links as a reconnaissance step.

Step 1: Install crawler-mcp

Run the install script:

curl -fsSL https://install.crawler.sh/install-mcp.sh | sh

This downloads the correct binary for your platform to ~/.crawler/bin/crawler-mcp.

For more detail, see the installation guide.

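Step 2: Wire it into your client

Register the crawler-mcp binary as an MCP server in your client's configuration so the agent can call discover_links and the other crawler tools. The prompts in this guide refer to the server as crawler-sh.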
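
Many MCP clients, Claude Desktop among them, read their servers from an mcpServers object in a JSON config file. The sketch below generates such an entry under several assumptions:

- the binary is at the default install path from Step 1;
- the server is named crawler-sh, to match the prompts in this guide;
- crawler-mcp runs over stdio and needs no extra arguments.

Check your client's documentation for where its config file lives.

```python
import json
from pathlib import Path


def crawler_server_config(server_name: str = "crawler-sh") -> dict:
    # Default install location from Step 1 (assumed unchanged).
    binary = Path.home() / ".crawler" / "bin" / "crawler-mcp"
    # Assumes the binary speaks MCP over stdio and takes no arguments.
    return {"mcpServers": {server_name: {"command": str(binary), "args": []}}}


print(json.dumps(crawler_server_config(), indent=2))
```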

Step 3: Discover the top-level structure

Ask the agent to map the site:

Use crawler-sh to discover_links on https://example.com to depth 1.

This returns every URL linked directly from the homepage, with titles and status codes. You get a quick overview of the site’s main sections.
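
To turn a depth-1 result into a list of sections, you can group URLs by their first path segment. The sketch below assumes each discovered link arrives as a record with url, title, and status fields. That shape is hypothetical, and the real discover_links output may differ.

```python
from collections import defaultdict
from urllib.parse import urlparse


def summarize_sections(links: list[dict]) -> dict[str, dict]:
    """Group discovered links by first path segment ("/" for the homepage).

    Assumes each record has "url", "title", and "status" keys; the real
    discover_links result format may differ.
    """
    sections: dict[str, dict] = defaultdict(lambda: {"pages": 0, "errors": []})
    for link in links:
        path = urlparse(link["url"]).path.strip("/")
        section = "/" + path.split("/")[0] if path else "/"
        sections[section]["pages"] += 1
        if link["status"] != 200:
            # Keep non-200 URLs visible so broken sections stand out.
            sections[section]["errors"].append(link["url"])
    return dict(sections)


# Hypothetical depth-1 results for https://example.com.
discovered = [
    {"url": "https://example.com/products", "title": "Products", "status": 200},
    {"url": "https://example.com/products/pricing", "title": "Pricing", "status": 200},
    {"url": "https://example.com/blog", "title": "Blog", "status": 200},
    {"url": "https://example.com/old-page", "title": "Gone", "status": 404},
]
for section, info in summarize_sections(discovered).items():
    print(section, info["pages"], "page(s)", "errors:", info["errors"])
```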

Step 4: Go deeper where it matters

After seeing the top level, target specific sections:

Use crawler-sh to discover_links on https://example.com/products to depth 2.

This maps the product section without crawling the rest of the site. Repeat for each section you care about.
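
A depth-2 run from a section page may still surface links that point elsewhere. This guide does not say whether crawler-mcp keeps discovery inside the starting path. If you want only the section's own URLs as you repeat this per section, a prefix check like the sketch below works. The in_section helper is hypothetical and not part of crawler-mcp.

```python
from urllib.parse import urlparse


def in_section(url: str, section_url: str) -> bool:
    """True if url is the section page itself or lies beneath it on the same host."""
    target, section = urlparse(url), urlparse(section_url)
    if target.netloc != section.netloc:
        return False
    prefix = section.path.rstrip("/")
    # Match "/products" and "/products/..." but not "/products-archive".
    return target.path.rstrip("/") == prefix or target.path.startswith(prefix + "/")


discovered = [
    "https://example.com/products/widgets",
    "https://example.com/products",
    "https://example.com/products-archive",
    "https://example.com/about",
    "https://cdn.example.net/products/x",
]
scoped = [u for u in discovered if in_section(u, "https://example.com/products")]
print(scoped)
```

Comparing whole path segments, rather than doing a plain startswith on the URL string, avoids pulling in sibling paths such as /products-archive.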

Step 5: Plan the full crawl

With the structure mapped, ask the agent to plan a targeted crawl:

Based on the links we discovered, which pages should we crawl in depth? Give me a list of URLs and a rationale.

The agent reasons over the discovered structure and recommends which pages deserve a full crawl_site or fetch_page call.
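
The agent does this reasoning itself, but a mechanical first pass can shrink a large link list before you ask. The sketch below is one such heuristic and is not something crawler-mcp provides. It drops non-200 and duplicate URLs, then keeps a few representatives per top-level section, each with a one-line reason. The shortlist helper and the url/title/status record shape are hypothetical.

```python
from urllib.parse import urlparse


def shortlist(links: list[dict], per_section: int = 2) -> list[tuple[str, str]]:
    """Pick up to per_section reachable URLs per top-level section, each with a reason.

    Assumes records with "url", "title", and "status" keys (hypothetical shape).
    """
    picked: dict[str, list[tuple[str, str]]] = {}
    seen: set[str] = set()
    for link in links:
        url = link["url"].rstrip("/")
        if link["status"] != 200 or url in seen:
            continue  # skip broken pages and duplicate URLs
        seen.add(url)
        path = urlparse(url).path.strip("/")
        section = "/" + path.split("/")[0] if path else "/"
        bucket = picked.setdefault(section, [])
        if len(bucket) < per_section:
            bucket.append((url, f"sample of {section}: {link['title']}"))
    return [item for bucket in picked.values() for item in bucket]


discovered = [
    {"url": "https://example.com/products/a", "title": "Widget A", "status": 200},
    {"url": "https://example.com/products/b", "title": "Widget B", "status": 200},
    {"url": "https://example.com/products/c", "title": "Widget C", "status": 200},
    {"url": "https://example.com/blog/launch", "title": "Launch post", "status": 404},
    {"url": "https://example.com/products/a/", "title": "Widget A", "status": 200},
]
for url, reason in shortlist(discovered):
    print(url, "-", reason)
```

You can paste this output into the planning prompt, or use it to sanity-check the agent's recommendations.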