May 14, 2026

How to Discover a Site's Structure Before Scraping with MCP

Use discover_links to map a website before committing to a full crawl, saving time and requests.

Mehmet Kose
2 min read

Crawling a large site blind wastes time and requests. With discover_links, you can map the structure first: see which sections exist, how deep they go, and which URLs matter. Then you can decide what to crawl in depth.

This guide shows how to use discover_links as a reconnaissance step.

Step 1: Install crawler-mcp

Run the install script:

curl -fsSL https://install.crawler.sh/install-mcp.sh | sh

This downloads the correct binary for your platform to ~/.crawler/bin/crawler-mcp.
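If you want to confirm the install before moving on, a small check like the one below can help. It is only a sketch: it assumes the binary lands at the path named above and that it should be executable. The helper name find_crawler_mcp is made up for this example and is not part of crawler-mcp.

```python
import os
from pathlib import Path
from typing import Optional


def find_crawler_mcp(home: Optional[Path] = None) -> Optional[Path]:
    """Return the crawler-mcp path if it exists and is executable, else None.

    Hypothetical helper: the location comes from the install step above.
    """
    base = home if home is not None else Path.home()
    candidate = base / ".crawler" / "bin" / "crawler-mcp"
    if candidate.is_file() and os.access(candidate, os.X_OK):
        return candidate
    return None


found = find_crawler_mcp()
print(found if found else "crawler-mcp not found; re-run the install script")
```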

For more detail, see the installation guide.

Step 2: Wire it into your client
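
Many MCP clients register local servers through a JSON file containing an mcpServers map. As a rough sketch, assuming your client follows that convention and that you register the server under the name crawler-sh (the name used in the prompts in this guide), the entry could be generated like this. The file location and exact schema depend on your client, so check its documentation before pasting anything in.

```python
import json
from pathlib import Path


def build_mcp_entry(binary: Path, name: str = "crawler-sh") -> dict:
    """Build an mcpServers-style entry pointing at the crawler-mcp binary.

    Assumption: the client accepts {"mcpServers": {name: {"command", "args"}}}.
    """
    return {"mcpServers": {name: {"command": str(binary), "args": []}}}


# Path taken from the install step; adjust if your binary lives elsewhere.
binary = Path.home() / ".crawler" / "bin" / "crawler-mcp"
print(json.dumps(build_mcp_entry(binary), indent=2))
```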

Step 3: Discover the top-level structure

Ask the agent to map the site:

Use crawler-sh to discover_links on https://example.com to depth 1.

This returns every URL linked directly from the homepage, with titles and status codes. You get a quick overview of the site’s main sections.
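
If you export the result and want to summarize it yourself, grouping URLs by their first path segment gives a quick picture of the main sections. The sketch below assumes each result is a record with url, title, and status fields. That shape is a guess based on the description above, not the documented output format, and the sample data is invented.

```python
from collections import Counter
from urllib.parse import urlparse


def top_level_sections(links):
    """Count discovered URLs per first path segment ("/" for the homepage)."""
    counts = Counter()
    for link in links:
        segments = [s for s in urlparse(link["url"]).path.split("/") if s]
        section = "/" + segments[0] if segments else "/"
        counts[section] += 1
    return counts


# Invented sample records; the field names are assumptions.
sample = [
    {"url": "https://example.com/", "title": "Home", "status": 200},
    {"url": "https://example.com/products", "title": "Products", "status": 200},
    {"url": "https://example.com/products/widgets", "title": "Widgets", "status": 200},
    {"url": "https://example.com/blog", "title": "Blog", "status": 200},
    {"url": "https://example.com/old-page", "title": "", "status": 404},
]

for section, count in top_level_sections(sample).most_common():
    print(section, count)
```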

Step 4: Go deeper where it matters

After seeing the top level, target specific sections:

Use crawler-sh to discover_links on https://example.com/products to depth 2.

This maps the product section without crawling the rest of the site. Repeat for each section you care about.
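
To check that a deeper pass stayed inside the section you asked for, you can filter the returned URLs by path prefix. The following is a minimal sketch with hypothetical helper names. Note that it measures nesting in the URL path, which is not necessarily the same as the link-hop depth that discover_links uses.

```python
from urllib.parse import urlparse


def path_segments(url):
    """Split a URL path into its non-empty segments."""
    return [s for s in urlparse(url).path.split("/") if s]


def in_section(url, section_url):
    """True if url is on the same host and under section_url's path."""
    if urlparse(url).netloc != urlparse(section_url).netloc:
        return False
    base = path_segments(section_url)
    # Compare whole segments so /products-archive does not match /products.
    return path_segments(url)[: len(base)] == base


def relative_depth(url, section_url):
    """Number of path segments below the section root."""
    return len(path_segments(url)) - len(path_segments(section_url))


section = "https://example.com/products"
urls = [
    "https://example.com/products",
    "https://example.com/products/widgets",
    "https://example.com/products/widgets/blue",
    "https://example.com/products-archive",
    "https://other.example.org/products/widgets",
]
for url in urls:
    if in_section(url, section):
        print(relative_depth(url, section), url)
```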

Step 5: Plan the full crawl

With the structure mapped, ask the agent to plan a targeted crawl:

Based on the links we discovered, which pages should we crawl in depth? Give me a list of URLs and a rationale.

The agent reasons over the discovered structure and recommends which pages deserve a full crawl_site or fetch_page call.
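
You can also pre-filter the list before handing it to the agent. The sketch below keeps only pages that returned 200, drops duplicates that differ only by fragment or trailing slash, and restricts the result to the sections you care about. The record fields and the shortlist helper are assumptions for illustration; this is not how the agent or crawler-mcp selects pages.

```python
from urllib.parse import urldefrag, urlparse


def normalize(url):
    """Drop the #fragment and any trailing slash so duplicates compare equal."""
    clean, _fragment = urldefrag(url)
    return clean.rstrip("/")


def shortlist(links, sections):
    """Return unique, reachable URLs whose first path segment is in sections."""
    seen = set()
    picked = []
    for link in links:
        url = normalize(link["url"])
        if link.get("status") != 200 or url in seen:
            continue
        segments = [s for s in urlparse(url).path.split("/") if s]
        if segments and segments[0] in sections:
            seen.add(url)
            picked.append(url)
    return picked


# Invented sample records; the field names are assumptions.
links = [
    {"url": "https://example.com/products/widgets", "status": 200},
    {"url": "https://example.com/products/widgets/", "status": 200},
    {"url": "https://example.com/products/widgets#specs", "status": 200},
    {"url": "https://example.com/products/retired", "status": 404},
    {"url": "https://example.com/blog/launch", "status": 200},
    {"url": "https://example.com/careers", "status": 200},
]
for url in shortlist(links, {"products", "blog"}):
    print(url)
```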

Frequently Asked Questions
