How to Discover a Site's Structure Before Scraping with MCP
Use discover_links to map a website before committing to a full crawl, saving time and requests.
Crawling a large site blind can waste time and requests. With discover_links, you can map the structure first and see which sections exist, how deep they go, and which URLs matter before deciding what to crawl in depth.
This guide shows how to use discover_links as a reconnaissance step.
Step 1: Install crawler-mcp
Run the install script:
curl -fsSL https://install.crawler.sh/install-mcp.sh | sh
This downloads the correct binary for your platform to ~/.crawler/bin/crawler-mcp.
For more detail, see the installation guide.
Step 2: Wire it into your client
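The exact configuration depends on your MCP client, so check its documentation. As a minimal sketch, assuming a client that reads a JSON file with an `mcpServers` map (a common convention, not something confirmed for every client) and assuming crawler-mcp runs as a local command with no extra arguments, the following Python snippet prints an entry pointing at the installed binary. The server name `crawler-sh` matches the name used in the prompts in this guide; the helper name `mcp_server_entry` is hypothetical.

```python
import json
from pathlib import Path


def mcp_server_entry(name: str = "crawler-sh") -> dict:
    """Build a client config entry for the installed crawler-mcp binary.

    Assumption: the client expects {"mcpServers": {name: {"command": path}}}.
    """
    # Install location stated in Step 1 of this guide.
    binary = Path.home() / ".crawler" / "bin" / "crawler-mcp"
    return {"mcpServers": {name: {"command": str(binary)}}}


# Print the entry so it can be pasted into the client's config file.
print(json.dumps(mcp_server_entry(), indent=2))
```

Paste the printed JSON into your client's MCP configuration file and restart the client so it picks up the new server.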
Step 3: Discover the top-level structure
Ask the agent to map the site:
Use crawler-sh to discover_links on https://example.com to depth 1.
This returns every URL linked directly from the homepage, with titles and status codes. You get a quick overview of the site’s main sections.
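If you save the depth-1 results, a few lines of Python can summarize them by top-level section. This is only a sketch: the list-of-dicts shape with `url`, `title`, and `status` keys is an assumption for illustration, and the actual discover_links output may be structured differently.

```python
from collections import Counter
from urllib.parse import urlparse

# Hypothetical sample of discovered links; the real output shape may differ.
links = [
    {"url": "https://example.com/products", "title": "Products", "status": 200},
    {"url": "https://example.com/products/widgets", "title": "Widgets", "status": 200},
    {"url": "https://example.com/blog", "title": "Blog", "status": 200},
    {"url": "https://example.com/old-page", "title": "", "status": 404},
]


def top_level_section(url: str) -> str:
    """Return the first path segment of a URL, e.g. '/products'."""
    path = urlparse(url).path.strip("/")
    return "/" + path.split("/")[0] if path else "/"


def summarize_sections(found: list) -> Counter:
    """Count how many discovered links fall under each top-level section."""
    return Counter(top_level_section(link["url"]) for link in found)


sections = summarize_sections(links)
for section, count in sections.most_common():
    print(f"{section}: {count} link(s)")
```

Sections with many links are natural candidates for the deeper pass in the next step.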
Step 4: Go deeper where it matters
After seeing the top level, target specific sections:
Use crawler-sh to discover_links on https://example.com/products to depth 2.
This maps the product section without crawling the rest of the site. Repeat for each section you care about.
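To make the depth parameter concrete, here is a small sketch of depth-limited link discovery as a breadth-first walk over an in-memory link graph. It illustrates the general idea only and is not crawler-mcp's implementation; the `SITE` graph and the `discover` function are hypothetical.

```python
from collections import deque

# Hypothetical link graph: each page maps to the pages it links to.
SITE = {
    "/products": ["/products/widgets", "/products/gadgets"],
    "/products/widgets": ["/products/widgets/blue"],
    "/products/gadgets": ["/products/widgets"],
    "/products/widgets/blue": ["/products/widgets/blue/specs"],
}


def discover(start: str, max_depth: int) -> dict:
    """Return each reachable page with its link distance from start.

    Pages more than max_depth hops away are never visited.
    """
    seen = {start: 0}
    queue = deque([start])
    while queue:
        page = queue.popleft()
        depth = seen[page]
        if depth == max_depth:
            continue  # Do not follow links beyond the depth limit.
        for target in SITE.get(page, []):
            if target not in seen:  # Record each page once, at its shortest distance.
                seen[target] = depth + 1
                queue.append(target)
    return seen


result = discover("/products", max_depth=2)
for page, depth in result.items():
    print(depth, page)
```

With a limit of 2, the walk reaches pages up to two hops from the start page and stops before `/products/widgets/blue/specs`, which is three hops away.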
Step 5: Plan the full crawl
With the structure mapped, ask the agent to plan a targeted crawl:
Based on the links we discovered, which pages should we crawl in depth? Give me a list of URLs and a rationale.
The agent reasons over the discovered structure and recommends which pages deserve a full crawl_site or fetch_page call.
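If you want to sanity-check the agent's recommendation, a simple filter over the discovered links gives a baseline shortlist. The sketch below assumes the same hypothetical list-of-dicts shape with `url` and `status` keys; the `shortlist` helper is illustrative and not part of crawler-mcp.

```python
def shortlist(links: list, prefixes: list, ok_status: int = 200) -> list:
    """Keep unique URLs that returned ok_status and fall under a chosen section."""
    chosen = []
    seen = set()
    for link in links:
        url = link["url"]
        if link["status"] != ok_status or url in seen:
            continue  # Skip broken pages and duplicates.
        if any(url.startswith(prefix) for prefix in prefixes):
            seen.add(url)
            chosen.append(url)
    return chosen


# Hypothetical discovered links; the real output shape may differ.
discovered = [
    {"url": "https://example.com/products/widgets", "status": 200},
    {"url": "https://example.com/products/widgets", "status": 200},
    {"url": "https://example.com/products/retired", "status": 404},
    {"url": "https://example.com/blog/launch", "status": 200},
]

candidates = shortlist(discovered, ["https://example.com/products"])
print(candidates)
```

Comparing this baseline with the agent's list makes it easier to spot pages the agent added or dropped, and to ask it why.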