Do I need to re-crawl my docs after every update?

Yes, if you want the bot to answer questions about the latest content. Set up a cron job or CI trigger to re-crawl your docs site nightly or after each deployment, then rebuild the index.

Can the bot handle questions not covered in the docs?

The bot will only know what is in the crawled content. For questions outside the docs, it should gracefully say it does not have that information rather than hallucinating an answer.

How do I deploy this as a Slack or Discord bot?

Wrap the Python service from Step 2 in a FastAPI endpoint, then connect it to the Slack Bolt SDK or Discord.py. Each incoming message triggers a crawl or cache lookup, and the LLM formats the response.

How to Bootstrap a Documentation QA Bot with MCP

Support teams spend hours answering the same questions. If your documentation already covers the answers, an agent can do the lookup for you. With crawler-mcp, you can crawl your docs site once, store the Markdown, and let the agent answer questions by searching the cached content.

This guide shows how to build a documentation QA bot using crawl_site.

Step 1: Install crawler-mcp

Run the install script:

curl -fsSL https://install.crawler.sh/install-mcp.sh | sh

This downloads the correct binary for your platform to ~/.crawler/bin/crawler-mcp.

For more detail, see the installation guide.

Step 2: Connect to a remote model via API

A chatbot needs a persistent backend, not a local IDE plugin. Use the MCP Python SDK or any HTTP client to bridge crawler-mcp with a remote LLM API.

Here is a minimal Python service that starts the MCP server, exposes the three tools, and forwards tool results to an LLM:

import asyncio, json, subprocess
from mcp import ClientSession, StdioServerParameters
from mcp.client.stdio import stdio_client

# Start crawler-mcp
server = StdioServerParameters(
    command="/Users/you/.crawler/bin/crawler-mcp",
    env={"CRAWLER_TOKEN": "your-token"},
)

async def ask_llm(messages):
    # Replace with your provider: OpenAI, Anthropic, Gemini, etc.
    import openai
    response = await openai.chat.completions.create(
        model="gpt-4o",
        messages=messages,
        tools=tools,
    )
    return response

async def main():
    async with stdio_client(server) as (read, write):
        async with ClientSession(read, write) as session:
            await session.initialize()
            tools = await session.list_tools()
            # Pass tools to your LLM and let it decide which to call

The service runs crawler-mcp as a subprocess, reads its JSON-RPC stream, and translates tool calls between the LLM and the crawler. You can deploy this as a FastAPI endpoint, a Slack bot, or a Discord bot.

For a no-code start, wire crawler-mcp into Claude Desktop first, prove the workflow in chat, then port the same prompts to your API service.

Step 3: Crawl your docs site

Ask the agent to crawl your documentation:

Use crawler-sh to crawl https://docs.example.com to depth 3 with max_pages 200. Save the Markdown for each page to ./docs-cache/ as individual files.

The agent calls crawl_site, receives Markdown for every page, and writes each one to disk.

Step 4: Index the docs for search

Ask the agent to create a searchable index:

Read all files in ./docs-cache/, extract the title and first paragraph from each, and write an index.json with {url, title, summary} for every page.

The agent scans the cached files and builds a lightweight index.

Step 5: Answer support questions

When a question comes in, ask the agent to search the cache:

Someone asked “How do I configure SSO?” Search the docs cache for relevant pages and give a step-by-step answer with links.

The agent reads the cached docs, finds the relevant sections, and returns a grounded answer with source URLs.