MCP 服务器

Robot Resources Scraper

Web scraper and token compressor that converts HTML to clean markdown with 70-80% fewer tokens. Single-page compression and multi-page BFS crawling with auto-fallback fetch modes.

README

@robot-resources/scraper-mcp

MCP server for Scraper — context compression for AI agents.

What is Robot Resources?

Human Resources, but for your AI agents.

Robot Resources gives AI agents two superpowers:

Router — Routes each LLM call to the cheapest capable model. 60-90% cost savings across OpenAI, Anthropic, and Google.
Scraper — Compresses web pages to clean markdown. 70-80% fewer tokens per page.

Both run locally. Your API keys never leave your machine. Free, unlimited, no tiers.

Install the full suite

npx robot-resources

One command sets up everything. Learn more at robotresources.ai

About this MCP server

This package gives AI agents two tools to compress web content into token-efficient markdown via the Model Context Protocol: single-page compression and multi-page BFS crawling.

Installation

npx @robot-resources/scraper-mcp

Or install globally:

npm install -g @robot-resources/scraper-mcp

Claude Desktop Configuration

Add to your claude_desktop_config.json:

{
  "mcpServers": {
    "scraper": {
      "command": "npx",
      "args": ["-y", "@robot-resources/scraper-mcp"]
    }
  }
}

Tools

`scraper_compress_url`

Compress a single web page into markdown with 70-90% fewer tokens.

Parameters:

Parameter	Type	Required	Default	Description
`url`	string	yes	—	URL to compress
`mode`	string	no	`'auto'`	`'fast'`, `'stealth'`, `'render'`, or `'auto'`
`timeout`	number	no	`10000`	Fetch timeout in milliseconds
`maxRetries`	number	no	`3`	Max retry attempts (0-10)

Example prompt: "Compress https://docs.example.com/getting-started"

`scraper_crawl_url`

Crawl multiple pages from a starting URL using BFS link discovery.

Parameters:

Parameter	Type	Required	Default	Description
`url`	string	yes	—	Starting URL to crawl
`maxPages`	number	no	`10`	Max pages to crawl (1-100)
`maxDepth`	number	no	`2`	Max link depth (0-5)
`mode`	string	no	`'auto'`	`'fast'`, `'stealth'`, `'render'`, or `'auto'`
`include`	string[]	no	—	URL patterns to include (glob)
`exclude`	string[]	no	—	URL patterns to exclude (glob)
`timeout`	number	no	`10000`	Per-page timeout in milliseconds

Example prompt: "Crawl the docs at https://docs.example.com with max 20 pages"

Fetch Modes

Mode	How	Use when
`'fast'`	Plain HTTP	Default sites, APIs, docs
`'stealth'`	TLS fingerprint impersonation	Anti-bot protected sites
`'render'`	Headless browser (Playwright)	JS-rendered SPAs
`'auto'`	Fast → stealth fallback on 403/challenge	Unknown sites (default)