ScrapeLab MCP

Enables undetectable web scraping and browser automation for AI agents with 84 tools including stealth navigation, element extraction, network interception, and auto cookie consent dismissal. Bypasses anti-bot systems like Cloudflare and DataDome while providing LLM-ready markdown output and full Chrome DevTools Protocol access.

<div align="center">

<img src="media/logo_mcp1.png" alt="ScrapeLab MCP" width="500"/>

ScrapeLab MCP

The most complete stealth browser MCP server for AI agents.

84 tools. Undetectable by anti-bot systems. Full CDP access.
LLM-ready markdown. Auto cookie consent dismiss (100+ CMPs).
Accessibility snapshots, PDF export, HAR capture, network hooks, element cloning.

</div>


What is this?

An MCP server that gives AI agents (Claude, Cursor, Windsurf, etc.) a fully undetectable browser with 84 automation tools. Built on nodriver + Chrome DevTools Protocol + FastMCP.

Why not Playwright MCP? Playwright is detectable: sites protected by Cloudflare, DataDome, or similar anti-bot systems will block it. ScrapeLab uses nodriver (the successor to undetected-chromedriver), so there is no navigator.webdriver flag, no automation fingerprint, and no detection.

Key differentiators

| Feature | ScrapeLab MCP | Playwright MCP | Stealth Browser MCP |
|---|---|---|---|
| Anti-bot bypass (Cloudflare, DataDome) | Yes | No | Yes |
| Markdown output (LLM-ready) | Yes | Yes | No |
| Cookie consent auto-dismiss (100+ CMPs) | Yes | No | No |
| Accessibility snapshots | Yes | Yes | No |
| PDF export | Yes | Yes | No |
| HAR export | Yes | No | No |
| Network interception + hooks | Deep (Python hooks) | Routes only | Deep |
| Element cloning (styles, events, animations) | Full CDP | No | Full CDP |
| Progressive element cloning | Yes | No | Yes |
| Tools | 84 | 61 | 90 |
| Modular sections (enable/disable) | Yes | Capabilities | Yes |

LLM-Ready Markdown

get_page_content can return clean markdown instead of raw HTML: typically 98-99% smaller than the source HTML and ready for LLM consumption.

| Mode | Engine | Best for | Size reduction |
|---|---|---|---|
| readability=False (default) | html2text | Full page structure, navigation, all content | ~98% |
| readability=True | trafilatura | Article/main content only, precision extraction | ~99% |

Both modes strip scripts, styles, SVGs, cookie banners, navigation chrome, and HTML comments before conversion.
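The stripping step can be pictured with the standard library alone. This is a minimal sketch, not ScrapeLab's code: it assumes a pre-pass built on html.parser that drops script, style, and svg subtrees plus HTML comments, with the result then handed to html2text or trafilatura. Cookie-banner and navigation removal are left out because they need site-specific selectors.

```python
from html.parser import HTMLParser

# Tags whose entire subtree is dropped before markdown conversion.
DROP_TAGS = {"script", "style", "svg"}
# Void elements never get an end tag, so they must not change the nesting depth.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
             "link", "meta", "source", "track", "wbr"}


class PreCleaner(HTMLParser):
    """Re-emit HTML minus script/style/svg subtrees and comments."""

    def __init__(self) -> None:
        super().__init__(convert_charrefs=False)
        self.out = []
        self.skip_depth = 0  # > 0 while inside a dropped subtree

    def handle_starttag(self, tag, attrs):
        if self.skip_depth or tag in DROP_TAGS:
            if tag not in VOID_TAGS:
                self.skip_depth += 1
            return
        self.out.append(self.get_starttag_text())

    def handle_startendtag(self, tag, attrs):
        # Self-closing tags such as <br/> or <path/> never open a subtree.
        if not self.skip_depth and tag not in DROP_TAGS:
            self.out.append(self.get_starttag_text())

    def handle_endtag(self, tag):
        if self.skip_depth:
            self.skip_depth -= 1
            return
        self.out.append(f"</{tag}>")

    def handle_data(self, data):
        if not self.skip_depth:
            self.out.append(data)

    def handle_entityref(self, name):
        if not self.skip_depth:
            self.out.append(f"&{name};")

    def handle_charref(self, name):
        if not self.skip_depth:
            self.out.append(f"&#{name};")

    def handle_comment(self, data):
        pass  # HTML comments are always dropped


def pre_clean(html: str) -> str:
    parser = PreCleaner()
    parser.feed(html)
    parser.close()
    return "".join(parser.out)
```

Running the parser as a separate pass keeps the converters themselves untouched, so switching between the two readability modes only changes the final step.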

Cookie Consent Auto-Dismiss

Every navigate call automatically dismisses cookie/GDPR consent popups. No manual clicks, no leftover overlays blocking your scraper.

Three-layer system:

  1. DuckDuckGo autoconsent — 2863 rules covering 100+ consent management platforms (iubenda, Cookiebot, OneTrust, Quantcast, TrustArc, etc.)
  2. CMP JS API fallback — Calls platform APIs directly from the main page (_sp_.destroyMessages(), OneTrust.AllowAll(), __tcfapi, Didomi, Cookiebot) — handles cross-origin iframe popups like SourcePoint
  3. DOM click fallback — Catches multi-step consent flows (e.g. iubenda's 2-click Italian flow) by re-clicking accept buttons
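The three layers behave like an ordered fallback chain. The sketch below is hypothetical, not ScrapeLab's implementation: each layer is a plain callable, banner_visible is an assumed probe (in a real browser it would be a DOM check), and the loop only moves on to the next layer while a banner is still showing. The layer-2 JavaScript calls just the two CMP APIs named above, each guarded so it is a no-op on pages that do not load that platform.

```python
from typing import Callable, List, Sequence, Tuple

# Hypothetical layer-2 payload: call CMP APIs directly if the page exposes them.
CMP_API_FALLBACK_JS = """
(() => {
  const done = [];
  if (window._sp_ && typeof window._sp_.destroyMessages === "function") {
    window._sp_.destroyMessages();
    done.push("sourcepoint");
  }
  if (window.OneTrust && typeof window.OneTrust.AllowAll === "function") {
    window.OneTrust.AllowAll();
    done.push("onetrust");
  }
  return done;
})()
"""

Layer = Tuple[str, Callable[[], None]]


def dismiss_consent(layers: Sequence[Layer],
                    banner_visible: Callable[[], bool]) -> List[str]:
    """Run layers in order while a consent banner is still visible.

    Returns the names of the layers that actually ran.
    """
    ran: List[str] = []
    for name, layer in layers:
        if not banner_visible():
            break  # nothing left to dismiss
        try:
            layer()
        except Exception:
            pass  # a failing layer simply falls through to the next one
        ran.append(name)
    return ran
```

Checking the banner before each layer is what lets a cheap, rule-based layer win on most sites while the slower fallbacks only run when it misses.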

Disable per-instance with spawn_browser(auto_dismiss_consent=False).
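At the protocol level, an MCP client passes that flag as the arguments of a tools/call request. The sketch below only builds the JSON-RPC 2.0 message a client would send; MCP clients such as Claude Desktop construct it for you, and auto_dismiss_consent is the only spawn_browser argument used because it is the only one named here.

```python
import json
from typing import Any, Dict


def tool_call_request(request_id: int, tool: str, arguments: Dict[str, Any]) -> str:
    """Serialize an MCP tools/call request (JSON-RPC 2.0)."""
    message = {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": tool, "arguments": arguments},
    }
    return json.dumps(message)


# Launch a browser instance with consent auto-dismiss turned off.
print(tool_call_request(1, "spawn_browser", {"auto_dismiss_consent": False}))
```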


Quickstart

1. Clone and install

```bash
git clone https://github.com/competitorch/ScrapeLabMCP.git
cd ScrapeLabMCP
python -m venv .venv
source .venv/bin/activate
pip install -r requirements.txt
```

2. Add to your MCP client

Claude Desktop (claude_desktop_config.json):

```json
{
  "mcpServers": {
    "scrapelab-mcp": {
      "command": "/path/to/ScrapeLabMCP/.venv/bin/python",
      "args": ["/path/to/ScrapeLabMCP/src/server.py"]
    }
  }
}
```

Claude Code CLI:

```bash
claude mcp add-json scrapelab-mcp '{
  "type": "stdio",
  "command": "/path/to/ScrapeLabMCP/.venv/bin/python",
  "args": ["/path/to/ScrapeLabMCP/src/server.py"]
}'
```
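If claude_desktop_config.json already lists other servers, the entry has to be merged into mcpServers rather than pasted over the file. A small sketch, assuming the virtualenv was created as in step 1; the install path in the example is made up, and reading and writing the actual file is left to you.

```python
import json
from pathlib import PurePath, PurePosixPath
from typing import Any, Dict


def add_scrapelab_server(config: Dict[str, Any], repo: PurePath) -> Dict[str, Any]:
    """Return a copy of config with a scrapelab-mcp entry under mcpServers."""
    merged = json.loads(json.dumps(config))  # deep copy via a JSON round trip
    servers = merged.setdefault("mcpServers", {})
    servers["scrapelab-mcp"] = {
        # POSIX venv layout from step 1 (on Windows: .venv\Scripts\python.exe)
        "command": str(repo / ".venv" / "bin" / "python"),
        "args": [str(repo / "src" / "server.py")],
    }
    return merged


# Example with a made-up install location and one pre-existing server entry.
existing = {"mcpServers": {"other": {"command": "node", "args": ["x.js"]}}}
updated = add_scrapelab_server(existing, PurePosixPath("/opt/ScrapeLabMCP"))
print(json.dumps(updated, indent=2))
```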

3. Use it

You: "Open a browser and navigate to example.com"
You: "Take a screenshot and get the accessibility snapshot"
You: "Get the page content as markdown"
You: "Export the page as PDF"
You: "Show me all network requests and export as HAR"

Tools Reference (84 tools)

Browser Management (10 tools)

| Tool | Description |
|---|---|
| spawn_browser | Launch undetectable browser instance (headless, proxy, custom UA, auto-consent) |
| navigate | Navigate to URL with wait conditions + auto cookie consent dismiss |
| close_instance | Clean shutdown of browser instance |
| list_instances | List all active browser instances |
| get_instance_state | Full page state (URL, cookies, storage, viewport) |
| go_back / go_forward | Browser history navigation |
| reload_page | Reload with optional cache bypass |
| get_accessibility_snapshot | Structured accessibility tree: the fastest way for an LLM to understand a page |
| save_as_pdf | Export page as PDF with full layout control |
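To see why an accessibility tree is compact, here is a sketch that turns the node list returned by the CDP command Accessibility.getFullAXTree into an indented outline. Whether get_accessibility_snapshot uses this exact command or output format is an assumption; the node fields read here (nodeId, ignored, role, name, childIds) come from the CDP Accessibility domain.

```python
from typing import Any, Dict, List


def ax_outline(nodes: List[Dict[str, Any]]) -> str:
    """Render CDP AXNodes as an indented 'role "name"' outline, skipping ignored nodes."""
    by_id = {n["nodeId"]: n for n in nodes}
    child_ids = {c for n in nodes for c in n.get("childIds", [])}
    roots = [n for n in nodes if n["nodeId"] not in child_ids]
    lines: List[str] = []

    def walk(node: Dict[str, Any], depth: int) -> None:
        if node.get("ignored"):
            # Ignored nodes are not shown, but their children may still matter.
            next_depth = depth
        else:
            role = node.get("role", {}).get("value", "")
            name = node.get("name", {}).get("value", "")
            lines.append("  " * depth + (f'{role} "{name}"' if name else role))
            next_depth = depth + 1
        for cid in node.get("childIds", []):
            if cid in by_id:
                walk(by_id[cid], next_depth)

    for root in roots:
        walk(root, 0)
    return "\n".join(lines)
```

Layout wrappers are usually marked ignored, which is why an outline like this tends to be far shorter than the DOM it describes.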

Element Interaction (11 tools)

| Tool | Description |
|---|---|
| query_elements | Find elements by CSS/XPath with visibility info |
| click_element | Natural click with fallback strategies |
| type_text | Human-like typing |
| paste_text | Instant paste via CDP |
| scroll_page | Directional scrolling |
| wait_for_element | Smart wait with timeout |
| execute_script | Run JavaScript in page context |
| select_option | Dropdown selection |
| get_element_state | Element properties and bounding box |
| take_screenshot | Screenshot (viewport, full page, or element) |
| get_page_content | HTML, text, or markdown (readability=True for article extraction) |

Element Extraction (8 tools)

Every tool performs deep extraction and accepts an optional save_to_file=True.
Style extraction supports method="js" or method="cdp" for maximum accuracy.

| Tool | Description |
|---|---|
| extract_element_styles | 300+ CSS properties, pseudo-elements, inheritance chain |
| extract_element_structure | DOM tree, attributes, data attributes, children |
| extract_element_events | Event listeners, inline handlers, framework detection |
| extract_element_animations | CSS animations, transitions, transforms, keyframes |
| extract_element_assets | Images, backgrounds, fonts, icons, videos |
| extract_related_files | Linked CSS/JS files, imports, modules |
| clone_element_complete | Master clone: all of the above in one call (method="comprehensive" or "cdp") |

Progressive Cloning (10 tools)

Lazy-load element data on demand — start lightweight, expand what you need.

| Tool | Description |
|---|---|
| clone_element_progressive | Base structure with element_id for on-demand expansion |
| expand_styles / expand_events / expand_children | Expand specific data categories |
| expand_css_rules / expand_pseudo_elements / expand_animations | Expand detailed styling data |
| list_stored_elements / clear_stored_element / clear_all_elements | Manage stored elements |
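The lazy-expansion idea can be shown with a small in-memory store. This is a hypothetical sketch, not ScrapeLab's storage: the full extraction result stays server-side under a generated element_id, the first response carries only a light base record plus the list of expandable categories, and each expand call returns one category on demand.

```python
import uuid
from typing import Any, Dict, List


class ElementStore:
    """Keep full element data server-side and hand it out one category at a time."""

    BASE_KEYS = ("tag", "id", "classes")

    def __init__(self) -> None:
        self._elements: Dict[str, Dict[str, Any]] = {}

    def clone_progressive(self, full: Dict[str, Any]) -> Dict[str, Any]:
        # Analogue of clone_element_progressive: store everything, return little.
        element_id = uuid.uuid4().hex
        self._elements[element_id] = full
        base = {k: full[k] for k in self.BASE_KEYS if k in full}
        # Advertise what can be expanded later without sending it now.
        expandable = sorted(k for k in full if k not in self.BASE_KEYS)
        return {"element_id": element_id, "base": base, "expandable": expandable}

    def expand(self, element_id: str, category: str) -> Any:
        # Analogue of expand_styles / expand_events / ...
        return self._elements[element_id][category]

    def clear(self, element_id: str) -> None:
        # Analogue of clear_stored_element.
        self._elements.pop(element_id, None)

    def list_ids(self) -> List[str]:
        # Analogue of list_stored_elements.
        return list(self._elements)
```

The payoff is context size: an agent that only needs an element's styles never pays for its event listeners or animation data.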

Network & Traffic (12 tools)

Deep network monitoring with interception, search, and standard export formats.

| Tool | Description |
|---|---|
| list_network_requests | All captured requests with type filtering |
| get_request_details / get_response_details / get_response_content | Inspect individual requests |
| search_network_requests | Search by URL pattern, method, status, body content |
| modify_headers | Modify request headers for future requests |
| set_network_capture_filters / get_network_capture_filters | Control what gets captured |
| export_network_data / import_network_data | JSON export/import |
| export_har | Export as HAR 1.2, importable in Chrome DevTools, Postman, Fiddler |
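For reference, a HAR 1.2 file is a JSON document whose log object holds version, creator, and entries. The sketch below wraps simplified captured records (a hypothetical shape with method, url, status, mime_type, started, and duration_ms) into the fields HAR 1.2 marks as required. It is not how export_har is implemented; headers, sizes, and HTTP versions are placeholders, and -1 is HAR's value for "not available".

```python
import json
from datetime import datetime, timezone
from typing import Any, Dict, List


def to_har(records: List[Dict[str, Any]], creator: str = "har-sketch") -> Dict[str, Any]:
    """Wrap simplified request records in a minimal HAR 1.2 document."""
    entries = []
    for r in records:
        duration = float(r["duration_ms"])
        entries.append({
            "startedDateTime": r["started"].isoformat(),  # ISO 8601 with offset
            "time": duration,  # must equal the sum of the non -1 timings
            "request": {
                "method": r["method"], "url": r["url"],
                "httpVersion": "HTTP/1.1",  # placeholder, not captured here
                "cookies": [], "headers": [], "queryString": [],
                "headersSize": -1, "bodySize": -1,  # -1 = not available
            },
            "response": {
                "status": r["status"], "statusText": "",
                "httpVersion": "HTTP/1.1",
                "cookies": [], "headers": [],
                # placeholder size: the body is not part of the simplified record
                "content": {"size": 0, "mimeType": r["mime_type"]},
                "redirectURL": "",
                "headersSize": -1, "bodySize": -1,
            },
            "cache": {},
            "timings": {"send": 0, "wait": duration, "receive": 0},
        })
    return {"log": {"version": "1.2",
                    "creator": {"name": creator, "version": "0.1"},
                    "entries": entries}}


record = {"method": "GET", "url": "https://example.com/", "status": 200,
          "mime_type": "text/html", "duration_ms": 120,
          "started": datetime(2024, 1, 1, tzinfo=timezone.utc)}
har = to_har([record])
print(json.dumps(har, indent=2))
```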

Dynamic Hooks (7 tools)

AI-generated Python functions that intercept and modify network traffic in real time.

| Tool | Description |
|---|---|
| create_dynamic_hook | Full hook with custom Python function |
| create_simple_dynamic_hook | Template hook (block, redirect, add_headers, log) |
| list_dynamic_hooks / get_dynamic_hook_details / remove_dynamic_hook | Manage hooks |
| get_hook_documentation | Docs for writing hooks (overview, requirements, examples, patterns) |
| validate_hook_function | Validate hook code before deploying |
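The real hook signature is documented by get_hook_documentation and is not reproduced here. Purely to illustrate the idea, the sketch below assumes a hook is a Python function that receives a request as a dict (url, method, headers) and returns an action dict; the function name, the dict shapes, the blocklist, and the header name are all hypothetical.

```python
from typing import Any, Dict
from urllib.parse import urlsplit

# Hypothetical blocklist for the illustration.
BLOCKED_HOSTS = {"tracker.example", "ads.example"}


def process_request(request: Dict[str, Any]) -> Dict[str, Any]:
    """Hypothetical hook: block tracker hosts, tag everything else with a header."""
    host = urlsplit(request["url"]).hostname or ""
    # Match the blocked host itself and any of its subdomains.
    if host in BLOCKED_HOSTS or any(host.endswith("." + h) for h in BLOCKED_HOSTS):
        return {"action": "block"}
    headers = dict(request.get("headers", {}))  # copy, never mutate the input
    headers["X-Scraped-By"] = "scrapelab"  # hypothetical header name
    return {"action": "modify", "headers": headers}
```

Keeping the hook a pure function of its input makes it easy to check offline before deploying it, which is the role validate_hook_function plays for real hooks.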

CDP Functions (12 tools)

Direct Chrome DevTools Protocol access for advanced automation.

| Tool | Description |
|---|---|
| execute_cdp_command | Raw CDP command execution |
| discover_global_functions / discover_object_methods | Discover page APIs |
| call_javascript_function / execute_function_sequence | Call JS functions |
| inject_and_execute_script | Inject and run scripts |
| inspect_function_signature | Inspect function signatures |
| create_persistent_function | Functions that survive navigation |
| create_python_binding / execute_python_in_browser | Python-in-browser via py2js |
| get_execution_contexts / list_cdp_commands / get_function_executor_info | CDP introspection |
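Plain CDP already has a mechanism for code that survives navigation: Page.addScriptToEvaluateOnNewDocument registers a source string that runs in every new document before the page's own scripts, and returns an identifier that Page.removeScriptToEvaluateOnNewDocument accepts. Whether create_persistent_function is built on it is an assumption. The sketch only builds the raw CDP messages; window.scrapelabHelper is a hypothetical name and the identifier "1" is a placeholder for the value the browser returns.

```python
import json
from itertools import count
from typing import Any, Dict

_ids = count(1)  # CDP expects a unique id per command on a connection


def cdp_message(method: str, params: Dict[str, Any]) -> str:
    return json.dumps({"id": next(_ids), "method": method, "params": params})


# Runs in every new document (including after navigation), before page scripts.
PERSISTENT_SOURCE = "window.scrapelabHelper = (sel) => document.querySelectorAll(sel).length;"

install = cdp_message("Page.addScriptToEvaluateOnNewDocument", {"source": PERSISTENT_SOURCE})
# The browser replies with {"identifier": ...}; send it back to unregister the script.
uninstall = cdp_message("Page.removeScriptToEvaluateOnNewDocument", {"identifier": "1"})
print(install)
```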

Cookies & Storage (3 tools)

| Tool | Description |
|---|---|
| get_cookies / set_cookie / clear_cookies | Cookie management |

Tab Management (5 tools)

| Tool | Description |
|---|---|
| new_tab / list_tabs / switch_tab / close_tab / get_active_tab | Full tab lifecycle |

Debugging (5 tools)

| Tool | Description |
|---|---|
| get_debug_view / clear_debug_view / export_debug_logs / get_debug_lock_status | Debug system |
| validate_browser_environment_tool | Diagnose platform and browser issues |

Modular Architecture

Load only what you need:

```bash
# Full suite (84 tools)
python src/server.py

# Core only: browser + element interaction
python src/server.py --minimal

# Disable specific sections
python src/server.py --disable-cdp-functions --disable-progressive-cloning

# List all sections
python src/server.py --list-sections
```

Sections

| Section | Tools | Description |
|---|---|---|
| browser-management | 10 | Core browser ops, accessibility, PDF |
| element-interaction | 11 | Click, type, scroll, screenshot, markdown |
| element-extraction | 8 | Deep element cloning with save_to_file |
| network-debugging | 12 | Network monitoring, HAR export |
| cdp-functions | 12 | Raw CDP access |
| progressive-cloning | 10 | Lazy element expansion |
| cookies-storage | 3 | Cookie management |
| tabs | 5 | Tab management |
| debugging | 5 | Debug tools |
| dynamic-hooks | 7 | Network hook system |
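The flag scheme maps onto one --disable-<section> switch per row of the table. The sketch below is a hypothetical re-creation with argparse, not the code in src/server.py; it assumes --minimal keeps exactly browser-management and element-interaction, as the Modular Architecture examples describe.

```python
import argparse
from typing import List, Optional

SECTIONS = [
    "browser-management", "element-interaction", "element-extraction",
    "network-debugging", "cdp-functions", "progressive-cloning",
    "cookies-storage", "tabs", "debugging", "dynamic-hooks",
]
MINIMAL = {"browser-management", "element-interaction"}


def enabled_sections(argv: Optional[List[str]] = None) -> List[str]:
    parser = argparse.ArgumentParser(description="Section toggles (sketch)")
    parser.add_argument("--minimal", action="store_true")
    for name in SECTIONS:
        # argparse stores --disable-cdp-functions as args.disable_cdp_functions
        parser.add_argument(f"--disable-{name}", action="store_true")
    args = parser.parse_args(argv)
    keep = MINIMAL if args.minimal else set(SECTIONS)
    return [s for s in SECTIONS
            if s in keep and not getattr(args, "disable_" + s.replace("-", "_"))]
```

Generating the flags from the same list that defines the sections keeps the two from drifting apart when a section is added.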

Environment Variables

| Variable | Default | Description |
|---|---|---|
| SCRAPELAB_IDLE_TIMEOUT | 5 | Minutes before idle browser instances are auto-closed |
| PORT | 8000 | Port for HTTP/SSE transport |
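A sketch of how an idle timeout like SCRAPELAB_IDLE_TIMEOUT is typically applied: read the variable, fall back to the documented default of 5 minutes, and close any instance whose last activity is older than that. The bookkeeping (a dict of last-activity timestamps per instance id) and the fallback on malformed values are assumptions, not ScrapeLab's code.

```python
import os
import time
from typing import Dict, List, Mapping, Optional


def idle_timeout_seconds(env: Mapping[str, str] = os.environ) -> float:
    """Minutes from SCRAPELAB_IDLE_TIMEOUT (default 5), converted to seconds."""
    raw = env.get("SCRAPELAB_IDLE_TIMEOUT", "5")
    try:
        minutes = float(raw)
    except ValueError:
        minutes = 5.0  # assumed behavior: ignore malformed values rather than crash
    return minutes * 60


def idle_instances(last_activity: Dict[str, float], timeout_s: float,
                   now: Optional[float] = None) -> List[str]:
    """Return ids of instances inactive for longer than timeout_s."""
    now = time.monotonic() if now is None else now
    return [iid for iid, ts in last_activity.items() if now - ts > timeout_s]
```

A monotonic clock is used so that wall-clock adjustments cannot close (or keep alive) instances by accident.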

Troubleshooting

No compatible browser found — Install Chrome, Chromium, or Edge. Run validate_browser_environment_tool() to diagnose.
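Before running validate_browser_environment_tool, a quick first check is whether any Chromium-based browser is on PATH. The executable names below are common ones and are an assumption, not the list ScrapeLab searches; macOS app bundles and many Windows installs live outside PATH, so an empty result there is not conclusive.

```python
import shutil
from typing import Callable, List, Optional

# Common executable names; not exhaustive, and not the list ScrapeLab uses.
CANDIDATES = ["google-chrome", "google-chrome-stable", "chromium",
              "chromium-browser", "microsoft-edge", "chrome", "msedge"]


def find_browsers(which: Callable[[str], Optional[str]] = shutil.which) -> List[str]:
    """Return the full paths of candidate browsers found on PATH."""
    found = []
    for name in CANDIDATES:
        path = which(name)
        if path:
            found.append(path)
    return found


if __name__ == "__main__":
    print(find_browsers() or "No Chromium-based browser found on PATH")
```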

Too many tools for your use case — Use --minimal or --disable-<section>.

Browser instances piling up — Instances auto-close after 5 minutes of inactivity (configurable via SCRAPELAB_IDLE_TIMEOUT).


License

MIT — see LICENSE.


<div align="center">

Built by Edoardo Nardi
Stealth engine powered by nodriver

</div>