krwl3r

MCP server for web scraping and browser automation, enabling AI agents to extract clean, token-efficient content from web pages.

██╗  ██╗██████╗ ██╗    ██╗██╗     ██████╗ ██████╗
██║ ██╔╝██╔══██╗██║    ██║██║     ╚════██╗██╔══██╗
█████╔╝ ██████╔╝██║ █╗ ██║██║      █████╔╝██████╔╝
██╔═██╗ ██╔══██╗██║███╗██║██║      ╚═══██╗██╔══██╗
██║  ██╗██║  ██║╚███╔███╔╝███████╗██████╔╝██║  ██║
╚═╝  ╚═╝╚═╝  ╚═╝ ╚══╝╚══╝ ╚══════╝╚═════╝ ╚═╝  ╚═╝

// it crawls so your agents don't have to

KRWL3R is written in leetspeak (1337speak), a nod to Linkin Park's "KRWLNG" from the 2002 album Reanimation, where "Crawling" was reimagined without vowels. This project aims to do the same for web crawling: reimagine it for the AI agent era.

What is KRWL3R

KRWL3R is a web intelligence engine purpose-built for AI agents. It combines two battle-tested open source projects into a unified, agent-friendly interface:

Scrapling — adaptive scraping with auto-healing selectors that survive website redesigns
PinchTab — headless browser control with intelligent text extraction (~800 tokens per page)

Instead of dumping raw HTML at your LLM, KRWL3R extracts clean, structured, token-efficient content — and exposes it through MCP, HTTP API, CLI, and ACP interfaces so any agent can use it.

Features

Category	What you get
Stealth scraping	Anti-bot evasion, fingerprint rotation, realistic browser profiles
Auto-healing selectors	Selectors adapt when sites change layout — no more broken scrapers
Dynamic content	Full JavaScript rendering via headless Chrome
Token-efficient output	Pages compressed to ~800 tokens with semantic structure preserved
Browser control	Click, type, scroll, screenshot — full interaction when scraping isn't enough
Multi-instance	Run parallel browser sessions for concurrent extraction
MCP server	Native Model Context Protocol — plug into Claude, Cursor, Windsurf, and more
HTTP API	REST endpoints for any language or framework
CLI	Pipe web data directly into shell workflows
ACP support	Agent Communication Protocol for Gemini CLI and other ACP clients
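
The "Multi-instance" row describes parallel browser sessions for concurrent extraction. From the caller's side, bounded concurrency could look roughly like the sketch below. It uses a stand-in coroutine, `fake_extract`, in place of a real KRWL3R call. The names `extract_many`, `fake_extract`, and `max_sessions` are illustrative assumptions, not part of KRWL3R's documented API.

```python
import asyncio
from typing import Awaitable, Callable


async def extract_many(
    urls: list[str],
    extract: Callable[[str], Awaitable[str]],
    max_sessions: int = 3,
) -> dict[str, str]:
    """Run `extract` over `urls` with at most `max_sessions` calls in flight."""
    limit = asyncio.Semaphore(max_sessions)

    async def worker(url: str) -> tuple[str, str]:
        async with limit:  # waits while max_sessions workers are already active
            return url, await extract(url)

    pairs = await asyncio.gather(*(worker(u) for u in urls))
    return dict(pairs)


async def fake_extract(url: str) -> str:
    # Hypothetical stand-in for a real extraction call.
    await asyncio.sleep(0.01)
    return f"content of {url}"


if __name__ == "__main__":
    urls = [f"https://example.com/page/{i}" for i in range(5)]
    results = asyncio.run(extract_many(urls, fake_extract))
    print(len(results))
```

A semaphore keeps the number of simultaneous sessions capped, which matters when each session is backed by a headless Chrome instance.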

Quick Start

Install

pip install krwl3r

Scrape a page

from krwl3r import Scraper

scraper = Scraper()
result = scraper.extract("https://example.com")

print(result.title)       # Page title
print(result.content)     # Clean text, ~800 tokens
print(result.metadata)    # Structured metadata

Control a browser

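import asyncio

from krwl3r import Browser

async def main():
    # The browser session is closed automatically when the block exits
    async with Browser() as browser:
        page = await browser.new_page("https://example.com")
        await page.click("button#load-more")   # Interact before extracting
        content = await page.extract()
        print(content.text)

# "async with" must run inside a coroutine, so drive it with asyncio.run
asyncio.run(main())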

Use with Claude Desktop (MCP)

Add to your claude_desktop_config.json:

{
  "mcpServers": {
    "krwl3r": {
      "command": "krwl3r",
      "args": ["mcp"]
    }
  }
}

Then ask Claude: "Scrape the pricing page at example.com and summarize the plans."
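
If you already have other servers configured, the krwl3r entry needs to be merged into the existing "mcpServers" object rather than replacing the file. A minimal sketch of that merge follows. The helper name `with_mcp_server` and the "other-server" entry are made up for illustration; only the krwl3r entry itself comes from the config above.

```python
import json


def with_mcp_server(config: dict, name: str, command: str, args: list[str]) -> dict:
    """Return a copy of `config` with one "mcpServers" entry added or replaced."""
    merged = dict(config)
    servers = dict(merged.get("mcpServers", {}))  # copy so the input is not mutated
    servers[name] = {"command": command, "args": list(args)}
    merged["mcpServers"] = servers
    return merged


# Hypothetical existing config with one unrelated server already present.
existing = json.loads(
    '{"mcpServers": {"other": {"command": "other-server", "args": []}}}'
)
updated = with_mcp_server(existing, "krwl3r", "krwl3r", ["mcp"])
print(json.dumps(updated, indent=2))
```

To apply it for real, load your own claude_desktop_config.json with `json.load`, pass the result through the helper, and write it back with `json.dump`.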

Compatibility

KRWL3R works with any AI tool that supports MCP, HTTP, or CLI interfaces.

Client	Protocol	Status
Claude Desktop	MCP	Supported
Claude Code	MCP	Supported
Cursor	MCP	Supported
Windsurf	MCP	Supported
OpenCode	MCP	Supported
Gemini CLI	ACP	Supported
Codex CLI	HTTP / CLI	Supported
Kimi CLI	HTTP / CLI	Supported
Forge	HTTP / MCP	Supported
Any HTTP client	REST API	Supported

Architecture

                         ┌─────────────────────────────┐
                         │        AI  AGENTS            │
                         │  Claude, Gemini, Codex, ...  │
                         └──────────┬──────────────────┘
                                    │
              ┌─────────────────────┼─────────────────────┐
              │                     │                      │
         ┌────▼────┐          ┌────▼────┐           ┌────▼────┐
         │   MCP   │          │  HTTP   │           │   ACP   │
         │ Server  │          │  API    │           │ Server  │
         └────┬────┘          └────┬────┘           └────┬────┘
              │                     │                      │
              └─────────────────────┼──────────────────────┘
                                    │
                         ┌──────────▼──────────┐
                         │    KRWL3R  CORE     │
                         │                     │
                         │  ┌───────────────┐  │
                         │  │  Orchestrator │  │
                         │  └───────┬───────┘  │
                         │          │          │
                         │   ┌──────┴──────┐   │
                         │   │             │   │
                         │ ┌─▼──┐     ┌───▼─┐ │
                         │ │Scrp│     │Pnch │ │
                         │ │lng │     │Tab  │ │
                         │ └─┬──┘     └───┬─┘ │
                         │   │             │   │
                         └───┼─────────────┼───┘
                             │             │
                      ┌──────▼──┐    ┌────▼─────┐
                      │  HTTP   │    │ Headless  │
                      │Requests │    │ Chrome    │
                      └─────────┘    └──────────┘

Layer 1 — Protocol Adapters: MCP, HTTP REST, ACP, and CLI interfaces that translate agent requests into unified internal calls.

Layer 2 — Core Orchestrator: Routes requests, manages concurrency, handles retries, and selects the optimal extraction strategy.

Layer 3 — Extraction Engines: Scrapling for fast HTTP-based extraction with auto-healing selectors. PinchTab for full browser control when JavaScript rendering or interaction is required.

Layer 4 — Transport: Raw HTTP requests for static content, headless Chrome instances for dynamic pages.
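
The layering above can be sketched in a few lines: adapters produce one unified request type, and the orchestrator picks an engine and wraps the call in retries. This is an illustration of the described design only. `ExtractRequest`, `choose_engine`, `with_retries`, and the routing rule are assumptions made for this sketch, not KRWL3R's actual internals.

```python
import time
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class ExtractRequest:
    """Unified internal request that every protocol adapter would produce."""
    url: str
    needs_javascript: bool = False
    needs_interaction: bool = False


def choose_engine(request: ExtractRequest) -> str:
    """Use the browser engine only when the cheaper HTTP path cannot work."""
    if request.needs_javascript or request.needs_interaction:
        return "pinchtab"   # headless Chrome transport
    return "scrapling"      # plain HTTP transport


def with_retries(call: Callable[[], str], attempts: int = 3, base_delay: float = 0.1) -> str:
    """Retry `call` with exponential backoff and re-raise the last error."""
    for attempt in range(attempts):
        try:
            return call()
        except Exception:
            if attempt == attempts - 1:
                raise
            time.sleep(base_delay * 2 ** attempt)
    raise ValueError("attempts must be at least 1")


print(choose_engine(ExtractRequest("https://example.com")))
print(choose_engine(ExtractRequest("https://example.com", needs_javascript=True)))
```

Routing static pages to the HTTP path first keeps the common case fast, with headless Chrome reserved for pages that need rendering or interaction.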

Powered By

KRWL3R stands on the shoulders of two exceptional open source projects:

Scrapling

D4Vinci/Scrapling — BSD-3-Clause — ~20k stars

An undetectable, powerful web scraping library with automatic anti-bot evasion and adaptive selectors that survive website changes. Scrapling's auto-healing selector engine is what makes KRWL3R resilient — when a site redesigns, selectors adapt instead of breaking.
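
To give a feel for the general idea behind adaptive selectors, here is a toy sketch: remember what the target element looked like, and if the exact selector stops matching, fall back to the most similar element on the page. This is not Scrapling's algorithm or API. The simplified element dicts, the `similarity` scoring, and the 0.5 threshold are all assumptions chosen for illustration.

```python
from difflib import SequenceMatcher
from typing import Optional

Element = dict[str, str]  # simplified element with "tag", "class", and "text"


def similarity(a: Element, b: Element) -> float:
    """Average string similarity over tag, class, and text, in [0, 1]."""
    keys = ("tag", "class", "text")
    total = sum(SequenceMatcher(None, a.get(k, ""), b.get(k, "")).ratio() for k in keys)
    return total / len(keys)


def find_adaptive(
    page: list[Element], css_class: str, saved: Element, threshold: float = 0.5
) -> Optional[Element]:
    """Try the exact class first, then fall back to the closest match to `saved`."""
    for element in page:
        if element.get("class") == css_class:
            return element
    best = max(page, key=lambda e: similarity(e, saved), default=None)
    if best is not None and similarity(best, saved) >= threshold:
        return best
    return None


# Fingerprint saved while the original selector still worked.
saved = {"tag": "span", "class": "price", "text": "$19 / month"}

# After a redesign the class was renamed, so the exact match fails.
redesigned_page = [
    {"tag": "h1", "class": "hero-title", "text": "Simple pricing"},
    {"tag": "span", "class": "plan-price", "text": "$19 / month"},
    {"tag": "a", "class": "cta", "text": "Start free trial"},
]
match = find_adaptive(redesigned_page, "price", saved)
print(match)
```

The threshold guards against returning an unrelated element when nothing on the page resembles the saved fingerprint.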

PinchTab

pinchtab/pinchtab — MIT — ~3k stars

A Go-based browser control and text extraction engine that produces clean, ~800-token page representations. PinchTab's intelligent content extraction is what makes KRWL3R token-efficient — agents get structured content instead of raw HTML soup.

License

MIT — use it, fork it, ship it.

Contributing

Contributions are welcome. See docs/contributing.md for guidelines.

Quick version:

Fork the repo
Create a feature branch (git checkout -b feat/my-feature)
Commit with conventional commits (feat:, fix:, docs:, chore:)
Open a pull request

Please be respectful of the upstream projects (Scrapling and PinchTab). KRWL3R integrates them; it does not fork or replace them.

// 2026 - built for the agent era