superFetch MCP Server
<!-- markdownlint-disable MD033 -->
<img src="docs/logo.png" alt="SuperFetch MCP Logo" width="200">
A Model Context Protocol (MCP) server that fetches web pages, extracts readable content with Mozilla Readability, and returns AI-friendly Markdown.
Built for AI workflows that need clean text, stable metadata, and safe-by-default fetching.
Great for: LLM summarization, context retrieval, knowledge base ingestion, and AI agents.
| Quick Start | Tool | Resources | Configuration | Security | Development |
> [!CAUTION]
> This server can access URLs on behalf of AI assistants. Built-in SSRF protection blocks private IP ranges and cloud metadata endpoints, but exercise caution when deploying in sensitive environments.
Features
- Cleaner outputs for LLMs: Readability extraction with quality gates (content ratio + heading retention ≥ 70%)
- Markdown that’s easy to consume: metadata footer for HTML + configurable source injection for raw Markdown (markdown or frontmatter)
- Handles “raw content” sources: preserves markdown/text; rewrites GitHub/GitLab/Bitbucket/Gist URLs to raw
- Works for both local and hosted setups:
  - Stdio mode: best for MCP clients (VS Code / Claude Desktop / Cursor)
  - HTTP mode: best for self-hosting (auth, sessions, rate limiting, Host/Origin validation)
- Fast and resilient: redirect validation, timeouts, and response size limits
- Security-first defaults: URL validation + SSRF/DNS/IP blocklists (blocks private ranges and cloud metadata endpoints)
You get, in one tool call:
- Clean, readable Markdown from any public URL (docs, articles, blogs, wikis)
If you’re comparing a plain `fetch()` call with superFetch: superFetch focuses on extracting the main content in a readable format for LLMs (and humans). When a URL is fetched, it returns clean, structured Markdown that can also be saved as a resource for later use.
What it is (and isn’t)
- It is a content extraction tool: focuses on extracting readable content, not screenshots or full-page data.
- It is an MCP server: integrates with any MCP-compatible client (Claude Desktop, VS Code, Cursor, Cline, Windsurf, Codex, etc).
- It isn’t a general web scraper: it extracts main content, not all page elements.
- It isn’t a browser: it doesn’t execute JavaScript or render pages.
- It’s opinionated on safety: blocks private/internal URLs and cloud metadata endpoints by default.
Quick Start
Recommended: use stdio mode with your MCP client (no HTTP server).
Try it in 60 seconds
- Add the MCP server config (below)
- Restart your MCP client
- Call the `fetch-url` tool with any public URL
What the tool returns
You’ll get `structuredContent` with `url`, `resolvedUrl`, an optional `title`, and `markdown` (plus a `superfetch://cache/...` resource link when cache is enabled and content is large).
Claude Desktop
Add to your claude_desktop_config.json:
{
"mcpServers": {
"superFetch": {
"command": "npx",
"args": ["-y", "@j0hanz/superfetch@latest", "--stdio"]
}
}
}
VS Code
Add to .vscode/mcp.json in your workspace:
{
"servers": {
"superFetch": {
"command": "npx",
"args": ["-y", "@j0hanz/superfetch@latest", "--stdio"]
}
}
}
With Custom Configuration
Add environment variables in your MCP client config under env.
See Configuration or CONFIGURATION.md for all available options and presets.
Example output (trimmed)
{
"url": "https://example.com/docs",
"inputUrl": "https://example.com/docs",
"resolvedUrl": "https://example.com/docs",
"title": "Documentation",
"markdown": "# Getting Started\n\n...\n\n---\n\n _Documentation_ | [_Original Source_](https://example.com/docs) | _12-01-2026_"
}
Tip (Windows): If you encounter issues, try:
cmd /c "npx -y @j0hanz/superfetch@latest --stdio"
<details> <summary><strong>Other clients (Cursor, Cline, Windsurf, Codex)</strong></summary>
Cursor
- Open Cursor Settings
- Go to Features > MCP Servers
- Click "+ Add new global MCP server"
- Add this configuration:
{
"mcpServers": {
"superFetch": {
"command": "npx",
"args": ["-y", "@j0hanz/superfetch@latest", "--stdio"]
}
}
}
<details> <summary><strong>Codex IDE</strong></summary>
Add to your ~/.codex/config.toml file:
Basic Configuration:
[mcp_servers.superfetch]
command = "npx"
args = ["-y", "@j0hanz/superfetch@latest", "--stdio"]
With Environment Variables: See CONFIGURATION.md for examples.
Access config file: Click the gear icon -> "Codex Settings > Open config.toml"
Documentation: Codex MCP Guide
</details>
<details> <summary><strong>Cline (VS Code Extension)</strong></summary>
Open the Cline MCP settings file:
macOS:
code ~/Library/Application\ Support/Code/User/globalStorage/saoudrizwan.claude-dev/settings/cline_mcp_settings.json
Windows:
code %APPDATA%\Code\User\globalStorage\saoudrizwan.claude-dev\settings\cline_mcp_settings.json
Add the configuration:
{
"mcpServers": {
"superFetch": {
"command": "npx",
"args": ["-y", "@j0hanz/superfetch@latest", "--stdio"],
"disabled": false,
"autoApprove": []
}
}
}
</details>
<details> <summary><strong>Windsurf</strong></summary>
Add to ./codeium/windsurf/model_config.json:
{
"mcpServers": {
"superFetch": {
"command": "npx",
"args": ["-y", "@j0hanz/superfetch@latest", "--stdio"]
}
}
}
</details>
<details> <summary><strong>Claude Desktop (Config File Locations)</strong></summary>
macOS:
# Open config file
open -e "$HOME/Library/Application Support/Claude/claude_desktop_config.json"
# Or with VS Code
code "$HOME/Library/Application Support/Claude/claude_desktop_config.json"
Windows:
code %APPDATA%\Claude\claude_desktop_config.json
</details>
</details>
Use cases
1) Turn a docs page into “LLM-ready” Markdown
- Call `fetch-url` with the docs URL
- Feed the returned `markdown` into your summarizer / chunker
- Use the metadata footer fields (especially Original Source) for citations
2) Fetch a GitHub/GitLab/Bitbucket file as raw markdown
- Pass the normal “web UI” URL to `fetch-url`
- superFetch will rewrite it to the raw content URL when possible
- This avoids navigation UI and reduces boilerplate
3) Large pages: keep responses stable with cache resources
- When content is large, the tool can include a `superfetch://cache/...` resource link
- In MCP clients that support resources, you can read the full content via the resource URI
- In HTTP mode, you can also download cached content via `/mcp/downloads/:namespace/:hash` when cache is enabled
4) Safe-by-default web access for agents
- superFetch blocks private IP ranges and common cloud metadata endpoints
- If your agent needs internal access, this is intentionally not supported by default (see Security)
Installation (Alternative)
Global Installation
npm install -g @j0hanz/superfetch
# Run in stdio mode
superfetch --stdio
# Run HTTP server (requires auth token)
superfetch
From Source
git clone https://github.com/j0hanz/super-fetch-mcp-server.git
cd super-fetch-mcp-server
npm install
npm run build
Running the Server
<details> <summary><strong>stdio Mode</strong> (direct MCP integration)</summary>
node dist/index.js --stdio
</details>
<details> <summary><strong>HTTP Mode</strong> (default)</summary>
HTTP mode requires authentication. By default it binds to 127.0.0.1. Non-loopback HOST values require ALLOW_REMOTE=true. To listen on all interfaces, set HOST=0.0.0.0 or HOST=::, set ALLOW_REMOTE=true, and configure OAuth (remote bindings require OAuth).
API_KEY=supersecret npx -y @j0hanz/superfetch@latest
# Server runs at http://127.0.0.1:3000
Windows (PowerShell):
$env:API_KEY = "supersecret"
npx -y @j0hanz/superfetch@latest
For multiple static tokens, set ACCESS_TOKENS (comma/space separated).
Auth is required for /mcp and /mcp/downloads via Authorization: Bearer <token> (static mode also accepts X-API-Key).
Endpoints:
- `GET /health` (no auth; returns status, name, version, uptime)
- `POST /mcp` (auth required)
- `GET /mcp` (auth required; SSE stream; requires `Accept: text/event-stream`)
- `DELETE /mcp` (auth required)
- `GET /mcp/downloads/:namespace/:hash` (auth required)
Sessions are managed via the mcp-session-id header (see HTTP Mode Details).
</details>
Available Tools
Tool Response Notes
The tool returns structuredContent with url, inputUrl, resolvedUrl, optional title, and markdown when inline content is available. resolvedUrl may differ from inputUrl when the URL is rewritten to raw content (GitHub/GitLab/Bitbucket/Gist). On errors, error is included instead of content.
The response includes:
- a `text` block containing JSON of `structuredContent`
- a `resource` block embedding markdown when inline content is available (stdio always embeds full markdown; HTTP embeds inline markdown when it fits or when truncated)
- when content exceeds the inline limit and cache is enabled, a `resource_link` block pointing to `superfetch://cache/...` (stdio mode still embeds full markdown; HTTP mode omits embedded markdown)
- error responses set `isError: true` and return `structuredContent` with `error` and `url`
fetch-url
Fetches a webpage and converts it to clean Markdown format with a metadata footer for HTML (raw markdown is preserved with source injection).
| Parameter | Type | Default | Description |
|---|---|---|---|
| `url` | string | required | URL to fetch |
Example structuredContent:
{
"url": "https://example.com/docs",
"inputUrl": "https://example.com/docs",
"resolvedUrl": "https://example.com/docs",
"title": "Documentation",
"markdown": "---\ntitle: Documentation\n---\n\n# Getting Started\n\nWelcome..."
}
Error response:
{
"url": "https://example.com/broken",
"error": "Failed to fetch: 404 Not Found"
}
Large Content Handling
- Inline markdown is capped at 20,000 characters (`maxInlineContentChars`).
- Stdio mode: full markdown is embedded as a `resource` block; if cache is enabled and content exceeds the inline limit, a `resource_link` is still included.
- HTTP mode: if content exceeds the inline limit and cache is enabled, the response includes a `resource_link` to `superfetch://cache/...` and omits embedded markdown. If cache is disabled, the inline markdown is truncated with `...[truncated]`.
- Upstream fetch size is capped at 10 MB of HTML; larger responses fail.
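The inline-cap rule above can be sketched as follows. The 20,000-character limit comes from this README; exactly how the real server applies the truncation marker is an assumption.

```typescript
// Sketch of the inline-cap rule (assumption: marker is appended after a hard
// slice at the limit; the real server's behavior may differ in detail).
const MAX_INLINE_CONTENT_CHARS = 20_000;

function inlineMarkdown(markdown: string, cacheEnabled: boolean): string | null {
  if (markdown.length <= MAX_INLINE_CONTENT_CHARS) return markdown;
  // Over the limit: with cache enabled, full content is served via a
  // superfetch://cache/... resource link instead of inline markdown.
  if (cacheEnabled) return null;
  return markdown.slice(0, MAX_INLINE_CONTENT_CHARS) + "...[truncated]";
}
```

Returning `null` here stands in for "omit embedded markdown and emit a `resource_link`", as described in the HTTP-mode bullet above.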
Resources
| URI | Description |
|---|---|
| `superfetch://cache/{namespace}/{urlHash}` | Cached content entry (namespace: `markdown`) |
Resource listings enumerate cached entries, and subscriptions notify clients when cache entries update.
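As a rough illustration of how such a cache URI might be derived: the README does not document the actual hashing scheme, so the SHA-256-based `urlHash` below is purely an assumption for demonstration.

```typescript
import { createHash } from "node:crypto";

// Illustrative only: superFetch's real urlHash algorithm is not documented
// here; a truncated SHA-256 of the URL is used as a stand-in.
function cacheUri(namespace: string, url: string): string {
  const urlHash = createHash("sha256").update(url).digest("hex").slice(0, 16);
  return `superfetch://cache/${namespace}/${urlHash}`;
}

console.log(cacheUri("markdown", "https://example.com/docs"));
```

The key property is that the URI is deterministic for a given URL, so repeated fetches resolve to the same cache entry.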
Download Endpoint (HTTP Mode)
When running in HTTP mode, cached content can be downloaded directly. Downloads are available only when cache is enabled.
Endpoint
GET /mcp/downloads/:namespace/:hash
- `namespace`: `markdown`
- Auth required (`Authorization: Bearer <token>`; in static token mode, `X-API-Key` is accepted)
Response Headers
| Header | Value |
|---|---|
| `Content-Type` | `text/markdown; charset=utf-8` |
| `Content-Disposition` | `attachment; filename="<name>"` |
| `Cache-Control` | `private, max-age=<CACHE_TTL>` |
Example Usage
curl -H "Authorization: Bearer $TOKEN" \
http://localhost:3000/mcp/downloads/markdown/abc123.def456 \
-o article.md
Error Responses
| Status | Code | Description |
|---|---|---|
| 400 | `BAD_REQUEST` | Invalid namespace or hash format |
| 404 | `NOT_FOUND` | Content not found or expired |
| 503 | `SERVICE_UNAVAILABLE` | Download service disabled |
Configuration
Set environment variables in your MCP client env or in the shell before starting the server.
Core Server Settings
| Variable | Default | Description |
|---|---|---|
| `HOST` | `127.0.0.1` | HTTP bind address |
| `PORT` | `3000` | HTTP server port (1024-65535) |
| `USER_AGENT` | `superFetch-MCP/2.0` | User-Agent header for outgoing requests |
| `CACHE_ENABLED` | `true` | Enable response caching |
| `CACHE_TTL` | `3600` | Cache TTL in seconds (60-86400) |
| `LOG_LEVEL` | `info` | Logging level (`debug` enables verbose logs) |
| `ALLOW_REMOTE` | `false` | Allow non-loopback binds (OAuth required) |
| `ALLOWED_HOSTS` | (empty) | Additional allowed Host/Origin values |
| `TRANSFORM_TIMEOUT_MS` | `30000` | Worker transform timeout in ms (5000-120000) |
| `TOOL_TIMEOUT_MS` | `50000` | Overall tool timeout in ms (1000-300000) |
| `TRANSFORM_METADATA_FORMAT` | `markdown` | Raw markdown metadata format (`markdown` or `frontmatter`) |
For HTTP server tuning (SERVER_HEADERS_TIMEOUT_MS, SERVER_REQUEST_TIMEOUT_MS, SERVER_KEEP_ALIVE_TIMEOUT_MS, SERVER_SHUTDOWN_CLOSE_IDLE, SERVER_SHUTDOWN_CLOSE_ALL), see CONFIGURATION.md.
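To make the `TRANSFORM_METADATA_FORMAT` option concrete, here is a hedged sketch of the two output shapes, modeled on the example responses elsewhere in this README. The function name, the `source:` frontmatter field, and the exact templates are assumptions, not the server's actual code.

```typescript
// Hypothetical sketch of the two TRANSFORM_METADATA_FORMAT outputs; the real
// server's templates and field names may differ from what is shown here.
type MetadataFormat = "markdown" | "frontmatter";

function injectSource(markdown: string, title: string, url: string, format: MetadataFormat): string {
  if (format === "frontmatter") {
    // YAML frontmatter prepended to the document (title field as in the
    // README's example; the source field is an assumption)
    return `---\ntitle: ${title}\nsource: ${url}\n---\n\n${markdown}`;
  }
  // "markdown" format: metadata footer appended after a horizontal rule
  return `${markdown}\n\n---\n\n_${title}_ | [_Original Source_](${url})`;
}
```

The practical difference: `frontmatter` keeps metadata machine-parseable at the top of the file, while `markdown` keeps the document body first and cites the source at the end.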
Auth (HTTP Mode)
| Variable | Default | Description |
|---|---|---|
| `AUTH_MODE` | auto | `static` or `oauth`. Auto-selects OAuth if `OAUTH_ISSUER_URL`, `OAUTH_AUTHORIZATION_URL`, `OAUTH_TOKEN_URL`, or `OAUTH_INTROSPECTION_URL` is set |
| `ACCESS_TOKENS` | (empty) | Comma/space-separated static bearer tokens |
| `API_KEY` | (empty) | Adds a static bearer token and enables the `X-API-Key` header |
Static mode requires at least one token (ACCESS_TOKENS or API_KEY).
OAuth (HTTP Mode)
Required when AUTH_MODE=oauth (or auto-selected by presence of OAuth URLs):
| Variable | Default | Description |
|---|---|---|
| `OAUTH_ISSUER_URL` | - | OAuth issuer |
| `OAUTH_AUTHORIZATION_URL` | - | Authorization endpoint |
| `OAUTH_TOKEN_URL` | - | Token endpoint |
| `OAUTH_INTROSPECTION_URL` | - | Introspection endpoint |
Optional:
| Variable | Default | Description |
|---|---|---|
| `OAUTH_REVOCATION_URL` | - | Revocation endpoint |
| `OAUTH_REGISTRATION_URL` | - | Dynamic client registration endpoint |
| `OAUTH_RESOURCE_URL` | `http://<host>:<port>/mcp` | Protected resource URL |
| `OAUTH_REQUIRED_SCOPES` | (empty) | Required scopes (comma/space separated) |
| `OAUTH_CLIENT_ID` | - | Client ID for introspection |
| `OAUTH_CLIENT_SECRET` | - | Client secret for introspection |
| `OAUTH_INTROSPECTION_TIMEOUT_MS` | `5000` | Introspection timeout (1000-30000) |
Fixed Limits (Not Configurable via env)
- Fetch timeout: 15 seconds
- Max redirects: 5
- Max HTML response size: 10 MB
- Inline markdown limit: 20,000 characters
- Cache max entries: 100
- Session TTL: 30 minutes
- Session init timeout: 10 seconds
- Max sessions: 200
- Rate limit: 100 req/min per IP (60s window)
See CONFIGURATION.md for preset examples and quick-start snippets.
HTTP Mode Details
HTTP mode uses the MCP Streamable HTTP transport. The workflow is:
1. `POST /mcp` with an `initialize` request and no `mcp-session-id` header.
2. The server returns `mcp-session-id` in the response headers.
3. Use that header for subsequent `POST /mcp`, `GET /mcp`, and `DELETE /mcp` requests.
If the mcp-protocol-version header is missing, the server rejects the request. Only mcp-protocol-version: 2025-11-25 is supported.
GET /mcp and DELETE /mcp require mcp-session-id. POST /mcp without an initialize request will return 400.
Additional HTTP transport notes:
- `POST /mcp` should advertise `Accept: application/json, text/event-stream` (the server normalizes missing or `*/*` Accept headers).
- `GET /mcp` requires `Accept: text/event-stream` (otherwise 406).
- JSON-RPC batch requests are not supported (400).
If the server reaches its session cap (200), it evicts the oldest session when possible; otherwise it returns a 503.
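The session-cap behavior above can be sketched with a `Map`, which preserves insertion order, so the first key is always the oldest session. The cap of 200 comes from this README; the eviction logic shown is a simplified assumption (the real server may, for example, refuse to evict sessions with in-flight requests).

```typescript
// Simplified sketch of oldest-first session eviction at the cap of 200.
const MAX_SESSIONS = 200;
const sessions = new Map<string, { createdAt: number }>();

function addSession(id: string): boolean {
  if (sessions.size >= MAX_SESSIONS) {
    const oldest = sessions.keys().next().value; // Map iterates in insertion order
    if (oldest === undefined) return false; // nothing evictable -> caller responds 503
    sessions.delete(oldest);
  }
  sessions.set(id, { createdAt: Date.now() });
  return true;
}
```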
Host and Origin headers are always validated. Allowed values include loopback hosts, the configured HOST (if not a wildcard), and any entries in ALLOWED_HOSTS. When binding to 0.0.0.0 or ::, set ALLOWED_HOSTS to the hostnames clients will send.
Security
SSRF Protection
Blocked destinations include:
- Loopback and unspecified addresses (`127.0.0.0/8`, `::1`, `0.0.0.0`, `::`)
- Private/ULA ranges (`10.0.0.0/8`, `172.16.0.0/12`, `192.168.0.0/16`, `fc00::/7`)
- Link-local and shared address space (`169.254.0.0/16`, `100.64.0.0/10`, `fe80::/10`)
- Multicast/reserved ranges (`224.0.0.0/4`, `240.0.0.0/4`, `ff00::/8`)
- IPv6 transition ranges (`64:ff9b::/96`, `64:ff9b:1::/48`, `2001::/32`, `2002::/16`)
- Cloud metadata endpoints (AWS/GCP/Azure/Alibaba) such as `169.254.169.254`, `metadata.google.internal`, `metadata.azure.com`, `100.100.100.200`, `instance-data`
- Internal suffixes such as `.local` and `.internal`
DNS resolution is performed, and the request is blocked if any resolved IP matches a blocked range.
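A simplified CIDR check for a handful of the IPv4 ranges above might look like this. This is a sketch, not superFetch's implementation: the real server also covers IPv6, DNS resolution, and the metadata hostnames listed above.

```typescript
// Simplified IPv4 blocklist check covering a few ranges from the list above.
const BLOCKED_CIDRS: Array<[string, number]> = [
  ["127.0.0.0", 8],    // loopback
  ["10.0.0.0", 8],     // private
  ["172.16.0.0", 12],  // private
  ["192.168.0.0", 16], // private
  ["169.254.0.0", 16], // link-local (includes 169.254.169.254 metadata)
  ["100.64.0.0", 10],  // shared address space
];

// Pack a dotted-quad IPv4 address into an unsigned 32-bit integer.
function ipv4ToInt(ip: string): number {
  return ip.split(".").reduce((acc, octet) => (acc << 8) + Number(octet), 0) >>> 0;
}

function isBlockedIPv4(ip: string): boolean {
  const addr = ipv4ToInt(ip);
  return BLOCKED_CIDRS.some(([base, bits]) => {
    const mask = bits === 0 ? 0 : (~0 << (32 - bits)) >>> 0;
    return ((addr & mask) >>> 0) === ((ipv4ToInt(base) & mask) >>> 0);
  });
}

console.log(isBlockedIPv4("169.254.169.254")); // -> true
console.log(isBlockedIPv4("93.184.216.34"));   // -> false
```

Note that checking the URL's hostname alone is not enough; this kind of check must run on every resolved IP to defeat DNS rebinding, which is why the server validates DNS results as described above.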
URL Validation
- Only `http` and `https` URLs
- No embedded credentials in URLs
- Max URL length: 2048 characters
- Hostnames ending in `.local` or `.internal` are rejected
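The rules above can be expressed as a small validator. This is a sketch under stated assumptions: the error strings and the function shape are invented for illustration and are not the server's actual messages.

```typescript
// Sketch of the URL-validation rules listed above; returns an error string
// (wording invented here) or null when the URL passes all checks.
function validateUrl(input: string): string | null {
  if (input.length > 2048) return "URL too long";
  let url: URL;
  try {
    url = new URL(input);
  } catch {
    return "Invalid URL";
  }
  if (url.protocol !== "http:" && url.protocol !== "https:") return "Only http/https allowed";
  if (url.username || url.password) return "Embedded credentials not allowed";
  if (url.hostname.endsWith(".local") || url.hostname.endsWith(".internal")) return "Internal hostname rejected";
  return null; // valid
}
```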
Host/Origin Validation (HTTP Mode)
- Host header must match loopback, the configured `HOST` (if not a wildcard), or `ALLOWED_HOSTS`
- Origin header (when present) is validated against the same allow-list
Rate Limiting
Rate limiting applies to /mcp and /mcp/downloads (100 req/min per IP, 60s window). OPTIONS requests are not rate-limited.
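A fixed-window counter matching the stated limits (100 requests per IP per 60-second window) could be sketched as below; whether the server actually uses a fixed window, a sliding window, or something else is not specified here, so treat this as an illustration only.

```typescript
// Fixed-window per-IP rate limiter sketch (limits from this README; the
// server's actual algorithm may differ).
const LIMIT = 100;
const WINDOW_MS = 60_000;
const windows = new Map<string, { start: number; count: number }>();

function allowRequest(ip: string, now = Date.now()): boolean {
  const w = windows.get(ip);
  if (!w || now - w.start >= WINDOW_MS) {
    // First request in a fresh window for this IP
    windows.set(ip, { start: now, count: 1 });
    return true;
  }
  w.count += 1;
  return w.count <= LIMIT;
}
```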
Development
Scripts
| Command | Description |
|---|---|
| `npm run dev` | Development server with hot reload |
| `npm run build` | Compile TypeScript |
| `npm start` | Production server |
| `npm run lint` | Run ESLint |
| `npm run lint:fix` | Auto-fix lint issues |
| `npm run type-check` | TypeScript type checking |
| `npm run format` | Format with Prettier |
| `npm test` | Run Node test runner (builds dist) |
| `npm run test:coverage` | Run tests with experimental coverage |
| `npm run knip` | Find unused exports/dependencies |
| `npm run knip:fix` | Auto-fix unused code |
| `npm run inspector` | Launch MCP Inspector |
Note: Tests run via `node --test` with `--experimental-transform-types` to execute `.ts` test files. Node will emit an experimental warning.
Tech Stack
| Category | Technology |
|---|---|
| Runtime | Node.js >=20.18.1 |
| Language | TypeScript 5.9 |
| MCP SDK | @modelcontextprotocol/sdk ^1.25.2 |
| Content Extraction | @mozilla/readability ^0.6.0 |
| HTML Parsing | linkedom ^0.18.12 |
| Markdown | node-html-markdown ^2.0.0 |
| HTTP | Express ^5.2.1, undici ^7.18.2 |
| Validation | Zod ^4.3.5 |
Contributing
- Fork the repository
- Create a feature branch: `git checkout -b feature/amazing-feature`
- Ensure linting passes: `npm run lint`
- Run tests: `npm test`
- Commit changes: `git commit -m 'Add amazing feature'`
- Push: `git push origin feature/amazing-feature`
- Open a Pull Request
For examples of other MCP servers, see: github.com/modelcontextprotocol/servers
<!-- markdownlint-enable MD033 -->