
MCP Webscan Server

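Enables web content scanning and analysis: fetch, analyze, and extract information from web pages using tools for page fetching, link extraction, site crawling, and more.

Content Fetching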

Tools

extract-links

Extract and analyze all hyperlinks from a web page, organizing them into a structured format with URLs, anchor text, and contextual information. Performance-optimized with stream processing and worker threads for efficient handling of large pages. Works with either a direct URL or raw HTML content. Handles relative and absolute URLs properly by supporting an optional base URL parameter. Results can be limited to prevent overwhelming output for link-dense pages. Returns a comprehensive link inventory that includes destination URLs, link text, titles (if available), and whether links are internal or external to the source domain. Useful for site mapping, content analysis, broken link checking, SEO analysis, and as a preparatory step for targeted crawling operations.
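The base-URL handling and the internal/external split described above can be sketched with the WHATWG URL class built into Node.js. This is a minimal illustration under stated assumptions, not the server's code. The classifyLinks function and the LinkInfo shape are hypothetical. The anchors are assumed to be already parsed out of the HTML, and "internal" is assumed to mean "same hostname as the base URL".

```typescript
// Hypothetical shape of one entry in the link inventory (not the server's actual type).
interface LinkInfo {
  url: string; // absolute URL after resolution against the base
  text: string; // anchor text, trimmed
  internal: boolean; // assumption: same hostname as the base URL
}

// Resolve raw href values against a base URL and tag each link as internal or external.
// Unparseable hrefs and non-HTTP(S) schemes (mailto:, javascript:, ...) are skipped.
function classifyLinks(baseUrl: string, anchors: { href: string; text: string }[]): LinkInfo[] {
  const base = new URL(baseUrl);
  const results: LinkInfo[] = [];
  for (const { href, text } of anchors) {
    let resolved: URL;
    try {
      // Relative hrefs resolve against the base; absolute ones pass through unchanged.
      resolved = new URL(href, base);
    } catch {
      continue; // malformed href
    }
    if (resolved.protocol !== "http:" && resolved.protocol !== "https:") continue;
    results.push({ url: resolved.href, text: text.trim(), internal: resolved.hostname === base.hostname });
  }
  return results;
}

// Usage: "post-1" resolves inside /blog/, "/about" resolves to the site root,
// the other.org link is external, and the mailto: link is dropped.
const links = classifyLinks("https://example.com/blog/", [
  { href: "post-1", text: "First post" },
  { href: "/about", text: " About " },
  { href: "https://other.org/x", text: "Elsewhere" },
  { href: "mailto:me@example.com", text: "Mail" },
]);
```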

crawl-site

Crawl a website and return a list of all the URLs found

check-links

Check for broken links on a page

fetch-page

Fetch a web page and convert it to Markdown

find-patterns

Find all links that match a given pattern

generate-site-map

Generate a sitemap for a website

README

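MCP Webscan Server

A Model Context Protocol (MCP) server for web content scanning and analysis. It provides tools for fetching, analyzing, and extracting information from web pages.

Features

Page fetching: Convert web pages to Markdown for easier analysis
Link extraction: Extract and analyze links from web pages
Site crawling: Recursively crawl websites to discover content
Link checking: Identify broken links on web pages
Pattern matching: Find URLs that match specific patterns
Sitemap generation: Generate XML sitemaps for websites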

Installation

Installing via Smithery

To install Webscan for Claude Desktop automatically via Smithery:

npx -y @smithery/cli install mcp-server-webscan --client claude

Manual Installation

# Clone the repository
git clone <repository-url>
cd mcp-server-webscan

# Install dependencies
npm install

# Build the project
npm run build

Usage

Starting the Server

npm start

The server uses the stdio transport, which makes it compatible with MCP clients such as Claude Desktop.
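Over the stdio transport, client and server exchange JSON-RPC 2.0 messages on standard input and output, one message per line. The sketch below only shows the shape of the tools/call request a client would write to invoke fetch-page.

- The toolCallLine helper is hypothetical.
- A real session must first complete the MCP initialize handshake.
- In practice, an MCP client library (or Claude Desktop itself) handles this framing for you.

```typescript
// JSON-RPC 2.0 request for the MCP "tools/call" method.
interface ToolCallRequest {
  jsonrpc: "2.0";
  id: number;
  method: "tools/call";
  params: { name: string; arguments: Record<string, unknown> };
}

// Serialize a tool call as one newline-terminated line. The stdio transport delimits
// messages by newlines, so a message must not contain raw newlines. JSON.stringify without
// an indent argument never emits them: newline characters inside strings are escaped.
function toolCallLine(id: number, name: string, args: Record<string, unknown>): string {
  const request: ToolCallRequest = { jsonrpc: "2.0", id, method: "tools/call", params: { name, arguments: args } };
  return JSON.stringify(request) + "\n";
}

// The line a client would write to the server's stdin to fetch a page as Markdown.
const line = toolCallLine(1, "fetch-page", { url: "https://example.com" });
```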

Available Tools

fetch-page
- Fetches a web page and converts it to Markdown.
- Parameters:
  - url (required): URL of the page to fetch.
  - selector (optional): CSS selector for targeting specific content on the page.
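To illustrate what "converting to Markdown" means, here is a deliberately tiny converter that handles only headings, links and paragraphs. It is a toy written for this explanation, not the server's markdownConverter.ts, and the name toyHtmlToMarkdown is hypothetical. Real pages need a proper HTML parser. The selector parameter, which narrows the page to one element before conversion, is not modeled here.

```typescript
// Toy converter for a tiny HTML subset: headings, links, and paragraphs.
// Regexes are not an HTML parser; this only works on simple, well-formed markup.
function toyHtmlToMarkdown(html: string): string {
  return html
    // <h1> to <h6> become "#" to "######" headings on their own line
    .replace(/<h([1-6])(?:\s[^>]*)?>([\s\S]*?)<\/h\1>/gi, (_m: string, level: string, text: string) =>
      `\n${"#".repeat(Number(level))} ${text.trim()}\n`)
    // <a href="...">text</a> becomes [text](href)
    .replace(/<a\s[^>]*?href="([^"]*)"[^>]*>([\s\S]*?)<\/a>/gi, (_m: string, href: string, text: string) =>
      `[${text.trim()}](${href})`)
    // <p>...</p> becomes a paragraph separated by blank lines
    .replace(/<p(?:\s[^>]*)?>([\s\S]*?)<\/p>/gi, (_m: string, text: string) => `\n${text.trim()}\n`)
    // drop any remaining tags, then collapse runs of blank lines
    .replace(/<[^>]+>/g, "")
    .replace(/\n{3,}/g, "\n\n")
    .trim();
}
```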
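extract-links
- Extracts all links from a web page, along with their link text.
- Parameters:
  - url (required): URL of the page to analyze.
  - baseUrl (optional): Base URL used to filter links.
  - limit (optional, default: 100): Maximum number of links to return.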
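The baseUrl and limit parameters can be pictured as a filter followed by a cap. The sketch assumes that "filtering by base URL" means a simple prefix match on absolute URLs; the server may interpret baseUrl differently. The filterLinks name is hypothetical.

```typescript
// Keep links that start with baseUrl (an assumed prefix-match reading of the filter),
// then cap the list at `limit`, mirroring the documented default of 100.
function filterLinks(links: string[], baseUrl?: string, limit = 100): string[] {
  const kept = baseUrl === undefined ? links : links.filter((link) => link.startsWith(baseUrl));
  return kept.slice(0, limit);
}
```

Note that prefix matching is sensitive to the trailing slash. "https://example.com" also matches "https://example.com.evil.org/", while "https://example.com/" does not.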
crawl-site
- Recursively crawls a website up to the specified depth.
- Parameters:
  - url (required): Starting URL for the crawl.
  - maxDepth (optional, default: 2): Maximum crawl depth (0-5).
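A depth-limited crawl is naturally a breadth-first search in which each level of the frontier is one hop further from the start URL. The sketch below is one possible implementation, not the server's.

- The LinkFetcher type and the crawl function are hypothetical.
- Page fetching and link extraction are injected, so the logic can run offline.
- The crawl stays on the start URL's origin, which is an assumption.
- It enforces the documented 0-5 depth range.

```typescript
// Returns the raw hrefs found on one page. Injected so the crawler can run without
// network access; this seam is hypothetical, not the server's API.
type LinkFetcher = (url: string) => Promise<string[]>;

// Breadth-first crawl that visits pages up to maxDepth hops from startUrl, staying on
// the start URL's origin. Depth 0 returns only the start page; pages discovered at
// depth maxDepth are reported but not fetched.
async function crawl(startUrl: string, maxDepth: number, fetchLinks: LinkFetcher): Promise<string[]> {
  if (maxDepth < 0 || maxDepth > 5) throw new RangeError("maxDepth must be between 0 and 5");
  const start = new URL(startUrl);
  start.hash = "";
  const visited = new Set<string>([start.href]);
  let frontier = [start.href];
  for (let depth = 0; depth < maxDepth && frontier.length > 0; depth++) {
    const next: string[] = [];
    for (const page of frontier) {
      for (const href of await fetchLinks(page)) {
        let target: URL;
        try {
          target = new URL(href, page); // resolve relative links against the current page
        } catch {
          continue; // skip malformed hrefs
        }
        target.hash = ""; // "/a#top" and "/a" are the same page
        if (target.origin === start.origin && !visited.has(target.href)) {
          visited.add(target.href);
          next.push(target.href);
        }
      }
    }
    frontier = next;
  }
  return Array.from(visited); // insertion order is breadth-first discovery order
}
```

The Set of visited URLs prevents cycles, for example a page that links back to the home page, from being fetched twice.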
check-links
- Checks a page for broken links.
- Parameters:
  - url (required): URL of the page whose links should be checked.
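Checking links comes down to probing each URL and deciding what counts as "broken". The sketch assumes that a link is broken when the request fails outright or returns a 4xx/5xx status; the server's exact rule is not documented here. The probe is injected, and checkLinks, StatusProbe and isBroken are hypothetical names.

```typescript
// Result of probing one link.
interface LinkStatus {
  url: string;
  status: number | null; // HTTP status, or null if the request failed outright
  broken: boolean;
}

// Returns an HTTP status code, or throws on network failure.
// Injected so the logic can be tested offline.
type StatusProbe = (url: string) => Promise<number>;

// Assumed definition of "broken": a failed request or a 4xx/5xx status.
function isBroken(status: number | null): boolean {
  return status === null || status >= 400;
}

// Probe all links concurrently; Promise.all preserves input order in the results.
async function checkLinks(urls: string[], probe: StatusProbe): Promise<LinkStatus[]> {
  return Promise.all(
    urls.map(async (url) => {
      let status: number | null = null;
      try {
        status = await probe(url);
      } catch {
        // leave status as null: DNS failure, refused connection, timeout, ...
      }
      return { url, status, broken: isBroken(status) };
    }),
  );
}
```

A real probe could wrap the global fetch() available in Node.js 18+, for example by sending a HEAD request and returning response.status. Some servers reject HEAD, so falling back to GET may be necessary.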
find-patterns
- Finds URLs that match a specific pattern.
- Parameters:
  - url (required): URL of the page to search.
  - pattern (required): JavaScript-compatible regular expression used to match URLs.
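Because pattern is a JavaScript-compatible regular expression, matching is a matter of compiling it once with the RegExp constructor and filtering the page's URLs. This is a sketch, not the server's code. The findPatterns function is hypothetical, and the candidate URLs are assumed to be the links already extracted from the page.

```typescript
// Compile a user-supplied pattern once and return the URLs it matches.
// Invalid patterns are reported with a clear message instead of a bare SyntaxError.
function findPatterns(urls: string[], pattern: string): string[] {
  let regex: RegExp;
  try {
    regex = new RegExp(pattern); // no "g" flag: see the note below
  } catch (err) {
    throw new Error(`Invalid pattern "${pattern}": ${(err as Error).message}`);
  }
  return urls.filter((url) => regex.test(url));
}
```

The g flag is deliberately avoided. RegExp.prototype.test on a global regex advances lastIndex, so reusing one such regex across many URLs could silently skip matches.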
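generate-site-map
- Generates a simple XML sitemap by crawling the site.
- Parameters:
  - url (required): Root URL from which the sitemap crawl starts.
  - maxDepth (optional, default: 2): Maximum crawl depth used to discover URLs (0-5).
  - limit (optional, default: 1000): Maximum number of URLs to include in the sitemap.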
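A "simple XML sitemap" in the sitemaps.org format is a urlset element containing one url element with a loc child per page. URL values must be XML-entity-escaped. The sketch below builds that document from a list of crawled URLs. Only loc is emitted (no lastmod or changefreq), which is an assumption about what "simple" means here. The buildSitemap name is hypothetical.

```typescript
// Escape the five XML special characters, as the sitemaps.org protocol requires for URLs.
function escapeXml(value: string): string {
  return value
    .replace(/&/g, "&amp;") // must run first so later entities are not double-escaped
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;")
    .replace(/'/g, "&apos;");
}

// Build a minimal sitemap from crawled URLs, dropping duplicates and keeping at most
// `limit` entries (mirroring the documented default of 1000).
function buildSitemap(urls: string[], limit = 1000): string {
  const unique = Array.from(new Set(urls)).slice(0, limit);
  const entries = unique.map((url) => `  <url>\n    <loc>${escapeXml(url)}</loc>\n  </url>`);
  return [
    '<?xml version="1.0" encoding="UTF-8"?>',
    '<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">',
    ...entries,
    "</urlset>",
  ].join("\n");
}
```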

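Example Usage with Claude Desktop

Configure the server in your Claude Desktop settings (the env block is optional; LOG_LEVEL shows how to set the log level through an environment variable):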

{
  "mcpServers": {
    "webscan": {
      "command": "node",
      "args": ["path/to/mcp-server-webscan/build/index.js"],
      "env": {
        "NODE_ENV": "development",
        "LOG_LEVEL": "info"
      }
    }
  }
}
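The env block only passes LOG_LEVEL to the server process, and how the server interprets the value is not documented here. The sketch below is one plausible way to read the variable safely. It assumes a fixed set of level names and falls back to "info" for anything unrecognized. LOG_LEVELS and resolveLogLevel are hypothetical names; inside the server you would pass process.env.

```typescript
// Assumed set of level names; the server's logger may accept different ones.
const LOG_LEVELS = ["error", "warn", "info", "debug"] as const;
type LogLevel = (typeof LOG_LEVELS)[number];

// Read LOG_LEVEL from an environment map (e.g. process.env), case-insensitively,
// falling back to "info" when it is unset or not a recognized level.
function resolveLogLevel(env: Record<string, string | undefined>): LogLevel {
  const raw = (env.LOG_LEVEL ?? "").toLowerCase();
  const match = LOG_LEVELS.find((level) => level === raw);
  return match ?? "info";
}
```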

Then use the tools in your conversations:

Can you fetch the content from https://example.com and convert it to Markdown?

Development

Prerequisites

Node.js >= 18
npm

Project Structure (after refactoring)

mcp-server-webscan/
├── src/
│   ├── config/
│   │   └── ConfigurationManager.ts
│   ├── services/
│   │   ├── CheckLinksService.ts
│   │   ├── CrawlSiteService.ts
│   │   ├── ExtractLinksService.ts
│   │   ├── FetchPageService.ts
│   │   ├── FindPatternsService.ts
│   │   ├── GenerateSitemapService.ts
│   │   └── index.ts
│   ├── tools/
│   │   ├── checkLinksTool.ts
│   │   ├── checkLinksToolParams.ts
│   │   ├── crawlSiteTool.ts
│   │   ├── crawlSiteToolParams.ts
│   │   ├── extractLinksTool.ts
│   │   ├── extractLinksToolParams.ts
│   │   ├── fetchPageTool.ts
│   │   ├── fetchPageToolParams.ts
│   │   ├── findPatterns.ts
│   │   ├── findPatternsToolParams.ts
│   │   ├── generateSitemapTool.ts
│   │   ├── generateSitemapToolParams.ts
│   │   └── index.ts
│   ├── types/
│   │   ├── checkLinksTypes.ts
│   │   ├── crawlSiteTypes.ts
│   │   ├── extractLinksTypes.ts
│   │   ├── fetchPageTypes.ts
│   │   ├── findPatternsTypes.ts
│   │   ├── generateSitemapTypes.ts
│   │   └── index.ts
│   ├── utils/
│   │   ├── errors.ts
│   │   ├── index.ts
│   │   ├── logger.ts
│   │   ├── markdownConverter.ts
│   │   └── webUtils.ts
│   ├── initialize.ts
│   └── index.ts    # Main server entry point
├── build/          # Compiled JavaScript
├── node_modules/
├── .clinerules
├── .gitignore
├── Dockerfile
├── LICENSE
├── mcp-consistant-servers-guide.md
├── package.json
├── package-lock.json
├── README.md
├── RFC-2025-001-Refactor.md
├── smithery.yaml
└── tsconfig.json
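The tree separates services/ from tools/. One plausible reading of that split is sketched below under assumptions.

- A service holds the domain logic and uses no MCP types.
- A thin tool function validates arguments, calls the service, and wraps the outcome in an MCP tool result. That result is a content array of text items, with isError set on failure.

The class and function names echo the file names, but their contents here are hypothetical and not taken from the repository.

```typescript
// MCP tool results carry a list of content items; this sketch uses only text items.
interface TextContent { type: "text"; text: string }
interface ToolResult { content: TextContent[]; isError?: boolean }

// Hypothetical parameter type for extract-links (compare src/tools/extractLinksToolParams.ts).
interface ExtractLinksParams { url: string; limit?: number }

// Hypothetical service layer: domain logic only, no MCP types. Link discovery is injected.
class ExtractLinksService {
  private readonly getLinks: (url: string) => Promise<string[]>;

  constructor(getLinks: (url: string) => Promise<string[]>) {
    this.getLinks = getLinks;
  }

  async extract(params: ExtractLinksParams): Promise<string[]> {
    const links = await this.getLinks(params.url);
    return links.slice(0, params.limit ?? 100);
  }
}

// Hypothetical tool layer: validates input, delegates to the service, formats the MCP result.
async function extractLinksTool(service: ExtractLinksService, args: ExtractLinksParams): Promise<ToolResult> {
  try {
    if (!/^https?:\/\//.test(args.url)) throw new Error(`Unsupported URL: ${args.url}`);
    const links = await service.extract(args);
    return { content: [{ type: "text", text: JSON.stringify(links, null, 2) }] };
  } catch (err) {
    return { content: [{ type: "text", text: `Error: ${(err as Error).message}` }], isError: true };
  }
}
```

Keeping MCP types out of the service makes it testable with a fake getLinks. Only the tool layer then needs to change if the result format evolves.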