# Web-curl MCP Server

A powerful tool for fetching and extracting text content from web pages and APIs, supporting web scraping, REST API requests, and Google Custom Search integration.

Note: The Google Custom Search API is free with usage limits (e.g., 100 free queries per day; additional queries require payment). For full details on quotas, pricing, and restrictions, see the official documentation.

Developed by Rayss

🚀 Open Source Project
🛠️ Built with Node.js & TypeScript (Node.js v18+ required)

## 🎬 Demo Video

<video src="demo/demo.mp4" controls width="600"></video>

<details> <summary>Click to watch the demo directly in your browser</summary>

[Demo Video (MP4)](demo/demo.mp4)

</details>


## 📚 Table of Contents

- [Overview](#overview)
- [Features](#features)
- [Architecture](#architecture)
- [MCP Server Configuration Example](#mcp-server-configuration)
- [Installation](#installation)
- [Usage](#usage)
- [Configuration](#configuration)
- [Examples](#examples)
- [Troubleshooting](#troubleshooting)
- [Tips & Best Practices](#tips--best-practices)
- [Contributing & Issues](#contributing--issues)
- [License & Attribution](#license--attribution)

<a name="overview"></a>

## 📝 Overview

Web-curl is a powerful tool for fetching and extracting text content from web pages and APIs. Use it as a standalone CLI or as an MCP (Model Context Protocol) server. Web-curl leverages Puppeteer for robust web scraping and supports advanced features such as resource blocking, custom headers, authentication, and Google Custom Search.


<a name="features"></a>

## ✨ Features

- 🔎 Retrieve text content from any website.
- 🚫 Block unnecessary resources (images, stylesheets, fonts) for faster loading.
- ⏱️ Set navigation timeouts and content extraction limits.
- 💾 Output results to stdout or save to a file.
- 🖥️ Use as a CLI tool or as an MCP server.
- 🌐 Make REST API requests with custom methods, headers, and bodies.
- 🔍 Integrate Google Custom Search (requires API key and CX).
- 🤖 Smart command parsing (auto-detects URLs and search queries).
- 🛡️ Detailed error logging and robust error handling.

<a name="architecture"></a>

## 🏗️ Architecture

- **CLI & MCP Server:** `src/index.ts`
  Implements both the CLI entry point and the MCP server, exposing tools like `fetch_webpage`, `fetch_api`, `google_search`, and `smart_command` (see the sketch below).
- **Web Scraping:** Uses Puppeteer for headless browsing, resource blocking, and content extraction.
- **REST Client:** `src/rest-client.ts`
  Provides a flexible HTTP client for API requests, used by both CLI and MCP tools.
- **Configuration:** Managed via CLI options, environment variables, and tool arguments.
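
For orientation, the MCP side follows the standard tool-server pattern from the TypeScript MCP SDK. The sketch below is illustrative only and is not the contents of `src/index.ts`: it assumes the `@modelcontextprotocol/sdk` package, shows a single truncated tool declaration, and stubs out the handler body that would normally drive Puppeteer.

```typescript
#!/usr/bin/env node
import { Server } from "@modelcontextprotocol/sdk/server/index.js";
import { StdioServerTransport } from "@modelcontextprotocol/sdk/server/stdio.js";
import {
  CallToolRequestSchema,
  ListToolsRequestSchema,
} from "@modelcontextprotocol/sdk/types.js";

// Declare the server and its tool capability.
const server = new Server(
  { name: "web-curl", version: "0.0.0" },
  { capabilities: { tools: {} } }
);

// Advertise the available tools to the connected MCP client.
server.setRequestHandler(ListToolsRequestSchema, async () => ({
  tools: [
    {
      name: "fetch_webpage",
      description: "Retrieve text content from a web page",
      inputSchema: {
        type: "object",
        properties: {
          url: { type: "string" },
          blockResources: { type: "boolean" },
          timeout: { type: "number" },
          maxLength: { type: "number" },
        },
        required: ["url"],
      },
    },
    // fetch_api, google_search, and smart_command are declared the same way.
  ],
}));

// Route incoming tool calls to their implementations.
server.setRequestHandler(CallToolRequestSchema, async (request) => {
  const { name, arguments: args } = request.params;
  if (name === "fetch_webpage") {
    // Placeholder: the real handler launches Puppeteer, applies resource
    // blocking, and extracts the page text before returning it.
    const text = `Would fetch ${String(args?.url)} here`;
    return { content: [{ type: "text", text }] };
  }
  throw new Error(`Unknown tool: ${name}`);
});

// MCP clients communicate with the server over stdio.
async function main() {
  await server.connect(new StdioServerTransport());
}

main().catch((error) => {
  console.error(error);
  process.exit(1);
});
```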

<a name="installation"></a>

## ⚙️ MCP Server Configuration Example

To integrate web-curl as an MCP server, add the following configuration to your `mcp_settings.json`:

```json
{
  "mcpServers": {
    "web-curl": {
      "command": "node",
      "args": [
        "build/index.js"
      ],
      "disabled": false,
      "alwaysAllow": [
        "fetch_webpage",
        "fetch_api",
        "google_search",
        "smart_command"
      ],
      "env": {
        "APIKEY_GOOGLE_SEARCH": "YOUR_GOOGLE_API_KEY",
        "CX_GOOGLE_SEARCH": "YOUR_CX_ID"
      }
    }
  }
}
```

### 🔑 How to Obtain a Google API Key and CX ID

1. **Get a Google API Key:**
   - Go to the Google Cloud Console.
   - Create or select a project, then go to APIs & Services > Credentials.
   - Click Create Credentials > API key and copy the key.
2. **Get a Custom Search Engine (CX) ID:**
   - Go to Google Programmable Search Engine and create a new search engine.
   - Copy its Search engine ID; this is your CX value.
3. **Enable the Custom Search API:**
   - In the Google Cloud Console, go to APIs & Services > Library.
   - Search for Custom Search API and enable it.

Replace `YOUR_GOOGLE_API_KEY` and `YOUR_CX_ID` in the configuration above.
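
Before wiring the key and CX into the MCP configuration, you can sanity-check them against the Custom Search JSON API directly; the endpoint below is Google's public one, and the query string is just an example:

```bash
curl "https://www.googleapis.com/customsearch/v1?key=$APIKEY_GOOGLE_SEARCH&cx=$CX_GOOGLE_SEARCH&q=web+scraping"
```

A JSON response containing an `items` array indicates that the key, the CX, and the API enablement are all working.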


<a name="installation"></a>

## 🛠️ Installation

```bash
# Clone the repository
git clone <repository-url>
cd web-curl

# Install dependencies
npm install

# Build the project
npm run build
```

### Puppeteer installation notes

- **Windows:** Just run `npm install`.
- **Linux:** You must install extra dependencies for Chromium. Run:
  ```bash
  sudo apt-get install -y \
    ca-certificates fonts-liberation libappindicator3-1 libasound2 libatk-bridge2.0-0 \
    libatk1.0-0 libcups2 libdbus-1-3 libdrm2 libgbm1 libnspr4 libnss3 \
    libx11-xcb1 libxcomposite1 libxdamage1 libxrandr2 xdg-utils
  ```

For more details, see the Puppeteer troubleshooting guide.


---

<a name="usage"></a>
## 🚀 Usage

### CLI Usage

The CLI supports fetching and extracting text content from web pages.

```bash
# Basic usage
node build/index.js https://example.com

# With options
node build/index.js --timeout 30000 --no-block-resources https://example.com

# Save output to a file
node build/index.js -o result.json https://example.com
```

#### Command Line Options

- `--timeout <ms>`: Set the navigation timeout (default: 60000)
- `--no-block-resources`: Disable blocking of images, stylesheets, and fonts
- `-o <file>`: Output the result to the specified file

### MCP Server Usage

Web-curl can be run as an MCP server for integration with Roo Code or other MCP-compatible platforms.

#### Exposed Tools

- `fetch_webpage`: Retrieve text content from a web page
- `fetch_api`: Make REST API requests
- `google_search`: Search the web using the Google Custom Search API
- `smart_command`: Accepts natural language commands and auto-routes them to the appropriate tool

#### Running as MCP Server

```bash
npm run start
```

The server communicates via stdio and exposes the tools defined in `src/index.ts`.

#### MCP Tool Example (fetch_webpage)

```json
{
  "name": "fetch_webpage",
  "arguments": {
    "url": "https://example.com",
    "blockResources": true,
    "timeout": 60000,
    "maxLength": 10000
  }
}
```

#### Google Search Integration

Set the following environment variables for Google Custom Search:

- `APIKEY_GOOGLE_SEARCH`: Your Google API key
- `CX_GOOGLE_SEARCH`: Your Custom Search Engine ID
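
When launching the server manually (outside an MCP client that injects `env` values), one simple option is to export the variables in your shell first; the values below are placeholders:

```bash
export APIKEY_GOOGLE_SEARCH="YOUR_GOOGLE_API_KEY"
export CX_GOOGLE_SEARCH="YOUR_CX_ID"
npm run start
```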

<a name="configuration"></a>

## 🧩 Configuration

- **Resource Blocking:** Block images, stylesheets, and fonts for faster scraping.
- **Timeouts:** Set navigation and API request timeouts.
- **Custom Headers:** Pass custom HTTP headers for advanced scenarios.
- **Authentication:** Supports HTTP Basic Auth via username/password (illustrative example below).
- **Environment Variables:** Used for Google Search API integration.
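
The exact argument names accepted by `fetch_api` are defined by its schema in `src/index.ts`. Purely as an illustration, a Basic Auth request could look roughly like the following; the `username` and `password` field names are assumptions, so verify them against the schema:

```json
{
  "name": "fetch_api",
  "arguments": {
    "url": "https://api.example.com/protected/resource",
    "method": "GET",
    "username": "my-user",
    "password": "my-password"
  }
}
```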

<a name="examples"></a>

## 💡 Examples

<details> <summary>Fetch Webpage Content</summary>

```json
{
  "name": "fetch_webpage",
  "arguments": {
    "url": "https://en.wikipedia.org/wiki/Web_scraping",
    "blockResources": true,
    "maxLength": 5000
  }
}
```

</details>

<details> <summary>Make a REST API Request</summary>

```json
{
  "name": "fetch_api",
  "arguments": {
    "url": "https://api.github.com/repos/nodejs/node",
    "method": "GET",
    "headers": {
      "Accept": "application/vnd.github.v3+json"
    }
  }
}
```

</details>

<details> <summary>Google Search</summary>

```json
{
  "name": "google_search",
  "arguments": {
    "query": "web scraping best practices",
    "num": 5
  }
}
```

</details>
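
<details> <summary>Smart Command (natural language)</summary>

`smart_command` routes a free-form instruction to the most appropriate tool. A minimal illustrative call might look like this; the `command` argument name is an assumption, so check the tool schema in `src/index.ts`:

```json
{
  "name": "smart_command",
  "arguments": {
    "command": "fetch https://example.com and return the main text"
  }
}
```

</details>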


<a name="troubleshooting"></a>

## 🛠️ Troubleshooting

- **Timeout Errors:** Increase the `timeout` parameter if requests are timing out.
- **Blocked Content:** If content is missing, try disabling resource blocking or adjusting `resourceTypesToBlock`.
- **Google Search Fails:** Ensure `APIKEY_GOOGLE_SEARCH` and `CX_GOOGLE_SEARCH` are set in your environment.
- **Binary/Unknown Content:** Non-text responses are base64-encoded.
- **Error Logs:** Check the `logs/error-log.txt` file for detailed error messages.

<a name="tips--best-practices"></a>

## 🧠 Tips & Best Practices

<details> <summary>Click for advanced tips</summary>

- Use resource blocking for faster and lighter scraping unless you need images or styles.
- For large pages, use `maxLength` and `startIndex` to paginate content extraction (see the example below).
- Always validate your tool arguments to avoid errors.
- Secure your API keys and sensitive data using environment variables.
- Review the MCP tool schemas in `src/index.ts` for all available options.
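
For instance, a follow-up `fetch_webpage` call that continues where the first 5000 characters left off could look like this (assuming `startIndex` is the character offset exposed by the tool schema):

```json
{
  "name": "fetch_webpage",
  "arguments": {
    "url": "https://en.wikipedia.org/wiki/Web_scraping",
    "maxLength": 5000,
    "startIndex": 5000
  }
}
```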

</details>


<a name="contributing--issues"></a>

## 🤝 Contributing & Issues

Contributions are welcome! If you want to contribute, fork this repository and submit a pull request.
If you find any issues or have suggestions, please open an issue on the repository page.


<a name="license--attribution"></a>

## 📄 License & Attribution

This project was developed by Rayss.
For questions, improvements, or contributions, please contact the author or open an issue in the repository.


Note: Google Search API is free with usage limits. For details, see: Google Custom Search API Overview
