Website to Markdown MCP Server

Website to Markdown MCP Server

Fetches website content and converts it to Markdown format with AI-powered content cleanup, ad removal, and full OpenAPI/Swagger specification support for easy processing by AI assistants.

Category
访问服务器

README

🌐 Website to Markdown MCP Server

<div align="center">

Language: English | 繁體中文

</div>

A powerful Model Context Protocol (MCP) server designed for fetching website content and converting it to Markdown format, making it easier for AI to understand and process website information.

✨ Key Features

<div align="center">

🌟 Enhanced Processing 📊 OpenAPI Support ⚙️ Smart Analysis 🎯 Advanced Extraction
AI-powered content cleanup OpenAPI 3.x/Swagger 2.0 Reading time calculation Main content detection
Auto ad removal Professional validation Word count statistics Language detection
Content summarization Structured API parsing Smart retry mechanism Multi-format support

</div>


🆕 What's New in v1.2.0

<div align="center">

🚀 Major Enhancements

</div>

Feature Status Description
🧠 Enhanced Content Processor AI-powered content cleaning and extraction
📊 Smart Analytics Word count, reading time, content summary
🌍 Language Detection Automatic language identification
🎯 Intelligent Retry Smart retry mechanism with exponential backoff
🔍 Stealth Browser Anti-detection browsing capabilities
Rate Limiting Built-in rate limiting and concurrency control
🧹 Content Cleanup Remove ads, navigation, and irrelevant content
📝 Enhanced Markdown Support for strikethrough, underline, highlights

🚀 Quick Start

🎯 Method 1: NPX Installation (🌟 Recommended)

💡 Easiest way: No local installation needed!

Step 1: Create Configuration File 📄

Create a my-websites.json file:

{
  "websites": [
    {
      "name": "your_website",
      "url": "https://your-website.com",
      "description": "Your Project Website"
    },
    {
      "name": "api_docs",
      "url": "https://api.example.com/openapi.json",
      "description": "Your API Specification"
    }
  ]
}

Step 2: Configure MCP Server ⚙️

Add to .cursor/mcp.json:

{
  "mcpServers": {
    "website-to-markdown": {
      "command": "npx",
      "args": ["-y", "website-to-markdown-mcp"],
      "disabled": false,
      "env": {
        "WEBSITES_CONFIG_PATH": "./my-websites.json"
      }
    }
  }
}

Step 3: Restart and Test 🔄

  1. Restart Cursor
  2. Open Chat and use Agent mode
  3. Test command: Please list all configured websites

<div align="center">

🎉 Done! No installation required!

</div>


🎯 Method 2: Local Installation

💡 Best Practice: Use this method for development or customization!

Step 1: Clone and Build

git clone https://github.com/your-username/website-to-markdown-mcp.git
cd website-to-markdown-mcp
npm install
npm run build

Step 2: Configure MCP Server

Add to .cursor/mcp.json:

{
  "mcpServers": {
    "website-to-markdown": {
      "command": "cmd",
      "args": ["/c", "node", "./website-to-markdown-mcp/dist/index.js"],
      "disabled": false,
      "env": {
        "WEBSITES_CONFIG_PATH": "./my-websites.json"
      }
    }
  }
}

🔥 Enhanced Output Features

📊 Rich Content Analysis

Every fetched content now includes:

  • 📝 Content Summary: AI-generated summary of the main content
  • ⏱️ Reading Time: Estimated reading time based on content length
  • 🔢 Word Count: Accurate word count for both English and Chinese
  • 🌍 Language Detection: Automatic language identification
  • 🎯 Content Quality Score: Assessment of content relevance

📋 Enhanced Markdown Output

# 🚀 Example Website

**Source**: https://example.com
**Website**: example_site - Example Website
**📊 Reading Time**: 5 minutes
**🔢 Word Count**: 1,250 words
**🌍 Language**: English
**📝 Summary**: This article discusses the latest developments in web technology...

---

[Enhanced Markdown content with better formatting...]

🆕 Complete OpenAPI/Swagger Support

<div align="center">

🔥 Professional API Documentation

</div>

Feature OpenAPI 3.x Swagger 2.0 Description
🔍 Auto Detection Support JSON/YAML formats
Professional Validation Using @readme/openapi-parser
📋 Structured Parsing Endpoints, parameters, responses
🔗 Reference Resolution Auto handle $ref references
📊 Smart Summary Generate API overview
📝 Formatted Output Readable Markdown

🌟 Pre-configured Example Websites

{
  "websites": [
    {
      "name": "petstore_openapi",
      "url": "https://petstore3.swagger.io/api/v3/openapi.json",
      "description": "🐕 Swagger Petstore OpenAPI 3.0 Spec (Demo)"
    },
    {
      "name": "petstore_swagger",
      "url": "https://petstore.swagger.io/v2/swagger.json",
      "description": "🐱 Swagger Petstore Swagger 2.0 Spec (Demo)"
    },
    {
      "name": "github_api",
      "url": "https://raw.githubusercontent.com/github/rest-api-description/main/descriptions/api.github.com/api.github.com.json",
      "description": "🐙 GitHub REST API OpenAPI Spec"
    }
  ]
}

📦 Installation & Setup

🛠️ System Requirements

  • Node.js 20.18.1+ (Recommended: v22.15.0 LTS)
  • npm 10.0.0+ or yarn
  • Cursor Editor

⚠️ Important: Some dependencies require Node.js v20.18.1 or higher. Please update your Node.js version if you encounter engine compatibility warnings.

⚡ NPM Package Installation

# Global installation
npm install -g website-to-markdown-mcp

# Or use directly with npx (recommended)
npx website-to-markdown-mcp

🔧 Development Setup

# 1. Clone repository
git clone https://github.com/your-username/website-to-markdown-mcp.git
cd website-to-markdown-mcp

# 2. Install dependencies
npm install

# 3. Build project
npm run build

🎛️ Advanced Configuration Options

<div align="center">

Configuration Priority Order

</div>

graph TD
    A[🔍 Check Environment Variable<br/>WEBSITES_CONFIG_PATH] --> B{File exists?}
    B -->|Yes| C[✅ Load External Config File]
    B -->|No| D[🔍 Check Environment Variable<br/>WEBSITES_CONFIG]
    D --> E{Valid JSON?}
    E -->|Yes| F[✅ Load Embedded Config]
    E -->|No| G[🔍 Check config.json]
    G --> H{File exists?}
    H -->|Yes| I[✅ Load Local Config]
    H -->|No| J[🔧 Use Default Config]

🎨 Configuration Method Details

📋 Method 1: External Configuration File (🌟 Recommended)

💡 Advantages: Easy to edit, syntax highlighting, version control friendly

<details> <summary><b>🔧 Detailed Setup Steps</b></summary>

  1. Create Configuration File

    # Can be placed anywhere
    touch my-api-configs.json
    
  2. Edit Configuration Content

    {
      "websites": [
        {
          "name": "my_docs",
          "url": "https://docs.example.com",
          "description": "📚 My Documentation Website"
        }
      ]
    }
    
  3. Set Environment Variable

    {
      "env": {
        "WEBSITES_CONFIG_PATH": "./my-api-configs.json"
      }
    }
    

</details>

📋 Method 2: Embedded JSON (Backward Compatible)

<details> <summary><b>🔧 Configuration Example</b></summary>

{
  "mcpServers": {
    "website-to-markdown": {
      "command": "cmd",
      "args": ["/c", "node", "./website-to-markdown-mcp/dist/index.js"],
      "disabled": false,
      "env": {
        "WEBSITES_CONFIG": "{\"websites\":[{\"name\":\"example\",\"url\":\"https://example.com\",\"description\":\"Example Website\"}]}"
      }
    }
  }
}

</details>

📋 Method 3: Local config.json

<details> <summary><b>🔧 Local Configuration</b></summary>

Directly edit config.json in the project root directory:

{
  "websites": [
    {
      "name": "local_site",
      "url": "https://local.example.com",
      "description": "🏠 Local Test Website"
    }
  ]
}

</details>


🔧 Available Tools

🌐 General Tools

Tool Name Function Parameters Example
fetch_website Fetch any website url: Website URL Fetch OpenAPI spec files
list_configured_websites List configured websites None View all available websites

🎯 Dedicated Tools

Each configured website automatically generates corresponding dedicated tools:

  • fetch_petstore_openapi - Fetch Petstore OpenAPI 3.0 spec
  • fetch_petstore_swagger - Fetch Petstore Swagger 2.0 spec
  • fetch_github_api - Fetch GitHub API spec
  • fetch_tailwind_css - Fetch Tailwind CSS documentation

📊 Enhanced Output Format Examples

🌐 General Website Content with Analytics

# Website Title

**Source**: https://example.com
**Website**: example_site - Example Website
**📊 Reading Time**: 3 minutes
**🔢 Word Count**: 650 words
**🌍 Language**: English
**📝 Summary**: This article provides a comprehensive overview of modern web development practices, covering frontend frameworks, backend technologies, and deployment strategies.

---

[Enhanced cleaned Markdown content with ads removed and main content extracted...]

📋 OpenAPI 3.x Specification File

# 🚀 Example API (v2.1.0)

**Source**: https://api.example.com/openapi.json
**OpenAPI Version**: 3.0.3
**Validation Status**: ✅ Valid
**📊 Processing Time**: 1.2 seconds
**🔢 Endpoints**: 25 endpoints
**🌍 Server Locations**: 3 servers

---

## 📋 API Basic Information

- **API Name**: Example API
- **Version**: 2.1.0
- **OpenAPI Version**: 3.0.3
- **Description**: A powerful example API for modern applications

## 🌐 Servers

1. **https://api.example.com**
   - 🏢 Production server
2. **https://staging-api.example.com**
   - 🧪 Testing server

## 🛠️ API Endpoints

Total of **25** endpoints:

### 👥 `/users`
- **GET**: Get user list
- **POST**: Create new user

### 🔍 `/users/{id}`
- **GET**: Get specific user
- **PUT**: Update user information
- **DELETE**: Delete user

## 🧩 Components

- **Schemas**: 12 data models
- **Parameters**: 8 reusable parameters  
- **Responses**: 15 reusable responses
- **Security Schemes**: 3 security mechanisms

🎯 Usage Examples

💻 Basic Usage

Please fetch the content from https://docs.example.com and convert to markdown

🔍 OpenAPI Specification Fetching

Please use the fetch_petstore_openapi tool to fetch Petstore OpenAPI specification

📚 Documentation Website Fetching

Please fetch React official documentation content

🚨 Troubleshooting

📋 Complete Troubleshooting Guide: See TROUBLESHOOTING.md for detailed solutions to common issues.

❓ Quick Solutions

<details> <summary><b>🔧 Node.js Version Issues</b></summary>

Error: npm WARN EBADENGINE Unsupported engine

</details>

<details> <summary><b>🌐 Module Not Found Issues</b></summary>

Error: Cannot find module './db.json'

  • Solution 1: Clear npm cache: npm cache clean --force
  • Solution 2: Update Node.js version
  • Solution 3: Use local installation instead of npx

</details>

<details> <summary><b>⚙️ Configuration Issues</b></summary>

Q: Configuration changes not taking effect?

  • ✅ Confirm JSON format is correct
  • ✅ Restart Cursor
  • ✅ Check environment variable names

Q: JSON format errors?

  • 🛠️ Use JSON Validator
  • 🛠️ Confirm using double quotes
  • 🛠️ Check for extra commas

</details>

🔍 Debug Mode

Detailed logs are output to stderr at startup:

# View debug messages
npm run dev 2> debug.log

📈 Performance & Optimization

⚡ Performance Features

  • 🚀 Smart Retry: Intelligent retry with exponential backoff
  • 💾 Rate Limiting: Built-in rate limiting to prevent overload
  • 🎯 Content Filtering: Remove irrelevant content for faster processing
  • 🧹 Ad Removal: Automatic ad and popup removal
  • 📊 Stealth Mode: Anti-detection browsing capabilities

🛡️ Security Considerations

  • 🔒 HTTPS websites only (recommended)
  • 🛠️ Auto filter malicious scripts
  • 📝 Limit output content length
  • 🔐 Stealth browsing to avoid detection

📦 Dependencies

<div align="center">

Package Version Purpose
@modelcontextprotocol/sdk ^1.0.0 MCP Core Framework
@readme/openapi-parser ^4.1.0 Professional OpenAPI Parsing
axios ^1.6.0 HTTP Request Handling
cheerio ^1.0.0 HTML Parsing Engine
turndown ^7.1.2 HTML to Markdown
yaml ^2.8.0 YAML Format Support
zod ^3.22.0 Data Validation Framework
playwright ^1.40.0 Browser automation

</div>


📝 Changelog

🎉 v1.2.0 (Latest)

<div align="center">

🚀 Major Feature Updates

</div>

  • Added Enhanced content processing with AI-powered cleanup
  • Added Smart analytics: word count, reading time, content summary
  • Added Language detection and multi-language support
  • Added Stealth browser capabilities for anti-detection
  • Added Built-in rate limiting and retry mechanisms
  • Added Advanced content filtering and ad removal
  • 🔧 Enhanced Markdown processing with more HTML element support
  • 📊 Improved Output format with rich metadata
  • 🎯 Fixed Various technical issues and dependencies

🎯 v1.1.0 (Previous)

<div align="center">

🚀 Major Feature Updates

</div>

  • Added Full OpenAPI 3.x/Swagger 2.0 support
  • Added JSON/YAML format auto-detection
  • Added Professional-grade spec validation and reference resolution
  • Added Version auto-adaptation mechanism
  • Added Structured API documentation summary
  • 🔧 Pre-configured Multiple OpenAPI/Swagger examples
  • 📦 Added NPM package distribution with npx support
  • 🎯 Enhanced Installation methods for better user experience

🎯 v1.0.0 (Stable)

  • 🎉 Initial Release
  • 🌐 Basic Functions Website content fetching
  • 📝 Core Functions Markdown conversion
  • ⚙️ Configuration Support Multi-website management

🤝 Contributing

💡 How to Contribute

  1. 🍴 Fork this project
  2. 🌟 Create feature branch (git checkout -b feature/AmazingFeature)
  3. 📝 Commit changes (git commit -m 'Add some AmazingFeature')
  4. 📤 Push to branch (git push origin feature/AmazingFeature)
  5. 🔄 Open Pull Request

🐛 Issue Reporting

Report issues on the Issues page, please include:

  • 🔍 Issue Description
  • 🔄 Reproduction Steps
  • 💻 Environment Information
  • 📸 Screenshots or Logs

📄 License

This project is licensed under the MIT License - see the LICENSE file for details.


<div align="center">

🌟 If this project helps you, please give it a Star!

💬 Have questions or suggestions? Feel free to open an Issue!


Made by Sun ❤️ for the Developer Community

</div>

推荐服务器

Baidu Map

Baidu Map

百度地图核心API现已全面兼容MCP协议,是国内首家兼容MCP协议的地图服务商。

官方
精选
JavaScript
Playwright MCP Server

Playwright MCP Server

一个模型上下文协议服务器,它使大型语言模型能够通过结构化的可访问性快照与网页进行交互,而无需视觉模型或屏幕截图。

官方
精选
TypeScript
Magic Component Platform (MCP)

Magic Component Platform (MCP)

一个由人工智能驱动的工具,可以从自然语言描述生成现代化的用户界面组件,并与流行的集成开发环境(IDE)集成,从而简化用户界面开发流程。

官方
精选
本地
TypeScript
Audiense Insights MCP Server

Audiense Insights MCP Server

通过模型上下文协议启用与 Audiense Insights 账户的交互,从而促进营销洞察和受众数据的提取和分析,包括人口统计信息、行为和影响者互动。

官方
精选
本地
TypeScript
VeyraX

VeyraX

一个单一的 MCP 工具,连接你所有喜爱的工具:Gmail、日历以及其他 40 多个工具。

官方
精选
本地
graphlit-mcp-server

graphlit-mcp-server

模型上下文协议 (MCP) 服务器实现了 MCP 客户端与 Graphlit 服务之间的集成。 除了网络爬取之外,还可以将任何内容(从 Slack 到 Gmail 再到播客订阅源)导入到 Graphlit 项目中,然后从 MCP 客户端检索相关内容。

官方
精选
TypeScript
Kagi MCP Server

Kagi MCP Server

一个 MCP 服务器,集成了 Kagi 搜索功能和 Claude AI,使 Claude 能够在回答需要最新信息的问题时执行实时网络搜索。

官方
精选
Python
e2b-mcp-server

e2b-mcp-server

使用 MCP 通过 e2b 运行代码。

官方
精选
Neon MCP Server

Neon MCP Server

用于与 Neon 管理 API 和数据库交互的 MCP 服务器

官方
精选
Exa MCP Server

Exa MCP Server

模型上下文协议(MCP)服务器允许像 Claude 这样的 AI 助手使用 Exa AI 搜索 API 进行网络搜索。这种设置允许 AI 模型以安全和受控的方式获取实时的网络信息。

官方
精选