# mcp-for-docs

Automatically crawls documentation websites, converts them to organized markdown files, and generates condensed cheat sheets. Intelligently categorizes content into tools/APIs categories and provides local-first access to downloaded documentation.
An MCP (Model Context Protocol) server that automatically downloads and converts documentation from various sources into organized markdown files.
## Overview

mcp-for-docs crawls documentation websites, converts their content to markdown, and organizes it in a structured directory system. It can also generate condensed cheat sheets from the downloaded documentation.

## Features
- 🕷️ **Smart Documentation Crawler**: Automatically crawls documentation sites with configurable depth
- 📝 **HTML to Markdown Conversion**: Preserves code blocks, tables, and formatting
- 📁 **Automatic Categorization**: Intelligently organizes docs into tools/APIs categories
- 📄 **Cheat Sheet Generator**: Creates condensed reference guides from documentation
- 🔍 **Smart Discovery System**: Automatically detects existing documentation before crawling
- 🚀 **Local-First**: Uses existing downloaded docs when available
- ⚡ **Rate Limiting**: Respects server limits and robots.txt
- ✅ **User Confirmation**: Prevents accidental regeneration of existing content
- ⚙️ **Comprehensive Configuration**: JSON-based configuration with environment variable overrides
- 🧪 **Test Suite**: 94 tests covering core functionality
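As an illustration of the rate-limiting idea, a crawler that honors an N-requests-per-second limit only needs to pace its requests. The sketch below is hypothetical and not the project's actual implementation; `interRequestDelayMs` and `politeFetchAll` are illustrative names.

```typescript
// Hypothetical sketch of per-request rate limiting (not the project's actual code).
// Given a rate limit in requests per second, compute the minimum delay between
// consecutive requests and wait that long before each fetch.

function interRequestDelayMs(requestsPerSecond: number): number {
  if (requestsPerSecond <= 0) throw new Error("rate limit must be positive");
  return 1000 / requestsPerSecond;
}

const sleep = (ms: number) => new Promise<void>((resolve) => setTimeout(resolve, ms));

async function politeFetchAll(urls: string[], requestsPerSecond = 2): Promise<void> {
  const delay = interRequestDelayMs(requestsPerSecond);
  for (const url of urls) {
    // A real crawler would fetch(url) here; this sketch only shows the pacing.
    await sleep(delay);
  }
}
```

At the default of 2 requests per second, this leaves 500 ms between page loads, which is what "respects server limits" amounts to in practice.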
## Installation

### Prerequisites

- Node.js 18+
- npm or yarn
- Claude Desktop or Claude Code CLI
### Setup

1. Clone the repository:

   ```bash
   git clone https://github.com/shayonpal/mcp-for-docs.git
   cd mcp-for-docs
   ```

2. Install dependencies:

   ```bash
   npm install
   ```

3. Build the project:

   ```bash
   npm run build
   ```

4. Add to your MCP configuration:
For Claude Desktop (`~/Library/Application Support/Claude/claude_desktop_config.json`):

```json
{
  "mcpServers": {
    "mcp-for-docs": {
      "command": "node",
      "args": ["/path/to/mcp-for-docs/dist/index.js"],
      "env": {}
    }
  }
}
```
For Claude Code CLI (`~/.claude.json`):

```json
{
  "mcpServers": {
    "mcp-for-docs": {
      "command": "node",
      "args": ["/path/to/mcp-for-docs/dist/index.js"],
      "env": {}
    }
  }
}
```
## Usage

### Crawling Documentation

To download documentation from a website:

```javascript
await crawl_documentation({
  url: "https://docs.n8n.io/",
  max_depth: 3,         // Optional, defaults to 3
  force_refresh: false  // Optional; set to true to regenerate existing docs
});
```
The tool will first check for existing documentation and show you what's already available. To regenerate existing content, use force_refresh: true.
The documentation will be saved to:

- Tools: `/Users/shayon/DevProjects/~meta/docs/tools/[tool-name]/`
- APIs: `/Users/shayon/DevProjects/~meta/docs/apis/[api-name]/`
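The mapping from URL to save path can be pictured as follows. This is a hypothetical sketch, not the project's actual storage logic; `toolNameFromUrl` and `savePath` are illustrative names.

```typescript
// Hypothetical sketch of how a docs URL could map to a save path.
// "category" would come from the categorizer (tools or apis); the name is
// derived from the hostname (e.g. docs.n8n.io -> n8n).

function toolNameFromUrl(url: string): string {
  const host = new URL(url).hostname;            // e.g. "docs.n8n.io"
  const parts = host.split(".");
  // Drop generic labels like "docs"/"www"; keep the first meaningful label.
  const meaningful = parts.filter((p) => p !== "docs" && p !== "www");
  return meaningful[0];
}

function savePath(basePath: string, category: "tools" | "apis", url: string): string {
  return `${basePath}/${category}/${toolNameFromUrl(url)}/`;
}

// savePath("/Users/shayon/DevProjects/~meta/docs", "tools", "https://docs.n8n.io/")
// -> "/Users/shayon/DevProjects/~meta/docs/tools/n8n/"
```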
### Generating Cheat Sheets

To create a cheat sheet from documentation:

```javascript
await generate_cheatsheet({
  url: "https://docs.anthropic.com/",
  use_local: true,         // Use local files if available (default)
  force_regenerate: false  // Optional; set to true to regenerate existing cheat sheets
});
```
Cheat sheets are saved to `/Users/shayon/DevProjects/~meta/docs/cheatsheets/`.

The tool will check for existing cheat sheets and show you what's already available. To regenerate existing content, use `force_regenerate: true`.
### Listing Downloaded Documentation

To see what documentation is available locally:

```javascript
await list_documentation({
  category: "all",     // Options: "tools", "apis", "all"
  include_stats: true
});
```
## Supported Documentation Sites
The server has been tested with:
- n8n documentation
- Anthropic API docs
- Obsidian Tasks plugin docs
- Apple Swift documentation
Most documentation sites following standard patterns should work automatically.
## Recent Updates
- **Configuration System (v0.4.0)**: Added comprehensive JSON-based configuration with environment variable support
- **Smart Discovery**: Automatically finds and reports existing documentation before crawling
- **Improved Conversion**: Fixed HTML-to-Markdown issues, including table formatting and inline code preservation
- **Dynamic Categorization**: Intelligent detection of tools vs APIs based on URL patterns and content analysis
- **Test Coverage**: 94 tests passing, with comprehensive unit and integration testing
For detailed changes, see CHANGELOG.md.
## Configuration

### Initial Setup
1. Copy the example configuration:

   ```bash
   cp config.example.json config.json
   ```

2. Edit `config.json` and update the `docsBasePath` for your machine:

   ```json
   {
     "docsBasePath": "/Users/yourusername/path/to/docs"
   }
   ```
**Important**: The `config.json` file is tracked in git. When you clone this repository on a different machine, you'll need to update the `docsBasePath` to match that machine's directory structure.
### How Documentation Organization Works

The tool automatically organizes documentation based on content analysis:

1. You provide a URL when calling the tool (e.g., `https://docs.n8n.io`)
2. The categorizer analyzes the content and determines whether it belongs in:
   - `tools/`: software tools, applications, plugins
   - `apis/`: API references, SDK documentation
3. Documentation is saved to `{docsBasePath}/{category}/{tool-name}/`

For example:

- `https://docs.n8n.io` → `/Users/shayon/DevProjects/~meta/docs/tools/n8n/`
- `https://docs.anthropic.com` → `/Users/shayon/DevProjects/~meta/docs/apis/anthropic/`

This happens automatically; you don't need to configure anything per-site!
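To make the idea concrete, here is a minimal sketch of the kind of URL-based heuristic such a categorizer might start from. It is hypothetical: the real categorizer also analyzes page content, not just the URL, and `categorizeByUrl` is an illustrative name.

```typescript
// Hypothetical URL-based heuristic for tools-vs-apis categorization.
// The project's real categorizer combines URL patterns with content analysis.

type Category = "tools" | "apis";

function categorizeByUrl(url: string): Category {
  const u = url.toLowerCase();
  // Keywords that suggest API/SDK reference material rather than a tool manual.
  const apiSignals = ["api", "sdk", "reference", "developer"];
  return apiSignals.some((signal) => u.includes(signal)) ? "apis" : "tools";
}
```

Note that a URL like `https://docs.anthropic.com` carries no API keyword at all, which is exactly why content analysis is needed on top of URL patterns to classify it under `apis/`.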
### Configuration Options

| Setting | Description | Default |
|---|---|---|
| `docsBasePath` | Where to store all documentation | Required (no default) |
| `crawler.defaultMaxDepth` | How many levels deep to crawl | `3` |
| `crawler.defaultRateLimit` | Requests per second | `2` |
| `crawler.pageTimeout` | Page load timeout (ms) | `30000` |
| `crawler.userAgent` | Browser identification | `MCP-for-docs/1.0` |
| `cheatsheet.maxLength` | Max characters in a cheat sheet | `10000` |
| `cheatsheet.filenameSuffix` | Suffix appended to cheat sheet filenames | `-Cheatsheet.md` |
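Putting the options together, a complete `config.json` might look like the following. The values are the defaults from the table above; the exact nesting of the file is an assumption based on the dotted option names.

```json
{
  "docsBasePath": "/Users/yourusername/path/to/docs",
  "crawler": {
    "defaultMaxDepth": 3,
    "defaultRateLimit": 2,
    "pageTimeout": 30000,
    "userAgent": "MCP-for-docs/1.0"
  },
  "cheatsheet": {
    "maxLength": 10000,
    "filenameSuffix": "-Cheatsheet.md"
  }
}
```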
### Multi-Machine Setup

Since `config.json` is tracked in git:

1. **First machine**: Set your `docsBasePath` and commit
2. **Other machines**: After cloning, update `docsBasePath` to match that machine
3. Use an environment variable to override without changing the file:

   ```bash
   export DOCS_BASE_PATH="/different/path/on/this/machine"
   ```
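The override order can be sketched as below. This is a hypothetical helper, not the project's actual config loader; only the variable name `DOCS_BASE_PATH` comes from the example above.

```typescript
// Hypothetical sketch: the environment variable, when set, wins over
// the docsBasePath committed in config.json.

interface FileConfig {
  docsBasePath: string;
}

function resolveDocsBasePath(
  fileConfig: FileConfig,
  env: Record<string, string | undefined>,
): string {
  return env.DOCS_BASE_PATH ?? fileConfig.docsBasePath;
}

// In a real loader you would call: resolveDocsBasePath(parsedConfig, process.env)
```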
## Development

```bash
# Install dependencies
npm install

# Run in development mode
npm run dev

# Run tests
npm test

# Build for production
npm run build

# Lint code
npm run lint
```
## Architecture

- **Crawler**: Uses Playwright for JavaScript-rendered pages
- **Parser**: Extracts content using configurable selectors
- **Converter**: Turndown library with custom rules for markdown
- **Categorizer**: Smart detection of tools vs APIs
- **Storage**: Organized file system structure
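Conceptually, these components form a pipeline. The sketch below is hypothetical: each stage mirrors a component from the list, with the real Playwright and Turndown calls replaced by trivial stubs.

```typescript
// Hypothetical end-to-end pipeline sketch (stubs, not the project's real code).

interface Page { url: string; html: string; }
interface Doc  { url: string; markdown: string; category: "tools" | "apis"; }

// Crawler stub: a real crawler would render pages with Playwright.
const crawl = (url: string): Page[] => [{ url, html: "<h1>Example</h1>" }];

// Parser stub: a real parser would extract content via configurable selectors.
const parse = (page: Page): string => page.html;

// Converter stub: real conversion uses Turndown with custom rules.
const convert = (html: string): string => html.replace(/<h1>(.*?)<\/h1>/, "# $1");

// Categorizer stub: real detection combines URL patterns and content analysis.
const categorize = (url: string): "tools" | "apis" =>
  url.includes("api") ? "apis" : "tools";

function processSite(url: string): Doc[] {
  return crawl(url).map((page) => ({
    url: page.url,
    markdown: convert(parse(page)),
    category: categorize(url),
  }));
}
```

The storage component would then write each `Doc` under `{docsBasePath}/{category}/{tool-name}/`.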
## Known Issues

- **URL Structure Preservation** (#15): Currently flattens the URL structure when saving docs
- **Large Documentation Sites** (#14): No document limit for very large sites
- **GitHub Repository Docs** (#9): Specialized crawler for GitHub repos not yet implemented

See all open issues for the complete roadmap.
## Contributing

1. Fork the repository
2. Create a feature branch
3. Make your changes
4. Update `CHANGELOG.md`
5. Submit a pull request
## License

This project is licensed under the GPL-3.0 License; see the LICENSE file for details.
## Acknowledgments
- Built with the Model Context Protocol SDK
- Uses Playwright for web scraping
- Markdown conversion powered by Turndown