DocNav-MCP
DocNav MCP Server
DocNav is a Model Context Protocol (MCP) server that lets LLM agents read, analyze, and manage long documents by navigating their structure, much as a human reader would.
Features
- Document Navigation: Navigate through document sections, headings, and content structure
- Content Extraction: Extract and summarize specific document sections
- Search & Query: Find specific content within documents using intelligent search
- Multi-format Support: Supports Markdown (.md) files; the roadmap marks PDF support (via PyMuPDF) as complete, with additional formats planned
- MCP Integration: Seamless integration with MCP-compatible LLMs and applications
Architecture
DocNav follows a modular, extensible architecture:
- Core MCP Server: Main server implementation using the MCP protocol
- Document Processors: Pluggable processors for different file types
- Navigation Engine: Handles document structure analysis and navigation
- Content Extractors: Extract and format content from documents
- Search Engine: Provides search and query capabilities across documents
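To make the Navigation Engine's role concrete, here is a minimal sketch of how a flat list of headings could be turned into a tree and queried for the parent, siblings, and children of a section (the context that the navigate_section tool is described as returning). The `Section` class, `build_tree`, and `navigation_context` are hypothetical names for illustration, not DocNav's actual implementation.

```python
from dataclasses import dataclass, field


@dataclass
class Section:
    section_id: str
    title: str
    level: int
    parent: "Section | None" = field(default=None, repr=False)
    children: list["Section"] = field(default_factory=list)


def build_tree(headings: list[tuple[str, int, str]]) -> dict[str, Section]:
    """Build a section tree from (section_id, level, title) tuples in document order."""
    index: dict[str, Section] = {}
    stack: list[Section] = []  # chain of currently open ancestor sections
    for section_id, level, title in headings:
        node = Section(section_id, title, level)
        # Close every open section that is at the same depth or deeper.
        while stack and stack[-1].level >= level:
            stack.pop()
        if stack:
            node.parent = stack[-1]
            stack[-1].children.append(node)
        stack.append(node)
        index[section_id] = node
    return index


def navigation_context(index: dict[str, Section], section_id: str) -> dict:
    """Return the IDs of the parent, siblings, and children of one section."""
    node = index[section_id]
    if node.parent is not None:
        peers = node.parent.children
    else:
        peers = [s for s in index.values() if s.parent is None]
    return {
        "parent": node.parent.section_id if node.parent else None,
        "siblings": [s.section_id for s in peers if s is not node],
        "children": [c.section_id for c in node.children],
    }
```

A stack of open ancestors keeps the tree construction to a single pass over the headings, which matters for the lengthy documents DocNav targets.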
Installation
Prerequisites
- Python 3.10+
- uv package manager
Setup
- Clone the repository:
git clone https://github.com/shenyimings/DocNav-MCP.git
cd DocNav-MCP
- Install dependencies:
uv sync
Usage
Starting the MCP Server
uv run server.py
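Connect to the MCP server
Add the following to your MCP client configuration, replacing {{PATH_TO_UV}} with the output of `which uv` and {{PATH_TO_SRC}} with the directory that contains server.py:
{
  "mcpServers": {
    "docnav": {
      "command": "{{PATH_TO_UV}}",
      "args": [
        "--directory",
        "{{PATH_TO_SRC}}",
        "run",
        "server.py"
      ]
    }
  }
}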
Available Tools
- load_document: Load a document for navigation and analysis
  - Args: file_path (path to document file)
  - Returns: Success message with auto-generated document ID
- get_outline: Get document outline/table of contents
  - Args: doc_id (document identifier), max_depth (max heading depth, default 3)
  - Returns: Formatted document outline
  - Tip: Use first after loading a document to understand structure
- read_section: Read content of a specific document section
  - Args: doc_id (document identifier), section_id (e.g., 'h1_0', 'h2_1')
  - Returns: Section content with subsections
- search_document: Search for specific content within a document
  - Args: doc_id (document identifier), query (search term or phrase)
  - Returns: Formatted search results with context
- navigate_section: Get navigation context for a section
  - Args: doc_id (document identifier), section_id (section to navigate to)
  - Returns: Navigation context with parent, siblings, children
- list_documents: List all currently loaded documents
  - Returns: List of loaded documents with metadata
- get_document_stats: Get statistics about a loaded document
  - Args: doc_id (document identifier)
  - Returns: Document statistics and structure info
- remove_document: Remove a document from the navigator
  - Args: doc_id (document identifier)
  - Returns: Success or error message
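The example section IDs 'h1_0' and 'h2_1' suggest a heading level followed by a zero-based counter. The exact scheme DocNav uses is not documented here, so the sketch below assumes one counter per heading level and shows how an outline with a max_depth cutoff could be derived from Markdown headings. `build_outline`, `format_outline`, and `Heading` are hypothetical names, not part of DocNav's API.

```python
import re
from dataclasses import dataclass

# ATX headings only: one to six '#' characters, whitespace, then the title.
HEADING_RE = re.compile(r"^(#{1,6})\s+(.*?)\s*#*\s*$")
FENCE = "`" * 3  # marker that opens or closes a fenced code block


@dataclass
class Heading:
    section_id: str
    level: int
    title: str


def build_outline(markdown: str, max_depth: int = 3) -> list[Heading]:
    """Collect headings up to max_depth, assigning IDs like 'h2_1' (assumed scheme)."""
    counters: dict[int, int] = {}
    outline: list[Heading] = []
    in_fence = False
    for line in markdown.splitlines():
        if line.lstrip().startswith(FENCE):
            in_fence = not in_fence  # '#' lines inside code fences are not headings
            continue
        if in_fence:
            continue
        match = HEADING_RE.match(line)
        if not match:
            continue
        level = len(match.group(1))
        index = counters.get(level, 0)
        counters[level] = index + 1  # count deeper headings too, so IDs stay stable
        if level <= max_depth:
            outline.append(Heading(f"h{level}_{index}", level, match.group(2)))
    return outline


def format_outline(outline: list[Heading]) -> str:
    """Render the outline with two spaces of indentation per heading level."""
    return "\n".join(f"{'  ' * (h.level - 1)}[{h.section_id}] {h.title}" for h in outline)
```

Counting headings deeper than max_depth, even though they are not listed, keeps a section's ID the same regardless of the depth requested, so an ID taken from a shallow outline would still be valid for a later read_section call under this assumed scheme.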
Example Usage
# Load a document; the success message includes the auto-generated document ID
result = await tools.load_document("path/to/document.md")
# Get document outline (doc_id is the ID reported by load_document)
outline = await tools.get_outline(doc_id)
# Get specific section content (section IDs look like 'h1_0' or 'h2_1')
section = await tools.read_section(doc_id, "h1_0")
# Search within document
results = await tools.search_document(doc_id, "search query")
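On the wire, each of these calls is an MCP `tools/call` request, which is a JSON-RPC 2.0 message carrying the tool name and its arguments. The sketch below shows only the shape of that request, using the argument names from the Available Tools list; `tool_call_request` is a hypothetical helper, and in practice an MCP client library also handles the initialization handshake and transport for you.

```python
import itertools
import json

_request_ids = itertools.count(1)  # JSON-RPC requests need unique IDs


def tool_call_request(name: str, **arguments) -> str:
    """Serialize one MCP tools/call request as a JSON-RPC 2.0 message."""
    return json.dumps(
        {
            "jsonrpc": "2.0",
            "id": next(_request_ids),
            "method": "tools/call",
            "params": {"name": name, "arguments": arguments},
        }
    )
```

For example, `tool_call_request("get_outline", doc_id="<id from load_document>", max_depth=2)` produces the request for a two-level outline; the doc_id value here is a placeholder for the ID reported when the document was loaded.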
Development
Project Structure
docnav-mcp/
├── server.py               # Main MCP server
├── docnav/
│   ├── __init__.py         # Package initialization
│   ├── models.py           # Data models
│   ├── navigator.py        # Document navigation engine
│   └── processors/
│       ├── __init__.py     # Processor package
│       ├── base.py         # Base processor interface
│       └── markdown.py     # Markdown processor
└── tests/
    └── ...                 # Test files
Development Guidelines
See CLAUDE.md for detailed development guidelines including:
- Code quality standards
- Testing requirements
- Package management with uv
- Formatting and linting rules
Adding New Document Processors
- Create a new processor class inheriting from BaseProcessor
- Implement the required methods: can_process, process, extract_section, search
- Register the processor in the DocumentNavigator
- Add comprehensive tests
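The steps above might look roughly like the following sketch for a plain-text processor. Only the four method names come from this README. The method signatures, the stand-in `BaseProcessor` defined here, the `PlainTextProcessor` class, and the `pick_processor` helper (standing in for registration with DocumentNavigator) are all assumptions, so check docnav/processors/base.py for the real interface before adapting it.

```python
from abc import ABC, abstractmethod
from pathlib import Path


class BaseProcessor(ABC):
    """Stand-in for the real base class; these signatures are assumed."""

    @abstractmethod
    def can_process(self, file_path: Path) -> bool: ...

    @abstractmethod
    def process(self, file_path: Path) -> dict[str, str]: ...

    @abstractmethod
    def extract_section(self, sections: dict[str, str], section_id: str) -> str: ...

    @abstractmethod
    def search(self, sections: dict[str, str], query: str) -> list[str]: ...


class PlainTextProcessor(BaseProcessor):
    """Hypothetical processor: each blank-line-separated paragraph is a section."""

    def can_process(self, file_path: Path) -> bool:
        return file_path.suffix.lower() == ".txt"

    def process(self, file_path: Path) -> dict[str, str]:
        text = file_path.read_text(encoding="utf-8")
        paragraphs = [p.strip() for p in text.split("\n\n") if p.strip()]
        return {f"p_{i}": p for i, p in enumerate(paragraphs)}

    def extract_section(self, sections: dict[str, str], section_id: str) -> str:
        return sections[section_id]

    def search(self, sections: dict[str, str], query: str) -> list[str]:
        # Case-insensitive substring match; returns the IDs of matching sections.
        needle = query.lower()
        return [sid for sid, body in sections.items() if needle in body.lower()]


def pick_processor(processors: list[BaseProcessor], file_path: Path) -> BaseProcessor:
    """Return the first registered processor that accepts the file."""
    for processor in processors:
        if processor.can_process(file_path):
            return processor
    raise ValueError(f"no processor registered for {file_path.suffix!r}")
```

Dispatching on can_process keeps format detection inside each processor, so adding a format does not require touching the others.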
Running Tests
# Run all tests
uv run tests/run_tests.py
Code Quality
# Format code
uv run --frozen ruff format .
# Check linting
uv run --frozen ruff check .
# Type checking
uv run --frozen pyright
Roadmap
- [x] Complete Markdown processor implementation
- [x] Add PDF document support (PyMuPDF)
- [x] Improve test coverage and quality
- [ ] Implement advanced search capabilities
- [ ] Add document summarization features
- [ ] Support for additional document formats (DOCX, TXT, etc.)
- [ ] Performance optimizations for large documents
- [ ] Caching mechanisms for frequently accessed documents
- [ ] Add persistent storage for loaded documents
Contributing
- Fork the repository
- Create a feature branch
- Follow the development guidelines in CLAUDE.md
- Add tests for new functionality
- Submit a pull request
License
This project is licensed under the Apache-2.0 License - see the LICENSE file for details.
Support
For issues and questions:
- Open an issue on GitHub
- Check the documentation in CLAUDE.md
- Review existing issues and discussions