MinerU Document Explorer
Enables AI agents to search, deep-read, and build knowledge bases from Markdown, PDF, DOCX, and PPTX documents via MCP tools for retrieval, document navigation, and ingestion.
<div align="center">
<h1 align="center"> <img src="assets/logo.png" alt="logo" height="28" style="vertical-align: middle; margin-right: 8px;"> MinerU Document Explorer </h1>
<h4>Agent-native knowledge engine — search, deep-read, and build knowledge bases<br>from Markdown, PDF, DOCX, and PPTX.</h4>
<p> <a href="https://www.npmjs.com/package/mineru-document-explorer"><img src="https://img.shields.io/npm/v/mineru-document-explorer?style=flat-square&color=cb3837" alt="npm"/></a> <a href="LICENSE"><img src="https://img.shields.io/badge/license-MIT-blue?style=flat-square" alt="license"/></a> <a href="https://github.com/opendatalab/MinerU-Document-Explorer/actions"><img src="https://img.shields.io/github/actions/workflow/status/opendatalab/MinerU-Document-Explorer/ci.yml?style=flat-square&label=CI" alt="CI"/></a> <a href="https://github.com/opendatalab/MinerU-Document-Explorer"><img src="https://img.shields.io/github/stars/opendatalab/MinerU-Document-Explorer?style=flat-square" alt="stars"/></a> </p>
<p> <a href="README-zh.md">中文文档</a> · <a href="docs/mcp.md">MCP Setup</a> · <a href="docs/cli.md">CLI Reference</a> · <a href="demo/">Demo</a> · <a href="CONTRIBUTING.md">Contributing</a> </p>
</div>
🤔 Why MinerU Document Explorer?
MinerU Document Explorer equips your agent with three tool suites — Retrieve, Deep Read, and Ingest — closing the full knowledge loop:

- 🔍 Retrieve — Cross-collection search: BM25, vector, and hybrid with LLM reranking and query expansion
- 📖 Deep Read — Navigate inside a single document without loading the whole file: table of contents, section reading, inline search, and element extraction
- 📝 Ingest — Build and maintain an LLM wiki from raw documents, following the Karpathy LLM Wiki pattern
Developed by the MinerU team, building on QMD and Karpathy's LLM Wiki.
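The Retrieve suite combines BM25 and vector results into a hybrid ranking. This README does not state how the two result lists are merged, so the sketch below shows one widely used option, reciprocal rank fusion (RRF), purely as an illustration. The function name, the constant `k = 60`, and the sample document IDs are assumptions and are not part of this project's API.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings: list[list[str]], k: int = 60) -> list[tuple[str, float]]:
    """Fuse several ranked lists of document IDs into one ranking.

    Each document scores sum(1 / (k + rank)) over the lists it appears in,
    with rank starting at 1. k = 60 is a conventional default, assumed here.
    """
    scores: dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by document ID for determinism.
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))


# Hypothetical result lists from a keyword search and a vector search.
bm25_hits = ["a.md", "b.pdf", "c.docx"]
vector_hits = ["b.pdf", "d.pptx", "a.md"]
fused = reciprocal_rank_fusion([bm25_hits, vector_hits])
```

Documents that rank well in both lists rise to the top, which is the usual motivation for fusing keyword and vector results before a reranker sees them.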
💡 What can you do with it?
- Research assistant — Point it at a folder of PDFs and ask your AI agent to survey the literature, compare methods, or find specific results across papers
- Project knowledge base — Index your codebase, docs, and design specs; let the agent answer questions about architecture, find relevant code, or trace requirements
- Study companion — Upload textbooks or lecture notes; use `doc_toc` + `doc_read` to navigate, `doc_grep` to search, and build a wiki of key concepts
- Enterprise document search — Index contracts, reports, or manuals and search them with natural language queries enhanced by LLM reranking
See it in action: The demo/ folder contains a complete end-to-end example — an AI agent automatically reads ~10 arXiv papers on RAG, builds an interlinked wiki knowledge base, and writes a research survey. See the Demo Guide for step-by-step instructions.
🚀 Quick Start
Agent-Assisted Setup: If you're using an AI agent (Claude Code, Cursor, etc.), simply ask it to help you deploy MinerU Document Explorer and install skills — the agent can handle the entire setup process for you, including MCP configuration.
Follow the quickstart guide at https://github.com/opendatalab/MinerU-Document-Explorer/blob/main/docs/quickstart.md to install MinerU Document Explorer and walk the user through configuration.
📖 Document Deep Reading
Navigate and search within a single document without reading the whole file:
# View document structure
qmd doc-toc papers/attention-is-all-you-need.pdf
# Read specific sections by address
qmd doc-read papers/attention-is-all-you-need.pdf "line:45-120"
# Search within one document
qmd doc-grep papers/attention-is-all-you-need.pdf "self-attention"
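The `line:45-120` argument in the `doc-read` example addresses a range of lines. As a rough illustration of how such an address could be parsed and applied, here is a minimal sketch. The helper names are hypothetical, only the `line:START-END` form shown above is handled, and treating the range as 1-based and inclusive is an assumption.

```python
import re

# Matches addresses of the form "line:START-END" only.
_LINE_ADDRESS = re.compile(r"^line:(\d+)-(\d+)$")


def parse_line_address(address: str) -> tuple[int, int]:
    """Parse 'line:START-END' into a (start, end) pair of 1-based line numbers."""
    match = _LINE_ADDRESS.match(address)
    if match is None:
        raise ValueError(f"unsupported address: {address!r}")
    start, end = int(match.group(1)), int(match.group(2))
    if start < 1 or end < start:
        raise ValueError(f"invalid line range: {address!r}")
    return start, end


def read_lines(text: str, address: str) -> str:
    """Return only the addressed lines (inclusive) instead of the whole text."""
    start, end = parse_line_address(address)
    lines = text.splitlines()
    return "\n".join(lines[start - 1:end])


# A ten-line stand-in document.
sample = "\n".join(f"row {n}" for n in range(1, 11))
excerpt = read_lines(sample, "line:3-5")
```

Reading by address keeps the amount of text handed to an agent proportional to the section it asked for, not to the size of the file.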
🔌 MCP Server — 15 Tools for AI Agents
Integrate with AI agents via Model Context Protocol.
MCP Server vs CLI: The MCP server runs as a persistent process, so the LLM models (embeddings, reranker, query expansion) are loaded once and stay in memory across requests. CLI commands like `qmd query` must reload all models on every invocation, adding ~5–15 s of startup overhead each time. For agent workflows, always prefer the MCP server.
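To put the startup overhead in perspective, the arithmetic below multiplies the 5 to 15 s per-invocation figure quoted above by a number of queries. The count of 20 queries is an arbitrary assumption for illustration, and the function name is hypothetical.

```python
def cli_reload_overhead(num_queries: int, low_s: float = 5.0, high_s: float = 15.0) -> tuple[float, float]:
    """Return the (minimum, maximum) seconds spent reloading models over num_queries CLI calls."""
    if num_queries < 0:
        raise ValueError("num_queries must be non-negative")
    return num_queries * low_s, num_queries * high_s


low, high = cli_reload_overhead(20)
print(f"20 CLI queries: {low:.0f}-{high:.0f} s of model loading")
```

A persistent server pays the loading cost once, so the gap grows linearly with the number of queries in a session.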
Two transport modes:
| Mode | Command | Best for |
|---|---|---|
| stdio | `qmd mcp` | Claude Desktop, Claude Code (the client spawns and manages the process) |
| HTTP daemon | `qmd mcp --http --daemon` | Cursor, Windsurf, VS Code, multi-client setups (one shared persistent server) |
# Start the HTTP daemon (recommended — models stay loaded across all requests)
qmd mcp --http --daemon # default port 8181
qmd mcp --http --daemon --port 8080 # custom port
# Verify server is running
curl http://localhost:8181/health
# Stop the daemon
qmd mcp stop
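The same health check can be scripted, for example to wait for the daemon before launching an agent. The sketch below uses only the Python standard library. The `/health` path and the default port 8181 come from the commands above; the helper names are hypothetical, and treating any HTTP 200 response as "healthy" is an assumption, since the response body is not described here.

```python
import urllib.error
import urllib.request


def health_url(port: int = 8181, host: str = "localhost") -> str:
    """Build the health-check URL for a daemon listening on the given port."""
    return f"http://{host}:{port}/health"


def is_healthy(url: str, timeout: float = 2.0) -> bool:
    """Return True if a GET request to url answers with HTTP 200, False otherwise."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.status == 200
    except (urllib.error.URLError, OSError):
        # Connection refused, timeouts, and HTTP error statuses all count as unhealthy.
        return False
```

Calling `is_healthy(health_url())` checks the default port; pass `health_url(8080)` if the daemon was started with `--port 8080`.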
Client Configuration
<details> <summary><b>Cursor</b> — add to <code>.cursor/mcp.json</code> (project) or <code>~/.cursor/mcp.json</code> (global)</summary>
Option A — stdio (Cursor manages the process lifecycle):
{
  "mcpServers": {
    "qmd": {
      "command": "qmd",
      "args": ["mcp"]
    }
  }
}
Option B — HTTP (run qmd mcp --http --daemon first; models stay loaded, faster responses):
{
  "mcpServers": {
    "qmd": {
      "url": "http://localhost:8181/mcp"
    }
  }
}
</details>
<details> <summary><b>Claude Desktop</b> — add to <code>~/Library/Application Support/Claude/claude_desktop_config.json</code></summary>
{
  "mcpServers": {
    "qmd": {
      "command": "qmd",
      "args": ["mcp"]
    }
  }
}
</details>
<details> <summary><b>Claude Code</b> — add to <code>~/.claude/settings.json</code> or run <code>claude mcp add qmd -- qmd mcp</code></summary>
{
  "mcpServers": {
    "qmd": {
      "command": "qmd",
      "args": ["mcp"]
    }
  }
}
</details>
<details> <summary><b>Windsurf / VS Code / Other MCP Clients</b></summary>
For stdio transport, use "command": "qmd", "args": ["mcp"] in your client's MCP configuration.
For HTTP transport, start qmd mcp --http --daemon and point your client to http://localhost:8181/mcp.
</details>
See MCP setup guide for all 15 tools and HTTP transport details.
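If a client configuration file already lists other MCP servers, the `qmd` entry has to be merged in without overwriting them. The following sketch shows one way to do that with the two entry shapes used above (stdio and HTTP). The function names and the `other` server in the example are hypothetical, and writing the result back to disk is left out on purpose.

```python
import json


def qmd_server_entry(transport: str = "stdio", port: int = 8181) -> dict:
    """Return the "qmd" server entry for the chosen transport."""
    if transport == "stdio":
        return {"command": "qmd", "args": ["mcp"]}
    if transport == "http":
        return {"url": f"http://localhost:{port}/mcp"}
    raise ValueError(f"unknown transport: {transport!r}")


def merge_mcp_config(existing: dict, transport: str = "stdio", port: int = 8181) -> dict:
    """Return a copy of existing with the "qmd" server added or replaced."""
    merged = dict(existing)
    servers = dict(merged.get("mcpServers", {}))
    servers["qmd"] = qmd_server_entry(transport, port)
    merged["mcpServers"] = servers
    return merged


# A hypothetical existing configuration with one unrelated server.
current = {"mcpServers": {"other": {"command": "other-server"}}}
print(json.dumps(merge_mcp_config(current, "http"), indent=2))
```

The input dictionary is left untouched, so the merged result can be reviewed before it replaces the real file.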
Agent Skills
MinerU Document Explorer ships with a built-in Agent Skill that teaches AI agents how to use the full tool suite effectively — decision trees, usage patterns, and best practices for all 15 MCP tools.
# Install the skill (works with both npm and source installs)
qmd skill install # local project (.agents/skills/)
qmd skill install --global # global (~/.agents/skills/)
# Or from source repo
claude skill add ./skills/mineru-document-explorer/SKILL.md
📊 How It Compares
| | MinerU Doc Explorer | LlamaIndex | Obsidian | NotebookLM |
|---|---|---|---|---|
| Runs 100% locally | ✅ | ⚠️ LLM APIs | ✅ | ❌ Cloud |
| Agent integration (MCP) | 15 tools | Plugin | ❌ | ❌ |
| Deep reading within docs | ✅ | ❌ | ❌ | ✅ |
| Wiki knowledge compilation | ✅ | ❌ | Manual | ❌ |
| Formats | MD, PDF, DOCX, PPTX | Many | MD | PDF, URL |
| Search pipeline | BM25 + vec + rerank | Configurable | Basic | Proprietary |
| Zero-config search | ✅ `qmd search` | ❌ | Plugin | N/A |
| Open source | MIT | MIT | Partial | ❌ |
⚙️ Requirements
| Requirement | Notes |
|---|---|
| Node.js >= 22 or Bun | Runtime |
| Python >= 3.10 | Document processing (pymupdf, python-docx, python-pptx) |
| macOS | brew install sqlite for extension support |
📄 Document Processing Setup
Python 3.10+ is required for document processing (PDF, DOCX, PPTX):
# Check Python version
python3 --version # needs >= 3.10
# Install required Python packages
pip install pymupdf python-docx python-pptx
# Verify
python3 -c "import pymupdf; import docx; import pptx; print('OK')"
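For a more informative check than the one-liner above, a short script can report which requirement is missing. This is a minimal sketch: the module names (`pymupdf`, `docx`, `pptx`) and the 3.10 minimum are taken from the steps above, while the helper names are hypothetical.

```python
import importlib.util
import sys

REQUIRED_MODULES = ("pymupdf", "docx", "pptx")  # import names used in the check above
MIN_PYTHON = (3, 10)


def python_is_supported(version=None) -> bool:
    """Return True if the (major, minor) version meets the minimum."""
    if version is None:
        version = sys.version_info[:2]
    return tuple(version[:2]) >= MIN_PYTHON


def missing_modules(names=REQUIRED_MODULES) -> list:
    """Return the module names that cannot be found, without importing them."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if not python_is_supported():
    print("Python >= 3.10 is required")
for name in missing_modules():
    print(f"missing module: {name}")
```

The script prints nothing when the interpreter and all three packages are in place.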
<details> <summary><b>MinerU Cloud</b> — high-quality PDF extraction for scanned documents and complex layouts (optional)</summary>
pip install mineru-open-sdk
export MINERU_API_KEY="your-key" # get from https://mineru.net
When MINERU_API_KEY is set, MinerU Cloud is automatically used as the primary PDF provider with PyMuPDF as fallback.
For advanced configuration (custom providers, local VLM models, GPT PageIndex), create ~/.config/qmd/doc-reading.json:
{
  "docReading": {
    "providers": {
      "fullText": { "pdf": ["mineru_cloud", "pymupdf"] }
    },
    "credentials": {
      "mineru": { "api_key": "your-api-key" }
    }
  }
}
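The provider list in this configuration is ordered: the first entry is tried first and later entries act as fallbacks. The sketch below illustrates that pattern in general terms. It is not this project's implementation; the function name and the two stand-in extractors are hypothetical, and only the provider names `mineru_cloud` and `pymupdf` come from the configuration above.

```python
from typing import Callable


def extract_with_fallback(
    path: str,
    providers: list[str],
    extractors: dict[str, Callable[[str], str]],
) -> tuple[str, str]:
    """Try each provider in order and return (provider_name, text) from the first success."""
    errors = []
    for name in providers:
        extractor = extractors.get(name)
        if extractor is None:
            errors.append(f"{name}: not registered")
            continue
        try:
            return name, extractor(path)
        except Exception as exc:  # any failure moves on to the next provider
            errors.append(f"{name}: {exc}")
    raise RuntimeError("all providers failed: " + "; ".join(errors))


# Stand-in extractors: the first simulates an unreachable service, the second succeeds.
def failing_cloud(path: str) -> str:
    raise ConnectionError("service unreachable")


def local_parser(path: str) -> str:
    return f"text of {path}"


provider, text = extract_with_fallback(
    "report.pdf",
    ["mineru_cloud", "pymupdf"],
    {"mineru_cloud": failing_cloud, "pymupdf": local_parser},
)
```

Collecting the error from each failed provider makes it clear afterwards why a fallback was used.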
</details>
🤖 LLM Models (auto-downloaded on first use)
| Model | Purpose | Size |
|---|---|---|
| embeddinggemma-300M | Vector embeddings | ~300 MB |
| qwen3-reranker-0.6b | Re-ranking | ~640 MB |
| qmd-query-expansion-1.7B | Query expansion | ~1.1 GB |
Models are only needed for `qmd embed`, `qmd vsearch`, and `qmd query`. `qmd search` runs BM25 retrieval.
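BM25 is a purely lexical scoring function, which is why it works without any of the models listed above. For readers unfamiliar with it, here is a small sketch of the standard Okapi BM25 formula. The parameters `k1 = 1.2` and `b = 0.75` are common defaults assumed for illustration, the whitespace-style token lists are a simplification, and none of this is taken from the project's own search code.

```python
import math
from collections import Counter


def bm25_scores(query: list[str], docs: list[list[str]], k1: float = 1.2, b: float = 0.75) -> list[float]:
    """Score each tokenized document against the query terms with Okapi BM25."""
    n_docs = len(docs)
    if n_docs == 0:
        return []
    avg_len = sum(len(doc) for doc in docs) / n_docs
    # Document frequency: in how many documents each term occurs.
    doc_freq = Counter(term for doc in docs for term in set(doc))
    scores = []
    for doc in docs:
        term_freq = Counter(doc)
        score = 0.0
        for term in query:
            tf = term_freq[term]
            if tf == 0:
                continue
            # Smoothed inverse document frequency (always positive).
            idf = math.log(1 + (n_docs - doc_freq[term] + 0.5) / (doc_freq[term] + 0.5))
            # Term frequency saturates with k1; b controls length normalization.
            score += idf * tf * (k1 + 1) / (tf + k1 * (1 - b + b * len(doc) / avg_len))
        scores.append(score)
    return scores


# A tiny hypothetical corpus of pre-tokenized documents.
corpus = [
    ["self", "attention", "layers"],
    ["convolution", "layers"],
    ["recurrent", "networks"],
]
scores = bm25_scores(["attention"], corpus)
```

Only the first document contains the query term, so it is the only one with a non-zero score; with equal term counts, shorter documents score slightly higher because of length normalization.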
📚 Documentation
| Guide | Contents |
|---|---|
| 🎯 Demo Guide | End-to-end example: agent-driven RAG research survey |
| 📖 CLI Reference | All commands, options, output formats |
| 🔌 MCP Server | Setup, 15 tools, HTTP transport |
| 📦 SDK / Library | TypeScript API, types, examples |
| 🏗️ Architecture | Search pipeline, scoring, data schema, chunking |
| 🤝 Contributing | Development setup, code style, how to contribute |
❤️ Acknowledgments
MinerU Document Explorer builds upon these foundational projects:
- QMD by Tobi Lutke — an on-device search engine and CLI toolkit for markdown documents
- LLM Wiki by Andrej Karpathy — the conceptual pattern for LLM-maintained knowledge bases
- MinerU by OpenDataLab — high-quality document parsing and extraction
📝 Changelog
v1 — 2026-04-07 (Current)
Rebuilt from an OpenClaw agent skill into a full agent-native knowledge engine: npm package (npm install -g mineru-document-explorer), qmd CLI, MCP server with 15 tools across three groups (Retrieval / Deep Reading / Knowledge Ingestion), multi-format support (MD, PDF, DOCX, PPTX), hybrid search (BM25 + vector + LLM reranking), and LLM Wiki knowledge base pattern.
v0 — 2026-03-30 (Previous)
OpenClaw-native agent skill (doc-search CLI). Four capabilities: Logic Retrieval, Semantic Retrieval, Keyword Retrieval, Evidence Extraction. See the v0 repository.