PDF Agent MCP
Enables AI agents to efficiently process large local and online PDFs through selective extraction of text, images, and metadata. It provides tools for content search and document outline navigation to optimize context window usage.
🌐 Visit the Landing Page for an overview and easy download
⚠️ Important Setup Instructions
Before using this extension, you MUST configure Claude Desktop properly:
Required Configuration
- Install Node.js LTS: Visit nodejs.org and download the LTS version
- Configure Claude Desktop:
  - Go to Claude > Settings > Extensions > Advanced Settings
  - Disable "Use Built-in Node.js for MCP"
  - Restart Claude Desktop
This extension will NOT work with Claude's built-in Node.js. You must use your system's Node.js installation.
Troubleshooting
If you experience issues loading the extension:
- Verify Node.js is installed: Run `node --version` in your terminal
- Ensure "Use Built-in Node.js for MCP" is disabled in Claude Desktop settings
- Restart Claude Desktop completely
- Check the logs at `~/Library/Logs/Claude/mcp-server-PDF Agent MCP.log` (macOS) or `%LOCALAPPDATA%\Claude\Logs\mcp-server-PDF Agent MCP.log` (Windows)
A Model Context Protocol server designed for agentic reading and selective PDF processing. Enables AI systems to efficiently navigate and extract content from PDFs without overwhelming context windows.
Features
- Metadata Extraction: Get PDF properties, page count, and file information
- Text Extraction: Native text extraction with hybrid processing for better results
- Image Conversion: Convert PDF pages to optimized images for visual analysis
- Content Search: Pattern/regex search with context snippets
- Table of Contents: Extract bookmarks and document outline
- Flexible Path Support: Use absolute paths or relative paths from `~/pdf-agent/`
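The path rule in the last feature can be pictured with a short sketch. This is not the server's actual code: `resolvePdfPath` is a hypothetical helper, and it assumes a relative path is simply joined onto `~/pdf-agent/`.

```typescript
import * as os from "node:os";
import * as path from "node:path";

// Hypothetical helper illustrating the documented rule, not the real implementation.
// Assumption: relative paths are resolved against ~/pdf-agent/.
function resolvePdfPath(input: string, homeDir: string = os.homedir()): string {
  if (path.isAbsolute(input)) {
    // Absolute paths are used as given (normalized to drop "." and ".." segments).
    return path.normalize(input);
  }
  return path.join(homeDir, "pdf-agent", input);
}
```

Under these assumptions, `resolvePdfPath("papers/report.pdf")` points inside the `pdf-agent` folder of the current user's home directory, while an absolute path passes through unchanged.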
Usage Guide
PDF Agent MCP solves the common problem of context window overflow when working with PDFs in AI tools.
Important: Do not drag PDFs into the chat. Doing so loads the entire PDF into the conversation and bypasses selective processing. Instead, provide file paths or URLs so that the PDF Agent tools handle the document.
How to Use
For Local PDFs:
- Provide the absolute file path to your PDF
- Quick tip: Right-click your PDF → "Open with Chrome" → copy the address bar URL for the absolute path
For Online PDFs:
- Simply provide the PDF URL - the agent will download and process it locally
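The two input styles above can be told apart mechanically. The sketch below is illustrative only: `classifySource` and `PdfSource` are hypothetical names, and whether the server itself accepts `file://` URLs is not stated in this README. It also shows how the `file://` address copied from a browser (the quick tip above) maps back to a plain absolute path, using Node's `fileURLToPath`.

```typescript
import { fileURLToPath } from "node:url";

// Hypothetical types and helper; not part of the PDF Agent MCP API.
type PdfSource =
  | { kind: "url"; url: string }
  | { kind: "file"; path: string };

function classifySource(input: string): PdfSource {
  const trimmed = input.trim();
  if (/^https?:\/\//i.test(trimmed)) {
    // Online PDF: would be downloaded before processing.
    return { kind: "url", url: trimmed };
  }
  if (/^file:\/\//i.test(trimmed)) {
    // Browser address-bar form: decode percent-escapes into a filesystem path.
    return { kind: "file", path: fileURLToPath(trimmed) };
  }
  // Anything else is treated as a local path.
  return { kind: "file", path: trimmed };
}
```

Converting the `file://` form first is the conservative choice, since a decoded absolute path is the input this README documents.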
Key Benefits
- Selective Reading: The AI first examines metadata and outline, then opens only relevant pages
- Token Efficiency: Avoids images when possible and uses them only when visual analysis requires it
- Scalable: Works with large documents (1000+ page textbooks) and multiple PDFs simultaneously
- Search Capability: Built-in pattern/regex search across PDF content
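To make "pattern/regex search" with context snippets concrete, here is a minimal sketch that runs over page text that has already been extracted. It is not the server's implementation: `searchPages`, `SearchHit`, and the camelCase parameters `contextChars` and `maxResults` are illustrative names chosen for this example.

```typescript
interface SearchHit {
  page: number;    // 1-based page number
  match: string;   // text matched by the pattern
  snippet: string; // match plus surrounding context
}

// Illustrative only: searches an array of per-page strings.
function searchPages(
  pages: string[],
  pattern: RegExp,
  contextChars: number = 150,
  maxResults: number = 10
): SearchHit[] {
  if (maxResults <= 0) return [];
  // exec() only advances through a string when the global flag is set.
  const flags = pattern.flags.includes("g") ? pattern.flags : pattern.flags + "g";
  const re = new RegExp(pattern.source, flags);
  const hits: SearchHit[] = [];

  for (let i = 0; i < pages.length; i++) {
    const text = pages[i];
    re.lastIndex = 0;
    let m: RegExpExecArray | null;
    while ((m = re.exec(text)) !== null) {
      if (m[0].length === 0) {
        re.lastIndex++; // skip empty matches so the loop always advances
        continue;
      }
      const start = Math.max(0, m.index - contextChars);
      const end = Math.min(text.length, m.index + m[0].length + contextChars);
      hits.push({ page: i + 1, match: m[0], snippet: text.slice(start, end) });
      if (hits.length >= maxResults) {
        return hits; // early stopping keeps output small on large documents
      }
    }
  }
  return hits;
}
```

Returning short snippets with page numbers, rather than whole pages, is what lets an agent decide which pages are worth extracting in full.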
Approach
This MCP uses agentic search with simple tools rather than complex alternatives:
- No embedding creation, chunking, or vector storage required
- No multi-agent coordination or handoff complexity
- Just clean, effective tools that modern AI systems can use intelligently
Perfect for researchers, students, and professionals working with extensive PDF libraries.
AI Assistant Prompt for Optimal Usage
Copy this prompt into your AI assistant's custom instructions or context for best results:
When working with PDFs using the PDF Agent MCP tools, follow this strategic approach:
### 1. Query Analysis & PDF Identification
- **Think carefully** about the user's search query and information needs
- **Identify which PDF(s)** are most likely to contain the answer
- Consider the document type, domain, and likely structure based on the query
### 2. Exploratory Phase (Always Start Here)
- **Get metadata** first using `get_pdf_metadata` to understand document size, creation date, and properties
- **Extract table of contents** with `get_pdf_outline` to understand document structure and navigation
- **Analyze the outline** to identify which sections are most relevant to the query
### 3. Strategic Content Extraction
Based on the outline and metadata:
- **Use page ranges** (`"5:10"`, `"20:"`) to focus on specific sections rather than entire documents
- **Extract images** with `get_pdf_images` when visual content is critical (charts, diagrams, tables, equations)
- **Choose text extraction strategy**: `hybrid` (default) for most cases, `native` for clean PDFs, `ocr` for scanned documents
### 4. Advanced Search Strategies
- **Use multiple search queries** with different keywords and synonyms
- **Apply regex patterns** for flexible matching: `/budget|cost|expense/gi` instead of single terms
- **Combine searches**: Start broad, then narrow down with specific terms
- **Use context characters** (150+ chars) to understand search result context
- **Implement early stopping** with `max_results` for large documents
### 5. Iterative Refinement
- **Start with targeted searches** based on outline analysis
- **Follow up with broader searches** if initial queries don't yield results
- **Extract specific page ranges** identified through search results
- **Use visual analysis** (images) when text extraction seems incomplete or when layout matters
### 6. Performance Optimization
- **Avoid processing entire large PDFs** - always use page ranges when possible
- **Use search with early stopping** before extracting large sections
- **Prefer search over full text extraction** for finding specific information
- **Extract images selectively** only when visual analysis is needed
### 7. Multi-Document Workflows
- **Process documents in parallel** when comparing multiple PDFs
- **Use consistent search terms** across documents for comparison
- **Combine results strategically** rather than processing everything at once
### Key Principles:
- **Strategic before comprehensive**: Understand document structure before diving deep
- **Search before extract**: Use pattern matching to locate relevant content first
- **Visual when necessary**: Extract images only when text extraction is insufficient
- **Iterative refinement**: Start targeted, expand scope as needed
- **Context preservation**: Always maintain enough context around search results
This approach maximizes efficiency, minimizes token usage, and provides more accurate, focused results than traditional "dump entire PDF" methods.
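The page-range strings used in the prompt (`"5:10"`, `"20:"`) can be read as in the following sketch. The README does not spell out the exact semantics, so this parser is a guess under stated assumptions: pages are 1-based, both ends are inclusive, an omitted end means "through the last page", and an omitted start means "from page 1". `parsePageRange` is a hypothetical name.

```typescript
// Hypothetical parser for range strings such as "5:10" or "20:".
// Assumptions (not confirmed by the README): 1-based pages, inclusive ends,
// an open end runs to the last page, and results are clamped to the document.
function parsePageRange(range: string, pageCount: number): number[] {
  const m = /^\s*(\d*)\s*(?::\s*(\d*))?\s*$/.exec(range);
  // Reject non-numeric input and the empty string (no start and no colon).
  if (m === null || (m[1] === "" && m[2] === undefined)) {
    throw new Error(`Invalid page range: "${range}"`);
  }
  const hasColon = m[2] !== undefined;
  const first = m[1] === "" ? 1 : Number(m[1]);
  // Without a colon the string names a single page.
  const last = !hasColon ? first : m[2] === "" ? pageCount : Number(m[2]);

  const start = Math.max(1, first);
  const end = Math.min(pageCount, last);
  const pages: number[] = [];
  for (let p = start; p <= end; p++) {
    pages.push(p);
  }
  return pages;
}
```

With these assumptions, `"5:10"` selects six pages and `"20:"` selects everything from page 20 to the end of the document.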
Installation
Option 1: DXT Package (Recommended)
- First, ensure you have completed the Required Configuration above
- Download the latest `pdf-agent-mcp.dxt` file from the releases
- Double-click the `.dxt` file to install it in Claude Desktop
Option 2: Manual Installation
- First, ensure you have completed the Required Configuration above
- Clone this repository
- Build the project: `npm install && npm run build`
- Find your Claude Desktop config file:
  - macOS: `~/Library/Application Support/Claude/claude_desktop_config.json`
  - Windows: `%APPDATA%\Claude\claude_desktop_config.json`
Add the following:
{
  "mcpServers": {
    "pdf-agent": {
      "command": "node",
      "args": [
        "PATH_TO_REPO/server/index.js"
      ]
    }
  }
}
Replace PATH_TO_REPO with the actual path to your cloned repository.
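If you would rather generate the entry than hand-edit the path, a small script can print it. This is only a convenience sketch: `buildServerEntry` is a hypothetical helper, and the repository location passed to it is a placeholder. The `pdf-agent` key, the `node` command, and `server/index.js` come from the snippet above.

```typescript
import * as path from "node:path";

// Hypothetical helper: builds the "pdf-agent" entry shown above with
// PATH_TO_REPO replaced by an absolute path.
function buildServerEntry(repoDir: string) {
  return {
    mcpServers: {
      "pdf-agent": {
        command: "node",
        args: [path.resolve(repoDir, "server", "index.js")],
      },
    },
  };
}

// Placeholder location; substitute the directory where you cloned the repository.
console.log(JSON.stringify(buildServerEntry("/path/to/repo"), null, 2));
```

`JSON.stringify` escapes backslashes, which matters on Windows, where an unescaped path such as `C:\repo` is not valid JSON. If your config file already contains an `mcpServers` object, add the `pdf-agent` entry to it rather than replacing the whole object.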
Development
# Install dependencies
npm install
# Build the project
npm run build
# Create DXT package
npm run build:dxt
# Pack the final .dxt file for distribution
dxt pack
Viewing Logs
To debug issues, you can view the MCP server logs:
# View logs (macOS)
open "$HOME/Library/Logs/Claude/mcp-server-PDF Agent MCP.log"
# Stream logs in real-time (macOS)
tail -f "$HOME/Library/Logs/Claude/mcp-server-PDF Agent MCP.log"
# Clear/delete logs (macOS)
rm "$HOME/Library/Logs/Claude/mcp-server-PDF Agent MCP.log"
# View logs (Windows)
notepad "%LOCALAPPDATA%\Claude\Logs\mcp-server-PDF Agent MCP.log"
# Clear/delete logs (Windows)
del "%LOCALAPPDATA%\Claude\Logs\mcp-server-PDF Agent MCP.log"
License
MIT