Document Intelligence MCP Server
Enables Claude to read and analyze PDF documents with automatic OCR processing for scanned files. Features intelligent text extraction, caching for performance, and secure file access with search capabilities.
Building an MCP Server with OCR: From Setup Struggles to Document Intelligence (Proof of Concept)
A real-world journey of setting up Anthropic's Model Context Protocol server with advanced PDF processing capabilities
The Challenge: Making Claude Desktop Read Your Documents
Ever wished your AI assistant could read through your scanned PDFs, legal documents, or HOA covenants without you having to manually extract text? That's exactly what we set out to accomplish in this session - building a custom MCP (Model Context Protocol) server that not only reads PDFs but intelligently handles scanned documents using OCR.
Starting Point: Following the Official Tutorial
We began by following Anthropic's official MCP server tutorial to build a basic weather server. What seemed like a straightforward process quickly became a Windows-specific debugging adventure.
The Setup Struggles
Problem #1: Missing Configuration File
The first hurdle was the infamous "cannot find claude_desktop_config.json" error. This configuration file doesn't exist by default - you need to create it manually in the right location:
- Windows: %APPDATA%\Claude\claude_desktop_config.json
- macOS: ~/Library/Application Support/Claude/claude_desktop_config.json
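Creating the file by hand is easy to get wrong, so here is a minimal sketch of doing it from Python. The function names are hypothetical, and the empty "mcpServers" object is our assumption about a sensible starting point; the two paths are the ones listed above.

```python
import json
import os
import sys
from pathlib import Path


def claude_config_path(platform: str = sys.platform) -> Path:
    """Return the expected config location for the given platform."""
    if platform.startswith("win"):
        # Fall back to the usual roaming profile folder if APPDATA is unset.
        appdata = os.environ.get("APPDATA", str(Path.home() / "AppData" / "Roaming"))
        return Path(appdata) / "Claude" / "claude_desktop_config.json"
    if platform == "darwin":
        return (Path.home() / "Library" / "Application Support"
                / "Claude" / "claude_desktop_config.json")
    raise ValueError(f"no known config location for platform {platform!r}")


def ensure_config(path: Path) -> dict:
    """Create an empty config if the file is missing, then return its contents."""
    if not path.exists():
        path.parent.mkdir(parents=True, exist_ok=True)
        path.write_text(json.dumps({"mcpServers": {}}, indent=2), encoding="utf-8")
    return json.loads(path.read_text(encoding="utf-8"))
```

Calling ensure_config(claude_config_path()) leaves an existing file untouched and only writes the skeleton when nothing is there yet.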
Problem #2: UV Package Manager Permissions
The tutorial requires uv (a fast Python package manager), but our initial installation had permission issues. The fix was to run the installer from PowerShell as Administrator with the execution policy set to ByPass:
powershell -ExecutionPolicy ByPass -c "irm https://astral.sh/uv/install.ps1 | iex"
Problem #3: Project Scripts Configuration
Even with uv working, the server couldn't run because the pyproject.toml was missing the crucial [project.scripts] section:
[project.scripts]
weather = "weather:main"
Building the Weather Server Foundation
Once we overcame the setup issues, we had a basic weather server running with these tools:
- get_forecast - Simulated weather forecasts for any city
- get_alerts - Weather alerts for US states
The server successfully connected to Claude Desktop, proving our MCP infrastructure was working.
The Real Goal: Document Intelligence
With the foundation in place, we tackled the main objective - adding PDF reading capabilities with OCR support for scanned documents. This required several components:
1. Basic PDF Reading
Using PyPDF2 for extractable text PDFs:
def extract_pdf_text(file_path: str, page_numbers: list[int] | None = None):
    # Extract text from regular PDFs
    ...
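To make the stub concrete, here is a minimal sketch of what such a function might look like with PyPDF2's PdfReader. Treating page_numbers as 1-based and silently dropping out-of-range pages are our assumptions, and select_page_indexes is a hypothetical helper, not part of the original server.

```python
from __future__ import annotations


def select_page_indexes(page_count: int, page_numbers: list[int] | None = None) -> list[int]:
    """Convert 1-based page numbers to 0-based indexes, dropping out-of-range ones."""
    if page_numbers is None:
        return list(range(page_count))
    return [n - 1 for n in page_numbers if 1 <= n <= page_count]


def extract_pdf_text(file_path: str, page_numbers: list[int] | None = None) -> str:
    """Extract embedded text from a PDF with PyPDF2 (no OCR involved)."""
    # Imported lazily so the helper above works without PyPDF2 installed.
    from PyPDF2 import PdfReader

    reader = PdfReader(file_path)
    indexes = select_page_indexes(len(reader.pages), page_numbers)
    # extract_text() may yield nothing useful for image-only pages.
    return "\n\n".join(reader.pages[i].extract_text() or "" for i in indexes)
```

For example, extract_pdf_text("covenants.pdf", [1, 2]) would return the text of the first two pages separated by a blank line.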
2. OCR Integration
For scanned documents, we integrated:
- pytesseract - Python wrapper for Google's Tesseract OCR engine
- pdf2image - Converts PDF pages to images for OCR processing
- Pillow - Image processing library
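A rough sketch of how these three pieces can fit together is shown below. The 300 DPI setting and the injectable render/recognize parameters are our assumptions for illustration; pdf2image hands back Pillow images, which pytesseract accepts directly.

```python
from __future__ import annotations

from typing import Any, Callable, Iterable


def _render_pages(file_path: str) -> Iterable[Any]:
    """Rasterize each PDF page to a Pillow image (needs Poppler installed)."""
    from pdf2image import convert_from_path
    return convert_from_path(file_path, dpi=300)


def _recognize(image: Any) -> str:
    """Run Tesseract on one page image (needs the Tesseract binary installed)."""
    import pytesseract
    return pytesseract.image_to_string(image)


def ocr_pdf(
    file_path: str,
    render: Callable[[str], Iterable[Any]] = _render_pages,
    recognize: Callable[[Any], str] = _recognize,
) -> list[str]:
    """Return the OCR text of every page, in page order."""
    return [recognize(image).strip() for image in render(file_path)]
```

Passing the two steps in as parameters keeps the pipeline easy to exercise with stand-ins when Tesseract or Poppler is not available.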
3. Intelligent Detection System
The server automatically determines whether a PDF needs OCR:
def has_extractable_text(file_path: str) -> bool:
    # Checks if PDF has meaningful extractable text
    # Falls back to OCR for scanned documents
    ...
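The write-up does not spell out what counts as "meaningful" text, so the following is only one plausible heuristic: average the non-whitespace characters per page and compare against a threshold. The function name and the 50-character default are hypothetical.

```python
from __future__ import annotations


def looks_extractable(page_texts: list[str], min_chars_per_page: int = 50) -> bool:
    """Guess whether a PDF has a real text layer from its per-page extracted text."""
    if not page_texts:
        return False
    # Count only non-whitespace characters so blank-padded pages do not pass.
    total = sum(len("".join(text.split())) for text in page_texts)
    return total / len(page_texts) >= min_chars_per_page
```

A scanned document typically yields empty or near-empty strings from text extraction, so it falls below the threshold and gets routed to OCR.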
4. Caching System
Perhaps the most valuable feature - OCR results are cached to avoid reprocessing:
- Cached files use naming pattern: document_ocr_[hash].txt
- Hash ensures cache invalidation when source PDF changes
- Dramatically improves performance for repeat access
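The caching idea can be sketched as follows. We assume here that "document" in the pattern stands for the PDF's file stem and that the hash is a truncated SHA-256 of the file contents; the original may well use a different hash or key. The run_ocr parameter is a hypothetical callable that does the actual OCR.

```python
import hashlib
from pathlib import Path
from typing import Callable


def cache_path_for(pdf_path: Path, cache_dir: Path) -> Path:
    """Build a cache file name that changes whenever the PDF bytes change."""
    digest = hashlib.sha256(pdf_path.read_bytes()).hexdigest()[:16]
    return cache_dir / f"{pdf_path.stem}_ocr_{digest}.txt"


def cached_ocr(pdf_path: Path, cache_dir: Path, run_ocr: Callable[[Path], str]) -> str:
    """Return cached OCR text if present, otherwise run OCR and store the result."""
    target = cache_path_for(pdf_path, cache_dir)
    if target.exists():
        return target.read_text(encoding="utf-8")
    text = run_ocr(pdf_path)
    cache_dir.mkdir(parents=True, exist_ok=True)
    target.write_text(text, encoding="utf-8")
    return text
```

Because the hash is part of the file name, an edited PDF simply misses the old cache entry; stale files are left behind and would need separate cleanup.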
Security Considerations
The server includes built-in security measures:
- Path Validation: Only allows access to predefined directories
- File Type Restrictions: Limited to PDF files
- Permission Checks: Validates file access before processing
ALLOWED_PDF_DIRECTORIES = [
"/path/to/your/documents",
"/path/to/your/pdfs",
"/path/to/your/downloads"
]
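A check against such a list might look like the sketch below, which combines the three measures above. The function name is hypothetical, and resolving paths before comparing them is our assumption about how to block "../" traversal and symlink tricks.

```python
from __future__ import annotations

from pathlib import Path


def is_allowed_pdf(file_path: str, allowed_dirs: list[str]) -> bool:
    """Accept only existing .pdf files located under one of the allowed directories."""
    # resolve() collapses ".." segments and follows symlinks before we compare.
    candidate = Path(file_path).resolve()
    if candidate.suffix.lower() != ".pdf":
        return False
    inside = any(Path(d).resolve() in candidate.parents for d in allowed_dirs)
    return inside and candidate.is_file()
```

With this in place, a request for /path/to/your/documents/../private/secret.pdf is rejected because the resolved path no longer sits under an allowed directory.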
Real-World Test: HOA Document Analysis
To validate our system, we processed actual HOA covenant documents:
- Input: 5.1 MB scanned PDF with 40+ pages
- Processing: Full OCR extraction and caching
- Output: Complete one-page summary of key provisions
- Result: Instant future access via cached text
The system successfully identified property restrictions, assessment procedures, architectural controls, and enforcement mechanisms from a complex legal document.
Final MCP Tools Arsenal
Our completed server provides these capabilities:
Document Tools:
- read_pdf - Read entire documents or specific pages with automatic OCR
- list_pdfs - Inventory available documents with scan/cache status
- search_pdf_content - Full-text search within documents
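As an illustration of what the search tool could do once page text is available (extracted or from the OCR cache), here is a simple case-insensitive substring search. The function name, the returned dictionary shape, and the context width are all hypothetical; the actual tool may rank or match differently.

```python
from __future__ import annotations


def search_text(pages: list[str], query: str, context: int = 40) -> list[dict]:
    """Find every occurrence of query and return its page number and a snippet."""
    hits: list[dict] = []
    needle = query.lower()
    if not needle:
        return hits
    for page_number, text in enumerate(pages, start=1):
        # Plain lowercasing is enough for mostly-ASCII documents.
        haystack = text.lower()
        start = haystack.find(needle)
        while start != -1:
            lo = max(0, start - context)
            hi = min(len(text), start + len(needle) + context)
            hits.append({"page": page_number, "snippet": text[lo:hi].strip()})
            start = haystack.find(needle, start + len(needle))
    return hits
```

Returning page numbers alongside snippets lets the assistant follow up with a targeted page read instead of loading the whole document again.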
Weather Tools:
- get_forecast - Weather forecasts for any location
- get_alerts - Weather alerts by state
Smart Features:
- Automatic scanned PDF detection
- Intelligent OCR fallback
- Persistent caching system
- Security-first file access
Lessons Learned
1. Windows Development Gotchas
- PowerShell execution policies can block installations
- Path separators matter in configuration files
- Permission issues are common with package managers
2. OCR Implementation Insights
- System dependencies (Tesseract, Poppler) are required
- Caching is essential for practical OCR performance
- Hybrid approach (text extraction + OCR fallback) works best
3. MCP Architecture Benefits
- Modular tool design allows easy capability expansion
- Security model provides controlled file system access
- Integration with Claude Desktop creates seamless user experience
Performance Impact
The caching system provides dramatic performance improvements:
- First Access: ~30-60 seconds for OCR processing
- Subsequent Access: <1 second from cache
- Storage Overhead: ~10-20% of original PDF size for text cache
What's Next?
This foundation opens up numerous possibilities:
- Integration with cloud OCR services for better accuracy
- Support for additional document formats (DOCX, images)
- Semantic search using embeddings
- Document comparison and analysis tools
- Automated summarization and extraction pipelines
Code Availability
The complete MCP server code includes:
- Comprehensive error handling
- Type hints throughout
- Detailed documentation
- Production-ready security measures
- Extensible architecture for additional tools
Conclusion
Building this MCP server transformed a basic tutorial into a powerful document intelligence system. What started as debugging configuration issues evolved into a practical tool for extracting insights from scanned legal documents.
The real value isn't just in the technical implementation - it's in democratizing access to document analysis. Now anyone can ask their AI assistant to "summarize my HOA covenants" or "what are the key restrictions in my lease?" and get fast responses drawn from the text of their scanned PDFs, subject to the quality of the OCR.
The journey from setup struggles to document intelligence showcases both the power of the MCP architecture and the practical challenges of real-world AI development. Sometimes the best learning happens when things don't work as expected.
This MCP server demonstrates the potential of combining traditional document processing with modern AI capabilities. By handling the technical complexities behind the scenes, we enable natural language interaction with complex documents - transforming how people access and understand their important paperwork.