PDF Redaction MCP Server
Enables loading, reviewing, and redacting sensitive content in PDF documents through text-based or area-based redaction methods. Supports customizable redaction appearance and saves redacted PDFs with comprehensive error handling.
A Model Context Protocol (MCP) server for PDF redaction using PyMuPDF (fitz). This server provides tools for loading PDFs, identifying and redacting sensitive text, and saving redacted documents.
Features
- 📄 Load and read PDF files - Extract text content from PDFs for review
- 🔍 Batch text redaction - Search and redact multiple text strings at once for maximum efficiency
- 📋 Redaction tracking - Keep track of what's been redacted to prevent duplicate work
- 🔎 List applied redactions - Audit trail showing which texts have been marked for redaction
- 📐 Area-based redaction - Redact specific rectangular regions by coordinates
- 💾 Save redacted PDFs - Apply redactions and save with automatic naming
- 🎨 Customizable redaction appearance - Choose redaction fill colors
- 🔒 Error handling - Comprehensive error messages via MCP protocol
Installation
This project uses uv for package management. To install:
# Clone the repository
git clone <your-repo-url>
cd redact_mcp
# Install with uv
uv pip install -e .
Usage
Running the Server
You can run the server using either the Python script directly or the FastMCP CLI:
Option 1: Direct Python execution (stdio transport)
python -m redact_mcp.server
Option 2: Using FastMCP CLI
# Stdio transport (default)
fastmcp run redact_mcp.server:mcp
# HTTP transport for remote access
fastmcp run redact_mcp.server:mcp --transport http --port 8000
Installing in MCP Clients
Claude Desktop
Add to your Claude Desktop configuration file:
macOS: ~/Library/Application Support/Claude/claude_desktop_config.json
Windows: %APPDATA%\Claude\claude_desktop_config.json
{
"mcpServers": {
"pdf-redaction": {
"command": "uv",
"args": [
"--directory",
"/path/to/redact_mcp",
"run",
"fastmcp",
"run",
"redact_mcp.server:mcp"
]
}
}
}
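If the configuration file already lists other servers, the entry can be merged in rather than pasted by hand. The following is a minimal sketch, assuming the JSON layout shown above and that an existing file contains valid JSON. The helper name `add_server_entry` is hypothetical and not part of this project.

```python
import json
from pathlib import Path


def add_server_entry(config_path: Path, project_dir: str) -> dict:
    """Merge a "pdf-redaction" entry into an MCP client config file.

    Hypothetical helper: existing servers are kept, and only the
    "pdf-redaction" key is added or replaced.
    """
    # Start from the existing config if there is one, else an empty object.
    config = json.loads(config_path.read_text()) if config_path.exists() else {}
    servers = config.setdefault("mcpServers", {})
    servers["pdf-redaction"] = {
        "command": "uv",
        "args": ["--directory", project_dir, "run", "fastmcp", "run",
                 "redact_mcp.server:mcp"],
    }
    config_path.write_text(json.dumps(config, indent=2))
    return config
```

Call it with the configuration path for your platform and the directory where the repository was cloned, then restart the client so it picks up the change.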
Other MCP Clients
Use the FastMCP CLI to generate configuration for other clients:
# For Cursor
fastmcp install cursor redact_mcp.server:mcp
# For Gemini CLI
fastmcp install gemini-cli redact_mcp.server:mcp
# Generate generic MCP JSON configuration
fastmcp install mcp-json redact_mcp.server:mcp
Available Tools
1. load_pdf
Load a PDF file and extract its text content.
Parameters:
- pdf_path (string): Path to the PDF file to load
Returns: The full text content of the PDF, organized by pages
Example:
Load the PDF at /path/to/document.pdf
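Text "organized by pages" could be produced roughly as follows. This is a sketch, not the server's actual code: it assumes `doc` is an open PyMuPDF document (for example from `fitz.open(pdf_path)`) and relies on `page.get_text()`. The function name and the `--- Page N ---` header format are assumptions.

```python
def extract_text_by_page(doc) -> str:
    """Return the text of every page, each preceded by a page header."""
    sections = []
    # Iterating a PyMuPDF document yields its pages in order.
    for number, page in enumerate(doc, start=1):
        text = page.get_text().strip()
        sections.append(f"--- Page {number} ---\n{text}")
    return "\n\n".join(sections)
```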
2. redact_text
Redact all instances of the given texts in a loaded PDF. The tool accepts multiple texts per call for batch redaction and tracks which texts have already been redacted, so repeated requests are skipped.
Parameters:
- pdf_path (string): Path to the loaded PDF file
- texts_to_redact (list of strings): List of text strings to search for and redact
- fill_color (tuple, optional): RGB color (0-1 range) for redaction box. Default: (0, 0, 0) - black
Returns: Summary of redaction operations, including which texts were newly redacted and which were skipped (already redacted)
Examples:
# Single text
Redact ["confidential"] in /path/to/document.pdf
# Multiple texts at once (recommended for efficiency)
Redact ["John Doe", "123-45-6789", "john.doe@email.com"] in /path/to/document.pdf
Note: The tool tracks which texts have been redacted and will skip any texts that were already processed, preventing duplicate redactions.
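The batch-and-skip behavior can be sketched as below. This is an illustration under stated assumptions, not the server's implementation: `doc` is assumed to be an open PyMuPDF document, the function name and summary layout are hypothetical, and a text with zero matches is still recorded as processed (the README does not say how the real tool handles that case). `page.search_for()` and `page.add_redact_annot()` are PyMuPDF calls.

```python
def mark_texts_for_redaction(doc, texts, already_redacted, fill=(0, 0, 0)):
    """Add redaction annotations for each new text; skip texts seen before.

    `already_redacted` is a set that is updated in place.
    Returns a summary: hit counts per newly processed text, plus skipped texts.
    """
    summary = {"redacted": {}, "skipped": []}
    for text in texts:
        if text in already_redacted:
            summary["skipped"].append(text)
            continue
        hits = 0
        for page in doc:
            # search_for returns the rectangles where the text occurs.
            for rect in page.search_for(text):
                # Marks the area only; nothing is removed until the
                # redactions are applied.
                page.add_redact_annot(rect, fill=fill)
                hits += 1
        summary["redacted"][text] = hits
        already_redacted.add(text)
    return summary
```

Keeping one `already_redacted` set per loaded PDF is what lets a second call skip texts handled by the first.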
3. redact_area
Redact a specific rectangular area on a PDF page.
Parameters:
- pdf_path (string): Path to the loaded PDF file
- page_number (int): Page number (1-indexed)
- x0 (float): Left x coordinate
- y0 (float): Top y coordinate
- x1 (float): Right x coordinate
- y1 (float): Bottom y coordinate
- fill_color (tuple, optional): RGB color (0-1 range) for redaction box. Default: (0, 0, 0) - black
Returns: Confirmation message
Example:
Redact the area from (100, 100) to (300, 150) on page 1 of /path/to/document.pdf
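PyMuPDF measures page coordinates in points (1/72 inch) from the top-left corner, with y increasing downward, which is why y0 is the top edge and y1 the bottom. The sketch below shows one way the inputs could be checked before a redaction annotation is added. The function name and the validation rules are assumptions, not the server's documented behavior.

```python
def normalize_area(page_number, x0, y0, x1, y1, page_count):
    """Validate a 1-indexed page number and return (page_index, rect).

    rect is (left, top, right, bottom) in PDF points (1 point = 1/72 inch).
    """
    if not 1 <= page_number <= page_count:
        raise ValueError(f"page_number must be between 1 and {page_count}")
    # Order the corners so left <= right and top <= bottom.
    left, right = sorted((float(x0), float(x1)))
    top, bottom = sorted((float(y0), float(y1)))
    if left == right or top == bottom:
        raise ValueError("redaction area must have non-zero width and height")
    return page_number - 1, (left, top, right, bottom)
```

The returned tuple can be turned into a rectangle with `fitz.Rect(*rect)` and passed to `page.add_redact_annot()` on the page at `page_index`.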
4. save_redacted_pdf
Apply all pending redactions and save the PDF.
Parameters:
- pdf_path (string): Path to the loaded PDF file
- output_path (string, optional): Custom output path. If not provided, appends "_redacted" to original filename
Returns: Path to the saved redacted PDF
Example:
Save the redacted version of /path/to/document.pdf
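The default naming rule (append "_redacted" to the original filename, keeping the extension) can be expressed with `pathlib`. A small sketch; the function name is hypothetical.

```python
from pathlib import Path


def default_output_path(pdf_path: str) -> str:
    """Insert "_redacted" between the file stem and its extension."""
    source = Path(pdf_path)
    return str(source.with_name(f"{source.stem}_redacted{source.suffix}"))
```

For `/path/to/document.pdf` this yields a file named `document_redacted.pdf` in the same directory.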
5. list_loaded_pdfs
List all currently loaded PDF files.
Parameters: None
Returns: List of loaded PDF paths with page counts
6. list_applied_redactions
List the redactions that have been marked in the loaded PDF(s). Useful for tracking redaction progress and avoiding duplicate work.
Parameters:
- pdf_path (string, optional): Path to a specific PDF. If not provided, lists redactions for all loaded PDFs
Returns: List of texts that have been marked for redaction in each PDF
Examples:
# List redactions for a specific PDF
List applied redactions for /path/to/document.pdf
# List redactions for all loaded PDFs
List all applied redactions
Use Cases:
- Check what has already been redacted before adding more redactions
- Verify redaction progress during a multi-step process
- Avoid duplicate redaction attempts
- Generate a report of what was redacted
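For the reporting use case, the tracked texts could be rendered as plain text along these lines. This is a sketch: it assumes tracking is a mapping from each loaded PDF path to the list of texts marked so far, and the function name and output format are hypothetical.

```python
def format_redaction_report(tracked, pdf_path=None):
    """Render tracked redactions as text, for one PDF or for all of them.

    `tracked` maps each loaded PDF path to the list of texts marked so far.
    """
    if pdf_path is not None:
        if pdf_path not in tracked:
            raise KeyError(f"PDF not loaded: {pdf_path}")
        selected = {pdf_path: tracked[pdf_path]}
    else:
        selected = tracked
    lines = []
    for path, texts in selected.items():
        lines.append(f"{path}: {len(texts)} text(s) marked for redaction")
        lines.extend(f"  - {text}" for text in texts)
    return "\n".join(lines)
```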
7. close_pdf
Close a loaded PDF and free its resources. This also clears the redaction tracking for that PDF.
Parameters:
- pdf_path (string): Path to the PDF file to close
Returns: Confirmation message
Workflow Example
Here's a typical workflow using this MCP server:
1. Load a PDF
   Load the PDF at /Users/me/documents/sensitive.pdf
2. Review the content
   The tool will return the full text content, which you can review to identify sensitive information.
3. Redact sensitive text (batch mode - recommended)
   Redact ["Social Security Number", "123-45-6789", "John Doe", "jane.smith@email.com"] in /Users/me/documents/sensitive.pdf
   Pro tip: Redacting multiple texts at once is much faster than calling the tool multiple times.
4. Check what has been redacted (optional)
   List applied redactions for /Users/me/documents/sensitive.pdf
   This shows you which texts have already been marked for redaction.
5. Add more redactions if needed
   Redact ["Additional Text", "Another Secret"] in /Users/me/documents/sensitive.pdf
   The tool will skip any texts that were already redacted in step 3.
6. Redact specific areas (optional)
   Redact the area from (50, 100) to (200, 120) on page 2 of /Users/me/documents/sensitive.pdf
7. Save the redacted PDF
   Save the redacted version of /Users/me/documents/sensitive.pdf
   This will create /Users/me/documents/sensitive_redacted.pdf
8. Close the PDF (optional)
   Close /Users/me/documents/sensitive.pdf
Technical Details
Performance Tips
Batch Redaction is Faster:
# ❌ Slower: Multiple individual calls
Redact ["John Doe"] in document.pdf
Redact ["123-45-6789"] in document.pdf
Redact ["jane@email.com"] in document.pdf
# ✅ Faster: Single batch call
Redact ["John Doe", "123-45-6789", "jane@email.com"] in document.pdf
Why batch redaction is better:
- Reduces tool invocation overhead
- Scans the PDF only once
- Marks all redactions in a single call (they are applied when save_redacted_pdf runs)
- Automatically prevents duplicate redactions
- Provides a single summary of all operations
Best Practice: Collect all texts to redact first, then make one batch call.
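One way to collect the texts first is to scan the output of load_pdf with a few patterns and pass the unique matches to a single redact_text call. The patterns and the function name below are illustrative assumptions, and automated matches should still be reviewed by a person.

```python
import re

# Illustrative patterns only; real documents may need stricter ones.
PATTERNS = [
    re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),         # US SSN-like numbers
    re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+"),  # e-mail addresses
]


def collect_sensitive_texts(page_text: str) -> list[str]:
    """Return unique matches, grouped by pattern in PATTERNS order."""
    found = {}
    for pattern in PATTERNS:
        for match in pattern.finditer(page_text):
            # dict keys keep insertion order and drop duplicates.
            found.setdefault(match.group(), None)
    return list(found)
```

The resulting list can be used directly as the texts_to_redact argument.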
Dependencies
- FastMCP (>=2.12.0): Python framework for building MCP servers
- PyMuPDF (>=1.24.0): PDF manipulation library (imported as fitz)
Architecture
- In-memory storage: Loaded PDFs are kept in memory for fast access during redaction operations
- Redaction tracking: The server tracks which texts have been redacted to prevent duplicate work
- Batch processing: Multiple texts can be redacted in a single tool call for improved performance
- Lazy application: Redaction annotations are added but not applied until save_redacted_pdf is called
- Error handling: Uses FastMCP's ToolError for proper error propagation to MCP clients
- Context logging: All operations log to the MCP context for transparency
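Lazy application means the annotations only mark regions, and the underlying content is removed when the redactions are applied at save time. A sketch of that final step, assuming `doc` is an open PyMuPDF document. The function name is hypothetical, and the real server may add error handling and logging around these calls.

```python
def apply_and_save(doc, output_path):
    """Apply pending redaction annotations on every page, then save.

    Returns the number of pages processed.
    """
    pages = 0
    for page in doc:
        # Removes the content under each redaction annotation on this page.
        page.apply_redactions()
        pages += 1
    doc.save(output_path)
    return pages
```

Until this step runs, the original text is still present in the in-memory document, so closing a PDF without saving leaves the source file unchanged.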
Limitations (Current Version)
- Text-only redaction: This version focuses on text redaction. Image redaction is not yet implemented.
- Memory usage: PDFs are kept in memory while loaded. Very large PDFs may consume significant memory.
- Single session: The in-memory store is not persistent across server restarts.
Development
Running Tests
# Install development dependencies
uv pip install -e ".[dev]"
# Run tests (when implemented)
pytest
Code Structure
redact_mcp/
├── src/
│ └── redact_mcp/
│ ├── __init__.py # Package initialization
│ └── server.py # Main MCP server implementation
├── pyproject.toml # Package configuration
└── README.md # This file
License
Apache-2.0
Contributing
Contributions are welcome! Please feel free to submit issues or pull requests.
Acknowledgments