MCP Screenshot Server
Enterprise-grade screenshot capture server for AI agents with multi-format support, PII masking, multi-monitor support, and security controls for capturing full screens, specific windows, or custom regions across Linux, macOS, and Windows.
📸 MCP Screenshot Server
Give AI agents visual superpowers to see, analyze, and document your applications like senior UX designers.
This enterprise-grade MCP server transforms AI from code-only assistants into visual experts capable of UI analysis, accessibility auditing, documentation generation, and responsive design testing.
🔗 Repository
This package is now maintained in its own repository: https://github.com/Digital-Defiance/mcp-screenshot
This repository is part of the AI Capabilities Suite on GitHub.
🤔 Why Do AI Agents Need Visual Capabilities?
AI agents today are powerful but visually blind:
- ❌ Can read HTML/CSS but can't see actual layouts
- ❌ Can only suggest UI improvements without seeing the real user experience
- ❌ Can't detect accessibility issues like poor contrast or spacing
- ❌ Can't create visual documentation or bug reports
- ❌ Can't analyze responsive design across different screen sizes
Result: You're stuck manually creating screenshots, documentation, and visual analysis that AI should handle.
🎯 Revolutionary Use Cases
📚 "AI, create professional documentation"
You: "Document the checkout process with screenshots"
AI: *Captures each step, analyzes UI flow*
AI: "Generated complete user guide with 6 annotated screenshots, highlighting key interactions and error states..."
🔍 "AI, audit this page for accessibility"
You: "Check if this form meets WCAG guidelines"
AI: *Captures form, analyzes visual elements*
AI: "Found 4 accessibility issues: contrast ratio 2.1:1 (needs 4.5:1), missing focus indicators, touch targets too small..."
🐛 "AI, create a detailed bug report"
You: "The mobile navigation is broken"
AI: *Captures different viewport sizes*
AI: "Created bug report with before/after screenshots, exact breakpoint where it fails, and CSS fix suggestions..."
🎨 "AI, compare these design variations"
You: "Which hero section design works better?"
AI: *Captures both versions, analyzes visual hierarchy*
AI: "Version B has 28% better visual flow—CTA more prominent, text hierarchy clearer, better use of whitespace..."
📱 "AI, test responsive design"
You: "How does this look on different screen sizes?"
AI: *Captures multiple viewport sizes*
AI: "Layout breaks at 768px—sidebar overlaps content. Here's the media query fix with visual proof..."
✨ What This Changes
Before: AI worked blind, relying on code descriptions
- ❌ "The button looks wrong" → AI guesses the issue
- ❌ "Create documentation" → AI writes generic text
- ❌ "Check accessibility" → AI only reviews code
- ❌ "Test responsive design" → AI can't see actual breakpoints
After: AI sees and analyzes your actual user interface
- ✅ Visual debugging - AI identifies exact pixel-level issues
- ✅ Smart documentation - AI creates guides with real screenshots and annotations
- ✅ Accessibility audits - AI measures actual contrast ratios and spacing
- ✅ Responsive testing - AI captures and compares different screen sizes
- ✅ Design analysis - AI evaluates visual hierarchy and user experience
- ✅ Professional reports - AI creates detailed visual evidence for bugs and improvements
🚀 Features
- Multi-format Support: PNG, JPEG, WebP, BMP with configurable quality
- Flexible Capture: Full screen, specific windows, or custom regions
- Privacy Protection: PII masking with OCR-based detection for emails, phone numbers, and credit cards
- Security Controls: Path validation, rate limiting, audit logging, and configurable policies
- Cross-platform: Linux (X11/Wayland), macOS, Windows with native APIs
- Multi-monitor Support: Capture from specific displays in multi-monitor setups
- Enterprise Security: Window exclusion, audit logging, rate limiting
- AI-Optimized: Structured responses perfect for AI agent workflows
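As a rough illustration of the Privacy Protection feature, the sketch below scans OCR output for emails and card-like numbers. It is a minimal sketch under stated assumptions: `findPii`, `passesLuhn`, and both regular expressions are hypothetical and are not taken from the server's source. Phone numbers are left out for brevity.

```typescript
// Hypothetical helpers: the server's real detection rules are not documented here.
type PiiKind = "email" | "creditCard";

interface PiiMatch {
  kind: PiiKind;
  text: string;
  index: number;
}

// Illustrative patterns only (assumptions); deliberately simple, not exhaustive.
const EMAIL_PATTERN = /[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}/;
const CARD_CANDIDATE_PATTERN = /\b(?:\d[ -]?){13,19}\b/;

/** Collect every match of a pattern, using a fresh global copy of it. */
function allMatches(pattern: RegExp, text: string): RegExpExecArray[] {
  const re = new RegExp(pattern.source, "g");
  const out: RegExpExecArray[] = [];
  let m: RegExpExecArray | null;
  while ((m = re.exec(text)) !== null) {
    out.push(m);
    if (m[0].length === 0) re.lastIndex++; // guard against empty matches
  }
  return out;
}

/** Luhn checksum: discards digit runs that cannot be card numbers. */
function passesLuhn(digits: string): boolean {
  let sum = 0;
  let doubleNext = false;
  for (let i = digits.length - 1; i >= 0; i--) {
    let d = digits.charCodeAt(i) - 48;
    if (doubleNext) {
      d *= 2;
      if (d > 9) d -= 9;
    }
    sum += d;
    doubleNext = !doubleNext;
  }
  return digits.length > 0 && sum % 10 === 0;
}

/** Find email addresses and Luhn-valid card numbers in OCR text. */
function findPii(ocrText: string): PiiMatch[] {
  const matches: PiiMatch[] = [];
  for (const m of allMatches(EMAIL_PATTERN, ocrText)) {
    matches.push({ kind: "email", text: m[0], index: m.index });
  }
  for (const m of allMatches(CARD_CANDIDATE_PATTERN, ocrText)) {
    const digits = m[0].replace(/\D/g, "");
    if (passesLuhn(digits)) {
      matches.push({ kind: "creditCard", text: m[0].trim(), index: m.index });
    }
  }
  return matches;
}
```

A real masking step would also need the bounding box of each OCR word so that the matching pixels can be blurred or filled; that part depends on the OCR engine and is not shown.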
Installation
NPM Installation
npm install @ai-capabilities-suite/mcp-screenshot
System Requirements
Linux:
- X11: imagemagick package (provides the import command)
- Wayland: grim package
# Ubuntu/Debian
sudo apt-get install imagemagick grim
# Fedora
sudo dnf install ImageMagick grim
# Arch
sudo pacman -S imagemagick grim
macOS:
- Built-in screencapture command (no additional dependencies)
- Screen Recording permission required (System Preferences > Security & Privacy > Privacy > Screen Recording)
Windows:
- No additional dependencies required
MCP Configuration
Add to your MCP settings file (e.g., ~/.kiro/settings/mcp.json or .kiro/settings/mcp.json):
{
  "mcpServers": {
    "screenshot": {
      "command": "node",
      "args": ["/path/to/mcp-screenshot/dist/cli.js"],
      "env": {
        "SCREENSHOT_ALLOWED_DIRS": "/home/user/screenshots,/tmp",
        "SCREENSHOT_MAX_CAPTURES_PER_MIN": "60",
        "SCREENSHOT_ENABLE_AUDIT_LOG": "true"
      }
    }
  }
}
🛠️ 5 Professional MCP Tools
Purpose-built for AI agents to capture, analyze, and work with visual information, the server exposes five MCP tools:
1. screenshot_capture_full
Capture full screen or specific display.
Parameters:
- display (string, optional): Display ID to capture (defaults to primary display)
- format (string, optional): Image format: png, jpeg, webp, or bmp (default: png)
- quality (number, optional): Compression quality 1-100 for lossy formats (default: 90)
- savePath (string, optional): File path to save screenshot (returns base64 if not provided)
- enablePIIMasking (boolean, optional): Enable PII detection and masking (default: false)
Example:
{
  "name": "screenshot_capture_full",
  "arguments": {
    "format": "png",
    "savePath": "/home/user/screenshots/desktop.png",
    "enablePIIMasking": true
  }
}
Response:
{
  "status": "success",
  "filePath": "/home/user/screenshots/desktop.png",
  "metadata": {
    "width": 1920,
    "height": 1080,
    "format": "png",
    "fileSize": 245678,
    "timestamp": "2024-12-01T10:30:00.000Z",
    "display": {
      "id": "0",
      "name": "Primary Display",
      "resolution": { "width": 1920, "height": 1080 },
      "position": { "x": 0, "y": 0 },
      "isPrimary": true
    },
    "piiMasking": {
      "emailsRedacted": 2,
      "phonesRedacted": 1,
      "creditCardsRedacted": 0,
      "customPatternsRedacted": 0
    }
  }
}
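A client can check arguments before sending a screenshot_capture_full call. The following is a minimal sketch, assuming the parameter list above is complete; `buildCaptureFullCall` and the type names are hypothetical helpers, not part of the package's API, and the server performs its own validation regardless.

```typescript
type ImageFormat = "png" | "jpeg" | "webp" | "bmp";

// Mirrors the documented parameters of screenshot_capture_full.
interface CaptureFullArgs {
  display?: string;
  format?: ImageFormat;
  quality?: number;
  savePath?: string;
  enablePIIMasking?: boolean;
}

interface CaptureFullCall {
  name: "screenshot_capture_full";
  arguments: CaptureFullArgs;
}

const SUPPORTED_FORMATS: readonly string[] = ["png", "jpeg", "webp", "bmp"];

/** Hypothetical client-side builder that rejects obviously invalid arguments. */
function buildCaptureFullCall(args: CaptureFullArgs): CaptureFullCall {
  // Runtime check, useful when args come from untyped JSON.
  const format: string | undefined = args.format;
  if (format !== undefined && SUPPORTED_FORMATS.indexOf(format) === -1) {
    throw new Error("UNSUPPORTED_FORMAT: " + format);
  }
  const quality = args.quality;
  if (quality !== undefined && (!Number.isFinite(quality) || quality < 1 || quality > 100)) {
    throw new Error("quality must be between 1 and 100, got " + quality);
  }
  return { name: "screenshot_capture_full", arguments: { ...args } };
}

// Usage: reproduces the example request shown above.
const exampleCall = buildCaptureFullCall({
  format: "png",
  savePath: "/home/user/screenshots/desktop.png",
  enablePIIMasking: true,
});
```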
2. screenshot_capture_window
Capture specific application window by ID or title pattern.
Parameters:
- windowId (string, optional): Window identifier (use windowId or windowTitle)
- windowTitle (string, optional): Window title pattern to match (use windowId or windowTitle)
- includeFrame (boolean, optional): Include window frame and title bar (default: false)
- format (string, optional): Image format (default: png)
- quality (number, optional): Compression quality 1-100 (default: 90)
- savePath (string, optional): File path to save screenshot
Example:
{
  "name": "screenshot_capture_window",
  "arguments": {
    "windowTitle": "Chrome",
    "includeFrame": false,
    "format": "jpeg",
    "quality": 85
  }
}
Response:
{
  "status": "success",
  "data": "/9j/4AAQSkZJRgABAQ...",
  "mimeType": "image/jpeg",
  "metadata": {
    "width": 1280,
    "height": 720,
    "format": "jpeg",
    "fileSize": 89234,
    "timestamp": "2024-12-01T10:31:00.000Z",
    "window": {
      "id": "12345",
      "title": "Google Chrome",
      "processName": "chrome",
      "pid": 5678,
      "bounds": { "x": 100, "y": 100, "width": 1280, "height": 720 }
    }
  }
}
3. screenshot_capture_region
Capture specific rectangular region of the screen.
Parameters:
- x (number, required): X coordinate of top-left corner
- y (number, required): Y coordinate of top-left corner
- width (number, required): Width of region in pixels
- height (number, required): Height of region in pixels
- format (string, optional): Image format (default: png)
- quality (number, optional): Compression quality 1-100 (default: 90)
- savePath (string, optional): File path to save screenshot
Example:
{
  "name": "screenshot_capture_region",
  "arguments": {
    "x": 100,
    "y": 100,
    "width": 800,
    "height": 600,
    "format": "png"
  }
}
Response:
{
  "status": "success",
  "data": "iVBORw0KGgoAAAANSUhEUgAA...",
  "mimeType": "image/png",
  "metadata": {
    "width": 800,
    "height": 600,
    "format": "png",
    "fileSize": 123456,
    "timestamp": "2024-12-01T10:32:00.000Z",
    "region": {
      "x": 100,
      "y": 100,
      "width": 800,
      "height": 600
    }
  }
}
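The INVALID_REGION remediation in the Error Codes table asks for non-negative coordinates and positive dimensions. A client-side pre-check along those lines might look like the sketch below; `validateRegion` is a hypothetical helper, and the server may apply additional rules (for example, screen bounds) that are not documented here.

```typescript
interface Region {
  x: number;
  y: number;
  width: number;
  height: number;
}

/** Returns a list of problems; an empty list means the region looks valid. */
function validateRegion(r: Region): string[] {
  const problems: string[] = [];
  const fields: Array<[string, number]> = [
    ["x", r.x],
    ["y", r.y],
    ["width", r.width],
    ["height", r.height],
  ];
  for (const [name, value] of fields) {
    if (!Number.isFinite(value)) problems.push(name + " must be a finite number");
  }
  if (problems.length > 0) return problems;
  // Rules taken from the INVALID_REGION remediation text.
  if (r.x < 0 || r.y < 0) problems.push("coordinates must be non-negative");
  if (r.width <= 0 || r.height <= 0) problems.push("dimensions must be positive");
  return problems;
}
```

Returning a list of problems rather than throwing lets an agent report every issue in one message before retrying.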
4. screenshot_list_displays
List all connected displays with resolution and position information.
Parameters: None
Example:
{
  "name": "screenshot_list_displays",
  "arguments": {}
}
Response:
{
  "status": "success",
  "displays": [
    {
      "id": "0",
      "name": "Primary Display",
      "resolution": { "width": 1920, "height": 1080 },
      "position": { "x": 0, "y": 0 },
      "isPrimary": true
    },
    {
      "id": "1",
      "name": "Secondary Display",
      "resolution": { "width": 1920, "height": 1080 },
      "position": { "x": 1920, "y": 0 },
      "isPrimary": false
    }
  ]
}
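The display list can be used to work out which monitor a point falls on. The sketch below assumes that `position` is the top-left corner of each display in one shared desktop coordinate space, which the example response suggests but does not state; `displayAt` and `toDisplayLocal` are hypothetical helpers.

```typescript
interface DisplayInfo {
  id: string;
  name: string;
  resolution: { width: number; height: number };
  position: { x: number; y: number };
  isPrimary: boolean;
}

/** Find the display whose rectangle contains the desktop point (x, y). */
function displayAt(displays: DisplayInfo[], x: number, y: number): DisplayInfo | undefined {
  return displays.find(
    (d) =>
      x >= d.position.x &&
      x < d.position.x + d.resolution.width &&
      y >= d.position.y &&
      y < d.position.y + d.resolution.height
  );
}

/** Convert a desktop point to coordinates relative to one display. */
function toDisplayLocal(d: DisplayInfo, x: number, y: number): { x: number; y: number } {
  return { x: x - d.position.x, y: y - d.position.y };
}

// The two displays from the example response above.
const exampleDisplays: DisplayInfo[] = [
  { id: "0", name: "Primary Display", resolution: { width: 1920, height: 1080 }, position: { x: 0, y: 0 }, isPrimary: true },
  { id: "1", name: "Secondary Display", resolution: { width: 1920, height: 1080 }, position: { x: 1920, y: 0 }, isPrimary: false },
];

// A point 2000 px from the left edge lands on the secondary display.
const hitDisplay = displayAt(exampleDisplays, 2000, 100);
```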
5. screenshot_list_windows
List all visible windows with title, process, and position information.
Parameters: None
Example:
{
  "name": "screenshot_list_windows",
  "arguments": {}
}
Response:
{
  "status": "success",
  "windows": [
    {
      "id": "12345",
      "title": "Google Chrome",
      "processName": "chrome",
      "pid": 5678,
      "bounds": { "x": 100, "y": 100, "width": 1280, "height": 720 },
      "isMinimized": false
    },
    {
      "id": "67890",
      "title": "Terminal",
      "processName": "gnome-terminal",
      "pid": 9012,
      "bounds": { "x": 200, "y": 200, "width": 800, "height": 600 },
      "isMinimized": false
    }
  ]
}
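A common follow-up is to pick one entry from this list and feed its ID to screenshot_capture_window. The sketch below shows one way to do that. `pickWindow` and `toCaptureWindowCall` are hypothetical helpers, and skipping minimized windows is an assumption (the README does not say whether minimized windows can be captured).

```typescript
interface WindowInfo {
  id: string;
  title: string;
  processName: string;
  pid: number;
  bounds: { x: number; y: number; width: number; height: number };
  isMinimized: boolean;
}

interface CaptureWindowCall {
  name: "screenshot_capture_window";
  arguments: { windowId: string; includeFrame: boolean };
}

/** First non-minimized window whose title matches the pattern. */
function pickWindow(windows: WindowInfo[], title: RegExp): WindowInfo | undefined {
  // String.search ignores lastIndex, so a /g pattern is safe to reuse here.
  return windows.find((w) => !w.isMinimized && w.title.search(title) !== -1);
}

/** Build the tool call for the chosen window, addressing it by ID. */
function toCaptureWindowCall(w: WindowInfo, includeFrame = false): CaptureWindowCall {
  return { name: "screenshot_capture_window", arguments: { windowId: w.id, includeFrame } };
}
```

Addressing the window by ID rather than by title avoids capturing the wrong window when several titles match the same pattern.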
Security Configuration
The server enforces security policies to control screenshot operations. Configure via environment variables or security policy file.
Environment Variables
- SCREENSHOT_ALLOWED_DIRS: Comma-separated list of allowed directories for saving screenshots
- SCREENSHOT_MAX_CAPTURES_PER_MIN: Maximum captures per minute (default: 60)
- SCREENSHOT_ENABLE_AUDIT_LOG: Enable audit logging (default: true)
- SCREENSHOT_BLOCKED_WINDOWS: Comma-separated list of window title patterns to exclude
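To make the mapping concrete, here is a minimal sketch of how these variables could be turned into a policy object with the same field names as the security policy file. `policyFromEnv` is a hypothetical helper; how the server actually parses these values (for example, which strings count as true, or what an unset SCREENSHOT_ALLOWED_DIRS means) is an assumption.

```typescript
interface SecurityPolicy {
  allowedDirectories: string[];
  blockedWindowPatterns: string[];
  maxCapturesPerMinute: number;
  enableAuditLog: boolean;
}

type Env = Record<string, string | undefined>;

/** Split a comma-separated value, trimming whitespace and dropping empty items. */
function splitList(raw: string | undefined): string[] {
  if (!raw) return [];
  // Caveat: a regex pattern that itself contains a comma would be split here.
  return raw
    .split(",")
    .map((s) => s.trim())
    .filter((s) => s.length > 0);
}

/** Hypothetical env-to-policy mapping using the documented defaults (60, true). */
function policyFromEnv(env: Env): SecurityPolicy {
  const max = Number.parseInt(env["SCREENSHOT_MAX_CAPTURES_PER_MIN"] ?? "", 10);
  const audit = env["SCREENSHOT_ENABLE_AUDIT_LOG"];
  return {
    allowedDirectories: splitList(env["SCREENSHOT_ALLOWED_DIRS"]),
    blockedWindowPatterns: splitList(env["SCREENSHOT_BLOCKED_WINDOWS"]),
    maxCapturesPerMinute: Number.isFinite(max) && max > 0 ? max : 60,
    enableAuditLog: audit === undefined ? true : audit.trim().toLowerCase() === "true",
  };
}
```

In a Node.js process this would typically be called as `policyFromEnv(process.env)`.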
Security Policy File
Create a security-policy.json file:
{
  "allowedDirectories": [
    "/home/user/screenshots",
    "/tmp/screenshots"
  ],
  "blockedWindowPatterns": [
    ".*Password.*",
    ".*1Password.*",
    ".*LastPass.*",
    ".*Bitwarden.*",
    ".*Authentication.*"
  ],
  "maxCapturesPerMinute": 60,
  "enableAuditLog": true
}
Load the policy when starting the server:
import { MCPScreenshotServer } from '@ai-capabilities-suite/mcp-screenshot';
import * as fs from 'fs';
const policy = JSON.parse(fs.readFileSync('security-policy.json', 'utf-8'));
const server = new MCPScreenshotServer(policy);
await server.start();
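The blockedWindowPatterns entries in the policy look like regular expressions. The sketch below shows how such patterns could be evaluated against a window title. It assumes JavaScript regex syntax, case-sensitive matching, and a match anywhere in the title; none of this is confirmed by the README, and `compilePatterns` and `isWindowBlocked` are hypothetical helpers.

```typescript
/** Compile pattern strings once; an invalid pattern throws at startup, not at capture time. */
function compilePatterns(patterns: string[]): RegExp[] {
  return patterns.map((p) => new RegExp(p));
}

/** True if any compiled pattern matches the window title. */
function isWindowBlocked(title: string, compiled: RegExp[]): boolean {
  return compiled.some((re) => re.test(title));
}

// Patterns from the example policy file above.
const blockedPatterns = compilePatterns([
  ".*Password.*",
  ".*1Password.*",
  ".*LastPass.*",
  ".*Bitwarden.*",
  ".*Authentication.*",
]);
```

Under these assumptions the leading and trailing `.*` are redundant, and `.*1Password.*` is already covered by `.*Password.*`. A title such as "reset password" in lowercase would not be blocked unless the patterns are made case-insensitive.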
Error Handling
All tools return structured error responses with error codes and remediation suggestions.
Error Codes
| Code | Description | Remediation |
|---|---|---|
| PERMISSION_DENIED | Insufficient permissions to capture | Grant Screen Recording permission (macOS) or check user permissions |
| INVALID_PATH | File path outside allowed directories | Use a path within configured allowed directories |
| WINDOW_NOT_FOUND | Specified window does not exist | Use screenshot_list_windows to find available windows |
| DISPLAY_NOT_FOUND | Specified display does not exist | Use screenshot_list_displays to find available displays |
| UNSUPPORTED_FORMAT | Requested format not supported | Use png, jpeg, webp, or bmp |
| CAPTURE_FAILED | Screenshot capture failed | Check permissions and try again |
| RATE_LIMIT_EXCEEDED | Too many captures in time window | Wait before making additional requests |
| INVALID_REGION | Invalid region coordinates or dimensions | Ensure coordinates are non-negative and dimensions are positive |
| OUT_OF_MEMORY | Insufficient memory for operation | Reduce capture size or close other applications |
| ENCODING_FAILED | Image encoding failed | Try different format or reduce quality |
| FILE_SYSTEM_ERROR | File system operation failed | Check permissions and disk space |
Error Response Format
{
  "status": "error",
  "error": {
    "code": "WINDOW_NOT_FOUND",
    "message": "Window with ID '12345' not found",
    "details": {
      "windowId": "12345"
    },
    "remediation": "Verify the window exists and is visible. Use screenshot_list_windows to see available windows."
  }
}
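On the client side, the `status` field can be used to tell errors from successes before reading any other field. The sketch below models the error shape shown above. The type names and helpers are hypothetical, and treating RATE_LIMIT_EXCEEDED and CAPTURE_FAILED as retryable is an interpretation of their remediation text, not documented behavior.

```typescript
interface ToolErrorResponse {
  status: "error";
  error: {
    code: string;
    message: string;
    details?: Record<string, unknown>;
    remediation?: string;
  };
}

interface ToolSuccessResponse {
  status: "success";
  [field: string]: unknown;
}

type ToolResponse = ToolErrorResponse | ToolSuccessResponse;

/** Type guard that narrows a parsed response to the error shape. */
function isToolError(r: ToolResponse): r is ToolErrorResponse {
  return r.status === "error";
}

// Assumption: these codes describe transient conditions worth retrying.
const RETRYABLE_CODES: readonly string[] = ["RATE_LIMIT_EXCEEDED", "CAPTURE_FAILED"];

function isRetryable(code: string): boolean {
  return RETRYABLE_CODES.indexOf(code) !== -1;
}

/** One-line summary suitable for logs or for an agent's reply to the user. */
function describeError(r: ToolErrorResponse): string {
  const hint = r.error.remediation ? " Suggested fix: " + r.error.remediation : "";
  return "[" + r.error.code + "] " + r.error.message + "." + hint;
}
```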
Troubleshooting
Linux Issues
Problem: import: command not found or grim: command not found
Solution: Install required packages:
# X11
sudo apt-get install imagemagick
# Wayland
sudo apt-get install grim
Problem: Black screen or empty captures
Solution: Check display server environment variables:
echo $DISPLAY # Should show :0 or similar for X11
echo $WAYLAND_DISPLAY # Should show wayland-0 or similar for Wayland
macOS Issues
Problem: PERMISSION_DENIED error
Solution: Grant Screen Recording permission:
- Open System Preferences > Security & Privacy > Privacy
- Select "Screen Recording" from the list
- Add your terminal application or Node.js to the allowed list
- Restart the application
Problem: Retina display captures are double resolution
Solution: This is expected behavior. Retina displays have 2x pixel density. Use the width and height from metadata to determine actual dimensions.
Windows Issues
Problem: Capture fails with access denied
Solution: Run the application with administrator privileges or check Windows Defender settings.
Problem: Multi-monitor captures show wrong display
Solution: Use screenshot_list_displays to get correct display IDs and positions.
General Issues
Problem: RATE_LIMIT_EXCEEDED error
Solution: The server limits captures to prevent abuse. Wait 60 seconds or adjust maxCapturesPerMinute in security policy.
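For intuition about why waiting helps, the sketch below implements a sliding-window limiter with a 60-second window. This is an illustrative technique only: the README does not say whether the server uses a sliding window, a fixed window, or something else, and `SlidingWindowLimiter` is a hypothetical class.

```typescript
/** Allows at most maxPerWindow events in any trailing window of windowMs. */
class SlidingWindowLimiter {
  private readonly maxPerWindow: number;
  private readonly windowMs: number;
  private readonly timestamps: number[] = [];

  constructor(maxPerWindow: number, windowMs: number = 60000) {
    this.maxPerWindow = maxPerWindow;
    this.windowMs = windowMs;
  }

  /** Drop recorded events that have aged out of the window. */
  private prune(nowMs: number): void {
    const cutoff = nowMs - this.windowMs;
    for (;;) {
      const oldest = this.timestamps[0];
      if (oldest === undefined || oldest > cutoff) break;
      this.timestamps.shift();
    }
  }

  /** Records the event and returns true if it fits in the window. */
  tryAcquire(nowMs: number): boolean {
    this.prune(nowMs);
    if (this.timestamps.length >= this.maxPerWindow) return false;
    this.timestamps.push(nowMs);
    return true;
  }

  /** Milliseconds until the next event would be accepted (0 if one fits now). */
  retryAfterMs(nowMs: number): number {
    this.prune(nowMs);
    if (this.timestamps.length < this.maxPerWindow) return 0;
    const oldest = this.timestamps[0];
    return oldest === undefined ? 0 : Math.max(0, oldest + this.windowMs - nowMs);
  }
}
```

The clock is passed in as an argument so the behavior can be tested deterministically; production code would pass `Date.now()`.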
Problem: INVALID_PATH error when saving
Solution: Ensure the save path is within allowed directories configured in security policy.
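The check behind INVALID_PATH is presumably a containment test against the allowed directories. The sketch below illustrates the idea for absolute POSIX paths only. `normalizePosix` and `isPathAllowed` are hypothetical helpers, not the server's implementation; the sketch does not resolve symlinks, and production code on Node.js would more likely rely on `path.resolve` and `fs.realpath`.

```typescript
/** Collapse ".", ".." and repeated slashes in an absolute POSIX path. */
function normalizePosix(p: string): string {
  if (!p.startsWith("/")) throw new Error("expected an absolute path: " + p);
  const out: string[] = [];
  for (const segment of p.split("/")) {
    if (segment === "" || segment === ".") continue;
    if (segment === "..") {
      out.pop(); // popping at the root is a no-op, as on a real filesystem
      continue;
    }
    out.push(segment);
  }
  return "/" + out.join("/");
}

/** True if savePath resolves to a location strictly inside one of the allowed directories. */
function isPathAllowed(savePath: string, allowedDirs: string[]): boolean {
  const target = normalizePosix(savePath);
  return allowedDirs.some((dir) => {
    const base = normalizePosix(dir);
    // Compare against "base/" so that /tmp does not also allow /tmp2.
    const prefix = base === "/" ? "/" : base + "/";
    return target.startsWith(prefix) && target.length > prefix.length;
  });
}
```

Normalizing before comparing matters: a path such as /home/user/screenshots/../.ssh/key starts with an allowed prefix as a raw string but resolves outside it.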
Problem: PII masking not working
Solution:
- Ensure tesseract.js is properly installed
- Check that the eng.traineddata language file is available
- PII masking requires OCR which may be slow on large images
Problem: Large file sizes
Solution:
- Use JPEG format with lower quality (60-80) for smaller files
- Use WebP format for best compression
- Reduce capture region size if possible
Problem: Out of memory errors
Solution:
- Capture smaller regions instead of full screen
- Reduce quality settings
- Close other applications to free memory
- Use streaming for very large captures
Programmatic Usage
TypeScript/JavaScript
import { MCPScreenshotServer } from '@ai-capabilities-suite/mcp-screenshot';

// Create server with custom security policy
const server = new MCPScreenshotServer({
  allowedDirectories: ['/home/user/screenshots'],
  maxCapturesPerMinute: 30,
  enableAuditLog: true,
  blockedWindowPatterns: ['.*Password.*']
});

// Start server
await server.start();

// Server will handle MCP protocol requests via stdio
// Keep process running
process.on('SIGINT', async () => {
  await server.stop();
  process.exit(0);
});
Direct Capture Engine Usage
import { createCaptureEngine } from '@ai-capabilities-suite/mcp-screenshot';

// Create platform-specific capture engine
const engine = createCaptureEngine();

// Capture full screen
const fullScreen = await engine.captureScreen();

// List and capture windows
const windows = await engine.getWindows();
const window = windows.find(w => w.title.includes('Chrome'));
if (window) {
  const buffer = await engine.captureWindow(window.id, false);
}

// Capture region
const region = await engine.captureRegion(100, 100, 800, 600);

// List displays
const displays = await engine.getDisplays();
console.log(`Found ${displays.length} displays`);
Development
This package is part of the AI Capabilities Suite monorepo.
Build
npm run build
Test
# Run all tests
npm test
# Run specific test suites
npm test -- capture
npm test -- security
npm test -- property
# Run with coverage
npm test -- --coverage
Project Structure
packages/mcp-screenshot/
├── src/
│ ├── capture/ # Platform-specific capture engines
│ ├── processing/ # Image processing and encoding
│ ├── privacy/ # PII detection and masking
│ ├── security/ # Security policy enforcement
│ ├── storage/ # File operations
│ ├── tools/ # MCP tool implementations
│ ├── interfaces/ # TypeScript interfaces
│ ├── types/ # Type definitions
│ ├── errors/ # Error classes
│ ├── server.ts # MCP server implementation
│ └── cli.ts # CLI entry point
├── README.md
├── TESTING.md
└── package.json
Contributing
Contributions are welcome! Please ensure:
- All tests pass (npm test)
- Code follows TypeScript best practices
- New features include tests and documentation
- Security considerations are addressed
License
MIT
Support
For issues and questions:
- GitHub Issues: Create an issue
- Documentation: See TESTING.md for testing guide
- Security: Report security issues privately to security@example.com