MCP Scrcpy Vision
Provides AI agents with real-time vision and control over Android devices through screen streaming, UI automation, and fast input control via scrcpy protocol.
mcp-scrcpy-vision
An MCP server that gives AI agents complete vision and control over Android devices.
Features:
- Real-time Vision: Continuous screen streaming via scrcpy H.264 + ffmpeg
- Fast Input Control: While streaming, input uses the scrcpy control protocol (~5-10ms latency, versus ~100-300ms with adb shell)
- UI Automation: Element detection via uiautomator with tap coordinates
- Full Input Control: Tap, swipe, long press, pinch, drag-drop, text, keycodes
- System Access: Shell commands, file transfer, clipboard, notifications
- Multi-device: Control multiple Android devices simultaneously
- WiFi ADB: Connect wirelessly for untethered automation
Quick Start
1. Prerequisites
Required:
- Node.js 18+
- ADB (Android Platform Tools) in PATH
- Android device with USB debugging enabled
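For streaming (recommended for fast input):
- scrcpy-server file from a scrcpy release (see SCRCPY_SERVER_PATH below)
- ffmpeg in PATH (or set FFMPEG_PATH)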
2. Install
git clone https://github.com/anthropics/mcp-scrcpy-vision.git
cd mcp-scrcpy-vision
npm install
npm run build
3. Configure
Create .env file:
# Required for streaming + fast input
SCRCPY_SERVER_PATH="C:\scrcpy-win64-v3.2\scrcpy-server"
SCRCPY_SERVER_VERSION="3.2"
# Optional (defaults shown)
ADB_PATH="adb"
FFMPEG_PATH="ffmpeg"
DEFAULT_MAX_SIZE="1024"
DEFAULT_MAX_FPS="30"
DEFAULT_FRAME_FPS="2"
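As a rough illustration of how these variables and their defaults fit together, here is a minimal sketch of a config loader. The `ServerConfig` shape and the `loadConfig`, `intOr`, and `canStream` names are hypothetical and are not taken from the server's source; only the variable names and default values come from the list above.

```typescript
// Hypothetical config shape; the field names are illustrative only.
interface ServerConfig {
  scrcpyServerPath: string | undefined;
  scrcpyServerVersion: string | undefined;
  adbPath: string;
  ffmpegPath: string;
  defaultMaxSize: number;
  defaultMaxFps: number;
  defaultFrameFps: number;
}

type Env = Record<string, string | undefined>;

// Parse a positive integer, falling back when the value is missing or invalid.
function intOr(value: string | undefined, fallback: number): number {
  const parsed = Number.parseInt(value ?? "", 10);
  return Number.isFinite(parsed) && parsed > 0 ? parsed : fallback;
}

// Build the config from an env-like object, using the defaults documented above.
function loadConfig(env: Env): ServerConfig {
  return {
    scrcpyServerPath: env["SCRCPY_SERVER_PATH"],
    scrcpyServerVersion: env["SCRCPY_SERVER_VERSION"],
    adbPath: env["ADB_PATH"] ?? "adb",
    ffmpegPath: env["FFMPEG_PATH"] ?? "ffmpeg",
    defaultMaxSize: intOr(env["DEFAULT_MAX_SIZE"], 1024),
    defaultMaxFps: intOr(env["DEFAULT_MAX_FPS"], 30),
    defaultFrameFps: intOr(env["DEFAULT_FRAME_FPS"], 2),
  };
}

// Streaming needs both scrcpy settings; snapshot mode works without them.
function canStream(config: ServerConfig): boolean {
  return Boolean(config.scrcpyServerPath && config.scrcpyServerVersion);
}
```

In a Node.js process this could be called as `loadConfig(process.env)`; when `canStream` returns false, only snapshot mode would be usable.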
4. Add to MCP Client
Claude Desktop (%APPDATA%\Claude\claude_desktop_config.json on Windows):
{
"mcpServers": {
"android": {
"command": "node",
"args": ["C:/path/to/mcp-scrcpy-vision/dist/index.js"],
"env": {
"SCRCPY_SERVER_PATH": "C:/scrcpy/scrcpy-server",
"SCRCPY_SERVER_VERSION": "3.2"
}
}
}
}
Cursor (Settings > MCP):
{
"android": {
"command": "node",
"args": ["C:/path/to/mcp-scrcpy-vision/dist/index.js"],
"env": {
"SCRCPY_SERVER_PATH": "C:/scrcpy/scrcpy-server",
"SCRCPY_SERVER_VERSION": "3.2"
}
}
}
5. Connect Device
- Enable USB debugging on Android device (Settings > Developer Options > USB Debugging)
- Connect via USB
- Accept RSA fingerprint prompt on device
- Verify: `adb devices` should show your device
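The verification step can also be scripted. The sketch below parses the text that `adb devices` prints (a header line followed by one `serial<TAB>state` line per device). The `parseAdbDevices` helper and the `AdbDevice` type are hypothetical names used for illustration, not part of this server.

```typescript
interface AdbDevice {
  serial: string;
  state: string; // e.g. "device", "unauthorized", "offline"
}

// Parse the plain-text output of `adb devices` into a list of devices.
function parseAdbDevices(output: string): AdbDevice[] {
  const devices: AdbDevice[] = [];
  for (const line of output.split(/\r?\n/)) {
    const trimmed = line.trim();
    // Skip blank lines, the header, and adb daemon status lines ("* daemon ...").
    if (trimmed === "" || trimmed.startsWith("List of devices") || trimmed.startsWith("*")) {
      continue;
    }
    const [serial, state] = trimmed.split(/\s+/);
    if (serial && state) {
      devices.push({ serial, state });
    }
  }
  return devices;
}

const sample = "List of devices attached\nABC123\tdevice\n192.168.1.50:5555\tunauthorized\n";
const ready = parseAdbDevices(sample).filter((d) => d.state === "device");
```

A device listed as `unauthorized` usually means the RSA fingerprint prompt has not been accepted yet.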
How It Works
Two Modes of Operation
1. Snapshot Mode (No streaming required)
- Uses `android.vision.snapshot` for screenshots
- Input uses ADB shell commands (~100-300ms per action)
- Works without scrcpy/ffmpeg
- Best for simple automation or when streaming isn't available
2. Streaming Mode (Recommended)
- Start with `android.vision.startStream`
- Continuous JPEG frames available via resource URI
- Input uses scrcpy control protocol (~5-10ms per action)
- Input is roughly 10-20x faster than in snapshot mode (see the comparison below)
- Best for real-time control and rapid interactions
Performance Comparison
| Operation | Snapshot Mode | Streaming Mode |
|---|---|---|
| Tap | ~100-300ms | ~5-10ms |
| Swipe | ~300-500ms | ~50-100ms |
| Type text | ~50ms/char | ~5ms total |
| Screenshot | ~500ms | ~33ms (30fps) |
Tools Reference (33 tools)
Device Management
| Tool | Parameters | Description |
|---|---|---|
| `android.devices.list` | - | List connected devices |
| `android.devices.info` | `serial` | Get device info (model, SDK, etc.) |
| `android.adb.enableTcpip` | `serial, port?` | Enable WiFi debugging |
| `android.adb.getDeviceIp` | `serial` | Get device WiFi IP |
| `android.adb.connectWifi` | `ipAddress, port?` | Connect via WiFi |
| `android.adb.disconnectWifi` | `ipAddress?` | Disconnect WiFi |
Vision
| Tool | Parameters | Description |
|---|---|---|
| `android.vision.startStream` | `serial, maxSize?, maxFps?, frameFps?` | Start continuous stream (enables fast input) |
| `android.vision.stopStream` | `serial` | Stop stream |
| `android.vision.snapshot` | `serial` | Take PNG screenshot (works without streaming) |
| `android.ui.dump` | `serial` | Get UI hierarchy XML |
| `android.ui.findElement` | `serial, text?, resourceId?, className?, contentDesc?` | Find elements with tap coords |
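To show where tap coordinates can come from, the sketch below converts a uiautomator `bounds` attribute (the `[x1,y1][x2,y2]` string found on each node of the hierarchy XML) into its center point. It is an assumption that `android.ui.findElement` derives its tap coordinates this way. The `centerOfBounds` helper and the `TapPoint` type are hypothetical names.

```typescript
interface TapPoint {
  x: number;
  y: number;
}

// Convert a uiautomator bounds string such as "[0,100][1080,300]"
// into the center point of the rectangle, or null if it cannot be parsed.
function centerOfBounds(bounds: string): TapPoint | null {
  const match = /^\[(-?\d+),(-?\d+)\]\[(-?\d+),(-?\d+)\]$/.exec(bounds.trim());
  if (!match) {
    return null;
  }
  const x1 = Number(match[1]);
  const y1 = Number(match[2]);
  const x2 = Number(match[3]);
  const y2 = Number(match[4]);
  // Reject inverted rectangles rather than returning a misleading point.
  if (x2 < x1 || y2 < y1) {
    return null;
  }
  return { x: Math.round((x1 + x2) / 2), y: Math.round((y1 + y2) / 2) };
}

const loginCenter = centerOfBounds("[0,100][1080,300]"); // { x: 540, y: 200 }
```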
Input Control
Note: These automatically use fast scrcpy control when streaming, otherwise fall back to ADB.
| Tool | Parameters | Description |
|---|---|---|
| `android.input.tap` | `serial, x, y` | Tap at coordinates |
| `android.input.swipe` | `serial, x1, y1, x2, y2, durationMs?` | Swipe gesture |
| `android.input.longPress` | `serial, x, y, durationMs?` | Long press |
| `android.input.pinch` | `serial, centerX, centerY, startDistance, endDistance, durationMs?` | Pinch zoom |
| `android.input.dragDrop` | `serial, startX, startY, endX, endY, durationMs?` | Drag and drop |
| `android.input.text` | `serial, text` | Type text |
| `android.input.keyevent` | `serial, keycode` | Send keycode |
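The fallback behaviour described in the note above could be pictured roughly as follows. This is a sketch under stated assumptions, not the server's actual dispatch code: the `InputAction` and `InputRoute` types, the `routeInput` and `adbInputArgs` names, and the 300 ms default swipe duration are all hypothetical. The scrcpy branch is left abstract because the control protocol details are not covered here. The ADB branch uses the standard `adb shell input` subcommands.

```typescript
type InputAction =
  | { kind: "tap"; x: number; y: number }
  | { kind: "swipe"; x1: number; y1: number; x2: number; y2: number; durationMs?: number }
  | { kind: "keyevent"; keycode: number };

type InputRoute =
  | { via: "scrcpy"; action: InputAction }
  | { via: "adb"; args: string[] };

// Build the argument list for `adb -s <serial> shell input ...`.
function adbInputArgs(serial: string, action: InputAction): string[] {
  const base = ["-s", serial, "shell", "input"];
  switch (action.kind) {
    case "tap":
      return [...base, "tap", String(action.x), String(action.y)];
    case "swipe":
      return [
        ...base, "swipe",
        String(action.x1), String(action.y1),
        String(action.x2), String(action.y2),
        String(action.durationMs ?? 300), // assumed default duration
      ];
    case "keyevent":
      return [...base, "keyevent", String(action.keycode)];
  }
}

// Use the fast path when a stream is active for this serial, otherwise fall back to ADB.
function routeInput(serial: string, action: InputAction, activeStreams: Set<string>): InputRoute {
  return activeStreams.has(serial)
    ? { via: "scrcpy", action }
    : { via: "adb", args: adbInputArgs(serial, action) };
}
```

Keeping the routing decision separate from the transport makes the same tool call work in both modes, which matches how the tools above are described.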
App Control
| Tool | Parameters | Description |
|---|---|---|
| `android.app.start` | `serial, packageName, activity?` | Launch app |
| `android.app.stop` | `serial, packageName` | Force-stop app |
| `android.apps.list` | `serial, system?` | List installed apps |
| `android.activity.current` | `serial` | Get foreground activity |
System
| Tool | Parameters | Description |
|---|---|---|
| `android.shell.exec` | `serial, command` | Execute shell command |
| `android.file.push` | `serial, localPath, remotePath` | Push file to device |
| `android.file.pull` | `serial, remotePath, localPath` | Pull file from device |
| `android.file.list` | `serial, path` | List directory |
| `android.clipboard.get` | `serial` | Get clipboard |
| `android.clipboard.set` | `serial, text` | Set clipboard |
| `android.notifications.get` | `serial` | Get notifications |
Screen Control
| Tool | Parameters | Description |
|---|---|---|
| `android.screen.wake` | `serial` | Wake screen |
| `android.screen.sleep` | `serial` | Sleep screen |
| `android.screen.isOn` | `serial` | Check if screen is on |
| `android.screen.unlock` | `serial` | Unlock (unsecured only) |
Resources
The server exposes these MCP resources:
- `android://devices` - JSON list of connected devices
- `android://device/<serial>/frame/latest.jpg` - Latest JPEG frame (when streaming)
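A client may want small helpers for building and recognizing these URIs. The sketch below follows the URI templates listed above literally. The `frameUri` and `serialFromFrameUri` names are hypothetical, and inserting the serial without percent-encoding is an assumption, since the server's expectations for serials such as `192.168.1.50:5555` are not documented here.

```typescript
const DEVICES_URI = "android://devices";

// Build the latest-frame resource URI for a device serial (inserted as-is).
function frameUri(serial: string): string {
  return `android://device/${serial}/frame/latest.jpg`;
}

// Extract the serial from a latest-frame URI, or return null for other URIs.
function serialFromFrameUri(uri: string): string | null {
  const match = /^android:\/\/device\/([^\/]+)\/frame\/latest\.jpg$/.exec(uri);
  return match?.[1] ?? null;
}

const uri = frameUri("ABC123"); // "android://device/ABC123/frame/latest.jpg"
```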
Usage Examples
Basic Automation Loop (Streaming Mode)
1. Start stream: android.vision.startStream { serial: "ABC123" }
2. Read resource: android://device/ABC123/frame/latest.jpg
3. AI analyzes image, decides to tap "Login" button
4. Find element: android.ui.findElement { serial: "ABC123", text: "Login" }
5. Tap at returned coordinates: android.input.tap { serial: "ABC123", x: 540, y: 1200 }
6. Wait 500ms, read resource again, repeat
7. When done: android.vision.stopStream { serial: "ABC123" }
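The loop above can be sketched from the client side as follows. The `McpLikeClient` interface is a hypothetical stand-in for whatever MCP client library is in use, so its method names and return types are assumptions. The `decide` callback stands in for the AI's image analysis. The tool names and the resource URI are the ones documented in this README.

```typescript
// Hypothetical minimal client interface; a real MCP client library will differ.
interface McpLikeClient {
  callTool(name: string, args: Record<string, unknown>): Promise<unknown>;
  readResource(uri: string): Promise<unknown>;
}

interface TapTarget {
  x: number;
  y: number;
}

// Start a stream, run the body, and always stop the stream afterwards.
async function withStream<T>(
  client: McpLikeClient,
  serial: string,
  body: () => Promise<T>,
): Promise<T> {
  await client.callTool("android.vision.startStream", { serial });
  try {
    return await body();
  } finally {
    await client.callTool("android.vision.stopStream", { serial });
  }
}

// One iteration: read the latest frame, let the caller decide, then tap if asked to.
async function observeAndTap(
  client: McpLikeClient,
  serial: string,
  decide: (frame: unknown) => Promise<TapTarget | null>,
): Promise<boolean> {
  const frame = await client.readResource(`android://device/${serial}/frame/latest.jpg`);
  const target = await decide(frame);
  if (target === null) {
    return false;
  }
  await client.callTool("android.input.tap", { serial, x: target.x, y: target.y });
  return true;
}
```

Wrapping the session in `try`/`finally` means the stream is stopped even if a step in the middle throws, which matters because only one stream per device is allowed at a time.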
Simple Screenshot Mode
1. Take screenshot: android.vision.snapshot { serial: "ABC123" }
2. AI analyzes image
3. Find and tap: android.ui.findElement + android.input.tap
4. Take another screenshot to verify
WiFi Connection Workflow
1. Connect device via USB
2. android.adb.enableTcpip { serial: "ABC123" }
3. android.adb.getDeviceIp { serial: "ABC123" } → "192.168.1.50"
4. Disconnect USB cable
5. android.adb.connectWifi { ipAddress: "192.168.1.50" }
6. Now use "192.168.1.50:5555" as serial for all commands
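Step 6 relies on the `ip:port` form of the serial. A small helper for composing it might look like the sketch below. The `wifiSerial` name is hypothetical, and the default of 5555 is taken from the example above.

```typescript
const DEFAULT_ADB_TCP_PORT = 5555;

// Compose the serial used for a WiFi-connected device, e.g. "192.168.1.50:5555".
function wifiSerial(ipAddress: string, port: number = DEFAULT_ADB_TCP_PORT): string {
  if (!Number.isInteger(port) || port < 1 || port > 65535) {
    throw new RangeError(`Invalid TCP port: ${port}`);
  }
  return `${ipAddress.trim()}:${port}`;
}

const serialOverWifi = wifiSerial("192.168.1.50"); // "192.168.1.50:5555"
```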
App Testing Example
1. android.app.start { serial: "ABC123", packageName: "com.example.app" }
2. android.vision.startStream { serial: "ABC123" }
3. Wait for app to load, read frame
4. android.ui.findElement { serial: "ABC123", resourceId: "username_field" }
5. android.input.tap { serial: "ABC123", x: 540, y: 300 }
6. android.input.text { serial: "ABC123", text: "testuser@example.com" }
7. android.input.keyevent { serial: "ABC123", keycode: 66 } // Enter
8. Read frame, verify login succeeded
9. android.vision.stopStream { serial: "ABC123" }
Common Keycodes
| Key | Code | Key | Code |
|---|---|---|---|
| HOME | 3 | BACK | 4 |
| VOLUME_UP | 24 | VOLUME_DOWN | 25 |
| POWER | 26 | ENTER | 66 |
| DELETE | 67 | TAB | 61 |
| MENU | 82 | APP_SWITCH | 187 |
| WAKEUP | 224 | SLEEP | 223 |
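For client code, the table above can be kept as a typed constant so that key names are checked at compile time. The `KEYCODES` object and the `keyeventArgs` helper are hypothetical names. The numeric values are the ones from the table.

```typescript
// Keycodes from the table above, keyed by name.
const KEYCODES = {
  HOME: 3,
  BACK: 4,
  VOLUME_UP: 24,
  VOLUME_DOWN: 25,
  POWER: 26,
  TAB: 61,
  ENTER: 66,
  DELETE: 67,
  MENU: 82,
  APP_SWITCH: 187,
  SLEEP: 223,
  WAKEUP: 224,
} as const;

type KeyName = keyof typeof KEYCODES;

// Build the arguments for an android.input.keyevent call from a key name.
function keyeventArgs(serial: string, key: KeyName): { serial: string; keycode: number } {
  return { serial, keycode: KEYCODES[key] };
}

const enterArgs = keyeventArgs("ABC123", "ENTER"); // { serial: "ABC123", keycode: 66 }
```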
Troubleshooting
No devices found
adb kill-server
adb start-server
adb devices
Ensure USB debugging is enabled and the RSA fingerprint prompt has been accepted on the device.
Scrcpy version mismatch
SCRCPY_SERVER_VERSION must exactly match your scrcpy-server file. Check the scrcpy release version you downloaded.
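When the server file still lives inside the extracted release folder (for example `scrcpy-win64-v3.2`), the folder name offers a quick sanity check. The sketch below is only a heuristic under that assumption, and it returns `"unknown"` when the path carries no version. The `versionHintFromPath` and `checkDeclaredVersion` names are hypothetical.

```typescript
type VersionCheck = "match" | "mismatch" | "unknown";

// Look for a "scrcpy-...-v<version>" path segment and return the version part.
function versionHintFromPath(serverPath: string): string | null {
  const match = /scrcpy[^\\\/]*?-v(\d+(?:\.\d+)*)/i.exec(serverPath);
  return match?.[1] ?? null;
}

// Compare the declared SCRCPY_SERVER_VERSION against the hint from the path.
function checkDeclaredVersion(serverPath: string, declared: string): VersionCheck {
  const hint = versionHintFromPath(serverPath);
  if (hint === null) {
    return "unknown";
  }
  return hint === declared.trim() ? "match" : "mismatch";
}

const check = checkDeclaredVersion("C:\\scrcpy-win64-v3.2\\scrcpy-server", "3.2"); // "match"
```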
ffmpeg not found
- Windows: Download from https://ffmpeg.org/download.html, extract, add bin folder to PATH
- macOS: `brew install ffmpeg`
- Linux: `apt install ffmpeg` or `yum install ffmpeg`
Or set FFMPEG_PATH in .env to the full path.
uiautomator dump fails
Some devices need the screen on. Try android.screen.wake first.
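A client could automate this workaround by waking the screen and retrying once when the dump fails. The sketch below assumes a failed tool call rejects its promise; some MCP clients instead return an error result, in which case the check would need adapting. The `ToolCaller` interface and the `dumpUiWithWake` name are hypothetical.

```typescript
// Hypothetical minimal interface for calling tools on this server.
interface ToolCaller {
  callTool(name: string, args: Record<string, unknown>): Promise<unknown>;
}

// Try a UI dump; on failure, wake the screen and retry exactly once.
async function dumpUiWithWake(client: ToolCaller, serial: string): Promise<unknown> {
  try {
    return await client.callTool("android.ui.dump", { serial });
  } catch {
    await client.callTool("android.screen.wake", { serial });
    return await client.callTool("android.ui.dump", { serial });
  }
}
```

If the second attempt also fails, the error propagates to the caller rather than being retried indefinitely.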
Clipboard not working (Android 10+)
Android 10+ restricts clipboard access. Use UI automation to paste instead.
Stream won't start
- Check scrcpy-server path is correct
- Verify version numbers match
- Try running scrcpy standalone first to verify it works
Notes & Limitations
- Fast input when streaming: When a stream is active, tap/swipe/text/keyevent use the scrcpy control protocol (~5-10ms latency). Without streaming, falls back to `adb shell input` (~100-300ms).
- One stream per device at a time
- Snapshot works without scrcpy - useful fallback when streaming is not needed
- Clipboard has platform limitations on Android 10+
- Notifications may require permissions on newer Android
- Pinch gesture is currently simulated with a single finger; true multi-touch requires an active streaming session
Security Warning
This MCP server provides full control over connected Android devices:
- Execute arbitrary shell commands
- Read/write files on device
- Control UI and input
- Access clipboard and notifications
Only connect devices you own, and only when you trust the AI agent.
Development
npm run dev # Development with tsx
npm run build # Compile TypeScript
npm start # Run production build
See claude.md for developer documentation. See agents.md for AI agent integration guide.
License
MIT