Linux Desktop MCP Server
Enables AI assistants to interact with native Linux desktop applications through AT-SPI2 accessibility interfaces. Provides semantic element targeting, natural language search, and automation capabilities (clicking, typing, keyboard shortcuts) across GTK, Qt, and Electron applications.
Built with Claude Code - This entire MCP server was developed using Claude Code, Anthropic's AI-powered coding assistant. We're proud to showcase what's possible with AI-assisted development!
An MCP server that provides Chrome-extension-level semantic element targeting for native Linux desktop applications using AT-SPI2 (Assistive Technology Service Provider Interface).
Features
- Semantic Element References: Just like the Chrome extension's ref_1, ref_2 system
- Role Detection: Identifies buttons, text fields, links, menus, etc.
- State Detection: Tracks focused, enabled, checked, editable states
- Natural Language Search: Find elements by description ("save button", "search field")
- Input Across Display Servers: Works on X11, Wayland, and XWayland
- GTK/Qt/Electron Support: Works with any application that exposes accessibility
Installation
System Dependencies
# Ubuntu/Debian
sudo apt install python3-pyatspi gir1.2-atspi-2.0 at-spi2-core
# For X11 input simulation
sudo apt install xdotool
# For Wayland input simulation (recommended)
# Install ydotool from source or your package manager
# Then start the daemon:
sudo ydotoold &
Python Package
# From PyPI
pip install linux-desktop-mcp
# Or from source
git clone https://github.com/yourusername/linux-desktop-mcp.git
cd linux-desktop-mcp
pip install -e .
Enable Accessibility
Ensure accessibility is enabled in your desktop environment:
- GNOME: Settings → Accessibility → Enable accessibility features
- KDE: System Settings → Accessibility
- Most modern desktops have this enabled by default
Configuration
Add to ~/.claude/settings.json:
{
"mcpServers": {
"linux-desktop": {
"command": "linux-desktop-mcp"
}
}
}
Or if installed from source:
{
"mcpServers": {
"linux-desktop": {
"command": "python",
"args": ["-m", "linux_desktop_mcp"]
}
}
}
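If the settings file already contains other keys, the entry can be merged in rather than pasted by hand. The following is a minimal sketch, not part of this project: `add_server_entry` and `update_settings_file` are hypothetical helper names, and the only details taken from this README are the file path, the `mcpServers` key, and the two command variants.

```python
import json
from pathlib import Path
from typing import Optional


def add_server_entry(settings: dict, name: str = "linux-desktop",
                     command: str = "linux-desktop-mcp",
                     args: Optional[list] = None) -> dict:
    """Return a copy of `settings` with one MCP server entry added or replaced."""
    merged = dict(settings)
    servers = dict(merged.get("mcpServers", {}))  # keep entries for other servers
    entry = {"command": command}
    if args:
        entry["args"] = list(args)
    servers[name] = entry
    merged["mcpServers"] = servers
    return merged


def update_settings_file(path: Path, **kwargs) -> None:
    """Read the JSON settings file (if present), merge the entry, write it back."""
    existing = json.loads(path.read_text()) if path.exists() else {}
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(json.dumps(add_server_entry(existing, **kwargs), indent=2) + "\n")


# Example (source install variant):
# update_settings_file(Path.home() / ".claude" / "settings.json",
#                      command="python", args=["-m", "linux_desktop_mcp"])
```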
Available Tools
desktop_snapshot
Capture the accessibility tree with semantic element references.
Parameters:
app_name: str (optional) - Filter to specific application
max_depth: int (default: 15) - Tree traversal depth
Returns:
Tree of elements with ref_ids:
- ref_1: [application] Firefox
- ref_2: [frame] "GitHub - Mozilla Firefox"
- ref_3: [button] "Back" (clickable)
- ref_4: [entry] "Search or enter address" (editable, focused)
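A client that wants structured data can parse the snapshot text. This sketch assumes the output looks exactly like the sample lines above (`- ref_N: [role] name (states)`); the real output may differ, and `SnapshotElement` and `parse_snapshot` are hypothetical names used only for illustration.

```python
import re
from dataclasses import dataclass, field

# One element per line: "- ref_3: [button] "Back" (clickable)"
LINE_RE = re.compile(
    r'^\s*-\s*(?P<ref>ref_\d+):\s*\[(?P<role>[^\]]+)\]\s*'
    r'(?P<name>"[^"]*"|[^(]*?)\s*(?:\((?P<states>[^)]*)\))?\s*$'
)


@dataclass
class SnapshotElement:
    ref: str
    role: str
    name: str
    states: list = field(default_factory=list)


def parse_snapshot(text: str) -> list:
    """Parse snapshot lines into SnapshotElement records, skipping other lines."""
    elements = []
    for line in text.splitlines():
        m = LINE_RE.match(line)
        if not m:
            continue
        name = m.group("name").strip().strip('"')
        raw_states = m.group("states") or ""
        states = [s.strip() for s in raw_states.split(",") if s.strip()]
        elements.append(SnapshotElement(m.group("ref"), m.group("role"), name, states))
    return elements
```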
desktop_find
Find elements by natural language query.
Parameters:
query: str - "save button", "search field", "menu containing File"
app_name: str (optional)
Returns:
Matching elements with refs, states, and actions
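This README does not describe how queries are matched. As a rough illustration of the idea only, the sketch below scores elements by word overlap between the query and each element's role and name. The synonym table, the `score` and `find` names, and the 0.5 threshold are all assumptions, not the server's actual algorithm.

```python
import re

# Hypothetical mapping from everyday words to accessibility role names.
ROLE_SYNONYMS = {"field": "entry", "textbox": "entry", "input": "entry"}


def _words(text: str) -> list:
    return re.findall(r"[a-z0-9]+", text.lower())


def score(query: str, role: str, name: str) -> float:
    """Fraction of query words that appear in the element's role or name."""
    words = [ROLE_SYNONYMS.get(w, w) for w in _words(query)]
    if not words:
        return 0.0
    haystack = set(_words(role)) | set(_words(name))
    return sum(w in haystack for w in words) / len(words)


def find(query: str, elements: list, min_score: float = 0.5) -> list:
    """Return refs of (ref, role, name) tuples scoring >= min_score, best first."""
    scored = [(score(query, role, name), ref) for ref, role, name in elements]
    hits = [pair for pair in scored if pair[0] >= min_score]
    return [ref for _, ref in sorted(hits, key=lambda pair: -pair[0])]
```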
desktop_click
Click an element by reference or coordinates.
Parameters:
ref: str - Element reference (e.g., "ref_5")
element: str - Human description for logging
coordinate: [x, y] - Fallback if no ref
button: left|right|middle
click_type: single|double
modifiers: [ctrl, shift, alt, super]
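MCP clients invoke tools with a JSON-RPC `tools/call` request. The sketch below builds such a request for desktop_click using only the standard library. It assumes the argument names listed above and leaves out the `initialize` handshake that a real client must complete first; `tool_call_request` is a hypothetical helper name.

```python
import json
from itertools import count

_request_ids = count(1)  # JSON-RPC requests need a unique id per request


def tool_call_request(name: str, arguments: dict) -> str:
    """Serialize one MCP tools/call request as a single line of JSON."""
    return json.dumps({
        "jsonrpc": "2.0",
        "id": next(_request_ids),
        "method": "tools/call",
        "params": {"name": name, "arguments": arguments},
    })


line = tool_call_request(
    "desktop_click",
    {"ref": "ref_5", "element": "URL bar", "button": "left", "click_type": "single"},
)
```

Over the stdio transport, each message is written as one line terminated by a newline, which is why the request is serialized without indentation.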
desktop_type
Type text into an element.
Parameters:
text: str - Text to type
ref: str - Element to focus first (optional)
element: str - Human description
clear_first: bool - Ctrl+A, Delete before typing
submit: bool - Press Enter after
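The parameter descriptions above imply an order of operations: focus, optionally clear, type, optionally submit. A minimal sketch of that expansion follows. The step tuples and the `plan_type` name are hypothetical, and the exact order the server uses is an assumption.

```python
def plan_type(text: str, ref=None, clear_first: bool = False, submit: bool = False) -> list:
    """Expand desktop_type parameters into an ordered list of low-level steps."""
    steps = []
    if ref is not None:
        steps.append(("focus", ref))           # focus the target element first
    if clear_first:
        steps.append(("key", "a", ["ctrl"]))   # Ctrl+A selects existing text
        steps.append(("key", "Delete", []))    # Delete removes the selection
    steps.append(("type", text))
    if submit:
        steps.append(("key", "Return", []))    # Enter after typing
    return steps
```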
desktop_key
Press keyboard keys/shortcuts.
Parameters:
key: str - Key name (Return, Tab, Escape, a, etc.)
modifiers: [ctrl, shift, alt, super]
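On X11, a key press with modifiers corresponds naturally to an `xdotool key` invocation, which accepts chords such as `ctrl+shift+f`. The sketch below only builds the argument list. Whether the server calls xdotool this way is an assumption, and `xdotool_key_argv` is a hypothetical helper.

```python
ALLOWED_MODIFIERS = ("ctrl", "shift", "alt", "super")


def xdotool_key_argv(key: str, modifiers=()) -> list:
    """Build the argv for `xdotool key`, joining modifiers and key with '+'."""
    for mod in modifiers:
        if mod not in ALLOWED_MODIFIERS:
            raise ValueError(f"unsupported modifier: {mod}")
    return ["xdotool", "key", "+".join([*modifiers, key])]


# To actually send the key press on an X11 session:
# import subprocess
# subprocess.run(xdotool_key_argv("f", ["ctrl", "shift"]), check=True)
```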
desktop_capabilities
Check available automation capabilities.
Example Usage
Example 1: Navigating to a Website in Firefox
User: "Open GitHub in Firefox"
Claude uses:
1. desktop_snapshot(app_name="Firefox")
→ Returns UI tree with elements like:
- ref_5: [entry] "Search or enter address" (editable, focused)
- ref_12: [button] "Go" (clickable)
2. desktop_click(ref="ref_5", element="URL bar")
→ Clicks to focus the address bar
3. desktop_type(text="https://github.com", ref="ref_5", clear_first=True, submit=True)
→ Types the URL and presses Enter
Result: Firefox navigates to GitHub
Example 2: Saving a File in LibreOffice
User: "Save this document as 'report.odt'"
Claude uses:
1. desktop_key(key="s", modifiers=["ctrl"])
→ Opens the Save dialog
2. desktop_snapshot(app_name="LibreOffice")
→ Returns dialog elements including:
- ref_8: [entry] "File name:" (editable)
- ref_15: [button] "Save" (clickable)
3. desktop_type(text="report.odt", ref="ref_8", clear_first=True)
→ Types the filename
4. desktop_click(ref="ref_15", element="Save button")
→ Clicks Save
Result: Document saved as report.odt
Example 3: Searching in a Code Editor
User: "Search for 'TODO' comments in VS Code"
Claude uses:
1. desktop_find(query="search", app_name="Code")
→ Finds search-related elements
2. desktop_key(key="f", modifiers=["ctrl", "shift"])
→ Opens global search panel
3. desktop_snapshot(app_name="Code")
→ Returns search panel elements:
- ref_22: [entry] "Search" (editable, focused)
- ref_25: [checkbox] "Match Case"
4. desktop_type(text="TODO", ref="ref_22", submit=True)
→ Types search query and executes search
Result: VS Code shows all TODO comments across the project
Example 4: Window Targeting for Multi-Window Automation
User: "Help me copy data from the spreadsheet to the email"
Claude uses:
1. desktop_context(list_available=True)
→ Lists all available windows
2. desktop_target_window(app_name="LibreOffice Calc", color="green")
→ Targets spreadsheet with green border
3. desktop_target_window(app_name="Thunderbird", color="blue")
→ Targets email client with blue border
4. desktop_snapshot()
→ Only shows elements from targeted windows (reduced context)
5. [Proceeds with copy/paste operations between windows]
Result: Claude can efficiently work across multiple applications
Platform Support
| Feature | X11 | Wayland | XWayland |
|---|---|---|---|
| AT-SPI discovery | Full | Full | Full |
| Click by ref | Full | Full | Full |
| Type text | Full | Full | Full |
| ydotool input | Full | Full | Full |
| xdotool input | Full | No | Yes |
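The table suggests a simple selection rule: ydotool works everywhere, while xdotool is not usable for native Wayland windows. The sketch below encodes one possible rule based on the standard `XDG_SESSION_TYPE` environment variable. The preference order is an assumption, the function names are hypothetical, and the sketch treats any Wayland session as unsuitable for xdotool even though XWayland windows may still accept it.

```python
import os
import shutil


def choose_input_backend(session_type, installed):
    """Pick an input backend name, or None if nothing suitable is installed."""
    session = (session_type or "").lower()
    if session == "x11" and "xdotool" in installed:
        return "xdotool"
    if "ydotool" in installed:
        return "ydotool"
    if "xdotool" in installed and session != "wayland":
        return "xdotool"  # unknown session type: try xdotool as a last resort
    return None


def detect_input_backend():
    """Inspect the current environment and PATH, then apply the rule above."""
    installed = {tool for tool in ("ydotool", "xdotool") if shutil.which(tool)}
    return choose_input_backend(os.environ.get("XDG_SESSION_TYPE"), installed)
```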
Troubleshooting
"AT-SPI2 not available"
sudo apt install python3-pyatspi gir1.2-atspi-2.0 at-spi2-core
"AT-SPI2 registry not running"
Ensure accessibility is enabled in your desktop settings. You may need to log out and back in.
"No input backend available" (Wayland)
# Start the ydotool daemon (install ydotool first, see System Dependencies)
sudo ydotoold &
Elements not showing up
Some applications may not expose accessibility information. Modern GTK3/4, Qt5/6, and Electron apps generally work well.
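The checks above can be scripted. This is a rough diagnostic sketch, not a tool shipped with the project: it only tests whether the `pyatspi` module is importable and whether an input tool is on `PATH`, and the `diagnose` function name is hypothetical. The lookup functions are parameters so the logic can be exercised without touching the system.

```python
import importlib.util
import shutil


def _module_available(name: str) -> bool:
    return importlib.util.find_spec(name) is not None


def diagnose(has_module=_module_available, which=shutil.which) -> list:
    """Return human-readable problems; an empty list means the basics are present."""
    problems = []
    if not has_module("pyatspi"):
        problems.append(
            "AT-SPI2 not available: install python3-pyatspi gir1.2-atspi-2.0 at-spi2-core"
        )
    if not (which("ydotool") or which("xdotool")):
        problems.append("No input backend available: install ydotool or xdotool")
    return problems


if __name__ == "__main__":
    for problem in diagnose():
        print(problem)
```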
Architecture
┌─────────────────────────────────────────────────────────────┐
│                     MCP Protocol Layer                      │
│              (JSON-RPC over stdio, tool defs)               │
└─────────────────────────────────────────────────────────────┘
                               │
┌─────────────────────────────────────────────────────────────┐
│                      Reference Manager                      │
│            (ref_1, ref_2 mapping, lifecycle, GC)            │
└─────────────────────────────────────────────────────────────┘
                               │
           ┌───────────────────┴───────────────────┐
           │                                       │
┌─────────────────────┐               ┌─────────────────────────┐
│   AT-SPI2 Backend   │               │     Input Backends      │
│      (pyatspi)      │               │ (ydotool/xdotool/wtype) │
└─────────────────────┘               └─────────────────────────┘
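To make the Reference Manager layer concrete, here is a minimal sketch of a ref-to-element mapping with a simple lifecycle. It is an illustration under stated assumptions, not the project's implementation: the class and method names are hypothetical, refs are dropped whenever a new snapshot begins, and the counter is never reset so that a stale ref cannot silently point at a different element.

```python
class ReferenceManager:
    """Maps short string refs (ref_1, ref_2, ...) to accessibility objects."""

    def __init__(self):
        self._elements = {}
        self._next_id = 1

    def begin_snapshot(self) -> None:
        """Forget refs from the previous snapshot (a simple form of GC)."""
        self._elements.clear()

    def register(self, element) -> str:
        """Store an element and return the ref assigned to it."""
        ref = f"ref_{self._next_id}"
        self._next_id += 1
        self._elements[ref] = element
        return ref

    def resolve(self, ref: str):
        """Return the element for a ref, or raise if the ref is stale or unknown."""
        try:
            return self._elements[ref]
        except KeyError:
            raise LookupError(f"{ref} is stale or unknown; take a new snapshot") from None
```

In this design a tool such as desktop_click would call `resolve` before acting, so a ref left over from an earlier snapshot fails loudly instead of clicking the wrong element.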
Contributing
This project was created with Claude Code, and we warmly welcome contributions. You can:
- Report bugs or request features
- Submit pull requests
- Fork and build your own version
- Improve documentation
We're very open to help and collaboration. See CONTRIBUTING.md for guidelines.
Privacy Policy
Linux Desktop MCP is a local desktop automation tool that:
- Runs entirely on your local machine - No data is transmitted to external servers
- Does not collect any personal data - No analytics, telemetry, or usage tracking
- Does not store credentials - All authentication and authorization is handled by your local system
- Accesses only what you explicitly target - The accessibility tree is read only for windows/applications you interact with
- No network connectivity required - The MCP server operates completely offline
The only data accessed is the accessibility tree information exposed by your desktop applications (UI element names, roles, and states), which is used solely for local automation and is not persisted or transmitted anywhere.
Contact: For privacy-related questions, open an issue on GitHub.
License
MIT - See LICENSE for details.