systemd-mcp
Provides AI assistants with safe, read-only access to Linux systemd services, including status monitoring, log querying, and dependency analysis, with optional granular permissions for service management actions.
A Model Context Protocol (MCP) server for systemd integration. Give your AI assistant eyes and hands on your Linux services.
Status: v0.5.0 (NEVERHANG v2.0)
Author: Claude + MOD
License: MIT
Organization: ArktechNWA
Why?
AI assistants are blind to your system. They can write code but can't see if nginx crashed, can't tail logs, can't restart a stuck daemon.
"Just give it shell access" — bad idea. Shell access is all-or-nothing. One hallucinated rm -rf or hung systemctl and you're in trouble. No guardrails, no visibility, no recovery.
systemd-mcp is an intelligent interface, not a wrapper:
| Problem | systemd-mcp Solution |
|---|---|
| Commands can hang forever | NEVERHANG v2.0 — tiered timeouts, circuit breaker |
| No memory between calls | A.L.A.N. database — persistent state, learns your system |
| Failures cascade | Circuit breaker opens, commands fail fast, auto-recovery |
| AI has no operational intuition | Health trends, P95 latency, success rates — data it can reason about |
| All-or-nothing permissions | Granular: read-only default, whitelist/blacklist, permission tiers |
This is the difference between "run commands for me" and "understand my infrastructure."
Philosophy
- Safety by default — Read-only out of the box
- User controls exposure — Whitelist, blacklist, permission levels
- NEVERHANG v2.0 — Circuit breaker, adaptive timeouts, A.L.A.N. database, self-healing
- Graceful fallback — Optional Haiku AI for log analysis
- Structured output — JSON for machines, summaries for AI
Features
Perception (Read)
- List all units with filtering (type, state, pattern)
- Detailed unit status with resource usage
- Failed units at a glance
- Timer schedules (last run, next run)
- Dependency trees
- Journal queries with filters (time, priority, grep)
- Live log streaming
- Boot analysis
Action (Write)
- Start/stop/restart services
- Enable/disable boot behavior
- Reload configurations
- Daemon reload (after unit file changes)
Analysis (Optional AI Fallback)
- "Why did this fail?" synthesis
- Boot time breakdown
- Complex log analysis
Permission Model
Users are (rightfully) cautious about AI touching their systems. systemd-mcp provides granular control.
Permission Levels
| Level | Description | Default |
|---|---|---|
| read | Status, logs, timers, dependencies | ON |
| restart | Restart already-running services | OFF |
| start_stop | Start stopped / stop running services | OFF |
| enable_disable | Modify boot behavior | OFF |
| daemon_reload | Reload systemd manager | OFF |
Unit Filtering
{
"permissions": {
"read": true,
"restart": true,
"start_stop": false,
"enable_disable": false,
"daemon_reload": false,
"whitelist": [
"myapp-*.service",
"nginx.service",
"postgresql.service"
],
"blacklist": [
"sshd.service",
"firewalld.service",
"systemd-*.service",
"dbus.service"
]
}
}
Rules:
- Blacklist always wins (even if whitelisted)
- Empty whitelist = all units allowed (subject to blacklist)
- Patterns support * wildcards
- System-critical units blacklisted by default
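The filtering rules can be sketched as a small TypeScript function. This is a minimal sketch, not the project's implementation: the helper names (normalizeUnit, globToRegExp, isUnitAllowed) are hypothetical, only * is treated as a wildcard, and appending .service to bare names mirrors systemctl's own default rather than documented systemd-mcp behavior.

```typescript
// Real systemd unit-type suffixes. A bare name like "nginx" is treated as a
// .service unit, the same default systemctl applies.
const UNIT_SUFFIXES = [
  ".service", ".socket", ".device", ".mount", ".automount", ".swap",
  ".target", ".path", ".timer", ".slice", ".scope",
];

// Hypothetical helper: append ".service" when no unit-type suffix is present,
// so "sshd" cannot slip past a "sshd.service" blacklist entry.
function normalizeUnit(name: string): string {
  return UNIT_SUFFIXES.some((s) => name.endsWith(s)) ? name : `${name}.service`;
}

// Convert a glob like "myapp-*.service" into an anchored RegExp. Only "*" is
// a wildcard; every other regex metacharacter is escaped, so the "." in
// ".service" matches a literal dot.
function globToRegExp(pattern: string): RegExp {
  const body = pattern
    .split("*")
    .map((part) => part.replace(/[.+?^${}()|[\]\\]/g, "\\$&"))
    .join(".*");
  return new RegExp(`^${body}$`);
}

function matchesAny(unit: string, patterns: string[]): boolean {
  return patterns.some((p) => globToRegExp(p).test(unit));
}

// Rules from the README: the blacklist always wins, and an empty whitelist
// allows every unit that is not blacklisted.
function isUnitAllowed(unit: string, whitelist: string[], blacklist: string[]): boolean {
  const name = normalizeUnit(unit);
  if (matchesAny(name, blacklist)) return false;
  if (whitelist.length === 0) return true;
  return matchesAny(name, whitelist);
}
```

Normalizing before matching matters for the blacklist: without it, a request for plain "sshd" would not match the "sshd.service" entry.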
Default Blacklist
These units are blocked by default (override with --bypass-permissions):
sshd.service # Don't lock yourself out
firewalld.service # Don't break the firewall
iptables.service # Don't break the firewall
systemd-*.service # Don't break systemd itself
dbus.service # Don't break D-Bus
polkit.service # Don't break permissions
Bypass Mode
For power users who know what they're doing:
# Trust me, I know what I'm doing
systemd-mcp --bypass-permissions
# Or in config
{
"bypass_permissions": true
}
With bypass enabled:
- All permission levels = true
- Whitelist/blacklist ignored
- Full systemd access
- You own the consequences
Environment Variable Override
# Enable specific permissions via env
SYSTEMD_MCP_ALLOW_RESTART=1
SYSTEMD_MCP_ALLOW_START_STOP=1
SYSTEMD_MCP_BYPASS=1
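One way these permission sources could combine, assuming bypass (from --bypass-permissions, "bypass_permissions": true, or SYSTEMD_MCP_BYPASS=1) overrides everything and the SYSTEMD_MCP_ALLOW_* variables can only switch levels on. The README does not say whether a configured blacklist replaces or extends the default one; this sketch assumes it replaces it. The function name and resolution order are hypothetical.

```typescript
interface Permissions {
  read: boolean;
  restart: boolean;
  start_stop: boolean;
  enable_disable: boolean;
  daemon_reload: boolean;
  whitelist: string[];
  blacklist: string[];
}

type Env = Record<string, string | undefined>;

// The default blacklist documented in this README.
const DEFAULT_BLACKLIST = [
  "sshd.service", "firewalld.service", "iptables.service",
  "systemd-*.service", "dbus.service", "polkit.service",
];

const READ_ONLY: Permissions = {
  read: true, restart: false, start_stop: false, enable_disable: false,
  daemon_reload: false, whitelist: [], blacklist: DEFAULT_BLACKLIST,
};

// Hypothetical resolution order: bypass beats everything; otherwise config
// fields override the read-only defaults, then env vars may enable levels.
function resolvePermissions(config: Partial<Permissions>, env: Env, bypass = false): Permissions {
  if (bypass || env.SYSTEMD_MCP_BYPASS === "1") {
    return {
      read: true, restart: true, start_stop: true, enable_disable: true,
      daemon_reload: true, whitelist: [], blacklist: [], // filters ignored
    };
  }
  const p: Permissions = { ...READ_ONLY, ...config };
  if (env.SYSTEMD_MCP_ALLOW_RESTART === "1") p.restart = true;
  if (env.SYSTEMD_MCP_ALLOW_START_STOP === "1") p.start_stop = true;
  return p;
}
```

In a Node.js server the env argument would be process.env; it is typed as a plain record here so the sketch has no dependencies.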
SSH Remote Host Support (v0.2.0)
Run systemd commands on a remote host via SSH instead of locally.
Configuration
# Via environment variable
SYSTEMD_MCP_SSH_HOST=vps-claude node build/index.js
# Via config file (~/.config/systemd-mcp/config.json)
{
"ssh": {
"enabled": true,
"host": "vps-claude"
}
}
Requirements
- SSH host must be accessible without password prompt (use SSH keys)
- SSH config alias (e.g., vps-claude) or full user@host format supported
- Remote host must have systemd and journalctl
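Remote mode amounts to prefixing each command with ssh. A minimal sketch, assuming commands are built as argv arrays: -o BatchMode=yes is a real OpenSSH option that turns a password prompt into an immediate failure, but whether systemd-mcp passes it is an assumption, as are the helper names. Because ssh joins the remote command into one string for the remote shell, each argument is single-quoted when needed.

```typescript
// Quote one argument for a POSIX shell. ssh hands the remote side a single
// command string, so arguments with spaces or quotes must be escaped.
function shellQuote(arg: string): string {
  if (/^[A-Za-z0-9_@%+=:,.\/-]+$/.test(arg)) return arg; // safe as-is
  // Wrap in single quotes; an embedded ' becomes '\'' (close, escaped quote, reopen).
  return "'" + arg.replace(/'/g, "'\\''") + "'";
}

// Hypothetical: argv for running a systemctl/journalctl command locally, or
// on sshHost when remote mode is configured. BatchMode=yes makes ssh fail
// instead of prompting for a password.
function buildArgv(command: string[], sshHost?: string): string[] {
  if (!sshHost) return command;
  return ["ssh", "-o", "BatchMode=yes", sshHost, command.map(shellQuote).join(" ")];
}
```

Passing the resulting argv to an exec-style API that does not spawn a local shell keeps quoting concerns limited to the remote side.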
Claude Code Integration with SSH
# Monitor remote server
claude mcp add --transport stdio systemd-ssh -- \
bash -c "SYSTEMD_MCP_SSH_HOST=my-server node /path/to/build/index.js"
Multi-Instance Pattern (v0.3.0)
Run multiple instances to monitor both local and remote systems simultaneously.
Setup
# Local instance (default)
claude mcp add --transport stdio systemd -s user -- \
node /path/to/build/index.js
# Remote instance via SSH
claude mcp add --transport stdio systemd-ssh -s user -- \
bash -c "SYSTEMD_MCP_SSH_HOST=my-server node /path/to/build/index.js"
Result
Claude Code sees both as separate tool namespaces:
| MCP Name | Tools | Target |
|---|---|---|
| systemd | mcp__systemd__* | Local machine |
| systemd-ssh | mcp__systemd-ssh__* | Remote via SSH |
Query both in parallel:
"Check nginx status on both local and remote"
→ mcp__systemd__systemd_unit_status({ units: "nginx" })
→ mcp__systemd-ssh__systemd_unit_status({ units: "nginx" })
Same codebase, multiple targets, unified visibility.
Tools
Status & Discovery
systemd_list_units
List units with optional filtering.
systemd_list_units({
type?: "service" | "timer" | "socket" | "mount" | "target" | "all",
state?: "running" | "failed" | "inactive" | "activating" | "all",
pattern?: string // glob pattern, e.g. "nginx*"
})
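A plausible translation of these parameters into a systemctl call. The options used (--type=, --state=, --all, --plain, --no-legend, --no-pager and a trailing unit pattern) are real systemctl list-units options; the mapping itself, including sending state "all" to --all, is an assumption.

```typescript
type UnitType = "service" | "timer" | "socket" | "mount" | "target" | "all";
type UnitState = "running" | "failed" | "inactive" | "activating" | "all";

interface ListUnitsParams {
  type?: UnitType;
  state?: UnitState;
  pattern?: string;
}

// Hypothetical mapping of systemd_list_units parameters onto systemctl.
function listUnitsArgs(p: ListUnitsParams): string[] {
  // --plain/--no-legend give parse-friendly rows; --no-pager avoids blocking on less.
  const args = ["systemctl", "list-units", "--plain", "--no-legend", "--no-pager"];
  if (p.type && p.type !== "all") args.push(`--type=${p.type}`);
  if (p.state === "all") args.push("--all"); // also list inactive units
  else if (p.state) args.push(`--state=${p.state}`);
  if (p.pattern) args.push(p.pattern); // glob passed through to systemctl
  return args;
}
```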
systemd_unit_status
Detailed status of one or more units.
systemd_unit_status({
units: string | string[], // "nginx.service" or ["nginx", "postgres"]
logs?: number // Include N recent log lines (default: 10)
})
Returns:
{
"unit": "nginx.service",
"status": "running",
"status_icon": "✓",
"pid": 1234,
"memory": "45.2M",
"cpu": "0.1%",
"uptime": "5d 12h 30m",
"started_at": "2025-12-24T10:30:00Z",
"recent_logs": ["..."],
"summary": "nginx is healthy, running 5 days with stable memory"
}
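Fields like pid and memory can be read from systemctl show, which prints one Key=Value pair per line (for example with --property=SubState,MainPID,MemoryCurrent). This sketch parses that format and builds part of the response; the property choice, using SubState as status, and the helper names are assumptions, and formatBytes only approximates the compact style systemctl status uses.

```typescript
// Parse `systemctl show` output (one Key=Value per line) into an object.
// Split on the first "=" only, since values may themselves contain "=".
function parseShowOutput(stdout: string): Record<string, string> {
  const props: Record<string, string> = {};
  for (const line of stdout.split("\n")) {
    const eq = line.indexOf("=");
    if (eq > 0) props[line.slice(0, eq)] = line.slice(eq + 1);
  }
  return props;
}

// 1024-based size with one decimal, e.g. 47395635 -> "45.2M".
function formatBytes(bytes: number): string {
  const units = ["B", "K", "M", "G", "T"];
  let value = bytes;
  let i = 0;
  while (value >= 1024 && i < units.length - 1) {
    value /= 1024;
    i++;
  }
  return i === 0 ? `${value}B` : `${value.toFixed(1)}${units[i]}`;
}

interface UnitSummary {
  unit: string;
  status: string;
  pid: number | null;
  memory: string | null;
}

// Hypothetical: build part of the systemd_unit_status response.
function summarize(unit: string, props: Record<string, string>): UnitSummary {
  const pid = Number(props.MainPID); // 0 means no main process
  const mem = Number(props.MemoryCurrent); // NaN for "[not set]" or missing
  return {
    unit,
    status: props.SubState ?? "unknown",
    pid: Number.isFinite(pid) && pid > 0 ? pid : null,
    memory: Number.isFinite(mem) ? formatBytes(mem) : null,
  };
}
```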
systemd_failed_units
Quick view of what's broken.
systemd_failed_units()
Returns:
{
"failed_count": 1,
"units": [
{
"unit": "scout.service",
"failed_at": "2025-12-29T04:00:12Z",
"exit_code": 1,
"last_log": "API key not found"
}
],
"summary": "1 failed unit: scout.service (API key not found)"
}
systemd_timers
Timer status overview.
systemd_timers({
pattern?: string // filter by pattern
})
Returns:
{
"timers": [
{
"timer": "scout.timer",
"service": "scout.service",
"last_run": "2025-12-29T04:00:00Z",
"next_run": "2025-12-30T04:00:00Z",
"schedule": "*-*-* 04:00:00",
"last_result": "success"
}
]
}
systemd_dependencies
Show unit dependency tree.
systemd_dependencies({
unit: string,
direction?: "requires" | "wanted_by" | "both"
})
systemd_cat_unit
View unit file contents (v0.3.0).
systemd_cat_unit({
unit: string // e.g., "nginx" or "nginx.service"
})
Returns:
{
"unit": "nginx.service",
"content": "# /usr/lib/systemd/system/nginx.service\n[Unit]\nDescription=...",
"lines": 24
}
Resource Monitoring (v0.4.0)
systemd_unit_resources
Get current resource usage snapshot.
systemd_unit_resources({
unit: string
})
Returns memory, CPU time, tasks, network I/O, disk I/O with human-readable formatting.
systemd_sample_resources
Sample resource usage over time and calculate trends.
systemd_sample_resources({
unit: string,
samples?: number, // 2-10, default: 5
interval_ms?: number // 100-5000, default: 1000
})
Returns:
{
"unit": "nginx.service",
"sampling": { "samples": 5, "interval_ms": 1000, "duration_ms": 4000 },
"cpu": { "delta_ns": 12500000, "percent": 0.31 },
"memory": {
"min": 45678592, "max": 46123008, "avg": 45900800,
"stable": true
},
"io": { "read_rate_human": "1.2 KB/s", "write_rate_human": "0 B/s" },
"network": { "ingress_rate_human": "4.5 KB/s", "egress_rate_human": "2.1 KB/s" }
}
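The trend fields follow from the first and last samples: with 5 samples at 1000 ms the window spans 4 intervals (4000 ms), and 12,500,000 ns of CPU over 4,000,000,000 ns of wall time is about 0.31%. A sketch of that arithmetic, assuming cumulative counters such as systemd's CPUUsageNSec, MemoryCurrent and IOReadBytes properties are what gets sampled; the 5% tolerance behind "stable" is a hypothetical choice.

```typescript
interface Sample {
  cpuNs: number;       // cumulative CPU time (e.g. CPUUsageNSec)
  memoryBytes: number; // current memory (e.g. MemoryCurrent)
  ioReadBytes: number; // cumulative bytes read (e.g. IOReadBytes)
}

function computeTrend(samples: Sample[], intervalMs: number, stableTolerance = 0.05) {
  if (samples.length < 2) throw new Error("need at least 2 samples");
  // n samples taken intervalMs apart span n - 1 intervals.
  const durationMs = (samples.length - 1) * intervalMs;
  const first = samples[0];
  const last = samples[samples.length - 1];

  // CPU percent = CPU nanoseconds used / wall-clock nanoseconds elapsed.
  const deltaNs = last.cpuNs - first.cpuNs;
  const percent = (deltaNs / (durationMs * 1e6)) * 100;

  const mem = samples.map((s) => s.memoryBytes);
  const min = Math.min(...mem);
  const max = Math.max(...mem);
  const avg = mem.reduce((sum, m) => sum + m, 0) / mem.length;

  return {
    sampling: { samples: samples.length, interval_ms: intervalMs, duration_ms: durationMs },
    cpu: { delta_ns: deltaNs, percent: Math.round(percent * 100) / 100 },
    // "stable" if the min-to-max spread stays within the tolerance of the average.
    memory: { min, max, avg: Math.round(avg), stable: (max - min) / avg <= stableTolerance },
    io: { read_bytes_per_sec: (last.ioReadBytes - first.ioReadBytes) / (durationMs / 1000) },
  };
}
```

Write, ingress and egress rates would follow the same delta-over-duration pattern as the read rate.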
Journal/Logs
systemd_journal_query
Query journal with filters.
systemd_journal_query({
unit?: string | string[],
since?: string, // "-1h", "-30m", "2025-12-29", ISO timestamp
until?: string,
priority?: "emerg" | "alert" | "crit" | "err" | "warning" | "notice" | "info" | "debug",
grep?: string, // filter log content
limit?: number, // max lines (default: 100)
output?: "short" | "json" | "verbose"
})
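These filters map naturally onto journalctl options: -u (repeatable), --since/--until, -p, --grep, -n and -o are all real journalctl flags. What follows is an assumed mapping, not the server's code. Note that -p with a single level also includes every more severe level, and --grep requires a journalctl built with PCRE2.

```typescript
type Priority = "emerg" | "alert" | "crit" | "err" | "warning" | "notice" | "info" | "debug";

interface JournalQuery {
  unit?: string | string[];
  since?: string;
  until?: string;
  priority?: Priority;
  grep?: string;
  limit?: number;
  output?: "short" | "json" | "verbose";
}

// Hypothetical mapping of systemd_journal_query parameters onto journalctl.
// No -f here, so the command always terminates on its own.
function journalQueryArgs(q: JournalQuery): string[] {
  const args = ["journalctl", "--no-pager"];
  const units = q.unit === undefined ? [] : Array.isArray(q.unit) ? q.unit : [q.unit];
  for (const u of units) args.push("-u", u); // one -u per unit
  if (q.since) args.push("--since", q.since); // e.g. "-1h" or an ISO timestamp
  if (q.until) args.push("--until", q.until);
  if (q.priority) args.push("-p", q.priority);
  if (q.grep) args.push("--grep", q.grep);
  args.push("-n", String(q.limit ?? 100)); // README default: 100 lines
  args.push("-o", q.output ?? "short");
  return args;
}
```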
systemd_journal_tail
Stream recent/live logs. Async streaming supported.
systemd_journal_tail({
unit: string,
lines?: number, // initial lines (default: 50)
follow?: boolean // live tail (default: false)
})
systemd_boot_log
Important events from current boot.
systemd_boot_log({
priority?: "err" | "warning" | "notice", // minimum priority
limit?: number
})
Actions
systemd_start
Start unit(s). Requires start_stop permission.
systemd_start({ units: string | string[] })
systemd_stop
Stop unit(s). Requires start_stop permission.
systemd_stop({ units: string | string[] })
systemd_restart
Restart unit(s). Requires restart permission.
systemd_restart({ units: string | string[] })
systemd_reload
Reload unit configuration without a full restart (systemctl reload runs the unit's ExecReload=, often a SIGHUP). Requires restart permission.
systemd_reload({ units: string | string[] })
systemd_enable
Enable unit for boot. Requires enable_disable permission.
systemd_enable({ units: string | string[], now?: boolean })
systemd_disable
Disable unit from boot. Requires enable_disable permission.
systemd_disable({ units: string | string[], now?: boolean })
systemd_daemon_reload
Reload systemd manager. Requires daemon_reload permission.
systemd_daemon_reload()
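The permission requirements listed for the action tools reduce to a lookup table. A minimal sketch: the table mirrors this section, while treating every unlisted tool as needing only read is an assumption. The README also does not say whether now: true on systemd_enable or systemd_disable additionally requires start_stop.

```typescript
type Level = "read" | "restart" | "start_stop" | "enable_disable" | "daemon_reload";

// Tool -> required permission level, as documented for the action tools.
const REQUIRED_LEVEL: Record<string, Level> = {
  systemd_start: "start_stop",
  systemd_stop: "start_stop",
  systemd_restart: "restart",
  systemd_reload: "restart",
  systemd_enable: "enable_disable",
  systemd_disable: "enable_disable",
  systemd_daemon_reload: "daemon_reload",
};

type Check = { ok: true } | { ok: false; error: string };

// Hypothetical gate run before any command executes.
function checkTool(tool: string, granted: Record<Level, boolean>): Check {
  const level: Level = REQUIRED_LEVEL[tool] ?? "read"; // assumption: others are read-only
  if (granted[level]) return { ok: true };
  return { ok: false, error: `permission_denied: ${tool} requires ${level}` };
}
```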
Analysis
systemd_analyze_boot
Boot time analysis.
systemd_analyze_boot({
blame?: boolean, // show time per unit
critical_chain?: boolean
})
systemd_diagnose
AI-powered failure diagnosis. Gathers context and optionally uses Haiku fallback.
systemd_diagnose({
unit: string,
use_ai?: boolean // use Haiku fallback for synthesis (default: true if configured)
})
Returns:
{
"unit": "scout.service",
"status": "failed",
"exit_code": 1,
"context": {
"logs": "[... recent logs ...]",
"dependencies": ["network-online.target"],
"environment": "No ANTHROPIC_API_KEY"
},
"synthesis": {
"analysis": "Service failed due to missing API key in environment...",
"suggested_fix": "Add Environment=ANTHROPIC_API_KEY=... to unit file",
"confidence": "high"
}
}
Health & Resilience
systemd_health
Get NEVERHANG v2.0 health status, circuit breaker state, and A.L.A.N. database stats.
systemd_health()
Returns:
{
"status": "healthy",
"circuit_breaker": {
"state": "closed",
"failures": 0,
"last_failure": null,
"opened_at": null
},
"health_monitor": {
"consecutive_failures": 0,
"last_check": "2025-12-30T10:15:00Z",
"degraded": false
},
"database": {
"path": "/home/user/.cache/systemd-mcp/systemd-mcp.db",
"command_history_count": 1247,
"health_check_count": 86,
"oldest_command": "2025-12-23T14:30:00Z"
},
"config": {
"ssh_enabled": false,
"adaptive_timeout": true,
"timeouts": {
"status": 5000,
"query": 10000,
"action": 30000,
"heavy": 60000,
"diagnostic": 90000
}
}
}
NEVERHANG v2.0 Architecture
Every systemd command can hang. systemctl status on a wedged service waits forever. journalctl -f never returns.
NEVERHANG v2.0 guarantees your MCP server stays responsive. No command hangs forever. System health is monitored. Failures are classified and handled intelligently.
Category-Based Timeouts
Commands are classified by expected duration:
| Category | Timeout | Examples |
|---|---|---|
| status | 5s | systemctl status, systemctl is-active |
| query | 10s | journalctl queries, systemctl list-units |
| action | 30s | start, stop, restart, enable, disable |
| heavy | 60s | Boot analysis, log streaming |
| diagnostic | 90s | AI-powered diagnosis with log synthesis |
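Category timeouts can be enforced by racing the command against a timer. In this sketch (hypothetical names, in-process only) the timer is cleared as soon as the command settles, so a fast command never holds the event loop open. Per the Process Management notes, the real server also kills the hung subprocess, which a promise race cannot do on its own.

```typescript
type Category = "status" | "query" | "action" | "heavy" | "diagnostic";

// Base timeouts from the category table, in milliseconds.
const BASE_TIMEOUT_MS: Record<Category, number> = {
  status: 5000, query: 10000, action: 30000, heavy: 60000, diagnostic: 90000,
};

class TimeoutError extends Error {
  constructor(readonly category: Category, readonly ms: number) {
    super(`timeout: ${category} command exceeded ${ms}ms`);
    this.name = "TimeoutError";
  }
}

// Settle with the command's result, or reject with TimeoutError after ms.
function withTimeout<T>(work: Promise<T>, category: Category, ms: number = BASE_TIMEOUT_MS[category]): Promise<T> {
  return new Promise<T>((resolve, reject) => {
    const timer = setTimeout(() => reject(new TimeoutError(category, ms)), ms);
    work.then(
      (value) => { clearTimeout(timer); resolve(value); },
      (err: unknown) => { clearTimeout(timer); reject(err); },
    );
  });
}
```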
A.L.A.N. Database
As Long As Necessary — SQLite database for persistent state across restarts.
~/.cache/systemd-mcp/systemd-mcp.db
What it stores:
- Circuit breaker state — Survives restarts, tracks open/closed/half-open state
- Command history — 7 days of execution records (success, failure, latency)
- Health checks — 24 hours of background ping results
- P95 latency — Per-command performance metrics for adaptive timeout
Automatic cleanup: Old records pruned on startup (7d commands, 24h health checks).
Circuit Breaker
Protects against cascade failures when systemd is unresponsive.
| State | Behavior |
|---|---|
| Closed | Normal operation |
| Open | Commands blocked, returns immediately with CIRCUIT_OPEN |
| Half-Open | Testing recovery with limited requests |
Configuration:
- 5 failures in 60s → Circuit opens
- Open duration: 30s
- Recovery threshold: 2 successes to close
Persistence: State survives server restarts via A.L.A.N. database.
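The circuit breaker's state machine, using the thresholds above, fits in a small class. This is an in-memory sketch with a hypothetical API: it does not persist state to the A.L.A.N. database and does not limit how many requests may probe while half-open. The injectable clock makes the transitions easy to test.

```typescript
type BreakerState = "closed" | "open" | "half_open";

interface BreakerConfig {
  failureThreshold: number;  // failures inside the window that open the circuit
  failureWindowMs: number;
  openDurationMs: number;    // time spent open before probing
  recoveryThreshold: number; // half-open successes needed to close
}

const DEFAULT_BREAKER: BreakerConfig = {
  failureThreshold: 5, failureWindowMs: 60000, openDurationMs: 30000, recoveryThreshold: 2,
};

class CircuitBreaker {
  private state: BreakerState = "closed";
  private failures: number[] = []; // timestamps of recent failures
  private openedAt = 0;
  private halfOpenSuccesses = 0;

  constructor(private readonly cfg: BreakerConfig = DEFAULT_BREAKER,
              private readonly now: () => number = Date.now) {}

  getState(): BreakerState {
    // After the open duration elapses, allow probing.
    if (this.state === "open" && this.now() - this.openedAt >= this.cfg.openDurationMs) {
      this.state = "half_open";
      this.halfOpenSuccesses = 0;
    }
    return this.state;
  }

  // When false, the caller returns CIRCUIT_OPEN immediately.
  canExecute(): boolean {
    return this.getState() !== "open";
  }

  recordSuccess(): void {
    if (this.getState() === "half_open" && ++this.halfOpenSuccesses >= this.cfg.recoveryThreshold) {
      this.state = "closed";
      this.failures = [];
    }
  }

  recordFailure(): void {
    const t = this.now();
    const state = this.getState();
    if (state === "open") return;
    if (state === "half_open") { this.trip(t); return; } // failed probe: reopen
    // Keep only failures inside the sliding window, then add this one.
    this.failures = this.failures.filter((f) => t - f < this.cfg.failureWindowMs);
    this.failures.push(t);
    if (this.failures.length >= this.cfg.failureThreshold) this.trip(t);
  }

  private trip(t: number): void {
    this.state = "open";
    this.openedAt = t;
    this.failures = [];
  }
}
```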
Health Monitor
Background thread monitors systemd health independently.
- Healthy: Check every 30s
- Degraded: Check every 5s (more aggressive)
- Ping command: systemctl --version (minimal overhead)
- SSH support: Uses SSH host when configured
Adaptive Timeout
When enabled, adjusts timeouts based on observed latency:
adjusted_timeout = max(base_timeout, P95_latency * 2)
Uses last 100 executions of each command category from A.L.A.N. database.
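Here is the adaptive formula in code, assuming the nearest-rank definition of P95 (the README does not say which percentile method is used) and a latency history ordered oldest to newest. Function names are hypothetical.

```typescript
// Nearest-rank P95: sort ascending and take the value at rank ceil(0.95 * n).
// Integer arithmetic avoids floating-point surprises in the rank.
function p95(latenciesMs: number[]): number {
  if (latenciesMs.length === 0) return 0;
  const sorted = [...latenciesMs].sort((a, b) => a - b);
  const rank = Math.ceil((95 * sorted.length) / 100);
  return sorted[rank - 1];
}

// adjusted_timeout = max(base_timeout, P95_latency * 2), computed over only
// the most recent `window` executions (100 per the README).
function adaptiveTimeout(baseMs: number, history: number[], window = 100): number {
  const recent = history.slice(-window);
  return Math.max(baseMs, p95(recent) * 2);
}
```

Because of the max(), adaptation can only lengthen a timeout: a fast system keeps the base value, and the timeout grows only once the P95 exceeds half of it.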
Failure Taxonomy
Every failure is classified for intelligent error handling:
| Type | Description |
|---|---|
| timeout | Command exceeded time limit |
| connection_failed | SSH connection failed (remote mode) |
| auth_failed | Permission denied |
| circuit_open | Circuit breaker is open |
| command_error | Non-zero exit code |
| permission_denied | Unit blacklisted or permission level insufficient |
| cancelled | Operation cancelled by client |
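A sketch of how a command outcome might be mapped onto this taxonomy. The check order, the Outcome shape and the stderr patterns are hypothetical heuristics; the one firm fact used is that ssh exits with 255 when ssh itself fails, though a remote command can also exit 255, so that signal is not conclusive.

```typescript
type FailureType =
  | "timeout" | "connection_failed" | "auth_failed" | "circuit_open"
  | "command_error" | "permission_denied" | "cancelled";

interface Outcome {
  exitCode: number | null;   // null when the process was killed
  stderr: string;
  timedOut?: boolean;
  cancelled?: boolean;
  circuitOpen?: boolean;
  blockedByPolicy?: boolean; // unit blacklisted or permission level missing
  viaSsh?: boolean;
}

// Returns null for success. Policy and breaker checks come first because
// they are decided before any command runs.
function classifyFailure(o: Outcome): FailureType | null {
  if (o.blockedByPolicy) return "permission_denied";
  if (o.circuitOpen) return "circuit_open";
  if (o.cancelled) return "cancelled";
  if (o.timedOut) return "timeout";
  if (/permission denied|access denied|authentication required/i.test(o.stderr)) return "auth_failed";
  if (o.viaSsh && o.exitCode === 255) return "connection_failed"; // ssh's own error code
  if (o.exitCode !== 0) return "command_error";
  return null;
}
```

Some non-zero exits are not failures for status queries: systemctl status exits 3 when a unit is not running, so a caller may want to treat that code as a result rather than a command_error.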
Process Management
- All subprocesses tracked with PIDs
- Hung processes killed after timeout
- Zombie cleanup on shutdown
- Graceful shutdown handlers (SIGINT, SIGTERM)
Configuration
{
"neverhang": {
"status_timeout_ms": 5000,
"query_timeout_ms": 10000,
"action_timeout_ms": 30000,
"heavy_timeout_ms": 60000,
"diagnostic_timeout_ms": 90000,
"circuit_failure_threshold": 5,
"circuit_failure_window_ms": 60000,
"circuit_open_duration_ms": 30000,
"circuit_recovery_threshold": 2,
"health_check_interval_ms": 30000,
"health_degraded_interval_ms": 5000,
"health_check_timeout_ms": 2000,
"adaptive_timeout": true
}
}
Why This Architecture?
MCP servers are single-threaded JSON-RPC handlers. When Claude calls systemctl status on a wedged service, the entire connection blocks. Claude waits. The user sees nothing. Eventually something times out at a higher layer and the interaction is ruined.
NEVERHANG v1 solved the immediate problem: timeouts. But it was stateless; every invocation started fresh with no memory of what happened before.
A.L.A.N. transforms reactive timeouts into operational intelligence.
Without persistence:
- Server restarts → circuit resets → retries broken systemd → fails again
- Every timeout is static, regardless of actual system behavior
- No visibility into patterns or trends
With A.L.A.N.:
- Circuit state survives restarts (we don't re-learn through failure)
- P95 latency per category enables adaptive timeouts
- Health trends reveal patterns invisible to stateless systems
- Success rates become diagnostic signals, not just individual outcomes
Emergent Behaviors
When circuit breaker + adaptive timeout + health monitoring + persistence combine:
Self-Healing with Memory
- Gradual recovery through half-open state testing
- Pattern recognition (recurring vs. one-off failures)
- Adaptive thresholds based on historical success rates
Intelligent Degradation
- Health monitor shifts 30s → 5s intervals when degraded
- Persists across restarts—server doesn't start naive
- Latency trends visible for root cause analysis
Operational Visibility for AI
Claude doesn't have intuition about "the system feels sluggish." Claude operates on data:
| Signal | What Claude Can Do |
|---|---|
| Circuit open | Don't retry, explain to user |
| P95 jumped 50ms → 2000ms | Something changed, investigate |
| Success rate dropped to 70% | Pattern, not fluke—dig deeper |
| Health trend degrading | Proactive warning before failure |
What "Fully Functioning" Looks Like
| Scenario | System Behavior |
|---|---|
| Normal | Commands execute, latency tracked, circuit closed |
| Transient failure | Recorded, circuit tracks but stays closed, next attempt proceeds |
| Systemic failure | Circuit opens → commands return CIRCUIT_OPEN immediately → health monitor increases frequency → auto-recovery when systemd responds |
| Degraded performance | Adaptive timeout adjusts, commands complete, health endpoint shows degradation |
| Post-restart | Reads state from A.L.A.N., doesn't start naive, degradation patterns preserved |
This is the difference between a tool and an intelligent subsystem. A.L.A.N. is the memory that makes NEVERHANG wise instead of just cautious.
Fallback AI
Optional Haiku integration for complex log analysis.
{
"fallback": {
"enabled": true,
"provider": "anthropic",
"model": "claude-haiku-4-5",
"api_key_env": "SYSTEMD_MCP_FALLBACK_KEY",
"max_context_lines": 200,
"max_tokens": 500
}
}
When used:
- systemd_diagnose with use_ai: true
- Complex failure analysis
- Boot time optimization suggestions
Not used for:
- Simple status queries
- Log retrieval
- Start/stop/restart actions
Configuration
Config File
~/.config/systemd-mcp/config.json or specified via --config:
{
"permissions": {
"read": true,
"restart": false,
"start_stop": false,
"enable_disable": false,
"daemon_reload": false,
"whitelist": [],
"blacklist": [
"sshd.service",
"firewalld.service",
"systemd-*.service"
]
},
"neverhang": {
"status_timeout_ms": 5000,
"query_timeout_ms": 10000,
"action_timeout_ms": 30000,
"heavy_timeout_ms": 60000,
"diagnostic_timeout_ms": 90000,
"circuit_failure_threshold": 5,
"circuit_failure_window_ms": 60000,
"circuit_open_duration_ms": 30000,
"circuit_recovery_threshold": 2,
"health_check_interval_ms": 30000,
"health_degraded_interval_ms": 5000,
"health_check_timeout_ms": 2000,
"adaptive_timeout": true
},
"fallback": {
"enabled": false
}
}
Claude Code Integration
# Clone and build
git clone https://github.com/ArkTechNWA/systemd-mcp.git
cd systemd-mcp
npm install && npm run build
# Register with Claude Code (read-only by default)
claude mcp add --transport stdio systemd -- node $(pwd)/build/index.js
# Or with permissions enabled
claude mcp add --transport stdio systemd -- \
bash -c "SYSTEMD_MCP_ALLOW_RESTART=1 node $(pwd)/build/index.js"
# Or full bypass (you own the consequences)
claude mcp add --transport stdio systemd -- \
bash -c "SYSTEMD_MCP_BYPASS=1 node $(pwd)/build/index.js"
Installation
# npm (when published)
npm install -g @arktechnwa/systemd-mcp
# From source
git clone https://github.com/ArktechNWA/systemd-mcp.git
cd systemd-mcp
npm install
npm link
Requirements
- Linux with systemd
- Node.js 18+
- systemctl, journalctl in PATH
- Optional: Anthropic API key for fallback AI
Examples
Read-only monitoring (default)
systemd-mcp
# Can: list units, check status, query logs
# Cannot: start, stop, restart, enable, disable
Service operator
systemd-mcp --config operator.json
# operator.json enables restart + start_stop
# Can manage services but not boot behavior
Full access
systemd-mcp --bypass-permissions
# Full systemd control
# You own the consequences
Security Considerations
- Default safe — Read-only by default
- Blacklist critical — sshd, firewall, systemd protected by default
- No credential exposure — Environment variables not leaked in logs
- Audit trail — All actions logged
- User responsibility — Bypass mode exists but user must enable it
Contributing
Contributions welcome! Please read CONTRIBUTING.md (coming soon).
License
MIT License - See LICENSE file.
Credits
Created by Claude in collaboration with MOD.
Part of the ArktechNWA MCP Toolshed — Claude's public-facing open source contributions.
Built because AI assistants deserve to see and understand the systems they help maintain.