K8s Doctor MCP

AI-powered Kubernetes diagnostics that analyzes pod crashes, logs, and cluster health to provide root cause analysis and actionable solutions for common issues like CrashLoopBackOff, OOM kills, and connection errors.

🏥 K8s Doctor MCP

AI-powered Kubernetes cluster diagnostics and intelligent debugging recommendations

Why K8s Doctor?

When a Kubernetes issue strikes, developers typically run through an endless loop of:

  • kubectl get pods
  • kubectl logs
  • kubectl describe
  • Frantically searching Stack Overflow...

K8s Doctor shortens that loop. It's not just a kubectl wrapper; it's an AI-powered diagnostic tool that:

  • 🔍 Analyzes root causes - Goes beyond simple status checks
  • 🧠 Detects error patterns - Recognizes common issues (Connection Refused, OOM, DNS failures)
  • 💡 Provides actionable solutions - Gives you exact kubectl commands to fix problems
  • 📊 Explains exit codes - Tells you what exit 137, 143, and 1 actually mean
  • 🎯 Matches log patterns - Finds the signal in thousands of log lines
  • 🏥 Scores health - Rates your pod/cluster health from 0 to 100
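
To make the exit-code item concrete: by the usual shell convention, a process killed by a signal exits with 128 plus the signal number. Exit 137 is therefore SIGKILL (9), which is what the kernel OOM killer sends, and exit 143 is SIGTERM (15). Exit 1 is a generic application error. The sketch below only illustrates that decoding. `explainExitCode` and its messages are hypothetical and not taken from K8s Doctor's source.

```typescript
// Hypothetical helper, not K8s Doctor's actual implementation.
// Assumes the common "128 + signal number" convention for signal deaths.
const SIGNAL_NAMES: Record<number, string> = {
  2: "SIGINT",
  6: "SIGABRT",
  9: "SIGKILL",
  11: "SIGSEGV",
  15: "SIGTERM",
};

function explainExitCode(code: number): string {
  if (code === 0) {
    return "Exited normally";
  }
  if (code > 128 && code <= 128 + 64) {
    const signal = code - 128;
    const name = SIGNAL_NAMES[signal] ?? `signal ${signal}`;
    if (signal === 9) {
      return `Killed by ${name} (often the OOM killer; check memory limits)`;
    }
    if (signal === 15) {
      return `Killed by ${name} (graceful termination was requested)`;
    }
    return `Killed by ${name}`;
  }
  return `Application error (exit ${code}); check the container logs`;
}

// The three codes named in the feature list above.
for (const code of [137, 143, 1]) {
  console.log(`${code}: ${explainExitCode(code)}`);
}
```

A SIGKILL exit does not prove an OOM kill on its own. The pod's last termination reason (for example `OOMKilled` in `kubectl describe pod`) is the stronger signal.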

Features

| Tool | Description |
|------|-------------|
| `diagnose-pod` | Comprehensive pod diagnostics - analyzes status, events, and resources, and provides a health score |
| `debug-crashloop` | CrashLoopBackOff specialist - decodes exit codes, analyzes logs, finds the root cause |
| `analyze-logs` | Smart log analysis - detects error patterns, suggests fixes for common issues |
| `check-resources` | Resource usage - validates CPU/Memory limits, warns about OOM risks |
| `full-diagnosis` | Cluster health check - scans all nodes and pods for issues |
| `check-events` | Event analysis - filters and analyzes Warning events |
| `list-namespaces` | Namespace listing - quick overview of all namespaces |
| `list-pods` | Pod listing - shows problematic pods with status indicators |

Installation

Via npm (recommended)

npm install -g @zerry_jin/k8s-doctor-mcp

From source

git clone https://github.com/ongjin/k8s-doctor-mcp.git
cd k8s-doctor-mcp
npm install && npm run build

Setup with Claude Code

# After npm global install
claude mcp add --scope project k8s-doctor -- k8s-doctor-mcp

# Or from source build
claude mcp add --scope project k8s-doctor -- node /path/to/k8s-doctor-mcp/dist/index.js

Quick Setup (Auto-approve Tools)

Tired of manually approving tool execution every time? Follow these steps to enable auto-approval.

🖥️ For Claude Desktop App Users

  1. Restart the Claude Desktop App.
  2. Ask your first question using k8s-doctor.
  3. When the permission dialog appears, check the box "Always allow requests from this server" and click Allow. (Future requests will execute automatically without prompts.)

⌨️ For Claude Code (CLI) Users

If you are using the claude terminal command, manage permissions via the interactive menu:

  1. Run claude in your terminal.
  2. Type /permissions in the prompt and press Enter.
  3. Select Global Permissions (or Project Permissions) > Allowed Tools.
  4. Enter mcp__k8s-doctor__* to allow all tools, or add specific tools individually.

💡 Tip: For most use cases, allowing diagnose-pod, debug-crashloop, and analyze-logs is sufficient. These three cover 90% of debugging scenarios.

Recommended configuration:

# Balanced approach - allow main diagnostic tools
claude config add allowedTools \
  "mcp__k8s-doctor__diagnose-pod" \
  "mcp__k8s-doctor__debug-crashloop" \
  "mcp__k8s-doctor__analyze-logs" \
  "mcp__k8s-doctor__full-diagnosis"

Prerequisites

  • kubectl configured and working (kubectl cluster-info should succeed)
  • kubeconfig file in default location (~/.kube/config) or KUBECONFIG env var set
  • Node.js 18 or higher
  • Access to a Kubernetes cluster (local like minikube/kind, or remote)
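
The kubeconfig prerequisite follows kubectl's lookup order: `KUBECONFIG` wins when it is set, and may list several files joined by the platform path delimiter. Otherwise `~/.kube/config` is used. The following is a minimal sketch of that order, assuming nothing about how K8s Doctor resolves the file internally. `resolveKubeconfigPaths` is a hypothetical helper, and its inputs are passed in explicitly so the result does not depend on the machine it runs on.

```typescript
// Hypothetical helper: lists the kubeconfig files to try, in order.
// Assumption: mirrors kubectl's convention, where KUBECONFIG may hold several
// paths joined by the path delimiter (":" on Linux/macOS, ";" on Windows)
// and ~/.kube/config is the fallback when the variable is unset or empty.
function resolveKubeconfigPaths(
  kubeconfigEnv: string | undefined,
  homeDir: string,
  delimiter: string = ":",
): string[] {
  const fromEnv = (kubeconfigEnv ?? "")
    .split(delimiter)
    .map((entry) => entry.trim())
    .filter((entry) => entry.length > 0);
  if (fromEnv.length > 0) {
    return fromEnv;
  }
  return [`${homeDir}/.kube/config`];
}

// Fixed example inputs; the home directory and file paths are made up.
console.log(resolveKubeconfigPaths(undefined, "/home/dev"));
console.log(resolveKubeconfigPaths("/etc/k8s/a.yaml:/etc/k8s/b.yaml", "/home/dev"));
```

In a Node.js process you would pass `process.env.KUBECONFIG`, `os.homedir()`, and `path.delimiter` instead of the literals used here.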

Usage Examples

Example 1: Diagnose a CrashLooping Pod

You: "My pod 'api-server' in namespace 'production' is CrashLooping. What's wrong?"

Claude (using k8s-doctor):
🔍 CrashLoopBackOff Diagnosis

Exit Code: 137 (OOM Killed)
Root Cause: Container was killed due to Out Of Memory

Solution:
Increase memory limit:
```yaml
resources:
  limits:
    memory: "512Mi"  # Increase from current value
```

Relevant logs:

  • Line 1234: Error: JavaScript heap out of memory
  • Line 1256: FATAL ERROR: Reached heap limit

### Example 2: Analyze Application Logs

You: "Analyze logs for pod 'backend-worker' and tell me what's failing"

Claude (using analyze-logs):
📝 Log Analysis

Detected Error Patterns:

🔴 Database Connection Error (15 occurrences)

Possible Causes:

  • DB service not ready
  • Wrong connection string
  • Authentication failed

Solutions:

  • Check DB pod status
  • Verify environment variables (ConfigMap/Secret)
  • Check service endpoints: kubectl get endpoints

🟡 Timeout (8 occurrences)
Likely cause: Response time too slow or network delay
Solution: Increase timeout values or optimize service performance


### Example 3: Cluster Health Check

You: "Check overall cluster health"

Claude (using full-diagnosis):
🏥 Cluster Health Diagnosis

Overall Score: 72/100 💛

Nodes: 3/3 Ready ✅
Pods: 45/52 Running

  • CrashLoop: 2 🔥
  • Pending: 5 ⏳

Critical Issues:
🔴 Pod "payment-service" CrashLooping (exit 1)
🔴 Pod "worker-3" OOM Killed

Recommendations:

  • Fix 2 CrashLoop pods immediately
  • Check if pending pods lack resources
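
One simple way to arrive at a 0-100 score like the one in the example above is to start at 100 and subtract a penalty per detected issue. The README does not document the real formula, so everything in this sketch is an assumption: the `Issue` shape, the severity names, and the penalty weights are hypothetical placeholders. The result is not expected to match the 72/100 shown in the sample output.

```typescript
// Hypothetical scoring sketch. The weights are assumptions chosen for
// illustration and are NOT taken from K8s Doctor's source.
type Severity = "critical" | "high" | "medium" | "low";

interface Issue {
  severity: Severity;
  message: string;
}

const PENALTY: Record<Severity, number> = {
  critical: 25,
  high: 15,
  medium: 8,
  low: 3,
};

// Starts at 100, subtracts a penalty per issue, and clamps to the 0-100 range.
function healthScore(issues: Issue[]): number {
  const totalPenalty = issues.reduce(
    (sum, issue) => sum + PENALTY[issue.severity],
    0,
  );
  return Math.max(0, Math.min(100, 100 - totalPenalty));
}

// Issues loosely modelled on the sample output above.
const sampleIssues: Issue[] = [
  { severity: "critical", message: 'Pod "payment-service" CrashLooping (exit 1)' },
  { severity: "critical", message: 'Pod "worker-3" OOM Killed' },
  { severity: "medium", message: "5 pods Pending" },
];

console.log(`Overall Score: ${healthScore(sampleIssues)}/100`);
```

Clamping keeps the score meaningful when a cluster has many issues. Without it, a handful of critical findings would push the value below zero.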

## How It Works

1. **Connects to your cluster** via kubeconfig (same as kubectl)
2. **Gathers comprehensive data** - pod status, events, logs, resource usage
3. **Applies pattern matching** - recognizes common error patterns from production experience
4. **Analyzes root causes** - explains WHY a pod is failing instead of only showing its status
5. **Provides solutions** - gives exact commands and YAML to fix issues

## Error Patterns Detected

K8s Doctor recognizes these common patterns:

- 🔴 **Connection Refused** - Service not ready, wrong port, network policy
- 🔴 **Database Connection Errors** - DB auth, wrong connection strings
- 🔴 **Out of Memory** - OOM kills, memory leaks, undersized limits
- 🟠 **File Not Found** - ConfigMap not mounted, wrong paths
- 🟠 **Permission Denied** - SecurityContext issues, fsGroup problems
- 🟠 **DNS Resolution Failed** - CoreDNS issues, wrong service names
- 🟡 **Port Already in Use** - Multiple processes on same port
- 🟡 **Timeout** - Slow responses, network delays
- 🟡 **SSL/TLS Errors** - Expired certs, missing CA bundles
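
Pattern detection of this kind can be sketched as a table of regular expressions scanned over the log lines. The version below is illustrative only. The regexes, the `ErrorPattern` and `PatternMatch` shapes, and the `analyzeLogs` function are hypothetical, not the rules K8s Doctor ships. It covers just four of the patterns listed above, with severities mapped from the 🔴/🟠/🟡 markers.

```typescript
// Hypothetical pattern table; the regexes are illustrative assumptions.
type Severity = "critical" | "high" | "medium";

interface ErrorPattern {
  name: string;
  severity: Severity;
  regex: RegExp; // no "g" flag, so test() keeps no state between calls
}

interface PatternMatch {
  name: string;
  severity: Severity;
  occurrences: number;
  firstLine: number; // 1-based line number of the first hit
}

const PATTERNS: ErrorPattern[] = [
  { name: "Connection Refused", severity: "critical", regex: /connection refused|ECONNREFUSED/i },
  { name: "Out of Memory", severity: "critical", regex: /out of memory|OOMKilled/i },
  { name: "DNS Resolution Failed", severity: "high", regex: /ENOTFOUND|no such host|could not resolve/i },
  { name: "Timeout", severity: "medium", regex: /timed? ?out|ETIMEDOUT/i },
];

// Counts how many lines hit each pattern and records where it first appeared.
function analyzeLogs(logText: string): PatternMatch[] {
  const lines = logText.split(/\r?\n/);
  const matches: PatternMatch[] = [];
  for (const pattern of PATTERNS) {
    let occurrences = 0;
    let firstLine = 0;
    lines.forEach((line, index) => {
      if (pattern.regex.test(line)) {
        occurrences += 1;
        if (firstLine === 0) {
          firstLine = index + 1;
        }
      }
    });
    if (occurrences > 0) {
      matches.push({ name: pattern.name, severity: pattern.severity, occurrences, firstLine });
    }
  }
  return matches;
}

// Made-up log lines for demonstration.
const sampleLogs = [
  "INFO  starting worker",
  "ERROR connect ECONNREFUSED 10.0.0.12:5432",
  "WARN  request timed out after 30s",
  "ERROR connect ECONNREFUSED 10.0.0.12:5432",
].join("\n");

console.log(analyzeLogs(sampleLogs));
```

Keeping the patterns in a data table rather than in branching code means a new detection is a one-line addition.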

## Architecture

    k8s-doctor-mcp/
    ├── src/
    │   ├── index.ts                  # MCP server with all tools
    │   ├── types.ts                  # TypeScript type definitions
    │   ├── diagnostics/
    │   │   ├── pod-diagnostics.ts    # Pod health analysis
    │   │   └── cluster-health.ts     # Cluster-wide diagnostics
    │   ├── analyzers/
    │   │   └── log-analyzer.ts       # Smart log pattern matching
    │   └── utils/
    │       ├── k8s-client.ts         # Kubernetes API client
    │       └── formatters.ts         # Output formatting utilities
    └── package.json


## Security Considerations

- K8s Doctor uses **read-only** Kubernetes API calls (the `get` and `list` verbs; `describe` is a kubectl convenience built on those reads)
- Requires the same permissions as `kubectl get/describe/logs`
- Never modifies cluster state
- kubeconfig credentials stay local
- No data sent to external servers

## Troubleshooting

### "kubeconfig not found"
```bash
# Verify kubectl works
kubectl cluster-info

# Check kubeconfig location
echo $KUBECONFIG

# Test with explicit path
export KUBECONFIG=~/.kube/config
```

"Permission denied"

# Check your cluster permissions
kubectl auth can-i get pods --all-namespaces

# You need at least read access to:
# - pods, events, namespaces, nodes

"Connection refused to cluster"

# Verify cluster connectivity
kubectl get nodes

# For local clusters (minikube/kind)
minikube status
kind get clusters

Development

# Clone and install
git clone https://github.com/ongjin/k8s-doctor-mcp.git
cd k8s-doctor-mcp
npm install

# Development mode
npm run dev

# Build
npm run build

# Test with Claude Code
npm run build
claude mcp add --scope project k8s-doctor-dev -- node "$(pwd)/dist/index.js"

Contributing

Contributions welcome! Especially:

  • 🆕 New error pattern detections
  • 🌍 Internationalization (more languages)
  • 📊 Metrics integration (Prometheus, etc.)
  • 🧪 Test coverage
  • 📖 Documentation improvements

Roadmap

  • [ ] Metrics Server integration (real-time CPU/Memory usage)
  • [ ] Network policy diagnostics
  • [ ] Storage/PVC troubleshooting
  • [ ] Helm chart analysis
  • [ ] Multi-cluster support
  • [ ] Interactive debugging mode
  • [ ] Export reports (PDF, HTML)

License

MIT © zerry

Acknowledgments

Built with:

Star History

If this tool saves you debugging time, please ⭐ star the repo!

Author

zerry

  • GitHub: @zerry
  • Created for the DevOps community who are tired of kubectl hell 😅

Made with ❤️ for Kubernetes users drowning in logs
