Okay, here's how you can access monitor and cluster logs from Datadog, broken down into steps and considerations:

**1. Accessing Monitor Logs:**

* **From the Monitor Page:**
  1. **Navigate to Monitors:** In the Datadog UI, go to "Monitors" -> "Manage Monitors".
  2. **Find the Monitor:** Locate the specific monitor you're interested in. You can use the search bar, filters (e.g., by tag, name, status), or browse the list.
  3. **Monitor Status and Events:** Click on the monitor's name to open its details page. Here you'll see:
     * **Monitor Status:** The current status of the monitor (OK, Alert, Warning, No Data).
     * **Events Timeline:** A timeline of events related to the monitor. This is where you'll find when the monitor triggered, when it recovered, and any associated messages.
     * **Event Details:** Click on a specific event in the timeline to see more details. This often includes:
       * The time the event occurred.
       * The message associated with the event (which might contain information about the cause of the alert).
       * Links to related logs, metrics, or traces (if configured). This is crucial for troubleshooting.
* **Using the Event Explorer:**
  1. **Navigate to Event Explorer:** In the Datadog UI, go to "Events" -> "Explorer".
  2. **Filter by Monitor:** Use the search bar or filters to narrow down the events to those related to your specific monitor. You can filter by:
     * `monitor:<monitor_name>` (replace `<monitor_name>` with the name of your monitor)
     * `monitor_id:<monitor_id>` (replace `<monitor_id>` with the ID of your monitor, which you can find on the monitor's details page)
     * `status:<alert|warning|ok|no data>` to filter by the status of the monitor.
  3. **Analyze Events:** The Event Explorer shows a stream of events related to your monitor. You can:
     * Sort events by time.
     * View the event message.
     * Click on an event to see more details.
     * Use facets on the left-hand side to further refine your search.
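When you script against these Event Explorer filters, it helps to build the search string from the same pieces. A minimal sketch, assuming the filter syntax above (the helper name `monitor_event_query` is hypothetical):

```python
def monitor_event_query(name=None, monitor_id=None, status=None):
    """Compose an Event Explorer search string from the filters above.

    Only the filters that are provided are included; `status` should be one
    of "alert", "warning", "ok", or "no data".
    """
    parts = []
    if name:
        parts.append(f"monitor:{name}")
    if monitor_id:
        parts.append(f"monitor_id:{monitor_id}")
    if status:
        parts.append(f"status:{status}")
    return " ".join(parts)
```

For example, `monitor_event_query(name="traefik", status="alert")` yields the query `monitor:traefik status:alert`.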
* **Linking Monitors to Logs (Important for Effective Troubleshooting):**
  * **Use Tags:** The most effective way to link monitors to logs is consistent tagging. When you create your monitor, add tags that are also present in your logs. For example, if your monitor is for a specific service, tag both the monitor and the logs from that service with `service:my-service`.
  * **Use Log Patterns in Monitor Messages:** If your monitor message includes specific patterns that appear in your logs (e.g., an error code, a transaction ID), you can use these patterns to search for related logs.
  * **Use the `{{log.id}}` variable in monitor messages:** If you are creating monitors based on log patterns, you can include the `{{log.id}}` variable in the monitor message. This embeds the unique ID of the log that triggered the monitor, making it easy to find the exact log in the Log Explorer.

**2. Accessing Cluster Logs (Kubernetes, ECS, etc.):**

* **Ensure Log Collection is Configured:** The first and most important step is to make sure Datadog is properly configured to collect logs from your cluster. This typically involves:
  * **Installing the Datadog Agent:** The Datadog Agent needs to be running on your cluster nodes (as a DaemonSet in Kubernetes).
  * **Configuring Log Collection:** You need to tell the Datadog Agent where to find the logs. This usually means pointing it at specific log files or collecting the standard output/standard error of your containers. Datadog provides specific integrations for Kubernetes, ECS, and other container orchestration platforms; follow the official Datadog documentation for your platform.
  * **Using the Datadog Operator for Kubernetes (Recommended):** For Kubernetes, the Datadog Operator simplifies deployment and management of the Datadog Agent and related resources, and can automatically configure log collection based on your Kubernetes resources.
* **Using the Log Explorer:**
  1. **Navigate to Log Explorer:** In the Datadog UI, go to "Logs" -> "Explorer".
  2. **Filter by Cluster:** Use the search bar or facets to narrow the logs to those from your cluster. Common filters include:
     * `kubernetes.cluster.name:<cluster_name>` (for Kubernetes)
     * `ecs.cluster.name:<cluster_name>` (for ECS)
     * `host:<hostname>` (to filter by specific nodes in the cluster)
     * `source:<source_name>` (if you've configured a specific source for your cluster logs)
     * `service:<service_name>` (to filter by specific services running in the cluster)
     * `container_name:<container_name>` (to filter by specific containers)
  3. **Analyze Logs:** The Log Explorer provides powerful tools for analyzing your cluster logs:
     * **Search:** Use the search bar to find specific keywords, error messages, or patterns.
     * **Facets:** Use the facets on the left-hand side to filter and group your logs.
     * **Time Series:** Create time-series graphs from log data (e.g., count the number of error logs over time).
     * **Live Tail:** View a live stream of logs as they are generated.
     * **Log Patterns:** Identify common log patterns to understand the behavior of your applications.
* **Using Dashboards:**
  1. **Create or Edit a Dashboard:** In the Datadog UI, go to "Dashboards" -> "New Dashboard" or edit an existing dashboard.
  2. **Add Log Widgets:** Add widgets that display log data. You can use:
     * **Log Stream Widget:** Displays a stream of logs based on your query.
     * **Log Count Widget:** Displays the number of logs matching your query over a specific time period.
     * **Top List Widget:** Displays the top values for a specific attribute in your logs (e.g., the top error messages).
  3. **Configure the Widget:** Filter the logs to those from your cluster and display the information you're interested in.

**Example: Kubernetes Log Access**

Let's say you want to see the logs from a specific pod in your Kubernetes cluster.
1. **Ensure the Datadog Agent is running as a DaemonSet in your Kubernetes cluster.** This is the recommended way to collect logs.
2. **Verify that the Datadog Agent is configured to collect logs from your containers.** The Datadog Operator can automate this.
3. **In the Log Explorer, use the following filters:**
   * `kubernetes.cluster.name:<your_cluster_name>`
   * `kubernetes.pod.name:<your_pod_name>`
   * `kubernetes.namespace.name:<your_namespace>`

**Important Considerations:**

* **Log Volume:** Collecting logs from a large cluster can generate a significant amount of data. Consider using log filtering and sampling to reduce the volume of logs you collect.
* **Security:** Be careful about what you log. Avoid logging sensitive data such as passwords or API keys, and use log masking or redaction to protect sensitive information.
* **Retention:** Datadog has log retention policies. Make sure you understand them and that you retain logs for as long as you need them.
* **Cost:** Datadog's pricing is based on log volume. Be aware of the cost implications of collecting logs from your cluster.
* **Structured Logging:** Structured logging (e.g., JSON) makes it much easier to query and analyze your logs in Datadog. Encourage your developers to use structured logging in their applications.

**In summary,** accessing monitor and cluster logs in Datadog requires proper configuration of the Datadog Agent, familiarity with the Log Explorer and Event Explorer, and the use of appropriate filters and queries. Linking monitors to logs through tagging and log patterns is crucial for effective troubleshooting.
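The three pod-level filters in step 3 combine into a single Log Explorer search string; a minimal sketch of assembling it (the function name `pod_log_query` is hypothetical):

```python
def pod_log_query(cluster: str, pod: str, namespace: str) -> str:
    """Build a Datadog Log Explorer query that narrows logs to one pod."""
    return " ".join([
        f"kubernetes.cluster.name:{cluster}",
        f"kubernetes.pod.name:{pod}",
        f"kubernetes.namespace.name:{namespace}",
    ])
```

For example, `pod_log_query("prod-cluster", "web-abc123", "default")` produces a query you can paste straight into the Log Explorer search bar.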
# Datadog Model Context Protocol (MCP) 🔍

A Python-based tool for interacting with the Datadog API and fetching monitoring data from your infrastructure. This MCP provides easy access to monitor states and Kubernetes logs through a simple interface.

## Datadog Features 🌟

- **Monitor status tracking:** Fetch and analyze the status of specific monitors
- **Kubernetes log analysis:** Extract and format error logs from Kubernetes clusters

## Prerequisites 📋

- Python 3.11+
- Datadog API and Application keys (with the proper permissions)
- Access to a Datadog site

## Installation 🔧

### Installing via Smithery

To install Datadog for Claude Desktop automatically via Smithery:
npx -y @smithery/cli install @didlawowo/mcp-collection --client claude
Required packages:
datadog-api-client
fastmcp
loguru
icecream
python-dotenv
uv
## Environment Setup 🔑

Create a `.env` file with your Datadog credentials:
DD_API_KEY=your_api_key
DD_APP_KEY=your_app_key
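The server needs both keys at startup, so it is worth failing fast when either is missing. A minimal sketch, assuming something like python-dotenv's `load_dotenv()` has already populated the environment (the helper name `load_datadog_credentials` is hypothetical):

```python
import os


def load_datadog_credentials() -> dict:
    """Read DD_API_KEY / DD_APP_KEY from the environment and fail fast
    if either is missing (e.g. the .env file was not loaded)."""
    creds = {key: os.environ.get(key) for key in ("DD_API_KEY", "DD_APP_KEY")}
    missing = [key for key, value in creds.items() if not value]
    if missing:
        raise RuntimeError(f"Missing Datadog credentials: {', '.join(missing)}")
    return creds
```

Raising early gives a clearer error than a later `403 Forbidden` from the Datadog API.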
## Setting up Claude Desktop to use the MCP 🖥️

- Install Claude Desktop

# Assuming you're on macOS
brew install claude-desktop
# Or download from the official website
https://claude.ai/desktop

- Set up the Datadog MCP configuration:

# On macOS this is
~/Library/Application\ Support/Claude/claude_desktop_config.json

# Add this to your Claude config JSON
```json
"Datadog-MCP-Server": {
"command": "uv",
"args": [
"run",
"--with",
"datadog-api-client",
"--with",
"fastmcp",
"--with",
"icecream",
"--with",
"loguru",
"--with",
"python-dotenv",
"fastmcp",
"run",
"/your-path/mcp-collection/datadog/main.py"
],
"env": {
"DD_API_KEY": "xxxx",
"DD_APP_KEY": "xxx"
}
},
```
## Usage 💻


## Architecture 🏗️

- **FastMCP base:** Uses the FastMCP framework for tool management
- **Modular design:** Separate functions for monitors and logs
- **Type safety:** Full Python type-hint coverage
- **API abstraction:** Wraps Datadog API calls with error handling
## Introduction to the Model Context Protocol (MCP) 🤖

### What is MCP?

The Model Context Protocol (MCP) is a framework that lets AI models interact with external tools and APIs in a standardized way. It enables models like Claude to:

- Access external data
- Execute commands
- Interact with APIs
- Maintain context across a conversation

Some examples of MCP servers:
https://github.com/punkpeye/awesome-mcp-servers?tab=readme-ov-file
### MCP setup tutorial

## How it works - available tools 🛠️

The LLM uses the provided functions to fetch data and work with it.

### 1. Get monitor states
get_monitor_states(
    name: str,          # Monitor name to search for
    timeframe: int = 1  # Hours to look back
)
Example:
response = get_monitor_states(name="traefik")
# Example output
{
"id": "12345678",
"name": "traefik",
"status": "OK",
"query": "avg(last_5m):avg:traefik.response_time{*} > 1000",
"message": "Response time is too high",
"type": "metric alert",
"created": "2024-01-14T10:00:00Z",
"modified": "2024-01-14T15:30:00Z"
}
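The output above is a normalized summary of the raw monitor payload returned by the Datadog monitors API, where the status field is named `overall_state`. A minimal sketch of that normalization, assuming a dict-shaped payload (the helper name `format_monitor_state` is hypothetical):

```python
def format_monitor_state(monitor: dict) -> dict:
    """Normalize a raw Datadog monitor payload into the summary shape above.

    Assumes the v1 monitors API field names (e.g. `overall_state` for status).
    """
    return {
        "id": str(monitor.get("id", "")),
        "name": monitor.get("name", ""),
        "status": monitor.get("overall_state", "Unknown"),
        "query": monitor.get("query", ""),
        "message": monitor.get("message", ""),
        "type": monitor.get("type", ""),
        "created": monitor.get("created", ""),
        "modified": monitor.get("modified", ""),
    }
```

Keeping the normalization separate from the API call makes it easy to test without Datadog credentials.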
### 2. Get Kubernetes logs

get_k8s_logs(
    cluster: str,          # Kubernetes cluster name
    timeframe: int = 5,    # Hours to look back
    namespace: str = None  # Optional namespace filter
)

Example:
logs = get_k8s_logs(
cluster="prod-cluster",
timeframe=3,
namespace="default"
)
# Example output
{
"timestamp": "2024-01-14T22:00:00Z",
"host": "worker-1",
"service": "nginx-ingress",
"pod_name": "nginx-ingress-controller-abc123",
"namespace": "default",
"container_name": "controller",
"message": "Connection refused",
"status": "error"
}
# Install as an MCP extension
cd datadog
task install-mcp
### 4. Verify the installation

In the Claude Desktop chat, check the Datadog connection in Claude.

### 5. Use the Datadog MCP tools

## Security Considerations 🔒

- Store your API keys in `.env`
- The MCP runs in an isolated environment
- Each tool has defined permissions
- Rate limiting is implemented
## Troubleshooting 🔧

### Using the MCP Inspector

# Launch the MCP Inspector for debugging
task run-mcp-inspector

The MCP Inspector provides:

- A real-time view of the MCP server's status
- Function call logs
- Error tracking
- API response monitoring

### Common issues and solutions
- **API authentication errors**
  `Error: (403) Forbidden` ➡️ Check the `DD_API_KEY` and `DD_APP_KEY` in your `.env` file
- **MCP connection issues**
  `Error: Failed to connect to MCP server` ➡️ Verify the path and contents of your `claude_desktop_config.json`
- **Monitor not found**
  `Error: No monitor found with name 'xxx'` ➡️ Check the spelling and case of the monitor name

Logs can be found here:

## Contributing 🤝

Feel free to:

- Open issues for bugs
- Submit PRs with improvements
- Add new features

## Notes 📝

- API calls are sent to the Datadog EU site
- The default timeframe for monitor states is 1 hour
- The page size limit is set to handle most use cases