Okay, here's how you can access monitor and cluster logs from Datadog, broken down into steps and considerations:

**1. Accessing Monitor Logs:**

* **From the Monitor Page:**
  1. **Navigate to Monitors:** In the Datadog UI, go to "Monitors" -> "Manage Monitors".
  2. **Find the Monitor:** Locate the specific monitor you're interested in. You can use the search bar, filters (e.g., by tag, name, status), or browse the list.
  3. **Monitor Status and Events:** Click on the monitor's name to open its details page. Here you'll see:
     * **Monitor Status:** The current status of the monitor (OK, Alert, Warning, No Data).
     * **Events Timeline:** A timeline of events related to the monitor. This is where you'll find when the monitor triggered, when it recovered, and any associated messages.
     * **Event Details:** Click on a specific event in the timeline to see more details. This often includes:
       * The time the event occurred.
       * The message associated with the event (which might contain information about the cause of the alert).
       * Links to related logs, metrics, or traces (if configured). This is crucial for troubleshooting.
* **Using the Event Explorer:**
  1. **Navigate to Event Explorer:** In the Datadog UI, go to "Events" -> "Explorer".
  2. **Filter by Monitor:** Use the search bar or filters to narrow down the events to those related to your specific monitor. You can filter by:
     * `monitor:<monitor_name>` (replace `<monitor_name>` with the name of your monitor)
     * `monitor_id:<monitor_id>` (replace `<monitor_id>` with the ID of your monitor, which you can find on the monitor's details page)
     * `status:<alert|warning|ok|no data>` to filter by the status of the monitor.
  3. **Analyze Events:** The Event Explorer shows a stream of events related to your monitor. You can:
     * Sort events by time.
     * View the event message.
     * Click on an event to see more details.
     * Use facets on the left-hand side to further refine your search.
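When you script against these Event Explorer filters, it helps to build the search string from the same pieces. A minimal sketch, assuming the filter syntax above (the helper name `monitor_event_query` is hypothetical):

```python
def monitor_event_query(name=None, monitor_id=None, status=None):
    """Compose an Event Explorer search string from the filters above.

    Only the filters that are provided are included; `status` should be one
    of "alert", "warning", "ok", or "no data".
    """
    parts = []
    if name:
        parts.append(f"monitor:{name}")
    if monitor_id:
        parts.append(f"monitor_id:{monitor_id}")
    if status:
        parts.append(f"status:{status}")
    return " ".join(parts)
```

For example, `monitor_event_query(name="traefik", status="alert")` yields the query `monitor:traefik status:alert`.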
* **Linking Monitors to Logs (Important for Effective Troubleshooting):**
  * **Use Tags:** The most effective way to link monitors to logs is consistent tagging. When you create your monitor, add tags that are also present in your logs. For example, if your monitor is for a specific service, tag both the monitor and the logs from that service with `service:my-service`.
  * **Use Log Patterns in Monitor Messages:** If your monitor message includes specific patterns that appear in your logs (e.g., an error code, a transaction ID), you can use these patterns to search for related logs.
  * **Use the `{{log.id}}` variable in monitor messages:** If you are creating monitors based on log patterns, you can include the `{{log.id}}` variable in the monitor message. This embeds the unique ID of the log that triggered the monitor, making it easy to find the exact log in the Log Explorer.

**2. Accessing Cluster Logs (Kubernetes, ECS, etc.):**

* **Ensure Log Collection is Configured:** The first and most important step is to make sure Datadog is properly configured to collect logs from your cluster. This typically involves:
  * **Installing the Datadog Agent:** The Datadog Agent needs to be running on your cluster nodes (as a DaemonSet in Kubernetes).
  * **Configuring Log Collection:** You need to tell the Datadog Agent where to find the logs. This usually means pointing it at specific log files or collecting the standard output/standard error of your containers. Datadog provides specific integrations for Kubernetes, ECS, and other container orchestration platforms; follow the official Datadog documentation for your platform.
  * **Using the Datadog Operator for Kubernetes (Recommended):** For Kubernetes, the Datadog Operator simplifies deployment and management of the Datadog Agent and related resources, and can automatically configure log collection based on your Kubernetes resources.
* **Using the Log Explorer:**
  1. **Navigate to Log Explorer:** In the Datadog UI, go to "Logs" -> "Explorer".
  2. **Filter by Cluster:** Use the search bar or facets to narrow the logs to those from your cluster. Common filters include:
     * `kubernetes.cluster.name:<cluster_name>` (for Kubernetes)
     * `ecs.cluster.name:<cluster_name>` (for ECS)
     * `host:<hostname>` (to filter by specific nodes in the cluster)
     * `source:<source_name>` (if you've configured a specific source for your cluster logs)
     * `service:<service_name>` (to filter by specific services running in the cluster)
     * `container_name:<container_name>` (to filter by specific containers)
  3. **Analyze Logs:** The Log Explorer provides powerful tools for analyzing your cluster logs:
     * **Search:** Use the search bar to find specific keywords, error messages, or patterns.
     * **Facets:** Use the facets on the left-hand side to filter and group your logs.
     * **Time Series:** Create time-series graphs from log data (e.g., count the number of error logs over time).
     * **Live Tail:** View a live stream of logs as they are generated.
     * **Log Patterns:** Identify common log patterns to understand the behavior of your applications.
* **Using Dashboards:**
  1. **Create or Edit a Dashboard:** In the Datadog UI, go to "Dashboards" -> "New Dashboard" or edit an existing dashboard.
  2. **Add Log Widgets:** Add widgets that display log data. You can use:
     * **Log Stream Widget:** Displays a stream of logs based on your query.
     * **Log Count Widget:** Displays the number of logs matching your query over a specific time period.
     * **Top List Widget:** Displays the top values for a specific attribute in your logs (e.g., the top error messages).
  3. **Configure the Widget:** Filter the logs to those from your cluster and display the information you're interested in.

**Example: Kubernetes Log Access**

Let's say you want to see the logs from a specific pod in your Kubernetes cluster.
1. **Ensure the Datadog Agent is running as a DaemonSet in your Kubernetes cluster.** This is the recommended way to collect logs.
2. **Verify that the Datadog Agent is configured to collect logs from your containers.** The Datadog Operator can automate this.
3. **In the Log Explorer, use the following filters:**
   * `kubernetes.cluster.name:<your_cluster_name>`
   * `kubernetes.pod.name:<your_pod_name>`
   * `kubernetes.namespace.name:<your_namespace>`

**Important Considerations:**

* **Log Volume:** Collecting logs from a large cluster can generate a significant amount of data. Consider using log filtering and sampling to reduce the volume of logs you collect.
* **Security:** Be careful about what you log. Avoid logging sensitive data such as passwords or API keys, and use log masking or redaction to protect sensitive information.
* **Retention:** Datadog has log retention policies. Make sure you understand them and that you retain logs for as long as you need them.
* **Cost:** Datadog's pricing is based on log volume. Be aware of the cost implications of collecting logs from your cluster.
* **Structured Logging:** Structured logging (e.g., JSON) makes it much easier to query and analyze your logs in Datadog. Encourage your developers to use structured logging in their applications.

**In summary,** accessing monitor and cluster logs in Datadog requires proper configuration of the Datadog Agent, familiarity with the Log Explorer and Event Explorer, and the use of appropriate filters and queries. Linking monitors to logs through tagging and log patterns is crucial for effective troubleshooting.
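The three pod-level filters in step 3 combine into a single Log Explorer search string; a minimal sketch of assembling it (the function name `pod_log_query` is hypothetical):

```python
def pod_log_query(cluster: str, pod: str, namespace: str) -> str:
    """Build a Datadog Log Explorer query that narrows logs to one pod."""
    return " ".join([
        f"kubernetes.cluster.name:{cluster}",
        f"kubernetes.pod.name:{pod}",
        f"kubernetes.namespace.name:{namespace}",
    ])
```

For example, `pod_log_query("prod-cluster", "web-abc123", "default")` produces a query you can paste straight into the Log Explorer search bar.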
# Datadog Model Context Protocol (MCP) 🔍

A Python-based tool for interacting with the Datadog API and fetching monitoring data from your infrastructure. This MCP provides easy access to monitor states and Kubernetes logs through a simple interface.

## Datadog Features 🌟

- **Monitor status tracking:** Fetch and analyze the status of specific monitors
- **Kubernetes log analysis:** Extract and format error logs from Kubernetes clusters

## Prerequisites 📋

- Python 3.11+
- Datadog API and Application keys (with the proper permissions)
- Access to a Datadog site

## Installation 🔧

### Installing via Smithery

To install Datadog for Claude Desktop automatically via Smithery:
npx -y @smithery/cli install @didlawowo/mcp-collection --client claude
Required packages:
datadog-api-client
fastmcp
loguru
icecream
python-dotenv
uv
## Environment Setup 🔑

Create a `.env` file with your Datadog credentials:
DD_API_KEY=your_api_key
DD_APP_KEY=your_app_key
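The server needs both keys at startup, so it is worth failing fast when either is missing. A minimal sketch, assuming something like python-dotenv's `load_dotenv()` has already populated the environment (the helper name `load_datadog_credentials` is hypothetical):

```python
import os


def load_datadog_credentials() -> dict:
    """Read DD_API_KEY / DD_APP_KEY from the environment and fail fast
    if either is missing (e.g. the .env file was not loaded)."""
    creds = {key: os.environ.get(key) for key in ("DD_API_KEY", "DD_APP_KEY")}
    missing = [key for key, value in creds.items() if not value]
    if missing:
        raise RuntimeError(f"Missing Datadog credentials: {', '.join(missing)}")
    return creds
```

Raising early gives a clearer error than a later `403 Forbidden` from the Datadog API.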
## Setting up Claude Desktop to use the MCP 🖥️

- Install Claude Desktop

# Assuming you're on macOS
brew install claude-desktop
# Or download from the official website
https://claude.ai/desktop

- Set up the Datadog MCP configuration:

# On macOS this is
~/Library/Application\ Support/Claude/claude_desktop_config.json

# Add this to your Claude config JSON
```json
"Datadog-MCP-Server": {
"command": "uv",
"args": [
"run",
"--with",
"datadog-api-client",
"--with",
"fastmcp",
"--with",
"icecream",
"--with",
"loguru",
"--with",
"python-dotenv",
"fastmcp",
"run",
"/your-path/mcp-collection/datadog/main.py"
],
"env": {
"DD_API_KEY": "xxxx",
"DD_APP_KEY": "xxx"
}
},
```
## Usage 💻


## Architecture 🏗️

- **FastMCP base:** Uses the FastMCP framework for tool management
- **Modular design:** Separate functions for monitors and logs
- **Type safety:** Full Python type-hint coverage
- **API abstraction:** Wraps Datadog API calls with error handling
## Introduction to the Model Context Protocol (MCP) 🤖

### What is MCP?

The Model Context Protocol (MCP) is a framework that lets AI models interact with external tools and APIs in a standardized way. It enables models like Claude to:

- Access external data
- Execute commands
- Interact with APIs
- Maintain context across a conversation

Some examples of MCP servers:
https://github.com/punkpeye/awesome-mcp-servers?tab=readme-ov-file
### MCP setup tutorial

## How it works - available tools 🛠️

The LLM uses the provided functions to fetch data and work with it.

### 1. Get monitor states
get_monitor_states(
    name: str,          # Monitor name to search for
    timeframe: int = 1  # Hours to look back
)
Example:
response = get_monitor_states(name="traefik")
# Example output
{
"id": "12345678",
"name": "traefik",
"status": "OK",
"query": "avg(last_5m):avg:traefik.response_time{*} > 1000",
"message": "Response time is too high",
"type": "metric alert",
"created": "2024-01-14T10:00:00Z",
"modified": "2024-01-14T15:30:00Z"
}
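The output above is a normalized summary of the raw monitor payload returned by the Datadog monitors API, where the status field is named `overall_state`. A minimal sketch of that normalization, assuming a dict-shaped payload (the helper name `format_monitor_state` is hypothetical):

```python
def format_monitor_state(monitor: dict) -> dict:
    """Normalize a raw Datadog monitor payload into the summary shape above.

    Assumes the v1 monitors API field names (e.g. `overall_state` for status).
    """
    return {
        "id": str(monitor.get("id", "")),
        "name": monitor.get("name", ""),
        "status": monitor.get("overall_state", "Unknown"),
        "query": monitor.get("query", ""),
        "message": monitor.get("message", ""),
        "type": monitor.get("type", ""),
        "created": monitor.get("created", ""),
        "modified": monitor.get("modified", ""),
    }
```

Keeping the normalization separate from the API call makes it easy to test without Datadog credentials.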
### 2. Get Kubernetes logs

get_k8s_logs(
    cluster: str,          # Kubernetes cluster name
    timeframe: int = 5,    # Hours to look back
    namespace: str = None  # Optional namespace filter
)

Example:
logs = get_k8s_logs(
cluster="prod-cluster",
timeframe=3,
namespace="default"
)
# Example output
{
"timestamp": "2024-01-14T22:00:00Z",
"host": "worker-1",
"service": "nginx-ingress",
"pod_name": "nginx-ingress-controller-abc123",
"namespace": "default",
"container_name": "controller",
"message": "Connection refused",
"status": "error"
}
# Install as an MCP extension
cd datadog
task install-mcp
### 4. Verify the installation

In the Claude Desktop chat, check the Datadog connection in Claude.

### 5. Use the Datadog MCP tools

## Security Considerations 🔒

- Store your API keys in `.env`
- The MCP runs in an isolated environment
- Each tool has defined permissions
- Rate limiting is implemented
## Troubleshooting 🔧

### Using the MCP Inspector

# Launch the MCP Inspector for debugging
task run-mcp-inspector

The MCP Inspector provides:

- A real-time view of the MCP server's status
- Function call logs
- Error tracking
- API response monitoring

### Common issues and solutions
- **API authentication errors**
  `Error: (403) Forbidden` ➡️ Check the `DD_API_KEY` and `DD_APP_KEY` in your `.env` file
- **MCP connection issues**
  `Error: Failed to connect to MCP server` ➡️ Verify the path and contents of your `claude_desktop_config.json`
- **Monitor not found**
  `Error: No monitor found with name 'xxx'` ➡️ Check the spelling and case of the monitor name

Logs can be found here:

## Contributing 🤝

Feel free to:

- Open issues for bugs
- Submit PRs with improvements
- Add new features

## Notes 📝

- API calls are sent to the Datadog EU site
- The default timeframe for monitor states is 1 hour
- The page size limit is set to handle most use cases