Data Product Hub

A production-ready MCP server that provides comprehensive dbt project quality assessment for any GitHub repository, enabling AI agents to analyze dbt models, check metadata coverage, and map data lineage.

NOTE: This project is still under construction and in a state of flux. It is being tested internally, so the setup instructions below may not work as intended.

Universal MCP Server for dbt Project Analysis - Works with Any GitHub Repository

A production-ready Model Context Protocol (MCP) server that provides comprehensive dbt project quality assessment for any GitHub repository. Powered by GitHub App authentication for secure, scalable access to public and private repositories. Purpose-built for AI agents and modern data workflows.

🚀 What is Data Product Hub?

Data Product Hub transforms any dbt project on GitHub into an agent-accessible data quality platform that:

  • Analyzes ANY GitHub dbt repository with AI-powered suggestions and best practices
  • Works with public and private repos via secure GitHub App authentication
  • Supports subdirectory dbt projects (detects dbt/, transform/, analytics/ folders)
  • Checks metadata coverage across your entire data product portfolio
  • Maps data lineage and dependency relationships
  • Integrates with Git for enhanced context and change analysis
  • Exposes MCP tools for seamless AI agent integration
  • Deploys anywhere - FastMCP Cloud (recommended), Docker, Kubernetes
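Subdirectory support can be pictured as a search for `dbt_project.yml` in the repository root and then in the common project folders listed above. The sketch below assumes a locally checked-out repository, and the search order is an assumption. The helper name `find_dbt_project_dir` is hypothetical and is not part of the package.

```python
from pathlib import Path
from typing import Optional

# Folder names from the feature list above; "" means the repository root.
# The order (root first) is an assumption for illustration.
CANDIDATE_DIRS = ["", "dbt", "transform", "analytics"]


def find_dbt_project_dir(repo_root: str) -> Optional[Path]:
    """Return the first directory containing dbt_project.yml, or None.

    Hypothetical helper: checks the repository root first, then the
    common subdirectories dbt/, transform/ and analytics/.
    """
    root = Path(repo_root)
    for sub in CANDIDATE_DIRS:
        candidate = root / sub if sub else root
        if (candidate / "dbt_project.yml").is_file():
            return candidate
    return None
```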

Features

🔧 Universal MCP Tools (Work with Any GitHub Repository)

  • analyze_dbt_model(model_name, repo_url) - Basic dbt model analysis
  • analyze_dbt_model_with_ai(model_name, repo_url) - NEW: AI-powered analysis using your own OpenAI API key
  • check_metadata_coverage(repo_url) - Project-wide metadata assessment
  • get_project_lineage(repo_url) - Data dependency mapping
  • assess_data_product_quality(model_name, repo_url) - Comprehensive quality scoring
  • validate_github_repository(repo_url) - Validate repo access and dbt structure
  • analyze_dbt_model_with_git_context(model_name, repo_url) - dbt analysis + Git history
  • get_composite_server_status() - Server capabilities and GitHub integration status
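As a rough illustration of what a metadata coverage check measures, the sketch below computes the share of models and columns that have a non-empty `description`. It assumes the model definitions have already been parsed from dbt `schema.yml` files into plain dictionaries. The exact metrics reported by `check_metadata_coverage` are not documented here, so treat the function and its output keys as a hypothetical approximation.

```python
from typing import Any, Dict, List


def metadata_coverage(models: List[Dict[str, Any]]) -> Dict[str, float]:
    """Percentage of models and columns with a non-empty description.

    `models` mirrors the `models:` list in a dbt schema.yml file, e.g.
    {"name": ..., "description": ..., "columns": [{"name": ..., "description": ...}]}.
    """
    def documented(entry: Dict[str, Any]) -> bool:
        # Missing, None, or whitespace-only descriptions count as undocumented.
        return bool(str(entry.get("description") or "").strip())

    columns = [col for m in models for col in m.get("columns", [])]
    model_pct = 100.0 * sum(documented(m) for m in models) / len(models) if models else 0.0
    column_pct = 100.0 * sum(documented(c) for c in columns) / len(columns) if columns else 0.0
    return {"model_coverage_pct": round(model_pct, 1), "column_coverage_pct": round(column_pct, 1)}


models = [
    {"name": "customer_summary", "description": "One row per customer.",
     "columns": [{"name": "customer_id", "description": "Primary key."},
                 {"name": "lifetime_value"}]},
    {"name": "user_events", "columns": [{"name": "event_id", "description": ""}]},
]
print(metadata_coverage(models))  # {'model_coverage_pct': 50.0, 'column_coverage_pct': 33.3}
```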

🌐 Deployment Flexibility

  • Local CLI - dph -f ./project
  • Hostable MCP Server - dph serve --mcp-host 0.0.0.0
  • Container Deployment - Docker + Kubernetes + Helm charts
  • FastMCP Cloud - One-click cloud deployment

🔗 Agent Integration

  • Compatible with Claude Code, Cursor, and any MCP-enabled AI agent
  • JSON-first output for automation and CI/CD pipelines
  • Structured responses for programmatic consumption
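JSON output makes it straightforward to gate a CI/CD build on a quality threshold. The sketch below assumes a JSON report that contains a numeric `coverage_pct` field; that field name and the 80% threshold are hypothetical placeholders, so adapt them to the output your tool version actually produces.

```python
import json
import sys

# Hypothetical threshold; choose one that fits your team's standards.
MIN_COVERAGE_PCT = 80.0


def coverage_gate(report_json: str, minimum: float = MIN_COVERAGE_PCT) -> int:
    """Return a process exit code: 0 if coverage meets the minimum, 1 otherwise.

    Assumes the report is a JSON object with a numeric "coverage_pct" field
    (a hypothetical name, not a documented schema).
    """
    report = json.loads(report_json)
    coverage = float(report["coverage_pct"])
    if coverage < minimum:
        print(f"Metadata coverage {coverage:.1f}% is below {minimum:.1f}%", file=sys.stderr)
        return 1
    print(f"Metadata coverage {coverage:.1f}% meets the {minimum:.1f}% threshold")
    return 0
```

In a pipeline step, a script built around this function would end with `sys.exit(coverage_gate(report_text))` so that a low score fails the job.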

Quick Start

🎯 GitHub Repository Analysis (Recommended)

1. Install the GitHub App on your dbt repositories:

  • Visit: https://github.com/apps/data-product-hub/installations/new
  • Select repositories containing dbt projects
  • Grant read permissions

2. (Optional) Enable AI features by adding your OpenAI API key:

  • Go to Repository Settings → Environments
  • Create an environment (or reuse an existing one) named production, prod, data-analysis, main, or staging
  • Add OPENAI_API_KEY as an Environment Secret
  • Set the value to your OpenAI API key (sk-proj-...)
  • This enables the analyze_dbt_model_with_ai tool
  • Note: All other tools work without an API key - only AI-powered analysis requires it

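3. Use via Claude Desktop by adding the server to claude_desktop_config.json (macOS: ~/Library/Application Support/Claude/claude_desktop_config.json; Windows: %APPDATA%\Claude\claude_desktop_config.json). Claude Desktop launches local commands, so the mcp-remote package bridges it to the hosted HTTP endpoint:

{
  "mcpServers": {
    "data-product-hub": {
      "command": "npx",
      "args": ["-y", "mcp-remote", "https://data-product-hub.fastmcp.app/mcp"]
    }
  }
}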

4. Ask Claude to analyze any dbt repository:

"Analyze the customer_metrics model in https://github.com/company/analytics-dbt"
"Get AI-powered suggestions for the user_events model in github.com/company/dbt-models"
"Check metadata coverage for github.com/myorg/data-warehouse"
"Get project lineage for github.com/startup/dbt-models"

🖥️ Local CLI Usage (Backwards Compatible)

# Install package
pip install data-product-hub

# CLI analysis
dph -f ./my-dbt-project --metadata-only

# Start local MCP server
dph --mcp-server -f ./my-dbt-project

🔌 Programmatic Integration

import asyncio

from fastmcp import Client


async def main() -> None:
    # Connect to the universal MCP server (an https URL selects the HTTP transport)
    client = Client("https://data-product-hub.fastmcp.app/mcp")

    async with client:
        # Basic analysis of any GitHub repository
        analysis = await client.call_tool(
            "analyze_dbt_model",
            {
                "model_name": "customer_summary",
                "repo_url": "https://github.com/company/analytics-dbt",
            },
        )

        # AI-powered analysis (requires OPENAI_API_KEY as a GitHub environment secret)
        ai_analysis = await client.call_tool(
            "analyze_dbt_model_with_ai",
            {
                "model_name": "customer_summary",
                "repo_url": "https://github.com/company/analytics-dbt",
            },
        )

        # Check metadata coverage across any project
        coverage = await client.call_tool(
            "check_metadata_coverage",
            {"repo_url": "github.com/myorg/data-warehouse"},
        )

        print(analysis, ai_analysis, coverage, sep="\n\n")


# `async with` must run inside a coroutine, so drive it with an event loop
asyncio.run(main())
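The examples pass `repo_url` both with and without the `https://` scheme. A minimal sketch of how such references could be normalized to an `owner/repo` pair follows; the helper name `parse_github_repo` is hypothetical, and the server's actual parsing rules may differ.

```python
import re
from typing import Tuple

# Accepts "github.com/owner/repo" or "https://github.com/owner/repo",
# with an optional trailing ".git" or "/".
_GITHUB_RE = re.compile(
    r"^(?:https?://)?(?:www\.)?github\.com/"
    r"(?P<owner>[A-Za-z0-9-]+)/(?P<repo>[A-Za-z0-9._-]+?)(?:\.git)?/?$"
)


def parse_github_repo(repo_url: str) -> Tuple[str, str]:
    """Split a GitHub repository reference into (owner, repo)."""
    match = _GITHUB_RE.match(repo_url.strip())
    if match is None:
        raise ValueError(f"Not a GitHub repository URL: {repo_url!r}")
    return match.group("owner"), match.group("repo")
```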

Deployment Options

1. Use the Hosted Service (Recommended)

Ready to use immediately:

  • MCP Server: https://data-product-hub.fastmcp.app/mcp
  • GitHub App: https://github.com/apps/data-product-hub/installations/new

Quick Setup:

  1. Install the GitHub App on your dbt repositories
  2. Add the MCP server to Claude Desktop configuration
  3. Start analyzing any dbt repository via Claude

2. Deploy Your Own Instance

For organizations wanting their own instance:

Prerequisites:

  • Fork this repository
  • Create your own GitHub App with read permissions
  • Obtain your GitHub App ID and its private key, base64-encoded

Deployment:

  1. Deploy to FastMCP Cloud with entry point: server.py
  2. Set your GitHub App credentials as environment variables
  3. Share your GitHub App installation URL with users
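The prerequisites call for a base64-encoded private key, which lets a multi-line PEM file fit into a single environment variable. The sketch below encodes a key file and prints shell-style assignments. The variable names `GITHUB_APP_ID` and `GITHUB_APP_PRIVATE_KEY_BASE64` are hypothetical placeholders; check `server.py` in your fork for the names it actually reads.

```python
import base64
from pathlib import Path


def encode_private_key(pem_path: str) -> str:
    """Return the PEM file's contents as a single-line base64 string."""
    return base64.b64encode(Path(pem_path).read_bytes()).decode("ascii")


def decode_private_key(encoded: str) -> str:
    """Reverse of encode_private_key: recover the original PEM text."""
    return base64.b64decode(encoded).decode("utf-8")


def env_assignments(app_id: str, pem_path: str) -> str:
    # Hypothetical variable names; match them to what your deployment expects.
    return (
        f"GITHUB_APP_ID={app_id}\n"
        f"GITHUB_APP_PRIVATE_KEY_BASE64={encode_private_key(pem_path)}"
    )
```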

📖 Complete Deployment Guide

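3. Docker Deployment

# Using Docker Compose
docker-compose up

# Custom container: mount your local dbt project into the container
# (an absolute host path via $(pwd) works across Docker versions)
docker run -p 8080:8080 \
  -v "$(pwd)/my-dbt-project:/dbt-project" \
  data-product-hub:latest

4. Kubernetes Deployment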

# Deploy with Helm
helm install data-product-hub ./charts/data-product-hub \
  --set persistence.hostPath="/path/to/dbt-project" \
  --set dbtAi.database="snowflake"

📖 Full Kubernetes Guide

Configuration

The Data Product Hub MCP server is ready to use, and end users need no configuration: just install the GitHub App and start analyzing.

For Local CLI Usage Only

# Database configuration (local CLI only)
DATABASE=snowflake  # snowflake, postgres, redshift, bigquery

# OpenAI API (optional - for AI features in local CLI)
OPENAI_API_KEY=your-openai-api-key
DBT_AI_BASIC_MODEL=gpt-4o-mini
DBT_AI_ADVANCED_MODEL=gpt-4o

Supported Databases

  • Snowflake (default)
  • PostgreSQL
  • Amazon Redshift
  • Google BigQuery
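For local CLI use, the settings above are plain environment variables. The sketch below shows one way to load and validate them, falling back to Snowflake as the listed default. The `load_cli_settings` helper, the `CliSettings` dataclass, and the fallback model names (taken from the example values above) are illustrative assumptions, not the package's own configuration code.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional

SUPPORTED_DATABASES = {"snowflake", "postgres", "redshift", "bigquery"}


@dataclass(frozen=True)
class CliSettings:
    database: str
    openai_api_key: Optional[str]  # None means AI features stay disabled
    basic_model: str
    advanced_model: str


def load_cli_settings(env: Mapping[str, str] = os.environ) -> CliSettings:
    """Read and validate the local CLI settings (hypothetical helper)."""
    database = env.get("DATABASE", "snowflake").strip().lower()
    if database not in SUPPORTED_DATABASES:
        raise ValueError(
            f"Unsupported DATABASE {database!r}; expected one of {sorted(SUPPORTED_DATABASES)}"
        )
    return CliSettings(
        database=database,
        openai_api_key=env.get("OPENAI_API_KEY") or None,
        basic_model=env.get("DBT_AI_BASIC_MODEL", "gpt-4o-mini"),
        advanced_model=env.get("DBT_AI_ADVANCED_MODEL", "gpt-4o"),
    )
```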

Architecture

Data Product Hub implements a composite MCP architecture:

Your Data Product Hub Server
├── Core dbt Analysis
├── Git Integration (via Git MCP server)
├── Future: Monte Carlo Integration
├── Future: DataHub Integration
└── Future: Snowflake Performance Integration

This allows AI agents to get comprehensive data product insights from a single MCP endpoint.
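The composite idea can be illustrated with a small router that merges tools from several components under namespaced names, so an agent sees one catalogue behind one endpoint. This is a plain-Python sketch of the pattern, not the server's implementation or the FastMCP API; the `CompositeServer` class, the `git_` prefix, and the placeholder tools are hypothetical.

```python
from typing import Any, Callable, Dict, List

Tool = Callable[..., Any]


class CompositeServer:
    """Aggregates tools from several components behind a single lookup table."""

    def __init__(self) -> None:
        self._tools: Dict[str, Tool] = {}

    def mount(self, prefix: str, tools: Dict[str, Tool]) -> None:
        # Namespace each component's tools to avoid name collisions.
        for name, fn in tools.items():
            full_name = f"{prefix}_{name}" if prefix else name
            if full_name in self._tools:
                raise ValueError(f"Duplicate tool name: {full_name}")
            self._tools[full_name] = fn

    def list_tools(self) -> List[str]:
        return sorted(self._tools)

    def call_tool(self, name: str, **kwargs: Any) -> Any:
        if name not in self._tools:
            raise KeyError(f"Unknown tool: {name}")
        return self._tools[name](**kwargs)


server = CompositeServer()
# Core dbt tools keep their names; Git tools get a "git_" prefix (placeholder logic).
server.mount("", {"analyze_dbt_model": lambda model_name, repo_url: f"analyzed {model_name}"})
server.mount("git", {"log": lambda repo_url, limit=5: f"last {limit} commits of {repo_url}"})
print(server.list_tools())  # ['analyze_dbt_model', 'git_log']
```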

Use Cases

For Data Teams

  • Automated quality checks in CI/CD pipelines
  • Documentation coverage monitoring
  • Lineage analysis for impact assessment
  • Agent-driven data workflows
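Lineage-based impact assessment comes down to walking the dependency graph that dbt's `ref()` calls define. The sketch below extracts single-argument `ref('model')` calls from model SQL with a regular expression and lists every model downstream of a changed one. It is a simplified illustration (it ignores two-argument refs, `source()` calls, and Jinja control flow), not how `get_project_lineage` is implemented.

```python
import re
from collections import defaultdict, deque
from typing import Dict, List, Set

# Matches {{ ref('model') }} or {{ ref("model") }} with a single argument.
REF_RE = re.compile(r"""\{\{\s*ref\(\s*['"](\w+)['"]\s*\)\s*\}\}""")


def build_children(models_sql: Dict[str, str]) -> Dict[str, Set[str]]:
    """Map each model to the set of models that ref() it directly."""
    children: Dict[str, Set[str]] = defaultdict(set)
    for model, sql in models_sql.items():
        for parent in REF_RE.findall(sql):
            children[parent].add(model)
    return children


def downstream_of(changed: str, models_sql: Dict[str, str]) -> List[str]:
    """Breadth-first search for every model affected by a change to `changed`."""
    children = build_children(models_sql)
    seen: Set[str] = set()
    queue = deque([changed])
    while queue:
        for child in children.get(queue.popleft(), set()):
            if child not in seen:
                seen.add(child)
                queue.append(child)
    return sorted(seen)


models_sql = {
    "stg_orders": "select * from raw.orders",
    "customer_summary": "select * from {{ ref('stg_orders') }}",
    "customer_metrics": "select * from {{ ref(\"customer_summary\") }}",
}
print(downstream_of("stg_orders", models_sql))  # ['customer_metrics', 'customer_summary']
```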

For AI Agents

  • Data product understanding before making changes
  • Quality assessment as part of automated reviews
  • Context-aware suggestions with Git history
  • Comprehensive data product insights

For Platform Teams

  • Centralized data quality hub
  • Production-ready MCP server deployment
  • Multi-tool integration platform
  • Kubernetes-native scaling

Migrating from dbt-ai

If you're upgrading from the legacy dbt-ai package:

# Old command
dbt-ai -f ./project --metadata-only

# New command (identical functionality) - use the short dph command!
dph -f ./project --metadata-only

All CLI functionality is 100% backwards compatible.

Contributing

We welcome contributions! Please see CONTRIBUTING.md for guidelines.

License

MIT License


Data Product Hub - Transforming dbt projects into agent-accessible data quality platforms. 🚀
