Data Planning Agent

Transforms high-level business intents into structured Data Product Requirement Prompts through AI-powered conversational refinement. Guides users through clarifying questions to gather comprehensive requirements for automated Business Intelligence dashboard generation.

An MCP (Model Context Protocol) agent that transforms high-level business intents into structured Data Product Requirement Prompts (Data PRPs) through AI-powered conversational refinement.

Overview

The Data Planning Agent is the first component in a multi-agent system for automated Business Intelligence dashboard generation. It helps data scientists and analysts gather comprehensive requirements by:

  1. Starting with a vague business intent
  2. Refining through AI-guided clarifying questions
  3. Generating a structured, machine-readable Data PRP document

The output Data PRP serves as input for the Data Discovery Agent, enabling automated data source identification and analysis.

Features

  • 🤖 AI-Powered Conversations: Uses Gemini 2.5 Pro for intelligent requirement gathering
  • Smart Questioning: Asks up to 4 focused questions at a time, biased toward multiple choice for efficiency
  • 📋 Structured Output: Generates standardized Data PRP markdown documents
  • 💾 Flexible Storage: Supports both GCS (gs://) and local file paths
  • 🎨 Organizational Context: Load custom context files to tailor agent behavior to your organization
  • 🔌 MCP Integration: Full MCP server implementation (stdio + HTTP transports)
  • 🖥️ Interactive CLI: Test conversations directly from the command line
  • 🎯 Cursor Compatible: Works seamlessly as a Cursor MCP server

Installation

Prerequisites

  • Python 3.10 or higher
  • Poetry for dependency management
  • Gemini API key

Setup

  1. Clone the repository, then change into the project directory (adjust the path to your checkout):
cd /home/user/git/data-planning-agent
  2. Install dependencies using Poetry:
poetry install
  3. Create a .env file from the example:
cp .env.example .env
  4. Configure your environment variables in .env:
# Required
GEMINI_API_KEY=your-gemini-api-key-here

# Optional (with defaults)
GEMINI_MODEL=gemini-2.5-pro
OUTPUT_DIR=./output
MCP_TRANSPORT=stdio
LOG_LEVEL=INFO

Usage

Interactive CLI Mode

The easiest way to test the Planning Agent:

poetry run planning-agent

This launches an interactive session that guides you through:

  1. Entering your initial business intent
  2. Answering clarifying questions
  3. Generating and saving the final Data PRP

MCP Server Mode (for Cursor Integration)

Run as an MCP server for integration with Cursor:

# Stdio transport (default)
poetry run python -m data_planning_agent.mcp

# HTTP transport
MCP_TRANSPORT=http poetry run python -m data_planning_agent.mcp

Using with Cursor

Add this configuration to your ~/.cursor/mcp.json:

{
  "mcpServers": {
    "data-planning-agent": {
      "command": "poetry",
      "args": ["run", "python", "-m", "data_planning_agent.mcp"],
      "cwd": "/home/user/git/data-planning-agent",
      "env": {
        "GEMINI_API_KEY": "your-gemini-api-key-here",
        "MCP_TRANSPORT": "stdio"
      }
    }
  }
}
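If your ~/.cursor/mcp.json already lists other servers, the entry above has to be merged in rather than pasted over the file. The following is a minimal sketch of that merge; `add_server` is a hypothetical helper written for this README, not something the project ships, and the path and key passed to it are placeholders.

```python
import json


def add_server(config: dict, project_dir: str, api_key: str) -> dict:
    """Return a copy of an mcp.json config with the data-planning-agent entry added."""
    # Copy the existing server map so other entries are preserved untouched.
    servers = dict(config.get("mcpServers", {}))
    servers["data-planning-agent"] = {
        "command": "poetry",
        "args": ["run", "python", "-m", "data_planning_agent.mcp"],
        "cwd": project_dir,
        "env": {"GEMINI_API_KEY": api_key, "MCP_TRANSPORT": "stdio"},
    }
    return {**config, "mcpServers": servers}


if __name__ == "__main__":
    existing = {"mcpServers": {"some-other-server": {"command": "example"}}}
    merged = add_server(existing, "/home/user/git/data-planning-agent", "your-gemini-api-key-here")
    print(json.dumps(merged, indent=2))
```

The helper returns a new dictionary instead of mutating its input, so you can inspect the result before writing it back to disk.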

Then use these MCP tools in Cursor:

1. start_planning_session

Start a new planning session:

{
  "initial_intent": "We want to provide the merchandising team insights into trending items in region 7"
}

Returns a session ID and initial clarifying questions.

2. continue_conversation

Continue the conversation with responses:

{
  "session_id": "your-session-id",
  "user_response": "b) Regional managers, they need both summary and detail"
}

Returns follow-up questions or completion notification.

3. generate_data_prp

Generate the final Data PRP:

{
  "session_id": "your-session-id",
  "output_path": "gs://my-bucket/planning/data_prp.md",
  "save_to_file": true
}

Returns the generated Data PRP markdown and file location.
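For clients other than Cursor, the three tool calls above can be framed as JSON-RPC 2.0 requests. The sketch below assumes the standard MCP `tools/call` method; `build_tool_call` is a hypothetical helper, not part of this project, and how the message is transported (stdio or HTTP) is left out.

```python
import json
from itertools import count

# JSON-RPC request ids should be unique per connection; a counter is enough here.
_request_ids = count(1)


def build_tool_call(name: str, arguments: dict) -> str:
    """Serialize an MCP `tools/call` request as a JSON-RPC 2.0 message."""
    request = {
        "jsonrpc": "2.0",
        "id": next(_request_ids),
        "method": "tools/call",
        "params": {"name": name, "arguments": arguments},
    }
    return json.dumps(request)


if __name__ == "__main__":
    print(build_tool_call(
        "start_planning_session",
        {"initial_intent": "Insights into trending items in region 7"},
    ))
```

The `arguments` object carries exactly the JSON payloads shown for each tool, so the same helper covers `continue_conversation` and `generate_data_prp`.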

Example Conversation Flow

User: "We want to provide the merchandising team insights into trending items in region 7"

Agent: Based on your intent, I have a few questions:

1. What is the primary audience for this analysis?
   a) Executives (high-level summary)
   b) Regional managers (summary + detail)
   c) Data analysts (detailed data)
   d) Other (please specify)

2. What key metrics define "trending" for your use case?
   a) Unit sales volume
   b) Revenue growth
   c) Profit margin
   d) Multiple metrics (please specify)

3. What time frame should we analyze?
   a) Last 4 weeks
   b) Last 8 weeks
   c) Last quarter
   d) Custom period (please specify)

4. Do you need comparisons to previous periods?
   a) Yes, week-over-week
   b) Yes, year-over-year
   c) Yes, both
   d) No comparisons needed

User: "b) Regional managers
       a) Unit sales volume
       b) Last 8 weeks  
       a) Yes, week-over-week"

Agent: [Asks follow-up questions or generates Data PRP]
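A client that wants to check a reply before sending it could parse the lettered answers shown in the flow above. This is only an illustration of the reply format: the agent itself may simply pass the raw text to the model, and `parse_choices` is a hypothetical helper that assumes options are limited to a) through d).

```python
import re

# Matches lines such as "b) Regional managers" or "   a) Yes, week-over-week".
_CHOICE = re.compile(r"^\s*([a-d])\)\s*(.*)$")


def parse_choices(reply: str) -> list[tuple[str, str]]:
    """Return (letter, text) pairs, one per answered question, in reply order."""
    choices = []
    for line in reply.splitlines():
        match = _CHOICE.match(line)
        if match:
            choices.append((match.group(1), match.group(2).strip()))
    return choices


if __name__ == "__main__":
    reply = "b) Regional managers\na) Unit sales volume\nb) Last 8 weeks\na) Yes, week-over-week"
    for number, (letter, text) in enumerate(parse_choices(reply), start=1):
        print(f"Question {number}: option {letter} ({text})")
```

Because the answers are positional, a client can compare the number of parsed pairs with the number of questions asked and prompt the user again if they differ.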

Data PRP Output Format

The generated Data PRP follows this structure:

# Data Product Requirement Prompt

## 1. Executive Summary

* **Objective:** [One-sentence business goal]
* **Target Audience:** [Who will use this]
* **Key Question:** [Primary question to answer]

## 2. Business Context

[Detailed paragraph explaining the scenario and decisions to be made]

## 3. Data Requirements

### 3.1. Key Metrics

* [Metric 1]
* [Metric 2]

### 3.2. Dimensions & Breakdowns

* [Dimension 1]
* [Dimension 2]

### 3.3. Filters

* [Filter 1]
* [Filter 2]

## 4. Success Criteria

* **Primary Metric:** [Main success indicator]
* **Timeline:** [Delivery expectations]
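To make the template concrete, here is a minimal sketch that fills it from structured fields. `DataPRP` and `render_prp` are hypothetical names chosen for this example; the project's own `DataProductRequirementPrompt` model and `PRPGenerator` are not shown in this README and may be organized differently.

```python
from dataclasses import dataclass, field


@dataclass
class DataPRP:
    """Hypothetical container mirroring the sections of the template above."""
    objective: str
    target_audience: str
    key_question: str
    business_context: str
    key_metrics: list[str] = field(default_factory=list)
    dimensions: list[str] = field(default_factory=list)
    filters: list[str] = field(default_factory=list)
    primary_metric: str = ""
    timeline: str = ""


def _bullets(items: list[str]) -> str:
    """Format items as a markdown bullet list using the template's `*` marker."""
    return "\n".join(f"* {item}" for item in items)


def render_prp(prp: DataPRP) -> str:
    """Render the PRP as markdown, keeping the template's section order."""
    lines = [
        "# Data Product Requirement Prompt", "",
        "## 1. Executive Summary", "",
        f"* **Objective:** {prp.objective}",
        f"* **Target Audience:** {prp.target_audience}",
        f"* **Key Question:** {prp.key_question}", "",
        "## 2. Business Context", "",
        prp.business_context, "",
        "## 3. Data Requirements", "",
        "### 3.1. Key Metrics", "",
        _bullets(prp.key_metrics), "",
        "### 3.2. Dimensions & Breakdowns", "",
        _bullets(prp.dimensions), "",
        "### 3.3. Filters", "",
        _bullets(prp.filters), "",
        "## 4. Success Criteria", "",
        f"* **Primary Metric:** {prp.primary_metric}",
        f"* **Timeline:** {prp.timeline}",
    ]
    return "\n".join(lines) + "\n"
```

Keeping the fields structured until the final rendering step makes it easier for a downstream consumer to read the same data without parsing markdown.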

Organizational Context

The Planning Agent can be customized to your organization by loading context files that influence all AI interactions.

What is Organizational Context?

Context files are markdown documents that provide the AI with:

  • Company-specific terminology and standards
  • Standard operating procedures (SOPs)
  • Data governance policies
  • Technical constraints
  • Communication preferences

How to Use Context

  1. Create a context directory (local or GCS):

    mkdir ./context
    
  2. Add markdown files with your organizational knowledge:

    # context/01_organization.md
    # context/02_sop.md
    # context/03_constraints.md
    
  3. Configure the agent to use your context:

    # .env
    CONTEXT_DIR=./context
    # or for GCS:
    # CONTEXT_DIR=gs://my-bucket/planning-context/
    
  4. Files are loaded automatically when the agent starts

Example Context Files

See the context.example/ directory for real examples:

  • 01_organization.md: Organizational background, team structure, communication style
  • 02_sop.md: Standard operating procedures, terminology standards, data governance
  • 03_constraints.md: Technical constraints, preferred analysis patterns, budget considerations

Benefits

  • Consistency: Agent uses your terminology and follows your SOPs
  • Governance: Automatically applies your data governance policies
  • Efficiency: No need to repeat organizational context in every conversation
  • Flexibility: Update context files without changing code

Context Behavior

  • Context is prepended to all AI prompts (initial questions, follow-ups, PRP generation)
  • Context is hidden from users - it silently guides agent behavior
  • Context is optional - agent works normally without it
  • Multiple files are concatenated alphabetically
  • Supports both local and GCS storage
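The behavior listed above (optional context, alphabetical concatenation, prepending to every prompt) can be sketched as follows. This covers local directories only; GCS loading is omitted, and `load_context` and `build_prompt` are hypothetical names, not the project's actual functions.

```python
from pathlib import Path
from typing import Optional


def load_context(context_dir: Optional[str]) -> str:
    """Concatenate all markdown files in a local directory, sorted by filename."""
    if not context_dir:
        return ""  # context is optional; an unset CONTEXT_DIR yields no context
    files = sorted(Path(context_dir).glob("*.md"), key=lambda p: p.name)
    return "\n\n".join(p.read_text(encoding="utf-8").strip() for p in files)


def build_prompt(context: str, task_prompt: str) -> str:
    """Prepend organizational context (if any) to a task prompt."""
    return f"{context}\n\n{task_prompt}" if context else task_prompt
```

Numeric filename prefixes such as `01_`, `02_`, `03_` give you explicit control over the order in which files reach the model.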

Configuration

All configuration is managed through environment variables. See .env.example for the complete list:

| Variable | Description | Default |
| --- | --- | --- |
| GEMINI_API_KEY | Gemini API key (required) | - |
| GEMINI_MODEL | Gemini model to use | gemini-2.5-pro |
| OUTPUT_DIR | Default output directory | ./output |
| CONTEXT_DIR | Context directory (local or GCS) | None |
| MCP_TRANSPORT | Transport mode (stdio or http) | stdio |
| MCP_HOST | HTTP server host | 0.0.0.0 |
| MCP_PORT | HTTP server port | 8080 |
| MAX_CONVERSATION_TURNS | Max conversation turns | 10 |
| LOG_LEVEL | Logging level | INFO |
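As an illustration of how these variables and defaults fit together, the sketch below reads them into a typed settings object. `Settings` and `load_settings` are hypothetical names for this example; the project's real configuration loader is not shown here and may validate differently.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class Settings:
    """Defaults mirror the configuration table above."""
    gemini_api_key: str
    gemini_model: str = "gemini-2.5-pro"
    output_dir: str = "./output"
    context_dir: Optional[str] = None
    mcp_transport: str = "stdio"
    mcp_host: str = "0.0.0.0"
    mcp_port: int = 8080
    max_conversation_turns: int = 10
    log_level: str = "INFO"


def load_settings(env: Mapping[str, str] = os.environ) -> Settings:
    """Build Settings from environment variables, failing fast on bad input."""
    api_key = env.get("GEMINI_API_KEY", "")
    if not api_key:
        raise ValueError("GEMINI_API_KEY is required")
    transport = env.get("MCP_TRANSPORT", "stdio")
    if transport not in ("stdio", "http"):
        raise ValueError(f"MCP_TRANSPORT must be 'stdio' or 'http', got {transport!r}")
    return Settings(
        gemini_api_key=api_key,
        gemini_model=env.get("GEMINI_MODEL", "gemini-2.5-pro"),
        output_dir=env.get("OUTPUT_DIR", "./output"),
        context_dir=env.get("CONTEXT_DIR") or None,
        mcp_transport=transport,
        mcp_host=env.get("MCP_HOST", "0.0.0.0"),
        mcp_port=int(env.get("MCP_PORT", "8080")),
        max_conversation_turns=int(env.get("MAX_CONVERSATION_TURNS", "10")),
        log_level=env.get("LOG_LEVEL", "INFO"),
    )
```

Passing a plain dictionary instead of `os.environ` makes the loader easy to exercise in tests without touching the real environment.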

Architecture

Components

  • MCP Server (src/data_planning_agent/mcp/)

    • Stdio and HTTP transports
    • JSON-RPC 2.0 protocol
    • SSE support for real-time updates
  • Clients (src/data_planning_agent/clients/)

    • GeminiClient: Gemini API wrapper for conversations
    • StorageClient: GCS and local file I/O
  • Core Logic (src/data_planning_agent/core/)

    • ConversationManager: Session state management
    • RequirementRefiner: Conversation orchestration
    • PRPGenerator: Data PRP markdown generation
  • Models (src/data_planning_agent/models/)

    • PlanningSession: Session data model
    • DataProductRequirementPrompt: PRP schema
  • CLI (src/data_planning_agent/cli/)

    • Interactive command-line interface
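StorageClient is described as handling both GCS and local file I/O. One way such a client could tell the two apart is to inspect the `gs://` prefix, as in this sketch. `split_storage_path` is a hypothetical helper written for illustration and performs no I/O; the actual client's internals are not documented here.

```python
def split_storage_path(path: str) -> tuple[str, str, str]:
    """Classify a path as ("gcs", bucket, blob) or ("local", "", path)."""
    prefix = "gs://"
    if path.startswith(prefix):
        # Everything up to the first "/" is the bucket; the rest is the object name.
        bucket, _, blob = path[len(prefix):].partition("/")
        if not bucket or not blob:
            raise ValueError(f"Expected gs://bucket/object, got {path!r}")
        return ("gcs", bucket, blob)
    return ("local", "", path)


if __name__ == "__main__":
    print(split_storage_path("gs://my-bucket/planning/data_prp.md"))
    print(split_storage_path("./output/data_prp.md"))
```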

Integration with Data Discovery Agent

┌─────────────────────┐
│  Planning Agent     │  1. Gathers requirements
│  (This repo)        │     through conversation
└──────────┬──────────┘
           │
           │ Data PRP.md
           ▼
┌─────────────────────┐
│ Data Discovery      │  2. Searches for relevant
│ Agent               │     datasets using PRP
└──────────┬──────────┘
           │
           │ Discovered datasets
           ▼
┌─────────────────────┐
│ Query Generation    │  3. Generates SQL queries
│ Agent               │     for analysis
└─────────────────────┘

Testing

Run tests with pytest:

# All tests
poetry run pytest

# Unit tests only
poetry run pytest tests/unit/

# With coverage
poetry run pytest --cov=data_planning_agent --cov-report=html

Development

Code Quality

Format code with Black:

poetry run black src/ tests/

Lint with Ruff:

poetry run ruff check src/ tests/

Project Structure

data-planning-agent/
├── src/data_planning_agent/
│   ├── mcp/              # MCP server implementation
│   ├── clients/          # External service clients
│   ├── core/             # Business logic
│   ├── models/           # Data models
│   └── cli/              # Command-line interface
├── tests/                # Test suite
├── pyproject.toml        # Poetry configuration
├── .env.example          # Environment variables template
└── README.md             # This file

Troubleshooting

Common Issues

Issue: GEMINI_API_KEY not set

  • Solution: Ensure your .env file contains a valid Gemini API key

Issue: Session timeout or max turns reached

  • Solution: Increase MAX_CONVERSATION_TURNS in .env (the default is 10)

Issue: GCS write permission denied

  • Solution: Ensure your GCP credentials have write access to the bucket

Issue: Cursor can't connect to MCP server

  • Solution: Check that MCP_TRANSPORT is set to stdio and that the cwd path in ~/.cursor/mcp.json points to the project directory

License

Apache License 2.0 - See LICENSE for details.

Contributing

Contributions are welcome! Please:

  1. Fork the repository
  2. Create a feature branch
  3. Add tests for new functionality
  4. Ensure all tests pass
  5. Submit a pull request

Related Projects

Support

For issues, questions, or contributions, please open an issue on GitHub.
