ai-analyst

ai-analyst

Enables natural language data queries with statistical rigor and a semantic layer, supporting 23+ analytical tools for product analytics.

Category
访问服务器

README

Local-AI-Analyst

📦 Public Archive - November 2025

This project is being archived as a complete reference implementation of a local-first AI-powered analytics assistant. It demonstrates best practices for MCP integration, dbt semantic bridging, and extending data pipeline infrastructure. It was built to showcase duckDB, dlt, dbt, and Ibis (Boring Semantic Layer).


AI-powered data analyst with semantic layer, statistical rigor, and natural language insights

License: MIT Python 3.10+ Tests MCP Compatible

Status: v1.0 Production Ready | Works with Claude Desktop & ChatGPT Desktop | Quick Start →


What You Get

  • Natural Language Queries - "What's our conversion rate by plan type?" No SQL required.
  • Statistical Rigor - Automatic significance testing, confidence intervals, sample size validation
  • 23+ Analytical Tools - Multi-query workflows, intelligent caching, conversation memory
  • Production Ready - Built on semantic layer principles with real data execution
# Ask Claude Desktop or ChatGPT Desktop:
"How many users do we have?"
"Compare engagement by plan type"
"Run a comprehensive conversion analysis"

How it works: Claude/ChatGPT Desktop → MCP Protocol → AI Analyst → Semantic Layer → Your Data


Quick Start

Prerequisites

  • Claude Desktop OR ChatGPT Desktop (or both!)
  • Python 3.10+ (download)
  • 5 minutes for setup

Quick Install

# 1. Install UV package manager
curl -LsSf https://astral.sh/uv/install.sh | sh

# 2. Clone and setup
git clone https://github.com/sbdk-dev/claude-analyst.git
cd claude-analyst
./scripts/setup.sh

# 3. Test the system
cd semantic-layer
uv run python test_all_functionality.py
# Expected: Tests Passed: 7 | Success Rate: 100.0%

Option A: Claude Desktop Setup

# 1. Add to ~/Library/Application Support/Claude/claude_desktop_config.json:
{
  "mcpServers": {
    "ai-analyst": {
      "command": "/opt/homebrew/bin/uv",
      "args": ["run", "python", "run_mcp_server.py"],
      "cwd": "/FULL/PATH/TO/claude-analyst/semantic-layer"
    }
  }
}

# 2. Find your uv path
which uv  # Use this full path in config above

# 3. Restart Claude Desktop
# 4. Ask: "List available data models"

Option B: ChatGPT Desktop Setup

# 1. Set your OpenAI API key
export OPENAI_API_KEY="sk-your-key-here"

# 2. Start the OpenAI API server
cd semantic-layer
uv run python run_openai_server.py
# Server starts on http://localhost:8000

# 3. Configure ChatGPT Desktop
# Settings → Beta Features → Actions → Add Custom Action
# URL: http://localhost:8000
# Auth: Bearer token (optional)

# 4. Start chatting!

Detailed guides: QUICK_START.md | Claude Desktop Setup


Core Features

1. Execution-First (Prevents AI Fabrication)

Pattern: Build → Execute → Annotate

Every answer is based on REAL query results, not AI guesses.

Query: "What's our user breakdown by plan?"
→ Generates SQL from semantic model
→ Executes against real database
→ Interprets actual results
→ "700 Free (70%), 250 Pro (25%), 50 Enterprise (5%)"

2. Statistical Testing by Default

Automatic significance testing when comparing groups:

Query: "Is the difference in engagement statistically significant?"
→ Auto-runs appropriate test (chi-square, t-test)
→ Reports p-values, effect sizes, confidence intervals
→ "Pro users 2.3x higher DAU (p<0.001, n=250 vs n=700)"

3. Multi-Query Workflows

Built-in analytical workflows for comprehensive analysis:

  • Conversion Analysis - Funnel metrics, drop-off identification, cohort comparison
  • Feature Usage - Adoption rates, user segmentation, engagement patterns
  • Revenue Optimization - LTV analysis, churn prediction, growth opportunities

4. Intelligent Query Optimization

  • 95% cache hit rate for repeated queries
  • Automatic performance learning
  • Parallel execution for complex workflows
  • Sub-100ms response times

Available Data Models

The system includes sample product analytics data:

Users (1,000 users)

  • Dimensions: plan_type, industry, company_size, signup_date
  • Metrics: total_users, conversion_rate, churn_rate

Events (34,000+ events)

  • Dimensions: event_type, feature_name, event_timestamp
  • Metrics: total_events, events_per_user, feature_adoption

Engagement

  • Dimensions: metric_date, cohort_month
  • Metrics: DAU, MAU, stickiness, retention (D1/D7/D30)

Example Queries

Basic Analytics

"How many users do we have?"
"What's our conversion rate from free to paid?"
"Show me the top 5 features by usage"
"What's our DAU trend this month?"

Comparative Analysis

"Compare engagement between Pro and Free users"
"Is the difference in conversion rate statistically significant?"
"How does feature adoption vary by industry?"

Workflows

"Run a comprehensive conversion analysis"
"Analyze feature usage patterns across user segments"
"What optimization opportunities do you see in our data?"

Why This Matters

The Problem: AI Analysts Can Fabricate Numbers

Traditional AI analysis can make up results when writing observations before executing queries.

The Solution: Semantic Layer + Execution-First

  1. Semantic Layer - Business metrics defined once, queried many ways
  2. Execution-First - Always run the query before interpreting results
  3. Statistical Rigor - Automatic validation and significance testing

Built on Research

  • Semantic Layer Design: Rasmus Engelbrecht's practical guide
  • Fabrication Prevention: Build → Execute → Annotate pattern
  • Production Stack: Boring Semantic Layer + Ibis + FastMCP + DuckDB

Architecture

┌─────────────────────────────────┐
│  Claude Desktop / ChatGPT      │
└────────────┬────────────────────┘
             │ MCP Protocol
┌────────────▼────────────────────┐
│    AI Analyst MCP Server        │
│  ┌──────────────────────────┐  │
│  │  Multi-Query Workflows   │  │
│  │  Query Optimization      │  │
│  │  Conversation Memory     │  │
│  │  Statistical Testing     │  │
│  └───────────┬──────────────┘  │
│              │                  │
│  ┌───────────▼──────────────┐  │
│  │    Semantic Layer        │  │
│  │  (Business Metrics)      │  │
│  └───────────┬──────────────┘  │
└──────────────┼──────────────────┘
               │
         ┌─────▼──────┐
         │   DuckDB   │
         └────────────┘

Key Components:

  • 23 MCP Tools - Complete analytical toolkit
  • Semantic Layer - Users, events, engagement models
  • Intelligence Layer - Statistical testing, natural language generation
  • Optimization Engine - Caching, performance learning, parallel execution
  • Conversation Memory - 24-hour context window, preference learning

Tech Stack

MCP Server: FastMCP - Production-grade MCP framework Semantic Layer: Boring Semantic Layer - Business metrics abstraction Query Engine: Ibis - Portable dataframe abstraction Database: DuckDB - Analytical database (prototype) Statistical Testing: scipy - Significance testing and effect sizes


Verify Installation

After setup, confirm everything works:

cd semantic-layer

# Test semantic layer
uv run python -c "
from mcp_server.semantic_layer_integration import SemanticLayerManager
import asyncio
async def test():
    manager = SemanticLayerManager()
    await manager.initialize()
    models = await manager.get_available_models()
    print(f'SUCCESS: {len(models)} models loaded')
    print(f'Available: {[m[\"name\"] for m in models]}')
asyncio.run(test())
"

Expected output:

SUCCESS: 3 models loaded
Available: ['users', 'events', 'engagement']

If you see this, restart Claude Desktop and try: "List available data models"


Project Status

Current Version: 1.0 Production Ready Release Date: November 2025 Test Pass Rate: 100% (7/7 tests) Completion: All v1.0 features implemented and validated Performance: 95% cache hit rate, <100ms query response


Support & Contributing

Issues: Found a bug or have a feature request? Open an issue

Contributing: Pull requests welcome! Please follow the existing code style.


License

MIT License - See LICENSE for details


Acknowledgments

Inspiration & Research:

Technology:


Last Updated: November 2025 Version: 1.0 Author: Matt Strautmann Status: v1.0 Production Ready | 100% Test Pass Rate

推荐服务器

Baidu Map

Baidu Map

百度地图核心API现已全面兼容MCP协议,是国内首家兼容MCP协议的地图服务商。

官方
精选
JavaScript
Playwright MCP Server

Playwright MCP Server

一个模型上下文协议服务器,它使大型语言模型能够通过结构化的可访问性快照与网页进行交互,而无需视觉模型或屏幕截图。

官方
精选
TypeScript
Audiense Insights MCP Server

Audiense Insights MCP Server

通过模型上下文协议启用与 Audiense Insights 账户的交互,从而促进营销洞察和受众数据的提取和分析,包括人口统计信息、行为和影响者互动。

官方
精选
本地
TypeScript
Magic Component Platform (MCP)

Magic Component Platform (MCP)

一个由人工智能驱动的工具,可以从自然语言描述生成现代化的用户界面组件,并与流行的集成开发环境(IDE)集成,从而简化用户界面开发流程。

官方
精选
本地
TypeScript
VeyraX

VeyraX

一个单一的 MCP 工具,连接你所有喜爱的工具:Gmail、日历以及其他 40 多个工具。

官方
精选
本地
Kagi MCP Server

Kagi MCP Server

一个 MCP 服务器,集成了 Kagi 搜索功能和 Claude AI,使 Claude 能够在回答需要最新信息的问题时执行实时网络搜索。

官方
精选
Python
graphlit-mcp-server

graphlit-mcp-server

模型上下文协议 (MCP) 服务器实现了 MCP 客户端与 Graphlit 服务之间的集成。 除了网络爬取之外,还可以将任何内容(从 Slack 到 Gmail 再到播客订阅源)导入到 Graphlit 项目中,然后从 MCP 客户端检索相关内容。

官方
精选
TypeScript
Exa MCP Server

Exa MCP Server

模型上下文协议(MCP)服务器允许像 Claude 这样的 AI 助手使用 Exa AI 搜索 API 进行网络搜索。这种设置允许 AI 模型以安全和受控的方式获取实时的网络信息。

官方
精选
mcp-server-qdrant

mcp-server-qdrant

这个仓库展示了如何为向量搜索引擎 Qdrant 创建一个 MCP (Managed Control Plane) 服务器的示例。

官方
精选
e2b-mcp-server

e2b-mcp-server

使用 MCP 通过 e2b 运行代码。

官方
精选