StatTools

MCP server that lets AI agents discover and call R and Python statistical functions without writing code.

What It Does

  • Search by natural language: "mixed effects model" finds lme4::lmer. The index covers ~48k functions on a fresh clone after build-index, and ~336k after the full Phase 7 + 7b tarball waves.
  • Validate before executing: stat_resolve checks safety and generates a parameter schema
  • Execute with structured JSON input/output: no R syntax, no script files, no console parsing
  • Track session state: data handles, model handles, resolved functions
  • Call methods on Python objects: model.fit(X, y), model.predict(X_test), scaler.transform(X)
  • Auto-index after install: stat_install makes new packages immediately searchable

Architecture

Agent (Claude Code / Cursor / custom)
  | MCP protocol (stdio)
  v
TypeScript MCP Server
  |-- SQLite FTS5 search index (~48k fresh-clone baseline, ~336k after the Phase 7 + 7b tarball waves)
  |-- R Worker Pool (persistent subprocess, hot-standby, recycle/crash recovery)
  |-- Python Worker (persistent subprocess, sklearn/statsmodels/scipy/pandas)
  +-- Session state (handles, resolved functions, install jobs)

Quick Start

Prerequisites

  • Node.js 22.x (enforced — see .nvmrc)
  • R >= 4.1 with jsonlite package installed
  • Python 3 with sklearn/statsmodels/scipy/pandas (optional — for Python workflows)

Install & Build

cd stattools
nvm use          # Use pinned Node 22.x
npm install
npm run build

Build the Search Index

npm run build-index

Indexes all installed R packages + CRAN metadata (~2 minutes).

Connect to Claude Code

Add to ~/.claude/settings.json. Use the full path to your Node 22 binary — better-sqlite3 will crash under a different Node version:

{
  "mcpServers": {
    "stattools": {
      "command": "/path/to/.nvm/versions/node/v22.x.x/bin/node",
      "args": ["/absolute/path/to/stattools/dist/index.js"],
      "env": {
        "STATTOOLS_DATA_ROOTS": "/Users/me/data:/tmp",
        "R_PATH": "/path/to/Rscript",
        "PATH": "/path/to/R/bin:/path/to/node/bin:/usr/bin:/bin"
      }
    }
  }
}

Find your Node 22 path with nvm which 22. For the R worker pool to function, R_PATH must point to the Rscript binary and PATH must include the directory that contains it.

Tools

| Tool | Purpose |
|---|---|
| stat_search | Search functions by natural language. Returns ranked results with safety class. |
| stat_resolve | Validate a function + get full parameter schema. Required before stat_call. |
| stat_call | Execute a resolved function with JSON arguments. Returns structured results. |
| stat_method | Call a method on a Python session object (fit, predict, transform, score). |
| stat_load_data | Load CSV/TSV/RDS into session. Supports runtime="python" for pandas. |
| stat_session | View session state: handles, resolved functions, worker status, install jobs. |
| stat_describe | Inspect a handle: schema, head, dimensions, summary, str. |
| stat_install | Install a CRAN package (async). Auto-indexes on completion. |
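
The "required before stat_call" rule can be pictured as a small per-session gate. The sketch below is only an illustration of that idea: the class name, method names, and the "package::function" key format are assumptions, not StatTools internals.

```typescript
// Hypothetical sketch: SessionGate and its methods are illustrative names only.
class SessionGate {
  // Functions this session has resolved, keyed as "package::function".
  private resolved = new Set<string>();

  // Record that stat_resolve succeeded for this function.
  markResolved(pkg: string, fn: string): void {
    this.resolved.add(`${pkg}::${fn}`);
  }

  // Throw unless the function was resolved earlier in this session.
  assertCallable(pkg: string, fn: string): void {
    const key = `${pkg}::${fn}`;
    if (!this.resolved.has(key)) {
      throw new Error(`${key} has not been resolved; call stat_resolve first`);
    }
  }
}

const gate = new SessionGate();
gate.markResolved("stats", "lm");
gate.assertCallable("stats", "lm"); // passes: stats::lm was resolved first
```

Keeping the gate in session state means a client cannot skip validation by guessing a function name, which matches the resolve-then-call order used in the workflow examples.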

Example: R Workflow

stat_search({ query: "linear regression" })
  -> stats::lm (safe), MASS::lm.ridge (safe), ...

stat_resolve({ package: "stats", function: "lm" })
  -> { resolved: true, safety_class: "safe", schema: { formula, data, ... } }

stat_load_data({ file_path: "/tmp/sales.csv" })
  -> { object_id: "sales", dimensions: { rows: 1000, cols: 8 }, ... }

stat_call({ package: "stats", function: "lm", args: { formula: "revenue ~ ad_spend", data: "sales" } })
  -> { r_squared: 0.73, coefficients: { ad_spend: { estimate: 2.3, p_value: 0.001 }, ... } }
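
The calls above are shorthand. On the wire, an MCP client sends each one as a JSON-RPC 2.0 request with method tools/call, and the stdio transport carries one JSON message per line. A minimal sketch of building those requests follows; the helper names toolCall and frame are illustrative and not part of StatTools.

```typescript
// JSON-RPC 2.0 request shape that MCP uses for tool invocation.
interface ToolCallRequest {
  jsonrpc: "2.0";
  id: number;
  method: "tools/call";
  params: { name: string; arguments: Record<string, unknown> };
}

let nextId = 1;

// Build one request; toolName is a StatTools tool, args its JSON arguments.
function toolCall(toolName: string, args: Record<string, unknown>): ToolCallRequest {
  return {
    jsonrpc: "2.0",
    id: nextId++,
    method: "tools/call",
    params: { name: toolName, arguments: args },
  };
}

// Over stdio each message is serialized as a single newline-terminated line.
function frame(req: ToolCallRequest): string {
  return JSON.stringify(req) + "\n";
}

const searchReq = toolCall("stat_search", { query: "linear regression" });
const resolveReq = toolCall("stat_resolve", { package: "stats", function: "lm" });
console.log(frame(searchReq) + frame(resolveReq));
```

In practice an MCP client library handles this framing, so agents only supply the tool name and arguments.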

Example: Python Workflow

stat_load_data({ file_path: "/tmp/data.csv", runtime: "python", name: "df" })
  -> { object_id: "df", class: "DataFrame", dimensions: { rows: 500, cols: 10 } }

stat_resolve({ package: "sklearn.linear_model", function: "LinearRegression" })
  -> { resolved: true, runtime: "python", schema: { ... } }

stat_call({ package: "sklearn.linear_model", function: "LinearRegression", args: {}, assign_to: "model" })
  -> { objects_created: [{ id: "model", type: "model" }] }

stat_method({ object: "model", method: "fit", positional_args: ["X_train", "y_train"] })
  -> { coefficients: [2.3, -0.5], intercept: 1.2 }

stat_method({ object: "model", method: "predict", positional_args: ["X_test"], assign_to: "preds" })
  -> { class: "ndarray", shape: [100], ... }

Safety Model

Functions are classified into tiers:

| Class | Behavior |
|---|---|
| safe | Fully callable. Pure computation. |
| callable_with_caveats | Callable with warnings (e.g., NSE, graphics, RNG). |
| unsafe | Blocked. File writes, network, system modification. |
| unclassified | Blocked by default. Discoverable but not callable. |

data/safety_overrides.csv holds 2,024 overrides; the built DB has ~2,048 classified functions once Python defaults are included. Unclassified functions are blocked. Extend coverage by adding entries to data/safety_overrides.csv.
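
The fail-closed behavior can be sketched as a lookup that defaults to unclassified, followed by a decision per tier. This is an illustration under assumptions: the function names, the Map-based override store, and the two sample entries are hypothetical, and the real CSV columns are not shown in this README.

```typescript
type SafetyClass = "safe" | "callable_with_caveats" | "unsafe" | "unclassified";

interface Decision {
  allowed: boolean; // may the function be executed at all?
  warn: boolean;    // should the result carry a caveat warning?
}

// Map each tier from the table above to a call decision.
function decide(cls: SafetyClass): Decision {
  switch (cls) {
    case "safe":
      return { allowed: true, warn: false };
    case "callable_with_caveats":
      return { allowed: true, warn: true };
    case "unsafe":
    case "unclassified":
      return { allowed: false, warn: false };
  }
}

// Fail closed: anything without an explicit entry is treated as unclassified.
function classify(overrides: Map<string, SafetyClass>, id: string): SafetyClass {
  return overrides.get(id) ?? "unclassified";
}

// Hypothetical entries for illustration only.
const overrides = new Map<string, SafetyClass>([
  ["stats::lm", "safe"],
  ["rstanarm::stan_glm", "callable_with_caveats"],
]);

console.log(decide(classify(overrides, "stats::lm")).allowed);       // true
console.log(decide(classify(overrides, "somepkg::unknown")).allowed); // false
```

Defaulting to unclassified rather than safe is what makes new or unreviewed functions discoverable without making them callable.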

Search Quality

Benchmark: 111 queries across 12 categories.

Fresh clone (after build-index only): ~48k functions, ~570 classified. Benchmark pass rate depends on which packages are installed locally and whether tarball extraction has been run. Expect ~90% top-3 on a standard R installation.

Expanded index (after the full Phase 7 + 7b tarball waves + ranking/callability updates): ~336k functions, ~2.0k classified. 100% top-3 and 93% top-1 on 97/97 installable queries (MRR: 0.962) — tested on a machine with a rich local R library including the easystats suite. ML, IO, visualization, mixed-models, wrangling, and diagnostics categories are at 100% top-1; weaker categories (testing, bayesian) sit at 83%.

The headline 100% number requires both a rich local R library and tarball extraction. Your mileage will vary based on which packages are installed.
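
For readers unfamiliar with the metrics, the sketch below computes top-k accuracy and mean reciprocal rank (MRR) using their standard definitions. It assumes the benchmark follows those definitions, which this README does not spell out, and the ranks are made up for illustration rather than taken from benchmark data.

```typescript
// 1-based position of the expected function in the results, or null if absent.
type Rank = number | null;

// Fraction of queries whose expected function appears within the first k results.
function topK(ranks: Rank[], k: number): number {
  const hits = ranks.filter((r) => r !== null && r <= k).length;
  return hits / ranks.length;
}

// Mean of 1/rank over all queries; a miss contributes 0.
function meanReciprocalRank(ranks: Rank[]): number {
  const total = ranks.reduce<number>((sum, r) => sum + (r === null ? 0 : 1 / r), 0);
  return total / ranks.length;
}

// Illustrative ranks for five queries (not benchmark data).
const ranks: Rank[] = [1, 1, 2, 1, 3];
console.log(topK(ranks, 1));                         // 0.6
console.log(topK(ranks, 3));                         // 1
console.log(meanReciprocalRank(ranks).toFixed(3));   // 0.767
```

MRR rewards a rank-2 hit with 0.5 and a rank-3 hit with about 0.33, which is why it can sit well above top-1 accuracy when top-3 is perfect.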

Environment Variables

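| Variable | Default | Description |
|---|---|---|
| STATTOOLS_DATA_ROOTS | Current directory | Colon-separated list of allowed data directories |
| R_PATH | Rscript | Path to Rscript binary |
| PYTHON_PATH | python3 | Path to the Python interpreter used by the Python worker |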
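
One way to picture a colon-separated allow-list is as a prefix check on normalized paths. The sketch below is not the server's actual check; the function names are hypothetical, and it handles only absolute POSIX paths. A production check would more likely rely on Node's path.resolve and fs.realpathSync so that relative paths and symlinks are covered.

```typescript
// Normalize an absolute POSIX path by collapsing "." and ".." segments.
function normalize(p: string): string {
  const out: string[] = [];
  for (const seg of p.split("/")) {
    if (seg === "" || seg === ".") continue;
    if (seg === "..") out.pop();
    else out.push(seg);
  }
  return "/" + out.join("/");
}

// Parse the colon-separated variable into normalized root directories.
function parseRoots(value: string): string[] {
  return value
    .split(":")
    .filter((root) => root.length > 0)
    .map(normalize);
}

// A path is allowed only if it equals a root or sits beneath one.
function isAllowed(filePath: string, roots: string[]): boolean {
  const target = normalize(filePath);
  return roots.some((root) => {
    const prefix = root === "/" ? "/" : root + "/";
    return target === root || target.startsWith(prefix);
  });
}

const roots = parseRoots("/Users/me/data:/tmp");
console.log(isAllowed("/tmp/sales.csv", roots));     // true
console.log(isAllowed("/tmp/../etc/passwd", roots)); // false
```

Appending "/" before the prefix comparison prevents a sibling directory such as /tmpfoo from matching the /tmp root.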

Setup Validation

After build + index, verify everything works:

npm run validate     # Checks Node, R, build, index, server, and runs a real workflow

This runs 14 checks including safety-override integrity, starting the MCP server, inspecting Python runtime health, and executing a complete search → resolve → load → call → session workflow.

For real external-client validation through Claude Code CLI, including exact prompts for OLS, mixed-effects, reshape, ggplot2, and glmnet, see AGENT_WORKFLOW_RUNBOOK.md.

Development

nvm use              # Enforce Node 22.x
npm test             # Run the hermetic default test suite
npm run test:tarball-live  # Optional live CRAN tarball smoke test
npm run test:benchmark     # Run the heavy 111-query benchmark separately
npm run test:watch   # Watch mode
npm run build        # Compile TypeScript
npm run build-index  # Rebuild search index
npm run apply-safety-overrides  # Sync safety_overrides.csv into the current DB
npm run check-safety-overrides  # Fail if safety_overrides.csv has orphan or duplicate IDs
npm run validate     # Full setup validation

Status: Beta for Tier A workflows (v0.2.0)

Phase 6 closed with a four-round agent eval going from 80% → 84% → 92% → 98% weighted pass rate on a 25-task representative workflow set. The single remaining non-pass is an upstream R-package bug. See phase6-retrospective.md for the full story.

What works reliably:

  • Search: ~90% top-3 on a fresh clone. On the fully expanded Phase 7 + 7b index, the benchmark is 100% top-3 and 93% top-1 on 99 installable queries (MRR 0.963).
  • Core R workflows: OLS, logistic, t-test, ANOVA, correlation, random forest, PCA, k-means, mixed effects (lme4 random intercept/slope/GLMM), survival (Kaplan-Meier, Cox PH, Weibull), robust SE, broom tidy, VIF, stepwise selection, time series (auto.arima, STL, forecast), Bayesian regression (rstanarm), polynomial regression with model comparison, fixest panel regression — all validated end-to-end through agent evals.
  • Data loading: CSV/TSV/RDS via file_path, built-in R datasets via dataset (mtcars, iris, sleepstudy, lung, cbpp, Grunfeld, AirPassengers, ...), pandas DataFrame via runtime="python". Handles register identically.
  • NSE-heavy verbs (dplyr, tidyr, ggplot2::aes): stat_call's expressions and dot_expressions fields take R expression strings, parsed via rlang::parse_expr and forwarded as quosures. dplyr data-mask pronouns like n() and tidyselect helpers like everything() / -Species resolve correctly. stat_resolve returns an nse_hint field for ~15 known NSE functions with worked examples.
  • Multi-object dispatch (anova(m1, m2), AIC(m1, m2)): stat_call's dot_args field resolves session handle IDs as positional ... args.
  • Class coercion (factor/ts/matrix): stat_call's coerce field accepts whitelisted specs (factor, ts(frequency=N), etc.) and applies them before the call. stat_resolve's class_hint field tells you when to use it.
  • Python workflows: structured errors with python_state (spawn_failed / modules_missing / crashed / healthy), python_path, missing_modules, recent_stderr, and hint — no separate stat_session round trip required.
  • Verbose R functions: console output is captured/suppressed so it does not pollute the NDJSON channel.
  • Handle system: models and data persist in session across calls.
  • Install + auto-reindex: stat_install installs and makes packages immediately searchable.
  • Worker stability: hot-standby pool, crash recovery, handle persistence across recycles.
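
To make the expressions, dot_args, and coerce fields above more concrete, here is a sketch of what such stat_call payloads might look like. The field names come from this README, but the value shapes (arrays versus objects) are assumptions, as are the handle names m1, m2, X, and labels. dot_expressions is left out because its shape is not shown here.

```typescript
// Assumed payload shape; only the field names are taken from the README.
interface StatCall {
  package: string;
  function: string;
  args?: Record<string, unknown>;
  expressions?: Record<string, string>; // arg name -> R expression string
  dot_args?: string[];                  // session handle IDs passed as positional `...`
  coerce?: Record<string, string>;      // arg name -> coercion spec, e.g. "factor"
  assign_to?: string;
}

// Multi-object dispatch: anova(m1, m2) with two existing model handles.
const compareModels: StatCall = {
  package: "stats",
  function: "anova",
  dot_args: ["m1", "m2"],
};

// NSE: ggplot2::aes(x = wt, y = mpg) passed as expression strings.
const aesMapping: StatCall = {
  package: "ggplot2",
  function: "aes",
  expressions: { x: "wt", y: "mpg" },
};

// Class coercion: matrix-form randomForest with y coerced to a factor.
const classifier: StatCall = {
  package: "randomForest",
  function: "randomForest",
  args: { x: "X", y: "labels" },
  coerce: { y: "factor" },
  assign_to: "rf",
};

// Return the dot_args entries that do not name a known session handle.
function missingHandles(call: StatCall, handles: Set<string>): string[] {
  return (call.dot_args ?? []).filter((id) => !handles.has(id));
}

console.log(missingHandles(compareModels, new Set(["m1"]))); // [ 'm2' ]
```

A client-side check like missingHandles is optional; it only catches a mistyped handle ID before the round trip to the server.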

What works with caveats:

  • Python install path: the server uses whatever python3 / PYTHON_PATH resolves to at startup. If you pip install into a different interpreter, the server won't see the modules. Install into the binary stat_session reports under python.path, or set PYTHON_PATH explicitly.
  • Bayesian: rstanarm/brms are slow (MCMC compilation) and classified as callable_with_caveats. bayestestR::hdi(stanreg_model) currently throws a names-length error on rstanarm fits (upstream bug) — use bayestestR::describe_posterior(model, ci_method="HDI") instead.
  • lm(weights = ...): the weights arg is captured via model.frame, not the rlang/dplyr NSE machinery. expressions={"weights": "1/hp"} is rejected. Workaround: extract the column with stat_extract and pass the resulting numeric vector handle.
  • S3 dispatch on first positional arg (randomForest, survival::Surv, etc.): when both formula and x are passed, R silently falls through to .default (matrix mode). Workaround: use matrix form (x=, y=) with coerce={y:"factor"} for classification, or pass the formula as the first positional arg.

What doesn't work yet:

  • Only ~2.0k of ~336k functions are classified as callable. The rest are discoverable but blocked by the fail-closed safety model. Extend coverage by adding entries to data/safety_overrides.csv.
  • ~14.9k packages are still stubs (no function-level metadata). data/tarball_targets_phase7.txt covers 8,500 priority packages.
  • Tarball expansion is network-bound and incremental. npm test is hermetic; npm run test:tarball-live requires live CRAN access.
  • Top-1 search accuracy is 93%; weakest in testing and bayesian categories at 83%. Top-3 remains 100%.
  • No multi-tenant support — single-user local server only.

Known environment requirements:

  • Node 22.x (enforced; better-sqlite3 will crash on other versions)
  • R >= 4.1 with jsonlite
  • macOS or Linux (not tested on Windows)
  • For Python workflows: python3 with sklearn, scipy, statsmodels, pandas

Tier A Packages

Deeply classified packages with safety overrides, curated aliases, and workflow tests:

  • Core Stats: stats, base, utils, MASS, boot, cluster
  • Tidyverse: dplyr, tidyr, ggplot2, readr, purrr, stringr, forcats, tibble, scales
  • Modeling: lme4, nlme, mgcv, glmnet, survival, sandwich, car, lmtest, forecast
  • ML: caret, randomForest, rpart, nnet, e1071
  • Model Output: broom, emmeans, marginaleffects, performance, parameters, effectsize
  • Bayesian: rstanarm, brms, bayestestR
  • Specialized: psych, lavaan, vegan, datawizard, insight, haven, data.table, fixest
  • Python: sklearn (linear_model, ensemble, tree, svm, neighbors, cluster, decomposition, preprocessing, metrics, model_selection), statsmodels, scipy.stats, pandas
