StatTools
MCP server that lets AI agents discover and call R and Python statistical functions without writing code.
What It Does
- Search ~48k functions on a fresh clone after build-index, and ~336k after the full Phase 7 + 7b tarball waves: "mixed effects model" finds lme4::lmer
- Validate before executing: stat_resolve checks safety, generates parameter schema
- Execute with structured JSON input/output: no R syntax, no script files, no console parsing
- Track session state: data handles, model handles, resolved functions
- Call methods on Python objects: model.fit(X, y), model.predict(X_test), scaler.transform(X)
- Auto-index after install: stat_install makes new packages immediately searchable
Architecture
Agent (Claude Code / Cursor / custom)
| MCP protocol (stdio)
v
TypeScript MCP Server
|-- SQLite FTS5 search index (~48k fresh-clone baseline, ~336k after the Phase 7 + 7b tarball waves)
|-- R Worker Pool (persistent subprocess, hot-standby, recycle/crash recovery)
|-- Python Worker (persistent subprocess, sklearn/statsmodels/scipy/pandas)
+-- Session state (handles, resolved functions, install jobs)
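The status notes later in this README mention an NDJSON channel between the server and its workers. As a rough illustration of what reading such a channel involves, here is a minimal sketch of a line buffer that turns stdout chunks into parsed messages. The class name and message fields are hypothetical; this is not the project's actual worker code.

```typescript
// Hypothetical sketch: not StatTools' actual worker code.
// Accumulates stdout chunks and returns one parsed JSON value per complete line.
class NdjsonBuffer {
  private pending = "";

  // Feed a raw chunk; returns every message completed by this chunk.
  push(chunk: string): unknown[] {
    this.pending += chunk;
    const lines = this.pending.split("\n");
    // The last element is an incomplete line (or "" if the chunk ended on "\n").
    this.pending = lines.pop() ?? "";
    const messages: unknown[] = [];
    for (const line of lines) {
      const trimmed = line.trim();
      if (trimmed === "") continue; // skip blank lines
      messages.push(JSON.parse(trimmed));
    }
    return messages;
  }
}

// A message split across two chunks is only emitted once its newline arrives.
const ndjson = new NdjsonBuffer();
const firstBatch = ndjson.push('{"id":1,"ok":true}\n{"id":2,');
const secondBatch = ndjson.push('"ok":false}\n');
console.log(firstBatch.length, secondBatch.length);
```

A framing like this is why stray console output from a verbose R function would be a problem: any non-JSON line on the channel would make `JSON.parse` throw.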
Quick Start
Prerequisites
- Node.js 22.x (enforced; see .nvmrc)
- R >= 4.1 with the jsonlite package installed
- Python 3 with sklearn/statsmodels/scipy/pandas (optional; for Python workflows)
Install & Build
cd stattools
nvm use # Use pinned Node 22.x
npm install
npm run build
Build the Search Index
npm run build-index
Indexes all installed R packages + CRAN metadata (~2 minutes).
Connect to Claude Code
Add to ~/.claude/settings.json. Use the full path to your Node 22 binary — better-sqlite3 will crash under a different Node version:
{
  "mcpServers": {
    "stattools": {
      "command": "/path/to/.nvm/versions/node/v22.x.x/bin/node",
      "args": ["/absolute/path/to/stattools/dist/index.js"],
      "env": {
        "STATTOOLS_DATA_ROOTS": "/Users/me/data:/tmp",
        "R_PATH": "/path/to/Rscript",
        "PATH": "/path/to/R/bin:/path/to/node/bin:/usr/bin:/bin"
      }
    }
  }
}
Find your Node 22 path with nvm which 22. R_PATH must point to the Rscript binary, and PATH must include the directory that contains it, for the R worker pool to function.
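If you prefer to generate the entry rather than hand-edit it, the sketch below assembles the same structure from four inputs and rejects relative paths. The helper name `buildStatToolsEntry` and the example paths are hypothetical placeholders; substitute your own.

```typescript
// Hypothetical helper: builds the "stattools" entry in the shape shown above.
interface McpServerEntry {
  command: string;
  args: string[];
  env: Record<string, string>;
}

// Directory part of an absolute POSIX path.
const dirOf = (p: string): string => p.slice(0, p.lastIndexOf("/")) || "/";

function buildStatToolsEntry(opts: {
  nodeBinary: string;  // absolute path, e.g. the output of `nvm which 22`
  repoDir: string;     // absolute path to the stattools checkout
  rscript: string;     // absolute path to the Rscript binary
  dataRoots: string[]; // directories the server may load data from
}): McpServerEntry {
  for (const p of [opts.nodeBinary, opts.repoDir, opts.rscript, ...opts.dataRoots]) {
    if (!p.startsWith("/")) throw new Error(`Expected an absolute path, got: ${p}`);
  }
  return {
    command: opts.nodeBinary,
    args: [`${opts.repoDir.replace(/\/+$/, "")}/dist/index.js`],
    env: {
      STATTOOLS_DATA_ROOTS: opts.dataRoots.join(":"),
      R_PATH: opts.rscript,
      // Rscript's directory first, then Node's, then the system defaults.
      PATH: [dirOf(opts.rscript), dirOf(opts.nodeBinary), "/usr/bin", "/bin"].join(":"),
    },
  };
}

// Placeholder paths for illustration only.
const statToolsEntry = buildStatToolsEntry({
  nodeBinary: "/opt/node22/bin/node",
  repoDir: "/opt/stattools",
  rscript: "/usr/local/bin/Rscript",
  dataRoots: ["/Users/me/data", "/tmp"],
});
console.log(JSON.stringify({ mcpServers: { stattools: statToolsEntry } }, null, 2));
```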
Tools
| Tool | Purpose |
|---|---|
| stat_search | Search functions by natural language. Returns ranked results with safety class. |
| stat_resolve | Validate a function + get full parameter schema. Required before stat_call. |
| stat_call | Execute a resolved function with JSON arguments. Returns structured results. |
| stat_method | Call a method on a Python session object (fit, predict, transform, score). |
| stat_load_data | Load CSV/TSV/RDS into session. Supports runtime="python" for pandas. |
| stat_session | View session state: handles, resolved functions, worker status, install jobs. |
| stat_describe | Inspect a handle: schema, head, dimensions, summary, str. |
| stat_install | Install a CRAN package (async). Auto-indexes on completion. |
Example: R Workflow
stat_search({ query: "linear regression" })
-> stats::lm (safe), MASS::lm.ridge (safe), ...
stat_resolve({ package: "stats", function: "lm" })
-> { resolved: true, safety_class: "safe", schema: { formula, data, ... } }
stat_load_data({ file_path: "/tmp/sales.csv" })
-> { object_id: "sales", dimensions: { rows: 1000, cols: 8 }, ... }
stat_call({ package: "stats", function: "lm", args: { formula: "revenue ~ ad_spend", data: "sales" } })
-> { r_squared: 0.73, coefficients: { ad_spend: { estimate: 2.3, p_value: 0.001 }, ... } }
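Because stat_resolve is required before stat_call, an agent harness may want to check a planned tool sequence before sending it. The sketch below is one client-side way to do that. The request types mirror the arguments used in the example above, and `findUnresolvedCalls` is a hypothetical helper, not part of StatTools.

```typescript
// Request shapes taken from the example above; the union type itself is an assumption.
type ToolRequest =
  | { tool: "stat_search"; args: { query: string } }
  | { tool: "stat_resolve"; args: { package: string; function: string } }
  | { tool: "stat_load_data"; args: { file_path: string } }
  | {
      tool: "stat_call";
      args: { package: string; function: string; args: Record<string, unknown> };
    };

// Returns every "package::function" that is called without an earlier resolve step.
function findUnresolvedCalls(plan: ToolRequest[]): string[] {
  const resolved = new Set<string>();
  const problems: string[] = [];
  for (const step of plan) {
    if (step.tool === "stat_resolve") {
      resolved.add(`${step.args.package}::${step.args.function}`);
    } else if (step.tool === "stat_call") {
      const key = `${step.args.package}::${step.args.function}`;
      if (!resolved.has(key)) problems.push(key);
    }
  }
  return problems;
}

const rWorkflowPlan: ToolRequest[] = [
  { tool: "stat_search", args: { query: "linear regression" } },
  { tool: "stat_resolve", args: { package: "stats", function: "lm" } },
  { tool: "stat_load_data", args: { file_path: "/tmp/sales.csv" } },
  {
    tool: "stat_call",
    args: {
      package: "stats",
      function: "lm",
      args: { formula: "revenue ~ ad_spend", data: "sales" },
    },
  },
];

const unresolvedInPlan = findUnresolvedCalls(rWorkflowPlan);
// Dropping the resolve step makes the same plan invalid.
const unresolvedWithoutResolve = findUnresolvedCalls(
  rWorkflowPlan.filter((s) => s.tool !== "stat_resolve"),
);
console.log(unresolvedInPlan, unresolvedWithoutResolve);
```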
Example: Python Workflow
stat_load_data({ file_path: "/tmp/data.csv", runtime: "python", name: "df" })
-> { object_id: "df", class: "DataFrame", dimensions: { rows: 500, cols: 10 } }
stat_resolve({ package: "sklearn.linear_model", function: "LinearRegression" })
-> { resolved: true, runtime: "python", schema: { ... } }
stat_call({ package: "sklearn.linear_model", function: "LinearRegression", args: {}, assign_to: "model" })
-> { objects_created: [{ id: "model", type: "model" }] }
stat_method({ object: "model", method: "fit", positional_args: ["X_train", "y_train"] })
-> { coefficients: [2.3, -0.5], intercept: 1.2 }
stat_method({ object: "model", method: "predict", positional_args: ["X_test"], assign_to: "preds" })
-> { class: "ndarray", shape: [100], ... }
Safety Model
Functions are classified into tiers:
| Class | Behavior |
|---|---|
| safe | Fully callable. Pure computation. |
| callable_with_caveats | Callable with warnings (e.g., NSE, graphics, RNG). |
| unsafe | Blocked. File writes, network, system modification. |
| unclassified | Blocked by default. Discoverable but not callable. |
The CSV holds 2,024 safety overrides (~2,048 functions are classified in the built DB once Python defaults are included). Unclassified functions are blocked; extend coverage by adding entries to data/safety_overrides.csv.
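The tiers amount to a fail-closed gate: anything not explicitly marked callable is rejected. The sketch below shows that logic in isolation. The `gate` function and `GateDecision` type are hypothetical names, and the real server's implementation may differ.

```typescript
type SafetyClass = "safe" | "callable_with_caveats" | "unsafe" | "unclassified";

interface GateDecision {
  allowed: boolean; // may the function be executed?
  warn: boolean;    // should the result carry a caveat warning?
}

// Fail closed: only the two callable tiers pass; everything else is blocked,
// including a missing classification.
function gate(safetyClass: SafetyClass | undefined): GateDecision {
  switch (safetyClass) {
    case "safe":
      return { allowed: true, warn: false };
    case "callable_with_caveats":
      return { allowed: true, warn: true };
    default:
      return { allowed: false, warn: false };
  }
}

console.log(gate("safe"), gate("callable_with_caveats"), gate("unclassified"));
```

Putting the block in the `default` branch means a newly introduced or missing class is blocked without any further code change.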
Search Quality
Benchmark: 111 queries across 12 categories.
Fresh clone (after build-index only): ~48k functions, ~570 classified. Benchmark pass rate depends on which packages are installed locally and whether tarball extraction has been run. Expect ~90% on a standard R installation.
Expanded index (after the full Phase 7 + 7b tarball waves + ranking/callability updates): ~336k functions, ~2.0k classified. 100% top-3 and 93% top-1 on 97/97 installable queries (MRR: 0.962) — tested on a machine with a rich local R library including the easystats suite. ML, IO, visualization, mixed-models, wrangling, and diagnostics categories are at 100% top-1; weaker categories (testing, bayesian) sit at 83%.
The headline 100% number requires both a rich local R library and tarball extraction. Your mileage will vary based on which packages are installed.
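For readers unfamiliar with the metrics, the sketch below computes top-k hit rate and mean reciprocal rank (MRR) from the rank at which each query's expected function appears. It uses the standard definitions, which is an assumption; this README does not say exactly how the benchmark script computes them. The sample ranks are made up.

```typescript
// Each entry is the 1-based rank of the expected function for one query,
// or null if it did not appear in the results at all.
type Rank = number | null;

// Fraction of queries whose expected function appears within the first k results.
function topK(ranks: Rank[], k: number): number {
  if (ranks.length === 0) return 0;
  const hits = ranks.filter((r) => r !== null && r <= k).length;
  return hits / ranks.length;
}

// Mean of 1/rank over all queries; a miss contributes 0.
function meanReciprocalRank(ranks: Rank[]): number {
  if (ranks.length === 0) return 0;
  const total = ranks.reduce<number>((sum, r) => sum + (r === null ? 0 : 1 / r), 0);
  return total / ranks.length;
}

// Made-up ranks for four queries: three first-place hits and one second-place hit.
const sampleRanks: Rank[] = [1, 1, 2, 1];
console.log(topK(sampleRanks, 1));            // 0.75
console.log(topK(sampleRanks, 3));            // 1
console.log(meanReciprocalRank(sampleRanks)); // 0.875
```

Under these definitions, a result set with 100% top-3 can still have an MRR below 1, because second- and third-place hits count as 1/2 and 1/3.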
Environment Variables
| Variable | Default | Description |
|---|---|---|
| STATTOOLS_DATA_ROOTS | Current directory | Colon-separated list of allowed data directories |
| R_PATH | Rscript | Path to Rscript binary |
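To make the STATTOOLS_DATA_ROOTS format concrete, the sketch below checks whether a file path falls under one of the colon-separated roots. It is a simplified illustration with hypothetical function names: it handles only absolute POSIX paths and does not resolve symlinks, and the server's real check may be stricter.

```typescript
// Collapse ".", "..", and repeated slashes in an absolute POSIX path.
function normalizePosix(p: string): string {
  if (!p.startsWith("/")) throw new Error(`Expected an absolute path, got: ${p}`);
  const out: string[] = [];
  for (const seg of p.split("/")) {
    if (seg === "" || seg === ".") continue;
    if (seg === "..") out.pop();
    else out.push(seg);
  }
  return "/" + out.join("/");
}

// True if filePath is one of the roots or lies inside one of them.
function isUnderDataRoots(filePath: string, dataRoots: string): boolean {
  const target = normalizePosix(filePath);
  return dataRoots
    .split(":")
    .filter((r) => r !== "")
    .map(normalizePosix)
    .some((root) => root === "/" || target === root || target.startsWith(root + "/"));
}

const exampleRoots = "/Users/me/data:/tmp";
console.log(isUnderDataRoots("/tmp/sales.csv", exampleRoots));     // true
console.log(isUnderDataRoots("/tmp/../etc/passwd", exampleRoots)); // false
console.log(isUnderDataRoots("/tmpfoo/x.csv", exampleRoots));      // false
```

Comparing against `root + "/"` rather than the bare root is what keeps `/tmpfoo` from matching `/tmp`.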
Setup Validation
After build + index, verify everything works:
npm run validate # Checks Node, R, build, index, server, and runs a real workflow
This runs 14 checks including safety-override integrity, starting the MCP server, inspecting Python runtime health, and executing a complete search → resolve → load → call → session workflow.
For real external-client validation through Claude Code CLI, including exact prompts for OLS, mixed-effects, reshape, ggplot2, and glmnet, see AGENT_WORKFLOW_RUNBOOK.md.
Development
nvm use # Enforce Node 22.x
npm test # Run the hermetic default test suite
npm run test:tarball-live # Optional live CRAN tarball smoke test
npm run test:benchmark # Run the heavy 111-query benchmark separately
npm run test:watch # Watch mode
npm run build # Compile TypeScript
npm run build-index # Rebuild search index
npm run apply-safety-overrides # Sync safety_overrides.csv into the current DB
npm run check-safety-overrides # Fail if safety_overrides.csv has orphan or duplicate IDs
npm run validate # Full setup validation
Status: Beta for Tier A workflows (v0.2.0)
Phase 6 closed with a four-round agent eval going from 80% → 84% → 92% → 98% weighted pass rate on a 25-task representative workflow set. The single remaining non-pass is an upstream R-package bug. See phase6-retrospective.md for the full story.
What works reliably:
- Search: ~90% top-3 on a fresh clone. On the fully expanded Phase 7 + 7b index, the benchmark is 100% top-3 and 93% top-1 on 99 installable queries (MRR 0.963).
- Core R workflows: OLS, logistic, t-test, ANOVA, correlation, random forest, PCA, k-means, mixed effects (lme4 random intercept/slope/GLMM), survival (Kaplan-Meier, Cox PH, Weibull), robust SE, broom tidy, VIF, stepwise selection, time series (auto.arima, STL, forecast), Bayesian regression (rstanarm), polynomial regression with model comparison, fixest panel regression — all validated end-to-end through agent evals.
- Data loading: CSV/TSV/RDS via file_path, built-in R datasets via dataset (mtcars, iris, sleepstudy, lung, cbpp, Grunfeld, AirPassengers, ...), pandas DataFrame via runtime="python". Handles register identically.
- NSE-heavy verbs (dplyr, tidyr, ggplot2::aes): stat_call's expressions and dot_expressions fields take R expression strings, parsed via rlang::parse_expr and forwarded as quosures. dplyr data-mask functions like n() and tidyselect helpers like everything() / -Species resolve correctly. stat_resolve returns an nse_hint field for ~15 known NSE functions with worked examples.
- Multi-object dispatch (anova(m1, m2), AIC(m1, m2)): stat_call's dot_args field resolves session handle IDs as positional ... args.
- Class coercion (factor/ts/matrix): stat_call's coerce field accepts whitelisted specs (factor, ts(frequency=N), etc.) and applies them before the call. stat_resolve's class_hint field tells you when to use it.
- Python workflows: structured errors with python_state (spawn_failed / modules_missing / crashed / healthy), python_path, missing_modules, recent_stderr, and hint, so no separate stat_session round trip is required.
- Verbose R functions: console output is captured/suppressed so it does not pollute the NDJSON channel.
- Handle system: models and data persist in session across calls.
- Install + auto-reindex: stat_install installs and makes packages immediately searchable.
- Worker stability: hot-standby pool, crash recovery, handle persistence across recycles.
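The stat_call fields named in the list above can be pictured as the request shapes below. Only the field names (expressions, dot_expressions, dot_args, coerce) come from this README. The exact types, the handle names (iris, m1, m2, X, y), and the `conflictingArgNames` check are assumptions made for illustration, so treat the nse_hint and class_hint output of stat_resolve as the authoritative guide.

```typescript
// Assumed request shape for stat_call; only the field names come from the README.
interface StatCallRequest {
  package: string;
  function: string;
  args?: Record<string, unknown>;       // plain JSON values or session handle IDs
  expressions?: Record<string, string>; // named args given as R expression strings
  dot_expressions?: string[];           // assumed: unnamed `...` expressions
  dot_args?: string[];                  // assumed: handle IDs passed positionally as `...`
  coerce?: Record<string, string>;      // assumed: arg name -> whitelisted coercion spec
}

// Illustrative client-side check (not server behavior): an argument supplied both
// as a plain value and as an expression would be ambiguous.
function conflictingArgNames(req: StatCallRequest): string[] {
  const expr = new Set(Object.keys(req.expressions ?? {}));
  return Object.keys(req.args ?? {}).filter((n) => expr.has(n));
}

// NSE verb: summary columns are R expressions evaluated in the data mask.
const summariseRequest: StatCallRequest = {
  package: "dplyr",
  function: "summarise",
  args: { ".data": "iris" },
  expressions: { mean_len: "mean(Sepal.Length)", n_rows: "n()" },
};

// tidyselect: drop one column via an unnamed expression.
const selectRequest: StatCallRequest = {
  package: "dplyr",
  function: "select",
  args: { ".data": "iris" },
  dot_expressions: ["-Species"],
};

// Multi-object dispatch: two model handles passed positionally.
const anovaRequest: StatCallRequest = {
  package: "stats",
  function: "anova",
  dot_args: ["m1", "m2"],
};

// Class coercion: treat the response handle as a factor before the call.
const forestRequest: StatCallRequest = {
  package: "randomForest",
  function: "randomForest",
  args: { x: "X", y: "y" },
  coerce: { y: "factor" },
};

console.log(
  [summariseRequest, selectRequest, anovaRequest, forestRequest].map(conflictingArgNames),
);
```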
What works with caveats:
- Python install path: the server uses whatever python3 / PYTHON_PATH resolves to at startup. If you pip install into a different interpreter, the server won't see the modules. Install into the binary stat_session reports under python.path, or set PYTHON_PATH explicitly.
- Bayesian: rstanarm/brms are slow (MCMC compilation) and classified as callable_with_caveats. bayestestR::hdi(stanreg_model) currently throws a names-length error on rstanarm fits (upstream bug); use bayestestR::describe_posterior(model, ci_method="HDI") instead.
- lm(weights = ...): the weights arg is captured via model.frame, not the rlang/dplyr NSE machinery. expressions={"weights": "1/hp"} is rejected. Workaround: extract the column with stat_extract and pass the resulting numeric vector handle.
- S3 dispatch on first positional arg (randomForest, survival::Surv, etc.): when both formula and x are passed, R silently falls through to .default (matrix mode). Workaround: use matrix form (x=, y=) with coerce={y:"factor"} for classification, or pass the formula as the first positional arg.
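For the Python install-path caveat, a client can turn the structured error fields into a concrete next step. The sketch below assumes the error payload exposes python_state, python_path, missing_modules, recent_stderr, and hint as flat fields, which the README implies but does not spell out. `suggestFix` and the module-to-package map are hypothetical.

```typescript
type PythonState = "spawn_failed" | "modules_missing" | "crashed" | "healthy";

// Assumed flat shape of the structured Python error fields.
interface PythonErrorInfo {
  python_state: PythonState;
  python_path?: string;
  missing_modules?: string[];
  recent_stderr?: string;
  hint?: string;
}

// Import names that differ from their pip package names.
const PIP_NAMES: Record<string, string> = { sklearn: "scikit-learn" };

// Maps a structured error to a human-readable next step.
function suggestFix(info: PythonErrorInfo): string {
  switch (info.python_state) {
    case "healthy":
      return "Python worker is healthy; no action needed.";
    case "spawn_failed":
      return "Check that PYTHON_PATH points at an existing python3 binary.";
    case "modules_missing": {
      // Install into the interpreter the server actually uses, not whichever pip is on PATH.
      const pkgs = (info.missing_modules ?? []).map((m) => PIP_NAMES[m] ?? m);
      const py = info.python_path ?? "python3";
      return `${py} -m pip install ${pkgs.join(" ")}`;
    }
    case "crashed":
      return info.hint ?? "Inspect recent_stderr, then retry the call.";
  }
}

const installCommand = suggestFix({
  python_state: "modules_missing",
  python_path: "/usr/bin/python3",
  missing_modules: ["sklearn", "scipy"],
});
console.log(installCommand); // /usr/bin/python3 -m pip install scikit-learn scipy
```

Running pip through the reported interpreter (`<python_path> -m pip`) is what avoids the mismatch described in the first caveat.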
What doesn't work yet:
- Only ~2.0k of ~336k functions are classified as callable. The rest are discoverable but blocked by the fail-closed safety model. Extend coverage by adding entries to data/safety_overrides.csv.
- ~14.9k packages are still stubs (no function-level metadata). data/tarball_targets_phase7.txt covers 8,500 priority packages.
- Tarball expansion is network-bound and incremental. npm test is hermetic; npm run test:tarball-live requires live CRAN access.
- Top-1 search accuracy is 93%; weakest in testing and bayesian categories at 83%. Top-3 remains 100%.
- No multi-tenant support: single-user local server only.
Known environment requirements:
- Node 22.x (enforced; better-sqlite3 will crash on other versions)
- R >= 4.1 with jsonlite
- macOS or Linux (not tested on Windows)
- For Python workflows: python3 with sklearn, scipy, statsmodels, pandas
Tier A Packages
Deeply classified packages with safety overrides, curated aliases, and workflow tests:
- Core Stats: stats, base, utils, MASS, boot, cluster
- Tidyverse: dplyr, tidyr, ggplot2, readr, purrr, stringr, forcats, tibble, scales
- Modeling: lme4, nlme, mgcv, glmnet, survival, sandwich, car, lmtest, forecast
- ML: caret, randomForest, rpart, nnet, e1071
- Model Output: broom, emmeans, marginaleffects, performance, parameters, effectsize
- Bayesian: rstanarm, brms, bayestestR
- Specialized: psych, lavaan, vegan, datawizard, insight, haven, data.table, fixest
- Python: sklearn (linear_model, ensemble, tree, svm, neighbors, cluster, decomposition, preprocessing, metrics, model_selection), statsmodels, scipy.stats, pandas