chdb-mcp
An MCP server for chDB, the in-process SQL OLAP engine powered by ClickHouse. Lets agents query Parquet, CSV, JSON, and pandas DataFrames with one tool — no separate server, no Docker.
README
chdb-mcp
An MCP server for chDB, the in-process SQL OLAP engine powered by ClickHouse. Lets agents (Claude Desktop, Cursor, VS Code, Codex CLI, Cline, …) query Parquet, CSV, JSON, and pandas DataFrames with one tool — no separate server, no Docker.
Why chdb-mcp?
- Full ClickHouse engine, in-process. 1000+ functions (
windowFunnel,quantilesTDigest,geoToH3, the-If/-State/-Mergecombinators), typedJSONwith O(1) sub-column reads, native vectors,MergeTreestorage. - Drop-in pandas API.
import datastore as pdcovers ~300 pandas-shaped methods compiled to ClickHouse SQL. v1.0 addsdataframe_query()for zero-copyPython(df). - ~80 formats and 12+ source connectors in core. Parquet, CSV, JSON, Avro, ORC, Arrow, Protobuf, plus
s3(),mongodb(),postgresql(),mysql(),iceberg(),deltaLake()— noINSTALL/LOADchain. - Federate to remote ClickHouse in one statement. (v0.5)
remoteSecure('cluster:9440', 'db.table', ...)joins local Parquet with a production ClickHouse cluster in one optimised plan. - Same SQL as your warehouse. Copy-paste ClickHouse production queries into the agent prompt — no dialect bridge.
Install
pip install chdb-mcp
Connect
Claude Desktop — add to ~/Library/Application Support/Claude/claude_desktop_config.json (macOS) or %APPDATA%\Claude\claude_desktop_config.json (Windows):
{ "mcpServers": { "chdb": { "command": "chdb-mcp" } } }
Cursor / VS Code — same JSON in ~/.cursor/mcp.json etc.; one-click badges land in v0.2.
Codex CLI / Claude Code / Copilot / Droid — use the cross-IDE bundle chdb-agent-plugin.
Tools (v0.1)
| Tool | Description |
|---|---|
query(sql, format) |
Run any read-only SQL on the in-process session |
list_databases() |
Enumerate visible databases |
list_tables(database) |
List tables in a database |
describe_table(database, table) |
Column types for a table |
query_file(path, sql, format) |
Query a Parquet/CSV/JSON file via the {file} placeholder |
get_sample_data(database, table, limit) |
First N rows of a table |
list_functions(pattern) |
List ClickHouse SQL functions (optional substring filter) |
Read-only by default — SET readonly=2 blocks INSERT/CREATE/DROP/ALTER while keeping file()/url()/s3() usable. Set CHDB_MCP_WRITE=1 to drop the guard. See Security model.
In query_file, {file} is replaced with file('path', 'format') before execution:
query_file(
path="/data/sales.parquet",
sql="SELECT region, sum(revenue) FROM {file} GROUP BY region",
format="Parquet",
)
Configuration
| Variable | Default | Effect |
|---|---|---|
CHDB_MCP_WRITE |
unset | If 1, allows INSERT/CREATE/DROP/ALTER |
CHDB_MCP_MAX_RESULT_BYTES |
1048576 |
Per-tool result cap. Enforced engine-side (max_result_bytes + result_overflow_mode='break') plus a final Python slice. |
CHDB_MCP_QUERY_TIMEOUT_SEC |
30 |
Wall-clock cap per query (chDB max_execution_time). 0 disables. |
CHDB_MCP_FILE_ALLOWLIST |
empty (unrestricted) | :-separated path prefixes. Opt-in isolation switch — when set, query_file() rejects paths outside the prefixes, and query() rejects external table functions (file/url/s3/remote/hdfs/mongodb/...). When unset, no filesystem gating happens — the host process is trusted. |
CHDB_MCP_SESSION_PATH |
empty | Persistent session directory (default: ephemeral) |
Security model
chDB is in-process. There is no privilege boundary between the MCP server and the host Python interpreter, so the server can't make stronger isolation guarantees than the host already gives it. The model below reflects that.
Trust tiers
- Default (no
CHDB_MCP_FILE_ALLOWLIST) — no filesystem gating.query()andquery_file()can reach anything the host process can reach (anyfile(),url(),s3(),remote()...). Appropriate when the agent is trusted, or when the surrounding host application enforces the security boundary itself. - Opt-in allowlist (
CHDB_MCP_FILE_ALLOWLIST=/data:/tmp/foo) — best-effort defense in depth:query_file()rejects paths whose resolved (symlink-followed) form isn't under any listed prefix.- Both
query()andquery_file()reject SQL containing any table function that isn't on the safe-by-construction list (numbers/values/view/merge/dictionary/generateRandom/...). The "known" set is snapshotted fromsystem.table_functionsat session start, so the gate stays in sync with whatever the running chDB build actually exposes — including new external-source variants (paimon*,prometheusQuery*,iceberg*Azure/S3/HDFS), RCE-class functions (executable,python), and*Clustersiblings, without a hand-maintained denylist that goes stale. - For
query_file(), the scan runs on the user SQL before the{file}placeholder substitution, so aUNION ALL SELECT … FROM file('/etc/passwd', …)smuggled into the query body is caught even though the explicit path is gated. - The scanner is comment- and string-aware (single-pass mask covering line comments, block comments, single-quoted strings with
''/\'/\\escapes), and it normalizes backtick- and double-quote-wrapped identifiers (`file`/"file") before matching so quoted function names can't bypass it. - This is not a sandbox: a determined caller can still try to exfiltrate via undiscovered functions, settings, or future chDB features. Strong enough for casual agent mistakes, not for adversarial input.
- Hard isolation — for adversarial input, wrap the server in OS-level confinement: macOS App Sandbox, Linux user namespaces / seccomp, or Docker with a read-only filesystem mount. Nothing at the MCP layer can substitute for this.
What's protected
- Accidental writes —
SET readonly=2is applied at session start.CHDB_MCP_WRITE=1lifts it. (Note: ClickHouse'sreadonly=2still permitsTEMPORARY TABLEwrites and runtimeSETchanges — by design, not a bug.) - Runaway result sizes —
CHDB_MCP_MAX_RESULT_BYTESis enforced engine-side (max_block_size+max_result_bytes+result_overflow_mode='break'), not just as a post-hoc string slice. Large queries no longer materialize multi-MiB in chDB before truncation. - Runaway wall-clock —
CHDB_MCP_QUERY_TIMEOUT_SEC(default 30s) caps each query via chDB'smax_execution_time. - SQL-identifier injection —
list_tables/describe_table/get_sample_dataarguments are whitelist-regex'd ([A-Za-z_][A-Za-z0-9_]*only) and backtick-quoted before interpolation. - SQL string-literal escape —
list_functions(pattern)andquery_file(path, format)arguments are passed throughquote_string, which escapes both single quotes ('→'') and backslashes (\→\\) so that ClickHouse's\'escape form cannot break out of the literal.
What's NOT protected
- SQL audit. Only the readonly guard — no allow/deny list of statements. Treat the agent as having full
SELECTaccess to anything chDB can reach (subject to the allowlist when set). - Setting tampering. Under
readonly=2, the agent can stillSET max_memory_usage = …to raise resource caps. Lock this down at the host or via OS-level resource limits if it matters. - Memory / CPU caps. chDB's
max_memory_usageapplies, but there's noulimit/cgroupsequivalent imposed by the MCP layer.
For agents acting on untrusted input, run in a throwaway container.
Roadmap
- v0.5 —
query_remote_clickhouse()federation tool - v1.0 —
attach_file(),dataframe_query()(zero-copyPython(df)), HTTP/SSE transport with Bearer auth,.mcpbbundle for Claude Desktop one-click install
Troubleshooting
macOS: "Server disconnected" in Claude Desktop
If ~/Library/Logs/Claude/mcp-server-chdb.log shows PermissionError: Operation not permitted on pyvenv.cfg, your venv sits under a TCC-protected directory (~/Downloads, ~/Documents, ~/Desktop) — Claude Desktop subprocesses can't read those paths.
Fix: install elsewhere. Recommended is uvx (zero-config, isolated under ~/.local/share/uv/):
{ "mcpServers": { "chdb": { "command": "uvx", "args": ["chdb-mcp"] } } }
Or build a venv yourself under ~/.local/share/chdb-mcp/.venv and point Claude Desktop at its chdb-mcp binary.
query_file returns "path is not under any prefix"
The allowlist resolves symlinks on both sides (so /tmp matches /private/tmp on macOS). If you still hit this, check the resolved form printed in the error against python -c "from pathlib import Path; print(Path('YOUR_PATH').resolve())".
"Cannot execute query in readonly mode"
SET readonly=2 blocks DDL/DML by design. Rewrite as a pure SELECT, or restart with CHDB_MCP_WRITE=1.
Per-server logs
~/Library/Logs/Claude/mcp-server-chdb.log # startup diagnostics + stderr
~/Library/Logs/Claude/mcp.log # all servers' JSON-RPC traffic
Development
git clone https://github.com/chdb-io/chdb-mcp && cd chdb-mcp
pip install -e ".[dev]"
pytest && ruff check src tests
License
Apache 2.0 — see LICENSE.
推荐服务器
Baidu Map
百度地图核心API现已全面兼容MCP协议,是国内首家兼容MCP协议的地图服务商。
Playwright MCP Server
一个模型上下文协议服务器,它使大型语言模型能够通过结构化的可访问性快照与网页进行交互,而无需视觉模型或屏幕截图。
Magic Component Platform (MCP)
一个由人工智能驱动的工具,可以从自然语言描述生成现代化的用户界面组件,并与流行的集成开发环境(IDE)集成,从而简化用户界面开发流程。
Audiense Insights MCP Server
通过模型上下文协议启用与 Audiense Insights 账户的交互,从而促进营销洞察和受众数据的提取和分析,包括人口统计信息、行为和影响者互动。
VeyraX
一个单一的 MCP 工具,连接你所有喜爱的工具:Gmail、日历以及其他 40 多个工具。
graphlit-mcp-server
模型上下文协议 (MCP) 服务器实现了 MCP 客户端与 Graphlit 服务之间的集成。 除了网络爬取之外,还可以将任何内容(从 Slack 到 Gmail 再到播客订阅源)导入到 Graphlit 项目中,然后从 MCP 客户端检索相关内容。
Kagi MCP Server
一个 MCP 服务器,集成了 Kagi 搜索功能和 Claude AI,使 Claude 能够在回答需要最新信息的问题时执行实时网络搜索。
e2b-mcp-server
使用 MCP 通过 e2b 运行代码。
Neon MCP Server
用于与 Neon 管理 API 和数据库交互的 MCP 服务器
Exa MCP Server
模型上下文协议(MCP)服务器允许像 Claude 这样的 AI 助手使用 Exa AI 搜索 API 进行网络搜索。这种设置允许 AI 模型以安全和受控的方式获取实时的网络信息。