spatial-lakehouse-mcp
Iceberg-native geospatial MCP server powered by DuckDB.
Provides 18 tools for catalog discovery, spatial queries, analysis, and data management over an Apache Iceberg lakehouse. An LLM agent connects via the Model Context Protocol (MCP) and can explore schemas, run spatial SQL, perform point-in-polygon aggregation, export GeoJSON, and more — all through a single DuckDB connection.
Architecture
MCP Client (LLM)
↕ (Streamable HTTP, port 8082)
spatial-lakehouse-mcp
↕
DuckDB (in-process, :memory:)
├── iceberg extension → LakeKeeper REST Catalog (port 8181)
├── httpfs extension → Garage S3 (port 3900)
└── spatial extension → ST_* geospatial functions
Key design decision: DuckDB v1.4+ natively supports ATTACH to Iceberg REST Catalogs. Once attached, the catalog appears as a regular DuckDB database — SHOW ALL TABLES, DESCRIBE, iceberg_snapshots(), time travel, and full SQL (including spatial functions) all work through a single connection. No pyiceberg dependency.
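As a sketch, the attachment sequence looks roughly like this (endpoint, credentials, and warehouse name are the local-dev defaults from .env.example; exact option names follow the DuckDB httpfs and iceberg extension documentation):

```sql
-- Load extensions, register an S3 secret for Garage, then attach the catalog.
INSTALL httpfs;  LOAD httpfs;
INSTALL iceberg; LOAD iceberg;
INSTALL spatial; LOAD spatial;

CREATE SECRET garage (
    TYPE s3,
    KEY_ID 'your_garage_key',
    SECRET 'your_garage_secret',
    ENDPOINT 'localhost:3900',
    URL_STYLE 'path',
    USE_SSL false
);

-- Once attached, the catalog behaves like a regular DuckDB database.
ATTACH 'lakehouse' AS lakehouse (
    TYPE iceberg,
    ENDPOINT 'http://localhost:8181'
);

SHOW ALL TABLES;
```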
Infrastructure Stack
| Component | Technology | Default Port |
|---|---|---|
| Object Storage | Garage (S3-compatible) | 3900 |
| Iceberg Catalog | LakeKeeper (REST) | 8181 |
| Catalog Metadata | PostgreSQL | 5432 |
| MCP Server | This project (FastMCP + DuckDB) | 8082 |
Tools (18 total)
Catalog Discovery
| Tool | Description |
|---|---|
| list_namespaces | List schemas in the Iceberg catalog |
| list_tables | List tables, optionally filtered by namespace |
| describe_table | Column names, types, geometry detection |
| table_snapshots | Snapshot history for time-travel queries |
| search_tables | Search tables by name, column, or geometry presence |
Spatial Queries
| Tool | Description |
|---|---|
| query | Read-only SQL with spatial functions |
| spatial_filter | Structured spatial predicates (intersects, within, bbox, within_distance) |
| nearest_features | K-nearest-neighbor search |
| get_bbox | Bounding box / spatial extent |
| time_travel_query | Query at a specific Iceberg snapshot or timestamp |
| multi_table_query | Cross-table analytics with safety rails |
Spatial Analysis
| Tool | Description |
|---|---|
| spatial_join | Join two tables on spatial predicates (intersects, contains, dwithin, etc.) |
| aggregate_within | Point-in-polygon aggregation (count, sum, avg, min, max, stddev) |
| buffer_analysis | Buffer zones with optional dissolve (union) |
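For instance, a point-in-polygon aggregation of the kind aggregate_within performs corresponds to SQL along these lines (the zones and points tables and their columns are hypothetical — substitute your own):

```sql
-- Count points per polygon and average a value column (illustrative schema).
SELECT z.zone_name,
       count(p.id)  AS point_count,
       avg(p.value) AS avg_value
FROM lakehouse.colorado.zones z
LEFT JOIN lakehouse.colorado.points p
       ON ST_Intersects(z.geometry, p.geometry)
GROUP BY z.zone_name;
```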
Data Management
| Tool | Description |
|---|---|
| sample_data | Preview rows from a table |
| table_stats | Row counts, column stats, geometry summary |
| export_geojson | Export as GeoJSON FeatureCollection |
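Conceptually, export_geojson wraps query results in a FeatureCollection. A minimal stand-alone sketch — the row shape here (a GeoJSON geometry string, e.g. from DuckDB's ST_AsGeoJSON, plus a dict of properties) is an assumption, not the server's actual internal format:

```python
import json

def to_feature_collection(rows):
    """Wrap (geometry_geojson, properties) rows in a GeoJSON FeatureCollection."""
    features = [
        {
            "type": "Feature",
            "geometry": json.loads(geom),  # geometry arrives as a GeoJSON string
            "properties": props,
        }
        for geom, props in rows
    ]
    return {"type": "FeatureCollection", "features": features}

rows = [('{"type": "Point", "coordinates": [-104.99, 39.74]}', {"name": "Denver"})]
fc = to_feature_collection(rows)
print(fc["features"][0]["geometry"]["coordinates"])  # [-104.99, 39.74]
```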
System
| Tool | Description |
|---|---|
| health_check | Connection, extension, and catalog status |
Project Structure
spatial-lakehouse-mcp/
├── pyproject.toml
├── Dockerfile
├── .env.example
├── validate_stack.py # Pre-flight infrastructure validation
├── src/
│ └── spatial_lakehouse_mcp/
│ ├── __init__.py
│ ├── server.py # FastMCP server + 18 tool definitions
│ ├── config.py # Pydantic settings (SLM_ env prefix)
│ ├── engine.py # DuckDB connection + catalog attachment
│ └── validators.py # SQL safety + input validation
└── tests/
├── conftest.py # Shared fixtures (local DuckDB test data)
└── test_tools.py # 21 tests across all phases
Quick Start (Local Development)
Prerequisites
- Python 3.11+
- A running lakehouse stack (LakeKeeper + Garage + PostgreSQL)
1. Install
git clone https://github.com/aoneil42/Spatial-Lakehouse-MCP.git
cd Spatial-Lakehouse-MCP
python -m venv .venv
source .venv/bin/activate
pip install -e ".[dev]"
2. Configure
cp .env.example .env
Edit .env with your infrastructure credentials:
# Required — Garage S3 credentials
GARAGE_KEY_ID=your_garage_key
GARAGE_SECRET_KEY=your_garage_secret
SLM_S3_ENDPOINT=localhost:3900
# Required — LakeKeeper catalog
SLM_CATALOG_URI=http://localhost:8181
SLM_CATALOG_WAREHOUSE=lakehouse
# Auth (leave empty for LakeKeeper allowall dev mode)
SLM_CATALOG_TOKEN=
3. Validate Stack Connectivity
Before running the server, verify the MCP server can reach your infrastructure:
python validate_stack.py
Expected output:
DuckDB Python version: 1.4.4
[1/3 Garage S3 + httpfs] PASS
5 files found, first file has 4096 rows
[2/3 LakeKeeper ATTACH] PASS
ATTACH OK, 7 table(s) found, lakehouse.colorado.lines has 30000 rows
[3/3 Extension coexistence] PASS
All 3 extensions loaded, ST_Point → POINT (-104.99 39.74)
Results: 3 passed, 0 failed
4. Run Tests
pytest -v
Tests run against local in-memory DuckDB tables (no infrastructure required).
5. Start the Server
python -m spatial_lakehouse_mcp.server
The server starts on http://0.0.0.0:8082 using Streamable HTTP transport.
Docker Deployment
docker build -t spatial-lakehouse-mcp .
docker run --env-file .env -p 8082:8082 spatial-lakehouse-mcp
Or add to an existing docker-compose stack:
services:
mcp-server:
build: .
ports:
- "8082:8082"
environment:
GARAGE_KEY_ID: "${GARAGE_KEY_ID}"
GARAGE_SECRET_KEY: "${GARAGE_SECRET_KEY}"
SLM_S3_ENDPOINT: garage:3900
SLM_CATALOG_URI: http://lakekeeper:8181
SLM_CATALOG_WAREHOUSE: lakehouse
depends_on:
lakekeeper:
condition: service_healthy
Configuration Reference
All environment variables use the SLM_ prefix (Spatial Lakehouse MCP), except Garage credentials which use GARAGE_ for compatibility.
| Variable | Default | Description |
|---|---|---|
| SLM_CATALOG_URI | http://localhost:8181 | LakeKeeper REST endpoint |
| SLM_CATALOG_WAREHOUSE | warehouse | Warehouse name in LakeKeeper |
| SLM_CATALOG_ALIAS | lakehouse | DuckDB alias for the attached catalog |
| SLM_CATALOG_TOKEN | (empty) | Bearer token for LakeKeeper auth |
| SLM_CATALOG_CLIENT_ID | (empty) | OAuth2 client ID |
| SLM_CATALOG_CLIENT_SECRET | (empty) | OAuth2 client secret |
| SLM_CATALOG_OAUTH2_SCOPE | (empty) | OAuth2 scope |
| SLM_CATALOG_OAUTH2_SERVER_URI | (empty) | OAuth2 token endpoint |
| GARAGE_KEY_ID | (empty) | Garage S3 access key ID |
| GARAGE_SECRET_KEY | (empty) | Garage S3 secret access key |
| SLM_S3_ENDPOINT | localhost:3900 | S3 endpoint (host:port, no scheme) |
| SLM_S3_REGION | garage | S3 region |
| SLM_S3_USE_SSL | false | Use HTTPS for S3 |
| SLM_S3_URL_STYLE | path | S3 URL style (path or vhost) |
| SLM_MAX_RESULT_ROWS | 100 | Max rows returned per query |
| SLM_QUERY_TIMEOUT_SECONDS | 30 | Query timeout |
| SLM_SERVER_PORT | 8082 | MCP server port |
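The project reads these through pydantic settings; as a dependency-free sketch of the same SLM_-prefixed convention (a stand-in, not the project's actual config.py):

```python
import os
from dataclasses import dataclass

@dataclass
class Settings:
    """Minimal stand-in for the pydantic settings with the SLM_ env prefix."""
    catalog_uri: str = "http://localhost:8181"
    catalog_warehouse: str = "warehouse"
    s3_endpoint: str = "localhost:3900"
    max_result_rows: int = 100

    @classmethod
    def from_env(cls, env=os.environ):
        # Fall back to the dataclass defaults when a variable is unset.
        return cls(
            catalog_uri=env.get("SLM_CATALOG_URI", cls.catalog_uri),
            catalog_warehouse=env.get("SLM_CATALOG_WAREHOUSE", cls.catalog_warehouse),
            s3_endpoint=env.get("SLM_S3_ENDPOINT", cls.s3_endpoint),
            max_result_rows=int(env.get("SLM_MAX_RESULT_ROWS", cls.max_result_rows)),
        )

settings = Settings.from_env({"SLM_MAX_RESULT_ROWS": "50"})
print(settings.max_result_rows)  # 50
```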
Important Notes
ACCESS_DELEGATION_MODE
When running the MCP server outside Docker (on the host), the ATTACH statement uses ACCESS_DELEGATION_MODE 'none'. This bypasses LakeKeeper's remote signing, which otherwise returns S3 URLs containing Docker-internal hostnames (e.g., garage:3900 instead of localhost:3900). The server's local S3 secret provides the correct host-accessible endpoint.
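Illustratively, the host-side ATTACH looks like this (alias and endpoint are the local-dev defaults; consult the DuckDB iceberg extension docs for the option's exact accepted values):

```sql
ATTACH 'lakehouse' AS lakehouse (
    TYPE iceberg,
    ENDPOINT 'http://localhost:8181',
    ACCESS_DELEGATION_MODE 'none'
);
```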
SQL Safety
All user-supplied queries are validated as read-only:
- Only SELECT and WITH (CTE) queries are allowed
- Dangerous keywords (DROP, DELETE, INSERT, ALTER, etc.) are rejected
- Multi-statement queries (; in the middle) are blocked
- Table/column identifiers are validated against injection
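A minimal sketch of this kind of read-only gate (the project's actual validators.py may differ in keyword list and edge-case handling):

```python
import re

# Keywords that indicate a write or DDL statement (illustrative, not exhaustive).
BLOCKED = {"drop", "delete", "insert", "update", "alter", "create",
           "truncate", "attach", "detach", "copy", "grant"}

def is_read_only(sql: str) -> bool:
    """Return True only for single-statement SELECT/WITH queries."""
    stripped = sql.strip().rstrip(";")
    if ";" in stripped:  # a semicolon mid-query means multiple statements
        return False
    first = stripped.split(None, 1)[0].lower() if stripped else ""
    if first not in {"select", "with"}:
        return False
    words = set(re.findall(r"[a-z_]+", stripped.lower()))
    return not (words & BLOCKED)

print(is_read_only("SELECT * FROM lakehouse.colorado.lines"))  # True
print(is_read_only("DROP TABLE t"))                            # False
print(is_read_only("SELECT 1; DELETE FROM t"))                 # False
```

Keyword blocklists are a heuristic, not a security boundary on their own; pairing them with DuckDB's own read-only/session settings is the safer design.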