MCP 服务器

database-mcp

Automatically discovers database schema, performs data quality checks on tables and columns, and generates natural-language root cause analysis reports using Ollama LLM.

README

database-mcp

A generic MCP server that connects to a DuckDB database, auto-inspects the schema, runs data quality checks across every table and column, and uses a local Ollama LLM to generate a natural-language Root Cause Analysis (RCA) report.

No table names or column names are ever hardcoded. Everything is discovered at runtime from the connection config alone.

Features

Auto-discovery — pass connection details, the server lists all tables; pick one and it figures out every column and type
Type-aware checks — numeric columns get distribution stats + Z-score thresholds; VARCHAR columns get cardinality + top values; TIMESTAMP columns get gap detection
Ollama ReAct loop — llama3.2 (default) iteratively calls tools to drill down, then writes a plain-English RCA report
MCP tools — usable directly from any MCP client (Claude, etc.)
REST API — thin FastAPI layer for programmatic access

Stack

Layer	Tool
MCP framework	FastMCP
Database	DuckDB
LLM	Ollama (`llama3.2` default, `mistral:7b` optional)
REST API	FastAPI + Uvicorn
Tests	Plain Python scripts (`python tests/test_*.py`)

Installation

git clone https://github.com/hargurjeet/database-mcp.git
cd database-mcp

python -m venv .venv
source .venv/bin/activate        # Windows: .venv\Scripts\activate

pip install -r requirements.txt
cp .env.example .env

Ollama must be running locally:

ollama pull llama3.2
ollama serve

Configuration

Edit .env:

OLLAMA_MODEL=llama3.2          # or mistral:7b
OLLAMA_BASE_URL=http://localhost:11434
REPORTS_PATH=./data/reports/

Usage

1. MCP server

python mcp_server/server.py

Available tools:

Tool	What it does
`tool_list_tables`	Lists all tables — call this first
`tool_get_schema`	Columns + types for a table
`tool_check_null_rates`	Null % per column
`tool_get_row_count`	Total row count
`tool_get_distribution_stats`	Mean / std / min / max for a numeric column
`tool_get_cardinality`	Distinct count + top values for a VARCHAR column
`tool_detect_timestamp_gaps`	Gap analysis for a TIMESTAMP column
`tool_run_full_check`	Runs all applicable checks — returns full summary

All tools accept a config_json string:

{"db_type": "duckdb", "db_path": "./data/warehouse.db", "table": "trips"}

table is only required for table-specific tools. tool_list_tables needs only the connection fields.

2. REST API

uvicorn api.main:app --reload
# Swagger UI at http://localhost:8000/docs

Method	Endpoint	Body / Params	What it does
`GET`	`/tables`	`?db_path=./data/warehouse.db`	List all tables
`POST`	`/check/{table}`	`{"db_type":"duckdb","db_path":"..."}`	Full quality check, returns JSON
`POST`	`/rca/{table}`	`{"db_type":"duckdb","db_path":"..."}`	Full check + Ollama RCA, saves Markdown report
`GET`	`/report/{table}`	—	Retrieve last saved RCA report

Example:

# List tables
curl "http://localhost:8000/tables?db_path=./data/warehouse.db"

# Run full quality check
curl -X POST http://localhost:8000/check/trips \
  -H "Content-Type: application/json" \
  -d '{"db_type":"duckdb","db_path":"./data/warehouse.db"}'

# Generate RCA report (requires Ollama)
curl -X POST http://localhost:8000/rca/trips \
  -H "Content-Type: application/json" \
  -d '{"db_type":"duckdb","db_path":"./data/warehouse.db"}'

3. Run the agent directly

python agent/dispatcher.py '{
  "db_type": "duckdb",
  "db_path": "./data/warehouse.db",
  "table": "trips"
}'

Report is printed to stdout and saved to data/reports/trips_rca.md.

Tests

python tests/test_null_tools.py
python tests/test_schema_tools.py
python tests/test_distribution_tools.py
python tests/test_volume_tools.py
python tests/test_cardinality_tools.py
python tests/test_timestamp_tools.py
python tests/test_api.py

23 tests total. All use in-memory DuckDB — no external dependencies required.

Project structure

database-mcp/
├── api/
│   ├── main.py              # FastAPI app
│   └── routes.py            # Route handlers
│
├── mcp_server/
│   ├── server.py            # FastMCP entrypoint + tool registration
│   ├── introspector.py      # Schema → check plan mapping
│   ├── connectors/
│   │   ├── base.py          # Abstract connector interface
│   │   └── duckdb_connector.py
│   └── tools/
│       ├── schema_tools.py
│       ├── null_tools.py
│       ├── volume_tools.py
│       ├── distribution_tools.py
│       ├── cardinality_tools.py
│       └── timestamp_tools.py
│
├── agent/
│   ├── dispatcher.py        # Ollama ReAct loop
│   ├── ollama_client.py
│   └── prompts.py
│
├── data/reports/            # Saved RCA reports
├── docs/session_log.md      # Full development history
└── tests/

Roadmap

Phase	Status
Phase 1 — DuckDB + core tools + Ollama loop	Complete
Phase 2 — PostgreSQL / MySQL connectors	Skipped
Phase 3 — Prefect scheduled scans	Skipped
Phase 4 — REST API	Complete