MCP-Grounded 🩺

A multi-agent pipeline for medical image classification with verification-aware abstention, coordinated via the Model Context Protocol (MCP).

"Instead of always guessing, the AI says — I'm not confident enough, I'll skip this one."

What is this?

MCP-Grounded is a 4-agent AI pipeline that classifies skin lesion images from the HAM10000 dataset. Its distinguishing feature is that the final agent can abstain when its confidence falls below a threshold τ, trading coverage for higher accuracy on the cases it does answer, which makes it safer for medical use.

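Agents 2 to 4 (retrieve, rerank, classify_and_verify) are real MCP tools, not just described as such. Agent 1 (BiomedCLIP embedding extraction) runs upstream of the MCP server.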

Pipeline

Skin lesion image
       │
       ▼
┌─────────────────┐
│  BiomedCLIP     │  Agent 1: Extract 512-dim embedding
│  (Extract)      │
└────────┬────────┘
         │
         ▼  ┌─────────────────────────────────────────┐
            │              MCP Server                  │
            │                                          │
            │  ┌──────────┐       ┌──────────┐        │
            │  │ Retrieve │──────▶│  Rerank  │        │
            │  │ Agent 2  │       │  Agent 3 │        │
            │  └──────────┘       └────┬─────┘        │
            │                          │               │
            │                 ┌────────▼──────────┐   │
            │                 │  Verify / Abstain  │   │
            │                 │     Agent 4        │   │
            │                 └────────┬───────────┘   │
            └────────────────────────── │ ─────────────┘
                                        │
                          ┌─────────────┴─────────────┐
                          │                           │
                    conf ≥ τ                     conf < τ
                          │                           │
                       PREDICT                    ABSTAIN
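
The final branch of the diagram can be sketched in a few lines. This is a minimal illustration, assuming that "conf" is the highest class probability produced by Agent 4; the function name `verify_or_abstain` and the example probability vectors are hypothetical and not taken from the repository.

```python
from typing import Optional, Sequence

# The seven HAM10000 categories, in the order listed in the Dataset section.
CLASSES = ["akiec", "bcc", "bkl", "df", "mel", "nv", "vasc"]


def verify_or_abstain(probs: Sequence[float], tau: float) -> Optional[str]:
    """Return the top class if its probability reaches tau, else None (abstain)."""
    best = max(range(len(probs)), key=lambda i: probs[i])
    return CLASSES[best] if probs[best] >= tau else None


# Hypothetical probability vectors, one confident and one uncertain.
confident = [0.02, 0.03, 0.05, 0.01, 0.10, 0.78, 0.01]
uncertain = [0.05, 0.10, 0.20, 0.05, 0.30, 0.25, 0.05]

print(verify_or_abstain(confident, tau=0.7))  # nv
print(verify_or_abstain(uncertain, tau=0.7))  # None
```

Returning `None` rather than a low-confidence label keeps the abstention explicit, so a caller cannot mistake a skipped case for a prediction.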

Results

Retrieval Quality

Metric	Value
Recall@1	77.9%
Recall@5	93.5%
Recall@10	96.3%
Recall@50	99.2%
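
For readers who want to reproduce this kind of table, the sketch below shows one common way to compute Recall@K. It assumes cosine similarity over the embeddings and counts a query as a hit when any of its K nearest index items shares its label; the README does not state the exact definition used, and the function name and toy data are hypothetical.

```python
import numpy as np


def recall_at_k(query_emb, query_labels, index_emb, index_labels, k):
    """Fraction of queries whose top-k cosine neighbours include a same-label item."""
    q = query_emb / np.linalg.norm(query_emb, axis=1, keepdims=True)
    x = index_emb / np.linalg.norm(index_emb, axis=1, keepdims=True)
    sims = q @ x.T                               # shape (n_query, n_index)
    topk = np.argsort(-sims, axis=1)[:, :k]      # indices of the k most similar items
    hits = (index_labels[topk] == query_labels[:, None]).any(axis=1)
    return float(hits.mean())


# Toy 2-D data standing in for the 512-dim embeddings.
index_emb = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
index_labels = np.array([0, 1, 1])
query_emb = np.array([[0.9, 0.1], [0.6, 0.8]])
query_labels = np.array([0, 0])

for k in (1, 3):
    print(k, recall_at_k(query_emb, query_labels, index_emb, index_labels, k))
```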

Verification-Aware Abstention (key result)

Threshold τ	Coverage	Selective Accuracy
0.0 (answer all)	100.0%	67.0%
0.5	96.9%	69.0%
0.6	83.4%	77.0%
0.7	52.0%	91.3%
0.8	4.9%	98.6%

At τ = 0.7, selective accuracy rises from 67.0% to 91.3%, a gain of about 24 percentage points over the no-abstention baseline, while coverage falls to 52.0%.
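
The two columns of the table can be computed as follows. This is a sketch under the usual definitions: coverage is the fraction of cases answered, and selective accuracy is the accuracy on those answered cases only. The function name and the toy confidence values are hypothetical.

```python
import numpy as np


def selective_metrics(confidence, correct, tau):
    """Coverage and selective accuracy when answering only if confidence >= tau."""
    answered = confidence >= tau
    coverage = float(answered.mean())
    # Selective accuracy is undefined when every case is skipped.
    sel_acc = float(correct[answered].mean()) if answered.any() else float("nan")
    return coverage, sel_acc


# Toy data: per-case confidence and whether the prediction was right.
confidence = np.array([0.95, 0.85, 0.75, 0.65, 0.55, 0.45])
correct = np.array([True, True, True, False, True, False])

for tau in (0.0, 0.5, 0.7):
    cov, acc = selective_metrics(confidence, correct, tau)
    print(f"tau={tau:.1f}  coverage={cov:.3f}  selective_acc={acc:.3f}")
```

Sweeping τ over a fine grid and plotting risk (1 minus selective accuracy) against coverage gives a risk-coverage curve.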

Risk–Coverage Curve

[Figure: risk-coverage curve (risk_coverage.png)]

As the confidence threshold rises, coverage drops but selective accuracy climbs sharply, showing that abstention lets the system trade coverage for reliability on the cases it answers.

Calibration

Metric	Value
ECE before temperature scaling	0.191
ECE after temperature scaling	0.185
Learned temperature T	0.944
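
The quantities in this table can be illustrated with a short sketch. It assumes the common convention of dividing logits by T before the softmax, an ECE estimate with 15 equal-width confidence bins, and a grid search over T that minimises negative log-likelihood on held-out data. None of these choices is confirmed by the README, and the grid search is a stand-in for whatever optimiser the experiments notebook actually uses.

```python
import numpy as np


def softmax(logits, T=1.0):
    """Row-wise softmax of logits / T."""
    z = logits / T
    z = z - z.max(axis=1, keepdims=True)  # subtract the row max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)


def expected_calibration_error(probs, labels, n_bins=15):
    """Weighted mean of |accuracy - confidence| over equal-width confidence bins."""
    conf = probs.max(axis=1)
    correct = probs.argmax(axis=1) == labels
    edges = np.linspace(0.0, 1.0, n_bins + 1)
    ece = 0.0
    for lo, hi in zip(edges[:-1], edges[1:]):
        in_bin = (conf > lo) & (conf <= hi)
        if in_bin.any():
            gap = abs(correct[in_bin].mean() - conf[in_bin].mean())
            ece += in_bin.mean() * gap
    return float(ece)


def fit_temperature(logits, labels, grid=None):
    """Pick the T on a grid that minimises mean negative log-likelihood."""
    if grid is None:
        grid = np.linspace(0.5, 3.0, 251)

    def nll(T):
        p = softmax(logits, T)
        return -np.log(p[np.arange(len(labels)), labels] + 1e-12).mean()

    return float(min(grid, key=nll))


# Toy example: four predictions at 0.9 confidence, only half of them correct.
toy_probs = np.array([[0.9, 0.1]] * 4)
toy_labels = np.array([0, 0, 1, 1])
print(expected_calibration_error(toy_probs, toy_labels))
```

With this convention, a fitted T below 1 sharpens the probabilities and a T above 1 softens them.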

Dataset

HAM10000 — 10,015 dermoscopic images across 7 skin lesion categories:

akiec · bcc · bkl · df · mel · nv · vasc

Split: 70% train / 15% validation / 15% test (stratified).
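
A stratified 70/15/15 split keeps each class's share roughly equal across the three subsets. The sketch below does this with NumPy alone by shuffling and slicing each class separately; it is an illustration, not the repository's code, and the function name, seed, and toy labels are hypothetical. scikit-learn's `train_test_split` with its `stratify` argument is the more usual route.

```python
import numpy as np


def stratified_split(labels, fractions=(0.70, 0.15, 0.15), seed=0):
    """Return train/val/test index arrays with per-class proportions preserved."""
    rng = np.random.default_rng(seed)
    train, val, test = [], [], []
    for cls in np.unique(labels):
        idx = rng.permutation(np.flatnonzero(labels == cls))
        n_train = int(round(fractions[0] * len(idx)))
        n_val = int(round(fractions[1] * len(idx)))
        train.extend(idx[:n_train])
        val.extend(idx[n_train:n_train + n_val])
        test.extend(idx[n_train + n_val:])  # the remainder goes to the test set
    return np.array(train), np.array(val), np.array(test)


# Toy labels: 7 classes with 100 samples each.
labels = np.repeat(np.arange(7), 100)
train_idx, val_idx, test_idx = stratified_split(labels)
print(len(train_idx), len(val_idx), len(test_idx))  # 490 105 105
```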

How to Run

Step 1 — Generate embeddings (Google Colab, GPU)

Open notebook1_embeddings.py in Google Colab with a T4 GPU runtime and run all cells from top to bottom. The notebook downloads HAM10000 and writes embeddings.npz.

Step 2 — Run experiments (Google Colab)

Open notebook2_experiments.py in a new Colab notebook, upload embeddings.npz, and run all cells. The run produces:

All result tables (Recall@K, accuracy, calibration, abstention)
risk_coverage.png
clf_weights.npz
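
The README does not document which arrays embeddings.npz and clf_weights.npz contain. A small helper like the one below can list them before they are used downstream. The helper name, and the `embeddings` and `labels` keys in the demo file, are hypothetical; only `np.load`, `np.savez`, and the archive's `files` attribute are standard NumPy.

```python
import os
import tempfile

import numpy as np


def describe_npz(path):
    """Map each array name stored in an .npz archive to its (shape, dtype)."""
    with np.load(path) as archive:
        return {name: (archive[name].shape, str(archive[name].dtype))
                for name in archive.files}


# Demo on a throwaway file with made-up keys, not the real embeddings.npz.
demo_path = os.path.join(tempfile.mkdtemp(), "demo.npz")
np.savez(demo_path,
         embeddings=np.zeros((4, 512), dtype=np.float32),
         labels=np.arange(4))
print(describe_npz(demo_path))
```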

Step 3 — Run the MCP server (local)

pip install "mcp[cli]" numpy torch
python mcp_grounded_server.py

This starts a live MCP server exposing three callable tools: retrieve, rerank, and classify_and_verify.
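
For orientation, here is a minimal sketch of how a function can be registered as a tool with FastMCP from the `mcp` Python SDK. It is not the contents of mcp_grounded_server.py. The tiny in-memory index, the `retrieve` signature, and the server name are all hypothetical, and the real tools presumably operate on the BiomedCLIP embeddings instead.

```python
import math

# Hypothetical in-memory index of (image_id, label, embedding) entries.
INDEX = [
    ("img_a", "nv", [1.0, 0.0]),
    ("img_b", "mel", [0.0, 1.0]),
    ("img_c", "nv", [0.8, 0.6]),
]


def _cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.hypot(*a) * math.hypot(*b))


def retrieve(embedding: list[float], k: int = 2) -> list[dict]:
    """Return the k index entries most similar to the query embedding."""
    scored = [
        {"image_id": image_id, "label": label, "score": _cosine(embedding, vec)}
        for image_id, label, vec in INDEX
    ]
    return sorted(scored, key=lambda r: r["score"], reverse=True)[:k]


def build_server():
    """Create a FastMCP server with `retrieve` registered as a tool."""
    # Imported here so the retrieval logic above can be used without `mcp` installed.
    from mcp.server.fastmcp import FastMCP

    server = FastMCP("mcp-grounded-sketch")
    server.tool()(retrieve)
    return server


print(retrieve([1.0, 0.0], k=2))
```

Calling `build_server().run()` would then serve the tool, over stdio by default. Keeping the tool body a plain function makes it easy to test without starting a server.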

Requirements

mcp[cli]
numpy
torch
open_clip_torch
scikit-learn
pandas
pillow
tqdm
matplotlib

See requirements.txt.

File Structure

mcp_grounded/
├── notebook1_embeddings.py     # Colab: download HAM10000, extract BiomedCLIP embeddings
├── notebook2_experiments.py    # Colab: retrieval, calibration, abstention experiments
├── mcp_grounded_server.py      # Local: FastMCP server exposing retrieve, rerank, classify_and_verify as tools
├── risk_coverage.png           # Figure 2: risk-coverage curve
├── requirements.txt
└── README.md

Citation

If you use this work, please cite:

@inproceedings{mcpgrounded2025,
  title     = {MCP-Grounded: A Multi-Agent Pipeline with Verification-Aware Abstention for Medical Image Classification},
  author    = {[Your Name]},
  booktitle = {[Conference Name]},
  year      = {2025}
}