blackwall-mcp

A pre-action risk gate for AI agents. Your agent calls the forecast tool before any irreversible action (send email, run SQL, make a payment, delete a file) and gets a risk score (0–100) and a GO / CAUTION / STOP verdict in a few seconds.

A guardrail for AI agents, delivered as an MCP server. Your agent calls one tool, forecast, before any irreversible action (send email, move money, run SQL, delete data, post content). It gets back a risk score (0–100), a reversibility class, a GO / CAUTION / STOP recommendation, and named red flags, typically within 4–8 seconds.

Works in any MCP host: Claude Desktop, Claude Code, Cursor, Windsurf, and any agent framework with MCP support.

The wall between your agent and disaster. A BLUETIER product.

1. Get an API key

Sign up for free at https://blackwalltier.com, then go to Dashboard → API keys → Create key. The free tier allows about 100 forecasts per month and requires no card. Your key looks like bw_live_….

2. Add the server to your MCP host

Claude Desktop

Edit claude_desktop_config.json (Settings → Developer → Edit Config):

{
  "mcpServers": {
    "blackwall": {
      "command": "npx",
      "args": ["-y", "blackwall-mcp"],
      "env": { "BLACKWALL_API_KEY": "bw_live_your_key_here" }
    }
  }
}

Restart Claude Desktop. You'll see a forecast tool available.

Cursor

Settings → MCP → Add new global MCP server, then in mcp.json:

{
  "mcpServers": {
    "blackwall": {
      "command": "npx",
      "args": ["-y", "blackwall-mcp"],
      "env": { "BLACKWALL_API_KEY": "bw_live_your_key_here" }
    }
  }
}

Claude Code

claude mcp add blackwall -e BLACKWALL_API_KEY=bw_live_your_key_here -- npx -y blackwall-mcp

Run locally (any host / testing)

BLACKWALL_API_KEY=bw_live_your_key_here npx -y blackwall-mcp

3. Use it

Once the server is added, instruct your agent: "Before any irreversible action, call the forecast tool and stop if it returns STOP." With that instruction in place, the model is expected to call the tool on its own whenever it is about to do something risky.

The `forecast` tool

| Parameter | Type | Required | Description |
| --- | --- | --- | --- |
| `action` | string | ✅ | The action type, e.g. `send_email`, `make_payment`, `run_sql`, `delete_file`, `post_content` |
| `inputs` | object | ✅ | Concrete parameters: recipient, `amount_usd`, SQL `statement`, file path, message body, URL, etc. |
| `context` | object | — | Optional: `{ agent_role, user_intent, environment }` |
| `depth` | `standard` \| `deep` | — | Analysis depth. `standard` is the default. |

Returns: recommendation (GO/CAUTION/STOP), risk_score (0–100), reversibility (class + rollback cost), gate (proceed/confirm/human-required), confidence, red_flags[], predicted_result, alternative_actions[].
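
To make the argument shape concrete, here is a minimal Python sketch that assembles and checks the arguments for a forecast call before they are handed to an MCP client. The helper name build_forecast_args is hypothetical and not part of blackwall-mcp; only the four parameter names and the two depth values come from the parameter list above, and the example inputs are illustrative.

```python
from typing import Any, Optional

# Depth values listed in the parameter table.
ALLOWED_DEPTHS = ("standard", "deep")


def build_forecast_args(
    action: str,
    inputs: dict[str, Any],
    context: Optional[dict[str, Any]] = None,
    depth: str = "standard",
) -> dict[str, Any]:
    """Assemble arguments for a `forecast` tool call (hypothetical helper)."""
    if not action:
        raise ValueError("action is required, e.g. 'run_sql'")
    if not isinstance(inputs, dict):
        raise ValueError("inputs is required and must be an object")
    if depth not in ALLOWED_DEPTHS:
        raise ValueError(f"depth must be one of {ALLOWED_DEPTHS}")

    # Assumption: sending the default depth explicitly is harmless.
    args: dict[str, Any] = {"action": action, "inputs": inputs, "depth": depth}
    if context is not None:
        # context is optional, so it is only included when provided.
        args["context"] = context
    return args


# Illustrative call: the agent is about to run an unscoped DELETE.
example_args = build_forecast_args(
    action="run_sql",
    inputs={"statement": "DELETE FROM users;"},
    context={"user_intent": "remove a single test row"},
)
```

Validating locally like this only catches malformed calls early; the risk assessment itself still happens on the server side.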

Example

Agent about to run DELETE FROM users; (no WHERE clause) →

🛑 BLACK_WALL: STOP — risk 99/100
Red flags:
  • [CRITICAL] SQL_NO_WHERE — deletes the entire table, not one row
  • [CRITICAL] INTENT_MISMATCH — intent was "remove a single test row"
  • [CRITICAL] IRREVERSIBLE_NO_BACKUP — no recovery path
Guidance: DO NOT take this action. Surface the red flags to the user.
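
One way an agent harness could act on a result like this is sketched below in Python. It assumes the tool result has already been parsed into a dict carrying the recommendation, risk_score, and red_flags fields named under Returns. The exact wire format, the shape of each red-flag entry (severity, code, detail here), and the functions next_step and summarize_flags are illustrative assumptions, not the package's API.

```python
from typing import Any


def next_step(result: dict[str, Any]) -> str:
    """Map a parsed forecast result to "proceed", "ask_user", or "abort"."""
    recommendation = str(result.get("recommendation", "")).upper()
    if recommendation == "GO":
        return "proceed"
    if recommendation == "CAUTION":
        return "ask_user"
    # STOP, or a missing/unrecognized value: fail closed.
    return "abort"


def summarize_flags(result: dict[str, Any]) -> list[str]:
    """Render red flags as one line each, to surface to the user."""
    lines = []
    for flag in result.get("red_flags", []):
        # Assumed entry shape: {"severity": ..., "code": ..., "detail": ...}
        lines.append(f"[{flag['severity']}] {flag['code']}: {flag['detail']}")
    return lines


# Hand-written sample mirroring the example output above (not a real response).
sample_result = {
    "recommendation": "STOP",
    "risk_score": 99,
    "red_flags": [
        {
            "severity": "CRITICAL",
            "code": "SQL_NO_WHERE",
            "detail": "deletes the entire table, not one row",
        }
    ],
}

if next_step(sample_result) == "abort":
    for line in summarize_flags(sample_result):
        print(line)
```

Treating anything other than an explicit GO as a reason to pause is a deliberately conservative design choice: a malformed or missing result then blocks the action instead of letting it through.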

Observe mode — try it with zero risk

Not ready to let a guardrail block your agents? Start in observe mode. It scores and logs every action but never tells the agent to stop — your agents behave exactly as they do today. After a week, review your dashboard and see what it would have caught.

{
  "mcpServers": {
    "blackwall": {
      "command": "npx",
      "args": ["-y", "blackwall-mcp"],
      "env": {
        "BLACKWALL_API_KEY": "bw_live_your_key_here",
        "BLACKWALL_MODE": "observe"
      }
    }
  }
}

Then see "what your agents almost did" in your dashboard. Flip BLACKWALL_MODE to enforce (or just remove it — enforce is the default) when you're ready to actually block.

Config reference

| Env var | Required | Default | Notes |
| --- | --- | --- | --- |
| `BLACKWALL_API_KEY` | ✅ | — | `bw_live_…` from your dashboard |
| `BLACKWALL_BASE_URL` | — | `https://blackwalltier.com` | |
| `BLACKWALL_MODE` | — | `enforce` | `observe` = log only, never block |
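
If you generate host configs from a script, the three variables above can be combined as in the following Python sketch. The function blackwall_server_entry is a hypothetical helper, and the bw_live_ prefix check is an assumption based only on the key format shown in this README; drop it if your keys look different.

```python
import json
from typing import Optional

# Modes listed in the config reference.
VALID_MODES = ("enforce", "observe")


def blackwall_server_entry(
    api_key: str,
    mode: str = "enforce",
    base_url: Optional[str] = None,
) -> dict:
    """Build the `blackwall` entry of an `mcpServers` block (hypothetical helper)."""
    # Assumption: keys start with "bw_live_", as in the README's example.
    if not api_key.startswith("bw_live_"):
        raise ValueError("expected a key starting with 'bw_live_'")
    if mode not in VALID_MODES:
        raise ValueError(f"mode must be one of {VALID_MODES}")

    env = {"BLACKWALL_API_KEY": api_key}
    if mode != "enforce":
        # enforce is the documented default, so it is left unset.
        env["BLACKWALL_MODE"] = mode
    if base_url is not None:
        env["BLACKWALL_BASE_URL"] = base_url

    return {"command": "npx", "args": ["-y", "blackwall-mcp"], "env": env}


config = {
    "mcpServers": {
        "blackwall": blackwall_server_entry("bw_live_your_key_here", mode="observe")
    }
}
print(json.dumps(config, indent=2))
```

The printed JSON has the same structure as the observe-mode snippet earlier in this README and can be pasted into a host's MCP config file.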