assay-mcp-server

Assay is a fail-closed policy and evidence layer for MCP tool execution. The MCP server exposes policy checks and trace/coverage helpers for reviewing tool calls before or after agent workflows run.


README

<p align="center"> <h1 align="center">Assay</h1> <p align="center"> <strong>Policy-as-code for MCP agents: enforce what a tool call can do, prove what it did, and stay honest about what you can't.</strong><br /> <span>A deterministic, fail-closed policy gate for MCP tool calls, with real kernel-level (eBPF/LSM) enforcement on Linux and offline-verifiable evidence. CI-native, no backend, bounded by design.</span> </p> <p align="center"> <a href="https://crates.io/crates/assay-cli"><img src="https://img.shields.io/crates/v/assay-cli.svg" alt="Crates.io"></a> <a href="https://github.com/Rul1an/assay/actions/workflows/ci.yml"><img src="https://github.com/Rul1an/assay/actions/workflows/ci.yml/badge.svg" alt="CI"></a> <a href="https://github.com/Rul1an/assay/blob/main/LICENSE"><img src="https://img.shields.io/crates/l/assay-core.svg" alt="License"></a> </p> <p align="center"> <a href="#try-it-in-30-seconds">Quickstart</a> · <a href="#enforce-prove-stay-honest">How it works</a> · <a href="#see-it-work">See it work</a> · <a href="examples/mcp-quickstart/">MCP example</a> · <a href="docs/guides/github-action.md">CI guide</a> · <a href="docs/security/OWASP-MCP-TOP10-MAPPING.md">OWASP MCP Top 10</a> · <a href="https://github.com/Rul1an/assay/discussions">Discussions</a> </p> </p>


In 2026 agents got real tool access through MCP, and the attacks came with it: tool poisoning, rug pulls, confused-deputy OAuth, dozens of CVEs in the first months alone. Most tools scan a server or filter a prompt. Assay sits at the tool-call boundary and does three things, in order.

Enforce, prove, stay honest

  • Enforce. A deterministic, fail-closed policy gate decides every MCP tools/call before it runs, with the precise reason for each allow or deny. On Linux it adds real kernel-level enforcement: a proven IPv4/TCP connect-egress block (eBPF/LSM) and a Landlock TCP-connect port allowlist, both opt-in and fail-closed. A policy it cannot express exactly is refused, never half-applied.
  • Prove. Every decision and observed effect becomes an offline-verifiable, tamper-evident evidence bundle, alongside pinned per-call carriers: the verdict, the pre-call establish journey, and declared-vs-observed tool-annotation conformance. All reviewable in CI, with no hosted backend.
  • Stay honest. Each claim carries its basis (verified, self_reported, inferred, or absent), and a gate refuses to let a claim exceed what was actually observed. A tool returning "success" is the provider's assertion, never proof, until evidence confirms it. Assay ships no single safety score and never claims more than it can prove.
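The fail-closed behaviour described under Enforce can be illustrated with a minimal sketch. This is not Assay's implementation: the `decide` function, the `Decision` type, and the `tool_not_allowlisted` reason code are hypothetical names chosen for illustration; only `policy_allow` and `tool_denied` appear in this README's example output.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Decision:
    allowed: bool
    reason: str


def decide(tool: str, allow: frozenset, deny: frozenset) -> Decision:
    """Return an allow/deny decision with a reason for a single tool call."""
    # An explicit deny always wins.
    if tool in deny:
        return Decision(False, "tool_denied")
    if tool in allow:
        return Decision(True, "policy_allow")
    # Fail-closed default: a tool the policy does not mention is denied.
    return Decision(False, "tool_not_allowlisted")  # hypothetical reason code


ALLOW = frozenset({"read_file", "list_dir"})
DENY = frozenset({"exec", "shell", "write_file"})

for name in ("read_file", "exec", "delete_repo"):
    d = decide(name, ALLOW, DENY)
    print("ALLOW" if d.allowed else "DENY", name, f"reason={d.reason}")
```

The point of the sketch is the last branch: when the policy says nothing about a tool, the decision is a deny with a stated reason rather than a silent pass-through.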

Try it in 30 seconds

cargo install assay-cli

mkdir -p /tmp/assay-demo && echo "safe content" > /tmp/assay-demo/safe.txt

assay mcp wrap --policy examples/mcp-quickstart/policy.yaml \
  -- npx @modelcontextprotocol/server-filesystem /tmp/assay-demo
✅ ALLOW  read_file  path=/tmp/assay-demo/safe.txt  reason=policy_allow
❌ DENY   read_file  path=/tmp/outside-demo.txt      reason=path_constraint_violation
❌ DENY   exec       cmd=ls                          reason=tool_denied

Assay decides each MCP tool call before it runs, fail-closed, with the reason.

Wire it into Cursor, Claude Code, or Codex in one line with assay mcp config-path <editor>. New to the threat model? Start with the OWASP MCP Top 10 mapping, which lays out, per risk, what Assay covers and what it deliberately does not.


Use Assay if you already have machine-readable AI outcomes or agent tool-call tests and want a small reviewable artifact boundary in CI.

Start with the path that matches what you already have:

| You have | Use this when | What you get | Next click |
|---|---|---|---|
| Promptfoo JSONL from CI evals | You want smaller PR evidence than a full eval export | Eval outcome receipts, verified bundle, Trust Basis diff | Promptfoo JSONL |
| OpenFeature boolean EvaluationDetails | You want CI evidence for a runtime flag decision boundary | Decision receipt, verified bundle, Trust Basis diff | OpenFeature EvaluationDetails |
| CycloneDX ML-BOM model component | You want CI evidence for the model inventory/provenance boundary that existed | Inventory receipt, verified bundle, Trust Basis diff | CycloneDX ML-BOM |
| MCP tool calls | You are ready to put a policy file around tool execution | Allow/deny audit trail and evidence for observed tool behavior | MCP Quick Start |
| A GitHub PR gate | You want CI to block regressions from checked artifacts | Trust Basis diff, gate status, SARIF/JUnit-ready output | CI Guide |
| A Runner archive or coverage annotation from an observed run | You want to know what the observed evidence can and cannot support before trusting a side-effect claim | Coverage descriptors, claim-class cells (strength x basis), and a claimed-vs-observed check | Coverage-honesty walkthrough |

The core workflow is intentionally small: import or record a bounded outcome, bundle and verify it, compile trust-basis.json, then gate the Trust Basis diff. Assay does not make the upstream tool the source of truth; it makes the evidence boundary inspectable.

For observed runtime evidence specifically, the same boundary discipline runs end to end: a coverage descriptor declares what the capture can and cannot support, claim-class cells record each claim as claim_strength x claim_basis, and a gate refuses to let a claim exceed what was observed. See the coverage-honesty walkthrough and the claim-class semantics.
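As a rough illustration of a gate that refuses to let a claim exceed what was observed, the sketch below compares only the basis axis of a claim-class cell. The ranking of the four basis levels is an assumption made for this example (the README names the levels but does not define a total order), and `gate_claim` and its reason strings are hypothetical.

```python
# Assumed ordering, weakest to strongest. The README lists these four levels
# but does not state how they rank against each other.
BASIS_RANK = {"absent": 0, "inferred": 1, "self_reported": 2, "verified": 3}


def gate_claim(claimed_basis: str, observed_basis: str) -> tuple[bool, str]:
    """Pass only if the claimed basis is no stronger than the observed one."""
    if claimed_basis not in BASIS_RANK or observed_basis not in BASIS_RANK:
        # Fail closed on anything the gate does not recognise.
        return (False, "unknown_basis")
    if BASIS_RANK[claimed_basis] > BASIS_RANK[observed_basis]:
        return (False, "claim_exceeds_observed")
    return (True, "claim_within_observed")


# A tool that merely reported success cannot back a "verified" claim.
print(gate_claim("verified", "self_reported"))
print(gate_claim("self_reported", "verified"))
```

A real claim-class cell also carries claim_strength; this sketch leaves that axis out to keep the comparison readable.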

For privileged tool actions specifically, the MCP proxy records each observed tools/call as a structured tool-decision (assay.tool_decision_surface.v0): the privileged in-application actions kernel and network enforcement cannot see, such as a deploy key added or a workspace member invited. Rule-based classifiers tag the action and project a target with sensitive ids hashed and raw arguments never stored, and the shape keeps the asserted-versus-verified line honest: a tool returning success is the provider's assertion, never proof, until independently checked audit evidence confirms it. See tool-decision surface and credential-scope.
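The shape of such a record can be sketched as follows. Only the schema name `assay.tool_decision_surface.v0` and the `self_reported` basis come from this README; every field name, the classifier table, and the `sha256:` prefix are assumptions for illustration, not the actual record format.

```python
import hashlib
import json

# Hypothetical rule table: tool name -> (action class, argument naming the target).
CLASSIFIERS = {"github.add_deploy_key": ("deploy_key_added", "repo")}


def hash_id(value: str) -> str:
    """Hash a sensitive identifier so the record never carries it in the clear."""
    return "sha256:" + hashlib.sha256(value.encode("utf-8")).hexdigest()


def tool_decision_record(tool: str, arguments: dict, provider_ok: bool) -> dict:
    action, target_field = CLASSIFIERS.get(tool, ("unclassified", None))
    target = arguments.get(target_field) if target_field else None
    # Raw arguments are deliberately not copied into the record.
    return {
        "schema": "assay.tool_decision_surface.v0",
        "tool": tool,
        "action_class": action,
        "target_hash": hash_id(str(target)) if target is not None else None,
        # The provider's own success flag is an assertion, not proof.
        "provider_reported_success": provider_ok,
        "outcome_basis": "self_reported",
    }


record = tool_decision_record(
    "github.add_deploy_key",
    {"repo": "octo/secret-repo", "key": "ssh-ed25519 AAAA-example"},
    provider_ok=True,
)
print(json.dumps(record, indent=2))
```

Under these assumptions the basis would move to verified only after independently checked audit evidence confirms the action, which is outside what this sketch models.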

Trust Basis Gate
Status: OK
Bundles verified: 1
Regressed claims: 0

Assay is not a trust-score engine, a generic eval dashboard, or a hosted observability product. See What Assay is and is not for the boundary.

Is This For Me?

Yes, if you:

  • already have eval output, runtime decisions, inventory artifacts, or MCP tool-call tests
  • want a CI review artifact instead of a dashboard-only result
  • need bounded auditability, not a scalar trust badge

Not yet, if you:

  • need Assay to judge model correctness or policy quality for you
  • want a hosted dashboard as the primary product
  • want a compliance claim instead of a bounded evidence boundary

Install

cargo install assay-cli

CI: GitHub Action. Python SDK: pip install assay-it.

No hosted backend. No API keys for core flows. Deterministic: same input, same decision.

v3.21.0 runtime enforcement (Linux): assay sandbox --enforce-net enforces a TCP-connect port allowlist with Landlock, a second kernel route beside the connect4/eBPF egress path, denying any TCP connect to a non-allowlisted port. It records the outcome in a separate assay.enforcement_health.v1 artifact, and --probe-enforcement adds a per-run real-block check (a denied connect blocked with EACCES, the harness listener never reached). Enforcement is opt-in and fail-closed: a network policy it cannot express as an explicit port allowlist is refused rather than partially applied, and a requested health artifact that cannot be written is an error, never a silent absence. It is bounded by design and makes no IP/CIDR, hostname, UDP, or QUIC claim. See CHANGELOG.md for the full release notes.
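The "refused rather than partially applied" rule can be illustrated with a small sketch of the decision logic only; it does not call Landlock or any kernel interface. The rule shape (`protocol` and `port` keys), `compile_port_allowlist`, and `UnsupportedPolicy` are hypothetical and do not reflect Assay's policy format.

```python
class UnsupportedPolicy(ValueError):
    """Raised when a network policy cannot be expressed as a TCP port allowlist."""


def compile_port_allowlist(rules: list) -> frozenset:
    """Turn rules into an explicit port allowlist, or refuse the whole policy."""
    ports = set()
    for rule in rules:
        extra = set(rule) - {"protocol", "port"}
        if extra:
            # Host, CIDR, or any other selector is out of scope: refuse, never drop it.
            raise UnsupportedPolicy(f"cannot express fields: {sorted(extra)}")
        if rule.get("protocol") != "tcp":
            raise UnsupportedPolicy(f"unsupported protocol: {rule.get('protocol')!r}")
        port = rule.get("port")
        if isinstance(port, bool) or not isinstance(port, int) or not 1 <= port <= 65535:
            raise UnsupportedPolicy(f"invalid port: {port!r}")
        ports.add(port)
    return frozenset(ports)


def connect_allowed(port: int, allowlist: frozenset) -> bool:
    # Any TCP connect to a port that is not on the allowlist is denied.
    return port in allowlist


allowlist = compile_port_allowlist([{"protocol": "tcp", "port": 443}])
print(connect_allowed(443, allowlist), connect_allowed(22, allowlist))
```

Raising on the first inexpressible rule, instead of skipping it, is what keeps the sketch fail-closed: no allowlist is produced at all when part of the policy would otherwise be ignored.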

<details> <summary>Evidence levels and non-goals</summary>

Trust claims use explicit epistemology, not a single “safety score”:

| Level | Meaning |
|---|---|
| verified | Backed by direct evidence or offline verification in the bundle/path |
| self_reported | Emitted by the system without stronger independent corroboration |
| inferred | Derived from bounded, documented rules |
| absent | No trustworthy evidence supports the claim |

Assay does not ship a primary aggregate trust score or a safe/unsafe badge as the main output. See ADR-033.

</details>

What ships today

| Output | Role |
|---|---|
| Policy gate | MCP wrap: deterministic allow/deny before tools run (see CLI note below the diagram). |
| Evidence bundle | Offline-verifiable, tamper-evident archive for audit and replay. |
| External receipts | Selected eval outcomes, runtime decision details, and inventory/provenance surfaces as bounded evidence receipts with JSON Schema contracts. |
| Trust Basis | Canonical trust-basis.json: bounded claim classification from verified bundles. |
| Trust Card | trustcard.json / trustcard.md / trustcard.html: same claims, review-friendly artifacts. |
| SARIF / CI | GitHub Action, Security tab integration, policy gates on PRs. |
| Coding-agent governance | Run a coding agent under assay sandbox; emit its observed effects as an evidence bundle (--bundle) or OTel execute_tool spans (--otel-jsonl). |
| Attestation | Export a bundle as an in-toto / DSSE statement (v0), anchor-pluggable. |

Repository truth: release notes and CHANGELOG.md remain the authority for what is actually public. main may carry release-prep commits before a tag is cut; crates.io publication is separate from repository merge state.

  Agent ──► Assay ──► MCP Server
              │
              ├─ ✅ ALLOW / ❌ DENY  (policy)
              ├─► 📋 Evidence bundle (verifiable)
              └─► 📊 Trust Basis → Trust Card → SARIF / CI

CLI: MCP runtime commands live under assay mcp. Use assay mcp --help, assay mcp wrap …, assay mcp discover, assay mcp kill, or follow the MCP Quickstart.

A boundary, not a category. “MCP firewall” describes the control plane; trust compilation describes the outcome: reviewable claims backed by evidence. See ADR-033 and RFC-005.

See It Work

An agent tries a privileged action — github.add_deploy_key — through the enforcing proxy. Assay decides per call before it forwards and writes a replayable evidence record. One command, offline, against a local mock (no real credentials, no real GitHub call):


cd examples/privileged-action-gate && ./run.sh
❌ DENY   github.add_deploy_key  reason=no_declared_allowance
❌ DENY   github.add_deploy_key  reason=credential_scope_insufficient
❌ DENY   github.add_deploy_key  reason=manifest_drifted_since_approval
✅ ALLOW  github.add_deploy_key  reason=allow
✅ ALLOW  github.add_deploy_key  reason=allow  + conformance: mismatched (declared_read_only_observed_mutating)  [separate, non-gating]

A deny is fail-closed caution, not a verdict on intent; an allow is the decision to forward, never proof the action happened. The last line is separate evidence — the tool declared itself read-only while the observed call was mutating — recorded beside the verdict, never a gate. Full walkthrough: examples/privileged-action-gate/.

A simpler first example


cargo install assay-cli

mkdir -p /tmp/assay-demo && echo "safe content" > /tmp/assay-demo/safe.txt

assay mcp wrap --policy examples/mcp-quickstart/policy.yaml \
  -- npx @modelcontextprotocol/server-filesystem /tmp/assay-demo
✅ ALLOW  read_file  path=/tmp/assay-demo/safe.txt  reason=policy_allow
✅ ALLOW  list_dir   path=/tmp/assay-demo/           reason=policy_allow
❌ DENY   read_file  path=/tmp/outside-demo.txt      reason=path_constraint_violation
❌ DENY   exec       cmd=ls                          reason=tool_denied

Inspect the audit artifact:

assay evidence show demo/fixtures/bundle.tar.gz


The bundle is tamper-evident and cryptographically verifiable. Signed mandate events can include an Ed25519-backed authorization trail for high-risk actions.

Trust artifacts from a verified bundle

After a bundle verifies, compile the claim artifact:

# Machine-readable claim basis (deterministic, claim-first)
assay trust-basis generate demo/fixtures/bundle.tar.gz > trust-basis.json

trust-basis.json is the canonical output for CI and review. Claim id values are stable across runs; consumers should key by id, not row count or order. It is not a scalar trust score.
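A consumer that keys by id might look like the sketch below. The JSON shape is an assumption: the README confirms only that claims carry a stable `id`, so the top-level `claims` key, the `level` field, and the example claim ids are hypothetical.

```python
def claims_by_id(trust_basis: dict) -> dict:
    """Index claims by their stable id, ignoring row order and row count."""
    return {claim["id"]: claim["level"] for claim in trust_basis["claims"]}


def diff_claims(base: dict, head: dict) -> dict:
    old, new = claims_by_id(base), claims_by_id(head)
    return {
        "added": sorted(new.keys() - old.keys()),
        "removed": sorted(old.keys() - new.keys()),
        "changed": sorted(k for k in old.keys() & new.keys() if old[k] != new[k]),
    }


# Hypothetical documents: same claims in a different order, one level changed.
base = {"claims": [
    {"id": "bundle_verified", "level": "verified"},
    {"id": "signing_evidence_present", "level": "verified"},
]}
head = {"claims": [
    {"id": "signing_evidence_present", "level": "absent"},
    {"id": "bundle_verified", "level": "verified"},
]}
print(diff_claims(base, head))
```

Because the comparison is keyed by id, reordering rows produces no diff, while a level change on the same id is reported regardless of where the row sits.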

The current claim-visible receipt families are Promptfoo assertion-component results, OpenFeature boolean EvaluationDetails, and CycloneDX ML-BOM model components. See the receipt-family matrix, the three-family note, and Evidence Receipts in Action.

<details> <summary>Trust Card details</summary>

assay trust-card generate demo/fixtures/bundle.tar.gz --out-dir ./trust-out
# -> trust-out/trustcard.json , trust-out/trustcard.md , trust-out/trustcard.html

The Trust Card is a deterministic render of the same claim rows plus frozen non-goals; trustcard.json is canonical, while Markdown and static HTML are reviewer projections. Contract versions, pack floors, and release checklist: MIGRATION — Trust Compiler 3.2, receipt-family matrix. Release history belongs in CHANGELOG.md.

</details>

Add to Cursor in 30 Seconds

Assay ships a helper that finds your local Cursor MCP config path and prints a ready-to-paste entry:

assay mcp config-path cursor

It generates JSON like:

{
  "filesystem-secure": {
    "command": "assay",
    "args": [
      "mcp",
      "wrap",
      "--policy",
      "/path/to/policy.yaml",
      "--",
      "npx",
      "-y",
      "@modelcontextprotocol/server-filesystem",
      "/Users/you"
    ]
  }
}

The same wrapped command works in other MCP clients (Claude Code, Codex) — see the editor MCP recipe and MCP Quick Start.

Policy Is Simple

version: "2.0"
name: "my-policy"

tools:
  allow: ["read_file", "list_dir"]
  deny: ["exec", "shell", "write_file"]

schemas:
  read_file:
    type: object
    additionalProperties: false
    properties:
      path:
        type: string
        pattern: "^/app/.*"
        minLength: 1
    required: ["path"]

Policies that use the legacy constraints: key still work. Use assay policy migrate to convert them to the v2 JSON Schema form, or assay init --from-trace trace.jsonl to generate a policy from observed behavior.

See Policy Files.
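To make the read_file schema above concrete, here is a hand-rolled sketch of how its keywords constrain arguments. It covers only the keywords used in that example and is not a general JSON Schema validator, nor Assay's own checker; `check_args` and its error strings are hypothetical.

```python
import re

# The read_file schema from the policy above, written as a Python dict.
READ_FILE_SCHEMA = {
    "type": "object",
    "additionalProperties": False,
    "properties": {
        "path": {"type": "string", "pattern": "^/app/.*", "minLength": 1},
    },
    "required": ["path"],
}


def check_args(args, schema: dict) -> list:
    """Return a list of violations; an empty list means the arguments pass."""
    if not isinstance(args, dict):
        return ["not_an_object"]
    errors = []
    props = schema.get("properties", {})
    for key in schema.get("required", []):
        if key not in args:
            errors.append(f"missing:{key}")
    if schema.get("additionalProperties") is False:
        errors.extend(f"unexpected:{key}" for key in args if key not in props)
    for key, rule in props.items():
        if key not in args or rule.get("type") != "string":
            continue
        value = args[key]
        if not isinstance(value, str):
            errors.append(f"type:{key}")
            continue
        if len(value) < rule.get("minLength", 0):
            errors.append(f"minLength:{key}")
        # JSON Schema "pattern" is a search, so the anchor must be in the regex.
        if "pattern" in rule and re.search(rule["pattern"], value) is None:
            errors.append(f"pattern:{key}")
    return errors


print(check_args({"path": "/app/config.yaml"}, READ_FILE_SCHEMA))
print(check_args({"path": "/etc/passwd", "mode": "r"}, READ_FILE_SCHEMA))
```

Note that a pattern constrains the argument string only: a value such as /app/../etc/passwd still matches ^/app/.*, so whether paths are normalised before the check is a separate question this sketch does not answer.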

<details> <summary>Other import paths and protocol adapters</summary>

OpenTelemetry in, canonical evidence out

Assay ingests OpenTelemetry JSONL, builds replayable traces, and exports canonical evidence — OTel is a bridge, not the sole semantic authority.

assay trace ingest-otel \
  --input otel-export.jsonl \
  --db .eval/eval.db \
  --out-trace traces/otel.v2.jsonl

Assay can also emit observed tool effects as OTel GenAI execute_tool spans carrying the claim-class outcome (assay sandbox --otel-jsonl), so declared and observed sit in one trace. See OpenTelemetry & Langfuse.

Protocol adapters

Assay ships adapters that map protocol events into canonical evidence:

| Protocol | Adapter | What it maps |
|---|---|---|
| ACP (OpenAI/Stripe) | assay-adapter-acp | Checkout events, payment intents, tool calls |
| A2A (Google) | assay-adapter-a2a | Agent capabilities, task delegation, artifacts |
| UCP (Google/Shopify) | assay-adapter-ucp | Discover/buy/post-purchase state transitions |

Adapter crates are workspace / binary-driven, not published as separate crates.io packages.

</details>

Add to CI

# .github/workflows/assay.yml
name: Assay Gate
on: [push, pull_request]
permissions:
  contents: read
  security-events: write
jobs:
  assay:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: Rul1an/assay-action@v2

PRs that violate policy get blocked; SARIF can surface in the Security tab.

Why Assay

  • Canonical evidence: Assay’s evidence model is the stable contract; OTel and adapters map into it.
  • Deterministic: Same input, same decision; not probabilistic.
  • Portable artifacts: Bundles, Trust Basis, Trust Card, SARIF, for CI, review, and audit.
  • Bounded claims: Explicit about what is verified vs visible vs absent, with no score-first UX.
  • MCP-native: assay mcp wrap is the fast path; assay mcp discover, assay mcp kill, and assay mcp tool keep the runtime surface grouped. Adapters extend the same engine.
  • Offline-first: No backend required for core enforcement and bundle verification.

<details> <summary>Measured latency</summary>

On the M1 Pro/macOS fragmented-IPI harness, protected tool-decision path:

  • Main protection run: 0.771ms p50 / 1.913ms p95
  • Fast-path scenario: 0.345ms p50 / 1.145ms p95

These are tool-decision timings, not end-to-end model latency. (See Research & experiments for methodology context.)

</details>

Learn More

Internal: Assay-Runner

Assay-Runner is an internal measured-run subsystem used by Assay's delegated Linux/eBPF acceptance path. It is not a standalone product. As of Phase 2D, the runner candidate is split into extraction-ready Rust crates (assay-runner-schema, assay-runner-core, assay-runner-linux) — all publish = false — plus the runner-fixtures/ package tree (Node fixture marked "private": true; Python fixture has no distribution surface). Everything stays inside this repository.

No release commitment. No timeline. No external demand has been measured.

Research, mappings & experiments

Bounded context: numbers below support mapping and experiments, not a product “security score.”

  • OWASP MCP Top 10 Mapping — how Assay relates to each risk category (coverage is not a scalar guarantee).
  • Third-party survey: popular MCP servers often show weak defaults — Assay adds policy + evidence; see discussion in the mapping doc.
  • Security experiments — attack vectors and harness notes (methodology matters more than headline counts).
  • MCP tool evidence-binding quickstart — synthetic description→call→effect binding with bounded claims. Experiment-scoped; not a poisoning detector, and distinct from the supported MCP policy quickstart above.

Contributing

cargo test --workspace
cargo clippy --workspace --all-targets -- -D warnings

See CONTRIBUTING.md. Discussions: GitHub Discussions — seed topics for pinned threads live in docs/community/DISCUSSIONS.md.

License

MIT
