Debate Agent MCP
Enables multi-agent code review with P0/P1/P2 severity scoring by orchestrating locally installed AI CLIs (Claude, Codex) to perform parallel analysis, deterministic scoring, and consensus-building on git diffs.
EXPERIMENTAL: This project is in active development. APIs and features may change without notice. Use at your own risk in production environments.
A multi-agent debate framework for code review and debate planning with P0/P1/P2 severity scoring.
Architecture Overview
┌─────────────────────────────────────────────────────────────────────────────┐
│ DEBATE AGENT MCP │
│ │
│ ┌───────────────────────────────────────────────────────────────────────┐ │
│ │ MCP SERVER LAYER │ │
│ │ (Model Context Protocol) │ │
│ │ │ │
│ │ Exposes tools via stdio to Claude Code / AI assistants: │ │
│ │ • list_agents • read_diff • run_agent │ │
│ │ • debate_review • debate_plan │ │
│ │ │ │
│ └───────────────────────────────────────────────────────────────────────┘ │
│ │ │
│ ▼ │
│ ┌───────────────────────────────────────────────────────────────────────┐ │
│ │ ORCHESTRATOR LAYER │ │
│ │ (@debate-agent/core) │ │
│ │ │ │
│ │ Pipeline: │ │
│ │ 1. Read git diff ──► 2. Run agents in parallel (Promise.all) │ │
│ │ 3. Critique round ──► 4. Deterministic scoring ──► 5. Merge │ │
│ │ │ │
│ └───────────────────────────────────────────────────────────────────────┘ │
│ │ │
│ ┌───────────────┴───────────────┐ │
│ ▼ ▼ │
│ ┌──────────────────────────┐ ┌──────────────────────────┐ │
│ │ Claude CLI │ │ Codex CLI │ │
│ │ /opt/homebrew/bin/claude│ │ /opt/homebrew/bin/codex │ │
│ │ │ │ │ │
│ │ spawn() as subprocess │ │ spawn() as subprocess │ │
│ │ Uses YOUR credentials │ │ Uses YOUR credentials │ │
│ └──────────────────────────┘ └──────────────────────────┘ │
│ │ │ │
│ ▼ ▼ │
│ Anthropic API OpenAI API │
│ (auth via local CLI) (auth via local CLI) │
│ │
└─────────────────────────────────────────────────────────────────────────────┘
How It Works
No Authentication Required
The MCP itself requires no API keys or authentication. It orchestrates your locally installed CLI tools:
┌─────────────────────────────────────────────────────────────────┐
│ YOUR MACHINE │
│ │
│ ~/.claude/credentials ──► claude CLI ──► Anthropic API │
│ ~/.codex/credentials ──► codex CLI ──► OpenAI API │
│ │
│ The MCP just runs: spawn("claude", ["--print", prompt]) │
│ Same as typing in your terminal! │
│ │
└─────────────────────────────────────────────────────────────────┘
Execution Flow
Step 1: Build Prompt
├── Combine review question + git diff + platform rules
├── Add P0/P1/P2 severity definitions
└── Request JSON output format
Step 2: Parallel Execution
├── spawn("/opt/homebrew/bin/claude", ["--print", prompt])
├── spawn("/opt/homebrew/bin/codex", ["exec", prompt])
└── Both run simultaneously via Promise.all()
Step 3: Capture Output
├── Read stdout from each CLI process
└── Parse JSON responses
Step 4: Deterministic Scoring (No AI)
├── Count P0/P1/P2 findings
├── Check file accuracy against diff
├── Penalize false positives
└── Score clarity and fix quality
Step 5: Merge & Report
├── Pick winner by highest score
├── Combine unique findings from all agents
└── Generate final recommendation
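The parallel-execution and capture steps (Steps 2–3) can be sketched in TypeScript. This is an illustrative sketch, not the shipped code: `runAgent` and `runAgentsInParallel` are hypothetical names, and the real orchestrator in packages/core/src/engine/debate.ts adds timeouts, config lookup, and error handling.

```typescript
import { spawn } from "node:child_process";

// Spawn a CLI as a subprocess and resolve with its stdout (Step 2/3 sketch).
function runAgent(cmd: string, args: string[]): Promise<string> {
  return new Promise((resolve, reject) => {
    const child = spawn(cmd, args);
    let out = "";
    child.stdout.on("data", (chunk) => (out += chunk));
    child.on("error", reject);
    child.on("close", (code) =>
      code === 0 ? resolve(out) : reject(new Error(`exit code ${code}`))
    );
  });
}

// Both agents run simultaneously via Promise.all, then stdout is parsed as JSON.
async function runAgentsInParallel(prompt: string): Promise<unknown[]> {
  const [claudeOut, codexOut] = await Promise.all([
    runAgent("claude", ["--print", prompt]),
    runAgent("codex", ["exec", prompt]),
  ]);
  return [JSON.parse(claudeOut), JSON.parse(codexOut)];
}
```

Because each agent is just a subprocess, a failure or malformed JSON from one CLI can be caught per-agent without aborting the whole review.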
Roadmap
Current (v1.0) - Single Review Round
Claude ──┐
├──► Parallel Review ──► Score ──► Merge ──► Final Report
Codex ──┘
Future Goal - Multi-Turn Cross-Review
Eliminate hallucinations through adversarial validation
Round 1: Initial Review (Parallel)
┌─────────┐ ┌─────────┐
│ Claude │ │ Codex │
│ Review │ │ Review │
└────┬────┘ └────┬────┘
│ │
▼ ▼
Round 2: Cross-Review (Each agent reviews the other's findings)
┌─────────────────────────────────────────┐
│ Claude reviews Codex's findings │
│ "Is P0 about null pointer valid?" │
│ "Did Codex miss the SQL injection?" │
└─────────────────────────────────────────┘
┌─────────────────────────────────────────┐
│ Codex reviews Claude's findings │
│ "Is the race condition real?" │
│ "False positive on line 42?" │
└─────────────────────────────────────────┘
│ │
▼ ▼
Round 3: Consensus Building
┌─────────────────────────────────────────┐
│ Only findings validated by BOTH agents │
│ Hallucinations eliminated │
│ Disputed findings flagged for human │
└─────────────────────────────────────────┘
│
▼
Final: Validated Review
┌─────────────────────────────────────────┐
│ High-confidence findings (both agreed) │
│ Disputed findings (need human review) │
│ Eliminated findings (proven false) │
│ Combined score from validation rounds │
└─────────────────────────────────────────┘
Goal: By having agents review each other's work, we can:
- Eliminate hallucinated findings (one agent invents issues that don't exist)
- Catch missed issues (one agent finds what the other missed)
- Build confidence scores (findings validated by multiple agents are more reliable)
- Reduce false positives (adversarial review catches incorrect assessments)
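The planned Round-3 consensus step could look roughly like the sketch below. The `Finding` shape and the match-by-file-and-title heuristic are assumptions made for illustration; the future implementation may match findings by other criteria (e.g. line ranges or semantic similarity).

```typescript
// Hypothetical finding shape; the real type lives in packages/core/src/types.ts.
interface Finding {
  severity: "P0" | "P1" | "P2";
  file: string;
  title: string;
}

const key = (f: Finding) => `${f.file}:${f.title}`;

// Findings present in BOTH agents' lists are validated; everything else is
// disputed and flagged for human review.
function buildConsensus(a: Finding[], b: Finding[]) {
  const bKeys = new Set(b.map(key));
  const validated = a.filter((f) => bKeys.has(key(f)));
  const validatedKeys = new Set(validated.map(key));
  const disputed = [...a, ...b].filter((f) => !validatedKeys.has(key(f)));
  return { validated, disputed };
}
```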
Packages
| Package | Description | Install |
|---|---|---|
| @debate-agent/core | Core logic (framework-agnostic) | npm i @debate-agent/core |
| @debate-agent/mcp-server | MCP server for CLI users | npm i -g @debate-agent/mcp-server |
| debate-agent-mcp | VS Code extension | Install from marketplace |
Quick Start
Prerequisites
You must have the agent CLIs installed and authenticated:
# Check Claude CLI
claude --version
claude auth status # Should show logged in
# Check Codex CLI
codex --version
# Should be authenticated via OpenAI
# The MCP will spawn these - no additional auth needed
For CLI Users
# Install globally
npm install -g @debate-agent/mcp-server
# Start MCP server
debate-agent
# Or run directly
npx @debate-agent/mcp-server
For Claude Code
# Add MCP to Claude Code
claude mcp add debate-reviewer -- node /path/to/packages/mcp-server/dist/index.js
# Verify connection
claude mcp list
# Should show: debate-reviewer: ✓ Connected
For SDK Users
npm install @debate-agent/core
import { runDebate, createDebatePlan } from '@debate-agent/core';
// Run code review debate
const result = await runDebate({
question: 'Review this code for security issues',
agents: ['codex', 'claude'],
platform: 'backend',
});
// Create debate plan
const plan = createDebatePlan('Best caching strategy', ['codex', 'claude'], 'collaborative', 2);
MCP Tools
| Tool | Description |
|---|---|
| list_agents | List all configured agents |
| read_diff | Read the uncommitted git diff |
| run_agent | Run a single agent with a prompt |
| debate_review | Multi-agent P0/P1/P2 code review |
| debate_plan | Create a structured debate plan |
Configuration
Create debate-agent.config.json in your project root:
{
"agents": {
"codex": {
"name": "codex",
"path": "/opt/homebrew/bin/codex",
"args": ["exec", "--skip-git-repo-check"],
"timeout_seconds": 180
},
"claude": {
"name": "claude",
"path": "/opt/homebrew/bin/claude",
"args": ["--print", "--dangerously-skip-permissions"],
"timeout_seconds": 180
},
"gemini": {
"name": "gemini",
"path": "/opt/homebrew/bin/gemini",
"args": ["--prompt"],
"timeout_seconds": 180
}
},
"debate": {
"default_agents": ["codex", "claude"],
"include_critique_round": true,
"default_mode": "adversarial"
}
}
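For reference, the config above corresponds roughly to the TypeScript shapes below. These interfaces and the `validateConfig` helper are illustrative assumptions; the real types live in packages/core/src/types.ts and may differ in detail.

```typescript
// Hypothetical shape of debate-agent.config.json.
interface AgentConfig {
  name: string;
  path: string;            // absolute path to the CLI binary
  args: string[];          // arguments placed before the prompt
  timeout_seconds: number;
}

interface DebateAgentConfig {
  agents: Record<string, AgentConfig>;
  debate: {
    default_agents: string[];
    include_critique_round: boolean;
    default_mode: "adversarial" | "consensus" | "collaborative";
  };
}

// Minimal sanity check: every default agent must have an entry under "agents".
// Returns the names that are missing.
function validateConfig(cfg: DebateAgentConfig): string[] {
  return cfg.debate.default_agents.filter((name) => !(name in cfg.agents));
}
```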
Severity Levels
| Level | Criteria |
|---|---|
| P0 | Breaking defects, crashes, data loss, security/privacy problems, build blockers |
| P1 | Likely bugs/regressions, incorrect logic, missing error-handling, missing tests |
| P2 | Minor correctness issues, small logic gaps, non-blocking test gaps |
Defined in: packages/core/src/prompts/review-template.ts
Platform-Specific Rules
| Platform | Focus Areas |
|---|---|
| flutter | Async misuse, setState, dispose(), BuildContext in async, Riverpod leaks |
| android | Manifest, permissions, ProGuard, lifecycle violations, context leaks |
| ios | plist, ATS, keychain, signing, main thread UI, retain cycles |
| backend | DTO mismatch, HTTP codes, SQL injection, auth flaws, rate limiting |
| general | Null pointers, resource leaks, race conditions, XSS, input validation |
Defined in: packages/core/src/prompts/platform-rules.ts
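A sketch of how the platform rules might be folded into the review prompt (Step 1 of the execution flow). The rule strings below paraphrase the table above, and `buildPrompt` is a hypothetical helper; the actual prompt assembly lives in packages/core/src/prompts/.

```typescript
// Paraphrased focus areas keyed by platform (assumption, not the shipped text).
const PLATFORM_RULES: Record<string, string> = {
  backend: "Check DTO mismatches, HTTP status codes, SQL injection, auth flaws, rate limiting.",
  flutter: "Check async misuse, setState after dispose(), BuildContext across async gaps, Riverpod leaks.",
  general: "Check null pointers, resource leaks, race conditions, XSS, input validation.",
};

// Combine question + platform rules + severity definitions + diff into one prompt.
function buildPrompt(question: string, diff: string, platform: string): string {
  const rules = PLATFORM_RULES[platform] ?? PLATFORM_RULES.general;
  return [
    question,
    `Platform rules: ${rules}`,
    "Severity: P0 = breaking/security defects, P1 = likely bugs, P2 = minor issues.",
    'Respond with JSON: { "findings": [{ "severity", "file", "title", "fix" }] }',
    "--- DIFF ---",
    diff,
  ].join("\n");
}
```

Unknown platforms fall back to the general rules, so the prompt is always well-formed.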
Scoring System
The scoring is deterministic (no AI) - pure rule-based evaluation:
| Criteria | Points | Max |
|---|---|---|
| P0 Finding | +15 | 45 |
| P1 Finding | +8 | 32 |
| P2 Finding | +3 | 12 |
| False Positive | -10 | -30 |
| Concrete Fix | +5 | 25 |
| File Accuracy | +2 | 10 |
| Clarity | 0-10 | 10 |
Maximum possible score: 134
Minimum possible score: -30
Defined in: packages/core/src/engine/judge.ts
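The table above maps directly onto a pure function. This is a minimal sketch with assumed field names and clamping behaviour; the authoritative rules live in packages/core/src/engine/judge.ts.

```typescript
// Assumed input shape: counts of findings by severity plus quality signals.
interface ScoreInput {
  p0: number;
  p1: number;
  p2: number;
  falsePositives: number;
  concreteFixes: number;
  accurateFiles: number;
  clarity: number; // 0-10, rule-based
}

// Each criterion is capped at the "Max" column from the table above.
function scoreAgent(s: ScoreInput): number {
  return (
    Math.min(s.p0 * 15, 45) +
    Math.min(s.p1 * 8, 32) +
    Math.min(s.p2 * 3, 12) +
    Math.max(s.falsePositives * -10, -30) +
    Math.min(s.concreteFixes * 5, 25) +
    Math.min(s.accurateFiles * 2, 10) +
    Math.min(Math.max(s.clarity, 0), 10)
  );
}
```

With every positive criterion at its cap the function returns 134, and with only false positives it bottoms out at -30, matching the bounds stated above.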
Debate Modes
| Mode | Description |
|---|---|
| adversarial | Agents challenge each other's positions |
| consensus | Agents work to find common ground |
| collaborative | Agents build on each other's ideas |
Project Structure
debate-agent-mcp/
├── packages/
│ ├── core/ # @debate-agent/core
│ │ ├── src/
│ │ │ ├── engine/
│ │ │ │ ├── debate.ts # Orchestration (parallel execution)
│ │ │ │ ├── judge.ts # Deterministic scoring rules
│ │ │ │ ├── merger.ts # Combine findings from agents
│ │ │ │ └── planner.ts # Debate plan generation
│ │ │ ├── prompts/
│ │ │ │ ├── review-template.ts # P0/P1/P2 definitions
│ │ │ │ └── platform-rules.ts # Platform-specific scrutiny
│ │ │ ├── tools/
│ │ │ │ ├── read-diff.ts # Git diff reader
│ │ │ │ └── run-agent.ts # CLI spawner (spawn())
│ │ │ ├── config.ts # Config loader
│ │ │ ├── types.ts # TypeScript types
│ │ │ └── index.ts # Public exports
│ │ └── package.json
│ │
│ ├── mcp-server/ # @debate-agent/mcp-server
│ │ ├── src/
│ │ │ ├── index.ts # MCP server (stdio transport)
│ │ │ └── bin/cli.ts # CLI entry point
│ │ └── package.json
│ │
│ └── vscode-extension/ # debate-agent-mcp (VS Code)
│ ├── src/
│ │ └── extension.ts
│ └── package.json
│
├── debate-agent.config.json # Example config
├── package.json # Monorepo root
├── pnpm-workspace.yaml
└── README.md
Integration
Claude Desktop
{
"mcpServers": {
"debate-agent": {
"command": "node",
"args": ["/path/to/packages/mcp-server/dist/index.js"]
}
}
}
Claude CLI
claude mcp add debate-agent -- node /path/to/packages/mcp-server/dist/index.js
VS Code / Cursor
Install the VS Code extension - it auto-configures MCP.
Development
# Clone repo
git clone https://github.com/ferdiangunawan/debate-agent-mcp
cd debate-agent-mcp
# Install dependencies
npm install
# Build all packages
npm run build
# Build specific package
npm run build:core
npm run build:server
npm run build:extension
Known Limitations
- Experimental: APIs may change without notice
- Local CLIs required: You must have the claude and codex CLIs installed and authenticated
- Timeout risks: Long diffs may cause agent timeouts (default 180s)
- No streaming: Currently waits for full response before processing
- Single critique round: Future versions will support multi-turn validation
Contributing
Contributions welcome! Please open an issue first to discuss proposed changes.
License
MIT