AI MCP Gateway

Cost-Optimized Multi-Model Orchestrator with Stateless Architecture

An intelligent Model Context Protocol (MCP) server and HTTP API that orchestrates multiple AI models (free and paid) with dynamic N-layer routing, cross-checking, cost optimization, and stateless context management via Redis + PostgreSQL.

TypeScript · Node.js · MCP · MIT License


✨ Features

Core Features

  • 🎯 Smart Routing: Dynamic N-layer routing based on task complexity and quality requirements
  • 💰 Cost Optimization: Prioritizes free/cheap models, escalates only when necessary
  • 🔍 Cross-Checking: Multiple models review each other's work for higher quality
  • 🔧 Code Agent: Specialized AI agent for coding tasks with TODO-driven workflow
  • 🧪 Test Integration: Built-in Vitest and Playwright test runners
  • 📊 Metrics & Logging: Track costs, tokens, and performance
  • 🔄 Self-Improvement: Documents patterns, bugs, and routing heuristics
  • 🛠️ Extensible: Easy to add new models, providers, and tools

NEW: Stateless Architecture

  • 🗄️ Redis Cache Layer: Hot storage for LLM responses, context summaries, routing hints
  • 💾 PostgreSQL Database: Cold storage for conversations, messages, LLM calls, analytics
  • 🌐 HTTP API Mode: Stateless REST API with /v1/route, /v1/code-agent, /v1/chat endpoints
  • 📦 Context Management: Two-tier context with hot (Redis) + cold (DB) layers
  • 🔗 Handoff Packages: Optimized inter-layer communication for model escalation
  • 📝 TODO Tracking: Persistent GitHub Copilot-style TODO lists with Redis/DB storage


🚀 Quick Start

Prerequisites

  • Node.js >= 20.0.0
  • npm or pnpm (recommended)
  • API keys for desired providers (OpenRouter, Anthropic, OpenAI)
  • Optional: Redis (for caching)
  • Optional: PostgreSQL (for persistence)

Installation

# Clone the repository
git clone https://github.com/yourusername/ai-mcp-gateway.git
cd ai-mcp-gateway

# Install dependencies
npm install

# Copy environment template
cp .env.example .env

# Edit .env and add your API keys and database settings
nano .env

Build

# Build the project
npm run build

# Or run in development mode
npm run dev

🏗️ Architecture

Stateless Design

The AI MCP Gateway is designed as a stateless application with external state management:

┌─────────────────────────────────────────────────┐
│         AI MCP Gateway (Stateless)              │
│  ┌──────────────┐      ┌──────────────┐        │
│  │  MCP Server  │      │  HTTP API    │        │
│  │   (stdio)    │      │  (REST)      │        │
│  └──────┬───────┘      └──────┬───────┘        │
│         │                     │                 │
│         └─────────┬───────────┘                 │
│                   │                             │
│         ┌─────────▼──────────┐                  │
│         │  Routing Engine    │                  │
│         │  Context Manager   │                  │
│         └─────────┬──────────┘                  │
└───────────────────┼─────────────────────────────┘
                    │
        ┌───────────┼───────────┐
        │           │           │
   ┌────▼────┐ ┌───▼────┐ ┌───▼────┐
   │  Redis  │ │  DB    │ │  LLMs  │
   │  (Hot)  │ │(Cold)  │ │        │
   └─────────┘ └────────┘ └────────┘

Two-Tier Context Management

  1. Hot Layer (Redis)

    • Context summaries (conv:summary:{conversationId})
    • Recent messages cache (conv:messages:{conversationId})
    • LLM response cache (llm:cache:{model}:{hash})
    • TODO lists (todo:list:{conversationId})
    • TTL: 30-60 minutes
  2. Cold Layer (PostgreSQL)

    • Full conversation history
    • All messages with metadata
    • Context summaries (versioned)
    • LLM call logs (tokens, cost, duration)
    • Routing rules and analytics
    • Persistent storage
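
To make the lookup order concrete, here is a minimal sketch of the two-tier read path, assuming the ioredis and pg clients and the key pattern and table names shown in this README (getContextSummary itself is illustrative, not the gateway's actual API):

import Redis from 'ioredis';
import { Pool } from 'pg';

const redis = new Redis();                                           // hot layer
const db = new Pool({ connectionString: process.env.DATABASE_URL }); // cold layer

// Resolve a conversation's context summary: Redis first, DB on a miss.
async function getContextSummary(conversationId: string): Promise<string | null> {
  const key = `conv:summary:${conversationId}`;
  const hot = await redis.get(key);
  if (hot !== null) return hot;

  const { rows } = await db.query(
    'SELECT summary FROM context_summaries WHERE conversation_id = $1 ORDER BY version DESC LIMIT 1',
    [conversationId],
  );
  if (rows.length === 0) return null;

  // Re-warm the hot layer within the documented 30-60 minute TTL window.
  await redis.set(key, rows[0].summary, 'EX', 1800);
  return rows[0].summary;
}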

🔄 Dual Mode Operation

The gateway supports two modes:

1. MCP Mode (stdio)

Standard Model Context Protocol server for desktop clients.

npm run start:mcp
# or
npm start

Configure in Claude Desktop (~/Library/Application Support/Claude/claude_desktop_config.json):

{
  "mcpServers": {
    "ai-mcp-gateway": {
      "command": "node",
      "args": ["/path/to/ai-mcp-gateway/dist/index.js"]
    }
  }
}

2. HTTP API Mode

Stateless REST API for web services and integrations.

npm run start:api
# or
MODE=api npm start

API runs on http://localhost:3000 (configurable via API_PORT).


🌐 HTTP API Usage

Endpoints

POST /v1/route

Intelligent model selection and routing.

curl -X POST http://localhost:3000/v1/route \
  -H "Content-Type: application/json" \
  -d '{
    "conversationId": "conv-123",
    "message": "Explain async/await in JavaScript",
    "userId": "user-1",
    "qualityLevel": "normal"
  }'

Response:

{
  "result": {
    "response": "Async/await is...",
    "model": "anthropic/claude-sonnet-4",
    "provider": "anthropic"
  },
  "routing": {
    "summary": "L0 -> primary model",
    "fromCache": false
  },
  "context": {
    "conversationId": "conv-123"
  },
  "performance": {
    "durationMs": 1234,
    "tokens": { "input": 50, "output": 200 },
    "cost": 0.002
  }
}
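
The same request from a TypeScript client, as a sketch (the field names follow the curl example and response shape above):

const res = await fetch('http://localhost:3000/v1/route', {
  method: 'POST',
  headers: { 'Content-Type': 'application/json' },
  body: JSON.stringify({
    conversationId: 'conv-123',
    message: 'Explain async/await in JavaScript',
    userId: 'user-1',
    qualityLevel: 'normal',
  }),
});
const data = await res.json();
console.log(data.result.model, data.performance.cost); // e.g. "anthropic/claude-sonnet-4 0.002"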

POST /v1/code-agent

Specialized coding assistant.

curl -X POST http://localhost:3000/v1/code-agent \
  -H "Content-Type: application/json" \
  -d '{
    "conversationId": "conv-123",
    "task": "Create a React component for user profile",
    "files": ["src/components/UserProfile.tsx"]
  }'

POST /v1/chat

General chat endpoint with context.

curl -X POST http://localhost:3000/v1/chat \
  -H "Content-Type: application/json" \
  -d '{
    "conversationId": "conv-123",
    "message": "What did we discuss earlier?"
  }'

GET /v1/context/:conversationId

Retrieve conversation context.

curl http://localhost:3000/v1/context/conv-123

GET /health

Health check endpoint.

curl http://localhost:3000/health

Response:

{
  "status": "ok",
  "redis": true,
  "database": true,
  "timestamp": "2025-11-22T06:42:00.000Z"
}
  "args": ["/path/to/ai-mcp-gateway/dist/index.js"]
}

} }


### Start the Server

```bash
# Run the built server
pnpm start

# Or use the binary directly
node dist/index.js

🏗️ Architecture Details

High-Level Overview

┌─────────────────────────────────────────────────────────┐
│                   MCP Client                             │
│            (Claude Desktop, VS Code, etc.)               │
└───────────────────────┬─────────────────────────────────┘
                        │ MCP Protocol
┌───────────────────────▼─────────────────────────────────┐
│                 AI MCP Gateway Server                    │
│                                                           │
│  ┌─────────────────────────────────────────────────┐    │
│  │              Tools Registry                      │    │
│  │  • code_agent    • run_vitest                   │    │
│  │  • run_playwright • fs_read/write               │    │
│  │  • git_diff      • git_status                   │    │
│  └──────────────────┬──────────────────────────────┘    │
│                     │                                     │
│  ┌──────────────────▼──────────────────────────────┐    │
│  │           Routing Engine                        │    │
│  │  • Task classification                          │    │
│  │  • Layer selection (L0→L1→L2→L3)               │    │
│  │  • Cross-check orchestration                    │    │
│  │  • Auto-escalation                              │    │
│  └──────────────────┬──────────────────────────────┘    │
│                     │                                     │
│  ┌──────────────────▼──────────────────────────────┐    │
│  │           LLM Clients                           │    │
│  │  • OpenRouter  • Anthropic                      │    │
│  │  • OpenAI      • OSS Local                      │    │
│  └──────────────────┬──────────────────────────────┘    │
└───────────────────────┼─────────────────────────────────┘
                        │
        ┌───────────────┼───────────────┐
        │               │               │
┌───────▼──────┐ ┌─────▼──────┐ ┌─────▼──────┐
│ Free Models  │ │ Paid Models│ │Local Models│
│ (Layer L0)   │ │(Layer L1-L3)│ │  (Layer L0)│
└──────────────┘ └────────────┘ └────────────┘

Key Components

1. MCP Server (src/mcp/)

  • Handles MCP protocol communication
  • Registers and dispatches tools
  • Manages request/response lifecycle

2. Routing Engine (src/routing/)

  • Classifies tasks by type, complexity, quality
  • Selects optimal model layer
  • Orchestrates cross-checking between models
  • Auto-escalates when needed

3. LLM Clients (src/tools/llm/)

  • Unified interface for multiple providers
  • Handles API calls, token counting, cost calculation
  • Supports: OpenRouter, Anthropic, OpenAI, local models

4. Tools (src/tools/)

  • Code Agent: Main AI coding assistant
  • Testing: Vitest and Playwright runners
  • File System: Read/write/list operations
  • Git: Diff and status operations

5. Logging & Metrics (src/logging/)

  • Winston-based structured logging
  • Cost tracking and alerts
  • Performance metrics

🛠️ Available MCP Tools

The gateway exposes 14 MCP tools for various operations:

Code & Development Tools

| Tool | Description | Key Parameters |
| --- | --- | --- |
| code_agent | AI coding assistant with TODO tracking | task, context, quality |

Testing Tools

| Tool | Description | Key Parameters |
| --- | --- | --- |
| run_vitest | Execute Vitest unit/integration tests | testPath, watch |
| run_playwright | Execute Playwright E2E tests | testPath |

File System Tools

| Tool | Description | Key Parameters |
| --- | --- | --- |
| fs_read | Read file contents | path, encoding |
| fs_write | Write file contents | path, content |
| fs_list | List directory contents | path, recursive |

Git Tools

| Tool | Description | Key Parameters |
| --- | --- | --- |
| git_diff | Show git diff | staged, path (optional) |
| git_status | Show git status | - |

NEW: Cache Tools (Redis)

| Tool | Description | Key Parameters |
| --- | --- | --- |
| redis_get | Get value from Redis cache | key |
| redis_set | Set value in Redis cache | key, value, ttl |
| redis_del | Delete key from Redis cache | key |

NEW: Database Tools (PostgreSQL)

| Tool | Description | Key Parameters |
| --- | --- | --- |
| db_query | Execute SQL query | sql, params |
| db_insert | Insert row into table | table, data |
| db_update | Update rows in table | table, where, data |

Tool Usage Examples

Using Redis cache:

{
  "tool": "redis_set",
  "arguments": {
    "key": "user:profile:123",
    "value": {"name": "John", "role": "admin"},
    "ttl": 3600
  }
}

Querying database:

{
  "tool": "db_query",
  "arguments": {
    "sql": "SELECT * FROM conversations WHERE user_id = $1 LIMIT 10",
    "params": ["user-123"]
  }
}

📦 Context Management

How Context Works

  1. Conversation Initialization

    • Client sends conversationId with each request
    • Gateway checks Redis for existing context summary
    • Falls back to DB if Redis miss
    • Creates a new conversation if none exists
  2. Context Storage

    • Summary: Compressed project context (stack, architecture, decisions)
    • Messages: Recent messages (last 50 in Redis, all in DB)
    • TODO Lists: Persistent task tracking
    • Metadata: User, project, timestamps
  3. Context Compression

    • When context grows large (>50 messages):
      • System generates new summary
      • Keeps only recent 5-10 messages in detail
      • Older messages summarized into context
    • Reduces token usage while maintaining relevance
  4. Context Handoff

    • When escalating between layers:
      • Creates handoff package with:
        • Context summary
        • Current task
        • Previous attempts
        • Known issues
        • Request to higher layer
      • Optimized for minimal tokens (one plausible shape is sketched below)
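
One plausible shape for such a handoff package, written as a TypeScript interface (field names are illustrative; the gateway's actual types may differ):

interface HandoffPackage {
  contextSummary: string;        // compressed project context
  task: string;                  // the task being escalated
  previousAttempts: Array<{
    layer: string;               // e.g. 'L0'
    model: string;
    output: string;
  }>;
  knownIssues: string[];         // problems found so far (e.g. by cross-checking)
  request: string;               // what the higher layer is asked to do
}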

Database Schema

-- Conversations
CREATE TABLE conversations (
    id TEXT PRIMARY KEY,
    user_id TEXT,
    project_id TEXT,
    created_at TIMESTAMP DEFAULT NOW(),
    updated_at TIMESTAMP DEFAULT NOW(),
    metadata JSONB DEFAULT '{}'::jsonb
);

-- Messages
CREATE TABLE messages (
    id SERIAL PRIMARY KEY,
    conversation_id TEXT REFERENCES conversations(id),
    role TEXT NOT NULL,
    content TEXT NOT NULL,
    metadata JSONB DEFAULT '{}'::jsonb,
    created_at TIMESTAMP DEFAULT NOW()
);

-- Context summaries
CREATE TABLE context_summaries (
    id SERIAL PRIMARY KEY,
    conversation_id TEXT REFERENCES conversations(id),
    summary TEXT NOT NULL,
    version INTEGER DEFAULT 1,
    created_at TIMESTAMP DEFAULT NOW()
);

-- LLM call logs
CREATE TABLE llm_calls (
    id SERIAL PRIMARY KEY,
    conversation_id TEXT REFERENCES conversations(id),
    model_id TEXT NOT NULL,
    layer TEXT NOT NULL,
    input_tokens INTEGER DEFAULT 0,
    output_tokens INTEGER DEFAULT 0,
    estimated_cost DECIMAL(10, 6) DEFAULT 0,
    duration_ms INTEGER,
    success BOOLEAN DEFAULT true,
    created_at TIMESTAMP DEFAULT NOW()
);

-- TODO lists
CREATE TABLE todo_lists (
    id SERIAL PRIMARY KEY,
    conversation_id TEXT REFERENCES conversations(id),
    todo_data JSONB NOT NULL,
    created_at TIMESTAMP DEFAULT NOW(),
    updated_at TIMESTAMP DEFAULT NOW()
);
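
For example, recording a completed call against the llm_calls table might look like this (a sketch using the pg client; logLlmCall is an illustrative helper, not part of the gateway's API):

import { Pool } from 'pg';

const db = new Pool({ connectionString: process.env.DATABASE_URL });

// Persist one LLM call for cost and performance analytics.
async function logLlmCall(conversationId: string, modelId: string, layer: string,
    inputTokens: number, outputTokens: number, cost: number,
    durationMs: number, success: boolean): Promise<void> {
  await db.query(
    `INSERT INTO llm_calls
       (conversation_id, model_id, layer, input_tokens, output_tokens,
        estimated_cost, duration_ms, success)
     VALUES ($1, $2, $3, $4, $5, $6, $7, $8)`,
    [conversationId, modelId, layer, inputTokens, outputTokens, cost, durationMs, success],
  );
}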

⚙️ Configuration

Environment Variables

Create a .env file (use .env.example as template):

# MCP Server
MCP_SERVER_NAME=ai-mcp-gateway
MCP_SERVER_VERSION=0.1.0

# API Keys
OPENROUTER_API_KEY=sk-or-v1-...
ANTHROPIC_API_KEY=sk-ant-...
OPENAI_API_KEY=sk-...

# OSS/Local Models (optional)
OSS_MODEL_ENDPOINT=http://localhost:11434
OSS_MODEL_ENABLED=false

# Redis
REDIS_HOST=localhost
REDIS_PORT=6379
REDIS_PASSWORD=
REDIS_DB=0

# PostgreSQL
DATABASE_URL=postgresql://user:pass@localhost:5432/ai_mcp_gateway
DB_HOST=localhost
DB_PORT=5432
DB_NAME=ai_mcp_gateway
DB_USER=postgres
DB_PASSWORD=
DB_SSL=false

# HTTP API
API_PORT=3000
API_HOST=0.0.0.0
API_CORS_ORIGIN=*

# Logging
LOG_LEVEL=info
LOG_FILE=logs/ai-mcp-gateway.log

# Routing Configuration
DEFAULT_LAYER=L0
ENABLE_CROSS_CHECK=true
ENABLE_AUTO_ESCALATE=true
MAX_ESCALATION_LAYER=L2

# Cost Tracking
ENABLE_COST_TRACKING=true
COST_ALERT_THRESHOLD=1.00

# Mode
MODE=mcp  # or 'api' for HTTP server

Model Configuration

Edit src/config/models.ts to:

  • Add/remove models
  • Adjust layer assignments
  • Update pricing
  • Enable/disable models

Example:

{
  id: 'my-custom-model',
  provider: 'openrouter',
  apiModelName: 'provider/model-name',
  layer: 'L1',
  relativeCost: 5,
  pricePer1kInputTokens: 0.001,
  pricePer1kOutputTokens: 0.002,
  capabilities: {
    code: true,
    general: true,
    reasoning: true,
  },
  contextWindow: 100000,
  enabled: true,
}
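
With per-1k-token prices like these, estimating a call's cost is simple arithmetic (estimateCost is an illustrative helper, not the gateway's exported API):

// For the model above, 50 input + 200 output tokens cost:
// (50 / 1000) * 0.001 + (200 / 1000) * 0.002 = $0.00045
function estimateCost(inputTokens: number, outputTokens: number,
    pricePer1kInputTokens: number, pricePer1kOutputTokens: number): number {
  return (inputTokens / 1000) * pricePer1kInputTokens
       + (outputTokens / 1000) * pricePer1kOutputTokens;
}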

📖 Usage

Using the Code Agent

The Code Agent is the primary tool for coding tasks:

// Example MCP client call
{
  "tool": "code_agent",
  "arguments": {
    "task": "Create a TypeScript function to validate email addresses",
    "context": {
      "language": "typescript",
      "requirements": [
        "Use regex pattern",
        "Handle edge cases",
        "Include unit tests"
      ]
    },
    "quality": "high"
  }
}

Response includes:

  • Generated code
  • Routing summary (which models were used)
  • Token usage and cost
  • Quality assessment

Running Tests

// Run Vitest tests
{
  "tool": "run_vitest",
  "arguments": {
    "testPath": "tests/unit/mytest.test.ts"
  }
}

// Run Playwright E2E tests
{
  "tool": "run_playwright",
  "arguments": {
    "testPath": "tests/e2e/login.spec.ts"
  }
}

File Operations

// Read file
{
  "tool": "fs_read",
  "arguments": {
    "path": "/path/to/file.ts"
  }
}

// Write file
{
  "tool": "fs_write",
  "arguments": {
    "path": "/path/to/output.ts",
    "content": "console.log('Hello');"
  }
}

// List directory
{
  "tool": "fs_list",
  "arguments": {
    "path": "/path/to/directory"
  }
}

Git Operations

// Get diff
{
  "tool": "git_diff",
  "arguments": {
    "staged": false
  }
}

// Get status
{
  "tool": "git_status",
  "arguments": {}
}


🎚️ Model Layers

Layer L0 - Free/Cheapest

  • Models: Mistral 7B Free, Qwen 2 7B Free, OSS Local
  • Cost: $0
  • Use for: Simple tasks, drafts, code review
  • Capabilities: Basic code, general knowledge

Layer L1 - Low Cost

  • Models: Gemini 1.5 Flash, GPT-4o Mini
  • Cost: ~$0.08-0.75 per 1M tokens
  • Use for: Standard coding tasks, refactoring
  • Capabilities: Code, reasoning, vision

Layer L2 - Mid-tier

  • Models: Claude 3 Haiku, GPT-4o
  • Cost: ~$1.38-12.5 per 1M tokens
  • Use for: Complex tasks, high-quality requirements
  • Capabilities: Advanced code, reasoning, vision

Layer L3 - Premium

  • Models: Claude 3.5 Sonnet, OpenAI o1
  • Cost: ~$18-60 per 1M tokens
  • Use for: Critical tasks, architecture design
  • Capabilities: SOTA performance, deep reasoning
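
A simplified sketch of escalation across these layers, honoring the MAX_ESCALATION_LAYER setting (illustrative only; the real routing engine also weighs task type, capabilities, and cross-check results):

type Layer = 'L0' | 'L1' | 'L2' | 'L3';
const LAYERS: Layer[] = ['L0', 'L1', 'L2', 'L3'];

// Start at the cheapest acceptable layer; escalate only while the result
// fails a quality check and the configured ceiling has not been reached.
async function routeWithEscalation(
  task: string,
  startLayer: Layer,
  maxLayer: Layer,
  callLayer: (task: string, layer: Layer) => Promise<{ output: string; ok: boolean }>,
): Promise<string> {
  let i = LAYERS.indexOf(startLayer);
  const ceiling = LAYERS.indexOf(maxLayer);
  while (true) {
    const result = await callLayer(task, LAYERS[i]);
    if (result.ok || i >= ceiling) return result.output;
    i += 1;
  }
}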

💻 Development

Project Structure

ai-mcp-gateway/
├── src/
│   ├── index.ts              # Entry point
│   ├── config/               # Configuration
│   │   ├── env.ts
│   │   └── models.ts
│   ├── mcp/                  # MCP server
│   │   ├── server.ts
│   │   └── types.ts
│   ├── routing/              # Routing engine
│   │   ├── router.ts
│   │   └── cost.ts
│   ├── tools/                # MCP tools
│   │   ├── codeAgent/
│   │   ├── llm/
│   │   ├── testing/
│   │   ├── fs/
│   │   └── git/
│   └── logging/              # Logging & metrics
│       ├── logger.ts
│       └── metrics.ts
├── tests/                    # Tests
│   ├── unit/
│   ├── integration/
│   └── regression/
├── docs/                     # Documentation
│   ├── ai-orchestrator-notes.md
│   ├── ai-routing-heuristics.md
│   └── ai-common-bugs-and-fixes.md
├── playwright/               # E2E tests
├── package.json
├── tsconfig.json
├── vitest.config.ts
└── playwright.config.ts

Scripts

# Development
pnpm dev          # Watch mode with auto-rebuild
pnpm build        # Build for production
pnpm start        # Run built server

# Testing
pnpm test         # Run all Vitest tests
pnpm test:watch   # Run tests in watch mode
pnpm test:ui      # Run tests with UI
pnpm test:e2e     # Run Playwright E2E tests

# Code Quality
pnpm type-check   # TypeScript type checking
pnpm lint         # ESLint
pnpm format       # Prettier

🧪 Testing

Unit Tests

# Run all unit tests
pnpm test

# Run specific test file
pnpm vitest tests/unit/routing.test.ts

# Watch mode
pnpm test:watch

Integration Tests

Integration tests verify interactions between components:

pnpm vitest tests/integration/

Regression Tests

Regression tests keep previously fixed bugs from recurring:

pnpm vitest tests/regression/

E2E Tests

End-to-end tests using Playwright:

pnpm test:e2e

🔄 Self-Improvement

The gateway includes a self-improvement system:

1. Bug Tracking (docs/ai-common-bugs-and-fixes.md)

  • Documents encountered bugs
  • Includes root causes and fixes
  • Links to regression tests

2. Pattern Learning (docs/ai-orchestrator-notes.md)

  • Tracks successful patterns
  • Records optimization opportunities
  • Documents lessons learned

3. Routing Refinement (docs/ai-routing-heuristics.md)

  • Defines routing rules
  • Documents when to escalate
  • Model capability matrix

Adding to Self-Improvement Docs

When you discover a bug or pattern:

  1. Document it in the appropriate file
  2. Create a regression test in tests/regression/
  3. Update routing heuristics if needed
  4. Run tests to verify the fix

🤝 Contributing

Contributions are welcome! Please:

  1. Fork the repository
  2. Create a feature branch
  3. Make your changes with tests
  4. Update documentation
  5. Submit a pull request

Adding a New Model

  1. Update src/config/models.ts:

    {
      id: 'new-model-id',
      provider: 'provider-name',
      // ... config
    }
    
  2. Add provider client if needed in src/tools/llm/

  3. Update docs/ai-routing-heuristics.md

Adding a New Tool

  1. Create tool in src/tools/yourtool/index.ts:

    export const yourTool = {
      name: 'your_tool',
      description: 'One-sentence summary of what the tool does',
      inputSchema: {
        type: 'object',
        properties: { input: { type: 'string' } },
        required: ['input'],
      },
      // Replace with the tool's real behavior.
      handler: async (args: { input: string }) => ({ result: args.input }),
    };
    
  2. Register in src/mcp/server.ts

  3. Add tests in tests/unit/


📄 License

MIT License - see LICENSE file for details



🗺️ Roadmap

  • [ ] Token usage analytics dashboard
  • [ ] Caching layer for repeated queries
  • [ ] More LLM providers (Google AI, Cohere, etc.)
  • [ ] Streaming response support
  • [ ] Web UI for configuration and monitoring
  • [ ] Batch processing optimizations
  • [ ] Advanced prompt templates
  • [ ] A/B testing framework

Made with ❤️ for efficient AI orchestration
