MCP 服务器

MCP BigQuery Server

Production-ready MCP server for BigQuery that translates natural language questions to SQL, executes queries securely, and delivers results via stdio or HTTP for integration with GitHub Copilot, Power BI, and web applications.

README

MCP BigQuery Server

Production-ready Model Context Protocol (MCP) server for BigQuery with natural language query support. Translate questions to SQL, execute securely, and deliver results via stdio (local) or HTTP (remote) for GitHub Copilot, Power BI, and web applications.

Features

Natural Language to SQL: Ask questions in plain English, get BigQuery results
Secure Query Execution: Read-only, parameterized queries with dataset whitelisting
Multiple Transports: Run locally (stdio) or as a remote HTTP server
Enterprise Ready: JWT auth, RBAC, rate limiting, caching, observability
Power BI Integration: REST API with JSON and CSV endpoints
Azure Native: Key Vault secrets, Managed Identity, Container Apps deployment
High Performance: Query caching, connection pooling, concurrent request handling

Architecture

┌─────────────────┐
│ GitHub Copilot  │
│  or Web Client  │
└────────┬────────┘
         │
    ┌────▼────┐
    │   MCP   │ (stdio or HTTP)
    │ Server  │
    └────┬────┘
         │
    ┌────▼──────────────────────────────┐
    │  Services                         │
    │  • NL2SQL (Azure OpenAI)          │
    │  • SQL Validator & Guardrails     │
    │  • Query Cache (Memory + Redis)   │
    │  • Schema Cache                   │
    └────┬──────────────────────────────┘
         │
    ┌────▼────────┐         ┌──────────────┐
    │   BigQuery  │◄────────┤  Key Vault   │
    │   Client    │         │  (Secrets)   │
    └─────────────┘         └──────────────┘

Quick Start

Prerequisites

Node.js 20+
Google Cloud Platform project with BigQuery enabled
BigQuery service account JSON key
Azure OpenAI deployment (for NL→SQL)
(Optional) Azure Key Vault for secrets
(Optional) Redis for distributed caching

Local Development

Install dependencies:

npm install

Configure environment:

cp .env.example .env
# Edit .env with your configuration

Build:

npm run build

Run in stdio mode (for GitHub Copilot):

npm run start:stdio

Run in HTTP mode (for remote access):

npm run start:http

Running with GitHub Copilot

Add to your MCP settings (~/.config/Code/User/globalStorage/github.copilot-chat/mcp.json):

{
  "mcpServers": {
    "bigquery": {
      "command": "node",
      "args": ["/path/to/mcp-bigquery-server/dist/index.js", "stdio"],
      "env": {
        "GCP_PROJECT_ID": "your-project-id",
        "GCP_SA_KEY_JSON": "{...}",
        "ALLOWED_DATASETS": "dataset1,dataset2",
        "AZURE_OPENAI_ENDPOINT": "https://your-openai.openai.azure.com/",
        "AZURE_OPENAI_API_KEY": "your-key"
      }
    }
  }
}

Restart VS Code
Ask questions in Copilot Chat:

@workspace Ask the BigQuery server: What were total sales by region last quarter?

Configuration

Required Environment Variables

# BigQuery
GCP_PROJECT_ID=your-gcp-project-id
GCP_SA_KEY_JSON='{"type":"service_account",...}'
ALLOWED_DATASETS=dataset1,dataset2,dataset3

# Azure OpenAI (for NL→SQL)
AZURE_OPENAI_ENDPOINT=https://your-resource.openai.azure.com/
AZURE_OPENAI_API_KEY=your-api-key
AZURE_OPENAI_DEPLOYMENT=gpt-4o-mini

# JWT Authentication (HTTP mode)
JWT_ISSUER=https://login.microsoftonline.com/tenant-id/v2.0
JWT_AUDIENCE=api://your-app-id
JWKS_URI=https://login.microsoftonline.com/tenant-id/discovery/v2.0/keys

Optional Environment Variables

# Azure Key Vault (recommended for production)
AZURE_KEY_VAULT_URI=https://your-vault.vault.azure.net/
USE_MANAGED_IDENTITY=true

# Redis (for distributed caching)
REDIS_URL=redis://localhost:6379

# Limits
MAX_ROWS_DEFAULT=10000
MAX_ROWS_ABSOLUTE=100000
QUERY_TIMEOUT_MS=30000
CACHE_TTL_SEC=3600

# Observability
OTEL_EXPORTER_OTLP_ENDPOINT=http://localhost:4318
ENABLE_TRACING=true
ENABLE_METRICS=true

Azure Key Vault Setup

1. Create Key Vault

az keyvault create \
  --name kv-mcp-bigquery \
  --resource-group your-rg \
  --location eastus

2. Upload BigQuery Service Account

az keyvault secret set \
  --vault-name kv-mcp-bigquery \
  --name bigquery-service-account \
  --file service-account.json

3. Upload Azure OpenAI Key

az keyvault secret set \
  --vault-name kv-mcp-bigquery \
  --name azure-openai-key \
  --value "your-api-key"

4. Grant Access

# For managed identity (production)
az keyvault set-policy \
  --name kv-mcp-bigquery \
  --object-id <managed-identity-principal-id> \
  --secret-permissions get list

# For service principal (dev)
az keyvault set-policy \
  --name kv-mcp-bigquery \
  --spn <client-id> \
  --secret-permissions get list

API Usage

Natural Language Query (POST /api/query)

curl -X POST https://your-server.azurecontainerapps.io/api/query \
  -H "Authorization: Bearer $JWT_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "question": "What were total orders and average price by region last quarter?",
    "maxRows": 100
  }'

Response:

{
  "success": true,
  "data": {
    "sql": "SELECT region, COUNT(*) as total_orders, AVG(price) as avg_price...",
    "explanation": "This query calculates total orders and average price by region...",
    "confidence": 0.95,
    "rows": [...],
    "schema": {...},
    "metadata": {...}
  }
}

Direct SQL Query (POST /api/sql)

curl -X POST https://your-server.azurecontainerapps.io/api/sql \
  -H "Authorization: Bearer $JWT_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "sql": "SELECT region, SUM(amount) as total FROM `project.dataset.orders` WHERE date >= @start_date GROUP BY region LIMIT 10",
    "params": {"start_date": "2024-01-01"}
  }'

CSV Export (POST /api/query.csv)

curl -X POST https://your-server.azurecontainerapps.io/api/query.csv \
  -H "Authorization: Bearer $JWT_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{"question": "Top 100 customers by revenue"}' \
  --output results.csv

Power BI Integration

Option 1: Native BigQuery Connector (Recommended)

Use Power BI's built-in BigQuery connector for live DirectQuery at scale:

Get Data → Google BigQuery
Enter GCP project ID and dataset
Authenticate with service account
Select tables and configure DirectQuery

Pros: Native integration, full Power BI optimization, live data Cons: No natural language queries

Option 2: Web API Connector (for NL→SQL)

Use when you need natural language queries or pre/post processing:

Get Data → Web
Set URL: https://your-server.azurecontainerapps.io/api/query.csv
Set Method: POST
Add Header: Authorization: Bearer <token>
Set Body:

{
  "question": "Monthly sales trends for last 12 months",
  "maxRows": 10000
}

Pros: Natural language queries, custom processing Cons: Snapshot data (not live), token management

Deployment

Deploy to Azure Container Apps

Create resource group:

az group create --name rg-mcp-bigquery --location eastus

Deploy Key Vault:

az deployment group create \
  --resource-group rg-mcp-bigquery \
  --template-file bicep/keyvault.bicep \
  --parameters adminObjectId=$USER_OBJECT_ID

Upload secrets (see Key Vault Setup above)
Build and push image:

docker build -t yourregistry.azurecr.io/mcp-bigquery-server:latest .
docker push yourregistry.azurecr.io/mcp-bigquery-server:latest

Deploy Container App:

az deployment group create \
  --resource-group rg-mcp-bigquery \
  --template-file bicep/main.bicep \
  --parameters bicep/main.parameters.json

GitHub Actions CI/CD

The included workflow (.github/workflows/ci-cd.yml) automates:

Linting, testing, and type checking
Docker image build and security scanning
Push to Azure Container Registry
Deployment to Azure Container Apps

Configure secrets in GitHub:

ACR_USERNAME, ACR_PASSWORD
AZURE_CREDENTIALS
AZURE_RG

Security

Query Safety Guardrails

✅ Read-only: Only SELECT queries allowed
✅ Parameterized: No SQL injection via parameter sanitization
✅ Dataset whitelist: Only approved datasets accessible
✅ Row limits: Enforced maximum rows per query
✅ Timeouts: Query execution time limits
✅ No DDL/DML: Blocks INSERT, UPDATE, DELETE, DROP, etc.

Authentication & Authorization

JWT bearer tokens (Azure Entra ID compatible)
Role-based access control (viewer, analyst, admin)
Per-dataset access rules
Rate limiting per user/IP
Request audit logging

Secrets Management

Azure Key Vault for production secrets
Environment variables for local dev only
Managed Identity (no credentials in code)
Secret caching with TTL
PII redaction in logs

See SECURITY.md for full threat model and compliance.

Performance

Caching Strategy

Query Cache: In-memory + optional Redis
- Keyed by normalized SQL + parameters
- Configurable TTL (default: 1 hour)
- Reduces BigQuery costs and latency
Schema Cache: In-memory with refresh
- Cached dataset/table schemas
- TTL-based invalidation
- Warms on startup

Scaling Configuration

Container Apps auto-scaling:

Min replicas: 1 (dev), 2 (prod)
Max replicas: 10+
Scale rule: 50 concurrent requests per replica
KEDA for advanced metrics

Expected performance:

Throughput: 1000+ queries/second (cached)
Latency: <100ms (cached), <2s (uncached)
Concurrency: 10 concurrent BigQuery queries per instance

Development

Project Structure

mcp-bigquery/
├── src/
│   ├── mcp/              # MCP server, tools, resources
│   ├── api/              # Express HTTP API
│   ├── services/         # BigQuery, NL2SQL, caching
│   ├── config.ts         # Configuration loader
│   ├── logger.ts         # Structured logging
│   ├── telemetry.ts      # OpenTelemetry
│   └── index.ts          # Entry point
├── test/                 # Unit and integration tests
├── loadtest/             # k6 load tests
├── bicep/                # Azure infrastructure
├── .github/workflows/    # CI/CD
└── Dockerfile            # Container image

Available Scripts

npm run dev              # Watch mode build
npm run build            # Production build
npm run start:stdio      # Run stdio mode
npm run start:http       # Run HTTP mode
npm run test             # Run tests
npm run test:coverage    # Coverage report
npm run lint             # Lint code
npm run format           # Format code
npm run loadtest         # Run k6 load test

Adding New Tools

Create tool file in src/mcp/tools/
Implement input schema (zod) and handler
Register in src/mcp/server.ts
Add tests in test/

Example:

export const myTool = {
  name: 'my_tool',
  description: 'Does something useful',
  inputSchema: {...},
};

export async function myToolHandler(input: MyInput) {
  // Implementation
}

Troubleshooting

Common Issues

"Azure OpenAI API key not configured"

Ensure AZURE_OPENAI_API_KEY is set or available in Key Vault
Check Key Vault permissions

"Dataset not allowed"

Add dataset to ALLOWED_DATASETS environment variable
Verify dataset exists in BigQuery

"Authentication failed"

Verify JWT token is valid and not expired
Check JWT_ISSUER, JWT_AUDIENCE, and JWKS_URI configuration
Ensure user has required roles (analyst/admin for NL queries)

"Query timeout"

Increase QUERY_TIMEOUT_MS
Optimize query or add indexes in BigQuery
Check BigQuery quotas

Logs

View logs in Azure:

az containerapp logs show \
  --name mcp-bigquery-server-prod \
  --resource-group rg-mcp-bigquery \
  --follow

Health Checks

/healthz: Basic health (BigQuery connectivity)
/readyz: Full readiness (all services initialized)

License

MIT

Support

For issues and questions:

GitHub Issues: yourorg/mcp-bigquery-server
Documentation: docs/

MCP BigQuery Server

README

MCP BigQuery Server

Features

Architecture

Quick Start

Prerequisites

Local Development

Running with GitHub Copilot

Configuration

Required Environment Variables

Optional Environment Variables

Azure Key Vault Setup

1. Create Key Vault

2. Upload BigQuery Service Account

3. Upload Azure OpenAI Key

4. Grant Access

API Usage

Natural Language Query (POST /api/query)

Direct SQL Query (POST /api/sql)

CSV Export (POST /api/query.csv)

Power BI Integration

Option 1: Native BigQuery Connector (Recommended)

Option 2: Web API Connector (for NL→SQL)

Deployment

Deploy to Azure Container Apps

GitHub Actions CI/CD

Security

Query Safety Guardrails

Authentication & Authorization

Secrets Management

Performance

Caching Strategy

Scaling Configuration

Development

Project Structure

Available Scripts

Adding New Tools

Troubleshooting

Common Issues

Logs

Health Checks

License

Support

推荐服务器