MCP BigQuery Server

Production-ready Model Context Protocol (MCP) server for BigQuery with natural language query support. Translate questions to SQL, execute securely, and deliver results via stdio (local) or HTTP (remote) for GitHub Copilot, Power BI, and web applications.

Features

  • Natural Language to SQL: Ask questions in plain English, get BigQuery results
  • Secure Query Execution: Read-only, parameterized queries with dataset whitelisting
  • Multiple Transports: Run locally (stdio) or as a remote HTTP server
  • Enterprise Ready: JWT auth, RBAC, rate limiting, caching, observability
  • Power BI Integration: REST API with JSON and CSV endpoints
  • Azure Native: Key Vault secrets, Managed Identity, Container Apps deployment
  • High Performance: Query caching, connection pooling, concurrent request handling

Architecture

┌─────────────────┐
│ GitHub Copilot  │
│  or Web Client  │
└────────┬────────┘
         │
    ┌────▼────┐
    │   MCP   │ (stdio or HTTP)
    │ Server  │
    └────┬────┘
         │
    ┌────▼──────────────────────────────┐
    │  Services                         │
    │  • NL2SQL (Azure OpenAI)          │
    │  • SQL Validator & Guardrails     │
    │  • Query Cache (Memory + Redis)   │
    │  • Schema Cache                   │
    └────┬──────────────────────────────┘
         │
    ┌────▼────────┐         ┌──────────────┐
    │   BigQuery  │◄────────┤  Key Vault   │
    │   Client    │         │  (Secrets)   │
    └─────────────┘         └──────────────┘
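
The flow in the diagram can be read as a small pipeline: generate SQL, validate it, check the cache, then query BigQuery. The sketch below illustrates that ordering only. The interface and function names (`PipelineDeps`, `answerQuestion`, `cacheHit`) are hypothetical and are not taken from the project's source, and the cache is keyed by raw SQL for brevity.

```typescript
// Hypothetical shapes: the real services in src/services/ may look different.
interface Nl2SqlResult {
  sql: string;
  explanation: string;
  confidence: number;
}

type Row = Record<string, unknown>;

interface PipelineDeps {
  nl2sql: (question: string) => Promise<Nl2SqlResult>; // e.g. backed by Azure OpenAI
  validate: (sql: string) => void;                     // throws when a guardrail is violated
  cache: Map<string, Row[]>;                           // stand-in for the memory/Redis cache
  runQuery: (sql: string) => Promise<Row[]>;           // e.g. backed by the BigQuery client
}

async function answerQuestion(question: string, deps: PipelineDeps) {
  const generated = await deps.nl2sql(question);

  // Guardrails run before anything is sent to BigQuery.
  deps.validate(generated.sql);

  const cached = deps.cache.get(generated.sql);
  if (cached !== undefined) {
    return { ...generated, rows: cached, cacheHit: true };
  }

  const rows = await deps.runQuery(generated.sql);
  deps.cache.set(generated.sql, rows);
  return { ...generated, rows, cacheHit: false };
}
```

Passing the services in as plain functions keeps the ordering easy to test with stubs, without calling Azure OpenAI or BigQuery.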

Quick Start

Prerequisites

  • Node.js 20+
  • Google Cloud Platform project with BigQuery enabled
  • BigQuery service account JSON key
  • Azure OpenAI deployment (for NL→SQL)
  • (Optional) Azure Key Vault for secrets
  • (Optional) Redis for distributed caching

Local Development

  1. Install dependencies:
npm install
  2. Configure environment:
cp .env.example .env
# Edit .env with your configuration
  3. Build:
npm run build
  4. Run in stdio mode (for GitHub Copilot):
npm run start:stdio
  5. Run in HTTP mode (for remote access):
npm run start:http

Running with GitHub Copilot

  1. Add to your MCP settings (~/.config/Code/User/globalStorage/github.copilot-chat/mcp.json):
{
  "mcpServers": {
    "bigquery": {
      "command": "node",
      "args": ["/path/to/mcp-bigquery-server/dist/index.js", "stdio"],
      "env": {
        "GCP_PROJECT_ID": "your-project-id",
        "GCP_SA_KEY_JSON": "{...}",
        "ALLOWED_DATASETS": "dataset1,dataset2",
        "AZURE_OPENAI_ENDPOINT": "https://your-openai.openai.azure.com/",
        "AZURE_OPENAI_API_KEY": "your-key"
      }
    }
  }
}
  2. Restart VS Code

  3. Ask questions in Copilot Chat:

@workspace Ask the BigQuery server: What were total sales by region last quarter?

Configuration

Required Environment Variables

# BigQuery
GCP_PROJECT_ID=your-gcp-project-id
GCP_SA_KEY_JSON='{"type":"service_account",...}'
ALLOWED_DATASETS=dataset1,dataset2,dataset3

# Azure OpenAI (for NL→SQL)
AZURE_OPENAI_ENDPOINT=https://your-resource.openai.azure.com/
AZURE_OPENAI_API_KEY=your-api-key
AZURE_OPENAI_DEPLOYMENT=gpt-4o-mini

# JWT Authentication (HTTP mode)
JWT_ISSUER=https://login.microsoftonline.com/tenant-id/v2.0
JWT_AUDIENCE=api://your-app-id
JWKS_URI=https://login.microsoftonline.com/tenant-id/discovery/v2.0/keys

Optional Environment Variables

# Azure Key Vault (recommended for production)
AZURE_KEY_VAULT_URI=https://your-vault.vault.azure.net/
USE_MANAGED_IDENTITY=true

# Redis (for distributed caching)
REDIS_URL=redis://localhost:6379

# Limits
MAX_ROWS_DEFAULT=10000
MAX_ROWS_ABSOLUTE=100000
QUERY_TIMEOUT_MS=30000
CACHE_TTL_SEC=3600

# Observability
OTEL_EXPORTER_OTLP_ENDPOINT=http://localhost:4318
ENABLE_TRACING=true
ENABLE_METRICS=true
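
As a rough illustration of how these variables might be read and validated at startup, here is a minimal sketch. It is not the project's actual `src/config.ts`: the names `loadConfig`, `requireVar`, and `intVar` are hypothetical, and the fallback values simply mirror the example values above, which may or may not be the server's real defaults.

```typescript
type Env = Record<string, string | undefined>;

interface Limits {
  maxRowsDefault: number;
  maxRowsAbsolute: number;
  queryTimeoutMs: number;
  cacheTtlSec: number;
}

interface ServerConfig {
  projectId: string;
  allowedDatasets: string[];
  limits: Limits;
}

// Returns a trimmed required variable or throws a descriptive error.
function requireVar(env: Env, name: string): string {
  const value = env[name];
  if (value === undefined || value.trim() === "") {
    throw new Error(`Missing required environment variable: ${name}`);
  }
  return value.trim();
}

// Parses an optional positive integer, using the fallback when unset.
function intVar(env: Env, name: string, fallback: number): number {
  const raw = env[name];
  if (raw === undefined || raw.trim() === "") return fallback;
  const parsed = Number(raw);
  if (!Number.isInteger(parsed) || parsed <= 0) {
    throw new Error(`${name} must be a positive integer, got "${raw}"`);
  }
  return parsed;
}

function loadConfig(env: Env): ServerConfig {
  const limits: Limits = {
    // Assumed fallbacks, copied from the example values in this README.
    maxRowsDefault: intVar(env, "MAX_ROWS_DEFAULT", 10000),
    maxRowsAbsolute: intVar(env, "MAX_ROWS_ABSOLUTE", 100000),
    queryTimeoutMs: intVar(env, "QUERY_TIMEOUT_MS", 30000),
    cacheTtlSec: intVar(env, "CACHE_TTL_SEC", 3600),
  };
  if (limits.maxRowsDefault > limits.maxRowsAbsolute) {
    throw new Error("MAX_ROWS_DEFAULT must not exceed MAX_ROWS_ABSOLUTE");
  }
  return {
    projectId: requireVar(env, "GCP_PROJECT_ID"),
    allowedDatasets: requireVar(env, "ALLOWED_DATASETS")
      .split(",")
      .map((name) => name.trim())
      .filter((name) => name.length > 0),
    limits,
  };
}
```

In a Node.js process this would be called as `loadConfig(process.env)`. Failing fast on a missing or malformed value surfaces configuration mistakes at startup rather than on the first query.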

Azure Key Vault Setup

1. Create Key Vault

az keyvault create \
  --name kv-mcp-bigquery \
  --resource-group your-rg \
  --location eastus

2. Upload BigQuery Service Account

az keyvault secret set \
  --vault-name kv-mcp-bigquery \
  --name bigquery-service-account \
  --file service-account.json

3. Upload Azure OpenAI Key

az keyvault secret set \
  --vault-name kv-mcp-bigquery \
  --name azure-openai-key \
  --value "your-api-key"

4. Grant Access

# For managed identity (production)
az keyvault set-policy \
  --name kv-mcp-bigquery \
  --object-id <managed-identity-principal-id> \
  --secret-permissions get list

# For service principal (dev)
az keyvault set-policy \
  --name kv-mcp-bigquery \
  --spn <client-id> \
  --secret-permissions get list

API Usage

Natural Language Query (POST /api/query)

curl -X POST https://your-server.azurecontainerapps.io/api/query \
  -H "Authorization: Bearer $JWT_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "question": "What were total orders and average price by region last quarter?",
    "maxRows": 100
  }'

Response:

{
  "success": true,
  "data": {
    "sql": "SELECT region, COUNT(*) as total_orders, AVG(price) as avg_price...",
    "explanation": "This query calculates total orders and average price by region...",
    "confidence": 0.95,
    "rows": [...],
    "schema": {...},
    "metadata": {...}
  }
}
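
For callers written in TypeScript, the request and response above could be typed roughly as follows. This is a sketch under assumptions: `rows`, `schema`, and `metadata` are elided in the example, so they are typed loosely, and the helper names (`buildQueryRequest`, `askQuestion`, `FetchLike`) are hypothetical.

```typescript
// Response shape copied from the example; elided fields stay loosely typed.
interface QueryResponse {
  success: boolean;
  data: {
    sql: string;
    explanation: string;
    confidence: number;
    rows: unknown[];
    schema: unknown;
    metadata: unknown;
  };
}

interface HttpRequest {
  url: string;
  init: { method: "POST"; headers: Record<string, string>; body: string };
}

// Builds the POST /api/query request without sending it.
function buildQueryRequest(
  baseUrl: string,
  token: string,
  question: string,
  maxRows?: number,
): HttpRequest {
  const payload: { question: string; maxRows?: number } = { question };
  if (maxRows !== undefined) payload.maxRows = maxRows;
  return {
    url: baseUrl.replace(/\/+$/, "") + "/api/query",
    init: {
      method: "POST",
      headers: {
        Authorization: `Bearer ${token}`,
        "Content-Type": "application/json",
      },
      body: JSON.stringify(payload),
    },
  };
}

// Minimal subset of the fetch API used here, so a stub can be injected in tests.
type FetchLike = (
  url: string,
  init: HttpRequest["init"],
) => Promise<{ ok: boolean; status: number; json(): Promise<unknown> }>;

async function askQuestion(fetchImpl: FetchLike, request: HttpRequest): Promise<QueryResponse> {
  const response = await fetchImpl(request.url, request.init);
  if (!response.ok) {
    throw new Error(`Query failed with HTTP ${response.status}`);
  }
  return (await response.json()) as QueryResponse;
}
```

On Node.js 20+, the global `fetch` can be passed as `fetchImpl`, for example `askQuestion(fetch, buildQueryRequest(baseUrl, token, "Top regions", 100))`.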

Direct SQL Query (POST /api/sql)

curl -X POST https://your-server.azurecontainerapps.io/api/sql \
  -H "Authorization: Bearer $JWT_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "sql": "SELECT region, SUM(amount) as total FROM `project.dataset.orders` WHERE date >= @start_date GROUP BY region LIMIT 10",
    "params": {"start_date": "2024-01-01"}
  }'

CSV Export (POST /api/query.csv)

curl -X POST https://your-server.azurecontainerapps.io/api/query.csv \
  -H "Authorization: Bearer $JWT_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{"question": "Top 100 customers by revenue"}' \
  --output results.csv
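
The README does not describe how the CSV endpoint serializes rows. As an illustration only, the sketch below converts result rows to CSV, assuming RFC 4180-style quoting and a column order taken from the first row. `rowsToCsv` and `escapeCsvField` are hypothetical names.

```typescript
type Row = Record<string, unknown>;

// Quotes a field when it contains a comma, quote, or line break.
function escapeCsvField(value: unknown): string {
  if (value === null || value === undefined) return "";
  const text = typeof value === "object" ? JSON.stringify(value) : String(value);
  return /[",\r\n]/.test(text) ? `"${text.replace(/"/g, '""')}"` : text;
}

// Emits a header line followed by one line per row, CRLF-terminated.
function rowsToCsv(rows: Row[]): string {
  const first = rows[0];
  if (first === undefined) return "";
  const columns = Object.keys(first);
  const lines = [columns.map(escapeCsvField).join(",")];
  for (const row of rows) {
    lines.push(columns.map((column) => escapeCsvField(row[column])).join(","));
  }
  return lines.join("\r\n") + "\r\n";
}
```

Nested values (for example BigQuery STRUCT or ARRAY columns) are written as JSON strings here, which is one possible convention rather than the server's documented behavior.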

Power BI Integration

Option 1: Native BigQuery Connector (Recommended)

Use Power BI's built-in BigQuery connector for live DirectQuery at scale:

  1. Get Data → Google BigQuery
  2. Enter GCP project ID and dataset
  3. Authenticate with service account
  4. Select tables and configure DirectQuery

Pros: Native integration, full Power BI optimization, live data
Cons: No natural language queries

Option 2: Web API Connector (for NL→SQL)

Use when you need natural language queries or pre/post processing:

  1. Get Data → Web
  2. Set URL: https://your-server.azurecontainerapps.io/api/query.csv
  3. Set Method: POST
  4. Add Header: Authorization: Bearer <token>
  5. Set Body:
{
  "question": "Monthly sales trends for last 12 months",
  "maxRows": 10000
}

Pros: Natural language queries, custom processing
Cons: Snapshot data (not live), token management

Deployment

Deploy to Azure Container Apps

  1. Create resource group:
az group create --name rg-mcp-bigquery --location eastus
  2. Deploy Key Vault:
az deployment group create \
  --resource-group rg-mcp-bigquery \
  --template-file bicep/keyvault.bicep \
  --parameters adminObjectId=$USER_OBJECT_ID
  3. Upload secrets (see Key Vault Setup above)

  4. Build and push image:

docker build -t yourregistry.azurecr.io/mcp-bigquery-server:latest .
docker push yourregistry.azurecr.io/mcp-bigquery-server:latest
  5. Deploy Container App:
az deployment group create \
  --resource-group rg-mcp-bigquery \
  --template-file bicep/main.bicep \
  --parameters bicep/main.parameters.json

GitHub Actions CI/CD

The included workflow (.github/workflows/ci-cd.yml) automates:

  1. Linting, testing, and type checking
  2. Docker image build and security scanning
  3. Push to Azure Container Registry
  4. Deployment to Azure Container Apps

Configure secrets in GitHub:

  • ACR_USERNAME, ACR_PASSWORD
  • AZURE_CREDENTIALS
  • AZURE_RG

Security

Query Safety Guardrails

  • Read-only: Only SELECT queries allowed
  • Parameterized: Values are passed as bound query parameters (such as @start_date) rather than concatenated into SQL, which guards against SQL injection
  • Dataset whitelist: Only approved datasets accessible
  • Row limits: Enforced maximum rows per query
  • Timeouts: Query execution time limits
  • No DDL/DML: Blocks INSERT, UPDATE, DELETE, DROP, etc.
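
To make the guardrails above concrete, here is a deliberately simple sketch of a validator. It is not the project's implementation: it uses regular expressions where a production validator would more likely use a SQL parser, the keywords beyond INSERT, UPDATE, DELETE, and DROP are assumptions, and unqualified table names (such as CTE references) are not checked against the dataset list. `validateSql` and `clampMaxRows` are hypothetical names.

```typescript
// INSERT/UPDATE/DELETE/DROP come from the README; the rest are assumed.
const BLOCKED_KEYWORDS = ["INSERT", "UPDATE", "DELETE", "DROP", "MERGE", "CREATE", "ALTER", "TRUNCATE"];

// Matches backtick identifiers, string literals, and comments in one pass.
const QUOTED_OR_COMMENT = /`[^`]*`|'(?:[^'\\]|\\.)*'|"(?:[^"\\]|\\.)*"|--[^\n]*|#[^\n]*|\/\*[\s\S]*?\*\//g;

// Blanks out literals and comments so their contents cannot trip keyword checks;
// backtick identifiers are kept because table references are read from them.
function stripLiteralsAndComments(sql: string): string {
  return sql.replace(QUOTED_OR_COMMENT, (m) => (m.charAt(0) === "`" ? m : " "));
}

// Throws if the statement violates a guardrail; returns normally otherwise.
function validateSql(sql: string, allowedDatasets: string[]): void {
  const cleaned = stripLiteralsAndComments(sql).trim().replace(/;\s*$/, "");
  if (cleaned.indexOf(";") !== -1) {
    throw new Error("Multiple statements are not allowed");
  }
  if (!/^(SELECT|WITH)\b/i.test(cleaned)) {
    throw new Error("Only SELECT queries are allowed");
  }
  for (const keyword of BLOCKED_KEYWORDS) {
    if (new RegExp("\\b" + keyword + "\\b", "i").test(cleaned)) {
      throw new Error(`Blocked keyword: ${keyword}`);
    }
  }
  // Check dataset-qualified references: dataset.table or project.dataset.table.
  const tableRef = /\b(?:FROM|JOIN)\s+`?([\w-]+(?:\.[\w-]+)+)`?/gi;
  let match: RegExpExecArray | null;
  while ((match = tableRef.exec(cleaned)) !== null) {
    const parts = (match[1] ?? "").split(".");
    const dataset = parts.length >= 3 ? parts[parts.length - 2] : parts[0];
    if (dataset === undefined || allowedDatasets.indexOf(dataset) === -1) {
      throw new Error(`Dataset not allowed: ${dataset}`);
    }
  }
}

// Applies the default when no limit is requested and never exceeds the absolute cap.
function clampMaxRows(requested: number | undefined, defaultRows: number, absoluteRows: number): number {
  const wanted = requested !== undefined && requested > 0 ? Math.floor(requested) : defaultRows;
  return Math.min(wanted, absoluteRows);
}
```

The keyword check is intentionally conservative: an identifier that is exactly a blocked word is rejected too, which errs on the side of refusing a query rather than running an unsafe one.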

Authentication & Authorization

  • JWT bearer tokens (Microsoft Entra ID compatible)
  • Role-based access control (viewer, analyst, admin)
  • Per-dataset access rules
  • Rate limiting per user/IP
  • Request audit logging
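
One way to combine role checks with per-dataset rules is sketched below. Only the three role names and the analyst/admin requirement for natural language queries come from this README. The action names, the `Principal` shape, and the assumption that roles form a strict hierarchy are hypothetical.

```typescript
type Role = "viewer" | "analyst" | "admin";
type Action = "direct_sql" | "nl_query"; // hypothetical action names

// Assumed hierarchy: each role includes the permissions of the ones below it.
const ROLE_RANK: Record<Role, number> = { viewer: 0, analyst: 1, admin: 2 };

// Only the nl_query requirement is documented; the direct_sql entry is an assumption.
const MINIMUM_ROLE: Record<Action, Role> = { direct_sql: "viewer", nl_query: "analyst" };

interface Principal {
  subject: string;    // e.g. the JWT "sub" claim
  roles: Role[];      // e.g. mapped from a roles claim
  datasets: string[]; // datasets this principal may query
}

function highestRank(roles: Role[]): number {
  return roles.reduce((best, role) => Math.max(best, ROLE_RANK[role]), -1);
}

// Allows the action only when both the role and the dataset rule are satisfied.
function isAllowed(principal: Principal, action: Action, dataset: string): boolean {
  const hasRole = highestRank(principal.roles) >= ROLE_RANK[MINIMUM_ROLE[action]];
  const hasDataset = principal.datasets.indexOf(dataset) !== -1;
  return hasRole && hasDataset;
}
```

A principal with no roles has rank -1 and is denied every action, so a token without a roles claim fails closed.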

Secrets Management

  • Azure Key Vault for production secrets
  • Environment variables for local dev only
  • Managed Identity (no credentials in code)
  • Secret caching with TTL
  • PII redaction in logs

See SECURITY.md for full threat model and compliance.

Performance

Caching Strategy

  • Query Cache: In-memory + optional Redis

    • Keyed by normalized SQL + parameters
    • Configurable TTL (default: 1 hour)
    • Reduces BigQuery costs and latency
  • Schema Cache: In-memory with refresh

    • Cached dataset/table schemas
    • TTL-based invalidation
    • Warms on startup
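
The idea of keying the cache by normalized SQL plus parameters, with TTL-based expiry, can be sketched as follows. This is an in-memory illustration, not the project's cache service: `normalizeSql`, `cacheKey`, and `TtlCache` are hypothetical names, and the normalization shown (collapsing whitespace outside quoted text and dropping a trailing semicolon) is an assumption about what "normalized" means here.

```typescript
// Matches quoted text (kept as-is) or a run of whitespace (collapsed).
const QUOTED_OR_SPACE = /'(?:[^'\\]|\\.)*'|"(?:[^"\\]|\\.)*"|`[^`]*`|\s+/g;

function normalizeSql(sql: string): string {
  return sql
    .replace(QUOTED_OR_SPACE, (m) => (/^\s/.test(m) ? " " : m))
    .trim()
    .replace(/\s*;\s*$/, "");
}

// Sorts parameter names so that key order does not change the cache key.
function cacheKey(sql: string, params: Record<string, unknown> = {}): string {
  const sortedParams = Object.keys(params)
    .sort()
    .map((name) => [name, params[name]]);
  return JSON.stringify([normalizeSql(sql), sortedParams]);
}

class TtlCache<V> {
  private readonly entries = new Map<string, { value: V; expiresAt: number }>();
  private readonly ttlMs: number;
  private readonly now: () => number;

  // The clock is injectable so expiry can be tested without waiting.
  constructor(ttlMs: number, now: () => number = Date.now) {
    this.ttlMs = ttlMs;
    this.now = now;
  }

  get(key: string): V | undefined {
    const entry = this.entries.get(key);
    if (entry === undefined) return undefined;
    if (entry.expiresAt <= this.now()) {
      this.entries.delete(key); // expired entries are dropped lazily on read
      return undefined;
    }
    return entry.value;
  }

  set(key: string, value: V): void {
    this.entries.set(key, { value, expiresAt: this.now() + this.ttlMs });
  }
}
```

With the documented one-hour default, the cache would be created as `new TtlCache<unknown[]>(3600 * 1000)`. Whitespace inside string literals is preserved so that two queries differing only in a literal's contents never share a key.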

Scaling Configuration

Container Apps auto-scaling:

  • Min replicas: 1 (dev), 2 (prod)
  • Max replicas: 10+
  • Scale rule: 50 concurrent requests per replica
  • KEDA for advanced metrics

Expected performance:

  • Throughput: 1000+ queries/second (cached)
  • Latency: <100ms (cached), <2s (uncached)
  • Concurrency: 10 concurrent BigQuery queries per instance
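
A cap such as "10 concurrent BigQuery queries per instance" is commonly enforced with a small semaphore-style limiter. The following is a minimal sketch of that technique and is not taken from the project. `ConcurrencyLimiter` is a hypothetical name.

```typescript
class ConcurrencyLimiter {
  private active = 0;
  private readonly waiting: Array<() => void> = [];
  private readonly limit: number;

  constructor(limit: number) {
    this.limit = limit;
  }

  // Runs the task once a slot is free; callers beyond the limit wait in FIFO order.
  async run<T>(task: () => Promise<T>): Promise<T> {
    if (this.active >= this.limit) {
      // The slot is handed over directly by release(), so `active` is not incremented here.
      await new Promise<void>((resolve) => this.waiting.push(resolve));
    } else {
      this.active++;
    }
    try {
      return await task();
    } finally {
      this.release();
    }
  }

  private release(): void {
    const next = this.waiting.shift();
    if (next !== undefined) {
      next(); // transfer the slot to the next waiter; `active` stays unchanged
    } else {
      this.active--;
    }
  }
}
```

Usage would look like `limiter.run(() => runQuery(sql))`, where `runQuery` stands for whatever function executes the BigQuery job. Handing the slot directly to the next waiter avoids a window in which a new caller could slip in and exceed the limit.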

Development

Project Structure

mcp-bigquery/
├── src/
│   ├── mcp/              # MCP server, tools, resources
│   ├── api/              # Express HTTP API
│   ├── services/         # BigQuery, NL2SQL, caching
│   ├── config.ts         # Configuration loader
│   ├── logger.ts         # Structured logging
│   ├── telemetry.ts      # OpenTelemetry
│   └── index.ts          # Entry point
├── test/                 # Unit and integration tests
├── loadtest/             # k6 load tests
├── bicep/                # Azure infrastructure
├── .github/workflows/    # CI/CD
└── Dockerfile            # Container image

Available Scripts

npm run dev              # Watch mode build
npm run build            # Production build
npm run start:stdio      # Run stdio mode
npm run start:http       # Run HTTP mode
npm run test             # Run tests
npm run test:coverage    # Coverage report
npm run lint             # Lint code
npm run format           # Format code
npm run loadtest         # Run k6 load test

Adding New Tools

  1. Create tool file in src/mcp/tools/
  2. Implement input schema (zod) and handler
  3. Register in src/mcp/server.ts
  4. Add tests in test/

Example:

export const myTool = {
  name: 'my_tool',
  description: 'Does something useful',
  inputSchema: {...},
};

export async function myToolHandler(input: MyInput) {
  // Implementation
}
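
To show how the four steps fit together without depending on any particular library, here is a dependency-free sketch of a tool with input validation and a registry. It is illustrative only: the project uses zod for input schemas and registers tools in src/mcp/server.ts, whereas `ToolRegistry`, `parseInput`, and the `describe_dataset` tool below are hypothetical stand-ins.

```typescript
interface ToolDefinition<I, O> {
  name: string;
  description: string;
  parseInput: (raw: unknown) => I;    // stand-in for a zod schema's parse()
  handler: (input: I) => Promise<O>;
}

class ToolRegistry {
  private readonly tools = new Map<string, ToolDefinition<any, any>>();

  register<I, O>(tool: ToolDefinition<I, O>): void {
    if (this.tools.has(tool.name)) {
      throw new Error(`Tool already registered: ${tool.name}`);
    }
    this.tools.set(tool.name, tool);
  }

  // Validates the raw input before the handler ever sees it.
  async call(name: string, rawInput: unknown): Promise<unknown> {
    const tool = this.tools.get(name);
    if (tool === undefined) throw new Error(`Unknown tool: ${name}`);
    return tool.handler(tool.parseInput(rawInput));
  }
}

// Hypothetical example tool, not part of the project.
interface DescribeDatasetInput {
  dataset: string;
}

const describeDataset: ToolDefinition<DescribeDatasetInput, string> = {
  name: "describe_dataset",
  description: "Returns a one-line description of a dataset (illustrative only)",
  parseInput: (raw) => {
    const dataset = (raw as { dataset?: unknown } | null)?.dataset;
    if (typeof dataset !== "string" || dataset.length === 0) {
      throw new Error("dataset must be a non-empty string");
    }
    return { dataset };
  },
  handler: async ({ dataset }) => `Dataset ${dataset} is registered`,
};
```

Keeping validation separate from the handler means the handler can rely on a well-typed input, which is the same division of labor a zod schema provides.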

Troubleshooting

Common Issues

"Azure OpenAI API key not configured"

  • Ensure AZURE_OPENAI_API_KEY is set or available in Key Vault
  • Check Key Vault permissions

"Dataset not allowed"

  • Add dataset to ALLOWED_DATASETS environment variable
  • Verify dataset exists in BigQuery

"Authentication failed"

  • Verify JWT token is valid and not expired
  • Check JWT_ISSUER, JWT_AUDIENCE, and JWKS_URI configuration
  • Ensure user has required roles (analyst/admin for NL queries)

"Query timeout"

  • Increase QUERY_TIMEOUT_MS
  • Optimize the query (select only the columns you need, filter early) or partition and cluster the tables in BigQuery
  • Check BigQuery quotas

Logs

View logs in Azure:

az containerapp logs show \
  --name mcp-bigquery-server-prod \
  --resource-group rg-mcp-bigquery \
  --follow

Health Checks

  • /healthz: Basic health (BigQuery connectivity)
  • /readyz: Full readiness (all services initialized)
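
A readiness endpoint of this kind typically runs one probe per dependency and reports a failure if any probe fails. The sketch below shows that aggregation only. The `NamedProbe` shape, the `runProbes` name, and the 200/503 status mapping are assumptions rather than the server's documented behavior.

```typescript
interface NamedProbe {
  name: string;
  check: () => Promise<void>; // resolves when healthy, rejects otherwise
}

interface HealthReport {
  status: 200 | 503;
  checks: Record<string, "ok" | "fail">;
}

// Runs all probes in parallel and reports 503 if any of them fails.
async function runProbes(probes: NamedProbe[]): Promise<HealthReport> {
  const results = await Promise.all(
    probes.map(async ({ name, check }) => {
      try {
        await check();
        return { name, ok: true };
      } catch {
        return { name, ok: false };
      }
    }),
  );
  const checks: Record<string, "ok" | "fail"> = {};
  for (const result of results) {
    checks[result.name] = result.ok ? "ok" : "fail";
  }
  const status = results.every((result) => result.ok) ? 200 : 503;
  return { status, checks };
}
```

Under this sketch, /healthz would pass a single BigQuery probe, while /readyz would pass one probe per service that must be initialized.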

License

MIT

Support

For issues and questions:
