MCP Server

Founder Intelligence Engine

Transforms founder profiles from social media into actionable strategic intelligence through automated scraping, LLM analysis, and personalized news tracking. It leverages vector search and caching to provide deep insights and relevant updates on specific founders.

Founder Intelligence Engine — MCP Server

A production-grade Model Context Protocol (MCP) server that transforms founder profiles into actionable strategic intelligence.

Architecture

┌───────────────────────────────────────────────────────────┐
│                     MCP Client (Claude, etc.)             │
│                          ▲ stdio                          │
│               ┌──────────┴──────────┐                     │
│               │   MCP Server (Node) │                     │
│               │   3 registered tools│                     │
│               └──────┬──────────────┘                     │
│          ┌───────────┬┼──────────────┐                    │
│          ▼           ▼▼              ▼                    │
│  ┌──────────┐  ┌───────────┐  ┌──────────────┐           │
│  │  Apify   │  │   Groq    │  │  Embeddings  │           │
│  │  Scraping│  │   LLM     │  │  API         │           │
│  └────┬─────┘  └─────┬─────┘  └──────┬───────┘           │
│       └──────────────┬┘──────────────┘                    │
│                      ▼                                    │
│            ┌─────────────────┐                            │
│            │  Supabase       │                            │
│            │  (Postgres +    │                            │
│            │   pgvector)     │                            │
│            └─────────────────┘                            │
└───────────────────────────────────────────────────────────┘

Data Flow

collect_profile — Scrapes LinkedIn + Twitter via Apify → merges data → generates embedding → stores in Supabase
analyze_profile — Fetches stored profile → calls Groq LLM for strategic analysis → caches result
fetch_personalized_news — Checks cache freshness → if stale: generates search queries → scrapes Google News → embeds articles → ranks by cosine similarity → summarizes with Groq → stores; if fresh: returns cached articles
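
The ranking step in fetch_personalized_news can be pictured roughly as follows. This is a minimal sketch, not the contents of similarity.js: it assumes embeddings are plain arrays of numbers of equal length, and the names cosineSimilarity and rankBySimilarity and the article shape { url, embedding } are illustrative.

```javascript
// Hypothetical helper: cosine similarity between two equal-length numeric vectors.
function cosineSimilarity(a, b) {
  if (a.length !== b.length) {
    throw new Error('Vectors must have the same length');
  }
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i += 1) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  // A zero vector has no direction; treat its similarity as 0.
  if (normA === 0 || normB === 0) return 0;
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Hypothetical helper: score each article against the profile embedding,
// sort by descending score, and keep the top `limit` articles.
function rankBySimilarity(profileEmbedding, articles, limit = 10) {
  return articles
    .map((article) => ({
      ...article,
      score: cosineSimilarity(profileEmbedding, article.embedding),
    }))
    .sort((x, y) => y.score - x.score)
    .slice(0, limit);
}
```

Only the articles that survive the slice would then need to be summarized, which keeps the number of LLM calls bounded regardless of how many articles were scraped.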

Caching & Cost Optimization

Operation	Cost	When It Runs
LinkedIn/Twitter scraping	High	Only on profile creation
Groq profile analysis	Medium	Once per profile (cached)
Google News + embeddings	High	Only when cached news is more than 24 hours old
Read cached articles	Free	Every subsequent request

The fetch_history table tracks last_profile_scrape and last_news_fetch timestamps. The staleCheck.js module compares these against configurable thresholds.
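
A staleness check of this kind might look like the sketch below. It is an illustration under assumptions, not the actual staleCheck.js: the function name isStale, its signature, and the sample row are hypothetical, and only the last_news_fetch column name and the 24-hour threshold come from the description above.

```javascript
const HOUR_MS = 60 * 60 * 1000;

// Hypothetical helper: true when the timestamp is missing, unparseable,
// or older than the threshold.
function isStale(lastFetchedAt, thresholdHours, now = new Date()) {
  if (!lastFetchedAt) return true; // never fetched counts as stale
  const ageMs = now.getTime() - new Date(lastFetchedAt).getTime();
  if (Number.isNaN(ageMs)) return true; // unparseable timestamp counts as stale
  return ageMs > thresholdHours * HOUR_MS;
}

// Example with a made-up fetch_history row: news fetched 30 hours before `now`.
const fetchHistoryRow = { last_news_fetch: '2024-01-01T00:00:00Z' };
const newsIsStale = isStale(
  fetchHistoryRow.last_news_fetch,
  24,
  new Date('2024-01-02T06:00:00Z'),
);
```

Passing `now` as a parameter keeps the check deterministic, which makes threshold changes easy to verify.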

Setup

1. Prerequisites

Node.js 20+
Supabase project (with pgvector enabled)
API keys: Apify, Groq, OpenAI-compatible Embeddings

2. Install

cd /Users/praveenkumar/Desktop/mcp   # replace with the path to your local checkout
cp .env.example .env
# Edit .env with your real keys
npm install

3. Database

Run the migration in the Supabase SQL Editor:

-- Paste contents of migrations/001_init.sql

Or via psql:

psql "$DATABASE_URL" -f migrations/001_init.sql

4. Run MCP Server

node src/index.js

5. Configure MCP Client

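Add the server to your MCP client config (for example, claude_desktop_config.json for Claude Desktop), replacing the path in args with the absolute path to src/index.js on your machine: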

{
  "mcpServers": {
    "founder-intelligence": {
      "command": "node",
      "args": ["/Users/praveenkumar/Desktop/mcp/src/index.js"],
      "env": {
        "SUPABASE_URL": "...",
        "SUPABASE_SERVICE_KEY": "...",
        "APIFY_API_TOKEN": "...",
        "GROQ_API_KEY": "...",
        "EMBEDDING_API_URL": "...",
        "EMBEDDING_API_KEY": "..."
      }
    }
  }
}
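
Because a missing key typically surfaces only when a tool is first called, it can help to fail fast at startup. The sketch below shows one way to do that; the helper names findMissingEnvVars and assertEnv are hypothetical, and only the variable names are taken from the config above.

```javascript
// Variable names are the ones listed in the client config above.
const REQUIRED_ENV_VARS = [
  'SUPABASE_URL',
  'SUPABASE_SERVICE_KEY',
  'APIFY_API_TOKEN',
  'GROQ_API_KEY',
  'EMBEDDING_API_URL',
  'EMBEDDING_API_KEY',
];

// Hypothetical helper: return the names of variables that are unset or blank.
function findMissingEnvVars(env = process.env) {
  return REQUIRED_ENV_VARS.filter(
    (name) => typeof env[name] !== 'string' || env[name].trim() === '',
  );
}

// Hypothetical helper: throw a descriptive error if anything is missing.
function assertEnv(env = process.env) {
  const missing = findMissingEnvVars(env);
  if (missing.length > 0) {
    throw new Error(`Missing required environment variables: ${missing.join(', ')}`);
  }
}
```

With a stdio transport, stdout carries protocol messages, so any diagnostic output from a check like this should go to stderr (for example via console.error).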

6. Background Worker (Optional)

# Single run (for cron)
node src/backgroundWorker.js

# Daemon mode
BACKGROUND_LOOP=true node src/backgroundWorker.js

Cron example (every 6 hours):

0 */6 * * * cd /app && node src/backgroundWorker.js >> /var/log/worker.log 2>&1
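
The difference between the two modes can be sketched as below. This is an assumed structure, not the actual backgroundWorker.js: isLoopMode, runWorker, and the maxRuns option are hypothetical, and only the BACKGROUND_LOOP variable comes from the commands above.

```javascript
const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

// Hypothetical helper: BACKGROUND_LOOP=true selects daemon mode.
function isLoopMode(env = process.env) {
  return String(env.BACKGROUND_LOOP || '').toLowerCase() === 'true';
}

// Hypothetical runner: calls `refreshOnce` a single time, or repeatedly in loop mode.
// `maxRuns` exists only so the sketch can be exercised without running forever.
async function runWorker(refreshOnce, { loop, intervalMs, maxRuns = Infinity }) {
  let runs = 0;
  do {
    try {
      await refreshOnce();
    } catch (err) {
      // In single-run mode, rethrow so cron sees a failure.
      if (!loop) throw err;
      // In daemon mode, log to stderr and keep going.
      console.error('Refresh failed:', err.message);
    }
    runs += 1;
    if (loop && runs < maxRuns) await sleep(intervalMs);
  } while (loop && runs < maxRuns);
  return runs;
}
```

In daemon mode an interval of 6 * 60 * 60 * 1000 milliseconds would match the cron schedule above; that value is an assumption, since the worker's real interval is not documented here.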

Project Structure

/Users/praveenkumar/Desktop/mcp/
├── migrations/
│   └── 001_init.sql
├── src/
│   ├── db/
│   │   └── supabaseClient.js
│   ├── services/
│   │   ├── apifyService.js
│   │   ├── embeddingService.js
│   │   └── llmService.js
│   ├── tools/
│   │   ├── collectProfile.js
│   │   ├── analyzeProfile.js
│   │   └── fetchPersonalizedNews.js
│   ├── utils/
│   │   ├── similarity.js
│   │   └── staleCheck.js
│   ├── backgroundWorker.js
│   └── index.js
├── .env.example
├── .gitignore
├── .dockerignore
├── Dockerfile
├── package.json
└── README.md

Docker Deployment

Build & Run

docker build -t founder-intelligence-mcp .
docker run -i --env-file .env founder-intelligence-mcp

Background Worker Container

docker run --env-file .env founder-intelligence-mcp node src/backgroundWorker.js

Docker Compose (production)

version: '3.8'
services:
  mcp-server:
    build: .
    env_file: .env
    stdin_open: true
    restart: unless-stopped

  worker:
    build: .
    env_file: .env
    command: ["node", "src/backgroundWorker.js"]
    environment:
      - BACKGROUND_LOOP=true
    restart: unless-stopped

Scaling Strategy

Component	Strategy
MCP Server	One instance per client (stdio-based)
Background Worker	Single instance or Cloud Run Job on schedule
Supabase	Connection pooling via Supavisor; read replicas for scale
Apify	Concurrent actor runs (up to account limit)
Embeddings	Batch requests (20 per call) to reduce round trips
Groq	Rate-limit aware with retry-after header handling

For deployments with a large number of profiles:

Move background worker to a Cloud Run Job triggered by Cloud Scheduler
Use Supabase Edge Functions for scheduled refresh
Add a Redis cache layer for hot profile lookups
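
The embeddings batching mentioned in the table can be illustrated with the sketch below. The helper names chunk and embedAll are hypothetical, and embedBatch is a stand-in for whatever function calls the embeddings API; only the batch size of 20 comes from the table.

```javascript
// Hypothetical helper: split items into consecutive batches of at most `size`.
function chunk(items, size = 20) {
  if (!Number.isInteger(size) || size <= 0) {
    throw new Error('size must be a positive integer');
  }
  const batches = [];
  for (let i = 0; i < items.length; i += size) {
    batches.push(items.slice(i, i + size));
  }
  return batches;
}

// Hypothetical helper: embed texts batch by batch, preserving input order.
// `embedBatch` receives an array of strings and resolves to an array of vectors,
// one per input string.
async function embedAll(texts, embedBatch, batchSize = 20) {
  const vectors = [];
  for (const batch of chunk(texts, batchSize)) {
    const batchVectors = await embedBatch(batch);
    vectors.push(...batchVectors);
  }
  return vectors;
}
```

Batches are sent sequentially here to stay gentle on rate limits; sending them concurrently would be faster but is more likely to hit provider quotas.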

Security Best Practices

Service-role key only on server side — never expose to clients
All secrets via environment variables — no hardcoded keys
Non-root Docker user — mcp user in container
Input validation — Zod schemas on all tool inputs
Row Level Security — enable RLS on Supabase tables for multi-tenant deployments
API token rotation — rotate Apify, Groq, and embedding keys periodically
Rate limiting — built-in retry logic with exponential backoff
No PII logging — profile data stays in Supabase, not console
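
The retry behaviour named in the list above (exponential backoff, plus the Retry-After handling mentioned for Groq) could be sketched as follows. This is not the project's llmService.js: computeRetryDelayMs, withRetry, and the default delays are assumptions, and the sketch handles only the seconds form of Retry-After, not the HTTP-date form.

```javascript
// Hypothetical helper: delay before the next attempt, in milliseconds.
// Prefers a numeric Retry-After value (seconds) when the server sent one;
// otherwise uses exponential backoff: baseMs * 2^attempt, capped at maxMs.
function computeRetryDelayMs(attempt, retryAfterHeader, baseMs = 500, maxMs = 30000) {
  if (retryAfterHeader != null && retryAfterHeader !== '') {
    const seconds = Number(retryAfterHeader);
    if (Number.isFinite(seconds) && seconds >= 0) {
      return Math.min(seconds * 1000, maxMs);
    }
  }
  return Math.min(baseMs * 2 ** attempt, maxMs);
}

// Hypothetical wrapper: retries `request` while it returns HTTP 429.
// `request` must return a fetch-style Response (status, headers.get).
async function withRetry(request, maxAttempts = 4) {
  for (let attempt = 0; attempt < maxAttempts; attempt += 1) {
    const response = await request();
    const isLastAttempt = attempt === maxAttempts - 1;
    if (response.status !== 429 || isLastAttempt) return response;
    const delayMs = computeRetryDelayMs(attempt, response.headers.get('retry-after'));
    await new Promise((resolve) => setTimeout(resolve, delayMs));
  }
  return undefined; // only reached when maxAttempts is 0 or negative
}
```

A caller would wrap the outgoing request in a function, for example withRetry(() => fetch(url, options)), and still check the final response status, since the last attempt is returned even if it was rate limited.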

Cost Optimization

Service	Cost Driver	Mitigation
Apify	Actor compute units	Scrape only on creation; cache results
Groq	Token usage	Analyze once (cached); batch news summaries
Embeddings	API calls	Batch 20 at a time; embed once per article
Supabase	Row count + storage	Deduplicate articles by URL; prune old articles
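
Deduplicating articles by URL, as listed for Supabase in the table above, might be done in application code before insertion. The sketch below is one possible approach, with hypothetical helper names; it treats URLs that differ only by fragment or host casing as the same article, which is an assumption about what counts as a duplicate.

```javascript
// Hypothetical helper: canonical form of a URL for comparison purposes.
// The URL parser lowercases the host; the fragment is dropped explicitly.
function normalizeUrl(raw) {
  try {
    const url = new URL(raw);
    url.hash = '';
    return url.toString();
  } catch {
    // Not a parseable absolute URL; fall back to the trimmed string.
    return String(raw).trim();
  }
}

// Hypothetical helper: keep the first article seen for each normalized URL.
function dedupeByUrl(articles) {
  const seen = new Set();
  const unique = [];
  for (const article of articles) {
    const key = normalizeUrl(article.url);
    if (seen.has(key)) continue;
    seen.add(key);
    unique.push(article);
  }
  return unique;
}
```

A unique constraint on the URL column would enforce the same rule at the database level; whether the migration defines one is not stated here.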

Expected cost per profile lifecycle:

Initial setup: ~$0.05–0.15 (scrape + embed + analyze)
Daily news refresh: ~$0.02–0.08 (scrape + embed + summarize top 10)
Cached reads: $0.00
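
To turn these per-profile figures into a rough budget, the arithmetic is initial setup plus one refresh per day. The sketch below assumes every profile is refreshed every day, which is an upper bound because refreshes only run when news is stale; the helper name estimateCost is hypothetical.

```javascript
// Cost ranges in USD, copied from the estimates above.
const INITIAL_SETUP = { low: 0.05, high: 0.15 };
const DAILY_REFRESH = { low: 0.02, high: 0.08 };

// Hypothetical helper: cost range for `profiles` profiles, each refreshed
// once per day for `days` days. Results are rounded to cents.
function estimateCost(profiles, days) {
  const perProfile = (bound) => INITIAL_SETUP[bound] + days * DAILY_REFRESH[bound];
  return {
    low: Number((profiles * perProfile('low')).toFixed(2)),
    high: Number((profiles * perProfile('high')).toFixed(2)),
  };
}

// 100 profiles over 30 days:
// low  = 100 * (0.05 + 30 * 0.02) = 65
// high = 100 * (0.15 + 30 * 0.08) = 255
const monthly = estimateCost(100, 30);
```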

Future Improvement Roadmap

HTTP/SSE transport — support remote MCP clients over HTTP
Multi-tenant profiles — user-scoped access with RLS
Real-time alerts — push notifications when high-relevance news drops
Competitor tracking — dedicated tool to monitor named competitors
Founder network graph — map connections between analyzed founders
Custom embedding models — fine-tuned models for startup/VC domain
Article full-text extraction — deep content scraping for richer embeddings
A/B prompt testing — experiment with different Groq prompts for analysis quality
Dashboard UI — web interface for browsing intelligence feeds
Webhook integrations — push intelligence to Slack, email, or CRM