MCP Server

wechat-mp-mcp

Enables crawling WeChat Official Account articles via the mp.weixin.qq.com search interface, including account search, article listing, incremental crawling, and fetching articles as Markdown with local SQLite storage.

A Model Context Protocol (MCP) server for crawling WeChat Official Account articles via the mp.weixin.qq.com subscription-account search interface.

Works with any MCP-compatible client (Claude Code, Claude Desktop, Cline, Continue, Cursor, etc.).

What it does

Given a WeChat Official Account name, this server can:

search for the account and resolve its internal fakeid
pull the article list (full history on first crawl, incremental afterwards)
fetch a single article and convert its body to Markdown
store everything in a local SQLite database (deduped by URL)

Built-in safeguards: daily quota cap (default 150), jittered delays, randomized page sizes, work-hours gate.

Requirements

Python 3.10+
A personal WeChat subscription account (订阅号). You log in once via QR code at https://mp.weixin.qq.com/, then this server reuses your login session to call the public-account search interface.

Why a subscription account? The search interface is normally used by account operators when writing articles ("insert link → from another account"). This server reuses that flow. The account is only used as a login key — you don't need to publish anything from it. Register one at https://mp.weixin.qq.com/ (free, ~15 minutes, ID verification required).

Personal WeChat is NOT touched. Your chat history, payments, and friends are completely unrelated to this — only your subscription-account backend session is used.

Install

git clone https://github.com/fdslk/WECHAT-MP-MCP.git wechat-mp-mcp
cd wechat-mp-mcp
./install.sh

install.sh is idempotent. It will:

check Python 3.10+
create .venv and install all deps (including Playwright Chromium ~92 MB)
prompt you to scan a QR code to log in (Chromium opens automatically)
prompt to register with Claude Code if the claude CLI is on your PATH

Rerun it anytime — it skips steps that are already done.

Manual install (if you don't want install.sh)

python3 -m venv .venv
.venv/bin/pip install -e '.[login-auto]'        # drop [login-auto] to skip Playwright
.venv/bin/playwright install chromium            # ~92 MB one-time
.venv/bin/wechat-mp-mcp-login-auto               # opens browser for QR scan
claude mcp add wechat-mp --scope user "$(pwd)/.venv/bin/wechat-mp-mcp"   # quoted so paths with spaces work

Login

install.sh runs login for you. If you skipped it, or your session expired (typical lifetime 1-2h), re-run:

.venv/bin/wechat-mp-mcp-login-auto   # automated: Chromium + QR scan
# or
.venv/bin/wechat-mp-mcp-login        # manual: paste URL + Cookie from DevTools

Credentials are saved to ~/.config/wechat-mp-mcp/auth.json (chmod 600).
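
As an illustration of the chmod 600 behaviour described above, here is a minimal sketch of writing a credentials file that only the owner can read. The function name `save_auth` and the payload keys are hypothetical; the real layout of auth.json is not documented here, and this is not the server's actual code.

```python
import json
import os
from pathlib import Path


def save_auth(payload: dict, home: Path) -> Path:
    """Write payload to <home>/auth.json with owner-only permissions (hypothetical helper)."""
    home.mkdir(parents=True, exist_ok=True)
    path = home / "auth.json"
    # Create the file with mode 0o600 from the start so it is never world-readable.
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o600)
    with os.fdopen(fd, "w", encoding="utf-8") as fh:
        json.dump(payload, fh)
    # The mode given to os.open is ignored when the file already exists, so enforce it.
    os.chmod(path, 0o600)
    return path
```

Calling `save_auth({"cookie": "..."}, Path("~/.config/wechat-mp-mcp").expanduser())` would overwrite any existing file and reset its permissions; the `cookie` key is a placeholder.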

Wire into other MCP clients

install.sh registers the server with Claude Code for you. For every other client (Claude Desktop, Cline, Continue, Cursor, etc.), add a stdio entry to that client's MCP configuration:

{
  "mcpServers": {
    "wechat-mp": {
      "command": "/absolute/path/to/wechat-mp-mcp/.venv/bin/wechat-mp-mcp"
    }
  }
}

Tools exposed

Tool	Cost	Purpose
`search_account(query, limit=5)`	1 API call	Resolve account name → `fakeid`
`list_articles_page(fakeid, begin=0, count=5)`	1 API call	Pull one page of metadata. Use for full-history crawl — call repeatedly with growing `begin` until `returned == 0`
`crawl_incremental(fakeid, max_pages=10, delay_seconds=1.5, override_work_hours=False)`	1-N API calls	Pull only articles newer than what is stored. Stops on first overlap with local DB
`fetch_article(url, save=True)`	0 (public page)	Fetch + parse one article body to Markdown
`quota_status()`	0 (local)	Report today's API call usage vs daily cap
`list_stored_articles(fakeid, limit=20, offset=0, with_body=False)`	0 (local)	Query the local store

fetch_article doesn't consume the daily quota because article pages (mp.weixin.qq.com/s/...) are public — only the search backend has a cap.
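
The "stops on first overlap with local DB" rule for crawl_incremental can be sketched as follows. This is an illustration under stated assumptions, not the server's implementation: `crawl_new`, `fetch_page`, and the `link` key are hypothetical names, and the listing is assumed to arrive newest first.

```python
from typing import Callable


def crawl_new(
    fetch_page: Callable[[int, int], list[dict]],
    known_links: set[str],
    max_pages: int = 10,
    page_size: int = 5,
) -> list[dict]:
    """Collect articles newest-first until one is already stored locally."""
    fresh: list[dict] = []
    begin = 0
    for _ in range(max_pages):
        page = fetch_page(begin, page_size)  # one backend call per page
        if not page:
            break  # ran out of history
        for article in page:
            if article["link"] in known_links:
                return fresh  # first overlap: everything older is already stored
            fresh.append(article)
        begin += len(page)
    return fresh
```

Because the loop returns at the first known URL, a routine update typically costs only one or two pages rather than a full walk of the history.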

Typical workflow (natural language)

After wiring into Claude Code, you can just say:

"Search the WeChat account 'Foo Bar' and show me the latest 5 articles."
"Crawl all history for 'Foo Bar' (61 articles total)."
"Show me what's new in 'Foo Bar' today."
"Fetch the latest article and summarize it in 3 bullets."
"How much daily quota have I used?"

The LLM chooses the right tool. For a full crawl, the LLM itself calls list_articles_page page by page; for an incremental crawl, the loop runs server-side inside crawl_incremental.
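
The full-history walk amounts to a simple pagination loop. The sketch below shows the idea with a caller-supplied `list_page` function standing in for the list_articles_page tool; the `returned` field comes from the tool description above, while the `articles` key and the name `walk_history` are assumptions made for illustration.

```python
from typing import Callable, Iterator


def walk_history(
    list_page: Callable[..., dict],
    fakeid: str,
    count: int = 5,
) -> Iterator[dict]:
    """Yield every article by paging with a growing `begin` until a page is empty."""
    begin = 0
    while True:
        result = list_page(fakeid, begin=begin, count=count)  # costs 1 API call
        if result["returned"] == 0:
            break  # past the oldest article
        yield from result["articles"]
        begin += result["returned"]
```

Each iteration spends one call from the daily quota, including the final empty page, so it is worth checking quota_status() before walking a long history.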

Configuration (env vars)

Var	Default	Effect
`WECHAT_MP_MCP_DAILY_LIMIT`	`150`	Daily backend-call cap (WeChat's own limit is ~200/day per account — stay below)
`WECHAT_MP_MCP_WORK_HOURS`	`8-23`	Local-time window for `crawl_incremental`. Set `0-24` to disable
`WECHAT_MP_MCP_HOME`	`~/.config/wechat-mp-mcp`	Where to put `auth.json`
`WECHAT_MP_MCP_DB`	`~/.config/wechat-mp-mcp/wechat.db`	SQLite database path
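
To make the variable formats concrete, here is a minimal sketch of how such settings could be read. The helper names are hypothetical, and the interpretation of `8-23` as "from 08:00 up to, but not including, 23:00" is an assumption based on the 8am-11pm description; the server's own parsing may differ.

```python
import os
from datetime import datetime
from pathlib import Path
from typing import Mapping


def parse_work_hours(spec: str) -> tuple[int, int]:
    """Parse a 'START-END' hour window such as '8-23' or '0-24'."""
    start_s, end_s = spec.split("-", 1)
    start, end = int(start_s), int(end_s)
    if not (0 <= start < end <= 24):
        raise ValueError(f"invalid work-hours window: {spec!r}")
    return start, end


def within_work_hours(spec: str, now: datetime) -> bool:
    """True if `now` falls inside the window (end hour treated as exclusive)."""
    start, end = parse_work_hours(spec)
    return start <= now.hour < end


def load_settings(env: Mapping[str, str] = os.environ) -> dict:
    """Read three of the variables above, falling back to the documented defaults."""
    return {
        "daily_limit": int(env.get("WECHAT_MP_MCP_DAILY_LIMIT", "150")),
        "work_hours": parse_work_hours(env.get("WECHAT_MP_MCP_WORK_HOURS", "8-23")),
        "home": Path(env.get("WECHAT_MP_MCP_HOME", "~/.config/wechat-mp-mcp")).expanduser(),
    }
```

With this reading, `0-24` covers every hour of the day, which matches the documented way to disable the gate.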

Anti-detection

Fixed-interval pagination is the easiest crawler pattern to flag. This server already applies the following measures:

Jittered delays: a 1.5 s base multiplied by a random factor in [0.7, 2.0] between pages, with a ~8% chance of a 30-90 s "tea break"
Random page sizes: 3-6 articles per request (biased to 5)
Daily quota cap: hard stop at 150 (25% buffer below WeChat's ~200)
Work-hours gate: crawl_incremental refuses outside 8am-11pm by default — real account operators don't run at 3am
Realistic headers: Referer mimics the article editor page, X-Requested-With: XMLHttpRequest
Public article pages don't count: bodies are fetched from public URLs, not the rate-limited backend

You can still get rate-limited if you crawl aggressively. The 1-2h cookie expiry is normal session timeout, not a punishment.

Risks

The subscription-account search interface is undocumented. WeChat can change it at any time; expect the request shape to need re-tuning every few months.
Read counts / likes / "看一看" are not available through this path. Those require intercepting the WeChat App's traffic (out of scope).
Use a dedicated subscription account for crawling, not one you actively operate. The worst case is a 24h frequency-control lock on the search interface; the account itself isn't banned.
Don't share your auth.json. If WeChat sees the same session from multiple IPs, the account gets flagged as compromised.

Storage

SQLite at ~/.config/wechat-mp-mcp/wechat.db by default. Tables:

account(fakeid PK, nickname, alias, ...)
article(link PK, fakeid, title, update_time, body_markdown, ...)
quota(date PK, count) — per-day API counter

Inspect with any SQLite client. URL is the article's primary key, so re-crawling never produces duplicates.
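
The deduplication and per-day counter described above can be illustrated with the standard sqlite3 module. This is a sketch under assumptions: only the columns named in the table list are included, the column types are guesses, the account table is omitted, and the helper names are hypothetical.

```python
import sqlite3

SCHEMA = """
CREATE TABLE IF NOT EXISTS article (
    link TEXT PRIMARY KEY,          -- the URL is the key, so duplicates are impossible
    fakeid TEXT NOT NULL,
    title TEXT,
    update_time INTEGER,
    body_markdown TEXT
);
CREATE TABLE IF NOT EXISTS quota (
    "date" TEXT PRIMARY KEY,        -- one row per calendar day
    "count" INTEGER NOT NULL DEFAULT 0
);
"""


def open_store(path: str = ":memory:") -> sqlite3.Connection:
    conn = sqlite3.connect(path)
    conn.executescript(SCHEMA)
    return conn


def save_article(conn, link: str, fakeid: str, title: str, update_time: int) -> bool:
    """Insert an article; return False if the URL was already stored."""
    cur = conn.execute(
        "INSERT OR IGNORE INTO article (link, fakeid, title, update_time) "
        "VALUES (?, ?, ?, ?)",
        (link, fakeid, title, update_time),
    )
    conn.commit()
    return cur.rowcount == 1


def bump_quota(conn, day: str, limit: int = 150) -> int:
    """Count one API call for `day`; raise once the daily cap is reached."""
    row = conn.execute('SELECT "count" FROM quota WHERE "date" = ?', (day,)).fetchone()
    used = row[0] if row else 0
    if used >= limit:
        raise RuntimeError(f"daily quota of {limit} reached for {day}")
    conn.execute(
        'INSERT INTO quota ("date", "count") VALUES (?, 1) '
        'ON CONFLICT("date") DO UPDATE SET "count" = "count" + 1',
        (day,),
    )
    conn.commit()
    return used + 1
```

The upsert syntax requires SQLite 3.24 or newer. Checking the counter before incrementing is what turns the cap into a hard stop: the call that would exceed the limit is refused rather than recorded.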

Tests

.venv/bin/python tests/test_flows.py           # 29 offline checks (no auth needed)
.venv/bin/python tests/live_check.py           # E2E against real WeChat (1 search + 1 list + 1 fetch)
.venv/bin/python tests/live_crawl.py <fakeid>  # Live incremental crawl

Live tests require a valid auth.json. live_check.py takes the target account from the WECHAT_MP_TEST_QUERY env var (default: 央视新闻) or from the first CLI argument.