token-saver


MCP plugin that alerts you when AI token usage is wasteful. Works with Claude Code, Cursor, Windsurf, Zed, Continue.dev — any MCP-compatible client, any model. Fires warnings, errors, and alerts on large outputs, verbose logs, and repetitive history. Auto-suppresses noise to keep your context lean.


Overview

In agentic coding sessions, AI model responses often contain massive log outputs, repeated tool results, or near-duplicate history entries — all of which are re-sent on every turn, burning tokens. token-saver monitors every output and tells you when something is wasteful, so you can suppress it before it poisons your context window.

Works with any MCP-compatible client: Claude Code, Cursor, Windsurf, Zed, Continue.dev, and any other tool that speaks the Model Context Protocol. No dependency on any specific AI provider or API — token-saver analyzes plain text and is model-agnostic by design.

Core value proposition: Most token waste in long AI sessions comes from outputs nobody actually reads — stack traces, verbose logs, repeated file contents. token-saver catches these early and tells you exactly why and how much you're wasting.


How it works

Your AI model output (Claude, GPT, Gemini, or any other)
        │
        ▼
  check_output          ← estimates tokens, detects log/noise patterns
        │
        ▼
  alert level           ← info / warning / error / alert
        │
        ▼
  shouldSuppress        ← true if output matches suppression criteria
        │
        ▼
  get_session_stats     ← cumulative waste report for the session

Installation

Three ways — pick what suits you:

Option A — npx (no install, always latest)

No global install needed. Add directly to your MCP client config:

{
  "mcpServers": {
    "token-saver-mcp": {
      "command": "npx",
      "args": ["-y", "token-saver-mcp"]
    }
  }
}

Option B — npm global

npm install -g token-saver-mcp

Then add to your MCP client config:

{
  "mcpServers": {
    "token-saver-mcp": {
      "command": "token-saver-mcp"
    }
  }
}

Option C — install directly from GitHub

npm install -g github:flightlesstux/token-saver

Same config as Option B. Works without a build step — compiled output is included in the repo.


Tools

| Tool | Description |
| --- | --- |
| set_mode | Switch mode: off (default, silent) · monitor (analyze only) · active (full suppression). Start here. |
| check_output | Analyze a text output. Returns alert level, token count, suppression flag, and detected patterns. |
| analyze_history | Scan a messages array for near-duplicates and ignored log outputs. Returns suggested truncation and savings estimate. |
| get_session_stats | Cumulative session statistics: tokens analyzed, suppressed, saved, and alert counts. |
| reset_session_stats | Reset session statistics to zero. |
| set_thresholds | Override warning/error/alert token thresholds and suppression flags for the current session. |

Example usage

1. Enable the plugin (off by default)

{ "name": "set_mode", "arguments": { "mode": "active" } }
{ "mode": "active" }

2. Check a suspicious output

{ "name": "check_output", "arguments": { "text": "[INFO] server started\n[DEBUG] connection ok\n[TRACE] request received\n..." } }
{
  "alertLevel": "warning",
  "tokens": 87,
  "outputType": "log",
  "shouldSuppress": true,
  "reason": "Output matches log/noise patterns and will be suppressed",
  "detectedPatterns": [
    { "pattern": "\\[INFO\\]", "matchCount": 5, "description": "Log pattern matched 5 times" },
    { "pattern": "\\[DEBUG\\]", "matchCount": 5, "description": "Log pattern matched 5 times" }
  ]
}
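
The detectedPatterns field in the response above can be illustrated with a small sketch. This is not the plugin's actual implementation; the function name detectPatterns and the DetectedPattern interface are hypothetical, and only the field names are taken from the example response.

```typescript
// Hypothetical shape, modeled on the "detectedPatterns" entries in the example response.
interface DetectedPattern {
  pattern: string;
  matchCount: number;
  description: string;
}

// Count how often each regex source string matches the text.
// Patterns with zero matches are left out of the result.
function detectPatterns(text: string, patterns: string[]): DetectedPattern[] {
  const found: DetectedPattern[] = [];
  for (const pattern of patterns) {
    // The "g" flag makes match() return every occurrence, not just the first.
    const matches = text.match(new RegExp(pattern, "g"));
    const matchCount = matches ? matches.length : 0;
    if (matchCount > 0) {
      found.push({
        pattern,
        matchCount,
        description: `Log pattern matched ${matchCount} times`,
      });
    }
  }
  return found;
}

const sample = "[INFO] server started\n[DEBUG] connection ok\n[INFO] ready";
const hits = detectPatterns(sample, ["\\[INFO\\]", "\\[DEBUG\\]", "\\[TRACE\\]"]);
```

Here hits contains two entries: one for \[INFO\] with a matchCount of 2 and one for \[DEBUG\] with a matchCount of 1.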

3. Scan conversation history for waste

{ "name": "analyze_history", "arguments": { "messages": [ ...your messages array... ] } }
{
  "totalMessages": 6,
  "totalTokens": 114,
  "repetitiveMessages": [
    { "index": 2, "role": "user", "tokens": 19, "reason": "Near-duplicate of message 0" },
    { "index": 4, "role": "user", "tokens": 19, "reason": "Near-duplicate of message 0" }
  ],
  "suggestedTruncation": 2,
  "estimatedTokenSavings": 38,
  "alertLevel": "alert"
}
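
The README does not say how near-duplicates are detected. As one possible approach, the sketch below compares messages by word overlap (Jaccard similarity) and flags a message when it closely matches an earlier one with the same role. The 0.8 threshold, the Message shape with a content field, and all function names are assumptions made for illustration.

```typescript
// Hypothetical message shape; the real messages array may differ.
interface Message {
  role: string;
  content: string;
}

interface RepetitiveMessage {
  index: number;
  role: string;
  reason: string;
}

// Lowercased, de-duplicated words of a message.
function uniqueWords(text: string): string[] {
  const words = text.toLowerCase().split(/\s+/).filter((w) => w.length > 0);
  return words.filter((w, i) => words.indexOf(w) === i);
}

// Jaccard similarity: shared words divided by total distinct words.
function jaccard(a: string[], b: string[]): number {
  if (a.length === 0 && b.length === 0) return 1;
  const shared = a.filter((w) => b.indexOf(w) !== -1).length;
  return shared / (a.length + b.length - shared);
}

// Flag each message that closely matches an earlier message with the same role.
function findNearDuplicates(messages: Message[], threshold = 0.8): RepetitiveMessage[] {
  const wordLists = messages.map((m) => uniqueWords(m.content));
  const result: RepetitiveMessage[] = [];
  for (let i = 1; i < messages.length; i++) {
    for (let j = 0; j < i; j++) {
      const sameRole = messages[i].role === messages[j].role;
      if (sameRole && jaccard(wordLists[i], wordLists[j]) >= threshold) {
        result.push({
          index: i,
          role: messages[i].role,
          reason: `Near-duplicate of message ${j}`,
        });
        break; // report only the earliest match
      }
    }
  }
  return result;
}

const history: Message[] = [
  { role: "user", content: "run the test suite and show the output" },
  { role: "assistant", content: "all tests passed" },
  { role: "user", content: "run the test suite and show the output" },
  { role: "assistant", content: "all tests passed again with no failures" },
  { role: "user", content: "Run the test suite and show the output" },
];
const duplicates = findNearDuplicates(history);
```

With this input, messages 2 and 4 are flagged as near-duplicates of message 0, while message 3 shares too few words with message 1 to be flagged.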

4. Session summary

{ "name": "get_session_stats", "arguments": {} }
{
  "turns": 5,
  "totalTokensAnalyzed": 1416,
  "totalTokensSuppressed": 201,
  "warningsFired": 2,
  "errorsFired": 0,
  "alertsFired": 1,
  "tokensSaved": 201
}
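
The counters in this response could be maintained by a small accumulator like the one sketched below. The field names come from the example response; the function names are hypothetical, and treating tokensSaved as equal to the suppressed token count is an assumption based only on the two values matching in the example.

```typescript
type AlertLevel = "info" | "warning" | "error" | "alert";

// Field names taken from the get_session_stats example response.
interface SessionStats {
  turns: number;
  totalTokensAnalyzed: number;
  totalTokensSuppressed: number;
  warningsFired: number;
  errorsFired: number;
  alertsFired: number;
  tokensSaved: number;
}

// Hypothetical per-turn input: the relevant parts of a check_output result.
interface TurnResult {
  tokens: number;
  alertLevel: AlertLevel;
  shouldSuppress: boolean;
}

function emptyStats(): SessionStats {
  return {
    turns: 0,
    totalTokensAnalyzed: 0,
    totalTokensSuppressed: 0,
    warningsFired: 0,
    errorsFired: 0,
    alertsFired: 0,
    tokensSaved: 0,
  };
}

// Return a new stats object with one more turn folded in.
function recordTurn(stats: SessionStats, turn: TurnResult): SessionStats {
  const suppressed = turn.shouldSuppress ? turn.tokens : 0;
  return {
    turns: stats.turns + 1,
    totalTokensAnalyzed: stats.totalTokensAnalyzed + turn.tokens,
    totalTokensSuppressed: stats.totalTokensSuppressed + suppressed,
    warningsFired: stats.warningsFired + (turn.alertLevel === "warning" ? 1 : 0),
    errorsFired: stats.errorsFired + (turn.alertLevel === "error" ? 1 : 0),
    alertsFired: stats.alertsFired + (turn.alertLevel === "alert" ? 1 : 0),
    // Assumption: every suppressed token counts as a saved token.
    tokensSaved: stats.tokensSaved + suppressed,
  };
}

let stats = emptyStats();
stats = recordTurn(stats, { tokens: 1125, alertLevel: "warning", shouldSuppress: false });
stats = recordTurn(stats, { tokens: 87, alertLevel: "warning", shouldSuppress: true });
```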

Proof test output

Run python3 test_live.py to verify the full mode/suppression/history flow locally:

============================================================
TOKEN-SAVER PROOF TEST
============================================================

[1] Default mode (off) — all analysis skipped
  [check_output] mode=off skipped=true
  [PASS] mode=off correctly skips analysis

[2] Switch to monitor mode
  [PASS] mode switched to monitor

[3] Short normal output → info
  [check_output] level=info tokens=3 suppress=False
    reason: Output is within normal bounds
  [PASS] info level, no suppression

[4] Large output (>1000 tokens) → warning or higher
  [check_output] level=warning tokens=1125 suppress=False
    reason: Output exceeds warning threshold (1125 tokens >= 1000)
  [PASS] warning level fired at 1125 tokens

[5] Log output in monitor mode → detected, not suppressed
  [check_output] level=info tokens=87 suppress=False
    patterns: 3 matched
  [PASS] patterns detected, suppression=false (monitor mode)

[6] Switch to active mode
  [PASS] mode switched to active

[7] Log output in active mode → suppressed
  [check_output] level=warning tokens=87 suppress=True
    reason: Output matches log/noise patterns and will be suppressed
  [PASS] suppressed 87 log tokens

[8] Repetitive history → alert
  totalMessages=6 totalTokens=114
  repetitive=5 savings=95 level=alert
  [PASS] 95 tokens saveable from repetitive history

[9] Session stats
  turns=5 analyzed=1416 suppressed=201 warnings=2 alerts=1
  [PASS] 201 tokens suppressed this session

============================================================
PROOF SUMMARY
============================================================
  Tokens suppressed this session : 201
  Turns analyzed                 : 5
  Warnings fired                 : 2
  Alerts fired                   : 1

  Overall: ALL CHECKS PASSED
============================================================

Alert levels

| Level | Trigger |
| --- | --- |
| info | Output is within normal bounds (<1000 tokens, no noise patterns) |
| warning | Output exceeds 1000 tokens OR matches log/noise patterns |
| error | Output exceeds 5000 tokens |
| alert | Output exceeds 10000 tokens OR repetitive ignored messages exceed inactivity threshold |
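
The size-based part of these rules can be sketched as a single function. This is an illustration under stated assumptions, not the plugin's code: the names classify and Thresholds are hypothetical, the comparison uses >= because the proof test output reports "1125 tokens >= 1000", and the repetitive-history trigger for alert is left out because it depends on conversation state rather than a single output.

```typescript
type AlertLevel = "info" | "warning" | "error" | "alert";

// Hypothetical container for the three default thresholds listed above.
interface Thresholds {
  warning: number;
  error: number;
  alert: number;
}

const DEFAULT_THRESHOLDS: Thresholds = { warning: 1000, error: 5000, alert: 10000 };

// Check the highest level first so a large output is not reported as a mere warning.
function classify(
  tokens: number,
  matchesNoise: boolean,
  thresholds: Thresholds = DEFAULT_THRESHOLDS
): AlertLevel {
  if (tokens >= thresholds.alert) return "alert";
  if (tokens >= thresholds.error) return "error";
  if (tokens >= thresholds.warning || matchesNoise) return "warning";
  return "info";
}

const small = classify(3, false);       // "info"
const large = classify(1125, false);    // "warning"
const noisy = classify(87, true);       // "warning"
const huge = classify(12000, false);    // "alert"
```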

Configuration

Optional .token-saver.json in your project root:

{
  "warningThresholdTokens": 1000,
  "errorThresholdTokens": 5000,
  "alertThresholdTokens": 10000,
  "suppressLogs": true,
  "suppressRepetitiveHistory": true,
  "logPatterns": [
    "\\[INFO\\]", "\\[DEBUG\\]", "\\[TRACE\\]"
  ],
  "inactivityTurnsBeforeAlert": 3
}

All fields are optional — defaults work well for most projects.
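
A minimal sketch of how optional fields could be combined with defaults is shown below. It assumes the values in the example file are the defaults and that the three listed log patterns are the built-in ones; the names TokenSaverConfig, DEFAULT_CONFIG, and mergeConfig are hypothetical. In practice the overrides object would come from parsing .token-saver.json.

```typescript
// Field names taken from the example .token-saver.json.
interface TokenSaverConfig {
  warningThresholdTokens: number;
  errorThresholdTokens: number;
  alertThresholdTokens: number;
  suppressLogs: boolean;
  suppressRepetitiveHistory: boolean;
  logPatterns: string[];
  inactivityTurnsBeforeAlert: number;
}

// Assumption: the values shown in the example file are the defaults.
const DEFAULT_CONFIG: TokenSaverConfig = {
  warningThresholdTokens: 1000,
  errorThresholdTokens: 5000,
  alertThresholdTokens: 10000,
  suppressLogs: true,
  suppressRepetitiveHistory: true,
  logPatterns: ["\\[INFO\\]", "\\[DEBUG\\]", "\\[TRACE\\]"],
  inactivityTurnsBeforeAlert: 3,
};

// Scalar fields are overridden; custom log patterns are appended to the built-in ones.
function mergeConfig(overrides: Partial<TokenSaverConfig>): TokenSaverConfig {
  const custom = overrides.logPatterns ?? [];
  const merged: TokenSaverConfig = { ...DEFAULT_CONFIG, ...overrides };
  merged.logPatterns = DEFAULT_CONFIG.logPatterns.concat(
    custom.filter((p) => DEFAULT_CONFIG.logPatterns.indexOf(p) === -1)
  );
  return merged;
}

const config = mergeConfig({
  warningThresholdTokens: 500,
  logPatterns: ["\\[VERBOSE\\]"],
});
```

The resulting config keeps every default except warningThresholdTokens, and its logPatterns array holds the three built-in patterns followed by the custom one.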


Requirements

  • Node.js >= 24
  • Any MCP-compatible AI client

FAQ

Does it work with non-Claude models and clients? Yes. token-saver has zero dependency on any AI provider or API. It analyzes plain text — Claude, GPT-4, Gemini, Mistral, Llama, whatever. Works with any MCP-compatible client: Claude Code, Cursor, Windsurf, Zed, Continue.dev.

Why is the default mode "off"? Intentional. Install it, verify it's there, then turn it on when you're ready: set_mode("monitor") to observe first, set_mode("active") for full suppression. Your MCP client (Claude, for example) calls this for you when you ask, so you don't touch JSON directly.

What's the difference between monitor and active mode? monitor — analyzes and reports waste, never suppresses. active — full mode, sets shouldSuppress: true on matching outputs so your client can skip feeding noise back into context.
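
The three modes can be summarized as a small decision function. This is a sketch inferred from the proof test output (off skips analysis, monitor detects without suppressing, active suppresses), not the plugin's source; applyMode and ModeResult are hypothetical names.

```typescript
type Mode = "off" | "monitor" | "active";

// Hypothetical result shape for illustration.
interface ModeResult {
  skipped: boolean;
  noiseDetected: boolean;
  shouldSuppress: boolean;
}

// Decide what a given mode does with an output that may match noise patterns.
function applyMode(mode: Mode, matchesNoise: boolean, suppressLogs = true): ModeResult {
  if (mode === "off") {
    // Off: analysis is skipped entirely.
    return { skipped: true, noiseDetected: false, shouldSuppress: false };
  }
  return {
    skipped: false,
    noiseDetected: matchesNoise,
    // Only active mode may raise the suppression flag.
    shouldSuppress: mode === "active" && suppressLogs && matchesNoise,
  };
}

const offResult = applyMode("off", true);
const monitorResult = applyMode("monitor", true);
const activeResult = applyMode("active", true);
```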

Does it actually block or delete anything? No. It sets shouldSuppress: true on noisy outputs and explains why — but never intercepts or modifies any API call. Your client decides what to do with the signal.
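
Since the plugin only signals, acting on shouldSuppress is up to the client. The sketch below shows one way a client might do that, replacing flagged outputs with a short placeholder before they re-enter the context. Everything here is hypothetical: buildContext, ToolOutput, and the stub checker stand in for a real MCP call to check_output.

```typescript
// Hypothetical shapes; only shouldSuppress, reason, and tokens mirror the check_output response.
interface ToolOutput {
  id: string;
  text: string;
}

interface CheckOutputResult {
  shouldSuppress: boolean;
  reason: string;
  tokens: number;
}

// Keep clean outputs as-is; swap flagged ones for a one-line placeholder.
function buildContext(
  outputs: ToolOutput[],
  check: (text: string) => CheckOutputResult
): string[] {
  return outputs.map((output) => {
    const result = check(output.text);
    return result.shouldSuppress
      ? `[output ${output.id} omitted: ${result.reason}]`
      : output.text;
  });
}

// Stub standing in for a real check_output call over MCP.
function stubCheck(text: string): CheckOutputResult {
  const noisy = text.indexOf("[DEBUG]") !== -1;
  return {
    shouldSuppress: noisy,
    reason: noisy ? "Output matches log/noise patterns" : "Output is within normal bounds",
    tokens: Math.ceil(text.length / 4),
  };
}

const context = buildContext(
  [
    { id: "a", text: "Build succeeded" },
    { id: "b", text: "[DEBUG] connection ok\n[DEBUG] retrying" },
  ],
  stubCheck
);
```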

How does token counting work? Fast heuristic: ~4 characters per token (English/code average). Not the exact tokenizer — that would add latency. Accurate enough to catch waste at scale.
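
The heuristic amounts to a one-line calculation. The sketch below assumes rounding up, which the README does not specify, and estimateTokens is a hypothetical name.

```typescript
// Roughly 4 characters per token; rounding up is an assumption.
function estimateTokens(text: string): number {
  return Math.ceil(text.length / 4);
}

const shortEstimate = estimateTokens("abcde");                    // 5 chars -> 2
const longEstimate = estimateTokens(new Array(4501).join("a"));   // 4500 chars -> 1125
```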

What's the difference between warning, error, and alert? info — normal output. warning — over 1,000 tokens or log patterns detected. error — over 5,000 tokens. alert — over 10,000 tokens or repetitive ignored history detected.

Can I add custom log patterns? Yes. Add a logPatterns array to .token-saver.json with regex strings. Merged with built-in patterns.

Does it send data anywhere? No. Everything runs locally in memory. No telemetry. Stats evaporate when the MCP server stops. See PRIVACY.md.

Is it free? MIT license. Free forever. No SaaS, no subscription.


Contributing

Contributions are welcome — new detection heuristics, better suppression logic, benchmark improvements, and docs.

Read CONTRIBUTING.md before opening a PR. All commits must follow Conventional Commits. The CI pipeline enforces typechecking, linting, testing, and coverage on every PR.


License

MIT

flightlesstux.github.io/token-saver
