CodeModeTOON MCP Server
<table> <tr> <td> <a href="https://glama.ai/mcp/servers/@ziad-hsn/code-mode-toon"> <img width="380" height="200" src="https://glama.ai/mcp/servers/@ziad-hsn/code-mode-toon/badge" alt="Glama MCP Server Badge" /> </a> </td> <td> <strong>Listed on Glama MCP Directory</strong><br/> One-click installation for AI assistants </td> </tr> </table>
A lightweight Model Context Protocol (MCP) orchestrator designed for efficiency at scale. It features TOON compression (reducing token usage by 30-90%) and Lazy Loading, making it the ideal solution for complex, multi-tool agentic workflows.
The "Context Trap" in Agentic Workflows
Recent articles from Anthropic and Cloudflare highlight a critical bottleneck: AI agents struggle with complex, multi-step workflows because they lack state.
While Code Execution (e.g., TypeScript) allows agents to maintain state and structure workflows effectively, it introduces a new problem: Data Bloat. Real-world operations (like SRE log analysis or database dumps) generate massive JSON payloads that explode the context window, making stateful execution prohibitively expensive.
CodeModeTOON bridges this gap. It enables:
- Stateful Execution: Run complex TypeScript workflows to maintain context outside the model.
- Context Efficiency: Use TOON Compression to "zip" the results, allowing agents to process massive datasets without blowing their token budget.
How It Works
```mermaid
graph LR
    A[AI Agent<br/>Claude/Cursor] -->|JSON-RPC| B[CodeModeTOON<br/>Server]
    B -->|Lazy Load| C[Perplexity]
    B -->|Lazy Load| D[Context7]
    B -->|Lazy Load| E[Custom Servers]
    C -->|Raw JSON| B
    D -->|Raw JSON| B
    E -->|Raw JSON| B
    B -->|TOON<br/>Compressed| A
    style B fill:#4f46e5,color:#fff
    style A fill:#10b981,color:#fff
```
Data Flow: Requests route through CodeModeTOON → Servers are lazy-loaded on-demand → Responses are TOON-compressed before returning to the agent.
🔥 Key Features
🗜️ TOON Compression
Reduces token usage by 30-90% for structured data.
- Validated: ~83% savings on Kubernetes audits
- Best for: SRE logs, database dumps, API responses
- How it works: Schema extraction + value compression
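To make the "schema extraction" idea concrete, here is a toy sketch (not the real TOON library) of why uniform, repetitive JSON shrinks so much: field names are written once in a header instead of once per object.

```javascript
// Toy illustration of schema extraction + value compression.
// NOT the actual TOON encoder -- just the underlying idea.
function compactRows(key, rows) {
  const fields = Object.keys(rows[0]); // extract the shared schema once
  const header = `${key}[${rows.length}]{${fields.join(",")}}:`;
  const body = rows.map((r) => fields.map((f) => r[f]).join(",")).join("\n");
  return `${header}\n${body}`;
}

const pods = [
  { name: "api-0", status: "Running", restarts: 0 },
  { name: "api-1", status: "CrashLoopBackOff", restarts: 7 },
];
const encoded = compactRows("pods", pods);
console.log(encoded);
// pods[2]{name,status,restarts}:
// api-0,Running,0
// api-1,CrashLoopBackOff,7
```

The header carries the keys for every row, which is why structured infrastructure dumps compress far better than free-form prose.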
⚡ Lazy Loading
Servers only start when needed. Zero overhead for unused tools.
- Best for: Multi-tool workflows, resource-constrained environments
- Performance: Sub-100ms startup for active servers
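As an illustration of the pattern (a hypothetical helper, not CodeModeTOON's actual internals), a JavaScript `Proxy` can defer starting a server until the first access touches it, then cache it:

```javascript
// Minimal sketch of lazy loading: servers are spawned on first use only.
function lazyServers(startServer) {
  const running = new Map();
  return new Proxy({}, {
    get(_target, name) {
      if (!running.has(name)) {
        running.set(name, startServer(name)); // first touch: start it
      }
      return running.get(name); // later touches: cached instance
    },
  });
}

// Usage: nothing starts until a server is actually referenced.
let started = 0;
const servers = lazyServers((name) => {
  started++;
  return { name };
});
console.log(started); // 0 -- nothing spawned yet
servers["perplexity"];
console.log(started); // 1 -- only the touched server started
```

Unused servers therefore cost nothing, which is what makes multi-tool configs cheap to keep around.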
🔒 Sandboxed Execution
Secure JS execution with auto-proxied MCP tool access.
- Best for: Complex stateful workflows, batch operations
- Security: Uses the Node.js `vm` module (not for multi-tenant use)
🤖 Agent-Friendly Features
Designed for programmatic discovery and self-correction.
- Approach Suggestion: `suggest_approach` is a meta-tool that recommends the best execution strategy (code vs. workflow vs. direct call).
- Efficiency Metrics: `execute_code` returns operation counts and compression savings to reinforce efficient behavior.
- Recovery Hints: Error messages include actionable next steps for agents (e.g., "Server not found? Try `list_servers`").
Table of Contents
- The Context Trap
- How It Works
- Key Features
- When to Use
- Installation
- Quick Start
- Usage Examples
- Workflows
- Performance Benchmark
- Troubleshooting
- Security
- Contributing
- License
When to Use CodeModeTOON
✅ Perfect for:
- Multi-step AI workflows requiring state management
- Processing large structured datasets (logs, DB dumps, K8s manifests)
- Coordinating multiple MCP servers in parallel
- Token-constrained environments (reducing API costs)
❌ Not ideal for:
- Simple single-tool queries
- Unstructured text-heavy responses (compression <10%)
- Multi-tenant production servers (vm module security limitation)
Installation
One‑Click (Cursor)
Manual Setup
Add this to your ~/.cursor/mcp.json:
```json
{
  "mcpServers": {
    "code-mode-toon": {
      "type": "stdio",
      "command": "npx",
      "args": ["-y", "code-mode-toon"],
      "env": {
        "CODE_MODE_TOON_CONFIG": "~/.cursor/mcp.json"
      }
    }
  }
}
```
🧠 Claude Skills
CodeModeTOON includes a pre-built Claude Skill to make your AI assistant an expert at using this orchestrator.
code-mode-toon-workflow-expert
A specialized skill that teaches Claude how to:
- Decide when to use a workflow vs ad-hoc code.
- Create new workflows following best practices.
- Orchestrate multiple tools efficiently.
Installation:
- Unzip `claude-skills/code-mode-toon-workflow-expert.skill`.
- Place the folder in your `.claude/skills/` directory (or import via the Claude desktop app).
🤖 AI Assistant Prompts
Copy these prompts into your AI's custom instructions (e.g., .cursorrules or Claude Project instructions) to maximize CodeModeTOON's potential.
1. System Identity & Orchestration (Essential)
Goal: Teaches the AI to act as an orchestrator and prioritize workflows.
```text
YOU ARE AN AGENTIC ORCHESTRATOR. You have access to "CodeModeTOON", a high-efficiency MCP bridge.
1. PRIORITIZE WORKFLOWS: Before running single tools, check `list_workflows`. If a workflow exists (e.g., `research`, `k8s-detective`), USE IT. It is faster and saves tokens.
2. HANDLE COMPRESSED DATA: Outputs may be "TOON encoded" (highly compressed JSON). This is normal. Do not complain about "unreadable data" - simply parse it or ask for specific fields if needed.
3. BATCH OPERATIONS: Never run 3+ sequential tool calls if they can be batched. Use `execute_code` to run them in a single block.
```
2. Tool Discovery (Lazy Loading)
Goal: Prevents the AI from giving up if a tool isn't immediately visible.
```text
TOOLS ARE LAZY LOADED. If you need a capability (e.g., "search", "kubernetes", "database") and don't see the tool:
1. DO NOT assume it's missing.
2. RUN `search_tools({ query: "..." })` to find it.
3. RUN `get_tool_api({ serverName: "..." })` to learn how to use it.
4. Only then, execute the tool.
```
3. Efficiency & TOON Compression
Goal: Enforces token-saving behaviors for large data operations.
```text
OPTIMIZE FOR TOKENS. When fetching large datasets (logs, docs, API responses):
1. ALWAYS wrap the output in `TOON.encode(data)` inside `execute_code`.
2. PREFER structured data (JSON/Objects) over plain text. TOON compresses structure by ~83%, but text by only ~4%.
3. IF synthesizing data, do it server-side (via workflow `synthesize: true`) to avoid pulling raw data into context.
```
Quick Start
After installation, try this 30-second demo in Claude or Cursor:
```javascript
// Ask your AI assistant to run this via execute_code
const api = await get_tool_api({ serverName: 'perplexity' });
const result = await servers['perplexity'].perplexity_ask({
  messages: [{ role: 'user', content: "Explain TOON compression" }]
});
console.log(result); // See compression in action! ~40% token savings
```
What just happened? The response was automatically TOON-encoded, saving tokens.
Usage Examples
<details> <summary><strong>1. Optimized Tool Execution with TOON Compression</strong></summary>
```javascript
// Inside execute_code
const api = await get_tool_api({ serverName: 'perplexity' });
// Request large data - automatically compressed!
const result = await servers['perplexity'].perplexity_ask({
  messages: [{ role: 'user', content: "Summarize the history of Rome" }]
});
console.log(result); // Returns TOON-encoded string, saving ~40% tokens
```
</details>
<details> <summary><strong>2. Multi-Server Coordination</strong></summary>
```javascript
// Fetch large documentation from Context7
const api = await get_tool_api({ serverName: 'context7' });
const docs = await servers['context7']['get-library-docs']({
  context7CompatibleLibraryID: 'kubernetes/kubernetes'
});
console.log(TOON.encode(docs)); // Massive compression on structured data
```
</details>
<details> <summary><strong>3. Workflow Orchestration</strong></summary>
```javascript
// Run a complex research workflow
const result = await workflows.research({
  goal: "Compare xsync vs sync.Map performance",
  queries: ["xsync vs sync.Map benchmarks"],
  synthesize: true,
  outputFile: "/tmp/research.toon"
});
console.log(result.synthesis); // LLM-synthesized findings
```
</details>
Workflows
CodeModeTOON supports Workflows—pre-defined, server-side TypeScript modules that orchestrate multiple MCP tools.
Research Workflow
A powerful research assistant that:
- Parallelizes data fetching from multiple sources (Context7, Wikipedia, Perplexity).
- Synthesizes findings using LLMs (optional).
- Outputs TOON-encoded files for maximum context efficiency.
- Retries failed requests automatically.
See .workflows/README.md for detailed documentation, usage examples, and AI prompts.
Performance Benchmark
Why This Matters
Scenario 2 (92% savings) demonstrates CodeModeTOON's strength:
| Metric | Original | TOON | Savings |
|---|---|---|---|
| Characters | 37,263 | 2,824 | ~92% |
| Estimated Tokens* | ~9,315 | ~706 | ~8,600 tokens |
| Cost (Claude Sonnet)** | $0.028 | $0.002 | $0.026 |

*Assuming ~4 chars/token on average
**At $3/M input tokens
Key Insight: For infrastructure audits, log analysis, or database dumps, TOON compression can reduce token costs by 90%+, making complex agentic workflows feasible within budget.
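The table's numbers can be reproduced directly from the footnote assumptions (~4 chars/token, $3 per million input tokens):

```javascript
// Reproducing the benchmark table's arithmetic. Both conversion factors
// are the footnotes' stated assumptions, not measured values.
const chars = { original: 37263, toon: 2824 };
const tokens = {
  original: Math.floor(chars.original / 4), // ~9,315
  toon: Math.floor(chars.toon / 4),         // ~706
};
const tokensSaved = tokens.original - tokens.toon;
const dollarsSaved = tokensSaved * (3 / 1_000_000); // $3/M input tokens
console.log(tokensSaved);              // 8609 (~8,600 tokens per call)
console.log(dollarsSaved.toFixed(3));  // 0.026
```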
<details> <summary><strong>Detailed Scenarios</strong></summary>
Scenario 1: Natural Language Query (History of Rome) Unstructured text compresses poorly, as expected.
- Original JSON: 11,651 chars
- TOON Encoded: 11,166 chars
- Compression Ratio: ~4.16% Savings
Scenario 2: Kubernetes Cluster Audit (50 Pods) Highly structured, repetitive JSON (infrastructure dumps) compresses extremely well.
- Original JSON: 37,263 chars
- TOON Encoded: 2,824 chars
- Compression Ratio: ~92% Savings 📉 </details>
Troubleshooting
"Server not found" error
Cause: CodeModeTOON can't locate your MCP config.
Solution: Ensure CODE_MODE_TOON_CONFIG points to your config:
```shell
export CODE_MODE_TOON_CONFIG=~/.cursor/mcp.json
```
TOON encoding not working
Cause: Results aren't being encoded.
Solution: Use console.log(TOON.encode(data)), not console.log(data).
Lazy server won't load
Cause: Server name mismatch.
Solution: Verify server name matches your config. Use get_tool_api({ serverName: 'name' }) to inspect available servers.
Security Note
⚠️ The vm module is NOT a security sandbox. Suitable for personal AI assistant use (Claude, Cursor) with trusted code. Not for multi-tenant or public services.
Acknowledgments
- Anthropic: Code execution with MCP
- Cloudflare: Code Mode announcement
Author
Built by Ziad Hassan (Senior SRE/DevOps) — LinkedIn · GitHub
Contributing
Contributions are welcome! 🙌
Ways to Contribute
- Report bugs - Open an issue with reproduction steps
- Suggest features - Discuss use cases in Issues
- Add workflows - See Workflows
- Improve docs - Documentation PRs always welcome
Development Setup
```shell
git clone https://github.com/ziad-hsn/code-mode-toon.git
cd code-mode-toon
npm install
npm test
```
License
MIT License — see LICENSE for details.