Massive Context MCP

<!-- mcp-name: io.github.egoughnour/massive-context-mcp -->


Handle massive contexts (10M+ tokens) with chunking, sub-queries, and free local inference via Ollama. Based on the Recursive Language Model pattern; inspired by richardwhiteii/rlm.

Core Idea

Instead of feeding massive contexts directly into the LLM:

  1. Load context as external variable (stays out of prompt)
  2. Inspect structure programmatically
  3. Chunk strategically (lines, chars, or paragraphs)
  4. Sub-query recursively on chunks
  5. Aggregate results for final synthesis
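The five steps above can be sketched in plain Python. This is an illustrative stand-in, not the server's implementation: `sub_query` here is a placeholder for a real LLM call, and `chunk_by_lines` mimics one of the chunking strategies.

```python
# Minimal sketch of the recursive-context pattern, independent of the MCP tools.

def chunk_by_lines(text, lines_per_chunk):
    """Split text into chunks of at most lines_per_chunk lines each."""
    lines = text.splitlines()
    return ["\n".join(lines[i:i + lines_per_chunk])
            for i in range(0, len(lines), lines_per_chunk)]

def sub_query(chunk):
    """Placeholder for an LLM sub-call; here it just reports chunk size."""
    return f"{len(chunk)} chars"

def analyze(context):
    # 1-2. Context stays in a variable, never in the prompt; inspect it first.
    print(f"context: {len(context)} chars, {len(context.splitlines())} lines")
    # 3. Chunk strategically.
    chunks = chunk_by_lines(context, 2)
    # 4. Sub-query each chunk.
    partials = [sub_query(c) for c in chunks]
    # 5. Aggregate partial results for final synthesis.
    return partials

print(analyze("a\nbb\nccc\ndddd\neeeee"))
```

In the real server, step 4 dispatches to Ollama or the Claude API and steps can run in parallel; the point is only that the full context never enters a single prompt.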

Quick Start

Installation

Option 1: PyPI (Recommended)

uvx massive-context-mcp
# or
pip install massive-context-mcp

Option 2: Claude Desktop One-Click

Download the .mcpb from Releases and double-click to install.

Option 3: From Source

git clone https://github.com/egoughnour/massive-context-mcp.git
cd massive-context-mcp
uv sync

Wire to Claude Code / Claude Desktop

Add to ~/.claude/.mcp.json (Claude Code) or claude_desktop_config.json (Claude Desktop):

{
  "mcpServers": {
    "massive-context": {
      "command": "uvx",
      "args": ["massive-context-mcp"],
      "env": {
        "RLM_DATA_DIR": "~/.rlm-data",
        "OLLAMA_URL": "http://localhost:11434"
      }
    }
  }
}

Tools

Setup & Status Tools

| Tool | Purpose |
| --- | --- |
| rlm_system_check | Check system requirements — verify macOS, Apple Silicon, 16GB+ RAM, Homebrew |
| rlm_setup_ollama | Install via Homebrew — managed service, auto-updates, requires Homebrew |
| rlm_setup_ollama_direct | Install via direct download — no sudo, fully headless, works on locked-down machines |
| rlm_ollama_status | Check Ollama availability — detect if free local inference is available |

Analysis Tools

| Tool | Purpose |
| --- | --- |
| rlm_auto_analyze | One-step analysis — auto-detects type, chunks, and queries |
| rlm_load_context | Load context as external variable |
| rlm_inspect_context | Get structure info without loading into prompt |
| rlm_chunk_context | Chunk by lines/chars/paragraphs |
| rlm_get_chunk | Retrieve specific chunk |
| rlm_filter_context | Filter with regex (keep/remove matching lines) |
| rlm_exec | Execute Python code against loaded context (sandboxed) |
| rlm_sub_query | Make sub-LLM call on chunk |
| rlm_sub_query_batch | Process multiple chunks in parallel |
| rlm_store_result | Store sub-call result for aggregation |
| rlm_get_results | Retrieve stored results |
| rlm_list_contexts | List all loaded contexts |

Quick Analysis with rlm_auto_analyze

For most use cases, just use rlm_auto_analyze — it handles everything automatically:

rlm_auto_analyze(
    name="my_file",
    content=file_content,
    goal="find_bugs"  # or: summarize, extract_structure, security_audit, answer:<question>
)

What it does automatically:

  1. Detects content type (Python, JSON, Markdown, logs, prose, code)
  2. Selects optimal chunking strategy
  3. Adapts the query for the content type
  4. Runs parallel sub-queries
  5. Returns aggregated results
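The detection in step 1 can be approximated with simple heuristics. The sketch below is a guess at the kind of logic involved, not the server's actual detector:

```python
import json

def detect_content_type(content):
    """Heuristic content-type detection (illustrative only)."""
    stripped = content.lstrip()
    # Content that parses as JSON is treated as JSON outright.
    try:
        json.loads(content)
        return "json"
    except (ValueError, TypeError):
        pass
    # Python source tends to open with imports or definitions.
    if stripped.startswith(("import ", "from ", "def ", "class ")):
        return "python"
    # Markdown usually opens with a heading or list marker.
    if stripped.startswith(("# ", "## ", "- ", "* ")):
        return "markdown"
    return "prose"

print(detect_content_type('{"a": 1}'))          # json
print(detect_content_type("def f():\n    pass"))  # python
```

The detected type then drives the chunking choice (e.g., paragraphs for prose, lines for logs).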

Supported goals:

| Goal | Description |
| --- | --- |
| summarize | Summarize content purpose and key points |
| find_bugs | Identify errors, issues, potential problems |
| extract_structure | List functions, classes, schema, headings |
| security_audit | Find vulnerabilities and security issues |
| answer:<question> | Answer a custom question about the content |

Programmatic Analysis with rlm_exec

For deterministic pattern matching and data extraction, use rlm_exec to run Python code directly against a loaded context. This is closer to the paper's REPL approach and provides full control over analysis logic.

Tool: rlm_exec

Purpose: Execute arbitrary Python code against a loaded context in a sandboxed subprocess.

Parameters:

  • code (required): Python code to execute. Set the result variable to capture output.
  • context_name (required): Name of a previously loaded context.
  • timeout (optional, default 30): Maximum execution time in seconds.

Features:

  • Context available as read-only context variable
  • Pre-imported modules: re, json, collections
  • Subprocess isolation (won't crash the server)
  • Timeout enforcement
  • Works on any system with Python (no Docker needed)

Example — Finding patterns in a loaded context:

# After loading a context
rlm_exec(
    code="""
import re
amounts = re.findall(r'\$[\d,]+', context)
result = {'count': len(amounts), 'sample': amounts[:5]}
""",
    context_name="bill"
)

Example Response:

{
  "result": {
    "count": 1247,
    "sample": ["$500", "$1,000", "$250,000", "$100,000", "$50"]
  },
  "stdout": "",
  "stderr": "",
  "return_code": 0,
  "timed_out": false
}

Example — Extracting structured data:

rlm_exec(
    code="""
import re
import json

# Find all email addresses
emails = re.findall(r'\b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}\b', context)

# Count by domain
from collections import Counter
domains = [e.split('@')[1] for e in emails]
domain_counts = Counter(domains)

result = {
    'total_emails': len(emails),
    'unique_domains': len(domain_counts),
    'top_domains': domain_counts.most_common(5)
}
""",
    context_name="dataset",
    timeout=60
)

When to use rlm_exec vs rlm_sub_query:

| Use Case | Tool | Why |
| --- | --- | --- |
| Extract all dates, IDs, amounts | rlm_exec | Regex is deterministic and fast |
| Find security vulnerabilities | rlm_sub_query | Requires reasoning and context |
| Parse JSON/XML structure | rlm_exec | Standard libraries work perfectly |
| Summarize themes or tone | rlm_sub_query | Natural language understanding needed |
| Count word frequencies | rlm_exec | Simple computation, no AI needed |
| Answer "Why did X happen?" | rlm_sub_query | Requires inference and reasoning |

Tip: For large contexts, combine both — use rlm_exec to filter/extract, then rlm_sub_query for semantic analysis of filtered results.
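The filter-then-query combination looks like this in plain Python. `semantic_summary` is a placeholder for the rlm_sub_query step (a real LLM call in practice); the filtering is what you would run inside rlm_exec:

```python
import re

def filter_lines(text, pattern):
    """Deterministic step: keep only the lines matching pattern."""
    regex = re.compile(pattern)
    return "\n".join(line for line in text.splitlines() if regex.search(line))

def semantic_summary(text):
    """Placeholder for the semantic step handled by an LLM sub-query."""
    return f"summarize {len(text.splitlines())} error lines"

log = "INFO start\nERROR disk full\nINFO ok\nERROR timeout"
errors = filter_lines(log, r"^ERROR")
# Only the (much smaller) filtered text would go to the LLM.
print(semantic_summary(errors))
```

This keeps the expensive semantic pass focused on the handful of lines that matter instead of the whole context.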

Providers & Auto-Detection

RLM automatically detects and uses the best available provider:

| Provider | Default Model | Cost | Use Case |
| --- | --- | --- | --- |
| auto | (best available) | $0 or ~$0.80/1M | Default — prefers Ollama if available |
| ollama | gemma3:12b | $0 | Local inference, requires Ollama |
| claude-sdk | claude-haiku-4-5 | ~$0.80/1M input | Cloud inference, always available |

How Auto-Detection Works

When you use provider="auto" (the default), RLM:

  1. Checks if Ollama is running at OLLAMA_URL (default: http://localhost:11434)
  2. Checks if gemma3:12b is available (or any gemma3 variant)
  3. Uses Ollama if available, otherwise falls back to Claude SDK

The status is cached for 60 seconds to avoid repeated network checks.
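The 60-second cache amounts to a timestamp check before each probe. A minimal sketch, where `probe_ollama` stands in for the real network check against OLLAMA_URL:

```python
import time

_CACHE = {"status": None, "checked_at": 0.0}
TTL_SECONDS = 60

def probe_ollama():
    """Stand-in for the real HTTP check against the Ollama server."""
    probe_ollama.calls += 1  # counted here only to demonstrate caching
    return {"running": True}
probe_ollama.calls = 0

def ollama_status():
    """Return the cached status, re-probing only once the TTL has expired."""
    now = time.time()
    if _CACHE["status"] is None or now - _CACHE["checked_at"] > TTL_SECONDS:
        _CACHE["status"] = probe_ollama()
        _CACHE["checked_at"] = now
    return _CACHE["status"]

ollama_status()
ollama_status()  # within the TTL: served from cache, no second probe
print(probe_ollama.calls)  # 1
```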

Check Ollama Status

Use rlm_ollama_status to see what's available:

rlm_ollama_status()

Response when Ollama is ready:

{
  "running": true,
  "models": ["gemma3:12b", "llama3:8b"],
  "default_model_available": true,
  "best_provider": "ollama",
  "recommendation": "Ollama is ready! Sub-queries will use free local inference by default."
}

Response when Ollama is not available:

{
  "running": false,
  "error": "connection_refused",
  "best_provider": "claude-sdk",
  "recommendation": "Ollama not available. Sub-queries will use Claude API. To enable free local inference, install Ollama and run: ollama serve"
}

Transparent Provider Selection

All sub-query responses include which provider was actually used:

{
  "provider": "ollama",
  "model": "gemma3:12b",
  "requested_provider": "auto",
  "response": "..."
}

Autonomous Usage

Enable Claude to use RLM tools automatically without manual invocation:

1. CLAUDE.md Integration Copy CLAUDE.md.example content to your project's CLAUDE.md (or ~/.claude/CLAUDE.md for global) to teach Claude when to reach for RLM tools automatically.

2. Hook Installation Copy the .claude/hooks/ directory to your project to auto-suggest RLM when reading files >10KB:

cp -r .claude/hooks/ /Users/your_username/your-project/.claude/hooks/

The hook provides guidance but doesn't block reads.

3. Skill Reference Copy the .claude/skills/ directory for comprehensive RLM guidance:

cp -r .claude/skills/ /Users/your_username/your-project/.claude/skills/

With these in place, Claude will autonomously detect when to use RLM instead of reading large files directly into context.

Setting Up Ollama (Free Local Inference)

RLM can automatically install and configure Ollama on macOS with Apple Silicon. There are two installation methods with different trade-offs:

Choosing an Installation Method

| Aspect | rlm_setup_ollama (Homebrew) | rlm_setup_ollama_direct (Direct Download) |
| --- | --- | --- |
| Sudo required | Only if Homebrew not installed | ❌ Never |
| Homebrew required | ✅ Yes | ❌ No |
| Auto-updates | ✅ Yes (brew upgrade) | ❌ Manual |
| Service management | brew services (launchd) | ⚠️ ollama serve (foreground) |
| Install location | /opt/homebrew/ | ~/Applications/ |
| Locked-down machines | ⚠️ May fail | ✅ Works |
| Fully headless | ⚠️ May prompt for sudo | ✅ Yes |

Recommendation:

  • Use Homebrew method if you have Homebrew and want managed updates
  • Use Direct Download for automation, locked-down machines, or when you don't have admin access

Method 1: Homebrew Installation (Recommended if you have Homebrew)

# 1. Check if your system meets requirements
rlm_system_check()

# 2. Install via Homebrew
rlm_setup_ollama(install=True, start_service=True, pull_model=True)

What this does:

  • Installs Ollama via Homebrew (brew install ollama)
  • Starts Ollama as a managed background service (brew services start ollama)
  • Pulls gemma3:12b model (~8GB download)

Requirements:

  • macOS with Apple Silicon (M1/M2/M3/M4)
  • 16GB+ RAM (gemma3:12b needs ~8GB to run)
  • Homebrew installed

Method 2: Direct Download (Fully Headless, No Sudo)

# 1. Check system (Homebrew NOT required for this method)
rlm_system_check()

# 2. Install via direct download - no sudo, no Homebrew
rlm_setup_ollama_direct(install=True, start_service=True, pull_model=True)

What this does:

  • Downloads Ollama from https://ollama.com/download/Ollama-darwin.zip
  • Extracts to ~/Applications/Ollama.app (user directory, no admin needed)
  • Starts Ollama via ollama serve (background process)
  • Pulls gemma3:12b model

Requirements:

  • macOS with Apple Silicon (M1/M2/M3/M4)
  • 16GB+ RAM
  • No special permissions needed!

Note on PATH: After direct installation, the CLI is at:

~/Applications/Ollama.app/Contents/Resources/ollama

Add to your shell config if needed:

export PATH="$HOME/Applications/Ollama.app/Contents/Resources:$PATH"

For Systems with Less RAM

Use a smaller model on either installation method:

rlm_setup_ollama(install=True, start_service=True, pull_model=True, model="gemma3:4b")
# or
rlm_setup_ollama_direct(install=True, start_service=True, pull_model=True, model="gemma3:4b")

Manual Setup

If you prefer manual installation or are on a different platform:

  1. Install Ollama from https://ollama.ai or via Homebrew:

    brew install ollama
    
  2. Start the service:

    brew services start ollama
    # or: ollama serve
    
  3. Pull the model:

    ollama pull gemma3:12b
    
  4. Verify it's working:

    rlm_ollama_status()
    

Provider Selection

RLM automatically uses Ollama when available. You can also force a specific provider:

# Auto-detection (default) - uses Ollama if available
rlm_sub_query(query="Summarize", context_name="doc")

# Explicitly use Ollama
rlm_sub_query(query="Summarize", context_name="doc", provider="ollama")

# Explicitly use Claude SDK
rlm_sub_query(query="Summarize", context_name="doc", provider="claude-sdk")

Usage Example

Basic Pattern

# 0. (Optional) First-time setup on macOS - choose ONE method:

# Option A: Homebrew (if you have it)
rlm_system_check()
rlm_setup_ollama(install=True, start_service=True, pull_model=True)

# Option B: Direct download (no sudo, fully headless)
rlm_system_check()
rlm_setup_ollama_direct(install=True, start_service=True, pull_model=True)

# 0b. (Optional) Check if Ollama is available for free inference
rlm_ollama_status()

# 1. Load a large document
rlm_load_context(name="report", content=<large document>)

# 2. Inspect structure
rlm_inspect_context(name="report", preview_chars=500)

# 3. Chunk into manageable pieces
rlm_chunk_context(name="report", strategy="paragraphs", size=1)

# 4. Sub-query chunks in parallel (auto-uses Ollama if available)
rlm_sub_query_batch(
    query="What is the main topic? Reply in one sentence.",
    context_name="report",
    chunk_indices=[0, 1, 2, 3],
    concurrency=4
)

# 5. Store results for aggregation
rlm_store_result(name="topics", result=<response>)

# 6. Retrieve all results
rlm_get_results(name="topics")

Processing a 2MB Document

Tested with H.R.1 Bill (2MB):

# Load
rlm_load_context(name="bill", content=<2MB XML>)

# Chunk into 40 pieces (50K chars each)
rlm_chunk_context(name="bill", strategy="chars", size=50000)

# Sample 8 chunks (20%) with parallel queries
# (auto-uses Ollama if running, otherwise Claude SDK)
rlm_sub_query_batch(
    query="What topics does this section cover?",
    context_name="bill",
    chunk_indices=[0, 5, 10, 15, 20, 25, 30, 35],
    concurrency=4
)

Result: Comprehensive topic extraction at $0 cost (with Ollama) or ~$0.02 (with Claude).

Analyzing War and Peace (3.3MB)

Literary analysis of Tolstoy's epic novel from Project Gutenberg:

# Download the text
curl -o war_and_peace.txt https://www.gutenberg.org/files/2600/2600-0.txt
# Load into RLM (3.3MB, 66K lines)
rlm_load_context(name="war_and_peace", content=open("war_and_peace.txt").read())

# Chunk by lines (1000 lines per chunk = 67 chunks)
rlm_chunk_context(name="war_and_peace", strategy="lines", size=1000)

# Sample 10 chunks evenly across the book (15% coverage)
sample_indices = [0, 7, 14, 21, 28, 35, 42, 49, 56, 63]

# Extract characters from each sampled section
rlm_sub_query_batch(
    query="List major characters in this section with brief descriptions.",
    context_name="war_and_peace",
    chunk_indices=sample_indices,
    provider="claude-sdk",  # Haiku 4.5
    concurrency=8
)

Result: Complete character arc across the novel — Pierre's journey from idealist to prisoner to husband, Natásha's growth, Prince Andrew's philosophical struggles — all for ~$0.03.

| Metric | Value |
| --- | --- |
| File size | 3.35 MB |
| Lines | 66,033 |
| Chunks | 67 |
| Sampled | 10 (15%) |
| Cost | ~$0.03 |

Data Storage

$RLM_DATA_DIR/
├── contexts/     # Raw contexts (.txt + .meta.json)
├── chunks/       # Chunked versions (by context name)
└── results/      # Stored sub-call results (.jsonl)

Contexts persist across sessions. Chunked contexts are cached for reuse.
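Assuming the .jsonl layout shown above (one JSON object per line under `results/`), stored results can be read back with a few lines of Python. This is an illustrative reader, not part of the server's API:

```python
import json
import os
import tempfile

def read_results(data_dir, name):
    """Read stored sub-call results from $RLM_DATA_DIR/results/<name>.jsonl."""
    path = os.path.join(data_dir, "results", f"{name}.jsonl")
    with open(path) as f:
        return [json.loads(line) for line in f if line.strip()]

# Demo against a temporary directory mimicking $RLM_DATA_DIR.
data_dir = tempfile.mkdtemp()
os.makedirs(os.path.join(data_dir, "results"))
with open(os.path.join(data_dir, "results", "topics.jsonl"), "w") as f:
    f.write('{"chunk": 0, "topic": "tariffs"}\n{"chunk": 5, "topic": "energy"}\n')

print(read_results(data_dir, "topics"))
```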

Architecture

Claude Code
    │
    ▼
RLM MCP Server
    │
    ├─► rlm_ollama_status ─► Check availability (cached 60s)
    │
    └─► provider="auto" (default)
            │
            ├─► ollama (if running) ─► Local LLM (gemma3:12b) ─► $0
            │
            └─► claude-sdk (fallback) ─► Anthropic API ─► ~$0.80/1M

The key insight: context stays external. Instead of stuffing 2MB into your prompt, load it once, chunk it, and make targeted sub-queries. Claude orchestrates; sub-models do the heavy lifting.

Cost optimization: RLM automatically uses free local inference when Ollama is available, falling back to Claude API only when needed.

Learning Prompts

Use these prompts with Claude Code to explore the codebase and learn RLM patterns. The code is the single source of truth.

Understanding the Tools

Read src/rlm_mcp_server.py and list all RLM tools with their parameters and purpose.
Explain the chunking strategies available in rlm_chunk_context.
When would I use each one?
What's the difference between rlm_sub_query and rlm_sub_query_batch?
Show me the implementation.

Understanding the Architecture

Read src/rlm_mcp_server.py and explain how contexts are stored and persisted.
Where does the data live?
How does the claude-sdk provider extract text from responses?
Walk me through _call_claude_sdk.
What happens when I call rlm_load_context? Trace the full flow.

Hands-On Learning

Load the README as a context, chunk it by paragraphs,
and run a sub-query on the first chunk to summarize it.
Show me how to process a large file in parallel using rlm_sub_query_batch.
Use a real example.
I have a 1MB log file. Walk me through the RLM pattern to extract all errors.

Extending RLM

Read the test file and explain what scenarios are covered.
What edge cases should I be aware of?
How would I add a new chunking strategy (e.g., by regex delimiter)?
Show me where to modify the code.
How would I add a new provider (e.g., OpenAI)?
What functions need to change?

License

MIT
