Aleph
Aleph is an MCP (Model Context Protocol) server that enables AI assistants to analyze documents too large for their context window. By implementing a Recursive Language Model (RLM) approach, it allows models to search, explore, and compute over massive datasets without exhausting their token limits.
Key Capabilities
- Unlimited Context: Load files as large as your system RAM allows—gigabytes of data accessible via simple queries. The LLM never sees the raw file; it queries a Python process that holds the data in memory.
- Navigation Tools: High-performance regex search and line-based navigation.
- Compute Sandbox: Execute Python code over loaded content for parsing and analysis.
- Evidence Tracking: Automatic citation of source text for grounded answers.
- Recursive Reasoning: Spawn sub-agents to process document chunks in parallel.
How "Unlimited Context" Works
Traditional LLMs are limited by their context window (~200K tokens). Aleph sidesteps this entirely:
┌─────────────────┐       queries        ┌─────────────────────────┐
│   LLM Context   │ ───────────────────► │  Python Process (RAM)   │
│  (~200K tokens) │ ◄─────────────────── │   (8GB, 32GB, 64GB...)  │
│                 │    small results     │    └── your_file.txt    │
└─────────────────┘                      └─────────────────────────┘
- Python loads the entire file into RAM as a string
- The LLM queries it via search(), peek(), lines(), etc.
- Only query results (kilobytes) enter the LLM's context—never the full file
- Your RAM is the limit, not the model's context window (with a default 1GB safety cap on action tools)
You can load multiple files or entire repos as separate contexts and query them independently.
A 50MB log file? The LLM sees ~1KB of search results. A 2GB database dump? Same—just the slices you ask for.
By default, Aleph caps file size for action tools at 1GB to avoid accidental overload, but you can raise the limit with --max-file-size to suit your machine.
This cap applies to load_file / read_file; load_context still accepts any size you can supply in memory.
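For example, a single exec_python call can aggregate over a loaded 50MB log and hand back only a tiny summary. This is a minimal sketch: the variable name content for the loaded text is an assumption, and the real sandbox binding may differ.

# exec_python — minimal sketch; 'content' is an assumed name for the loaded text
error_lines = [ln for ln in content.splitlines() if "ERROR" in ln]
summary = {"total_lines": len(content.splitlines()), "error_count": len(error_lines)}
summary  # only this small dict enters the LLM's context, never the full file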
Installation
pip install "aleph-rlm[mcp]"
After installation, you can automatically configure popular MCP clients:
aleph-rlm install
MCP Server
Run Aleph as an MCP server with:
aleph
Use --enable-actions to allow file and command tools.
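For example, to enable the action tools and the compact tool descriptions used in the client configs below:

aleph --enable-actions --tool-docs concise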
Integration
Claude Desktop / Cursor / Windsurf
Add Aleph to your mcpServers configuration:
{
  "mcpServers": {
    "aleph": {
      "command": "aleph",
      "args": ["--enable-actions", "--tool-docs", "concise"]
    }
  }
}
Install the /aleph skill for the RLM workflow prompt:
mkdir -p ~/.claude/commands
cp /path/to/aleph/docs/prompts/aleph.md ~/.claude/commands/aleph.md
Then use it like:
/aleph: Find the root cause of this test failure and propose a fix.
Claude Code
To use Aleph with Claude Code, register the MCP server and install the workflow prompt:
# Register the MCP server
claude mcp add aleph aleph -- --enable-actions --tool-docs concise
# Add the workflow prompt
mkdir -p ~/.claude/commands
cp docs/prompts/aleph.md ~/.claude/commands/aleph.md
Codex CLI
Add to ~/.codex/config.toml:
[mcp_servers.aleph]
command = "aleph"
args = ["--enable-actions", "--tool-docs", "concise"]
How It Works
- Load: Store a document in external memory via load_context or load_file (with --enable-actions).
- Explore: Search for patterns using search_context or view slices with peek_context.
- Compute: Run Python scripts over the content in a secure sandbox via exec_python.
- Finalize: Generate an answer with linked evidence and citations using finalize (see the sketch below).
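Putting the four steps together, a typical exchange is sketched below in Python-like notation: each line stands for one MCP tool call by the assistant, and the argument names are illustrative assumptions rather than the exact tool schemas.

# One pass of the Load → Explore → Compute → Finalize loop
# (each line is an MCP tool call; argument names are illustrative)
load_file(path="logs/app.log")                       # Load (needs --enable-actions)
search_context(pattern=r"Traceback", max_results=5)  # Explore: locate candidate regions
peek_context(start_line=4210, end_line=4260)         # Explore: read one region closely
exec_python(code="...")                              # Compute: parse or aggregate over the text
finalize(answer="...", evidence=["..."])             # Finalize: answer with citations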
Recursion: Handling Very Large Inputs
When content is too large even for slice-based exploration, Aleph supports recursive decomposition:
- Chunk the content into manageable pieces
- Spawn sub-agents to analyze each chunk
- Synthesize findings into a final answer
# exec_python
chunks = chunk(100_000) # split into ~100K char pieces
results = [sub_query("Extract key findings.", context_slice=c) for c in chunks]
final = sub_query("Synthesize into a summary:", context_slice="\n\n".join(results))
sub_query can use an API backend (OpenAI-compatible) or spawn a local CLI (Claude, Codex, Aider), whichever is available.
Sub-query backends
When ALEPH_SUB_QUERY_BACKEND is auto (default), Aleph chooses the first available backend:
- API - if API credentials are available
- claude CLI - if installed
- codex CLI - if installed
- aider CLI - if installed
Quick setup:
# OpenAI-compatible API (OpenAI, Groq, Together, local LLMs, etc.)
export ALEPH_SUB_QUERY_API_KEY=sk-...
export ALEPH_SUB_QUERY_MODEL=gpt-5.2-codex
# Optional: custom endpoint
export ALEPH_SUB_QUERY_URL=https://api.your-provider.com/v1
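You can also pin a backend instead of relying on auto-detection; the value claude below is an assumption based on the list above (see docs/CONFIGURATION.md for the exact accepted values):

# Optional: force a specific backend instead of auto-detection
export ALEPH_SUB_QUERY_BACKEND=claude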
Note: Some MCP clients don't reliably pass env vars from their config to the server process. If sub_query reports "API key not found" despite your client's MCP settings, add the exports to your shell profile (~/.zshrc or ~/.bashrc) and restart your terminal/client.
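If your client does forward environment variables, they can live in the same mcpServers entry. This is a sketch in the Claude Desktop-style config format; as noted above, pass-through support varies by client:

{
  "mcpServers": {
    "aleph": {
      "command": "aleph",
      "args": ["--enable-actions", "--tool-docs", "concise"],
      "env": {
        "ALEPH_SUB_QUERY_API_KEY": "sk-...",
        "ALEPH_SUB_QUERY_MODEL": "gpt-5.2-codex"
      }
    }
  }
}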
For a full list of options, see docs/CONFIGURATION.md.
Available Tools
Aleph exposes the full toolset below.
Core exploration
| Tool | Description |
|---|---|
| load_context | Store text or JSON in external memory. |
| list_contexts | List loaded contexts and metadata. |
| peek_context | View specific line or character ranges. |
| search_context | Perform regex searches with surrounding context. |
| chunk_context | Split content into navigable chunks. |
| diff_contexts | Diff two contexts (text or JSON). |
| exec_python | Run Python code over the loaded content. |
| get_variable | Retrieve a variable from the exec_python sandbox. |
Reasoning workflow
| Tool | Description |
|---|---|
| think | Structure reasoning for complex problems. |
| get_status | Show current session state. |
| get_evidence | Retrieve collected citations. |
| evaluate_progress | Self-evaluate progress with convergence tracking. |
| summarize_so_far | Summarize progress on long tasks. |
| finalize | Complete with answer and evidence. |
Recursion
| Tool | Description |
|---|---|
| sub_query | Spawn a sub-agent on a content slice. |
Session management
| Tool | Description |
|---|---|
| save_session | Persist current session to file. |
| load_session | Load a saved session from file. |
Recipes and reporting
| Tool | Description |
|---|---|
| load_recipe | Load an Alephfile recipe for execution. |
| list_recipes | List loaded recipes and status. |
| finalize_recipe | Finalize a recipe run and generate a result bundle. |
| get_metrics | Get token-efficiency metrics for a recipe/session. |
| export_result | Export a recipe result bundle to a file. |
| sign_evidence | Sign evidence bundles for verification. |
Remote MCP orchestration
| Tool | Description |
|---|---|
| add_remote_server | Register a remote MCP server. |
| list_remote_servers | List registered remote MCP servers. |
| list_remote_tools | List tools available on a remote server. |
| call_remote_tool | Call a tool on a remote MCP server. |
| close_remote_server | Close a remote MCP server connection. |
Action tools
Enabled with the --enable-actions flag. Use --workspace-root and --workspace-mode (fixed, git, any) to control scope.
| Tool | Description |
|---|---|
| load_file | Load a workspace file into a context. |
| read_file / write_file | File system access (workspace-scoped). |
| run_command | Shell execution. |
| run_tests | Execute test commands (supports optional cwd). |
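For example, a launch that scopes the action tools to a single git checkout might look like this (the path is illustrative):

aleph --enable-actions --workspace-root ~/code/my-repo --workspace-mode git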
Configuration
For full configuration options (limits, budgets, and backend details), see docs/CONFIGURATION.md.
Changelog
Unreleased
- Unlimited context architecture: Clarified that file size is limited by system RAM (with a default 1GB action-tool cap) rather than LLM context windows. Load gigabytes of data and query it with search/peek/lines.
- Added --workspace-mode for action tools (fixed, git, any) to support multi-repo workflows.
- Added optional cwd for run_tests to run tests outside the server's default working directory.
- Updated MCP setup docs with multi-repo configuration examples.
Development
git clone https://github.com/Hmbown/aleph.git
cd aleph
pip install -e ".[dev,mcp]"
pytest
See DEVELOPMENT.md for architecture details.
References
Aleph implements the Recursive Language Model (RLM) architecture described in:
Zhang, A. L., Kraska, T., & Khattab, O. (2025). Recursive Language Models. arXiv:2512.24601.
RLMs treat the input context as an external environment variable rather than part of the prompt. This allows models to programmatically decompose inputs, recursively query themselves over chunks, and synthesize results—processing inputs far beyond their native context window.
License
MIT