astllm-mcp
MCP server for efficient code indexing and symbol retrieval. Index GitHub repos or local folders once with tree-sitter AST parsing, then let AI agents retrieve only the specific symbols they need — instead of loading entire files.
Simple single-file binary distribution for easy deployment.
Cut code-reading token costs by up to 99%.
How it works
- Index — fetch source files, parse ASTs with tree-sitter, store symbols with byte offsets
- Explore — browse file trees and outlines without touching file content
- Retrieve — fetch only the exact function/class/method you need via O(1) byte-offset seek
- Savings — every response reports tokens saved vs loading raw files
The index is stored locally in ~/.code-index/ (configurable). Incremental re-indexing only re-parses changed files.
The server automatically indexes the working directory on startup (incremental, non-blocking). Optionally set ASTLLM_WATCH=1 to also watch for file changes and re-index automatically.
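The retrieval step above can be pictured as a direct seek: given a symbol's byte span from the index, only that span is read, never the whole file. A minimal sketch (the file contents and offsets here are illustrative; the real index format is not shown):

```python
import os
import tempfile

def get_symbol(path, start, end):
    """Read bytes [start, end) of a file with a direct seek (O(1) in file size)."""
    with open(path, "rb") as f:
        f.seek(start)                      # jump straight to the symbol
        return f.read(end - start).decode("utf-8")

# Demo: index one function's byte span, then retrieve just that span.
source = "def greet(name):\n    return f'hi {name}'\n\nprint('lots of other code...')\n"
fd, path = tempfile.mkstemp(suffix=".py")
with os.fdopen(fd, "w") as f:
    f.write(source)

start = source.index("def greet")          # offsets an indexer would have stored
end = source.index("\n\n")
snippet = get_symbol(path, start, end)     # only the function, never the whole file
os.remove(path)
```

The key point is that retrieval cost depends only on the symbol's size, not the file's.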
Supported languages
Python, JavaScript, TypeScript, TSX, Go, Rust, Java, PHP, Dart, C#, C, C++
Installation
Option 1: Download a pre-built binary (recommended)
Download the binary for your platform from the GitHub Releases page:
| Platform | File |
|---|---|
| macOS ARM (M1/M2/M3) | astllm-mcp-macosx-arm |
| Linux x86-64 | astllm-mcp-linux-x86 |
| Linux ARM64 | astllm-mcp-linux-arm |
# Example for Linux x86-64
curl -L https://github.com/tluyben/astllm-mcp/releases/latest/download/astllm-mcp-linux-x86 -o astllm-mcp
chmod +x astllm-mcp
./astllm-mcp # runs as an MCP stdio server
No Node.js, no npm, no build tools required.
Option 2: Build from source
Requires Node.js 18+ and a C++20-capable compiler (for tree-sitter native bindings).
git clone https://github.com/tluyben/astllm-mcp
cd astllm-mcp
CXXFLAGS="-std=c++20" npm install --legacy-peer-deps
npm run build
Note on Node.js v22+: the `CXXFLAGS="-std=c++20"` flag is required because the V8 headers in Node.js v22+ mandate C++20. The `--legacy-peer-deps` flag is needed because tree-sitter grammar packages target slightly different tree-sitter core versions.
MCP client configuration
Claude Code
Option A — claude mcp add CLI (easiest):
# Pre-built binary, project-scoped (.mcp.json)
claude mcp add astllm /path/to/astllm-mcp-linux-x86 --scope project
# Pre-built binary, user-scoped (~/.claude.json)
claude mcp add astllm /path/to/astllm-mcp-linux-x86 --scope user
# From source (Node.js), project-scoped
claude mcp add astllm --scope project -- node /path/to/astllm-mcp/dist/index.js
Option B — manual JSON config:
Add to ~/.claude.json (global) or .mcp.json in your project root (project-scoped):
Pre-built binary:
{
"mcpServers": {
"astllm": {
"command": "/path/to/astllm-mcp-linux-x86",
"type": "stdio"
}
}
}
From source (Node.js):
{
"mcpServers": {
"astllm": {
"command": "node",
"args": ["/path/to/astllm-mcp/dist/index.js"],
"type": "stdio"
}
}
}
Claude Desktop
Add to ~/Library/Application Support/Claude/claude_desktop_config.json (macOS) or %APPDATA%\Claude\claude_desktop_config.json (Windows):
Pre-built binary:
{
"mcpServers": {
"astllm": {
"command": "/path/to/astllm-mcp-macosx-arm"
}
}
}
From source (Node.js):
{
"mcpServers": {
"astllm": {
"command": "node",
"args": ["/path/to/astllm-mcp/dist/index.js"]
}
}
}
Tools
Indexing
index_repo
Index a GitHub repository. Fetches source files via the GitHub API, parses ASTs, stores symbols locally.
repo_url GitHub URL or "owner/repo" slug
generate_summaries Generate one-line AI summaries (requires API key, default: false)
incremental Only re-index changed files (default: true)
storage_path Custom storage directory
index_folder
Index a local folder recursively.
folder_path Path to index
generate_summaries AI summaries (default: false)
extra_ignore_patterns Additional gitignore-style patterns
follow_symlinks Follow symlinks (default: false)
incremental Only re-index changed files (default: true)
storage_path Custom storage directory
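The incremental option means only changed files are re-parsed. One common way to implement this is to compare stored content hashes against the files currently on disk; a sketch under that assumption (the bookkeeping shown is not the server's actual schema):

```python
import hashlib

def files_to_reindex(current_files, stored_hashes):
    """Return paths whose content hash changed, or that are new to the index."""
    changed = []
    for path, content in current_files.items():
        digest = hashlib.sha256(content).hexdigest()
        if stored_hashes.get(path) != digest:   # new file or modified content
            changed.append(path)
    return changed

# One file was edited since the last index; one is brand new.
stored = {"a.py": hashlib.sha256(b"old contents").hexdigest()}
current = {"a.py": b"new contents", "b.py": b"brand new file"}
changed = sorted(files_to_reindex(current, stored))   # ['a.py', 'b.py']
```

Unchanged files hash to their stored digest and are skipped entirely.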
Navigation
list_repos
List all indexed repositories with file count, symbol count, and last-indexed time.
get_repo_outline
High-level overview: directory breakdown, language distribution, symbol kind counts.
repo Repository identifier ("owner/repo" or short name if unique)
get_file_tree
File and directory structure with per-file language and symbol count. Much cheaper than reading files.
repo Repository identifier
path_prefix Filter to a subdirectory
include_summaries Include per-file summaries
get_file_outline
All symbols in a file as a hierarchical tree (methods nested under their class).
repo Repository identifier
file_path File path relative to repo root
Retrieval
get_symbol
Full source code for a single symbol, retrieved by byte-offset seek (O(1)).
repo Repository identifier
symbol_id Symbol ID from get_file_outline or search_symbols
verify Check content hash for drift detection (default: false)
context_lines Lines of context around the symbol (0–50, default: 0)
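The verify option's drift detection can be pictured as comparing a hash stored at index time against a hash of the bytes actually on disk. A sketch assuming SHA-256 (the server's actual hash choice is not documented here):

```python
import hashlib

def is_stale(stored_hash, current_bytes):
    """True if the content no longer matches the hash recorded at index time."""
    return hashlib.sha256(current_bytes).hexdigest() != stored_hash

original = b"def login(): ..."
stored = hashlib.sha256(original).hexdigest()   # recorded when the symbol was indexed

assert not is_stale(stored, original)                                  # unchanged
assert is_stale(stored, b"def login(): pass  # edited since indexing") # drifted
```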
get_symbols
Batch retrieval of multiple symbols in one call.
repo Repository identifier
symbol_ids Array of symbol IDs
Search
search_symbols
Search symbols by name, kind, language, or file pattern. Returns signatures and summaries — no source loaded until you call get_symbol.
repo Repository identifier
query Search query
kind Filter: function | class | method | type | constant | interface
file_pattern Glob pattern, e.g. "src/**/*.ts"
language Filter by language
limit Max results 1–100 (default: 50)
search_text
Full-text search across indexed file contents. Useful for string literals, comments, config values.
repo Repository identifier
query Case-insensitive substring
file_pattern Glob pattern to restrict files
limit Max matching lines (default: 100)
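The behavior of search_text (case-insensitive substring, capped at limit matching lines) can be sketched like this, with an in-memory stand-in for the indexed file contents:

```python
def search_text(files, query, limit=100):
    """Case-insensitive substring search over file contents.

    files: dict of path -> content string (stand-in for the index store).
    Returns up to `limit` (path, line_number, line) tuples.
    """
    q = query.lower()
    hits = []
    for path, content in files.items():
        for lineno, line in enumerate(content.splitlines(), 1):
            if q in line.lower():
                hits.append((path, lineno, line))
                if len(hits) >= limit:
                    return hits
    return hits

files = {"src/app.py": "url = 'HTTP://example.com'\nprint(url)"}
hits = search_text(files, "http://")   # matches despite the uppercase scheme
```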
Cache
invalidate_cache
Delete a repository's index, forcing full re-index on next operation.
repo Repository identifier
Symbol IDs
Symbol IDs have the format file/path::qualified.Name#kind, for example:
src/auth/login.ts::AuthService.login#method
src/utils.go::parseURL#function
lib/models.py::User#class
Get IDs from get_file_outline or search_symbols, then pass them to get_symbol.
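Because the separators `::` and `#` are fixed, a symbol ID splits back into its parts mechanically; a small client-side sketch:

```python
def parse_symbol_id(symbol_id):
    """Split 'file/path::qualified.Name#kind' into its three parts."""
    path, rest = symbol_id.split("::", 1)
    name, kind = rest.rsplit("#", 1)   # rsplit: qualified names never contain '#'
    return {"file": path, "name": name, "kind": kind}

parsed = parse_symbol_id("src/auth/login.ts::AuthService.login#method")
# parsed == {"file": "src/auth/login.ts", "name": "AuthService.login", "kind": "method"}
```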
Token savings
Every response includes a _meta envelope:
{
"_meta": {
"timing_ms": 2.1,
"tokens_saved": 14823,
"total_tokens_saved": 89412,
"cost_avoided_claude_usd": 0.222345,
"cost_avoided_gpt_usd": 0.148230,
"total_cost_avoided_claude_usd": 1.34118
}
}
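Since every response carries this envelope, a client can tally savings across calls by summing the per-call tokens_saved fields (the field names below are from the envelope above; the response strings are made up for the example):

```python
import json

def tally_savings(responses):
    """Sum the per-call tokens_saved fields from _meta envelopes."""
    return sum(json.loads(r)["_meta"]["tokens_saved"] for r in responses)

responses = [
    '{"_meta": {"timing_ms": 2.1, "tokens_saved": 14823}}',
    '{"_meta": {"timing_ms": 1.4, "tokens_saved": 9021}}',
]
total = tally_savings(responses)   # 23844
```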
AI summaries (optional)
Set one of these environment variables to enable one-line symbol summaries:
# Anthropic Claude Haiku (recommended)
export ANTHROPIC_API_KEY=sk-ant-...
# Google Gemini Flash
export GOOGLE_API_KEY=...
# OpenAI-compatible (Ollama, etc.)
export OPENAI_BASE_URL=http://localhost:11434/v1
export OPENAI_MODEL=llama3
Summaries use a three-tier fallback: docstring first-line → AI → signature.
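The tier order above (docstring first line, then AI summary, then signature) reduces to a short fallback chain; a sketch with illustrative inputs:

```python
def summarize(docstring, ai_summary, signature):
    """Pick a one-line summary: docstring first line -> AI summary -> signature."""
    if docstring:
        return docstring.strip().splitlines()[0]   # only the first line
    if ai_summary:
        return ai_summary
    return signature

line = summarize("Parse a URL.\nLonger details here.", None, "def parse_url(s)")
# line == "Parse a URL."
fallback = summarize(None, None, "def parse_url(s)")
# fallback == "def parse_url(s)"
```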
Environment variables
| Variable | Default | Description |
|---|---|---|
| CODE_INDEX_PATH | ~/.code-index | Index storage directory |
| GITHUB_TOKEN | — | GitHub API token (higher rate limits, private repos) |
| ASTLLM_MAX_INDEX_FILES | 500 | Max files to index per repo |
| ASTLLM_MAX_FILE_SIZE_KB | 500 | Max file size to index (KB) |
| ASTLLM_LOG_LEVEL | warn | Log level: debug, info, warn, error |
| ASTLLM_LOG_FILE | — | Log to file instead of stderr |
| ASTLLM_WATCH | 0 | Watch the working directory for source file changes and re-index automatically (1 or true to enable) |
| ASTLLM_PERSIST | 0 | Persist the index to ~/.astllm/{path}.json after every index and pre-load it on startup (1 or true to enable) |
| ANTHROPIC_API_KEY | — | Enable Claude Haiku summaries |
| GOOGLE_API_KEY | — | Enable Gemini Flash summaries |
| OPENAI_BASE_URL | — | Enable local LLM summaries |
Legacy
`JASTLLM_*` variable names are also accepted for compatibility with the original Python version's indexes.
Telling Claude to use this MCP
By default Claude will use Grep/Glob/Read to explore code. To make it prefer the MCP tools, add the following to your project's CLAUDE.md:
## Code search
An astllm-mcp index is available for this project. Prefer MCP tools over Grep/Glob/Read for all code exploration:
- `search_symbols` — find functions, classes, methods by name (use this first)
- `get_file_outline` — list all symbols in a file before deciding to read it
- `get_repo_outline` — understand project structure without reading files
- `get_symbol` — read a specific function/class source (O(1), much cheaper than reading the file)
- `get_symbols` — batch-read multiple symbols in one call
- `search_text` — full-text search for strings, comments, config values
- `get_file_tree` — browse directory structure with symbol counts
Only fall back to Grep/Read when the MCP tools cannot cover the case (e.g. a file type not indexed by tree-sitter).
The repo identifier to pass to MCP tools is local/<folder-name> for locally indexed folders (e.g. local/src). Use list_repos if unsure.
Security
- Path traversal and symlink-escape protection
- Secret files excluded (.env, *.pem, *.key, credentials, etc.)
- Binary files excluded by extension and content sniffing
- File size limits enforced before reading
Single-file binaries (no Node.js required)
Uses Bun to produce self-contained executables. All JS and native tree-sitter .node addons are embedded — users just download and run, no npm install or Node.js needed.
Prerequisites: install Bun once (curl -fsSL https://bun.sh/install | bash), then:
npm run build:macosx-arm # → dist/astllm-mcp-macosx-arm (run on macOS ARM)
npm run build:linux-x86 # → dist/astllm-mcp-linux-x86 (run on Linux x86)
npm run build:linux-arm # → dist/astllm-mcp-linux-arm (run on Linux ARM)
Each build script must run on the matching platform. The grammar packages ship prebuilt `.node` files for all platforms, but the `tree-sitter` core is compiled from source on install. `scripts/prep-bun-build.mjs` (run automatically before each binary build) copies the compiled `.node` into the location Bun expects. For CI, use a matrix — Linux x86 and Linux ARM can both build on Linux via Docker/QEMU; macOS ARM requires a macOS runner.
How it works: tree-sitter and all grammar packages support `bun build --compile` via a statically-analyzable `require()` path. Bun embeds the correct native addon for the target and extracts it to a temp directory on first run.
Development
npm run build # compile TypeScript → dist/
npm run dev # run directly with tsx (no compile step)
The project is TypeScript ESM. All local imports use .js extensions (TypeScript NodeNext resolution).
Storage layout
~/.code-index/
<owner>/
<repo>/
index.json # symbol index with byte offsets
files/ # raw file copies for byte-offset seeking
src/
auth.ts
...
_savings.json # cumulative token savings
Inspiration
This tool was inspired by: https://github.com/jgravelle/jcodemunch-mcp
I needed simpler distribution and several features it did not have.