astllm-mcp

MCP server for efficient code indexing and symbol retrieval. Index GitHub repos or local folders once with tree-sitter AST parsing, then let AI agents retrieve only the specific symbols they need — instead of loading entire files.

Ships as a single-file binary for trivial deployment.

Cut code-reading token costs by up to 99%.

How it works

  1. Index — fetch source files, parse ASTs with tree-sitter, store symbols with byte offsets
  2. Explore — browse file trees and outlines without touching file content
  3. Retrieve — fetch only the exact function/class/method you need via O(1) byte-offset seek
  4. Savings — every response reports tokens saved vs loading raw files

The index is stored locally in ~/.code-index/ (configurable). Incremental re-indexing only re-parses changed files.
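The O(1) retrieval in step 3 boils down to a positioned read: with the [startByte, endByte) offsets stored in the index, only the symbol's bytes are read, never the whole file. A minimal sketch of the idea (the readSymbol helper is illustrative, not the server's actual code):

```typescript
import { open } from "node:fs/promises";

// Illustrative helper: read one symbol's source via a single positioned
// read at its indexed byte offset, without loading the rest of the file.
async function readSymbol(path: string, startByte: number, endByte: number): Promise<string> {
  const fh = await open(path, "r");
  try {
    const buf = Buffer.alloc(endByte - startByte);
    // One pread-style call; cost does not depend on file size.
    await fh.read(buf, 0, buf.length, startByte);
    return buf.toString("utf8");
  } finally {
    await fh.close();
  }
}
```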

The server automatically indexes the working directory on startup (incremental, non-blocking). Optionally set ASTLLM_WATCH=1 to also watch for file changes and re-index automatically.

Supported languages

Python, JavaScript, TypeScript, TSX, Go, Rust, Java, PHP, Dart, C#, C, C++

Installation

Option 1: Download a pre-built binary (recommended)

Download the binary for your platform from the GitHub Releases page:

Platform                File
macOS ARM (M1/M2/M3)    astllm-mcp-macosx-arm
Linux x86-64            astllm-mcp-linux-x86
Linux ARM64             astllm-mcp-linux-arm

# Example for Linux x86-64
curl -L https://github.com/tluyben/astllm-mcp/releases/latest/download/astllm-mcp-linux-x86 -o astllm-mcp
chmod +x astllm-mcp
./astllm-mcp   # runs as an MCP stdio server

No Node.js, no npm, no build tools required.

Option 2: Build from source

Requires Node.js 18+ and a C++20-capable compiler (for tree-sitter native bindings).

git clone https://github.com/tluyben/astllm-mcp
cd astllm-mcp
CXXFLAGS="-std=c++20" npm install --legacy-peer-deps
npm run build

Note on Node.js v22+: The CXXFLAGS="-std=c++20" flag is required because Node.js v22+ v8 headers mandate C++20. The --legacy-peer-deps flag is needed because tree-sitter grammar packages target slightly different tree-sitter core versions.

MCP client configuration

Claude Code

Option A — claude mcp add CLI (easiest):

# Pre-built binary, project-scoped (.mcp.json)
claude mcp add astllm /path/to/astllm-mcp-linux-x86 --scope project

# Pre-built binary, user-scoped (~/.claude.json)
claude mcp add astllm /path/to/astllm-mcp-linux-x86 --scope user

# From source (Node.js), project-scoped
claude mcp add astllm --scope project -- node /path/to/astllm-mcp/dist/index.js

Option B — manual JSON config:

Add to ~/.claude.json (global) or .mcp.json in your project root (project-scoped):

Pre-built binary:

{
  "mcpServers": {
    "astllm": {
      "command": "/path/to/astllm-mcp-linux-x86",
      "type": "stdio"
    }
  }
}

From source (Node.js):

{
  "mcpServers": {
    "astllm": {
      "command": "node",
      "args": ["/path/to/astllm-mcp/dist/index.js"],
      "type": "stdio"
    }
  }
}

Claude Desktop

Add to ~/Library/Application Support/Claude/claude_desktop_config.json (macOS) or %APPDATA%\Claude\claude_desktop_config.json (Windows):

Pre-built binary:

{
  "mcpServers": {
    "astllm": {
      "command": "/path/to/astllm-mcp-macosx-arm"
    }
  }
}

From source (Node.js):

{
  "mcpServers": {
    "astllm": {
      "command": "node",
      "args": ["/path/to/astllm-mcp/dist/index.js"]
    }
  }
}

Tools

Indexing

index_repo

Index a GitHub repository. Fetches source files via the GitHub API, parses ASTs, stores symbols locally.

repo_url            GitHub URL or "owner/repo" slug
generate_summaries  Generate one-line AI summaries (requires API key, default: false)
incremental         Only re-index changed files (default: true)
storage_path        Custom storage directory

index_folder

Index a local folder recursively.

folder_path             Path to index
generate_summaries      AI summaries (default: false)
extra_ignore_patterns   Additional gitignore-style patterns
follow_symlinks         Follow symlinks (default: false)
incremental             Only re-index changed files (default: true)
storage_path            Custom storage directory

Navigation

list_repos

List all indexed repositories with file count, symbol count, and last-indexed time.

get_repo_outline

High-level overview: directory breakdown, language distribution, symbol kind counts.

repo    Repository identifier ("owner/repo" or short name if unique)

get_file_tree

File and directory structure with per-file language and symbol count. Much cheaper than reading files.

repo               Repository identifier
path_prefix        Filter to a subdirectory
include_summaries  Include per-file summaries

get_file_outline

All symbols in a file as a hierarchical tree (methods nested under their class).

repo       Repository identifier
file_path  File path relative to repo root

Retrieval

get_symbol

Full source code for a single symbol, retrieved by byte-offset seek (O(1)).

repo          Repository identifier
symbol_id     Symbol ID from get_file_outline or search_symbols
verify        Check content hash for drift detection (default: false)
context_lines Lines of context around the symbol (0–50, default: 0)

get_symbols

Batch retrieval of multiple symbols in one call.

repo        Repository identifier
symbol_ids  Array of symbol IDs

Search

search_symbols

Search symbols by name, kind, language, or file pattern. Returns signatures and summaries — no source loaded until you call get_symbol.

repo          Repository identifier
query         Search query
kind          Filter: function | class | method | type | constant | interface
file_pattern  Glob pattern, e.g. "src/**/*.ts"
language      Filter by language
limit         Max results 1–100 (default: 50)
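For reference, this is roughly what a search_symbols call looks like on the wire: MCP stdio servers speak JSON-RPC 2.0, and tools are invoked via the tools/call method. The argument values here ("local/src", "login") are illustrative:

```typescript
// A tools/call request for search_symbols over MCP stdio (JSON-RPC 2.0).
// The argument values ("local/src", "login") are examples, not fixed names.
const request = {
  jsonrpc: "2.0",
  id: 1,
  method: "tools/call",
  params: {
    name: "search_symbols",
    arguments: { repo: "local/src", query: "login", kind: "method", limit: 10 },
  },
};
// Stdio MCP clients deliver requests as newline-delimited JSON on stdin.
const wire = JSON.stringify(request) + "\n";
```

In practice your MCP client (Claude Code, Claude Desktop) builds these messages for you; the shape is shown only to make the parameter tables above concrete.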

search_text

Full-text search across indexed file contents. Useful for string literals, comments, config values.

repo          Repository identifier
query         Case-insensitive substring
file_pattern  Glob pattern to restrict files
limit         Max matching lines (default: 100)

Cache

invalidate_cache

Delete a repository's index, forcing full re-index on next operation.

repo    Repository identifier

Symbol IDs

Symbol IDs have the format file/path::qualified.Name#kind, for example:

src/auth/login.ts::AuthService.login#method
src/utils.go::parseURL#function
lib/models.py::User#class

Get IDs from get_file_outline or search_symbols, then pass them to get_symbol.
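Since the :: and # separators are fixed, an ID can be split mechanically. A hypothetical parser (not part of the server's API) makes the three parts explicit:

```typescript
// Split "file/path::qualified.Name#kind" into its three parts.
// Illustrative only; uses lastIndexOf("#") so the kind is always the
// final #-delimited segment.
function parseSymbolId(id: string): { file: string; name: string; kind: string } {
  const sep = id.indexOf("::");
  const hash = id.lastIndexOf("#");
  return {
    file: id.slice(0, sep),
    name: id.slice(sep + 2, hash),
    kind: id.slice(hash + 1),
  };
}

// parseSymbolId("src/auth/login.ts::AuthService.login#method")
// → { file: "src/auth/login.ts", name: "AuthService.login", kind: "method" }
```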

Token savings

Every response includes a _meta envelope:

{
  "_meta": {
    "timing_ms": 2.1,
    "tokens_saved": 14823,
    "total_tokens_saved": 89412,
    "cost_avoided_claude_usd": 0.222345,
    "cost_avoided_gpt_usd": 0.148230,
    "total_cost_avoided_claude_usd": 1.34118
  }
}

AI summaries (optional)

Set one of these environment variables to enable one-line symbol summaries:

# Anthropic Claude Haiku (recommended)
export ANTHROPIC_API_KEY=sk-ant-...

# Google Gemini Flash
export GOOGLE_API_KEY=...

# OpenAI-compatible (Ollama, etc.)
export OPENAI_BASE_URL=http://localhost:11434/v1
export OPENAI_MODEL=llama3

Summaries use a three-tier fallback: docstring first-line → AI → signature.
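The fallback order can be sketched as a simple chain; the field names and the ai callback below are assumptions for illustration, not the server's internals:

```typescript
// Three-tier summary fallback: docstring first line, then AI, then signature.
async function summarize(
  sym: { docstring?: string; signature: string },
  ai?: (signature: string) => Promise<string>,
): Promise<string> {
  if (sym.docstring) return sym.docstring.split("\n")[0]; // tier 1: docstring first line
  if (ai) {
    try {
      return await ai(sym.signature); // tier 2: AI, only if a key is configured
    } catch {
      // fall through to tier 3 on API failure
    }
  }
  return sym.signature; // tier 3: raw signature
}
```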

Environment variables

Variable                  Default        Description
CODE_INDEX_PATH           ~/.code-index  Index storage directory
GITHUB_TOKEN              (unset)        GitHub API token (higher rate limits, private repos)
ASTLLM_MAX_INDEX_FILES    500            Max files to index per repo
ASTLLM_MAX_FILE_SIZE_KB   500            Max file size to index (KB)
ASTLLM_LOG_LEVEL          warn           Log level: debug, info, warn, error
ASTLLM_LOG_FILE           (unset)        Log to file instead of stderr
ASTLLM_WATCH              0              Watch the working directory for source changes and re-index automatically (1 or true to enable)
ASTLLM_PERSIST            0              Persist the index to ~/.astllm/{path}.json after every index and pre-load it on startup (1 or true to enable)
ANTHROPIC_API_KEY         (unset)        Enable Claude Haiku summaries
GOOGLE_API_KEY            (unset)        Enable Gemini Flash summaries
OPENAI_BASE_URL           (unset)        Enable local LLM summaries

Legacy JASTLLM_* variable names are also accepted for compatibility with the original Python version's indexes.

Telling Claude to use this MCP

By default Claude will use Grep/Glob/Read to explore code. To make it prefer the MCP tools, add the following to your project's CLAUDE.md:

## Code search

An astllm-mcp index is available for this project. Prefer MCP tools over Grep/Glob/Read for all code exploration:

- `search_symbols` — find functions, classes, methods by name (use this first)
- `get_file_outline` — list all symbols in a file before deciding to read it
- `get_repo_outline` — understand project structure without reading files
- `get_symbol` — read a specific function/class source (O(1), much cheaper than reading the file)
- `get_symbols` — batch-read multiple symbols in one call
- `search_text` — full-text search for strings, comments, config values
- `get_file_tree` — browse directory structure with symbol counts

Only fall back to Grep/Read when the MCP tools cannot cover the case (e.g. a file type not indexed by tree-sitter).

The repo identifier to pass to MCP tools is local/<folder-name> for locally indexed folders (e.g. local/src). Use list_repos if unsure.

Security

  • Path traversal and symlink-escape protection
  • Secret files excluded (.env, *.pem, *.key, credentials, etc.)
  • Binary files excluded by extension and content sniffing
  • File size limits enforced before reading

Single-file binaries (no Node.js required)

Uses Bun to produce self-contained executables. All JS and native tree-sitter .node addons are embedded — users just download and run, no npm install or Node.js needed.

Prerequisites: install Bun once (curl -fsSL https://bun.sh/install | bash), then:

npm run build:macosx-arm   # → dist/astllm-mcp-macosx-arm  (run on macOS ARM)
npm run build:linux-x86    # → dist/astllm-mcp-linux-x86   (run on Linux x86)
npm run build:linux-arm    # → dist/astllm-mcp-linux-arm   (run on Linux ARM)

Each build script must run on the matching platform. The grammar packages ship prebuilt .node files for all platforms, but the tree-sitter core is compiled from source on install. scripts/prep-bun-build.mjs (run automatically before each binary build) copies the compiled .node into the location Bun expects. For CI, use a matrix — Linux x86 and Linux ARM can both build on Linux via Docker/QEMU; macOS ARM requires a macOS runner.

How it works: tree-sitter and all grammar packages support bun build --compile via a statically-analyzable require() path. Bun embeds the correct native addon for the target and extracts it to a temp directory on first run.

Development

npm run build   # compile TypeScript → dist/
npm run dev     # run directly with tsx (no compile step)

The project is TypeScript ESM. All local imports use .js extensions (TypeScript NodeNext resolution).

Storage layout

~/.code-index/
  <owner>/
    <repo>/
      index.json        # symbol index with byte offsets
      files/            # raw file copies for byte-offset seeking
        src/
          auth.ts
          ...
  _savings.json         # cumulative token savings

Inspiration

This tool was inspired by: https://github.com/jgravelle/jcodemunch-mcp

I needed simpler distribution and several features it did not have.
