semantic-code-mcp

MCP server that provides semantic code search for Claude Code. Instead of iterative grep/glob, it indexes your codebase with embeddings and returns ranked results by meaning.

Supports Python, Rust, and Markdown — more languages planned.

How It Works

Claude Code ──(MCP/STDIO)──▶ semantic-code-mcp server
                                    │
                    ┌───────────────┼───────────────┐
                    ▼               ▼               ▼
              AST Chunker      Embedder        LanceDB
             (tree-sitter)  (sentence-trans)  (vectors)

  1. Chunking — tree-sitter parses source files into functions, classes, methods, structs, traits, markdown sections, etc.
  2. Embedding — sentence-transformers encodes each chunk (all-MiniLM-L6-v2, 384d)
  3. Storage — vectors stored in LanceDB (embedded, like SQLite)
  4. Search — hybrid semantic + keyword search with recency boosting
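The hybrid ranking in step 4 combines three signals. A minimal sketch (the weights and the recency half-life are illustrative assumptions, not the server's actual tuning):

```python
import math
import time


def hybrid_score(
    semantic_sim: float,
    query_terms: set[str],
    chunk_text: str,
    mtime: float,
    half_life_days: float = 30.0,
) -> float:
    """Blend vector similarity, keyword overlap, and a recency boost.

    Weights and half-life are illustrative assumptions, not the
    server's actual tuning.
    """
    words = set(chunk_text.lower().split())
    keyword = len(query_terms & words) / len(query_terms) if query_terms else 0.0
    age_days = max(0.0, time.time() - mtime) / 86400
    recency = math.exp(-math.log(2) * age_days / half_life_days)  # halves every 30 days
    return 0.7 * semantic_sim + 0.2 * keyword + 0.1 * recency
```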

Indexing is incremental (mtime-based) and uses git ls-files for fast file discovery. The embedding model loads lazily on first query.

Installation

macOS / Windows

PyPI ships CPU-only torch on these platforms, so no extra flags are needed (~1.7GB install).

uvx semantic-code-mcp

Claude Code integration:

claude mcp add --scope user semantic-code -- uvx semantic-code-mcp

Linux

> [!IMPORTANT]
> Without the --index flag, PyPI installs CUDA-bundled torch (~3.5GB). Unless you need GPU acceleration (you don't — embeddings run on CPU), use the command below to get the CPU-only build (~1.7GB).

uvx --index pytorch-cpu=https://download.pytorch.org/whl/cpu semantic-code-mcp

Claude Code integration:

claude mcp add --scope user semantic-code -- \
  uvx --index pytorch-cpu=https://download.pytorch.org/whl/cpu semantic-code-mcp

<details> <summary>Claude Desktop / other MCP clients (JSON config)</summary>

{
  "mcpServers": {
    "semantic-code": {
      "command": "uvx",
      "args": ["--index", "pytorch-cpu=https://download.pytorch.org/whl/cpu", "semantic-code-mcp"]
    }
  }
}

On macOS/Windows you can omit the --index and pytorch-cpu args.

</details>

Updating

uvx caches the installed version. To get the latest release:

uvx --upgrade semantic-code-mcp

Or pin a specific version in your MCP config:

claude mcp add --scope user semantic-code -- uvx semantic-code-mcp@0.2.0

MCP Tools

search_code

Search code by meaning, not just text matching. Auto-indexes on first search.

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| query | str | required | Natural language description of what you're looking for |
| project_path | str | required | Absolute path to the project root |
| limit | int | 10 | Maximum number of results |

Returns ranked results with file_path, line_start, line_end, name, chunk_type, content, and score.
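An example result entry, shaped as the fields above (all values here are invented for illustration):

```python
# Hypothetical search_code result entry; every value is invented for illustration.
result = {
    "file_path": "src/semantic_code_mcp/indexer.py",
    "line_start": 42,
    "line_end": 67,
    "name": "index_files",
    "chunk_type": "function",
    "content": "def index_files(paths): ...",
    "score": 0.83,
}
```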

index_codebase

Index a codebase for semantic search. Only processes new and changed files unless force=True.

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| project_path | str | required | Absolute path to the project root |
| force | bool | False | Re-index all files regardless of changes |

index_status

Check indexing status for a project.

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| project_path | str | required | Absolute path to the project root |

Returns is_indexed, files_count, and chunks_count.

Configuration

All settings are environment variables with the SEMANTIC_CODE_MCP_ prefix (via pydantic-settings):

| Variable | Default | Description |
| --- | --- | --- |
| SEMANTIC_CODE_MCP_CACHE_DIR | ~/.cache/semantic-code-mcp | Where indexes are stored |
| SEMANTIC_CODE_MCP_LOCAL_INDEX | false | Store index in .semantic-code/ within each project |
| SEMANTIC_CODE_MCP_EMBEDDING_MODEL | all-MiniLM-L6-v2 | Sentence-transformers model |
| SEMANTIC_CODE_MCP_DEBUG | false | Enable debug logging |
| SEMANTIC_CODE_MCP_PROFILE | false | Enable pyinstrument profiling |

Pass environment variables via the env field in your MCP config:

{
  "mcpServers": {
    "semantic-code": {
      "command": "uvx",
      "args": ["semantic-code-mcp"],
      "env": {
        "SEMANTIC_CODE_MCP_DEBUG": "true",
        "SEMANTIC_CODE_MCP_LOCAL_INDEX": "true"
      }
    }
  }
}

Or with Claude Code CLI:

claude mcp add --scope user semantic-code \
  -e SEMANTIC_CODE_MCP_DEBUG=true \
  -e SEMANTIC_CODE_MCP_LOCAL_INDEX=true \
  -- uvx semantic-code-mcp

Tech Stack

| Component | Choice | Rationale |
| --- | --- | --- |
| MCP Framework | FastMCP | Python decorators, STDIO transport |
| Embeddings | sentence-transformers | Local, no API costs, good quality |
| Vector Store | LanceDB | Embedded (like SQLite), no server needed |
| Chunking | tree-sitter | AST-based, respects code structure |

Development

uv sync                            # Install dependencies
uv run python -m semantic_code_mcp # Run server
uv run pytest                      # Run tests
uv run ruff check src/             # Lint
uv run ruff format src/            # Format

Pre-commit hooks enforce linting, formatting, type-checking (ty), security scanning (bandit), and Conventional Commits.

Releasing

Versions are derived from git tags automatically (hatch-vcs) — there's no hardcoded version in pyproject.toml.
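The corresponding pyproject.toml wiring looks roughly like this (a sketch of the standard hatch-vcs setup, not copied from the repo):

```toml
[build-system]
requires = ["hatchling", "hatch-vcs"]
build-backend = "hatchling.build"

[tool.hatch.version]
# Derive the package version from the most recent git tag.
source = "vcs"
```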

git tag v0.2.0
git push origin v0.2.0

CI builds the package, publishes to PyPI, and creates a GitHub Release with auto-generated notes.

Adding a New Language

The chunker system is designed to make adding languages straightforward. Each language needs:

  1. A tree-sitter grammar package (e.g. tree-sitter-javascript)
  2. A chunker subclass that walks the AST and extracts meaningful chunks

Steps:

uv add tree-sitter-mylang

Create src/semantic_code_mcp/chunkers/mylang.py:

from enum import StrEnum, auto

import tree_sitter_mylang as tsmylang
from tree_sitter import Language, Node

from semantic_code_mcp.chunkers.base import BaseTreeSitterChunker
from semantic_code_mcp.models import Chunk, ChunkType


class NodeType(StrEnum):
    function_definition = auto()
    # ... other node types


class MyLangChunker(BaseTreeSitterChunker):
    language = Language(tsmylang.language())
    extensions = (".ml",)

    def _extract_chunks(self, root: Node, file_path: str, lines: list[str]) -> list[Chunk]:
        chunks = []
        for node in root.children:
            match node.type:
                case NodeType.function_definition:
                    name = node.child_by_field_name("name").text.decode()
                    chunks.append(self._make_chunk(node, file_path, lines, ChunkType.function, name))
                # ... other node types
        return chunks

Register it in src/semantic_code_mcp/container.py:

from semantic_code_mcp.chunkers.mylang import MyLangChunker

def get_chunkers(self) -> list[BaseTreeSitterChunker]:
    return [PythonChunker(), RustChunker(), MarkdownChunker(), MyLangChunker()]

The CompositeChunker handles dispatch by file extension automatically. Use BaseTreeSitterChunker._make_chunk() for consistent chunk construction. See chunkers/python.py and chunkers/rust.py for complete examples.
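Extension dispatch amounts to something like this (a simplified sketch, not the actual CompositeChunker implementation):

```python
from pathlib import Path


class CompositeChunker:
    """Route each file to the chunker that claims its extension.

    A simplified sketch; the real CompositeChunker likely differs.
    """

    def __init__(self, chunkers):
        # Build one lookup table from extension to chunker.
        self._by_ext = {
            ext: chunker for chunker in chunkers for ext in chunker.extensions
        }

    def chunk_file(self, file_path: str, source: str):
        chunker = self._by_ext.get(Path(file_path).suffix)
        if chunker is None:
            return []  # unsupported language: skip the file
        return chunker.chunk(file_path, source)
```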

Project Structure

  • src/semantic_code_mcp/chunkers/ — language chunkers (base.py, composite.py, python.py, rust.py, markdown.py)
  • src/semantic_code_mcp/services/ — IndexService (scan/chunk/index), SearchService (search + auto-index)
  • src/semantic_code_mcp/indexer.py — embed + store pipeline
  • docs/decisions/ — architecture decision records
  • TODO.md — epics and planning
  • CHANGELOG.md — completed work (Keep a Changelog format)
  • .claude/rules/ — context-specific coding rules for AI agents

License

MIT
