# semantic-code-mcp

A local MCP server that provides semantic code search for Claude Code. Instead of iterative grep/glob, it parses your codebase with tree-sitter, indexes it with embeddings in LanceDB, and returns results ranked by meaning rather than text matching.
Supports Python, Rust, and Markdown — more languages planned.
## How It Works
```
Claude Code ──(MCP/STDIO)──▶ semantic-code-mcp server
                                         │
                         ┌───────────────┼───────────────┐
                         ▼               ▼               ▼
                    AST Chunker      Embedder        LanceDB
                   (tree-sitter) (sentence-trans)   (vectors)
```
- Chunking — tree-sitter parses source files into functions, classes, methods, structs, traits, markdown sections, etc.
- Embedding — sentence-transformers encodes each chunk (all-MiniLM-L6-v2, 384d)
- Storage — vectors stored in LanceDB (embedded, like SQLite)
- Search — hybrid semantic + keyword search with recency boosting
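The blended scoring can be sketched roughly like this; the weights, decay curve, and function name are illustrative assumptions, not the server's actual values:

```python
import time

def hybrid_score(semantic_sim: float, keyword_score: float, mtime: float,
                 alpha: float = 0.7, half_life_days: float = 30.0) -> float:
    """Blend semantic and keyword scores, then boost recently modified files."""
    base = alpha * semantic_sim + (1 - alpha) * keyword_score
    age_days = (time.time() - mtime) / 86400
    recency = 0.5 ** (age_days / half_life_days)  # exponential decay with age
    return base * (1 + 0.1 * recency)             # up to a 10% recency boost
```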
Indexing is incremental (mtime-based) and uses git ls-files for fast file discovery. The embedding model loads lazily on first query.
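The incremental scan reduces to a pure mtime comparison over the paths `git ls-files` reports; this helper is a hypothetical sketch, not the server's actual API:

```python
from pathlib import Path

def files_to_reindex(tracked: list[str], root: Path,
                     last_indexed: dict[str, float]) -> list[str]:
    """Keep only new or modified files from a `git ls-files` listing.

    `last_indexed` maps relative paths to the mtime recorded at index time.
    """
    changed = []
    for rel in tracked:
        path = root / rel
        if path.exists() and last_indexed.get(rel) != path.stat().st_mtime:
            changed.append(rel)  # new file, or mtime differs, so re-chunk it
    return changed
```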
## Installation

### macOS / Windows
PyPI ships CPU-only torch on these platforms, so no extra flags are needed (~1.7GB install).
```bash
uvx semantic-code-mcp
```

Claude Code integration:

```bash
claude mcp add --scope user semantic-code -- uvx semantic-code-mcp
```
### Linux
> [!IMPORTANT]
> Without the `--index` flag, PyPI installs CUDA-bundled torch (~3.5GB). Unless you need GPU acceleration (you don't — embeddings run on CPU), use the command below to get the CPU-only build (~1.7GB).
```bash
uvx --index pytorch-cpu=https://download.pytorch.org/whl/cpu semantic-code-mcp
```

Claude Code integration:

```bash
claude mcp add --scope user semantic-code -- \
  uvx --index pytorch-cpu=https://download.pytorch.org/whl/cpu semantic-code-mcp
```
<details> <summary>Claude Desktop / other MCP clients (JSON config)</summary>
```json
{
  "mcpServers": {
    "semantic-code": {
      "command": "uvx",
      "args": ["--index", "pytorch-cpu=https://download.pytorch.org/whl/cpu", "semantic-code-mcp"]
    }
  }
}
```
On macOS/Windows you can omit the `--index` and `pytorch-cpu` args.
</details>
## Updating

uvx caches the installed version. To get the latest release:

```bash
uvx --upgrade semantic-code-mcp
```

Or pin a specific version in your MCP config:

```bash
claude mcp add --scope user semantic-code -- uvx semantic-code-mcp@0.2.0
```
## MCP Tools

### search_code
Search code by meaning, not just text matching. Auto-indexes on first search.
| Parameter | Type | Default | Description |
|---|---|---|---|
| `query` | `str` | required | Natural language description of what you're looking for |
| `project_path` | `str` | required | Absolute path to the project root |
| `limit` | `int` | `10` | Maximum number of results |

Returns ranked results with `file_path`, `line_start`, `line_end`, `name`, `chunk_type`, `content`, and `score`.
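For illustration, a single result entry might look like this (all values invented for the example):

```python
# A hypothetical search_code result entry showing the documented fields;
# the path, name, content, and score are made up.
result = {
    "file_path": "src/auth/session.py",
    "line_start": 42,
    "line_end": 67,
    "name": "refresh_token",
    "chunk_type": "function",
    "content": "def refresh_token(session): ...",
    "score": 0.83,
}
```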
### index_codebase

Index a codebase for semantic search. Only processes new and changed files unless `force=True`.

| Parameter | Type | Default | Description |
|---|---|---|---|
| `project_path` | `str` | required | Absolute path to the project root |
| `force` | `bool` | `False` | Re-index all files regardless of changes |
### index_status

Check indexing status for a project.

| Parameter | Type | Default | Description |
|---|---|---|---|
| `project_path` | `str` | required | Absolute path to the project root |

Returns `is_indexed`, `files_count`, and `chunks_count`.
## Configuration

All settings are environment variables with the `SEMANTIC_CODE_MCP_` prefix (via pydantic-settings):
| Variable | Default | Description |
|---|---|---|
| `SEMANTIC_CODE_MCP_CACHE_DIR` | `~/.cache/semantic-code-mcp` | Where indexes are stored |
| `SEMANTIC_CODE_MCP_LOCAL_INDEX` | `false` | Store index in `.semantic-code/` within each project |
| `SEMANTIC_CODE_MCP_EMBEDDING_MODEL` | `all-MiniLM-L6-v2` | Sentence-transformers model |
| `SEMANTIC_CODE_MCP_DEBUG` | `false` | Enable debug logging |
| `SEMANTIC_CODE_MCP_PROFILE` | `false` | Enable pyinstrument profiling |
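The prefix convention can be sketched in plain Python; pydantic-settings layers typing, defaults, and validation on top of this idea (the helper below is illustrative, not the server's code):

```python
import os

def load_settings(prefix: str = "SEMANTIC_CODE_MCP_") -> dict[str, str]:
    """Collect prefixed environment variables into a settings dict.

    A minimal stand-in for what pydantic-settings does automatically.
    """
    return {
        key[len(prefix):].lower(): value
        for key, value in os.environ.items()
        if key.startswith(prefix)
    }
```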
Pass environment variables via the `env` field in your MCP config:

```json
{
  "mcpServers": {
    "semantic-code": {
      "command": "uvx",
      "args": ["semantic-code-mcp"],
      "env": {
        "SEMANTIC_CODE_MCP_DEBUG": "true",
        "SEMANTIC_CODE_MCP_LOCAL_INDEX": "true"
      }
    }
  }
}
```
Or with Claude Code CLI:

```bash
claude mcp add --scope user semantic-code \
  -e SEMANTIC_CODE_MCP_DEBUG=true \
  -e SEMANTIC_CODE_MCP_LOCAL_INDEX=true \
  -- uvx semantic-code-mcp
```
## Tech Stack
| Component | Choice | Rationale |
|---|---|---|
| MCP Framework | FastMCP | Python decorators, STDIO transport |
| Embeddings | sentence-transformers | Local, no API costs, good quality |
| Vector Store | LanceDB | Embedded (like SQLite), no server needed |
| Chunking | tree-sitter | AST-based, respects code structure |
## Development

```bash
uv sync                              # Install dependencies
uv run python -m semantic_code_mcp  # Run server
uv run pytest                        # Run tests
uv run ruff check src/               # Lint
uv run ruff format src/              # Format
```
Pre-commit hooks enforce linting, formatting, type-checking (ty), security scanning (bandit), and Conventional Commits.
## Releasing

Versions are derived from git tags automatically (hatch-vcs) — there's no hardcoded version in `pyproject.toml`.
```bash
git tag v0.2.0
git push origin v0.2.0
```
CI builds the package, publishes to PyPI, and creates a GitHub Release with auto-generated notes.
## Adding a New Language
The chunker system is designed to make adding languages straightforward. Each language needs:
- A tree-sitter grammar package (e.g. `tree-sitter-javascript`)
- A chunker subclass that walks the AST and extracts meaningful chunks
Steps:

```bash
uv add tree-sitter-mylang
```

Create `src/semantic_code_mcp/chunkers/mylang.py`:
```python
from enum import StrEnum, auto

import tree_sitter_mylang as tsmylang
from tree_sitter import Language, Node

from semantic_code_mcp.chunkers.base import BaseTreeSitterChunker
from semantic_code_mcp.models import Chunk, ChunkType


class NodeType(StrEnum):
    function_definition = auto()
    # ... other node types


class MyLangChunker(BaseTreeSitterChunker):
    language = Language(tsmylang.language())
    extensions = (".ml",)

    def _extract_chunks(self, root: Node, file_path: str, lines: list[str]) -> list[Chunk]:
        chunks = []
        for node in root.children:
            match node.type:
                case NodeType.function_definition:
                    name = node.child_by_field_name("name").text.decode()
                    chunks.append(self._make_chunk(node, file_path, lines, ChunkType.function, name))
                # ... other node types
        return chunks
```
Register it in `src/semantic_code_mcp/container.py`:

```python
from semantic_code_mcp.chunkers.mylang import MyLangChunker

def get_chunkers(self) -> list[BaseTreeSitterChunker]:
    return [PythonChunker(), RustChunker(), MarkdownChunker(), MyLangChunker()]
```
The `CompositeChunker` handles dispatch by file extension automatically. Use `BaseTreeSitterChunker._make_chunk()` for consistent chunk construction. See `chunkers/python.py` and `chunkers/rust.py` for complete examples.
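A minimal sketch of extension-based dispatch, assuming the chunker interface shown above; the real `CompositeChunker` may differ:

```python
from pathlib import Path

class CompositeChunker:
    """Dispatch each file to the chunker that claims its extension.

    An illustrative sketch of the dispatch idea, not the project's actual class.
    """

    def __init__(self, chunkers: list) -> None:
        # Later chunkers win on extension conflicts; real code may differ.
        self.by_ext = {ext: c for c in chunkers for ext in c.extensions}

    def chunk_file(self, path: str, source: str) -> list:
        chunker = self.by_ext.get(Path(path).suffix)
        return chunker.chunk(source, path) if chunker else []
```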
## Project Structure

- `src/semantic_code_mcp/chunkers/` — language chunkers (`base.py`, `composite.py`, `python.py`, `rust.py`, `markdown.py`)
- `src/semantic_code_mcp/services/` — IndexService (scan/chunk/index), SearchService (search + auto-index)
- `src/semantic_code_mcp/indexer.py` — embed + store pipeline
- `docs/decisions/` — architecture decision records
- `TODO.md` — epics and planning
- `CHANGELOG.md` — completed work (Keep a Changelog format)
- `.claude/rules/` — context-specific coding rules for AI agents
## License
MIT