MCP Server

MCPunk

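Chat with your codebase through intelligent code search, without embeddings. MCPunk breaks files into logical chunks, gives the LLM tools to search those chunks, and lets it find the specific code it needs to answer your questions.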

Tools

get_a_joke

Get a really funny joke! For testing :)

configure_project

Configure a new project containing files. Each file in the project is split into 'chunks' - logical sections like functions, classes, markdown sections, and import blocks.

After configuring, a common workflow is:
1. list_all_files_in_project to get an overview of the project (with an initial limit on the depth of the search)
2. Find files by function/class definition: find_files_by_chunk_content(... ["def my_funk"])
3. Find files by function/class usage: find_files_by_chunk_content(... ["my_funk"])
4. Determine which chunks in the found files are relevant: find_matching_chunks_in_file(...)
5. Get details about the chunks: chunk_details(...)

Use ~ (tilde) literally if the user specifies it in paths.

list_all_files_in_project

List all files in a project, returning a file tree. This is useful for getting an overview of the project, or of specific subdirectories of the project. A project may have many files, so it is best to start with a depth limit to get an overview, and then keep increasing the depth limit with a filter to look at specific subdirectories.

find_files_by_chunk_content

Step 1: Find files containing chunks with matching text. Returns a file tree showing only which files contain matches. You must use find_matching_chunks_in_file on each relevant file to see the actual matches.

Example workflow:
1. Find files: files = find_files_by_chunk_content(project, ["MyClass"])
2. For each file, find actual matches: matches = find_matching_chunks_in_file(file, ["MyClass"])
3. Get content: content = chunk_details(file, match_id)

find_matching_chunks_in_file

Step 2: Find the actual matching chunks in a specific file. Required after find_files_by_chunk_content or list_all_files_in_project to see matches, as those tools only show files, not their contents.

This can be used for things like:
- Finding all chunks in a file that make reference to a specific function (e.g. find_matching_chunks_in_file(..., ["my_funk"]))
- Finding a chunk where a specific function is defined (e.g. find_matching_chunks_in_file(..., ["def my_funk"]))

Some chunks are split into multiple parts because they are too large. These will look like 'chunkx_part1', 'chunkx_part2', ...

chunk_details

Get full content of a specific chunk. Returns chunk content as string.

Common patterns:
1. Final step after find_matching_chunks_in_file finds relevant chunks
2. Examining implementations after finding definitions/uses

list_most_recently_checked_out_branches

List the n most recently checked out branches in the project

diff_with_ref

Return a summary of the diff between HEAD and the given ref. You probably want the ref to be the 'base' branch like develop or main, off which PRs are made - and you can likely determine this by viewing the most recently checked out branches.

README

MCPunk 🤖

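Chat with your codebase without embeddings by giving the LLM tools to search your code intelligently.

MCPunk lets you explore and understand codebases through conversation. It works by:

Breaking files into logical chunks (functions, classes, markdown sections)
Giving the LLM tools to search and query these chunks
Letting the LLM find the specific code it needs to answer your questions

No embeddings and no complex configuration: just clear, auditable searching that you can see and guide. It works well with Claude Desktop or any other MCP client.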

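GitHub Repository

Built with the following in mind:

Context is King - LLMs can be great, but only if they are given appropriate context.
Context is Precious - LLMs need context, but they can't handle too much of it. A shame! MCPunk is a form of RAG that inherently gives the LLM contextual hints, allowing it to narrow in on only the relevant content.
Human in the Loop - You can see exactly what data the LLM has considered and how it found it, and you can jump into the chat and steer things wherever you want.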

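Setup

These are instructions for Claude Desktop, but MCPunk can be used anywhere MCP is used.

Install uv
Put the snippet below in your claude_desktop_config.json (Details about claude_desktop_config.json, including its location)
Restart Claude Desktop; after a short delay you should see the tools available
Start chatting: "hey pal, can you set up the ~/git/mcpunk project and tell me how python files are chunked"

("command": "uvx", might not work, in which case you may need an absolute path, e.g. "command": "/Users/michael/.local/bin/uvx",)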

{
  "mcpServers": {
    "MCPunk": {
      "command": "uvx",
      "args": ["mcpunk"]
    }
  }
}
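If the bare "uvx" command is not found, the absolute path has to go into the config instead. The sketch below is one way to look that path up and print a ready-to-paste snippet. The helper name build_mcpunk_config is made up for illustration and is not part of MCPunk; it assumes uvx is on the PATH of the shell you run it from.

```python
import json
import shutil
from typing import Optional


def build_mcpunk_config(uvx_path: Optional[str] = None) -> dict:
    """Build the mcpServers entry, preferring an absolute path to uvx.

    Hypothetical helper for illustration only.
    """
    # shutil.which returns the absolute path of the executable, or None
    # if it is not on PATH; fall back to the bare command name in that case.
    command = uvx_path or shutil.which("uvx") or "uvx"
    return {"mcpServers": {"MCPunk": {"command": command, "args": ["mcpunk"]}}}


if __name__ == "__main__":
    print(json.dumps(build_mcpunk_config(), indent=2))
```

The printed JSON can be merged into claude_desktop_config.json by hand.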

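Usage Examples

Answer questions about your codebase

[User] Hey pal, can you set up the ~/git/mcpunk repo, then help me understand how a user of mcpunk would get the git diff of the currently checked out branch in the repo they have configured with mcpunk
[Claude] Calls configure_project
[Claude] Calls list_all_files_in_project
[Claude] Calls find_files_by_chunk_content
[Claude] ... makes a total of 21 tool calls with MCPunk.
[Claude] Now I understand how the git diff functionality works in mcpunk. Let me summarize how a user would get the git diff of the currently checked out branch in a repo configured with mcpunk:
1. First, the user needs to configure mcpunk to work with their project using the configure_project tool
2. See which branches were recently checked out (to identify which branch to compare against)
3. Then, get the diff between the current branch (HEAD) and a reference branch. This shows the differences between the current branch (HEAD) and the specified reference branch. The ref parameter should be the name of the base branch you want to compare against (e.g. "main" or "develop").
[User] Great job!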

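PR Review

[User] Hey pal, can you set up the ~/git/mcpunk repo and have a look at the current branch versus scratch/1.5
[Claude] Sets up ~/git/mcpunk and calls diff_with_ref for ref scratch/1.5
[Claude] I see you've made several changes, primarily adding a new test file tests/test_git_analysis.py...
[User] How does this fit into the rest of the codebase?
[Claude] Makes 5 tool calls for listing files by chunks, listing chunks, and fetching chunk details
[Claude] I see - the code parses git reflog output to track recently checked out branches. The main function get_recent_branches takes a repo path and returns the last N branches that were checked out, using _branches_from_reflog as a helper to parse the raw reflog output.
[User] If you were in a hurry and had to direct someone to review this work, what would you direct them to look at? Please outline discrete pieces of work
[Claude]
- The split("to ")[-1] in _branches_from_reflog looks fragile
- Test data only shows simple branch names, what about spaces/special chars?
- Is example_reflog representative of real git reflog output?
- Are we testing against actual git behavior?
[User] You're the best!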

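Work across multiple codebases

You can just ask your LLM to set up multiple projects, and it can freely query across them. This is handy when one project depends on another and they live in different repos. In that case the LLM should recognise the dependency through imports.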

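What is MCPunk & Other Background

MCPunk is an MCP server that provides tools to:

Configure a project, which is a directory of files. When configured, the files are split into logical chunks. MCPunk is built for code, but it could really be applied to any documents, even images if you want to.
Search for files in a project containing specific text
Search for chunks in a file containing specific text
View the full contents of a specific chunk

Along with this, it provides a few built-in chunkers. The most mature is the Python chunker.

MCPunk doesn't have to be used for conversation. It could be used, for example, as part of code review in a CI pipeline. It's really just general-purpose RAG.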

sequenceDiagram
    participant User
    participant Claude as Claude Desktop
    participant MCPunk as MCPunk Server
    participant Files as File System

    Note over User,Files: Setup Phase
    User->>Claude: Ask question about codebase
    Claude->>MCPunk: configure_project(root_path, project_name)
    MCPunk->>Files: Scan files in root directory

    Note over MCPunk,Files: Chunking Process
    MCPunk->>MCPunk: For each file, apply appropriate chunker:
    MCPunk->>MCPunk: - PythonChunker: functions, classes, imports
    MCPunk->>MCPunk: - MarkdownChunker: sections by headings
    MCPunk->>MCPunk: - VueChunker: template/script/style sections
    MCPunk->>MCPunk: - WholeFileChunker: fallback
    MCPunk->>MCPunk: Split chunks >10K chars into parts

    MCPunk-->>Claude: Project configured with N files

    Note over User,Files: Navigation Phase<br>(LLM freely uses all these tools repeatedly to drill in)
    Claude->>MCPunk: list_all_files_in_project(project_name)
    MCPunk-->>Claude: File tree structure

    Claude->>MCPunk: find_files_by_chunk_content(project_name, "search term")
    MCPunk-->>Claude: Files containing matching chunks

    Claude->>MCPunk: find_matching_chunks_in_file(project_name, file_path, "search term")
    MCPunk-->>Claude: List of matching chunk IDs in file

    Claude->>MCPunk: chunk_details(chunk_id)
    MCPunk-->>Claude: Full content of specific chunk

    Claude->>User: Answer based on relevant code chunks

    Note over User,Files: Optional Git Analysis
    Claude->>MCPunk: list_most_recently_checked_out_branches(project_name)
    MCPunk->>Files: Parse git reflog
    MCPunk-->>Claude: List of recent branches

    Claude->>MCPunk: diff_with_ref(project_name, "main")
    MCPunk->>Files: Generate git diff
    MCPunk-->>Claude: Diff between HEAD and reference
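The "Parse git reflog" step in the diagram can be sketched roughly as follows. This is not MCPunk's implementation: it is a regex-based alternative to the split("to ")[-1] approach that the PR review example flags as fragile, and the function name recent_branches is made up here. It assumes input in the format printed by `git reflog`, where checkouts are recorded as "checkout: moving from X to Y" and the newest entry comes first.

```python
import re

# Message git records in the reflog for `git checkout` / `git switch`.
_CHECKOUT_RE = re.compile(r"checkout: moving from (\S+) to (\S+)$")


def recent_branches(reflog_text: str, n: int) -> list[str]:
    """Return up to n distinct names, most recently checked out first.

    Hypothetical helper. Names may also be commit hashes when HEAD was
    detached; filtering those out is left to the caller.
    """
    seen: list[str] = []
    for line in reflog_text.splitlines():  # newest entries come first
        match = _CHECKOUT_RE.search(line.strip())
        if not match:
            continue  # commits, rebases, resets, etc.
        # The destination is more recent than the branch that was left.
        for name in (match.group(2), match.group(1)):
            if name not in seen:
                seen.append(name)
    return seen[:n]
```

Anchoring the pattern on the full "moving from ... to ..." message avoids mis-splitting branch names that themselves contain "to ".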

Roaming RAG Crash Course

See

https://arcturus-labs.com/blog/2024/11/21/roaming-rag--make-the-model-find-the-answers/
https://simonwillison.net/2024/Dec/6/roaming-rag/

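The gist of roaming RAG is:

Break content (a codebase, PDF files, whatever) down into "chunks". Each chunk is a "small" logical item such as a function, a section in a markdown document, or all the imports in a code file.
Give the LLM tools to search the chunks. MCPunk does this by providing tools to search for files containing chunks with specific text and to show the full contents of a specific chunk.

Compared to more traditional "vector search" RAG:

The LLM has to drill down to find chunks, and so is naturally aware of their broader context (such as which file they are in)
Chunks should always be coherent, like a whole function.
You can see exactly what the LLM is searching for, and it's usually obvious when it is searching poorly; you can help it out by suggesting better search terms.
Exact search matches are required. MCPunk does not provide any kind of fuzzy search.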
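To make the exact-match point concrete, here is a minimal sketch of substring search over chunks. The find_chunks function and the sample data are hypothetical, and matching on any of the terms (rather than all of them) is an assumption, not a statement about MCPunk's behaviour.

```python
def find_chunks(chunks: dict[str, str], terms: list[str]) -> list[str]:
    """Return names of chunks whose text contains any term as an exact substring."""
    return [name for name, text in chunks.items() if any(t in text for t in terms)]


# Hypothetical chunk name -> chunk text mapping.
chunks = {
    "imports": "import os\nimport json",
    "load_deps": "def load_deps():\n    return read('dependencies.txt')",
}

print(find_chunks(chunks, ["dependency"]))  # → []
print(find_chunks(chunks, ["dependenc"]))   # → ['load_deps']
```

"dependency" is not a substring of "dependencies", so the first search finds nothing; searching for the shared stem is the kind of hint a human can give the LLM.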

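Chunks

A chunk is a subsection of a file. For example,

A single python function
A markdown section
All the imports from a Python file

Chunks are created from a file by chunkers, and MCPunk comes with a few built in.

When a project is set up in MCPunk, it goes through all files and applies the first applicable chunker to each. The LLM can then use tools to (1) query for files containing chunks with specific text, (2) query all chunks in a specific file, and (3) fetch the full contents of a chunk.

This basic foundation enables Claude to navigate relatively large codebases effectively, starting with a broad search for relevant files and then narrowing in on the relevant areas.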
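The three-step drill-down can be sketched over a toy in-memory project. Everything here (the Chunk dataclass, PROJECT, and the three functions) is a hypothetical stand-in for illustration, not MCPunk's data model or tool signatures.

```python
from dataclasses import dataclass


@dataclass
class Chunk:
    id: str
    name: str
    content: str


# Hypothetical project: file path -> chunks of that file.
PROJECT: dict[str, list[Chunk]] = {
    "app/util.py": [
        Chunk("c1", "imports", "import os"),
        Chunk("c2", "my_funk", "def my_funk(x):\n    return x + 1"),
    ],
    "app/main.py": [
        Chunk("c3", "main", "def main():\n    print(my_funk(1))"),
    ],
}


def find_files(term: str) -> list[str]:
    """Step 1: files with at least one chunk containing the term."""
    return [p for p, chunks in PROJECT.items() if any(term in c.content for c in chunks)]


def find_chunks_in_file(path: str, term: str) -> list[str]:
    """Step 2: IDs of the matching chunks in one file."""
    return [c.id for c in PROJECT[path] if term in c.content]


def chunk_content(path: str, chunk_id: str) -> str:
    """Step 3: full content of one chunk."""
    return next(c.content for c in PROJECT[path] if c.id == chunk_id)


files = find_files("def my_funk")                    # broad: which files?
ids = find_chunks_in_file(files[0], "def my_funk")   # narrower: which chunks?
print(chunk_content(files[0], ids[0]))               # narrowest: the code itself
```

Only the final step returns code, so the earlier steps cost very little context.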

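Built-in chunkers:

PythonChunker chunks things into classes, functions, file-level imports, and file-level statements (e.g. globals). Applicable to files ending in .py
VueChunker chunks into 'template', 'script', and 'style' blocks, or whatever top-level <blah>....</blah> items exist. Applicable to files ending in .vue
MarkdownChunker chunks things into markdown sections (by heading). Applicable to files ending in .md
WholeFileChunker is the fallback chunker that creates a single chunk for the entire file. Applicable to any file.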
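As a rough illustration of what chunking a Python file into functions, classes, imports, and file-level statements can look like, here is a sketch built on the standard library ast module. It is not the PythonChunker implementation; the function name chunk_python and the chunk names "imports" and "module_level_statements" are assumptions made for this example.

```python
import ast


def chunk_python(source: str) -> dict[str, str]:
    """Split Python source into named chunks.

    One chunk per top-level function or class, plus combined 'imports'
    and 'module_level_statements' chunks. Decorators are not included
    in the extracted text; this is a sketch, not a full chunker.
    """
    tree = ast.parse(source)
    chunks: dict[str, str] = {}
    imports: list[str] = []
    statements: list[str] = []
    for node in tree.body:  # top-level nodes only
        text = ast.get_source_segment(source, node) or ""
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef)):
            chunks[node.name] = text
        elif isinstance(node, (ast.Import, ast.ImportFrom)):
            imports.append(text)
        else:
            statements.append(text)
    if imports:
        chunks["imports"] = "\n".join(imports)
    if statements:
        chunks["module_level_statements"] = "\n".join(statements)
    return chunks
```

Each chunk is a whole syntactic unit, which is what keeps chunks coherent.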

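Any chunk larger than 10k characters (configurable) is automatically split into multiple chunks, with names suffixed with part1, part2, etc. This helps avoid blowing up the context while still allowing reasonable navigation of chunks.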
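A minimal sketch of that splitting rule, assuming a plain fixed-size split by character count (the real splitter may choose its boundaries differently). The function name split_chunk is hypothetical; the '_partN' naming follows the tool description above.

```python
def split_chunk(name: str, content: str, max_chars: int = 10_000) -> list[tuple[str, str]]:
    """Return [(name, content)] unchanged if small enough, otherwise
    fixed-size pieces named '<name>_part1', '<name>_part2', ..."""
    if len(content) <= max_chars:
        return [(name, content)]
    pieces = [content[i : i + max_chars] for i in range(0, len(content), max_chars)]
    return [(f"{name}_part{n}", piece) for n, piece in enumerate(pieces, start=1)]
```

For example, a 25,000-character chunk named big would come back as big_part1, big_part2, and big_part3, with the last part holding the remaining 5,000 characters.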

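Custom Chunkers

Each type of file (e.g. Python vs C) needs a custom chunker. MCPunk comes with some built in. If no specific chunker matches a file, a default chunker that just puts the whole file into one chunk is used.

The currently suggested way to add chunkers is to fork this project, add them there, and run MCPunk as described under Development. To add a chunker:

Add it in file_chunkers.py, inheriting from BaseChunker
Add it to ALL_CHUNKERS in file_breakdown.py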
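The shape of those two steps might look like the sketch below. The BaseChunker defined here is a self-contained stand-in so the example runs on its own; its method names (can_chunk, chunk) and the SqlChunker class are assumptions, so check file_chunkers.py for the real interface before copying anything.

```python
from abc import ABC, abstractmethod
from pathlib import Path


class BaseChunker(ABC):
    """Stand-in base class for illustration; NOT MCPunk's real BaseChunker."""

    @staticmethod
    @abstractmethod
    def can_chunk(file_path: Path) -> bool: ...

    @abstractmethod
    def chunk(self, source: str) -> dict[str, str]: ...


class SqlChunker(BaseChunker):
    """Hypothetical chunker: one chunk per ';'-terminated SQL statement."""

    @staticmethod
    def can_chunk(file_path: Path) -> bool:
        return file_path.suffix == ".sql"

    def chunk(self, source: str) -> dict[str, str]:
        statements = [s.strip() for s in source.split(";") if s.strip()]
        return {f"statement_{i}": s for i, s in enumerate(statements, start=1)}


class WholeFileChunker(BaseChunker):
    """Fallback: the whole file becomes a single chunk."""

    @staticmethod
    def can_chunk(file_path: Path) -> bool:
        return True

    def chunk(self, source: str) -> dict[str, str]:
        return {"whole_file": source}


# Order matters: the first chunker whose can_chunk() returns True is used,
# so the catch-all fallback goes last.
ALL_CHUNKERS: list[type[BaseChunker]] = [SqlChunker, WholeFileChunker]


def chunk_file(file_path: Path, source: str) -> dict[str, str]:
    for chunker_cls in ALL_CHUNKERS:
        if chunker_cls.can_chunk(file_path):
            return chunker_cls().chunk(source)
    raise ValueError(f"no chunker for {file_path}")
```

Putting specific chunkers ahead of the fallback is what makes "first applicable chunker wins" behave sensibly.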

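It would be possible to implement some kind of plugin system that lets modules advertise custom chunkers for MCPunk to use, like pytest's plugin system, but there are currently no plans to implement this (unless someone wants to do it).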

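Limitations

Sometimes the LLM is bad at searching, e.g. it searches for "dependency" and misses the term "dependencies". There is room for improvement here.
Sometimes the LLM tries to find a specific piece of critical code, fails to find it, and carries on without acknowledging that it has limited contextual awareness.
"Large" projects are not well tested. A project with ~1000 Python files containing ~250k lines of code in total works well, and takes ~5 seconds to set up. As the codebase grows, the initial chunking will take longer and more sophisticated searching will likely be required. The code is generally not written with massive codebases in mind: all data is stored in memory, searching is done by iterating over all data, and various other things need basic optimisation.
Small projects are probably better off with all the code concatenated and put into context. MCPunk is really only appropriate where that is impractical.
In some cases it is clearly better to let the LLM grab an entire file rather than have it pick out chunks one at a time. MCPunk has no mechanism for this. In practice, I have not found this to be a big issue.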

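Configuration

Various things can be configured via environment variables prefixed with MCPUNK_. For the available options, see settings.py; they are loaded from environment variables via Pydantic Settings.

For example, to configure the include_chars_in_response option: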

{
  "mcpServers": {
    "MCPunk": {
      "command": "uvx",
      "args": ["mcpunk"],
      "env": {
        "MCPUNK_INCLUDE_CHARS_IN_RESPONSE": "false"
      }
    }
  }
}
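The mapping from option name to environment variable is just the MCPUNK_ prefix plus the upper-cased option name. The sketch below imitates that lookup with the standard library only; it is a rough stand-in for what Pydantic Settings does, not MCPunk's settings code, and the helper name and the default value passed in are assumptions.

```python
import os

_PREFIX = "MCPUNK_"


def read_bool_setting(name: str, default: bool) -> bool:
    """Read MCPUNK_<NAME> from the environment and parse it as a boolean.

    Hypothetical helper for illustration only.
    """
    raw = os.environ.get(_PREFIX + name.upper())
    if raw is None:
        return default  # variable not set: keep the default
    value = raw.strip().lower()
    if value in {"1", "true", "yes", "on"}:
        return True
    if value in {"0", "false", "no", "off"}:
        return False
    raise ValueError(f"cannot parse {raw!r} as a boolean")
```

With the "env" block above in place, read_bool_setting("include_chars_in_response", default=True) would return False.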

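Roadmap & State of Development

MCPunk is considered near feature complete. It has not had broad use, and as a user you are likely to run into bugs or rough edges. Bug reports are welcome at https://github.com/jurasofish/mcpunk/issues

Roadmap Ideas

Add a bunch of prompts to help with using MCPunk. Without real "explain how to make a pancake to an alien"-type prompts, things do fall a little flat.
Include module-level comments when extracting python module-level statements.
Possibly stemming for search
Change the whole "project" concept so that files need not actually exist, which leads to allowing "virtual" files inside the project.
- Consider changing files from having a path to having a URI, so it could be an arbitrary URI like file://... / http[s]:// / gitdiff:// / etc.
Chunking of git diffs. Currently there is a tool to fetch an entire diff, which might be very large. Instead, the tool could be changed to add_diff_to_project and put its files under a gitdiff:// URI or under some fake path.
Caching of a project, so it doesn't need to re-parse all files every time you restart the MCP client. This may be tricky, as changes to the code in a chunker will invalidate the cache. Likely not to be prioritised, since it's not that slow for me.
Ability for users to provide custom code to perform chunking, perhaps similar to pytest plugins
Something like tree sitter could possibly be used for a more generic chunker
Tracking of characters sent/received, ideally by chat.
State, logging, etc. by chat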

Development

See run_mcp_server.py.

If you set up Claude Desktop like below, you can restart it to see the latest changes as you work on MCPunk from your local version of the repo.

{
  "mcpServers": {
    "MCPunk": {
      "command": "/Users/michael/.local/bin/uvx",
      "args": [
        "--from",
        "/Users/michael/git/mcpunk",
        "--no-cache",
        "mcpunk"
      ]
    }
  }
}

Testing, Linting, CI

See the Makefile and the GitHub Actions workflows.