Whissle MCP Server

A Python-based server that provides access to Whissle API endpoints for speech-to-text, speaker diarization, translation, and text summarization.

Tools

speech_to_text

Convert speech to text with a given model and save the output text file to a given directory. The directory is optional; if it is not provided, the output file is saved to $HOME/Desktop.

⚠️ COST WARNING: This tool makes an API call to Whissle which may incur costs. Only use when explicitly requested by the user.

Args:
- audio_file_path (str): Path to the audio file to transcribe
- model_name (str, optional): The name of the ASR model to use. Defaults to "en-NER"
- timestamps (bool, optional): Whether to include word timestamps
- boosted_lm_words (List[str], optional): Words to boost in recognition
- boosted_lm_score (int, optional): Score for boosted words (0-100)
- output_directory (str, optional): Directory where files should be saved. Defaults to $HOME/Desktop if not provided.

Returns: TextContent with the transcription and path to the output file.

diarize_speech

Convert speech to text with speaker diarization and save the output text file to a given directory. The directory is optional; if it is not provided, the output file is saved to $HOME/Desktop.

⚠️ COST WARNING: This tool makes an API call to Whissle which may incur costs. Only use when explicitly requested by the user.

Args:
- audio_file_path (str): Path to the audio file to transcribe
- model_name (str, optional): The name of the ASR model to use. Defaults to "en-NER"
- max_speakers (int, optional): Maximum number of speakers to identify
- boosted_lm_words (List[str], optional): Words to boost in recognition
- boosted_lm_score (int, optional): Score for boosted words (0-100)
- output_directory (str, optional): Directory where files should be saved. Defaults to $HOME/Desktop if not provided.

Returns: TextContent with the diarized transcription and path to the output file.

translate_text

Translate text from one language to another.

⚠️ COST WARNING: This tool makes an API call to Whissle which may incur costs. Only use when explicitly requested by the user.

Args:
- text (str): The text to translate
- source_language (str): Source language code (e.g., "en" for English)
- target_language (str): Target language code (e.g., "es" for Spanish)

Returns: TextContent with the translated text.

summarize_text

Summarize text using an LLM.

⚠️ COST WARNING: This tool makes an API call to Whissle which may incur costs. Only use when explicitly requested by the user.

Args:
- content (str): The text to summarize
- model_name (str, optional): The LLM to use. Defaults to "openai"
- instruction (str, optional): Specific instructions for summarization

Returns: TextContent with the summary.

list_asr_models

List all available ASR models and their capabilities.

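⚠️ Important Notes

This server provides access to Whissle API endpoints, which may incur costs
Every tool that makes an API call is marked with a cost warning
Please follow these guidelines:
1. Only use tools when the user explicitly requests them
2. For tools that process audio, consider the length of the audio, as it affects cost
3. Some operations, such as translation or summarization, may incur higher costs
4. Tools without a cost warning in their description are free to use, because they only read existing data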

Prerequisites

Python 3.8 or higher
pip (the Python package installer)
A Whissle API authentication token

Installation

Clone the repository:

git clone <repository-url>
cd whissle_mcp

Create and activate a virtual environment:

python -m venv venv
source venv/bin/activate  # On Windows, use: venv\Scripts\activate

Install the required packages:
```
pip install -e .
```
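Set up environment variables: create a .env file in the project root with the following content:
```
WHISSLE_AUTH_TOKEN=insert_auth_token_here  # Replace with your actual Whissle API token
WHISSLE_MCP_BASE_PATH=/path/to/your/base/directory
```
⚠️ Important: Never commit your actual token to the repository. The .env file is listed in .gitignore to prevent accidental commits.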

Configure the Claude integration: copy claude_config.example.json to claude_config.json and update the paths:

{
    "mcpServers": {
        "Whissle": {
            "command": "/path/to/your/venv/bin/python",
            "args": [
                "/path/to/whissle_mcp/server.py"
            ],
            "env": {
                "WHISSLE_AUTH_TOKEN": "insert_auth_token_here"
            }
        }
    }
}

Replace /path/to/your/venv/bin/python with the actual path to the Python interpreter in your virtual environment
Replace /path/to/whissle_mcp/server.py with the actual path to your server.py file
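
The two path substitutions can also be filled in programmatically. The following is a minimal sketch, assuming it is run with the virtual environment's interpreter so that `sys.executable` points at the right Python; the helper name `build_claude_config` is hypothetical and not part of this repository.

```python
import json
import sys
from pathlib import Path


def build_claude_config(server_script: Path, auth_token: str) -> dict:
    """Build the mcpServers entry with real paths (hypothetical helper)."""
    return {
        "mcpServers": {
            "Whissle": {
                # With the venv active, sys.executable is the venv's Python.
                "command": sys.executable,
                # Resolve to an absolute path for the "args" entry.
                "args": [str(server_script.resolve())],
                "env": {"WHISSLE_AUTH_TOKEN": auth_token},
            }
        }
    }


if __name__ == "__main__":
    config = build_claude_config(Path("server.py"), "insert_auth_token_here")
    # Print rather than write, so no file containing a token is created.
    print(json.dumps(config, indent=4))
```

Printing the result and pasting it into claude_config.json keeps the token out of any file the script creates on its own.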

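Configuration

Environment Variables

WHISSLE_AUTH_TOKEN: Your Whissle API authentication token (required)
- This is a sensitive credential and must never be shared or committed to version control
- Contact your administrator to obtain a valid token
- Store it securely in your local .env file
WHISSLE_MCP_BASE_PATH: Base directory for file operations (optional, defaults to the user's Desktop)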
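
As an illustration of how these two variables might be read, here is a small sketch. It assumes the values are already present in the process environment (how the server itself loads the .env file is not described here), and the names `WhissleSettings` and `load_settings` are hypothetical.

```python
import os
from pathlib import Path
from typing import Mapping, NamedTuple


class WhissleSettings(NamedTuple):
    auth_token: str
    base_path: Path


def load_settings(env: Mapping[str, str] = os.environ) -> WhissleSettings:
    """Read WHISSLE_AUTH_TOKEN and WHISSLE_MCP_BASE_PATH from a mapping."""
    token = env.get("WHISSLE_AUTH_TOKEN", "").strip()
    # Assumption: the placeholder from the sample .env counts as "not set".
    if not token or token == "insert_auth_token_here":
        raise RuntimeError("WHISSLE_AUTH_TOKEN is required but not set")
    base = env.get("WHISSLE_MCP_BASE_PATH", "").strip()
    # Fall back to the user's Desktop, the documented default.
    base_path = Path(base).expanduser() if base else Path.home() / "Desktop"
    return WhissleSettings(token, base_path)
```

Passing the environment as a parameter makes the function easy to exercise with a plain dictionary instead of real environment variables.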

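Supported Audio Formats

The server supports the following audio formats:

WAV (.wav)
MP3 (.mp3)
OGG (.ogg)
FLAC (.flac)
M4A (.m4a)

File Size Limits

Maximum file size: 25 MB
Files larger than this limit are rejected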
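
A client can run the same checks locally before making a billable call. The sketch below is not the server's own validation code; the function name is hypothetical, and it assumes that 1 MB means 1024 * 1024 bytes and that extensions are matched case-insensitively.

```python
from pathlib import Path

SUPPORTED_EXTENSIONS = {".wav", ".mp3", ".ogg", ".flac", ".m4a"}
MAX_FILE_SIZE_BYTES = 25 * 1024 * 1024  # assumption: 1 MB = 1024 * 1024 bytes


def validate_audio_file(path: str) -> Path:
    """Return the path if it passes all checks, otherwise raise ValueError."""
    audio = Path(path)
    if not audio.is_file():
        raise ValueError(f"File not found: {audio}")
    if audio.suffix.lower() not in SUPPORTED_EXTENSIONS:
        raise ValueError(f"Unsupported audio format: {audio.suffix or '(none)'}")
    size = audio.stat().st_size
    if size > MAX_FILE_SIZE_BYTES:
        raise ValueError(f"File is {size} bytes; the limit is {MAX_FILE_SIZE_BYTES}")
    return audio
```

Checking existence first, then format, then size means the cheapest checks fail first and the error message names the most basic problem.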

Available Tools

1. Speech to Text

Convert speech to text using the Whissle API.

response = speech_to_text(
    audio_file_path="path/to/audio.wav",
    model_name="en-NER",  # default model
    timestamps=True,      # include word timestamps
    boosted_lm_words=["specific", "terms"],  # words to boost in recognition
    boosted_lm_score=80   # score for boosted words (0-100)
)
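
When the tool is invoked from a client, the arguments are typically sent as a dictionary. The following sketch builds such a dictionary and checks the documented 0-100 range for `boosted_lm_score` before anything is sent. The helper name is hypothetical, and omitting unset optional arguments so the server applies its own defaults is an assumption.

```python
from typing import Any, Dict, List, Optional


def build_speech_to_text_args(
    audio_file_path: str,
    model_name: str = "en-NER",
    timestamps: Optional[bool] = None,
    boosted_lm_words: Optional[List[str]] = None,
    boosted_lm_score: Optional[int] = None,
    output_directory: Optional[str] = None,
) -> Dict[str, Any]:
    """Collect speech_to_text arguments, leaving out anything not set."""
    if boosted_lm_score is not None and not 0 <= boosted_lm_score <= 100:
        raise ValueError("boosted_lm_score must be between 0 and 100")
    args: Dict[str, Any] = {
        "audio_file_path": audio_file_path,
        "model_name": model_name,
    }
    optional = {
        "timestamps": timestamps,
        "boosted_lm_words": boosted_lm_words,
        "boosted_lm_score": boosted_lm_score,
        "output_directory": output_directory,
    }
    # Only include optional arguments the caller actually provided.
    args.update({key: value for key, value in optional.items() if value is not None})
    return args
```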

2. Speaker Diarization

Convert speech to text with speaker identification.

response = diarize_speech(
    audio_file_path="path/to/audio.wav",
    model_name="en-NER",  # default model
    max_speakers=2,       # maximum number of speakers to identify
    boosted_lm_words=["specific", "terms"],
    boosted_lm_score=80
)

3. Text Translation

Translate text from one language to another.

response = translate_text(
    text="Hello, world!",
    source_language="en",
    target_language="es"
)

4. Text Summarization

Summarize text using an LLM.

response = summarize_text(
    content="Long text to summarize...",
    model_name="openai",  # default model
    instruction="Provide a brief summary"  # optional
)

5. List ASR Models

List all available ASR models and their capabilities.

response = list_asr_models()

Response Format

Speech to Text and Speaker Diarization

{
    "transcript": "Transcribed text",
    "duration_seconds": 10.5,
    "language_code": "en",
    "timestamps": [
        {
            "word": "The",
            "startTime": 0,
            "endTime": 100,
            "confidence": 0.95
        }
    ],
    "diarize_output": [
        {
            "text": "Transcribed text",
            "speaker_id": 1,
            "start_timestamp": 0,
            "end_timestamp": 10.5
        }
    ]
}
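
To show how the `diarize_output` field might be consumed, here is a short sketch that renders each segment as one line per speaker turn. It assumes `start_timestamp` and `end_timestamp` are in seconds, which matches `duration_seconds` in the sample but is not stated explicitly; the function name is hypothetical.

```python
from typing import Any, Dict, List


def format_diarization(response: Dict[str, Any]) -> List[str]:
    """Render each diarized segment as '[start-end s] Speaker N: text'."""
    lines = []
    # A response without diarization simply yields an empty list.
    for segment in response.get("diarize_output", []):
        start = float(segment["start_timestamp"])
        end = float(segment["end_timestamp"])
        lines.append(
            f"[{start:.1f}-{end:.1f} s] Speaker {segment['speaker_id']}: {segment['text']}"
        )
    return lines


sample = {
    "transcript": "Transcribed text",
    "duration_seconds": 10.5,
    "diarize_output": [
        {"text": "Transcribed text", "speaker_id": 1,
         "start_timestamp": 0, "end_timestamp": 10.5}
    ],
}
print("\n".join(format_diarization(sample)))
# prints: [0.0-10.5 s] Speaker 1: Transcribed text
```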

Translation

{
    "type": "text",
    "text": "Translation:\nTranslated text here"
}

Summarization

{
    "type": "text",
    "text": "Summary:\nSummarized text here"
}

Error Response

{
    "error": "Error message here"
}
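
A caller usually wants the bare translated or summarized text, and wants errors surfaced as exceptions. The sketch below handles the three payload shapes shown above. `WhissleToolError` and `unwrap_text` are hypothetical names, and the sketch assumes the "Translation:" and "Summary:" prefixes always appear exactly as in the samples.

```python
from typing import Any, Dict

# Prefixes taken from the sample responses.
PREFIXES = ("Translation:\n", "Summary:\n")


class WhissleToolError(Exception):
    """Hypothetical exception type; not defined by the Whissle server."""


def unwrap_text(payload: Dict[str, Any]) -> str:
    """Return the text without its label, or raise if the payload is an error."""
    if "error" in payload:
        raise WhissleToolError(payload["error"])
    text = payload["text"]
    for prefix in PREFIXES:
        if text.startswith(prefix):
            return text[len(prefix):]
    # Unknown shape: return the text unchanged rather than guess.
    return text
```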

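Error Handling

The server includes robust error handling:

Automatic retries for HTTP 500 errors
Detailed error messages for different failure scenarios
File validation (existence, size, format)
Authentication checks

Common error types:

HTTP 500: Server error (with retry mechanism)
HTTP 413: File too large
HTTP 415: Unsupported file format
HTTP 401/403: Authentication error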
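
The retry behavior for HTTP 500 can be pictured with the following sketch. The number of attempts and the backoff schedule are not specified here, so three attempts with exponential backoff are assumptions, and `call_with_retry` is a hypothetical helper rather than the server's actual code. Only status 500 is retried; 401, 403, 413, and 415 are returned immediately because repeating the same request would not help.

```python
import time
from typing import Callable, Tuple


def call_with_retry(
    request: Callable[[], Tuple[int, str]],
    max_attempts: int = 3,       # assumption: not documented
    base_delay: float = 1.0,     # assumption: not documented
    sleep: Callable[[float], None] = time.sleep,
) -> Tuple[int, str]:
    """Call request() until it returns a non-500 status or attempts run out."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(1, max_attempts + 1):
        status, body = request()
        # Return on success, on a non-retryable error, or on the last attempt.
        if status != 500 or attempt == max_attempts:
            return status, body
        # Exponential backoff: base_delay, then 2x, then 4x, and so on.
        sleep(base_delay * 2 ** (attempt - 1))
    raise AssertionError("unreachable")
```

Injecting `sleep` as a parameter lets the function be exercised without real delays.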

Running the Server

Start the server:
```
mcp serve
```
The server will be available on the default MCP port (usually 8000)

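Testing

A test script is provided to verify the functionality of all tools:

python test_whissle.py

The test script will:

Check the authentication token
Test all available tools
Provide detailed output for each operation
Handle errors gracefully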

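Support

For issues or questions:

Check the error messages for specific details
Verify your authentication token
Make sure your audio files meet the requirements
For API-related issues, contact Whissle support

License

[Add your license information here]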