kokoro-tts-mcp

Text-to-speech MCP server using the Kokoro-82M model accelerated with MLX on Apple Silicon, enabling local Claude and Codex clients to speak text aloud and convert text to audio.


Text-to-speech using the Kokoro-82M model, accelerated with MLX on Apple Silicon. Works three ways:

MCP server — gives local Claude and Codex clients (Claude Chat/Code/Cowork, Codex App, Codex CLI) the ability to speak text aloud and convert text to audio.
ChatGPT Mac App — supported via kokoro-clipboard + Keyboard Maestro workaround (not MCP-native yet).
Command-line tools — kokoro and kokoro-clipboard commands for use in scripts, the terminal, or piped workflows

All three share the same generation engine and playback code, so pause/stop controls (via Stream Deck, hotkeys, etc.) work identically regardless of how audio was started.

The MCP server lazy-loads the model on first use and keeps it resident in memory (~600 MB), so subsequent requests start instantly. The CLI loads the model fresh each invocation (~3s startup), which is negligible for longer text.
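The lazy-load-and-keep-resident behavior can be pictured as a cached loader. This is a minimal sketch, not the server's actual code: `load_kokoro_model` is a hypothetical stand-in for whatever call mcp_server.py really uses to load Kokoro-82M through MLX.

```python
import functools
import time


def load_kokoro_model():
    """Hypothetical stand-in for the real (slow, ~600 MB) MLX model load."""
    time.sleep(0.01)  # simulate load latency
    return object()


@functools.lru_cache(maxsize=1)
def get_model():
    # The first call pays the load cost; later calls return the cached
    # object, so the model stays resident for the life of the process.
    return load_kokoro_model()


def speak(text: str) -> str:
    model = get_model()  # loaded on first use only
    # ... generate and play audio with `model` here ...
    return f"speaking {len(text)} characters"
```

A CLI process exits after a single request, so a cache like this never gets a second hit there; that is why each `kokoro` invocation pays the ~3 s load again.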

Requirements

macOS on Apple Silicon (M1/M2/M3/M4)
Python 3.12 (not 3.13+ due to spacy/pydantic incompatibility)
espeak (brew install espeak)
ffmpeg (optional, only needed for MP3 export)

Setup

git clone https://github.com/scottschram/kokoro-tts-mcp.git
cd kokoro-tts-mcp

python3.12 -m venv .venv
source .venv/bin/activate
pip install -r requirements.txt

After installing, download the spaCy English model:

python -m spacy download en_core_web_sm

Usage

Command Line (`kokoro`)

kokoro "Hello, world."                         # play immediately
cat article.txt | kokoro                       # pipe input
kokoro -v bm_fable "Good morning, London."     # British male voice
kokoro -f article.txt -o article.wav           # save to WAV
kokoro -f article.txt --mp3                    # save as MP3 to /tmp
kokoro -o talk.wav -p "Hello"                  # save AND play
kokoro -s 1.3 "A bit faster."                 # speed adjustment
kokoro -v list                                 # show all voices
kokoro -h                                      # full help

Playback via the MCP speak() tool: text of about 2500 words or fewer starts playing within a few seconds. Beyond that, time to first audio grows with text size (~3 min at 3000 words, ~4 min at 5000). The delay sits in the MCP client's tool-call dispatch, not in the Kokoro pipeline, which starts streaming audio within seconds at any size when driven from the CLI or through a direct Python import. For long reads, use the CLI instead: kokoro -f file.txt -o file.wav (then play the file with your preferred audio player) or cat file.txt | kokoro. Pause and stop work at any point during playback. See CLAUDE.md for how the delay was bisected.

To make kokoro available globally, symlink it into a directory on your PATH (here ~/bin):

ln -sf /path/to/kokoro-tts-mcp/kokoro ~/bin/kokoro

Command Line (`kokoro-clipboard`)

kokoro-clipboard                                # speak current clipboard
kokoro-clipboard --dry-run                      # preview cleaned speech text
kokoro-clipboard --silent-nontext               # do not speak non-text clipboard
kokoro-clipboard --raw                          # skip markdown cleanup
kokoro-clipboard --max-chars 20000              # character cap before truncation
kokoro-clipboard --text "[kokoro]Hello[/kokoro]" --dry-run

kokoro-clipboard reads the current macOS clipboard and speaks it with markdown cleanup. If [kokoro]...[/kokoro] markers are present, only the text between markers is spoken. If markers are absent, the full clipboard text is spoken.

If the clipboard content is non-text (image, PDF, file, or URL), it speaks a short message naming the content type unless --silent-nontext is used.
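The marker and truncation rules can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions: `select_speech_text` is a hypothetical name, every `[kokoro]...[/kokoro]` pair is assumed to be spoken in order, and truncation is assumed to be a plain character cut at `--max-chars`; the real kokoro-clipboard may handle multiple pairs or truncation differently.

```python
import re

# Non-greedy, DOTALL so a marked span may cross line breaks.
MARKER_RE = re.compile(r"\[kokoro\](.*?)\[/kokoro\]", re.DOTALL)


def select_speech_text(clipboard: str, max_chars: int = 20000) -> str:
    """Return the text that would be spoken (sketch, not the real tool)."""
    # Assumption: every marked span is spoken, joined by blank lines.
    spans = MARKER_RE.findall(clipboard)
    text = "\n\n".join(s.strip() for s in spans) if spans else clipboard
    # Assumption: truncation is a plain cut at max_chars characters.
    return text[:max_chars]


print(select_speech_text("Intro. [kokoro]Hello there.[/kokoro] Outro."))
print(select_speech_text("No markers, so everything is spoken."))
```

Only a complete pair counts here: a lone `[kokoro]` without its closing marker falls through to speaking the full clipboard text.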

Arguments:

Argument	Description
`-v`, `--voice`	Voice name (default: `af_heart`)
`-s`, `--speed`	Speed multiplier (default: `1.0`)
`--kokoro-cmd`	Command/path used to invoke `kokoro`
`--raw`	Skip markdown cleanup
`--silent-nontext`	Exit without speaking when clipboard is non-text
`--max-chars`	Character cap before truncation (default: `20000`)
`--dry-run`	Print final text instead of speaking
`--text`	Use provided text instead of reading clipboard

To make kokoro-clipboard available globally, symlink it into a directory on your PATH (here ~/bin):

ln -sf /path/to/kokoro-tts-mcp/kokoro-clipboard ~/bin/kokoro-clipboard

Keyboard Maestro (ChatGPT Mac workaround)

If ChatGPT Mac does not have MCP support for your account/workflow, you can still get spoken output by triggering kokoro-clipboard from Keyboard Maestro.

1. Create a new Keyboard Maestro macro group limited to ChatGPT (com.openai.chat).
2. Create a macro named Speak Clipboard.
3. Set the trigger: The clipboard changes.
4. Add an If Then Else action with If All Conditions Met:
   - The clipboard contains [kokoro]
   - The clipboard contains [/kokoro]
5. In the Then branch, add an Execute Shell Script action.
6. Configure the shell script:
   - Shell: /bin/zsh
   - Input: None
   - Script:

~/bin/kokoro-clipboard

Optional variants:

~/bin/kokoro-clipboard --silent-nontext
~/bin/kokoro-clipboard -v bm_fable -s 1.1

Usage notes:

This If Then Else setup is marker-only: it speaks only when both markers exist.
Inside the copied text, kokoro-clipboard speaks only the text between [kokoro]...[/kokoro].
If you remove the If Then Else gate, kokoro-clipboard will speak any copied ChatGPT text.
Non-text clipboard items (images/files/PDF) are announced unless --silent-nontext is set.

MCP Server (Claude Code)

claude mcp add kokoro-tts -- \
    /path/to/kokoro-tts-mcp/.venv/bin/python3.12 \
    /path/to/kokoro-tts-mcp/mcp_server.py

Then in Claude Code, you can ask Claude to speak:

"Say hello"
"Read that summary aloud using the British male voice bm_george"
"Save that explanation as an MP3"

MCP Server (Claude Desktop — Chat / Cowork)

Add to ~/Library/Application Support/Claude/claude_desktop_config.json:

{
  "mcpServers": {
    "kokoro-tts": {
      "command": "/path/to/kokoro-tts-mcp/.venv/bin/python3.12",
      "args": ["/path/to/kokoro-tts-mcp/mcp_server.py"]
    }
  }
}
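If claude_desktop_config.json already lists other MCP servers, the kokoro-tts entry must be merged into the existing mcpServers object rather than replacing the file. A minimal Python sketch of that merge; `add_kokoro_server` is a hypothetical helper, not something shipped with the repo, and it rewrites the file with 2-space indentation, so keep a backup.

```python
import json
from pathlib import Path

CONFIG = Path.home() / "Library/Application Support/Claude/claude_desktop_config.json"


def add_kokoro_server(repo: Path, config_path: Path = CONFIG) -> dict:
    """Merge a kokoro-tts entry into the config, keeping other servers."""
    config = json.loads(config_path.read_text()) if config_path.exists() else {}
    servers = config.setdefault("mcpServers", {})
    # Same command/args as the JSON snippet above, built from the clone path.
    servers["kokoro-tts"] = {
        "command": str(repo / ".venv/bin/python3.12"),
        "args": [str(repo / "mcp_server.py")],
    }
    config_path.parent.mkdir(parents=True, exist_ok=True)
    config_path.write_text(json.dumps(config, indent=2) + "\n")
    return config
```

Call it as add_kokoro_server(Path("/path/to/kokoro-tts-mcp")), substituting your actual clone path.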

Restart the Claude app after editing.

MCP Server (Codex CLI)

codex mcp add kokoro-tts -- \
    /path/to/kokoro-tts-mcp/.venv/bin/python3.12 \
    /path/to/kokoro-tts-mcp/mcp_server.py

Then in Codex CLI, you can ask Codex to speak:

"Say hello"
"Read that summary aloud using the British male voice bm_george"
"Save that explanation as an MP3"

MCP Server (Codex Mac App)

Codex Mac App and Codex CLI share the same global Codex config (~/.codex/config.toml). After registering kokoro-tts with codex mcp add ... in a terminal, restart the Codex app.

Smoke Test

A quick test script to verify the TTS pipeline without MCP or the full CLI:

./test-tts                          # default test phrase
./test-tts "Custom text"            # speak custom text
./test-tts "Cheerio" bm_fable       # specify voice

Tools

Tool	Description
`speak(text, voice?, speed?)`	Play text aloud (non-blocking, returns immediately)
`pause()`	Pause current playback
`resume()`	Resume paused playback
`stop()`	Stop playback immediately
`status()`	Return current state: `idle`, `playing`, or `paused`
`user_stop_requested()`	Check if the user stopped playback externally (returns `True` once, then clears)
`speak_and_save(text, output_path?, voice?, speed?, mp3?)`	Generate and save audio to a file
`list_voices()`	List all available voices
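Under the hood, an MCP client invokes these tools with JSON-RPC 2.0 `tools/call` requests sent over the server's stdin/stdout. The sketch below only builds such a request, to show how the optional parameters in the table are either supplied as named arguments or simply omitted; Claude and Codex handle the initialize handshake and transport themselves, and `tools_call` is a hypothetical helper.

```python
import json


def tools_call(request_id: int, name: str, arguments: dict) -> str:
    """Build an MCP JSON-RPC 2.0 `tools/call` request as a JSON string."""
    return json.dumps({
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": name, "arguments": arguments},
    })


# Optional parameters (voice?, speed?) are left out when not needed.
print(tools_call(1, "speak", {"text": "Hello", "voice": "bm_george"}))
print(tools_call(2, "status", {}))
```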

Voices

28 English voices are available. The naming convention is: first letter = accent (a = American, b = British), second letter = gender (f = female, m = male).

American Female: af_heart (default), af_alloy, af_aoede, af_bella, af_jessica, af_kore, af_nicole, af_nova, af_river, af_sarah, af_sky

American Male: am_adam, am_echo, am_eric, am_fenrir, am_liam, am_michael, am_onyx, am_puck, am_santa

British Female: bf_alice, bf_emma, bf_isabella, bf_lily

British Male: bm_daniel, bm_fable, bm_george, bm_lewis
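Because of the naming convention, accent and gender can be read straight off a voice name. A small sketch; `describe_voice` is a hypothetical helper for illustration, not part of the tool.

```python
ACCENTS = {"a": "American", "b": "British"}
GENDERS = {"f": "Female", "m": "Male"}


def describe_voice(name: str) -> str:
    """Decode an English Kokoro voice name such as 'bm_fable'."""
    prefix, sep, speaker = name.partition("_")
    valid = (
        sep == "_"
        and len(prefix) == 2
        and prefix[0] in ACCENTS   # first letter: accent
        and prefix[1] in GENDERS   # second letter: gender
        and speaker != ""
    )
    if not valid:
        raise ValueError(f"not an English Kokoro voice name: {name!r}")
    return f"{ACCENTS[prefix[0]]} {GENDERS[prefix[1]]} ({speaker})"


print(describe_voice("bm_fable"))  # → British Male (fable)
print(describe_voice("af_heart"))  # → American Female (heart)
```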

Playback Control

Two shell scripts control playback from outside Claude (e.g., via Stream Deck, Keyboard Maestro, or a hotkey). They work with both the MCP server and the CLI — whichever is currently playing:

kokoro-pause — Toggle pause/resume. Also supports kokoro-pause pause, kokoro-pause resume, and kokoro-pause status.
kokoro-stop — Stop playback immediately and discard audio.

These work by creating/removing sentinel files (/tmp/kokoro-tts-pause, /tmp/kokoro-tts-stop) that the playback loop monitors.
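A playback loop that honors the two sentinel files, together with rough equivalents of kokoro-pause and kokoro-stop, might look like the sketch below. Only the two /tmp paths come from this README; the chunked loop, the `play_chunk` callback, the 50 ms poll interval, and consuming the stop file once it is seen are all assumptions. The paths are parameters so the sketch can be exercised against a temporary directory.

```python
import time
from pathlib import Path

PAUSE_FILE = Path("/tmp/kokoro-tts-pause")
STOP_FILE = Path("/tmp/kokoro-tts-stop")


def toggle_pause(pause_file: Path = PAUSE_FILE) -> str:
    """Rough equivalent of `kokoro-pause` with no argument."""
    if pause_file.exists():
        pause_file.unlink()
        return "resumed"
    pause_file.touch()
    return "paused"


def request_stop(stop_file: Path = STOP_FILE) -> None:
    """Rough equivalent of `kokoro-stop`."""
    stop_file.touch()


def play(chunks, play_chunk, pause_file=PAUSE_FILE, stop_file=STOP_FILE,
         poll=0.05) -> bool:
    """Play audio chunks, checking the sentinel files between chunks.

    Returns True if playback finished, False if it was stopped externally.
    """
    for chunk in chunks:
        # Wait while paused, but still react to a stop request.
        while pause_file.exists() and not stop_file.exists():
            time.sleep(poll)
        if stop_file.exists():
            stop_file.unlink()                 # consume the request (assumption)
            pause_file.unlink(missing_ok=True)
            return False                       # discard the remaining audio
        play_chunk(chunk)
    return True
```

In this sketch a pause or stop takes effect at the next chunk boundary, so smaller chunks make the controls feel more responsive.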

Multi-Segment Playback

When Claude plays multiple segments sequentially (e.g., reading a list of items one by one), it polls status() until idle before starting the next segment. If the user stops playback externally (via kokoro-stop, Stream Deck, etc.), user_stop_requested() returns True once, signaling Claude to skip remaining segments instead of immediately starting the next one. The MCP stop() tool does not set this flag — it only applies to external stops, so Claude can distinguish "user wants silence" from "Claude decided to stop."
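From the client's side, the multi-segment protocol reduces to the loop below. This is only a sketch: `client` is a hypothetical object whose methods mirror the MCP tools in the Tools table (a real client issues tool calls, not Python method calls), and `FakeClient` exists only to make the sketch runnable.

```python
import time


def read_segments(client, segments, poll=0.2) -> int:
    """Speak segments in order; return how many were started."""
    started = 0
    for segment in segments:
        client.speak(segment)              # non-blocking: returns immediately
        started += 1
        while client.status() != "idle":   # "playing" or "paused"
            time.sleep(poll)
        if client.user_stop_requested():   # True once after an external stop
            break                          # user wants silence: skip the rest
    return started


class FakeClient:
    """Hypothetical stand-in: each segment 'plays' for one status poll."""

    def __init__(self, stop_during=None):
        self.spoken = []
        self._remaining_polls = 0
        self._stop_during = stop_during  # 1-based segment the user stops

    def speak(self, text):
        self.spoken.append(text)
        self._remaining_polls = 1

    def status(self):
        if self._remaining_polls:
            self._remaining_polls -= 1
            return "playing"
        return "idle"

    def user_stop_requested(self):
        # Returns True once, then clears, as the README describes.
        if self._stop_during == len(self.spoken):
            self._stop_during = None
            return True
        return False
```

With FakeClient(stop_during=2), only the first two of three segments are started.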

Text Preprocessing

MCP server and CLI — Negative numbers (e.g., -3) are expanded to words (minus 3) before generation. The Kokoro phonemizer silently drops bare negative-sign tokens, so without this preprocessing, -3 degrees would be spoken as just degrees.
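One way to do this expansion is a regular-expression substitution. A sketch, assuming the rule is "a minus sign directly before a digit and not preceded by a word character or a period"; `expand_negatives` is a hypothetical name, and the project's actual pattern may differ, for example in how it treats ranges or dates.

```python
import re

# A '-' that starts a number: not preceded by a letter, digit, '_' or '.',
# and immediately followed by a digit.
NEGATIVE_RE = re.compile(r"(?<![\w.])-(?=\d)")


def expand_negatives(text: str) -> str:
    return NEGATIVE_RE.sub("minus ", text)


print(expand_negatives("It was -3 degrees"))  # → It was minus 3 degrees
print(expand_negatives("pages 10-12"))        # → pages 10-12
```

The lookbehind is what keeps ranges such as 10-12 intact: the hyphen there follows a digit, so it is not treated as a sign.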

kokoro-clipboard — Clipboard text goes through additional preprocessing to improve listening quality:

Markdown syntax stripped (headings, bold, italic, links, fences, tables, etc.)
URLs expanded to speakable form (https://foo.com/path → https colon slash slash foo dot com slash path)
Negative numbers expanded (-3 → minus 3)
Punctuation between digits/words preserved (3.14, 10:30, $1,299.99 stay intact)
[kokoro]...[/kokoro] markers supported to limit what gets spoken
Use --dry-run to preview the cleaned text without audio
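The URL rule in the list above can be sketched as a character-to-word mapping applied only inside URLs. The symbol table and the URL pattern are assumptions chosen to reproduce the README's single example, not the project's actual tables; the pattern stops before trailing sentence punctuation so a final period is not read as "dot".

```python
import re

# http(s) URL whose last character is not trailing sentence punctuation.
URL_RE = re.compile(r"https?://\S*[^\s.,;:!?)]")
# Assumed symbol-to-word table; the real tool may cover more characters.
SPOKEN = {":": " colon ", "/": " slash ", ".": " dot "}


def speak_url(url: str) -> str:
    words = "".join(SPOKEN.get(ch, ch) for ch in url)
    return " ".join(words.split())  # collapse runs of spaces


def expand_urls(text: str) -> str:
    return URL_RE.sub(lambda m: speak_url(m.group(0)), text)


print(expand_urls("https://foo.com/path"))
```

Because only matched URLs are rewritten, ordinary text such as 3.14 or 10:30 passes through untouched.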

Known Issues

Python 3.13+ not supported — spacy and pydantic have incompatibilities on 3.13+. Use Python 3.12.
Short text workaround — Text under 25 characters is automatically padded to avoid an mlx-audio hang bug. This is handled transparently.
Do not install phonemizer — The phonemizer package conflicts with phonemizer-fork (pulled in by mlx-audio). Installing it causes out-of-dictionary words to be silently skipped. See requirements.txt for details.
misaki must be <0.9 — Version 0.9+ breaks EspeakWrapper.set_data_path. This is pinned in requirements.txt.

License

MIT
