
Text to Speech MCP Server

Enables agents to convert text to speech using OpenAI's TTS models with voice selection, delivery instructions, and queue-based audio playback. Supports both blocking and non-blocking modes for flexible audio generation and playback control.


🎤 Text to Speech MCP Server

Where your agent finally learns to speak up for itself

Welcome to the Text to Speech (TTS) MCP Server: a sophisticated yet charmingly chaotic server that transforms your boring written words into magnificent audible experiences.
Because who needs human vocal cords when you have Python and some very fancy AI models?

🚀 What Does This Do?

This delightful contraption takes your text and makes it speak through your computer's speakers using OpenAI's cutting-edge TTS models. It's like having a personal narrator, except they never get tired, never ask for coffee breaks, and never judge your terrible programming jokes.

Features That Actually Matter

Speak MCP Tool: Gives your agent the ability to voice any given text in one of several available voices
Instructions for Delivery: Provide optional instructions to guide delivery, character, pacing, tone, and emotion
Model Selection: The OpenAI TTS model is configurable via the TTS_MODEL environment variable (default: gpt-4o-mini-tts)
Blocking/Non-Blocking Mode: A speak call either returns immediately so the agent can keep working while audio plays (default), or returns only after playback finishes for a more controlled workflow
Queue-Based Audio Playback: Agents can queue up messages to wait patiently in line and be played in sequence
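
To make the queue and the blocking/non-blocking modes concrete, here is a minimal sketch of how such a playback queue could work. It is not the server's actual code: the `PlaybackQueue` class and the `play_fn` callable are hypothetical names, and a plain list stands in for real audio output.

```python
import queue
import threading


class PlaybackQueue:
    """Plays queued clips one at a time, in order, on a background thread."""

    def __init__(self, play_fn):
        self._play_fn = play_fn          # hypothetical callable that plays one clip
        self._items = queue.Queue()
        self._worker = threading.Thread(target=self._run, daemon=True)
        self._worker.start()

    def _run(self):
        while True:
            clip, done = self._items.get()
            try:
                self._play_fn(clip)
            finally:
                done.set()               # wake any blocking caller, even on error
                self._items.task_done()

    def speak(self, clip, blocking=False):
        done = threading.Event()
        self._items.put((clip, done))
        if blocking:
            done.wait()                  # return only after this clip has played
        return done


played = []
player = PlaybackQueue(played.append)    # stand-in for real audio output
player.speak("first")                    # non-blocking: returns immediately
player.speak("second", blocking=True)    # blocking: waits until "second" has played
```

Because a single worker drains a FIFO queue, a blocking call also waits for everything queued ahead of it, which is what keeps messages from talking over each other.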

🛠️ Installation & Setup

Prerequisites

Python 3.10+
An OpenAI API key (the magic ingredient)
PortAudio (required for PyAudio to work properly)
A sense of humor (optional but recommended)

Quick Start

Install PortAudio:

# macOS
brew install portaudio

# Linux (Debian/Ubuntu)
sudo apt-get install portaudio19-dev

# Windows
pip install pipwin && pipwin install pyaudio

Clone this repository:
```
git clone <your-repo-url>
cd tts-mcp
```

Create a virtual environment (because global installs are for rebels):

python -m venv .venv
source .venv/bin/activate  # On Windows: .venv\Scripts\activate

Install dependencies:
```
pip install -r requirements.txt
```

Set up your environment variables:

cp env.template .env
# Edit .env and add your OpenAI API key

Or set directly:

export OPENAI_API_KEY="your-secret-key-here"

Configure MCP in your Cursor settings with the provided mcp-config.json. Example:

{
  "mcpServers": {
    "tts-server": {
      "command": "/absolute/path/to/tts-mcp/.venv/bin/python",
      "args": ["/absolute/path/to/tts-mcp/tts_mcp_server.py"],
      "cwd": "/absolute/path/to/tts-mcp",
      "env": { "PYTHONPATH": "/absolute/path/to/tts-mcp" }
    }
  }
}

Replace the paths with those of your local repository checkout and virtual environment.
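
If you would rather not edit the paths by hand, a small script can fill them in. This is a sketch only, assuming the repository layout shown in the example above; `build_mcp_config` is a hypothetical helper, not something shipped with the project.

```python
import json
from pathlib import Path


def build_mcp_config(repo_dir):
    """Return an MCP config dict with absolute paths for a checkout at repo_dir."""
    repo = Path(repo_dir).expanduser().resolve()
    # Assumes a POSIX venv layout; on Windows the interpreter is .venv/Scripts/python.exe
    python = repo / ".venv" / "bin" / "python"
    return {
        "mcpServers": {
            "tts-server": {
                "command": str(python),
                "args": [str(repo / "tts_mcp_server.py")],
                "cwd": str(repo),
                "env": {"PYTHONPATH": str(repo)},
            }
        }
    }


# Run from the repository root and paste the output into your Cursor MCP settings.
print(json.dumps(build_mcp_config("."), indent=2))
```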

Start making your computer talk!

🎭 Voice Options

Choose your narrator wisely:

alloy: Neutral, balanced tone (default)
ash: Warm and expressive, with friendly support vibes
ballad: Smooth narrator, suited to long-form storytelling
coral: Bright and upbeat, for cheerful promos
echo: Clear and professional, like a news anchor
fable: Warm and storytelling, perfect for bedtime code reviews
onyx: Deep and authoritative, for when your code needs to sound important
nova: Bright and energetic, like your enthusiasm before debugging
sage: Calm and measured, a helpful explainer
shimmer: Soft and gentle, for when you need to break bad news about production bugs
verse: Dramatic and theatrical, like a trailer read
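
Since agents occasionally invent voices, it can help to validate the name before calling the API. The sketch below checks a name against the list above and falls back to the default when none is given; `normalize_voice` is a hypothetical helper, and the server may handle this differently.

```python
# Voice names as listed in this README.
VOICES = ("alloy", "ash", "ballad", "coral", "echo", "fable",
          "onyx", "nova", "sage", "shimmer", "verse")
DEFAULT_VOICE = "alloy"


def normalize_voice(name=None):
    """Return a documented voice name, or raise ValueError for an unknown one."""
    if name is None or not name.strip():
        return DEFAULT_VOICE
    voice = name.strip().lower()
    if voice not in VOICES:
        raise ValueError(f"Unknown voice {name!r}; choose one of: {', '.join(VOICES)}")
    return voice


print(normalize_voice())          # no voice given, so the default is used
print(normalize_voice(" Fable ")) # whitespace and case are tolerated
```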

🎪 Usage Examples

Basic Usage

# Non-blocking (default) - returns immediately
speak("Hello, world! I'm now audible!")

# Blocking - waits for completion
speak("This message will finish before I return", blocking=True)

# With specific voice
speak("I'm feeling dramatic today!", voice="fable")

# With delivery instructions
speak(
    "You're doing great—let's take this one step at a time.",
    voice="shimmer",
    instructions="Speak in a warm, reassuring and unhurried tone and pace"
)
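
For the curious, here is a rough sketch of how the arguments above might map onto a request to OpenAI's speech endpoint. The `build_speech_request` helper is hypothetical and the server's internals may differ; the one rule it encodes comes from the Configuration section, which notes that tts-1 and tts-1-hd do not support instructions.

```python
# Models that, per the Configuration section, do not accept "instructions".
MODELS_WITHOUT_INSTRUCTIONS = {"tts-1", "tts-1-hd"}


def build_speech_request(text, voice="alloy", instructions=None,
                         model="gpt-4o-mini-tts"):
    """Build keyword arguments for a text-to-speech API call."""
    if not text or not text.strip():
        raise ValueError("text must not be empty")
    request = {"model": model, "voice": voice, "input": text}
    # Only attach instructions when the selected model supports them.
    if instructions and model not in MODELS_WITHOUT_INSTRUCTIONS:
        request["instructions"] = instructions
    return request


request = build_speech_request(
    "You're doing great.",
    voice="shimmer",
    instructions="Speak in a warm, reassuring and unhurried tone and pace",
)
# With the official openai package installed, this could be sent roughly as:
#   OpenAI().audio.speech.create(**request)
```

Dropping unsupported instructions silently is a design choice; raising an error instead would make a misconfigured model easier to notice.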

In Cursor with MCP

Just tell Cursor to use the speak tool in your conversations.
You can suggest a voice and style instructions for maintaining a consistent character.

⚙️ Configuration

Environment variables:

OPENAI_API_KEY (required): Your OpenAI API key
TTS_MODEL (optional): Defaults to gpt-4o-mini-tts. Other options include tts-1 and tts-1-hd, though those models do not support "instructions" or some of the voices
LOG_LEVEL (optional): DEBUG, INFO (default), WARNING, ERROR
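
As an illustration of how these variables and their defaults fit together, here is a minimal sketch of a settings loader. The `Settings` class and `load_settings` function are hypothetical names, and the sketch reads a plain mapping rather than parsing a .env file.

```python
import os
from dataclasses import dataclass

LOG_LEVELS = ("DEBUG", "INFO", "WARNING", "ERROR")


@dataclass(frozen=True)
class Settings:
    api_key: str
    model: str
    log_level: str


def load_settings(env=None):
    """Read the documented variables from env (defaults to os.environ)."""
    env = os.environ if env is None else env
    api_key = env.get("OPENAI_API_KEY", "").strip()
    if not api_key:
        raise RuntimeError("OPENAI_API_KEY is required")
    log_level = env.get("LOG_LEVEL", "INFO").upper()
    if log_level not in LOG_LEVELS:
        raise ValueError(f"LOG_LEVEL must be one of {LOG_LEVELS}")
    model = env.get("TTS_MODEL", "gpt-4o-mini-tts")
    return Settings(api_key=api_key, model=model, log_level=log_level)


# Passing a dict keeps the example independent of your real environment.
settings = load_settings({"OPENAI_API_KEY": "your-secret-key-here"})
print(settings.model, settings.log_level)
```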

🧰 Troubleshooting

No audio / no default output device:
- Set a system default output device and restart the MCP server.
- macOS: System Settings → Sound → Output.
PyAudio install issues:
- macOS: brew install portaudio then pip install -r requirements.txt
- Linux (Debian/Ubuntu): sudo apt-get install portaudio19-dev then pip install pyaudio
- Windows: pip install pipwin && pipwin install pyaudio
Missing API key:
- Ensure .env contains OPENAI_API_KEY=... or export it in your shell.
High latency or choppy audio:
- Close other audio apps; verify system output device; keep blocking=False if you need responsiveness.
Logs:
- Logs stream to stderr and to tts_mcp_server.log. Tail with:
```
tail -f tts_mcp_server.log
```
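
If you are wondering how logs end up in both places, the standard library makes this straightforward. The sketch below is an assumption about the setup, not the server's actual code, and the logger name is made up. Logging to stderr rather than stdout matters for MCP servers launched over stdio, as the command-based config suggests this one is, because stdout carries protocol messages.

```python
import logging
import sys


def configure_logging(level="INFO", log_file="tts_mcp_server.log"):
    """Send log records to stderr and to a log file."""
    logger = logging.getLogger("tts_mcp_server")   # logger name is an assumption
    logger.setLevel(getattr(logging, level.upper()))
    logger.handlers.clear()                        # avoid duplicate handlers on re-run
    formatter = logging.Formatter("%(asctime)s %(levelname)s %(message)s")
    handlers = (
        logging.StreamHandler(sys.stderr),
        logging.FileHandler(log_file, encoding="utf-8"),
    )
    for handler in handlers:
        handler.setFormatter(formatter)
        logger.addHandler(handler)
    return logger
```

Calling `configure_logging("DEBUG")` once at startup and then using `logger.debug(...)` would produce the same lines on stderr and in the log file.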

🙏 Acknowledgments

Cursor, for writing 95% of the code here
Coffee, for making everything else possible

Remember: With great text-to-speech power comes great responsibility. Use your new vocal abilities wisely, and try not to annoy your coworkers too much.

Pro tip: If your computer starts talking back to you without being prompted, it might be time to take a break. Or update your Python version. Probably the latter.

This project is licensed under the BSD 3-Clause License. See the LICENSE file for details.