Advanced TTS MCP Server

Provides high-quality text-to-speech synthesis with 10 natural voices, emotion control, and dynamic pacing for professional applications requiring expressive speech output.

A high-quality, feature-rich Text-to-Speech MCP server with native TypeScript implementation. Designed for professional applications requiring natural, expressive speech synthesis with advanced controls and zero external dependencies.

✨ Features

🎯 Advanced Voice Control

  • 10 High-Quality Voices - Male and female voices with distinct personalities
  • Emotion Control - Neutral, happy, excited, calm, serious, casual, confident
  • Dynamic Pacing - Natural, conversational, presentation, tutorial, and narrative modes, plus fast and slow presets
  • Speed & Volume - Precise control from 0.25x to 3.0x speed, 0.1x to 2.0x volume

🚀 Professional Capabilities

  • Streaming Audio - Real-time synthesis and playback
  • Batch Processing - Handle multiple text segments efficiently
  • Multiple Formats - WAV, MP3, FLAC, OGG output support
  • Natural Speech Enhancement - Automatic pause insertion and emotion markers
  • Queue Management - Handle multiple concurrent requests

🔧 MCP Integration

  • 6 Powerful Tools - Complete synthesis, batch processing, voice management
  • 2 Rich Resources - Voice capabilities and usage examples
  • Real-time Status - Track processing progress and manage requests
  • File Management - Save, list, and organize audio outputs

🚀 Quick Start

Option 1: Deploy to Smithery.ai (Recommended)

🎯 One-Click Deployment to Smithery Platform

  1. Deploy Now: Visit Smithery.ai and import this repository
  2. Configure: Set your preferred voice and speech settings
  3. Use Instantly: Access via Claude Desktop or any MCP-compatible client

Benefits:

  • ✅ Zero setup required
  • ✅ Automatic scaling and updates
  • ✅ No model downloads needed
  • ✅ Enterprise-grade hosting

📋 Full Smithery Deployment Guide →

Option 2: Local Installation

Prerequisites:

  • Node.js 18+

Installation:

  1. Clone the repository
     git clone https://github.com/samihalawa/advanced-tts-mcp.git
     cd advanced-tts-mcp
  2. Install dependencies
     npm install
  3. Configure Claude Desktop

Add to your claude_desktop_config.json:

{
  "mcpServers": {
    "advanced-tts": {
      "command": "node",
      "args": ["dist/index.js"],
      "cwd": "/path/to/advanced-tts-mcp"
    }
  }
}
  4. Start using!
     # Build TypeScript
     npm run build

     # Start server
     npm start

Restart Claude Desktop and start synthesizing with natural, expressive voices.
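
If you already have other servers registered, the configuration step can be scripted instead of edited by hand. The following is a minimal sketch, assuming the config is plain JSON with a top-level `mcpServers` object as shown above. The helper name `add_server_entry` is hypothetical and not part of this project, and the path is the same placeholder used in the example.

```python
import json


def add_server_entry(config: dict, repo_path: str) -> dict:
    """Return a copy of a Claude Desktop config with the advanced-tts entry added.

    Illustrative helper only; existing server entries are preserved.
    """
    updated = json.loads(json.dumps(config))  # deep copy via a JSON round-trip
    servers = updated.setdefault("mcpServers", {})
    servers["advanced-tts"] = {
        "command": "node",
        "args": ["dist/index.js"],
        "cwd": repo_path,  # placeholder path, replace with your clone location
    }
    return updated


# Example: merge into a config that already registers another (made-up) server.
existing = {"mcpServers": {"other-server": {"command": "node", "args": ["other.js"]}}}
merged = add_server_entry(existing, "/path/to/advanced-tts-mcp")
print(json.dumps(merged, indent=2))
```

The function returns a new dictionary rather than mutating its input, so the original config can be kept as a backup before writing the result to disk.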

🎙️ Available Voices

| Voice ID | Name | Gender | Description |
|---|---|---|---|
| af_heart | Heart | Female | Warm, friendly voice (default) |
| af_sky | Sky | Female | Clear, bright voice |
| af_bella | Bella | Female | Elegant, sophisticated voice |
| af_sarah | Sarah | Female | Professional, confident voice |
| af_nicole | Nicole | Female | Gentle, soothing voice |
| am_adam | Adam | Male | Strong, authoritative voice |
| am_michael | Michael | Male | Friendly, approachable voice |
| bf_emma | Emma | Female | Young, energetic voice |
| bf_isabella | Isabella | Female | Mature, expressive voice |
| bm_lewis | Lewis | Male | Deep, resonant voice |
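
For client-side checks, the voice list can be mirrored as a small lookup table. This is only a sketch built from the voices listed above. The `Voice` type and the helpers `voices_by_gender` and `resolve_voice` are hypothetical names, and the actual shape of the `get_voices` tool response is not documented here.

```python
from typing import Dict, List, NamedTuple, Optional


class Voice(NamedTuple):
    name: str
    gender: str
    description: str


# Data copied from the voice list in this README.
VOICES: Dict[str, Voice] = {
    "af_heart": Voice("Heart", "Female", "Warm, friendly voice (default)"),
    "af_sky": Voice("Sky", "Female", "Clear, bright voice"),
    "af_bella": Voice("Bella", "Female", "Elegant, sophisticated voice"),
    "af_sarah": Voice("Sarah", "Female", "Professional, confident voice"),
    "af_nicole": Voice("Nicole", "Female", "Gentle, soothing voice"),
    "am_adam": Voice("Adam", "Male", "Strong, authoritative voice"),
    "am_michael": Voice("Michael", "Male", "Friendly, approachable voice"),
    "bf_emma": Voice("Emma", "Female", "Young, energetic voice"),
    "bf_isabella": Voice("Isabella", "Female", "Mature, expressive voice"),
    "bm_lewis": Voice("Lewis", "Male", "Deep, resonant voice"),
}
DEFAULT_VOICE = "af_heart"


def voices_by_gender(gender: str) -> List[str]:
    """Return voice IDs whose gender matches, case-insensitively."""
    return [vid for vid, v in VOICES.items() if v.gender.lower() == gender.lower()]


def resolve_voice(voice_id: Optional[str]) -> str:
    """Fall back to the default voice when no ID is given; reject unknown IDs."""
    if voice_id is None:
        return DEFAULT_VOICE
    if voice_id not in VOICES:
        raise ValueError(f"unknown voice_id: {voice_id!r}")
    return voice_id


print(voices_by_gender("male"))
```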

📚 Usage Examples

Basic Synthesis

# Simple text-to-speech
await synthesize_speech(
    text="Hello! Welcome to Advanced TTS.",
    voice_id="af_heart"
)

Emotional Expression

# Excited announcement
await synthesize_speech(
    text="This is amazing news! You're going to love this new feature!",
    voice_id="af_heart",
    emotion="excited",
    pacing="conversational",
    speed=1.1
)

Professional Presentation

# Tutorial narration
await synthesize_speech(
    text="Step one: Open your browser. Step two: Navigate to the website.",
    voice_id="am_adam",
    emotion="calm",
    pacing="tutorial",
    speed=0.9
)

Batch Processing

# Multiple segments with pauses
await batch_synthesize(
    segments=[
        "Welcome to our presentation.",
        "Today we'll cover three main topics.",
        "Let's begin with the first topic."
    ],
    voice_id="af_sarah",
    emotion="confident",
    pacing="presentation",
    merge_output=True,
    segment_pause=1.0,
    save_file=True
)

🛠️ Available Tools

synthesize_speech

Convert text to natural speech with full control over voice characteristics.

Parameters:

  • text - Text to synthesize (max 10,000 chars)
  • voice_id - Voice selection (see table above)
  • speed - Speech rate (0.25-3.0)
  • emotion - Voice emotion (neutral, happy, excited, calm, serious, casual, confident)
  • pacing - Speech style (natural, conversational, presentation, tutorial, narrative, fast, slow)
  • volume - Audio volume (0.1-2.0)
  • output_format - File format (wav, mp3, flac, ogg)
  • save_file - Save to file (boolean)
  • filename - Custom filename
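
The limits listed above can be checked before a request is sent. The sketch below is one way to do that on the client side. The function name `validate_synthesis_args` is hypothetical, the default of 1.0 for `speed` and `volume` is an assumption, and the server's own validation may behave differently.

```python
from typing import Any, Dict, List

EMOTIONS = {"neutral", "happy", "excited", "calm", "serious", "casual", "confident"}
PACINGS = {"natural", "conversational", "presentation", "tutorial", "narrative", "fast", "slow"}
FORMATS = {"wav", "mp3", "flac", "ogg"}
MAX_TEXT_CHARS = 10_000


def validate_synthesis_args(args: Dict[str, Any]) -> List[str]:
    """Return a list of problems; an empty list means the arguments look valid."""
    problems: List[str] = []

    text = args.get("text", "")
    if not text:
        problems.append("text is required")
    elif len(text) > MAX_TEXT_CHARS:
        problems.append(f"text exceeds {MAX_TEXT_CHARS} characters")

    # Assumed defaults of 1.0; the README only documents the allowed ranges.
    if not 0.25 <= args.get("speed", 1.0) <= 3.0:
        problems.append("speed must be between 0.25 and 3.0")
    if not 0.1 <= args.get("volume", 1.0) <= 2.0:
        problems.append("volume must be between 0.1 and 2.0")

    for key, allowed in (("emotion", EMOTIONS), ("pacing", PACINGS), ("output_format", FORMATS)):
        value = args.get(key)
        if value is not None and value not in allowed:
            problems.append(f"{key} must be one of {sorted(allowed)}")

    return problems


ok = validate_synthesis_args({"text": "Hello!", "voice_id": "af_heart", "speed": 1.1})
bad = validate_synthesis_args({"text": "Hello!", "speed": 5.0, "emotion": "angry"})
print(ok)
print(bad)
```

Collecting every problem in one pass, rather than raising on the first, lets a caller report all invalid fields at once.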

batch_synthesize

Process multiple text segments efficiently with optional merging.

Parameters:

  • segments - List of text segments
  • merge_output - Combine into single file
  • segment_pause - Pause between segments (0.0-5.0s)
  • All synthesis parameters from above
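
MCP clients invoke tools through a JSON-RPC 2.0 `tools/call` request. The sketch below shows how a `batch_synthesize` call might be assembled, assuming the argument names match the parameters listed above. `build_batch_request` is a hypothetical helper, and the README does not confirm the exact schema the server accepts.

```python
import json
from typing import Any, Dict, List


def build_batch_request(segments: List[str], request_id: int = 1, **options: Any) -> Dict[str, Any]:
    """Build a JSON-RPC 2.0 `tools/call` request for the batch_synthesize tool."""
    if not segments:
        raise ValueError("segments must contain at least one text segment")
    pause = options.get("segment_pause", 0.0)
    if not 0.0 <= pause <= 5.0:
        raise ValueError("segment_pause must be between 0.0 and 5.0 seconds")
    arguments = {"segments": list(segments), **options}
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": "batch_synthesize", "arguments": arguments},
    }


request = build_batch_request(
    ["Welcome to our presentation.", "Let's begin with the first topic."],
    voice_id="af_sarah",
    merge_output=True,
    segment_pause=1.0,
)
print(json.dumps(request, indent=2))
```

In practice an MCP client library handles this envelope for you; the sketch is mainly useful for seeing what travels over the wire.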

get_voices

Retrieve complete voice information and capabilities.

get_status

Check processing status for synthesis requests.

cancel_request

Cancel active synthesis operations.

list_output_files

Browse saved audio files with metadata.

🎛️ Voice Controls

Emotions

  • Neutral - Standard, professional tone
  • Happy - Upbeat, cheerful expression
  • Excited - Enthusiastic, energetic delivery
  • Calm - Relaxed, soothing tone
  • Serious - Formal, authoritative delivery
  • Casual - Relaxed, conversational style
  • Confident - Assured, professional tone

Pacing Styles

  • Natural - Balanced, human-like rhythm
  • Conversational - Casual discussion pace
  • Presentation - Professional speaking rhythm
  • Tutorial - Educational, clear delivery
  • Narrative - Storytelling pace
  • Fast - Quick delivery (1.2x base speed)
  • Slow - Deliberate delivery (0.8x base speed)
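
The README does not say how the fast and slow multipliers interact with the `speed` parameter. One plausible reading, sketched below purely as an assumption, is that the multiplier scales the requested speed and the result is clamped to the documented 0.25 to 3.0 range. The name `effective_speed` is hypothetical, and the other pacing styles are assumed to leave speed unchanged.

```python
# Multipliers for fast and slow are taken from the list above.
# Treating every other style as 1.0 is an assumption.
PACING_MULTIPLIERS = {"fast": 1.2, "slow": 0.8}
MIN_SPEED, MAX_SPEED = 0.25, 3.0


def effective_speed(speed: float, pacing: str = "natural") -> float:
    """Scale the requested speed by the pacing multiplier, then clamp to the valid range."""
    factor = PACING_MULTIPLIERS.get(pacing, 1.0)
    return max(MIN_SPEED, min(MAX_SPEED, speed * factor))


for pacing in ("natural", "fast", "slow"):
    print(pacing, effective_speed(1.0, pacing))
```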

🎵 Audio Formats

| Format | Quality | Use Case |
|---|---|---|
| WAV | Uncompressed | Highest quality, editing |
| MP3 | Compressed | Web, streaming, sharing |
| FLAC | Lossless | Archival, high-quality storage |
| OGG | Compressed | Open source alternative |

🔧 Configuration

Environment Variables

# Model paths (optional)
KOKORO_MODEL_PATH=./kokoro-v1.0.onnx
KOKORO_VOICES_PATH=./voices-v1.0.bin

# Output settings
TTS_OUTPUT_DIR=./audio_output
TTS_MAX_QUEUE_SIZE=100

# Audio settings
TTS_DEFAULT_VOICE=af_heart
TTS_ENABLE_STREAMING=true

Server Configuration

config = ServerConfig(
    model_path="./kokoro-v1.0.onnx",
    voices_path="./voices-v1.0.bin",
    output_dir="./audio_output",
    max_queue_size=100,
    enable_streaming=True,
    default_voice="af_heart"
)
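
The environment variables and the `ServerConfig` fields above line up one to one. The sketch below shows how such a mapping could work. The dataclass is a stand-in with the same field names, not the project's actual class, and the `from_env` method and its truthy-string parsing are assumptions made for illustration.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass
class ServerConfig:
    """Stand-in mirroring the fields shown above; not the project's real class."""

    model_path: str = "./kokoro-v1.0.onnx"
    voices_path: str = "./voices-v1.0.bin"
    output_dir: str = "./audio_output"
    max_queue_size: int = 100
    enable_streaming: bool = True
    default_voice: str = "af_heart"

    @classmethod
    def from_env(cls, env: Optional[Mapping[str, str]] = None) -> "ServerConfig":
        """Read settings from environment variables, falling back to the defaults."""
        env = os.environ if env is None else env
        d = cls()
        streaming = str(env.get("TTS_ENABLE_STREAMING", d.enable_streaming))
        return cls(
            model_path=env.get("KOKORO_MODEL_PATH", d.model_path),
            voices_path=env.get("KOKORO_VOICES_PATH", d.voices_path),
            output_dir=env.get("TTS_OUTPUT_DIR", d.output_dir),
            max_queue_size=int(env.get("TTS_MAX_QUEUE_SIZE", d.max_queue_size)),
            enable_streaming=streaming.strip().lower() in ("1", "true", "yes"),
            default_voice=env.get("TTS_DEFAULT_VOICE", d.default_voice),
        )


# Passing an explicit mapping keeps the example independent of the real environment.
config = ServerConfig.from_env({"TTS_MAX_QUEUE_SIZE": "25", "TTS_ENABLE_STREAMING": "false"})
print(config)
```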

🏗️ Architecture

├── src/advanced_tts/
│   ├── __init__.py          # Package initialization
│   ├── server.py            # MCP server implementation  
│   ├── engine.py            # Kokoro TTS engine wrapper
│   ├── models.py            # Data models and validation
│   └── utils.py             # Utility functions
├── pyproject.toml           # Project configuration
├── README.md                # Documentation
└── LICENSE                  # MIT License

🤝 Contributing

Contributions welcome! Areas for improvement:

  • Additional voice models
  • Real-time streaming synthesis
  • Advanced audio effects
  • Multi-language support
  • Performance optimizations

📄 License

MIT License - see LICENSE for details.

🙏 Acknowledgments

  • Kokoro TTS - High-quality neural voice synthesis
  • MCP Protocol - Seamless AI model integration
  • FastMCP - Efficient server framework

Developed by Sami Halawa

Transform your text into natural, expressive speech with Advanced TTS MCP Server.
