mcp-youtube-transcript
Extracts YouTube transcripts and performs AI-powered video analysis for Claude Desktop, enabling transcript retrieval, quality analysis, and smart resource management.
🎥 YouTube Video Intelligence Suite
Professional-grade YouTube transcript extraction and AI-powered video analysis for Claude Desktop
A comprehensive Model Context Protocol (MCP) server that transforms YouTube videos into intelligent, searchable content through advanced transcript extraction and AI analysis. No API keys required - works seamlessly with Claude Desktop's built-in intelligence.
Current Version: v0.5.0
Latest Enhancement: VTT→SRV1 Migration with Enhanced Quality Analysis
- Smart Format Fallback: SRV1 → JSON3 → TTML → VTT priority chain for superior quality
- Advanced Quality Analysis: Comprehensive safety validation with quality scoring
- Enhanced Deduplication: Intelligent duplicate detection with effectiveness tracking
- Professional-Grade Output: Industry-standard transcript quality with safety validation
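The format fallback above is a simple priority search. The sketch below illustrates the idea; the function name pick_subtitle_format and the lowercase format codes are assumptions for illustration, not the project's actual code. yt-dlp's own --sub-format option accepts a similar slash-separated preference such as srv1/json3/ttml/vtt.

```python
# Hypothetical sketch of the SRV1 -> JSON3 -> TTML -> VTT fallback chain.
# Not the project's implementation; names are illustrative only.
from typing import Iterable, Optional

# Highest-quality format first, mirroring the priority chain in this README.
FORMAT_PRIORITY = ("srv1", "json3", "ttml", "vtt")


def pick_subtitle_format(available: Iterable[str]) -> Optional[str]:
    """Return the best format offered for a video, or None if none match."""
    offered = {fmt.lower() for fmt in available}
    for fmt in FORMAT_PRIORITY:
        if fmt in offered:
            return fmt
    return None  # caller should report "no usable transcript"


if __name__ == "__main__":
    print(pick_subtitle_format(["vtt", "ttml", "json3"]))  # json3
    print(pick_subtitle_format(["srt"]))  # None
```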
🚀 Quick Start
Prerequisites
- Python 3.10+
- uv package manager
- Claude Desktop app
- No API keys required! ✨
Installation & Testing
# Clone and setup
git clone <repository-url>
cd mcp-youtube-transcript
uv sync
# Quick test (optional but recommended)
python quick_test.py
# Or use automated setup
./setup.sh
Claude Desktop Configuration
Edit ~/Library/Application Support/Claude/claude_desktop_config.json (macOS; on Windows the file is %APPDATA%\Claude\claude_desktop_config.json):
{
  "mcpServers": {
    "youtube-transcript": {
      "command": "uv",
      "args": [
        "run",
        "--directory",
        "/FULL/PATH/TO/mcp-youtube-transcript",
        "python",
        "main.py"
      ]
    }
  }
}
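If you prefer not to hand-edit JSON, a small script can merge the same entry into an existing config while keeping any other servers you already have. This is a sketch, not a tool shipped with the project; the helper names server_entry and merge_config are hypothetical.

```python
# Hypothetical helper: merge the "youtube-transcript" entry shown above into
# claude_desktop_config.json without touching other configured servers.
import json
from pathlib import Path


def server_entry(project_dir: str) -> dict:
    """Build the mcpServers entry that launches main.py via uv."""
    # Claude Desktop needs an absolute path, so expand ~ and resolve relatives.
    absolute = str(Path(project_dir).expanduser().resolve())
    return {
        "command": "uv",
        "args": ["run", "--directory", absolute, "python", "main.py"],
    }


def merge_config(config_path: Path, project_dir: str) -> dict:
    """Add or replace the youtube-transcript server and write the file back."""
    config = json.loads(config_path.read_text()) if config_path.exists() else {}
    config.setdefault("mcpServers", {})["youtube-transcript"] = server_entry(project_dir)
    config_path.write_text(json.dumps(config, indent=2))
    return config


# Example (macOS path; substitute your real project directory):
# config_file = Path.home() / "Library/Application Support/Claude/claude_desktop_config.json"
# merge_config(config_file, "/FULL/PATH/TO/mcp-youtube-transcript")
```

Restart Claude Desktop afterwards so it picks up the new server.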
⚠️ Replace /FULL/PATH/TO/mcp-youtube-transcript with your actual project path!
Test in Claude Desktop
Get the transcript from: https://www.youtube.com/watch?v=jNQXAC9IVRw
📖 For complete setup instructions, see DEPLOYMENT_GUIDE.md
🛠️ Standalone Extraction Tools
NEW: Professional CLI Tool (Decoupled from MCP)
A complete standalone extraction system with transcript + comment support:
# Extract transcript only
uv run python scripts/youtube_extract.py <video-url>
# Extract transcript + comments
uv run python scripts/youtube_extract.py <video-url> --comments --max-comments 100
# With custom options
uv run python scripts/youtube_extract.py <video-url> \
  --comments \
  --max-comments 50 \
  --comment-replies \
  --format both \
  --output ./data
# Minimal format optimized for Claude
uv run python scripts/youtube_extract.py <video-url> --minimal
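To call the CLI from another Python program, you can build the argument list from the flags shown above and hand it to subprocess. This sketch uses only those flags; build_extract_command and run_extract are hypothetical names, and the script is assumed to be run from the project root.

```python
# Hypothetical wrapper around scripts/youtube_extract.py using only the
# flags documented above (--comments, --max-comments, --comment-replies,
# --format, --output, --minimal).
import subprocess
from typing import List, Optional


def build_extract_command(
    url: str,
    comments: bool = False,
    max_comments: Optional[int] = None,
    comment_replies: bool = False,
    output_format: Optional[str] = None,  # e.g. "both", as in the example above
    output_dir: Optional[str] = None,
    minimal: bool = False,
) -> List[str]:
    """Translate keyword options into a CLI argument list."""
    cmd = ["uv", "run", "python", "scripts/youtube_extract.py", url]
    if comments:
        cmd.append("--comments")
    if max_comments is not None:
        cmd += ["--max-comments", str(max_comments)]
    if comment_replies:
        cmd.append("--comment-replies")
    if output_format:
        cmd += ["--format", output_format]
    if output_dir:
        cmd += ["--output", output_dir]
    if minimal:
        cmd.append("--minimal")
    return cmd


def run_extract(url: str, **options) -> int:
    """Run the CLI (from the project root) and return its exit code."""
    return subprocess.run(build_extract_command(url, **options), check=False).returncode
```

Passing a list rather than a shell string avoids quoting problems with URLs that contain & or ?.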
Features:
- ✅ Transcript extraction (multi-format fallback)
- ✅ Comment extraction with threading
- ✅ Quality analysis and metrics
- ✅ Multiple output formats (Markdown, JSON)
- ✅ Completely standalone (no MCP dependency)
- ✅ Production-ready error handling
📖 Full documentation: docs/STANDALONE_CLI.md
Legacy Tool (MCP-focused)
# Basic extraction for MCP resources
uv run scripts/youtube_to_mcp.py <video-url>
# Output saved to resources/transcripts/ as markdown files
🌟 Features
🏗️ Complete MCP Architecture
- 8 Core Tools - Professional transcript extraction + advanced analysis
- 6 Smart Resources - Zero-token access to cached data and analytics
- 3 Essential Prompts - Guided conversation starters for common workflows
- Enhanced Quality Pipeline - Advanced deduplication and safety validation
- Rich Metadata - Comprehensive video information with engagement metrics
- Modular Design - Shared extraction module for consistency across interfaces
🚀 Enhanced Extraction Pipeline (v0.5.0)
- Smart Format Fallback - SRV1 → JSON3 → TTML → VTT priority chain for best quality
- Advanced Quality Analysis - Comprehensive safety validation with quality metrics
- Intelligent Deduplication - Advanced algorithms with effectiveness tracking
- HTML Entity Support - Proper decoding across all subtitle formats
- Context-Aware Validation - Video metadata integration for enhanced assessment
- Professional-Grade Output - Industry-standard transcript quality
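As an illustration of SRV1 handling and HTML entity decoding, the sketch below assumes the common timedtext layout <transcript><text start="…" dur="…">…</text></transcript>. Caption text is often double-escaped (for example &amp;#39;), so one html.unescape pass runs after XML parsing. The function name parse_srv1 is hypothetical and this is not the project's parser.

```python
# Hypothetical SRV1 parser: XML timedtext -> (start, duration, text) tuples.
import html
import xml.etree.ElementTree as ET
from typing import List, Tuple


def parse_srv1(xml_text: str) -> List[Tuple[float, float, str]]:
    root = ET.fromstring(xml_text)
    segments = []
    for node in root.iter("text"):
        raw = node.text or ""
        # The XML parser resolves one level of escaping; unescape once more
        # for double-escaped entities, then collapse whitespace/newlines.
        text = " ".join(html.unescape(raw).split())
        if not text:
            continue  # skip empty cues
        start = float(node.get("start", "0"))
        dur = float(node.get("dur", "0"))
        segments.append((start, dur, text))
    return segments
```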
Core Transcript Extraction
- Multi-format YouTube URL support (youtube.com, youtu.be, embed URLs)
- Multi-language transcript extraction with automatic fallbacks
- Robust error handling with detailed quality analysis
- yt-dlp based extraction for universal reliability (no cloud server blocking)
- Enhanced text processing with proper HTML entity decoding
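Supporting several URL shapes boils down to normalizing each one to the 11-character video ID. A minimal sketch covering the watch, youtu.be and embed forms listed above; extract_video_id is a hypothetical name and the exact rules the project uses may differ.

```python
# Hypothetical video-ID extractor for youtube.com/watch, youtu.be and embed URLs.
import re
from typing import Optional
from urllib.parse import parse_qs, urlparse

_ID_RE = re.compile(r"^[A-Za-z0-9_-]{11}$")  # YouTube IDs are 11 URL-safe chars


def extract_video_id(url: str) -> Optional[str]:
    parsed = urlparse(url.strip())
    host = (parsed.hostname or "").lower()
    if host.startswith("www."):
        host = host[4:]
    candidate = None
    if host == "youtu.be":
        # https://youtu.be/<id>?t=...
        candidate = parsed.path.lstrip("/").split("/")[0]
    elif host in {"youtube.com", "m.youtube.com", "youtube-nocookie.com"}:
        if parsed.path == "/watch":
            candidate = parse_qs(parsed.query).get("v", [None])[0]
        elif parsed.path.startswith("/embed/"):
            candidate = parsed.path.split("/")[2]
    if candidate and _ID_RE.match(candidate):
        return candidate
    return None
```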
🔧 8 Core Tools
Transcript Extraction
- get_youtube_transcript - Primary extraction with quality analysis
- get_youtube_transcript_ytdlp - Alternative extraction method
- get_plain_text_transcript - Clean text output with deduplication
- get_transcript_quality_analysis - Comprehensive quality metrics
Video Analysis
- get_enhanced_video_metadata - Rich video information and engagement metrics
- create_mcp_resource_from_transcript_v2 - Save transcripts as MCP resources
System Tools
- search_transcript - Find content within transcripts
- get_system_status - Server health and configuration info
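The core of a transcript search tool is a case-insensitive scan over timed segments that returns each hit with a little surrounding context. This is a sketch of that idea only; search_segments, Match and the context parameter are hypothetical and do not describe the real search_transcript signature.

```python
# Hypothetical core of a transcript search: case-insensitive substring match
# over (start_seconds, text) segments, with neighbouring lines as context.
from dataclasses import dataclass
from typing import List, Sequence, Tuple


@dataclass
class Match:
    start: float   # timestamp of the matching segment, in seconds
    text: str      # the matching segment itself
    context: str   # matching segment joined with its neighbours


def search_segments(
    segments: Sequence[Tuple[float, str]], query: str, context: int = 1
) -> List[Match]:
    needle = query.casefold()
    matches = []
    for i, (start, text) in enumerate(segments):
        if needle in text.casefold():
            lo, hi = max(0, i - context), min(len(segments), i + context + 1)
            window = " ".join(t for _, t in segments[lo:hi])
            matches.append(Match(start, text, window))
    return matches
```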
📊 6 Smart Resources
Access cached data and enhanced content through MCP resources:
- transcripts://available - Browse all available transcripts
- transcripts://content/{video_id} - Access specific transcript content
- transcripts://cached - View all cached transcripts with metadata
- transcripts://quality_report - System-wide quality analytics and trends
- analytics://history - View previous analysis results and usage patterns
- system://status - Server status and configuration information
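Five of these resources are fixed URIs and one is a template with a {video_id} placeholder. The real server presumably lets the MCP SDK handle this matching; the sketch below, with the hypothetical function match_resource, only shows how a concrete URI maps onto the templates listed above.

```python
# Hypothetical resource-URI matcher for the six resources listed above.
from typing import Dict, Optional, Tuple

RESOURCE_TEMPLATES = [
    "transcripts://available",
    "transcripts://content/{video_id}",
    "transcripts://cached",
    "transcripts://quality_report",
    "analytics://history",
    "system://status",
]


def match_resource(uri: str) -> Optional[Tuple[str, Dict[str, str]]]:
    """Return (template, params) for a URI, or None if nothing matches."""
    for template in RESOURCE_TEMPLATES:
        prefix, brace, rest = template.partition("{")
        if not brace:
            if uri == template:  # fixed URI, no parameters
                return template, {}
            continue
        value = uri[len(prefix):] if uri.startswith(prefix) else ""
        # A placeholder must capture exactly one non-empty path segment.
        if value and "/" not in value:
            return template, {rest.rstrip("}"): value}
    return None
```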
🎯 3 Essential Prompts
Guided workflows for comprehensive analysis:
- transcript_analysis_workshop - Deep-dive video content analysis
- study_notes_generator - Create structured study materials from videos
- video_insight_explorer - Comprehensive video exploration and insights
🎨 What You Can Do
Basic Operations
"Get the transcript from: [YouTube URL]"
"Extract transcript from this video: [URL]"
"Show me the quality analysis for: [URL]"
Advanced Analysis
"Analyze this video for key points: [URL]"
"Create study notes from: [Educational video URL]"
"Generate a comprehensive analysis of: [URL]"
"Compare the arguments in these videos: [URL1] [URL2]"
Resource Access
"Show me all cached transcripts"
"What's the quality report for the system?"
"Access the transcript content for video ID: abc123"
🏗️ Architecture
Modular Design
- src/youtube_core/ - Standalone extraction library (NEW!)
  - extractor.py - Core extraction class
  - transcript.py - Transcript extraction logic
  - comments.py - Comment retrieval system
  - quality.py - Quality analysis engine
  - formatters.py - Output formatting
  - config.py - Configuration management
- streamlined_server.py - Complete MCP server implementation
- main.py - Entry point for Claude Desktop integration
- scripts/youtube_extract.py - Professional standalone CLI tool
- scripts/youtube_to_mcp.py - Legacy MCP-focused tool
Quality-First Approach
- Smart Format Selection - Automatic fallback ensures best available quality
- Advanced Deduplication - Sophisticated algorithms remove caption overlaps
- Safety Validation - Multi-layer content quality checks
- Professional Output - Industry-standard transcript formatting
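Auto-generated captions often "roll": each cue repeats the previous cue and appends a few words. The sketch below shows one simple way to strip that overlap and report how much text was removed. It is not the project's deduplication algorithm; dedupe_captions is a hypothetical name, and "effectiveness" is defined here (as an assumption) as the fraction of input words removed.

```python
# Hypothetical caption deduplication: drop exact repeats and strip the
# repeated prefix of rolling captions, tracking the share of words removed.
from typing import List, Tuple


def dedupe_captions(lines: List[str]) -> Tuple[List[str], float]:
    cleaned: List[str] = []
    previous = ""
    words_in = words_out = 0
    for line in lines:
        text = " ".join(line.split())  # normalise whitespace
        words_in += len(text.split())
        if not text or text == previous:
            continue  # empty cue or exact repeat
        if previous and text.startswith(previous + " "):
            new_part = text[len(previous) + 1:]  # keep only the new words
        else:
            new_part = text
        previous = text
        cleaned.append(new_part)
        words_out += len(new_part.split())
    effectiveness = 1 - words_out / words_in if words_in else 0.0
    return cleaned, effectiveness
```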
Standalone Core Library
The new youtube_core module provides:
- Zero MCP dependency - Use independently anywhere
- Programmatic API - Import and use in your Python projects
- Complete functionality - Transcript + comments + metadata + quality analysis
- Production-ready - Error handling, timeouts, retries
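from youtube_core import YouTubeExtractor

url = "https://www.youtube.com/watch?v=jNQXAC9IVRw"  # any YouTube video URL
extractor = YouTubeExtractor()
result = extractor.extract(url, include_comments=True)  # transcript plus comments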
Zero Dependency Bloat
- yt-dlp - Reliable transcript extraction (no cloud server blocking)
- mcp - Model Context Protocol integration (optional for standalone use)
- Pure Python - No heavy AI libraries or API dependencies
📈 Version History
v0.5.0 (Current) - VTT→SRV1 Migration
- Smart format fallback system (SRV1 → JSON3 → TTML → VTT)
- Enhanced quality analysis with safety validation
- Advanced deduplication with effectiveness tracking
- Professional-grade transcript quality
v0.4.0 - Complete yt-dlp Migration
- Removed youtube-transcript-api dependency
- Universal reliability with yt-dlp-only approach
- Enhanced VTT processing and deduplication
- Eliminated cloud server blocking issues
v0.3.0 - Enhanced Quality & Rich Resources
- Major quality improvements (5,700% richer content)
- Advanced deduplication algorithms
- Comprehensive resource architecture
- Enhanced metadata integration
🔧 Troubleshooting
Common Issues
- "Command not found: uv"
  curl -LsSf https://astral.sh/uv/install.sh | sh
  source ~/.zshrc
- Claude Desktop not recognizing server
  - Verify full path in claude_desktop_config.json
  - Restart Claude Desktop completely
  - Run python quick_test.py to validate setup
- Transcript extraction fails
  - Check internet connection
  - Verify the video has available transcripts
  - Try the alternative extraction method (get_youtube_transcript_ytdlp)
Validation
# Test everything works
python quick_test.py
# Test manual extraction
uv run scripts/youtube_to_mcp.py https://www.youtube.com/watch?v=jNQXAC9IVRw
📚 Documentation
- DEPLOYMENT_GUIDE.md - Complete setup instructions
- docs/ - Comprehensive documentation
- CHANGELOG.md - Version history and changes
- SHARING_SUMMARY.md - Repository sharing guide
🎯 Success Criteria
You should be able to:
- [x] Extract transcripts from any YouTube video that has captions available
- [x] Perform AI analysis without API keys
- [x] Access cached content through MCP resources
- [x] Use guided prompts for complex analysis
- [x] Run standalone extraction scripts
- [x] Get professional-grade transcript quality
🎉 Transform YouTube videos into intelligent, searchable content with professional-grade quality! 🎥✨