YouTube Transcript Fetcher (YTT)
An MCP server that enables searching YouTube and retrieving high-accuracy video transcripts using local Whisper AI transcription without requiring an API key. It supports single or batch processing and provides transcripts in multiple formats including text, JSON, and SRT.
YouTube Transcript Fetcher
Fetch transcripts from any YouTube video using Whisper AI transcription. Search YouTube and get transcripts for the top results. No YouTube API key required.
Features
- Whisper-powered: High-accuracy local AI transcription with faster-whisper
- YouTube Search: Search YouTube and get transcripts for top results
- No API key needed: Works without YouTube Data API credentials
- Multiple formats: Text, JSON, SRT, VTT output
- Caching: SQLite-backed cache avoids re-transcribing
- No transcription rate limits: Whisper runs locally, so transcription is not subject to external API quotas
- CLI & library: Use as a command-line tool or Python module
- MCP server: Integrate with AI tools via Model Context Protocol
Installation
# Clone the repository
git clone https://github.com/andrewctf/ytt.git
cd ytt
# Create virtual environment
python -m venv .venv
source .venv/bin/activate # Linux/Mac
.venv\Scripts\activate # Windows
# Install dependencies
pip install -r requirements.txt
# Optional: GPU support
pip install torch torchvision --index-url https://download.pytorch.org/whl/cu128
Note: For detailed GPU/CUDA setup, see QUICKSTART.md.
Additional Setup for Whisper
Whisper requires ffmpeg for audio extraction:
Windows (with winget):
winget install ffmpeg
macOS:
brew install ffmpeg
Linux:
sudo apt install ffmpeg
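To confirm that ffmpeg is actually on your PATH before running a transcription, a small standard-library check like the one below can help. This is a minimal sketch; the helper name `ffmpeg_available` is hypothetical and not part of YTT.

```python
import shutil
import subprocess


def ffmpeg_available() -> bool:
    """Return True if an `ffmpeg` executable can be found on PATH (hypothetical helper)."""
    return shutil.which("ffmpeg") is not None


if __name__ == "__main__":
    if ffmpeg_available():
        # Print the first line of `ffmpeg -version`, which names the build.
        out = subprocess.run(["ffmpeg", "-version"], capture_output=True, text=True)
        lines = out.stdout.splitlines()
        print(lines[0] if lines else "ffmpeg found")
    else:
        print("ffmpeg not found on PATH; install it with one of the commands above.")
```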
Quick Start
For detailed installation and setup instructions, see QUICKSTART.md.
CLI
# Get transcript (Whisper is used by default)
python cli.py transcript VIDEO_ID
# Or with a full YouTube URL
python cli.py transcript "https://www.youtube.com/watch?v=a1JTPFfshI0"
# Different output formats
python cli.py transcript VIDEO_ID --format json
python cli.py transcript VIDEO_ID --format srt
python cli.py transcript VIDEO_ID --format vtt
# Save to file
python cli.py transcript VIDEO_ID --output transcript.txt
# Batch processing
python cli.py transcript VIDEO_ID1 VIDEO_ID2 VIDEO_ID3
# Search YouTube for videos and get transcripts
python cli.py search "Python tutorial" --limit 5 --with-transcripts
# Search only (no transcripts)
python cli.py search "Python tutorial" --limit 10
# JSON output for search
python cli.py search "Python tutorial" --format json
# Cache management
python cli.py cache-stats
python cli.py cache-stats --clean # Remove expired entries
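The `transcript` command accepts either a bare video ID or a full URL. One way to normalize both forms is sketched below; it assumes the usual 11-character ID alphabet (letters, digits, `-`, `_`) and handles only the common `watch?v=`, `youtu.be/`, `/shorts/`, `/embed/` and `/live/` forms. `extract_video_id` is a hypothetical helper, not necessarily how `cli.py` does it.

```python
import re
from typing import Optional
from urllib.parse import parse_qs, urlparse

# Assumption: YouTube video IDs are 11 characters of letters, digits, "-" and "_".
_ID_RE = re.compile(r"^[A-Za-z0-9_-]{11}$")


def extract_video_id(value: str) -> Optional[str]:
    """Return the video ID from a bare ID or a common YouTube URL (hypothetical helper)."""
    value = value.strip()
    if _ID_RE.match(value):
        return value
    parsed = urlparse(value)
    host = (parsed.hostname or "").lower()
    candidate = ""
    if host == "youtu.be":
        # Short links carry the ID as the first path component.
        candidate = parsed.path.lstrip("/").split("/")[0]
    elif host == "youtube.com" or host.endswith(".youtube.com"):
        if parsed.path == "/watch":
            candidate = parse_qs(parsed.query).get("v", [""])[0]
        elif parsed.path.startswith(("/shorts/", "/embed/", "/live/")):
            candidate = parsed.path.split("/")[2]
    return candidate if _ID_RE.match(candidate) else None
```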
Python Library
import asyncio
from src.service import get_transcript
from src.search_service import search, search_and_get_transcripts

async def main():
    # These functions are coroutines, so they must run inside an event loop
    # Basic usage
    result = await get_transcript("VIDEO_ID")
    print(result.content)
    # With options
    result = await get_transcript(
        "VIDEO_ID",
        language="en",
        output_format="json",
        use_cache=True,
    )
    # Access metadata
    print(f"Source: {result.source}")  # 'whisper' or 'innertube'
    print(f"Language: {result.language}")  # Detected language
    print(f"Video ID: {result.video_id}")
    # Search YouTube for videos
    results = await search("Python tutorial", max_results=5)
    for video in results:
        print(f"{video.title} ({video.video_id}) - {video.channel_name}")
    # Search and get transcripts for results
    results = await search_and_get_transcripts("Python tutorial", max_results=3, language="en")
    for video, transcript in results:
        if transcript:
            print(f"{video.title}: {transcript.content[:100]}...")

asyncio.run(main())
For synchronous usage:
import asyncio
from src.service import get_transcript
from src.search_service import search

def fetch_transcript(video_id):
    return asyncio.run(get_transcript(video_id))

def search_videos(query, max_results=5):
    return asyncio.run(search(query, max_results=max_results))

result = fetch_transcript("VIDEO_ID")
print(result.content)
videos = search_videos("Python tutorial")
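`asyncio.run()` raises `RuntimeError` when an event loop is already running in the current thread (for example inside a Jupyter notebook), so wrappers like `fetch_transcript` fail there. A minimal sketch of a more tolerant wrapper follows; `run_sync` is a hypothetical helper, and in the running-loop case it blocks the caller while a fresh loop runs in a worker thread.

```python
import asyncio
import concurrent.futures
from typing import Any, Coroutine, TypeVar

T = TypeVar("T")


def run_sync(coro: Coroutine[Any, Any, T]) -> T:
    """Run a coroutine to completion from synchronous code (hypothetical helper)."""
    try:
        asyncio.get_running_loop()
    except RuntimeError:
        # No loop is running in this thread, so plain asyncio.run() is safe.
        return asyncio.run(coro)
    # A loop is already running here: run the coroutine on a new loop in a worker thread.
    with concurrent.futures.ThreadPoolExecutor(max_workers=1) as pool:
        return pool.submit(asyncio.run, coro).result()
```

Usage would then look like `result = run_sync(get_transcript("VIDEO_ID"))`.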
MCP Server
Note: See QUICKSTART.md for detailed configuration with Claude Desktop, Cursor, and VS Code.
Start the MCP server:
python -m mcp_server.server
The server exposes three tools:
- get_transcript - Get transcript for a single video
- get_transcripts_batch - Get transcripts for multiple videos concurrently
- search_videos - Search YouTube for videos matching a query
To integrate with Claude Desktop, add the server to your MCP settings:
{
"mcpServers": {
"yt-transcript": {
"command": "python",
"args": ["-m", "mcp_server.server"],
"cwd": "/absolute/path/to/ytt"
}
}
}
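The `get_transcripts_batch` tool (and the CLI's multi-ID form) processes several videos concurrently. A minimal sketch of bounded concurrency with `asyncio.Semaphore` is shown below. `fetch_batch` and `MAX_CONCURRENCY` are hypothetical, `fetch_one` stands in for a coroutine such as `get_transcript`, and the limit of 50 mirrors `MAX_BATCH_SIZE` in the Configuration section.

```python
import asyncio
from typing import Awaitable, Callable, Dict, List, Optional, TypeVar

T = TypeVar("T")

MAX_BATCH_SIZE = 50  # mirrors the config.py setting
MAX_CONCURRENCY = 4  # hypothetical: how many videos are processed at once


async def fetch_batch(
    video_ids: List[str],
    fetch_one: Callable[[str], Awaitable[T]],
    max_concurrency: int = MAX_CONCURRENCY,
) -> Dict[str, Optional[T]]:
    """Fetch many transcripts concurrently; videos that fail map to None."""
    if len(video_ids) > MAX_BATCH_SIZE:
        raise ValueError(f"batch of {len(video_ids)} exceeds MAX_BATCH_SIZE={MAX_BATCH_SIZE}")
    semaphore = asyncio.Semaphore(max_concurrency)

    async def guarded(video_id: str) -> Optional[T]:
        async with semaphore:  # at most max_concurrency fetches run at a time
            try:
                return await fetch_one(video_id)
            except Exception:
                # One bad video should not sink the whole batch.
                return None

    results = await asyncio.gather(*(guarded(v) for v in video_ids))
    return dict(zip(video_ids, results))
```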
How It Works
Video ID → Cache Check
  ├─ found → Return Cached
  └─ not found
       ↓
Whisper (primary)
  - Download audio via yt-dlp
  - Transcribe with faster-whisper
  - Return word-level timestamps
       ↓ fails?
Innertube API (fallback)
  - Extract API key from video page
  - Fetch caption tracks
  - Parse JSON3 timed text
       ↓
Cache Result
       ↓
Format & Return
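The control flow in the diagram can be sketched as a single coroutine. This is an illustration under stated assumptions, not `src/service.py`: the `whisper` and `innertube` callables stand in for `whisper_runner.py` and `fetcher.py`, a plain dict replaces the SQLite cache, the `Transcript` dataclass is hypothetical, and the final formatting step is omitted.

```python
import asyncio
from dataclasses import dataclass
from typing import Awaitable, Callable, Dict, List


@dataclass
class Transcript:  # hypothetical result type
    video_id: str
    source: str  # "whisper" or "innertube"
    segments: List[dict]


Fetcher = Callable[[str], Awaitable[Transcript]]


async def fetch_with_fallback(
    video_id: str,
    cache: Dict[str, Transcript],
    whisper: Fetcher,
    innertube: Fetcher,
) -> Transcript:
    """Cache check -> Whisper (primary) -> Innertube (fallback) -> cache result."""
    cached = cache.get(video_id)
    if cached is not None:
        return cached
    try:
        result = await whisper(video_id)
    except Exception:
        # Primary path failed (e.g. audio download error): fall back to captions.
        result = await innertube(video_id)
    cache[video_id] = result
    return result


if __name__ == "__main__":
    async def fake_whisper(video_id: str) -> Transcript:
        raise RuntimeError("audio download failed")  # simulate a Whisper failure

    async def fake_innertube(video_id: str) -> Transcript:
        return Transcript(video_id, "innertube", [{"start": 0.0, "end": 4.5, "text": "Good morning"}])

    demo_cache: Dict[str, Transcript] = {}
    print(asyncio.run(fetch_with_fallback("a1JTPFfshI0", demo_cache, fake_whisper, fake_innertube)).source)
```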
Whisper (Primary)
- Downloads audio using yt-dlp
- Transcribes using faster-whisper (runs on CPU by default)
- Returns word-level timestamps and segment text
- Works on any video with audio
- ~1-3x real-time processing speed (varies with model size and hardware)
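A minimal sketch of the primary path is shown below, using the public APIs of yt-dlp (`YoutubeDL` with the `FFmpegExtractAudio` postprocessor) and faster-whisper (`WhisperModel.transcribe` with `word_timestamps=True`). It is not the code in `whisper_runner.py`; `build_ydl_opts` and `transcribe_video` are hypothetical, and running it requires yt-dlp, faster-whisper and ffmpeg to be installed.

```python
import os
import tempfile
from typing import List


def build_ydl_opts(out_dir: str) -> dict:
    """yt-dlp options: download the best audio stream and convert it to mp3 with ffmpeg."""
    return {
        "format": "bestaudio/best",
        "outtmpl": os.path.join(out_dir, "%(id)s.%(ext)s"),
        "postprocessors": [{"key": "FFmpegExtractAudio", "preferredcodec": "mp3"}],
        "quiet": True,
    }


def transcribe_video(video_id: str, model_size: str = "base") -> List[dict]:
    """Download a video's audio and transcribe it locally with word timestamps."""
    # Imported lazily so build_ydl_opts() works without these packages installed.
    import yt_dlp
    from faster_whisper import WhisperModel

    with tempfile.TemporaryDirectory() as tmp:
        with yt_dlp.YoutubeDL(build_ydl_opts(tmp)) as ydl:
            ydl.download([f"https://www.youtube.com/watch?v={video_id}"])
        audio_path = os.path.join(tmp, f"{video_id}.mp3")

        # int8 on CPU keeps memory use low.
        model = WhisperModel(model_size, device="cpu", compute_type="int8")
        segments, _info = model.transcribe(audio_path, word_timestamps=True)
        # `segments` is a lazy generator: transcription happens while iterating,
        # so it must be consumed before the temporary directory is deleted.
        return [
            {
                "start": seg.start,
                "end": seg.end,
                "text": seg.text.strip(),
                "words": [(w.word, w.start, w.end) for w in (seg.words or [])],
            }
            for seg in segments
        ]
```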
Innertube API (Fallback)
- Scrapes YouTube's internal API
- No API key required
- Fast (~0.5-2s per video)
- ~85% coverage (some videos lack captions)
- Rate limited (~5 req/10s per IP)
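The last fallback step, parsing JSON3 timed text, can be sketched as follows. This assumes the commonly observed (undocumented, subject to change) json3 layout: an `events` list whose entries carry `tStartMs`, `dDurationMs` and `segs[].utf8`. `parse_json3` is hypothetical and not taken from `parser.py`.

```python
import json
from typing import List


def parse_json3(raw: str) -> List[dict]:
    """Convert YouTube "json3" timed text into {start, end, text} segments in seconds."""
    segments = []
    for event in json.loads(raw).get("events", []):
        segs = event.get("segs")
        if not segs:
            continue  # window/style events carry no text
        text = "".join(s.get("utf8", "") for s in segs).replace("\n", " ").strip()
        if not text:
            continue  # skip events that only contain line breaks
        start = event.get("tStartMs", 0) / 1000
        end = start + event.get("dDurationMs", 0) / 1000
        segments.append({"start": start, "end": end, "text": text})
    return segments
```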
Output Formats
Text (default)
Good morning, here we are, a live suturing course like nobody else has ever
done and what are we covering, we're covering every suturing technique...
JSON
{
"video_id": "a1JTPFfshI0",
"language": "en",
"source": "whisper",
"segments": [
{"start": 0.0, "end": 4.5, "text": "Good morning, here we are..."},
{"start": 4.5, "end": 9.2, "text": "a live suturing course..."}
]
}
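Downstream code can consume the JSON output with the standard library alone. The sketch below assumes only the field names visible in the example above; `json_to_plain_text` is a hypothetical helper.

```python
import json


def json_to_plain_text(raw: str) -> str:
    """Join the segment texts of a JSON transcript into one paragraph."""
    data = json.loads(raw)
    return " ".join(seg["text"].strip() for seg in data["segments"])


example = '''{
  "video_id": "a1JTPFfshI0",
  "language": "en",
  "source": "whisper",
  "segments": [
    {"start": 0.0, "end": 4.5, "text": "Good morning, here we are..."},
    {"start": 4.5, "end": 9.2, "text": "a live suturing course..."}
  ]
}'''
print(json_to_plain_text(example))
# → Good morning, here we are... a live suturing course...
```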
SRT (SubRip)
1
00:00:00,000 --> 00:00:04,500
Good morning, here we are...

2
00:00:04,500 --> 00:00:09,200
a live suturing course like nobody else...
VTT (WebVTT)
WEBVTT

00:00:00.000 --> 00:00:04.500
Good morning, here we are...

00:00:04.500 --> 00:00:09.200
a live suturing course like nobody else...
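The SRT and WebVTT outputs differ mainly in the millisecond separator (`,` vs `.`), the `WEBVTT` header, and SRT's required cue numbers. A minimal sketch of both formatters, working on segments shaped like the JSON output, is shown below; `format_timestamp`, `to_srt` and `to_vtt` are hypothetical and may not match `formatters.py`.

```python
from typing import List


def format_timestamp(seconds: float, sep: str) -> str:
    """HH:MM:SS<sep>mmm; SRT uses "," before milliseconds, WebVTT uses "."."""
    total_ms = round(seconds * 1000)
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d}{sep}{ms:03d}"


def to_srt(segments: List[dict]) -> str:
    """Numbered cues separated by blank lines."""
    cues = []
    for i, seg in enumerate(segments, start=1):
        start = format_timestamp(seg["start"], ",")
        end = format_timestamp(seg["end"], ",")
        cues.append(f"{i}\n{start} --> {end}\n{seg['text']}\n")
    return "\n".join(cues)


def to_vtt(segments: List[dict]) -> str:
    """WEBVTT header, then cues (cue identifiers are optional in WebVTT)."""
    cues = ["WEBVTT\n"]
    for seg in segments:
        start = format_timestamp(seg["start"], ".")
        end = format_timestamp(seg["end"], ".")
        cues.append(f"{start} --> {end}\n{seg['text']}\n")
    return "\n".join(cues)
```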
Configuration
Edit config.py to customize behavior:
class Config:
    # Whisper settings
    WHISPER_MODEL = "base"  # tiny/base/small/medium/large
    WHISPER_FALLBACK_ENABLED = True
    # Cache settings
    CACHE_TTL_DAYS = 7
    CACHE_DB_PATH = ".transcript_cache.db"
    # Rate limiting (for Innertube fallback)
    RATE_LIMIT_RATE = 0.5  # tokens per second
    RATE_LIMIT_BURST = 5  # max bucket size
    # Batch processing
    MAX_BATCH_SIZE = 50
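`rate_limiter.py` is described as a token bucket. With `RATE_LIMIT_RATE = 0.5` and `RATE_LIMIT_BURST = 5`, up to 5 requests can go out immediately and then one more every 2 seconds, which works out to roughly the ~5 requests per 10 seconds quoted for Innertube. A minimal sketch follows; the `TokenBucket` class and its injectable clock are hypothetical, not the project's implementation.

```python
import time
from typing import Callable


class TokenBucket:
    """Token bucket: refills at `rate` tokens/second, holding at most `burst` tokens."""

    def __init__(self, rate: float = 0.5, burst: int = 5,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.rate = rate
        self.burst = burst
        self.clock = clock
        self.tokens = float(burst)  # start full so an initial burst is allowed
        self.last = clock()

    def _refill(self) -> None:
        now = self.clock()
        self.tokens = min(self.burst, self.tokens + (now - self.last) * self.rate)
        self.last = now

    def try_acquire(self) -> bool:
        """Take one token if available; return False instead of blocking."""
        self._refill()
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False

    def wait_time(self) -> float:
        """Seconds until the next token is available (0 if one is ready now)."""
        self._refill()
        return max(0.0, (1 - self.tokens) / self.rate)
```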
Whisper Models
| Model | Relative speed (vs. large) | Approx. accuracy | Approx. memory |
|---|---|---|---|
| tiny | 10x | ~75% | ~1GB |
| base | 7x | ~85% | ~1GB |
| small | 4x | ~90% | ~2GB |
| medium | 2x | ~95% | ~5GB |
| large | 1x | ~97% | ~6GB |
The base model is recommended for most use cases — fast and accurate enough.
File Structure
ytt/
├── src/
│ ├── __init__.py
│ ├── fetcher.py # Innertube API client
│ ├── whisper_runner.py # Whisper transcription
│ ├── parser.py # Caption parsing utilities
│ ├── formatters.py # Output formatters
│ ├── cache.py # SQLite cache
│ ├── rate_limiter.py # Token bucket
│ ├── service.py # Orchestrator
│ ├── searcher.py # YouTube search
│ ├── search_cache.py # Search result cache
│ ├── search_service.py # Search orchestrator
│ ├── cuda_dll_manager.py # Auto-download CUDA libraries
│ └── exceptions.py # Custom exceptions
├── mcp_server/
│ ├── __init__.py
│ └── server.py # FastMCP server
├── cli.py # CLI entrypoint
├── main.py # Library entrypoint
├── config.py # Configuration
├── requirements.txt # Core dependencies
├── requirements-mcp.txt # MCP dependencies
├── README.md
└── QUICKSTART.md
Troubleshooting
"No module named 'rich'"
Install dependencies:
pip install -r requirements.txt
Whisper fails with "ffmpeg not found"
Install ffmpeg (see Installation section above).
Slow transcription speed
- Use a smaller Whisper model (base instead of large)
- Use GPU acceleration by changing device="cpu" to device="cuda" in whisper_runner.py
- Enable the cache (use_cache=True) to avoid re-transcribing
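Instead of hard-editing the device string, the device could be chosen at runtime. The sketch below assumes CTranslate2 (the inference backend faster-whisper installs) and its `get_cuda_device_count()` function; `pick_device` and `pick_compute_type` are hypothetical helpers, not part of `whisper_runner.py`, and GPU use still requires the CUDA setup described in QUICKSTART.md.

```python
def pick_device() -> str:
    """Return "cuda" if CTranslate2 can see a CUDA GPU, else "cpu"."""
    try:
        import ctranslate2  # installed as a dependency of faster-whisper
    except ImportError:
        return "cpu"
    try:
        return "cuda" if ctranslate2.get_cuda_device_count() > 0 else "cpu"
    except Exception:
        # Missing or mismatched CUDA libraries can surface as runtime errors.
        return "cpu"


def pick_compute_type(device: str) -> str:
    """float16 is a common choice on GPU; int8 keeps CPU inference light."""
    return "float16" if device == "cuda" else "int8"
```

These could then feed `WhisperModel("base", device=device, compute_type=pick_compute_type(device))` with `device = pick_device()`.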
Rate limiting from Innertube
The Innertube fallback is rate-limited by YouTube (~5 req/10s). Use Whisper as primary (default) to avoid this. The cache also prevents redundant requests.
Cache not working
Check cache stats:
python cli.py cache-stats
Clean expired entries:
python cli.py cache-stats --clean
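To see how a TTL cache decides what counts as expired, here is a minimal SQLite sketch using the 7-day `CACHE_TTL_DAYS` default. The table schema and the `TranscriptCache` class are hypothetical, not the layout used by `cache.py`; `clean()` is only analogous to `cache-stats --clean`.

```python
import json
import sqlite3
import time
from typing import Optional

TTL_SECONDS = 7 * 24 * 3600  # CACHE_TTL_DAYS = 7 from config.py


class TranscriptCache:
    """Tiny SQLite cache keyed by video ID, with per-entry expiry (hypothetical schema)."""

    def __init__(self, path: str = ":memory:") -> None:
        self.conn = sqlite3.connect(path)
        self.conn.execute(
            "CREATE TABLE IF NOT EXISTS transcripts ("
            " video_id TEXT PRIMARY KEY, payload TEXT NOT NULL, created_at REAL NOT NULL)"
        )

    def put(self, video_id: str, transcript: dict, now: Optional[float] = None) -> None:
        now = time.time() if now is None else now
        with self.conn:  # commits on success, rolls back on error
            self.conn.execute(
                "INSERT OR REPLACE INTO transcripts VALUES (?, ?, ?)",
                (video_id, json.dumps(transcript), now),
            )

    def get(self, video_id: str, now: Optional[float] = None) -> Optional[dict]:
        now = time.time() if now is None else now
        row = self.conn.execute(
            "SELECT payload, created_at FROM transcripts WHERE video_id = ?", (video_id,)
        ).fetchone()
        if row is None or now - row[1] > TTL_SECONDS:
            return None  # missing or expired: the caller re-fetches
        return json.loads(row[0])

    def clean(self, now: Optional[float] = None) -> int:
        """Delete expired rows and return how many were removed."""
        now = time.time() if now is None else now
        with self.conn:
            cur = self.conn.execute(
                "DELETE FROM transcripts WHERE ? - created_at > ?", (now, TTL_SECONDS)
            )
        return cur.rowcount
```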
Development
Run tests
pytest
Format code
black src/
ruff check src/
License
MIT License