mcp-video-analyzer
MCP server for video analysis — extracts transcripts, key frames with OCR, and annotated timelines from video URLs. Supports Loom and direct video files (.mp4, .webm). Zero auth required.
No existing video MCP combines transcripts + visual frames + metadata in one tool. This one does.
Quick Start
# One-command install for Claude Code
claude mcp add video-analyzer npx mcp-video-analyzer@latest
Or manually add to your MCP config (Claude Desktop, Cursor, VS Code):
{
  "mcpServers": {
    "video-analyzer": {
      "command": "npx",
      "args": ["mcp-video-analyzer@latest"]
    }
  }
}
Tools
analyze_video — Full video analysis
Extracts everything from a video URL in one call:
> Analyze this video: https://www.loom.com/share/abc123...
Returns:
- Transcript with timestamps and speakers
- Key frames extracted via scene-change detection (automatically deduplicated)
- OCR text extracted from frames (code, error messages, UI text visible on screen)
- Annotated timeline merging transcript + frames + OCR into a unified "what happened when" view
- Metadata (title, duration, platform)
- Comments from viewers
- Chapters and AI summary (when available)
The AI typically calls this tool on its own when it sees a video URL, so there is no need to ask for it explicitly.
Options:
- detail: analysis depth. "brief" (metadata + truncated transcript, no frames), "standard" (default), or "detailed" (dense sampling, more frames)
- fields: array of specific fields to return, e.g. ["metadata", "transcript"]. Available: metadata, transcript, frames, comments, chapters, ocrResults, timeline, aiSummary
- maxFrames (1-60, default depends on detail level): cap on extracted frames
- threshold (0.0-1.0, default 0.1): scene-change sensitivity
- forceRefresh: bypass the cache and re-analyze
- skipFrames: skip frame extraction for transcript-only analysis
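The fields option acts as a projection over the result object. Below is a minimal sketch of that idea, assuming the result is a plain object keyed by the field names listed above; `AnalysisField`, `AnalysisResult`, and `filterFields` are hypothetical names for illustration, not the actual exports of src/utils/field-filter.ts.

```typescript
// Hypothetical result shape; the real types live in src/types.ts.
type AnalysisField =
  | "metadata" | "transcript" | "frames" | "comments"
  | "chapters" | "ocrResults" | "timeline" | "aiSummary";

type AnalysisResult = Partial<Record<AnalysisField, unknown>>;

// Return a copy of `result` that contains only the requested fields.
// With no `fields` (or an empty array) the full result is returned unchanged.
function filterFields(result: AnalysisResult, fields?: AnalysisField[]): AnalysisResult {
  if (!fields || fields.length === 0) return result;
  const out: AnalysisResult = {};
  for (const field of fields) {
    if (field in result) out[field] = result[field];
  }
  return out;
}

const full: AnalysisResult = {
  metadata: { title: "Demo", duration: 175 },
  transcript: [{ time: 0, text: "Hi" }],
  frames: ["scene_001.jpg"],
};
const slim = filterFields(full, ["metadata", "transcript"]);
console.log(Object.keys(slim)); // ["metadata", "transcript"]
```

Dropping unrequested fields before the response is serialized is what keeps a transcript-only call cheap in tokens; fields that were requested but not produced (e.g. comments for a direct URL) are simply absent in this sketch.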
get_transcript — Transcript only
> Get the transcript from this video
Quick transcript extraction. Falls back to Whisper transcription when no native transcript is available.
get_metadata — Metadata only
> What's this video about?
Returns metadata, comments, chapters, and AI summary without downloading the video.
get_frames — Frames only
> Extract frames from this video with dense sampling
Two modes:
- Scene-change detection (default): captures visual transitions
- Dense sampling (dense: true): 1 frame/sec for full coverage
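Both modes map naturally onto ffmpeg video filters: `select='gt(scene,T)'` keeps frames whose scene-change score exceeds the threshold T, and `fps=1` samples one frame per second. The sketch below only builds the argument list. It assumes ffmpeg is on PATH; `buildFrameArgs` and the output naming are hypothetical, and the flags used by src/processors/frame-extractor.ts may differ.

```typescript
type FrameMode = "scene" | "dense";

interface FrameOptions {
  mode: FrameMode;
  threshold?: number; // scene-change sensitivity, 0.0-1.0 (README default: 0.1)
  maxFrames?: number; // stop after this many output frames
}

// Build ffmpeg arguments for execFile/spawn (no shell, so no extra quoting layer).
function buildFrameArgs(input: string, outDir: string, opts: FrameOptions): string[] {
  const filter =
    opts.mode === "scene"
      ? `select='gt(scene,${opts.threshold ?? 0.1})'` // keep frames whose scene score exceeds the threshold
      : "fps=1"; // one frame per second of video
  const args = ["-hide_banner", "-i", input, "-vf", filter];
  // Write only the selected frames instead of padding to a constant frame rate (ffmpeg >= 5.1).
  if (opts.mode === "scene") args.push("-fps_mode", "vfr");
  if (opts.maxFrames !== undefined) args.push("-frames:v", String(opts.maxFrames));
  args.push(`${outDir}/${opts.mode}_%03d.jpg`); // e.g. scene_001.jpg, dense_001.jpg
  return args;
}

const sceneArgs = buildFrameArgs("talk.mp4", "/tmp/frames", { mode: "scene", threshold: 0.1, maxFrames: 20 });
const denseArgs = buildFrameArgs("talk.mp4", "/tmp/frames", { mode: "dense", maxFrames: 60 });
```

The arrays can be passed to `execFile("ffmpeg", args)` from node:child_process. Older ffmpeg builds that lack `-fps_mode` accept `-vsync vfr` instead.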
analyze_moment — Deep-dive on a time range
> Analyze what happens between 1:30 and 2:00 in this video
Combines burst frame extraction + filtered transcript + OCR + annotated timeline for a focused segment. Use when you need to understand exactly what happens at a specific moment.
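One plausible way to build the "filtered transcript" part is to convert the `m:ss` bounds to seconds and keep every entry that overlaps the window. This is a sketch that assumes transcript entries carry start and end times in seconds; `TranscriptEntry`, `parseTimestamp`, and `transcriptInRange` are illustrative names, not the server's API.

```typescript
interface TranscriptEntry {
  start: number; // seconds
  end: number; // seconds
  text: string;
}

// Parse "m:ss" or "h:mm:ss" into seconds, e.g. "1:30" -> 90.
function parseTimestamp(ts: string): number {
  const parts = ts.split(":").map(Number);
  if (parts.length > 3 || parts.some((p) => !Number.isFinite(p) || p < 0)) {
    throw new Error(`Invalid timestamp: ${ts}`);
  }
  return parts.reduce((acc, p) => acc * 60 + p, 0);
}

// Keep entries that overlap [from, to], so a sentence that starts just
// before the window (and is still being spoken inside it) is not lost.
function transcriptInRange(entries: TranscriptEntry[], from: string, to: string): TranscriptEntry[] {
  const start = parseTimestamp(from);
  const end = parseTimestamp(to);
  return entries.filter((e) => e.end >= start && e.start <= end);
}
```

Filtering by overlap rather than by start time alone is a design choice in this sketch; a stricter variant would keep only entries that start inside the window.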
get_frame_at — Single frame at a timestamp
> Show me the frame at 1:23 in this video
The AI reads the transcript, spots a critical moment, and requests the exact frame to see what's on screen.
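Grabbing one frame is a single ffmpeg call: seek to the timestamp and stop after one output frame. A minimal sketch, assuming ffmpeg is available; `singleFrameArgs`, `timeLabel`, and the `frame_1m23s.jpg` naming are hypothetical (the label format only mirrors the `0m30s` style used in examples/).

```typescript
// Format seconds as a filename-friendly label, e.g. 83 -> "1m23s".
function timeLabel(seconds: number): string {
  const s = Math.floor(seconds);
  return `${Math.floor(s / 60)}m${String(s % 60).padStart(2, "0")}s`;
}

// Build ffmpeg args that extract exactly one frame at `seconds`.
// Putting -ss before -i seeks on the input, which is fast; with accurate seeking
// (ffmpeg's default when transcoding) it still decodes up to the exact timestamp.
function singleFrameArgs(input: string, seconds: number, outDir: string): string[] {
  return [
    "-hide_banner",
    "-ss", seconds.toFixed(3),
    "-i", input,
    "-frames:v", "1",
    "-q:v", "2", // JPEG quality scale: 2 (best) to 31 (worst)
    `${outDir}/frame_${timeLabel(seconds)}.jpg`,
  ];
}

const at123 = singleFrameArgs("talk.mp4", 83, "/tmp/frames"); // the frame at 1:23
```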
get_frame_burst — N frames in a time range
> Show me 10 frames between 0:15 and 0:17 of this video
For motion, vibration, animations, or fast scrolling — burst mode captures N frames in a narrow window so the AI can see frame-by-frame changes.
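Choosing burst timestamps is a small interpolation problem: spread N points evenly across [start, end], endpoints included. A hypothetical helper, assuming times in seconds; each resulting timestamp could then be extracted the same way as a single frame.

```typescript
// Evenly spread `count` timestamps across [start, end] seconds, including both endpoints.
function burstTimestamps(start: number, end: number, count: number): number[] {
  if (count < 1 || end < start) throw new RangeError("need count >= 1 and end >= start");
  if (count === 1) return [start];
  // Multiply before dividing so the last point lands exactly on `end`.
  return Array.from({ length: count }, (_, i) => start + ((end - start) * i) / (count - 1));
}

// "10 frames between 0:15 and 0:17" -> roughly 0.22 s apart
const burst = burstTimestamps(15, 17, 10);
```

An alternative is a single ffmpeg pass over the window with an `fps` filter set to count / (end - start); per-timestamp seeks are simpler to reason about for short bursts.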
Detail Levels
| Level | Frames | Transcript | OCR | Timeline | Use case |
|---|---|---|---|---|---|
| brief | None | First 10 entries | No | No | Quick check: what's this video about? |
| standard | Up to 20 (scene-change) | Full | Yes | Yes | Default: full analysis |
| detailed | Up to 60 (1fps dense) | Full | Yes | Yes | Deep analysis: every second captured |
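The table translates directly into a small config object. This sketch copies its values and assumes an explicit maxFrames overrides the level default (clamped to the documented 1-60 range); `DETAIL_LEVELS` and `resolveDetail` are hypothetical, and the real src/config/detail-levels.ts may be structured differently.

```typescript
type DetailLevel = "brief" | "standard" | "detailed";

interface DetailConfig {
  frameMode: "none" | "scene" | "dense";
  maxFrames: number;
  transcriptEntries: number | "all";
  ocr: boolean;
  timeline: boolean;
}

// Values taken from the Detail Levels table.
const DETAIL_LEVELS: Record<DetailLevel, DetailConfig> = {
  brief: { frameMode: "none", maxFrames: 0, transcriptEntries: 10, ocr: false, timeline: false },
  standard: { frameMode: "scene", maxFrames: 20, transcriptEntries: "all", ocr: true, timeline: true },
  detailed: { frameMode: "dense", maxFrames: 60, transcriptEntries: "all", ocr: true, timeline: true },
};

// Resolve effective settings. Assumption: "brief" extracts no frames, so it ignores maxFrames.
function resolveDetail(level: DetailLevel = "standard", maxFrames?: number): DetailConfig {
  const base = DETAIL_LEVELS[level];
  if (maxFrames === undefined || base.frameMode === "none") return base;
  const capped = Math.min(60, Math.max(1, Math.floor(maxFrames)));
  return { ...base, maxFrames: capped };
}
```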
Caching
Results are cached in memory for 10 minutes. Subsequent calls with the same URL and options return instantly. Use forceRefresh: true to bypass the cache.
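A cache like this needs two things: a key derived from the URL plus options, and entries that expire. The sketch below combines a 10-minute TTL with LRU eviction (the combination named for src/utils/cache.ts in the Architecture section). The class name, the 50-entry limit, and `cacheKey` are assumptions for illustration; the injectable clock exists only to make expiry testable.

```typescript
// Entries expire after ttlMs; once maxEntries is exceeded the least recently used one is evicted.
// A Map iterates in insertion order, so delete + re-insert on access marks a key as most recent.
class TtlLruCache<V> {
  private readonly entries = new Map<string, { value: V; expiresAt: number }>();
  private readonly ttlMs: number;
  private readonly maxEntries: number;
  private readonly now: () => number;

  constructor(ttlMs = 10 * 60 * 1000, maxEntries = 50, now: () => number = Date.now) {
    this.ttlMs = ttlMs;
    this.maxEntries = maxEntries;
    this.now = now;
  }

  get(key: string): V | undefined {
    const hit = this.entries.get(key);
    if (!hit) return undefined;
    if (hit.expiresAt <= this.now()) {
      this.entries.delete(key); // expired: drop it and report a miss
      return undefined;
    }
    this.entries.delete(key); // refresh recency without extending the TTL
    this.entries.set(key, hit);
    return hit.value;
  }

  set(key: string, value: V): void {
    this.entries.delete(key);
    this.entries.set(key, { value, expiresAt: this.now() + this.ttlMs });
    while (this.entries.size > this.maxEntries) {
      const oldest = this.entries.keys().next().value as string;
      this.entries.delete(oldest);
    }
  }
}

// Same URL + same options => same key; sorting option names makes their order irrelevant.
function cacheKey(url: string, options: Record<string, unknown>): string {
  const sorted = Object.keys(options).sort().map((k) => [k, options[k]]);
  return JSON.stringify([url, sorted]);
}
```

In this sketch, `forceRefresh: true` would simply skip `get` and overwrite the entry with `set` after re-analysis.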
Supported Platforms
| Platform | Transcript | Metadata | Comments | Frames | Auth |
|---|---|---|---|---|---|
| Loom | Yes | Yes | Yes | Yes | None |
| Direct URL (.mp4, .webm) | No | Duration only | No | Yes | None |
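Routing a URL to the right row of this table can be done from the URL alone. A minimal sketch, assuming Loom links use the `/share/` or `/embed/` path and direct files are recognized by their extension; `detectPlatform` is a hypothetical stand-in for src/utils/url-detector.ts.

```typescript
type Platform = "loom" | "direct";

// Map a URL to a supported platform, or null if unsupported.
function detectPlatform(rawUrl: string): Platform | null {
  let url: URL;
  try {
    url = new URL(rawUrl);
  } catch {
    return null; // not a valid absolute URL
  }
  const host = url.hostname.toLowerCase();
  // Exact host or subdomain match, so look-alike hosts such as "evilloom.com" are rejected.
  const isLoomHost = host === "loom.com" || host.endsWith(".loom.com");
  if (isLoomHost && /^\/(share|embed)\//.test(url.pathname)) return "loom";
  // Check the path only, so query strings like "?token=..." don't hide the extension.
  if (/\.(mp4|webm)$/i.test(url.pathname)) return "direct";
  return null;
}
```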
Frame Extraction Strategies
Frame extraction uses a two-strategy fallback chain — no single dependency is required:
| Strategy | How it works | Speed | Requirements |
|---|---|---|---|
| yt-dlp + ffmpeg (primary) | Downloads video, extracts frames via scene detection | Fast, precise | yt-dlp (pip install yt-dlp) |
| Browser (fallback) | Opens video in headless Chrome, seeks to timestamps, takes screenshots | Slower, no download needed | Chrome or Chromium installed |
The fallback is automatic — if yt-dlp is not available, the server tries browser-based extraction via puppeteer-core. If neither is available, analysis still returns transcript + metadata + comments, just no frames.
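The fallback chain is an ordered list of strategies where the first one that is available and succeeds wins, and total failure degrades to "no frames" rather than an error. A sketch of that control flow only; `FrameStrategy`, `extractWithFallback`, and the strategy names are hypothetical, and the real availability checks (yt-dlp on PATH, a Chrome install) are left to each strategy.

```typescript
interface Frame {
  time: number; // seconds
  path: string;
}

interface FrameStrategy {
  name: string;
  isAvailable(): Promise<boolean>;
  extract(url: string): Promise<Frame[]>;
}

// Try strategies in order. If all are missing or fail, return no frames instead of
// throwing, so transcript, metadata, and comments can still be returned.
async function extractWithFallback(
  url: string,
  strategies: FrameStrategy[],
): Promise<{ frames: Frame[]; strategy: string | null; errors: string[] }> {
  const errors: string[] = [];
  for (const s of strategies) {
    try {
      if (!(await s.isAvailable())) {
        errors.push(`${s.name}: not available`);
        continue;
      }
      return { frames: await s.extract(url), strategy: s.name, errors };
    } catch (err) {
      errors.push(`${s.name}: ${err instanceof Error ? err.message : String(err)}`);
    }
  }
  return { frames: [], strategy: null, errors };
}
```

Collecting the per-strategy errors instead of discarding them lets the response explain why frames are missing.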
Post-Processing Pipeline
After frame extraction, the pipeline automatically applies:
| Step | What it does | Why |
|---|---|---|
| Frame deduplication | Removes near-identical consecutive frames using perceptual hashing (dHash + Hamming distance) | Screencasts often have long static moments — dedup removes redundant frames, saving tokens |
| OCR | Extracts text visible on screen from each frame (via tesseract.js) | Captures code, error messages, terminal output, UI text that the transcript doesn't cover |
| Annotated timeline | Merges transcript timestamps + frame timestamps + OCR text into a single chronological view | Gives the AI a unified "what was said, what changed visually, and what text appeared" at each moment |
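For the deduplication step, dHash reduces each frame to 64 bits by comparing neighbouring pixels of a tiny grayscale thumbnail, and the Hamming distance counts differing bits. The sketch assumes each frame has already been reduced to 9×8 grayscale bytes (with sharp, something like `resize(9, 8, { fit: "fill" }).grayscale().raw()`); the function names and the 5-bit threshold are illustrative assumptions, not the values used in src/processors/frame-dedup.ts.

```typescript
// dHash over a 9x8 grayscale thumbnail (72 bytes, row-major): each row yields 8 bits,
// one per comparison of a pixel with its right-hand neighbour, for 64 bits total.
function dHash(gray9x8: Uint8Array): bigint {
  if (gray9x8.length !== 72) throw new RangeError("expected 9x8 = 72 grayscale pixels");
  let hash = 0n;
  for (let row = 0; row < 8; row++) {
    for (let col = 0; col < 8; col++) {
      const left = gray9x8[row * 9 + col];
      const right = gray9x8[row * 9 + col + 1];
      hash = (hash << 1n) | (left > right ? 1n : 0n);
    }
  }
  return hash;
}

// Number of bits that differ between two hashes.
function hamming(a: bigint, b: bigint): number {
  let x = a ^ b;
  let count = 0;
  while (x > 0n) {
    count += Number(x & 1n);
    x >>= 1n;
  }
  return count;
}

// Keep a frame only if it differs from the last *kept* frame by more than maxDistance bits,
// so a long static stretch collapses to its first frame.
function dedupFrames<T extends { hash: bigint }>(frames: T[], maxDistance = 5): T[] {
  const kept: T[] = [];
  for (const f of frames) {
    const last = kept[kept.length - 1];
    if (!last || hamming(last.hash, f.hash) > maxDistance) kept.push(f);
  }
  return kept;
}
```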
The OCR step requires tesseract.js (included as a dependency). If it fails to load, analysis continues without OCR — no frames or transcript are lost.
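The annotated timeline is essentially a time-ordered merge of two event streams, with OCR text attached to frames when it exists. This sketch keeps `ocrText` optional so the merge still works when OCR is unavailable; the event shapes, the speech-before-frame tie-break, and the text rendering are assumptions for illustration, not the output format of src/processors/annotated-timeline.ts.

```typescript
interface SpokenLine { time: number; speaker?: string; text: string }
interface FrameInfo { time: number; path: string; ocrText?: string } // ocrText absent if OCR failed

type TimelineEvent =
  | { time: number; kind: "speech"; text: string; speaker?: string }
  | { time: number; kind: "frame"; path: string; ocrText?: string };

// Merge speech and frames into one chronological list. On equal timestamps speech sorts first,
// so a frame reads as "what was on screen while this was said". Array#sort is stable.
function buildTimeline(transcript: SpokenLine[], frames: FrameInfo[]): TimelineEvent[] {
  const events: TimelineEvent[] = [
    ...transcript.map((t) => ({ time: t.time, kind: "speech" as const, text: t.text, speaker: t.speaker })),
    ...frames.map((f) => ({ time: f.time, kind: "frame" as const, path: f.path, ocrText: f.ocrText })),
  ];
  return events.sort((a, b) => a.time - b.time || (a.kind === b.kind ? 0 : a.kind === "speech" ? -1 : 1));
}

// Seconds -> "m:ss"
function fmt(seconds: number): string {
  const s = Math.floor(seconds);
  return `${Math.floor(s / 60)}:${String(s % 60).padStart(2, "0")}`;
}

// One line per event, e.g. "[0:30] frame scene_002.jpg (OCR: TypeError)".
function renderTimeline(events: TimelineEvent[]): string[] {
  return events.map((e) =>
    e.kind === "speech"
      ? `[${fmt(e.time)}] ${e.speaker ? e.speaker + ": " : ""}${e.text}`
      : `[${fmt(e.time)}] frame ${e.path}${e.ocrText ? ` (OCR: ${e.ocrText})` : ""}`,
  );
}
```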
Complementary Tools
Chrome DevTools MCP
For live web debugging alongside video analysis, pair this server with the Chrome DevTools MCP:
claude mcp add chrome-devtools npx chrome-devtools-mcp@latest
When to use each:
| Scenario | Tool |
|---|---|
| Bug report recorded as a Loom video | mcp-video-analyzer — extract transcript, frames, and error text from the recording |
| Live debugging a web page | Chrome DevTools MCP — inspect DOM, console, network, take screenshots |
| Video shows UI issue, need to reproduce it | Use both: analyze the video first, then open the page in Chrome DevTools to reproduce |
The two MCPs complement each other: video analyzer understands recorded content, DevTools interacts with live pages.
Example Output
The examples/loom-demo/ folder contains real outputs from analyzing a public Loom video (Boost In-App Demo Video, 2:55).
| File | What it shows |
|---|---|
| metadata.json | Title, duration, platform |
| transcript.json | 42 timestamped entries with speaker IDs |
| timeline.json | Unified chronological view (transcript + frames merged) |
| moment-transcript-0m30s-0m45s.json | Filtered transcript for analyze_moment (0:30–0:45) |
| full-analysis.json | Complete analyze_video output |
Frame images (19 total in examples/loom-demo/frames/):
- scene_*.jpg: scene-change detection (key visual transitions)
- dense_*.jpg: 1fps dense sampling (every 10th frame saved as sample)
- burst_*.jpg: burst extraction for moment analysis (0:30–0:45)
Regenerate after changes (requires yt-dlp and network access):
npx tsx examples/generate.ts
Development
# Install dependencies
npm install
# Run all checks (format, lint, typecheck, knip, tests)
npm run check
# Build
npm run build
# Run E2E tests (requires network)
npm run test:e2e
# Open MCP Inspector for manual testing
npm run inspect
Architecture
src/
├── index.ts # Entry point (shebang + stdio)
├── server.ts # FastMCP server + tool registration
├── tools/ # MCP tool definitions (7 tools)
│ ├── analyze-video.ts # Full analysis with detail levels + caching
│ ├── analyze-moment.ts # Deep-dive on a time range
│ ├── get-transcript.ts # Transcript-only with Whisper fallback
│ ├── get-metadata.ts # Metadata + comments + chapters
│ ├── get-frames.ts # Frames-only (scene-change or dense)
│ ├── get-frame-at.ts # Single frame at timestamp
│ └── get-frame-burst.ts # N frames in a time range
├── adapters/ # Platform-specific logic
│ ├── adapter.interface.ts # IVideoAdapter interface + registry
│ ├── loom.adapter.ts # Loom: authless GraphQL
│ └── direct.adapter.ts # Direct URL: any mp4/webm link
├── processors/ # Shared processing
│ ├── frame-extractor.ts # ffmpeg scene detection + dense + burst extraction
│ ├── browser-frame-extractor.ts # Headless Chrome fallback for frames
│ ├── audio-transcriber.ts # Whisper fallback (HF transformers → CLI → OpenAI)
│ ├── image-optimizer.ts # sharp resize/compress
│ ├── frame-dedup.ts # Perceptual dedup (dHash + Hamming distance)
│ ├── frame-ocr.ts # OCR text extraction (tesseract.js)
│ └── annotated-timeline.ts # Unified timeline (transcript + frames + OCR)
├── config/
│ └── detail-levels.ts # brief / standard / detailed config
├── utils/
│ ├── cache.ts # In-memory TTL cache with LRU eviction
│ ├── field-filter.ts # Selective field filtering for responses
│ ├── url-detector.ts # Platform detection from URL
│ ├── vtt-parser.ts # WebVTT → transcript entries
│ └── temp-files.ts # Temp directory management
└── types.ts # Shared TypeScript interfaces
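As an example of one of the utilities listed above, converting WebVTT captions into transcript entries mostly means finding each cue's `start --> end` line and collecting the text after it. This is a simplified sketch, not the contents of vtt-parser.ts: `parseVtt` and `Cue` are hypothetical, it reads the speaker from a leading `<v Name>` voice tag only, and it ignores cue settings and styling.

```typescript
interface Cue { start: number; end: number; speaker?: string; text: string }

// "01:02.500" or "00:01:02.500" -> 62.5 seconds
function vttTime(ts: string): number {
  return ts.trim().split(":").map(Number).reduce((acc, p) => acc * 60 + p, 0);
}

function parseVtt(input: string): Cue[] {
  const cues: Cue[] = [];
  // Normalise line endings, then split into blocks separated by blank lines.
  const blocks = input.replace(/\r\n?/g, "\n").split(/\n\s*\n/);
  for (const block of blocks) {
    const lines = block.split("\n");
    const timingIdx = lines.findIndex((l) => l.includes("-->"));
    if (timingIdx === -1) continue; // WEBVTT header, NOTE, STYLE, or REGION block
    const [startRaw, rest] = lines[timingIdx].split("-->");
    const endRaw = rest.trim().split(/\s+/)[0]; // drop cue settings such as "align:start"
    let text = lines.slice(timingIdx + 1).join(" ").trim();
    const voice = /^<v(?:\.[^\s>]+)?\s+([^>]+)>/.exec(text); // leading <v Speaker> tag, if any
    const speaker = voice?.[1].trim();
    text = text.replace(/<[^>]+>/g, "").trim(); // strip <v>, <c>, <b>, and inline timestamp tags
    if (text) cues.push({ start: vttTime(startRaw), end: vttTime(endRaw), speaker, text });
  }
  return cues;
}
```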
License
MIT