videoscan-mcp

VideoScan MCP

An MCP (Model Context Protocol) server for comprehensive video analysis — AI-powered transcription, visual frame analysis, and metadata extraction from 1000+ platforms.

Features

  • Full video analysis — combines transcription, frame extraction, and metadata in a single call
  • AI vision analysis — describes frames and extracts on-screen text (OCR) using GPT-4o, Claude, or Gemini
  • Audio transcription — Whisper-based transcription with timestamps and language detection
  • Auto-tuning — automatically adjusts frame extraction density, interval, and detail level based on video duration
  • Smart frame extraction — scene-change detection, interval sampling, or combined strategy
  • Deduplication — perceptual hashing removes near-duplicate frames before analysis
  • Metadata extraction — title, duration, chapters, tags, view count, and more without full download
  • Multi-provider — OpenAI, Anthropic, and Google vision providers with per-request override
  • Caching — persistent cache for downloads, frames, and results to minimize repeat costs
  • 1000+ platforms — powered by yt-dlp (YouTube, Vimeo, Twitter/X, TikTok, and more)

Installation

pip install videoscan-mcp

System dependencies

VideoScan requires ffmpeg for video processing and yt-dlp for downloading from URLs.

# macOS
brew install ffmpeg yt-dlp

# Ubuntu/Debian
sudo apt install ffmpeg
pip install yt-dlp

# Windows — install ffmpeg from https://ffmpeg.org/download.html, then:
pip install yt-dlp

Configuration

Copy .env.example to .env and fill in at least one API key:

# Vision provider (frame analysis)
VISION_PROVIDER=openai          # openai | anthropic | google
VISION_MODEL=                   # optional — defaults: gpt-4o / claude-sonnet-4-20250514 / gemini-2.0-flash

# Transcription provider
TRANSCRIPTION_PROVIDER=openai   # openai only for now
TRANSCRIPTION_MODEL=whisper-1

# Concurrency
VISION_CONCURRENCY=5            # max parallel vision API calls

# API keys — only need the key for your chosen provider
OPENAI_API_KEY=sk-...
ANTHROPIC_API_KEY=sk-ant-...
GOOGLE_API_KEY=AIza...

# Cache
CACHE_ENABLED=true
CACHE_DIR=~/.videoscan/cache
CACHE_MAX_SIZE_GB=5
CACHE_DOWNLOAD_TTL=3600         # 1 hour
CACHE_FRAMES_TTL=86400          # 24 hours
CACHE_RESULTS_TTL=604800        # 7 days

# Safety limits (set to 0 for unlimited)
MAX_VIDEO_DURATION=3600         # 60 minutes in seconds
MAX_DOWNLOAD_SIZE=2147483648    # 2 GB in bytes
MAX_ANALYZED_FRAMES=100
DOWNLOAD_TIMEOUT=300
FRAME_ANALYSIS_TIMEOUT=30
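
The "set to 0 for unlimited" convention for the safety limits can be sketched as follows. This is a minimal illustration, not VideoScan's actual configuration loader: the helper names `read_limit` and `exceeds` are hypothetical, and the defaults are simply copied from the sample values above.

```python
import os

# Defaults copied from the sample .env above (assumed, not read from VideoScan).
DEFAULT_LIMITS = {
    "MAX_VIDEO_DURATION": 3600,       # seconds
    "MAX_DOWNLOAD_SIZE": 2147483648,  # bytes
    "MAX_ANALYZED_FRAMES": 100,       # frames
}


def read_limit(name, env=None):
    """Return the limit as an int, or None when it is 0 (unlimited)."""
    env = os.environ if env is None else env
    value = int(env.get(name, DEFAULT_LIMITS[name]))
    return None if value == 0 else value


def exceeds(value, limit):
    """True only when a limit is set and the value is above it."""
    return limit is not None and value > limit
```

Mapping 0 to `None` keeps every later check to a single comparison and avoids treating 0 as a real limit that nothing could satisfy.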

Quick Start — Claude Code

Add VideoScan to your Claude Code settings.json (usually at ~/.claude/settings.json):

{
  "mcpServers": {
    "videoscan": {
      "command": "videoscan",
      "env": {
        "OPENAI_API_KEY": "sk-..."
      }
    }
  }
}

Or using uvx without a global install:

{
  "mcpServers": {
    "videoscan": {
      "command": "uvx",
      "args": ["videoscan-mcp"],
      "env": {
        "OPENAI_API_KEY": "sk-..."
      }
    }
  }
}

Once connected, you can ask Claude things like:

  • "Analyze this YouTube video: https://youtube.com/watch?v=..."
  • "Transcribe the audio from this video file"
  • "What's on screen at the 2:30 mark of this video?"
  • "Extract frames from this video and describe what you see"

Auto-Tuning

When max_frames and interval are not explicitly set, VideoScan automatically adjusts frame extraction parameters based on video duration to optimize cost and coverage:

| Duration | Frames | Interval | Strategy | Detail |
| --- | --- | --- | --- | --- |
| < 2 min | ~1/sec (dense) | 1s | combined | detailed |
| 2–10 min | ~40 | 3s | combined | standard |
| 10–30 min | ~30 | 10s | combined | standard |
| 30–60 min | ~30 | 20s | combined | brief |
| > 60 min | ~20 | 30s | scene only | brief |

Short videos get dense frame extraction for maximum detail, while longer videos use lighter sampling to keep costs down. You can always override by setting max_frames or interval explicitly.
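
The duration brackets can be read as a simple lookup. The sketch below is an illustration of the table, not VideoScan's implementation: the function name `auto_tune` is hypothetical, the handling of exact boundary values (2, 10, and 30 minutes) is an assumption because the table does not specify it, and "~1/sec" is approximated as one frame per whole second.

```python
def auto_tune(duration_seconds):
    """Map a video duration to extraction parameters, following the table."""
    minutes = duration_seconds / 60
    if minutes < 2:
        # Dense sampling: roughly one frame per second of video.
        frames = max(1, int(duration_seconds))
        return {"max_frames": frames, "interval": 1,
                "strategy": "combined", "detail": "detailed"}
    if minutes < 10:
        return {"max_frames": 40, "interval": 3,
                "strategy": "combined", "detail": "standard"}
    if minutes < 30:
        return {"max_frames": 30, "interval": 10,
                "strategy": "combined", "detail": "standard"}
    if minutes <= 60:
        return {"max_frames": 30, "interval": 20,
                "strategy": "combined", "detail": "brief"}
    # Longer than 60 minutes: scene detection only, lightest sampling.
    return {"max_frames": 20, "interval": 30,
            "strategy": "scene", "detail": "brief"}
```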

Tool Reference

analyze_video

Full pipeline — transcription + AI frame analysis + metadata in one call. Uses auto-tuning by default.

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| source | string | required | URL or local file path |
| detail | string | "standard" | Vision level: "brief", "standard", "detailed" |
| max_frames | int | -1 (auto) | Maximum frames to analyze; -1 auto-tunes based on duration |
| threshold | float | 0.3 | Scene change sensitivity (0.0–1.0) |
| strategy | string | "combined" | Frame extraction: "scene", "interval", "combined" |
| interval | int | -1 (auto) | Seconds between frames; -1 auto-tunes based on duration |
| skip_frames | bool | false | Skip visual analysis (transcription only) |
| skip_audio | bool | false | Skip transcription (frames only) |
| language | string | "auto" | Transcription language or "auto" |
| provider | string | null | Override vision provider |
| force_refresh | bool | false | Bypass cache |
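
An MCP client normally builds the request for you, but it can help to see what a call to analyze_video looks like on the wire. The sketch below constructs a JSON-RPC `tools/call` message as defined by the Model Context Protocol; the helper name `build_tool_call` and the example URL are placeholders, not part of VideoScan.

```python
import json


def build_tool_call(request_id, tool, arguments):
    """Build an MCP tools/call request as a JSON-RPC 2.0 message."""
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": tool, "arguments": arguments},
    }


# Frames-only analysis at the cheapest detail level (placeholder URL).
request = build_tool_call(
    1,
    "analyze_video",
    {
        "source": "https://example.com/video.mp4",
        "detail": "brief",
        "skip_audio": True,
    },
)
print(json.dumps(request, indent=2))
```

Parameters left out of `arguments` fall back to the defaults in the table above.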

transcribe

Transcribe video or audio to text with timestamps.

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| source | string | required | URL or local file path |
| language | string | "auto" | Preferred language or "auto" for detection |

extract_frames

Extract and AI-analyze frames from a video.

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| source | string | required | URL or local file path |
| max_frames | int | 30 | Maximum frames to extract (1–100) |
| threshold | float | 0.3 | Scene change sensitivity (0.0–1.0) |
| strategy | string | "combined" | "scene", "interval", or "combined" |
| interval | int | 5 | Seconds between frames in interval mode |
| detail | string | "standard" | Vision analysis level |
| deduplicate | bool | true | Remove near-duplicate frames via dHash |
| provider | string | null | Override vision provider |
| force_refresh | bool | false | Bypass cache |
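
For readers unfamiliar with dHash (difference hashing), the idea behind the deduplicate option can be sketched in plain Python. This is a generic illustration of the technique, not VideoScan's code: it assumes each frame has already been shrunk to a small grayscale grid (rows one pixel wider than the hash width), and the `max_distance` threshold of 4 bits is an arbitrary assumption.

```python
def dhash(gray_rows):
    """Hash a small grayscale grid: one bit per adjacent-pixel comparison."""
    bits = 0
    for row in gray_rows:
        for left, right in zip(row, row[1:]):
            # Bit is 1 when brightness drops from left to right.
            bits = (bits << 1) | (1 if left > right else 0)
    return bits


def hamming(a, b):
    """Number of bit positions in which two hashes differ."""
    return bin(a ^ b).count("1")


def deduplicate(hashes, max_distance=4):
    """Return indices of frames that are not near-duplicates of a kept frame."""
    kept = []
    for index, frame_hash in enumerate(hashes):
        if all(hamming(frame_hash, hashes[k]) > max_distance for k in kept):
            kept.append(index)
    return kept
```

Because the hash records only whether brightness rises or falls between neighbouring pixels, two frames that differ by small changes such as compression noise tend to produce hashes a few bits apart, which is why a small Hamming distance is used as the duplicate test.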

analyze_moment

Deep-dive analysis on a specific time range.

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| source | string | required | URL or local file path |
| start | float | required | Start time in seconds |
| end | float | required | End time in seconds |
| dense | bool | true | Extract 1 frame per second in the range |
| detail | string | "detailed" | Vision analysis level |
| provider | string | null | Override vision provider |
| force_refresh | bool | false | Bypass cache |
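
With dense enabled, the number of frames grows with the length of the range, which is worth checking before calling the tool on a long span. The sketch below lists the timestamps a one-frame-per-second sampling would produce; the function name `dense_timestamps` is hypothetical, and treating the end time as inclusive is an assumption.

```python
import math


def dense_timestamps(start, end):
    """List one timestamp per second from start up to end (inclusive)."""
    if end <= start:
        raise ValueError("end must be greater than start")
    count = math.floor(end - start) + 1
    return [start + offset for offset in range(count)]
```

For example, a range from 150 to 153 seconds yields four timestamps, so a 60-second range would mean roughly 60 vision API calls.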

get_frame_at

Get a single frame at a specific timestamp, optionally analyzed by AI.

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| source | string | required | URL or local file path |
| timestamp | float | required | Time in seconds |
| analyze | bool | true | Run AI vision analysis |
| provider | string | null | Override vision provider |
| force_refresh | bool | false | Bypass cache |
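
Because timestamp is given in seconds, a mark such as "2:30" has to be converted before the call. A small helper for that conversion might look like this; `to_seconds` is a hypothetical name and is not part of VideoScan.

```python
def to_seconds(mark):
    """Convert "SS", "MM:SS", or "HH:MM:SS" into seconds as a float."""
    parts = [float(part) for part in mark.split(":")]
    if not 1 <= len(parts) <= 3:
        raise ValueError(f"unsupported timestamp: {mark!r}")
    seconds = 0.0
    for part in parts:
        # Each step shifts the running total up one unit (hours -> minutes -> seconds).
        seconds = seconds * 60 + part
    return seconds
```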

get_metadata

Fetch video metadata without downloading the full video.

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| source | string | required | URL or local file path |
| include | list | null | Specific fields to return: "title", "duration", "channel", "description", "thumbnail", "chapters", "tags", "view_count". Returns all if omitted. |
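
The behaviour of include can be pictured as a filter over the full metadata record. The sketch below only illustrates that semantics; `select_fields` and the sample record are hypothetical, and how VideoScan treats fields that are requested but unavailable is not documented here, so the sketch simply skips them.

```python
def select_fields(metadata, include=None):
    """Return all metadata when include is None, else only the named fields."""
    if include is None:
        return dict(metadata)
    return {field: metadata[field] for field in include if field in metadata}


# Made-up record for illustration only.
sample = {"title": "Demo", "duration": 120, "tags": ["a", "b"]}
subset = select_fields(sample, ["title", "duration", "chapters"])
```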

Supported Platforms

VideoScan uses yt-dlp under the hood, which supports 1000+ video platforms including:

  • YouTube, YouTube Shorts, YouTube Live
  • Vimeo, Dailymotion, Twitch
  • Twitter/X, Instagram, TikTok, Facebook
  • Reddit, LinkedIn, Pinterest
  • BBC iPlayer, CNN, NBC, CBS
  • SoundCloud, Bandcamp (audio)
  • And hundreds more — see the yt-dlp supported sites list

Local files in any format supported by ffmpeg (mp4, mov, avi, mkv, webm, mp3, wav, etc.) are also supported.
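
Since every tool accepts either a URL or a local path in source, a client-side check can tell the two apart before calling. The following is one plausible way to do it using the standard library; `is_url` is a hypothetical helper, and VideoScan's own detection logic may differ.

```python
from urllib.parse import urlparse


def is_url(source):
    """True for http(s) URLs; everything else is treated as a local path."""
    parsed = urlparse(source)
    return parsed.scheme in ("http", "https") and bool(parsed.netloc)
```

Restricting the check to http and https keeps Windows paths such as `C:\videos\clip.mp4`, whose drive letter parses as a scheme, from being mistaken for URLs.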

Cost Estimates

Costs depend on your chosen provider and usage:

| Operation | Provider | Approx. Cost |
| --- | --- | --- |
| Vision analysis | OpenAI GPT-4o | ~$0.015 per frame |
| Vision analysis | Anthropic Claude | ~$0.024 per frame |
| Vision analysis | Google Gemini | ~$0.002 per frame |
| Transcription | OpenAI Whisper | ~$0.006 per minute |

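A typical 10-minute video analyzed with analyze_video (30 frames + transcription) costs approximately $0.51 with OpenAI: about $0.45 for vision analysis (30 frames × $0.015) plus about $0.06 for transcription (10 minutes × $0.006).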
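
The arithmetic behind such estimates is simple enough to script. The sketch below uses the approximate per-unit prices from the table above; the function name `estimate_cost` and the short provider keys are assumptions for illustration, and real pricing varies with model, image size, and provider changes.

```python
# Approximate prices in USD, copied from the table above.
VISION_COST_PER_FRAME = {"openai": 0.015, "anthropic": 0.024, "google": 0.002}
TRANSCRIPTION_COST_PER_MINUTE = 0.006  # OpenAI Whisper


def estimate_cost(frames, minutes, provider="openai"):
    """Estimate the cost of analyzing `frames` frames plus `minutes` of audio."""
    vision = frames * VISION_COST_PER_FRAME[provider]
    transcription = minutes * TRANSCRIPTION_COST_PER_MINUTE
    return round(vision + transcription, 4)
```

Calling `estimate_cost(30, 10)` covers the 10-minute example, and passing `provider="google"` shows how much of the total comes from vision analysis rather than transcription.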

Development

git clone https://github.com/guguborbh/videoscan-mcp
cd videoscan-mcp
pip install -e ".[dev]"
pytest

License

MIT License — see LICENSE for details.
