MCP Gemini Video Understanding


An MCP (Model Context Protocol) server that uses Google's Gemini API to analyze videos and convert them to text descriptions that Claude Code can understand and act upon.

What is this?

This MCP server acts as a bridge between video content and Claude Code. When you have a video (screen recording, Loom video, YouTube tutorial, etc.), this server uses Gemini's powerful video understanding capabilities to extract meaningful text descriptions that Claude Code can then use to write code, fix bugs, or implement features.

Use Cases

  1. Bug Reproduction Videos: Record a video showing a bug → Get detailed steps to reproduce and debugging insights
  2. Design Mockups: Show a design in a video → Get implementation guidance with UI component breakdowns
  3. YouTube Tutorials: Share a tutorial URL → Extract key learnings and implementation steps
  4. Responsive Issues: Record layout problems → Get specific CSS fixes and responsive solutions

Installation

npm install -g @ugarchance/mcp-gemini-video-understanding

Or use directly with npx:

npx @ugarchance/mcp-gemini-video-understanding

Setup

1. Get a Gemini API Key

  1. Go to Google AI Studio
  2. Click "Get API Key"
  3. Create or select a project
  4. Copy your API key

2. Set Environment Variable

export GEMINI_API_KEY="your-api-key-here"

3. Configure Claude Code

Add to your claude_desktop_config.json:

macOS: ~/Library/Application Support/Claude/claude_desktop_config.json
Windows: %APPDATA%\Claude\claude_desktop_config.json

{
  "mcpServers": {
    "gemini-video": {
      "command": "npx",
      "args": [
        "-y",
        "@ugarchance/mcp-gemini-video-understanding"
      ],
      "env": {
        "GEMINI_API_KEY": "your-api-key-here"
      }
    }
  }
}

Or if installed globally:

{
  "mcpServers": {
    "gemini-video": {
      "command": "mcp-gemini-video",
      "env": {
        "GEMINI_API_KEY": "your-api-key-here"
      }
    }
  }
}

Usage

All tools support these common parameters:

  • model (string, optional): Gemini model to use. Options:
    • gemini-2.5-pro - Most capable, best for complex analysis
    • gemini-2.5-flash - Default, balanced speed and quality
    • gemini-2.5-flash-lite - Fastest, lighter analysis
    • gemini-2.0-flash - Previous generation fast model
    • gemini-2.0-flash-exp - Experimental features
  • output_file (string, optional): Path where the analysis is saved. If the file already exists, its contents are returned as a cached result and the video is not re-analyzed.

Tool 1: analyze_bug_video

Analyze a video showing a bug or error.

Parameters:

  • video_path (string): Path to video file or YouTube URL
  • is_youtube (boolean, optional): Set to true if video_path is a YouTube URL
  • additional_context (string, optional): Extra context about the bug
  • model (string, optional): Gemini model to use
  • output_file (string, optional): Path to save analysis
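In MCP, a client such as Claude Code invokes a tool by sending a JSON-RPC `tools/call` request. The `params` of that request carry the tool name and an `arguments` object. The sketch below shows roughly what such a request looks like for analyze_bug_video, using the parameters listed above. The helper name `buildToolCall` is hypothetical. In normal use, Claude Code builds this message for you from your prompt.

```typescript
// Shape of an MCP JSON-RPC "tools/call" request.
interface ToolCallRequest {
  jsonrpc: "2.0";
  id: number;
  method: "tools/call";
  params: { name: string; arguments: Record<string, unknown> };
}

// Hypothetical helper: wraps a tool name and its arguments in a tools/call envelope.
function buildToolCall(
  id: number,
  name: string,
  args: Record<string, unknown>,
): ToolCallRequest {
  return { jsonrpc: "2.0", id, method: "tools/call", params: { name, arguments: args } };
}

// Arguments mirror the analyze_bug_video parameter list; optional ones are simply omitted.
const request = buildToolCall(1, "analyze_bug_video", {
  video_path: "/Users/me/Desktop/bug-demo.mp4",
  model: "gemini-2.5-pro",
  output_file: "bug-analysis.md",
});

console.log(JSON.stringify(request, null, 2));
```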

Example with Claude Code:

I have a bug video at /Users/me/Desktop/bug-demo.mp4
Save the analysis to bug-analysis.md and fix the issue.

With model selection:

Analyze /Users/me/Desktop/complex-bug.mp4 using gemini-2.5-pro
Save to analysis.txt and help me fix it.

Tool 2: analyze_design_video

Analyze a video showing a design mockup or feature demonstration.

Parameters:

  • video_path (string): Path to video file or YouTube URL
  • is_youtube (boolean, optional): Set to true if video_path is a YouTube URL
  • tech_stack (string, optional): Technologies to use (e.g., "React with Tailwind")
  • model (string, optional): Gemini model to use
  • output_file (string, optional): Path to save analysis

Example with Claude Code:

I recorded a design mockup at /Users/me/Desktop/new-feature.mp4
Save analysis to design-spec.md then implement using React and Tailwind CSS.
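Three of the tools accept either a local path or a YouTube URL in video_path and rely on is_youtube to tell them apart. If you build arguments programmatically, you can derive that flag from the string. The sketch below does this. The set of hostnames and the helper names are assumptions for illustration; they are not taken from the server's code.

```typescript
// Hostnames treated as YouTube. This list is an assumption for illustration,
// not taken from the server's source code.
const YOUTUBE_HOSTS = new Set(["youtube.com", "www.youtube.com", "m.youtube.com", "youtu.be"]);

// Returns true when the input parses as an http(s) URL on a YouTube host.
function isYouTubeUrl(input: string): boolean {
  try {
    const url = new URL(input);
    return (url.protocol === "https:" || url.protocol === "http:") && YOUTUBE_HOSTS.has(url.hostname);
  } catch {
    // new URL() throws on strings that are not absolute URLs, such as local file paths.
    return false;
  }
}

// Hypothetical helper: builds analyze_design_video arguments with is_youtube inferred.
function designVideoArgs(videoPath: string, techStack?: string): Record<string, unknown> {
  const args: Record<string, unknown> = { video_path: videoPath, is_youtube: isYouTubeUrl(videoPath) };
  if (techStack !== undefined) args.tech_stack = techStack;
  return args;
}

console.log(designVideoArgs("/Users/me/Desktop/new-feature.mp4", "React with Tailwind"));
console.log(designVideoArgs("https://www.youtube.com/watch?v=xxxxx"));
```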

Tool 3: analyze_tutorial_video

Analyze a YouTube tutorial to extract key learnings.

Parameters:

  • video_url (string): YouTube URL
  • focus_area (string, optional): Specific topic to focus on
  • model (string, optional): Gemini model to use
  • output_file (string, optional): Path to save analysis

Example with Claude Code:

Watch this tutorial: https://www.youtube.com/watch?v=xxxxx
Save the learnings to tutorial-notes.md then implement the auth system.

Using faster model for quick summaries:

Analyze https://www.youtube.com/watch?v=xxxxx with gemini-2.5-flash-lite
Just give me the key points.

Tool 4: analyze_responsive_issues

Analyze a video showing responsive design problems.

Parameters:

  • video_path (string): Path to video file or YouTube URL
  • is_youtube (boolean, optional): Set to true if video_path is a YouTube URL
  • target_devices (string, optional): Target devices (e.g., "mobile, tablet")
  • model (string, optional): Gemini model to use
  • output_file (string, optional): Path to save analysis

Example with Claude Code:

I recorded responsive issues at /Users/me/Desktop/mobile-issues.mp4
Save analysis to responsive-fixes.md and fix the layout for mobile.
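The four tools share model and output_file but differ in their other parameters. In particular, analyze_tutorial_video takes video_url where the other three take video_path. The sketch below encodes the parameter lists above as data and checks an arguments object against them on the client side. The function name is hypothetical. Whether the server itself rejects unknown keys is not documented here.

```typescript
type ToolName =
  | "analyze_bug_video"
  | "analyze_design_video"
  | "analyze_tutorial_video"
  | "analyze_responsive_issues";

// Required source parameter per tool, taken from the parameter lists above.
const SOURCE_PARAM: Record<ToolName, "video_path" | "video_url"> = {
  analyze_bug_video: "video_path",
  analyze_design_video: "video_path",
  analyze_tutorial_video: "video_url",
  analyze_responsive_issues: "video_path",
};

// Optional tool-specific parameters, beyond the common model/output_file pair.
const EXTRA_PARAMS: Record<ToolName, string[]> = {
  analyze_bug_video: ["is_youtube", "additional_context"],
  analyze_design_video: ["is_youtube", "tech_stack"],
  analyze_tutorial_video: ["focus_area"],
  analyze_responsive_issues: ["is_youtube", "target_devices"],
};

const COMMON_PARAMS = ["model", "output_file"];

// Hypothetical client-side check: reports a missing source parameter and unexpected keys.
function checkToolArgs(tool: ToolName, args: Record<string, unknown>): string[] {
  const source = SOURCE_PARAM[tool];
  const allowed = new Set<string>([source, ...EXTRA_PARAMS[tool], ...COMMON_PARAMS]);
  const problems: string[] = [];
  if (typeof args[source] !== "string") problems.push(`missing required "${source}"`);
  for (const key of Object.keys(args)) {
    if (!allowed.has(key)) problems.push(`unexpected "${key}"`);
  }
  return problems;
}

console.log(checkToolArgs("analyze_tutorial_video", { video_path: "/tmp/a.mp4" }));
```

A typical mistake this catches is passing video_path to analyze_tutorial_video.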

How It Works

  1. You record a video or find a YouTube URL
  2. You ask Claude Code to analyze it via MCP (optionally specifying model and output file)
  3. The MCP server checks whether a cached analysis exists (if output_file is specified)
  4. If there is no cached result, the server sends the video to the Gemini API using the chosen model
  5. Gemini analyzes the video and returns a detailed text description
  6. The MCP server saves the result to the file (if output_file is specified)
  7. Claude Code receives the text and can now write/fix code based on it

Caching Strategy

When you specify an output_file:

  • First run: Video is analyzed and result is saved to the file
  • Subsequent runs: Cached file is read instantly (no API call, no cost!)
  • To re-analyze: Delete the output file first

This is perfect for:

  • Iterating on implementations without re-analyzing videos
  • Sharing analysis results with team members
  • Reducing API costs and latency
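The caching rule above amounts to three steps. If the output file exists, return its contents. Otherwise, analyze the video and write the result. With no output_file, always analyze. The sketch below implements that rule with Node's `fs` module. The function name `readOrAnalyze` is hypothetical, and the `analyze` callback stands in for the actual Gemini request. The real server may differ in details, such as how it handles write errors.

```typescript
import * as fs from "node:fs";
import * as path from "node:path";

// Stand-in for the Gemini call; in the real server this would send the video to the API.
type Analyzer = () => Promise<string>;

// Hypothetical sketch of the output_file cache: read if present, otherwise analyze and save.
async function readOrAnalyze(
  outputFile: string | undefined,
  analyze: Analyzer,
): Promise<{ text: string; cached: boolean }> {
  if (outputFile === undefined) {
    // No output_file: caching is disabled, so every call analyzes.
    return { text: await analyze(), cached: false };
  }
  if (fs.existsSync(outputFile)) {
    // Cache hit: no API call is made.
    return { text: fs.readFileSync(outputFile, "utf8"), cached: true };
  }
  const text = await analyze();
  fs.mkdirSync(path.dirname(outputFile), { recursive: true });
  fs.writeFileSync(outputFile, text, "utf8");
  return { text, cached: false };
}
```

Deleting the file makes the existence check fail again. This is why removing the output file forces a fresh analysis.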

Supported Video Formats

  • MP4
  • MOV
  • AVI
  • WebM
  • MKV
  • FLV
  • WMV
  • 3GP
  • MPEG
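When a local file is sent to Gemini, it needs a MIME type. The sketch below maps the extensions listed above to MIME types. Most of the strings are modeled on those listed in Gemini's video documentation. The MKV entry is an assumption, and the server may use different values.

```typescript
// Extension -> MIME type. Most strings follow Gemini's documented video MIME types;
// the "mkv" entry is an assumption, and the server may use different values.
const VIDEO_MIME_TYPES = new Map<string, string>([
  ["mp4", "video/mp4"],
  ["mov", "video/mov"],
  ["avi", "video/avi"],
  ["webm", "video/webm"],
  ["mkv", "video/x-matroska"],
  ["flv", "video/x-flv"],
  ["wmv", "video/wmv"],
  ["3gp", "video/3gpp"],
  ["mpeg", "video/mpeg"],
]);

// Returns the MIME type for a supported video file, or undefined for anything else.
function mimeTypeFor(filePath: string): string | undefined {
  const dot = filePath.lastIndexOf(".");
  if (dot === -1) return undefined;
  // Case-insensitive, so "demo.MP4" and "demo.mp4" match the same entry.
  return VIDEO_MIME_TYPES.get(filePath.slice(dot + 1).toLowerCase());
}

console.log(mimeTypeFor("/Users/me/Desktop/bug-demo.mp4"));
```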

Available Models

| Model                 | Speed   | Quality | Best For                       | Cost |
|-----------------------|---------|---------|--------------------------------|------|
| gemini-2.5-pro        | Slow    | Highest | Complex bugs, detailed designs | $$$  |
| gemini-2.5-flash      | Fast    | High    | General use (default)          | $$   |
| gemini-2.5-flash-lite | Fastest | Good    | Quick summaries, simple videos | $    |
| gemini-2.0-flash      | Fast    | Good    | Previous gen, reliable         | $$   |
| gemini-2.0-flash-exp  | Fast    | Varies  | Experimental features          | $$   |
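The model parameter is optional and defaults to gemini-2.5-flash. The sketch below applies that default and restricts input to the models in the table. The resolver is hypothetical. This README does not say whether the server rejects other model IDs or passes them through to Gemini.

```typescript
// Model IDs from the table above; gemini-2.5-flash is the documented default.
const MODELS = [
  "gemini-2.5-pro",
  "gemini-2.5-flash",
  "gemini-2.5-flash-lite",
  "gemini-2.0-flash",
  "gemini-2.0-flash-exp",
] as const;

type GeminiModel = (typeof MODELS)[number];

const DEFAULT_MODEL: GeminiModel = "gemini-2.5-flash";

// Hypothetical resolver: falls back to the default and rejects IDs outside the list.
function resolveModel(requested?: string): GeminiModel {
  if (requested === undefined || requested === "") return DEFAULT_MODEL;
  const match = MODELS.find((m) => m === requested);
  if (match === undefined) {
    throw new Error(`Unsupported model "${requested}"; expected one of: ${MODELS.join(", ")}`);
  }
  return match;
}

console.log(resolveModel(), resolveModel("gemini-2.5-flash-lite"));
```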

Limitations

  • YouTube: Only public videos (not private or unlisted)
  • File Size: Files larger than 20 MB are automatically sent through Gemini's File API, which may take longer to process
  • Video Length: Longer videos take more time to process
  • Rate Limits: Subject to Gemini API rate limits
  • Caching: Only works when output_file is specified
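The file-size rule above amounts to a single threshold check: small files are sent inline with the request, and larger ones go through the File API. The sketch below expresses that check. Two details are assumptions: that "20 MB" means 20 × 1024 × 1024 bytes, and that a file of exactly that size still goes inline. The strategy names are also hypothetical.

```typescript
// Assumed threshold: 20 MB interpreted as 20 * 1024 * 1024 bytes.
const INLINE_LIMIT_BYTES = 20 * 1024 * 1024;

type UploadStrategy = "inline" | "file_api";

// Hypothetical sketch: files above the threshold use the File API, the rest are sent inline.
function chooseUploadStrategy(sizeBytes: number): UploadStrategy {
  return sizeBytes > INLINE_LIMIT_BYTES ? "file_api" : "inline";
}

console.log(chooseUploadStrategy(5 * 1024 * 1024), chooseUploadStrategy(50 * 1024 * 1024));
```

With Node, the size would typically come from `fs.statSync(videoPath).size`.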

Development

Local Development

# Clone the repo
git clone https://github.com/ugarchance/mcp-gemini-video-understanding
cd mcp-gemini-video-understanding

# Install dependencies
npm install

# Build
npm run build

To test locally with Claude Code, point claude_desktop_config.json at the built server:
{
  "mcpServers": {
    "gemini-video": {
      "command": "node",
      "args": ["/absolute/path/to/mcp-gemini-video-understanding/build/index.js"],
      "env": {
        "GEMINI_API_KEY": "your-key"
      }
    }
  }
}

Publishing to npm

# Update the package name in package.json (currently under the @ugarchance scope) to use your npm username
npm login
npm publish

Troubleshooting

"GEMINI_API_KEY environment variable is required"

Make sure you've set the GEMINI_API_KEY in your claude_desktop_config.json under the env section.
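This error comes from a startup check on the environment. The sketch below shows what a check like that might look like. It is a guess at the behavior, not the server's actual code, and `requireApiKey` is a hypothetical name. The function takes the environment as a parameter so it is easy to test; in the server you would pass `process.env`.

```typescript
// Hypothetical startup check: returns the trimmed key or throws the error quoted above.
function requireApiKey(env: Record<string, string | undefined>): string {
  const key = env.GEMINI_API_KEY?.trim();
  if (!key) {
    // Covers both a missing variable and one set to an empty or whitespace-only string.
    throw new Error("GEMINI_API_KEY environment variable is required");
  }
  return key;
}
```

Usage in a Node entry point would be `const apiKey = requireApiKey(process.env);`. The env block in claude_desktop_config.json is what populates `process.env` for the server process.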

"Error analyzing video"

  • Check that the video file path is absolute (not relative)
  • Verify the video format is supported
  • For YouTube videos, ensure the URL is valid and the video is public
  • Check Gemini API quotas and rate limits
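The first two checks above can be automated before a path is handed to the server. The sketch below reports a relative path or an unsupported extension, using Node's `path` module. The helper name is hypothetical, and the extension list is copied from "Supported Video Formats". It does not check that the file exists; `fs.existsSync(videoPath)` could be added for that.

```typescript
import * as path from "node:path";

// Extensions from the "Supported Video Formats" list.
const SUPPORTED_EXTENSIONS = new Set([".mp4", ".mov", ".avi", ".webm", ".mkv", ".flv", ".wmv", ".3gp", ".mpeg"]);

// Hypothetical preflight check for local video paths; an empty result means no problems found.
function preflightProblems(videoPath: string): string[] {
  const problems: string[] = [];
  if (!path.isAbsolute(videoPath)) problems.push("path is not absolute");
  const ext = path.extname(videoPath).toLowerCase();
  if (!SUPPORTED_EXTENSIONS.has(ext)) problems.push(`unsupported extension "${ext}"`);
  return problems;
}

console.log(preflightProblems("Desktop/bug-demo.mp4"));
```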

Tools not showing in Claude Code

  1. Restart Claude Code completely (Cmd+Q on Mac, not just closing the window)
  2. Check that claude_desktop_config.json is valid JSON
  3. Look at Claude Code logs: ~/Library/Logs/Claude/mcp*.log (macOS)
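For step 2, a quick script can confirm both that the file parses and that the gemini-video entry has the fields shown in the setup section. The sketch below is a hypothetical checker that takes the file's text as a string. It only checks structure. A placeholder such as "your-api-key-here" passes, so it cannot tell you whether the key is valid.

```typescript
// Minimal shape of one entry under "mcpServers".
interface ServerEntry {
  command?: unknown;
  args?: unknown;
  env?: Record<string, unknown>;
}

// Hypothetical checker: returns a list of problems (empty means the entry looks usable).
function checkConfig(jsonText: string, serverName = "gemini-video"): string[] {
  let parsed: unknown;
  try {
    parsed = JSON.parse(jsonText);
  } catch (err) {
    return [`invalid JSON: ${(err as Error).message}`];
  }
  if (typeof parsed !== "object" || parsed === null) return ["top-level value must be an object"];
  const servers = (parsed as { mcpServers?: unknown }).mcpServers;
  if (typeof servers !== "object" || servers === null) return ['missing "mcpServers" object'];
  if (!Object.prototype.hasOwnProperty.call(servers, serverName)) {
    return [`no "${serverName}" entry under "mcpServers"`];
  }
  const entry = (servers as Record<string, ServerEntry | null>)[serverName];
  if (typeof entry !== "object" || entry === null) return [`"${serverName}" must be an object`];
  const problems: string[] = [];
  if (typeof entry.command !== "string") problems.push('"command" must be a string');
  if (entry.args !== undefined && !Array.isArray(entry.args)) problems.push('"args" must be an array');
  const key = entry.env?.GEMINI_API_KEY;
  if (typeof key !== "string" || key.trim() === "") problems.push('"env.GEMINI_API_KEY" is missing or empty');
  return problems;
}
```

To check the real file, read it with `fs.readFileSync(configPath, "utf8")` and pass the result to `checkConfig`.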

License

MIT

Contributing

Contributions welcome! Please open an issue or PR.

Credits

Built with:
