MCP Gemini Video Understanding
An MCP (Model Context Protocol) server that uses Google's Gemini API to analyze videos and convert them to text descriptions that Claude Code can understand and act upon.
What is this?
This MCP server acts as a bridge between video content and Claude Code. Given a video (a screen recording, a Loom video, a YouTube tutorial, and so on), the server uses Gemini's video understanding to produce a text description that Claude Code can then use to write code, fix bugs, or implement features.
Use Cases
- Bug Reproduction Videos: Record a video showing a bug → Get detailed steps to reproduce and debugging insights
- Design Mockups: Show a design in a video → Get implementation guidance with UI component breakdowns
- YouTube Tutorials: Share a tutorial URL → Extract key learnings and implementation steps
- Responsive Issues: Record layout problems → Get specific CSS fixes and responsive solutions
Installation
npm install -g @ugarchance/mcp-gemini-video-understanding
Or use directly with npx:
npx @ugarchance/mcp-gemini-video-understanding
Setup
1. Get a Gemini API Key
- Go to Google AI Studio
- Click "Get API Key"
- Create or select a project
- Copy your API key
2. Set Environment Variable
export GEMINI_API_KEY="your-api-key-here"
3. Configure Claude Code
Add to your claude_desktop_config.json:
macOS: ~/Library/Application Support/Claude/claude_desktop_config.json
Windows: %APPDATA%\Claude\claude_desktop_config.json
{
  "mcpServers": {
    "gemini-video": {
      "command": "npx",
      "args": [
        "-y",
        "@ugarchance/mcp-gemini-video-understanding"
      ],
      "env": {
        "GEMINI_API_KEY": "your-api-key-here"
      }
    }
  }
}
Or if installed globally:
{
  "mcpServers": {
    "gemini-video": {
      "command": "mcp-gemini-video",
      "env": {
        "GEMINI_API_KEY": "your-api-key-here"
      }
    }
  }
}
Usage
All tools support these common parameters:
- model (string, optional): Gemini model to use. Options:
  - gemini-2.5-pro: Most capable, best for complex analysis
  - gemini-2.5-flash: Default, balanced speed and quality
  - gemini-2.5-flash-lite: Fastest, lighter analysis
  - gemini-2.0-flash: Previous generation fast model
  - gemini-2.0-flash-exp: Experimental features
- output_file (string, optional): Path to save the analysis. If the file already exists, the cached result is used and the video is not re-analyzed.
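The model parameter amounts to a choice from a fixed list with a default. Below is a minimal TypeScript sketch of that resolution. The names `SUPPORTED_MODELS` and `resolveModel` are hypothetical, and rejecting unknown model names is an assumption, since this README does not say how the server handles invalid values.

```typescript
// Model names are taken from the options listed above.
// The helper itself is hypothetical and not part of the package's documented API.
const SUPPORTED_MODELS = [
  "gemini-2.5-pro",
  "gemini-2.5-flash",
  "gemini-2.5-flash-lite",
  "gemini-2.0-flash",
  "gemini-2.0-flash-exp",
] as const;

type GeminiModel = (typeof SUPPORTED_MODELS)[number];

// The README names gemini-2.5-flash as the default.
const DEFAULT_MODEL: GeminiModel = "gemini-2.5-flash";

// Returns the requested model, or the default when none is given.
// Throwing on unknown names is an assumption, not documented behavior.
function resolveModel(requested?: string): GeminiModel {
  if (requested === undefined || requested === "") return DEFAULT_MODEL;
  const match = SUPPORTED_MODELS.find((m) => m === requested);
  if (match === undefined) {
    throw new Error(`Unsupported model: ${requested}`);
  }
  return match;
}
```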
Tool 1: analyze_bug_video
Analyze a video showing a bug or error.
Parameters:
- video_path (string): Path to video file or YouTube URL
- is_youtube (boolean, optional): Set to true if using YouTube URL
- additional_context (string, optional): Extra context about the bug
- model (string, optional): Gemini model to use
- output_file (string, optional): Path to save analysis
Example with Claude Code:
I have a bug video at /Users/me/Desktop/bug-demo.mp4
Save the analysis to bug-analysis.md and fix the issue.
With model selection:
Analyze /Users/me/Desktop/complex-bug.mp4 using gemini-2.5-pro
Save to analysis.txt and help me fix it.
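Behind prompts like these, an MCP client sends the server a `tools/call` JSON-RPC request. The sketch below builds one for analyze_bug_video using the parameter names listed above. The `buildBugVideoCall` helper, the `id` value, and the context string are illustrative only, and the exact messages Claude Code produces may differ.

```typescript
// Argument shape mirrors the parameters documented for analyze_bug_video.
interface BugVideoArgs {
  video_path: string;
  is_youtube?: boolean;
  additional_context?: string;
  model?: string;
  output_file?: string;
}

// Hypothetical helper: wraps the arguments in an MCP tools/call request.
function buildBugVideoCall(id: number, args: BugVideoArgs) {
  return {
    jsonrpc: "2.0" as const,
    id,
    method: "tools/call" as const,
    params: { name: "analyze_bug_video", arguments: args },
  };
}

const request = buildBugVideoCall(1, {
  video_path: "/Users/me/Desktop/complex-bug.mp4",
  additional_context: "Illustrative context string", // made-up example value
  model: "gemini-2.5-pro",
  output_file: "analysis.txt",
});

console.log(JSON.stringify(request, null, 2));
```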
Tool 2: analyze_design_video
Analyze a video showing a design mockup or feature demonstration.
Parameters:
- video_path (string): Path to video file or YouTube URL
- is_youtube (boolean, optional): Set to true if using YouTube URL
- tech_stack (string, optional): Technologies to use (e.g., "React with Tailwind")
- model (string, optional): Gemini model to use
- output_file (string, optional): Path to save analysis
Example with Claude Code:
I recorded a design mockup at /Users/me/Desktop/new-feature.mp4
Save analysis to design-spec.md then implement using React and Tailwind CSS.
Tool 3: analyze_tutorial_video
Analyze a YouTube tutorial to extract key learnings.
Parameters:
- video_url (string): YouTube URL
- focus_area (string, optional): Specific topic to focus on
- model (string, optional): Gemini model to use
- output_file (string, optional): Path to save analysis
Example with Claude Code:
Watch this tutorial: https://www.youtube.com/watch?v=xxxxx
Save the learnings to tutorial-notes.md then implement the auth system.
Using faster model for quick summaries:
Analyze https://www.youtube.com/watch?v=xxxxx with gemini-2.5-flash-lite
Just give me the key points.
Tool 4: analyze_responsive_issues
Analyze a video showing responsive design problems.
Parameters:
- video_path (string): Path to video file or YouTube URL
- is_youtube (boolean, optional): Set to true if using YouTube URL
- target_devices (string, optional): Target devices (e.g., "mobile, tablet")
- model (string, optional): Gemini model to use
- output_file (string, optional): Path to save analysis
Example with Claude Code:
I recorded responsive issues at /Users/me/Desktop/mobile-issues.mp4
Save analysis to responsive-fixes.md and fix the layout for mobile.
How It Works
- You record a video or find a YouTube URL
- You ask Claude Code to analyze it via MCP (optionally specifying model and output file)
- MCP Server checks if cached analysis exists (if output_file specified)
- If no cache: Sends video to Gemini API with chosen model
- Gemini analyzes video and returns detailed text description
- MCP Server saves result to file (if output_file specified)
- Claude Code receives the text and can now write/fix code based on it
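The step where the video is sent to Gemini can be illustrated with the public Gemini REST API, which accepts a YouTube URL as a `file_data.file_uri` part of a `generateContent` request. This is a sketch of how such a request could be assembled, not this server's actual code. `buildYouTubeRequest` and the prompt text are hypothetical.

```typescript
const GEMINI_BASE = "https://generativelanguage.googleapis.com/v1beta";

// A request part is either prompt text or a reference to a video by URI.
type Part = { text: string } | { file_data: { file_uri: string } };

interface GeminiRequest {
  url: string;
  body: { contents: { parts: Part[] }[] };
}

// Hypothetical helper: pairs a prompt with a public YouTube URL.
function buildYouTubeRequest(
  model: string,
  youtubeUrl: string,
  prompt: string,
): GeminiRequest {
  return {
    url: `${GEMINI_BASE}/models/${model}:generateContent`,
    body: {
      contents: [
        { parts: [{ text: prompt }, { file_data: { file_uri: youtubeUrl } }] },
      ],
    },
  };
}

const req = buildYouTubeRequest(
  "gemini-2.5-flash",
  "https://www.youtube.com/watch?v=xxxxx",
  "Describe the steps shown in this video.", // illustrative prompt
);

console.log(req.url);
```

The body would then be sent as a JSON POST with the API key supplied in the `x-goog-api-key` header. The returned text is what gets passed back to Claude Code.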
Caching Strategy
When you specify an output_file:
- First run: Video is analyzed and result is saved to the file
- Subsequent runs: Cached file is read instantly (no API call, no cost!)
- To re-analyze: Delete the output file first
This is perfect for:
- Iterating on implementations without re-analyzing videos
- Sharing analysis results with team members
- Reducing API costs and latency
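The caching behavior described above can be sketched in a few lines of TypeScript. This is an illustration under stated assumptions rather than the server's implementation: `analyzeWithCache` and `invalidateCache` are hypothetical names, and `analyze` is an injected stand-in for the real Gemini call so the sketch runs offline.

```typescript
import { existsSync, readFileSync, rmSync, writeFileSync } from "node:fs";

interface AnalysisResult {
  text: string;
  fromCache: boolean;
}

// Returns the cached file contents when outputFile exists; otherwise runs
// the analysis and, if outputFile was given, saves the result for next time.
async function analyzeWithCache(
  outputFile: string | undefined,
  analyze: () => Promise<string>, // stand-in for the Gemini request
): Promise<AnalysisResult> {
  if (outputFile !== undefined && existsSync(outputFile)) {
    return { text: readFileSync(outputFile, "utf8"), fromCache: true };
  }
  const text = await analyze();
  if (outputFile !== undefined) {
    writeFileSync(outputFile, text, "utf8");
  }
  return { text, fromCache: false };
}

// Forces a re-analysis on the next call by deleting the cached file.
// `force: true` makes this a no-op when the file does not exist.
function invalidateCache(outputFile: string): void {
  rmSync(outputFile, { force: true });
}
```

Without an output file, the helper never reads or writes anything, which matches the note that caching applies only when output_file is specified.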
Supported Video Formats
- MP4
- MOV
- AVI
- WebM
- MKV
- FLV
- WMV
- 3GP
- MPEG
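When a local file is sent to an API, its format is usually declared as a MIME type. The sketch below maps the extensions above to MIME strings. The `mimeTypeFor` helper is hypothetical, and the MIME values are assumptions based on commonly used identifiers; the server and the Gemini API may expect different strings for some formats.

```typescript
// Extension-to-MIME lookup for the formats listed above.
// The values are assumed conventional identifiers, not taken from this package.
const VIDEO_MIME_TYPES: Record<string, string> = {
  mp4: "video/mp4",
  mov: "video/quicktime",
  avi: "video/x-msvideo",
  webm: "video/webm",
  mkv: "video/x-matroska",
  flv: "video/x-flv",
  wmv: "video/x-ms-wmv",
  "3gp": "video/3gpp",
  mpeg: "video/mpeg",
  mpg: "video/mpeg",
};

// Returns the MIME type for a path, or undefined for unsupported extensions.
function mimeTypeFor(path: string): string | undefined {
  const dot = path.lastIndexOf(".");
  if (dot === -1) return undefined;
  return VIDEO_MIME_TYPES[path.slice(dot + 1).toLowerCase()];
}
```

A lookup that returns undefined is a convenient point to reject a file before any upload is attempted.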
Available Models
| Model | Speed | Quality | Best For | Cost |
|---|---|---|---|---|
| gemini-2.5-pro | Slow | Highest | Complex bugs, detailed designs | $$$ |
| gemini-2.5-flash | Fast | High | General use (default) | $$ |
| gemini-2.5-flash-lite | Fastest | Good | Quick summaries, simple videos | $ |
| gemini-2.0-flash | Fast | Good | Previous gen, reliable | $$ |
| gemini-2.0-flash-exp | Fast | Varies | Experimental features | $$ |
Limitations
- YouTube: Only public videos (not private or unlisted)
- File Size: Files >20MB automatically use Gemini's File API (may take longer to process)
- Video Length: Longer videos take more time to process
- Rate Limits: Subject to Gemini API rate limits
- Caching: Only works when output_file is specified
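The file size rule above implies a choice between delivery paths for each video. The following sketch shows one way to express that choice. The type names and `chooseStrategy` are hypothetical; treating 20MB as 20 * 1024 * 1024 bytes and sending smaller files inline are both assumptions, since the README states only that files over 20MB use the File API.

```typescript
type VideoSource =
  | { kind: "youtube"; url: string }
  | { kind: "file"; path: string; sizeBytes: number };

type Strategy = "youtube_url" | "inline_data" | "file_api";

// Assumed byte value for the 20MB threshold mentioned above.
const INLINE_LIMIT_BYTES = 20 * 1024 * 1024;

// Picks how the video would reach Gemini. Only the "larger than 20MB goes to
// the File API" rule comes from the README; the rest is assumed.
function chooseStrategy(source: VideoSource): Strategy {
  if (source.kind === "youtube") return "youtube_url";
  return source.sizeBytes > INLINE_LIMIT_BYTES ? "file_api" : "inline_data";
}
```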
Development
Local Development
# Clone the repo
git clone https://github.com/ugarchance/mcp-gemini-video-understanding
cd mcp-gemini-video-understanding
# Install dependencies
npm install
# Build
npm run build
# Test locally with Claude Code
# Add to claude_desktop_config.json:
{
  "mcpServers": {
    "gemini-video": {
      "command": "node",
      "args": ["/absolute/path/to/mcp-gemini-video-understanding/build/index.js"],
      "env": {
        "GEMINI_API_KEY": "your-key"
      }
    }
  }
}
Publishing to npm
# Update package.json with your npm username
npm login
npm publish
Troubleshooting
"GEMINI_API_KEY environment variable is required"
Make sure you've set the GEMINI_API_KEY in your claude_desktop_config.json under the env section.
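An error like this typically comes from a check made at startup. The sketch below shows what such a check might look like. `requireApiKey` is a hypothetical name, and treating a blank value as missing is an assumption; only the error message is taken from this README.

```typescript
// Reads the key from an environment-like object so the check is easy to test.
// In a Node process this would be called as requireApiKey(process.env).
function requireApiKey(env: Record<string, string | undefined>): string {
  const key = env["GEMINI_API_KEY"];
  if (key === undefined || key.trim() === "") {
    throw new Error("GEMINI_API_KEY environment variable is required");
  }
  return key;
}
```

Because the server is launched by the MCP client, a variable exported in your shell may not reach it, which is why the env section of the config file is the place to check.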
"Error analyzing video"
- Check that the video file path is absolute (not relative)
- Verify the video format is supported
- For YouTube videos, ensure the URL is valid and the video is public
- Check Gemini API quotas and rate limits
Tools not showing in Claude Code
- Restart Claude Code completely (Cmd+Q on Mac, not just close window)
- Check that claude_desktop_config.json is valid JSON
- Look at Claude Code logs: ~/Library/Logs/Claude/mcp*.log (macOS)
License
MIT
Contributing
Contributions welcome! Please open an issue or PR.
Credits
Built with:
- Gemini API for video understanding
- Model Context Protocol for Claude integration