MCP Video Parser

A powerful video analysis system that uses the Model Context Protocol (MCP) to process, analyze, and query video content using AI vision models.

🎬 Features

  • AI-Powered Video Analysis: Automatically extracts and analyzes frames using vision LLMs (Llava)
  • Natural Language Queries: Search videos using conversational queries
  • Time-Based Search: Query videos by relative time ("last week") or specific dates
  • Location-Based Organization: Organize videos by location (shed, garage, etc.)
  • Audio Transcription: Extract and search through video transcripts
  • Chat Integration: Natural conversations with Mistral/Llama while maintaining video context
  • Scene Detection: Intelligent frame extraction based on visual changes (see the sketch after this list)
  • MCP Protocol: Standards-based integration with Claude and other MCP clients
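
The repository's scene-detection code isn't reproduced here, but a common approach is to keep a frame only when it differs enough from the last kept frame. The following is a minimal sketch of that idea, assuming OpenCV (opencv-python) is installed; the function name and threshold are illustrative, not the project's actual implementation:

# Illustrative only: difference-based scene detection, not this project's
# actual implementation. Requires opencv-python.
import cv2

def extract_scene_frames(video_path: str, threshold: float = 30.0):
    """Yield (frame_index, frame) pairs whenever the image changes noticeably."""
    cap = cv2.VideoCapture(video_path)
    prev_gray = None
    index = 0
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
        # Keep the first frame, then any frame that differs enough from the last kept one.
        if prev_gray is None or cv2.absdiff(gray, prev_gray).mean() > threshold:
            yield index, frame
            prev_gray = gray
        index += 1
    cap.release()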

🚀 Quick Start

Prerequisites

  • Python 3.10+
  • Ollama installed and running
  • ffmpeg (for video processing)

Installation

  1. Clone the repository:
git clone https://github.com/michaelbaker-dev/mcpVideoParser.git
cd mcpVideoParser
  2. Install dependencies:
pip install -r requirements.txt
  3. Pull required Ollama models:
ollama pull llava:latest    # For vision analysis
ollama pull mistral:latest  # For chat interactions
  4. Start the MCP server:
python mcp_video_server.py --http --host localhost --port 8000

Basic Usage

  1. Process a video:
python process_new_video.py /path/to/video.mp4 --location garage
  2. Start the chat client:
python standalone_client/mcp_http_client.py --chat-llm mistral:latest
  3. Example queries:
  • "Show me the latest videos"
  • "What happened at the garage yesterday?"
  • "Find videos with cars"
  • "Give me a summary of all videos from last week"

🏗️ Architecture

┌─────────────────┐     ┌─────────────────┐     ┌─────────────────┐
│   Video Files   │────▶│ Video Processor │────▶│ Frame Analysis  │
└─────────────────┘     └─────────────────┘     └─────────────────┘
                                │                         │
                                ▼                         ▼
┌─────────────────┐     ┌─────────────────┐     ┌─────────────────┐
│   MCP Server    │◀────│ Storage Manager │◀────│   Ollama LLM    │
└─────────────────┘     └─────────────────┘     └─────────────────┘
         │
         ▼
┌─────────────────┐
│   HTTP Client   │
└─────────────────┘
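
In outline, the processor hands each extracted frame to the vision model and stores the returned description so it can be searched later. Below is a hedged sketch of that analysis step using the ollama Python package; the prompt wording and function name are illustrative, and it assumes Ollama is running with llava:latest pulled:

# Sketch of the frame-analysis step; prompt wording and function name are
# illustrative. Assumes the `ollama` package and a running Ollama server.
import ollama

def describe_frame(image_path: str, model: str = "llava:latest") -> str:
    """Ask the vision model for a short description of one extracted frame."""
    response = ollama.chat(
        model=model,
        messages=[{
            "role": "user",
            "content": "Describe what is happening in this frame.",
            "images": [image_path],  # path to a frame extracted earlier
        }],
    )
    return response["message"]["content"]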

🛠️ Configuration

Edit config/default_config.json to customize:

  • Frame extraction rate: How many frames to analyze
  • Scene detection sensitivity: When to capture scene changes
  • Storage settings: Where to store videos and data
  • LLM models: Which models to use for vision and chat

See Configuration Guide for details.
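
The file itself is the authoritative schema; the snippet below only illustrates the shape those settings might take, and every key name in it is an invented placeholder rather than the project's real schema:

# Illustrative placeholder only: these keys are invented for the example
# and are not guaranteed to match config/default_config.json.
example_config = {
    "frame_extraction_rate": 1,                    # frames analyzed per second
    "scene_change_threshold": 30.0,                # sensitivity of scene detection
    "storage": {"base_path": "video_data"},        # where videos and data live
    "models": {"vision": "llava:latest", "chat": "mistral:latest"},
}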

🔧 MCP Tools

The server exposes these MCP tools:

  • process_video - Process and analyze a video file
  • query_location_time - Query videos by location and time
  • search_videos - Search video content and transcripts
  • get_video_summary - Get AI-generated summary of a video
  • ask_video - Ask questions about specific videos
  • analyze_moment - Analyze specific timestamp in a video
  • get_video_stats - Get system statistics
  • get_video_guide - Get usage instructions
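
Any MCP-capable client can call these tools once the server is running; the standalone HTTP client in this repository is one example. For orientation, here is a minimal sketch using the official MCP Python SDK over streamable HTTP. The /mcp endpoint path and the "query" argument name are assumptions, not details taken from this project:

# Minimal MCP client sketch (official `mcp` SDK). The /mcp path and the
# "query" argument name are assumptions, not taken from this repository.
import asyncio

from mcp import ClientSession
from mcp.client.streamable_http import streamablehttp_client

async def main() -> None:
    async with streamablehttp_client("http://localhost:8000/mcp") as (read, write, _):
        async with ClientSession(read, write) as session:
            await session.initialize()
            result = await session.call_tool("search_videos", {"query": "cars"})
            for block in result.content:
                print(getattr(block, "text", block))

asyncio.run(main())

If the server is started with a different transport or path, adjust the URL accordingly (or use the repository's own client).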

🛠️ Utility Scripts

Video Cleanup

Clean all videos from the system and reset to a fresh state:

# Dry run to see what would be deleted
python clean_videos.py --dry-run

# Clean processed files and database (keeps originals)
python clean_videos.py

# Clean everything including original video files
python clean_videos.py --clean-originals

# Skip confirmation and backup
python clean_videos.py --yes --no-backup

This script will:

  • Remove all video entries from the database
  • Delete all processed frames and transcripts
  • Delete all videos from the location-based structure
  • Optionally delete original video files
  • Create a backup of the database before cleaning (unless --no-backup)

Video Processing

Process individual videos:

# Process a video with automatic location detection
python process_new_video.py /path/to/video.mp4

# Process with specific location
python process_new_video.py /path/to/video.mp4 --location garage

📖 Documentation

🚦 Development

Running Tests

# All tests
python -m pytest tests/ -v

# Unit tests only
python -m pytest tests/unit/ -v

# Integration tests (requires Ollama)
python -m pytest tests/integration/ -v

Project Structure

mcp-video-server/
├── src/
│   ├── llm/            # LLM client implementations
│   ├── processors/     # Video processing logic
│   ├── storage/        # Database and file management
│   ├── tools/          # MCP tool definitions
│   └── utils/          # Utilities and helpers
├── standalone_client/  # HTTP client implementation
├── config/            # Configuration files
├── tests/             # Test suite
└── video_data/        # Video storage (git-ignored)

🤝 Contributing

We welcome contributions! Please see CONTRIBUTING.md for guidelines.

📝 Roadmap

  • ✅ Basic video processing and analysis
  • ✅ MCP server implementation
  • ✅ Natural language queries
  • ✅ Chat integration with context
  • 🚧 Enhanced time parsing (see INTELLIGENT_QUERY_PLAN.md)
  • 🚧 Multi-camera support
  • 🚧 Real-time processing
  • 🚧 Web interface

🐛 Troubleshooting

Common Issues

  1. Ollama not running:
ollama serve  # Start Ollama
  2. Missing models:
ollama pull llava:latest
ollama pull mistral:latest
  3. Port already in use:
# Change port in command
python mcp_video_server.py --http --port 8001

📄 License

MIT License - see LICENSE for details.

🙏 Acknowledgments

  • Built on FastMCP framework
  • Uses Ollama for local LLM inference
  • Inspired by the Model Context Protocol specification

💬 Support


Version: 0.1.1
Author: Michael Baker
Status: Beta - Breaking changes possible
