Link Scan MCP Server 🚀
A comprehensive Model Context Protocol (MCP) server for scanning and summarizing links. It automatically detects and analyzes video links (YouTube, Instagram Reels) and text links (blogs, articles) to provide concise summaries of three sentences or fewer. All features work without requiring API keys!
Python 3.11+ | MCP Compatible | License: MIT
✨ Features
🎥 Video Link Analysis
- YouTube Support
- Comprehensive metadata extraction (title, description)
- Subtitle extraction for first 7 seconds (yt-dlp)
- Audio transcription using OpenAI Whisper
- Integrated summarization combining all text sources
- Instagram Reels Support
- Audio download and transcription (first 7 seconds)
- Automatic content summarization
- Smart Link Detection
- Automatic video/text link type detection
- Error handling for unsupported URLs
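The video/text split described above can be sketched as a simple URL pattern match. The function and patterns below are illustrative only; the repository's actual `src/utils/link_detector.py` may use different rules:

```python
import re

# Hypothetical sketch of link-type detection; patterns are assumptions,
# not copied from the repository.
VIDEO_PATTERNS = [
    r"(?:www\.)?youtube\.com/watch",      # standard YouTube watch URLs
    r"youtu\.be/",                        # shortened YouTube URLs
    r"(?:www\.)?instagram\.com/reels?/",  # Instagram Reels
]

def detect_link_type(url: str) -> str:
    """Return 'video' for known video hosts, otherwise 'text'."""
    for pattern in VIDEO_PATTERNS:
        if re.search(pattern, url):
            return "video"
    return "text"
```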
📝 Text Link Analysis
- Web Content Extraction
- BeautifulSoup-based HTML parsing
- Main content area detection
- Automatic navigation/ad removal
- Intelligent Summarization
- Llama3-powered text summarization
- 3-sentence limit enforcement
- Natural Korean output
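The extraction step above — parse the HTML, drop navigation and ads, keep the main content — might look roughly like this. This is a minimal sketch, not the repository's actual `text_handler.py`:

```python
from bs4 import BeautifulSoup

def extract_main_text(html: str) -> str:
    """Strip non-content elements and return the main text (illustrative sketch)."""
    soup = BeautifulSoup(html, "html.parser")
    # Remove navigation, chrome, and scripts before extracting text.
    for tag in soup(["nav", "header", "footer", "aside", "script", "style"]):
        tag.decompose()
    # Prefer a semantic main-content container when one exists.
    container = soup.find("article") or soup.find("main") or soup.body or soup
    # Collapse whitespace into single spaces.
    return " ".join(container.get_text(separator=" ").split())
```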
🤖 AI-Powered Summarization
- Llama3 Integration
- Local LLM via Ollama (no API keys required)
- Separate prompts for video and text content
- Fallback to original text on errors
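The local Ollama call with an error fallback can be sketched as below. The prompt text and function name are assumptions (the real `llm_summarizer.py` may use aiohttp and its own prompts); the `/api/generate` endpoint and `stream` flag are part of Ollama's documented REST API:

```python
import json
import urllib.request

def summarize(text: str, model: str = "llama3:latest",
              base_url: str = "http://localhost:11434") -> str:
    """Summarize via a local Ollama server; fall back to the input on errors."""
    payload = {
        "model": model,
        "prompt": f"Summarize the following in at most 3 sentences:\n\n{text}",
        "stream": False,  # ask for a single JSON response, not a stream
    }
    req = urllib.request.Request(
        f"{base_url}/api/generate",
        data=json.dumps(payload).encode(),
        headers={"Content-Type": "application/json"},
    )
    try:
        with urllib.request.urlopen(req, timeout=300) as resp:
            return json.loads(resp.read())["response"]
    except Exception:
        # As noted above: fall back to the original text on errors.
        return text
```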
- Whisper Transcription
- High-quality speech-to-text conversion
- Optimized for speed and accuracy
- Supports multiple languages
🐳 Docker Support
- One-Command Setup
- Docker Compose configuration
- Automatic Ollama service setup
- Llama3 model auto-download
- Development mode with hot reload
🔧 Developer-Friendly
- Type-safe with Pydantic models
- Async/await support for better performance
- Comprehensive error handling
- Extensible architecture
- Hot reload in development mode
🚀 Quick Start
Installation
# Clone the repository
git clone https://github.com/your-username/mcp-link-scan.git
cd mcp-link-scan
# Install dependencies
pip install -r requirements.txt
System Dependencies
ffmpeg (required for audio processing):
- macOS: `brew install ffmpeg`
- Ubuntu/Debian: `sudo apt-get install ffmpeg`
- Windows: download from https://ffmpeg.org/download.html
Ollama (required for summarization):
- macOS: `brew install ollama` or download from https://ollama.com/download
- Linux: `curl -fsSL https://ollama.com/install.sh | sh`
- Windows: download from https://ollama.com/download
- After installation: `ollama pull llama3:latest`
Configuration
Create a .env file:
# Server settings
PORT=8000 # Server port (default: 8000)
HOST=0.0.0.0 # Host to bind (default: 0.0.0.0)
DEBUG=False # Debug mode (default: False)
# API path prefix (optional)
# Use this when hosting multiple MCP servers on the same host
# Default: /link-scan
API_PREFIX=/link-scan
# Ollama settings (optional)
# Set automatically when using Docker Compose
OLLAMA_API_URL=http://localhost:11434 # Ollama API URL (default: http://localhost:11434)
OLLAMA_MODEL=llama3:latest # Ollama model to use (default: llama3)
Environment Variables

| Variable | Required | Default | Description |
|---|---|---|---|
| PORT | ❌ | 8000 | Port number the server listens on |
| HOST | ❌ | 0.0.0.0 | Host address the server binds to |
| DEBUG | ❌ | False | Enable debug mode (True/False) |
| API_PREFIX | ❌ | /link-scan | Path prefix for API endpoints |
| OLLAMA_API_URL | ❌ | http://localhost:11434 | Ollama API server URL |
| OLLAMA_MODEL | ❌ | llama3 | Name of the Ollama model to use |
Running as MCP Server
Local Mode (stdio):
python -m src.server
Remote Mode (HTTP):
python run_server.py
Or with uvicorn directly:
uvicorn src.server_http:app --host 0.0.0.0 --port 8000
Docker Setup (Recommended)
Using Docker Compose:
# Start all services (link-scan + Ollama)
docker-compose up -d
# Check logs
docker-compose logs -f
# Stop services
docker-compose down
Docker Compose automatically:
- Sets up the Ollama service with an 8 GB memory limit
- Downloads Llama3 model
- Configures link-scan service
- Enables development mode with hot reload
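A minimal sketch of what such a Compose file might look like. The service names, image tag, memory limit syntax, and paths here are assumptions; the repository's docker-compose.yml is authoritative:

```yaml
# Illustrative sketch only — not the repository's actual docker-compose.yml.
services:
  ollama:
    image: ollama/ollama
    ports:
      - "11434:11434"
    mem_limit: 8g            # the 8 GB memory allocation mentioned above
    volumes:
      - ollama-data:/root/.ollama
  link-scan:
    build: .
    environment:
      - OLLAMA_API_URL=http://ollama:11434
      - DEBUG=True           # hot reload in development mode
    ports:
      - "8000:8000"
    volumes:
      - ./src:/app/src       # source mount so code changes are picked up
    depends_on:
      - ollama

volumes:
  ollama-data:
```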
Development Mode:
The docker-compose.yml is configured for development with:
- Source code volume mounting
- Hot reload enabled (`DEBUG=True`)
- Automatic detection of code changes
Testing with MCP Inspector
You can test the server using the MCP Inspector tool:
# Test with Python
npx @modelcontextprotocol/inspector python run_server.py
# Or test stdio mode
npx @modelcontextprotocol/inspector python -m src.server
The MCP Inspector provides a web interface to:
- View available tools and their schemas
- Test tool execution with sample inputs
- Debug server responses and error handling
- Validate MCP protocol compliance
🛠️ Available Tools
1. scan_video_link
Scan and summarize video links (YouTube, Instagram Reels, etc.).
Parameters:
- `url` (string, required): Video URL to scan
Example:
{
"name": "scan_video_link",
"arguments": {
"url": "https://www.youtube.com/watch?v=dQw4w9WgXcQ"
}
}
Process:
- Detects link type (YouTube, Instagram, etc.)
- For YouTube: Extracts title, description, subtitles (first 7s)
- Downloads audio (first 7 seconds)
- Transcribes audio with Whisper
- Combines all text sources
- Summarizes with Llama3 (3 sentences max)
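Two of the steps above — combining the text sources and enforcing the 3-sentence cap — can be sketched with small helpers. The names are illustrative, not taken from the repository:

```python
import re

def combine_sources(title: str, description: str,
                    subtitles: str, transcription: str) -> str:
    """Join the non-empty YouTube text sources, one per line."""
    parts = [p.strip() for p in (title, description, subtitles, transcription)]
    return "\n".join(p for p in parts if p)

def limit_sentences(text: str, max_sentences: int = 3) -> str:
    """Naive 3-sentence cap: split on ., !, ? followed by whitespace."""
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    return " ".join(sentences[:max_sentences])
```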
2. scan_text_link
Scan and summarize text links (blogs, articles, etc.).
Parameters:
- `url` (string, required): Text URL to scan
Example:
{
"name": "scan_text_link",
"arguments": {
"url": "https://example.com/blog/article"
}
}
Process:
- Fetches HTML content
- Extracts main text content
- Removes navigation, ads, and noise
- Summarizes with Llama3 (3 sentences max)
📊 Example Outputs
Video Link Summary
Input: YouTube video URL
Output:
이 영상은 Python 프로그래밍 언어의 기본 개념을 소개합니다.
변수, 함수, 클래스 등 핵심 문법을 실습 예제와 함께 설명합니다.
초보자도 쉽게 따라할 수 있도록 단계별로 구성되어 있습니다.
(Translation: This video introduces the basic concepts of the Python programming language. It explains core syntax such as variables, functions, and classes with hands-on examples. It is structured step by step so beginners can easily follow along.)
Text Link Summary
Input: Blog article URL
Output:
이 글은 Docker 컨테이너 기술의 장단점을 분석합니다.
가상화 기술과 비교하여 리소스 효율성과 배포 편의성을 강점으로 제시합니다.
다만 보안과 복잡성 측면에서 주의가 필요하다고 조언합니다.
(Translation: This article analyzes the pros and cons of Docker container technology. Compared with virtualization, it highlights resource efficiency and deployment convenience as strengths. It advises caution, however, regarding security and complexity.)
🏗️ Architecture
mcp-link-scan/
├── src/
│ ├── server.py # Local server (stdio)
│ ├── server_http.py # Remote server (HTTP)
│ ├── tools/ # MCP tools
│ │ ├── link_scanner.py # Main tool definitions
│ │ ├── media_handler.py # Video processing (Whisper)
│ │ └── text_handler.py # Text extraction
│ ├── utils/ # Utilities
│ │ ├── link_detector.py # Link type detection
│ │ ├── youtube_extractor.py # YouTube metadata/subtitles
│ │ └── llm_summarizer.py # Llama3 integration
│ └── prompts/ # LLM prompts
│ └── __init__.py # Video/text prompt templates
├── docker/
│ └── init-ollama.sh # Ollama initialization script
├── docker-compose.yml # Docker services
├── Dockerfile # Container build config
├── requirements.txt # Python dependencies
└── run_server.py # Server entry point
🔧 Development
Setting up Development Environment
# Clone and install
git clone https://github.com/your-username/mcp-link-scan.git
cd mcp-link-scan
pip install -r requirements.txt
# Set up environment variables
cp .env.example .env
# Edit .env with your settings
# Start Ollama (if not using Docker)
ollama serve
ollama pull llama3:latest
Development Mode with Docker
# Start in development mode (hot reload enabled)
docker-compose up -d
# View logs
docker-compose logs -f link-scan
# Code changes are automatically reloaded
Running Tests
# Run all tests
pytest
# Run with coverage
pytest --cov=src
# Run specific test file
pytest tests/test_link_scanner.py
Customizing Prompts
Edit src/prompts/__init__.py to customize LLM prompts:
# Video summarization prompt
VIDEO_SUMMARIZE_SYSTEM = """
Your custom system prompt here...
"""
# Text summarization prompt
TEXT_SUMMARIZE_SYSTEM = """
Your custom system prompt here...
"""
Configuring Whisper Model
Edit src/tools/media_handler.py:
# Change model size (tiny, base, small, medium, large)
_whisper_model = whisper.load_model("base") # Default: "base"
📋 Requirements
- Python 3.11+
- ffmpeg - Audio processing
- Ollama - LLM runtime (for summarization)
- yt-dlp - Video/audio download
- openai-whisper - Speech-to-text
- torch - PyTorch (for Whisper)
- aiohttp - Async HTTP client
- beautifulsoup4 - HTML parsing
- fastapi - HTTP server framework
- uvicorn - ASGI server
- mcp - Model Context Protocol SDK
🌐 Deployment
PlayMCP Registration
1. Deploy Server: deploy to cloud hosting (Render, Railway, Fly.io, AWS, GCP, etc.)
2. Get Server URL: for example, https://your-server.railway.app
3. Register in PlayMCP: use the URL https://your-server.railway.app/messages
Important: Server URL must be publicly accessible and support HTTPS for production use.
Using with MCP Clients
Amazon Q CLI:
{
"mcpServers": {
"link-scan": {
"command": "python",
"args": ["run_server.py"],
"cwd": "/path/to/mcp-link-scan"
}
}
}
Other MCP Clients:
{
"mcpServers": {
"link-scan": {
"url": "https://your-server.com/messages"
}
}
}
🤝 Contributing
We welcome contributions! Please follow these steps:
1. Fork the repository
2. Create a feature branch (`git checkout -b feature/amazing-feature`)
3. Make your changes
4. Add tests for new functionality
5. Ensure all tests pass (`pytest`)
6. Commit your changes (`git commit -m 'Add amazing feature'`)
7. Push to the branch (`git push origin feature/amazing-feature`)
8. Open a Pull Request
Development Workflow
# Install in development mode
pip install -e .
# Run tests
pytest
# Format code (if using formatters)
black src/ tests/
isort src/ tests/
📄 License
This project is licensed under the MIT License - see the LICENSE file for details.
🙏 Acknowledgments
- yt-dlp team for the excellent YouTube extraction library
- OpenAI Whisper team for the speech-to-text model
- Ollama team for the local LLM runtime
- MCP team for the Model Context Protocol specification
- Pydantic team for the data validation library
📞 Support
- Issues: GitHub Issues
- Discussions: GitHub Discussions
🗺️ Roadmap
- [ ] Batch processing for multiple links
- [ ] Caching layer for improved performance
- [ ] Export functionality (JSON, CSV, etc.)
- [ ] Advanced analytics (sentiment analysis, topic extraction)
- [ ] Support for more video platforms (TikTok, Vimeo, etc.)
- [ ] WebSocket support for real-time updates
- [ ] Integration examples with popular MCP clients
- [ ] Custom prompt templates via API
- [ ] Multi-language support for summaries
- [ ] Video thumbnail extraction
📝 Notes
- Audio downloads are temporarily stored and automatically cleaned up
- Whisper model is loaded once and reused for better performance
- Processing time depends on video length and Whisper model size
- YouTube videos are processed for first 7 seconds only to reduce processing time
- All text sources (title, description, subtitles, transcription) are combined for YouTube videos
- Summaries are limited to 3 sentences maximum
- For production, consider using GPU for faster Whisper conversion
- Ollama timeout is set to 5 minutes for tool calls
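The temporary-audio cleanup mentioned above can be sketched as a small wrapper that guarantees the file is removed even when processing fails. The function name is hypothetical, not from the repository:

```python
import os
import tempfile

def with_temp_audio(process):
    """Create a temp audio file, run `process(path)`, and always clean up."""
    fd, path = tempfile.mkstemp(suffix=".mp3")
    os.close(fd)  # close the low-level handle; downstream code reopens by path
    try:
        return process(path)
    finally:
        if os.path.exists(path):
            os.remove(path)  # cleanup happens on success and on error alike
```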