VidLens

Enables AI agents to search, analyze, and extract insights from YouTube videos including transcripts, visual frames, and benchmarks without requiring API keys. Supports semantic search across playlists, sentiment analysis, and visual content indexing with automatic fallback chains for reliable access.

<p align="center"> <img src="https://raw.githubusercontent.com/thatsrajan/vidlens-mcp/main/assets/readme-banner.png" alt="VidLens — YouTube as a queryable database for AI agents" width="800" /> </p>

<p align="center"> <a href="https://www.npmjs.com/package/vidlens-mcp"><img src="https://img.shields.io/npm/v/vidlens-mcp?style=flat-square&color=red" alt="npm" /></a> <a href="https://github.com/thatsrajan/vidlens-mcp/blob/main/LICENSE"><img src="https://img.shields.io/badge/license-MIT-blue?style=flat-square" alt="License" /></a> <a href="https://modelcontextprotocol.io/"><img src="https://img.shields.io/badge/MCP-compatible-green?style=flat-square" alt="MCP" /></a> <img src="https://img.shields.io/badge/tools-41-orange?style=flat-square" alt="41 tools" /> <img src="https://img.shields.io/badge/zero--config-✓-brightgreen?style=flat-square" alt="Zero Config" /> </p>

<p align="center"> <a href="https://youtu.be/0BqrMKWIXkg"> <img src="https://img.shields.io/badge/▶%20Watch%20the%2060s%20demo-FF0000?style=for-the-badge&logo=youtube&logoColor=white" alt="Watch the 60s demo" /> </a> </p>

<p align="center"> <em>Most tools can read what was said in a video. VidLens can see what was shown.</em> </p>


🔍 What is VidLens?

Stop watching 10 videos to answer one question. VidLens searches YouTube, reads the transcripts, and synthesizes what creators actually said — across multiple videos, with timestamps, benchmark charts, and sources.

VidLens is a Model Context Protocol server that gives AI agents deep, reliable access to YouTube. Not just transcripts — full intelligence: search, analysis, visual search, and auto-generated comparison charts.

No API key required to start. Every tool that touches YouTube data has a three-tier fallback chain (YouTube API → yt-dlp → page extraction), so nothing breaks when quota runs out or keys aren't configured.

Try it — paste any of these into Claude:

"I'm thinking about buying the M5 Max MacBook Pro. Search YouTube for top tech reviewers and tell me what they're saying. Is it worth the upgrade from M3/M4?"

VidLens finds 10+ reviews, reads the transcripts, extracts benchmark scores, and presents comparison charts — all from one prompt.

"I want to understand how AI agents work. Search YouTube for the best videos for a beginner and summarize what I need to know."

Discovers videos across creators, ranks by learning value, and prepares transcripts for follow-up questions.

"Search YouTube for reviews comparing the iPhone 17 Pro vs Samsung S26 Ultra. What do reviewers agree on? Where do they disagree?"

Searches, reads transcripts from multiple reviewers, and synthesizes consensus vs disagreements with sources.


🎯 Core Capabilities

🔍 Explore — One Prompt, Full Pipeline

Ask a question about YouTube and VidLens does the rest: searches, ranks by creator match and freshness, reads transcripts, extracts benchmark data, and presents comparison charts automatically. Works for product research, learning, competitive analysis — anything on YouTube.

🔎 Semantic Search Across Playlists

Import entire playlists or video sets, index every transcript with Gemini embeddings, and search across hundreds of hours of content by meaning — not just keywords.

👁️ Visual Search — See What's In Videos

Extract keyframes, describe them with Gemini Vision, run OCR on slides and whiteboards, and search by what you see — not just what's said.

📊 Intelligence Layer — Not Just Data

Sentiment analysis, niche trend discovery, content gap detection, hook pattern analysis, upload timing recommendations. The LLM does the thinking — VidLens gives it the right data.

⚡ Zero Config, Always Works

No API key needed to start. Three-tier fallback chain on every tool. Nothing breaks when quota runs out. Keys are optional power-ups.

🎬 Full Media Pipeline

Download videos/audio/thumbnails. Extract keyframes. Index comments for semantic search. Build a local knowledge base from any YouTube content.


⚡ Why VidLens?

<table>
<tr><th></th><th>VidLens</th><th>Other YouTube MCP servers</th></tr>
<tr><td>🔑 <strong>Setup</strong></td><td>✅ Works immediately - no keys needed</td><td>❌ Most require YouTube API key upfront</td></tr>
<tr><td>🛡️ <strong>Reliability</strong></td><td>✅ Three-tier fallback on every tool</td><td>❌ Single point of failure - API down = broken</td></tr>
<tr><td>🧠 <strong>Intelligence</strong></td><td>✅ Sentiment, trends, content gaps, hooks</td><td>❌ Raw data dumps - you do the analysis</td></tr>
<tr><td>📦 <strong>Token efficiency</strong></td><td>✅ 75-87% smaller responses</td><td>❌ Verbose JSON with thumbnails, etags, junk</td></tr>
<tr><td>🔬 <strong>Depth</strong></td><td>✅ 41 tools across 10 modules</td><td>⚠️ 1-5 tools, mostly transcripts only</td></tr>
<tr><td>🖼️ <strong>Visual evidence</strong></td><td>✅ Returns actual frame paths + timestamps, not just text hits</td><td>⚠️ Usually transcript-only or raw frame dumps</td></tr>
<tr><td>⚖️ <strong>Trademark</strong></td><td>✅ Compliant naming</td><td>⚠️ Most violate YouTube trademark</td></tr>
</table>


🚀 Quick Start

1. Install

npx vidlens-mcp setup

This auto-detects your MCP clients (Claude Desktop, Claude Code), downloads yt-dlp if needed, and configures everything. No manual setup required.

2. Or configure manually

Claude Desktop — add to ~/Library/Application Support/Claude/claude_desktop_config.json:

{
  "mcpServers": {
    "vidlens-mcp": {
      "command": "npx",
      "args": ["-y", "vidlens-mcp", "serve"]
    }
  }
}

Claude Code — add to ~/.claude/settings.json:

{
  "mcpServers": {
    "vidlens-mcp": {
      "command": "npx",
      "args": ["-y", "vidlens-mcp", "serve"]
    }
  }
}
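
If you would rather script the manual step than hand-edit JSON, the sketch below merges the same entry into an already-parsed config object without touching other servers. The helper name `withVidLens` and the `McpConfig` type are illustrative and not part of VidLens; reading and writing the actual config file is left to the caller.

```typescript
// Illustrative helper (not part of VidLens): merge the server entry shown above
// into an already-parsed MCP client config, keeping any other servers intact.
type McpServer = { command: string; args: string[] };
type McpConfig = { mcpServers?: Record<string, McpServer>; [key: string]: unknown };

const VIDLENS_ENTRY: McpServer = {
  command: "npx",
  args: ["-y", "vidlens-mcp", "serve"],
};

function withVidLens(config: McpConfig): McpConfig {
  return {
    ...config,
    mcpServers: { ...(config.mcpServers ?? {}), "vidlens-mcp": VIDLENS_ENTRY },
  };
}

// Example: an existing config that already lists one unrelated server.
const merged = withVidLens({
  mcpServers: { other: { command: "node", args: ["other.js"] } },
});
console.log(JSON.stringify(merged, null, 2));
```

Returning a new object instead of mutating the input makes it easy to diff the result against the original before writing it back.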

3. Restart your MCP client

Fully quit and reopen Claude Desktop (⌘Q). Claude Code picks up changes automatically.

4. Try it

Start with "Search YouTube" to activate VidLens:

"Search YouTube for the top M5 Max MacBook Pro reviews and tell me if it's worth upgrading from M4."

"Search YouTube for the best videos about agentic AI for a beginner."

"Import this playlist and search across all videos for mentions of machine learning."

"Search this video's frames for the benchmark comparison chart."

"What's trending in the AI coding niche right now?"


🧰 Tools - 41 across 10 modules

🔍 Explore - YouTube Discovery & Research

The front door — one prompt, full pipeline

| Tool | What it does |
| --- | --- |
| `exploreYouTube` | Intent-aware search with multi-query ranking, parallel enrichment, transcript summaries, structured benchmark data, and background indexing. One call replaces 5-8 individual tool calls. |
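
For readers wiring up their own MCP client, here is a rough sketch of what a call to this tool looks like on the wire. The `tools/call` envelope follows the Model Context Protocol (JSON-RPC 2.0); the `query` argument name is an assumption, so check the input schema the server returns from `tools/list` for the real parameters.

```typescript
// Shape of an MCP `tools/call` request (JSON-RPC 2.0).
interface ToolCallRequest {
  jsonrpc: "2.0";
  id: number;
  method: "tools/call";
  params: { name: string; arguments: Record<string, unknown> };
}

function buildToolCall(
  id: number,
  name: string,
  args: Record<string, unknown>,
): ToolCallRequest {
  return { jsonrpc: "2.0", id, method: "tools/call", params: { name, arguments: args } };
}

// `query` is a hypothetical argument name used only for illustration.
const request = buildToolCall(1, "exploreYouTube", {
  query: "M5 Max MacBook Pro reviews",
});
console.log(JSON.stringify(request));
```

In normal use Claude builds these requests for you; the sketch only matters if you are testing the server from a custom client.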

📺 Core - Video & Channel Intelligence

Always available, no API key needed

| Tool | What it does |
| --- | --- |
| `findVideos` | Search YouTube by query with metadata |
| `inspectVideo` | Deep metadata - tags, engagement, language, category |
| `inspectChannel` | Channel stats, description, recent uploads |
| `listChannelCatalog` | Browse a channel's full video library |
| `readTranscript` | Full transcript with timestamps and chapters |
| `readComments` | Top comments with likes and engagement |
| `expandPlaylist` | List all videos in any playlist |

🔎 Knowledge Base - Semantic Search

Index transcripts and search across them with natural language

| Tool | What it does |
| --- | --- |
| `importPlaylist` | Index an entire playlist's transcripts |
| `importVideos` | Index specific videos by URL/ID |
| `searchTranscripts` | Natural language search across indexed content |
| `listCollections` | Browse your indexed collections |
| `setActiveCollection` | Scope searches to one collection |
| `clearActiveCollection` | Search across all collections |
| `removeCollection` | Delete a collection and its index |
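
To make "search by meaning" concrete, here is a minimal sketch of embedding-based retrieval over transcript chunks. It is not VidLens's implementation: the `Chunk` shape and function names are invented for illustration, and toy 3-dimensional vectors stand in for real embeddings.

```typescript
// Hypothetical chunk shape: a transcript span plus its embedding vector.
type Chunk = { videoId: string; startSec: number; text: string; embedding: number[] };

// Cosine similarity; assumes both vectors have the same length.
function cosine(a: number[], b: number[]): number {
  let dot = 0;
  let normA = 0;
  let normB = 0;
  a.forEach((x, i) => {
    const y = b[i] ?? 0;
    dot += x * y;
    normA += x * x;
    normB += y * y;
  });
  return normA === 0 || normB === 0 ? 0 : dot / Math.sqrt(normA * normB);
}

// Rank chunks by similarity to the query embedding and keep the best k.
function topK(query: number[], chunks: Chunk[], k: number): Chunk[] {
  return chunks
    .map((chunk) => ({ chunk, score: cosine(query, chunk.embedding) }))
    .sort((x, y) => y.score - x.score)
    .slice(0, k)
    .map((x) => x.chunk);
}

const chunks: Chunk[] = [
  { videoId: "a", startSec: 12, text: "battery life test", embedding: [1, 0, 0] },
  { videoId: "b", startSec: 95, text: "thermal throttling", embedding: [0, 1, 0] },
  { videoId: "c", startSec: 40, text: "battery drain", embedding: [0.9, 0.1, 0] },
];
const hits = topK([1, 0, 0], chunks, 2);
console.log(hits.map((h) => `${h.videoId}@${h.startSec}s`));
```

Because ranking is driven by vector similarity rather than shared words, "battery drain" can surface for a query about "battery life" even though the phrases differ.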

💬 Sentiment & Analysis

Understand what audiences think and feel

| Tool | What it does |
| --- | --- |
| `measureAudienceSentiment` | Comment sentiment with themes and risk signals |
| `analyzeVideoSet` | Compare performance across multiple videos |
| `analyzePlaylist` | Playlist-level engagement analytics |
| `buildVideoDossier` | Complete single-video deep analysis |

🎯 Creator Intelligence

Insights for content strategy

| Tool | What it does |
| --- | --- |
| `scoreHookPatterns` | Analyze what makes video openings work |
| `researchTagsAndTitles` | Tag and title optimization insights |
| `compareShortsVsLong` | Short-form vs long-form performance |
| `recommendUploadWindows` | Best times to publish for engagement |

📈 Discovery & Trends

Find what's working in any niche

| Tool | What it does |
| --- | --- |
| `discoverNicheTrends` | Momentum, saturation, content gaps in any topic |
| `exploreNicheCompetitors` | Channel landscape and top performers |

🎬 Media Assets

Download and manage video files locally

| Tool | What it does |
| --- | --- |
| `downloadAsset` | Download video, audio, or thumbnails |
| `listMediaAssets` | Browse stored media files |
| `removeMediaAsset` | Clean up downloaded assets |
| `extractKeyframes` | Extract key frames from videos |
| `mediaStoreHealth` | Storage usage and diagnostics |

🖼️ Visual Search

Three-layer visual intelligence. Not transcript reuse.

| Tool | What it does |
| --- | --- |
| `indexVisualContent` | Extract frames, run Apple Vision OCR + feature prints, Gemini frame descriptions, and Gemini semantic embeddings |
| `searchVisualContent` | Search visual frames using semantic embeddings + lexical matching. Returns actual image paths + timestamps as evidence |
| `findSimilarFrames` | Image-to-image frame similarity using Apple Vision feature prints |

Three layers, all real:

  1. Apple Vision feature prints — image-to-image similarity (find frames that look alike)
  2. Gemini 2.5 Flash frame descriptions — natural language scene understanding per frame
  3. Gemini semantic embeddings — 768-dim embedding retrieval over OCR + description text for true text→visual search

What you always get back: frame path on disk, timestamp, source video URL/title, match explanation, OCR text, visual description.

What is NOT happening: no transcript embeddings are reused for visual search. This is a separate visual index.
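
As a rough illustration of how "semantic embeddings + lexical matching" can combine, the sketch below types a search hit and blends two scores. Everything here is assumed for the example: the `FrameHit` field names merely mirror the list above, and the 0.7 / 0.3 weighting is arbitrary, not the ranking VidLens actually uses.

```typescript
// Hypothetical field names mirroring the "what you always get back" list.
interface FrameHit {
  framePath: string;
  timestampSec: number;
  videoUrl: string;
  videoTitle: string;
  matchExplanation: string;
  ocrText: string;
  description: string;
}

// Fraction of query terms found in the frame's OCR + description text.
function lexicalScore(
  query: string,
  frame: Pick<FrameHit, "ocrText" | "description">,
): number {
  const terms = query.toLowerCase().split(/\s+/).filter(Boolean);
  if (terms.length === 0) return 0;
  const haystack = `${frame.ocrText} ${frame.description}`.toLowerCase();
  const found = terms.filter((term) => haystack.includes(term)).length;
  return found / terms.length;
}

// Blend an embedding similarity with the lexical score.
// The default weight is an arbitrary assumption for illustration.
function hybridScore(semantic: number, lexical: number, semanticWeight = 0.7): number {
  return semanticWeight * semantic + (1 - semanticWeight) * lexical;
}

const frame = {
  ocrText: "Geekbench 6 multi-core",
  description: "bar chart comparing laptop benchmark scores",
};
const lexical = lexicalScore("benchmark chart", frame);
console.log(lexical, hybridScore(0.8, lexical));
```

The lexical term helps exact strings read by OCR (model numbers, benchmark names) rank well even when the embedding similarity is only moderate.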

💭 Comment Knowledge Base

Index and semantically search YouTube comments

| Tool | What it does |
| --- | --- |
| `importComments` | Index a video's comments for search |
| `searchComments` | Natural language search over comment corpus |
| `listCommentCollections` | Browse comment collections |
| `setActiveCommentCollection` | Scope comment searches |
| `clearActiveCommentCollection` | Search all comment collections |
| `removeCommentCollection` | Delete a comment collection |

🏥 Diagnostics

Health checks and pre-flight validation

| Tool | What it does |
| --- | --- |
| `checkSystemHealth` | Full system diagnostic report |
| `checkImportReadiness` | Validate before importing content |

🔑 API Keys (Optional)

VidLens works without any API keys. Add them to unlock more capabilities:

| Key | What it unlocks | Free? | How to get it |
| --- | --- | --- | --- |
| `YOUTUBE_API_KEY` | Better metadata, comment API, search via YouTube API | ✅ Free tier (10,000 units/day) | Google Cloud Console → APIs → Enable YouTube Data API v3 → Credentials → Create API Key |
| `GEMINI_API_KEY` | Higher-quality embeddings for semantic search (768d vs 384d) | ✅ Free tier | Google AI Studio → Get API Key |

⚠️ These are separate keys from separate Google services. A Gemini key will NOT work for YouTube API calls and vice versa. Create them independently.

# Configure via setup wizard
npx vidlens-mcp setup --youtube-api-key YOUR_YOUTUBE_KEY --gemini-api-key YOUR_GEMINI_KEY

# Or via environment variables
export YOUTUBE_API_KEY=your_youtube_key
export GEMINI_API_KEY=your_gemini_key
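
The "optional power-ups" idea can be sketched as a small capability check over the environment. This is an illustration, not VidLens's code: `detectCapabilities` and the `Capabilities` shape are made up, and the 768 vs 384 mapping is one reading of the table above (Gemini key present vs absent).

```typescript
type Env = Record<string, string | undefined>;

interface Capabilities {
  youtubeApi: boolean; // YouTube Data API tier available
  embeddingDims: 768 | 384; // assumed: 768 with a Gemini key, 384 without
}

function detectCapabilities(env: Env): Capabilities {
  // Treat unset and blank values the same way.
  const has = (name: string): boolean => (env[name] ?? "").trim().length > 0;
  return {
    youtubeApi: has("YOUTUBE_API_KEY"),
    embeddingDims: has("GEMINI_API_KEY") ? 768 : 384,
  };
}

// With only a Gemini key set, the YouTube API tier stays off.
const caps = detectCapabilities({ GEMINI_API_KEY: "your_gemini_key" });
console.log(caps);
```

In a Node.js process you would pass `process.env` as the argument.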

💻 CLI

npx vidlens-mcp               # Start MCP server (stdio)
npx vidlens-mcp serve         # Start MCP server (explicit)
npx vidlens-mcp setup         # Auto-configure Claude Desktop + Claude Code
npx vidlens-mcp doctor        # Run diagnostics
npx vidlens-mcp version       # Print version
npx vidlens-mcp help          # Usage guide

Doctor - diagnose issues

npx vidlens-mcp doctor --no-live

Checks: Node.js version, yt-dlp availability, API key validation, data directory health, MCP client registration (Claude Desktop, Claude Code).


📱 Works Everywhere — Desktop, Cowork, Phone

VidLens works across the full Claude ecosystem. Set it up once, use it everywhere.

Claude Desktop — Chat

The classic experience. Ask a question, get charts and analysis inline. Best for interactive research sessions.

Claude Desktop — Cowork Projects (March 2026)

Create a persistent research project with VidLens connected. Claude remembers context across sessions — last week's competitive research informs this week's analysis. Set up scheduled tasks that run automatically:

"Every Monday, search YouTube for new AI agent framework videos and compare to last week's findings."

Claude Dispatch — From Your Phone (March 2026)

Trigger any VidLens research from the Claude mobile app. Ask from your phone, Claude Desktop runs the tools locally, results come back to your pocket:

"Run my competitive research project — what new M5 Max content dropped this weekend?"

Claude Code — Remote Control

Start a Claude Code session with claude --remote-control, then continue from any browser or your phone at claude.ai/code. Full tool access, full context.

Note: Your Mac must be awake with Claude Desktop open for Cowork, Dispatch, and scheduled tasks to execute.


🏗️ Architecture

System Overview

<p align="center"> <img src="https://raw.githubusercontent.com/thatsrajan/vidlens-mcp/main/assets/arch-system-overview.png" alt="VidLens System Overview" width="800" /> </p>

How the Fallback Chain Works

Every tool that touches YouTube data uses the same resilience pattern:

<p align="center"> <img src="https://raw.githubusercontent.com/thatsrajan/vidlens-mcp/main/assets/arch-fallback-chain.png" alt="VidLens Fallback Chain" width="800" /> </p>

Every response includes a provenance field telling you exactly which tier served the data and whether anything was partial. No silent degradation — you always know what happened.
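
The pattern can be sketched in a few lines: try each tier in order, remember which ones failed, and attach that record to the result. This is a simplified illustration under stated assumptions, not the actual VidLens source; the tier labels and the `provenance` field names (`tier`, `fallbackFrom`) are hypothetical.

```typescript
type Tier = "youtube_api" | "yt_dlp" | "page_extract";
type Fetcher<T> = () => Promise<T>;

interface Result<T> {
  data: T;
  // Hypothetical provenance shape: which tier answered, which ones were skipped.
  provenance: { tier: Tier; fallbackFrom: Tier[] };
}

async function withFallback<T>(tiers: Array<[Tier, Fetcher<T>]>): Promise<Result<T>> {
  const failed: Tier[] = [];
  for (const [tier, run] of tiers) {
    try {
      const data = await run();
      return { data, provenance: { tier, fallbackFrom: failed } };
    } catch {
      failed.push(tier); // record the miss, then try the next tier
    }
  }
  throw new Error(`All tiers failed: ${failed.join(", ")}`);
}

// Simulated tiers: the API tier fails (for example, quota exhausted), yt-dlp succeeds.
const demo = withFallback<string>([
  ["youtube_api", async () => { throw new Error("quota exceeded"); }],
  ["yt_dlp", async () => "title from yt-dlp"],
  ["page_extract", async () => "title from page extraction"],
]);
demo.then((result) => console.log(result.provenance.tier, result.provenance.fallbackFrom));
```

Recording the failed tiers in the result, rather than only logging them, is what lets a caller see degradation instead of having it happen silently.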

Visual Search Pipeline

Visual search is not transcript reuse. It's a dedicated three-layer index:

<p align="center"> <img src="https://raw.githubusercontent.com/thatsrajan/vidlens-mcp/main/assets/arch-visual-pipeline.png" alt="VidLens Visual Search Pipeline" width="800" /> </p>

Three layers, all real:

  1. Apple Vision feature prints — image-to-image similarity (find frames that look alike)
  2. Gemini Vision frame descriptions — natural language scene understanding per frame
  3. Gemini semantic embeddings — 768-dim retrieval over OCR + description text

Data Storage

Everything lives in a single directory. No external databases, no Docker, no infrastructure.

<p align="center"> <img src="https://raw.githubusercontent.com/thatsrajan/vidlens-mcp/main/assets/arch-data-storage.png" alt="VidLens Data Storage" width="600" /> </p>

One directory. Portable. Back it up by copying. Delete it to start fresh.


📋 Requirements

| Requirement | Status | Notes |
| --- | --- | --- |
| Node.js ≥ 22 | Required | Uses `node:sqlite`; run `node --version` to check |
| yt-dlp | Auto-installed | Downloaded automatically during `npx vidlens-mcp setup` |
| ffmpeg | Optional | Needed for frame extraction and visual indexing |
| YouTube API key | Optional | Unlocks comments, better metadata |
| Gemini API key | Optional | Upgrades transcript embeddings and frame descriptions for visual search |
| macOS Apple Vision | Automatic on macOS | Powers native OCR and image similarity for visual search |
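
If you want to check the Node.js requirement from a script of your own, a minimal sketch looks like this. The function name is illustrative, and this is not how `npx vidlens-mcp doctor` is implemented.

```typescript
// Returns true when a Node.js version string (with or without a leading "v")
// has a major version at or above the minimum.
function meetsNodeRequirement(version: string, minMajor = 22): boolean {
  const majorPart = version.replace(/^v/, "").split(".")[0] ?? "";
  const major = Number.parseInt(majorPart, 10);
  return Number.isInteger(major) && major >= minMajor;
}

console.log(meetsNodeRequirement("v22.3.0"), meetsNodeRequirement("20.11.1"));
```

In a real script you would pass `process.version`, which Node.js reports in the `v22.3.0` form.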

🔧 Troubleshooting

"Tool not found" in Claude Desktop

Fully quit Claude Desktop (⌘Q, not just close window) and reopen. MCP servers only load on startup.

"YOUTUBE_API_KEY not configured" warning

This is informational, not an error. VidLens works without it. Add a key only if you need comments/sentiment features.

"API_KEY_SERVICE_BLOCKED" error

Your key's API restrictions block the YouTube Data API v3. In Google Cloud Console, either add YouTube Data API v3 to the key's allowed APIs, remove the API restriction from the existing key, or create a new unrestricted key.

Gemini key doesn't work for YouTube API

These are separate services. You need a YouTube API key from Google Cloud Console AND a Gemini key from Google AI Studio. They are not interchangeable.

Build errors

npx vidlens-mcp doctor     # Run diagnostics
npx vidlens-mcp doctor --no-live  # Skip network checks

📄 License

MIT


<p align="center"> <a href="https://github.com/thatsrajan/vidlens-mcp">GitHub</a> · <a href="https://www.npmjs.com/package/vidlens-mcp">npm</a> · <a href="https://modelcontextprotocol.io/">Model Context Protocol</a> </p>
