YouTube Transcript MCP Server
There are a few ways to approach building a server that fetches YouTube transcripts and makes them available to other services. Here's a breakdown of the concepts and potential implementations:

**Understanding the Requirements**

* **Transcript Source:** The official YouTube Data API exposes caption tracks through its `captions` resource, but downloading a track requires OAuth 2.0 authorization; an API key alone is not enough. The examples below do not use that API. They use the third-party `youtube-transcript-api` Python library, which retrieves the transcripts shown in the YouTube player and needs no API key.
* **Transcript Types:** A video can have several kinds of transcripts:
  * **Automatic Transcripts (ASR):** Generated by YouTube's automatic speech recognition. These are often less accurate.
  * **Community Contributions:** Transcripts contributed by viewers. YouTube stopped accepting new community contributions in 2020.
  * **Creator Transcripts:** Captions uploaded or written by the video creator.
* **Communication Protocol:** This defines how your server communicates with other services in your architecture. Common choices include:
  * **REST (HTTP):** Simple and widely understood. Good for basic operations.
  * **gRPC:** High-performance; uses Protocol Buffers for data serialization. Well suited to structured data and demanding performance requirements.
  * **Message Queues (e.g., RabbitMQ, Kafka):** Asynchronous communication. Useful for decoupling services and handling large volumes of requests.
* **Scalability and Reliability:** Consider how your server will handle a large number of requests and potential failures.
* **Error Handling:** Implement robust error handling to deal gracefully with API errors, network issues, and invalid requests.

**Implementation Options**

Here are a few implementation options, each using a different communication approach:

**1. REST (HTTP) based server (Python with Flask/FastAPI)**

* **Language:** Python (popular for API development)
* **Framework:** Flask (simple) or FastAPI (modern, asynchronous)

```python
# FastAPI example
from fastapi import FastAPI, HTTPException
from youtube_transcript_api import (
    NoTranscriptFound,
    TranscriptsDisabled,
    YouTubeTranscriptApi,
)

app = FastAPI()


# A plain `def` (not `async def`) makes FastAPI run this blocking call in a
# worker thread instead of stalling the event loop.
@app.get("/transcript/{video_id}")
def get_transcript(video_id: str, lang: str = 'en'):
    """
    Fetches the transcript for a YouTube video.

    Args:
        video_id: The YouTube video ID.
        lang: The desired language of the transcript (default: 'en').

    Returns:
        A list of transcript entries (text, start, duration).
    """
    try:
        return YouTubeTranscriptApi.get_transcript(video_id, languages=[lang])
    except (NoTranscriptFound, TranscriptsDisabled) as e:
        # No transcript is available for this video in the requested language
        raise HTTPException(status_code=404, detail=str(e))
    except Exception as e:
        raise HTTPException(status_code=500, detail=str(e))


if __name__ == "__main__":
    import uvicorn

    uvicorn.run(app, host="0.0.0.0", port=8000)
```

**Explanation:**

* **`YouTubeTranscriptApi`:** A third-party Python library that retrieves YouTube transcripts without an API key. These examples use its static `get_transcript` method from the 0.x releases; the 1.x releases deprecate it in favor of `YouTubeTranscriptApi().fetch(...)`, so install a 0.x release (`youtube-transcript-api<1.0`) to run the examples unchanged.
* **`FastAPI`:** A modern, high-performance web framework for building APIs.
* **`/transcript/{video_id}`:** An endpoint that accepts the YouTube video ID as a path parameter.
* **`lang`:** An optional query parameter that specifies the desired language.
* **Error Handling:** A missing transcript returns HTTP 404; any other failure returns HTTP 500 with the error message as the detail.
* **`uvicorn`:** An ASGI server that runs the FastAPI application.

**To use this:**

1. Install dependencies: `pip install fastapi uvicorn "youtube-transcript-api<1.0"`
2. Run the server: `python your_script_name.py`
3. Access the API: `http://localhost:8000/transcript/VIDEO_ID` (replace `VIDEO_ID` with the actual YouTube video ID). You can also specify the language: `http://localhost:8000/transcript/VIDEO_ID?lang=fr`

**2. gRPC based server (Python with gRPC)**

* **Language:** Python
* **Framework:** gRPC

**Steps:**

1. **Define the Protocol Buffer (.proto) file:** This defines the service and message structure. Save it as `youtube_transcript.proto`.

```protobuf
syntax = "proto3";

package youtube_transcript;

service TranscriptService {
  rpc GetTranscript (TranscriptRequest) returns (TranscriptResponse) {}
}

message TranscriptRequest {
  string video_id = 1;
  string language = 2;
}

message TranscriptResponse {
  repeated TranscriptEntry entries = 1;
}

message TranscriptEntry {
  string text = 1;
  double start = 2;
  double duration = 3;
}
```

2. **Generate gRPC code:** Use the `grpc_tools.protoc` compiler to generate Python code from the `.proto` file.

```bash
python -m grpc_tools.protoc -I. --python_out=. --grpc_python_out=. youtube_transcript.proto
```

3. **Implement the gRPC server:**

```python
# youtube_transcript_server.py
from concurrent import futures

import grpc
from youtube_transcript_api import YouTubeTranscriptApi

# Modules generated by grpc_tools.protoc in step 2
import youtube_transcript_pb2
import youtube_transcript_pb2_grpc


class TranscriptServicer(youtube_transcript_pb2_grpc.TranscriptServiceServicer):
    def GetTranscript(self, request, context):
        try:
            # proto3 strings default to '' when unset, so fall back to English
            transcript = YouTubeTranscriptApi.get_transcript(
                request.video_id, languages=[request.language or 'en']
            )
            # Convert the library's dicts into protobuf messages
            entries = [
                youtube_transcript_pb2.TranscriptEntry(
                    text=entry['text'],
                    start=entry['start'],
                    duration=entry['duration'],
                )
                for entry in transcript
            ]
            return youtube_transcript_pb2.TranscriptResponse(entries=entries)
        except Exception as e:
            # abort() raises, ending the RPC with this status code and message
            context.abort(grpc.StatusCode.INTERNAL, str(e))


def serve():
    # Handle up to 10 concurrent RPCs in a thread pool
    server = grpc.server(futures.ThreadPoolExecutor(max_workers=10))
    youtube_transcript_pb2_grpc.add_TranscriptServiceServicer_to_server(TranscriptServicer(), server)
    server.add_insecure_port('[::]:50051')
    server.start()
    server.wait_for_termination()


if __name__ == '__main__':
    serve()
```

4. **Implement the gRPC client (example):**

```python
# youtube_transcript_client.py
import grpc

# Modules generated by grpc_tools.protoc in step 2
import youtube_transcript_pb2
import youtube_transcript_pb2_grpc


def get_transcript(video_id, language):
    with grpc.insecure_channel('localhost:50051') as channel:
        stub = youtube_transcript_pb2_grpc.TranscriptServiceStub(channel)
        request = youtube_transcript_pb2.TranscriptRequest(video_id=video_id, language=language)
        try:
            response = stub.GetTranscript(request)
            # Print each entry as "[start - end] text"
            for entry in response.entries:
                print(f"[{entry.start:.2f} - {entry.start + entry.duration:.2f}] {entry.text}")
        except grpc.RpcError as e:
            print(f"Error: {e.details()}")


if __name__ == '__main__':
    get_transcript("VIDEO_ID", "en")  # Replace with a real video ID
```

**Explanation:**

* **`.proto` file:** Defines the service (`TranscriptService`) and the messages (`TranscriptRequest`, `TranscriptResponse`, `TranscriptEntry`).
* **`grpc_tools.protoc`:** Compiles the `.proto` file into Python code.
* **`TranscriptServicer`:** Implements the `GetTranscript` method, which retrieves the transcript using `YouTubeTranscriptApi` and converts it into the gRPC response format.
* **gRPC Client:** Connects to the server, sends a `TranscriptRequest`, and prints the received transcript entries.

**To use this:**

1. Install dependencies: `pip install grpcio grpcio-tools protobuf "youtube-transcript-api<1.0"`
2. Compile the `.proto` file: `python -m grpc_tools.protoc -I. --python_out=. --grpc_python_out=. youtube_transcript.proto`
3. Run the server: `python youtube_transcript_server.py`
4. Run the client: `python youtube_transcript_client.py`

**3. Message queue based server (Python with RabbitMQ/Kafka)**

* **Language:** Python
* **Message Queue:** RabbitMQ or Kafka

**Conceptual Outline (RabbitMQ Example):**

1. **Producer (Client):** Sends a message to the queue with the video ID and language.
2. **Consumer (Server):** Listens to the queue, receives the message, fetches the transcript, and then publishes the transcript to another queue or stores it in a database.

**RabbitMQ Example (Simplified):**

* **Producer (Client):**

```python
# producer.py
import json

import pika

connection = pika.BlockingConnection(pika.ConnectionParameters('localhost'))
channel = connection.channel()

# Declaring a queue is idempotent: it is created only if it doesn't exist yet
channel.queue_declare(queue='transcript_requests')

message = {'video_id': 'VIDEO_ID', 'language': 'en'}  # Replace with a real video ID
# The default exchange ('') routes the message to the queue named by routing_key
channel.basic_publish(exchange='', routing_key='transcript_requests', body=json.dumps(message))
print(" [x] Sent %r" % message)
connection.close()
```

* **Consumer (Server):**

```python
# consumer.py
import json

import pika
from youtube_transcript_api import YouTubeTranscriptApi

connection = pika.BlockingConnection(pika.ConnectionParameters('localhost'))
channel = connection.channel()
channel.queue_declare(queue='transcript_requests')


def callback(ch, method, properties, body):
    message = json.loads(body.decode('utf-8'))
    video_id = message['video_id']
    language = message['language']
    print(f" [x] Received request for video ID: {video_id}, language: {language}")
    try:
        transcript = YouTubeTranscriptApi.get_transcript(video_id, languages=[language])
        # Process the transcript (e.g., store in a database, publish to another queue)
        print(f" [x] Transcript fetched successfully for {video_id}")
        # Example: print the first few entries
        for entry in transcript[:5]:
            print(entry)
    except Exception as e:
        print(f" [x] Error fetching transcript: {e}")


channel.basic_consume(queue='transcript_requests', on_message_callback=callback, auto_ack=True)
print(' [*] Waiting for messages. To exit press CTRL+C')
channel.start_consuming()
```

**Explanation:**

* **RabbitMQ:** A message broker that enables asynchronous communication between services.
* **`transcript_requests` queue:** The queue to which the client sends transcript requests.
* **Producer:** Sends a JSON message containing the video ID and language to the queue.
* **Consumer:** Listens to the queue, retrieves each message, fetches the transcript using `YouTubeTranscriptApi`, and processes it.
* **`auto_ack=True`:** RabbitMQ treats each message as acknowledged as soon as it is delivered, before the callback runs, so a request is lost if processing fails. For more robust error handling, use manual acknowledgements: pass `auto_ack=False` and call `ch.basic_ack(delivery_tag=method.delivery_tag)` once the transcript has been processed.

**To use this:**

1. Install RabbitMQ: Follow the instructions on the RabbitMQ website.
2. Install dependencies: `pip install pika "youtube-transcript-api<1.0"`
3. Run the consumer: `python consumer.py`
4. Run the producer: `python producer.py`

**Key Considerations and Best Practices**

* **API Key Management:** If you call the official YouTube Data API, store your API key securely (e.g., in environment variables or a secrets manager). Never hardcode it. (`youtube-transcript-api` itself needs no key.)
* **Rate Limiting:** The YouTube Data API enforces a daily quota, and YouTube may throttle or block IP addresses that send many transcript requests. Implement retry logic with exponential backoff to handle rate-limit errors gracefully, and cache transcripts to reduce the number of requests.
* **Error Handling:** Catch API errors, network issues, and invalid requests, and log them for debugging.
* **Asynchronous Operations:** For gRPC and message queue implementations, use asynchronous operations (e.g., `asyncio` in Python) to improve performance and scalability.
* **Data Validation:** Validate the input (video ID, language) to prevent errors and security vulnerabilities.
* **Logging:** Use a logging library (e.g., Python's `logging` module) to record important events and errors.
* **Monitoring:** Monitor the performance of your server (e.g., request latency, error rates) to identify and address issues.
* **Security:** If your server handles sensitive data, implement appropriate security measures (e.g., authentication, authorization, encryption).
* **Scalability:** Design your server to handle a large number of requests. Consider using a load balancer and multiple instances of your server.
* **Deployment:** Choose a suitable deployment environment (e.g., a cloud platform, containerization with Docker).
* **Caching:** Cache frequently requested transcripts (e.g., in Redis or Memcached) to reduce the load on YouTube, and define a cache invalidation or expiry strategy.
* **Transcript Availability:** Not all YouTube videos have transcripts. Handle the case where none is found; the library raises `NoTranscriptFound` or `TranscriptsDisabled`.
* **Language Support:** `youtube-transcript-api` supports multiple languages, and its `languages` parameter takes a list of language codes in priority order. Allow users to specify the desired language.
* **Transcript Types:** Consider supporting both manually created and automatically generated transcripts. In the library's 0.x releases, `YouTubeTranscriptApi.list_transcripts(video_id)` returns an object with `find_manually_created_transcript(...)` and `find_generated_transcript(...)` methods for choosing between them.

**Choosing the Right Approach**

* **REST (HTTP):** Good for simple use cases and when you need a widely accessible API. Easy to implement and debug.
* **gRPC:** Best for high-performance communication between microservices. Requires more setup (a `.proto` schema and generated code) but uses compact binary serialization over HTTP/2.
* **Message Queue:** Ideal for asynchronous processing and decoupling services. Useful for handling large volumes of requests and for ensuring that requests are still processed when one service is temporarily unavailable.

The best approach depends on your specific requirements and the overall architecture of your services. Start with REST if you're unsure, then consider gRPC or message queues if you need better performance or scalability. In every implementation, prioritize security, error handling, and rate limiting.
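The rate-limiting and caching advice above can be combined in one small wrapper. The sketch below is a hypothetical helper (`fetch_with_retry_and_cache` is not part of `youtube-transcript-api` or any other library). It assumes the fetch function raises an exception on transient failures, uses an in-process dictionary with a time-to-live as a stand-in for Redis or Memcached, and retries with exponential backoff plus random jitter.

```python
import random
import time
from typing import Any, Callable, Dict, Tuple, Type

# Hypothetical in-process cache: (video_id, language) -> (expiry time, transcript).
# A stand-in for Redis/Memcached in a single-process server.
_cache: Dict[Tuple[str, str], Tuple[float, Any]] = {}


def fetch_with_retry_and_cache(
    fetch: Callable[[str, str], Any],
    video_id: str,
    language: str,
    max_attempts: int = 5,
    base_delay: float = 1.0,
    ttl_seconds: float = 3600.0,
    retry_on: Tuple[Type[BaseException], ...] = (Exception,),
    sleep: Callable[[float], None] = time.sleep,
) -> Any:
    """Return a fresh cached transcript, or call fetch() with exponential backoff."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")

    key = (video_id, language)
    cached = _cache.get(key)
    if cached is not None and cached[0] > time.monotonic():
        return cached[1]  # cache hit: no request to YouTube

    for attempt in range(max_attempts):
        try:
            result = fetch(video_id, language)
            break
        except retry_on:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the last error
            # Wait base, 2*base, 4*base, ... seconds, plus up to 50% random jitter
            delay = base_delay * (2 ** attempt)
            sleep(delay + random.uniform(0, delay / 2))

    _cache[key] = (time.monotonic() + ttl_seconds, result)
    return result
```

With a 0.x release of the library, you might pass `lambda v, l: YouTubeTranscriptApi.get_transcript(v, languages=[l])` as `fetch`. You would also narrow `retry_on` to the errors you consider transient, so that a missing transcript is not retried. The `sleep` parameter lets tests skip the real delays.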
PraveenKishore
README
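YouTube Transcript MCP Server
This project implements a Model Context Protocol (MCP) server that provides a tool for fetching YouTube video transcripts in various formats. By leveraging youtube-transcript-api, the server gives large language models (LLMs) secure and efficient access to YouTube transcripts.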
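Overview
The server exposes a single tool, fetch_youtube_transcript, which retrieves a YouTube video's transcript based on the provided video ID, language code, and desired format. This lets LLMs access and process YouTube video transcripts seamlessly.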
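Because the tool's video ID argument comes from a model or a user, a server would typically normalize and check it before calling YouTube. The helper below is a hypothetical sketch, not part of this project or of `youtube-transcript-api`. It assumes the widely observed 11-character video ID format (letters, digits, `-` and `_`) and accepts either a bare ID or a `youtube.com/watch?v=` or `youtu.be/` URL.

```python
import re
from urllib.parse import parse_qs, urlparse

# Assumption: video IDs are 11 characters of letters, digits, '-' and '_'
VIDEO_ID_RE = re.compile(r"[A-Za-z0-9_-]{11}")


def normalize_video_id(value: str) -> str:
    """Return a bare video ID from an ID or a common YouTube URL, else raise ValueError."""
    value = value.strip()
    if VIDEO_ID_RE.fullmatch(value):
        return value

    parsed = urlparse(value)
    host = (parsed.hostname or "").lower()
    candidate = None
    if host in ("youtu.be", "www.youtu.be"):
        # Short links carry the ID in the path: https://youtu.be/<id>
        candidate = parsed.path.lstrip("/")
    elif host == "youtube.com" or host.endswith(".youtube.com"):
        # Watch pages carry the ID in the "v" query parameter
        candidate = parse_qs(parsed.query).get("v", [None])[0]

    if candidate and VIDEO_ID_RE.fullmatch(candidate):
        return candidate
    raise ValueError(f"not a valid YouTube video ID or URL: {value!r}")
```

For example, `normalize_video_id("https://www.youtube.com/watch?v=dQw4w9WgXcQ&t=42")` returns `"dQw4w9WgXcQ"`, while an arbitrary string raises `ValueError` before any request is made.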
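Features
- YouTube transcript retrieval: Fetch YouTube video transcripts in multiple languages.
- Flexible output formats: Get transcripts as plain text or JSON.
- MCP integration: Designed to work seamlessly with MCP-compatible clients and tools.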
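Both output formats can be produced from the list of `text`/`start`/`duration` entries that `youtube-transcript-api` returns in its 0.x releases. What follows is a minimal sketch. The function name and the format strings `"text"` and `"json"` are assumptions, not this project's actual parameter values.

```python
import json
from typing import Dict, List, Union

# One transcript snippet: {"text": str, "start": float, "duration": float}
Entry = Dict[str, Union[str, float]]


def format_transcript(entries: List[Entry], output_format: str = "text") -> str:
    """Render transcript entries as plain text or as a JSON array."""
    if output_format == "text":
        # Plain text: join the snippets with single spaces and drop the timing
        return " ".join(str(e["text"]).strip() for e in entries)
    if output_format == "json":
        # JSON: keep each snippet's start time and duration
        return json.dumps(entries, ensure_ascii=False)
    raise ValueError(f"unsupported format: {output_format!r} (expected 'text' or 'json')")
```

Plain text is compact for an LLM's context window. JSON keeps the timestamps, which lets a model cite where in the video something was said.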
Configuration with an MCP client
{
  "mcpServers": {
    "youtube-transcripts": {
      "command": "uv",
      "args": [
        "--directory",
        "/ABSOLUTE/PATH/TO/PARENT/FOLDER/mcp-transcripts/src",
        "run",
        "server.py"
      ]
    }
  }
}
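Setup
This project uses uv for package and project management. To run it, follow the setup instructions below.
- Install uv if you haven't already; see the official uv installation instructions.
- Clone the repository.
git clone https://github.com/PraveenKishore/mcp-server-youtube.git
cd mcp-server-youtube
- Create a virtual environment and install the dependencies.
uv sync
- Activate the virtual environment.
source .venv/bin/activate  # Linux/macOS
.\.venv\Scripts\activate  # Windows
- You're all set!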
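Testing the MCP Server
1. Testing the MCP server on its own
To launch the MCP Inspector, run the following command:
mcp dev src/server.py
This starts the server and lets you view the list of exposed tools in the "Tools" tab. You can also call any of these tools with appropriate inputs.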
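Behind the Tools tab, the Inspector calls a tool the same way any MCP client does. After the protocol's `initialize` handshake, it sends a JSON-RPC 2.0 `tools/call` request. The sketch below only builds that message. The argument names `video_id`, `language`, and `format` are hypothetical, because the README does not list the tool's exact parameters; the Inspector shows the real input schema.

```python
import json
from typing import Any, Dict


def build_tool_call(request_id: int, tool_name: str, arguments: Dict[str, Any]) -> str:
    """Serialize an MCP tools/call request as a JSON-RPC 2.0 message."""
    message = {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": tool_name, "arguments": arguments},
    }
    return json.dumps(message)


# Hypothetical argument names; check the tool's input schema in the Inspector
print(build_tool_call(1, "fetch_youtube_transcript",
                      {"video_id": "VIDEO_ID", "language": "en", "format": "text"}))
```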
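2. Testing with Claude Desktop
To test with Claude Desktop, add the MCP configuration to the claude_desktop_config.json file.
See the linked documentation for more details. Once configured, you should be able to call the tool directly from the Claude Desktop interface.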
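If claude_desktop_config.json already lists other servers, merge the youtube-transcripts entry into the existing mcpServers object rather than replace the whole file. The sketch below uses only the standard library. `add_youtube_server` is a hypothetical helper. The file's location differs between operating systems, so the function takes the path as an argument instead of guessing it.

```python
import json
from pathlib import Path


def add_youtube_server(config_path: Path, src_dir: str) -> dict:
    """Add or update the youtube-transcripts entry, keeping any other configured servers."""
    config = json.loads(config_path.read_text()) if config_path.exists() else {}
    # Reuse the existing "mcpServers" object, or create it if missing
    servers = config.setdefault("mcpServers", {})
    servers["youtube-transcripts"] = {
        "command": "uv",
        "args": ["--directory", src_dir, "run", "server.py"],
    }
    config_path.write_text(json.dumps(config, indent=2))
    return config
```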
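3. Testing with mcp-client-cli
mcp-client-cli is a simple command-line tool for running LLM prompts that implements a Model Context Protocol (MCP) client.
To use it, add the MCP configuration to ~/.llm/config.json. For further setup instructions, see the official setup guide. Once configured, you will be able to call the tool from mcp-client-cli.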
Recommended Servers
Crypto Price & Market Analysis MCP Server
A Model Context Protocol (MCP) server that provides comprehensive cryptocurrency analysis using the CoinCap API. It offers real-time price data, market analysis, and historical trends through an easy-to-use interface.
MCP PubMed Search
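A server for searching PubMed (a free online database of biomedical and life sciences literature). I created it on the day MCP was released but was on vacation at the time. I saw that someone had published a similar server in your database, but decided to publish mine anyway.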
mixpanel
Connect to your Mixpanel data. Query events, retention, and funnel data from Mixpanel analytics.

Sequential Thinking MCP Server
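This server facilitates structured problem-solving by breaking complex problems into sequential steps, supporting revisions, and enabling multiple solution paths through full MCP integration.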

Nefino MCP Server
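Gives large language models access to news and information about renewable energy projects in Germany, with filtering by location, topic (solar, wind, hydrogen), and date range.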
Vectorize
An MCP server for Vectorize that enables advanced retrieval, private deep research, Anything-to-Markdown file extraction, and text chunking.
Mathematica Documentation MCP server
A server that provides access to Mathematica documentation through FastMCP, letting users retrieve function documentation and list package symbols from Wolfram Mathematica.
kb-mcp-server
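An MCP server designed to be portable, local, simple, and convenient, supporting semantic and graph-based retrieval over txtai "all in one" embeddings databases. Any txtai embeddings database in tar.gz format can be loaded.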
Research MCP Server
This server acts as an MCP server that interacts with Notion to retrieve and create survey data, and integrates with the Claude Desktop client to conduct and review surveys.

Cryo MCP Server
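An API server that implements the Model Context Protocol (MCP) for Cryo blockchain data extraction, allowing users to query Ethereum blockchain data through any MCP-compatible client.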