FastMCP Documentation & Web Scraping Server
Enables web page scraping via Jina reader API and searching FastMCP documentation using minsearch. Supports fetching markdown content from URLs and querying indexed documentation files.
03-mcp
MCP-Model Context Protocol
This repository contains the homework for the MCP (Model Context Protocol) assignment.
Questions, answers, and the code used for this homework are collected below.
Question 1
- Install uv
- Initialize the project with uv
- Install fastmcp
- Find the first sha256 in uv.lock
Answers / actions performed:
- uv installed and verified.
- Project initialized with uv init.
- fastmcp added with uv add fastmcp.
- First sha256 in uv.lock is on line 20 for annotated-types:
sdist = { url = "https://files.pythonhosted.org/packages/ee/67/.../annotated_types-0.7.0.tar.gz", hash = "sha256:aff07c09a53a08bc8cfccb9c85b05f1aa9a2a6f23728d790723543408344ce89", size = 16081, upload-time = "2024-05-20T21:33:25.928Z" }
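The lookup in the last step can also be done programmatically. The following is a minimal sketch, not the method used for the homework: the helper name `first_sha256` and the sample lock contents are hypothetical stand-ins, and a real `uv.lock` will contain different packages and hashes.

```python
import io
from typing import Optional, TextIO, Tuple


def first_sha256(lines: TextIO) -> Optional[Tuple[int, str]]:
    """Return (1-based line number, stripped line) for the first line containing 'sha256'."""
    for lineno, line in enumerate(lines, start=1):
        if "sha256" in line:
            return lineno, line.strip()
    return None  # no hash found in the file


# Hypothetical stand-in for a lock file; the package name, URL and hash are made up.
sample_lock = io.StringIO(
    'version = 1\n'
    '\n'
    '[[package]]\n'
    'name = "example-pkg"\n'
    'sdist = { url = "https://example.invalid/example.tar.gz", hash = "sha256:0000" }\n'
)

result = first_sha256(sample_lock)
print(result)
```

Against a real project the same helper could be called as `first_sha256(open("uv.lock", encoding="utf-8"))`, which would report the line number alongside the matching line.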
Question 2 — FastMCP Transport
I updated main.py using the FastMCP starter and ran the server. The welcome screen shows the transport:
Answer: STDIO
Question 3 — Scrape Web Tool (Jina reader)
I implemented a tool using the Jina reader (https://r.jina.ai/...) and requests, added test.py to test it against https://github.com/alexeygrigorev/minsearch.
Test result (character count): 31361 → closest provided option: 29184.
Question 4 — Integrate the Tool
I added count_data.py that uses the MCP Jina-reader tool to fetch https://datatalks.club/ and count occurrences of the whole word data (case-insensitive).
Script output: 10 → closest option: 61.
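To make the "whole word, case-insensitive" rule concrete, here is a small sketch using the same `\bdata\b` pattern on a made-up sentence. The helper name `count_whole_word` and the sample text are assumptions for illustration only; the sample is not taken from datatalks.club.

```python
import re


def count_whole_word(text: str, word: str) -> int:
    """Count case-insensitive occurrences of `word` delimited by word boundaries."""
    pattern = rf"\b{re.escape(word)}\b"
    return len(re.findall(pattern, text, flags=re.IGNORECASE))


# Made-up sample text.
sample = "Data engineering turns raw data into databases. DATA, metadata, data-driven."

# Matches: "Data", "data" (in "raw data"), "DATA", and "data" in "data-driven"
# (a hyphen is not a word character, so a boundary exists there).
# Skipped: "databases" and "metadata", where "data" is part of a longer word.
count = count_whole_word(sample, "data")
print(count)
```

Note that hyphenated forms such as "data-driven" are counted under this rule, which may or may not be what a given question intends.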
Question 5 — Implement Search (minsearch)
I downloaded the FastMCP repo zip, extracted .md and .mdx files, indexed them with minsearch, and searched for demo.
First file returned for query "demo": examples/testing_demo/README.md.
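The extraction step (read only `.md`/`.mdx` entries and drop the top-level folder from each path) can be tried without downloading anything. The sketch below builds a tiny archive in memory; the helper name `iter_markdown` and the archive contents are hypothetical, and the resulting list of dicts is only the shape that would then be passed to minsearch for indexing.

```python
import io
import zipfile
from typing import Iterator, Tuple


def iter_markdown(zip_bytes: bytes) -> Iterator[Tuple[str, str]]:
    """Yield (path without the first segment, text) for each .md/.mdx entry."""
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as archive:
        for name in archive.namelist():
            if not name.lower().endswith((".md", ".mdx")):
                continue  # skip non-markdown entries
            text = archive.read(name).decode("utf-8", errors="replace")
            # Strip the leading "<repo>-main/" style folder if there is one.
            _, _, rest = name.partition("/")
            yield (rest or name), text


# Build a small, made-up archive in memory instead of downloading one.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as archive:
    archive.writestr("fastmcp-main/README.md", "# Hello")
    archive.writestr("fastmcp-main/docs/guide.mdx", "Guide text")
    archive.writestr("fastmcp-main/src/app.py", "print('skip me')")

docs = [{"filename": f, "content": t} for f, t in iter_markdown(buffer.getvalue())]
print([d["filename"] for d in docs])
```

Stripping the first path segment keeps filenames stable regardless of the branch or archive name embedded in the zip's top-level folder.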
Question 6 — Search Tool (ungraded)
I added a search_docs MCP tool to main.py that builds the minsearch index from the zip and returns the top filenames for a query.
Files added / modified (full contents)
main.py
from fastmcp import FastMCP
import requests
import os
import zipfile
from minsearch import Index

mcp = FastMCP("Demo 🚀")


def fetch_markdown_impl(url: str) -> str:
    """Fetch a web page using Jina reader and return its markdown text.

    The Jina reader endpoint is `https://r.jina.ai/{original_url}`.
    The `url` argument may be a full URL (including scheme) or a hostname/path.
    """
    if not url.startswith("http://") and not url.startswith("https://"):
        url = "https://" + url
    target = "https://r.jina.ai/" + url
    resp = requests.get(target, timeout=15)
    resp.raise_for_status()
    return resp.text


@mcp.tool
def fetch_markdown(url: str) -> str:
    """Return markdown content of a web page via Jina reader."""
    return fetch_markdown_impl(url)


@mcp.tool
def add(a: int, b: int) -> int:
    """Add two numbers"""
    return a + b


# --- minsearch integration for documentation search ---
ZIP_URL = "https://github.com/jlowin/fastmcp/archive/refs/heads/main.zip"
ZIP_NAME = "fastmcp-main.zip"

# simple module-level cache for the built index
_INDEX_CACHE = None


def ensure_zip():
    if os.path.exists(ZIP_NAME):
        return
    resp = requests.get(ZIP_URL, stream=True, timeout=60)
    resp.raise_for_status()
    with open(ZIP_NAME, "wb") as f:
        for chunk in resp.iter_content(1024 * 64):
            if chunk:
                f.write(chunk)


def iter_md_files_from_zip(zip_path):
    with zipfile.ZipFile(zip_path, "r") as z:
        for name in z.namelist():
            lower = name.lower()
            if lower.endswith(".md") or lower.endswith(".mdx"):
                data = z.read(name)
                text = data.decode("utf-8", errors="replace")
                if "/" in name:
                    _, rest = name.split("/", 1)
                else:
                    rest = name
                yield rest, text


def build_index_from_zip():
    docs = []
    ensure_zip()
    for fname in os.listdir('.'):
        if fname.lower().endswith('.zip'):
            for filename, text in iter_md_files_from_zip(fname):
                docs.append({'content': text, 'filename': filename})
    idx = Index(text_fields=["content"], keyword_fields=["filename"])
    idx.fit(docs)
    return idx


def get_index():
    global _INDEX_CACHE
    if _INDEX_CACHE is None:
        _INDEX_CACHE = build_index_from_zip()
    return _INDEX_CACHE


def search_docs_impl(query: str, top_k: int = 5):
    idx = get_index()
    results = idx.search(query, num_results=top_k)
    return results


@mcp.tool
def search_docs(query: str) -> list:
    """Search the documentation index and return top filenames for `query`."""
    results = search_docs_impl(query, top_k=5)
    return [r.get('filename') for r in results]


if __name__ == "__main__":
    mcp.run()
test.py
from main import fetch_markdown_impl

if __name__ == "__main__":
    url = "https://github.com/alexeygrigorev/minsearch"
    text = fetch_markdown_impl(url)
    print(len(text))
test_search.py
from main import search_docs_impl

if __name__ == '__main__':
    res = search_docs_impl('demo', top_k=5)
    if not res:
        print('No results')
    else:
        print(res[0].get('filename'))
count_data.py
from main import fetch_markdown_impl
import re

if __name__ == "__main__":
    url = "https://datatalks.club/"
    text = fetch_markdown_impl(url)
    count = len(re.findall(r"\bdata\b", text, flags=re.IGNORECASE))
    print(count)
search.py
import os
import requests
import zipfile
import io
from minsearch import Index

ZIP_URL = "https://github.com/jlowin/fastmcp/archive/refs/heads/main.zip"
ZIP_NAME = "fastmcp-main.zip"


def ensure_zip():
    if os.path.exists(ZIP_NAME):
        print(f"Zip already exists: {ZIP_NAME}")
        return
    print(f"Downloading {ZIP_URL} -> {ZIP_NAME}")
    resp = requests.get(ZIP_URL, stream=True, timeout=60)
    resp.raise_for_status()
    with open(ZIP_NAME, "wb") as f:
        for chunk in resp.iter_content(1024 * 64):
            if chunk:
                f.write(chunk)


def iter_md_files_from_zip(zip_path):
    with zipfile.ZipFile(zip_path, "r") as z:
        for name in z.namelist():
            lower = name.lower()
            if lower.endswith(".md") or lower.endswith(".mdx"):
                # read file
                data = z.read(name)
                text = data.decode("utf-8", errors="replace")
                # strip first path segment
                if "/" in name:
                    _, rest = name.split("/", 1)
                else:
                    rest = name
                yield rest, text


def build_index(docs):
    # docs: list of {'content':..., 'filename':...}
    idx = Index(text_fields=["content"], keyword_fields=["filename"])
    idx.fit(docs)
    return idx


def main():
    ensure_zip()
    docs = []
    # iterate all zip files in cwd
    for fname in os.listdir('.'):
        if fname.lower().endswith('.zip'):
            for filename, text in iter_md_files_from_zip(fname):
                docs.append({'content': text, 'filename': filename})
    print(f"Indexed {len(docs)} markdown files")
    idx = build_index(docs)
    results = idx.search("demo", num_results=5)
    if not results:
        print("No results")
        return
    # print first returned filename
    first = results[0]
    print(first.get('filename'))


if __name__ == '__main__':
    main()
Git & Repository
- All changes have been committed and pushed to the current repository's main branch.
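Recommended Servers
Baidu Map
The core Baidu Maps APIs are now fully compatible with the MCP protocol, making Baidu Maps the first map service provider in China to support MCP.
Playwright MCP Server
A Model Context Protocol server that lets large language models interact with web pages through structured accessibility snapshots, without requiring vision models or screenshots.
Magic Component Platform (MCP)
An AI-powered tool that generates modern UI components from natural-language descriptions and integrates with popular integrated development environments (IDEs) to streamline UI development.
Audiense Insights MCP Server
Enables interaction with an Audiense Insights account through the Model Context Protocol, facilitating the extraction and analysis of marketing insights and audience data, including demographics, behavior, and influencer engagement.
VeyraX
A single MCP tool that connects all your favorite tools: Gmail, Calendar, and more than 40 others.
graphlit-mcp-server
A Model Context Protocol (MCP) server that integrates MCP clients with the Graphlit service. In addition to web crawling, it can ingest any content (from Slack to Gmail to podcast feeds) into a Graphlit project and then retrieve relevant content from an MCP client.
Kagi MCP Server
An MCP server that integrates Kagi search with Claude AI, enabling Claude to perform real-time web searches when answering questions that need up-to-date information.
e2b-mcp-server
Run code through e2b using MCP.
Neon MCP Server
An MCP server for interacting with the Neon management API and databases.
Exa MCP Server
A Model Context Protocol (MCP) server that lets AI assistants such as Claude search the web using the Exa AI search API. This setup lets AI models obtain real-time web information in a safe and controlled way.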