MCP Web Scraper Server

An advanced web search and scraping server that enables AI models to perform targeted DuckDuckGo searches and extract clean content, tables, and metadata from webpages. It provides specialized tools for news discovery, link extraction, and comprehensive search-and-scrape workflows.

Category

Access Servers

README

# 🚀 MCP Web Scraper Server

A production-ready MCP (Model Context Protocol) server for advanced web scraping and search, easily deployable on Railway.

## ✨ Features

- 🔍 **Advanced Web Search** - Search anything on the web using DuckDuckGo
- 🤖 **Smart Search** - Intelligent search with quick/standard/comprehensive modes
- 📰 **News Search** - Dedicated news article search with dates and sources
- 🎯 **Search & Scrape** - Automatically search and extract full content from results
- 📄 **Article Extraction** - Clean article content extraction (removes ads/navigation)
- 🔗 **Link Extraction** - Extract all links with regex filtering
- 📊 **Table Extraction** - Extract table data from webpages
- 📝 **Metadata Extraction** - Get page metadata and Open Graph tags
- 🚀 Easy Railway deployment
- 💪 Production-ready

## 🛠️ Tools Available

### 🔍 Search Tools

- `web_search` - Search the web for anything (just give a query!)
- `smart_search` - Intelligent search with modes (quick/standard/comprehensive)
- `search_and_scrape` - Search + automatically scrape full content
- `news_search` - Search specifically for news articles

### 📄 Scraping Tools

- `scrape_html` - Scrape HTML content with optional CSS selectors
- `extract_links` - Extract all links with optional filtering
- `extract_metadata` - Get page metadata and Open Graph tags
- `scrape_table` - Extract table data from webpages
- `extract_article` - Clean article extraction (removes ads/navigation)

## 🚀 Quick Deploy to Railway

### Step 1: Create GitHub Repository

```bash
# Clone or download this repository
git clone https://github.com/yourusername/mcp-web-scraper.git
cd mcp-web-scraper

# Or create a new repository
mkdir mcp-web-scraper
cd mcp-web-scraper
# Copy all files here

# Initialize git and push
git init
git add .
git commit -m "Initial commit: MCP Web Scraper Server"
git branch -M main
git remote add origin https://github.com/YOUR_USERNAME/mcp-web-scraper.git
git push -u origin main
```

### Step 2: Deploy to Railway

1. Go to railway.app
2. Click "New Project"
3. Select "Deploy from GitHub repo"
4. Choose your repository
5. Railway automatically detects the Dockerfile and deploys! 🎉

### Step 3: Get Your URL

1. Click on your deployment in Railway
2. Go to "Settings" → "Domains"
3. Click "Generate Domain"
4. Copy your URL (e.g., https://mcp-web-scraper-production.up.railway.app)

### Step 4: Test Your Server

```bash
# Health check
curl https://your-app.up.railway.app/health

# List available tools
curl https://your-app.up.railway.app/tools

# Test web search
curl -X POST https://your-app.up.railway.app/call-tool \
  -H "Content-Type: application/json" \
  -d '{"name": "web_search", "arguments": {"query": "latest AI news"}}'
```

## 💻 Local Development

```bash
# Clone the repository
git clone https://github.com/yourusername/mcp-web-scraper.git
cd mcp-web-scraper

# Create a virtual environment
python -m venv venv
source venv/bin/activate  # On Windows: venv\Scripts\activate

# Install dependencies
pip install -r requirements.txt

# Run the server
uvicorn src.server:app --reload --port 8000
```

Visit http://localhost:8000 to see the server running!
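With the server running locally, the endpoints can also be exercised from Python instead of curl. A minimal standard-library sketch, assuming only the `{"name": ..., "arguments": {...}}` request shape shown in the curl examples:

```python
import json
from urllib import request

BASE_URL = "http://localhost:8000"  # or your Railway URL

def build_payload(name: str, **arguments) -> bytes:
    """Encode a /call-tool request body: {"name": ..., "arguments": {...}}."""
    return json.dumps({"name": name, "arguments": arguments}).encode("utf-8")

def call_tool(name: str, **arguments) -> dict:
    """POST a tool call to /call-tool and decode the JSON response."""
    req = request.Request(
        BASE_URL + "/call-tool",
        data=build_payload(name, **arguments),
        headers={"Content-Type": "application/json"},
    )
    with request.urlopen(req) as resp:
        return json.load(resp)

# Requires a running server:
# print(call_tool("web_search", query="latest AI news", max_results=5))
```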

## 🔌 Connect to Claude Desktop

Add the server to your Claude Desktop config (`claude_desktop_config.json`):

- macOS: `~/Library/Application Support/Claude/claude_desktop_config.json`
- Windows: `%APPDATA%\Claude\claude_desktop_config.json`

```json
{
  "mcpServers": {
    "web-scraper": {
      "command": "npx",
      "args": [
        "-y",
        "mcp-remote",
        "https://your-app.up.railway.app/sse"
      ]
    }
  }
}
```

Then restart Claude Desktop!
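The `{"name": ..., "arguments": {...}}` shape accepted by `/call-tool` suggests a simple dispatch layer on the server side. A hypothetical sketch of that pattern (the real `src/server.py` is not shown here; `extract_links_stub` is a stand-in, not the actual tool):

```python
from typing import Any, Callable

def extract_links_stub(url: str, pattern: str = "") -> dict:
    """Stand-in for the real extract_links tool logic."""
    return {"url": url, "pattern": pattern, "links": []}

# Registry mapping tool names to callables, as a /call-tool handler might keep.
TOOLS: dict[str, Callable[..., dict]] = {
    "extract_links": extract_links_stub,
}

def handle_call_tool(body: dict[str, Any]) -> dict:
    """Dispatch a {"name": ..., "arguments": {...}} body to the matching tool."""
    name = body.get("name")
    tool = TOOLS.get(name)
    if tool is None:
        return {"error": f"Unknown tool: {name}"}
    try:
        return tool(**body.get("arguments", {}))
    except TypeError as exc:  # missing or unexpected arguments
        return {"error": str(exc)}
```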

## 📋 Example Usage

### Search the Web

```bash
curl -X POST http://localhost:8000/call-tool \
  -H "Content-Type: application/json" \
  -d '{"name": "web_search", "arguments": {"query": "best pizza recipe", "max_results": 5}}'
```

### Smart Search (Comprehensive)

```bash
curl -X POST http://localhost:8000/call-tool \
  -H "Content-Type: application/json" \
  -d '{"name": "smart_search", "arguments": {"query": "climate change solutions", "mode": "comprehensive"}}'
```

### Search and Scrape

```bash
curl -X POST http://localhost:8000/call-tool \
  -H "Content-Type: application/json" \
  -d '{"name": "search_and_scrape", "arguments": {"query": "machine learning tutorials", "num_results": 3}}'
```

### News Search

```bash
curl -X POST http://localhost:8000/call-tool \
  -H "Content-Type: application/json" \
  -d '{"name": "news_search", "arguments": {"query": "technology", "max_results": 10}}'
```

### Extract an Article

```bash
curl -X POST http://localhost:8000/call-tool \
  -H "Content-Type: application/json" \
  -d '{"name": "extract_article", "arguments": {"url": "https://example.com/article"}}'
```

## 🎯 Use Cases in Claude

Once connected, you can ask Claude:

- "Search for the best Italian restaurants in Rome"
- "Find me recent articles about quantum computing"
- "What's the latest news on AI developments?"
- "Research blockchain technology and give me detailed info"
- "Scrape the table from this webpage: [URL]"
- "Extract all links from example.com"

## 📁 Project Structure

```
mcp-web-scraper/
├── src/
│   ├── __init__.py       # Package initialization
│   ├── server.py         # FastAPI server and MCP integration
│   └── tools.py          # Web scraping and search tools
├── requirements.txt      # Python dependencies
├── Dockerfile            # Docker configuration
├── railway.json          # Railway deployment config
├── .gitignore            # Git ignore file
└── README.md             # This file
```

## 🔧 Configuration

### Environment Variables (Optional)

You can set these in the Railway dashboard under "Variables":

- `LOG_LEVEL` - Logging level (default: `INFO`)
- `PORT` - Server port (default: `8000`)
- `HOST` - Server host (default: `0.0.0.0`)

## 📊 Monitoring

Railway provides built-in monitoring:

- **Metrics** - CPU, memory, network usage
- **Logs** - Real-time application logs
- **Deployments** - Deployment history and rollbacks

Access these in your Railway dashboard.
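The optional variables from the Configuration section above can be resolved with their documented defaults in a few lines; a minimal sketch (the real server's settings code may differ):

```python
import os

def load_settings(env=os.environ) -> dict:
    """Resolve optional settings, falling back to the documented defaults."""
    return {
        "log_level": env.get("LOG_LEVEL", "INFO"),
        "host": env.get("HOST", "0.0.0.0"),
        "port": int(env.get("PORT", "8000")),  # Railway injects PORT as a string
    }
```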

## 💰 Cost

Railway Free Tier:

- $5 free credit per month
- 500 hours of usage
- Perfect for personal use and testing

For production use, consider upgrading to Railway Pro.

## 🔒 Security Notes

⚠️ This server is deployed without authentication for ease of use. For production:

- Consider adding API key authentication
- Implement rate limiting
- Restrict allowed domains
- Use environment variables for sensitive data

## 🐛 Troubleshooting

**Server not starting?**

- Check the Railway logs in the dashboard
- Verify all files are committed to Git
- Ensure the Dockerfile is in the root directory

**Tools not working?**

- Check that tool names match exactly
- Verify the JSON format in requests
- Check the server logs for errors

**Can't connect to Claude?**

- Verify the Railway URL is correct
- Ensure the /sse endpoint is accessible
- Restart Claude Desktop after config changes

## 🤝 Contributing

Contributions are welcome! Feel free to:

- Report bugs
- Suggest new features
- Submit pull requests

## 📄 License

MIT License - feel free to use and modify!
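Returning to the Security Notes above: the suggested API-key authentication can be as small as a constant-time comparison against a key held in an environment variable. A sketch under that assumption (the function name and wiring are illustrative, not part of the current codebase):

```python
import hmac

def is_authorized(provided_key: str, expected_key: str) -> bool:
    """Accept only when a key is configured and the caller's key matches.

    hmac.compare_digest compares in constant time, so the check does not
    leak information about the expected key through response timing.
    """
    return bool(expected_key) and hmac.compare_digest(provided_key, expected_key)
```

A FastAPI dependency could call this with the value of an `X-API-Key` header and the configured secret, rejecting with HTTP 401 on mismatch.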

## 🙏 Acknowledgments

Built with:

- FastAPI - Web framework
- MCP - Model Context Protocol
- DuckDuckGo Search - Web search
- Trafilatura - Content extraction
- BeautifulSoup - HTML parsing
- Railway - Deployment platform

## 📞 Support

- GitHub Issues: Report a bug
- Railway Docs: docs.railway.app
- MCP Docs: modelcontextprotocol.io

Made with ❤️ for the MCP community

Recommended Servers

Baidu Map

Baidu Map's core APIs are now fully compatible with the MCP protocol, making it the first map service provider in China to support MCP.

Official · Featured · JavaScript

Playwright MCP Server

A Model Context Protocol server that enables large language models to interact with web pages through structured accessibility snapshots, without requiring vision models or screenshots.

Official · Featured · TypeScript

Magic Component Platform (MCP)

An AI-powered tool that generates modern UI components from natural-language descriptions and integrates with popular IDEs, streamlining UI development.

Official · Featured · Local · TypeScript

Audiense Insights MCP Server

Enables interaction with Audiense Insights accounts via the Model Context Protocol, facilitating the extraction and analysis of marketing insights and audience data, including demographics, behavior, and influencer engagement.

Official · Featured · Local · TypeScript

VeyraX

A single MCP tool that connects all your favorite tools: Gmail, Calendar, and 40+ others.

Official · Featured · Local

graphlit-mcp-server

A Model Context Protocol (MCP) server that integrates MCP clients with the Graphlit service. Beyond web crawling, you can ingest content from almost anywhere (Slack, Gmail, podcast feeds, and more) into a Graphlit project and then retrieve relevant content from an MCP client.

Official · Featured · TypeScript

Kagi MCP Server

An MCP server that integrates Kagi search with Claude AI, enabling Claude to perform real-time web searches when answering questions that require up-to-date information.

Official · Featured · Python

e2b-mcp-server

Run code via e2b using MCP.

Official · Featured

Neon MCP Server

An MCP server for interacting with the Neon Management API and databases.

Official · Featured

Exa MCP Server

A Model Context Protocol (MCP) server that lets AI assistants like Claude perform web searches using the Exa AI Search API, allowing AI models to fetch real-time web information in a safe and controlled way.

Official · Featured