Web Research Assistant MCP Server
Comprehensive Model Context Protocol (MCP) server that provides web research and discovery capabilities.
Includes 13 tools for searching, crawling, and analyzing web content, powered by your local Docker SearXNG
instance, the crawl4ai project, and the Pixabay API:
- web_search - federated search across multiple engines via SearXNG
- search_examples - find code examples, tutorials, and articles (defaults to recent content)
- search_images - find high-quality stock photos, illustrations, and vectors via Pixabay
- crawl_url - full page content extraction with advanced crawling
- package_info - detailed package metadata from npm, PyPI, crates.io, Go
- package_search - discover packages by keywords and functionality
- github_repo - repository health metrics and development activity
- translate_error - find solutions for error messages and stack traces from Stack Overflow (auto-detects CORS, fetch, and web errors)
- api_docs - auto-discover and crawl official API documentation with examples (works for any API, no hardcoded URLs)
- extract_data - extract structured data (tables, lists, fields, JSON-LD) from web pages with automatic detection
- compare_tech - compare technologies side-by-side with NPM downloads, GitHub stars, and aspect analysis (React vs Vue, PostgreSQL vs MongoDB, etc.)
- get_changelog - NEW! Get release notes and changelogs with breaking change detection (upgrade safely from version X to Y)
- check_service_status - NEW! Instant health checks for 25+ services (Stripe, AWS, GitHub, OpenAI, etc.) - "Is it down or just me?"
All tools feature comprehensive error handling, response size limits, usage tracking, and clear documentation for optimal AI agent integration.
Quick Start
1. Set up SearXNG (5 minutes):

   ```bash
   # Using Docker (recommended)
   docker run -d -p 2288:8080 searxng/searxng:latest
   ```

   Then configure search engines - see SEARXNG_SETUP.md for optimized settings.

2. Install the MCP server:

   ```bash
   uvx web-research-assistant
   # or: pip install web-research-assistant
   ```

3. Configure Claude Desktop - add to claude_desktop_config.json:

   ```json
   {
     "mcpServers": {
       "web-research-assistant": {
         "command": "uvx",
         "args": ["web-research-assistant"]
       }
     }
   }
   ```

4. Restart Claude Desktop and start researching!
⚠️ For best results: Configure SearXNG with GitHub, Stack Overflow, and other code-focused search engines. See SEARXNG_SETUP.md for the recommended configuration.
Prerequisites
Required
- Python 3.10+
- A running SearXNG instance on http://localhost:2288
- 📖 See SEARXNG_SETUP.md for the complete Docker setup guide
- ⚠️ IMPORTANT: For best results, enable these search engines in SearXNG:
- GitHub, Stack Overflow, GitLab (for code search - critical!)
- DuckDuckGo, Brave (for web search)
- MDN, Wikipedia (for documentation)
- Reddit, HackerNews (for tutorials and discussions)
- See SEARXNG_SETUP.md for the full optimized configuration
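For example, enabling the engines listed above in SearXNG's settings.yml looks roughly like this. This is a minimal sketch only; the engine names shown are assumptions about your SearXNG version, and the full recommended set lives in SEARXNG_SETUP.md:

```yaml
# settings.yml excerpt - illustrative sketch, not the full recommended config
use_default_settings: true
engines:
  - name: github
    disabled: false
  - name: stackoverflow
    disabled: false
  - name: duckduckgo
    disabled: false
```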
Optional
- Pixabay API key for image search - get a free key at pixabay.com/api/docs
- Playwright browsers for advanced crawling (auto-installed with crawl4ai-setup)
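If you plan to use image search, the key only needs to be present in the environment that launches the server, for example (placeholder value):

```bash
export PIXABAY_API_KEY="your-pixabay-key"
```

You can also pass it through your MCP client configuration; see the Configuration section below.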
Developer Setup (if running from source)
pip install uv          # if you do not already have uv
uv sync # creates the virtual environment
uv run crawl4ai-setup # installs Chromium for crawl4ai
You can also use pip install -r requirements.txt if you prefer pip over uv.
Installation
Option 1: Using uvx (Recommended - No installation needed!)
uvx web-research-assistant
This runs the server directly from PyPI without installing it globally.
Option 2: Install with pip
pip install web-research-assistant
web-research-assistant
Option 3: Install with uv
uv tool install web-research-assistant
web-research-assistant
By default the server communicates over stdio, which makes it easy to wire into Claude Desktop or any other MCP host.
MCP Client Configuration
Claude Desktop
Add to ~/Library/Application Support/Claude/claude_desktop_config.json:
Option 1: Using uvx (Recommended - No installation needed!)
{
"mcpServers": {
"web-research-assistant": {
"command": "uvx",
"args": ["web-research-assistant"]
}
}
}
Option 2: Using installed package
{
"mcpServers": {
"web-research-assistant": {
"command": "web-research-assistant"
}
}
}
OpenCode
Add to ~/.config/opencode/opencode.json:
Using uvx (Recommended)
{
"mcp": {
"web-research-assistant": {
"type": "local",
"command": ["uvx", "web-research-assistant"],
"enabled": true
}
}
}
Using installed package
{
"mcp": {
"web-research-assistant": {
"type": "local",
"command": ["web-research-assistant"],
"enabled": true
}
}
}
Development (Running from source)
For Claude Desktop:
{
"mcpServers": {
"web-research-assistant": {
"command": "uv",
"args": [
"--directory",
"/ABSOLUTE/PATH/TO/web-research-assistant",
"run",
"web-research-assistant"
]
}
}
}
For OpenCode:
{
"mcp": {
"web-research-assistant": {
"type": "local",
"command": [
"uv",
"--directory",
"/ABSOLUTE/PATH/TO/web-research-assistant",
"run",
"web-research-assistant"
],
"enabled": true
}
}
}
Restart your MCP client afterwards. The MCP tools will be available immediately.
Tool behavior
| Tool | When to use | Arguments |
|---|---|---|
| web_search | Use first to gather recent information and URLs from SearXNG. Returns 1–10 ranked snippets with clickable URLs. | query (required), reasoning (required), optional category (defaults to general), and max_results (defaults to 5). |
| search_examples | Find code examples, tutorials, and technical articles. Optimized for technical content with optional time filtering. Perfect for learning APIs or finding usage patterns. | query (required, e.g., "Python async examples"), reasoning (required), content_type (code/articles/both, defaults to both), time_range (day/week/month/year/all, defaults to all), optional max_results (defaults to 5). |
| search_images | Find high-quality royalty-free stock images from Pixabay. Returns photos, illustrations, or vectors. Requires the PIXABAY_API_KEY environment variable. | query (required, e.g., "mountain landscape"), reasoning (required), image_type (all/photo/illustration/vector, defaults to all), orientation (all/horizontal/vertical, defaults to all), optional max_results (defaults to 10). |
| crawl_url | Call immediately after search when you need the actual article body for quoting, summarizing, or extracting data. | url (required), reasoning (required), optional max_chars (defaults to 8000 characters). |
| package_info | Look up specific npm, PyPI, crates.io, or Go package metadata including version, downloads, license, and dependencies. Use when you know the package name. | name (required package name), reasoning (required), registry (npm/pypi/crates/go, defaults to npm). |
| package_search | Search for packages by keywords or functionality (e.g., "web framework", "json parser"). Use when you need to find packages that solve a specific problem. | query (required search terms), reasoning (required), registry (npm/pypi/crates/go, defaults to npm), optional max_results (defaults to 5). |
| github_repo | Get GitHub repository health metrics including stars, forks, issues, recent commits, and project details. Use when evaluating open source projects. | repo (required, owner/repo or full URL), reasoning (required), optional include_commits (defaults to true). |
| translate_error | Find Stack Overflow solutions for error messages and stack traces. Auto-detects language/framework, extracts key terms (CORS, map, undefined, etc.), filters irrelevant results, and prioritizes Stack Overflow solutions. Handles web-specific errors (CORS, fetch). | error_message (required stack trace or error text), reasoning (required), optional language (auto-detected), optional framework (auto-detected), optional max_results (defaults to 5). |
| api_docs | Auto-discover and crawl official API documentation. Dynamically finds docs URLs using patterns (docs.{api}.com, {api}.com/docs, etc.), searches for specific topics, crawls pages, and extracts overview, parameters, examples, and related links. Works for ANY API - no hardcoded URLs. Perfect for API integration and learning. | api_name (required, e.g., "stripe", "react"), topic (required, e.g., "create customer", "hooks"), reasoning (required), optional max_results (defaults to 2 pages). |
| extract_data | Extract structured data from HTML pages. Supports tables, lists, fields (via CSS selectors), JSON-LD, and auto-detection. Returns clean JSON output. More efficient than parsing full page text. Perfect for scraping pricing tables, package specs, release notes, or any structured content. | url (required), reasoning (required), extract_type (table/list/fields/json-ld/auto, defaults to auto), optional selectors (CSS selectors for fields mode), optional max_items (defaults to 100). |
| compare_tech | Compare 2-5 technologies side-by-side. Auto-detects category (framework/database/language) and gathers data from NPM, GitHub, and web search. Returns structured comparison with popularity metrics (downloads, stars), performance insights, and best-use summaries. Fast parallel processing (3-4s). | technologies (required list of 2-5 names), reasoning (required), optional category (auto-detects if not provided), optional aspects (auto-selected by category), optional max_results_per_tech (defaults to 3). |
| get_changelog | NEW! Get release notes and changelogs for package upgrades. Fetches GitHub releases, highlights breaking changes, and provides upgrade recommendations. Answers "What changed in version X → Y?" and "Are there breaking changes?" Perfect for planning dependency updates. | package (required name), reasoning (required), optional registry (npm/pypi/auto, defaults to auto), optional max_releases (defaults to 5). |
| check_service_status | NEW! Instantly check if external services are experiencing issues. Covers 25+ popular services (Stripe, AWS, GitHub, OpenAI, Vercel, etc.). Returns operational status, current incidents, and component health. Critical for production debugging - know immediately if the issue is external. Response time < 2s. | service (required name, e.g., "stripe", "aws"), reasoning (required). |
Results are automatically trimmed (default 8 KB) so they stay well within MCP response expectations. If truncation happens, the text ends with a note reminding the model that more detail is available on request.
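For illustration, the arguments an MCP client sends for a typical web_search call look like this. The values are made-up examples, not output from the server:

```json
{
  "query": "crawl4ai markdown extraction tutorial",
  "reasoning": "Finding recent articles on extracting markdown with crawl4ai",
  "category": "general",
  "max_results": 5
}
```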
Configuration
Environment variables let you adapt the server without touching code:
| Variable | Default | Description |
|---|---|---|
| SEARXNG_BASE_URL | http://localhost:2288/search | Endpoint queried by web_search. |
| SEARXNG_DEFAULT_CATEGORY | general | Category used when none is provided. |
| SEARXNG_DEFAULT_RESULTS | 5 | Default number of search hits. |
| SEARXNG_MAX_RESULTS | 10 | Hard cap on hits per request. |
| SEARXNG_CRAWL_MAX_CHARS | 8000 | Default character budget for crawl_url. |
| MCP_MAX_RESPONSE_CHARS | 8000 | Overall response limit applied to every tool reply. |
| SEARXNG_MCP_USER_AGENT | web-research-assistant/0.1 | User-Agent header for outbound HTTP calls. |
| PIXABAY_API_KEY | (empty) | API key for Pixabay image search. Get a free key at pixabay.com/api/docs. |
| MCP_USAGE_LOG | ~/.config/web-research-assistant/usage.json | Location for usage analytics data. |
Development
The codebase is intentionally modular and organized:
web-research-assistant/
├── src/searxng_mcp/ # Source code
│ ├── config.py # Configuration and environment
│ ├── search.py # SearXNG integration
│ ├── crawler.py # Crawl4AI wrapper
│ ├── images.py # Pixabay client
│ ├── registry.py # Package registries (npm, PyPI, crates, Go)
│ ├── github.py # GitHub API client
│ ├── errors.py # Error parser (language/framework detection)
│ ├── api_docs.py # API docs discovery (NO hardcoded URLs)
│ ├── tracking.py # Usage analytics
│   └── server.py        # MCP server + 13 tools
├── docs/ # Documentation (27 files)
└── [config files]
Each module is well under 400 lines, making the codebase easy to understand and extend.
Usage Analytics
All tools automatically track usage metrics including:
- Tool invocation counts and success rates
- Response times and performance trends
- Common use case patterns (via the reasoning parameter)
- Error frequencies and types
Analytics data is stored in ~/.config/web-research-assistant/usage.json and can be analyzed
to optimize tool usage and identify patterns. Each tool requires a reasoning parameter
that helps categorize why tools are being used, enabling better analytics and insights.
Note: As of the latest update, the reasoning parameter is required for all tools (previously optional with defaults). This ensures meaningful analytics data collection.
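To take a quick look at the collected data, you can pretty-print the file; the exact schema is internal and may change between releases:

```bash
python -m json.tool ~/.config/web-research-assistant/usage.json
```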
Documentation
Comprehensive documentation is available in the docs/ directory:
- Project Status - Current status, metrics, roadmap
- API Docs Implementation - NEW tool documentation
- Error Translator Design - Error translator details
- Tool Ideas Ranked - Prioritization and progress
- SearXNG Configuration - Recommended setup
- Quick Start Examples - Usage examples
See the docs README for a complete index.
模型上下文协议(MCP)服务器允许像 Claude 这样的 AI 助手使用 Exa AI 搜索 API 进行网络搜索。这种设置允许 AI 模型以安全和受控的方式获取实时的网络信息。