MetaSearchMCP
Aggregates search results from multiple providers (web, academic, code, finance) with a unified JSON schema, providing both an HTTP API and an MCP server for AI agent tooling.
Open-source metasearch backend for MCP, AI agents, and LLM workflows.
MetaSearchMCP aggregates results from multiple search providers, normalizes them into a stable JSON schema, and exposes both an HTTP API and an MCP server for agent tooling.
Positioning
- MCP-first metasearch backend
- Structured search API for AI pipelines
- Multi-provider search orchestration with deduplication and fallback
- Python FastAPI alternative to browser-first metasearch projects
Why It Exists
Most search aggregators are designed around browser UX: HTML pages, pagination, and interactive result cards. Agents and LLM workflows need a different contract: predictable JSON, stable field names, partial-failure tolerance, and provider-level execution metadata.
MetaSearchMCP is built for that machine-consumable workflow. The design is centered on search orchestration, normalized contracts, and MCP integration.
Core Features
- Concurrent multi-provider aggregation
- Unified result schema for web, academic, developer, and knowledge sources
- Provider-level timeout isolation and partial-failure handling
- Result deduplication across engines
- Provider selection by explicit names or semantic tags such as web, academic, code, and google
- Final result caps for agent-friendly payload sizing
- HTTP API with OpenAPI docs
- MCP server over stdio for Claude Desktop, Cline, Continue, and similar clients
- Configurable provider allowlist via environment variables
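The concurrency and failure-isolation features above can be illustrated with a minimal asyncio sketch. This is not the project's orchestrator: the `Provider` signature, the `run_provider` and `aggregate` helpers, and the three toy providers are all hypothetical names chosen for the example. It only shows one way to give each provider its own timeout so that a slow or failing source still yields a per-provider report instead of failing the whole request.

```python
import asyncio
from typing import Awaitable, Callable

# Hypothetical provider signature: takes a query, returns a list of result dicts.
Provider = Callable[[str], Awaitable[list[dict]]]


async def run_provider(name: str, provider: Provider, query: str, timeout: float) -> dict:
    """Run one provider under its own timeout and never raise."""
    try:
        results = await asyncio.wait_for(provider(query), timeout=timeout)
        return {"name": name, "success": True, "results": results, "error": None}
    except asyncio.TimeoutError:
        return {"name": name, "success": False, "results": [], "error": "timeout"}
    except Exception as exc:  # one failing provider must not sink the others
        return {"name": name, "success": False, "results": [], "error": str(exc)}


async def aggregate(providers: dict[str, Provider], query: str, timeout: float = 10.0) -> list[dict]:
    """Query all providers concurrently and return one report per provider."""
    tasks = [run_provider(name, fn, query, timeout) for name, fn in providers.items()]
    return await asyncio.gather(*tasks)


# Toy providers used only to exercise the three outcomes.
async def fast(query: str) -> list[dict]:
    return [{"title": query, "url": "https://example.com"}]


async def slow(query: str) -> list[dict]:
    await asyncio.sleep(1.0)  # exceeds the timeout used below
    return []


async def broken(query: str) -> list[dict]:
    raise RuntimeError("upstream returned 503")


reports = asyncio.run(
    aggregate({"fast": fast, "slow": slow, "broken": broken}, "rust", timeout=0.05)
)
```

Because every coroutine passed to `asyncio.gather` catches its own exceptions, the gather call itself cannot fail on a provider error, and the reports come back in the same order as the providers were given.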
Google Support
Google support includes a direct scraper provider implemented in this repository.
The direct Google implementation uses browser-like requests, consent cookie handling, locale-aware query parameters, and resilient HTML result parsing.
Currently supported Google providers:
| Provider | Env var | Notes |
|---|---|---|
| Direct Google | ALLOW_UNSTABLE_PROVIDERS=true | Primary path; HTML scraping, best effort, may be blocked from datacenter IPs |
| serpbase.dev | SERPBASE_API_KEY | Pay-per-use; typically cheaper for low-volume usage |
| serper.dev | SERPER_API_KEY | Includes a free tier, then pay-per-use |
Provider priority for /search/google is now google first, then google_serpbase, then google_serper.
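A priority chain like this can be sketched as a simple ordered fallback. The snippet below is an illustration under assumptions, not the project's code: `search_with_fallback` is a hypothetical helper, providers are modeled as plain callables, and treating an empty result list as a reason to fall through is an assumption about the desired behavior.

```python
# Priority order stated above for /search/google.
GOOGLE_CHAIN = ["google", "google_serpbase", "google_serper"]


def search_with_fallback(query, providers, chain=GOOGLE_CHAIN):
    """Try each configured provider in order; return the first non-empty result set."""
    errors = []
    for name in chain:
        fetch = providers.get(name)
        if fetch is None:  # provider not configured, e.g. no API key set
            continue
        try:
            results = fetch(query)
        except Exception as exc:
            errors.append({"provider": name, "error": str(exc)})
            continue
        if results:
            return {"provider": name, "results": results, "errors": errors}
        errors.append({"provider": name, "error": "no results"})
    return {"provider": None, "results": [], "errors": errors}


# Toy providers: the direct scraper is blocked, SerpBase is not configured.
def blocked(query):
    raise RuntimeError("blocked")


def serper(query):
    return [{"title": "Tokio", "url": "https://tokio.rs"}]


outcome = search_with_fallback(
    "site:github.com rust tokio",
    {"google": blocked, "google_serper": serper},
)
```

Keeping the errors from earlier links in the chain lets the caller see why a lower-priority provider ended up answering.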
Supported Providers
| Provider | Name | Method |
|---|---|---|
| Direct Google | google | HTML scraping with browser-like request handling |
| SerpBase | google_serpbase | Hosted Google SERP API |
| Serper | google_serper | Hosted Google SERP API |
Web Search
| Provider | Name | Method |
|---|---|---|
| DuckDuckGo | duckduckgo | HTML scraping |
| Bing | bing | RSS feed |
| Yahoo | yahoo | HTML scraping, best effort |
| Brave | brave | Official Search API |
| Mwmbl | mwmbl | Public JSON API |
| Ecosia | ecosia | HTML scraping |
| Mojeek | mojeek | HTML scraping |
| Startpage | startpage | HTML scraping, best effort |
| Qwant | qwant | Internal JSON API, best effort |
| Yandex | yandex | HTML scraping, best effort |
| Baidu | baidu | JSON endpoint, best effort |
Knowledge And Reference
| Provider | Name | Method |
|---|---|---|
| Wikipedia | wikipedia | MediaWiki API |
| Wikidata | wikidata | Wikidata API |
| Internet Archive | internet_archive | Advanced Search API |
| Open Library | openlibrary | Open Library search API |
Developer Sources
| Provider | Name | Method |
|---|---|---|
| GitHub | github | GitHub REST API |
| GitLab | gitlab | GitLab REST API |
| Stack Overflow | stackoverflow | Stack Exchange API |
| Hacker News | hackernews | Algolia HN API |
| Reddit | reddit | Reddit API |
| npm | npm | npm registry API |
| PyPI | pypi | HTML scraping |
| RubyGems | rubygems | RubyGems search API |
| crates.io | crates | crates.io API |
| lib.rs | lib_rs | HTML scraping |
| Docker Hub | dockerhub | Docker Hub search API |
| pkg.go.dev | pkg_go_dev | HTML scraping |
| MetaCPAN | metacpan | MetaCPAN REST API |
Academic Sources
| Provider | Name | Method |
|---|---|---|
| arXiv | arxiv | Atom API |
| PubMed | pubmed | NCBI E-utilities |
| Semantic Scholar | semanticscholar | Graph API |
| CrossRef | crossref | REST API |
Finance Sources
| Provider | Name | Key Required | Free Tier |
|---|---|---|---|
| Yahoo Finance | yahoo_finance | No | Unofficial endpoint, no key needed |
| Alpha Vantage | alpha_vantage | ALPHA_VANTAGE_API_KEY | 25 req/day |
| Finnhub | finnhub | FINNHUB_API_KEY | 60 req/min |
Installation
One-command local install:
python scripts/install.py
Install, run tests, and start the HTTP API:
python scripts/install.py --dev --test --run
Deploy with Docker Compose:
python scripts/install.py --mode docker
The installer creates .env from .env.example when .env does not already exist. Existing .env files are kept unless --force-env is passed.
Manual install:
git clone https://github.com/gefsikatsinelou/MetaSearchMCP
cd MetaSearchMCP
pip install -e ".[dev]"
Or with uv:
uv pip install -e ".[dev]"
Configuration
Copy .env.example to .env and configure any providers you want to enable.
cp .env.example .env
Key settings:
HOST=0.0.0.0
PORT=8000
DEFAULT_TIMEOUT=10
AGGREGATOR_TIMEOUT=15
SERPBASE_API_KEY=
SERPER_API_KEY=
BRAVE_API_KEY=
GITHUB_TOKEN=
STACKEXCHANGE_API_KEY=
REDDIT_CLIENT_ID=
REDDIT_CLIENT_SECRET=
NCBI_API_KEY=
SEMANTIC_SCHOLAR_API_KEY=
ALPHA_VANTAGE_API_KEY=
FINNHUB_API_KEY=
ENABLED_PROVIDERS=
ALLOW_UNSTABLE_PROVIDERS=false
MAX_RESULTS_PER_PROVIDER=10
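As a rough illustration of how these settings might be consumed, the sketch below reads a subset of them into a typed object. It is not the project's configuration loader: the `Settings` class and `load_settings` function are hypothetical, the timeouts are assumed to be seconds, and treating ENABLED_PROVIDERS as a comma-separated allowlist where an empty value means "no allowlist" is an assumption.

```python
import os
from dataclasses import dataclass, field
from typing import Mapping


def _as_bool(value: str) -> bool:
    """Interpret common truthy spellings; anything else is False."""
    return value.strip().lower() in {"1", "true", "yes", "on"}


@dataclass
class Settings:
    host: str = "0.0.0.0"
    port: int = 8000
    default_timeout: float = 10.0      # assumed to be seconds
    aggregator_timeout: float = 15.0   # assumed to be seconds
    enabled_providers: list[str] = field(default_factory=list)
    allow_unstable_providers: bool = False
    max_results_per_provider: int = 10


def load_settings(env: Mapping[str, str] = os.environ) -> Settings:
    # Assumption: ENABLED_PROVIDERS is a comma-separated allowlist and an
    # empty value means "no allowlist".
    raw = env.get("ENABLED_PROVIDERS", "")
    return Settings(
        host=env.get("HOST", "0.0.0.0"),
        port=int(env.get("PORT", "8000")),
        default_timeout=float(env.get("DEFAULT_TIMEOUT", "10")),
        aggregator_timeout=float(env.get("AGGREGATOR_TIMEOUT", "15")),
        enabled_providers=[p.strip() for p in raw.split(",") if p.strip()],
        allow_unstable_providers=_as_bool(env.get("ALLOW_UNSTABLE_PROVIDERS", "false")),
        max_results_per_provider=int(env.get("MAX_RESULTS_PER_PROVIDER", "10")),
    )


# Passing a plain dict instead of os.environ keeps the example deterministic.
settings = load_settings(
    {"PORT": "9000", "ENABLED_PROVIDERS": "duckduckgo, wikipedia", "ALLOW_UNSTABLE_PROVIDERS": "true"}
)
```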
Running
HTTP API
python -m metasearchmcp.server
# or
metasearchmcp
The API starts on http://localhost:8000.
MCP Server
python -m metasearchmcp.broker
# or
metasearchmcp-mcp
The MCP server communicates over stdio.
Docker
docker build -t metasearchmcp .
docker run --rm -p 8000:8000 --env-file .env metasearchmcp
Or with Compose:
docker compose up --build
HTTP API
POST /search
Aggregate across all enabled providers or a selected provider subset.
curl -X POST http://localhost:8000/search \
  -H "Content-Type: application/json" \
  -d '{
    "query": "rust async runtime",
    "providers": ["duckduckgo", "wikipedia"],
    "params": {"num_results": 5, "max_total_results": 8, "language": "en"}
  }'
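For callers working from Python rather than the shell, the same request can be built with the standard library. This is a minimal sketch: `build_search_request` is a hypothetical helper, and the body simply mirrors the fields used in the curl example above. The request is only constructed here; sending it needs a running server.

```python
import json
from urllib.request import Request


def build_search_request(base_url, query, providers=None, tags=None, **params):
    """Build (but do not send) a POST /search request."""
    body = {"query": query, "params": params}
    if providers:
        body["providers"] = providers
    if tags:
        body["tags"] = tags
    return Request(
        f"{base_url}/search",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


request = build_search_request(
    "http://localhost:8000",
    "rust async runtime",
    providers=["duckduckgo", "wikipedia"],
    num_results=5,
    max_total_results=8,
    language="en",
)

# To send it against a running instance:
#   import urllib.request
#   with urllib.request.urlopen(request, timeout=15) as resp:
#       payload = json.load(resp)
```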
You can also narrow providers by tags:
curl -X POST http://localhost:8000/search \
  -H "Content-Type: application/json" \
  -d '{
    "query": "transformer attention",
    "tags": ["academic", "knowledge"],
    "params": {"num_results": 5, "max_total_results": 6}
  }'
When multiple tags are provided, the default behavior is tag_match="any".
Set tag_match to "all" when you want providers that satisfy every requested tag:
curl -X POST http://localhost:8000/search \
  -H "Content-Type: application/json" \
  -d '{
    "query": "npm cli argument parser",
    "tags": ["code", "packages"],
    "tag_match": "all",
    "params": {"num_results": 5, "max_total_results": 6}
  }'
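The difference between the two tag_match modes comes down to set intersection versus subset. The sketch below shows that logic on a toy catalog; the `select_providers` helper and the tag assignments in `CATALOG` are illustrative assumptions, not the project's real catalog.

```python
# Hypothetical catalog: provider name -> set of tags. These tag assignments
# are made up for the example.
CATALOG = {
    "duckduckgo": {"web"},
    "arxiv": {"academic"},
    "wikipedia": {"knowledge"},
    "github": {"code"},
    "npm": {"code", "packages"},
    "pypi": {"code", "packages"},
}


def select_providers(catalog, tags, tag_match="any"):
    """Return provider names whose tags match the request."""
    wanted = set(tags)
    if not wanted:
        return sorted(catalog)
    if tag_match == "any":
        return sorted(name for name, have in catalog.items() if wanted & have)
    if tag_match == "all":
        return sorted(name for name, have in catalog.items() if wanted <= have)
    raise ValueError(f"unknown tag_match: {tag_match!r}")


any_match = select_providers(CATALOG, ["code", "packages"])          # default "any"
all_match = select_providers(CATALOG, ["code", "packages"], "all")
```

With this toy catalog, "any" keeps github because it carries the code tag, while "all" drops it because it lacks the packages tag.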
num_results controls how many results each provider can contribute. max_total_results caps the final merged response after deduplication.
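The interaction of the two limits can be sketched as follows. This is an illustration under assumptions: `merge_results` is a hypothetical function, duplicates are detected by exact URL only, and providers are merged in the order given, which may differ from the project's actual ranking.

```python
def merge_results(per_provider, num_results, max_total_results):
    """Apply the per-provider cap, drop duplicate URLs, then cap the total."""
    merged, seen = [], set()
    for provider, results in per_provider.items():
        for item in results[:num_results]:      # per-provider cap
            if item["url"] in seen:             # exact-URL dedup (simplified)
                continue
            seen.add(item["url"])
            merged.append({**item, "provider": provider})
    merged = merged[:max_total_results]         # final cap, applied after dedup
    for rank, item in enumerate(merged, start=1):
        item["rank"] = rank
    return merged


per_provider = {
    "duckduckgo": [{"url": f"https://a.example/{i}"} for i in range(10)],
    "wikipedia": [{"url": "https://a.example/0"}, {"url": "https://w.example/1"}],
}
merged = merge_results(per_provider, num_results=5, max_total_results=6)
```

Here duckduckgo contributes five of its ten results, the wikipedia duplicate of `https://a.example/0` is dropped, and the remaining wikipedia result brings the total to six.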
POST /search/google
Search Google through the configured Google provider chain. If ALLOW_UNSTABLE_PROVIDERS=true, MetaSearchMCP will prefer the direct google provider automatically.
curl -X POST http://localhost:8000/search/google \
  -H "Content-Type: application/json" \
  -d '{"query": "site:github.com rust tokio"}'
To force the direct Google route explicitly:
curl -X POST http://localhost:8000/search/google \
  -H "Content-Type: application/json" \
  -d '{"query": "site:github.com rust tokio", "provider": "google"}'
GET /providers
Return the currently available provider catalog.
The response includes provider descriptions and a tag-to-provider index for quick discovery.
You can filter the catalog by tag:
curl "http://localhost:8000/providers?tag=academic&tag=web"
Use tag_match=all to require every tag instead of the default any-match behavior:
curl "http://localhost:8000/providers?tag=code&tag=packages&tag_match=all"
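When building these URLs programmatically, the repeated `tag` parameter is easiest to express as a list of key/value pairs. A minimal sketch with the standard library follows; `providers_url` is a hypothetical helper name.

```python
from urllib.parse import urlencode


def providers_url(base_url, tags=(), tag_match=None):
    """Build a GET /providers URL with repeated tag parameters."""
    params = [("tag", tag) for tag in tags]
    if tag_match is not None:
        params.append(("tag_match", tag_match))
    query = urlencode(params)
    return f"{base_url}/providers" + (f"?{query}" if query else "")


url = providers_url("http://localhost:8000", ["code", "packages"], tag_match="all")
```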
GET /health
Simple health check endpoint. Returns service status, version, provider count, and the current provider name list.
Response Schema
Every aggregated response includes:
- engine
- query
- results
- related_searches
- suggestions
- answer_box
- timing_ms
- providers
- errors
Every result item includes:
- title
- url
- snippet
- source
- rank
- provider
- published_date
- extra
Example response:
{
  "engine": "metasearchmcp",
  "query": "rust async runtime",
  "results": [
    {
      "title": "Tokio - An asynchronous Rust runtime",
      "url": "https://tokio.rs",
      "snippet": "Tokio is an event-driven, non-blocking I/O platform...",
      "source": "tokio.rs",
      "rank": 1,
      "provider": "duckduckgo",
      "published_date": null,
      "extra": {}
    }
  ],
  "related_searches": [],
  "suggestions": [],
  "answer_box": null,
  "timing_ms": 843.2,
  "providers": [
    {
      "name": "duckduckgo",
      "success": true,
      "result_count": 10,
      "latency_ms": 840.1,
      "error": null
    }
  ],
  "errors": []
}
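A consumer can use the `providers` block to detect partial failures without losing the results that did arrive. The sketch below is one way to do that; `summarize` is a hypothetical helper, and the failing `yandex` entry in the sample payload is invented to show the partial-failure case.

```python
import json

RESULT_FIELDS = {
    "title", "url", "snippet", "source", "rank",
    "provider", "published_date", "extra",
}


def summarize(payload: dict) -> dict:
    """Validate result fields and split providers into succeeded and failed."""
    for item in payload["results"]:
        missing = RESULT_FIELDS - item.keys()
        if missing:
            raise ValueError(f"result missing fields: {sorted(missing)}")
    ok = [p["name"] for p in payload["providers"] if p["success"]]
    failed = {p["name"]: p["error"] for p in payload["providers"] if not p["success"]}
    return {
        "urls": [item["url"] for item in payload["results"]],
        "ok": ok,
        "failed": failed,
        "partial": bool(ok) and bool(failed),
    }


# Sample payload; the failing provider entry is made up for illustration.
SAMPLE = json.loads("""
{
  "engine": "metasearchmcp",
  "query": "rust async runtime",
  "results": [
    {"title": "Tokio - An asynchronous Rust runtime", "url": "https://tokio.rs",
     "snippet": "Tokio is an event-driven, non-blocking I/O platform...",
     "source": "tokio.rs", "rank": 1, "provider": "duckduckgo",
     "published_date": null, "extra": {}}
  ],
  "providers": [
    {"name": "duckduckgo", "success": true, "result_count": 1,
     "latency_ms": 840.1, "error": null},
    {"name": "yandex", "success": false, "result_count": 0,
     "latency_ms": 10000.0, "error": "timeout"}
  ],
  "errors": []
}
""")

summary = summarize(SAMPLE)
```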
MCP Tools
MetaSearchMCP exposes these MCP tools:
- search_web
- search_google
- search_academic
- search_github
- compare_engines
- search_finance
- search_code
search_web also accepts optional tags so agents can limit search to categories such as web, academic, code, or google. When multiple tags are present, tag_match="all" requires a provider to satisfy the full set.
All search tools accept max_total_results to keep the final payload compact.
Example Claude Desktop config:
{
  "mcpServers": {
    "MetaSearchMCP": {
      "command": "metasearchmcp-mcp",
      "env": {
        "ALLOW_UNSTABLE_PROVIDERS": "true",
        "SERPBASE_API_KEY": "your_key",
        "SERPER_API_KEY": "your_key"
      }
    }
  }
}
Development
pip install -e ".[dev]"
pytest
uvicorn metasearchmcp.server:app --reload
Architecture
The public package is organized around these modules:
- contracts.py: request and response models
- catalog.py: provider discovery and selection
- orchestrator.py: concurrent search execution and response assembly
- merge.py: URL normalization and deduplication
- server.py: FastAPI entrypoint
- broker.py: MCP entrypoint
Legacy module names are kept as compatibility shims for earlier imports.
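To give a sense of what URL normalization and deduplication can involve, here is a small standalone sketch. It is not the contents of merge.py: the `normalize_url` and `dedupe` functions and the specific rules (lowercasing the host, dropping a leading `www.`, trailing slashes, fragments, and `utm_` parameters) are assumptions chosen for illustration.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

TRACKING_PREFIXES = ("utm_",)  # assumed set of tracking parameters to drop


def normalize_url(url: str) -> str:
    """Reduce a URL to a canonical form suitable for duplicate detection."""
    parts = urlsplit(url.strip())
    host = parts.netloc.lower()
    if host.startswith("www."):
        host = host[4:]
    path = parts.path.rstrip("/") or "/"
    query = urlencode([
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if not key.lower().startswith(TRACKING_PREFIXES)
    ])
    return urlunsplit((parts.scheme.lower(), host, path, query, ""))  # fragment dropped


def dedupe(results: list[dict]) -> list[dict]:
    """Keep the first result for each normalized URL, preserving order."""
    seen, unique = set(), []
    for item in results:
        key = normalize_url(item["url"])
        if key not in seen:
            seen.add(key)
            unique.append(item)
    return unique


unique = dedupe([
    {"url": "https://tokio.rs", "provider": "duckduckgo"},
    {"url": "https://WWW.Tokio.rs/?utm_source=x#top", "provider": "bing"},
    {"url": "https://docs.rs/tokio", "provider": "bing"},
])
```

Keeping the first occurrence means the provider that ran earlier in the merge order wins when two engines return the same page.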
Roadmap
- Caching and provider-aware query reuse
- Better scoring and ranking signals across providers
- Streaming aggregation responses
- Provider health telemetry
- More first-party API integrations where they improve reliability
License
MIT