webmcp
MCP server for web search and content extraction using DuckDuckGo or SearXNG, with Playwright-based fetching and LLM-powered data extraction.
webmcp is an MCP server for web search and content extraction. LLM agents can use it to:
- search the web with DuckDuckGo (default) or SearXNG (optional)
- fetch and clean page content from one or more URLs
- send cleaned content to a local LLM for structured extraction
Features
- search_web(query, limit=10) returns web results (title, URL, description)
- extract(urls, prompt=None, schema=None, use_browser=True) extracts data from pages
- browser-based fetching with Playwright for JavaScript-heavy sites
- lightweight HTTP fetching mode for faster/simple pages
- persistent tool-call logging to tool_calls.log.json
- configurable search provider: DDG by default, optional SearXNG
Critical Requirement
For the main researcher llama.cpp server, include --webui-mcp-proxy in launch parameters. Without this flag, this workflow will not function correctly.
Prompting And Tested Setup
For best results, use research_prompt.txt as your system prompt. This prompt is a core part of the intended workflow and quality; it is effectively half of how this repository is meant to function.
Tested setup:
- Main researcher LLM: Qwen3.5:27b-Q3_K_M.gguf via llama.cpp on an RTX 4090, context length 200,000, about 40 tok/s.
- Extract tool LLM: Qwen3.5:9b-Q4_K_M.gguf via llama.cpp on a GTX 1080 Ti, context length 32,768, about 40 tok/s.
- This workflow has been tested with the llama.cpp WebUI specifically, and has not been validated with other MCP clients yet.
Requirements
- Python 3.10+
- A local OpenAI-compatible LLM endpoint (for example, llama.cpp, LM Studio, vLLM, or Ollama)
Configuration
The app reads LLM settings from environment variables and supports a local .env file.
- Copy .env.example to .env
- Set values:
LLM_URL=http://localhost:1234
LLM_MODEL=your-model-name
SEARCH_PROVIDER=ddg
# Optional when SEARCH_PROVIDER=searxng
SEARXNG_URL=http://localhost:8080
LLM_URL and LLM_MODEL are required at startup.
SEARCH_PROVIDER defaults to ddg. Set it to searxng to replace DDG, and provide SEARXNG_URL.
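The rules above (two required values, a default provider, and a URL that is only needed for SearXNG) can be sketched as a small validation function. This is an illustrative sketch, not the code in app.py: the function name load_settings, the returned dictionary keys, and the error messages are assumptions made for the example.

```python
import os


def load_settings(env=None):
    """Validate webmcp-style settings from a mapping (defaults to os.environ).

    Hypothetical helper for illustration; the real app may structure this differently.
    """
    env = os.environ if env is None else env

    # LLM_URL and LLM_MODEL are required at startup.
    missing = [name for name in ("LLM_URL", "LLM_MODEL") if not env.get(name)]
    if missing:
        raise ValueError("missing required settings: " + ", ".join(missing))

    # SEARCH_PROVIDER defaults to ddg when unset.
    provider = env.get("SEARCH_PROVIDER", "ddg").strip().lower()
    if provider not in ("ddg", "searxng"):
        raise ValueError(f"unsupported SEARCH_PROVIDER: {provider!r}")

    # SEARXNG_URL only matters when the SearXNG provider is selected.
    searxng_url = env.get("SEARXNG_URL")
    if provider == "searxng" and not searxng_url:
        raise ValueError("SEARXNG_URL is required when SEARCH_PROVIDER=searxng")

    return {
        "llm_url": env["LLM_URL"],
        "llm_model": env["LLM_MODEL"],
        "search_provider": provider,
        "searxng_url": searxng_url,
    }
```

Passing a plain dictionary instead of relying on os.environ makes the rules easy to check without touching the real environment or a .env file.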
Search Providers
search_web supports two providers:
- ddg (default): uses DuckDuckGo via ddgs
- searxng: uses your SearXNG instance
SearXNG notes:
- Set SEARCH_PROVIDER=searxng
- Set SEARXNG_URL to your instance base URL (for example, http://192.168.0.55:8888)
- webmcp calls <SEARXNG_URL>/search with format=json
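To make the request shape concrete, the sketch below builds the URL that a call to <SEARXNG_URL>/search with format=json would use. The helper name searxng_search_url is hypothetical, and passing the query as the q parameter follows SearXNG's search API rather than anything stated in this README, so treat the exact parameters webmcp sends as an assumption.

```python
from urllib.parse import urlencode


def searxng_search_url(base_url, query):
    """Build <SEARXNG_URL>/search?q=...&format=json for a query string.

    Hypothetical helper; webmcp may send additional parameters.
    """
    params = urlencode({"q": query, "format": "json"})
    # Strip a trailing slash so the base URL works with or without one.
    return base_url.rstrip("/") + "/search?" + params


if __name__ == "__main__":
    print(searxng_search_url("http://192.168.0.55:8888", "mcp server"))
```

If the instance rejects the request, it may be worth checking that the JSON output format is enabled in the SearXNG settings, since many instances only serve HTML by default.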
Install
Install dependencies from the pinned requirements file:
pip install -r requirements.txt
python -m playwright install chromium
Run
python app.py
Server starts on:
http://0.0.0.0:8642
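A quick way to confirm the server came up is to check that something is accepting TCP connections on port 8642. The sketch below is a generic reachability check, not part of webmcp; the function name is_listening is made up for the example, and it only confirms that the port is open, not that the MCP endpoint behaves correctly. Because 0.0.0.0 means "all interfaces", the check connects through 127.0.0.1.

```python
import socket


def is_listening(host="127.0.0.1", port=8642, timeout=1.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers connection refused, timeouts, and unreachable hosts.
        return False


if __name__ == "__main__":
    print("webmcp port open:", is_listening())
```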
MCP Usage Notes
- extract(..., use_browser=True) is best for dynamic pages that require JS rendering.
- extract(..., use_browser=False) is faster for static pages.
- If extraction quality is poor, the LLM should provide a more specific prompt and/or a stricter schema.
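As an illustration of the last note, the sketch below shows what a more specific prompt and a stricter schema might look like as arguments to extract. It assumes the schema parameter accepts a JSON Schema-style object, which this README does not confirm; the URL, field names, and prompt text are invented for the example.

```python
import json

# Hypothetical arguments for an extract call; only the parameter names
# (urls, prompt, schema, use_browser) come from the tool signature.
extract_args = {
    "urls": ["https://example.com/pricing"],
    # Specific prompt: names the entities and the units wanted.
    "prompt": "List every pricing plan with its name and monthly price in USD.",
    # Stricter schema (assumed JSON Schema style): fixed fields, no extras.
    "schema": {
        "type": "object",
        "properties": {
            "plans": {
                "type": "array",
                "items": {
                    "type": "object",
                    "properties": {
                        "name": {"type": "string"},
                        "monthly_price_usd": {"type": "number"},
                    },
                    "required": ["name", "monthly_price_usd"],
                    "additionalProperties": False,
                },
            }
        },
        "required": ["plans"],
    },
    # A static pricing page may not need JS rendering.
    "use_browser": False,
}

if __name__ == "__main__":
    print(json.dumps(extract_args, indent=2))
```

Naming exact fields and marking them required gives the extraction model less room to return free-form text, which is usually the quickest fix for vague output.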
TODO
- Revisit JS page rendering and extraction strategy. Right now, roughly 25-30% of pages return little or no usable content even when fetched successfully.
- Improve anti-bot handling for page fetches. Many targets still return HTTP 4xx errors, so investigate stronger browser mimicry (Playwright/Chromium behavior, headers, fingerprinting, and potentially user-agent/profile rotation).
License
MIT. See LICENSE.