websearch-mcp

Enables web searching via SearXNG, page content extraction with Crawl4AI, and image analysis using vision language models. It provides AI agents with tools for information synthesis and web-based data retrieval through OpenAI-compatible LLM endpoints.

An MCP server that provides web search and page fetching tools for AI agents. Uses SearXNG for search, Crawl4AI for content extraction, and any OpenAI-compatible LLM for server-side synthesis.

Prerequisites

  • Python 3.12+
  • SearXNG instance with JSON format enabled (search.formats: [json] in settings.yml)
  • OpenAI-compatible LLM endpoint (OpenAI, Ollama, vLLM, LiteLLM, etc.)
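The JSON requirement exists because search results are read from SearXNG's `/search` endpoint with `format=json`. The following is a minimal sketch of such a query, not this server's actual code; the helper names `build_search_url` and `extract_hits` are hypothetical.

```python
from urllib.parse import urlencode, urljoin


def build_search_url(base_url: str, query: str) -> str:
    """Build a SearXNG search URL that requests JSON output."""
    # format=json is only honored when search.formats includes json in
    # settings.yml; otherwise SearXNG typically rejects the request.
    params = urlencode({"q": query, "format": "json"})
    return urljoin(base_url.rstrip("/") + "/", "search") + "?" + params


def extract_hits(payload: dict) -> list[dict]:
    """Keep only the title and URL of each entry in a SearXNG JSON response."""
    return [
        {"title": r.get("title", ""), "url": r["url"]}
        for r in payload.get("results", [])
        if "url" in r
    ]


if __name__ == "__main__":
    print(build_search_url("http://localhost:8888", "model context protocol"))
```

Opening the printed URL in a browser is a quick way to check that the instance really serves JSON before wiring it into the server.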

Installation

# Run directly from GitHub
uvx --from "git+https://github.com/<org>/websearch-mcp" websearch-mcp

# Or clone and install locally
git clone https://github.com/<org>/websearch-mcp
cd websearch-mcp
uv sync
uv run websearch-mcp

Tools

web_search

Search the web via SearXNG, fetch the top result pages, and synthesize an answer with the LLM.

| Parameter | Type | Required | Description |
|---|---|---|---|
| query | string | Yes | Search query |
| max_results | int | No | Max results (default: 10) |
| allowed_domains | string[] | No | Only include these domains |
| blocked_domains | string[] | No | Exclude these domains |
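To illustrate how `allowed_domains` and `blocked_domains` could interact, here is a small sketch of a domain filter. It assumes that a domain also matches its subdomains and that the block list takes priority over the allow list; the README does not state either rule, and `filter_by_domain` is a hypothetical helper.

```python
from urllib.parse import urlparse


def _matches(host: str, domain: str) -> bool:
    """True if host equals domain or is a subdomain of it (assumed rule)."""
    domain = domain.lower().lstrip(".")
    return host == domain or host.endswith("." + domain)


def filter_by_domain(urls, allowed_domains=None, blocked_domains=None):
    """Drop blocked URLs, then keep only allowed ones if an allow list is given."""
    kept = []
    for url in urls:
        host = (urlparse(url).hostname or "").lower()
        if blocked_domains and any(_matches(host, d) for d in blocked_domains):
            continue
        if allowed_domains and not any(_matches(host, d) for d in allowed_domains):
            continue
        kept.append(url)
    return kept
```

Matching on the parsed hostname rather than on a substring of the URL keeps `notexample.com` from slipping through an allow list for `example.com`.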

webfetch

Fetch a single URL, extract its content, and process it with the LLM.

| Parameter | Type | Required | Description |
|---|---|---|---|
| url | string | Yes | URL to fetch |
| prompt | string | No | Custom instruction for LLM processing |
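MCP clients invoke tools with JSON-RPC 2.0 `tools/call` requests. The sketch below shows roughly what a `webfetch` call looks like on the wire; the helper `make_tool_call` and the example URL and prompt are illustrative, and a real client library would send this only after the MCP initialization handshake.

```python
import json
from itertools import count

_request_ids = count(1)  # JSON-RPC requests need a unique id per session


def make_tool_call(name: str, arguments: dict) -> str:
    """Serialize an MCP tools/call request for the given tool and arguments."""
    request = {
        "jsonrpc": "2.0",
        "id": next(_request_ids),
        "method": "tools/call",
        "params": {"name": name, "arguments": arguments},
    }
    return json.dumps(request)


if __name__ == "__main__":
    print(make_tool_call(
        "webfetch",
        {"url": "https://example.com", "prompt": "List the main headings"},
    ))
```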

image-description

Describe an image using a vision language model (VLM). Accepts either base64-encoded image data or an absolute filesystem path to an image file.

| Parameter | Type | Required | Description |
|---|---|---|---|
| image | string | Yes | Base64-encoded image data or absolute filesystem path |

Returns a JSON object with a description, a success flag, and an error message that is null when the call succeeds.
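Because a single `image` string carries either a path or base64 data, the input has to be disambiguated somehow. The sketch below is one possible approach, not the server's confirmed logic: it treats the value as a path only if it is absolute and points to an existing file, since base64 text can itself begin with `/` (JPEG data commonly starts with `/9j/`). The name `load_image_bytes` is hypothetical.

```python
import base64
import binascii
import os

MAX_IMAGE_SIZE = 10 * 1024 * 1024  # mirrors the documented default, 10485760 bytes


def load_image_bytes(image: str, max_size: int = MAX_IMAGE_SIZE) -> bytes:
    """Return raw image bytes from an absolute file path or a base64 string."""
    # Check for an existing file first: isabs() alone would misclassify
    # base64 strings that happen to start with "/".
    if os.path.isabs(image) and os.path.isfile(image):
        if os.path.getsize(image) > max_size:
            raise ValueError("image file exceeds the size limit")
        with open(image, "rb") as fh:
            return fh.read()
    try:
        data = base64.b64decode(image, validate=True)
    except (binascii.Error, ValueError) as exc:
        raise ValueError("input is neither an existing absolute path nor valid base64") from exc
    if len(data) > max_size:
        raise ValueError("decoded image exceeds the size limit")
    return data
```

Checking the size before sending anything to the VLM keeps oversized uploads from consuming a request against the model endpoint.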

Environment Variables

| Variable | Required | Default | Description |
|---|---|---|---|
| SEARXNG_URL | Yes | | Base URL of SearXNG instance |
| LLM_BASE_URL | Yes | | OpenAI-compatible endpoint base URL |
| LLM_API_KEY | Yes | | API key for the LLM endpoint |
| LLM_MODEL | Yes | | Model name for chat completions |
| CACHE_TTL_SECONDS | No | 900 | Cache TTL in seconds (0 to disable) |
| CACHE_MAX_ENTRIES | No | 1000 | Max cache entries before LRU eviction |
| FETCH_TIMEOUT | No | 30 | Per-page fetch timeout in seconds |
| LLM_TIMEOUT | No | 60 | LLM request timeout in seconds |
| MAX_CONTENT_SIZE | No | 5242880 | Max content size in bytes (5 MB) |
| DEFAULT_MAX_RESULTS | No | 10 | Default result count for web_search |
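The two cache settings describe a time-limited cache with least-recently-used eviction. As a rough illustration of those semantics (a sketch under that reading, not the server's implementation), the hypothetical `TTLCache` class below expires entries after `ttl_seconds`, evicts the least recently used entry once `max_entries` is exceeded, and stores nothing when the TTL is 0.

```python
import time
from collections import OrderedDict


class TTLCache:
    """Small TTL cache with LRU eviction, keyed by any hashable value."""

    def __init__(self, ttl_seconds=900, max_entries=1000, clock=time.monotonic):
        self.ttl = ttl_seconds
        self.max_entries = max_entries
        self._clock = clock  # injectable so expiry can be tested without sleeping
        self._items = OrderedDict()  # key -> (expires_at, value), oldest first

    def get(self, key):
        entry = self._items.get(key)
        if entry is None:
            return None
        expires_at, value = entry
        if self._clock() >= expires_at:
            del self._items[key]  # expired: drop it and report a miss
            return None
        self._items.move_to_end(key)  # mark as most recently used
        return value

    def put(self, key, value):
        if self.ttl <= 0:
            return  # a TTL of 0 disables caching entirely
        self._items[key] = (self._clock() + self.ttl, value)
        self._items.move_to_end(key)
        while len(self._items) > self.max_entries:
            self._items.popitem(last=False)  # evict the least recently used entry
```

A cache like this would typically be keyed by the search query or the fetched URL, so repeated agent requests within the TTL avoid a second fetch and LLM call.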

VLM Configuration (for image-description tool)

| Variable | Required | Default | Description |
|---|---|---|---|
| VLM_BASE_URL | No | LLM_BASE_URL | OpenAI-compatible endpoint for VLM |
| VLM_API_KEY | No | LLM_API_KEY | API key for VLM endpoint |
| VLM_MODEL | No | LLM_MODEL | Model name for image description |
| MAX_IMAGE_SIZE | No | 10485760 | Max image size in bytes (10 MB) |
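The defaults above mean each `VLM_*` variable falls back to its `LLM_*` counterpart when unset. A minimal sketch of that resolution follows; `resolve_vlm_settings` is a hypothetical helper, and treating an empty string as unset is an assumption.

```python
import os
from collections.abc import Mapping


def resolve_vlm_settings(env: Mapping[str, str] = os.environ) -> dict:
    """Resolve VLM settings, falling back to the LLM_* values when unset."""

    def pick(vlm_key: str, llm_key: str) -> str:
        # Assumption: an empty VLM_* value counts as unset.
        # A missing LLM_* fallback raises KeyError, since those are required.
        return env.get(vlm_key) or env[llm_key]

    return {
        "base_url": pick("VLM_BASE_URL", "LLM_BASE_URL"),
        "api_key": pick("VLM_API_KEY", "LLM_API_KEY"),
        "model": pick("VLM_MODEL", "LLM_MODEL"),
        "max_image_size": int(env.get("MAX_IMAGE_SIZE", "10485760")),
    }
```

In practice only `VLM_MODEL` needs setting when the same endpoint serves both a text model and a vision model.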

Agent Configuration

Claude Desktop (stdio)

{
  "mcpServers": {
    "websearch": {
      "command": "uvx",
      "args": ["--from", "git+https://github.com/<org>/websearch-mcp", "websearch-mcp"],
      "env": {
        "SEARXNG_URL": "http://localhost:8888",
        "LLM_BASE_URL": "http://localhost:11434/v1",
        "LLM_API_KEY": "ollama",
        "LLM_MODEL": "llama3"
      }
    }
  }
}

Generic MCP Config (stdio)

{
  "command": "uvx",
  "args": ["--from", "git+https://github.com/<org>/websearch-mcp", "websearch-mcp"],
  "env": {
    "SEARXNG_URL": "http://localhost:8888",
    "LLM_BASE_URL": "https://api.openai.com/v1",
    "LLM_API_KEY": "sk-...",
    "LLM_MODEL": "gpt-4o-mini"
  }
}

HTTP Transport

Start the server with the HTTP transport:

websearch-mcp --transport http --port 3000

Then point the MCP client at the server's endpoint:

{
  "url": "http://localhost:3000/mcp"
}

Development

uv sync
uv run pytest tests/ -v

Example Usage

image-description tool

With base64-encoded image:

# Using base64 encoded image data
image_b64 = "iVBORw0KGgoAAAANSUhEUgAAAAEAAAABCAYAAAAfFcSJAAAADUlEQVR42mNk+M9QDwADhgGAWjR9awAAAABJRU5ErkJggg=="
result = await image_description(image_b64)
# Returns: {"description": "A small white square", "success": true, "error": null}

With filesystem path:

# Using absolute filesystem path
result = await image_description("/path/to/image.png")
# Returns: {"description": "A detailed description of the image", "success": true, "error": null}
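Since failures are reported in the result object rather than raised, callers need to check `success` before using `description`. The sketch below assumes the result arrives as a JSON string shaped like the examples above; `describe_or_raise` is a hypothetical helper name.

```python
import json


def describe_or_raise(raw_result: str) -> str:
    """Return the description from a tool result, or raise if the call failed."""
    result = json.loads(raw_result)
    if not result.get("success"):
        # Fall back to a generic message if the server gave no error text.
        raise RuntimeError(result.get("error") or "image description failed")
    return result["description"]
```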

With Ollama (using llava or other VLM):

{
  "mcpServers": {
    "websearch": {
      "command": "uvx",
      "args": ["--from", "git+https://github.com/<org>/websearch-mcp", "websearch-mcp"],
      "env": {
        "SEARXNG_URL": "http://localhost:8888",
        "LLM_BASE_URL": "http://localhost:11434/v1",
        "LLM_API_KEY": "ollama",
        "LLM_MODEL": "llama3",
        "VLM_BASE_URL": "http://localhost:11434/v1",
        "VLM_API_KEY": "ollama",
        "VLM_MODEL": "llava"
      }
    }
  }
}
