MCP SearXNG Enhanced

A Model Context Protocol server that enables web search with category support, website content scraping with citation metadata, and timezone-aware date/time tools.

MCP SearXNG Enhanced Server

A Model Context Protocol (MCP) server for category-aware web search, website scraping, and date/time tools. Designed for seamless integration with SearXNG and modern MCP clients.

Features

  • 🔍 SearXNG-powered web search with category support (general, images, videos, files, map, social media)
  • 📄 Website content scraping with citation metadata and automatic Reddit URL conversion
  • 💾 In-memory caching with automatic freshness validation
  • 🚦 Domain-based rate limiting to prevent service abuse
  • 🕒 Timezone-aware date/time tool
  • ⚠️ Robust error handling with custom exception types
  • 🐳 Dockerized and configurable via environment variables
  • ⚙️ Configuration persistence between container restarts

Quick Start

Prerequisites

  • Docker installed on your system
  • A running SearXNG instance (self-hosted or accessible endpoint)

Installation & Usage

Build the Docker image:

docker build -t overtlids/mcp-searxng-enhanced:latest .

Run with your SearXNG instance (Manual Docker Run):

docker run -i --rm --network=host \
  -e SEARXNG_ENGINE_API_BASE_URL="http://127.0.0.1:8080/search" \
  -e DESIRED_TIMEZONE="America/New_York" \
  overtlids/mcp-searxng-enhanced:latest

This example sets SEARXNG_ENGINE_API_BASE_URL explicitly. It also sets DESIRED_TIMEZONE to America/New_York, which is already the default, so that flag could be omitted. Any variable not passed with -e falls back to the default defined in the Dockerfile (refer to the Environment Variables table below). SEARXNG_ENGINE_API_BASE_URL is the one setting you will usually need to change: set it to your SearXNG instance's address whenever the Dockerfile default (http://host.docker.internal:8080/search) does not apply.

Note on Manual Docker Run: This command runs the Docker container independently. If you are using an MCP client (like Cline in VS Code) to manage this server, the client will start its own instance of the container using the settings defined in its own configuration. For the MCP client to use specific environment variables, they must be configured within the client's settings for this server (see below).

Configure your MCP client (e.g., Cline in VS Code):

For your MCP client to correctly manage and run this server, you must define all necessary environment variables within the client's settings for the overtlids/mcp-searxng-enhanced server. The MCP client will use these settings to construct the docker run command.

The following is the recommended default configuration for this server within your MCP client's JSON settings (e.g., cline_mcp_settings.json). This example explicitly lists all environment variables set to their default values as defined in the Dockerfile. You can copy and paste this directly and then customize any values as needed.

{
  "mcpServers": {
    "overtlids/mcp-searxng-enhanced": {
      "command": "docker",
      "args": [
        "run", "-i", "--rm", "--network=host",
        "-e", "SEARXNG_ENGINE_API_BASE_URL=http://host.docker.internal:8080/search",
        "-e", "DESIRED_TIMEZONE=America/New_York",
        "-e", "ODS_CONFIG_PATH=/config/ods_config.json",
        "-e", "RETURNED_SCRAPPED_PAGES_NO=3",
        "-e", "SCRAPPED_PAGES_NO=5",
        "-e", "PAGE_CONTENT_WORDS_LIMIT=5000",
        "-e", "CITATION_LINKS=True",
        "-e", "MAX_IMAGE_RESULTS=10",
        "-e", "MAX_VIDEO_RESULTS=10",
        "-e", "MAX_FILE_RESULTS=5",
        "-e", "MAX_MAP_RESULTS=5",
        "-e", "MAX_SOCIAL_RESULTS=5",
        "-e", "TRAFILATURA_TIMEOUT=15",
        "-e", "SCRAPING_TIMEOUT=20",
        "-e", "CACHE_MAXSIZE=100",
        "-e", "CACHE_TTL_MINUTES=5",
        "-e", "CACHE_MAX_AGE_MINUTES=30",
        "-e", "RATE_LIMIT_REQUESTS_PER_MINUTE=10",
        "-e", "RATE_LIMIT_TIMEOUT_SECONDS=60",
        "-e", "IGNORED_WEBSITES=",
        "overtlids/mcp-searxng-enhanced:latest"
      ],
      "timeout": 60
    }
  }
}

Key Points for MCP Client Configuration:

  • The example above provides a complete set of arguments to run the Docker container with all environment variables set to their default values.
  • To customize any setting, simply modify the value for the corresponding -e "VARIABLE_NAME=value" line within the args array in your MCP client's configuration. For instance, to change SEARXNG_ENGINE_API_BASE_URL and DESIRED_TIMEZONE, you would adjust their respective lines.
  • Refer to the "Environment Variables" table below for a detailed description of each variable and its default.
  • The server's behavior is primarily controlled by these environment variables. While an ods_config.json file can also influence settings (see Configuration Management), environment variables passed by the MCP client take precedence.
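
If you keep several variants of this configuration, the args array can be generated rather than edited by hand. The following is a minimal sketch and not part of the server: build_client_config is a hypothetical helper, and DEFAULT_ENV lists only three of the variables from the example above.

```python
import json

# A subset of the defaults from the example configuration above.
DEFAULT_ENV = {
    "SEARXNG_ENGINE_API_BASE_URL": "http://host.docker.internal:8080/search",
    "DESIRED_TIMEZONE": "America/New_York",
    "CITATION_LINKS": "True",
}

IMAGE = "overtlids/mcp-searxng-enhanced:latest"


def build_client_config(overrides=None):
    """Return an MCP client entry whose args pass each variable via -e."""
    env = {**DEFAULT_ENV, **(overrides or {})}
    args = ["run", "-i", "--rm", "--network=host"]
    for name, value in env.items():
        args += ["-e", f"{name}={value}"]
    args.append(IMAGE)
    return {
        "mcpServers": {
            "overtlids/mcp-searxng-enhanced": {
                "command": "docker",
                "args": args,
                "timeout": 60,
            }
        }
    }


if __name__ == "__main__":
    config = build_client_config({"DESIRED_TIMEZONE": "Europe/Berlin"})
    print(json.dumps(config, indent=2))
```

The printed JSON can be pasted into the client's settings file; variables that are left out fall back to the Dockerfile defaults.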

Running Natively (Without Docker)

If you prefer to run the server directly using Python without Docker, follow these steps:

1. Python Installation:

  • This server requires Python 3.9 or newer. Python 3.11 (as used in the Docker image) is recommended.
  • You can download Python from python.org.

2. Clone the Repository:

  • Get the code from GitHub:
    git clone https://github.com/OvertliDS/mcp-searxng-enhanced.git
    cd mcp-searxng-enhanced
    

3. Create and Activate a Virtual Environment (Recommended):

  • Using a virtual environment helps manage dependencies and avoid conflicts with other Python projects.
    # For Linux/macOS
    python3 -m venv .venv
    source .venv/bin/activate
    
    # For Windows (Command Prompt)
    python -m venv .venv
    .\.venv\Scripts\activate.bat
    
    # For Windows (PowerShell)
    python -m venv .venv
    .\.venv\Scripts\Activate.ps1
    

4. Install Dependencies:

  • Install the required Python packages:
    pip install -r requirements.txt
    
    Key dependencies include httpx, BeautifulSoup4, pydantic, trafilatura, python-dateutil, and cachetools. The server also uses zoneinfo, which ships with the standard library in Python 3.9 and newer.

5. Ensure SearXNG is Accessible:

  • You still need a running SearXNG instance. Make sure you have its API base URL (e.g., http://127.0.0.1:8080/search).

6. Set Environment Variables:

  • The server is configured via environment variables. At a minimum, you'll likely need to set SEARXNG_ENGINE_API_BASE_URL.
  • Linux/macOS (bash/zsh):
    export SEARXNG_ENGINE_API_BASE_URL="http://your-searxng-instance:port/search"
    export DESIRED_TIMEZONE="America/Los_Angeles"
    
  • Windows (Command Prompt):
    set "SEARXNG_ENGINE_API_BASE_URL=http://your-searxng-instance:port/search"
    set "DESIRED_TIMEZONE=America/Los_Angeles"
    
  • Windows (PowerShell):
    $env:SEARXNG_ENGINE_API_BASE_URL="http://your-searxng-instance:port/search"
    $env:DESIRED_TIMEZONE="America/Los_Angeles"
    
  • Refer to the "Environment Variables" table below for all available options. If not set, defaults from the script or an ods_config.json file (if present in the root directory or at ODS_CONFIG_PATH) will be used.

7. Run the Server:

  • Execute the Python script:
    python mcp_server.py
    
  • The server will start and listen for MCP client connections via stdin/stdout.

8. Configuration File (ods_config.json):

  • Alternatively, or in combination with environment variables, you can create an ods_config.json file in the project's root directory (or the path specified by the ODS_CONFIG_PATH environment variable). Environment variables will always take precedence over values in this file. Example:
```json
{
  "searxng_engine_api_base_url": "http://127.0.0.1:8080/search",
  "desired_timezone": "America/New_York"
}
```

Environment Variables

The following environment variables control the server's behavior. You can set them in your MCP client's configuration (recommended for client-managed servers) or when running Docker manually.

| Variable | Description | Default (from Dockerfile) | Notes |
|---|---|---|---|
| SEARXNG_ENGINE_API_BASE_URL | SearXNG search endpoint | http://host.docker.internal:8080/search | Crucial for server operation |
| DESIRED_TIMEZONE | Timezone for date/time tool | America/New_York | E.g., America/Los_Angeles. List of tz database time zones: https://en.wikipedia.org/wiki/List_of_tz_database_time_zones |
| ODS_CONFIG_PATH | Path to persistent configuration file | /config/ods_config.json | Typically left as default within the container. |
| RETURNED_SCRAPPED_PAGES_NO | Max pages to return per search | 3 | |
| SCRAPPED_PAGES_NO | Max pages to attempt scraping | 5 | |
| PAGE_CONTENT_WORDS_LIMIT | Max words per scraped page | 5000 | |
| CITATION_LINKS | Enable/disable citation events | True | True or False |
| MAX_IMAGE_RESULTS | Maximum image results to return | 10 | |
| MAX_VIDEO_RESULTS | Maximum video results to return | 10 | |
| MAX_FILE_RESULTS | Maximum file results to return | 5 | |
| MAX_MAP_RESULTS | Maximum map results to return | 5 | |
| MAX_SOCIAL_RESULTS | Maximum social media results to return | 5 | |
| TRAFILATURA_TIMEOUT | Content extraction timeout (seconds) | 15 | |
| SCRAPING_TIMEOUT | HTTP request timeout (seconds) | 20 | |
| CACHE_MAXSIZE | Maximum number of cached websites | 100 | |
| CACHE_TTL_MINUTES | Cache time-to-live (minutes) | 5 | |
| CACHE_MAX_AGE_MINUTES | Maximum age for cached content (minutes) | 30 | |
| RATE_LIMIT_REQUESTS_PER_MINUTE | Max requests per domain per minute | 10 | |
| RATE_LIMIT_TIMEOUT_SECONDS | Rate limit tracking window (seconds) | 60 | |
| IGNORED_WEBSITES | Comma-separated list of sites to ignore | "" (empty) | E.g., "example.com,another.org" |
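
Every value arrives as a string, so it has to be converted before use. How the server does this is not shown in this README; the following is a minimal sketch of one plausible approach. parse_bool and parse_site_list are hypothetical helper names, and accepting spellings other than True and False is an assumption.

```python
import os


def parse_bool(raw, default=False):
    """Interpret strings such as 'True' or 'false'; fall back on anything else."""
    if raw is None:
        return default
    text = raw.strip().lower()
    if text in ("true", "1", "yes"):
        return True
    if text in ("false", "0", "no"):
        return False
    return default


def parse_site_list(raw):
    """Split a comma-separated value such as 'example.com,another.org'."""
    if not raw:
        return []
    return [site.strip() for site in raw.split(",") if site.strip()]


if __name__ == "__main__":
    citation_links = parse_bool(os.environ.get("CITATION_LINKS"), default=True)
    ignored = parse_site_list(os.environ.get("IGNORED_WEBSITES", ""))
    words_limit = int(os.environ.get("PAGE_CONTENT_WORDS_LIMIT", "5000"))
    print(citation_links, ignored, words_limit)
```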

Configuration Management

The server uses a three-tier configuration approach:

  1. Script defaults (hardcoded in Python)
  2. Config file (loaded from ODS_CONFIG_PATH, defaults to /config/ods_config.json)
  3. Environment variables (highest precedence)

The config file is only updated when:

  • The file doesn't exist yet (first-time initialization)
  • Environment variables are explicitly provided for the current run

This ensures that user configurations are preserved between container restarts when no new environment variables are set.
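
The precedence order above can be sketched as a simple layered merge. This is an illustration under stated assumptions, not the server's actual code: read_config_file and resolve_config are hypothetical names, only two settings are shown, and each environment variable is assumed to be the upper-case form of its config-file key (as in the ods_config.json example earlier).

```python
import json
import os
from pathlib import Path

# Tier 1: illustrative script defaults (only two settings shown).
SCRIPT_DEFAULTS = {
    "searxng_engine_api_base_url": "http://host.docker.internal:8080/search",
    "desired_timezone": "America/New_York",
}


def read_config_file(config_path):
    """Return the config file's contents, or {} when the file is absent."""
    path = Path(config_path)
    if not path.is_file():
        return {}
    return json.loads(path.read_text(encoding="utf-8"))


def resolve_config(defaults, file_values, environ):
    """Merge the three tiers; later tiers override earlier ones."""
    config = dict(defaults)
    config.update(file_values)  # Tier 2: config file
    for key in defaults:  # Tier 3: environment variables
        value = environ.get(key.upper())
        if value is not None:
            config[key] = value
    return config


if __name__ == "__main__":
    path = os.environ.get("ODS_CONFIG_PATH", "/config/ods_config.json")
    print(resolve_config(SCRIPT_DEFAULTS, read_config_file(path), os.environ))
```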

Tools & Aliases

| Tool Name | Purpose | Aliases |
|---|---|---|
| search_web | Web search via SearXNG | search, web_search, find, lookup_web, search_online, access_internet, lookup* |
| get_website | Scrape website content | fetch_url, scrape_page, get, load_website, lookup* |
| get_current_datetime | Current date/time | current_time, get_time, current_date |

*lookup is context-sensitive:

  • If called with a url argument, it maps to get_website
  • Otherwise, it maps to search_web
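
The alias rules above, including the context-sensitive lookup, can be sketched as follows. The alias names come from the table; resolve_tool is a hypothetical function name and the server's real dispatch logic may differ.

```python
# Aliases from the Tools & Aliases table; "lookup" is handled separately.
ALIASES = {
    "search": "search_web",
    "web_search": "search_web",
    "find": "search_web",
    "lookup_web": "search_web",
    "search_online": "search_web",
    "access_internet": "search_web",
    "fetch_url": "get_website",
    "scrape_page": "get_website",
    "get": "get_website",
    "load_website": "get_website",
    "current_time": "get_current_datetime",
    "get_time": "get_current_datetime",
    "current_date": "get_current_datetime",
}
CANONICAL = {"search_web", "get_website", "get_current_datetime"}


def resolve_tool(name, arguments):
    """Map a requested tool name (or alias) to its canonical tool name."""
    if name in CANONICAL:
        return name
    if name == "lookup":
        # A url argument means scraping; anything else is a search.
        return "get_website" if "url" in arguments else "search_web"
    if name in ALIASES:
        return ALIASES[name]
    raise ValueError(f"Unknown tool: {name}")
```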

Example: Calling Tools

Web Search

{ "name": "search_web", "arguments": { "query": "open source ai" } }

or using an alias:

{ "name": "search", "arguments": { "query": "open source ai" } }

Category-Specific Search

{ "name": "search_web", "arguments": { "query": "landscapes", "category": "images" } }

Website Scraping

{ "name": "get_website", "arguments": { "url": "example.com" } }

or using an alias:

{ "name": "lookup", "arguments": { "url": "example.com" } }

Current Date/Time

{ "name": "get_current_datetime", "arguments": {} }

or:

{ "name": "current_time", "arguments": {} }
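
Each object above is the name/arguments pair of a tool call. MCP clients normally wrap that pair for you in a JSON-RPC 2.0 request with the method tools/call. The following minimal sketch shows the wrapping for anyone testing the server by hand; make_tool_call is a hypothetical helper and the request id is arbitrary. A real session also begins with an initialize handshake, which is omitted here.

```python
import json


def make_tool_call(name, arguments, request_id=1):
    """Wrap a tool name and its arguments in a JSON-RPC 2.0 tools/call request."""
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": name, "arguments": arguments},
    }


if __name__ == "__main__":
    request = make_tool_call(
        "search_web", {"query": "landscapes", "category": "images"}
    )
    # One JSON message per line, as used on a stdin/stdout connection.
    print(json.dumps(request))
```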

Advanced Features

Category-Specific Search

The search_web tool supports different categories with tailored outputs:

  • images: Returns image URLs, titles, and source pages with optional Markdown embedding
  • videos: Returns video information including titles, source, and embed URLs
  • files: Returns downloadable file information including format and size
  • map: Returns location data including coordinates and addresses
  • social media: Returns posts and profiles from social platforms
  • general: Default category that scrapes and returns full webpage content
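
A category ultimately becomes a query parameter on the SearXNG search endpoint. The sketch below builds such a URL using the q, categories, and format parameters of the SearXNG search API. Whether the server sends exactly these parameters is an assumption, and build_search_url is a hypothetical helper. JSON output must also be enabled on the SearXNG instance for format=json to work.

```python
from urllib.parse import urlencode

# Categories listed in this README.
CATEGORIES = ("general", "images", "videos", "files", "map", "social media")


def build_search_url(base_url, query, category="general"):
    """Return a SearXNG search URL for the given query and category."""
    if category not in CATEGORIES:
        raise ValueError(f"Unsupported category: {category!r}")
    params = {"q": query, "categories": category, "format": "json"}
    return f"{base_url}?{urlencode(params)}"


if __name__ == "__main__":
    print(build_search_url("http://127.0.0.1:8080/search", "landscapes", "images"))
```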

Reddit URL Conversion

When scraping Reddit content, URLs are automatically converted to use the old.reddit.com domain for better content extraction.
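
A minimal sketch of such a conversion, using only the standard library. The function name to_old_reddit and the exact set of hosts rewritten are assumptions; the server's own rules may cover more cases.

```python
from urllib.parse import urlsplit, urlunsplit

# Hosts rewritten in this sketch; the real list may be longer.
REDDIT_HOSTS = {"reddit.com", "www.reddit.com"}


def to_old_reddit(url):
    """Rewrite reddit.com URLs to old.reddit.com; leave other URLs untouched."""
    parts = urlsplit(url)
    if parts.hostname in REDDIT_HOSTS:
        parts = parts._replace(netloc="old.reddit.com")
    return urlunsplit(parts)


if __name__ == "__main__":
    print(to_old_reddit("https://www.reddit.com/r/selfhosted/"))
```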

Rate Limiting

Domain-based rate limiting caps the number of requests sent to the same domain within a time window (see RATE_LIMIT_REQUESTS_PER_MINUTE and RATE_LIMIT_TIMEOUT_SECONDS). This avoids overwhelming target websites and reduces the risk of IP blocking.
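
One common way to implement this is a sliding window of timestamps per domain. The sketch below illustrates the idea and is not the server's code: DomainRateLimiter is a hypothetical class, the exception is a local stand-in for the server's RateLimitExceededError, and the clock parameter exists only to make the example testable.

```python
import time
from collections import defaultdict, deque
from urllib.parse import urlsplit


class RateLimitExceededError(Exception):
    """Local stand-in for the server's exception of the same name."""


class DomainRateLimiter:
    """Allow at most `limit` requests per domain within `window` seconds."""

    def __init__(self, requests_per_minute=10, window_seconds=60,
                 clock=time.monotonic):
        self.limit = requests_per_minute
        self.window = window_seconds
        self.clock = clock
        self._hits = defaultdict(deque)  # domain -> timestamps of recent requests

    def check(self, url):
        """Record a request for the URL's domain or raise if over the limit."""
        domain = urlsplit(url).hostname or url
        now = self.clock()
        hits = self._hits[domain]
        # Drop timestamps that have left the tracking window.
        while hits and now - hits[0] >= self.window:
            hits.popleft()
        if len(hits) >= self.limit:
            raise RateLimitExceededError(f"Rate limit exceeded for {domain}")
        hits.append(now)
```

Calling check before each scrape keeps the limits independent per domain, so a burst against one site does not block requests to another.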

Cache Validation

Cached website content is automatically validated for freshness based on age. Stale content is refreshed automatically while valid cached content is served quickly.
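
The freshness check can be pictured as storing a timestamp next to each cached page and comparing it with a maximum age on every read. The sketch below uses a single age threshold and a simple oldest-first eviction. How the server combines CACHE_TTL_MINUTES and CACHE_MAX_AGE_MINUTES is not described here, so treat FreshnessCache and its parameters as hypothetical.

```python
import time


class FreshnessCache:
    """Cache page content and refetch it once it is older than max_age_seconds."""

    def __init__(self, max_age_seconds=30 * 60, maxsize=100, clock=time.monotonic):
        self.max_age = max_age_seconds
        self.maxsize = maxsize
        self.clock = clock
        self._entries = {}  # url -> (stored_at, content)

    def get(self, url, fetch):
        """Return fresh cached content, or call fetch(url) and cache the result."""
        now = self.clock()
        entry = self._entries.get(url)
        if entry is not None and now - entry[0] < self.max_age:
            return entry[1]  # still fresh: serve from cache
        content = fetch(url)  # missing or stale: refresh
        if url not in self._entries and len(self._entries) >= self.maxsize:
            # Evict the entry that was stored longest ago.
            oldest = min(self._entries, key=lambda key: self._entries[key][0])
            del self._entries[oldest]
        self._entries[url] = (now, content)
        return content
```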

Error Handling

The server implements a robust error handling system with these exception types:

  • MCPServerError: Base exception class for all server errors
  • ConfigurationError: Raised when configuration values are invalid
  • SearXNGConnectionError: Raised when connection to SearXNG fails
  • WebScrapingError: Raised when web scraping fails
  • RateLimitExceededError: Raised when rate limit for a domain is exceeded

Errors are properly propagated to the client with informative messages.
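
A minimal sketch of how such a hierarchy and its propagation might look. The class names come from the list above; deriving every class from MCPServerError follows from its description as the base class. The to_tool_error helper is hypothetical. It uses the isError/content shape that MCP defines for tool results, but whether the server formats errors exactly this way is an assumption.

```python
class MCPServerError(Exception):
    """Base exception class for all server errors."""


class ConfigurationError(MCPServerError):
    """Raised when configuration values are invalid."""


class SearXNGConnectionError(MCPServerError):
    """Raised when connection to SearXNG fails."""


class WebScrapingError(MCPServerError):
    """Raised when web scraping fails."""


class RateLimitExceededError(MCPServerError):
    """Raised when rate limit for a domain is exceeded."""


def to_tool_error(exc):
    """Turn an exception into an MCP-style tool result flagged as an error."""
    return {
        "isError": True,
        "content": [{"type": "text", "text": f"{type(exc).__name__}: {exc}"}],
    }


if __name__ == "__main__":
    try:
        raise WebScrapingError("timed out while fetching example.com")
    except MCPServerError as exc:  # one handler covers every server error
        print(to_tool_error(exc))
```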

Troubleshooting

  • Cannot connect to SearXNG: Ensure your SearXNG instance is running and the SEARXNG_ENGINE_API_BASE_URL environment variable points to the correct endpoint.
  • Rate limit errors: Increase RATE_LIMIT_REQUESTS_PER_MINUTE if requests to the same domain are rejected too often.
  • Slow content extraction: Increase TRAFILATURA_TIMEOUT to allow more time for content processing on complex pages.
  • Docker networking issues: On Docker Desktop for Windows or macOS, host.docker.internal resolves to the host machine. On Linux it may not resolve by default, so use the host's IP address instead, or 127.0.0.1 when the container runs with --network=host.

Acknowledgements

Inspired by:

License

MIT License © 2025 OvertliDS
