crawl4ai-mcp
Okay, here's a Python outline and explanation of how you might structure an MCP (Model Context Protocol) server that wraps the Crawl4AI library. This is a conceptual framework; you'll need to fill in the details based on your specific requirements and the Crawl4AI library's API. **Conceptual Overview** 1. **MCP Server:** This will be the main entry point. It listens for requests conforming to the MCP standard. These requests will specify which Crawl4AI function to execute and provide the necessary parameters. 2. **Crawl4AI Wrapper:** This layer translates MCP requests into calls to the Crawl4AI library. It handles parameter conversion, error handling, and result formatting. 3. **Crawl4AI Library:** This is the core library that performs the actual crawling and AI-related tasks. **Python Code Outline** ```python # Import necessary libraries import json from http.server import BaseHTTPRequestHandler, HTTPServer # For a simple HTTP server # Or use a more robust framework like Flask or FastAPI # from flask import Flask, request, jsonify # Example with Flask # Assuming Crawl4AI is installed and importable import crawl4ai # Replace with the actual import statement # --- Crawl4AI Wrapper --- class Crawl4AIWrapper: def __init__(self): # Initialize any necessary Crawl4AI resources here pass def crawl_website(self, url, max_depth=1): """ Wraps the Crawl4AI website crawling function. Args: url (str): The URL to start crawling from. max_depth (int): The maximum depth to crawl. Returns: dict: A dictionary containing the crawling results. Format this according to your MCP requirements. Could include: - `status`: "success" or "error" - `data`: The crawled data (e.g., list of URLs, extracted text) - `error_message`: If an error occurred. """ try: # Call the Crawl4AI function results = crawl4ai.crawl_website(url, max_depth=max_depth) # Replace with actual Crawl4AI call # Format the results into an MCP-compatible dictionary response = { "status": "success", "data": results # Adapt this to the MCP format } return response except Exception as e: # Handle errors gracefully response = { "status": "error", "error_message": str(e) } return response def analyze_text(self, text): """ Wraps the Crawl4AI text analysis function. Args: text (str): The text to analyze. Returns: dict: A dictionary containing the analysis results. """ try: analysis_results = crawl4ai.analyze_text(text) # Replace with actual Crawl4AI call response = { "status": "success", "data": analysis_results } return response except Exception as e: response = { "status": "error", "error_message": str(e) } return response # Add more wrapper functions for other Crawl4AI functionalities # --- MCP Server (Simple HTTP Server Example) --- class MCPRequestHandler(BaseHTTPRequestHandler): def do_POST(self): """Handles POST requests (MCP requests).""" content_length = int(self.headers['Content-Length']) post_data = self.rfile.read(content_length) try: request_data = json.loads(post_data.decode('utf-8')) # Process the request response = self.process_request(request_data) # Send the response self.send_response(200) # OK self.send_header('Content-type', 'application/json') self.end_headers() self.wfile.write(json.dumps(response).encode('utf-8')) except json.JSONDecodeError: self.send_response(400) # Bad Request self.send_header('Content-type', 'application/json') self.end_headers() self.wfile.write(json.dumps({"status": "error", "error_message": "Invalid JSON"}).encode('utf-8')) except Exception as e: self.send_response(500) # Internal Server Error self.send_header('Content-type', 'application/json') self.end_headers() self.wfile.write(json.dumps({"status": "error", "error_message": str(e)}).encode('utf-8')) def process_request(self, request_data): """ Processes the MCP request and calls the appropriate Crawl4AI function. Args: request_data (dict): The JSON-decoded MCP request. This should contain information like the function name and parameters. Returns: dict: The response from the Crawl4AI wrapper. """ global crawl4ai_wrapper # Access the global instance try: function_name = request_data.get("function") parameters = request_data.get("parameters", {}) # Default to empty dict if function_name == "crawl_website": url = parameters.get("url") max_depth = parameters.get("max_depth", 1) # Default max_depth if not url: return {"status": "error", "error_message": "Missing 'url' parameter"} return crawl4ai_wrapper.crawl_website(url, max_depth) elif function_name == "analyze_text": text = parameters.get("text") if not text: return {"status": "error", "error_message": "Missing 'text' parameter"} return crawl4ai_wrapper.analyze_text(text) else: return {"status": "error", "error_message": "Invalid function name"} except Exception as e: return {"status": "error", "error_message": str(e)} def run_server(server_class=HTTPServer, handler_class=MCPRequestHandler, port=8000): """Starts the MCP server.""" server_address = ('', port) httpd = server_class(server_address, handler_class) print(f"Starting MCP server on port {port}") httpd.serve_forever() # --- Main --- if __name__ == "__main__": # Initialize the Crawl4AI wrapper crawl4ai_wrapper = Crawl4AIWrapper() # Create a global instance # Start the server run_server() # --- Example MCP Request (JSON) --- # { # "function": "crawl_website", # "parameters": { # "url": "https://www.example.com", # "max_depth": 2 # } # } # --- Example MCP Request (JSON) --- # { # "function": "analyze_text", # "parameters": { # "text": "This is some text to analyze." # } # } ``` **Key Improvements and Explanations** * **Error Handling:** Includes `try...except` blocks to catch potential errors during Crawl4AI calls and JSON processing. Returns error messages in the MCP response. * **Parameter Handling:** The `process_request` function extracts parameters from the JSON request and passes them to the Crawl4AI wrapper functions. It also includes default values for optional parameters. It checks for missing required parameters. * **MCP-Compliant Responses:** The responses are formatted as JSON dictionaries with a `status` field ("success" or "error") and either a `data` field (for successful results) or an `error_message` field. Adapt the `data` format to your specific MCP requirements. * **Function Dispatch:** The `process_request` function uses `if/elif/else` to dispatch the request to the correct Crawl4AI wrapper function based on the `function` field in the MCP request. * **Crawl4AI Wrapper Class:** Encapsulates the Crawl4AI library calls within a class. This allows you to initialize resources (e.g., API keys, models) in the `__init__` method and reuse them across multiple requests. * **Global Crawl4AI Wrapper Instance:** A global instance `crawl4ai_wrapper` is created to avoid re-initializing the wrapper for each request. This can improve performance if the wrapper initialization is expensive. * **Example MCP Requests:** Includes example JSON requests that you can use to test the server. * **Clearer Structure:** Separates the Crawl4AI wrapper logic from the MCP server logic for better organization. * **Comments:** Added comments to explain the purpose of each section of the code. * **Uses `json` library:** Uses the standard `json` library for encoding and decoding JSON data. * **HTTP Status Codes:** Returns appropriate HTTP status codes (200, 400, 500) to indicate the success or failure of the request. * **Flexibility:** The code is designed to be easily extended to support more Crawl4AI functions. Just add more wrapper functions to the `Crawl4AIWrapper` class and update the `process_request` function to handle the new function names. **How to Use** 1. **Install Crawl4AI:** Make sure you have the Crawl4AI library installed (`pip install crawl4ai` or however it's installed). *Replace `crawl4ai` with the actual package name if it's different.* 2. **Replace Placeholders:** Replace the placeholder `crawl4ai.crawl_website()` and `crawl4ai.analyze_text()` calls with the actual calls to the Crawl4AI library. Adapt the parameter passing and result formatting to match the Crawl4AI API. 3. **Define MCP Format:** Clearly define the format of your MCP requests and responses. The code assumes a JSON-based format with a `function` field and a `parameters` field. 4. **Run the Server:** Run the Python script. It will start an HTTP server on port 8000 (by default). 5. **Send MCP Requests:** Send HTTP POST requests to the server with the MCP requests in the body. Use a tool like `curl`, `Postman`, or a Python `requests` library. **Example using `curl`:** ```bash curl -X POST -H "Content-Type: application/json" -d '{ "function": "crawl_website", "parameters": { "url": "https://www.example.com", "max_depth": 1 } }' http://localhost:8000 ``` **Important Considerations** * **Security:** This is a *very basic* HTTP server. For production environments, use a more robust framework like Flask or FastAPI, and implement proper security measures (authentication, authorization, input validation, etc.). * **Asynchronous Operations:** If Crawl4AI operations are long-running, consider using asynchronous programming (e.g., `asyncio` with FastAPI) to avoid blocking the server. * **Scalability:** For high-volume traffic, you'll need to consider scalability. This might involve using a load balancer, multiple server instances, and a message queue for handling requests. * **Error Logging:** Implement proper error logging to help you debug and monitor the server. * **MCP Standard:** Ensure that your implementation fully conforms to the MCP standard. This includes the request and response formats, error codes, and any other requirements. * **Crawl4AI API:** Thoroughly understand the Crawl4AI library's API and how to use its functions effectively. * **Rate Limiting:** Implement rate limiting to prevent abuse of your server and to comply with the terms of service of the websites you are crawling. * **User Agent:** Set a proper user agent string when crawling websites to identify your crawler and avoid being blocked. * **Robots.txt:** Respect the `robots.txt` file of the websites you are crawling. **Chinese Translation of Key Terms** * **MCP (Model Context Protocol):** 模型上下文协议 (Móxíng Shàngxiàwén Xiéyì) * **Crawl4AI:** 网络爬虫AI库 (Wǎngluò Páchóng AI Kù) or AI爬虫库 (AI Páchóng Kù) * **Server:** 服务器 (Fúwùqì) * **Wrapper:** 封装器 (Fēngzhuāngqì) or 包装器 (Bāozhuāngqì) * **Function:** 函数 (Hánshù) * **Parameter:** 参数 (Cānshù) * **Request:** 请求 (Qǐngqiú) * **Response:** 响应 (Xiǎngyìng) * **Error:** 错误 (Cuòwù) * **Status:** 状态 (Zhuàngtài) * **Data:** 数据 (Shùjù) * **URL:** 网址 (Wǎngzhǐ) * **Text:** 文本 (Wénběn) * **Analysis:** 分析 (Fēnxī) * **JSON:** JSON (JSON) (commonly used without translation) * **HTTP:** HTTP (HTTP) (commonly used without translation) This comprehensive outline should give you a solid foundation for building your MCP server with Crawl4AI. Remember to adapt the code to your specific needs and the Crawl4AI library's API. Good luck!
wyattowalsh
README
crawl4ai-mcp
推荐服务器
Playwright MCP Server
一个模型上下文协议服务器,它使大型语言模型能够通过结构化的可访问性快照与网页进行交互,而无需视觉模型或屏幕截图。
Magic Component Platform (MCP)
一个由人工智能驱动的工具,可以从自然语言描述生成现代化的用户界面组件,并与流行的集成开发环境(IDE)集成,从而简化用户界面开发流程。
MCP Package Docs Server
促进大型语言模型高效访问和获取 Go、Python 和 NPM 包的结构化文档,通过多语言支持和性能优化来增强软件开发。
Claude Code MCP
一个实现了 Claude Code 作为模型上下文协议(Model Context Protocol, MCP)服务器的方案,它可以通过标准化的 MCP 接口来使用 Claude 的软件工程能力(代码生成、编辑、审查和文件操作)。
@kazuph/mcp-taskmanager
用于任务管理的模型上下文协议服务器。它允许 Claude Desktop(或任何 MCP 客户端)在基于队列的系统中管理和执行任务。
mermaid-mcp-server
一个模型上下文协议 (MCP) 服务器,用于将 Mermaid 图表转换为 PNG 图像。
Jira-Context-MCP
MCP 服务器向 AI 编码助手(如 Cursor)提供 Jira 工单信息。

Linear MCP Server
一个模型上下文协议(Model Context Protocol)服务器,它与 Linear 的问题跟踪系统集成,允许大型语言模型(LLM)通过自然语言交互来创建、更新、搜索和评论 Linear 问题。

Sequential Thinking MCP Server
这个服务器通过将复杂问题分解为顺序步骤来促进结构化的问题解决,支持修订,并通过完整的 MCP 集成来实现多条解决方案路径。
Curri MCP Server
通过管理文本笔记、提供笔记创建工具以及使用结构化提示生成摘要,从而实现与 Curri API 的交互。