
Grabba MCP Server
Microservice Connector Protocol server that exposes Grabba API functionalities as callable tools, allowing AI agents and applications to interact with Grabba's data extraction and management services.
README
Grabba MCP Server
This repository contains the Grabba Microservice Connector Protocol (MCP) server, designed to expose Grabba API functionalities as a set of callable tools. Built on FastMCP
, this server allows AI agents, orchestrators (like LangChain), and other applications to seamlessly interact with the Grabba data extraction and management services.
Table of Contents
- Features
- Getting Started
- Configuration
- Available Tools
- Connecting to the MCP Server
- Development Notes
- Links & Resources
- License
Features
- Grabba API Exposure: Exposes key Grabba API functionalities (data extraction, job management, statistics) as accessible tools.
- Multiple Transports: Supports
stdio
,streamable-http
, andsse
transports, offering flexibility for different deployment and client scenarios. - Dependency Injection: Leverages FastAPI's robust dependency injection for secure and efficient
GrabbaService
initialization (e.g., handling API keys). - Containerized Deployment: Optimized for Docker for easy packaging and deployment.
- Configurable: Allows configuration via environment variables and command-line arguments.
Getting Started
Prerequisites
- Python 3.10+
- Docker (for containerized deployment)
- A Grabba API Key (you can get one from the Grabba website)
Installation
Via PyPI (Recommended)
The grabba-mcp
package is available on PyPI. This is the simplest way to get started.
pip install grabba-mcp
From Source (Development)
If you plan to contribute or modify the server, you'll want to install from source.
-
Clone the repository:
git clone https://github.com/grabba-dev/grabba-mcp cd grabba-mcp
-
Install Poetry: If you don't have Poetry installed, follow their official guide:
pip install poetry
-
Install project dependencies: Navigate to the
apps/mcp
directory wherepyproject.toml
resides, then install:cd apps/mcp poetry install
Running the Server
Locally
After installation (either via pip
or from source), you can run the server.
-
Create a
.env
file: In theapps/mcp
directory (if running from source) or the directory from which you'll execute thegrabba-mcp
command, create a.env
file and add your Grabba API key:API_KEY="YOUR_API_KEY_HERE" # Optional: configure the server port PORT=8283 # Optional: configure the default transport (overridden by CLI) MCP_SERVER_TRANSPORT="streamable-http"
-
Execute the server:
-
If installed via
pip
:grabba-mcp
To specify a transport via command line:
grabba-mcp streamable-http
-
If running from source (using Poetry):
cd apps/mcp poetry run python src/server.py
To specify a transport via command line:
poetry run python src/server.py stdio
You should see output indicating the server is starting and listening on the specified port (e.g.,
http://0.0.0.0:8283
) if using HTTP transports. Note that thestdio
transport will exit after a single request/response cycle, making it unsuitable for persistent services. -
Docker Container
A pre-built Docker image is available on Docker Hub, making deployment straightforward.
-
Pull the image:
docker pull itsobaa/grabba-mcp:latest
-
Run the container: For a persistent server, you'll typically use the
streamable-http
transport and map ports.docker run -d \ -p 8283:8283 \ -e API_KEY="YOUR_API_KEY_HERE" \ -e MCP_SERVER_TRANSPORT="streamable-http" \ itsobaa/grabba-mcp:latest
You can also use
docker-compose
for more complex setups:# docker-compose.yml version: '3.8' services: grabba-mcp: image: itsobaa/grabba-mcp:latest container_name: grabba-mcp environment: API_KEY: ${API_KEY} # Reads from a .env file next to docker-compose.yml MCP_SERVER_TRANSPORT: streamable-http PORT: 8283 ports: - "8283:8283" healthcheck: test: ["CMD-SHELL", "curl -f http://localhost:8283/tools/openapi.json || exit 1"] interval: 10s timeout: 5s retries: 5
With a
docker-compose.yml
file, create a.env
file next to it (e.g.,API_KEY="YOUR_API_KEY_HERE"
) and run:docker-compose up -d
Public Instance
The Grabba MCP Server is publicly accessible at:
- URL:
https://mcp.grabba.dev/
- Transports: Supports
sse
andstreamable-http
. - Authentication: Requires an
API_KEY
header with your Grabba API key.
Configuration
The server can be configured via environment variables and command-line arguments.
Environment Variables
API_KEY
(Required): Your Grabba API key. This is critical for authenticating with Grabba services.PORT
(Optional, default:8283
): The port on which the MCP server's HTTP transports (streamable-http
,sse
) will listen.MCP_SERVER_TRANSPORT
(Optional, default:stdio
): The default transport protocol for the MCP server. Can bestdio
,streamable-http
, orsse
.
Command-Line Arguments
The server also accepts a single positional command-line argument which overrides MCP_SERVER_TRANSPORT
:
grabba-mcp [transport_protocol]
# or for source: python src/server.py [transport_protocol]
[transport_protocol]
: Can bestdio
,streamable-http
, orsse
.- Example:
grabba-mcp streamable-http
- Example:
Available Tools
The Grabba MCP Server exposes a suite of tools that wrap the Grabba Python SDK functionalities.
Authentication
For streamable-http
and sse
transports, authentication is performed by including an API_KEY
HTTP header with your Grabba API Key.
Example: API_KEY: YOUR_API_KEY_HERE
For stdio
transport, the API_KEY
environment variable must be set in the environment where the grabba-mcp
command is executed, as there are no HTTP headers in this communication mode.
Tool Details
extract_data
- Description: Schedules a new data extraction job with Grabba. Suitable for web search tasks.
- Input:
Job
object (Pydantic model) detailing the extraction tasks. - Output:
tuple[str, Optional[Dict]]
- A message and theJobResult
as a dictionary.
schedule_existing_job
- Description: Schedules an existing Grabba job to run immediately.
- Input:
job_id
(string) - The ID of the existing job. - Output:
tuple[str, Optional[Dict]]
- A message and theJobResult
as a dictionary.
fetch_all_jobs
- Description: Fetches all Grabba jobs for the current user.
- Input: None.
- Output:
tuple[str, Optional[List[Job]]]
- A message and a list ofJob
objects.
fetch_specific_job
- Description: Fetches details of a specific Grabba job by its ID.
- Input:
job_id
(string) - The ID of the job. - Output:
tuple[str, Optional[Job]]
- A message and theJob
object.
delete_job
- Description: Deletes a specific Grabba job.
- Input:
job_id
(string) - The ID of the job to delete. - Output:
tuple[str, None]
- A success message.
fetch_job_result
- Description: Fetches results of a completed Grabba job by its result ID.
- Input:
job_result_id
(string) - The ID of the job result. - Output:
tuple[str, Optional[Dict]]
- A message and the job result data as a dictionary.
delete_job_result
- Description: Deletes results of a completed Grabba job.
- Input:
job_result_id
(string) - The ID of the job result to delete. - Output:
tuple[str, None]
- A success message.
fetch_stats_data
- Description: Fetches usage statistics and current user token balance for Grabba.
- Input: None.
- Output:
tuple[str, Optional[JobStats]]
- A message and theJobStats
object.
estimate_job_cost
- Description: Estimates the cost of a Grabba job before creation or scheduling.
- Input:
Job
object (Pydantic model) detailing the extraction tasks. - Output:
tuple[str, Optional[Dict]]
- A message and the estimated cost details as a dictionary.
create_job
- Description: Creates a new data extraction job in Grabba without immediately scheduling it for execution.
- Input:
Job
object (Pydantic model) detailing the extraction tasks. - Output:
tuple[str, Optional[Job]]
- A message and the createdJob
object.
fetch_available_regions
- Description: Fetches a list of all available puppet (web agent) regions that can be used for scheduling web data extractions.
- Input: None.
- Output:
tuple[str, Optional[List[PuppetRegion]]]
- A message and a list ofPuppetRegion
objects.
Connecting to the MCP Server
The MultiServerMCPClient
from mcp.client
is designed to connect to FastMCP servers.
Python Client (LangChain Example)
This example assumes you have the mcp-client
package installed (often as part of a larger LangChain/Agent setup), along with grabba
and pydantic
.
import asyncio
import os
from typing import List, Dict, Optional
from langchain_core.tools import BaseTool, Tool
from mcp.models.mcp_server_config import McpServerConfig, McpServer
from mcp.client.transports.streamable_http import StreamableHttpConnection
from mcp.client.transports.stdio import StdioConnection
from mcp.client.multi_server_client import MultiServerMCPClient
from grabba import Job, JobStats, PuppetRegion # Import necessary Grabba Pydantic models
from dotenv import load_dotenv # For loading API key from .env
async def connect_and_use_mcp_tools(mcp_server_configs: List[McpServerConfig], api_key: Optional[str] = None) -> List[Tool]:
"""
Connects to the MCP server(s), discovers its tools, and wraps them as LangChain Tools.
Handles API key injection for HTTP connections.
"""
try:
mcp_client_config = {}
for config in mcp_server_configs:
# Pydantic V2 model validation
mcp_server_model = McpServer.model_validate(config.mcp_server.model_dump())
connection_headers = {}
if api_key:
# Use standard header name for API keys
connection_headers["API_KEY"] = api_key
if mcp_server_model.transport == "streamable_http":
server_params: StreamableHttpConnection = {
"transport": "streamable_http",
"url": str(mcp_server_model.url),
"env": config.env_variables or {}, # For other env variables, if any
"headers": connection_headers # Pass headers for HTTP transports
}
elif mcp_server_model.transport == "stdio":
server_params: StdioConnection = {
"transport": "stdio",
"command": mcp_server_model.command,
"args": mcp_server_model.args,
"env": config.env_variables # For stdio, env maps to subprocess env vars
}
else:
raise ValueError(f"Unsupported transport: {mcp_server_model.transport}")
print(f"Client connecting with params: {server_params}")
mcp_client_config[mcp_server_model.name] = server_params
mcp_client = MultiServerMCPClient(mcp_client_config)
tools: List[BaseTool] = await mcp_client.get_tools()
print(f"Successfully loaded {len(tools)} tools.")
return tools
except Exception as e:
print(f"Error connecting to MCP server or loading tools: {e}")
return []
async def main():
load_dotenv() # Load API key from a client-side .env file
API_KEY = os.getenv("API_KEY", "YOUR_API_KEY_HERE_IF_NOT_ENV")
# --- Configuration for Streamable HTTP Transport (Local or Public Instance) ---
# For local: url="http://localhost:8283"
# For public: url="https://mcp.grabba.dev/"
http_mcp_config = McpServerConfig(
mcp_server=McpServer(
name="grabba-agent-http",
transport="streamable_http",
url="http://localhost:8283" # Or "https://mcp.grabba.dev/" for public
)
)
print("\n--- Connecting via Streamable HTTP ---")
http_tools = await connect_and_use_mcp_tools(
mcp_server_configs=[http_mcp_config],
api_key=API_KEY
)
if http_tools:
print("\nAvailable HTTP Tools:")
for tool in http_tools:
print(f"- {tool.name}: {tool.description.split('.')[0]}.")
# Example: Using the extract_data tool (adjust as per your Job Pydantic model)
extract_tool = next((t for t in http_tools if t.name == "extract_data"), None)
if extract_tool:
print("\n--- Testing extract_data tool via HTTP ---")
sample_job = Job(
url="https://example.com/some-page",
type="markdown", # or "pdf", "html" etc.
parser="text-content",
strategy="auto"
# ... other required fields for Job
)
try:
result_msg, result_data = await extract_tool.ainvoke({"extraction_data": sample_job})
print(f"Extraction Result (HTTP): {result_msg}")
if result_data:
print(f"Extraction Data (HTTP): {result_data.get('extracted_text', 'No text extracted')[:100]}...") # Print first 100 chars
except Exception as e:
print(f"Error calling extract_data via HTTP: {e}")
else:
print("extract_data tool not found in HTTP tools.")
# Example: Using fetch_all_jobs tool
fetch_jobs_tool = next((t for t in http_tools if t.name == "fetch_all_jobs"), None)
if fetch_jobs_tool:
print("\n--- Testing fetch_all_jobs tool via HTTP ---")
try:
result_msg, jobs_list = await fetch_jobs_tool.ainvoke({})
print(f"Fetch Jobs Result (HTTP): {result_msg}")
if jobs_list:
print(f"Fetched {len(jobs_list)} jobs.")
for job in jobs_list[:2]: # Print first 2 jobs
print(f" - Job ID: {job.job_id}, URL: {job.url}")
except Exception as e:
print(f"Error calling fetch_all_jobs via HTTP: {e}")
# Example: Using fetch_stats_data tool
fetch_stats_tool = next((t for t in http_tools if t.name == "fetch_stats_data"), None)
if fetch_stats_tool:
print("\n--- Testing fetch_stats_data tool via HTTP ---")
try:
result_msg, stats_data = await fetch_stats_tool.ainvoke({})
print(f"Fetch Stats Result (HTTP): {result_msg}")
if stats_data:
print(f"Token Balance (HTTP): {stats_data.token_balance}")
print(f"Jobs Run (HTTP): {stats_data.jobs_run_count}")
except Exception as e:
print(f"Error calling fetch_stats_data via HTTP: {e}")
# --- Configuration for Stdio Transport (e.g., to a Docker container running the server) ---
# This assumes you have the 'itsobaa/grabba-mcp:latest' Docker image available.
# The client launches a temporary Docker container for each tool call.
stdio_mcp_config = McpServerConfig(
mcp_server=McpServer(
name="grabba-agent-stdio",
transport="stdio",
command="docker",
args=[
"run",
"-i", # Keep STDIN open for interactive communication
"--rm", # Remove container after exit
"itsobaa/grabba-mcp:latest", # The Docker Hub image for Grabba MCP server
"grabba-mcp", "stdio" # Command to run the server in stdio mode inside container
],
env_variables={"API_KEY": API_KEY} # Pass API key as env var for stdio
)
)
print("\n--- Connecting via Stdio (to Docker container as a subprocess) ---")
stdio_tools = await connect_and_use_mcp_tools(
mcp_server_configs=[stdio_mcp_config],
api_key=API_KEY # Client might still pass for internal consistency, though env_variables is primary for stdio
)
if stdio_tools:
print("\nAvailable Stdio Tools:")
for tool in stdio_tools:
print(f"- {tool.name}: {tool.description.split('.')[0]}.")
# Example: Using the fetch_available_regions tool via Stdio
fetch_regions_tool = next((t for t in stdio_tools if t.name == "fetch_available_regions"), None)
if fetch_regions_tool:
print("\n--- Testing fetch_available_regions tool via Stdio ---")
try:
result_msg, regions_list = await fetch_regions_tool.ainvoke({})
print(f"Fetch Regions Result (Stdio): {result_msg}")
if regions_list:
print(f"Fetched {len(regions_list)} regions.")
for region in regions_list[:3]: # Print first 3 regions
print(f" - {region.display_name} ({region.code})")
except Exception as e:
print(f"Error calling fetch_available_regions via Stdio: {e}")
else:
print("fetch_available_regions tool not found in Stdio tools.")
if __name__ == "__main__":
asyncio.run(main())
Development Notes
Project Structure
your_project_root/
├── src/
│ └── server.py # Main FastMCP server application
├── .env # Environment variables for local development
├── pyproject.toml # Poetry project configuration
└── poetry.lock # Poetry dependency lock file
├── Dockerfile # Docker build instructions for the server
├── docker-compose.yml # Docker Compose configuration for local development/deployment
├── .dockerignore # Files to ignore during Docker build
├── .env # Example .env for docker-compose (for API_KEY)
├── README.md # This documentation file
├── pyproject.toml # Root pyproject.toml (if using monorepo structure)
├── poetry.lock # Root poetry.lock (if using monorepo structure)
├── src/ # Source code (often for the root project if it's a monorepo)
├── tests/ # Project tests
└── ... (other project files like dist, docs, tox.ini, project.json etc.)
Running Tests
To run tests (as configured by your pyproject.toml
):
poetry run pytest
Links & Resources
- Grabba Website: https://www.grabba.dev/
- Grabba MCP Server Public Instance: https://mcp.grabba.dev/
- GitHub Repository: https://github.com/grabba-dev/grabba-mcp
- Docker Hub Image: https://hub.docker.com/r/itsobaa/grabba-mcp
- PyPI Package: https://pypi.org/project/grabba-mcp/
License
This project is licensed under the Proprietary License. Please see the LICENSE
file in the repository root for full details.
推荐服务器

Baidu Map
百度地图核心API现已全面兼容MCP协议,是国内首家兼容MCP协议的地图服务商。
Playwright MCP Server
一个模型上下文协议服务器,它使大型语言模型能够通过结构化的可访问性快照与网页进行交互,而无需视觉模型或屏幕截图。
Magic Component Platform (MCP)
一个由人工智能驱动的工具,可以从自然语言描述生成现代化的用户界面组件,并与流行的集成开发环境(IDE)集成,从而简化用户界面开发流程。
Audiense Insights MCP Server
通过模型上下文协议启用与 Audiense Insights 账户的交互,从而促进营销洞察和受众数据的提取和分析,包括人口统计信息、行为和影响者互动。

VeyraX
一个单一的 MCP 工具,连接你所有喜爱的工具:Gmail、日历以及其他 40 多个工具。
graphlit-mcp-server
模型上下文协议 (MCP) 服务器实现了 MCP 客户端与 Graphlit 服务之间的集成。 除了网络爬取之外,还可以将任何内容(从 Slack 到 Gmail 再到播客订阅源)导入到 Graphlit 项目中,然后从 MCP 客户端检索相关内容。
Kagi MCP Server
一个 MCP 服务器,集成了 Kagi 搜索功能和 Claude AI,使 Claude 能够在回答需要最新信息的问题时执行实时网络搜索。

e2b-mcp-server
使用 MCP 通过 e2b 运行代码。
Neon MCP Server
用于与 Neon 管理 API 和数据库交互的 MCP 服务器
Exa MCP Server
模型上下文协议(MCP)服务器允许像 Claude 这样的 AI 助手使用 Exa AI 搜索 API 进行网络搜索。这种设置允许 AI 模型以安全和受控的方式获取实时的网络信息。