MCP 服务器

Browser Tools MCP Extension

Enables AI tools to interact with your browser for enhanced frontend development, providing context-rich capabilities like API call analysis, screenshot capture, element inspection, and API testing with automatic authentication.

README

Browser Tools MCP Extension

🚀 Optimized for Autonomous AI-Powered Frontend Development Workflows

Browser Tools MCP Extension enables AI tools to interact with your browser for enhanced development capabilities. This document provides an overview of the available tools within the MCP server. For setup instructions, please refer to SETUP_GUIDE.md.

Motivation

At this point in time, I think the models are capable of doing a lot of things, but they are not able to do it in a way that is helpful to the user because of a lack of context.

We humans can do tasks accurately because we have a lot of context about the task we are doing, and we can use that context to make decisions.

Too much context also makes it hard for LLMs to make decisions. So, giving the right context at the right time is very important, and this will be the key to making LLMs more helpful to the user. MCP servers are one of the ways to provide context to LLMs at the right time.

One day, I came across AgentDeskAI's repo (https://github.com/AgentDeskAI/browser-tools-mcp). This repo consisted of a Chrome extension and an MCP server. It had tools like get browser logs, get network status, etc. This inspired me, and I started using these tools in my development workflow. I came to the realization that when I am writing code, I am juggling a lot of things and managing this context so I know what to write. So, what if we can provide this context to LLMs at the right time? AgentDeskAI was a huge inspiration and starting point for this project, and that is why you will see that this is a fork of that repository. Though at this moment, I am not using most of the tools they had in their repo except the getSelectedElement tool, they do have many interesting tools, and I am planning to use some again depending on how this setup works.

I am a Frontend Developer and Applied AI enthusiast, and I am working on this project to make already good AI coding IDEs better by creating a custom workflow on top of these tools. This workflow allows me to automate my work of frontend development and delegate the tasks to these AI IDEs, and they can autonomously work. This allows me to focus on important tasks like future-proof project setup. Oh yeah, one important thing to note is that currently, this workflow only works if the project is already set up and has basic things like auth context, API calling structure, routing, and how those routes are exposed, etc. All of this context should be set up in AI IDEs. I use Windsurf's Memories to store this context, which allows the agent to retrieve the important memories based on my prompt. You can use Cursor's Rule file also, but I don't know how well this will work because I haven't tried it.

Now, to make Frontend development autonomous, we have to understand what a frontend developer uses to code and how he/she thinks.

A frontend developer uses API documentation, browser, browser logs, browser errors, the ability to make API calls, functional requirement documents, developer tools, and his/her visual capability to see the UI and make decisions. Considering these aspects of frontend development, we can create an MCP server that can provide context to AI IDEs at the right time. So, I made tools that can access all these aspects of frontend development and provide context to AI IDEs at the right time. These tools include: analyzeApiCalls, takeScreenshot, getSelectedElement, analyzeImageFile, ingestFrdDocument, getFrdIngestionStatus, searchApiDocs... and more coming soon.

I plan to make such workflows for backend and QA testers also, but primarily I am a frontend guy, so I chose this first. If you are interested in this project, please let me know, and I will be happy to help you. We can create something big and awesome.

Available Tools

The following tools are available through the Browser Tools MCP server:

analyzeApiCalls
- Description: Analyzes API interactions between the frontend and backend by retrieving filtered network request details. This tool is useful for inspecting API calls to specific endpoints, debugging network errors and status codes, examining request/response payloads, investigating authentication headers, or monitoring AJAX requests. Results include timestamps to help distinguish between identical API calls made at different times.
- Parameters:
  - urlFilter (string, required): A substring or pattern to filter request URLs.
  - details (array of strings, required): Specific details to retrieve for each request. Possible values include: "url", "method", "status", "timestamp", "requestHeaders", "responseHeaders", "requestBody", "responseBody".
  - timeStart (number, optional): A Unix timestamp (in milliseconds) to filter requests that occurred after this time.
  - timeEnd (number, optional): A Unix timestamp (in milliseconds) to filter requests that occurred before this time.
  - orderBy (string, optional, default: "timestamp"): The field to order results by. Possible values: "timestamp", "url".
  - orderDirection (string, optional, default: "desc"): The direction for ordering. Possible values: "asc" (oldest first), "desc" (newest first).
  - limit (number, optional, default: 20): The maximum number of results to return.
- Functionality: This tool constructs a query based on the provided parameters and fetches network request details from the browser-connector server (typically at http://<host>:<port>/network-request-details). It then returns the filtered and ordered list of network interactions.
takeScreenshot ⭐ ENHANCED
- Description: Take a screenshot of the current browser tab and return the image data for immediate analysis. The screenshot is automatically organized by project and URL structure in a centralized directory system.
- Parameters:
  - filename (string, optional): Optional custom filename for the screenshot (without extension). If not provided, uses timestamp-based naming.
  - returnImageData (boolean, optional, default: true): Whether to return the base64 image data in the response for immediate analysis.
  - projectName (string, optional): Optional project name to override automatic project detection. Screenshots will be organized under this project folder.
- Functionality: Captures a screenshot via the Chrome extension with enhanced connection stability. Features 15-second timeout for autonomous operation reliability and organized storage system. Returns both file confirmation and base64 image data (if requested) for immediate analysis workflows.
getSelectedElement
- Description: Retrieves information about the HTML element currently selected by the user in the browser's DevTools (if any).
- Parameters: None.
- Functionality: This tool queries the browser-connector server (at http://<host>:<port>/selected-element) to get details of the element last inspected or selected by the user in the Chrome DevTools. It returns a JSON string containing information about the selected element.
analyzeImageFile
- Description: Load and analyze previously saved images or existing image files. Use this to access historical screenshots taken with takeScreenshot or any other image files in your project.
- Parameters:
  - imagePath (string, required): The path to the image file. This can be an absolute path or a path relative to the project root.
  - projectRoot (string, optional): An optional path to override the default project root directory. If not provided, it uses the PROJECT_ROOT environment variable or the directory of the MCP server.
- Functionality: The tool resolves the absolute path to the image, reads the file, converts its content to a base64 string, and determines its MIME type. It returns an object containing the fileName, mimeType, size (in bytes), and the base64Data of the image.
ingestFrdDocument
- Description: Takes a path to a Functional Requirements Document (FRD) or similar document (TXT, MD, CSV, PDF), processes it using LlamaIndex, and ingests its content into a Qdrant vector database for semantic search and analysis. This is an asynchronous operation.
- Parameters:
  - documentPath (string, required): The path to the document file.
  - projectRoot (string, optional): Optional override for the project root directory to resolve relative document paths.
  - collectionName (string, optional, default: "frd_documents"): The name of the Qdrant collection to use.
  - qdrantUrl (string, optional): The URL of the Qdrant server. Defaults to process.env.QDRANT_URL or http://localhost:6333.
  - qdrantApiKey (string, optional): The API key for Qdrant Cloud. Defaults to process.env.QDRANT_API_KEY.
  - vectorSize (number, optional, default: 768): The size of the vectors for embeddings (default is for Gemini text-embedding-004).
- Functionality:
  - Generates a unique task ID for tracking the ingestion process.
  - Resolves the absolute path to the document.
  - Asynchronously, it uses LlamaParseReader (from LlamaIndex) to parse the document. For PDF files, it's configured to extract text and describe images within the resulting markdown.
  - It then creates embeddings (using Google's Gemini model, requires GOOGLE_API_KEY) and stores them in the specified Qdrant collection.
  - If the Qdrant collection doesn't exist, it attempts to create it.
  - The tool immediately returns the taskId and the initial status. The actual ingestion happens in the background. You can use getFrdIngestionStatus to check the progress.
getFrdIngestionStatus
- Description: Retrieves the current status of an FRD document ingestion task previously initiated by ingestFrdDocument.
- Parameters:
  - taskId (string, required): The unique ID of the ingestion task.
- Functionality: It checks the internal ingestionTasks store for the status of the task associated with the given taskId. It returns details such as the current status (e.g., "STARTED", "PROCESSING", "COMPLETED", "FAILED"), any message, startTime, endTime, documentPath, and collectionName.
searchApiDocs
- Description: Searches through an OpenAPI (Swagger) specification to find API endpoints that match a given pattern. This helps in understanding API structures, parameters, and responses.
- Parameters:
  - swaggerSource (string, required): The source of the Swagger/OpenAPI specification. This can be a URL, a local file path, or a JSON string containing the specification. Defaults to the SWAGGER_URL environment variable if not provided.
  - apiPattern (string, required): A regular expression pattern to match against API paths or operationIds.
  - includeSchemas (boolean, optional, default: true): If true, the tool will attempt to resolve and include the full schema definitions for parameters, request bodies, and responses referenced via $ref.
- Functionality:
  - Loads the OpenAPI specification from the swaggerSource.
  - Iterates through all defined paths and operations in the specification.
  - Matches the apiPattern against the endpoint path and its operationId.
  - For matching endpoints, it extracts details like the HTTP method, summary, description, parameters, request body, and responses.
  - If includeSchemas is true, it resolves and embeds any referenced JSON schemas directly into the output for the matching endpoints.
executeAuthenticatedApiCall (NEW - Unified API Testing Tool)
- Description: Automatically retrieves authentication tokens from browser session and executes authenticated API calls. This eliminates token retrieval hallucination and ensures consistent API testing with real authentication.
- Parameters:
  - endpoint (string, required): The API endpoint path (e.g., '/api/users', '/auth/profile'). Combined with API_BASE_URL from environment.
  - method (enum, optional, default: "GET"): HTTP method for the API call (GET, POST, PUT, PATCH, DELETE).
  - requestBody (any, optional): Request body for POST/PUT/PATCH requests (automatically JSON stringified).
  - queryParams (object, optional): Query parameters as key-value pairs.
  - additionalHeaders (object, optional): Additional headers to include in the request.
  - includeResponseDetails (boolean, optional, default: true): Whether to include detailed response analysis (status, headers, timing).
- Environment Variables Required:
  - AUTH_ORIGIN: The origin where your app is running (e.g., "http://localhost:5173")
  - AUTH_STORAGE_TYPE: Where the auth token is stored ("cookie", "localStorage", or "sessionStorage")
  - AUTH_TOKEN_KEY: The key name for the auth token (e.g., "authToken", "accessToken")
  - API_BASE_URL: Your API base URL (e.g., "https://api.example.com")
- Functionality:
  - Automatically retrieves auth token from browser session using predefined environment configuration
  - Constructs full API URL and adds query parameters if provided
  - Makes authenticated API request with proper Authorization header
  - Returns structured response with actual API data and optional detailed metrics
  - Eliminates manual token handling and curl command execution
getAccessToken (DEPRECATED)
- Description: Legacy tool for manual token retrieval. Use executeAuthenticatedApiCall instead for better reliability.
- Status: Kept for backward compatibility but deprecated in favor of the unified approach.
  - Returns a JSON string containing an array of the matching API endpoint details.

🤖 Autonomous Operation Features

Enhanced Connection Stability

Intelligent Heartbeat System: 25-second intervals with 60-second timeouts
Fast Recovery: 3-15 second reconnection times for minimal workflow disruption
Exponential Backoff: Smart retry logic with up to 10 attempts
Individual Request Tracking: Prevents callback conflicts during concurrent operations
Connection Health Monitoring: Real-time status endpoint at /connection-health

Autonomous AI Workflow Optimizations

Extended Screenshot Timeouts: 15-second timeouts for network tolerance
Enhanced Error Handling: Detailed connection state reporting for debugging
Streamlined Discovery: Essential IP scanning (300ms timeouts) for faster server detection
Background Retry Logic: 5 retry attempts with server validation
Network Tolerance: Increased timeouts for unreliable network conditions

Connection Health API

Access real-time connection status at: http://localhost:3026/connection-health

{
  "connected": true,
  "healthy": true,
  "connectionId": "conn_1234567890_abc123",
  "lastHeartbeat": 1701234567890,
  "timeSinceLastHeartbeat": 5000,
  "heartbeatTimeout": 60000,
  "heartbeatInterval": 25000,
  "pendingScreenshots": 0,
  "uptime": 3600.45,
  "timestamp": "2024-12-01T10:30:00.000Z"
}

See SETUP_GUIDE.md for detailed configuration instructions and AUTONOMOUS_OPERATION_TESTING_REPORT.md for testing results.

Environment Variables

The server supports several environment variables for configuration:

API Testing & Authentication

AUTH_ORIGIN: Origin where your app runs (e.g., "http://localhost:5173")
AUTH_STORAGE_TYPE: Token storage location ("cookie", "localStorage", "sessionStorage")
AUTH_TOKEN_KEY: Token key name (e.g., "authToken", "accessToken")
API_BASE_URL: Your API base URL (e.g., "https://api.example.com")

Document & API Discovery

SWAGGER_URL: Swagger/OpenAPI JSON URL for API documentation search
PROJECT_ROOT: Project root directory for file operations and image analysis

Screenshot Management

SCREENSHOT_STORAGE_PATH: Custom directory for screenshot storage (defaults to Downloads folder)

Vector Database (for FRD document ingestion)

GOOGLE_API_KEY: Google API key for embeddings
QDRANT_API_KEY: Qdrant vector database API key
QDRANT_URL: Qdrant server URL (defaults to http://localhost:6333)

Connection Stability & Autonomous Operation

BROWSER_TOOLS_HOST: Server host override (defaults to "127.0.0.1")
BROWSER_TOOLS_PORT: Server port override (defaults to 3025)

Browser Tools MCP Extension

README

Browser Tools MCP Extension

Motivation

Available Tools

🤖 Autonomous Operation Features

Enhanced Connection Stability

Autonomous AI Workflow Optimizations

Connection Health API

Environment Variables

API Testing & Authentication

Document & API Discovery

Screenshot Management

Vector Database (for FRD document ingestion)

Connection Stability & Autonomous Operation

推荐服务器