Browser Tools MCP Extension

Browser Tools MCP Extension

Enables AI tools to interact with your browser for enhanced frontend development, providing context-rich capabilities like API call analysis, screenshot capture, element inspection, and API testing with automatic authentication.

Category
访问服务器

README

Browser Tools MCP Extension

🚀 Optimized for Autonomous AI-Powered Frontend Development Workflows

Browser Tools MCP Extension enables AI tools to interact with your browser for enhanced development capabilities. This document provides an overview of the available tools within the MCP server. For setup instructions, please refer to SETUP_GUIDE.md.

Motivation

At this point in time, I think the models are capable of doing a lot of things, but they are not able to do it in a way that is helpful to the user because of a lack of context.

We humans can do tasks accurately because we have a lot of context about the task we are doing, and we can use that context to make decisions.

Too much context also makes it hard for LLMs to make decisions. So, giving the right context at the right time is very important, and this will be the key to making LLMs more helpful to the user. MCP servers are one of the ways to provide context to LLMs at the right time.

One day, I came across AgentDeskAI's repo (https://github.com/AgentDeskAI/browser-tools-mcp). This repo consisted of a Chrome extension and an MCP server. It had tools like get browser logs, get network status, etc. This inspired me, and I started using these tools in my development workflow. I came to the realization that when I am writing code, I am juggling a lot of things and managing this context so I know what to write. So, what if we can provide this context to LLMs at the right time? AgentDeskAI was a huge inspiration and starting point for this project, and that is why you will see that this is a fork of that repository. Though at this moment, I am not using most of the tools they had in their repo except the getSelectedElement tool, they do have many interesting tools, and I am planning to use some again depending on how this setup works.

I am a Frontend Developer and Applied AI enthusiast, and I am working on this project to make already good AI coding IDEs better by creating a custom workflow on top of these tools. This workflow allows me to automate my work of frontend development and delegate the tasks to these AI IDEs, and they can autonomously work. This allows me to focus on important tasks like future-proof project setup. Oh yeah, one important thing to note is that currently, this workflow only works if the project is already set up and has basic things like auth context, API calling structure, routing, and how those routes are exposed, etc. All of this context should be set up in AI IDEs. I use Windsurf's Memories to store this context, which allows the agent to retrieve the important memories based on my prompt. You can use Cursor's Rule file also, but I don't know how well this will work because I haven't tried it.

Now, to make Frontend development autonomous, we have to understand what a frontend developer uses to code and how he/she thinks.

A frontend developer uses API documentation, browser, browser logs, browser errors, the ability to make API calls, functional requirement documents, developer tools, and his/her visual capability to see the UI and make decisions. Considering these aspects of frontend development, we can create an MCP server that can provide context to AI IDEs at the right time. So, I made tools that can access all these aspects of frontend development and provide context to AI IDEs at the right time. These tools include: analyzeApiCalls, takeScreenshot, getSelectedElement, analyzeImageFile, ingestFrdDocument, getFrdIngestionStatus, searchApiDocs... and more coming soon.

I plan to make such workflows for backend and QA testers also, but primarily I am a frontend guy, so I chose this first. If you are interested in this project, please let me know, and I will be happy to help you. We can create something big and awesome.


Available Tools

The following tools are available through the Browser Tools MCP server:

  1. analyzeApiCalls

    • Description: Analyzes API interactions between the frontend and backend by retrieving filtered network request details. This tool is useful for inspecting API calls to specific endpoints, debugging network errors and status codes, examining request/response payloads, investigating authentication headers, or monitoring AJAX requests. Results include timestamps to help distinguish between identical API calls made at different times.
    • Parameters:
      • urlFilter (string, required): A substring or pattern to filter request URLs.
      • details (array of strings, required): Specific details to retrieve for each request. Possible values include: "url", "method", "status", "timestamp", "requestHeaders", "responseHeaders", "requestBody", "responseBody".
      • timeStart (number, optional): A Unix timestamp (in milliseconds) to filter requests that occurred after this time.
      • timeEnd (number, optional): A Unix timestamp (in milliseconds) to filter requests that occurred before this time.
      • orderBy (string, optional, default: "timestamp"): The field to order results by. Possible values: "timestamp", "url".
      • orderDirection (string, optional, default: "desc"): The direction for ordering. Possible values: "asc" (oldest first), "desc" (newest first).
      • limit (number, optional, default: 20): The maximum number of results to return.
    • Functionality: This tool constructs a query based on the provided parameters and fetches network request details from the browser-connector server (typically at http://<host>:<port>/network-request-details). It then returns the filtered and ordered list of network interactions.
  2. takeScreenshotENHANCED

    • Description: Take a screenshot of the current browser tab and return the image data for immediate analysis. The screenshot is automatically organized by project and URL structure in a centralized directory system.
    • Parameters:
      • filename (string, optional): Optional custom filename for the screenshot (without extension). If not provided, uses timestamp-based naming.
      • returnImageData (boolean, optional, default: true): Whether to return the base64 image data in the response for immediate analysis.
      • projectName (string, optional): Optional project name to override automatic project detection. Screenshots will be organized under this project folder.
    • Functionality: Captures a screenshot via the Chrome extension with enhanced connection stability. Features 15-second timeout for autonomous operation reliability and organized storage system. Returns both file confirmation and base64 image data (if requested) for immediate analysis workflows.
  3. getSelectedElement

    • Description: Retrieves information about the HTML element currently selected by the user in the browser's DevTools (if any).
    • Parameters: None.
    • Functionality: This tool queries the browser-connector server (at http://<host>:<port>/selected-element) to get details of the element last inspected or selected by the user in the Chrome DevTools. It returns a JSON string containing information about the selected element.
  4. analyzeImageFile

    • Description: Load and analyze previously saved images or existing image files. Use this to access historical screenshots taken with takeScreenshot or any other image files in your project.
    • Parameters:
      • imagePath (string, required): The path to the image file. This can be an absolute path or a path relative to the project root.
      • projectRoot (string, optional): An optional path to override the default project root directory. If not provided, it uses the PROJECT_ROOT environment variable or the directory of the MCP server.
    • Functionality: The tool resolves the absolute path to the image, reads the file, converts its content to a base64 string, and determines its MIME type. It returns an object containing the fileName, mimeType, size (in bytes), and the base64Data of the image.
  5. ingestFrdDocument

    • Description: Takes a path to a Functional Requirements Document (FRD) or similar document (TXT, MD, CSV, PDF), processes it using LlamaIndex, and ingests its content into a Qdrant vector database for semantic search and analysis. This is an asynchronous operation.
    • Parameters:
      • documentPath (string, required): The path to the document file.
      • projectRoot (string, optional): Optional override for the project root directory to resolve relative document paths.
      • collectionName (string, optional, default: "frd_documents"): The name of the Qdrant collection to use.
      • qdrantUrl (string, optional): The URL of the Qdrant server. Defaults to process.env.QDRANT_URL or http://localhost:6333.
      • qdrantApiKey (string, optional): The API key for Qdrant Cloud. Defaults to process.env.QDRANT_API_KEY.
      • vectorSize (number, optional, default: 768): The size of the vectors for embeddings (default is for Gemini text-embedding-004).
    • Functionality:
      • Generates a unique task ID for tracking the ingestion process.
      • Resolves the absolute path to the document.
      • Asynchronously, it uses LlamaParseReader (from LlamaIndex) to parse the document. For PDF files, it's configured to extract text and describe images within the resulting markdown.
      • It then creates embeddings (using Google's Gemini model, requires GOOGLE_API_KEY) and stores them in the specified Qdrant collection.
      • If the Qdrant collection doesn't exist, it attempts to create it.
      • The tool immediately returns the taskId and the initial status. The actual ingestion happens in the background. You can use getFrdIngestionStatus to check the progress.
  6. getFrdIngestionStatus

    • Description: Retrieves the current status of an FRD document ingestion task previously initiated by ingestFrdDocument.
    • Parameters:
      • taskId (string, required): The unique ID of the ingestion task.
    • Functionality: It checks the internal ingestionTasks store for the status of the task associated with the given taskId. It returns details such as the current status (e.g., "STARTED", "PROCESSING", "COMPLETED", "FAILED"), any message, startTime, endTime, documentPath, and collectionName.
  7. searchApiDocs

    • Description: Searches through an OpenAPI (Swagger) specification to find API endpoints that match a given pattern. This helps in understanding API structures, parameters, and responses.
    • Parameters:
      • swaggerSource (string, required): The source of the Swagger/OpenAPI specification. This can be a URL, a local file path, or a JSON string containing the specification. Defaults to the SWAGGER_URL environment variable if not provided.
      • apiPattern (string, required): A regular expression pattern to match against API paths or operationIds.
      • includeSchemas (boolean, optional, default: true): If true, the tool will attempt to resolve and include the full schema definitions for parameters, request bodies, and responses referenced via $ref.
    • Functionality:
      • Loads the OpenAPI specification from the swaggerSource.
      • Iterates through all defined paths and operations in the specification.
      • Matches the apiPattern against the endpoint path and its operationId.
      • For matching endpoints, it extracts details like the HTTP method, summary, description, parameters, request body, and responses.
      • If includeSchemas is true, it resolves and embeds any referenced JSON schemas directly into the output for the matching endpoints.
  8. executeAuthenticatedApiCall (NEW - Unified API Testing Tool)

    • Description: Automatically retrieves authentication tokens from browser session and executes authenticated API calls. This eliminates token retrieval hallucination and ensures consistent API testing with real authentication.
    • Parameters:
      • endpoint (string, required): The API endpoint path (e.g., '/api/users', '/auth/profile'). Combined with API_BASE_URL from environment.
      • method (enum, optional, default: "GET"): HTTP method for the API call (GET, POST, PUT, PATCH, DELETE).
      • requestBody (any, optional): Request body for POST/PUT/PATCH requests (automatically JSON stringified).
      • queryParams (object, optional): Query parameters as key-value pairs.
      • additionalHeaders (object, optional): Additional headers to include in the request.
      • includeResponseDetails (boolean, optional, default: true): Whether to include detailed response analysis (status, headers, timing).
    • Environment Variables Required:
      • AUTH_ORIGIN: The origin where your app is running (e.g., "http://localhost:5173")
      • AUTH_STORAGE_TYPE: Where the auth token is stored ("cookie", "localStorage", or "sessionStorage")
      • AUTH_TOKEN_KEY: The key name for the auth token (e.g., "authToken", "accessToken")
      • API_BASE_URL: Your API base URL (e.g., "https://api.example.com")
    • Functionality:
      • Automatically retrieves auth token from browser session using predefined environment configuration
      • Constructs full API URL and adds query parameters if provided
      • Makes authenticated API request with proper Authorization header
      • Returns structured response with actual API data and optional detailed metrics
      • Eliminates manual token handling and curl command execution
  9. getAccessToken (DEPRECATED)

    • Description: Legacy tool for manual token retrieval. Use executeAuthenticatedApiCall instead for better reliability.
    • Status: Kept for backward compatibility but deprecated in favor of the unified approach.
      • Returns a JSON string containing an array of the matching API endpoint details.

🤖 Autonomous Operation Features

Enhanced Connection Stability

  • Intelligent Heartbeat System: 25-second intervals with 60-second timeouts
  • Fast Recovery: 3-15 second reconnection times for minimal workflow disruption
  • Exponential Backoff: Smart retry logic with up to 10 attempts
  • Individual Request Tracking: Prevents callback conflicts during concurrent operations
  • Connection Health Monitoring: Real-time status endpoint at /connection-health

Autonomous AI Workflow Optimizations

  • Extended Screenshot Timeouts: 15-second timeouts for network tolerance
  • Enhanced Error Handling: Detailed connection state reporting for debugging
  • Streamlined Discovery: Essential IP scanning (300ms timeouts) for faster server detection
  • Background Retry Logic: 5 retry attempts with server validation
  • Network Tolerance: Increased timeouts for unreliable network conditions

Connection Health API

Access real-time connection status at: http://localhost:3026/connection-health

{
  "connected": true,
  "healthy": true,
  "connectionId": "conn_1234567890_abc123",
  "lastHeartbeat": 1701234567890,
  "timeSinceLastHeartbeat": 5000,
  "heartbeatTimeout": 60000,
  "heartbeatInterval": 25000,
  "pendingScreenshots": 0,
  "uptime": 3600.45,
  "timestamp": "2024-12-01T10:30:00.000Z"
}

See SETUP_GUIDE.md for detailed configuration instructions and AUTONOMOUS_OPERATION_TESTING_REPORT.md for testing results.

Environment Variables

The server supports several environment variables for configuration:

API Testing & Authentication

  • AUTH_ORIGIN: Origin where your app runs (e.g., "http://localhost:5173")
  • AUTH_STORAGE_TYPE: Token storage location ("cookie", "localStorage", "sessionStorage")
  • AUTH_TOKEN_KEY: Token key name (e.g., "authToken", "accessToken")
  • API_BASE_URL: Your API base URL (e.g., "https://api.example.com")

Document & API Discovery

  • SWAGGER_URL: Swagger/OpenAPI JSON URL for API documentation search
  • PROJECT_ROOT: Project root directory for file operations and image analysis

Screenshot Management

  • SCREENSHOT_STORAGE_PATH: Custom directory for screenshot storage (defaults to Downloads folder)

Vector Database (for FRD document ingestion)

  • GOOGLE_API_KEY: Google API key for embeddings
  • QDRANT_API_KEY: Qdrant vector database API key
  • QDRANT_URL: Qdrant server URL (defaults to http://localhost:6333)

Connection Stability & Autonomous Operation

  • BROWSER_TOOLS_HOST: Server host override (defaults to "127.0.0.1")
  • BROWSER_TOOLS_PORT: Server port override (defaults to 3025)

推荐服务器

Baidu Map

Baidu Map

百度地图核心API现已全面兼容MCP协议,是国内首家兼容MCP协议的地图服务商。

官方
精选
JavaScript
Playwright MCP Server

Playwright MCP Server

一个模型上下文协议服务器,它使大型语言模型能够通过结构化的可访问性快照与网页进行交互,而无需视觉模型或屏幕截图。

官方
精选
TypeScript
Magic Component Platform (MCP)

Magic Component Platform (MCP)

一个由人工智能驱动的工具,可以从自然语言描述生成现代化的用户界面组件,并与流行的集成开发环境(IDE)集成,从而简化用户界面开发流程。

官方
精选
本地
TypeScript
Audiense Insights MCP Server

Audiense Insights MCP Server

通过模型上下文协议启用与 Audiense Insights 账户的交互,从而促进营销洞察和受众数据的提取和分析,包括人口统计信息、行为和影响者互动。

官方
精选
本地
TypeScript
VeyraX

VeyraX

一个单一的 MCP 工具,连接你所有喜爱的工具:Gmail、日历以及其他 40 多个工具。

官方
精选
本地
graphlit-mcp-server

graphlit-mcp-server

模型上下文协议 (MCP) 服务器实现了 MCP 客户端与 Graphlit 服务之间的集成。 除了网络爬取之外,还可以将任何内容(从 Slack 到 Gmail 再到播客订阅源)导入到 Graphlit 项目中,然后从 MCP 客户端检索相关内容。

官方
精选
TypeScript
Kagi MCP Server

Kagi MCP Server

一个 MCP 服务器,集成了 Kagi 搜索功能和 Claude AI,使 Claude 能够在回答需要最新信息的问题时执行实时网络搜索。

官方
精选
Python
e2b-mcp-server

e2b-mcp-server

使用 MCP 通过 e2b 运行代码。

官方
精选
Neon MCP Server

Neon MCP Server

用于与 Neon 管理 API 和数据库交互的 MCP 服务器

官方
精选
Exa MCP Server

Exa MCP Server

模型上下文协议(MCP)服务器允许像 Claude 这样的 AI 助手使用 Exa AI 搜索 API 进行网络搜索。这种设置允许 AI 模型以安全和受控的方式获取实时的网络信息。

官方
精选