Competitor Hunter

AI-powered competitor analysis agent that scrapes product pages and extracts structured data using LLMs, integrated via MCP for use with Claude Desktop and other clients.

🎯 Competitor Hunter

AI-Powered Competitor Analysis Agent | Automated web scraping and structured data extraction using MCP, LangGraph, and Playwright

License: MIT | Python 3.10+ | Code style: black


📖 Introduction

Competitor Hunter is a production-ready AI agent that automates competitor analysis by scraping product pages and extracting structured information using Large Language Models. Built on the Model Context Protocol (MCP), it seamlessly integrates with Claude Desktop and other MCP-compatible clients.

Key Capabilities

  • 🔍 Intelligent Web Scraping: Automated browser-based content extraction with anti-detection features
  • 🤖 LLM-Powered Extraction: Structured data extraction using OpenAI-compatible APIs
  • 📊 Structured Output: Pydantic-validated product information (pricing, features, SWOT analysis)
  • 🔄 LangGraph Workflow: Robust state management and error handling
  • 🔌 MCP Integration: Native support for Claude Desktop and MCP clients

🏗️ Architecture

The system follows Hexagonal Architecture with clear separation of concerns. Workflow: User Request → MCP Server → LangGraph Workflow → Browser Scraping → LLM Extraction → Structured Data Response.
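The workflow above can be pictured as a chain of async steps that pass a shared state dictionary along and stop at the first error. The sketch below is a plain-asyncio stand-in, not the project's LangGraph graph: `PipelineState`, `scrape_node`, `extract_node`, and `run_pipeline` are hypothetical names, the state fields are modeled on the `AgentState` dictionary shown in the Python library example, and both nodes return canned data instead of calling Playwright or an LLM.

```python
import asyncio
from typing import Awaitable, Callable, List, Optional, TypedDict


class PipelineState(TypedDict):
    """Hypothetical state shape, modeled on the README's AgentState fields."""
    url: str
    scraped_content: Optional[str]
    product: Optional[dict]
    error: Optional[str]


Node = Callable[[PipelineState], Awaitable[PipelineState]]


async def scrape_node(state: PipelineState) -> PipelineState:
    # Stand-in for the browser scraping step: returns canned text.
    return {**state, "scraped_content": f"<content of {state['url']}>"}


async def extract_node(state: PipelineState) -> PipelineState:
    # Stand-in for the LLM extraction step: builds a trivial product record.
    if not state["scraped_content"]:
        return {**state, "error": "nothing to extract"}
    return {**state, "product": {"url": state["url"], "core_features": []}}


async def run_pipeline(url: str, nodes: List[Node]) -> PipelineState:
    state: PipelineState = {
        "url": url, "scraped_content": None, "product": None, "error": None,
    }
    for node in nodes:
        state = await node(state)
        if state["error"]:  # stop at the first failing step
            break
    return state


result = asyncio.run(run_pipeline("https://example.com", [scrape_node, extract_node]))
print(result["product"])
```

Keeping every step's input and output in one state dictionary is what lets an error set by any step short-circuit the rest of the run.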


✨ Core Features

  • 🤖 AI-Powered: Intelligent extraction using LLM with automatic SWOT analysis
  • 📊 Structured Output: Pydantic-validated data models (pricing, features, summary)
  • 🛡️ Anti-Detection: Random User-Agents, intelligent scrolling, auto-screenshots
  • 🔌 MCP Native: Seamless integration with Claude Desktop and Cursor IDE
  • 📦 CLI Tool: Professional command-line interface via competitor-hunter command
  • Async/Await: Full asynchronous programming for optimal performance
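As an illustration of the "Random User-Agents" item above, the following sketch picks one User-Agent string per session from a small pool. The pool contents and the `pick_user_agent` helper are assumptions made for this example; the README does not show the list or selection logic the project actually uses.

```python
import random
from typing import Optional

# Illustrative pool only; the project's real list is not shown in this README.
USER_AGENTS = [
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
    "(KHTML, like Gecko) Chrome/124.0.0.0 Safari/537.36",
    "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/605.1.15 "
    "(KHTML, like Gecko) Version/17.4 Safari/605.1.15",
    "Mozilla/5.0 (X11; Linux x86_64; rv:125.0) Gecko/20100101 Firefox/125.0",
]


def pick_user_agent(rng: Optional[random.Random] = None) -> str:
    """Return one User-Agent string chosen uniformly at random from the pool."""
    chooser = rng if rng is not None else random
    return chooser.choice(USER_AGENTS)


print(pick_user_agent())
```

With Playwright's Python API, a string chosen this way would typically be passed as the `user_agent` argument of `browser.new_context(...)`; whether the project does exactly that is not confirmed here.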

🚀 Quick Start

Prerequisites

  • Python 3.10+ (3.11 or 3.12 recommended)
  • UV or Poetry (dependency manager)
  • Playwright browsers (installed in step 3 of Installation)

Installation

  1. Clone the repository:

    git clone https://github.com/your-username/competitor-hunter.git
    cd competitor-hunter
    
  2. Install dependencies (using UV):

    uv sync
    

    Or using Poetry:

    poetry install
    

    Or using pip:

    python -m venv venv
    source venv/bin/activate  # On Windows: venv\Scripts\activate
    pip install -e ".[dev]"
    
  3. Install Playwright browsers:

    playwright install chromium
    

Configuration

Create a .env file in the project root:

# OpenAI API Configuration
OPENAI_API_KEY=your_openai_api_key_here
OPENAI_BASE_URL=https://api.openai.com/v1  # Optional: for custom endpoints
OPENAI_MODEL_NAME=gpt-4o                    # Optional: default is gpt-4o

# Browser Configuration
HEADLESS_MODE=true                          # Set to false for debugging

# Database Configuration
DB_PATH=data/competitors.db                 # SQLite database path

💡 Tip: Copy .env.example to .env and fill in your values:

cp .env.example .env
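To make the role of each variable concrete, here is a minimal sketch of reading these settings from the environment. `Settings` and `load_settings` are hypothetical names, not the project's `config.py`. Only the `gpt-4o` default is stated in this README; the other defaults are assumptions taken from the sample values. The sketch also assumes the `.env` file has already been loaded into the process environment by whatever loader the project uses.

```python
import os
from dataclasses import dataclass
from typing import Mapping


@dataclass(frozen=True)
class Settings:
    openai_api_key: str
    openai_base_url: str
    openai_model_name: str
    headless_mode: bool
    db_path: str


def load_settings(env: Mapping[str, str] = os.environ) -> Settings:
    """Build Settings from a mapping of environment variables."""
    api_key = env.get("OPENAI_API_KEY", "")
    if not api_key:
        raise ValueError("OPENAI_API_KEY is required")
    return Settings(
        openai_api_key=api_key,
        # Assumed default: the sample value from the .env listing
        openai_base_url=env.get("OPENAI_BASE_URL", "https://api.openai.com/v1"),
        # Default stated in the README
        openai_model_name=env.get("OPENAI_MODEL_NAME", "gpt-4o"),
        # Assumed default: headless unless explicitly set to something else
        headless_mode=env.get("HEADLESS_MODE", "true").strip().lower() == "true",
        # Assumed default: the sample value from the .env listing
        db_path=env.get("DB_PATH", "data/competitors.db"),
    )


settings = load_settings({"OPENAI_API_KEY": "sk-test", "HEADLESS_MODE": "false"})
print(settings.openai_model_name, settings.headless_mode)
```

Failing fast on a missing API key surfaces a configuration problem before any browser is launched.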

📸 Screenshots & Examples

Analysis Results

Notion Pricing Analysis

Screenshot of Notion pricing page analysis

CLI Output Example

$ competitor-hunter https://www.notion.so/pricing
🔍 正在分析: https://www.notion.so/pricing

✅ 分析完成!

======================================================================
📦 产品名称: Notion
🔗 URL: https://www.notion.so/pricing
🕒 更新时间: 2024-06-13 00:00:00+00:00
======================================================================

💰 定价方案 (4 个):
   • Free: 0 USD / monthly
   • Plus: 10 USD / monthly
   • Business: 20 USD / monthly
   • Enterprise: Custom USD / custom

✨ 核心功能 (13 个):
   1. AI automation
   2. Enterprise search
   3. Meeting notes
   ...

💾 结果已保存到: reports/product_Notion.json

JSON Output Structure

The analysis results are saved as structured JSON files:

{
  "product_name": "Notion",
  "url": "https://www.notion.so/pricing",
  "pricing_tiers": [
    {
      "name": "Free",
      "price": "0",
      "currency": "USD",
      "billing_cycle": "monthly"
    },
    {
      "name": "Plus",
      "price": "10",
      "currency": "USD",
      "billing_cycle": "monthly"
    }
  ],
  "core_features": [
    "AI automation",
    "Docs",
    "Knowledge Base"
  ],
  "summary": "## 产品概述\nNotion 是一款集文档编辑...",
  "last_updated": "2024-06-13T00:00:00Z"
}
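A consumer of these files might parse them into typed objects. The sketch below uses standard-library dataclasses as a dependency-free stand-in for the project's Pydantic models; `PricingTier`, `ProductReport`, and `parse_report` are hypothetical names, and the field list is taken only from the sample above, so the real models may carry more fields.

```python
import json
from dataclasses import dataclass
from datetime import datetime
from typing import List


@dataclass
class PricingTier:
    name: str
    price: str  # kept as a string: the sample output also shows "Custom"
    currency: str
    billing_cycle: str


@dataclass
class ProductReport:
    product_name: str
    url: str
    pricing_tiers: List[PricingTier]
    core_features: List[str]
    summary: str
    last_updated: datetime


def parse_report(raw: str) -> ProductReport:
    """Parse one report JSON document into a ProductReport."""
    data = json.loads(raw)
    return ProductReport(
        product_name=data["product_name"],
        url=data["url"],
        pricing_tiers=[PricingTier(**tier) for tier in data["pricing_tiers"]],
        core_features=list(data["core_features"]),
        summary=data["summary"],
        # datetime.fromisoformat() rejects a trailing "Z" before Python 3.11
        last_updated=datetime.fromisoformat(data["last_updated"].replace("Z", "+00:00")),
    )


sample = json.dumps({
    "product_name": "Notion",
    "url": "https://www.notion.so/pricing",
    "pricing_tiers": [
        {"name": "Free", "price": "0", "currency": "USD", "billing_cycle": "monthly"},
        {"name": "Plus", "price": "10", "currency": "USD", "billing_cycle": "monthly"},
    ],
    "core_features": ["AI automation", "Docs", "Knowledge Base"],
    "summary": "...",
    "last_updated": "2024-06-13T00:00:00Z",
})

report = parse_report(sample)
print(report.product_name, [tier.name for tier in report.pricing_tiers])
```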

📚 Usage

Method 1: CLI Command (Easiest)

After installation, use the competitor-hunter command:

# Analyze a single website
competitor-hunter https://www.notion.so/pricing

# Specify output file
competitor-hunter https://example.com output.json

# Batch analysis
competitor-hunter https://site1.com https://site2.com https://site3.com

Results are automatically saved to the reports/ directory with proper UTF-8 encoding.
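For readers curious what "proper UTF-8 encoding" means in practice, the sketch below writes a report so that non-ASCII text stays readable on disk. `report_path` and `save_report` are hypothetical helpers; the naming rule is inferred from the example file names in this README (`product_Notion.json`, `product_Example_Domain.json`) and may differ from the CLI's actual rule.

```python
import json
import re
import tempfile
from pathlib import Path


def report_path(product_name: str, reports_dir: Path) -> Path:
    """Assumed naming rule: runs of non-alphanumeric characters become underscores."""
    safe = re.sub(r"[^A-Za-z0-9]+", "_", product_name).strip("_") or "unknown"
    return reports_dir / f"product_{safe}.json"


def save_report(report: dict, reports_dir: Path) -> Path:
    """Write one report as UTF-8 JSON and return the file path."""
    reports_dir.mkdir(parents=True, exist_ok=True)
    path = report_path(report["product_name"], reports_dir)
    # ensure_ascii=False keeps non-ASCII text as-is instead of \uXXXX escapes
    path.write_text(json.dumps(report, ensure_ascii=False, indent=2), encoding="utf-8")
    return path


with tempfile.TemporaryDirectory() as tmp:
    saved = save_report({"product_name": "Example Domain", "summary": "产品概述"}, Path(tmp))
    saved_name = saved.name
    raw_text = saved.read_text(encoding="utf-8")
    round_trip = json.loads(raw_text)

print(saved_name)
```

Passing `encoding="utf-8"` explicitly matters on platforms where the default file encoding is not UTF-8, such as some Windows setups.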

Method 2: MCP Server Mode (Recommended for AI Assistants)

Run the MCP server to enable integration with Claude Desktop or Cursor:

python -m src.competitor_hunter.interface.mcp_server.server

Claude Desktop Integration

Add the following configuration to your Claude Desktop claude_desktop_config.json:

macOS: ~/Library/Application Support/Claude/claude_desktop_config.json
Windows: %APPDATA%\Claude\claude_desktop_config.json

{
  "mcpServers": {
    "competitor-hunter": {
      "command": "python",
      "args": [
        "-m",
        "src.competitor_hunter.interface.mcp_server.server"
      ],
      "cwd": "/path/to/competitor-hunter"
    }
  }
}
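If the config file already lists other servers, the entry should be merged in rather than pasted over the whole file. The following sketch shows one way to do that; `add_mcp_server` is a hypothetical helper and not part of this project or of Claude Desktop. The demo writes to a temporary directory instead of the real config path.

```python
import json
import tempfile
from pathlib import Path


def add_mcp_server(config_path: Path, name: str, entry: dict) -> dict:
    """Merge one server entry into the config file, keeping existing servers."""
    if config_path.exists():
        config = json.loads(config_path.read_text(encoding="utf-8"))
    else:
        config = {}
    config.setdefault("mcpServers", {})[name] = entry
    config_path.parent.mkdir(parents=True, exist_ok=True)
    config_path.write_text(json.dumps(config, indent=2), encoding="utf-8")
    return config


entry = {
    "command": "python",
    "args": ["-m", "src.competitor_hunter.interface.mcp_server.server"],
    "cwd": "/path/to/competitor-hunter",
}

with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "claude_desktop_config.json"
    # Simulate a config that already contains another server
    path.write_text(json.dumps({"mcpServers": {"other": {"command": "node"}}}), encoding="utf-8")
    merged = add_mcp_server(path, "competitor-hunter", entry)

print(sorted(merged["mcpServers"]))
```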

Cursor IDE Integration

Create .cursor/mcp.json in your project root:

{
  "mcpServers": {
    "competitor-hunter": {
      "command": "python",
      "args": [
        "-m",
        "src.competitor_hunter.interface.mcp_server.server"
      ],
      "cwd": "${workspaceFolder}"
    }
  }
}

After restarting, you can use the tool directly in chat:

Analyze this competitor: https://www.notion.so/pricing

Method 3: Python Library

Use the LangGraph workflow directly in your Python code:

import asyncio
from competitor_hunter.core import graph, AgentState, cleanup_resources

async def analyze(url: str):
    # Initialize state
    initial_state: AgentState = {
        "url": url,
        "scraped_content": None,
        "product": None,
        "error": None,
    }
    
    # Run workflow
    result = await graph.ainvoke(initial_state)
    
    # Check results
    if result.get("error"):
        print(f"Error: {result['error']}")
        return None
    
    product = result["product"]
    print(f"Product: {product.product_name}")
    print(f"Pricing Tiers: {len(product.pricing_tiers)}")
    print(f"Features: {product.core_features}")
    
    return product

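async def main():
    try:
        await analyze("https://www.notion.so/pricing")
    finally:
        # Release resources held by the workflow even if the analysis fails
        await cleanup_resources()

# Top-level await is not valid in a regular script, so run via asyncio
asyncio.run(main())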
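To analyze several URLs from Python, a coroutine like `analyze` above can be wrapped with a concurrency limit. This is a sketch under assumptions: `analyze_many` and `fake_analyze` are hypothetical names, the placeholder stands in for the real workflow so the example runs without a browser or API key, and the README does not say whether the project's browser service is safe to use concurrently.

```python
import asyncio
from typing import Awaitable, Callable, Dict, List, Optional


async def analyze_many(
    urls: List[str],
    analyze: Callable[[str], Awaitable[Optional[object]]],
    max_concurrency: int = 2,
) -> Dict[str, Optional[object]]:
    """Run analyze() over several URLs with at most max_concurrency in flight."""
    semaphore = asyncio.Semaphore(max_concurrency)

    async def guarded(url: str) -> Optional[object]:
        async with semaphore:
            try:
                return await analyze(url)
            except Exception as exc:  # one failing URL should not abort the batch
                print(f"{url}: {exc}")
                return None

    results = await asyncio.gather(*(guarded(url) for url in urls))
    return dict(zip(urls, results))


async def fake_analyze(url: str) -> str:
    # Placeholder for the real workflow call
    await asyncio.sleep(0)
    return f"analyzed {url}"


batch = asyncio.run(analyze_many(["https://site1.com", "https://site2.com"], fake_analyze))
print(batch)
```

A small limit such as 2 keeps memory use and request rate modest; set `max_concurrency=1` to fall back to strictly sequential runs.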

Output Structure

All analysis results are saved to the reports/ directory:

reports/
├── product_Notion.json
├── product_Example_Domain.json
└── ...

Each JSON file contains:

  • Product name and URL
  • Pricing tiers (name, price, currency, billing cycle)
  • Core features list
  • Markdown-formatted summary with SWOT analysis
  • Last updated timestamp
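Because every report shares the same fields, comparing competitors reduces to reading the files side by side. The sketch below collects pricing rows from a reports directory; `pricing_rows` is a hypothetical helper, and the demo writes one sample file to a temporary directory rather than reading a real `reports/` folder.

```python
import json
import tempfile
from pathlib import Path
from typing import List, Tuple


def pricing_rows(reports_dir: Path) -> List[Tuple[str, str, str]]:
    """Collect (product, tier, price) rows from every product_*.json file."""
    rows = []
    for path in sorted(reports_dir.glob("product_*.json")):
        report = json.loads(path.read_text(encoding="utf-8"))
        for tier in report.get("pricing_tiers", []):
            price = f"{tier['price']} {tier['currency']} / {tier['billing_cycle']}"
            rows.append((report["product_name"], tier["name"], price))
    return rows


sample_report = {
    "product_name": "Notion",
    "pricing_tiers": [
        {"name": "Free", "price": "0", "currency": "USD", "billing_cycle": "monthly"},
        {"name": "Plus", "price": "10", "currency": "USD", "billing_cycle": "monthly"},
    ],
}

with tempfile.TemporaryDirectory() as tmp:
    reports_dir = Path(tmp)
    (reports_dir / "product_Notion.json").write_text(
        json.dumps(sample_report), encoding="utf-8"
    )
    rows = pricing_rows(reports_dir)

for product, tier, price in rows:
    print(f"{product:<10} {tier:<10} {price}")
```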

🧪 Development

Running Tests

# Run all tests
pytest tests/ -v

# Run specific test file
pytest tests/test_crawler.py -v

# Run with coverage
pytest tests/ --cov=src/competitor_hunter --cov-report=html

Code Quality

# Format code
black src/ tests/

# Lint code
ruff check src/ tests/

# Type checking (if using mypy)
mypy src/

Project Structure

competitor-hunter/
├── src/
│   └── competitor_hunter/
│       ├── cli.py              # CLI command-line interface
│       ├── main.py             # Application entry point
│       ├── config.py           # Configuration management
│       ├── core/               # Domain models & LangGraph workflow
│       │   ├── models.py       # Pydantic models (CompetitorProduct, etc.)
│       │   └── graph.py        # LangGraph workflow definition
│       ├── infrastructure/     # External services
│       │   ├── browser/        # Playwright browser service
│       │   └── llm/            # LLM extractor service
│       └── interface/          # Entry points
│           └── mcp_server/     # MCP server implementation
├── config/                     # Configuration files
│   └── app.yaml.example        # Configuration template
├── docker/                     # Docker configuration
│   ├── Dockerfile              # Docker image definition
│   └── docker-compose.yml      # Docker Compose configuration
├── examples/                   # Example scripts
├── tests/                      # Test suite
├── reports/                    # Analysis results (gitignored)
├── data/                       # SQLite database (gitignored)
├── logs/                       # Screenshots & logs (gitignored)
├── pyproject.toml              # Project dependencies & CLI entry points
└── README.md                   # This file

📦 Dependencies

Core Dependencies

  • mcp: Model Context Protocol server implementation
  • langgraph: Workflow orchestration
  • langchain: LLM integration framework
  • playwright: Browser automation
  • pydantic: Data validation and serialization
  • html2text: HTML to Markdown conversion
  • loguru: Structured logging

Development Dependencies

  • pytest: Testing framework
  • pytest-asyncio: Async test support
  • ruff: Fast Python linter
  • black: Code formatter

🤝 Contributing

Contributions are welcome! Please feel free to submit a Pull Request.

  1. Fork the repository
  2. Create your feature branch (git checkout -b feature/amazing-feature)
  3. Commit your changes (git commit -m 'Add some amazing feature')
  4. Push to the branch (git push origin feature/amazing-feature)
  5. Open a Pull Request

📄 License

This project is licensed under the MIT License - see the LICENSE file for details.


🙏 Acknowledgments


📞 Support

For issues, questions, or contributions, please open an issue on GitHub.


Made with ❤️ for competitive intelligence
