Competitor Hunter
AI-powered competitor analysis agent that scrapes product pages and extracts structured data using LLMs, integrated via MCP for use with Claude Desktop and other clients.
🎯 Competitor Hunter
AI-Powered Competitor Analysis Agent | Automated web scraping and structured data extraction using MCP, LangGraph, and Playwright
📖 Introduction
Competitor Hunter is a production-ready AI agent that automates competitor analysis by scraping product pages and extracting structured information using Large Language Models. Built on the Model Context Protocol (MCP), it seamlessly integrates with Claude Desktop and other MCP-compatible clients.
Key Capabilities
- 🔍 Intelligent Web Scraping: Automated browser-based content extraction with anti-detection features
- 🤖 LLM-Powered Extraction: Structured data extraction using OpenAI-compatible APIs
- 📊 Structured Output: Pydantic-validated product information (pricing, features, SWOT analysis)
- 🔄 LangGraph Workflow: Robust state management and error handling
- 🔌 MCP Integration: Native support for Claude Desktop and MCP clients
🏗️ Architecture
The system follows Hexagonal Architecture with clear separation of concerns. Workflow: User Request → MCP Server → LangGraph Workflow → Browser Scraping → LLM Extraction → Structured Data Response.
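The flow above can be pictured as a chain of ports and adapters. What follows is a minimal sketch, not the project's actual code: `PipelineState`, `Scraper`, `Extractor`, `run_pipeline`, and the fake adapters are hypothetical names, and the real project wires these steps together with LangGraph rather than plain function calls.

```python
import asyncio
from typing import Optional, Protocol, TypedDict


class PipelineState(TypedDict):
    """State handed from step to step (hypothetical, for illustration)."""
    url: str
    scraped_content: Optional[str]
    product: Optional[dict]
    error: Optional[str]


class Scraper(Protocol):
    """Port for the browser-scraping step."""
    async def fetch(self, url: str) -> str: ...


class Extractor(Protocol):
    """Port for the LLM-extraction step."""
    async def extract(self, content: str) -> dict: ...


async def run_pipeline(url: str, scraper: Scraper, extractor: Extractor) -> PipelineState:
    """Scrape, then extract; any failure is recorded in the state instead of raised."""
    state: PipelineState = {"url": url, "scraped_content": None, "product": None, "error": None}
    try:
        state["scraped_content"] = await scraper.fetch(url)
        state["product"] = await extractor.extract(state["scraped_content"])
    except Exception as exc:
        state["error"] = str(exc)
    return state


class FakeScraper:
    """Stand-in adapter that returns canned HTML instead of driving a browser."""
    async def fetch(self, url: str) -> str:
        return f"<html>pricing page for {url}</html>"


class FakeExtractor:
    """Stand-in adapter that returns a fixed result instead of calling an LLM."""
    async def extract(self, content: str) -> dict:
        return {"product_name": "Example", "chars_seen": len(content)}


class FailingScraper:
    """Stand-in adapter that simulates a scraping failure."""
    async def fetch(self, url: str) -> str:
        raise RuntimeError("navigation timeout")
```

Because the core only depends on the two protocols, the fake adapters can stand in for the browser and the LLM in tests, which is the main practical benefit of the hexagonal layout.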
✨ Core Features
- 🤖 AI-Powered: Intelligent extraction using LLM with automatic SWOT analysis
- 📊 Structured Output: Pydantic-validated data models (pricing, features, summary)
- 🛡️ Anti-Detection: Random User-Agents, intelligent scrolling, auto-screenshots
- 🔌 MCP Native: Seamless integration with Claude Desktop and Cursor IDE
- 📦 CLI Tool: Professional command-line interface via the competitor-hunter command
- Async/Await: Full asynchronous programming for optimal performance
🚀 Quick Start
Prerequisites
- Python 3.10+ (3.11 or 3.12 recommended)
- UV or Poetry (dependency manager)
- Playwright browsers (installed with the playwright install command during Installation)
Installation
- Clone the repository:
git clone https://github.com/your-username/competitor-hunter.git
cd competitor-hunter
- Install dependencies (using UV):
uv sync
Or using Poetry:
poetry install
Or using pip:
python -m venv venv
source venv/bin/activate # On Windows: venv\Scripts\activate
pip install -e ".[dev]"
- Install Playwright browsers:
playwright install chromium
Configuration
Create a .env file in the project root:
# OpenAI API Configuration
OPENAI_API_KEY=your_openai_api_key_here
OPENAI_BASE_URL=https://api.openai.com/v1 # Optional: for custom endpoints
OPENAI_MODEL_NAME=gpt-4o # Optional: default is gpt-4o
# Browser Configuration
HEADLESS_MODE=true # Set to false for debugging
# Database Configuration
DB_PATH=data/competitors.db # SQLite database path
💡 Tip: Copy .env.example to .env and fill in your values:
cp .env.example .env
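To illustrate how these variables might be consumed, here is a minimal sketch that reads them into a typed settings object. `Settings` and `load_settings` are hypothetical names, not the project's config.py. The gpt-4o default comes from the comments above; the fallback values for HEADLESS_MODE and DB_PATH are assumptions taken from the example file. The sketch reads variables already present in the environment and does not parse the .env file itself.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class Settings:
    """Typed view of the environment variables listed above (hypothetical)."""
    openai_api_key: str
    openai_base_url: str
    openai_model_name: str
    headless_mode: bool
    db_path: str


def load_settings(env: Optional[Mapping[str, str]] = None) -> Settings:
    """Build Settings from a mapping, defaulting to the process environment."""
    env = os.environ if env is None else env
    api_key = env.get("OPENAI_API_KEY", "")
    if not api_key:
        raise ValueError("OPENAI_API_KEY is required")
    # Treat "1", "true", and "yes" (any case) as true; everything else is false.
    headless = env.get("HEADLESS_MODE", "true").strip().lower() in {"1", "true", "yes"}
    return Settings(
        openai_api_key=api_key,
        openai_base_url=env.get("OPENAI_BASE_URL", "https://api.openai.com/v1"),
        openai_model_name=env.get("OPENAI_MODEL_NAME", "gpt-4o"),
        headless_mode=headless,          # assumed default: true
        db_path=env.get("DB_PATH", "data/competitors.db"),  # assumed default
    )
```

Passing an explicit mapping, as in `load_settings({"OPENAI_API_KEY": "...", "HEADLESS_MODE": "false"})`, makes the parsing easy to test without touching the real environment.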
📸 Screenshots & Examples
Analysis Results

Screenshot of Notion pricing page analysis
CLI Output Example
$ competitor-hunter https://www.notion.so/pricing
🔍 Analyzing: https://www.notion.so/pricing
✅ Analysis complete!
======================================================================
📦 Product name: Notion
🔗 URL: https://www.notion.so/pricing
🕒 Last updated: 2024-06-13 00:00:00+00:00
======================================================================
💰 Pricing tiers (4):
• Free: 0 USD / monthly
• Plus: 10 USD / monthly
• Business: 20 USD / monthly
• Enterprise: Custom USD / custom
✨ Core features (13):
1. AI automation
2. Enterprise search
3. Meeting notes
...
💾 Results saved to: reports/product_Notion.json
JSON Output Structure
The analysis results are saved as structured JSON files:
{
  "product_name": "Notion",
  "url": "https://www.notion.so/pricing",
  "pricing_tiers": [
    {
      "name": "Free",
      "price": "0",
      "currency": "USD",
      "billing_cycle": "monthly"
    },
    {
      "name": "Plus",
      "price": "10",
      "currency": "USD",
      "billing_cycle": "monthly"
    }
  ],
  "core_features": [
    "AI automation",
    "Docs",
    "Knowledge Base"
  ],
  "summary": "## Product Overview\nNotion is a product that combines document editing...",
  "last_updated": "2024-06-13T00:00:00Z"
}
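A saved report can be loaded back into typed objects. The sketch below mirrors the JSON fields above using standard-library dataclasses as a stand-in for the project's Pydantic models; `PricingTier`, `ProductReport`, and `parse_report` are hypothetical names, and the real models may define additional fields or validation.

```python
import json
from dataclasses import dataclass
from datetime import datetime
from typing import List


@dataclass
class PricingTier:
    name: str
    price: str  # kept as a string: the sample CLI output includes "Custom"
    currency: str
    billing_cycle: str


@dataclass
class ProductReport:
    product_name: str
    url: str
    pricing_tiers: List[PricingTier]
    core_features: List[str]
    summary: str
    last_updated: datetime


def parse_report(raw: str) -> ProductReport:
    """Parse one report JSON document into a ProductReport."""
    data = json.loads(raw)
    return ProductReport(
        product_name=data["product_name"],
        url=data["url"],
        pricing_tiers=[PricingTier(**tier) for tier in data["pricing_tiers"]],
        core_features=list(data["core_features"]),
        summary=data["summary"],
        # datetime.fromisoformat() accepts a trailing "Z" only from Python 3.11,
        # so normalize it to an explicit UTC offset first.
        last_updated=datetime.fromisoformat(data["last_updated"].replace("Z", "+00:00")),
    )


SAMPLE = """
{
  "product_name": "Notion",
  "url": "https://www.notion.so/pricing",
  "pricing_tiers": [
    {"name": "Free", "price": "0", "currency": "USD", "billing_cycle": "monthly"},
    {"name": "Plus", "price": "10", "currency": "USD", "billing_cycle": "monthly"}
  ],
  "core_features": ["AI automation", "Docs", "Knowledge Base"],
  "summary": "## Product Overview",
  "last_updated": "2024-06-13T00:00:00Z"
}
"""
```

Calling `parse_report(SAMPLE)` yields a `ProductReport` whose `pricing_tiers` are `PricingTier` instances and whose `last_updated` is a timezone-aware `datetime`.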
📚 Usage
Method 1: CLI Command (Easiest)
After installation, use the competitor-hunter command:
# Analyze a single website
competitor-hunter https://www.notion.so/pricing
# Specify output file
competitor-hunter https://example.com output.json
# Batch analysis
competitor-hunter https://site1.com https://site2.com https://site3.com
Results are automatically saved to the reports/ directory with proper UTF-8 encoding.
Method 2: MCP Server Mode (Recommended for AI Assistants)
Run the MCP server to enable integration with Claude Desktop or Cursor:
python -m src.competitor_hunter.interface.mcp_server.server
Claude Desktop Integration
Add the following configuration to your Claude Desktop claude_desktop_config.json:
macOS: ~/Library/Application Support/Claude/claude_desktop_config.json
Windows: %APPDATA%\Claude\claude_desktop_config.json
{
  "mcpServers": {
    "competitor-hunter": {
      "command": "python",
      "args": [
        "-m",
        "src.competitor_hunter.interface.mcp_server.server"
      ],
      "cwd": "/path/to/competitor-hunter"
    }
  }
}
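If claude_desktop_config.json already lists other MCP servers, the entry above needs to be merged into the existing "mcpServers" object rather than pasted over it. The following is a small sketch of that merge; `add_mcp_server` is a hypothetical helper, not part of the project, and it works on an already-parsed config dictionary.

```python
from typing import Any, Dict


def add_mcp_server(config: Dict[str, Any], project_dir: str) -> Dict[str, Any]:
    """Return a copy of config with the competitor-hunter entry added.

    Existing servers and unrelated top-level keys are preserved, and the
    input dictionary is left unmodified.
    """
    merged = dict(config)
    servers = dict(merged.get("mcpServers", {}))
    servers["competitor-hunter"] = {
        "command": "python",
        "args": ["-m", "src.competitor_hunter.interface.mcp_server.server"],
        "cwd": project_dir,
    }
    merged["mcpServers"] = servers
    return merged
```

A typical use would be to read the file with `json.load`, pass the result through `add_mcp_server` with the absolute path of your clone, and write it back with `json.dump(..., indent=2)`.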
Cursor IDE Integration
Create .cursor/mcp.json in your project root:
{
  "mcpServers": {
    "competitor-hunter": {
      "command": "python",
      "args": [
        "-m",
        "src.competitor_hunter.interface.mcp_server.server"
      ],
      "cwd": "${workspaceFolder}"
    }
  }
}
After restarting, you can use the tool directly in chat:
Analyze this competitor: https://www.notion.so/pricing
Method 3: Python Library
Use the LangGraph workflow directly in your Python code:
import asyncio

from competitor_hunter.core import graph, AgentState, cleanup_resources


async def analyze(url: str):
    # Initialize state
    initial_state: AgentState = {
        "url": url,
        "scraped_content": None,
        "product": None,
        "error": None,
    }
    # Run workflow
    result = await graph.ainvoke(initial_state)
    # Check results
    if result.get("error"):
        print(f"Error: {result['error']}")
        return None
    product = result["product"]
    print(f"Product: {product.product_name}")
    print(f"Pricing Tiers: {len(product.pricing_tiers)}")
    print(f"Features: {product.core_features}")
    return product


async def main():
    try:
        await analyze("https://www.notion.so/pricing")
    finally:
        # Always clean up, even if the analysis fails
        await cleanup_resources()


# await is only valid inside a coroutine, so start the event loop explicitly
asyncio.run(main())
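When analyzing several URLs from Python, it can help to cap how many run at once. The sketch below shows one way to do that with `asyncio.Semaphore`. `analyze_many` and `fake_analyze` are hypothetical names; `fake_analyze` stands in for the `analyze` coroutine above so the example runs without the package installed. Whether the real workflow is safe to invoke concurrently is an assumption you would need to verify.

```python
import asyncio
from typing import Awaitable, Callable, Dict, Iterable, Optional


async def analyze_many(
    urls: Iterable[str],
    analyze: Callable[[str], Awaitable[Optional[object]]],
    max_concurrency: int = 2,
) -> Dict[str, Optional[object]]:
    """Run analyze() over urls with at most max_concurrency in flight.

    A URL whose analysis raises is mapped to None instead of aborting the batch.
    """
    semaphore = asyncio.Semaphore(max_concurrency)

    async def guarded(url: str):
        async with semaphore:
            try:
                return url, await analyze(url)
            except Exception:
                return url, None

    pairs = await asyncio.gather(*(guarded(u) for u in urls))
    return dict(pairs)


async def fake_analyze(url: str) -> str:
    """Stand-in for the real analyze(); fails for URLs containing 'bad'."""
    await asyncio.sleep(0)
    if "bad" in url:
        raise RuntimeError("scrape failed")
    return f"report for {url}"
```

For example, `asyncio.run(analyze_many(["https://site1.com", "https://site2.com"], fake_analyze))` returns a dictionary keyed by URL; swapping in the real `analyze` keeps the same shape.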
Output Structure
All analysis results are saved to the reports/ directory:
reports/
├── product_Notion.json
├── product_Example_Domain.json
└── ...
Each JSON file contains:
- Product name and URL
- Pricing tiers (name, price, currency, billing cycle)
- Core features list
- Markdown-formatted summary with SWOT analysis
- Last updated timestamp
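The file names above suggest a product_<name>.json pattern with spaces replaced by underscores. Here is a minimal sketch of that naming and of UTF-8-friendly serialization; `report_path` and `dump_report` are hypothetical helpers, and the exact sanitization rule is an assumption inferred from the two example file names.

```python
import json
import re
from pathlib import Path


def report_path(product_name: str, reports_dir: str = "reports") -> Path:
    """Map a product name to reports/product_<name>.json (assumed rule)."""
    # Collapse every run of characters other than word characters, dots,
    # and hyphens into a single underscore.
    safe = re.sub(r"[^\w.-]+", "_", product_name.strip()).strip("_") or "unknown"
    return Path(reports_dir) / f"product_{safe}.json"


def dump_report(report: dict) -> str:
    """Serialize a report, keeping non-ASCII text readable instead of \\u escapes."""
    return json.dumps(report, ensure_ascii=False, indent=2)
```

Writing the result with `report_path(name).write_text(dump_report(report), encoding="utf-8")` keeps non-ASCII summaries legible in the saved file.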
🧪 Development
Running Tests
# Run all tests
pytest tests/ -v
# Run specific test file
pytest tests/test_crawler.py -v
# Run with coverage
pytest tests/ --cov=src/competitor_hunter --cov-report=html
Code Quality
# Format code
black src/ tests/
# Lint code
ruff check src/ tests/
# Type checking (if using mypy)
mypy src/
Project Structure
competitor-hunter/
├── src/
│   └── competitor_hunter/
│       ├── cli.py                 # CLI command-line interface
│       ├── main.py                # Application entry point
│       ├── config.py              # Configuration management
│       ├── core/                  # Domain models & LangGraph workflow
│       │   ├── models.py          # Pydantic models (CompetitorProduct, etc.)
│       │   └── graph.py           # LangGraph workflow definition
│       ├── infrastructure/        # External services
│       │   ├── browser/           # Playwright browser service
│       │   └── llm/               # LLM extractor service
│       └── interface/             # Entry points
│           └── mcp_server/        # MCP server implementation
├── config/                        # Configuration files
│   └── app.yaml.example           # Configuration template
├── docker/                        # Docker configuration
│   ├── Dockerfile                 # Docker image definition
│   └── docker-compose.yml         # Docker Compose configuration
├── examples/                      # Example scripts
├── tests/                         # Test suite
├── reports/                       # Analysis results (gitignored)
├── data/                          # SQLite database (gitignored)
├── logs/                          # Screenshots & logs (gitignored)
├── pyproject.toml                 # Project dependencies & CLI entry points
└── README.md                      # This file
📦 Dependencies
Core Dependencies
- mcp: Model Context Protocol server implementation
- langgraph: Workflow orchestration
- langchain: LLM integration framework
- playwright: Browser automation
- pydantic: Data validation and serialization
- html2text: HTML to Markdown conversion
- loguru: Structured logging
Development Dependencies
- pytest: Testing framework
- pytest-asyncio: Async test support
- ruff: Fast Python linter
- black: Code formatter
🤝 Contributing
Contributions are welcome! Please feel free to submit a Pull Request.
- Fork the repository
- Create your feature branch (git checkout -b feature/amazing-feature)
- Commit your changes (git commit -m 'Add some amazing feature')
- Push to the branch (git push origin feature/amazing-feature)
- Open a Pull Request
📄 License
This project is licensed under the MIT License - see the LICENSE file for details.
🙏 Acknowledgments
- Built with LangGraph for workflow orchestration
- Powered by Playwright for browser automation
- Integrated with Model Context Protocol (MCP) for AI agent communication
📞 Support
For issues, questions, or contributions, please open an issue on GitHub.
Made with ❤️ for competitive intelligence