android-mcp

A lightweight MCP server for Android automation. It provides tools for interacting directly with Android devices, including app control and interaction.

README

🎯 SDLC Agent Workflow

AI-Powered Software Development Life Cycle Automation Platform

Transform your software development process with AI-powered automation. From meeting transcriptions to complete technical documentation, streamline your entire SDLC workflow.


🚀 What is SDLC Agent Workflow?

The SDLC Agent Workflow is a production-ready AI platform that automates key aspects of software development, starting with audio transcription and document generation, with a comprehensive roadmap to become a complete SDLC automation solution.

🎯 Current Capabilities (Production Ready ✅)

  • 🎤 Audio Transcription: High-quality transcription using OpenAI Whisper models
  • 🤖 AI Meeting Analysis: Generate key meeting points and summaries with OpenAI GPT
  • 📋 PRD Generation: Transform discussions into industry-standard Product Requirements Documents
  • 🔧 Android TRD Generation: Convert PRDs into comprehensive Android Technical Requirements Documents
  • 🎨 Figma MCP Integration: Model Context Protocol server for comprehensive Figma design data extraction
  • 📱 Android MCP Integration: AI-powered Android device automation with LLM integration for intelligent mobile testing and interaction
  • 📁 Multi-Format Support: MP3, WAV, M4A, FLAC, AAC, OGG, WMA, MP4, MOV, AVI
  • ⚙️ Configurable Settings: Extensive customization through environment variables
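
As a small illustration of the multi-format support above, an extension check might look like the following sketch (SUPPORTED_FORMATS is a hypothetical name; the project keeps its real constants in config/constants.py):

from pathlib import Path

# Hypothetical constant mirroring the formats listed above; the project's
# actual constants live in config/constants.py.
SUPPORTED_FORMATS = {
    ".mp3", ".wav", ".m4a", ".flac", ".aac",
    ".ogg", ".wma", ".mp4", ".mov", ".avi",
}

def is_supported(path: str) -> bool:
    """Return True if the file's extension is a supported audio/video format."""
    return Path(path).suffix.lower() in SUPPORTED_FORMATS

print(is_supported("meeting.mp3"))  # True
print(is_supported("notes.txt"))    # False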

🔮 Future Vision (2025-2026 Roadmap)

Complete SDLC automation platform covering:

  • Requirements & Planning
  • Design & Architecture
  • Development Support
  • Testing & Quality
  • Deployment & Operations
  • Documentation & Knowledge

⚡ Quick Start

Prerequisites

  • Python 3.10 or higher
  • OpenAI API key
  • uv package manager (recommended) or pip
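
If you want to verify the interpreter before installing anything, a two-line check suffices:

# Quick sanity check that the interpreter satisfies the 3.10+ requirement.
import sys
assert sys.version_info >= (3, 10), f"Python 3.10+ required, found {sys.version}"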

Installation

  1. Clone the repository

    git clone git@github.com:tomdwipo/agent.git
    cd agent
    
  2. Install dependencies

    # Using uv (recommended)
    uv sync
    
    # Or using pip
    pip install -r requirements.txt
    
  3. Configure environment

    # Create .env file
    cp .env.example .env
    
    # Add your OpenAI API key
    echo "OPENAI_API_KEY=your_api_key_here" >> .env
    
  4. Launch the application

    # Using uv
    uv run python transcribe_gradio.py
    
    # Or using python directly
    python transcribe_gradio.py
    
  5. Access the interface: open your browser to http://localhost:7860


🎯 Features Overview

✅ Production Features

| Feature | Status | Description | Documentation |
|---------|--------|-------------|---------------|
| Audio Transcription | ✅ Complete | OpenAI Whisper integration with multi-format support | API Docs |
| AI Meeting Analysis | ✅ Complete | Key points extraction and meeting summaries | API Docs |
| PRD Generation v1.0 | ✅ Complete | 8-section industry-standard Product Requirements Documents | Feature Docs |
| Android TRD Generation v1.0 | ✅ Complete | 7-section Android Technical Requirements Documents | Feature Docs |
| Figma MCP Integration v1.0 | ✅ Complete | Model Context Protocol server for Figma design data extraction | Feature Docs |
| Android MCP Integration v1.0 | ✅ Complete | AI-powered Android device automation with LLM integration for intelligent mobile testing | Setup Guide |

📋 Planned Features (2025-2026)

| Phase | Timeline | Key Components | Expected Impact |
|-------|----------|----------------|-----------------|
| Phase 1: Requirements & Planning | Q3 2025 | Enhanced PRD + Project Planning Agent | 50% planning time reduction |
| Phase 2: Design & Architecture | Q4 2025 | System Design + UI/UX Design Agents | 60% faster architecture documentation |
| Phase 3: Development Support | Q1 2026 | Code Generation + Development Standards | 70% boilerplate code reduction |
| Phase 4: Testing & Quality | Q2 2026 | Test Planning + Quality Assurance Agents | 80% test coverage automation |
| Phase 5: Deployment & Operations | Q3 2026 | DevOps + Infrastructure Management | 90% deployment automation |
| Phase 6: Documentation & Knowledge | Q4 2026 | Documentation + Knowledge Management | 75% documentation automation |

🏗️ Architecture

System Overview

┌─────────────────┐    ┌──────────────────┐    ┌─────────────────┐
│   UI Layer      │    │  Service Layer   │    │ Configuration   │
│                 │    │                  │    │                 │
│ • Gradio UI     │◄──►│ • OpenAI Service │◄──►│ • Settings      │
│ • Components    │    │ • Whisper Service│    │ • Constants     │
│ • Interface     │    │ • File Service   │    │ • Environment   │
└─────────────────┘    └──────────────────┘    └─────────────────┘
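
As a minimal sketch of how these layers connect (the handler function below is hypothetical; the real UI lives in ui/gradio_interface.py):

import gradio as gr

from services.whisper_service import WhisperService

whisper = WhisperService()  # Service Layer, configured via config/settings.py

def handle_upload(audio_path: str) -> str:
    # The UI Layer hands the uploaded file to the Service Layer
    # and displays the transcript it returns.
    return whisper.transcribe(audio_path)["text"]

# The Gradio UI wires the handler to an audio upload widget.
demo = gr.Interface(fn=handle_upload, inputs=gr.Audio(type="filepath"), outputs="text")
demo.launch(server_port=7860)  # same address as in the Quick Start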

Technology Stack

  • Backend: Python 3.10+, OpenAI API, Whisper
  • Frontend: Gradio (Web UI)
  • Package Management: uv with pyproject.toml
  • Configuration: Environment variables with .env support
  • Testing: Comprehensive test suite with pytest

Project Structure

agent/
├── main.py                  # Main application entry point
├── transcribe_gradio.py     # Gradio interface launcher
├── pyproject.toml           # Project configuration
├── requirements.txt         # Dependencies
├── config/                  # Configuration management
│   ├── settings.py          # Application settings
│   ├── constants.py         # System constants
│   └── __init__.py
├── services/                # Core business logic
│   ├── openai_service.py    # OpenAI API integration
│   ├── whisper_service.py   # Audio transcription
│   ├── file_service.py      # File operations
│   └── __init__.py
├── ui/                      # User interface components
│   ├── gradio_interface.py  # Main UI interface
│   ├── components.py        # UI components
│   └── __init__.py
├── tests/                   # Test suite
├── demos/                   # Demo applications
└── docs/                    # Comprehensive documentation

📚 Documentation

🎯 For Users

🛠️ For Developers

📋 For Project Managers & Stakeholders


🚀 Usage Examples

Basic Audio Transcription

from services.whisper_service import WhisperService

# Initialize service
whisper = WhisperService()

# Transcribe audio file
result = whisper.transcribe("meeting.mp3")
print(result["text"])

PRD Generation

from services.openai_service import OpenAIService

# Initialize service
openai_service = OpenAIService()

# Generate PRD from meeting transcript
prd = openai_service.generate_prd(transcript_text)
print(prd)

Complete Workflow

  1. Upload Audio → Transcribe meeting recording
  2. Generate Analysis → Extract key points and action items
  3. Create PRD → Transform discussion into structured requirements
  4. Generate TRD → Convert PRD into technical specifications
  5. Download Documents → Export all generated documents
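
Strung together in code, the workflow might look like this sketch (generate_meeting_analysis and generate_trd are assumed method names; only transcribe and generate_prd appear in the examples above):

from services.whisper_service import WhisperService
from services.openai_service import OpenAIService

whisper = WhisperService()
openai_service = OpenAIService()

# 1. Upload Audio → transcribe the meeting recording
transcript = whisper.transcribe("meeting.mp3")["text"]

# 2. Generate Analysis (hypothetical method name)
analysis = openai_service.generate_meeting_analysis(transcript)

# 3. Create PRD from the discussion
prd = openai_service.generate_prd(transcript)

# 4. Generate TRD from the PRD (hypothetical method name)
trd = openai_service.generate_trd(prd)

# 5. Download Documents: write the generated documents to disk
for name, doc in {"analysis.md": analysis, "prd.md": prd, "trd.md": trd}.items():
    with open(name, "w", encoding="utf-8") as f:
        f.write(doc)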

🔧 Configuration

Environment Variables

# OpenAI Configuration
OPENAI_API_KEY=your_api_key_here
OPENAI_MODEL=gpt-4
OPENAI_MAX_TOKENS=4000

# Whisper Configuration
WHISPER_MODEL=base
WHISPER_LANGUAGE=auto

# Application Settings
DEBUG=false
LOG_LEVEL=INFO
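
A minimal sketch of how config/settings.py might read these variables (python-dotenv usage is an assumption; see the Configuration API documentation for the real behavior):

import os

from dotenv import load_dotenv

load_dotenv()  # pull values from the .env file into the process environment

OPENAI_API_KEY = os.environ["OPENAI_API_KEY"]          # required
OPENAI_MODEL = os.getenv("OPENAI_MODEL", "gpt-4")      # defaults mirror the sample above
OPENAI_MAX_TOKENS = int(os.getenv("OPENAI_MAX_TOKENS", "4000"))
WHISPER_MODEL = os.getenv("WHISPER_MODEL", "base")
WHISPER_LANGUAGE = os.getenv("WHISPER_LANGUAGE", "auto")
DEBUG = os.getenv("DEBUG", "false").lower() == "true"
LOG_LEVEL = os.getenv("LOG_LEVEL", "INFO")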

Advanced Configuration

See Configuration API Documentation for complete configuration options.


🧪 Development

Setup Development Environment

# Clone repository
git clone git@github.com:tomdwipo/agent.git
cd agent

# Install development dependencies
uv sync --dev

# Run tests
uv run pytest

# Run with development settings
uv run python transcribe_gradio.py

Running Tests

# Run all tests
uv run pytest

# Run specific test file
uv run pytest tests/test_prd_services.py

# Run with coverage
uv run pytest --cov=services

Contributing

  1. Fork the repository
  2. Create a feature branch (git checkout -b feature/amazing-feature)
  3. Make your changes
  4. Add tests for new functionality
  5. Run the test suite (uv run pytest)
  6. Commit your changes (git commit -m 'Add amazing feature')
  7. Push to the branch (git push origin feature/amazing-feature)
  8. Open a Pull Request

See Contributing Guidelines for detailed information.


📈 Project Status & Roadmap

Current Status: Production Ready v1.0

  • Core Foundation: Fully functional audio transcription and document generation
  • Production Features: PRD and Android TRD generation complete
  • Architecture: Modular, scalable design ready for expansion
  • Documentation: Comprehensive documentation and testing

Success Metrics by Phase

  • Phase 1: 50% planning time reduction
  • Phase 2: 60% faster architecture documentation
  • Phase 3: 70% boilerplate code reduction
  • Phase 4: 80% test coverage automation
  • Phase 5: 90% deployment automation
  • Phase 6: 75% documentation automation

Complete Workflow Vision

Meeting/Discussion → Transcription → PRD → TRD → Architecture → Code → Tests → Deployment → Documentation

🤝 Community & Support

Getting Help

  • Documentation: Comprehensive guides in docs/
  • Issues: Report bugs and request features via GitHub Issues
  • Discussions: Join community discussions

Contributing

We welcome contributions! See our Contributing Guide for:

  • Code contribution guidelines
  • Development setup instructions
  • Testing requirements
  • Documentation standards

📊 Metrics & Performance

Current Application Metrics

  • Features Implemented: 5/5 core features (100%)
  • Architecture Phases: 3/3 complete (Service Layer, Configuration, UI Components)
  • Test Coverage: Comprehensive test suite
  • Production Readiness: ✅ Ready for deployment

Performance Benchmarks

  • Transcription Speed: Real-time processing for most audio formats
  • PRD Generation: ~30 seconds for typical meeting transcript
  • TRD Generation: ~45 seconds from PRD input
  • Multi-format Support: 9 audio/video formats supported

📄 License

This project is licensed under the MIT License - see the LICENSE file for details.


🎉 Acknowledgments

  • OpenAI for Whisper and GPT API
  • Gradio for the excellent web UI framework
  • Python Community for the amazing ecosystem
  • Contributors who help make this project better

📞 Contact & Links


🚀 Ready to transform your SDLC workflow? Get started with the Quick Start guide above!
