
🔍 FastMCP Document Analyzer

A comprehensive document analysis server built with the modern FastMCP framework



🌟 Features

📖 Document Analysis

  • 🎭 Sentiment Analysis: VADER + TextBlob dual-engine sentiment classification
  • 🔑 Keyword Extraction: TF-IDF and frequency-based keyword identification
  • 📚 Readability Scoring: Multiple metrics (Flesch, Flesch-Kincaid, ARI)
  • 📊 Text Statistics: Word count, sentences, paragraphs, and more

🗂️ Document Management

  • 💾 Persistent Storage: JSON-based document collection with metadata
  • 🔍 Smart Search: TF-IDF semantic similarity search
  • 🏷️ Tag System: Category and tag-based organization
  • 📈 Collection Insights: Comprehensive statistics and analytics

🚀 FastMCP Advantages

  • ⚡ Simple Setup: 90% less boilerplate than standard MCP
  • 🔒 Type Safety: Full type validation with Pydantic
  • 🎯 Modern API: Decorator-based tool definitions
  • 🌐 Multi-Transport: STDIO, HTTP, and SSE support

🚀 Quick Start

1. Clone and Setup

git clone <repository-url>
cd document-analyzer
python -m venv venv
venv\Scripts\activate         # Windows
# source venv/bin/activate    # macOS/Linux

2. Install Dependencies

pip install -r requirements.txt

3. Initialize NLTK Data

python -c "import nltk; nltk.download('punkt'); nltk.download('vader_lexicon'); nltk.download('stopwords'); nltk.download('punkt_tab')"

4. Run the Server

python fastmcp_document_analyzer.py

5. Test Everything

python test_fastmcp_analyzer.py

📦 Installation

System Requirements

  • Python 3.8 or higher
  • 500MB free disk space
  • Internet connection (for initial NLTK data download)

Dependencies

fastmcp>=2.3.0      # Modern MCP framework
textblob>=0.17.1    # Sentiment analysis
nltk>=3.8.1         # Natural language processing
textstat>=0.7.3     # Readability metrics
scikit-learn>=1.3.0 # Machine learning utilities
numpy>=1.24.0       # Numerical computing
pandas>=2.0.0       # Data manipulation
python-dateutil>=2.8.2  # Date handling

Optional: Virtual Environment

# Create virtual environment
python -m venv venv

# Activate (Windows)
venv\Scripts\activate

# Activate (macOS/Linux)
source venv/bin/activate

# Install dependencies
pip install -r requirements.txt

🔧 Usage

Starting the Server

Default (STDIO Transport)

python fastmcp_document_analyzer.py

HTTP Transport (for web services)

python fastmcp_document_analyzer.py --transport http --port 9000

With Custom Host

python fastmcp_document_analyzer.py --transport http --host 0.0.0.0 --port 8080
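These transport flags can be wired up with a small argparse layer. The sketch below is illustrative only: `parse_args` and its defaults are hypothetical, and the commented-out `mcp.run(...)` call assumes FastMCP's `run` accepts `transport`, `host`, and `port` arguments.

```python
import argparse

def parse_args(argv=None):
    """Parse the transport flags used in the commands above."""
    parser = argparse.ArgumentParser(description="FastMCP Document Analyzer")
    parser.add_argument("--transport", choices=["stdio", "http", "sse"],
                        default="stdio", help="MCP transport to serve on")
    parser.add_argument("--host", default="127.0.0.1", help="bind address")
    parser.add_argument("--port", type=int, default=9000, help="bind port")
    return parser.parse_args(argv)

if __name__ == "__main__":
    args = parse_args()
    # Hypothetical hand-off to the server object, e.g.:
    # mcp.run(transport=args.transport, host=args.host, port=args.port)
    print(f"would serve via {args.transport} on {args.host}:{args.port}")
```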

Basic Usage Examples

# Analyze a document
result = analyze_document("doc_001")
print(f"Sentiment: {result['sentiment_analysis']['overall_sentiment']}")

# Extract keywords
keywords = extract_keywords("Artificial intelligence is transforming healthcare", 5)
print([kw['keyword'] for kw in keywords])

# Search documents
results = search_documents("machine learning", 3)
print(f"Found {len(results)} relevant documents")

# Get collection statistics
stats = get_collection_stats()
print(f"Total documents: {stats['total_documents']}")

🛠️ Available Tools

Core Analysis Tools

| Tool | Description | Example |
| --- | --- | --- |
| analyze_document | 🔍 Complete document analysis | analyze_document("doc_001") |
| get_sentiment | 😊 Sentiment analysis | get_sentiment("I love this!") |
| extract_keywords | 🔑 Keyword extraction | extract_keywords(text, 10) |
| calculate_readability | 📖 Readability metrics | calculate_readability(text) |

Document Management Tools

| Tool | Description | Example |
| --- | --- | --- |
| add_document | 📝 Add new document | add_document("id", "title", "content") |
| get_document | 📄 Retrieve document | get_document("doc_001") |
| delete_document | 🗑️ Delete document | delete_document("old_doc") |
| list_documents | 📋 List all documents | list_documents("Technology") |

Search and Discovery Tools

| Tool | Description | Example |
| --- | --- | --- |
| search_documents | 🔍 Semantic search | search_documents("AI", 5) |
| search_by_tags | 🏷️ Tag-based search | search_by_tags(["AI", "tech"]) |
| get_collection_stats | 📊 Collection statistics | get_collection_stats() |

📊 Sample Data

The server comes pre-loaded with 16 diverse documents covering:

| Category | Documents | Topics |
| --- | --- | --- |
| Technology | 4 | AI, Quantum Computing, Privacy, Blockchain |
| Science | 3 | Space Exploration, Healthcare, Ocean Conservation |
| Environment | 2 | Climate Change, Sustainable Agriculture |
| Society | 3 | Remote Work, Mental Health, Transportation |
| Business | 2 | Economics, Digital Privacy |
| Culture | 2 | Art History, Wellness |

Sample Document Structure

{
  "id": "doc_001",
  "title": "The Future of Artificial Intelligence",
  "content": "Artificial intelligence is rapidly transforming...",
  "author": "Dr. Sarah Chen",
  "category": "Technology",
  "tags": ["AI", "technology", "future", "ethics"],
  "language": "en",
  "created_at": "2024-01-15T10:30:00"
}

🏗️ Project Structure

document-analyzer/
├── 📁 analyzer/                    # Core analysis engine
│   ├── __init__.py
│   └── document_analyzer.py       # Sentiment, keywords, readability
├── 📁 storage/                     # Document storage system
│   ├── __init__.py
│   └── document_storage.py        # JSON storage, search, management
├── 📁 data/                        # Sample data
│   ├── __init__.py
│   └── sample_documents.py        # 16 sample documents
├── 📄 fastmcp_document_analyzer.py # 🌟 Main FastMCP server
├── 📄 test_fastmcp_analyzer.py    # Comprehensive test suite
├── 📄 requirements.txt            # Python dependencies
├── 📄 documents.json              # Persistent document storage
├── 📄 README.md                   # This documentation
├── 📄 FASTMCP_COMPARISON.md       # FastMCP vs Standard MCP
├── 📄 .gitignore                  # Git ignore patterns
└── 📁 venv/                       # Virtual environment (optional)

🔄 API Reference

Document Analysis

analyze_document(document_id: str) -> Dict[str, Any]

Performs comprehensive analysis of a document.

Parameters:

  • document_id (str): Unique document identifier

Returns:

{
  "document_id": "doc_001",
  "title": "Document Title",
  "sentiment_analysis": {
    "overall_sentiment": "positive",
    "confidence": 0.85,
    "vader_scores": {...},
    "textblob_scores": {...}
  },
  "keywords": [
    {"keyword": "artificial", "frequency": 5, "relevance_score": 2.3}
  ],
  "readability": {
    "flesch_reading_ease": 45.2,
    "reading_level": "Difficult",
    "grade_level": "Grade 12"
  },
  "basic_statistics": {
    "word_count": 119,
    "sentence_count": 8,
    "paragraph_count": 1
  }
}

get_sentiment(text: str) -> Dict[str, Any]

Analyzes sentiment of any text.

Parameters:

  • text (str): Text to analyze

Returns:

{
  "overall_sentiment": "positive",
  "confidence": 0.85,
  "vader_scores": {
    "compound": 0.7269,
    "positive": 0.294,
    "negative": 0.0,
    "neutral": 0.706
  },
  "textblob_scores": {
    "polarity": 0.5,
    "subjectivity": 0.6
  }
}

Document Management

add_document(...) -> Dict[str, str]

Adds a new document to the collection.

Parameters:

  • id (str): Unique document ID
  • title (str): Document title
  • content (str): Document content
  • author (str, optional): Author name
  • category (str, optional): Document category
  • tags (List[str], optional): Tags list
  • language (str, optional): Language code

Returns:

{
  "status": "success",
  "message": "Document 'my_doc' added successfully",
  "document_count": 17
}

Search and Discovery

search_documents(query: str, limit: int = 10) -> List[Dict[str, Any]]

Performs semantic search across documents.

Parameters:

  • query (str): Search query
  • limit (int): Maximum results

Returns:

[
  {
    "id": "doc_001",
    "title": "AI Document",
    "similarity_score": 0.8542,
    "content_preview": "First 200 characters...",
    "tags": ["AI", "technology"]
  }
]

🧪 Testing

Run All Tests

python test_fastmcp_analyzer.py

Test Categories

  • Server Initialization: FastMCP server setup
  • Sentiment Analysis: VADER and TextBlob integration
  • Keyword Extraction: TF-IDF and frequency analysis
  • Readability Calculation: Multiple readability metrics
  • Document Analysis: Full document processing
  • Document Search: Semantic similarity search
  • Collection Statistics: Analytics and insights
  • Document Management: CRUD operations
  • Tag Search: Tag-based filtering

Expected Test Output

=== Testing FastMCP Document Analyzer ===

✓ FastMCP server module imported successfully
✓ Server initialized successfully
✓ Sentiment analysis working
✓ Keyword extraction working
✓ Readability calculation working
✓ Document analysis working
✓ Document search working
✓ Collection statistics working
✓ Document listing working
✓ Document addition and deletion working
✓ Tag search working

=== All FastMCP tests completed successfully! ===

📚 Documentation

Additional Resources

Key Concepts

Sentiment Analysis

Uses a dual-engine approach:

  • VADER: Rule-based, excellent for social media text
  • TextBlob: Pattern-based lexicon scoring (polarity and subjectivity), good for general text
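A minimal sketch of how the two engines' scores might be fused into the `overall_sentiment` and `confidence` fields shown in the API reference. The averaging rule is an illustrative assumption, not the server's actual logic; only the ±0.05 cutoff follows VADER's conventional threshold.

```python
def combine_sentiment(vader_compound: float, textblob_polarity: float) -> dict:
    """Average the two engines' scores and map the result to a label.
    Hypothetical fusion rule; the 0.05 cutoff is VADER's usual convention."""
    score = (vader_compound + textblob_polarity) / 2
    if score >= 0.05:
        label = "positive"
    elif score <= -0.05:
        label = "negative"
    else:
        label = "neutral"
    return {"overall_sentiment": label, "confidence": round(abs(score), 4)}
```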

Keyword Extraction

Combines multiple approaches:

  • TF-IDF: Term frequency-inverse document frequency
  • Frequency Analysis: Simple word frequency counting
  • Relevance Scoring: Weighted combination of both methods
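The TF-IDF side of this can be sketched in pure Python. The real server layers stopword removal and the combined relevance score on top; the whitespace tokenizer here is a simplifying assumption.

```python
import math
from collections import Counter

def tfidf_keywords(doc, corpus, top_n=5):
    """Score each word in `doc` by term frequency times smoothed
    inverse document frequency over `corpus`; return the top_n."""
    tokens = doc.lower().split()
    tf = Counter(tokens)
    scores = {}
    for word, count in tf.items():
        df = sum(1 for d in corpus if word in d.lower().split())
        idf = math.log((1 + len(corpus)) / (1 + df)) + 1  # smoothed IDF
        scores[word] = (count / len(tokens)) * idf
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)[:top_n]
```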

Readability Metrics

Provides multiple readability scores:

  • Flesch Reading Ease: 0-100 scale (higher = easier)
  • Flesch-Kincaid Grade: US grade level
  • ARI: Automated Readability Index
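The Flesch Reading Ease formula itself is 206.835 − 1.015 × (words/sentences) − 84.6 × (syllables/word). A rough pure-Python version follows; note that textstat's syllable counting is more sophisticated than the vowel-group heuristic used here.

```python
import re

def count_syllables(word):
    """Crude heuristic: one syllable per run of consecutive vowels."""
    return max(1, len(re.findall(r"[aeiouy]+", word.lower())))

def flesch_reading_ease(text):
    """Standard Flesch formula; higher scores mean easier text."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    syllables = sum(count_syllables(w) for w in words)
    return (206.835
            - 1.015 * (len(words) / len(sentences))
            - 84.6 * (syllables / len(words)))
```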

Document Search

Uses TF-IDF vectorization with cosine similarity:

  • Converts documents to numerical vectors
  • Calculates similarity between query and documents
  • Returns ranked results with similarity scores
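A stripped-down version of that pipeline, using raw term-frequency vectors instead of the server's scikit-learn TF-IDF weights; the function names are illustrative, not the server's API.

```python
import math
from collections import Counter

def cosine_similarity(a, b):
    """Cosine similarity between two texts' term-frequency vectors."""
    va, vb = Counter(a.lower().split()), Counter(b.lower().split())
    dot = sum(va[t] * vb[t] for t in va)
    norm = (math.sqrt(sum(v * v for v in va.values()))
            * math.sqrt(sum(v * v for v in vb.values())))
    return dot / norm if norm else 0.0

def rank_documents(query, docs, limit=3):
    """Return (doc_id, score) pairs sorted by similarity to the query."""
    ranked = sorted(((doc_id, cosine_similarity(query, text))
                     for doc_id, text in docs.items()),
                    key=lambda kv: kv[1], reverse=True)
    return ranked[:limit]
```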

🤝 Contributing

Development Setup

# Clone repository
git clone <repository-url>
cd document-analyzer

# Create development environment
python -m venv venv
venv\Scripts\activate         # Windows
# source venv/bin/activate    # macOS/Linux
pip install -r requirements.txt

# Run tests
python test_fastmcp_analyzer.py

Adding New Tools

FastMCP makes it easy to add new tools:

@mcp.tool
def my_new_tool(param: str) -> Dict[str, Any]:
    """
    🔧 Description of what this tool does.

    Args:
        param: Parameter description

    Returns:
        Return value description
    """
    # Implementation here
    return {"result": "success"}

Code Style

  • Use type hints for all functions
  • Add comprehensive docstrings
  • Include error handling
  • Follow PEP 8 style guidelines
  • Add emoji icons for better readability

Testing New Features

  1. Add your tool to the main server file
  2. Create test cases in the test file
  3. Run the test suite to ensure everything works
  4. Update documentation as needed

📄 License

This project is licensed under the MIT License - see the LICENSE file for details.

🙏 Acknowledgments

  • FastMCP Team for the excellent framework
  • NLTK Team for natural language processing tools
  • TextBlob Team for sentiment analysis capabilities
  • Scikit-learn Team for machine learning utilities

Made with ❤️ using FastMCP

🚀 Ready to analyze documents? Start with python fastmcp_document_analyzer.py
