Tagging MCP

Enables parallel tagging and classification of CSV data using multiple LLM providers with structured output, confidence scores, and optional reasoning for batch classification tasks.

MCP server for tagging CSV rows using polar_llama with parallel LLM inference.

Overview

This MCP server enables fast, parallel tagging of CSV data using multiple LLM providers. It leverages polar_llama to process rows concurrently, making it ideal for batch classification and tagging tasks.
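The idea of concurrent row processing can be sketched with the Python standard library alone. This is only an illustration under stated assumptions, not how polar_llama works internally: `classify_row` is a hypothetical stand-in for a single LLM request, and `tag_rows` is a made-up helper name.

```python
from concurrent.futures import ThreadPoolExecutor


def classify_row(text: str) -> str:
    """Hypothetical stand-in for one LLM call (a real call would be network-bound)."""
    return "technology" if "software" in text.lower() else "business"


def tag_rows(texts: list[str], max_workers: int = 8) -> list[str]:
    """Classify all rows concurrently; pool.map returns results in input order."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(classify_row, texts))


rows = ["New software release", "Quarterly earnings report"]
tags = tag_rows(rows)
```

Because each request spends most of its time waiting on the network, running many of them at once is what makes batch tagging fast.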

Features

  • Parallel Processing: Tag hundreds or thousands of CSV rows concurrently
  • Multiple LLM Providers: Support for Claude (Anthropic), OpenAI, Gemini, Groq, and AWS Bedrock
  • Structured Output: Uses Pydantic models for consistent, type-safe results
  • Flexible Taxonomy: Define custom tag lists for your use case
  • Optional Reasoning: Include an explanation for each tag alongside its confidence score

Installation

Prerequisites

  • Python 3.12+
  • UV package manager
  • API key for at least one LLM provider

Environment Setup

  1. Clone this repository
  2. Create a .env file with your API keys:
    ANTHROPIC_API_KEY=your_key_here
    OPENAI_API_KEY=your_key_here
    GEMINI_API_KEY=your_key_here
    GROQ_API_KEY=your_key_here
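
Before starting the server, it can help to check which provider keys are actually visible to the process. The sketch below assumes the variable names from the .env example above. The mapping from provider name to variable and the helper `configured_providers` are illustrative and not part of the server.

```python
import os

# Variable names come from the .env example; the provider-to-variable
# mapping is an assumption made for this sketch.
PROVIDER_KEYS = {
    "claude": "ANTHROPIC_API_KEY",
    "openai": "OPENAI_API_KEY",
    "gemini": "GEMINI_API_KEY",
    "groq": "GROQ_API_KEY",
}


def configured_providers(env=None) -> list[str]:
    """Return the providers whose API key is set to a non-empty value."""
    env = os.environ if env is None else env
    return [name for name, var in PROVIDER_KEYS.items() if env.get(var)]


if __name__ == "__main__":
    print("Providers with a key set:", configured_providers())
```

Note that this reads the process environment only. Whether a .env file is loaded into that environment depends on how the server is launched.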
    

Claude Desktop Configuration

Option 1: Local Development (Recommended)

Run directly without containers:

{
  "mcpServers": {
    "tagging-mcp": {
      "command": "uv",
      "args": ["run", "fastmcp", "run", "/path/to/tagging_mcp/tagging.py"]
    }
  }
}

Option 2: Container Deployment

  1. Build the container:

    container build -t tagging_mcp .
    
  2. Configure Claude Desktop:

    {
      "mcpServers": {
        "tagging-mcp": {
          "command": "container",
          "args": ["run", "--interactive", "tagging_mcp"]
        }
      }
    }
    

Available Tools

tag_csv

Simple tagging with a list of categories. Perfect for basic classification tasks.

Parameters:

  • csv_path (str): Path to the CSV file to tag
  • taxonomy (List[str]): List of possible tags/categories (e.g., ["technology", "business", "science"])
  • text_column (str, optional): Column containing text to analyze (default: "text")
  • provider (str, optional): LLM provider - "claude", "openai", "gemini", "groq", or "bedrock" (default: "groq")
  • model (str, optional): Model identifier (default: "llama-3.3-70b-versatile")
  • api_key (str, optional): API key if not set via environment variable
  • output_path (str, optional): Path to save tagged CSV
  • include_reasoning (bool, optional): Include detailed reasoning and reflection (default: false)
  • field_name (str, optional): Name for the classification field (default: "category")

Returns: Dictionary with status, tagged data preview, confidence scores, and optional errors
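
The parameters above can be assembled into an arguments dictionary before handing them to an MCP client. The following is a minimal sketch: `build_tag_csv_args` is a hypothetical helper, and the validation it performs is an assumption about sensible client-side checks, not something the server is documented to require.

```python
# Provider names and defaults are copied from the parameter list above.
SUPPORTED_PROVIDERS = {"claude", "openai", "gemini", "groq", "bedrock"}


def build_tag_csv_args(csv_path, taxonomy, *, text_column="text",
                       provider="groq", model="llama-3.3-70b-versatile",
                       include_reasoning=False, field_name="category",
                       output_path=None):
    """Build the argument dict for a tag_csv call, applying documented defaults."""
    if not taxonomy:
        raise ValueError("taxonomy must contain at least one tag")
    if provider not in SUPPORTED_PROVIDERS:
        raise ValueError(f"unsupported provider: {provider!r}")
    args = {
        "csv_path": csv_path,
        "taxonomy": list(taxonomy),
        "text_column": text_column,
        "provider": provider,
        "model": model,
        "include_reasoning": include_reasoning,
        "field_name": field_name,
    }
    # Only send output_path when the caller wants the tagged CSV saved.
    if output_path is not None:
        args["output_path"] = output_path
    return args


args = build_tag_csv_args("/path/to/data.csv", ["technology", "business"])
```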

tag_csv_advanced

Advanced multi-dimensional classification with custom taxonomy definitions. Use this for complex tagging with multiple fields.

Parameters:

  • csv_path (str): Path to the CSV file to tag
  • taxonomy (Dict): Full taxonomy dictionary with field definitions and value descriptions
  • text_column (str, optional): Column containing text to analyze (default: "text")
  • provider (str, optional): LLM provider (default: "groq")
  • model (str, optional): Model identifier (default: "llama-3.3-70b-versatile")
  • api_key (str, optional): API key if not set via environment variable
  • output_path (str, optional): Path to save tagged CSV
  • include_reasoning (bool, optional): Include detailed reasoning (default: false)

Example Taxonomy:

{
  "sentiment": {
    "description": "The emotional tone of the text",
    "values": {
      "positive": "Text expresses positive emotions or favorable opinions",
      "negative": "Text expresses negative emotions or unfavorable opinions",
      "neutral": "Text is factual and objective"
    }
  },
  "urgency": {
    "description": "How urgent the content is",
    "values": {
      "high": "Requires immediate attention",
      "medium": "Should be addressed soon",
      "low": "Can be addressed at any time"
    }
  }
}

Returns: Dictionary with status, all field values, confidence scores per field, and optional reasoning
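
Since the advanced taxonomy is a nested dictionary, a quick shape check before submitting it can catch typos early. This is a sketch that assumes the structure shown in the example taxonomy (a "description" string and a non-empty "values" mapping per field). `validate_taxonomy` is a hypothetical helper, not a tool exposed by the server.

```python
def validate_taxonomy(taxonomy: dict) -> list[str]:
    """Return a list of problems; an empty list means the shape looks right."""
    problems = []
    for field, spec in taxonomy.items():
        if not isinstance(spec, dict):
            problems.append(f"{field}: expected a dict with 'description' and 'values'")
            continue
        if not isinstance(spec.get("description"), str):
            problems.append(f"{field}: missing string 'description'")
        values = spec.get("values")
        if not isinstance(values, dict) or not values:
            problems.append(f"{field}: 'values' must be a non-empty dict")
    return problems


taxonomy = {
    "sentiment": {
        "description": "The emotional tone of the text",
        "values": {"positive": "Favorable", "negative": "Unfavorable"},
    },
    "urgency": {"description": "How urgent the content is", "values": {}},
}
problems = validate_taxonomy(taxonomy)
```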

preview_csv

Preview the first few rows of a CSV file to understand its structure.

Parameters:

  • csv_path (str): Path to the CSV file
  • rows (int, optional): Number of rows to preview (default: 5)

Returns: Dictionary with columns, row count, and preview data
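
To make the shape of such a preview concrete, here is a standard-library sketch that produces columns, a row count, and the first few rows. It is not the server's implementation. The key names "columns", "row_count", and "preview" and the function `preview_rows` are assumptions chosen for illustration.

```python
import csv
import io


def preview_rows(csv_file, rows: int = 5) -> dict:
    """Read a CSV file object and summarize its columns and first rows."""
    reader = csv.DictReader(csv_file)
    data = list(reader)  # reads everything; fine for a sketch, costly for huge files
    return {
        "columns": list(reader.fieldnames or []),
        "row_count": len(data),
        "preview": data[:rows],
    }


sample = io.StringIO("id,text\n1,hello\n2,world\n")
result = preview_rows(sample, rows=1)
```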

get_tagging_info

Get information about the tagging MCP server and supported providers.

Returns: Server metadata, supported providers, features, and available tools

Example Usage

Basic Tagging

  1. Preview your CSV:

    Use preview_csv with csv_path="/path/to/data.csv"
    
  2. Simple category tagging:

    Use tag_csv with:
    - csv_path="/path/to/data.csv"
    - taxonomy=["technology", "business", "science", "politics"]
    - text_column="description"
    - output_path="/path/to/tagged_output.csv"
    
  3. Include reasoning for transparency:

    Use tag_csv with:
    - csv_path="/path/to/data.csv"
    - taxonomy=["urgent", "normal", "low_priority"]
    - field_name="priority"
    - include_reasoning=true
    

Advanced Multi-Field Tagging

For complex classification with multiple dimensions:

Use tag_csv_advanced with:
- csv_path="/path/to/support_tickets.csv"
- taxonomy={
    "department": {
      "description": "Which department should handle this",
      "values": {
        "sales": "Product inquiries and purchases",
        "support": "Technical issues and bugs",
        "billing": "Payment and account questions"
      }
    },
    "priority": {
      "description": "How urgent this is",
      "values": {
        "urgent": "Service down or critical issue",
        "high": "Significant problem",
        "normal": "Standard request"
      }
    }
  }
- text_column="ticket_description"
- output_path="/path/to/classified_tickets.csv"

Output Structure

Basic Tagging Output

  • Original CSV columns
  • {field_name}: The selected tag
  • confidence: Confidence score (0.0 to 1.0)
  • thinking: Reasoning for each possible value (if include_reasoning=true)
  • reflection: Overall analysis (if include_reasoning=true)

Advanced Tagging Output

  • Original CSV columns
  • For each taxonomy field:
    • {field_name}: Selected value
    • {field_name}_confidence: Confidence score
    • {field_name}_thinking: Reasoning dict (if enabled)
    • {field_name}_reflection: Analysis (if enabled)
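
The naming pattern above can be expressed as a small function that lists the columns to expect in the tagged CSV. This is a sketch based only on the naming convention described here. The column order and the helper `advanced_output_columns` are assumptions.

```python
def advanced_output_columns(original_columns, taxonomy, include_reasoning=False):
    """List expected output columns for tag_csv_advanced, per the pattern above."""
    columns = list(original_columns)
    for field in taxonomy:
        columns.append(field)
        columns.append(f"{field}_confidence")
        if include_reasoning:
            # Reasoning columns appear only when include_reasoning is enabled.
            columns.append(f"{field}_thinking")
            columns.append(f"{field}_reflection")
    return columns


expected = advanced_output_columns(
    ["id", "text"], {"sentiment": {}, "urgency": {}}, include_reasoning=False
)
```

A check like this is handy when downstream code selects columns by name and needs to know whether the reasoning columns will be present.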

Supported LLM Providers

  • Groq (Recommended): llama-3.3-70b-versatile, llama-3.1-70b-versatile, mixtral-8x7b-32768
  • Claude (Anthropic): claude-3-5-sonnet-20241022, claude-3-opus-20240229
  • OpenAI: gpt-4, gpt-4-turbo, gpt-3.5-turbo
  • Gemini: gemini-1.5-pro, gemini-1.5-flash
  • AWS Bedrock: anthropic.claude-3-sonnet, anthropic.claude-3-haiku

Key Features

  • Detailed Reasoning: For each tag, see why the model chose it
  • 🔍 Reflection: The model reflects on its analysis
  • 📊 Confidence Scores: Know how confident each classification is (0.0 to 1.0)
  • ⚡ Parallel Processing: All rows are processed concurrently
  • 🎯 Error Detection: Automatic error tracking and reporting
  • 🔧 Flexible: Simple list or complex multi-field taxonomies

License

MIT
