Tagging MCP
MCP server for tagging CSV rows using polar_llama with parallel LLM inference.
Overview
This MCP server enables fast, parallel tagging of CSV data using multiple LLM providers. It leverages polar_llama to process rows concurrently, making it ideal for batch classification and tagging tasks.
Features
- Parallel Processing: Tag hundreds or thousands of CSV rows concurrently
- Multiple LLM Providers: Support for Claude (Anthropic), OpenAI, Gemini, and Groq
- Structured Output: Uses Pydantic models for consistent, type-safe results
- Flexible Taxonomy: Define custom tag lists for your use case
- Optional Reasoning: Include confidence levels and explanations for tags
Installation
Prerequisites
- Python 3.12+
- UV package manager
- API key for at least one LLM provider
Environment Setup
- Clone this repository
- Create a .env file with your API keys:

ANTHROPIC_API_KEY=your_key_here
OPENAI_API_KEY=your_key_here
GEMINI_API_KEY=your_key_here
GROQ_API_KEY=your_key_here
Claude Desktop Configuration
Option 1: Local Development (Recommended)
Run directly without containers:
{
"mcpServers": {
"tagging-mcp": {
"command": "uv",
"args": ["run", "fastmcp", "run", "/path/to/tagging_mcp/tagging.py"]
}
}
}
Option 2: Container Deployment
- Build the container:

container build -t tagging_mcp .

- Configure Claude Desktop:

{
  "mcpServers": {
    "tagging-mcp": {
      "command": "container",
      "args": ["run", "--interactive", "tagging_mcp"]
    }
  }
}
Available Tools
tag_csv
Simple tagging with a list of categories. Perfect for basic classification tasks.
Parameters:
- csv_path (str): Path to the CSV file to tag
- taxonomy (List[str]): List of possible tags/categories (e.g., ["technology", "business", "science"])
- text_column (str, optional): Column containing text to analyze (default: "text")
- provider (str, optional): LLM provider - "claude", "openai", "gemini", "groq", or "bedrock" (default: "groq")
- model (str, optional): Model identifier (default: "llama-3.3-70b-versatile")
- api_key (str, optional): API key if not set via environment variable
- output_path (str, optional): Path to save the tagged CSV
- include_reasoning (bool, optional): Include detailed reasoning and reflection (default: false)
- field_name (str, optional): Name for the classification field (default: "category")
Returns: Dictionary with status, tagged data preview, confidence scores, and optional errors
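The README does not spell out the exact shape of this dictionary, so as an illustrative sketch (every key name below is an assumption, not taken from the server's source), a successful tag_csv call might return something like:

```python
# Hypothetical tag_csv result; key names and values are illustrative only.
result = {
    "status": "success",
    "rows_tagged": 2,
    "preview": [
        {"text": "New GPU benchmarks released", "category": "technology", "confidence": 0.92},
        {"text": "Quarterly earnings beat estimates", "category": "business", "confidence": 0.88},
    ],
    "errors": [],
}

# A caller would typically check status before trusting the preview,
# and confidence scores are documented as falling in [0.0, 1.0].
assert result["status"] == "success"
assert all(0.0 <= row["confidence"] <= 1.0 for row in result["preview"])
```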
tag_csv_advanced
Advanced multi-dimensional classification with custom taxonomy definitions. Use this for complex tagging with multiple fields.
Parameters:
- csv_path (str): Path to the CSV file to tag
- taxonomy (Dict): Full taxonomy dictionary with field definitions and value descriptions
- text_column (str, optional): Column containing text to analyze (default: "text")
- provider (str, optional): LLM provider (default: "groq")
- model (str, optional): Model identifier (default: "llama-3.3-70b-versatile")
- api_key (str, optional): API key if not set via environment variable
- output_path (str, optional): Path to save the tagged CSV
- include_reasoning (bool, optional): Include detailed reasoning (default: false)
Example Taxonomy:
{
"sentiment": {
"description": "The emotional tone of the text",
"values": {
"positive": "Text expresses positive emotions or favorable opinions",
"negative": "Text expresses negative emotions or unfavorable opinions",
"neutral": "Text is factual and objective"
}
},
"urgency": {
"description": "How urgent the content is",
"values": {
"high": "Requires immediate attention",
"medium": "Should be addressed soon",
"low": "Can be addressed at any time"
}
}
}
Returns: Dictionary with status, all field values, confidence scores per field, and optional reasoning
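Because malformed taxonomy dictionaries are an easy mistake, it can help to sanity-check the shape before calling the tool. A minimal sketch, assuming the {field: {"description": str, "values": {value: str}}} layout shown above (this helper is illustrative, not part of the server):

```python
def validate_taxonomy(taxonomy: dict) -> list[str]:
    """Return a list of problems found in a taxonomy dict (empty list = OK)."""
    problems = []
    for field, spec in taxonomy.items():
        if "description" not in spec:
            problems.append(f"{field}: missing 'description'")
        values = spec.get("values")
        if not isinstance(values, dict) or not values:
            problems.append(f"{field}: 'values' must be a non-empty dict")
    return problems

taxonomy = {
    "sentiment": {
        "description": "The emotional tone of the text",
        "values": {"positive": "...", "negative": "...", "neutral": "..."},
    },
    "urgency": {"values": {"high": "..."}},  # missing description on purpose
}
print(validate_taxonomy(taxonomy))  # → ["urgency: missing 'description'"]
```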
preview_csv
Preview the first few rows of a CSV file to understand its structure.
Parameters:
- csv_path (str): Path to the CSV file
- rows (int, optional): Number of rows to preview (default: 5)
Returns: Dictionary with columns, row count, and preview data
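The tool's implementation is not shown in this README, but its behavior is roughly equivalent to this stdlib sketch (function name and return keys are assumptions mirroring the description above):

```python
import csv
import tempfile
from pathlib import Path

def preview_csv(csv_path: str, rows: int = 5) -> dict:
    """Illustrative stand-in for the preview_csv tool, not the server's actual code."""
    with open(csv_path, newline="") as f:
        reader = csv.DictReader(f)
        all_rows = list(reader)
    return {
        "columns": reader.fieldnames,
        "row_count": len(all_rows),
        "preview": all_rows[:rows],
    }

# Usage with a throwaway file:
sample = Path(tempfile.mkdtemp()) / "sample.csv"
sample.write_text("text,label\nhello,a\nworld,b\n")
info = preview_csv(str(sample), rows=1)
# info["columns"] == ["text", "label"], info["row_count"] == 2
```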
get_tagging_info
Get information about the tagging MCP server and supported providers.
Returns: Server metadata, supported providers, features, and available tools
Example Usage
Basic Tagging
- Preview your CSV:

Use preview_csv with csv_path="/path/to/data.csv"

- Simple category tagging:

Use tag_csv with:
- csv_path="/path/to/data.csv"
- taxonomy=["technology", "business", "science", "politics"]
- text_column="description"
- output_path="/path/to/tagged_output.csv"

- Include reasoning for transparency:

Use tag_csv with:
- csv_path="/path/to/data.csv"
- taxonomy=["urgent", "normal", "low_priority"]
- field_name="priority"
- include_reasoning=true
Advanced Multi-Field Tagging
For complex classification with multiple dimensions:
Use tag_csv_advanced with:
- csv_path="/path/to/support_tickets.csv"
- taxonomy={
"department": {
"description": "Which department should handle this",
"values": {
"sales": "Product inquiries and purchases",
"support": "Technical issues and bugs",
"billing": "Payment and account questions"
}
},
"priority": {
"description": "How urgent this is",
"values": {
"urgent": "Service down or critical issue",
"high": "Significant problem",
"normal": "Standard request"
}
}
}
- text_column="ticket_description"
- output_path="/path/to/classified_tickets.csv"
Output Structure
Basic Tagging Output
- Original CSV columns
- {field_name}: The selected tag
- confidence: Confidence score (0.0 to 1.0)
- thinking: Reasoning for each possible value (if include_reasoning=true)
- reflection: Overall analysis (if include_reasoning=true)
Advanced Tagging Output
- Original CSV columns
- For each taxonomy field:
  - {field_name}: Selected value
  - {field_name}_confidence: Confidence score
  - {field_name}_thinking: Reasoning dict (if enabled)
  - {field_name}_reflection: Analysis (if enabled)
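Because every taxonomy field carries its own confidence column, low-confidence rows are easy to route to human review. A sketch with plain dicts, using the per-field column naming described above (the ticket data and the 0.7 threshold are made up for illustration):

```python
# Tagged rows as they might come back from tag_csv_advanced (values invented).
rows = [
    {"ticket_description": "Site is down", "department": "support",
     "department_confidence": 0.95, "priority": "urgent", "priority_confidence": 0.91},
    {"ticket_description": "Question about invoice", "department": "billing",
     "department_confidence": 0.62, "priority": "normal", "priority_confidence": 0.88},
]

THRESHOLD = 0.7  # arbitrary review cutoff for this example

# Flag any row where at least one field's confidence falls below the cutoff.
needs_review = [
    r for r in rows
    if any(v < THRESHOLD for k, v in r.items() if k.endswith("_confidence"))
]
# needs_review holds the billing ticket (department_confidence 0.62 < 0.7)
```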
Supported LLM Providers
- Groq (Recommended): llama-3.3-70b-versatile, llama-3.1-70b-versatile, mixtral-8x7b-32768
- Claude (Anthropic): claude-3-5-sonnet-20241022, claude-3-opus-20240229
- OpenAI: gpt-4, gpt-4-turbo, gpt-3.5-turbo
- Gemini: gemini-1.5-pro, gemini-1.5-flash
- AWS Bedrock: anthropic.claude-3-sonnet, anthropic.claude-3-haiku
Key Features
- ✨ Detailed Reasoning: For each tag, see why the model chose it
- 🔍 Reflection: Model reflects on its analysis
- 📊 Confidence Scores: Know how confident each classification is (0.0-1.0)
- ⚡ Parallel Processing: All rows processed concurrently
- 🎯 Error Detection: Automatic error tracking and reporting
- 🔧 Flexible: Simple list or complex multi-field taxonomies
License
MIT