verbalized-sampling-mcp

A Model Context Protocol server that provides Verbalized Sampling prompt templates and response processing to improve LLM output diversity by roughly 1.6-2.1x.

Verbalized Sampling MCP Server

A Model Context Protocol (MCP) server that provides Verbalized Sampling (VS) prompt templates and response processing utilities to mitigate mode collapse in LLM outputs.

Overview

Verbalized Sampling is a training-free prompting strategy that improves LLM output diversity by 1.6-2.1x on creative-writing tasks (Zhang et al., 2025). It works by asking the model to generate multiple responses together with their probabilities, then sampling from the tails of that distribution to encourage creative, less common outputs.

This MCP server provides three core tools that work together to implement the VS methodology:

  1. vs_create_prompt - Generate optimized VS prompts for any task
  2. vs_process_response - Parse LLM responses and select diverse outputs
  3. vs_recommend_params - Get model-specific VS parameter recommendations

Features

Core VS Tools

  • Prompt Generation: Creates research-backed VS prompts optimized for different models
  • Response Processing: Parses XML-formatted responses and implements tail sampling
  • Model Optimization: Provides parameter recommendations for 20+ current LLMs

Model Support

Supports current models from the major providers:

  • Anthropic: Claude Sonnet 4.5, Haiku 4.5, Opus 4.1
  • OpenAI: GPT-5.1, GPT-5 mini/nano/pro, GPT-4.1 series, o4-mini
  • Google: Gemini 2.5 Pro/Flash, Gemini 1.5 Pro
  • Meta/Open Source: Llama 3.3, DeepSeek R1, Qwen3

Installation

Option 1: Install from npm (Recommended)

npm install -g verbalized-sampling-mcp

Option 2: Install from Source

# Clone the repository
git clone https://github.com/johnferguson/verbalized-sampling-mcp.git
cd verbalized-sampling-mcp

# Install dependencies
npm install

# Configure environment variables (optional)
cp .env.example .env.development
# Edit .env.development with your Sentry DSN and other settings

# Build the project
npm run build

# Start the server
npm start

Sentry Monitoring

This server includes comprehensive Sentry monitoring for production observability:

Features

  • Performance Monitoring: 100% trace sampling for detailed performance insights
  • Error Tracking: MCP-specific error categorization and context
  • Custom Metrics: VS tool execution times, success rates, confidence scores
  • Health Monitoring: Server uptime, memory usage, connection tracking

Configuration

The server automatically detects the environment and configures monitoring accordingly:

  • Development: Full tracing with local error handling
  • Production: Optimized performance with comprehensive error tracking

Environment Variables

# Required for Sentry monitoring (SENTRY_ENVIRONMENT is "development" or "production")
SENTRY_DSN=https://your-dsn@sentry.io/project-id
SENTRY_ENVIRONMENT=development

# MCP-specific tags (automatically added to all events)
MCP_SERVER_NAME=verbalized-sampling-mcp
MCP_TRANSPORT_TYPE=stdio
MCP_TOOL_COUNT=4
MCP_CLIENT_INFO=vscode-extension@1.0.0
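The variables above could be validated once at startup, before Sentry is initialized. The sketch below is hypothetical (the `resolveMonitoringConfig` function and its rules are assumptions, not code from this server): it takes the environment as a plain record, rejects a DSN without an `http(s)://` scheme, accepts only `development` or `production` for `SENTRY_ENVIRONMENT`, and collects every `MCP_*` variable as a tag.

```typescript
// Hypothetical startup check for the Sentry-related variables listed above.
// Not taken from the server's source; the names and rules are assumptions.
// In Node you would call it as resolveMonitoringConfig(process.env).

type SentryEnvironment = "development" | "production";

interface MonitoringConfig {
  dsn: string;
  environment: SentryEnvironment;
  mcpTags: Record<string, string>;
}

// Rough DSN shape: <scheme>://<public-key>@<host>/<project-id>
const DSN_PATTERN = /^https?:\/\/[^@\s]+@[^\/\s]+\/\S+$/;

function resolveMonitoringConfig(
  env: Record<string, string | undefined>,
): MonitoringConfig {
  const dsn = env["SENTRY_DSN"];
  if (dsn === undefined || !DSN_PATTERN.test(dsn)) {
    throw new Error("SENTRY_DSN is missing or does not look like a DSN URL");
  }

  const environment = env["SENTRY_ENVIRONMENT"];
  if (environment !== "development" && environment !== "production") {
    throw new Error('SENTRY_ENVIRONMENT must be "development" or "production"');
  }

  // Collect every MCP_* variable so it can be attached to events as a tag.
  const mcpTags: Record<string, string> = {};
  for (const [key, value] of Object.entries(env)) {
    if (key.startsWith("MCP_") && value !== undefined) {
      mcpTags[key] = value;
    }
  }

  return { dsn, environment, mcpTags };
}
```

Taking the environment as a parameter rather than reading `process.env` directly keeps the check easy to test with hand-written inputs.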

Monitoring Dashboard

View real-time metrics and errors in your project's Sentry dashboard.

For detailed monitoring setup and procedures, see OBSERVABILITY.md.

Usage

Quick Start

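# Install and start (the server speaks MCP over stdio and waits for a client)
npm install -g verbalized-sampling-mcp
verbalized-sampling-mcp

# To try the tools interactively, let MCP Inspector launch the server itself
npx @modelcontextprotocol/inspector verbalized-sampling-mcp
# (from a source checkout: npx @modelcontextprotocol/inspector node dist/index.js)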

Basic Workflow

  1. Generate VS Prompt: Use vs_create_prompt to get an optimized prompt
  2. Send to LLM: Give the prompt to your LLM (via any interface)
  3. Process Response: Use vs_process_response to parse and select the best diverse output
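The three steps can be wired together in a few lines. This is a sketch under stated assumptions: tool results are taken to have the MCP shape used in this README's examples (`result.content[0].text`), and both the tool call and the LLM call are passed in as functions, so nothing here depends on a particular client library. `runVsWorkflow`, `CallTool`, and `CallLlm` are hypothetical names.

```typescript
// Sketch only: CallTool, CallLlm, and runVsWorkflow are hypothetical names,
// and ToolResult models just the part of an MCP tool result used here.

interface ToolResult {
  content: Array<{ type: string; text?: string }>;
}

type CallTool = (name: string, args: Record<string, unknown>) => Promise<ToolResult>;
type CallLlm = (prompt: string) => Promise<string>;

// Return the first text item of a tool result, failing loudly if there is none.
function firstText(result: ToolResult): string {
  const item = result.content.find((c) => c.type === "text");
  if (item === undefined || item.text === undefined) {
    throw new Error("tool result contained no text content");
  }
  return item.text;
}

async function runVsWorkflow(
  topic: string,
  modelName: string,
  callTool: CallTool,
  callLlm: CallLlm,
  tau = 0.1,
): Promise<string> {
  // 1. Generate the VS prompt.
  const prompt = firstText(
    await callTool("vs_create_prompt", { topic, model_name: modelName }),
  );
  // 2. Send it to the LLM through whatever interface you use.
  const llmOutput = await callLlm(prompt);
  // 3. Let the server parse the <response> blocks and select one output.
  const selected = await callTool("vs_process_response", { llm_output: llmOutput, tau });
  return firstText(selected);
}
```

With the official TypeScript SDK, `callTool` could be a thin wrapper such as `(name, args) => client.callTool({ name, arguments: args })`, possibly with a cast, since the SDK's result type is broader than the `ToolResult` modeled here.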

Examples

Example 1: Creative Writing with Claude

// Assumes `client` is a connected Client from @modelcontextprotocol/sdk and
// callClaude() is your own helper that sends a prompt to Claude and returns its text.

// Generate VS prompt for creative writing
const promptResult = await client.callTool({
  name: "vs_create_prompt",
  arguments: {
    topic: "Write a short story about a robot learning to paint",
    method: "creative_writing", // Optimized for creative tasks
    model_name: "claude-sonnet-4-5"
  }
});

// Send to Claude and get response
const claudeResponse = await callClaude(promptResult.content[0].text);

// Process for diverse selection
const storyResult = await client.callTool({
  name: "vs_process_response",
  arguments: {
    llm_output: claudeResponse,
    tau: 0.08 // Model-specific threshold
  }
});

console.log(storyResult.content[0].text); // Selected diverse story

Example 2: Technical Documentation with GPT-5

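// Assumes `client` is a connected Client from @modelcontextprotocol/sdk and
// callGPT() is your own helper that sends a prompt to GPT-5 and returns its text.

// Get model-specific parameters first
const paramsResult = await client.callTool({
  name: "vs_recommend_params",
  arguments: { model_name: "gpt-5" }
});
// The tool returns JSON text: {"k": 10, "tau": 0.05, "temperature": 1.1}
const params = JSON.parse(paramsResult.content[0].text);

// Generate technical explanation prompt
const promptResult = await client.callTool({
  name: "vs_create_prompt",
  arguments: {
    topic: "Explain quantum computing in simple terms",
    method: "cot", // Chain-of-thought for complex topics
    model_name: "gpt-5"
  }
});
const gptResponse = await callGPT(promptResult.content[0].text);

// Process GPT's XML response
const result = await client.callTool({
  name: "vs_process_response",
  arguments: {
    llm_output: gptResponse,
    tau: params.tau // Use research-optimized threshold
  }
});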

Example 3: Dialogue Generation

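// Assumes `client` is a connected Client from @modelcontextprotocol/sdk and
// callGemini() is your own helper that returns Gemini's text output.

// Generate diverse dialogue responses
const promptResult = await client.callTool({
  name: "vs_create_prompt",
  arguments: {
    topic: "Write a conversation between a human and AI about climate change",
    method: "dialogue", // Specialized for conversation
    model_name: "gemini-2.5-pro"
  }
});
const geminiResponse = await callGemini(promptResult.content[0].text);

// Select one diverse dialogue from the candidates in Gemini's output
const dialogueResult = await client.callTool({
  name: "vs_process_response",
  arguments: {
    llm_output: geminiResponse,
    tau: 0.12 // Gemini-specific threshold
  }
});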

Example 4: Batch Processing

// Process several LLM outputs in turn. Each string stands in for a complete
// VS output; real outputs contain k <response> blocks to choose from.
// Assumes `client` is a connected Client from @modelcontextprotocol/sdk.
const responses = [
  "<response><text>Option A</text><probability>0.15</probability></response>",
  "<response><text>Option B</text><probability>0.07</probability></response>",
  "<response><text>Option C</text><probability>0.03</probability></response>"
];

for (const response of responses) {
  const result = await client.callTool({
    name: "vs_process_response",
    arguments: {
      llm_output: response,
      tau: 0.10 // Standard (default) threshold
    }
  });
  console.log(`Selected: ${result.content[0].text}`);
}
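Because each call is independent, the loop above could also issue its requests concurrently. A hedged sketch, assuming the MCP client allows several requests in flight (JSON-RPC matches replies to requests by `id`, so this usually holds); `processBatch` and `CallTool` are hypothetical names, and `Promise.all` keeps results in input order.

```typescript
// Hypothetical concurrent variant of the batch loop above. Assumes the MCP
// client can have several requests in flight at once.

interface ToolResult {
  content: Array<{ type: string; text?: string }>;
}

type CallTool = (name: string, args: Record<string, unknown>) => Promise<ToolResult>;

async function processBatch(
  outputs: string[],
  callTool: CallTool,
  tau = 0.1,
): Promise<string[]> {
  // Start every request before awaiting any of them; Promise.all resolves
  // to results in the same order as `outputs`, whatever order replies arrive in.
  const results = await Promise.all(
    outputs.map((llm_output) => callTool("vs_process_response", { llm_output, tau })),
  );
  // Empty string when a result carries no text item.
  return results.map((r) => r.content[0]?.text ?? "");
}
```

If one request fails, `Promise.all` rejects as a whole; `Promise.allSettled` would be the choice when partial results are acceptable.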

MCP Integration

Claude Desktop (Recommended)

  1. Install from npm:
npm install -g verbalized-sampling-mcp
  2. Add to Claude Desktop:
    • Open Claude Desktop → Settings → Developer → Edit Config (this opens claude_desktop_config.json)
    • Add the server under the "mcpServers" key:
      {
        "mcpServers": {
          "verbalized-sampling-mcp": {
            "command": "verbalized-sampling-mcp",
            "args": []
          }
        }
      }
    • Restart Claude Desktop

Other MCP Clients

{
  "mcpServers": {
    "verbalized-sampling": {
      "command": "node",
      "args": ["/path/to/verbalized-sampling-mcp/dist/index.js"]
    }
  }
}

Environment Variables (Optional)

# Sentry monitoring (recommended for production)
export SENTRY_DSN="https://your-dsn@sentry.io/project-id"
export SENTRY_ENVIRONMENT="production"

# Or create a .env file
echo "SENTRY_DSN=https://your-dsn@sentry.io/project-id" > .env
echo "SENTRY_ENVIRONMENT=production" >> .env

Available Tools

vs_create_prompt

Generates a Verbalized Sampling prompt optimized for a specific model and task.

Parameters:

  • topic (string, required): The user's query or task
  • method (string, optional): VS strategy, one of "standard", "cot", "multi-turn", "research_standard", "creative_writing", or "dialogue" (see VS Methods Available)
  • model_name (string, optional): Target model name for parameter optimization

Returns: A complete VS prompt string ready to send to an LLM.
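To make the idea of a VS prompt concrete, here is a hypothetical template builder. The wording and the `buildVsPrompt` name are assumptions that follow the XML layout used elsewhere in this README (`<response>`, `<text>`, `<probability>`); the server's real templates live in `src/tools/prompts.ts`, vary by `method` and `model_name`, and may read quite differently.

```typescript
// Hypothetical VS prompt template; not the server's actual wording.

interface VsPromptOptions {
  k: number; // number of candidate responses to request
  tau?: number; // optional: ask for responses below this probability
}

function buildVsPrompt(topic: string, { k, tau }: VsPromptOptions): string {
  if (!Number.isInteger(k) || k < 1) {
    throw new Error("k must be a positive integer");
  }
  const lines = [
    `Generate ${k} responses to the user query below.`,
    "Put each one in its own <response> tag containing a <text> element with the response",
    "and a <probability> element with your estimate of how likely that response is (0 to 1).",
  ];
  if (tau !== undefined) {
    // Steer the model toward the tails of its own distribution.
    lines.push(`Sample from the tails of the distribution: each probability should be below ${tau}.`);
  }
  lines.push("", `Query: ${topic}`);
  return lines.join("\n");
}
```

Asking for explicit probabilities is what makes the later tail-sampling step possible: the output carries its own rough distribution over candidates.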

vs_process_response

Parses an LLM's XML response and selects the most diverse option using tail sampling.

Parameters:

  • llm_output (string, required): Raw text output from LLM containing <response> tags
  • tau (number, optional): Probability threshold for tail sampling (default: 0.10)

Returns: The selected diverse response with metadata.
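The parsing half of this tool can be sketched with two regular expressions, since the layout (`<response><text>...</text><probability>...</probability></response>`) is flat. This is an illustration under assumptions, not the server's `sampler.ts`: `parseVsResponses` is a hypothetical name, and blocks with missing text or a probability outside [0, 1] are simply skipped.

```typescript
// Hypothetical parser for the XML layout shown in this README:
// <response><text>...</text><probability>0.07</probability></response>
// A regex is adequate only because the format is flat and non-nested.

interface VsCandidate {
  text: string;
  probability: number;
}

function parseVsResponses(llmOutput: string): VsCandidate[] {
  const candidates: VsCandidate[] = [];
  const blockPattern = /<response>([\s\S]*?)<\/response>/g;
  let match: RegExpExecArray | null;
  while ((match = blockPattern.exec(llmOutput)) !== null) {
    const body = match[1] ?? "";
    const text = /<text>([\s\S]*?)<\/text>/.exec(body)?.[1]?.trim();
    const probText = /<probability>([\s\S]*?)<\/probability>/.exec(body)?.[1]?.trim();
    const probability = probText ? Number(probText) : NaN;
    // Skip blocks that lack text or carry an unusable probability.
    if (text && Number.isFinite(probability) && probability >= 0 && probability <= 1) {
      candidates.push({ text, probability });
    }
  }
  return candidates;
}
```

LLM output often wraps these blocks in extra prose or whitespace; the lazy `[\s\S]*?` patterns tolerate that, but a model that nests or escapes tags would need a real XML parser.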

vs_recommend_params

Gets recommended VS parameters for a specific model.

Parameters:

  • model_name (string, required): The model name to look up

Returns: JSON object with k (sample count), tau (threshold), and temperature values.
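Because the result arrives as JSON text inside the tool result, a caller has to parse it before using the numbers. A minimal hypothetical helper, assuming the text is a JSON object with numeric `k`, `tau`, and `temperature` fields as described above (`parseVsParams` is an illustrative name, not part of the server):

```typescript
// Hypothetical helper: turn the text returned by vs_recommend_params into a
// typed object. Assumes a JSON object with numeric k, tau, and temperature.

interface VsParams {
  k: number;
  tau: number;
  temperature: number;
}

function parseVsParams(text: string): VsParams {
  const data: unknown = JSON.parse(text);
  if (typeof data !== "object" || data === null) {
    throw new Error("expected a JSON object");
  }
  const record = data as Record<string, unknown>;
  // Read one field, insisting on a finite number.
  const read = (key: keyof VsParams): number => {
    const value = record[key];
    if (typeof value !== "number" || !Number.isFinite(value)) {
      throw new Error(`missing or non-numeric field: ${key}`);
    }
    return value;
  };
  return { k: read("k"), tau: read("tau"), temperature: read("temperature") };
}
```

For example, the threshold for a model could then be read as `parseVsParams(result.content[0].text).tau`.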

MCP Server Details

Server Configuration

The server runs on stdio transport and provides these MCP tools:

| Tool | Description | Parameters |
|------|-------------|------------|
| vs_create_prompt | Generate optimized VS prompts | topic (required), method, model_name |
| vs_process_response | Parse XML responses and select diverse output | llm_output (required), tau |
| vs_recommend_params | Get model-specific VS parameters | model_name (required) |

VS Methods Available

| Method | Description | Best For |
|--------|-------------|----------|
| standard | Basic VS prompting | General use |
| cot | Chain-of-thought reasoning | Complex tasks |
| multi-turn | Progressive diversity building | Conversations |
| research_standard | Official research format | Research compliance |
| creative_writing | Optimized for creativity | Stories, poems |
| dialogue | Varied tone/style | Conversations |
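The two tables above are enough to catch malformed calls on the client before they reach the server. A hypothetical sketch: `validateToolArgs` is an illustrative name, the method list is copied from the table, and treating `tau` as strictly between 0 and 1 is an assumption based on its role as a probability threshold. The server's own input schema remains the authority.

```typescript
// Hypothetical client-side argument check built from the tables above.
// The server's own schema may be stricter or looser than this.

const REQUIRED_ARGS = new Map<string, string[]>([
  ["vs_create_prompt", ["topic"]],
  ["vs_process_response", ["llm_output"]],
  ["vs_recommend_params", ["model_name"]],
]);

const VS_METHODS: string[] = [
  "standard", "cot", "multi-turn", "research_standard", "creative_writing", "dialogue",
];

// Returns a list of problems; an empty list means the call looks well-formed.
function validateToolArgs(tool: string, args: Record<string, unknown>): string[] {
  const required = REQUIRED_ARGS.get(tool);
  if (required === undefined) {
    return [`unknown tool: ${tool}`];
  }
  const problems = required
    .filter((name) => args[name] === undefined)
    .map((name) => `missing required argument: ${name}`);

  const method = args["method"];
  if (method !== undefined && (typeof method !== "string" || !VS_METHODS.includes(method))) {
    problems.push(`unknown method: ${String(method)}`);
  }
  // Assumption: tau is a probability threshold, so it must lie in (0, 1).
  const tau = args["tau"];
  if (tau !== undefined && (typeof tau !== "number" || tau <= 0 || tau >= 1)) {
    problems.push("tau must be a number strictly between 0 and 1");
  }
  return problems;
}
```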

Model Support

Supports 20+ models with optimized parameters:

  • Anthropic: Claude Sonnet 4.5, Haiku 4.5, Opus 4.1
  • OpenAI: GPT-5, GPT-5 mini/nano/pro, GPT-4.1 series, o4-mini
  • Google: Gemini 2.5 Pro/Flash, Gemini 1.5 Pro
  • Meta/Open Source: Llama 3.3, DeepSeek R1, Qwen3

Development

# Development mode
npm run dev

# Run tests
npm test

# Test Sentry integration
npm run sentry:test

# Lint and fix code
npm run lint:fix

# Format code
npm run format

# Type checking
npm run typecheck

Production Deployment

Environment Setup

# Production environment
export NODE_ENV=production
export SENTRY_DSN="https://your-dsn@sentry.io/project-id"
export SENTRY_ENVIRONMENT=production

# Start with monitoring
npm start

Docker Deployment

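FROM node:18-alpine
WORKDIR /app
COPY package*.json ./
# Install runtime dependencies only (--omit=dev replaces the deprecated --only=production)
RUN npm ci --omit=dev
# dist/ must be built beforehand with `npm run build`
COPY dist ./dist/
# The server speaks MCP over stdio, so no port is exposed; run the container with `docker run -i`
CMD ["npm", "start"]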

Monitoring

The server includes comprehensive Sentry monitoring:

  • Performance Metrics: 100% trace sampling
  • Error Tracking: MCP-specific error categorization
  • Custom Metrics: VS tool execution times, success rates
  • Health Monitoring: Server uptime, memory usage, connections

View metrics in your project's Sentry dashboard.

Testing Sentry Integration

# Test error reporting
npm run sentry:test

# Start server with monitoring
npm run start

# Or let MCP Inspector launch the built server itself to exercise the tools and check that metrics reach Sentry
npx @modelcontextprotocol/inspector node dist/index.js

All tool executions, errors, and performance metrics are automatically sent to Sentry with MCP-specific context.

Architecture

src/
├── tools/
│   ├── vs-tools.ts        # Main MCP tool implementations
│   ├── prompts.ts         # VS prompt templates and formatting
│   ├── sampler.ts         # Response parsing and selection logic
│   └── constants.ts       # Model-specific parameter mappings
└── index.ts               # MCP server setup

Scientific Foundation

This implementation is based on the research paper "Verbalized Sampling: How to Mitigate Mode Collapse and Unlock LLM Diversity" by Zhang et al. (2025), which reports that VS increases diversity by 1.6-2.1x on creative-writing tasks while maintaining quality.

The methodology works by:

  1. Prompting for Probabilities: Asking LLMs to verbalize probability estimates for their own outputs
  2. Tail Sampling: Selecting responses with low probabilities to encourage diversity
  3. XML Structure: Using structured output format for reliable parsing
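Tail sampling itself reduces to a filter and a random choice. The sketch below is one plausible reading of step 2, not the server's `sampler.ts`: it keeps candidates whose verbalized probability is below `tau` and picks uniformly among them (weighting by the verbalized probabilities would be an equally reasonable choice). Falling back to the least likely candidate when nothing is under `tau` is an assumption, and `selectFromTail` is a hypothetical name.

```typescript
// Minimal sketch of tail sampling over verbalized probabilities.
// Illustration of the idea only; not the server's sampler.ts.

interface VsCandidate {
  text: string;
  probability: number;
}

function selectFromTail(
  candidates: VsCandidate[],
  tau: number,
  random: () => number = Math.random,
): VsCandidate | undefined {
  if (candidates.length === 0) return undefined;

  // Keep only the "tail": responses the model itself rated as unlikely.
  const tail = candidates.filter((c) => c.probability < tau);
  if (tail.length === 0) {
    // Assumed fallback: nothing below tau, so return the least likely candidate.
    return candidates.reduce((min, c) => (c.probability < min.probability ? c : min));
  }

  // Uniform choice within the tail; pass a seeded `random` for reproducibility.
  const index = Math.min(Math.floor(random() * tail.length), tail.length - 1);
  return tail[index];
}
```

With the three candidates from Example 4 (probabilities 0.15, 0.07, 0.03) and `tau = 0.10`, the tail is Option B and Option C, so Option A can never be selected.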

Contributing

  1. Fork the repository
  2. Create a feature branch
  3. Add tests for new functionality
  4. Ensure all tests pass
  5. Submit a pull request

License

MIT License - see LICENSE file for details.
