
mcp-openvision

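MCP OpenVision is a Model Context Protocol (MCP) server that provides image analysis powered by OpenRouter vision models. It lets AI assistants analyze images through a simple interface within the MCP ecosystem.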

Tools

image_analysis

Analyze an image using OpenRouter's vision capabilities.

This tool allows you to send an image to OpenRouter's vision models for analysis. You provide a query to guide the analysis and can optionally customize the system prompt for more control over the model's behavior.

Args:
- image: The image as a base64-encoded string, URL, or local file path
- query: Text prompt to guide the image analysis. For best results, provide context about why you're analyzing the image and what specific information you need. Including details about your purpose and required focus areas leads to more relevant and useful responses.
- system_prompt: Instructions for the model defining its role and behavior
- model: The vision model to use (defaults to the value set by OPENROUTER_DEFAULT_MODEL)
- max_tokens: Maximum number of tokens in the response (100-4000)
- temperature: Temperature parameter for generation (0.0-1.0)
- top_p: Optional nucleus sampling parameter (0.0-1.0)
- presence_penalty: Optional penalty for new tokens based on presence in text so far (0.0-2.0)
- frequency_penalty: Optional penalty for new tokens based on frequency in text so far (0.0-2.0)
- project_root: Optional root directory to resolve relative image paths against

Returns: The analysis result as text

Examples:

Basic usage with a file path:
image_analysis(image="path/to/image.jpg", query="Describe this image in detail")

Basic usage with an image URL:
image_analysis(image="https://example.com/image.jpg", query="Describe this image in detail")

Basic usage with a relative path and project root:
image_analysis(image="examples/image.jpg", project_root="/path/to/project", query="Describe this image in detail")

Usage with a detailed contextual query:
image_analysis(
    image="path/to/image.jpg",
    query="Analyze this product packaging design for a fitness supplement. Identify all nutritional claims, certifications, and health icons. Assess the visual hierarchy and how the key selling points are communicated. This is for a competitive analysis project."
)

Usage with custom system prompt:
image_analysis(
    image="path/to/image.jpg",
    query="What objects can you see in this image?",
    system_prompt="You are an expert at identifying objects in images. Focus on listing all visible objects."
)

README

MCP OpenVision

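Overview

MCP OpenVision is a Model Context Protocol (MCP) server that provides image analysis powered by OpenRouter vision models. It lets AI assistants analyze images through a simple interface within the MCP ecosystem.

Installation

Installing via Smithery

To install mcp-openvision for Claude Desktop automatically via Smithery, run: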

npx -y @smithery/cli install @Nazruden/mcp-openvision --client claude

Using pip

pip install mcp-openvision

Using uv (recommended)

uv pip install mcp-openvision

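Configuration

MCP OpenVision requires an OpenRouter API key and is configured through environment variables:

OPENROUTER_API_KEY (required): your OpenRouter API key
OPENROUTER_DEFAULT_MODEL (optional): the vision model to use

OpenRouter Vision Models

MCP OpenVision works with any OpenRouter model that supports vision. The default model is qwen/qwen2.5-vl-32b-instruct:free, but you can specify any other compatible model.

Some popular vision models available through OpenRouter include:

qwen/qwen2.5-vl-32b-instruct:free (default)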
anthropic/claude-3-5-sonnet
anthropic/claude-3-opus
anthropic/claude-3-sonnet
openai/gpt-4o

To use a different model, either set the OPENROUTER_DEFAULT_MODEL environment variable or pass the model parameter directly to the image_analysis function.
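The two ways of choosing a model can be pictured as a simple fallback chain. The sketch below is illustrative only: the helper name resolve_model is hypothetical, and the precedence order (explicit argument first, then the environment variable, then the default named in this README) is an assumption, not a description of the server's actual code.

```python
import os

# Default model named in this README; treat it as a snapshot, not a guarantee.
FALLBACK_MODEL = "qwen/qwen2.5-vl-32b-instruct:free"


def resolve_model(model=None, env=None):
    """Pick a model: explicit argument, then OPENROUTER_DEFAULT_MODEL, then fallback.

    The precedence order is an assumption made for illustration.
    """
    env = os.environ if env is None else env
    return model or env.get("OPENROUTER_DEFAULT_MODEL") or FALLBACK_MODEL
```

Passing the environment as a parameter keeps the helper easy to test without touching the real process environment.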

Usage

Testing with MCP Inspector

The easiest way to test MCP OpenVision is with the MCP Inspector tool:

npx @modelcontextprotocol/inspector uvx mcp-openvision

Integrating with Claude Desktop or Cursor

Edit your MCP configuration file:
- Windows: %USERPROFILE%\.cursor\mcp.json
- macOS: ~/.cursor/mcp.json or ~/Library/Application Support/Claude/claude_desktop_config.json
Add the following configuration:

{
  "mcpServers": {
    "openvision": {
      "command": "uvx",
      "args": ["mcp-openvision"],
      "env": {
        "OPENROUTER_API_KEY": "your_openrouter_api_key_here",
        "OPENROUTER_DEFAULT_MODEL": "anthropic/claude-3-sonnet"
      }
    }
  }
}
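If the configuration file already lists other servers, the entry has to be merged rather than pasted over the whole file. Below is a minimal sketch of such a merge; the helper name add_openvision_server is hypothetical, and it assumes the file is either absent or contains valid JSON.

```python
import json
from pathlib import Path


def add_openvision_server(config_path, api_key, model=None):
    """Merge an "openvision" entry into an MCP config file, keeping other servers."""
    path = Path(config_path)
    # Assumption: a missing file is treated as an empty configuration.
    config = json.loads(path.read_text(encoding="utf-8")) if path.exists() else {}

    env = {"OPENROUTER_API_KEY": api_key}
    if model:
        env["OPENROUTER_DEFAULT_MODEL"] = model

    # Same command, args, and env keys as the configuration shown above.
    config.setdefault("mcpServers", {})["openvision"] = {
        "command": "uvx",
        "args": ["mcp-openvision"],
        "env": env,
    }
    path.write_text(json.dumps(config, indent=2) + "\n", encoding="utf-8")
    return config
```

Back up the original file before running anything like this against a real configuration.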

Running locally for development

# Set the required API key
export OPENROUTER_API_KEY="your_api_key"

# Run the server module directly
python -m mcp_openvision

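Features

MCP OpenVision provides the following core tool:

image_analysis: analyzes images with a vision model and supports the following parameters:
- image: can be provided as:
  - Base64-encoded image data
  - An image URL (http/https)
  - A local file path
- query: the user's instructions for the image analysis task
- system_prompt: instructions that define the model's role and behavior (optional)
- model: the vision model to use
- temperature: controls randomness (0.0-1.0)
- max_tokens: maximum response length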
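The tool description gives numeric ranges for the sampling parameters (for example max_tokens 100-4000 and temperature 0.0-1.0). A caller can check values against those ranges before sending a request. The sketch below is a hypothetical client-side helper; whether the server rejects or clamps out-of-range values is not stated, so raising an error here is an assumption.

```python
# Ranges as listed in the image_analysis tool description.
PARAM_RANGES = {
    "max_tokens": (100, 4000),
    "temperature": (0.0, 1.0),
    "top_p": (0.0, 1.0),
    "presence_penalty": (0.0, 2.0),
    "frequency_penalty": (0.0, 2.0),
}


def check_params(**params):
    """Return params unchanged, or raise ValueError on the first out-of-range value."""
    for name, value in params.items():
        # Parameters without a documented range, and None values, pass through.
        if name in PARAM_RANGES and value is not None:
            low, high = PARAM_RANGES[name]
            if not low <= value <= high:
                raise ValueError(f"{name}={value} is outside {low}-{high}")
    return params
```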

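Writing Effective Queries

The query parameter is crucial for getting useful results from image analysis. A well-crafted query provides context about:

Purpose: why you are analyzing the image
Focus areas: specific elements or details to pay attention to
Required information: the type of information you need to extract
Format preferences: how you want the results structured

Examples of effective queries

Basic query	Enhanced query
"Describe this image"	"Identify all retail products visible in this store shelf image and estimate their price ranges"
"What's in this image?"	"Analyze this medical scan for abnormalities, focusing on the highlighted area, and provide possible diagnoses"
"Analyze this chart"	"Extract the numerical data from this bar chart of quarterly sales and identify the key trends for 2022-2023"
"Read the text"	"Transcribe all visible text in this restaurant menu, preserving item names, descriptions, and prices"

By explaining why you need the analysis and what specific information you are looking for, you help the model focus on the relevant details and produce more valuable insights.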
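One way to make this habit repeatable is to assemble the query from its parts. The helper below is a hypothetical convenience, not part of mcp-openvision; the phrasing templates are arbitrary choices for illustration.

```python
def build_query(task, purpose=None, focus=None, output_format=None):
    """Compose a query from a task plus optional purpose, focus, and format context."""
    parts = [task.strip()]
    if focus:
        parts.append(f"Focus on {focus.strip()}.")
    if output_format:
        parts.append(f"Format the answer as {output_format.strip()}.")
    if purpose:
        parts.append(f"This is for {purpose.strip()}.")
    return " ".join(parts)


query = build_query(
    "Transcribe all visible text in this restaurant menu.",
    purpose="a menu digitization project",
    focus="item names, descriptions, and prices",
    output_format="a table",
)
```

The resulting string can be passed as the query argument of image_analysis.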

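Usage Examples

# Analyze an image from a URL
result = await image_analysis(
    image="https://example.com/image.jpg",
    query="Describe this image in detail"
)

# Analyze an image from a local file with a focused query
result = await image_analysis(
    image="path/to/local/image.jpg",
    query="Identify all traffic signs in this street scene and explain their meaning for a driver education course"
)

# Analyze a base64-encoded image with a specific analysis purpose
result = await image_analysis(
    image="SGVsbG8gV29ybGQ=...",  # base64 data
    query="Examine this product packaging design and highlight elements that could be improved to increase visibility and brand recognition"
)

# Customize the system prompt for specialized analysis
result = await image_analysis(
    image="path/to/local/image.jpg",
    query="Analyze the composition and artistic techniques used in this painting, focusing on how they create emotional impact",
    system_prompt="You are an art historian with expertise in painting techniques and art movements. Focus on formal analysis of composition, color, brushwork, and stylistic elements."
)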
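For the base64 case, the string has to be produced from the image bytes first. A minimal sketch using only the standard library follows; the helper name encode_image is hypothetical, and it assumes the tool accepts a bare base64 string without a data: URI prefix, which the README does not specify.

```python
import base64
from pathlib import Path


def encode_image(path):
    """Read a file and return its contents as a base64 string (no data: prefix)."""
    return base64.b64encode(Path(path).read_bytes()).decode("ascii")
```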

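Image Input Types

The image_analysis tool accepts several types of image input:

Base64-encoded strings
Image URLs: must start with http:// or https://
File paths:
- Absolute paths: full paths starting with / (Unix) or a drive letter (Windows)
- Relative paths: paths relative to the current working directory
- Relative paths with project_root: use the project_root parameter to specify the base directory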
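To make the three input kinds concrete, here is a rough heuristic for telling them apart. It is a sketch only and not the server's actual detection logic: the function name classify_image_input is hypothetical, and a short path made up solely of base64 characters (such as "abcd") would be misread as base64.

```python
import base64
import binascii


def classify_image_input(image):
    """Guess whether a string is a URL, base64 data, or a file path (heuristic)."""
    if image.startswith(("http://", "https://")):
        return "url"
    try:
        # Strict decoding rejects characters such as "." or ":" that paths usually contain.
        base64.b64decode(image, validate=True)
    except (binascii.Error, ValueError):
        return "path"
    # An empty string decodes cleanly but is not useful image data.
    return "base64" if image else "path"
```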

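Using Relative Paths

When using a relative file path (such as "examples/image.jpg"), you have two options:

The path must be relative to the current working directory in which the server is running
Or, you can specify a project_root parameter: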

# Example with a relative path and project_root
result = await image_analysis(
    image="examples/image.jpg",
    project_root="/path/to/your/project",
    query="What's in this image?"
)

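This is particularly useful when the current working directory may be unpredictable or when you want to reference files relative to a specific directory.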
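The resolution rule described here can be sketched in a few lines. This is an illustration of the documented behavior under stated assumptions, not the server's implementation: the name resolve_image_path is hypothetical, and it assumes that project_root is ignored for absolute paths.

```python
from pathlib import Path


def resolve_image_path(image, project_root=None):
    """Resolve a file path against project_root, or the working directory if none is given."""
    path = Path(image)
    if path.is_absolute():
        # Assumption: absolute paths are used as-is and project_root is ignored.
        return path
    base = Path(project_root) if project_root else Path.cwd()
    return base / path
```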

Development

Setting up the development environment

# Clone the repository
git clone https://github.com/modelcontextprotocol/mcp-openvision.git
cd mcp-openvision

# Install development dependencies
pip install -e ".[dev]"

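Code Formatting

This project uses Black for automatic code formatting. Formatting is enforced through GitHub Actions:

All code pushed to the repository is automatically formatted with Black
For pull requests from repository collaborators, Black formats the code and commits directly to the PR branch
For pull requests from forks, Black creates a new PR containing the formatted code, which can be merged into the original PR

You can also run Black locally to format your code before committing:

# Format all Python code in the src and tests directories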
black src tests

Running Tests

pytest

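Release Process

This project uses an automated release process:

Update the version in pyproject.toml following semantic versioning principles
- You can use the helper script: python scripts/bump_version.py [major|minor|patch]
Update CHANGELOG.md with details about the new version
- The script also creates a template entry in CHANGELOG.md that you can fill in
Commit and push these changes to the main branch
The GitHub Actions workflow will:
- Detect the version change
- Automatically create a new GitHub release
- Trigger the publishing workflow that publishes to PyPI

This automation helps maintain a consistent release process and ensures that every release is properly versioned and documented.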
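For readers unfamiliar with the major, minor, and patch options, the semantic-versioning arithmetic looks like this. The sketch is not the project's scripts/bump_version.py, whose contents are not shown in this README; it only illustrates the rule for plain MAJOR.MINOR.PATCH strings without pre-release suffixes.

```python
def bump_version(version, part):
    """Return the next semantic version for a 'major', 'minor', or 'patch' bump."""
    major, minor, patch = (int(n) for n in version.split("."))
    if part == "major":
        # A major bump resets minor and patch.
        return f"{major + 1}.0.0"
    if part == "minor":
        # A minor bump resets patch.
        return f"{major}.{minor + 1}.0"
    if part == "patch":
        return f"{major}.{minor}.{patch + 1}"
    raise ValueError(f"unknown part: {part!r}")
```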

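Support

If you find this project helpful, please consider buying me a coffee to support ongoing development and maintenance.

License

This project is licensed under the MIT License. See the LICENSE file for details.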