
ViperMCP: A Model Context Protocol for Viper Server

ViperMCP is a mixture-of-experts (MoE) visual question-answering (VQA) server that defines several functions to address three task areas: 1) visual grounding, 2) compositional image question answering, and 3) external knowledge-dependent image question answering. It is based heavily on the ViperGPT framework.

The MCP server is structured as a FastMCP streamable-http server and is therefore compatible with all of the client tooling provided by FastMCP.

Setup

OpenAI API Key

An API key for the OpenAI platform is required. It can be set in the execution environment as OPENAI_API_KEY, referenced by path in the OPENAI_API_KEY_PATH environment variable, or passed as an HTTP query parameter.
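
For example, either of the following (with an illustrative key value) makes the key available to the server at startup:

export OPENAI_API_KEY=sk-proj-XXXXXXXXXXXXXXXXXXXX
export OPENAI_API_KEY_PATH=/path/to/api.key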

Ngrok Account (Optional)

Ngrok can be used to quickly expose a locally running server at a public-facing URL. Create an account and run pip install ngrok to use it.

Assuming you have completed one of the installation procedures in the next section, running ngrok http 8000 will expose your ViperMCP server at a public-facing URL.

The address provided by ngrok (or any public-facing address) can be used as a substitute for the local address (http://0.0.0.0:8000) referenced below.

Installation

Smithery Deployment

ViperMCP can be deployed through Smithery.

Dockerized FastMCP Server

Add your OpenAI API key to a file called api.key. In the command below, point the mount source to the location of api.key.

docker run -i --rm \
  --mount type=bind,source=/path/to/api.key,target=/run/secrets/openai_api.key,readonly \
  -e OPENAI_API_KEY_PATH=/run/secrets/openai_api.key \
  -p 8000:8000 \
  rsherby/vipermcp:latest

This starts a CUDA-enabled Docker container that can be accessed at http://0.0.0.0:8000/mcp/.

Alternatively, you can use the docker-compose.yaml file to build the image from source and run it. By default, it assumes that the OpenAI API key file can be found in the same directory.

If your container provisioner (e.g., cloud provider) allows you to create environment variables and pass them to the container environment, you can instead set the OPENAI_API_KEY variable before runtime.
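
For example, the container above can be started without the mounted key file by passing the key directly (key value illustrative):

docker run -i --rm \
  -e OPENAI_API_KEY=sk-proj-XXXXXXXXXXXXXXXXXXXX \
  -p 8000:8000 \
  rsherby/vipermcp:latest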

Pure FastMCP Server

Clone the repository to your local device by running the following command:

git clone --recurse-submodules https://github.com/ryansherby/ViperMCP.git

After cloning, we need to download the pretrained models and set our OpenAI API key. Run the following commands:

cd ViperMCP
bash download-models.sh
echo YOUR_OPENAI_API_KEY > api.key

We then suggest creating a virtual environment (e.g., conda or venv) and activating it. This is not a requirement but is generally best practice for managing Python packages. Then, install the requirements by running the following commands.

pip install -r requirements.txt
pip install -e .

This will install both the 3rd-party requirements as well as the local viper package that is used to standardize import locations.

We can now run our local FastMCP server using the following command.

python run_server.py

We should now be able to access the server at http://0.0.0.0:8000/mcp/.

To utilize the OpenAI-related models, pass the OpenAI API key as a query parameter: http://0.0.0.0:8000/mcp?apiKey=sk-proj-XXXXXXXXXXXXXXXXXXXX
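
For example, the FastMCP client used in the Usage section below can be constructed with that URL directly (key value illustrative):

from fastmcp import Client

client = Client("http://0.0.0.0:8000/mcp?apiKey=sk-proj-XXXXXXXXXXXXXXXXXXXX")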

Usage

FastMCP Client

An example of passing base64-encoded, byte-level image data; image URLs can also be passed:

from fastmcp import Client

client = Client("http://0.0.0.0:8000/mcp/")

# Inside an async function:
async with client:
    await client.ping()

    tools = await client.list_tools()  # Optional

    # Tool arguments are passed as a single dict
    query = await client.call_tool(
        "viper_query",
        {
            "query": "how many muffins can each kid have for it to be fair?",
            "image": f"data:image/png;base64,{image_base64_string}",
        },
    )

    task = await client.call_tool(
        "viper_task",
        {
            "task": "return a mask of all the people in the image",
            "image": f"data:image/png;base64,{image_base64_string}",
        },
    )

OpenAI API

Make sure to send the image URL with "type": "input_text". Currently, the OpenAI API MCP integration cannot handle byte-level image data, so the image must be sent as a public URL.

from openai import OpenAI

client = OpenAI()

# server_url and img_url are supplied by the user
response = client.responses.create(
    model="gpt-4o",
    tools=[
        {
            "type": "mcp",
            "server_label": "ViperMCP",
            "server_url": f"{server_url}/mcp/",
            "require_approval": "never",
        },
    ],
    input=[
        {
            "role": "system",
            "content": "Forward any queries or tasks relating to an image directly to the ViperMCP server."
        },
        {
            "role": "user",
            "content": [
                {
                    "type": "input_text",
                    "text": "based on this image, how many muffins can each kid have for it to be fair?"
                },
                {
                    "type": "input_text",
                    "text": f"{img_url}",
                },
            ],
        },
    ],
)

Appendix

Models

The following models are used in the default version of ViperMCP:

  • Grounding DINO
  • SegmentAnything (SAM)
  • GPT-4o-mini LLM
  • GPT-4o-mini VLM
  • GPT-4.1
  • X-VLM
  • MiDaS
  • BERT

Warnings

This package generates and executes code on the machine on which it is run. We do not have any direct control over the code that is executed, so the prompting mechanism could be exploited to expose sensitive data. We have included basic injection-prevention tools; however, they will not be sufficient to protect your data in a production environment.

If a production-level environment is your goal, we strongly suggest modifying src/entrypoint.py to define separate client wrappers, using the same naming convention (i.e., find, simple_query, etc.), that forward requests to a backend server. Then, mcp/server.py should be modified to push requests to this client server, which in turn makes requests of the backend server. An example flow would be like the following, with a code sketch after it:

MCP Server (Query + Image) => Client Server (Generate Code Request) =>
Backend Server (Generates Code) =>
Client Server (Executes Code with Wrapper Functions) =>
Backend Server (Executes Underlying Functions from Wrapper) =>
Client Server (Forwards Result to MCP Server) =>
MCP Server (Returns Result to User)
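
A minimal sketch of what such client wrappers might look like, assuming a hypothetical backend that exposes /find and /simple_query HTTP routes; the BACKEND_URL, route names, and payload shapes below are illustrative, not part of ViperMCP:

# Hypothetical client wrappers that forward requests to a backend server
# instead of executing the underlying vision models locally.
import requests

BACKEND_URL = "https://backend.internal:9000"  # illustrative address

def find(image_bytes: bytes, object_name: str) -> dict:
    """Forward a visual-grounding request to the backend."""
    response = requests.post(
        f"{BACKEND_URL}/find",
        files={"image": image_bytes},
        data={"object_name": object_name},
        timeout=60,
    )
    response.raise_for_status()
    return response.json()

def simple_query(image_bytes: bytes, question: str) -> str:
    """Forward a simple VQA request to the backend."""
    response = requests.post(
        f"{BACKEND_URL}/simple_query",
        files={"image": image_bytes},
        data={"question": question},
        timeout=60,
    )
    response.raise_for_status()
    return response.json()["answer"]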

Citations

Thank you to the team behind ViperGPT! Your framework and subsequent empirical successes have been invaluable in the creation of this project.

@article{surismenon2023vipergpt,
    title={ViperGPT: Visual Inference via Python Execution for Reasoning},
    author={D\'idac Sur\'is and Sachit Menon and Carl Vondrick},
    journal={arXiv preprint arXiv:2303.08128},
    year={2023}
}

Contributions

If you'd like to contribute to the project, please ensure the necessary tests (found in /tests) pass and create a pull request.
