
🔤 Keyphrases-MCP

Empowering LLMs with authentic keyphrase extraction

Built with the following tools and technologies:

<img src="https://img.shields.io/badge/MCP-6A5ACD.svg?style=default&logo=data:image/svg+xml;base64,PHN2ZyBmaWxsPSIjNkE1QUNEIiB4bWxucz0iaHR0cDovL3d3dy53My5vcmcvMjAwMC9zdmciIHdpZHRoPSIxNiIgaGVpZ2h0PSIxNiI+PHJlY3Qgd2lkdGg9IjE2IiBoZWlnaHQ9IjE2IiByeD0iNCIvPjx0ZXh0IHg9IjgiIHk9IjExIiBmb250LXNpemU9IjgiIHRleHQtYW5jaG9yPSJtaWRkbGUiIGZpbGw9IndoaXRlIj5NQ1A8L3RleHQ+PC9zdmc+" alt="MCP"> <img src="https://img.shields.io/badge/PyTorch-EE4C2C.svg?style=default&logo=PyTorch&logoColor=white" alt="PyTorch"> <img src="https://img.shields.io/badge/Python-3776AB.svg?style=default&logo=Python&logoColor=white" alt="Python"> <img src="https://img.shields.io/badge/uv-DE5FE9.svg?style=default&logo=uv&logoColor=white" alt="uv">


Overview

This Keyphrases MCP Server is a natural language interface designed for agentic applications to extract keyphrases from provided text. It integrates seamlessly with MCP (Model Context Protocol) clients, enabling AI-driven workflows to extract keyphrases more accurately and with higher relevance using the BERT machine learning model. It works directly with your local files in the allowed directories, saving context tokens for your agentic LLM. The application exposes the extracted keyphrases, but not the file's content, to the MCP client.

Using this MCP Server, you can ask questions such as the following:

  • "Extract 7 keyphrases from the file. [ABSOLUTE_FILE_PATH]"
  • "Extract 3 keyphrases from the given file ignoring the stop words. Stop words: former, due, amount, [OTHER_STOP_WORDS]. File: [ABSOLUTE_FILE_PATH]"

Keyphrases help users quickly grasp the main topics and themes of a document without reading it in full, and they enable the following applications:

  1. tags or metadata for documents, improving organization and discoverability in digital libraries
  2. detection of emerging trends and sentiment in customer reviews, social media, or news articles
  3. features or inputs for downstream tasks such as text classification and clustering

Reasoning for keyphrases-mcp

Autoregressive LLMs, such as those behind Claude or ChatGPT, process text sequentially. This not only limits their ability to fully contextualize keyphrases across the entire document, but also leads to context degradation as the input length increases, causing earlier keyphrases to receive diluted attention.

Bidirectional models like BERT, by considering both left and right context and maintaining more consistent attention across the sequence, generally extract keyphrases from texts more accurately and with higher relevance, especially when no domain-specific fine-tuning is applied.

However, as autoregressive models adopt longer context windows and techniques such as input chunking, their performance in keyphrase extraction is improving, narrowing the gap with BERT. Domain-specific fine-tuning can even enable an autoregressive LLM to outperform the BERT-based solution.

This MCP server combines BERT for keyphrase extraction with an autoregressive LLM for text generation or refinement, enabling seamless text processing.

How it works

The server uses the KeyBERT framework for a multi-step extraction pipeline combining spaCy NLP preprocessing with BERT embeddings (see the code sketch after this list):

  1. Candidate Generation: KeyphraseCountVectorizer identifies meaningful keyphrase candidates using spaCy's en_core_web_trf model and discarding stop words
  2. Semantic Encoding: Candidates and document are embedded using paraphrase-multilingual-MiniLM-L12-v2 sentence transformer
  3. Relevance Ranking: KeyBERT calculates cosine similarity between candidate keyphrase and document embeddings
  4. Diversity Selection: Maximal Marginal Relevance (MMR) ensures diverse, non-redundant keyphrases
  5. Final Output: Top N most relevant and diverse keyphrases are selected and sorted alphabetically
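
The following minimal sketch shows how this pipeline maps onto KeyBERT and KeyphraseCountVectorizer calls; the parameter values (diversity, top_n) are illustrative assumptions, not necessarily the server's exact defaults.

from keybert import KeyBERT
from keyphrase_vectorizers import KeyphraseCountVectorizer

# Step 2: embed candidates and document with the default multilingual model
kw_model = KeyBERT(model="paraphrase-multilingual-MiniLM-L12-v2")

# Step 1: generate candidate keyphrases with spaCy's transformer pipeline
vectorizer = KeyphraseCountVectorizer(spacy_pipeline="en_core_web_trf")

with open("document.txt", encoding="utf-8") as f:
    doc = f.read()

# Steps 3-4: rank candidates by cosine similarity and diversify with MMR
keyphrases = kw_model.extract_keywords(
    doc,
    vectorizer=vectorizer,
    use_mmr=True,       # Maximal Marginal Relevance
    diversity=0.7,      # illustrative value, not a verified server default
    top_n=7,
)

# Step 5: sort the top-N keyphrases alphabetically
print(sorted(phrase for phrase, score in keyphrases))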

There are various pretrained embedding models for BERT. By default, the server uses "paraphrase-multilingual-MiniLM-L12-v2", which suits multilingual documents.

You can specify the "all-MiniLM-L6-v2" model for English documents by exporting the MCP_KEYPHRASES_EMBEDDINGS_MODEL environment variable (see src/config.py for details).
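
For example:

export MCP_KEYPHRASES_EMBEDDINGS_MODEL="all-MiniLM-L6-v2"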

Integration

OpenAI

Run the keyphrases-mcp server locally and expose it to the internet via ngrok:

uvx --from git+https://github.com/IvanRublev/keyphrases-mcp.git start-mcp-server --allowed-dir <path_to_documents> --http
ngrok http 8000

Note the public URL (e.g., https://your-server.ngrok.io) for the next steps.

Add the connector to ChatGPT with the following steps:

  1. Enable Developer Mode: open ChatGPT and go to Settings → Connectors; under Advanced, toggle Developer Mode to enabled
  2. Create the connector: in Settings → Connectors, click Create and enter Name: Keyphrases-MCP, Server URL: https://your-server.ngrok.io/mcp/; check "I trust this provider" and click Create

Use in Chat

  1. Start a new chat

  2. Click the + button → More → Developer Mode, then enable your MCP server connector (required: the connector must be explicitly added to each chat)

Now you can use the tool.

With Docker

You can use a dockerized deployment of this server to give MCP clients access via the Streamable HTTP transport as follows:

Build the image; it will take about 10 GB of disk space.

docker build -f Dockerfile-deploy -t keyphrases-mcp .

Run the container, exposing the port and mounting a temporary directory to store the embeddings model as well as the documents directory.

docker run --rm --name keyphrases-mcp-server -i -v <tmp_directory_path>/embedding_model:/app/embedding_model -v <path_to_documents>:/app/documents -p 8000:8000 keyphrases-mcp:latest

OpenAI Agents SDK

Integrate this MCP Server with the OpenAI Agents SDK. Refer to the SDK documentation to learn more about its integration with MCP.

Install the Python SDK.

pip install openai-agents

Configure the OpenAI token:

export OPENAI_API_KEY="<openai_token>"

Then run the application:

cd openai_agents_sdk && python keyphrases_assistant.py --allowed-dir <path_to_documents>
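
For reference, here is a minimal sketch of such an assistant wired up through the Agents SDK's stdio MCP support; it assumes the server is launched via uvx, and the actual keyphrases_assistant.py in the repository may differ.

import asyncio

from agents import Agent, Runner
from agents.mcp import MCPServerStdio

async def main() -> None:
    # Launch keyphrases-mcp over stdio; --allowed-dir restricts file access
    async with MCPServerStdio(
        params={
            "command": "uvx",
            "args": [
                "--from", "git+https://github.com/IvanRublev/keyphrases-mcp.git",
                "keyphrases-mcp-server",
                "--allowed-dir", "/path/to/documents",
            ],
        }
    ) as server:
        agent = Agent(
            name="Keyphrases assistant",
            instructions="Use the keyphrase extraction tools to answer.",
            mcp_servers=[server],
        )
        result = await Runner.run(
            agent,
            "Extract 7 keyphrases from the file. /path/to/documents/report.txt",
        )
        print(result.final_output)

if __name__ == "__main__":
    asyncio.run(main())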

You can troubleshoot your agent workflows using the OpenAI dashboard.

Claude Desktop

Run the following command once to download embedding models.

<path_to_uvx>/bin/uvx --from git+https://github.com/IvanRublev/keyphrases-mcp.git keyphrases-mcp-server --download-embeddings

Update the Claude configuration file:

  • on macOS: ~/Library/Application Support/Claude/claude_desktop_config.json
  • on Windows: %APPDATA%\Claude\claude_desktop_config.json

Add the keyphrases-mcp server configuration to run it from the GitHub repository with uvx:

{
  "mcpServers": {
    "keyphrases-mcp-server": {
        "type": "stdio",
        "command": "<path_to_uvx>/bin/uvx",
        "args": [
            "--from", "git+https://github.com/IvanRublev/keyphrases-mcp.git",
            "keyphrases-mcp-server",
            "--allowed-dir", "<path_to_documents>"
        ]
    }
  }
}

Start the application. It will take some time to download ~1 GB of dependencies on the first launch.

Alternatively, you can clone the source code from the GitHub repository and start the server using uv. This is usually desired for development.

{
  "mcpServers": {
    "keyphrases-mcp-server": {
        "type": "stdio",
        "command": "<path_to_uv>/bin/uv",
        "args": [
            "run",
            "--directory", "<path_to_keyphrases-mcp>/src",
            "-m", "main",
            "--allowed-dir", "<path_to_documents>"
        ]
    }
  }
}

Development

Build from the source and install dependencies:

git clone https://github.com/IvanRublev/keyphrases-mcp.git
cd keyphrases-mcp
asdf install
uv venv --no-managed-python
uv sync --dev --locked

Run linters and tests with:

ruff check . 
pyrefly check .
pytest

Integration testing

You can use the MCP Inspector for visual debugging of this MCP Server.

npx @modelcontextprotocol/inspector uv run src/main.py --allowed-dir <path_to_documents>

Contributing

  1. Fork the repo
  2. Create a new branch (feature-branch)
  3. Run linters and tests
  4. Commit your changes
  5. Push to your branch and submit a PR!

License

This project is licensed under the MIT License.

Contact

For questions or support, reach out via GitHub Issues.
