MCP VectorStore Server
A Model Context Protocol (MCP) server that provides advanced vector store operations for document search, PDF processing, and information retrieval. This server wraps the functionality from vectorstore.py into a standardized MCP interface.
Features
- Vector Store Operations: Create, search, and manage document vector stores
- PDF Processing: Extract and index content from PDF documents using LLMSherpa
- Semantic Search: Advanced document search using HuggingFace embeddings
- Web Search Integration: Google, Wikipedia, and DuckDuckGo search capabilities
- File Operations: Read and process local files
- Mathematical Calculations: Built-in calculator functionality
Prerequisites
System Requirements
- Python: 3.8 or higher
- Operating System: Linux, macOS, or Windows
- Memory: Minimum 4GB RAM (8GB+ recommended for large document collections)
- Storage: At least 2GB free space for models and vector stores
- Network: Internet connection for downloading models and web searches
Optional GPU Support
For improved performance with large document collections:
- CUDA: 11.8 or higher
- GPU: NVIDIA GPU with 4GB+ VRAM
- cuDNN: Compatible version for your CUDA installation
Installation
Step 1: Clone or Download the Repository
# If you have the files locally, navigate to the directory
cd /path/to/McpDocServer
# Or clone from a repository (if available)
# git clone <repository-url>
# cd McpDocServer
Step 2: Create a Virtual Environment
# Create a virtual environment
python3 -m venv venv
# Activate the virtual environment
# On Linux/macOS:
source venv/bin/activate
# On Windows:
# venv\Scripts\activate
Step 3: Install Dependencies
# Upgrade pip
pip install --upgrade pip
# Install all required packages
pip install -r requirements.txt
Step 4: Install LLMSherpa (Optional but Recommended)
For optimal PDF processing, install LLMSherpa locally:
# Install LLMSherpa
pip install llmsherpa
# Start the LLMSherpa server (in a separate terminal)
llmsherpa --port 5001
Step 5: Download Embedding Models
The server will automatically download the required embedding model on first use, but you can pre-download it:
# Download the embedding model
python -c "from sentence_transformers import SentenceTransformer; SentenceTransformer('sentence-transformers/all-mpnet-base-v2')"
Configuration
Environment Variables
Create a .env file in the project directory:
# LLMSherpa API URL (use local if available, otherwise cloud)
LLMSHERPA_API_URL=http://localhost:5001/api/parseDocument?renderFormat=all
# Vector store directory
VECTORSTORE_DIR=/path/to/your/documents
# User agent for web scraping
USER_AGENT=Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/119.0.0.0 Safari/537.36
# Optional: CUDA device for GPU acceleration
CUDA_VISIBLE_DEVICES=0
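For reference, a minimal sketch of how these variables might be read at startup. The load_config helper and the fallback defaults are assumptions for illustration, not the server's actual code:

```python
import os

def load_config() -> dict:
    """Read the .env-style variables above, with assumed defaults."""
    return {
        "llmsherpa_api_url": os.environ.get(
            "LLMSHERPA_API_URL",
            "http://localhost:5001/api/parseDocument?renderFormat=all",
        ),
        "vectorstore_dir": os.environ.get("VECTORSTORE_DIR", "."),
        "user_agent": os.environ.get("USER_AGENT", "Mozilla/5.0"),
    }

config = load_config()
print(config["llmsherpa_api_url"])
```

In practice a package such as python-dotenv can load the .env file into the environment before this lookup runs.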
Directory Structure
Prepare your document directory:
your_documents/
├── pdfs/
│ ├── document1.pdf
│ ├── document2.pdf
│ └── ...
├── text_files/
│ ├── notes.txt
│ └── ...
└── other_documents/
└── ...
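A layout like this can be scanned recursively for indexable files. The sketch below is illustrative only; the find_documents helper and the extension set are assumptions, not part of the server:

```python
from pathlib import Path

def find_documents(root: str) -> list[Path]:
    """Recursively collect files with indexable extensions (assumed set)."""
    exts = {".pdf", ".txt", ".md"}
    return sorted(
        p for p in Path(root).rglob("*")
        if p.is_file() and p.suffix.lower() in exts
    )

# Hypothetical usage:
# for doc in find_documents("/path/to/your_documents"):
#     print(doc)
```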
Usage
Starting the MCP Server
# Make the server executable
chmod +x mcp_vectorstore_server.py
# Start the server on Linux
python /home/em/McpDocServer/mcp_vectorstore_server.py
Or, on Windows via WSL:
wsl -d Ubuntu-24.04 bash -c "/mnt/c/Users/emanu/Desktop/McpDocServer/start_mcp.sh"
Using with MCP Clients
0. Claude Desktop
Add to your MCP configuration:
{
  "mcpServers": {
    "vectorstore": {
      "command": "python",
      "args": ["/home/em/McpDocServer/mcp_vectorstore_server.py"],
      "env": {
        "PYTHONPATH": "/home/em/McpDocServer/McpDocServer"
      }
    }
  }
}
1. GitHub Copilot
- In the GitHub Copilot Chat window, click Configure Tools.
- Click Add More Tools in the top search bar.
- Click Add MCP Server.
- Select command (stdio).
- Enter the command to run:
  python /home/em/McpDocServer/mcp_vectorstore_server.py
  or, on Windows: wsl -d Ubuntu-24.04 /mnt/c/Users/emanu/Desktop/McpDocServer/start_mcp.sh
- Enter an MCP server id/name, e.g. McpDocServer-19be5552
- Configure settings.json:
{
  "security.workspace.trust.untrustedFiles": "open",
  "python.defaultInterpreterPath": "/mnt/c/Users/emanu/Desktop/LLM/venv/venv/bin/python",
  "terminal.integrated.inheritEnv": false,
  "git.openRepositoryInParentFolders": "never",
  "terminal.integrated.scrollback": 100000,
  "mcp": {
    "servers": {
      "McpDocServer-19be5552": {
        "type": "stdio",
        "command": "python",
        "args": [
          "/mnt/c/Users/emanu/Desktop/McpDocServer/mcp_vectorstore_server.py"
        ]
      }
    }
  }
}
- Click Configure Tools in the GitHub Copilot Chat window and scroll to the bottom to verify that the following tools are listed: vectorstore_search, vectorstore_create, vectorstore_info, vectorstore_clear, read_file, google_search, wikipedia_search, duckduckgo_search, calculate
- Select Agent mode in the GitHub Copilot Chat window and query the server, e.g.: "use vectorstore_search to get information on unit testing"
- Confirm the tool call when prompted.
2. Continue MCP Client
name: McpDocServer
version: 1.0.1
schema: v1
mcpServers:
  - name: McpDocServer
    command: wsl -d Ubuntu-24.04
    args:
      - "/mnt/c/Users/emanu/Desktop/McpDocServer/start_mcp.sh"
    env: {}
mcp_timeout: 180 # set timeout to 180 sec
timeout: 9999
connectionTimeout: 120000 # 120 seconds = 2 minutes
3. Other MCP Clients
Configure your MCP client to use the server:
# Example with a generic MCP client
mcp-client --server python --args /path/to/McpDocServer/mcp_vectorstore_server.py
Available Tools
Vector Store Operations
vectorstore_search
Search the vector store for relevant documents.
Parameters:
- query (string, required): Search query
- k (integer, optional): Number of results (default: 2)
Example:
{
  "name": "vectorstore_search",
  "arguments": {
    "query": "machine learning algorithms",
    "k": 5
  }
}
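Conceptually, a semantic search like this embeds the query and ranks stored chunks by cosine similarity. Here is a minimal pure-Python sketch of that ranking, with toy three-dimensional vectors standing in for real embeddings (the real server uses HuggingFace embeddings and a vector index, not this loop):

```python
import math

def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

def top_k(query_vec: list[float], docs: list[tuple[str, list[float]]], k: int = 2) -> list[str]:
    """Return the texts of the k chunks most similar to the query vector."""
    scored = sorted(docs, key=lambda d: cosine(query_vec, d[1]), reverse=True)
    return [text for text, _ in scored[:k]]

docs = [
    ("intro to neural networks", [0.9, 0.1, 0.0]),
    ("cooking recipes", [0.0, 0.2, 0.9]),
    ("gradient descent notes", [0.8, 0.3, 0.1]),
]
print(top_k([1.0, 0.2, 0.0], docs, k=2))
```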
vectorstore_create
Create a new vector store from documents in a directory.
Parameters:
- directory_path (string, required): Path to directory containing documents
Example:
{
  "name": "vectorstore_create",
  "arguments": {
    "directory_path": "/home/user/documents/research_papers"
  }
}
vectorstore_info
Get information about the current vector store.
Example:
{
  "name": "vectorstore_info",
  "arguments": {}
}
vectorstore_clear
Clear all documents from the vector store.
Example:
{
  "name": "vectorstore_clear",
  "arguments": {}
}
File Operations
read_file
Read the contents of a file on the system.
Parameters:
- filename (string, required): Path to the file to read
Example:
{
  "name": "read_file",
  "arguments": {
    "filename": "/home/user/documents/notes.txt"
  }
}
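A hedged sketch of what a read_file handler might look like; the error-string convention and encoding handling here are assumptions, and the actual server may differ (e.g. by raising an exception instead):

```python
from pathlib import Path

def read_file(filename: str) -> str:
    """Return the file's text, or an error message if it does not exist."""
    path = Path(filename)
    if not path.is_file():
        return f"Error: {filename} not found"
    # errors="replace" avoids crashing on non-UTF-8 bytes
    return path.read_text(encoding="utf-8", errors="replace")
```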
Web Search Operations
google_search
Search Google for information.
Parameters:
query(string, required): Search querymax_results(integer, optional): Maximum number of results (default: 3)
Example:
{
"name": "google_search",
"arguments": {
"query": "latest AI developments 2024",
"max_results": 5
}
}
wikipedia_search
Search Wikipedia for information.
Parameters:
- query (string, required): Search query
Example:
{
  "name": "wikipedia_search",
  "arguments": {
    "query": "artificial intelligence"
  }
}
duckduckgo_search
Search DuckDuckGo for information.
Parameters:
- query (string, required): Search query
Example:
{
  "name": "duckduckgo_search",
  "arguments": {
    "query": "privacy-focused search engines"
  }
}
Utility Operations
calculate
Perform mathematical calculations.
Parameters:
- operation (string, required): Mathematical operation to perform
Example:
{
  "name": "calculate",
  "arguments": {
    "operation": "2 + 2 * 3"
  }
}
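To evaluate an expression like "2 + 2 * 3" without the risks of eval(), a calculator tool can walk the parsed AST and allow only arithmetic nodes. This is one safe approach, sketched here as an illustration and not necessarily the server's actual implementation:

```python
import ast
import operator

# Whitelist of permitted operators; anything else is rejected.
_OPS = {
    ast.Add: operator.add,
    ast.Sub: operator.sub,
    ast.Mult: operator.mul,
    ast.Div: operator.truediv,
    ast.Pow: operator.pow,
    ast.USub: operator.neg,
}

def safe_calculate(expression: str) -> float:
    """Evaluate a basic arithmetic expression without using eval()."""
    def _eval(node: ast.AST) -> float:
        if isinstance(node, ast.Constant) and isinstance(node.value, (int, float)):
            return node.value
        if isinstance(node, ast.BinOp) and type(node.op) in _OPS:
            return _OPS[type(node.op)](_eval(node.left), _eval(node.right))
        if isinstance(node, ast.UnaryOp) and type(node.op) in _OPS:
            return _OPS[type(node.op)](_eval(node.operand))
        raise ValueError(f"Unsupported expression: {expression!r}")
    return _eval(ast.parse(expression, mode="eval").body)

print(safe_calculate("2 + 2 * 3"))  # respects operator precedence: 8
```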
Resources
The server provides the following resources:
vectorstore://info
Returns information about the current vector store in JSON format.
Example Response:
{
  "num_documents": 150,
  "directory": "/home/user/documents",
  "embeddings_model": "sentence-transformers/all-mpnet-base-v2"
}
Troubleshooting
Common Issues
1. Import Errors
Problem: ModuleNotFoundError for various packages
Solution: Ensure all dependencies are installed:
pip install -r requirements.txt
2. CUDA/GPU Issues
Problem: CUDA-related errors
Solution: Install CPU-only versions:
pip uninstall faiss-gpu torch
pip install faiss-cpu
3. LLMSherpa Connection Issues
Problem: Cannot connect to the LLMSherpa API
Solution:
- Start the LLMSherpa server:
llmsherpa --port 5001
- Or use the cloud API by updating the URL in the code
4. Memory Issues
Problem: Out-of-memory errors with large documents
Solution:
- Reduce chunk size in the text splitter
- Use smaller embedding models
- Process documents in batches
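The batch-processing suggestion can be as simple as a generator that yields fixed-size slices; the sketch below is generic, and the add_documents call in the comment is hypothetical:

```python
def batched(items: list, batch_size: int = 10):
    """Yield successive fixed-size batches from a list of documents."""
    for i in range(0, len(items), batch_size):
        yield items[i:i + batch_size]

docs = [f"doc_{n}" for n in range(25)]
for batch in batched(docs, batch_size=10):
    pass  # e.g. vectorstore.add_documents(batch) — hypothetical call
```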
5. Permission Issues
Problem: Cannot read files or directories
Solution: Check file permissions:
chmod 644 /path/to/documents/*
chmod 755 /path/to/documents/
Performance Optimization
For Large Document Collections
1. Use GPU acceleration:
# In vectorstore.py, ensure CUDA is enabled
model_kwargs={'device': 'cuda'}
2. Optimize the chunk size:
# Adjust in PDFVectorStoreTool.__init__
chunk_size=1000,  # Smaller chunks for better performance
chunk_overlap=100,
3. Use batch processing:
# Process documents in smaller batches
batch_size = 10
For Better Search Results
1. Adjust the similarity threshold:
# In the vectorstore_search method
similarity_threshold = 0.7
2. Try different embedding models:
# Different models trade speed for quality
model_name="sentence-transformers/all-MiniLM-L6-v2"  # Faster
model_name="sentence-transformers/all-mpnet-base-v2"  # Better quality
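To make the chunk_size/chunk_overlap semantics concrete, here is a naive character-based splitter. It is a deliberate simplification of what LangChain-style text splitters do (which split on separators, not raw character offsets), not the project's actual splitter:

```python
def split_text(text: str, chunk_size: int = 1000, chunk_overlap: int = 100) -> list[str]:
    """Split text into chunks of chunk_size characters, each starting
    chunk_size - chunk_overlap characters after the previous one."""
    if chunk_overlap >= chunk_size:
        raise ValueError("overlap must be smaller than chunk size")
    step = chunk_size - chunk_overlap
    return [text[i:i + chunk_size] for i in range(0, len(text), step)]

chunks = split_text("x" * 2500, chunk_size=1000, chunk_overlap=100)
print(len(chunks), [len(c) for c in chunks])
```

Smaller chunks reduce peak memory per embedding call; larger overlap preserves context across chunk boundaries at the cost of some duplication.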
Development
Project Structure
McpDocServer/
├── mcp_vectorstore_server.py # Main MCP server
├── vectorstore.py # Original vectorstore implementation
├── requirements.txt # Python dependencies
├── README.md # This documentation
└── .env # Environment variables (create this)
Contributing
- Fork the repository
- Create a feature branch
- Make your changes
- Add tests if applicable
- Submit a pull request
Testing
# Run basic functionality tests
python -c "
from mcp_vectorstore_server import *
print('Server imports successfully')
"
# Test vector store operations
python -c "
from vectorstore import PDFVectorStoreTool
tool = PDFVectorStoreTool()
print(f'Vector store initialized with {tool.vectorstore_get_num_items()} documents')
"
License
This project is provided as-is for educational and research purposes. Please ensure you comply with the licenses of all included dependencies.
Support
For issues and questions:
- Check the troubleshooting section above
- Review the error logs
- Ensure all dependencies are correctly installed
- Verify your system meets the requirements
Changelog
Version 1.0.0
- Initial release
- MCP server implementation
- Vector store operations
- Web search integration
- File operations
- Mathematical calculations