zvec-mcp

zvec-mcp

Enables AI agents to store, search, and manage a local vector database for RAG knowledge retrieval and long-term memory, powered by zvec.

Category
访问服务器

README

zvec-mcp

Claude Code MCP server that gives your AI agent a local vector database for RAG knowledge retrieval and long-term memory — powered by zvec.


Table of contents


What it does

zvec-mcp runs as a Model Context Protocol server over stdio. Once connected, Claude Code gets 11 new tools for storing, searching, and managing a local vector database — completely offline, no data leaves your machine.


Installation

Prerequisites

Option A — Desktop extension (.mcpb)

The fastest way to get started. Download the pre-built .mcpb file and drag it into Claude Code.

  1. Download zvec-mcp-0.1.0.mcpb from the latest release
  2. Open Claude Code
  3. Go to the Extensions tab
  4. Drag the .mcpb file into the drop zone
  5. Configure your embedding backend in the settings that appear

Note: The .mcpb bundle defaults to the HTTP embedding backend and does not include sentence-transformers. To use local embeddings, install via pip instead (Option B).

Option B — pip install

pip install zvec-mcp

This installs the zvec-mcp command-line entry point. To also get local offline embeddings:

pip install zvec-mcp sentence-transformers

For OpenAI embeddings:

pip install "zvec-mcp[openai]"

Option C — From source

git clone https://github.com/cluster2600/zvec-mcp.git
cd zvec-mcp

# Create a virtual environment and install
uv venv --python 3.10
uv pip install -e .

# Optional: local embeddings
uv pip install sentence-transformers

# Optional: OpenAI embeddings
uv pip install openai

Registering with Claude Code

After installing, you need to tell Claude Code where to find the server.

User scope (available in all projects)

claude mcp add-json zvec-mcp \
  '{"type":"stdio","command":"zvec-mcp","args":[],"env":{"ZVEC_MCP_DATA_DIR":"'"$HOME"'/.zvec-mcp","ZVEC_MCP_EMBEDDING":"http"}}' \
  --scope user

If you installed from source, use the full path to the venv binary:

claude mcp add-json zvec-mcp \
  '{"type":"stdio","command":"'"$(pwd)"'/.venv/bin/zvec-mcp","args":[],"env":{"ZVEC_MCP_DATA_DIR":"'"$HOME"'/.zvec-mcp","ZVEC_MCP_EMBEDDING":"http"}}' \
  --scope user

Project scope (shared via .mcp.json)

Create or edit .mcp.json in your project root:

{
  "mcpServers": {
    "zvec-mcp": {
      "command": "zvec-mcp",
      "args": [],
      "env": {
        "ZVEC_MCP_DATA_DIR": "${HOME}/.zvec-mcp",
        "ZVEC_MCP_EMBEDDING": "http",
        "ZVEC_MCP_HTTP_URL": "http://127.0.0.1:1234/v1/embeddings",
        "ZVEC_MCP_HTTP_MODEL": "text-embedding-nomic-embed-text-v1.5@f16",
        "ZVEC_MCP_HTTP_DIM": "768"
      }
    }
  }
}

Replace "zvec-mcp" with the full path if the command is not on your PATH (e.g. /path/to/zvec-mcp/.venv/bin/zvec-mcp).

Verify

claude mcp list
# zvec-mcp: zvec-mcp  - ✓ Connected

You can also check from inside Claude Code by asking:

"What's the status of zvec-mcp?"

Claude will call the zvec_status tool and display the backend, dimensions, and collection stats.


Embedding backends

zvec-mcp supports three embedding backends. You choose one via the ZVEC_MCP_EMBEDDING environment variable.

HTTP — LM Studio / Ollama / vLLM

Calls any OpenAI-compatible /v1/embeddings endpoint. This is the recommended setup — it keeps embeddings local, runs on GPU if available, and requires no heavy Python dependencies (uses Python's built-in urllib).

Setup with LM Studio:

  1. Install LM Studio
  2. Download an embedding model (e.g. nomic-embed-text-v1.5)
  3. Start the local server (LM Studio → Developer → Start Server)
  4. Note the port (default 1234) and API key if you enabled authentication
{
  "mcpServers": {
    "zvec-mcp": {
      "command": "zvec-mcp",
      "args": [],
      "env": {
        "ZVEC_MCP_EMBEDDING": "http",
        "ZVEC_MCP_HTTP_URL": "http://127.0.0.1:1234/v1/embeddings",
        "ZVEC_MCP_HTTP_MODEL": "text-embedding-nomic-embed-text-v1.5@f16",
        "ZVEC_MCP_HTTP_API_KEY": "your-api-key",
        "ZVEC_MCP_HTTP_DIM": "768"
      }
    }
  }
}

Setup with Ollama:

  1. Install Ollama
  2. Pull an embedding model: ollama pull nomic-embed-text
  3. Ollama serves on port 11434 by default
{
  "env": {
    "ZVEC_MCP_EMBEDDING": "http",
    "ZVEC_MCP_HTTP_URL": "http://127.0.0.1:11434/v1/embeddings",
    "ZVEC_MCP_HTTP_MODEL": "nomic-embed-text",
    "ZVEC_MCP_HTTP_DIM": "768"
  }
}

Important: The ZVEC_MCP_HTTP_DIM value must match the output dimension of your chosen model. Common values: 768 for nomic-embed-text, 384 for all-MiniLM-L6-v2, 1024 for BGE-large.

Local — sentence-transformers

Runs all-MiniLM-L6-v2 entirely offline on your machine. 384 dimensions, uses MPS acceleration on Apple Silicon. No API key needed.

Requires: pip install sentence-transformers (pulls in PyTorch, ~500 MB)

{
  "env": {
    "ZVEC_MCP_EMBEDDING": "local"
  }
}

The model is downloaded automatically on first use (~80 MB, cached in ~/.cache/torch/).

OpenAI

Uses the OpenAI Embeddings API. Requires a valid API key.

{
  "env": {
    "ZVEC_MCP_EMBEDDING": "openai",
    "OPENAI_API_KEY": "sk-..."
  }
}

You can customize the model and dimension:

{
  "env": {
    "ZVEC_MCP_EMBEDDING": "openai",
    "OPENAI_API_KEY": "sk-...",
    "ZVEC_MCP_OPENAI_MODEL": "text-embedding-3-large",
    "ZVEC_MCP_OPENAI_DIM": "3072"
  }
}

Configuration reference

All settings are driven by environment variables, passed through the env block in your MCP config.

Variable Default Description
ZVEC_MCP_DATA_DIR ~/.zvec-mcp Root directory for vector database collections
ZVEC_MCP_EMBEDDING local Embedding backend: local, openai, or http
ZVEC_MCP_CHUNK_SIZE 512 Characters per chunk for knowledge ingestion
ZVEC_MCP_CHUNK_OVERLAP 64 Overlap between consecutive chunks
HTTP backend
ZVEC_MCP_HTTP_URL http://127.0.0.1:1234/v1/embeddings OpenAI-compatible embedding endpoint
ZVEC_MCP_HTTP_MODEL text-embedding-nomic-embed-text-v1.5@f16 Model name sent to the endpoint
ZVEC_MCP_HTTP_API_KEY Bearer token (optional, depends on your server)
ZVEC_MCP_HTTP_DIM 768 Embedding dimension (must match your model)
OpenAI backend
OPENAI_API_KEY Required when using OpenAI backend
ZVEC_MCP_OPENAI_MODEL text-embedding-3-small OpenAI model name
ZVEC_MCP_OPENAI_DIM 1536 Embedding dimension

Tools reference

Knowledge base (RAG)

Tool Description
knowledge_ingest Chunk, embed, and store text for later retrieval. Accepts text and an optional source label.
knowledge_ingest_file Read a file from disk and ingest its contents. Accepts a file path.
knowledge_search Semantic search over ingested documents. Accepts a query and optional topk (default 5).
knowledge_delete_source Remove all chunks from a given source.
knowledge_stats Return collection statistics (doc count, embedding dimension, path).

Memory

Tool Description
memory_remember Store a fact, preference, or observation. Accepts text and optional category (default "general").
memory_recall Semantic recall by query. Accepts query, optional topk (default 5), optional category filter.
memory_forget Delete a specific memory by its memory_id.
memory_forget_category Delete all memories in a category.
memory_stats Return memory store statistics.

Status

Tool Description
zvec_status Return server config, embedding backend, and stats for both collections.

Usage examples

Once registered, you can use the tools naturally in conversation with Claude Code:

Ingest documentation for RAG:

"Ingest the contents of docs/api-reference.md into the knowledge base."

Claude will call knowledge_ingest_file with that path. You can then ask questions like:

"How does the authentication flow work?"

Claude will call knowledge_search to retrieve relevant chunks and use them in its answer.

Store project context as memory:

"Remember that this project uses PostgreSQL 16 and the primary database is called appdb."

Claude will call memory_remember with category "project". Later, in any conversation:

"What database does this project use?"

Claude will call memory_recall and retrieve the stored fact.

Organize memories by category:

Categories help you organize different types of information:

  • "preference" — coding style, tool choices, naming conventions
  • "project" — architecture decisions, database schemas, API contracts
  • "person" — team member roles, contact info
  • "decision" — past decisions with rationale

"Remember that we decided to use JWT for auth instead of sessions — the main reason was stateless scaling."

Claude stores this with the "decision" category.

Clean up:

"Delete all knowledge from source docs/old-api.md."

"Forget all memories in the project category."


Architecture

┌─────────────┐    stdio/JSON-RPC    ┌──────────────────────────┐
│ Claude Code  │◄───────────────────►│  zvec-mcp (FastMCP)      │
│   (client)   │                     │                          │
└─────────────┘                     │  ┌──────────┐            │
                                     │  │ knowledge │ chunk →   │
                                     │  │ manager   │ embed →   │
                                     │  │           │ store     │
                                     │  └─────┬────┘            │
                                     │        │                 │
                                     │  ┌─────▼────┐            │
                                     │  │   zvec    │ vector    │
                                     │  │collection │ search    │
                                     │  └─────┬────┘            │
                                     │        │                 │
                                     │  ┌─────▼────┐            │
                                     │  │  memory   │ remember  │
                                     │  │  manager  │ recall    │
                                     │  └──────────┘            │
                                     │                          │
                                     │  ┌──────────┐            │
                                     │  │embeddings│ local /   │
                                     │  │ (lazy)   │ openai /  │
                                     │  │          │ http      │
                                     │  └──────────┘            │
                                     └──────────────────────────┘

Source layout

src/zvec_mcp/
├── server.py       # FastMCP server — 11 tools registered via @mcp.tool()
├── config.py       # Env-driven dataclass, reads all ZVEC_MCP_* variables
├── embeddings.py   # Lazy-loaded embedding singleton (local, OpenAI, or HTTP)
├── knowledge.py    # RAG pipeline: chunk → embed → upsert → query
└── memory.py       # Semantic memory: remember → recall → forget

How RAG ingestion works

  1. Text is split into overlapping chunks at sentence boundaries (configurable size and overlap)
  2. Each chunk is embedded via the configured backend
  3. Chunks are stored in a zvec Collection with fields: source, chunk_idx, text, created_at
  4. Chunk IDs are content-addressed (sha256(source:idx)) — re-ingesting the same source updates in place

How memory works

  1. Each fact/observation is embedded and stored with a category tag
  2. Memory IDs are content-addressed (sha256(text)) — storing the same text twice is a no-op (deduplication)
  3. Recall uses cosine similarity with optional category filtering
  4. Categories let you organize memories (e.g. preference, project, person, decision)

Data storage

All data lives under ZVEC_MCP_DATA_DIR (default ~/.zvec-mcp/):

~/.zvec-mcp/
├── knowledge/    # zvec collection for RAG chunks
└── memory/       # zvec collection for memories

Collections are created automatically on first use. To reset everything, delete the directory:

rm -rf ~/.zvec-mcp

Troubleshooting

Server not showing as connected

# Check MCP registration
claude mcp list

# Test the server directly (should print JSON-RPC on stdout)
zvec-mcp
# Or from source:
/path/to/zvec-mcp/.venv/bin/zvec-mcp

If the command is not found, make sure the installation directory is on your PATH, or use the full path in your MCP config.

HTTP embedding errors

  • Connection refused: Make sure your LM Studio / Ollama / vLLM server is running
  • 401 Unauthorized: Set ZVEC_MCP_HTTP_API_KEY to your server's API key
  • Dimension mismatch: If you get errors when searching after changing models, delete your data directory (rm -rf ~/.zvec-mcp) and re-ingest — you cannot mix embeddings of different dimensions in the same collection

Local embedding slow on first run

The all-MiniLM-L6-v2 model (~80 MB) is downloaded on first use. Subsequent runs load from cache (~/.cache/torch/).

sentence-transformers not found

The local backend requires sentence-transformers. Install it:

pip install sentence-transformers

Or switch to the http backend if you have LM Studio or Ollama running.

Resetting the database

To start fresh, delete the data directory:

rm -rf ~/.zvec-mcp

Collections are recreated automatically on next use.


Building the .mcpb bundle

To build the desktop extension from source:

# Install mcpb CLI
npm install -g @anthropic-ai/mcpb

# Install bundled dependencies (core only, no torch/sentence-transformers)
# IMPORTANT: let pip resolve versions — do NOT use --no-deps
pip install --target bundle/server/lib "mcp[cli]>=1.0.0" zvec numpy

# Copy zvec_mcp source into the bundle
cp -r src/zvec_mcp bundle/server/lib/zvec_mcp

# Strip caches (keep .dist-info — needed for importlib.metadata)
cd bundle/server/lib
find . -type d -name "__pycache__" -exec rm -rf {} + 2>/dev/null
find . -name "*.pyc" -delete 2>/dev/null
cd ../../..

# Validate manifest and pack
mcpb validate bundle/manifest.json
mcpb pack bundle/ zvec-mcp-0.1.0.mcpb

Do not strip .dist-info directories — the MCP SDK uses importlib.metadata.version("mcp") at import time and will fail without them. Do not use --no-deps — pydantic and pydantic-core have strict version coupling and must be resolved together.


Dependencies

Package Role
zvec Vector database engine
mcp Model Context Protocol SDK (FastMCP)
sentence-transformers Local embeddings (optional)
openai OpenAI embeddings (optional)

License

Apache-2.0

推荐服务器

Baidu Map

Baidu Map

百度地图核心API现已全面兼容MCP协议,是国内首家兼容MCP协议的地图服务商。

官方
精选
JavaScript
Playwright MCP Server

Playwright MCP Server

一个模型上下文协议服务器,它使大型语言模型能够通过结构化的可访问性快照与网页进行交互,而无需视觉模型或屏幕截图。

官方
精选
TypeScript
Magic Component Platform (MCP)

Magic Component Platform (MCP)

一个由人工智能驱动的工具,可以从自然语言描述生成现代化的用户界面组件,并与流行的集成开发环境(IDE)集成,从而简化用户界面开发流程。

官方
精选
本地
TypeScript
Audiense Insights MCP Server

Audiense Insights MCP Server

通过模型上下文协议启用与 Audiense Insights 账户的交互,从而促进营销洞察和受众数据的提取和分析,包括人口统计信息、行为和影响者互动。

官方
精选
本地
TypeScript
VeyraX

VeyraX

一个单一的 MCP 工具,连接你所有喜爱的工具:Gmail、日历以及其他 40 多个工具。

官方
精选
本地
graphlit-mcp-server

graphlit-mcp-server

模型上下文协议 (MCP) 服务器实现了 MCP 客户端与 Graphlit 服务之间的集成。 除了网络爬取之外,还可以将任何内容(从 Slack 到 Gmail 再到播客订阅源)导入到 Graphlit 项目中,然后从 MCP 客户端检索相关内容。

官方
精选
TypeScript
Kagi MCP Server

Kagi MCP Server

一个 MCP 服务器,集成了 Kagi 搜索功能和 Claude AI,使 Claude 能够在回答需要最新信息的问题时执行实时网络搜索。

官方
精选
Python
e2b-mcp-server

e2b-mcp-server

使用 MCP 通过 e2b 运行代码。

官方
精选
Neon MCP Server

Neon MCP Server

用于与 Neon 管理 API 和数据库交互的 MCP 服务器

官方
精选
Exa MCP Server

Exa MCP Server

模型上下文协议(MCP)服务器允许像 Claude 这样的 AI 助手使用 Exa AI 搜索 API 进行网络搜索。这种设置允许 AI 模型以安全和受控的方式获取实时的网络信息。

官方
精选