Mnemo MCP
Persistent AI memory server with hybrid search and embedded sync. Enables AI agents to store, retrieve, and manage information across sessions with temporal knowledge graph support.
README
Mnemo MCP Server
mcp-name: io.github.n24q02m/mnemo-mcp
Persistent AI memory with hybrid search and embedded sync. Open, free, unlimited.
<!-- BEGIN: AUTO-GENERATED-CROSS-PROMO --> <details> <summary><strong>Sister projects from n24q02m</strong> (click to expand)</summary>
| Project | Tagline | Tag |
|---|---|---|
| better-code-review-graph | Knowledge graph for token-efficient code reviews -- semantic search and call-... | MCP |
| better-email-mcp | IMAP/SMTP email for AI agents -- read, send, organize folders, and manage att... | MCP |
| better-godot-mcp | Composite MCP server for Godot Engine -- 17 composite tools for AI-assisted g... | MCP |
| better-notion-mcp | Markdown-first Notion for AI agents -- pages, databases, blocks, and comments... | MCP |
| better-telegram-mcp | Telegram for AI agents -- messages, chats, media, and contacts across both bo... | MCP |
| claude-plugins | Claude Code plugin marketplace for the n24q02m MCP servers -- install web sea... | Marketplace |
| imagine-mcp | Image and video understanding + generation for AI agents -- across Gemini, Op... | MCP |
| jules-task-archiver | Chrome Extension for bulk operations on Jules tasks via batchexecute API -- a... | Tooling |
| mcp-core | Shared foundation for building MCP servers -- Streamable HTTP transport, OAut... | MCP |
| mnemo-mcp | Persistent AI memory with hybrid search and embedded sync. Open, free, unlimi... | MCP |
| qwen3-embed | Lightweight Qwen3 text embedding and reranking via ONNX Runtime and GGUF | Library |
| skret | Secrets without the server. | CLI |
| tacet | TACET: a self-distilling neuro-symbolic cascade that amortises LLM cost in kn... | Tooling |
| web-core | Shared web infrastructure package for search, scraping, HTTP security, and st... | Library |
| wet-mcp | Open-source MCP server for AI agents: web search, content extraction, and lib... | MCP |
</details> <!-- END: AUTO-GENERATED-CROSS-PROMO -->
Table of contents
- Roadmap
- Features
- Quick install
- Configuration
- Comparison vs. peers
- Status
- Documentation
- Tools
- Security
- Build from Source
- Trust Model
- License
<a href="https://glama.ai/mcp/servers/n24q02m/mnemo-mcp"> <img width="380" height="200" src="https://glama.ai/mcp/servers/n24q02m/mnemo-mcp/badge" alt="Mnemo MCP server" /> </a>
Roadmap
All three phases below have shipped. The temporal knowledge graph (Phase 3) is the current major line (v2.x).
| Phase | Version | Status | Highlights |
|---|---|---|---|
| Phase 1 | v1.x | Shipped | Typed memory(action="capture") (6 context_types + dedup) -- RRF (k=60) hybrid fusion + cross-encoder rerank + temporal decay -- importance x recency archive policy + restore -- Alembic migrations -- multi-provider LLM dispatch -- plugin trinity (recall-context + memory-commit skills, SessionStart + opt-in PostToolUse hooks) |
| Phase 2 | v1.x | Shipped | LLM-driven compression of older memories + Passport sync (encrypted import/export bundle for cross-machine bootstrap) -- AES-256-GCM + Argon2id, S3 / R2 / B2 / MinIO + GDrive backends, delta-sync with LWW per row |
| Phase 3 | v2.x | Shipped (BREAKING) | Temporal knowledge graph -- bitemporal valid_from / valid_to columns -- entity resolution via embedding KNN -- entity_search / entity_graph / history actions -- KG-aware passport bundle sections -- KG_AUTO_ENABLED opt-in auto-extract on capture |
Features
- Hybrid retrieval -- FTS5 + sqlite-vec, fused via Reciprocal Rank Fusion (k=60), then re-ranked by a configurable rerank chain (
RERANK_MODELS, order = litellm fallback; empty -> local qwen3-reranker) with temporal decay and importance boost - Typed capture --
memory(action="capture")with 6 context_types (conversation/fact/preference/skill/task/decision), embedding-based dedup, and a configurable LLM chain (LLM_MODELS, order = litellm fallback) - Knowledge graph -- Automatic entity extraction and relation tracking; top results boosted by graph proximity
- Importance scoring + archive policy -- LLM-scored 0.0-1.0 importance; soft-archive when
recency_factor * (1 - importance) > 1.0; restore action available - Auto-archive trigger -- Background sweep every Nth capture (default 100) -- no cron required
- STM-to-LTM consolidation -- LLM summarization of related memories in a category
- Duplicate detection -- Warns before adding semantically similar memories
- Zero config -- Built-in local Qwen3 ONNX embedding + reranking, no API keys needed. Optional cloud providers (Jina AI, Gemini, OpenAI, Cohere)
- Multi-machine sync -- JSONL-based merge sync via Google Drive (bundled Desktop OAuth public client)
- Plugin trinity -- Ships
/recall-context+/memory-commitskills and SessionStart + opt-in PostToolUse hooks (see docs/ARCHITECTURE.md) - Proactive memory -- Tool descriptions and skills guide AI to save preferences, decisions, facts at the right moment
- LLM compression -- Per-turn compression via the multi-provider dispatcher targets ~3x token reduction at >=0.90 fact retention; graceful skip when no provider configured (see docs/compression.md)
- Encrypted passport sync -- AES-256-GCM bundles + Argon2id KDF, S3 (R2 / B2 / MinIO) and Google Drive backends, delta-sync with last-write-wins per row (see docs/passport.md). Bootstrap via the
passport-bootstrapskill. - Temporal knowledge graph -- Bitemporal columns (
valid_from/valid_to/superseded_by) on every memory + entity-resolution dedup (embedding KNN at default 0.85 cosine threshold) + audit trail (memory_audittable with prev/new state hashes) + new actions (entity_search/entity_graph/history) + opt-inKG_AUTO_ENABLEDauto-extract on capture. BREAKING for clients that calledmemory.getexpecting historical-inclusive results: passas_offor time-travel; default now filters to current-state (valid_to IS NULL).
Quick install
# Method 1 (default): plugin install via Claude Code
/plugin marketplace add n24q02m/claude-plugins
/plugin install mnemo-mcp@n24q02m-plugins
# Method 1 (CLI): direct uvx invocation (zero config -- runs on the built-in local model)
claude mcp add mnemo -- uvx mnemo-mcp
# Method 3 (HTTP / multi-device / multi-user)
docker run -d --name mnemo-mcp-http -p 8085:8080 \
-v mnemo-data:/data -e MCP_TRANSPORT=http \
-e PUBLIC_URL=https://mnemo.example.com \
n24q02m/mnemo-mcp:latest
No API keys are required: with no provider keys set, mnemo runs fully offline on the bundled local Qwen3 ONNX embedding + reranker. Add cloud provider keys only to switch embedding / rerank / LLM onto a hosted model (see Configuration).
Full setup matrices live at the canonical docs site mcp.n24q02m.com/servers/mnemo-mcp/setup/ and the paste-to-agent snippet at claude-plugins/plugins/mnemo-mcp/setup-with-agent.md.
Configuration
All settings are plain environment variables (no prefix). Everything is optional -- mnemo runs zero-config on the local model. The most common knobs:
Model selection (per-task chains)
Embedding, reranking, and LLM features each take an ordered, comma-separated chain of
provider/model entries (tried in order, litellm fallback). Leave a chain empty to use
the bundled local model (embedding / rerank) or to disable the feature (LLM).
| Env var | Default | Purpose |
|---|---|---|
EMBEDDING_MODELS |
(empty -> local Qwen3 ONNX) | Embedding chain, e.g. jina_ai/jina-embeddings-v5-text-small,gemini/gemini-embedding-001 |
RERANK_MODELS |
(empty -> local Qwen3 cross-encoder) | Rerank chain, e.g. jina_ai/jina-reranker-v3,cohere/rerank-v3.5 |
LLM_MODELS |
(built-in cloud chain) | LLM chain for graph extraction / importance / compression; empty disables those features |
EMBEDDING_DIMS |
768 |
Embedding dimensions (0 = auto-detect) |
Provider is inferred from the model prefix; supply each provider's key via the litellm
<PROVIDER>_API_KEY convention:
| model prefix | key env var | get it at |
|---|---|---|
jina_ai/ |
JINA_AI_API_KEY |
jina.ai/api-dashboard |
gemini/ |
GEMINI_API_KEY |
aistudio.google.com/apikey |
openai/ (or bare) |
OPENAI_API_KEY |
platform.openai.com/api-keys |
cohere/ |
COHERE_API_KEY |
dashboard.cohere.com/api-keys |
Any other litellm provider works via env passthrough; see
https://docs.litellm.ai/docs/providers/<provider> for its <PROVIDER>_API_KEY name.
Custom OpenAI-compatible endpoints (SSRF-guarded): LLM_API_BASE, EMBEDDING_API_BASE,
RERANK_API_BASE.
Changing the embedding model changes the vector space. A safe-by-default guard blocks boot on mismatch; set
REINDEX_ON_MODEL_CHANGE=trueto re-embed.
Storage, sync, retrieval, and archive
| Env var | Default | Purpose |
|---|---|---|
DB_PATH |
~/.mnemo-mcp/memories.db |
SQLite database path (also accepts MNEMO_DB_PATH) |
SYNC_ENABLED |
true |
Enable Google Drive multi-machine sync |
GOOGLE_DRIVE_CLIENT_ID |
(none) | OAuth client ID required for sync |
SYNC_FOLDER |
mnemo-mcp |
Google Drive folder name |
SYNC_INTERVAL |
300 |
Auto-sync interval in seconds (0 = manual only) |
RERANK_ENABLED |
true |
Enable reranking of fused results |
RERANK_TOP_N |
10 |
Number of reranked results to keep |
ARCHIVE_ENABLED |
true |
Enable importance x recency soft-archive sweeps |
ARCHIVE_AFTER_DAYS |
90 |
Age before a memory is eligible for archive |
DEDUP_THRESHOLD |
0.9 |
Similarity above which a new memory is a duplicate |
RECENCY_HALF_LIFE_DAYS |
7 |
Half-life for temporal decay scoring |
KG_AUTO_ENABLED |
false |
Auto-extract entities + relations on capture |
LOG_LEVEL |
INFO |
Log verbosity |
Manual config example
{
"mcpServers": {
"mnemo": {
"command": "uvx",
"args": ["mnemo-mcp"],
"env": {
"EMBEDDING_MODELS": "jina_ai/jina-embeddings-v5-text-small,gemini/gemini-embedding-001",
"RERANK_MODELS": "jina_ai/jina-reranker-v3",
"LLM_MODELS": "gemini/gemini-3-flash-preview",
"JINA_AI_API_KEY": "jina_xxx",
"GEMINI_API_KEY": "AIza_xxx"
}
}
}
}
Comparison vs. peers
| Feature | mnemo-mcp | Mem0 | Letta | OpenMemory |
|---|---|---|---|---|
| Hybrid retrieval (FTS + vec) | yes (FTS5 + sqlite-vec + RRF) | yes | partial | yes |
| Cross-encoder rerank chain | yes (qwen3 local + Jina + Cohere) | partial (Cohere only) | no | no |
| Temporal decay scoring | yes (exp half-life) | no | no | no |
| Importance boost in rank | yes (LLM 0.0-1.0) | no | no | no |
| Soft-archive + restore policy | yes (importance x recency) | no | no | no |
| Self-hostable (single SQLite file) | yes (zero ext deps) | partial (cloud-first) | yes (Postgres) | yes (Postgres + Qdrant) |
| Multi-provider LLM dispatch | yes (LLM_MODELS chain, any litellm provider) |
partial | yes | partial |
| Plugin trinity (skills + hooks) | yes (recall-context + memory-commit) | n/a | n/a | n/a |
| Multi-machine sync | yes (GDrive bundled OAuth) | yes (cloud) | n/a | n/a |
| E2E-encrypted passport sync | yes (AES-256-GCM + Argon2id, S3 + GDrive) | no | no | no |
| LLM compression on capture | yes (multi-provider, ~3x at >=0.90 retention) | no | no | no |
| Backend-pluggable sync architecture | yes (S3 / R2 / B2 / MinIO + GDrive) | no | no | no |
Bitemporal valid_from / valid_to queries |
yes (as_of time-travel) |
no | partial (events only) | no |
| Entity resolution via embedding KNN | yes (cosine threshold tunable) | no | no | no |
| Audit trail with state hashes | yes (memory_audit table) |
no | no | no |
Status
2026-05-02 -- Architecture stabilization update
Past months saw significant churn around credential handling and the daemon-bridge auto-spawn pattern. This caused multi-process races, browser tab spam, and inconsistent setup UX across plugins. The architecture is now stable: 2 clean modes (stdio + HTTP), no daemon-bridge layer, no auto-spawn from stdio.
Apologies for the instability period. If you encountered issues with prior versions, please update to the latest release and follow the current setup docs -- most prior workarounds are no longer needed.
Related plugins from the same author:
- wet-mcp -- Web search + content extraction
- imagine-mcp -- Image/video understanding + generation
- better-notion-mcp -- Notion API
- better-email-mcp -- Email management
- better-telegram-mcp -- Telegram
- better-godot-mcp -- Godot Engine
- better-code-review-graph -- Code review knowledge graph
All plugins share the same architecture -- install once, learn pattern transfers.
Documentation
Full docs at mcp.n24q02m.com/servers/mnemo-mcp/setup/:
- Setup -- install methods for Claude Code, Codex, Gemini CLI, Cursor, Windsurf, mcp.json
- Modes overview -- stdio / local-relay / remote-relay / remote-oauth
- Multi-user setup -- per-JWT-sub credential model
In-repo references:
docs/ARCHITECTURE.md-- storage layout, embedding / rerank dispatch, knowledge graph, plugin trinitydocs/compression.md-- LLM compression pipelinedocs/passport.md-- encrypted passport sync (S3 / GDrive backends)docs/BENCHMARKS.md-- retrieval quality + latency metrics
Install with AI agent -- paste this to your AI coding agent:
Install MCP server
mnemo-mcpfollowing the steps at https://raw.githubusercontent.com/n24q02m/claude-plugins/main/plugins/mnemo-mcp/setup-with-agent.md
Tools
15 MCP tools, 17 memory actions. The memory surface is exposed both as 11 specialized single-purpose tools and a legacy memory dispatcher (same actions), plus config, help, and config__open_relay:
| Tool | Actions | Description |
|---|---|---|
add_memory, search_memory, list_memories, update_memory, delete_memory, export_memories, import_memories, memory_stats, restore_memory, archived_memories, consolidate_memories |
(one action each) | Specialized single-purpose memory tools -- the recommended surface |
memory (legacy dispatcher) |
add, capture, search, list, update, delete, export, import, stats, restore, archived, archive_now, consolidate, compress, entity_search, entity_graph, history |
Core CRUD + typed capture (6 context_types) + hybrid search (RRF + rerank + temporal decay) + import/export + soft-archive + restore + on-demand archive sweep + LLM consolidation + LLM compression + temporal KG (entity search / graph / history) |
config |
status, sync, set, warmup, setup_sync, setup_status, setup_start, setup_skip, setup_reset, setup_complete, setup_relay, sync_now, export_passport, import_passport |
Server status, trigger sync, update settings, pre-download embedding model, authenticate sync provider, manage HTTP setup form lifecycle, passport export/import |
help |
topic="memory" or topic="config" |
Full documentation for any tool |
config__open_relay |
(HTTP relay mode) | Open the zero-config relay setup form (registered via mcp-core) |
Plugin trinity (Claude Code marketplace install):
| Component | Trigger | Purpose |
|---|---|---|
mnemo:recall-context skill |
session start, before significant decisions, "what do I know about X?" | Pulls cwd / topic-relevant memories with context_type filtering |
mnemo:memory-commit skill |
"remember this" / "save this" / "ghi nho" / "luu lai" | Typed manual capture with context_type decision tree |
mnemo:knowledge-audit skill |
periodic / "audit memory" | Find duplicates, contradictions, stale entries; consolidate |
mnemo:session-handoff skill |
end of session | Capture decisions / preferences / corrections / conventions / open questions |
| SessionStart hook | every session init | Non-blocking nudge to invoke recall-context |
| PostToolUse hook (opt-in) | CAPTURE_AUTO_ENABLED=true |
Hint memory-commit after Write/Edit of CLAUDE.md / AGENTS.md / ARCHITECTURE.md / docs/*.md |
MCP Resources
| URI | Description |
|---|---|
mnemo://stats |
Database statistics and server status |
MCP Prompts
| Prompt | Parameters | Description |
|---|---|---|
save_summary |
summary |
Generate prompt to save a conversation summary as memory |
recall_context |
topic |
Generate prompt to recall relevant memories about a topic |
Security
- Graceful fallbacks -- Cloud → Local embedding, no cross-mode fallback
- Sync token security -- OAuth tokens stored at
~/.mnemo-mcp/tokens/with 600 permissions - Input validation -- Sync provider, folder, remote validated against allowlists
- Error sanitization -- No credentials in error messages
Build from Source
git clone https://github.com/n24q02m/mnemo-mcp.git
cd mnemo-mcp
uv sync
uv run mnemo-mcp
Trust Model
This plugin implements TC-Local (machine-bound, single trust principal). The mode/storage/encryption breakdown below is the full classification.
| Mode | Credentials | Memory data | Who can read your data? |
|---|---|---|---|
| stdio (default) | Read from environment variables (no credential file written) | Local SQLite at ~/.mnemo-mcp/memories.db |
Only your OS user |
| HTTP self-host (single user) | Encrypted config.enc under ~/.mnemo-mcp/ |
Local SQLite (same host) | Only you (admin = user) |
HTTP multi-user remote (PUBLIC_URL) |
Per-JWT-sub store at subs/<sub>/config.json |
Per-sub isolated rows |
Only the authenticated user (per-sub isolation) |
Passport sync bundles are always end-to-end encrypted (AES-256-GCM + Argon2id); backends never see plaintext.
License
MIT -- See LICENSE.
推荐服务器
Baidu Map
百度地图核心API现已全面兼容MCP协议,是国内首家兼容MCP协议的地图服务商。
Playwright MCP Server
一个模型上下文协议服务器,它使大型语言模型能够通过结构化的可访问性快照与网页进行交互,而无需视觉模型或屏幕截图。
Audiense Insights MCP Server
通过模型上下文协议启用与 Audiense Insights 账户的交互,从而促进营销洞察和受众数据的提取和分析,包括人口统计信息、行为和影响者互动。
Magic Component Platform (MCP)
一个由人工智能驱动的工具,可以从自然语言描述生成现代化的用户界面组件,并与流行的集成开发环境(IDE)集成,从而简化用户界面开发流程。
VeyraX
一个单一的 MCP 工具,连接你所有喜爱的工具:Gmail、日历以及其他 40 多个工具。
Kagi MCP Server
一个 MCP 服务器,集成了 Kagi 搜索功能和 Claude AI,使 Claude 能够在回答需要最新信息的问题时执行实时网络搜索。
graphlit-mcp-server
模型上下文协议 (MCP) 服务器实现了 MCP 客户端与 Graphlit 服务之间的集成。 除了网络爬取之外,还可以将任何内容(从 Slack 到 Gmail 再到播客订阅源)导入到 Graphlit 项目中,然后从 MCP 客户端检索相关内容。
e2b-mcp-server
使用 MCP 通过 e2b 运行代码。
Neon MCP Server
用于与 Neon 管理 API 和数据库交互的 MCP 服务器
Exa MCP Server
模型上下文协议(MCP)服务器允许像 Claude 这样的 AI 助手使用 Exa AI 搜索 API 进行网络搜索。这种设置允许 AI 模型以安全和受控的方式获取实时的网络信息。