memory-mcp
A self-organizing, persistent semantic memory layer that enables AI agents to store, categorize, and retrieve information using hybrid vector and keyword search. It features autonomous chunking, deduplication, and hierarchical taxonomy management through a PostgreSQL-backed MCP server.
Persistent, self-organizing semantic memory for AI agents — served as an MCP server.
What is this?
memory-mcp is a Model Context Protocol server that gives AI agents durable, searchable memory backed by PostgreSQL and pgvector. Drop it into any MCP-compatible client (Claude Code, Cursor, Windsurf, etc.) and your agent gains the ability to remember, retrieve, and reason over information across sessions — without you managing any schema or storage logic.
What it does autonomously:
- Chunks and embeds incoming text
- Categorizes memories into a hierarchical taxonomy (`ltree` dot-paths)
- Deduplicates against existing memories and resolves conflicts
- Synthesizes a System Primer — a compressed, always-current summary of everything it knows — and surfaces it at session start
- Expires stale memories via TTL and prompts for verification of aging facts
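The last two bullets describe two separate clocks on a stored memory: an optional hard expiry (from `ttl_days`) and a softer "verify after" date that turns a fact into an aging one. The sketch below models that lifecycle in plain Python. It is not the server's code: the `Memory` fields, the helper names, and the 90-day `VERIFY_INTERVAL` are all hypothetical, because the README does not document the verification interval.

```python
from dataclasses import dataclass, replace
from datetime import datetime, timedelta, timezone
from typing import Optional

# Hypothetical: the README does not say how far verify_after is pushed.
VERIFY_INTERVAL = timedelta(days=90)


@dataclass(frozen=True)
class Memory:
    content: str
    created_at: datetime
    expires_at: Optional[datetime]  # set only when ttl_days was supplied
    verify_after: datetime          # once passed, the fact counts as "aging"


def new_memory(content: str, now: datetime, ttl_days: Optional[int] = None) -> Memory:
    expires = now + timedelta(days=ttl_days) if ttl_days is not None else None
    return Memory(content, now, expires, now + VERIFY_INTERVAL)


def status(m: Memory, now: datetime) -> str:
    if m.expires_at is not None and now >= m.expires_at:
        return "expired"             # eligible for removal by a TTL daemon
    if now >= m.verify_after:
        return "needs_verification"  # would be surfaced as a verification prompt
    return "fresh"


def confirm(m: Memory, now: datetime) -> Memory:
    # Analogue of confirm_memory_validity: restart the verification clock from now.
    return replace(m, verify_after=now + VERIFY_INTERVAL)


t0 = datetime(2025, 1, 1, tzinfo=timezone.utc)
m = new_memory("Prefers dark mode", t0)
print(status(m, t0 + timedelta(days=100)))  # needs_verification
confirmed = confirm(m, t0 + timedelta(days=100))
print(status(confirmed, t0 + timedelta(days=101)))  # fresh
```

Keeping expiry and verification separate means a confirmed fact can live indefinitely while still being re-checked periodically.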
Why memory-mcp?
| | memory-mcp | Simple vector DB | LangChain / LlamaIndex memory |
|---|---|---|---|
| Schema management | Automatic | Manual | Manual |
| Deduplication | Semantic + LLM | None | None |
| Taxonomy | Auto-assigned ltree | None | None |
| Session bootstrap | System Primer | Manual RAG | Manual |
| Conflict resolution | LLM-evaluated | None | None |
| Ephemeral context | Built-in (TTL store) | No | No |
| Self-hostable | Yes (Docker) | Varies | No |
| MCP-native | Yes | No | No |
Architecture
AI Agent (Claude Code / Cursor / Windsurf)
│ HTTP (MCP — Streamable HTTP)
▼
┌──────────────────────────────────────────┐
│ server.py │
│ ┌─────────────────┐ ┌─────────────────┐ │
│ │ Production MCP │ │ Admin MCP │ │
│ │ :8766/mcp │ │ :8767/mcp │ │
│ └────────┬────────┘ └────────┬────────┘ │
│ │ tools/ │ │
│ ┌────────▼──────────────────▼────────┐ │
│ │ ingestion · search · context │ │
│ │ crud · admin_tools · context_store│ │
│ └────────────────┬───────────────────┘ │
│ │ │
│ ┌────────────────▼───────────────────┐ │
│ │ Background Workers │ │
│ │ Ingestion Queue · TTL Daemon │ │
│ │ System Primer Auto-Regeneration │ │
│ └────────────────┬───────────────────┘ │
└───────────────────┼──────────────────────┘
│ asyncpg
▼
PostgreSQL + pgvector
┌─────────────────┐
│ memories │ chunks, embeddings, ltree paths
│ memory_edges │ sequence_next, relates_to, supersedes
│ ingestion_staging│ async job queue
│ context_store │ ephemeral TTL store
└─────────────────┘
│
┌──────────▼──────────┐
│ Backup Service │ pg_dump → private GitHub repo
└─────────────────────┘
Two servers, one process:
- Production (`:8766`): tools that are safe for the agent to call freely
- Admin (`:8767`): a superset of the production tools that adds destructive operations (delete, prune, bulk-move)

Point your agent at the production server; use the admin server for maintenance.
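The `memory_edges` table in the diagram stores typed links between chunks. The sketch below shows how a `sequence_next` chain could be walked to rebuild a document, which is what `fetch_document` is described as doing. It uses plain dicts instead of PostgreSQL, and the hypothetical `reconstruct_document` first rewinds to the first chunk before walking forward; that rewind is an assumption, since the README only says the walk starts from a memory ID.

```python
from typing import Dict, Optional


def reconstruct_document(start_id: str, next_of: Dict[str, str], content: Dict[str, str]) -> str:
    """next_of maps a chunk ID to the ID its sequence_next edge points at."""
    prev_of = {dst: src for src, dst in next_of.items()}

    # Rewind to the head of the chain so any chunk ID can serve as the entry point.
    head, seen = start_id, {start_id}
    while head in prev_of and prev_of[head] not in seen:
        head = prev_of[head]
        seen.add(head)

    # Walk forward, stopping if malformed edge data contains a cycle.
    parts = []
    node: Optional[str] = head
    visited = set()
    while node is not None and node not in visited:
        visited.add(node)
        parts.append(content[node])
        node = next_of.get(node)
    return "\n".join(parts)


edges = {"a": "b", "b": "c"}
chunks = {"a": "Intro", "b": "Design", "c": "Decisions"}
print(reconstruct_document("b", edges, chunks))
```

The cycle guards matter because edges are data: a bad `supersedes` or `sequence_next` write should not hang a read path.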
Quickstart (Docker)
Prerequisites: Docker + Docker Compose, an OpenAI API key.
# 1. Clone
git clone https://github.com/isaacriehm/memory-mcp.git
cd memory-mcp
# 2. Configure
cp .env.example .env
$EDITOR .env # set OPENAI_API_KEY and DB_PASSWORD at minimum
# 3. Start
docker compose up -d
# Production MCP endpoint: http://localhost:8766/mcp
# Admin MCP endpoint: http://localhost:8767/mcp
To rebuild after code changes:
docker compose up -d --build memory-api
Connecting to an MCP Client
Claude Code
Add to your project's .claude/settings.json or ~/.claude/settings.json:
{
"mcpServers": {
"memory": {
"type": "http",
"url": "http://localhost:8766/mcp"
}
}
}
Or via the CLI:
claude mcp add memory --transport http http://localhost:8766/mcp
Then add this instruction to your CLAUDE.md so the agent always bootstraps memory at session start:
## Memory
At the start of every session, call `initialize_context` before anything else.
This returns your System Primer — your identity, current knowledge taxonomy, and retrieval guide.
Always consult it before answering questions about prior context.
Cursor / Windsurf
Add to your MCP settings (.cursor/mcp.json or equivalent):
{
"mcpServers": {
"memory": {
"url": "http://localhost:8766/mcp"
}
}
}
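MCP clients handle the wire protocol for you, but when debugging with a plain HTTP client it helps to know what goes over the wire. MCP messages are JSON-RPC 2.0 objects POSTed to the endpoint (here `http://localhost:8766/mcp`); the client sends `initialize` first, then a `notifications/initialized` notification, and echoes any `Mcp-Session-Id` header the server returned. The sketch only builds the messages and headers. The `query` argument name for `search_memory` is hypothetical; `category_path` is the filter documented in the tools table.

```python
import json
from itertools import count
from typing import Any, Dict, Optional

_ids = count(1)  # JSON-RPC request IDs must be unique per session


def rpc(method: str, params: Optional[Dict[str, Any]] = None) -> Dict[str, Any]:
    """Build a JSON-RPC 2.0 request object as used by MCP."""
    msg: Dict[str, Any] = {"jsonrpc": "2.0", "id": next(_ids), "method": method}
    if params is not None:
        msg["params"] = params
    return msg


def headers(session_id: Optional[str] = None) -> Dict[str, str]:
    # Streamable HTTP clients must accept both plain JSON and SSE responses.
    h = {"Content-Type": "application/json",
         "Accept": "application/json, text/event-stream"}
    if session_id:
        h["Mcp-Session-Id"] = session_id  # issued by the server on initialize
    return h


init = rpc("initialize", {
    "protocolVersion": "2025-03-26",
    "capabilities": {},
    "clientInfo": {"name": "readme-example", "version": "0.1"},
})
# "query" is a hypothetical argument name; category_path is documented.
call = rpc("tools/call", {
    "name": "search_memory",
    "arguments": {"query": "database decisions", "category_path": "projects.myapp"},
})
print(json.dumps(call, indent=2))
```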
MCP Tools
Production Tools (:8766)
| Tool | Description |
|---|---|
| `initialize_context` | Call first every session. Returns the System Primer + verification prompts for aging memories. |
| `memorize_context` | Ingest raw text. Automatically chunks, embeds, categorizes, and deduplicates. Supports `ttl_days`. |
| `check_ingestion_status` | Poll an async ingestion job by `job_id`. Returns `pending`, `processing`, `complete`, or `failed`. |
| `search_memory` | Hybrid vector + BM25 search with Reciprocal Rank Fusion. Filter by `category_path`. |
| `list_categories` | Return all occupied taxonomy paths with memory counts. |
| `explore_taxonomy` | Drill into a collapsed `[+N more]` branch from `list_categories`. |
| `fetch_document` | Reconstruct a full document by following `sequence_next` edges from a memory ID. |
| `trace_history` | Inspect the full supersession chain (oldest → newest) for a memory. |
| `confirm_memory_validity` | Confirm an aging memory is still accurate. Advances its `verify_after` date. |
| `update_memory` | Rewrite a memory's content in place (preserves identity, edges, and history). |
| `set_context` | Write a key/value pair to the ephemeral context store with a TTL. |
| `get_context` | Retrieve an ephemeral context entry by key. |
| `list_context_keys` | List active (non-expired) context keys, optionally filtered by scope. |
| `delete_context` | Explicitly delete a context entry before its TTL expires. |
| `extend_context_ttl` | Push a context entry's expiry forward by N hours. |
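`search_memory` merges a vector ranking and a keyword ranking with Reciprocal Rank Fusion (RRF). RRF scores each result as the sum of `1 / (k + rank)` over the lists it appears in, so it needs only ranks and never has to normalize cosine distances against text-search scores. Below is a minimal sketch of the technique; `k = 60` is the constant commonly used since the original RRF paper and is an assumption here, as the README does not state the server's value.

```python
from collections import defaultdict
from typing import Dict, List, Sequence


def reciprocal_rank_fusion(rankings: Sequence[Sequence[str]], k: int = 60) -> List[str]:
    """Fuse ranked ID lists: score(d) = sum over lists of 1 / (k + rank), rank from 1."""
    scores: Dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by ID for a deterministic order.
    return sorted(scores, key=lambda d: (-scores[d], d))


vector_hits = ["m7", "m2", "m9"]   # nearest neighbours by embedding distance
keyword_hits = ["m2", "m4", "m7"]  # best full-text matches
print(reciprocal_rank_fusion([vector_hits, keyword_hits]))  # ['m2', 'm7', 'm4', 'm9']
```

`m2` wins because it ranks well in both lists; a result found by only one retriever (`m4`, `m9`) still survives, just lower down.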
Admin-Only Tools (:8767)
| Tool | Description |
|---|---|
| `delete_memory` | Hard-delete a memory by ID (cascades to its edges). |
| `prune_history` | Batch-delete superseded memories older than N days. |
| `export_memories` | Export all active memories to JSON. |
| `recategorize_memory` | Move a single memory to a new taxonomy path. |
| `bulk_move_category` | Move an entire taxonomy branch (e.g. `old.prefix` → `new.prefix`). |
| `update_memory_metadata` | Patch a memory's `metadata` JSONB in place. |
| `run_diagnostics` | Report on pool health, memory counts, and ingestion queue depth. |
| `get_ingestion_stats` | Break down ingestion jobs by status. |
| `flush_staging` | Clear all completed/failed staging jobs immediately. |
Taxonomy
Memories are organized into a dot-path hierarchy using PostgreSQL `ltree`. The system assigns paths automatically during ingestion; you can override them with the admin tools `recategorize_memory` and `bulk_move_category`.
Example paths:
user.profile.personal
user.health.medical
projects.myapp.architecture
projects.myapp.decisions
organizations.acme.business
concepts.ai.behavior
reference.system.primer ← auto-generated System Primer lives here
Search is subtree-aware — passing category_path: "projects.myapp" returns everything under that branch.
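In PostgreSQL, subtree filtering on an `ltree` column is usually written with the `<@` operator ("is descendant of or equal to"), e.g. `WHERE path <@ 'projects.myapp'`; whether the server uses exactly that query is an assumption. The pure-Python function below (a hypothetical helper, not part of the server) mirrors the operator's semantics for plain dot-paths, which is handy for reasoning about what a `category_path` filter will match.

```python
def is_descendant_or_equal(path: str, ancestor: str) -> bool:
    """Pure-Python analogue of ltree's `path <@ ancestor` for plain dot-paths."""
    p, a = path.split("."), ancestor.split(".")
    # Compare whole labels, so "projects.myapp2" is NOT under "projects.myapp".
    return p[:len(a)] == a


paths = ["projects.myapp.architecture", "projects.myapp.decisions",
         "projects.myapp2.notes", "user.profile.personal"]
print([p for p in paths if is_descendant_or_equal(p, "projects.myapp")])
# ['projects.myapp.architecture', 'projects.myapp.decisions']
```

Matching label by label rather than by string prefix is the key difference from a naive `LIKE 'projects.myapp%'` filter.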
System Primer
initialize_context returns a synthesized summary stored at reference.system.primer. It includes:
- A compressed user/agent profile
- The full taxonomy tree with memory counts
- Retrieval guidance
The primer auto-regenerates in the background when ≥10 new memories are ingested or when the previous primer is older than 1 hour. You can force regeneration via the admin tool synthesize_system_primer.
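The regeneration rule above combines a count trigger and an age trigger. The sketch below encodes it as a small predicate using the documented defaults (10 memories, `PRIMER_UPDATE_MAX_AGE_S = 3600`). It reads the two conditions literally as independent triggers; the real background worker may differ, for example by skipping regeneration when nothing new has been ingested. `PrimerState` and both helpers are hypothetical names.

```python
from dataclasses import dataclass

PRIMER_UPDATE_MAX_AGE_S = 3600  # documented default
NEW_MEMORY_TRIGGER = 10         # "≥10 new memories" from the text above


@dataclass
class PrimerState:
    new_since_primer: int = 0   # memories ingested since the last synthesis
    generated_at: float = 0.0   # epoch seconds of the last synthesis


def should_regenerate(state: PrimerState, now: float) -> bool:
    # Literal reading: either condition alone triggers a rebuild.
    return (state.new_since_primer >= NEW_MEMORY_TRIGGER
            or now - state.generated_at > PRIMER_UPDATE_MAX_AGE_S)


def mark_regenerated(state: PrimerState, now: float) -> None:
    state.new_since_primer = 0
    state.generated_at = now


state = PrimerState(new_since_primer=3, generated_at=1_000.0)
print(should_regenerate(state, 1_000.0 + 600))  # False: 3 new, 10 minutes old
state.new_since_primer = 10
print(should_regenerate(state, 1_000.0 + 600))  # True: count threshold reached
```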
Environment Variables
Copy .env.example to .env and fill in your values.
Required
| Variable | Description |
|---|---|
| `DATABASE_URL` | PostgreSQL connection string (e.g. `postgresql://user:pass@localhost:5432/memory`) |
| `OPENAI_API_KEY` | OpenAI API key for embeddings and LLM calls |
| `DB_PASSWORD` | PostgreSQL password (used by Docker Compose) |
Optional — Models & Embeddings
| Variable | Default | Description |
|---|---|---|
| `EMBEDDING_MODEL` | `text-embedding-3-small` | OpenAI embedding model |
| `EXTRACT_MODEL` | `gpt-5-mini` | LLM for semantic section extraction and categorization |
| `CONFLICT_MODEL` | `gpt-5-nano` | LLM for conflict/dedup evaluation |
| `EMBED_DIM` | `1536` | Embedding vector dimension (must match the embedding model) |
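Because `EMBED_DIM` must match `EMBEDDING_MODEL`, a startup check can catch a mismatched pair before any vectors are written. The sketch below, with the hypothetical helper `embedding_settings`, validates against the native output sizes of OpenAI's embedding models. The `text-embedding-3` models can also return shortened vectors via the embeddings API's `dimensions` parameter; this check deliberately ignores that case, and whether the server uses it is not documented.

```python
import os
from typing import Mapping, Tuple

# Native output sizes of OpenAI embedding models.
NATIVE_DIMS = {
    "text-embedding-3-small": 1536,
    "text-embedding-3-large": 3072,
    "text-embedding-ada-002": 1536,
}


def embedding_settings(env: Mapping[str, str] = os.environ) -> Tuple[str, int]:
    """Read the model and dimension with the documented defaults, then cross-check them."""
    model = env.get("EMBEDDING_MODEL", "text-embedding-3-small")
    dim = int(env.get("EMBED_DIM", "1536"))
    expected = NATIVE_DIMS.get(model)
    if expected is not None and dim != expected:
        raise ValueError(f"EMBED_DIM={dim} does not match {model} (native size {expected})")
    return model, dim


print(embedding_settings({"EMBEDDING_MODEL": "text-embedding-3-large", "EMBED_DIM": "3072"}))
```

Unknown model names pass through unchecked, so a new model does not block startup.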
Optional — Search & Limits
| Variable | Default | Description |
|---|---|---|
| `DEFAULT_SEARCH_LIMIT` | `10` | Default result count for `search_memory` |
| `DEFAULT_LIST_LIMIT` | `50` | Default result count for `list_categories` |
| `DUP_THRESHOLD` | `0.95` | Cosine similarity threshold for deduplication |
| `CONFLICT_THRESHOLD` | `0.55` | Similarity threshold for conflict detection |
| `RELATES_TO_THRESHOLD` | `0.65` | Similarity threshold for `relates_to` edge creation |
| `MIN_SECTION_LENGTH` | `100` | Minimum character length for a chunk to be stored |
| `MAX_TAXONOMY_PATHS` | `40` | Max taxonomy paths assigned per ingestion |
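The three similarity thresholds gate different actions on a new chunk compared with an existing memory. The sketch below computes cosine similarity and reports which documented thresholds a pair clears. With pgvector, the same number is `1 - (a <=> b)`, since `<=>` is cosine distance. How the server orders or combines these checks (and whether it uses `>=` or `>`) is not documented, so `thresholds_crossed` is a hypothetical helper that deliberately models no precedence.

```python
import math
from typing import List, Sequence

# Documented defaults from the table above.
DUP_THRESHOLD = 0.95
CONFLICT_THRESHOLD = 0.55
RELATES_TO_THRESHOLD = 0.65


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def thresholds_crossed(similarity: float) -> List[str]:
    """Which documented thresholds a candidate pair clears (precedence not modelled)."""
    crossed = []
    if similarity >= CONFLICT_THRESHOLD:
        crossed.append("conflict_check")  # candidate for LLM conflict evaluation
    if similarity >= RELATES_TO_THRESHOLD:
        crossed.append("relates_to")      # candidate for a relates_to edge
    if similarity >= DUP_THRESHOLD:
        crossed.append("duplicate")       # near-identical: dedup candidate
    return crossed


print(thresholds_crossed(cosine_similarity([0.9, 0.1, 0.4], [0.8, 0.2, 0.5])))
```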
Optional — OpenAI & Concurrency
| Variable | Default | Description |
|---|---|---|
| `OPENAI_TIMEOUT_S` | `60` | Per-request OpenAI timeout in seconds |
| `OPENAI_MAX_RETRIES` | `5` | Retry limit for exponential backoff |
| `MAX_CONCURRENT_API_CALLS` | `5` | Semaphore size for parallel OpenAI requests |
| `EXTRACT_REASONING` | `low` | Reasoning effort for the extraction LLM |
| `CONFLICT_REASONING` | `minimal` | Reasoning effort for the conflict LLM |
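`MAX_CONCURRENT_API_CALLS` and `OPENAI_MAX_RETRIES` describe a common pattern: a semaphore caps in-flight requests, and failed calls are retried with exponential backoff. Below is a minimal asyncio sketch of that pattern, assuming the retry limit counts retries after the first attempt and using a hypothetical 0.5 s base delay with full jitter. The server may instead pass these settings to the official `openai` SDK, whose client accepts `max_retries` and `timeout` options.

```python
import asyncio
import random
from typing import Awaitable, Callable, TypeVar

T = TypeVar("T")

MAX_CONCURRENT_API_CALLS = 5  # documented default
OPENAI_MAX_RETRIES = 5        # documented default


async def call_with_limits(
    sem: asyncio.Semaphore,
    fn: Callable[[], Awaitable[T]],
    max_retries: int = OPENAI_MAX_RETRIES,
    base_delay: float = 0.5,  # hypothetical; not documented
) -> T:
    async with sem:  # at most sem's initial value calls run at once
        for attempt in range(max_retries + 1):
            try:
                return await fn()
            except Exception:  # real code would catch only rate-limit/timeout errors
                if attempt == max_retries:
                    raise
                # Full jitter: sleep a random 0..base_delay * 2**attempt seconds.
                await asyncio.sleep(random.uniform(0, base_delay * 2 ** attempt))
    raise AssertionError("unreachable")


async def main() -> None:
    sem = asyncio.Semaphore(MAX_CONCURRENT_API_CALLS)

    async def fake_embedding_call() -> int:
        await asyncio.sleep(0.01)  # stands in for an OpenAI request
        return 1536

    results = await asyncio.gather(*(call_with_limits(sem, fake_embedding_call) for _ in range(8)))
    print(results)


asyncio.run(main())
```

This version holds its semaphore slot while backing off, which keeps total pressure on the API bounded; releasing the slot during the sleep would raise throughput at the cost of more concurrent retries.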
Optional — Database
| Variable | Default | Description |
|---|---|---|
| `PG_POOL_MIN` | `1` | asyncpg minimum pool connections |
| `PG_POOL_MAX` | `10` | asyncpg maximum pool connections |
| `STAGING_RETENTION_DAYS` | `7` | Days to retain completed/failed staging jobs |
Optional — Server
| Variable | Default | Description |
|---|---|---|
| `PRODUCTION_PORT` | `8766` | Production MCP server port |
| `ADMIN_PORT` | `8767` | Admin MCP server port |
| `MCP_TRANSPORT` | `streamable-http` | FastMCP transport mode |
| `FASTMCP_JSON_RESPONSE` | (unset) | Set to `1` to force JSON responses |
| `LOG_LEVEL` | `INFO` | Log level: `DEBUG`, `INFO`, or `WARNING` |
Optional — System Primer
| Variable | Default | Description |
|---|---|---|
| `PRIMER_UPDATE_MAX_AGE_S` | `3600` | Max primer age in seconds before automatic regeneration |
Optional — Context Store
| Variable | Default | Description |
|---|---|---|
| `CONTEXT_DEFAULT_TTL_HOURS` | `24` | Default TTL (hours) for context store entries |
| `CONTEXT_MAX_VALUE_LENGTH` | `50000` | Max character length for context values |
| `CONTEXT_MAX_KEY_LENGTH` | `200` | Max character length for context keys |
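The context store tools (`set_context`, `get_context`, `list_context_keys`, `delete_context`, `extend_context_ttl`) and the limits above fit a simple TTL key/value model. The sketch below is an in-memory stand-in for the `context_store` table, using the documented defaults and an injectable clock so expiry can be tested. Treating `scope` as a separate field, extending TTL from the current expiry, and expiring entries lazily on access are all assumptions; the real server stores entries in PostgreSQL and uses a TTL daemon.

```python
import time
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional

CONTEXT_DEFAULT_TTL_HOURS = 24
CONTEXT_MAX_VALUE_LENGTH = 50_000
CONTEXT_MAX_KEY_LENGTH = 200


@dataclass
class Entry:
    value: str
    scope: Optional[str]  # hypothetical: scope semantics are not documented
    expires_at: float     # epoch seconds


class ContextStore:
    """In-memory stand-in for the context_store table; expiry is enforced on access."""

    def __init__(self, clock: Callable[[], float] = time.time) -> None:
        self._clock = clock
        self._data: Dict[str, Entry] = {}

    def set(self, key: str, value: str, ttl_hours: float = CONTEXT_DEFAULT_TTL_HOURS,
            scope: Optional[str] = None) -> None:
        if len(key) > CONTEXT_MAX_KEY_LENGTH or len(value) > CONTEXT_MAX_VALUE_LENGTH:
            raise ValueError("key or value exceeds the configured maximum length")
        self._data[key] = Entry(value, scope, self._clock() + ttl_hours * 3600)

    def _live(self, key: str) -> Optional[Entry]:
        entry = self._data.get(key)
        if entry is not None and entry.expires_at <= self._clock():
            del self._data[key]  # lazily drop expired entries
            return None
        return entry

    def get(self, key: str) -> Optional[str]:
        entry = self._live(key)
        return entry.value if entry is not None else None

    def list_keys(self, scope: Optional[str] = None) -> List[str]:
        live = [k for k in list(self._data) if self._live(k) is not None]
        return sorted(k for k in live if scope is None or self._data[k].scope == scope)

    def delete(self, key: str) -> bool:
        return self._data.pop(key, None) is not None

    def extend_ttl(self, key: str, hours: float) -> bool:
        entry = self._live(key)
        if entry is None:
            return False
        entry.expires_at += hours * 3600  # push the existing expiry forward
        return True
```

Usage: `store = ContextStore(); store.set("current_task", "refactor auth", ttl_hours=2)`, then `store.get("current_task")` returns the value until two hours have passed.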
Optional — Backup Service
| Variable | Description |
|---|---|
| `GITHUB_PAT` | GitHub Personal Access Token with `repo` scope |
| `GITHUB_BACKUP_REPO` | Target repo in `owner/repo` format |
| `BACKUP_INTERVAL_SECONDS` | Seconds between backups (default: `21600`, i.e. 6 hours) |
Running Locally (Development)
Requirements: Python 3.11+, PostgreSQL with pgvector.
# Create and activate virtual environment
python3.11 -m venv .venv
source .venv/bin/activate
# Install dependencies
pip install -r requirements.txt
# Configure
cp .env.example .env
$EDITOR .env
# Start the server
python -m server
# Production MCP endpoint: http://localhost:8766/mcp (listening on 0.0.0.0:8766)
# Admin MCP endpoint: http://localhost:8767/mcp (listening on 0.0.0.0:8767)
Backup Service
The backup/ directory contains a containerized PostgreSQL backup job that:
- Runs `pg_dump` on the configured interval (default: every 6 hours)
- Commits the dump to a private GitHub repository
The backup service starts automatically with docker compose up. Set GITHUB_PAT and GITHUB_BACKUP_REPO in your .env to enable it. If those variables are unset, the service will error on startup — remove the memory-backup service from docker-compose.yml if you don't need backups.
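One backup cycle as described above amounts to a `pg_dump` followed by a git commit and push. The sketch below builds those commands without running them. It assumes the target repo is already cloned at `repo_dir` with push credentials configured; the dump file name and all helper names are hypothetical, and the actual `backup/` container may work differently. `pg_dump --dbname` accepts a full `postgresql://` URI, although passing a password that way exposes it in process listings (`PGPASSWORD` or a `.pgpass` file avoids that).

```python
import subprocess
import time
from datetime import datetime, timezone
from pathlib import Path
from typing import List

BACKUP_INTERVAL_SECONDS = 21600  # documented default: 6 hours


def backup_commands(database_url: str, repo_dir: Path, now: datetime) -> List[List[str]]:
    """Build, but do not run, the commands for one backup cycle."""
    dump = repo_dir / "memory.sql"  # hypothetical dump file name
    stamp = now.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")
    return [
        ["pg_dump", f"--dbname={database_url}", "--no-owner", f"--file={dump}"],
        ["git", "-C", str(repo_dir), "add", dump.name],
        ["git", "-C", str(repo_dir), "commit", "-m", f"backup {stamp}"],
        ["git", "-C", str(repo_dir), "push"],
    ]


def run_cycle(commands: List[List[str]]) -> None:
    for cmd in commands:
        subprocess.run(cmd, check=True)  # stop the cycle at the first failure


def backup_forever(database_url: str, repo_dir: Path) -> None:
    while True:
        run_cycle(backup_commands(database_url, repo_dir, datetime.now(timezone.utc)))
        time.sleep(BACKUP_INTERVAL_SECONDS)
```

Overwriting a single dump file lets git store each backup as a diff. A production job would also skip the commit when the dump is unchanged, since `git commit` exits non-zero when there is nothing to commit.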
CLI Scripts
Standalone scripts in scripts/ (require DATABASE_URL in the environment):
# Export all memories to a timestamped JSON file
python scripts/export_memories.py
# Generate an interactive graph visualization
python scripts/visualize_memories.py
open memory_map.html
Contributing
See CONTRIBUTING.md.
License