MCP 服务器

Mnemozine

Mnemozine is a self-hosted unified conversational memory layer that ingests conversations from AI tools, distills them into a temporal knowledge graph, and serves that memory to agents via a single MCP server.

README

Mnemozine

A self-hosted unified conversational memory layer. Mnemozine ingests conversations from every AI tool the operator uses (Claude Code, OpenAI-format agents, Hermes), distills them into a temporal knowledge graph (Graphiti on FalkorDB), and serves that memory to every agent through a single MCP server — proactively at session start and on demand mid-session.

The defining constraint: it consolidates rather than accumulates — retrieval precision stays flat as the store grows, because retrieval is always scoped (current project + global preferences + entity neighborhood) instead of searching the whole graph.

See PRD.md for the full specification and INTERFACES.md for the shared Protocol contracts every module builds against.

What it is

Layer	What it does	Where
Ingestion	Normalize Claude Code JSONL transcripts, OpenAI-format gateway turns, and Hermes turns into one common event schema; strip `tool_calls`; chunk per session into Graphiti episodes; de-dup on `(source, session_id, content-hash)`.	`mnemozine/ingestion/`
Typed extraction	Classify each memory unit as `preference` / `project_fact` / `idea_seed`; extract entities + relationships; record confidence + provenance.	`mnemozine/extract/`
Storage	Graphiti temporal knowledge graph on FalkorDB (graph and vector embeddings in one store); validity windows; scopes (`global`, `project:<id>`); hot/archive tiers.	`mnemozine/storage/`
Retrieval & delivery	One MCP server exposing `recall()` plus session-start / mid-session index tools; scoped retrieval; ~500-token injection budget.	`mnemozine/retrieval/`
Cross-reference	Surface related `idea_seed`/project nodes via shared-entity graph traversal (vector fallback), with explainable reasons.	`mnemozine/crossref/`
Maintenance	Scheduled consolidate / entity-resolve / decay / audit; 4-way dedup-reinforce-supersede-noop write decision.	`mnemozine/maintenance/`
Evals	§9 eval harness + gold-set bootstrap + synthetic distractor generator.	`mnemozine/evals/`

Architecture

[ Conversation sources ]
  Claude Code (JSONL transcripts)   OpenAI-format agents   Hermes
            |                            |                    |
            |                  (LiteLLM gateway + capture callback)
            v                            v                    v
[ 1. Ingestion ]  -- normalize to the common event schema; strip tool_calls --
            |
            v
[ 2. Typed Extraction ]  -- classify preference / project_fact / idea_seed --
            |
            v
[ 3. Storage ]  -- Graphiti temporal KG on FalkorDB (graph + bge-m3 vectors) --
            |
            v
[ 4. Retrieval & Delivery ]  -- single MCP server + Claude Code hooks --
            |
            v
[ 5. Maintenance ]  -- dedup, consolidation, decay, entity resolution (scheduled) --

Stack (PRD §5.5, pinned in pyproject.toml):

Concern	Choice
Graph + vector backend	FalkorDB (single store; no Postgres)
Temporal KG engine	Graphiti — `graphiti-core[falkordb]==0.29.2` (exact pin)
Extraction LLM	Pluggable OpenAI-format `base_url`; default Qwen2.5 served by Ollama (LiteLLM-id `openai/qwen2.5` against Ollama's `/v1` OpenAI endpoint)
Embedding model	bge-m3 via Ollama, self-hosted (1024-d)
Application process	one all-in-one `mnemozine` entrypoint runs MCP + ingest + maintenance + web under one loop
OpenAI-format gateway	LiteLLM proxy + a custom logging callback (optional — the `gateway` compose profile, for capturing OpenAI/Hermes agents)
MCP server	official `mcp` SDK (`FastMCP`)
Maintenance scheduler	APScheduler (or a k8s `CronJob`)
Language / packaging	Python ≥3.11, hatchling, `pydantic-settings` config

The whole system runs end-to-end on local models with no cloud dependency. By default Ollama serves both the bge-m3 embeddings and the Qwen2.5 extraction model, so the default stack is just 3 services (FalkorDB + Ollama + the all-in-one mnemozine); the LiteLLM gateway and a dedicated llama.cpp Qwen server are optional (compose profiles gateway / qwen-llamacpp). The extraction/embedding endpoints are pluggable, so the extraction LLM MAY point at a cloud model later on cost grounds — a one-line base_url/model swap.

Console scripts

Installed by the package (pyproject.toml [project.scripts]):

Script	Purpose
`mnemozine`	the all-in-one process — builds the container once and runs every enabled component (MCP + ingest + maintenance + web) concurrently under one asyncio loop
`mnemozine-mcp`	the single MCP server, standalone (FR-RET-1)
`mnemozine-ingest`	source → chunk → extract → store loop, standalone (FR-ING-*)
`mnemozine-maintenance`	scheduled consolidate/resolve/decay/audit, standalone (FR-MNT-*)
`mnemozine-web`	the WebUI operator console, standalone
`mnemozine-eval`	§9 eval harness + gold-set bootstrap
`mnemozine-hook-session-start`	Claude Code `SessionStart` hook (FR-RET-3)
`mnemozine-hook-user-prompt-submit`	Claude Code `UserPromptSubmit` hook (FR-RET-5)
`mnemozine-hook-stop`	Claude Code `Stop` hook — flush session (FR-ING-6)
`mnemozine-hook-pre-compact`	Claude Code `PreCompact` hook — flush before compaction (FR-ING-6)

All service workloads share one container image and differ only in the command they run.

The all-in-one `mnemozine` entrypoint and the component toggles

mnemozine (= mnemozine.app:run_all) builds the Container once and runs every enabled component concurrently under a single asyncio loop, with graceful shutdown on SIGINT/SIGTERM (so compose's default SIGTERM/stop_signal just works). This is what collapses the stack to ~3 containers — the mnemozine app, FalkorDB, and Ollama.

Four boolean toggles (prefix MNEMOZINE_, nested delimiter __) select which components run; all default true and are listed in .env.example:

Variable	Default	Component
`MNEMOZINE_RUN__MCP`	`true`	the MCP server (the `recall` tool + index tools)
`MNEMOZINE_RUN__INGEST`	`true`	the ingest loop (source → chunk → extract → store)
`MNEMOZINE_RUN__MAINTENANCE`	`true`	the maintenance scheduler
`MNEMOZINE_RUN__WEB`	`true`	the FastAPI WebUI / `/api`

A disabled component is never created, so mnemozine is a no-op-safe superset of every standalone script — running it with only one toggle on is exactly equivalent to that component's standalone script (e.g. only RUN__INGEST=true == mnemozine-ingest). Use a standalone script to split one component onto another machine (see Split deployment).

Web + MCP share one port. When MNEMOZINE_RUN__WEB=true and MNEMOZINE_RUN__MCP=true, mnemozine serves the WebUI and the MCP streamable-http transport from a single port — MNEMOZINE_WEB__PORT (default 8765) — by mounting the MCP ASGI app at path /mcp on the web app. So a networked MCP client connects to http://<host>:8765/mcp (streamable-http) and the WebUI/API is at http://<host>:8765/ (API under /api). This resolves the historical web/MCP 8765 clash — the all-in-one default exposes only 8765.

The MCP StreamableHTTP session manager runs under the FastAPI app's lifespan, so uvicorn runs with the lifespan enabled (run_all already does this — nothing to configure).

Fallback (MCP standalone): if RUN__WEB=false but RUN__MCP=true, the MCP server runs standalone on MNEMOZINE_MCP_HOST / MNEMOZINE_MCP_PORT (default 127.0.0.1:8765) at path /mcp — in that case expose MNEMOZINE_MCP_PORT instead of MNEMOZINE_WEB__PORT.

Binds. MNEMOZINE_WEB__HOST and MNEMOZINE_MCP_HOST both default to 127.0.0.1. In a container you typically must set MNEMOZINE_WEB__HOST=0.0.0.0 (and MNEMOZINE_MCP_HOST=0.0.0.0 for the standalone-MCP case) so the port is reachable from outside the container. The WebUI is local-operator only — front it with auth or keep it on a private network, and set MNEMOZINE_WEB__TOKEN to gate /api.

Setup

There are two supported deployment paths, sharing one image definition (deploy/Dockerfile):

docker-compose — local dev / running the eval harness without a cluster.
Helm chart — homelab Kubernetes.

Both are documented in detail in deploy/README.md; the essentials are below.

Path A — bare-metal dev (Python only)

python -m venv .venv && . .venv/bin/activate
pip install -e ".[dev]"
cp .env.example .env       # then edit endpoints/keys
python -c "import mnemozine; print(mnemozine.__version__)"
pytest

This installs the console scripts but assumes you supply FalkorDB, Ollama (bge-m3), and a Qwen/OpenAI-format endpoint yourself (the .env defaults point at localhost). For a turnkey stack, use docker-compose.

Path B — docker-compose (local full stack + eval)

# from the repo root
cp .env.example .env                                   # edit endpoints/keys if needed
docker compose -f deploy/docker-compose.yml up -d --build

The default stack is 3 services, because the all-in-one mnemozine container runs the MCP + ingest + maintenance + web components together (see the component toggles):

Service	Purpose
`falkordb`	single graph + vector store, persisted to named volume `falkordb-data` (`/data`)
`ollama` (+ `ollama-init`)	serves both the bge-m3 embeddings and the qwen extraction model; `ollama-init` pulls them into `ollama-data` on first `up`
`mnemozine`	the all-in-one app — MCP + ingest + maintenance + WebUI, published on `:8765` (WebUI at `/`, `/api`; MCP at `/mcp`); mounts `~/.claude` read-only at `/claude` for the Claude Code watcher

Extraction runs on Ollama alongside embeddings, so neither a separate qwen container nor LiteLLM is needed by default. The recommended extraction env (use verbatim) points the LiteLLM-format client at Ollama's OpenAI-compatible endpoint:

MNEMOZINE_EXTRACTION__BASE_URL=http://ollama:11434/v1   # the /v1 suffix is REQUIRED (Ollama's OpenAI-compatible endpoint)
MNEMOZINE_EXTRACTION__MODEL=openai/qwen2.5             # LiteLLM provider/model form; "qwen2.5" is the Ollama tag (any pulled qwen tag works, e.g. openai/qwen2.5:7b)
MNEMOZINE_EXTRACTION__API_KEY=not-needed
# embeddings stay on Ollama too:
MNEMOZINE_EMBEDDING__BASE_URL=http://ollama:11434
MNEMOZINE_EMBEDDING__MODEL=bge-m3
MNEMOZINE_EMBEDDING__DIMENSIONS=1024

The extraction model id is a LiteLLM id. Against Ollama's OpenAI-compatible /v1 endpoint it must be prefixed openai/ (the config default openai/qwen2.5 already is). The openai/ provider treats /v1 as a plain OpenAI server; do not use the ollama/ provider here — it speaks Ollama's native /api/* surface and would 404 against the /v1 base_url. (To talk to the native Ollama API instead, drop the /v1 suffix and use ollama/qwen2.5.)

Optional profiles

Two backends are kept off the default up and enabled via compose profiles when you want them:

Profile	Brings up	When
`gateway`	`litellm` (OpenAI-format gateway + logging callback, on `:4000`)	to capture OpenAI-format / Hermes agents (FR-ING-3/4) — see Pointing OpenAI-format agents and Hermes at the gateway
`qwen-llamacpp`	`qwen` (a llama.cpp OpenAI-format server, weights in `qwen-models`)	to run extraction on a dedicated llama.cpp server instead of on Ollama

# default 3-service stack:
docker compose -f deploy/docker-compose.yml up -d --build
# add the LiteLLM gateway:
docker compose -f deploy/docker-compose.yml --profile gateway up -d
# run extraction on a dedicated llama.cpp qwen server instead of Ollama:
docker compose -f deploy/docker-compose.yml --profile qwen-llamacpp up -d

Inter-service URLs are set under each service's environment: (which overrides env_file in Compose), so containers reach each other by service name (redis://falkordb:6379, http://ollama:11434, http://ollama:11434/v1 for extraction) even though .env ships localhost defaults for bare-metal dev. Override any of them with the MZ_COMPOSE_* interpolation vars, e.g.:

MZ_COMPOSE_EXTRACTION_URL=https://api.openai.com/v1 \
MZ_COMPOSE_EXTRACTION_MODEL=openai/gpt-4o-mini \
MZ_COMPOSE_EXTRACTION_API_KEY=sk-... \
docker compose -f deploy/docker-compose.yml up -d

Local Qwen on llama.cpp (qwen-llamacpp profile). The qwen service runs a llama.cpp OpenAI-compatible server; drop a GGUF into the qwen-models volume (or bind-mount one) and set QWEN_MODEL to its filename (default qwen2.5-7b-instruct-q4_k_m.gguf). To use a cloud extraction endpoint instead, point the extraction URL at it (above); the qwen service stays off.

Claude Code transcripts. The mnemozine app mounts the host Claude Code config dir read-only for the ingest component. Override the host path with HOST_CLAUDE_CONFIG_DIR (defaults to $HOME/.claude).

WebUI + MCP on one port. With the all-in-one default (RUN__WEB and RUN__MCP both true), the published :8765 serves the WebUI at / (API under /api) and the MCP streamable-http transport at /mcp — there is no separate mnemozine-web to start and no port clash to manage. The container sets MNEMOZINE_WEB__HOST=0.0.0.0 so the port is reachable from the host; gate /api with MNEMOZINE_WEB__TOKEN and keep the console on a private network (it is a local-operator surface). To turn any component off in compose, set its MNEMOZINE_RUN__* toggle to false.

Path B′ — frontend dev loop (Vite)

When you are iterating on the WebUI itself, run the FastAPI backend (mnemozine-web) on :8765 and the Vite dev server with hot-reload from web/:

cd web
npm install
npm run dev          # serves the SPA on :5173, proxies /api → http://127.0.0.1:8765

Point the dev server at a remote backend by overriding the proxy target:

MNEMOZINE_API_TARGET=http://my-backend:8765 npm run dev

Build the production bundle (emitted into mnemozine/web/static, where mnemozine-web serves it from) with:

npm run build

Path C — Helm (homelab k8s)

helm lint deploy/helm/mnemozine
helm install mz deploy/helm/mnemozine -n mnemozine --create-namespace
# render without installing:
helm template mz deploy/helm/mnemozine

Rendered objects:

FalkorDB — StatefulSet + headless Service + volumeClaimTemplate (graph + vector persistence at /data).
Ollama / Qwen / LiteLLM — Deployment + Service (+ PVCs for model storage). Ollama pulls bge-m3 via an init container on first start.
mcp / ingest / maintenance — Deployments from the shared image. Maintenance can render as a k8s CronJob instead (maintenance.asCronJob=true).
ConfigMap — all non-secret MNEMOZINE_* env, including every §6.6 tuning param from .Values.tuning; mounted into every workload via envFrom.
Secret — FalkorDB password + extraction API key (+ extraSecrets).

When a bundled dependency is enabled, its in-cluster Service DNS is wired automatically. To use something you run elsewhere, set <dep>.enabled=false and the matching endpoints.external.*:

helm install mz deploy/helm/mnemozine \
  --set falkordb.enabled=false --set endpoints.external.falkordbUrl=redis://my-falkor:6379 \
  --set ollama.enabled=false   --set endpoints.external.ollamaBaseUrl=http://my-ollama:11434 \
  --set litellm.enabled=false  --set qwen.enabled=false \
  --set endpoints.external.extractionBaseUrl=https://api.openai.com/v1 \
  --set extraSecrets.MNEMOZINE_EXTRACTION__API_KEY=sk-...

Reach the MCP server in-cluster at http://<release>-mcp.<namespace>.svc:8765, or port-forward it:

kubectl -n mnemozine port-forward svc/mz-mnemozine-mcp 8765:8765

Split deployment — running ingest on the main PC

The common operator scenario: keep the homelab running the always-on memory layer (FalkorDB + Ollama + MCP/web/maintenance), but run ingest on your main PC so the Claude Code watcher and the in-process gateway/Hermes callbacks live where your transcripts and agents actually run. Because the all-in-one mnemozine with only RUN__INGEST=true is exactly equivalent to the standalone mnemozine-ingest script (the same _run_ingest), splitting is just two opposite toggle sets pointed at the same FalkorDB.

The two halves:

Homelab — everything except ingest. Run the consolidated stack with the ingest component disabled:
- docker-compose: set MNEMOZINE_RUN__INGEST=false (the mnemozine container then serves MCP + web + maintenance only).
- Helm: --set ingest.enabled=false (the homelab renders the MCP / web / maintenance workloads but not the ingest one).
The homelab FalkorDB must be network-reachable from the main PC — bind it on the LAN (compose publishes :6379; in k8s expose it via a Service / NodePort / port-forward) so the remote ingester can write to it.

Main PC — ingest only, pointed at the homelab. Run the ingest half via deploy/docker-compose.ingest.yml (or the mnemozine-ingest console script in a venv). Disable the other three components and point the three remote endpoints at the homelab box:

# the ingest-only env (exactly equivalent to `mnemozine-ingest`):
MNEMOZINE_RUN__INGEST=true
MNEMOZINE_RUN__MCP=false
MNEMOZINE_RUN__MAINTENANCE=false
MNEMOZINE_RUN__WEB=false

# point at the homelab's FalkorDB + Ollama (embeddings) + extraction endpoint:
MNEMOZINE_FALKORDB__URL=redis://<homelab-host>:6379
MNEMOZINE_EMBEDDING__BASE_URL=http://<ollama-host>:11434
MNEMOZINE_EXTRACTION__BASE_URL=http://<extraction-host>/v1

# main PC, with deploy/docker-compose.ingest.yml:
docker compose -f deploy/docker-compose.ingest.yml up -d --build

# …or in a venv (Path A) — the standalone script is identical:
mnemozine-ingest

When the homelab serves extraction on Ollama (the default), set MNEMOZINE_EXTRACTION__BASE_URL=http://<ollama-host>:11434/v1 and MNEMOZINE_EXTRACTION__MODEL=openai/qwen2.5 (the /v1 suffix and openai/ prefix are required — the ollama/ provider would 404 on /v1; see Path B). The main-PC ingester still mounts your local ~/.claude read-only so the watcher tails your real transcripts.

The memory written by the remote ingester flows straight into the same FalkorDB the homelab's MCP server reads from, so recall on every agent sees it — the single-store invariant (Same store, both ways) holds across machines.

Configuration (environment variables)

All runtime configuration lives in mnemozine/config.py (a pydantic-settings Settings) and is overridable via environment variables — prefix MNEMOZINE_, nested delimiter __. The full, authoritative list is .env.example. Nothing is a hard-coded constant; in particular the §6.6 tuning parameters are config so they can be calibrated against the eval set. Setting get_settings() is cached process-wide.

FalkorDB connection (FR-STO-2)

Variable	Default	Meaning
`MNEMOZINE_FALKORDB__URL`	`redis://localhost:6379`	FalkorDB (Redis protocol) connection URL
`MNEMOZINE_FALKORDB__GRAPH_NAME`	`mnemozine`	Graphiti graph/keyspace name
`MNEMOZINE_FALKORDB__PASSWORD`	(unset)	optional FalkorDB/Redis password

Extraction LLM — pluggable OpenAI-format `base_url`, default local Qwen (§5.5)

Variable	Default	Meaning
`MNEMOZINE_EXTRACTION__BASE_URL`	`http://localhost:8000/v1`	OpenAI-format base URL (local Qwen by default; swap to a cloud `/v1` to use cloud)
`MNEMOZINE_EXTRACTION__MODEL`	`openai/qwen2.5`	LiteLLM `provider/model` id
`MNEMOZINE_EXTRACTION__API_KEY`	`not-needed`	API key (local servers ignore it)
`MNEMOZINE_EXTRACTION__TEMPERATURE`	`0.0`	extraction wants determinism
`MNEMOZINE_EXTRACTION__TIMEOUT_S`	`120`	per-request timeout (s)

Embedding endpoint — bge-m3 via Ollama (OQ3)

Variable	Default	Meaning
`MNEMOZINE_EMBEDDING__BASE_URL`	`http://localhost:11434`	Ollama base URL
`MNEMOZINE_EMBEDDING__MODEL`	`bge-m3`	Ollama embedding model
`MNEMOZINE_EMBEDDING__DIMENSIONS`	`1024`	vector dimensionality (bge-m3 is 1024-d)
`MNEMOZINE_EMBEDDING__TIMEOUT_S`	`60`	per-request timeout (s)

Claude Code ingestion — `CLAUDE_CONFIG_DIR` / `cleanupPeriodDays` (FR-ING-2/R4)

Variable	Default	Meaning
`MNEMOZINE_INGEST__CLAUDE_CONFIG_DIR`	`~/.claude`	root of Claude Code config/transcripts (the `CLAUDE_CONFIG_DIR` override)
`MNEMOZINE_INGEST__CLEANUP_PERIOD_DAYS`	`30`	Claude Code's local-transcript retention (`cleanupPeriodDays`) before cleanup
`MNEMOZINE_INGEST__STRIP_TOOL_CALLS`	`true`	strip `tool_calls`/tool results on ingest (FR-ING-7)
`MNEMOZINE_INGEST__CHUNK_MAX_CHARS`	`8000`	§6.6 `chunk.max_size` (chars) per episode
`MNEMOZINE_INGEST__CHUNK_MAX_MESSAGES`	`40`	§6.6 `chunk.max_size` (messages) per episode

Note on cleanupPeriodDays: Claude Code deletes local transcripts after cleanupPeriodDays (default 30). The ingester runs as a near-real-time watcher plus Stop/PreCompact hooks so nothing is lost before deletion; you may also raise Claude Code's own cleanupPeriodDays as a safety net. The mnemozine setting here records that retention window for the ingest layer.

MCP server (FR-RET-1)

Variable	Default	Meaning
`MNEMOZINE_MCP_HOST`	`127.0.0.1`	MCP standalone bind host (used only when `RUN__WEB=false`; compose/Helm set `0.0.0.0`)
`MNEMOZINE_MCP_PORT`	`8765`	MCP standalone bind port (when web+mcp share a port, MCP is at `/mcp` on `MNEMOZINE_WEB__PORT` instead)
`MNEMOZINE_LOG_LEVEL`	`INFO`	logging level

WebUI operator console

Variable	Default	Meaning
`MNEMOZINE_WEB__HOST`	`127.0.0.1`	WebUI bind host (set `0.0.0.0` in a container so the port is reachable)
`MNEMOZINE_WEB__PORT`	`8765`	WebUI bind port; also serves MCP at `/mcp` when web+mcp both run
`MNEMOZINE_WEB__TOKEN`	(unset)	optional static bearer token gating `/api`; unset = open API on the bound host

Component run toggles (the all-in-one `mnemozine`)

These select which components the mnemozine entrypoint runs; all default true. They are no-ops for the standalone single-component scripts. See the toggle reference for the web+mcp single-port behavior and the split-deployment use.

Variable	Default	Component
`MNEMOZINE_RUN__MCP`	`true`	the MCP server
`MNEMOZINE_RUN__INGEST`	`true`	the ingest loop
`MNEMOZINE_RUN__MAINTENANCE`	`true`	the maintenance scheduler
`MNEMOZINE_RUN__WEB`	`true`	the WebUI / `/api`

§6.6 tuning parameters (config, not constants)

These are deliberately calibrated against the eval set, not guessed. Initial values match the PRD's initial guesses.

Injection budget (FR-RET-3 / FR-RET-5)

Variable	Default	§6.6
`MNEMOZINE_INJECT__TOKEN_BUDGET`	`500`	`inject.token_budget` — hard cap; truncate, never overflow
`MNEMOZINE_INJECT__MAX_PREFERENCE_SNIPPETS`	`5`	max top-preference snippets in the index

Cross-reference engine (FR-RET-6)

Variable	Default	§6.6
`MNEMOZINE_CROSSREF__RELEVANCE_THRESHOLD`	`0.8`	`crossref.relevance_threshold` — start high (precision over recall)
`MNEMOZINE_CROSSREF__MAX_SUGGESTIONS`	`2`	`crossref.max_suggestions` (1–2)
`MNEMOZINE_CROSSREF__VECTOR_FALLBACK_THRESHOLD`	`0.75`	min cosine sim for the FR-RET-6 vector fallback (distinct from the surfacing threshold)

Maintenance / dedup / decay (FR-MNT-*)

Variable	Default	§6.6
`MNEMOZINE_MAINTENANCE__DEDUP_EQUIVALENCE_THRESHOLD`	`0.9`	`dedup.equivalence_threshold` — reinforce-vs-add
`MNEMOZINE_MAINTENANCE__EDGE_WEIGHT_FLOOR`	`0.1`	`maintenance.edge_weight_floor` — low-weight edge pruning
`MNEMOZINE_MAINTENANCE__MAX_NODE_DEGREE`	`64`	`maintenance.max_node_degree` — traversal-bound cap
`MNEMOZINE_MAINTENANCE__CONTRADICTION_CANDIDATE_CAP`	`5`	FR-MNT-1 supersede-LLM candidate cap
`MNEMOZINE_MAINTENANCE__DECAY_HALF_LIFE_DAYS`	`30`	`decay.half_life` (days)
`MNEMOZINE_MAINTENANCE__DECAY_ARCHIVE_AFTER_DAYS`	`90`	`decay.archive_after` — hot→archive demotion (days unused)
`MNEMOZINE_MAINTENANCE__CRON`	`0 3 * * *`	scheduled maintenance cadence (FR-MNT-5)

Retrieval (FR-RET-2)

Variable	Default	§6.6
`MNEMOZINE_RETRIEVAL__P95_LATENCY_TARGET_MS`	`500`	`retrieval.p95_latency_target` — baseline set in Phase 1
`MNEMOZINE_RETRIEVAL__TOP_K`	`10`	default results per scoped query
`MNEMOZINE_RETRIEVAL__NEIGHBORHOOD_HOPS`	`1`	FR-RET-2 entity-neighborhood traversal depth

In Helm these same knobs live under .Values.tuning (camelCase) and render into the ConfigMap, e.g.:

helm upgrade mz deploy/helm/mnemozine \
  --set tuning.crossref.relevanceThreshold=0.85 \
  --set tuning.inject.tokenBudget=400 \
  --set tuning.maintenance.cron='0 4 * * *'

Registering the Claude Code hooks

Claude Code invokes a hook as a subprocess, passing a JSON payload on stdin and reading the hook's response (JSON hookSpecificOutput) from stdout. The four hook entrypoints are installed as console scripts by the package:

Hook event	Script	Does
`SessionStart`	`mnemozine-hook-session-start`	inject the compact, ~500-token memory index (FR-RET-3)
`UserPromptSubmit`	`mnemozine-hook-user-prompt-submit`	inject finer-grained prompt-scoped memory mid-session (FR-RET-5)
`Stop`	`mnemozine-hook-stop`	flush the session's chunk into ingestion at session end (FR-ING-6)
`PreCompact`	`mnemozine-hook-pre-compact`	flush the chunk before compaction (FR-ING-6)

Register all four in Claude Code's settings.json hooks block. Each entry is a command-type hook; the four entrypoints read the hook JSON from stdin and take no command-line arguments, so the command is just the path to the installed console script (no flags). SessionStart / UserPromptSubmit / Stop / PreCompact are not tool-matched events, so no matcher is needed.

Drop this into ~/.claude/settings.json (user-global) or a project's .claude/settings.json. Use the absolute path to the installed scripts — i.e. the path that which mnemozine-hook-session-start prints inside the environment where you ran pip install -e . (typically …/.venv/bin/…):

{
  "hooks": {
    "SessionStart": [
      {
        "hooks": [
          { "type": "command", "command": "/abs/path/to/.venv/bin/mnemozine-hook-session-start" }
        ]
      }
    ],
    "UserPromptSubmit": [
      {
        "hooks": [
          { "type": "command", "command": "/abs/path/to/.venv/bin/mnemozine-hook-user-prompt-submit" }
        ]
      }
    ],
    "Stop": [
      {
        "hooks": [
          { "type": "command", "command": "/abs/path/to/.venv/bin/mnemozine-hook-stop" }
        ]
      }
    ],
    "PreCompact": [
      {
        "hooks": [
          { "type": "command", "command": "/abs/path/to/.venv/bin/mnemozine-hook-pre-compact" }
        ]
      }
    ]
  }
}

If the scripts are on PATH for the shell Claude Code spawns hooks in, you may use the bare names ("command": "mnemozine-hook-session-start"), but an absolute path is the robust default since the hook subprocess does not inherit your interactive shell's activated venv. Resolve the four absolute paths at once with:

for h in session-start user-prompt-submit stop pre-compact; do
  command -v "mnemozine-hook-$h"
done

Notes:

The hooks are fail-safe: an empty/invalid payload, an unwired backend, or any internal error yields an empty injection (or no-op flush) rather than raising — a hook must never break the session.
Injected memory is wrapped in <mnemozine-memory>…</mnemozine-memory> delimiters so the model treats it as advisory background, and is truncated to inject.token_budget (~500 tokens).
The hooks call into the same wired retriever + ingest service the mnemozine-ingest process owns; running that daemon installs the loader the hooks use. The Stop/PreCompact flush is idempotent — flushing a session the watcher already tailed is a no-op (de-dup on the FR-ING-5 content hash).

Registering the MCP server (the `recall` tool)

The hooks give Claude Code memory proactively (session start + prompt submit). To let the model also pull memory on demand mid-session, register the same mnemozine-mcp server with Claude Code. It exposes recall(query, scope=None, top_k=10) plus the two index tools.

For a local Claude Code, run the MCP server over stdio (it speaks stdio by default). Add it with the CLI:

claude mcp add --transport stdio mnemozine -- mnemozine-mcp

…or declare it by hand in ~/.claude.json (user scope) / .mcp.json (project scope). Use the absolute path to the installed script and point it at the same FalkorDB the hooks write to:

{
  "mcpServers": {
    "mnemozine": {
      "command": "/abs/path/to/.venv/bin/mnemozine-mcp",
      "args": [],
      "env": {
        "MNEMOZINE_FALKORDB__URL": "redis://localhost:6379"
      }
    }
  }
}

If you are instead running the server over the network, register it as an HTTP server instead of spawning a fresh stdio process. In the consolidated default the all-in-one mnemozine container already serves MCP over streamable-http at /mcp on the published :8765 (alongside the WebUI) — point Claude Code at that path:

claude mcp add --transport http mnemozine http://localhost:8765/mcp

(If you instead run the standalone mnemozine-mcp over the network, use mnemozine-mcp --transport streamable-http (or sse) bound to MNEMOZINE_MCP_HOST / MNEMOZINE_MCP_PORT — its bundled command otherwise runs the default stdio transport, so add the flag to expose HTTP.)

Same store, both ways. The hooks and the MCP server must read the same MNEMOZINE_FALKORDB__URL — if hooks write to one FalkorDB and the MCP server reads from another, memory will not flow. Keep both pulling the URL from the same .env or environment.

Pointing OpenAI-format agents and Hermes at the gateway

Capture happens through the LiteLLM OpenAI-format gateway with a registered logging callback. The reference proxy config is mnemozine/ingestion/gateway/config.yaml (docker-compose uses deploy/litellm.config.yaml).

Phase-2, default-off. Both the gateway (FR-ING-3) and Hermes (FR-ING-4) sources are off by default — a fresh install only runs the Claude Code watcher. Turn them on with the MNEMOZINE_INGEST__ENABLE_* flags below; the ingest loop (build_ingest_sources() in mnemozine/ingestion/loop.py) reads them and fans every enabled source into one serialized consumer. The gateway callback uses an in-process asyncio.Queue, so it must run in the same process as mnemozine-ingest.

OpenAI-format agents (FR-ING-3)

Enable the gateway source on the mnemozine-ingest process:

MNEMOZINE_INGEST__ENABLE_GATEWAY=true
MNEMOZINE_INGEST__GATEWAY_DEFAULT_PROJECT=my-project   # fallback project
MNEMOZINE_INGEST__GATEWAY_QUEUE_MAX=10000              # in-process buffer

Run the gateway:

litellm --config mnemozine/ingestion/gateway/config.yaml --port 4000

(docker-compose / Helm run the litellm service for you.) The callback is registered in litellm_settings.callbacks as the dotted path mnemozine.ingestion.gateway.litellm_register.gateway_callback (LiteLLM resolves it by string lookup at runtime — the path must match exactly). The proxy's own upstream models come from the yaml (os.environ/MNEMOZINE_GATEWAY_QWEN_BASE_URL, …_QWEN_API_KEY):

model_list:
  - model_name: qwen
    litellm_params:
      model: openai/qwen2.5
      api_base: os.environ/MNEMOZINE_GATEWAY_QWEN_BASE_URL
      api_key: os.environ/MNEMOZINE_GATEWAY_QWEN_API_KEY
litellm_settings:
  callbacks: mnemozine.ingestion.gateway.litellm_register.gateway_callback

Point any operator-controlled, repointable OpenAI-format agent at the gateway by setting its OpenAI base_url to http://<gateway-host>:4000/v1 (port 4000 is the LiteLLM default; --port overrides it) and any api_key the proxy expects. Every completion that agent makes is then captured and emitted as common-schema events (source=openai), with tool_calls stripped (FR-ING-7).

To route a turn to a specific project/session, thread it through LiteLLM's metadata dict — there is no request-path routing otherwise:
```
metadata={"mnemozine_project": "my-project", "mnemozine_session_id": "sess-123"}
```
The callback resolves project from mnemozine_project → project → the configured default, and session_id from mnemozine_session_id → session_id → user → the LiteLLM call id.
The gateway's own upstream (the model it proxies to) is the local Qwen by default; swap to a cloud backend by editing the model_list api_base/ api_key (a single line) — capture still works.

Explicit non-capability (FR-ING-3): third-party apps that cannot be repointed at the gateway base_url (ChatGPT desktop, Cursor, …) are not captured by this path.

Hermes (FR-ING-4)

Hermes is the self-hosted Nous Research Hermes agent on a homelab VM. Two paths:

Preferred — direct instrumentation. Enable the Hermes source:
```
MNEMOZINE_INGEST__ENABLE_HERMES=true
MNEMOZINE_INGEST__HERMES_DEFAULT_PROJECT=hermes        # fallback project
MNEMOZINE_INGEST__HERMES_QUEUE_MAX=10000               # in-process buffer
```
Then instrument the VM to push each completed turn into the HermesAdapter (mnemozine.ingestion.hermes.HermesAdapter, an IngestSource), which normalizes Hermes-native payloads into the common schema (source=hermes), stripping tool_calls:
```
hermes.feed(payload)          # sync, returns the emitted IngestEvent list
await hermes.afeed(payload)   # async, awaits queue space
```
The adapter is field-name tolerant — conversation_id / session_id / id for the session, messages / turns for the turn list, content / text for text, timestamp / created_at for time. Recorded turns replay via backfill for the Phase-1 historical import.

Fallback — front it with a gateway. If direct instrumentation is impractical, enable the gateway source and run a second LiteLLM proxy whose upstream api_base is Hermes' OpenAI-compatible endpoint and whose callback references mnemozine.ingestion.gateway.litellm_register.hermes_gateway_callback (note: not gateway_callback — that stamps source=openai):

MNEMOZINE_INGEST__ENABLE_GATEWAY=true
MNEMOZINE_INGEST__HERMES_BASE_URL=https://hermes-agent.nousresearch.com/
MNEMOZINE_INGEST__HERMES_API_KEY=<api-key-if-needed>

model_list:
  - model_name: hermes
    litellm_params:
      model: openai/hermes
      api_base: https://hermes-agent.nousresearch.com/v1
      api_key: os.environ/MNEMOZINE_HERMES_API_KEY
litellm_settings:
  callbacks: mnemozine.ingestion.gateway.litellm_register.hermes_gateway_callback

The Hermes variant is sketched (commented) at the bottom of gateway/config.yaml.

Reading memory back

All agents — Claude Code and OpenAI/Hermes alike — read from the single MCP server (mnemozine-mcp). It exposes:

recall(query, scope=None, top_k=10) — on-demand consolidated recall (FR-RET-4). scope is optional: omit for current project + global, or pass global / project:<id> / a bare project id.
session_start_index(...) — the FR-RET-3 compact index as a tool (so non-hook agents can request it too).
mid_session_index(prompt, project=None) — the FR-RET-5 finer-grained index.

Transports: stdio (Claude Code local default) and streamable-http / sse (networked OpenAI/Hermes agents), selected with mnemozine-mcp --transport ....

Eval harness and bootstrapping the eval set

The §9 eval harness is the mnemozine-eval console script. It runs offline against a committed gold-set fixture and a packaged in-memory fake store, so it needs no FalkorDB/Ollama/Qwen.

mnemozine-eval run                  # every §9 metric once; exits non-zero on failure
mnemozine-eval run -x 10            # same, with a 10x distractor inflation
mnemozine-eval scaling              # headline: injection precision at 1x/10x/100x
mnemozine-eval show-gold            # summarize the gold set

scaling is the headline §9 assertion — that precision does not decline as the store is inflated with synthetic plausible-but-irrelevant distractors (--levels 1,10,100, --tolerance for allowed drop). It exits non-zero if precision declines.

Bootstrapping the eval set (operator task)

The eval set encodes the operator's own preferences across their own projects, so only the operator can label it (PRD §9 — this is an operator deliverable, ≈40 cases, ~2–3 hrs). Two-step flow:

# 1. Auto-propose extracted candidates and write a Markdown review sheet.
mnemozine-eval bootstrap-propose --out eval_review.md

# 2. Edit eval_review.md by hand: tick "- [x] keep" on candidates to keep,
#    optionally correcting the proposed type/scope (human-in-the-loop, R1).

# 3. Fold the labeled sheet into a committed gold set.
mnemozine-eval bootstrap-finish --in eval_review.md --out mnemozine/evals/fixtures/gold_set.json

bootstrap-finish reads the ticked candidates back, builds a GoldSet (seed memories + classifier cases), and writes it to the gold-set JSON (default the committed fixture at mnemozine/evals/fixtures/gold_set.json). Commit that file and run mnemozine-eval run on every change and on a schedule.

The offline bootstrap-propose uses a tiny demo backlog so the command is exercisable out of the box; the integration pass can point it at the real IngestSource.backfill + Extractor to propose from your actual historical import.

Operations

Maintenance schedule (FR-MNT-5)

Maintenance is a separate, idempotent, repeatable pass (consolidate → resolve entities → decay/archive → audit, in that order):

mnemozine-maintenance run      # run the full pass once and exit
mnemozine-maintenance serve    # run on the configured cron until interrupted

The cron cadence is MNEMOZINE_MAINTENANCE__CRON (default 0 3 * * *); the serve mode uses APScheduler.
In docker-compose the all-in-one mnemozine container runs the maintenance scheduler continuously as its MNEMOZINE_RUN__MAINTENANCE component (set the toggle false to disable it, e.g. on a remote ingest-only node).
In Helm it is a long-lived Deployment by default; set maintenance.asCronJob=true to render a Kubernetes CronJob (schedule from maintenance.cronSchedule, defaulting to tuning.maintenance.cron).
Each job is isolated — a failure in one is recorded as a note but does not abort the rest of the pass.
Demotion to the archive tier is governed by decay.archive_after (DECAY_ARCHIVE_AFTER_DAYS, default 90 days unused); the system archives, never hard-deletes by default.

Backing up the FalkorDB volume

FalkorDB is the single source of truth (graph and vectors). Its data lives at /data:

docker-compose — the named volume falkordb-data (mounted at /data).
Helm — the StatefulSet's data PVC (the volumeClaimTemplate, mounted at /data).

FalkorDB speaks the Redis protocol, so back up the on-disk RDB. Trigger a save then copy the dump out:

# docker-compose — trigger a save, then copy /data out of the falkordb container.
# (The named volume is <project>_falkordb-data; the project name defaults to the
#  compose file's directory, so `docker compose ... config --volumes` /
#  `docker inspect` resolve the exact volume name if you back it up by volume.)
docker compose -f deploy/docker-compose.yml exec falkordb redis-cli SAVE
docker compose -f deploy/docker-compose.yml cp falkordb:/data ./falkordb-backup-$(date +%F)

# kubernetes (StatefulSet pod <release>-mnemozine-falkordb-0, e.g. mz-mnemozine-falkordb-0)
kubectl -n mnemozine exec mz-mnemozine-falkordb-0 -- redis-cli SAVE
kubectl -n mnemozine cp mz-mnemozine-falkordb-0:/data ./falkordb-backup-$(date +%F)

If the FalkorDB password is set, pass -a "$MNEMOZINE_FALKORDB__PASSWORD" to redis-cli. Restore by stopping FalkorDB, replacing the contents of the volume / PVC with a backed-up /data, and restarting. Snapshotting the underlying volume (or PVC VolumeSnapshot) while FalkorDB is quiesced is an equivalent approach.

Superseded/decayed memories are kept (archive tier) rather than deleted, so the store grows slowly over time; size the FalkorDB volume (compose volume / Helm falkordb.persistence.size, default 10Gi) and Ollama/Qwen model volumes accordingly.

Health checks

The all-in-one mnemozine container exposes an HTTP surface on :8765 (the WebUI/API, with MCP mounted at /mcp) whenever RUN__WEB and/or RUN__MCP are on; compose/Helm probe it via TCP/HTTP.
An ingest-only or maintenance-only process (e.g. the split-deployment node, or a component-toggled standalone script) has no HTTP surface — liveness is "the watcher/scheduler process is still running" (pgrep).

Configuration reference

The single source of truth for config is mnemozine/config.py; the full env-var list (with the MNEMOZINE_ prefix and __ nesting) is .env.example. Deployment specifics — image overrides, Helm values.yaml knobs, the MZ_COMPOSE_* compose overrides — are in deploy/README.md.

Mnemozine

README

Mnemozine

What it is

Architecture

Console scripts

The all-in-one mnemozine entrypoint and the component toggles

Setup

Path A — bare-metal dev (Python only)

Path B — docker-compose (local full stack + eval)

Optional profiles

Path B′ — frontend dev loop (Vite)

Path C — Helm (homelab k8s)

Split deployment — running ingest on the main PC

Configuration (environment variables)

FalkorDB connection (FR-STO-2)

Extraction LLM — pluggable OpenAI-format base_url, default local Qwen (§5.5)

Embedding endpoint — bge-m3 via Ollama (OQ3)

Claude Code ingestion — CLAUDE_CONFIG_DIR / cleanupPeriodDays (FR-ING-2/R4)

MCP server (FR-RET-1)

WebUI operator console

Component run toggles (the all-in-one mnemozine)

§6.6 tuning parameters (config, not constants)

Registering the Claude Code hooks

Registering the MCP server (the recall tool)

Pointing OpenAI-format agents and Hermes at the gateway

OpenAI-format agents (FR-ING-3)

Hermes (FR-ING-4)

Reading memory back

Eval harness and bootstrapping the eval set

Bootstrapping the eval set (operator task)

Operations

Maintenance schedule (FR-MNT-5)

Backing up the FalkorDB volume

Health checks

Configuration reference

推荐服务器

The all-in-one `mnemozine` entrypoint and the component toggles

Extraction LLM — pluggable OpenAI-format `base_url`, default local Qwen (§5.5)

Claude Code ingestion — `CLAUDE_CONFIG_DIR` / `cleanupPeriodDays` (FR-ING-2/R4)

Component run toggles (the all-in-one `mnemozine`)

Registering the MCP server (the `recall` tool)