Mnemozine
Mnemozine is a self-hosted unified conversational memory layer that ingests conversations from AI tools, distills them into a temporal knowledge graph, and serves that memory to agents via a single MCP server.
A self-hosted unified conversational memory layer. Mnemozine ingests conversations from every AI tool the operator uses (Claude Code, OpenAI-format agents, Hermes), distills them into a temporal knowledge graph (Graphiti on FalkorDB), and serves that memory to every agent through a single MCP server — proactively at session start and on demand mid-session.
The defining constraint: it consolidates rather than accumulates — retrieval precision stays flat as the store grows, because retrieval is always scoped (current project + global preferences + entity neighborhood) instead of searching the whole graph.
See PRD.md for the full specification and
INTERFACES.md for the shared Protocol contracts every module
builds against.
What it is
| Layer | What it does | Where |
|---|---|---|
| Ingestion | Normalize Claude Code JSONL transcripts, OpenAI-format gateway turns, and Hermes turns into one common event schema; strip tool_calls; chunk per session into Graphiti episodes; de-dup on (source, session_id, content-hash). | mnemozine/ingestion/ |
| Typed extraction | Classify each memory unit as preference / project_fact / idea_seed; extract entities + relationships; record confidence + provenance. | mnemozine/extract/ |
| Storage | Graphiti temporal knowledge graph on FalkorDB (graph and vector embeddings in one store); validity windows; scopes (global, project:<id>); hot/archive tiers. | mnemozine/storage/ |
| Retrieval & delivery | One MCP server exposing recall() plus session-start / mid-session index tools; scoped retrieval; ~500-token injection budget. | mnemozine/retrieval/ |
| Cross-reference | Surface related idea_seed/project nodes via shared-entity graph traversal (vector fallback), with explainable reasons. | mnemozine/crossref/ |
| Maintenance | Scheduled consolidate / entity-resolve / decay / audit; 4-way dedup-reinforce-supersede-noop write decision. | mnemozine/maintenance/ |
| Evals | §9 eval harness + gold-set bootstrap + synthetic distractor generator. | mnemozine/evals/ |
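The ingestion de-dup key from the table above can be illustrated with a short sketch. It assumes SHA-256 over whitespace-normalized content; the helper names `dedup_key` and `is_new` and the normalization step are hypothetical and are not taken from the Mnemozine source.

```python
import hashlib

def dedup_key(source: str, session_id: str, content: str) -> tuple[str, str, str]:
    """Build a (source, session_id, content-hash) key (hypothetical helper)."""
    # Assumption: collapse whitespace so re-wrapped text hashes identically.
    normalized = " ".join(content.split())
    digest = hashlib.sha256(normalized.encode("utf-8")).hexdigest()
    return (source, session_id, digest)

# In-memory stand-in for whatever index the real ingester consults.
seen: set[tuple[str, str, str]] = set()

def is_new(source: str, session_id: str, content: str) -> bool:
    """Return True the first time a key is seen, False on any repeat."""
    key = dedup_key(source, session_id, content)
    if key in seen:
        return False
    seen.add(key)
    return True
```

Because the source is part of the key, the same text captured from two different tools counts as two distinct events in this sketch.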
Architecture
[ Conversation sources ]
Claude Code (JSONL transcripts) OpenAI-format agents Hermes
| | |
| (LiteLLM gateway + capture callback)
v v v
[ 1. Ingestion ] -- normalize to the common event schema; strip tool_calls --
|
v
[ 2. Typed Extraction ] -- classify preference / project_fact / idea_seed --
|
v
[ 3. Storage ] -- Graphiti temporal KG on FalkorDB (graph + bge-m3 vectors) --
|
v
[ 4. Retrieval & Delivery ] -- single MCP server + Claude Code hooks --
|
v
[ 5. Maintenance ] -- dedup, consolidation, decay, entity resolution (scheduled) --
Stack (PRD §5.5, pinned in pyproject.toml):
| Concern | Choice |
|---|---|
| Graph + vector backend | FalkorDB (single store; no Postgres) |
| Temporal KG engine | Graphiti — graphiti-core[falkordb]==0.29.2 (exact pin) |
| Extraction LLM | Pluggable OpenAI-format base_url; default Qwen2.5 served by Ollama (LiteLLM-id openai/qwen2.5 against Ollama's /v1 OpenAI endpoint) |
| Embedding model | bge-m3 via Ollama, self-hosted (1024-d) |
| Application process | one all-in-one mnemozine entrypoint runs MCP + ingest + maintenance + web under one loop |
| OpenAI-format gateway | LiteLLM proxy + a custom logging callback (optional — the gateway compose profile, for capturing OpenAI/Hermes agents) |
| MCP server | official mcp SDK (FastMCP) |
| Maintenance scheduler | APScheduler (or a k8s CronJob) |
| Language / packaging | Python ≥3.11, hatchling, pydantic-settings config |
The whole system runs end-to-end on local models with no cloud dependency.
By default Ollama serves both the bge-m3 embeddings and the Qwen2.5 extraction
model, so the default stack is just 3 services (FalkorDB + Ollama + the
all-in-one mnemozine); the LiteLLM gateway and a dedicated llama.cpp Qwen server
are optional (compose profiles gateway / qwen-llamacpp). The
extraction/embedding endpoints are pluggable, so the extraction LLM MAY point at a
cloud model later on cost grounds — a one-line base_url/model swap.
Console scripts
Installed by the package (pyproject.toml [project.scripts]):
| Script | Purpose |
|---|---|
| mnemozine | the all-in-one process: builds the container once and runs every enabled component (MCP + ingest + maintenance + web) concurrently under one asyncio loop |
| mnemozine-mcp | the single MCP server, standalone (FR-RET-1) |
| mnemozine-ingest | source → chunk → extract → store loop, standalone (FR-ING-*) |
| mnemozine-maintenance | scheduled consolidate/resolve/decay/audit, standalone (FR-MNT-*) |
| mnemozine-web | the WebUI operator console, standalone |
| mnemozine-eval | §9 eval harness + gold-set bootstrap |
| mnemozine-hook-session-start | Claude Code SessionStart hook (FR-RET-3) |
| mnemozine-hook-user-prompt-submit | Claude Code UserPromptSubmit hook (FR-RET-5) |
| mnemozine-hook-stop | Claude Code Stop hook: flush session (FR-ING-6) |
| mnemozine-hook-pre-compact | Claude Code PreCompact hook: flush before compaction (FR-ING-6) |
All service workloads share one container image and differ only in the command they run.
The all-in-one mnemozine entrypoint and the component toggles
mnemozine (= mnemozine.app:run_all) builds the Container once and runs
every enabled component concurrently under a single asyncio loop, with graceful
shutdown on SIGINT/SIGTERM (so compose's default SIGTERM/stop_signal just
works). This is what collapses the stack to ~3 containers — the mnemozine
app, FalkorDB, and Ollama.
Four boolean toggles (prefix MNEMOZINE_, nested delimiter __) select which
components run; all default true and are listed in
.env.example:
| Variable | Default | Component |
|---|---|---|
| MNEMOZINE_RUN__MCP | true | the MCP server (the recall tool + index tools) |
| MNEMOZINE_RUN__INGEST | true | the ingest loop (source → chunk → extract → store) |
| MNEMOZINE_RUN__MAINTENANCE | true | the maintenance scheduler |
| MNEMOZINE_RUN__WEB | true | the FastAPI WebUI / /api |
A disabled component is never created, so mnemozine is a no-op-safe superset of
every standalone script — running it with only one toggle on is exactly
equivalent to that component's standalone script (e.g. only RUN__INGEST=true
== mnemozine-ingest). Use a standalone script to split one component onto
another machine (see Split deployment).
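The naming rule for the toggles (prefix MNEMOZINE_, nested delimiter __, default true) can be sketched with the standard library alone. The real application reads these through pydantic-settings; the function below is a hypothetical stand-in that only mirrors the variable naming, and the set of accepted truthy strings is an assumption.

```python
import os
from typing import Mapping

PREFIX = "MNEMOZINE_"
DELIM = "__"
COMPONENTS = ("mcp", "ingest", "maintenance", "web")

def _as_bool(raw: str) -> bool:
    # Assumption: these spellings count as true; everything else is false.
    return raw.strip().lower() in {"1", "true", "yes", "on"}

def enabled_components(env: Mapping[str, str] = os.environ) -> list[str]:
    """List the components whose MNEMOZINE_RUN__<NAME> toggle is on."""
    enabled = []
    for name in COMPONENTS:
        var = f"{PREFIX}RUN{DELIM}{name.upper()}"
        # An unset toggle defaults to true, matching the table above.
        if _as_bool(env.get(var, "true")):
            enabled.append(name)
    return enabled
```

With the three non-ingest toggles set to false, this returns only `["ingest"]`, which is the configuration the text describes as equivalent to mnemozine-ingest.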
Web + MCP share one port. When MNEMOZINE_RUN__WEB=true and
MNEMOZINE_RUN__MCP=true, mnemozine serves the WebUI and the MCP
streamable-http transport from a single port — MNEMOZINE_WEB__PORT (default
8765) — by mounting the MCP ASGI app at path /mcp on the web app. So a
networked MCP client connects to http://<host>:8765/mcp (streamable-http) and
the WebUI/API is at http://<host>:8765/ (API under /api). This resolves the
historical web/MCP 8765 clash — the all-in-one default exposes only 8765.
The MCP StreamableHTTP session manager runs under the FastAPI app's lifespan,
so uvicorn runs with the lifespan enabled (run_all already does this — nothing
to configure).
Fallback (MCP standalone): if RUN__WEB=false but RUN__MCP=true, the MCP server runs standalone on MNEMOZINE_MCP_HOST / MNEMOZINE_MCP_PORT (default 127.0.0.1:8765) at path /mcp. In that case expose MNEMOZINE_MCP_PORT instead of MNEMOZINE_WEB__PORT.
Binds. MNEMOZINE_WEB__HOST and MNEMOZINE_MCP_HOST both default to 127.0.0.1. In a container you typically must set MNEMOZINE_WEB__HOST=0.0.0.0 (and MNEMOZINE_MCP_HOST=0.0.0.0 for the standalone-MCP case) so the port is reachable from outside the container. The WebUI is local-operator only: front it with auth or keep it on a private network, and set MNEMOZINE_WEB__TOKEN to gate /api.
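The port-sharing rule can be summarized as a small function: MCP rides on the web port when both components run, and falls back to its own port otherwise. This is an illustrative sketch only; `mcp_url` is a hypothetical helper, not part of the package.

```python
from typing import Optional

def mcp_url(host: str, run_web: bool = True, run_mcp: bool = True,
            web_port: int = 8765, mcp_port: int = 8765) -> Optional[str]:
    """Return the streamable-http URL a networked MCP client should use."""
    if not run_mcp:
        # No MCP component is created, so there is nothing to connect to.
        return None
    # Shared port when the WebUI runs; standalone MCP port otherwise.
    port = web_port if run_web else mcp_port
    return f"http://{host}:{port}/mcp"
```

For example, `mcp_url("homelab", run_web=False, mcp_port=9000)` points at the standalone MCP port, while the defaults resolve to the shared 8765.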
Setup
There are two supported deployment paths, sharing one image definition
(deploy/Dockerfile):
- docker-compose — local dev / running the eval harness without a cluster.
- Helm chart — homelab Kubernetes.
Both are documented in detail in deploy/README.md; the
essentials are below.
Path A — bare-metal dev (Python only)
python -m venv .venv && . .venv/bin/activate
pip install -e ".[dev]"
cp .env.example .env # then edit endpoints/keys
python -c "import mnemozine; print(mnemozine.__version__)"
pytest
This installs the console scripts but assumes you supply FalkorDB, Ollama
(bge-m3), and a Qwen/OpenAI-format endpoint yourself (the .env defaults point
at localhost). For a turnkey stack, use docker-compose.
Path B — docker-compose (local full stack + eval)
# from the repo root
cp .env.example .env # edit endpoints/keys if needed
docker compose -f deploy/docker-compose.yml up -d --build
The default stack is 3 services, because the all-in-one mnemozine
container runs the MCP + ingest + maintenance + web components together (see
the component toggles):
| Service | Purpose |
|---|---|
| falkordb | single graph + vector store, persisted to named volume falkordb-data (/data) |
| ollama (+ ollama-init) | serves both the bge-m3 embeddings and the qwen extraction model; ollama-init pulls them into ollama-data on first up |
| mnemozine | the all-in-one app (MCP + ingest + maintenance + WebUI), published on :8765 (WebUI at /, /api; MCP at /mcp); mounts ~/.claude read-only at /claude for the Claude Code watcher |
Extraction runs on Ollama alongside embeddings, so neither a separate qwen
container nor LiteLLM is needed by default. The recommended extraction env (use
verbatim) points the LiteLLM-format client at Ollama's OpenAI-compatible endpoint:
MNEMOZINE_EXTRACTION__BASE_URL=http://ollama:11434/v1 # the /v1 suffix is REQUIRED (Ollama's OpenAI-compatible endpoint)
MNEMOZINE_EXTRACTION__MODEL=openai/qwen2.5 # LiteLLM provider/model form; "qwen2.5" is the Ollama tag (any pulled qwen tag works, e.g. openai/qwen2.5:7b)
MNEMOZINE_EXTRACTION__API_KEY=not-needed
# embeddings stay on Ollama too:
MNEMOZINE_EMBEDDING__BASE_URL=http://ollama:11434
MNEMOZINE_EMBEDDING__MODEL=bge-m3
MNEMOZINE_EMBEDDING__DIMENSIONS=1024
The extraction model id is a LiteLLM id. Against Ollama's OpenAI-compatible /v1 endpoint it must be prefixed openai/ (the config default openai/qwen2.5 already is). The openai/ provider treats /v1 as a plain OpenAI server; do not use the ollama/ provider here, because it speaks Ollama's native /api/* surface and would 404 against the /v1 base_url. (To talk to the native Ollama API instead, drop the /v1 suffix and use ollama/qwen2.5.)
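The pairing rule between base_url and provider prefix can be checked mechanically before starting the stack. The sketch below covers only the Ollama case described here; `check_ollama_extraction_config` is a hypothetical helper and the messages are illustrative.

```python
def check_ollama_extraction_config(base_url: str, model: str) -> list[str]:
    """Return a list of problems with an Ollama-backed extraction config."""
    problems = []
    provider = model.split("/", 1)[0] if "/" in model else ""
    uses_v1 = base_url.rstrip("/").endswith("/v1")
    if not provider:
        problems.append("model needs a LiteLLM provider prefix, e.g. openai/qwen2.5")
    elif provider == "ollama" and uses_v1:
        # ollama/ targets the native /api/* surface, not the /v1 endpoint.
        problems.append("ollama/ with a /v1 base_url: drop /v1 or use openai/")
    elif provider == "openai" and not uses_v1:
        # openai/ needs Ollama's OpenAI-compatible endpoint.
        problems.append("openai/ against Ollama requires the /v1 suffix")
    return problems
```

An empty list means the two settings agree; the recommended env above (openai/qwen2.5 against http://ollama:11434/v1) passes.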
Optional profiles
Two backends are kept off the default up and enabled via compose profiles when
you want them:
| Profile | Brings up | When |
|---|---|---|
| gateway | litellm (OpenAI-format gateway + logging callback, on :4000) | to capture OpenAI-format / Hermes agents (FR-ING-3/4); see Pointing OpenAI-format agents and Hermes at the gateway |
| qwen-llamacpp | qwen (a llama.cpp OpenAI-format server, weights in qwen-models) | to run extraction on a dedicated llama.cpp server instead of on Ollama |
# default 3-service stack:
docker compose -f deploy/docker-compose.yml up -d --build
# add the LiteLLM gateway:
docker compose -f deploy/docker-compose.yml --profile gateway up -d
# run extraction on a dedicated llama.cpp qwen server instead of Ollama:
docker compose -f deploy/docker-compose.yml --profile qwen-llamacpp up -d
Inter-service URLs are set under each service's environment: (which overrides
env_file in Compose), so containers reach each other by service name
(redis://falkordb:6379, http://ollama:11434, http://ollama:11434/v1 for
extraction) even though .env ships localhost defaults for bare-metal dev.
Override any of them with the MZ_COMPOSE_* interpolation vars, e.g.:
MZ_COMPOSE_EXTRACTION_URL=https://api.openai.com/v1 \
MZ_COMPOSE_EXTRACTION_MODEL=openai/gpt-4o-mini \
MZ_COMPOSE_EXTRACTION_API_KEY=sk-... \
docker compose -f deploy/docker-compose.yml up -d
Local Qwen on llama.cpp (qwen-llamacpp profile). The qwen service runs a
llama.cpp OpenAI-compatible server; drop a GGUF into the qwen-models volume (or
bind-mount one) and set QWEN_MODEL to its filename (default
qwen2.5-7b-instruct-q4_k_m.gguf). To use a cloud extraction endpoint instead,
point the extraction URL at it (above); the qwen service stays off.
Claude Code transcripts. The mnemozine app mounts the host Claude Code
config dir read-only for the ingest component. Override the host path with
HOST_CLAUDE_CONFIG_DIR (defaults to $HOME/.claude).
WebUI + MCP on one port. With the all-in-one default (RUN__WEB and
RUN__MCP both true), the published :8765 serves the WebUI at / (API
under /api) and the MCP streamable-http transport at /mcp — there is no
separate mnemozine-web to start and no port clash to manage. The container sets
MNEMOZINE_WEB__HOST=0.0.0.0 so the port is reachable from the host; gate /api
with MNEMOZINE_WEB__TOKEN and keep the console on a private network (it is a
local-operator surface). To turn any component off in compose, set its
MNEMOZINE_RUN__* toggle to false.
Path B′ — frontend dev loop (Vite)
When you are iterating on the WebUI itself, run the FastAPI backend
(mnemozine-web) on :8765 and the Vite dev server with hot-reload from web/:
cd web
npm install
npm run dev # serves the SPA on :5173, proxies /api → http://127.0.0.1:8765
Point the dev server at a remote backend by overriding the proxy target:
MNEMOZINE_API_TARGET=http://my-backend:8765 npm run dev
Build the production bundle (emitted into mnemozine/web/static, where
mnemozine-web serves it from) with:
npm run build
Path C — Helm (homelab k8s)
helm lint deploy/helm/mnemozine
helm install mz deploy/helm/mnemozine -n mnemozine --create-namespace
# render without installing:
helm template mz deploy/helm/mnemozine
Rendered objects:
- FalkorDB: StatefulSet + headless Service + volumeClaimTemplate (graph + vector persistence at /data).
- Ollama / Qwen / LiteLLM: Deployment + Service (+ PVCs for model storage). Ollama pulls bge-m3 via an init container on first start.
- mcp / ingest / maintenance: Deployments from the shared image. Maintenance can render as a k8s CronJob instead (maintenance.asCronJob=true).
- ConfigMap: all non-secret MNEMOZINE_* env, including every §6.6 tuning param from .Values.tuning; mounted into every workload via envFrom.
- Secret: FalkorDB password + extraction API key (+ extraSecrets).
When a bundled dependency is enabled, its in-cluster Service DNS is wired
automatically. To use something you run elsewhere, set <dep>.enabled=false and
the matching endpoints.external.*:
helm install mz deploy/helm/mnemozine \
--set falkordb.enabled=false --set endpoints.external.falkordbUrl=redis://my-falkor:6379 \
--set ollama.enabled=false --set endpoints.external.ollamaBaseUrl=http://my-ollama:11434 \
--set litellm.enabled=false --set qwen.enabled=false \
--set endpoints.external.extractionBaseUrl=https://api.openai.com/v1 \
--set extraSecrets.MNEMOZINE_EXTRACTION__API_KEY=sk-...
Reach the MCP server in-cluster at
http://<release>-mnemozine-mcp.<namespace>.svc:8765, or port-forward it:
kubectl -n mnemozine port-forward svc/mz-mnemozine-mcp 8765:8765
Split deployment — running ingest on the main PC
The common operator scenario: keep the homelab running the always-on memory
layer (FalkorDB + Ollama + MCP/web/maintenance), but run ingest on your main
PC so the Claude Code watcher and the in-process gateway/Hermes callbacks live
where your transcripts and agents actually run. Because the all-in-one
mnemozine with only RUN__INGEST=true is exactly equivalent to the
standalone mnemozine-ingest script (the same _run_ingest), splitting is just
two opposite toggle sets pointed at the same FalkorDB.
The two halves:
- Homelab: everything except ingest. Run the consolidated stack with the ingest component disabled:
  - docker-compose: set MNEMOZINE_RUN__INGEST=false (the mnemozine container then serves MCP + web + maintenance only).
  - Helm: --set ingest.enabled=false (the homelab renders the MCP / web / maintenance workloads but not the ingest one).

  The homelab FalkorDB must be network-reachable from the main PC: bind it on the LAN (compose publishes :6379; in k8s expose it via a Service / NodePort / port-forward) so the remote ingester can write to it.
- Main PC: ingest only, pointed at the homelab. Run the ingest half via deploy/docker-compose.ingest.yml (or the mnemozine-ingest console script in a venv). Disable the other three components and point the three remote endpoints at the homelab box:

  # the ingest-only env (exactly equivalent to `mnemozine-ingest`):
  MNEMOZINE_RUN__INGEST=true
  MNEMOZINE_RUN__MCP=false
  MNEMOZINE_RUN__MAINTENANCE=false
  MNEMOZINE_RUN__WEB=false
  # point at the homelab's FalkorDB + Ollama (embeddings) + extraction endpoint:
  MNEMOZINE_FALKORDB__URL=redis://<homelab-host>:6379
  MNEMOZINE_EMBEDDING__BASE_URL=http://<ollama-host>:11434
  MNEMOZINE_EXTRACTION__BASE_URL=http://<extraction-host>/v1

  # main PC, with deploy/docker-compose.ingest.yml:
  docker compose -f deploy/docker-compose.ingest.yml up -d --build
  # …or in a venv (Path A) — the standalone script is identical:
  mnemozine-ingest

  When the homelab serves extraction on Ollama (the default), set MNEMOZINE_EXTRACTION__BASE_URL=http://<ollama-host>:11434/v1 and MNEMOZINE_EXTRACTION__MODEL=openai/qwen2.5 (the /v1 suffix and openai/ prefix are required; the ollama/ provider would 404 on /v1; see Path B). The main-PC ingester still mounts your local ~/.claude read-only so the watcher tails your real transcripts.
The memory written by the remote ingester flows straight into the same FalkorDB the homelab's MCP server reads from, so recall on every agent sees it — the single-store invariant (Same store, both ways) holds across machines.
Configuration (environment variables)
All runtime configuration lives in mnemozine/config.py (a
pydantic-settings Settings) and is overridable via environment variables —
prefix MNEMOZINE_, nested delimiter __. The full, authoritative list is
.env.example. Nothing is a hard-coded constant; in particular
the §6.6 tuning parameters are config so they can be calibrated against the eval
set. The Settings object returned by get_settings() is cached process-wide.
FalkorDB connection (FR-STO-2)
| Variable | Default | Meaning |
|---|---|---|
| MNEMOZINE_FALKORDB__URL | redis://localhost:6379 | FalkorDB (Redis protocol) connection URL |
| MNEMOZINE_FALKORDB__GRAPH_NAME | mnemozine | Graphiti graph/keyspace name |
| MNEMOZINE_FALKORDB__PASSWORD | (unset) | optional FalkorDB/Redis password |
Extraction LLM — pluggable OpenAI-format base_url, default local Qwen (§5.5)
| Variable | Default | Meaning |
|---|---|---|
| MNEMOZINE_EXTRACTION__BASE_URL | http://localhost:8000/v1 | OpenAI-format base URL (local Qwen by default; swap to a cloud /v1 to use cloud) |
| MNEMOZINE_EXTRACTION__MODEL | openai/qwen2.5 | LiteLLM provider/model id |
| MNEMOZINE_EXTRACTION__API_KEY | not-needed | API key (local servers ignore it) |
| MNEMOZINE_EXTRACTION__TEMPERATURE | 0.0 | extraction wants determinism |
| MNEMOZINE_EXTRACTION__TIMEOUT_S | 120 | per-request timeout (s) |
Embedding endpoint — bge-m3 via Ollama (OQ3)
| Variable | Default | Meaning |
|---|---|---|
| MNEMOZINE_EMBEDDING__BASE_URL | http://localhost:11434 | Ollama base URL |
| MNEMOZINE_EMBEDDING__MODEL | bge-m3 | Ollama embedding model |
| MNEMOZINE_EMBEDDING__DIMENSIONS | 1024 | vector dimensionality (bge-m3 is 1024-d) |
| MNEMOZINE_EMBEDDING__TIMEOUT_S | 60 | per-request timeout (s) |
Claude Code ingestion — CLAUDE_CONFIG_DIR / cleanupPeriodDays (FR-ING-2/R4)
| Variable | Default | Meaning |
|---|---|---|
| MNEMOZINE_INGEST__CLAUDE_CONFIG_DIR | ~/.claude | root of Claude Code config/transcripts (the CLAUDE_CONFIG_DIR override) |
| MNEMOZINE_INGEST__CLEANUP_PERIOD_DAYS | 30 | Claude Code's local-transcript retention (cleanupPeriodDays) before cleanup |
| MNEMOZINE_INGEST__STRIP_TOOL_CALLS | true | strip tool_calls/tool results on ingest (FR-ING-7) |
| MNEMOZINE_INGEST__CHUNK_MAX_CHARS | 8000 | §6.6 chunk.max_size (chars) per episode |
| MNEMOZINE_INGEST__CHUNK_MAX_MESSAGES | 40 | §6.6 chunk.max_size (messages) per episode |
Note on cleanupPeriodDays: Claude Code deletes local transcripts after cleanupPeriodDays (default 30). The ingester runs as a near-real-time watcher plus Stop / PreCompact hooks so nothing is lost before deletion; you may also raise Claude Code's own cleanupPeriodDays as a safety net. The mnemozine setting here records that retention window for the ingest layer.
MCP server (FR-RET-1)
| Variable | Default | Meaning |
|---|---|---|
| MNEMOZINE_MCP_HOST | 127.0.0.1 | MCP standalone bind host (used only when RUN__WEB=false; compose/Helm set 0.0.0.0) |
| MNEMOZINE_MCP_PORT | 8765 | MCP standalone bind port (when web+mcp share a port, MCP is at /mcp on MNEMOZINE_WEB__PORT instead) |
| MNEMOZINE_LOG_LEVEL | INFO | logging level |
WebUI operator console
| Variable | Default | Meaning |
|---|---|---|
| MNEMOZINE_WEB__HOST | 127.0.0.1 | WebUI bind host (set 0.0.0.0 in a container so the port is reachable) |
| MNEMOZINE_WEB__PORT | 8765 | WebUI bind port; also serves MCP at /mcp when web+mcp both run |
| MNEMOZINE_WEB__TOKEN | (unset) | optional static bearer token gating /api; unset = open API on the bound host |
Component run toggles (the all-in-one mnemozine)
These select which components the mnemozine entrypoint runs; all default
true. They are no-ops for the standalone single-component scripts. See
the toggle reference
for the web+mcp single-port behavior and the split-deployment use.
| Variable | Default | Component |
|---|---|---|
| MNEMOZINE_RUN__MCP | true | the MCP server |
| MNEMOZINE_RUN__INGEST | true | the ingest loop |
| MNEMOZINE_RUN__MAINTENANCE | true | the maintenance scheduler |
| MNEMOZINE_RUN__WEB | true | the WebUI / /api |
§6.6 tuning parameters (config, not constants)
These are meant to be calibrated against the eval set rather than fixed by guesswork. The defaults below are the starting values and match the PRD's initial guesses.
Injection budget (FR-RET-3 / FR-RET-5)
| Variable | Default | §6.6 |
|---|---|---|
| MNEMOZINE_INJECT__TOKEN_BUDGET | 500 | inject.token_budget: hard cap; truncate, never overflow |
| MNEMOZINE_INJECT__MAX_PREFERENCE_SNIPPETS | 5 | max top-preference snippets in the index |
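The "truncate, never overflow" rule can be illustrated as follows. This is a sketch under a loud assumption: whitespace-separated words stand in for tokens, whereas the real budget is presumably counted with a model tokenizer. `truncate_to_budget` is a hypothetical name.

```python
def truncate_to_budget(snippets: list[str], token_budget: int = 500) -> list[str]:
    """Keep snippets in order until the budget is spent, cutting the last one."""
    kept: list[str] = []
    used = 0
    for snippet in snippets:
        tokens = snippet.split()  # assumption: one word counts as one token
        remaining = token_budget - used
        if remaining <= 0:
            break
        if len(tokens) > remaining:
            # Cut the snippet that would overflow instead of dropping it.
            kept.append(" ".join(tokens[:remaining]))
            break
        kept.append(snippet)
        used += len(tokens)
    return kept
```

Snippets are assumed to arrive already ranked, so the cut always falls on the least important material.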
Cross-reference engine (FR-RET-6)
| Variable | Default | §6.6 |
|---|---|---|
| MNEMOZINE_CROSSREF__RELEVANCE_THRESHOLD | 0.8 | crossref.relevance_threshold: start high (precision over recall) |
| MNEMOZINE_CROSSREF__MAX_SUGGESTIONS | 2 | crossref.max_suggestions (1–2) |
| MNEMOZINE_CROSSREF__VECTOR_FALLBACK_THRESHOLD | 0.75 | min cosine sim for the FR-RET-6 vector fallback (distinct from the surfacing threshold) |
Maintenance / dedup / decay (FR-MNT-*)
| Variable | Default | §6.6 |
|---|---|---|
| MNEMOZINE_MAINTENANCE__DEDUP_EQUIVALENCE_THRESHOLD | 0.9 | dedup.equivalence_threshold: reinforce-vs-add |
| MNEMOZINE_MAINTENANCE__EDGE_WEIGHT_FLOOR | 0.1 | maintenance.edge_weight_floor: low-weight edge pruning |
| MNEMOZINE_MAINTENANCE__MAX_NODE_DEGREE | 64 | maintenance.max_node_degree: traversal-bound cap |
| MNEMOZINE_MAINTENANCE__CONTRADICTION_CANDIDATE_CAP | 5 | FR-MNT-1 supersede-LLM candidate cap |
| MNEMOZINE_MAINTENANCE__DECAY_HALF_LIFE_DAYS | 30 | decay.half_life (days) |
| MNEMOZINE_MAINTENANCE__DECAY_ARCHIVE_AFTER_DAYS | 90 | decay.archive_after: hot→archive demotion (days unused) |
| MNEMOZINE_MAINTENANCE__CRON | 0 3 * * * | scheduled maintenance cadence (FR-MNT-5) |
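To make the two decay knobs concrete, here is a sketch that assumes the usual exponential half-life form, weight × 0.5^(days / half_life). The PRD may define decay differently; both function names and the inclusive archive comparison are assumptions.

```python
def decayed_weight(weight: float, days_unused: float,
                   half_life_days: float = 30.0) -> float:
    """Exponential decay: the weight halves every half_life_days (assumed form)."""
    return weight * 0.5 ** (days_unused / half_life_days)

def should_archive(days_unused: float, archive_after_days: float = 90.0) -> bool:
    """Demote hot -> archive once the unit has gone unused long enough."""
    # Assumption: the boundary is inclusive.
    return days_unused >= archive_after_days
```

Under these assumptions and the defaults above, a unit left unused for 90 days has gone through three half-lives and reaches the archive threshold at one eighth of its original weight.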
Retrieval (FR-RET-2)
| Variable | Default | §6.6 |
|---|---|---|
| MNEMOZINE_RETRIEVAL__P95_LATENCY_TARGET_MS | 500 | retrieval.p95_latency_target: baseline set in Phase 1 |
| MNEMOZINE_RETRIEVAL__TOP_K | 10 | default results per scoped query |
| MNEMOZINE_RETRIEVAL__NEIGHBORHOOD_HOPS | 1 | FR-RET-2 entity-neighborhood traversal depth |
In Helm these same knobs live under .Values.tuning (camelCase) and render into
the ConfigMap, e.g.:
helm upgrade mz deploy/helm/mnemozine \
--set tuning.crossref.relevanceThreshold=0.85 \
--set tuning.inject.tokenBudget=400 \
--set tuning.maintenance.cron='0 4 * * *'
Registering the Claude Code hooks
Claude Code invokes a hook as a subprocess, passing a JSON payload on stdin
and reading the hook's response (JSON hookSpecificOutput) from stdout. The
four hook entrypoints are installed as console scripts by the package:
| Hook event | Script | Does |
|---|---|---|
| SessionStart | mnemozine-hook-session-start | inject the compact, ~500-token memory index (FR-RET-3) |
| UserPromptSubmit | mnemozine-hook-user-prompt-submit | inject finer-grained prompt-scoped memory mid-session (FR-RET-5) |
| Stop | mnemozine-hook-stop | flush the session's chunk into ingestion at session end (FR-ING-6) |
| PreCompact | mnemozine-hook-pre-compact | flush the chunk before compaction (FR-ING-6) |
Register all four in Claude Code's settings.json hooks block. Each entry is
a command-type hook; the four entrypoints read the hook JSON from stdin and
take no command-line arguments, so the command is just the path to the
installed console script (no flags). SessionStart / UserPromptSubmit /
Stop / PreCompact are not tool-matched events, so no matcher is needed.
Drop this into ~/.claude/settings.json (user-global) or a project's
.claude/settings.json. Use the absolute path to the installed scripts —
i.e. the path that which mnemozine-hook-session-start prints inside the
environment where you ran pip install -e . (typically …/.venv/bin/…):
{
"hooks": {
"SessionStart": [
{
"hooks": [
{ "type": "command", "command": "/abs/path/to/.venv/bin/mnemozine-hook-session-start" }
]
}
],
"UserPromptSubmit": [
{
"hooks": [
{ "type": "command", "command": "/abs/path/to/.venv/bin/mnemozine-hook-user-prompt-submit" }
]
}
],
"Stop": [
{
"hooks": [
{ "type": "command", "command": "/abs/path/to/.venv/bin/mnemozine-hook-stop" }
]
}
],
"PreCompact": [
{
"hooks": [
{ "type": "command", "command": "/abs/path/to/.venv/bin/mnemozine-hook-pre-compact" }
]
}
]
}
}
If the scripts are on PATH for the shell Claude Code spawns hooks in, you may
use the bare names ("command": "mnemozine-hook-session-start"), but an absolute
path is the robust default since the hook subprocess does not inherit your
interactive shell's activated venv. Resolve the four absolute paths at once with:
for h in session-start user-prompt-submit stop pre-compact; do
command -v "mnemozine-hook-$h"
done
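If you prefer to generate the hooks block rather than paste paths by hand, a small script can resolve the four entrypoints and print the JSON. This is a convenience sketch, not something shipped with the package; `build_hooks_block` is a hypothetical name, and it falls back to the bare script name when a script is not on PATH.

```python
import json
import shutil

# Hook event -> console script, as listed in the table above.
HOOKS = {
    "SessionStart": "mnemozine-hook-session-start",
    "UserPromptSubmit": "mnemozine-hook-user-prompt-submit",
    "Stop": "mnemozine-hook-stop",
    "PreCompact": "mnemozine-hook-pre-compact",
}

def build_hooks_block(resolve=shutil.which) -> dict:
    """Build the settings.json "hooks" block with resolved script paths."""
    block = {}
    for event, script in HOOKS.items():
        # Use the resolved absolute path, or the bare name if not found.
        command = resolve(script) or script
        block[event] = [{"hooks": [{"type": "command", "command": command}]}]
    return {"hooks": block}

if __name__ == "__main__":
    print(json.dumps(build_hooks_block(), indent=2))
```

Run it inside the activated venv so that the resolved paths point at the environment where the package is installed, then merge the output into settings.json.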
Notes:
- The hooks are fail-safe: an empty/invalid payload, an unwired backend, or any internal error yields an empty injection (or no-op flush) rather than raising; a hook must never break the session.
- Injected memory is wrapped in <mnemozine-memory>…</mnemozine-memory> delimiters so the model treats it as advisory background, and is truncated to inject.token_budget (~500 tokens).
- The hooks call into the same wired retriever + ingest service the mnemozine-ingest process owns; running that daemon installs the loader the hooks use. The Stop / PreCompact flush is idempotent: flushing a session the watcher already tailed is a no-op (de-dup on the FR-ING-5 content hash).
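The fail-safe behavior in the notes above can be sketched as a hook body that swallows every error and always exits cleanly. This is not the package's implementation: `run_hook` and the `build_index` callable are hypothetical, and the output field names (hookEventName, additionalContext) are assumed from Claude Code's hook output format for SessionStart.

```python
import io
import json
import sys

def run_hook(stdin=sys.stdin, stdout=sys.stdout,
             build_index=lambda payload: "") -> int:
    """Read the hook payload, emit an injection, and never raise."""
    context = ""
    try:
        payload = json.loads(stdin.read() or "{}")
        context = build_index(payload)  # hypothetical retriever call
    except Exception:
        # Invalid payload or backend failure: fall back to an empty injection.
        context = ""
    if context:
        context = f"<mnemozine-memory>{context}</mnemozine-memory>"
    json.dump({"hookSpecificOutput": {"hookEventName": "SessionStart",
                                      "additionalContext": context}}, stdout)
    return 0  # always succeed so the session is never broken

if __name__ == "__main__":
    out = io.StringIO()
    run_hook(io.StringIO("not json"), out)
    print(out.getvalue())
```

The broad `except Exception` is deliberate in this sketch: for a hook, a silent empty injection is preferable to a failed session start.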
Registering the MCP server (the recall tool)
The hooks give Claude Code memory proactively (session start + prompt
submit). To let the model also pull memory on demand mid-session, register
the same mnemozine-mcp server with Claude Code. It exposes recall(query, scope=None, top_k=10) plus the two index tools.
For a local Claude Code, run the MCP server over stdio (it speaks stdio by default). Add it with the CLI:
claude mcp add --transport stdio mnemozine -- mnemozine-mcp
…or declare it by hand in ~/.claude.json (user scope) / .mcp.json (project
scope). Use the absolute path to the installed script and point it at the
same FalkorDB the hooks write to:
{
"mcpServers": {
"mnemozine": {
"command": "/abs/path/to/.venv/bin/mnemozine-mcp",
"args": [],
"env": {
"MNEMOZINE_FALKORDB__URL": "redis://localhost:6379"
}
}
}
}
If you are instead running the server over the network, register it as an HTTP
server instead of spawning a fresh stdio process. In the consolidated default the
all-in-one mnemozine container already serves MCP over streamable-http at
/mcp on the published :8765 (alongside the WebUI) — point Claude Code at
that path:
claude mcp add --transport http mnemozine http://localhost:8765/mcp
(If you instead run the standalone mnemozine-mcp over the network, use
mnemozine-mcp --transport streamable-http (or sse) bound to
MNEMOZINE_MCP_HOST / MNEMOZINE_MCP_PORT — its bundled command otherwise runs
the default stdio transport, so add the flag to expose HTTP.)
Same store, both ways. The hooks and the MCP server must read the same MNEMOZINE_FALKORDB__URL: if hooks write to one FalkorDB and the MCP server reads from another, memory will not flow. Keep both pulling the URL from the same .env or environment.
Pointing OpenAI-format agents and Hermes at the gateway
Capture happens through the LiteLLM OpenAI-format gateway with a registered
logging callback. The reference proxy config is
mnemozine/ingestion/gateway/config.yaml
(docker-compose uses deploy/litellm.config.yaml).
Phase-2, default-off. Both the gateway (FR-ING-3) and Hermes (FR-ING-4) sources are off by default, so a fresh install only runs the Claude Code watcher. Turn them on with the MNEMOZINE_INGEST__ENABLE_* flags below; the ingest loop (build_ingest_sources() in mnemozine/ingestion/loop.py) reads them and fans every enabled source into one serialized consumer. The gateway callback uses an in-process asyncio.Queue, so it must run in the same process as mnemozine-ingest.
OpenAI-format agents (FR-ING-3)
- Enable the gateway source on the mnemozine-ingest process:

  MNEMOZINE_INGEST__ENABLE_GATEWAY=true
  MNEMOZINE_INGEST__GATEWAY_DEFAULT_PROJECT=my-project   # fallback project
  MNEMOZINE_INGEST__GATEWAY_QUEUE_MAX=10000              # in-process buffer

- Run the gateway:

  litellm --config mnemozine/ingestion/gateway/config.yaml --port 4000

  (docker-compose / Helm run the litellm service for you.) The callback is registered in litellm_settings.callbacks as the dotted path mnemozine.ingestion.gateway.litellm_register.gateway_callback (LiteLLM resolves it by string lookup at runtime, so the path must match exactly). The proxy's own upstream models come from the yaml (os.environ/MNEMOZINE_GATEWAY_QWEN_BASE_URL, …_QWEN_API_KEY):

  model_list:
    - model_name: qwen
      litellm_params:
        model: openai/qwen2.5
        api_base: os.environ/MNEMOZINE_GATEWAY_QWEN_BASE_URL
        api_key: os.environ/MNEMOZINE_GATEWAY_QWEN_API_KEY
  litellm_settings:
    callbacks: mnemozine.ingestion.gateway.litellm_register.gateway_callback

- Point any operator-controlled, repointable OpenAI-format agent at the gateway by setting its OpenAI base_url to http://<gateway-host>:4000/v1 (port 4000 is the LiteLLM default; --port overrides it) and any api_key the proxy expects. Every completion that agent makes is then captured and emitted as common-schema events (source=openai), with tool_calls stripped (FR-ING-7).

  To route a turn to a specific project/session, thread it through LiteLLM's metadata dict; there is no request-path routing otherwise:

  metadata={"mnemozine_project": "my-project", "mnemozine_session_id": "sess-123"}

  The callback resolves project from mnemozine_project → project → the configured default, and session_id from mnemozine_session_id → session_id → user → the LiteLLM call id.

- The gateway's own upstream (the model it proxies to) is the local Qwen by default; swap to a cloud backend by editing the model_list api_base / api_key (a single line); capture still works.
Explicit non-capability (FR-ING-3): third-party apps that cannot be repointed at the gateway base_url (ChatGPT desktop, Cursor, …) are not captured by this path.
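The project/session fallback chain used by the gateway callback can be written out as a short sketch. The function below is illustrative only: `resolve_routing` is a hypothetical name, and passing `user` and `call_id` as plain arguments is an assumption about how the callback obtains them.

```python
from typing import Any, Mapping, Optional

def _first(*candidates: Any) -> Optional[str]:
    """Return the first non-empty candidate as a string."""
    for candidate in candidates:
        if candidate:
            return str(candidate)
    return None

def resolve_routing(metadata: Mapping[str, Any], call_id: str,
                    default_project: str,
                    user: Optional[str] = None) -> tuple[str, str]:
    """Resolve (project, session_id) using the documented fallback order."""
    project = _first(metadata.get("mnemozine_project"),
                     metadata.get("project"),
                     default_project)
    session_id = _first(metadata.get("mnemozine_session_id"),
                        metadata.get("session_id"),
                        user,
                        call_id)
    return project, session_id
```

A request with empty metadata therefore lands in the configured default project, keyed by the LiteLLM call id, which means each such call becomes its own session.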
Hermes (FR-ING-4)
Hermes is the self-hosted Nous Research Hermes agent on a homelab VM. Two paths:
- Preferred: direct instrumentation. Enable the Hermes source:
MNEMOZINE_INGEST__ENABLE_HERMES=true
MNEMOZINE_INGEST__HERMES_DEFAULT_PROJECT=hermes # fallback project
MNEMOZINE_INGEST__HERMES_QUEUE_MAX=10000 # in-process buffer
Then instrument the VM to push each completed turn into the HermesAdapter (mnemozine.ingestion.hermes.HermesAdapter, an IngestSource), which normalizes Hermes-native payloads into the common schema (source=hermes), stripping tool_calls:
hermes.feed(payload) # sync, returns the emitted IngestEvent list
await hermes.afeed(payload) # async, awaits queue space
The adapter is field-name tolerant: conversation_id / session_id / id for the session, messages / turns for the turn list, content / text for text, timestamp / created_at for time. Recorded turns replay via backfill for the Phase-1 historical import.
- Fallback: front it with a gateway. If direct instrumentation is impractical, enable the gateway source and run a second LiteLLM proxy whose upstream api_base is Hermes' OpenAI-compatible endpoint and whose callback references mnemozine.ingestion.gateway.litellm_register.hermes_gateway_callback (note: not gateway_callback, which stamps source=openai):
MNEMOZINE_INGEST__ENABLE_GATEWAY=true
MNEMOZINE_INGEST__HERMES_BASE_URL=https://hermes-agent.nousresearch.com/
MNEMOZINE_INGEST__HERMES_API_KEY=<api-key-if-needed>
model_list:
  - model_name: hermes
    litellm_params:
      model: openai/hermes
      api_base: https://hermes-agent.nousresearch.com/v1
      api_key: os.environ/MNEMOZINE_HERMES_API_KEY
litellm_settings:
  callbacks: mnemozine.ingestion.gateway.litellm_register.hermes_gateway_callback
The Hermes variant is sketched (commented) at the bottom of gateway/config.yaml.
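To make the field-name tolerance of the direct path concrete, here is a minimal sketch of alias-based normalization. It is not the HermesAdapter implementation: the helper names, the dict-shaped output, and the role field are assumptions for illustration; only the alias lists and source=hermes come from the description above.

```python
from typing import Any, Dict, List, Mapping, Sequence

_MISSING = object()


def first_key(payload: Mapping[str, Any], names: Sequence[str], default: Any = None) -> Any:
    """Return the value of the first alias that is present and not None."""
    for name in names:
        value = payload.get(name, _MISSING)
        if value is not _MISSING and value is not None:
            return value
    return default


def normalize_turns(payload: Mapping[str, Any]) -> List[Dict[str, Any]]:
    """Hypothetical normalizer: map a Hermes-style payload to flat event dicts."""
    session = first_key(payload, ("conversation_id", "session_id", "id"))
    turns = first_key(payload, ("messages", "turns"), default=[])
    events = []
    for turn in turns:
        # Only whitelisted fields are copied, so tool_calls never reach the output.
        events.append(
            {
                "source": "hermes",
                "session_id": session,
                "role": turn.get("role"),  # assumed field, not named in the README
                "text": first_key(turn, ("content", "text"), default=""),
                "timestamp": first_key(turn, ("timestamp", "created_at")),
            }
        )
    return events
```

Copying an explicit whitelist of fields, rather than deleting known-bad ones, is one simple way to guarantee that tool_calls are stripped regardless of how the payload names them.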
Reading memory back
All agents — Claude Code and OpenAI/Hermes alike — read from the single MCP
server (mnemozine-mcp). It exposes:
- recall(query, scope=None, top_k=10): on-demand consolidated recall (FR-RET-4). scope is optional; omit it for current project + global, or pass global / project:<id> / a bare project id.
- session_start_index(...): the FR-RET-3 compact index as a tool (so non-hook agents can request it too).
- mid_session_index(prompt, project=None): the FR-RET-5 finer-grained index.
Transports: stdio (Claude Code local default) and streamable-http / sse
(networked OpenAI/Hermes agents), selected with mnemozine-mcp --transport ....
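The accepted forms of the recall() scope argument can be illustrated with a small normalizer. This is a sketch of one plausible interpretation, not the server's code: normalize_scope and its current_project parameter are hypothetical names, and the ordering of the returned list is an assumption.

```python
from typing import List, Optional


def normalize_scope(scope: Optional[str], current_project: str) -> List[str]:
    """Hypothetical expansion of a recall() scope argument into concrete scopes."""
    if scope is None:
        # Omitted scope: current project plus global preferences.
        return [f"project:{current_project}", "global"]
    if scope == "global" or scope.startswith("project:"):
        # Already in canonical form.
        return [scope]
    # A bare project id is treated as shorthand for project:<id>.
    return [f"project:{scope}"]
```

For example, normalize_scope("billing", "mnemozine") yields a single project:billing scope, while omitting scope searches the current project and global together.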
Eval harness and bootstrapping the eval set
The §9 eval harness is the mnemozine-eval console script. It runs offline
against a committed gold-set fixture and a packaged in-memory fake store, so it
needs no FalkorDB/Ollama/Qwen.
mnemozine-eval run # every §9 metric once; exits non-zero on failure
mnemozine-eval run -x 10 # same, with a 10x distractor inflation
mnemozine-eval scaling # headline: injection precision at 1x/10x/100x
mnemozine-eval show-gold # summarize the gold set
scaling is the headline §9 assertion — that precision does not decline as
the store is inflated with synthetic plausible-but-irrelevant distractors
(--levels 1,10,100, --tolerance for allowed drop). It exits non-zero if
precision declines.
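The scaling assertion amounts to comparing precision at each inflation level against the uninflated baseline. The sketch below shows one way such a check could be written; the function names and the exact meaning given to the tolerance (maximum allowed absolute drop from the lowest level) are assumptions, not the harness's actual logic.

```python
from typing import Dict


def precision_holds(precision_by_level: Dict[int, float], tolerance: float = 0.0) -> bool:
    """Return True if no level drops more than `tolerance` below the baseline.

    The baseline is taken to be the smallest inflation level (e.g. 1x).
    """
    baseline = precision_by_level[min(precision_by_level)]
    return all(baseline - p <= tolerance for p in precision_by_level.values())


def exit_code(precision_by_level: Dict[int, float], tolerance: float = 0.0) -> int:
    """Map the check to a process exit status: 0 on success, 1 on decline."""
    return 0 if precision_holds(precision_by_level, tolerance) else 1
```

Under this reading, precision that improves at higher levels always passes, and a small drop passes only when it fits inside the tolerance.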
Bootstrapping the eval set (operator task)
The eval set encodes the operator's own preferences across their own projects, so only the operator can label it (PRD §9: this is an operator deliverable, ≈40 cases, ~2–3 hrs). The flow is two commands with a manual review step between them:
# 1. Auto-propose extracted candidates and write a Markdown review sheet.
mnemozine-eval bootstrap-propose --out eval_review.md
# 2. Edit eval_review.md by hand: tick "- [x] keep" on candidates to keep,
# optionally correcting the proposed type/scope (human-in-the-loop, R1).
# 3. Fold the labeled sheet into a committed gold set.
mnemozine-eval bootstrap-finish --in eval_review.md --out mnemozine/evals/fixtures/gold_set.json
bootstrap-finish reads the ticked candidates back, builds a GoldSet (seed
memories + classifier cases), and writes it to the gold-set JSON (default the
committed fixture at mnemozine/evals/fixtures/gold_set.json). Commit that file
and run mnemozine-eval run on every change and on a schedule.
The offline bootstrap-propose uses a tiny demo backlog so the command is
exercisable out of the box; the integration pass can point it at the real
IngestSource.backfill + Extractor to propose from your actual historical
import.
Operations
Maintenance schedule (FR-MNT-5)
Maintenance is a separate, idempotent, repeatable pass (consolidate → resolve entities → decay/archive → audit, in that order):
mnemozine-maintenance run # run the full pass once and exit
mnemozine-maintenance serve # run on the configured cron until interrupted
- The cron cadence is MNEMOZINE_MAINTENANCE__CRON (default 0 3 * * *); the serve mode uses APScheduler.
- In docker-compose the all-in-one mnemozine container runs the maintenance scheduler continuously as its MNEMOZINE_RUN__MAINTENANCE component (set the toggle false to disable it, e.g. on a remote ingest-only node).
- In Helm it is a long-lived Deployment by default; set maintenance.asCronJob=true to render a Kubernetes CronJob (schedule from maintenance.cronSchedule, defaulting to tuning.maintenance.cron).
- Each job is isolated: a failure in one is recorded as a note but does not abort the rest of the pass.
- Demotion to the archive tier is governed by decay.archive_after (DECAY_ARCHIVE_AFTER_DAYS, default 90 days unused); by default the system archives and never hard-deletes.
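Job isolation within a pass can be sketched as a loop that catches each job's failure and records it instead of propagating it. This is an illustrative pattern only: run_pass, the Job alias, and the note format are hypothetical and do not describe the actual maintenance module.

```python
from typing import Callable, List, Sequence, Tuple

# A job is a (name, zero-argument callable) pair; the shape is assumed for this sketch.
Job = Tuple[str, Callable[[], None]]


def run_pass(jobs: Sequence[Job]) -> List[str]:
    """Run jobs in order; a failing job adds a note and the pass continues."""
    notes: List[str] = []
    for name, job in jobs:
        try:
            job()
        except Exception as exc:
            # Record the failure as a note rather than aborting later jobs.
            notes.append(f"{name}: {type(exc).__name__}: {exc}")
    return notes
```

If the entity-resolution job raises, the decay and audit jobs still run and the returned notes carry the failure for later inspection.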
Backing up the FalkorDB volume
FalkorDB is the single source of truth (graph and vectors). Its data lives at
/data:
- docker-compose: the named volume falkordb-data (mounted at /data).
- Helm: the StatefulSet's data PVC (the volumeClaimTemplate, mounted at /data).
FalkorDB speaks the Redis protocol, so back up the on-disk RDB. Trigger a save then copy the dump out:
# docker-compose — trigger a save, then copy /data out of the falkordb container.
# (The named volume is <project>_falkordb-data; the project name defaults to the
# compose file's directory, so `docker compose ... config --volumes` /
# `docker inspect` resolve the exact volume name if you back it up by volume.)
docker compose -f deploy/docker-compose.yml exec falkordb redis-cli SAVE
docker compose -f deploy/docker-compose.yml cp falkordb:/data ./falkordb-backup-$(date +%F)
# kubernetes (StatefulSet pod <release>-mnemozine-falkordb-0, e.g. mz-mnemozine-falkordb-0)
kubectl -n mnemozine exec mz-mnemozine-falkordb-0 -- redis-cli SAVE
kubectl -n mnemozine cp mz-mnemozine-falkordb-0:/data ./falkordb-backup-$(date +%F)
If the FalkorDB password is set, pass -a "$MNEMOZINE_FALKORDB__PASSWORD" to
redis-cli. Restore by stopping FalkorDB, replacing the contents of the volume /
PVC with a backed-up /data, and restarting. Snapshotting the underlying volume
(or PVC VolumeSnapshot) while FalkorDB is quiesced is an equivalent approach.
Superseded/decayed memories are kept (archive tier) rather than deleted, so the store grows slowly over time; size the FalkorDB volume (compose volume / Helm
falkordb.persistence.size, default 10Gi) and Ollama/Qwen model volumes accordingly.
Health checks
- The all-in-one mnemozine container exposes an HTTP surface on :8765 (the WebUI/API, with MCP mounted at /mcp) whenever RUN__WEB and/or RUN__MCP are on; compose/Helm probe it via TCP/HTTP.
- An ingest-only or maintenance-only process (e.g. the split-deployment node, or a component-toggled standalone script) has no HTTP surface; liveness is "the watcher/scheduler process is still running" (pgrep).
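For ad-hoc checks outside compose/Helm, a TCP-level probe of the HTTP surface can be sketched with the standard library. The function name tcp_alive and the 2-second default timeout are assumptions; the only detail taken from the text above is that the all-in-one container listens on port 8765.

```python
import socket


def tcp_alive(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers connection refused, unreachable host, and timeouts.
        return False


if __name__ == "__main__":
    # Example: probe the all-in-one container's HTTP surface on a local host.
    print("up" if tcp_alive("127.0.0.1", 8765) else "down")
```

A successful connect only shows that something is listening; it does not confirm that the WebUI/API or the /mcp mount is answering requests correctly.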
Configuration reference
The single source of truth for config is mnemozine/config.py; the full env-var
list (with the MNEMOZINE_ prefix and __ nesting) is
.env.example. Deployment specifics — image overrides, Helm
values.yaml knobs, the MZ_COMPOSE_* compose overrides — are in
deploy/README.md.