journal-rag
Source-control-friendly hybrid retrieval over team markdown journals. Heading-chunked BM25 + local vector embeddings fused via Reciprocal Rank Fusion (RRF), with regex as an escape hatch. Index built on startup with an optional gitignored JSON cache.
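Embeddings run locally via @huggingface/transformers (default model: Xenova/all-MiniLM-L6-v2), so no API keys are needed and no journal text is sent to an external service. The only network access is the one-time model download on first run.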
Each consuming repo commits journal-rag.config.json and markdown under docs/journal/ (or other configured folders). This package is the shared engine.
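As a rough illustration of heading chunking, the sketch below splits one markdown file into chunks at ATX headings. The Chunk shape and the chunkByHeading name are assumptions made for this example; the actual index format used by journal-rag is not documented here.

```typescript
// Hypothetical chunk shape, for illustration only.
interface Chunk {
  file: string;
  heading: string; // heading text, or "" for text before the first heading
  text: string;    // body lines that follow the heading
}

// Split one markdown file into chunks at ATX headings (#, ##, ...).
function chunkByHeading(file: string, markdown: string): Chunk[] {
  const chunks: Chunk[] = [];
  let heading = "";
  let body: string[] = [];
  let inFence = false;

  const flush = (): void => {
    const text = body.join("\n").trim();
    if (heading !== "" || text !== "") chunks.push({ file, heading, text });
    body = [];
  };

  for (const line of markdown.split(/\r?\n/)) {
    // Track fenced code so "# comment" lines inside it are not headings.
    if (/^\s*(`{3}|~{3})/.test(line)) inFence = !inFence;
    const m = inFence ? null : /^(#{1,6})\s+(.*)$/.exec(line);
    if (m) {
      flush();
      heading = (m[2] ?? "").trim();
    } else {
      body.push(line);
    }
  }
  flush();
  return chunks;
}
```

Each chunk would then be indexed once for BM25 and once as an embedding, so a hit points at a specific section rather than a whole file.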
Per-repo config
Create journal-rag.config.json at the repo root:
{
  "sources": ["docs/journal"],
  "cachePath": ".journal-rag/index.json",
  "embeddingModel": "Xenova/all-MiniLM-L6-v2"
}
| Field | Required | Default | Description |
|---|---|---|---|
| sources | yes | (none) | Directories containing markdown journals |
| cachePath | no | .journal-rag/index.json | BM25 chunk index cache path |
| embeddingModel | no | Xenova/all-MiniLM-L6-v2 | Hugging Face model ID for local embeddings |
The vector cache (vectors.json) is stored in the same directory as cachePath.
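The defaults and the vector cache location can be sketched as follows. resolveConfig and vectorCachePath are hypothetical helper names, and rejecting an empty sources array is an assumption; only the field names, defaults, and the vectors.json filename come from this README.

```typescript
import * as path from "node:path";

interface JournalRagConfig {
  sources: string[];
  cachePath: string;
  embeddingModel: string;
}

// Defaults taken from the field table above.
const DEFAULTS = {
  cachePath: ".journal-rag/index.json",
  embeddingModel: "Xenova/all-MiniLM-L6-v2",
};

// Hypothetical helper: validate parsed JSON and fill in defaults.
function resolveConfig(raw: Partial<JournalRagConfig>): JournalRagConfig {
  // Assumption: an empty "sources" array is treated as missing.
  if (!Array.isArray(raw.sources) || raw.sources.length === 0) {
    throw new Error('journal-rag.config.json: "sources" is required');
  }
  return {
    sources: raw.sources,
    cachePath: raw.cachePath ?? DEFAULTS.cachePath,
    embeddingModel: raw.embeddingModel ?? DEFAULTS.embeddingModel,
  };
}

// vectors.json sits in the same directory as the BM25 cache file.
function vectorCachePath(config: JournalRagConfig): string {
  return path.join(path.dirname(config.cachePath), "vectors.json");
}
```

With the defaults, both cache files land under .journal-rag/, which is why a single .gitignore entry covers them.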
Add to .gitignore:
.journal-rag/
Build & install (once per machine)
cd c:/repos/journal-rag
npm install # runs prepare → build
npm link # puts journal + journal-mcp on your PATH
npm link registers two global commands:
| Command | What it runs |
|---|---|
| journal | CLI (search, list, get, …) |
| journal-mcp | MCP stdio server (for editor config) |
Re-run npm run build (or npm link again) after pulling server changes.
Alternative to link: npm install -g . from this repo (same effect).
CLI (any teammate)
From a repo root with config:
journal search "HttpFacade singleton" # hybrid BM25 + vector (default)
journal search "HttpFacade singleton" --bm25 # BM25-only (no embedding)
journal list --filter dialog
journal get docs/journal/2026-04-21_vapp-http-facade-and-singleton-sweep.md
journal index --rebuild
After running npm link in the journal-rag repo, the journal command is available on PATH from any directory.
Set JOURNAL_RAG_WORKSPACE to an absolute repo root only when you must run the CLI from a subdirectory.
The first run downloads the embedding model (~80 MB) to the Hugging Face cache directory. Subsequent runs load from cache.
MCP tools
| Tool | Purpose |
|---|---|
| search_journal | Hybrid BM25 + vector search with RRF fusion (query, k). Falls back to BM25-only if vector index is unavailable. |
| get_entry | Full file by path or filename |
| list_entries | Browse metadata (filter optional) |
| search_regex | Exact / path / symbol lookup |
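To make the search_journal behavior concrete, here is a minimal sketch of Reciprocal Rank Fusion with a BM25-only fallback. It uses the standard RRF formula with the k = 60 constant from the design notes; the function names and the choice to represent rankings as arrays of chunk IDs are assumptions, not the server's actual code. Note that the RRF constant k is unrelated to the tool's k parameter (called topK here).

```typescript
// A ranking is a list of chunk IDs, best match first.
type Ranking = string[];

// Reciprocal Rank Fusion: score(d) = sum over rankings of 1 / (k + rank(d)),
// where rank starts at 1. Only ranks are used, never raw scores.
function rrfFuse(rankings: Ranking[], k = 60): string[] {
  const scores = new Map<string, number>();
  for (const ranking of rankings) {
    ranking.forEach((id, index) => {
      scores.set(id, (scores.get(id) ?? 0) + 1 / (k + index + 1));
    });
  }
  return Array.from(scores.entries())
    .sort((a, b) => b[1] - a[1])
    .map(([id]) => id);
}

// Hypothetical wrapper: fuse when a vector ranking exists, else BM25 only.
function hybridSearch(
  bm25: Ranking,
  vector: Ranking | null,
  topK: number,
): string[] {
  const fused = vector === null ? bm25 : rrfFuse([bm25, vector]);
  return fused.slice(0, topK);
}
```

A chunk that appears in both rankings collects two reciprocal terms, so agreement between BM25 and the vector index pushes it toward the top even when neither ranks it first.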
Editor setup
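Use the stdio transport: the editor spawns the server as a child process, either through the journal-mcp command or by running Node on dist/server.js.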
Put MCP config in the workspace, not your user profile
The server resolves journal-rag.config.json by walking up from its working directory. That file lives at each consuming repo's root (next to docs/journal/), not in journal-rag itself.
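The walk-up lookup can be pictured with the sketch below. It assumes the search stops at the first directory containing the config file and gives up at the filesystem root; findWorkspaceRoot is a hypothetical name, and the real server may apply extra rules.

```typescript
import * as fs from "node:fs";
import * as path from "node:path";

const CONFIG_NAME = "journal-rag.config.json";

// Walk from startDir toward the filesystem root. Return the first directory
// that contains the config file, or null if none does.
function findWorkspaceRoot(startDir: string): string | null {
  let dir = path.resolve(startDir);
  for (;;) {
    if (fs.existsSync(path.join(dir, CONFIG_NAME))) return dir;
    const parent = path.dirname(dir);
    if (parent === dir) return null; // reached the root without a match
    dir = parent;
  }
}
```

Started from a home directory or an editor install directory, a search like this never passes through the consuming repo, which is why the spawn cwd matters.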
If you add the server to a global or user-level editor profile, the spawn cwd is usually wrong (home directory, editor install directory, whichever folder was open last), so the server cannot find the config. Hardcoding "cwd": "C:/repos/my-repo" does not fix this: it breaks the moment you open a second repo workspace.
Do this instead: commit workspace-level MCP config inside each repo that has journals. Teammates run npm link once (see above) so journal-mcp is on PATH — no machine-specific paths in the committed JSON.
Cursor
.cursor/mcp.json at the repo root (e.g. my-repo/.cursor/mcp.json) — safe to commit:
{
  "mcpServers": {
    "journal": {
      "command": "journal-mcp",
      "cwd": "${workspaceFolder}",
      "env": {
        "JOURNAL_RAG_WORKSPACE": "${workspaceFolder}"
      }
    }
  }
}
${workspaceFolder} resolves to the repo you opened. journal-mcp comes from npm link in the journal-rag repo.
VS Code (Copilot agent mode)
Same idea: .vscode/mcp.json in the repo, not User settings:
{
  "servers": {
    "journal": {
      "type": "stdio",
      "command": "journal-mcp",
      "cwd": "${workspaceFolder}"
    }
  }
}
JetBrains AI Assistant / Junie
Configure MCP at project scope (.idea / project settings), not the IDE default profile. Open the repo as the project root. Command: journal-mcp (after npm link).
If journal-mcp is not found
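Ensure npm's global bin directory is on your PATH. Run npm prefix -g to print the global prefix: executables live in <prefix>/bin on macOS and Linux, and directly in the prefix on Windows (usually %APPDATA%\npm). The older npm bin -g command was removed in npm 9 and only works on npm 8 and earlier. Then re-run npm link from journal-rag. Fallback for a single machine only: "command": "node", "args": ["<absolute-path>/journal-rag/dist/server.js"].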
Fallback
If an editor cannot set cwd per workspace, set env JOURNAL_RAG_WORKSPACE to the absolute path of the consuming repo root in that workspace's MCP config.
Design notes
- Corpus is small (~tens of files); BM25 over heading chunks matches how journals are written.
- Vector embeddings (local, via Transformers.js) add semantic recall for paraphrased or conceptual queries.
- Reciprocal Rank Fusion (RRF, k=60) merges BM25 and vector rankings without needing score normalization.
- Index caches are optional and gitignored; markdown in git is the source of truth.
- Vector cache is incremental — only new/changed chunks are re-embedded on rebuild.
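One way to get the incremental behavior in the last note is to key cached vectors by a hash of the chunk text. The sketch below is an assumption about how this could work, not a description of the actual vectors.json layout; VectorCache, hashChunk, and planRefresh are illustrative names.

```typescript
import { createHash } from "node:crypto";

// Hypothetical cache shape: content hash -> embedding vector.
type VectorCache = Record<string, number[]>;

function hashChunk(text: string): string {
  return createHash("sha256").update(text, "utf8").digest("hex");
}

// Decide which chunks can reuse a cached vector and which must be embedded.
// Cache entries for chunks that no longer exist are dropped from `reused`.
function planRefresh(
  chunks: string[],
  cache: VectorCache,
): { reused: VectorCache; toEmbed: string[] } {
  const reused: VectorCache = {};
  const toEmbed: string[] = [];
  for (const text of chunks) {
    const hash = hashChunk(text);
    const cached = cache[hash];
    if (cached !== undefined) reused[hash] = cached;
    else toEmbed.push(text);
  }
  return { reused, toEmbed };
}
```

Only the texts in toEmbed would be passed to the embedding model, so editing one section of one journal costs one embedding call's worth of work on rebuild.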