Mnemo
Extended memory for AI assistants via Gemini context caching.
Mnemo (Greek: memory) gives AI assistants like Claude access to large codebases, documentation sites, PDFs, and more by leveraging Gemini's 1M token context window and context caching features.
Why Mnemo?
Instead of complex RAG pipelines with embeddings and retrieval, Mnemo takes a simpler approach (see the sketch below):
- Load your entire codebase into Gemini's context cache
- Query it with natural language
- Let Claude orchestrate while Gemini holds the context
This gives you:
- Perfect recall - no chunking or retrieval means no lost context
- Lower latency - cached context is served quickly
- Cost savings - cached tokens cost 75-90% less than regular input tokens
- Simplicity - no vector databases, embeddings, or complex retrieval logic
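Conceptually, each load/query pair maps onto two Gemini REST calls: one to create a cachedContents resource holding the large context, and one generateContent call that references it. A minimal sketch of that underlying flow (endpoint shapes follow Gemini's v1beta API; the payloads are illustrative, not Mnemo's actual internals):
# 1. Create a cache holding the large context (TTL as a duration string)
curl -X POST "https://generativelanguage.googleapis.com/v1beta/cachedContents?key=$GEMINI_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{"model": "models/gemini-1.5-flash-001", "contents": [{"role": "user", "parts": [{"text": "<entire codebase as text>"}]}], "ttl": "3600s"}'
# The response includes a name like "cachedContents/abc123"
# 2. Query against the cached tokens instead of resending them
curl -X POST "https://generativelanguage.googleapis.com/v1beta/models/gemini-1.5-flash-001:generateContent?key=$GEMINI_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{"cachedContent": "cachedContents/abc123", "contents": [{"role": "user", "parts": [{"text": "How do I add middleware?"}]}]}'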
What Can Mnemo Load?
| Source | Local Server | Worker |
|---|---|---|
| GitHub repos (public) | ✅ | ✅ |
| GitHub repos (private) | ✅ | ✅ |
| Any URL (docs, articles) | ✅ | ✅ |
| PDF documents | ✅ | ✅ |
| JSON APIs | ✅ | ✅ |
| Local files/directories | ✅ | ❌ |
| Multi-page crawls | ✅ unlimited | ✅ 40 pages max |
Deployment Options
Mnemo can be deployed in three ways depending on your needs.
Option 1: Local Server (Development & Full Features)
Best for development and when you need to load local files.
# Clone and install
git clone https://github.com/logos-flux/mnemo
cd mnemo
bun install
# Set your Gemini API key
export GEMINI_API_KEY=your_key_here
# Start the server
bun run dev
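Once it's running, you can sanity-check the server:
# The /health endpoint is public (see Authentication below)
curl http://localhost:8080/health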
Claude Code MCP config:
{
  "mcpServers": {
    "mnemo": {
      "type": "http",
      "url": "http://localhost:8080/mcp"
    }
  }
}
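Claude Desktop launches MCP servers over stdio rather than HTTP; a config sketch using the mnemo stdio command from the CLI section below (assumes the mnemo binary is on your PATH):
{
  "mcpServers": {
    "mnemo": {
      "command": "mnemo",
      "args": ["stdio"]
    }
  }
}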
Option 2: Self-Hosted Cloudflare Worker (Recommended for Claude.ai)
Deploy to your own Cloudflare account. You control your data and costs.
Prerequisites:
- Cloudflare account (free tier works)
- Gemini API key
# Clone and install
git clone https://github.com/logos-flux/mnemo
cd mnemo/packages/cf-worker
# Configure secrets
bunx wrangler secret put GEMINI_API_KEY
bunx wrangler secret put MNEMO_AUTH_TOKEN # Optional but recommended
# Create D1 database
bunx wrangler d1 create mnemo-cache
# Deploy
bunx wrangler deploy
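wrangler d1 create prints a database_id that typically needs to be wired into the worker's wrangler.toml before deploying. A sketch of the binding, with a hypothetical binding name (check packages/cf-worker/wrangler.toml for the real one):
[[d1_databases]]
binding = "DB"  # hypothetical; match the binding name in the repo's wrangler.toml
database_name = "mnemo-cache"
database_id = "<id printed by the create command>"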
Claude.ai MCP config:
{
  "mcpServers": {
    "mnemo": {
      "type": "http",
      "url": "https://mnemo.<your-subdomain>.workers.dev/mcp",
      "headers": {
        "Authorization": "Bearer YOUR_AUTH_TOKEN"
      }
    }
  }
}
Why use this? Claude.ai can't connect to localhost. The Worker gives you an external endpoint that Claude.ai can reach.
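Once deployed, confirm the endpoint is reachable:
# /health is public; authenticated calls are covered under Authentication below
curl https://mnemo.<your-subdomain>.workers.dev/health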
Option 3: Managed Hosting (VIP)
Don't want to manage infrastructure? We offer fully managed Mnemo hosting for select clients.
Includes:
- Dedicated Worker deployment
- Priority support
- Custom domain
- Usage monitoring
Contact: lf@logosflux.io for pricing and availability.
Usage Examples
# Load a GitHub repo
curl -X POST http://localhost:8080/tools/context_load \
-H "Content-Type: application/json" \
-d '{"source": "https://github.com/honojs/hono", "alias": "hono"}'
# Load a documentation site (crawls up to token target)
curl -X POST http://localhost:8080/tools/context_load \
-H "Content-Type: application/json" \
-d '{"source": "https://hono.dev/docs", "alias": "hono-docs"}'
# Load a PDF
curl -X POST http://localhost:8080/tools/context_load \
-H "Content-Type: application/json" \
-d '{"source": "https://arxiv.org/pdf/2303.08774.pdf", "alias": "gpt4-paper"}'
# Load a private repo (with GitHub token)
curl -X POST http://localhost:8080/tools/context_load \
-H "Content-Type: application/json" \
-d '{"source": "https://github.com/owner/private-repo", "alias": "private", "githubToken": "ghp_xxx"}'
# Load multiple sources into one cache
curl -X POST http://localhost:8080/tools/context_load \
-H "Content-Type: application/json" \
-d '{"sources": ["https://github.com/owner/repo", "https://docs.example.com"], "alias": "combined"}'
# Query the cache
curl -X POST http://localhost:8080/tools/context_query \
-H "Content-Type: application/json" \
-d '{"alias": "hono", "query": "How do I add middleware?"}'
# List active caches
curl -X POST http://localhost:8080/tools/context_list \
-H "Content-Type: application/json" -d '{}'
# Get usage stats with cost tracking
curl -X POST http://localhost:8080/tools/context_stats \
-H "Content-Type: application/json" -d '{}'
# Evict when done
curl -X POST http://localhost:8080/tools/context_evict \
-H "Content-Type: application/json" \
-d '{"alias": "hono"}'
CLI
# Start server
mnemo serve
# Start MCP stdio transport (for Claude Desktop)
mnemo stdio
# Load a project
mnemo load ./my-project my-proj
# Query
mnemo query my-proj "What's the main entry point?"
# List caches
mnemo list
# Remove cache
mnemo evict my-proj
MCP Tools
| Tool | Description |
|---|---|
| context_load | Load GitHub repos, URLs, PDFs, or local dirs into Gemini cache |
| context_query | Query a cached context with natural language |
| context_list | List all active caches with token counts and expiry |
| context_evict | Remove a cache |
| context_stats | Get usage statistics with cost tracking |
| context_refresh | Reload a cache with fresh content |
context_load Parameters
| Parameter | Description |
|---|---|
| source | Single source: GitHub URL, any URL, or local path |
| sources | Multiple sources to combine into one cache |
| alias | Friendly name for this cache (1-64 chars) |
| ttl | Time to live in seconds (60-86400, default 3600) |
| githubToken | GitHub token for private repos |
| systemInstruction | Custom system prompt for queries |
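Combining the optional parameters, for example (values are illustrative):
# Cache for 24 hours with a custom system prompt applied to all queries
curl -X POST http://localhost:8080/tools/context_load \
  -H "Content-Type: application/json" \
  -d '{"source": "https://github.com/honojs/hono", "alias": "hono", "ttl": 86400, "systemInstruction": "Answer with file paths where possible."}'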
Configuration
| Variable | Description | Default |
|---|---|---|
| GEMINI_API_KEY | Your Gemini API key | Required |
| MNEMO_PORT | Server port (local only) | 8080 |
| MNEMO_DIR | Data directory (local only) | ~/.mnemo |
| MNEMO_AUTH_TOKEN | Auth token for protected endpoints | None |
Authentication
When MNEMO_AUTH_TOKEN is configured, the /mcp and /tools/* endpoints require authentication:
# Set auth token (Workers)
bunx wrangler secret put MNEMO_AUTH_TOKEN
# Requests must include header:
Authorization: Bearer your-token-here
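For example, a complete authenticated tool call against a deployed Worker:
curl -X POST https://mnemo.<your-subdomain>.workers.dev/tools/context_query \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer your-token-here" \
  -d '{"alias": "hono", "query": "How do I add middleware?"}'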
Public endpoints (no auth required):
- GET /health - Health check
- GET / - Service info
- GET /tools - List available tools
Costs
You always pay for Gemini API usage regardless of deployment option. Mnemo uses Gemini's context caching which is significantly cheaper than standard input:
| Resource | Cost |
|---|---|
| Cache storage | ~$4.50 per 1M tokens per hour |
| Cached input | 75-90% discount vs regular input |
| Regular input | ~$0.075 per 1M tokens (Flash) |
Example: 100K token codebase cached for 1 hour with 10 queries ≈ $0.47
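That estimate breaks down roughly as follows (Flash rates from the table above, with the 75% cached-input discount):
- Cache storage: 0.1M tokens × $4.50 per 1M tokens/hour × 1 hour ≈ $0.45
- Cached input: 10 queries × 0.1M tokens × ~$0.019 per 1M tokens ≈ $0.02
- Total ≈ $0.47, plus a small amount for output tokens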
Cloudflare costs (self-hosted):
- Workers: Free tier includes 100K requests/day
- D1: Free tier includes 5M reads/day
- Likely $0 for moderate usage
Architecture
┌─────────────────────────────────────────────────────────────┐
│ Mnemo │
├─────────────────────────────────────────────────────────────┤
│ MCP Tools │
│ • context_load - Load into Gemini cache │
│ • context_query - Query cached context │
│ • context_list - Show active caches │
│ • context_evict - Remove cache │
│ • context_stats - Token usage, costs │
│ • context_refresh - Reload cache │
├─────────────────────────────────────────────────────────────┤
│ Adapters (v0.2) │
│ • GitHub repos (via API) │
│ • URL loading (HTML, PDF, JSON, text) │
│ • Token-targeted crawling │
│ • robots.txt compliance │
├─────────────────────────────────────────────────────────────┤
│ Packages │
│ • @mnemo/core - Gemini client, loaders, adapters │
│ • @mnemo/mcp-server - MCP protocol handling │
│ • @mnemo/cf-worker - Cloudflare Workers deployment │
│ • @mnemo/local - Bun-based local server │
└─────────────────────────────────────────────────────────────┘
License
MIT
Credits
Built by Logos Flux | Voltage Labs