L.O.G. (Latent Orchestration Gateway)
A privacy-first memory layer that pseudonymizes sensitive data locally before sharing a 'Working-Fiction' version with external AI agents. It enables secure agentic workflows by ensuring personally identifiable information never leaves the user's sovereign hardware.
<p align="center"> <strong>LOG-mcp</strong><br> <em>Stop guessing which AI model to use. Let your own judgment build the answer.</em> </p>
Every AI gateway routes your prompts. None of them learn from your choices.
LOG-mcp sends your prompt to multiple models simultaneously, you pick the best response, and the system builds a comparative dataset from your judgment. Over time it learns which models excel at your specific tasks — not synthetic benchmarks, not marketing claims, but your actual usage patterns.
It also strips your personal data before it reaches any cloud API, caches similar queries locally, and exports everything you need to fine-tune a local model that gradually replaces the cloud entirely.
The draft round isn't a feature. It's a data collection primitive that doesn't exist anywhere else.
How It Works
Your prompt
│
▼
┌─────────────────────────────────┐
│ 🎯 precise (temp 0.2) │
│ 💡 creative (temp 0.7) │──► You pick the winner
│ 🧠 deep (reasoner) │
└─────────────────────────────────┘
│
▼
Routing learns: "For this user, code questions → reasoner,
creative writing → creative, facts → precise"
│
▼
Eventually: draft rankings become training data →
fine-tuned local model replaces cloud API
Why This Is Different
Every other AI gateway (LiteLLM, OpenRouter, Portkey, Helicone) solves one problem: call multiple providers through one API. They're middleware for routing. You pick models based on benchmark scores, pricing pages, or vibes.
LOG-mcp solves a different problem: building a dataset from your actual preferences, so that routing, caching, and eventually local inference can be measured against your own rankings and improved over time.
| | Other Gateways | LOG-mcp |
|---|---|---|
| Route to multiple providers | ✅ | ✅ |
| Learn which provider you prefer | ❌ | ✅ (draft comparison) |
| Privacy: strip PII before cloud API | ❌ (rare) | ✅ (default) |
| Cache semantically similar queries | ❌ (rare) | ✅ (local embeddings) |
| Export preference data for training | ❌ | ✅ (LoRA/DPO format) |
| Run local models with GPU isolation | ❌ | ✅ (subprocess mode) |
| Self-hosted, single binary | Sometimes | ✅ (Python, SQLite, no runtime deps) |
The moat isn't the code. It's the comparative dataset — the same prompt, multiple models, human judgment, repeated thousands of times. That dataset doesn't exist publicly, and you can't buy it.
Who Is This For
Developers building AI-powered apps. You're currently calling one model and hoping it's good enough. LOG-mcp gives you an OpenAI-compatible API that automatically picks the best model for each query, based on your users' actual feedback.
Power users who talk to AI all day. You're paying for multiple subscriptions and manually switching between ChatGPT, Claude, and DeepSeek depending on the task. LOG-mcp gives you one interface that routes intelligently and learns your preferences.
Teams with privacy requirements. You can't send customer emails, employee names, or financial data to OpenAI. LOG-mcp strips PII before it leaves your server and puts it back in the response. Your AI provider never sees personal data.
People who want to own their AI stack. Today you use cloud APIs. Tomorrow you want a local model that's as good. LOG-mcp's training pipeline turns your draft rankings into fine-tuning data for that transition.
Quick Start
git clone https://github.com/CedarBeach2019/LOG-mcp.git
cd LOG-mcp
cp .env.example .env # Edit with your API key and passphrase
pip install -r requirements.txt
python -m gateway.server
Open http://localhost:8000. That's it.
Works with DeepSeek out of the box (free tier available). Also supports Groq, OpenAI, OpenRouter, and local GGUF models.
Docker
cp .env.example .env # Edit first
docker compose up -d
Using as an API
Drop-in replacement for any OpenAI SDK:
from openai import OpenAI
client = OpenAI(base_url="http://localhost:8000/v1", api_key="your-passphrase")
# That's not an API key — it's your LOG-mcp passphrase
response = client.chat.completions.create(
model="auto", # LOG-mcp picks the best model
messages=[{"role": "user", "content": "Write a Python sort function"}],
)
# Route badge tells you which model was used
print(response.choices[0].message.content)
What's Under the Hood
Privacy Pipeline
Every request passes through dehydration before reaching a cloud API. Emails become [EMAIL_1], phone numbers become [PHONE_1], names become [PERSON_1]. The PII map is stored locally and used to rehydrate the response. The cloud API never sees your data.
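The dehydrate/rehydrate round trip described above can be sketched roughly as follows. This is a minimal illustration, not the project's detector: the two regular expressions and the function names `dehydrate` and `rehydrate` are assumptions, and name detection (`[PERSON_1]`) is left out because it needs more than a regex.

```python
import re

# Hypothetical patterns for illustration; the project's real detectors
# are not shown in this README.
PATTERNS = {
    "EMAIL": re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+"),
    "PHONE": re.compile(r"\+?\d[\d\s().-]{7,}\d"),
}


def dehydrate(text):
    """Replace detected PII with numbered tokens; return (clean_text, pii_map)."""
    pii_map = {}
    for label, pattern in PATTERNS.items():
        seen = {}  # original value -> token, so a repeated value reuses its token

        def substitute(match, label=label, seen=seen):
            value = match.group(0)
            if value not in seen:
                token = f"[{label}_{len(seen) + 1}]"
                seen[value] = token
                pii_map[token] = value
            return seen[value]

        text = pattern.sub(substitute, text)
    return text, pii_map


def rehydrate(text, pii_map):
    """Put the original values back into a model response."""
    for token, value in pii_map.items():
        text = text.replace(token, value)
    return text


original = "Mail alice@example.com or call 555-123-4567."
clean, pii_map = dehydrate(original)
print(clean)                                            # what the cloud API would see
print(rehydrate("I will write to [EMAIL_1].", pii_map))  # what the user would see
```

Only `clean` would be sent upstream; `pii_map` stays on the local machine and is applied to the response before it is returned.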
Intelligent Routing
A pattern-matching classifier categorizes every message (code, creative, factual, debug, etc.) and routes to the appropriate model. The classifier improves over time from your feedback — not by training a model, but by updating rules based on what actually worked.
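As a rough sketch of rule-based routing that adapts from feedback without training a model: the keyword patterns, the `Router` class, and the vote-counting scheme below are all hypothetical. Only the category names and the precise/creative/deep profile names come from this README.

```python
import re
from collections import defaultdict

# Hypothetical keyword rules, for illustration only.
RULES = {
    "code": re.compile(r"\b(function|python|compile|class|regex)\b", re.I),
    "debug": re.compile(r"\b(error|traceback|exception|crash)\b", re.I),
    "creative": re.compile(r"\b(story|poem|slogan|lyrics)\b", re.I),
}
DEFAULT_ROUTES = {
    "code": "deep",
    "debug": "deep",
    "creative": "creative",
    "factual": "precise",
}


def classify(message):
    """Return the first category whose pattern matches, else 'factual'."""
    for category, pattern in RULES.items():
        if pattern.search(message):
            return category
    return "factual"


class Router:
    def __init__(self):
        # votes[category][profile] = net thumbs-up count from the user
        self.votes = defaultdict(lambda: defaultdict(int))

    def record_feedback(self, category, profile, liked):
        self.votes[category][profile] += 1 if liked else -1

    def route(self, message):
        category = classify(message)
        scores = self.votes[category]
        best = max(scores, key=scores.get, default=None)
        # Override the default only once some profile has a positive net score.
        if best is not None and scores[best] > 0:
            return category, best
        return category, DEFAULT_ROUTES[category]


router = Router()
before = router.route("Write a Python sort function")
router.record_feedback("code", "precise", liked=True)
after = router.route("Write a Python sort function")
print(before, after)
```

The point of the design is that "learning" is just a table update, so every routing decision stays inspectable.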
Draft Comparison
The headline feature. Toggle draft mode and your prompt goes to 3 profiles simultaneously (configurable: different models, temperatures, system prompts). You see all responses, pick the winner, and optionally elaborate. Every ranking is stored and feeds the training pipeline.
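The fan-out step might look something like the sketch below. The provider call is a stub (`call_model` is hypothetical and returns canned text), and pairing the three profile names with the `deepseek-chat` and `deepseek-reasoner` models is an assumption made for illustration.

```python
import asyncio

# Profile names and temperatures follow the "How It Works" diagram;
# the model assigned to each profile is an assumption.
PROFILES = [
    {"name": "precise", "model": "deepseek-chat", "temperature": 0.2},
    {"name": "creative", "model": "deepseek-chat", "temperature": 0.7},
    {"name": "deep", "model": "deepseek-reasoner", "temperature": None},
]


async def call_model(profile, prompt):
    """Stand-in for a real provider call (hypothetical); returns canned text."""
    await asyncio.sleep(0)  # yield control, as a network call would
    return f"[{profile['name']}] answer to: {prompt}"


async def run_drafts(prompt):
    """Send the prompt to every profile concurrently and collect the drafts."""
    replies = await asyncio.gather(
        *(call_model(p, prompt) for p in PROFILES), return_exceptions=True
    )
    return [
        {"profile": p["name"], "text": r}
        for p, r in zip(PROFILES, replies)
        if not isinstance(r, Exception)  # a failed provider simply drops out
    ]


def record_ranking(prompt, drafts, winner):
    """Build the row that would be stored for the training pipeline."""
    return {
        "prompt": prompt,
        "chosen": next(d["text"] for d in drafts if d["profile"] == winner),
        "rejected": [d["text"] for d in drafts if d["profile"] != winner],
    }


drafts = asyncio.run(run_drafts("Explain recursion"))
ranking = record_ranking("Explain recursion", drafts, winner="deep")
print(ranking["chosen"])
```

Using `return_exceptions=True` means one slow or failing provider does not cancel the other drafts.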
Adaptive Learning
Tracks model reliability (does it crash?), response quality (do you thumbs-up?), latency, and estimated cost. Routes around degraded providers automatically. Over time, builds a profile of which model excels at which task for you.
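One plausible way to track these signals is an exponential moving average per model, as sketched here. The `ModelHealth` class, the score weighting, and the 0.5 success cut-off are assumptions for illustration, not the project's actual scoring.

```python
from dataclasses import dataclass


@dataclass
class ModelHealth:
    """Exponential moving averages of success, approval, and latency."""

    success: float = 1.0    # 1.0 means no recent failures
    approval: float = 0.5   # running share of thumbs-up
    latency_s: float = 1.0
    alpha: float = 0.2      # weight given to the newest observation

    def _ema(self, old, new):
        return (1 - self.alpha) * old + self.alpha * new

    def observe(self, ok, latency_s=0.0, liked=None):
        self.success = self._ema(self.success, 1.0 if ok else 0.0)
        if ok:
            self.latency_s = self._ema(self.latency_s, latency_s)
        if liked is not None:
            self.approval = self._ema(self.approval, 1.0 if liked else 0.0)

    def score(self):
        # Hypothetical weighting: reliability scales everything,
        # approval dominates, speed acts as a tiebreaker.
        return self.success * (0.8 * self.approval + 0.2 / (1.0 + self.latency_s))


def pick_healthy(models, min_success=0.5):
    """Route around degraded providers, then take the best-scoring one."""
    healthy = {n: m for n, m in models.items() if m.success >= min_success}
    pool = healthy or models  # if everything is degraded, still answer
    return max(pool, key=lambda name: pool[name].score())


models = {"deepseek": ModelHealth(), "groq": ModelHealth()}
for _ in range(5):
    models["groq"].observe(ok=False)  # five failures in a row
choice = pick_healthy(models)
print(choice, round(models["groq"].success, 3))
```

Because the averages decay, a provider that recovers will climb back above the threshold after a run of successful calls.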
Semantic Cache
An optional, locally hosted embedding model caches semantically similar queries. "What is 2+2?" and "What does two plus two equal?" hit the same cache entry, which reduces API costs and latency.
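The lookup logic can be sketched as a nearest-neighbour search with a similarity threshold. Note the assumptions: `toy_embed` is a bag-of-words stand-in so the example runs without dependencies, and it will not match true paraphrases such as the "2+2" pair; a real sentence-embedding model would be plugged in through the `embed` parameter. The 0.8 threshold is also hypothetical.

```python
import math
import re
from collections import Counter


def toy_embed(text):
    """Bag-of-words stand-in for a real embedding model (assumption)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


class SemanticCache:
    def __init__(self, embed=toy_embed, threshold=0.8):
        self.embed = embed
        self.threshold = threshold  # hypothetical cut-off; tune per embedding model
        self.entries = []           # list of (vector, response)

    def get(self, query):
        """Return the cached response for the closest entry, or None on a miss."""
        vector = self.embed(query)
        best = max(self.entries, key=lambda e: cosine(vector, e[0]), default=None)
        if best is not None and cosine(vector, best[0]) >= self.threshold:
            return best[1]
        return None

    def put(self, query, response):
        self.entries.append((self.embed(query), response))


cache = SemanticCache()
cache.put("What is the capital of France?", "Paris")
hit = cache.get("what is the capital of france")     # same words, different casing
miss = cache.get("How do I sort a list in Python?")  # unrelated query
print(hit, miss)
```

A linear scan is fine for a sketch; a larger cache would want an approximate nearest-neighbour index instead.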
Training Pipeline
Exports your draft rankings as properly formatted LoRA and DPO training data. The dataset includes the prompt, the winning response (chosen), the losing response (rejected), and quality metadata. Feed this into any fine-tuning framework to create a model tuned to your preferences.
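A minimal sketch of what a DPO-style export could look like, using the prompt/chosen/rejected fields named above. The input shape (one chosen text plus a list of rejected texts per ranking) and the function names are assumptions; quality metadata is omitted for brevity.

```python
import io
import json


def to_dpo_records(ranking):
    """Expand one ranking into (chosen, rejected) pairs, one per losing draft."""
    for loser in ranking["rejected"]:
        yield {
            "prompt": ranking["prompt"],
            "chosen": ranking["chosen"],
            "rejected": loser,
        }


def export_jsonl(rankings, out):
    """Write one JSON object per line, skipping exact duplicate pairs."""
    seen = set()
    count = 0
    for ranking in rankings:
        for record in to_dpo_records(ranking):
            key = json.dumps(record, sort_keys=True)
            if key in seen:
                continue
            seen.add(key)
            out.write(json.dumps(record, ensure_ascii=False) + "\n")
            count += 1
    return count


rankings = [
    {"prompt": "Explain recursion", "chosen": "draft A", "rejected": ["draft B", "draft C"]},
]
buffer = io.StringIO()
written = export_jsonl(rankings + rankings, buffer)  # the repeated ranking is deduplicated
lines = buffer.getvalue().splitlines()
print(written, lines[0])
```

A three-way draft round yields two preference pairs, since the winner is preferred over each of the other two drafts.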
Local Inference
Run GGUF models (Llama, Qwen, Phi, Mistral) directly on your hardware. On constrained devices (Jetson, Raspberry Pi), models run in an isolated subprocess to avoid GPU memory conflicts. Hot-swap models without downtime.
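Subprocess isolation can be illustrated with a worker process that owns the model and talks over pipes, so that ending the process releases whatever memory it held. Everything here is a stand-in: the worker just upper-cases the prompt where a real one would load a GGUF file, and the `IsolatedModel` class and its JSON-lines protocol are assumptions.

```python
import json
import subprocess
import sys

# Stand-in worker: a real one would load the GGUF model at startup (assumption).
WORKER_SOURCE = r'''
import json, sys
model_path = sys.argv[1]
for line in sys.stdin:
    request = json.loads(line)
    reply = {"model": model_path, "text": request["prompt"].upper()}
    print(json.dumps(reply), flush=True)
'''


class IsolatedModel:
    """Runs the model in a child process so its memory is freed on exit."""

    def __init__(self, model_path):
        self.proc = self._spawn(model_path)

    @staticmethod
    def _spawn(model_path):
        return subprocess.Popen(
            [sys.executable, "-c", WORKER_SOURCE, model_path],
            stdin=subprocess.PIPE,
            stdout=subprocess.PIPE,
            text=True,
        )

    def generate(self, prompt):
        """Send one JSON request line and read one JSON reply line."""
        self.proc.stdin.write(json.dumps({"prompt": prompt}) + "\n")
        self.proc.stdin.flush()
        return json.loads(self.proc.stdout.readline())

    def close(self):
        self.proc.stdin.close()  # the worker's read loop ends on EOF
        self.proc.wait(timeout=10)
        self.proc.stdout.close()

    def swap(self, model_path):
        """Retire the old worker first so two models never share the GPU."""
        self.close()
        self.proc = self._spawn(model_path)


model = IsolatedModel("a.gguf")
first = model.generate("hi")
model.swap("b.gguf")
second = model.generate("ok")
model.close()
print(first, second)
```

This sketch frees memory before loading the replacement, which leaves a short gap; where memory allows, starting the new worker before retiring the old one would avoid that gap.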
Architecture
┌──────────────┐     ┌──────────────────────────────────────────┐
│    Client    │────►│            Gateway (Starlette)           │
│  Web / SDK   │     │                                          │
└──────────────┘     │  Auth → PII Strip → Route → Model Call   │
                     │  → PII Restore → Cache → Respond         │
                     │                                          │
                     │  ┌─────────┐ ┌──────────┐ ┌───────────┐  │
                     │  │ Router  │ │  Draft   │ │ Adaptive  │  │
                     │  │ Rules   │ │ Compare  │ │ Learner   │  │
                     │  └─────────┘ └──────────┘ └───────────┘  │
                     └──────────────────┬───────────────────────┘
                                        │
                     ┌──────────────────┼───────────────────────┐
                     │                  │                       │
                ┌────▼─────┐      ┌─────▼──────┐        ┌───────▼─────┐
                │ DeepSeek │      │    Groq    │        │    Local    │
                │  (API)   │      │   (API)    │        │   (GGUF)    │
                └──────────┘      └────────────┘        └─────────────┘
Configuration
# Required
LOG_API_KEY=sk-... # DeepSeek API key (get one free at platform.deepseek.com)
LOG_PASSPHRASE=a-secret-phrase # Login passphrase for the web UI and API
# Optional
LOG_CHEAP_MODEL=deepseek-chat # Model for simple queries (default: deepseek-chat)
LOG_ESCALATION_MODEL=deepseek-reasoner # Model for complex queries (default: deepseek-reasoner)
LOG_PRIVACY_MODE=true # Strip PII before cloud API calls (default: true)
LOG_CACHE_ENABLED=true # Cache similar queries locally (default: true)
LOG_DB_PATH=~/.log/vault.db # Where to store your data (default: ~/.log/vault.db)
LOG_CORS_ORIGINS=http://localhost:8000 # Allowed origins (set to * to allow all)
LOG_JWT_SECRET= # JWT signing key (auto-generated if not set)
LOG_STREAM_TIMEOUT=120 # Max seconds for streaming responses (default: 120)
LOG_MAX_BODY_SIZE=1048576 # Max request body size in bytes (default: 1MB)
See .env.example for a complete template.
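For readers wiring these variables into their own tooling, a loader along the following lines would apply the defaults listed above. The `Settings` dataclass and `load_settings` function are hypothetical and are not taken from the project's source; only the variable names and default values come from this README.

```python
import os
from dataclasses import dataclass


def _as_bool(value):
    """Interpret common truthy spellings; anything else is False."""
    return value.strip().lower() in {"1", "true", "yes", "on"}


@dataclass(frozen=True)
class Settings:
    api_key: str
    passphrase: str
    cheap_model: str
    escalation_model: str
    privacy_mode: bool
    cache_enabled: bool
    db_path: str
    stream_timeout: int
    max_body_size: int


def load_settings(env=None):
    """Read LOG_* variables from a mapping, applying the documented defaults."""
    env = os.environ if env is None else env
    missing = [k for k in ("LOG_API_KEY", "LOG_PASSPHRASE") if not env.get(k)]
    if missing:
        raise ValueError("missing required settings: " + ", ".join(missing))
    return Settings(
        api_key=env["LOG_API_KEY"],
        passphrase=env["LOG_PASSPHRASE"],
        cheap_model=env.get("LOG_CHEAP_MODEL", "deepseek-chat"),
        escalation_model=env.get("LOG_ESCALATION_MODEL", "deepseek-reasoner"),
        privacy_mode=_as_bool(env.get("LOG_PRIVACY_MODE", "true")),
        cache_enabled=_as_bool(env.get("LOG_CACHE_ENABLED", "true")),
        db_path=os.path.expanduser(env.get("LOG_DB_PATH", "~/.log/vault.db")),
        stream_timeout=int(env.get("LOG_STREAM_TIMEOUT", "120")),
        max_body_size=int(env.get("LOG_MAX_BODY_SIZE", "1048576")),
    )


settings = load_settings(
    {"LOG_API_KEY": "sk-test", "LOG_PASSPHRASE": "pw", "LOG_PRIVACY_MODE": "false"}
)
print(settings.cheap_model, settings.privacy_mode, settings.max_body_size)
```

Failing fast on the two required variables gives a clearer error than a failed provider call later on.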
API Endpoints
OpenAI-compatible at POST /v1/chat/completions. Also includes:
- POST /v1/drafts: Multi-model draft comparison
- POST /v1/feedback: Submit preference (thumbs up/down)
- GET/POST/DELETE /v1/sessions: Conversation history
- GET/POST/DELETE /v1/preferences: User preferences
- GET/POST/DELETE /v1/profiles: Provider profiles
- GET /v1/health: Deep health check (DB, model, disk, memory)
- GET /v1/metrics: Request metrics (latency, error rate, cache hits)
- GET /v1/adaptive/dashboard: Model health and cost tracking
- GET /v1/discovery/search: Browse available models
- GET /v1/training/export: Export training data
- GET/PUT /v1/config: Runtime configuration
What You Need
- Python 3.10+
- A DeepSeek API key (free tier) — or any OpenAI-compatible API
- ~100MB disk for the app, ~1GB+ if you use local models
- Optional: CUDA GPU for local inference, sentence-transformers for semantic cache
What's Working Now
✅ Core pipeline (PII strip → route → model call → response)
✅ Draft comparison with user ranking
✅ Feedback loop and preference learning
✅ Multi-provider routing (DeepSeek, Groq, OpenAI, OpenRouter, local)
✅ Adaptive model health scoring and cost tracking
✅ Semantic caching with local embeddings
✅ Local GGUF model inference with GPU subprocess isolation
✅ Training data export (LoRA + DPO format)
✅ Dataset quality scoring and deduplication
✅ Prompt template selection and context window management
✅ Session management, streaming, observability, rate limiting
✅ Docker deployment
What's Coming
🔜 Provider management UI
🔜 LoRA training runner (consume exported data)
🔜 Evaluation harness (benchmark your fine-tuned models)
🔜 Bulk annotation UI (review and rank past interactions)
🔜 Mobile-responsive web UI
🔜 OpenAI function/tool calling passthrough
Security & Privacy
- PII stripping is on by default. Emails, phone numbers, names, addresses, dates, SSNs, and credit card numbers are replaced with tokens before reaching any cloud API.
- All data stored locally in SQLite. Nothing is sent to LOG-mcp servers — there are none.
- JWT authentication with configurable secret.
- Timing-safe passphrase comparison.
- CORS locked to localhost by default. Explicitly configure origins for production.
- No telemetry. No phone home. No analytics. Your data is yours.
- Rate limiting prevents abuse (60 req/min, 10 burst).
- Request body size limits prevent memory exhaustion.
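Two of the measures above have well-known standard-library shapes, sketched here. The timing-safe comparison uses `hmac.compare_digest`. Reading "60 req/min, 10 burst" as a token bucket is an assumption, as are the `TokenBucket` class and its parameter names.

```python
import hmac
import time


def passphrase_matches(supplied, expected):
    """Constant-time comparison, so timing does not reveal how much matched."""
    return hmac.compare_digest(supplied.encode("utf-8"), expected.encode("utf-8"))


class TokenBucket:
    """Allow rate_per_min requests per minute on average, with bursts up to burst."""

    def __init__(self, rate_per_min=60, burst=10, clock=time.monotonic):
        self.rate = rate_per_min / 60.0  # tokens added per second
        self.capacity = burst
        self.tokens = float(burst)       # start full, so an initial burst is allowed
        self.clock = clock
        self.updated = clock()

    def allow(self):
        """Refill for the elapsed time, then spend one token if available."""
        now = self.clock()
        self.tokens = min(self.capacity, self.tokens + (now - self.updated) * self.rate)
        self.updated = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False


# A fake clock makes the behaviour deterministic for the demonstration.
fake_now = [0.0]
bucket = TokenBucket(clock=lambda: fake_now[0])
burst_results = [bucket.allow() for _ in range(11)]  # 11 requests at the same instant
fake_now[0] = 1.0                                    # one second later
after_wait = bucket.allow()
print(burst_results.count(True), after_wait)
```

In a server, one bucket would typically be kept per client key (for example per token or per IP address) rather than globally.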
Development
# Install deps
pip install -r requirements.txt
# Run tests
make test
# or
python -m pytest tests/ -q
# Run the server
make run
# or
python -m gateway.server
518 tests passing. CI runs on Python 3.10, 3.11, 3.12.
License
MIT
<p align="center"> <strong>The moat isn't the code.</strong> It's the comparative dataset —<br> the same prompt, multiple models, human judgment, repeated thousands of times. </p>