L.O.G. (Latent Orchestration Gateway)

A privacy-first memory layer that pseudonymizes sensitive data locally before sharing a 'Working-Fiction' version with external AI agents. It enables secure agentic workflows by ensuring personally identifiable information never leaves the user's sovereign hardware.


<p align="center"> <strong>LOG-mcp</strong><br> <em>Stop guessing which AI model to use. Let your own judgment build the answer.</em> </p>


Every AI gateway routes your prompts. None of them learn from your choices.

LOG-mcp sends your prompt to multiple models simultaneously; you pick the best response, and the system builds a comparative dataset from your judgment. Over time it learns which models excel at your specific tasks — not synthetic benchmarks, not marketing claims, but your actual usage patterns.

It also strips your personal data before it reaches any cloud API, caches similar queries locally, and exports everything you need to fine-tune a local model that gradually replaces the cloud entirely.

The draft round isn't a feature. It's a data collection primitive that doesn't exist anywhere else.

How It Works

Your prompt
    │
    ▼
┌─────────────────────────────────┐
│  🎯 precise    (temp 0.2)       │
│  💡 creative   (temp 0.7)       │──► You pick the winner
│  🧠 deep       (reasoner)       │
└─────────────────────────────────┘
    │
    ▼
Routing learns: "For this user, code questions → reasoner,
                 creative writing → creative, facts → precise"
    │
    ▼
Eventually: draft rankings become training data →
            fine-tuned local model replaces cloud API

Why This Is Different

Every other AI gateway (LiteLLM, OpenRouter, Portkey, Helicone) solves one problem: call multiple providers through one API. They're middleware for routing. You pick models based on benchmark scores, pricing pages, or vibes.

LOG-mcp solves a different problem: building a dataset from your actual preferences that makes routing, caching, and eventually local inference provably better over time.

| Capability | Other Gateways | LOG-mcp |
|---|---|---|
| Route to multiple providers | ✅ | ✅ |
| Learn which provider you prefer | ❌ | ✅ (draft comparison) |
| Privacy: strip PII before cloud API | ❌ (rare) | ✅ (default) |
| Cache semantically similar queries | ❌ (rare) | ✅ (local embeddings) |
| Export preference data for training | ❌ | ✅ (LoRA/DPO format) |
| Run local models with GPU isolation | ❌ | ✅ (subprocess mode) |
| Self-hosted, single binary | Sometimes | ✅ (Python, SQLite, no runtime deps) |

The moat isn't the code. It's the comparative dataset — the same prompt, multiple models, human judgment, repeated thousands of times. That dataset doesn't exist publicly, and you can't buy it.

Who Is This For

Developers building AI-powered apps. You're currently calling one model and hoping it's good enough. LOG-mcp gives you an OpenAI-compatible API that automatically picks the best model for each query, based on your users' actual feedback.

Power users who talk to AI all day. You're paying for multiple subscriptions and manually switching between ChatGPT, Claude, and DeepSeek depending on the task. LOG-mcp gives you one interface that routes intelligently and learns your preferences.

Teams with privacy requirements. You can't send customer emails, employee names, or financial data to OpenAI. LOG-mcp strips PII before it leaves your server and puts it back in the response. Your AI provider never sees personal data.

People who want to own their AI stack. Today you use cloud APIs. Tomorrow you want a local model that's as good. LOG-mcp's training pipeline turns your draft rankings into fine-tuning data for that transition.

Quick Start

git clone https://github.com/CedarBeach2019/LOG-mcp.git
cd LOG-mcp
cp .env.example .env       # Edit with your API key and passphrase
pip install -r requirements.txt
python -m gateway.server

Open http://localhost:8000. That's it.

Works with DeepSeek out of the box (free tier available). Also supports Groq, OpenAI, OpenRouter, and local GGUF models.

Docker

cp .env.example .env        # Edit first
docker compose up -d

Using as an API

Drop-in replacement for any OpenAI SDK:

from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="your-passphrase")
# That's not an API key — it's your LOG-mcp passphrase

response = client.chat.completions.create(
    model="auto",  # LOG-mcp picks the best model
    messages=[{"role": "user", "content": "Write a Python sort function"}],
)

# Route badge tells you which model was used
print(response.choices[0].message.content)

What's Under the Hood

Privacy Pipeline

Every request passes through dehydration before reaching a cloud API. Emails become [EMAIL_1], phone numbers become [PHONE_1], names become [PERSON_1]. The PII map is stored locally and used to rehydrate the response. The cloud API never sees your data.
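The dehydrate/rehydrate round trip can be sketched as follows. This is a minimal illustration using two simple regex detectors; the gateway's real detectors, token names, and storage are internal details and may differ.

```python
import re

# Hypothetical detectors for illustration only; the real pipeline covers
# names, addresses, SSNs, credit cards, and more.
PATTERNS = {
    "EMAIL": re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+"),
    "PHONE": re.compile(r"\+?\d[\d\s().-]{7,}\d"),
}

def dehydrate(text: str) -> tuple[str, dict[str, str]]:
    """Replace PII with numbered tokens; return clean text and the local map."""
    pii_map: dict[str, str] = {}
    for label, pattern in PATTERNS.items():
        for n, match in enumerate(pattern.findall(text), start=1):
            token = f"[{label}_{n}]"
            pii_map[token] = match
            text = text.replace(match, token)
    return text, pii_map

def rehydrate(text: str, pii_map: dict[str, str]) -> str:
    """Restore original values after the cloud response comes back."""
    for token, original in pii_map.items():
        text = text.replace(token, original)
    return text

clean, mapping = dehydrate("Mail alice@example.com about the invoice")
# The cloud API sees only: "Mail [EMAIL_1] about the invoice"
```

The map never leaves the local machine; only the tokenized text is sent upstream, and the tokens are swapped back before the response reaches you.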

Intelligent Routing

A pattern-matching classifier categorizes every message (code, creative, factual, debug, etc.) and routes to the appropriate model. The classifier improves over time from your feedback — not by training a model, but by updating rules based on what actually worked.
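A toy version of rule-based routing looks like this. The category names and keyword rules below are illustrative assumptions, not the gateway's actual rule set; the point is that routing is transparent pattern matching, not an opaque learned model.

```python
import re

# Hypothetical rules for illustration; the real classifier's categories
# and patterns are updated from user feedback.
RULES = [
    ("code",     re.compile(r"\b(def|class|function|bug|traceback|compile)\b", re.I)),
    ("creative", re.compile(r"\b(story|poem|brainstorm|slogan)\b", re.I)),
    ("factual",  re.compile(r"\b(who|what|when|where|capital|define)\b", re.I)),
]
ROUTES = {"code": "deep", "creative": "creative", "factual": "precise"}

def classify(message: str) -> str:
    """Return the first matching category, defaulting to the factual bucket."""
    for category, pattern in RULES:
        if pattern.search(message):
            return category
    return "factual"

def route(message: str) -> str:
    """Map a message to a model profile via its category."""
    return ROUTES[classify(message)]
```

Because the rules are plain data, "learning" is just editing this table when a draft ranking shows a better mapping.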

Draft Comparison

The headline feature. Toggle draft mode and your prompt goes to 3 profiles simultaneously (configurable: different models, temperatures, system prompts). You see all responses, pick the winner, and optionally elaborate. Every ranking is stored and feeds the training pipeline.
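The fan-out itself is straightforward concurrency. The sketch below shows the shape of a draft round with a stubbed-out provider call; `call_model` and the profile fields are stand-ins, not the gateway's real internals.

```python
import asyncio

# Illustrative profiles; real ones can differ in model, temperature,
# and system prompt.
PROFILES = [
    {"name": "precise",  "temperature": 0.2},
    {"name": "creative", "temperature": 0.7},
    {"name": "deep",     "temperature": 0.3},
]

async def call_model(profile: dict, prompt: str) -> dict:
    """Stand-in for a real provider call over the network."""
    await asyncio.sleep(0)  # placeholder for network I/O
    return {"profile": profile["name"], "text": f"[{profile['name']}] {prompt}"}

async def draft_round(prompt: str) -> list[dict]:
    """Query every profile concurrently; all responses go to the user for ranking."""
    return await asyncio.gather(*(call_model(p, prompt) for p in PROFILES))

drafts = asyncio.run(draft_round("Summarize this diff"))
# One response per profile; the winner you pick is logged as preference data.
```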

Adaptive Learning

Tracks model reliability (does it crash?), response quality (do you thumbs-up?), latency, and estimated cost. Routes around degraded providers automatically. Over time, builds a profile of which model excels at which task for you.
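One simple way to combine those signals is an exponential moving average per model. This is a hedged sketch of the idea; the learner's actual signals, weights, and decay are implementation details.

```python
# Assumed scoring scheme for illustration: reliability via EWMA,
# discounted by smoothed latency. Not the gateway's exact formula.
class ModelHealth:
    def __init__(self, alpha: float = 0.2):
        self.alpha = alpha          # how fast old observations decay
        self.success_rate = 1.0     # start optimistic
        self.latency_ms = 0.0

    def record(self, ok: bool, latency_ms: float) -> None:
        """Fold one request outcome into the running averages."""
        a = self.alpha
        self.success_rate = (1 - a) * self.success_rate + a * (1.0 if ok else 0.0)
        self.latency_ms = (1 - a) * self.latency_ms + a * latency_ms

    def score(self) -> float:
        """Higher is better: reliability discounted by latency."""
        return self.success_rate / (1.0 + self.latency_ms / 1000.0)
```

A router can then prefer the highest-scoring model per category and automatically steer around a provider whose score is degrading.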

Semantic Cache

Locally-hosted embedding model (optional) caches semantically similar queries. "What is 2+2?" and "What does two plus two equal?" hit the same cache entry. Reduces API costs and latency.

Training Pipeline

Exports your draft rankings as properly formatted LoRA and DPO training data. The dataset includes the prompt, the winning response (chosen), the losing response (rejected), and quality metadata. Feed this into any fine-tuning framework to create a model tuned to your preferences.
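A DPO-style preference record generally pairs one prompt with a chosen and a rejected completion. The record below is illustrative of that shape; the exporter's exact schema and metadata fields may differ.

```python
import json

# Hypothetical example record; field names follow the common
# prompt/chosen/rejected DPO convention.
record = {
    "prompt": "Write a Python sort function",
    "chosen": "def sort(xs): return sorted(xs)",   # winning draft
    "rejected": "xs.sort()",                       # losing draft
    "metadata": {"winner_profile": "precise", "quality": 0.9},
}
line = json.dumps(record)  # one JSONL line per ranked draft round
```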

Local Inference

Run GGUF models (Llama, Qwen, Phi, Mistral) directly on your hardware. On constrained devices (Jetson, Raspberry Pi), models run in an isolated subprocess to avoid GPU memory conflicts. Hot-swap models without downtime.
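The isolation pattern can be sketched with the standard library: run inference in a child process so that when it exits, the OS reclaims all GPU and RAM it held. The worker body below is a stub; loading a GGUF model (e.g. via llama-cpp) inside the child is the assumed real workload.

```python
import multiprocessing as mp

def _worker(prompt: str, queue: mp.Queue) -> None:
    # In the real gateway this is where the GGUF model would be loaded and
    # run, entirely inside the child process.
    queue.put(f"echo: {prompt}")  # stand-in for model output

def generate_isolated(prompt: str, timeout: float = 60.0) -> str:
    """Run one generation in an isolated process and return its output."""
    queue: mp.Queue = mp.Queue()
    proc = mp.Process(target=_worker, args=(prompt, queue))
    proc.start()
    try:
        return queue.get(timeout=timeout)
    finally:
        proc.join()  # child exit releases every byte of GPU memory it held
```

The per-request process cost is what makes this the constrained-device mode rather than the default: on a Jetson or Pi, avoiding GPU memory conflicts is worth the spawn overhead.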

Architecture

┌──────────────┐     ┌──────────────────────────────────────────┐
│   Client     │────►│            Gateway (Starlette)           │
│  Web / SDK   │     │                                          │
└──────────────┘     │  Auth → PII Strip → Route → Model Call   │
                     │  → PII Restore → Cache → Respond         │
                     │                                          │
                     │  ┌─────────┐ ┌──────────┐ ┌───────────┐  │
                     │  │ Router  │ │ Draft    │ │ Adaptive  │  │
                     │  │ Rules   │ │ Compare  │ │ Learner   │  │
                     │  └─────────┘ └──────────┘ └───────────┘  │
                     └──────────────────┬───────────────────────┘
                                        │
                     ┌──────────────────┼───────────────────────┐
                     │                  │                       │
                ┌────▼─────┐      ┌─────▼──────┐          ┌─────▼───────┐
                │ DeepSeek │      │    Groq    │          │    Local    │
                │  (API)   │      │   (API)    │          │   (GGUF)    │
                └──────────┘      └────────────┘          └─────────────┘

Full architecture docs

Configuration

# Required
LOG_API_KEY=sk-...                # DeepSeek API key (get one free at platform.deepseek.com)
LOG_PASSPHRASE=a-secret-phrase    # Login passphrase for the web UI and API

# Optional
LOG_CHEAP_MODEL=deepseek-chat     # Model for simple queries (default: deepseek-chat)
LOG_ESCALATION_MODEL=deepseek-reasoner  # Model for complex queries (default: deepseek-reasoner)
LOG_PRIVACY_MODE=true             # Strip PII before cloud API calls (default: true)
LOG_CACHE_ENABLED=true            # Cache similar queries locally (default: true)
LOG_DB_PATH=~/.log/vault.db       # Where to store your data (default: ~/.log/vault.db)
LOG_CORS_ORIGINS=http://localhost:8000  # Allowed origins (set to * to allow all)
LOG_JWT_SECRET=                   # JWT signing key (auto-generated if not set)
LOG_STREAM_TIMEOUT=120            # Max seconds for streaming responses (default: 120)
LOG_MAX_BODY_SIZE=1048576         # Max request body size in bytes (default: 1MB)

See .env.example for a complete template.

API Endpoints

OpenAI-compatible at POST /v1/chat/completions. Also includes:

  • POST /v1/drafts — Multi-model draft comparison
  • POST /v1/feedback — Submit preference (thumbs up/down)
  • GET/POST/DELETE /v1/sessions — Conversation history
  • GET/POST/DELETE /v1/preferences — User preferences
  • GET/POST/DELETE /v1/profiles — Provider profiles
  • GET /v1/health — Deep health check (DB, model, disk, memory)
  • GET /v1/metrics — Request metrics (latency, error rate, cache hits)
  • GET /v1/adaptive/dashboard — Model health and cost tracking
  • GET /v1/discovery/search — Browse available models
  • GET /v1/training/export — Export training data
  • GET/PUT /v1/config — Runtime configuration

Full API reference

What You Need

  • Python 3.10+
  • A DeepSeek API key (free tier) — or any OpenAI-compatible API
  • ~100MB disk for the app, ~1GB+ if you use local models
  • Optional: CUDA GPU for local inference, sentence-transformers for semantic cache

What's Working Now

✅ Core pipeline (PII strip → route → model call → response)
✅ Draft comparison with user ranking
✅ Feedback loop and preference learning
✅ Multi-provider routing (DeepSeek, Groq, OpenAI, OpenRouter, local)
✅ Adaptive model health scoring and cost tracking
✅ Semantic caching with local embeddings
✅ Local GGUF model inference with GPU subprocess isolation
✅ Training data export (LoRA + DPO format)
✅ Dataset quality scoring and deduplication
✅ Prompt template selection and context window management
✅ Session management, streaming, observability, rate limiting
✅ Docker deployment

What's Coming

🔜 Provider management UI
🔜 LoRA training runner (consume exported data)
🔜 Evaluation harness (benchmark your fine-tuned models)
🔜 Bulk annotation UI (review and rank past interactions)
🔜 Mobile-responsive web UI
🔜 OpenAI function/tool calling passthrough

Full roadmap

Security & Privacy

  • PII stripping is on by default. Emails, phone numbers, names, addresses, dates, SSNs, credit card numbers are replaced with tokens before reaching any cloud API.
  • All data stored locally in SQLite. Nothing is sent to LOG-mcp servers — there are none.
  • JWT authentication with configurable secret.
  • Timing-safe passphrase comparison.
  • CORS locked to localhost by default. Explicitly configure origins for production.
  • No telemetry. No phone home. No analytics. Your data is yours.
  • Rate limiting prevents abuse (60 req/min, 10 burst).
  • Request body size limits prevent memory exhaustion.

Development

# Install deps
pip install -r requirements.txt

# Run tests
make test
# or
python -m pytest tests/ -q

# Run the server
make run
# or
python -m gateway.server

518 tests passing. CI runs on Python 3.10, 3.11, 3.12.

License

MIT


<p align="center"> <strong>The moat isn't the code.</strong> It's the comparative dataset —<br> the same prompt, multiple models, human judgment, repeated thousands of times. </p>
