Agent Helper

Enables AI agents to process files locally (OCR images, extract text from PDFs and DOCX files, and describe images using local vision models) without sending file contents to external services.

A local MCP server, running on your machine, that gives AI agents file processing, web search, data analysis, contact lookup, persistent local memory, and system monitoring capabilities. OCR images, extract text from PDFs/DOCX/XLSX, detect objects with YOLOv8, describe scenes via Ollama, search the web, fetch web pages, look up people's contact info, save/recall/search memories, compress/extract archives, generate PDFs, run git operations, make HTTP requests, encrypt/decrypt data, take screenshots, query external databases, send emails, manage Docker, and more.

Architecture

                     ┌──────────────────────┐
  AI Agent (MCP) ───▶│  MCP Server :5021    │
  (opencode, etc.)   │  FastMCP / SSE       │
                     └──────────┬───────────┘
                                │
                     ┌──────────▼───────────┐
                     │  Orchestrator        │
                     │  Routes + 50+ tools  │
                     └──────────┬───────────┘
                                │
         ┌──────────────┬───────┴──────┬──────────────┐
         ▼              ▼              ▼              ▼
   ┌───────────┐  ┌───────────┐  ┌───────────┐  ┌───────────┐
   │    OCR    │  │    PDF    │  │   DOCX    │  │   YOLO    │
   │ Tesseract │  │  PyMuPDF  │  │python-docx│  │ Obj Detect│
   └───────────┘  └───────────┘  └───────────┘  └───────────┘

  Browser ─────▶ Management UI :5020
                 (FastAPI dashboard)

Features

Feature Description
OCR Extract text from images via Tesseract
Object detection YOLOv8 for detecting objects in images (CPU, ~6MB model)
Scene description Optional Ollama LLaVA for image descriptions
PDF extraction Text extraction from PDFs via PyMuPDF
DOCX extraction Paragraph extraction from Word files
XLSX extraction Cell values from Excel spreadsheets via openpyxl
Web search DuckDuckGo search (free, no API key)
Web page fetch Fetch URLs, extract readable text, and detect iframe content
Batch fetch Fetch multiple URLs in parallel in one call
Format converter Auto-detect & convert JSON/YAML/CSV/XML with JMESPath queries
Diff Compare two text blocks or URLs, return unified diff
RSS reader Parse RSS/Atom feeds into structured entries
Summarization Summarize text or URLs (Ollama or extractive fallback)
Date parsing Natural language dates with timezone conversion
File system List/search/read files within a configurable root path (read-only)
SQLite queries Read-only SQL queries on .db files
Archive viewer List .zip/.tar.gz contents (no extraction)
Chart generation CSV/JSON data → bar/line/pie/scatter/histogram charts (base64 PNG)
System monitoring Disk usage, memory info, running processes via psutil
Translation Translate text via Ollama
Contact search Look up people by name (phones, emails, profiles) or reverse phone lookup via DuckDuckGo + OSINT
Local memory Persistent key-value store with SQLite + FTS5 full-text search. Save, recall, search, list, delete memories
Encode / Decode Base64, UUID generation, MD5/SHA1/SHA256/SHA512 hashing
Compress / Extract Create and extract zip/tar.gz archives with zip-slip protection
PDF generation Generate PDF documents from text content via fpdf2
WHOIS / DNS Domain WHOIS lookup and DNS record queries (A, AAAA, MX, NS, CNAME, TXT, etc.)
HTTP requests Full HTTP client with SSRF protection (blocks private IPs)
Git operations Whitelisted git commands (status, log, diff, commit, push, pull, clone, etc.)
File read/write Read, write, append, delete, list files (jailed to app root directory)
Database queries Read-only PostgreSQL/MySQL/MSSQL/SQLite queries via SQLAlchemy
Send email SMTP client for sending emails
Docker management Docker ps, images, pull, run, exec, logs, inspect, stop, rm, etc. (dangerous flags blocked)
Encryption Fernet (AES) encrypt/decrypt with password-derived keys
Screenshots Capture webpage screenshots via Playwright (SSRF-guarded)
Video search Search for videos via DuckDuckGo (YouTube, Vimeo, etc.) with duration, view counts, thumbnails
Social media lookup Scrape Twitter/X profiles and tweets via Nitter — no API keys or login required
API key auth Bearer token authentication for MCP clients, managed via web UI
Management dashboard Web UI at port 5020 for settings, keys, tool toggles, job history, live logs
Tool management Enable/disable individual MCP tools from the dashboard
Job history Results cached to disk, viewable in dashboard
Live logs Poll-based log stream (no WebSocket needed)
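The "Live logs" row (and logger.py in the project structure) describes a ring buffer that the dashboard polls instead of streaming. A minimal sketch of that pattern, assuming a standard logging.Handler and an integer cursor per poll; RingBufferHandler and since() are hypothetical names, not the project's API:

```python
import logging
from collections import deque
from threading import Lock


class RingBufferHandler(logging.Handler):
    """Keep only the most recent log lines in memory for a polling UI (sketch)."""

    def __init__(self, capacity: int = 500) -> None:
        super().__init__()
        # deque(maxlen=...) silently drops the oldest entry once full.
        self._lines: deque = deque(maxlen=capacity)
        self._next_seq = 0  # sequence number assigned to the next line
        self._buf_lock = Lock()

    def emit(self, record: logging.LogRecord) -> None:
        line = self.format(record)
        with self._buf_lock:
            self._lines.append((self._next_seq, line))
            self._next_seq += 1

    def since(self, cursor: int) -> tuple[int, list[str]]:
        """Return (new_cursor, lines at or after cursor); evicted lines are skipped."""
        with self._buf_lock:
            lines = [text for seq, text in self._lines if seq >= cursor]
            return self._next_seq, lines


if __name__ == "__main__":
    log = logging.getLogger("ring_demo")
    log.propagate = False
    handler = RingBufferHandler(capacity=3)
    log.addHandler(handler)
    log.setLevel(logging.INFO)
    for i in range(5):
        log.info("line %d", i)
    print(handler.since(0))  # only the last three lines survive
```

Each poll sends back the cursor it last received; lines evicted in between are simply skipped, which is the trade-off that keeps memory bounded.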

Requirements

  • Python 3.10+
  • Tesseract OCR (system package):
    sudo apt install tesseract-ocr   # Debian/Ubuntu
    brew install tesseract           # macOS
    
  • Ollama (optional, for vision & translation):
    curl -fsSL https://ollama.com/install.sh | sh
    ollama pull llava
    

Quick start

git clone https://github.com/wajirasls/agent_helper.git
cd agent_helper

# One-shot setup:
chmod +x start.sh
./start.sh

# Or manually:
python3 -m venv venv
source venv/bin/activate
pip install -r requirements.txt
python main.py

Open http://127.0.0.1:5020 in your browser.

Systemd service (auto-start on boot)

sudo cp agent-helper.service /etc/systemd/system/
sudo systemctl daemon-reload
sudo systemctl enable agent-helper
sudo systemctl start agent-helper

Ports

Port | Service | Access
5020 | Management UI (FastAPI) | http://127.0.0.1:5020 (local only)
5021 | MCP Server (SSE) | http://<host>:5021/sse (listens on 0.0.0.0, i.e. all interfaces)

Management dashboard

Visit http://127.0.0.1:5020:

  • MCP Server — Start, stop, restart the MCP server
  • Vision Backend — Toggle between OCR only / Ollama LLaVA
  • Object Detection — Enable/disable YOLOv8 with confidence threshold
  • API Keys — Create and revoke keys for MCP clients
  • MCP Tools — Enable/disable individual tools
  • File System — Configure FS root path (read-only, relative to app dir)
  • Processing Folders — Browse Processing/ subfolders
  • Job History — View past processing jobs
  • Health Panel — Check Tesseract, Ollama, YOLOv8 status
  • Live Logs — Scrollable log stream
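Per-tool toggles reduce to a name-to-callable registry plus a disabled set that is checked on every call. A minimal sketch, assuming tools are plain Python functions; ToolRegistry, tool(), and set_enabled() are hypothetical names, and the FastMCP registration is not shown:

```python
from typing import Any, Callable

ToolFn = Callable[..., Any]


class ToolRegistry:
    """Name -> callable registry with a runtime on/off switch per tool (sketch)."""

    def __init__(self) -> None:
        self._tools: dict[str, ToolFn] = {}
        self._disabled: set[str] = set()

    def tool(self, name: str, *, enabled: bool = True) -> Callable[[ToolFn], ToolFn]:
        """Decorator that registers fn under name; risky tools can start disabled."""
        def decorator(fn: ToolFn) -> ToolFn:
            self._tools[name] = fn
            if not enabled:
                self._disabled.add(name)
            return fn
        return decorator

    def set_enabled(self, name: str, enabled: bool) -> None:
        if name not in self._tools:
            raise KeyError(f"unknown tool: {name}")
        if enabled:
            self._disabled.discard(name)
        else:
            self._disabled.add(name)

    def enabled_tools(self) -> list[str]:
        return sorted(name for name in self._tools if name not in self._disabled)

    def call(self, name: str, **kwargs: Any) -> Any:
        if name not in self._tools:
            raise KeyError(f"unknown tool: {name}")
        if name in self._disabled:
            raise PermissionError(f"tool is disabled: {name}")
        return self._tools[name](**kwargs)
```

Checking the disabled set at call time (rather than unregistering) lets the dashboard flip a tool back on without restarting the server.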

All MCP tools (50+ total)

File Processing (Processing/ folder)

Tool Parameters Description
process_folder folder_name Process all files in Processing/{folder_name}/. Creates the folder if it does not exist.
process_file folder_name, filename Process a single file in an existing folder.
list_folders List subfolders in Processing/.
list_files folder_name List files in a subfolder.
detect_objects_in_image folder_name, filename Run YOLOv8 detection on one image.

Direct Analysis (pass data inline, no staging needed)

Tool Parameters Description
analyze_image image_data (base64), filename OCR + object detection on base64 image.
analyze_image_url url Download image from URL → OCR + detection.
analyze_file file_data (base64), filename Analyze any supported file (PDF, DOCX, XLSX, image, text).
analyze_file_url url Download file from URL → analyze.

Web & Search

Tool Parameters Description
fetch_webpage url Fetch URL, extract readable text, detect iframes.
web_search query, max_results DuckDuckGo search (free, no API key).
batch_fetch urls (list) Fetch multiple URLs in parallel in a single call.
read_feed url, max_entries Parse RSS/Atom feed into structured entries.
http_request url, method, headers, body, timeout Full HTTP client (GET/POST/PUT/DELETE). SSRF-guarded.
video_search query, max_results Search for videos via DuckDuckGo (YouTube, Vimeo, etc.). Returns title, URL, duration, view count, thumbnail.
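http_request is listed as SSRF-guarded. One common way to build such a guard is to resolve the host and reject any non-public address. A sketch under that assumption, using only the standard library (is_url_allowed is a hypothetical helper, not the project's code):

```python
import ipaddress
import socket
from urllib.parse import urlsplit


def is_url_allowed(url: str) -> bool:
    """Return True only if every address the host resolves to is publicly routable."""
    parts = urlsplit(url)
    if parts.scheme not in ("http", "https") or not parts.hostname:
        return False
    try:
        port = parts.port or (443 if parts.scheme == "https" else 80)
        infos = socket.getaddrinfo(parts.hostname, port, proto=socket.IPPROTO_TCP)
    except (ValueError, socket.gaierror):
        return False  # malformed port or unresolvable host: refuse rather than guess
    for _family, _type, _proto, _canon, sockaddr in infos:
        addr = ipaddress.ip_address(sockaddr[0])
        # is_global is False for loopback (127.0.0.0/8, ::1), RFC 1918 ranges
        # (10/8, 172.16/12, 192.168/16), link-local (169.254/16) and 0.0.0.0.
        if not addr.is_global:
            return False
    return True
```

A check-then-fetch guard is still open to DNS rebinding (the name can resolve differently on the second lookup); connecting to the already-validated IP address closes that gap.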

Data & Text

Tool Parameters Description
convert_format data, from_format, to_format, query Auto-detect JSON/YAML/CSV/XML, convert, JMESPath query.
diff_text text_a, text_b, context_lines Unified diff of two text blocks.
diff_urls url_a, url_b, context_lines Fetch two URLs and diff their content.
summarize_text text, max_sentences Summarize text (Ollama or extractive TextRank).
summarize_url url, max_sentences Fetch URL + summarize in one step.
parse_date text, from_tz, to_tz Parse natural language dates, timezone conversion.
translate_text text, source_lang, target_lang Translate text via Ollama.
encode_decode operation, data, text, encoding Base64, UUID, MD5/SHA1/SHA256/SHA512 hashing.
generate_pdf text, filename, title Generate PDF from text.
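diff_text returns a unified diff. Python's standard difflib produces that format directly; a sketch assuming context_lines maps to difflib's n argument (the function body is illustrative, not taken from the project):

```python
import difflib


def diff_text(text_a: str, text_b: str, context_lines: int = 3) -> str:
    """Unified diff of two text blocks; empty string when they are identical."""
    diff = difflib.unified_diff(
        text_a.splitlines(),
        text_b.splitlines(),
        fromfile="text_a",
        tofile="text_b",
        n=context_lines,
        lineterm="",  # lines carry no newlines, so join them explicitly below
    )
    return "\n".join(diff)
```

Splitting without line endings and joining afterwards avoids difflib's quirk of gluing two diff lines together when the last input line has no trailing newline.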

Contact Info & OSINT

Tool Parameters Description
contact_search query, domain, country, max_results Look up people by name (phones, emails, social profiles) or reverse phone lookup.
whois_lookup domain WHOIS lookup for domain registration data.
dns_lookup hostname, record_type DNS queries (A, AAAA, MX, NS, CNAME, TXT, SOA, SRV).

Social Media (no API keys required)

Tool Parameters Description
social_lookup platform, username, query, nitter_instance, limit Look up Twitter/X profiles or search tweets via Nitter public instances. Returns name, bio, follower stats, recent tweets. No API key or login needed.

Local Memory (persistent, SQLite + FTS5)

Tool Parameters Description
memory_save key, content, tags Save/update a memory with key and tags (upsert).
memory_recall key Retrieve by exact key.
memory_search query, tags, limit Full-text search across all memories (FTS5 ranked).
memory_list tags, limit Browse memories, filterable by comma-separated tags.
memory_delete key Delete a memory.
memory_stats Total count, unique tags, newest/oldest entries.

File System

Tool Parameters Description
fs_list_directory path, pattern (glob) List directory contents with metadata.
fs_find_files pattern, path Recursive glob search.
fs_read_text_file path, offset, limit Read a text file by line range (max 500 lines per call).
fs_query_sqlite db_path, query, params Read-only SQLite query (1000 row limit).
fs_list_archive path List .zip/.tar.gz contents.
read_write_file path, content, mode, encoding Read/write/append/delete/list files (jailed to app root).
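fs_query_sqlite is described as read-only with a 1000-row cap. SQLite can enforce the read-only part itself when the file is opened through a mode=ro URI. A sketch assuming that approach (query_sqlite_readonly is a hypothetical name):

```python
import os
import sqlite3
import tempfile
from pathlib import Path

MAX_ROWS = 1000  # matches the row limit listed for fs_query_sqlite


def query_sqlite_readonly(db_path, query, params=()):
    """Run one query against a SQLite file opened in read-only mode."""
    # mode=ro makes SQLite itself reject writes, independent of any keyword checks.
    uri = Path(db_path).resolve().as_uri() + "?mode=ro"
    conn = sqlite3.connect(uri, uri=True)
    try:
        cur = conn.execute(query, params)
        columns = [col[0] for col in cur.description or ()]
        return columns, cur.fetchmany(MAX_ROWS)
    finally:
        conn.close()


if __name__ == "__main__":
    path = os.path.join(tempfile.mkdtemp(), "demo.db")
    with sqlite3.connect(path) as setup:
        setup.execute("CREATE TABLE t (x INTEGER)")
        setup.executemany("INSERT INTO t VALUES (?)", [(i,) for i in range(1500)])
    cols, rows = query_sqlite_readonly(path, "SELECT x FROM t WHERE x >= ?", (0,))
    print(cols, len(rows))  # ['x'] 1000
```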

Compression

Tool Parameters Description
compress_files paths, archive_name, format Create zip/tar.gz archives. Path-traversal protected.
extract_archive archive_path, output_dir, password Extract zip/tar.gz with zip-slip protection.
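Zip-slip protection means refusing any archive member whose resolved path lands outside the output directory. A minimal sketch for .zip files, assuming every member is validated before anything is written (safe_extract_zip is a hypothetical name):

```python
import tempfile
import zipfile
from pathlib import Path


def safe_extract_zip(archive_path, output_dir):
    """Extract a zip only if every member stays inside output_dir."""
    out = Path(output_dir).resolve()
    with zipfile.ZipFile(archive_path) as zf:
        members = zf.infolist()
        # Check every entry first so a malicious archive writes nothing at all.
        for info in members:
            target = (out / info.filename).resolve()
            if not target.is_relative_to(out):
                raise ValueError(f"blocked path traversal: {info.filename!r}")
        zf.extractall(out)
    return [m.filename for m in members]


if __name__ == "__main__":
    tmp = Path(tempfile.mkdtemp())
    bad = tmp / "bad.zip"
    with zipfile.ZipFile(bad, "w") as zf:
        zf.writestr("../evil.txt", "payload")
    try:
        safe_extract_zip(bad, tmp / "out")
    except ValueError as exc:
        print(exc)
```

Python's zipfile.extractall already strips .. components and leading slashes on its own; the explicit check turns a silently rewritten path into a hard error. For .tar.gz archives, tarfile's extraction filters (PEP 706, e.g. filter="data") serve the same purpose.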

Git

Tool Parameters Description
git_operation operation, args, repo_path Whitelisted git commands: status, log, diff, commit, push, pull, clone, branch, etc.
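A command whitelist works best when the git command is built as an argument list and never handed to a shell. A sketch under that assumption; the allow-list mirrors the commands named in the table, while BLOCKED_OPTION_PREFIXES, build_git_argv, and run_git are hypothetical additions:

```python
import subprocess

# Allow-list built from the commands named in the git_operation row.
ALLOWED_GIT_OPS = frozenset({"status", "log", "diff", "commit", "push", "pull", "clone", "branch"})
# Options that make git execute another program; illustrative, not exhaustive.
BLOCKED_OPTION_PREFIXES = ("--upload-pack", "--receive-pack", "--exec")


def build_git_argv(operation: str, args: list[str]) -> list[str]:
    if operation not in ALLOWED_GIT_OPS:
        raise ValueError(f"git operation not allowed: {operation!r}")
    for arg in args:
        if arg.startswith(BLOCKED_OPTION_PREFIXES):
            raise ValueError(f"git option not allowed: {arg!r}")
    return ["git", operation, *args]


def run_git(operation: str, args: list[str], repo_path: str, timeout: float = 60.0) -> tuple[int, str]:
    argv = build_git_argv(operation, args)
    # An argument list with the default shell=False is never parsed by a shell,
    # so characters such as ; or $( ) in args stay literal.
    result = subprocess.run(argv, cwd=repo_path, capture_output=True, text=True, timeout=timeout)
    return result.returncode, result.stdout + result.stderr
```

Some allowed subcommands still accept options that run programs (git clone --upload-pack, for example), which is why the sketch screens arguments as well as the subcommand.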

Docker

Tool Parameters Description
docker_exec action, image, command, args, timeout Docker ps, images, pull, run, exec, logs, inspect, stop, rm, stats, etc. Dangerous flags blocked.
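Blocking dangerous Docker flags has to cope with both the --flag=value and --flag value spellings. A sketch of one way to normalize and check them; the blocked pairs come from the Security section's list, while the --net alias entry and the function name are assumptions:

```python
# Blocked (flag, value) pairs; a value of None means "blocked whatever the value".
BLOCKED_FLAGS = {
    ("--privileged", None),
    ("--pid", "host"),
    ("--network", "host"),
    ("--net", "host"),  # assumption: older alias of --network
    ("--cap-add", "all"),
}
# Flags whose value may arrive as a separate token ("--network host").
VALUE_FLAGS = {"--pid", "--network", "--net", "--cap-add"}


def find_blocked_flags(args: list[str]) -> list[str]:
    """Return the blocked flags found in args, normalized to flag or flag=value."""
    found = []
    i = 0
    while i < len(args):
        arg = args[i]
        if arg.startswith("--") and "=" in arg:
            flag, value = arg.split("=", 1)
        elif arg in VALUE_FLAGS and i + 1 < len(args):
            flag, value = arg, args[i + 1]
            i += 1  # the value token has been consumed
        else:
            flag, value = arg, None
        normalized = value.lower() if value is not None else None
        if (flag, None) in BLOCKED_FLAGS or (flag, normalized) in BLOCKED_FLAGS:
            found.append(flag if value is None else f"{flag}={value}")
        i += 1
    return found
```

The scan does not know where the image name ends, so a container command that happens to contain --privileged is also rejected; that errs on the safe side.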

Email & Database

Tool Parameters Description
send_email recipient, subject, body, smtp_* Send email via SMTP. Requires dashboard configuration.
database_query connection_string, query, params, max_rows Read-only SQL on PostgreSQL/MySQL/MSSQL/SQLite.
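database_query accepts only SELECT/WITH/PRAGMA statements and passes values as parameters. A sketch of such a keyword gate, shown against sqlite3 so it runs without a database server; the regexes, check_read_only, and run_query are hypothetical:

```python
import re
import sqlite3

ALLOWED_FIRST_KEYWORDS = {"select", "with", "pragma"}
# Strips -- line comments and /* block */ comments. Coarse: it also strips
# comment-like text inside string literals, which can break such queries but
# never adds a statement, because the cleaned text is what gets executed.
_COMMENT_RE = re.compile(r"--[^\n]*|/\*.*?\*/", re.DOTALL)


def check_read_only(query: str) -> str:
    cleaned = _COMMENT_RE.sub(" ", query).strip().rstrip(";").strip()
    if ";" in cleaned:
        raise ValueError("multiple statements are not allowed")
    match = re.match(r"[A-Za-z]+", cleaned)
    first = match.group(0).lower() if match else ""
    if first not in ALLOWED_FIRST_KEYWORDS:
        raise ValueError(f"only SELECT/WITH/PRAGMA are allowed, got {first!r}")
    return cleaned


def run_query(conn: sqlite3.Connection, query: str, params=(), max_rows: int = 100):
    # Values travel as bound parameters, never spliced into the SQL text.
    cur = conn.execute(check_read_only(query), params)
    return cur.fetchmany(max_rows)
```

A keyword gate is only a first layer: PostgreSQL allows data-modifying statements inside WITH, and some SQLite PRAGMAs change settings. A read-only database role or read-only connection is the sturdier guarantee.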

Encryption

Tool Parameters Description
crypto_encrypt text, password, algorithm Encrypt text with Fernet (AES) using a password-derived key.
crypto_decrypt encrypted_data, password, algorithm, is_base64 Decrypt Fernet-encrypted data.
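The encryption tools derive a Fernet key from a password. Fernet expects 32 bytes encoded as URL-safe base64; a sketch that derives them with PBKDF2-HMAC-SHA256 from the standard library. The KDF choice, iteration count, and salt handling are assumptions, since the README does not specify them:

```python
import base64
import hashlib
import os

# Assumption: OWASP's password storage guidance lists 600,000 iterations for
# PBKDF2-HMAC-SHA256; the README does not state what the project uses.
PBKDF2_ITERATIONS = 600_000


def new_salt() -> bytes:
    return os.urandom(16)  # not secret; store it next to the ciphertext


def derive_fernet_key(password: str, salt: bytes, iterations: int = PBKDF2_ITERATIONS) -> bytes:
    """Derive a 32-byte key and encode it the way Fernet expects (URL-safe base64)."""
    raw = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, iterations, dklen=32)
    return base64.urlsafe_b64encode(raw)
```

With the cryptography package installed, the derived key can be passed to cryptography.fernet.Fernet(key). The salt must be kept with the ciphertext, or decryption cannot reproduce the key.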

Screenshot

Tool Parameters Description
screenshot url, full_page, width, height Capture webpage screenshot (base64 PNG). SSRF-guarded.

Charts & System

Tool Parameters Description
create_chart data, chart_type, x_column, y_column, title CSV/JSON → bar/line/pie/scatter/histogram chart (base64 PNG).
system_disk_usage Disk usage per mount point.
system_memory_info RAM + swap usage.
system_processes limit Top processes by CPU usage.

opencode configuration

Add to your opencode.json or ~/.config/opencode/opencode.json:

{
  "$schema": "https://opencode.ai/config.json",
  "mcp": {
    "agent_helper": {
      "type": "remote",
      "url": "http://localhost:5021/sse",
      "headers": {
        "Authorization": "Bearer <your-api-key>"
      },
      "enabled": true
    }
  }
}

File processing support

Extension | Processor | Output
.jpg, .png, .webp, .bmp, .tiff | Tesseract OCR + YOLOv8 + optional vision model | OCR text, detected objects, scene description (vision mode only)
.pdf | PyMuPDF | Extracted text per page
.docx | python-docx | Extracted paragraphs
.xlsx, .xls | openpyxl | Cell values per sheet (openpyxl reads .xlsx; it does not support the legacy .xls format)
.txt, .md, .csv, .json, .xml | Direct read | Raw file content
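Routing by file extension is a dictionary lookup. A sketch of how process_folder might group files before processing, assuming lower-cased suffix matching; EXTENSION_ROUTES, route_file, and plan_folder are hypothetical names, and .xls is omitted because openpyxl does not read that format:

```python
import tempfile
from pathlib import Path

# Extension -> processor routing taken from the file processing table.
EXTENSION_ROUTES = {
    **dict.fromkeys((".jpg", ".png", ".webp", ".bmp", ".tiff"), "image"),
    ".pdf": "pdf",
    ".docx": "docx",
    ".xlsx": "excel",
    **dict.fromkeys((".txt", ".md", ".csv", ".json", ".xml"), "text"),
}


def route_file(filename: str) -> str:
    ext = Path(filename).suffix.lower()  # case-insensitive: "Scan.PNG" -> ".png"
    try:
        return EXTENSION_ROUTES[ext]
    except KeyError:
        raise ValueError(f"unsupported file type: {ext or '(none)'}") from None


def plan_folder(folder: Path) -> dict[str, list[str]]:
    """Group a folder's files by processor, creating the folder if it is missing."""
    folder.mkdir(parents=True, exist_ok=True)
    plan: dict[str, list[str]] = {}
    for path in sorted(folder.iterdir()):
        if not path.is_file():
            continue
        try:
            kind = route_file(path.name)
        except ValueError:
            kind = "unsupported"
        plan.setdefault(kind, []).append(path.name)
    return plan


if __name__ == "__main__":
    demo = Path(tempfile.mkdtemp()) / "Processing" / "demo"
    demo.mkdir(parents=True)
    for name in ("scan.png", "report.pdf", "archive.rar"):
        (demo / name).write_text("placeholder")
    print(plan_folder(demo))
```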

Vision backends

Mode | Backend | Notes
ocr (default) | Tesseract + YOLOv8 | No external service needed
ollama | LLaVA via Ollama | Adds scene descriptions; requires Ollama running (ollama serve) and the llava model (ollama pull llava)

Project structure

agent_helper/
├── config.py                 # Settings management (persisted to JSON)
├── logger.py                 # Ring buffer logger (500 lines, polled by UI)
├── auth.py                   # API key management (SHA-256 hashed)
├── main.py                   # Entry point
├── mcp_server.py             # FastMCP server on port 5021 (50+ tools)
├── processor_orchestrator.py # All tool implementations
├── processors/
│   ├── image.py              # Tesseract OCR
│   ├── vision.py             # Ollama LLaVA image description
│   ├── pdf.py                # PyMuPDF text extraction
│   ├── docx.py               # python-docx parsing
│   ├── excel.py              # openpyxl XLSX parsing
│   ├── detection.py          # YOLOv8 object detection
│   ├── fs_tools.py           # Read-only FS operations (path-safe)
│   ├── contact_search.py     # Name/phone lookup via DDG + OSINT
│   ├── local_memory.py       # SQLite + FTS5 persistent memory
│   └── social_lookup.py      # Twitter/X profile scraping via Nitter
├── management_ui/
│   ├── app.py                # FastAPI dashboard on port 5020
│   └── templates/
│       └── dashboard.html    # HTMX dark-theme dashboard
├── Processing/               # Watch folder (created on first run)
├── local_memory/             # SQLite memory database (auto-created)
├── data/                     # Settings & API keys (persisted)
├── logs/                     # Log output
├── requirements.txt
├── start.sh
└── agent-helper.service      # systemd service unit

Security

  • API key auth: All MCP tool calls (sent to the /messages/ endpoint) require a Bearer token. Keys are managed via the dashboard.
  • SSRF protection: http_request and screenshot tools block private/internal IP addresses (127.0.0.1, 10.x, 172.16.x to 172.31.x, 192.168.x, 169.254.x, localhost).
  • Path traversal: All file operations resolve paths against the allowed root and reject .. traversal.
  • Git whitelist: Only pre-approved git commands allowed (status, log, diff, commit, push, pull, etc.). No generic shell execution.
  • Docker restrictions: Dangerous flags (--privileged, --pid=host, --network=host, --cap-add=ALL) are blocked.
  • Database read-only: External DB queries enforce SELECT/WITH/PRAGMA only. Parameterized queries prevent injection.
  • Zip-slip protection: Archive extraction validates all paths against the output directory.
  • Memory validation: Keys and tags are validated with strict regex patterns. All SQL parameterized.
  • Local-only UI: Management dashboard binds to 127.0.0.1.
  • Tool management: Any tool can be disabled via the dashboard — risky ones ship disabled by default.

License

MIT
