Agent Helper
Enables AI agents to process files locally — OCR images, extract text from PDFs and DOCX, and describe images using local vision models, all without sending data to external services.
A local MCP server that gives AI agents file processing, web search, data analysis, contact lookup, local persistent memory, and system monitoring capabilities — all on your machine. OCR images, extract text from PDFs/DOCX/XLSX, detect objects with YOLOv8, describe scenes via Ollama, search the web, fetch web pages, look up people's contact info, save/recall/search memories, compress/extract archives, generate PDFs, run git operations, make HTTP requests, encrypt/decrypt data, take screenshots, query external databases, send emails, manage Docker, and more.
Architecture

                        ┌──────────────────────┐
    AI Agent (MCP) ───▶ │ MCP Server :5021     │
    (opencode, etc.)    │ FastMCP / SSE        │
                        └──────────┬───────────┘
                                   │
    ┌──────────────────────────────▼──────────────────────────────┐
    │                        Orchestrator                         │
    │                     Routes + 50+ tools                      │
    └──────┬───────────────┬───────────────┬───────────────┬──────┘
           │               │               │               │
    ┌──────▼──────┐ ┌──────▼──────┐ ┌──────▼──────┐ ┌──────▼──────┐
    │ OCR         │ │ PDF         │ │ DOCX        │ │ YOLO        │
    │ Tesseract   │ │ PyMuPDF     │ │ python-docx │ │ Obj Detect  │
    └─────────────┘ └─────────────┘ └─────────────┘ └─────────────┘

    Browser ─────▶ Management UI :5020
                   (FastAPI dashboard)

Features
| Feature | Description |
|---|---|
| OCR | Extract text from images via Tesseract |
| Object detection | YOLOv8 for detecting objects in images (CPU, ~6MB model) |
| Scene description | Optional Ollama LLaVA for image descriptions |
| PDF extraction | Text extraction from PDFs via PyMuPDF |
| DOCX extraction | Paragraph extraction from Word files |
| XLSX extraction | Cell values from Excel spreadsheets via openpyxl |
| Web search | DuckDuckGo search (free, no API key) |
| Web page fetch | Fetch URLs, extract readable text, detect iframe content |
| Batch fetch | Fetch multiple URLs in parallel in one call |
| Format converter | Auto-detect & convert JSON/YAML/CSV/XML with JMESPath queries |
| Diff | Compare two text blocks or URLs, return unified diff |
| RSS reader | Parse RSS/Atom feeds into structured entries |
| Summarization | Summarize text or URLs (Ollama or extractive fallback) |
| Date parsing | Natural language dates with timezone conversion |
| File system | List/search/read files within a configurable root path (read-only) |
| SQLite queries | Read-only SQL queries on .db files |
| Archive viewer | List .zip/.tar.gz contents (no extraction) |
| Chart generation | CSV/JSON data → bar/line/pie/scatter/histogram charts (base64 PNG) |
| System monitoring | Disk usage, memory info, running processes via psutil |
| Translation | Translate text via Ollama |
| Contact search | Look up people by name (phones, emails, profiles) or reverse phone lookup via DuckDuckGo + OSINT |
| Local memory | Persistent key-value store with SQLite + FTS5 full-text search. Save, recall, search, list, delete memories |
| Encode / Decode | Base64, UUID generation, MD5/SHA1/SHA256/SHA512 hashing |
| Compress / Extract | Create and extract zip/tar.gz archives with zip-slip protection |
| PDF generation | Generate PDF documents from text content via fpdf2 |
| WHOIS / DNS | Domain WHOIS lookup and DNS record queries (A, AAAA, MX, NS, CNAME, TXT, etc.) |
| HTTP requests | Full HTTP client with SSRF protection (blocks private IPs) |
| Git operations | Whitelisted git commands (status, log, diff, commit, push, pull, clone, etc.) |
| File read/write | Read, write, append, delete, list files (jailed to app root directory) |
| Database queries | Read-only PostgreSQL/MySQL/MSSQL/SQLite queries via SQLAlchemy |
| Send email | SMTP client for sending emails |
| Docker management | Docker ps, images, pull, run, exec, logs, inspect, stop, rm, etc. (dangerous flags blocked) |
| Encryption | Fernet (AES) encrypt/decrypt with password-derived keys |
| Screenshots | Capture webpage screenshots via Playwright (SSRF-guarded) |
| Video search | Search for videos via DuckDuckGo (YouTube, Vimeo, etc.) with duration, view counts, thumbnails |
| Social media lookup | Scrape Twitter/X profiles and tweets via Nitter — no API keys or login required |
| API key auth | Bearer token authentication for MCP clients, managed via web UI |
| Management dashboard | Web UI at port 5020 for settings, keys, tool toggles, job history, live logs |
| Tool management | Enable/disable individual MCP tools from the dashboard |
| Job history | Results cached to disk, viewable in dashboard |
| Live logs | Poll-based log stream (no WebSocket needed) |
Requirements
- Python 3.10+
- Tesseract OCR (system package):

      sudo apt install tesseract-ocr   # Debian/Ubuntu
      brew install tesseract           # macOS

- Ollama (optional, for vision & translation):

      curl -fsSL https://ollama.com/install.sh | sh
      ollama pull llava

Quick start

    git clone https://github.com/wajirasls/agent_helper.git
    cd agent_helper

    # One-shot setup:
    chmod +x start.sh
    ./start.sh

    # Or manually:
    python3 -m venv venv
    source venv/bin/activate
    pip install -r requirements.txt
    python main.py

Open http://127.0.0.1:5020 in your browser.
Systemd service (auto-start on boot)

    sudo cp agent-helper.service /etc/systemd/system/
    sudo systemctl daemon-reload
    sudo systemctl enable agent-helper
    sudo systemctl start agent-helper

Ports
| Port | Service | Access |
|---|---|---|
| 5020 | Management UI (FastAPI) | http://127.0.0.1:5020 |
| 5021 | MCP Server (SSE) | http://0.0.0.0:5021/sse |
Management dashboard
Visit http://127.0.0.1:5020:
- MCP Server — Start, stop, restart the MCP server
- Vision Backend — Toggle between OCR only / Ollama LLaVA
- Object Detection — Enable/disable YOLOv8 with confidence threshold
- API Keys — Create and revoke keys for MCP clients
- MCP Tools — Enable/disable individual tools
- File System — Configure FS root path (read-only, relative to app dir)
- Processing Folders — Browse `Processing/` subfolders
- Job History — View past processing jobs
- Health Panel — Check Tesseract, Ollama, YOLOv8 status
- Live Logs — Scrollable log stream
All MCP tools (50+ total)
File Processing (Processing/ folder)
| Tool | Parameters | Description |
|---|---|---|
| `process_folder` | `folder_name` | Process all files in `Processing/{folder}/`. Creates folder if not found. |
| `process_file` | `folder_name`, `filename` | Process a single file in an existing folder. |
| `list_folders` | — | List subfolders in `Processing/`. |
| `list_files` | `folder_name` | List files in a subfolder. |
| `detect_objects_in_image` | `folder_name`, `filename` | Run YOLOv8 detection on one image. |
Direct Analysis (pass data inline, no staging needed)
| Tool | Parameters | Description |
|---|---|---|
| `analyze_image` | `image_data` (base64), `filename` | OCR + object detection on base64 image. |
| `analyze_image_url` | `url` | Download image from URL → OCR + detection. |
| `analyze_file` | `file_data` (base64), `filename` | Analyze any supported file (PDF, DOCX, XLSX, image, text). |
| `analyze_file_url` | `url` | Download file from URL → analyze. |
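
A client has to base64-encode file bytes before calling the inline tools. The sketch below builds an argument dictionary shaped like the `analyze_file` parameters in the table above. The helper name `build_analyze_file_args` is hypothetical and not part of the project, and the exact payload the server expects may differ.

```python
import base64


def build_analyze_file_args(data: bytes, filename: str) -> dict:
    """Build arguments matching the documented analyze_file parameters.

    Hypothetical helper: only the parameter names (file_data, filename)
    come from the README; everything else is an assumption.
    """
    encoded = base64.b64encode(data).decode("ascii")
    return {"file_data": encoded, "filename": filename}


if __name__ == "__main__":
    args = build_analyze_file_args(b"hello", "note.txt")
    print(args["filename"], len(args["file_data"]))
```

The same shape would apply to `analyze_image`, with `image_data` in place of `file_data`.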
Web & Search
| Tool | Parameters | Description |
|---|---|---|
| `fetch_webpage` | `url` | Fetch URL, extract readable text, detect iframes. |
| `web_search` | `query`, `max_results` | DuckDuckGo search (free, no API key). |
| `batch_fetch` | `urls` (list) | Fetch multiple URLs in parallel, one call. |
| `read_feed` | `url`, `max_entries` | Parse RSS/Atom feed into structured entries. |
| `http_request` | `url`, `method`, `headers`, `body`, `timeout` | Full HTTP client (GET/POST/PUT/DELETE). SSRF-guarded. |
| `video_search` | `query`, `max_results` | Search for videos via DuckDuckGo (YouTube, Vimeo, etc.). Returns title, URL, duration, view count, thumbnail. |
Data & Text
| Tool | Parameters | Description |
|---|---|---|
| `convert_format` | `data`, `from_format`, `to_format`, `query` | Auto-detect JSON/YAML/CSV/XML, convert, JMESPath query. |
| `diff_text` | `text_a`, `text_b`, `context_lines` | Unified diff of two text blocks. |
| `diff_urls` | `url_a`, `url_b`, `context_lines` | Fetch two URLs and diff their content. |
| `summarize_text` | `text`, `max_sentences` | Summarize text (Ollama or extractive TextRank). |
| `summarize_url` | `url`, `max_sentences` | Fetch URL + summarize in one step. |
| `parse_date` | `text`, `from_tz`, `to_tz` | Parse natural language dates, timezone conversion. |
| `translate_text` | `text`, `source_lang`, `target_lang` | Translate text via Ollama. |
| `encode_decode` | `operation`, `data`, `text`, `encoding` | Base64, UUID, MD5/SHA1/SHA256/SHA512 hashing. |
| `generate_pdf` | `text`, `filename`, `title` | Generate PDF from text. |
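
A unified diff like the one `diff_text` returns can be produced with Python's standard `difflib`. The sketch below is one plausible approach, not the project's code; the function name `unified_text_diff` and the `a`/`b` file labels are assumptions, while `context_lines` mirrors the documented parameter.

```python
import difflib


def unified_text_diff(text_a: str, text_b: str, context_lines: int = 3) -> str:
    """Return a unified diff of two text blocks, or "" if they are equal."""
    lines = difflib.unified_diff(
        text_a.splitlines(),
        text_b.splitlines(),
        fromfile="a",          # assumed label
        tofile="b",            # assumed label
        n=context_lines,       # lines of context around each change
        lineterm="",           # input lines carry no trailing newline
    )
    return "\n".join(lines)


if __name__ == "__main__":
    print(unified_text_diff("one\ntwo\n", "one\nthree\n", context_lines=1))
```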
Contact Info & OSINT
| Tool | Parameters | Description |
|---|---|---|
| `contact_search` | `query`, `domain`, `country`, `max_results` | Look up people by name (phones, emails, social profiles) or reverse phone lookup. |
| `whois_lookup` | `domain` | WHOIS lookup for domain registration data. |
| `dns_lookup` | `hostname`, `record_type` | DNS queries (A, AAAA, MX, NS, CNAME, TXT, SOA, SRV). |
Social Media (no API keys required)
| Tool | Parameters | Description |
|---|---|---|
| `social_lookup` | `platform`, `username`, `query`, `nitter_instance`, `limit` | Look up Twitter/X profiles or search tweets via Nitter public instances. Returns name, bio, follower stats, recent tweets. No API key or login needed. |
Local Memory (persistent, SQLite + FTS5)
| Tool | Parameters | Description |
|---|---|---|
| `memory_save` | `key`, `content`, `tags` | Save/update a memory with key and tags (upsert). |
| `memory_recall` | `key` | Retrieve by exact key. |
| `memory_search` | `query`, `tags`, `limit` | Full-text search across all memories (FTS5 ranked). |
| `memory_list` | `tags`, `limit` | Browse memories, filterable by comma-separated tags. |
| `memory_delete` | `key` | Delete a memory. |
| `memory_stats` | — | Total count, unique tags, newest/oldest entries. |
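
The save/recall behaviour described above (upsert by key, exact-key lookup, validated keys, parameterized SQL) can be sketched with the standard `sqlite3` module. This is an illustration under stated assumptions: the table layout, the key pattern, and the function names are hypothetical, and the FTS5 index used by `memory_search` is deliberately left out.

```python
import re
import sqlite3

# Assumed key pattern; the project's actual validation rules are not documented here.
KEY_PATTERN = re.compile(r"[A-Za-z0-9_.:-]{1,128}")


def open_store(path: str = ":memory:") -> sqlite3.Connection:
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS memories ("
        "key TEXT PRIMARY KEY, content TEXT NOT NULL, tags TEXT NOT NULL DEFAULT '')"
    )
    return conn


def save_memory(conn: sqlite3.Connection, key: str, content: str, tags: str = "") -> None:
    """Insert a memory, or overwrite the existing row with the same key."""
    if not KEY_PATTERN.fullmatch(key):
        raise ValueError(f"invalid key: {key!r}")
    # Parameterized statement: values never get spliced into the SQL text.
    conn.execute(
        "INSERT OR REPLACE INTO memories (key, content, tags) VALUES (?, ?, ?)",
        (key, content, tags),
    )
    conn.commit()


def recall_memory(conn: sqlite3.Connection, key: str):
    """Return the stored content for an exact key, or None if absent."""
    row = conn.execute("SELECT content FROM memories WHERE key = ?", (key,)).fetchone()
    return row[0] if row else None
```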
File System
| Tool | Parameters | Description |
|---|---|---|
| `fs_list_directory` | `path`, `pattern` (glob) | List directory contents with metadata. |
| `fs_find_files` | `pattern`, `path` | Recursive glob search. |
| `fs_read_text_file` | `path`, `offset`, `limit` | Read text file with line range (max 500). |
| `fs_query_sqlite` | `db_path`, `query`, `params` | Read-only SQLite query (1000 row limit). |
| `fs_list_archive` | `path` | List .zip/.tar.gz contents. |
| `read_write_file` | `path`, `content`, `mode`, `encoding` | Read/write/append/delete/list files (jailed to app root). |
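
Jailing file access to a root directory usually comes down to resolving the requested path and checking that the result still sits under the root. A minimal sketch of that check, assuming `pathlib` semantics; the function name `resolve_inside_root` is hypothetical and the project's own check may differ.

```python
from pathlib import Path


def resolve_inside_root(root: str, user_path: str) -> Path:
    """Resolve user_path against root and refuse anything that escapes it."""
    root_path = Path(root).resolve()
    # resolve() collapses ".." segments and follows symlinks before the check.
    candidate = (root_path / user_path).resolve()
    if candidate != root_path and root_path not in candidate.parents:
        raise PermissionError(f"path escapes root: {user_path}")
    return candidate
```

An absolute `user_path` replaces the root when joined, so it fails the same containment check as a `..` traversal.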
Compression
| Tool | Parameters | Description |
|---|---|---|
| `compress_files` | `paths`, `archive_name`, `format` | Create zip/tar.gz archives. Path-traversal protected. |
| `extract_archive` | `archive_path`, `output_dir`, `password` | Extract zip/tar.gz with zip-slip protection. |
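
Zip-slip protection means rejecting archive members whose names would land outside the output directory. The sketch below shows the idea for zip files only, using the standard `zipfile` module; `safe_extract_zip` is a hypothetical name, and the project's handling of tar.gz archives and passwords is not covered.

```python
import zipfile
from pathlib import Path


def safe_extract_zip(archive_path: str, output_dir: str) -> list:
    """Extract a zip archive after checking every member stays in output_dir."""
    out = Path(output_dir).resolve()
    with zipfile.ZipFile(archive_path) as zf:
        # Validate all members first so nothing is written if any name is unsafe.
        for member in zf.infolist():
            target = (out / member.filename).resolve()
            if target != out and out not in target.parents:
                raise ValueError(f"blocked member outside output dir: {member.filename}")
        zf.extractall(out)
        return zf.namelist()
```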
Git
| Tool | Parameters | Description |
|---|---|---|
| `git_operation` | `operation`, `args`, `repo_path` | Whitelisted git commands: status, log, diff, commit, push, pull, clone, branch, etc. |
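
A whitelist like this is typically enforced by checking the subcommand against a fixed set and building an argument list rather than a shell string. A rough sketch follows; the set below contains only the subcommands named in this README (the real list is longer, per the "etc."), and `build_git_command` is a hypothetical helper.

```python
# Only the subcommands named in the README; the project's full list is an unknown.
ALLOWED_GIT_OPERATIONS = frozenset(
    {"status", "log", "diff", "commit", "push", "pull", "clone", "branch"}
)


def build_git_command(operation: str, args=None) -> list:
    """Return an argv list for an allowed git subcommand, else raise."""
    if operation not in ALLOWED_GIT_OPERATIONS:
        raise ValueError(f"git operation not allowed: {operation}")
    # An argv list can be passed to subprocess.run without shell=True,
    # so shell metacharacters in args are never interpreted.
    return ["git", operation, *(args or [])]
```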
Docker
| Tool | Parameters | Description |
|---|---|---|
| `docker_exec` | `action`, `image`, `command`, `args`, `timeout` | Docker ps, images, pull, run, exec, logs, inspect, stop, rm, stats, etc. Dangerous flags blocked. |
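
Flag blocking can be sketched as a scan of the argument list before anything is handed to Docker. The flags below are the four listed in the Security section of this README; the function name `find_blocked_flags` and the handling of the space-separated form (`--network host`) are assumptions.

```python
# Flags named in the README's Security section.
BLOCKED_FLAGS = ("--privileged", "--pid=host", "--network=host", "--cap-add=ALL")


def find_blocked_flags(args: list) -> list:
    """Return the blocked flags present in args (empty list if none)."""
    candidates = list(args)
    # Also catch the two-token spelling, e.g. ["--network", "host"].
    candidates += [f"{flag}={value}" for flag, value in zip(args, args[1:])]
    return [flag for flag in BLOCKED_FLAGS if flag in candidates]
```

A caller would refuse the request whenever the returned list is non-empty.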
Email & Database
| Tool | Parameters | Description |
|---|---|---|
| `send_email` | `recipient`, `subject`, `body`, `smtp_*` | Send email via SMTP. Requires dashboard configuration. |
| `database_query` | `connection_string`, `query`, `params`, `max_rows` | Read-only SQL on PostgreSQL/MySQL/MSSQL/SQLite. |
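
The Security section states that external queries are limited to SELECT/WITH/PRAGMA. A conservative first-pass filter for that rule might look like the sketch below. It is an assumption about the approach, not the project's code, and a keyword check alone is not a complete safeguard (some databases allow `WITH ... INSERT`), so it would normally be paired with a read-only connection or role.

```python
READ_ONLY_KEYWORDS = ("select", "with", "pragma")


def is_read_only_query(query: str) -> bool:
    """Accept a single statement that starts with an allowed keyword."""
    statement = query.strip().rstrip(";").strip()
    # Reject empty input and stacked statements such as "SELECT 1; DROP ...".
    # This also rejects semicolons inside string literals, which is deliberate.
    if not statement or ";" in statement:
        return False
    first_word = statement.split(None, 1)[0].lower()
    return first_word in READ_ONLY_KEYWORDS
```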
Encryption
| Tool | Parameters | Description |
|---|---|---|
| `crypto_encrypt` | `text`, `password`, `algorithm` | Encrypt text with Fernet (AES) and password-derived key. |
| `crypto_decrypt` | `encrypted_data`, `password`, `algorithm`, `is_base64` | Decrypt Fernet-encrypted data. |
Screenshot
| Tool | Parameters | Description |
|---|---|---|
| `screenshot` | `url`, `full_page`, `width`, `height` | Capture webpage screenshot (base64 PNG). SSRF-guarded. |
Charts & System
| Tool | Parameters | Description |
|---|---|---|
| `create_chart` | `data`, `chart_type`, `x_column`, `y_column`, `title` | CSV/JSON → bar/line/pie/scatter/histogram chart (base64 PNG). |
| `system_disk_usage` | — | Disk usage per mount point. |
| `system_memory_info` | — | RAM + swap usage. |
| `system_processes` | `limit` | Top processes by CPU usage. |
opencode configuration
Add to your opencode.json or ~/.config/opencode/opencode.json:

    {
      "$schema": "https://opencode.ai/config.json",
      "mcp": {
        "agent_helper": {
          "type": "remote",
          "url": "http://localhost:5021/sse",
          "headers": {
            "Authorization": "Bearer <your-api-key>"
          },
          "enabled": true
        }
      }
    }

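
The `Authorization` header above carries an API key created in the dashboard, and the project structure notes that keys are stored SHA-256 hashed. A minimal sketch of how such a scheme can work, using only the standard library; the function names and the key format are assumptions, not the contents of `auth.py`.

```python
import hashlib
import hmac
import secrets


def hash_api_key(key: str) -> str:
    """Return the hex SHA-256 digest that would be stored instead of the key."""
    return hashlib.sha256(key.encode("utf-8")).hexdigest()


def generate_api_key():
    """Create a random key; return (plaintext shown once, digest to persist)."""
    key = secrets.token_urlsafe(32)
    return key, hash_api_key(key)


def verify_bearer(header_value: str, stored_hashes) -> bool:
    """Check an 'Authorization: Bearer <key>' value against stored digests."""
    scheme, _, token = header_value.partition(" ")
    if scheme.lower() != "bearer" or not token.strip():
        return False
    digest = hash_api_key(token.strip())
    # compare_digest avoids leaking match position through timing.
    return any(hmac.compare_digest(digest, stored) for stored in stored_hashes)
```

Storing only the digest means a leaked settings file does not reveal usable keys.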
File processing support
| Extension | Processor | Output |
|---|---|---|
| `.jpg`, `.png`, `.webp`, `.bmp`, `.tiff` | Tesseract OCR + YOLOv8 + optional vision | OCR text + detected objects + scene description |
| `.pdf` | PyMuPDF | Extracted text per page |
| `.docx` | python-docx | Extracted paragraphs |
| `.xlsx`, `.xls` | openpyxl | Cell values per sheet |
| `.txt`, `.md`, `.csv`, `.json`, `.xml` | Direct read | Raw file content |
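
Routing by extension, as the table above describes, can be expressed as a lookup table. The sketch below mirrors the table's extensions; the processor labels and the function name `pick_processor` are hypothetical and do not necessarily match the orchestrator's internals.

```python
from pathlib import Path

# Extension -> processor label, following the table above (labels are assumed).
PROCESSORS = {
    **dict.fromkeys((".jpg", ".png", ".webp", ".bmp", ".tiff"), "image"),
    ".pdf": "pdf",
    ".docx": "docx",
    **dict.fromkeys((".xlsx", ".xls"), "excel"),
    **dict.fromkeys((".txt", ".md", ".csv", ".json", ".xml"), "text"),
}


def pick_processor(filename: str):
    """Return the processor label for a filename, or None if unsupported."""
    return PROCESSORS.get(Path(filename).suffix.lower())
```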
Vision backends
| Mode | Backend | Notes |
|---|---|---|
| `ocr` (default) | Tesseract + YOLOv8 | No external service needed |
| `ollama` | LLaVA via Ollama | Adds scene descriptions; requires `ollama serve` + `ollama pull llava` |
Project structure

    agent_helper/
    ├── config.py                  # Settings management (persisted to JSON)
    ├── logger.py                  # Ring buffer logger (500 lines, polled by UI)
    ├── auth.py                    # API key management (SHA-256 hashed)
    ├── main.py                    # Entry point
    ├── mcp_server.py              # FastMCP server on port 5021 (50+ tools)
    ├── processor_orchestrator.py  # All tool implementations
    ├── processors/
    │   ├── image.py               # Tesseract OCR
    │   ├── vision.py              # Ollama LLaVA image description
    │   ├── pdf.py                 # PyMuPDF text extraction
    │   ├── docx.py                # python-docx parsing
    │   ├── excel.py               # openpyxl XLSX parsing
    │   ├── detection.py           # YOLOv8 object detection
    │   ├── fs_tools.py            # Read-only FS operations (path-safe)
    │   ├── contact_search.py      # Name/phone lookup via DDG + OSINT
    │   ├── local_memory.py        # SQLite + FTS5 persistent memory
    │   └── social_lookup.py       # Twitter/X profile scraping via Nitter
    ├── management_ui/
    │   ├── app.py                 # FastAPI dashboard on port 5020
    │   └── templates/
    │       └── dashboard.html     # HTMX dark-theme dashboard
    ├── Processing/                # Watch folder (created on first run)
    ├── local_memory/              # SQLite memory database (auto-created)
    ├── data/                      # Settings & API keys (persisted)
    ├── logs/                      # Log output
    ├── requirements.txt
    ├── start.sh
    └── agent-helper.service       # systemd user service

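
The tree describes `logger.py` as a 500-line ring buffer that the UI polls. One way to build such a buffer is a bounded `deque` plus a sequence number, so a poller can ask for only the lines it has not seen yet. This is a sketch of the idea; the class name `RingBufferLog` and the cursor-based `since` method are assumptions.

```python
from collections import deque
from itertools import count


class RingBufferLog:
    """Keep the most recent log lines in memory; older lines fall off."""

    def __init__(self, capacity: int = 500):
        self._lines = deque(maxlen=capacity)  # deque discards the oldest entry when full
        self._seq = count(1)                  # monotonically increasing line number

    def append(self, message: str) -> None:
        self._lines.append((next(self._seq), message))

    def since(self, last_seen: int = 0) -> list:
        """Return (number, message) pairs newer than last_seen, for polling."""
        return [(n, m) for n, m in self._lines if n > last_seen]
```

A polling client would remember the highest line number it received and pass it as `last_seen` on the next request, which is why no WebSocket is required.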
Security
- API key auth: All MCP tool calls require a Bearer token for `/messages/` endpoints. Keys managed via dashboard.
- SSRF protection: `http_request` and `screenshot` tools block private/internal IP addresses (127.0.0.1, 10.x, 172.x, 192.168.x, 169.254.x, localhost).
- Path traversal: All file operations resolve paths against the allowed root and reject `..` traversal.
- Git whitelist: Only pre-approved git commands allowed (status, log, diff, commit, push, pull, etc.). No generic shell execution.
- Docker restrictions: Dangerous flags (`--privileged`, `--pid=host`, `--network=host`, `--cap-add=ALL`) are blocked.
- Database read-only: External DB queries enforce SELECT/WITH/PRAGMA only. Parameterized queries prevent injection.
- Zip-slip protection: Archive extraction validates all paths against the output directory.
- Memory validation: Keys and tags are validated with strict regex patterns. All SQL parameterized.
- Local-only UI: Management dashboard binds to `127.0.0.1`.
- Tool management: Any tool can be disabled via the dashboard; risky ones ship disabled by default.
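
The SSRF rule in the list above can be illustrated with the standard `ipaddress` module. The sketch below checks literal IP hosts and `localhost` only; it does not resolve DNS names, which a production guard would also need to do. The function name `is_blocked_url` is hypothetical and this is not the project's implementation.

```python
import ipaddress
from urllib.parse import urlparse


def is_blocked_url(url: str) -> bool:
    """Return True if the URL targets a loopback, private, or link-local host."""
    host = urlparse(url).hostname
    if not host:
        return True  # no usable host: refuse rather than guess
    if host.lower() == "localhost":
        return True
    try:
        ip = ipaddress.ip_address(host)
    except ValueError:
        # A DNS name. This sketch lets it through; a real guard would
        # resolve it and apply the same checks to every returned address.
        return False
    return ip.is_private or ip.is_loopback or ip.is_link_local
```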
License
MIT