eFax to JSON MCP Server

Converts eFax documents (PDF, TIFF, CCD XML) from OpenText Fax Server Software into structured JSON format with OCR support, metadata extraction, and batch processing capabilities.

A Model Context Protocol (MCP) server that converts eFax documents from OpenText Fax Server Software into structured JSON. It supports PDF, TIFF, and CCD XML documents, with Tesseract-based OCR and metadata extraction.

Features

Supported Formats

PDF Documents - Text extraction and OCR for scanned PDFs
TIFF Images - Multi-page TIFF support with OCR processing
CCD XML - Continuity of Care Document parsing (CCD is built on the HL7 Clinical Document Architecture)
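As a rough illustration of how these three formats could be told apart, the sketch below inspects a file's leading bytes. `detectFormat` and `EfaxFormat` are hypothetical names, not taken from the server's source. The checks rely only on the standard PDF header (`%PDF`), the two TIFF byte-order headers, and a leading `<` for XML.

```typescript
// Hypothetical sketch: classify a document by its leading bytes.
// Names are illustrative and not taken from the server's source.
type EfaxFormat = "pdf" | "tiff" | "ccd_xml" | "unknown";

function hasPrefix(bytes: Uint8Array, prefix: number[], offset = 0): boolean {
  if (bytes.length < offset + prefix.length) return false;
  return prefix.every((b, i) => bytes[offset + i] === b);
}

function isXmlWhitespace(b: number | undefined): boolean {
  return b === 0x20 || b === 0x09 || b === 0x0a || b === 0x0d;
}

function detectFormat(bytes: Uint8Array): EfaxFormat {
  // PDF files begin with the ASCII marker "%PDF".
  if (hasPrefix(bytes, [0x25, 0x50, 0x44, 0x46])) return "pdf";

  // TIFF files begin with "II*\0" (little-endian) or "MM\0*" (big-endian).
  if (hasPrefix(bytes, [0x49, 0x49, 0x2a, 0x00]) || hasPrefix(bytes, [0x4d, 0x4d, 0x00, 0x2a])) {
    return "tiff";
  }

  // XML: skip an optional UTF-8 byte order mark and whitespace, then expect "<".
  // Assumption: any XML is treated as CCD here; a stricter check would also
  // confirm that the root element is ClinicalDocument.
  let i = hasPrefix(bytes, [0xef, 0xbb, 0xbf]) ? 3 : 0;
  while (i < bytes.length && isXmlWhitespace(bytes[i])) i++;
  if (i < bytes.length && bytes[i] === 0x3c) return "ccd_xml";

  return "unknown";
}
```

Checking content rather than the file extension helps when fax exports arrive with missing or misleading extensions.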

Processing Capabilities

Intelligent OCR - Tesseract-based text recognition with confidence scoring
Metadata Extraction - Preserve document properties and fax information
Batch Processing - Convert multiple documents simultaneously
Format Validation - Comprehensive document structure validation
Error Recovery - Robust error handling with detailed reporting
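To make the confidence scoring concrete, here is a minimal sketch of how per-page OCR confidences might be combined into one document-level value. The function name and the use of a plain mean are assumptions; the server may aggregate differently.

```typescript
// Hypothetical sketch: derive a document-level OCR confidence from per-page scores.
// A plain mean is an assumption; the server may weight pages differently.
function averageConfidence(pageConfidences: number[]): number | undefined {
  // Keep only finite values inside the 0-100 range that Tesseract reports.
  const valid = pageConfidences.filter((c) => Number.isFinite(c) && c >= 0 && c <= 100);
  if (valid.length === 0) return undefined;

  const mean = valid.reduce((sum, c) => sum + c, 0) / valid.length;
  // Round to one decimal place.
  return Math.round(mean * 10) / 10;
}
```

Returning `undefined` when no page has a usable score keeps "OCR was not run" distinct from "OCR ran with zero confidence".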

Installation

Prerequisites

Node.js 18+
System-level Tesseract OCR installation:
- Ubuntu/Debian: sudo apt-get install tesseract-ocr
- macOS: brew install tesseract
- Windows: download the installer from the UB Mannheim Tesseract releases

Setup Steps

Create project directory

mkdir efax-mcp-server
cd efax-mcp-server

Initialize and install dependencies

npm init -y
npm install @modelcontextprotocol/sdk pdf-parse sharp tesseract.js xml2js
npm install -D @types/node @types/pdf-parse @types/xml2js typescript ts-node

Create directory structure

mkdir -p src/{types,processors,utils}
mkdir -p tests/test-files
mkdir -p docs

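Add source files (paste the provided code into the respective files)

Build the project (this assumes package.json defines a build script, for example one that runs tsc; npm init -y does not add one)

```
npm run build
```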

Usage

MCP Client Configuration

Add the server to your MCP client configuration (e.g., Claude Desktop), replacing the path with the location of your build:

{
  "mcpServers": {
    "efax-converter": {
      "command": "node",
      "args": ["/path/to/efax-mcp-server/dist/server.js"]
    }
  }
}

Available Tools

1. Convert Single Document

convert_efax_document --filePath "/path/to/document.pdf" --performOCR true

Parameters:

filePath (required) - Path to eFax document
outputPath (optional) - Custom output JSON path
extractMetadata (default: true) - Extract document metadata
performOCR (default: true) - Enable OCR processing
ocrLanguage (default: "eng") - OCR language code
includeRawData (default: false) - Include raw document data
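The parameter defaults listed above could be applied as in the following sketch. `ConvertOptions` and `resolveConvertOptions` are hypothetical names used for illustration. Only the parameter names and default values come from this README.

```typescript
// Hypothetical option type mirroring the documented parameters.
interface ConvertOptions {
  filePath: string;
  outputPath?: string;
  extractMetadata?: boolean;
  performOCR?: boolean;
  ocrLanguage?: string;
  includeRawData?: boolean;
}

// Same fields with every default filled in; outputPath stays optional.
type ResolvedConvertOptions = Required<Omit<ConvertOptions, "outputPath">> & {
  outputPath: string | undefined;
};

function resolveConvertOptions(input: ConvertOptions): ResolvedConvertOptions {
  if (typeof input.filePath !== "string" || input.filePath.trim() === "") {
    throw new Error("filePath is required");
  }
  return {
    filePath: input.filePath,
    outputPath: input.outputPath,
    // ?? keeps an explicit false, unlike ||, which would replace it with the default.
    extractMetadata: input.extractMetadata ?? true,
    performOCR: input.performOCR ?? true,
    ocrLanguage: input.ocrLanguage ?? "eng",
    includeRawData: input.includeRawData ?? false,
  };
}
```

For instance, `resolveConvertOptions({ filePath: "/path/to/document.pdf", performOCR: false })` keeps OCR disabled while the remaining fields take their documented defaults.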

2. Batch Convert Documents

batch_convert_efax --inputDirectory "/path/to/docs" --outputDirectory "/path/to/json"

Parameters:

inputDirectory (required) - Source document directory
outputDirectory (required) - JSON output directory
filePattern (default: "*") - File matching pattern
continueOnError (default: true) - Continue on individual failures
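The interplay of `filePattern` and `continueOnError` can be sketched as below. This is an assumption-laden illustration, not the server's code: `globToRegExp`, `batchConvert`, and `BatchResult` are hypothetical names, the glob support covers only `*` and `?`, matching is case-insensitive by assumption, and files are processed one at a time for simplicity.

```typescript
// Hypothetical result shape for a batch run.
interface BatchResult {
  succeeded: string[];
  failed: { file: string; error: string }[];
}

// Translate a simple glob ("*" and "?" only) into an anchored RegExp.
function globToRegExp(pattern: string): RegExp {
  const escaped = pattern.replace(/[.+^${}()|[\]\\]/g, "\\$&");
  const body = escaped.replace(/\*/g, ".*").replace(/\?/g, ".");
  // Case-insensitive so "*.pdf" also matches "FAX.PDF" (an assumption).
  return new RegExp(`^${body}$`, "i");
}

async function batchConvert(
  files: string[],
  convert: (file: string) => Promise<void>, // caller-supplied converter
  filePattern = "*",
  continueOnError = true,
): Promise<BatchResult> {
  const matcher = globToRegExp(filePattern);
  const result: BatchResult = { succeeded: [], failed: [] };

  for (const file of files.filter((f) => matcher.test(f))) {
    try {
      await convert(file);
      result.succeeded.push(file);
    } catch (err) {
      result.failed.push({ file, error: err instanceof Error ? err.message : String(err) });
      // With continueOnError disabled, the first failure ends the batch.
      if (!continueOnError) break;
    }
  }
  return result;
}
```

Collecting failures in the result, rather than throwing, lets a caller report per-file errors after the whole batch has run.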

3. Validate JSON Output

validate_efax_json --jsonPath "/path/to/output.json"

4. Get File Information

get_file_info --filePath "/path/to/document.pdf"

5. List Supported Formats

list_supported_formats

JSON Output Structure

{
  "id": "efax_document_1234567890_abc123",
  "source": "efax",
  "format": "pdf|tiff|ccd_xml",
  "timestamp": "2025-08-04T12:00:00.000Z",
  "metadata": {
    "originalFileName": "fax_document.pdf",
    "fileSize": 2048576,
    "pages": 3,
    "sender": "John Doe",
    "recipient": "Jane Smith",
    "faxNumber": "+1-555-123-4567",
    "resolution": "1200x1800",
    "ocrConfidence": 95.5,
    "processingTime": 3500
  },
  "content": {
    "text": "Full extracted text content...",
    "pages": [
      {
        "pageNumber": 1,
        "text": "Page 1 text content...",
        "confidence": 96.2,
        "metadata": {
          "width": 1200,
          "height": 1800,
          "resolution": "1200x1800"
        }
      }
    ],
    "sections": [
      {
        "title": "Patient Information",
        "content": "Patient details...",
        "type": "patient",
        "pageNumbers": [1]
      }
    ]
  },
  "rawData": {
    "pdfInfo": {},
    "imageMetadata": {}
  }
}
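The structure above can be approximated with TypeScript types and a small validator, along the lines of what `validate_efax_json` might check. The types and rules here are inferred from the example only and are assumptions, not the server's actual schema.

```typescript
// Hypothetical types inferred from the example output; not the server's schema.
interface EfaxPage {
  pageNumber: number;
  text: string;
  confidence?: number;
}

interface EfaxDocument {
  id: string;
  source: "efax";
  format: "pdf" | "tiff" | "ccd_xml";
  timestamp: string;
  metadata: Record<string, unknown>;
  content: { text: string; pages: EfaxPage[] };
}

const FORMATS: readonly string[] = ["pdf", "tiff", "ccd_xml"];

// Returns a list of problems; an empty list means the value passed these checks.
function validateEfaxJson(value: unknown): string[] {
  if (typeof value !== "object" || value === null) return ["document must be an object"];
  const doc = value as Record<string, unknown>;
  const errors: string[] = [];

  if (typeof doc.id !== "string" || doc.id === "") errors.push("id must be a non-empty string");
  if (doc.source !== "efax") errors.push('source must be "efax"');
  if (typeof doc.format !== "string" || FORMATS.indexOf(doc.format) === -1) {
    errors.push("format must be pdf, tiff, or ccd_xml");
  }
  if (typeof doc.timestamp !== "string" || Number.isNaN(Date.parse(doc.timestamp))) {
    errors.push("timestamp must be a parseable date string");
  }

  const content = doc.content;
  if (typeof content !== "object" || content === null) {
    errors.push("content must be an object");
    return errors;
  }
  const { text, pages } = content as Record<string, unknown>;
  if (typeof text !== "string") errors.push("content.text must be a string");
  if (!Array.isArray(pages)) {
    errors.push("content.pages must be an array");
    return errors;
  }

  pages.forEach((page: unknown, i: number) => {
    const p = (page ?? {}) as Record<string, unknown>;
    if (!Number.isInteger(p.pageNumber) || (p.pageNumber as number) < 1) {
      errors.push(`content.pages[${i}].pageNumber must be a positive integer`);
    }
    const c = p.confidence;
    if (c !== undefined && (typeof c !== "number" || c < 0 || c > 100)) {
      errors.push(`content.pages[${i}].confidence must be between 0 and 100`);
    }
  });
  return errors;
}
```

Returning every problem found, instead of stopping at the first, gives more useful feedback when checking batch output.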

Architecture

Modular Design

Processors: Format-specific conversion logic
Utilities: Shared validation and file handling
Types: Comprehensive TypeScript definitions
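One way such a modular layout could fit together is a shared processor interface plus a registry keyed by format. This is a sketch under assumed names (`DocumentProcessor`, `ProcessorRegistry`, `ProcessedContent`); the actual source files may be organized differently.

```typescript
// Hypothetical sketch of format-specific processors behind one interface.
type EfaxFormat = "pdf" | "tiff" | "ccd_xml";

interface ProcessedContent {
  text: string;
  pageCount: number;
}

interface DocumentProcessor {
  readonly format: EfaxFormat;
  process(data: Uint8Array): ProcessedContent;
}

class ProcessorRegistry {
  private readonly processors = new Map<EfaxFormat, DocumentProcessor>();

  // Register a processor under the format it declares; returns this for chaining.
  register(processor: DocumentProcessor): this {
    this.processors.set(processor.format, processor);
    return this;
  }

  // Look up the processor for a format, failing loudly when none is registered.
  get(format: EfaxFormat): DocumentProcessor {
    const processor = this.processors.get(format);
    if (!processor) throw new Error(`No processor registered for format: ${format}`);
    return processor;
  }

  formats(): EfaxFormat[] {
    return Array.from(this.processors.keys());
  }
}

// Stub processor for illustration only; it does not parse real PDF data.
const stubPdfProcessor: DocumentProcessor = {
  format: "pdf",
  process: (data) => ({ text: `[${data.length} bytes of PDF]`, pageCount: 1 }),
};
```

With this shape, supporting a new format means registering one more processor and leaves the calling code unchanged.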

Processing Pipeline

1. File Validation - Format and size checks
2. Format Detection - Automatic type identification
3. Content Extraction - Text and metadata processing
4. OCR Processing - Image-to-text conversion when needed
5. Structure Validation - Output quality assurance
6. JSON Serialization - Standardized output format
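The pipeline stages can be pictured as an ordered list of functions that each take and return a shared context. In the sketch below every stage body is a stub. The stage names follow the list above, while the size limit, the extension-based detection, and all identifiers are assumptions made for illustration.

```typescript
// Hypothetical sketch of the documented pipeline; every stage body is a stub.
type EfaxFormat = "pdf" | "tiff" | "ccd_xml";

interface PipelineContext {
  fileName: string;
  sizeBytes: number;
  format?: EfaxFormat;
  text?: string;
  json?: string;
}

interface Stage {
  name: string;
  run: (ctx: PipelineContext) => PipelineContext;
}

const MAX_SIZE_BYTES = 50 * 1024 * 1024; // assumed limit, not stated in this README

function detectByExtension(fileName: string): EfaxFormat | undefined {
  const ext = fileName.toLowerCase().split(".").pop();
  if (ext === "pdf") return "pdf";
  if (ext === "tif" || ext === "tiff") return "tiff";
  if (ext === "xml") return "ccd_xml";
  return undefined;
}

const stages: Stage[] = [
  {
    name: "File Validation",
    run: (ctx) => {
      if (ctx.sizeBytes <= 0 || ctx.sizeBytes > MAX_SIZE_BYTES) throw new Error("file size out of range");
      return ctx;
    },
  },
  {
    name: "Format Detection",
    run: (ctx) => {
      const format = detectByExtension(ctx.fileName);
      if (!format) throw new Error("unsupported file type");
      return { ...ctx, format };
    },
  },
  // Stub: a real extractor would read embedded text; image formats yield none.
  { name: "Content Extraction", run: (ctx) => ({ ...ctx, text: ctx.format === "tiff" ? "" : "[extracted text]" }) },
  // Stub: OCR runs only when extraction produced no text.
  { name: "OCR Processing", run: (ctx) => (ctx.text ? ctx : { ...ctx, text: "[ocr text]" }) },
  {
    name: "Structure Validation",
    run: (ctx) => {
      if (!ctx.format || typeof ctx.text !== "string") throw new Error("incomplete document");
      return ctx;
    },
  },
  {
    name: "JSON Serialization",
    run: (ctx) => ({
      ...ctx,
      json: JSON.stringify({ source: "efax", format: ctx.format, content: { text: ctx.text } }),
    }),
  },
];

// Run the stages in order, tagging any failure with the stage that raised it.
function runPipeline(initial: PipelineContext): PipelineContext {
  let ctx = initial;
  for (const stage of stages) {
    try {
      ctx = stage.run(ctx);
    } catch (err) {
      const reason = err instanceof Error ? err.message : String(err);
      throw new Error(`${stage.name} failed: ${reason}`);
    }
  }
  return ctx;
}
```

Prefixing errors with the stage name is one simple way to produce the detailed error reports listed under Processing Capabilities.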

Development

Build Commands

npm run build     # Compile TypeScript
npm run dev       # Development mode with hot reload
npm run test      # Run test suite
npm run clean     # Clean build directory