Calibre RAG MCP Server

Enables semantic search and contextual conversations with your Calibre ebook library using vector-based RAG technology. Supports project-based organization, multi-format book processing, and OCR capabilities for enhanced content extraction and retrieval.

Enhanced Calibre MCP server with RAG (Retrieval-Augmented Generation) capabilities for project-based vector search and contextual conversations.

Features

RAG-Enhanced Search: Vector-based semantic search using FAISS and Transformers
Project-Based Organization: Create isolated vector search projects for different contexts
Multi-Format Support: Process books in various formats (EPUB, PDF, MOBI, etc.)
OCR Capabilities: Extract text from images and scanned PDFs using Tesseract
Advanced Text Processing: Natural language processing for better content understanding
Windows Compatible: Designed specifically for Windows environments
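
To show how vector-based semantic search ranks passages, here is a minimal brute-force sketch in plain Node.js. It assumes embeddings are already computed and L2-normalized, so the dot product equals cosine similarity. FAISS runs the same kind of exact search with an optimized flat inner-product index, and offers approximate indexes for larger collections. The `searchTopK` name and the `{ id, vector }` record shape are hypothetical, not this server's API.

```javascript
// Hypothetical sketch: exact top-k search over L2-normalized embeddings.
// With unit-length vectors the dot product equals cosine similarity,
// which is what an inner-product flat index (e.g. FAISS IndexFlatIP) computes.

function dot(a, b) {
  let sum = 0;
  for (let i = 0; i < a.length; i++) sum += a[i] * b[i];
  return sum;
}

function normalize(v) {
  const norm = Math.sqrt(dot(v, v));
  return norm === 0 ? v.slice() : v.map((x) => x / norm);
}

// records: array of { id, vector } with normalized vectors (assumed shape).
function searchTopK(records, queryVector, k) {
  const q = normalize(queryVector);
  return records
    .map((r) => ({ id: r.id, score: dot(r.vector, q) }))
    .sort((a, b) => b.score - a.score)
    .slice(0, k);
}

// Usage with toy 3-dimensional vectors
const records = [
  { id: "chapter-1", vector: normalize([1, 0, 0]) },
  { id: "chapter-2", vector: normalize([0.7, 0.7, 0]) },
  { id: "chapter-3", vector: normalize([0, 0, 1]) },
];
const results = searchTopK(records, [1, 0.1, 0], 2);
console.log(results.map((r) => r.id)); // [ 'chapter-1', 'chapter-2' ]
```

Real embeddings have hundreds of dimensions, but the ranking logic is the same.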

Technologies Used

Vector Search: FAISS for efficient similarity search
Embeddings: Xenova Transformers (Transformers.js) for local embedding generation
OCR: Tesseract for optical character recognition
PDF Processing: Multiple PDF parsing libraries (pdf-parse, pdf-poppler, pdf2pic)
Image Processing: Sharp for image manipulation
NLP: Several natural language processing libraries for text analysis
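
Sentence-embedding models run through Transformers.js are commonly reduced to one vector per passage by mean-pooling the token embeddings (skipping padding) and then L2-normalizing the result. Transformers.js exposes this through the `pooling: 'mean'` and `normalize: true` options of its feature-extraction pipeline. Whether this server uses exactly those settings is an assumption. The dependency-free sketch below shows only the arithmetic, on made-up token vectors.

```javascript
// Sketch: mean pooling with an attention mask, followed by L2 normalization.
// tokenEmbeddings: array of per-token vectors; mask: 1 for real tokens, 0 for padding.
function meanPool(tokenEmbeddings, mask) {
  const dim = tokenEmbeddings[0].length;
  const sum = new Array(dim).fill(0);
  let count = 0;
  tokenEmbeddings.forEach((vec, t) => {
    if (mask[t] === 0) return; // skip padding tokens
    for (let i = 0; i < dim; i++) sum[i] += vec[i];
    count++;
  });
  return sum.map((x) => x / Math.max(count, 1));
}

function l2Normalize(v) {
  const norm = Math.sqrt(v.reduce((acc, x) => acc + x * x, 0));
  return norm === 0 ? v : v.map((x) => x / norm);
}

// Toy example: three tokens, the last one is padding and is ignored.
const tokens = [
  [1, 2],
  [3, 2],
  [100, 100],
];
const mask = [1, 1, 0];
const pooled = meanPool(tokens, mask); // [2, 2]
const embedding = l2Normalize(pooled); // [0.7071..., 0.7071...]
console.log(embedding);
```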

Prerequisites

Node.js >= 16.0.0
Calibre installed on Windows
ImageMagick (optional, for enhanced image processing)
Tesseract OCR (optional, for text extraction from images and scanned PDFs)
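
A quick way to confirm these prerequisites is a small Node.js script like the hypothetical one below; it is not part of the repository. It compares the running Node.js version against 16.0.0 and checks whether the `tesseract` and `magick` (ImageMagick 7) executables can be launched from PATH. The function names are illustrative.

```javascript
// Hypothetical prerequisite check; this script is not part of the repository.
const { spawnSync } = require("node:child_process");

// Compares dotted version strings such as "v18.17.0" and "16.0.0".
function meetsMinimumVersion(version, minimum) {
  const parse = (v) => v.replace(/^v/, "").split(".").map((n) => parseInt(n, 10) || 0);
  const actual = parse(version);
  const required = parse(minimum);
  for (let i = 0; i < Math.max(actual.length, required.length); i++) {
    const a = actual[i] || 0;
    const r = required[i] || 0;
    if (a !== r) return a > r;
  }
  return true; // versions are equal
}

// True if `command` can be launched from PATH and exits with status 0.
function commandAvailable(command, args) {
  const result = spawnSync(command, args, { encoding: "utf8" });
  return !result.error && result.status === 0;
}

console.log(`Node.js ${process.version} meets >= 16.0.0:`, meetsMinimumVersion(process.version, "16.0.0"));
console.log("Tesseract on PATH:", commandAvailable("tesseract", ["--version"]));
// ImageMagick 7 installs a single `magick` executable.
console.log("ImageMagick on PATH:", commandAvailable("magick", ["-version"]));
```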

Installation

Clone this repository:

git clone https://github.com/yourusername/calibre-rag-mcp-nodejs.git
cd calibre-rag-mcp-nodejs

Install dependencies:

npm install

Run setup (Windows):

setup.bat

Configuration

The server automatically detects your Calibre library location. To customize the configuration, edit the settings in server.js (see CONFIG.md for configuration documentation).
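
The detection logic itself is not described here, so the following is only a plausible sketch. A Calibre library folder contains a `metadata.db` SQLite database, so a detector can test candidate folders for that file. The candidate list, the `CALIBRE_LIBRARY_PATH` environment variable, and the function names are hypothetical, and `Calibre Library` in the user's home folder is assumed to be the usual default on Windows.

```javascript
// Hypothetical sketch of Calibre library detection; the candidate list and the
// CALIBRE_LIBRARY_PATH variable are illustrative assumptions, not the server's settings.
const fs = require("node:fs");
const os = require("node:os");
const path = require("node:path");

// A Calibre library directory contains a metadata.db SQLite database.
function isCalibreLibrary(dir) {
  return fs.existsSync(path.join(dir, "metadata.db"));
}

// Returns the first candidate that looks like a Calibre library, or null.
function findCalibreLibrary(candidates) {
  return candidates.find((dir) => dir && isCalibreLibrary(dir)) || null;
}

const candidates = [
  process.env.CALIBRE_LIBRARY_PATH, // hypothetical override
  path.join(os.homedir(), "Calibre Library"), // assumed default location
];
console.log("Calibre library:", findCalibreLibrary(candidates) ?? "not found");
```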

Usage

Starting the Server

npm start

Available Tools

search: Semantic search across your ebook library
fetch: Retrieve specific content from books
list_projects: List all RAG projects
create_project: Create a new RAG project
add_books_to_project: Add books to a project for vectorization
search_project_context: Search within specific projects
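
An MCP client invokes these tools with JSON-RPC 2.0 messages. The sketch below builds a `tools/list` request and a `tools/call` request for the `search` tool, as they would be written to the server's stdin over the stdio transport, where each message is one line of JSON. It omits the `initialize` handshake that a real session performs first. The `query` argument name is a hypothetical placeholder; the actual parameter names are given by the input schemas that `tools/list` returns.

```javascript
// Sketch: the wire format an MCP client uses to call one of the tools above.
// The argument name `query` is a hypothetical example, not the documented schema.
function buildToolCall(id, toolName, args) {
  const message = {
    jsonrpc: "2.0",
    id,
    method: "tools/call",
    params: { name: toolName, arguments: args },
  };
  // stdio messages are newline-delimited, so each one is serialized on a single line.
  return JSON.stringify(message) + "\n";
}

const listRequest = JSON.stringify({ jsonrpc: "2.0", id: 1, method: "tools/list" }) + "\n";
const searchRequest = buildToolCall(2, "search", { query: "stoic philosophy" });
process.stdout.write(listRequest + searchRequest);
```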

Example MCP Configuration

Add the following entry to your MCP client configuration, replacing path/to/ with the location of your clone:

{
  "mcpServers": {
    "calibre-rag": {
      "command": "node",
      "args": ["path/to/calibre-rag-mcp-nodejs/server.js"]
    }
  }
}
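
Windows paths contain backslashes, which must be escaped in JSON (C:\ becomes C:\\), so it can be easier to generate this entry than to type it by hand. The sketch below assumes it is run from the root of the cloned repository; `buildMcpConfig` is a hypothetical helper, not part of the project.

```javascript
// Sketch: print a config entry with an absolute path to server.js.
// JSON.stringify escapes Windows backslashes automatically.
const path = require("node:path");

function buildMcpConfig(repoDir) {
  return {
    mcpServers: {
      "calibre-rag": {
        command: "node",
        args: [path.join(repoDir, "server.js")],
      },
    },
  };
}

const config = buildMcpConfig(path.resolve("."));
console.log(JSON.stringify(config, null, 2));
```

Paste the printed object into your client's configuration file, merging it with any servers already listed there.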

Project Structure

calibre-rag-mcp-nodejs/
├── server.js              # Main MCP server
├── package.json           # Dependencies and scripts
├── setup.bat              # Windows setup script
├── test-*.js              # Various test files
├── projects/              # RAG projects storage
├── CONFIG.md              # Configuration documentation
├── USAGE_EXAMPLES.md      # Usage examples
└── QUICK_TEST.md          # Quick testing guide

Testing

Run the test suite:

npm test

Individual test files:

test-enhanced-server.js - Enhanced server functionality
test-ocr-full.js - OCR capabilities
test-pdf-approaches.js - PDF processing
test-enhanced-auto.js - Automated testing

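Documentation

CONFIG.md - Configuration documentation
USAGE_EXAMPLES.md - Usage examples
QUICK_TEST.md - Quick testing guide
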
Requirements

System Requirements

Windows 10/11
Node.js 16+
Calibre installed
At least 4GB RAM (8GB+ recommended for large libraries)
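
Memory use grows mainly with the number of text chunks that are vectorized. As a rough estimate, a flat float32 index stores chunks × dimensions × 4 bytes. The sketch below assumes a 384-dimensional embedding model such as all-MiniLM-L6-v2, which may not be the model this server uses. It also ignores the model itself, the extracted text, and runtime overhead, so real usage is higher.

```javascript
// Back-of-the-envelope RAM estimate for raw float32 vectors in a flat
// (uncompressed) index: chunks × dimensions × bytes per value.
// Excludes the embedding model, extracted text, and index/runtime overhead.
function flatIndexBytes(numChunks, dimensions, bytesPerValue = 4) {
  return numChunks * dimensions * bytesPerValue;
}

// Assumes a 384-dimensional model (e.g. all-MiniLM-L6-v2); other models differ.
for (const chunks of [100000, 1000000]) {
  const mb = flatIndexBytes(chunks, 384) / 1e6;
  console.log(`${chunks} chunks: ${mb.toFixed(1)} MB`);
}
// 100000 chunks: 153.6 MB
// 1000000 chunks: 1536.0 MB
```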

Optional Dependencies

ImageMagick (for enhanced image processing)
Tesseract OCR (for text extraction from scanned documents)

Troubleshooting

Common Issues

FAISS Installation: If FAISS fails to install, make sure the native build tools required by its Node.js bindings are installed
Tesseract Not Found: Install Tesseract and add its installation directory to your PATH
Memory Issues: Reduce batch sizes when processing large documents
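
The batch-size advice can be pictured with a small sketch. Items are handed to a handler a fixed number at a time, and each batch finishes before the next one starts, so a smaller `batchSize` lowers peak memory at the cost of throughput. `processInBatches` and the handler are hypothetical names, not settings of this server.

```javascript
// Hypothetical sketch: feed items to `handleBatch` in fixed-size batches, awaiting
// each batch before starting the next, so at most `batchSize` items are being
// processed at any moment. Returns the number of batches handled.
async function processInBatches(items, batchSize, handleBatch) {
  if (!Number.isInteger(batchSize) || batchSize < 1) {
    throw new RangeError("batchSize must be a positive integer");
  }
  let batches = 0;
  for (let start = 0; start < items.length; start += batchSize) {
    await handleBatch(items.slice(start, start + batchSize));
    batches++;
  }
  return batches;
}

// Usage: a stand-in handler that just records each batch's size.
const seen = [];
const done = processInBatches(["a", "b", "c", "d", "e"], 2, async (batch) => {
  seen.push(batch.length);
});
done.then((count) => console.log(count, seen)); // 3 [ 2, 2, 1 ]
```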