RooCode-RAG-Lookup

Enables semantic search across documents and code repositories using RAG (Retrieval-Augmented Generation) with vector embeddings. Automatically indexes PDF documents and performs relevance-scored lookups through ChromaDB and sentence transformers.

RooCode MCP Server for performing RAG (Retrieval-Augmented Generation) lookups in documents and code repositories using vector embeddings and semantic search.

Example Usage

Ask a question, e.g. "What is the maximum number of entries* in a Word document?", and add "use rag" to the prompt. The LLM is usually a decent judge of when a tool is needed and may decide to call it on its own.

*This refers to the maximum number of XML properties and elements addressable in Word.

Features

Full RAG Implementation: Complete vector-based semantic search using ChromaDB and Haystack
Document Indexing: Automatic text extraction and chunking from PDF documents
Vector Embeddings: Sentence transformer embeddings for semantic similarity
RAG Lookup Tool: Search through documents and code repositories with relevance scoring
Test Tool: Simple hello world tool to verify MCP server connectivity
Async MCP Protocol: Full JSON-RPC 2.0 support via stdio

Installation

Install Python dependencies:

pip install -r requirements.txt

Configure RooCode to use this MCP server by adding the configuration from mcp_config.json to your RooCode settings.

Configuration

Add the contents of mcp_config.json to your RooCode MCP server settings, under the global settings option of the MCP tools menu. Once the tool is ready to use, it shows a green status.
Set the following environment variables:
- RAG_LOOKUP_PATH: Path to this project directory
- PYTHON_PATH: Path to your Python executable
Configure parameters in parameters.py:
- EMBEDDING_MODEL: Sentence transformer model (default: all-mpnet-base-v2)
- COLLECTION_NAME: ChromaDB collection name
- CHUNK_SIZE: Text chunk size in words (default: 500)
- CHUNK_OVERLAP: Overlap between chunks (default: 50)
- DEFAULT_TOP_K: Number of results to return (default: 5)
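
As a rough illustration of how these settings interact, the sketch below mirrors the documented defaults and splits text into overlapping word windows. It is not the project's actual parameters.py, and it does not reproduce Haystack's DocumentSplitter. The COLLECTION_NAME value and the chunk_words helper are hypothetical, and the overlap is assumed to be counted in words, like the chunk size.

```python
# Hypothetical stand-in for parameters.py. Only the defaults listed above
# come from the README; COLLECTION_NAME is a placeholder value.
EMBEDDING_MODEL = "all-mpnet-base-v2"
COLLECTION_NAME = "example_collection"  # assumption, not the real name
CHUNK_SIZE = 500      # words per chunk
CHUNK_OVERLAP = 50    # words shared by consecutive chunks (assumed unit)
DEFAULT_TOP_K = 5


def chunk_words(text, size=CHUNK_SIZE, overlap=CHUNK_OVERLAP):
    """Split text into windows of `size` words that overlap by `overlap` words."""
    if not 0 <= overlap < size:
        raise ValueError("overlap must be non-negative and smaller than size")
    words = text.split()
    step = size - overlap  # how far each new window advances
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + size]))
        if start + size >= len(words):
            break  # this window already reached the end of the text
    return chunks
```

With the defaults, a 1,000-word document yields three chunks starting at words 0, 450, and 900, so each chunk repeats the last 50 words of the previous one.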

Available Tools

1. `rag_lookup`

Perform semantic search using RAG in documents and code repositories. Returns relevant chunks with similarity scores and metadata.

Parameters:

query (required): The search query
source (optional): Where to search - "documents", "repos", or "both" (default: "both")

Returns:

Relevant text chunks with similarity scores
Source file information and metadata
Statistics on documents searched

Example:

{
  "query": "authentication implementation",
  "source": "both"
}
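
A client may want to check arguments before sending them. The helper below is a minimal sketch of that idea based only on the parameter list above; build_rag_lookup_args is a hypothetical name, and the server's own validation may differ.

```python
VALID_SOURCES = ("documents", "repos", "both")


def build_rag_lookup_args(query, source="both"):
    """Return a rag_lookup arguments dict, applying the documented default."""
    if not isinstance(query, str) or not query.strip():
        raise ValueError("query is required and must be a non-empty string")
    if source not in VALID_SOURCES:
        raise ValueError(f"source must be one of {VALID_SOURCES}")
    return {"query": query, "source": source}
```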

Response Format:

{
  "status": "success",
  "query": "authentication implementation",
  "results": [
    {
      "content": "...",
      "score": 0.85,
      "metadata": {
        "file_name": "document.txt",
        "source_file": "/path/to/document.txt"
      }
    }
  ],
  "metadata": {
    "documents_searched": 5,
    "repos_searched": 3,
    "total_matches": 5
  }
}
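
To show how a caller might consume this response, here is a small sketch that parses a payload of the shape above and lists results by descending score. The sample values, the summarize helper, and its min_score filter are all illustrative assumptions. It also assumes a higher score means a closer match.

```python
import json

# Sample payload shaped like the response format above; values are made up.
raw_response = json.dumps({
    "status": "success",
    "query": "authentication implementation",
    "results": [
        {"content": "chunk A", "score": 0.62,
         "metadata": {"file_name": "a.txt", "source_file": "/path/to/a.txt"}},
        {"content": "chunk B", "score": 0.85,
         "metadata": {"file_name": "b.txt", "source_file": "/path/to/b.txt"}},
    ],
    "metadata": {"documents_searched": 5, "repos_searched": 3, "total_matches": 2},
})


def summarize(raw, min_score=0.0):
    """Return (score, file_name, content) tuples, highest score first."""
    response = json.loads(raw)
    if response.get("status") != "success":
        return []
    rows = [
        (item["score"], item["metadata"].get("file_name", "?"), item["content"])
        for item in response.get("results", [])
        if item["score"] >= min_score
    ]
    return sorted(rows, key=lambda row: row[0], reverse=True)


for score, file_name, content in summarize(raw_response):
    print(f"{score:.2f}  {file_name}  {content}")
```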

2. `say_hello`

Simple test tool that returns a greeting message with timestamp.

Parameters:

name (optional): Name to include in greeting (default: "World")

Example:

{
  "name": "RooCode"
}
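
The tool's behavior can be pictured with a sketch like the following. The exact greeting text and timestamp format of the real tool are not documented here, so both are assumptions.

```python
from datetime import datetime, timezone


def say_hello(name="World"):
    """Return a greeting with an ISO 8601 UTC timestamp (format is assumed)."""
    timestamp = datetime.now(timezone.utc).isoformat()
    return f"Hello, {name}! ({timestamp})"
```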

Usage

1. Extract and Index Documents

Place PDF documents in the Documents/ or Repos/ folders, then run:

# Extract text from PDFs
python extraction/parse_pdf.py

# Populate the vector database
python extraction/populate_database.py

2. Query the RAG System

# Test RAG lookup directly
python query_rag.py

Or ask

3. Use via MCP Server

Once configured in RooCode, use the rag_lookup tool through the MCP interface. RooCode's settings include an MCP menu; editing the global settings there opens a JSON file containing {"mcpServers":{}}. Copy and paste the contents of mcp_config.json into that file.
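
For orientation, the sketch below generates an entry in the shape MCP clients commonly expect under "mcpServers" (command, args, env). The shipped mcp_config.json is the authoritative version and may differ, for example by launching run_rag_lookup.bat instead. The server name "rag-lookup", the build_mcp_config helper, and the example paths are hypothetical.

```python
import json


def build_mcp_config(project_dir, python_exe, server_name="rag-lookup"):
    """Build an illustrative {"mcpServers": {...}} entry as a dict."""
    return {
        "mcpServers": {
            server_name: {  # hypothetical server name
                "command": python_exe,
                "args": [f"{project_dir}/mcp_tool.py"],
                "env": {
                    # Environment variables named in the Configuration section
                    "RAG_LOOKUP_PATH": project_dir,
                    "PYTHON_PATH": python_exe,
                },
            }
        }
    }


config = build_mcp_config("/path/to/RooCode-RAG-Lookup", "/usr/bin/python3")
print(json.dumps(config, indent=2))
```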

Testing

Test the MCP server locally:

# Using MCP inspector
npx @modelcontextprotocol/inspector python mcp_tool.py

# Direct stdio test
echo '{"jsonrpc":"2.0","id":1,"method":"tools/list"}' | python mcp_tool.py
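
The same messages can be built programmatically. The sketch below constructs the tools/list request shown above and a tools/call request for rag_lookup, each as one JSON line suitable for writing to the server's stdin. The jsonrpc_request helper is hypothetical. Note that MCP servers generally expect an initialize handshake before other requests, so a bare request may be rejected; the MCP inspector handles that handshake for you.

```python
import json


def jsonrpc_request(method, params=None, request_id=1):
    """Build a JSON-RPC 2.0 request as one newline-terminated line."""
    message = {"jsonrpc": "2.0", "id": request_id, "method": method}
    if params is not None:
        message["params"] = params
    return json.dumps(message) + "\n"


# Same request as the echo command above
list_tools = jsonrpc_request("tools/list")

# Calling a tool: the tool name and its arguments go under "params"
call_lookup = jsonrpc_request(
    "tools/call",
    {
        "name": "rag_lookup",
        "arguments": {"query": "authentication implementation", "source": "both"},
    },
    request_id=2,
)
```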

Project Structure

RooCode-RAG-Lookup/
├── mcp_tool.py                    # Main MCP server implementation
├── query_rag.py                   # RAG query functions
├── parameters.py                  # Configuration parameters
├── run_rag_lookup.bat             # Windows batch launcher
├── mcp_config.json                # Example RooCode configuration
├── requirements.txt               # Python dependencies
├── extraction/
│   ├── parse_pdf.py              # PDF text extraction
│   └── populate_database.py      # Database population and indexing
├── ExtractedText/                 # Extracted text files (.txt + .meta.json)
├── chroma_db/                     # ChromaDB vector database
└── README.md                      # This file

Technology Stack

MCP Python SDK: Protocol implementation for RooCode integration
Haystack: Document processing and RAG pipeline framework
ChromaDB: Vector database for embeddings storage
Sentence Transformers: Semantic embeddings (all-mpnet-base-v2)
PDFPlumber: PDF text extraction with layout preservation
Async/Await: Concurrent request handling
JSON-RPC 2.0: Communication protocol
Stdio Transport: RooCode integration

How It Works

Document Extraction: PDFs are parsed using parse_pdf.py which extracts text and metadata
Text Chunking: Documents are split into overlapping chunks using DocumentSplitter
Embedding Generation: Text chunks are converted to 768-dimensional vectors using sentence transformers
Vector Storage: Embeddings are stored in ChromaDB with metadata for retrieval
Semantic Search: Queries are embedded and matched against stored vectors using cosine similarity
Result Ranking: Top-K most relevant chunks are returned with scores and metadata
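
The last two steps can be sketched in plain Python. This is a toy illustration of cosine-similarity ranking using 3-dimensional vectors, not the project's code: in the real pipeline the vectors come from the sentence transformer model and the search is performed by ChromaDB. The cosine_similarity and top_k helpers are hypothetical names.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_k(query_vec, stored, k=5):
    """Rank (chunk_text, vector) pairs against the query; return k best (score, text)."""
    scored = [(cosine_similarity(query_vec, vec), text) for text, vec in stored]
    return sorted(scored, key=lambda pair: pair[0], reverse=True)[:k]


# Toy data standing in for embedded chunks
stored_chunks = [
    ("chunk about login", [1.0, 0.0, 0.0]),
    ("chunk about tables", [0.0, 1.0, 0.0]),
    ("chunk about login forms", [1.0, 1.0, 0.0]),
]
best = top_k([1.0, 0.0, 0.0], stored_chunks, k=2)
```

Here the query vector points in the same direction as the first chunk, so that chunk ranks first, followed by the chunk whose vector is partly aligned with it.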