SOLVRO MCP - Knowledge Graph RAG System

[MCP Server Diagram]

A production-ready Model Context Protocol (MCP) server implementing a Retrieval-Augmented Generation (RAG) system with Neo4j graph database backend. The system intelligently converts natural language queries into Cypher queries, retrieves relevant information from a knowledge graph, and provides contextual answers about Wroclaw University of Science and Technology.

Architecture Overview

The system consists of three main components:

  1. MCP Server - FastMCP-based server exposing knowledge graph tools
  2. MCP Client - CLI interface for querying the knowledge graph
  3. Data Pipeline - Multi-threaded ETL pipeline for loading documents into Neo4j

Data Flow

User Query → MCP Client → MCP Server → RAG System → Neo4j Graph DB
                                    ↓
                            Langfuse Observability
                                    ↓
                            LLM Processing → Response

Key Technologies

  • FastMCP: Model Context Protocol server implementation
  • LangChain: LLM orchestration and chaining
  • LangGraph: State machine for RAG pipeline
  • Neo4j: Graph database for knowledge storage
  • Langfuse: Observability and tracing
  • OpenAI/DeepSeek: LLM providers for query generation and answering

Features

Intelligent Query Routing

  • Guardrails System: Automatically determines if queries are relevant to the knowledge base
  • Fallback Mechanism: Returns graceful responses for out-of-scope queries
  • Session Tracking: Full trace tracking across the entire query lifecycle

Advanced RAG Pipeline

  • Multi-Stage Processing: Guardrails → Cypher Generation → Retrieval → Response
  • State Machine Architecture: Built with LangGraph for predictable execution flow (see the sketch after this list)
  • Error Handling: Robust error recovery with fallback strategies
  • Caching: Schema caching for improved performance
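
A minimal sketch of this flow in LangGraph is shown below. The state fields and node bodies are illustrative stand-ins, not the project's actual definitions in state.py and rag.py:

from typing import Any, TypedDict

from langgraph.graph import END, START, StateGraph


class GraphState(TypedDict):
    # Illustrative fields; the real definitions live in state.py
    question: str
    guardrail_decision: str
    cypher_query: str
    context: list[Any]
    answer: str


def guardrails(state: GraphState) -> dict:
    # Decide whether the question is in scope for the knowledge base
    decision = "knowledge_graph" if "university" in state["question"].lower() else "end"
    return {"guardrail_decision": decision}


def generate_cypher(state: GraphState) -> dict:
    # In the real pipeline an LLM writes this query from the graph schema
    return {"cypher_query": "MATCH (n) RETURN n LIMIT 10"}


def retrieve(state: GraphState) -> dict:
    # In the real pipeline this executes the Cypher query against Neo4j
    return {"context": []}


def respond(state: GraphState) -> dict:
    return {"answer": "W bazie danych nie ma informacji"}


builder = StateGraph(GraphState)
builder.add_node("guardrails", guardrails)
builder.add_node("generate_cypher", generate_cypher)
builder.add_node("retrieve", retrieve)
builder.add_node("respond", respond)
builder.add_edge(START, "guardrails")
builder.add_conditional_edges(
    "guardrails",
    lambda s: s["guardrail_decision"],
    {"knowledge_graph": "generate_cypher", "end": END},
)
builder.add_edge("generate_cypher", "retrieve")
builder.add_edge("retrieve", "respond")
builder.add_edge("respond", END)

graph = builder.compile()
result = graph.invoke({"question": "What scholarships does the university offer?"})
print(result["answer"])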

Dual LLM Strategy

  • Fast LLM (gpt-5-nano): Quick decision-making for guardrails
  • Accurate LLM (gpt-5-mini): Precise Cypher query generation

Observability

  • Langfuse Integration: Complete trace and session tracking
  • Mermaid Visualization: Graph flow visualization for debugging
  • Structured Logging: Comprehensive logging throughout the pipeline

Data Pipeline

  • Multi-threaded Processing: Configurable thread pool for parallel document processing
  • PDF Support: Extract and process PDF documents
  • Dynamic Graph Schema: Configurable nodes and relationships via JSON
  • Database Management: Built-in database clearing and initialization

Prerequisites

  • Python 3.12 or higher
  • Neo4j database instance (local or cloud)
  • OpenAI API key or DeepSeek API key
  • Langfuse account (for observability)
  • uv package manager (recommended) or pip

Installation

1. Clone the Repository

git clone <repository-url>
cd SOLVRO_MCP

2. Install Dependencies

Using uv (recommended):

uv sync

Using pip:

pip install -e .

3. Set Up Neo4j

Option A: Docker Compose

cd db
docker-compose up -d

Option B: Neo4j Desktop or Aura

Follow Neo4j's installation guide for your platform.

4. Configure Environment Variables

Create a .env file in the project root:

# LLM Provider (choose one)
OPENAI_API_KEY=your_openai_api_key
# OR
DEEPSEEK_API_KEY=your_deepseek_api_key

# CLARIN Polish LLM (optional, for Polish language support)
CLARIN_API_KEY=your_clarin_api_key

# Neo4j Configuration
NEO4J_URI=bolt://localhost:7687
NEO4J_USER=neo4j
NEO4J_PASSWORD=your_neo4j_password

# Langfuse Observability
LANGFUSE_SECRET_KEY=your_langfuse_secret_key
LANGFUSE_PUBLIC_KEY=your_langfuse_public_key
LANGFUSE_HOST=https://cloud.langfuse.com

Configuration

Graph Schema Configuration

Edit graph_config.json to define your knowledge graph structure:

{
  "nodes": [
    {
      "name": "Student",
      "properties": ["name", "id", "year"]
    },
    {
      "name": "Course",
      "properties": ["title", "code", "credits"]
    }
  ],
  "relationships": [
    {
      "type": "ENROLLED_IN",
      "source": "Student",
      "target": "Course",
      "properties": ["semester", "grade"]
    }
  ]
}
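
A sketch of how a schema file like this can be consumed (the helper below is hypothetical; the actual loading logic lives in the data pipeline):

import json


def load_graph_schema(path: str = "graph_config.json") -> tuple[list[str], list[str]]:
    """Return the allowed node labels and relationship types from the schema file."""
    with open(path, encoding="utf-8") as f:
        config = json.load(f)
    node_labels = [node["name"] for node in config["nodes"]]
    rel_types = [rel["type"] for rel in config["relationships"]]
    return node_labels, rel_types


labels, rel_types = load_graph_schema()
print(labels)     # ['Student', 'Course']
print(rel_types)  # ['ENROLLED_IN']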

Usage

Running the MCP Server

Start the FastMCP server on port 8005:

just mcp-server
# OR
uv run server

The server will initialize the RAG system and expose the knowledge_graph_tool.
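
For orientation, a stripped-down sketch of such an entry point with FastMCP (the tool body is a stub; the transport and port follow the description above, the rest is illustrative):

from fastmcp import FastMCP

mcp = FastMCP("SOLVRO MCP")


@mcp.tool
async def knowledge_graph_tool(user_input: str, trace_id: str | None = None) -> str:
    """Query the knowledge graph with natural language."""
    # The real implementation delegates to the RAG system (see API Reference)
    return "W bazie danych nie ma informacji"


if __name__ == "__main__":
    mcp.run(transport="http", port=8005)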

Querying via CLI Client

Query the knowledge graph using natural language:

just mcp-client "Czym jest nagroda dziekana?"
# OR
uv run kg "What is the dean's award?"

Example queries:

# Polish queries
uv run kg "Jakie są wymagania dla stypendium rektora?"
uv run kg "Kiedy są terminy egzaminów?"

# English queries
uv run kg "What are the scholarship requirements?"
uv run kg "When are the exam dates?"

Data Pipeline

Load Documents into Neo4j

just pipeline
# OR
uv run pipeline

Clear Database and Reload

just pipeline-clear
# OR
uv run pipeline --clear-db

Manual Pipeline Execution

uv run python src/scripts/data_pipeline/main.py \
  data/ \
  graph_config.json \
  4 \
  --clear-db

Parameters:

  • data/ - Input directory containing PDF files
  • graph_config.json - Graph schema configuration
  • 4 - Number of parallel threads
  • --clear-db - (Optional) Clear database before loading
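
These arguments map onto a straightforward CLI definition; a hypothetical sketch of the parsing in main.py:

import argparse


def parse_args() -> argparse.Namespace:
    parser = argparse.ArgumentParser(description="Load documents into Neo4j")
    parser.add_argument("input_dir", help="Directory containing PDF files")
    parser.add_argument("config", help="Path to the graph schema JSON")
    parser.add_argument("threads", type=int, help="Number of parallel threads")
    parser.add_argument(
        "--clear-db", action="store_true", help="Clear the database before loading"
    )
    return parser.parse_args()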

Project Structure

SOLVRO_MCP/
├── src/
│   ├── mcp_server/          # MCP server implementation
│   │   ├── server.py        # FastMCP server entry point
│   │   └── tools/
│   │       └── knowledge_graph/
│   │           ├── rag.py           # RAG system core logic
│   │           ├── state.py         # LangGraph state definitions
│   │           └── graph_visualizer.py  # Mermaid visualization
│   │
│   ├── mcp_client/          # CLI client
│   │   └── client.py        # Client implementation with Langfuse integration
│   │
│   └── scripts/
│       └── data_pipeline/   # ETL pipeline
│           ├── main.py      # Pipeline orchestrator
│           ├── data_pipe.py # Data processing logic
│           ├── llm_pipe.py  # LLM-based entity extraction
│           └── pdf_loader.py # PDF document loader
│
├── db/
│   └── docker-compose.yaml  # Neo4j container configuration
│
├── data/                    # Input documents directory
├── graph_config.json        # Graph schema definition
├── pyproject.toml          # Project dependencies and metadata
├── justfile                # Task runner configuration
└── README.md               # This file

Development

Code Quality

The project uses Ruff for linting and formatting:

just lint
# OR
uv run ruff format src
uv run ruff check src

Configuration

Ruff Settings (in pyproject.toml):

  • Line length: 100 characters
  • Target: Python 3.13
  • Selected rules: E, F, I, N, W

Adding New Tools

To add a new MCP tool:

  1. Create a new function in src/mcp_server/server.py
  2. Decorate with @mcp.tool
  3. Add documentation and type hints
  4. Update the README

Example:

@mcp.tool
async def new_tool(param: str) -> str:
    """
    Tool description.

    Args:
        param: Parameter description

    Returns:
        Result description
    """
    result = f"Processed: {param}"  # Replace with the actual implementation
    return result

API Reference

MCP Server

knowledge_graph_tool

Query the knowledge graph with natural language.

Parameters:

  • user_input (str): User's question or query
  • trace_id (str, optional): Trace ID for observability

Returns:

  • str: JSON string containing retrieved context, or the fallback "W bazie danych nie ma informacji" ("There is no information in the database")

Example:

result = await client.call_tool(
    "knowledge_graph_tool",
    {
        "user_input": "What are the scholarship requirements?",
        "trace_id": "unique-trace-id"
    }
)

RAG System

RAG.ainvoke()

Async method to query the RAG system.

Parameters:

  • message (str): User's question
  • session_id (str): Session identifier (default: "default")
  • trace_id (str): Trace identifier (default: "default")
  • callback_handler (CallbackHandler): Langfuse callback handler

Returns:

  • Dict[str, Any]: Dictionary containing:
    • answer (str): JSON context or "W bazie danych nie ma informacji"
    • metadata (dict):
      • guardrail_decision (str): Routing decision
      • cypher_query (str): Generated Cypher query
      • context (list): Retrieved data from Neo4j
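
Example (a sketch assuming the signature above; the import path and bare constructor are assumptions based on the project layout):

import asyncio

# Import path mirrors the Project Structure section; adjust to your setup
from mcp_server.tools.knowledge_graph.rag import RAG


async def main() -> None:
    rag = RAG()  # constructor arguments omitted; see rag.py
    result = await rag.ainvoke(
        message="What are the scholarship requirements?",
        session_id="demo-session",
        trace_id="demo-trace",
    )
    print(result["answer"])
    print(result["metadata"]["cypher_query"])


asyncio.run(main())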

Data Pipeline

DataPipe.load_data_from_directory()

Load and process PDF documents from a directory.

Parameters:

  • directory (str): Path to directory containing PDF files

Returns:

  • None (processes documents in-place)
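
Example (a sketch; the import path and constructor are assumptions, and the real constructor may require schema and threading configuration not shown here):

# See data_pipe.py for the actual signature
from scripts.data_pipeline.data_pipe import DataPipe

pipe = DataPipe()
pipe.load_data_from_directory("data/")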

Observability

Langfuse Integration

The system provides comprehensive observability through Langfuse (a wiring sketch follows the list below):

  1. Session Tracking: All queries within a session are grouped together
  2. Trace Hierarchy: Multi-level traces showing:
    • Guardrails decision
    • Cypher generation
    • Neo4j retrieval
    • Final answer generation
  3. Metadata Tagging: Traces tagged with component identifiers
  4. Performance Metrics: Latency and token usage tracking
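
A sketch of wiring a query into Langfuse, assuming the Langfuse v2 LangChain callback API (v3 moved the handler to langfuse.langchain) and the RAG.ainvoke signature from the API reference:

from langfuse.callback import CallbackHandler  # Langfuse v2 import path


async def traced_query(rag, question: str, session_id: str, trace_id: str) -> dict:
    # The handler reads LANGFUSE_SECRET_KEY, LANGFUSE_PUBLIC_KEY and
    # LANGFUSE_HOST from the environment configured earlier
    handler = CallbackHandler(session_id=session_id)
    return await rag.ainvoke(
        message=question,
        session_id=session_id,
        trace_id=trace_id,
        callback_handler=handler,
    )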

Viewing Traces

  1. Log in to your Langfuse dashboard
  2. Navigate to "Sessions" to see grouped queries
  3. Click on individual traces for detailed execution flow
  4. Use filters to search by tags: mcp_client, knowledge_graph, guardrails, etc.

Graph Visualization

The RAG system includes Mermaid graph visualization:

print(rag.visualizer.draw_mermaid())

This outputs a Mermaid diagram showing the state machine flow.

Error Handling

Common Issues

1. Connection Refused (Neo4j)

Error: Could not connect to Neo4j at bolt://localhost:7687

Solution: Ensure Neo4j is running:

cd db && docker-compose up -d

2. API Key Issues

Error: Missing required environment variables

Solution: Check your .env file contains all required keys.

3. Import Errors

ImportError: cannot import name 'langfuse_context'

Solution: This import is not available in standard Langfuse. Use session tracking through function parameters.

Performance Tuning

Thread Configuration

Adjust parallel processing threads in the pipeline:

uv run python src/scripts/data_pipeline/main.py data/ graph_config.json 8

Recommended thread counts:

  • CPU-bound: Number of CPU cores
  • I/O-bound: 2-4x CPU cores
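
Document loading is largely I/O-bound, which is why a thread pool pays off; a sketch of the pattern (function names are illustrative, not the pipeline's actual API):

from concurrent.futures import ThreadPoolExecutor, as_completed
from pathlib import Path


def process_pdf(path: Path) -> str:
    # Stand-in for PDF extraction plus LLM-based entity extraction
    return f"processed {path.name}"


def run_pipeline(input_dir: str, threads: int = 4) -> None:
    pdf_files = list(Path(input_dir).glob("*.pdf"))
    with ThreadPoolExecutor(max_workers=threads) as executor:
        futures = {executor.submit(process_pdf, p): p for p in pdf_files}
        for future in as_completed(futures):
            print(future.result())


run_pipeline("data/", threads=8)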

Neo4j Optimization

  1. Indexing: Create indexes on frequently queried properties (see the example after this list)
  2. LIMIT Clauses: Pipeline automatically adds LIMIT to queries
  3. Connection Pooling: FastMCP handles connection management
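
For example, an index on a frequently filtered property can be created once with the official Neo4j Python driver (the label and property follow the sample schema above; credentials come from your .env):

from neo4j import GraphDatabase

driver = GraphDatabase.driver("bolt://localhost:7687", auth=("neo4j", "your_neo4j_password"))

with driver.session() as session:
    # Speeds up lookups such as MATCH (c:Course {code: $code})
    session.run("CREATE INDEX course_code IF NOT EXISTS FOR (c:Course) ON (c.code)")

driver.close()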

LLM Configuration

Adjust model parameters in rag.py:

self.fast_llm = BaseChatOpenAI(
    model="gpt-5-nano",
    temperature=0.1,  # Lower = more deterministic
)

AI Coding Guidelines

For AI coding assistants and developers, see .github/agents.md for detailed coding guidelines and patterns.

Contributing

  1. Fork the repository
  2. Create a feature branch: git checkout -b feature/new-feature
  3. Make changes and ensure tests pass
  4. Run linting: just lint
  5. Commit changes: git commit -m "Add new feature"
  6. Push to branch: git push origin feature/new-feature
  7. Open a Pull Request

License

[Add your license here]

Acknowledgments

  • Built for Wroclaw University of Science and Technology
  • Powered by FastMCP, LangChain, and Neo4j
  • Observability by Langfuse

Support

For issues and questions:

  • Open an issue on GitHub
  • Contact the development team
  • Check the documentation at [link]
