MCP-based Knowledge Graph Construction System
A fully automated knowledge graph construction system built on the Model Context Protocol (MCP), implementing a sophisticated 3-stage data processing pipeline for intelligent knowledge extraction and graph generation.
Overview
This project implements an advanced knowledge graph construction system that automatically processes raw text data through three intelligent stages:
- Data Quality Assessment - Evaluates completeness, consistency, and relevance
- Knowledge Completion - Enhances low-quality data using LLM and external knowledge bases
- Knowledge Graph Construction - Builds structured knowledge graphs with confidence scoring
The system is built on the MCP (Model Context Protocol) architecture, providing a clean client-server interface for seamless integration and scalability.
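Conceptually, Stage 2 only runs when the Stage 1 score falls below a quality threshold. The following sketch shows that control flow; the stage functions are trivial placeholders for illustration, not the project's actual module API.

```python
# Control-flow sketch of the 3-stage pipeline. The stage functions below are
# placeholders for illustration only, not the project's real implementations.

def assess_quality(text: str) -> float:
    """Stage 1 placeholder: pretend longer text is more complete."""
    return min(len(text) / 50.0, 1.0)

def complete_knowledge(text: str) -> str:
    """Stage 2 placeholder: a real implementation would consult an LLM."""
    return text + " (enriched with background facts)"

def build_graph(text: str) -> dict:
    """Stage 3 placeholder: a real implementation extracts and scores triples."""
    return {"text": text, "triples": [], "visualization_file": "knowledge_graph.html"}

def process_text(text: str, quality_threshold: float = 0.5) -> dict:
    score = assess_quality(text)       # Stage 1: composite quality score in [0, 1]
    if score < quality_threshold:      # Stage 2: enhance only low-quality input
        text = complete_knowledge(text)
    return build_graph(text)           # Stage 3: triples, confidence, visualization
```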
Key Features
Fully Automated Processing
- Zero Manual Intervention: Automatically detects data quality and processing needs
- Intelligent Pipeline: Adapts processing strategy based on input data characteristics
- Real-time Processing: Immediate knowledge graph generation from raw text
3-Stage Processing Pipeline
Stage 1: Data Quality Assessment
- Completeness Analysis: Evaluates entity and relationship coverage
- Consistency Checking: Detects semantic conflicts and contradictions
- Relevance Scoring: Assesses information relevance and meaningfulness
- Quality Threshold: Automatically determines if data needs enhancement
Stage 2: Knowledge Completion (for low-quality data)
- Entity Enhancement: Completes missing entity information
- Relationship Inference: Adds missing relationships between entities
- Conflict Resolution: Corrects semantic inconsistencies
- Format Normalization: Standardizes data format and structure
- Implicit Knowledge Inference: Extracts hidden knowledge patterns
Stage 3: Knowledge Graph Construction
- Rule-based Extraction: Fast, deterministic triple generation
- LLM-enhanced Processing: Advanced semantic understanding and relationship inference
- Confidence Scoring: Assigns reliability scores to extracted knowledge
- Interactive Visualization: Generates beautiful HTML visualizations
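To make the rule-based path concrete, here is a toy extractor that turns simple "X是Y" / "X位于Y" sentences into triples with fixed confidence values; the patterns and scores are illustrative only, not the actual rules in kg_utils.py.

```python
import re

# Toy extraction rules; the real rule set in kg_utils.py is more extensive.
PATTERNS = [
    (re.compile(r"([^,。,]+?)是([^,。,]+?)。"), "是", 0.9),
    (re.compile(r"([^,。,]+?)位于([^,。,]+?)。"), "位于", 0.9),
]

def extract_triples(text: str) -> list[tuple[str, str, str, float]]:
    """Return (subject, predicate, object, confidence) tuples for matched patterns."""
    triples = []
    for pattern, predicate, confidence in PATTERNS:
        for subject, obj in pattern.findall(text):
            triples.append((subject, predicate, obj, confidence))
    return triples

print(extract_triples("北京大学是高等教育机构。巴黎位于法国。"))
# [('北京大学', '是', '高等教育机构', 0.9), ('巴黎', '位于', '法国', 0.9)]
```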
MCP Architecture
- Client-Server Design: Clean separation of concerns
- Standardized Protocol: Built on MCP for interoperability
- Tool-based Interface: Modular, extensible tool system
- Async Processing: High-performance asynchronous operations
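For reference, the MCP Python SDK lets a server expose a tool in just a few lines. The sketch below is a hypothetical, stripped-down server in the same spirit; the actual kg_server.py wires the full pipeline behind the tool and may be structured differently.

```python
from mcp.server.fastmcp import FastMCP

# Hypothetical minimal server; the real kg_server.py implements the full pipeline.
mcp = FastMCP("kg_server")

@mcp.tool()
async def build_knowledge_graph(text: str, output_file: str = "knowledge_graph.html") -> dict:
    """Run the 3-stage pipeline on `text` and return entities, triples, and the HTML path."""
    # ... Stage 1 quality assessment, optional Stage 2 completion, Stage 3 construction ...
    return {"success": True, "entities": [], "triples": [], "visualization_file": output_file}

if __name__ == "__main__":
    mcp.run()  # serves over stdio by default, matching `uv run kg_server.py`
```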
Requirements
- Python: 3.11 or higher
- UV Package Manager: For dependency management
- OpenAI-compatible API: For LLM integration (DeepSeek, OpenAI, etc.)
Quick Start
1. Clone and Setup
git clone https://github.com/turambar928/MCP_based_KG_construction.git
cd MCP_based_KG_construction
# Install dependencies
uv sync
2. Environment Configuration
Create a .env file with your API configuration:
OPENAI_API_KEY=your_api_key_here
OPENAI_BASE_URL=https://api.siliconflow.cn/v1 # or your preferred endpoint
OPENAI_MODEL=Qwen/QwQ-32B # or your preferred model
Supported API Providers:
- OpenAI: https://api.openai.com/v1
- DeepSeek: https://api.deepseek.com
- SiliconFlow: https://api.siliconflow.cn/v1
- Any OpenAI-compatible endpoint
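To sanity-check the credentials outside the server, a short script using python-dotenv and the openai client (both assumed to be available in the project environment) can send a single test request:

```python
import os

from dotenv import load_dotenv  # python-dotenv
from openai import OpenAI       # works with any OpenAI-compatible endpoint

load_dotenv()  # picks up OPENAI_API_KEY / OPENAI_BASE_URL / OPENAI_MODEL from .env

client = OpenAI(
    api_key=os.environ["OPENAI_API_KEY"],
    base_url=os.environ.get("OPENAI_BASE_URL", "https://api.openai.com/v1"),
)
response = client.chat.completions.create(
    model=os.environ.get("OPENAI_MODEL", "Qwen/QwQ-32B"),
    messages=[{"role": "user", "content": "ping"}],
)
print(response.choices[0].message.content)
```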
3. Start the MCP Server
uv run kg_server.py
The server will start and listen for MCP client connections.
4. Running Tests
There are three ways to test the system:
a. Using MCP Inspector
npx -y @modelcontextprotocol/inspector uv run kg_server.py
After running this command, click the link that appears after "MCP Inspector is up and running at" to open the MCP Inspector in your browser. Once opened:
- Click "Connect"
- Select "Tools" from the top menu
- Choose "build_knowledge_graph" from the list tools
- Enter your text in the left panel to generate the knowledge graph

b. Using Client Code
uv run kg_client.py
After the connection is successful, enter your text to view the results.

c. Using Mainstream MCP Tools (Cursor, Cherry Studio, etc.)
Example: Running in Cherry Studio
In settings, select MCP servers, click "Add Server" (import from JSON). Here's the configuration JSON (make sure to modify the local path):
{
  "mcpServers": {
    "kg_server": {
      "command": "uv",
      "args": [
        "--directory",
        "D:/mcp_getting_started",
        "run",
        "kg_server.py"
      ],
      "env": {},
      "disabled": false,
      "autoApprove": []
    }
  }
}
After enabling this MCP server, you can use it in Cherry Studio.

🛠️ Usage Guide
Interactive Client Commands
Once the client is running, you can use these commands:
# Build knowledge graph from text
build <your_text_here>
# Example usage
build 北京大学是中国著名的高等教育机构,位于北京市海淀区
# Run demonstration examples
demo
# Exit the client
quit
Programmatic Usage
import asyncio

from kg_client import KnowledgeGraphClient

async def main():
    client = KnowledgeGraphClient()
    await client.connect_to_server()

    # Build a knowledge graph and write the interactive HTML visualization
    result = await client.build_knowledge_graph(
        "苹果公司的CEO是蒂姆·库克",
        output_file="my_graph.html"
    )
    print(f"Generated graph: {result}")

    await client.cleanup()

asyncio.run(main())
Example Outputs
High-Quality Input
Input: "北京大学是中国著名的高等教育机构,位于北京市海淀区。"
Processing: Direct Stage 3 (high quality detected)
Output:
- Entities: [北京大学, 中国, 高等教育机构, 北京市, 海淀区]
- Triples: [(北京大学, 是, 高等教育机构), (北京大学, 位于, 海淀区), ...]
- Visualization: Interactive HTML graph
Low-Quality Input (Incomplete)
Input: "李华去巴黎"
Processing:
- Stage 1: Detects incomplete information
- Stage 2: Enhances with "巴黎位于法国", "李华是人"
- Stage 3: Builds enhanced knowledge graph
Output: Enriched knowledge graph with inferred relationships
Low-Quality Input (Conflicting)
Input: "巴黎市是德国城市。"
Processing:
- Stage 1: Detects semantic conflict
- Stage 2: Corrects to "巴黎是法国城市"
- Stage 3: Builds corrected knowledge graph
Output: Corrected and enhanced knowledge graph
MCP Tools API
The system exposes the following MCP tools for integration:
build_knowledge_graph
Description: Complete pipeline for knowledge graph construction with automatic quality assessment and enhancement.
Parameters:
- text (string): Input text to process
- output_file (string, optional): HTML visualization output filename (default: "knowledge_graph.html")
Returns: JSON object containing:
- success (boolean): Processing success status
- entities (array): Extracted entities
- triples (array): Generated knowledge triples
- confidence_scores (array): Confidence scores for each triple
- visualization_file (string): Path to generated HTML visualization
- processing_stages (object): Details of each processing stage
Example:
{
  "success": true,
  "entities": ["北京大学", "中国", "高等教育机构"],
  "triples": [
    {
      "subject": "北京大学",
      "predicate": "是",
      "object": "高等教育机构",
      "confidence": 0.95
    }
  ],
  "visualization_file": "knowledge_graph.html"
}
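If you integrate from your own MCP client instead of the bundled kg_client.py, a call through the official MCP Python SDK might look like the sketch below (import paths follow the `mcp` package; adjust them to your client framework):

```python
import asyncio

from mcp import ClientSession, StdioServerParameters
from mcp.client.stdio import stdio_client

async def main() -> None:
    # Launch kg_server.py over stdio and invoke the tool directly.
    server = StdioServerParameters(command="uv", args=["run", "kg_server.py"])
    async with stdio_client(server) as (read, write):
        async with ClientSession(read, write) as session:
            await session.initialize()
            result = await session.call_tool(
                "build_knowledge_graph",
                {"text": "北京大学是中国著名的高等教育机构", "output_file": "pku.html"},
            )
            print(result.content)  # tool output as returned by the server

asyncio.run(main())
```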
Project Structure
├── kg_server.py # Main MCP server implementation
├── kg_client.py # Interactive client for testing
├── kg_utils.py # Core knowledge graph construction utilities
├── kg_visualizer.py # HTML visualization generator
├── data_quality.py # Stage 1: Data quality assessment
├── knowledge_completion.py # Stage 2: Knowledge completion and enhancement
├── pyproject.toml # Project dependencies and configuration
├── .env # Environment variables (API keys)
└── README.md # This file
Core Components
- kg_server.py: MCP server that orchestrates the 3-stage pipeline
- kg_client.py: Command-line client for interactive testing and batch processing
- kg_utils.py: Knowledge graph construction engine with rule-based and LLM-enhanced extraction
- kg_visualizer.py: Generates interactive HTML visualizations using Plotly
- data_quality.py: Implements quality assessment algorithms for completeness, consistency, and relevance
- knowledge_completion.py: Handles knowledge enhancement and conflict resolution
Advanced Features
Quality Assessment Metrics
- Completeness Score: Based on entity coverage and relationship density
- Consistency Score: Detects semantic conflicts and contradictions
- Relevance Score: Evaluates information meaningfulness
- Composite Quality Score: Weighted combination of all metrics
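As an illustration only, a weighted combination could look like the snippet below; the actual weights and threshold are defined in data_quality.py and may differ.

```python
# Hypothetical weights for the composite quality score; the real values live
# in data_quality.py and may differ.
WEIGHTS = {"completeness": 0.4, "consistency": 0.4, "relevance": 0.2}

def composite_quality(scores: dict[str, float]) -> float:
    """Weighted combination of per-dimension scores, each in [0, 1]."""
    return sum(WEIGHTS[name] * scores[name] for name in WEIGHTS)

scores = {"completeness": 0.3, "consistency": 0.9, "relevance": 0.8}
print(round(composite_quality(scores), 2))  # 0.64 -> above a 0.5 threshold, no enhancement
```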
Knowledge Enhancement Strategies
- Entity Completion: Adds missing entity attributes and types
- Relationship Inference: Discovers implicit relationships
- Conflict Resolution: Corrects factual inconsistencies
- Format Normalization: Standardizes entity and relationship representations
Visualization Features
- Interactive Network Graph: Clickable nodes and edges
- Entity Clustering: Groups related entities by type
- Confidence Visualization: Color-coded confidence levels
- Export Options: HTML, PNG, SVG formats
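A minimal NetworkX + Plotly rendering of a few triples, in the same spirit as kg_visualizer.py (which additionally handles clustering, confidence coloring, and export options), could look like this sketch:

```python
import networkx as nx
import plotly.graph_objects as go

# Sketch only: render a couple of triples as an interactive HTML network.
triples = [("北京大学", "是", "高等教育机构"), ("北京大学", "位于", "海淀区")]

graph = nx.DiGraph()
for subject, predicate, obj in triples:
    graph.add_edge(subject, obj, label=predicate)

pos = nx.spring_layout(graph, seed=42)  # 2-D layout coordinates per node

edge_x, edge_y = [], []
for src, dst in graph.edges():
    edge_x += [pos[src][0], pos[dst][0], None]   # None breaks the line between edges
    edge_y += [pos[src][1], pos[dst][1], None]

fig = go.Figure(data=[
    go.Scatter(x=edge_x, y=edge_y, mode="lines", name="relations"),
    go.Scatter(
        x=[pos[n][0] for n in graph.nodes()],
        y=[pos[n][1] for n in graph.nodes()],
        mode="markers+text",
        text=list(graph.nodes()),
        textposition="top center",
        name="entities",
    ),
])
fig.write_html("knowledge_graph.html")  # self-contained interactive HTML file
```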
Technical Details
Processing Pipeline
- Input Validation: Checks text format and encoding
- Quality Assessment: Multi-dimensional quality scoring
- Conditional Enhancement: Applies enhancement only when needed
- Graph Construction: Rule-based + LLM hybrid approach
- Confidence Calculation: Bayesian confidence scoring
- Visualization Generation: Interactive HTML output
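The exact confidence formula lives in kg_utils.py; as a rough illustration of the Bayesian idea, a single evidence source (for example, LLM agreement with a rule-extracted triple) can update a prior like this:

```python
# Illustrative Bayesian update; the real scoring in kg_utils.py may differ.
def update_confidence(prior: float, likelihood_ratio: float) -> float:
    """Posterior P(triple correct) after one evidence source.

    likelihood_ratio = P(evidence | correct) / P(evidence | incorrect)
    """
    prior_odds = prior / (1.0 - prior)
    posterior_odds = prior_odds * likelihood_ratio
    return posterior_odds / (1.0 + posterior_odds)

confidence = 0.6                                   # prior from rule-based extraction
confidence = update_confidence(confidence, 4.0)    # LLM agrees with the triple
print(round(confidence, 2))                        # 0.86
```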
Performance Characteristics
- Processing Speed: ~1-3 seconds per text input
- Memory Usage: ~50-100MB for typical workloads
- Scalability: Async architecture supports concurrent processing
- Accuracy: 85-95% entity extraction, 80-90% relationship accuracy
Development
Running Tests
Refer to the "Running Tests" section above for three different testing methods:
- MCP Inspector (recommended for visual testing)
- Client code (for programmatic testing)
- Mainstream MCP tools (for integration testing)
# Quick test with demonstration examples
uv run kg_client.py
# Then type: demo
# Test with custom input
uv run kg_client.py "Your test text here"
Adding New Features
- Custom Quality Metrics: Extend data_quality.py (see the sketch below)
- New Enhancement Strategies: Modify knowledge_completion.py
- Additional Visualization: Enhance kg_visualizer.py
- New MCP Tools: Add tools to kg_server.py
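For example, a custom metric could sit alongside the existing checks in data_quality.py; the function below is a hypothetical sketch, and the module's real scoring interface may expect a different signature.

```python
# Hypothetical custom metric for data_quality.py; the signature is illustrative.
def named_entity_density(text: str, entities: list[str]) -> float:
    """Fraction of the text covered by recognized entity mentions, clipped to [0, 1]."""
    if not text:
        return 0.0
    covered = sum(len(entity) for entity in entities if entity in text)
    return min(covered / len(text), 1.0)

print(round(named_entity_density("北京大学位于海淀区", ["北京大学", "海淀区"]), 2))  # 0.78
```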
Configuration Options
Environment variables in .env:
# Required
OPENAI_API_KEY=your_api_key
OPENAI_BASE_URL=your_api_endpoint
OPENAI_MODEL=your_model_name
# Optional
QUALITY_THRESHOLD=0.5 # Quality threshold for enhancement
MAX_ENTITIES=50 # Maximum entities per graph
VISUALIZATION_WIDTH=1200 # HTML visualization width
VISUALIZATION_HEIGHT=800 # HTML visualization height
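These optional values are plain environment variables; assuming they are read with os.getenv (a sketch, not the project's actual configuration loader), defaults matching the values above would look like:

```python
import os

# Sketch of reading the optional settings with the defaults listed above.
QUALITY_THRESHOLD = float(os.getenv("QUALITY_THRESHOLD", "0.5"))
MAX_ENTITIES = int(os.getenv("MAX_ENTITIES", "50"))
VISUALIZATION_WIDTH = int(os.getenv("VISUALIZATION_WIDTH", "1200"))
VISUALIZATION_HEIGHT = int(os.getenv("VISUALIZATION_HEIGHT", "800"))
```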
Contributing
- Fork the repository
- Create a feature branch: git checkout -b feature-name
- Make your changes and test thoroughly
- Submit a pull request with a detailed description
Troubleshooting
Common Issues
- Port Occupation Error
  # Find the process using the port
  netstat -ano | findstr :6277
  # Kill the process
  taskkill /PID <process_id> /F
- API Balance Insufficient
  - Check the API configuration in the .env file
  - Ensure the API account has sufficient balance
- Dependency Installation Issues
  uv sync --reinstall
License
This project is licensed under the MIT License - see the LICENSE file for details.
Acknowledgments
- Built on the Model Context Protocol (MCP)
- Visualization powered by Plotly
- Graph algorithms using NetworkX
- LLM integration via OpenAI API
Support
For questions, issues, or contributions:
- 📧 Email: tzf9282003@163.com
- 🐛 Issues: GitHub Issues
- 📖 Documentation: See KNOWLEDGE_GRAPH_README.md for detailed technical documentation