knowledge-mcp-server
A high-performance MCP server that aggregates Wikipedia, arXiv, Context7, and DevDocs into a unified interface for AI agents.
Knowledge MCP Server
A high-performance Model Context Protocol (MCP) server that aggregates multiple knowledge sources into a unified interface for AI agents and applications.
🎯 Features
- Multi-Source Knowledge Aggregation — Unified access to 4 powerful knowledge sources
- Production-Ready — Built-in rate limiting, caching, and structured logging
- High Performance — In-memory caching reduces API calls by up to 80%
- API Compliant — Respects rate limits for all external APIs
- Configurable — Environment-based configuration for all features
- Type-Safe — Full TypeScript implementation with strict typing
📚 Knowledge Sources
| Source | Description | Rate Limit | Cache TTL |
|---|---|---|---|
| Context7 | Up-to-date library & framework documentation | 60 req/hour (free tier) | Configurable |
| Wikipedia | General knowledge & encyclopedic content | 70 req/s | 15 minutes |
| arXiv | Academic papers & research publications | 1 req/3s | 10-15 minutes |
| DevDocs | Developer documentation for popular libraries | 5 req/s | 15-30 minutes |
⚠️ DevDocs Disclaimer: The DevDocs integration uses an unofficial API and is not affiliated with or endorsed by DevDocs.io. It may break if the site's structure changes. We'll endeavor to update it promptly, but use in production at your own discretion.
📦 Installation
Prerequisites
- Node.js >= 18.0.0
- npm or yarn
- npx (usually comes with Node.js)
Quick Start
# Clone the repository
git clone https://github.com/Maouv/knowledge-mcp-server.git
cd knowledge-mcp-server
# Install dependencies
npm install
# Build the project
npm run build
# Start the server
npm start
The server will start at http://localhost:3000/mcp by default.
⚙️ Configuration
Create a .env file in the project root (or copy from .env.example):
cp .env.example .env
Core Configuration
# Server Configuration
PORT=3000 # Server port (default: 3000)
TRANSPORT=http # Transport mode: 'http' or 'stdio'
# User-Agent (Required by some APIs)
USER_AGENT=knowledge-mcp-server/2.0.0 (your-contact@example.com)
# Logging
LOG_LEVEL=info # Levels: error, warn, info, debug
# Rate Limiting
RATE_LIMIT_ENABLED=true # Enable/disable rate limiting
RATE_LIMIT_WIKIPEDIA=70 # Requests per second
RATE_LIMIT_ARXIV=0.33 # 1 request per 3 seconds
RATE_LIMIT_CONTEXT7=60 # Requests per hour
RATE_LIMIT_DEVDOCS=5 # Requests per second
# Caching
CACHE_ENABLED=true # Enable/disable caching
CACHE_TTL=600 # Default cache TTL in seconds
Environment Variables Reference
| Variable | Type | Default | Description |
|---|---|---|---|
| PORT | number | 3000 | HTTP server port |
| TRANSPORT | string | http | Transport mode (http or stdio) |
| USER_AGENT | string | (required) | User agent for API requests |
| LOG_LEVEL | string | info | Logging verbosity level |
| RATE_LIMIT_ENABLED | boolean | true | Enable rate limiting |
| RATE_LIMIT_WIKIPEDIA | number | 70 | Wikipedia requests per second |
| RATE_LIMIT_ARXIV | number | 0.33 | arXiv requests per second |
| RATE_LIMIT_CONTEXT7 | number | 60 | Context7 requests per hour |
| RATE_LIMIT_DEVDOCS | number | 5 | DevDocs requests per second |
| CACHE_ENABLED | boolean | true | Enable response caching |
| CACHE_TTL | number | 600 | Default cache TTL (seconds) |
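As a rough illustration of how the variables above could be read with their documented defaults, here is a minimal sketch. The `ServerConfig` shape and the `loadConfig`, `num`, and `bool` helpers are hypothetical names chosen for this example; the project's actual `constants.ts` may be organized differently.

```typescript
// Hypothetical config shape; field names are illustrative only.
interface ServerConfig {
  port: number;
  transport: "http" | "stdio";
  userAgent: string;
  logLevel: "error" | "warn" | "info" | "debug";
  rateLimitEnabled: boolean;
  rateLimits: { wikipedia: number; arxiv: number; context7: number; devdocs: number };
  cacheEnabled: boolean;
  cacheTtlSeconds: number;
}

type Env = Record<string, string | undefined>;

// Parse a numeric variable, falling back to the default when unset or not a number.
function num(env: Env, key: string, fallback: number): number {
  const raw = env[key];
  if (raw === undefined || raw.trim() === "") return fallback;
  const parsed = Number(raw);
  return Number.isFinite(parsed) ? parsed : fallback;
}

// Assumption: any value other than the string "false" counts as enabled.
function bool(env: Env, key: string, fallback: boolean): boolean {
  const raw = env[key];
  if (raw === undefined) return fallback;
  return raw.trim().toLowerCase() !== "false";
}

function loadConfig(env: Env): ServerConfig {
  const userAgent = env["USER_AGENT"];
  if (!userAgent) throw new Error("USER_AGENT is required");

  const levels = ["error", "warn", "info", "debug"] as const;
  const logLevel = levels.find((l) => l === env["LOG_LEVEL"]) ?? "info";

  return {
    port: num(env, "PORT", 3000),
    transport: env["TRANSPORT"] === "stdio" ? "stdio" : "http",
    userAgent,
    logLevel,
    rateLimitEnabled: bool(env, "RATE_LIMIT_ENABLED", true),
    rateLimits: {
      wikipedia: num(env, "RATE_LIMIT_WIKIPEDIA", 70), // requests per second
      arxiv: num(env, "RATE_LIMIT_ARXIV", 0.33),       // requests per second
      context7: num(env, "RATE_LIMIT_CONTEXT7", 60),   // requests per hour
      devdocs: num(env, "RATE_LIMIT_DEVDOCS", 5),      // requests per second
    },
    cacheEnabled: bool(env, "CACHE_ENABLED", true),
    cacheTtlSeconds: num(env, "CACHE_TTL", 600),
  };
}
```

Taking the environment as a parameter (rather than reading `process.env` directly) keeps the function easy to test; in a Node.js process it would typically be called as `loadConfig(process.env)`.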
🚀 Usage
HTTP Mode (Recommended)
Start the server:
npm start
# or with custom port
PORT=8080 npm start
Point your MCP client at the server:
{
  "mcpServers": {
    "knowledge": {
      "url": "http://localhost:3000/mcp"
    }
  }
}
Stdio Mode
For local development or subprocess usage:
TRANSPORT=stdio npm start
Client configuration:
{
  "mcpServers": {
    "knowledge": {
      "command": "node",
      "args": ["/path/to/knowledge-mcp-server/dist/index.js"],
      "env": { "TRANSPORT": "stdio" }
    }
  }
}
🔧 Available Tools
Wikipedia Tools
wikipedia_summary
Get a summary of a Wikipedia article by title.
Arguments:
- title (string): Article title (e.g., "JavaScript", "Machine learning")
Example:
{
  "title": "React (software)"
}
wikipedia_search
Search Wikipedia articles.
Arguments:
- query (string): Search query
- limit (number, optional): Number of results (1-20, default: 5)
wikipedia_related
Get related articles linked from a Wikipedia article.
Arguments:
- title (string): Article title
arXiv Tools
arxiv_search
Search academic papers on arXiv.
Arguments:
- query (string): Search query
- limit (number, optional): Number of results (1-20, default: 5)
- category (string, optional): arXiv category filter (e.g., "cs.AI", "cs.LG")
Example:
{
  "query": "large language models",
  "category": "cs.CL",
  "limit": 10
}
arxiv_get_paper
Get full details of a specific paper.
Arguments:
- paperId (string): arXiv paper ID or URL
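Because `paperId` accepts either a bare ID or a URL, some normalization step has to happen before the arXiv API is called. The sketch below shows one way this could work; `normalizeArxivId` is a hypothetical helper written for this example, and the server's actual parsing rules are not documented here.

```typescript
// Hypothetical helper: reduce an arXiv URL or "arXiv:" prefixed string to a bare ID.
// Returns null when the input does not look like an arXiv identifier.
function normalizeArxivId(input: string): string | null {
  const candidate = input
    .trim()
    // Drop an arxiv.org abstract or PDF URL prefix.
    .replace(/^https?:\/\/(www\.)?arxiv\.org\/(abs|pdf)\//i, "")
    // Drop a trailing ".pdf" from PDF links.
    .replace(/\.pdf$/i, "")
    // Drop an "arXiv:" citation prefix.
    .replace(/^arxiv:/i, "");

  // New-style IDs: YYMM.NNNN or YYMM.NNNNN, with an optional version suffix.
  const newStyle = /^\d{4}\.\d{4,5}(v\d+)?$/;
  // Old-style IDs: archive (optionally with subject class) slash seven digits.
  const oldStyle = /^[a-z-]+(\.[A-Z]{2})?\/\d{7}(v\d+)?$/i;

  return newStyle.test(candidate) || oldStyle.test(candidate) ? candidate : null;
}
```

With this sketch, `https://arxiv.org/abs/2301.00001v2` and `2301.00001v2` resolve to the same ID, while free-form text is rejected with `null` instead of being sent to the API.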
Context7 Tools
context7_resolve_library
Resolve a library name to its Context7-compatible ID.
Arguments:
- libraryName (string): Library name (e.g., "react", "nextjs")
context7_get_docs
Fetch up-to-date documentation for a library.
Arguments:
- libraryId (string): Context7 library ID (from context7_resolve_library)
- topic (string, optional): Specific topic to focus on
- tokens (number, optional): Max tokens to return (1000-10000, default: 5000)
DevDocs Tools
devdocs_list
List all available documentation sets.
devdocs_search
Search within a documentation set.
Arguments:
- slug (string): Documentation slug (e.g., "react", "node")
- query (string): Search term
- limit (number, optional): Max results (1-30, default: 10)
devdocs_get_page
Fetch full page content.
Arguments:
- slug (string): Documentation slug
- path (string): Page path from search results
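Several tools take an optional numeric argument with a documented range and default (for example `limit` 1-20 with default 5, or `tokens` 1000-10000 with default 5000). The README does not say whether out-of-range values are rejected or adjusted; the sketch below assumes they are clamped into range, using a hypothetical `clampToRange` helper.

```typescript
// Hypothetical helper: apply a default and clamp an optional numeric argument.
// Assumption: out-of-range values are clamped rather than rejected.
function clampToRange(
  value: number | undefined,
  min: number,
  max: number,
  fallback: number,
): number {
  if (value === undefined || !Number.isFinite(value)) return fallback;
  // Truncate fractional input, then clamp into [min, max].
  return Math.min(max, Math.max(min, Math.trunc(value)));
}

// Documented ranges from the tool descriptions above.
const wikipediaLimit = clampToRange(undefined, 1, 20, 5);   // default applies
const devdocsLimit = clampToRange(50, 1, 30, 10);           // capped at the maximum
const context7Tokens = clampToRange(500, 1000, 10000, 5000); // raised to the minimum
```

A server that prefers strict validation could instead return a tool error for out-of-range input; clamping is simply the more forgiving choice for agent callers.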
📊 Example Workflows
Finding React Documentation
# 1. List available docs
devdocs_list()
# 2. Search for hooks
devdocs_search(slug: "react", query: "useState")
# 3. Get the full page
devdocs_get_page(slug: "react", path: "hooks/use-state")
Researching Machine Learning Papers
# 1. Search arXiv
arxiv_search(query: "transformer architectures", category: "cs.LG", limit: 5)
# 2. Get specific paper details
arxiv_get_paper(paperId: "2301.00001")
Learning About a Concept
# 1. Get Wikipedia overview
wikipedia_summary(title: "Artificial neural network")
# 2. Find related concepts
wikipedia_related(title: "Artificial neural network")
# 3. Search for research papers
arxiv_search(query: "neural networks")
🏗️ Architecture
┌─────────────────────────────────────────────┐
│ Knowledge MCP Server │
├─────────────────────────────────────────────┤
│ ┌──────────┐ ┌──────────┐ ┌──────────┐ │
│ │ Wikipedia│ │ arXiv │ │ Context7 │ │
│ │ Service │ │ Service │ │ Service │ │
│ └────┬─────┘ └────┬─────┘ └────┬─────┘ │
│ │ │ │ │
│ ┌────▼─────────────▼──────────────▼─────┐│
│ │ Rate Limiter (Bottleneck) ││
│ └────────────────────┬──────────────────┘│
│ │ │
│ ┌────────────────────▼──────────────────┐│
│ │ Cache Layer (node-cache) ││
│ └────────────────────┬──────────────────┘│
│ │ │
│ ┌────────────────────▼──────────────────┐│
│ │ Logging (Winston) ││
│ └───────────────────────────────────────┘│
└─────────────────────────────────────────────┘
🛡️ Rate Limiting
This server implements respectful rate limiting for all external APIs to ensure compliance and reliability:
Wikipedia
- Official Limit: ~200 requests/second
- Server Default: 70 requests/second (safe buffer)
- Reasoning: Conservative approach to avoid throttling
arXiv
- Official Limit: 1 request per 3 seconds
- Server Default: Strict 1 request per 3 seconds
- Reasoning: arXiv has very strict limits; violations may result in IP bans
Context7
- Free Tier Limit: 60 requests/hour
- Server Default: 60 requests/hour
- Paid Plans: Configurable via the RATE_LIMIT_CONTEXT7 env var
- Reasoning: Matches Upstash free tier quota
DevDocs
- Official Limit: Not officially documented (unofficial API)
- Server Default: 5 requests/second
- Reasoning: Conservative to avoid service disruption
Behavior
When rate limits are reached:
- Requests are automatically queued
- No errors are thrown
- Logs warn when approaching limits
- Requests execute when capacity is available
To disable rate limiting (not recommended for production):
RATE_LIMIT_ENABLED=false
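The queueing behavior described above can be sketched without any dependencies. The server itself uses Bottleneck according to the architecture diagram; the `QueueingLimiter` class and `minIntervalMs` helper below are hypothetical stand-ins that only illustrate the idea of spacing requests out instead of rejecting them.

```typescript
// Convert a requests-per-second rate into the minimum gap between requests, in ms.
function minIntervalMs(requestsPerSecond: number): number {
  if (!(requestsPerSecond > 0)) throw new RangeError("rate must be positive");
  return Math.ceil(1000 / requestsPerSecond);
}

// Minimal sketch: tasks run one at a time, each start separated by at least intervalMs.
class QueueingLimiter {
  private tail: Promise<void> = Promise.resolve();
  private nextSlot = 0;
  private readonly intervalMs: number;

  constructor(intervalMs: number) {
    this.intervalMs = intervalMs;
  }

  schedule<T>(task: () => Promise<T>): Promise<T> {
    const run = this.tail.then(async () => {
      // Wait until the next permitted start time instead of throwing.
      const wait = Math.max(0, this.nextSlot - Date.now());
      if (wait > 0) {
        await new Promise<void>((resolve) => setTimeout(resolve, wait));
      }
      this.nextSlot = Date.now() + this.intervalMs;
      return task();
    });
    // Keep the queue moving even if a task rejects.
    this.tail = run.then(() => undefined, () => undefined);
    return run;
  }
}

// RATE_LIMIT_ARXIV=0.33 corresponds to roughly one request every 3 seconds.
const arxivLimiter = new QueueingLimiter(minIntervalMs(0.33));
```

Callers would wrap each outbound request, for example `arxivLimiter.schedule(() => fetchPaper(id))` where `fetchPaper` is whatever function performs the HTTP call. Unlike Bottleneck, this sketch does not support concurrency above one or per-hour reservoirs.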
🗄️ Caching Strategy
The server uses in-memory caching to reduce API calls and improve response times:
| Source | Cache TTL | Reasoning |
|---|---|---|
| Wikipedia Summaries | 15 minutes | Content rarely changes rapidly |
| arXiv Search | 10 minutes | Papers don't change after publication |
| arXiv Papers | 15 minutes | Static content |
| DevDocs List | 1 hour | Documentation index rarely changes |
| DevDocs Search | 15 minutes | Reasonable balance |
| DevDocs Pages | 30 minutes | Documentation rarely updates frequently |
| Context7 | Configurable | Depends on use case |
Cache statistics are available programmatically:
import { getCacheStats } from './cache.js';
const stats = getCacheStats();
// { keys: 45, hits: 1234, misses: 56 }
To disable caching:
CACHE_ENABLED=false
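To make the TTL and hit/miss bookkeeping concrete, here is a minimal in-memory cache sketch. The server uses node-cache according to the architecture diagram; the `TtlCache` class below is a hypothetical, dependency-free illustration whose `stats()` output mirrors the `{ keys, hits, misses }` shape shown above.

```typescript
interface CacheStats {
  keys: number;
  hits: number;
  misses: number;
}

// Minimal TTL cache sketch; the clock is injectable so expiry can be tested.
class TtlCache<V> {
  private readonly entries = new Map<string, { value: V; expiresAt: number }>();
  private hits = 0;
  private misses = 0;
  private readonly defaultTtlSeconds: number;
  private readonly now: () => number;

  constructor(defaultTtlSeconds: number, now: () => number = () => Date.now()) {
    this.defaultTtlSeconds = defaultTtlSeconds;
    this.now = now;
  }

  get(key: string): V | undefined {
    const entry = this.entries.get(key);
    if (entry === undefined || entry.expiresAt <= this.now()) {
      // Expired entries are evicted lazily on access and counted as misses.
      if (entry !== undefined) this.entries.delete(key);
      this.misses += 1;
      return undefined;
    }
    this.hits += 1;
    return entry.value;
  }

  set(key: string, value: V, ttlSeconds: number = this.defaultTtlSeconds): void {
    this.entries.set(key, { value, expiresAt: this.now() + ttlSeconds * 1000 });
  }

  // Note: keys may include expired entries that have not been accessed yet.
  stats(): CacheStats {
    return { keys: this.entries.size, hits: this.hits, misses: this.misses };
  }
}
```

A per-source TTL such as the 15 minutes listed for Wikipedia summaries would be passed explicitly, for example `cache.set(key, summary, 15 * 60)`, while other entries fall back to the `CACHE_TTL` default.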
📝 Logging
Structured logging via Winston with multiple log levels:
# Development (verbose)
LOG_LEVEL=debug npm start
# Production (standard)
LOG_LEVEL=info npm start
# Minimal
LOG_LEVEL=error npm start
All logs are written to stderr (stdout is reserved for the MCP protocol).
Example log output:
2024-01-15 10:23:45 [info]: knowledge-mcp-server running on http://localhost:3000/mcp
2024-01-15 10:23:50 [info]: Rate limiting enabled {"wikipedia":"70 req/s","arxiv":"0.33 req/s"}
2024-01-15 10:24:01 [info]: Searching arXiv {"query":"transformers","limit":5}
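The level filtering and line format shown above can be illustrated with a small dependency-free sketch. This is not the project's Winston setup; `shouldLog`, `formatLine`, and `log` are hypothetical helpers, and the timestamp is rendered in UTC as an assumption.

```typescript
// Levels ordered from most to least severe, matching the LOG_LEVEL options.
const LEVELS = ["error", "warn", "info", "debug"] as const;
type Level = (typeof LEVELS)[number];

// A message is emitted when it is at least as severe as the configured threshold.
function shouldLog(level: Level, threshold: Level): boolean {
  return LEVELS.indexOf(level) <= LEVELS.indexOf(threshold);
}

// Render "YYYY-MM-DD HH:MM:SS [level]: message {json}" (UTC timestamp assumed).
function formatLine(
  level: Level,
  message: string,
  meta?: Record<string, unknown>,
  date: Date = new Date(),
): string {
  const timestamp = date.toISOString().replace("T", " ").slice(0, 19);
  const suffix = meta && Object.keys(meta).length > 0 ? " " + JSON.stringify(meta) : "";
  return `${timestamp} [${level}]: ${message}${suffix}`;
}

// console.error writes to stderr in Node.js, leaving stdout free for protocol traffic.
function log(threshold: Level, level: Level, message: string, meta?: Record<string, unknown>): void {
  if (shouldLog(level, threshold)) console.error(formatLine(level, message, meta));
}
```

With `LOG_LEVEL=info`, a call such as `log("info", "debug", "cache miss")` prints nothing, while `log("info", "warn", "approaching rate limit")` is written to stderr.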
🔍 Health Check
Check server health:
curl http://localhost:3000/health
Response:
{
  "status": "ok",
  "server": "knowledge-mcp-server",
  "version": "2.0.0",
  "tools": ["wikipedia", "context7", "arxiv", "devdocs"]
}
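A script or monitoring job could check the same endpoint from TypeScript. The sketch below assumes the response shape shown above and Node.js 18+ (for the built-in `fetch`); `HealthResponse`, `isHealthResponse`, and `checkHealth` are hypothetical names introduced for this example.

```typescript
// Assumed response shape, taken from the example response above.
interface HealthResponse {
  status: string;
  server: string;
  version: string;
  tools: string[];
}

// Runtime type guard so unexpected payloads are rejected rather than trusted.
function isHealthResponse(value: unknown): value is HealthResponse {
  if (typeof value !== "object" || value === null) return false;
  const v = value as Record<string, unknown>;
  const tools = v["tools"];
  return (
    typeof v["status"] === "string" &&
    typeof v["server"] === "string" &&
    typeof v["version"] === "string" &&
    Array.isArray(tools) &&
    tools.every((t: unknown) => typeof t === "string")
  );
}

// Returns true only when the server answers with a well-formed "ok" payload.
async function checkHealth(baseUrl: string): Promise<boolean> {
  const res = await fetch(`${baseUrl}/health`);
  if (!res.ok) return false;
  const body: unknown = await res.json();
  return isHealthResponse(body) && body.status === "ok";
}
```

For the default configuration this would be called as `await checkHealth("http://localhost:3000")`.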
🧪 Development
Build
npm run build
Development Mode
npm run dev
Project Structure
knowledge-mcp-server/
├── src/
│ ├── index.ts # Entry point
│ ├── constants.ts # Configuration constants
│ ├── types.ts # TypeScript type definitions
│ ├── logger.ts # Winston logger setup
│ ├── cache.ts # In-memory caching layer
│ ├── rateLimiter.ts # Rate limiting logic
│ ├── services/ # External API integrations
│ │ ├── wikipedia.ts
│ │ ├── arxiv.ts
│ │ ├── context7.ts
│ │ └── devdocs.ts
│ └── tools/ # MCP tool definitions
│ ├── wikipedia.ts
│ ├── arxiv.ts
│ ├── context7.ts
│ └── devdocs.ts
├── dist/ # Compiled JavaScript
├── package.json
├── tsconfig.json
├── .env.example
└── README.md
🤝 Contributing
Contributions are welcome! Please feel free to submit a Pull Request.
Development Setup
- Fork the repository
- Create your feature branch (git checkout -b feature/AmazingFeature)
- Commit your changes (git commit -m 'Add some AmazingFeature')
- Push to the branch (git push origin feature/AmazingFeature)
- Open a Pull Request
Code Style
- TypeScript with strict mode
- ES Modules (ESM)
- Async/await for asynchronous operations
- Meaningful variable and function names
- Comprehensive error handling
📄 License
This project is licensed under the MIT License - see the LICENSE file for details.
🙏 Acknowledgments
- Model Context Protocol by Anthropic
- Context7 by Upstash
- Wikipedia REST API
- arXiv API
- DevDocs (unofficial API)
📮 Support
- Issues: GitHub Issues
- Discussions: GitHub Discussions
Built with ❤️ for the AI community