Markdown RAG
A Retrieval Augmented Generation system that enables AI assistants to perform semantic searches and manage document indices for markdown files. It supports PostgreSQL with pgvector and integrates both Google Gemini and Ollama for intelligent embedding generation.
A Retrieval Augmented Generation (RAG) system for markdown documentation with intelligent rate limiting and MCP server integration.
Features
- Semantic Search: Vector-based similarity search using Google Gemini or Ollama embeddings
- Markdown-Aware Chunking: Intelligent document splitting that preserves semantic boundaries
- Rate Limiting: Sophisticated sliding window algorithm with token counting and batch optimization
- MCP Server: Model Context Protocol server for AI assistant integration
- PostgreSQL Vector Store: Scalable storage using pgvector extension
- Incremental Updates: Smart deduplication prevents reprocessing existing documents
- Production Ready: Type-safe configuration, comprehensive logging, and error handling
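The markdown-aware chunking mentioned above can be sketched as splitting on heading boundaries before falling back to smaller units. This is an illustrative stand-in, not the project's actual splitter; `split_markdown` and its parameters are hypothetical:

```python
import re

def split_markdown(text: str, max_chars: int = 500) -> list[str]:
    """Split markdown into chunks, breaking on heading boundaries first."""
    # Split into sections at each heading so chunks keep semantic context.
    sections = re.split(r"(?m)^(?=#{1,6} )", text)
    chunks: list[str] = []
    for section in sections:
        section = section.strip()
        if not section:
            continue
        if len(section) <= max_chars:
            chunks.append(section)
        else:
            # Fall back to paragraph boundaries for oversized sections.
            for para in section.split("\n\n"):
                if para.strip():
                    chunks.append(para.strip())
    return chunks

doc = "# Intro\nShort intro.\n\n## Usage\nHow to use it."
print(split_markdown(doc))
```

Splitting at headings first is what keeps each chunk anchored to a semantic boundary rather than an arbitrary character offset.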
Installation
git clone https://github.com/yourusername/markdown-rag.git
Prerequisites
- Python 3.11+
- PostgreSQL 12+ with pgvector extension installed
- Google Gemini API key (if using Google embeddings)
- Ollama (if using local embeddings)
- MCP-compatible client (Claude Desktop, Cline, etc.)
Quick Start
1. (Optional) Set Up PostgreSQL
createdb embeddings
If you do not create a database, the tool will create one for you. The pgvector extension will be automatically enabled when you first run the tool.
2. Ingest Documents
cd markdown-rag
# Use Google Gemini
uv run markdown-rag /path/to/docs --command ingest --engine google
# Or use Ollama
uv run markdown-rag /path/to/docs --command ingest --engine ollama
Required environment variables (create .env or export):
POSTGRES_PASSWORD=your_password
GOOGLE_API_KEY=your_gemini_api_key # Only if using Google engine
3. Configure MCP Client
Add to your MCP client configuration (e.g., claude_desktop_config.json). The client will automatically start the server.
Minimal configuration:
{
  "mcpServers": {
    "markdown-rag": {
      "command": "uv",
      "args": [
        "run",
        "--directory",
        "/absolute/path/to/markdown-rag",
        "markdown-rag",
        "/absolute/path/to/docs",
        "--command",
        "mcp"
      ],
      "env": {
        "POSTGRES_PASSWORD": "your_password",
        "GOOGLE_API_KEY": "your_api_key"
      }
    }
  }
}
Full configuration:
{
  "mcpServers": {
    "markdown-rag": {
      "command": "uv",
      "args": [
        "run",
        "--directory",
        "/absolute/path/to/markdown-rag",
        "markdown-rag",
        "/absolute/path/to/docs",
        "--command",
        "mcp"
      ],
      "env": {
        "POSTGRES_USER": "postgres_username",
        "POSTGRES_PASSWORD": "your_password",
        "DISABLED_TOOLS": "delete_document,update_document",
        "CHUNK_OVERLAP": "50",
        "GOOGLE_API_KEY": "your_api_key",
        "GOOGLE_MODEL": "models/gemini-embedding-001",
        "RATE_LIMIT_REQUESTS_PER_DAY": "1000",
        "RATE_LIMIT_REQUESTS_PER_MINUTE": "100",
        "OLLAMA_HOST": "http://localhost:11434",
        "OLLAMA_MODEL": "mxbai-embed-large"
      }
    }
  }
}
4. Query via MCP
The server exposes several tools:
query
- Semantic search over documentation
- Arguments: query (string), num_results (integer, optional, default: 4)
list_documents
- List all ingested documents
- Arguments: none
delete_document
- Remove a document from the index
- Arguments: filename (string)
update_document
- Re-ingest a specific document
- Arguments: filename (string)
refresh_index
- Scan directory and ingest new/modified files
- Arguments: none
To disable tools (e.g., in production), set DISABLED_TOOLS environment variable:
DISABLED_TOOLS=delete_document,update_document,refresh_index
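The effect of DISABLED_TOOLS can be sketched as a simple comma-separated parse of the environment variable; this is a minimal illustration, and `enabled_tools` is a hypothetical helper, not the server's actual registration code:

```python
import os

def enabled_tools(all_tools: list[str]) -> list[str]:
    """Return the tools that remain after applying DISABLED_TOOLS."""
    raw = os.environ.get("DISABLED_TOOLS", "")
    # Tolerate whitespace and empty entries in the comma-separated list.
    disabled = {name.strip() for name in raw.split(",") if name.strip()}
    return [tool for tool in all_tools if tool not in disabled]

os.environ["DISABLED_TOOLS"] = "delete_document,update_document"
tools = ["query", "list_documents", "delete_document",
         "update_document", "refresh_index"]
print(enabled_tools(tools))  # ['query', 'list_documents', 'refresh_index']
```

Disabled tools are filtered out before registration, so a client never sees them listed at all.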
Configuration
Environment Variables
| Variable | Default | Required | Description |
|---|---|---|---|
| POSTGRES_USER | postgres | No | PostgreSQL username |
| POSTGRES_PASSWORD | - | Yes | PostgreSQL password |
| POSTGRES_HOST | localhost | No | PostgreSQL host |
| POSTGRES_PORT | 5432 | No | PostgreSQL port |
| POSTGRES_DB | [engine]_embeddings | No | Database name |
| GOOGLE_API_KEY | - | Yes* | Google Gemini API key (*if using Google) |
| GOOGLE_MODEL | models/gemini-embedding-001 | No | Google embedding model |
| OLLAMA_HOST | http://localhost:11434 | No | Ollama host URL |
| OLLAMA_MODEL | mxbai-embed-large | No | Ollama embedding model |
| RATE_LIMIT_REQUESTS_PER_MINUTE | 100 | No | Max API requests per minute |
| RATE_LIMIT_REQUESTS_PER_DAY | 1000 | No | Max API requests per day |
| DISABLED_TOOLS | - | No | Comma-separated list of tools to disable |
Command Line Options
uv run markdown-rag <directory> [OPTIONS]
Arguments:
<directory>: Path to markdown files directory (required)
Options:
- -c, --command {ingest|mcp}: Operation mode (default: mcp)
  - ingest: Process and store documents
  - mcp: Start MCP server for queries
- -e, --engine {google|ollama}: Embedding engine (default: google)
- -l, --level {debug|info|warning|error}: Logging level (default: warning)
Examples:
uv run markdown-rag ./docs --command ingest --level info --engine ollama
uv run markdown-rag /var/docs -c ingest -l debug -e google
Architecture
System Components
The following diagram shows how the system components interact:
graph TD
A[MCP Client<br/>Claude, ChatGPT, etc.] --> B[FastMCP Server<br/>Tool: query]
B --> C[MarkdownRAG]
C --> D[Text Splitters]
C --> E[Rate Limited Embeddings]
E --> F[Google Gemini<br/>Embeddings API]
C --> G[PostgreSQL<br/>+ pgvector]
Rate Limiting Strategy
The system implements a dual-window sliding algorithm:
- Request Limits: Tracks requests per minute and per day
- Token Limits: Counts tokens before API calls
- Batch Optimization: Calculates maximum safe batch sizes
- Smart Waiting: Minimal delays with automatic retry
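The request-tracking side of the strategy above can be sketched as a sliding window of timestamps. This is illustrative only; the real implementation in rate_limiter.py also tracks tokens, daily windows, and batch sizes:

```python
import time
from collections import deque

class SlidingWindowLimiter:
    """Allow at most max_requests within the trailing window seconds."""

    def __init__(self, max_requests: int, window: float = 60.0) -> None:
        self.max_requests = max_requests
        self.window = window
        self.timestamps: deque[float] = deque()

    def acquire(self) -> float:
        """Record a request; return how long the caller had to wait."""
        now = time.monotonic()
        # Drop timestamps that have slid out of the window.
        while self.timestamps and now - self.timestamps[0] >= self.window:
            self.timestamps.popleft()
        wait = 0.0
        if len(self.timestamps) >= self.max_requests:
            # Wait just long enough for the oldest request to expire.
            wait = self.window - (now - self.timestamps[0])
            time.sleep(wait)
        self.timestamps.append(time.monotonic())
        return wait

limiter = SlidingWindowLimiter(max_requests=3, window=1.0)
waits = [limiter.acquire() for _ in range(4)]  # the 4th call must wait
```

Unlike a fixed-interval limiter, the sliding window never over-waits: the delay is exactly the time until the oldest in-window request ages out.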
See Architecture Documentation for detailed diagrams.
Development
Setup Development Environment
git clone https://github.com/yourusername/markdown-rag.git
cd markdown-rag
uv sync
Run Linters
uv run ruff check .
uv run mypy .
Code Style
This project follows:
- Linting: Ruff with Google docstring convention
- Type Checking: mypy with strict settings
- Line Length: 79 characters
- Import Sorting: Alphabetical with isort
Project Structure
markdown-rag/
├── src/markdown_rag/
│ ├── __init__.py
│ ├── main.py # Entry point and MCP server
│ ├── config.py # Environment and CLI configuration
│ ├── models.py # Pydantic data models
│ ├── rag.py # Core RAG logic
│ ├── embeddings.py # Rate-limited embeddings wrapper
│ └── rate_limiter.py # Rate limiting algorithm
├── docs/
│ ├── api-reference.md # API documentation
│ ├── architecture.md # Architecture documentation
│ ├── mcp-integration.md # MCP server integration guide
│ └── user-guide.md # User guide
├── pyproject.toml # Project configuration
├── .env # Environment variables (not in git)
└── README.md
Troubleshooting
Common Issues
"Failed to start store: connection refused"
PostgreSQL is not running, or the connection settings are wrong. Check the connection parameters in your environment variables.
"Rate limit exceeded"
Adjust rate limits in environment variables:
RATE_LIMIT_REQUESTS_PER_MINUTE=50
RATE_LIMIT_REQUESTS_PER_DAY=500
"pgvector extension not found"
The pgvector PostgreSQL extension is not installed. Follow the pgvector installation guide for your platform.
"Skipping all files (already in vector store)"
Expected behavior. The system prevents duplicate ingestion.
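The deduplication behind this message can be sketched as comparing content hashes against what is already stored. The helper below is hypothetical; the real check consults the vector store rather than an in-memory set:

```python
import hashlib
import tempfile
from pathlib import Path

def files_to_ingest(paths: list[Path], known: set[str]) -> list[Path]:
    """Return only the files whose content hash is not yet known."""
    new_files = []
    for path in paths:
        digest = hashlib.sha256(path.read_bytes()).hexdigest()
        if digest not in known:
            new_files.append(path)
            known.add(digest)
    return new_files

tmp = Path(tempfile.mkdtemp())
(tmp / "a.md").write_text("# A")
(tmp / "b.md").write_text("# B")
known: set[str] = set()
first = files_to_ingest(sorted(tmp.glob("*.md")), known)   # both are new
second = files_to_ingest(sorted(tmp.glob("*.md")), known)  # all skipped
```

Hashing content rather than comparing timestamps means an untouched file is skipped even after a clone or copy resets its modification time.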
Logging
uv run markdown-rag ./docs --command ingest --level debug
Security
Best Practices
- Never commit .env files; add them to .gitignore
- Use environment variables for all secrets
- Restrict database access with firewall rules
- Rotate API keys regularly
- Use read-only database users for query-only deployments
Secrets Management
All secrets use SecretStr type to prevent accidental logging:
from pydantic import SecretStr

api_key = SecretStr("secret_value")
print(api_key)  # prints ********** instead of the raw value
api_key.get_secret_value()  # the secret must be read out explicitly
Contributing
1. Fork the repository
2. Create a feature branch (git checkout -b feature/amazing-feature)
3. Make changes and add tests
4. Run linters (uv run ruff check .)
5. Run type checks (uv run mypy .)
6. Commit changes (git commit -m 'feat: add amazing feature')
7. Push to the branch (git push origin feature/amazing-feature)
8. Open a Pull Request
Commit Message Format
Follow conventional commits:
feat: add new feature
fix: resolve bug
docs: update documentation
refactor: improve code structure
test: add tests
chore: update dependencies
TODOS
- Management of embeddings store via MCP tool.
- Add support for other embeddings models.
- Add support for other vector stores.
License
This project is licensed under the MIT License.
Acknowledgments
- LangChain - RAG framework
- Google Gemini - Embedding model
- pgvector - Vector similarity search
- FastMCP - MCP server framework
Support
- Documentation: docs/architecture.md
- Issues: GitHub Issues
- Discussions: GitHub Discussions