Tabular Document Retriever MCP
Transforms CSV and Excel data into Markdown-formatted vector embeddings stored in a local ChromaDB instance for semantic search. It enables MCP clients to retrieve relevant tabular data through single-row, batch, or free-text queries.
This is a Model Context Protocol (MCP) server that transforms tabular data (CSV/Excel) into Markdown key-value pairs, embeds them into a local vector database (ChromaDB), and provides retrieval tools to contextually answer queries.
It leverages the `uv` Python package manager, the `mcp` SDK, and FastAPI to optionally expose the server over Server-Sent Events (SSE).
Vibe-Coded
The entire code base was generated by Antigravity with the help of Gemini 3.1 Pro (High and Fast) as a quick proof of concept.
Features
- Ingestion: Parses `.csv` and `.xlsx` files and upserts Markdown-formatted strings into ChromaDB.
- Retrieval Engine: Uses `sentence-transformers/all-MiniLM-L6-v2` locally for semantic search.
- MCP Server: Provides three tools exposed over an SSE endpoint: `retrieve_single`, `retrieve_batch`, and `retrieve_by_query`.
- Dockerization: Quick spin-up of the Database and the MCP Server together without exposing the raw database to the host machine.
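The core transformation described above is simple: each tabular row becomes a Markdown key-value block keyed by the column headers. A minimal sketch of what that conversion might look like (the function name `row_to_markdown` and the exact output format are illustrative assumptions, not taken from this repository):

```python
import csv
import io

def row_to_markdown(row: dict) -> str:
    """Render one tabular row as Markdown key-value pairs (hypothetical format)."""
    return "\n".join(f"- **{key}**: {value}" for key, value in row.items())

# Example: parse a tiny in-memory CSV and convert each row to a document string.
raw = "name,price\nWidget,9.99\nGadget,24.50\n"
rows = list(csv.DictReader(io.StringIO(raw)))
docs = [row_to_markdown(r) for r in rows]
print(docs[0])
# - **name**: Widget
# - **price**: 9.99
```

Each resulting string is what gets embedded and stored as a ChromaDB document.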
Prerequisites
🚀 Running the Stack
To start the server and the ChromaDB vector database locally:
```bash
docker-compose up -d --build
```
This will launch:
- ChromaDB, reachable internally at `chroma-db:8000`.
- The MCP Server, accessible externally at `http://localhost:8000/sse`.
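For orientation, a compose file implementing this layout might look roughly like the following. The service names `chroma-db` and `mcp-app` match the commands used later in this README, but the image, volume names, and exact port mapping are assumptions, not a copy of the repository's `docker-compose.yml`:

```yaml
services:
  chroma-db:
    image: chromadb/chroma        # no `ports:` entry, so it stays internal-only
    volumes:
      - chroma-data:/chroma/chroma
  mcp-app:
    build: .
    ports:
      - "8000:8000"               # exposes the SSE endpoint to the host
    depends_on:
      - chroma-db
volumes:
  chroma-data:
```

Because `chroma-db` declares no `ports:` mapping, it is reachable only on the compose network, which is what keeps the raw database hidden from the host.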
💾 Ingesting User Data
Before you can search, you need to ingest tabular data into the running ChromaDB instance.
You can run the bundled CLI ingestor directly from your host machine. Make sure the environment variables point at your running stack, or run it via Docker Compose.
To run the ingestor against a locally running ChromaDB (or inside the container):
```bash
# First, ensure dependencies are synced
uv sync

# Run the ingestor (assuming there's a file `data/my_table.csv`).
# When targeting the dockerized ChromaDB, temporarily expose port 8000 for
# chroma-db, or simply run ingestion locally with local persistence.
uv run python -m src.ingestor data/my_table.csv
```
Note: Since the docker stack keeps ChromaDB private, you can either temporarily map a port for chroma-db in docker-compose.yml, or run a one-off task via docker-compose:
```bash
docker-compose exec mcp-app python src/ingestor.py /path/to/mounted/data.csv
```
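Under the hood, an ingestor like this typically derives stable IDs from the document content (so re-running ingestion upserts rather than duplicates) and writes through ChromaDB's client API. A rough sketch, assuming the `tabular_data` collection name used elsewhere in this README; `chromadb` is imported lazily so the pure helper stays usable without a running server:

```python
import hashlib

def make_id(doc: str) -> str:
    """Derive a stable, content-based ID so re-ingestion upserts instead of duplicating."""
    return hashlib.sha256(doc.encode("utf-8")).hexdigest()[:16]

def ingest(docs: list[str], host: str = "localhost", port: int = 8000) -> None:
    """Upsert Markdown row documents into the `tabular_data` collection."""
    # Imported lazily so the helper above works without chromadb installed.
    import chromadb

    client = chromadb.HttpClient(host=host, port=port)
    collection = client.get_or_create_collection("tabular_data")
    collection.upsert(ids=[make_id(d) for d in docs], documents=docs)

# Usage (requires a reachable ChromaDB, e.g. with its port temporarily mapped):
#   ingest(["- **name**: Widget\n- **price**: 9.99"])
```

Content-based IDs are what make the "upsert" semantics mentioned in the Features section safe to re-run on the same file.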
🛠️ MCP Tools
Once running, any MCP client can connect to http://localhost:8000/sse via Server-Sent Events (SSE).
Available tools:
- `retrieve_single(row)`: Top-K search using a single row's Markdown string.
- `retrieve_batch(rows)`: Batch retrieval over a list of Markdown row strings.
- `retrieve_by_query(query)`: Free-text query passed directly to ChromaDB's search.
💻 Local Testing Example
You can test the running MCP server locally using the official Python SDK. First, ensure you have the mcp package installed in your environment (uv pip install mcp or uv add mcp).
Run the example with:
```bash
uv run python tests/test_client.py
```
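For orientation, a minimal SSE client using the official `mcp` Python SDK might look roughly like this (the query string is made up, and this is a sketch rather than the repository's `tests/test_client.py`; the SDK imports live inside `main` so nothing runs on import):

```python
import asyncio

async def main() -> None:
    # Imported here so the file can be inspected without the SDK installed.
    from mcp import ClientSession
    from mcp.client.sse import sse_client

    async with sse_client("http://localhost:8000/sse") as (read, write):
        async with ClientSession(read, write) as session:
            await session.initialize()
            tools = await session.list_tools()
            print([t.name for t in tools.tools])
            result = await session.call_tool(
                "retrieve_by_query", {"query": "cheap widgets"}
            )
            print(result.content)

# Run with: asyncio.run(main())  (requires the stack from `docker-compose up`)
```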
🗄️ Checking ChromaDB Records
You can easily dump the ingested records directly from your local container exposed on port 8001. A utility script is provided to connect to the database and retrieve all content from the tabular_data collection.
Run the script using:
```bash
uv run python tests/dump_records.py
```
Alternatively, you can query the ChromaDB REST API directly using curl to list the collections and check the status of your data:
```bash
# List all collections
curl http://localhost:8001/api/v1/collections
```