mcp-units
An MCP server that provides deterministic unit conversions backed by Pint, enabling exact unit conversions, compatibility checks, and parsing of quantities for LLMs.
LLMs guess at unit conversions; this server makes them exact.
What this does
Exposes 5 tools, 3 resources, and 2 prompts over the Model Context Protocol. Any MCP client (Claude Code, Claude Desktop, Cursor) can convert units, check dimensional compatibility, parse quantity strings, and simplify expressions, all backed by Pint's registry of 400+ units instead of LLM arithmetic.
How it works
A FastMCP server wraps Pint's UnitRegistry and exposes it through MCP primitives:
- Tools: convert, check_compatibility, parse_quantity, list_compatible_units, simplify
- Resources: units://systems, units://systems/{system}, units://dimensions
- Prompts: convert_document (extract and convert all quantities in text), check_calculations (verify dimensional consistency)
The server runs over stdio by default (for Claude Code / Claude Desktop) or Streamable HTTP via fastmcp run (for remote / containerized deployment).
Quickstart
Prerequisites
- Python 3.12+
- uv
Install and run
git clone https://github.com/quantumleeps/mcp-units.git
cd mcp-units
uv sync
Add to Claude Code
claude mcp add --transport stdio mcp-units -- \
uv run --directory /path/to/mcp-units mcp-units
Add to Claude Desktop
Add to claude_desktop_config.json:
{
  "mcpServers": {
    "mcp-units": {
      "command": "uv",
      "args": ["run", "--directory", "/path/to/mcp-units", "mcp-units"]
    }
  }
}
Run over HTTP
uv run fastmcp run src/mcp_units/server.py --transport http --port 8000
Docker
docker build -t mcp-units .
docker run -p 8000:8000 mcp-units
Tests
uv sync --all-extras
uv run pytest
Evaluation
Does giving an LLM access to a unit conversion tool actually improve its accuracy on physics problems?

Evaluated on 70 SciBench college-level physics problems requiring 2+ unit types, across 6 Claude models in 2 conditions, baseline and tool-augmented (840 total runs). Opus 4.6, the latest model, shows the largest gain (+8.6pp, 70.0% → 78.6%), which suggests that its combination of broad knowledge and refined tool use lets it treat unit conversion as a reliable augmentation. 4.5-Sonnet, a strong reasoner and tool user, also improves (+2.9pp). The older 3.7-Sonnet regresses (-2.9pp): analysis shows it sometimes treats an intermediate conversion result as the final answer, or spins through repeated tool calls without converging, consistent with less mature tool-use capabilities. The surprise is 4.5-Haiku: it is the same generation as 4.5-Sonnet, with capable reasoning and tool use, yet it declines (-1.4pp). With a smaller model, the tool appears to be a distraction rather than an augmentation; the model has the sophistication to use it but not always the judgment to know when it helps.

With only 70 problems and a single run per model, these per-model deltas carry real uncertainty. The 4.5-Haiku result in particular amounts to a single problem out of 70 and could reflect noise rather than a meaningful pattern.
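To put a rough number on that uncertainty, the sketch below reproduces the run count and attaches a crude error margin to one delta. The correct-answer counts (49 and 55 out of 70) are inferred from the reported 70.0% and 78.6%, not taken from the eval's result files, and the margin treats the two conditions as independent samples, which is a simplifying assumption.

```python
import math

N_PROBLEMS = 70
N_MODELS = 6
N_CONDITIONS = 2  # baseline and tool-augmented


def total_runs() -> int:
    """Number of runs implied by problems x models x conditions."""
    return N_PROBLEMS * N_MODELS * N_CONDITIONS


def delta_with_margin(correct_base: int, correct_tool: int, n: int = N_PROBLEMS):
    """Return (delta, margin), both in percentage points.

    The margin is a 95% normal-approximation half-width for the difference
    of two independent proportions. The eval reuses the same problems in
    both conditions, so a paired test on per-problem outcomes would be the
    better choice when that data is available.
    """
    p_base = correct_base / n
    p_tool = correct_tool / n
    std_err = math.sqrt(p_base * (1 - p_base) / n + p_tool * (1 - p_tool) / n)
    return 100 * (p_tool - p_base), 100 * 1.96 * std_err


if __name__ == "__main__":
    print(total_runs())
    # 49/70 and 55/70 are inferred from the reported percentages.
    delta, margin = delta_with_margin(49, 55)
    print(f"delta = {delta:+.1f} pp, margin = +/- {margin:.1f} pp")
```

Under these assumptions the largest reported gain (+8.6pp) sits inside a margin of roughly ±14pp, which is in line with the caution above about reading too much into per-model deltas.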
Next steps
- Unit normalization: Models write cm3 but Pint needs cm^3. A lightweight normalize_unit() preprocessor plus better tool descriptions with formatting guidance would eliminate the 12 parsing failures observed in the eval.
- Expression evaluation: Models sometimes pass math expressions (-1.602e-19 * 1.33e-39 / ...) as the value parameter to convert(). Pint rejects these since it expects a float. Accepting and evaluating simple arithmetic expressions would let the tool handle intermediate calculations.
- Offset unit handling: Pint raises OffsetUnitCalculusError for °C and °F in compound expressions. The parse_quantity tool needs special handling for temperature offsets.
- Larger problem set: 70 problems demonstrate the evaluation framework but limit statistical confidence on per-model deltas. Run-to-run variance within a single model is also unknown. Expanding to 200+ problems with multiple runs per problem would quantify both effects.
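The first two items could be prototyped with the standard library alone. The sketch below is one possible shape, not the project's implementation: normalize_unit() is the name proposed above, while evaluate_arithmetic() is a hypothetical helper name, and the regex covers only the simple "letters followed by an exponent" case.

```python
import ast
import operator
import re

# Letters followed directly by an integer exponent, e.g. "cm3" or "mol-1".
# The trailing \b leaves names with embedded digits (such as "inH2O") alone.
_BARE_EXPONENT = re.compile(r"([A-Za-z]+)(-?\d+)\b")


def normalize_unit(unit: str) -> str:
    """Insert the '^' that models tend to omit: 'cm3' becomes 'cm^3'."""
    return _BARE_EXPONENT.sub(r"\1^\2", unit)


_BIN_OPS = {
    ast.Add: operator.add,
    ast.Sub: operator.sub,
    ast.Mult: operator.mul,
    ast.Div: operator.truediv,
}
_UNARY_OPS = {ast.UAdd: operator.pos, ast.USub: operator.neg}


def evaluate_arithmetic(expression: str) -> float:
    """Evaluate +, -, *, / over numeric literals without calling eval()."""

    def _eval(node: ast.AST) -> float:
        if isinstance(node, ast.Constant):
            # bool is a subclass of int, so reject it explicitly.
            if isinstance(node.value, (int, float)) and not isinstance(node.value, bool):
                return float(node.value)
        elif isinstance(node, ast.BinOp) and type(node.op) in _BIN_OPS:
            return _BIN_OPS[type(node.op)](_eval(node.left), _eval(node.right))
        elif isinstance(node, ast.UnaryOp) and type(node.op) in _UNARY_OPS:
            return _UNARY_OPS[type(node.op)](_eval(node.operand))
        # Names, calls, attribute access and everything else are refused.
        raise ValueError(f"unsupported expression: {expression!r}")

    return _eval(ast.parse(expression, mode="eval").body)


if __name__ == "__main__":
    print(normalize_unit("kg/m3"))
    print(normalize_unit("J mol-1 K-1"))
    print(evaluate_arithmetic("-1.602e-19 * 2 / 4"))
```

Walking the syntax tree keeps the accepted grammar explicit, so a string such as a function call is rejected with ValueError instead of being executed. Malformed input still raises SyntaxError from ast.parse, and division by zero raises ZeroDivisionError, so a tool wrapper would need to catch those as well.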
Run the eval
uv sync --group eval
uv run python -m eval.runner # run all 6 models × 2 conditions (requires ANTHROPIC_API_KEY)
uv run python -m eval.visualize # generate charts from results
uv run python -m eval.analyze # print detailed analysis
Project Structure
mcp-units/
  src/mcp_units/
    server.py       # FastMCP instance: tools, resources, prompts
    registry.py     # Pint UnitRegistry + compatible units workaround
    models.py       # Result dataclasses for structured tool output
  eval/
    runner.py       # Async eval runner: baseline vs tool-augmented
    problems.py     # SciBench problem loading (70 problems, 2+ unit types)
    scorer.py       # Answer extraction + 5% tolerance scoring
    mcp_tools.py    # FastMCP Client wrapper for tool execution
    results.py      # RunResult dataclass + JSON persistence
    visualize.py    # Grouped bar chart + error histograms
    analyze.py      # 16-section detailed analysis
  tests/
    test_tools.py   # 18 Pint logic tests
    test_server.py  # 17 MCP Client integration tests
  Dockerfile        # HTTP transport for containerized deployment
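The tree describes scorer.py as applying "5% tolerance scoring". A minimal sketch of what such a check could look like follows; the function name, the zero-reference fallback, and the choice to measure error relative to the reference answer are all assumptions, and answer extraction is left out.

```python
def within_tolerance(
    predicted: float,
    reference: float,
    rel_tol: float = 0.05,
    abs_tol: float = 1e-12,
) -> bool:
    """Return True when predicted is within rel_tol of the reference answer."""
    if reference == 0:
        # Relative error is undefined at zero, so fall back to an absolute bound.
        return abs(predicted) <= abs_tol
    return abs(predicted - reference) <= rel_tol * abs(reference)


if __name__ == "__main__":
    print(within_tolerance(104.0, 100.0))
    print(within_tolerance(106.0, 100.0))
```

Measuring against the reference rather than the larger of the two values (as math.isclose does) keeps the acceptance band the same size regardless of what the model answered.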
Contributing
PRs welcome. Run pre-commit install after cloning and ensure uv run pytest passes before submitting.
License
MIT