Ollama MCP Server

Enables Claude to delegate coding tasks to local Ollama models, reducing API token usage by up to 98.75% while leveraging local compute resources. Supports code generation, review, refactoring, and file analysis with Claude providing oversight and quality assurance.

Ollama MCP Server for Claude Code

This MCP (Model Context Protocol) server integrates your local Ollama instance with Claude Code, allowing Claude to delegate coding tasks to your local models (Gemma3, Mistral, etc.) to minimize API token usage.

How It Works

Claude Code acts as an orchestrator, calling tools provided by this MCP server. Each tool runs a prompt against your local Ollama instance, and Claude reviews/refines the result as needed (a minimal sketch of this flow follows the list below). This approach:

  • ✅ Minimizes Anthropic API token usage (up to 98.75% reduction with file-aware tools!)
  • ✅ Leverages your local compute resources
  • ✅ Works across any Claude Code project/session
  • ✅ Allows Claude to provide oversight and corrections
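
Conceptually, each tool is a thin wrapper that builds a prompt and sends it to Ollama's local HTTP API, returning the text for Claude to review. The snippet below is a minimal sketch of that call, not the actual implementation in index.js; it assumes the default endpoint on localhost:11434 and Node 18+ (for the built-in fetch):

// Minimal sketch: send a prompt to the local Ollama API and return the generated text.
async function callOllama(prompt, model = "gemma3:12b") {
  const res = await fetch("http://localhost:11434/api/generate", {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({ model, prompt, stream: false }),
  });
  if (!res.ok) {
    throw new Error(`Cannot connect to Ollama: HTTP ${res.status}`);
  }
  const data = await res.json();
  return data.response; // Ollama puts the generated text in the `response` field
}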

Available Tools

String-Based Tools (Pass code as arguments)

These tools accept code as string parameters, which is useful when the code is already in the conversation (an example call is sketched after the list):

  1. ollama_generate_code - Generate new code from requirements
  2. ollama_explain_code - Explain how code works
  3. ollama_review_code - Review code for issues and improvements
  4. ollama_refactor_code - Refactor code to improve quality
  5. ollama_fix_code - Fix bugs or errors in code
  6. ollama_write_tests - Generate unit tests
  7. ollama_general_task - Execute any general coding task
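
The exact argument names are defined by each tool's inputSchema in index.js; the shape of a call Claude issues looks roughly like this (the code and focus fields here are illustrative assumptions):

{
  "name": "ollama_review_code",
  "arguments": {
    "code": "function add(a, b) { return a + b; }",
    "focus": "correctness and edge cases"
  }
}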

File-Aware Tools (Massive token savings!)

These tools read files directly on the MCP server, dramatically reducing conversation token usage (an example call follows the list):

  1. ollama_review_file - Review a file by path (saves ~98.75% tokens vs reading + reviewing)
  2. ollama_explain_file - Explain a file by path
  3. ollama_analyze_files - Analyze multiple files together to understand relationships
  4. ollama_generate_code_with_context - Generate code using existing files as reference patterns
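
For example, a file-aware review call passes only a path and an optional focus rather than the file contents (argument names are illustrative; check the tool schemas in index.js for the exact fields):

{
  "name": "ollama_review_file",
  "arguments": {
    "file_path": "G:\\Projects\\OllamaClaude\\index.js",
    "focus": "security"
  }
}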

Setup Instructions

1. Install Dependencies

npm install

2. Ensure Ollama is Running

Make sure Ollama is running on localhost:11434:

ollama serve

Verify you have the models installed:

ollama list

You should see gemma3:12b, gemma3:4b, or other models you want to use. The default model is gemma3:12b with gemma3:4b as a faster fallback for simpler tasks.

3. Configure Claude Code

Add this MCP server to your Claude Code configuration. The config file location depends on your OS:

  • Windows: %APPDATA%\Claude\claude_desktop_config.json
  • macOS: ~/Library/Application Support/Claude/claude_desktop_config.json
  • Linux: ~/.config/Claude/claude_desktop_config.json

Add the following to your config (or merge with existing mcpServers):

{
  "mcpServers": {
    "ollama": {
      "command": "node",
      "args": ["G:\\Projects\\OllamaClaude\\index.js"]
    }
  }
}

Note: Update the path in args to match your actual installation location.
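
For example, on macOS or Linux the entry would point at a POSIX-style path instead (illustrative path):

{
  "mcpServers": {
    "ollama": {
      "command": "node",
      "args": ["/home/you/OllamaClaude/index.js"]
    }
  }
}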

4. Restart Claude Code

After updating the configuration, restart Claude Code for the changes to take effect.

Usage

Once configured, Claude Code automatically has access to the Ollama tools. You can invoke them directly or let Claude decide when to delegate:

Direct Usage

Ask Claude to use specific tools:

  • "Use ollama_generate_code to create a function that..."
  • "Use ollama_review_code to check this code for issues"

Automatic Orchestration

Simply ask Claude to do tasks, and it will decide when to delegate to Ollama:

  • "Write a function to parse JSON" → Claude may delegate to Ollama
  • "Review this code" → Claude may use Ollama for initial review, then add insights
  • "Fix this bug" → Ollama attempts fix, Claude verifies and corrects if needed

Customization

Change Default Model

Edit index.js and update these lines (near the top of the file):

const DEFAULT_MODEL = "gemma3:12b";  // Change to your preferred model
const FALLBACK_MODEL = "gemma3:4b";  // Faster model for simpler tasks

Popular Ollama models to consider:

  • gemma3:12b - Good balance of quality and speed (default)
  • gemma3:27b - Highest quality, slower, requires more VRAM
  • gemma3:4b - Fastest, good for simple tasks
  • qwen2.5-coder:7b - Specialized for coding
  • mistral-small:latest - Good balance of speed and quality

Modify Tool Prompts

Each tool method in index.js contains a prompt template. You can customize these to get better results from your specific models.
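
For example, a review prompt template might look something like the sketch below (illustrative only; the actual templates in index.js will differ):

// Hypothetical prompt template for a review tool; tune the wording for your model.
function buildReviewPrompt(code, focus = "general quality") {
  return `You are a senior engineer reviewing code with a focus on ${focus}.
List concrete issues with suggested fixes, then give a short summary.

${code}`;
}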

Add New Tools

Add new tools in three steps (a sketch follows the list):

  1. Adding a tool definition in ListToolsRequestSchema handler
  2. Creating a new method (like generateCode, reviewCode, etc.)
  3. Adding a case in the CallToolRequestSchema handler
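
As an illustration, a hypothetical ollama_summarize_code tool could be wired up roughly like this (the name, schema, and helper are made up for the example; mirror the existing tools in index.js for the exact structure):

// 1. Tool definition returned by the ListToolsRequestSchema handler:
{
  name: "ollama_summarize_code",
  description: "Summarize what a piece of code does",
  inputSchema: {
    type: "object",
    properties: {
      code: { type: "string", description: "Code to summarize" },
    },
    required: ["code"],
  },
}

// 2. A method that builds the prompt and calls Ollama
//    (reuse whatever helper index.js already uses for the Ollama request):
async summarizeCode(args) {
  const prompt = `Summarize what this code does:\n\n${args.code}`;
  return this.callOllama(prompt);
}

// 3. A case in the CallToolRequestSchema handler:
case "ollama_summarize_code":
  return await this.summarizeCode(request.params.arguments);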

Troubleshooting

"Cannot connect to Ollama" Error

  • Ensure Ollama is running: ollama serve
  • Check it's on the default port: http://localhost:11434
  • Test with: curl http://localhost:11434/api/tags

Tools Not Appearing in Claude Code

  • Verify the config path is correct
  • Restart Claude Code completely
  • Check Claude Code logs for MCP connection errors

Slow Responses / Timeouts

  • Expected behavior: Ollama calls typically take 60-180 seconds with gemma3:12b on single GPU setups
  • Consider using faster models for simple tasks (e.g., gemma3:4b instead of gemma3:12b)
  • Adjust the timeout in index.js line 362 (currently 900000ms = 15 minutes); a sketch of the timeout pattern follows this list
  • Ensure your machine has adequate resources for the model
  • For large files, consider using smaller models or breaking the analysis into chunks
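
The timeout pattern looks roughly like the sketch below (illustrative; the actual code around index.js line 362 may be structured differently). AbortSignal.timeout requires Node 17.3+:

// Illustrative: abort the Ollama request if it exceeds the timeout.
const TIMEOUT_MS = 900000; // 15 minutes; lower this if you use faster models

const res = await fetch("http://localhost:11434/api/generate", {
  method: "POST",
  headers: { "Content-Type": "application/json" },
  body: JSON.stringify({ model, prompt, stream: false }), // model/prompt come from the calling tool
  signal: AbortSignal.timeout(TIMEOUT_MS),
});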

Example Workflows

Basic Workflow

  1. User asks: "Create a function to validate email addresses"
  2. Claude decides: "This is a code generation task, I'll use ollama_generate_code"
  3. Ollama generates: Initial code implementation
  4. Claude reviews: Checks the code, may suggest improvements or fixes
  5. Result: User gets Ollama-generated code with Claude's oversight

File-Aware Workflow (Token Saver!)

  1. User asks: "Review the code in index.js for security issues"
  2. Claude calls: ollama_review_file with the file path and focus="security"
  3. MCP server: Reads index.js directly (no tokens used in conversation!)
  4. Ollama analyzes: Reviews the ~700 lines of code
  5. Claude refines: Adds context or additional insights
  6. Token savings: ~98.75% compared to reading the file into conversation first

Multi-File Analysis Workflow

  1. User asks: "How do index.js and package.json relate?"
  2. Claude calls: ollama_analyze_files with both file paths
  3. MCP server: Reads both files server-side
  4. Ollama analyzes: Identifies dependencies, patterns, relationships
  5. Result: Cross-file insights without sending files through Claude conversation

This hybrid approach gives you the speed and cost savings of local models with the intelligence and quality assurance of Claude.

Performance Expectations

Response Times

  • Small tasks (simple code snippets): 20-60 seconds
  • Medium tasks (function reviews, file analysis): 60-120 seconds
  • Large tasks (multiple files, complex analysis): 120-180 seconds

Response time depends on:

  • Your GPU/CPU capabilities
  • Model size (gemma3:4b is ~3x faster than gemma3:12b; gemma3:27b is ~2x slower)
  • Task complexity
  • File size for file-aware tools

Token Usage

  • Traditional approach: Read 700-line file (2000 tokens) + Review (2000 tokens) = 4000 tokens
  • File-aware approach: Call ollama_review_file with path = ~50 tokens
  • Savings: ~98.75% reduction in Claude API token usage!

Benefits Over Pure Local or Pure Cloud

  • vs Pure Ollama: Claude provides architectural guidance, catches errors, and ensures quality
  • vs Pure Claude: Significant token savings on routine coding tasks (up to 98.75%!)
  • Best of Both: Local compute for heavy lifting, Claude for orchestration and refinement

Project Structure

OllamaClaude/
├── index.js              # Main MCP server implementation
├── package.json          # Node.js dependencies
├── README.md             # This file
├── test.md               # Test cases and validation guide
└── .gitignore            # Git ignore patterns

Contributing & Future Improvements

Potential enhancements to consider:

  • Caching: Cache file contents for repeated operations
  • Glob support: Pass patterns like *.js to analyze multiple files
  • Streaming responses: Stream Ollama output for faster perceived performance
  • Auto-context: Automatically find and include related files
  • File writing: Allow Ollama to write generated code directly to files

See test.md for detailed test cases and validation procedures.
