MCP Indexer
Enables semantic code search across multiple repositories using natural language queries. Provides intelligent code discovery, symbol lookups, and cross-repo dependency analysis for AI coding agents.
Semantic code search indexer for AI tools via the Model Context Protocol (MCP).
For AI Coding Agents
If you're an AI agent working on this project, please read AGENTS.MD first. It contains instructions for using Beads issue tracking to manage tasks systematically across sessions.
Overview
MCP Indexer provides intelligent code search capabilities to any MCP-compatible LLM (Claude, etc.). It indexes your repositories using semantic embeddings, enabling natural language code search, symbol lookups, and cross-repo dependency analysis.
Features
- Semantic Search: Natural language queries find relevant code by meaning, not just keywords
- Multi-Language Support: Python, JavaScript, TypeScript, Ruby, Go
- Cross-Repo Analysis: Detect dependencies and suggest missing repos
- Incremental Updates: Track git commits and reindex only when needed
- MCP Integration: Works with any MCP-compatible LLM client
- Stack Management: Persistent configuration for repo collections
Installation
Prerequisites
- Python 3.8 or higher
- pip
Steps
- Clone the repository:
git clone https://github.com/gkatechis/mcpIndexer.git
cd mcpIndexer
- Install dependencies:
pip install -r requirements.txt
- Set up environment variables:
export PYTHONPATH=/absolute/path/to/mcpIndexer/src
export MCP_INDEXER_DB_PATH=~/.mcpindexer/db # Optional, defaults to this location
- Configure MCP integration (for Claude Code or other MCP clients):
cp .mcp.json.example .mcp.json
# Edit .mcp.json and update paths to your installation directory
Quick Start
1. Try the Demo
Run the demo to see mcpIndexer in action:
python3 examples/demo.py
2. Index Your Repositories
import os
from mcpindexer.indexer import MultiRepoIndexer
from mcpindexer.embeddings import EmbeddingStore
# Initialize with your database path
db_path = os.getenv("MCP_INDEXER_DB_PATH", os.path.expanduser("~/.mcpindexer/db"))
store = EmbeddingStore(db_path=db_path, collection_name='mcp_code_index')
indexer = MultiRepoIndexer(store)
# Add and index your repository
indexer.add_repo(
repo_path='/path/to/your/repo',
repo_name='my-repo',
auto_index=True
)
3. Use with MCP Clients
Once configured in .mcp.json, the MCP server automatically starts when you use an MCP client like Claude Code.
The MCP server exposes 12 tools:
Search Tools:
- semantic_search - Natural language code search
- find_definition - Find where symbols are defined
- find_references - Find where symbols are used
- find_related_code - Find architecturally related files
Repository Management:
- add_repo_to_stack - Add a new repository
- remove_repo - Remove a repository
- list_repos - List all indexed repos
- get_repo_stats - Get detailed repo statistics
- reindex_repo - Force reindex a repository
Cross-Repo Analysis:
- get_cross_repo_dependencies - Find dependencies between repos
- suggest_missing_repos - Suggest repos to add based on imports
Stack Management:
- get_stack_status - Get overall indexing status
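MCP clients invoke tools like these through JSON-RPC 2.0 `tools/call` requests. The sketch below builds such a request for `semantic_search`. The helper `build_tool_call` is hypothetical, and the argument names `query` and `n_results` are assumptions borrowed from the Python API shown in this README; the server's actual tool schema may differ.

```python
import json


def build_tool_call(request_id, tool_name, arguments):
    """Build a JSON-RPC 2.0 request for the MCP `tools/call` method."""
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": tool_name, "arguments": arguments},
    }


# Argument names are assumed, not taken from the server's tool schema.
request = build_tool_call(
    1, "semantic_search", {"query": "JWT token validation", "n_results": 5}
)
print(json.dumps(request, indent=2))
```

In practice the MCP client builds and sends this message for you; the sketch only shows what travels over the wire.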
CLI Commands
Check for Updates
Check which repos need reindexing:
python3 -m mcpindexer check-updates
Reindex Changed Repos
Automatically reindex repos with new commits:
python3 -m mcpindexer reindex-changed
Stack Status
View current stack status:
python3 -m mcpindexer status
Install Git Hooks
Auto-reindex on git pull:
python3 -m mcpindexer install-hook /path/to/repo
This installs a post-merge hook that triggers reindexing after pulls.
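For illustration, a hook installer along these lines could be sketched as follows. This is not the project's `install-hook` implementation: the function name `install_post_merge_hook` and the hook script contents are assumptions, and only the `reindex-changed` command comes from this README.

```python
import stat
from pathlib import Path

# Assumed hook body: rerun the documented CLI command after each merge.
HOOK_SCRIPT = """#!/bin/sh
python3 -m mcpindexer reindex-changed
"""


def install_post_merge_hook(repo_path):
    """Write an executable post-merge hook into the repository's .git/hooks."""
    git_dir = Path(repo_path) / ".git"
    if not git_dir.is_dir():
        raise ValueError(f"{repo_path} is not a git repository")
    hooks_dir = git_dir / "hooks"
    hooks_dir.mkdir(exist_ok=True)
    hook = hooks_dir / "post-merge"
    hook.write_text(HOOK_SCRIPT)  # note: replaces any existing post-merge hook
    # Add execute bits; git ignores hooks that are not executable.
    hook.chmod(hook.stat().st_mode | stat.S_IXUSR | stat.S_IXGRP | stat.S_IXOTH)
    return hook
```

Calling `install_post_merge_hook('/path/to/repo')` would return the path of the written hook. The hook runs in git's environment, so `PYTHONPATH` must be set there for `mcpindexer` to be importable.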
Usage Examples
Semantic Search
import os
from mcpindexer.embeddings import EmbeddingStore
db_path = os.getenv("MCP_INDEXER_DB_PATH", os.path.expanduser("~/.mcpindexer/db"))
store = EmbeddingStore(db_path=db_path, collection_name='mcp_code_index')
# Natural language queries
results = store.semantic_search(
query="authentication logic",
n_results=10
)
for result in results:
print(f"{result.file_path}:{result.metadata['start_line']}")
print(f" {result.symbol_name} - Score: {result.score:.4f}")
Find Symbol Definitions
results = store.find_by_symbol(
symbol_name="authenticate_user",
repo_filter=["my-backend"]
)
Cross-Repo Dependencies
from mcpindexer.indexer import MultiRepoIndexer
indexer = MultiRepoIndexer(store)
# Find dependencies between repos
cross_deps = indexer.get_cross_repo_dependencies()
# Suggest missing repos to add
suggestions = indexer.suggest_missing_repos()
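To make the idea of import-based dependency detection concrete, here is a minimal sketch that uses the standard-library `ast` module on Python source only. It is not the project's `dependency_analyzer.py`; the functions `top_level_imports` and `classify_imports` and the package-to-repo mapping are hypothetical.

```python
import ast


def top_level_imports(source):
    """Return the set of top-level package names imported by Python source."""
    names = set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            names.update(alias.name.split(".")[0] for alias in node.names)
        elif isinstance(node, ast.ImportFrom) and node.module and node.level == 0:
            # level == 0 keeps absolute imports and skips relative ones
            names.add(node.module.split(".")[0])
    return names


def classify_imports(source, package_to_repo):
    """Split imports into already-indexed repos and packages with no known repo."""
    indexed, unknown = set(), set()
    for name in top_level_imports(source):
        if name in package_to_repo:
            indexed.add(package_to_repo[name])
        else:
            unknown.add(name)
    return indexed, unknown


code = "import os\nfrom billing.client import charge\nimport authlib.tokens\n"
indexed, unknown = classify_imports(code, {"billing": "billing-service"})
print(sorted(indexed))
print(sorted(unknown))
```

The unknown set here still contains standard-library and third-party names such as `os`; a real analyzer would need to filter those out before suggesting repos.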
Configuration
Environment Variables
- MCP_INDEXER_DB_PATH - Database path (default: ~/.mcpindexer/db)
- PYTHONPATH - Must include the src/ directory of your installation
Stack Configuration
Configuration is stored at ~/.mcpindexer/stack.json:
{
"version": "1.0",
"repos": {
"my-repo": {
"name": "my-repo",
"path": "/path/to/repo",
"status": "indexed",
"last_indexed": "2025-10-14T12:34:56.789Z",
"last_commit": "abc123...",
"files_indexed": 162,
"chunks_indexed": 302,
"auto_reindex": true
}
}
}
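The `last_commit` field is what makes commit-based incremental updates possible. As a rough sketch of that check (not the project's actual logic), the code below compares each repo's recorded commit with its current `HEAD`. The function names are hypothetical, and it assumes `last_commit` stores the full hash and that repos with `auto_reindex` set to false are skipped.

```python
import subprocess


def git_head(repo_path):
    """Return the current HEAD commit hash of a git checkout."""
    result = subprocess.run(
        ["git", "-C", str(repo_path), "rev-parse", "HEAD"],
        capture_output=True, text=True, check=True,
    )
    return result.stdout.strip()


def repos_needing_reindex(config, head_of=git_head):
    """List repo names whose recorded commit differs from the current HEAD."""
    stale = []
    for name, repo in config.get("repos", {}).items():
        if not repo.get("auto_reindex", True):
            continue  # assumption: opted-out repos are never flagged
        if head_of(repo["path"]) != repo.get("last_commit"):
            stale.append(name)
    return stale


# In-memory config and a fake HEAD resolver keep the example self-contained.
config = {
    "repos": {
        "my-repo": {"path": "/path/to/repo", "last_commit": "abc123", "auto_reindex": True}
    }
}
print(repos_needing_reindex(config, head_of=lambda path: "def456"))
```

Passing the HEAD resolver as a parameter keeps the comparison testable without a real git checkout; with the default `git_head`, the same function would run against actual repositories loaded from `stack.json`.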
Architecture
Components
- Parser (parser.py) - Tree-sitter based multi-language AST parsing
- Chunker (chunker.py) - Intelligent code chunking respecting AST boundaries
- Embeddings (embeddings.py) - ChromaDB + sentence-transformers for semantic search
- Indexer (indexer.py) - Orchestrates parsing → chunking → embedding → storage
- Dependency Analyzer (dependency_analyzer.py) - Tracks imports and dependencies
- Stack Config (stack_config.py) - Persistent configuration management
- MCP Server (server.py) - Exposes tools via Model Context Protocol
- CLI (cli.py) - Command-line interface
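To illustrate what chunking along AST boundaries means, the sketch below splits Python source into one chunk per top-level function or class. It swaps in the standard-library `ast` module for Tree-sitter, so it covers Python only and is not the project's `chunker.py`. The field names `symbol_name` and `start_line` echo the search results shown earlier in this README; `end_line` and `text` are assumptions.

```python
import ast


def chunk_by_definitions(source):
    """Split Python source into one chunk per top-level function or class."""
    lines = source.splitlines()
    chunks = []
    for node in ast.parse(source).body:
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef)):
            chunks.append({
                "symbol_name": node.name,
                "start_line": node.lineno,
                "end_line": node.end_lineno,  # available on Python 3.8+
                "text": "\n".join(lines[node.lineno - 1:node.end_lineno]),
            })
    return chunks


sample = (
    "import os\n"
    "\n"
    "def load(path):\n"
    "    return open(path).read()\n"
    "\n"
    "class Cache:\n"
    "    def get(self, key):\n"
    "        return None\n"
)
for chunk in chunk_by_definitions(sample):
    print(chunk["symbol_name"], chunk["start_line"], chunk["end_line"])
```

Cutting at definition boundaries keeps each chunk a complete unit, so an embedding never covers half a function.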
Indexing Pipeline
Code File → Parser → AST → Chunker → Semantic Chunks → Embeddings → ChromaDB Store
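The last stage of the pipeline is what makes search by meaning possible: the query is embedded too, and chunks are ranked by vector similarity. Below is a toy sketch of that ranking using cosine similarity on hand-written three-dimensional vectors. Real embeddings come from sentence-transformers and have hundreds of dimensions, and the distance metric ChromaDB uses in this project is not stated here, so treat cosine similarity as one common choice rather than the project's setting.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def rank_chunks(query_vector, chunk_vectors, n_results=2):
    """Return the names of the n_results chunks most similar to the query."""
    scored = [
        (cosine_similarity(query_vector, vector), name)
        for name, vector in chunk_vectors.items()
    ]
    scored.sort(reverse=True)
    return [name for _, name in scored[:n_results]]


# Made-up vectors standing in for real chunk embeddings.
chunks = {
    "authenticate_user": [0.9, 0.1, 0.0],
    "render_invoice": [0.0, 0.2, 0.9],
    "validate_token": [0.8, 0.3, 0.1],
}
print(rank_chunks([1.0, 0.2, 0.0], chunks))
```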
Performance
Based on testing with real-world repos:
- Speed: ~56 files/sec (consistent with the 3-repo run below: 255 files in 4.58s)
- Zendesk App Framework: 162 files, 302 chunks in 1.86s
- 3 Repos: 255 files, 595 chunks in 4.58s
- Search Latency: ~100-200ms per query
Troubleshooting
Issue: "ModuleNotFoundError: No module named 'tree_sitter'"
Solution: Install dependencies
pip install -r requirements.txt
Issue: Slow indexing
Causes:
- Large files with many symbols
- Complex nested structures
- First-time embedding generation
Solutions:
- Use file filters to skip test/build directories
- Increase chunk size target
- Use GPU-accelerated embeddings (if available)
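As an illustration of the first suggestion, a path filter that skips test and build directories could look like the sketch below. How MCP Indexer itself accepts file filters is not documented in this README, so `should_index` and `SKIP_DIRS` are hypothetical names, not part of its API.

```python
from pathlib import PurePosixPath

# Assumed set of directory names that rarely hold code worth indexing.
SKIP_DIRS = {"tests", "test", "build", "dist", "node_modules", ".git"}


def should_index(relative_path, skip_dirs=SKIP_DIRS):
    """Return False when any parent directory of the file is in skip_dirs."""
    parts = PurePosixPath(relative_path).parts
    return not any(part in skip_dirs for part in parts[:-1])


for path in ["src/app/auth.py", "tests/test_auth.py", "web/node_modules/x/index.js"]:
    print(path, should_index(path))
```

Only directory components are checked, so a source file that happens to be named `build.py` is still indexed.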
Issue: Poor search results
Causes:
- Query too generic
- Code not indexed
- Wrong language filter
Solutions:
- Use more specific queries ("JWT token validation" vs "auth")
- Check list_repos to verify indexing
- Try without language filter
- Increase n_results parameter
Issue: Out of memory
Causes:
- Indexing too many repos at once
- Very large monoliths
Solutions:
- Index repos individually
- Increase system memory
- Use incremental indexing (git commit-based)
Issue: Git hooks not triggering
Causes:
- Hook not executable
- PYTHONPATH not set
- Hook overwritten
Solutions:
# Check hook exists and is executable
ls -la /path/to/repo/.git/hooks/post-merge
# Make executable
chmod +x /path/to/repo/.git/hooks/post-merge
# Test manually
cd /path/to/repo && .git/hooks/post-merge
Issue: Stale results after code changes
Solutions:
# Force reindex specific repo
python3 -c "
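import os
from mcpindexer.indexer import MultiRepoIndexer
from mcpindexer.embeddings import EmbeddingStore
# Use the same database location as the rest of this README
db_path = os.getenv('MCP_INDEXER_DB_PATH', os.path.expanduser('~/.mcpindexer/db'))
store = EmbeddingStore(db_path=db_path, collection_name='mcp_code_index')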
indexer = MultiRepoIndexer(store)
indexer.repo_indexers['my-repo'].reindex(force=True)
"
# Or use CLI
python3 -m mcpindexer reindex-changed
Example Queries
Finding Implementations
- "password hashing"
- "JWT token validation"
- "database connection pool"
- "API rate limiting"
Finding Patterns
- "error handling"
- "logging configuration"
- "caching strategy"
- "retry logic"
Finding Components
- "user authentication"
- "payment processing"
- "email sending"
- "file upload handling"
Architecture Understanding
- "dependency injection setup"
- "middleware configuration"
- "router registration"
- "database migration"
Testing
# Run all tests
export PYTHONPATH=/path/to/mcpIndexer/src
python3 -m pytest tests/ -v
# Run specific test file
python3 -m pytest tests/test_embeddings.py -v
# Run example scripts
python3 examples/demo.py
See the examples/ directory for more usage examples.
Contributing
The codebase is organized by component:
- src/mcpindexer/ - Main source code
- tests/ - Test suite (130+ tests)
- test_*.py - Integration test scripts
All components are independently tested with comprehensive coverage.
License
MIT License - see LICENSE file for details.
Support
For issues or questions, please open an issue on the repository.