codetex-mcp
A commit-aware code context manager for LLMs. Indexes Git repositories into a multi-tier knowledge hierarchy — repo overviews, file summaries, and symbol details — stored in SQLite with vector search. Serves context to LLM clients via the Model Context Protocol (MCP) or a local CLI.
What It Does
codetex builds a structured, searchable index of your codebase that LLMs can query on demand:
- Tier 1 — Repo Overview: Purpose, architecture, directory structure, key technologies, entry points
- Tier 2 — File Summaries: Per-file purpose, public interfaces, dependencies, roles
- Tier 3 — Symbol Details: Function/class signatures, parameters, return types, call relationships
Summaries are generated by an LLM (Anthropic Claude). Embeddings are computed locally with sentence-transformers for semantic search. Everything is stored in a single SQLite database with sqlite-vec for vector queries.
Incremental sync means only changed files are re-analyzed when you update your code.
Requirements
- Python 3.12+
- Git
- An Anthropic API key (for indexing)
Installation
# With pip
pip install codetex-mcp
# With uv (recommended)
uv tool install codetex-mcp
Quick Start
1. Set your Anthropic API key
# Via environment variable
export ANTHROPIC_API_KEY=sk-ant-...
# Or via config
codetex config set llm.api_key sk-ant-...
2. Add a repository
# Local repo
codetex add /path/to/your/project
# Remote repo (clones to ~/.codetex/repos/)
codetex add https://github.com/user/repo.git
3. Index it
# Preview what indexing will cost (no API calls)
codetex index my-project --dry-run
# Build the full index
codetex index my-project
4. Query your codebase
# Repo overview (Tier 1)
codetex context my-project
# File summary (Tier 2)
codetex context my-project --file src/auth/login.py
# Symbol detail (Tier 3)
codetex context my-project --symbol authenticate_user
# Semantic search
codetex context my-project --query "how is authentication implemented?"
5. Keep it up to date
# Incremental sync — only re-analyzes changed files
codetex sync my-project
MCP Server Setup
The MCP server lets LLM clients such as Claude Code, Cursor, and Windsurf query your indexed codebases directly.
Claude Code
Add to your Claude Code MCP settings (~/.claude/claude_desktop_config.json):
{
"mcpServers": {
"codetex": {
"command": "codetex",
"args": ["serve"],
"env": {
"ANTHROPIC_API_KEY": "sk-ant-..."
}
}
}
}
If you installed with uv tool, use the full path:
{
"mcpServers": {
"codetex": {
"command": "/path/to/codetex",
"args": ["serve"],
"env": {
"ANTHROPIC_API_KEY": "sk-ant-..."
}
}
}
}
Find the path with `which codetex` or `uv tool dir`.
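The server entry shown above can also be generated by a short script instead of edited by hand. This is a minimal sketch, not part of codetex: the helper name `build_mcp_entry` is hypothetical, and it assumes the `codetex` executable is discoverable on `PATH` and that the API key is available in the environment.

```python
import json
import os
import shutil


def build_mcp_entry(command: str, api_key: str) -> dict:
    """Return an mcpServers fragment shaped like the JSON shown above."""
    return {
        "mcpServers": {
            "codetex": {
                "command": command,
                "args": ["serve"],
                "env": {"ANTHROPIC_API_KEY": api_key},
            }
        }
    }


if __name__ == "__main__":
    # Fall back to the bare command name if the executable is not on PATH.
    command = shutil.which("codetex") or "codetex"
    api_key = os.environ.get("ANTHROPIC_API_KEY", "sk-ant-...")
    print(json.dumps(build_mcp_entry(command, api_key), indent=2))
```

Resolving the path with `shutil.which` covers the case where the client does not inherit your shell's `PATH`.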
Other MCP Clients
Any client that supports MCP stdio transport can use codetex. The server command is:
codetex serve
Available MCP Tools
Once connected, the LLM has access to 7 tools:
| Tool | Description |
|---|---|
| `get_repo_overview` | Tier 1 repo overview (architecture, technologies, entry points) |
| `get_file_context` | Tier 2 file summary with symbol list |
| `get_symbol_detail` | Tier 3 full symbol detail (signature, params, relationships) |
| `search_context` | Semantic search across all indexed context |
| `get_repo_status` | Index status (staleness, file/symbol counts, last indexed) |
| `sync_repo` | Trigger incremental sync from within the LLM session |
| `list_repos` | List all registered repositories |
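Under the Model Context Protocol, a client invokes a tool by sending a JSON-RPC 2.0 `tools/call` request over the transport. The sketch below builds such a message for `get_file_context`. The argument names `repo_name` and `file_path` are assumptions made for illustration; the actual input schema is whatever the server reports via `tools/list`.

```python
import json


def make_tool_call(request_id: int, tool: str, arguments: dict) -> str:
    """Serialize an MCP tools/call request as a JSON-RPC 2.0 message."""
    message = {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": tool, "arguments": arguments},
    }
    return json.dumps(message)


# Hypothetical argument names; check the tool's schema before relying on them.
request = make_tool_call(
    1,
    "get_file_context",
    {"repo_name": "my-project", "file_path": "src/main.py"},
)
print(request)
```

In practice an MCP client library handles this framing for you; the sketch only shows what travels over the wire.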
CLI Reference
codetex add <target>
Register a git repository. Accepts a local path or remote URL.
codetex add . # Current directory
codetex add /path/to/repo # Local path
codetex add https://github.com/user/repo.git # Remote (clones locally)
codetex add git@github.com:user/repo.git # SSH remote
codetex index <repo-name>
Build a full index for a registered repository.
codetex index my-project # Full index
codetex index my-project --dry-run # Preview (files, symbols, estimated LLM calls/tokens)
codetex index my-project --path src/ # Index only files under src/
codetex sync <repo-name>
Incremental sync to the current HEAD. Only files changed since the last indexed commit are re-analyzed.
codetex sync my-project # Sync changes
codetex sync my-project --dry-run # Preview what would change
codetex sync my-project --path src/ # Sync only changes under src/
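One plausible way to find the files changed since the last indexed commit is to ask Git for a name-status diff and sort the paths into "re-analyze" and "remove" buckets. The sketch below is an illustration under that assumption, not codetex's actual implementation; the function names `parse_name_status` and `changed_since` are hypothetical.

```python
import subprocess


def parse_name_status(output: str) -> dict[str, list[str]]:
    """Group paths from `git diff --name-status` output by required action."""
    changes: dict[str, list[str]] = {"reanalyze": [], "remove": []}
    for line in output.splitlines():
        if not line.strip():
            continue
        status, *paths = line.split("\t")
        if status.startswith("D"):
            changes["remove"].append(paths[0])
        elif status.startswith(("R", "C")):
            # Renames and copies list the old path first, then the new path.
            if status.startswith("R"):
                changes["remove"].append(paths[0])
            changes["reanalyze"].append(paths[1])
        else:
            # Added, modified, or type-changed files need a fresh analysis.
            changes["reanalyze"].append(paths[0])
    return changes


def changed_since(repo: str, indexed_commit: str) -> dict[str, list[str]]:
    """Run git to list files changed between the indexed commit and HEAD."""
    result = subprocess.run(
        ["git", "-C", repo, "diff", "--name-status", f"{indexed_commit}..HEAD"],
        capture_output=True,
        text=True,
        check=True,
    )
    return parse_name_status(result.stdout)
```

Keeping the parser separate from the `git` call makes the classification logic easy to test without a repository.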
codetex context <repo-name>
Query indexed context at any tier.
codetex context my-project # Tier 1: repo overview
codetex context my-project --file src/main.py # Tier 2: file summary
codetex context my-project --symbol MyClass # Tier 3: symbol detail
codetex context my-project --query "error handling" # Semantic search
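Semantic search ranks stored embeddings by their similarity to the query embedding. codetex delegates this to sentence-transformers and sqlite-vec; the sketch below substitutes a brute-force cosine ranking over toy 3-dimensional vectors to show the idea. The similarity metric, the function names, and the sample vectors are all assumptions made for illustration.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_k(query: list[float], items: list[tuple[str, list[float]]], k: int = 3) -> list[str]:
    """Return the labels of the k items most similar to the query vector."""
    scored = [(cosine_similarity(query, vector), label) for label, vector in items]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [label for _, label in scored[:k]]


# Toy embeddings; real ones would have 384 dimensions.
index = [
    ("authenticate_user", [1.0, 0.0, 0.0]),
    ("open_database", [0.0, 1.0, 0.0]),
    ("login_view", [0.9, 0.1, 0.0]),
]
print(top_k([1.0, 0.0, 0.0], index, k=2))
```

A vector index such as sqlite-vec avoids scanning every row in Python, which matters once the number of symbols grows.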
codetex status <repo-name>
Show index status: indexed commit, current HEAD, staleness, file/symbol counts, token usage.
codetex list
List all registered repositories with their index status.
codetex config show
Display the current configuration.
codetex config set <key> <value>
Update a configuration value.
codetex config set llm.api_key sk-ant-...
codetex config set llm.model claude-sonnet-4-5-20250929
codetex config set indexing.max_file_size_kb 1024
codetex config set indexing.max_concurrent_llm_calls 10
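A setting like `indexing.max_concurrent_llm_calls` is commonly enforced with a semaphore around each request. The sketch below shows that pattern with `asyncio.Semaphore`; it is an assumption about how such a limit could work, not codetex's code, and the `summarize` coroutine is a placeholder for a real LLM call.

```python
import asyncio


async def summarize_all(paths: list[str], max_concurrent: int = 5) -> list[str]:
    """Summarize every path while keeping at most max_concurrent calls in flight."""
    semaphore = asyncio.Semaphore(max_concurrent)

    async def summarize(path: str) -> str:
        async with semaphore:
            # Placeholder for a real LLM request.
            await asyncio.sleep(0.01)
            return f"summary of {path}"

    # gather preserves input order regardless of completion order.
    return await asyncio.gather(*(summarize(path) for path in paths))


print(asyncio.run(summarize_all(["a.py", "b.py", "c.py"], max_concurrent=2)))
```

Raising the limit speeds up indexing until the provider's rate limits become the bottleneck.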
Configuration
Configuration is loaded in layers (last wins):
- Defaults: sensible out-of-the-box values
- TOML file: `~/.codetex/config.toml`
- Environment variables: override everything
Config file
# ~/.codetex/config.toml
[storage]
data_dir = "~/.codetex" # Base directory for DB and cloned repos
[llm]
provider = "anthropic" # LLM provider (currently: anthropic)
model = "claude-sonnet-4-5-20250929" # Model used for summarization
api_key = "sk-ant-..." # Anthropic API key
[indexing]
max_file_size_kb = 512 # Skip files larger than this
max_concurrent_llm_calls = 5 # Parallel LLM requests during indexing
tier1_rebuild_threshold = 0.10 # Rebuild repo overview if >=10% of files changed on sync
[embedding]
model = "all-MiniLM-L6-v2" # Sentence-transformers model for embeddings
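The `tier1_rebuild_threshold` comment describes a simple ratio test: rebuild the repo overview when the share of changed files reaches the threshold. A minimal sketch of that check follows; the function name is hypothetical, and treating an empty repository as "no rebuild" is an assumption.

```python
def should_rebuild_overview(changed_files: int, total_files: int, threshold: float = 0.10) -> bool:
    """Return True when the fraction of changed files meets the threshold."""
    if total_files == 0:
        # Assumption: nothing to summarize, so no rebuild.
        return False
    return changed_files / total_files >= threshold


print(should_rebuild_overview(10, 100))  # 10% of files changed
print(should_rebuild_overview(9, 100))   # 9% of files changed
```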
Environment variables
| Variable | Maps to | Example |
|---|---|---|
| `ANTHROPIC_API_KEY` | `llm.api_key` | `sk-ant-...` |
| `CODETEX_DATA_DIR` | `storage.data_dir` | `/custom/path` |
| `CODETEX_LLM_PROVIDER` | `llm.provider` | `anthropic` |
| `CODETEX_LLM_MODEL` | `llm.model` | `claude-sonnet-4-5-20250929` |
| `CODETEX_MAX_FILE_SIZE_KB` | `indexing.max_file_size_kb` | `1024` |
| `CODETEX_MAX_CONCURRENT_LLM` | `indexing.max_concurrent_llm_calls` | `10` |
| `CODETEX_TIER1_THRESHOLD` | `indexing.tier1_rebuild_threshold` | `0.15` |
| `CODETEX_EMBEDDING_MODEL` | `embedding.model` | `all-MiniLM-L6-v2` |
File Exclusion
Files are filtered through multiple stages:
- Default excludes: `node_modules/`, `__pycache__/`, `.git/`, `dist/`, `build/`, `.venv/`, `*.lock`, `*.min.js`, `*.pyc`, `*.so`, etc.
- `.gitignore`: standard gitignore rules from your repo
- `.codetexignore`: same syntax as `.gitignore`, placed in your repo root. Use `!pattern` to un-ignore files
- File size: files exceeding `max_file_size_kb` are skipped
- Binary detection: files with null bytes in the first 8 KB are skipped
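Three of the filtering stages (default excludes, file size, and binary detection) are simple enough to sketch with the standard library. The code below is an illustrative approximation, not codetex's filter: the function names are hypothetical, only the default patterns listed explicitly in this section are covered, and `.gitignore` / `.codetexignore` handling is deliberately left out because gitignore matching has more rules than `fnmatch` implements.

```python
import fnmatch
import os

EXCLUDED_DIRS = {"node_modules", "__pycache__", ".git", "dist", "build", ".venv"}
EXCLUDED_GLOBS = ["*.lock", "*.min.js", "*.pyc", "*.so"]


def is_default_excluded(rel_path: str) -> bool:
    """True if any parent directory or the file name matches a default exclude."""
    parts = rel_path.replace("\\", "/").split("/")
    if any(part in EXCLUDED_DIRS for part in parts[:-1]):
        return True
    return any(fnmatch.fnmatch(parts[-1], pattern) for pattern in EXCLUDED_GLOBS)


def looks_binary(path: str, probe_bytes: int = 8192) -> bool:
    """Treat a file as binary if a null byte appears in its first 8 KB."""
    with open(path, "rb") as handle:
        return b"\x00" in handle.read(probe_bytes)


def should_index(root: str, rel_path: str, max_file_size_kb: int = 512) -> bool:
    """Apply the cheap checks first, then size, then the content probe."""
    full_path = os.path.join(root, rel_path)
    if is_default_excluded(rel_path):
        return False
    if os.path.getsize(full_path) > max_file_size_kb * 1024:
        return False
    return not looks_binary(full_path)
```

Ordering the checks from cheapest to most expensive means most excluded files are rejected without being opened.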
Language Support
| Language | Tree-sitter (full AST) | Fallback (regex) |
|---|---|---|
| Python | Yes | Yes |
| JavaScript | Yes | Yes |
| TypeScript | Yes | Yes |
| Go | Yes | Yes |
| Rust | Yes | Yes |
| Java | Yes | Yes |
| Ruby | Yes | Yes |
| C/C++ | Yes | Yes |
| All others | — | Yes |
Tree-sitter grammars for all 8 languages are installed automatically. For other languages, the fallback parser uses regex patterns to extract functions, classes, and imports.
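A regex fallback of this kind typically scans line starts for declaration keywords. The sketch below shows the general idea with three deliberately loose patterns; they are hypothetical, not the patterns codetex ships, and they will miss or misreport constructs that a real parser handles.

```python
import re

# Illustrative patterns only; each captures the name after a declaration keyword.
PATTERNS = {
    "functions": re.compile(
        r"^[ \t]*(?:async[ \t]+)?(?:function|func|def|fn)[ \t]+([A-Za-z_]\w*)", re.MULTILINE
    ),
    "classes": re.compile(
        r"^[ \t]*(?:class|struct|interface)[ \t]+([A-Za-z_]\w*)", re.MULTILINE
    ),
    "imports": re.compile(
        r"^[ \t]*(?:import|from|use|require)[ \t]+([\w./:]+)", re.MULTILINE
    ),
}


def extract_symbols(source: str) -> dict[str, list[str]]:
    """Return the names found by each pattern, in source order."""
    return {kind: pattern.findall(source) for kind, pattern in PATTERNS.items()}


sample = (
    "import os\n"
    "from pathlib import Path\n"
    "\n"
    "class Store:\n"
    "    def get(self, key):\n"
    "        return key\n"
    "\n"
    "async def main():\n"
    "    pass\n"
)
print(extract_symbols(sample))
```

Unlike a tree-sitter AST, this approach has no notion of scope or nesting, which is why it serves only as a fallback.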
Architecture
CLI (Typer) ──┐
              ├──▶ Core Services (Indexer, Syncer, ContextStore, SearchEngine)
MCP (FastMCP)─┘        │              │              │
                   Analysis     LLM Provider    Embeddings
                (tree-sitter +   (Anthropic)  (sentence-transformers)
                regex fallback)       │              │
                       └──────────────┴──────────────┘
                                      │
                             SQLite + sqlite-vec
- Two entry points (CLI and MCP server) share the same core service layer
- No DI framework: services are wired via a `create_app()` factory
- All core services are async: the CLI bridges with `asyncio.run()`
- Embeddings are local: no external API calls for vector search (model auto-downloads on first run, ~90 MB)
- Single SQLite database: 6 main tables + 2 vector tables (384-dimensional embeddings)
Development
git clone https://github.com/mrosata/codetex-mcp.git
cd codetex-mcp
# Install dependencies (including dev)
uv sync
# Run tests
uv run pytest
# Run tests with coverage
uv run pytest --cov=codetex_mcp
# Lint and format
uv run ruff check src/ tests/
uv run ruff format src/ tests/
# Type check
uv run mypy src/
Releasing
Releases are automated via GitHub Actions and python-semantic-release. Version bumps are driven by conventional commit messages on main.
Commit message format
| Prefix | Effect | Example |
|---|---|---|
| `fix: ...` | Patch bump (0.1.0 → 0.1.1) | `fix: handle missing gitignore` |
| `feat: ...` | Minor bump (0.1.0 → 0.2.0) | `feat: add Ruby tree-sitter support` |
| `feat!: ...` | Major bump (0.1.0 → 1.0.0) | `feat!: redesign context API` |
| `docs:`, `chore:`, `ci:`, `test:`, `refactor:` | No release | `docs: update README` |
A BREAKING CHANGE: line in the commit body also triggers a major bump.
How it works
- Push or merge a PR to `main`
- CI runs lint, type check, and tests
- The release workflow analyzes commits since the last tag
- If a version bump is needed, it:
  - Updates the version in `pyproject.toml`
  - Creates a git tag (e.g., `v0.2.0`)
  - Publishes a GitHub Release with a changelog
  - Builds and publishes the package to PyPI
Manual release (not recommended)
If you need to release without the automation:
uv build
uv publish
License
MIT