marklogic-mcp
An MCP server for MarkLogic 12 that enables AI agents to interrogate, query, and manage MarkLogic databases using native capabilities including full-text search, Optic queries, SPARQL, bulk import/export, and TDE schema management.
A Model Context Protocol (MCP) server for MarkLogic 12. Enables AI agents to interrogate, query, and manage MarkLogic using MarkLogic-native capabilities — full-text search, Optic row queries, SPARQL, Flux bulk import/export, TDE schema management, and more.
Features
- 80+ MCP tools across 15 domains: admin (incl. logs), documents, security, search, search options, schema, eval, SPARQL/graphs, Optic (incl. vector search), performance, QuickSight, Flux, REST extensions, Semaphore (taxonomy + classification), and approach advisory
- 5 MCP resources including a machine-readable problem→solution decision guide
- 13 MCP prompts for query planning, code generation, import design, and BI integration
- Two transports: stdio (Claude Desktop, GitHub Copilot, local agents) and HTTP+SSE (Claude Code, GitHub Copilot, remote agents, QuickSight)
- Read-only by default: writes gated behind ML_READONLY=false, eval gated behind ML_ALLOW_EVAL=true
- Basic and Digest auth for MarkLogic REST API
How Agents Should Use This Server
Start with the decision guide
Before calling any query or import tool, an agent should read the marklogic://instructions resource. It contains a problem→tool decision table and a set of nine principles (e.g. "discover before you query", "native before eval", "Flux before REST for bulk loads"). This prevents common mistakes like using ml_eval_javascript for bulk import or ml_document_put in a loop.
Use the advisory tools when unsure
Four advisory entry points (one resource, one tool, and two prompts) exist specifically to guide tool selection:
| Advisory tool / resource | When to use |
|---|---|
| marklogic://instructions resource | Read at session start: machine-readable decision guide |
| ml_suggest_approach | Call with a natural-language task to get ranked tool recommendations with ready-to-use recipe parameters |
| problem_advisor prompt | Call with a goal to get a 6-section structured analysis (classification → native approach → discovery → tool sequence → pitfalls → alternatives) |
| query_approach_advisor prompt | Call when the goal is a query and you need to choose between cts.search, Optic, or a hybrid |
Discover before you query
Never assume a collection, TDE view, or index exists. The standard discovery sequence is:
ml_collections_list → ml_schema_discover → ml_indexes_list → ml_views_list
Run these before writing any query or import plan.
Optic vs cts.search
| Goal | Use | Prerequisite |
|---|---|---|
| Find documents by content / keyword | ml_search (cts.search) | None (universal index always available) |
| Filter by exact field value or date range | ml_search structured_query | Range index recommended (ml_indexes_list) |
| COUNT / SUM / AVG / GROUP BY | ml_optic_query (fromView) | TDE view in Schemas DB (ml_views_list) |
| Join two collections by key | ml_optic_query (join-inner) | TDE views for both collections |
| Full-text filter THEN aggregate (hybrid) | ml_optic_query (fromSearch) | TDE view + cts query |
| Count distinct values / faceted nav | ml_values_query, ml_facets_query | Range or element word index |
Use the query_approach_advisor prompt to get a concrete, filled-in query plan for any of these goals.
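The table above can be read as a lookup from goal to tool plus the discovery call that confirms the prerequisite. The following is a minimal sketch of that lookup, not the server's implementation: the goal names, the `Approach` shape, and the `planCalls` helper are hypothetical, and the `verifyWith` entries for the hybrid and facets rows are assumptions inferred from the tool descriptions rather than stated in the table.

```typescript
// Hypothetical goal categories; illustrative only, not part of the server's API.
type QueryGoal =
  | "keyword"
  | "exact-or-range-filter"
  | "aggregate"
  | "join"
  | "hybrid"
  | "facets";

interface Approach {
  tools: string[];      // tool names as listed in the table
  prerequisite: string; // what must exist before querying
  verifyWith?: string;  // discovery tool that confirms the prerequisite
}

const APPROACHES: Record<QueryGoal, Approach> = {
  keyword: { tools: ["ml_search"], prerequisite: "none (universal index)" },
  "exact-or-range-filter": {
    tools: ["ml_search"],
    prerequisite: "range index recommended",
    verifyWith: "ml_indexes_list",
  },
  aggregate: {
    tools: ["ml_optic_query"],
    prerequisite: "TDE view in Schemas DB",
    verifyWith: "ml_views_list",
  },
  join: {
    tools: ["ml_optic_query"],
    prerequisite: "TDE views for both collections",
    verifyWith: "ml_views_list",
  },
  // Assumption: ml_views_list is the natural check for the TDE half of a hybrid query.
  hybrid: {
    tools: ["ml_optic_query"],
    prerequisite: "TDE view + cts query",
    verifyWith: "ml_views_list",
  },
  // Assumption: ml_indexes_list is the natural check for range / element word indexes.
  facets: {
    tools: ["ml_values_query", "ml_facets_query"],
    prerequisite: "range or element word index",
    verifyWith: "ml_indexes_list",
  },
};

function chooseApproach(goal: QueryGoal): Approach {
  return APPROACHES[goal];
}

// Ordered tool calls: verify the prerequisite first, then query.
function planCalls(goal: QueryGoal): string[] {
  const approach = chooseApproach(goal);
  return approach.verifyWith
    ? [approach.verifyWith, ...approach.tools]
    : [...approach.tools];
}
```

For an aggregation goal, `planCalls("aggregate")` yields the discovery call followed by the Optic call, which mirrors the "discover before you query" principle.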
Multi-model data: Documents + Triples + Vectors
MarkLogic stores all three model types natively. Use data_modeling_advisor for guided design.
Entity-oriented triple pattern (preferred)
Group triples by IRI so that each entity is one document. The document URI equals the entity IRI, and triples are embedded as a sem:triples array inside the document body. This avoids a separate triple store lookup for entity properties and keeps the document and its graph relationships co-located.
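To make the grouping concrete, here is a small sketch that turns a flat list of triples into one entity document per subject IRI. The `Triple` and `EntityDocument` shapes, including the `triples` field name, are assumptions for illustration; the exact JSON layout MarkLogic expects for embedded triples should be taken from the MarkLogic documentation.

```typescript
interface Triple {
  subject: string;
  predicate: string;
  object: string;
}

// Hypothetical entity document: the URI equals the entity IRI and the
// triples about that entity travel inside the same document.
interface EntityDocument {
  uri: string;
  triples: Triple[];
}

function groupBySubject(triples: Triple[]): EntityDocument[] {
  const bySubject = new Map<string, Triple[]>();
  for (const t of triples) {
    const bucket = bySubject.get(t.subject);
    if (bucket) {
      bucket.push(t);
    } else {
      bySubject.set(t.subject, [t]);
    }
  }
  const docs: EntityDocument[] = [];
  // Map preserves insertion order, so documents appear in first-seen order.
  bySubject.forEach((ts, uri) => docs.push({ uri, triples: ts }));
  return docs;
}
```

Three triples about two subjects produce two documents, so a lookup of one entity's properties touches a single document.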
Importing raw RDF (two-step)
- flux_import with subcommand import-rdf-files → loads triples as managed triples (quad store, one quad per document)
- flux_reprocess with an SJS transform that groups quads by subject IRI and writes one entity document per subject → produces the entity-oriented layout
Vector search
Store embeddings as a JSON array field. Define a TDE column with scalar: "vec:vector". Query with ml_vector_search — it uses vec:cosine-similarity through the Optic API with no eval required. MarkLogic 12+ only.
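For intuition about the score, the sketch below computes cosine similarity in plain TypeScript and ranks a handful of candidates client-side. It only illustrates the math; it is not how ml_vector_search runs (the server evaluates similarity inside MarkLogic through Optic), and the `nearest` helper and its candidate shape are hypothetical.

```typescript
// Cosine similarity: dot(a, b) / (|a| * |b|). Ranges from -1 to 1.
function cosineSimilarity(a: number[], b: number[]): number {
  if (a.length === 0 || a.length !== b.length) {
    throw new Error("vectors must be non-empty and of equal length");
  }
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  if (normA === 0 || normB === 0) {
    throw new Error("cosine similarity is undefined for a zero vector");
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Hypothetical client-side ranking of the k most similar candidates.
function nearest(
  query: number[],
  candidates: { uri: string; embedding: number[] }[],
  k: number
): { uri: string; score: number }[] {
  return candidates
    .map((c) => ({ uri: c.uri, score: cosineSimilarity(query, c.embedding) }))
    .sort((x, y) => y.score - x.score)
    .slice(0, k);
}
```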
Bulk loading
Always use flux_import for more than ~10 documents. It handles HTTP URL fetch, ZIP/gzip decompression, parallel batching, and automatic TDE view generation in a single call — 10–100× faster than looping ml_document_put.
Quick Start
New to marklogic-mcp? See the Getting Started Guide for a complete walkthrough.
Claude Desktop (stdio)
- Install and build:

  ```bash
  npm install && npm run build
  ```

- Configure .env:

  ```bash
  cp .env.example .env
  # Edit with your MarkLogic connection details
  ```

- Add to Claude Desktop config (~/Library/Application Support/Claude/claude_desktop_config.json on macOS):

  ```json
  {
    "mcpServers": {
      "marklogic": {
        "command": "node",
        "args": ["/path/to/marklogic-mcp/dist/index.js"],
        "env": {
          "ML_HOST": "your-marklogic-host",
          "ML_PORT": "8000",
          "ML_MANAGEMENT_PORT": "8002",
          "ML_USERNAME": "admin",
          "ML_PASSWORD": "your-password",
          "ML_AUTH_TYPE": "basic",
          "ML_READONLY": "true"
        }
      }
    }
  }
  ```
Claude Code (remote HTTP transport)
```bash
# Start server (Docker)
ML_HOST=<host> ML_PASSWORD=<pass> MCP_API_KEY=<secret> \
  docker compose -f docker-compose.mcp-only.yml up -d

# Register with Claude Code
claude mcp add --transport http marklogic http://localhost:3000/mcp \
  --header "Authorization: Bearer <secret>"
```
See docs/claude-code-remote-mcp.md for the full guide.
GitHub Copilot in VS Code
Add to VS Code user settings or .vscode/mcp.json:
```json
{
  "mcp": {
    "servers": {
      "marklogic": {
        "type": "stdio",
        "command": "node",
        "args": ["/path/to/marklogic-mcp/dist/index.js"],
        "env": {
          "ML_HOST": "localhost",
          "ML_PORT": "8000",
          "ML_USERNAME": "admin",
          "ML_PASSWORD": "your-password",
          "ML_AUTH_TYPE": "digest",
          "ML_READONLY": "true"
        }
      }
    }
  }
}
```
Or connect to a running HTTP server: set "type": "http" and "url": "http://localhost:3000/mcp".
See docs/getting-started.md for the full guide including per-project config with input variables for secrets.
HTTP/SSE Transport (AWS QuickSight / remote agents)
```bash
MCP_TRANSPORT=http MCP_HTTP_PORT=3000 ML_HOST=your-host ML_USERNAME=admin ML_PASSWORD=pass \
  node dist/index.js
```
OAuth2 Bearer Token Passthrough
When MarkLogic is configured as an OAuth2 resource server, the MCP server can forward each client's Bearer token directly to MarkLogic — MarkLogic validates the JWT and enforces its own per-user RBAC.
```bash
MCP_TRANSPORT=http MCP_HTTP_PORT=3000 ML_HOST=your-host ML_AUTH_TYPE=oauth \
  node dist/index.js
# ML_USERNAME / ML_PASSWORD are not used in oauth mode
# Clients pass: Authorization: Bearer <user-jwt>
```
To configure MarkLogic as an OAuth2 resource server, use the oauth_setup_advisor prompt in the MCP server — it generates the required Management API calls and XQuery for your OIDC provider. Key points verified on ML 12:
- Create the external security via sec:create-external-security() (not raw XQuery) to preserve required element ordering
- Set authorization: oauth and map JWT claim values to MarkLogic roles via sec:role-set-external-names(); the claim value matches the role's external-name, not its role-name
- Apply authentication: oauth to all server groups (apps, enode, etc.)
Flux tools are disabled in oauth mode (they require username:password credentials).
Health check: GET http://localhost:3000/health
Docker Compose — full stack (MarkLogic + MCP server)
```bash
docker compose up
# MarkLogic at http://localhost:8001 (Admin UI)
# MCP server at http://localhost:3000
```
Docker Compose — connect to existing MarkLogic / Semaphore containers
If MarkLogic and/or Semaphore are already running in Docker on the same host, use the external-network compose file:
```bash
docker network create shared                          # one-time
docker network connect shared <marklogic-container>   # attach existing containers
docker network connect shared <semaphore-container>

ML_HOST=marklogic SEMAPHORE_HOST=semaphore ML_PASSWORD=admin \
  docker compose -f docker-compose.external.yml up -d
```
See docs/docker-networking.md for the full guide and alternative approaches (host network mode, host IP).
Configuration
| Variable | Default | Description |
|---|---|---|
| MCP_TRANSPORT | stdio | stdio or http |
| MCP_HTTP_PORT | 3000 | HTTP transport port |
| MCP_HTTP_HOST | 0.0.0.0 | Bind address for HTTP transport |
| MCP_API_KEY | (none) | Bearer token for HTTP transport auth |
| MCP_CORS_ORIGIN | (all) | Restrict CORS to a single origin (default: allow all) |
| MCP_TRUST_PROXY | (disabled) | Express trust proxy setting. Set when behind a reverse proxy (nginx, ALB, ingress). Use 1 for a single proxy, a number of hops, an IP/subnet list (e.g. 10.0.0.0/8), or loopback. Avoid true (spoofable). Required to silence ERR_ERL_UNEXPECTED_X_FORWARDED_FOR from express-rate-limit. |
| ML_HOST | localhost | MarkLogic hostname or IP |
| ML_PORT | 8000 | REST API port |
| ML_MANAGEMENT_PORT | 8002 | Management API port |
| ML_USERNAME | admin | MarkLogic username |
| ML_PASSWORD | admin | MarkLogic password |
| ML_DATABASE | Documents | Default database |
| ML_AUTH_TYPE | digest | digest, basic, or oauth (Bearer token passthrough to MarkLogic) |
| ML_OAUTH_TOKEN | (none) | Static Bearer token; required in stdio mode when ML_AUTH_TYPE=oauth |
| ML_SSL | false | Enable HTTPS |
| ML_SSL_REJECT_UNAUTHORIZED | true | Reject self-signed SSL certificates (set to false for dev environments) |
| ML_TIMEOUT_MS | 30000 | HTTP request timeout for MarkLogic calls (milliseconds) |
| ML_READONLY | true | Block all write operations |
| ML_ALLOW_EVAL | false | Enable /v1/eval (XQuery/SJS execution) |
| LOG_LEVEL | info | debug, info, warn, error |
| LOG_FORMAT | json | json or pretty |
| SEMAPHORE_HOST | (none) | Semaphore hostname (enables CLS + KMM connectivity) |
| SEMAPHORE_SCS_PORT | 5058 | Classification Server port |
| SEMAPHORE_KMM_PORT | 5080 | Studio / KMM port |
| SEMAPHORE_USERNAME | (none) | KMM username |
| SEMAPHORE_PASSWORD | (none) | KMM password |
| SEMAPHORE_URL | (none) | Explicit CLS URL override (takes precedence over host:port) |
| FLUX_RUNNER_URL | (none) | Flux runner HTTP URL (e.g. http://localhost:8082) |
| FLUX_DATA_DIR | ./flux-data | Local directory mounted as /data in the Flux Docker container |
| FLUX_TIMEOUT_MINUTES | 30 | Flux operation timeout in minutes |
| ML_DHF_CLIENT_JAR | (none) | Absolute path to marklogic-data-hub-<version>-client.jar |
| ML_DHF_PORT | (ML_PORT) | DHF staging app server port |
| ML_DHF_JOBS_PORT | (ML_DHF_PORT+2) | DHF jobs app server port |
| AWS_REGION | (none) | AWS region for QuickSight integration |
| AWS_QUICKSIGHT_ACCOUNT_ID | (none) | QuickSight account ID |
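As an illustration of how a few of these variables and their defaults fit together, the sketch below parses a subset of the table from a plain key/value map. It is a simplified stand-in, not the project's loader (which, per the Architecture section, uses dotenv and Zod); the `ServerConfig` shape and the `loadConfig`, `oneOf`, `parseBool`, and `parsePort` helpers are hypothetical.

```typescript
type AuthType = "digest" | "basic" | "oauth";
type Transport = "stdio" | "http";
type EnvMap = Record<string, string | undefined>;

// Hypothetical subset of the configuration table above.
interface ServerConfig {
  transport: Transport;
  httpPort: number;
  mlHost: string;
  mlPort: number;
  mlManagementPort: number;
  authType: AuthType;
  readonly: boolean;
  allowEval: boolean;
}

function oneOf<T extends string>(value: string, allowed: readonly T[], name: string): T {
  const match = allowed.find((a) => a === value);
  if (match === undefined) {
    throw new Error(`${name} must be one of: ${allowed.join(", ")}`);
  }
  return match;
}

function parseBool(value: string | undefined, fallback: boolean): boolean {
  if (value === undefined || value === "") return fallback;
  return value.toLowerCase() === "true";
}

function parsePort(value: string | undefined, fallback: number, name: string): number {
  if (value === undefined || value === "") return fallback;
  const n = Number(value);
  if (!Number.isInteger(n) || n < 1 || n > 65535) {
    throw new Error(`${name} must be a port number between 1 and 65535`);
  }
  return n;
}

// Defaults mirror the table: read-only on, eval off, digest auth, stdio transport.
function loadConfig(env: EnvMap): ServerConfig {
  return {
    transport: oneOf(env.MCP_TRANSPORT ?? "stdio", ["stdio", "http"] as const, "MCP_TRANSPORT"),
    httpPort: parsePort(env.MCP_HTTP_PORT, 3000, "MCP_HTTP_PORT"),
    mlHost: env.ML_HOST ?? "localhost",
    mlPort: parsePort(env.ML_PORT, 8000, "ML_PORT"),
    mlManagementPort: parsePort(env.ML_MANAGEMENT_PORT, 8002, "ML_MANAGEMENT_PORT"),
    authType: oneOf(env.ML_AUTH_TYPE ?? "digest", ["digest", "basic", "oauth"] as const, "ML_AUTH_TYPE"),
    readonly: parseBool(env.ML_READONLY, true),
    allowEval: parseBool(env.ML_ALLOW_EVAL, false),
  };
}
```

Passing a plain object instead of reading `process.env` directly keeps the sketch easy to exercise in isolation, for example `loadConfig({ ML_READONLY: "false" })`.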
AI Client API Keys
This MCP server does not use AI provider API keys itself — it is a tool server that AI agents connect to. The API keys for your AI provider are configured in your client application, not in this server.
| AI Client | Environment Variable | Where to configure |
|---|---|---|
| Claude Desktop | ANTHROPIC_API_KEY | Built into the app (uses your Anthropic account) |
| Claude Code | ANTHROPIC_API_KEY | Shell environment or ~/.bashrc / ~/.zshrc |
| OpenAI-compatible agents | OPENAI_API_KEY | Agent's own environment or config file |
| Amazon Bedrock agents | AWS_ACCESS_KEY_ID, AWS_SECRET_ACCESS_KEY | AWS credentials chain |
| Google Vertex AI agents | GOOGLE_APPLICATION_CREDENTIALS | GCP service account JSON path |
Example: Claude Code with this MCP server
```bash
# 1. Set your Anthropic API key (client-side, not in the MCP server)
export ANTHROPIC_API_KEY=sk-ant-...

# 2. Start the MCP server (server-side, no AI keys needed)
ML_HOST=my-marklogic MCP_API_KEY=my-secret \
  docker compose -f docker-compose.mcp-only.yml up -d

# 3. Register the MCP server with Claude Code
claude mcp add --transport http marklogic http://localhost:3000/mcp \
  --header "Authorization: Bearer my-secret"
```
Tip: MCP_API_KEY secures the MCP server's HTTP endpoint and is unrelated to any AI provider key. Think of it as a password for the MCP server itself.
Tools Reference
Approach Advisory
| Tool | Description |
|---|---|
| ml_suggest_approach | Analyse a natural-language task and return ranked tool recommendations with ready-to-use recipe parameters. Call this before starting any non-trivial task. |
Admin (11 tools)
| Tool | Description |
|---|---|
| ml_cluster_status | Cluster health, version, host info |
| ml_databases_list | List all databases |
| ml_database_properties | Full database configuration |
| ml_database_statistics | Document counts, forest sizes |
| ml_database_set_forests (write) | Attach a specific list of forests to a database; primary fix for the forest-hang pattern when cluster nodes are offline |
| ml_forests_list | Forest status |
| ml_servers_list | App server list |
| ml_server_properties | App server configuration |
| ml_reindex_status | Check whether a database has finished reindexing after TDE installation or index config changes. Returns ready=true when safe to run ml_optic_query or ml_tde_validate. Use after flux_import with generate_tde=true to avoid SQL-TABLEREINDEXING errors. |
| ml_logs_list | List available MarkLogic log files (ErrorLog.txt, AccessLog.txt, port-specific logs). Use before ml_logs_read. |
| ml_logs_read | Read a MarkLogic server log file with optional time-range and regex filtering. Key files: ErrorLog.txt, 8002_AccessLog.txt, 8000_AccessLog.txt. |
Documents (6 tools)
| Tool | Description |
|---|---|
| ml_document_get | Retrieve document by URI |
| ml_document_list | List by collection or directory |
| ml_document_sample | Sample random documents from a collection |
| ml_document_put (write) | Create/replace document |
| ml_document_delete (write) | Delete document |
| ml_document_patch (write) | Partial update |
Security (3 tools)
| Tool | Description |
|---|---|
| ml_users_list | List all MarkLogic users (requires manage-user privilege) |
| ml_roles_list | List all roles, or retrieve full properties for a named role |
| ml_document_permissions | Return the read/update/insert/execute permissions on a document URI |
Search (5 tools)
Uses MarkLogic's universal index — no TDE or range index required for word queries.
| Tool | Description |
|---|---|
| ml_search | Full-text and structured search with cts.search semantics |
| ml_search_qbe | Query By Example: match by document structure |
| ml_values_query | Lexicon/range index value counts and aggregates |
| ml_geospatial_search | Find documents within a geospatial region (circle, bounding box, or polygon). Requires a geospatial element pair index; confirm with ml_indexes_list first. |
| ml_suggest | Search autocomplete from a partial query string |
Range queries within ml_search require a pre-existing range index. Verify with ml_indexes_list first.
Search Options / FastTrack (4 tools)
Manage named search-options configurations stored in the FastTrack endpoint (/v1/config/query).
| Tool | Description |
|---|---|
| ml_search_options_list | List all named search-options configurations |
| ml_search_options_get | Retrieve a named search-options configuration |
| ml_search_options_put (write) | Create or replace a search-options configuration |
| ml_search_options_delete (write) | Delete a search-options configuration |
Schema Discovery (7 tools)
| Tool | Description |
|---|---|
| ml_schema_discover | Infer field shapes by sampling documents in a collection |
| ml_schema_get_tde | Retrieve TDE templates from the Schemas database |
| ml_tde_validate | Validate a TDE template against sampled documents |
| ml_tde_install (write) | Install a TDE template into the Schemas database with the correct collection; a convenience wrapper around ml_document_put that sets database=Schemas and the required http://marklogic.com/xdmp/tde collection automatically |
| ml_indexes_list | All configured range, element, and field indexes |
| ml_collections_list | Collections with document counts |
| ml_namespaces_list | XML namespace registry |
Optic (3 tools)
Row-based query engine over TDE views. Use for GROUP BY, aggregations, joins, and vector similarity search. Requires a TDE template in the Schemas database — verify with ml_views_list before calling ml_optic_query.
| Tool | Description |
|---|---|
| ml_optic_query | Execute a serialised Optic plan (fromView, fromSearch, join, group-by, etc.) |
| ml_vector_search | Find k nearest neighbours via cosine similarity over a TDE vec:vector column. MarkLogic 12+, no eval required. |
| ml_views_list | List all available TDE schema.view pairs with the collections they cover |
Eval (requires ML_ALLOW_EVAL=true)
Use as a last resort — ~10 KB script payload limit, no parallel batching.
| Tool | Description |
|---|---|
| ml_eval_xquery | Execute XQuery on the server |
| ml_eval_javascript | Execute Server-Side JavaScript |
| ml_invoke_module | Call a stored SJS/XQuery module |
| ml_sparql | Execute SPARQL via sem:sparql() XQuery; handles boilerplate automatically. Use instead of ml_eval_xquery when running SPARQL with sem: API features not available via ml_sparql_query. |
Graphs / SPARQL (4 tools)
Queries MarkLogic's triple store. Supports three storage patterns: embedded triples (co-located inside the source document as a sem:triples array), named graphs (standalone RDF documents), and hybrid (entity document + named graph for cross-entity relationships).
| Tool | Description |
|---|---|
| ml_sparql_query | SPARQL 1.1 SELECT/CONSTRUCT/ASK/DESCRIBE. SELECT and ASK return { head, results } JSON. CONSTRUCT and DESCRIBE return raw Turtle text. Supports embedded, named-graph, and hybrid triple patterns. |
| ml_graphs_list | List named graphs. Identifies managed-triple graphs that may be candidates for reprocessing into entity-oriented documents via flux_reprocess. |
| ml_graph_put (write) | Load Turtle, N-Triples, JSON-LD, or RDF/XML into a named graph via PUT/PATCH /v1/graphs. |
| ml_graph_delete (write) | Permanently delete a named graph and all its triples. |
Turtle prefix syntax: Prefixed local names cannot contain / in Turtle 1.0 (MarkLogic's parser). Use <http://full/uri> for subjects/objects whose IRI paths contain slashes, or define one prefix per entity type so local names are slash-free.
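One way to apply this rule when generating Turtle is to abbreviate an IRI only when the remaining local name is slash-free, and fall back to the full `<...>` form otherwise. The sketch below assumes a hypothetical `toTurtleTerm` helper and a deliberately conservative local-name pattern (letters, digits, underscore, hyphen, starting with a letter or underscore); it is not a complete Turtle serialiser.

```typescript
// prefixes maps a prefix label to its namespace IRI, e.g. { person: "http://example.org/person/" }.
function toTurtleTerm(iri: string, prefixes: Record<string, string>): string {
  for (const prefix of Object.keys(prefixes)) {
    const namespace = prefixes[prefix];
    if (iri.startsWith(namespace)) {
      const local = iri.slice(namespace.length);
      // Conservative check: anything with "/" (or other unusual characters)
      // is written as a full IRI instead of a prefixed name.
      if (/^[A-Za-z_][A-Za-z0-9_-]*$/.test(local)) {
        return `${prefix}:${local}`;
      }
    }
  }
  return `<${iri}>`;
}
```

With only a broad prefix such as `ex: <http://example.org/>`, the IRI `http://example.org/person/alice` stays in full form; adding a per-entity-type prefix for `http://example.org/person/` lets it shorten to `person:alice`.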
QuickSight Integration (4 tools)
| Tool | Description |
|---|---|
| ml_aggregate_query | Group-by + metrics → tabular rows for BI consumption |
| ml_timeseries_query | Date-bucketed aggregation (day/week/month/year) |
| ml_export_tabular | Export collection as CSV or JSON rows |
| ml_facets_query | Facet breakdowns for filter controls |
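To show what date-bucketed aggregation means in practice, the sketch below assigns ISO dates to day, week, month, or year buckets and counts rows per bucket. It is an illustration only: how ml_timeseries_query computes buckets server-side is not documented here, and the Monday-start week and UTC handling are assumptions of this example.

```typescript
type Bucket = "day" | "week" | "month" | "year";

const MS_PER_DAY = 24 * 60 * 60 * 1000;

// Returns a sortable bucket label for an ISO 8601 date, evaluated in UTC.
function bucketKey(isoDate: string, bucket: Bucket): string {
  const d = new Date(isoDate);
  if (isNaN(d.getTime())) {
    throw new Error(`invalid date: ${isoDate}`);
  }
  const iso = d.toISOString(); // e.g. 2024-03-15T00:00:00.000Z
  switch (bucket) {
    case "year":
      return iso.slice(0, 4);
    case "month":
      return iso.slice(0, 7);
    case "day":
      return iso.slice(0, 10);
    case "week": {
      // Assumption: weeks start on Monday and are labelled by that Monday's date.
      const daysSinceMonday = (d.getUTCDay() + 6) % 7;
      const monday = new Date(d.getTime() - daysSinceMonday * MS_PER_DAY);
      return monday.toISOString().slice(0, 10);
    }
  }
}

// Counts how many dates fall into each bucket.
function countByBucket(dates: string[], bucket: Bucket): Record<string, number> {
  const counts: Record<string, number> = {};
  for (const date of dates) {
    const key = bucketKey(date, bucket);
    counts[key] = (counts[key] ?? 0) + 1;
  }
  return counts;
}
```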
Performance (3 tools + 1 eval-gated)
| Tool | Description |
|---|---|
| ml_explain_optic | Get the execution plan for an Optic query without running it; shows join strategy and index usage |
| ml_search_query_plan | Run a search in debug mode to see the resolved CTS query structure and candidate estimate |
| ml_forest_metrics | Per-forest fragment counts, stand counts, deleted-fragment ratio, and merge status |
| ml_profile_query (requires ML_ALLOW_EVAL=true) | Profile XQuery, SJS, or SPARQL execution time and cache/filter metrics |
REST Extensions (5 tools)
| Tool | Description |
|---|---|
| ml_extension_list | List installed REST API extensions |
| ml_extension_get | Retrieve the source of an extension module |
| ml_extension_call | Call an extension endpoint with arbitrary method, params, and body |
| ml_extension_put (write) | Install or replace a REST extension module |
| ml_extension_delete (write) | Remove a REST extension module |
Flux (7 tools)
Flux is the preferred path for all bulk data operations. It runs as a subprocess via the MCP server host.
| Tool | Description |
|---|---|
| flux_import | Import from CSV, JSON, Parquet, Avro, JDBC, S3, or HTTP URL |
| flux_export | Export documents to file, S3, or JDBC target |
| flux_copy | Copy documents between databases |
| flux_reprocess | Re-run a transform over an existing collection |
| flux_preview | Preview import without writing to the database |
| flux_help | Get Flux subcommand flags and options |
| flux_status | Check Flux runner availability |
flux_import supports generate_tde: true to auto-create an Optic view from the imported collection in one call. flux_import also supports inline Semaphore classification at ingest via classify_with_semaphore: true, which attaches taxonomy categories to every imported document.
Semaphore (20 tools)
Semaphore is the Progress Data Platform taxonomy and classification engine. These tools manage the full lifecycle: load a SKOS vocabulary into KMM, configure the publisher, publish rules to the Classification Server (CLS), and classify content.
CLS (Classification Server) — port 5058
| Tool | Description |
|---|---|
| semaphore_status | Check CLS connectivity and version |
| semaphore_publish_sets | List active taxonomy rule sets loaded in the CLS |
| semaphore_classes | List classification class names in the active rulenet |
| semaphore_classify | Classify text against the loaded rulenet (exploratory / small-scale) |
| semaphore_cls_languages | List available language packs in the CLS (uses indexed codes like en1, not ISO codes) |
KMM / Studio (taxonomy authoring) — port 5080
| Tool | Description |
|---|---|
| semaphore_studio_status | Check KMM connectivity and authentication |
| semaphore_kmm_models_list | List all taxonomy models in KMM |
| semaphore_kmm_model_create | Create a new model container in KMM |
| semaphore_kmm_skos_load | Load a SKOS vocabulary from a public URL into a KMM model |
| semaphore_kmm_sparql | Query model content via SPARQL SELECT |
| semaphore_kmm_sparql_update | Run SPARQL INSERT/DELETE/LOAD to modify model triples |
| semaphore_kmm_model_delete | Permanently delete a KMM model and all its triples |
| semaphore_publish | Trigger an async KMM publish, which compiles the taxonomy into CLS rules |
| semaphore_publish_config_fix_plain_skos | Patch the publisher config for plain-SKOS vocabularies (skos:prefLabel, no SKOS-XL): adds GRAPH clause, switches to AllConcepts, bootstraps workspace automatically |
| semaphore_publish_diagnose | Diagnose publish failures by comparing KMM concept count vs CLS rule count and identifying the root cause |
Concept / Taxonomy Editing
| Tool | Description |
|---|---|
| semaphore_concept_search | Search for concepts across a KMM model by keyword (matches prefLabel, altLabel, hiddenLabel) |
| semaphore_concept_get | Retrieve full concept profile: all labels, broader/narrower hierarchy, related links, scopeNote |
| semaphore_concept_labels_update | Add or remove a single label on a concept; primary tool for classification quality tuning |
| semaphore_taxonomy_validate | Run SPARQL-based structural quality checks on a KMM model (hierarchy health, orphan detection, anti-patterns) |
| semaphore_taxonomy_scaffold | Generate a properly structured SKOS Turtle skeleton for a new taxonomy; output is ready to pass to semaphore_kmm_skos_load |
Plain-SKOS vocabularies (UNESCO, EuroVoc, AGROVOC, IPTC): run semaphore_publish_config_fix_plain_skos before semaphore_publish. Without it, the publisher generates only 1 CLS rule (for the ConceptScheme root) instead of one per concept. The root cause is that the publisher's SPARQL endpoint is a global store: each model's data lives in the named graph urn:x-evn-master:{ModelName} and is invisible without an explicit GRAPH clause. This tool adds the clause automatically.

Fully programmatic pipeline: The entire taxonomy workflow (create model, load SKOS, fix config, publish) runs via API with no Semaphore Studio interaction. The publisher workspace is initialised automatically on first publish. The only one-time global prerequisite is adding a CLS environment in Studio Admin once (Administration → Publisher → Classification Server Environments → Add); after that, semaphore_publish auto-discovers it for all future models.

Configuration: Set SEMAPHORE_HOST, SEMAPHORE_SCS_PORT (default 5058), SEMAPHORE_KMM_PORT (default 5080), SEMAPHORE_USERNAME, and SEMAPHORE_PASSWORD in the MCP server .env.
Resources Reference
| Resource URI | Description |
|---|---|
| marklogic://instructions | Problem-first decision guide that maps goals to native MarkLogic capabilities and tools. Read this at session start. |
| marklogic://databases | Live list of all databases in the cluster |
| marklogic://cluster/status | Cluster health and version |
| marklogic://forests | Forest list with status |
| marklogic://documents | Usage note for document access tools |
Prompts Reference
Query Planning
| Prompt | Purpose |
|---|---|
| query_approach_advisor | Choose between cts.search, Optic, or a hybrid approach for a query goal. Returns 6-section plan: classification, approach, prerequisites, query construction, performance notes, pitfalls. |
| problem_advisor | Map any natural-language goal to MarkLogic-native tools. Returns 6-section analysis: classification, native approach, discovery sequence, tool sequence, pitfalls, alternatives. |
| structured_query_builder | Natural language → MarkLogic structured query JSON |
| optic_query_builder | Requirements + schema/view → Optic API plan (SJS style) |
| sparql_query_builder | Natural language → SPARQL |
Code Generation
| Prompt | Purpose |
|---|---|
| xquery_function_generator | Generate XQuery with MarkLogic 12 idioms and namespace handling |
| sjs_module_generator | Generate SJS transforms, REST extensions, or library modules |
| tde_schema_generator | Generate a TDE JSON template from a collection and sample fields |
| rest_extension_generator | Scaffold a MarkLogic REST API extension with HTTP method handlers |
Import Design
| Prompt | Purpose |
|---|---|
| data_import_advisor | Choose the right import tool and strategy (always considers Flux first) |
| gdelt_import | Ready-to-run flux_import call for a GDELT 1.0 event export date |
Multi-Model Design
| Prompt | Purpose |
|---|---|
| data_modeling_advisor | Design a MarkLogic multi-model schema combining Documents, Triples, and Vectors. Returns 8-section plan: model selection, document design, triple design (entity-oriented pattern + managed-triples reprocess path), vector/embedding design, TDE schema, import sequence, query plan, pitfalls. |
QuickSight
| Prompt | Purpose |
|---|---|
| quicksight_dataset_designer | Design a QuickSight dataset sourced from MarkLogic: discovery, field mapping, aggregation strategy |
| quicksight_dashboard_planner | Plan a QuickSight dashboard from a business question |
Architecture
```
src/
  server.ts        # factory: createMcpServer() wires tools + resources + prompts
  index.ts         # CLI entry; selects stdio or HTTP transport
  tools/           # one file per domain; registerXxxTools() functions
    semaphore.ts   # 12 Semaphore tools (CLS + KMM taxonomy management)
  resources/       # static + dynamic resources; INSTRUCTIONS_TEXT decision guide
  prompts/         # all prompts; query_approach_advisor and problem_advisor first
  client/          # typed HTTP clients for each MarkLogic API surface
    semaphore.ts   # CLS XML API + KMM REST API + publisher workspace ZIP client
  config/          # dotenv loading and Zod validation
  transport/       # stdio and Express/HTTP transport wrappers
  utils/           # error formatting, digest auth, multipart builder
```
All write tools check readonly at registration time and are not registered when ML_READONLY=true. Eval tools check allowEval and are not registered when ML_ALLOW_EVAL=false. This means tools are absent from the MCP tool list entirely — they are never silently no-ops.
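The registration-time gating described above can be sketched as a filter over tool definitions. This is a simplified model, not the actual `registerXxxTools()` code: the `ToolDef` shape and `registeredTools` function are hypothetical, and the rule that eval tools are also skipped in read-only mode is taken from the Security Notes section of this README.

```typescript
type ToolKind = "read" | "write" | "eval";

// Hypothetical tool descriptor; the real server registers tools per domain file.
interface ToolDef {
  name: string;
  kind: ToolKind;
}

interface GateFlags {
  readonly: boolean;  // ML_READONLY
  allowEval: boolean; // ML_ALLOW_EVAL
}

// Returns the names of tools that would appear in the MCP tool list.
function registeredTools(tools: ToolDef[], flags: GateFlags): string[] {
  return tools
    .filter((tool) => {
      if (tool.kind === "write") return !flags.readonly;
      // Eval can call write APIs, so it needs allowEval AND a non-readonly server.
      if (tool.kind === "eval") return flags.allowEval && !flags.readonly;
      return true;
    })
    .map((tool) => tool.name);
}
```

Because filtered tools are never registered, a client listing tools simply does not see them, which is the "absent rather than silently no-op" behaviour described above.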
Development
```bash
npm run dev        # tsx watch: auto-reload on save
npm run build      # TypeScript → dist/
npm run typecheck  # Type check without emitting
npm test           # Vitest (skips gracefully if ML_HOST not set)
npm run inspector  # Launch MCP Inspector UI
```
AWS QuickSight Integration
QuickSight agents connect via the HTTP transport. Recommended pattern:
- Start the MCP server in HTTP mode (ECS task or EC2 accessible from QuickSight)
- Agent calls ml_schema_discover and ml_views_list to understand data shape
- Agent calls ml_export_tabular or ml_aggregate_query to extract data rows
- Agent uses the QuickSight API to create/refresh a SPICE dataset
- Use quicksight_dataset_designer prompt for guided step-by-step assistance
Security Notes
What ML_READONLY actually does
ML_READONLY=true (the default) is a tool-layer safety belt, not a credential-level restriction. When it is on:
- Write tools are not registered. ml_document_put/_delete/_patch, ml_tde_install, ml_graph_put/_delete, ml_search_options_put/_delete, ml_extension_put/_delete, ml_database_set_forests, and dhf_flow_run are absent from the server's tool list.
- Flux write subcommands refuse. flux_import/flux_copy/flux_reprocess return a structured UNSUPPORTED_IN_BUILD error. flux_export/flux_preview/flux_help/flux_status remain available (read-only).
- Eval tools are not registered. ml_eval_javascript/_xquery/_sparql, ml_invoke_module, ml_profile_query, and ml_force_merge are skipped entirely, even if ML_ALLOW_EVAL=true. Server-side eval can call any write API (xdmp.documentInsert, admin:database-create, sec:create-user, etc.), so allowing it alongside readonly would defeat the safety belt. The server logs a critical warning at startup when this combination is set, then disables eval.
What ML_READONLY does NOT do
The flag controls which tools this server registers. It does not restrict what the underlying MarkLogic user can do:
- The MCP server holds one set of MarkLogic credentials (ML_USERNAME/ML_PASSWORD). Those credentials have whatever MarkLogic roles the operator granted them. If the user is admin, that user can do anything against MarkLogic, whether via the Admin UI, the Management REST API, or any other process that finds the credentials on the host.
- The MCP server cannot prevent shell-level bypass. A user (or agent) with shell access to the host running the MCP server can read the credentials, write a separate Node/curl script that uses them, and call MarkLogic directly. The server is a single process; it does not control other processes on the same host.
A real-world example: an agent given ML_READONLY=true was asked to create a database. The MCP write tools were correctly unavailable. The agent then read the MCP server's source to learn the auth scheme, wrote a Node script that imported the same client classes, and ran it via node create-db.mjs — bypassing the server entirely. The database was created because the underlying user had admin privileges.
Recommended security posture
For defence in depth, both layers should be locked:
- Credential layer (most important). Create a MarkLogic role with only the privileges you actually need (typically just rest-reader and any application-specific read privileges; no rest-writer, no manage-admin, no any-uri/any-collection update). Create a user bound to that role. Set ML_USERNAME/ML_PASSWORD to those credentials. A read-only MarkLogic user makes bypass impossible regardless of what runs on the host.
- Tool layer. Keep ML_READONLY=true so the MCP server's tool surface is sealed. This is your protection against accidental writes from agents calling write tools by name.
- Host layer. Treat the credentials in the MCP server's environment as secrets. Don't run the server on a host that untrusted agents have shell access to.
Inspect the live posture
Read the marklogic://security resource at any time. It reports:
- Active config: readonly, allowEval, authType, username hint.
- Detected warnings, each with a code, severity, message, and remedy:
  - READONLY_DEFEATED_BY_EVAL (critical): readonly is on alongside allowEval (eval is auto-disabled; warning explains why).
  - READONLY_WITH_PRIVILEGED_USER (warning): the configured username looks like an admin account; tool-layer readonly does not provide credential-layer protection.
  - READONLY_POSTURE_OK (info): clean posture; verify the MarkLogic role is also read-only.
Critical and warning items are also logged at startup.
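The three warning codes can be thought of as the output of a small rule set over the active config. The sketch below is a guess at that logic under stated assumptions, not the server's code: the `Posture` shape is hypothetical, the "looks like an admin account" check is reduced to a simple substring heuristic, and the message and remedy fields are omitted for brevity.

```typescript
type Severity = "critical" | "warning" | "info";

interface PostureWarning {
  code: string;
  severity: Severity;
}

// Hypothetical view of the active config reported by the security resource.
interface Posture {
  readonly: boolean;
  allowEval: boolean;
  username: string;
}

// Assumption: a username containing "admin" is treated as privileged.
function looksPrivileged(username: string): boolean {
  return /admin/i.test(username);
}

function detectWarnings(posture: Posture): PostureWarning[] {
  const warnings: PostureWarning[] = [];
  // All three documented codes describe read-only mode, so nothing is
  // reported here when readonly is off.
  if (!posture.readonly) return warnings;

  if (posture.allowEval) {
    warnings.push({ code: "READONLY_DEFEATED_BY_EVAL", severity: "critical" });
  }
  if (looksPrivileged(posture.username)) {
    warnings.push({ code: "READONLY_WITH_PRIVILEGED_USER", severity: "warning" });
  }
  if (warnings.length === 0) {
    warnings.push({ code: "READONLY_POSTURE_OK", severity: "info" });
  }
  return warnings;
}
```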
Agent guidance
The marklogic://instructions resource includes explicit agent guidance: when ML_READONLY=true is set and a write operation is requested, the agent should refuse the operation rather than crafting shell scripts, curl invocations, or side-channel Node code to bypass the safety belt. This is published in the instructions so Claude / Copilot / other MCP clients pick it up.
Other relevant configuration
- MCP_API_KEY: set to require Bearer token auth on the HTTP transport.
- ML_AUTH_TYPE=oauth: Bearer tokens from MCP clients are forwarded directly to MarkLogic; the MCP server never sees credentials, only opaque tokens; MarkLogic enforces per-user RBAC via its own JWT validation. In oauth mode, per-user RBAC is your real readonly mechanism, so give each user only the roles they need.
- Credentials are read from environment variables only, never hardcoded.
- Digest auth recomputes the challenge per request, with no credential caching.
- The Flux runner executes on the MCP server host; http_url must be reachable from that host, not from the user's machine.
- In oauth mode, MCP_API_KEY gateway auth uses the X-MCP-Api-Key header to avoid conflicting with the Authorization: Bearer header used for the user token.
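The split between gateway auth and user-token passthrough can be sketched as a single decision function over the incoming request headers. This is an illustrative model only: the `GatewayConfig`, `AuthDecision`, and `checkRequest` names are hypothetical, header names are assumed to be lower-cased already, and a production implementation would use a constant-time comparison for the API key.

```typescript
interface GatewayConfig {
  authType: "digest" | "basic" | "oauth";
  apiKey?: string; // MCP_API_KEY, if set
}

// Incoming headers with lower-cased names (an assumption of this sketch).
type RequestHeaders = Record<string, string | undefined>;

interface AuthDecision {
  allowed: boolean;
  // In oauth mode, the client's header that would be forwarded to MarkLogic unchanged.
  forwardAuthorization?: string;
}

function checkRequest(headers: RequestHeaders, config: GatewayConfig): AuthDecision {
  const authorization = headers["authorization"];

  if (config.authType === "oauth") {
    // Gateway key travels in X-MCP-Api-Key so Authorization stays free for the user JWT.
    if (config.apiKey !== undefined && headers["x-mcp-api-key"] !== config.apiKey) {
      return { allowed: false };
    }
    if (!authorization || !authorization.startsWith("Bearer ")) {
      return { allowed: false };
    }
    return { allowed: true, forwardAuthorization: authorization };
  }

  // digest / basic: the Bearer token, when required, is the MCP API key itself.
  if (config.apiKey !== undefined && authorization !== `Bearer ${config.apiKey}`) {
    return { allowed: false };
  }
  return { allowed: true };
}
```

In oauth mode the decision carries the original Authorization value forward, which matches the passthrough behaviour described above; in the other modes MarkLogic credentials come from the server's own environment, so nothing is forwarded.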