Unstructured API MCP Server

An MCP server implementation for interacting with the Unstructured API, providing tools to list, create, update, and manage sources, destinations, and workflows.

Tools

create_s3_source

Create an S3 source connector.
Args:
- name: A unique name for this connector
- remote_url: The S3 URI to the bucket or folder (e.g., s3://my-bucket/)
- recursive: Whether to access subfolders within the bucket
Returns: String containing the created source connector information
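
As an illustration, a call to create_s3_source might pass arguments like the following (the connector name and bucket URI are placeholders):

{
  "name": "my-s3-source",
  "remote_url": "s3://my-bucket/documents/",
  "recursive": true
}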

update_s3_source

Update an S3 source connector.
Args:
- source_id: ID of the source connector to update
- remote_url: The S3 URI to the bucket or folder
- recursive: Whether to access subfolders within the bucket
Returns: String containing the updated source connector information

delete_s3_source

Delete an S3 source connector.
Args:
- source_id: ID of the source connector to delete
Returns: String containing the result of the deletion

create_azure_source

Create an Azure source connector.
Args:
- name: A unique name for this connector
- remote_url: The Azure Storage remote URL, in the format az://<container-name>/<path/to/file/or/folder/in/container/as/needed>
- recursive: Whether to access subfolders within the container
Returns: String containing the created source connector information

update_azure_source

Update an Azure source connector.
Args:
- source_id: ID of the source connector to update
- remote_url: The Azure Storage remote URL, in the format az://<container-name>/<path/to/file/or/folder/in/container/as/needed>
- recursive: Whether to access subfolders within the container
Returns: String containing the updated source connector information

delete_azure_source

Delete an Azure source connector.
Args:
- source_id: ID of the source connector to delete
Returns: String containing the result of the deletion

create_gdrive_source

Create a Google Drive (gdrive) source connector.
Args:
- name: A unique name for this connector
- remote_url: The gdrive URI to the bucket or folder (e.g., gdrive://my-bucket/)
- recursive: Whether to access subfolders within the bucket
Returns: String containing the created source connector information

update_gdrive_source

Update a Google Drive (gdrive) source connector.
Args:
- source_id: ID of the source connector to update
- remote_url: The gdrive URI to the bucket or folder
- recursive: Whether to access subfolders within the bucket
Returns: String containing the updated source connector information

delete_gdrive_source

Delete a Google Drive (gdrive) source connector.
Args:
- source_id: ID of the source connector to delete
Returns: String containing the result of the deletion

create_s3_destination

Create an S3 destination connector.
Args:
- name: A unique name for this connector
- remote_url: The S3 URI to the bucket or folder
- key: The AWS access key ID
- secret: The AWS secret access key
- token: The AWS STS session token for temporary access (optional)
- endpoint_url: Custom URL if connecting to a non-AWS S3 bucket
Returns: String containing the created destination connector information

update_s3_destination

Update an S3 destination connector.
Args:
- destination_id: ID of the destination connector to update
- remote_url: The S3 URI to the bucket or folder
Returns: String containing the updated destination connector information

delete_s3_destination

Delete an S3 destination connector.
Args:
- destination_id: ID of the destination connector to delete
Returns: String containing the result of the deletion

create_weaviate_destination

Create a Weaviate vector database destination connector.
Args:
- cluster_url: URL of the Weaviate cluster
- collection: Name of the collection to use in the Weaviate cluster
Note: A collection is a table in the Weaviate cluster. The platform has dedicated code to generate collections for users; to keep this server simple, it does not generate one for you.
Returns: String containing the created destination connector information

update_weaviate_destination

Update a Weaviate destination connector.
Args:
- destination_id: ID of the destination connector to update
- cluster_url: URL of the Weaviate cluster (optional)
- collection: Name of the collection to use in the Weaviate cluster (optional)
Returns: String containing the updated destination connector information

delete_weaviate_destination

Delete a Weaviate destination connector.
Args:
- destination_id: ID of the destination connector to delete
Returns: String containing the result of the deletion

create_astradb_destination

Create an AstraDB destination connector.
Args:
- name: A unique name for this connector
- collection_name: The name of the collection to use
- keyspace: The AstraDB keyspace
- batch_size: The batch size for inserting documents, must be positive (default: 20)
Note: A collection in AstraDB is a schemaless document store optimized for NoSQL workloads, equivalent to a table in traditional databases. A keyspace is the top-level namespace in AstraDB that groups multiple collections. You must create your own collection and keyspace before creating the connector.
Returns: String containing the created destination connector information
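
As a sketch, the arguments for create_astradb_destination could look like this (all names are placeholders; the collection and keyspace must already exist in your AstraDB instance):

{
  "name": "my-astradb-destination",
  "collection_name": "my_collection",
  "keyspace": "my_keyspace",
  "batch_size": 20
}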

update_astradb_destination

Update an AstraDB destination connector.
Args:
- destination_id: ID of the destination connector to update
- collection_name: The name of the collection to use (optional)
- keyspace: The AstraDB keyspace (optional)
- batch_size: The batch size for inserting documents (optional)
Note: You must create your own collection and keyspace before creating the connector.
Returns: String containing the updated destination connector information

delete_astradb_destination

Delete an AstraDB destination connector.
Args:
- destination_id: ID of the destination connector to delete
Returns: String containing the result of the deletion

create_neo4j_destination

Create a Neo4j destination connector.
Args:
- name: A unique name for this connector
- database: The Neo4j database, e.g., "neo4j"
- uri: The Neo4j URI, e.g., neo4j+s://<neo4j_instance_id>.databases.neo4j.io
- username: The Neo4j username
Returns: String containing the created destination connector information

update_neo4j_destination

Update a Neo4j destination connector.
Args:
- destination_id: ID of the destination connector to update
- database: The Neo4j database, e.g., "neo4j"
- uri: The Neo4j URI, e.g., neo4j+s://<neo4j_instance_id>.databases.neo4j.io
- username: The Neo4j username
Returns: String containing the updated destination connector information

delete_neo4j_destination

Delete a Neo4j destination connector.
Args:
- destination_id: ID of the destination connector to delete
Returns: String containing the result of the deletion

invoke_firecrawl_crawlhtml

Start an asynchronous web crawl job using Firecrawl to retrieve HTML content.
Args:
- url: URL to crawl
- s3_uri: S3 URI where results will be uploaded
- limit: Maximum number of pages to crawl (default: 100)
Returns: Dictionary with crawl job information including the job ID

check_crawlhtml_status

Check the status of an existing Firecrawl HTML crawl job.
Args:
- crawl_id: ID of the crawl job to check
Returns: Dictionary containing the current status of the crawl job

invoke_firecrawl_llmtxt

Start an asynchronous llmfull.txt generation job using Firecrawl. This file is a standardized markdown file containing information to help LLMs use a website at inference time. The llmstxt endpoint leverages Firecrawl to crawl your website and extracts data using gpt-4o-mini.
Args:
- url: URL to crawl
- s3_uri: S3 URI where results will be uploaded
- max_urls: Maximum number of pages to crawl (1-100, default: 10)
Returns: Dictionary with job information including the job ID

check_llmtxt_status

Check the status of an existing llmfull.txt generation job.
Args:
- job_id: ID of the llmfull.txt generation job to check
Returns: Dictionary containing the current status of the job and text content if completed

cancel_crawlhtml_job

Cancel an in-progress Firecrawl HTML crawl job.
Args:
- crawl_id: ID of the crawl job to cancel
Returns: Dictionary containing the result of the cancellation

list_sources

List available sources from the Unstructured API.
Args:
- source_type: Optional source connector type to filter by
Returns: String containing the list of sources

get_source_info

Get detailed information about a specific source connector.
Args:
- source_id: ID of the source connector to get information for; should be a valid UUID
Returns: String containing the source connector information

list_destinations

List available destinations from the Unstructured API.
Args:
- destination_type: Optional destination connector type to filter by
Returns: String containing the list of destinations

get_destination_info

Get detailed information about a specific destination connector.
Args:
- destination_id: ID of the destination connector to get information for
Returns: String containing the destination connector information

list_workflows

List workflows from the Unstructured API.
Args:
- destination_id: Optional destination connector ID to filter by
- source_id: Optional source connector ID to filter by
- status: Optional workflow status to filter by
Returns: String containing the list of workflows

get_workflow_info

Get detailed information about a specific workflow.
Args:
- workflow_id: ID of the workflow to get information for
Returns: String containing the workflow information

create_workflow

Create a new workflow.

Args:
- workflow_config: A typed dictionary containing the required fields destination_id (should be a valid UUID), name, source_id (should be a valid UUID), and workflow_type, plus the optional fields schedule and workflow_nodes. Note that workflow_nodes is only enabled when workflow_type is `custom`, and it is a list of WorkflowNodeTypedDict entries: partition, prompter, chunk, embed.

Below is an example of a partition workflow node:

{
  "name": "vlm-partition",
  "type": "partition",
  "sub_type": "vlm",
  "settings": {
    "provider": "your favorite provider",
    "model": "your favorite model"
  }
}

Returns: String containing the created workflow information

Custom workflow DAG nodes

- If workflow_type is set to custom, you must also specify the settings for the workflow's directed acyclic graph (DAG) nodes. These node settings are specified in the workflow_nodes array.
- A Source node is automatically created when you specify the source_id value outside of the workflow_nodes array.
- A Destination node is automatically created when you specify the destination_id value outside of the workflow_nodes array.
- You can specify Partitioner, Chunker, Prompter, and Embedder nodes.
- The order of the nodes in the workflow_nodes array is the order in which these nodes appear in the DAG, with the first node in the array added directly after the Source node. The Destination node follows the last node in the array.
- Be sure to specify nodes in an allowed order. The following DAG placements are all allowed:
  - Source -> Partitioner -> Destination
  - Source -> Partitioner -> Chunker -> Destination
  - Source -> Partitioner -> Chunker -> Embedder -> Destination
  - Source -> Partitioner -> Prompter -> Chunker -> Destination
  - Source -> Partitioner -> Prompter -> Chunker -> Embedder -> Destination

Partitioner node

A Partitioner node has a type of partition and a subtype of auto, vlm, hi_res, or fast. Examples:

- auto strategy:

{
  "name": "Partitioner",
  "type": "partition",
  "subtype": "vlm",
  "settings": {
    "provider": "anthropic",  (required)
    "model": "claude-3-5-sonnet-20241022",  (required)
    "output_format": "text/html",
    "user_prompt": null,
    "format_html": true,
    "unique_element_ids": true,
    "is_dynamic": true,
    "allow_fast": true
  }
}

- vlm strategy: Allowed values are provider and model. Examples:
  - "provider": "anthropic", "model": "claude-3-5-sonnet-20241022"
  - "provider": "openai", "model": "gpt-4o"

- hi_res strategy:

{
  "name": "Partitioner",
  "type": "partition",
  "subtype": "unstructured_api",
  "settings": {
    "strategy": "hi_res",
    "include_page_breaks": <true|false>,
    "pdf_infer_table_structure": <true|false>,
    "exclude_elements": ["<element-name>", "<element-name>"],
    "xml_keep_tags": <true|false>,
    "encoding": "<encoding>",
    "ocr_languages": ["<language>", "<language>"],
    "extract_image_block_types": ["image", "table"],
    "infer_table_structure": <true|false>
  }
}

- fast strategy:

{
  "name": "Partitioner",
  "type": "partition",
  "subtype": "unstructured_api",
  "settings": {
    "strategy": "fast",
    "include_page_breaks": <true|false>,
    "pdf_infer_table_structure": <true|false>,
    "exclude_elements": ["<element-name>", "<element-name>"],
    "xml_keep_tags": <true|false>,
    "encoding": "<encoding>",
    "ocr_languages": ["<language-code>", "<language-code>"],
    "extract_image_block_types": ["image", "table"],
    "infer_table_structure": <true|false>
  }
}

Chunker node

A Chunker node has a type of chunk and a subtype of chunk_by_character or chunk_by_title.

- chunk_by_character:

{
  "name": "Chunker",
  "type": "chunk",
  "subtype": "chunk_by_character",
  "settings": {
    "include_orig_elements": <true|false>,
    "new_after_n_chars": <new-after-n-chars>,  (required; if not provided, set to the same value as max_characters)
    "max_characters": <max-characters>,  (required)
    "overlap": <overlap>,  (required; if not provided, default to 0)
    "overlap_all": <true|false>,
    "contextual_chunking_strategy": "v1"
  }
}

- chunk_by_title:

{
  "name": "Chunker",
  "type": "chunk",
  "subtype": "chunk_by_title",
  "settings": {
    "multipage_sections": <true|false>,
    "combine_text_under_n_chars": <combine-text-under-n-chars>,
    "include_orig_elements": <true|false>,
    "new_after_n_chars": <new-after-n-chars>,  (required; if not provided, set to the same value as max_characters)
    "max_characters": <max-characters>,  (required)
    "overlap": <overlap>,  (required; if not provided, default to 0)
    "overlap_all": <true|false>,
    "contextual_chunking_strategy": "v1"
  }
}

Prompter node

A Prompter node has a type of prompter and a subtype of:
- openai_image_description
- anthropic_image_description
- bedrock_image_description
- vertexai_image_description
- openai_table_description
- anthropic_table_description
- bedrock_table_description
- vertexai_table_description
- openai_table2html
- openai_ner

Example:

{
  "name": "Prompter",
  "type": "prompter",
  "subtype": "<subtype>",
  "settings": {}
}

Embedder node

An Embedder node has a type of embed. Allowed values for subtype and model_name include:

- "subtype": "azure_openai"
  - "model_name": "text-embedding-3-small"
  - "model_name": "text-embedding-3-large"
  - "model_name": "text-embedding-ada-002"
- "subtype": "bedrock"
  - "model_name": "amazon.titan-embed-text-v2:0"
  - "model_name": "amazon.titan-embed-text-v1"
  - "model_name": "amazon.titan-embed-image-v1"
  - "model_name": "cohere.embed-english-v3"
  - "model_name": "cohere.embed-multilingual-v3"
- "subtype": "togetherai"
  - "model_name": "togethercomputer/m2-bert-80M-2k-retrieval"
  - "model_name": "togethercomputer/m2-bert-80M-8k-retrieval"
  - "model_name": "togethercomputer/m2-bert-80M-32k-retrieval"

Example:

{
  "name": "Embedder",
  "type": "embed",
  "subtype": "<subtype>",
  "settings": {
    "model_name": "<model-name>"
  }
}
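
Putting the pieces together, a workflow_config for a custom workflow assembled from the node examples above might look like the following sketch (IDs and settings values are placeholders, and the node list follows the allowed Source -> Partitioner -> Chunker -> Destination order):

{
  "name": "my-custom-workflow",
  "source_id": "<source-connector-uuid>",
  "destination_id": "<destination-connector-uuid>",
  "workflow_type": "custom",
  "workflow_nodes": [
    {
      "name": "Partitioner",
      "type": "partition",
      "subtype": "vlm",
      "settings": {
        "provider": "anthropic",
        "model": "claude-3-5-sonnet-20241022"
      }
    },
    {
      "name": "Chunker",
      "type": "chunk",
      "subtype": "chunk_by_title",
      "settings": {
        "max_characters": 1500,
        "new_after_n_chars": 1500,
        "overlap": 0
      }
    }
  ]
}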

run_workflow

Run a specific workflow.
Args:
- workflow_id: ID of the workflow to run
Returns: String containing the response from the workflow execution

update_workflow

Update an existing workflow.

Args:
- workflow_id: ID of the workflow to update
- workflow_config: A typed dictionary containing the required fields destination_id, name, source_id, and workflow_type, plus the optional fields schedule and workflow_nodes.

Returns: String containing the updated workflow information

Custom workflow DAG nodes

If workflow_type is set to custom, you must also specify the settings for the workflow's directed acyclic graph (DAG) nodes in the workflow_nodes array. The rules for the automatically created Source and Destination nodes, the allowed node orders, and the Partitioner, Chunker, Prompter, and Embedder node formats are identical to those documented for create_workflow above.

delete_workflow

Delete a specific workflow.
Args:
- workflow_id: ID of the workflow to delete
Returns: String containing the response from the workflow deletion

list_jobs

List jobs via the Unstructured API.
Args:
- workflow_id: Optional workflow ID to filter by
- status: Optional job status to filter by
Returns: String containing the list of jobs

get_job_info

Get detailed information about a specific job.
Args:
- job_id: ID of the job to get information for
Returns: String containing the job information

cancel_job

Cancel a specific job.
Args:
- job_id: ID of the job to cancel
Returns: String containing the response from the job cancellation

README

Unstructured API MCP Server

An MCP server implementation for interacting with the Unstructured API. This server provides tools to list sources and workflows.

Available Tools

| Tool | Description |
|------|-------------|
| list_sources | Lists available sources from the Unstructured API. |
| get_source_info | Get detailed information about a specific source connector. |
| create_[connector]_source | Create a source connector. We currently support S3, Google Drive, and Azure connectors (more to come!). |
| update_[connector]_source | Update an existing source connector by params. |
| delete_[connector]_source | Delete a source connector by source id. |
| list_destinations | Lists available destinations from the Unstructured API. |
| get_destination_info | Get detailed information about a specific destination connector. We currently support S3, Weaviate, AstraDB, Neo4j, and MongoDB (more to come!). |
| create_[connector]_destination | Create a destination connector by params. |
| update_[connector]_destination | Update an existing destination connector by destination id. |
| delete_[connector]_destination | Delete a destination connector by destination id. |
| list_workflows | Lists workflows from the Unstructured API. |
| get_workflow_info | Get detailed information about a specific workflow. |
| create_workflow | Create a new workflow with source, destination ids, etc. |
| run_workflow | Run a specific workflow with workflow id. |
| update_workflow | Update an existing workflow by params. |
| delete_workflow | Delete a specific workflow by id. |
| list_jobs | Lists jobs for a specific workflow from the Unstructured API. |
| get_job_info | Get detailed information about a specific job by job id. |
| cancel_job | Cancel a specific job by id. |

Below is a list of the connectors the UNS-MCP server currently supports. See the full list of source connectors that the Unstructured platform supports here and the full list of destinations here. We plan to add more!

| Source | Destination |
|--------|-------------|
| S3 | S3 |
| Azure | Weaviate |
| Google Drive | Pinecone |
| OneDrive | AstraDB |
| Salesforce | MongoDB |
| Sharepoint | Neo4j |
| | Databricks Volumes |
| | Databricks Volumes Delta Table |

To use the tools that create, update, or delete a connector, the credentials for that specific connector must be defined in your .env file. Below is the list of credentials for the connectors we support:

| Credential Name | Description |
|-----------------|-------------|
| ANTHROPIC_API_KEY | Required to run the minimal_client to interact with our server. |
| AWS_KEY, AWS_SECRET | Required to create an S3 connector via the uns-mcp server; see the documentation here for how to set them up. |
| WEAVIATE_CLOUD_API_KEY | Required to create a Weaviate vector database connector; see the documentation for how to set it up. |
| FIRECRAWL_API_KEY | Required to use the Firecrawl tools in external/firecrawl.py; sign up on Firecrawl and get an API key. |
| ASTRA_DB_APPLICATION_TOKEN, ASTRA_DB_API_ENDPOINT | Required to create an AstraDB connector via the uns-mcp server; see the documentation for how to set them up. |
| AZURE_CONNECTION_STRING | Required option 1 for creating an Azure connector via the uns-mcp server; see the documentation for how to set it up. |
| AZURE_ACCOUNT_NAME + AZURE_ACCOUNT_KEY | Required option 2 for creating an Azure connector via the uns-mcp server; see the documentation for how to set them up. |
| AZURE_ACCOUNT_NAME + AZURE_SAS_TOKEN | Required option 3 for creating an Azure connector via the uns-mcp server; see the documentation for how to set them up. |
| NEO4J_PASSWORD | Required to create a Neo4j connector via the uns-mcp server; see the documentation for how to set it up. |
| MONGO_DB_CONNECTION_STRING | Required to create a MongoDB connector via the uns-mcp server; see the documentation for how to set it up. |
| GOOGLEDRIVE_SERVICE_ACCOUNT_KEY | A string value. The original service account key (follow the documentation) is in a JSON file; run `cat /path/to/google_service_account_key.json` in a terminal. |
| DATABRICKS_CLIENT_ID, DATABRICKS_CLIENT_SECRET | Required to create a Databricks volume/delta table connector via the uns-mcp server; see the documentation here for how to set them up. |
| ONEDRIVE_CLIENT_ID, ONEDRIVE_CLIENT_CRED, ONEDRIVE_TENANT_ID | Required to create a OneDrive connector via the uns-mcp server; see the documentation for how to set them up. |
| PINECONE_API_KEY | Required to create a Pinecone vector database connector via the uns-mcp server; see the documentation for how to set it up. |
| SALESFORCE_CONSUMER_KEY, SALESFORCE_PRIVATE_KEY | Required to create a Salesforce source connector via the uns-mcp server; see the documentation. |
| SHAREPOINT_CLIENT_ID, SHAREPOINT_CLIENT_CRED, SHAREPOINT_TENANT_ID | Required to create a SharePoint connector via the uns-mcp server; see the documentation for how to set them up. |
| LOG_LEVEL | Used to set the logging level for the minimal_client, e.g., set it to ERROR to get everything. |
| CONFIRM_TOOL_USE | Set to true so that the minimal_client confirms execution before each tool call. |
| DEBUG_API_REQUESTS | Set to true so that uns_mcp/server.py outputs request parameters for better debugging. |
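
For example, a minimal .env for a pipeline that reads from S3 and writes to Weaviate might contain the following entries (all values are placeholders):

UNSTRUCTURED_API_KEY="<your-unstructured-api-key>"
ANTHROPIC_API_KEY="<your-anthropic-api-key>"
AWS_KEY="<your-aws-access-key-id>"
AWS_SECRET="<your-aws-secret-access-key>"
WEAVIATE_CLOUD_API_KEY="<your-weaviate-api-key>"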

Firecrawl Source

Firecrawl is a web crawling API that provides two main capabilities in our MCP:

  1. HTML Content Retrieval: Use invoke_firecrawl_crawlhtml to start crawl jobs and check_crawlhtml_status to monitor them
  2. LLM-Optimized Text Generation: Use invoke_firecrawl_llmtxt to generate text and check_llmtxt_status to retrieve results

How Firecrawl works:

Web Crawling Process:

  • Starts with a specified URL and analyzes it to identify links
  • Uses the sitemap if available; otherwise follows links found on the website
  • Recursively traverses each link to discover all subpages
  • Gathers content from every visited page, handling JavaScript rendering and rate limits
  • Jobs can be cancelled with cancel_crawlhtml_job if needed
  • Use this if you need to extract all of the information as raw HTML; Unstructured's workflow cleans it up really well :smile:

LLM Text Generation:

  • After crawling, extracts clean, meaningful text content from the crawled pages
  • Generates an optimized text format specifically tailored for large language models
  • Results are automatically uploaded to the specified S3 location
  • Note: LLM text generation jobs cannot be cancelled once started. The cancel_llmtxt_job function is provided for consistency, but it is not currently supported by the Firecrawl API.

Note: The FIRECRAWL_API_KEY environment variable must be set to use these functions.
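
As a sketch, a crawl could be started and then monitored with arguments like these (the URL and S3 URI are placeholders):

invoke_firecrawl_crawlhtml:
{
  "url": "https://docs.example.com",
  "s3_uri": "s3://my-bucket/crawled-html/",
  "limit": 50
}

check_crawlhtml_status:
{
  "crawl_id": "<job id returned by invoke_firecrawl_crawlhtml>"
}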

Installation and Configuration

This guide provides step-by-step instructions to set up and configure the UNS_MCP server using Python 3.12 and the uv tool.

Prerequisites

  • Python 3.12+
  • uv for environment management
  • An API key from Unstructured. You can sign up and obtain your API key here.

Using uv (Recommended)

No additional installation is required when using uvx, since it handles execution. However, if you prefer to install the package directly:

uv pip install uns_mcp

Configuring Claude Desktop

To integrate with Claude Desktop, add the following to your claude_desktop_config.json:

Note: The file is located in the ~/Library/Application Support/Claude/ directory.

Using the uvx command:

{
   "mcpServers": {
      "UNS_MCP": {
         "command": "uvx",
         "args": ["uns_mcp"],
         "env": {
           "UNSTRUCTURED_API_KEY": "<your-key>"
         }
      }
   }
}

Alternatively, using the Python package:

{
   "mcpServers": {
      "UNS_MCP": {
         "command": "python",
         "args": ["-m", "uns_mcp"],
         "env": {
           "UNSTRUCTURED_API_KEY": "<your-key>"
         }
      }
   }
}

Using the Source Code

  1. Clone the repository.

  2. Install dependencies:

    uv sync

  3. Set your Unstructured API key as an environment variable. Create a .env file in the root directory with the following content:

    UNSTRUCTURED_API_KEY="YOUR_KEY"

    See .env.template for the environment variables that can be configured.

You can now run the server using one of the following methods:

<details> <summary> Using an editable package installation </summary> Install as an editable package:

uvx pip install -e .

Update your Claude Desktop config:

{
  "mcpServers": {
    "UNS_MCP": {
      "command": "uvx",
      "args": ["uns_mcp"]
    }
  }
}

Note: Remember to point to the uvx executable in the environment where the package is installed.

</details>

<details> <summary> Using the SSE Server Protocol </summary>

Note: Not supported by Claude Desktop.

With the SSE protocol, you can debug more easily by decoupling the client and server:

  1. Start the server in one terminal:

    uv run python uns_mcp/server.py --host 127.0.0.1 --port 8080
    # or
    make sse-server

  2. Test the server with a local client in another terminal:

    uv run python minimal_client/client.py "http://127.0.0.1:8080/sse"
    # or
    make sse-client

Note: To stop the services, use Ctrl+C on the client first, then on the server. </details>

<details> <summary> Using the Stdio Server Protocol </summary>

Configure Claude Desktop to use stdio:

{
  "mcpServers": {
    "UNS_MCP": {
      "command": "ABSOLUTE/PATH/TO/.local/bin/uv",
      "args": [
        "--directory",
        "ABSOLUTE/PATH/TO/YOUR-UNS-MCP-REPO/uns_mcp",
        "run",
        "server.py"
      ]
    }
  }
}

Alternatively, run the local client:

uv run python minimal_client/client.py uns_mcp/server.py

</details>

Additional Local Client Configuration

Configure the minimal client using environment variables:

  • LOG_LEVEL="ERROR": Set this to suppress the LLM's debug output and show clean messages to the user.
  • CONFIRM_TOOL_USE='false': Disables tool-use confirmation before execution. Use with caution, especially during development, because the LLM may run expensive workflows or delete data.

Debugging Tools

Anthropic provides the MCP Inspector tool to debug and test your MCP server. Run the following command to launch the debugging UI. From there, you can add environment variables (pointing to your local environment) in the left pane; include your personal API key there as an env var. Go to tools, where you can test out the capabilities added to the MCP server.

mcp dev uns_mcp/server.py

If you need to log the request parameters sent to UnstructuredClient, set the environment variable DEBUG_API_REQUESTS=true. The logs are stored in a file named unstructured-client-{date}.log, which can be inspected to debug the request parameters passed to UnstructuredClient functions.
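
For example, when running the SSE server from source, request logging could be enabled for a single run like this (assuming a POSIX shell):

DEBUG_API_REQUESTS=true uv run python uns_mcp/server.py --host 127.0.0.1 --port 8080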

Adding Terminal Access to the Minimal Client

We are going to use @wonderwhy-er/desktop-commander to add terminal access to the minimal client. It is built on top of the MCP Filesystem Server. Be careful, because the client (and the LLM) will now have access to private files.

Execute the following command to install the package:

npx @wonderwhy-er/desktop-commander setup

Then start the client with an extra parameter:

uv run python minimal_client/client.py "http://127.0.0.1:8080/sse" "@wonderwhy-er/desktop-commander"
# or
make sse-client-terminal

Using a Subset of Tools

If your client supports using only a subset of tools, be aware of the following:

  • The update_workflow tool must be loaded into context together with the create_workflow tool, because it contains detailed instructions on how to create and configure custom nodes.

Known Issues

  • update_workflow - needs to have the config of the workflow it is updating in context, either provided by the user or fetched by calling the get_workflow_info tool, because this tool does not work as a patch applier; it completely replaces the workflow config.
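
In practice this means an update should send the complete configuration, not just the changed fields. A sketch of such a call (IDs and values are placeholders):

workflow_id: "<workflow-uuid>"
workflow_config:
{
  "name": "my-custom-workflow",
  "source_id": "<source-connector-uuid>",
  "destination_id": "<destination-connector-uuid>",
  "workflow_type": "custom",
  "workflow_nodes": [
    "<the full node list from get_workflow_info, with only the desired fields changed>"
  ]
}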

CHANGELOG.md

Any newly developed features, fixes, or enhancements will be added to CHANGELOG.md. The 0.x.x-dev pre-release format is preferred until we bump to a stable version.

Troubleshooting

  • If you encounter the error Error: spawn <command> ENOENT, it means <command> is not installed or not visible in your PATH:
    • Make sure to install it and add it to your PATH.
    • Or provide the absolute path to the command in the command field of your config. For example, replace python with /opt/miniconda3/bin/python.
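
For example, the Python-package configuration shown earlier could be pinned to an absolute interpreter path (the path is illustrative):

{
  "mcpServers": {
    "UNS_MCP": {
      "command": "/opt/miniconda3/bin/python",
      "args": ["-m", "uns_mcp"],
      "env": {
        "UNSTRUCTURED_API_KEY": "<your-key>"
      }
    }
  }
}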
