LangExtract MCP Server

A FastMCP server that enables AI assistants to extract structured information from unstructured text using Google's langextract library through a secure, optimized Model Context Protocol interface.

Tools

extract_from_text

Extracts structured information from unstructured text using Large Language Models, guided by user-defined instructions and examples. Each extraction is mapped to its exact location in the source text for precise source grounding.

Args:
  • text - The text to extract information from
  • prompt_description - Clear instructions for what to extract
  • examples - List of example extractions to guide the model
  • config - Configuration parameters for the extraction

Returns: Dictionary containing extracted entities with source locations and metadata

Raises: ToolError if extraction fails due to invalid parameters or API issues

extract_from_url

Downloads text from the specified URL and extracts structured information from it using Large Language Models. Ideal for processing web articles, documents, or any text content accessible via HTTP/HTTPS.

Args:
  • url - URL to download text from (must start with http:// or https://)
  • prompt_description - Clear instructions for what to extract
  • examples - List of example extractions to guide the model
  • config - Configuration parameters for the extraction

Returns: Dictionary containing extracted entities with source locations and metadata

Raises: ToolError if the URL is invalid, the download fails, or extraction fails

save_extraction_results

Saves extraction results in JSONL (JSON Lines) format, which is commonly used for structured data and can be loaded later for visualization or further processing.

Args:
  • extraction_results - Results from extract_from_text or extract_from_url
  • output_name - Name for the output file (without the .jsonl extension)
  • output_dir - Directory to save the file (default: current directory)

Returns: Dictionary with the file path and save confirmation

Raises: ToolError if the save operation fails
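The JSONL layout is simple to reproduce: one JSON object per line. A minimal sketch of the round trip (the `save_jsonl` and `load_jsonl` helpers and the record shape are illustrative, not the server's actual implementation):

```python
import json

def save_jsonl(records, path):
    """Write one JSON object per line (JSON Lines format)."""
    with open(path, "w", encoding="utf-8") as f:
        for record in records:
            f.write(json.dumps(record, ensure_ascii=False) + "\n")

def load_jsonl(path):
    """Read a JSONL file back into a list of dicts."""
    with open(path, encoding="utf-8") as f:
        return [json.loads(line) for line in f if line.strip()]

results = [
    {"extraction_class": "medication", "extraction_text": "amoxicillin"},
    {"extraction_class": "dosage", "extraction_text": "500mg"},
]
save_jsonl(results, "extractions.jsonl")
assert load_jsonl("extractions.jsonl") == results
```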

generate_visualization

Creates an interactive, self-contained HTML file that shows extracted entities highlighted in their original text context, with color coding and hover details, and can handle thousands of entities.

Args:
  • jsonl_file_path - Path to the JSONL file containing extraction results
  • output_html_path - Optional path for the HTML output (default: auto-generated)

Returns: Dictionary with the HTML file path and generation details

Raises: ToolError if visualization generation fails
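The core of such a visualization can be sketched in a few lines: wrap each extraction's character span in a `<mark>` tag, working right to left so earlier offsets stay valid. This is a simplified illustration assuming half-open spans, not the server's actual renderer:

```python
import html

def highlight(text, extractions):
    """Wrap each extraction span in a <mark> tag, processing spans
    right-to-left so earlier character offsets remain valid."""
    out = text
    for ex in sorted(extractions, key=lambda e: e["start_char"], reverse=True):
        s, e = ex["start_char"], ex["end_char"]
        out = (out[:s]
               + f'<mark title="{html.escape(ex["extraction_class"])}">'
               + out[s:e] + "</mark>" + out[e:])
    return out

text = "Patient prescribed 500mg amoxicillin twice daily"
spans = [{"extraction_class": "medication", "start_char": 25, "end_char": 36},
         {"extraction_class": "dosage", "start_char": 19, "end_char": 24}]
print(highlight(text, spans))
```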

list_supported_models

List all supported language models and their characteristics. This server currently supports Google Gemini models only, optimized for reliable structured extraction with schema constraints.

Returns: Dictionary containing model information and recommendations

get_server_info

Get information about the LangExtract MCP server: version, capabilities, and configuration.

Returns: Dictionary containing server information and capabilities

README

LangExtract MCP Server

A FastMCP server that provides Model Context Protocol (MCP) tools for Google's langextract library. This server enables AI assistants like Claude Code to extract structured information from unstructured text using Large Language Models through a secure, optimized MCP interface.

Overview

LangExtract is a Python library that uses LLMs to extract structured information from text documents while maintaining precise source grounding. This MCP server exposes langextract's capabilities through the Model Context Protocol with advanced performance optimizations and enterprise-grade security.

The server includes intelligent caching, persistent connections, and server-side credential management to provide optimal performance in long-running environments like Claude Code while maintaining complete security isolation.

Quick Setup for Claude Code

Prerequisites

  • Claude Code installed and configured
  • Google Gemini API key (Get one here)
  • Python 3.10 or higher

Installation

Install directly into Claude Code using the built-in MCP management:

claude mcp add langextract-mcp -e LANGEXTRACT_API_KEY=your-gemini-api-key -- uv run --with fastmcp fastmcp run src/langextract_mcp/server.py

The server will automatically start and integrate with Claude Code. No additional configuration is required.

Verification

After installation, verify the integration by asking Claude Code:

Use the get_server_info tool to show the LangExtract server capabilities

You should see output indicating the server is running with optimization features enabled.

Available Tools

The server provides six MCP tools optimized for text extraction workflows:

Core Extraction

  • extract_from_text - Extract structured information from provided text
  • extract_from_url - Extract information from web content
  • save_extraction_results - Save results to JSONL format
  • generate_visualization - Create interactive HTML visualizations

Server Information

  • list_supported_models - View available language models and recommendations
  • get_server_info - Check server status and capabilities

Usage Examples

Basic Text Extraction

Ask Claude Code to extract information using natural language:

Extract medication information from this text: "Patient prescribed 500mg amoxicillin twice daily for infection"

Use these examples to guide the extraction:
- Text: "Take 250mg ibuprofen every 4 hours"
- Expected: medication=ibuprofen, dosage=250mg, frequency=every 4 hours
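Behind the scenes, Claude Code turns a request like this into an extract_from_text tool call. The payload might look like the following; the field names mirror the output format documented below, but the exact shape is illustrative:

```python
# Hypothetical arguments for an extract_from_text tool call;
# field names follow the documented output format but are illustrative.
arguments = {
    "text": "Patient prescribed 500mg amoxicillin twice daily for infection",
    "prompt_description": "Extract the medication name, dosage, and frequency.",
    "examples": [
        {
            "text": "Take 250mg ibuprofen every 4 hours",
            "extractions": [
                {"extraction_class": "medication", "extraction_text": "ibuprofen"},
                {"extraction_class": "dosage", "extraction_text": "250mg"},
                {"extraction_class": "frequency", "extraction_text": "every 4 hours"},
            ],
        }
    ],
    "config": {"model_id": "gemini-2.5-flash"},
}
```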

Advanced Configuration

For complex extractions, specify configuration parameters:

Extract character emotions from Shakespeare using:
- Model: gemini-2.5-pro for better literary analysis
- Multiple passes: 3 for comprehensive extraction
- Temperature: 0.2 for consistent results

URL Processing

Extract information directly from web content:

Extract key findings from this research paper: https://arxiv.org/abs/example
Focus on methodology, results, and conclusions

Supported Models

This server currently supports Google Gemini models only, optimized for reliable structured extraction with advanced schema constraints:

  • gemini-2.5-flash - Recommended default - Optimal balance of speed, cost, and quality
  • gemini-2.5-pro - Best for complex reasoning and analysis tasks requiring highest accuracy

The server uses persistent connections, schema caching, and connection pooling for optimal performance with Gemini models. Support for additional providers may be added in future versions.

Configuration Reference

Environment Variables

Set during installation or in server environment:

LANGEXTRACT_API_KEY=your-gemini-api-key  # Required

Tool Parameters

Configure extraction behavior through tool parameters:

{
    "model_id": "gemini-2.5-flash",     # Language model selection
    "max_char_buffer": 1000,            # Text chunk size
    "temperature": 0.5,                 # Sampling temperature (0.0-1.0)  
    "extraction_passes": 1,             # Number of extraction attempts
    "max_workers": 10                   # Parallel processing threads
}
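A typical pattern is to start from these defaults and override only what a given extraction needs. A minimal sketch (the default values mirror the parameters above; the `make_config` helper is illustrative, not part of the server):

```python
# Defaults mirroring the documented tool parameters.
DEFAULTS = {
    "model_id": "gemini-2.5-flash",
    "max_char_buffer": 1000,
    "temperature": 0.5,
    "extraction_passes": 1,
    "max_workers": 10,
}

def make_config(**overrides):
    """Merge user overrides onto the default extraction config,
    rejecting keys the server does not document."""
    unknown = set(overrides) - set(DEFAULTS)
    if unknown:
        raise ValueError(f"Unknown config keys: {sorted(unknown)}")
    return {**DEFAULTS, **overrides}

config = make_config(model_id="gemini-2.5-pro", extraction_passes=3)
```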

Output Format

All extractions return consistent structured data:

{
    "document_id": "doc_123",
    "total_extractions": 5,
    "extractions": [
        {
            "extraction_class": "medication", 
            "extraction_text": "amoxicillin",
            "attributes": {"type": "antibiotic"},
            "start_char": 25,
            "end_char": 35
        }
    ],
    "metadata": {
        "model_id": "gemini-2.5-flash",
        "extraction_passes": 1,
        "temperature": 0.5
    }
}
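Because every extraction carries `start_char`/`end_char` offsets, results can be checked against the original text when post-processing. A sketch of such a sanity check, assuming the result shape above and Python-style half-open spans:

```python
def check_grounding(source_text, result):
    """Return (claimed, actual) pairs for any extraction whose
    extraction_text does not match its claimed character span."""
    mismatches = []
    for ex in result["extractions"]:
        span = source_text[ex["start_char"]:ex["end_char"]]
        if span != ex["extraction_text"]:
            mismatches.append((ex["extraction_text"], span))
    return mismatches

source = "Patient prescribed 500mg amoxicillin twice daily"
result = {"extractions": [{"extraction_class": "medication",
                           "extraction_text": "amoxicillin",
                           "start_char": 25, "end_char": 36}]}
assert check_grounding(source, result) == []
```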

Use Cases

LangExtract MCP Server supports a wide range of use cases across multiple domains:

Healthcare and Life Sciences
  • Extract medications, dosages, and treatment protocols from clinical notes
  • Structure radiology and pathology reports
  • Process research papers and clinical trial data

Legal and Compliance
  • Extract contract terms, parties, and obligations
  • Analyze regulatory documents, compliance reports, and case law

Research and Academia
  • Extract methodologies, findings, and citations from papers
  • Analyze survey responses and interview transcripts
  • Process historical and archival materials

Business Intelligence
  • Extract insights from customer feedback and reviews
  • Analyze news articles and market reports
  • Process financial documents and earnings reports

Support and Documentation

Primary Resources:
