
🦙 Llama 4 Maverick MCP Server (Python)
Author: Yobie Benjamin
Version: 0.9
Date: August 1, 2025
A Python implementation of the Model Context Protocol (MCP) server that bridges Llama models with Claude Desktop through Ollama. This pure Python solution offers clean architecture, high performance, and easy extensibility.
📚 Table of Contents
- What Would You Use This Llama MCP Server For?
- Why Python?
- Features
- System Requirements
- Quick Start
- Detailed Installation
- Configuration
- Available Tools
- Usage Examples
- Real-World Applications
- Development
- Performance Optimization
- Troubleshooting
- Contributing
🎯 What Would You Use This Llama MCP Server For?
The Revolution of Local AI + Claude Desktop
This Python MCP server creates a powerful bridge between Claude Desktop's sophisticated interface and your locally-hosted Llama models. Here's what makes this combination revolutionary:
1. Privacy-First AI Operations 🔒
The Challenge: Organizations handling sensitive data can't use cloud AI due to privacy concerns.
The Solution: This MCP server keeps everything local while providing enterprise-grade AI capabilities.
Real-World Applications:
- Healthcare: A hospital can analyze patient records with AI while remaining HIPAA-compliant
- Legal: Law firms can process confidential client documents with complete privacy
- Finance: Banks can analyze transaction data without exposing customer information
- Government: Agencies can process classified documents on air-gapped systems
Example Implementation:
# Process sensitive medical records locally
async def analyze_patient_data(patient_file):
    # Data never leaves your server
    content = await tool_manager.execute_tool("file_read", {"path": patient_file})

    # Use a specialized medical model
    analysis = await llama_service.complete(
        prompt=f"Analyze patient data for risk factors: {content}",
        model="medical-llama:latest",  # Your HIPAA-compliant fine-tuned model
        temperature=0.1  # Low temperature for medical accuracy
    )

    # Store results locally with encryption
    await secure_storage.save(analysis, encrypted=True)
2. Custom Model Deployment 🎯
The Challenge: Generic models don't understand your domain-specific language and requirements.
The Solution: Deploy your own fine-tuned models through the MCP interface.
Real-World Applications:
- Research Labs: Use models trained on proprietary research data
- Enterprises: Deploy models fine-tuned on company documentation
- Educational Institutions: Use models trained on curriculum-specific content
- Industry-Specific: Legal, medical, financial, or technical domain models
Example Implementation:
# Switch between specialized models based on task
class ModelSelector:
    def __init__(self):
        self.models = {
            "general": "llama3:latest",
            "code": "codellama:latest",
            "medical": "medical-llama:13b",
            "legal": "legal-llama:7b",
            "finance": "finance-llama:13b"
        }

    async def select_and_query(self, domain: str, prompt: str):
        model = self.models.get(domain, "llama3:latest")
        return await llama_service.complete(
            prompt=prompt,
            model=model,
            temperature=0.3 if domain in ["medical", "legal"] else 0.7
        )
3. Hybrid Intelligence Systems 🔄
The Challenge: No single AI model excels at everything.
The Solution: Combine Claude's reasoning with Llama's generation capabilities.
Real-World Applications:
- Software Development: Claude plans architecture, Llama generates implementation
- Content Creation: Claude creates outlines, Llama writes detailed content
- Data Analysis: Claude interprets results, Llama generates reports
- Research: Claude formulates hypotheses, Llama explores implications
Example Implementation:
# Hybrid workflow combining Claude and Llama
class HybridAI:
    async def complex_task(self, requirement: str):
        # Step 1: Use Claude for high-level planning
        plan = await claude.create_plan(requirement)

        # Step 2: Use local Llama for detailed implementation
        implementation = await llama_service.complete(
            prompt=f"Implement this plan: {plan}",
            model="codellama:34b",
            max_tokens=4096
        )

        # Step 3: Use Claude for review and refinement
        refined = await claude.review_and_refine(implementation)
        return refined
4. Offline and Edge Computing 🌐
The Challenge: Many environments lack reliable internet or prohibit cloud connections.
The Solution: Full AI capabilities without any internet requirement.
Real-World Applications:
- Remote Operations: Oil rigs, ships, remote research stations
- Industrial IoT: Factory floors with real-time requirements
- Field Work: Geological surveys, wildlife research, disaster response
- Secure Facilities: Military bases, research labs, government buildings
Example Implementation:
# Edge deployment for industrial quality control
class EdgeQualityControl:
    def __init__(self):
        self.config = Config(
            llama_model_name="quality-control:latest",
            enable_streaming=True,
            max_context_length=8192  # Optimized for edge devices
        )

    async def inspect_product(self, sensor_data: dict):
        # Process sensor data locally
        analysis = await llama_service.complete(
            prompt=f"Analyze sensor readings for defects: {sensor_data}",
            temperature=0.1,  # Consistent results needed
            max_tokens=256  # Quick response for real-time processing
        )

        # Trigger local actions based on the analysis
        if "defect" in analysis.lower():
            await self.trigger_alert(analysis)
        return analysis
5. Experimentation and Research 🧪
The Challenge: Researchers need reproducible results and full control over model behavior.
The Solution: Complete transparency and control over every aspect of the AI pipeline.
Real-World Applications:
- Academic Research: Reproducible experiments for papers
- Model Comparison: A/B testing different models and parameters
- Behavior Analysis: Understanding how models respond to different inputs
- Prompt Engineering: Developing optimal prompts for specific tasks
Example Implementation:
# Research experiment framework
from datetime import datetime

class ExperimentRunner:
    async def run_experiment(self, hypothesis: str, test_cases: list):
        results = []

        # Test multiple models
        for model in ["llama3:7b", "llama3:13b", "llama3:70b"]:
            # Test multiple parameters
            for temp in [0.1, 0.5, 0.9, 1.5]:
                model_results = []
                for test in test_cases:
                    response = await llama_service.complete(
                        prompt=test,
                        model=model,
                        temperature=temp,
                        seed=42  # Reproducible results
                    )
                    model_results.append({
                        "input": test,
                        "output": response,
                        "model": model,
                        "temperature": temp,
                        "timestamp": datetime.now()
                    })
                results.append(model_results)

        # Analyze and save results
        analysis = self.analyze_results(results)
        await self.save_experiment(hypothesis, results, analysis)
        return analysis
6. Cost-Effective Scaling 💰
The Challenge: API costs can become prohibitive for high-volume applications.
The Solution: One-time hardware investment for unlimited usage.
Real-World Applications:
- Startups: Prototype without burning through funding
- Education: Provide AI access to all students without budget concerns
- Non-profits: Leverage AI without ongoing costs
- High-volume Processing: Batch jobs, data analysis, content generation
Cost Analysis Example:
# Cost comparison calculator
class CostAnalyzer:
    def calculate_savings(self, monthly_tokens: int):
        # API costs (approximate)
        api_cost_per_million = 15.00  # USD
        monthly_api_cost = (monthly_tokens / 1_000_000) * api_cost_per_million

        # Local costs (one-time hardware)
        hardware_cost = 2000  # Good GPU setup
        electricity_monthly = 50  # Approximate

        # Calculate break-even
        months_to_break_even = hardware_cost / (monthly_api_cost - electricity_monthly)

        return {
            "monthly_api_cost": monthly_api_cost,
            "monthly_local_cost": electricity_monthly,
            "monthly_savings": monthly_api_cost - electricity_monthly,
            "break_even_months": months_to_break_even,
            "first_year_savings": (monthly_api_cost * 12) - (hardware_cost + electricity_monthly * 12)
        }
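For a rough sense of scale, here is the calculator applied to an assumed volume of 50 million tokens per month; the dollar figures are the estimates hard-coded above, not measured costs:

analyzer = CostAnalyzer()
savings = analyzer.calculate_savings(monthly_tokens=50_000_000)
print(savings)
# With the assumptions above this works out to roughly $750/month in API fees
# versus about $50/month in electricity: ~$700/month saved, break-even in
# about 3 months, and roughly $6,400 saved in the first year.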
7. Real-Time Processing ⚡
The Challenge: Network latency makes cloud AI unsuitable for real-time applications.
The Solution: Sub-second response times with local processing.
Real-World Applications:
- Trading Systems: Analyze market data in milliseconds
- Gaming: Real-time NPC dialogue and behavior
- Robotics: Immediate response to sensor inputs
- Live Translation: Instant language translation
Example Implementation:
# Real-time stream processing
class StreamProcessor:
    def __init__(self):
        self.buffer = []
        self.processing = False

    async def process_stream(self, data_stream):
        async for chunk in data_stream:
            self.buffer.append(chunk)

            if not self.processing and len(self.buffer) > 0:
                self.processing = True

                # Process immediately without network delay
                result = await llama_service.complete(
                    prompt=f"Analyze: {self.buffer[-1]}",
                    model="tinyllama:latest",  # Fast model for real-time
                    max_tokens=50,
                    stream=True
                )

                async for token in result:
                    yield token  # Stream results immediately

                self.processing = False
8. Custom Tool Integration 🛠️
The Challenge: Generic AI can't interact with your specific systems and databases.
The Solution: Build custom tools that integrate with your infrastructure.
Real-World Applications:
- DevOps: AI that can manage your specific infrastructure
- Database Management: Query and manage your databases via natural language
- System Administration: Automate complex administrative tasks
- Business Intelligence: Connect to your BI tools and data warehouses
Example Implementation:
# Custom tool for database operations
class DatabaseTool(BaseTool):
    @property
    def name(self) -> str:
        return "company_database"

    @property
    def description(self) -> str:
        return "Query and manage company database"

    async def execute(self, query: str, operation: str = "select") -> ToolResult:
        # Connect to your specific database
        async with get_company_db() as db:
            results = await db.fetch(query)

            if operation == "select":
                return ToolResult(success=True, data=results)
            elif operation == "analyze":
                # Use Llama to analyze the query results
                analysis = await llama_service.complete(
                    prompt=f"Analyze this data: {results}",
                    temperature=0.3
                )
                return ToolResult(success=True, data=analysis)
9. Compliance and Governance 📋
The Challenge: Regulatory requirements demand complete control and audit trails.
The Solution: Full transparency and logging of all AI operations.
Real-World Applications:
- Healthcare: HIPAA compliance with audit trails
- Finance: SOX compliance with transaction monitoring
- Legal: Attorney-client privilege protection
- Government: Security clearance requirements
Example Implementation:
# Compliance-aware AI system
import hashlib
from datetime import datetime

class ComplianceAI:
    def __init__(self):
        self.audit_logger = AuditLogger()
        self.encryption = EncryptionService()

    async def process_regulated_data(self, data: str, user: str, purpose: str):
        # Log access for audit
        audit_id = await self.audit_logger.log_access(
            user=user,
            data_type="regulated",
            purpose=purpose,
            timestamp=datetime.now()
        )

        # Process with the local model (data never leaves the premises)
        result = await llama_service.complete(
            prompt=f"Process: {data}",
            model="compliance-llama:latest"
        )

        # Log completion
        await self.audit_logger.log_completion(
            audit_id=audit_id,
            success=True,
            result_hash=hashlib.sha256(result.encode()).hexdigest()
        )

        # Encrypt the result before returning or storing it
        return self.encryption.encrypt(result)
10. Educational Environments 🎓
The Challenge: Educational institutions need affordable AI access for all students.
The Solution: Single deployment serves unlimited students without per-use costs.
Real-World Applications:
- Computer Science: Teaching AI/ML concepts hands-on
- Research Projects: Student research without budget constraints
- Writing Centers: AI-assisted writing for all students
- Language Learning: Personalized language practice
Example Implementation:
# Educational AI assistant
class EducationalAssistant:
    def __init__(self):
        self.student_profiles = {}
        self.learning_analytics = LearningAnalytics()

    async def personalized_tutoring(self, student_id: str, subject: str, question: str):
        # Get the student's learning profile
        profile = self.student_profiles.get(student_id, self.create_profile(student_id))

        # Adjust the response based on the student's level
        response = await llama_service.complete(
            prompt=f"""
            Student Level: {profile['level']}
            Subject: {subject}
            Question: {question}
            Provide an explanation appropriate for this student's level.
            """,
            temperature=0.7,
            model="education-llama:latest"
        )

        # Track learning progress
        await self.learning_analytics.record_interaction(
            student_id=student_id,
            subject=subject,
            question=question,
            response=response
        )

        return response
🐍 Why Python?
Advantages Over TypeScript/Node.js
Aspect | Python Advantage | Use Case |
---|---|---|
Scientific Computing | NumPy, SciPy, Pandas integration | Data analysis, research |
ML Ecosystem | Direct integration with PyTorch, TensorFlow | Model experimentation |
Simplicity | Cleaner async/await syntax | Faster development |
Libraries | Vast ecosystem of AI/ML tools | Extended functionality |
Debugging | Better error messages and debugging tools | Easier troubleshooting |
Performance | uvloop for high-performance async | Better concurrency |
Type Safety | Type hints + Pydantic validation | Runtime validation |
✨ Features
Core Capabilities
- 🚀 High Performance: Async/await with uvloop support
- 🛠️ 10+ Built-in Tools: Web search, file ops, calculations, and more
- 📝 Prompt Templates: Pre-defined prompts for common tasks
- 📁 Resource Management: Access templates and documentation
- 🔄 Streaming Support: Real-time token generation
- 🔧 Highly Configurable: Environment-based configuration
- 📊 Structured Logging: Comprehensive debugging support
- 🧪 Fully Tested: Pytest test suite included
Python-Specific Features
- 🐼 Data Science Integration: Works with Pandas, NumPy
- 🤖 ML Framework Compatible: Integrate with PyTorch, TensorFlow
- 📈 Analytics Built-in: Performance metrics and monitoring
- 🔌 Plugin System: Easy to extend with Python packages
- 🎯 Type Safety: Pydantic models for validation
- 🔒 Security: Built-in sanitization and validation
💻 System Requirements
Minimum Requirements
Component | Minimum | Recommended | Optimal |
---|---|---|---|
Python | 3.9+ | 3.11+ | Latest |
CPU | 4 cores | 8 cores | 16+ cores |
RAM | 8GB | 16GB | 32GB+ |
Storage | 10GB SSD | 50GB SSD | 100GB NVMe |
OS | Linux/macOS/Windows | Ubuntu 22.04 | Latest Linux |
Model Requirements
Model | Parameters | RAM | Use Case |
---|---|---|---|
tinyllama | 1.1B | 2GB | Testing, quick responses |
llama3:7b | 7B | 8GB | General purpose |
llama3:13b | 13B | 16GB | Advanced tasks |
llama3:70b | 70B | 48GB | Professional use |
codellama | 7B-34B | 8-32GB | Code generation |
🚀 Quick Start
# Clone the repository
git clone https://github.com/yobieben/llama4-maverick-mcp-python.git
cd llama4-maverick-mcp-python
# Run setup (handles everything)
python setup.py
# Start the server
python -m llama4_maverick_mcp.server
That's it! The server is now running and ready to connect to Claude Desktop.
📦 Detailed Installation
Step 1: Python Setup
# Check Python version
python --version # Should be 3.9+
# Create virtual environment (recommended)
python -m venv venv
# Activate virtual environment
# Linux/macOS:
source venv/bin/activate
# Windows:
venv\Scripts\activate
Step 2: Install Dependencies
# Install the package in development mode
pip install -e .
# For development with testing tools
pip install -e .[dev]
Step 3: Install Ollama
# macOS
brew install ollama
# Linux
curl -fsSL https://ollama.com/install.sh | sh
# Windows
# Download from https://ollama.com/download/windows
Step 4: Configure Environment
# Copy example configuration
cp .env.example .env
# Edit configuration
nano .env # or your preferred editor
Step 5: Download Models
# Start Ollama service
ollama serve
# In another terminal, pull models
ollama pull llama3:latest
ollama pull codellama:latest
ollama pull tinyllama:latest
Step 6: Configure Claude Desktop
Add to Claude Desktop configuration:
macOS: ~/Library/Application Support/Claude/claude_desktop_config.json
{
  "mcpServers": {
    "llama4-python": {
      "command": "python",
      "args": ["-m", "llama4_maverick_mcp.server"],
      "cwd": "/path/to/llama4-maverick-mcp-python",
      "env": {
        "PYTHONPATH": "/path/to/llama4-maverick-mcp-python/src",
        "LLAMA_MODEL_NAME": "llama3:latest"
      }
    }
  }
}
⚙️ Configuration
Environment Variables
Create a .env file:
# Ollama Configuration
LLAMA_API_URL=http://localhost:11434
LLAMA_MODEL_NAME=llama3:latest
LLAMA_API_KEY= # Optional
# Server Configuration
MCP_LOG_LEVEL=INFO
MCP_SERVER_HOST=localhost
MCP_SERVER_PORT=3000
# Features
ENABLE_STREAMING=true
ENABLE_FUNCTION_CALLING=true
ENABLE_VISION=false
ENABLE_CODE_EXECUTION=false  # Enabling this is a security risk
ENABLE_WEB_SEARCH=true
# Model Parameters
TEMPERATURE=0.7 # 0.0-2.0
TOP_P=0.9 # 0.0-1.0
TOP_K=40 # 1-100
REPEAT_PENALTY=1.1
SEED=42 # For reproducibility
# File System
FILE_SYSTEM_BASE_PATH=/safe/path
ALLOW_FILE_WRITES=true
# Performance
MAX_CONTEXT_LENGTH=128000
MAX_CONCURRENT_REQUESTS=10
REQUEST_TIMEOUT_MS=30000
CACHE_TTL=3600
CACHE_MAX_SIZE=1000
# Debug
DEBUG=false
VERBOSE_LOGGING=false
Configuration Classes
from llama4_maverick_mcp.config import Config

# Create custom configuration
config = Config(
    llama_model_name="codellama:latest",
    temperature=0.3,
    enable_code_execution=True
)

# Access configuration
print(config.llama_model_name)
print(config.get_model_params())
🛠️ Available Tools
Built-in Tools
Tool | Description | Example |
---|---|---|
calculator | Mathematical calculations | 2 + 2, sqrt(16) |
datetime | Date/time operations | Current time, date math |
json_tool | JSON manipulation | Parse, extract, transform |
web_search | Search the web | Query for information |
file_read | Read files | Access local files |
file_write | Write files | Save data locally |
list_files | List directories | Browse file system |
code_executor | Run code | Execute Python/JS/Bash |
http_request | HTTP calls | API interactions |
Creating Custom Tools
# src/llama4_maverick_mcp/tools/custom/my_tool.py
from pydantic import BaseModel, Field

from ..base import BaseTool, ToolResult

class MyToolParams(BaseModel):
    """Parameters for my custom tool."""
    input_text: str = Field(..., description="Text to process")
    option: str = Field(default="default", description="Processing option")

class MyCustomTool(BaseTool):
    @property
    def name(self) -> str:
        return "my_custom_tool"

    @property
    def description(self) -> str:
        return "Performs custom processing on text"

    @property
    def parameters(self) -> type[BaseModel]:
        return MyToolParams

    async def execute(self, input_text: str, option: str = "default") -> ToolResult:
        # Your custom logic here
        result = f"Processed: {input_text} with option: {option}"
        return ToolResult(
            success=True,
            data={"result": result, "length": len(input_text)}
        )
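A quick way to exercise the tool is through the same ToolManager interface used elsewhere in this README. This sketch assumes tools placed under tools/custom/ are picked up automatically during initialization; if your checkout requires explicit registration, register MyCustomTool with the manager first:

import asyncio

from llama4_maverick_mcp import Config
from llama4_maverick_mcp.tools import ToolManager

async def try_my_custom_tool():
    manager = ToolManager(Config())
    await manager.initialize()  # assumed to discover tools under tools/custom/

    result = await manager.execute_tool(
        "my_custom_tool",
        {"input_text": "Hello, world!", "option": "uppercase"}
    )
    print(result)

asyncio.run(try_my_custom_tool())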
📊 Usage Examples
Basic Usage
import asyncio
from llama4_maverick_mcp import MCPServer, Config

async def main():
    # Create server with custom config
    config = Config(
        llama_model_name="llama3:latest",
        temperature=0.7
    )
    server = MCPServer(config)

    # Run the server
    await server.run()

if __name__ == "__main__":
    asyncio.run(main())
Direct API Usage
from llama4_maverick_mcp import LlamaService, Config

async def generate_text():
    config = Config()
    llama = LlamaService(config)
    await llama.initialize()

    # Simple completion
    result = await llama.complete(
        prompt="Explain quantum computing",
        temperature=0.5,
        max_tokens=200
    )
    print(result)

    # Chat completion
    messages = [
        {"role": "system", "content": "You are a helpful assistant"},
        {"role": "user", "content": "What is Python?"}
    ]
    response = await llama.complete_chat(messages)
    print(response)
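Streaming can also be consumed directly. A minimal sketch that mirrors the stream=True pattern from the real-time example earlier, assuming complete() yields tokens as an async iterator when streaming is enabled:

async def stream_text():
    config = Config(enable_streaming=True)
    llama = LlamaService(config)
    await llama.initialize()

    # Assumption: with stream=True, complete() returns an async iterator of
    # tokens, as in the StreamProcessor example in this README.
    stream = await llama.complete(
        prompt="Explain quantum computing",
        temperature=0.5,
        stream=True
    )
    async for token in stream:
        print(token, end="", flush=True)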
Tool Execution
from llama4_maverick_mcp import Config
from llama4_maverick_mcp.tools import ToolManager

async def use_tools():
    manager = ToolManager(Config())
    await manager.initialize()

    # Execute calculator
    result = await manager.execute_tool(
        "calculator",
        {"expression": "factorial(5) + sqrt(16)"}
    )
    print(result)

    # Read file
    content = await manager.execute_tool(
        "file_read",
        {"path": "config.json"}
    )
    print(content)
🌟 Real-World Applications
1. Document Analysis Pipeline
class DocumentAnalyzer:
    def __init__(self):
        self.config = Config(temperature=0.3)
        self.llama = LlamaService(self.config)
        self.tools = ToolManager(self.config)

    async def analyze_documents(self, directory: str):
        # List all documents
        files = await self.tools.execute_tool(
            "list_files",
            {"path": directory, "recursive": True}
        )

        results = []
        for file in files['data']['files']:
            if file.endswith(('.txt', '.md', '.pdf')):
                # Read document
                content = await self.tools.execute_tool(
                    "file_read",
                    {"path": file}
                )

                # Analyze with Llama
                analysis = await self.llama.complete(
                    prompt=f"Summarize and extract key points: {content['data']}",
                    max_tokens=500
                )

                results.append({
                    "file": file,
                    "analysis": analysis
                })

        return results
2. Code Review System
class CodeReviewer:
    async def review_code(self, code: str, language: str = "python"):
        prompt = f"""
        Review this {language} code for:
        1. Security vulnerabilities
        2. Performance issues
        3. Best practices
        4. Potential bugs

        Code:
        ```{language}
        {code}
        ```

        Provide specific suggestions for improvement.
        """

        review = await llama_service.complete(
            prompt=prompt,
            model="codellama:latest",
            temperature=0.3
        )

        return self.parse_review(review)
3. Research Assistant
from datetime import datetime

class ResearchAssistant:
    async def research_topic(self, topic: str):
        # Search for information
        search_results = await self.tools.execute_tool(
            "web_search",
            {"query": topic, "max_results": 10}
        )

        # Analyze sources
        analysis = await self.llama.complete(
            prompt=f"Analyze these sources about {topic}: {search_results}",
            temperature=0.5
        )

        # Generate report
        report = await self.llama.complete(
            prompt=f"Write a comprehensive report on {topic} based on: {analysis}",
            temperature=0.7,
            max_tokens=2000
        )

        # Save report
        await self.tools.execute_tool(
            "file_write",
            {
                "path": f"research_{topic}_{datetime.now().strftime('%Y%m%d')}.md",
                "content": report
            }
        )

        return report
🧪 Development
Running Tests
# Run all tests
pytest
# Run with coverage
pytest --cov=llama4_maverick_mcp
# Run specific test
pytest tests/test_llama_service.py
# Run with verbose output
pytest -v
Code Quality
# Format code with Black
black src/
# Lint with Ruff
ruff check src/
# Type checking with mypy
mypy src/
# All quality checks
make quality
Creating Tests
# tests/test_my_tool.py
import pytest

from llama4_maverick_mcp.tools.custom.my_tool import MyCustomTool

@pytest.mark.asyncio
async def test_my_custom_tool():
    tool = MyCustomTool()
    result = await tool.execute(
        input_text="Hello, world!",
        option="uppercase"
    )
    assert result.success
    assert "Hello, world!" in result.data["result"]
    assert result.data["length"] == 13
🚀 Performance Optimization
1. Use uvloop (Linux/macOS)
# Automatically enabled if available
# 2-4x performance improvement for async operations
pip install uvloop
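The server enables uvloop on its own when the package is importable. If you call LlamaService from your own standalone scripts, here is a minimal sketch of opting in yourself using uvloop's standard install hook:

import asyncio
import uvloop

# Route new asyncio event loops through uvloop before running any coroutines.
uvloop.install()

asyncio.run(main())  # e.g. the main() from the Basic Usage example above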
2. Model Optimization
# Use smaller models for simple tasks
config = Config(
    llama_model_name="tinyllama:latest",  # 1.1B params, very fast
    max_context_length=4096,  # Reduce context for speed
    temperature=0.1  # Lower temperature for consistency
)
3. Caching Strategy
from cachetools import TTLCache

class CachedLlamaService(LlamaService):
    def __init__(self, config):
        super().__init__(config)
        self.cache = TTLCache(maxsize=1000, ttl=3600)

    async def complete(self, prompt: str, **kwargs):
        cache_key = f"{prompt}:{kwargs}"
        if cache_key in self.cache:
            return self.cache[cache_key]

        result = await super().complete(prompt, **kwargs)
        self.cache[cache_key] = result
        return result
4. Batch Processing
import asyncio

async def batch_process(prompts: list):
    # Process multiple prompts concurrently
    tasks = [
        llama_service.complete(prompt, temperature=0.5)
        for prompt in prompts
    ]

    # Limit concurrency to avoid overwhelming the system
    semaphore = asyncio.Semaphore(5)

    async def limited_task(task):
        async with semaphore:
            return await task

    results = await asyncio.gather(*[limited_task(t) for t in tasks])
    return results
🔧 Troubleshooting
Common Issues
Issue | Solution |
---|---|
ImportError | Check Python path: export PYTHONPATH=$PYTHONPATH:$(pwd)/src |
Ollama not found | Install: curl -fsSL https://ollama.com/install.sh \| sh |
Model not available | Pull model: ollama pull llama3:latest |
Permission denied | Check file permissions and base path configuration |
Memory error | Use smaller model or increase system RAM |
Timeout errors | Increase REQUEST_TIMEOUT_MS in configuration |
Debug Mode
# Enable detailed logging
config = Config(
    debug_mode=True,
    verbose_logging=True,
    log_level="DEBUG"
)
# Or via environment
export DEBUG=true
export MCP_LOG_LEVEL=DEBUG
export VERBOSE_LOGGING=true
Health Check
import sys
from datetime import datetime

import psutil

async def health_check():
    """Check system health."""
    checks = {
        "python_version": sys.version,
        "ollama_connected": config.validate_ollama_connection(),
        "models_available": await llama_service.list_models(),
        "tools_loaded": len(await tool_manager.get_tools()),
        "memory_usage": psutil.virtual_memory().percent,
        "disk_usage": psutil.disk_usage('/').percent
    }

    return {
        "status": "healthy" if all(checks.values()) else "degraded",
        "checks": checks,
        "timestamp": datetime.now().isoformat()
    }
🤝 Contributing
We welcome contributions! See CONTRIBUTING.md for guidelines.
Areas for Contribution
- 🛠️ New tools and integrations
- 📝 Documentation improvements
- 🐛 Bug fixes
- 🚀 Performance optimizations
- 🧪 Test coverage
- 🌐 Internationalization
Development Workflow
# Fork and clone
git clone https://github.com/YOUR_USERNAME/llama4-maverick-mcp-python.git
# Create branch
git checkout -b feature/your-feature
# Make changes and test
pytest
# Commit with conventional commits
git commit -m "feat: add new amazing feature"
# Push and create PR
git push origin feature/your-feature
📄 License
MIT License - See LICENSE file
👨‍💻 Author
Yobie Benjamin
Version 0.9
August 1, 2025
🙏 Acknowledgments
- Anthropic for the MCP protocol
- Ollama team for local model hosting
- Meta for Llama models
- Python community for excellent libraries
📞 Support
- Issues: GitHub Issues
- Discussions: GitHub Discussions
- Documentation: Wiki
Ready to experience the power of local AI? Start with Llama 4 Maverick MCP Python today! 🦙🐍🚀