Veo 3.1 MCP Server

Enables high-quality AI video generation using Google's Veo 3.1 model for text-to-video, style-guided, and frame-interpolation tasks. It features token-efficient reference image handling, batch processing, and video extension capabilities with built-in cost estimation.

🎬 Veo 3.1 MCP Server

Token-Efficient AI Video Generation with Google's Veo 3.1

🎯 What is This?

An MCP server for Google's Veo 3.1 video generation model. It generates videos from text prompts, uses reference images for style guidance, and interpolates between a first and a last frame.

Key Features

✅ Text-to-Video - Generate videos from descriptions
✅ Reference Images - Up to 3 images for style guidance
✅ Frame Interpolation - First + last frame → coherent video
✅ Video Extension - Extend Veo-generated videos
✅ Batch Generation - Generate multiple videos with concurrency control
✅ Cost Estimation - Know costs before generating
✅ Token-Efficient - Auto-upload refs to Files API (97% token savings!)

🚀 Quick Start

1. Installation

cd veo-mcp
npm install
npm run build

2. Get API Key

Go to Google AI Studio
Create API key
Enable Veo 3.1 in your project (billing required)

3. Configure

cp environment.template .env
# Edit .env and add your key

4. Add to Cursor

Add to ~/.cursor/mcp.json, replacing the path in args with the absolute path to dist/index.js in your own checkout:

{
  "mcpServers": {
    "veo": {
      "command": "node",
      "args": ["C:\\Users\\woute\\Githubs\\MCP\\veo-mcp\\dist\\index.js"],
      "env": {
        "GEMINI_API_KEY": "your_api_key_here"
      }
    }
  }
}

Restart Cursor. Done! ✅

🛠️ Tools

1. `start_video_generation` - Generate Video

Basic text-to-video:

{
  "prompt": "A serene Zen garden at sunrise, cherry blossoms falling, cinematic"
}

With reference images (token-efficient!):

{
  "prompt": "A futuristic cityscape at night, neon lights",
  "referenceImages": [{
    "source": "url",
    "url": "https://example.com/style.jpg"
  }],
  "durationSeconds": 8,
  "resolution": "1080p"
}

First/last frame interpolation:

{
  "prompt": "Smooth transition between these scenes",
  "firstFrame": {
    "source": "file_path",
    "filePath": "C:\\first.jpg"
  },
  "lastFrame": {
    "source": "file_path",
    "filePath": "C:\\last.jpg"
  }
}

Parameters:

model - veo-3.1-generate-001 (quality) or veo-3.1-fast-generate-001 (speed)
durationSeconds - 4, 6, or 8
aspectRatio - 16:9 or 9:16
resolution - 720p or 1080p
generateAudio - Include synchronized audio (costs more: 2x on the quality model, 1.5x on the fast model; see Pricing)
seed - For reproducibility
sampleCount - Generate 1-4 videos

2. `get_video_job` - Check Status

{
  "operationName": "operations/xyz"
}

Returns status and video URLs when complete.

3. `upload_image` - Pre-Upload References

{
  "source": "file_path",
  "filePath": "C:\\style-ref.jpg"
}

Returns a fileUri that stays valid for 48 hours. Reuse it across multiple generations!

4. `extend_video` - Extend Videos

{
  "videoFileUri": "files/abc123",
  "additionalSeconds": 7,
  "prompt": "Continue with the character walking into the sunset"
}

5. `start_batch_video_generation` - Batch Generate

{
  "jobs": [
    {"key": "scene1", "request": {"prompt": "..."}},
    {"key": "scene2", "request": {"prompt": "..."}}
  ],
  "concurrency": 3
}
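
The concurrency value caps how many jobs are in flight at once. This README does not show how the server schedules them, so the following is only a minimal TypeScript sketch of one common approach (a fixed pool of workers pulling from a shared queue). The names runBatch, BatchJob, and the start callback are hypothetical and used for illustration.

```typescript
// Hypothetical job shape for illustration; not the server's actual types.
interface BatchJob<Req> {
  key: string;
  request: Req;
}

// Runs `start` for every job, keeping at most `concurrency` calls in flight.
// Results are collected by job key. A failed job is recorded as an Error
// instead of aborting the rest of the batch.
async function runBatch<Req, Res>(
  jobs: BatchJob<Req>[],
  concurrency: number,
  start: (request: Req) => Promise<Res>,
): Promise<Map<string, Res | Error>> {
  const results = new Map<string, Res | Error>();
  let next = 0; // index of the next unclaimed job

  async function worker(): Promise<void> {
    while (next < jobs.length) {
      const job = jobs[next++];
      try {
        results.set(job.key, await start(job.request));
      } catch (err) {
        results.set(job.key, err instanceof Error ? err : new Error(String(err)));
      }
    }
  }

  const workerCount = Math.max(1, Math.min(concurrency, jobs.length));
  await Promise.all(Array.from({ length: workerCount }, () => worker()));
  return results;
}
```

Recording failures per key, rather than rejecting the whole batch, is a design choice made here so that one bad prompt does not discard the other results.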

6. `estimate_veo_cost` - Cost Estimation

{
  "model": "veo-3.1-fast-generate-001",
  "durationSeconds": 8,
  "sampleCount": 1,
  "generateAudio": false
}

Returns estimated cost in USD.

💰 Pricing

Model	Video Only	Video + Audio
veo-3.1-generate-001 (quality)	$0.20/sec	$0.40/sec
veo-3.1-fast-generate-001 (speed)	$0.10/sec	$0.15/sec

Example Costs:

8s video (fast, no audio): $0.80
8s video (quality, with audio): $3.20
4s video (fast, no audio): $0.40
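
The arithmetic behind these figures is rate × duration × sample count. As a rough sketch, assuming the per-second rates in the pricing table above, the estimate can be reproduced in TypeScript as follows. The function name estimateCostUsd is hypothetical and is not the server's implementation of estimate_veo_cost.

```typescript
type VeoModel = "veo-3.1-generate-001" | "veo-3.1-fast-generate-001";

// Per-second rates in US cents, copied from the pricing table in this README.
// Integer cents avoid floating-point drift when multiplying.
const RATE_CENTS_PER_SECOND: Record<VeoModel, { videoOnly: number; withAudio: number }> = {
  "veo-3.1-generate-001": { videoOnly: 20, withAudio: 40 },
  "veo-3.1-fast-generate-001": { videoOnly: 10, withAudio: 15 },
};

// Hypothetical helper: estimated cost in USD for one generation request.
function estimateCostUsd(
  model: VeoModel,
  durationSeconds: number,
  sampleCount: number = 1,
  generateAudio: boolean = false,
): number {
  const rates = RATE_CENTS_PER_SECOND[model];
  const centsPerSecond = generateAudio ? rates.withAudio : rates.videoOnly;
  return (centsPerSecond * durationSeconds * sampleCount) / 100;
}

console.log(estimateCostUsd("veo-3.1-fast-generate-001", 8)); // 0.8
console.log(estimateCostUsd("veo-3.1-generate-001", 8, 1, true)); // 3.2
```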

📊 Limits & Constraints

Parameter	Limit
Duration	4, 6, or 8 seconds
Reference images	0-3 images
Sample count	1-4 videos
Resolutions	720p, 1080p
Aspect ratios	16:9, 9:16
Rate limit	~50 requests/min
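
A client can check a request against these limits before sending it. The sketch below is an illustration only: the VideoRequest interface mirrors the parameter names documented above, while validateRequest is a hypothetical helper and may not match the server's own validation.

```typescript
// Subset of the documented request parameters; the shape is an assumption.
interface VideoRequest {
  prompt: string;
  durationSeconds?: number;
  aspectRatio?: string;
  resolution?: string;
  sampleCount?: number;
  referenceImages?: unknown[];
}

const ALLOWED_DURATIONS = [4, 6, 8];
const ALLOWED_ASPECT_RATIOS = ["16:9", "9:16"];
const ALLOWED_RESOLUTIONS = ["720p", "1080p"];

// Returns a list of human-readable problems; an empty list means the request
// is within the limits in the table above. Omitted fields are not checked.
function validateRequest(req: VideoRequest): string[] {
  const errors: string[] = [];
  if (req.durationSeconds !== undefined && !ALLOWED_DURATIONS.includes(req.durationSeconds)) {
    errors.push("durationSeconds must be 4, 6, or 8");
  }
  if (req.aspectRatio !== undefined && !ALLOWED_ASPECT_RATIOS.includes(req.aspectRatio)) {
    errors.push("aspectRatio must be 16:9 or 9:16");
  }
  if (req.resolution !== undefined && !ALLOWED_RESOLUTIONS.includes(req.resolution)) {
    errors.push("resolution must be 720p or 1080p");
  }
  if (
    req.sampleCount !== undefined &&
    (!Number.isInteger(req.sampleCount) || req.sampleCount < 1 || req.sampleCount > 4)
  ) {
    errors.push("sampleCount must be an integer from 1 to 4");
  }
  if (req.referenceImages !== undefined && req.referenceImages.length > 3) {
    errors.push("at most 3 reference images are allowed");
  }
  return errors;
}
```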

💡 Usage Examples

Simple Text-to-Video

Generate an 8-second video of a peaceful forest scene with morning mist

With Style Reference

Create a video of a tech startup office, using this image for style: C:\ref.jpg

Frame Interpolation

Generate a smooth transition between first.jpg and last.jpg, 8 seconds, cinematic camera movement

Batch Generation

Generate 5 different video variations of a product showcase with different angles

🔍 How Token Efficiency Works

❌ Naive Approach (Base64)

{
  "referenceImages": [{
    "base64": "iVBORw0KGgo..." // 500KB → ~50,000 tokens!
  }]
}

Cost: Massive token usage per call

✅ Token-Efficient (This MCP)

{
  "referenceImages": [{
    "source": "url",
    "url": "https://example.com/ref.jpg" // ~20 tokens
  }]
}

What Happens:

Server downloads image (no tokens)
Computes SHA-256 hash
Checks cache (48h validity)
Uploads to Files API if needed (~1s)
Uses short files/abc123 URI (~5 tokens)

Savings: 97%+ fewer tokens! 🎉
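
The hash-then-cache steps above can be sketched in TypeScript as follows. This is a minimal illustration under stated assumptions, not the server's actual code: createReferenceCache and the injected upload function are hypothetical, and the real Files API call is deliberately left out. Only the SHA-256 keying and the 48-hour validity window come from this README.

```typescript
import { createHash } from "node:crypto";

const CACHE_TTL_MS = 48 * 60 * 60 * 1000; // 48 hours, matching the cache validity above

interface CacheEntry {
  fileUri: string;
  uploadedAt: number; // epoch milliseconds
}

// Hypothetical uploader: sends the bytes somewhere and resolves to a short URI.
type Uploader = (bytes: Uint8Array) => Promise<string>;

// Returns a function that maps image bytes to a file URI, uploading only when
// no unexpired entry exists for the content's SHA-256 hash.
function createReferenceCache(
  upload: Uploader,
  now: () => number = () => Date.now(),
): (bytes: Uint8Array) => Promise<string> {
  const entries = new Map<string, CacheEntry>();

  return async (bytes: Uint8Array): Promise<string> => {
    const hash = createHash("sha256").update(bytes).digest("hex");
    const hit = entries.get(hash);
    if (hit !== undefined && now() - hit.uploadedAt < CACHE_TTL_MS) {
      return hit.fileUri; // cache hit: no upload needed
    }
    const fileUri = await upload(bytes);
    entries.set(hash, { fileUri, uploadedAt: now() });
    return fileUri;
  };
}
```

Keying on the content hash rather than the URL or file path means the same image reached through two different sources is uploaded only once.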

⏱️ Generation Times

Configuration	Typical Time
4s, 720p, no audio	30-60 sec
8s, 1080p, no audio	60-120 sec
8s, 1080p, with audio	90-150 sec
With references	+10-30 sec
Frame interpolation	+20-40 sec

Note: Times vary based on prompt complexity and server load.

🎨 Best Practices

1. Start Small, Scale Up

Step 1: Generate 1 video at 720p
Step 2: If good, regenerate at 1080p
Step 3: Use batch for variations

2. Use Fast Model for Testing

{
  "model": "veo-3.1-fast-generate-001",  // Testing
  "resolution": "720p"
}

Switch to quality model for final:

{
  "model": "veo-3.1-generate-001",  // Final
  "resolution": "1080p"
}

3. Pre-Upload Frequently Used References

// Step 1: Upload once
upload_image {"source": "file_path", "filePath": "brand-style.jpg"}
// Returns: files/xyz123

// Step 2: Reuse many times
{
  "referenceImages": [{"source": "file_uri", "fileUri": "files/xyz123"}]
}

4. Leverage Batch for Variations

{
  "jobs": [
    {"key": "v1", "request": {"prompt": "Scene 1...", "seed": 1}},
    {"key": "v2", "request": {"prompt": "Scene 1...", "seed": 2}},
    {"key": "v3", "request": {"prompt": "Scene 1...", "seed": 3}}
  ]
}
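
A jobs array like this one can be built programmatically when only the seed changes. The helper below is a hypothetical TypeScript sketch (buildVariationJobs is not part of the server); it assumes the key and request fields shown in the example above.

```typescript
interface VariationJob {
  key: string;
  request: { prompt: string; seed: number };
}

// Builds `count` jobs that share one prompt and differ only by seed.
// Keys are v1, v2, ... and seeds count up from `firstSeed`.
function buildVariationJobs(prompt: string, count: number, firstSeed: number = 1): VariationJob[] {
  return Array.from({ length: count }, (_, i) => ({
    key: `v${i + 1}`,
    request: { prompt, seed: firstSeed + i },
  }));
}

// Produces the same three jobs as the example above.
const payload = { jobs: buildVariationJobs("Scene 1...", 3) };
console.log(JSON.stringify(payload, null, 2));
```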

5. Monitor Costs

Always estimate before large batches:

estimate_veo_cost {
  "model": "veo-3.1-fast-generate-001",
  "durationSeconds": 8,
  "sampleCount": 10
}
// Returns: $8.00 estimate

🎬 Async Operation Flow

Veo uses async long-running operations:

1. start_video_generation
   ↓ Returns operationName immediately
   
2. get_video_job (poll every 10-30s)
   ↓ Returns {done: false, status: "RUNNING"}
   
3. get_video_job (after 60-120s)
   ↓ Returns {done: true, videos: [{videoUri: "..."}]}
   
4. Download video from videoUri

Tip: Poll no more often than once every 10 seconds.
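
A client-side polling loop for this flow might look like the TypeScript sketch below. It is an illustration under assumptions: waitForVideo is a hypothetical helper, getVideoJob stands in for however your client invokes the get_video_job tool, and the JobStatus shape is inferred from the responses shown above. The loop enforces a 10-second minimum interval and gives up after a fixed number of polls.

```typescript
// Response shape inferred from the flow above; treat it as an assumption.
interface JobStatus {
  done: boolean;
  status?: string;
  videos?: { videoUri: string }[];
}

interface WaitOptions {
  intervalMs?: number; // requested delay between polls (clamped to >= 10 s)
  maxPolls?: number; // give up after this many status checks
  sleep?: (ms: number) => Promise<void>; // injectable for testing
}

const MIN_INTERVAL_MS = 10000;

// Polls until the operation reports done, or throws after maxPolls attempts.
async function waitForVideo(
  operationName: string,
  getVideoJob: (operationName: string) => Promise<JobStatus>,
  options: WaitOptions = {},
): Promise<JobStatus> {
  const intervalMs = Math.max(options.intervalMs ?? 15000, MIN_INTERVAL_MS);
  const maxPolls = options.maxPolls ?? 40;
  const sleep =
    options.sleep ?? ((ms: number) => new Promise<void>((resolve) => setTimeout(resolve, ms)));

  for (let attempt = 1; attempt <= maxPolls; attempt++) {
    const job = await getVideoJob(operationName);
    if (job.done) return job;
    if (attempt < maxPolls) await sleep(intervalMs);
  }
  throw new Error(`Operation ${operationName} still running after ${maxPolls} polls`);
}
```

With the defaults (15 s interval, 40 polls) the loop waits roughly 10 minutes in total before giving up; adjust both to suit the generation times you observe.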

🆘 Troubleshooting

"API not enabled" (403)

Go to Google Cloud Console
Enable "Generative Language API"
Enable billing
Wait 5-10 minutes for propagation

"Rate limit exceeded"

Veo allows ~50 requests/min
Use batch tool with concurrency: 3
Add delays between requests
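
As a rough guide to how long those delays should be, the TypeScript sketch below spaces requests evenly under the approximate 50 requests/min figure. The helper requestStartOffsetsMs is hypothetical, and the real limit may be enforced differently (for example, in bursts), so treat the numbers as a conservative starting point.

```typescript
// Returns the delay (in ms from now) at which each request should start so
// that evenly spaced requests stay at or under `requestsPerMinute`.
function requestStartOffsetsMs(requestCount: number, requestsPerMinute: number = 50): number[] {
  const spacingMs = Math.ceil(60000 / requestsPerMinute); // 1200 ms at 50/min
  return Array.from({ length: requestCount }, (_, i) => i * spacingMs);
}

console.log(requestStartOffsetsMs(3)); // [ 0, 1200, 2400 ]
```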

"Invalid aspect ratio with references"

9:16 may not work with reference images
Use 16:9 for reference mode
Check Veo 3.1 docs for updates

"Video extension failed"

Only Veo-generated videos can be extended
Cannot extend arbitrary MP4s
Input must be from previous Veo job

Long generation times

1080p takes longer than 720p
Audio generation adds time
Reference images add processing
Frame interpolation is slowest

🎯 Status: Production Ready ✅

✅ All 6 tools implemented
✅ Token-efficient file handling
✅ Async operation support
✅ Batch generation with concurrency control
✅ Cost estimation
✅ Comprehensive validation
✅ Error handling
✅ Full documentation

Ready to generate amazing videos! 🚀

Built with 🎬 for AI video generation