VideoLingo
Netflix-level subtitle cutting, translation, alignment, and even dubbing - one-click fully automated AI video subtitle team | Netflix级字幕切割、翻译、对齐、甚至加上配音,一键全自动视频搬运AI字幕组 - CXL-edu/VideoLingo
README
<div align="center">
<img src="/docs/logo.png" alt="VideoLingo Logo" height="140">
Connect the World, Frame by Frame
English|简体中文|繁體中文|日本語|Español|Русский|Français
</div>
🌟 Overview (Try VL Now!)
VideoLingo is an all-in-one video translation, localization, and dubbing tool aimed at generating Netflix-quality subtitles. It eliminates stiff machine translations and multi-line subtitles while adding high-quality dubbing, enabling global knowledge sharing across language barriers.
Key features:
-
🎥 YouTube video download via yt-dlp
-
🎙️ Word-level and Low-illusion subtitle recognition with WhisperX
-
📝 NLP and AI-powered subtitle segmentation
-
📚 Custom + AI-generated terminology for coherent translation
-
🔄 3-step Translate-Reflect-Adaptation for cinematic quality
-
✅ Netflix-standard, Single-line subtitles Only
-
🗣️ Dubbing with GPT-SoVITS, Azure, OpenAI, and more
-
🚀 One-click startup and processing in Streamlit
-
🌍 Multi-language support in Streamlit UI
-
📝 Detailed logging with progress resumption
Difference from similar projects: Single-line subtitles only, superior translation quality, seamless dubbing experience
🎥 Demo
<table> <tr> <td width="50%">
Russian Translation
https://github.com/user-attachments/assets/25264b5b-6931-4d39-948c-5a1e4ce42fa7
</td> <td width="50%">
GPT-SoVITS Dubbing
https://github.com/user-attachments/assets/47d965b2-b4ab-4a0b-9d08-b49a7bf3508c
</td> </tr> </table>
Language Support
Input Language Support(more to come):
🇺🇸 English 🤩 | 🇷🇺 Russian 😊 | 🇫🇷 French 🤩 | 🇩🇪 German 🤩 | 🇮🇹 Italian 🤩 | 🇪🇸 Spanish 🤩 | 🇯🇵 Japanese 😐 | 🇨🇳 Chinese* 😊
*Chinese uses a separate punctuation-enhanced whisper model, for now...
Translation supports all languages, while dubbing language depends on the chosen TTS method.
Installation
You don't have to read the whole docs, here is an online AI agent to help you.
Note: For Windows users with NVIDIA GPU, follow these steps before installation:
- Install CUDA Toolkit 12.6
- Install CUDNN 9.3.0
- Add
C:\Program Files\NVIDIA\CUDNN\v9.3\bin\12.6
to your system PATH- Restart your computer
Note: FFmpeg is required. Please install it via package managers:
- Windows:
choco install ffmpeg
(via Chocolatey)- macOS:
brew install ffmpeg
(via Homebrew)- Linux:
sudo apt install ffmpeg
(Debian/Ubuntu)
- Clone the repository
git clone https://github.com/Huanshere/VideoLingo.git
cd VideoLingo
- Install dependencies(requires
python=3.10
)
conda create -n videolingo python=3.10.0 -y
conda activate videolingo
python install.py
- Start the application
streamlit run st.py
Docker
Alternatively, you can use Docker (requires CUDA 12.4 and NVIDIA Driver version >550), see Docker docs:
docker build -t videolingo .
docker run -d -p 8501:8501 --gpus all videolingo
APIs
VideoLingo supports OpenAI-Like API format and various TTS interfaces:
- LLM:
claude-3-5-sonnet-20240620
,deepseek-chat(v3)
,gemini-2.0-flash-exp
,gpt-4o
, ... (sorted by performance) - WhisperX: Run whisperX locally or use 302.ai API
- TTS:
azure-tts
,openai-tts
,siliconflow-fishtts
,fish-tts
,GPT-SoVITS
,edge-tts
,*custom-tts
(You can modify your own TTS in custom_tts.py!)
Note: VideoLingo works with 302.ai - one API key for all services (LLM, WhisperX, TTS). Or run locally with Ollama and Edge-TTS for free, no API needed!
For detailed installation, API configuration, and batch mode instructions, please refer to the documentation: English | 中文
Current Limitations
-
WhisperX transcription performance may be affected by video background noise, as it uses wav2vac model for alignment. For videos with loud background music, please enable Voice Separation Enhancement. Additionally, subtitles ending with numbers or special characters may be truncated early due to wav2vac's inability to map numeric characters (e.g., "1") to their spoken form ("one").
-
Using weaker models can lead to errors during intermediate processes due to strict JSON format requirements for responses. If this error occurs, please delete the
output
folder and retry with a different LLM, otherwise repeated execution will read the previous erroneous response causing the same error. -
The dubbing feature may not be 100% perfect due to differences in speech rates and intonation between languages, as well as the impact of the translation step. However, this project has implemented extensive engineering processing for speech rates to ensure the best possible dubbing results.
-
Multilingual video transcription recognition will only retain the main language. This is because whisperX uses a specialized model for a single language when forcibly aligning word-level subtitles, and will delete unrecognized languages.
-
Cannot dub multiple characters separately, as whisperX's speaker distinction capability is not sufficiently reliable.
📄 License
This project is licensed under the Apache 2.0 License. Special thanks to the following open source projects for their contributions:
whisperX, yt-dlp, json_repair, BELLE
📬 Contact Me
- Submit Issues or Pull Requests on GitHub
- DM me on Twitter: @Huanshere
- Email me at: team@videolingo.io
⭐ Star History
<p align="center">If you find VideoLingo helpful, please give me a ⭐️!</p>
推荐服务器
mult-fetch-mcp-server
一个多功能的、符合 MCP 规范的网页内容抓取工具,支持多种模式(浏览器/Node)、格式(HTML/JSON/Markdown/文本)和智能代理检测,并提供双语界面(英语/中文)。
Hyperbrowser
欢迎来到 Hyperbrowser,人工智能的互联网。Hyperbrowser 是下一代平台,旨在增强人工智能代理的能力,并实现轻松、可扩展的浏览器自动化。它专为人工智能开发者打造,消除了本地基础设施和性能瓶颈带来的麻烦,让您能够:
MCP Web Research Server
一个模型上下文协议服务器,使 Claude 能够通过集成 Google 搜索、提取网页内容和捕获屏幕截图来进行网络研究。

YouTube Translate MCP
一个模型上下文协议服务器,可以通过文字稿、翻译、摘要和各种语言的字幕生成来访问 YouTube 视频内容。
Fetch MCP Server
提供以各种格式(包括 HTML、JSON、纯文本和 Markdown)获取 Web 内容的功能。
Jina AI
Contribute to JoeBuildsStuff/mcp-jina-ai development by creating an account on GitHub.
Web Research Server
MCP web research server (give Claude real-time info from the web) - oneshot-engineering/mcp-webresearch

Mcp Server Chatsum
Please provide me with the chat message you want me to summarize and translate into Chinese. I need the text of the message to be able to help you.
MCP Web Research Server
MCP Web Research Server 通过集成 Google 搜索、捕获网页内容和屏幕截图以及跟踪研究会话,从而能够使用 Claude 进行实时网络研究。
MCP Deep Web Research Server
一个模型上下文协议服务器,使 Claude 能够执行高级网络研究,具备智能搜索队列、增强的内容提取和深度研究能力。