type
status
date
slug
summary
tags
category
icon
password
Linly-Dubbing is an open source multi-language AI dubbing and video translation tool.
It can automatically translate videos into other languages and generate subtitles, clone the voice of the speaker in the video and automatically dub it, and perform lip-syncing.
Key Features
- Automatically download videos:
- support downloading videos from websites such as YouTube
- Multi-language support: Supports dubbing and subtitle translation in Chinese and many other languages.
- AI Speech Recognition: Accurate speech recognition, speech-to-text conversion and speaker identification.
- LLM Translation: Combined with leading large language models (such as GPT), it can translate quickly and accurately, ensuring the professionalism and naturalness of the translation.
- Voice cloning: Through voice cloning technology, a voice that is highly similar to the original video dubbing is generated, maintaining consistency in emotion and tone.
- Lip Sync: By keeping the lip sync, the dubbing can be highly consistent with the video screen, improving the authenticity and interactivity of the video.
- Flexible upload and translation: Users can upload videos and choose the translation language and standard to ensure personalization and flexibility.
Technical Details
Speech Recognition
- WhisperX: An extension of the OpenAI Whisper speech recognition system that can transcribe speech content into text, accurately align it with video frames, generate subtitle files with timestamps, and support multi-speaker recognition.
- FunASR: A comprehensive speech recognition toolkit that provides speech recognition, voice activity detection, punctuation recovery and other functions, especially optimized for Chinese speech.
Speech synthesis
Integrates multiple advanced speech synthesis tools such as Edge TTS, XTTS and CosyVoice.
- Edge TTS: A high-quality text-to-speech conversion service provided by Microsoft that supports multiple languages and voice styles and generates natural and fluent speech output.
- XTTS: An advanced deep learning text-to-speech toolkit provided by Coqui, focusing on voice cloning and multilingual speech synthesis, which can achieve voice cloning through short audio clips and generate realistic speech output.
- CosyVoice: A multilingual speech understanding and synthesis model developed by Alibaba Tongyi Laboratory that supports high-quality speech synthesis and cross-language voice cloning in multiple languages.
Subtitle Translation
Use OpenAI API and Qwen model for multi-language subtitle translation.
- OpenAI API: Use OpenAI's GPT-4 and GPT-3.5-turbo for high-quality subtitle translation. These models are known for their natural language understanding and text generation capabilities, and are suitable for dialogue generation and text analysis.
- Qwen: An open source localized large-scale language model that supports multilingual translation and can process texts in multiple languages cost-effectively.
- Google Translate: Integrate Google Translate as a supplement to the translation function, providing wide language support and good translation quality.
Voice separation
Use Demucs and UVR5 technology to separate vocals from accompaniment.
- |Demucs: A sound separation model developed by the Facebook research team that can separate different sound sources in mixed audio, including musical instruments, voices, and background sounds. It is widely used in music production and film and television post-production.
- UVR5 (Ultimate Vocal Remover): An efficient vocal accompaniment separation tool that can extract accompaniment close to the original stereo, outperforming other similar tools such as RX9, RipX and SpectraLayers 9.
Lip Sync
- Drawing on Linly-Talker, we focus on digital human lip syncing technology, combining computer vision and speech recognition technology to accurately match the virtual character's lip sync with the dubbing, achieving a highly natural synchronization effect. This technology is suitable for a variety of scenarios such as animated characters, virtual anchors, and narrators in educational videos.
Video Processing
- Linly-Dubbing provides functions such as adding subtitles, inserting background music, adjusting volume and playback speed, so users can customize video content to make it more attractive and personalized.
- Integration of yt-dlp: yt-dlp is a powerful open source command line tool designed for downloading videos and audio from YouTube and other websites. The tool has a wide range of parameter options, allowing users to fine-tune the download behavior according to their needs. Whether it is selecting a specific format, resolution, or extracting audio, yt-dlp provides a flexible solution.
Demo Video
- Author:KCGOD
- URL:https://kcgod.com/linly-dubbing-open-source-multi-languague-video-translation-tool
- Copyright:All articles in this blog, except for special statements, adopt BY-NC-SA agreement. Please indicate the source!
Relate Posts
Google Launches Gemini-Powered Vids App for AI Video Creation
FLUX 1.1 Pro Ultra: Revolutionary AI Image Generator with 4MP Resolution
X-Portrait 2: ByteDance's Revolutionary AI Animation Tool for Cross-Style Expression Transfer
8 Best AI Video Generators Your YouTube Channel Needs
Meta AI’s Orion AR Glasses: Smart AI-Driven Tech to Replace Smartphones