🎥 Translate videos easily with SoniTranslate! 📽️
Upload a video, subtitle file, audio file, or provide a URL video link. 📽️ Get the updated notebook from the official repository: SoniTranslate!
See the Help tab for instructions on how to use it. Let's start having fun with video translation! 🎉🚀
This is the original language of the video
Select the target language and also make sure to choose the corresponding TTS for that language.
Select how many people are speaking in the video.
Select the voice you want for each speaker.
Replicate a person's voice across various languages.
Voice Imitation takes audio samples of each speaker from the main audio and processes them to replicate the reference speaker's tone. It does not replicate accent or emotion, which are governed by the base speaker TTS model rather than the converter. While effective with most voices when used appropriately, it may not achieve perfection in every case.
Active Voice Imitation: Replicates the original speaker's tone
Select a method for the Voice Imitation process.
Dereverb: Applies vocal dereverb to the audio samples.
Remove previous samples: Removes previously generated samples so new ones will be created.
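For a concrete picture of what the converter does, here is a minimal sketch of tone transfer using Coqui TTS's FreeVC voice-conversion model (one of the imitation methods mentioned in the News section). It is only an illustration with placeholder file names, not SoniTranslate's internal code.

```python
# Illustrative sketch only (placeholder file names), not SoniTranslate's code:
# transfer the tone of a reference speaker onto a generated TTS segment.
from TTS.api import TTS

vc = TTS(model_name="voice_conversion_models/multilingual/vctk/freevc24")

vc.voice_conversion_to_file(
    source_wav="tts_segment_speaker1.wav",        # translated TTS audio (content kept)
    target_wav="reference_sample_speaker1.wav",   # sample of the original speaker (tone source)
    file_path="tts_segment_speaker1_converted.wav",
)
```

Only the timbre of the target sample is transferred; the wording, pacing, and emotion of the source segment remain, which matches the note above that accent and emotion come from the base TTS model.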
Upload an audio file of no more than 10 seconds containing a voice. Using XTTS, a new TTS voice will be created, similar to the provided audio file.
Dereverb audio: Applies vocal dereverb to the audio
Generate XTTS voice automatically: You can use _XTTS_/AUTOMATIC.wav in the TTS selector to automatically generate segments for each speaker when generating the translation.
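As an illustration of what the XTTS options do behind the scenes, the sketch below clones a voice from a short reference clip with Coqui's public XTTS v2 API; the text and file names are placeholders, not SoniTranslate internals.

```python
# Minimal sketch (placeholder text and paths): voice cloning with Coqui XTTS v2
# from a reference clip of roughly 10 seconds or less.
from TTS.api import TTS

tts = TTS("tts_models/multilingual/multi-dataset/xtts_v2")

tts.tts_to_file(
    text="Hello, this segment is spoken with the cloned voice.",
    speaker_wav="speaker1_sample_10s.wav",  # short reference clip of the target voice
    language="en",
    file_path="speaker1_segment.wav",
)
```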
Acceleration Rate Regulation: Adjusts acceleration to accommodate segments requiring less speed, maintaining continuity and considering next-start timing.
Overlap Reduction: Ensures segments don't overlap by adjusting start times based on previous end times; could disrupt synchronization.
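The overlap-reduction rule can be pictured as a single pass over the segments that delays any segment starting before the previous one has ended. The following is only an illustrative sketch of that idea, not the project's actual implementation.

```python
# Illustrative sketch: push a segment's start forward so it never begins
# before the previous segment has ended (may cost some synchronization).
def reduce_overlap(segments):
    """segments: list of dicts with 'start' and 'end' times in seconds, in order."""
    previous_end = 0.0
    for seg in segments:
        if seg["start"] < previous_end:
            seg["start"] = previous_end
        previous_end = max(previous_end, seg["end"])
    return segments

print(reduce_overlap([
    {"start": 0.0, "end": 4.2},
    {"start": 3.8, "end": 7.0},  # starts before 4.2, so it is delayed to 4.2
]))
```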
Mix original and translated audio files to create a customized, balanced output with two available mixing modes.
Voiceless Track: Remove the original audio voices before combining it with the translated audio.
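Conceptually, both mixing modes overlay the translated voices on a base track, either the full original audio or its voiceless version. The sketch below shows the idea with pydub and placeholder file names; it is not SoniTranslate's own mixing code.

```python
# Sketch with placeholder file names: overlay translated voices on the
# original track at reduced volume (or on a voiceless track instead).
from pydub import AudioSegment

base = AudioSegment.from_file("original_audio.wav")        # or "voiceless_track.wav"
translated = AudioSegment.from_file("translated_voices.wav")

mixed = (base - 12).overlay(translated)  # lower the base by 12 dB, then overlay
mixed.export("mixed_output.wav", format="wav")
```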
Soft Subtitles: Optional subtitles that viewers can turn on or off while watching the video.
Burn Subtitles: Embed subtitles into the video, making them a permanent part of the visual content.
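These two modes correspond to the two usual ffmpeg approaches: muxing the subtitles as a toggleable stream versus rendering them into the video frames. The sketch below uses placeholder file names and is not necessarily the exact command SoniTranslate runs.

```python
# Sketch (placeholder file names): the usual ffmpeg commands behind
# soft (toggleable) vs. burned (hard-coded) subtitles.
import subprocess

# Soft subtitles: copy the streams and add the SRT as a switchable subtitle track.
subprocess.run([
    "ffmpeg", "-i", "video.mp4", "-i", "subs.srt",
    "-c", "copy", "-c:s", "mov_text", "video_soft_subs.mp4",
], check=True)

# Burned subtitles: render the text permanently into the frames.
subprocess.run([
    "ffmpeg", "-i", "video.mp4", "-vf", "subtitles=subs.srt",
    "video_burned_subs.mp4",
], check=True)
```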
Configure transcription.
Literalize Numbers: Replace numerical representations with their written equivalents in the transcript.
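This is the same kind of transformation the num2words library performs; whether SoniTranslate uses this exact package internally is an assumption, but the sketch shows the effect.

```python
# Sketch of the "Literalize Numbers" effect (the tool's internal implementation may differ).
from num2words import num2words

print(num2words(1984))             # e.g. "one thousand, nine hundred and eighty-four"
print(num2words(1984, lang="es"))  # e.g. "mil novecientos ochenta y cuatro"
```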
Sound Cleanup: Enhance vocals, remove background noise before transcription for utmost timestamp precision. This operation may take time, especially with lengthy audio files.
It converts spoken language to text using the Whisper model by default. To use a custom model, enter its repository name in the dropdown; for example, 'BELLE-2/Belle-whisper-large-v3-zh' uses a Chinese-language fine-tuned model. Find fine-tuned models on Hugging Face.
Choosing smaller types like int8 or float16 can improve performance by reducing memory usage and increasing computational throughput, but may sacrifice precision compared to larger data types like float32.
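As a sketch of what these two settings amount to, the snippet below loads the fine-tuned repository mentioned above with a reduced-precision dtype through the Hugging Face transformers pipeline; SoniTranslate's own loading code may differ in its details, and the audio path is a placeholder.

```python
# Sketch: a fine-tuned Whisper repository loaded with a smaller dtype
# (placeholder audio path; SoniTranslate's internal loader may differ).
import torch
from transformers import pipeline

asr = pipeline(
    "automatic-speech-recognition",
    model="BELLE-2/Belle-whisper-large-v3-zh",  # repository ID from the example above
    torch_dtype=torch.float16,                  # smaller dtype: less memory, higher throughput
    device_map="auto",
)

result = asr("audio_sample.wav", return_timestamps=True)
print(result["text"])
```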
Divide text into segments by sentences, words, or characters. Word and character segmentation offer finer granularity, useful for subtitles; disabling translation preserves original structure.
Task Status Sound: Plays a sound alert indicating task completion or errors during execution.
Retrieve Progress: Continue process from last checkpoint.
Preview cuts the video to only 10 seconds for testing purposes. Please deactivate it to retrieve the full video duration.
Edit generated subtitles: Allows you to run the translation in two steps. First, use the 'GET SUBTITLES AND EDIT' button to get the subtitles and edit them; then, use the 'TRANSLATE' button to generate the video.
| VIDEO | Media link | Video Path | HF Token | Preview | Whisper ASR model | Batch size | Compute type | Source language | Translate audio to | Min speakers | Max speakers | TTS Speaker 1 | TTS Speaker 2 |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
It can be a PDF, DOCX, or TXT file, or plain text.
This is the original language of the text
Select the target language and also make sure to choose the corresponding TTS for that language.
Videobook config
1. To enable its use, check the enable box.
Check this to enable the use of the models.
2. Select the voice to apply to each corresponding speaker's TTS and apply the configurations.
Depending on how many speakers you will use, each one needs its respective model. Additionally, there is an auxiliary one in case the speaker is not detected correctly.
TTS Speaker 01
TTS Speaker 02
TTS Speaker 03
TTS Speaker 04
TTS Speaker 05
TTS Speaker 06
TTS Speaker 07
TTS Speaker 08
TTS Speaker 09
TTS Speaker 10
TTS Speaker 11
TTS Speaker 12
🔰 Instructions for use:
📤 Upload a video, subtitle file, audio file, or provide a 🌐 URL link to a video like YouTube.
🌍 Choose the language in which you want to translate the video.
🗣️ Specify the number of people speaking in the video and assign each one a text-to-speech voice suitable for the translation language.
🚀 Press the 'Translate' button to obtain the results.
🧩 SoniTranslate supports different TTS (Text-to-Speech) engines, which are:
- EDGE-TTS → format `en-AU-WilliamNeural-Male` → Fast and accurate (see the sketch after this list for the voice name format).
- FACEBOOK MMS → format `en-facebook-mms VITS` → The voice is more natural; at the moment, it only uses CPU.
- PIPER TTS → format `en_US-lessac-high VITS-onnx` → Same as the previous one, but it is optimized for both CPU and GPU.
- BARK → format `en_speaker_0-Male BARK` → Good quality but slow, and it is prone to hallucinations.
- OpenAI TTS → format `>alloy OpenAI-TTS` → Multilingual, but it needs an OpenAI API key.
- Coqui XTTS → format `_XTTS_/AUTOMATIC.wav` → Only available for Chinese (Simplified), English, French, German, Italian, Portuguese, Polish, Turkish, Russian, Dutch, Czech, Arabic, Spanish, Hungarian, Korean, and Japanese.
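For example, the EDGE-TTS voice identifier format shown above matches the voice names accepted by the edge-tts Python package, as in this minimal sketch (the text and output path are placeholders):

```python
# Minimal sketch (placeholder text and output path) using an EDGE-TTS voice name.
import asyncio
import edge_tts

async def speak() -> None:
    communicate = edge_tts.Communicate(
        "G'day, this is a translated line.", "en-AU-WilliamNeural-Male"
    )
    await communicate.save("speaker1_line.mp3")

asyncio.run(speak())
```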
🎤 How to Use R.V.C. and R.V.C.2 Voices (Optional) 🎶
The goal is to apply an R.V.C. model to the generated TTS (Text-to-Speech) 🎙️
In the Custom Voice R.V.C. tab, download the models you need 📥 You can use links from Hugging Face and Google Drive in formats like zip, pth, or index. You can also download complete HF space repositories, but this option is not very stable 🌐
Now, go to Replace voice: TTS to R.V.C. and check the enable box ✅ After this, you can choose the models you want to apply to each TTS speaker 👩‍🦰👨‍🦱👩‍🦳👨‍🦲
Adjust the F0 method that will be applied to all R.V.C. models 🎛️
Press APPLY CONFIGURATION to apply the changes you made 🎬
Go back to the video translation tab and click on 'Translate' ▶️ Now, the translation will be done applying the R.V.C. 🗣️
Tip: You can use Test R.V.C. to experiment and find the best TTS or configurations to apply to the R.V.C. 🧪🔍
📖 News
🔥 2024/18/05: Overlap reduction. OpenAI API key integration for transcription, translation, and TTS. New output types: subtitles by speaker, separate audio sound, and video only with subtitles. You now have access to better-performing fine-tuned versions of Whisper for transcribing speech; for example, you can use kotoba-tech/kotoba-whisper-v1.1 for Japanese transcription. Find these improved models on the Hugging Face Whisper page, then copy the repository ID and paste it into the 'Whisper ASR model' field in 'Advanced Settings'. Support for ASS subtitles and batch processing with subtitles. Vocal enhancement before transcription. Added CPU mode with app_rvc.py --cpu_mode. TTS now supports up to 12 speakers. OpenVoiceV2 has been integrated for voice imitation. PDF to videobook (displays images from the PDF).
🔥 2024/03/02: Preserve file names in output. Multiple files can now be submitted simultaneously by specifying their paths, directories, or URLs separated by commas. Added an option for disabling diarization. Implemented soft subtitles. Format output (MP3, MP4, MKV, WAV, and OGG), and resolved issues related to file reading and diarization.
🔥 2024/02/22: Added freevc for voice imitation, fixed the voiceless track, divide segments. Support for new languages. New translations of the GUI. With a subtitle file, no alignment or media file is needed to process the SRT file. Burn subtitles into the video. The queue can accept multiple tasks simultaneously. Sound alert notification. Continue the process from the last checkpoint. Acceleration rate regulation.
🔥 2024/01/16: Expanded language support, the introduction of whisper large v3, configurable GUI options, integration of BARK, Facebook-mms, Coqui XTTS, and Piper-TTS. Additional features include audio separation utilities, XTTS WAV creation, using an SRT file as a base for translation, document translation, manual speaker editing, and flexible output options (video, audio, subtitles).
🔥 2023/10/29: Edit the translated subtitle, download it, and adjust volume and speed options.
🔥 2023/08/03: Changed default options and added a directory view of downloads.
🔥 2023/08/02: Added support for Arabic, Czech, Danish, Finnish, Greek, Hebrew, Hungarian, Korean, Persian, Polish, Russian, Turkish, Urdu, Hindi, and Vietnamese languages. 🌐
🔥 2023/08/01: Added options for using R.V.C. models.
🔥 2023/07/27: Fixed some bugs when processing the video and audio.
🔥 2023/07/26: New UI and added mix options.