🎥 Translate videos easily with SoniTranslate! 📽️
Upload a video, subtitle file, audio file, or provide a URL video link. 📽️ Get the updated notebook from the official repository: SoniTranslate!
See the Help tab for instructions on how to use it. Let's start having fun with video translation! 🎉🚀
This is the original language of the video
Select the target language and also make sure to choose the corresponding TTS for that language.
Select how many people are speaking in the video.
Select the voice you want for each speaker.
Replicate a person's voice across various languages.
Voice Imitation takes audio samples of each speaker from the main audio and processes them to replicate the reference speaker's tone. It does not replicate accent or emotion, which are governed by the base speaker TTS model rather than the converter. While effective with most voices when used appropriately, it may not achieve perfection in every case.
Active Voice Imitation: Replicates the original speaker's tone
Select a method for the Voice Imitation process.
Dereverb: Applies vocal dereverb to the audio samples.
Remove previous samples: Removes previously generated samples so new ones will be created.
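For a concrete picture of what the converter does, here is a minimal sketch of tone transfer using Coqui TTS's FreeVC voice-conversion model (one of the imitation methods mentioned in the News section). It is only an illustration with placeholder file names, not SoniTranslate's internal code.

```python
# Illustrative sketch only (placeholder file names), not SoniTranslate's code:
# transfer the tone of a reference speaker onto a generated TTS segment.
from TTS.api import TTS

vc = TTS(model_name="voice_conversion_models/multilingual/vctk/freevc24")

vc.voice_conversion_to_file(
    source_wav="tts_segment_speaker1.wav",        # translated TTS audio (content kept)
    target_wav="reference_sample_speaker1.wav",   # sample of the original speaker (tone source)
    file_path="tts_segment_speaker1_converted.wav",
)
```

Only the timbre of the target sample is transferred; the wording, pacing, and emotion of the source segment remain, which matches the note above that accent and emotion come from the base TTS model.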
Upload an audio file of no more than 10 seconds containing a voice. Using XTTS, a new TTS voice will be created, similar to the provided audio file.
Dereverb audio: Applies vocal dereverb to the audio
Generate XTTS voice automatically: You can use _XTTS_/AUTOMATIC.wav in the TTS selector to automatically generate segments for each speaker when generating the translation.
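As an illustration of what the XTTS options do behind the scenes, the sketch below clones a voice from a short reference clip with Coqui's public XTTS v2 API; the text and file names are placeholders, not SoniTranslate internals.

```python
# Minimal sketch (placeholder text and paths): voice cloning with Coqui XTTS v2
# from a reference clip of roughly 10 seconds or less.
from TTS.api import TTS

tts = TTS("tts_models/multilingual/multi-dataset/xtts_v2")

tts.tts_to_file(
    text="Hello, this segment is spoken with the cloned voice.",
    speaker_wav="speaker1_sample_10s.wav",  # short reference clip of the target voice
    language="en",
    file_path="speaker1_segment.wav",
)
```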
Acceleration Rate Regulation: Adjusts acceleration to accommodate segments requiring less speed, maintaining continuity and considering next-start timing.
Overlap Reduction: Ensures segments don't overlap by adjusting start times based on previous end times; could disrupt synchronization.
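The overlap-reduction rule can be pictured as a single pass over the segments that delays any segment starting before the previous one has ended. The following is only an illustrative sketch of that idea, not the project's actual implementation.

```python
# Illustrative sketch: push a segment's start forward so it never begins
# before the previous segment has ended (may cost some synchronization).
def reduce_overlap(segments):
    """segments: list of dicts with 'start' and 'end' times in seconds, in order."""
    previous_end = 0.0
    for seg in segments:
        if seg["start"] < previous_end:
            seg["start"] = previous_end
        previous_end = max(previous_end, seg["end"])
    return segments

print(reduce_overlap([
    {"start": 0.0, "end": 4.2},
    {"start": 3.8, "end": 7.0},  # starts before 4.2, so it is delayed to 4.2
]))
```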
Mix original and translated audio files to create a customized, balanced output with two available mixing modes.
Voiceless Track: Remove the original audio voices before combining it with the translated audio.
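Conceptually, both mixing modes overlay the translated voices on a base track, either the full original audio or its voiceless version. The sketch below shows the idea with pydub and placeholder file names; it is not SoniTranslate's own mixing code.

```python
# Sketch with placeholder file names: overlay translated voices on the
# original track at reduced volume (or on a voiceless track instead).
from pydub import AudioSegment

base = AudioSegment.from_file("original_audio.wav")        # or "voiceless_track.wav"
translated = AudioSegment.from_file("translated_voices.wav")

mixed = (base - 12).overlay(translated)  # lower the base by 12 dB, then overlay
mixed.export("mixed_output.wav", format="wav")
```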
Soft Subtitles: Optional subtitles that viewers can turn on or off while watching the video.
Burn Subtitles: Embed subtitles into the video, making them a permanent part of the visual content.
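These two modes correspond to the two usual ffmpeg approaches: muxing the subtitles as a toggleable stream versus rendering them into the video frames. The sketch below uses placeholder file names and is not necessarily the exact command SoniTranslate runs.

```python
# Sketch (placeholder file names): the usual ffmpeg commands behind
# soft (toggleable) vs. burned (hard-coded) subtitles.
import subprocess

# Soft subtitles: copy the streams and add the SRT as a switchable subtitle track.
subprocess.run([
    "ffmpeg", "-i", "video.mp4", "-i", "subs.srt",
    "-c", "copy", "-c:s", "mov_text", "video_soft_subs.mp4",
], check=True)

# Burned subtitles: render the text permanently into the frames.
subprocess.run([
    "ffmpeg", "-i", "video.mp4", "-vf", "subtitles=subs.srt",
    "video_burned_subs.mp4",
], check=True)
```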
Configure transcription.
Literalize Numbers: Replace numerical representations with their written equivalents in the transcript.
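This is the same kind of transformation the num2words library performs; whether SoniTranslate uses this exact package internally is an assumption, but the sketch shows the effect.

```python
# Sketch of the "Literalize Numbers" effect (the tool's internal implementation may differ).
from num2words import num2words

print(num2words(1984))             # e.g. "one thousand, nine hundred and eighty-four"
print(num2words(1984, lang="es"))  # e.g. "mil novecientos ochenta y cuatro"
```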
Sound Cleanup: Enhance vocals, remove background noise before transcription for utmost timestamp precision. This operation may take time, especially with lengthy audio files.
It converts spoken language to text using the Whisper model by default. To use a custom model, enter its repository name in the dropdown; for example, 'BELLE-2/Belle-whisper-large-v3-zh' uses a Chinese-language fine-tuned model. Find fine-tuned models on Hugging Face.
Choosing smaller types like int8 or float16 can improve performance by reducing memory usage and increasing computational throughput, but may sacrifice precision compared to larger data types like float32.
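As a sketch of what these two settings amount to, the snippet below loads the fine-tuned repository mentioned above with a reduced-precision dtype through the Hugging Face transformers pipeline; SoniTranslate's own loading code may differ in its details, and the audio path is a placeholder.

```python
# Sketch: a fine-tuned Whisper repository loaded with a smaller dtype
# (placeholder audio path; SoniTranslate's internal loader may differ).
import torch
from transformers import pipeline

asr = pipeline(
    "automatic-speech-recognition",
    model="BELLE-2/Belle-whisper-large-v3-zh",  # repository ID from the example above
    torch_dtype=torch.float16,                  # smaller dtype: less memory, higher throughput
    device_map="auto",
)

result = asr("audio_sample.wav", return_timestamps=True)
print(result["text"])
```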
Divide text into segments by sentences, words, or characters. Word and character segmentation offer finer granularity, useful for subtitles; disabling translation preserves original structure.
Task Status Sound: Plays a sound alert indicating task completion or errors during execution.
Retrieve Progress: Continue process from last checkpoint.
Preview cuts the video to only 10 seconds for testing purposes. Please deactivate it to retrieve the full video duration.
Edit generated subtitles: Allows you to run the translation in two steps. First, use the 'GET SUBTITLES AND EDIT' button to get the subtitles and edit them; then, use the 'TRANSLATE' button to generate the video.
| VIDEO | Media link | Video Path | HF Token | Preview | Whisper ASR model | Batch size | Compute type | Source language | Translate audio to | Min speakers | Max speakers | TTS Speaker 1 | TTS Speaker 2 |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
It can be a PDF, DOCX, or TXT file, or plain text.
This is the original language of the text
Select the target language and also make sure to choose the corresponding TTS for that language.
Videobook config
1. To enable its use, check the enable box.
Check this to enable the use of the models.
2. Select the voice to apply to each corresponding speaker's TTS and apply the configurations.
Depending on how many speakers you will use, each one needs its respective model. Additionally, there is an auxiliary one in case the speaker is not detected correctly.
TTS Speaker 01
TTS Speaker 02
TTS Speaker 03
TTS Speaker 04
TTS Speaker 05
TTS Speaker 06
TTS Speaker 07
TTS Speaker 08
TTS Speaker 09
TTS Speaker 10
TTS Speaker 11
TTS Speaker 12
🔰 Instructions for use:
📤 Upload a video, subtitle file, audio file, or provide a 🌐 URL link to a video like YouTube.
🌍 Choose the language in which you want to translate the video.
🗣️ Specify the number of people speaking in the video and assign each one a text-to-speech voice suitable for the translation language.
🚀 Press the 'Translate' button to obtain the results.
🧩 SoniTranslate supports different TTS (Text-to-Speech) engines, which are:
- EDGE-TTS → format `en-AU-WilliamNeural-Male` → Fast and accurate (see the sketch after this list for the voice name format).
- FACEBOOK MMS → format `en-facebook-mms VITS` → The voice is more natural; at the moment, it only uses CPU.
- PIPER TTS → format `en_US-lessac-high VITS-onnx` → Same as the previous one, but it is optimized for both CPU and GPU.
- BARK → format `en_speaker_0-Male BARK` → Good quality but slow, and it is prone to hallucinations.
- OpenAI TTS → format `>alloy OpenAI-TTS` → Multilingual, but it needs an OpenAI API key.
- Coqui XTTS → format `_XTTS_/AUTOMATIC.wav` → Only available for Chinese (Simplified), English, French, German, Italian, Portuguese, Polish, Turkish, Russian, Dutch, Czech, Arabic, Spanish, Hungarian, Korean, and Japanese.
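For example, the EDGE-TTS voice identifier format shown above matches the voice names accepted by the edge-tts Python package, as in this minimal sketch (the text and output path are placeholders):

```python
# Minimal sketch (placeholder text and output path) using an EDGE-TTS voice name.
import asyncio
import edge_tts

async def speak() -> None:
    communicate = edge_tts.Communicate(
        "G'day, this is a translated line.", "en-AU-WilliamNeural-Male"
    )
    await communicate.save("speaker1_line.mp3")

asyncio.run(speak())
```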
🎤 How to Use R.V.C. and R.V.C.2 Voices (Optional) 🎶
The goal is to apply an R.V.C. model to the generated TTS (Text-to-Speech) 🎙️
In the Custom Voice R.V.C. tab, download the models you need 📥 You can use links from Hugging Face and Google Drive in formats like zip, pth, or index. You can also download complete HF space repositories, but this option is not very stable 🌐
Now, go to Replace voice: TTS to R.V.C. and check the enable box ✅ After this, you can choose the models you want to apply to each TTS speaker 👩‍🦰👨‍🦱👩‍🦳👨‍🦲
Adjust the F0 method that will be applied to all R.V.C. models 🎛️
Press APPLY CONFIGURATION to apply the changes you made 🎬
Go back to the video translation tab and click on 'Translate' ▶️ Now, the translation will be done applying the R.V.C. 🗣️
Tip: You can use Test R.V.C. to experiment and find the best TTS or configurations to apply to the R.V.C. 🧪🔍
📖 News
🔥 2024/18/05: Overlap reduction. OpenAI API key integration for transcription, translation, and TTS. New output types: subtitles by speaker, separate audio sound, and video only with subtitles. You now have access to better-performing fine-tuned versions of Whisper for transcribing speech; for example, you can use kotoba-tech/kotoba-whisper-v1.1 for Japanese transcription. Find these improved models on the Hugging Face Whisper page, then copy the repository ID and paste it into the 'Whisper ASR model' field in 'Advanced Settings'. Support for ASS subtitles and batch processing with subtitles. Vocal enhancement before transcription. Added CPU mode with app_rvc.py --cpu_mode. TTS now supports up to 12 speakers. OpenVoiceV2 has been integrated for voice imitation. PDF to videobook (displays images from the PDF).
🔥 2024/03/02: Preserve file names in output. Multiple files can now be submitted simultaneously by specifying their paths, directories, or URLs separated by commas. Added an option for disabling diarization. Implemented soft subtitles. Format output (MP3, MP4, MKV, WAV, and OGG), and resolved issues related to file reading and diarization.
🔥 2024/02/22: Added freevc for voice imitation, fixed the voiceless track, divide segments. Support for new languages. New translations of the GUI. With a subtitle file, no alignment or media file is needed to process the SRT file. Burn subtitles into the video. The queue can accept multiple tasks simultaneously. Sound alert notification. Continue the process from the last checkpoint. Acceleration rate regulation.
🔥 2024/01/16: Expanded language support, the introduction of whisper large v3, configurable GUI options, integration of BARK, Facebook-mms, Coqui XTTS, and Piper-TTS. Additional features include audio separation utilities, XTTS WAV creation, using an SRT file as a base for translation, document translation, manual speaker editing, and flexible output options (video, audio, subtitles).
🔥 2023/10/29: Edit the translated subtitle, download it, and adjust volume and speed options.
🔥 2023/08/03: Changed default options and added a directory view of downloads.
🔥 2023/08/02: Added support for Arabic, Czech, Danish, Finnish, Greek, Hebrew, Hungarian, Korean, Persian, Polish, Russian, Turkish, Urdu, Hindi, and Vietnamese languages. 🌐
🔥 2023/08/01: Added options for using R.V.C. models.
🔥 2023/07/27: Fixed some bugs when processing the video and audio.
🔥 2023/07/26: New UI and added mix options.