๐Ÿ“ฝ๏ธ SoniTranslate ๐Ÿˆท๏ธ

🎥 Translate videos easily with SoniTranslate! 📽️

Upload a video, subtitle, or audio file, or provide a video URL. 📽️ Get the updated notebook from the official repository: SoniTranslate!

See the Help tab for instructions on how to use it. Let's start having fun with video translation! 🚀🎉

Choose Video Source
Source language

This is the original language of the video

Translate audio to

Select the target language and also make sure to choose the corresponding TTS for that language.


Select how many people are speaking in the video (from 1 to 12).

Select the voice you want for each speaker.

TTS Speaker 1
TTS Speaker 2

Replicate a person's voice across various languages.

While effective with most voices when used appropriately, it may not achieve perfection in every case. Voice Imitation solely replicates the reference speaker's tone, excluding accent and emotion, which are governed by the base speaker TTS model and not replicated by the converter. This will take audio samples from the main audio for each speaker and process them.

Active Voice Imitation: Replicates the original speaker's tone

Method

Select a method for the Voice Imitation process.


Dereverb: Applies vocal dereverb to the audio samples.

Remove previous samples: Deletes the previously generated samples, so new ones must be created.

Upload an audio file of at most 10 seconds that contains a voice. Using XTTS, a new TTS voice will be created that sounds similar to the voice in the provided file.

Dereverb audio: Applies vocal dereverb to the audio.

Generate voice xtts automatically: You can use _XTTS_/AUTOMATIC.wav in the TTS selector to automatically generate segments for each speaker when generating the translation.


Acceleration Rate Regulation: Adjusts acceleration to accommodate segments requiring less speed, maintaining continuity and considering next-start timing.

Overlap Reduction: Ensures segments don't overlap by adjusting start times based on previous end times; could disrupt synchronization.
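
The Overlap Reduction behaviour described above can be sketched roughly as follows. This is a minimal illustration under assumptions, not SoniTranslate's actual implementation: the `Segment` type and the `reduce_overlap` function are hypothetical names, and each segment is assumed to keep its original duration when it is shifted.

```python
from dataclasses import dataclass, replace
from typing import List


@dataclass(frozen=True)
class Segment:
    """Hypothetical container for one translated speech segment."""
    start: float  # seconds
    end: float    # seconds
    text: str


def reduce_overlap(segments: List[Segment]) -> List[Segment]:
    """Delay any segment that would start before the previous one ends."""
    result: List[Segment] = []
    prev_end = 0.0
    for seg in segments:
        duration = seg.end - seg.start
        # Push the start forward only if it collides with the previous segment.
        start = max(seg.start, prev_end)
        shifted = replace(seg, start=start, end=start + duration)
        result.append(shifted)
        prev_end = shifted.end
    return result


segments = [
    Segment(0.0, 2.0, "first"),
    Segment(1.5, 3.0, "second"),  # overlaps the first segment by 0.5 s
    Segment(5.0, 6.0, "third"),
]
adjusted = reduce_overlap(segments)
```

In this sketch the second segment is delayed by half a second, which illustrates why the option can disrupt synchronization with the original video.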


Audio Mixing Method

Mix original and translated audio files to create a customized, balanced output with two available mixing modes.


Voiceless Track: Removes the voices from the original audio before combining it with the translated audio.


Subtitle type

Soft Subtitles: Optional subtitles that viewers can turn on or off while watching the video.

Burn Subtitles: Embed subtitles into the video, making them a permanent part of the visual content.
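
The difference between the two subtitle types can be illustrated with the ffmpeg command line each one typically corresponds to. This is a sketch under assumptions: `build_subtitle_command` is a hypothetical helper, the output is assumed to be an MP4 file (hence the `mov_text` subtitle codec), and SoniTranslate itself may build its commands differently.

```python
from typing import List


def build_subtitle_command(
    video: str, subtitles: str, output: str, burn: bool
) -> List[str]:
    """Return an ffmpeg argument list for soft or burned-in subtitles."""
    if burn:
        # Burned: the subtitles filter draws the text onto the frames,
        # so the video stream has to be re-encoded; audio is copied as-is.
        return [
            "ffmpeg", "-i", video,
            "-vf", f"subtitles={subtitles}",
            "-c:a", "copy",
            output,
        ]
    # Soft: the subtitle file is muxed in as a separate, switchable stream;
    # video and audio are copied without re-encoding.
    return [
        "ffmpeg", "-i", video, "-i", subtitles,
        "-c", "copy", "-c:s", "mov_text",
        output,
    ]


soft_cmd = build_subtitle_command("video.mp4", "subs.srt", "soft.mp4", burn=False)
burn_cmd = build_subtitle_command("video.mp4", "subs.srt", "burned.mp4", burn=True)
```

The lists can be passed to `subprocess.run` if ffmpeg is installed. Subtitle paths containing special characters may need extra escaping inside the `subtitles=` filter argument.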


Config transcription.

Literalize Numbers: Replace numerical representations with their written equivalents in the transcript.
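
As a rough illustration of what Literalize Numbers means, the sketch below replaces whole numbers from 0 to 999 in English text with their written form. The function names are hypothetical and the logic is deliberately simplified (English only, no decimals, larger numbers left unchanged); the application's own conversion may work differently.

```python
import re

ONES = [
    "zero", "one", "two", "three", "four", "five", "six", "seven", "eight",
    "nine", "ten", "eleven", "twelve", "thirteen", "fourteen", "fifteen",
    "sixteen", "seventeen", "eighteen", "nineteen",
]
TENS = ["", "", "twenty", "thirty", "forty", "fifty",
        "sixty", "seventy", "eighty", "ninety"]


def number_to_words(n: int) -> str:
    """Spell out an integer in the range 0..999 in English."""
    if n < 20:
        return ONES[n]
    if n < 100:
        tens, ones = divmod(n, 10)
        return TENS[tens] + ("-" + ONES[ones] if ones else "")
    hundreds, rest = divmod(n, 100)
    words = ONES[hundreds] + " hundred"
    return words + (" " + number_to_words(rest) if rest else "")


def literalize_numbers(text: str) -> str:
    """Replace each run of digits below 1000 with its written equivalent."""
    def repl(match: re.Match) -> str:
        n = int(match.group())
        # Numbers outside the supported range are left untouched.
        return number_to_words(n) if n < 1000 else match.group()

    return re.sub(r"\d+", repl, text)


example = literalize_numbers("I have 2 cats and 15 dogs")
```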

Sound Cleanup: Enhances vocals and removes background noise before transcription to improve timestamp precision. This operation may take time, especially with long audio files.

Whisper ASR model

Converts spoken language to text, using the Whisper model by default. To use a custom model, enter its repository name in the dropdown; for example, 'BELLE-2/Belle-whisper-large-v3-zh' selects a model fine-tuned for Chinese. Fine-tuned models can be found on Hugging Face.

Compute type

Choosing smaller types like int8 or float16 can improve performance by reducing memory usage and increasing computational throughput, but may sacrifice precision compared to larger data types like float32.
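
The memory side of this trade-off can be estimated with simple arithmetic: each value takes 4 bytes in float32, 2 bytes in float16, and 1 byte in int8. The sketch below uses a hypothetical `weight_memory_mb` helper and a made-up parameter count of one billion; it counts only the stored weights and ignores activations and runtime overhead.

```python
BYTES_PER_VALUE = {"float32": 4, "float16": 2, "int8": 1}


def weight_memory_mb(num_parameters: int, compute_type: str) -> float:
    """Approximate memory needed to store the weights, in megabytes."""
    return num_parameters * BYTES_PER_VALUE[compute_type] / 1e6


# Hypothetical model size, chosen only to keep the arithmetic easy to follow.
NUM_PARAMETERS = 1_000_000_000

estimates = {
    compute_type: weight_memory_mb(NUM_PARAMETERS, compute_type)
    for compute_type in BYTES_PER_VALUE
}
```

Under these assumptions, int8 needs a quarter of the weight memory of float32, at the cost of coarser numeric precision.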


Text Segmentation Scale

Divide text into segments by sentences, words, or characters. Word and character segmentation offer finer granularity, useful for subtitles; disabling translation preserves original structure.
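
The three segmentation scales can be illustrated with a small sketch. The `segment_text` function is a hypothetical helper; it uses naive punctuation-based sentence splitting and whitespace-based word splitting, and operates on plain text without the timestamps the real pipeline works with.

```python
import re
from typing import List


def segment_text(text: str, scale: str) -> List[str]:
    """Split text by 'sentence', 'word', or 'character'."""
    if scale == "sentence":
        # Split after sentence-ending punctuation followed by whitespace.
        parts = re.split(r"(?<=[.!?])\s+", text.strip())
        return [p for p in parts if p]
    if scale == "word":
        return text.split()
    if scale == "character":
        # Whitespace is dropped so every segment is a visible character.
        return [c for c in text if not c.isspace()]
    raise ValueError(f"unknown scale: {scale!r}")


sample = "Hello there. How are you?"
by_sentence = segment_text(sample, "sentence")
by_word = segment_text(sample, "word")
by_character = segment_text("Hi you", "character")
```

Sentence segments suit dubbing, while word-level and character-level segments give the finer granularity that subtitles can benefit from.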


Diarization model
Translation process

Output type

Task Status Sound: Plays a sound alert indicating task completion or errors during execution.

Retrieve Progress: Continues the process from the last checkpoint.

Preview: Cuts the video to only 10 seconds for testing purposes. Deactivate it to get the full-length video.

Edit generated subtitles: Allows you to run the translation in two steps. First, use the 'GET SUBTITLES AND EDIT' button to get the subtitles and edit them; then use the 'TRANSLATE' button to generate the video.


Examples
VIDEO | Media link | Video Path | HF Token | Preview | Whisper ASR model | Batch size | Compute type | Source language | Translate audio to | Min speakers | Max speakers | TTS Speaker 1 | TTS Speaker 2