SeamlessM4T translates speech and dubs it into another language


Generative models and, more generally, artificial intelligence algorithms are increasingly taking on the lion's share of voice input processing and real-time translation from one language to another. SeamlessM4T is a Meta project presented as the first multilingual and multimodal AI model for translation and transcription. It lets you communicate easily through speech and text in different languages.

Building a universal translator, like the Babel fish featured in the humorous science fiction novel "The Hitchhiker's Guide to the Galaxy", is a truly formidable challenge, because existing speech-to-speech and speech-to-text translation systems cover only a small fraction of the world's languages. SeamlessM4T is nonetheless a huge step forward: it reduces errors and delays while increasing the efficiency and quality of the translation process. This lets people who speak different languages communicate much more effectively.


Why SeamlessM4T is multimodal

In the case of SeamlessM4T, we speak of a multimodal experience because the model is not limited to a single mode of communication, such as text or voice, but can handle different input and output modes in an integrated way. In other words, SeamlessM4T can translate and transcribe not only text but also speech, in various combinations of formats and languages.

This multimodal capability matters because it reflects how people actually communicate, using both speech and text depending on the situation. Consider, for example, translating speech in a foreign language into written text to share it with someone who doesn't speak or understand that language. Or think of translating text written in a language you don't know and having it automatically dubbed, to help someone who can't read.

SeamlessM4T's multimodal experience thus lets users choose the communication method best suited to their needs and easily translate or transcribe both text and speech. This makes communication between people who speak different languages smoother and more natural, removing language barriers across different communication channels.

What is dubbing

Dubbing is a practice used in the entertainment industry, especially in film and television, in which the original voices of actors or characters are replaced with voices speaking a translation in another language. This allows audiences who speak a language other than that of the original film or series to understand the content without having to read subtitles.

Voice actors record their lines over the original dialogue, trying to synchronize them with the lip movements and intonation of the original actors to make the viewing experience as realistic as possible.

SeamlessM4T brings dubbing to every user: it recognizes the original speech, its intonation and the nuances of the voice, and generates audio in another language that is as close as possible to the original version.

What can SeamlessM4T do?

SeamlessM4T supports automatic speech recognition in nearly 100 languages, speech-to-text translation for nearly 100 input and output languages, and speech-to-speech translation with nearly 100 input languages and 36 output languages.

The platform introduced by Meta also offers text-to-text translation in nearly 100 languages, as well as text-to-speech translation supporting nearly 100 input languages and 35 output languages.
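The capabilities listed above differ only in their input and output modality, so choosing between them can be sketched as a lookup table. This is a minimal Python sketch, not part of SeamlessM4T: the codes s2st and t2tt come from the command-line examples later in this article, while asr, s2tt and t2st are assumptions that follow the same naming pattern, and the pick_task helper is hypothetical.

```python
# Map (input modality, output modality) to a SeamlessM4T task code.
# "s2st" and "t2tt" appear in this article's CLI examples; "s2tt" and
# "t2st" are assumed from the same naming pattern.
TASKS = {
    ("speech", "text"): "s2tt",    # speech-to-text translation
    ("speech", "speech"): "s2st",  # speech-to-speech translation
    ("text", "text"): "t2tt",      # text-to-text translation
    ("text", "speech"): "t2st",    # text-to-speech translation
}


def pick_task(input_mode: str, output_mode: str, *, same_language: bool = False) -> str:
    """Return the task code for a given input/output combination."""
    key = (input_mode.lower(), output_mode.lower())
    if same_language:
        # Transcribing speech in its own language is speech recognition;
        # the "asr" code is an assumption.
        if key == ("speech", "text"):
            return "asr"
        raise ValueError(f"no same-language task for {key}")
    try:
        return TASKS[key]
    except KeyError:
        raise ValueError(f"unsupported combination: {key}") from None
```

For example, pick_task("speech", "speech") selects the speech-to-speech task used for dubbing, while any modality outside speech and text raises a ValueError.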

For now, as with other Meta projects, the license under which SeamlessM4T is distributed permits only research use; commercial use is not allowed. Mark Zuckerberg's idea is to make the work of researchers and developers easier, so that they can use SeamlessM4T as the basis for building their own projects.

Meta's engineers also released the metadata of SeamlessAlign, the largest open multimodal translation dataset known to date: it contains 270,000 hours of aligned speech and text, an invaluable basis for derivative projects, for example for training artificial intelligence models.

How to translate with SeamlessM4T without installing anything locally

To see for yourself the results you can achieve with SeamlessM4T, simply open the Seamless Communication Translation Demo in your web browser. First click the Start demo button, then tick the box "I have read and agree to be bound by the Terms of Use".

After clicking Start Recording, you need to allow the Meta demo to access the microphone of the device you are using.


The next step is selecting the target language: then press Translate, and after a few seconds you will get both the text translation and the audio file produced by the dubbing.


At the top of the page, SeamlessM4T first shows the text generated by the speech-to-text engine from the speech captured through the microphone. The two boxes below show the text translation and an audio track generated in the other language (speech translation).


How to download an audio translation

Bearing in mind that Meta does not allow the output generated by SeamlessM4T to be used for commercial purposes, and that any use of it must strictly comply with the terms of service, you can download the audio translation with a simple trick. Let's see how to do it with Google Chrome.

After generating the translation, press Ctrl+Shift+I to open Chrome's Developer Tools. Select the Network tab, then click the "play" button in SeamlessM4T's Speech translation box.

The last entry in the Network tab will be a link beginning with blob:. Right-click it and select Open in new tab.


Chrome shows a rather spartan audio player: by clicking the three dots and then Download, you can save the translation created with Meta's application locally in WAV format.
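Before reusing the saved file, you may want to check what you actually downloaded. The following sketch uses only Python's standard wave module to read the channel count, sample rate and duration. The describe_wav helper is hypothetical, and it assumes the file is uncompressed PCM WAV: the wave module raises wave.Error for other encodings.

```python
import wave
from pathlib import Path


def describe_wav(path) -> dict:
    """Return basic properties of a PCM WAV file (hypothetical helper)."""
    with wave.open(str(Path(path)), "rb") as wav:
        frames = wav.getnframes()
        rate = wav.getframerate()
        return {
            "channels": wav.getnchannels(),
            "sample_rate_hz": rate,
            "sample_width_bytes": wav.getsampwidth(),
            # Duration is the number of frames divided by frames per second.
            "duration_s": frames / rate if rate else 0.0,
        }

# Usage (the file name is an example): print(describe_wav("translation.wav"))
```

A mono file with 16,000 frames at 16 kHz, for instance, would be reported as lasting 1.0 seconds.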


How to install and use SeamlessM4T on your systems

To install SeamlessM4T on a system within your infrastructure or in the cloud, you must first make sure that Python and pip are installed correctly. In a terminal window, navigate to the folder where you downloaded the SeamlessM4T code, then enter the following command to install SeamlessM4T and its dependencies:

pip install .

If you are working in a conda environment, you must also install the libsndfile library with the following command:

conda install -y -c conda-forge libsndfile
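The prerequisites mentioned above can be checked from Python itself before running the pip install command. The sketch below is a hypothetical helper, not part of SeamlessM4T: it reports the interpreter version, whether pip is importable, whether conda is on PATH and whether the system loader can find libsndfile. Note that ctypes.util.find_library searches system library paths, so it may not detect a copy installed only inside a conda environment.

```python
import ctypes.util
import importlib.util
import shutil
import sys


def check_prerequisites() -> dict:
    """Report whether the tools this section mentions appear to be available."""
    return {
        # Version of the interpreter running this script, e.g. "3.11.4".
        "python": ".".join(map(str, sys.version_info[:3])),
        # pip importable by this interpreter (what `python -m pip` uses).
        "pip": importlib.util.find_spec("pip") is not None,
        # conda executable on PATH (only relevant in conda environments).
        "conda": shutil.which("conda") is not None,
        # libsndfile shared library visible to the system loader.
        "libsndfile": ctypes.util.find_library("sndfile") is not None,
    }


for name, status in check_prerequisites().items():
    print(f"{name}: {status}")
```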

Libraries used in the project

It is important to remember that SeamlessM4T builds on three libraries developed by Meta:

  • fairseq2: an open source sequence modeling library that provides components for machine translation, language modeling and other sequence generation tasks. At the moment, fairseq2 is only supported on Linux and macOS.
  • SONAR and BLASER 2.0: SONAR provides multilingual and multimodal sentence embeddings, with text and speech encoders for many languages. BLASER 2.0 is a multimodal translation evaluation metric.
  • stopes: a data mining library used to build training data for translation models, including speech translation models.
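Once installation is complete, a quick check can confirm that you are on a platform fairseq2 supports and that the three libraries are importable. This is a sketch under assumptions: the import names fairseq2, sonar and stopes are assumed, and both helper functions are hypothetical.

```python
import importlib.util
import platform

# Assumed top-level import names for fairseq2, SONAR and stopes.
META_LIBRARIES = ("fairseq2", "sonar", "stopes")


def fairseq2_platform_supported(system=None) -> bool:
    """fairseq2 is described as supporting only Linux and macOS."""
    system = system or platform.system()
    return system in ("Linux", "Darwin")  # platform.system() reports macOS as "Darwin"


def installed_libraries(names=META_LIBRARIES) -> dict:
    """Map each top-level module name to whether it can be imported."""
    return {name: importlib.util.find_spec(name) is not None for name in names}


print("fairseq2 platform supported:", fairseq2_platform_supported())
print(installed_libraries())
```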

SeamlessM4T Usage Examples

To run S2ST (speech-to-speech translation) from the command line, just type the following command:

m4t_predict <path_to_input_audio> s2st <tgt_lang> --output_path <path_to_output_audio>
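When you have several recordings to dub, the same command can be generated for each file. The sketch below only builds and prints the commands (a dry run). The s2st_commands helper, the "eng" language code and the <stem>_<language>.wav output naming scheme are assumptions for illustration.

```python
import shlex
from pathlib import Path


def s2st_commands(audio_files, tgt_lang, out_dir):
    """Build one `m4t_predict ... s2st` command per input audio file.

    Output files are named <input stem>_<tgt_lang>.wav inside out_dir,
    a hypothetical naming scheme chosen for this sketch.
    """
    out_dir = Path(out_dir)
    commands = []
    for audio in map(Path, audio_files):
        output = out_dir / f"{audio.stem}_{tgt_lang}.wav"
        commands.append(["m4t_predict", str(audio), "s2st", tgt_lang,
                         "--output_path", str(output)])
    return commands


# Dry run: print shell-quoted commands instead of executing them.
for cmd in s2st_commands(["talk.wav", "interview.wav"], "eng", "dubbed"):
    print(shlex.join(cmd))
```

To actually execute a command once m4t_predict is installed, pass each list to subprocess.run(cmd, check=True). Using an argument list rather than a single string avoids quoting problems with file names that contain spaces.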

To perform T2TT (text-to-text translation), use the following command instead:

m4t_predict <input_text> t2tt <tgt_lang> --src_lang <src_lang>
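The two examples suggest a pattern: text input comes with --src_lang, and audio output comes with --output_path. The sketch below encodes that pattern in a hypothetical helper and runs T2TT through subprocess. Applying the same rules to t2st is an assumption, as is the "ita"/"eng" language code format; the sketch also does not assume which output stream m4t_predict prints the translation to, so it returns the whole finished process.

```python
import shutil
import subprocess

# Assumed from this article's two examples: text input needs --src_lang,
# speech output needs --output_path. "t2st" follows the same pattern by assumption.
TEXT_INPUT_TASKS = {"t2tt", "t2st"}
SPEECH_OUTPUT_TASKS = {"s2st", "t2st"}


def m4t_predict_args(input_, task, tgt_lang, src_lang=None, output_path=None):
    """Assemble an m4t_predict argument list, checking the required options."""
    if task in TEXT_INPUT_TASKS and not src_lang:
        raise ValueError(f"{task} needs src_lang for text input")
    if task in SPEECH_OUTPUT_TASKS and not output_path:
        raise ValueError(f"{task} needs output_path for generated audio")
    args = ["m4t_predict", input_, task, tgt_lang]
    if src_lang:
        args += ["--src_lang", src_lang]
    if output_path:
        args += ["--output_path", output_path]
    return args


def run_t2tt(text, tgt_lang, src_lang):
    """Run T2TT via the CLI; stdout and stderr are both captured for inspection."""
    if shutil.which("m4t_predict") is None:
        raise RuntimeError("m4t_predict is not on PATH; install SeamlessM4T first")
    return subprocess.run(m4t_predict_args(text, "t2tt", tgt_lang, src_lang=src_lang),
                          capture_output=True, text=True, check=True)
```

For example, m4t_predict_args("Ciao a tutti", "t2tt", "eng", src_lang="ita") yields the same argument sequence as the command above, and forgetting src_lang fails before any process is started.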

The README file contains detailed instructions for performing the other operations supported by SeamlessM4T.

