Natural Language Processing (NLP) is growing rapidly across many applications, including search engines, machine translation, chatbots, and home assistants. One such application, speech-to-speech translation (S2ST), breaks down language barriers by allowing speakers of different languages to communicate with one another. This makes it highly valuable for scientific collaboration and cross-cultural exchange.

Automatic S2ST systems are typically built as a cascade of subsystems for speech recognition, machine translation, and speech synthesis. However, such cascade systems can suffer from higher latency, information loss (particularly of paralinguistic and non-linguistic information), and errors that compound from one subsystem to the next.
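
To make the cascade structure concrete, here is a minimal Python sketch of three chained stages. Everything in it is an illustrative assumption rather than a description of any real system: the stage functions are toy stand-ins, the latency and accuracy figures are invented, and multiplying per-stage accuracies assumes stage errors are independent, which is a simplification.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple

# Toy lexicon standing in for a translation model (hypothetical).
LEXICON = {"hola": "hello", "mundo": "world"}


def recognize(audio: str) -> str:
    # Toy speech recognition: unwrap "<audio:...>" into plain text.
    # Tone, emphasis, and voice are discarded here, as in a real cascade.
    return audio[len("<audio:"):-1]


def translate(text: str) -> str:
    # Toy word-by-word translation; unknown words pass through unchanged.
    return " ".join(LEXICON.get(word, word) for word in text.split())


def synthesize(text: str) -> str:
    # Toy speech synthesis: wrap the text back into an "audio" tag.
    return f"<audio:{text}>"


@dataclass
class Stage:
    name: str
    run: Callable[[str], str]
    latency_ms: float  # assumed, illustrative value
    accuracy: float    # assumed chance that the stage output is correct


def run_cascade(stages: List[Stage], source: str) -> Tuple[str, float, float]:
    data = source
    total_latency_ms = 0.0
    end_to_end_accuracy = 1.0
    for stage in stages:
        data = stage.run(data)                  # each stage consumes the previous output
        total_latency_ms += stage.latency_ms    # latencies add up
        end_to_end_accuracy *= stage.accuracy   # errors compound (independence assumed)
    return data, total_latency_ms, end_to_end_accuracy


stages = [
    Stage("speech recognition", recognize, latency_ms=300.0, accuracy=0.90),
    Stage("machine translation", translate, latency_ms=100.0, accuracy=0.90),
    Stage("speech synthesis", synthesize, latency_ms=200.0, accuracy=0.95),
]

output, latency_ms, accuracy = run_cascade(stages, "<audio:hola mundo>")
print(output, latency_ms, round(accuracy, 2))
```

Even with each toy stage at 90% accuracy or better, the end-to-end figure drops below any single stage, and the text-only interface between stages is where paralinguistic information is lost.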

A recent Google study presents Translatotron 2, an improved version of Translatotron with significantly better performance. Translatotron 2 uses a new method for transferring the source speaker's voice to the translated speech. The updated approach to voice transfer works even when the input contains multiple speakers taking turns, while also reducing the potential for misuse and better aligning with Google's AI Principles.


