On Wednesday, Google unveiled Translatotron, an in-development speech-to-speech translation system.
It’s not the first system to translate speech from one language to another, but Google designed Translatotron to do something other systems can’t: retain the original speaker’s voice in the translated audio.
In other words, the tech could make it sound like you’re speaking a language you don’t know — a remarkable step forward on the path to breaking down the global language barrier.
According to Google’s AI blog, most speech-to-speech translation systems follow a three-step process. First, they transcribe the speech into text. Next, they translate that text into the target language. Finally, they generate audio of the translated text.
Translatotron, however, skips the text step altogether. Instead, it works with spectrograms, visual representations of how the frequencies in an audio signal change over time. The system converts a spectrogram of the original speech directly into a new spectrogram in the target language, which it then uses to produce the translated audio.
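To make the idea of a spectrogram more concrete, here is a minimal sketch of how one can be computed from raw audio with a short-time Fourier transform: the signal is sliced into overlapping windows, and each window's frequency content becomes one column of the "image." This is an illustration of the general technique only, not Translatotron's code; the function name `spectrogram` and the `frame_size` and `hop` values are assumptions chosen for the example. Real speech systems typically use mel-scaled, log-compressed variants, which this sketch omits.

```python
import numpy as np


def spectrogram(signal: np.ndarray, sample_rate: int,
                frame_size: int = 512, hop: int = 128) -> tuple[np.ndarray, np.ndarray]:
    """Hypothetical helper: magnitude spectrogram via a short-time Fourier transform.

    Returns (magnitudes, freqs), where magnitudes has shape
    (num_frames, frame_size // 2 + 1): one row per time slice,
    one column per frequency bin.
    """
    if len(signal) < frame_size:
        raise ValueError("signal is shorter than one frame")
    # A Hann window tapers each slice to reduce spectral leakage.
    window = np.hanning(frame_size)
    num_frames = 1 + (len(signal) - frame_size) // hop
    # Slice the audio into overlapping, windowed frames.
    frames = np.stack([signal[i * hop: i * hop + frame_size] * window
                       for i in range(num_frames)])
    # The magnitude of each frame's FFT shows how strong each frequency is at that moment.
    magnitudes = np.abs(np.fft.rfft(frames, axis=1))
    # Center frequency (in Hz) of each column.
    freqs = np.fft.rfftfreq(frame_size, d=1.0 / sample_rate)
    return magnitudes, freqs


# Usage: one second of a 440 Hz tone sampled at 16 kHz.
sr = 16000
t = np.arange(sr) / sr
mags, freqs = spectrogram(np.sin(2 * np.pi * 440 * t), sr)
print(mags.shape, freqs[np.argmax(mags.mean(axis=0))])
```

For a pure tone, the strongest column in every row sits near 440 Hz. Speech produces a far richer pattern, and it is that pattern, including traits of the speaker's voice, that a spectrogram-to-spectrogram system can carry through to its output.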
In theory, Translatotron will be much faster than other speech-to-speech translation systems since it only has to complete one process rather than three. The use of spectrograms also makes it easier to retain elements of the original audio, such as the speaker’s voice and cadence.
The system isn’t ready to roll out just yet: the examples shared on Google’s GitHub page still sound fairly robotic, and the translations are far from perfect. Even so, the tech offers an exciting glimpse of the future of communications.
READ MORE: Google’s Translatotron can translate speech in the speaker’s voice [Engadget]