Amazing Google AI Speaks Another Language In Your Voice

Google is working on a speech translation system that retains the original speaker's voice and cadence in the translated audio. *Image: Google/Victor Tangermann*

Tech Talk

On Wednesday, Google unveiled Translatotron, an in-development speech-to-speech translation system.

It’s not the first system to translate speech from one language to another, but Google designed Translatotron to do something other systems can’t: retain the original speaker’s voice in the translated audio.

In other words, the tech could make it sound like you’re speaking a language you don’t know — a remarkable step forward on the path to breaking down the global language barrier.

Streamlined Speech

According to Google’s AI blog, most speech-to-speech translation systems follow a three-step process. First they transcribe the speech. They then translate the transcription into the target language, before finally generating audio of the translated speech.
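
To make the three-step process concrete, here is a minimal Python sketch of such a cascade. It is a toy illustration, not any real system's code: the `Audio` class, the `TOY_LEXICON` dictionary, and the `transcribe`, `translate`, and `synthesize` functions are all hypothetical stand-ins for real speech recognition, machine translation, and text-to-speech components.

```python
from dataclasses import dataclass


@dataclass
class Audio:
    """Toy stand-in for a waveform: we only track language and words spoken."""
    language: str
    spoken_text: str


# Hypothetical two-word Spanish-to-English lexicon, for illustration only.
TOY_LEXICON = {"hola": "hello", "mundo": "world"}


def transcribe(audio: Audio) -> str:
    """Step 1: speech recognition (here, just read the words back out)."""
    return audio.spoken_text


def translate(text: str) -> str:
    """Step 2: text translation (here, a word-by-word dictionary lookup)."""
    return " ".join(TOY_LEXICON.get(word, word) for word in text.split())


def synthesize(text: str, language: str) -> Audio:
    """Step 3: text-to-speech (here, wrap the words in a new Audio object)."""
    return Audio(language=language, spoken_text=text)


def cascade(audio: Audio, target_language: str = "en") -> Audio:
    """Run the three steps in sequence, passing only text between them."""
    text = transcribe(audio)
    translated = translate(text)
    return synthesize(translated, target_language)


result = cascade(Audio(language="es", spoken_text="hola mundo"))
print(result)
```

In this sketch only plain text passes from one step to the next, so nothing about how the original speaker sounded reaches the final step.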

Translatotron, however, skips the text steps altogether. It instead converts the speech into a spectrogram, a visual representation of how the audio's frequencies change over time. The system then generates a new spectrogram, this time of the speech in the target language, which it uses to produce the new audio.
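
For readers curious about what a spectrogram actually contains, the following Python sketch computes a simple one from a pure tone. It is not Google's method, only a generic illustration of the idea; the `spectrogram` function, the frame size, the hop length, and the 800 Hz sample rate are all assumptions chosen to keep the example small.

```python
import cmath
import math


def spectrogram(samples, frame_size=64, hop=32):
    """Return one list of frequency-bin magnitudes per overlapping frame."""
    # A Hann window tapers each frame's edges to reduce spectral leakage.
    window = [
        0.5 - 0.5 * math.cos(2 * math.pi * n / (frame_size - 1))
        for n in range(frame_size)
    ]
    frames = []
    for start in range(0, len(samples) - frame_size + 1, hop):
        chunk = [samples[start + n] * window[n] for n in range(frame_size)]
        magnitudes = []
        # Naive discrete Fourier transform over the non-negative frequencies.
        for k in range(frame_size // 2 + 1):
            total = sum(
                chunk[n] * cmath.exp(-2j * math.pi * k * n / frame_size)
                for n in range(frame_size)
            )
            magnitudes.append(abs(total))
        frames.append(magnitudes)
    return frames


# One second of a 100 Hz sine wave sampled 800 times per second.
sample_rate = 800
tone_hz = 100
samples = [
    math.sin(2 * math.pi * tone_hz * t / sample_rate)
    for t in range(sample_rate)
]

spec = spectrogram(samples)

# The strongest bin in the first frame, converted back to a frequency in Hz.
peak_bin = max(range(len(spec[0])), key=lambda k: spec[0][k])
peak_hz = peak_bin * sample_rate / 64
print(len(spec), "frames,", len(spec[0]), "bins, peak at", peak_hz, "Hz")
```

Each row of `spec` describes one short slice of time and each column one frequency band, which is why a spectrogram can be drawn as an image. A production system would typically use a fast Fourier transform from a numerical library rather than the naive loop shown here.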

First Steps

In theory, Translatotron should be much faster than other speech-to-speech translation systems, since it has to complete only one step rather than three. The use of spectrograms also makes it easier to retain elements of the original audio, such as the speaker’s voice and cadence.

The system isn’t ready to roll out just yet — the examples shared on Google’s GitHub page still sound fairly robotic, and the translations are far from perfect — but the tech offers an exciting glimpse at the future of communications.
