This New Tech Can Copy Anyone's Voice Using Just a Minute of Audio

Detail of a computer screen with sound waves in stereo. *Image: mrtom-uk/Getty*

Taking Your Word

We regularly hear about new ways of editing images or better algorithms for visual recognition software. Clearly, a lot of work is being done to improve image generation techniques, but news about new voice-editing tech rarely emerges. Adobe’s Project VoCo software is one of just a few exciting examples, but now Montreal-based startup Lyrebird believes it’s done something even more impressive.


Like VoCo, Lyrebird’s latest application programming interface (API) synthesizes speech using anyone’s voice. Unlike VoCo, which requires 20 minutes of audio to generate its replication, Lyrebird’s tech needs only a minute-long sample of the voice it will synthesize.

And, as if that’s not impressive enough, Lyrebird’s new service doesn’t require a speaker to say any of the actual words it needs. It can also learn from noisy recordings and put different intonations into the generated audio to indicate varied emotions.

A Concerned Voice

Lyrebird’s new tech is indeed revolutionary. It doesn’t just edit audio recordings: it makes it easy for someone to generate a new recording that truly sounds like it was spoken by a particular person and not created by a computer.

This raises some rather interesting questions, and not only does Lyrebird acknowledge these, the company actually wants everyone else to as well:

Voice recordings are currently considered as strong pieces of evidence in our societies and in particular in jurisdictions of many countries. Our technology questions the validity of such evidence as it allows to easily manipulate audio recordings. This could potentially have dangerous consequences such as misleading diplomats, fraud, and more generally any other problem caused by stealing the identity of someone else […] We hope that everyone will soon be aware that such technology exists and that copying the voice of someone else is possible. More generally, we want to raise attention about the lack of evidence that audio recordings may represent in the near future.

In short, Lyrebird wants people to know they can easily be duped by audio, and hopes this knowledge will actually prevent fraud: “By releasing our technology publicly and making it available to anyone, we want to ensure that there will be no such risks.”

Being aware of the potential to be bamboozled by audio is one thing, but protecting oneself from potential fraud is another. Still, the value of Lyrebird’s technology can’t be denied. Whether its usefulness for things like creating more realistic-sounding virtual assistants outweighs its potential for nefarious endeavors remains to be seen.