If humans can do it, it seems that computers can, too — perhaps even better. As artificial intelligence (AI) systems improve, so does their ability to do human tasks, and now one system has proven particularly adept at something most humans can’t even do: read lips.
A team of researchers from the University of Oxford and Google’s DeepMind applied deep learning to a dataset of BBC TV shows to develop a lip-reading system that outperformed even a professional. The AI was trained on a Lip Reading Sentences (LRS) dataset of some 5,000 hours of English TV programs featuring a total of 118,000 natural sentences, according to the published study.
After the training, the system was presented with a dataset comprising shows that aired on the network between March and September 2016. It correctly annotated 46.8 percent of all the words without error. This outperformed other lip-reading systems and even a human lip-reading professional, who correctly annotated only 12.4 percent of words without error across 200 randomly selected clips from the same dataset.
The DeepMind and Oxford group plans to release its BBC dataset as a training resource to help other researchers in the field push their own systems to new heights. Ziheng Zhou at the University of Oulu, Finland, thinks this will be a huge step toward developing fully automated lip-reading systems. “Without that huge data set, it’s very difficult for us to verify new technologies like deep learning,” he tells New Scientist.
Even as the technology improves, don’t expect an AI lip-reading system to be used in scenarios straight out of spy movies or crime scene investigation flicks; it has more practical applications than that. “We believe that machine lip readers have enormous practical potential, with applications in improved hearing aids, silent dictation in public spaces, and speech recognition in noisy environments,” says Yannis Assael, who is working on LipNet, another lip-reading AI trained on a significantly smaller dataset.
In other words, you may one day be able to silently ask Siri a question in a quiet theater or ask your Nest to lower the temperature in your house from across the room during a noisy party. Another level of communication with our devices is on the way.