Microsoft’s Speech Recognition Tech Is Officially as Accurate as Humans

Cortana is about to become more powerful.

10. 20. 16 by Dom Galeon
Image by Microsoft

“Human parity” achieved

A study published last Monday, heralded as an historic achievement by Microsoft, details a new speech recognition technology that’s able to transcribe conversational speech as well as humans — or at least, as best as professional human transcriptionists (which is better than most humans).

The technology scored a word error rate (WER) of 5.9%, which was lower than the 6.3% WER reported just last month. “[I]t’s the lowest ever recorded against the industry standard Switchboard speech recognition task,” Microsoft reports. The rate is the same as (or even lower than) the human professional transcriptionists who transcribed the same conversation.

“We’ve reached human parity,” says Xuedong Huang, Microsoft’s chief speech scientist. The new technology uses neural language models that allow for more efficient generalization by grouping similar words together.

The achievement comes decades after speech pattern recognition was first studied in the 1970s. With Google’s DeepMind making waves in speech and image recognition (and speaking like humans do), the technology is Microsoft’s timely contribution to the fast-paced artificial intelligence (AI) research and development.


The achievement was unlocked using the Computational Network Toolkit, Microsoft’s homegrown system for deep learning.

Next step: Understanding

The applications for the new technology are bound to improve user experience for Microsoft’s personal voice assistant for Windows and Xbox One. “This will make Cortana more powerful, making a truly intelligent assistant possible,” says an excited Harry Shum, the executive vice president heading the Microsoft Artificial Intelligence and Research group. Of course, it will also develop better speech-to-text transcription software.

Credits: The Verge

Microsoft clarifies, however, that parity does not mean perfection. The computer did not recognize every word clearly, which is something not even humans could do perfectly (nor can Siri or other existing voice assistants).

Impressive as it is, there remains room for improvement. The next goal: making computers understand human conversation. “The next frontier is to move from recognition to understanding,” says Geoffrey Zweig, Speech & Dialog research group manager.


Care about supporting clean energy adoption? Find out how much money (and planet!) you could save by switching to solar power at By signing up through this link, may receive a small commission.

Share This Article

Keep up.
Subscribe to our daily newsletter to keep in touch with the subjects shaping our future.
I understand and agree that registration on or use of this site constitutes agreement to its User Agreement and Privacy Policy


Copyright ©, Camden Media Inc All Rights Reserved. See our User Agreement, Privacy Policy and Data Use Policy. The material on this site may not be reproduced, distributed, transmitted, cached or otherwise used, except with prior written permission of Futurism. Fonts by Typekit and Monotype.