New technology improves speech recognition
Speech recognition and natural language processing are used in personal assistants such as Apple’s Siri and the advanced version of Google voice search. The software that does the language processing is derived by analyzing large databases of speech and text. As the amount of data available and the computing power to analyze it grow, speech recognition and language interpretation can become more accurate and handle broader contexts. The bottom line for consumers is that applications using natural language interaction will continue to get better.
There are indications that improvements based on more complex software models are already reaching the marketplace. This month, Microsoft blogged that it is using “Deep Neural Network” technology in updates to the voice search and text-message transcription in its Bing search. As the name suggests, the underlying recognition models are mathematical representations of how the networks of neurons in our brains might work. Microsoft reported that the new technology has reduced error rates by 15%, cut response time by half, and increased robustness in noise. The Microsoft posting cited the contribution of “the large datasets provided by Bing’s massive index.”
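To make the idea of a neural network as a mathematical model more concrete, here is a minimal sketch in Python of data passing through a small layered network. It is not Microsoft's system: the layer sizes, the random untrained weights, the four input numbers standing in for audio features, and the three output classes standing in for speech sounds are all assumptions chosen for illustration. Each artificial neuron computes a weighted sum of its inputs and squashes the result, and "deep" simply means that several such layers are stacked.

```python
import math
import random


def sigmoid(x):
    # Squash any real number into the range (0, 1).
    return 1.0 / (1.0 + math.exp(-x))


def weighted_sums(inputs, weights, biases):
    # One weighted sum per neuron: sum(w * x) + b.
    return [sum(w * x for w, x in zip(row, inputs)) + b
            for row, b in zip(weights, biases)]


def softmax(scores):
    # Turn raw scores into probabilities that add up to 1.
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def make_layer(n_in, n_out, rng):
    # Hypothetical untrained layer: random weights, zero biases.
    weights = [[rng.uniform(-1.0, 1.0) for _ in range(n_in)]
               for _ in range(n_out)]
    biases = [0.0] * n_out
    return weights, biases


def forward(features, hidden_layers, output_layer):
    # Pass the input through each hidden layer in turn.
    activations = features
    for weights, biases in hidden_layers:
        activations = [sigmoid(s)
                       for s in weighted_sums(activations, weights, biases)]
    # The output layer scores each candidate class.
    out_weights, out_biases = output_layer
    return softmax(weighted_sums(activations, out_weights, out_biases))


rng = random.Random(0)  # fixed seed so the run is repeatable
hidden = [make_layer(4, 8, rng), make_layer(8, 8, rng)]  # two hidden layers
output = make_layer(8, 3, rng)  # three illustrative output classes
probs = forward([0.2, -0.5, 0.1, 0.9], hidden, output)
print(probs)
```

In a real recognizer the weights would be learned from large amounts of recorded speech rather than drawn at random, which is where the large datasets mentioned above come in.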
Nuance Communications (which provides the speech recognition part of Apple’s Siri and the Samsung S-Voice personal assistant on Galaxy smartphones) mentioned the use of Deep Neural Networks in a Nuance FY13 Financial Analyst Day presentation last December. The company noted that the amount of “training data” used in speech recognition had risen from thousands of hours five years ago to hundreds of thousands of hours recently, and that computation had risen from one workstation (one CPU core) to tens of thousands of processing cores using IBM’s Blue Gene supercomputer. (IBM and Nuance partner in speech recognition research.)
Other companies, including Google, as well as research efforts at universities such as Stanford, are exploring neural network technology for modeling complex data. The net result for all of us may be an increasingly intuitive interaction with computers.