18 12 2013
Is speech recognition on mobile phones a big deal, or just a gimmick?
Having a conversation with your mobile phone gets lots of attention, with even a movie this holiday season about a guy falling in love with his mobile personal assistant. Apple and Google have competing entries for your conversation, with Google’s voice search evolving into a direct competitor to Apple’s Siri. (And it is competitive: Apple is even being sued by Google’s Motorola Mobility for supposedly violating a Motorola patent that seems to relate directly to Siri.) Samsung thinks the personal assistant functionality is important enough to team with Nuance Communications to provide an alternative personal assistant, S-Voice, on its best-selling Galaxy phones. And Microsoft is rumored to be preparing a mobile assistant called Cortana that will debut with Windows Phone 8.1.
The attention that voice-interactive personal assistants are getting isn’t surprising. First, they have the potential to be a ubiquitous user interface that can work similarly on any device (“Just tell me what you want”), easing the frustration of navigating on small devices or discovering how to use new features. They could eventually even unify our digital experience, for example, by letting us tell our Smart TV, from the comfort of our living room couch, to send a text from our mobile phone.
Second, a personal assistant app is an extension of search, with all the advertising revenue that search represents. This is particularly true if one can enter a natural-language inquiry by typing as well as by speaking, as is the case for today’s Google search. Expect improvements in natural language understanding technology to be accelerated by the deep motivation that money provides. (Google blogged that the speech recognition in the latest version of Android makes 25% fewer word errors than the previous version.)
But much discussion of Siri and similar offerings in the press seems to minimize the functionality, typically focusing on speech recognition errors. This seems unbalanced with respect to discussion of other forms of interacting with mobile devices. For example, how many errors do you make using a virtual keyboard on a touch screen? As the natural language technology gets better, the frustration of typing and navigating through multiple screens on a small device will make the tradeoff in favor of natural language more pronounced.
The bias against speech recognition probably stems in part from a sense that computers are competing in areas we think of as particularly human, such as understanding language. And, as this blog has opined, it is appropriate to be concerned with automation taking over too many jobs.
But personal assistants don’t compete with humans. They augment human abilities by allowing us to use our language skills to access computer power. Without people, personal assistants wouldn’t have anyone to assist!
The tendency to characterize natural-language interaction technology as a sideshow is shortsighted. Its use is a major trend, one likely to dominate the future of our interaction with digital technology.