Notes from the Mobile Voice Conference
Bill Meisel, author of The Software Society, organizes the program for the annual Mobile Voice Conference (held March 3-5, 2014, in San Francisco) in his role as Executive Director of the Applied Voice Input Output Society.
The annual Mobile Voice Conference, held March 3-5 in San Francisco, suggested that the innovation in mobile devices is occurring in improved user interaction with the devices, rather than in the hardware. The conference (rated above 4.2 out of 5 in a survey of attendees) addressed the question of making applications, services, and devices more usable on these smaller devices. “Digital overload” is likely to limit upgrades to new devices, software, and services if users are overwhelmed by ever more devices, features, applications, and services. Developers must address the learning curve and inefficiency created by “too much of a good thing,” as I highlighted in a talk at the conference.
The conference addressed the convergence of two major trends: (1) the spread of mobile technology; and (2) the maturing of speech and natural language technology beyond the “tipping point.” The conference also covered supporting multimodal interfaces and other related technology, as well as a wide range of applications fitting the theme. This summary will emphasize a few major themes.
General personal assistants such as Siri that try to do everything have brought natural language interaction by speech to the general public, and many talks touched on them. Sean Brown, Senior Manager, Design and Innovation, Nuance Communications, discussed this trend and options other than Siri in his talk. Andy Peart, CMO, Artificial Solutions, discussed making the “ubiquitous” personal assistant (available on multiple devices) a reality.
Another theme was the growing use and capabilities of specialized personal assistants (“virtual assistants”) that use natural-language speech or text to interact with users. The potential for virtual assistants used for customer service was highlighted in a keynote talk, “Virtual assistants: Transforming the customer experience for the enterprise,” by Robert Weideman, Executive Vice President and General Manager, Enterprise Division, Nuance. A number of companies have deployed such virtual assistants using Nuance technology. Weideman said that 83% of users of such interactive assistants for customer service report their experience was either “fantastic” or “good,” with 98% reporting repeat usage. The use of natural speech recognition in customer service was also discussed in a number of other talks, including a talk by Bruce Pollock, Vice-President, West Interactive. Sebastien Bratieres, Speech Evangelist, dawin, discussed voice-enabled Customer Relationship Management. Howard Lee, CEO, Spoken Communications, emphasized the change in expectations in his talk, “How Siri has changed the call center: new best practices.” Valentine Matula, Sr. Director, Multimedia Technologies, Avaya, discussed how HTML5 and WebRTC will change the use of speech technology in customer service.
Customer service assistants are consumer-oriented. Another area for natural-language assistants is internal use by employees at companies, e.g., by a mobile sales force. Talks emphasizing internal use were presented, including two presentations each from Openstream (Raj Tumuluri, CEO) and Oracle (Brent White, Mobile User Experience Architect, and Anna Wichansky, Senior Director, Applications User Experience). Bachir Halimi, President & CEO, Speech Mobility Communications, also discussed virtual business assistants. Silke Witt, VP of Speech Solutions, Fluential, discussed “An Intelligent Assistant for Wellness.” Marsal Gavalda, Director of Research, Expect Labs, discussed the provocative topic, “The Evolution of Personal Assistants: from Science Fiction to Reality to the Transparent Brain.” Yoryos Yeracaris, CTO, Interactions, discussed how speech recognition could be augmented with human assistance in mobile applications. Robert Harris, President, Communications Advantage, discussed the role of BYOD in Unified Communications. And a trend toward integrating customer service channels, including mobile phones, was discussed by Mike Monegan, Vice President, Applications, [24]7.
Of course, customers will still contact call centers, but with an increasing expectation that automation will allow natural speech, rather than navigation of complex phone trees. James A. Larson, Vice President, Larson Technical Services, discussed “The Future of IVR Systems in the Mobile Era.” Peter Leppik, CEO, Vocal Laboratories, discussed how companies could measure how well these systems are working. Navdeep Alam, Director of Engineering – Analytics and Prediction, Empirix, discussed transforming raw data to profitable intelligence in mobile networks. Joachim Stegmann, R&I Director Future Communication, Deutsche Telekom, discussed voice analytics in customer service.
Voice interaction need not stand alone. It can be supported by the other capabilities of the device, particularly the visual and touch interface. Many talks demonstrated this interaction. Deborah Dahl, Principal, Conversational Technologies, and Sean Brown, Senior Manager, Design and Innovation, Nuance, discussed, in separate talks, best practices in taking advantage of multimodal interfaces. Matt Yuschik, Mobile Solutions Architect, R&D, CTO Group, Global Consumer Technology, Citicorp, discussed multimodality in mobile banking.
Speech recognition in automobiles is essentially a requirement for communications and infotainment for safety reasons, and a growing part of how automobile manufacturers distinguish their offerings. Benjamin Ao, Chief Engineer, Alpine Electronics Research, discussed “Automotive Speech Application in the Post-Siri Era,” pointing out the expectations Siri has raised for more natural interaction. Thomas Schalk, Vice President, Voice Technology, Sirius XM, discussed what it would take to make speech interaction in cars a more satisfying experience. Thomas Scheerbarth, Senior Expert, Deutsche Telekom Innovation Laboratories, discussed how users can benefit from smart voice applications, both at home and on the go. Paul Liu, Manager of Connected Products and Services, Clarion, further discussed the role of “intelligent voice” in automotive systems. Brian Radloff, Director, Worldwide Sales Engineering, Nuance, discussed voice-only solutions for the connected car.
Hands-free use of mobile devices (including in automobiles) raises questions about always-listening wake-up capability and its implications for power consumption. In addition, a “wake-up” phrase can act as a privacy gateway (the device listens only when you explicitly address it). Jeff Rogers, VP Sales, Sensory, discussed current options and how they will evolve. Nick Roche, Technical and FAE Director for the Americas, Wolfson Microelectronics, discussed semiconductor device support for such always-listening technology.
Biometric authentication by voice has become a hot topic. Alexey Khitrov, President, Speechpro, discussed the case for bi-modal authentication on mobile devices. Julia Webb, VP Sales and Marketing, VoiceVault, discussed global deployments of voice authentication in banks’ mobile apps.
Smart TVs are another emerging area for natural speech interaction, in particular to find the increasing range of entertainment options available. Addressing “Talking to My TV,” Jeanine Heck, Senior Director, Product Development, Comcast, discussed the importance of the trend. Alexandros Tsilfidis, CEO, Accusonus, in a related talk, discussed the need for specialized voice engines for indoor voice processing.
Wearables were another topic that received attention, with a keynote panel on the subject that included Andrian Lee-Kwen, VP Engineering, Genesys; Jeff Harris, Product Manager, Google; Sunil Vemuri, Co-founder and Chief Product Officer, ReQall; Steven Holmes, vice president, New Devices Group, and general manager, Smart Device Innovation team, Intel; and Eric Migicovsky, Founder and CEO, Pebble. Stan Kinsey, President, Martian Watches, gave his views of the category in a separate talk. Most current wearables require a Bluetooth connection to a smartphone to do more than basic functions (like telling the time!), with Google Glass one exception. Voice interaction is used by most of the devices, with the speech supported through the smartphone rather than on the device itself. The issue of fashion was discussed, with a bit of a debate on whether an obvious wearable like Google Glass or a large smartwatch would become a trend, or whether the devices had to evolve to be smaller and more stylish.
A stimulating keynote panel moderated by Patti Price, Principal, PPRICE Speech and Language Technology Consulting, on “A Conversation about Conversational Interfaces” included panelists Adam Cheyer, co-founder of Siri (before the company was bought by Apple); David Israel, Program Director, AI Center, SRI International; and Barney Pell, Chairman & CEO, QuickPay. The topics ranged widely, largely expressing an optimistic view about not only the continuing improvement, but the expanding utility of conversational interfaces.
Roberto Pieraccini, Computer Speech and Natural Language Technologist, provided a research update on speech and natural language technology. Francesco Cutugno, Professor, LUSI-Lab @University Federico II of Naples Italy, discussed research on adding emotive feedback to mobile speech interfaces (on our way to Samantha in the movie “Her”?).
Some topics less easily categorized:
- William Meisel presented two talks: (1) Natural language as a means of unifying applications and features (“just say or type what you want”); and (2) the potential for interactive voice ads to engage the consumer in an era where classical ads (particularly on mobile phones) are less effective.
- Brian Garr, CEO, LinguaSys, discussed how companies can make natural language multilingual without redeveloping it for every language supported.
- Jordan Cohen, Technologist and Chief Scientist, Speech Morphing, discussed an innovative way to convert one voice to another, demonstrating changing Michelle Obama’s voice to a male voice with the prosody and naturalness preserved.
- Moshe Kogos, Director of Product Management, Nuance, said that voicemail-to-text is a much-desired feature by users, and a growing market.
- Ajay Juneja, CEO/Founder, Speak With Me, discussed the potential synergy of speech recognition executed on the device cooperating with speech recognition in the cloud.
- Egor Naumov, Head of AI Product Group, i-Free, emphasized the importance of voice interaction APIs for mobile apps and smart devices.
- Bradley Music, VP Business Development, Appen, discussed the challenge of collecting speech data globally to develop natural-language speech recognition.
- Wolf Paulus, Staff Software Engineer, Intuit, discussed the potential for “emotional prosody” to improve the naturalness of text-to-speech synthesis.
- A talk prepared by Richard Wallace, Chief Science Officer, Pandorabots, and Mike McTear, Professor, University of Ulster, discussed the use of AIML (Artificial Intelligence Markup Language, supported by the ALICE A.I. Foundation) in virtual assistants.
- Phil Shinn, Speech Scientist, Morgan Stanley, explained the significance of the remarkable fact that almost any language obeys Zipf’s law, and the challenge of understanding why. (Zipf’s law states that given a corpus of natural language utterances, the most frequent word will occur approximately twice as often as the second most frequent word, three times as often as the third most frequent word, and so on.)
- Peter Lasensky, CEO, NoteVault, discussed a cloud-based application for taking secure notes by voice to document business conversations.
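The Zipf’s law relationship mentioned in Phil Shinn’s talk is easy to check empirically. The sketch below (illustrative only, not drawn from any conference talk) ranks words in a corpus by frequency and compares each observed count to the Zipfian prediction, count(rank 1) / rank:

```python
from collections import Counter

def zipf_table(text, top_n=10):
    """Rank words by frequency and compare each count to the Zipf
    prediction: the word of rank r should occur about 1/r as often
    as the most frequent word."""
    counts = Counter(text.lower().split())
    ranked = counts.most_common(top_n)
    top_count = ranked[0][1]
    rows = []
    for rank, (word, count) in enumerate(ranked, start=1):
        predicted = top_count / rank  # Zipf's law estimate
        rows.append((rank, word, count, predicted))
    return rows

# A toy sample: any sufficiently large natural-language corpus
# shows the pattern clearly; a short sample only hints at it.
sample = "the quick fox and the lazy dog and the cat"
for rank, word, count, predicted in zipf_table(sample):
    print(f"{rank:>2}  {word:<8} observed={count}  zipf~{predicted:.1f}")
```

On a real corpus (e.g., a novel or a day of call-center transcripts) the observed and predicted columns track each other closely over the first few hundred ranks, which is the regularity Shinn highlighted.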
I hope the importance of natural language in the evolution of our use of mobile devices, and its extension into other areas, is evident from these examples. The area is dynamic and evolving rapidly.