Microsoft's 'Audible,' Apple's Siri Are the Future of UIs: Analysts

By Robert J. Mullins  |  Posted 2012-06-22


The next version of the Windows Phone operating system will add voice command technology similar to Siri on the Apple iPhone 4S, and industry observers say both technologies signal the growing sophistication of voice recognition.

The Microsoft technology was demonstrated for software developers attending the Windows Phone Summit on June 20 in San Francisco, where Microsoft unveiled the Windows Phone 8 Platform Preview, a major update of the Windows Phone 7 operating system that rolled out in 2010.

The feature will not just open applications; it also lets a user execute voice commands within an application, as Kevin Gallo, developer platform general manager in Microsoft's Windows Phone Division, demonstrated at the event.

Gallo kept referring to the system as "Audible," but that was just for demo purposes; the final Microsoft technology will go by another, yet-to-be-chosen name, a Microsoft spokesperson explained.

Audible is also the name of an app, already available in the Windows Phone Marketplace, from the Amazon-owned service that lets users download audio books. Gallo said Microsoft's version of Audible will be available to all developers, who can use it to create their own apps that use voice recognition and commands on Windows Phone 8 devices.

Gallo used Audible to open an audio book of "Game of Thrones." Things didn't start out well: when Gallo opened with "Audible, play 'Game of Thrones,'" Audible inexplicably replied, "Searching for Saint Louis, Missouri." Gallo quickly recovered and successfully opened the audio book. He then said "Audible, next chapter," and the book skipped to the next chapter. After listening for a few seconds, he said "Pause," and playback stopped.

"Not only was I able to launch the application using speech, but I was also able to give it a command, and control its behavior when I started it," Gallo told the audience. "I basically had a conversation with my app ... and got to exactly what I wanted without having to touch the screen."

Although voice command technology isn't perfect and doesn't work as flawlessly as it appears to in TV ads, it continues to improve, said Scott Ellison, a mobile industry analyst practice leader at IDC.

"Touch was the last major innovation when it comes to a lot of mobile devices ... and now you're seeing a lot of focus on voice, driven by what Apple's been able to do with Siri," said Ellison.

Siri has capabilities similar to Audible's, though Gallo said one differentiator is that Audible works within apps. Siri can help an iPhone 4S user make a call, respond to text messages, get directions and perform other tasks. Siri has also generated considerable buzz since it arrived with the new phone in the fall of 2011, including popular TV ads featuring actors such as John Malkovich, Zooey Deschanel and Samuel L. Jackson, which have prompted countless parody videos on YouTube.

While voice-command or speech-recognition technology is still evolving, Siri and Audible indicate that it is starting to come of age. According to Quora, an online question-and-answer site, Siri is based on a project called CALO that was funded by the Defense Advanced Research Projects Agency and was part of DARPA's Personalized Assistant that Learns (PAL) initiative.

Voice Recognition Evolving From Many Development Efforts

Coincidentally, perhaps, Google in March hired a former director at DARPA, Regina Dugan, but it is not clear what her role will be or whether she worked on the CALO project.

While Microsoft and Apple have their voice command innovations, Google has long been involved in delivering voice-recognition technology, too, even prior to the launch of its Android mobile OS, said William Stofega, an IDC program director in Mobile Device Technology and Trends research.

Stofega was a longtime fan of Google 411, the directory-assistance service that was based on voice recognition. People would call G-O-O-G-4-1-1 on their mobile phones and get a phone number for a person or business and dial through to that number for free. Google launched it in 2007 but dropped it in 2010.

"At first, there were troubles, but then you started to see it start to learn and get better and better," Stofega said. "By the time they pulled it off the market, it was great; it was fantastic and I miss it."

Other voice-recognition technology leaders include Nuance Communications, famous for the Dragon line of voice-to-text dictation software. Stofega says Nuance technology is behind the Siri service, although Apple adds proprietary artificial intelligence so that, in one TV ad, the actress Deschanel can tell Siri, "Let's get tomato soup delivered," and Siri responds with a list of area restaurants that deliver tomato soup. The Siri technology came from Apple's 2010 acquisition of Siri Inc.

Microsoft's voice-recognition smarts come from its 2007 acquisition of Tellme Networks, a voice-recognition company that is operated as a wholly owned subsidiary of Microsoft.

Stofega is skeptical that voice services can be as accurate as their portrayals in TV ads, since they can fail to recognize heavy accents and can be thrown off by ambient background noise, but he says the technology keeps improving.

Analyst Ellison says modern voice recognition in systems like Siri or Audible depends on three key factors. First is the quality of voice recognition: the software's ability to understand the words being spoken.

Second is determining the meaning of the words, he said. Interactive voice response (IVR) systems used in customer service settings rely on prompts such as "Say 'Balance'" when someone calls a bank to see how much money is in a checking account. As IVR systems have evolved, however, they have learned to understand the meaning of a string of words in context, allowing a customer to instead ask, "How much money is in my checking account?"

Third, Ellison explained, is the integration of the voice-recognition technology and the application with the database of information available to the Siri or Audible service.

"The [Siri] engine needs to understand and format this data in a way that another application provider can use. This service needs to be able to meta-tag their data appropriately to provide the appropriate responses," Ellison said.

A smartphone personal assistant, as the technology evolves, can be a real aid to public safety, said analyst Stofega. Traffic accidents caused by distracted driving can occur because drivers are fumbling with their touch-screen phones to find information or, worse, sending text messages. As voice services evolve, drivers will be able to dictate texts and ask for directions to that place that serves that great tomato soup.
