A common standard for developing multimodal applications—those that combine speech, text and graphics—is closer than ever, according to IBM Corp. executive Ozzie Osborne.
Osborne, an IBM vice president and segment manager of Big Blue's Pervasive Computing group, delivered the opening keynote at the Friday session of the VoiceXML Planet Conference and Expo in Boston. He praised the SALT Forum, an industry group backing a multimodal development technology that uses speech application language tags, for joining the Multimodal Working Group of the World Wide Web Consortium (W3C).
IBM is not a member of the group but belongs to a different industry alliance that favors a combination of VoiceXML and XHTML to develop multimodal applications, the so-called X+V group. The two groups, which have feuded in the past, are now contributing to the W3C's Multimodal Working Group, currently meeting in Finland.
“I'm pleased to say we're now working together, trying to come to a single standard for multimodal applications,” said Osborne in his keynote.
After the speech, Osborne noted that the groups were not that far apart in their approaches, though some differences remained.
“There are some good similarities,” he said. “They're both based on XHTML.”
But he also noted that there were differences, such as SALT not supporting standalone voice applications for things like call flow and telephony as X+V does. He also said that SALT, which would add speech tags to existing markup languages, would require changes in applications' business logic that X+V doesn't require.
“I think the differences are solvable,” said Osborne. “And we'll solve them in the optimum manner as opposed to compromising and causing harm.
“A single standard will grow the industry.”
At stake is the next generation of speech-enabled applications, which would combine speech with text and graphics and give users the choice of which interface to use—point-and-click, typing or voice.
Such applications would have the greatest appeal for sales-force and field-force automation applications as well as telematics applications that can be accessed while driving a car, Osborne said.
Multimodal applications could eventually gain acceptance in the consumer space as well, adding voice interfaces to televisions, refrigerators and microwaves. Users could select the way they interact with applications and devices, Osborne said.
“[Multimodal applications] are a way for humans to interact with technology,” he said. “It's what we have to do to get machines to do what we want them to do rather than what they want us to do.
“You can't just do voice, you can't just do graphics. We have to move technology forward to the way we live today.”
Wide deployment of multimodal applications would eventually herald a world of “transparent computing,” Osborne said.
“People would use computers and not know they're using computers.”
Even if the W3C working group yields a common standard, challenges remain. Wireless networks are still slow, and telecommunications companies, along with most other technology companies, are laboring under financial constraints in the hobbled tech economy.
And marketing will pose challenges of its own. While multimodal applications open up an almost endless number of possibilities, most would add convenience rather than deliver any earth-shifting technological breakthrough.
“Everyone's looking for a killer app,” said Osborne. “I don't think that there really is a killer app.”