SAN FRANCISCO—IBM is barreling forward with its research on advanced technology to make speech applications and Interactive Voice Response (IVR) systems more powerful than ever, said Gary Cohen, general manager of IBM's Pervasive Computing group.
Customers are already productively using IBM's speech technology, and “we want to extend our 30-year history of investing in research in speech,” Cohen said during his keynote speech Friday at the AVIOS SpeechTEK 2004 conference here.
IBM's speech-technology research targets four areas: superhuman speech, expressive output, advanced speech tooling and conversational biometrics.
Superhuman speech capabilities would enable speech-recognition applications to work more robustly and in more sophisticated ways in a wider range of environments, Cohen said.
“We want to be able to enable speech recognition in all sorts of environments—natural environments, noisy environments—where the connection on the phone is not all that great all of the time,” Cohen said. This technology allows speech systems to adapt to conditions and extend voice recognition beyond its capabilities today, he said.
IBM is also researching ways to make speech systems capable of handling expressive output, not only when reacting to human speech but also when responding in a conversational way.
“Emotion is part of the natural way that we communicate. But how do we express emotion when it is IT-generated, and how do we express it appropriately and automatically?” Cohen asked, saying research would address those hurdles.
Free-form dialog support, another area of research, would enable people to speak to machines in a more conversational way. “In fact, we are hoping to integrate that research into the 3.0 [specification] of VoiceXML,” Cohen said. IBM hopes to offer advanced development tools that support free-form dialog, he said.
Based on its experience in building enterprise-class speech applications, IBM is convinced that it can build the common class libraries and building-block assemblies to support advanced speech applications, Cohen said.
Research on conversational biometrics could make user authentication more effective by combining a voice print with other data related to that user, Cohen said. The technology has the potential to cut the odds of an authentication failure by a factor of 50, he said.
Cohen noted that IBM has accumulated a portfolio of more than 250 patents related to speech and voice technology. IBM intends to use the portfolio to deliver future value “to our partners, our developers and to enterprises,” he said.
David Nahamoo, head of IBM's Human Language Technologies group, demonstrated how users could combine voice and text responses to access and make changes to a personal retirement account portfolio.
Showing that the technology remains in a developmental state, the demo halted when it failed to recognize the personal authentication data he entered by voice.