Advanced Speech Research Sounding Sweet to IBM

Making speech capabilities more natural and conversational and improving user authentication are some of the company's goals, Pervasive Computing general manager says.

SAN FRANCISCO—IBM is barreling forward with its research on advanced technology to make speech applications and Interactive Voice Response (IVR) systems more powerful than ever, said Gary Cohen, general manager of IBM's Pervasive Computing group.

Customers are already productively using IBM's speech technology, and "we want to extend our 30-year history of investing in research in speech," Cohen said during his keynote speech Friday at the AVIOS SpeechTEK 2004 conference here.

IBM's speech-technology research targets four areas: superhuman speech, expressive output, advanced speech tooling and conversational biometrics.

Superhuman speech capabilities would enable speech-recognition applications to work more robustly and in more sophisticated ways in a wider range of environments, Cohen said.

"We want to be able to enable speech recognition in all sorts of environments—natural environments, noisy environments—where the connection on the phone is not all that great all of the time," Cohen said. This technology allows speech systems to adapt to conditions and extend voice recognition beyond its capabilities today, he said.

IBM is also researching ways to make speech systems capable of expressive output, so that they not only react to human speech but also respond in a conversational way.

"Emotion is part of the natural way that we communicate. But how do we express emotion when it is IT-generated, and how do we express it appropriately and automatically?" Cohen asked, saying research would address those hurdles.

Free-form dialog support, another area of research, would enable people to speak to machines in a more conversational way. "In fact, we are hoping to integrate that research into the 3.0 [specification] of VoiceXML," Cohen said. IBM hopes to offer advanced development tools that support free-form dialog, he said.

Based on its experience in building enterprise-class speech applications, IBM is convinced that it can build the common class libraries and building-block assemblies to support advanced speech applications, Cohen said.

Research on conversational biometrics could make user authentication more effective by combining a voice print with other data related to that user, Cohen said. This technology has the potential to reduce the odds of an authentication failure by a factor of 50, he said.

Cohen noted that IBM has accumulated a portfolio of more than 250 patents related to speech and voice technology. IBM intends to use the portfolio to deliver future value "to our partners, our developers and to enterprises," he said.

David Nahamoo, head of IBM's Human Language Technologies group, demonstrated how users could combine voice and text responses to access and make changes to a personal retirement account portfolio.

Showing that the technology remains in a developmental state, the demo halted when it failed to recognize the personal authentication data he entered by voice.

John Pallatto
