For years, people who have struggled to work with mindless interactive voice response (IVR) systems have wondered when computers might become smart enough to respond like conversational human beings.
The answer: not very soon, and possibly never, unless computer scientists can invent a machine with something resembling conscious intelligence.
But that still leaves plenty of room to improve the IVR systems we all have to deal with from time to time. The technology can become far more sophisticated than simply pushing telephone buttons in response to voice commands and questions.
Speech system developers, led by IBM, are conducting research into increasing the natural language and near-conversational capabilities of this technology.
But people essentially have to stop thinking about speech technology in terms of Star Trek androids, suggested a panel of industry experts who discussed the current state of the art in natural language speech technology at the AVIOS SpeechTEK 2004 conference in San Francisco this week.
You don't need humanoid machine intelligence to build an IVR system that can help you make a travel reservation, check your stocks, manage your financial accounts, retrieve email or perform any of a thousand other basic tasks, the panel agreed. Enterprises are already implementing speech systems for a wide range of purposes.
The belief that using speech systems "should be exactly like working with a human being is an interesting research goal," said panelist Deborah Dahl, principal of the consulting firm Conversational Technologies, based in Norristown, Pa.
"But in the meantime we need to look at how to make our systems more natural and more usable," she said. Speech system developers will do this by creating more sophisticated user interface design conventions that are also practical and effective, she said.
"They aren't totally like working with a human being," she said, but "they're things that people can learn easily, and they'll adapt to this in the same way that we adapt to the conventions of a Windows user interface."
One of the key obstacles to getting machines to understand natural language, the panelists agreed, is teaching them to recognize the highly varied intonations of human speech and the rhythmic prosody that gives humans instant understanding of what is said.
While the basic technology exists for machines to react to prosodic cues, Dahl said, it's still not enough for machines to respond with anything resembling conversational speech.
One particularly difficult problem to solve is the human tendency to use the "mixed initiative," such as when we speak a single word as if it were a question.
For example, suggested panel moderator Moshe Yudkowsky, head of Chicago-based speech technology consulting firm Disaggregate, if an IVR system asked "Do you want to fly Tuesday?", a human might respond by indignantly barking "Tuesday?" The IVR system could interpret that either as an affirmative answer or as an invalid response; both reactions are likely to frustrate the caller.