For years people who have struggled to work with mindless interactive voice response (IVR) systems have wondered when computers might become smart enough to react like conversational human beings.
The answer is not very soon, and possibly never unless computer scientists can invent a machine with something resembling conscious intelligence.
But that still leaves plenty of room for improvement in the IVR systems we have to work with from time to time. They can get a lot more sophisticated than simply asking callers to push telephone buttons in response to voice commands and questions.
Speech system developers, led by IBM, are conducting research into increasing the natural language and near-conversational capabilities of this technology.
But people essentially have to stop thinking about speech technology in terms of Star Trek androids, suggested a panel of industry experts who discussed the present state of the art in natural language speech technology at the AVIOS SpeechTEK 2004 in San Francisco this week.
You don't need humanoid machine intelligence to build an IVR system that can help you make a travel reservation, check your stocks, manage your financial accounts, retrieve email or any of a thousand other basic tasks, the panel agreed. Enterprises are already implementing speech systems for a wide range of purposes.
The belief that using speech systems “should be exactly like working with a human being is an interesting research goal,” said panelist Deborah Dahl, principal of the consulting firm Conversational Technologies, based in Norristown, Pa.
“But in the meantime we need to look at how to make our systems more natural and more usable,” she said. Speech system developers are going to do this by developing more sophisticated user interface design conventions that are also practical and effective, she said.
“They aren't totally like working with a human being,” she said, but “they're things that people can learn easily, and they'll adapt to this in the same way that we adapt to the conventions of a Windows user interface.”
One of the key problems in getting machines to understand natural language, the panelists agreed, is enabling them to recognize the highly varied intonations of human speech and the rhythmic prosody that give humans instant understanding of speech.
While the basic technology is there for machines to react to prosodic statements, Dahl said, it's still not enough for machines to react with anything resembling conversational speech.
One particularly difficult problem to solve is the human tendency to use the “mixed initiative,” such as when we speak a single word as if it were a question.
For example, suggested panel moderator Moshe Yudkowsky, head of Chicago-based speech technology consulting firm Disaggregate, if an IVR system asked “Do you want to fly Tuesday?” a human might respond by indignantly barking “Tuesday?” The IVR system could recognize that either as an affirmative response or as an inappropriate one; either reaction is likely to frustrate the human.
Sticking to the Script
However, Bruce Balentine, vice president of speech technologies with Enterprise Integration Group, a design consulting firm in San Ramon, Calif., responded that humans usually know intuitively that's not an optimal way to respond to an IVR system. Designers also understand that IVR systems need to give humans a limited range of possible responses.
One approach to increasing the interactivity of speech systems is to carefully script them so they can react to a wider range of questions and responses but at the same time provide clear guidance for the human users, he said.
Designers shouldn't try to add conversational aspects to speech systems if they are only going to encourage users to get off the beaten path of giving and getting information, he suggested.
However, the deployment of speech systems won't be restricted by the current state of the art, Balentine said. In the short term, natural language speech systems will make steady progress without sophisticated conversational capabilities, he said.
“We are going to see a rapid penetration over the next three years or so” of natural language speech applications, primarily in telephone-based IVR systems where they haven't typically been seen, Balentine said. Most IVR systems are more primitive systems that offer a limited number of simple command options.
“What will happen as a result of that is that more and more people will become accustomed to simple machine behavior,” and it will provide a base to build on for the future, Balentine said.
It's certainly true that people don't expect much when they are forced to deal with IVR systems. In fact, they don't want to deal with them at all.
It will be a major advance if these systems work well enough that we impatient and arrogant humans aren't instantly prompted to pound the telephone's zero key in quest of a human voice as soon as we hear an IVR system kick in.