What You Mean, Not What You Say

 
 
By Timothy Dyck  |  Posted 2003-05-05

The voice recognition packages available today base their recognition process on a probabilistic table of word pairs and triples, which they use to map spoken phonemes to written text.


This knowledge of likely word relationships allows the software to predict that (to use a phrase from one of our test documents) "sagebrush strewn" is written more often than "sagebrush sewn" and so is more likely to be the correct transcription.
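
As a rough illustration of the word-pair idea, the sketch below counts adjacent word pairs in a tiny corpus and uses those counts to choose between two similar-sounding candidates. The corpus, the function names and the add-one smoothing are all assumptions made for the example; commercial packages use far larger tables and are not documented here.

```python
from collections import Counter

# Toy corpus, invented for illustration only.
corpus = (
    "the trail crossed a sagebrush strewn plain . "
    "a sagebrush strewn hillside rose ahead . "
    "she wore a hand sewn jacket ."
).split()

unigrams = Counter(corpus)
bigrams = Counter(zip(corpus, corpus[1:]))
vocab_size = len(unigrams)


def bigram_prob(prev, word):
    """Estimate P(word | prev) from pair counts, with add-one smoothing."""
    return (bigrams[(prev, word)] + 1) / (unigrams[prev] + vocab_size)


def pick_transcription(prev, candidates):
    """Return the candidate most likely to follow the previous word."""
    return max(candidates, key=lambda w: bigram_prob(prev, w))


best = pick_transcription("sagebrush", ["strewn", "sewn"])
print(best)
```

Because "sagebrush strewn" appears in the toy corpus and "sagebrush sewn" does not, the first candidate wins even if the two sound nearly identical.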

Voice recognition and other complex pattern recognition software tasks such as computer vision or document search are just in the beginning stages of understanding and taking advantage of query context.

This is something that humans do instinctively. We unconsciously rely on clues such as speaker identity, location, current activities and the general topic of conversation surrounding a specific utterance to help us fill in missing sounds and words.

This isn't all that different from the "frame model" that's been part of artificial intelligence discussions since the 1960s.

For example, there are things you can expect of someone who's in the "paying for dinner" frame, which is a subset of the "sitting in a restaurant" frame, which in turn is nested in the "social situation" frame. And those expectations help you understand that person's speech more accurately than if you did not have those contexts.
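
One way to picture nested frames is as a chain of objects, each carrying its own expectations and inheriting those of the frame that contains it. The sketch below is a hypothetical toy: the `Frame` class, the expected-word sets and the `disambiguate` helper are invented for illustration and are not drawn from any particular AI system.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Frame:
    name: str
    expected_words: set = field(default_factory=set)
    parent: Optional["Frame"] = None

    def all_expected(self):
        """Collect expectations from this frame and every enclosing frame."""
        words = set(self.expected_words)
        if self.parent is not None:
            words |= self.parent.all_expected()
        return words


# The three nested frames from the example above (word lists are made up).
social = Frame("social situation", {"hello", "thanks"})
restaurant = Frame("sitting in a restaurant", {"menu", "waiter"}, parent=social)
paying = Frame("paying for dinner", {"check", "tip"}, parent=restaurant)


def disambiguate(candidates, frame):
    """Prefer the first candidate the active frame expects; else keep the first."""
    expected = frame.all_expected()
    for word in candidates:
        if word in expected:
            return word
    return candidates[0]


choice = disambiguate(["czech", "check"], paying)
print(choice)
```

Here the listener in the "paying for dinner" frame resolves the ambiguous sound to "check" and would still recognize "menu" or "thanks" through the enclosing frames.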

It follows that major progress in the voice recognition field could come from incorporating context such as face recognition and calendar coordination, along with other contextual data such as the contents of the chart that the conference room projector is displaying at the moment or the headings in the current document.
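
A minimal sketch of how such context might be folded in: each candidate transcription keeps the score the recognizer gave it, then gains a bonus for every word that also appears in the surrounding context (here, headings from a projected slide). The `rescore` function, the weight of 0.5 and the sample scores are assumptions chosen for the example, not a description of any shipping product.

```python
import math


def rescore(hypotheses, context_terms, context_weight=0.5):
    """Pick the best hypothesis after adding a bonus per context-word match.

    hypotheses: list of (text, log_probability) pairs from the recognizer.
    context_terms: words drawn from the surrounding context.
    """
    context = {term.lower() for term in context_terms}

    def score(item):
        text, log_prob = item
        hits = sum(1 for word in text.lower().split() if word in context)
        return log_prob + context_weight * hits

    return max(hypotheses, key=score)[0]


# Hypothetical recognizer output: the wrong homophone scores slightly higher.
hypotheses = [
    ("quarterly sails forecast", math.log(0.55)),
    ("quarterly sales forecast", math.log(0.45)),
]
slide_headings = ["Sales", "Forecast", "Q3"]

without_context = rescore(hypotheses, [])
with_context = rescore(hypotheses, slide_headings)
print(without_context)
print(with_context)
```

With no context the recognizer's slight preference for "sails" stands; once the slide headings are available, the extra match on "sales" tips the choice the other way.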

Advances in metadata integration and ubiquitous networking may have as much to do with what comes next in voice recognition as advances in acoustics or probability.

 
 
 
 
Timothy Dyck is a Senior Analyst with eWEEK Labs. He has been testing and reviewing application server, database and middleware products and technologies for eWEEK since 1996. Prior to joining eWEEK, he worked at the LAN and WAN network operations center for a large telecommunications firm, in operating systems and development tools technical marketing for a large software company, and in the IT department at a government agency. He has an honors bachelor of mathematics degree in computer science from the University of Waterloo in Waterloo, Ontario, Canada, and a master of arts degree in journalism from the University of Western Ontario in London, Ontario, Canada.
 
 
 
 
 
 
 
