Speech Server Launch Re-asks Old Questions

 
 
By Peter Coffee  |  Posted 2004-03-31

Is speech ready for prime time, and are future markets safe for innovators?

Is speech recognition a stupid computer trick, or a much-needed feature that finally works? That's the question that developers need to answer in the wake of last week's launch of Microsoft's Speech Server 2004. The question must quickly be narrowed into more specific terms, because speech recognition can mean almost anything. At one extreme, it might be only the speaker-independent recognition of a tiny vocabulary of individual words: "Press or say the number 1," as call-handling voice-response systems tell us (a sketch of that logic appears below). At the other extreme is the still-unrealized goal of a system that can carry on a conversation with anyone who can be understood by another human being, like the fictional HAL 9000. (See the MIT Press book, "HAL's Legacy.")

Customers appear to be growing in their acceptance, and even approval, of speech recognition technology at their point of first contact with an enterprise. Given the alternatives of long waits for a live attendant or cumbersome keypad interaction, more than two-thirds of customers now appear to find speech recognition both convenient and effective.
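To make the low end of that spectrum concrete, here is a minimal Python sketch of the routing logic behind a "press or say" menu. Everything in it, from the menu entries to the queue names, is hypothetical, and the recognized word is assumed to arrive as plain text rather than through any particular speech API.

```python
# Hypothetical "press or say the number 1" menu routing.
# A speaker-independent system of this kind accepts only a tiny,
# fixed vocabulary; anything else triggers a re-prompt.

MENU = {
    "1": "billing",   "one": "billing",
    "2": "support",   "two": "support",
    "0": "operator",  "zero": "operator",
}

def route(utterance: str) -> str:
    """Map a recognized word or keypad digit to a call-handling queue."""
    token = utterance.strip().lower()
    return MENU.get(token, "reprompt")

if __name__ == "__main__":
    for heard in ("one", "2", "pizza"):
        print(f"heard {heard!r} -> {route(heard)}")
```

The point is how small the vocabulary is: a handful of words, no speaker training, and anything outside the list simply becomes a re-prompt.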
The most common criticism of speech recognition as an enterprise productivity tool is a matter of scale. One person demonstrating speech-command systems to co-workers may be impressive; several dozen people in a crowded office bay, all talking to their computers, may be oppressive or even intolerable. Fortunately, the most recent research suggests that it may not be necessary to speak out loud—not even at the volume required by a headset microphone—to use speech as a command interface.
Workers are also spending more of their time, thanks to wireless networks, away from their desks and closer to the problems they're solving, whether that means being on the road or on the factory floor. Using voice input in these environments may be not only convenient but also mandatory as a matter of safety. If background noise is overwhelming, lip-reading algorithms may make the difference. Location information, derived from GPS or other technologies, can also provide valuable clues to what a user might be intending to say; a small sketch of that idea appears below.

Even so, I met last week with representatives of Applied Voice & Speech Technologies Inc., who made a worthwhile point: being able to recognize speech accurately is not the same thing as having a well-designed speech-driven application, any more than having accurate keyboard or mouse-based input means having a good user interface design.

Microsoft's Speech Server, like Microsoft's Windows, certainly gives developers a rich environment in which to ply their craft, and I'm certain that we'll see Microsoft offering industry-leading tools toward that end. Open industry standards will also play an important role (the W3C now defines a syntax for representing grammars), but I'm equally certain that Microsoft's own applications will set a high bar that independent developers will be challenged to clear—and that alternative speech-oriented platform providers, like Apple, Opera and IBM, will have to face in the marketplace.
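On the location point, one common technique is to rerank the recognizer's candidate phrases with a prior conditioned on where the user is. The following Python sketch invents all of its scores and its location/phrase table purely for illustration; it depicts no particular product's API.

```python
# Sketch of location-biased reranking: weight each recognizer hypothesis
# by a prior conditioned on the user's location. All numbers and the
# location/phrase table below are invented for illustration.

LOCATION_PRIOR = {
    "factory floor": {"stop conveyor": 0.60, "stock quote": 0.10},
    "in the car":    {"stop conveyor": 0.05, "stock quote": 0.50},
}

def rerank(nbest, location):
    """nbest: list of (phrase, acoustic_score) pairs from the recognizer."""
    priors = LOCATION_PRIOR.get(location, {})
    # A small floor keeps unlisted phrases from being ruled out entirely.
    rescored = [(phrase, score * priors.get(phrase, 0.01))
                for phrase, score in nbest]
    return max(rescored, key=lambda pair: pair[1])[0]

if __name__ == "__main__":
    hypotheses = [("stop conveyor", 0.45), ("stock quote", 0.40)]
    print(rerank(hypotheses, "factory floor"))  # -> stop conveyor
    print(rerank(hypotheses, "in the car"))     # -> stock quote
```

The same phrase list yields different winners in different places, which is exactly the kind of "valuable clue" that context can supply.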
And that reminds me of a comment I made concerning Microsoft's proposal to settle its U.S. antitrust suit, back in 2001: "What about the proposed clause that defines a personal computer as one that has a keyboard? Did no one involved in this settlement ever see a Tablet PC demonstration? Look, Ma, no keys!… Three years from now, how many people will use voice-response systems in their cars as their mobile gateways to the Internet?" That clause is still there in the Final Judgment of the case: "VI. Definitions… Q. Personal Computer means any computer configured so that its primary purpose is for use by one person at a time, that uses a video display and keyboard… Servers, television set top boxes, handheld computers, game consoles, telephones, pagers, and personal digital assistants are examples of products that are not Personal Computers within the meaning of this definition."

The issues associated with the Microsoft antitrust litigation come bubbling to the surface again with last week's European Union action: for whatever it's worth, my past analyses of key documents and decisions in that case are now linked from a single page, with some introductory comments, as part of eWEEK's ongoing special report on the company's legal trials. I wish this sort of thing weren't important to our technology choices, but it looks as if speech-based devices may compel us to ask and answer some of these questions all over again.

Speak to me about your platform hopes and fears at peter_coffee@ziffdavis.com.
 
 
 
 
Peter Coffee is Director of Platform Research at salesforce.com, where he serves as a liaison with the developer community to define the opportunity and clarify developers' technical requirements on the company's evolving Apex Platform. Peter previously spent 18 years with eWEEK (formerly PC Week), the national news magazine of enterprise technology practice, where he reviewed software development tools and methods and wrote regular columns on emerging technologies and professional community issues. Before he began writing full-time in 1989, Peter spent eleven years in technical and management positions at Exxon and The Aerospace Corporation, including management of the latter company's first desktop computing planning team and applied research in applications of artificial intelligence techniques. He holds an engineering degree from MIT and an MBA from Pepperdine University, and he has held teaching appointments in computer science, business analytics and information systems management at Pepperdine, UCLA, and Chapman College.
 
 
 
 
 
 
 
