As the speech technology industry’s major trade show kicks off this week in New York, don’t expect to hear grandiose announcements of computers and humans speaking in harmony.
Expect instead a steady stream of product updates and new launches from companies such as ScanSoft Inc., Intel Corp. and IBM focused on honing speech recognition and text-to-speech voices as the industry enters a phase of realism promising steady growth. Whether it be supporting more languages, new voice styles or emerging standards, vendors seem more focused on showing off targeted applications than futuristic possibilities.
“The kind of interaction that Data has on Star Trek is still 20 years away,” said James Larson, program chairman for the 9th Annual SpeechTEK International Educational Conference and Exposition, which opened Monday and runs through Thursday.
This year’s SpeechTEK comes as analysts predict a rebound in the speech recognition market. Gartner Dataquest forecasts that after declining in 2002, the market will grow worldwide from about $130 million in revenue this year to $258 million in 2007. Use in call centers and in business portals will account for 76 percent of all speech recognition product shipments, according to Gartner.
The speech industry knows that businesses and consumers are skeptical of speech technology, having witnessed such software as dictation programs that typically miss one out of 20 words, said Larson, manager of advanced human input/output at Intel. He expects the big push at SpeechTEK to be on two-way conversational voice systems, in which callers to a call center respond verbally to menus or answer simple questions.
“Users should be skeptical of how much computers can understand,” Larson said. “These conversational systems don’t try to understand every word in the English language but only the words in a menu or the words one speaks to fill in the blank.”
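The restricted-vocabulary approach Larson describes can be illustrated with a minimal sketch. It assumes the recognizer has already produced a text transcript, and the menu options and keywords are hypothetical, not taken from any vendor’s product; the point is only that the system listens for a handful of expected words and re-prompts on anything else.

```python
import re

# Hypothetical call-center menu; options and keywords are illustrative only.
MENU = {
    "billing": {"billing", "bill", "payment"},
    "support": {"support", "help", "technical"},
    "sales": {"sales", "order", "buy"},
}


def match_menu(utterance, menu=MENU):
    """Return the single menu option whose keywords occur in the utterance.

    Returns None when no option, or more than one option, matches, which is
    the case where a real system would typically re-prompt the caller.
    """
    words = set(re.findall(r"[a-z']+", utterance.lower()))
    hits = [option for option, keywords in menu.items() if words & keywords]
    return hits[0] if len(hits) == 1 else None


if __name__ == "__main__":
    print(match_menu("I have a question about my bill"))
    print(match_menu("What is the weather like today"))
```

Everything outside the menu vocabulary is simply ignored, which is why such a system can be useful without understanding open-ended speech.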
In that vein, leading vendors plan to highlight their continued push toward more accurate speech recognition and fuller support for languages and speaking styles.
- ScanSoft, of Peabody, Mass., which recently acquired leading speech recognition vendor SpeechWorks, will be launching Version 3.0 of the SpeechWorks Speechify text-to-speech engine. The update will include a new voice for reading back data in a database and new dictionary management capabilities for looking up information. It also improves pronunciation, with a focus on distinguishing how a set of numbers should be read, for example as a phone number, a Social Security number or a ZIP code, said spokeswoman Marie Ruzzo.
- Intel on Monday opened the show by announcing its key software piece for Microsoft Corp.’s upcoming Speech Server release, expected in the first quarter of 2004. Intel announced that beta testing has begun on the Intel NetMerge Call Manager, one of two choices Microsoft will offer for integrating into an existing telephony network through its Speech Server. The software allows application developers to construct telephone services without focusing on the telephone network’s complexities.
The Santa Clara, Calif., company also announced its NetStructure Host Media Processing 1.1 software for voice-enabling enterprise applications. Developers can use it to build IP media servers for interactive voice response services, voice mail, conferencing, fax servers and other telephony applications, and the software will support as many as 120 ports per server. Available now, it costs between $20 and $112 per port depending on the functionality needed.
- IBM, of Somers, N.Y., will be demonstrating at the show the newest versions of its WebSphere speech products, all with support for VoiceXML 2.0, one of two proposed standards in the speech market. They are speech recognition and text-to-speech engine WebSphere Voice Server 4.2, WebSphere Voice Application Access 4.1 for integrating enterprise applications for voice access and WebSphere Voice Response 3.1.5 for interactive voice response. All were announced last week.
The release builds natural language extensions into Voice Server and Voice Application Access to provide context in conversation so that a customer asking for a stock quote, for example, could also buy that stock without having to restate the stock’s name. Voice Server also has added Korean and Dutch to its stable of 18 supported languages.
- Cepstral LLC, of Pittsburgh, is expanding its base of voices for text-to-speech this week by introducing two French Canadian ones, the male Jean-Pierre and the female Isabelle, and two playful U.S. English voices, Damien and Duchess, said Chief Technology Officer and Co-Founder Kevin Lenzo. Cepstral specializes in creating voices for text-to-speech systems, particularly for running on smaller devices such as handhelds and smart phones with lower memory and processing requirements. The company expects to launch at least one new voice per quarter and to focus on North American voices, Lenzo said.
Cepstral also will be announcing the latest version of its voice engine, Theta 2.4. It will include improved voices, better support for markup languages for more natural-sounding pronunciation of addresses, updated lexicons and new APIs, including support for Microsoft’s Speech Application Programming Interface (SAPI).
- Speech recognition vendor LumenVox LLC, of San Diego, this week is launching Version 4.0 of its Speech Driven Information System and Speech Recognition Engine. LumenVox is adding VoiceXML export capabilities to its Speech Driven Information System so an application can be ported to a VoiceXML file for Web accessibility. The latest version of the software also has improved client/server functionality and new support for Spanish, among other features, the company said. In addition, the Speech Recognition Engine adds support for Spanish as well as improved performance and support for the A-law 8-kilohertz audio format.
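The number-reading problem that ScanSoft says it is addressing, in which the same run of digits must be spoken differently depending on what it represents, can be sketched roughly as follows. This is an illustration only, not Speechify’s method or API: the function names, the grouping table and the choice to mark pauses with commas are all assumptions made for the example.

```python
DIGIT_NAMES = ["zero", "one", "two", "three", "four",
               "five", "six", "seven", "eight", "nine"]

# Hypothetical grouping rules: how many digits are read together before a pause.
FORMATS = {
    "phone": (3, 3, 4),  # U.S. phone number with area code
    "ssn": (3, 2, 4),    # Social Security number
    "zip": (5,),         # five-digit ZIP code
}


def speak_number(text, kind):
    """Spell out the digits in text, inserting pauses according to its kind."""
    digits = [ch for ch in text if ch in "0123456789"]
    groups = FORMATS[kind]
    if len(digits) != sum(groups):
        raise ValueError(f"expected {sum(groups)} digits for {kind!r}")
    spoken, start = [], 0
    for size in groups:
        chunk = digits[start:start + size]
        spoken.append(" ".join(DIGIT_NAMES[int(d)] for d in chunk))
        start += size
    # A comma stands in for the short pause a text-to-speech voice would insert.
    return ", ".join(spoken)


if __name__ == "__main__":
    print(speak_number("617-555-0123", "phone"))
    print(speak_number("02101", "zip"))
```

A production engine would also have to infer the kind of number from surrounding text, which this sketch leaves to the caller.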