I attended my fifth or sixth SpeechTEK expo in New York on Monday, this time early enough to catch the 8:30 a.m. keynote. Steve Mills, senior vice president of the IBM Software Group, announced IBM's intention to foster the "speech development ecosystem" by offering up IBM's RDCs (reusable dialog components) as open source through the Apache Software Foundation, and its speech application development tools through the Eclipse Foundation.
He also announced that IBM would work with Avaya to more closely integrate its tools, RDCs and WebSphere application server platform with Avaya's call-center and telephony pieces.
I don't know quite how jaded to be about this announcement. Certainly, there is no news in "growing the speech ecosystem" by embedding the specialized expertise of speech experts in modular containers. Neither is there much news in offering these components through templates or through the same IDEs (integrated development environments) used by legions of J2EE programmers.
This has been done before by other speech app-generator vendors, both on CPE platforms and via the Web. Consider Audium, Fluency and others.
Indeed, one of the main ideas behind VoiceXML itself–the markup language of interactive voice response (IVR)–was to make the technology more accessible and attractive to XML programmers. (The other was to eliminate applications' dependence on proprietary host platforms.)
Certainly, there is no news in offering free trials of VoiceXML development tools. BeVocal, TellMe and many other platform providers offer free, Web-based sandboxes, at least for limited trials.
Many of these also offer some speech components for common speech-rec tasks, such as getting dates and Social Security numbers. These freebies are no small thing, because anticipating all the myriad ways that people may say dates and even Social Security numbers is a true art and science. But again, the giveaway is not new.
Open source is not a new concept to the speech marketplace, either, as Mark Plakias, senior analyst at speech-focused Opus Research, noted in a media briefing. Carnegie Mellon University has offered its core speech recognizer as open source, and SpeechWorks, before it was acquired by ScanSoft, open-sourced a VoiceXML interpreter/browser.
Using J2EE application servers–such as IBM's WebSphere, BEA WebLogic, Tomcat and others–to dynamically generate VoiceXML pages is not new; it's a development that mirrors the evolution of HTML, which progressed from static, write-once pages to app servers that dynamically assemble user-specific Web pages during an interactive browser session.
The VoiceXML markup language of IVR–the tags that tell a VoiceXML interpreter/browser to play a prompt or listen for a response, for example–has changed similarly, from merely presenting voice applications to assembling them on the fly, using servers that decide the flow and composition of voice applets.
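To make that idea concrete, here is a minimal sketch of the pattern: a J2EE servlet that assembles a VoiceXML "page" at request time, just as a JSP assembles user-specific HTML. This is not IBM's or Audium's actual tooling; the class name, greeting text and grammar file are hypothetical, and a real deployment would also need a servlet mapping and a speech platform pointed at the URL.

```java
// A minimal sketch of dynamic VoiceXML generation from a J2EE servlet.
// Names (BalanceDialogServlet, menu.grxml) are hypothetical illustrations.
import java.io.IOException;
import java.io.PrintWriter;
import javax.servlet.ServletException;
import javax.servlet.http.HttpServlet;
import javax.servlet.http.HttpServletRequest;
import javax.servlet.http.HttpServletResponse;

public class BalanceDialogServlet extends HttpServlet {
    @Override
    protected void doGet(HttpServletRequest req, HttpServletResponse resp)
            throws ServletException, IOException {
        // In a real system this might come from a CRM lookup; here it is
        // simply taken from a request parameter, with a fallback.
        String caller = req.getParameter("caller");
        if (caller == null) {
            caller = "caller";
        }

        resp.setContentType("application/voicexml+xml");
        PrintWriter out = resp.getWriter();

        // The server decides the flow and composition of the voice dialog
        // on the fly, emitting markup the VoiceXML interpreter will render
        // as a spoken prompt plus a speech-recognition field.
        out.println("<?xml version=\"1.0\" encoding=\"UTF-8\"?>");
        out.println("<vxml version=\"2.0\">");
        out.println("  <form id=\"main\">");
        out.println("    <field name=\"choice\">");
        out.println("      <prompt>Hello, " + caller
                + ". Say balance or transfer.</prompt>");
        out.println("      <grammar type=\"application/srgs+xml\" src=\"menu.grxml\"/>");
        out.println("    </field>");
        out.println("  </form>");
        out.println("</vxml>");
    }
}
```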
Neither is it news that the ecosystem needs to expand to nourish all of its inhabitants. The same employee names keep showing up on new business cards as speech companies rise and fall.
Even Plakias remarked in one panel session that on its 10th anniversary, SpeechTEK is still showcasing an industry of less than $1 billion. While growing, speech recognition comes into play on only 5 percent of all IVR application ports in use today, according to 2004 figures from Tern Systems.
Where's the News
What may be news here–and I'm eager to hear other opinions–is free access to a whole store of prebuilt speech-recognizing modules. I can only recall seeing starter-kit subsets in the past. This also may be the first open-source access to a toolset geared for the dynamically generated VoiceXML environment.
Audium (which I know a little better, since it's within easy traveling range) has offered limited free trials of its J2EE tools and IDE in the past. Audium also has a great spokesman in its twenty-something CEO, Michael Bergelson, and the company is not going to set the entry bar very high for the chance to sell you its builder and runtime platform. But Audium is a little 4-year-old company in a building in Chinatown. IBM is IBM.
IBM has had a VoiceXML speech application tool for years, has its own speech recognition technology in ViaVoice and has hitched its WebSphere app server to a range of IVR/telephony servers including Cisco VOIP platforms, Genesys Voice Server and its own Direct Talk, whose name has changed more times than I can remember.
In terms of actual speech-enabled IVR deployments, however, it doesn't compare all that favorably to relative pipsqueaks such as TellMe (which automates call-center interactions for Fidelity, Verizon and others) or Genesys (which does 1-800-Flowers). IBM's biggest announced speech customers have been in car navigation or assistance apps: OnStar for General Motors and Honda.
But IBM is very much about professional services now, and it gets into a lot of enterprises for a lot of different projects. Some of its most recent customer wins have been, if not about speech technology, at least telephony-related (see Dow Chemical). That gives it a lot of potential weight in the speech marketplace. Indeed, it has gotten 20 companies to sign on to this announcement, Audium among them.
By announcing particular integrations with Avaya, it may be looking for access to more call-center projects and the speech sale potential therein. IBM also has the clout to promote a standard speech toolset among Java programmers. In this regard, it competes with Microsoft, which has announced its .NET approach to voice application development and its Visual Studio speech SDK. On this pre-expo day of SpeechTEK 2004, Microsoft was conspicuous by its absence.
Perhaps cynicism is not the best response to IBM's song, just because I've heard it before. In fact, to have it sung with a range as large as IBM's–with ownership of so many pieces and so much manpower–can only be a good thing for speech. (How many of us have seen a commercial for Audium on prime-time television lately?)
If anything is going to inspire Java programmers to write speech apps, and enterprises to pay the extra money to speech-enable touch-tone IVRs, it may be IBM's and Avaya's imprimaturs and professional services arms.
Ellen Muraskin can be reached at Ellen_Muraskin@ziffdavis.com.
VOIP & Telephony Center Editor Ellen Muraskin has been observing and illuminating the murky intersection of computer intelligence and telephony since 1993. She reaches for her VOIP line when the rain makes her POTS line buzz.