Opinion: There's not a lot of news in IBM's open-source gift to the speech application developer community, but behind it are a new level of horsepower and more access to potential customers.
I attended my fifth or sixth SpeechTEK expo in New York on Monday, this time early enough to catch the 8:30 a.m. keynote. Steve Mills, senior vice president of the IBM Software Group, announced IBM's intention to foster the "speech development ecosystem" by offering up IBM's RDCs (reusable dialog components) as open source through the Apache Software Foundation, and its speech application development tools through the Eclipse Foundation.
He also announced that IBM would work with Avaya to integrate IBM's tools, RDCs and WebSphere application server platform more closely with Avaya's call-center and telephony pieces.
I don't know quite how jaded to be about this announcement. Certainly, there is no news in "growing the speech ecosystem" by embedding the specialized expertise of speech experts in modular containers. Neither is there much news in offering these components through templates or through the same IDEs (integrated development environments) used by legions of J2EE programmers.
This is something that's been done before by other speech app-generator vendors, both in CPE platforms and via the Web. Consider Audium, Fluency and others.
Indeed, one of the main ideas behind VoiceXML itself, the markup language of interactive voice response (IVR), was to make the technology more accessible and attractive to XML programmers. (The other was to eliminate applications' dependence on proprietary host platforms.)
Certainly, there is no news in offering free trials of VoiceXML development tools. BeVocal, TellMe and many other platform providers offer free, Web-based sandboxes, at least for limited trials.
Many of these also offer some speech components for common speech-rec tasks, such as getting dates and Social Security numbers. These freebies are no small thing, because anticipating all the myriad ways that people may say dates and even Social Security numbers is a true art and science. But again, the giveaway is not new.
Open source is not a new concept to the speech marketplace, either, as Mark Plakias, senior analyst at speech-focused Opus Research, noted in a media briefing. Carnegie Mellon University has offered its core speech recognizer as open source, and SpeechWorks, before it was acquired by ScanSoft, open-sourced a VoiceXML interpreter/browser.
Using J2EE application servers (such as IBM's WebSphere, BEA WebLogic, Tomcat and others) to dynamically generate VoiceXML pages is not new; it's a development that mirrors the evolution of HTML. The technology of HTML progressed from write-page-once, use-forever to app servers that dynamically assemble user-specific Web pages during a browser-customer interactive session.
The VoiceXML markup language of IVR (the tags that tell a VoiceXML interpreter/browser to play a prompt or listen for a response, for example) has changed similarly, from merely presenting voice applications to assembling them on the fly, using servers that decide the flow and composition of voice applets.
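For readers who have not seen it done, here is a rough sketch of what "assembling on the fly" can look like. It is written in Python rather than as a J2EE servlet purely for brevity, and it is not taken from IBM's RDCs or any vendor's tooling. The function name, the caller name and the balance prompt are all hypothetical; only the VoiceXML 2.0 elements (vxml, form, block, prompt) and namespace are standard.

```python
from xml.sax.saxutils import escape


def render_balance_dialog(caller_name: str, balance: str) -> str:
    """Build a caller-specific VoiceXML page as a string.

    Hypothetical example: a real application server would look up the
    caller's data and return this page to the VoiceXML browser over HTTP.
    """
    # Escape user-specific text so characters like "&" stay valid XML.
    prompt = escape(f"Hello {caller_name}. Your balance is {balance}.")
    return (
        '<?xml version="1.0" encoding="UTF-8"?>\n'
        '<vxml version="2.0" xmlns="http://www.w3.org/2001/vxml">\n'
        '  <form id="balance">\n'
        '    <block>\n'
        f'      <prompt>{prompt}</prompt>\n'
        '    </block>\n'
        '  </form>\n'
        '</vxml>\n'
    )


if __name__ == "__main__":
    # Each call yields a different page, much as a dynamic HTML page would.
    print(render_balance_dialog("Pat", "120 dollars"))
```

The point of the sketch is only that the page is generated per caller, per request, instead of being written once and served unchanged.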
Neither is it news that the ecosystem needs to expand to nourish all of its inhabitants. The same employee names keep showing up on new business cards as speech companies rise and fall.
Even Plakias remarked in one panel session that on its 10th anniversary, SpeechTEK is still showcasing an industry of less than $1 billion. While growing, speech recognition comes into play in only 5 percent of all ports of IVR applications in use today, according to 2004 figures from Tern Systems.
Where the news may be.