Speech Applications Voice New Strengths

VXML and SALT can strengthen speech applications, but they will require new expertise.

Companies that want to broaden the range of features in telephony-based applications now have more options through speech recognition, but adopting the new technology will demand deeper in-house expertise.

With Version 2.1 of VXML (Voice XML), the World Wide Web Consortium is about to standardize a number of improvements that consolidate common vendor implementations of features that previously fell outside the scope of VXML. This will allow developers to build more robust applications and handle exceptions more effectively. Version 2.1 of VXML is currently under final review by the W3C (www.w3c.org/voice).

When organizations evaluate IVR (Interactive Voice Response) systems, one of the biggest choices they face is whether to build their speech platform on products based on VXML or on those based on Microsoft Corp.'s competing specification, SALT (Speech Application Language Tags).

VXML, Version 2.0 of which was ratified in October 2001, is the better choice for building a richer application with deeper hooks into underlying corporate applications and data. SALT, established in August 2002, gives companies a way to leverage more-common programming skills and eventually to build IVR applications that extend beyond the telephone.


VXML and SALT are poised to significantly change the way telephony-based applications work by combining speech recognition, DTMF (dual-tone multifrequency) and TTS (text-to-speech) technologies in a single application design-and-development paradigm. Whereas early-generation IVR and touch-tone-based applications had to be developed for specific telephony hardware, VXML and SALT abstract development away from that hardware by using high-level Web application markup languages.

VXML and SALT deliver the same capability at their cores: They separate the speech interface from business logic and data. Both define how an application manages its interaction with the user, handling voice input through grammars and telephone keypad input through DTMF.
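To make that separation concrete, here is a minimal sketch in Python rather than in VXML or SALT markup. It is an illustration only: the Field class, the handle_input function, the lookup_balance routine and the sample balances are all hypothetical names and values invented for this example, not part of either specification. The point is that the grammars map speech and keypad input to the same semantic value, and the business logic never sees how the caller supplied it.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Optional


@dataclass
class Field:
    """One input slot in a dialog: a prompt plus the inputs it accepts.

    Hypothetical stand-in for a form field in a voice markup page.
    """
    prompt: str
    speech_grammar: Dict[str, str]  # recognized phrase -> semantic value
    dtmf_grammar: Dict[str, str]    # keypad digit -> semantic value

    def interpret(self, mode: str, raw: str) -> Optional[str]:
        """Return the semantic value for the input, or None if no match."""
        if mode == "speech":
            return self.speech_grammar.get(raw.strip().lower())
        if mode == "dtmf":
            return self.dtmf_grammar.get(raw.strip())
        return None


def lookup_balance(account_type: str) -> str:
    """Business logic: knows nothing about speech or touch tones."""
    balances = {"checking": "1,200.00", "savings": "8,500.00"}  # made-up data
    return f"Your {account_type} balance is {balances[account_type]} dollars."


ACCOUNT_FIELD = Field(
    prompt="Say checking or savings, or press 1 or 2.",
    speech_grammar={"checking": "checking", "savings": "savings"},
    dtmf_grammar={"1": "checking", "2": "savings"},
)


def handle_input(field: Field, mode: str, raw: str,
                 logic: Callable[[str], str]) -> str:
    """Validate input against the field's grammars, then call the logic."""
    value = field.interpret(mode, raw)
    if value is None:
        # No grammar matched: reprompt instead of calling business logic.
        return "Sorry, I did not understand. " + field.prompt
    return logic(value)


if __name__ == "__main__":
    print(handle_input(ACCOUNT_FIELD, "speech", "Savings", lookup_balance))
    print(handle_input(ACCOUNT_FIELD, "dtmf", "1", lookup_balance))
    print(handle_input(ACCOUNT_FIELD, "speech", "mortgage", lookup_balance))
```

In this sketch, saying "savings" and pressing 2 reach lookup_balance with the same value, while an unmatched input triggers a reprompt without touching the business logic.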

Both borrow Web application development techniques, VXML by building pages as XML documents and SALT by adding tags to Web pages, to define how the application will flow and to validate speech or touch-tone input from the user.

From an architectural perspective, SALT differs from VXML in that it defines a way to build multimodal applications so that a voice application can exist in another form through another interface, such as a Web browser.

A more practical difference is the tight integration between SALT and Microsoft developer tools, namely Visual Studio .Net. This integration will allow anyone familiar with Visual Basic or other Visual Studio development languages to write a speech application. (This can be seen in applications such as Pronexus Inc.'s VBSALT, reviewed here, which can be used to accelerate the development of speech applications in Visual Basic.)

VXML, on the other hand, is supported by most of the traditional IVR and telephony application vendors, giving companies broader platform and development language choices.
