Development paradigms

By Michael Caton  |  Posted 2004-08-30

VXML organizes an application into a set of documents that define how the application works through dialog states, typically consisting of menus and forms. Grammars define the input the application expects in each dialog state, whether speech or DTMF (touch-tone) input, and are organized as a list of valid responses in a file.
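As a rough illustration of the grammar idea, the sketch below models each dialog state as a list of valid responses, one list for speech and one for DTMF. It is a plain Python stand-in rather than VXML, and the state names, phrases and key assignments are all hypothetical.

```python
# Hypothetical dialog states and grammars, for illustration only.
# Each state lists the responses it accepts per input mode.
GRAMMARS = {
    "departure_period": {
        "speech": {"morning": "morning", "afternoon": "afternoon", "evening": "evening"},
        "dtmf": {"1": "morning", "2": "afternoon", "3": "evening"},
    },
    "confirm": {
        "speech": {"yes": "yes", "no": "no"},
        "dtmf": {"1": "yes", "2": "no"},
    },
}


def interpret(state, mode, user_input):
    """Return the value the state's grammar assigns to the input, or None if it is out of grammar."""
    grammar = GRAMMARS[state][mode]
    return grammar.get(user_input.strip().lower())
```

An input that falls outside the grammar for the current state returns None, which is the point at which a real voice application would typically reprompt the caller.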

These documents are written in XML, so developers can define how a user will interact with an application. For example, in a travel-booking application, a developer might build a set of documents that will step the user through the processes and subdialogs necessary to capture information about travel times, airlines, frequent-flier data and payment. VXML lets the developer choose speech or DTMF input as appropriate while providing a way for the user to make choices within a grammar the application understands.
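To make the travel-booking example more concrete, here is a minimal sketch of what one such document might look like, embedded in a short Python script that parses it and lists the form items a caller would step through. The element names follow VoiceXML 2.0 as commonly documented; the form, field and file names are assumptions made up for this sketch, not taken from any real application.

```python
import xml.etree.ElementTree as ET

VXML_NS = "http://www.w3.org/2001/vxml"

# Hypothetical booking document; all names and file paths are illustrative.
BOOKING_DOC = """
<vxml version="2.0" xmlns="http://www.w3.org/2001/vxml">
  <form id="book_trip">
    <field name="airline">
      <prompt>Which airline would you like to fly?</prompt>
      <grammar src="airlines.grxml" type="application/srgs+xml"/>
    </field>
    <field name="departure_time">
      <prompt>What time would you like to leave?</prompt>
      <grammar src="times.grxml" type="application/srgs+xml"/>
    </field>
    <subdialog name="payment" src="payment.vxml"/>
  </form>
</vxml>
"""


def list_form_items(vxml_text):
    """Return (element, name) pairs for the fields and subdialogs in each form, in document order."""
    root = ET.fromstring(vxml_text.strip())
    items = []
    for form in root.findall("{%s}form" % VXML_NS):
        for child in form:
            tag = child.tag.split("}", 1)[-1]  # drop the namespace part of the tag
            if tag in ("field", "subdialog"):
                items.append((tag, child.get("name")))
    return items


if __name__ == "__main__":
    for tag, name in list_form_items(BOOKING_DOC):
        print(tag, name)
```

Each field pairs a prompt with the grammar that constrains the answer, and the subdialog hands payment capture off to a separate document.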

SALT uses a variety of elements to organize applications, with Prompt, Listen and DTMF defining the flow of an application.

Prompt defines how the application queries the user for input. Its output can range from text files and variables that are converted to speech to actual audio files.

Listen controls speech-based input for the application, along with the grammars it uses, which can be referenced inline in the application or in a separate file. Listen supports a number of tags for controlling how speech input interacts with the underlying application logic. It can also be used to capture speech input, which helps developers tuning the application diagnose faults in the application or the speech recognition engine.

The DTMF element functions much like Listen, but it captures keypad input for the application rather than speech.
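The three elements described above can be sketched as markup embedded in a page. The fragment below is an assumption-laden illustration: the tag and attribute names (prompt, listen, dtmf, grammar, bind, targetelement) are written as recalled from the SALT 1.0 specification and should be checked against it, while the ids and grammar file names are hypothetical. The Python wrapper only parses the page and summarizes which SALT elements it contains.

```python
import xml.etree.ElementTree as ET

SALT_NS = "http://www.saltforum.org/2002/SALT"

# Hypothetical page; ids and grammar files are illustrative assumptions.
PAGE = """
<html xmlns:salt="http://www.saltforum.org/2002/SALT">
  <body>
    <input id="txtCity" name="txtCity" type="text"/>
    <salt:prompt id="askCity">Which city are you flying to?</salt:prompt>
    <salt:listen id="recoCity">
      <salt:grammar src="cities.grxml"/>
      <salt:bind targetelement="txtCity" value="//city"/>
    </salt:listen>
    <salt:dtmf id="dtmfCity">
      <salt:grammar src="city_codes.grxml"/>
      <salt:bind targetelement="txtCity" value="//city"/>
    </salt:dtmf>
  </body>
</html>
"""


def summarize_salt(page_text):
    """Return (element, id, grammar src) for each prompt, listen and dtmf element, in document order."""
    root = ET.fromstring(page_text.strip())
    prefix = "{" + SALT_NS + "}"
    summary = []
    for element in root.iter():
        if not element.tag.startswith(prefix):
            continue  # skip ordinary HTML elements
        kind = element.tag[len(prefix):]
        if kind in ("prompt", "listen", "dtmf"):
            grammar = element.find(prefix + "grammar")
            src = grammar.get("src") if grammar is not None else None
            summary.append((kind, element.get("id"), src))
    return summary
```

In this sketch the listen and dtmf elements fill the same text box, so spoken and keypad input arrive at one place in the page.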

Microsoft has designed SALT to work in conjunction with other interfaces. This lets developers reuse elements of an application on a variety of interfaces, from a Web browser to a PDA.

In one of these multimodal applications, SALT tags would be embedded directly in a Web page. A user accessing that page from a PC would be able to interact with it using traditional input or speech that is recognized locally. On a less powerful device, such as a PDA, users could interact via speech that is recognized by a server-based speech recognition engine, in much the same way that telephone-based access to the application handles speech input.

There is an effort under way to bring SALT into the VXML sphere. Microsoft is part of the W3C's Voice Browser Working Group, as are vendors such as Intel Corp. that support both VXML and SALT platforms. Version 3 of VXML is expected to include elements of SALT. The first working draft of VXML 3 is expected at the end of next year; a final version of the standard is slated for 2007.

Technical Analyst Michael Caton can be reached at

Check out eWEEK.com's VOIP & Telephony Center for the latest news, views and analysis on voice over IP and telephony.

