VoiceXML: Bringing the Web to the Phone

The global impact and universal penetration of the Web have been driven predominantly by the simplicity of the open HTML standard. The Web development model brought vendor and network independence to distributed applications and significantly reduced the costs and skills required to deliver powerful solutions quickly.

Similarly, Voice eXtensible Markup Language, or VoiceXML, is an emerging open standard that brings this Web development model to the interactive voice response (IVR) and call center markets. Existing HTTP gateways to enterprise services and data, built using technologies like Secure Sockets Layer (SSL) and cookies, can now be extended seamlessly to the phone. VoiceXML has rapidly won adoption and support from all corners of the voice technology industry: as of this month, dozens of companies have announced VoiceXML-compatible products, and more than 15,000 developers have begun building VoiceXML applications.

VoiceXML is designed for creating dynamic, Internet-powered voice applications that feature synthesized speech, digitized audio, recognition of spoken and DTMF (touch-tone) key input and telephony. Its major goal is to bring the advantages of Web-based development and content delivery to IVR applications.
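To make those ingredients concrete, here is a minimal sketch of what such a document might look like, wrapped in a few lines of Python that parse it with the standard library. The page follows VoiceXML 1.0-style element names (form, field, prompt, audio, filled, submit); the banking scenario, the file name welcome.wav and the example.com URL are illustrative assumptions, not taken from any real deployment.

```python
import xml.etree.ElementTree as ET

# Illustrative VoiceXML 1.0-style page. The application, audio file name
# and submit URL are hypothetical.
VXML = """<?xml version="1.0"?>
<vxml version="1.0">
  <form id="balance">
    <block>
      <!-- Digitized audio, with synthesized speech as the fallback -->
      <prompt><audio src="welcome.wav">Welcome to the bank.</audio></prompt>
    </block>
    <!-- type="digits" accepts spoken digits or DTMF key presses -->
    <field name="account" type="digits">
      <prompt>Please say or key in your account number.</prompt>
    </field>
    <filled>
      <submit next="http://example.com/balance" namelist="account"/>
    </filled>
  </form>
</vxml>
"""


def summarize(vxml_text):
    """List the input fields, audio files and submit targets in a page."""
    root = ET.fromstring(vxml_text)
    return {
        "fields": [f.get("name") for f in root.iter("field")],
        "audio_files": [a.get("src") for a in root.iter("audio")],
        "submits_to": [s.get("next") for s in root.iter("submit")],
    }


if __name__ == "__main__":
    print(summarize(VXML))
```

The point of the sketch is that the whole dialog is ordinary markup: any Web server that can emit text can emit it, and any XML tool can inspect it.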

A caller places a phone call from any phone (which, of course, has no "browser" software in it) to a service provider that is essentially running a collection of "VoiceXML browsers." Each VoiceXML browser has the hardware and software it needs to answer the phone automatically and manage the call, just as a traditional IVR system does. The VoiceXML browser interacts with the Web server hosting a particular application exactly the way Netscape Navigator or Internet Explorer does: it makes HTTP requests for specific VoiceXML documents and audio files, and it posts the data it collects from the caller back to the Web server over HTTP.
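The request cycle described above can be sketched in a few dozen lines of Python. This is only a toy under stated assumptions: a throwaway HTTP server on the local machine stands in for the application's Web server, and a plain HTTP client stands in for the VoiceXML browser. The paths /login.vxml and /login, the field name pin and the page contents are all hypothetical.

```python
import threading
import urllib.parse
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer

# Hypothetical pages served by the stand-in application server.
LOGIN_PAGE = (b'<?xml version="1.0"?><vxml version="1.0"><form id="login">'
              b'<field name="pin" type="digits">'
              b'<prompt>Please say or key in your PIN.</prompt></field>'
              b'<filled><submit next="/login" namelist="pin"/></filled>'
              b'</form></vxml>')
THANKS_PAGE = (b'<?xml version="1.0"?><vxml version="1.0"><form>'
               b'<block>Thank you.</block></form></vxml>')


class AppHandler(BaseHTTPRequestHandler):
    received = []  # form data posted by the "browser"

    def do_GET(self):
        self._reply(LOGIN_PAGE)

    def do_POST(self):
        length = int(self.headers.get("Content-Length", 0))
        body = self.rfile.read(length).decode("ascii")
        AppHandler.received.append(urllib.parse.parse_qs(body))
        self._reply(THANKS_PAGE)

    def _reply(self, payload):
        self.send_response(200)
        self.send_header("Content-Type", "text/xml")
        self.send_header("Content-Length", str(len(payload)))
        self.end_headers()
        self.wfile.write(payload)

    def log_message(self, *args):
        pass  # keep the demo quiet


def run_call(pin):
    """Fetch a VoiceXML page over HTTP, then post the caller's answer back."""
    server = HTTPServer(("127.0.0.1", 0), AppHandler)  # port 0: any free port
    threading.Thread(target=server.serve_forever, daemon=True).start()
    base = "http://127.0.0.1:%d" % server.server_port
    # Bypass any configured proxy so the request stays on the local machine.
    opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
    try:
        with opener.open(base + "/login.vxml") as resp:      # HTTP GET
            page = resp.read()
        data = urllib.parse.urlencode({"pin": pin}).encode("ascii")
        with opener.open(base + "/login", data=data) as resp:  # HTTP POST
            reply = resp.read()
    finally:
        server.shutdown()
        server.server_close()
    return page, reply


if __name__ == "__main__":
    page, reply = run_call("1234")
    print(page.decode("ascii"))
    print(AppHandler.received)
```

Nothing on the server side is voice-specific: it answers a GET with markup and reads an ordinary form post, which is why existing Web infrastructure can be reused.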

The key difference is that in the case of VoiceXML, "rendering" the application consists of playing a series of audio files over the phone line, recording whatever the caller says in response, then passing that recording through voice recognition software to get a result, and reacting appropriately.
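That play, listen, recognize and react cycle can be illustrated with a toy loop. The sketch below is an assumption-laden simplification, not how any real VoiceXML platform is built: the functions collect_field and recognize are hypothetical, "audio" is plain text, and recognition is reduced to matching the caller's words against a small set of allowed phrases.

```python
def recognize(utterance, grammar):
    """Toy recognizer: return the normalized phrase if the grammar allows it."""
    text = utterance.strip().lower()
    return text if text in grammar else None


def collect_field(prompt, grammar, listen, play, max_tries=3):
    """Play a prompt, capture the caller's reply, and react to the result.

    listen() and play(text) are stand-ins for the telephony hardware.
    Returns the recognized phrase, or None after max_tries failures.
    """
    for _ in range(max_tries):
        play(prompt)                      # "render" the prompt over the line
        heard = listen()                  # capture what the caller says
        result = recognize(heard, grammar)
        if result is not None:
            return result                 # understood: hand the value back
        play("Sorry, I did not understand.")  # not understood: reprompt
    return None


if __name__ == "__main__":
    scripted_caller = iter(["umm", "Checking "])
    answer = collect_field(
        "Checking or savings?",
        {"checking", "savings"},
        listen=lambda: next(scripted_caller),
        play=print,
    )
    print("Recognized:", answer)
```

In a real platform the prompt text and the list of acceptable answers would come from the VoiceXML document itself, so the application author never writes this loop.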

In March 1999, voice technology leaders at AT&T, Lucent Technologies and Motorola came together to launch an industry consortium called the Voice eXtensible Markup Language Forum (VoiceXML Forum). Each of these companies had considerable resources invested in voice technology, including seminal efforts to produce a simple text-based markup language for voice applications. The stated mission of the VoiceXML Forum was to author and evangelize an Internet-based open standard to make the resources of the World Wide Web accessible by telephone.

One of the key challenges VoiceXML initially met in the marketplace was quality and expressiveness. While companies such as Nuance and SpeechWorks International had clearly proven that it was possible to design and deploy commercial-grade voice applications for companies like United Airlines and Charles Schwab & Co., it was not at all clear that a simple markup language could deliver sufficient functionality and flexibility to do the same. Additional key challenges included performance, security and, fundamentally, whether the new specification would gain enough industry momentum to become truly viable in the marketplace.

The Future of VoiceXML

Rarely has a new standard seen such broad and swift adoption by the industry at large as VoiceXML. In a little over a year, virtually every company working on voice technology has embraced VoiceXML and has committed to the open standards approach for the evolution of the standard. There are more than a dozen VoiceXML platforms commercially available today in a variety of form factors; customers can choose from stand-alone software, integrated hardware/software suites, network-based voice infrastructure services and even an open source toolkit.

As of this writing, VoiceXML platform vendors are aggressively implementing key additions and clarifications of the next version of the VoiceXML specification. Millions of phone calls to commercially deployed VoiceXML applications are handled every week.

Moving forward, the evolution of the VoiceXML standard will fall under the auspices of the World Wide Web Consortium (W3C), the same standards body responsible for such key Web standards as HTML and XML. With adoption by the enterprise market growing rapidly, the future for the VoiceXML standard and voice applications in general appears bright, even given the challenging economic landscape.

Jeff Kunins is senior manager of technical marketing at Tellme Networks. He is the co-creator of Tellme Studio, the first and largest VoiceXML developer community in the world. Jeff speaks at industry events around the world and is currently at work on one of the first books on VoiceXML. Feel free to reach him at [email protected].

If you've got crisp, original thinking on a cutting-edge topic, we will print your views on the Interactions page and on our Web site. E-mail column proposals to [email protected] and [email protected].