The World Wide Web Consortium (W3C) on Tuesday announced that it has published two key pieces of the Speech Interface Framework: VoiceXML 2.0 and the Speech Recognition Grammar Specification.
Officials at Cambridge, Mass.-based W3C said VoiceXML 2.0 is targeted at combining Web-based development and content delivery with interactive voice response capabilities, while the Speech Recognition Grammar Specification (SRGS) is needed to ensure VoiceXML’s support for speech recognition.
The W3C’s Speech Interface Framework will let users interact with Web services from ordinary telephones by keying in or speaking commands, W3C officials said.
In addition to VoiceXML and SRGS, which supports speech recognizer elements, the W3C’s Speech Interface Framework includes the Speech Synthesis Markup Language (SSML) for handling spoken prompts and Voice Browser Call Control (also known as Call Control XML or CCXML) for delivering telephony call control support for XML, W3C officials said.
SRGS enables applications to establish what words users will be prompted to say in voice systems, but it has also been applied to other uses such as handwriting recognition, the W3C said.
Meanwhile, although VoiceXML 2.0 is a significant milestone, the leaders of the effort are not planning on letting up.
“VoiceXML work doesn’t stop with Version 2.0,” said Janet Daly, a spokeswoman for the W3C. “The Voice Browser Working Group already has a 2.1 spec in the pipeline and they’ll be working on [Version] 3.0. They’re looking toward including new work that takes into account contributions from W3C members, like XHTML plus Voice [Extensible Hypertext Markup Language plus Voice or XHTML+Voice] and SALT [Speech Application Language Tags].”
Microsoft Corp. supports SALT and IBM Corp. supports XHTML+Voice, although both are members of the W3C Voice Browser Working Group and IBM is a founder of the work on VoiceXML.
Meanwhile, at next week’s SpeechTEK show in San Francisco, which will run in conjunction with VSLive! and the Microsoft Mobile Developers Conference, Microsoft chairman Bill Gates is slated to announce Microsoft Speech Server 2004 in his keynote speech.
“W3C Speech Interface Framework is crucial to Microsoft’s vision of making speech mainstream,” said Xuedong Huang, general manager of Microsoft’s Speech Technologies Group, in a statement. “We have implemented SRGS, SSML and SI into Microsoft Speech Server 2004 that integrates speech into HTML through SALT. Such a seamless integration with HTML has enabled our customers to extend their existing investments from desktop to multimodal and telephony voice access in a single, cost effective step. Microsoft Speech Server also provides development tools in the popular Visual Studio .NET environment, paving the way for Speech Interface Framework to be adopted by the mainstream Web developers,” Huang continued.
Igor Jablokov, program director for pervasive computing at IBM, also in a statement, said, “VoiceXML 2.0, which has been key in the growth of speech applications by providing a standards-based framework, allows businesses to deploy applications today that leverage existing development skills and resources. Because it allows speech deployments to be built over a standard Web-application infrastructure, VoiceXML also provides a clear upgrade path as applications grow, unlike closed, proprietary languages.”
VoiceXML Forum and Analysts Tout New Standards
Meanwhile, the VoiceXML Forum on Tuesday announced its support for the W3C’s move to standardize VoiceXML 2.0. The Piscataway, N.J.-based organization said the more than 370 companies in the VoiceXML Forum support the specification and are creating and offering VoiceXML applications and services.
In addition, the VoiceXML Forum announced plans to launch a VoiceXML Platform Certification Program, under which VoiceXML vendors will be able to have their products tested for compliance with the latest versions of the VoiceXML standard.
“Just as important as interacting with visual user interfaces is the need to interact with voice, phone, and other verbal forms of interaction,” said Ron Schmelzer, an analyst with ZapThink LLC, of Waltham, Mass. “As such, the W3C has produced the VoiceXML specification, focused on allowing users to interact with a variety of voice-based interfaces including touch-tone keypads, spoken commands, prerecorded speech, synthetic speech, and music. VoiceXML enables users to create interfaces that can be used across Web browsers, telephones, and voice-activated systems.”
“VoiceXML allows users to create a description of a dialog between computer and user that can output text, graphics, synthesized speech, digitized audio, and also provides means to recognize inputs from all these sources,” he added. According to Schmelzer, a number of companies, including AT&T, Lucent, Hewlett-Packard, IBM and Motorola, are working on voice-based browsers, intelligent phone applications, and other applications based on VoiceXML.
The VoiceXML Forum said that more than 10,000 VoiceXML-based applications have been deployed throughout the world. For instance, Cingular Wireless uses VoiceXML services from BeVocal Inc. to provide voice-activated dialing and other services to its more than 22 million customers, officials at the VoiceXML Forum said.