XML-Journal
www.XML-Journal.com


by Hitesh Seth

The previous article in this series, "Building VoiceXML Applications Using J2EE" (XML-J, Vol. 2, issue 2), discussed building dynamic and interactive voice applications using VoiceXML and J2EE. In this issue we'll focus on the tools available to aid development and testing of VoiceXML-based components and applications. We'll discuss how to use the tools to test and debug such applications from a normal Touch-Tone-based phone.

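The tools reviewed are BeVocal Café, WebSphere Voice Server SDK, Mobile ADK, V-Builder, Tellme Studio, VoiceGenie Developer Workshop, and voxeo community.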

BeVocal Café
BeVocal Café by BeVocal, Inc., is a hosted VoiceXML platform for developing and testing VoiceXML-based applications. Its tools (see Table 1) include File Management for uploading VoiceXML, grammar, and audio files; VoiceXML Checker for validating VoiceXML content; Vocal Player for replaying sessions; Trace Tool for tracing and debugging an application; Log Browser for viewing the call trace log; and Port Estimator for estimating the number of ports needed to support a required level of service and number of concurrent callers. Figure 1 shows the File Management utility. Because Café lets the developer upload VoiceXML application files, a Web server isn't immediately needed to serve VoiceXML content.
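To give a feel for the kind of calculation a port estimator performs, here is a minimal Python sketch based on the Erlang B formula, a standard telephony model for blocked calls. The article doesn't describe how BeVocal's Port Estimator works internally, so the formula choice, the function names, and the parameters below are all assumptions made for illustration.

```python
def erlang_b(offered_load: float, ports: int) -> float:
    """Probability that a call is blocked, per the Erlang B model.

    offered_load is the traffic in erlangs (average number of simultaneous calls).
    Uses the standard recurrence B(0) = 1, B(n) = A*B(n-1) / (n + A*B(n-1)).
    """
    blocking = 1.0
    for n in range(1, ports + 1):
        blocking = (offered_load * blocking) / (n + offered_load * blocking)
    return blocking


def ports_needed(calls_per_hour: float, avg_call_seconds: float,
                 max_blocking: float) -> int:
    """Smallest port count whose blocking probability is at or below max_blocking."""
    # Offered load in erlangs: call arrivals per second times average call length.
    offered_load = calls_per_hour * avg_call_seconds / 3600.0
    ports = 1
    while erlang_b(offered_load, ports) > max_blocking:
        ports += 1
    return ports
```

For example, `ports_needed(60, 60, 0.25)` models 60 one-minute calls per hour (one erlang of traffic) with a tolerance of one blocked call in four. A real service would use a much stricter blocking target.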

To test VoiceXML applications from a live Touch-Tone phone, developers can dial the BeVocal Café testing number (877-33-VOCAL) and use their personal user ID/PIN to test the currently active VoiceXML file or URL.

BeVocal Café contains a number of technical resources, including a Getting Started guide, a FAQ, and a VoiceXML tutorial and reference. A number of VoiceXML samples are provided to explain the features of its platform and/or serve as starter applications. Café contains a section called Audio Libraries, a large set of frequently used recorded prompts. To enhance collaboration between developers, Café includes Usenet-based threaded forums. Find BeVocal Café at http://cafe.bevocal.com/

WebSphere Voice Server SDK
IBM's WebSphere Voice Server SDK (see Table 2) leverages the multimedia capabilities of a workstation - speakers and microphone - and provides a desktop-based simulation environment for testing interactive voice-based applications developed using VoiceXML. Its components include a desktop-based speech recognition engine (IBM's ViaVoice Speech Recognition Engine) to recognize speech input through the microphone, a text-to-speech engine (IBM's ViaVoice Text-to-Speech Engine), a VoiceXML browser, and a DTMF simulator. The VoiceXML browser supports grammars specified in the Java Speech Grammar Format (JSGF). Applications developed using the Voice Server SDK can be deployed on the WebSphere Voice Server.

WebSphere Studio 3.5 (see Figure 2), a separate Web development IDE, includes a VoiceXML editor with syntax checking and wizards for creating dynamic VoiceXML documents. The Voice Server SDK is also integrated with WebSphere Studio, so the speech browser can be launched from within the IDE.

To simulate input for an application, the SDK supports voice input through a microphone attached to the workstation and text-simulated input through the keyboard. The DTMF simulator adds the capability to simulate DTMF input as well.

IBM's WebSphere Voice Server SDK is available at http://www.ibm.com/software/speech/enterprise/ep_11.html. The site also provides technical resources, such as a programmer's guide, a FAQ, and access to Usenet-based discussion groups.

Mobile ADK
Motorola's Mobile ADK (see Table 3) builds on top of the Motorola Wireless IDE, a common tool for developing applications for phones, mobile phones, and interactive two-way pagers. Key components of Mobile ADK are an IDE that supports validation of VoiceXML applications, a desktop-based VoiceXML simulator, and a Microsoft Agent-based application that allows interactive testing of VoiceXML applications. The VoiceXML Simulator supports both simulated text-based input (in the Simulator Response box) and speech input through a desktop microphone (see Figure 3). DTMF input can also be simulated by clicking the buttons on the phone image.

The Wireless IDE provides capabilities such as a color-coded source editor. The IDE is tightly integrated with the simulator: clicking a green toolbar button brings up the VoiceXML Simulator immediately.

At this writing the Mobile ADK 2.0 beta (the version that supports VoiceXML) is available from Motorola's Mobile Internet Exchange (MIX) at http://mix.motorola.com/audiences/developers/madk_intro_dev.asp. The site provides technical information such as white papers, a FAQ, a searchable knowledge base, and a Web-based discussion group for collaboration.

V-Builder
Nuance Communications' V-Builder (see Table 4) is a visual IDE for developing VoiceXML-based applications. We're all used to the concept of assembling a visual dialog from fundamental elements such as text areas, text fields, buttons, and menus. V-Builder leverages this concept and represents VoiceXML tags as visual elements, which are contained in the elements palette and can be dragged and dropped to create a VoiceXML document/dialog (see Figure 4). V-Builder also incorporates Nuance SpeechObjects by providing a set of prebuilt speech objects for common dialogs, entry, database and Web queries, and natural language interfaces. SpeechObjects run on top of the Nuance Advanced Speech Recognition (ASR) engine and are integrated with the VoiceXML language using the <object> tag.

V-Builder also allows visual development of grammars that can be used by VoiceXML-based application dialogs. V-Builder supports the Grammar Specification Language (GSL) format for describing both inline and external grammars.

VoiceXML application components (grammars/dialogs) can be tested using either native desktop audio or a telephone (with appropriate hardware and/or an audio provider). A third-party text-to-speech engine is required if the application incorporates TTS functionality. V-Builder and the other tools are available at http://extranet.nuance.com/developer/. The site provides access to technical documentation (development guides and tutorials), integration with text-to-speech engines and hardware interfaces, and collaboration tools such as discussion groups and newsletters.

Tellme Studio
Tellme Studio, a product of Tellme Networks, Inc., is a hosted VoiceXML platform (see Table 5). The tools in Tellme Studio are divided into two sections: MyExtensions and MyStudio. MyExtensions allows development and publication of applications to the Tellme platform; enabling it makes a developer's applications reachable by consumers.

MyStudio includes a series of development, testing, and debugging tools, such as a syntax checker and a record-by-phone option that allows developers to record prompts for use in the application. As discussed in my previous article, grammars are an important component of a VoiceXML application. They define how the VoiceXML interpreter and the underlying speech recognition engine should interpret input phrases. Tellme Studio provides tools to validate a grammar (from a scratchpad or external URL), a phrase checker to test a particular phrase against a grammar, a phrase generator to generate the various phrases the grammar can recognize, and a DTMF Generator to generate DTMF entries for lists of words. Figure 5 shows the DTMF Generator producing DTMF entries for the words HTML, XML, and VXML.
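The idea behind generating DTMF entries for words can be sketched in a few lines of Python: each letter is replaced by the digit of the Touch-Tone key that carries it. This is only an illustration of the concept using the standard telephone keypad layout; the function name `word_to_dtmf` is hypothetical, and the sketch makes no claim about the output format of Tellme's DTMF Generator.

```python
# Standard telephone keypad letter groups.
KEYPAD = {
    "2": "ABC", "3": "DEF", "4": "GHI", "5": "JKL",
    "6": "MNO", "7": "PQRS", "8": "TUV", "9": "WXYZ",
}

# Invert the keypad into a letter -> digit lookup table.
LETTER_TO_DIGIT = {
    letter: digit for digit, letters in KEYPAD.items() for letter in letters
}


def word_to_dtmf(word: str) -> str:
    """Return the keypad digits a caller would press to spell `word`."""
    try:
        return "".join(LETTER_TO_DIGIT[ch] for ch in word.upper())
    except KeyError as exc:
        # Characters such as digits or punctuation have no letter key.
        raise ValueError(f"no keypad digit for character {exc.args[0]!r}") from None


if __name__ == "__main__":
    for word in ("HTML", "XML", "VXML"):
        print(word, word_to_dtmf(word))
```

With this mapping, HTML becomes 4865, XML becomes 965, and VXML becomes 8965.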

Tellme Studio also provides a tool to aid in converting prerecorded prompts to the supported audio file formats. It's available as a plug-in to a popular sound-editing tool (Sonic Foundry's Sound Forge).

Tellme Studio provides a series of technical resources, including a VoiceXML reference, grammar reference, a FAQ, guidelines for creating phone-based applications ("Designing for the Phone"), and technical white papers on VoiceXML. Example VoiceXML code, grammars, and audio prompts are available for easier development of applications. Tellme Studio is available on the Web at http://studio.tellme.com/ and provides Usenet-based threaded forums for communication between developers.

VoiceGenie Developer Workshop
VoiceGenie Technologies' VoiceGenie Developer Workshop (see Table 6) is a hosted VoiceXML platform that provides essential tools such as a call log explorer to view the logs for debugging, an online audio converter that converts prerecorded audio prompts into the supported .vox format, and an extension manager that allows a number of extensions to be defined to test individual applications. A VoiceXML validator converts VoiceXML applications into a series of "blocks" that allow easier visualization of the workings of the application (see Figure 6).
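To illustrate what viewing a VoiceXML document as a series of blocks might involve, the following Python sketch parses a small document and prints its element hierarchy as an indented outline. It is not VoiceGenie's validator: the sample document, the `outline` function, and the output format are assumptions made for this example, and the check covers only well-formedness and the root element, not full VoiceXML validity.

```python
import xml.etree.ElementTree as ET

# A small illustrative VoiceXML 1.0 document (not taken from any vendor sample).
SAMPLE_VXML = """<?xml version="1.0"?>
<vxml version="1.0">
  <form id="order">
    <field name="drink">
      <prompt>Would you like coffee or tea?</prompt>
    </field>
    <block>
      <prompt>Thank you.</prompt>
    </block>
  </form>
</vxml>
"""


def outline(source: str) -> list:
    """Return one indented line per element, labeled with its id or name."""
    root = ET.fromstring(source)  # raises ET.ParseError if not well formed
    # Assumes a document without an XML namespace, as in VoiceXML 1.0.
    if root.tag != "vxml":
        raise ValueError(f"expected <vxml> root, found <{root.tag}>")

    lines = []

    def visit(element, depth):
        label = element.get("id") or element.get("name")
        suffix = f" ({label})" if label else ""
        lines.append("  " * depth + element.tag + suffix)
        for child in element:
            visit(child, depth + 1)

    visit(root, 0)
    return lines


if __name__ == "__main__":
    print("\n".join(outline(SAMPLE_VXML)))
```

Even a rough outline like this makes it easier to see which fields and blocks belong to which dialog.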

VoiceGenie Developer Workshop provides an extension management tool that allows the developer to manage up to 20 extensions (per account) to develop and test multiple applications. The extensions capability allows multiplexing of applications with a single telephone number.
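Conceptually, multiplexing applications behind one telephone number amounts to a lookup from the extension a caller enters to the application that should answer. The Python sketch below models that idea under stated assumptions: the `ExtensionRegistry` class, its method names, and the example URLs are hypothetical, and only the limit of 20 extensions per account comes from the description above.

```python
MAX_EXTENSIONS = 20  # per-account limit mentioned in the text


class ExtensionRegistry:
    """Hypothetical model of an account's extension-to-application table."""

    def __init__(self):
        self._routes = {}

    def register(self, extension: str, url: str) -> None:
        """Map an extension to the URL of a VoiceXML application."""
        is_new = extension not in self._routes
        if is_new and len(self._routes) >= MAX_EXTENSIONS:
            raise ValueError(f"account already has {MAX_EXTENSIONS} extensions")
        self._routes[extension] = url

    def resolve(self, extension: str):
        """Return the application URL for an extension, or None if unassigned."""
        return self._routes.get(extension)


if __name__ == "__main__":
    registry = ExtensionRegistry()
    registry.register("1", "http://example.com/weather.vxml")  # placeholder URL
    registry.register("2", "http://example.com/news.vxml")     # placeholder URL
    print(registry.resolve("2"))
```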

Technical resources include a VoiceXML reference, tutorials, how-to documents, examples, and a FAQ. There are also audio and grammar libraries. VoiceGenie Developer Workshop is available on the Web at http://developer.voicegenie.com. Web-based threaded discussions provide a collaborative environment for the developer community.

voxeo community
voxeo corporation's voxeo community (see Table 7) is a hosted VoiceXML platform that provides an online logger to assist in debugging a VoiceXML application. A URL mapping tool maps phone numbers and allows a developer to register unique phone numbers for multiple VoiceXML-based applications (see Figure 7).

Technical resources include a development guide, tutorials, notes, a FAQ, a Perl module (called vxml.pm) that facilitates generation of VoiceXML code through Perl, and examples to help jump-start your VoiceXML applications. Find them on the Web at http://community.voxeo.com/, where Usenet- and Web-based threaded discussions provide a collaborative environment for the developer community.
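Generating VoiceXML from a program rather than writing it by hand usually means building the element tree in code and serializing it. The Python sketch below shows the general idea with the standard library; it does not mirror the API of vxml.pm, which isn't described here, and the `build_menu` function and its arguments are hypothetical.

```python
import xml.etree.ElementTree as ET


def build_menu(prompt_text, choices):
    """Build a one-menu VoiceXML document and return it as a string.

    choices is a sequence of (spoken phrase, target URI) pairs.
    """
    vxml = ET.Element("vxml", version="1.0")
    menu = ET.SubElement(vxml, "menu")
    ET.SubElement(menu, "prompt").text = prompt_text
    for phrase, target in choices:
        # Each <choice> carries the phrase to recognize and where to go next.
        choice = ET.SubElement(menu, "choice", next=target)
        choice.text = phrase
    return ET.tostring(vxml, encoding="unicode")


if __name__ == "__main__":
    document = build_menu(
        "Say news or weather.",
        [("news", "#news"), ("weather", "#weather")],
    )
    print(document)
```

Because the library handles escaping and tag nesting, the generated document is well formed no matter what text the prompts contain.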

Conclusion
VoiceXML is an emerging standard. Although it has been around for only a short time, a number of tools have emerged to jump-start the development of next-generation interactive voice portals. Even though these tools are initial entries, given the enthusiasm and excitement that VoiceXML has created we can expect substantial improvements, new developments, and better compatibility.

As we've seen thus far, an important aspect of developing VoiceXML applications is designing and developing grammars, and the strength and flexibility of an interactive VoiceXML application depend on the richness of its grammars. In the next part of this series we'll focus on developing grammars and review the upcoming standards/specifications being developed around them.

Author Bio
Hitesh Seth is chief technology evangelist for SeraNova, a global e-business and mobile solutions consulting firm. He has extensive experience in the technologies associated with Internet application development. Hitesh received his bachelor's degree from the Indian Institute of Technology Kanpur (IITK), India.


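Figure 1: BeVocal Café File Management utility

Figure 2: WebSphere Studio 3.5

Figure 3: Mobile ADK VoiceXML Simulator

Figure 4: Creating a VoiceXML dialog in V-Builder

Figure 5: Tellme Studio DTMF Generator

Figure 6: VoiceGenie VoiceXML validator block view

Figure 7: voxeo community URL mapping tool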


All Rights Reserved
Copyright © 2004 SYS-CON Media
E-mail: info@sys-con.com