Monday, December 10, 2001
Lead Article

After seeing & hearing, try talking to Website
Munish Jauhar

IMAGINE dialling into your favourite portal and asking, "What's the current price for Share A?" The portal responds: "Share A at 11:45 a.m. is trading at Rs. 2,349, with a day high of Rs. 2,444 and a day low of Rs. 2,311." Or imagine checking movie timings and show listings, searching for specific items in your bank statement, and having your latest e-mail read to you as you drive to work. All this and much more will soon be possible using a new generation of user interfaces that let you access the Internet with your voice, the forerunner amongst them being VoiceXML.

VoiceXML is a Web-based markup language for representing human-computer dialogue. In this respect it is much like HTML (Hyper Text Markup Language, the language used to build Websites). But while HTML requires a graphical Web browser (with a display, a keyboard and a mouse), VoiceXML requires a voice browser with audio output (computer-generated and recorded) and audio input (voice and keypad tones). VoiceXML leverages the Internet for voice application development and delivery, greatly simplifying these difficult tasks while at the same time helping to create new opportunities. It builds upon the work of earlier technologies such as VoxML from Motorola and SpeechML from IBM to create a standardised way to interact with services through a voice interface. To sum up, VoiceXML is a standard based on XML (Extensible Markup Language) that allows Web applications and content to be accessed by phone and facilitates the development of speech-based telephony applications.
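
To make the comparison with HTML concrete, here is a minimal sketch of what a VoiceXML page for the share-price reply might look like. It is an illustration only: the form id `share_quote` and the exact markup are assumptions, not taken from any real portal, and the Python wrapper merely checks that the document is well-formed XML and pulls out the text a voice browser would speak.

```python
import xml.etree.ElementTree as ET
from typing import List

# Hypothetical VoiceXML 1.0 page. The prompt wording follows the article's
# share-price example; the form id is made up for illustration.
VXML_DOC = """
<?xml version="1.0"?>
<vxml version="1.0">
  <form id="share_quote">
    <block>
      <prompt>
        Share A at 11:45 a.m. is trading at 2,349 rupees,
        with a day high of 2,444 and a day low of 2,311.
      </prompt>
    </block>
  </form>
</vxml>
""".strip()  # the XML declaration must be the very first thing in the text


def extract_prompts(document: str) -> List[str]:
    """Return the text of every <prompt> element, with whitespace collapsed."""
    root = ET.fromstring(document)
    return [" ".join("".join(p.itertext()).split()) for p in root.iter("prompt")]


if __name__ == "__main__":
    for text in extract_prompts(VXML_DOC):
        print("Voice browser would say:", text)
```

Just as a graphical browser renders HTML tags on screen, a voice browser would turn the `<prompt>` text into speech; the Python code here only stands in for that browser.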


Perhaps the first questions that come to mind are: "Do we really need something like this?" or perhaps, "Are we ready for another new technology fad?" The answer to these questions is becoming increasingly obvious, as many members of the technology community have expressed their displeasure with textual wireless interfaces such as WAP (Wireless Application Protocol). Wireless communication devices have the disadvantage of small screens, limited input capabilities, and limited processing power (and WAP is far harder than voice to use while driving). These devices have obviously been huge successes as voice communication conduits. However, it remains to be seen how the public will accept them as data delivery vehicles. One alternative to the textual interface offered by technologies such as WAP is what was originally known as an IVR (Interactive Voice Response) system. Historically, these systems have been found to be unsuitable for allowing access to Web-based content. VoiceXML basically allows you to define a "tree" of voice dialogues that steps the user through a selection process. The user interacts with these voice dialogues through the oldest interface known to mankind: the voice! Powerful speech recognition software resides on the server to convert the user's spoken selection (e.g. "Yes" or "No") into a textual selection. This process is akin to selecting a hyperlink on a traditional Web page. Dialogue selections result in the playback of audio responses (either pre-recorded or dynamically generated using some sort of server-side text-to-speech conversion).
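
The dialogue "tree" described above can be sketched in a few lines of Python. This is a toy model under stated assumptions, not how any particular VoiceXML platform works: the `DialogueNode` class, the `recognise` function and the sample prompts are all hypothetical, and real speech recognition is replaced by simple text matching.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class DialogueNode:
    """One voice dialogue: a prompt plus the spoken choices that lead onward."""
    prompt: str
    choices: Dict[str, "DialogueNode"] = field(default_factory=dict)


def recognise(utterance: str, vocabulary: Dict[str, DialogueNode]) -> Optional[str]:
    """Stand-in for server-side speech recognition (assumption: it only
    normalises text, whereas a real recogniser decodes audio)."""
    word = utterance.strip().lower()
    return word if word in vocabulary else None


def run_dialogue(root: DialogueNode, utterances: List[str]) -> List[str]:
    """Walk the tree and return every prompt that would be played back."""
    played = [root.prompt]
    node = root
    for utterance in utterances:
        if not node.choices:  # leaf reached: nothing more to select
            break
        choice = recognise(utterance, node.choices)
        if choice is None:  # unrecognised input: re-prompt, stay on this node
            played.append("Sorry, I did not understand. " + node.prompt)
            continue
        node = node.choices[choice]  # akin to following a hyperlink
        played.append(node.prompt)
    return played


# Hypothetical two-level tree built around the article's "Yes"/"No" example.
TREE = DialogueNode(
    "Would you like the price of Share A? Say yes or no.",
    {
        "yes": DialogueNode("Share A is trading at 2,349 rupees."),
        "no": DialogueNode("Goodbye."),
    },
)

if __name__ == "__main__":
    for line in run_dialogue(TREE, ["Yes"]):
        print(line)
```

Each recognised word plays the role of a clicked hyperlink: it selects the next node, whose prompt is then played back to the caller.
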
From a business point of view, voice applications open up a host of new revenue opportunities. Perhaps the most obvious revenue opportunity comes from the increased number of minutes we all would be spending on our wireless phones. In addition, advertising will become as commonplace through these services as it currently is on traditional media (Web, TV, radio, etc.). As voice services are added to the traditional carrier plan, there will clearly be a market for pay-as-you-go premium services (information lookups, e-mail, contact databases, etc.). It's not hard to imagine most consumers opting to listen to a 15-second advertisement in exchange for free access to these premium services!

[Figure: VoiceXML architecture model]


Because VoiceXML is XML-based, it is yet another technology driving the move towards content distribution and management in XML. VoiceXML applications can be used in numerous ways and some examples are given below:

  • Voice portals: Just like Web portals, voice portals can be used to provide personalised services to access information like stock quotes, weather, restaurant listings, news, etc.

  • Location-based services: You can receive targeted information specific to the location you are dialling from. Applications work out that location from the telephone number you are calling from.

  • Voice alerts (such as for advertising): VoiceXML can be used to send targeted alerts to a user. The user would sign up to receive special alerts informing him of upcoming events.

  • Commerce: VoiceXML can be used to implement applications that allow users to purchase over the phone. Because voice gives you less information than graphics, specific products that don't need a lot of description (such as tickets, CDs, office supplies, etc.) work well.

  • Unified messaging applications: E-mail messages can be read over the phone, outgoing e-mail can be recorded (and in the future transcribed) over the phone, and voice-oriented address information can be synchronised with personal organisers and e-mail systems. Pager messages can be originated from the phone, or routed to the phone.

There are many other areas where voice services will be used, such as checking the status of bids at electronic auction sites, bill payment authorisation, charitable goods pickup scheduling, wake-up reminder services, and others we can't yet conceive of. And while all VoiceXML services will benefit visually impaired persons, it may be that other services will be specially crafted for this community.

This figure illustrates the components of the VoiceXML architecture model. The components include the following:

  • Document server: Processes requests received from the VoiceXML Interpreter and responds with VoiceXML documents.

  • VoiceXML Interpreter: Interprets the VoiceXML documents it receives from the document server.

  • Implementation platform: Controlled by the VoiceXML Interpreter context and the VoiceXML Interpreter, the implementation platform generates events in response to user actions (for example, spoken or character input received) and system events (for example, timer expiration). The VoiceXML Interpreter context and the VoiceXML Interpreter then handle these events.
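
The division of labour between these components can be sketched as follows. This is a simplified illustration, not a real VoiceXML implementation: all class and method names are hypothetical, the Interpreter context is folded into the interpreter for brevity, and the platform replays a scripted list of events instead of listening to a telephone line.

```python
from typing import Dict, List, Tuple


class DocumentServer:
    """Answers interpreter requests with VoiceXML documents."""

    def __init__(self, documents: Dict[str, str]) -> None:
        self._documents = documents

    def fetch(self, uri: str) -> str:
        return self._documents[uri]


class ImplementationPlatform:
    """Generates events from user actions (speech, keypad) and timers."""

    def __init__(self, scripted_events: List[Tuple[str, str]]) -> None:
        self._events = list(scripted_events)

    def next_event(self) -> Tuple[str, str]:
        # Once the script runs out, pretend the caller fell silent.
        return self._events.pop(0) if self._events else ("timeout", "")


class Interpreter:
    """Requests a document, then handles events raised by the platform."""

    def __init__(self, server: DocumentServer, platform: ImplementationPlatform) -> None:
        self.server = server
        self.platform = platform

    def run(self, uri: str) -> List[str]:
        self.server.fetch(uri)  # a real interpreter would parse this document
        log = [f"loaded {uri}"]
        while True:
            kind, value = self.platform.next_event()
            if kind == "speech":
                log.append(f"heard: {value}")
            elif kind == "dtmf":  # keypad tone
                log.append(f"key: {value}")
            else:  # timer expiration ends the session
                log.append("timeout: ending session")
                return log


if __name__ == "__main__":
    server = DocumentServer({"quotes.vxml": "<vxml version='1.0'/>"})
    platform = ImplementationPlatform([("speech", "share a"), ("dtmf", "1")])
    for entry in Interpreter(server, platform).run("quotes.vxml"):
        print(entry)
```

The point of the sketch is the direction of traffic: the interpreter pulls documents from the server, while the platform pushes events to the interpreter.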


Separate from the discussion of VoiceXML is a look at the benefits of voice processing technologies in general. Despite the advent of technologies such as WAP, the fact remains that accessing textual content over a small phone display is difficult and, in some applications, rather unnatural. Add any amount of data entry over the phone and it quickly becomes an impractical interface. Voice technologies, on the other hand, take advantage of the very interface that phones were designed to serve and will undoubtedly be accepted more readily by the general public. VoiceXML, specifically, is a well-structured, uniform way to build logic trees that customers can use to access the information of interest to them. Perhaps the biggest disadvantage of voice-based technologies, however, is the rigid structure that they impose on the end user. While a textual interface (e.g. WML) can support popular tools such as search engines and online browsing of catalogues or information, voice technologies are much better at delivering a specific, pinpointed bit of information to an end user (e.g. a stock quote, a movie time, a restaurant location, etc.). One interesting combination of the textual interface and the voice interface is the tool known as a "voice browser". A voice browser allows the user to "speak" links to quickly traverse textual content, which may be a great compromise, particularly in automobile or other hands-free applications.

Voice commands for Internet applications are in their infancy. VoiceXML makes it easy to create and deploy this type of technology. Along with seeing and hearing a Website, we can now talk to a Website. It won't be long before this technology is used worldwide. Voice-commerce is emerging, and soon we'll purchase stocks, inquire about flight schedules, check the weather in Mumbai or New York, send and receive e-mails, or hear the latest news, all via Internet voice commands.

[Figure: How VoiceXML works]