Perhaps the first questions that come to mind are: "Do we really
need something like this?" or perhaps, "Are we ready for
another new technology fad?" The answer to these questions is
becoming increasingly obvious, as many members of the technology
community have expressed their displeasure with textual wireless
interfaces such as WAP (Wireless Application Protocol). Wireless
communication devices suffer from small screens, limited input
capabilities, and limited processing power, and a textual interface
such as WAP is far harder than voice to use while driving. These devices
have obviously been huge successes as voice communication conduits.
However, it remains to be seen how the public will accept them as data
delivery vehicles. One
alternative to the textual interface offered by technologies such as WAP
is what was originally known as an IVR (Interactive Voice Response)
system. Historically, these systems have been found to be unsuitable for
allowing access to Web-based content. VoiceXML allows you to define a
"tree" of voice dialogues that steps the user through a selection
process. The user interacts with these voice dialogues through the
oldest interface known to mankind: the voice! Powerful speech
recognition software resides on the server to convert the user’s spoken
selection (e.g. "Yes" or "No") into a textual selection. This process is
akin to selecting a hyperlink on
a traditional Web page. Dialogue selections result in the playback of
audio response files (either pre-recorded or dynamically generated using
some sort of server-side text-to-speech conversion).
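The selection "tree" described above can be sketched in a few lines of Python. This is only an illustration of the idea, not VoiceXML itself: the `Dialogue` class, the `DIALOGUES` menu, and the `run_dialogue` function are hypothetical names, and a list of already-recognised words stands in for the server-side speech recogniser.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Dialogue:
    """One node of the dialogue tree: a prompt plus the choices it accepts."""
    prompt: str
    # Maps a recognised word (lower-case) to the name of the next dialogue.
    choices: Dict[str, str] = field(default_factory=dict)


# Hypothetical two-level menu; the names and prompts are illustrative only.
DIALOGUES: Dict[str, Dialogue] = {
    "main": Dialogue("Would you like weather or stocks?",
                     {"weather": "weather", "stocks": "stocks"}),
    "weather": Dialogue("It is sunny today. Goodbye."),
    "stocks": Dialogue("The market is up today. Goodbye."),
}


def run_dialogue(recognised_words: List[str], start: str = "main") -> List[str]:
    """Walk the tree and return the prompts that would be played back.

    `recognised_words` stands in for the text produced by the speech
    recogniser; no real audio is processed here.
    """
    played: List[str] = []
    current = start
    words = iter(recognised_words)
    while True:
        node = DIALOGUES[current]
        played.append(node.prompt)
        if not node.choices:            # leaf dialogue: nothing left to select
            return played
        word = next(words, None)
        if word is None:                # no more input, e.g. the caller hung up
            return played
        target = node.choices.get(word.strip().lower())
        if target is None:
            # Unknown selection: apologise, then the loop repeats the prompt.
            played.append("Sorry, I did not understand.")
        else:
            current = target            # like following a hyperlink
```

Calling `run_dialogue(["weather"])` plays the main prompt and then the weather prompt, much as selecting a hyperlink moves a Web browser from one page to the next.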
From a business point of view, voice applications open up a host of new
revenue opportunities. Perhaps the most obvious revenue opportunity
comes from the increased number of minutes we all would be spending on
our wireless phones. In addition, advertising will become as commonplace
through these services as it currently is on traditional media (Web, TV,
radio, etc.). As voice services are added to the traditional carrier
plan, there will clearly be a market for pay-as-you-go premium services
(information lookups, e-mail, contact databases, etc.). It’s not hard
to imagine most consumers opting to listen to a 15-second advertisement
in exchange for free access to these premium services!
Figure: VoiceXML architecture model
Because VoiceXML is XML-based,
it is yet another technology driving the move towards content
distribution and management in XML. VoiceXML applications can be used in
numerous ways and some examples are given below:
- Voice portals: Just like Web portals, voice portals can be used to
  provide personalised services for accessing information such as stock
  quotes, weather, restaurant listings, and news.
- Location-based services: You can receive targeted information specific
  to the location you are dialling from. Applications use the telephone
  number you are dialling from to determine that location.
- Voice alerts (such as for advertising): VoiceXML can be used to send
  targeted alerts to a user. The user would sign up to receive special
  alerts about upcoming events.
- Commerce: VoiceXML can be used to implement applications that allow
  users to make purchases over the phone. Because voice conveys less
  information than graphics, products that don’t need much description
  (such as tickets, CDs, and office supplies) work best.
- Unified messaging applications: E-mail messages can be read over the
  phone, outgoing e-mail can be recorded (and in the future transcribed)
  over the phone, and voice-oriented address information can be
  synchronised with personal organisers and e-mail systems. Pager
  messages can be originated from the phone, or routed to the phone.
There are many other areas where voice services will be used, such as
checking the status of bids at electronic auction sites, bill payment
authorisation, charitable goods pickup scheduling, wake-up reminder
services, and others we can’t yet conceive of. And while all VoiceXML
services will benefit visually impaired people, some services may be
specially crafted for this community.
This figure illustrates
the components of the VoiceXML architecture model. The components
include the following:
- Document server: Processes requests received from the VoiceXML
  Interpreter and responds with VoiceXML documents.
- VoiceXML Interpreter: Interprets the VoiceXML documents it receives
  from the document server.
- Implementation platform: Controlled by the VoiceXML Interpreter
  context and the VoiceXML Interpreter, the implementation platform
  generates events in response to user actions (for example, spoken or
  character input received) and system events (for example, timer
  expiration). The VoiceXML Interpreter context and the VoiceXML
  Interpreter then handle the events.
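The division of labour between these components can be sketched in Python. This is a rough illustration under stated assumptions rather than a real VoiceXML stack: documents are plain dictionaries instead of VoiceXML markup, the event names `"input"`, `"timeout"`, and `"hangup"` are hypothetical, and the `Platform` class simply replays a scripted list of events instead of listening to a caller.

```python
from typing import Dict, List, Tuple

# Stand-in documents. A real document server would return VoiceXML markup;
# the URIs and prompts here are made up for illustration.
DOCUMENTS: Dict[str, dict] = {
    "/menu": {"prompt": "Say news or weather.",
              "next": {"news": "/news", "weather": "/weather"}},
    "/news": {"prompt": "Here is the news.", "next": {}},
    "/weather": {"prompt": "Here is the weather.", "next": {}},
}

Event = Tuple[str, str]  # (kind, value), e.g. ("input", "news")


def document_server(uri: str) -> dict:
    """Process a request from the interpreter and respond with a document."""
    return DOCUMENTS[uri]


class Platform:
    """Implementation platform: plays prompts and generates events."""

    def __init__(self, scripted_events: List[Event]) -> None:
        self._events = list(scripted_events)
        self.spoken: List[str] = []      # everything "played" to the caller

    def play(self, text: str) -> None:
        self.spoken.append(text)

    def next_event(self) -> Event:
        # When the script runs out, pretend the caller hung up.
        return self._events.pop(0) if self._events else ("hangup", "")


def interpreter(platform: Platform, start_uri: str = "/menu") -> None:
    """Fetch documents, drive the platform, and handle its events."""
    uri = start_uri
    while True:
        doc = document_server(uri)       # request/response with the server
        platform.play(doc["prompt"])
        if not doc["next"]:              # no further choices: call is done
            return
        kind, value = platform.next_event()
        if kind == "input" and value in doc["next"]:
            uri = doc["next"][value]     # follow the selected transition
        elif kind == "timeout":
            platform.play("I did not hear anything.")
        elif kind == "hangup":
            return
        else:
            platform.play("Sorry, I did not understand.")
```

For example, `interpreter(Platform([("timeout", ""), ("input", "weather")]))` plays the menu, reports the timeout, repeats the menu, and then plays the weather document. The interpreter never touches audio directly; it only asks the platform to play prompts and reacts to the events the platform reports.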
Apart from VoiceXML specifically, voice processing technologies in
general offer clear benefits. Despite the advent of technologies such as
WAP, the fact remains that accessing textual content over a small phone
display is difficult and, in some applications, rather unnatural. Once
any amount of data entry is added, the phone display quickly becomes an
impractical interface. Voice technologies, on the other hand, take
advantage of the very interface that phones were designed to serve and
will undoubtedly be accepted more readily by the general public.
VoiceXML, specifically, is a well-structured, uniform way to build logic
trees that customers can use to access the information of interest to
them. On the other hand, perhaps the biggest disadvantage of
voice-based technologies is the rigid structure that they impose on the
end user. While a textual interface (e.g. WML) can support popular tools
such as search engines and online browsing of catalogues or information,
voice technologies are much better at delivering a specific, pinpointed
bit of information to an end user (e.g. a stock quote, a movie time, or
a restaurant location). One interesting combination of the textual
interface and the voice interface is the tool known as a "voice
browser". A voice browser allows the user to "speak" links to quickly
traverse textual content, which may be a great compromise, particularly
in automobile or other hands-free applications.
Voice commands for Internet applications are in their infancy. VoiceXML
makes it easy to create and deploy this type of technology. Along with
seeing and hearing a Website, we can now talk to a Website. It won’t be
long before this technology is used worldwide. Voice-commerce is
emerging, and soon we’ll purchase stocks, inquire about flight
schedules, check the weather in Mumbai or New York, send and receive
e-mails, or hear the latest news, all via Internet voice commands.
Figure: How VoiceXML works