Globalizing software
(creating software for multiple languages and locales),
and the follow-on process of localization, is challenging
enough for “normal” software products
and not-too-complex web sites. However, when it comes
to one of the “newest kids on the block,”
voice-enabled applications, the fun really begins.
There are only
a handful of voice technology providers who have
attempted to create globalized solutions, and Oracle
Corporation is one of them. Recently, LISA interviewed
Curtis Tuckey, Director, and Ashish Vora, Senior
Speech Applications Engineer, at Oracle’s
Voice Laboratory in Chicago in the U.S., to gain
insight into their vision for voice application
globalization. In installment one, the two men outline
Oracle’s voice applications strategy, as well
as the business and technical challenges that lie
ahead. In the second installment, to be published
in a future issue of the Globalization Insider,
they will:
- outline current trends in
voice applications standards;
- describe the very real challenges
presented by voice application globalization;
- and provide recommendations
for content creators and localization vendors
who are preparing to become preferred service
providers to voice applications developers.
If you would
like to meet Curtis Tuckey or Ashish Vora in person
to increase your knowledge of voice-enabled applications,
plan to attend their presentations at the LISA FORUM
EUROPE: “Managing
Content - Moving Markets: Streamlining Global Workflow
Through Content Management,” to be held
in London from June 30 to July 3, 2003.
Please define the voice
applications space for our readers.
Most broadly defined,
the voice applications space deals with any type
of application that incorporates a speech-based
interface for input and output of application data.
Traditionally, most speech application development
(in industry parlance, these were interactive voice
response or IVR applications) required proprietary
hardware and software solutions. There was often
little to no interoperability between different
speech technology infrastructure vendors. As a result,
speech application deployments were extremely capital-intensive,
as well as somewhat monolithic in their structure.
The emergence of the
Internet, as well as several efforts to create standards
within the speech technology industry, has significantly
lowered the barriers to entry for speech application
developers. Indeed, the biggest paradigm shift occurred
when the speech technology industry embraced the
Internet model of software development, creating
a markup language called VoiceXML to speed development
of voice-enabled applications. VoiceXML is intended
to replace many of the proprietary software environments
that were originally used to write speech applications.
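To make the "Internet model" concrete, here is a minimal sketch of how an application server might generate a VoiceXML page in the same way it would generate HTML. It is an illustration only, not Oracle's implementation: the function name `build_greeting_page`, the form id, and the prompt text are hypothetical, and the page follows the basic VoiceXML 2.0 document structure (`vxml`, `form`, `block`, `prompt`).

```python
import xml.etree.ElementTree as ET


def build_greeting_page(prompt_text: str, lang: str = "en-US") -> str:
    """Return a minimal VoiceXML document that speaks one prompt.

    Hypothetical helper for illustration; a real application would add
    grammars, fields, and error handling.
    """
    # Root element with the VoiceXML namespace and the prompt language.
    vxml = ET.Element(
        "vxml",
        {
            "version": "2.0",
            "xmlns": "http://www.w3.org/2001/vxml",
            "xml:lang": lang,
        },
    )
    # A form containing a single block that plays the prompt via TTS.
    form = ET.SubElement(vxml, "form", {"id": "greeting"})
    block = ET.SubElement(form, "block")
    prompt = ET.SubElement(block, "prompt")
    prompt.text = prompt_text
    return ET.tostring(vxml, encoding="unicode")


if __name__ == "__main__":
    print(build_greeting_page("Welcome to the directory service."))
```

A voice gateway would fetch such a page over HTTP, much as a browser fetches HTML, and hand the prompt text to its TTS engine.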
Leveraging the Internet
model has proved to have many benefits - foremost
among these is the opportunity to make use of components
and technologies that have already been established
as cornerstones of the visual Internet. Writing
voice applications with complex connectivity to
various backend data stores is significantly simplified
by being able to use open connectivity standards
such as ODBC/JDBC or LDAP. In the past, any speech
development requiring access to these kinds of data
stores would have required a custom integration
effort.
Similarly, the Internet
helped foster new mechanisms of application delivery
- namely the invention of web servers, and their
more powerful siblings, application servers. This
innovation has also been crucial to the growth of
the new generation of voice applications. By relying
on application servers to handle many of the details
of application load balancing, caching, authentication,
access, etc., speech technology vendors can focus
their efforts on the core speech recognition and
synthesis components.
There are several ways
to define the roles in this space. In the most generic
sense, there are four main actors that interact
with the voice application development and delivery
processes. At the lowest level are (1) the
core voice technology vendors, who build the
actual Automated Speech Recognition (ASR) and Text-to-Speech
Synthesis (TTS) software packages. These core components
are utilized by the second actor, (2) the speech
platform provider, or “voice gateway”
provider, who integrates the ASR and TTS components
with additional pieces of hardware and software,
such as a Computer Telephony Interface (CTI) and
a VoiceXML interpreter. The third actor in the speech
technology world is (3) the application platform
provider, who provides the application server
from which to deliver applications. The final actor
is (4) the application provider, who
writes the VoiceXML applications deployed through the
application server.
There are some different
deployment scenarios for voice applications that
correspond to a few additional roles in this space.
Specifically, within the role of the fourth actor
(application provider), there are actually three
sub-categories:
1. The Pre-Packaged
Application Space
This space represents
the so-called “out-of-the-box” voice
applications that can be deployed through an application
server with minimal configuration work.
2. Hosted Application
Space
The key difference
between the Hosted and the Pre-Packaged spaces is
that hosted application deployment runs on an ASP
model, and therefore, the entire infrastructure
for the voice application (hardware, software, etc.)
is managed by the hosting provider.
3. Custom Applications
Space
When customers need
a custom application solution that cannot be provided
by either the Pre-Packaged or Hosted Application
spaces, they can work with custom applications developers
to provide end-to-end voice solutions. Traditionally,
these custom solutions have been offered by IVR
vendors, but have the negative consequence of often
creating vendor lock-in. As VoiceXML becomes more
and more widely adopted, we expect this Custom Applications
Space to become more cost-effective and more accessible
to consultants not traditionally associated with
voice application development.
In
a nutshell, what is Oracle’s vision for the
market for globalized voice-enabled applications?
The market for Oracle
voice applications is exactly the same as the market
for all other Oracle applications. We are a global
company with offices in over 150 countries, and
versions of our products ship in 30 languages. Voice-enabled
applications in this respect are no different from
any other type of application.
Why
has Oracle made such a strong commitment to this
space?
As far as Oracle
is concerned, voice is simply another modality by
which to access data or content that is stored in
a database. There are all sorts of information that
people need access to while they’re on the
go. The bottom line is that there is content in
a database, and we want to be able to deliver it
to any type of device - be it a desktop web browser,
a PDA, a WAP-enabled cell phone, or even a landline
phone that only allows voice. It simply doesn’t
make sense to have one application server for your
web pages, another application delivery platform
for PDAs and WAP phones, and a third platform for
voice access. So, we designed a single platform
that can deliver applications to all of these devices.
We see voice as a critical
part of this offering because many of the wireless
devices such as WAP phones, and to a lesser extent
PDAs, suffer from serious shortcomings when it comes
to ease of input and overall user experience. Voice
is often a much more natural medium for applications
because it can allow for simplification of user
interactions which require many screentaps or button
presses to complete on a visual device. Additionally,
voice interfaces to applications allow for hands-free
operation, which is essential for many types of
applications - for example, in-vehicle applications
requiring a hands-free mode to conform to safety
laws. Finally, voice applications require little
or no network connectivity or processing power on
the end user’s side. Indeed, since telephony networks (both
wired and wireless) are nearly ubiquitous worldwide,
it becomes possible to access Internet content
even from places where no Internet connectivity
is available.
What
are the biggest opportunities in today’s voice
applications market?
There are a tremendous
number of exciting opportunities in the voice applications
market today:
- Call Center Automation (Self-service
Call Centers)
- Enterprise Mobility Applications
(voice access to Personal Information Management,
or PIM)
- Voice-enabled messaging (Voicemail,
Email, SMS, IM)
- Voice dialing (Address Book +
Corporate Directory + White Pages + Yellow Pages)
- Collaborative Software
- Pre-packaged Voice Applications
What
are the most lucrative markets for small- to medium-sized
voice applications developers?
There are four main
markets in which we are seeing traction for voice
applications. First is the horizontal Business
Enterprise market - medium to large organizations
with a need to voice enable access to corporate
information like Personal Information Management
(PIM) data such as calendars, email, etc. This market
also includes applications such as CRM and ERP products.
The second major market
is the horizontal Telecommunications Carrier
market. Telco carriers use voice-enabled applications
to differentiate their services from one another
and to help attract and retain customers. Examples
of successful voice applications deployed in this
market include voice-enabled customer care applications
(a form of Call Center Automation), as well as voice
portals.
Third, we see great
potential for growth in voice application adoption
in the vertical Public Sector - specifically,
government, education and health. Potential applications
here include access to PIM data, voice notifications
for convenience, health, safety, and homeland security,
as well as specialized applications that make use
of biometrics such as voice authentication for security
purposes.
Finally, there is the
Consumer Market for voice applications -
this market includes services such as voice access
to news, weather, horoscopes, movie times, etc.
Typically, these applications are bundled together
as part of a consumer-oriented voice portal. Numerous
companies have offerings in this space, including
HeyAnita and BeVocal.
Of these four spaces,
Oracle is focusing its major efforts on creating
solutions for the first three markets.
Do
these opportunities vary in any significant way
by geographic region (the Americas vs. EMEA vs.
China vs. the Rest of Asia/Pacific) or by vertical
market?
In general, the opportunity
for voice access to enterprise data and PIM, and
self-service call centers is the same in the Americas,
EMEA and APAC due to the increased productivity
and fast ROI that voice applications offer. Certainly
there are some differences in the types of applications
that may make sense for a particular region, e.g.,
EMEA and APAC currently do much of their messaging
via SMS so requirements for voice-enabled messaging
in those regions may be quite different from the
requirements in North America.
What
are the top three language markets for localized
voice apps? Is it different from the GUI market?
The language markets
for localized voice applications are greatly influenced
by the availability of the core speech technology
components (ASR and TTS) in the target languages.
Hence, we see the top language markets currently
as:
- U.S. English
- Latin American Spanish
- Brazilian Portuguese
This is likely to be
somewhat different from the GUI market for two reasons.
First, as already mentioned, the lack of core speech
technology components in a particular language makes
deployment to certain languages impossible. GUI
applications simply do not have this problem to
deal with. Second, we expect to see more adoption
of voice and wireless applications in countries
with poor wired infrastructure, since it is these
areas that serve to benefit the most from voice-based
access to information.
What
business advice would you give to voice application
providers as they begin to design a new voice-enabled
product?
The most important
piece of advice is to start early! Globalization
activities really have to be taken into account
from day one of the design process, before the first
line of code is ever written. Ideally, the development
process should allot enough time not just for the
design phase, but also for a feedback phase on these
designs that allows for focus groups to interact
and provide comments.
It is very important
to have access to resources knowledgeable in the
target languages to steer designs away from concepts
that are difficult to internationalize. As a concrete
example of this, we sometimes make use of a spelling
interface that allows users to enter their input
one character at a time - this works fine for English,
but for other languages (particularly the Asiatic
languages), we will probably not be able to implement
this type of interface and are therefore trying
to minimize its usage.
The rest of our advice
for voice application providers is actually quite
simple, and in some ways can be applied to any type
of development effort. However, we really believe
that these guidelines are especially critical to
keep in mind for voice application development:
- Make sure there is a market for
the product being designed.
- Ensure that the requirements
for the product are being adequately captured
and communicated back to the development team.
- Accept that voice application
development is more complicated than screen-based
development. Allocate time, as necessary, to adjust
to the learning curve associated with these applications.
What
are the biggest business hurdles for voice application
developers today?
The main business
hurdles come from managing people’s perceptions
and expectations about voice. There is a perception
that voice applications are hard to deploy and prohibitively
expensive.
In fact, voice applications
are not substantially harder to deploy than any
other type of application and the expense of voice
applications actually becomes significantly less
when their true ROI is considered.
The final business
hurdle overlaps with the world of design and development,
and that is that designing a usable and enjoyable
voice application is very difficult. It is much
easier to design a bad user interface to an application
than to put the effort and time into designing a
voice interface well. From a business perspective,
it’s therefore necessary to be able to manage
customer expectations when it comes to application
functionality and delivery schedules.
What
was the effort associated with developing Oracle’s
Voice Globalization Framework? How much time is
this framework expected to save developers?
The development of
the Voice Globalization Framework was able to build
upon much of the groundwork that had already been
done within Oracle to create support for globalization.
Specifically, Oracle has several groups that have
been defining guidelines and processes for globalization,
including a group called Server Globalization Technologies
(SGT) that provides internationalization support
for core technology components and the Worldwide
Product Translation Group (WPTG), Oracle’s
localization group. Because much of the infrastructure
was already in place, development of the Voice Globalization
Framework was simplified somewhat, and the components
for voice support were added over the course of
about twelve months with several engineers involved
in design and implementation.
This investment is
considered worthwhile, however, because of the time
and money we expect to save creating globalized
voice applications. Essentially, even without the
framework in place, it would be possible to create
globalized applications in multiple languages, but
it would require close to a complete rewrite of
each application for each language. This becomes
cost-prohibitive fairly quickly. Our Voice Globalization
Framework makes it possible to design and develop
a voice application in one language and deliver it
simultaneously in thirty other languages with
marginal extra effort per language.
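One common way to get this kind of per-language saving is to keep the dialog logic in one place and move every spoken prompt into a per-locale catalog. The sketch below illustrates that general idea only; it is not the Voice Globalization Framework itself, and the names `PROMPTS`, `get_prompt`, and `run_dialog`, along with the sample locales and prompts, are all hypothetical.

```python
# Per-locale prompt catalogs (hypothetical sample data). Adding a language
# means adding one catalog, not rewriting the dialog logic below.
PROMPTS = {
    "en-US": {
        "welcome": "Hello, {name}.",
        "ask_date": "For which date?",
    },
    "es-MX": {
        "welcome": "Hola, {name}.",
        "ask_date": "¿Para qué fecha?",
    },
}
DEFAULT_LOCALE = "en-US"


def get_prompt(key: str, locale: str, **values: str) -> str:
    """Look up a prompt for the locale, falling back to the default locale."""
    catalog = PROMPTS.get(locale, PROMPTS[DEFAULT_LOCALE])
    template = catalog.get(key, PROMPTS[DEFAULT_LOCALE][key])
    return template.format(**values)


def run_dialog(locale: str, name: str) -> list[str]:
    """Language-independent dialog flow: the same steps for every locale."""
    return [
        get_prompt("welcome", locale, name=name),
        get_prompt("ask_date", locale),
    ]


if __name__ == "__main__":
    for line in run_dialog("es-MX", "Ana"):
        print(line)
```

Real voice applications also need locale-specific recognition grammars and TTS voices, which this sketch leaves out.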
To express it in concrete
terms, writing a voice application takes x resources
(time, money, developers) and having to rewrite
an application 31 times would take 31x resources.
On the other hand, if we have a framework that allows
our extra effort per language to be about 10% of
the original application development resources (this
is a conservative estimate; in practice we think
we may see only 5% or less of original development
resources per language), then our total outlay is
one application with full x resources plus thirty
applications, each of which costs 0.1x. Our total
effort then for delivering 31 language versions
of an application is 4x rather than 31x, a reduction
of roughly 87% in overall development effort!
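The arithmetic can be checked with a short calculation. This is a sketch using the figures quoted in the answer (one full build, thirty additional languages, 10% or 5% extra effort per language); the function names are illustrative, and effort is measured in units of x.

```python
from fractions import Fraction


def total_effort(extra_languages: int, per_language_fraction: Fraction) -> Fraction:
    """Effort in units of x: one full build plus a fraction per extra language."""
    return 1 + extra_languages * per_language_fraction


def reduction_vs_rewrite(extra_languages: int, per_language_fraction: Fraction) -> Fraction:
    """Fractional saving compared with rewriting the application for every language."""
    rewrite_effort = extra_languages + 1  # one full rewrite per language version
    framework_effort = total_effort(extra_languages, per_language_fraction)
    return 1 - framework_effort / rewrite_effort


if __name__ == "__main__":
    for fraction in (Fraction(1, 10), Fraction(1, 20)):
        effort = total_effort(30, fraction)
        saving = reduction_vs_rewrite(30, fraction)
        print(f"{float(fraction):.0%} per language: "
              f"{float(effort)}x total, {float(saving):.1%} saved")
```

At 10% per language the total is 4x against 31x for full rewrites, a saving of 27/31 or about 87%; at 5% the total drops to 2.5x, a saving of about 92%.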
What
are the main business and technical reasons why
voice application providers should adopt Oracle’s
platform?
Before asking this
question, voice application providers need to ask
and understand the answers to the following questions.
- Why should a voice application
provider use an Internet-based application server
versus some other kind of delivery mechanism?
- Why should a voice application
provider use a Java-based J2EE application server?
The answers to these
preliminary questions are the biggest reason to
adopt the Oracle platform. An Internet-based application
server brings standards into a space where proprietary
hardware and software have long reigned. By leveraging
the Internet model of software development, application
developers can streamline development and maintenance
by relying on third-party components, instead of
building their own custom data connectors and adapters.
Using a J2EE application
server is just an extension of this story - Java
has a huge development community behind it, with
tremendous amounts of pre-defined functionality
as a core component of the language. Indeed, from
a globalization perspective, Java truly shines as
it has been designed from the ground up to support
globalization.
So, given these answers
to the preliminary questions, we can return to the
original question about why to choose Oracle’s
platform over any others. The answer to this is
two-fold. First, Oracle ships more pre-packaged
voice applications than any other company. With
our globalization support, we consequently ship
more applications in more languages than anyone.
This is a very powerful set of functionality to
be included as part of the application server offering,
and these pre-packaged applications can be incorporated
and modified by the end client. Our CEO, Larry Ellison,
has mandated that all Oracle applications offer
voice-enabled interfaces, so our entire applications
division is leveraging the wireless and voice capabilities
of the application server to create voice-enabled
versions of our CRM and ERP applications.
Secondly, for application
developers who plan on writing their own applications,
the Oracle platform provides the most complete application
delivery platform with the most development effort
invested in voice gateway platform interoperability.
Other application server platforms typically only
work with a couple of voice gateway platforms -
Oracle is committed to being completely platform-agnostic
and working with all VoiceXML gateways that go through
our acceptance process. And from the standpoint
of globalization, Oracle includes all elements of
the Voice Globalization Framework for use by third-party
developers so that they can realize the same efficiency
gains that Oracle achieves internally when developing
a global software offering.
How
did the Voice Laboratory at Oracle come about?
Voice technology
support in Oracle9iAS Wireless is the result
of Oracle’s Voice Laboratory, which was formed
in June 2000 as part of Oracle’s Wireless
and Voice Division’s mandate to provide leadership
in speech technology, application development tools
and speech applications.
In speech technology,
the Voice Laboratory is Oracle’s resource
for the evaluation and certification of voice gateways
and other technology components that are used for
internal corporate applications, hosted applications
and for customers of Oracle9iAS. In application
development tools, the Voice Laboratory provides
service markup languages and translation tools for
the development of speech-controlled services. These
tools have been developed to aid in the broader
context of developing mobile services for all types
of wireless devices as part of Oracle9iAS.
Finally, the Voice Laboratory is creating a host
of reference standard speech services for business
applications.
Curtis
Tuckey is Director of the Voice
Laboratory at Oracle Corporation. Before joining
Oracle, he held various research and development
positions at Motorola, Lucent Technologies, AT&T
and General Motors. He holds a Ph.D. in mathematics
from the University of Wisconsin and can be reached
at curtis.tuckey@oracle.com.
Ashish
Vora is a Senior Speech Applications
Engineer in the Voice Laboratory at Oracle Corporation.
He has developed a set of voice applications that
ship with Oracle9i Application Server Wireless &
Voice, co-authored an integration and acceptance
process for voice gateway vendors and created an
architecture to simplify the globalization of voice
applications. He holds a B.S. degree in Computer
Science from Stanford University and can be reached
at ashish.vora@oracle.com.
Reprinted
by permission from the Globalization Insider,
4 June 2003, Volume XII, Issue 2.5.
Copyright
the Localization Industry Standards Association
(Globalization Insider: www.localization.org,
LISA: www.lisa.org)
and S.M.P. Marketing Sarl (SMP) 2004