Globalization of Voice Applications: It’s Only the Beginning!

Globalizing software (creating software for multiple languages and locales), and the follow-on process of localization, is challenging enough for “normal” software products and not-too-complex web sites. However, when it comes to one of the “newest kids on the block,” voice-enabled applications, the fun really begins.

Only a handful of voice technology providers have attempted to create globalized solutions, and Oracle Corporation is one of them. Recently, LISA interviewed Curtis Tuckey, Director, and Ashish Vora, Senior Speech Applications Engineer, at Oracle’s Voice Laboratory in Chicago, U.S., to gain insight into their vision for voice application globalization. In this first installment, the two men outline Oracle’s voice applications strategy, as well as the business and technical challenges that lie ahead. In the second installment, to be published in a future issue of the Globalization Insider, they will:

outline current trends in voice applications standards;
describe the very real challenges presented by voice application globalization;
and provide recommendations for content creators and localization vendors who are preparing to become preferred service providers to voice applications developers.

If you would like to meet Curtis Tuckey or Ashish Vora in person to increase your knowledge of voice-enabled applications, plan to attend their presentations at the LISA FORUM EUROPE: “Managing Content - Moving Markets: Streamlining Global Workflow Through Content Management,” to be held in London from June 30 to July 3, 2003.

Please define the voice applications space for our readers.

Most broadly defined, the voice applications space deals with any type of application that incorporates a speech-based interface for input and output of application data. Traditionally, most speech application development (in industry parlance, these were interactive voice response or IVR applications) required proprietary hardware and software solutions. There was often little to no interoperability between different speech technology infrastructure vendors. As a result, speech application deployments were extremely capital-intensive, as well as somewhat monolithic in their structure.

The emergence of the Internet, as well as several efforts to create standards within the speech technology industry, has significantly lowered the barriers to entry for speech application developers. Indeed, the biggest paradigm shift occurred when the speech technology industry embraced the Internet model of software development, creating a markup language called VoiceXML to speed development of voice-enabled applications. VoiceXML is intended to replace many of the proprietary software environments that were originally used to write speech applications.
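To give a sense of what such markup looks like, the following is a minimal sketch in Python that emits a small VoiceXML 2.0 document containing a single spoken prompt. The function name build_greeting and the form id are illustrative assumptions, not part of any Oracle product; only the VoiceXML element names (vxml, form, block, prompt, exit) and the namespace come from the W3C specification.

```python
from xml.sax.saxutils import escape, quoteattr

VXML_NS = "http://www.w3.org/2001/vxml"  # VoiceXML 2.0 namespace


def build_greeting(text, lang="en-US"):
    """Return a minimal VoiceXML 2.0 document that speaks `text`, then exits.

    `build_greeting` and the form id "greeting" are hypothetical names
    chosen for this sketch.
    """
    return (
        '<?xml version="1.0" encoding="UTF-8"?>\n'
        f'<vxml version="2.0" xmlns="{VXML_NS}" xml:lang={quoteattr(lang)}>\n'
        '  <form id="greeting">\n'
        '    <block>\n'
        # Escape the prompt text so characters such as & stay well-formed XML.
        f'      <prompt>{escape(text)}</prompt>\n'
        '      <exit/>\n'
        '    </block>\n'
        '  </form>\n'
        '</vxml>\n'
    )


print(build_greeting("Welcome to the sales & support line."))
```

A web or application server could return a document like this over HTTP, and a voice gateway's VoiceXML interpreter would fetch it and render the prompt through its TTS engine, much as a browser fetches and renders HTML.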

Leveraging the Internet model has proved to have many benefits - foremost among these is the opportunity to make use of components and technologies that have already been established as cornerstones of the visual Internet. Writing voice applications with complex connectivity to various backend data stores is significantly simplified by being able to use open connectivity standards such as ODBC/JDBC or LDAP. In the past, any speech development requiring access to these kinds of data stores would have required a custom integration effort.

Similarly, the Internet helped foster new mechanisms of application delivery - namely the invention of web servers, and their more powerful siblings, application servers. This innovation has also been crucial to the growth of the new generation of voice applications. By relying on application servers to handle many of the details of application load balancing, caching, authentication, access, etc., speech technology vendors can focus their efforts on the core speech recognition and synthesis components.

There are several ways to define the roles in this space. In the most generic sense, four main actors interact with the voice application development and delivery processes. At the lowest level are (1) the core voice technology vendors, who build the actual Automated Speech Recognition (ASR) and Text-to-Speech Synthesis (TTS) software packages. These core components are used by the second actor, (2) the speech platform provider, or “voice gateway” provider, who integrates the ASR and TTS components with additional pieces of hardware and software, such as a Computer Telephony Interface (CTI) and a VoiceXML interpreter. The third actor is (3) the application platform provider, who provides the application server from which applications are delivered. The final actor is (4) the application provider, who writes the VoiceXML applications deployed through the application server.

There are some different deployment scenarios for voice applications that correspond to a few additional roles in this space. Specifically, within the role of the fourth actor (application provider), there are actually three sub-categories:

Pre-Packaged Application Space

This space represents the so-called “out-of-the-box” voice applications that can be deployed through an application server with minimal configuration work.

Hosted Application Space

The key difference between the Hosted and the Pre-Packaged spaces is that hosted application deployment runs on an ASP model, and therefore, the entire infrastructure for the voice application (hardware, software, etc.) is managed by the hosting provider.

Custom Applications Space

When customers need a custom application solution that cannot be provided by either the Pre-Packaged or Hosted Application spaces, they can work with custom applications developers to provide end-to-end voice solutions. Traditionally, these custom solutions have been offered by IVR vendors, but have the negative consequence of often creating vendor lock-in. As VoiceXML becomes more and more widely adopted, we expect this Custom Applications Space to become more cost-effective and more accessible to consultants not traditionally associated with voice application development.

In a nutshell, what is Oracle’s vision for the market for globalized voice-enabled applications?

The market for Oracle voice applications is exactly the same as the market for all other Oracle applications. We are a global company with offices in over 150 countries, and versions of our products ship in 30 languages. Voice-enabled applications in this respect are no different from any other type of application.

Why has Oracle made such a strong commitment to this space?

As far as Oracle is concerned, voice is simply another modality by which to access data or content that is stored in a database. There is all sorts of information that people need access to while they’re on the go. The bottom line is that there is content in a database, and we want to be able to deliver it to any type of device - be it a desktop web browser, a PDA, a WAP-enabled cell phone, or even a landline phone that only allows voice. It simply doesn’t make sense to have one application server for your web pages, another application delivery platform for PDAs and WAP phones, and a third platform for voice access. So, we designed a single platform that can deliver applications to all of these devices.
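The single-platform idea can be illustrated with a minimal sketch in Python: one piece of content, several device-specific renderers. This is not Oracle's implementation; the function names, the device keys ("web", "wap", "voice") and the sample content are all assumptions made for illustration.

```python
from html import escape


def render_html(title, items):
    """Render the content as an HTML fragment for a desktop browser."""
    rows = "".join(f"<li>{escape(item)}</li>" for item in items)
    return f"<h1>{escape(title)}</h1><ul>{rows}</ul>"


def render_wml(title, items):
    """Render the content as a WML card for a WAP phone."""
    rows = "".join(f"<p>{escape(item)}</p>" for item in items)
    return f'<wml><card title="{escape(title)}">{rows}</card></wml>'


def render_vxml(title, items):
    """Render the content as VoiceXML prompts for a voice gateway."""
    prompts = "".join(
        f"<prompt>{escape(text)}</prompt>" for text in [title, *items]
    )
    return (
        '<vxml version="2.0" xmlns="http://www.w3.org/2001/vxml">'
        f"<form><block>{prompts}</block></form></vxml>"
    )


# Hypothetical device keys; a real platform would detect the device type.
RENDERERS = {"web": render_html, "wap": render_wml, "voice": render_vxml}


def deliver(device, title, items):
    """Pick the renderer for the requesting device and apply it."""
    try:
        renderer = RENDERERS[device]
    except KeyError:
        raise ValueError(f"unsupported device type: {device}") from None
    return renderer(title, items)


agenda = ["Staff meeting at 10", "Lunch with R&D"]
for device in RENDERERS:
    print(device, "->", deliver(device, "Today", agenda))
```

The content is retrieved once and the application logic is shared; only the final presentation step differs per device.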

We see voice as a critical part of this offering because many wireless devices, such as WAP phones and, to a lesser extent, PDAs, suffer from serious shortcomings when it comes to ease of input and overall user experience. Voice is often a much more natural medium for applications because it can simplify user interactions that require many screen taps or button presses on a visual device. Additionally, voice interfaces allow for hands-free operation, which is essential for many types of applications - for example, in-vehicle applications that must offer a hands-free mode to conform to safety laws. Finally, voice applications require little or no data network connectivity or processing power on the end user’s side. Indeed, since telephony networks (both wired and wireless) are nearly ubiquitous worldwide, it becomes possible to access Internet content even from places where no Internet connectivity is available.

What are the biggest opportunities in today’s voice applications market?

There are a tremendous number of exciting opportunities in the voice applications market today:

Call Center Automation (Self-service Call Centers)
Enterprise Mobility Applications (voice access to Personal Information Management, or PIM)
Voice-enabled messaging (Voicemail, Email, SMS, IM)
Voice dialing (Address Book + Corporate Directory + White Pages + Yellow Pages)
Collaborative Software
Pre-packaged Voice Applications

What are the most lucrative markets for small- to medium-sized voice applications developers?

There are four main markets in which we are seeing traction for voice applications. First is the horizontal Business Enterprise market - medium to large organizations that need to voice-enable access to corporate information, including Personal Information Management (PIM) data such as calendars and email. This market also includes applications such as CRM and ERP products.

The second major market is the horizontal Telecommunications Carrier market. Telco carriers use voice-enabled applications to differentiate their services from one another and to help attract and retain customers. Examples of successful voice applications deployed in this market include voice-enabled customer care applications (a form of Call Center Automation), as well as voice portals.

Third, we see great potential for growth in voice application adoption in the vertical Public Sector - specifically, government, education and health. Potential applications here include access to PIM data, voice notifications for convenience, health, safety, and homeland security, as well as specialized applications that make use of biometrics such as voice authentication for security purposes.

Finally, there is the Consumer Market for voice applications - this market includes services such as voice access to news, weather, horoscopes, movie times, etc. Typically, these applications are bundled together as part of a consumer-oriented voice portal. Numerous companies have offerings in this space, including HeyAnita and BeVocal.

Of these four spaces, Oracle is focusing its major efforts on creating solutions for the first three markets.

Do these opportunities vary in any significant way by geographic region (the Americas vs. EMEA vs. China vs. the Rest of Asia/Pacific) or by vertical market?

In general, the opportunity for voice access to enterprise data and PIM, and for self-service call centers, is the same in the Americas, EMEA and APAC because of the increased productivity and fast ROI that voice applications offer. Certainly there are some differences in the types of applications that may make sense for a particular region. For example, EMEA and APAC currently do much of their messaging via SMS, so requirements for voice-enabled messaging in those regions may be quite different from the requirements in North America.

What are the top three language markets for localized voice apps? Is it different from the GUI market?

The language markets for localized voice applications are greatly influenced by the availability of the core speech technology components (ASR and TTS) in the target languages. Hence, we see the top language markets currently as:

U.S. English
Latin American Spanish
Brazilian Portuguese

This is likely to be somewhat different from the GUI market for two reasons. First, as already mentioned, the lack of core speech technology components in a particular language makes deployment to certain languages impossible. GUI applications simply do not have this problem to deal with. Second, we expect to see more adoption of voice and wireless applications in countries with poor wired infrastructure, since it is these areas that serve to benefit the most from voice-based access to information.

What business advice would you give to voice application providers as they begin to design a new voice-enabled product?

The most important piece of advice is to start early! Globalization activities really have to be taken into account from day one of the design process, before the first line of code is ever written. Ideally, the development process should allot enough time not just for the design phase, but also for a feedback phase on these designs that allows for focus groups to interact and provide comments.

It is very important to have access to resources knowledgeable in the target languages to steer designs away from concepts that are difficult to internationalize. As a concrete example, we sometimes make use of a spelling interface that allows users to enter their input one character at a time. This works fine for English, but for other languages (particularly the Asian languages) we will probably not be able to implement this type of interface, and we are therefore trying to minimize its usage.
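One way to keep such language-dependent design decisions out of the application logic is to look them up per locale. The following is a minimal sketch in Python under stated assumptions: the tables, the locale entries and their values, and the function names are all hypothetical and only illustrate the idea; they do not describe Oracle's framework or the actual capabilities of any speech engine.

```python
# Hypothetical table: does the deployment offer spelling input for a language?
SPELLING_INPUT = {"en": True, "es": True, "ja": False}

# Hypothetical prompt catalog, keyed by language and then by prompt id.
PROMPTS = {
    "en": {
        "ask_name": "Please say your last name.",
        "ask_name_spell": "Please say or spell your last name.",
    },
    "es": {
        "ask_name": "Por favor, diga su apellido.",
        "ask_name_spell": "Por favor, diga o deletree su apellido.",
    },
}


def fallback_chain(locale):
    """Return locale tags from most to least specific: es-MX, then es."""
    parts = locale.split("-")
    return ["-".join(parts[:n]) for n in range(len(parts), 0, -1)]


def resolve(table, locale, default):
    """Return the first table entry found along the fallback chain."""
    for tag in fallback_chain(locale):
        if tag in table:
            return table[tag]
    return default


def name_prompt(locale):
    """Choose the name prompt, offering spelling only where it is enabled."""
    # Unknown locales default to no spelling input, the conservative choice.
    can_spell = resolve(SPELLING_INPUT, locale, default=False)
    key = "ask_name_spell" if can_spell else "ask_name"
    # A real catalog would hold a translation for every shipped locale;
    # this sketch falls back to English when one is missing.
    prompts = resolve(PROMPTS, locale, default=PROMPTS["en"])
    return prompts.get(key, PROMPTS["en"][key])


for locale in ("en-US", "es-MX", "ja-JP"):
    print(locale, "->", name_prompt(locale))
```

With this arrangement the dialog code asks only for the name prompt, and whether spelling is offered is decided by data that a localization team can review for each language.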

The rest of our advice for voice application providers is actually quite simple, and in some ways can be applied to any type of development effort. However, we really believe that these guidelines are especially critical to keep in mind for voice application development:

Make sure there is a market for the product being designed.
Ensure that the requirements for the product are being adequately captured and communicated back to the development team.
Accept that voice application development is more complicated than screen-based development. Allocate time, as necessary, to adjust to the learning curve associated with these applications.

What are the biggest business hurdles for voice application developers today?

The main business hurdles come from managing people’s perceptions and expectations about voice. There is a perception that voice applications are hard to deploy and prohibitively expensive.

In fact, voice applications are not substantially harder to deploy than any other type of application, and their expense looks significantly lower when their true ROI is considered.

The final business hurdle overlaps with the world of design and development, and that is that designing a usable and enjoyable voice application is very difficult. It is much easier to design a bad user interface to an application than to put the effort and time into designing a voice interface well. From a business perspective, it’s therefore necessary to be able to manage customer expectations when it comes to application functionality and delivery schedules.

What was the effort associated with developing Oracle’s Voice Globalization Framework? How much time is this framework expected to save developers?

The development of the Voice Globalization Framework was able to build upon much of the groundwork that had already been done within Oracle to create support for globalization. Specifically, Oracle has several groups that have been defining guidelines and processes for globalization, including a group called Server Globalization Technologies (SGT) that provides internationalization support for core technology components and the Worldwide Product Translation Group (WPTG), Oracle’s localization group. Because much of the infrastructure was already in place, development of the Voice Globalization Framework was simplified somewhat, and the components for voice support were added over the course of about twelve months with several engineers involved in design and implementation.

This investment is considered worthwhile, however, because of the time and money we expect to save creating globalized voice applications. Essentially, even without the framework in place, it would be possible to create globalized applications in multiple languages, but it would require close to a complete rewrite of each application for each language. This becomes cost-prohibitive fairly quickly. Our voice globalization framework offers the possibility that at the same time that one designs and develops a voice application in one language, it can be simultaneously delivered in thirty other languages with marginal extra effort per language.

To express it in concrete terms, writing a voice application takes x resources (time, money, developers), and having to rewrite an application 31 times would take 31x resources. On the other hand, if we have a framework that allows our extra effort per language to be about 10% of the original application development resources (this is a conservative estimate; in practice we think we may see only 5% or less of original development resources per language), then our total outlay is one application with full x resources plus thirty applications, each of which costs 0.1x. Our total effort for delivering 31 language versions of an application is then 4x instead of 31x, a reduction of roughly 87% in overall development effort!
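The arithmetic above can be checked with a short sketch in Python. The function names are illustrative; the inputs (31 language versions, 10% or 5% extra effort per additional language) are the figures quoted in the interview, and effort is measured in units of x.

```python
def total_effort(languages, per_language_fraction):
    """Effort, in units of x, to ship `languages` versions with a framework.

    The first version costs the full x; each additional language costs
    `per_language_fraction` of x.
    """
    return 1.0 + (languages - 1) * per_language_fraction


def reduction(languages, per_language_fraction):
    """Fraction of effort saved versus rewriting once per language."""
    rewrite_effort = float(languages)  # one full x per language
    return 1.0 - total_effort(languages, per_language_fraction) / rewrite_effort


for fraction in (0.10, 0.05):
    print(
        f"{fraction:.0%} per language: "
        f"{total_effort(31, fraction):.1f}x total, "
        f"{reduction(31, fraction):.0%} saved"
    )
```

With the conservative 10% figure the total is 4x against 31x, a saving of about 87%. With the 5% figure the total is 2.5x, a saving of about 92%.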

What are the main business and technical reasons why voice application providers should adopt Oracle’s platform?

Before asking this question, voice application providers need to ask and understand the answers to the following questions.

Why should a voice application provider use an Internet-based application server versus some other kind of delivery mechanism?
Why should a voice application provider use a Java-based J2EE application server?

The answers to these preliminary questions are the biggest reason to adopt the Oracle platform. An Internet-based application server brings standards into a space where proprietary hardware and software have long reigned. By leveraging the Internet model of software development, application developers can streamline development and maintenance by relying on third-party components, instead of building their own custom data connectors and adapters.

Using a J2EE application server is just an extension of this story - Java has a huge development community behind it, with tremendous amounts of pre-defined functionality as a core component of the language. Indeed, from a globalization perspective, Java truly shines as it has been designed from the ground up to support globalization.

So, given these answers to the preliminary questions, we can return to the original question about why to choose Oracle’s platform over any others. The answer to this is two-fold. First, Oracle ships more pre-packaged voice applications than any other company. With our globalization support, we consequently ship more applications in more languages than anyone. This is a very powerful set of functionality to be included as part of the application server offering, and these pre-packaged applications can be incorporated and modified by the end client. Our CEO, Larry Ellison, has mandated that all Oracle applications offer voice-enabled interfaces, so our entire applications division is leveraging the wireless and voice capabilities of the application server to create voice-enabled versions of our CRM and ERP applications.

Secondly, for application developers who plan on writing their own applications, the Oracle platform provides the most complete application delivery platform with the most development effort invested in voice gateway platform interoperability. Other application server platforms typically only work with a couple of voice gateway platforms - Oracle is committed to being completely platform-agnostic and working with all VoiceXML gateways that go through our acceptance process. And from the standpoint of globalization, Oracle includes all elements of the Voice Globalization Framework for use by third-party developers so that they can realize the same efficiency gains that Oracle achieves internally when developing a global software offering.

How did the Voice Laboratory at Oracle come about?

Voice technology support in Oracle9iAS Wireless is the result of Oracle’s Voice Laboratory, which was formed in June 2000 as part of Oracle’s Wireless and Voice Division’s mandate to provide leadership in speech technology, application development tools and speech applications.

In speech technology, the Voice Laboratory is Oracle’s resource for the evaluation and certification of voice gateways and other technology components that are used for internal corporate applications, hosted applications and for customers of Oracle9iAS. In application development tools, the Voice Laboratory provides service markup languages and translation tools for the development of speech-controlled services. These tools have been developed to aid in the broader context of developing mobile services for all types of wireless devices as part of Oracle9iAS. Finally, the Voice Laboratory is creating a host of reference standard speech services for business applications.

Curtis Tuckey is Director of the Voice Laboratory at Oracle Corporation. Before joining Oracle, he held various research and development positions at Motorola, Lucent Technologies, AT&T and General Motors. He holds a Ph.D. in mathematics from the University of Wisconsin and can be reached at curtis.tuckey@oracle.com.

Ashish Vora is a Senior Speech Applications Engineer in the Voice Laboratory at Oracle Corporation. He has developed a set of voice applications that ship with Oracle9i Application Server Wireless & Voice, co-authored an integration and acceptance process for voice gateway vendors and created an architecture to simplify the globalization of voice applications. He holds a B.S. degree in Computer Science from Stanford University and can be reached at ashish.vora@oracle.com.

Reprinted by permission from the Globalization Insider,
4 June 2003, Volume XII, Issue 2.5.
Copyright the Localization Industry Standards Association
(Globalization Insider: www.localization.org, LISA: www.lisa.org)
and S.M.P. Marketing Sarl (SMP) 2004
