Translation Technology for Sale: Buyer Beware!

John Chandioux, himself a developer of translation systems, offers this advice: “Caveat emptor… buyer beware!” Since some translation tool developers have been forced to diversify their activities by turning to human translation to ensure their survival, Chandioux asks whether the market is too small or their tools too limited to satisfy the existing market. Read on for an insider’s view of how translation technology is really sold.

If one were to ask the average company executive about translation, ten to one the answer would be, “It is slow and expensive.” In fact, translation is an intricate and time-consuming creative process. Today’s professional translator usually has a university degree and produces an average of 2,000 words of translation per day, provided the terminology to be used is stabilized and readily accessible. In the case of new technologies, the translator is often the one who must coin new words or expressions in the target language, which further reduces his productivity.

People who master more than one language readily understand that language enables each culture to grasp and express reality in its own way. Yet the common perception of translators as perfectionists who drag their feet persists and is particularly widespread in monolingual circles. Less scrupulous marketers of translation technology have long capitalized on this perception to sell their products.

Machine Translation (MT)

Machine translation is as old as computers. The first MT system was demonstrated by Georgetown University and IBM in 1954, and MT systems flourished during the Cold War until the ALPAC report concluded in 1966 that fully automatic high-quality translation (FAHQT) was not feasible.

In the seventies, unable to deliver quality results, major MT players made inroads by selling the technology for its speed and purported cost savings to the appropriate level of management. This meant going over the heads of the linguistic services, who knew how complex translation really was, and selling directly to upper management.

Government departments and major corporations entered into pilot projects one after the other, and the investments involved were so massive that no one wanted to admit defeat. I remember once asking the publications manager of a major corporation what he truly thought of the translation system he had brought into the company at the time. His reply was, “Keep this to yourself, John. It isn’t cost-effective, but I’m two years from retirement, and I don’t want to make waves.”

The aura of success was reinforced by counting companies that were merely evaluating systems as satisfied customers. The Canadian Translation Bureau evaluated just about every major system on the market. When the evaluation of one system turned out to be less than positive, the company threatened the government with a lawsuit if it published the results, and the report conveniently disappeared. A few years later, a test of a system from another company concluded that its use was three times more expensive than human translation. Here too, the report was never seen again after it became known that industry experts had advised against the multimillion-dollar trial. In other words, machine translation did not work, but no one dared say so out loud.

In the mid-eighties, MT had a revival when PCs became powerful enough to run programs initially designed for mainframes, which helped give birth to the concept of “disposable ware” or “shelfware.” These terms refer to software inexpensive enough for thousands of people to buy it and then forget about it once they realize that it is useless. One of the major players, whose technology was caught in the Lernhout & Hauspie debacle, would freely admit in private that his translation package did little for translators. However, he saw no reason to stop selling it as long as there were thousands of gullible customers willing to try it.

In the nineties, people were better informed, and MT rode the Internet wave in the guise of a tool to determine subject matter (also known as “gisting”). It is true that when you receive an email in a language that you do not understand at all, running it through one of the MT engines available on the Internet may, perhaps, with a good dose of luck, provide a rough idea of the subject matter. Unfortunately, some people also use these engines to send messages in their correspondent’s language, leading to some interesting misunderstandings and involuntary insults.

It can safely be said that the principal documented use of general-application MT systems has been intelligence gathering. The raw machine output is manually sorted, and texts deemed of possible interest are then translated by humans. The situation is slightly different with targeted systems, which sometimes yield high-quality translations, as demonstrated by the METEO® system, which has been used in Canada to translate weather forecasts since 1977. There have also been attempts to improve machine translation by controlling the quality of the input, for example by creating sub-languages such as Caterpillar English that reduce ambiguities in the source text.

Most striking of all, industry experts agree that there has been no significant breakthrough in MT over the past thirty years, apart from a better understanding of its limitations.

Translation Memory (TM)

Translation tools grew in popularity when it became obvious that there was little to be hoped for from machine translation. This was, to a certain degree, an admission of defeat.

In these circumstances, translation memory was an intellectually satisfying concept. It rests on the assumption that translation is repetitive: if sentences and their translations are recorded, there is that much less work to be done the next time around. In practice, documents are far less repetitive than one might think, and results are satisfactory only when updating technical documentation or translating the few highly repetitive texts that do exist. Users quickly learned that it was better to create one memory per project and that shorter translation units generated more hits.
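To make the mechanism concrete, here is a minimal Python sketch of a translation memory, assuming a simple in-memory dictionary of sentence pairs. The class name `TranslationMemory`, the 0.75 threshold and the use of `difflib.SequenceMatcher` as the similarity measure are illustrative assumptions; commercial TM tools use their own fuzzy-match scoring. The sketch returns an exact match when a sentence was stored verbatim, and otherwise the closest stored sentence that reaches the threshold.

```python
from difflib import SequenceMatcher


class TranslationMemory:
    """Hypothetical minimal TM: stores whole source sentences with their translations."""

    def __init__(self):
        self.units = {}  # source sentence -> target sentence

    def add(self, source, target):
        self.units[source.strip()] = target.strip()

    def lookup(self, sentence, threshold=0.75):
        """Return (score, stored_source, stored_target) for the best match, or None.

        An exact match scores 1.0; otherwise the most similar stored sentence is
        returned if its similarity ratio reaches the threshold.
        """
        sentence = sentence.strip()
        if sentence in self.units:
            return (1.0, sentence, self.units[sentence])
        best = None
        for source, target in self.units.items():
            score = SequenceMatcher(None, sentence, source).ratio()
            if score >= threshold and (best is None or score > best[0]):
                best = (score, source, target)
        return best


tm = TranslationMemory()
tm.add("Close the valve before starting the pump.",
       "Fermer la vanne avant de démarrer la pompe.")
print(tm.lookup("Close the valve before stopping the pump."))  # fuzzy match
print(tm.lookup("Check the oil level weekly."))                  # None
```

A query that differs from the stored sentence by one word still scores above 0.9 and hands back the stored translation for the translator to revise, while an unrelated sentence finds nothing. The example also hints at why shorter units generate more hits: a short sentence is far more likely to recur verbatim, or nearly so, than a long one.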

This did not prevent clever salespeople from convincing translation departments and agencies that they should not pay outsourcers for translations that had already been done. As a result, translators get stuck with a bum deal, since they are expected to charge only for newly translated text. Little consideration is given to the fact that the contents of the translation memory may be of poor quality, and that it is no small task to translate sentences in a manner consistent with those translated by someone else.

In addition, translators resent being forced to purchase expensive translation memory software in order to land translation contracts. Only recently has one software publisher included in its solution a translation module that its clients can make available to outsourcers at no charge. Needless to say, the product’s popularity is on the rise.

One of the key issues in the TM debate is the ownership of the resulting memory. A few years back, a Trados distributor in the U.S. was bold enough to claim that the contents of the memories belonged to the software manufacturer. MT suppliers had long used the same ploy, claiming that the dictionaries compiled by potential clients during evaluation periods were their property, which enabled them to move from one trial to the next with ever larger dictionaries. The argument did not fly with translation memories, but there is still considerable infighting between those who use the tools and their customers.

In many cases, it remains to be proven whether a translation memory is really superior to a scheme that pastes the contents of a well-controlled terminology database into the text to be translated. In fact, the latter technique is the one used by the LEXIUM® system to translate the Canadian Trademarks Journal.
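The article does not describe how LEXIUM works internally, so the following is only a hypothetical Python sketch of the general idea: approved equivalents from a termbase are substituted into the source text before the translator takes over. The function name `prefill_terms` and the sample English-French entries are invented for illustration. Longer entries are tried first, so a multi-word term wins over the single word it contains.

```python
import re


def prefill_terms(text, termbase):
    """Replace every termbase entry found in text with its approved equivalent.

    termbase maps source terms to target terms (hypothetical sample data).
    Matching is case-insensitive and restricted to whole words.
    """
    if not termbase:
        return text
    # Try longer terms first so "registered trademark" beats "trademark".
    terms = sorted(termbase, key=len, reverse=True)
    pattern = re.compile(
        r"\b(?:" + "|".join(re.escape(t) for t in terms) + r")\b",
        re.IGNORECASE,
    )
    # Lower-cased keys let us look up whatever casing appeared in the text.
    lookup = {t.lower(): v for t, v in termbase.items()}
    return pattern.sub(lambda m: lookup[m.group(0).lower()], text)


termbase = {
    "trademark": "marque de commerce",
    "registered trademark": "marque de commerce déposée",
    "applicant": "requérant",
}
print(prefill_terms("Applicant: ACME Ltd. Registered trademark: ROADRUNNER.", termbase))
# requérant: ACME Ltd. marque de commerce déposée: ROADRUNNER.
```

Here “Registered trademark” is replaced as a unit rather than word by word. The quality of the result depends entirely on how well controlled the termbase entries are.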

It is interesting to note that, at the other end of the spectrum, translators claim that a TM is only cost-effective if lengthy portions are similar enough to allow the reuse of the matching translation. In other words, processing at the sentence level is being challenged in favor of working at both the term and paragraph levels.

Indexing Software

Indexing software, an alternative to TM software, is used to locate previously translated texts archived on a computer’s hard disk.

Why should one purchase indexing software for the Windows® environment when Index Server technology is now a standard feature of all recent versions of Windows? In addition, Index Server does not suffer from being constantly out of sync with the document formats created by various applications, since the availability of filters is tied to Microsoft certification. It must also be said that the Microsoft technology goes much further than the products it is replacing. Index Server deals with complex linguistic problems, such as stemming in French, compound-noun segmentation in German and word boundaries in Chinese. In addition, it indexes on the fly and creates a separate index for each language it detects.

One may argue that Index Server does not solve the problem of creating a bitext or of matching a document and its translation. The truth is that most bitext algorithms are statistically based and only work well when a document and its translation are identical in structure and contain the same number of sentences or paragraphs.
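The dependence on parallel structure can be illustrated with a deliberately naive Python sketch. The function `pair_paragraphs` and its length-ratio check are hypothetical; this is a position-and-length heuristic, not a statistical aligner. Real aligners, such as the length-based method of Gale and Church, use dynamic programming to allow for merged or split units, but they too assume the two texts run roughly in parallel.

```python
def pair_paragraphs(source, target, max_ratio=2.0):
    """Naively pair the paragraphs of a document and its translation one-to-one.

    Returns a list of (source, target) pairs, or None when the two texts do not
    share the same structure: different paragraph counts, or a pair whose
    lengths differ by more than max_ratio (a hint that paragraphs were split,
    merged or dropped).
    """
    src = [p.strip() for p in source.split("\n\n") if p.strip()]
    tgt = [p.strip() for p in target.split("\n\n") if p.strip()]
    if len(src) != len(tgt):
        return None
    pairs = []
    for s, t in zip(src, tgt):
        # Translations run longer or shorter than the source, but not wildly so.
        if max(len(s), len(t)) > max_ratio * min(len(s), len(t)):
            return None
        pairs.append((s, t))
    return pairs


english = "Press the red button.\n\nWait for the green light."
french = "Appuyez sur le bouton rouge.\n\nAttendez le voyant vert."
print(pair_paragraphs(english, french))
print(pair_paragraphs(english, "Appuyez sur le bouton rouge, puis attendez le voyant vert."))  # None
```

As soon as the translator merges two paragraphs into one, the naive pairing gives up, which is the practical limitation the paragraph above describes.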

Thus, companies selling indexing engines are in the same situation as publishers offering standalone disk defragmenters and spelling checkers were a few years ago. They are hitting clients hard before being driven out of business by Microsoft.

In addition, as users increasingly concentrate on large chunks of text because individual sentences are not worth the effort, once the desired text has been located it is simply a matter of opening the translated document. It therefore turns out that the most effective translation tools are a good filing system for documents and their translations in electronic format, and a word processor to compare different versions of a given text.

Terminological Databases

True databases remain the essential component in managing terminology, which is itself the backbone of cost-effective translation and communication. Many translation tool suppliers offer one, either as a standalone product or as a component of their flagship product. However, some large corporations are wary of proprietary data formats and prefer to develop products in-house on well-known database engines. This simplifies support and allows multiple applications to access the data. A few years ago, Nortel designed such a multilingual terminology solution using Oracle®.

When suppliers offer a terminology management system based on a well-known database technology, multinationals such as Dow Chemical are sufficiently reassured to adopt customized solutions that make their terminology available to branch offices worldwide on intranets and on CD.

Some suppliers try to blur the issue by claiming to have databases when this is not the case. This tendency first appeared in Quebec, when the designer of several text-indexing products suggested that translators create terminology records with their word processors and index them like any other document. This approach is cumbersome and unsafe, and it promotes anarchy in the creation of terminology records.

Reversing the Paradigm

Eventually, a number of translation tool developers were forced to diversify their activities and turn to human translation to ensure their survival. Was the market for their technology too small or was the usefulness of their tools too limited to satisfy the existing market?

We are now witnessing the opposite trend: human translation companies developing translation tools to gain an edge over ever-expanding, globally based competition. It is a well-known fact that the best programs are developed by people who are trying to meet their own needs.

Today, however, software is far more complex than in the early years of PCs. Service companies such as translation agencies, which sell their time, are not accustomed to selling products at a fixed price along with the technical support those products require.

In addition, these new players are often preoccupied with protecting their traditional customer base for translation services. Recently, there was a rather heated debate on Yahoo when a Canadian translation agency, through its sister company, tried to protect its market by preventing translators who purchased its software from providing the output generated by the product to their own clients. The same company has since announced that it will index the cost of its newest product to the volume of text throughput. Potential customers are already in an uproar.

Let the Buyer Beware

You may think this is a sad state of affairs. Come to think of it, it is probably no worse than in any other field. Marketers are there to sell, regardless of the deficiencies of their products, and the customer is left to be the sole judge of his requirements.

Some sales tactics do verge on the fraudulent, however. In France, for example, one company sold translations done manually by unpaid and untrained interns as raw MT output. The same company had also managed to cheat on a test performed by an independent research facility and win first place in a ranking of the best-quality MT systems. When the results were independently validated, first place became last.

Other companies use innovative or aggressive selling techniques, such as selling, albeit at a low price, translations produced by MT systems that are available free of charge on the Internet. They then try to recoup their costs by offering “human editing” of the MT output at regular rates to make up for the poor quality.

In the final analysis, the problem is that translation is a complex mental process. No decision to buy translation tools should be made without involving the people directly concerned. One of the favorite arguments of sales people is that translators should be taken out of the loop because they fear for their jobs. In the present state of the art, not only is there no threat, but the demand for translation is growing at such a rate that competent translators may never be able to retire.

It is easy to mistake wishful thinking for reality. If in doubt, just remember the old saying: “If it looks too good to be true, it probably is.”

John Chandioux is a software developer and President of John Chandioux Consultants, Inc. He is the inventor of the GramR programming language, the developer of the METEO and LEXIUM translation systems, and the designer of the EDITerm line of terminology management products. He can be reached at john@chandioux.com.

Reprinted by permission from the Globalization Insider,
1 July 2003, Volume XII, Issue 3.1.
Copyright the Localization Industry Standards Association
(Globalization Insider: www.localization.org, LISA: www.lisa.org)
and S.M.P. Marketing Sarl (SMP) 2004