The Rise of the Machine! (Machine Translation, that is)
One of the earliest anticipated uses of computers was machine translation. By machine translation (MT), we mean the automation of the translation process. As early as the 1950s, a primitive experiment that translated sixty Russian sentences into English was deemed a success, ushering in a period of significant research funding that, through various ebbs and flows, continues to this day. It was thought that with this new technology 'old-style' human translation would soon become obsolete. However, for several reasons this never happened, and 'old-style' translation practices still dominate the localization industry.
The chief difficulty restricting the development of MT is that natural language is still too complex to be reduced to a finite set of rules. This is especially true of ambiguous words (in an example provided by Tony Hartley, 'light' in English can correspond to 'allumer', 'lumière', 'léger' or 'clair' in French) and of idioms.
Moreover, even where the software routines can preserve the meaning, they cannot deliver a natural, easy-to-read style. For example, take this sentence:
Ces exemples ont été sélectionnés dans un grand nombre de sources différentes.
This was translated by a well-known MT application as:
These examples were selected in a great different number of sources.
This is mostly understandable, but it is neither attractive nor entirely correct English. A human translator renders it: "The examples have been selected from a wide variety of different sources."
However, the promise and allure of MT refuse to disappear, and with so many positive changes and opportunities facing the industry at large, the stage is once again set for MT to become a main player in global content production. This means it is also time for the localization community, buyers and sellers alike, to re-engage with the technology and understand exactly what applications and benefits can be extracted from it. If proof of this necessity is needed, look at TAUS (the Translation Automation User Society) and its incredibly impressive list of members, and you will have no doubt why you need to understand the current state of the technology and its implications.
Why the sudden increase in interest?
The current resurgence of popular interest is largely a matter of timing. As with many of history's great shifts in thought and practice, a number of different developments have come together at once to create opportunity for all. An overview of the biggest of these factors is given below:
This final point on 'better quality' needs further clarification, as this discussion is central to the future adoption of MT and underlies most of the reasons for the sudden interest in the technology. MT was generally not considered good enough in the past because its 'quality' was felt to be too low to be useful, and that judgment was usually made by comparison with human translation, as if one could replace the other at the touch of a button.
However, the question now being asked is: what exactly is 'quality'? Most relevant for our purposes is whether quality matters more than usability. If a machine-translated knowledge base can resolve 10,000 queries a quarter, grammatical awkwardness notwithstanding, many will argue that the quality is 'good enough' to justify the expense and time invested.
Arguably, therefore, the current question of quality is not so much human translation versus machine translation as machine translation versus no translation at all. Seen from this angle, the future is very promising indeed for the technology.
How Machine Translation Works
With the goals of the current MT movement in mind, we can turn our attention to the different types of tools and systems available. There are two main approaches to machine translation: rules-based machine translation (RBMT) and statistics-based machine translation (EBMT and SBMT).
Rules-Based Machine Translation (Transfer-based MT)
Rules-based techniques apply linguistic rules for the source and target languages, attempting to carry the meaning across through dictionaries and grammatical analysis. This technique draws on aspects of artificial intelligence.
The system starts by checking terms against a dictionary and parsing (grammatically analyzing) the source. The parser breaks each sentence into its parts, assigning a grammatical function (such as subject, predicate or object) to each word. This produces better results than simple word-by-word transfer because it makes use of sentence context and can yield translations that follow the grammar of the target language.
After the sentence has been parsed, a series of rules is applied to reorder words and otherwise restructure the sentence so that the translation is grammatically correct in the target language.
Finally, a generation stage produces inflections, contractions and so on for the target-language words. Transfer-based systems have to be designed for specific language pairs, so new sets of rules must be created for each language pair and each direction of translation.
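The three stages described above (analysis, transfer, generation) can be sketched in a few lines of Python. The lexicon and the single reordering rule here are toy inventions for illustration only; a real RBMT engine would use full morphological dictionaries and hundreds of rules per language pair.

```python
# Toy sketch of the transfer-based (RBMT) pipeline:
# analysis -> transfer -> generation.

LEXICON = {  # English word -> (French word, part of speech); invented toy data
    "the": ("le", "DET"),
    "white": ("blanc", "ADJ"),
    "book": ("livre", "NOUN"),
}

def analyze(sentence):
    """Analysis: tag each source word with its part of speech."""
    return [(w, LEXICON[w][1]) for w in sentence.lower().split()]

def transfer(tagged):
    """Transfer: apply a reordering rule -- in French, most
    adjectives follow the noun (ADJ NOUN -> NOUN ADJ)."""
    out = list(tagged)
    for i in range(len(out) - 1):
        if out[i][1] == "ADJ" and out[i + 1][1] == "NOUN":
            out[i], out[i + 1] = out[i + 1], out[i]
    return out

def generate(tagged):
    """Generation: look up target words. A full system would also
    handle inflection and agreement here."""
    return " ".join(LEXICON[w][0] for w, _ in tagged)

print(generate(transfer(analyze("the white book"))))  # le livre blanc
```

Note how the direction-specific rule in `transfer` illustrates why each language pair and each direction needs its own rule set.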
Statistical Machine Translation (Data-Driven MT)
The second main method is statistical. It draws on a body of pre-existing translations, comparing source strings against them to see whether a translation already exists. It comprises EBMT (example-based machine translation, which extracts phrases for recombination) and SBMT (statistics-based machine translation, which builds a statistical translation model based on word frequency). These systems analyze a large number of previously created bilingual sentence pairs to establish which words or expressions in one language are most frequently matched with which words or expressions in the other.
This approach offers the potential of much faster development of an MT system and can take advantage of the large translation memory databases that many organizations have built up.
The drawback of this method is the difficulty of building a reliable corpus of translations. This is, for example, the main technique Google uses for its online translation service, and Google has had to feed in as many as 200,000,000,000 words of translated text to improve the service.
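The core statistical idea, counting which words co-occur most often across aligned sentence pairs, can be sketched as follows. The three-pair corpus is an invented toy; real systems refine these raw counts with iterative alignment models and need the enormous corpora mentioned above.

```python
# Minimal sketch of data-driven MT: count co-occurrences across
# aligned bilingual sentence pairs and treat the most frequent
# pairing as the likely translation. Toy data, not a real engine.

from collections import Counter, defaultdict

corpus = [  # (French sentence, English sentence) pairs
    ("la maison", "the house"),
    ("la fleur", "the flower"),
    ("la maison bleue", "the blue house"),
    ("une maison", "a house"),
]

cooc = defaultdict(Counter)
for fr, en in corpus:
    for f in fr.split():
        for e in en.split():
            cooc[f][e] += 1

# "maison" co-occurs with "house" in three pairs, more often than
# with any other English word, so raw frequency already suggests
# the right match.
print(cooc["maison"].most_common(1)[0][0])  # house
```

This also shows why corpus size matters: with too few pairs, frequent function words like "the" would tie with or outrank the true translation.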
The Hybrid system
Most machine translation technologies combine the two techniques above: a hybrid system couples RBMT with EBMT/SBMT and with standard translation memories (TMs). To translate a given text, EBMT/SBMT is applied first, and whatever remains untranslated after this step is then translated using RBMT.
A Proposed Process
The following is a potential process for using machine translation:
There are four options for integrating this into a company’s processes:
It is commonly agreed that the quality of raw MT output ranges from poor to average. In any case, before any project begins, the expected quality must be clearly agreed with the client.
Pre and Post-Editing Process
Machine translation is most attractive for short-lived text such as online news, knowledge bases, and anything distributed free of charge. As soon as something incurs a cost, the client will ask what he or she is getting for the money. Bringing raw machine-translated text up to an acceptable quality requires pre-editing of the source and post-editing of the output, and these expenses need to be justified.
To reach a reasonable level of quality, you need pre-edited files (prepared by writers who know their output will be machine translated), an effective MT software package, and a considerable (but variable) amount of post-editing work by linguistic reviewers.
Machine translation post-editing is a new task in the localization field that requires slightly different skills from traditional translation. The post-editing localizer has to:
To the future!
With an understanding of the aims and potential applications of MT, and armed with a knowledge of the different types available, the next question is: what next? It is a question the whole industry is struggling with right now. Everyone can see the benefits and applications, but the exact delivery and business model is still up for debate.
Larger organizations such as Cisco and Microsoft are applying the technology to great effect, not only on FAQ content stamped with an MT warning but within their general content-creation processes. However, there is still significant pain, and the business model is not yet written in stone. There is room for improvement, and these organizations are active participants in the TAUS forum.
Google, too, has invested millions and has a great deal of content and product capability to show for it, but still no distinct business model for the application.
Many of these questions are discussed in more detail in the full version of this whitepaper, available at www.jonckers.com/en/whitepapers. A great deal of research is under way to push content capability at this end of the localization business, but it is equally true that a huge opportunity still awaits an answer. Every day more content piles up, locked in its source language and waiting to be discovered. Someone will find a way to unlock its meaning. Could it be you?
Whatever the final answer, one thing at this point is beyond question: MT will have a role in solving the global content conundrum.
Jonckers, Localization Provider of the Year 2006, is focused on delivering software, eLearning, and multimedia localization services to the world's best companies. Jonckers achieves localization excellence through an ERP-controlled global network of wholly owned offices spanning Asia, Europe and the US, allowing it to deliver low-cost global resources without sacrificing quality. For more information please visit www.jonckers.com.