The Rise of the Machine! (Machine Translation, that is)
By Jonckers,
Localization Provider
http://www.jonckers.com/
One
of the original anticipated uses of computers was machine
translation. By Machine Translation (MT), we mean the automation
of the translation process. As early as the 1950s, a primitive
experiment in translating sixty Russian sentences into English
was deemed a success and led to a period of significant
research funding which, through various ebbs and flows,
continues to this day. It was thought that with this new
technology ‘old-style’ human translation would soon become
obsolete. For several reasons this never happened,
and ‘old-style’ translation practices are still dominant
in the localization industry.
The main difficulty that has restricted the development
of MT is that natural language is still too
complicated to be broken down into a finite rule set. This
is especially true for ambiguous words (e.g. in an example provided
by Tony Hartley, ‘light’ in English could mean any of the following
in French: ‘allumer’, ‘lumière’, ‘léger’,
‘clair’) and idioms.
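To make the ambiguity concrete, here is a minimal sketch in Python. The tiny dictionary and the function name candidate_translations are hypothetical, built only from the ‘light’ example above; it simply shows how a naive word-by-word lookup cannot choose between the possible renderings on its own.

```python
from itertools import product

# Hypothetical mini-dictionary based on the 'light' example above.
# The single-entry words are assumed unambiguous purely for illustration.
EN_FR = {
    "light": ["allumer", "lumière", "léger", "clair"],
    "the": ["le"],
    "fire": ["feu"],
}

def candidate_translations(sentence):
    """Return every rendering that a naive word-by-word lookup allows."""
    options = [EN_FR.get(word, [word]) for word in sentence.lower().split()]
    return [" ".join(choice) for choice in product(*options)]

candidates = candidate_translations("Light the fire")
for candidate in candidates:
    print(candidate)
```

Only one of the four candidates (‘allumer le feu’) is the verb reading a human would pick; the lookup alone has no way of knowing that, which is why context and grammar have to be modelled.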
Moreover, even when the software routines
maintain the meaning, they often fail to deliver
a natural, easy-to-read style. For example, take this sentence:
Ces exemples ont
été sélectionnés dans un grand
nombre de sources différentes.
This was translated by a well-known
MT application as:
These examples were selected in a great
different number of sources.
This is mostly understandable, but it’s
not attractive or indeed correct English. When translated
by a human it is rendered: “The examples
have been selected from a wide variety of different sources”.
However, the promise and allure of MT refuse
to disappear, and with many positive changes and opportunities
facing the industry at large, MT is once again being called
on to be a main player in global content
production. This means that it is also time for the localization
community, buyers and sellers alike, to re-engage with the
technology and understand exactly what applications and benefits
it offers. If proof of this necessity is needed, then
look to TAUS (the Translation Automation User Society) and
its impressive list of members, and you will have
no doubt about why you need to understand the current state
of the technology and its implications.
Why the sudden increase in interest?
The current resurgence in popular interest
is largely due to very good timing. As with many of history’s
great shifts in thought and process, a number of different
developments have come together at once to create opportunity
for all. An overview of the biggest of these factors is
given below:
• The growing volume of content to be translated, linked to the growth
in the current and future value of the global marketplace.
• The increased value and importance of huge volumes of
user created content. Wiki groups, crowd sourcing, chat
and/or knowledge bases all help to support potential buyers
around the world.
• The competitive nature of the online and technology
industries means that anything that offers even a hint
of ‘faster time to market’, as machine translation surely
does, is guaranteed consideration.
• The understanding and recognition of term control and
the demand for, and value of, consistency in the valuation
of a Global Brand.
• The ever-present need for tactical cost savings.
• The development of the market to the point of real global
competition means product alone is no longer a competitive
offering – driving more content demands than ever, often
with very little incremental budget increases available.
• Better technical output quality and rules complexity
from the MT tools.
• Massive statistical and analytical capabilities
of computers.
• The emergence of ‘necessary quality’ rather than absolute
quality.
This final point on quality needs
further clarification, as an understanding of this discussion
is central to the future adoption of MT and underlies most
of the reasons for the sudden interest in the
technology. In the past, MT was generally not considered
good enough because its ‘quality’ was felt to be too low
to be useful. Often this judgment was made by comparison
with human translation, as if one would
replace the other at the touch of a button.
However, the question now being asked is:
what exactly is ‘quality’? Perhaps the clearest form of the
question for our purposes is whether quality is more important
than usability. If a machine-translated knowledge base can
solve 10,000 queries a quarter, regardless of grammatical
awkwardness, then many would argue that the quality is ‘good
enough’ to justify the expense and time invested.
Arguably, therefore, the current question
of quality is not so much human translation versus machine
translation; it is machine translation versus no translation
at all. When considered from this angle, the future is very
promising indeed for the technology.
How Machine Translation Works
With an understanding of the
goals of the current MT movement, we can turn our attention
to the different types of tools and systems available.
There are two main methods for machine translation: Rules-Based
Machine Translation (RBMT) and statistical, data-driven methods (EBMT
and SBMT).
Rules-Based Machine Translation
(Transfer-based MT)
Rules-based techniques rely on linguistic
rules for the source and target languages and try to
translate the meaning through dictionaries and grammatical
analysis. This technique uses aspects of artificial
intelligence.
The system starts by checking the terms
against a dictionary and parsing (grammatically analyzing)
the source. The parser breaks each sentence into its
parts, assigning a grammatical function (such as subject,
predicate or object) to each word in the sentence. This
parsing produces better results than simple word-by-word
transfer, as it makes better use of sentence context and
can provide translations that follow the grammar of the
target language.
After the sentence has been parsed, a series
of rules is used to reorder words and otherwise alter the
structure of the incoming sentence to produce a translation
that is grammatically correct for the target language.
Finally, a generation stage produces inflections,
contractions, etc. for the target language words. Transfer-based
systems have to be designed for specific language pairs,
so new sets of rules must be created for each language pair
and each direction of translation.
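The three stages described above (dictionary lookup and parsing, structural transfer, generation) can be sketched in a few lines of Python. This is a toy illustration only, not how any commercial engine is built: the four-word lexicon, the single reordering rule and the single elision rule are all assumptions chosen for the example, and they cover English to French only, which echoes the point that each language pair and direction needs its own rules.

```python
# Hypothetical English-to-French lexicon: word -> (part of speech, translation).
LEXICON = {
    "the": ("DET", "le"),
    "green": ("ADJ", "vert"),
    "tree": ("NOUN", "arbre"),
    "grows": ("VERB", "pousse"),
}

def analyse(sentence):
    """Stage 1: dictionary lookup plus a very crude 'parse' into tagged words."""
    return [LEXICON[word] for word in sentence.lower().split()]

def transfer(tagged):
    """Stage 2: one structural rule, English ADJ NOUN becomes French NOUN ADJ."""
    out = list(tagged)
    i = 0
    while i < len(out) - 1:
        if out[i][0] == "ADJ" and out[i + 1][0] == "NOUN":
            out[i], out[i + 1] = out[i + 1], out[i]
            i += 2  # skip past the pair that was just swapped
        else:
            i += 1
    return out

def generate(tagged):
    """Stage 3: surface generation, here only the contraction le + vowel -> l'."""
    words = [word for _, word in tagged]
    pieces = []
    for i, word in enumerate(words):
        following = words[i + 1] if i + 1 < len(words) else ""
        if word == "le" and following and following[0] in "aeiou":
            pieces.append("l'")  # attaches directly to the next word
        else:
            pieces.append(word + " ")
    return "".join(pieces).strip()

def translate(sentence):
    return generate(transfer(analyse(sentence)))

print(translate("The green tree grows"))
```

Even this toy shows why rules-based systems are expensive to build: every new word, word order pattern or contraction needs another hand-written entry.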
Statistical Machine Translation
(Data-Driven MT)
The second main method is statistical. This
method uses a body of pre-existing translations and
compares source strings against it to see whether a matching
translation already exists. It comprises EBMT (Example-Based Machine Translation:
extraction of phrases for recombination) and SBMT (Statistics-Based
Machine Translation: a statistical translation model based
on word frequency). These systems analyze a large number
of previously created bilingual sentence pairs to establish
which words or expressions in one language are most frequently
matched with words or expressions in the other.
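The idea of finding the most frequently matched words in bilingual sentence pairs can be illustrated with simple co-occurrence counting. The sketch below is a deliberately simplified assumption: the four-pair corpus is invented for the example, and raw counting stands in for the far more sophisticated probabilistic models and much larger corpora that real systems rely on.

```python
from collections import Counter, defaultdict

# Hypothetical toy corpus of aligned sentence pairs (English, French).
CORPUS = [
    ("the house", "la maison"),
    ("the car", "la voiture"),
    ("a house", "une maison"),
    ("a car", "une voiture"),
]

def cooccurrence_table(pairs):
    """Count how often each source word appears alongside each target word."""
    table = defaultdict(Counter)
    for source, target in pairs:
        for source_word in source.split():
            for target_word in target.split():
                table[source_word][target_word] += 1
    return table

def most_likely(table, word):
    """Pick the target word seen most often with the given source word."""
    return table[word].most_common(1)[0][0]

table = cooccurrence_table(CORPUS)
for word in ("the", "a", "house", "car"):
    print(word, "->", most_likely(table, word))
```

No grammar or dictionary was supplied; the pairings emerge purely from the data, which is both the appeal of the approach and the reason the quality of the corpus matters so much.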
This approach offers the potential of much
faster development of an MT system and can take advantage
of the large translation memory databases that many organizations
have built up.
The drawback of this method is the difficulty
of building a reliable corpus of translations. For example,
this is the main technique used by Google for its online
translation service, and Google has had to input as much as
200,000,000,000 words of translation to improve the service
provided.
The Hybrid system
Most machine translation technologies use a combination
of the above two techniques. A hybrid system implies
several tools and processes, mainly RBMT coupled with EBMT/SBMT
plus standard translation memories (TMs). For example, to
translate a given text, EBMT/SBMT is applied first, and any
text left untranslated after
this process is then translated using RBMT.
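The fallback logic of such a hybrid can be sketched as follows. Everything here is an assumption made for illustration: an exact-match translation memory lookup stands in for the data-driven stage, and a plain word lookup (rule_based_translate) stands in for a real rules-based engine.

```python
# Hypothetical translation memory of previously approved segment translations.
TRANSLATION_MEMORY = {
    "Click OK to continue.": "Cliquez sur OK pour continuer.",
}

# Hypothetical stand-in for a rules-based engine: a plain word lookup.
WORD_RULES = {"save": "enregistrer", "the": "le", "file": "fichier"}

def rule_based_translate(segment):
    """Placeholder for the RBMT stage; a real engine would parse and reorder."""
    words = segment.rstrip(".").lower().split()
    return " ".join(WORD_RULES.get(word, word) for word in words) + "."

def hybrid_translate(segments):
    """Use the memory where a match exists, otherwise fall back to the rules."""
    results = []
    for segment in segments:
        if segment in TRANSLATION_MEMORY:
            results.append((TRANSLATION_MEMORY[segment], "memory"))
        else:
            results.append((rule_based_translate(segment), "rules"))
    return results

for text, origin in hybrid_translate(["Click OK to continue.", "Save the file."]):
    print(f"[{origin}] {text}")
```

Recording which stage produced each segment is a design choice worth keeping in a real workflow, since segments that came from the fallback stage are the ones most likely to need attention later.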
A Proposed Process
The following is a potential process for
using machine translation:
• an optional first stage of creating dictionaries
and translation memories;
• processing the translation through the machine translation
engine;
• an optional third stage of post-editing the files to
ensure the desired quality.
There are four options for integrating this
into a company’s processes:
1. The client does the machine translation
(sets up the dictionaries and processes the files) and
the L10N vendor checks the translation to ensure that
it is at the required quality (normally called
post-editing).
2. The L10N company takes the source files
from the customer, processes the files and post-edits
them itself.
3. The L10N company takes the source files
from the customer, outsources the machine translation
and post-edits the files itself.
4. The L10N company takes the source text,
machine translates it and delivers the machine-translated
files without human involvement in the text; the only
human involvement is in the processing of the files.
It is commonly agreed that the quality coming
from MT ranges from poor to average. In any case, prior
to any project, the notion of expected quality must be clearly
defined with the client.
Pre and Post-Editing Process
Machine translation is most attractive when
used for short-lived text such as online news, knowledge
bases, and anything that can be obtained freely. As soon
as something incurs a cost, the question will be asked
what the client is getting for his/her money. To bring raw
machine-translated text to an acceptable quality, pre-editing
of the source and post-editing of the output are required, and these expenses
need to be justified.
To get a reasonable level of quality, you
need pre-edited files (by writers who know their output
will be machine translated), an effective MT software package,
and a considerable (but variable) amount of post-editing work
by linguistic reviewers.
Machine translation post-editing is a new
task in the localization field that requires slightly different
skills from those needed for traditional translation. The post-editing
localizer has to:
a) Compare the source material with the
raw machine translation and decide to what degree the
translation is usable.
b) Make corrections, which could range
from minor improvements of grammar to complete rewrites.
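One way to put a number on how much correction a segment needed is to compare the raw machine output with the post-edited version afterwards. The sketch below uses Python's standard difflib module; the function name post_edit_effort and the idea of using word-level similarity as a proxy for effort are assumptions for illustration, not an industry-standard metric. The sample strings are the raw and human versions of the example sentence quoted earlier in this article.

```python
from difflib import SequenceMatcher

def post_edit_effort(raw_mt, post_edited):
    """Return 0.0 when nothing was changed and 1.0 when no words were kept in place."""
    raw_words = raw_mt.split()
    edited_words = post_edited.split()
    # ratio() is 1.0 for identical word sequences and 0.0 when nothing matches.
    similarity = SequenceMatcher(None, raw_words, edited_words).ratio()
    return 1.0 - similarity

raw = "These examples were selected in a great different number of sources."
edited = "The examples have been selected from a wide variety of different sources."

effort = post_edit_effort(raw, edited)
print(f"Share of the segment reworked: {effort:.0%}")
```

Tracked over a whole project, a figure like this could help show whether post-editing is closer to ‘minor improvements’ or ‘complete rewrites’, and so whether the MT stage is paying for itself.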
To the future!
With an understanding of the aims and
potential applications of MT, and armed with knowledge of
the different types available, the question becomes:
what next? It is a question that the whole industry is struggling
with right now. Everyone can see the benefits and applications,
but the exact delivery and business model are still up for
debate.
Larger organizations such as Cisco and
Microsoft are applying this technology not only to FAQ content
stamped with an MT warning but also within their general content creation
process, to great effect. However, there is still significant
pain and the business model is not yet ‘written in stone’.
There is room for improvement, and these organizations are
active participants in the TAUS forum.
Google too has invested millions and has
a lot of content and product capability to show for it,
but still no distinct application business model.
Many of these points are discussed in more detail
in the full version of this whitepaper, found at www.jonckers.com/en/whitepapers.
Indeed, there is much research going on right now to push
the content capability at this end of the localization business,
but it is equally true that a huge opportunity exists that
has yet to find an answer. Every day more content piles
up – locked up in its source language, waiting to be discovered.
Someone will find a way to unlock its meaning. Could it
be you?
Whatever the final answer is, one thing
at this point is beyond question: MT will have a role in
solving the global content conundrum.
About Jonckers
Jonckers, Localization Provider of the Year
2006, is focused on delivering software, eLearning, and
multimedia localization services to the world’s best companies.
Jonckers achieves localization excellence through an ERP
controlled global network of wholly owned offices spanning
Asia, Europe and the US allowing Jonckers to deliver low
cost global resources without sacrificing quality. For more
information please visit www.jonckers.com.
Jonckers Contact Information
Europe
Ian Butler
+353 866090384
email: ian.butler[at]jonckers.com
Asia Pacific
Sung Cho
+82-2-6627-3000
email: sung.cho[at]jonckers.com
USA
Roman Sofianos
+1 877 590 1927, ext 706
email: roman.sofianos[at]jonckers.com
ClientSide
News Magazine - www.clientsidenews.com