The binoculars in this phrase from a machine–translated
French article are named Barbara and Jenna. It seems the French
word “jumelles” can mean both “twin girls” and “binoculars.”
I can’t cite a source – this may be a translator’s urban legend.
But even if it isn’t authentic, such an absurdity will certainly
seem plausible to most users of conventional machine translation.
Linguistic skeptics like Alan
Melby have explained why a computer can’t translate
like a person. Developers of machine translation (MT) have
largely conceded that point and focused on developing systems
that provide rough information (“gisting”). But if MT is
still unable to make a contextual distinction between two
teenagers and an optical instrument, can it ever really
be useful?
It turns out that it can, but only if it ceases
to be purely machine translation. It’s as if developers
said, “You know, maybe Melby has a point. Maybe language
is just too complex for us to ever develop good enough rules
to make machine translation really useful for general purposes.
But can we use computers to capitalize on the linguistic
creativity of human beings?” This was the starting point
of academic research into what was initially known as example–based
machine translation (EBMT). Rather than struggle to make
ever more complex rule systems for analyzing the source
language and transforming it into the target language
with the aid of a dictionary, why not use a computer to
sift through a corpus of human translations and pick the
best matches for a given sentence to be translated?
The EBMT approach did not make the jump from
research institutions to practical application in its first
incarnation. It was just too difficult for a machine in
those days to come up with enough examples and analyze them
sufficiently – unless the “machine” was a human being. If
a computer could show a person examples of how a sentence
(or very similar sentence) had been previously translated,
the person could use his or her own linguistic skills to choose
the best example, or modify it to fit the new sentence.
This approach, now known as translation memory (TM), has
revolutionized the field of translation in the past ten
years.
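The lookup at the heart of a TM tool can be sketched in a few lines of Python. This is only a rough illustration, not how any commercial product works: the three stored segments, the function names and the 70% threshold are all invented for the example, and the similarity score comes from the standard library's difflib rather than from a real TM matching algorithm.

```python
from difflib import SequenceMatcher

# Hypothetical toy memory: source segments paired with earlier human translations.
MEMORY = [
    ("Press the red button to start.", "Appuyez sur le bouton rouge pour démarrer."),
    ("Press the green button to stop.", "Appuyez sur le bouton vert pour arrêter."),
    ("Store the device in a dry place.", "Rangez l'appareil dans un endroit sec."),
]

def similarity(a, b):
    """Return a 0..1 similarity ratio between two segments, ignoring case."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()

def lookup(segment, memory=MEMORY, threshold=0.7):
    """Return (score, source, target) for stored segments at or above the
    threshold (an assumed cutoff), best match first."""
    scored = [(similarity(segment, src), src, tgt) for src, tgt in memory]
    hits = [hit for hit in scored if hit[0] >= threshold]
    return sorted(hits, key=lambda hit: hit[0], reverse=True)

# Show the translator every stored segment that resembles the new sentence.
for score, src, tgt in lookup("Press the red button to stop."):
    print(f"{score:.0%}  {src}  ->  {tgt}")
```

The program only retrieves and ranks earlier translations. Choosing among them, or adapting one to the new sentence, is still left to the person.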
TM can speed up the translation process and
enhance consistency with minimal loss of quality, if used
correctly. However, its speed is still limited by the length
of time a person can work with full attentiveness, and its
quality by the skill of the translator. And because a TM
system has no linguistic intelligence of its own, it only
works at all if there is a human translator available for
the desired language pair – a real problem for many languages.
Finally, a TM system breaks the source text into segments
– usually entire sentences – and checks them against the
existing translated segments in the memory. It is generally
not able to compare smaller phrases inside one segment to
phrases of other segments and suggest translations. Researchers,
such as those at the French firm Lingua
et Machine (developers of the Similis TM system),
are working on “second–generation” TM systems. TAUS
(the Translation Automation User Society) has begun referring
to early commercial tools in this area as “Advanced Leveraging.”
Even as these early tools are hitting the market, though,
they may be superseded by more powerful technology.
The enormous speed of modern massively parallel
computing, combined with the staggering amount of translated
content now available on millions of websites, has revived
the seemingly lost cause of EBMT, in a much more sophisticated
form referred to as Statistically
Based Machine Translation (SMT). The huge advantage
of the SMT method is that the machine no longer has to “know”
the context to decide, for instance, what “jumelles” means.
It analyzes the collective wisdom of a huge database of
human translations, assesses the probabilities of the alternative
translations and incorporates the most likely candidate
into the translation. It’s a fairly safe bet that, with
a sufficiently large corpus of examples to analyze, the
statistical process would generate the correct translation
of “jumelles,” because there are a lot more sentences in
the real world that refer to 18–year–old twins than to
18–year–old binoculars.
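The statistical idea can be illustrated with a toy calculation in Python. Everything in this sketch is an assumption made for the example: the counts are invented, the context cues “18 ans” and “optique” are hypothetical, and a real SMT system combines far richer models than a simple relative frequency.

```python
from collections import Counter

# Hypothetical counts: (context cue found near "jumelles", how a human rendered it).
OBSERVED = (
    [("18 ans", "twins")] * 9
    + [("18 ans", "binoculars")] * 1
    + [("optique", "binoculars")] * 6
    + [("optique", "twins")] * 1
)

def probabilities(context, observations=OBSERVED):
    """Estimate P(rendering | context) as a relative frequency."""
    counts = Counter(r for c, r in observations if c == context)
    total = sum(counts.values())
    return {r: n / total for r, n in counts.items()}

def most_likely(context, observations=OBSERVED):
    """Return the most probable rendering, or None if the context was never seen."""
    probs = probabilities(context, observations)
    return max(probs, key=probs.get) if probs else None

print(probabilities("18 ans"))
print(most_likely("18 ans"))
print(most_likely("optique"))
```

Nothing here “understands” either sense of the word. The choice falls out of how often human translators made it in similar surroundings, which is why the size of the corpus matters so much.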
If SMT technology lives up to its auspicious
beginnings, it may have sweeping effects on the language
industry, not least on McElroy Translation. Executive strategy
at McElroy embraces the potential of this dramatic MT technology
advancement. As was the case with translation memory technology,
the inclusion of SMT will open up new types of translation
work that were never before feasible. For some large localization
projects, for instance, a judicious mixture of TM and MT
(“MTM”) can lead to reduced cycle times and greater productivity.
These are exciting times for the translation industry. The
best part is that instead of threatening the value of human
translators, these new technologies increase it.