Transfer-based machine translation
By Wikipedia,
the free encyclopedia,
http://en.wikipedia.org/wiki/Transfer-based_machine_translation
Transfer-based machine translation is a type of
machine translation. Like interlingua-based translation, it
relies on an intermediate representation of the source text,
and it is currently one of the most widely used methods of
machine translation.
Overview
Transfer-based and interlingua-based machine translation
share the same idea: to make a translation it is necessary
to have an intermediate representation that captures the
"meaning" of the original sentence in order to generate
the correct translation. In interlingua-based MT this intermediate
representation must be independent of the languages in question,
whereas in transfer-based MT it has some dependence on
the language pair involved.
The way in which transfer-based machine translation systems
work varies substantially, but in general they follow the
same pattern: they apply sets of linguistic rules which
are defined as correspondences between the structure of
the source language and that of the target language. The
first stage involves analysing the input text for morphology
and syntax
(and sometimes semantics)
to create an internal representation. The translation is
generated from this representation using both bilingual
dictionaries and grammatical rules.
With this translation strategy it is possible to obtain
fairly high-quality translations, with accuracy in the region
of 90%, although this depends heavily on the language
pair in question, for example on how closely related the
two languages are.
How it works
In a rule-based machine translation system the original
text is first analysed morphologically and syntactically
in order to obtain a syntactic representation. This representation
can then be refined to a more abstract level putting emphasis
on the parts relevant for translation and ignoring other
types of information. The transfer process then converts
this final representation (still in the original language)
to a representation of the same level of abstraction in
the target language. These two representations are referred
to as "intermediate" representations. From the target-language
representation, the analysis stages are then applied in reverse
to generate the target text.
Analysis and transformation
Various methods of analysis and transformation can be used
before obtaining the final result. Statistical approaches
may also be combined with these methods, producing hybrid
systems. The methods chosen and their relative emphasis depend
largely on the design of the system; however, most systems
include at least the following stages:
- Morphological analysis. Surface forms of the
input text are classified by part of speech (e.g. noun,
verb) and sub-category (number, gender, tense, etc.).
All of the possible "analyses" for each surface form are
typically output at this stage, along with the lemma
of the word.
- Lexical categorisation. In any given text some
of the words may have more than one meaning, causing
ambiguity in analysis. Lexical categorisation uses the
context of a word to determine its correct meaning in
the input. This can involve part-of-speech tagging and
word sense disambiguation.
- Lexical transfer. This is essentially dictionary
translation: the source language lemma (perhaps with sense
information) is looked up in a bilingual dictionary and
the translation is chosen.
- Structural transfer. While the previous stages
deal with words, this stage deals with larger constituents,
for example phrases and chunks. Typical features of this
stage include gender and number agreement, and re-ordering
of words or phrases.
- Morphological generation. From the output of
the structural transfer stage, the target language surface
forms are generated.
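
The five stages above can be sketched as a small pipeline. This is only an illustrative toy, not the design of any particular system: the Spanish-to-English lexicons, the `Analysis` record, the function names and the single disambiguation and re-ordering rules are all assumptions made up for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Analysis:
    lemma: str
    pos: str          # part of speech
    number: str = ""  # "sg", "pl", or "" when not marked


# Hypothetical toy lexicons; real systems use large dictionaries.
MORPH = {
    "las": [Analysis("el", "det", "pl"), Analysis("lo", "pron", "pl")],
    "casas": [Analysis("casa", "noun", "pl"), Analysis("casar", "verb", "sg")],
    "blancas": [Analysis("blanco", "adj", "pl")],
}
BILINGUAL = {("el", "det"): "the", ("casa", "noun"): "house", ("blanco", "adj"): "white"}


def analyse(text):
    """Morphological analysis: return every analysis of each surface form."""
    return [MORPH[form] for form in text.lower().split()]


def categorise(candidates):
    """Lexical categorisation: pick one analysis per word from its context."""
    chosen = []
    for options in candidates:
        prev = chosen[-1].pos if chosen else None
        # Toy rule: directly after a determiner, prefer a noun reading.
        preferred = "noun" if prev == "det" else None
        chosen.append(next((a for a in options if a.pos == preferred), options[0]))
    return chosen


def lexical_transfer(units):
    """Lexical transfer: look each source lemma up in the bilingual dictionary."""
    return [Analysis(BILINGUAL[(u.lemma, u.pos)], u.pos, u.number) for u in units]


def structural_transfer(units):
    """Structural transfer: re-order noun + adjective and adjust agreement."""
    out = list(units)
    i = 0
    while i < len(out) - 1:
        if out[i].pos == "noun" and out[i + 1].pos == "adj":
            out[i], out[i + 1] = out[i + 1], out[i]
            i += 2
        else:
            i += 1
    # English determiners and adjectives are not marked for number.
    return [u if u.pos == "noun" else Analysis(u.lemma, u.pos) for u in out]


def generate(units):
    """Morphological generation: produce target-language surface forms."""
    return " ".join(
        u.lemma + "s" if u.pos == "noun" and u.number == "pl" else u.lemma
        for u in units
    )


def translate(text):
    return generate(structural_transfer(lexical_transfer(categorise(analyse(text)))))


print(translate("las casas blancas"))
```

Keeping each stage as a separate function mirrors the description above: the ambiguous form "casas" (noun or verb) leaves the analysis stage with both readings and is only resolved during lexical categorisation.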
Transfer types
One of the main features of transfer-based machine translation
systems is a phase that "transfers" an intermediate representation
of the text in the original language to an intermediate
representation of the text in the target language. This can
work at one of two levels of linguistic analysis, or somewhere
in between. The levels are:
- Superficial transfer (or syntactic). This level
is characterised by transferring "syntactic structures"
between the source and target languages. It is suitable
for languages in the same family or of the same type,
for example between Romance languages such as Spanish,
Catalan, French and Italian.
- Deep transfer (or semantic). This level constructs
a semantic representation that is dependent on the source
language. This representation can consist of a series
of structures which represent the meaning. In these transfer
systems predicates are typically produced. The translation
also typically requires structural transfer. This level
is used to translate between more distantly related languages,
or languages which have no genetic relationship at all
(e.g. Spanish-English or Spanish-Basque).
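
As a rough illustration of deep transfer, the sketch below represents a sentence as a predicate with labelled arguments and transfers that structure rather than the surface word order. The `Predicate` class, the role names and the two small lexicons are hypothetical, chosen only for this example. The Spanish sentence "Me gusta la música" is a useful case because its grammatical subject ("la música") becomes the object in English.

```python
from dataclasses import dataclass


@dataclass
class Predicate:
    name: str
    args: dict  # semantic role -> filler lemma


# Hypothetical transfer lexicons (Spanish -> English) for this example only.
PREDICATES = {"gustar": "like"}
FILLERS = {"yo": "I", "música": "music"}


def deep_transfer(pred):
    """Map a source-language predicate structure to a target-language one."""
    return Predicate(
        PREDICATES[pred.name],
        {role: FILLERS[filler] for role, filler in pred.args.items()},
    )


def generate_english(pred):
    """Toy generator: realise the experiencer as subject, the theme as object."""
    return f'{pred.args["experiencer"]} {pred.name} {pred.args["theme"]}'


# Assumed analysis of "Me gusta la música".
source = Predicate("gustar", {"experiencer": "yo", "theme": "música"})
print(generate_english(deep_transfer(source)))
```

Because the roles are explicit, the generator can choose the English subject without reference to Spanish word order, which a purely syntactic transfer rule would have to handle with a special case.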