The Interpretive Model and Machine Translation
By
Mathieu Guidere
Master in Arabic language and literature and Ph.D in Translation Studies and Applied
Linguistics from the University of Paris-Sorbonne,
Lyon 2 University - France
Saint-Cyr Research Centre, France
mathieu.guidere@univ-lyon2.fr
http://perso.univ-lyon2.fr/~mguidere
For a long time, translation formed part of
linguistic studies (see G. MOUNIN's works). However, during the last few decades, it
has been institutionally associated with Language Sciences, which represent a
vast and very dynamic field in which interdisciplinarity plays a key role.
This association
has led to the emergence of a translation science (traductology or translation studies)
within the field of Language Sciences, one which deals not with translations as such
but with translation operations and processes, thus
reflecting the change in perspective adopted to approach the object of study.
Our aim is to put
forward an epistemological analytical grid of the field in question i.e. the works
related to the analytical study of translation and its natural processing as a prelude to
machine translation or computer-assisted translation. However, delimiting a field requires
one or several perspectives in order to define its axes, issues, methods and aims.
Therefore, a
broad outline of the theoretical conflict between the issues of meaning and translation
will first be laid down. We will then explain how this conflict transcends logical
formalization. The aim of devising a theory is to set the translation pedagogy free from
the interpretive model. Finally, these issues will be reexamined in order to
reuse the data for natural language processing (machine translation and computer-assisted
translation).
To set up the
analytical grid, we will have recourse to three basic fields related to scientific
methodology: the observation field, the hypothesizing field and the validation field (see
Auroux's works). The purpose here is not to compare the observed approaches or to
express any value judgment concerning them, but to tackle them from the natural language
processing perspective as a step prior to translation, because this perspective as well as
its implementation are part of an objective process, meaning that they merely
draw up assessments about specific data. In other words, we are first and foremost
concerned with the observation, well sustained by descriptions and validated data, so as
to put these works into perspective and to draw up a specialization field in the light of
the principles introduced below.
The
methodological choices concern the perspectives selected to analyze works on translation
in this study. These choices put the discipline at the crossroads of theoretical
linguistics and scientific empiricism, on the basis that the effects of a theory are
commensurate with the resulting application. For this reason, the study object,
translation, will be tackled in a descriptive way, i.e. as it is practiced and
evolves professionally. But this object must be defined by new analytical protocols and
imperatively has to move away from the prevailing interpretive models.
By taking these
postulates into consideration, we will only describe attested works (the corpora of
translated and published texts) and use what are regarded, in these works, as empirical
elements which can be subject to a corroborative or validation test. But this
aim does not rule out the possibility of making observations of a different type using
information not contained in our corpora and works.
The data for
analysis can be divided into three main categories. First of all, electronic texts
associated with observations. Secondly, a computerized system of hypotheses and
indications. Finally, validation applications relating hypotheses and linguistic data
arising from observation.
Our work is
essentially based on three tools: electronic texts grouped into machine-readable corpora,
work tools for observing and classifying linguistic data and corroborative tools to
validate the observation results.
The corpora used in
this study had to meet Sinclair's criteria. The observation of linguistic data should
lead to the constitution of a study object in accordance with a specific and sustained
extraction protocol. Results arising from the observation must be remarkable,
meaning that they should reveal high frequency usages and occurrences in the reference
corpus.
Consequently, our
attention was turned to real final works (texts, sentences, expressions, terms) and not to
practices relevant to language usage (speaking, writing, memorizing). The idea was that
these speech practices cannot be subject to the rigorous imperatives of data examination
and that only observed works allow the application of objective procedures. This does
not mean, however, that what is observed reveals nothing of what is happening in the
speaker's mind[1].
This separation
between data and practices finds its counterpart in the field of
computer science, in the separation between the declarative and the
procedural[2].
For the moment, we have to decide which type of data must be observed and, specifically,
which phrases and terms are potential subjects for a systematic study of translation.
Until now, our
approach has been based on the empirically verified postulate that the corpus texts used
for examining data represent well-formed and subsequent phrases respecting specific
constraints, therefore allowing us to distinguish a discourse construction from an
anarchic set of phrases lacking coherence and consistency.
This starting
point is important because it puts a great deal of emphasis, in the observation and
analysis, on the significance of textual linguistics in comparison with theoretical and
general linguistics. This means that we are pursuing several objectives: first,
recognizing a text from a series of phrases with no logical or semantic link between them,
secondly, tagging the content of the text from a typological point of view (technical,
journalistic, etc.) and finally, classifying the information extracted according to a
previously defined protocol and linguistic criteria.
To achieve these
objectives, not only must an observation methodology be adopted but results should also be
expressed in appropriate language. Therefore, in a text, we must learn to observe, on the
one hand, phrases according to the three levels of analysis (morphological, semantic,
syntactic) and, on the other, relationships between phrases according to the discourse
type (argumentative model or textual anaphora).
Once the
methodology has been adopted, some work hypotheses can be made while referring to three
main axes: firstly, the type of formalism used, secondly, the linguistic extension or
portability[3]
and finally the aim or objectives of the analysis.
Concerning the first axis, the choice was made to
make the results more explicit while setting up hypotheses in a form which could be
computerized, i.e. likely to be represented by an algorithm and read by a machine.
This is the peculiarity of formalization[4]
that we wanted to be specific to the constraints of the translation process.
In this respect,
there are two ways of formalizing linguistic data: one is totally independent
from the computerized tool which processes data afterwards and uses explicit instructions
in the form of standard rules; the other is based on the formal possibilities of
machine-readable algorithms to represent the linguistic information. However, both ways
are often so complementary that we should start with the first way before tackling the
second. In both cases, machine-readable linguistic formalisms are obtained at the end of
the procedure.
Concerning the
second axis, we decided to choose, as a starting point, a source text (ST) and, as a
finishing point, a target text (TT) in order to examine, in a contrastive way, their
interactions through a range of structures of varying complexity which needed to be
described and extracted. Once the structure is applied to the ST, in accordance with a
specific protocol, it is simply searched for and validated in the TT. Hence, this is a
source-oriented point of view of the linguistic extension.
It should be
mentioned at this point that translation studies distinguish two points of view in the
practice and analysis of translations: the source-oriented point of view which
favors the specificities and requirements peculiar to the source text (faithfulness,
literality) and the target-oriented point of view which favors the target text
(rewording, adaptation).
Concerning the
third axis (the aim of the analysis), it should be noted that we already have the
inputs and the outputs, i.e. we already know the results of
the operation before even starting the formalization and implementation of the program
because we are working on text corpora which have previously been translated and
synchronized. The goal of this application is to show that the program runs in accordance
with given specifications. In other words, the program implementation is mainly a
validating procedure for the observation results[5].
In the light of
these elements, it ought to be mentioned that in the field of machine translation (MT),
the issue of linguistic extension is essential and deserves closer attention.
It can be stated as follows: in a linguistic system A, the information associated with a
subgroup of translation units (sentences and expressions) shows a certain regularity and
coherence likely to be systematized and computerized. The question is whether the
properties of system A, while maintaining the same underlying coherence, can be
extended to a system B in such a way that the source units of translation have adequate
equivalents in the target language. If this is possible and the modifications to be
introduced do not affect the internal coherence of system B, we may then say that
system A and its subgroup of units are linguistically extensible, meaning that they are
transferable by a computerized translation.
Let us take an
example likely to be described adequately by a grammar in spite of its complexity and
semantic ambiguity. The sentence "The Minister of Education met his Interior
counterpart" can easily be translated by a human into any language. However, to be
translated by a machine, its linguistic properties must be extensible to the system which
will receive it. In this particular case, for example, the possessive
phrase (the genitive construction in Arabic) should be transferable and the ellipsis
in the recurrent phrases ("the minister of something") should be acceptable in both cases
without major modifications. Moreover, the issue of predication poses thorny
problems concerning portability between two linguistic systems as different as
English and Arabic.
To avoid making
problems of correspondence between languages insurmountable, very detailed linguistic
indications must be provided to reach the next level of formalization as a prelude to
computerization. A machine-readable system of equivalences is thus a set of linguistic
formulae in which every formula specifies at least one pair of phrases (see the holistic
perspective of translation).
As an example,
let us take a set of expressions (SES) in a source text (TEX) such that every expression
(EXP) can be associated with one or several indications (IND) shared by all expressions
(EXP) of the set (SES) in the text (TEX). This gives the following formula: SES = {IND,
EXP, TEX}.
On the basis of
this formula, an equivalent formula, valid for the target text, can be obtained: SES1 =
{IND1, EXP1, TEX1}. This formula is justified with regard to a set of expressions with
relevant linguistic features in common in the target text without necessarily being
equivalent to those of the source text on the structural level. There is no systematic
projection of the properties of one system onto another. If there is projection, it must
inevitably be done in accordance with a grammatical principle whose formulation is subject
to the calculation (formal or algorithmic) which underlies all expressions in the text. In
this way, a linguistic property may or may not be projectable, just as a system
may or may not be portable, depending on whether sequences can be translated from
one language to another.
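
The two formulae can be pictured as a small data structure. The following is only a minimal sketch under stated assumptions: the class name `ExpressionSet`, the function `projected_indications`, the feature labels ("genitive", "ellipsis") and the target-side placeholders are hypothetical and are not taken from the study itself.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ExpressionSet:
    """SES = {IND, EXP, TEX}: expressions of one text sharing the same indications."""
    indications: frozenset  # IND: linguistic indications common to all expressions
    expressions: tuple      # EXP: the expressions themselves
    text: str               # TEX: identifier of the text they belong to


def projected_indications(source, target):
    """Return the indications of the source set that are also found in the target set."""
    return source.indications & target.indications


# Source-side set (feature labels are illustrative assumptions).
ses = ExpressionSet(
    indications=frozenset({"genitive", "ellipsis"}),
    expressions=("Minister of Education", "his Interior counterpart"),
    text="TEX",
)

# Target-side set; the expressions are placeholders, not real translations.
ses1 = ExpressionSet(
    indications=frozenset({"genitive"}),
    expressions=("<target expression 1>", "<target expression 2>"),
    text="TEX1",
)

print(sorted(projected_indications(ses, ses1)))
```

In this toy case only one indication is shared by both sets, which illustrates the point made above: the properties of one system are not projected onto the other systematically.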
By adopting this
formalist point of view in translation, explicit criteria for the comparison
of texts are laid down, each dissected and expressed in the form of adequate equations.
According to this method of analyzing translation, there is no equivalence
between languages but only correspondence of structures and linguistic
features. As opposed to equivalents which can be analyzed according to the
similarity criterion, correspondents are pairs of objects different on the
form level but comparable on the function level.
The characterization of
these correspondents, which include semantic imprecision, mainly derives from
the choices made during the observation stage. Which comparison elements should be
adopted? Of course, we exclude from our criteria any subjective consideration concerning
the beauty or the elegance of the translation to be used for
machine translation.
Our
approach can be associated, from a theoretical point of view, with textual linguistics
with significant recourse to the principle of contrastivity and formalization.
In the framework
of this approach, texts taken as a study backup are classified according to the sources
which have produced and distributed them (for instance a paper or an official body) and
according to their denotative field based on explicit semantic considerations (for
instance, texts about law or health issues).
Once the field
and the type of the text have been well defined, observations focus, on the one hand, on
its segmentation and on the constituents of its syntax (the chunks), and on
the other, on the links between those constituents from a morphological and semantic point
of view.
Underlying
calculations ensure the validation of this approach from a theoretical and practical point
of view. Thus, the choice of textual units to be analyzed and formalized must be made
according to specific concepts such as those of recurrence,
coverage and precision. Statistics is used to detect the most
frequent linguistic and translational usages of a structure in a study corpus and to form
the description which must tell us about the most relevant elements.
Hence,
observation deals with what is immediately accessible in the phrases under study, while
semantics is not tackled at this point. The use of training corpora and the induction of
descriptions are at the heart of the textual approach. The main stages of analysis are the
following (reasoning from particular facts to a general conclusion):
1) Segmentation and morphological analysis;
2) Disambiguation of morphological categories;
3) Local and textual syntactic analysis;
4) Analysis of functional syntactic relations.
The main
difficulty of the analysis before translation is still the disambiguation of the original
textual context. This difficulty is essentially related to the problem of sentence
delimitation in order to eliminate the potential syntactic relations for a given type of
rules (i.e. the morphosyntactic rules or chunking rules). This problem becomes
much more salient during machine analysis of texts because difficulties resulting from the
ambiguities of morphosyntactic tagging combine with those of segmentation. With current
formalisms, it is difficult to automatically reduce the generation of intrusive
analyses, which will inevitably be a problem during translation (see Chanod's
works).
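
The way tagging ambiguities multiply can be illustrated with a toy enumeration. This is a sketch only: the lexicon, the category labels and the function `candidate_analyses` are assumptions made for the illustration, and the English example sentence is a classic textbook case rather than one drawn from the corpora discussed here.

```python
from itertools import product

# Hypothetical toy lexicon: each word form maps to its possible categories.
LEXICON = {
    "time": ["NOUN", "VERB"],
    "flies": ["NOUN", "VERB"],
    "like": ["VERB", "PREP"],
    "an": ["DET"],
    "arrow": ["NOUN"],
}


def candidate_analyses(tokens, lexicon=LEXICON):
    """Return every tag sequence the lexicon allows, before any disambiguation."""
    options = [lexicon[token] for token in tokens]
    return [tuple(tags) for tags in product(*options)]


tokens = "time flies like an arrow".split()
analyses = candidate_analyses(tokens)
print(len(analyses))  # 2 * 2 * 2 * 1 * 1 = 8 candidate sequences
```

Only one of the eight sequences corresponds to the usual reading; the others are the kind of intrusive analyses that a disambiguation step has to discard before translation.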
Nevertheless,
research into textual linguistics is opening the way to an inductive process of
translation. It is becoming possible to formulate inductive generalizations like those of
linguistic correspondences which are actually observed. However, to advance
research, it is imperative to systematically implement corroborative tests able to measure
the validity of the adopted rules.
Limits of Interpretation in Machine Translation
One of the
fundamental issues regarding the translation approach is still that of the principles allowing
the interpretation of the meaning to be translated. The perspective adopted here for
analyzing translations holds that a specific translation mechanism intervenes
in the interpretation of phrases and that the general principles associated with
interpretation are insufficient. However, this mechanism should be amended to take into
consideration the linguistic markers (tense, mood, linking words, verbal and nominal lexicon)
contributing to the interpretation of the phrases and discourse to be translated. We lay out
here a general framework of formal representation, the theory of translational formalisms, and
an interpretive translation model, the model of contextual deductions, to
specifically examine the question of translational equivalences. We will demonstrate how
this approach could be applied to natural language processing as a prelude to translation
(CAT and MT).
In fact, a few
years ago, new directions in linguistics and semiotics began redefining interpretation in
translation and regarded it as an act of cognition passing through a comparative process
of possible equivalences. The idea of setting the record straight about interpretation in
translation meets the need to adjust practical observations to these new theoretical
directions.
To establish the
elements of the debate, we must start with texts from Umberto Eco's book Les Limites
de l'interprétation. The author notes, in his introduction, that some have
pushed the interpreter's initiative so far that the problem today is to avoid
falling into misinterpretation. And he later adds in his book: "All in all, to
say that a text has no end does not mean that {every} act of interpretation has a happy
ending." This is why the author strives to restore a certain dialectic between the
rights of the reader-translator and the rights of the text to be translated.
Using the message
"Dear friend, in this basket brought by my slave, there are thirty figs I send you as
a gift", Umberto Eco gives a range of significations and referents, but he asserts
that we do not have the right to say that the message could mean anything. It could mean a
lot of things, but it would be hazardous to suggest just any meaning. Asserting this fact means
admitting that phrases have a literal meaning: "I know how heated is the
controversy in this respect, but I still maintain that, within the limits of a given
language, there is a literal meaning for the lexical items, the one dictionaries mention
first." Eco says we must set out to define a kind of swinging, an unstable balance,
between the interpreter's initiative and faithfulness to the text. The
functioning of a text can be understood by taking into consideration the part played by
the addressee in the process of its comprehension, realization and interpretation as well
as the way the text itself projects the participation of the reader.
The debate on
interpretation in translation is based on two approaches: on the one hand, searching for what the author meant to say in the
text[6];
on the other, searching for what the author says in the text, regardless of his
intentions, either by relying on textual coherence or on the signification systems of the
addressee. However, in all cases, one must use the literal meaning to develop a
translation.
Translation
criticism tries to explain the reasons why the text gives the former meaning or the
latter. The number of versions a translator can come up with is potentially unlimited but,
at the end of this process, each one of them should be tested with respect to the textual
and linguistic coherence, thus rejecting precarious or approximate translations.
Therefore, a text lends itself to numerous readings without allowing all possible
translations. If we cannot tell which translation is the best for a text, we can,
however, tell which are incorrect. Every act of translation is a difficult transaction
between the translator's competence and the type of competence a given text needs to
be translated in a rigorous and coherent way. Between the unreachable intention of the
author (what he meant to say) and the arguable intention of the reader-translator (his
interpretation), there is the transparent meaning of the text, which refutes any inadequate
or unacceptable translation.
It is difficult
to determine what is wrong and what is authentic in a translation, because definitions
depend on the issue in question. Nevertheless, in all cases, a sufficient condition for
an incorrect meaning is the assertion that phrases from the source text have many
equivalents in the target text. Thus, a translation is erroneous not because of its internal
properties but because of an alleged multiple identity between the source and the target.
Therefore, the
sentence "All translators love foreign languages", for example, does not have
many parallel meanings, but in practice it accepts several possible translations[7].
On the other hand, it is impossible to reasonably conclude that all these equivalences are
identical, structurally speaking, and regardless of the subjective perception of
individuals who have produced them.
These different
translations are not merely different wordings of the same idea. Each structure stylistically
expresses a different meaning. Consequently, we cannot say that a nominal sentence and a
verbal sentence convey the same idea and express the same meaning, even if the words used
are identical in the two structures. We know predication is not the same in both cases
because the nominal sentence emphasizes the noun whereas the verbal sentence focuses on
the process or the action. To declare that two structurally different translations are
equivalent to a third original structure is to simply ignore the specificities of the
linguistic structures in expressing nuances and meaning subtleties.
To be convinced
of the validity of these observations, retro-translation could be used as a
discriminating criterion between translations. Retro-translation means
retranslating into the source language, without resorting to the original, the version
already translated into the target language. Translating the translated version back,
blindly, often reveals that the structure obtained is not
the one taken as a starting point for translation, demonstrating the inaccuracy of the
aforementioned translation.
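
The retro-translation test can be sketched as a simple round-trip check. The sketch below rests on several assumptions: the target versions are represented by opaque labels ("TT-verbal", "TT-nominal") instead of real Arabic sentences, the back-translations are invented for the illustration, and strict string identity stands in for the structural comparison a translator would actually make.

```python
def survives_round_trip(source, target, back_translate):
    """True if blindly translating `target` back yields the original `source`."""
    return back_translate(target) == source


# Hypothetical back-translations, produced without access to the original.
BACK_TRANSLATIONS = {
    "TT-verbal": "All translators love foreign languages",
    "TT-nominal": "The love of foreign languages is shared by all translators",
}

source = "All translators love foreign languages"
results = {
    target: survives_round_trip(source, target, BACK_TRANSLATIONS.get)
    for target in BACK_TRANSLATIONS
}
print(results)
```

Here only the first version leads back to the starting structure; the second is discarded by the criterion even though it conveys a closely related idea.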
The notion of
possible equivalence is useful for a translation theory because it helps to
decide which meaning interests the translator in his work and what he wants to convey
through language. But we must be aware of the fact that, among possible translations,
there are inevitable translations, improbable translations, and inadmissible
translations.
In a sentence
such as "All translators love foreign languages", the translator must think of
the best way of rendering it in the target language. He will first think in relation to
the three levels of language: morphological, semantic, and syntactic. The inevitable
translation will take into consideration these levels while being linguistically correct
and culturally appropriate. The improbable translation will move away from literal
accuracy in an over-translation of the original or create a certain stylistic effect.
Finally, the inadmissible translation will give a semantically different version of
the original while being linguistically accurate.
In this regard, a
distinction should be made between semantic translation and critical
translation. The first is the result of the technique adopted by the translator,
when faced with the linear progression of a text, of giving a certain meaning in
accordance with the lexicon of its phrases, whereas the second is a metalinguistic
activity aimed at describing and explaining, on the formal level, why a given text gives a
given translation, with the exception of all others, however sensible they are.
An exemplary
translator is not only required to be precise and meticulous but also to pay great
attention to the stylistic subtleties of both his work languages according to the
principle that every wording has its own meaning and aim in the linguistic system using
it (the economy of language principle). If the exemplary translator acts
as such, he will produce a consensual translation without any subjective value judgment.
Otherwise, he will be compelled to search in vain for possible meanings and potential ways
of rendering them.
Some translators
may wonder: "Why be so rigorous if the meaning is understood and conveyed?"
However, such translators, indulgent or careless as the case may be, will not be
exemplary translators, because exemplary translators seek the exact meaning and the
inevitable translation, the one likely to be taken and modeled for natural language
processing. But how can we
achieve this goal when faced with so many readings and interpretations?
According to the
semiotician Peirce, the interpretation of meaning is an action involving the cooperation of
three subjects: the sign (e.g. the word "rose"), its object (the real tangible
flower) and its interpretant (the concept of the red flower). What is important in
Peirce's definition is that it does not take into consideration an interpreter or
conscious subject. Hence, it should be remembered, in accordance with the analyses of
Peirce and Eco, how important the distinction is between the meaning system (the sign system)
and the process of communication (which requires the presence of an interpreter).
The meaning
system is a series of elements with a combinatory rule governing the disposition of
elements between them (its syntax). The acceptable sequences of a syntactic system
associated with another system can be transferable from one language to another (e.g.
"w+a+t+e+r" = "water" = "drinkable transparent liquid" is transferable into any
language in the world without recourse to human interpretation).
In a semiotic
system, any content could become a new expression likely to be interpreted or translated
by another expression in another language. Abduction is a form of inference which
tries to accurately interpret the meaning of a phrase and to establish a rule using a word
and its context. Recognizing a series of words as a coherent sequence (i.e.
as a text) means finding a textual theme able to create a coherent connection between
different data with no link between them. The identification of a textual theme is an
example of an abduction. Every translator makes abductions to choose between numerous
possible readings of a text. The economy of language criteria compel us to always choose
the easiest option in the absence of any other selection tool.
For a Corpus-based Translation Methodology
The method adopted is the method of formal linguistics
but the approach suggested here[8]
is based on three postulates of corpus linguistics:
Firstly, all translation solutions already exist in
translated texts.
Secondly, all translational equivalences are subject to
analysis and formalization.
Thirdly, all formalizations are systematized and
computerized.
This approach is primarily applied to specialized texts
(with a controlled or closed vocabulary) and secondarily to general texts (with a connoted
and polysemous vocabulary). The first category comprises the vast majority of literature
translated nowadays, whereas literary (or poetic) language represents a tiny part of the
discursive usage in textual corpora.
This approach is aimed at determining translations
likely to be formalized. To identify the most relevant translation solutions, we have
recourse to the calculation of occurrence frequency: the more an equivalent is frequent in
translated texts, the more it is regarded as an inevitable solution; the less frequent it
is in translation, the more it is regarded as marginal.
We can mention, for example, for the sentence "All
translators love foreign languages", five different ways of translating it into
Arabic. The most frequent wording would be considered as inevitable, regardless of its
intrinsic quality, because we deem its recurrence to be proof of its validity, not to
mention its legitimacy. The goal is not really to evaluate the quality of translations but
rather to take note of the translational usage.
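
The frequency criterion described above can be sketched in a few lines. The function name `rank_equivalents` and the labels "variant_A", "variant_B" and "variant_C" are hypothetical placeholders for attested Arabic wordings, and the counts are invented for the illustration; the sketch only shows how occurrence counts would order the candidates.

```python
from collections import Counter


def rank_equivalents(observed):
    """Rank the observed equivalents of one source unit by relative frequency."""
    counts = Counter(observed)
    total = sum(counts.values())
    return [(equivalent, n / total) for equivalent, n in counts.most_common()]


# Hypothetical observations: one entry per occurrence found in translated texts.
observed = ["variant_A"] * 6 + ["variant_B"] * 3 + ["variant_C"]

ranking = rank_equivalents(observed)
inevitable = ranking[0][0]                      # most frequent wording
marginal = [eq for eq, _ in ranking[1:]]        # everything less frequent
print(inevitable, marginal)
```

The ranking says nothing about the intrinsic quality of each wording; it only records translational usage, in line with the stated goal.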
Thus, corpus-based translation is based on three
principles or main presuppositions:
1) The immanence principle: each pair of texts
forms the same composite element of signification; the analysis examines both texts but
only as translations of each other; it does not rely on external data such as dictionary
information or grammars.
2) The composition principle: the only true
meaning is through and in the relationship between the two texts, especially
the correspondence relationship between translation units; the analysis of
bitexts consists, therefore, of establishing the correspondence network
between different elements, a network which will be the basis for the text translation.
3) The structuring principle: every translation
respects a discursive logic and a grammar, i.e. a certain number of linguistic
rules and basic structures. In a set of units named "translations" there are
different levels of correspondence, each with their own grammar.
Therefore, the global content of a translation can be
analyzed on three different levels:
1) The translation level: in a translation, we
study the changes which convey the meaning of the source text to the target text. At the
end of a translation process, the analysis seeks to redraw the various stages, logically
related to one another, which mark the transformation of a sentence into its equivalent.
In each stage, we specify the links between the functions of some of the phrasal elements
which determine the meaning and produce the transformations.
2) The discursive level: the analysis involves
three operations: (a) identifying and classifying sequences, i.e. significant
elements in a text; (b) establishing equivalents to each element in the text in order to
determine how this element was translated in the text; (c) finding why elements, in a
given text, are translated in such and such a way.
3) The logic-semantic level: it is the most
abstract level of analysis. It works on the postulate that logic and meaningful forms
underlie translations of any speech. At this level, analysis means specifying the logic
which manages fundamental articulations of translation units. To do so, we must have
recourse to formalization and representation of relations within and between sentences.
The thinking in translation studies currently seems
limited to two correlative paradoxes. On the one hand, the pragmatism of the
interpretive model obviously tends to over-reduce the method and to
sacrifice precision for the sake of communication, and accuracy for the sake of rapidity.
On the other, the opposition between the logic paradigm and the hermeneutic paradigm
reduces translation pedagogy to a kind of sophisticated mnemonics without any real
applicable or metalinguistic dimension.
In this tense field, our translation theory evolves
between the two paradigms (the requirement for interpretation and the necessity for
formalization). It questions interpretive practices in the process of translation. In
fact, the interpretation issue seems today to be the linking point between text theories
and translation theories. In our discipline, this issue is nowadays the main controversial
element for establishing a new applicable translation methodology.
We suggest below a preliminary draft of the translation
work which could be requested from novice translators.
1) Alignment and Criticism of Translated Corpora
Aligning corpora means matching every translation
unit of the source corpus to an equivalent unit of the target corpus. In this case,
the term "translation unit" covers long sequences like chapters or paragraphs as
well as shorter sequences such as sentences, phrases or simply words.
The translation unit selected depends on the point of
view chosen for the linguistic analysis and on the type of corpus used as a database. If
the translated corpus is very faithful to the original, we will proceed with a close
alignment of the two corpora, taking the sentence or even the word as the basic unit,
whereas if the corpus used is an adaptation rather than a literal translation, we will
align larger units such as paragraphs or even chapters.
It is obvious that the initial postulate, which allows
an educational use of such corpora, is to establish correspondence between the content of
examined units and their interconnections. So-called free translations should
prompt sustained reflection on missing sequences, changes in the text order, content
modification, meaning adaptation, etc. All these operations are common in everyday
translation practice but their frequency varies according to the fields of study.
Furthermore, there are important structural differences
between English and Arabic which prevent rigorous sequential processing. Due to the huge
linguistic difference between the two systems, we often notice that the sentence order has
been modified and that omissions or additions sometimes occur between two texts which are
nonetheless translations of each other. These aspects must be examined from a stylistic
point of view and, if possible, systematized.
All these observations lead us to consider parallel
corpora not so much as a set of equivalent sequences but rather as corresponding text
databases. At any level (text, paragraph, sentence, phrase or word), the examined
corpus should be regarded as a lexical and translation database. In other words, we
suggest submitting it to a search technique similar to the one used in
information searching systems.
Thus, the main goal will be to highlight structural
equivalences between the two languages, and, more pragmatically, to search for the closest
T2 (the target text) unit to the request represented by a T1 (the source text)
unit.
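As a rough illustration of this search, the following Python sketch treats a T1 unit as a request and ranks T2 units by the share of the request's words that have a dictionary equivalent in each unit. The function names, the scoring rule and the toy lexicon are assumptions made for the example and are not part of the method described above; the transliterated words are borrowed from the examples given later in this article.

```python
def overlap_score(source_tokens, target_tokens, lexicon):
    """Share of source tokens with at least one lexicon equivalent in the target unit."""
    if not source_tokens:
        return 0.0
    target_set = set(target_tokens)
    hits = sum(1 for tok in source_tokens if lexicon.get(tok, set()) & target_set)
    return hits / len(source_tokens)


def closest_target_unit(source_unit, target_units, lexicon):
    """Return (index, score) of the T2 unit that best answers the T1 request."""
    query = source_unit.lower().split()
    scores = [overlap_score(query, unit.lower().split(), lexicon) for unit in target_units]
    # On ties, max() keeps the first (earliest) unit.
    best = max(range(len(target_units)), key=lambda i: scores[i])
    return best, scores[best]


# Hypothetical toy lexicon and T2 units, for illustration only.
LEXICON = {
    "unemployment": {"al-bitâla"},
    "march": {"mâris"},
    "in": {"fî"},
    "north": {"al-shamâl"},
}
T2_UNITS = ["izdiyâd al-bitâla fî mâris", "an tumtira fî al-shamâl"]

best_index, best_score = closest_target_unit("unemployment in March", T2_UNITS, LEXICON)
```

A real system would replace the overlap score with a weighted measure and a full bilingual dictionary; the point here is only that the source unit plays the role of a query over the target database.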
2) The Linguistic and Stylistic Analysis of the Corpus
The different levels of linguistic analysis serve as a
basis to study translation examples:
- Firstly, morphological analysis identifies equivalent words or morphemes in the corpus.
- Secondly, syntactic analysis identifies corresponding phrases and structures in both texts.
- Finally, semantic analysis identifies the meaning of units and possible ambiguities in each text.
The usefulness of such a corpus goes beyond the limited framework of
translation. While the main goal is translation criticism, other useful applications may
also be considered, such as generating bilingual terminology lists, extracting examples for
pedagogic purposes, enhancing current dictionaries or even inducing grammar
rules.
The suggested approach allows us to optimize thinking
in translation studies regarding bilingual texts.
The general idea of this approach is to associate
equivalent translation units (words, sentences, syntactic structures) when the
corpus sequences are identified.
The main goal of such an approach is to allow the
pairing mechanism to be divided into two different parts:
1) Identifying the potentially associable units in the two corpora.
2) Calculating the probability of suggested units by submitting them to the bilingual corpus data.
By dividing the procedure into two phases, relatively
simple translation models can be put in place in order to identify units likely to correlate
the theoretical analysis with actual translations observed in the corpus.
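The two phases can be sketched in Python as follows. This is a minimal illustration under stated assumptions: units are already aligned in pairs, candidates are simply all word pairs that co-occur in an aligned pair, and the probability is a relative frequency. The function names and the toy corpus are hypothetical.

```python
from collections import Counter


def candidate_pairs(aligned_units):
    """Phase 1: every (source word, target word) co-occurring in an aligned pair is a candidate."""
    for source, target in aligned_units:
        for s in set(source.split()):
            for t in set(target.split()):
                yield s, t


def pairing_probabilities(aligned_units):
    """Phase 2: relative frequency P(t | s) = count(s, t) / count(s, any t)."""
    pair_counts = Counter(candidate_pairs(aligned_units))
    totals = Counter()
    for (s, _), count in pair_counts.items():
        totals[s] += count
    return {(s, t): count / totals[s] for (s, t), count in pair_counts.items()}


# Hypothetical toy corpus of aligned units (source, target).
CORPUS = [("march", "mâris"), ("march rains", "mâris tumtira")]
PROBS = pairing_probabilities(CORPUS)
```

Keeping the two phases in separate functions means the candidate generator can later be replaced by a morphological or syntactic filter without touching the probability estimate.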
One of the possible ways of devising operational
systems is to develop analysis methods based on the data stored in training corpora.
But such methods, based on model training, depend on the amount of a priori
available information.
In this respect, a distinction can be made between two
types of situations:
Situation 1: A parallel corpus of
analyzed and annotated translation units is available a priori, i.e. a corpus in which each unit
has been assigned, given its meaning, a syntactic scheme representing its structure.
This first situation, in which a significant amount of
information is available to evaluate the parameters of the equivalence model, will be referred
to as a training situation. A given equivalence will or will not be used, depending on its
frequency of occurrence in the annotated corpus.
Situation 2: Relatively little information is
available, i.e. the corpus is raw. In this case, hypotheses should be made on the
basis of iterative re-estimation of corpus data. For
example, all units starting with the sequence "except that" will be grouped in
order to compare their translations.
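The grouping step mentioned in this example can be sketched in Python. The function name is hypothetical and the target sides are placeholders rather than real translations, since the corpus itself is not reproduced here.

```python
def group_by_opening(aligned_units, opening):
    """Collect the translations of every source unit that starts with the given word sequence."""
    opening_tokens = opening.lower().split()
    n = len(opening_tokens)
    return [
        target
        for source, target in aligned_units
        if source.lower().split()[:n] == opening_tokens
    ]


# Placeholder corpus: the target strings stand in for actual translated units.
RAW_CORPUS = [
    ("Except that the talks failed", "<target unit 1>"),
    ("The talks failed", "<target unit 2>"),
    ("except that nobody came", "<target unit 3>"),
]
GROUP = group_by_opening(RAW_CORPUS, "except that")
```

The resulting group can then be inspected side by side to see how consistently the opening sequence is rendered in the target language.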
It is worth noting in this respect that one of
the advantages of the statistical model, compared to more theoretical approaches of
contrastive linguistics, is that it considerably reduces the number of possibilities of
approximate translations while evaluating the quality of available corpora.
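The article does not name a particular re-estimation algorithm. One well-known statistical instance, assumed here purely for illustration, is the expectation-maximization procedure of IBM Model 1, which starts from uniform word-translation probabilities on a raw aligned corpus and re-estimates them at each pass. The Python sketch below uses a hypothetical two-unit corpus.

```python
from collections import defaultdict


def ibm_model1(pairs, iterations=10):
    """Estimate P(target word | source word) by EM, as in IBM Model 1 (without a NULL word).

    pairs: list of (source_tokens, target_tokens) for aligned units.
    Returns a dict-like object keyed by (target_word, source_word).
    """
    target_vocab = {t for _, target in pairs for t in target}
    uniform = 1.0 / len(target_vocab)
    prob = defaultdict(lambda: uniform)  # uniform starting point

    for _ in range(iterations):
        count = defaultdict(float)  # expected count of (t, s) links
        total = defaultdict(float)  # expected count of links from s
        for source, target in pairs:
            for t in target:
                norm = sum(prob[(t, s)] for s in source)
                for s in source:
                    share = prob[(t, s)] / norm
                    count[(t, s)] += share
                    total[s] += share
        # Re-estimation: normalize the expected counts per source word.
        prob = defaultdict(float, {(t, s): c / total[s] for (t, s), c in count.items()})
    return prob


# Hypothetical raw corpus: two aligned units, already tokenized.
RAW_PAIRS = [
    (["march", "rains"], ["mâris", "tumtira"]),
    (["march"], ["mâris"]),
]
T_PROB = ibm_model1(RAW_PAIRS)
```

On this toy corpus the second unit disambiguates the first: because "march" appears alone with "mâris", successive passes shift the probability mass of "rains" towards "tumtira", which is the sense in which re-estimation reduces the number of candidate translations.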
Hence, an examination of translation possibilities
available in our corpus leads to the following observations concerning the nature of
equivalences:
- cases of strong equivalence in which the
number of words, their order and their meanings in the (bilingual) dictionary are the
same.
Example:
P1: The rise in unemployment in March worries officials.
T1: izdiyâd al-bitâla fî mâris yuqliq al-mas'ûlîn
Literally: (The) rise (in) the unemployment in March worries the officials.
- cases of approximate equivalence in which the
number of words and their meanings are the same but their order is different.
Example:
P1: The President of the Republic received his Syrian counterpart
T1: istaqbala ra'îs al-jumhûriyya nazîrahu al-sûrî
Literally: received (the) President of the Republic his counterpart Syrian.
- cases of weak equivalence in which the order
and number of words are different but their meanings in the dictionary are the same.
Example:
P1: Rains are expected in the North of the country.
T1: yutawaqqaʿu an tumtira fî al-shamâl
Literally: It is expected that it rains in the North
In our bilingual corpus, this last case accounts for
the majority of translation equivalences.
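The three cases can be expressed as a simple decision rule. The Python sketch below assumes that word counting and dictionary matching have been done beforehand (for instance, whether articles and prepositions count as words is left to the analyst), so it only receives the two word counts and a list of links between matched positions. The function name and the link representation are assumptions made for the example.

```python
def classify_equivalence(n_source, n_target, links):
    """Classify a pair of units as 'strong', 'approximate' or 'weak' equivalence.

    n_source, n_target: number of counted words in each unit.
    links: list of (source_position, target_position) pairs whose
           dictionary meanings match; assumed to cover the whole unit.
    """
    same_number = n_source == n_target == len(links)
    # Read target positions in source order and check they never go backwards.
    target_order = [j for _, j in sorted(links)]
    same_order = target_order == sorted(target_order)

    if same_number and same_order:
        return "strong"
    if same_number:
        return "approximate"
    return "weak"


# Same number of words, same order.
STRONG = classify_equivalence(3, 3, [(0, 0), (1, 1), (2, 2)])
# Same number of words, verb moved to the front in the target.
APPROXIMATE = classify_equivalence(3, 3, [(0, 1), (1, 2), (2, 0)])
# Different number of words.
WEAK = classify_equivalence(4, 3, [(0, 1), (1, 0), (3, 2)])
```

Counting how often each label is returned over the whole corpus gives the kind of distribution reported above.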
A decreasing alignment of the bilingual corpus is used
to ensure the greatest possible reliability for the searching operation, from the largest
translation units (chapters and paragraphs) to the smallest ones (sentences followed by
phrases and words). Thus, the field of analysis is tightened by performing a
shrinking alignment of the corpus units and by focusing the search on
gradually smaller units.
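A minimal Python sketch of this shrinking search is given below. It assumes that the two corpora are stored as nested lists with the same shape (each source unit has exactly one counterpart at every level) and that the best sub-unit is the one sharing the most words with the request. Both assumptions, like the function names and the placeholder target strings, are ours and are made only for the example.

```python
def tokens(unit):
    """Flatten a unit (a string or a nested list of units) into lowercase tokens."""
    if isinstance(unit, str):
        return unit.lower().split()
    return [tok for sub in unit for tok in tokens(sub)]


def shrinking_search(request, source_unit, target_unit):
    """Descend from the largest aligned units to the smallest one matching the request.

    Returns (path, source_leaf, target_leaf), where path lists the index
    chosen at each level.
    """
    wanted = set(request.lower().split())
    path = []
    while not isinstance(source_unit, str):
        overlaps = [len(wanted & set(tokens(sub))) for sub in source_unit]
        best = overlaps.index(max(overlaps))  # first best sub-unit on ties
        path.append(best)
        source_unit, target_unit = source_unit[best], target_unit[best]
    return tuple(path), source_unit, target_unit


# Two "paragraphs" of sentences; target strings are placeholders.
SOURCE = [["rains are expected", "the rise in unemployment"], ["the president received"]]
TARGET = [["<T2 unit a>", "<T2 unit b>"], ["<T2 unit c>"]]

RESULT = shrinking_search("unemployment rise", SOURCE, TARGET)
```

Because each step restricts the next one to a single larger unit, a wrong match at sentence level can only occur inside the paragraph already selected, which is what makes the search more reliable.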
Conclusion
From a methodological point of view, combining a
linguistic approach with a stylistic approach makes it possible to fine-tune alignment and
enhance translation criticism.
However, some aspects deserve particular attention in
order to ensure training efficiency.
On the one hand, the type of data used, i.e. the
bilingual parallel texts, may pose a problem if the quality of the corpus is poor or if
its translation quality has not been subject to strict control.
On the other hand, the sharpness of criticism
and the precision of extracted information concerning translation depend on the volume of
available training data.
For all the aforementioned reasons, a long training period
with a large amount of diverse textual data will be needed. Once this
stage is completed, the mechanisms observed by the trainee in the corpus can be reactivated to
infer different kinds of already tested translation solutions.
Indicative Bibliography
Chanod, J.-P., 1993, « Problèmes de robustesse en analyse syntaxique », in Actes de la conférence « Informatique et langue naturelle », IRIN, Université de Nantes.
Cori, M., Marandin, J.-M., 2001, « La linguistique au contact de l'informatique : de la construction des grammaires aux grammaires de construction », Histoire, Epistémologie, Langage, 23 (1), pp. 49-79.
Eco, U., 1992, Les Limites de l'interprétation, Paris, Grasset.
Gazdar, G., Mellish, Ch., 1989, Natural language processing in LISP, an introduction to computational linguistics, Addison-Wesley.
Guidère, M., 2002, Manuel de traduction français-arabe, Paris, Ellipses.
Guidère, M., 2000, Publicité et traduction, Paris, L'Harmattan.
Guidère, M., 2001, Toward Corpus-Based Machine Translation for Standard Arabic, in Translation Journal, n°1, vol. 6, http://accurapid.com/journal/19mt.htm
Kamp, H., Reyle, U., 1993, From discourse to logic, introduction to model theoretic semantics of natural language, formal logic and discourse representation theory, Dordrecht, Boston, Kluwer Academic Publishers.
Lederer, M., 1994, La traduction aujourd'hui : le modèle interprétatif, Paris, Hachette.
Mounin, G., 1978, Linguistique et traduction, Bruxelles, Mardaga.
Peirce, Ch.-S., 1978, Ecrits sur le signe, Paris, Editions du Seuil.
Seleskovitch, D., Lederer, M., 2001 (4è éd.), Interpréter pour traduire, Paris, Didier Erudition.
Sinclair, J.M., Payne, J., Perez Hernandez, C. (eds.), 1996, Corpus to Corpus : a Study of Translation Equivalence, IJCL 9.3.
Tognini-Bonelli, E., 2001, Corpus Linguistics at Work, Amsterdam / Philadelphia, John Benjamins Publishing.
Wichmann, A., Fligelstone, S., Knowles, G. (eds.), 1997, Teaching and Language Corpora, London / New York, Longman.