Translation Technology and the Translator
By
John Hutchins
WJHutchins@compuserve.com
http://ourworld.compuserve.com/homepages/WJHutchins/
(University of East
Anglia, Norwich, UK)
[From: ITI conference 11: international conference,
exhibition & AGM. Proceedings compiled by Catherine
Greensmith & Marilyn Vandamme. Proceedings of the
Eleventh Conference of the Institute of Translation
and Interpreting, 8-10 May 1997, at The Crown Hotel,
Crown Place, Harrogate. London: ITI, 1997. Pp. 113-120.]
1. Introduction
Translators are perhaps the most critical
audience for presentations about the automation of
translation. Many of them will agree with comments
made by J.E.Holmström in a report on scientific and
technical dictionaries submitted to Unesco in 1949.
Having heard that some researchers were investigating
the possibilities, he thought that "the resulting
literary style would be atrocious and fuller of 'howlers'
and false values than the worst that any human translator
produces". The reason was that "translation is an
art; something which at every step involves personal
choice between uncodifiable alternatives; not merely
direct substitutions of equated sets of symbols but
choices of values dependent for their soundness on
the whole antecedent education and personality of
the translator." His comments preceded by five years
the first tentative demonstration of a prototype system,
and were based on pure speculation. Nevertheless,
such comments have been repeated again and again by
translators for nearly fifty years, and no doubt they
will be heard again in the next fifty.
However, we shall see that computer-based
translation systems are not rivals to human translators:
they are either aids that enable translators to increase
productivity in technical translation, or means of
translating material which no human translator has
ever attempted. In this context we must distinguish
(1) machine translation (MT), which aims to undertake
the whole translation process, but whose output must
invariably be revised; (2) computer aids for translators
(translation tools), which support the professional
translator; and (3) translation systems for the 'occasional'
non-translator user, which produce only rough versions
to aid comprehension. These differences were not recognised
until the late 1980s; the previous assumption had
been that MT systems, whether running on a mainframe
or a microcomputer, could serve all these functions
with greater or lesser success. In part, this failure
to identify different needs and to design systems
specifically to meet them has contributed to misconceptions
about what translation technology can do for the professional
translator.
2. The first MT systems
When machine translation (MT) was
in its infancy, in the early 1950s, research was necessarily
modest in its aims.1 It was constrained by the limitations
of hardware, in particular by inadequate computer
memories and slow access to storage, and by the unavailability
of high-level programming languages. Even more crucially,
it could look to no assistance from the language experts.
Syntax was a relatively neglected area of linguistic
study and semantics was virtually ignored. The early
researchers knew that whatever systems they could
develop would produce poor quality results, and they
assumed major involvement of human translators both
in the pre-editing of input texts and in the post-editing
of the output. They proposed also the development
of controlled languages and the restriction of systems
to specific subject areas.
In this atmosphere the first demonstration
systems were developed, notably the collaboration
between IBM and Georgetown University in 1954.
Based on small vocabularies and carefully selected
texts, the translations produced were impressively
colloquial. Consequently, the general public and potential
sponsors of MT research were led to believe that good
quality output from automatic systems was achievable
within a matter of a few years. The belief was strengthened
by the emergence of greatly improved computer hardware,
the first programming languages, and above all by
developments in syntactic analysis based on research
in formal grammars (e.g. by Chomsky and others).
For the next decade MT research grew
in ambition. It became widely assumed that the goal
of MT must be the development of fully automatic systems
producing high quality translations. The use of human
assistance was regarded as an interim arrangement.
The emphasis of research was therefore on the search
for theories and methods for the achievement of 'perfect'
translations. The current operational systems were
regarded as temporary solutions to be superseded in
the near future. There was virtually no serious consideration
of how 'less than perfect' MT could be used effectively
and economically in practice. Even more damaging was
the almost total neglect of the expertise of professional
translators, who naturally became anxious and antagonistic.
They foresaw the loss of their jobs, since this is
what many MT researchers themselves believed was inevitable.
Progress was much slower than expected,
and the output of systems showed no sign of improvement.
In these circumstances it was not surprising that
in 1966 a committee set up by US sponsors of research
- the Automatic Language Processing Advisory Committee
(ALPAC) - found that MT had failed according to its
own aims, since there were no fully automatic systems
capable of good quality translation and there seemed
little prospect of such systems in the near future.
While this ALPAC report brought to
an end many MT projects, it did not banish the public
perception of MT research as essentially the search
for fully automatic solutions. The subsequent history
of translation technology is in part the story of
how this mistaken emphasis of the early years has
had to be repaired and corrected. The neglect of the
translation profession has been made good eventually
by the provision of translation tools and translator
workstations. MT research has itself turned increasingly
to the development of realistic practical systems
where the necessity for human involvement at different
stages of the process is fully accepted as an integral
component of their design architecture.
Hence, since the early 1970s, development
has continued in three main strands: computer-based
tools for translators, operational MT systems involving
human assistance in various ways, and 'pure' theoretical
research towards the improvement of MT methods.
3. MT in operation
Until the late 1980s one paradigm
dominated the utilisation of MT systems. It had been
inherited from the very earliest days: the system
produced large volumes of poorly translated texts,
which were either (i) used for the assimilation of
information directly or (ii) submitted to extensive
post-editing, with the aim of getting texts of publishable
quality for dissemination. As a means of improving
the quality many organisations introduced controls
on the vocabulary, structure and style of texts before
input to systems; and this has been how Systran, Logos,
METAL and similar mainframe systems have been used
(and continue to be used) by multinational companies
and other large organisations.
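The kind of control on vocabulary and structure described above can be pictured with a small sketch. The Python below is purely illustrative: the word list, the sentence-length limit and the function name check_sentence are assumptions made for this example, and are not the rules of Systran, Logos, METAL or any other system named in this paper. It flags sentences that are too long or that use unapproved words, so that an author could revise them before they are submitted to an MT system.

```python
import re

# Hypothetical approved vocabulary and length limit (assumptions for this
# sketch; a real controlled language defines far larger rule sets).
APPROVED_WORDS = {
    "the", "valve", "close", "open", "before", "you", "start", "engine", "pump",
}
MAX_WORDS_PER_SENTENCE = 20


def check_sentence(sentence, approved=APPROVED_WORDS, max_words=MAX_WORDS_PER_SENTENCE):
    """Return a list of human-readable problems found in one sentence."""
    # Lower-case the text and pull out alphabetic words (with optional apostrophe).
    words = re.findall(r"[a-z]+(?:'[a-z]+)?", sentence.lower())
    problems = []
    if len(words) > max_words:
        problems.append(f"too long: {len(words)} words (limit {max_words})")
    # Report each unapproved word once, in alphabetical order.
    unknown = sorted({w for w in words if w not in approved})
    if unknown:
        problems.append("unapproved words: " + ", ".join(unknown))
    return problems


if __name__ == "__main__":
    for text in ("Close the valve before you start the engine.", "Shut the valve."):
        print(text, "->", check_sentence(text) or "OK")
```

A sentence that passes returns an empty list; anything else is returned to the author with the reasons attached, which mirrors the pre-editing role the paper assigns to human writers rather than replacing it.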
When the first PC versions of MT systems
appeared it was widely assumed that they would be
used in much the same way: to obtain 'rough gists'
for information purposes or as 'draft translations'
for later refinement. In both cases, it was also widely
assumed that the principal users of MT systems would
be translators or at least people with good knowledge
of both source and target languages; and, in the case
of use in large organisations, it was expected
that most would be professionally trained translators.
However, during the late 1980s - and
with increasing pace since the early 1990s - this
paradigm and its assumptions have been broken by developments
on a number of fronts.2 Firstly, there has been the
commercial availability of translator workstations,
designed specifically for the use of professional
translators; these are essentially computer-based
translation tools and not intended to produce even
partial translations fully automatically. Secondly,
the PC-based systems were bought and used by an increasingly
large number of people with no interest in translation
as such; they were being used as 'aids for communication',
where translation quality was of much less importance.
Thirdly, there came the development of domain-specific
systems by clients themselves: custom-built systems
accepting input in constrained vocabulary and integrated
closely in documentation and publication systems.
Fourthly, the growth of telecommunication networks
with communication across many languages has led to
a demand for translation devices to deal rapidly in
real time with an immense and growing volume of electronic
language. Finally, the wider availability of databases
and information resources in many different languages
has led to the need for multilingual search and access
devices which incorporate translation modules.
All current commercial and operational
systems produce output which must be edited (revised)
if it is to attain publishable quality. Only if rough
translations are acceptable for information analysis
purposes can the output of MT systems be left unrevised.
Commercial developers of MT systems now always stress
to customers that MT does not and cannot produce translations
acceptable without revision: they stress the imperfect
nature of MT output. They recognise fully the obligation
to provide sophisticated facilities for the formatting,
input, revision and publication of texts within total
documentation processing from initial authoring to
final dissemination.
It is now widely accepted that MT
works best in domain-specific and controlled environments.
The first domain-specific success was Meteo, a system
for translating weather forecasts from English into
French, and used continuously since 1977 by the Canadian
broadcasting service. The use of controlled input
was taken up in the late 1970s by Xerox for its implementation
of the Systran system. Other applications of controlled
input have followed in the 1980s and 1990s with other
general-purpose systems, e.g. for the localisation
of computer software for sale in many countries and
languages.
However, rather than adapting general-purpose
MT systems in this way, it is now recognised that
it is better to design systems ab initio
for use with controlled language. A number of independent
companies outside the academic MT research community
have been doing this in recent years (e.g. Volmac);
the largest current development is the Caterpillar
project based on the research at Carnegie Mellon University.
4. Tools for translators
In general most commentators agree
that MT (full automation) as such is quite inappropriate
for professional translators. They do not want to
be subservient to machines; few want to be revisers
of poor quality MT output. What they have long been
asking for are sophisticated translation tools. Since
the early 1990s they have been able to have them in the shape
of translation workstations. These offer translators
the opportunity of making their work more productive
without taking away the intellectual challenge of
translation. Translator workstations combine access
to dictionaries and terminological databanks, multilingual
word processing, the management of glossaries and
terminology resources, and appropriate facilities for
the input and output of texts (e.g. OCR scanners,
electronic transmission, high-class printing).
The development of translation tools
became feasible, firstly with the availability of
real-time interactive computer environments in the
late 1960s, then with the appearance of word processing
in the 1970s and of microcomputers in the 1980s and,
subsequently, with intra-organisation networking and
the development of larger computer storage capacities.
Although workstations were developed outside the older
MT research community, their appearance has led to
a decline of the previous antagonism of translators
to the MT community in general. They are seen
as the direct result of MT research. Indeed, the 'translation
memory' facility, which enables the storage of and
access to existing translations for later (partial)
reuse or revision or as sources of example translations,
does in fact derive directly from what was initially
'pure' MT research on bilingual text alignment within
a statistics-based approach to automatic translation.
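The idea of a translation memory can be made concrete with a minimal sketch. The Python below is an illustration only, not a description of how any commercial workstation operates: the class name TranslationMemory, the 0.75 threshold and the use of character-level similarity from the standard difflib module are all assumptions of this example. It stores source/target sentence pairs and, for a new sentence, returns the closest stored pair for reuse or revision, or nothing if no stored sentence is similar enough.

```python
from difflib import SequenceMatcher


class TranslationMemory:
    """Toy store of (source, target) sentence pairs with fuzzy lookup."""

    def __init__(self):
        self._pairs = []

    def add(self, source, target):
        """Record an existing translation for later reuse."""
        self._pairs.append((source, target))

    def lookup(self, sentence, threshold=0.75):
        """Return (source, target, score) for the best match, or None.

        The score is difflib's similarity ratio between 0.0 and 1.0;
        the threshold is an arbitrary value chosen for this sketch.
        """
        best_pair, best_score = None, 0.0
        for source, target in self._pairs:
            score = SequenceMatcher(None, sentence.lower(), source.lower()).ratio()
            if score > best_score:
                best_pair, best_score = (source, target), score
        if best_pair is not None and best_score >= threshold:
            return best_pair[0], best_pair[1], best_score
        return None


if __name__ == "__main__":
    tm = TranslationMemory()
    tm.add("Close the valve.", "Fermez la vanne.")
    # A near match is offered to the translator for revision, not used blindly.
    print(tm.lookup("Close the main valve."))
    # No usable match: the translator works from scratch.
    print(tm.lookup("xyz"))
```

An exact match scores 1.0 and can be reused directly; a partial match is only a suggestion, leaving the decision to the translator.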
At the present time, the sales of
translator workstations incorporating translation
memories are increasing rapidly, particularly in Europe.
Their success has built upon translators' experience
with terminology management systems and upon the demonstrable
improvements of productivity, terminological consistency
and overall quality. The next stage of development
will be the fuller integration of MT modules in order
to provide automatic translation of sentences or text
fragments when required, e.g. if the existing texts
in a translation memory do not provide usable translation
sources.
5. Research for machine translation
After ALPAC, research on MT has, of
course, continued.3 However, the field has continued
to attract perfectionists. Very often, systems have
been developed without any idea of how they might
be used or who the users might be. MT has been seen
as a testbed for exploring new linguistic and computational
techniques. In nearly every case, it was found that
the 'pure' adoption of a new theory was not as successful
as initial trials on small samples appeared to demonstrate.
The basic lesson is that MT demands an eclectic approach,
the use of hybrid methods combining a variety of techniques;
and, above all, no quick results can be expected with
any new approach.
What was often forgotten is that MT
is the application of computational, linguistic, etc.
methods and techniques to a practical task; that translation
is itself a means to an end - a task which has never
been and cannot be 'perfect'; there are always other
possible (often multiple) translations of the same
text according to different circumstances and requirements.
MT can be no different: there cannot be a 'perfect'
automatic translation. The use of an MT system is
contingent upon its cost effectiveness in practical
situations.
Within the last ten years, research
on spoken translation has developed into a major focus
of MT activity. Research projects such as those at
ATR in Japan, Carnegie Mellon University in the US
and on the Verbmobil project in Germany are ambitious.
But they do not make the mistake of attempting to
build all-purpose systems: systems are constrained
and limited to specific domains, sublanguages and
categories of users. Nevertheless, there are obvious
potential benefits even if success is only partial.
Research has begun also on systems
for speakers or writers who are ignorant of the target
language; an area neglected in the past. In these
cases, what is required is a means of conveying a
message in an unknown language; it does not have to
be a straight translation of any existing original.
From interactive dialogue a translatable (MT-amenable)
'message' can be composed for automatic conversion
into an idiomatic and correct message in the target
language without further involvement of the originator.
As for translation for those wholly
ignorant of the source language, this need has been
met until recently by the use of unrevised output
from older batch-processing systems, i.e. as by-products
of systems primarily intended to produce translations
for revision before publication. Within the last decade,
however, cheap PC-based software has appeared on the
market which can be (and undoubtedly is being) used
by monolinguals who want only to grasp something of
the gist of texts. These products are not wholly satisfactory,
of course, and the development of fully automatic
systems specifically for this potentially huge market
is a challenge for future MT research.
6. Translation and networking
With the expansion of global telecommunications
(the Internet and World Wide Web) has come the networking
of translation services. Nearly all the larger MT
software vendors now offer their systems as a service
to individual or company customers. Texts can be sent
on-line for immediate 'rough' translation with no
post-editing, or for treatment in a more traditional
manner with expert revision, editing and preparation
for publication by the service. This form of networked
MT is clearly a further development of familiar translation
services, and one with considerable growth potential.
It is assumed that in future there will emerge various
forms of networked 'translation brokerage' services
which will advise customers on the most appropriate
MT service for their needs, e.g. in terms of costs,
languages, speed, dictionary coverage, terminology
control, overall translation quality, post-editing
support, etc. Some of these 'translation brokers'
may themselves be automated, and undertake searches
of the Web for particular client needs. As a consequence,
we may well see the emergence of more specialised
MT systems (for particular domains and language pairs),
some of which will thrive and others which will fail
in the global competitive market.
Even more significant for the future,
however, is the appearance of systems for on-line
and real-time translation of electronic mail messages.
In 1994 the CompuServe service introduced automatic
translation from and to English and French, German
or Spanish for messages on one of its forums.4
It became so popular that the facility was extended
to two other on-line services within the next couple
of years, and thousands of messages a day are now
being translated. The software used was not of course
designed originally to deal with the frequently ungrammatical
conversational style and the sometimes idiosyncratic
vocabulary of electronic mail. Hence, much of the
output is garbled and barely comprehensible; but a
large number of users have found the results valuable
aids for comprehension.
Only a fully automatic system could operate in real-time
on this scale. The potential market for network MT
systems is enormous. At CompuServe alone there are
more than 3,000 other on-line services where MT could
be introduced; and other Internet services could easily
follow their lead. It has been estimated that there
are currently over 40 million electronic mail messages
a month. If only a small fraction of these were candidates
for translation, the demand would be enormous.
In addition to electronic messages,
the Web pages offering information in text form
can now be counted in their hundreds of millions,
and their number is growing exponentially
(10% between 1995 and 1996). The non-English content
is estimated as 80% of the total, and there is no
doubt that readers everywhere prefer to have text
in their own language, no matter how flawed and error-ridden
it may be, rather than to struggle to understand a
foreign language text. The Japanese software companies
have already recognised the huge potential market
and there are a number of English-Japanese translation
modules available for integration with Web software.
Similar Web translation software is being developed
and sold for other languages, both by existing vendors
of MT systems and by new companies.
A further factor will be the growth
of multilingual access to information sources. Increasingly,
the expectation of users is that on-line databases
should be searchable in their own language, that the
information should be translated and summarised into
their own language. The European Union is placing
considerable emphasis on the development of tools
for information access for all members of the community.
Translation modules are obviously essential components
of such tools; they will be developed not as independent
stand-alone modules, but fully integrated with the
access software for the specific domains of databases.
The use of MT in this wider context is clearly due
for rapid development in the near future.
7. Implications for professional
translation
Where do these developments leave
the professional translator? It is plausible to divide
the demand for translation into three main groups.
The first group is the traditional demand for translations
of publishable quality: translation for dissemination.
The second, emerging with the information explosion
of the twentieth century, is the demand for translations
of short-lived documents for information gathering
and analysis which can be provided in unedited forms:
translation for assimilation. The third group is the
demand for on-the-spot translation - the traditional
role of the interpreter - which has taken a new form
with electronic telecommunications: translation for
interaction.
Translation for dissemination has
been served, with mixed success and frequent failures,
by the large-scale MT systems which are most familiar
to translators. Cost-effective use of relatively poor
quality output, which has to be revised by human translators,
is difficult to achieve without some control of the
language of input texts (at least for terminology
consistency). It has been an option for only the largest
multinational companies with large volumes of documentation,
which cannot be dealt with except by automating parts
of their total documentation processes. In recent
years, translation workstations have offered a feasible
and probably more attractive route for professional
translators: translations of publishable quality can
be made at higher productivity levels while maintaining
translators' traditional working methods. In the future,
we can expect the majority of professional translators
to be using such tools - not just from commercial
expediency, but from personal job satisfaction.
Translation for assimilation has not
traditionally been undertaken by professional translators.
The work has been done in organisations often by secretaries
or other clerical staff with some knowledge of languages
as an occasional service, and usually under time pressures.
Those performing the work have naturally been dissatisfied
with the results, since they are not professionally
trained. In this function, MT has filled a gap since
the first systems were available in the early 1960s.
The use of Systran at the European Commission illustrates
the value of such 'rough' translation facilities.
This use exceeds by far its use for the production
of translations for dissemination. It is believed
that most use of the cheaper PC-based translation
software is for information assimilation,
mainly for personal use but sometimes within an organisation.
Rarely, if ever, do professional translators see this
output. Undoubtedly, there will continue to be a large
and growing demand for this type of translation
- one which the translation profession as such has
not been able to meet in the past.
Translation for interaction covers
the role of translation in face-to-face communication
(dialogue, conversation) and in correspondence, whether
traditional mail or the newer electronic, more immediate,
form. Translators have often been employed by their
organisations in these areas on an occasional basis, e.g. as interpreters
for foreign visitors and as mediators in company correspondence,
and they will continue to be so employed. But for the real-time
translation of electronic messages it is not possible
to envisage any role for the translator; for this,
the only possibility is the use of fully automatic
systems.
However, the very familiarity of MT
systems will alert a much wider public to translation
as a major and crucial feature of global communication,
and probably to a degree never before experienced.
Inevitably, translation will itself receive a much
higher profile than in the past. People using the
crude output of MT systems will come to realise the
added value (i.e. higher quality) of professionally
produced translations. As a result, the demand for
human-produced translation will rise, and the translation
profession will be busier than ever. Fortunately,
professional translators will have the support of
a wide range of computer-based translation tools,
enabling them to increase productivity and to improve
consistency and quality. In brief, automation and
MT will not be a threat to the livelihood of the translator,
but will be the source of even greater business and
will be the means of achieving considerably improved
working conditions.
Notes
1 For the history of machine
translation see: W.J.Hutchins: Machine translation:
past, present, future. Chichester (UK): Ellis
Horwood, 1986.
2 For a survey of current
use of MT systems see: C.Brace, M.Vasconcellos and
L.C.Miller: 'MT users and usage: Europe and the Americas',
MT News International no.12 (October 1995),
14-19.
3 For a review of MT research
see: W.J.Hutchins: 'Research methods and system designs
in machine translation: a ten-year review, 1984-1994',
in: Machine Translation Ten Years On, international
conference, 12-14 November 1994, Cranfield University.
4 For details see: M.Flanagan:
'Two years online: experiences, challenges and trends',
in: Expanding MT Horizons: proceedings of
the Second Conference of the Association for Machine
Translation in the Americas, 2-5 October 1996, Montreal,
Quebec, Canada, pp. 192-197.