The
MT Summit series of conferences began nearly fifteen
years ago, in 1987 at Hakone, Japan. Much has
changed in the field of MT since then. Many of
the methods, systems and techniques that are familiar
to us today have emerged in the last fifteen years.
For example, in the late 1980s there were no example-based
MT systems,
no statistics-based methods, there were no translation
memories, there was no text alignment, there was
no localization industry, there were scarcely
any MT systems for personal computers; and, above
all, there was no translation on the Internet,
and the World Wide Web was just a gleam in the
eyes of its creators. Systems were used only by
large organisations, governmental bodies and a
few multinationals. As they had been for decades,
they were designed for mainframe computers, for
expensive minicomputers and for workstations;
it was assumed that all output had to be edited,
whatever the eventual use. Personal computers
(or microcomputers as they were usually known)
were still new, low-powered and expensive, and
MT software for them was rare and difficult to
obtain.
The
assumption then was still what it had been since
the very beginning of MT, namely that the aim
of MT development (even if distant) was automatic
translation of high quality comparable to that
of a human translator, and not requiring human
revision before publication. The vision was essentially
what Bar-Hillel in 1960 had called FAHQT (fully
automatic high quality translation). It was still
the vision of the pioneers before the ALPAC report
in 1966. Systems were developed for use primarily
on mainframe computers (or the smaller minicomputers)
by multinational companies or government organizations
for the production (dissemination) of good-quality
publishable documents.
By
the late 1980s, of course, it had long been recognized
that publishable quality could be achieved only
with human intervention, either the revision of
MT output or the control of input documents (e.g.
simplification and regularization of grammar and
vocabulary). It was also being recognized that
some good use could be made of less-than-perfect
MT output (raw translations) for information gathering
purposes (assimilation). However, this use was
seen as a by-product, as a slightly less than
respectable use of MT. In brief, it was believed
that although FAHQT could not be realised it ought
to remain the ultimate goal.
However,
this (usually implicit) adherence to FAHQT as
an ultimate goal has been damaging, with effects
we still see today. It means that we have to apologise
for the low quality of MT systems, and for the
fact that quality has not significantly improved
since the late 1980s (even since the 1970s). It
means also that we have to counter many misconceptions:
on the one hand, that MT seeks to replace human
translators; and on the other, that MT research
is inherently misguided, since automatic
translation must be impossible.
* * *
The
impact of the Internet has been significant in
recent years: an accelerating growth of real-time
on-line translation, and the development of many
systems designed specifically for the translation
of Web pages and of electronic mail. The demand
for immediate translations will surely continue
to grow rapidly, but at the same time users are
also going to want better results. There is clearly
an urgent need for translation systems developed
specifically to deal with the kind of colloquial
(often ill-formed and badly spelled) messages
found on the Internet. The old linguistic rule-based
approaches are probably not equal to the task
on their own, and corpus-based methods making
use of the voluminous data available on the Internet
itself are obviously appropriate. But as yet there
has been little research on such systems.
At
the same time, the Internet is also providing
the means for more rapid delivery of quality translation
to individuals and to small companies. A number
of MT system vendors offer translation services,
usually adding value by human post-editing. More
will surely appear as the years go by. It is probable
that the very existence of low-quality MT output
from Internet systems and from commercial software
will create a demand for good translations
from people who have had no previous access to
translation facilities.
Another
profound impact of the Internet will concern the
nature of the software itself. What users of Internet
services are seeking is information in whatever
language it may have been written or stored –
translation is just one means to that end. Users
will want a seamless integration of information
retrieval, data and information extraction, and
text summarization systems with translation. As
this conference has demonstrated, research has
begun in such areas as cross-lingual information
retrieval, multilingual summarization, and so
forth, and before many years there will, I am
sure, be systems available on the market and the
Internet.
In
fact, it is probable that in future years there
will be fewer pure MT systems (commercial,
on-line, or otherwise) and many more computer-based
tools and applications where automatic translation
is just one component. As a first step, it will
surely not be long before all word-processing
software includes translation as an inbuilt option
(it is already common in Japan). Integrated language
software will be the norm not only for the multinational
companies; it will also be available and accessible to
anyone from their own computer (whether desktop,
laptop, network-based, etc.) and from any device
(television, mobile telephone, etc.) interfacing
with computer networks. Again, this will not spell
the end of the pure MT system completely, but
will be a demand-led expansion of the provision of
translation software in some accessible and usable
form for the future information society.
In
the past there has often been tension between
the translation profession and those who advocate
and research computer-based translation tools.
But now at the end of the twentieth century it
is already apparent that MT and human translation
can and will co-exist in relative harmony. Those
skills which the human translator can contribute
will always be in demand.
Where
translation has to be of publishable quality,
both human translation and MT have their roles.
Machine translation is demonstrably cost-effective
for large-scale and/or rapid translation of (boring)
technical documentation, (highly repetitive) software
localization manuals, and many other situations
where the costs of MT plus essential human preparation
and revision or the costs of using computerized
translation tools (workstations, etc.) are significantly
less than those of traditional human translation
with no computer aids. By contrast, the human
translator is (and will remain) unrivalled for
non-repetitive, linguistically sophisticated texts
(e.g. in literature and law), and even for one-off
texts in specific highly specialized technical
subjects.
For
the translation of texts where the quality of
output is much less important, machine translation
is often an ideal solution. For example, to produce
rough translations of scientific and technical
documents that may be read by only one person
who merely wants to find out the general content
and information and is unconcerned whether everything
is intelligible or not, and who is certainly not
deterred by stylistic awkwardness or grammatical
errors, MT will increasingly be the only answer.
In general, human translators are not prepared
(and may resent being asked) to produce such rough
translations. The only alternative to MT is no
translation at all.
However,
as already mentioned, greater familiarity with
crummy translations will inevitably stimulate
demand for the kind of good quality translations
which only human translators can satisfy.
For
the one-to-one interchange of information, there
will probably always be a role for the human translator,
e.g. for the translation of business correspondence
(particularly if the content is sensitive or legally
binding). But for the translation of personal
letters, MT systems are likely to be increasingly
used; and, for electronic mail and for the extraction
of information from Web pages and computer-based
information services, MT is the only feasible
solution.
As
for spoken translation, there must surely always
be a place for the human translator. There can
be no prospect of automatic translation replacing
the interpreter of diplomatic exchanges. While
we can envisage MT of speech in highly constrained
domains (e.g. telephone enquiries, banking transactions,
computer input, instructions to machinery), it
seems unlikely that spoken language translation
will extend into open-ended dynamic situations
of interpersonal communication.
Finally,
MT systems are opening up new areas where human
translation has never featured: the production
of draft versions for authors writing in a foreign
language, who need assistance in producing an
original text; the real-time on-line translation
of television subtitles; the translation of information
from databases; and, no doubt, more such new applications
will appear in the future as the global communication
networks expand and as the realistic usability
of MT (however poor in quality
compared with human translation) becomes familiar
to a wider public.
* * *
In
the context of these recent developments and what
we may expect in the near future, the old FAHQT
vision as the main ultimate goal of MT activity
is no longer
appropriate – a new vision (or image, or aspiration)
is needed.
In
itself, the name ‘machine translation’
is misleading. For some, the word machine suggests
something old-fashioned: a steam-powered or electrical
device, not an electronic computer. More seriously,
however, the word translation is misleading, since
what is involved is not translation as commonly
conceived; the word translation suggests
human-level performance. For the general public,
translation implies close translation (faithful
to the content and style of the original), indistinguishable
from native-language text produced by a good human
translator. Anything less is open to ridicule
and dismissed as ‘non-translation’.
This
being so, we should not even claim to be doing
translations. In fact, most MT vendors do now
stress that their products are aids, producing
texts that can be improved, and should be improved
if they are to be disseminated or published. Nevertheless,
it could be helpful if different terminology were
available. But what?
I
believe that we should stress the communicative
function and the status of systems as aids or
tools of bilingual or multilingual communication.
As a cover term for current and future systems
I suggest cross-language (or translingual)
communication aids. The emphasis should be
on research and development of tools for communication
between different languages, where there is a
wide range of different needs and where there
are different criteria for judging whether those
needs have been met. The aim is to develop
tools and systems that are ‘useful’. The usefulness
of a system or tool relates to its basic functions
and to its aims.
What
are the types of systems and aids available now
and foreseeable in the near future?
1. traditional MT with batch processing where
the output quality is improved by controlling
the input (pre-editing or controlled languages)
and/or by post-editing (revision by human translators).
The context is that of dissemination of (usually
technical) documentation where good quality (‘close’)
translation is the desired end product, and where
human revision is economically acceptable. The
MT output is a computer-produced draft translation.
2. the now traditional use of computer-based
translation aids, primarily by professional
users, e.g. bilingual dictionaries, terminology
management, translation memories, and in particular
translator workstations.
3. aids for assimilating information/documents in
other languages (text assimilation aids or
‘gisting’ aids). This is the traditional
use of MT systems for intelligence/surveillance
work, for document filtering, and now for scanning
Web pages.
4. aids for producing texts in another language (text
production aids). This is a more
recent development. It is represented in particular
by