The development and use of machine translation systems and computer-based translation tools
This
article in PDF-format
Download our List of 5,600+
Translation Agencies! No Recurring Membership Fees!
Advertisements:
Abstract:
This survey of the present demand
and use of computer-based translation software concentrates
on systems designed for the production of translations
of publishable quality, including developments in
controlled language systems, translator workstations,
and localisation; but it covers also the developments
of software for non-translators, in particular for
use with Web pages and other Internet applications,
and it looks at future needs and systems under development.
The final section compares the types of translations
that can be met most appropriately by human and by
machine (and computer-aided) translation respectively.
Keywords: machine
translation, computer-aided translation, translator
workstations, multilingual systems.
Types of translation demand
When giving any general overview of
the development and use of machine
translation (MT) systems and translation tools, it
is important to distinguish four basic
types of translation demand. The first, and traditional
one, is the demand for
translations of a quality normally expected from human
translators, i.e. translations of
publishable quality – whether actually printed and
sold, or whether distributed
internally within a company or organisation. The second
basic demand is for
translations at a somewhat lower level of quality
(and particularly in style), which are
intended for users who want to find out the essential
content of a particular document
– and generally, as quickly as possible. The third
type of demand is that for translation
between participants in one-to-one communication (telephone
or written
correspondence) or of an unscripted presentation (e.g.
diplomatic exchanges.) The
fourth area of application is for translation within
multilingual systems of information
retrieval, information extraction, database access,
etc.
The first type of demand illustrates
the use of MT for dissemination.
It has been satisfied, to some extent, by machine
translation systems ever since they were first developed
in the 1960s. However, MT systems produce output which
must invariably be revised or ‘post-edited’ by human
translators if it is to reach the quality required.
Sometimes such revision may be substantial, so that
in effect the MT system is producing a ‘draft’ translation.
As an alternative, the input text may be regularised
(or ‘controlled’ in vocabulary and sentence structure)
so that the MT system produces few errors which have
to be corrected. Some MT systems have, however, been
developed to deal with a very narrow range of text
content and language style, and these may require
little or no preparation or revision of texts.
In recent years, the use of MT systems for dissemination
purposes has been
augmented by developments in translation tools (e.g.
terminology databases and
translation memories), integrated in authoring and
publishing processes. These
‘translation workstations’ are more attractive to
human translators. Whereas, with MT
systems translators see themselves as subordinate
to the machine, in so far as they
edit, correct or re-translate the output from a computer,
with translation workstations
(or workbenches) the translators are in control of
computer-based facilities, which
they can accept or reject as they wish.
The second type of demand – the use
of MT for assimilation – has been
met in the past as, in effect, a by-product of systems
designed originally for the dissemination application.
Since MT systems did not (and still cannot) produce
high quality translations, some users have found that
they can extract what they needed to know from the
unedited output. They would rather have some translation,
however poor, than no translation at all. With the
coming of cheaper PC-based systems on the market,
this type of use has grown rapidly and substantially.
With the third type – MT for interchange
– the situation is changing quickly. The demand for
translations of electronic texts on the Internet,
such as Web pages, electronic mail and even electronic
‘chat’ lists, is developing rapidly. In this context,
the possibility of human translation is out of the
question. The need is for immediate translation in
order to convey the basic content of messages, however
poor the input. MT systems are finding a ‘natural’
role, since they can operate virtually or in fact
in real-time and on-line and there has been little
objection to the inevitable poor quality. Another
context for MT in personal interchange is the focus
of much research. This is the development of systems
for spoken language translation, e.g. in telephone
conversations and in business negotiations. The problems
of integrating speech recognition and automatic translation
are obviously formidable, but progress is nevertheless
being made. In the future – still distant, perhaps
– we may expect on-line MT systems for the translation
of speech in highly restricted domains.
The fourth type of MT application
– as components of information access
systems – is the integration of translation software
into: (i) systems for the search and retrieval of
full texts of documents from databases (generally
electronic versions of journal articles in science,
medicine and technology), or for the retrieval of
bibliographic information; (ii) systems for extracting
information (e.g. product details) from texts, in
particular from newspaper reports; (iii) systems for
summarising texts; and (iv) systems for interrogating
non-textual databases. This field is the focus of
a number of projects in Europe at the present time,
which have the aim of widening access for all members
of the European Union to sources of data and information
whatever the source language.
Historical background
2 Systems for automatic translation
have been under development for 50 years – in fact,
ever since the electronic computer was invented in
the 1940s there has been research on their application
for translating languages (Hutchins 1986). For many
years, the systems were based primarily on direct
translations via bilingual dictionaries, with relatively
little detailed analysis of syntactic structures.
By the 1980s, however, advances in computational linguistics
allowed much more sophisticated approaches, and a
number of systems adopted an indirect approach to
the task of translation. In these systems, texts of
the source language are analysed into abstract representations
of ‘meaning’, involving successive programs for identifying
word structure (morphology) and sentence structure
(syntax) and for resolving problems of ambiguity (semantics).
Included in the latter are component programs to distinguish
between homonyms (e.g. English words such as light,
which can be a noun, and adjective or verb, and solution,
which can be a mathematical or a chemical term) and
to recognise the correct semantic relationships (e.g.
in The driver of the bus with a yellow coat).
The abstract representations are intended to be unambiguous
and to provide the basis for the generation of texts
into one or more target languages. There have in fact
been two basic ‘indirect’ approaches. In one the abstract
representation is designed to be a kind of language-independent
‘interlingua’, which can potentially serve as an intermediary
between a large number of natural languages. Translation
is therefore in two basic stages: from the source
language into the interlingua, and from the interlingua
into the target language. In the other indirect approach
(in fact, more common approach) the representation
is converted first into an equivalent representation
for the target language. Thus there are three basic
stages: analysis of the input text into an abstract
source representation, transfer to an abstract target
representation, and generation into the output language.
Until the late 1980s, systems of all
these kinds were developed, and it is true
to say that all current commercially available systems
are also classifiable into these
three basic system types: direct, interlingual and
‘transfer’. The best known of the MT
systems for mainframe computers are in fact essentially
of the ‘direct translation’
type, e.g. the Systran, Logos and Fujitsu (Atlas)
systems. They are however improved
versions of the type; unlike their predecessors, they
are highly modular in
construction and easily modifiable and extendable.
In particular, the Systran system,
originally designed for translation only from Russian
into English, is now available
for a very large number of language pairs: English
from and into most European
languages (French, German, Italian, Spanish, Portuguese),
Japanese, Korean, etc.
Logos, originally marketed for German to English,
is also now available for other
languages: English into French, German, Italian and
Spanish, and German into French
and Italian. The Fujitsu ATLAS system, on the other
hand, is still confined to
translation between English and Japanese (in both
directions).
Among the most important of the mainframe
‘transfer’ systems was METAL,
supported for most of the 1980s by Siemens in Germany.
However, it was only at the
end of the decade that METAL came onto the market,
and sales were poor. During the
1990s, rights to METAL have been transferred to two
organisations (GMS and
LANT) in a complex arrangement. But the best known
systems adopting the ‘transfer’
approach were research projects: Ariane at GETA in
Grenoble, an MT project going
back to the 1960s, and Eurotra funded by the Commission
of the European
Communities. There were hopes that Ariane would become
the French national
system, and there were plans to incorporate it in
a translator’s workstation for
Eurolang (see below), but in the end nothing came
of them. As for Eurotra, it was
undoubtedly one of the most sophisticated systems,
but after involving some hundred
of researchers in most countries of Western Europe
for almost a decade, it failed to
produce the working system that the sponsors wanted.
It had been hoped that Eurotra
would eventually replace the Systran systems that
the Commission had acquired and
was developing internally. In the late 1980s, Japanese
governmental agencies began
to sponsor an interlingua system for Asian languages,
involving co-operation with
researchers in China, Thailand, Malaysia and Indonesia.
However, this project too has
so far not produced a system after a decade of work.
(For surveys of MT research and
development in 1980s and early 1990s see Hutchins
1993, 1994.)
Governmental and non-commercial
use
The earliest installations of MT systems were in national
and international
governmental and military translation services – primarily
because they could afford
the costs of the computer hardware required. The US
Air Force introduced Systran in
1970 for translating Russian military scientific and
technical documentation into
English. Although some documents were edited, much
of the output was passed to
recipients without revision; over 90% accuracy for
technical reports is claimed. The
National Air Intelligence Center, which took over
the service from the USAF, now
produces translations (many unedited) for a wide range
of US government
organisations (Pedtke 1997). As well as Russian-English
it has available systems from
Systran for translating Japanese, Chinese and Korean
into English, and under
development with Systran is a system for SerboCroat
into English.
In Europe, the largest translation
service is that of the European Commission,
and was one of the first organisations to install
MT. It began in 1976 with the Systran
system for translating from English into French. In
subsequent years, versions were
developed for many other language pairs, covering
the needs for translation among
the European Union languages. While the translation
of many legal texts continues to
be done by human translators, the Systran systems
are used increasingly not only for
the translation of internal documents (with or without
post-editing) but also as rough
versions for the assistance of administrators when
composing texts in non-native
languages (Senez 1996).
Production of technical documentation
Until the 1990s the normal assumption
was that MT systems were intended to
be used for the production of documentation of publishable
quality, primarily but not
exclusively of a scientific and technical nature.
The assumption was, in other words,
that MT systems were to be used in conditions where
otherwise human translators
would be employed with expertise in the subjects concerned.
Evidently, the actual
quality of MT output was inadequate for direct use.
It had to be extensively revised
before it could be published, and translators were
therefore employed as ‘posteditors’.
In these circumstances, the use of MT became a matter
of economics. It was
viable only if overall quality and speed could be
achieved at lower cost than the
employment of human translators.
Although today there are other uses
for MT, as we have already indicated, this
application remains the most important, particularly
for the vendors and developers of
the larger ‘mainframe’-type systems (Systran and Logos).
The main customers and
users are the multinational companies exporting equipment
in the global market
(Vasconcellos 1993; Brace et al. 1995). The need here
is for translation of
promotional and technical documentation. In the latter
case, technical documents are
often required in very large volumes: a set of operational
manuals for a single piece of
equipment may amount to several thousands of pages
Furthermore, there can be
frequent revisions with the appearance of new models.
In addition, there must be
consistency in translation: the same component must
be referred to and translated the
same way each time. This scale of technical translation
is well beyond human
capacity. Nevertheless, in order to be most cost-efficient,
a MT system should be well
integrated within the overall technical documentation
processes of the company: from
initial writing to final publishing and distribution.
Systems developed for the support
of technical writers – not just assistance with terminology,
but also on-line style
manuals and grammar aids – are now being linked seamlessly
into translation and
publishing processes.
There are numerous examples of the successful and
long-term use of MT
systems by multinationals for technical documentation.
One of the best known is the
application of the Logos systems at the Lexi-Tech
company in New Brunswick,
Canada; initially for the translation into French
of manuals for the maintenance of
naval frigates, the company has built up a service
undertaking many other large
translation projects. Also using Logos are Ericsson,
Osram, Océ Technologies, SAP
and Corel. Systran has many large clients: Ford, General
Motors, Aérospatiale,
Berlitz, Xerox, etc. The METAL German-English system
has been successfully used
at a number of European companies: Boehringer Ingelheim,
SAP, Philips, and the
Union Bank of Switzerland.
A pre-requisite for successful MT
installation in large companies is that the
user expects a large volume of translation within
a definable domain (subjects,
products, etc.) The financial commitment to a terminology
database and to dictionary
maintenance must be justifiable. Whether produced
automatically or not, it is
desirable for company documentation to be consistent
in the use of terminology.
Many companies in fact insist upon their own use of
terms, and will not accept the
usage of others. To maintain such consistency is almost
impossible outside an
automated system. However, it does mean that before
an MT system can be installed,
the user must have already available a well-founded
terminological database, with
authorised translation equivalents in the languages
involved, or – at least – must make
a commitment to develop the required term bank.
For similar reasons, it is often desirable
if the MT system is to produce output
in more that one target language. Most large-scale
MT systems have to be customised,
to a greater or lesser extent, for the kind of language
found in the types of documents
produced in a specific company. This can be the addition
of specific grammatical
rules to deal with frequent sentence and clause constructions,
as well as the inclusion
of specific rules for dealing with lexical items,
and not just those terms unique to the
company. The amount of work involved in such customisation
may not be justifiable
unless output is in a number of different languages.
Controlled language and domain-specific
systems
In these circumstances, however, it has often been
found feasible to introduce
a greater degree of control. One of the earliest and
best known examples is the
application of the Systran system by the Xerox Corporation.
At Xerox technical
authors are obliged to compose documents in what is
called Multinational Customized
English, where not only the use of specific terms
is laid down but also the
construction of sentences (Elliston 1979). The advantages
of this approach are: the
avoidance of ambiguities in the input which the MT
system cannot deal with
adequately, the consequential better quality output,
the faster production of technical
documents simultaneously in a number of different
languages, and (not least) the
production of more easily comprehensible English documents.
These advantages have
been recognised by other multinational companies,
and the use of ‘controlled
languages’ is increasing: for example, the Caterpillar
Corporation has devised its own
form of English to facilitate translation in a knowledge-based
MT system being
developed for it at the Carnegie-Mellon University
(Mitamura and Nyberg 1995).
There are some companies offering to build ‘controlled’
language MT systems for
specific clients. The oldest established – and the
pioneer in this approach – is the
Smart Corporation, New York. Systems have been developed
by Smart for a number
of major clients: Citicorp, Chase, Ford, General Electric,
etc. Each incorporates a
system for ‘normalising’ English documents. This system
component is considered so
crucial to success that the actual translation process
is regarded as virtually a ‘byproduct’
(Lee 1994). There are Smart systems translating into
French, German,
Greek, Italian, Japanese, and Spanish. The largest
Smart installation, perhaps, is the
system designed for the Canadian Ministry of Employment,
where it has been used
for many years to translate information about job
advertisements and similar
documentation.
In Europe, the Cap Volmac company
in the Netherlands and the LANT
company in Belgium offer similar services, building
for various clients specialised
translation systems utilising their own software for
controlled languages. Cap Volmac
Lingware Services is a Dutch subsidiary of the Cap
Gemini Sogeti Group. Over the
years this software company has constructed controlled-language
systems for textile
and insurance companies, mainly from Dutch to English
(Van der Steen and
Dijenborgh 1992). However, possibly the best known
success story for custom-built
MT is the PaTrans system developed for LingTech A/S
to translate English patents
into Danish. The system is based on methods and experience
gained from the Eurotra
project of the European Commission (Ørsnes
et al. 1996)
These last examples of systems illustrate
that a growing number of companies
and organisations are developing their own MT facilities,
as opposed to purchasing
commercial systems. This has been a feature from early
days. The successful Météo
system in Canada for translating weather forecasts
from English into French (and later
from French into English) was effectively a customer-specific
system – in this case
the Canadian Environment service. It may be noted
that a variant of the Météo
software was successfully operated during the Olympic
games in Atlanta (Chandioux
and Grimaila 1996). Météo is an example
of a ‘sublanguage’ system, i.e. designed for
to deal with the particular language of meteorology.
Another example of a custom-built
system is TITUS, a highly constrained
‘sublanguage’ system for translating abstracts of
documents of the textile industry
from and into English, French, German, and Spanish,
in regular use since 1970. Better
known are the two customer-specific systems for translating
between English and
Spanish built at the Pan American Health Organization
in Washington – designed and
developed by workers in the organisation itself. These
highly successful systems (now
also available to users outside PAHO) are general-purpose
systems, not constrained in
vocabulary or text type, although obviously the dictionaries
are strongest in the
health-related social science fields (Leon and Aymerich
1997).
In the 1990s there have been a number
of other examples. In Finland, the
Kielikone system was developed originally as a workstation
for Nokia
Telecommunications. Subsequently, versions were installed
at other Finnish
companies and the system is now being marketed more
widely (Arnola 1996). A
similar story applies to GSI-Erli. This large language
engineering company developed
an integrated in-house translation system combining
a MT engine and various
translation aids and tools on a common platform AlethTrad.
Recently it has been
making the system available in customised versions
for outside clients (Humphreys
1996).
On a smaller scale, but equally successful,
has been the system developed by
the translation service of a small British company
Hook and Hatton. In this case, the
need was for translation of chemical texts from Dutch
into English (Lewis 1997). The
designer began by simple pattern matching of phrases,
and gradually built in more
syntactic analysis as and when results were justifiable
and cost-effective.
Based on experience over many years
in developing knowledge-based MT and
experimenting with speech translation and corpus-based
methods, members of the
group at Carnegie-Mellon University have developed
an architecture for the rapid
production of usable MT systems for specific clients
in some less common languages,
such as SerboCroat and Haitian Creole (Frederking
et al. 1997). There is no pretence
of high quality, merely ‘usefulness’ for languages
otherwise inaccessible.
Another example of custom-built MT
in a specialised area is the program
developed for TCC Communications at the Simon Fraser
University for translating
closed captions on television programs (Toole et al.
1998). Not only are there time
constraints – translation must be in real-time – but
also there are the challenges of
colloquialisms, dialogue, robustness, and paucity
of context indicators. The system, at
present running live for English into Spanish, demanded
techniques otherwise found
mainly in Internet applications (see below.)
In Japan, there are further examples
of custom-built systems. The Japan
Information Centre of Science and Technology translates
abstracts of Japanese
scientific and technical articles into English. In
the late 1980s it assumed
responsibility of the Mu Japanese-English MT system
developed at the University of
Kyoto. From this, it now has one of the largest MT
operations in Japan (O’Neill-
Brown 1996). Other custom-built systems of significance
in Japan are the SHALT
system developed by IBM Japan for its own translation
needs, the ARGO system
developed by CSK in Tokyo for translating Japanese
stock market reports into
English, and the NHK system for translating English
news articles into Japanese.
Translation
workstations
In the 1990s, the possibilities
for large-scale translation broadened with the appearance
on the market of translation workstations (or translator
workbenches). The original ideas for integrating
various computer-based facilities for translators
at one place go back to the early 1980s, in particular
with the systems from ALPS. Translation workstations
combine multilingual word processing, means of receiving
and sending electronic documents, OCR facilities,
terminology management software, facilities for
concordancing, and in particular ‘translation memories’.
The latter is a facility that enables translators
to store original texts and their translated versions
side by side, i.e. so that corresponding sentences
of the source and target are aligned. The translator
can thus search for a phrase or even full sentence
in one language in the translation memory and have
displayed corresponding phrases in the other language.
These may be either exact matches or approximations
ranked according to closeness.
It is often the case
in large companies that technical documents, manuals,
etc. undergo numerous revisions. Large parts may
remain unchanged from one version to the next. With
a translation memory, the translator can locate
and re-use already translated sections. Even if
there is not an exact match, the versions displayed
may usable with minor changes. There will also be
access to terminology databases, in particular company-specific
terminology, for words or phrases not found in the
translation memory. In addition, many translator
workstations are now offering full automatic translations
using MT systems such as Systran, Logos, and Transcend.
The translator can choose to use them either for
the whole text or for selected sentences, and can
accept or reject the results as appropriate (Heyn
1997).
It is often the case
in large companies that technical documents, manuals,
etc.
undergo numerous revisions. Large parts may remain
unchanged from one version to
the next. With a translation memory, the translator
can locate and re-use already
translated sections. Even if there is not an exact
match, the versions displayed may
usable with minor changes. There will also be access
to terminology databases, in
particular company-specific terminology, for words
or phrases not found in the
translation memory. In addition, many translator
workstations are now offering full
automatic translations using MT systems such as
Systran, Logos, and Transcend. The
translator can choose to use them either for the
whole text or for selected sentences,
and can accept or reject the results as appropriate
(Heyn 1997).
Although widely used
by administrators within the European Commission,
the full-scale MT system Systran is relatively little
used by the Commission’s professional translators.
For them, the translation service is developing
its own workstation facility, EURAMIS, i.e. European
Advanced Multilingual Information System (Theologitis
1997). This combines access to the Commission’s
own very large multilingual database (Eurodicautom),
the dictionary resources of Systran, facilities
for individual and group terminology database creation
and maintenance (using Trados’ MultiTerm software),
translation memory (again for individuals and groups),
access to CELEX (the full-text database of European
Union legislation and directives), software for
document comparison (to detect where changes have
taken place), and also, of course, access to the
Systran MT systems themselves. The latter are now
available from English into Dutch, French, German,
Greek, Italian, Portuguese, and Spanish; from French
into Dutch, English, German, Italian, and Spanish;
from Spanish into English and French; and from German
into English and French. The whole EURAMIS system
is linked to other facilities such as authoring
tools (spelling, grammar and style checkers, and
multilingual drafting aids), the internal European
Commission administrative network, and to outside
resources on the Internet.
Localisation
of software
One of the fastest
growing areas for the use of computers in translation
is in the industry of software localisation. Here
the demand is for supporting documentation to be
available in many languages at the time of the launch
of new software. Translation has to be done quickly,
but there is much repetition of information from
one version to another. MT and, more recently, translation
memories in translation workstations are the obvious
solution (Schaeler 1996). Among the first in this
field was the large software company SAP AG in Germany.
They use two MT systems: METAL for German to English
translation, and Logos for English to French, and
plan to introduce further systems for other language
pairs.
Most localisation,
however, is based on the translation memory and
workstation approach. Typical examples are Corel,
Lotus, and Canon. It is interesting
to note that much of this localisation activity
is based in Ireland – thanks to earlier
government and European Union support for the computer
industry. However,
localisation is a multi-national and global industry,
with its own organisation
(Localization Industry Standards Association, based
in Geneva) holding regular
seminars and conferences in all continents (For
details see LISA Forum Newsletter)
Localisation companies
have been at the forefront of efforts in Europe
to
define standardised lexical resource and text handling
formats, and to develop
common network infrastructures. This is the OTELO
project coordinated by Lotus in
Ireland, with other members such as SAP, Logos,
and GMS. The need for a general
translation environment for a wide variety of translation
memory, machine translation
and other productivity tools is seen as fundamental
to the future success of companies
in the localisation industry.
Systems for
personal computers
Software for personal
computers began to appear in the early 1980s (with
the
Weidner MicroCAT system becoming particularly successful).
Nearly all the main
Japanese computer companies produced systems for
translation to and from English,
e.g. the PIVOT system from NEC, the ASTRANSAC system
from Toshiba, HICATS
from Hitachi, PENSEE from Oki and DUET from Sharp.
Outside Japan, systems
for personal computers began to appear a little
earlier,
but from relatively few companies. The first American
systems came in the early
1980s from ALPS and from Weidner. The ALPS products
were intended primarily as
aids for translation, providing tools for accessing
and creating terminology resources
but they did include an interactive translation
module. Although at first sold with
some success, the producers concluded by the end
of the decade that the market was
not yet ready and the products were in effect withdrawn.
Instead, ALPS turned itself
into a translation service (ALPNET), using its own
tools internally. By contrast,
Weidner sold a full translation system in a growing
number of language pairs
(English, French, German, Spanish), and the business
flourished. Weidner produced
two versions of its systems: MicroCat for small
personal computers, and MacroCat
for larger minicomputers or workstations. The company
was then purchased by a
Japanese company Bravis, a Japanese version was
marketed, but soon afterwards the
owner decided that the MT market for personal computers
was still undeveloped and
the business was sold. MicroCat disappeared completely,
but MacroCat was
purchased by Intergraph, who modified and developed
it for its range of publishing
software and sold it later as Transcend – recently
Transcend was acquired by
Transparent Language Inc. (For these developments
see Hutchins 1993, 1994).
At the end of the 1980s,
most of the commercial systems on the market today
appeared. First came the PC-Translator systems (from
Linguistic Products, based in
Texas) for low-end personal computers. Over the
years, many language pairs have
been produced and marketed, apparently successfully
as far as sales are concerned.
Next came Globalink with systems for French, German
and Spanish to and from
English. (There was also a Russian-English system
deriving essentially from the
original owner’s experience on the 1960s Georgetown
project.) Within a few years,
Globalink merged with MicroTac, a company which
had been very successful in
selling its cheap Language Assistant series of PC
software – essentially automatic
dictionaries, with minimal phrase translation facility.
In the early 1990s, Globalink
produced its now well-known ‘Power Translator’ series
for translation of English to
and from French, German and Spanish, and recently
Globalink has marketed the more
advanced ‘Telegraph’ series of translation software
products, and Globalink itself was
acquired by Lernout & Hauspie, a leading speech
technology company.
Since the beginning
of the 1990s, many other systems for personal computers
have appeared. For Japanese and English there are
now also LogoVista from the
Language Engineering Corporation, and Tsunami and
Typhoon from Neocor
Technologies (also now owned by Lernout & Hauspie).
From the former Soviet
Union – where particularly in the 1960s and 1970s
there was very active research on
MT – we have now Stylus (recently renamed ProMT)
and PARS, both marketing
systems for Russian and English translation; Stylus
also for French, and PARS also
for Ukrainian. Other PC-based systems from Europe
include: Hypertrans for
translating between Italian and English; the Winger
system for Danish-English,
French-English and English-Spanish, now also marketed
in North America; and
TranSmart, the commercial version of the Kielikone
system, for Finnish-English
translation.
Vendors of older mainframe
systems (Systran, Fujitsu, Metal, Logos) are
being obliged to compete by downsizing their systems;
many have done so with
success, managing to retain most features of their
mainframe products in the PCbased
versions. Systran Pro and Systran Classic, for example,
are Windows-based
versions of the successful system developed since
the 1960s for clients worldwide in a
large range of languages; the large dictionary databases
offered by Systran give these
systems clear advantages over most other PC products.
Both Systran Classic (for
home use) and Systran Pro (for use by translators)
are now sold for under a five
hundred dollars in many language pairs: English-French,
English-German, English-
Spanish; and for English to Italian and Japanese
to English. The publishing company
Langenscheidt acquired rights to sell a version
of METAL, in collaboration with
GMS (Gesellschaft für Multilinguale Systeme, now
owned by Lernout & Hauspie) –
the system is called ‘Langenscheidt T1’ and offers
various versions for German and
English translation. Also from Germany is the Personal
Translator, a joint product of
IBM and von Rheinbaben & Busch, based on the
LMT (i.e. Logic-Programming
based Machine Translation) transfer-based system
under development since 1985.
LMT itself is available as a MT component for the
IBM TranslationManager. Both
Langenscheidt T1 and the Personal Translator are
intended primarily for the nonprofessional
translator, competing therefore with Globalink and
similar products. (For
these developments see proceedings of MT conferences:
AMTA, EAMT, MT
Summit, and MT News International.)
Sales of commercial
PC translation software have shown a dramatic rise.
There are now estimated to be some 1000 different
MT packages on sale (when each
language pair is counted separately.) The products
of one vendor (Globalink) are
present in at least 6000 stores in North America
alone; and in Japan one system
(Korya Eiwa from Catena, for English-Japanese translation)
is said to have sold over
100,000 copies in its first year on the market.
Though it is difficult to establish how
much of the software purchased is regularly used
(some cynics claim that only a very
small proportion is tried out more than once), there
is no doubting the growing
volume of ‘occasional’ translation, i.e. by people
from all backgrounds wanting
renderings of foreign texts in their own language,
or wanting to communicate in
writing with others in other languages, and who
are not deterred by poor quality. It is
this latent market for low-quality translation,
untapped until very recently, which is
now being discovered and which is contributing to
massive increases in sales of
translation software.
MT on the Internet
At the same time, many
MT vendors have been providing network-based
translation services for on-demand translation,
with human revision as optional extras.
In some cases these are client-server arrangements
for regular users; in other cases,
the service is provided on a trial basis, enabling
companies to discover whether MT is
worthwhile for their particular circumstances and
in what form. Such services are
provided, for example, by Systran, Logos, Globalink,
Fujitsu, JICST and NEC.
Some companies have
now been set up primarily for this purpose: LANT
in Belgium is a major example, based on its rights
to develop the METAL system and on the Eurolang
Optimizer, which it also markets (Caeyers 1997).
Its speciality is the customisation of controlled
languages for use with its MT and translation memory
systems. In late 1997 it launched its multilingual
service for the translation of electronic mail,
Web pages and attached files. And in Singapore,
there is MTSU (Machine Translation Service Unit
of the Institute of Systems Science, National University
of Singapore), using its own locally-developed systems
for translation from English into Chinese, Malay,
Japanese and Korean (with Chinese its main strength)
and with editing by professional translators. The
service is providing large scale translation over
the Internet for many customers world wide (mainly
multinational organisations), and including much
of the localisation needs for software companies
in the Chinese-language markets (LISA Forum
Newsletter 4(3), August 1995, p.12.)
A further sign of the
influence of Internet is the growing number of MT
software products for translating Web pages. Japanese
companies have led the way:
nearly all the companies mentioned above have a
product on this lucrative market;
they have been followed quickly elsewhere (e.g.
by Systran, Globalink, Transparent
Language, LogoVista). As well as PC software for
translating Web pages, we are now
seeing Internet services adding translation facilities:
the most recent example is the
availability on AltaVista of versions of Systran
for translating French, German and
Spanish into and from English – with what success
or user satisfaction it is too early
to say (Yang and Lange 1998).
Equally significant
has been the use of MT for electronic mail and for
‘chat
rooms’. Two years ago CompuServe introduced a trial
service based on the Transcend
system for users of the MacCIM Support Forum. Six
months later, the World
Community Forum began to use MT for translating
conversational e-mail. Usage has
rocketed (Flanagan 1996). Most recently, CompuServe
introduced its own translation
service for longer documents either as unedited
‘raw’ MT or with optional human
editing. Soon CompuServe will offer MT as a standard
for all its e-mail. As for
Internet chat, Globalink has joined with Uni-Verse
to provide a multilingual service.
The use is not simple
curiosity, although that is how it often begins.
CompuServe records a high percentage of repeat large-volume
users for its service,
about 85% for unedited MT – a much higher percentage
than might have been
expected. It seems that most is used for assimilation
of information, where poorer
quality is acceptable. The crucial point is that
customers are prepared to pay for the
product – and CompuServe is inundated with complaints
if the MT service goes
down!
It is clear that the
potential for MT on, via and for the Internet is
now being
fully appreciated – no company can afford to be
left behind, and all the major players
have ambitious plans, e.g. Lernout & Hauspie
(McLaughlin and Schwall 1998), which
has now acquired MT systems from Globalink, Neocor
and AppTek as well as the old
METAL system (from GMS).
Future needs
and developments
Despite the recent
growth of systems for personal computers and of
Internet
services, it is still true to say that there is
nothing yet really suitable for the
independent professional translator, i.e. for those
not working for large companies or
in translation organizations. It is known that some
translators have tried to apply
commercial PC-based software to their needs, but
the amount of adaptation required
and the generally poor output has made them unsatisfactory
and uneconomic. More
suitable for the independent translator would be
a cost-effective translation
workstation. However, current workstations on the
market are still too expensive for
the individual translator. Although there is promise
of low-cost computer tools for this
potentially large market – e.g. terminology and
concordancing software, and perhaps
alignment software – there is no doubt that this
segment is not being covered as well
as many other areas.
Another area at present
poorly served is the need for reliable but low-cost
translation of documents into unknown foreign languages
where users do not want to
engage expert bilingual translators. There is no
problem with translation into
recipients’ own languages – PC systems can give
adequate ‘rough’ versions for users
to get some idea of the basic message – but for
translation into an unknown language
there are still no solutions. There have been recently
some cheap Japanese products
which serve this specific ‘foreign language authoring’
demand in the case of writing
business letters (based on standard phrases and
document templates), but for other
areas and for longer documents, where there is less
‘stereotyping’, there is nothing as
yet. For translation into another language unknown
(or poorly known) by the sender,
what is really required is software which can be
relied upon to provide good quality
output (and most PC products are not good enough).
A number of research groups are
investigating interactive systems, where the sender
composes an MT-friendly version
of a letter or document in collaboration with the
computer. With a sufficiently
‘normalised’ input text, the MT system can guarantee
grammatically and stylistically
correct output. As yet, however, this work (e.g.
at GETA in France) is still at the
laboratory stage (Boitet and Blanchon 1995).
The same is true for
software combining MT with information access,
information extraction, and summarisation software.
There are no commercial
systems yet on the market; developments are still
at the research stages. The potential
and the demand has been recognised: for example,
in recent years, most research
funds of the European Union have been focused not
on MT or ‘pure’ natural language
processing (as it was during the 1980s), but on
projects for multilingual tools with
direct applications in mind; many involve translation
of some kind, usually within a
restricted subject field and often in controlled
conditions (Hutchins 1996; Schütz
1996). As just one example, the AVENTINUS project
is developing a system for
police forces in the area of drug control and law
enforcement: information about
drugs, criminals and suspects will be available
on databases accessible in any of the
European Union languages.
There is growing interest
in such multilingual applications worldwide. The
application that has received most attention has
been ‘cross-language information
retrieval’, i.e. software enabling users to search
foreign language databases in their
own languages. As yet most work has focussed on
the construction and operation of
appropriate translation dictionaries, for the matching
of query words against words or
phrases in document databases (Bian and Chen 1998,
Oard 1998) – although the
provision of software for fast translation of original
texts into the enquirer’s own
language is naturally also envisaged (McCarley and
Roukos 1998). Clearly it will not
be long before commercial software is available
for this application.
The future application
that is probably most desired by the general public
is
the translation of spoken language. But, from a
commercial (and even research)
perspective, the prospects for automatic speech
translation are still distant (Krauwer et
al. 1997). It was only in the 1980s that developments
in speech recognition and
synthesis made spoken language translation a feasible
objective. In Japan a joint
government and industry company ATR was established
in 1986 near Osaka, and it is
now one of the main centres for automatic speech
translation. The aim is to develop a
speaker-independent real-time telephone translation
system for Japanese to English
and vice versa, initially for hotel reservation
and conference registration transactions.
Other speech translation projects have been set
up subsequently. The JANUS system
is a research project at Carnegie-Mellon University
and at Karlsruhe in Germany. The
researchers are collaborating with ATR in a consortium
(C-STAR), each developing
speech recognition and synthesis modules for their
own languages (English, German,
Japanese). (One by-product of this research was
mentioned earlier: the rapiddeployment
project for custom-built systems in less-common
languages.) The fourth
major effort in speech translation is the long-term
VERBMOBIL project funded by
the German Ministry for Research and Technology
which began in May 1993. The
aim is a portable aid for business negotiations
as a supplement to users’ own
knowledge of the languages (German, Japanese, English).
Numerous German
university groups are involved in fundamental research
on dialogue linguistics, speech
recognition and MT design; a prototype is nearing
completion, and a demonstration
product is targeted for early in the next century.
Speech translation
is probably at present the most innovative area
of
computer-based translation research, and it is attracting
most funding and the most
publicity. However, few experienced observers expect
dramatic developments in this
area in the near future – the development of MT
for written language has taken many
years to reach the present stage of widespread practical
use in multinational
companies, a wide range of PC based products of
variable quality and application,
growing use on networks and for electronic mail.
Despite today’s high profile for
written-language MT, researchers know that there
is still much to be done to improve
quality. Spoken-language MT has not yet reached
even the stage of real-time testing
in non-laboratory settings.
Comparison
of human and machine translation
From this survey it
should be apparent that the application of computers
to the
task of translating natural languages has not been
and is unlikely to be a threat to the
livelihood of professional translators. Those skills
which the human translator can
contribute will continue always to be in demand.
There is no prospect, for example,
that machine translation could ever attempt the
translation of literary or legal texts. By
contrast, for the rough translation of electronic
texts on the Internet there is no rivalry
for machine translation – human translators cannot
compete in terms of speed, even if
they were prepared to undertake poor quality translation
of ephemeral material.
We may compare the
relative merits of human and machine translation
according to the categories of need and use outlined
at the beginning of this paper. As
far as the dissemination function (production of
publishable translations) is
concerned, human translation is more satisfactory
and less costly overall whenever it
is a question of translating one particular text
in a unique subject domain (whether
scientific, technical, medical, legal or literary).
Machine translation demands the
costly investment of dictionary maintenance and
updating and the costly involvement
of post-editing. This can be justifiable (i.e. cost-effective)
only when large volumes of
documentation within a particular domain are being
translated. It is even more
justifiable if translation is into more than one
target language (when pre-editing and/or
vocabulary and grammar control of original texts
is possible), and when there is
considerable repetition. For such tasks, the human
translator would be overwhelmed
by the scale of the task, by the boring repetitiveness
and by the need to maintain
terminological consistency. By contrast, the computer
can handle large volumes and
can automatically maintain consistency. In brief,
machine translation is ideal for large
scale and/or rapid translation of (boring) technical
documentation, (highly repetitive)
software localisation manuals, and real-time translation
of weather reports. The
human translator is (and will remain) unrivalled
for non-repetitive linguistically
sophisticated texts (e.g. in literature and law).
For the translation
of texts for assimilation, where the quality of
output can be
poorer than that for texts to be published, it is
clear that machine translation is an ideal
solution. Human translators are not prepared (and
resent being asked) to produce
‘rough’ translations of scientific and technical
documents that may be read by only
one person who wants to merely find out the general
content and information and is
unconcerned whether everything is intelligible or
not, and who is certainly not
deterred by stylistic awkwardness or grammatical
errors. Of course, they might prefer
to have output better than that presently provided
by most MT systems, but if the only
alternative option is no translation at all then
machine translation is fully acceptable.
For the interchange
of information, there may still in the future continue
to be
a role for the human translator in the translation
of business correspondence
(particularly if the content is sensitive or legally
binding). But for the translation of
personal letters, MT systems are likely to be increasingly
used; and, for electronic
mail and for the extraction of information from
Web pages and computer-based
information services, MT is the only feasible solution.
For spoken translation,
by contrast, there will be a continuing market for
the
human translator. There is surely no prospect of
automatic translation replacing the
interpreter of diplomatic and business exchanges.
Although there has been research on
the computer translation of telephone enquiries
within highly constrained domains,
and future implementation can be envisaged in this
area, for the bulk of telephone
communication there is unlikely to ever be any substitute
for human translation.
Finally, MT systems
are opening up new areas where human translation
has
never featured: the production of draft versions
for authors writing in a foreign
language, who need assistance in producing an original
text; the on-line translation of
television subtitles, the translation of information
from databases; and no doubt, more
such new applications will appear in the future.
In these areas, as in others mentioned,
there is no threat to the human translator because
they were never included in the
sphere of professional translation. There is no
doubt that MT and human translation
can and will co-exist in harmony and without conflict.
References:
AMTA (1996): Expanding
MT horizons: Proceedings of the Second Conference
of the Association for Machine Translation in the
Americas, 2-5 October 1996, Montreal, Canada.
[Washington, D.C.: AMTA]
AMTA (1998): Machine
translation and the information soup: Third Conference
of the Association for Machine Translation in the
Americas… Langhorne, PA, USA, October 1998: Proceedings,
ed. D.Farwell, L.Gerber [and] E.Hovy. Berlin: Springer-Verlag,
1998. (Lecture Notes in Artificial Intelligence
1529)
Arnola, H. (1996):
“Kielikone machine translation technology and its
perspective on the economics of machine translation.”
In: EAMT Workshop (1996), 73-88.
Bian, G.W. and Chen,
H.H. (1998): “Integrating query translation and
document translation in a cross-language information
retrieval system.” In: AMTA (1998), 250-265.
Boitet, C. and Blanchon,
H. (1995): “Multilingual dialogue-based MT for monolingual
authors: the LIDIA project and a first mockup.”
Machine Translation 9(2), 99-132.
Brace, C., Vasconcellos,
M. and Miller, L.C. (1995): “MT users and usage:
Europe and the Americas.” MT News International
12 (October 1995), 14-19.
Caeyers, H. (1997):
“Presentation of LANT technology.” In: MT Summit
(1997), 253-254.
Chandioux, J. and Grimaila,
A. (1996): “'Specialized' machine translation.”
In: AMTA (1996), 206-211.
EAMT Workshop (1996):
EAMT Machine Translation Workshop… Vienna, Austria,
29-30 August 1996. Proceedings. [Geneva: EAMT].
EAMT Workshop (1997):
Language technology in your organization?
1997 EAMT Workshop, Copenhagen, 21-22 May 1997.
Proceedings. [Geneva: EAMT]
Elliston, J.S.G. (1979)
“Computer-aided translation – a business viewpoint.”
Translating and the Computer: proceedings of
a seminar, London, 14th November 1978; ed.
B.M.Snell (Amsterdam: North-Holland), 149-158.
Flanagan, M. (1996):
“Two years online: experiences, challenges, and
trends.” In: AMTA (1996), 192-197.
Frederking, R.E., Brown,
R.D. and Hogan, C. (1998): “The DIPLOMAT rapiddeployment
speech MT system.” In: MT Summit (1997), 261-262.
Heyn, M. (1996): “Integrating
machine translation into translation memory systems.”
In: EAMT Workshop (1996), 111-124.
Humphreys, L. (1996)
“Use of linguistic resources like translation memories
in machine translation systems.” In: EAMT Workshop
(1996), 101-110.
Hutchins, W.J. (1986):
Machine translation: past, present, future.
Chichester: Ellis Horwood.
Hutchins, W.J. (1993):
“Latest developments in machine translation technology:
beginning a new era in MT research.” In: MT Summit
(1993), 11-34.
Hutchins, W.J. (1994):
“Research methods and system designs in machine
translation: a ten-year review, 1984-1994.” Machine
Translation: Ten Years On. Proceedings of the second
international conference…held at Cranfield University,
England, 12-14 November 1994…ed. Douglas Clarke
and Alfred Vella. Cranfield: Cranfield University,
1998. Ch.4.
Hutchins, W.J. (1996):
“The state of machine translation in Europe.” In:
AMTA (1996), 198-205.
Krauwer, S. et al.
(1997), eds.: Spoken language translation.
Proceedings of a Workshop sponsored by the Association
of Computational Linguistics and by the European
Network in Language and Speech, 11 July 1997… Madrid,
Spain. [Somerset, N.J.: ACL].
Lee, A. (1994); “Controlled
English with and without machine translation.” Aslib
Proceedings 46(5), 131-133.
Leon, M. and Aymerich
(1997): “PAHO systems: SPANAM and ENGSPAN.” In:
MT Summit (1997), 259-260.
Lewis, T. (1997) “MT
as a commercial service: three case studies.” In:
MT Summit (1997), 219-223.
McCarley, J.S. and
Roukos, S. (1998): “Fast document translation for
cross-language information retrieval.” In: AMTA
(1998), 150-157.
McLaughlin, S. and
Schwall, U. (1998): “Spicing up the information
soup: machine translation and the Internet.” In:
AMTA (1998), 384-397.
Mitamura, T. and Nyberg,
E.H. (1995): “Controlled English for knowledge-based
MT: experience with the KANT system.” TMI-95: Proceedings
of the Sixth International Conference on Theoretical
and Methodological Issues in Machine Translation,
July 5-7, 1995, Leuven, Belgium; 158-172.
MT Summit (1993): MT
Summit IV: international cooperation for global
communication, July 20-22, 1993, Kobe, Japan.
[Tokyo: AAMT.]
MT Summit (1995): MT
Summit V, Luxembourg, July 10-13, 1995. Proceedings.
[Brussels: SEMA.]
MT Summit (1997): MT
Summit VI: past, present, future. 29 October – 1
November 1997, San Diego, California. Proceedings,
ed. By V.Teller and B.Sundheim. Washington, D.C.:
Association for Machine Translation in the Americas.
Oard, D.W. (1998):
“A comparative study of query and document translation
for cross-language information retrieval.” In: AMTA
(1998), 472-483.
O’Neill-Brown, P. (1996):
“JICST Japanese-English machine translation system.”
In: AMTA (1996), 257-260.
Ørsnes, B.,
Music, B. and Maegaard, B. (1996): “PaTrans – a
patent translation system.” Coling 96.
Proceedings… Copenhagen August 1996, 1115-1118.
Pedtke, T.R. (1997):
“U.S. government support and use of machine translation:
current status.” In: MT Summit (1997), 3-13.
Schaeler, R. (1996):
“Machine translation, translation memories and the
phrasal lexicon: the localisation perspective.”
In: EAMT Workshop (1996), 21-33.
Schütz, J. (1996):
“European research and development in machine translation.”
MT News International 15 (October 1996),
8-11.
Senez, D. (1996): “Machine
translation feasibility study at the European Commission.”
In: EAMT Workshop (1996), 63-70.
Theologitis, D. (1997):
“EURAMIS, the platform of the EC translator.” In:
EAMT Workshop (1997), 17-32.
Toole, J., Turcato,
D., Popowich, F., Fass, D. and McFetridge, P. (1998):
“Timeconstrained machine translation.” In: AMTA
(1998), 103-112.
Van der Steen, G. and
Dijenborgh, B.J. (1992): “Online collection and
translation of industrial texts.” Translating
and the Computer 14… 10-11 November 1992, London.
(London: Aslib), 135-164.
Vasconcellos, M. (1993):
“The present state of machine translation usage
technology; or: How do I use thee? Let me count
the ways.” In: MT Summit (1993), 47-62.
Yang, J. and Lange,
E. (1998): “SYSTRAN on AltaVista: a user study on
real-time machine translation on the Internet.”
In: AMTA (1998), 275-285.
Read
more articles - Free!
E-mail
this article to your colleague!
Need
more translation jobs? Click here!
Translation
agencies are welcome to register here - Free!
Freelance
translators are welcome to register here - Free!
Subscribe
to TranslationDirectory.com newsletter - Free!
Take
part in TranslationDirectory.com poll - your voice counts!
|