History of machine translation
By Wikipedia,
the free encyclopedia,
http://en.wikipedia.org/wiki/History_of_machine_translation
The history of machine translation generally starts
in the 1950s, although work can be found from earlier periods.
The Georgetown
experiment in 1954 involved fully automatic translation
of more than sixty Russian sentences into English. The experiment
was a great success and ushered in an era of significant
funding for machine translation research. The authors claimed
that within three to five years, machine translation would
be a solved problem.[1]
However, real progress was much slower, and after the
ALPAC
report in 1966, which found that ten years of
research had failed to fulfill expectations, funding
was dramatically reduced. Starting in the late 1980s, as
computational power increased and became less expensive,
more interest began to be shown in statistical
models for machine translation.
Today there is still no system that provides the holy grail
of "fully automatic high quality translation" (FAHQT). However,
there are many programs now available that are capable of
providing useful output within strict constraints; several
of them are available online, such as Google
Translate and the SYSTRAN
system which powers AltaVista's BabelFish.
The beginning
The history of machine translation dates back to the seventeenth
century, when philosophers such as Leibniz
and Descartes
put forward proposals for codes which would relate words
between languages. All of these proposals remained theoretical,
and none resulted in the development of an actual machine.
The first patents for "translating machines" were applied
for in the mid-1930s. One proposal, by Georges Artsrouni,
was simply an automatic bilingual dictionary using paper
tape. The other proposal, by Peter
Troyanskii, a Russian,
was more detailed. It included both the bilingual dictionary,
and a method for dealing with grammatical roles between
languages, based on Esperanto.
The system was split up into three stages: the first was
for a native-speaking editor in the source language to
organise the words into their logical forms and syntactic
functions; the second was for the machine to "translate"
these forms into the target language; and the third was
for a native-speaking editor in the target language to normalise
this output. Troyanskii's scheme remained unknown until the late
1950s, by which time computers were well known.
The early years
The first proposals for machine translation using computers
were put forward by Warren
Weaver, a researcher at the Rockefeller
Foundation, in his July 1949 memorandum.[2]
These proposals were based on information
theory, the successes of code
breaking during the Second
World War, and speculation about universal underlying
principles of natural language.
A few years after these proposals, research began in earnest
at many universities in the United
States. On 7
January 1954,
the Georgetown-IBM
experiment, the first public demonstration of an MT system,
was held in New York at the head office of IBM. The demonstration
was widely reported in the newspapers and received much
public interest. The system itself, however, was no more
than what today would be called a "toy" system, having just
250 words and translating just 49 carefully selected Russian
sentences into English — mainly in the field of chemistry.
Nevertheless it encouraged the view that machine translation
was imminent — and in particular stimulated the financing
of the research, not just in the US but worldwide.[3]
Early systems used large bilingual dictionaries and hand-coded
rules for fixing the word order in the final output. This
approach was eventually found to be too restrictive, and developments
in linguistics at the time, such as generative
linguistics and transformational
grammar, were proposed as ways to improve the quality of translations.
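The dictionary-plus-reordering approach described above can be illustrated with a toy sketch. This is not a reconstruction of any historical system: the French-English vocabulary, the part-of-speech tags, and the single reordering rule are all assumptions chosen only to show the idea of word-for-word lookup followed by a hand-coded word-order fix.

```python
# Toy direct-translation sketch: dictionary lookup plus one reordering rule.
# The vocabulary and the rule are illustrative assumptions, not a historical system.

# Hypothetical bilingual dictionary: source word -> (target word, part of speech)
LEXICON = {
    "le": ("the", "DET"),
    "chat": ("cat", "NOUN"),
    "noir": ("black", "ADJ"),
    "dort": ("sleeps", "VERB"),
}


def translate(sentence: str) -> str:
    # Step 1: word-for-word lookup; unknown words pass through unchanged.
    tagged = [LEXICON.get(w, (w, "UNK")) for w in sentence.lower().split()]

    # Step 2: hand-coded reordering rule: NOUN ADJ -> ADJ NOUN.
    i = 0
    while i < len(tagged) - 1:
        if tagged[i][1] == "NOUN" and tagged[i + 1][1] == "ADJ":
            tagged[i], tagged[i + 1] = tagged[i + 1], tagged[i]
            i += 2  # skip past the pair that was just swapped
        else:
            i += 1

    return " ".join(word for word, _ in tagged)


print(translate("le chat noir dort"))  # the black cat sleeps
```

Every new construction needs another entry or another rule, which hints at why a purely hand-coded approach of this kind was found too restrictive.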
During this time, operational systems were installed. The
United
States Air Force used a system produced by IBM
and Washington
University, while the Atomic
Energy Commission in the United
States and Euratom
in Italy
used a system developed at Georgetown
University. While the quality of the output was poor,
it nevertheless met many of the customers' needs, chiefly
in terms of speed.
At the end of the 1950s, Yehoshua
Bar-Hillel, a researcher asked by the US government
to look into machine translation, put forward an argument against
the possibility of "Fully Automatic High Quality Translation" by machines.
The argument is one of semantic
ambiguity or double meaning. Consider the following passage:
- Little John was looking for his toy box. Finally he
found it. The box was in the pen.
The word pen may have two meanings: the first is
something you use to write with, the second is a container
of some kind. To a human, the meaning is obvious, but Bar-Hillel
claimed that without a "universal encyclopedia" a machine
would never be able to deal with this problem. Today, this
type of semantic ambiguity can be solved by writing source
texts for machine translation in a controlled
language that uses a vocabulary
in which each word has exactly one meaning.
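As a rough illustration of the controlled-language idea, the sketch below checks a source text against a small approved vocabulary in which every word has a single agreed sense. The word list, the sense glosses, and the function name `unapproved_words` are hypothetical; real controlled languages define much larger vocabularies and also restrict grammar.

```python
# Hypothetical controlled vocabulary: each approved word maps to exactly one sense.
APPROVED = {
    "the": "definite article",
    "box": "container with a lid",
    "was": "past tense of 'be'",
    "in": "located inside",
    "playpen": "enclosure for a small child",
}


def unapproved_words(text: str) -> list[str]:
    """Return the words in the text that are not in the controlled vocabulary."""
    words = [w.strip(".,;:!?").lower() for w in text.split()]
    return [w for w in words if w and w not in APPROVED]


print(unapproved_words("The box was in the pen."))      # ['pen']
print(unapproved_words("The box was in the playpen."))  # []
```

An author would be asked to rewrite any flagged word using an approved, unambiguous term before the text is passed to the translation system.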
The 1960s, the ALPAC report and the seventies
Research in the 1960s in both the Soviet
Union and the United States concentrated mainly on the
Russian-English
language pair. The objects of translation were chiefly
scientific and technical documents, such as articles from
scientific
journals. The rough translations produced were sufficient
to get a basic understanding of the articles. If an article
discussed a subject deemed to be of security interest, it
was sent to a human translator for a complete translation;
if not, it was discarded.
A great blow came to machine translation research in 1966
with the publication of the ALPAC
report. The report was commissioned by the US government
and produced by ALPAC,
the Automatic Language Processing Advisory Committee, a
group of seven scientists convened by the US government
in 1964. The US government was concerned that little
progress was being made despite significant expenditure.
The report concluded that machine translation was more expensive,
less accurate and slower than human translation, and that
despite the expense, machine translation was not likely
to reach the quality of a human translator in the near future.
The report, however, recommended that tools be developed
to aid translators — automatic dictionaries, for example
— and that some research in computational linguistics should
continue to be supported.
The publication of the report had a profound impact on
research into machine translation in the United States,
and to a lesser extent the Soviet
Union and United
Kingdom. Research, at least in the US, was almost completely
abandoned for over a decade. In Canada,
France
and Germany,
however, research continued. In 1970, the Systran
system was installed for the United
States Air Force, and in 1976 it was installed by the Commission
of the European Communities. The METEO
System, developed at the Université
de Montréal, was installed in Canada
in 1977 to translate weather forecasts from English to French,
and was translating close to 80,000 words a day or 30 million
words a year until it was replaced by a competitor's system
on 30 September 2001.[4]
While research in the 1960s concentrated on limited language
pairs and input, demand in the 1970s was for low-cost systems
that could translate a range of technical and commercial
documents. This demand was spurred by increasing globalisation
and the demand for translation in Canada,
Europe,
and Japan.
The 1980s and early 1990s
By the 1980s, both the diversity and the number of installed
systems for machine translation had increased. A number
of systems relying on mainframe
technology were in use, such as Systran
and Logos.
As a result of the improved availability of microcomputers,
there was a market for lower-end machine translation systems.
Many companies took advantage of this in Europe, Japan,
and the USA. Systems were also brought onto the market in
China,
Eastern
Europe, Korea,
and the Soviet
Union.
During the 1980s there was a great deal of MT activity, especially
in Japan. With the Fifth
generation computer project, Japan intended to leap ahead of its
competition in computer hardware and software, and one project
that many large Japanese electronics firms found themselves
involved in was creating software for translating to and
from English (Fujitsu, Toshiba, NTT, Brother, Catena, Matsushita,
Mitsubishi, Sharp, Sanyo, Hitachi, NEC, Panasonic, Kodensha,
Nova, Oki).
Research during the 1980s typically relied on translation
through some variety of intermediary linguistic representation
involving morphological along with syntactic and semantic
analysis.
At the end of the 1980s there was a surge in the number
of novel methods for machine translation. One system was
developed at IBM
that was based on statistical
methods. Other groups used methods based on large numbers
of example translations, a technique which is now termed
example-based
machine translation. A defining feature of both of these
approaches was the lack of syntactic and semantic rules
and reliance instead on the manipulation of large text corpora.
During the 1990s, encouraged by successes in speech
recognition and speech
synthesis, research began into speech translation.
There was significant growth in the use of machine translation
as a result of the advent of low-cost and more powerful
computers. It was in the early 1990s that machine translation
began to make the transition away from large mainframe
computers toward personal
computers and workstations.
Two companies that led the PC market for a time were Globalink
and MicroTac,
which merged in December 1994.
Intergraph and Systran also began to offer PC versions around
this time. Sites also became available on the internet,
such as AltaVista's
Babel
Fish (using Systran
technology) and Google
Language
Tools (also initially using Systran
technology exclusively).
Recent research
The field of machine translation has in the last few years
seen major changes. Currently a large amount of research
is being done into statistical
machine translation and example-based
machine translation. Today, only a few companies use
statistical
machine translation commercially, e.g. Language
Weaver (sells translation products and services), Google
(uses its proprietary statistical MT system for some language
combinations in Google's language tools) and Microsoft
(uses its proprietary statistical MT system to translate
knowledge base articles). There has been a renewed interest
in hybridisation, with researchers combining syntactic and
morphological (i.e., linguistic) knowledge into statistical
systems, as well as combining statistics with existing
rule-based systems.
Notes
1. Hutchins, J. (2005)
2. Weaver memorandum (March 1949)
3. Hutchins, J. (2005)
4. PROCUREMENT PROCESS by Canadian International Trade Tribunal, 30th July, 2002, consulted 2007-02-10
5. Van Slype, G. (1983)
Further reading
- Hutchins, J. (1986) Machine Translation: past, present,
future (Chichester: Ellis Horwood) ISBN 0-85312-788-3