Translation Technology and the Translator

By John Hutchins

WJHutchins@compuserve.com
http://ourworld.compuserve.com/homepages/WJHutchins/

(University of East Anglia, Norwich, UK)

[From: ITI conference 11: international conference, exhibition & AGM. Proceedings compiled by Catherine Greensmith & Marilyn Vandamme. Proceedings of the Eleventh Conference of the Institute of Translation and Interpreting, 8-10 May 1997, at The Crown Hotel, Crown Place, Harrogate. London: ITI, 1997. Pp. 113-120.]

Translators are perhaps the most critical audience for presentations about the automation of translation. Many of them will agree with comments made by J.E.Holmström in a report on scientific and technical dictionaries submitted to Unesco in 1949. Having heard that some researchers were investigating the possibilities, he thought that "the resulting literary style would be atrocious and fuller of 'howlers' and false values than the worst that any human translator produces". The reason was that "translation is an art; something which at every step involves personal choice between uncodifiable alternatives; not merely direct substitutions of equated sets of symbols but choices of values dependent for their soundness on the whole antecedent education and personality of the translator." His comments preceded by five years the first tentative demonstration of a prototype system, and were based on pure speculation. Nevertheless, such comments have been repeated again and again by translators for nearly fifty years, and no doubt they shall be heard again in the next fifty.

However, we shall see that computer-based translation systems are not rivals to human translators: either they are aids that enable translators to increase productivity in technical translation, or they provide a means of translating material which no human translator has ever attempted. In this context we must distinguish (1) machine translation (MT), which aims to undertake the whole translation process, but whose output must invariably be revised; (2) computer aids for translators (translation tools), which support the professional translator; and (3) translation systems for the 'occasional' non-translator user, which produce only rough versions to aid comprehension. These differences were not recognised until the late 1980s; the previous assumption had been that MT systems, whether running on a mainframe or a microcomputer, could serve all these functions with greater or lesser success. In part, this failure to identify different needs and to design systems specifically to meet them has contributed to misconceptions about what translation technology can do for the professional translator.

When machine translation (MT) was in its infancy, in the early 1950s, research was necessarily modest in its aims.¹ It was constrained by the limitations of hardware, in particular by inadequate computer memories and slow access to storage, and by the unavailability of high-level programming languages. Even more crucially, it could look to no assistance from the language experts. Syntax was a relatively neglected area of linguistic study and semantics was virtually ignored. The early researchers knew that whatever systems they could develop would produce poor quality results, and they assumed major involvement of human translators both in the pre-editing of input texts and in the post-editing of the output. They proposed also the development of controlled languages and the restriction of systems to specific subject areas.

In this atmosphere the first demonstration systems were developed, notably the collaboration between IBM and Georgetown University in 1954. Based on small vocabularies and carefully selected texts, the translations produced were impressively colloquial. Consequently, the general public and potential sponsors of MT research were led to believe that good quality output from automatic systems was achievable within a matter of a few years. The belief was strengthened by the emergence of greatly improved computer hardware, the first programming languages, and above all by developments in syntactic analysis based on research in formal grammars (e.g. by Chomsky and others).

For the next decade MT research grew in ambition. It became widely assumed that the goal of MT must be the development of fully automatic systems producing high quality translations. The use of human assistance was regarded as an interim arrangement. The emphasis of research was therefore on the search for theories and methods for the achievement of 'perfect' translations. The current operational systems were regarded as temporary solutions to be superseded in the near future. There was virtually no serious consideration of how 'less than perfect' MT could be used effectively and economically in practice. Even more damaging was the almost total neglect of the expertise of professional translators, who naturally became anxious and antagonistic. They foresaw the loss of their jobs, since this is what many MT researchers themselves believed was inevitable.

Progress was much slower than expected, and the output of systems showed no sign of improvement. In these circumstances it was not surprising that in 1966 a committee set up by US sponsors of research - the Automatic Language Processing Advisory Committee (ALPAC) - found that MT had failed according to its own aims, since there were no fully automatic systems capable of good quality translation and there seemed little prospect of such systems in the near future.

While this ALPAC report brought to an end many MT projects, it did not banish the public perception of MT research as essentially the search for fully automatic solutions. The subsequent history of translation technology is in part the story of how this mistaken emphasis of the early years has had to be repaired and corrected. The neglect of the translation profession has been made good eventually by the provision of translation tools and translator workstations. MT research has itself turned increasingly to the development of realistic practical systems where the necessity for human involvement at different stages of the process is fully accepted as an integral component of their design architecture.

Hence, since the early 1970s, development has continued in three main strands: computer-based tools for translators, operational MT systems involving human assistance in various ways, and 'pure' theoretical research towards the improvement of MT methods.

Until the late 1980s one paradigm dominated the utilisation of MT systems. It had been inherited from the very earliest days: the system produced large volumes of poorly translated texts, which were either (i) used for the assimilation of information directly or (ii) submitted to extensive post-editing, with the aim of getting texts of publishable quality for dissemination. As a means of improving the quality many organisations introduced controls on the vocabulary, structure and style of texts before input to systems; and this has been how Systran, Logos, METAL and similar mainframe systems have been used (and continue to be used) by multinational companies and other large organisations.

When the first PC versions of MT systems appeared it was widely assumed that they would be used in much the same way: to obtain 'rough gists' for information purposes or as 'draft translations' for later refinement. In both cases, it was also widely assumed that the principal users of MT systems would be translators or at least people with good knowledge of both source and target languages; and, in the case of use in large organisations, it was expected that most would be professionally trained translators.

However, during the late 1980s - and with increasing pace since the early 1990s - this paradigm and its assumptions have been broken by developments on a number of fronts.² Firstly, there has been the commercial availability of translator workstations, designed specifically for the use of professional translators; these are essentially computer-based translation tools and not intended to produce even partial translations fully automatically. Secondly, the PC-based systems were bought and used by an increasingly large number of people with no interest in translation as such; they were being used as 'aids for communication', where translation quality was of much less importance. Thirdly, there came the development of domain-specific systems by clients themselves: custom-built systems accepting input in constrained vocabulary and integrated closely in documentation and publication systems. Fourthly, the growth of telecommunication networks with communication across many languages has led to a demand for translation devices to deal rapidly in real time with an immense and growing volume of electronic language. Finally, the wider availability of databases and information resources in many different languages has led to the need for multilingual search and access devices which incorporate translation modules.

All current commercial and operational systems produce output which must be edited (revised) if it is to attain publishable quality. Only if rough translations are acceptable for information analysis purposes can the output of MT systems be left unrevised. Commercial developers of MT systems now always stress to customers that MT does not and cannot produce translations acceptable without revision: they stress the imperfect nature of MT output. They recognise fully the obligation to provide sophisticated facilities for the formatting, input, revision and publication of texts within total documentation processing from initial authoring to final dissemination.

It is now widely accepted that MT works best in domain-specific and controlled environments. The first domain-specific success was Meteo, a system for translating weather forecasts from English into French, which has been used continuously since 1977 by the Canadian broadcasting service. The use of controlled input was taken up in the late 1970s by Xerox for its implementation of the Systran system. Other applications of controlled input have followed in the 1980s and 1990s with other general-purpose systems, e.g. for the localisation of computer software for sale in many countries and languages.

However, rather than adapting general-purpose MT systems in this way, it is now recognised that it is better to design systems ab initio for use with controlled language. A number of independent companies outside the academic MT research community have been doing this in recent years (e.g. Volmac); the largest current development is the Caterpillar project based on the research at Carnegie Mellon University.
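The idea of controlled input can be illustrated with a minimal sketch of a pre-editing check. It is not taken from any of the systems named above: the approved vocabulary, the sentence-length limit and the function name check_sentence are all assumptions made for illustration. The check simply reports words outside an approved list and sentences that exceed a length limit, which is the kind of control on vocabulary and structure that organisations apply before texts are submitted to an MT system.

```python
import re

# Hypothetical approved vocabulary and length limit, for illustration only.
APPROVED_WORDS = {
    "check", "the", "oil", "level", "before", "you", "start",
    "engine", "stop", "replace", "filter",
}
MAX_WORDS_PER_SENTENCE = 20


def check_sentence(sentence, approved=APPROVED_WORDS,
                   max_words=MAX_WORDS_PER_SENTENCE):
    """Return a list of problems found in one sentence (empty if none)."""
    # Lower-case the sentence and extract alphabetic word tokens.
    words = re.findall(r"[a-z]+(?:'[a-z]+)?", sentence.lower())
    problems = []
    # Vocabulary control: report every distinct word not on the approved list.
    unknown = sorted({w for w in words if w not in approved})
    if unknown:
        problems.append("unapproved words: " + ", ".join(unknown))
    # Structural control: report sentences longer than the limit.
    if len(words) > max_words:
        problems.append(
            f"sentence too long: {len(words)} words (limit {max_words})"
        )
    return problems


if __name__ == "__main__":
    for text in ("Check the oil level before you start the engine.",
                 "Inspect the oil level."):
        print(text, "->", check_sentence(text) or "OK")
```

A real controlled language would also constrain grammar and style; this sketch covers only the two simplest controls.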

In general, most commentators agree that MT (full automation) as such is quite inappropriate for professional translators. They do not want to be subservient to machines; few want to be revisers of poor quality MT output. What they have long been asking for are sophisticated translation tools. Since the early 1990s these have been available in the shape of translation workstations, which offer translators the opportunity of making their work more productive without taking away the intellectual challenge of translation. Translator workstations combine access to dictionaries and terminological databanks, multilingual word processing, the management of glossaries and terminology resources, and appropriate facilities for the input and output of texts (e.g. OCR scanners, electronic transmission, high-class printing).

The development of translation tools became feasible, firstly with the availability of real-time interactive computer environments in the late 1960s, then the appearance of word processing in the 1970s and of microcomputers in the 1980s and, subsequently, with intra-organisation networking and the development of larger computer storage capacities. Although workstations were developed outside the older MT research community, their appearance has led to a decline of the previous antagonism of translators to the MT community in general. They are seen as the direct result of MT research. Indeed, the 'translation memory' facility, which enables the storage of and access to existing translations for later (partial) reuse or revision or as sources of example translations, does in fact derive directly from what was initially 'pure' MT research on bilingual text alignment within a statistics-based approach to automatic translation.

At the present time, the sales of translator workstations incorporating translation memories are increasing rapidly, particularly in Europe. Their success has built upon translators' experience with terminology management systems and upon the demonstrable improvements of productivity, terminological consistency and overall quality. The next stage of development will be the fuller integration of MT modules in order to provide automatic translation of sentences or text fragments when required, e.g. if the existing texts in a translation memory do not provide usable translation sources.
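How a translation memory with an MT fallback might behave can be shown in a minimal sketch. It does not describe any commercial workstation: the stored segment pairs, the similarity threshold of 0.75, and the names lookup and machine_translate are assumptions made for illustration, and the similarity measure is simply the ratio from Python's standard difflib module. A new segment is first looked up exactly, then matched approximately against stored source segments, and only passed to a (placeholder) MT module if no stored translation is close enough.

```python
from difflib import SequenceMatcher

# Hypothetical translation memory: source segment -> stored translation.
MEMORY = {
    "Switch off the engine before servicing.":
        "Arrêtez le moteur avant l'entretien.",
    "Check the oil level daily.":
        "Vérifiez le niveau d'huile chaque jour.",
}


def similarity(a, b):
    """Character-based similarity between 0.0 and 1.0, ignoring case."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def machine_translate(segment):
    """Placeholder for an MT module; a real system would call an engine."""
    return "[MT draft] " + segment


def lookup(segment, memory=MEMORY, threshold=0.75):
    """Return (origin, suggestion, score) for one source segment.

    origin is 'exact', 'fuzzy' or 'mt'. The threshold is an assumed value.
    """
    # Exact match: the stored translation can be reused as it stands.
    if segment in memory:
        return "exact", memory[segment], 1.0
    # Fuzzy match: find the most similar stored source segment.
    best_source, best_score = None, 0.0
    for source in memory:
        score = similarity(segment, source)
        if score > best_score:
            best_source, best_score = source, score
    if best_source is not None and best_score >= threshold:
        # Offered to the translator for partial reuse and revision.
        return "fuzzy", memory[best_source], best_score
    # No usable stored translation: fall back to the MT module.
    return "mt", machine_translate(segment), best_score


if __name__ == "__main__":
    for text in ("Check the oil level daily.",
                 "Check the oil level weekly.",
                 "Replace the fuel filter."):
        origin, suggestion, score = lookup(text)
        print(f"{origin:5} {score:.2f} {suggestion}")
```

In every case the suggestion is only a proposal; the translator decides whether to accept, revise or discard it.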

After ALPAC, research on MT has, of course, continued.³ However, the field has continued to attract perfectionists. Very often, systems have been developed without any idea of how they might be used or who the users might be. MT has been seen as a testbed for exploring new linguistic and computational techniques. In nearly every case, it was found that the 'pure' adoption of a new theory was not as successful as initial trials on small samples appeared to demonstrate. The basic lesson is that MT demands an eclectic approach, the use of hybrid methods combining a variety of techniques; and, above all, no quick results can be expected with any new approach.

What was often forgotten is that MT is the application of computational, linguistic, etc. methods and techniques to a practical task; that translation is itself a means to an end - a task which has never been and cannot be 'perfect'; there are always other possible (often multiple) translations of the same text according to different circumstances and requirements. MT can be no different: there cannot be a 'perfect' automatic translation. The use of an MT system is contingent upon its cost effectiveness in practical situations.

Within the last ten years, research on spoken translation has developed into a major focus of MT activity. Research projects such as those at ATR in Japan, Carnegie Mellon University in the US and on the Verbmobil project in Germany are ambitious. But they do not make the mistake of attempting to build all-purpose systems: systems are constrained and limited to specific domains, sublanguages and categories of users. Nevertheless, there are obvious potential benefits even if success is only partial.

Research has begun also on systems for speakers or writers who are ignorant of the target language; an area neglected in the past. In these cases, what is required is a means of conveying a message in an unknown language; it does not have to be a straight translation of any existing original. From interactive dialogue a translatable (MT-amenable) 'message' can be composed for automatic conversion into an idiomatic and correct message in the target language without further involvement of the originator.

As for translation for those wholly ignorant of the source language, this need has been met until recently by the use of unrevised outputs from older batch-processing systems, i.e. as by-products of systems primarily intended to produce translations for revision before publication. Within the last decade, however, cheap PC-based software has appeared on the market which can be (and undoubtedly is being) used by monolinguals who want only to grasp something of the gist of texts. These products are not wholly satisfactory, of course, and the development of fully automatic systems specifically for this potentially huge market is a challenge for future MT research.

With the expansion of global telecommunications (the Internet and World Wide Web) has come the networking of translation services. Nearly all the larger MT software vendors now offer their systems as a service to individual or company customers. Texts can be sent on-line for immediate 'rough' translation with no post-editing, or for treatment in a more traditional manner with expert revision, editing and preparation for publication by the service. This form of networked MT is clearly a further development of familiar translation services, and one with considerable growth potential. It is assumed that in future there will emerge various forms of networked 'translation brokerage' services which will advise customers on the most appropriate MT service for their needs, e.g. in terms of costs, languages, speed, dictionary coverage, terminology control, overall translation quality, post-editing support, etc. Some of these 'translation brokers' may themselves be automated, and undertake searches of the Web for particular client needs. As a consequence, we may well see the emergence of more specialised MT systems (for particular domains and language pairs), some of which will thrive and others which will fail in the global competitive market.

Even more significant for the future, however, is the appearance of systems for on-line and real-time translation of electronic mail messages. In 1994 the CompuServe service introduced automatic translation between English and French, German or Spanish for messages on one of its forums.⁴ It became so popular that the facility was extended to two other on-line services within the next couple of years, and thousands of messages a day are now being translated. The software used was not, of course, designed originally to deal with the frequently ungrammatical conversational style and the sometimes idiosyncratic vocabulary of electronic mail. Hence, much of the output is garbled and barely comprehensible; but a large number of users have found the results valuable aids for comprehension.

Only a fully automatic system could operate in real time on this scale. The potential market for network MT systems is enormous. At CompuServe alone there are more than 3,000 other on-line services where MT could be introduced; and other Internet services could easily follow their lead. It has been estimated that there are currently over 40 million electronic mail messages a month. If only a small fraction of these were candidates for translation, the demand would be enormous.

In addition to electronic messages, there is the information available in text form on Web pages, which can now be counted in their hundreds of millions and which are growing exponentially at a high rate (10% between 1995 and 1996). The non-English content is estimated as 80% of the total, and there is no doubt that readers everywhere prefer to have text in their own language, no matter how flawed and error-ridden it may be, rather than to struggle to understand a foreign language text. The Japanese software companies have already recognised the huge potential market and there are a number of English-Japanese translation modules available for integration with Web software. Similar Web translation software is being developed and sold for other languages, both by existing vendors of MT systems and by new companies.

A further factor will be the growth of multilingual access to information sources. Increasingly, the expectation of users is that on-line databases should be searchable in their own language, and that the information should be translated and summarised into their own language. The European Union is placing considerable emphasis on the development of tools for information access for all members of the community. Translation components are obviously essential to such tools; they will be developed not as independent stand-alone modules, but fully integrated with the access software for the specific domains of databases. The use of MT in this wider context is clearly due for rapid development in the near future.

Where do these developments leave the professional translator? It is plausible to divide the demand for translation into three main groups. The first group is the traditional demand for translations of publishable quality: translation for dissemination. The second, emerging with the information explosion of the twentieth century, is the demand for translations of short-lived documents for information gathering and analysis which can be provided in unedited forms: translation for assimilation. The third group is the demand for on-the-spot translation - the traditional role of the interpreter - which has taken a new form with electronic telecommunications: translation for interaction.

Translation for dissemination has been satisfied with mixed successes and frequent failures by the large-scale MT systems which are most familiar to translators. Cost-effective use of relatively poor quality output, which has to be revised by human translators, is difficult to achieve without some control of the language of input texts (at least for terminology consistency). It has been an option for only the largest multinational companies with large volumes of documentation, which cannot be dealt with except by automating parts of their total documentation processes. In recent years, translation workstations have offered a feasible and probably more attractive route for professional translators: translations of publishable quality can be made at higher productivity levels while maintaining translators' traditional working methods. In the future, we can expect the majority of professional translators to be using such tools - not just from commercial expediency, but from personal job satisfaction.

Translation for assimilation has not traditionally been undertaken by professional translators. The work has been done in organisations often by secretaries or other clerical staff with some knowledge of languages as an occasional service, and usually under time pressures. Those performing the work have naturally been dissatisfied with the results, since they are not professionally trained. In this function, MT has filled a gap since the first systems were available in the early 1960s. The use of Systran at the European Commission illustrates the value of such 'rough' translation facilities. This use exceeds by far its use for the production of translations for dissemination. It is believed that most of the use of the cheaper PC-based translation software is translation for information assimilation, mainly for personal use but sometimes within an organisation. Rarely, if ever, do professional translators see this output. Undoubtedly, there will continue to be a large and growing demand for this type of translation - one which the translation profession as such has not been able to meet in the past.

Translation for interaction covers the role of translation in face-to-face communication (dialogue, conversation) and in correspondence, whether traditional mail or the newer electronic, more immediate, form. Translators have from time to time been employed by their organisations in these areas, e.g. as interpreters for foreign visitors and as mediators in company correspondence, and they will continue to be so employed. But for the real-time translation of electronic messages it is not possible to envisage any role for the translator; for this, the only possibility is the use of fully automatic systems.

However, the very familiarity of MT systems will alert a much wider public to translation as a major and crucial feature of global communication, and probably to a degree never before experienced. Inevitably, translation will itself receive a much higher profile than in the past. People using the crude output of MT systems will come to realise the added value (i.e. higher quality) of professionally produced translations. As a result, the demand for human-produced translation will rise, and the translation profession will be busier than ever. Fortunately, professional translators will have the support of a wide range of computer-based translation tools, enabling them to increase productivity and to improve consistency and quality. In brief, automation and MT will not be a threat to the livelihood of the translator, but will be the source of even greater business and the means of achieving considerably improved working conditions.

¹ For the history of machine translation see: W.J.Hutchins: Machine translation: past, present, future. Chichester (UK): Ellis Horwood, 1986.

² For a survey of current use of MT systems see: C.Brace, M.Vasconcellos and L.C.Miller: 'MT users and usage: Europe and the Americas', MT News International no.12 (October 1995), 14-19.

³ For a review of MT research see: W.J.Hutchins: 'Research methods and system designs in machine translation: a ten-year review, 1984-1994', in: Machine Translation Ten Years On, international conference, 12-14 November 1994, Cranfield University.

⁴ For details see: M.Flanagan: 'Two years online: experiences, challenges and trends', in: Expanding MT Horizons: proceedings of the Second Conference of the Association for Machine Translation in the Americas, 2-5 October 1996, Montreal, Quebec, Canada, pp. 192-197.

Reprinted by permission from Mr. John Hutchins,
http://ourworld.compuserve.com/homepages/WJHutchins/
