The Rise of the Machine! (Machine Translation, that is)

One of the original anticipated uses of computers was machine translation. By Machine Translation (MT), we mean the automation of the translation process. As early as the 1950s, a primitive experiment translating sixty Russian sentences into English was deemed a success and led to a period of significant research funding which, through various ebbs and flows, continues to this day. It was thought that with this new technology, ‘old-style’ human translation would soon become obsolete. However, for several reasons this never happened, and ‘old-style’ translation practices are still dominant in the localization industry.

The main difficulty restricting the development of MT is that natural language is still too complicated to be broken down into a finite set of rules. This is especially true for words with several meanings (e.g. in an example provided by Tony Hartley, ‘light’ in English could be rendered in French as ‘allumer’, ‘lumière’, ‘léger’ or ‘clair’) and for idioms.
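The toy sketch below (Python) illustrates the problem. The four French senses come from the example above; the context cue words are a hypothetical rule set invented for illustration, not part of any real MT system. A naive word-by-word lookup always picks the first sense, and hand-written context rules only help in the cases their author foresaw.

```python
# Toy illustration of lexical ambiguity. The dictionary entry follows the
# 'light' example above; the cue words are a HYPOTHETICAL rule set.
LEXICON = {"light": ["allumer", "lumière", "léger", "clair"]}

# Hypothetical context cues for each French sense of 'light'.
SENSE_CUES = {
    "allumer": {"candle", "fire", "lamp"},
    "lumière": {"sun", "sunlight", "daylight"},
    "léger": {"suitcase", "weight", "meal"},
    "clair": {"blue", "green", "colour"},
}


def naive_lookup(word: str) -> str:
    """Word-by-word lookup: always returns the first listed sense."""
    return LEXICON.get(word, [word])[0]


def choose_sense(word: str, context: list[str]) -> str:
    """Pick the first sense whose cue words appear in the context."""
    senses = LEXICON.get(word, [word])
    context_set = {w.lower() for w in context}
    for sense in senses:
        if SENSE_CUES.get(sense, set()) & context_set:
            return sense
    # No rule matched: fall back to the first sense, which may well be wrong.
    return senses[0]
```

With these rules, `choose_sense("light", "a light suitcase".split())` returns ‘léger’ and ‘light blue’ yields ‘clair’, but ‘a light breakfast’ falls through to ‘allumer’ because no rule mentions breakfast: every new context needs yet another rule.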

Moreover, even if the software preserves the meaning, it will often fail to deliver a natural, easy-to-read style. For example, take this sentence:

Ces exemples ont été sélectionnés dans un grand nombre de sources différentes.

This was translated by a well-known MT application as:

These examples were selected in a great different number of sources.

This is mostly understandable, but it is neither natural nor even correct English. A human translator renders it as: “The examples have been selected from a wide variety of different sources”.

However, the promise and allure of MT refuse to disappear, and with many positive changes and opportunities facing the industry at large, MT is once again being called on to be a main player in global content production. This means it is also time for the localization community, buyers and sellers alike, to re-engage with the technology and understand exactly what applications and benefits it can offer. If proof of this necessity is needed, look at TAUS (the Translation Automation User Society) and its impressive list of members, and you will have no doubt about why you need to understand the current state of the technology and its implications.

Why the sudden increase in interest?

The current resurgence in popular interest is largely due to very good timing. As with many of history’s great shifts in thought and process, a number of different developments have come together at once to create opportunity for all. An overview of the biggest of these factors is given below:

• to be translated, linked to the growth in current, and future value of the global marketplace.
• The increased value and importance of huge volumes of user-created content. Wiki groups, crowdsourcing, chat and/or knowledge bases all help to support potential buyers around the world.
• The competitive nature of the online and technology industries means that anything offering even a hint of ‘faster time to market’, as machine translation surely does, is guaranteed consideration.
• Recognition of the importance of terminology control and of the value of consistency to a global brand.
• The ever-present need for tactical cost savings.
• The development of the market to the point of real global competition means that the product alone is no longer a competitive offering, driving more content demand than ever, often with very little additional budget available.
• Better technical output quality and more sophisticated rules from the MT tools.
• The massive statistical and analytical capabilities of computers.
• The emergence of ‘necessary quality’ rather than absolute quality

This final point on ‘necessary quality’ needs further clarification, as understanding it is central to the future adoption of MT and underlies most of the reasons for the sudden interest in the technology. In the past, MT was generally not considered good enough because its ‘quality’ was felt to be too low to be useful, and this judgment was often made in comparison with human translation, as if one would replace the other at the touch of a button.

However, the question now being asked is: what exactly is ‘quality’? Most relevant for our purposes is whether quality is more important than usability. If a machine-translated knowledge base can resolve 10,000 queries a quarter despite its grammatical awkwardness, many would argue that its quality is ‘good enough’ to justify the expense and time invested.

Arguably, therefore, the current question of quality is not so much human translation versus machine translation; it is machine translation versus no translation at all. Seen from this angle, the future of the technology is very promising indeed.

How Machine Translation Works

With the goals of the current MT movement in mind, we can now turn our attention to the different types of tools and systems available. There are two main methods for machine translation: Rules-Based Machine Translation (RBMT) and statistics-based machine translation (EBMT and SBMT).

Rules-Based Machine Translation (Transfer-based MT)

Rules-based techniques rely on linguistic rules for the source and target languages and attempt to translate the meaning through dictionaries and grammatical analysis. This technique draws on aspects of artificial intelligence.

The system starts by checking the terms against a dictionary and parsing (grammatically analyzing) the source. Parsing breaks each sentence into its parts, assigning a grammatical function (such as subject, predicate or object) to each word in the sentence. This produces better results than simple word-by-word transfer because it makes better use of sentence context and can yield translations that follow the grammar of the target language.
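As a rough sketch (Python, with a hypothetical four-word dictionary and a deliberately naive subject-verb-object rule), the two analysis steps might look like this: each word is checked against the dictionary, then assigned a grammatical function based on its position relative to the verb. Real parsers use full grammars rather than a single positional rule.

```python
from dataclasses import dataclass

# HYPOTHETICAL mini-dictionary: English word -> part of speech.
DICTIONARY = {"the": "DET", "cat": "NOUN", "sees": "VERB", "mouse": "NOUN"}


@dataclass
class Token:
    word: str
    pos: str
    role: str = ""  # grammatical function assigned by the parser


def analyse(sentence: str) -> list[Token]:
    # Step 1: check every term against the dictionary.
    tokens = []
    for word in sentence.lower().rstrip(".").split():
        if word not in DICTIONARY:
            raise KeyError(f"term not in dictionary: {word!r}")
        tokens.append(Token(word, DICTIONARY[word]))

    # Step 2: a deliberately naive subject-verb-object parse. Nouns before
    # the verb are treated as the subject, nouns after it as the object.
    verb_seen = False
    for token in tokens:
        if token.pos == "VERB":
            token.role = "predicate"
            verb_seen = True
        elif token.pos == "NOUN":
            token.role = "object" if verb_seen else "subject"
    return tokens
```

For ‘The cat sees the mouse.’ this marks ‘cat’ as subject, ‘sees’ as predicate and ‘mouse’ as object, while an unknown word such as ‘dog’ stops the analysis at the dictionary check.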

After the sentence has been parsed, a series of rules is used to reorder words and otherwise alter the structure of the incoming sentence to produce a translation that is grammatically correct for the target language.
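A single transfer rule can be sketched as follows (Python). It assumes the parser has already tagged each word with its part of speech, and applies one English-to-French rule: an adjective followed by a noun is swapped, since French usually places adjectives after the noun. The exception list is a hypothetical, incomplete sample of adjectives whose French equivalents normally come first, which hints at how quickly such rule sets grow.

```python
# HYPOTHETICAL sample of English adjectives whose usual French equivalents
# precede the noun (small -> petit, big -> grand, good -> bon, old -> vieux).
PRENOMINAL = {"small", "big", "good", "old"}


def reorder_adj_noun(tagged: list[tuple[str, str]]) -> list[tuple[str, str]]:
    """English -> French transfer rule: ADJ NOUN becomes NOUN ADJ,
    except for adjectives listed in PRENOMINAL."""
    out = list(tagged)
    i = 0
    while i < len(out) - 1:
        word, pos = out[i]
        if pos == "ADJ" and out[i + 1][1] == "NOUN" and word not in PRENOMINAL:
            out[i], out[i + 1] = out[i + 1], out[i]
            i += 2  # skip past the swapped pair
        else:
            i += 1
    return out
```

For ‘the big red car’ the rule produces ‘the big car red’, matching the word order of ‘la grande voiture rouge’; gender agreement (‘grand’ becoming ‘grande’) is left to the generation stage.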

Finally, a generation stage produces inflections, contractions, etc. for the target-language words. Transfer-based systems have to be designed for specific language pairs, so new sets of rules must be created for each language pair and each direction of translation.
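The generation stage can be illustrated with two simplified French rules (Python): regular adjective agreement and the contraction of preposition plus article (‘de le’ to ‘du’, ‘à le’ to ‘au’). Both are hypothetical simplifications: irregular adjectives such as ‘beau’/‘belle’ are ignored, and the contraction rule does not check whether ‘le’ is actually an article. Because the rules are specific to French, they also show why each language pair and direction needs its own rule set.

```python
# HYPOTHETICAL, heavily simplified French generation rules.
CONTRACTIONS = {
    ("de", "le"): "du",
    ("de", "les"): "des",
    ("à", "le"): "au",
    ("à", "les"): "aux",
}


def inflect_adjective(adjective: str, gender: str, number: str) -> str:
    """Regular agreement only: add -e for feminine ('f'), -s for plural ('pl')."""
    form = adjective
    if gender == "f" and not form.endswith("e"):
        form += "e"
    if number == "pl" and not form.endswith(("s", "x")):
        form += "s"
    return form


def contract(words: list[str]) -> list[str]:
    """Merge preposition + article pairs such as 'de le' into 'du'."""
    out: list[str] = []
    for word in words:
        if out and (out[-1], word) in CONTRACTIONS:
            out[-1] = CONTRACTIONS[(out[-1], word)]
        else:
            out.append(word)
    return out
```

For example, `inflect_adjective("grand", "f", "sg")` gives ‘grande’, and `contract(["le", "prix", "de", "le", "livre"])` gives ‘le prix du livre’.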

Statistical Machine Translation (Data-Driven MT)

The second main method is statistical. It uses a body of pre-existing translations and compares source strings against it to see whether an existing translation can be reused. It comprises EBMT (Example-Based Machine Translation: extraction of phrases for recombination) and SBMT (Statistics-Based Machine Translation: a statistical translation model based on word frequency). These systems analyze a large number of previously created bilingual sentence pairs to establish which words or expressions in one language are most frequently matched with words or expressions in the other.
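To illustrate the idea of finding which words are most frequently matched, the sketch below (Python) scores every source/target word pair that co-occurs in a small, hypothetical set of aligned sentence pairs using the Dice coefficient, 2 × joint count / (source count + target count). This is a simple association measure chosen for brevity, not the statistical translation model any particular product uses; production SBMT systems rely on probabilistic word-alignment models trained on far larger corpora.

```python
from collections import Counter
from itertools import product


def dice_scores(pairs: list[tuple[str, str]]) -> dict[tuple[str, str], float]:
    """Dice association for every source/target word pair that co-occurs
    in at least one aligned sentence pair."""
    src_freq: Counter = Counter()
    tgt_freq: Counter = Counter()
    joint: Counter = Counter()
    for src, tgt in pairs:
        s_words = set(src.lower().split())
        t_words = set(tgt.lower().split())
        src_freq.update(s_words)
        tgt_freq.update(t_words)
        joint.update(product(s_words, t_words))
    return {(s, t): 2 * n / (src_freq[s] + tgt_freq[t]) for (s, t), n in joint.items()}


def best_match(scores: dict[tuple[str, str], float], source_word: str):
    """Target word most strongly associated with source_word (None if unseen)."""
    candidates = {t: v for (s, t), v in scores.items() if s == source_word}
    return max(candidates, key=candidates.get) if candidates else None


# HYPOTHETICAL toy corpus of previously translated sentence pairs.
CORPUS = [
    ("the house", "la maison"),
    ("the blue house", "la maison bleue"),
    ("the car", "la voiture"),
    ("a blue car", "une voiture bleue"),
]
```

On this four-sentence corpus, `best_match(dice_scores(CORPUS), "house")` returns ‘maison’, and ‘blue’ and ‘car’ pair with ‘bleue’ and ‘voiture’, purely from co-occurrence counts.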

This approach offers the potential for much faster development of an MT system and can take advantage of the large translation memory databases that many organizations have built up.
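Reusing a translation memory typically involves ‘fuzzy’ matching of new segments against stored ones. A minimal sketch in Python, using the standard library's `difflib.SequenceMatcher` as the similarity measure; the memory contents and the 0.75 threshold are assumptions for illustration, and commercial TM tools use their own scoring.

```python
from difflib import SequenceMatcher

# HYPOTHETICAL translation memory: source segment -> approved translation.
TM = {
    "Click the Save button.": "Cliquez sur le bouton Enregistrer.",
    "Click the Cancel button.": "Cliquez sur le bouton Annuler.",
}


def tm_lookup(segment: str, memory: dict[str, str], threshold: float = 0.75):
    """Return (translation, score) for the most similar stored segment,
    or (None, score) if nothing reaches the threshold."""
    best_src, best_score = None, 0.0
    for src in memory:
        score = SequenceMatcher(None, segment, src).ratio()
        if score > best_score:
            best_src, best_score = src, score
    if best_src is None or best_score < threshold:
        return None, best_score
    return memory[best_src], best_score
```

An exact repeat scores 1.0, while ‘Click the Print button.’ still clears the threshold and returns the stored translation for a different button, which is why fuzzy matches need review rather than blind acceptance.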

The drawback of this method is the difficulty of building a reliable translation corpus. For example, this is the main technique Google uses for its online translation service, and Google has had to input as many as 200,000,000,000 words of translation to improve the service.

The Hybrid system

Most machine translation technologies combine the two techniques above. A hybrid system involves several tools and processes, mainly RBMT coupled with EBMT/SBMT plus standard translation memories (TMs). For example, to translate a given text, EBMT/SBMT is applied first, and any text remaining untranslated after this step is translated using RBMT.
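The fallback order described above can be sketched as follows (Python). The memory, the word list and `toy_rbmt` are hypothetical placeholders: exact-match lookup stands in for EBMT/SBMT, and plain word-by-word lookup stands in for a real rule-based engine. Each result is tagged with the component that produced it.

```python
from typing import Callable

# HYPOTHETICAL data: an exact-match memory and a tiny word list.
MEMORY = {"Click the Save button.": "Cliquez sur le bouton Enregistrer."}
WORDS = {"open": "ouvrez", "the": "le", "file": "fichier"}


def toy_rbmt(segment: str) -> str:
    """Placeholder for a rule-based engine: plain word-by-word lookup."""
    return " ".join(WORDS.get(w, w) for w in segment.lower().rstrip(".").split())


def hybrid_translate(segments: list[str], memory: dict[str, str],
                     rule_based: Callable[[str], str]) -> list[tuple[str, str]]:
    """Use stored translations where they exist; send the rest to RBMT.
    Returns (translation, origin) pairs."""
    results = []
    for segment in segments:
        if segment in memory:
            results.append((memory[segment], "TM/EBMT"))
        else:
            results.append((rule_based(segment), "RBMT"))
    return results
```

Recording the origin of each segment is one possible design choice; it could let post-editors concentrate on the segments the rule-based component produced.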

A Proposed Process

The following is a potential process for using machine translation:

• an optional first stage of creating dictionaries and translation memories;
• processing the translation through the machine translation engine;
• a third optional stage of post-editing the files to ensure desired quality.
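The three stages can be expressed as a small pipeline (Python). The engine, its vocabulary and the glossary are hypothetical; the point is that stages 1 and 3 are optional inputs, and that a client dictionary feeds term control into the engine.

```python
from typing import Callable, Optional

Engine = Callable[[str, dict[str, str]], str]
PostEditor = Callable[[str, str], str]

# HYPOTHETICAL engine vocabulary for the demo engine below.
BASE_WORDS = {"save": "sauver", "the": "le", "file": "fichier"}


def demo_engine(segment: str, glossary: dict[str, str]) -> str:
    """Toy MT engine: client glossary entries override its own vocabulary."""
    return " ".join(glossary.get(w, BASE_WORDS.get(w, w)) for w in segment.lower().split())


def run_mt_process(segments: list[str], engine: Engine,
                   glossary: Optional[dict[str, str]] = None,
                   post_edit: Optional[PostEditor] = None) -> list[str]:
    # Stage 1 (optional): client dictionary; None means the stage was skipped.
    terms = glossary or {}
    # Stage 2: run every segment through the MT engine.
    raw = [engine(s, terms) for s in segments]
    # Stage 3 (optional): post-editing sees both the source and the raw MT.
    if post_edit is None:
        return raw
    return [post_edit(src, mt) for src, mt in zip(segments, raw)]
```

Without a glossary the demo engine renders ‘save the file’ as ‘sauver le fichier’; supplying `{"save": "enregistrer"}` yields ‘enregistrer le fichier’, the term usually expected in French software, and a post-editing step can then tidy the output further.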

There are four options for integrating this into a company’s processes:

1. The client does the machine translation (sets up the dictionaries and processes the files) and the L10N vendor checks the translation to ensure that it is at the required quality (normally called post-editing).

2. The L10N company takes the source files from the customer, processes them and post-edits them itself.

3. The L10N company takes the source files from the customer, outsources the machine translation and post-edits the files itself.

4. The L10N company takes the source text, machine translates it and delivers the machine-translated files without human involvement in the text; the only human involvement is in processing the files.

It is commonly agreed that the quality of raw MT output ranges from poor to average. In any case, the expected quality must be clearly defined with the client before any project starts.

Pre and Post-Editing Process

Machine translation is most attractive for short-lived text such as online news, knowledge bases, and anything that can be obtained free of charge. As soon as something incurs a cost, the client will ask what he or she is getting for the money. Bringing raw machine-translated text to an acceptable quality requires pre-editing of the source and post-editing of the output, and these expenses need to be justified.

To get a reasonable level of quality, you need pre-edited files (by writers who know their output will be machine translated), an effective MT software package, and a considerable (but variable) amount of post-editing work by linguistic reviewers.
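Pre-editing is often supported by simple automated checks on the source. The Python sketch below flags long sentences and idioms; the 20-word limit and the idiom list are hypothetical examples, since the actual rules depend on the client's style guide and the MT engine in use.

```python
import re

# HYPOTHETICAL controlled-language rules; real style guides vary by client and engine.
MAX_WORDS = 20
IDIOMS = ("a piece of cake", "on the fly", "touch base")


def pre_edit_check(text: str, max_words: int = MAX_WORDS) -> list[str]:
    """Flag sentences that are likely to translate poorly."""
    issues = []
    # Split after sentence-final punctuation followed by whitespace.
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
    for n, sentence in enumerate(sentences, start=1):
        words = sentence.split()
        if len(words) > max_words:
            issues.append(f"sentence {n}: {len(words)} words (limit {max_words})")
        lowered = sentence.lower()
        for idiom in IDIOMS:
            if idiom in lowered:
                issues.append(f"sentence {n}: idiom '{idiom}'")
    return issues
```

A writer would then rephrase ‘Installing the update is a piece of cake.’ as something literal, such as ‘The update is easy to install.’, before the text reaches the MT engine.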

Machine translation post-editing is a new task in the localization field that requires slightly different skills from traditional translation. The post-editing localizer has to:

a) Compare the source material with the raw machine translation and decide to what degree the translation is usable.

b) Make corrections, which could range from minor improvements of grammar to complete re-writes.
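One way to gauge how much work step b) involved is to count the word-level edits between the raw machine translation and the post-edited result. The Python sketch below computes a word-level Levenshtein distance divided by the length of the post-edited text. This is an assumption, a simplified measure in the spirit of TER-style metrics (without their block-shift operation), not a method prescribed here.

```python
def word_edit_distance(a: str, b: str) -> int:
    """Minimum word insertions, deletions and substitutions turning a into b."""
    x, y = a.split(), b.split()
    prev = list(range(len(y) + 1))
    for i, wx in enumerate(x, start=1):
        cur = [i]
        for j, wy in enumerate(y, start=1):
            cur.append(min(prev[j] + 1,                  # delete wx
                           cur[j - 1] + 1,               # insert wy
                           prev[j - 1] + (wx != wy)))    # substitute or keep
        prev = cur
    return prev[-1]


def edit_rate(raw_mt: str, post_edited: str) -> float:
    """Edits per word of the post-edited text (0.0 = raw MT kept unchanged)."""
    length = len(post_edited.split())
    return word_edit_distance(raw_mt, post_edited) / length if length else 0.0
```

Comparing the raw MT output quoted earlier (‘These examples were selected in a great different number of sources.’) with the human rendering gives 8 word edits against a 12-word reference, a rate of about 0.67.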

To the future!

With an understanding of the aims and potential applications of MT, and armed with knowledge of the different types available, the question becomes: what next? The whole industry is struggling with this question right now. Everyone can see the benefits and applications, but the exact delivery and business model are still up for debate.

Larger organizations such as Cisco and Microsoft are applying this technology to great effect, not only to FAQ content stamped with an MT warning but within their general content creation processes. However, there is still significant pain, and the business model is not yet ‘written in stone’. There is room for improvement, and these organizations are active participants in the TAUS forum.

Google, too, has invested millions and has a lot of content and product capability to show for it, but still no distinct business model for the application.

These issues are discussed in more detail in the full version of this whitepaper at www.jonckers.com/en/whitepapers. Indeed, much research is going on right now to push content capability at this end of the localization business, but it is equally true that a huge opportunity exists that has yet to find an answer. Every day more content piles up, locked in its source language and waiting to be discovered. Someone will find a way to unlock its meaning: could it be you?

Whatever the final answer is, one thing is beyond question at this point: MT will have a role in solving the global content conundrum.

About Jonckers

Jonckers, Localization Provider of the Year 2006, is focused on delivering software, eLearning, and multimedia localization services to the world’s best companies. Jonckers achieves localization excellence through an ERP-controlled global network of wholly owned offices spanning Asia, Europe and the US, allowing Jonckers to deliver low-cost global resources without sacrificing quality. For more information please visit www.jonckers.com.

Jonckers Contact Information

Europe
Ian Butler
+353 866090384
email: ian.butler[at]jonckers.com

Asia Pacific
Sung Cho
+82-2-6627-3000
email: sung.cho[at]jonckers.com

USA
Roman Sofianos
+1 877 590 1927, ext 706
email: roman.sofianos[at]jonckers.com




ClientSide News Magazine - www.clientsidenews.com