
Compatibility of Languages and Other Restrictions in the Statistical Translation by Google Translate






This is an abridged translation into English of an article about the Google Translate technology which was written in Russian in December 2011. Prepared by Elena Tikhomirova and Nina Buonaiuto.

Introduction

Sometimes we wonder whether machines can replace humans in a particular area. As information technology develops, fewer calculations have to be performed manually, and computers take over some tasks completely, with no need for human control. In machine translation, however, programs mostly serve as a tool for professionals, and machine output still needs improvement. Google offers an innovative technology based on the statistical machine translation method and has previously claimed that it can independently translate a site "at one click" and provide a sufficient understanding of the original. The purpose of this survey is to determine the adequacy of this product in different translation directions and for various kinds of information, and to verify the statement by the Google Translate developers that the service should improve significantly over time.

The online translator was tested in March-April 2011 in English, Russian, Ukrainian, French and German. The survey was continued and completed in October-December 2011, when Polish was added to refine the results and all language pairs were retested on the same texts.

About the Technology of Google’s Online Translator

Google has posted the following claims on its web resources. Read on to see whether they prove realistic.

Google’s second guiding principle, out of ten, reads as follows: “It’s best to do one thing really, really well.” The Google Translate technology is a direct extension of the search engine, the key service that won the company its popularity. Like the search engine, Google’s online translator a) builds a database of texts from the Internet (variants of word and phrase matches between different languages) and b) develops algorithms for selecting the best match to the query (to the words and phrases of the source text). This is the statistical approach to machine translation.
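As a rough illustration of steps a) and b), the sketch below builds a tiny table of observed matches and picks the most frequent one. The observation list, its counts and the function names are assumptions invented for this example. They do not describe Google's actual data or selection algorithms.

```python
from collections import Counter, defaultdict

# Hypothetical aligned pairs "seen" in already-translated documents (English -> German).
# The pairs and their frequencies are invented for illustration only.
observations = [
    ("love", "Liebe"), ("love", "Liebe"), ("love", "Liebe"),
    ("love", "lieben"), ("love", "lieben"),
    ("to talk", "zu sprechen"),
]

def build_table(pairs):
    """Step a): count how often each source phrase was matched with each target phrase."""
    table = defaultdict(Counter)
    for source, target in pairs:
        table[source][target] += 1
    return table

def best_match(table, phrase):
    """Step b): return the most frequent match, or the phrase itself if it was never seen."""
    if phrase not in table:
        return phrase
    return table[phrase].most_common(1)[0][0]

table = build_table(observations)
print(best_match(table, "love"))  # prints: Liebe
```

Note that the most frequent match wins regardless of context: in this toy table "love" always becomes the noun, even where the verb is needed.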

FYI: how Google Translate works, according to its creators

(Quotes from the Google Translate blogs.)

googletranslate.blogspot.com

When Google Translate generates a translation, it looks for patterns in hundreds of millions of documents to help decide on the best translation for you. By detecting patterns in documents that have already been translated by human translators, Google Translate can make intelligent guesses as to what an appropriate translation should be. This process of seeking patterns in large amounts of text is called "statistical machine translation". Since the translations are generated by machines, not all translation will be perfect. The more human-translated documents that Google Translate can analyse in a specific language, the better the translation quality will be. This is why translation accuracy will sometimes vary across languages.

googlerussiablog.blogspot.com (translated from Russian by E.T.)

Common machine translators usually transform grammar constructions from one language to another based on strictly detailed rules.

An example of such a rule may be "if Present Perfect was used in the original, then in the Russian translation the respective verb form should be used".

These rules may be comparatively complex or simple. Rules that recognize complex constructions and change the word order in the output text may be used. In traditional translators, in any case, they have to be written manually. This approach has its pluses and minuses; the main minus is the inhuman amount of work needed to cover all the variety of a language with such rules.

Google Translate is different in essence. We have a set of statistical heuristics, for example, "this word sequence is usually translated in this way", which is supplemented by a number of auxiliary rules that generalize word groups. There are more of these rules than you would normally find in a traditional dictionary, which is why [sic!] they are not compiled manually but generated automatically.

To teach Google Translate, we initially took a number of texts translated as closely as possible to the original. Later, to perfect the rules, we let users send us their translations of mistranslated phrases.

They [Google Translate developers – E.T.] continue to work on extracting information from web pages, structuring it, finding the context, classifying data in general and websites in particular. Their work helps to improve search and to create new products based on a better understanding of the structure of the Internet.

Lab Work. Comparing the Translation Results

The translator was tested on texts in the following six languages: English, Russian, German, French, Ukrainian, and Polish. Of these, four are synthetic and two analytical; three are Slavonic, two Germanic, and one Romance. Together they make 15 language pairs and 30 translation directions.
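The pair and direction counts can be checked by enumeration. This is a minimal sketch using only the six language names listed above. The variable names are arbitrary.

```python
from itertools import combinations, permutations

languages = ["English", "Russian", "German", "French", "Ukrainian", "Polish"]

# A language pair is unordered; a translation direction is an ordered (source, target) pair.
pairs = list(combinations(languages, 2))
directions = list(permutations(languages, 2))

print(len(pairs), len(directions))  # prints: 15 30
```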

I took a text or two in each of the six languages and translated them with Google Translate into the remaining five languages. See short extracts of the translations in Appendix 1.

The initial goal was to identify the best translation directions and topics, but observations made during the analysis showed that this idea was impracticable, and the subject of the article changed to revealing the main principles of Google's statistical machine translation. They aren’t obvious when you translate a few texts from or into English. But when you compare the results across a number of languages belonging to different linguistic groups, you start to notice certain tendencies.

A. Texts in most language pairs are translated via an intermediate translation into English with the “broken telephone” effect

Take a German text and translate it into English, French and Russian. The quality will differ. Why? There may be various guesses, as well as the company’s own explanations (see above), but the real reason is that texts are first translated into English, and this imperfect machine translation is then rendered into the other language. As a result, mistakes in the English version are inherited (such errors would not occur in a direct translation), ambiguous passages are often misinterpreted (more mistakes again), and grammatical information carried by the endings is lost (which matters most between synthetic languages such as German, Russian, Ukrainian and Polish). Indirect translations (via a mediator) are always worse than direct ones, even if only slightly.
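The inheritance of errors can be sketched with toy phrase tables. Every table entry and function name below is an assumption made up for illustration. Only the two-hop output mirrors the "любят поговорить" example from this article, and the direct Russian-German table is hypothetical, since no such direct system was observed.

```python
# Toy phrase tables; all entries are invented for this sketch.
RU_EN = {"любят": "love", "поговорить": "to talk"}
EN_DE = {"love": "die Liebe", "to talk": "zu sprechen"}   # "love" is read as a noun
RU_DE = {"любят": "lieben es", "поговорить": "zu reden"}  # hypothetical direct table

def translate(phrases, table):
    """Replace each phrase with its table entry; unknown phrases pass through unchanged."""
    return [table.get(p, p) for p in phrases]

def pivot_translate(phrases, source_to_pivot, pivot_to_target):
    """Translate in two hops, so the second hop only ever sees the pivot text."""
    return translate(translate(phrases, source_to_pivot), pivot_to_target)

source = ["любят", "поговорить"]
via_english = " ".join(pivot_translate(source, RU_EN, EN_DE))
direct = " ".join(translate(source, RU_DE))

print(via_english)  # prints: die Liebe zu sprechen
print(direct)       # prints: lieben es zu reden
```

The Russian verb ending marks "любят" as a third-person plural verb, but the English "love" no longer carries that information, so the second hop has nothing to go on.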

Examples

(From translations made in March 2011. Also see Appendix 1.)

Сведения о том, что устройства компании Apple могут следить за своими владельцами, появились летом прошлого года… - Information that the Apple device can monitor its owners appeared last summer… - Informationen, die der Apple-Gerät können die Besitzer Monitor erschien im letzten Sommer…

любят поговорить - love to talk - die Liebe zu sprechen - l'amour de parler
не любят - do not like - mag sie nicht - je n'aime pas

Mimant une diva replette sur le retour… - Replete mimicking a diva on the back… - Vollgestopft Nachahmung einer Diva auf dem Rücken… - Реплт подражая дива на спине… - Реплт наслідуючи діва на спині…

Die Kratzmuster, die an fossilen Zähnen zu sehen sind, stehen in direktem Zusammenhang mit der Rechts- oder Linkshändigkeit individueller prähistorischer Menschen...

The scratch patterns that can be seen in fossil teeth, are directly related to the handedness of individual prehistoric people…

нуля шаблоны, которые можно увидеть в ископаемых зубов, которые непосредственно связаны с беспристрастности отдельных доисторических людей… (the German phrase translated into Russian)

нуля шаблоны, которые можно увидеть в ископаемых зубов, которые непосредственно связаны с беспристрастности отдельных доисторическихлюдей… (the English translation translated into Russian).

…in der Fachzeitschrift "Laterality". Als Nachweis dienten ihnen Schrammspuren an bis zu 500.000 Jahre alten fossilen Zähnen.

…in the journal Laterality. Schramm served them as proof traces of up to 500,000 year old fossil teeth.

…в журнале латерализации. Шрамм служили им в качестве доказательства следов до 500.000 летний зубы ископаемого (the German phrase translated into Russian).

…в журнале латерализации. Шраммслужили им в качестве доказательства следов до 500.000 летний зубыископаемого (the English translation translated into Russian).
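One simple way to test for a hidden intermediate step is to check whether the output obtained from the original matches the output obtained from its English translation. The sketch below does this for the last pair of Russian outputs, copied as they appear above (including the missing spaces in the second one). The helper name is an assumption, and whitespace is ignored so that spacing glitches do not affect the comparison.

```python
def strip_whitespace(text):
    """Remove all whitespace so that spacing glitches do not affect the comparison."""
    return "".join(text.split())

# Russian output requested from the German original.
from_german = "Шрамм служили им в качестве доказательства следов до 500.000 летний зубы ископаемого"
# Russian output requested from the English machine translation.
from_english = "Шраммслужили им в качестве доказательства следов до 500.000 летний зубыископаемого"

identical = strip_whitespace(from_german) == strip_whitespace(from_english)
print(identical)  # prints: True
```

A match, including the untranslated "Schramm" rendered as a name, is consistent with the German text having passed through English first.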

B. Statistical translation is best between kindred languages

There was one exception to the previous rule. Russian-Ukrainian translations were made directly, without a mediator, and were excellent: almost professional level, with few discrepancies.

Polish is also a Slavonic language, though not as close to Russian as Ukrainian is. Unexpectedly, translations between Polish and the other two languages were made via English and came out distorted. This means Google does not take advantage of the kinship between languages wherever it could. Similarly, translating between Turkic languages could be child's play, but would there be enough demand for it?

C. English is the “pivotal” language in Google Translate

Google’s translation service is really meant for the English-speaking audience: translations with English as source or target are the best. All other supported languages are connected only to the pivotal language and not to each other, so there is no need to develop each and every direction, only those involving English. This allows rapid, extensive expansion and provides some understanding between distant languages. In other words, English is the lingua franca of machine translation.

I have an idea (expounded in the Russian version of the article) that English is in demand not only because of the company’s background and target audience but for linguistic reasons as well. Its analytical grammar ‘suits’ machine languages. With its strict word order and almost no endings, English lends itself to a mathematical approach to analyzing and building phrases. You can combine words like bricks, almost without fitting them together, and you don’t have to readjust the endings every time you change a word in a structure. Word order rather than word structure indicates the functions of the words (classes) and the connections between them. It is therefore much easier to generate automated text in English than in synthetic languages like Russian. The same probably applies to generating translations.
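The saving from a single pivotal language is easy to quantify. In the sketch below the function names are arbitrary, and the figure of 50 languages is an arbitrary value chosen only to show how the gap grows.

```python
def direct_directions(n):
    """Directions to develop if every language is paired with every other one."""
    return n * (n - 1)

def pivot_directions(n):
    """Directions to develop if every language is connected only to one pivot."""
    return 2 * (n - 1)

for n in (6, 50):
    print(n, direct_directions(n), pivot_directions(n))
# prints:
# 6 30 10
# 50 2450 98
```

For the six languages tested here, a pivot design needs 10 directions instead of 30.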

D. Compatibility with the pivotal language and the best translation directions

To rank the translation directions in my selection, I had to discard all directions without English, as they would automatically be of worse quality. For the remaining directions I listed and counted all the typical mistakes. The results were predictable and matched the quality of the test samples. The best translations were between Western European languages (English and German belong to the same Germanic group, and English and French are both analytical), but they were no better than the almost professional Russian-Ukrainian translations (these two languages being very closely related).

Thus Google’s statistical machine translation works best between English and the languages most closely related to it. This grammar compatibility barrier isn’t insurmountable, but overcoming it probably requires more of the traditional approach, which means teaching the machine grammar rules. That takes extra interest and effort.

E. Use English source texts with Google Translate

This piece of advice is meant for non-English speakers. When using Google Translate, first translate the text into English, improve the result, and only then translate the English text into as many languages as you like. The same ‘nestling’ method is often used in simultaneous interpretation, in the translation of movies, and in the translation of large volumes of documents into multiple languages.
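The workflow can be sketched as a small function. The `translate` and `post_edit` callables are placeholders that the caller must supply (a real translation service and a human editor, for instance). No real API is assumed here, and the stand-ins in the demo only tag the text so the order of steps is visible.

```python
def translate_via_english(text, source, targets, translate, post_edit):
    """Translate into English once, let a human improve it, then fan out to the targets."""
    english = text if source == "en" else translate(text, source, "en")
    english = post_edit(english)
    return {t: translate(english, "en", t) for t in targets if t != "en"}

# Stand-ins for the demo only; they are not real translation or editing tools.
def fake_translate(text, source, target):
    return f"[{source}->{target}] {text}"

def fake_post_edit(text):
    return text.upper()

result = translate_via_english("текст", "ru", ["de", "fr"], fake_translate, fake_post_edit)
print(result["de"])  # prints: [en->de] [RU->EN] ТЕКСТ
```

The point of the design is that the English version is corrected once, before its mistakes can be copied into every target language.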

F. Time brings no considerable improvement of the quality of translations

Translations of the same texts made in March, October and December 2011 differ, but none of the versions conveys the message more clearly or has more correct constructions or translation ‘hits’ than the others. This casts doubt on Google’s claims (see above) that translation quality depends on the number of texts analyzed in a given direction and that translations should improve considerably thanks to this ongoing extensive analysis.

Google Translate’s advantages are meant for, and best suit, Internet users

I’ve been a bit harsh in exposing the hidden imperfections, paparazzi-style. In the end I must admit that Google’s translations really deserve attention and, in spite of all the drawbacks, are no worse than those of other vendors.

I was wondering whether such imperfect translation could already be used as a finished product. Could Google Translate replace humans? And it occurred to me that online translators have already done so where Internet surfing is concerned. People are getting used to lower-quality information, and lower-quality human translation in particular (at least in Russia), so why not, if you get it instantly and for free?

As for sufficient quality or parity with human specialists, it’s too early to say, but statistical translation technology as it stands does not justify such high hopes.

Any tool has a specific scope of use for which it was developed and/or in which its qualities show best. Google Translate’s results can be quite outstanding in the following areas:

- Translating from or into English

- Translating from Ukrainian into Russian and back

- Navigating foreign websites, in particular buying and other transactions via the Internet, using online services, games, correspondence/chatting, reading reference information, news, blogs, product descriptions, fan club messages etc.

- Helping a specialist in a given translation direction save time and effort on keying in the text, correct spelling and attempts at correct word order and, most importantly, through the lucky find of a translated name, term or phrase

- Reading reference texts (and NOT translating fiction and poems or getting a ready-to-use translation of a document)

- Translating texts written in a clear literary style and built from short simple sentences, where the word order is direct or coincides with the rules of the target language and there are no parentheses, omitted parts of the sentence etc., and no slang, figurative meanings or idioms (especially allusions, irony, twisted idioms and any implications)

Google Translate reminds me of Polaroid cameras or conveyor belts. It certainly saves time and effort. At present scalability takes precedence over scrupulousness. Covering more languages seems to be one of the main goals in developing the online translator. At the same time it is the main constraint on the technology, degrading the quality of the product. Firstly, almost all translations are made via an intermediate translation into English. Certain translation directions are not developed even though direct statistical translation would be more exact than translation via broken English, and could even replace professionals almost completely between kindred languages. Secondly, semantic and grammatical accuracy cannot be up to the mark when translation variants are selected for being the most probable or popular match and when cohesion is maintained on the same principle. One cannot build a database of all phrases and their translations into other languages, so creating a method to improve the grammar seems unavoidable.

 

See also: Appendix 1.

Read the full version of this article in Russian:
Совместимость языков и другие ограничения в статистическом переводе Google Translate




Published - August 2012











