Compatibility of Languages and Other Restrictions in the Statistical Translation by Google Translate
This is an abridged English translation of an article on the Google Translate technology, originally written in Russian in December 2011. Prepared by Elena Tikhomirova and Nina Buonaiuto.
Sometimes we wonder whether machines can replace humans in a particular area. As information technology develops, fewer calculations have to be performed manually, and computers take over some tasks completely, with no need for human control. In machine translation, however, programs mostly serve as a tool for professionals, and machine output still needs improvement. Google offers an innovative technology based on the statistical machine translation method and has previously claimed that it can independently translate a site "at one click" and provide a sufficient understanding of the original. The purpose of this survey is to determine the adequacy of this product in different translation directions and for various kinds of information, and to verify the Google Translate developers' statement that the service should improve significantly over time.
The online translator was tested in March-April 2011 in English, Russian, Ukrainian, French and German. The survey was resumed and completed in October-December 2011, when Polish was added to refine the results and all language pairs were retested on the same texts.
About the Technology of Google's Online Translator
Google has posted the following claims on its web resources; read on to see whether they prove realistic.
Google's second guiding principle, out of ten, reads as follows: "It's best to do one thing really, really well." The Google Translate technology is a direct extension of the search engine, Google's key service and the one that made the company popular. Like the search engine, Google's online translator a) builds a database of texts from the Internet (that is, of the variants of word and phrase matches between different languages) and b) develops algorithms for selecting the best match to the query (to the words and phrases of the source text). This is the statistical approach to machine translation.
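Google does not publish how its system works internally, so the following is only a toy Python sketch of the two steps named above, assuming a tiny hand-made table of phrase correspondences (the table `OBSERVED`, its entries, and the functions `best_match` and `translate` are hypothetical). Step a) is the table of observed matches; step b) is a selection rule that takes the most frequent variant and prefers longer known phrases over single words.

```python
from collections import Counter
from typing import List, Optional

# Hypothetical toy table: every translation variant observed for a source
# phrase in a parallel corpus, one entry per occurrence (invented data).
OBSERVED = {
    "bank": ["банк", "банк", "банк", "берег"],
    "river bank": ["берег реки", "берег реки"],
}

def best_match(phrase: str) -> Optional[str]:
    """Step b): return the most frequently observed variant, or None."""
    variants = OBSERVED.get(phrase)
    if not variants:
        return None
    return Counter(variants).most_common(1)[0][0]

def translate(words: List[str]) -> List[str]:
    """Greedy left-to-right pass that prefers the longest known phrase."""
    out: List[str] = []
    i = 0
    while i < len(words):
        for j in range(len(words), i, -1):  # try the longest span first
            match = best_match(" ".join(words[i:j]))
            if match is not None:
                out.append(match)
                i = j
                break
        else:
            out.append(words[i])  # unknown word: passed through untranslated
            i += 1
    return out

print(translate("the river bank".split()))  # ['the', 'берег реки']
print(translate("the bank".split()))        # ['the', 'банк']
```

Because the rule is pure frequency, "bank" on its own always becomes "банк", whatever the context; only a longer stored phrase such as "river bank" overrides it.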
Lab Work. Comparing the Translation Results
The translator was tested on texts in the following 6 languages: English, Russian, German, French, Ukrainian, and Polish. Four of them are synthetic and two analytical; three are Slavonic, two Germanic, and one Romance. Together they make 15 language pairs and 30 translation directions.
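The pair and direction counts follow from elementary combinatorics: each unordered pair of the n = 6 languages is one language pair, and each pair can be translated both ways.

```latex
\binom{n}{2} = \frac{n(n-1)}{2} = \frac{6 \cdot 5}{2} = 15 \text{ pairs}, \qquad
n(n-1) = 6 \cdot 5 = 30 \text{ directions}.
```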
I took a text or two in each of the six languages and translated them with Google Translate into the remaining five. See short extracts of the translations in Appendix 1.
The initial goal was to identify the best translation directions and topics, but some observations during the analysis showed that the idea was impracticable, and the subject of the article shifted to uncovering the main principles of Google's statistical machine translation. They aren't obvious when you translate a few texts from or into English. But when you compare the results in a number of languages belonging to different linguistic groups, you start to notice certain tendencies.
A. Texts in most language pairs are translated via an intermediate translation into English, with the "broken telephone" effect
Take a German text and translate it into English, French and Russian. The quality will differ. Why? There are various guesses and the company's own explanations (see above), but the real reason is that texts are first translated into English, and this imperfect machine translation is then rendered into the other language. As a result, mistakes in the English version are inherited (errors that a direct translation would not have), ambiguous passages are often misinterpreted (more mistakes again), and the grammatical information carried by endings is lost (which matters most between synthetic languages such as German, Russian, Ukrainian and Polish). Indirect translations (via a mediator) are always worse than direct ones, even if only slightly.
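A minimal Python illustration of the effect, assuming hypothetical one-word dictionaries (`DE_EN`, `EN_RU`, `DE_RU_DIRECT` and the helper functions are invented for the example, not taken from Google). German distinguishes the polite "Sie" from the familiar "du", as Russian does with "вы" and "ты"; English collapses both into "you", so a route through English cannot recover the distinction.

```python
from typing import Dict, Optional

# Hypothetical one-word dictionaries, invented for illustration.
DE_EN = {"Sie": "you", "du": "you"}       # polite and familiar "you" in German
EN_RU = {"you": "ты"}                     # assume the English-to-Russian step settles on one form
DE_RU_DIRECT = {"Sie": "вы", "du": "ты"}  # a direct table keeps the distinction

def direct(word: str, table: Dict[str, str]) -> Optional[str]:
    """Look the word up in a source-to-target table."""
    return table.get(word)

def via_pivot(word: str, to_pivot: Dict[str, str],
              from_pivot: Dict[str, str]) -> Optional[str]:
    """Translate source -> pivot -> target; whatever the pivot lacks is lost."""
    pivot = to_pivot.get(word)
    return from_pivot.get(pivot) if pivot is not None else None

print(direct("Sie", DE_RU_DIRECT))     # вы
print(via_pivot("Sie", DE_EN, EN_RU))  # ты (the polite register is lost)
```

The direct table keeps the polite form; the pivot route can only return whichever Russian variant the English-to-Russian step prefers.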
B. Statistical translation is best between kindred languages
There was one exception to the previous rule. Russian-Ukrainian translations were made directly, without a mediator, and were excellent: almost professional level, with few discrepancies.
Polish is also a Slavonic language, though not as close to Russian as Ukrainian is. Unexpectedly, translations from Russian and Ukrainian into Polish and back were made via English and came out distorted. This means Google does not exploit the advantage of kindred languages wherever it could. Similarly, direct translation between Turkic languages could be easy, but would there be enough demand for it?
C. English is the "pivotal" language in Google Translate
Google's translation service is really meant for the English-speaking audience: translations with English as the source or target are the best. All other supported languages are connected only to the pivotal language, not to each other, so there is no need to develop each and every direction, only those involving English. This allows quick, extensive development and provides some understanding between distant languages. In other words, English serves as a lingua franca in machine translation.
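The arrangement can be sketched in Python as a hub-and-spoke graph, assuming (as the test results suggest, not as Google documents) that every direction without English is routed through it; the Russian-Ukrainian exception is deliberately left out, and the function names are hypothetical. With n languages, only 2(n - 1) directions need to be developed instead of n(n - 1): for the six test languages, 10 instead of 30.

```python
from itertools import permutations
from typing import List, Tuple

LANGS = ["English", "Russian", "German", "French", "Ukrainian", "Polish"]

def all_directions(langs: List[str]) -> List[Tuple[str, str]]:
    """Every ordered (source, target) pair: n * (n - 1) directions."""
    return list(permutations(langs, 2))

def hub_directions(langs: List[str], hub: str = "English") -> List[Tuple[str, str]]:
    """Only the directions that touch the hub: 2 * (n - 1) of them."""
    return [(s, t) for s, t in permutations(langs, 2) if hub in (s, t)]

def route(src: str, tgt: str, hub: str = "English") -> List[str]:
    """Path a text takes under the assumed pivot scheme."""
    if hub in (src, tgt):
        return [src, tgt]
    return [src, hub, tgt]  # two machine translations in a row

print(len(all_directions(LANGS)), len(hub_directions(LANGS)))  # 30 10
print(route("German", "Russian"))  # ['German', 'English', 'Russian']
```

The saving in development effort is exactly what the text describes; the cost is that every non-English direction passes through two imperfect translations.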
I have an idea (expounded in the Russian version of the article) that English is called for not only because of the company's background and target audience, but for linguistic reasons as well. Its analytical grammar "suits" machine languages. With its strict word order and almost no endings, English lends itself to a mathematical approach to analyzing and building phrases. You can combine words like bricks, almost without fitting them together, and you don't have to readjust the endings every time you change a word in a structure. Word order rather than word structure indicates the functions of words (their classes) and the connections between them. Therefore it is much easier to generate automated text in English than in synthetic languages like Russian. Probably the same applies to generating translations.
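The bricks comparison can be made concrete with a small Python sketch (the tables and functions are hypothetical and cover only the nominative singular; real Russian agreement also involves case and number). In English the adjective is attached unchanged; in Russian its ending depends on the noun's gender.

```python
# Hypothetical mini-lexicon: grammatical gender of three Russian nouns.
RU_GENDER = {"дом": "m", "книга": "f", "окно": "n"}  # house, book, window
# Nominative singular forms of the Russian adjective "новый" ("new").
RU_NEW = {"m": "новый", "f": "новая", "n": "новое"}

def english_phrase(noun: str) -> str:
    # English: the adjective is a fixed brick, placed before any noun.
    return f"new {noun}"

def russian_phrase(noun: str) -> str:
    # Russian: the adjective's ending must be chosen to match the noun's gender.
    return f"{RU_NEW[RU_GENDER[noun]]} {noun}"

for en, ru in [("house", "дом"), ("book", "книга"), ("window", "окно")]:
    print(english_phrase(en), "|", russian_phrase(ru))
# new house | новый дом
# new book | новая книга
# new window | новое окно
```

The English generator needs no knowledge of the noun at all, while the Russian one cannot produce a correct phrase without a lexicon entry for it.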
D. Compatibility with the pivotal language and the best translation directions
To rate the best translation directions in my sample, I had to discard all directions not involving English, as their quality would automatically be worse. For the remaining directions I listed and counted all the typical mistakes. The results were predictable and matched the quality of the test samples. The best were translations between Western European languages (English and German belonging to the same Germanic group, English and French both being analytical), but they were no better than the almost professional Russian-Ukrainian translations (these two languages being very closely related).
Thus it is better to use Google's statistical machine translation for languages closest in affinity to English. This grammar compatibility barrier isn't insurmountable, but overcoming it must require more of the traditional approach, which involves teaching the machine grammar rules. That means extra interest and effort are needed.
E. Use English source texts with Google Translate
This piece of advice is meant for non-English speakers. When using Google Translate, first translate the text into English, improve the result, and only then translate the English text in as many directions as you like. The same "nesting" method is often used in simultaneous interpretation and in translating films and large volumes of documents into multiple languages.
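The workflow can be sketched in Python as follows, assuming a generic machine-translation callable `mt(text, source, target)` and a `post_edit` step performed by a person; both are placeholders, not a real Google API. The point of the design is that the human correction happens once, on the English version, and every target language then starts from the corrected text.

```python
from typing import Callable, Dict, Iterable

# Placeholder type: mt(text, source_lang, target_lang) -> translated text.
MT = Callable[[str, str, str], str]

def nested_translate(text: str, source: str, targets: Iterable[str],
                     mt: MT, post_edit: Callable[[str], str],
                     pivot: str = "English") -> Dict[str, str]:
    """Translate into the pivot once, have it corrected, then fan out."""
    pivot_text = post_edit(mt(text, source, pivot))  # the single human-reviewed step
    return {t: pivot_text if t == pivot else mt(pivot_text, pivot, t)
            for t in targets}

if __name__ == "__main__":
    # Stand-ins used only to show the data flow.
    def fake_mt(text: str, src: str, tgt: str) -> str:
        return f"{tgt}({text})"

    result = nested_translate("draft", "Russian", ["English", "French", "German"],
                              fake_mt, lambda s: s.replace("draft", "fixed"))
    print(result)
```

With the stand-ins, every target is built on "English(fixed)", so a correction made to the English version propagates to all languages.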
F. Time brings no considerable improvement of the quality of translations
Translations of the same texts made in March, October and December 2011 differ, but no version conveys the message more clearly or has more correct constructions and translation "hits". This undermines Google's claims (see above) that translation quality depends on the number of texts analyzed in a given direction and that translations should improve considerably thanks to this ongoing extensive analysis.
Google Translate's advantages are best suited to Internet users
I've been a bit harsh, exposing the hidden imperfections paparazzi-style. In the end I must admit that Google's translations really deserve attention and, despite all the drawbacks, are no worse than those of other vendors.
I wondered whether such imperfect translation could already be used as a ready product. Could Google Translate replace humans? And I thought that, as far as making Internet surfing easier goes, online translators already have. People are getting used to lower-quality information, and to lower-quality human translation in particular (at least in Russia), so why not accept it if you get it instantly and for free?
As for sufficient quality or parity with human specialists, it's too early to say, but statistical translation technology in its current state doesn't inspire high hopes.
Any tool has a specific scope of use for which it was developed and/or in which its qualities show best. Google Translate's results can be quite outstanding in the following areas:
Google Translate reminds me of Polaroid cameras or conveyor belts. It certainly saves time and effort. At present, scalability takes precedence over scrupulousness. Covering more languages seems to be one of the main goals in developing the online translator. At the same time, it is the main constraint on the technology, degrading the quality of the product. Firstly, almost all translations are made via an intermediate translation into English, and certain translation directions are not developed, even though direct statistical translation would be more accurate than going via broken English; moreover, in directions between kindred languages it could replace professionals almost completely. Secondly, semantic and grammatical accuracy cannot be up to the mark when translation variants are selected on the criterion of being the most probable or popular match, and when cohesion is maintained on the same principle. One cannot build a database of all phrases and their translations into other languages, so developing a method to improve the grammar seems unavoidable.
See also: Appendix 1.
Read the full version of this article in Russian:
Published - August 2012