Human Translation vs. Machine Translation: Rise of the Machines


By Ilya Ulitkin,
an Associate Professor of the Department of Linguistics at Moscow State Regional University,
a freelance Russian-to-English translator,
editor in the Quantum Electronics journal

ulitkin-ilya at yandex ru


Abstract

Different approaches to automatic evaluation of machine translation (MT) quality are considered. We describe several methods for automatic evaluation of MT, such as methods based on string matching and n-gram models. The candidate translations done by Google and PROMPT are compared with the reference translation by an automatic translation evaluation program and the results of the evaluation are presented.

Keywords: automatic evaluation, quality of translation, machine translation, BLEU, F-measure, TER.

1. Introduction

The idea of machine translation (MT) of natural languages first appeared in the seventeenth century, but became a reality only at the end of the twentieth century. Today, computer programs are widely used to automate the translation process. Although great progress has been made in the field of machine translation, fully automated translations are far from being perfect. Nevertheless, countries continue spending millions of dollars on various automatic translation programs. In the early 1990s, the U.S. government sponsored a competition among MT systems. Perhaps one of the valuable outcomes of that enterprise was a corpus of manually produced numerical evaluations of MT quality, with respect to a set of reference translations [1]. The development of MT systems has given impetus to a large number of investigations, thereby encouraging many researchers to seek reliable methods for automatic MT quality evaluation.

Machine translation evaluation serves two purposes: the relative estimate allows one to find out whether one MT system is better than the other, and the absolute estimate (having a value ranging from 0 to 1) gives an absolute measure of efficiency (for example, when equal to unity, it means perfect translation).

However, the development of appropriate methods for numerical MT quality evaluation is a challenging task. In many fields of science, measurable efficiency indices exist, such as, for example, the difference between the predicted and actually observed results. Since natural languages are complicated, an assessment of translation correctness is extremely difficult. Two completely different sequences of words (sentences) can be fully equivalent (e.g., “There is a vase on the table” and “The vase is on the table”), and two sequences that differ by a small detail can have completely different meanings (e.g., “There is no vase on the table” and “There is a vase on the table”).

Traditionally, the bases for evaluating MT quality are adequacy (the translation conveys the same meaning as the original text) and fluency (the translation is correct from the grammatical point of view). Most modern methods of MT quality assessment rely on reference translations. Early approaches to scoring a ‘candidate’ text with respect to a reference text were based on the idea of similarity of a candidate text (the text translated by an MT system) and a reference text (the text translated by a professional translator), i.e., the similarity score was to be proportional to the number of matching words [2]. At about the same time, a different idea was put forward. It was based on the fact that matching words in the right order in the candidate and reference sentences should have higher scores than matching words out of order [3].

Perhaps the simplest version of the same idea is that a candidate text should be rewarded for containing longer contiguous subsequences of matching words. Papineni et al. [4] reported that a particular version of this idea, which they call ‘BLEU,’ correlates very highly with human judgments. Doddington [5] proposed another version of this idea, now commonly known as the ‘NIST’ score. Although the BLEU and NIST measures might be useful for comparing the relative quality of different MT outputs, it is difficult to gain insight from such measures [6].

In this paper we consider different methods of MT quality assessment and analyze the translations of candidate and reference texts. In the following sections, we describe several automatic MT evaluation methods: some of them are based on string matching, others, such as n-gram models, are based on the use of information retrieval. Next, we will assess the quality of translation by using an automatic program.

2. Methods of automatic MT quality evaluation

To date, the main approach to the quality assessment of language models for MT systems relies on the use of statistical methods. In this case, the model is, in fact, a probability distribution on a set of all sentences of a language. Naturally, it is impossible to employ the model in this way; therefore, use is made of more compact algorithms. Let us briefly consider what models are currently used in commercial and experimental systems of MT quality assessment with unlimited dictionaries.

2.1 Method of approximate string matching

In computer science, approximate string matching (often colloquially referred to as fuzzy string searching) is a technique of finding strings that match a pattern approximately (rather than exactly). The problem of approximate string matching is typically divided into two sub-problems: finding an approximate substring inside a given string and finding dictionary strings that match the pattern approximately [7].

The word error rate (WER) is a metric based on this approach. The WER is calculated as the sum of insertions, deletions, and substitutions, normalized by the length of the reference sentence. If the WER is equal to zero, the translation is identical to the reference text. The main problem lies in the fact that the resulting estimate is not always in the range from 0 to 1. In some cases, when the translation is wrong, the WER can be greater than 1.
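The definition above can be illustrated with a minimal Python sketch. It assumes whitespace tokenization and a single reference sentence; the function names edit_distance and wer are chosen here for illustration and do not come from any particular evaluation tool.

```python
def edit_distance(ref, hyp):
    """Minimum number of insertions, deletions, and substitutions
    needed to turn the token sequence `hyp` into `ref`."""
    prev = list(range(len(hyp) + 1))  # distances for an empty reference prefix
    for i, r in enumerate(ref, start=1):
        curr = [i]  # an empty candidate prefix needs i deletions
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # reference word missing from candidate
                            curr[j - 1] + 1,      # extra word in candidate
                            prev[j - 1] + cost))  # match or substitution
        prev = curr
    return prev[-1]


def wer(reference, candidate):
    """Word error rate: word-level edit distance divided by reference length."""
    ref, hyp = reference.split(), candidate.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    return edit_distance(ref, hyp) / len(ref)
```

With this sketch, a two-word reference scored against a four-word candidate that shares no words with it yields a WER of 2, which illustrates why the estimate can exceed unity.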

Another version of the WER is the WERg metric, in which the sum of insertions, deletions, and substitutions is normalized by the length of the Levenshtein alignment, i.e., the total number of steps (edits and matches) in the edit path. In information theory and computational linguistics, the Levenshtein distance (or edit distance) between two strings is defined as the minimum number of edits needed to transform one string into the other, with allowable edit operations being insertion, deletion, or substitution of a single character [8]. The advantage of this metric is that the value of the translation quality will always be in the range from 0 to 1 (even in the worst case, when nothing in the candidate matches the reference, or in the absence of translation, the value will not exceed unity).
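A hedged sketch of this normalization is given below. It assumes that the normalizer is the number of steps (edits plus matches) in a minimum-edit alignment of the two word sequences, and that ties between equally cheap alignments are resolved in favor of the shorter path; both points are assumptions of this illustration, and the function name werg is hypothetical.

```python
def werg(reference, candidate):
    """Edits divided by the length of a minimum-edit alignment path.

    Each table cell stores (edits, path_length); taking the tuple minimum
    prefers fewer edits first and a shorter path second (an assumption).
    """
    ref, hyp = reference.split(), candidate.split()
    prev = [(j, j) for j in range(len(hyp) + 1)]  # only insertions so far
    for i, r in enumerate(ref, start=1):
        curr = [(i, i)]  # only deletions so far
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(
                (prev[j][0] + 1, prev[j][1] + 1),            # deletion
                (curr[j - 1][0] + 1, curr[j - 1][1] + 1),    # insertion
                (prev[j - 1][0] + cost, prev[j - 1][1] + 1)  # match or substitution
            ))
        prev = curr
    edits, path_length = prev[-1]
    return edits / path_length if path_length else 0.0
```

Because every alignment step contributes at most one edit, the ratio cannot exceed 1, including the case of an empty candidate.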

Experiments performed by Blattsom et al. have shown that the WERg metric is not reliable and does not agree with the estimates obtained when the machine translation is analyzed by humans [9].

The position-independent error rate (PER) neglects the order of the words in the string matching operation. In this case, the difference between the candidate text and the reference text, normalized by the length of the reference translation, is calculated [10].
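As an illustration, the PER can be sketched by treating both sentences as bags of words. The formula used here, (max(reference length, candidate length) minus the number of matching words) divided by the reference length, is one common formulation and is an assumption of this sketch; the function name per is hypothetical.

```python
from collections import Counter


def per(reference, candidate):
    """Position-independent error rate on whitespace-separated tokens."""
    ref, hyp = reference.split(), candidate.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # Multiset intersection: words shared by both sentences, ignoring order.
    matches = sum((Counter(ref) & Counter(hyp)).values())
    return (max(len(ref), len(hyp)) - matches) / len(ref)
```

A candidate that contains exactly the words of the reference in a different order receives a PER of 0, whereas its WER would be greater than 0.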

Another metric that is widely used in assessing the translation quality is the translation error rate (TER). This metric makes it possible to measure the number of edits required to change a system output into one of the given reference translations [11].

In fact, any string matching metric can be used for assessing the MT quality. One such example is the “string kernel,” which allows one to take into account different levels of natural language (e.g., morphological, lexical, etc.), or the relationship between synonyms [12].

2.2 N-gram models

In n-gram language models, use is made of an explicit assumption that the probability of the next word in a sentence depends on the previous n-1 words. In practice, the models with n = 1, 2, 3 and 4 are used. For the English language, the most successful are three-gram or four-gram models. Today, almost all systems of MT quality assessment rely on n-gram models. In this case, the probability of the whole sentence is calculated as the product of the probabilities of its constituent n-grams.
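The product of n-gram probabilities can be illustrated with a small bigram (n = 2) model estimated by relative frequencies. This is a minimal sketch on a toy corpus; the sentence boundary markers <s> and </s>, the lowercasing, and the function names are assumptions of the example.

```python
from collections import Counter


def train_bigram(sentences):
    """Count bigrams and their left-hand contexts in a list of sentences."""
    contexts, bigrams = Counter(), Counter()
    for sentence in sentences:
        tokens = ["<s>"] + sentence.lower().split() + ["</s>"]
        contexts.update(tokens[:-1])             # every token that precedes another
        bigrams.update(zip(tokens, tokens[1:]))  # adjacent word pairs
    return contexts, bigrams


def sentence_probability(sentence, contexts, bigrams):
    """Product of the relative-frequency estimates P(word | previous word)."""
    tokens = ["<s>"] + sentence.lower().split() + ["</s>"]
    probability = 1.0
    for prev, word in zip(tokens, tokens[1:]):
        if contexts[prev] == 0:
            return 0.0  # the context was never seen in training
        probability *= bigrams[(prev, word)] / contexts[prev]
    return probability
```

For a corpus consisting of “the vase is on the table” and “the book is on the table”, the first sentence receives the probability 1/4 × 1/2 = 0.125, while any sentence containing an unseen bigram receives a probability of zero.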

The main advantages of n-gram models are their relative simplicity and the possibility of constructing a model that can be trained on a sufficiently large corpus of a language. However, such models are not devoid of drawbacks. N-gram models cannot capture semantic and pragmatic relationships in a language. They also require very large training corpora: if a dictionary contains N words, the number of possible pairs of words will be N². Even if only 0.1% of them actually occur in the language, the minimum volume of the language corpus necessary to obtain statistically valid estimates will amount to 125 billion words, or about 1 terabyte. For three-gram models, the minimum corpus will reach hundreds of thousands of terabytes [13].

To overcome these drawbacks, use is made of well-developed smoothing techniques, which enable the assessment of the model parameters under the conditions of insufficient or non-existent data.
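One of the simplest smoothing techniques is add-one (Laplace) smoothing, sketched below for bigrams. It is shown only as an example of the idea and is not claimed to be the technique used in any particular system; the function name and the toy counts are hypothetical.

```python
from collections import Counter


def laplace_bigram_probability(prev, word, contexts, bigrams, vocabulary_size):
    """Add-one estimate of P(word | prev): unseen bigrams get a small nonzero value."""
    return (bigrams[(prev, word)] + 1) / (contexts[prev] + vocabulary_size)


# Toy counts: the context "on" was seen twice, both times followed by "the".
contexts = Counter({"on": 2})
bigrams = Counter({("on", "the"): 2})
vocabulary = ["the", "a", "vase", "table", "on"]

seen = laplace_bigram_probability("on", "the", contexts, bigrams, len(vocabulary))
unseen = laplace_bigram_probability("on", "a", contexts, bigrams, len(vocabulary))
```

In this toy case the seen bigram receives 3/7 and each unseen bigram receives 1/7, so that the probabilities over the five-word vocabulary still sum to one.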

The main metrics based on n-grams are BLEU, NIST, F-measure, and METEOR.

BLEU (Bilingual Evaluation Understudy) is an algorithm for automatic evaluation of the quality of a machine translation, which is compared to the reference translation, using n-grams. This metric of MT quality assessment was first proposed and implemented by Papineni et al. [4].

Measuring translation quality is a challenging task, primarily due to the lack of definition of an ‘absolutely correct’ translation. The most common technique of translation quality assessment is to compare the output of automated and human translations of the same document. But this is not as simple as it may seem: one translator’s translation may differ from that of another translator. This inconsistency between different reference translations presents a serious problem, especially when different reference translations are used to assess the quality of automated translation solutions.

A document translated by specially designed automated software can have a 60% match with the translation done by one translator and a 40% match with that of another translator. Although both professional translations are technically correct (they are grammatically correct, they convey the same meaning, etc.), 60% overlap of words is a sign of higher MT quality. Thus, although reference translations are used for comparison, they cannot be a completely objective and consistent measurement of the MT quality.

The BLEU metric scores the MT quality on a scale from 0 to 1. The closer the score is to unity, the greater the overlap with the reference translation and, therefore, the better the MT system. In short, the BLEU metric measures how many words coincide in the same line, and it rewards matching word sequences more than isolated matching words. For example, a string of four words in the translation that matches the human reference translation (in the same order) will have a positive impact on the BLEU score and is weighted more heavily (and scored higher) than a one- or two-word match [14].
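The following sketch shows a sentence-level variant of BLEU for a single reference: clipped n-gram precisions for n = 1 to 4 are combined by a geometric mean and multiplied by a brevity penalty. Whitespace tokenization, lowercasing, and the absence of smoothing are simplifying assumptions, so the sketch is not expected to reproduce the scores of any particular evaluation tool.

```python
import math
from collections import Counter


def ngram_counts(tokens, n):
    """Multiset of all contiguous n-grams in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def sentence_bleu(reference, candidate, max_n=4):
    """Unsmoothed single-reference BLEU on a scale from 0 to 1."""
    ref, hyp = reference.lower().split(), candidate.lower().split()
    if not hyp:
        return 0.0
    log_precision_sum = 0.0
    for n in range(1, max_n + 1):
        hyp_counts = ngram_counts(hyp, n)
        total = sum(hyp_counts.values())
        # Clipping: an n-gram is credited at most as often as it occurs in the reference.
        clipped = sum((hyp_counts & ngram_counts(ref, n)).values())
        if total == 0 or clipped == 0:
            return 0.0  # without smoothing, one empty precision zeroes the score
        log_precision_sum += math.log(clipped / total)
    # Brevity penalty: candidates shorter than the reference are penalized.
    penalty = 1.0 if len(hyp) > len(ref) else math.exp(1 - len(ref) / len(hyp))
    return penalty * math.exp(log_precision_sum / max_n)
```

For instance, adding a single superfluous article at the beginning of an otherwise identical 17-word candidate lowers every n-gram precision slightly and gives a score of about 0.94 instead of 1.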

The NIST (National Institute of Standards and Technology) precision measure is a metric used to evaluate the MT variants [5]. NIST was intended as an improved version of BLEU. In this case, the arithmetic mean of the n-gram scores is calculated, whereas BLEU uses the geometric mean. An important difference from the BLEU metric is that NIST also relies on the frequency component (precision and recall). While BLEU simply calculates the n-gram precision by giving an equal weight to each exact match, NIST also calculates how informative each matching n-gram is.

For example, even if the bigram ‘on the’ coincides with the same phrase in the reference text, the translation still receives a lower score than the correct matching of the bigram ‘size distribution,’ because the latter phrase is less likely to occur.
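The notion of informativeness can be sketched as follows. The weight of an n-gram is taken here as the base-2 logarithm of the ratio between the count of its (n−1)-word prefix and its own count in the reference data, with the total number of words serving as the prefix count for single words. The toy reference sentences and the function name are assumptions of this illustration.

```python
import math
from collections import Counter


def information_weights(reference_sentences, max_n=2):
    """Information weight of every n-gram (n <= max_n) in the reference data."""
    counts = Counter()
    total_words = 0
    for sentence in reference_sentences:
        tokens = sentence.lower().split()
        total_words += len(tokens)
        for n in range(1, max_n + 1):
            counts.update(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))
    weights = {}
    for gram, count in counts.items():
        # Count of the prefix; for single words, the size of the reference data.
        prefix_count = total_words if len(gram) == 1 else counts[gram[:-1]]
        weights[gram] = math.log2(prefix_count / count)
    return weights


references = ["the size distribution", "the size of the table",
              "on the table", "on the floor"]
weights = information_weights(references)
```

In this toy data ‘on’ is always followed by ‘the’, so the bigram ‘on the’ carries no information (weight 0), whereas ‘size’ is followed by ‘distribution’ only half of the time, so ‘size distribution’ receives a weight of 1 bit.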

The F-measure is a metric which calculates the harmonic mean of precision and recall [15]. The metric is based on the search for the best match between the candidate and reference translations: precision is the ratio of the total number of matching words to the length of the candidate translation, and recall is the ratio of the same number to the length of the reference text. It is often useful to combine precision and recall into a single averaged value [16].
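A minimal sketch of these quantities at the level of single words is given below. It counts matches as the bag-of-words overlap, which is a simplification of the best-match search described above; the function name is hypothetical.

```python
from collections import Counter


def precision_recall_f(reference, candidate):
    """Unigram precision, recall, and their harmonic mean (F-measure)."""
    ref, hyp = reference.split(), candidate.split()
    # Number of candidate words that can be paired with a reference word.
    matches = sum((Counter(ref) & Counter(hyp)).values())
    if matches == 0:
        return 0.0, 0.0, 0.0
    precision = matches / len(hyp)  # share of the candidate that is correct
    recall = matches / len(ref)     # share of the reference that is covered
    return precision, recall, 2 * precision * recall / (precision + recall)
```

For the reference “a b c d” and the candidate “a b x”, precision is 2/3, recall is 1/2, and the F-measure is 4/7.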

The metric for evaluation of translation with explicit ordering (METEOR) is an improved version of the F-measure [17]. This system was designed to address some of the weaknesses in the BLEU metric. The METEOR scores the output by matching the automated and reference translations word-for-word. When more than one reference translation is available, the automated translation is compared with each of them and the best result is reported [18].

Attitudes toward the individual metrics differ, but at this point BLEU, METEOR, and NIST are the most widely used. It is against these metrics that all the other MT quality assessment systems are compared. The developers of the F-measure claim that their metric shows the best agreement with the assessment made by a human [15]. However, this is not always the case. The F-measure does not work well with the smallest average edit distance [9]. Empirical data show that more attention should be paid to the completeness (recall) of the translation. Studies suggest that recall is most often the parameter that determines the quality of translation [17].

3. Automatic evaluation of the quality of statistical (Google) and rule-based (Prompt) MT systems

Translation is an intellectual challenge, and, therefore, skepticism about the possibility of using a computer for automated translation is quite natural. However, the creators of MT systems have managed to endow their systems with a form of understanding, and machine translation now belongs to a class of artificial intelligence programs.

Currently, we can speak of two approaches to written translation: the first one is machine translation based on the rules of the source and target languages and the second approach involves statistical machine translation.

The earliest “translation engines” in machine translation were all based on the direct, so-called “transformer,” approach. Input sentences of the source language were transformed directly into output sentences of the target language, using a simple form of parsing. The parser did a rough analysis of the source sentence, dividing it into subject, object, predicate, etc. Source words were then replaced by target words selected from a dictionary, and their order rearranged so as to comply with the rules of the target language. This approach was used for a long time, only to be finally replaced by a less direct approach, which is called “linguistic knowledge.” Modern computers, which have more processing power and more memory, can do what was impossible in the 1960s. Linguistic-knowledge translators have two sets of grammar rules: one for the source language, and the other for the target language. In addition, modern systems analyze not only the grammar (morphological and syntactic structure) of the source language but also the semantic information. They also have information about the idiomatic differences between the languages, which prevents them from making silly mistakes. A representative of the rule-based approach to machine translation is the Prompt software developed by the leading Russian developer of linguistic IT solutions.

The second approach is based on a statistical method: by analyzing a large number of parallel texts (identical texts in the source and target languages), the program selects the variants that coincide most often and uses them in the translation. It does not apply grammatical rules, since its algorithms are based on statistical analysis rather than traditional rule-based analysis. In addition, the lexical units here are word combinations, rather than separate words. One of the well-known examples of this approach is “Google Translate,” which is based on an approach called statistical machine translation. However, the translated sentences are sometimes so discordant that it is impossible to understand them [19].

In this section, using concrete examples, we compare the quality of translations made by such MT systems as Google (http://translate.google.ru/) and Prompt (www.translate.ru).

For the analysis, we selected five titles, abstracts, and keywords from the ‘Kvantovaya Elektronika’ journal [20], which is first published in Russian and then translated into English by a group of professional translators.

Text 1

Эволюция функции распределения наночастиц Au в жидкости под действием лазерного излучения

Аннотация . Теоретически и экспериментально исследован процесс фрагментации наночастиц в жидкости под действием импульсного лазерного нагрева. Моделирование процесса проведено на основе решения кинетического уравнения для функции распределения наночастиц по размерам с учетом температурной зависимости теплофизических параметров среды. Показано, что фрагментация происходит через отделение от расплавленной наночастицы фрагментов меньшего размера. Результаты моделирования находятся в хорошем согласии с экспериментальными данными, полученными при фрагментации наночастиц золота в воде под действием излучения лазера на парах меди при пиковой интенсивности излучения в среде ~10⁶ Вт/см².

Ключевые слова : наночастицы, коллоидные растворы, лазерная абляция металлов, плазмонный резонанс, фрагментация.

Text 2

Взаимодействие неколлинеарных фемтосекундных лазерных филаментов в сапфире

Аннотация . Численно и экспериментально исследовано взаимодействие двух когерентных фемтосекундных лазерных импульсов, распространяющихся под малым углом друг к другу в кристалле сапфира в режиме филаментации. Получены распределения поверхностной плотности энергии и концентрации свободных электронов в образующихся лазерно-плазменных каналах. Обнаружено образование дополнительных филаментов вне плоскости первоначального распространения импульсов.

Ключевые слова : филаментация, фемтосекундное излучение, лазерная плазма, взаимодействие филаментов.

Text 3

Влияние электрического поля на приповерхностные процессы при лазерной обработке металлов

Аннотация . Показано, что при изменении напряженности внешнего электрического поля различной полярности от 0 до 10⁶ В/м в ходе воздействия лазерного излучения со средней плотностью потока ~10⁶ Вт/см² на поверхности ряда металлов (Cu, Al, Sn, Pb) изменение особенностей эволюции плазменного факела на ранних стадиях носит количественный, а не качественный характер. В то же время характерные размеры капель вещества мишени, вынесенных из облученной зоны, существенно (в несколько раз) уменьшаются при увеличении амплитуды напряженности внешнего электрического поля независимо от его полярности.

Ключевые слова : лазерное излучение, электрическое поле, плазмообразование, гравитационно-капиллярные волны.

Text 4

Об ассоциациях невзаимодействующих частиц (кристаллоподобные нейтронные структуры)

Аннотация . Обсуждается физическая реализуемость ассоциаций невзаимодействующих друг с другом частиц, возникающая в соответствии с соотношением неопределенности при ’корпоративном’ пространственном ограничении ансамбля частиц в целом. Рассмотрение проводится на примере ансамбля ультрахолодных нейтронов, помещенных в общую потенциальную яму бесконечной глубины. Представлены количественные оценки и указаны ожидаемые свойства образующихся кристаллоподобных пространственно-периодических структур.

Ключевые слова : квантовая нуклеоника, ультрахолодные нейтроны, лазерные способы производства ультрахолодных нейтронов, нейтронные ассоциации, нейтроны в потенциальной яме бесконечной глубины.

Text 5

Эллиптически поляризованные кноидальные волны в среде с пространственной дисперсией кубической нелинейности

Аннотация . Найдены новые частные аналитические решения системы нелинейных уравнений Шредингера, соответствующие эллиптически поляризованным кноидальным волнам в изотропной гиротропной среде с пространственной дисперсией кубической нелинейности и частотной дисперсией второго порядка при выполнении условий формирования волноводов единого профиля для каждой из циркулярно поляризованных компонент светового поля.

Ключевые слова : кубическая нелинейность, пространственная дисперсия, нелинейные уравнения Шредингера, эллиптическая поляризация, кноидальные волны.

The corresponding translations were taken from http://iopscience.iop.org/1063-7818/42/2 .

For an automatic analysis, we used the relevant software that is publicly available from http://www.languagestudio.com/LanguageStudioDesktop.aspx#Pro.

Language Studio™ Lite is a free tool that provides key metrics for translation quality. This tool can be used to measure not only the quality, but also the improvements in quality because custom translation engines are constantly being updated via the quality improvement feedback cycle. Language Studio™ Lite currently supports such metrics as BLEU, F-Measure, and TER.

From the point of view of syntax, the abstracts presented for the analysis are characterized mainly by simple sentences, i.e., “something is presented” or “something is investigated.” The next most frequent are complex sentences with an object clause, for example, “it is shown that ...” or “it is found that ...”. As to the vocabulary, translators most often use one-word terms (waveguide), two-word terms (light wave, uncertainty relation), and three-word terms (target material droplets), whereas four-word terms (crystal-like spatially periodic structure) are extremely rare.

For the program to score the translations correctly, we preprocessed the reference translations and the candidate translations made by Google and PROMPT. Each sentence was started on a new line, and the texts were converted into .txt format.
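A preprocessing step of this kind could be scripted roughly as follows. This is only a sketch of one possible approach, not the procedure actually used for the experiment: it assumes that a sentence ends with a period, question mark, or exclamation mark followed by whitespace and a capital letter, which will fail on abbreviations; the function name is hypothetical.

```python
import re


def one_sentence_per_line(text):
    """Put every sentence of a plain-text passage on its own line."""
    text = " ".join(text.split())  # collapse line breaks and repeated spaces
    # Split after sentence-final punctuation when a capital letter follows.
    sentences = re.split(r"(?<=[.!?])\s+(?=[A-Z])", text)
    return "\n".join(s for s in sentences if s)


sample = "Abstract. Fragmentation of nanoparticles is studied.\nIt is shown that fragmentation occurs."
lines = one_sentence_per_line(sample).split("\n")
```

The resulting string can then be written to a .txt file, with the reference and candidate files kept aligned line by line.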

Initially, we compared the reference translation and the outputs from Google and PROMPT, using n-gram metrics. The results of the translation evaluation summary are presented below.

Translation Evaluation Summary

Job Start Date:	9/18/2012 1:52 PM
Job End Date:	9/18/2012 1:52 PM
Job Duration:	0 min(s) 2 sec(s)
Reference File:	reference_1.txt
Candidate File:	candidate_google_1.txt
Evaluation Lines:	28
Tokenization Language:	EN

Results Summary:	62.554

Reference	Candidate	1 Gram	2 Gram	3 Gram	4 Gram	Score
1. Evolution of the distribution function of Au nanoparticles in a liquid under the action of laser radiation	1. The evolution of the distribution function of Au nanoparticles in a liquid under the action of laser radiation	18/19	16/18	15/17	14/16	92.490
2. Abstract.	2. Abstract.	3/3	2/2	1/1	0/0	100.000
3. Fragmentation of nanoparticles in a liquid under the action of pulsed laser heating is studied theoretically and experimentally.	3. Studied theoretically and experimentally the process of fragmentation of nanoparticles in a liquid under the action of pulsed laser heating.	19/22	15/21	13/20	11/19	73.753
4. Fragmentation is simulated by solving the kinetic equation for the nanoparticle size distribution function, taking into account the temperature dependence of the thermophysical parameters of the medium.	4. Simulation of the process carried out by solving the kinetic equation for the distribution function of nanoparticles size, taking into account the temperature dependence of the thermophysical parameters of the medium.	27/34	22/33	18/32	16/31	67.629
5. It is shown that fragmentation occurs after separation of smaller fragments from a molten nanoparticle.	5. It is shown that fragmentation occurs after separation from the molten fragments of smaller nanoparticles.	15/17	9/16	7/15	6/14	58.502
6. The simulation results are in good agreement with experimental data obtained in the fragmentation of gold nanoparticles irradiated in water by a copper vapor laser with a peak radiation intensity of about ##.	6. The simulation results are in good agreement with experimental data obtained in the fragmentation of gold nanoparticles in water under irradiation of copper vapor laser with a peak intensity of the radiation in the environment of about ##.	33/41	28/40	23/39	20/38	69.933
7. Keywords: nanoparticles, colloids, laser ablation of metals, plasmon resonance, fragmentation.	7. Keywords: nanoparticles, colloids, laser ablation of metals, plasmon resonance, the fragmentation.	17/18	15/17	13/16	12/15	88.671
8. Interaction of noncollinear femtosecond laser filaments in sapphire	8. Noncollinear interaction of femtosecond laser filaments in sapphire	9/9	5/8	3/7	2/6	59.673
9. Abstract.	9. Abstract.	3/3	2/2	1/1	0/0	100.000
10. The interaction of two coherent femtosecond laser pulses, propagating at a small angle with respect to each other in a sapphire crystal in the filamentation regime, has been investigated numerically and experimentally.	10. Numerically and experimentally investigated the interaction of two coherent femtosecond laser pulses propagating at a small angle to each other in the sapphire crystal in the regime of filamentation.	29/31	19/30	14/29	9/28	54.746
11. Distributions of the fluence and free-electron density in the laser-plasma channels formed in the crystal are obtained.	11. Obtained by dividing the distribution of the surface energy density and the concentration of free electrons in laser-produced plasma channels.	15/24	3/23	0/22	0/21	19.291
12. Additional filaments are found to form outside the plane of initial pulse propagation.	12. Revealed the formation of additional filaments outside the plane of the initial distribution of momenta.	9/17	4/16	2/15	1/14	26.224
13. Keywords: filamentation, femtosecond radiation, laser plasma, filament interaction.	13. Keywords: filamentation, femtosecond radiation, laser plasma interaction of filaments.	12/14	9/13	8/12	7/11	71.312
14. Influence of an electric field on near-surface processes in laser processing of metals	14. Effect of electric field on the near-surface processes during laser processing of metals	13/16	8/15	5/14	2/13	46.421
15. Abstract.	15. Abstract.	3/3	2/2	1/1	0/0	100.000
16. It is shown that by varying the external electric field with different polarity from 0 to ## in the course of laser processing with the mean radiation flux density ## the change in the evolution features of the plasma torch at the surface of some metals (Cu, Al, Sn, Pb) at early stages is quantitative rather than qualitative.	16. It is shown that when the external electric field of opposite polarity from 0 to ## during the action of laser radiation with an average flux density of ## at the surface of some metals (Cu, Al, Sn, Pb) modified features of the evolution of the plasma torch in the early stages is quantitative, not qualitative.	54/66	39/65	27/64	20/63	53.525
17. At the same time the characteristic size of the target material droplets, carried out from the irradiated zone, becomes essentially (by several times) smaller as the amplitude of the external electric field strength grows, independently of its polarity.	17. At the same time, the characteristic droplet size of the target material, made from the irradiated zone, significantly (several times) decrease with increasing amplitude of the external electric field, regardless of its polarity.	33/41	23/40	16/39	10/38	48.883
18. Keywords: laser radiation, electric field, plasma formation, gravity-capillary waves.	18. Keywords: laser light, electric field, plasma formation, gravity-capillary waves.	16/17	14/16	12/15	10/14	83.262
19. On associations of noninteracting particles (crystal-like neutron structures)	19. On associations of non-interacting particles (neutron crystal-structure)	10/14	5/13	2/12	1/11	35.248
20. Abstract.	20. Abstract.	3/3	2/2	1/1	0/0	100.000
21. We discuss the physical feasibility of association of particles noninteracting with each other, which arises in accordance with the uncertainty relation under the ’corporate’ spatial confinement of the particle ensemble as a whole.	21. We discuss the physical realizability of association of non-interacting particles with each other, which arises in accordance with the uncertainty in the ’corporate’ spatial limitation of the particle ensemble as a whole.	33/39	27/38	22/37	17/36	66.469
22. Investigation is conducted by the example of an ensemble of ultracold neutrons placed in a common potential well of infinite depth.	22. Examination conducted by the example of an ensemble of ultracold neutrons placed in a common potential well of infinite depth.	21/22	19/21	18/20	17/19	89.173
23. We present quantitative estimates and indicate the expected properties of the arising crystal-like spatially periodic structures.	23. Quantitative estimates and expectations are the properties of the crystal-space-periodic structures.	10/14	5/13	2/12	0/11	25.854
24. Keywords: quantum nucleonics, ultracold neutrons, laser methods of production of ultracold neutrons, neutron associations, and neutrons in the potential well of infinite depth.	24. Keywords: quantum nucleonics, ultracold neutrons, laser methods of production of ultracold neutrons, neutron Association, the neutrons in the potential well of infinite depth.	28/30	25/29	23/28	21/27	84.865
25. Elliptically polarized cnoidal waves in a medium with spatial dispersion of cubic nonlinearity	25. Elliptically polarized cnoidal waves in media with spatial dispersion of cubic nonlinearity	12/13	10/12	8/11	6/10	73.900
26. Abstract.	26. Abstract.	3/3	2/2	1/1	0/0	100.000
27. We present new specific analytic solutions of a system of nonlinear Schrodinger equations, corresponding to elliptically polarized cnoidal waves in an isotropic gyrotropic medium with spatial dispersion of cubic nonlinearity and second-order frequency dispersion under the conditions of formation of the waveguides of the same type for each of the circularly polarized components of the light field.	27. We present new analytic solutions of partial system of nonlinear Schrödinger equations, corresponding to an elliptically polarized cnoidal waves in an isotropic gyrotropic medium with spatial dispersion of cubic nonlinearity and frequency dispersion of the second order under the conditions of formation of the waveguides single profile for each of the circularly polarized components of the light field.	56/63	45/62	36/61	29/60	67.754
28. Keywords: cubic nonlinearity, spatial dispersion, nonlinear Schrodinger equations, elliptic polarization, cnoidal waves.	28. Keywords: cubic nonlinearity, spatial dispersion, nonlinear Schrodinger equation, elliptic polarization, the cnoidal wave.	17/20	13/19	11/18	9/17	68.713

-- Report End --

Translation Evaluation Summary

Job Start Date:	9/18/2012 1:53 PM
Job End Date:	9/18/2012 1:53 PM
Job Duration:	0 min(s) 2 sec(s)
Reference File:	reference_1.txt
Candidate File:	candidate_prompt_1.txt
Evaluation Lines:	28
Tokenization Language:	EN

Results Summary:	35.528

Reference	Candidate	1 Gram	2 Gram	3 Gram	4 Gram	Score
1. Evolution of the distribution function of Au nanoparticles in a liquid under the action of laser radiation	1. Evolution of function of distribution of nanoparticles of Au in liquid under the influence of laser radiation	15/18	8/17	3/16	0/15	37.286
2. Abstract.	2. Summary.	2/3	0/2	0/1	0/0	22.222
3. Fragmentation of nanoparticles in a liquid under the action of pulsed laser heating is studied theoretically and experimentally.	3. Theoretically also process of fragmentation of nanoparticles in liquid under the influence of pulse laser heating is experimentally investigated.	15/21	7/20	4/19	1/18	34.101
4. Fragmentation is simulated by solving the kinetic equation for the nanoparticles size distribution function, taking into account the temperature dependence of the thermophysical parameters of the medium.	4. Modeling of process is carried out on the basis of the solution of the kinetic equation for function of distribution of nanoparticles in the sizes taking into account temperature dependence of heatphysical parameters of the environment.	22/38	10/37	5/36	1/35	28.465
5. It is shown that fragmentation occurs after separation of smaller fragments from a molten nanoparticle.	5. It is shown that fragmentation occurs through separation from the melted nanoparticle of fragments of the smaller size.	14/20	6/19	5/18	4/17	41.519
6. The simulation results are in good agreement with experimental data obtained in the fragmentation of gold nanoparticles irradiated in water by a copper vapor laser with a peak radiation intensity of about ##.	6. Results of modeling are in a good consent with the experimental data received at fragmentation of nanoparticles of gold in water under the influence of radiation of the laser on pairs of copper at peak intensity of radiation in the environment of ~##.	27/47	9/46	1/45	0/44	22.454
7. Keywords: nanoparticles, colloids, laser ablation of metals, plasmon resonance, fragmentation.	7. Keywords: nanoparticles, colloidal solutions, laser ablyatsiya of metals, plazmonny resonance, fragmentation.	14/18	10/17	6/16	3/15	50.002
8. Interaction of noncollinear femtosecond laser filaments in sapphire	8. Interaction of not collinear femtosekundny laser filament in sapphire	6/10	3/9	1/8	0/7	27.947
9. Abstract.	9. Summary.	2/3	0/2	0/1	0/0	22.222
10. The interaction of two coherent femtosecond laser pulses, propagating at a small angle with respect to each other in a sapphire crystal in the filamentation regime, has been investigated numerically and experimentally.	10. Chislenno also experimentally investigated interaction of two coherent femtosekundny laser impulses extending under a small corner to each other in a crystal of sapphire in a mode of a filamentatsiya.	19/32	8/31	5/30	3/29	26.357
11. Distributions of the fluence and free-electron density in the laser-plasma channels formed in the crystal are obtained.	11. Conflicts of division of superficial density of energy and concentration of free electrons in being formed laser and plasma channels are received.	12/24	1/23	0/22	0/21	13.877
12. Additional filaments are found to form outside the plane of initial pulse propagation.	12. Formation of additional filament out of the plane of initial distribution of impulses is revealed.	7/17	3/16	2/15	1/14	21.432
13. Keywords: filamentation, femtosecond radiation, laser plasma, filament interaction.	13. Keywords: Filamentatsiya, femtosekundny radiation, laser plasma, interaction of filament.	12/15	6/14	4/13	2/12	44.149
14. Influence of an electric field on near-surface processes in laser processing of metals	14. Influence of electric field on pripoverkhnostny processes at laser processing of metals	11/13	7/12	4/11	1/10	42.105
15. Abstract.	15. Summary.	2/3	0/2	0/1	0/0	22.222
16. It is shown that by varying the external electric field with different polarity from 0 to ## in the course of laser processing with the mean radiation flux density ## the change in the evolution features of the plasma torch at the surface of some metals (Cu, Al, Sn, Pb) at early stages is quantitative rather than qualitative.	16. It is shown that at change of intensity of external electric field of various polarity from 0 to 106 ## in a course impact of laser radiation with average density of a flow of ~## on a surface of a number of metals (Cu, Al, Sn, Pb) change of features of evolution of a plasma torch at early stages carries quantitative, instead of qualitative character.	48/76	28/75	17/74	10/73	36.477
17. At the same time the characteristic size of the target material droplets, carried out from the irradiated zone, becomes essentially (by several times) smaller as the amplitude of the external electric field strength grows, independently of its polarity.	17. At the same time the characteristic sizes of drops of substance of the target, taken out of the irradiated zone, essentially (several times) decrease at increase in amplitude of intensity of external electric field irrespective of its polarity.	30/44	21/43	12/42	6/41	39.596
18. Keywords: laser radiation, electric field, plasma formation, gravity-capillary waves.	18. Keywords: laser radiation, electric field, plazmoobrazovaniye, gravitational and capillary waves.	13/16	10/15	8/14	6/13	60.731
19. On associations of noninteracting particles (crystal-like neutron structures)	19. About associations of noninteracting particles (kristallopodobny neutron structures)	9/11	6/10	4/9	2/8	47.946
20. Abstract.	20. Summary.	2/3	0/2	0/1	0/0	22.222
21. We discuss the physical feasibility of association of particles noninteracting with each other, which arises in accordance with the uncertainty relation under the ’corporate’ spatial confinement of the particle ensemble as a whole.	21. Physical feasibility of associations noninteracting with each other the particles, arising according to an uncertainty ratio is discussed at ’corporate’ spatial restriction of ensemble of particles as a whole.	23/34	12/33	7/32	3/31	31.964
22. Investigation is conducted by the example of an ensemble of ultracold neutrons placed in a common potential well of infinite depth.	22. Consideration is carried out on an example of ensemble of the ultracold neutrons placed in the general potential hole of infinite depth.	17/24	8/23	4/22	2/21	34.064
23. We present quantitative estimates and indicate the expected properties of the arising crystal-like spatially periodic structures.	23. Quantitative estimates are presented and expected properties of being formed kristallopodobny spatial and periodic structures are specified.	10/19	4/18	1/17	0/16	19.655
24. Keywords: quantum nucleonics, ultracold neutrons, laser methods of production of ultracold neutrons, neutron associations, and neutrons in the potential well of infinite depth.	24. Keywords: quantum a nukleonik, ultracold neutrons, laser ways of production of ultracold neutrons, neutron associations, neutrons in a potential hole of infinite depth.	25/30	19/29	14/28	10/27	58.972
25. Elliptically polarized cnoidal waves in a medium with spatial dispersion of cubic nonlinearity	25. Elliptically the polarized knoidalny waves in the environment with spatial dispersion of cubic nonlinearity	11/15	7/14	4/13	3/12	46.451
26. Abstract.	26. Summary.	2/3	0/2	0/1	0/0	22.222
27. We present new specific analytic solutions of a system of nonlinear Schrodinger equations, corresponding to elliptically polarized cnoidal waves in an isotropic gyrotropic medium with spatial dispersion of cubic nonlinearity and second-order frequency dispersion under the conditions of formation of the waveguides of the same type for each of the circularly polarized components of the light field.	27. New private analytical decisions of system of the nonlinear equations of Schrodinger, corresponding elliptically to the polarized knoidalny waves are found in the isotropic girotropny environment with spatial dispersion of cubic nonlinearity and frequency dispersion of the second order at performance of conditions of formation of wave guides of a uniform profile for each of tsirkulyarno polarized a component of a light field.	45/66	19/65	9/64	5/63	30.796
28. Keywords: cubic nonlinearity, spatial dispersion, nonlinear Schrodinger equations, elliptic polarization, cnoidal waves.	28. Keywords: cubic nonlinearity, spatial dispersion, Schrodinger’s nonlinear equations, elliptic polarization, knoidalny waves.	18/20	13/19	10/18	8/17	67.052

-- Report End --

The comparison shows that Google scored 62.554, while PROMPT scored only 35.528. This suggests that Google copes well with the vocabulary, while PROMPT has difficulty translating unknown words (however, we believe that proper training of this MT system may yield better results). This is not surprising, since statistical translation relies on n-gram models. The advantages of statistical systems manifest themselves when the system is trained for a sufficiently long time and high-quality corpora of parallel texts are available. Moreover, qualified linguists are not required in this case, and the system can be trained during its operation. These systems, however, have some drawbacks: large parallel corpora of texts are needed for training; such systems rely on a complex mathematical apparatus; high-quality translation is only possible for phrases that match the n-gram model; and translation strongly depends on the corpora that were used for training.
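
The n-gram columns in the reports above appear to give, for each order n, the number of candidate n-grams that also occur in the reference over the total number of candidate n-grams. A minimal sketch of such counting is given here, assuming plain whitespace tokenization and clipped counts; the function names are hypothetical, and the evaluation program's actual tokenizer and scoring formula are not documented in this article.

```python
from collections import Counter


def ngrams(tokens, n):
    """Return all contiguous n-grams of a token list as tuples."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def ngram_matches(candidate, reference, n):
    """Count candidate n-grams that also occur in the reference.

    Counts are clipped: an n-gram is credited at most as many times
    as it appears in the reference. Whitespace tokenization is an
    assumption made for this sketch, not the tool's documented behavior.
    """
    cand = Counter(ngrams(candidate.split(), n))
    ref = Counter(ngrams(reference.split(), n))
    matched = sum(min(count, ref[gram]) for gram, count in cand.items())
    return matched, sum(cand.values())


# Row 1 of the PROMPT report (reference, then candidate).
reference = ("1. Evolution of the distribution function of Au nanoparticles "
             "in a liquid under the action of laser radiation")
candidate = ("1. Evolution of function of distribution of nanoparticles of Au "
             "in liquid under the influence of laser radiation")

counts = [ngram_matches(candidate, reference, n) for n in range(1, 5)]
for n, (matched, total) in enumerate(counts, start=1):
    print(f"{n}-gram: {matched}/{total}")
```

For this pair the sketch prints 15/18, 8/17, 3/16 and 0/15, the same fractions as row 1 of the PROMPT report. Other rows (for example, "2. Abstract." scored as 2/3) suggest that the program handles punctuation differently, so whitespace splitting should not be expected to reproduce every row.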

The second analysis was performed using the BLEU, F-measure, and TER metrics. The two outputs were compared simultaneously with the reference translation. The results are as follows:

Translation Evaluation Summary

Job Start Date:	9/18/2012 1:50 PM
Job End Date:	9/18/2012 1:50 PM
Job Duration:	0 min(s) 6 sec(s)
Number of Reference Files:	1
Number of Candidate Files:	2
Evaluation Lines:	28
Tokenization Language:	EN
Evaluation Metrics:	BLEU, F-Measure, TER (Inverted Score)

Results Summary

Candidate File:	1	2
BLEU Case Sensitive	28.41	59.59
BLEU Case Insensitive	29.83	61.72
F-Measure Case Sensitive	67.62	83.09
F-Measure Case Insensitive	68.89	84.72
TER Case Sensitive	45.77	67.26
TER Case Insensitive	46.42	68.08

Candidate Files:
1 : candidate_prompt_1.txt 2 : candidate_google_1.txt

Reference Files:
1 : reference_1.txt

Candidate File 1:	candidate_prompt_1.txt
	BLEU	F-Measure	TER
Case Sensitive:	28.41	67.62	45.77
Case Insensitive:	29.83	68.89	46.42

Candidate File 2:	candidate_google_1.txt
	BLEU	F-Measure	TER
Case Sensitive:	59.59	83.09	67.26
Case Insensitive:	61.72	84.72	68.08

-- Report End --
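
For orientation, two of the metric families reported above can be sketched in a few lines. The code below is a simplified illustration, not the evaluation program's implementation: the F-measure is computed on unigrams only, the edit rate uses a plain word-level Levenshtein distance without the block shifts that TER allows, and the "inverted" score is assumed to be one minus the edit rate. All function names are hypothetical.

```python
from collections import Counter


def f_measure(candidate, reference):
    """Unigram F-measure: harmonic mean of precision and recall."""
    cand, ref = candidate.split(), reference.split()
    # Multiset intersection counts each shared word at most as often
    # as it appears in both strings.
    overlap = sum((Counter(cand) & Counter(ref)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(cand)
    recall = overlap / len(ref)
    return 2 * precision * recall / (precision + recall)


def word_edit_distance(candidate, reference):
    """Word-level Levenshtein distance (insert, delete, substitute)."""
    cand, ref = candidate.split(), reference.split()
    prev = list(range(len(ref) + 1))
    for i, c_word in enumerate(cand, start=1):
        curr = [i]
        for j, r_word in enumerate(ref, start=1):
            cost = 0 if c_word == r_word else 1
            curr.append(min(prev[j] + 1,          # delete a candidate word
                            curr[j - 1] + 1,      # insert a reference word
                            prev[j - 1] + cost))  # match or substitute
        prev = curr
    return prev[-1]


def inverted_edit_rate(candidate, reference):
    """Assumed inversion: 1 - edits / reference length, floored at 0."""
    ref_len = len(reference.split())
    if ref_len == 0:
        return 0.0
    rate = word_edit_distance(candidate, reference) / ref_len
    return max(0.0, 1.0 - rate)


print(f_measure("2. Summary.", "2. Abstract."))
print(inverted_edit_rate("the cat is on the mat", "the cat sat on the mat"))
```

With an inverted edit rate, higher values mean fewer edits are needed, which would be consistent with the report, where the TER column increases together with BLEU and F-measure.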

As in the previous test, Google shows better results, which is not surprising, because scientific texts are highly standardized. The syntactic features of scientific and technical texts include syntactic and semantic completeness, frequent use of clichéd structures, a comprehensive system of connecting elements (coordinating and subordinating conjunctions), etc. Scientific prose is characterized by complicated syntax, which is reflected in the use of sophisticated coordinated and subordinated sentences and in the complexity of simple sentences, mainly with appositives. In addition, scientific and technical texts are characterized, first of all, by the frequent use of highly specialized scientific terms. This is explained by the fact that scientific terminology evolves from the need for experts in a field to communicate with precision and brevity, but often has the effect of excluding those who are unfamiliar with the specialized language of the group. Modern terminology is accurate, efficient, nominative, stylistically neutral, and lacks emotional bias.

All of the above allows Google to cope so well with standardized texts. Nevertheless, it should be noted that PROMPT does a much better job when it comes to grammar: there are more grammatically correct sentences in the PROMPT output than in the Google output. This is not surprising, because PROMT relies on rule-based machine translation (RBMT). RBMT is based on a linguistic description of two natural languages (bilingual dictionaries and other databases containing morphological, grammatical and semantic information), formal grammars, and translation algorithms proper. The quality of translation depends on the size of the linguistic databases (dictionaries) and the depth of description of the natural languages [21].

4. Conclusions

An overview of the most commonly used MT evaluation metrics is presented. Automatic evaluation of MT quality by such metrics as BLEU, F-measure, and TER has significantly improved statistical MT. Typically, these metrics show good correlation of candidate translations with reference translations. One of the major drawbacks of these metrics is that they cannot assess MT quality at the semantic or pragmatic levels. Nevertheless, at present these metrics are the only means of automatic translation quality assessment.

The quality of the outputs from Google and PROMPT is compared with the reference translation using n-gram models and different metrics. In both cases, the Google output shows good correlation with the reference translation. The best match is registered at the vocabulary level, which is to be expected, because statistical translation is based on the n-gram model. Google also shows the worst results in terms of grammar, which is understandable, because PROMPT relies on the RBMT model, in which translation depends on the size of linguistic databases (dictionaries) and the depth of description of natural languages, i.e., the maximum number of features of grammatical structures.

Since translation into English is a priority for Google, this MT system is constantly being improved. All this suggests that the potential of transfer translation systems will sooner or later be exhausted, while the translation quality of statistical MT systems will continue to improve. Nevertheless, we believe that in the future, machine translation will combine the two approaches, rule-based and statistical, as well as the universal semantic hierarchy (USH) approach [22], in order to produce a correct translation.

The development of efficient and reliable MT evaluation metrics has been actively investigated in recent years. One of the most important tasks is to go beyond n-gram statistics while keeping the evaluation fully automatic. The need for a fully automated metric should not be underestimated, as such a metric would provide the highest rate of development and progress of MT systems.

Acknowledgements
The author thanks S.N. Vekovishcheva for valuable advice during the preparation of the manuscript.

References

1. White, J., O’Connell, T., and Carlson, L. (1993) “Evaluation of Machine Translation.” In Human Language Technology: Proceedings of the Workshop (ARPA), pp 206–210.

2. Melamed, I.D. (1995) “Automatic Evaluation and Uniform Filter Cascades for Inducing N-Best Translation Lexicons.” In Third Workshop on Very Large Corpora (WVLC3), pp 184–198, Boston.

3. Brew, C., and Thompson, H. (1994) “Automatic Evaluation of Computer Generated Text: A Progress Report on the TextEval Project.” In Human Language Technology: Proceedings of the Workshop (ARPA/ISTO), pp 108–113.

4. Papineni, K., Roukos, S., Ward, T., and Zhu, W.-J. (2002) “BLEU: a Method for Automatic Evaluation of Machine Translation.” In Proceedings of the 40th Annual Meeting of the Association for Computational Linguistics (ACL), pp 311–318, Philadelphia.

5. Doddington, G. (2002) “Automatic Evaluation of Machine Translation Quality Using N-gram Co-Occurrence Statistics.” In Human Language Technology: Notebook Proceedings, pp 128–132, San Diego.

6. Turian, J.P., Shen, L., and Melamed, I.D. (2003) “Evaluation of Machine Translation and its Evaluation.” In Proceedings of MT Summit IX, New Orleans, USA, 23–28 September 2003.

7. http://en.wikipedia.org/wiki/Approximate_string_matching

8. http://en.wikipedia.org/wiki/Levenshtein_distance

9. Blatz, J., Fitzgerald, E., Foster, G., Gandrabur, S., Goutte, C., Kulesza, A., Sanchis, A., and Ueffing, N. (2004) “Confidence Estimation for Machine Translation.” In Proceedings of COLING, pp 315–321, Geneva.

10. http://en.wikipedia.org/wiki/Evaluation_of_machine_translation

11. http://www.lrec-conf.org/proceedings/lrec2008/pdf/785_paper.pdf

12. Cancedda, N., and Yamada, K. (2005). “Method and Apparatus for Evaluating Machine Translation Quality.” US Patent Application 20050137854.

13. http://www.intsys.msu.ru/invest/speech/articles/rus_lm.htm

14. http://www.languagestudio.com/TranslationQualityMetrics.aspx

15. Melamed, I.D., Green, R., and Turian, J.P. (2003) “Precision and Recall of Machine Translation.” In Proc. HLT-03, pp 61–63.

16. http://ru.wikipedia.org/wiki/Информационный_поиск

17. Lavie, A., Sagae, K., and Jayaraman, S. (2004) “The Significance of Recall in Automatic Metrics for MT Evaluation.” Proceedings of the Sixth Conference of the Association for Machine Translation in the Americas (AMTA’04), pp 134–143.

18. Banerjee, S., and Lavie, A. (2007) “METEOR: An Automatic Metric for MT Evaluation with Improved Correlation with Human Judgments.” In Proceedings of the Second Workshop on Statistical Machine Translation, pp 228–231, Prague.

19. Ulitkin, I. (2011) “Computer-assisted Translation Tools: A Brief Review.” Translation Journal, Vol. 15, No. 1, January 2011.

20. http://www.quantum-electron.ru/

21. http://ru.wikipedia.org/wiki/ПРОМТ

22. http://www.abbyy.ru/science/technologies/business/compreno/

Published - March 2013

This article was originally published at Translation Journal (http://translationjournal.net/journal/).
