The Applications of Keywords and Collocations to Translation-Studies and Teaching

A Tentative Research on the Parallel Corpus of the 17th NCCPC Report

Dai Guangrong

Abstract: This paper investigates the application of corpora to translation studies and teaching. Using the parallel corpus of the 17th NCCPC (17th National Congress of the Communist Party of China) report as the investigation corpus, it takes issue with a view of the report's keywords and collocations that rests on a hasty conclusion reached without a sound statistical method. It is very important for translators to identify keywords and key collocations in translation studies and teaching. Keywords provide a useful way to characterize a text or a genre and can offer a clear and comprehensive understanding of the original texts. Collocation is a mode of meaning that can help translators decide on the field and style of word usage in translation.

With the help of parallel corpora and some well-known native language corpora such as the BNC and BROWN, we can learn some useful strategies for dealing with the collocations of keywords and their translations.

Keywords: corpus-based translation, concordance, collocation, translation teaching and research.

1. Introduction: Corpora Applications to Translation Studies and Teaching

In recent years Translation Studies has become a scholarly discipline in China, and translation teaching has become the focus of many translation researchers and teachers. This paper attempts to apply translation corpora to translation studies and teaching, and presents tentative research methods for use in everyday translation studies and teaching.

Different approaches have been taken to Translation Studies, from the earlier workshop approach, the philosophical and linguistic approach, and the functionalist approaches, to Descriptive Translation Studies, the post-structuralist and post-modernist approaches, and the cultural studies approach (Munday 2001; Richard Xiao 2008). Translation theories develop quickly, but practice, especially translation teaching, lags far behind theory.

With its rapid development in the mid-1980s, corpus linguistics started to attract the interest of translators. Firth was convinced that linguistic description can be used as a basis for translation: linguistic analysis at the grammatical, lexical, collocational and situational levels can be used as a basis for total translation (Firth 1968: 76-78). The Firthian approach, and its latter-day incarnation in corpus linguistics, might bear fruit in studies of translation (Kenny 2001: 21).

Corpora are useful for revealing "relations between frequency and typicality, instance and norm" (Stubbs 2001:151). According to Baker (1993:243), "the most important task that awaits the application of corpus techniques in translation studies is the elucidation of the nature of translated text as a mediated communicative event". Some hypotheses such as "translated texts tend to be more explicit, unambiguous, and grammatically conventional than their source texts" have already been investigated using translation corpora (Baker & Malmkjaer 1998: 52). Furthermore, some scholars point out the usefulness of corpora and corpus linguistic techniques in translation, such as providing a powerful tool to identify the characteristic features of translational language, and helping translators understand what translation is and how it works (Baker 1993:243).

One of the advantages of the corpus-based approach to translation studies and teaching is that it can reveal the "regularities of actual behavior" (Toury 1995:265). Toury stresses the need to observe regularities, and to provide explanatory hypotheses for those regularities, on the basis of ever-expanding corpora of texts, claiming that explanations of features observed even in a single translation must rely on the study of bigger corpora (Kenny 2001: 57). Tymoczko (1998:652) predicts that "Corpus Translation Studies is central to the way that Translation Studies as a discipline will remain vital and move forward." This statement has been confirmed by an ever-growing number of corpus-based translation studies (cf. Øverås 1998; Baker 1999; Baker 2000; Granger et al. 2003; Wang Kefei 2004; McEnery et al. 2006, 2007; Richard Xiao 2008, etc.).

In this paper, we present a tentative study that seeks to uncover some features of collocations and keywords in translation, using parallel corpora that include the report to the 17th NCCPC (National Congress of the Communist Party of China).

2. Keywords used in Translation Studies and Teaching

In translation studies and practice, we may encounter all kinds of texts, some of which have complex content and plots that are difficult to analyze, especially literary texts. It is a great challenge for beginners to translate such texts word by word and sentence by sentence without a holistic analysis and understanding of the source texts.

In this paper, we examine corpus-based translation studies and teaching. With the help of corpus software, we can retrieve useful information such as word frequency, collocations, patterns, clusters, the dispersion of search words and other statistical data. In this section, we apply "Keywords" to our research. "Keywords" is a program within the corpus software (i.e. Wordsmith) for identifying the key words in one or more texts.

Keywords are those words whose frequency is unusually high in comparison with some norm. Keywords provide a useful way to characterize a text or a genre. "Keywords" offers an important measure of the content and language characteristics of texts (Wang 2007: 27). It can also be used for analyzing style and retrieving texts. The program compares two pre-existing word lists, which must have been created using the WordList program in Wordsmith. One of these is assumed to be a large word list which will act as a reference file. The other is the word list based on the one text which is to be studied. The aim is to find out which words characterize the text we are most interested in, which is automatically assumed to be the smaller of the two texts chosen. The larger one provides background data for reference comparison. Using keywords, we can understand the translation materials (Scott 2004: 95). We can analyze the texts to be translated and obtain important hints such as the theme, main characters and plot. The translator can then form a schema for the content of the source texts and find proper equivalents.
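
The comparison of a study word list with a reference word list can be sketched in a few lines of Python. This is only an illustration under stated assumptions: it scores each word with the log-likelihood statistic, one keyness measure commonly used in corpus tools, whereas the statistic and settings actually used in this study are not stated. The function names (log_likelihood, keywords) are hypothetical and are not part of Wordsmith.

```python
import math
from collections import Counter


def log_likelihood(a, b, c, d):
    """Log-likelihood keyness score (an assumed choice of statistic).

    a = frequency of the word in the study corpus
    b = frequency of the word in the reference corpus
    c = total tokens in the study corpus
    d = total tokens in the reference corpus
    """
    # Expected frequencies if the word were equally common in both corpora.
    e1 = c * (a + b) / (c + d)
    e2 = d * (a + b) / (c + d)
    ll = 0.0
    if a > 0:
        ll += a * math.log(a / e1)
    if b > 0:
        ll += b * math.log(b / e2)
    return 2 * ll


def keywords(study_tokens, reference_tokens, top_n=10):
    """Return (word, study_freq, ref_freq, score) for the most 'key' words."""
    study = Counter(study_tokens)
    ref = Counter(reference_tokens)
    c, d = sum(study.values()), sum(ref.values())
    scored = []
    for word, a in study.items():
        b = ref.get(word, 0)
        # Keep only words that are relatively MORE frequent in the study corpus.
        if a / c > b / d:
            scored.append((word, a, b, log_likelihood(a, b, c, d)))
    scored.sort(key=lambda item: item[3], reverse=True)
    return scored[:top_n]
```

A word that has the same relative frequency in both corpora scores zero and is not reported, which is why very frequent function words do not normally surface as keywords.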

We once asked students to do some translation practice with government documents (in this research, the text of the 17th NCCPC report). Before the discussion of the 17th NCCPC report, some students found a research paper about its keywords. The paper[1] presents the keywords of the Chinese version of the report. Figure 1 shows the result of that research:

Figure 1: Keywords of the 17th NCCPC

It reports the number of sentences containing each keyword: "party" 125; "people" 102; "economics" 77; "democracy" 48; "science" 46; "education" 31; "cadre" 27; "army" 19; "income" 13; "supervision" 13; "law" 9; and "herbalist doctor" zero, etc. Judging from these results, the research appears to rest on the author's own impressions or purposes: it offers data in order to prove the researcher's point, but in doing so it breaks the rule of objectivity in research.

Students were puzzled by the subjectivity of this result. To help them understand the long text clearly, we used corpus software to retrieve data. The 17th NCCPC report is a government document and has its own specific language characteristics. We ran "Keywords" in Wordsmith (Version 4.0), with the English text of the 17th NCCPC report as the observed corpus and "Selected works of Mao Zedong" as the reference corpus. Figure 2 shows part of the Keywords result.

Figure 2: Keywords of the 17th NCCPC and Selected works of Mao Zedong

Figure 2 shows the frequency of the key words in the observed corpus and in the reference corpus. We can therefore conclude that "development" is the most important key word in the 17th NCCPC report; this sheds light on the theme and helps students understand the translation material.

We made parallel word lists of the bilingual versions of the 17th NCCPC report, retrieving only the content words (not the grammatical or function words), and extracted some data from the lists. This can shed some light on the contrastive study of the two languages and their translations. Figures 3 and 4 show the first twenty content words of the English and Chinese versions, respectively.

Figure 3: Key words of the English version
Figure 4: Key words of the Chinese version
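
A content-word frequency list of the kind shown in Figure 3 can be approximated as follows. This is a minimal sketch, not the procedure used in the study: the stop list FUNCTION_WORDS is a tiny hypothetical sample, the tokenizer is a simple regular expression for English, and a Chinese text would first need word segmentation, which is not shown here.

```python
import re
from collections import Counter

# Tiny illustrative stop list (an assumption); a real study would use a fuller one.
FUNCTION_WORDS = {
    "the", "of", "and", "to", "in", "a", "an", "is", "for",
    "with", "on", "by", "that", "we", "must",
}


def content_word_list(text, top_n=20):
    """Return the top_n most frequent content words as (word, count) pairs."""
    # Lower-case the text and keep alphabetic tokens, allowing internal hyphens.
    tokens = re.findall(r"[a-z]+(?:-[a-z]+)*", text.lower())
    counts = Counter(t for t in tokens if t not in FUNCTION_WORDS)
    return counts.most_common(top_n)
```

Running the same function on both language versions, each with its own stop list, gives two lists that can be compared rank by rank.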

Figures 3 and 4 show that the two word lists differ in their first 20 keywords. In the English version, "development" appears 226 times; other keywords not included in Figure 3 are "economy" (36), "government" (36), "education" (35), "force" (34), "management" (34), "progress" (34), "support" (34), "market" (33), "cadre" (31), and "need" (31). In the Chinese version, "发展 (fazhan)" appears 281 times. The two lists therefore differ, which tells us that translation has produced changes in word order, frequency, etc. As translation teachers, we should ask why, where, when and how translation strategies should change.

Judging from this analysis, the most important keyword in the report is "发展 (development)". With such data, we can guide translation practice in many ways.

The students once translated the Chinese sentence "推动科学发展,促进社会和谐" (tuidong kexue fazhan, cujin shehui hexie) as "Promote the development of science, accelerate social harmony". In the Chinese sentence, however, "科学 (kexue)" is used as an adverbial modifying "发展 (fazhan)", because "发展 (development)" is the theme of the report. This sentence is very important to the whole text. With the help of this information, we can translate it as "pursue development in a scientific way and promote social harmony".

3. Collocations used in Translation Studies and Teaching

Peter Newmark once pointed out: "He (one who writes or speaks in a foreign language) will be 'caught' every time, not by grammar, which is probably suspiciously 'better' than that of educated natives, not by his vocabulary, which may well be richer, but by his unacceptable or improbable collocations"(1981:180).

In the translating process, when translators render their native language into a foreign language, they have to choose proper collocations, which is difficult in many situations, for example with idiomatic usage and restricted collocations.

Collocation is connected not only to lexis but also to the cohesion of the whole text. The cohesive effect of collocations is a key factor in avoiding "translationese." Halliday and Hasan (1976:286) said, "The cohesive effect of such pairs depends not so much on any systematic relationship as on their tendency to share the same lexical environment, to occur in COLLOCATION with one another. In general, any two lexical items having similar patterns of collocation, that is, tending to appear in similar contexts, will generate a cohesive force if they occur in adjacent sentences." These remarks remind translators to pay attention to collocations in the translating process; otherwise, they will fall into the problem of "translationese."

Sinclair once said, "There are virtually no impossible collocations, but some are more likely than others" (1966:411). In this research, we follow Wei Naixing's definition of collocation: A collocation is a conventional syntagmatic association of a string of lexical items which co-occur in a grammatical construct with mutual expectancy greater than chance as realization of non-idiomatic meaning in texts (Wei 2002: 100).

Firth (1957:12) stated: "You shall know a word by the company it keeps." Collocation is thus a mode of expressing meaning: meaning by collocation is an abstraction at the syntagmatic level and is not directly concerned with the conceptual approach to the meaning of words. One of the meanings of night is its collocability with dark, and of dark, of course, collocation with night (ibid: 196).

Bowker (1998: 631) observes that "corpus-assisted translations are of a higher quality with respect to subject field understanding, correct term choice, and idiomatic expressions."

Here we can return to the research that puzzled our students. It says, "This kind of statistic cannot count the collocability, especially the collocation of adjectives and adverbs with the keywords which can explain the importance of the problem."[1] Many factors affect collocability, including semantic, grammatical, and conventional ones. All of these factors should be considered in our daily translation practice.

Using parallel corpora, we can decide on the proper collocations in the target language. For example, when we encounter the word "结果 (jieguo)," a dictionary offers only a limited set of equivalents and collocations, whereas a large corpus avoids this kind of limitation. A parallel corpus shows several collocations and gives translators several choices from which to select the proper one. "结果" has different translations, such as "result/ outcome/ aftermath/ consequence/ effect/ conclusion/ sequel/ finding/ end" etc. With the parallel corpus, we can find the proper collocation of this keyword.

In translating "前任 (qianren)" into English, students face difficulties in deciding on the collocation even with the help of dictionaries.

With the parallel corpus, we can solve this kind of problem. Taking "前任 (qianren)" as the search item, we can retrieve examples like the following:

[1] "Especially with the growth of the amateur program, it's here to stay," says former heavy-weight champion Evander Holyfield.

[2] "You'll recognize the ornaments," said my former daughter-in-law.

[3] Her arrogance has disenchanted many of her former admirers.

[4] He is working hard to excel his predecessors.

[5] John Smith is a past president of our club.

[6] Mr. Heath is the former Prime Minister of Britain.

[7] The saddest were the eight ex-cadres who lost their executive jobs.

[8] The late Prime Minister attended the ceremony.

These examples show that "前任 (qianren)" is not simply equal to "former"; the collocations show that "predecessor/past/ex-/late" etc. can also be used as equivalents (Dai, 2008: forthcoming).
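
The kind of search described here can be sketched as a lookup over sentence-aligned pairs. The code is a hedged illustration, not the software used in the study: the function parallel_concordance and the pattern table are hypothetical, and the Chinese sides of the toy pairs below were invented for this sketch (only the English sentences come from the examples above).

```python
import re
from collections import Counter

# Candidate English equivalents, written as regular expressions (assumed list).
QIANREN_PATTERNS = {
    "former": r"\bformer\b",
    "predecessor": r"\bpredecessors?\b",
    "past": r"\bpast\b",
    "ex-": r"\bex-",
    "late": r"\blate\b",
}

# Toy aligned pairs: the Chinese sentences are INVENTED for illustration only.
TOY_PAIRS = [
    ("希思先生是英国前任首相。", "Mr. Heath is the former Prime Minister of Britain."),
    ("他努力工作以超越前任。", "He is working hard to excel his predecessors."),
    ("约翰·史密斯是我们俱乐部的前任主席。", "John Smith is a past president of our club."),
    ("会议今天结束。", "The meeting ends today."),
]


def parallel_concordance(pairs, source_term, target_patterns):
    """Find aligned pairs containing source_term and tally target equivalents."""
    hits = []
    tally = Counter()
    for source, target in pairs:
        if source_term not in source:
            continue
        matched = [
            label
            for label, pattern in target_patterns.items()
            if re.search(pattern, target, flags=re.IGNORECASE)
        ]
        hits.append((source, target, matched))
        tally.update(matched)
    return hits, tally
```

Calling parallel_concordance(TOY_PAIRS, "前任", QIANREN_PATTERNS) returns the three matching pairs together with a count of each equivalent found, which students can then inspect line by line.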

Similarly, when students encounter a phrase such as "台湾问题 (Taiwan wenti)", the translation may not be found in the dictionary, and it is difficult for beginners to choose among the collocations "Taiwan problem/Taiwan issue/Taiwan question" etc. We can consult the parallel corpora and obtain abundant examples. Figure 5 shows the result of searching for the collocation "台湾问题 (Taiwan wenti)" in the parallel corpus of the 17th NCCPC report.

Figure 5

In translating the Chinese version of the 17th NCCPC report into English, students encounter a number of collocation problems, and we find a lot of translationese in their work. How can this situation be improved? We can use the parallel corpus. First, we suggest that students practise with KWIC (key word in context) concordances from the corpus and learn to find restricted collocations, such as the clusters shown in Figure 6:

Figure 6
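
For readers without access to concordancing software, KWIC lines and clusters can be approximated with a short script. This is a rough sketch under simple assumptions (whitespace-separated tokens, fixed-size clusters); the function names kwic and clusters are hypothetical and do not reproduce the behaviour of any particular tool.

```python
from collections import Counter


def kwic(tokens, node, window=4):
    """Return (left context, node, right context) for each occurrence of node."""
    lines = []
    for i, token in enumerate(tokens):
        if token.lower() == node:
            left = " ".join(tokens[max(0, i - window):i])
            right = " ".join(tokens[i + 1:i + 1 + window])
            lines.append((left, token, right))
    return lines


def clusters(tokens, node, size=3):
    """Count every run of `size` consecutive words that contains node."""
    lowered = [t.lower() for t in tokens]
    counts = Counter()
    for i in range(len(lowered) - size + 1):
        gram = lowered[i:i + size]
        if node in gram:
            counts[" ".join(gram)] += 1
    return counts
```

Applied to the tokens of "we must pursue development in a scientific way and promote social harmony" with the node "development", kwic shows the word in its immediate context and clusters lists the three-word sequences in which it takes part.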

It may be difficult for beginners to decide whether a collocation is strong, that is, whether it has a high mutual information (MI) value. With the help of corpus software such as Wordsmith or A Corpus Worker's Toolkit (Hongyin Tao), we can obtain the MI value of a collocation. If students are not sure whether the collocation "scientific development" is proper, we can confirm it from the result in Figure 7:

Figure 7

From the figure, we can be confident about the collocation "scientific development", since its mutual information value is 5.81.
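
To make the MI value less opaque for students, the usual calculation can be sketched as follows. The sketch assumes the common textbook definition, MI = log2(f(x,y) × N / (f(x) × f(y))), applied to adjacent word pairs; individual tools may adjust for span size or use other settings, so it is not claimed to reproduce the 5.81 reported in Figure 7. The function names are hypothetical.

```python
import math


def mutual_information(pair_freq, node_freq, collocate_freq, corpus_size):
    """MI score = log2(observed co-occurrences / co-occurrences expected by chance)."""
    if min(pair_freq, node_freq, collocate_freq, corpus_size) <= 0:
        raise ValueError("all frequencies must be positive")
    return math.log2(pair_freq * corpus_size / (node_freq * collocate_freq))


def bigram_mi(tokens, first, second):
    """MI for `first` immediately followed by `second` in a token list."""
    lowered = [t.lower() for t in tokens]
    pair_freq = sum(
        1 for a, b in zip(lowered, lowered[1:]) if a == first and b == second
    )
    return mutual_information(
        pair_freq, lowered.count(first), lowered.count(second), len(lowered)
    )
```

A score of zero means the two words co-occur exactly as often as chance would predict; a commonly cited rule of thumb treats scores of about 3 or above as evidence of a notable collocation.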

4. Conclusions

Corpus-based translation studies and teaching face a number of challenges and opportunities in the information era. The present paper has presented a case study of keywords and collocations in corpus-based translation studies and teaching. Keywords can provide useful information about the materials to be translated and help translators decide on translation strategies. The collocation strategy can help translators decide on proper collocations in the target language and check the naturalness of a translation with the help of translation corpora. These two aspects give us some confidence in translation and teaching, and both can stimulate improvement and innovation in translation studies and teaching.


We are grateful to the China National Education Pattern of the Practical Talents Research for supporting our project A Research and Innovation on the English Teaching Pattern Based on Corpus and Campus-Net (Grant Reference FIB070335-A15-11). We also thank the Scientific Research Division of Fujian University of Technology for supporting our project: A Corpus-based Investigation for Chinese Learners' English Writing Competence (Grant Reference GY-S0827).


[1] Accessed on 7-16-2008.


Published - September 2009


