The World Wide Web and Translators

By Tengku Sepora Tengku Mahadi,
Helia Vaezian,
Mahmoud Akbari

School of Languages, Literacies and Translation,
Universiti Sains Malaysia, Penang, Malaysia

tsepora at usm my
heliavaezian at yahoo com
mahmoud akbari at gmail com

In today’s translation market, with its online jobs and tight deadlines, the Internet has become one of the resources translators commonly use to obtain fast and easy access to translationally-relevant information. The online resources translators draw on range from online glossaries to online corpora. This, however, is not the only application of the Web for translators. The Web itself is a valuable source of linguistic information, far beyond what dictionaries have to offer. This application rests on the concept of the Web as a corpus. The present paper discusses that concept and elaborates on the applications of the World Wide Web for translators.

There is no doubt that the Internet is one of the greatest inventions of the 20th century and that it has had a tremendous impact on our everyday lives. It has not only eased access to information, but has also provided a new mode of communication across the globe. This valuable source of information also has a lot to offer translators. The present paper elaborates on the applications of the Internet for translators. It further focuses on the concept of the Web as a big corpus and its implications for translators.

Some years ago, dictionaries and possibly typewriters were translators’ best friends. Today, however, a computer with an Internet connection seems indispensable to what translators do. As the results of a comprehensive survey conducted in 2005 within an EU-funded project (MeLLANGE project[1]) show, around 95% of the participants, who were translation students and professional translators, made use of the Web for translation-related tasks. In fact, it can be claimed that the Web has turned into a new resource for today’s translators. Now let us see what the Web has to offer translators.

A large number of free dictionaries, glossaries and term banks in various languages and specialized fields are available online and can be easily accessed by users around the world. The Internet can thus be considered a virtual library of such resources. While in the past translators may have had to leave the comfort of their homes or offices to get access to such resources, or have had to invest heavily in them, today they have all kinds of resources at their fingertips.

The present translation market is above all marked by its online nature. A considerable number of translation jobs today are posted to online translation portals where translators from all over the world can quote for and get the jobs. Besides being a source of translation-related tools such as dictionaries and glossaries, the Internet has thus become a primary channel for clients and translators to communicate and work together.

With the Internet at their disposal, translators can get jobs from clients all over the world. As Aula (2005: 134) states, “previously, translators were available in and provided their services in a specific geographical area. There are no longer such limitations; in fact, most translation services are offered and supplied through the Internet”. In other words, the Internet has considerably changed the nature of the translation market.

We come back to the vital role of the Internet as a provider of resources. The translator’s work is made even easier and more efficient by the building of a corpus as a tool for translation. With respect to this tool, a DIY (Do-It-Yourself) corpus can be said to be particularly useful for translators.

A DIY corpus is defined as “a collection of Internet documents or more precisely of web pages in HTML format created ad hoc as a response to a specific text to be translated”, which is an open and disposable corpus (Zanettin 2002: 242). DIY corpora are compiled to answer a specific question the compiler has, and accordingly they are usually abandoned after the need has been met (Fletcher 2004).

The World Wide Web is usually the main source of texts for inclusion in DIY corpora, though other resources, such as newspaper archives or CD-ROMs, can also be used. There are various approaches to the use of the Web for building DIY corpora. The basics, however, are the same: ordinary search engines are used to find the relevant texts for inclusion in DIY corpora.

DIY disposable corpora have one important advantage over larger non-disposable corpora: they are rather easy to build. This means that translators who do not have access to ready-made corpora can still enjoy the benefits of corpora by compiling their own disposable ones. DIY corpora are especially useful for translators working with less dominant languages, for which the number of ready-made corpora available may be limited. For instance, a translator translating into Malay may not find it easy to get access to corpora of the Malay language. Yet the translator can easily build his or her own corpus by drawing on the Malay texts available online.
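The compilation step described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the pages are taken to be already saved as HTML strings (for example, after locating them with a search engine), and the names `TextExtractor`, `html_to_text` and `build_corpus` are hypothetical helpers, not part of any tool cited in this paper. The sketch strips the markup from each page and keeps the running text, which is the form a corpus tool would need.

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collect the visible text of an HTML page, skipping script and style blocks."""

    SKIP = {"script", "style"}

    def __init__(self):
        super().__init__()
        self.parts = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIP and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Keep text only when we are outside script/style elements.
        if self._skip_depth == 0 and data.strip():
            self.parts.append(data.strip())


def html_to_text(html):
    """Return the running text of one HTML page as a single string."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return " ".join(parser.parts)


def build_corpus(pages):
    """Map each page label to its plain text (pages: label -> raw HTML)."""
    return {label: html_to_text(html) for label, html in pages.items()}


# Hypothetical saved page, used only to show the call.
pages = {
    "page1": "<html><head><style>p{}</style></head>"
             "<body><p>Selamat <b>datang</b></p>"
             "<script>var x=1;</script></body></html>",
}
corpus = build_corpus(pages)
print(corpus["page1"])
```

Because such a corpus is disposable, the sketch keeps everything in memory; a translator who wanted to reuse the texts could just as well write each entry to a text file.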

There are also a number of free machine translation systems available on the Internet. Though such online MT systems are generally intended for non-professional, occasional users, they may come in handy in professional translation as well. Babelfish[2], Systran[3], SDL Free Translation[4] and Google Translate[5] are some of the best-known machine translation systems available online.

The Web itself can be regarded as a valuable source of translationally-relevant information for translators. Treating the Web as a big corpus of texts in various languages, translators can extract such information from it using either search engines or web concordancers. Before going any further into this discussion, it is necessary first to elaborate on the concept of the Web as a corpus.

To see whether the Web has what it takes to be considered a corpus, the present section is devoted to a comparison between the idiosyncrasies of the Web and the features of corpora.

Based on the various definitions put forward for corpora in the literature, corpora have three main features: they contain authentic texts; they contain texts sampled to be maximally representative of the language variety under study; and they are in machine-readable form (Tengku Mahadi, Vaezian & Akbari 2010). Tognini-Bonelli (2001: 55) defines authenticity of texts in corpora in the following words: “All the material included in the corpus, whether spoken, written or gathered along any intermediate dimension, is assumed to be taken from genuine communication of people going about their normal business”. There is no doubt that the Web shares this feature of corpora, since online texts are real instances of language in use. As for the second feature, representativeness, the Web, with its vast amount of texts, can be said to contain representative samples of texts in various languages. Finally, the Web certainly shares the third feature of corpora, since online texts are in machine-readable form.

The Web, thus, can be said to more or less share the basic features of corpora. The Web can in fact be considered a large multilingual monitor corpus which is constantly updated with new texts. Now let us see why translators would want to draw on the Web as a big corpus instead of real corpora.

Corpora have proved very useful to translators. As stated by Tengku Mahadi, Vaezian and Akbari (2010), corpora can provide translators with conceptual, collocational, terminological, linguistic and orthographical information. There are, however, some limitations in the use of corpora by translators.

First, accessing ready-made corpora may not always be easy. In fact, the number of available ready-made corpora is limited, especially for less dominant languages. Second, most existing corpora are domain specific and supply a limited range of genres and text types. Third, the existing corpora may not always contain the exact information the translator is looking for. Even a very large specialized corpus may not always contain the information needed to translate texts on the respective specialized subject.

In contrast, the Web is accessible to all the users around the world; it contains an abundance of texts in almost all languages of the world and it has texts in a wide range of genres and text types. This situation implies that using the Web as a corpus may be a viable option in settings where translators do not have access to ready-made corpora or when the available corpora do not contain the exact information the translator is looking for.

There are, of course, some downsides to the use of the Web as a big corpus. The main problem has to do with the fact that users have no control over the texts on the Web. This lack of control implies that it would be difficult for users to judge the texts in terms of their representativeness and authenticity (Fletcher 2004). Apart from that, as stated by Gatto (2009: 46), working with the huge amounts of data available on the Web “requires an enormous amount of processing power”, which implies that users have to invest more time and energy in finding the kind of information they need among the large quantities of information available on the Web.

5.0 EXTRACTING TRANSLATIONALLY-RELEVANT INFORMATION FROM THE WEB AS A CORPUS

If we consider the Web as a big corpus, we can consider search engines such as Google, Yahoo, and Bing as corpus analysis tools used to query this giant corpus. Search engines are programs that examine sites and store information about their contents. When a search is performed, the search engine looks through this stored information for the specified keywords and returns a list of documents containing them. What search engines do is, in fact, very close to what the concordance features in corpus analysis tools do, in that both search documents for the specified keywords and find the instances in which the keywords are used.
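To make the comparison with concordance features concrete, the following Python sketch shows what a very simple keyword-in-context (KWIC) search does. It is an illustration only, not the implementation of any particular search engine or corpus tool; the function name `concordance` and the `width` parameter are assumptions made for this example.

```python
import re


def concordance(text, keyword, width=30):
    """Return (left context, match, right context) for each whole-word hit."""
    pattern = re.compile(r"\b" + re.escape(keyword) + r"\b", re.IGNORECASE)
    lines = []
    for m in pattern.finditer(text):
        left = text[max(0, m.start() - width):m.start()]
        right = text[m.end():m.end() + width]
        lines.append((left, m.group(0), right))
    return lines


sample = "The corpus is small. A corpus can grow."
for left, hit, right in concordance(sample, "corpus", width=10):
    # Right-align the left context so that the keyword lines up in a column.
    print(f"{left:>10} [{hit}] {right}")
```

A search engine returns whole documents with short snippets, whereas a concordancer aligns every hit with its immediate context, as the loop above does; this alignment is what makes collocations and recurring patterns easy to spot.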

Some search engines, such as Google, have a number of features which allow users to refine their searches. Some of the most useful search features of the Google search engine, from a translation perspective, are phrase search, search to exclude terms and wildcard search. Phrase search allows users to look for an exact phrase by putting double quotation marks ("") around a set of words. In the search to exclude terms, placing a minus sign (-) immediately before a word makes Google leave out the pages containing that word from the results. Finally, in wildcard search, when an asterisk (*) is used within a query, Google treats it as a placeholder for any unknown term(s) and then finds the best matches (Google guide: online).
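The three operators can be combined in a single query. The short Python sketch below only assembles query strings from the operators as described above; it does not contact any search engine, the helper names are hypothetical, and the way a given search engine interprets these operators may differ from this description or change over time.

```python
def phrase(words):
    """Exact-phrase search: wrap the words in double quotation marks."""
    return '"' + words + '"'


def exclude(term):
    """Exclude a term: put a minus sign immediately before it."""
    return "-" + term


def wildcard(*parts):
    """Wildcard search: an asterisk stands in for the unknown term(s)."""
    return phrase(" * ".join(parts))


def build_query(*pieces):
    """Join the pieces into one query string, separated by spaces."""
    return " ".join(pieces)


# Example: look for an exact phrase while leaving out pages that mention "free".
print(build_query(phrase("machine translation"), exclude("free")))
# Example: leave a slot open between two known words of an expression.
print(wildcard("translation", "corpus"))
```

In the wildcard example the asterisk sits inside the quotation marks, so that the unknown word is sought in that position within the phrase, which is the situation a translator faces when only part of a target-language expression is known.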

As Maniez (2007) states, search engines can well be used by translators to retrieve rare terms or syntactic patterns or to check the frequency of terms or expressions in the target language. He further explains how Google wildcard search can be applied to find translation equivalents when the exact equivalent of a complex term is unknown (ibid.). Fujii (2007) too shows how using the Google search engine helps students improve their translations in the following areas: articles and singular/plural, formation of tense forms, conversion of parts of speech, prepositions, flow of words and sentences, relative pronouns, choosing the right word in context and, finally, switching from passive voice to active.

Apart from ordinary search engines, there are a number of concordancing programs specifically designed to query the Web as a corpus. Such concordancing programs, as Gatto states, “provide contextualized examples of language usage from the web in a form tailored for linguistic analysis” (2009: 80). WebCorp[6], designed by the Research and Development Unit of English Studies at the University of Birmingham, and WebAsCorpus[7], designed by William Fletcher, are two well-known web concordancers freely available online.

Nowadays professional translators, more often than not, work with a variety of texts and text types. As Ulrych (2005: 22) has stated, “the idea that professional translators work predominantly in one or two specialist fields is in fact swiftly losing grounds…”. This situation requires translators either to have an encyclopedic knowledge of various specialized fields or to have the necessary resources to gain such knowledge when the need arises.

As Pym (1993: 114) states, specialization in the translation market implies that “a good translator is not someone who knows many things but someone who has the skills and contacts to find specific information when necessary”. The key to success in the present translation market, thus, can be said to lie in being resourceful, and the Web has a lot to offer translators from this perspective. The World Wide Web can be considered an invaluable resource for translators in that it not only contains a vast amount of linguistic information about various languages and text types, but also provides translators with a channel to communicate with fellow translators, subject matter experts and, above all, clients.

Aula, I., 2005, Translator Training and Modern Market Demands. Perspectives: Studies in Translatology, 13(2), pp. 132-42.

Fletcher, W.H., 2004, Facilitating the Compilation and Dissemination of Ad-hoc Web Corpora. In G. Aston, S. Bernardini and D. Stewart (eds.), Papers from the Fifth International Conference on Teaching and Language Corpora. Amsterdam: Benjamins.

Fujii, Y., 2007, Making the Most of Search Engines for Japanese to English Translation: Benefits and Challenges. Asian EFL Journal, (23), pp. 41-77.

Gatto, M., 2009, From body to web: An introduction to the web as corpus. Bari: Laterza.

Google guide, 2009, [Online] [Accessed 8th September 2009], available from: http://www.googleguide.com/

Maniez, F., 2007, Using the Web and computer corpora as language resources for the translation of complex noun phrases in medical research articles. Panace@, (9), pp. 162-167.

Pym, A., 1993, On the Market as a Factor in the Training of Translators. Koiné, (3), pp. 109-121.

Tengku Mahadi, T. S., Vaezian, H. and Akbari, M., 2010, Corpora in Translation: A Practical Guide. Linguistic Insights: Studies in Language and Communication (120). Bern: Peter Lang AG, International Academic Publishers.

Tognini-Bonelli, E., 2001, Corpus Linguistics at Work. Amsterdam/Philadelphia: John Benjamins Publishing Company.

Ulrych, M., 2005, Training Translators: Programs, curricula, practices. In M. Tennent (ed.), Training for the New Millennium. Amsterdam and Philadelphia: John Benjamins.

Zanettin, F., 2002, DIY Corpora: The WWW and the Translator. In B. Maia, J. Haller and M. Ulrych (eds.), Training the Language Services Provider for the New Millennium. Porto: Faculdade de Letras, Universidade do Porto, pp. 239-248. [Online] [Accessed 11th May 2008], available from: http://www.federicozanettin.net/DIYcorpora.htm

Paper presented at the International Conference on Translation. Globalization through Translation: A Catalyst for Knowledge and Technological Excellence. Malaysia: 2011

[1] MeLLANGE (Multilingual eLearning in LANGuage Engineering) is a European Union funded project aimed at providing student and professional translators with an opportunity to update their translation-related skills in accordance with market demands. In line with this objective, a survey was carried out among translation students and professional translators in the UK, France, Germany, Italy and Spain to gain a better overview of the needs of translation students and professionals. The results of this comprehensive survey are available online at: http://www.iti.org.uk/uploadedFiles/surveys/Mellange_Survey_Updated%2805-06%29.pdf

[2] http://babelfish.yahoo.com/

[3] http://www.systransoft.com/

[4] http://www.freetranslation.com/

[5] http://translate.google.com/#

[6] http://www.webcorp.org.uk/

[7] http://webascorpus.org/searchwac.html