However, although corpus analysis tools have been used extensively for research purposes, the systematic use of such tools as actual translation aids seems until now to have been rather neglected, at least in translator education in Finland. Electronic corpora also seem not to be widely used by practising translators, probably because these translators were not exposed to the potential of corpus analysis tools during their own education, and because ready-made special-field corpora are unavailable. Jääskeläinen and Mauranen (2004, p 53) therefore propose that courses on how to compile and use corpora should not only be integrated into undergraduate translator training but also be offered to practising translators as continuing education.
With this in mind, I began compiling a corpus of English-language tourism brochures in spring 2004. My aim was to use it to teach students how the competent use of electronic text corpora, in conjunction with corpus analysis tools, can help trainee and professional translators alike to become better language service providers by improving both the quality of their work and their productivity, particularly when translating special-field texts into a foreign language. (Many translators of non-literary texts in Finland regularly translate into their L2.)
3 The Tourism Corpus
There were a number of reasons for deciding to compile a target-language (TL) corpus of tourist brochures. Firstly, there is high demand in Finland for tourism texts to be translated from Finnish into English, not only for various kinds of brochures but also for websites. Secondly, I have extensive experience in this field myself, having done a large amount of language checking for professional translators as well as some translation of tourism texts from Finnish into English. Thirdly, many printed tourist brochures are also published in PDF format on their owners' websites, and are thus relatively easy to convert into the plain text format required by many corpus analysis tools. Last but certainly not least, students seem to be attracted to this field, perhaps because travel and tourism carry a certain glamour, and perhaps also because its concepts are relatively easy for non-experts to understand compared with those of many other special fields.
Nevertheless, translating tourist brochures can seem deceptively easy at first sight. Capturing the right style, conforming to the conventions of the target language and culture, and finding a consistent and logical strategy for translating the names of places, resorts and establishments, as well as culture-specific terms, are just a few of the difficulties facing the translator. In Finland, a further problem is that although some brochures are written with a foreign audience in mind, more often than not the source text is written first for a Finnish audience and then serves as the basis for the foreign-language versions. Its content is therefore not necessarily geared towards foreign readers: it frequently alludes, for example, to information that Finnish readers will understand implicitly but foreign readers will not.
The texts of the Tourism Corpus were mainly derived from tourist brochures published on the Internet in PDF format. Converting these into plain text format was often quite straightforward, but careful post-editing was usually needed, since headings, and in some cases even complete paragraphs, often switched positions during conversion. As a rule, the more sophisticated and attractive the brochure, the trickier it was to convert into plain text.
By September 2004, with the help of a student assistant, I had compiled a corpus of 670,000 words. There are various types of corpora and various ways of classifying them; the Tourism Corpus can be described as an untagged monolingual target-language corpus. It contains mainly brochure texts from the British Isles and North America, especially Canada. A major reason for including Canadian brochures was that they describe activities often featured in Finnish source texts (e.g. snowshoe treks, skiing, snowmobile trips and wilderness adventures) that are rarely mentioned in British brochures.
Each file name is labelled with one of the codes BI, CA or US, so that the user can immediately identify whether a concordance line comes from the British Isles, Canada or the United States, as illustrated in Figure 1.
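For readers who process the corpus files with their own scripts rather than with WordSmith Tools, the following minimal Python sketch shows how such region codes might be mapped back to region names. It assumes a hypothetical naming scheme in which the code is the part of the file name before the first underscore (e.g. CA_yukon_winter.txt); the actual file names in the Tourism Corpus may follow a different pattern.

```python
from pathlib import Path

# The three region codes used in the Tourism Corpus file names.
REGIONS = {"BI": "British Isles", "CA": "Canada", "US": "United States"}


def region_of(filename: str) -> str:
    """Return the region for a corpus file.

    Hypothetical naming scheme: the code is the part of the file name
    before the first underscore, e.g. "CA_yukon_winter.txt".
    """
    code = Path(filename).name.split("_", 1)[0].upper()
    if code not in REGIONS:
        raise ValueError(f"no region code in {filename!r}")
    return REGIONS[code]


print(region_of("corpus/CA_yukon_winter.txt"))  # Canada
```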
4 Exploiting the Tourism Corpus
During the 2004-2005 academic year, I integrated corpus exploitation into my translation courses. Students were taught to use the corpus analysis package WordSmith Tools (Scott, 2004), were introduced to various strategies for exploiting corpora when translating, and were given tourist brochure texts to translate from Finnish into English as assignments. The examples below illustrate ways in which students have been able to exploit the Tourism Corpus to improve the quality of their translations.
4.1 Collocation
The corpus has proved very useful for finding information about collocates, especially adjectives that collocate with nouns. For example, when students translate sentences containing the noun rapids, the key-word-in-context (KWIC) display provides a rich menu of adjectives to choose from, as illustrated in Figure 2.
Figure 2: Display of some of the concordance lines generated by WordSmith Tools for the search word rapids
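The KWIC format itself is simple to reproduce. The minimal Python sketch below illustrates the technique and does not describe how WordSmith Tools works internally: it finds whole-word, case-insensitive matches of a search word and right-aligns the left context so that every occurrence lines up in a central column. The function name kwic, the character-based context width and the sample sentences (invented, not taken from the corpus) are all assumptions made for illustration.

```python
import re


def kwic(text: str, node: str, width: int = 30) -> list[str]:
    """Return key-word-in-context (KWIC) lines for every whole-word,
    case-insensitive occurrence of `node` in `text`."""
    text = " ".join(text.split())  # collapse line breaks and runs of spaces
    pattern = re.compile(rf"\b{re.escape(node)}\b", re.IGNORECASE)
    lines = []
    for m in pattern.finditer(text):
        left = text[max(0, m.start() - width):m.start()].strip()
        right = text[m.end():m.end() + width].strip()
        # Right-align the left context so that every node word starts
        # in the same column, as in a concordancer's KWIC display.
        lines.append(f"{left:>{width}} {m.group(0)} {right}")
    return lines


# Invented sample text, not taken from the Tourism Corpus.
sample = ("Paddle past quiet lakes and shoot the thundering rapids. "
          "Beginners can try the gentle rapids near the lodge.")
for line in kwic(sample, "rapids", width=25):
    print(line)
```

Because the left context is padded to a fixed width, the adjectives preceding the node word appear in the same column, which is what makes a KWIC display easy to scan for collocates.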
Searching for collocates often leads to somewhat unexpected discoveries. For example, when one looks for translation equivalents of hoidettu or kunnostettu with reference to cross-country ski trails, traditional resources suggest conditioned, maintained, restored and reconditioned as possible translation candidates. However, of the 1000-plus concordance lines generated by the search word trails, none show any of these adjectives immediately to the left of the search word, while there are over 40 occurrences of the adjective groomed. Native speakers, especially North Americans, will probably be familiar with this term, but most novice translators, and even professional translators who have little experience of translating tourism texts, are usually not. A new concordance with groomed as the search word generates 128 hits and provides evidence of, for example, groomed bicycle and walking trails, groomed classic and skating trails, groomed cross-country ski trails, groomed fairways, groomed off-road trails, groomed runs, groomed slopes and groomed wilderness trails, as illustrated in Figure 3.
Figure 3: Display of some of the concordance lines generated by WordSmith Tools for the search word groomed
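The check described above, i.e. which candidate adjectives actually occur immediately to the left of trails, amounts to counting the words in position L1 of the search word. The following is a minimal Python sketch of that count. It assumes a crude lower-case tokenizer (WordSmith Tools tokenizes text in its own configurable way) and uses an invented sample text rather than real corpus data.

```python
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    """Crude lower-case word tokenizer; keeps hyphenated words whole."""
    return re.findall(r"[a-z]+(?:-[a-z]+)*", text.lower())


def left_neighbours(text: str, node: str) -> Counter:
    """Count the word immediately to the left (position L1) of each `node`."""
    tokens = tokenize(text)
    return Counter(tokens[i - 1] for i, tok in enumerate(tokens)
                   if tok == node and i > 0)


# Invented sample text, not taken from the Tourism Corpus.
sample = ("Over 50 km of groomed trails. Groomed trails and "
          "well-marked trails wind through the forest.")
counts = left_neighbours(sample, "trails")
candidates = ["conditioned", "maintained", "restored", "reconditioned", "groomed"]
print({word: counts[word] for word in candidates})
# {'conditioned': 0, 'maintained': 0, 'restored': 0, 'reconditioned': 0, 'groomed': 2}
```

In this invented sample, none of the dictionary candidates occurs in position L1, while groomed occurs twice.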
However, even
the seasoned concordance user may "miss"
the 40-plus occurrences of groomed when
scrolling through the 1000-plus hits for trails.
Therefore, when a search word generates a large
number of concordance lines, students are taught
to turn to the collocates display and the clusters
display. For example, Figure 4 shows the words
that occur most frequently within a span of five
words to the left of trails, while Figure
5 shows the most common 3-word clusters containing
trails. Each of these displays helps to
highlight the frequent co-occurrence of groomed
and trails.
Figure 4: Fifteen most frequent collocates occurring to the left of trails
Figure 5: Fifteen most frequent 3-word clusters containing trails
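The collocates and clusters displays can be approximated in a few lines of code. The following Python sketch is a simplified illustration, not WordSmith's actual algorithm. It counts every word within five positions to the left of the search word and every contiguous 3-word sequence containing it, and it ignores sentence boundaries as well as any stop-word or frequency thresholds. The helper names and the invented sample text are assumptions.

```python
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    """Crude lower-case word tokenizer; keeps hyphenated words whole."""
    return re.findall(r"[a-z]+(?:-[a-z]+)*", text.lower())


def left_collocates(tokens: list[str], node: str, span: int = 5) -> Counter:
    """Count every word within `span` positions to the left of `node`."""
    counts = Counter()
    for i, tok in enumerate(tokens):
        if tok == node:
            counts.update(tokens[max(0, i - span):i])
    return counts


def clusters(tokens: list[str], node: str, size: int = 3) -> Counter:
    """Count contiguous `size`-word sequences that contain `node`."""
    counts = Counter()
    for i in range(len(tokens) - size + 1):
        gram = tokens[i:i + size]
        if node in gram:
            counts[" ".join(gram)] += 1
    return counts


# Invented sample text, not taken from the Tourism Corpus.
tokens = tokenize("Explore groomed cross-country ski trails. "
                  "Snowshoe along groomed trails.")
print(left_collocates(tokens, "trails")["groomed"])  # 2
print(clusters(tokens, "trails").most_common(2))
```

In the phrase groomed cross-country ski trails, groomed is three words to the left of trails. An L1 count alone would miss it, whereas a five-word span captures it.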
4.2 Finding and choosing between terms
When translators need to decide on an equivalent for a specific term or phrase, the corpus has been of great help in verifying or rejecting choices based on other resources such as dictionaries and the Internet. One example is the Finnish term koiravaljakkoajelu. After hunting through traditional translation aids, student translators came up with the terms dog sled, dog sledge and dog sleigh, each of which is also often written with a hyphen or as one word. The corpus helps in deciding which of these alternatives to use. Figure 6 illustrates some of the concordance lines generated for the search pattern dog*. The original KWIC display contained 22 hits for dog sled, 27 hits for dogsled and 6 hits for dog-sled, with no hits at all for dog sledge or dog sleigh or any of their variants. Moreover, there were 68 hits for dogsledding, which is also often written as two words. The display also shows that adventure, excursion, ride, trip and tour are among the nouns that collocate with dog sled.
Figure 6: Display of some of the concordance lines generated by WordSmith Tools for the search word dog*
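A wildcard search such as dog* gathers all the spelling variants in one display, but they still have to be tallied by hand. The Python sketch below automates that tally for a two-part compound by counting open (dog sled), hyphenated (dog-sled) and closed (dogsled) spellings. It is a hypothetical helper, not a WordSmith Tools feature. It ignores plurals and longer derivatives such as dogsledding, and the sample text is invented.

```python
import re
from collections import Counter


def compound_variants(text: str, first: str, second: str) -> Counter:
    """Tally open ("dog sled"), hyphenated ("dog-sled") and closed
    ("dogsled") spellings of a two-part compound, case-insensitively.
    Longer words such as "dogsledding" are not counted."""
    regex = re.compile(
        rf"\b{re.escape(first)}(\s+|-)?{re.escape(second)}\b", re.IGNORECASE)
    counts = Counter()
    for m in regex.finditer(text):
        sep = m.group(1)
        if sep is None:
            counts[first + second] += 1
        elif sep == "-":
            counts[f"{first}-{second}"] += 1
        else:
            counts[f"{first} {second}"] += 1
    return counts


# Invented sample text, not taken from the Tourism Corpus.
sample = ("Book a dog sled tour or ride a dogsled across the lake. "
          "Dog sled trips, dog-sled races and dogsledding lessons run daily.")
print(compound_variants(sample, "dog", "sled"))
print(compound_variants(sample, "dog", "sledge"))  # Counter()
```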
4.3 Serendipity
Researchers such
as Bernardini (2000, 2001) and Varantola (2003)
have pointed out that corpora allow unpredictable,
incidental learning: the user may notice and explore
unknown or unfamiliar uses in a concordance and
go off at a tangent to follow them up. Bowker & Pearson (2002, pp 200-202) show how creative search techniques, for example concordancing with contextually relevant search words, can increase the likelihood of "accidentally" finding relevant information.
As shown earlier, a search of the Tourism Corpus for trails led to the serendipitous discovery of the adjective groomed. The KWIC display in Figure 6 provides further examples of the kind of previously "unknown" information that the translator might acquire when browsing through a KWIC display. This information may be relevant to the translation assignment at hand, or may come in handy for future assignments. Lines 1, 2 and 14 contain references to dog musher and dog mushing that may warrant further consideration. Lines 6, 17 and 21 refer to ice-fishing, while line 14 encourages the tourist to fish through a hole in the ice; these are two possible translations for the Finnish term pilkkiminen. Lines 10 and 11 mention ATV tours, lines 18 and 24 aurora viewing, line 21 snowshoeing and line 22 an illuminated skating loop. All of these may prompt further exploration, either by viewing the lines in fuller context or by entering new search patterns. For example, a search for ATV quickly reveals that this is a widely used abbreviation for all-terrain vehicle. This is a possible translation candidate for mönkijä, a Finnish term for which it is difficult to find an equivalent using traditional resources.
4.4 Language chunks
Gavioli & Zanettin (1997) point out that a corpus acts as a continual source of additional raw material, suggesting multi-word "chunks" that students can use to produce texts that sound more natural in the target language. In their view, achieving such "naturalness" is probably the greatest benefit of using TL corpora in translation, particularly into the foreign language, where naturalness is more difficult to achieve.
Finnish tourist brochures often refer to ruska-aika, the period in autumn when the leaves change colour, creating breathtakingly beautiful landscapes. The translator may decide that the concept of ruska contains implicit information that needs to be made explicit for a foreign audience, and that some sort of description is therefore necessary. Figure 7 shows some of the concordance lines produced by a search for autumn; words and phrases from these lines could be incorporated into the translator's own description.
Figure 7: Display of some of the concordance lines generated by WordSmith Tools for the search word autumn
A search for fall, the North American term for autumn, would also have turned up references to the fall foliage season, brilliant foliage in fall and stunning fall foliage.
5 Words of Warning
Some researchers, e.g. Ball (1997), have warned that the use of electronic text may tempt the analyst to seek only what is easy to find: you notice only what you get back, and you will not notice what you did not find. However, my experience of integrating corpus analysis into translation courses suggests that creative searching is likely to yield a wealth of discoveries, including answers to questions that the translator did not even think of asking in the first place.
There have also been some concerns that corpora
may reinforce the tendency of translated texts
towards "normalisation" (i.e. making
texts more standardised and conventional):