Getting More from Translation Memory
is reprinted with the kind permission of the Localisation
Resource Centre. It first appeared in the March 2004
issue of Localisation Focus.
Translation Memory (TM) technology and its benefits are widely known in the localization industry. TM technology is mainly used to reuse previously translated units, which saves time and reduces costs. Reusing text serves three objectives: improved consistency, shorter turnaround time and lower translation costs.
(Editor’s note: To learn more about TMX, and to help contribute to standards in the GILT industry, plan on attending the LISA Global Strategies Summit 2004 in San Francisco, where members of OSCAR will discuss the future of standards and how you can help ensure that they will meet your needs.)
A look at the latest Translation Memory eXchange (TMX) specification and the tags used in the TMX file format shows that a TM can deliver more than time and cost savings, and that its benefits can be extended in previously unexplored ways. Because the TMX file format is based on XML, it is very flexible. Keeping your TM in the XML-compliant TMX file format gives you more opportunities to make the most of the information stored in it and can speed up your Return on Investment (ROI). This article focuses on uses of the TMX file format that go beyond the standard exchange of translations between tools.
TM in TMX
The TMX standard and the specifications for the file format were developed by OSCAR, a special interest group of the Localization Industry Standards Association (LISA). The purpose of the TMX standard is to provide a standard file format that any translation memory tool can easily import or export. A TM stored in TMX file format can therefore be transferred easily and used in any tool that supports the TMX standard. In a TM, each unit and its related information is stored using various tags. The OSCAR team continuously works on improving the TMX standard and periodically revises and publishes the TMX specifications. TMX specification documents are available at http://www.lisa.org/tmx/tmx.htm.
More than just translations
Translation Memory technology, as its name suggests, is mainly used for translation: previous translations are leveraged in a current project to save time and money. However, a TM can serve other purposes too. If you open a TM saved in TMX file format, you will find other tags besides the source and translated units. These tags hold information such as the client ID, the project ID and the project type (documentation, software or Help). This extra information is stored in <prop> elements within the <tu> element. The <prop> element defines various properties of its parent element (or of the whole document when <prop> is used in the <header> element). The standard does not define these properties; their types and values are left to the tool or the user.
Figure 1. <prop> element giving more information about the translated text.
As Figure 1 shows, various user-defined property types have been used (their names start with the prefix “x-”). From these properties you can tell that this translation belongs to the Food Processing industry (see the <prop type="x-Domain"> tag) and refers to a product called Label Maker Pro (see the <prop type="x-Product"> tag) with the project ID labelmaker_v12 (see the <prop type="x-Project"> tag). You can also tell the file type of the localizable item (see the <prop type="x-Format"> tag). The “x-Terminology” property type specifies whether the item is included in the glossary.
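Since Figure 1 itself is not reproduced here, the following sketch shows what such a unit might look like and how a few lines of Python (standard library only) can read its properties. The x-Domain, x-Product and x-Project values come from the description above; the x-Format and x-Terminology values and the segment texts are hypothetical placeholders.

```python
import xml.etree.ElementTree as ET

# Hypothetical <tu> modelled on the description of Figure 1. The x-Format and
# x-Terminology values and the segment texts are placeholders.
TU_XML = """<tu>
  <prop type="x-Domain">Food Processing</prop>
  <prop type="x-Product">Label Maker Pro</prop>
  <prop type="x-Project">labelmaker_v12</prop>
  <prop type="x-Format">RC</prop>
  <prop type="x-Terminology">Yes</prop>
  <tuv xml:lang="en-US"><seg>Print label</seg></tuv>
  <tuv xml:lang="de-DE"><seg>Etikett drucken</seg></tuv>
</tu>"""

# ElementTree reports xml:lang under the reserved XML namespace URI.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"


def read_props(tu):
    """Return the <prop> children of a <tu> as a {type: value} dict."""
    return {p.get("type"): (p.text or "") for p in tu.findall("prop")}


def read_segments(tu):
    """Return the <seg> text of each <tuv>, keyed by its xml:lang value."""
    return {tuv.get(XML_LANG): tuv.findtext("seg", default="")
            for tuv in tu.findall("tuv")}


tu = ET.fromstring(TU_XML)
props = read_props(tu)
print(props["x-Domain"], "/", props["x-Project"])  # Food Processing / labelmaker_v12
print(read_segments(tu)["de-DE"])                   # Etikett drucken
```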
This additional information is useful when you want to know how a string was translated in a particular version of the product, for example to correct it based on the validator’s comments, or when you want to compare the translation of a string in one version of the product with its translation in another. Tagging a unit with the glossary property also lets you build a glossary by selecting only the glossary-tagged units. This can be done with XSL stylesheets or with a script.
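As one possible script, the Python sketch below builds a glossary by keeping only the units whose x-Terminology property marks them as glossary items. Using the value "Yes" as that marker is an assumption, as are the placeholder header values and segments; adapt them to the conventions your TM actually follows.

```python
import xml.etree.ElementTree as ET

XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"

# Hypothetical TMX content with placeholder header values. Marking glossary
# terms with x-Terminology = "Yes" is an assumed convention, not part of TMX.
TMX = """<tmx version="1.4">
  <header creationtool="example" creationtoolversion="1" segtype="sentence"
          o-tmf="example" adminlang="en-US" srclang="en-US" datatype="plaintext"/>
  <body>
    <tu>
      <prop type="x-Terminology">Yes</prop>
      <tuv xml:lang="en-US"><seg>Label</seg></tuv>
      <tuv xml:lang="de-DE"><seg>Etikett</seg></tuv>
    </tu>
    <tu>
      <prop type="x-Terminology">No</prop>
      <tuv xml:lang="en-US"><seg>The label was printed.</seg></tuv>
      <tuv xml:lang="de-DE"><seg>Das Etikett wurde gedruckt.</seg></tuv>
    </tu>
  </body>
</tmx>"""


def extract_glossary(root, src, tgt, flag="Yes"):
    """Return (source, target) pairs for units whose x-Terminology prop equals flag."""
    glossary = []
    for tu in root.iter("tu"):
        props = {p.get("type"): p.text for p in tu.findall("prop")}
        if props.get("x-Terminology") != flag:
            continue
        segs = {tuv.get(XML_LANG): tuv.findtext("seg", default="")
                for tuv in tu.findall("tuv")}
        # Skip units that lack either requested language.
        if src in segs and tgt in segs:
            glossary.append((segs[src], segs[tgt]))
    return glossary


root = ET.fromstring(TMX)
for term, translation in extract_glossary(root, "en-US", "de-DE"):
    print(term, translation, sep="\t")
```

The same loop could just as well filter on x-Project to compare how a string was translated in two product versions.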
Customized string changes
During User Interface (UI) testing, we are faced with UI-specific problems; strings that do not fit in the available space are a common example. In such cases you need to either abbreviate or rephrase the translation so that it fits in the screen space provided.
When reflecting these changes in the TM, you can use the <note> element inside <tu> (see Figure 2) to record why you abbreviated or rephrased the target string.
Figure 2. <note> element describing customized string changes.
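Figure 2 is not reproduced here, but the following Python sketch shows one way a script could attach such a note to a unit. In TMX 1.4, a <tu> lists its <note> and <prop> elements before its <tuv> elements, so the sketch inserts the note ahead of the first <tuv>. The segment texts and the note wording are hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical unit whose German target was shortened to fit a button.
TU_XML = """<tu>
  <tuv xml:lang="en-US"><seg>Print all selected labels</seg></tuv>
  <tuv xml:lang="de-DE"><seg>Ausgew. Etiketten drucken</seg></tuv>
</tu>"""


def add_note(tu, text):
    """Insert a <note> into a <tu> ahead of its first <tuv> and return it."""
    note = ET.Element("note")
    note.text = text
    children = list(tu)
    # Position of the first <tuv>; append at the end if the unit has none.
    pos = next((i for i, child in enumerate(children) if child.tag == "tuv"),
               len(children))
    tu.insert(pos, note)
    return note


tu = ET.fromstring(TU_XML)
add_note(tu, "Target abbreviated to fit the button width; do not expand.")
print(ET.tostring(tu, encoding="unicode"))
```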
Furthermore, if the TM software can display notes highlighted (for example, in red) after leveraging, these notes will be useful to translators. For example, when translating a future version, a translator who sees such a note in red will keep the abbreviated translation as it is instead of retranslating it. This avoids wasted effort on resizing dialog boxes and strings (see Figure 3).
Figure 3. Leveraged text, showing a highlighted note to the translator.
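How a particular TM tool implements this is not described here, but the idea behind Figure 3 can be sketched in Python: look up an exact source match and return the stored translation together with any notes, so that the interface can highlight them. This sketch handles exact matches only (real tools also offer fuzzy matches), prints a text marker in place of red highlighting, and uses hypothetical TM content.

```python
import xml.etree.ElementTree as ET

XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"

# Hypothetical TM (placeholder header values) with one annotated unit.
TMX = """<tmx version="1.4">
  <header creationtool="example" creationtoolversion="1" segtype="sentence"
          o-tmf="example" adminlang="en-US" srclang="en-US" datatype="plaintext"/>
  <body>
    <tu>
      <note>Target abbreviated to fit the button width; do not expand.</note>
      <tuv xml:lang="en-US"><seg>Print all selected labels</seg></tuv>
      <tuv xml:lang="de-DE"><seg>Ausgew. Etiketten drucken</seg></tuv>
    </tu>
  </body>
</tmx>"""


def build_index(root, src, tgt):
    """Map each source segment to (target segment, list of note texts)."""
    index = {}
    for tu in root.iter("tu"):
        segs = {tuv.get(XML_LANG): tuv.findtext("seg", default="")
                for tuv in tu.findall("tuv")}
        if src in segs and tgt in segs:
            notes = [n.text or "" for n in tu.findall("note")]
            index[segs[src]] = (segs[tgt], notes)
    return index


def leverage(index, source_text):
    """Return (translation, notes) for an exact match, or None if there is none."""
    return index.get(source_text)


index = build_index(ET.fromstring(TMX), "en-US", "de-DE")
match = leverage(index, "Print all selected labels")
if match:
    translation, notes = match
    print(translation)
    for note in notes:
        print("!! NOTE:", note)  # stand-in for showing the note in red
```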
Using XSL with TMX
With XSL, you can display selected parts of your TM. For example, you can show source and target units as a table in a browser, or show only the glossary items that are tagged with glossary property tags. This can be useful if you want to put localized material on your company’s intranet. The Opentag website offers collections of useful XSL templates. Using these templates, you can generate a tab-delimited file from a TMX file, view the translated units of a TMX file, and convert between older and more recent versions of TMX.
Using TM for FAQs and Knowledgebases
User documentation contains a lot of useful information such as Frequently Asked Questions (FAQs), quick installation steps, troubleshooting etc. After you have localized the user manual, your TM will have this information stored in different languages. You can use this TM (in TMX file format) as a searchable database, so that users (customers, business partners, translators, validators etc.) can search for necessary information via your website and can get the search results in their native language.
Through specially designed web interfaces, users can search by keywords, phrases, product names and so on. The search engine then searches the company’s master TM (stored in TMX file format) and displays the results in the user’s preferred language. Because a TMX file is a structured XML document, searches can be restricted to specific languages, elements or properties, which can make them faster and the results more precise. This can reduce costs in several areas. For example, translators with ready access to previous translations will raise fewer queries, which cuts the time spent on unnecessary communication and speeds up translation. The technical support team may receive fewer support calls, because users can find the information they need, in their native language, in a searchable knowledgebase on the Internet. Overall, this can reduce the time and money spent on different modes of communication.
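The article does not specify how such a search engine works. As a minimal sketch, the Python function below performs a case-insensitive substring search over the segments of one language in a TMX master TM and returns each hit paired with the segment in the user's preferred language. The FAQ texts are hypothetical, and a production system would add indexing and ranking on top of this.

```python
import xml.etree.ElementTree as ET

XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"

# Hypothetical FAQ content in a master TM (placeholder header values).
TMX = """<tmx version="1.4">
  <header creationtool="example" creationtoolversion="1" segtype="sentence"
          o-tmf="example" adminlang="en-US" srclang="en-US" datatype="plaintext"/>
  <body>
    <tu>
      <tuv xml:lang="en-US"><seg>How do I install the printer driver?</seg></tuv>
      <tuv xml:lang="de-DE"><seg>Wie installiere ich den Druckertreiber?</seg></tuv>
    </tu>
    <tu>
      <tuv xml:lang="en-US"><seg>Restart the application after installation.</seg></tuv>
      <tuv xml:lang="de-DE"><seg>Starten Sie die Anwendung nach der Installation neu.</seg></tuv>
    </tu>
  </body>
</tmx>"""


def search(root, keyword, query_lang, result_lang):
    """Return (matching segment, segment in result_lang) pairs for units whose
    query_lang segment contains keyword, ignoring case."""
    needle = keyword.casefold()
    hits = []
    for tu in root.iter("tu"):
        segs = {tuv.get(XML_LANG): tuv.findtext("seg", default="")
                for tuv in tu.findall("tuv")}
        text = segs.get(query_lang)
        if text and needle in text.casefold() and result_lang in segs:
            hits.append((text, segs[result_lang]))
    return hits


root = ET.fromstring(TMX)
# Search the English text, show the answers in German.
for found, localized in search(root, "install", "en-US", "de-DE"):
    print(localized)
```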
Supporting TMX Features
To make the most of your TM and of the uses described in this article, companies producing TM software should implement additional TM-related features. One such feature would be an export function that selects units by their tags, so that users can export, for example, only the units that carry particular property types (such as those shown in Figure 1).
This would be useful, for example, when localizing the user manual for localized software. A translator may not know how to handle software UI references in the manual, perhaps because he or she has no access to the localized software or to relevant reference material. Often the situation is handled by leaving the UI reference in English and adding a translation in brackets after it. If the TM units containing UI terms could be exported into a separate file, the translator would have them at hand, which would considerably reduce the work in subsequent phases.
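The TMX format already supports this kind of export; it only needs tool or script support. The Python function below sketches one: it copies the header and keeps only the units that carry a given property value. The property type x-UI with the value Yes, used here to mark units containing UI terms, is hypothetical, as is the sample content; substitute whatever property your TM actually uses.

```python
import copy
import xml.etree.ElementTree as ET

# Hypothetical TM (placeholder header values); x-UI marks units with UI terms.
TMX = """<tmx version="1.4">
  <header creationtool="example" creationtoolversion="1" segtype="sentence"
          o-tmf="example" adminlang="en-US" srclang="en-US" datatype="plaintext"/>
  <body>
    <tu>
      <prop type="x-UI">Yes</prop>
      <tuv xml:lang="en-US"><seg>Print Preview</seg></tuv>
      <tuv xml:lang="de-DE"><seg>Seitenansicht</seg></tuv>
    </tu>
    <tu>
      <tuv xml:lang="en-US"><seg>Labels can be printed in batches.</seg></tuv>
      <tuv xml:lang="de-DE"><seg>Etiketten lassen sich stapelweise drucken.</seg></tuv>
    </tu>
  </body>
</tmx>"""


def export_by_prop(root, prop_type, value):
    """Return a new <tmx> tree with the original header and only those <tu>
    elements that have a <prop type=prop_type> whose text equals value."""
    new_root = ET.Element("tmx", dict(root.attrib))
    header = root.find("header")
    if header is not None:
        new_root.append(copy.deepcopy(header))
    body = ET.SubElement(new_root, "body")
    for tu in root.iter("tu"):
        if any(p.get("type") == prop_type and p.text == value
               for p in tu.findall("prop")):
            body.append(copy.deepcopy(tu))
    return new_root


ui_tmx = export_by_prop(ET.fromstring(TMX), "x-UI", "Yes")
# To save the result as a separate file:
# ET.ElementTree(ui_tmx).write("ui_terms.tmx", encoding="utf-8", xml_declaration=True)
print(ET.tostring(ui_tmx, encoding="unicode"))
```

Because the output is itself a TMX document, the translator can open it in any TMX-aware tool alongside the main project.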
Furthermore, some software packages on the market let you edit the translated units in a TM. Heartsome’s TMX Editor is one such package; its graphical user interface makes the editing job more user-friendly. Such editors help improve the quality of a TM and make TM maintenance easier.
One way to improve the TMX standard would be to add new tags, elements and attributes. One of the main objectives behind TM technology is the reuse of text. Let’s hope that the suggestions made in this article will help to achieve this objective.
Shailendra Musale has worked in software localization for over ten years in Singapore and Finland. He periodically writes on various topics related to the localization industry. He currently works as a Globalization Engineering Project Manager (GEPM) at Veritas India. He can be reached at email@example.com
Reprinted by permission from the Globalization Insider,
11 May 2004, Volume XIII, Issue 2.2.
Copyright the Localization Industry Standards Association
(Globalization Insider: www.localization.org, LISA: www.lisa.org)
and S.M.P. Marketing Sarl (SMP) 2004