The Guide to Translation and Localization: Engineering and Computer-aided Tools
By Lingo Systems,
Portland, OR, U.S.A.
info [at] lingosys . com
www.lingosys.com

Chapter 6: Engineering and Computer-aided Tools
Roles of the Engineering Group
True localization is a multidisciplinary activity that
includes linguistics, formatting, engineering, and quality
assurance. Engineering is an integral component of this
service, and it is one of the offerings that distinguish
comprehensive localization from simple translation.
Localization
engineers are involved at every stage of the localization
process. Often they consult on internationalization matters
before the materials are even developed. Once source files
are created, the engineers' analysis of them provides vital
information for project planning and budget estimation.
Linguists rely on engineers to extract text strings from
source content and prepare marked-up files to facilitate
translation. They also manage the ensuing translation memory
and use tools such as Trados, Catalyst, and Multilizer to
improve consistency and lower costs. Prior to delivery,
engineers may perform the functional testing of the localized
products.
At Lingo
Systems, members of our engineering group closely interact
with the other production departments to provide further
support. For our formatting group, they import and export
text from desktop publishing applications. For our QA department,
they perform functional testing of technical projects such
as user interfaces, websites, and help systems. And they
are always available for a quick game of pool over lunch.
First Things First: Internationalization
Many companies develop their products with only a U.S. customer in mind.
When these domestic products are slated for distribution
to foreign markets, the process of localization often reveals
limitations in the product design. Internationalization
is the process of engineering a product so that it can be
localized for export to any country.
Often, internationalization is quite simple. For example,
some languages use more characters and take up more space
than others. A properly internationalized source file will
leave room for text expansion. Another common internationalization
step is to resize an 8 1/2" x 11" document to
European A4 paper size.
In addition
to considering overall design and layout, the internationalization
process focuses on, but is not limited to, the following
points:
|
Cedric Vezinet
Director of Engineering
After 10 years in this
industry, I still find every day as exciting as the
very first one on October 9th, 1996 when I started
my Lingo career as a French linguist. By looking at
my pictures in the previous versions of the guide,
one can tell that I have lost a lot of hair over the
years but I have not lost the motivation. |
1) Does the design account for cultural differences in
various metrics such as currency, units of measure, date
format, phone numbers, and addresses?
2) Are all the localizable strings isolated from variables
and other code for easy extraction?
3) Are unique strings re-used in different contexts throughout
the product?
4) Is the product free of embedded and concatenated strings?
5) Is the interface designed for dynamic layout so that it can
accommodate text expansion?
6) Do automated lists take into account any sorting
order differences in the target locale?
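Point 4 is easiest to appreciate in code. The following Python sketch is illustrative only: the message text, the TEMPLATES table, and both function names are hypothetical and not taken from any particular product. It contrasts a concatenated message, whose word order is frozen in the code, with a single complete template that a translator can reorder freely.

```python
# Hypothetical message templates, one complete sentence per locale.
# Named placeholders let the translator move the variables around.
TEMPLATES = {
    "en": "{count} files copied to {folder}",
    "de": "{count} Dateien wurden nach {folder} kopiert",
}


def concatenated_message(count: int, folder: str) -> str:
    """Anti-pattern: sentence fragments glued together in code."""
    return str(count) + " files copied to " + folder


def templated_message(locale: str, count: int, folder: str) -> str:
    """Preferred: one complete, translatable string per message."""
    return TEMPLATES[locale].format(count=count, folder=folder)


print(concatenated_message(3, "Backup"))
print(templated_message("de", 3, "Backup"))
```

In the German template the folder name lands in the middle of the sentence, something the concatenated version cannot express without a code change. The sketch deliberately ignores plural forms, which need separate handling.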
To avoid
internationalization surprises, involve your localization
provider during the product design stage so that localization
requirements can be taken into consideration during development.
If this is not done, your localization vendor will likely
have to perform some product internationalization prior
to beginning localization. This may not only compromise
timelines, but may also have an adverse effect on your budget.
Encoding: Pick Your Poison
A major question to address when you begin the internationalization
process is: can or will your application use some flavor
of Unicode as its encoding format? Before Unicode was invented,
there were dozens of different encoding systems. No single
one contained enough characters to represent every possible
language. For example, the European Union alone required
several different encodings just to cover its languages.
Even for a single language like English, no single encoding
was adequate for all the letters, punctuation, and technical
symbols in common use.
To add to the challenge, many of these encoding systems also conflict with
one another. That is, two encodings will use the same numeric
assignment for two different characters, or use different
numeric assignments for the same character. Computers (especially
servers) must be able to support many different encodings
- but it still may not be enough. Whenever data is passed
between different encodings or platforms, it runs the risk
of being corrupted.
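Both kinds of conflict can be reproduced in a few lines of Python. This is a small demonstration using codecs that ship with Python's standard library; the choice of byte and of character is ours, made purely for illustration.

```python
import unicodedata

raw = bytes([0xE9])  # a single byte with the numeric value 233

# Same numeric value, two different characters.
western = raw.decode("latin-1")    # ISO 8859-1 (Western European)
cyrillic = raw.decode("cp1251")    # Windows-1251 (Cyrillic)
print(unicodedata.name(western))
print(unicodedata.name(cyrillic))

# Same character, two different numeric values.
euro = "\u20ac"  # the euro sign
print(euro.encode("cp1252").hex())      # Windows-1252
print(euro.encode("iso8859_15").hex())  # ISO 8859-15
```

The single byte is an accented Latin "e" in one encoding and a Cyrillic letter in the other, while the euro sign is stored as 80 (hex) in Windows-1252 and as a4 (hex) in ISO 8859-15. A file passed between systems without its encoding label is therefore easy to corrupt.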
Unicode eliminates most of these problems. It is well established, works
on all platforms, and supports many more characters than
most of us have ever heard of or will ever use. Unicode
provides a unique number for every character, no matter
what the platform, no matter what the program, no matter
what the language. It also allows data to be transported
between many different systems without corruption. As the
standard has evolved, several Unicode encoding forms have come
into use: UTF-7, UTF-8, UTF-16, and UTF-32, with big-endian (BE)
and little-endian (LE) byte-order variants of the last two.
In general, UTF-8 is the most common on the Web. UTF-16,
UTF-16LE, and UTF-16BE are mostly used by Java and Windows.
UTF-32, UTF-32LE, and UTF-32BE are mostly used by
various UNIX systems. Fortunately, the conversions between
all of them are algorithmic and quick to implement.
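To show how mechanical these conversions are, here is a minimal Python sketch using only the standard codecs. The transcode helper is a name we made up for this example; it simply decodes to Unicode text and re-encodes.

```python
def transcode(data: bytes, source: str, target: str) -> bytes:
    """Convert bytes from one encoding to another via Unicode text."""
    return data.decode(source).encode(target)


text = "caf\u00e9"  # "cafe" with an acute accent on the e

for codec in ("utf-8", "utf-16-le", "utf-16-be", "utf-32-le"):
    data = text.encode(codec)
    # Every Unicode encoding form round-trips without loss.
    assert data.decode(codec) == text
    print(f"{codec:<10} {len(data):>2} bytes  {data.hex(' ')}")
```

The same four characters occupy 5, 8, 8, and 16 bytes respectively, which is one practical reason to settle on an encoding form early.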
Pick Early, Test Often
Many internationalization
issues can be identified early in the development process
by performing internationalization testing of the source
material. Machine translation (MT) technology is often used
for this purpose since it can generate pseudo-translated
content that has the look and characteristics of translated
material without a costly investment in translation. Machine
translation is based on advanced computational linguistic
analysis and, because it is fast and inexpensive, can generate
large volumes of translated content for testing purposes. Such testing
can help pinpoint issues in the localization project before
they become major headaches. For instance, a pseudo-translation
can identify variables in the software that should not be
translated, allowing you to isolate them prior to actual
linguistic work.
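For readers who want to try this without an MT engine, the Python sketch below does something simpler: plain character substitution rather than machine translation. It accents the vowels, pads the string to simulate text expansion, and leaves variables untouched. The placeholder syntax ({name}, %s, %d), the 30% expansion figure, and the function name are all assumptions made for this example.

```python
import re

# Assumption: variables look like {name}, %s, or %d. Adjust to your format.
PLACEHOLDER = re.compile(r"(\{[^{}]*\}|%[sd])")

# Map plain vowels to accented ones so untranslated text stands out.
ACCENTED = str.maketrans(
    "aeiouAEIOU",
    "\u00e1\u00e9\u00ed\u00f3\u00fa\u00c1\u00c9\u00cd\u00d3\u00da",
)


def pseudo_translate(source: str, expansion: float = 0.3) -> str:
    """Return a fake 'translation' that keeps variables intact."""
    parts = PLACEHOLDER.split(source)
    out = []
    for index, part in enumerate(parts):
        # With one capture group, odd indexes hold the placeholders.
        out.append(part if index % 2 else part.translate(ACCENTED))
    padding = "~" * int(len(source) * expansion)
    # Brackets make truncated strings easy to spot in the UI.
    return "[" + "".join(out) + padding + "]"
```

Calling pseudo_translate("Delete {file}?") accents the word "Delete", keeps {file} exactly as written, and appends four tildes. Any string that appears in the running product without brackets was never externalized, and any string missing its closing bracket was cut off by the layout.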
It is important to note that MT has many drawbacks when
it comes to actual translation (e.g., it requires the use
of constrained vocabulary or it may not convey complex or
abstract concepts), but it is a very valuable tool in the
internationalization phase.
Internationalization is not a service commonly offered by localization
companies as it requires highly skilled and specialized
personnel with a very strong understanding of the platforms
and development environments being used. Moreover, a well-executed
internationalization review will not necessarily
rid your files of all potential localization headaches -
but it will reduce them to a manageable level and avoid
the introduction of additional defects during the localization
process.
In general,
the difference between a successful project and one plagued
by problems is a direct function of the amount of interaction
between the client's and the vendor's engineering departments
in the early stages of the project. Internationalization
evaluation and testing is a very cost-effective way to ensure
that your product is ready for localization - especially
when measured against the delays and costs associated with
trying to resolve these issues during the localization process.
On Our Way: Localization Begins
Once all internationalization issues have been addressed, the localization
process can begin on a good foundation. For the engineering
group, this usually means preparing the source files for
translation. How this is done varies depending on the type
of materials. The four main categories are: documentation
localization, help localization, UI localization, and web
localization.
Page by Page: Documentation
Want to see us pull a rabbit out of our hat? Well, perhaps that's a stretch,
but this is where the magic starts. Imagine you have just
purchased the latest and greatest Widget. The first thing
you do is read the manual, right? (C'mon, work with us here.)
Now, imagine lifting all the English out of that manual,
crunching it all up and then carefully unfolding it to reveal
a brand new language. It's a bit of a strange notion, but
that's pretty much how document localization works. In the
simplest terms, documentation engineering is the process
of importing and exporting text from a desktop publishing
application.
|
Chris van Grunsven
Senior Localization Engineer
In
my many years here, I have become the "Keeper
of Useless Knowledge," like: How to count to
31 with one hand, 1,023 with two hands. How to convert
Word 6.0 RTFs to Word 2.0 RTFs, with a text editor
so they will work on Windows 3.11. Where the copies
of Swedish Windows 3.0, Japanese Word 6.0, and the
DOS 4.0 user's manuals are shelved. And, if you can't
find the holiday decorations, I know where they are,
too.
|
Since most translators work within Microsoft Word using Computer-Aided
Translation (CAT) software, the source material (which can
be in any medium) must be converted to an RTF file or TTX
file while preserving the formatting of the original document
in order for the linguist to be able to work with it. This
is done by using different tagged text formats (codes) to
isolate the formatting from the translatable text. By protecting
the formatting, the translators can then use their CAT tools
and focus exclusively on what needs to be translated without
being confused by formatting codes, which can be very numerous
(especially in the case of Quark documentation).
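The idea of isolating formatting from translatable text can be sketched in a few lines of Python. This is not how Trados or any DTP filter actually works internally; it is a simplified illustration that assumes formatting codes look like <...> and that the text itself never contains a literal {1}, {2}, and so on. The function names are ours.

```python
import re

# Assumption: formatting codes are written as <...> in the tagged export.
TAG = re.compile(r"<[^<>]+>")


def protect(tagged: str):
    """Swap each formatting code for a numbered marker; return text and codes."""
    codes = []

    def stash(match):
        codes.append(match.group(0))
        return "{%d}" % len(codes)

    return TAG.sub(stash, tagged), codes


def restore(translated: str, codes):
    """Put the original formatting codes back in place of the markers."""
    return re.sub(r"\{(\d+)\}", lambda m: codes[int(m.group(1)) - 1], translated)


text, codes = protect("Press <Bold>Start<Plain> now.")
print(text)
print(restore("Appuyez sur {1}Start{2} maintenant.", codes))
```

The translator only ever sees the short numbered markers and can move them within the sentence as the target language requires, while the original codes are restored untouched afterwards.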
The vast majority of documentation is developed using Adobe InDesign, Adobe
FrameMaker, QuarkXPress, and Adobe PageMaker. As you can
see, Adobe Systems Inc. has quite a few different writing
tools, but as time goes by, they seem to be moving toward
one versatile application that will address all documentation
development needs. In January 2004, Adobe began transitioning
its PageMaker users to InDesign. If InDesign continues to
gain popularity in the technical writing community, it will
make the localization process much easier. InDesign is a
terrific application for localization. It offers full Unicode
support and is well-suited for cross-platform work. It also
allows for XML integration with content management systems.
That said, we must admit that there are quite a few deficiencies
in the INX (InDesign Exchange) and Tagged Text formats, which
make the localization process a little bit tricky. Fortunately,
we always have a bit of engineering magic up our sleeves
to solve the problem.
Regardless of the application used to develop the materials, when the RTF
or TTX files come back from translation they head straight
to Engineering. With a wave of a wand and some feverish
keyboard tapping, engineers pour the localized text back
into the source documents and hand them off to the DTP department
where they are polished to perfection.
Stop the Presses: Help File Localization
As a means
of disseminating information, print documentation is quickly
losing ground to interactive help systems. Well-structured
online help provides users with incredible search capability,
allowing them to find more information in less time than
with conventional print documentation. Many help users say
this leads to a richer experience. We could not agree more.
Help systems are not only getting bigger, they are getting smarter. Perhaps
most importantly, however, they are becoming easier and
less costly (if not downright cheap) to create. Single-source
publishing tools such as AuthorIT, ArborText, Web Works,
or RoboHelp are now able to import previously generated
Word or FrameMaker documents and then leverage them to create
interactive help systems. Note that localization-savvy
applications such as AuthorIT, which offer built-in
localization support, make the translation process a walk
in the park. As more companies discover these benefits,
this trend will only accelerate. The main help formats we
see being used are WinHelp, HTML Help, WebHelp, JavaHelp,
Oracle Help, and the relatively new FlashHelp. Even though
all these formats have their own specific uses, when it
comes to localizing help systems, the approach is similar.
Interface This: Software Localization
An engineering group really shines during the localization of software.
We take on all comers: any flavor of Windows, Mac OS, UNIX,
Linux, Palm OS, Symbian, mainframe, and Java-based applications.
And we will take any variety: web-based, server-based, or
client-based.
For some
programming language and platform combinations, software
localization requires a process not unlike the one used
for documentation. The localization engineer extracts the
text from the application and then creates a tagged RTF
or TTX file for the translator that protects the underlying
codes. When it comes to protecting the codes, TTX is by
far the better choice. In other cases, the localization
engineer uses a proprietary tool or off-the-shelf application
like Catalyst or Multilizer that allows the translator to
work directly on compiled files and executables. All things
being equal, however, it is more common and easier to work
in the resource (RC) files or properties files to minimize
the amount of preparation work and reduce the potential
for defects being introduced during the localization process.
Whatever
method is used, one thing is sure: the continuing evolution
of Unicode technology and the greater understanding of the
needs of the international market have made localization
engineering much easier. The latest OS editions from Microsoft
and Apple are perfect illustrations.
The combination
of Windows XP and Office 2003 is a must-have when dealing
with multiple languages in your day-to-day operations. It
is now possible to easily generate text files in many encodings
for the most widely used languages on any Western operating
system. The manipulation of Eastern languages, double-byte,
and even right-to-left languages has been made much easier,
too. We previously had to navigate from one native operating
system to the next just to manipulate localized files. Much
of this tussle has now disappeared and native operating
systems are only used for online functional testing of the
final localized product.
Also widely
used and indispensable is Apple's Mac OS X, especially
its latest "Tiger" release. It is not only a great
system for localization but, in our opinion, the most localization-friendly
operating system on the market. With just a simple drag
of the mouse, users are able to switch the UI and/or the
system's language!
No matter
what the platform, the best way to make your UI localization-friendly
is to externalize all localizable strings (similar to Java's
properties file). Whenever possible, design your UI so that
most of the strings are located in well-formatted files
where each variable (key) is followed by its string, and so that
the interface is dynamically laid out. Another important rule is
to avoid string concatenation.
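As a rough illustration of externalized strings, the Python sketch below reads a properties-style resource in which each key is followed by its string. It handles only a small subset of the real Java properties format (no line continuations, no ":" separators, no escape sequences), and the keys, sample text, and function names are hypothetical.

```python
def parse_properties(text: str) -> dict:
    """Parse simple key = value lines; '#' and '!' start comment lines."""
    strings = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line[0] in "#!":
            continue
        key, _, value = line.partition("=")
        strings[key.strip()] = value.strip()
    return strings


def message(catalog: dict, key: str, **values) -> str:
    """Look up a whole sentence by key and fill in its variables."""
    return catalog[key].format(**values)


# Hypothetical resource files, one per language, with identical keys.
EN = """
# Save dialog
dialog.save.title = Save changes?
dialog.save.body = {count} files have unsaved changes.
"""
FR = """
# Save dialog
dialog.save.title = Enregistrer ?
dialog.save.body = {count} fichiers contiennent des modifications.
"""

for resource in (EN, FR):
    print(message(parse_properties(resource), "dialog.save.body", count=2))
```

Because the code refers to strings only by key, adding a language means adding a file, not touching the program. Each value is a complete sentence with a named variable, so no concatenation is needed.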
Going World Wide: Web Localization
|
Mike van Grunsven
Senior Localization Engineer
Unlike
my brother who is the "Keeper of Useless Knowledge"
here at Lingo, I like to think my knowledge is useful.
Yellow and blue make Grun, 2 + 2 = H, etc.
|
User interfaces are increasingly web-based
because they are easier to maintain and offer more support
than client-based applications. In most cases, both web-based
applications and commercial websites have a database such
as Oracle, SQL Server, MySQL, or Access as the back end.
Fortunately, no matter what the type of database, the same
process and tools (e.g., Multilizer) are used for localization.
From an engineering perspective, the most important step
in localizing a database is to use well-defined spec sheets
listing the tables and the fields requiring localization.
It also helps to have the database designed in such a way
as to facilitate either field or table localization. From
there, the only other hurdle could be string length limitations,
but these are easily managed with tools such as Multilizer.
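A length check of this kind is simple to picture. The Python sketch below is not Multilizer's method; it is a stand-alone illustration in which the spec sheet, the table and field names, and the limits are all invented. It flags any translation that would not fit its database field.

```python
# Hypothetical spec sheet entries: (table, field) -> maximum characters.
FIELD_LIMITS = {
    ("products", "name"): 20,
    ("products", "tagline"): 40,
}


def too_long(translations: dict, limits: dict) -> list:
    """Return (table, field, text) for each translation exceeding its limit."""
    return [
        (table, field, text)
        for (table, field), text in translations.items()
        if len(text) > limits[(table, field)]
    ]


german = {
    ("products", "name"): "Benutzerkontoeinstellungen",
    ("products", "tagline"): "Alles an einem Ort",
}

for table, field, text in too_long(german, FIELD_LIMITS):
    print(f"{table}.{field}: {len(text)} characters, limit {FIELD_LIMITS[(table, field)]}")
```

Note that len() counts characters, whereas some databases limit fields by bytes, so a real check should follow whatever rule the target database applies.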
As with many things in life, however, what is good for the goose
may not be as good for the gander. Websites built with dynamic
content are usually very localization friendly for engineers.
In most cases, it is relatively easy for us to extract the
text strings from the underlying database. Unfortunately,
once the text has been extracted, it is not so friendly
for the linguists who translate the strings.
Rather than
working with a complete document, all the translator sees
are random, out-of-context strings - a difficult challenge
for even the most skilled professional. It is therefore
a good idea to use a description field in your database
to give some guidance to the linguists. With proper instructions,
the engineer will be able to include non-translatable fields
in the translation packages that are provided to the translator.
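Here is one way such a description field might travel with the strings, sketched in Python with an in-memory SQLite database. The table name, column names, and spec format are hypothetical; the point is only that each extracted string carries its context note into the translation package.

```python
import sqlite3

# Hypothetical spec sheet: table -> (key column, text column, context column).
# Identifiers come from this trusted spec, never from user input.
SPEC = {"ui_labels": ("id", "label", "translator_note")}


def extract_strings(conn, spec):
    """Collect localizable text plus its context note for the translator."""
    package = []
    for table, (key_col, text_col, note_col) in spec.items():
        query = (
            f"SELECT {key_col}, {text_col}, {note_col} "
            f"FROM {table} ORDER BY {key_col}"
        )
        for key, text, note in conn.execute(query):
            package.append({"id": f"{table}.{key}", "source": text, "context": note})
    return package


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE ui_labels (id INTEGER PRIMARY KEY, label TEXT, translator_note TEXT)"
)
conn.executemany(
    "INSERT INTO ui_labels VALUES (?, ?, ?)",
    [
        (1, "Home", "Navigation link to the start page"),
        (2, "Post", "Verb: button that publishes a comment"),
    ],
)

package = extract_strings(conn, SPEC)
for entry in package:
    print(entry["id"], "|", entry["source"], "|", entry["context"])
```

A one-word string such as "Post" is ambiguous on its own; the note tells the linguist it is a verb on a button, which is often all the context needed.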
The most compelling advantages of a database-backed website are the downstream
benefits. Updates (including localization maintenance) become
very easy and very cheap. As changes are made to the site,
the new and modified strings are extracted, translated,
and then reinserted. In many cases, localization delivery
can even be automated using a translation portal such as
Lingo Systems' "LingoNet."
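The extract-translate-reinsert cycle can be reduced to two small functions. This Python sketch says nothing about how LingoNet works; it assumes, purely for illustration, that each string has a stable key and that a snapshot of the previously translated source text is kept.

```python
def changed_strings(previous: dict, current: dict) -> dict:
    """Return strings that are new or whose source text has changed."""
    return {key: text for key, text in current.items() if previous.get(key) != text}


def reinsert(localized: dict, new_translations: dict) -> dict:
    """Merge freshly translated strings back into the localized content."""
    merged = dict(localized)
    merged.update(new_translations)
    return merged


# Hypothetical snapshots of the source site before and after an update.
last_release = {"home.title": "Welcome", "home.cta": "Buy now"}
this_release = {"home.title": "Welcome", "home.cta": "Order now", "home.footer": "Contact us"}

to_translate = changed_strings(last_release, this_release)
print(sorted(to_translate))
```

Only the modified call-to-action and the new footer go out for translation; the unchanged title is never sent, which is where the savings come from.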
There can be other challenges to localizing a website besides the database
component. For example, using multiple programming languages
can create parsing difficulty when generating RTF files
for the linguists. The programming languages most commonly
found on websites are PHP, JSP, Perl, ASP, ColdFusion,
and JavaScript.
Last, but
far from least, working on the graphical assets of a website
can be difficult when source materials (Photoshop, Illustrator,
or CorelDraw files) are not available. Given the omnipresent
gradient backgrounds and obscure fonts in web graphics, nothing is worse
than being asked to recreate localized versions of these
elements. This invariably requires design expertise from
(and budget for) our DTP department.
Repeat after us: It is always a good idea to keep the
source files in a safe place and to isolate localizable
layers. This useful feature is offered by most, if not all,
image and graphic editing software such as Photoshop, Fireworks,
and Illustrator.
Talking the Talk: Terminology Management and Translation Memory
The last function that a localization engineer performs
may be the most important. Terminology management, including
the creation and maintenance of translation memories (TMs),
has a huge effect on both quality and consistency. It may
also be the single most important factor in reducing localization
costs.
TMs are a must-have for any localization project. Some localization firms
assign the task of terminology control to the project manager.
At Lingo Systems, we believe in using the right person for
the job and have no doubt that when it comes to managing
hundreds of multilingual translation memories this person
is the localization engineer. An inaccurate or corrupted
TM (whether it is a linguistic corruption or an encoding
corruption) can reduce leveraging, adding to the cost of
the project and ultimately hurting the quality of the translation.
There are several players in the CAT tool market. Since the acquisition
of Trados by SDL on June 20, 2005, the largest share belongs
to SDL TRADOS. A few other smaller, more specialized players
like DejaVu and TermStar are also worth mentioning. The
principle behind each of these products is the same: the
translator uses the tool interactively within a word processor
to automatically retrieve existing translations from a database
(Translation Memory). For localization engineers, it does
not really matter which tool is used since most, if not
all, of them are TMX compliant, meaning that the TM content
can be exchanged between CAT tools through an XML-based
export file (TMX file). All of them also offer fuzzy matching,
which gives the translator close matches to a localizable
sentence thereby speeding up the translation.
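Fuzzy matching can be approximated with Python's standard difflib module. Commercial CAT tools use their own scoring methods, so treat this as a stand-in under stated assumptions: the 75% threshold, the sample memory, and the function name are ours.

```python
from difflib import SequenceMatcher

# Hypothetical translation memory: source segment -> stored translation.
MEMORY = {
    "Click the Save button.": "Cliquez sur le bouton Enregistrer.",
    "Close the window.": "Fermez la fen\u00eatre.",
}


def best_match(segment: str, memory: dict, threshold: float = 0.75):
    """Return (score, source, target) for the closest stored segment, or None."""
    best = None
    for source, target in memory.items():
        score = SequenceMatcher(None, segment.lower(), source.lower()).ratio()
        if score >= threshold and (best is None or score > best[0]):
            best = (score, source, target)
    return best


hit = best_match("Click the Cancel button.", MEMORY)
if hit is not None:
    print(f"{hit[0]:.0%} match with: {hit[1]}")
```

The new sentence differs from a stored one by a single word, so the translator is offered the existing translation to edit rather than starting from scratch. An identical sentence scores 100%, and a sentence with nothing in common returns no match at all.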
Another linguistic tool that is often integral to the development and maintenance
of an effective TM is glossary management. In the world
of localization, glossaries represent a list of key terms
and definitions that the translator will need to properly
localize the source materials. Many of the TM tools include
a glossary management module to facilitate the compilation
of a glossary and the subsequent translation of the key
terms whenever (and wherever) they appear. These modules,
such as MultiTerm from SDL TRADOS, run in the background
as the translation is being done in a word processing application.
They then flag for the linguist any term that is located
in the MultiTerm glossary, minimizing the time a linguist
typically needs to go back and forth between reference materials
and applications.
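The flagging behavior is easy to imitate in miniature. The Python sketch below is not MultiTerm's API; it is a simplified illustration with an invented two-entry glossary that reports every glossary term found in a segment, together with its approved translation.

```python
import re

# Hypothetical English-to-French glossary of approved terms.
GLOSSARY = {
    "translation memory": "m\u00e9moire de traduction",
    "fuzzy match": "correspondance partielle",
}


def flag_terms(segment: str, glossary: dict) -> list:
    """Return (term, approved translation) pairs that occur in the segment."""
    hits = []
    for term, target in glossary.items():
        # Whole-word, case-insensitive search for the source term.
        if re.search(r"\b" + re.escape(term) + r"\b", segment, re.IGNORECASE):
            hits.append((term, target))
    return hits


found = flag_terms("Update the Translation Memory first.", GLOSSARY)
print([term for term, _ in found])
```

A real terminology tool also has to cope with inflected forms and plurals, which this simple whole-word search does not attempt.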
The newest glossary management tools are so customizable that they even
allow the user to add multimedia content to the term definition.
The possibilities are infinite. The next generation of translation
tools even allows localization vendors to share their
glossaries as well as their translation memories over the
web (SDL Trados TM Server and MultiTerm Online are good
examples), which greatly facilitates the interaction between
the localization company, the client, the linguists, and
the in-country reviewer.
Wrapping Up
Let's speak plainly. Localization engineering is not
rocket science, but in our estimation, it comes close. As
you prepare for a localization project, be sure to leave
a seat at the table for an engineer. From the initial internationalization
planning, through actual translation and implementation
stages, and ongoing translation memory development
and maintenance, an engineer will be directly involved.
We may be biased, but we believe that a top-notch engineering
department can anticipate the potential issues you may face
well ahead of time: from nagging technical oddities to esoteric
cultural differences. So, as you plan your next project,
keep us engineers in mind. The extra time you invest up
front will pay off in terms of reduced timelines and cost
savings in the long run.