In May, OSCAR
unanimously passed motions to move two new
draft standards to the public review phase. In other
words, OSCAR has announced that these versions
of the standards are now complete enough to
use, and that the committee would like the public
to provide feedback prior to official adoption.
For years, OSCAR and other groups like the XLIFF
and TransWS
committees in OASIS have been working on creating
standards for all aspects of the globalization
process. The two new standards that have been
proposed by OSCAR fill some important holes in
the overall standards picture.
The first standard to have been
moved to draft status is one that regular readers
of the Globalization Insider will be familiar
with from last year’s article by Andrzej Zydroń:
GMX-V,
or GILT (Globalization, Internationalization,
Localization and Translation) Metrics
eXchange – Volume. Part one of a proposed
three-part standard, GMX-V provides a standard
way of determining various classes of word and
character counts for textual content. In addition,
GMX-V provides separate counts of the numbers
of in-line tags and other elements in a text that
can give some idea of the volume of work associated
with DTP and other collateral tasks in the localization
process. GMX-V also accounts for the output of
linguistic tools by providing
separate counts for TM matches, fuzzy matches
and translatable content not covered by leveraged
material.
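To make the idea of separate count classes concrete, here is a minimal Python sketch. It is not the GMX-V counting algorithm: the tokenization rules (whitespace splitting, angle-bracket inline tags, digit strings as numeric tokens) and the names VolumeCounts and count_volume are assumptions made purely for illustration.

```python
import re
from dataclasses import dataclass

# Assumed, simplified patterns; the GMX-V draft defines its own rules.
INLINE_TAG = re.compile(r"<[^<>]+>")
NUMERIC = re.compile(r"\d+(?:[.,]\d+)*")
PUNCTUATION = ".,;:!?()\"'"


@dataclass
class VolumeCounts:
    words: int = 0
    characters: int = 0
    inline_tags: int = 0
    numeric_tokens: int = 0


def count_volume(segment: str) -> VolumeCounts:
    """Return separate counts for words, characters, tags and numbers."""
    counts = VolumeCounts()
    counts.inline_tags = len(INLINE_TAG.findall(segment))
    # Remove tags so they are not counted as words.
    text = INLINE_TAG.sub(" ", segment)
    for token in text.split():
        core = token.strip(PUNCTUATION)
        if not core:
            continue
        if NUMERIC.fullmatch(core):
            counts.numeric_tokens += 1
        else:
            counts.words += 1
            counts.characters += len(core)
    return counts


if __name__ == "__main__":
    print(count_volume("Press <b>Start</b> and wait 30 seconds."))
```

Keeping each class in its own field is the point of the sketch: a job with few words but many inline tags shows up as such, instead of disappearing into a single total.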
GMX-V thus addresses the perennial
problem in the localization industry of determining
word counts, but it goes beyond this need. By
providing volume metrics for many different aspects
of the localization process, GMX-V can make pricing
simpler and more transparent. In addition, clients
and service providers can negotiate prices for
services that account for the actual nature of
particular jobs, rather than applying a blanket
price per word that covers all aspects of the
localization process in a rather indiscriminate
manner.
One danger in standardizing word
counts, or any aspect of volumetrics in localization,
is that service providers may resist the standard
because it directly impacts their pricing. Freelance
translators, in particular, tend to be very leery
of any attempt to redefine metrics in ways that
may reduce word counts and thus directly affect
their ability to make a living as translators.
So, what advantages does GMX-V
offer to localization service providers? One advantage
is that, by providing various classes of counts,
GMX-V will help service providers to better understand
a task in advance, enabling them to allocate resources
and charge appropriately. Every localization provider
has undoubtedly dealt with the occasional job
that had a low word count but which turned into
a money-losing nightmare because of some aspect
of the job that a simple word count obscured.
GMX-V counts, while not targeted at the complexity
of a localization task (complexity will be addressed
in a separate component of GMX called GMX-C),
do provide some rudimentary idea of complexity
through inline code counts and numeric counts.
The primary advantage of GMX-V
for clients is perhaps more obvious: GMX-V promotes
transparency of pricing and process. Because it eliminates
the uncertainty of tool-specific word counts,
GMX-V leaves clients freer to compare prices and better
able to understand the costs associated with their
projects. A number of years ago, this author ran
trial word counts on sample documents using both
translation tools and word processors commonly
used in the localization industry. Although the
results varied by text type, for some documents
the difference between the highest and the lowest
word counts was as much as 30%. This
indicates that price per word is a highly variable
measure that depends on what tool is doing the
counting. GMX-V counts, on the other hand, will
not vary by tool, so prices per GMX-V word will
have a fixed meaning.
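The tool-to-tool variation described above comes down to different definitions of a "word". The following Python sketch applies two plausible but invented counting rules to one sentence and computes the spread between them. Neither rule is taken from any real tool, and expressing the spread relative to the lowest count is an assumption; the sample sentence is deliberately packed with hyphens, slashes and apostrophes, so the gap is far larger than it would be on ordinary text.

```python
import re

SAMPLE = "The user's log-in name/password isn't case-sensitive."


def count_whitespace(text: str) -> int:
    """Rule A (assumed): every whitespace-delimited chunk is one word."""
    return len(text.split())


def count_alnum_runs(text: str) -> int:
    """Rule B (assumed): hyphens, slashes and apostrophes also split words."""
    return len(re.findall(r"[A-Za-z0-9]+", text))


def spread_percent(counts: list[int]) -> float:
    """Difference between highest and lowest count, relative to the lowest."""
    low, high = min(counts), max(counts)
    return (high - low) / low * 100


if __name__ == "__main__":
    a = count_whitespace(SAMPLE)
    b = count_alnum_runs(SAMPLE)
    print(a, b, round(spread_percent([a, b]), 1))
```

With a fixed price per word, the invoice for the same text scales directly with whichever rule the counting tool happens to apply.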
The second draft standard is TBX
Link. TBX Link is a very simple standard,
designed to allow XML documents to link terminology
to a termbase in TBX format. This standard arose
because OSCAR and LISA
Terminology SIG members identified
the lack of such a linking mechanism as one significant
limitation in current terminology solutions. It
is also a factor that may prevent some potential
users of TBX from adopting it. Essentially,
the problem is that terms in XML texts lose their
linkage to termbases, limiting the usefulness
of terminology markup in a localization environment.
TBX Link, on the other hand, provides a simple
XML namespace-based mechanism for linking to
TBX terminology repositories.
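As a rough illustration of namespace-based term linking in general, the Python sketch below reads term links out of a small XML document. Everything specific in it is hypothetical: the namespace URI, the term element, the ref attribute and the railway.tbx file name are placeholders invented for this example and are not the syntax defined by the TBX Link draft.

```python
import xml.etree.ElementTree as ET

# Placeholder namespace, NOT the one defined by TBX Link.
HYPOTHETICAL_NS = "urn:example:term-link"

SAMPLE = f"""<doc xmlns:tl="{HYPOTHETICAL_NS}">
  <p>Inspect the <tl:term ref="railway.tbx#c042">frog</tl:term>
  before the <tl:term ref="railway.tbx#c007">switch</tl:term>.</p>
</doc>"""


def extract_term_links(xml_text: str) -> list[tuple[str, str, str]]:
    """Return (term, termbase, entry id) for each linked term, in order."""
    root = ET.fromstring(xml_text)
    links = []
    for element in root.iter(f"{{{HYPOTHETICAL_NS}}}term"):
        # Hypothetical convention: "termbase-file#entry-id".
        termbase, _, entry_id = element.get("ref", "").partition("#")
        links.append((element.text or "", termbase, entry_id))
    return links


if __name__ == "__main__":
    for term, termbase, entry_id in extract_term_links(SAMPLE):
        print(term, termbase, entry_id)
```

Because the links live in their own namespace, a processor that does not understand them can ignore them without breaking the host document.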
Although TBX Link is a very simple
standard, its simplicity belies its power. When
terms can be linked to termbases, it is not only
localization that stands to benefit. Users of
search technology can benefit from using termbase
data to disambiguate search data in a way that
is currently impossible. For instance, if an English-speaking
user searches for the word frog, a number
of possible meanings may be returned. Among them
are various amphibians, a portion of the joint
between two railway tracks, and a piece of a violin
bow. If a document is marked up with TBX Link
that points to an authoritative termbase for
a specific subject, those occurrences of frog
with a specific meaning can be given priority
over other meanings in a search. Therefore, the
user looking for information on a railway track
frog will not have to deal with 100,000
irrelevant hits about amphibians to find what
s/he is looking for.
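The frog example can be sketched in a few lines of Python. This is an assumption-laden toy, not a description of any existing search engine: the Document class, the term_links mapping and the concept identifiers such as railway:frog are invented here to stand in for documents whose terms have been linked to a termbase.

```python
import re
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Document:
    title: str
    text: str
    # Hypothetical: surface term -> concept identifier from a termbase.
    term_links: dict[str, str] = field(default_factory=dict)


DOCS = [
    Document("Pond life", "The frog is an amphibian.",
             {"frog": "biology:frog"}),
    Document("Track maintenance", "Inspect the frog at each crossing.",
             {"frog": "railway:frog"}),
    Document("Bow repair", "Tighten the frog of the bow."),
]


def search(docs: list[Document], term: str,
           preferred_concept: Optional[str] = None) -> list[Document]:
    """Return documents containing the term, preferred concept first."""
    hits = [d for d in docs if term in re.findall(r"\w+", d.text.lower())]
    if preferred_concept is None:
        return hits
    # sorted() is stable, so ties keep their original order.
    return sorted(
        hits, key=lambda d: d.term_links.get(term) != preferred_concept)


if __name__ == "__main__":
    for doc in search(DOCS, "frog", "railway:frog"):
        print(doc.title)
```

Documents without a matching link are demoted rather than dropped, so unlinked content stays findable while linked content rises to the top.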
Obviously, such a future depends
on a lot of other things, like search providers
implementing support for TBX and termbases in
general, but TBX Link points the way to the future.
Imagine a world in which you could find information
from anywhere in the world, regardless of
the language it is in and whether or not you know
that language, because texts contain terminology
markup. With authoritative termbases and TBX
Link, such presently impossible feats will become
possible.
Finally, the OSCAR committee also
agreed to consider xml:tm,
donated by XML International, as a potential new
OSCAR standard. Consideration of xml:tm in the
OSCAR context has just begun, but this move by
OSCAR shows that it is strongly committed to the
future of translation and localization technologies.
OSCAR welcomes your feedback on
these proposed standards. To submit your feedback,
please visit http://www.lisa.org/oscar,
where you will find links to GMX-V
and TBX
Link, as well as information on other
standards.