The Guide to Translation and Localization: Writing and Displaying Asian Characters
By Lingo Systems,
Portland, OR, U.S.A.
info [at] lingosys . com
www.lingosys.com

Chapter 12: Writing and Displaying Asian Characters
Localizing into Asian languages can present unique challenges.
Asian character sets often contain many more characters
than Western alphabets; some sets number in the thousands
of characters. How does your computer know how to display
all of these characters? And how does anyone get them typed
in?
|
Ting
Fan
Systems Administrator
Lingo Systems is a
great place to work. Everyone here is friendly and
willing to help others. Sometimes, we need to stay
up late and work through the weekend. But we work
as a team and get the job done. With my family being
a few thousand miles away in China, Lingo feels like
a second family to me. |
In a typical Western character encoding, each character
is represented by a single byte of information (8 bits),
which allows a total of 256 possible characters. This
is not nearly enough for many Asian character sets (particularly
Chinese, Japanese, and Korean), so Asian characters are
encoded as "double-byte"; that is, each Asian
character is made up of 2 bytes (16 bits) of information.
Two bytes can represent up to 65,536 distinct values.
This resolves the issue of displaying characters, but it
can cause other problems for you if you are developing software
that needs to support Asian characters (see "Localizing
Asian Software" below).
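The byte counts described above can be checked with a short Python sketch. It assumes only the legacy codecs that ship with Python's standard library (latin-1, gb2312, big5, shift_jis, euc_kr); the sample characters are arbitrary choices for illustration.

```python
# Compare how many bytes one character occupies in a single-byte
# Western encoding and in several double-byte Asian encodings.
samples = [
    ("A", "latin-1"),     # Western single-byte code page
    ("中", "gb2312"),      # Simplified Chinese
    ("中", "big5"),        # Traditional Chinese
    ("日", "shift_jis"),   # Japanese
    ("한", "euc_kr"),      # Korean
]

byte_lengths = {}
for char, codec in samples:
    encoded = char.encode(codec)
    byte_lengths[(char, codec)] = len(encoded)
    print(f"{char!r} in {codec}: {len(encoded)} byte(s), hex {encoded.hex()}")

# Upper bounds on the number of distinct values each size can hold.
single_byte_values = 2 ** 8    # 256
double_byte_values = 2 ** 16   # 65,536
print(single_byte_values, double_byte_values)
```

The 65,536 figure is a ceiling on what two bytes can hold; real double-byte character sets define fewer characters than that.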
As for typing Asian characters - you are not likely to
find a computer keyboard that contains individual keys for
every Chinese or Japanese character. A Chinese keyboard
would have to contain over 10,000 keys! Fortunately, some
clever methods have been devised to use the standard keyboard.
The discussion below is specific to Chinese, but the general
concepts are also true of Japanese and Korean.
Entering Chinese Characters
In order to enter Chinese characters into a computer,
you need an operating system that supports Chinese character
input methods. This could be a native Traditional Chinese
or Simplified Chinese operating system, or some other operating
system that has either built-in support or third-party software
installed for Chinese character input. Once you have the
right software, there are three general methods of entering
Chinese characters into a computer: typing, handwriting,
and speaking.
Typing Chinese characters involves breaking down each
ideogram (or character) into a series of alphanumeric characters
using a set of defined rules. These rules allow you to create
the characters with a standard keyboard. Such a set of rules
is called an input method. Mastering the rules of any input
method involves a steep learning curve, but input methods are
still the fastest and most effective means of entering Chinese
characters using today's technologies.
Numerous input methods have been developed since Chinese
computing was first introduced. Two of the most popular
methods used for Traditional Chinese are Zhu Yin and Chang
Jie. The most popular input method for Simplified Chinese
is Pin Yin. The Zhu Yin and Pin Yin methods break down a
Chinese character by how it sounds, representing those sounds
with keys on the alphanumeric keyboard. The Chang Jie method
breaks down a Chinese character using the character's shape.
Zhu Yin is based on the pronunciation system of 37 sounds
and 5 tones that are used in Taiwan. This pronunciation
system is familiar to most Taiwanese school children. Chinese
characters can be "spelled" with this system.
Native keyboards come with the 37 sounds printed on them,
so that native speakers can type in Traditional characters.
For example, if you want to type the word "Chinese"
on an English keyboard, you would type in "5j/"
and "jp6." If you were using a native keyboard,
the Zhu Yin pronunciation symbols would be indicated on
the keys. Together, these symbols form the two-character
pair that means "Chinese" (中文).
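The keystroke example above can be sketched as a two-step lookup: keys to Zhu Yin symbols, then symbols to a character. This is only an illustration. The key table lists just the five keys needed here, and the candidate table, the constant name, and the function names are hypothetical. A real input method maps each syllable to many homophones and lets the user choose among them.

```python
RISING_TONE = "\u02ca"  # the second-tone mark typed with the "6" key

# Partial key-to-symbol table for the standard Zhu Yin keyboard layout.
KEY_TO_ZHUYIN = {
    "5": "ㄓ",
    "j": "ㄨ",
    "/": "ㄥ",
    "p": "ㄣ",
    "6": RISING_TONE,
}

# Hypothetical, tiny candidate table (one candidate per syllable).
CANDIDATES = {
    "ㄓㄨㄥ": ["中"],
    "ㄨㄣ" + RISING_TONE: ["文"],
}


def keys_to_zhuyin(keys: str) -> str:
    """Translate Latin keystrokes into Zhu Yin symbols."""
    return "".join(KEY_TO_ZHUYIN[k] for k in keys)


def first_candidate(keys: str) -> str:
    """Return the first candidate character for a keystroke sequence."""
    return CANDIDATES[keys_to_zhuyin(keys)][0]


word = first_candidate("5j/") + first_candidate("jp6")
print(keys_to_zhuyin("5j/"), keys_to_zhuyin("jp6"), word)
```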
Pin Yin is very similar to Zhu Yin, except that the sounds
are spelled with Latin letters and the tone of the character
is not considered as part of the input method. Only the
component sounds are typed.
Chang Jie, in contrast, is shape-based. It uses 24 familiar
characters, each of which stands for a set of related shapes.
All but a few Chinese characters can be readily broken up
into pieces within this relatively small set of shapes.
From this sequence of shapes, no more than five are
selected, by regular rules, to form the code for typing
the character. For example, the character
is broken into
and .
To follow the Zhu Yin example, you would type the word "Chinese"
in Chang Jie by typing "L" for 中
and "YK" for 文.
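A shape-based lookup can be sketched in the same way. The table below lists only the three Chang Jie keys used in this example, and the two-entry code table and the function name are hypothetical stand-ins for the full tables a real input method would carry.

```python
# Key letters and the basic shapes they stand for in Chang Jie
# (only the three letters used in this example).
KEY_TO_SHAPE = {"L": "中", "Y": "卜", "K": "大"}

# Hypothetical two-entry code table; real tables cover thousands of characters.
CODE_TO_CHAR = {"L": "中", "YK": "文"}

MAX_CODE_LENGTH = 5  # a code uses no more than five shapes


def describe(code: str) -> str:
    """Show the shapes behind a Chang Jie code and the character it selects."""
    if len(code) > MAX_CODE_LENGTH:
        raise ValueError("a Chang Jie code has at most five keys")
    shapes = "".join(KEY_TO_SHAPE[k] for k in code)
    return f"{code} -> shapes {shapes} -> character {CODE_TO_CHAR[code]}"


for code in ("L", "YK"):
    print(describe(code))
```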
All of these input methods require the support of your
computer operating system. Originally, because Asian characters
are double-byte, users had to either use a native operating
system or purchase a third-party software bridge for an
English system. Today, as a result of Unicode technology,
most input methods are supported directly by Windows XP
and Mac OS X even on the English version of the operating
system. Moreover, since applications such as Microsoft Internet
Explorer and Mozilla Firefox support double-byte characters,
it is usually easy to write and display Asian languages
using Western hardware and software.
|
Peter
Kavanagh
DTP Specialist
If I win the lottery,
I know exactly what I will do. I will go to Japan
and buy a big fishing boat to catch a Japanese giant
crab. They are one of the largest arthropods known
to man, up to 12 ft long and 40 lbs. Giant crabs are
very difficult to catch because they live in deep
parts of the ocean (up to 1,000 ft). If I need a deck
hand, I may hire my supervisor, Roger. |
Writing Chinese characters on a computer is now
also possible, thanks to technological improvements. Various
companies have developed Chinese writing pads that connect
directly to your computer. Users can write directly on the
pad and the software recognizes the handwritten characters
and displays them as the appropriate type-written characters
on the screen.
For example, at libraries in Hong Kong, a Chinese writing
tablet is connected to each computer terminal so that anyone
who is not familiar with a standard input method can nevertheless
write in Chinese to perform a search in the library database.
Finally, speaking can also be used to enter Chinese
characters. This method relies on recent advances in speech
recognition technology. Users speak directly into a microphone
connected to a computer. The software recognizes the phonetics
of each word and displays the appropriate characters.
These writing and speaking methods enable users to enter
Chinese characters without requiring mastery of the complex
rules for standard input methods. Previously, speech and
handwriting recognition were available only through third-party
software, but these methods are now supported by the newest
versions of Microsoft Office.
They are not without drawbacks, however, as the interpretation
of written or spoken characters is far from perfect. The
user is generally required to "teach" the software
how to recognize his/her style of writing or speaking. Also,
these methods are still often slower than the typing methods.
As this technology continues to advance, speaking to the
computer may one day overtake the traditional typing methods
and allow for a more convenient way of entering Chinese
characters.
Localizing Asian Software
A significant challenge sometimes arises when localizing
your software into Asian languages. Before the advent of Unicode,
some software could neither accept Asian character input
nor display Asian characters correctly, and some could do
one but not the other. To enable support of Asian (and other
foreign) languages, code pages were defined. For example,
Windows 95, 98, and Millennium Edition all used code pages
that contained 256 code points (one code point represents
one character). For those languages with more than 256 characters,
a Double-byte Character Set (DBCS) was developed.
A major drawback of the code page concept
is that a system using a code page can support only one
language at a time, since the same code point may need to
map to different characters for different languages. For
example, under the DBCS system, Chinese could not be mixed
with Japanese within the same application.
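The ambiguity is easy to reproduce. The sketch below, which assumes only Python's standard gb2312, big5, and shift_jis codecs, encodes one Chinese character and then reads the same two bytes back under three different double-byte code pages. The errors="replace" argument is there so that a byte pair with no meaning in a given code page shows up as a replacement character instead of raising an error.

```python
# The same two bytes mean different things under different code pages.
raw = "中".encode("gb2312")  # Simplified Chinese bytes for this character

decoded = {}
for codec in ("gb2312", "big5", "shift_jis"):
    decoded[codec] = raw.decode(codec, errors="replace")
    print(f"bytes {raw.hex()} read as {codec}: {decoded[codec]!r}")
```

Only the code page the text was written in gives back the original character, which is why a system tied to one code page cannot reliably mix languages.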
With the invention of Unicode, these types of issues
have been successfully eliminated. However, applications
not based on Unicode will require special attention during
localization. Read more about Unicode in Chapter 6 of this
guide.
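As a rough illustration of the difference Unicode makes, the following sketch puts Chinese, Japanese, and Korean text in a single string. It assumes Python's built-in Unicode strings and standard codecs; the helper name can_encode is made up for this example.

```python
mixed = "中文 日本語 한국어"  # Chinese, Japanese, and Korean in one string

# Every character has its own code point, regardless of language.
code_points = [f"U+{ord(ch):04X}" for ch in mixed if not ch.isspace()]
print(code_points)

# A Unicode encoding such as UTF-8 round-trips the whole string.
utf8_ok = mixed.encode("utf-8").decode("utf-8") == mixed


def can_encode(text: str, codec: str) -> bool:
    """Report whether every character in text exists in the given codec."""
    try:
        text.encode(codec)
        return True
    except UnicodeEncodeError:
        return False


# Legacy double-byte code pages cannot hold the full mixture
# (neither of these has the Korean Hangul characters).
legacy = {codec: can_encode(mixed, codec) for codec in ("gb2312", "shift_jis")}
print(utf8_ok, legacy)
```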