The Guide to Translation and Localization: Writing and Displaying Asian Characters










Chapter 12: Writing and Displaying Asian Characters

Localizing into Asian languages can present unique challenges. Asian character sets often contain many more characters than Western alphabets; some sets number in the thousands of characters. How does your computer know how to display all of these characters? And how does anyone get them typed in?


Ting Fan

Systems Administrator

Lingo Systems is a great place to work. Everyone here is friendly and willing to help others. Sometimes, we need to stay up late and work through the weekend. But we work as a team and get the job done. With my family being a few thousand miles away in China, Lingo feels like a second family to me.

In a typical Western character set, such as the one used with a font like Arial, each character is represented by a single byte of information (8 bits), which allows a total of 256 possible characters. This is not nearly enough for many Asian character sets (particularly Chinese, Japanese, and Korean), so Asian characters are encoded as "double-byte"; that is, each Asian character is made up of 2 bytes (16 bits) of information. A double-byte scheme can represent up to 65,536 values in theory. This resolves the issue of displaying characters, but it can cause other problems for you if you are developing software that needs to support Asian characters (see "Localizing Asian Software" below).
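The arithmetic behind these limits can be checked with a short sketch. The helper name code_values is hypothetical, and GBK is used here only as one example of a legacy double-byte encoding for Simplified Chinese; real double-byte encodings reserve some byte ranges, so they hold fewer characters than the theoretical ceiling.

```python
def code_values(num_bytes: int) -> int:
    """Number of distinct values a fixed-width code of num_bytes bytes can hold."""
    return 2 ** (8 * num_bytes)


print(code_values(1))  # theoretical ceiling for a single-byte code
print(code_values(2))  # theoretical ceiling for a double-byte code

# GBK is assumed here as an example legacy encoding: a Latin letter
# occupies one byte, while a Chinese character occupies two.
for ch in ("A", "中"):
    encoded = ch.encode("gbk")
    print(ch, len(encoded), encoded.hex())
```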

As for typing Asian characters, you are not likely to find a computer keyboard that contains individual keys for every Chinese or Japanese character. A Chinese keyboard would have to contain over 10,000 keys! Fortunately, some clever methods have been devised to use the standard keyboard. The discussion below is specific to Chinese, but the general concepts are also true of Japanese and Korean.

Entering Chinese Characters

In order to enter Chinese characters into a computer, you need an operating system that supports Chinese character input methods. This could be a native Traditional Chinese or Simplified Chinese operating system, or some other operating system that has either built-in support or third-party software installed for Chinese character input. Once you have the right software, there are three general methods of entering Chinese characters into a computer: typing, handwriting, and speaking.

Typing Chinese characters involves breaking down each ideogram (or character) into a series of alphanumeric characters using a set of defined rules. These rules allow you to create the characters with a standard keyboard. This process is called an input method. Mastering the rules of any input method involves a steep learning curve, but typing is still the fastest and most effective means of inputting Chinese characters using today's technologies.

Numerous input methods have been developed since Chinese computing was first introduced. Two of the most popular methods used for Traditional Chinese are Zhu Yin and Chang Jie. The most popular input method for Simplified Chinese is Pin Yin. The Zhu Yin and Pin Yin methods break down a Chinese character by how it sounds, representing those sounds with keys on the alphanumeric keyboard. The Chang Jie method breaks down a Chinese character using the character's shape.

Zhu Yin is based on the pronunciation system of 37 sounds and 5 tones that is used in Taiwan. This pronunciation system is familiar to most Taiwanese schoolchildren, and Chinese characters can be "spelled" with it. Native keyboards come with the 37 sounds printed on the keys, so that native speakers can type Traditional characters. For example, to type the word "Chinese" on an English keyboard, you would type "5j/" and "jp6". On a native keyboard, the Zhu Yin pronunciation symbols would be indicated on those keys. Together, these symbols form the two-character pair that means "Chinese": 中文.
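The key sequences in this example can be traced with a minimal sketch. It assumes the standard Zhu Yin keyboard layout and includes only the five keys the example needs; the two-entry dictionary and the function names are hypothetical. A real input method holds thousands of entries and offers a list of candidates, because many characters share one pronunciation.

```python
# Partial key-to-symbol table, assuming the standard Zhu Yin keyboard layout.
# Only the keys needed for the "Chinese" (中文) example are included.
ZHUYIN_KEYS = {
    "5": "ㄓ",
    "j": "ㄨ",
    "/": "ㄥ",
    "p": "ㄣ",
    "6": "ˊ",  # second-tone mark
}

# Toy dictionary (hypothetical): one character per Zhu Yin spelling.
SYLLABLE_TO_CHAR = {
    "ㄓㄨㄥ": "中",
    "ㄨㄣˊ": "文",
}


def keys_to_zhuyin(keys: str) -> str:
    """Translate a run of keystrokes into Zhu Yin symbols."""
    return "".join(ZHUYIN_KEYS[key] for key in keys)


def type_word(key_groups: list[str]) -> str:
    """Convert one key group per character into the resulting characters."""
    return "".join(SYLLABLE_TO_CHAR[keys_to_zhuyin(group)] for group in key_groups)


print(keys_to_zhuyin("5j/"), keys_to_zhuyin("jp6"))
print(type_word(["5j/", "jp6"]))
```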

Pin Yin is very similar to Zhu Yin, except that the tone of the character is not considered as part of the input method. Only the component sounds are typed.
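Because the tone is not typed, one Pin Yin spelling usually matches several characters, and the user picks the intended one from a list. The sketch below illustrates that selection step; the candidate table is a tiny hypothetical sample and the function names are assumptions, not part of any real input method.

```python
# Toy candidate table (hypothetical sample): toneless spelling -> characters.
CANDIDATES = {
    "zhong": ["中", "种", "重", "钟"],
    "wen": ["文", "问", "闻", "温"],
}


def candidates(syllable: str) -> list[str]:
    """Return every character in the toy table that matches a toneless spelling."""
    return CANDIDATES.get(syllable.lower(), [])


def pick(syllable: str, choice: int) -> str:
    """Select a candidate by its 1-based position, as a user would from a menu."""
    return candidates(syllable)[choice - 1]


for spelling in ("zhong", "wen"):
    print(spelling, candidates(spelling))

print(pick("zhong", 1) + pick("wen", 1))
```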

Chang Jie, in contrast, is shape-based. It uses 24 familiar characters, each of which stands for a set of related shapes. All but a few Chinese characters can be readily broken up into pieces within this relatively small set of shapes. From this sequence of shapes, no more than five are selected, by regular rules, to form the code for typing the character. For example, the character 文 is broken into two shapes, which are typed with the keys "Y" and "K". To follow the Zhu Yin example, you would type the word "Chinese" in Chang Jie by typing "L" for 中 and "YK" for 文.
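A shape-based lookup can be sketched in the same way. The code table below holds only the two codes from the example ("L" and "YK", which produce the word "Chinese", 中文); the key labels shown are the characters conventionally printed on those three keys, and the function names are hypothetical. The five-key limit from the text is enforced as a simple length check.

```python
# Characters conventionally printed on three of the Chang Jie keys.
KEY_LABELS = {"K": "大", "L": "中", "Y": "卜"}

# Toy code table (hypothetical): only the codes used in the example.
CODE_TO_CHAR = {"L": "中", "YK": "文"}

MAX_CODE_LENGTH = 5  # a code never uses more than five keys


def describe(code: str) -> str:
    """Show the key labels behind a code, e.g. 'YK' -> '卜大'."""
    return "".join(KEY_LABELS[key] for key in code.upper())


def lookup(code: str) -> str:
    """Return the character for a Chang Jie code from the toy table."""
    code = code.upper()
    if not code.isalpha() or len(code) > MAX_CODE_LENGTH:
        raise ValueError(f"not a valid code: {code!r}")
    return CODE_TO_CHAR[code]


print(describe("YK"), lookup("YK"))
print(lookup("L") + lookup("YK"))
```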

All of these input methods require the support of your computer operating system. Originally, because Asian characters are double-byte, users had to either use a native operating system or purchase a third-party software bridge for an English system. Today, as a result of Unicode technology, most input methods are supported directly by Windows XP and Mac OS X even on the English version of the operating system. Moreover, since applications such as Microsoft Internet Explorer and Mozilla Firefox support double-byte characters, it is usually easy to write and display Asian languages using Western hardware and software.


Peter Kavanagh

DTP Specialist

If I win the lottery, I know exactly what I will do. I will go to Japan and buy a big fishing boat to catch a Japanese giant crab. They are one of the largest arthropods known to man, up to 12 ft long and 40 lbs. Giant crabs are very difficult to catch because they live in deep parts of the ocean (up to 1,000 ft). If I need a deck hand, I may hire my supervisor, Roger.

Writing Chinese characters on a computer is now also possible, thanks to technological improvements. Various companies have developed Chinese writing pads that connect directly to your computer. Users write directly on the pad, and the software recognizes the handwritten characters and displays them as the appropriate typewritten characters on the screen.

For example, at libraries in Hong Kong, a Chinese writing tablet is connected to each computer terminal so that anyone who is not familiar with a standard input method can nevertheless write in Chinese to perform a search in the library database.

Finally, speaking can also be used to enter Chinese characters. This method relies on recent advances in speech recognition technology. Users speak directly into a microphone connected to a computer. The software recognizes the phonetics of each word and displays the appropriate characters.

These writing and speaking methods enable users to enter Chinese characters without mastering the complex rules of standard input methods. Previously, speech and handwriting recognition were available only through third-party software, but these methods are now supported by the newest versions of Microsoft Office.

They are not without drawbacks, however, as the interpretation of written or spoken characters is far from perfect. The user is generally required to "teach" the software how to recognize his/her style of writing or speaking. Also, these methods are still often slower than the typing methods. As this technology continues to advance, speaking to the computer may one day overtake the traditional typing methods and allow for a more convenient way of entering Chinese characters.

Localizing Asian Software

A significant challenge sometimes arises when localizing your software into Asian languages. Before the advent of Unicode, some software could neither accept Asian character input nor display Asian characters correctly, and some could do one but not the other. To enable support of Asian (and other foreign) languages, code pages were defined. For example, Windows 95, 98, and Millennium Edition all used code pages that contained 256 code points (one code point represents one character). For those languages with more than 256 characters, a Double-byte Character Set (DBCS) was developed.

A major drawback of the code page concept is that a system using a code page can support only one language at a time, since the same code point may need to map to different characters for different languages. For example, under the DBCS system, Chinese could not be mixed with Japanese within the same application.
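The ambiguity can be demonstrated with a small sketch that decodes one byte sequence under three legacy code pages. The choice of GBK, Big5, and Shift_JIS is an assumption made for illustration, and decode_as is a hypothetical helper; errors="replace" is used so that bytes with no meaning in a given code page show up as replacement characters instead of raising an exception.

```python
# The same bytes, written by a program that assumed the GBK code page.
raw = "中文".encode("gbk")


def decode_as(data: bytes, code_page: str) -> str:
    """Interpret raw bytes under a given code page, replacing invalid sequences."""
    return data.decode(code_page, errors="replace")


# Only the code page that wrote the bytes recovers the original text.
for code_page in ("gbk", "big5", "shift_jis"):
    print(code_page, raw.hex(), decode_as(raw, code_page))
```

The bytes themselves carry no record of which code page produced them, which is why a system built this way has to assume one language at a time.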

With the invention of Unicode, these types of issues have been successfully eliminated. However, applications not based on Unicode will require special attention during localization. Read more about Unicode in Chapter 6 of this guide.
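As a rough illustration of the difference, the sketch below tries to store one string containing Chinese, Japanese, and Korean text in several encodings. The sample string, the helper name can_encode, and the three legacy code pages chosen are assumptions for the example; UTF-8 is used as the Unicode encoding.

```python
# One string mixing Chinese characters, Japanese kana, and Korean hangul.
mixed = "中文 にほんご 한국어"


def can_encode(text: str, encoding: str) -> bool:
    """Report whether every character of text exists in the given encoding."""
    try:
        text.encode(encoding)
    except UnicodeEncodeError:
        return False
    return True


# A Unicode encoding holds all three scripts and round-trips cleanly.
assert mixed.encode("utf-8").decode("utf-8") == mixed

for encoding in ("utf-8", "gbk", "big5", "shift_jis"):
    print(encoding, can_encode(mixed, encoding))
```

An application that stores its text in a single legacy code page hits the failing cases above, which is one reason non-Unicode applications need special attention during localization.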










Copyright © 2003-2024 by TranslationDirectory.com