When creating websites
or applications which are designed to be multi-lingual,
software developers who are new to globalisation tend
to make the same mistakes. Presented here are some
of the more common mistakes we see on a regular basis.
1. String Concatenation Problems
English-speaking developers commonly add an "s" to the
end of a word and expect it to pluralise the word, but
this should be avoided because it is an English-only
construct: other languages form plurals differently, and
some have more than two plural forms. Other string
concatenation problems occur when a sentence is built
from fragments, since words in other languages may need
masculine or feminine endings that depend on the rest
of the sentence. Word order also varies: some languages
place a value at the end of the sentence rather than the
beginning, so it is a good idea to use tokens to represent
variables, allowing the translator to restructure the
sentence to suit the language. The C# language provides
the System.String.Format method for this purpose, and
most programming languages have equivalent functionality.
int count = 30;
// {0} is a token that the translator can move within the sentence
string output = string.Format("{0} files processed", count);
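To illustrate why tokens help, here is a minimal sketch in
which the same two values are placed in a different order
for each language. The TokenDemo class, the Render method
and the template table are hypothetical names of our own,
and the German sentence is only an illustrative translation;
in a real application the templates would come from
translated resource files.

```csharp
using System;
using System.Collections.Generic;

static class TokenDemo
{
    // Hypothetical per-language templates. Note that the German
    // template uses the tokens in the opposite order.
    static readonly Dictionary<string, string> Templates =
        new Dictionary<string, string>
        {
            { "en", "{0} files copied to {1}" },
            { "de", "Nach {1} wurden {0} Dateien kopiert" },
        };

    // Fills the tokens of the template for the given language code.
    public static string Render(string language, int count, string target) =>
        string.Format(Templates[language], count, target);

    public static void Main()
    {
        Console.WriteLine(Render("en", 30, "Backup")); // 30 files copied to Backup
        Console.WriteLine(Render("de", 30, "Backup")); // Nach Backup wurden 30 Dateien kopiert
    }
}
```

The calling code never changes; only the template does, so
the translator controls the word order.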
2. Text Encodings
When creating Web pages or applications that write to
the file system, it is important to use a text encoding
that supports the full range of characters you want to
express. Many English-speaking developers use ASCII or
the Windows-1252 character set by default. A better
choice is UTF-8, which can represent Chinese, Japanese
and other non-Latin scripts while remaining backwards
compatible with ASCII for English text. UTF-8 is the
default text encoding of XML as defined by the W3C, as
well as the default output encoding of ASP.Net. It is
also important to consider text encoding when designing
database structures, for instance by using Unicode
database fields (which may take up twice as much space
as their non-Unicode equivalents, as with the Microsoft
SQL Server nchar and nvarchar types) and by marking the
language used within a text field so that it can be
easily identified later.
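The difference between the encodings can be seen in a
short sketch that round-trips a Japanese string through
ASCII and through UTF-8. EncodingDemo and RoundTrip are
hypothetical names of our own; the Encoding class and its
members belong to the .Net Framework.

```csharp
using System;
using System.Linq;
using System.Text;

static class EncodingDemo
{
    // Encodes text to bytes and decodes it again with the same encoding.
    public static string RoundTrip(string text, Encoding encoding) =>
        encoding.GetString(encoding.GetBytes(text));

    public static void Main()
    {
        // The word for "Japanese language", written with Unicode escapes
        // so that the sample does not depend on the source file encoding.
        string japanese = "\u65E5\u672C\u8A9E";

        // UTF-8 preserves the text; ASCII replaces each character with '?'.
        Console.WriteLine(RoundTrip(japanese, Encoding.UTF8) == japanese); // True
        Console.WriteLine(RoundTrip(japanese, Encoding.ASCII));            // ???

        // Plain English text produces identical bytes in ASCII and UTF-8.
        Console.WriteLine(Encoding.UTF8.GetBytes("files")
            .SequenceEqual(Encoding.ASCII.GetBytes("files")));             // True

        // Each of these three characters needs three bytes in UTF-8.
        Console.WriteLine(Encoding.UTF8.GetBytes(japanese).Length);        // 9
    }
}
```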
3. Graphics
When designing the user interface of an application,
a common problem is that text in other languages can
be longer than the English equivalent, causing text
to be hidden by other elements or to wrap in undesired
ways. German, in particular, can be up to 50% longer
than its English equivalent. It may be worth investing
the time up front to design the user interface so that
it adjusts elegantly to different text lengths, and to
provide translators with examples of where the text
will be used. Text embedded in graphics brings further
problems, from country-specific telephone numbers (where
the country code has been forgotten) to a lack of space
for the translation. Another common problem is simply
forgetting to send graphical elements for translation.
4. Lack of Context
Sometimes our customers send us lists of strings in XML
format without any context, such as a description of
where each string is used (next to a text box, for
example). Our translators then have to ask where the
label is to be used before they can provide a proper
translation, which increases turnaround time. When
creating a list of terms for translation, it is very
useful for translators if the customer supplies screen
shots or a description of where each string will be
used, especially if the job is urgent.
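One way to carry that context is to store a description
alongside each string in the XML list itself. The sketch
below builds such a list with System.Xml.Linq. The element
and attribute names ("strings", "string", "id", "context")
and the StringListDemo class are assumptions made for
illustration only, not a format that we require.

```csharp
using System;
using System.Xml.Linq;

static class StringListDemo
{
    // Builds one entry; the "context" attribute tells the translator
    // where the string appears. All names here are hypothetical.
    public static XElement Entry(string id, string text, string context) =>
        new XElement("string",
            new XAttribute("id", id),
            new XAttribute("context", context),
            text);

    public static void Main()
    {
        var list = new XElement("strings",
            Entry("lblName", "Name",
                "Label next to the text box for the user's full name"),
            Entry("btnSave", "Save",
                "Button that stores the form; a verb, not a noun"));

        // Writes the indented XML list to the console.
        Console.WriteLine(list);
    }
}
```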
5. Input and Output Styles
Many English-speaking developers are not familiar with
the ways that other cultures express dates and numbers.
For instance, in much of continental Europe and many
other parts of the world the comma is the decimal
separator, whereas in English-speaking countries it is
the thousands separator. As a result, we see a lot of
JavaScript in which number parsing fails, and a lot of
confusion about the correct input method.
The .Net Framework provides the System.Globalization.CultureInfo
class, which you can use to express variables as strings
or to parse input in a culturally sensitive way. You
can set System.Threading.Thread.CurrentThread.CurrentCulture
to a CultureInfo, and ToString() will then output numbers
and dates in that culture's format by default, with month
names expressed in the correct language. (CurrentUICulture,
by contrast, controls which translated resources are
loaded.) In Java, similar functionality is built around
java.util.Locale.
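The following sketch shows CultureInfo being used to
format and parse the same number for US English and for
German, and to print a month name. CultureDemo and its
methods are hypothetical names of our own. The sketch
assumes that culture data is available on the machine, and
it sets CurrentCulture, the property that governs number
and date formatting.

```csharp
using System;
using System.Globalization;

static class CultureDemo
{
    // Formats a number with two decimals in the named culture.
    public static string FormatNumber(decimal value, string cultureName) =>
        value.ToString("N2", CultureInfo.GetCultureInfo(cultureName));

    // Parses user input using the separators of the named culture.
    public static decimal ParseNumber(string text, string cultureName) =>
        decimal.Parse(text, NumberStyles.Number,
            CultureInfo.GetCultureInfo(cultureName));

    // Returns the full month name in the language of the named culture.
    public static string MonthName(DateTime date, string cultureName) =>
        date.ToString("MMMM", CultureInfo.GetCultureInfo(cultureName));

    public static void Main()
    {
        Console.WriteLine(FormatNumber(1234.56m, "en-US"));              // 1,234.56
        Console.WriteLine(FormatNumber(1234.56m, "de-DE"));              // 1.234,56
        Console.WriteLine(ParseNumber("1.234,56", "de-DE") == 1234.56m); // True
        Console.WriteLine(MonthName(new DateTime(2024, 3, 5), "de-DE")); // März

        // Setting the current culture changes the default used by ToString().
        CultureInfo.CurrentCulture = CultureInfo.GetCultureInfo("de-DE");
        Console.WriteLine(1234.56m.ToString("N2"));                      // 1.234,56
    }
}
```

Passing the culture explicitly, as the helper methods do,
keeps the behaviour independent of the machine's settings.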
When sharing dates between systems, the best format to
use is the ISO 8601 date format, YYYY-MM-DD, which is
close to the Japanese date format in putting the year
first. The use of the hyphen rather than the slash
signals that the ISO format is being used. For reference,
the US expresses dates as MM/DD/YYYY whereas the UK
expresses dates as DD/MM/YYYY, which shows how cultures
that speak the same language can have different ways
of expressing the same information.
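As a sketch of the difference, the code below parses the
same slash-separated text under the US and UK cultures,
then writes and reads a date in the ISO 8601 form.
IsoDateDemo, ToIso and FromIso are hypothetical names of
our own, and the sketch assumes culture data is available.

```csharp
using System;
using System.Globalization;

static class IsoDateDemo
{
    // Custom format string for the ISO 8601 calendar date, YYYY-MM-DD.
    const string IsoDate = "yyyy-MM-dd";

    // Writes a date in ISO 8601 form, independent of the current culture.
    public static string ToIso(DateTime date) =>
        date.ToString(IsoDate, CultureInfo.InvariantCulture);

    // Reads a date that must be in exactly the ISO 8601 form.
    public static DateTime FromIso(string text) =>
        DateTime.ParseExact(text, IsoDate, CultureInfo.InvariantCulture);

    public static void Main()
    {
        var us = CultureInfo.GetCultureInfo("en-US");
        var uk = CultureInfo.GetCultureInfo("en-GB");

        // The same text is read as two different dates.
        Console.WriteLine(DateTime.Parse("03/04/2024", us).Month); // 3 (March)
        Console.WriteLine(DateTime.Parse("03/04/2024", uk).Month); // 4 (April)

        // The ISO form is unambiguous in both directions.
        Console.WriteLine(ToIso(new DateTime(2024, 4, 3)));        // 2024-04-03
        Console.WriteLine(FromIso("2024-04-03").Month);            // 4
    }
}
```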
Many developers use three drop-down lists (combo boxes
/ select boxes) for date entry, but it is important
to make sure that the method makes sense across all
cultures: for instance, by using four digits for the
year rather than two (01, 02, 03, 04, 05), and by using
month names rather than numbers, which could be confused
with the day of the month.
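The entries for such lists can be generated from the
culture data rather than typed in by hand. This is a
minimal sketch; DateListDemo, MonthOptions and YearOptions
are hypothetical names of our own, and how the values are
bound to a drop-down list depends on your UI framework.

```csharp
using System;
using System.Globalization;
using System.Linq;

static class DateListDemo
{
    // Month names in the language of the named culture. MonthNames
    // holds 13 entries (the last is empty for 12-month calendars),
    // so only the first 12 are kept.
    public static string[] MonthOptions(string cultureName) =>
        CultureInfo.GetCultureInfo(cultureName)
            .DateTimeFormat.MonthNames.Take(12).ToArray();

    // Four-digit years, so that 01 to 05 cannot be mistaken for
    // a day or a month.
    public static string[] YearOptions(int firstYear, int count) =>
        Enumerable.Range(firstYear, count)
            .Select(y => y.ToString("D4", CultureInfo.InvariantCulture))
            .ToArray();

    public static void Main()
    {
        Console.WriteLine(string.Join(", ", MonthOptions("en-US")));
        Console.WriteLine(string.Join(", ", MonthOptions("de-DE")));
        Console.WriteLine(string.Join(", ", YearOptions(2001, 5)));
        // 2001, 2002, 2003, 2004, 2005
    }
}
```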
Conclusion
So, there you have it: some of the most common mistakes,
as seen by the project management teams at thebigword.
Our teams
are responsible for the translation of many FTSE 100
companies' software and Web pages, so whatever your
problem, it's likely that we will have experienced
the problem before and can help you. For more information,
please contact our sales team on +44 (0)870 748 8000
or email adrian.hesketh@thebigword.com.