Is Gutenberg Guilty? A South American Adventure Through the Word Counting Standardization Jungle
So many words and no way to count them… Frederico Carvalho at All Tasks Technical Translations in Brazil takes a quick historical and humorous look at what this means in the “real world” on a daily basis for project managers and business people and makes a plea for word count standardization. To review a proposal currently being submitted to LISA, please read Andrzej Zydron’s article in this same issue, GILT Metrics – Slaying the Word Count Dragon.
A while ago, I was busy at work when my boss called me up on the phone to assign me to a very special task: I was to investigate the standard (if any existed) for word counting Chinese documents.
I work in Brazil at a translation agency, so we are very accustomed to handling multiple language pairs, albeit mainly European ones, e.g., English-Portuguese, Portuguese-Spanish, English-German, etc. But the world has changed. Just a short time ago, the Brazilian president went on a six-day trip to China to sign trade agreements between the two countries – the developing world's two biggest economies. Even before this official visit, demand for Chinese translation was increasing in Brazil. Thus, our business department has been wondering what the best practice would be for evaluating Chinese text volumes.
We localized a mobile device into 28 languages last year, including Traditional and Simplified Chinese, but back then we had just one client and only one translator (fortunately in Brazil), so we could keep things under control. Now we have new clients every day and at least four Chinese translators (half of them abroad) available full-time, and we have many doubts about the word counts. Chinese is written with characters that have ideographic roots, and each character represents a word or only part of a word. Therefore, we cannot assume that one Chinese character is equal to one word, so the traditional method for word count, as we apply it to Western languages, is likely to fail.
“Standard” pages can contain anywhere from 1200 to 2046 characters!
So, I set to work on my special project. I contacted agencies from Asia and translators, and I asked LISA staff to help clarify the issue, only to discover that there is no international standard for word counts in any language, let alone Asian languages. However, I did find out that LISA has a Special Interest Group called OSCAR (Open Standards for Container/Content Allowing Re-use) that is actively working to establish a standard in this area (see the section at the end of the article for more information).
I must admit it wasn’t a big surprise that no real standard exists globally, since it’s no different in Brazil. The usual procedure adopted by most agencies is an old-fashioned method inherited from the ancient time of typewriters – the Standard Page (O.K., I’m not blaming anyone since I also had my own Olivetti). This page is supposed to contain around 200 words, but its size in characters varies with agency policy: 1200, 1250, 1400 or 1500 characters are all in use (better not to mention the journalistic standard page, which is composed of 1700 characters). Recently, a translators’ labor union came forward to bravely suggest that agencies should stop shooting at random and adopt the standard page of 1250 characters.
When we need to outsource translation services abroad, we face the same problem caused by the lack of word count standardization. Some international agencies also use the standard page system, but their pages are not the same size. For instance, German and Russian pages (and Polish and Czech ones as well) have 1800 characters, while in Italy I have found that the standard varies between 1250 and 1500 characters per page. In Thailand, agencies evaluate text by using an A4 page with text size set at 16 points to arrive at a standardized page of 2046 characters. What is needed is a true global standard.
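To make the spread concrete, here is a minimal sketch in Python that converts one character count into "standard pages" under several of the page sizes quoted above. The labels in the table, the function name page_counts, and the sample figure of 45,000 characters are illustrative assumptions, as is the choice to treat the character count as a single number (whether spaces are included is itself a matter of agency policy).

```python
# Characters per "standard page", using figures quoted in this article.
# The labels are illustrative only; they are not official standard names.
PAGE_SIZES = {
    "Brazil (union proposal)": 1250,
    "Brazil (journalistic)": 1700,
    "Germany": 1800,
    "Thailand": 2046,
}


def page_counts(char_count):
    """Return the number of standard pages for each page size.

    char_count is assumed to be the total number of characters in the
    document, counted however the agency counts them.
    """
    return {
        label: round(char_count / size, 2)
        for label, size in PAGE_SIZES.items()
    }


if __name__ == "__main__":
    # Hypothetical document of 45,000 characters.
    for label, pages in page_counts(45000).items():
        print(f"{label}: {pages} pages")
```

The same 45,000-character document comes to 36 pages at 1250 characters per page but only 25 pages at 1800, which is why a per-page rate means little until both sides know which page is being quoted.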
Paddle Your Own Canoe
Right now, customers and translators are forced to keep special rate tables for each standard. All of this could be avoided through the implementation of a real standardized word count procedure, providing consistent counts so that all of us can stop wasting so much money and effort juggling the different methods and trying to bring them into sync.
Some people blame the agencies for not speaking the same language, some blame the international translator organizations (that are not able to establish a uniform standard), and others even blame Gutenberg as the great villain. He standardized letter forms for movable type in order to generate the first printed books, and now we are not able to create a standardized procedure to evaluate the text that goes on them.
The Chinese Solution
The discussion goes on. Even in Asia, there is no agreement on a common basis for measuring texts. However, most Chinese agencies have adopted a simple model in which text is evaluated at 1000 characters per page. Personally, I think this is the best solution for word counting, since it works the same way whether the text is in the source or the target language.
By the way, historical research indicates that the earliest dated printed book (A.D. 868) also comes from China, a sign that the Chinese were already working on standardization long before our technological innovations.
LISA Is Working on a Solution
Because of this lack of standardization, OSCAR, LISA’s Special Interest Group for GILT standards, is examining the issue of word count and other metrics for translation work. Unfortunately, as can be seen in my examples above, finding a single metric that will satisfy the different expectations in various countries may prove difficult. It might seem intuitive, and even obvious, that word counts should be used as a basis for determining translation costs, but this is only obvious for Indo-European languages that have a well-defined concept of a word, and, in particular, a concept of a word that can be somewhat easily counted. Word count won’t work for all languages, so any standard solution will need to address a basis for dealing with languages that cannot be counted in the same way as English, Portuguese or Russian. Thai, Chinese, and Japanese, to name some of the more common ones, cannot be easily counted without tremendously sophisticated linguistic knowledge that is not generally available in computer programs. In today’s global marketplace, it is no longer possible to have a myopic view of the world that only considers the needs of European languages.
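The difficulty can be illustrated with a minimal Python sketch. It compares a naive whitespace-based word count, which is roughly what works for English or Portuguese, with a plain character count. The function names and the sample sentences are my own illustrative assumptions; this is not a method proposed by OSCAR or used by any particular tool.

```python
def whitespace_word_count(text):
    """Count words by splitting on whitespace.

    This is adequate for languages that separate words with spaces,
    but it says nothing useful about text written without them.
    """
    return len(text.split())


def character_count(text):
    """Count every character that is not whitespace."""
    return sum(1 for ch in text if not ch.isspace())


if __name__ == "__main__":
    english = "The cat sat on the mat"
    chinese = "我喜欢翻译技术文件"  # sample sentence with no spaces

    for sample in (english, chinese):
        print(sample,
              whitespace_word_count(sample),
              character_count(sample))
```

The English sentence is counted as six words, while the Chinese sentence, which contains no spaces, is counted as a single "word" of nine characters. Finding the real word boundaries in the Chinese text requires a dictionary or a segmentation model, which is the linguistic knowledge referred to above.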
Although it is impossible to say what solution OSCAR will arrive at, a single standard will finally lead companies and translators out of the jungle and into the light. It will simplify bidding, subcontracting and billing, and help prevent disputes and surprises. When everyone can agree what a word or a character is and how to count it, we will all be better able to focus on our jobs, not on the aggravating task of counting words.
Frederico Carvalho is a graduate in Communications Science and a Project Manager at All Tasks Technical Translations in Brazil. He is currently working on the SEED (Schlumberger Excellence in Educational Development) website localization project. Carvalho can be reached at fred_DELETE_THIS@alltasks.com.br.
Reprinted by permission from the Globalization Insider,
17 June 2004, Volume XIII, Issue 2.3.
Copyright the Localization Industry Standards Association
(Globalization Insider: www.localization.org, LISA: www.lisa.org)
and S.M.P. Marketing Sarl (SMP) 2004