Making Reuse Intelligent: Improving Enterprise Information Quality Management
Reuse has become a buzzword in technical communication and localization. For one thing, businesses want to be sure that they write the same information only once. They also want to avoid translating the same information repeatedly merely because it is expressed using different words or a different word order. But in a distributed writing environment where disparate groups contribute to huge content repositories, how can you make sure that content is created only once? This article looks at the role technology can play in promoting content reuse. In particular, innovations in linguistic technology are making it possible for companies to take a systematic approach to this challenge.
We can think about reuse under two broad headings: topic-based reuse and linguistic reuse. Companies already recognize the tremendous benefits of DITA, the Darwin Information Typing Architecture. It is an emerging trend in XML and the leading technology infrastructure for topic-based reuse.
Through a process often called “chunking,” DITA helps create recyclable, transferable units from extensive documents by breaking them into smaller topics. It provides a structure that eliminates the need for user-defined DTDs, while letting users create customized topic extensions for their own needs. In essence, DITA provides a framework for breaking often enormous documents into manageable packages.
But the key advantage of DITA involves reuse. Imagine five different products with a power supply that has to be connected in a standard way. DITA helps create a single topic to describe the setup process instead of five. For that procedure, it thus eliminates 80% of the content that companies previously had to edit, maintain, and translate.
The current state of linguistic reuse
Although more and more organizations recognize that topic-based reuse is good for them, sentence-level (or sentence-fragment level) reuse remains a relatively unexplored territory. Yet reusing linguistic segments ensures consistency across documents and makes localization more cost-effective by eliminating the need for retranslation. Remember - translation memory systems do not work at the topic level, but at the sentence or segment level. Working on this level is therefore the key to controlling translation costs.
Most of the current solutions to this challenge rely on “fuzzy matching” algorithms. These algorithms measure the similarity between two character strings (sentences or sentence fragments). On a superficial level, fuzzy matching seems like a useful solution to the problem. But the reality is quite different. Fuzzy matching works well for translation, but it is far less suited to writing environments.
Consider this example:
Fuzzy matching offers the following potential suggestions, which have different, or even opposite, meanings:
There are similar problems for sentences with variables.
For the example:
fuzzy matching offers:
and so on.
In terms of usability, authors might have to wade through tens of suggestions for a single input, or thousands for a single document, which not only discourages writers from using the tool, but also increases the risk that they will introduce inaccurate information.
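The weakness of character-based matching described above can be sketched in a few lines of Python. This is only an illustration: it uses the standard library's difflib as a stand-in for a fuzzy matching algorithm, not the algorithm of any particular translation memory product, and the three sentences are invented for this sketch.

```python
from difflib import SequenceMatcher


def fuzzy_score(a: str, b: str) -> float:
    """Character-level similarity in [0, 1]; it says nothing about meaning."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


# Hypothetical sentences, invented for this illustration.
query = "Do not switch off the device during the update."
opposite = "Do switch off the device during the update."
paraphrase = "Keep the unit powered on while updating."

score_opposite = fuzzy_score(query, opposite)
score_paraphrase = fuzzy_score(query, paraphrase)

print(f"opposite meaning: {score_opposite:.2f}")
print(f"same meaning:     {score_paraphrase:.2f}")
```

The sentence with the opposite meaning scores higher than the paraphrase, because the score depends only on shared characters. A writer who accepts the top-ranked suggestion would therefore reverse the instruction.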
Up until now, technology has not met the challenges of applying reuse at the sentence level. The technologies that are available have delivered few tangible results for authoring and editing. The tools currently available do not address the single most important aspect of linguistic reuse: matching sentences or sentence fragments in terms of meaning. At the same time, the tools have often proven unwieldy or unusable in practice.
Acrolinx has recently introduced a new Intelligent Reuse component for its Information Quality software that meets these two challenges, combining meaning-based reuse with usability.
Consider the following examples:
These segments are simply different ways of saying the same thing, but translating them individually increases costs. Tools based on fuzzy matching are not useful here because the words and word sequences are too different. However, Intelligent Reuse identifies the similarity in meaning so that authors do not have to write different sentences to express the same thing.
Behind the scenes, a technology based on Artificial Intelligence extracts sentences from a translation memory or content management system. It groups sentences with similar meanings into so-called “micro-clusters.” The previous example is one such cluster. The following sentences are drawn from a cluster of approximately 25 sentences:
Typically, content repositories or translation memories contain many segments that are redundant or of questionable quality. Based on initial experiences with the tool in business settings, Intelligent Reuse reduces redundancy in content by 15-35%. It also filters the micro-clusters for quality, checking for spelling and grammar, corporate style, and terminology. Compare the first and second sentences in the micro-cluster. “Start date” is capitalized in the first, but not in the second; the second contains a double period at the end. Juxtaposing the sentences in this way enables users to detect issues that typical spellcheckers might not catch. Intelligent Reuse provides spellchecking and quality assurance on a sentence level, rather than a word level, with an overhead similar to regular spellchecking.
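The kind of issue that becomes visible when the sentences of a cluster are placed side by side can be sketched as follows. This is a minimal illustration, not the product's actual checks: the function names and the two sentences are assumptions made for this sketch, and the checks cover only the two issues mentioned above (inconsistent capitalization and a doubled final period).

```python
import re
from collections import defaultdict


def has_double_period(sentence: str) -> bool:
    """True if the sentence ends in exactly two periods (an ellipsis is ignored)."""
    return re.search(r"(?<!\.)\.\.$", sentence.strip()) is not None


def case_variants(cluster: list[str]) -> dict[str, set[str]]:
    """Return words that the sentences of one cluster capitalize differently."""
    spellings = defaultdict(set)
    for sentence in cluster:
        words = re.findall(r"[A-Za-z]+", sentence)
        # Skip the first word: a sentence-initial capital is expected.
        for word in words[1:]:
            spellings[word.lower()].add(word)
    return {key: forms for key, forms in spellings.items() if len(forms) > 1}


# Hypothetical cluster, invented for this illustration.
cluster = [
    "Enter the Start date of the project.",
    "Enter the start date of the project..",
]

variants = case_variants(cluster)
flagged = [s for s in cluster if has_double_period(s)]
print(variants)
print(flagged)
```

Neither sentence contains a misspelled word, so a word-level spellchecker would accept both. The inconsistency appears only when the two sentences are compared with each other.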
After checking for quality, Intelligent Reuse chooses a representative sentence, a “winner” in terms of representativeness and quality. For this cluster, Intelligent Reuse chooses the following sentence, which is highlighted in a web-based interface:
At this point, linguistic administrators can accept the suggested representative sentence, choose another one, or even move sentences from one cluster to another using the interface. This validation process is a key aspect of quality assurance because it helps administrators choose only correct sentences. Once a representative sentence has been chosen, the administrator activates its cluster for document checking.
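One plausible way to pick a “winner” is to take the sentence that passes the quality checks and is, on average, most similar to the rest of its cluster. The sketch below assumes this approach; the article does not say how Intelligent Reuse actually scores representativeness. The word-overlap measure is only a stand-in for a meaning-based similarity, and all names and sentences are hypothetical.

```python
import re


def token_overlap(a: str, b: str) -> float:
    """Stand-in similarity (Jaccard on words); NOT a meaning-based measure."""
    tokens_a = set(re.findall(r"\w+", a.lower()))
    tokens_b = set(re.findall(r"\w+", b.lower()))
    union = tokens_a | tokens_b
    return len(tokens_a & tokens_b) / len(union) if union else 0.0


def passes_quality(sentence: str) -> bool:
    """Toy quality gate: rejects a sentence that ends in a doubled period."""
    return not sentence.rstrip().endswith("..")


def pick_representative(cluster, similarity=token_overlap):
    """Return the quality-approved sentence closest, on average, to the others."""

    def centrality(index):
        others = [s for j, s in enumerate(cluster) if j != index]
        total = sum(similarity(cluster[index], s) for s in others)
        return total / max(len(others), 1)

    candidates = [i for i, s in enumerate(cluster) if passes_quality(s)]
    if not candidates:  # nothing passed the gate: fall back to the whole cluster
        candidates = list(range(len(cluster)))
    return cluster[max(candidates, key=centrality)]


# Hypothetical cluster, invented for this illustration.
cluster = [
    "Enter the start date of the project.",
    "Enter the start date of the project..",
    "Type in the date on which the project starts.",
]

suggested = pick_representative(cluster)
administrator_choice = None  # the administrator may override the suggestion
validated = administrator_choice or suggested
print(validated)
```

The automatic choice is only a proposal. The last lines mirror the validation step described above, in which the administrator can replace the suggested sentence before activating the cluster.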
Putting Intelligent Reuse into practice
From the perspective of writers, the tool now functions exactly like a spellchecker. For any sentence that approximates a representative sentence in meaning, writers receive a single standard sentence as a suggestion. Intelligent Reuse provides suggestions for sentences already stored in a content repository. But what is truly new about the tool is that it makes suggestions for newly authored sentences that match a representative sentence in meaning. For the preceding example, a writer comes up with the sentence:
Intelligent Reuse would suggest the validated representative sentence:
Intelligent Reuse makes this suggestion even though the new sentence is not part of the original micro-cluster. In addition, the tool does not detract from productivity because it makes only one high-quality suggestion for any input.
This reuse is always intelligent because the suggestions match in meaning, not proximity of letters or words. The system can understand numbers and units and other complex entities. If we turn back to our previous example for fuzzy matching:
Intelligent Reuse recognizes that the temperature variables in the sentences make a difference. It would place these sentences in the same micro-cluster, but recognize the temperature values as variables. Let us say that the linguistic administrator validates the first sentence of the cluster. If an author writes:
Intelligent Reuse suggests:
In other words, it offers the validated representative sentence, but preserves the value (80 degrees) that the author typed. More than translation cost or usability is at issue in this case, since the difference in operating temperature affects product safety.
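The handling of variables can be sketched as a small template substitution. The sketch assumes that the meaning match between the author's sentence and the representative sentence has already been made, and it shows only the last step: carrying the author's value over into the validated wording. The pattern, the unit list, the function names, and the sentences are all assumptions made for this illustration.

```python
import re

# A number followed by a unit, such as "80 degrees"; the unit list is illustrative.
VALUE = re.compile(r"\b\d+(?:\.\d+)?\s?(?:degrees|volts|mm)\b")
PLACEHOLDER = "{value}"


def to_template(sentence: str):
    """Replace each value with a placeholder; return (template, values)."""
    values = VALUE.findall(sentence)
    return VALUE.sub(PLACEHOLDER, sentence), values


def suggest(author_sentence: str, representative: str):
    """Put the author's values into the validated representative sentence."""
    _, author_values = to_template(author_sentence)
    template, representative_values = to_template(representative)
    if len(author_values) != len(representative_values):
        return None  # the value slots differ, so do not guess
    result = template
    for value in author_values:
        result = result.replace(PLACEHOLDER, value, 1)
    return result


# Hypothetical sentences, invented for this illustration.
representative = "Do not operate the device above 60 degrees."
author = "The device must not be used at more than 80 degrees."

print(suggest(author, representative))
```

The wording comes from the validated sentence, while the value comes from the author. Returning nothing when the number of values differs is a deliberate choice here: for safety-relevant figures, offering no suggestion is better than offering one with the wrong value.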
While this new technology has the potential to cut costs significantly in the translation and localization cycles, one of its most promising fields of application concerns text authored by non-native speakers. Intelligent Reuse helps non-native speakers meet the challenge of formulating text in a foreign language by offering them a representative sentence that has already been checked for quality and validated. As more and more companies employ non-native speakers to author their technical documentation, Intelligent Reuse could offer enormous benefits.
Finally, Intelligent Reuse extends beyond technical documentation to software strings, where developers confront significant issues in deciding whether a suitable message already exists. Here, Intelligent Reuse represents a novel approach to a problem for which there are currently few solutions. For acrolinx, Intelligent Reuse comprises part of a holistic view of enterprise information quality management.
Its initial results have been immensely promising, in terms of decreasing redundancy, improving quality, increasing productivity, and cutting costs. For the first time, information developers can implement a linguistic-based reuse strategy that makes sense.
acrolinx is the market leader in quality assurance tools for professional information developers. These tools help companies worldwide to maintain their corporate image, address compliance issues, improve quality, and control document production and localization costs. The company's flagship product, acrocheck™, is used internationally by thousands of customers in a variety of industries, including software, automotive, life sciences, and aerospace. acrocheck has been deployed at global enterprises like SAP, Symantec, SAS, Philips, Siemens, Motorola, and Bosch.
acrolinx maintains its headquarters in Berlin, Germany with a sales and support subsidiary in North America.
News Magazine - www.clientsidenews.com