Assisted writing and editing at SAS
In last month’s CSN, Uwe Muegge speculated about why we don’t hear about more companies using controlled languages. According to Muegge, "anyone new to the field may have a hard time finding reliable, vendor-independent information on what [controlled-language] solutions are available and what the costs and benefits of deploying those solutions are." (1)
We found that to be true in our investigations at SAS Institute as well, but at least part of the problem is in the interpretation of "controlled language." In the course of our investigation, we found that the technology has evolved to controlled-authoring, and that it is no longer limited to helping authors conform to a strictly controlled language such as Simplified Technical English. Instead, many companies use controlled-authoring software to:
Our investigations led us to one such controlled-authoring product: acrocheck™. The acrocheck suite of Content Quality Management tools is based on a Natural Language Processing engine that evolved over the course of 15 years of R&D at the German Research Institute for Artificial Intelligence (DFKI in Saarbrücken). Acrocheck is sold and supported by acrolinx GmbH in Berlin, with offices in the US.
OVERVIEW OF SAS
SAS is the largest privately owned software company in the world, and it is the global leader in business intelligence and analytical software. It has 10,000 employees worldwide and annual revenues of about $1.9 billion. In our Documentation Division we have 53 technical writers and 12 editors. Of course, we have content creators in other divisions as well, but so far we have implemented acrocheck only in the Documentation Division.
Our implementation was motivated partly by the need to standardize and control terminology. In recent years, SAS products have become more integrated. We also began publishing documentation on the Web with a consolidated index and full-text search. Terminology issues became more visible to us, and to customers, than ever before.
The intensified pace of globalization also meant that we had to find an efficient way of making our documentation more suitable for translation and easier for nonnative speakers of English to understand. To address this second issue, we have developed a detailed set of "Global English" guidelines. But even the best technical writers find it difficult to apply complex style guidelines or to consistently conform to lists of approved and deprecated terms. Deadlines and time pressures make it impractical for authors and editors to refer to style guides and glossaries frequently.
Since SAS is all about using technology to support business processes and decision-making, it is only natural that we would look for a technological solution to help our authors follow our style and terminology guidelines. We also anticipate that the increased consistency in our documentation will make the use of translation memory more effective, and that consistent terminology and phrasing will make our documentation more usable for all our audiences.
We were fortunate to have an active executive-level champion, in addition to great management support throughout the company. To emphasize the goal of helping our authors communicate clearly and consistently, we used Assisted Writing and Editing (AWE) as the name of the project. Although we realize that we are controlling the English language to some degree, we wanted to avoid the negative connotation of the term "controlled authoring." Besides, we’re not really depriving authors of anything that they can’t easily do without; we’re just helping them make optimal choices.
The rollout, which began in May of this year, has gone quite smoothly. Overall, the response from our writers and editors has been extremely favorable. We’ve gotten quite a number of positive comments, including the following:
Because acrocheck gives authors immediate feedback on their own writing, they quickly learn to follow guidelines that they never quite grasped before. After an initial productivity hit, this training effect leads to the opposite: a significant productivity increase. Writers fix grammar, spelling, style and terminology issues early in the writing process, so there are fewer corrections to be made late in the documentation cycle, when the pressure to deliver is greatest. Because much of the copy editing work is now done during the writing process, our editors have more time to devote to more substantive issues.
The whole implementation process, from the initial decision to proceed through our rollout, took a little more than a year, although I hasten to add that our experience was atypical. Most companies have done it in half that time, or less.
The first issue that extended the timeline was our extensive collection of deprecated terms and approved terms, for which we had background information that we did not want to lose. Because that information was scattered across several places, it took a while to consolidate it into one Excel spreadsheet that we could then use as the basis for an acrocheck term bank. Then we had to specify what the Help topics for those terms should look like, because we wanted to structure them differently than in acrocheck’s default approach.
Second, SAS documentation contains many oddities that we had to "teach" acrocheck to handle. For example, acrocheck initially interpreted the word "%tmfilter" (the name of a software concept called a macro) as two "tokens": the percent sign and "tmfilter." That issue became apparent when "tmfilter" was flagged as a spelling error, as if it had no percent sign attached to it. Acrolinx defined dozens of token classes such as "PercentLowercaseWord" for us so that acrocheck would recognize that these "word shapes" were single terms and that they should not be checked for spelling. According to the chief linguist at acrolinx, SAS has more token classes than any other acrolinx customer!
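The token-class idea can be illustrated with a minimal Python sketch. This is not how acrocheck works internally. Only the class name PercentLowercaseWord comes from our implementation; the regular expressions, the other class names, and the function names are assumptions made up for illustration.

```python
import re

# Hypothetical token classes, tried in order. Only the name
# "PercentLowercaseWord" appears in the article; the patterns and the
# other class names are illustrative assumptions.
TOKEN_CLASSES = [
    ("PercentLowercaseWord", r"%[a-z]+"),  # e.g. %tmfilter
    ("Word", r"[A-Za-z]+"),
    ("Number", r"\d+"),
    ("Punctuation", r"[^\w\s]"),
]

# Classes whose tokens should not be passed to a spelling checker.
SKIP_SPELLING = {"PercentLowercaseWord", "Number", "Punctuation"}

# One alternation with a named group per class; earlier classes win.
TOKEN_RE = re.compile(
    "|".join(f"(?P<{name}>{pattern})" for name, pattern in TOKEN_CLASSES)
)


def tokenize(text):
    """Return (token, class_name) pairs for every token found in text."""
    return [(m.group(), m.lastgroup) for m in TOKEN_RE.finditer(text)]


def words_to_spellcheck(text):
    """Return only the tokens that a spelling checker should examine."""
    return [tok for tok, cls in tokenize(text) if cls not in SKIP_SPELLING]
```

With these assumed patterns, "%tmfilter" stays a single token and is kept away from the spelling checker, whereas without the first class it would split into a percent sign and an unknown word.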
Third, a few months into our implementation, acrolinx rebuilt their batch client, which we planned to use for checking HTML documents. We gladly accepted a two-month delay because the new client included support for checking SGML documents. That was a huge benefit to SAS. We are moving to an XML-based publishing system, but a lot of our content is still authored in SGML.
Fourth, we tested and optimized the acrocheck rules quite thoroughly in order to reduce "false alarms" to a minimum. In hindsight, it wasn’t necessary to be so thorough, because most of the standard acrocheck rules are quite accurate "out of the box," but at the time we were perhaps more concerned about user acceptance than we needed to be.
To facilitate testing, we assembled a large collection of our documentation as our test corpus, which took quite a while. We ended up with 64,000 files and 17,000,000 words in our collection. We used a 1,000,000-word subset of the collection for early testing of new rules that we asked acrolinx to develop. We would run acrocheck in batch mode against the entire collection, review the output from each rule, work with acrolinx to make corrections, and test again.
Most of the refinements to the grammar and style rules reflect the nature of our content. For example, acrocheck flags the following sentence as an error because "the at" seems to be an ungrammatical sequence of words:
The remaining seven characters can include letters, digits, underscores, the dollar sign ($), or the at sign (@).
But you can prevent that "false alarm" from being triggered by modifying the rule so that it ignores any occurrence of "the at" that is immediately followed by "sign."
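A rule exception of this kind can be sketched in a few lines of Python. The sketch is a rough stand-in, not acrocheck's rule formalism: the regular expressions, the short word lists, and the function name are all assumptions chosen for illustration.

```python
import re

# Illustrative stand-in for a rule that flags an article followed directly
# by a preposition. The word lists are assumptions, not acrocheck's.
ARTICLE_PREPOSITION = re.compile(r"\b(the|a|an)\s+(at|in|on|of)\b", re.IGNORECASE)

# Exception: "the at" is acceptable when the next word is "sign".
FOLLOWED_BY_SIGN = re.compile(r"\s+sign\b", re.IGNORECASE)


def flag_suspicious_sequences(sentence):
    """Return the flagged word pairs, skipping "the at sign"."""
    flagged = []
    for match in ARTICLE_PREPOSITION.finditer(sentence):
        is_the_at = (
            match.group(1).lower() == "the" and match.group(2).lower() == "at"
        )
        # Look at what immediately follows the matched pair.
        if is_the_at and FOLLOWED_BY_SIGN.match(sentence, match.end()):
            continue  # "the at sign" is not an error
        flagged.append(match.group())
    return flagged
```

In this sketch the example sentence above produces no flags, while a genuine slip such as "We arrived the at station." is still reported.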
Here’s another example. Acrocheck flags "a HMDA" as an error and suggests "an HMDA" instead:
To view a HMDA Edit Analysis Report, complete these steps:
But "HMDA" is pronounced as an acronym (HUM-dah), not as an initialism (H-M-D-A). So "a HMDA" is correct, and we needed to tweak the rule to prevent this false alarm.
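The same kind of tweak can be sketched in Python. This is a simplified illustration under stated assumptions, not the actual acrocheck rule: it treats every all-caps term as an initialism that is spelled out letter by letter, unless the term appears in a hypothetical exception table of terms that are pronounced as words.

```python
import re

# Letters whose spoken names begin with a vowel sound ("aitch", "em", ...).
VOWEL_SOUND_LETTERS = set("AEFHILMNORSX")

# Hypothetical exception table: all-caps terms pronounced as words,
# mapped to the article that their pronunciation calls for.
PRONOUNCED_AS_WORD = {"HMDA": "a"}  # HUM-dah

ARTICLE_BEFORE_CAPS = re.compile(r"\b([Aa]n?)\s+([A-Z]{2,})\b")


def check_article(sentence):
    """Return (found, suggested) pairs for a/an before all-caps terms."""
    problems = []
    for m in ARTICLE_BEFORE_CAPS.finditer(sentence):
        article, term = m.group(1), m.group(2)
        if term in PRONOUNCED_AS_WORD:
            expected = PRONOUNCED_AS_WORD[term]
        else:
            # Assume the term is spelled out, so the first letter's name decides.
            expected = "an" if term[0] in VOWEL_SOUND_LETTERS else "a"
        if article.lower() != expected:
            problems.append((f"{article} {term}", f"{expected} {term}"))
    return problems
```

Without the HMDA entry, the sketch would suggest "an HMDA" just as acrocheck initially did. With the entry, "a HMDA" passes, while "a HTML file" is still corrected to "an HTML file".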
Pilot projects were another part of the implementation process. It takes time to solicit volunteers, do initial training, and collect feedback. So that’s another factor on the customer end that affects how long an acrocheck implementation takes. And we heavily customized the Help files for grammar rules and style rules to use examples that came from our own documentation.
With our initial set of style rules and terminology guidance, we’ve found that acrocheck eliminates a lot of unnecessary variation. Even at this early stage of our implementation, we can see that the writing/editing process is more efficient, and that our terminology is much more consistent.
It’s too soon to have hard figures, but we certainly expect to see reductions in translation time and cost as a result of standardizing our terminology, our phrasing, and even our punctuation. Since we don’t have a content management system, writers often cut and paste information between documents, sometimes modifying the material according to their stylistic preferences. Then the editors suggest additional changes. Acrocheck’s consistent, objective feedback minimizes most of this variability. Its ability to flag unnecessary words and phrases also reduces the volume of words to be translated.
Another major and unexpected benefit is that the deprecated terms that we collected for use with acrocheck are now being used by our R&D divisions. Developers run scripts that detect any deprecated terms that are in their software messages and user-interface labels. As I’m sure all CSN readers will understand, fixing terminology problems that far upstream in the development process is a dream come true.
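A script of that kind might look something like the following Python sketch. It is not one of the scripts that our developers actually run; the term bank entries, the message IDs, and the function name are hypothetical and serve only to show the idea.

```python
import re

# Hypothetical term bank: deprecated term -> approved replacement.
DEPRECATED_TERMS = {
    "click on": "click",
    "dialog window": "dialog box",
}


def find_deprecated_terms(messages, term_bank=None):
    """Return (message_id, deprecated_term, replacement) for each hit.

    messages maps a message ID to the message or label text.
    """
    if term_bank is None:
        term_bank = DEPRECATED_TERMS
    hits = []
    for message_id, text in messages.items():
        for term, replacement in term_bank.items():
            # Whole-word, case-insensitive match of the deprecated term.
            if re.search(rf"\b{re.escape(term)}\b", text, re.IGNORECASE):
                hits.append((message_id, term, replacement))
    return hits
```

Running such a check over message catalogs and user-interface labels before a build gives developers a list of terms to replace before the text ever reaches the documentation or translation teams.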
In the coming months, we plan to put more attention on exploiting acrocheck’s "intelligent reuse" functionality: the ability to identify variant sentences, not just variant terms and phrases. We’ll identify standard sentences, and acrocheck will flag linguistically equivalent variants to be replaced by the standard sentence.
I’m also hoping that we will put more attention on content reduction. I totally agree with Hans Fenstermacher that that is really the best way to reduce localization costs. Now that our editors are spending less time marking up the issues that acrocheck detects, they will be able to focus more on eliminating unnecessary content.
It’s also quite likely that other divisions at SAS will begin using acrocheck as soon as we are ready to support them. We need to better understand the ongoing support requirements before we reach out to other divisions.
Currently I provide most of the support, but a few of my colleagues assist with training, with systems support, and with the interaction between acrocheck and our XML-based publishing system. We get very few trouble reports from users (maybe because we were so thorough in eliminating most of those false alarms), but I have no shortage of work. Unlike most acrocheck administrators, I use the acrocheck development environment to develop and test my own style rules, in addition to collaborating with acrolinx on other rules of shared interest.
We need to determine how far we want to take acrocheck’s functionality. The amount of support that is required isn’t that great, but you definitely get out of it what you put into it. You can develop your own controlled language if you are that ambitious and have the skills. (I have dreamed of doing that for years!) However, at SAS we don’t ever want to constrain our authors too much or produce language that sounds stilted to native speakers. So the challenge will be to see how far we can go toward controlled language without crossing that line.
(1) Muegge, Uwe. Controlled Language. ClientSide News Magazine 7.7 (July 2007): 21-24.
ClientSide News Magazine - www.clientsidenews.com