Assisted writing and editing at SAS
By John Kohl,
SAS Institute
In last month’s CSN, Uwe Muegge speculated about why
we don’t hear about more companies using controlled
languages. According to Muegge, "anyone new to
the field may have a hard time finding reliable,
vendor-independent information on what [controlled-language]
solutions are available and what the costs and benefits
of deploying those solutions are." (1)
We found that to be true in our investigations
at SAS Institute as well, but at least part of the
problem is in the interpretation of "controlled
language." In the course of our investigation,
we found that the technology has evolved into controlled authoring,
and that it is no longer limited to helping authors
conform to a strictly controlled language such as
Simplified Technical English. Instead, many companies
use controlled-authoring software to:
- ensure a high degree of language
quality and consistency in their publications
- increase the productivity of content
authors, editors, and translators
- help non-native authors produce
better-quality English source texts
- meet other business needs.
Our investigations led us to one such
controlled-authoring product: acrocheck™. The acrocheck
suite of Content Quality Management tools is based
on a Natural Language Processing engine that evolved
over the course of 15 years of R&D at the German
Research Institute for Artificial Intelligence (DFKI
in Saarbrücken). Acrocheck is sold and supported by
acrolinx GmbH in Berlin, with offices in the US.
OVERVIEW OF SAS
SAS is the largest privately owned
software company in the world, and it is the global
leader in business intelligence and analytical software.
It has 10,000 employees worldwide and annual revenues
of about $1.9 billion. In our Documentation Division
we have 53 technical writers and 12 editors. Of course,
we have content creators in other divisions as well,
but so far we have implemented acrocheck only in the
Documentation Division.
WHY ACROCHECK?
Our implementation was motivated partly
by the need to standardize and control terminology.
In recent years, SAS products have become more integrated.
We also began publishing documentation on the Web
with a consolidated index and full-text search. Terminology
issues became more visible to us, and to customers,
than ever before.
The intensified pace of globalization
also meant that we had to find an efficient way
of making our documentation more suitable for translation
and easier for non-native speakers of English to understand.
To address this second issue, we have developed a
detailed set of "Global English" guidelines.
But even the best technical writers find it difficult
to apply complex style guidelines or to consistently
conform to lists of approved and deprecated terms.
Deadlines and time pressures make it impractical for
authors and editors to refer to style guides and glossaries
frequently.
Since SAS is all about using technology
to support business processes and decision-making,
it is only natural that we would look for a technological
solution to help our authors follow our style and
terminology guidelines. We also anticipate that the
increased consistency in our documentation will make
the use of translation memory more effective, and
that consistent terminology and phrasing will make
our documentation more usable for all our audiences.
IMPLEMENTATION OVERVIEW
We were fortunate to have an active
executive-level champion, in addition to great management
support throughout the company. To emphasize the goal
of helping our authors communicate clearly and consistently,
we used Assisted Writing and Editing (AWE) as the
name of the project. Although we realize that we are
controlling the English language to some degree, we
wanted to avoid the negative connotation of the term
"controlled authoring." Besides, we’re not
really depriving authors of anything that they can’t
easily do without; we’re just helping them make optimal
choices.
The rollout, which began in May of
this year, has gone quite smoothly. Overall, the response
from our writers and editors has been extremely favorable.
We’ve gotten quite a number of positive comments,
including the following:
- "This looks like a great
tool and I’m really glad you all have gone ahead
with it. It should free up editors to do some deeper
edits."
- "I’m really impressed with
this software. I had no difficulty installing or
running it."
- "I’ve used it on some HTML
Help files. Very pleased with the results. It was
straightforward and really nailed me in a few places.
(Ouch!)"
- "This tool is AWEsome! I
love it!"
Because acrocheck gives authors immediate
feedback on their own writing, they quickly learn
to follow guidelines that they never quite grasped
before. After an initial productivity hit, this training
effect leads to the opposite: a significant productivity
increase. Writers fix grammar, spelling, style, and
terminology issues early in the writing process, so
there are fewer corrections to be made late in the
documentation cycle, when the pressure to deliver
is greatest. Because much of the copy editing work
is now done during the writing process, our editors
have more time to devote to more substantive issues.
IMPLEMENTATION DETAILS
The whole implementation process,
from the initial decision to proceed through our rollout,
took a little more than a year, although I hasten
to add that our experience was atypical. Most companies
have done it in half that time, or less.
The first issue that contributed to
the extended timeline was our extensive collection
of deprecated terms, in addition to approved terms,
for which we had some background information that
we did not want to lose track of. Because that information
was scattered around in several places, it took a
while to consolidate it into one Excel spreadsheet
that we could then use as the basis for an acrocheck
term bank. Then we had to specify what the Help topics
for those terms should look like, because we wanted
to structure them differently than in acrocheck’s
default approach.
Second, SAS documentation contains
many oddities that we had to "teach" acrocheck
to handle. For example, acrocheck initially interpreted
the word "%tmfilter" (the name of a software
concept called a macro) as two "tokens": the
percent sign and "tmfilter." That issue
became apparent when "tmfilter" was flagged
as a spelling error, as if it had no percent
sign attached to it. Acrolinx defined dozens of token
classes such as "PercentLowercaseWord" for
us so that acrocheck would recognize that these "word
shapes" were single terms and that they should
not be checked for spelling. According to the chief
linguist at acrolinx, SAS has more token classes than
any other acrolinx customer!
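The token-class idea can be illustrated with a minimal Python sketch. This is
not acrocheck's tokenizer or its configuration format, which the article does
not show. Only the class name "PercentLowercaseWord" comes from the text above;
the other class names, the regular expressions, and the function names are
assumptions made for illustration.

```python
import re

# Hypothetical token classes, tried in order. Only the name
# "PercentLowercaseWord" appears in the article; the rest is assumed.
TOKEN_CLASSES = [
    ("PercentLowercaseWord", re.compile(r"%[a-z]+")),  # e.g. %tmfilter
    ("Word", re.compile(r"[A-Za-z]+")),
    ("Other", re.compile(r"\S")),  # any single leftover character
]

# Token classes whose members are not sent to the spelling checker.
NO_SPELLCHECK = {"PercentLowercaseWord", "Other"}


def tokenize(text):
    """Return (token, class_name) pairs, using the first class that matches."""
    tokens = []
    pos = 0
    while pos < len(text):
        if text[pos].isspace():
            pos += 1
            continue
        for name, pattern in TOKEN_CLASSES:
            match = pattern.match(text, pos)
            if match:
                tokens.append((match.group(), name))
                pos = match.end()
                break
    return tokens


def words_to_spellcheck(text):
    """Return only the tokens that a spelling checker should look at."""
    return [tok for tok, name in tokenize(text) if name not in NO_SPELLCHECK]
```

Because the percent sign and the word are matched as one token, "tmfilter"
never reaches the spelling checker on its own.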
Third, a few months into our implementation,
acrolinx rebuilt their batch client, which we planned
to use for checking HTML documents. We gladly accepted
a two-month delay because the new client included
support for checking SGML documents. That was a huge
benefit to SAS. We are moving to an XML-based publishing
system, but a lot of our content is still authored
in SGML.
Fourth, we tested and optimized the
acrocheck rules quite thoroughly in order to reduce
"false alarms" to a minimum. In hindsight,
it wasn’t necessary to be so thorough, because most of the
standard acrocheck rules are quite accurate "out
of the box," but at the time we were perhaps more
concerned about user acceptance than we needed to
be.
To facilitate testing, we assembled
a large collection of our documentation as our test
corpus, which took quite a while. We ended up with
64,000 files and 17,000,000 words in our collection.
We used a 1,000,000-word subset of the collection
for early testing of new rules that we asked acrolinx
to develop. We would run acrocheck in batch mode against
the entire collection, review the output from each
rule, work with acrolinx to make corrections, and
test again.
Most of the refinements to the grammar
and style rules reflect the nature of our content.
For example, acrocheck flags the following sentence
as an error because "the at" seems to be
an ungrammatical sequence of words:
The remaining seven characters can
include letters, digits, underscores, the dollar sign
($), or the at sign (@).
But you can prevent that "false
alarm" from being triggered by modifying the
rule so that it ignores any occurrence of "the
at" that is immediately followed by "sign."
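As a rough illustration of that kind of exception, here is a minimal Python
sketch that uses a regular expression with a negative lookahead. It is a
stand-in for the idea only: acrocheck's rule formalism is not shown in this
article, and the pattern and function name below are assumptions.

```python
import re

# Flag "the at" as a suspicious word sequence, unless the next word is "sign".
# The negative lookahead (?!...) encodes the exception.
THE_AT = re.compile(r"\bthe\s+at\b(?!\s+sign\b)", re.IGNORECASE)


def flag_the_at(sentence):
    """Return the character offsets at which "the at" should be flagged."""
    return [match.start() for match in THE_AT.finditer(sentence)]
```

With this pattern, the sentence about the at sign (@) produces no flags, while
a genuine slip such as "the at least three options" is still caught.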
Here’s another example. Acrocheck
flags "a HMDA" as an error and suggests
"an HMDA" instead:
To view a HMDA Edit Analysis Report,
complete these steps:
But "HMDA" is pronounced
as an acronym (HUM-dah), not as an initialism (H-M-D-A).
So "a HMDA" is correct, and we needed to
tweak the rule to prevent this false alarm.
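One simple way to picture such a tweak is an exception list that is consulted
before the general a/an rule. The Python sketch below is an assumption-laden
illustration, not acrocheck's rule: it treats every all-caps word as an
initialism that is spelled out letter by letter unless the word is listed as a
pronounced acronym, and it checks only lowercase articles.

```python
import re

# Acronyms that are pronounced as words beginning with a consonant sound.
# "HMDA" comes from the article; the set itself is a hypothetical exception list.
PRONOUNCED_WITH_A = {"HMDA"}

# Letters whose spoken names begin with a vowel sound ("eff", "aitch", "em").
VOWEL_SOUND_LETTERS = set("AEFHILMNORSX")

ARTICLE_BEFORE_CAPS = re.compile(r"\b(a|an)\s+([A-Z]{2,})\b")


def check_articles(sentence):
    """Return (found, suggested) pairs for wrong articles before all-caps words."""
    problems = []
    for match in ARTICLE_BEFORE_CAPS.finditer(sentence):
        article, word = match.group(1), match.group(2)
        if word in PRONOUNCED_WITH_A:
            expected = "a"   # exception: pronounced as a word (HUM-dah)
        elif word[0] in VOWEL_SOUND_LETTERS:
            expected = "an"  # spelled out: "an H-T-M-L file"
        else:
            expected = "a"
        if article != expected:
            problems.append((f"{article} {word}", f"{expected} {word}"))
    return problems
```

Without the exception set, the general rule would suggest "an HMDA", which is
the false alarm described above.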
Pilot projects were another part of
the implementation process. It takes time to solicit
volunteers, do initial training, and collect feedback.
So that’s another factor on the customer end that
affects how long an acrocheck implementation takes.
And we heavily customized the Help files for grammar
rules and style rules to use examples that came from
our own documentation.
RESULTS
With our initial set of style rules
and terminology guidance, we’ve found that acrocheck
eliminates a lot of unnecessary variation. Even at
this early stage of our implementation, we can see
that the writing/editing process is more efficient,
and that our terminology is much more consistent.
It’s too soon to have hard figures,
but we certainly expect to see reductions in translation
time and cost as a result of standardizing our terminology,
our phrasing, and even our punctuation. Since we don’t
have a content management system, writers often cut
and paste information between documents, sometimes
modifying the material according to their stylistic
preferences. Then the editors suggest additional changes.
Acrocheck’s consistent, objective feedback eliminates
most of this variability. Its ability to flag unnecessary
words and phrases also reduces the volume of words
to be translated.
Another major and unexpected benefit
is that the deprecated terms that we collected for
use with acrocheck are now being used by our R&D
divisions. Developers run scripts that detect any
deprecated terms that are in their software messages
and user-interface labels. As I’m sure all CSN readers
will understand, fixing terminology problems that
far upstream in the development process is a dream
come true.
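The article does not show the developers' scripts, but the idea can be sketched
in a few lines of Python. Everything in this sketch is an assumption for
illustration: the term-bank entries, the message IDs, and the function name are
invented, and a real script would load the terms from the consolidated term
list rather than hard-code them.

```python
import re

# Hypothetical term bank: deprecated term -> approved replacement.
# These entries are invented examples, not SAS's actual term list.
DEPRECATED = {
    "dataset": "data set",
    "click on": "click",
}


def find_deprecated(messages, term_bank=DEPRECATED):
    """Yield (message_id, deprecated_term, approved_term) for every hit.

    `messages` maps a message or label ID to its text.
    """
    patterns = {
        term: re.compile(r"\b" + re.escape(term) + r"\b", re.IGNORECASE)
        for term in term_bank
    }
    for message_id, text in messages.items():
        for term, pattern in patterns.items():
            if pattern.search(text):
                yield message_id, term, term_bank[term]
```

A script like this could run as part of a build, so that a deprecated term in a
message or user-interface label is reported before the text ever reaches the
documentation.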
THE FUTURE
In the coming months, we plan to focus
more on exploiting acrocheck’s "intelligent
reuse" functionality: the ability to identify
variant sentences, not just variant terms and phrases.
We’ll identify standard sentences, and acrocheck will
flag linguistically equivalent variants to be replaced
by the standard sentence.
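To make the idea of variant sentences concrete, here is a deliberately crude
Python sketch. It does not detect linguistic equivalence the way acrocheck is
described as doing; it swaps in a simple word-overlap similarity from Python's
standard difflib module. The threshold, the function names, and the example
sentences are assumptions.

```python
import difflib
import re


def normalize(sentence):
    """Lowercase the sentence and reduce it to a list of words."""
    return re.findall(r"[a-z0-9]+", sentence.lower())


def find_variants(sentences, standard, threshold=0.8):
    """Return sentences that are close to, but not the same as, the standard.

    Similarity is difflib's ratio over word lists, which is only a rough
    proxy for "linguistically equivalent".
    """
    standard_words = normalize(standard)
    variants = []
    for sentence in sentences:
        words = normalize(sentence)
        if words == standard_words:
            continue  # already the standard sentence
        matcher = difflib.SequenceMatcher(None, standard_words, words)
        if matcher.ratio() >= threshold:
            variants.append(sentence)
    return variants
```

For example, with "To view the report, complete these steps:" as the standard
sentence, "To view the report, follow these steps:" is reported as a variant,
while an unrelated sentence is ignored.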
I’m also hoping that we will pay more
attention to content reduction. I agree with
Hans Fenstermacher that it is really the best way
to reduce localization costs. Now that our editors
are spending less time marking up the issues that
acrocheck detects, they will be able to focus more
on eliminating unnecessary content.
It’s also quite likely that other
divisions at SAS will begin using acrocheck as soon
as we are ready to support them. We need to better
understand the ongoing support requirements before
we reach out to other divisions.
Currently I provide most of the support,
but a few of my colleagues assist with training, systems
support, and the interaction between
acrocheck and our XML-based publishing system. We
get very few trouble reports from users (maybe because
we were so thorough in eliminating most of those false
alarms), but I have no shortage of work. Unlike most
acrocheck administrators, I use the acrocheck development
environment to develop and test my own style rules,
in addition to collaborating with acrolinx on other
shared-interest rules.
We need to determine how far we want
to take acrocheck’s functionality. The amount of support
that is required isn’t that great, but you definitely
get out of it what you put into it. You can
develop your own controlled language if you are that
ambitious and have the skills. (I have dreamed of
doing that for years!) However, at SAS we don’t ever
want to constrain our authors too much or produce
language that sounds stilted to native speakers. So
the challenge will be to see how far we can go toward
controlled language without crossing that line.
(FOOTNOTES)
(1) Muegge, Uwe. "Controlled Language." ClientSide News Magazine
7.7 (July 2007): 21-24.
ClientSide News Magazine - www.clientsidenews.com