Machine Translation Used by the US Government
By Mike Dillinger, PhD, and Laurie Gerber,
Translation Optimization Partners
This time, we look at the "parallel universe" of government translation
work and how machine translation and some of its variants are employed
there. Many of the new developments reported in this series
came from the AMTA (Association for Machine Translation
in the Americas) conference held October 21-25, 2008 in
Hawaii. That event was noteworthy among AMTA conferences
for its excellent Government MT Users program track. Nick
Bemish of the U.S. Defense Intelligence Agency organized
the government program track and has agreed to do so again
for the MT Summit conference, to be held this coming August
26-30 in Ottawa, Canada (summitxii.amtaweb.org). If you
are interested in government uses of machine translation
and you missed the conference in Hawaii, it will be worthwhile
to put the Ottawa MT Summit on your calendar now!
In the first article in this series, I described the differing
characteristics of translation for assimilation and dissemination. Whereas commercial translation is overwhelmingly
for dissemination, government translation is overwhelmingly for assimilation, that is, for information gathering. There is also significant translation for communication.
Certainly many government agencies do dissemination,
translating public service information aimed at non-English speakers in the U.S. and abroad, but it is the need to assimilate information and communicate on the ground that has put government-focused L-3 Communications
on Common Sense Advisory’s "top 20" translation company list. These areas also drive the use of machine translation in the U.S. Government.
Parallel Universe
If you were a fan of the original Star Trek series, you may remember an episode in which viewers were introduced
to a parallel universe in which the familiar characters’
personalities were the opposite of what we knew. The government translation world has a similar relationship
to the commercial translation world: the focus is on languages of the developing world and languages of conflict rather than languages of commerce, so there are regular requirements for African, Middle Eastern, and Pacific region languages. Because the majority of translation
is into English from languages that few Americans have learned (such as Pashto and Tigrinya), human translation is often
done by native speakers of the source language rather than by native speakers of the target language.
While the commercial world embraced translation memory first and is only now getting comfortable with machine translation, the opposite is true in government translation. The characteristics of disseminating product documentation into multiple languages that made translation
memory so effective in the commercial world are absent in an information assimilation task. The texts to be translated rarely contain sentence-level repetitions. Formatting, which is a significant part of the value of many texts being translated commercially, does not get the same attention, since it is often discarded so that translations can be searched and digested automatically. In addition, because of the volume of materials to be scanned, and the need to find "nuggets" of information within them, few government agencies have used their human translators to do full text translations. A government
linguist’s job is often to analyze a foreign language text and provide an abstract or commentary, or perhaps to select just a few passages to translate verbatim. For this reason, and because of legal/security issues surrounding
many of the texts translated, the government has not accumulated large bilingual corpora, in spite of the volume of "translation" work going on.
Historically, machine translation has found its primary market in government because of the characteristics
of assimilation work. Analysts often
have to evaluate materials of uncertain value. Only after scanning a rough translation can an analyst tell whether any part of the information merits an authoritative human
translation. In addition, analysts frequently come across documents or snippets of information in foreign languages whose urgency is unknown. Again, machine translation
can help to clarify this and guide subsequent actions. In law enforcement and intelligence, the value of a text, and the justification for a polished translation, often lies in the presence of information about people, places and organizations
of interest. For that reason, machine translation may be combined in sequence with other text analytics tools. Information
extraction software may identify and extract names and numbers from a text. Once these are extracted into a database, data mining software may be used to detect connections among the entities. In fully automated text analytics pipelines like this, sometimes no human ever looks at a full text translation.
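The extraction and link-detection steps described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the watch list, the sample sentences, and the function names are all hypothetical, and the simple name matching stands in for the trained information extraction and data mining software that real systems would use. The input is assumed to be English text that has already been machine translated.

```python
import re
from collections import Counter
from itertools import combinations

# Hypothetical watch list; real systems use trained extraction models instead.
KNOWN_ENTITIES = {"Acme Trading", "Port Example", "J. Doe"}


def extract_entities(text, known=KNOWN_ENTITIES):
    """Return the set of watch-list names mentioned in a translated text."""
    return {
        name for name in known
        if re.search(re.escape(name), text, re.IGNORECASE)
    }


def link_entities(documents):
    """Count how often each pair of entities appears in the same document."""
    pairs = Counter()
    for text in documents:
        found = sorted(extract_entities(text))
        pairs.update(combinations(found, 2))
    return pairs


# Invented sample output of the machine translation step.
translated_docs = [
    "J. Doe met representatives of Acme Trading.",
    "A shipment from Acme Trading arrived at Port Example.",
    "J. Doe was seen near Port Example with Acme Trading staff.",
]

links = link_entities(translated_docs)
for pair, count in links.most_common():
    print(pair, count)
```

Pairs that co-occur most often rise to the top of the list, which is one simple way to flag the documents that may deserve a polished human translation.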
Software vendors who try to approach the U.S. Government
encounter mysterious security hurdles and few clear sales targets. Aside from the highly specialized language technology components, the translation and text processing workflow systems and collaborative systems used in the government are often developed and maintained by the government's familiar and trusted contractors. If you consider yourself familiar with language tools and vendors
but have only been to commercial conferences, you might indeed feel you have landed in a parallel universe at a government language tools and technology conference
when you find a well-populated tradeshow with few or no familiar vendors or tools!
Software Solutions in Government Environments
This section introduces the most common and widespread
applications that incorporate machine translation for U.S. Government use. Note that the developers of applications mentioned below typically do not develop their own machine translation software, but incorporate commercial translation software, most often from Apptek, Language Weaver, Sakhr and Systran.
Ad Hoc Translation
Many government agencies have internally developed and hosted enterprise machine translation services available
for ad-hoc translation of individual documents or cut-and-paste texts. Typically these services aggregate MT engines from multiple vendors and government sources,
making them accessible via a standard dashboard.
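As a rough illustration of how such a service can put several engines behind one interface, here is a minimal Python sketch. The class name, the engine names, and the stub engines are hypothetical. Real deployments would wrap vendor or government MT software here, and nothing below reflects any actual agency system.

```python
from typing import Callable, Dict, List, Optional, Tuple

Engine = Callable[[str], str]


class TranslationHub:
    """Routes a request to an engine registered for a language pair."""

    def __init__(self) -> None:
        self._engines: Dict[Tuple[str, str], Dict[str, Engine]] = {}

    def register(self, source: str, target: str, name: str, engine: Engine) -> None:
        """Make an engine available for one source/target language pair."""
        self._engines.setdefault((source, target), {})[name] = engine

    def engines_for(self, source: str, target: str) -> List[str]:
        """List the engine names available for a language pair."""
        return sorted(self._engines.get((source, target), {}))

    def translate(self, text: str, source: str, target: str,
                  name: Optional[str] = None) -> str:
        """Translate with the named engine, or the first one alphabetically."""
        candidates = self._engines.get((source, target))
        if not candidates:
            raise LookupError(f"no engine registered for {source}>{target}")
        if name is None:
            name = sorted(candidates)[0]
        if name not in candidates:
            raise LookupError(f"engine {name!r} not registered for {source}>{target}")
        return candidates[name](text)


# Stub engines that only tag the text; placeholders for real MT software.
def engine_a(text: str) -> str:
    return f"[engine-a] {text}"


def engine_b(text: str) -> str:
    return f"[engine-b] {text}"


hub = TranslationHub()
hub.register("ar", "en", "engine-a", engine_a)
hub.register("ar", "en", "engine-b", engine_b)

print(hub.engines_for("ar", "en"))
print(hub.translate("sample text", "ar", "en", name="engine-b"))
```

Keeping the engines behind one call signature lets a dashboard offer the same workflow regardless of which vendor handles a given language pair.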
DOCEX
Shorthand for DOCument EXploitation, DOCEX systems enable users to translate hardcopy documents. Generally speaking, the documents must be machine printed (not handwritten). DOCEX systems include a scanner, and a computer with OCR and machine translation software. Other text processing software, workflow management and archiving capabilities are often part of such systems. DOCEX systems may be designed for large scale "document
conversion" at a permanent installation, but there are also portable versions that enable soldiers or law enforcement
to quickly assess papers encountered in the field. The primary developers of DOCEX systems are CACI and Northrop Grumman.
Broadcast Monitoring
Broadcast monitoring systems enable digital exploration of television and radio
broadcasts. Broadcast monitoring systems typically include
receivers for satellite signals, video decoding processors,
speech recognition, machine translation, information extraction
(identification of names) and multilingual search software.
In a relatively well-publicized example, the U.S. military’s
CENTCOM Open Source Intelligence unit uses the broadcast
monitoring system developed by BBN to create twice-daily
reports on events and public opinion that emerge in television
and web-based news sources in Arabic. Once the speech signal
is isolated in the broadcast, it is automatically transcribed
with speech recognition software to produce digitized Arabic
text. The information extraction software identifies mentions
of personal, place and organization names in the Arabic
transcript. The entire text is then translated automatically
into English in near real time. At CENTCOM and other places
where such systems are used, broadcast monitoring provides
a complete searchable archive of broadcasts being monitored.
Rather than dedicating an Arabic-speaking analyst to watch
every minute of all broadcasts that might be of interest
in order to capture the one or two minutes per day that
constitute important new information, English-speaking analysts
can search and skim the transcripts, and then enlist the
help of a linguist to assess the segments that may be of
interest. Virage, now a division of Autonomy, offered the
first broadcast monitoring systems and still has excellent
products. Apptek recently developed some innovative and
varied offerings along these lines.
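The searchable archive at the end of this chain can be pictured with a small Python sketch. Everything here is assumed for illustration: the `Segment` fields, the channel names, and the sample text are invented, the source transcripts are placeholders, and the speech recognition and machine translation steps are taken as already done. It does not describe how the BBN, Virage, or Apptek products are built.

```python
import re
from collections import defaultdict
from dataclasses import dataclass


def _tokens(text):
    """Lowercase a string and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


@dataclass(frozen=True)
class Segment:
    channel: str
    start_seconds: int
    source_text: str   # transcript produced by speech recognition
    english_text: str  # machine translation of that transcript


class BroadcastArchive:
    """Keeps translated segments and an inverted index over the English text."""

    def __init__(self):
        self._segments = []
        self._index = defaultdict(set)

    def add(self, segment):
        position = len(self._segments)
        self._segments.append(segment)
        for token in _tokens(segment.english_text):
            self._index[token].add(position)

    def search(self, query):
        """Return segments whose English text contains every query term."""
        terms = _tokens(query)
        if not terms:
            return []
        hits = set.intersection(*(self._index.get(t, set()) for t in terms))
        return [self._segments[i] for i in sorted(hits)]


archive = BroadcastArchive()
archive.add(Segment("channel-1", 0, "<source transcript 1>",
                    "The minister discussed the border crossing today."))
archive.add(Segment("channel-1", 95, "<source transcript 2>",
                    "Weather forecast for the coming week."))
archive.add(Segment("channel-2", 30, "<source transcript 3>",
                    "Protests were reported near the border."))

for hit in archive.search("border crossing"):
    print(hit.channel, hit.start_seconds, hit.english_text)
```

An English-speaking analyst would run queries like this and pass the matching channel and time offset to a linguist, who can then review the original segment.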

Communication
The U.S. military has had to confront the well-known language challenges of operating in foreign countries, plus new cross-language communication challenges within the extensive international military coalition working together
in the Middle East.
Chat
Real-time chat/instant messaging incorporating machine
translation has been employed by the U.S. military coalition for several years to enable communication among coalition forces. Chat is used for operational communication as well as for informal socializing. The main systems have been built by Mitre Corporation from commercial IM and MT components under various names (Trans-Lingual Instant Messaging or "TrIM", Warfighter Chat, etc.).
Computer Assisted Interpretation
I credit Common Sense Advisory with coining the term Computer Assisted Interpretation, and it is an apt analogy.
Computer Assisted Interpretation is typically embodied
in a handheld device, and enables one-way translation.
Like translation memory, computer assisted interpretation
enables reuse of previously created authoritative translations. The Voxtec Phraselator is the most widely used system. Versions of the Phraselator are available preprogrammed with the phrases needed in a variety of situations, from military checkpoints to medical intake. Phrases are designed to elicit an action or gesture (rather than a spoken response), so the one-way translation is quite useful and interactive. In face-to-face communication,
the user utters a phrase or combination of phrases that they know to be among the material in the interpretation system. The device retrieves the translation
and plays it aloud. This is especially important in communications where reading and writing are not practical,
such as medical intake, when communicating in the dark, and when dealing with people who cannot read. Another system is the Voice Response Translator by Integrated Wave Technologies, which allows users to say the name of a common "announcement" (for example, the "Miranda rights"). The entire announcement is then played in the desired language. Both systems are used by law enforcement as well as the military.
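The retrieval step, matching what the user says to a stored phrase and its prerecorded translation, can be sketched as follows. This is a hypothetical illustration and not a description of how the Phraselator or the Voice Response Translator works internally. The phrases, file paths, and similarity cutoff are invented, and the utterance is assumed to arrive as text from a speech recognizer.

```python
import difflib

# Hypothetical phrasebook: English prompt -> prerecorded translation audio.
PHRASEBOOK = {
    "please show me your identification": "audio/show_id.wav",
    "please step out of the vehicle": "audio/step_out.wav",
    "point to where it hurts": "audio/point_hurts.wav",
}


def find_recording(utterance, phrasebook=PHRASEBOOK, cutoff=0.8):
    """Return the recording for the closest stored phrase, or None."""
    normalized = " ".join(utterance.lower().split())
    matches = difflib.get_close_matches(
        normalized, list(phrasebook), n=1, cutoff=cutoff
    )
    return phrasebook[matches[0]] if matches else None


print(find_recording("Please show me your identification"))
print(find_recording("What time is it"))
```

Returning nothing for an utterance outside the phrasebook matters here: because only vetted translations are played, the device should stay silent rather than guess.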
Speech-to-Speech Translation
The current generation of speech-to-speech translation systems is enabled by impressive leaps in speech recognition
and machine translation technology, as well as in user interface design. They are being used in the field primarily by the military. Such systems allow free-flowing
conversation between any two speakers of the source and target languages. Reportedly they are being used for communication between the U.S. military and Iraqi security forces in Iraq. The most advanced systems that have been deployed were developed and evaluated in the context of the DARPA TRANSTAC program, which aimed at unrestricted communication between a native speaker of American English and a native speaker of Iraqi Arabic. BBN, IBM and SRI are noted developers of such systems.
Beyond Government Use
While production translation is extremely important in the current global business environment, you can see that there are a host of tools and technologies that enable translation in many more environments. I hope that this account of the alternate universe of government translation
technologies will inspire some of you to explore commercial
uses of some of these tools!
Author Bio
Laurie Gerber has worked in the field
of machine translation for over 20 years, including system
development, research, and business development. Laurie
is also one half of Translation Optimization Partners, an
independent consultancy that specializes in translation
processes and technologies together with Mike Dillinger,
a frequent collaborator and co-author of industry-related
articles. Contact: gerbl [at] pacbell . net
Published - April 2009