The Okapi Framework:
Q and A with Yves Savourel of ENLASO
By Corinne McKay,
ATA-certified French to English translator based in
Boulder, Colorado, United States
corinne@translatewrite.com
www.translatewrite.com
As of October 2005, ENLASO
has started to port a set of localization tools to
the open-source Okapi
Framework. The project's developer, Yves
Savourel, has been involved in the development of
standard XML formats for translation, such as XLIFF
and TMX. Open Source Update recently spoke with Yves,
who works as a Localization Solutions Architect at
ENLASO's headquarters in Boulder, Colorado.
Open Source Update:
Please tell us a bit about the tools that are included
in the Okapi framework.
Yves Savourel: There are three main
aspects to the Okapi Framework:
First, we have the interface specifications: These
are just API definitions for a few types of objects.
They are at the core of the framework and allow all
the different pieces to work together. Then we have
the components: These are implementations of the interfaces
and other small pieces of reusable code. Among the
different types of components, two are central to
Okapi: the Filters and the Utilities. And lastly we
have the applications: These are normal end-user applications
that use the components to provide users with
functionality (mostly through the filters and the
utilities).
It sounds a bit complicated, but it's
really simple: imagine Okapi as a box of Legos.
The interface specifications are the description of
how the bricks can connect together, the components
are the bricks (some very simple, some more complex),
and the applications are the different things you
build with the bricks.
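To make the Lego analogy concrete, here is a minimal Python sketch of the three layers: an interface, a component that implements it, and an "application" that only talks to the interface. It is not Okapi's actual API (the implementations described here are in C#); the names `TextUnit`, `Filter`, `PropertiesFilter` and `run_utility` are hypothetical, and the Properties reader is deliberately simplified.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass
from typing import Callable, Iterator, List


@dataclass
class TextUnit:
    """One piece of extracted text (hypothetical type)."""
    key: str
    text: str


class Filter(ABC):
    """The 'interface specification': how the bricks connect (hypothetical)."""

    @abstractmethod
    def extract(self, content: str) -> Iterator[TextUnit]:
        ...


class PropertiesFilter(Filter):
    """A 'component': a deliberately simplified Java .properties reader.

    Skips blank lines and '#'/'!' comments and splits on the first '=' only;
    escapes, ':' separators and line continuations are NOT handled.
    """

    def extract(self, content: str) -> Iterator[TextUnit]:
        for line in content.splitlines():
            stripped = line.strip()
            if not stripped or stripped.startswith(("#", "!")):
                continue
            key, sep, value = stripped.partition("=")
            if sep:
                yield TextUnit(key.strip(), value.strip())


def run_utility(flt: Filter, content: str,
                step: Callable[[TextUnit], TextUnit]) -> List[TextUnit]:
    """An 'application': feeds any filter's output through a utility step."""
    return [step(unit) for unit in flt.extract(content)]


sample = "# comment\ngreeting = Hello\nfarewell=Goodbye\n"
units = run_utility(PropertiesFilter(), sample,
                    lambda u: TextUnit(u.key, u.text.upper()))
print(units)
```

Because `run_utility` depends only on the `Filter` interface, another filter (for PO files, say) could be plugged in without touching it; that separation is what lets the "bricks" be combined freely.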
Currently all the implementations
are in .NET C# and we have two applications: Tikal
and Olifant. Tikal is a command-line tool that allows
you to run any of the utilities, and Olifant is a
TM manager tool. For the utilities we currently have:
the Text Extraction utility, which puts translatable
text from input files into various formats for translation,
like RTF, XLIFF, etc.; the Text Merging utility, which
merges text extracted to XLIFF back into its original
format; and the Text Rewriting utility, which extracts and
merges translatable text in one pass and makes some
modifications to the text as specified.
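The extraction/merging round trip through XLIFF can be sketched with Python's standard `xml.etree.ElementTree`. The element and attribute names (`xliff`, `file`, `body`, `trans-unit`, `source`, `target`, `source-language`, `datatype`) follow the XLIFF 1.2 specification, but the functions `extract_to_xliff` and `read_targets` are hypothetical illustrations, not the Text Extraction or Text Merging utilities themselves; a real merge step would also rebuild the original file format from the collected targets.

```python
import xml.etree.ElementTree as ET
from typing import Dict, Iterable, Tuple

XLIFF_NS = "urn:oasis:names:tc:xliff:document:1.2"
ET.register_namespace("", XLIFF_NS)  # serialize XLIFF elements without a prefix


def q(tag: str) -> str:
    """Qualify a tag name with the XLIFF 1.2 namespace."""
    return f"{{{XLIFF_NS}}}{tag}"


def extract_to_xliff(units: Iterable[Tuple[str, str]], original: str,
                     source_lang: str) -> str:
    """Extraction side: write (id, text) pairs as XLIFF 1.2 trans-units."""
    root = ET.Element(q("xliff"), version="1.2")
    file_el = ET.SubElement(root, q("file"), {
        "original": original,
        "source-language": source_lang,
        "datatype": "plaintext",
    })
    body = ET.SubElement(file_el, q("body"))
    for unit_id, text in units:
        tu = ET.SubElement(body, q("trans-unit"), id=unit_id)
        ET.SubElement(tu, q("source")).text = text
    return ET.tostring(root, encoding="unicode")


def read_targets(xliff: str) -> Dict[str, str]:
    """Merging side: collect <target> text keyed by trans-unit id."""
    root = ET.fromstring(xliff)
    result = {}
    for tu in root.iter(q("trans-unit")):
        target = tu.find(q("target"))
        # Fall back to the source text when a unit is still untranslated.
        node = target if target is not None else tu.find(q("source"))
        result[tu.get("id")] = node.text or ""
    return result


doc = extract_to_xliff([("greeting", "Hello"), ("farewell", "Goodbye")],
                       original="app.properties", source_lang="en")
print(read_targets(doc))  # nothing translated yet, so the source text comes back
```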
For example, you can pseudo-translate
the text, or remove it all (so you can compare two
files with just the codes), etc. All utilities are
fed from filters. Currently there are two filters
implemented: one for PO files and one for Properties
files. All that is in the latest release, but we
already have more filters and utilities ready to go for
the next one: a .NET Resource filter (for both ResX
and compiled .resources files), an Encoding Conversion
utility, a Byte-Order-Mark Conversion utility, and
a Line-Break Conversion utility. In addition, Rainbow,
a GUI application to launch utilities, will be
part of the upcoming release.
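A pseudo-translation pass of the kind described for the Text Rewriting utility can be sketched as follows. The inline-code pattern (`{0}`, `%s`, `%d` and HTML-style tags) and the accent mapping are assumptions chosen for illustration; the point is that inline codes are left untouched while the surrounding text is altered, or dropped entirely so two files can be compared by their codes alone.

```python
import re

# Assumed inline-code syntax: numbered placeholders, %s/%d, and tags like <b>.
CODE_PATTERN = re.compile(r"(\{\d+\}|%[sd]|<[^>]+>)")

# Swap plain vowels for accented ones so untranslated text is easy to spot.
ACCENTED = str.maketrans("aeiouAEIOU", "àéîõüÀÉÎÕÜ")


def pseudo_translate(text: str) -> str:
    """Accent the text outside inline codes and bracket the result."""
    parts = CODE_PATTERN.split(text)
    # With one capturing group, re.split puts the codes at odd indices.
    out = [p if i % 2 else p.translate(ACCENTED) for i, p in enumerate(parts)]
    return "[" + "".join(out) + "]"


def strip_text(text: str) -> str:
    """Remove all text and keep only the inline codes."""
    return "".join(CODE_PATTERN.findall(text))


print(pseudo_translate("Hello {0}, <b>world</b>"))
print(strip_text("Hello {0}, <b>world</b>"))
```

The brackets make truncated strings visible in a running application, and the accented vowels expose text that bypassed extraction.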
OSU: Where does the
name "Okapi" come from?
YS: Just a random choice, I think.
Okapis are cool animals that not many know about,
so I thought we would give them a chance to become
a little bit more famous. Seriously, the inspiration
for the name probably goes back about 10 years, to
when I was working at ILE (a localization company
that was bought by ICI/Intl.com, then by Lionbridge):
we had some ideas about an [O]pen [K]it [API] for
localization tools... And yes, it's true, okapis have
a blue tongue and can use it to lick their ears.
OSU: How did the
idea to open source these tools come about?
YS: ENLASO has always offered its
localization tools as freeware. The next step was
to make them open-source. There are many reasons to
go open-source: it provides us with a larger and more
diverse set of testers; it allows others to participate
and work on parts of the toolset that may be less
urgent for us and that we have less time to work on;
by offering common interfaces and tools to plug
them in, we may get third parties to develop
filters and utilities we can use too; it ensures better
continuity in development (if people working on
the tools leave the company, they can still continue
their participation afterward); when needed, we can
provide our customers with solutions based on non-proprietary
software; and many more reasons.
OSU: What kind of
response have you gotten to the release so far?
YS: We got a lot of positive feedback
on the move to open-source, but so far not many
offers of concrete help. People are understandably
cautious: we have to prove our framework is not just
a way to use open-source as a marketing tool, and
that is fine. In some ways the response does not really
matter to us: we are simply doing in open-source
the things we would be doing internally anyway.
The nature of the tools themselves
does not help: filters, for example, are quite low-level,
"boring" programs to write, and they are
much less likely to generate enthusiasm than, say, a translation
editor. To a large extent we are still
making the bricks; people will get more interested
when we start building things with those bricks.
As far as usage goes: because
the tools themselves have been freeware for several
years, we are not anticipating much change in how
much they are used. For now, the material available
is not yet at the same level as the "old"
freeware. We still have to port a number of parts
to Okapi.
OSU: What advice
would you offer to other for-profit companies that
are interested in open sourcing their in-house software?
YS: I guess you have to do it for
the right reasons. It may not always be the right
path. It all depends on many factors that only each company
can judge.
OSU: What do you
hope to do with the tools now that they are available
to the public?
YS: We will continue working on them.
It's a never-ending job: adding new features, new
filters, implementing user feedback, and so forth.
One thing we hope for is to get some help in developing,
testing, and documenting. This will allow us to go
a little further with the tools: to provide new functionality
that we might not have been able to tackle alone. There
is plenty to do before we run out of work.
OSU: Who is the target
user for these tools?
YS: I guess it depends on what part
of the framework you look at: An application like
Olifant (a TM manager) is more oriented toward translators,
project managers and to some degree localization engineers.
The low-level components, like the filters, are more
for power users or engineers, people who write scripts
to automate parts of the localization process or testing/QA.
And obviously any part of the Framework can also be
reused in larger applications, so developers can also
find something to use. Utilities like the Encoding
Conversion can be handy for just about anyone, even
outside the localization/translation industry.
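As an illustration of why such conversions are useful to anyone, here is a minimal Python sketch of the three operations mentioned in this interview: encoding conversion, byte-order-mark handling and line-break conversion. The functions `convert_encoding` and `normalize_line_breaks` are hypothetical, not Okapi's utilities, and this sketch only knows how to add UTF-8 and UTF-16LE BOMs.

```python
import codecs

# BOMs this sketch can add, keyed by Python's canonical codec name.
BOMS = {"utf-8": codecs.BOM_UTF8, "utf-16-le": codecs.BOM_UTF16_LE}


def convert_encoding(data: bytes, src: str, dst: str,
                     add_bom: bool = False) -> bytes:
    """Re-encode bytes from `src` to `dst`, optionally prefixing a BOM."""
    text = data.decode(src)
    if text.startswith("\ufeff"):
        text = text[1:]  # drop an existing BOM (kept by the plain utf-8 codec)
    out = text.encode(dst)
    if add_bom:
        name = codecs.lookup(dst).name
        if name not in BOMS:
            raise ValueError(f"no BOM defined in this sketch for {name}")
        out = BOMS[name] + out
    return out


def normalize_line_breaks(text: str, newline: str = "\n") -> str:
    """Convert CRLF, CR and LF line breaks to a single chosen style."""
    return text.replace("\r\n", "\n").replace("\r", "\n").replace("\n", newline)


latin1 = "café".encode("latin-1")
print(convert_encoding(latin1, "latin-1", "utf-8", add_bom=True))
print(repr(normalize_line_breaks("a\r\nb\rc\n", "\r\n")))
```

Decoding to text first and re-encoding afterwards keeps the three concerns independent: the BOM and line-break choices can be changed without caring which encoding the file started in.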
OSU: What do you
think is standing between the translation/localization
industry and more widespread use of open source software?
YS: I think there is nothing that
stands between the open-source tools and the users.
The main problem is that most of the tools are not
in front of most of the users. I may be
a little blunt, but hopefully no one will be offended;
it's just an opinion based on what I see today:
Most open-source translation/localization
tools are Linux/Java oriented, while most of the potential
users are Windows-based. The users exist: if we look
at Wordfast (not
even an open-source tool) we can see how much enthusiasm
it generates. But many tools completely miss the target
because they simply don't aim at it: the users are
on Windows. One could answer that the Java-based tools
can run on Windows. That's true, but I've noticed
such applications often don't quite "fit"
into the Windows environment: a lot of little details
don't work as users would expect (shortcuts,
clipboard actions, input methods, etc.) and the users
just give up on the tool very quickly. Maybe I'm wrong,
but I don't think you can really develop a good GUI
solution for one platform from another one. You often
have to be a user of the environment where
your users will be working yourself to better understand their
needs and expectations.
Linux is great, and developing open-source
for it is perfectly fine. I suspect Linux users make broad
use of open-source translation tools.
But open-source is wider than that: make Windows and
Mac open-source tools and they will get used. I've
been a little frustrated by the anti-Microsoft attitude
part of the open-source community has: whether we
like it or not, freedom includes the freedom to choose
Microsoft's OS. We should not dictate to users
which system they should use (isn't that one of
the main reasons behind the Linux and open-source
movements?). There are plenty of good reasons some
users want to stick with Windows, and the
"If you are not with us you are against us" attitude
that we sometimes see is not helping anyone.
Sorry for venting a bit... But somewhere
along the road the open-source community seems to
have lost sight of the *Users* and become, at least
partially, a stage for a platform war. I find
that profoundly sad: I was expecting better from a
community that uses words like "open", "choice"
and "freedom" so often.
If we want a given open-source tool
to be embraced by many users, it has to answer the
requirements of the majority of these potential users:
and today these users are running Windows. Whether
we can reach them by developing Java or Python tools
better fitted to work in Windows, or pure-Windows
tools, is a choice for each developer to make. I think
there are also two more problems with the open-source
translation tools:
One of them (hopefully) stems from
the first one: most of the tools are "translation
editor"-type applications, and while there are
plenty of good projects, very few would be able to
win a side-by-side comparison with the mainstream
commercial applications. For example, many open-source
projects work without problems with software strings
but very quickly become difficult to use with large documentation-oriented
input. I think this has a lot to do with the fact that
the current open-source tools don't have a large user
base and don't have to deal with the diversity and
amount of feedback commercial tools get. In other
words, I think it is difficult to become competitive
if you don't reach a critical mass, and today a lot
of the open-source tools don't reach that threshold
because of problem #1.
The last issue I can think of is fragmentation:
There are many little open-source projects, and it
somehow scatters the energy and good will of the users.
While I do like having several choices, and I'm all
for diversity, I wonder how much of that is impacting
the quality of the tools. If the different developers
were collaborating more maybe they would achieve more.
Maybe concentrating on two or three of the main translation
editor projects would help. The Open Language Tools
and OmegaT seem to be the projects drawing the most
users. I have no idea if grouping the effort of different
developers is possible, but it feels like the projects
are losing something by not having a better collaboration.
OSU: If you could
create a new open source application for the translation
industry, what would it be?
YS: There are a lot of different
tools that are still needed:
Something has to be done about graphics: not just
filters for Photoshop or Illustrator, but a better
way to interactively select what needs to be translated
in graphics, some bitmap-to-text conversion mechanism,
and so on, all applied to translation. Another
useful tool would be an application to visualize extractable
text in context, allow a user to mark parts that
should not be translated, and write that information back
into the source file using localization directives.
This way, the next time the text is extracted, the
non-translatable parts are recognized. This could
be very useful for preparing JavaScript or PHP files,
etc., where many strings have nothing to do with translatable
text.
How about an open-source machine translation
engine? This could provide translators with a
channel to make such software more to their liking,
and more usable within their work environment. I
think the old Logos system is being ported to open-source,
so maybe this is already happening. There are also
some interesting applications that could be built
on Web Service interfaces.
But more importantly, I would try to
finish a project, or at least reach a stage where
it's no longer Beta, before starting a new one.
I see some open-source projects that never go very far,
and these are not helping the users. Obviously it all
depends on your initial objectives: it's good to try
and experiment, but it could be helpful to clearly
state the aims of the project to the user community.
OSU: Do you have
any new or upcoming projects that you can tell us
about?
YS: I've already mentioned some of
the components we will have in the next release of
the Okapi Framework. Here are a few more things that
are either "under construction" or in the
"thinking about it" phase:
We'll have a segmentation component. It supports
SRX (Segmentation Rules eXchange) and has an integrated
editor to create and test rules. We'll try to provide
SRX files for the default rules of the main translation
tools. Hopefully our Script Filter (a regular-expression-based
filter) will be in good enough shape to be usable
at the time of the next release. We'll have an MT
Query interface with a small implementation using
Google's translation engine. Nothing earth-shattering,
but it could be handy in some situations, for example
to do pseudo-translation. The XML Filter is starting
to be ported. Along with it we want to provide support
for the upcoming W3C ITS standard (Internationalization
Tag Set). And many more little and big ideas. You
don't have to wait for releases to see them: one of
the perks of open-source tools is that you can always
download the full source code of the project and take
a look at the latest "development" version.
But, obviously, for many it's often easier to wait
for the formal release with the installers.
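SRX rules pair a "before break" pattern with an "after break" pattern and say whether a break is allowed there; at each position the first matching rule wins, so "no break" exceptions (such as abbreviations) are listed before the general break rules. The Python sketch below imitates that ordering logic with plain regular expressions. It is an assumption-laden illustration: the `Rule` type and `segment` function are hypothetical, rules are Python objects rather than SRX XML, SRX features like language maps and cascading are ignored, and whitespace stays attached to the preceding segment.

```python
import re
from dataclasses import dataclass
from typing import List


@dataclass
class Rule:
    """SRX-like rule: break (or not) where `before` ends and `after` starts."""
    is_break: bool
    before: str
    after: str


def segment(text: str, rules: List[Rule]) -> List[str]:
    """Split text into segments; at each position the first matching rule wins."""
    compiled = [(r.is_break,
                 re.compile(r"(?:" + r.before + r")\Z"),  # must end at the position
                 re.compile(r.after))                    # must start at the position
                for r in rules]
    segments, start = [], 0
    for pos in range(1, len(text)):
        for is_break, before, after in compiled:
            # endpos=pos makes \Z match at the candidate break position.
            if before.search(text, 0, pos) and after.match(text, pos):
                if is_break:
                    segments.append(text[start:pos])
                    start = pos
                break  # first matching rule decides; later rules are ignored
    segments.append(text[start:])
    return segments


rules = [
    Rule(False, r"\b(?:Mr|Dr)\.\s+", r"[A-Z]"),  # exception: no break after "Dr. "
    Rule(True, r"[.?!]+\s+", r"[A-Z]"),          # break after sentence punctuation
]
print(segment("Dr. Smith arrived. He sat down! Done.", rules))
```

Putting the exception first is the key design point: if the general break rule came first, "Dr. Smith" would be split in two.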
OSU: Yves, thanks
very much for sharing your insights and expertise
with Open Source Update.
YS: Thank you for giving me the opportunity
to talk about open-source tools and best of luck with
the Open Source Update newsletter.