The Okapi Framework: Q and A with Yves Savourel of ENLASO


By Corinne McKay,
ATA-certified French-to-English translator based in Boulder, Colorado, United States

corinne@translatewrite.com
www.translatewrite.com

As of October 2005, ENLASO has started porting a set of its localization tools to the open-source Okapi Framework. The project's developer, Yves Savourel, has been involved in developing standard XML formats for translation, such as XLIFF and TMX. Open Source Update recently spoke with Yves, who works as a Localization Solutions Architect at ENLASO's headquarters in Boulder, Colorado.

Open Source Update: Please tell us a bit about the tools that are included in the Okapi framework.
Yves Savourel: There are three main aspects to the Okapi Framework:
First, we have the interface specifications. These are just API definitions for a few types of objects; they are at the core of the framework and allow all the different pieces to work together. Then we have the components: implementations of the interfaces, along with other small pieces of reusable code. Among the different types of components, two are central to Okapi: the Filters and the Utilities. And lastly we have the applications: normal end-user applications that use the components (mostly the filters and the utilities) to provide functionality to users.

It sounds a bit complicated, but it's really simple: imagine Okapi as a box of Legos. The interface specifications describe how the bricks connect together, the components are the bricks (some very simple, some more complex), and the applications are the different things you build with the bricks.
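To make the Lego analogy concrete, here is a minimal Python sketch of the three layers. It is not Okapi's actual API (the framework described here is written in C#), and the names `TextUnit`, `Filter`, `Utility`, `PlainTextFilter`, `BracketUtility` and `run` are hypothetical. The protocols play the role of the interface specifications, the two classes are components, and `run` stands in for an application.

```python
from dataclasses import dataclass
from typing import List, Protocol


@dataclass
class TextUnit:
    """One translatable string plus the line it came from (hypothetical type)."""
    line: int
    text: str


class Filter(Protocol):
    """Interface spec: how any file format exposes its translatable text."""
    def extract(self, content: str) -> List[TextUnit]: ...
    def merge(self, content: str, units: List[TextUnit]) -> str: ...


class Utility(Protocol):
    """Interface spec: a processing step that works on any filter's output."""
    def process(self, units: List[TextUnit]) -> List[TextUnit]: ...


class PlainTextFilter:
    """Component: treats every non-blank line as one translatable unit."""
    def extract(self, content: str) -> List[TextUnit]:
        return [TextUnit(i, line) for i, line in enumerate(content.split("\n"))
                if line.strip()]

    def merge(self, content: str, units: List[TextUnit]) -> str:
        lines = content.split("\n")
        for unit in units:
            lines[unit.line] = unit.text  # write each unit back where it came from
        return "\n".join(lines)


class BracketUtility:
    """Component: wraps each unit in brackets (a toy rewriting step)."""
    def process(self, units: List[TextUnit]) -> List[TextUnit]:
        return [TextUnit(u.line, f"[{u.text}]") for u in units]


def run(flt: Filter, utility: Utility, content: str) -> str:
    """Application: connects a filter and a utility only through the interfaces."""
    units = flt.extract(content)
    return flt.merge(content, utility.process(units))
```

For example, `run(PlainTextFilter(), BracketUtility(), "Hello\n\nWorld")` returns `"[Hello]\n\n[World]"`. Because `run` depends only on the two protocols, another filter (for PO or Properties files, say) could be plugged in without touching the utility. In this sketch, that is the payoff of defining the interfaces first.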

Currently all the implementations are in C# for .NET, and we have two applications: Tikal and Olifant. Tikal is a command-line tool that lets you run any of the utilities, and Olifant is a TM manager. The utilities we currently have are: the Text Extraction utility, which extracts translatable text from an input file into various formats for translation, such as RTF or XLIFF; the Text Merging utility, which merges text extracted to XLIFF back into its original format; and the Text Rewriting utility, which extracts and merges translatable text in one pass and modifies the text as specified along the way.

For example, you can pseudo-translate the text, or remove it entirely so that you can compare two files by their codes alone. All utilities are fed by filters. Currently two filters are implemented: one for PO files and one for Properties files. That is all in the latest release, but we already have more filters and utilities ready for the next one: a .NET Resource filter (for both ResX and compiled .resources files), an Encoding Conversion utility, a Byte-Order-Mark Conversion utility, and a Line-Break Conversion utility. In addition, Rainbow, a GUI application for launching utilities, will also be part of the upcoming release.
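Pseudo-translation with a Properties filter can be pictured as extraction, rewriting and merging done in one pass. The sketch below assumes a deliberately simplified Properties format: one `key=value` or `key: value` entry per line, `#` and `!` comments, and no line continuations or escape handling. The function name, the `mode` parameter and the accent mapping are illustrative choices, not Okapi's.

```python
import re

# Accented look-alikes for pseudo-translation (an arbitrary, illustrative mapping).
ACCENTS = str.maketrans("aeiouAEIOU", "àéîõüÀÉÎÕÜ")

# Simplified Properties entry: key, separator with surrounding spaces, then the value.
# Comment lines (starting with # or !) and blank lines do not match.
ENTRY = re.compile(r"^(\s*[^#!\s][^=:]*?\s*[=:]\s*)(.*)$")


def rewrite_properties(content: str, mode: str = "pseudo") -> str:
    """Extract each value, rewrite it, and merge it back in a single pass."""
    out = []
    for line in content.splitlines():
        match = ENTRY.match(line)
        if not match:  # comments, blank lines and anything unrecognized pass through
            out.append(line)
            continue
        prefix, value = match.groups()
        if mode == "pseudo":
            value = "[" + value.translate(ACCENTS) + "]"  # brackets expose truncation
        elif mode == "remove":
            value = ""  # keep only keys and structure, for comparing files
        out.append(prefix + value)
    return "\n".join(out)
```

With this sketch, `rewrite_properties("# UI strings\ngreeting = Hello {0}\nbye: Bye")` keeps the comment and the `{0}` placeholder intact and yields `greeting = [Héllõ {0}]` and `bye: [Byé]`. The `"remove"` mode empties every value instead. Note that `splitlines()` drops a trailing newline, which a real filter would preserve.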

OSU: Where does the name "Okapi" come from?
YS: Just a random choice, I think. Okapis are cool animals that not many people know about, so I thought we would give them a chance to become a little more famous. Seriously, the name was probably inspired by something from about 10 years ago: when I was working at ILE (a localization company that was bought by ICI/Intl.com, then by Lionbridge), we had some ideas about an [O]pen [K]it [API] for localization tools. And yes, it's true: okapis have a blue tongue and can use it to lick their ears.

OSU: How did the idea to open source these tools come about?
YS: ENLASO has always offered its localization tools as freeware; the next step was to make them open-source. There are many reasons to go open-source. It gives us a larger and more diverse set of testers. It allows others to participate and work on parts of the toolset that are less urgent for us and that we have less time for. By offering common interfaces and tools to plug into them, we may get third parties to develop filters and utilities that we can also use. It ensures better continuity in development: if people working on the tools leave the company, they can still continue to participate afterward. When needed, we can provide our customers with solutions based on non-proprietary software. And there are many more reasons.

OSU: What kind of response have you gotten to the release so far?
YS: We got a lot of positive feedback on the move to open-source, but so far not many offers of concrete help. People are understandably cautious. We have to prove that our framework is not just a way of using open-source as a marketing tool, and that is fine. In some ways it does not really matter to us what response we get: we are simply doing in open-source the things we would otherwise be doing internally.

The nature of the tools themselves does not help: filters, for example, are quite low-level, "boring" programs to write, and they are much less likely to generate enthusiasm than, say, a translation editor. To a large extent we are still making the bricks; people will get more interested once we start building things with them.

As for usage: because the tools themselves have been freeware for several years, we are not anticipating much change in how much they are used. For now, the material available is not yet at the same level as the "old" freeware; we still have to port a number of parts to Okapi.

OSU: What advice would you offer to other for-profit companies that are interested in open sourcing their in-house software?
YS: I guess you have to do it for the right reasons. It may not always be the right path; it all depends on many factors that only each company can judge.

OSU: What do you hope to do with the tools now that they are available to the public?
YS: We will continue working on them. It's a never-ending job: adding new features and new filters, implementing user feedback, and so forth. One thing we hope for is some help with development, testing, and documentation. That would allow us to take the tools a little further and provide new functionality that we might not have been able to tackle alone. There is plenty to do before we run out of work.

OSU: Who is the target user for these tools?
YS: I guess it depends on which part of the framework you look at. An application like Olifant (a TM manager) is oriented more toward translators, project managers and, to some degree, localization engineers. The low-level components, like the filters, are more for power users or engineers: people who write scripts to automate parts of the localization process or testing/QA. And obviously any part of the Framework can also be reused in larger applications, so developers can find something useful too. Utilities like the Encoding Conversion utility can be handy for just about anyone, even outside the localization/translation industry.
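The Encoding Conversion, Byte-Order-Mark Conversion and Line-Break Conversion utilities mentioned earlier in the interview can be approximated with Python's standard codecs alone. The hypothetical `convert_text_file` function below is a sketch, not the Okapi utilities. It decodes from a source encoding, drops any leading BOM, normalizes line breaks, re-encodes, and optionally adds a UTF-8 BOM.

```python
import codecs


def convert_text_file(data: bytes, src_encoding: str, dst_encoding: str = "utf-8",
                      add_bom: bool = False, line_break: str = "\n") -> bytes:
    """Re-encode text, normalize its line breaks, and optionally add a UTF-8 BOM."""
    text = data.decode(src_encoding)
    # Drop a leading BOM (U+FEFF) so it is never duplicated in the output.
    if text.startswith("\ufeff"):
        text = text[1:]
    # Normalize CRLF and lone CR to LF, then convert to the requested break.
    text = text.replace("\r\n", "\n").replace("\r", "\n").replace("\n", line_break)
    out = text.encode(dst_encoding)
    # Only plain UTF-8 output gets a BOM added here
    # (Python's generic "utf-16" codec already writes one).
    if add_bom and codecs.lookup(dst_encoding).name == "utf-8":
        out = codecs.BOM_UTF8 + out
    return out
```

For instance, converting Windows-1252 text with CRLF line breaks to UTF-8 with a BOM is `convert_text_file("Café\r\nline2".encode("cp1252"), "cp1252", add_bom=True)`. Folding all three conversions into one function is only for brevity. In a framework like the one described, each would more naturally be a separate utility.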

OSU: What do you think is standing between the translation/localization industry and more widespread use of open source software?
YS: I think there is nothing that stands between the open-source tools and the users. The main problem is that most of the tools are not in front of most of the users. I may be a little blunt here, but hopefully no one will be offended; it's just an opinion based on what I see today:

Most open-source translation/localization tools are Linux/Java oriented, while most of the potential users are on Windows. The users exist: if we look at Wordfast (not even an open-source tool), we can see how much enthusiasm it generates. But many tools completely miss the target because they simply don't aim at it: the users are on Windows. One could answer that Java-based tools can run on Windows. That's true, but I've often noticed that such applications don't quite "fit" into the Windows environment: a lot of little details don't work the way users expect (shortcuts, clipboard actions, input methods, etc.), and users give up on the tool very quickly. Maybe I'm wrong, but I don't think you can really develop a good GUI solution for one platform from another one. To better understand your users' needs and expectations, you often have to be a user of their environment yourself.

Linux is great, and developing open-source for it is perfectly fine. I suspect there is broad use of open-source translation tools among Linux users. But open-source is wider than that: make Windows and Mac open-source tools and they will get used. I've been a little frustrated by the anti-Microsoft attitude of part of the open-source community: whether we like it or not, freedom includes the freedom to choose Microsoft's OS. We should not dictate to users which system they should or should not use (isn't that one of the main reasons behind the Linux and open-source movements?). There are plenty of good reasons why some users want to stick with Windows, and the "if you are not with us, you are against us" attitude we sometimes see is not helping anyone.

Sorry for venting a bit, but somewhere along the road the open-source community seems to have lost sight of the *Users* and become, at least partially, a stage for a platform war. I find that profoundly sad; I was expecting better from a community that uses words like "open", "choice" and "freedom" so often.

If we want a given open-source tool to be embraced by many users, it has to meet the requirements of the majority of those potential users, and today these users are running Windows. Whether we reach them by developing Java or Python tools that fit better into Windows, or pure-Windows tools, is a choice for each developer to make. I think there are also two more problems with open-source translation tools:

One of them (hopefully) stems from the first: most of the tools are "translation editor"-type applications, and while there are plenty of good projects, very few would win a side-by-side comparison with the mainstream commercial applications. For example, many open-source projects handle software strings without problems but quickly become difficult to use with large, documentation-oriented input. I think this has a lot to do with the fact that the current open-source tools don't have a large user base and don't have to deal with the diversity and volume of feedback that commercial tools get. In other words, I think it is difficult to become competitive if you don't reach a critical mass, and today many open-source tools don't reach that threshold because of problem #1.

The last issue I can think of is fragmentation: there are many small open-source projects, and that somehow scatters the energy and goodwill of the users. While I do like having several choices, and I'm all for diversity, I wonder how much this affects the quality of the tools. If the different developers collaborated more, maybe they would achieve more; concentrating on two or three of the main translation editor projects might help. The Open Language Tools and OmegaT seem to be the projects drawing the most users. I have no idea whether pooling the efforts of different developers is possible, but it feels like the projects are losing something by not collaborating better.

OSU: If you could create a new open source application for the translation industry, what would it be?
YS: There are a lot of different tools that are still needed:
Something has to be done about graphics: not just filters for Photoshop or Illustrator, but a better way to interactively select what needs to be translated in graphics, some bitmap-to-text conversion mechanism, and so on, all applied to translation. Another useful tool would be an application that displays extractable text in context, lets a user mark the parts that should not be translated, and records that information in the source file using localization directives. That way, the next time the text is extracted, the non-translatable parts are recognized. This could be very useful for preparing JavaScript files, PHP files, etc., where many strings have nothing to do with translatable text.
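For JavaScript, the directive idea could work roughly as follows: an extractor pulls out string literals but skips any line the user has marked as non-translatable. In this minimal sketch, the `// _skip` marker, the function name and the line-level granularity are all assumptions made for illustration. The regular expression handles simple single- or double-quoted literals with backslash escapes. It ignores template literals and does not understand comments beyond the marker.

```python
import re
from typing import List

# A single- or double-quoted literal, allowing backslash escapes inside it.
STRING_LITERAL = re.compile(r"""(["'])((?:\\.|(?!\1).)*)\1""")

# Hypothetical localization directive: a comment marking a line as non-translatable.
SKIP_DIRECTIVE = re.compile(r"//\s*_skip\b")


def extract_strings(source: str) -> List[str]:
    """Return string literals from JavaScript-like source, skipping marked lines."""
    found: List[str] = []
    for line in source.splitlines():
        if SKIP_DIRECTIVE.search(line):
            continue  # the user marked this line's strings as code, not text
        found.extend(m.group(2) for m in STRING_LITERAL.finditer(line))
    return found
```

Suppose a CSS class name such as `el.className = "hidden";` carries a trailing `// _skip`. Only the user-facing strings, such as `"Save changes?"`, are then extracted, and the marking persists in the file for the next extraction run.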

How about an open-source machine translation engine? It could give translators a way to make such software more to their liking and more usable within their work environment. I think the old Logos system is being ported to open-source, so maybe this is already happening. There are also some interesting applications that could be built on Web Service interfaces.

But more importantly, I would try to finish a project, or at least get it out of Beta, before starting a new one. I see some open-source projects that never go very far, and these are not helping users. Obviously it all depends on your initial objectives: it's good to try and experiment, but it could help to clearly state the project's aims to the user community.

OSU: Do you have any new or upcoming projects that you can tell us about?
YS: I've already mentioned some of the components we will have in the next release of the Okapi Framework. Here are a few more things that are either "under construction" or in the "thinking about it" phase:
We'll have a segmentation component. It supports SRX (Segmentation Rules eXchange) and has an integrated editor for creating and testing rules. We'll try to provide SRX files for the default rules of the main translation tools. Hopefully our Script Filter (a regular-expression-based filter) will be in good enough shape to be usable by the next release. We'll have an MT Query interface with a small implementation using Google's translation engine. Nothing earth-shattering, but it could be handy in some situations, for example for pseudo-translation. The XML Filter is starting to be ported, and along with it we want to provide support for the upcoming W3C ITS (Internationalization Tag Set) standard. And there are many more ideas, little and big. You don't have to wait for releases to see them: one of the perks of open-source tools is that you can always download the full source code of the project and look at the latest "development" version. But obviously, for many people it's easier to wait for the formal release with the installers.
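SRX describes segmentation as an ordered list of rules. Each rule pairs a regular expression for the text before a candidate break with one for the text after it, plus a flag saying whether to break there. At each position, the first matching rule decides. The Python sketch below illustrates only that core idea. It is not the Okapi segmenter: it ignores SRX's XML format, language maps and cascading, and `Rule` and `segment` are hypothetical names.

```python
import re
from dataclasses import dataclass
from typing import List


@dataclass
class Rule:
    """An SRX-style rule: regexes for the text before and after a candidate break."""
    is_break: bool
    before: str
    after: str


def segment(text: str, rules: List[Rule]) -> List[str]:
    """Split text wherever the first rule matching a position says to break."""
    # Anchor each "before" pattern at the candidate position with \Z.
    compiled = [(r.is_break, re.compile(rf"(?:{r.before})\Z"), re.compile(r.after))
                for r in rules]
    segments: List[str] = []
    start = 0
    for pos in range(1, len(text)):
        for is_break, before, after in compiled:
            # endpos=pos makes the search behave as if the text ended at pos.
            if before.search(text, 0, pos) and after.match(text, pos):
                if is_break:
                    segments.append(text[start:pos])
                    start = pos
                break  # the first matching rule decides, as in SRX
    segments.append(text[start:])
    return segments


rules = [
    Rule(False, r"\b(?:Mr|Dr)\.\s+", r"[A-Z]"),  # no break after common titles
    Rule(True, r"[.?!]\s+", r"[A-Z]"),           # break after sentence punctuation
]
print(segment("Hello Dr. Smith. How are you? Fine.", rules))
# ['Hello Dr. Smith. ', 'How are you? ', 'Fine.']
```

Listing the "no break" rule for abbreviations before the general sentence rule is what keeps "Dr. Smith" together; reversing the order would split it. The scan re-searches from the start of the text at every position, which is acceptable for a sketch but not for long documents.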

OSU: Yves, thanks very much for sharing your insights and expertise with Open Source Update.
YS: Thank you for giving me the opportunity to talk about open-source tools and best of luck with the Open Source Update newsletter.
