
Dealing With Those Darn PDFs








If you look through the archives of discussion lists for translators, two questions come up more often than any others. First, what are the differences among the various computer-assisted translation (CAT) tools? Second, do any of them support PDF files, and if not, what's the best way to translate those files?

I've written a lot about the first question, but I've always shied away from answering the second because there is just no really good answer.

First of all, no CAT tool really supports PDF files. Wordfast (see www.wordfast.net) does list PDF among its supported formats, but its manual adds a lot of disclaimers about the effectiveness of its method, which works through MS Word.

Second, no CAT tool ever WILL support PDF files (I would love to be proven wrong on this one!). One of the major reasons PDFs exist is content protection (yes, PDF stands for "Portable Document Format," but in my opinion it could just as well stand for "Protected Document Format"), and that gives you some idea of why it's so hard to get text out of them.

Newer editions of Adobe Acrobat can save a PDF as an RTF, text, or XML file, but these formats have the same problems you run into when you simply copy and paste content out of a PDF file. Text that used to be a field (page numbers or cross-references) is now plain text, non-visible fields (such as index entries) are gone, no styles are preserved, the formatting is lost, graphics are ignored, and, worst of all, every line break is replaced with a paragraph mark, which makes the text essentially unusable for CAT tools. And if you're really out of luck, all the text will be garbled if your system does not support the fonts that were used in the PDF.
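If you have no choice but to work with text copied or exported out of a PDF, the line-break problem can at least be partly undone before the text goes into a CAT tool. The Python sketch below uses a hypothetical helper, `rejoin_lines`, which is not part of Acrobat, Wordfast, or any other tool mentioned here. It assumes plain text as input and applies a simple heuristic. A blank line always ends a paragraph. A line that ends in sentence punctuation and is noticeably shorter than the longest line (controlled by the assumed `short_ratio` parameter) is also treated as the end of a paragraph. Every other line break is treated as a soft wrap and turned into a space.

```python
def rejoin_lines(text: str, short_ratio: float = 0.8) -> str:
    """Merge hard-wrapped lines from PDF-extracted text into paragraphs.

    This is a heuristic sketch, an assumption rather than any tool's method:
    - a blank line always ends a paragraph;
    - a line ending in . ! ? or : that is shorter than short_ratio times
      the longest line is treated as the last line of a paragraph;
    - every other line break is assumed to be a soft wrap (becomes a space).
    """
    lines = [line.strip() for line in text.splitlines()]
    longest = max((len(line) for line in lines), default=0)
    paragraphs: list[str] = []
    current: list[str] = []
    for line in lines:
        if not line:
            # Blank line: close the paragraph collected so far, if any.
            if current:
                paragraphs.append(" ".join(current))
                current = []
            continue
        current.append(line)
        ends_sentence = line.endswith((".", "!", "?", ":"))
        is_short = len(line) < short_ratio * longest
        if ends_sentence and is_short:
            # Short line ending a sentence: probably a real paragraph end.
            paragraphs.append(" ".join(current))
            current = []
    if current:
        paragraphs.append(" ".join(current))
    # One paragraph per output line, i.e. one paragraph mark per paragraph.
    return "\n".join(paragraphs)


if __name__ == "__main__":
    sample = (
        "Every line break in text copied out of a PDF\n"
        "becomes a paragraph mark, which makes the text\n"
        "nearly unusable for CAT tools.\n"
        "A conversion step has to undo this.\n"
    )
    print(rejoin_lines(sample))
    # Every line break in text copied out of a PDF becomes a paragraph mark, which makes the text nearly unusable for CAT tools.
    # A conversion step has to undo this.
```

A heuristic like this only addresses the paragraph marks. It does not bring back fields, styles, formatting, or graphics. It will also misjudge headings without final punctuation and words hyphenated across a line break, so the result still needs a careful look.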

So, not much light on the horizon for PDF translation, except...

All PDF files were originally created in a format other than PDF. Many clients send translators the PDF simply because they're too lazy to look for the original files. After all, the translator's headaches are not their headaches, at least not until you make them so. One easy way of doing that is to charge a hefty surcharge! I have found it quite revealing how many "lost" source files are suddenly discovered.

Obviously, there are cases where this does not work. The source file may truly be impossible to find (or access). It may be in a format you could not support anyway, for example if it was created in Quark, InDesign, or one of the other expensive DTP programs that many translators don't own. Or legal restrictions may prevent the client from simply giving you the source files. Whatever the reason, at that point you will have to find another solution.

There are a great many programs on the market that convert PDFs to RTF or HTML files (see for instance http://www.pdfstore.com/category.asp?CtgID=7), and over the years I have worked, unhappily, with quite a few of them. Most of them do not solve the problem of the paragraph mark at the end of each line. The ones that do solve it add another layer of formatting complications by placing everything in text boxes. And any text inside graphics remains part of the graphics and cannot be translated directly.

The one solution that I like and use very productively is an optical character recognition (OCR) program for scanning, such as OmniPage (see www.omnipage.com/omnipage) or ABBYY FineReader (see www.abbyy.com/finereader). Newer versions of these programs can convert PDF files into Word documents without an actual scan (they perform the recognition internally). If the text in the original PDF is clearly legible, the results are great, particularly because even text in graphics is turned into translatable text! The unnecessary and annoying paragraph marks are eliminated, and the only thing that doesn't work is the re-conversion of former fields into actual fields. This means there may still be some work for you to do once your PDF is converted, but it is significantly less than with other solutions.

Both OmniPage and ABBYY have realized that this has become an increasingly popular feature of their OCR programs (which are themselves fairly expensive), so they have created much less expensive stand-alone programs geared specifically toward this process: PDF Converter (www.omnipage.com/pdfconverter) and PDF Transformer (www.abbyy.com/pdftransformer). In some versions, PDF Converter can even create PDFs.

 

 

 


© International Writers' Group. Excerpt from the Tool Kit Newsletter, a biweekly newsletter for people in the translation industry who want to get more out of their computers. For more information see www.internationalwriters.com/toolkit









Copyright © 2003-2024 by TranslationDirectory.com