Archive for the ‘documentation localization’ Category

We all [heart] PDFs!

December 6th, 2007

Any good localization manager (vendor- or client-side) knows that there’s very little you can do with a PDF as a source file. Yet time and again, we confront the best intentions of our customers and co-workers who say, “It’s not a very large file, so it shouldn’t cost much to translate. I’ll send it to you.” They send us a PDF.

This has happened to me with two new clients this week. We’d all like more translation business, and it’s convenient that PDF exists as a lingua-franca format for us, but it is something of a double-edged sword.

PDFs contain everything we need to view a file, but not everything we need to extract the text, formatting, callouts, frames, tags, etc. from it. Creating a localization estimate from a PDF is asking for trouble, because the PDF smooths over a multitude of issues that we’ll encounter once we have the source files. Most of them concern text that we know requires translation, but which is not “live” in the PDF and may not be live in the source file from which the PDF came. It’s the equivalent of hard-coded strings in software, or localizing a binary without the .properties or resource files.

There are, of course, utilities for converting PDF to RTF to capture the live text and formatting, and that’s better than nothing, but it’s probably still a far cry from the Quark or InDesign or even MS Word file from which you started. I’ve sent one of my new clients back to the drawing board several times this week already:

  1. He gave me a PDF and I asked for the source file.
  2. He found the source file (Quark) and I asked for the Photoshop files from which the text-bearing graphics had originated.
  3. There were tables in the Quark file that were Illustrator objects, because these looked much better than Quark’s native tables.
  4. Another PDF of a Word document contains eight graphs created by engineers all over the building. He said he’d try to obtain the original artwork (probably PowerPoint, every engineer’s favorite Etch-a-Sketch), but I’ll be surprised if he can find it.

So, folks, we love to localize your pieces, but try to keep tabs on all the bits that you drop into them. We can do things so much better-cheaper-faster when you do.

Whaddya know? They asked me first this time!

October 19th, 2007

Do you spend a lot of your time running to catch up to the train? Have you ever been surprised in the middle of a meeting by project plans that were well underway with no thought given yet to localization? Are you getting used to it?

What if they asked you first (or at least early on) about the project’s implications for internationalization and localization? Would you know how to react?

This certainly caught me by surprise a few months ago. A client called me in for consultation. He didn’t want me to manage the upcoming localization of his user manuals; he wanted me to review and edit the English versions so that they would be ready to localize.

This client, though small, is enlightened. The company is selling English, French, German, Spanish and Japanese versions of several products, and it has a hand-in-glove relationship with its localization company. It knows where its global bread is buttered.

I jumped at the chance to work with people thinking this far in advance, so I reviewed the manuals and submitted changes, almost all of which were accepted.

How can you review/edit documentation with an eye to translating it?

  1. Take advantage of redundancy. Ensuring that identical sentences and paragraphs remain identical is a good way to lower per-word translation costs. Turn the text into a bookmark at its first occurrence, then invoke or cross-reference that bookmark at subsequent occurrences.
  2. Ensure that the product matches the documentation. Not all organizations get around to this, believe it or not, and it becomes a bit of value added by the internationalization/localization function.
  3. Standardize terms. Especially in companies without a well developed team of writers, manuals end up with pairs or trios of synonyms that will vex translators and add no information, so take the liberty of eliminating one in favor of the other:
    • determine/specify
    • based on/according to
    • click the button/click on the button/select the button
    • lets you/enables you to/allows you to
  4. Mention errors and inconsistencies that have nothing to do with internationalization. Again, you increase the perceived value of the localization function. Even though the result doesn’t affect the localized products, the Localization Department (you) is contributing to a better core product.
  5. Axe a few “dead” words. They add little to the explanation, will probably not survive translation, and inflate wordcount:
    • unique
    • basically
    • popular
    • congratulations
    • very much
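
Steps 3 and 5 lend themselves to a quick script. What follows is only a minimal sketch in Python, not the process I actually used on this review: the phrase groups and the dead-word list are copied from the examples above, and the names VARIANT_GROUPS, DEAD_WORDS and review() are made up for illustration. It assumes you have already exported the manual to plain text.

```python
import re

# Hypothetical variant groups, copied from the examples in step 3.
VARIANT_GROUPS = [
    ["lets you", "enables you to", "allows you to"],
    ["based on", "according to"],
    ["click the", "click on the", "select the"],
]

# "Dead" words, copied from the examples in step 5.
DEAD_WORDS = ["unique", "basically", "popular", "congratulations", "very much"]


def count_phrase(text, phrase):
    """Count case-insensitive, whole-phrase occurrences of phrase in text."""
    pattern = r"\b" + re.escape(phrase) + r"\b"
    return len(re.findall(pattern, text, flags=re.IGNORECASE))


def review(text):
    """Return (mixed_groups, dead_counts) for the text of one manual.

    mixed_groups lists every variant group in which more than one
    variant actually occurs; dead_counts maps each dead word found
    to its number of occurrences.
    """
    # Collapse line wraps so that a phrase split across lines still matches.
    text = " ".join(text.split())

    mixed_groups = []
    for group in VARIANT_GROUPS:
        used = {}
        for phrase in group:
            n = count_phrase(text, phrase)
            if n:
                used[phrase] = n
        if len(used) > 1:
            mixed_groups.append(used)

    dead_counts = {}
    for word in DEAD_WORDS:
        n = count_phrase(text, word)
        if n:
            dead_counts[word] = n
    return mixed_groups, dead_counts
```

A script like this only tells you where to look. Deciding which synonym survives, and whether a given “unique” really is dead weight, is still a job for a human editor.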

By the way, the review took longer than I’d anticipated, so if you have a similar opportunity, don’t bid a flat fee the first time.

Interested in this topic? Have a look at Improved Docs through Localization.

Machine translation in action

July 20th, 2007

Has your boss asked you to use Google or AltaVista or some other flavor of machine translation to lower your translation costs?

Here’s somebody who has put his money where your boss’ mouth is.

Controlled language website attracts visitors from 110 countries

www.muegge.cc, a website dedicated to demonstrating the value of controlled language authoring and machine translation (MT), has attracted visitors from more than 110 countries since its launch in the summer of 2006. One of the unique features of this website is the fact that it uses Google language tools to automatically translate the site’s content into 15 language pairs such as German to English or English to Simplified Chinese. The website was created from the ground up for MT, and all text was written in compliance with the CLOUT rule set, a controlled language designed specifically for MT.

muegge.cc, E-mail: info@muegge.cc, Web: http://www.muegge.cc

How do they do it? By controlling the text that goes into the translation machine. The simpler, more predictable and better structured the text, the more likely the machine is to generate a satisfactory translation. In other words, machine translation would probably work better on a page of Hemingway than on a page of Shakespeare or Faulkner.

Don’t forget, though: What you save in translation, you’ll spend in whipping your writers into line. It may not look like real dollars, but it’s time.

And time, as they say, is money.

Localizing RoboHelp projects

May 11th, 2007

Is it time for you to localize your RoboHelp projects? What’s involved?

“RoboHelp project” is shorthand for “compiled help system.” When this lives on a Windows client computer, it usually takes the form of HTML Help (CHM) files. There are other variations, such as Web Help, which is also compiled HTML but does not run on the client.

The projects are a set of HTML files, authored in a tool such as–but not limited to–RoboHelp, then compiled into a binary form that allows for indexing, hierarchy and table of contents. Other platforms (Mac OS, Linux, Java) require a different compiler, but the theory is the same.

If you’ve done localization before, you’ll find that RoboHelp projects are relatively easy, compared to a software project. RoboHelp (or whatever your authoring/compilation environment may be) creates a directory structure and file set that is easy to archive and hand off. It includes a main project file, table of contents file and index file. In fact, it’s even possible in a pinch to simply hand off the compiled file, and have the localizers decompile it; the files they need will fall into place as a result of the decompilation.

Although you may think of the project as a single entity for localization purposes, each HTML page is a separate component. There may be large numbers of these pages that don’t change from one version of your product to the next; nevertheless, you need to hand them off with the project, and you’ll likely be charged for a certain amount of “touching” that the localizer’s engineers will need to do. You may be able to save them some work and yourself some money by analyzing the project and determining which pages have no translatable changes, but by and large you should consider the costs for touching unchanged pages an unavoidable expense.
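
If you want to try that analysis yourself, here is a rough sketch in Python. It assumes you have two project directories on disk (the previous version and the current one) and simply compares the HTML files byte for byte. The function name classify_pages() and the file pattern are my own invention for the example, not anything RoboHelp provides.

```python
import hashlib
from pathlib import Path


def file_digest(path):
    """Return the SHA-256 digest of a file's bytes."""
    return hashlib.sha256(Path(path).read_bytes()).hexdigest()


def classify_pages(old_dir, new_dir, pattern="*.htm*"):
    """Sort the HTML pages of two project versions into four buckets.

    Pages are matched by their path relative to the project root and
    compared byte for byte.
    """
    old_root, new_root = Path(old_dir), Path(new_dir)
    old = {p.relative_to(old_root).as_posix(): file_digest(p)
           for p in old_root.rglob(pattern) if p.is_file()}
    new = {p.relative_to(new_root).as_posix(): file_digest(p)
           for p in new_root.rglob(pattern) if p.is_file()}

    result = {"unchanged": [], "changed": [], "added": [], "removed": []}
    for name in sorted(new):
        if name not in old:
            result["added"].append(name)
        elif old[name] == new[name]:
            result["unchanged"].append(name)
        else:
            result["changed"].append(name)
    result["removed"] = sorted(set(old) - set(new))
    return result
```

Bear in mind that a byte-for-byte comparison is conservative. A page lands in the “changed” bucket even if only a tag or a line break moved, so the “unchanged” list is the one you can rely on; the “changed” list still needs a closer look before you know how much of it is translatable.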

The biggest problem with these projects is in-country review. There’s no easy way for an in-country reviewer to make changes or post comments in the compiled localized version. We’ve found that MS Excel is the worst way of doing this (except for all the others), so we’ve learned to live with it.

In theory, the translators are not mucking about with any tags, so the compiled localized version should work the same as the original. Yeah, right. All the links need to be checked–they do break sometimes–and the index and table of contents should be validated. And, don’t forget to try a few searches to make sure they work; your customers surely will, and you want to spare them any unpleasant surprises.
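
The link check, at least, can be partly automated before anybody compiles anything. The sketch below is an assumption-laden example in Python rather than a tool we actually use: it walks the localized HTML files, collects every href and src attribute, and reports the ones that point to a local file that does not exist. The names LinkCollector and broken_local_links() are invented for the example. It ignores external URLs and does not verify that a #anchor exists inside the target page.

```python
from html.parser import HTMLParser
from pathlib import Path
from urllib.parse import unquote, urlparse


class LinkCollector(HTMLParser):
    """Collect the values of href and src attributes from one page."""

    def __init__(self):
        super().__init__()
        self.targets = []

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            if name in ("href", "src") and value:
                self.targets.append(value)


def broken_local_links(project_dir, encoding="utf-8"):
    """Return (page, target) pairs whose target file is missing on disk."""
    root = Path(project_dir)
    broken = []
    for page in sorted(root.rglob("*.htm*")):
        if not page.is_file():
            continue
        collector = LinkCollector()
        collector.feed(page.read_text(encoding=encoding, errors="replace"))
        collector.close()
        for target in collector.targets:
            parsed = urlparse(target)
            # Skip external links (http:, mailto:, ...) and same-page anchors.
            if parsed.scheme or parsed.netloc or not parsed.path:
                continue
            resolved = (page.parent / unquote(parsed.path)).resolve()
            if not resolved.exists():
                broken.append((page.relative_to(root).as_posix(), target))
    return broken
```

The encoding parameter defaults to UTF-8 here purely for convenience; set it to whatever your localized pages really use. None of this replaces clicking through the compiled help, and the index, table of contents and search still need a human.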

Remember:

  • If you’ve included graphics in your help project, you’ll need to obtain the original source files. These are not GIFs or JPEGs; they will be the application files from which the GIFs and JPEGs were generated. You’ll need to hand off files from applications like Adobe Illustrator, or Flash or even PowerPoint, so that the translators can properly edit the text in them. Engineers often do quick mock-ups in Microsoft Word’s Word Art that end up in the final product, and it takes a while to track them down.
  • Encoding can be thorny. Some compilers behave oddly if you try to impose the same encoding on both the HTML pages and the table of contents, especially in Japanese, in our experience.

Getting the Writers to Care about Localized Documents

April 27th, 2007

Do your technical writers go through the localized documents before handing them off to production?

I thought not.

It is, of course, just one more thing on a writer’s already crowded list of things to do. Add to that the appeal for the writer of going through a book in a language of which s/he has probably no notion, and you have a recipe for can’t/won’t/don’t want to.

You can go through it yourself, localization manager that you are, and you’ll probably find a few things wrong. But the writers are looking for very different things, and they have a talent for spotting them immediately. If you can get your writers around the corner on the inconvenience of the exercise, you’ll find that they add real value. The movement into and out of translation software can break things in a large document, and who better to detect such things – even with no more than a cursory overview – than the people who wrote the book in the first place?

I’ve seen writers go through translated versions of their documents and find:

  • unexplained typeface changes
  • broken or dead hyperlinks
  • missing callouts
  • untranslated text
  • incorrect document part numbers
  • corrupted graphics

The real showstopper, though, occurs at the end of a two-month translation cycle for a 300-page manual, when the writer spends ten minutes going through the book, then sends you e-mail that reads, “Nice job on the Chinese manual, but you got the wrong version translated.”

Maybe not the optimal time to find this out, but once again: Who besides the writer would have caught this?

Localizing Declarations of Conformity

March 23rd, 2007

Does your documentation contain Declarations of Conformity with European Community standards? If it does, here is some due diligence you should undertake before having the docs translated.

The EC has promulgated a long series of directives on a variety of industries ranging from aerospace to toys. Some of these directives describe industrial policy and consumer protection. If your product falls into the category of those covered by a set of directives, then 1) the product must conform to the directives; and 2) you must declare that it conforms and list the directives with which it conforms.

This second requirement leads to some of the driest text with which you’ll ever fill pages in a user guide; for instance:

Protection requirements concerning electromagnetic compatibility to Article 3(1)(b)

Harmonised standards applied:

EN 301-489-1, V1.4.1 (2002-08); Electromagnetic compatibility and Radio spectrum Matters (ERM); Electromagnetic Compatibility (EMC) Standard for Radio Equipment and Service. Part 1: Common technical requirements

ETSI EN 301 489-25 V2.2.1 (2003-05)

Fascinating reading. And, it makes for even more fascinating translation work.

If you’re localizing your U.S. product for sale in Germany, the translation of the names of these standards with which you’re declaring conformity should match the German names acknowledged by the EC. You could hand off the English text to a German translator, who could trip through several technical dictionaries creating his own translation. The numbers of the directives would be correct (because they are not translated), but strictly speaking, the titles would not be correct, unless your translator was extremely lucky.

Fortunately, the EC has made this easy. Depending on the industry, they offer accepted translations of the titles and text of the directives in as many as twenty languages on their Web site. With a bit of digging, your translators can find and re-use approved text. This will not only save them (and you) time, but will also ensure a better fit for your localized documentation.

Translation non-savings, Part II

March 2nd, 2007

Again I ask: How far will you go to improve your localization process? If a big improvement didn’t save any obvious money, would your organization go for it?

I selected a sample of 180 files. In one set, I left all of the HTML tags and line-wrapping as they have been; in the other set, I pulled out raw, unwrapped text without HTML tags. My assumption was that the translation memory tools would find more matches in the raw, unwrapped text than in the formatted text.
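
For readers who want to build the second set themselves, the idea can be sketched in a few lines of Python. This is an illustration only, not the tooling I used: the class name TextExtractor, the function unwrap_html() and the list of block-level tags are assumptions made for the example.

```python
from html.parser import HTMLParser

# Tags treated as paragraph boundaries (an assumption for this sketch).
BLOCK_TAGS = {"p", "div", "li", "td", "th", "tr", "br",
              "h1", "h2", "h3", "h4", "h5", "h6", "pre"}


class TextExtractor(HTMLParser):
    """Collect text block by block, dropping tags and line wrapping."""

    def __init__(self):
        super().__init__()
        self.blocks = []
        self._current = []

    def flush(self):
        # Join the pieces of the current block and collapse all runs of
        # whitespace (including wrapped line breaks) to single spaces.
        text = " ".join("".join(self._current).split())
        if text:
            self.blocks.append(text)
        self._current = []

    def handle_starttag(self, tag, attrs):
        if tag in BLOCK_TAGS:
            self.flush()

    def handle_endtag(self, tag):
        if tag in BLOCK_TAGS:
            self.flush()

    def handle_data(self, data):
        self._current.append(data)


def unwrap_html(html):
    """Return a list of unwrapped, tag-free text blocks from an HTML string."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    parser.flush()
    return parser.blocks
```

Inline tags such as bold and links simply disappear, so each block comes out as one unbroken line of text. Script and style content is not filtered out in this sketch, so pages that contain either would need extra handling.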

I cannot yet figure out how or why – let alone what to do about it – but the matching rate dropped as a result of this experiment.

                             Original HTML formatting and tags   Unwrapped, unformatted text
100% match and repetitions   65%                                 51%
95-99% match                 9%                                  14%
No match                     9%                                  15%

This is, as they say in American comedy, a revoltin’ development. It means that the anticipated savings in translation costs won’t be there – though I suspect that the translators themselves will spend more time aligning and copy-pasting than they will translating – and that I’ll have to demonstrate process improvement elsewhere. If I can find an elsewhere.

True, the localization vendor will probably spend less time in engineering and file preparation, but I think I need to demonstrate to my client an internal improvement – less work, less time, less annoyance – rather than an external one.

Localization Train slowing

January 30th, 2007

We’re seeing the localization juggernaut lose some steam.

In the early years, this client localized its flagship software package for developers in China, Japan and Korea (CJK), then added Brazil. It took small, reference applications into as many as 10 languages (including Hebrew and Thai) as those markets showed promise. The budget was pretty fat, the localized products were freshened frequently, and the developers were happy to have software and doc in their own language.

I suppose it was to be expected that this would peter out with time, because markets change, business cases wax and wane, and some regions never return the investment.

The new stressor on localization was less easy to anticipate: bulk. Each generation of improvements to the product brings several hundred more pages of documentation. All of this new documentation is, of course, “free” in English, but somebody has to pull out a checkbook to deal with it in other languages, and that checkbook comes out more slowly and with more misgivings these days.

Engineering and Product Management furrow their brows nowadays when I walk in with cost estimates. I’ve adapted to this change in attitude with a few techniques:

  1. The Technical Reference is the fattest target and the source of most of the expansion. It lives in a compiled help file (CHM) that is no longer written by Tech Pubs, but generated by Perl scripts from header files written by the engineers. Our modus localizandi has been to hand off the finished help project, now comprising 3700 HTML files, and have the HTML translated. In an effort to lower cost, I’m attempting a proof-of-concept to localize the header files themselves, then tune the scripts to convert them into localized HTML. This should lower our localization engineering costs considerably.
  2. I agitate for interim localization updates, peeling off documentation deltas every few weeks and handing them off for translation, even if there are no plans to release them yet. This reduces the sticker shock and time-to-market delay that come of getting an estimate on a release only when necessary, which may be at a 10- to 18-month interval. Product Management and Engineering, who only think about localization when it’s absolutely unavoidable, find the tsunami of untranslated text depressing.
  3. Although it’s not a very clean way of doing things, I screen from the localization handoff those items that I know have little to be translated. Sometimes I go to the level of resource files, but more often I take documents to which only a few minor changes have been made from one En version to the next, hand off changed text, then place the translations myself. This is not for the faint of heart, nor for those who don’t really know the languages involved, but it can save some money.
  4. I try to keep global plates spinning, in the hope that more people will consider the global dimension of what we do, and the fact that localization is the necessary step for making your product acceptable to people whose use of your product will make you money, if you make it easy for them.
  5. I never impart bad news on Friday.

Improved Docs through Localization

January 16th, 2007

I spent some time on the phone with new clients last week, going through a user guide they plan to have localized. Discussing the usual localization questions (i.e., the ones I figured the translators would ask sooner or later), we began to edge towards the initially depressing realm of Changing Documentation to Suit Localization.

“Don’t misunderstand,” I repeatedly intoned, “I’m not trying to get you to re-write an already published book just to make localization easier. We’re just bringing up small issues in how you can write future books a bit more generically so that you can take exactly what you’ve published in English and hand it off for localization without customizing it first.”

Still, I thought I detected a collective, resigned sigh from them. I’ve learned by now that it translates to “Writing for translation is really going to be a pain, isn’t it?”

They then asked for suggestions about optimizing future documents for localization purposes, in the form of guidelines or style guides. This is good thinking, and I told them so. It amounts to documentation internationalization.

I’ve read plenty of articles on how to do this (authors include Kit Brown of Comgenesis and Nancy Combe), but I usually find them superficial (leave white space, use numbered callouts, be sure to do the software first…), because the solution doesn’t lie in documents, but rather in each organization and in the way that Engineering, Product Management, Tech Pubs and the overseas partners work together.

I told them that they could read up on this for a month, or we could all just go through the process of localizing an already written manual and make our own guidelines. The former won’t do any harm, but I think they’ll find that the latter will result in more – and more-specific – pointers that will apply to future books.

The important thing is to arrive at gradual changes that the company will tolerate in the next 3/6/12 months, so that their books become more global without the localization-tail wagging the dog.

Localization Conundrum

December 20th, 2006

My client received a request from Korea for a localized version 6.5. There are two issues:

  1. It’s going to cost a lot, because the last version localized into Ko was version 5.01.
  2. English is up to version 8, and the process of creating the help is much better than in 6.5. Should we include those enhancements in 6.5 Ko, even though they would take it out of parity with 6.5 En?

I experimented to see whether the improvements mattered to the localization process in general and the cost in particular. I re-created portions of version 5.01 help using version 5.01 Perl scripts, then did portions of that same help using version 6.5 Perl scripts. Then I handed both sets off for wordcount analysis. They were within 2-3% of each other, so the cost-savings in translation are not there.

However, I suspected that the vendor would charge me a lot more for engineering on the 5.01 help, because the version 6.5 scripts are much cleaner, and they handle the raw text much better. This compelled me to examine the matter further.

Better help or not, the problem is one of product management. Even if 6.5 help is “better,” it differs too much from 5.01 help. I imagine a Korean customer struggling to bounce back and forth between 5.01 En and Ko, and puzzling at the discrepancies, even though the Ko version had a lot more information than the En version.

They are the sort of discrepancies that make cowards of us all (albeit well advised cowards). I’ve decided to hand off the pure 5.01 En help system for this project, warts and all.