Archive for the ‘HTML localization’ Category

Segmentation and Translation Memory

September 20th, 2006

To get the broken sentences in the new files to find their equivalents (or even just fuzzy matches) in translation memory, we have three options:

  1. Modify the Perl scripts that extract the text from the header files into the HTML, so that the scripts no longer introduce the hard returns.
  2. Massage the HTML files themselves and replace the hard returns with spaces.
  3. Tune the segmentation rules in Trados so that it ignores the hard returns (but only the ones we want it to ignore) and doesn’t consider a segment finished until it reaches a full stop/period.
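
For what it's worth, option #2 could be sketched roughly like this in Python. This is only a minimal illustration, not the script I would actually ship: the function name `join_hard_returns` is made up, and it assumes that a hard return is "unwanted" whenever the text before it does not end in sentence punctuation or a closing tag, and the next line starts with ordinary text rather than a tag. It makes no attempt to protect `<pre>` blocks or code samples.

```python
import re

# Hypothetical helper: the name and the heuristic are assumptions for illustration.
# Match a line break (plus surrounding spaces/tabs) only when:
#   - the character before it is not ., !, ?, > or whitespace, and
#   - the next character is real text, not a tag or a blank line.
_HARD_RETURN = re.compile(r"(?<![.!?>\s])[ \t]*\r?\n[ \t]*(?=[^\s<])")


def join_hard_returns(html: str) -> str:
    """Replace mid-sentence hard returns with a single space."""
    return _HARD_RETURN.sub(" ", html)


if __name__ == "__main__":
    sample = "<p>The function returns\nthe handle. It may\n   fail.</p>\n<p>Next.</p>"
    print(join_hard_returns(sample))
```

Line breaks that follow a period or a closing tag are left alone, so paragraph structure survives and only the sentences broken in the middle get stitched back together.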

To go as far upstream as possible, I suppose we should opt for #1 and fix the problem at its source. This seems optimal, unless we subsequently break more things than we repair. Options #2 and #3 are neat hacks and good opportunities to exercise fun tools, but they burn up time and still don’t fix the problem upstream.

Also, I don’t want the tail to wag the dog. The money spent on translating false positives may be less than the time and money spent on fixing the problem.

August 1st, 2006

So the API Ref weighs in at 3280 HTML pages now, about 750 more than in the last release.

The trick will be in figuring out which of these zillions of pages have substantive changes (i.e., new translatable text, changed translatable text) and which have changed due to non-translatable issues (i.e., changes to the HTML code inside the tags). Translation memory tools are meant to ignore the latter, but I can’t leave good translation inside outdated HTML; something is bound to break, or at least look bad, if we shuffle multiple generations of HTML code and tag conventions together and compile it.
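
One way to triage the pages might look like the Python sketch below: strip the markup from the old and new versions of a page, normalize whitespace, and compare what is left. The names `translatable_text` and `classify_change` are invented for this example, and it assumes that only element text is translatable; attribute values such as `alt` or `title` are ignored, which may not hold for every page.

```python
from html.parser import HTMLParser


class _TextExtractor(HTMLParser):
    """Collect text nodes, skipping the contents of <script> and <style>."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.chunks = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth:
            self.chunks.append(data)


def translatable_text(html: str) -> str:
    """Return the page text with tags removed and whitespace collapsed."""
    parser = _TextExtractor()
    parser.feed(html)
    parser.close()
    return " ".join(" ".join(parser.chunks).split())


def classify_change(old_html: str, new_html: str) -> str:
    """Label a page as 'unchanged', 'markup-only' or 'text-changed'."""
    if old_html == new_html:
        return "unchanged"
    if translatable_text(old_html) == translatable_text(new_html):
        return "markup-only"
    return "text-changed"


if __name__ == "__main__":
    old = '<p class="a">Returns the handle.</p>'
    new = '<p class="b">Returns  the\nhandle.</p>'
    print(classify_change(old, new))
```

Pages labeled "markup-only" would still need their existing translations poured into the new HTML, but at least they could be separated from the pages that need a translator's attention.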


I don’t think the TM tools are going to rescue me from this. I should figure out a way to translate the source header files instead of the downstream HTML files.

Categories: HTML localization