Archive for the ‘localization tools’ Category

Not-so-simple Plugin Bridges Engineer-Translator Gap

June 29th, 2010 1 comment

I said it a few months ago and I’ll say it again: Nobody needs Lua resource files.

Just when CAT tools, parsers and SaaS implementations give us new features and functionality to simplify life for translators, new formats like this creep up to bite us. This is particularly true in mobile apps, since that is the new frontier in software development.

So when a mobile apps client decided to change its resource file format from XML to a roll-your-own format designed for the Lua scripting language, I foresaw trouble for translators unfamiliar with that format. I advised the engineers on a plugin that would smooth the process of getting translatable strings into a form translators could use.

The client hasn’t yet released the plugin for its development platform, but it’s coming shortly. It takes this:

ModRsc {
--
 name  ="IDS_EXITCONFIRM_HDRSTR",
 id    =1800,
 type  = 1,
 data  =EncStringRscData(0x03, "Exit Application?"),
}
ModRsc {
--
 name  ="IDS_EXITCOFIRM_BODYTXT",
 id    = 1801,
 type  = 1,
 data  =EncStringRscData(0xff, "Are you sure you want to exit?"),
}
ModRsc {
--
 name  ="IDS_PRIVACY_POLICY",
 id    = 1802,
 type  = 1,
 data  =EncStringRscData(0x03, "Privacy Policy"),
}
ModRsc {
--
 name  ="IDS_RATINGSINFO_HDR",
 id    = 1807,
 type  = 1,
 data  =EncStringRscData(0xff, "Ratings Info"),
}
ModRsc {
--
 name  ="IDS_THANKYOUFREE_TXT",
 id    = 1808,
 type  = 1,
 data  =EncStringRscData(0xff, "Thanks for your download!"),
}

- which you really don’t want to hand off to a translator and which could be parsed if an engineer wrote a good enough regular expression for it – and turns it into this:

[Image: translatable strings from resource files (Lua)]

Not a very big deal for five strings, but quite a time-saver once you reach 50, 100, 200 strings.

You hand this .xlsx file off to translators, they translate into column D, they send it back to you, and the plugin takes the translation and round-trips it into the Lua resource format. That’s a great deal more accessible to translators, and it’s important to make them happy; otherwise, they can’t localize your software.
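For readers curious what the "good enough regular expression" route might look like, here is a minimal Python sketch. It is not the client's plugin. The function names and the CSV layout (with the translation in the fourth column, standing in for column D of the .xlsx) are my own assumptions. It also assumes every block follows the field order of the sample above, and it does not handle escaped quotes inside strings.

```python
import csv
import io
import re

# One ModRsc block, assuming the field order and the
# EncStringRscData(flag, "text") shape seen in the sample above.
BLOCK_RE = re.compile(
    r'ModRsc\s*\{.*?'
    r'name\s*=\s*"(?P<name>[^"]+)".*?'
    r'id\s*=\s*(?P<id>\d+).*?'
    r'data\s*=\s*EncStringRscData\(\s*0x[0-9a-fA-F]+\s*,\s*"(?P<text>[^"]*)"\s*\)',
    re.DOTALL,
)


def extract_strings(lua_source):
    """Return a list of {'name', 'id', 'text'} dicts, one per ModRsc block."""
    return [m.groupdict() for m in BLOCK_RE.finditer(lua_source)]


def to_csv(rows):
    """Write a translator-friendly table; the fourth column is left empty."""
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow(["name", "id", "source", "translation"])
    for row in rows:
        writer.writerow([row["name"], row["id"], row["text"], ""])
    return buf.getvalue()


def merge_translations(lua_source, translations):
    """Round-trip: swap each source string for its translation, keyed by name."""
    def swap(match):
        new_text = translations.get(match.group("name"))
        if not new_text:
            return match.group(0)  # untranslated: leave the block as it was
        block = match.group(0)
        start = match.start("text") - match.start()
        end = match.end("text") - match.start()
        return block[:start] + new_text + block[end:]
    return BLOCK_RE.sub(swap, lua_source)
```

Keying the round trip on the resource name rather than on row position means a translator can sort or filter the sheet without scrambling the merge.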

So, why am I still not content?

I’m not content because it takes a lot of software to perform this conversion:

  • Microsoft Visual Studio
  • Visual Studio Office Runtime
  • software development kit for this mobile app platform
  • .NET Framework

You may find these on the computers of software developers, but not likely on the computers of most of the people who would normally be tasked with handing off strings to translators: program managers, QA leads, tech writers, even localization project managers. And few translators would invest in all of this, let alone be interested in configuring it as needed.

Still, it’s inherent to the beast. Since this industry began, tools have been tying the Gordian knot between the necessary complexity of making text display in software and the necessary simplicity of letting translators perform their work.

If there were a solution located right in the middle of these two extremes, we’d have come up with it by now.

John White of venTAJA Marketing is a localization project manager and consultant. He is also a marketing communications writer for technology and language companies.

photo credit: David Kitching / CC BY-SA 2.0

Salesforce.com Localization – A Work in Progress

April 29th, 2010 No comments

Rebecca Ray and I had a chat about localization and software as a service (SaaS) products.

This had my dander up because I’ve been tangentially associated with the localization of a Salesforce.com application for the last few months, and there are some gaping holes between how you localize one and how we as an industry have come to understand localizing everything else in the world.

The industry standard, of course, is to externalize everything in need of translation so that it is not part of the code base. Anything else is what we refer to as a “giant localization leap backwards.” Once all of the bits of UI are out of code and into a resource file, we hand it off to computer-assisted translation (CAT) tools that make it easy for translators to do their work. The tools use fuzzy matching and lookups to help in translating consistently throughout the product, documentation, marketing collateral, Website, etc. The resulting translated resource files then go back into the code, and the result is a localized product.

Salesforce.com is doing something different. The bad news is that it’s annoying; the good news is that they realize it and have plans to address it.

(Disclaimer: Salesforce.com is hosting next week’s Localization Unconference in San Mateo, and I plan to attend, so I should not be a churlish guest by slanging their outrageously successful product. I shall be polite.)

Translation in the Cloud

If you use Salesforce.com, you know that you access it on the Web through a browser. It does not reside on your computer the way that, say, a copy of Microsoft Excel does. This “cloud” model is popular, and Salesforce.com is hardly alone in providing it: Intuit, Oracle, SAP, Google Docs and a jillion other vendors and packages run this way. In fact, Lionbridge wants to move the entire localization function into the cloud with its GeoWorkz initiative.

The problem is that, not only is all of your company’s data in the cloud, but everything in the user interface is, too. So, if your company developed an elaborate customization to its version of Salesforce.com’s product – as my client has – and if you suddenly need to localize all of it, you have to translate a great many strings that reside in the cloud and that you cannot simply spin down into a resource file and hand off to a translator.

Salesforce.com has done this deliberately. In their fervor to keep everything in the cloud, they have built and made available their Translation Workbench utility, which allows your translators – you do know who your translators are, don’t you? – to translate your customized application in the cloud.

The problem is…

It’s not the Jedi way.

When translators work in the cloud, they have no recourse to the above-described CAT tools that contribute so roundly to their ability to deliver a professional, consistent translation. It’s like “tastes great” and “less filling”: you can’t really have both, despite what the commercials would have you believe.

They need to scribble notes to themselves and remember stuff. If several of them are working on the project, they have to phone or IM one another, instead of having all the history and intelligence reside in the tools. It’s like going back to translating when all you had was a typewriter, a dictionary and a sheet of paper.

The Other Way to Localize a Salesforce.com Application

The forward-thinking localization product manager at Salesforce.com, Shawna Wolverton, is painfully aware of this problem. She told me that the company has in place a somewhat clunky procedure for exporting all translatable strings through the use of a high-end version of Salesforce.com, the Force.com IDE, the Eclipse IDE, three forceps, a banana and a bicycle pump.

Shawna has made available a document on this procedure, titled “Localizing with the Force.com IDE.” As the document describes the procedure,

The Force.com IDE can offer great time savings by allowing you to work with your translations in XML files and use a simple interface to load the translations into Salesforce.com easily.

Even with my 10% markup, you can obtain the document for free.

The result of this procedure is that you will have XML files which your translators or language service provider can pull into CAT tools, translate and hand back to you for re-import to your application.

Shawna also tells me that they envision an even easier export-import function as an option in the main product sometime in the near future.

“What’s next, Johnny?”

I’m glad you asked.

Ten years ago, we had the same problem, except that instead of translatable content residing in the cloud, it was in our content management systems (CMS). Companies had invested mega-bundles of money to centralize documents in CMS, and people in my position felt, well, silly having to pull documents and files out of the CMS, attach them to e-mail messages or burn CDs with them, and send them to our localization partners. Silly, I say.

The vendors came to our rescue by building interfaces between their CAT tools (or the server versions thereof) and our CMS, so that they could find changed files, pull them out for handoff to translators while we slept, and check them back into the correct language branch in CMS before we knew they’d even been touched.

I have no doubt that this similar predicament will inspire vendors to enable their tools for the cloud, so that they can find your Salesforce.com application, translate the UI while you’re asleep, and let people in other countries work in their own language before you’ve had your first cup of coffee the next morning.

Ain’t technology grand?


International Keyboard Frenzy

July 31st, 2008 Comments off

My wife is traveling through Europe, sending us e-mail from Internet cafés along the way. Here’s one I received this morning:

thnks for the msgs.  i luv zou and miss zou 9sorrz i hav a bratixlava kezboard0. will trz longer message in a few dazs.
love, hugs and kisses.

She’s actually a pretty good typist, but she was flummoxed by the keyboard on the computer she used in Bratislava, Slovakia, because several of the keys are in different places from where her fingers expected them to be on a U.S.-English keyboard. The interface between fingers and keys is a fragile one in computing.

Of course, my wife could have tinkered with the Regional Settings control panel (Windows) or International system preference (MacOS) to disregard the hardware keyboard and interpret the keystrokes according to any other supported keyboard layout (like U.S.-English), but machines in Internet cafés are probably not set up to allow that kind of modification without administrative permission.

Thosé of üs whö frequently wríte with çharacterß from othër langüages not natively supported by our hardware need keyboard tricks to do so.

DOS

  • Are there any dinosaurs out there besides me who remember how to do this? To generate ü on a U.S.-English keyboard, for example, you had to hold down the left Alt key and enter 129 on the keypad. The Alt-plus-keypad combination reached the extended characters, numbered 128 and up, that sit beyond standard ASCII.
  • I don’t think Latin-based operating systems supported non-Roman characters; you had to either buy that version of the OS or get special software to add the functionality. (Who cares? It’s ancient history.)
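The Alt+129 trick worked because the keypad number was looked up in the machine's active code page. A quick Python check follows, assuming code page 437 (the original IBM PC character set that U.S.-English DOS systems used). The `alt_code` helper is a made-up name for this sketch.

```python
# Alt+keypad numbers were looked up in the active DOS code page.
# Assumption: code page 437, the default on U.S.-English DOS machines.
def alt_code(number, codepage="cp437"):
    """Return the character that a given Alt+keypad number produced."""
    return bytes([number]).decode(codepage)


for code in (129, 130, 164):
    print(code, alt_code(code))
```

On a machine set to a different code page, the same number could produce a different character, which is part of why these tricks never traveled well.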

WINDOWS

  • U.S.-English users can use the U.S.-International keyboard layout to generate combined Latin characters like ëüöàñçß¿¡. I use it as my default mapping. The change in how you use the quotation mark key (” and ‘) takes a bit of getting used to, because you press it before the letter you want to accent.
  • You can also Insert Symbol in most Windows applications, but this is clunky. 
  • For Asian and other non-Latin characters, or to map a different soft keyboard over your hardware keyboard, enable a different input language in the Regional Settings control panel. (This may require installing additional fonts in some exotic languages.)

MAC OS

  • Right out of the box, you can use the same keyboard tricks that have been in place since System 7. Option + e tells the OS that you want an accent aigu over the next character, such as e or a; option + u generates the diaeresis or umlaut over the next character, and other option + combinations result in other common accented Roman characters.
  • From the International system preference you can display a character palette in the desired language, then select the characters as you need them, or you can impose a software keyboard over your hardware keyboard. 
  • There’s also full support for Asian and other non-Latin input methods, but again, you may need to install fonts (e.g., for Indic languages) from your original installer discs.
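Both the U.S.-International layout and the Mac Option-key combinations rely on dead keys: the accent keystroke produces nothing until the next letter arrives, and then the two are combined. Unicode models the same idea with combining marks. The Python illustration below is only a rough analogy; the `DEAD_KEYS` table and its key names are made up for this sketch and are not taken from either operating system.

```python
import unicodedata

# Hypothetical dead-key table: keystroke name -> Unicode combining mark.
DEAD_KEYS = {
    "option+e": "\u0301",  # combining acute accent
    "option+u": "\u0308",  # combining diaeresis
    "option+n": "\u0303",  # combining tilde
}


def type_with_dead_key(dead_key, letter):
    """Combine a base letter with the accent selected by the dead key."""
    # NFC normalization folds letter + combining mark into one character
    # wherever Unicode defines a precomposed form.
    return unicodedata.normalize("NFC", letter + DEAD_KEYS[dead_key])


print(type_with_dead_key("option+e", "e"))
print(type_with_dead_key("option+u", "u"))
```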

I have no doubt that these functions are elegantly handled in Unix/Linux variants as well, but I have the disadvantage of never spending time on them. Post a comment if you have useful tips on this.

How do you handle multilingual character input in your daily work?

Localizing Robohelp Files – The Basics

May 29th, 2008 Comments off

We get a lot of search engine queries like “localize Robohelp file” and “translate help project.” I’m pretty sure that most of them come from technical writers who have used Robohelp to create help projects (Compiled HTML Help Format), and who have suddenly received the assignment to get the projects localized.

The short answer
Find a localization company who can demonstrate to your satisfaction that it has done this before, and hand off the entire English version of your project – .hpj, .hhc, .hhk, .htm/.html and, of course, the .chm. Then go back to your regularly scheduled crisis. You should give the final version a quick smoke test before releasing it, for your own edification as well as to see whether anything is conspicuously missing or wrong.

The medium answer
Maybe you don’t have the inclination or budget to have this done professionally, and you want to localize the CHM in house. Or perhaps you’re the in-country partner of a company whose product needs localizing, and you’ve convinced yourself that it cannot be that much harder than translating a text file, so why not try it?

You’re partially right: it’s not impossible. In fact, it’s even possible to decompile all of the HTML pages out of the binary CHM and start work from there. But your best bet is to obtain the entire help project mentioned above and then use translation memory software to simplify the process. Once you’ve finished translating, you’ll need to compile the localized CHM using Robohelp or another help-authoring product (even hhc.exe).

The long answer
This is the medium answer with a bit more detail and several warnings.

  • There may be a way to translate inside the compiled help file, but I wouldn’t trust it. Fundamentally, it’s necessary to translate all of the HTML pages, then recompile the CHM; thus, it requires translation talent and some light engineering talent. If you don’t have either one, then stop and go back to The Short Answer.
  • hhc.exe is the Microsoft HTML Help compiler. It does not come with Windows itself; it’s part of the HTML Help Workshop freely available from Microsoft. This workshop is not an authoring environment like Robohelp, but it offers the engineering muscle to create a CHM once you have created all of the HTML content. If you have to localize a CHM without recourse to the original project, you can decompile all of the HTML pages out of the CHM with the workshop’s Decompile command or with the hh.exe viewer’s -decompile switch.
  • Robohelp combines an authoring environment for creating the HTML pages and the hooks to the HTML Help compiler. As such, it is the one-stop shopping solution for creating a CHM. However, it is known to introduce formatting and features that confuse the standard compiler, such that some Robohelp projects need to be compiled in Robohelp.
  • Robohelp was developed by BlueSky Software, which morphed into eHelp, which was acquired by Macromedia, which Adobe bought. Along the way it made some decisions about Asian languages that resulted in the need to compile Asian language projects with the Asian language version of Robohelp. This non-international approach was complicated by the fact that not all English versions of Robohelp were available for Asian languages. Perhaps Adobe has dealt with this by now, but if you’re still authoring in early versions, be prepared for your localization vendor to tell you that it needs to use an even earlier Asian-language version.
  • Because the hierarchical table of contents is not HTML, you may find that you need to assign to it a different encoding from that of the HTML pages for everything to show up properly in the localized CHM, especially in double-byte languages.
  • The main value in a CHM lies in the links from one page to another. In a complex project, these links can get quite long. Translators should stay away from them, and the best way to accomplish that is with translation memory software such as Déjà Vu, SDL Trados, across or Wordfast. These tools insulate tags and other untouchable elements from even novice translators.
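As a rough picture of what "insulating tags" means, the Python sketch below swaps every HTML tag for a numbered placeholder before translation and restores the tags afterwards. It is not how any of the tools named above is implemented. The function names and the `{1}`-style placeholder syntax are assumptions for illustration, and the simple pattern would trip over a `>` inside an attribute value.

```python
import re

TAG_RE = re.compile(r"<[^>]+>")


def mask_tags(html):
    """Replace each tag with {n}; return the masked text and the tag list."""
    tags = []

    def stash(match):
        tags.append(match.group(0))
        return "{%d}" % len(tags)

    return TAG_RE.sub(stash, html), tags


def unmask_tags(text, tags):
    """Put the original tags back in place of their placeholders."""
    return re.sub(r"\{(\d+)\}", lambda m: tags[int(m.group(1)) - 1], text)


source = 'See <a href="topics/install/setup.htm#step2">Setup</a> first.'
masked, tags = mask_tags(source)
print(masked)
```

The translator sees only the masked text, can move the placeholders to suit the target language's word order, and never gets a chance to mistype the link itself.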

We’ve marveled at how many search engine queries there are about localizing these projects, and we think that Robohelp and the other authoring environments have done a poor job explaining what’s involved.

If you liked this article, have a look at “Localizing Robohelp Projects.”

SDL TMS or Idiom WorldServer?

January 10th, 2008 Comments off

Question from one of the subscribers to this blog:

“We are in the process of bringing on a workflow tool. In general, for software, DITA/XML, and Frame files, do you prefer working with the SDL Translation Management System, or Idiom WorldServer, or another program? I have my own ideas, but I’m always curious to hear the opinions of other localization professionals.”

I recommended asking pointed questions to ensure the chosen vendor/solution:

  • doesn’t lock you in to a particular LSP, or out of freelance translators who won’t have the tool
  • manages the native file formats, without conversion
  • allows you to talk to internal technical leads (not just salespeople)
  • offers integration with your version control system, so that you’re not manually moving files to and from your engineering repository.

Those of you with experience using these tools, kindly comment.

Where do your glossaries live?

November 16th, 2007 1 comment

The experienced project manager with your localization/translation vendor approaches a new client/project by asking you, “Has this ever been translated before?” Her big goal is to discover whether there’s a translation memory database floating around, to help her translators do their work more quickly and keep your costs low, and her background goal is to find existing documents with key terms already translated and approved.

Smart companies maintain these key terms in a “glossary” or terminology list. Glossaries are far less comprehensive than translation memory because they serve a slightly different purpose: Instead of proposing a fuzzy-match translation for an entire sentence, they serve as a reference for the translators. Good translators know how to find translations for generally accepted terms like “closed-loop servomechanism” and “high-definition multimedia interface,” but if the sales manager in your Shanghai office has already told you how he likes to see the word translated, everybody will be happier if that preference is observed.

So where do your glossaries live?

“Live” is the important word, because glossaries change and grow with time. Most glossaries I’ve seen are in a spreadsheet or word processing document. While that’s better than nothing, it can suffer from decentralization, since updates don’t always make it to everybody involved in the project, and some translators run the risk of using old terminology.

One of my more localization-savvy clients makes its glossary available on its partner portal, requiring a login and password. The php-based application, which is actually hosted by a translation vendor, allows searching in multiple languages. My client deliberately does not make the glossary available for download or export; this ensures that everybody is using the same version with all updates.

I like this model. The assets reside on the client/owner’s site, and the terminology “lives” with the linguistic experts, who can easily modify it. It’s a bit more work for the translator, who would rather have a flat-file document, but overall it serves linguistic interests well. It’s tried-and-true technology built in to most computer-aided translation tools.

What are you doing with your glossaries?

Translation non-savings, Part II

March 2nd, 2007 Comments off

Again I ask: How far will you go to improve your localization process? If a big improvement didn’t save any obvious money, would your organization go for it?

I selected a sample of 180 files. In one set, I left all of the HTML tags and line-wrapping as they have been; in the other set, I pulled out raw, unwrapped text without HTML tags. My assumption was that the translation memory tools would find more matches in the raw, unwrapped text than in the formatted text.

I cannot yet figure out how or why – let alone what to do about it – but the matching rate dropped as a result of this experiment.

Match category                Original HTML formatting and tags    Unwrapped, unformatted text
100% match and repetitions    65%                                  51%
95-99% match                  9%                                   14%
No match                      9%                                   15%
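
To make the experiment concrete, here is a small Python sketch of the two steps involved: reducing an HTML file to raw, unwrapped text, and scoring a segment against a previously translated one. It uses the standard library's `html.parser` and `difflib`. The similarity ratio is only a stand-in, since translation memory tools compute fuzzy matches with their own algorithms, and the band names are my assumption, loosely following the table's labels.

```python
import difflib
from html.parser import HTMLParser


class TextOnly(HTMLParser):
    """Collect character data and drop every tag."""

    def __init__(self):
        super().__init__()
        self.chunks = []

    def handle_data(self, data):
        self.chunks.append(data)


def unwrap(html):
    """Strip tags and collapse line-wrapping into single spaces."""
    parser = TextOnly()
    parser.feed(html)
    parser.close()
    return " ".join("".join(parser.chunks).split())


def match_band(segment, memory_segment):
    """Classify a segment against one translation-memory entry."""
    score = difflib.SequenceMatcher(None, segment, memory_segment).ratio()
    if score == 1.0:
        return "100% match"
    if score >= 0.95:
        return "95-99% match"
    return "lower band or no match"
```

One thing a sketch like this makes visible: unwrapping changes where segments begin and end, so text that used to line up with the memory one line at a time may no longer do so.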

This is, as they say in American comedy, a revoltin’ development. It means that the anticipated savings in translation costs won’t be there – though I suspect that the translators themselves will spend more time aligning and copy-pasting than they will translating – and that I’ll have to demonstrate process improvement elsewhere. If I can find an elsewhere.

True, the localization vendor will probably spend less time in engineering and file preparation, but I think I need to demonstrate to my client an internal improvement – less work, less time, less annoyance – rather than an external one.

Favorite Localization Tools

December 16th, 2006 Comments off

Here’s a short list of Windows-based tools I use a great deal in managing localization projects:

Beyond Compare – Clients constantly drill me about the differences between the last version of their product and this version, with an eye to the order of magnitude of localization expense they’re in for. Beyond Compare is the best tool I’ve found for finding the files that have changed, then comparing older and newer versions of files in a specialized viewer. Good technical support as well.

EmEditor – As long as you have the font and OS support installed, you can view multibyte characters in their appropriate applications under English-language Windows, but EmEditor allows you to change the encoding of a text file to better display it, or so that you can edit it. My standard text editor is UltraEdit, which has excellent search-and-replace capability, but it’s not as deft as EmEditor for multibyte work on an English OS.

SDLX Glue – An obscure utility inside the SDLX suite, this will append up to I don’t know how many hundred HTML files together. Translation vendors like it for work on big sites because it slashes the number of files being slung around. Naturally, it includes an unglue utility as well.

FAR – A technical writer introduced me to this utility, which includes a compiler system for HTML Help and MS Help. It will compile CHM files in any language such that, if you have a good HTML authoring tool, you don’t need RoboHelp to build your CHMs. (Unfortunately, I’ve had problems when I’ve tried to use FAR on projects that have been created in RoboHelp, but there are some ways around them.)

Moreover, FAR stands for “Find And Replace”, and this is hands down the best front end on regular expressions that I’ve ever found. The Holy Grail of search-and-replace is ignoring line breaks, and while regex supports that, not many utilities (that I’ve found) implement it. For instance, in the text

In a white room

with black curtains

at the station

if your goal was to find “room with black curtains at”, most utilities would not be able to locate it because of the line breaks. FAR does find it, and even allows you to replace the text with line breaks. Top-flight technical support also.
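For anyone who wants to try the same thing by hand, here is a minimal Python sketch of a line-break-tolerant search. It says nothing about how FAR works internally. It simply builds a regular expression in which any run of whitespace, blank lines included, can stand in for each space in the phrase; `find_across_breaks` is a made-up name.

```python
import re


def find_across_breaks(phrase, text):
    """Search for phrase, letting any run of whitespace (including
    blank lines) stand in for each space in the phrase."""
    pattern = r"\s+".join(re.escape(word) for word in phrase.split())
    return re.search(pattern, text)


lyrics = "In a white room\n\nwith black curtains\n\nat the station"
match = find_across_breaks("room with black curtains at", lyrics)
print(match is not None)
```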

Most of these are shareware, but they’re well worth the US$25-$50.


Doing the Localization Vendor’s Work?

September 25th, 2006 Comments off

Sometimes I know too much about this process.

Or, maybe I’m just too nice a guy.

To make things easier for the vendor (and cheaper for me) I’ve resolved to carve the 3200 HTML files in the API Reference CHM into different buckets, depending on whether and how much they require translation vs. engineering. Naturally, the ultimate arbiter is the Trados or SDLX analysis that the vendor will perform, but I’ve already mentioned my concern about false positives and need write no more on the topic here.

My tool of choice is the extremely capable Beyond Compare which, at US$30, is worth it just to see how well thought-out a software package it is. I compare version 3.9 files against version 4 files, tuning the comparison rules to groom the file buckets as accurately as possible.

The distribution is not perfect, if for no other reason than because its first level of triage is the filename and not the file contents, but it’s better than guessing, and it’s much better than thousands of false positives.
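
The same filename-first triage can be sketched in a few lines of Python. This is not Beyond Compare's logic, and it has none of its tunable comparison rules. The function names, the bucket labels and the `*.htm*` pattern are assumptions for illustration. Files are paired by relative path, and only then are their contents compared.

```python
import hashlib
from pathlib import Path


def file_digest(path):
    """Hash a file's bytes so two versions can be compared cheaply."""
    return hashlib.sha256(path.read_bytes()).hexdigest()


def collect(root, pattern):
    """Map each matching file's relative path to its full path."""
    root = Path(root)
    return {
        p.relative_to(root).as_posix(): p
        for p in root.rglob(pattern)
        if p.is_file()
    }


def bucket_files(old_dir, new_dir, pattern="*.htm*"):
    """Sort files into buckets: first by filename, then by contents."""
    old, new = collect(old_dir, pattern), collect(new_dir, pattern)
    buckets = {"unchanged": [], "changed": [], "added": [], "removed": []}
    for name in sorted(old.keys() | new.keys()):
        if name not in old:
            buckets["added"].append(name)
        elif name not in new:
            buckets["removed"].append(name)
        elif file_digest(old[name]) == file_digest(new[name]):
            buckets["unchanged"].append(name)
        else:
            buckets["changed"].append(name)
    return buckets
```

A renamed file lands in both "removed" and "added" here, which is exactly the filename-first weakness described above.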

Once I’ve gone through the files, I’ll have a better idea of how to label the buckets in a way that meets both my needs and those of the vendor.

At least, I think I’m being too nice a guy. Maybe this is just a big pain for the vendor, and they’re too polite to inform me of that.