Archive for the ‘resource file localization’ Category

Lua Resource Files – Nobody Needs These

May 5th, 2010

One client’s team of extremely enthusiastic engineers has moved its product off of XML-formatted resource files and onto Lua files.

Perhaps this proves once again that XML is not exactly the promised land; something lies beyond it.

There is some organizational momentum toward Lua. The client is in the business of making mobile software – à la “There’s an app for that” (client is not Apple, before you ask) – and Lua is a darling of the mobile gaming community because it is a lightweight, high-performance language. In fact, World of Warcraft uses Lua.

Still, when it comes to handing the resource files off for translation, as it has in the last couple of days, Lua looks like yet another format in a world that we thought had room for no more.

The strings look like this:

--[[Standard Strings--]]
ModRsc {
	name="IDS_CLEAR",
	type=1,
	id=1110,
	data =EncStringRscData(0x03, "Clear")
}
--[[noneuse 'none' to indicate no URL or specify URL (e.g. http://www.johnwhitepaper.com/)--]]
ModRsc {
	name="IDS_START_URL",
	type=1,
	id=1111,
	data =EncStringRscData(0x03, "none")
}

The strings are similar to those of a typical value-pair resource file, like a .rc or a .properties file, and they even support comments, wherein one could pack context for translators.

But the CAT tool parsers don’t like it. Both the string ID and the string itself appear in quotation marks, so even if you created a rule to isolate all quoted text, you’d end up with the string names and the strings. The tools have evolved so far that it’s frustrating to see them grab elements we don’t need, then have to tell the translator to ignore those.
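One possible workaround is a small pre-processing script that targets only the second argument of EncStringRscData, so the quoted string IDs never reach the translator. What follows is a minimal sketch in Python, not a tested production parser. It assumes every resource follows the ModRsc layout shown above, that name comes before data in each block, and that strings contain no escaped quotation marks. The function name extract_strings is made up for this illustration.

```python
import re

# Assumed layout: name="..." appears before data =EncStringRscData(0xNN, "...")
# inside every ModRsc block. Blocks that break this pattern would need extra care.
BLOCK = re.compile(
    r'ModRsc\s*\{.*?name\s*=\s*"(?P<name>[^"]*)".*?'
    r'EncStringRscData\(\s*0x[0-9A-Fa-f]+\s*,\s*"(?P<text>[^"]*)"\s*\)',
    re.DOTALL,
)

def extract_strings(lua_source):
    """Return (string ID, translatable text) pairs, keeping IDs apart from the text."""
    return [(m.group("name"), m.group("text")) for m in BLOCK.finditer(lua_source)]

SAMPLE = '''--[[Standard Strings--]]
ModRsc {
	name="IDS_CLEAR",
	type=1,
	id=1110,
	data =EncStringRscData(0x03, "Clear")
}
ModRsc {
	name="IDS_START_URL",
	type=1,
	id=1111,
	data =EncStringRscData(0x03, "none")
}
'''

for name, text in extract_strings(SAMPLE):
    print(name, "->", text)
```

The IDs travel alongside the text as reference only, which is roughly what a CAT tool filter would do if it understood the format.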

Last year, I posted on the issue of unsophisticated translators and file formats. At the time, the client’s India development center was working on a way of extracting these strings to .xls files. Sadly, there has been no progress worth announcing, so here we are with tweezers, pulling out the strings for translation.

I don’t see an easy way around it. Do you?

John White of venTAJA Marketing is a localization project manager and consultant.

photo credit: http://www.flickr.com/photos/cursedthing/ / CC BY-ND 2.0

Salesforce.com Localization – A Work in Progress

April 29th, 2010

Rebecca Ray and I had a chat about localization and software as a service (SaaS) products.

This had my dander up because I’ve been tangentially associated with the localization of a Salesforce.com application for the last few months, and there are some gaping holes between how you localize one and how we as an industry have come to understand localizing everything else in the world.

The industry standard, of course, is to externalize everything in need of translation so that it is not part of the code base. Anything else is what we refer to as a “giant localization leap backwards.” Once all of the bits of UI are out of code and into a resource file, we hand it off to computer-assisted translation (CAT) tools that make it easy for translators to do their work. The tools use fuzzy matching and lookups to help in translating consistently throughout the product, documentation, marketing collateral, Website, etc. The resulting translated resource files then go back into the code, and the result is a localized product.

Salesforce.com is doing something different. The bad news is that it’s annoying; the good news is that they realize it and have plans to address it.

(Disclaimer: Salesforce.com is hosting next week’s Localization Unconference in San Mateo, and I plan to attend, so I should not be a churlish guest by slanging their outrageously successful product. I shall be polite.)

Translation in the Cloud

If you use Salesforce.com, you know that you access it on the Web through a browser. It does not reside on your computer the way that, say, a copy of Microsoft Excel does. This “cloud” model is popular, and Salesforce.com is hardly alone in providing it: Intuit, Oracle, SAP, Google Docs and a jillion other vendors and packages run this way. In fact, Lionbridge wants to move the entire localization function into the cloud with its GeoWorkz initiative.

The problem is that not only is all of your company’s data in the cloud, but everything in the user interface is, too. So, if your company has developed an elaborate customization of Salesforce.com’s product – as my client has – and you suddenly need to localize all of it, you must translate a great many strings that reside in the cloud and that you cannot simply spin down into a resource file and hand off to a translator.

Salesforce.com has done this deliberately. In their fervor to keep everything in the cloud, they have built and made available their Translation Workbench utility, which allows your translators – you do know who your translators are, don’t you? – to translate your customized application in the cloud.

The problem is…

It’s not the Jedi way.

When translators work in the cloud, they have no recourse to the above-described CAT tools that contribute so roundly to their ability to deliver a professional, consistent translation. It’s like “tastes great” and “less filling:” you can’t really have both, despite what the commercials would have you believe.

They need to scribble notes to themselves and remember stuff. If several of them are working on the project, they have to phone or IM one another, instead of having all the history and intelligence reside in the tools. It’s like going back to translating when all you had was a typewriter, a dictionary and a sheet of paper.

The Other Way to Localize a Salesforce.com Application

The forward-thinking localization product manager at Salesforce.com, Shawna Wolverton, is painfully aware of this problem. She told me that the company has in place a somewhat clunky procedure for exporting all translatable strings through the use of a high-end version of Salesforce.com, the Force.com IDE, the Eclipse IDE, three forceps, a banana and a bicycle pump.

Shawna has made available a document on this procedure, titled “Localizing with the Force.com IDE.” As the document describes the procedure,

The Force.com IDE can offer great time savings by allowing you to work with your translations in XML files and use a simple interface to load the translations into Salesforce.com easily.

Even with my 10% markup, you can obtain the document for free.

The result of this procedure is that you will have XML files which your translators or language service provider can pull into CAT tools, translate and hand back to you for re-import to your application.

Shawna also tells me that they envision an even easier export-import function as an option in the main product sometime in the near future.

“What’s next, Johnny?”

I’m glad you asked.

Ten years ago, we had the same problem, except that instead of translatable content residing in the cloud, it was in our content management systems (CMS). Companies had invested mega-bundles of money to centralize documents in CMS, and people in my position felt, well, silly having to pull documents and files out of the CMS, attach them to e-mail messages or burn CDs with them, and send them to our localization partners. Silly, I say.

The vendors came to our rescue by building interfaces between their CAT tools (or the server versions thereof) and our CMS, so that they could find changed files, pull them out for handoff to translators while we slept, and check them back into the correct language branch in CMS before we knew they’d even been touched.

I have no doubt that this similar predicament will inspire vendors to enable their tools for the cloud, so that they can find your Salesforce.com application, translate the UI while you’re asleep, and let people in other countries work in their own language before you’ve had your first cup of coffee the next morning.

Ain’t technology grand?

Unsophisticated Translator vs. File Format – Who Wins?

November 19th, 2009

Of the dozens of unsung challenges of translation, file formats are among the most infamous.

There is almost always a disconnect between the format in which content creators save their files and the format (or the tools, really) in which the translators want to work. The content is in .pdf, and the translators want to work with MS Word .doc files. The software application works with .xml, and the translators want to work with MS Excel .xls files.

One client, which distributes a platform for creating mobile phone applications, is grappling with this at the moment. It manages resources in a structured file format based on a programming language called Lua, so its translatable text looks like this:

ModRsc {
	name  ="IDS_STRING_1001",
	id    = 1001,
	type  = 1,
	data  =EncStringRscData(0x03, "This is a new app."),
}

They have asked me how translation works, because they’re willing to build a converter into their editing tool that takes resources like “This is a new app” and puts them into a format (.doc, .xls, .txt, etc.) that translators can use.

I applaud this kind of thinking, and spent about an hour on the phone with their development team in Hyderabad discussing it.

Background on Translators

There are two kinds of translators:

  1. Sophisticated – proper localization companies using computer-aided translation (CAT) tools and professional linguists
  2. Unsophisticated – “Here, Bob (or Najiv or Youli or Ramesh). Have your brother-in-law translate this for us.”

Sophisticated translators will use tools that parse XML, HTML, .rc, .properties, etc. files flawlessly, isolating the text for them to translate and hiding the code, tags and other things we don’t want them going near. As long as the convention is reliable – e.g., translate anything in quotation marks – the sophisticated translator can modify the parser to find and extract the text. These translators and tools are appropriate for apps with any number of translatable strings.

Unsophisticated translators are limited to tools like MS Word and Excel. Therefore, somebody or something needs to pull out and transform the translatable text into these formats. Then, after translation, somebody or something needs to reverse the transformation and put the text back in. These translators and tools are appropriate for apps with small numbers (<100) of translatable strings.
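The pull-out and put-back steps can be sketched in Python. This is only an illustration under stated assumptions: the ModRsc layout from the sample above, no escaped quotation marks inside strings, and CSV standing in for whatever spreadsheet format the translator prefers. The names export_rows and import_rows are hypothetical, and the Spanish string is invented for the demonstration.

```python
import csv
import io
import re

# Assumed: each block has name="..." before data =EncStringRscData(0xNN, "...").
# Group 1 is everything up to the opening quote of the text, group 4 the closing quote.
ENTRY = re.compile(
    r'(name\s*=\s*"(?P<name>[^"]*)".*?EncStringRscData\(\s*0x[0-9A-Fa-f]+\s*,\s*")'
    r'(?P<text>[^"]*)(")',
    re.DOTALL,
)

def export_rows(lua_source):
    """Write ID/text pairs as CSV text that a spreadsheet can open."""
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(["id", "text"])
    for m in ENTRY.finditer(lua_source):
        writer.writerow([m.group("name"), m.group("text")])
    return out.getvalue()

def import_rows(lua_source, translated_csv):
    """Put translated text back, keyed by string ID; missing IDs keep the source text."""
    rows = csv.DictReader(io.StringIO(translated_csv))
    translations = {row["id"]: row["text"] for row in rows}

    def swap(m):
        new_text = translations.get(m.group("name"), m.group("text"))
        return m.group(1) + new_text + m.group(4)

    return ENTRY.sub(swap, lua_source)

SOURCE = '''ModRsc {
	name  ="IDS_STRING_1001",
	id    = 1001,
	type  = 1,
	data  =EncStringRscData(0x03, "This is a new app."),
}
'''

sheet = export_rows(SOURCE)
# Stand-in for the translator's work on the spreadsheet.
translated = sheet.replace("This is a new app.", "Esta es una aplicación nueva.")
localized = import_rows(SOURCE, translated)
print(localized)
```

Keying the return trip on the string ID rather than on row order means a translator who sorts or filters the sheet does not scramble the resources.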

Recommendation for Resource Files

I explained that the Lua files would pose no problem for sophisticated translators. Tools like Trados, SDLX and Déjà Vu will easily parse and isolate translatable content in quotation marks.

For unsophisticated translators, one of the engineers suggested creating a plugin for MS Word that would parse all translatable text and allow translators to work in their favorite tool. There would be no need for developing transformations or conversion routines to go from Lua format into .xls/.txt/.doc/etc. The plugin could save the translated version back out in native Lua format.

Everybody wins with this solution.

  • My client’s engineers get off the hook of creating an intricate program to cover multiple output and input formats for resources.
  • The app developer has an easy way of pushing resources out to translators and pulling in the translated result.
  • Sophisticated translators don’t need to change the way they work at all.
  • Unsophisticated translators get a simple format that only requires that they install an MS Word plugin.

What kinds of file-format trade-offs do you have to make?

photo credit: woodleywonderworks

"I can quit smoking whenever I want to."

July 24th, 2008

“…I just don’t want to.”

Have you heard that one before? I heard something similar last week from a director of engineering:

“All of our strings are embedded in source code. This is deliberate, and we planned it very carefully.”

How would you have reacted?

At first, I figured he was pulling my leg (“taking the mickey,” “having me on,” etc.). Then he explained the process of localizing strings in the GNU gettext model, which can live peacefully without external resources.

A line of code reading

result = wx.MessageDialog(None, _("Welcome to my blog. Today is %s") % date.today())

uses the _ function in the English context as an identity function. In a localized context it will load the language pack built using the GNU gettext utilities and map the English strings to the localized equivalent:

“Welcome to my blog. Today is %s” -> “Bienvenido a mi blog. Hoy es %s”
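The identity-versus-lookup behavior of _ can be sketched with Python's standard gettext module. A real deployment would load a compiled .mo catalog from disk; in this sketch the hypothetical DictTranslations class stands in for one so that the example runs without any files.

```python
import gettext

class DictTranslations(gettext.NullTranslations):
    """Hypothetical stand-in for a compiled .mo catalog, backed by a plain dict."""

    def __init__(self, strings):
        super().__init__()
        self._strings = strings

    def gettext(self, message):
        # Fall back to the English source string when no translation exists.
        return self._strings.get(message, message)

# NullTranslations.gettext() returns its argument unchanged: the identity case.
english = gettext.NullTranslations()
spanish = DictTranslations(
    {"Welcome to my blog. Today is %s": "Bienvenido a mi blog. Hoy es %s"}
)

def greet(translations, today):
    _ = translations.gettext
    return _("Welcome to my blog. Today is %s") % today

print(greet(english, "2008-07-24"))
print(greet(spanish, "2008-07-24"))
```

The source code keeps its English string either way; only the object behind _ changes.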

To redeem what seems like shortsightedness in allowing developers to embed strings in code, these utilities also contain scripts that can pull out all the English strings from source code and make localization packages, which translators can work on without danger of touching the code. Other scripts can push the localized strings back into place.

Like .properties files in Java and .rc files in C++, these localization packages isolate non-code elements for easy localization. However, a programmer’s coding mistake could still result in strings going undetected by the scripts, so I still plan to perform pseudo-translation and internationalization testing on this software as soon as possible.
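To see how a string can slip past extraction, here is a rough imitation of what an extraction script does, written with Python's ast module. It is not how the GNU utilities are implemented; it only collects literal strings passed to _(), which is enough to show that a string the developer forgot to wrap never reaches the translators. The function name and the sample code are invented for the illustration.

```python
import ast

def find_marked_strings(source):
    """Collect string literals passed as the first argument to _()."""
    found = []
    for node in ast.walk(ast.parse(source)):
        if (isinstance(node, ast.Call)
                and isinstance(node.func, ast.Name)
                and node.func.id == "_"
                and node.args
                and isinstance(node.args[0], ast.Constant)
                and isinstance(node.args[0].value, str)):
            found.append(node.args[0].value)
    return found

CODE = '''
title = _("Welcome to my blog. Today is %s")
warning = "Disk full"   # the developer forgot to wrap this one
'''

marked = find_marked_strings(CODE)
print(marked)
```

Only the wrapped string is found. “Disk full” would ship in English in every language, which is exactly the kind of gap pseudo-translation exposes.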

Just in case the director of engineering can’t quit smoking as easily as he thinks he can.

Keeping an eye on Catalyst

November 29th, 2007

In localization, “Catalyst” is a tool from Alchemy Software. Among other things, it allows you to localize UI elements within software resource files, sometimes without the need to rebuild the software manually into binary format.

Since software binaries come from text files, part of Catalyst’s value lies in straddling the divide between allowing the translator to change strings in these text files (say, from English to Japanese) and displaying them in the binary, run-time format in which the user will see them on screen.

Last month a vendor returned some resource files to me which we had them localize from English to Japanese. I rebuilt the binaries (language-resource DLLs) and ran them. Unfortunately, a number of items were suddenly missing from the Japanese menus, so I had to troubleshoot the problem.

My first thought was that either a person or a tool (or a person using a tool) had modified something that should not be affected by the localization process. I had handed off a resource file containing these lines:

32777 MENU DISCARDABLE
BEGIN
POPUP "&Tools"
BEGIN
MENUITEM "Serial P&ort Settings...", ID_TOOLS_SERIALPORTSETTINGS
MENUITEM "&Network Settings...", ID_TOOLS_NETWORK
MENUITEM "&Battery Settings...", ID_TOOLS_BATTERYSETTINGS
END
END

32779 MENU DISCARDABLE
BEGIN
POPUP "&File"
END

They returned to me a resource file containing these strings:

9 MENU DISCARDABLE
BEGIN
POPUP "???(&T)"
BEGIN
MENUITEM "??????????(&O)...", ID_TOOLS_SERIALPORTSETTINGS
MENUITEM "????????(&N)...", ID_TOOLS_NETWORK
MENUITEM "???????(&B)...", ID_TOOLS_BATTERYSETTINGS
END
END

11 MENU DISCARDABLE
BEGIN
POPUP "????(&F)"
END

There was nothing wrong with the translation, and the string IDs were intact. The product has long been “double-byte clean,” so I knew that the software was not gagging on the Japanese characters.

The problem lay in the menu ID numbers, which are 32777 and 32779 in the English, but which came back in the Japanese files as 9 and 11. The vendor believes that Catalyst changed them, since they had used it for resizing and QA.

Normally, this kind of renumbering has no effect on how the binary functions. In this case, however, the effect is profound: code somewhere in the software looks for “32777” and “32779”, and when it doesn’t find those IDs, it cannot complete the menu. This is poor internationalization in the code base, which I have discussed with Engineering to no avail, so I need to police the resource files in each round of localization.
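Policing the files by eye gets old quickly, so a small check could flag renumbered menus before the binaries are rebuilt. This is a minimal sketch, assuming the menus appear in the same order in the source and localized .rc files and that each menu starts with a line of the form "number MENU". The function names are made up for this example.

```python
import re

# Matches lines such as "32777 MENU DISCARDABLE" and captures the numeric resource ID.
MENU_ID = re.compile(r"^\s*(\d+)\s+MENU\b", re.MULTILINE)

def menu_ids(rc_text):
    """Return the numeric menu IDs in the order they appear in the file."""
    return [int(n) for n in MENU_ID.findall(rc_text)]

def changed_menu_ids(source_rc, localized_rc):
    """Pair menus by position (an assumption) and report any whose ID differs."""
    pairs = zip(menu_ids(source_rc), menu_ids(localized_rc))
    return [(src, loc) for src, loc in pairs if src != loc]

ENGLISH = """32777 MENU DISCARDABLE
BEGIN
POPUP "&Tools"
END

32779 MENU DISCARDABLE
BEGIN
POPUP "&File"
END
"""

# Simulate the renumbering seen in the returned Japanese file.
JAPANESE = ENGLISH.replace("32777", "9").replace("32779", "11")

drift = changed_menu_ids(ENGLISH, JAPANESE)
print(drift)
```

An empty result means the IDs survived the round trip. Anything else is worth a conversation with the vendor before rebuilding.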

How is Catalyst working for you? Have you seen similar problems?

Interested in this topic? You might enjoy another article I’ve written called Localized Binaries – The Plot Thickens

How to pseudo-translate, Part I

March 6th, 2007

Before you localize your software product, wouldn’t you like to have an idea of what’s going to break as a result?

If you’ve written it in English, it will surprise and alarm you to learn that that’s no assurance that it will work when the user interface (UI) is in Chinese or Arabic or maybe even Spanish. The most conspicuous vulnerabilities are:

  • text swell, in which “prompt” becomes “Eingabeaufforderung” in German, for example, and the 40 pixels of width you’ve reserved in the English UI leave room for only a small part of the German;
  • corrupted characters, which will show up in the UI as question marks or little black boxes because characters such as à, ü, ¿, ß, Ø and ??? aren’t in the code page or encoding under which your software is compiled;
  • illegible or invalid names of files and paths, which occur when installing your software on an operating system that will handle more kinds of characters than your product will;
  • crashes, which occur when your software mishandles the strange characters so badly that the program just giggles briefly and then dies;
  • ethnocentric business logic, which leads to ridiculous results when users select unanticipated countries or currencies;
  • hard-coded anything, whether currency symbols, standards of measurement (metric vs. English) or UI strings.
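The corrupted-character case in the list above is easy to reproduce. The sketch below, in Python, asks whether a string can be stored in a given encoding and shows what a lossy conversion leaves behind. The German word “Größe” and the helper name survives are illustrative choices, not taken from any particular product.

```python
def survives(text, encoding):
    """Return True if every character in text can be stored in the given encoding."""
    try:
        text.encode(encoding)
        return True
    except UnicodeEncodeError:
        return False

print(survives("Prompt", "ascii"))    # plain English text fits anywhere
print(survives("Größe", "ascii"))     # ö and ß are not in ASCII
print(survives("Größe", "cp1252"))    # but they are in the Western European code page

# A lossy conversion swaps each unsupported character for a question mark.
mangled = "Größe".encode("ascii", errors="replace").decode("ascii")
print(mangled)
```

The question marks in the last line are the same ones that turn up in a UI when the text and the code page disagree.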

In the past, localization efforts have become stranded on these beaches late in the voyage, after the text has been translated and the binaries rebuilt. It needn’t be that way.

Internationalization testing is the process of pushing alien characters and situations down your software’s throat to see what breaks. The more complex the software, the more complex the testing, such that there are companies that specialize in internationalization as much as if not more than localization.

It’s not rocket science, but it doesn’t happen on its own, either. And, you don’t want your customers worldwide doing any more of your internationalization testing than absolutely necessary, because they really don’t appreciate buying the product and then testing it.

The process requires some cooperation between Engineering and QA, which should already be in place for the domestic product and can easily be extended to the international products as well. An upcoming post will explain some of the tools and techniques for proper internationalization testing.

Localized Binaries – The Plot Thickens

December 7th, 2006

The engineer has demonstrated that it is no longer possible to build just the resource binaries; it is now necessary to build the entire blinking product.

“Why is that?” I ask.

“We’ve improved the makefile,” he replies. The makefile is a script used by the make command to build binaries.

“That doesn’t feel like an improvement to me,” I venture. “Why can’t I just build the two or three resource binaries I need? I don’t need all of the executables and other rot.”

“Yes, well, we’ve improved the makefile.”

“But there was a small, localized makefile that lived in each of the directories of the resource binaries I wanted. What happened to them?”

“We improved the main makefile by rolling all of those lower-level makefiles into it.”

That’s a hint to me that they improved it for the purpose of creating all of the files that go into the installer, but that’s far and away more files than I want. It also means that it’s probably going to take me a half-hour now to build binaries that used to take about six seconds each.

Had they been following good I18n hygiene, they’d have asked themselves (or me, even) whether there were costs associated with consolidating all of the lower-level makefiles and eliminating the possibility of rebuilding except in this huge batch. The costs don’t really affect them that much, though they’ll slow me down somewhat.

It’s an “improvement.”

Pulling the rug out from under the Localization Manager

November 27th, 2006

It’s a thrill for the gearhead in me to build my own localized binaries.

Most projects in the wide world don’t require this of the localization manager, of course. A staff engineer, or at least a release engineer, is usually tasked with building the binaries that house the localized software resources. There’s some delay involved in that, though, since the engineers don’t often place very high priority on building these infernal things, let alone building them as often as the localization QA cycles require.

The localizers are able to preview the localized resources in their localization environment (Alchemy, Visual Studio, etc.), but our engineers have an arcane build environment and procedure that I don’t care to impose on even my least liked localization vendor, simply because it’s an open invitation to failure. Instead, I persuaded the engineer who created the entire scheme to spend three hours duplicating the environment with me so that I could document it and reproduce it on a quick turnaround.

“Why are you going to all this trouble?” the engineer asked me.

“I’m trying to drive you crazy this one time so that I don’t drive you crazy eight or nine times over the next few weeks. The translators will find new things to change as they continue localizing the rest of the product, and they’ll change the resource files. If I have to bug you with each one of these changes, you may come to view localization as something, well, inconvenient.”

“Good. Thanks for sparing me that.”

That was for version 2.0.0 of this software. I was able to save precious days by doing my own builds and turning the binaries around to the localizers promptly. I also saved myself all of the credibility and Brownie points I’d have had to mortgage by running to the engineers all the time.

Now that we’re localizing version 2.0.1, however, the procedure is changed. The engineers have pulled the rug out from under me, and nothing that used to work, works. Time to bug the engineer again and get the updated Rosetta Stone so that I can build these things.

Bad internationalization practice

August 20th, 2006

Unfortunately, there’s been another architecture change besides the move to .NET: Engineering has split the resource DLL into two pieces.

This is not bad news in itself, but there is a tricky dimension to putting the two DLLs together at run time, and the engineers have handled it in a way that assumes a little too much.

The main menu contains the usual entries (File, Edit, View, Tools, Windows, Help), each of which contains a submenu. The localization hiccup is that some of the submenu items live in one DLL, and the others live in the other DLL. What brings them together at run-time? The software depends on the presence of the string “&Edit” in each one. What happens when “&Edit” gets translated? “Oh, well, I guess we didn’t think of that…”

The pseudo-translated string reads “&ßéüdßéüt”. The sets of submenu items don’t find one another in the DLLs at run-time, so they simply don’t show up in the menus. Another triumph for the farsightedness of internationalization testing, and back to the drawing board for the developers.
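The failure is easy to model. The sketch below is a simplified Python stand-in for the two resource DLLs, not the product's actual code: the identifier IDM_EDIT and the submenu items are hypothetical. It contrasts a lookup that depends on the display string with one that depends on an ID translators never touch.

```python
def items_by_label(resources, label):
    """Fragile: matches on display text, which translators are supposed to change."""
    return [item for menu in resources if menu["label"] == label
            for item in menu["items"]]

def items_by_id(resources, menu_id):
    """Sturdier: matches on an identifier that localization leaves alone."""
    return [item for menu in resources if menu["id"] == menu_id
            for item in menu["items"]]

def localize(dll, new_label):
    """Simulate translation: labels change, IDs do not (items left as-is for brevity)."""
    return [dict(menu, label=new_label) for menu in dll]

# Hypothetical contents of the two resource DLLs.
DLL_ONE = [{"id": "IDM_EDIT", "label": "&Edit", "items": ["Cut", "Copy"]}]
DLL_TWO = [{"id": "IDM_EDIT", "label": "&Edit", "items": ["Find"]}]

english = DLL_ONE + DLL_TWO
pseudo = localize(DLL_ONE, "&ßéüdßéüt") + localize(DLL_TWO, "&ßéüdßéüt")

print(items_by_label(english, "&Edit"))  # works while the UI is in English
print(items_by_label(pseudo, "&Edit"))   # finds nothing once the label changes
print(items_by_id(pseudo, "IDM_EDIT"))   # unaffected by translation
```

The second call returning nothing is the missing submenu; the third shows one way the lookup could be made translation-proof.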

Pseudo-translating the resource files

August 16th, 2006

I probably shouldn’t enjoy this stuff so much, but I’m a gearhead at heart, so I get a lot of gratification from climbing around inside resource files.

One of the unsung virtues of localization consulting is pseudo-translation and subsequent QA. The goal is to replace the source (in this case, English) strings with well thought-out gibberish, in an effort to make the software barf. This can take a number of forms, such as:

  • truncated strings
  • corrupted characters
  • hard-coded strings
  • expanses of blank space where strings should be; and
  • crashes (my favorite)
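For readers who want to try this, here is a minimal pseudo-translation sketch in Python. The substitution table, the padding rule and the brackets are my own assumptions rather than the scheme used on the project described here: accented look-alikes test character handling, padding simulates text swell, and brackets make truncation obvious on screen.

```python
# Hypothetical substitution table; any accented look-alikes would do.
ACCENTED = str.maketrans("aeiouAEIOUnsc", "àéîöüÀÉÎÖÜñšç")

def pseudo_translate(text, swell_percent=30):
    """Accent the letters, pad for text swell, and bracket the result."""
    out = []
    i = 0
    while i < len(text):
        # Simplification: keep two-character sequences such as %s (placeholder)
        # and &E (menu accelerator) intact so the software can still use them.
        if text[i] in "%&" and i + 1 < len(text):
            out.append(text[i:i + 2])
            i += 2
        else:
            out.append(text[i].translate(ACCENTED))
            i += 1
    padding = "~" * max(1, len(text) * swell_percent // 100)
    return "[" + "".join(out) + padding + "]"

print(pseudo_translate("Clear"))
print(pseudo_translate("&Edit"))
print(pseudo_translate("Welcome to my blog. Today is %s"))
```

A string that shows up on screen without its closing bracket has been truncated, and one that shows up with no accents at all is probably hard-coded.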

I’m not really all that happy that I’ve caused the software to crash, but at least it vindicates the function of localization project management in general and pseudo-translation in particular in a way that even the most jaded developer cannot ignore.