Archive for the ‘Web localization’ Category

Internationalization Puzzler Resolved

April 30th, 2009

A few weeks ago I posted on an I18n problem with IBM WebSphere that was causing corrupted characters to display. In short, WebSphere had told the browser to ignore the stated page encoding (UTF-8) and to display the page as if encoded for Latin-1. Not the Jedi way.

Our engineers had to get this escalated to tier 3 with IBM support. This seemed ridiculous to me, because we can’t have been the only WebSphere site trying to display Spanish and Portuguese, and other people must have complained about such a silly problem, but it took tier 3 to get us a solution, and that’s all that matters.

The short answer: we needed to change all of our top-level JSPs to explicitly set the response encoding to UTF-8 (response.setContentType("text/html; charset=UTF-8");). Once the engineers had done that, the container finally returned a consistent result in UTF-8. It’s still a bit confusing why the UTF-8 encoding was returned on some pages and not on others, but it all seems to work now, so we happily closed this case with IBM.

I append the entire response from IBM, simply so that it will live in one more place on the Web for future searches.

Two points before diving into your server1-non-working trace:

I. How WebSphere application server set Default Response Encoding:

If autoResponseEncoding is true:
1. Check request locale, set it if exists
2. Get encoding from request.getCharacterEncoding()
3. set the encoding according to the above locale
4. set to default ISO-8859-1

finally setContentType() with the charset set to above encoding.

Please note that autoResponseEncoding is independent from
autoRequestEncoding or client.encoding.override.

II. From Servlet Spec:
1. setContentType(): only work if response has not been committed (i.e
before getWriter)
2. Dispatch Include (SRV.8.3)
Any attempt to set headers or call any method that affects the
headers of the response will be ignored.

==============================================

Now this is your server1-non-working’s analysis:

1. autoResponseEncoding set the encoding according to step (I.3) above:
from the locale (en_us)
thus the encoding is ISO-8859-1

setContentType type --> text/html; charset=ISO-8859-1

2. The request to /home.wfl will result in a dispatch forward to
[/WEB-INF/jsp/home.jsp] which in turn including other resources:

[/WEB-INF/jsp/home.jsp]

setContentType type --> text/html

+include /WEB-INF/jsp/include/header.jsp
++including /WEB-INF/jsp/include/includes.jsp]
++including /WEB-INF/jsp/include/syncstatus.jsp]

(there are several attempts to call the setContentType() within the
including JSP but got ignored … though I do not see any attempt to set
charset to UTF-8)

3. getWriter() with the encoding that found/set in step 1: ISO-8859-1.

=================================

You might want to check the top level JSP (i.e. home.jsp in this case)
and setContentType accordingly and before the response is committed.
Do not set it in the including resources as it will be ignored.
————————

Thank you for using IBM products and support.
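
The two Servlet-spec points in IBM's reply are easier to see in a toy model. The sketch below is plain Java, not the real servlet API: the class ResponseModel and its methods are hypothetical stand-ins that only mimic the rules quoted above (calls made inside an include are ignored, calls after the response is committed are ignored, and a content type without a charset leaves the current encoding alone). How WebSphere behaves internally is not something this code can confirm.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Toy stand-in for a servlet response; NOT the real javax.servlet API. */
public final class ResponseModel {
    private static final Pattern CHARSET =
            Pattern.compile("charset=\"?([^;\"\\s]+)", Pattern.CASE_INSENSITIVE);

    private String charset;        // encoding the writer will use
    private boolean committed;     // true once the writer has been obtained
    private boolean insideInclude; // true while an included resource runs

    /** The container picks a default first (ISO-8859-1 in IBM's analysis). */
    public ResponseModel(String containerDefaultCharset) {
        this.charset = containerDefaultCharset;
    }

    /** Ignored after commit or inside an include; otherwise adopts any charset given. */
    public void setContentType(String contentType) {
        if (committed || insideInclude) {
            return;
        }
        Matcher m = CHARSET.matcher(contentType);
        if (m.find()) {
            charset = m.group(1);
        }
    }

    /** Runs an included resource; header changes it attempts are dropped. */
    public void include(Runnable includedResource) {
        boolean previous = insideInclude;
        insideInclude = true;
        try {
            includedResource.run();
        } finally {
            insideInclude = previous;
        }
    }

    /** Models getWriter(): commits the response and reports the encoding in use. */
    public String getWriterCharset() {
        committed = true;
        return charset;
    }

    public static void main(String[] args) {
        // Broken flow: top-level page sets no charset, an include tries too late.
        ResponseModel broken = new ResponseModel("ISO-8859-1");
        broken.setContentType("text/html");
        broken.include(() -> broken.setContentType("text/html; charset=UTF-8"));
        System.out.println(broken.getWriterCharset());

        // Fixed flow: top-level page sets the charset before anything is written.
        ResponseModel fixed = new ResponseModel("ISO-8859-1");
        fixed.setContentType("text/html; charset=UTF-8");
        System.out.println(fixed.getWriterCharset());
    }
}
```

In this model the first response stays on ISO-8859-1 and the second ends up on UTF-8, which matches the advice to set the content type in the top-level JSP rather than in an included one.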

Internationalization Puzzler: Page Encoding

April 3rd, 2009

For a Web localization project, we’ve pseudo-translated the Java-based site, which is running on IBM WebSphere.

To pseudo-translate, we padded all of the strings with leading ¿¡ÃÉ and trailing ßÎÕÜ (the target languages this round are all covered by Latin-1). The characters are UTF-8 encoded, and every page is generated with a meta tag declaring charset=utf-8.
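
For readers who have not done pseudo-translation, the padding step can be sketched in a few lines of Java. This is a minimal illustration, not the tool we actually used: the class name PseudoTranslator and the method pad are made up for the example, and the only thing taken from the project is the set of padding characters.

```java
/** Hypothetical helper that pads UI strings for pseudo-translation. */
public final class PseudoTranslator {
    // The leading and trailing padding quoted in the post, written as Unicode
    // escapes so the encoding of this source file cannot corrupt them.
    private static final String PREFIX = "\u00BF\u00A1\u00C3\u00C9";
    private static final String SUFFIX = "\u00DF\u00CE\u00D5\u00DC";

    private PseudoTranslator() {
    }

    /** Wraps a string in the padding; null and empty strings pass through unchanged. */
    public static String pad(String source) {
        if (source == null || source.isEmpty()) {
            return source;
        }
        return PREFIX + source + SUFFIX;
    }

    public static void main(String[] args) {
        // What you see here depends on your console encoding.
        System.out.println(pad("Download trial"));
    }
}
```

Any string that shows up on the site without the padding was never externalized, and any padding that shows up corrupted points to an encoding problem somewhere along the way.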

As WebSphere sends the pages back, many of them look fine; e.g.:

[Image: good_chars]

However, many of the pages display the characters as corrupted:

[Image: bad_chars]

Oddly, the browser reports that these bad pages are encoded as Western European (ISO), in spite of the fact that the charset in the page source shows UTF-8. If you switch the browser to display the page as UTF-8, the characters show up properly.

It appears that WebSphere is telling the browser, “I know what’s best. Ignore the UTF-8 in the charset and handle this page as ISO,” and the browser obliges.
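
The corruption itself is easy to reproduce away from any server. The sketch below, with hypothetical names MojibakeDemo, misread and reread, encodes the leading padding characters as UTF-8 and then decodes those bytes as ISO-8859-1, which is roughly what a browser does when it is told the page is Western European (ISO). It illustrates the symptom only; it says nothing about why the container sends the wrong charset.

```java
import java.nio.charset.StandardCharsets;

/** Hypothetical demo of UTF-8 bytes being read as ISO-8859-1. */
public final class MojibakeDemo {
    private MojibakeDemo() {
    }

    /** Decodes UTF-8 bytes the way a browser would if told the page is ISO-8859-1. */
    public static String misread(String text) {
        byte[] utf8 = text.getBytes(StandardCharsets.UTF_8);
        return new String(utf8, StandardCharsets.ISO_8859_1);
    }

    /** Reverses misread(): the equivalent of switching the browser view to UTF-8. */
    public static String reread(String garbled) {
        byte[] raw = garbled.getBytes(StandardCharsets.ISO_8859_1);
        return new String(raw, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String original = "\u00BF\u00A1\u00C3\u00C9"; // the leading padding characters
        String garbled = misread(original);
        System.out.println(original.length() + " characters become " + garbled.length());
        System.out.println(reread(garbled).equals(original));
    }
}
```

Each of the four padding characters takes two bytes in UTF-8, so the misread string has eight characters. Because the bytes themselves are intact, reading them again as UTF-8 restores the original, which is why changing the browser's encoding setting makes the page look right.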

Even more maddeningly, this does not happen on all pages, but only some pages in the site. All pages in the site (so I’m told) are created identically.

This happens with both Firefox and IE. The engineers have experimented with Tomcat, which does not act up like this, but we need to make WebSphere work.

Have you ever seen this? Any ideas on what could be tricking the browser?

I Don’t Want to Localize That, And You Can’t Make Me

April 10th, 2008

Ever hear your children make similar utterances, except with different predicates (“go to bed,” “clean my room,” “do my chores”)?

One of our clients is staffed with people too polite to say things quite so bluntly (but not too polite to dig in their heels similarly). The upside is that I enjoy working with almost everybody I’ve ever encountered there; the downside is that there are some places where their global-readiness is stuck.

Web presence
I bend over backwards to uphold a simple rule of Web navigation: Localize everything along the click-path to a visitor’s goal. So, if a visitor starts on a Russian home page, and decides she wants to download a Russian version of the trial product, I believe in ensuring that she doesn’t have to put up with English on her way through the site, unless she wants to do so. Or, if we need to push her to an English page, I try to make it apparent with “English only” next to the link.

We’ve made a lot of progress in combing out the remnants of English that dot many localized sites (that makes my skin crawl – how about you?), so that each page is linguistically pure to several levels. That was expensive and it took a lot of time.

The problem now is that this client’s site relies on a lot of plumbing for verifying that the visitor is not from an axis-of-evil country, or hacking the site, or trying to perform unsupported operations, and the UI for all of this infrastructure is in English. Much of it is just background code, but several pages (login, registration) are in English, and the infrastructure team is not interested in localizing any of it, despite my polite persistence.

License agreement
This is a hot one. When you download trial software from Germany or Finland, do you click “I accept” to a license agreement in that language? Granted, German and Finnish are not the lingua franca that English is, but how much more does it say about the relationship you want to have with your customers when you make some effort to inform them of your business terms in their language?

“We don’t translate anything legal, and it’s not hurting sales,” I keep hearing. This is a battle I know I’ll never win, so I look for victories elsewhere.

Site logic
There was a sudden need some months ago to place a terms and conditions page in the click-path to a particular download. Naturally, the terms and conditions were in English only, and I could have lived with that. The problem is that the code behind the page sent the visitor to a particular next page, without regard for where the visitor was going when he landed on the terms and conditions page.

So, visitors on the way to download the Korean/Japanese/Chinese/Spanish… version of the product sailed along the click-path in their own language until reaching the terms and conditions page. Then they agreed to text they almost certainly did not read, then they landed on a completely irrelevant page of English, with no reasonable way of getting back to their intended destination.

This is related to the inflexible plumbing I mentioned above. It’s great infrastructure; it’s just monolingual and it doesn’t really need to be.
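
One common way around this kind of dead end is to have the interstitial page carry the visitor's intended destination with it. The following is a rough sketch of that idea and not a description of the client's code: the class TermsRedirect, the /terms path, the next parameter and the example paths are all invented for illustration.

```java
import java.net.URLDecoder;
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

/** Hypothetical helper that remembers where a visitor was going. */
public final class TermsRedirect {
    private static final String FALLBACK = "/";

    private TermsRedirect() {
    }

    /** Builds a link to the terms page that remembers the intended destination. */
    public static String termsUrl(String intendedPath) {
        return "/terms?next=" + URLEncoder.encode(intendedPath, StandardCharsets.UTF_8);
    }

    /**
     * After the visitor accepts, returns the remembered path. Anything that is
     * not a plain site-relative path falls back to the home page, so the
     * parameter cannot be used to bounce visitors to another site.
     */
    public static String afterAccept(String encodedNext) {
        if (encodedNext == null) {
            return FALLBACK;
        }
        String next = URLDecoder.decode(encodedNext, StandardCharsets.UTF_8);
        boolean siteRelative = next.startsWith("/")
                && !next.startsWith("//")
                && !next.contains("\\");
        return siteRelative ? next : FALLBACK;
    }

    public static void main(String[] args) {
        String link = termsUrl("/ko/download/trial");
        System.out.println(link);
        System.out.println(afterAccept("%2Fko%2Fdownload%2Ftrial"));
    }
}
```

With something like this in place, a visitor heading for the Korean download would agree to the terms and then continue to the Korean download rather than to an unrelated English page.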

Anyway, these bits of stubbornness amount to a small downside in a client that is not stingy about localization in general and that has a strong global presence. They’re correct when they say, “I don’t want to localize that, and you can’t make me,” so it’s easier to roll with the punches and enjoy other victories.

Besides, if I’m around long enough, they’ll move on and I can take the matter up with their successors.

What things have people told you they’re not going to localize, and you can’t make them?

A localization lesson from North Korea

February 14th, 2008

You never know how your next lesson in localization will reach you. What if you were to learn something from a digital trailblazer like North Korea?

The International Herald Tribune reports that North Korea is offering overseas shoppers the chance to buy hundreds of its goods through the Internet. Offered items include boxing gloves, bicycles, commemorative stamps, roller skates and uniforms for Taekwondo. While the site is obliging enough to accept credit cards, it has not yet reached the capitalistic level of revealing prices.

But the most forward-thinking aspect of the site is that, right out of the gate, it is localized into Korean, English, Chinese, Russian and Japanese. How many of us are that enterprising and globally inclined? Show this post to your boss and say, “See? Even lowly North Korea can develop and implement a localization strategy. Why can’t we?”

The hitch: The site has been live since 31 Dec 07, but it spends a lot of time off line. Try your luck: www.dprk-economy.com/en/Shop/index.php

Amtrak.com in deutscher Sprache!

November 2nd, 2007

This is probably more in the domain of John Yunker, whose Global by Design site focuses on American companies coming out of their americocentric stupor, but I’ll mention that Amtrak’s site has been localized into Spanish and German.

This is a hot one. Passenger train travel is not exactly all the rage. The network is not expanding noticeably, and even after Antarctica melts, Americans still aren’t going to get out of their cars and take a train, except to amuse their children. Why throw marketing dollars at a localized Web site?

Why Spanish? Because hundreds of thousands of Hispanic Americans need to move from city to city, and if they’re going to take the train, it’s easier for them to research routes and schedules in their own language. On the other hand, the railroads in Mexico, in particular, are a popular joke, and buses long ago displaced trains as the default means of intercity passenger transportation. So it seems that Amtrak sees the demographic potential, but may have some cultural baggage to overcome in attracting this new ridership, not to mention the issue of whether their sector of the Hispanic market uses the Web (yet).

Why German? Because Germans (and Austrians and Swiss) believe in the trains, I suppose. This is even more intriguing than the Spanish site, because it required more research than simply picking up the newspaper and reading that Hispanic buying power in the U.S. will have risen 347% to almost $1 trillion from 1990 to 2009 (“The Multicultural Economy, 1990-2009”, from the Selig Center for Economic Growth). The move to German must have involved polling actual passengers and getting hip to the fact that these people not only think in terms of train travel, but also use the Web to research it.

Both Spanish and German sites are more than mere afterthoughts; they seem to be comprehensively translated, several levels deep. Notes:

  • They even translated “California” as “Kalifornien” in the state drop-down menus, no doubt as a nod to the governor.
  • Don’t use accented characters when you enter your name on the Spanish site. The error message telling you what you did wrong and how to rectify it is still in English.
  • I don’t know which credit card the Spanish- and German-speaking travelers are likely to use, but the only choices are Visa, MasterCard, AmEx, Diners and Discover. No debit cards, no PayPal.

Sometimes the mechanics of a localization project are less compelling than the story behind it. If you know the story behind the Amtrak localizations – or an offbeat story behind a project you’ve done – please post it here.

(Blogger’s note: Travel between San Diego and Los Angeles by car has lost all of its allure, and I opt for Amtrak whenever I can. I have had multiple pleasant conversations in Spanish with people on this train route, usually people from Mexico visiting family in Southern California.)

Localization Testbenches, Part III (Web sites)

April 13th, 2007

What are you using to test your localized products? If you’re handing them to your domestic QA team and expecting that they’ll intuitively test them with correct language locale settings, you may be in for an unpleasant surprise.

2) Web sites
Performing QA on localized Web sites requires less overhead. Most browsers nowadays accommodate multi-byte characters automatically and without installing language packs. So, if you send e-mail pointing your executive staff to your newly unveiled Chinese site, you can be sure that most of them will see a page free of corrupted characters.

Dumb HTML pages, however, are not where the biggest problems lie. You need to ensure that your forms, auto-responders, search pages, search result pages, search indexing and databases all support the target languages.

This can in fact be more complex than localizing software; while localizing software requires plugging a lot of holes, they’re all your holes. Localizing an entire Web presence and user experience can require plugging holes among different products and platforms: Web server, Web application, database, reporting utility, e-mail software, shopping cart. To do right by the visitors to your Web site, you need to test everything all along this click-path.

It’s rare that you will need to build a new app server based on the Hebrew versions of Linux, Apache, etc. to support these languages and users. But if characters and messages are not surviving all the way through your e-path, you may need to enable the support in several places, order some language packs and research the locale’s computing needs. This is hard to do on a testbench in a lab, so you may end up testing your own international-stage instances on production servers throughout the organization.