Internationalization Puzzler: Page Encoding
For a Web localization project, we’ve pseudo-translated our Java-based site, which runs on IBM WebSphere.
To pseudo-translate, we padded every string with a leading ¿¡ÃÉ and a trailing ßÎÕÜ (the target languages this round are all Latin-1). Characters are UTF-8 encoded, and every page is generated with a meta tag declaring charset=utf-8.
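A minimal sketch of that padding step, assuming the strings pass through a single Java helper (the class and method names are hypothetical, not from the site's code). Unicode escapes keep the source file pure ASCII, so it compiles the same whatever source encoding javac assumes:

```java
import java.nio.charset.StandardCharsets;

public class PseudoTranslate {
    // Leading and trailing padding from the post, written as Unicode escapes.
    public static final String PREFIX = "\u00BF\u00A1\u00C3\u00C9";
    public static final String SUFFIX = "\u00DF\u00CE\u00D5\u00DC";

    // Wrap a UI string so untranslated or mis-decoded text stands out.
    public static String pad(String s) {
        return PREFIX + s + SUFFIX;
    }

    // True if the padding survived intact, e.g. after a round trip through a page.
    public static boolean isIntact(String s) {
        return s.startsWith(PREFIX) && s.endsWith(SUFFIX);
    }

    public static void main(String[] args) {
        String padded = pad("Save");
        // 8 padding chars at 2 bytes each + 4 ASCII bytes = 20 bytes in UTF-8.
        System.out.println(padded.getBytes(StandardCharsets.UTF_8).length);      // 20
        // ISO-8859-1 stores every Latin-1 char in one byte: 12 bytes.
        System.out.println(padded.getBytes(StandardCharsets.ISO_8859_1).length); // 12
    }
}
```

Every padding character is outside ASCII, so it takes two bytes in UTF-8 but one in ISO-8859-1; any page whose bytes are decoded with the wrong charset shows the damage immediately.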
As WebSphere serves the pages, many of them look fine, with the padding characters displayed correctly.
However, many other pages display those characters corrupted.
Oddly, the browser reports that these bad pages are encoded as Western European (ISO), even though the charset in the page source says UTF-8. If you switch the browser to display the page as UTF-8, the characters show up properly.
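These symptoms match UTF-8 bytes being decoded as ISO-8859-1. A small sketch (class and method names are illustrative) reproduces both halves: the garbling, and the recovery when the same bytes are read as UTF-8. One caveat: browsers actually treat the ISO-8859-1 label as windows-1252, so a few bytes render as different visible characters than Java's ISO-8859-1 decoder produces, but the pattern of each accented letter turning into two characters is the same.

```java
import java.nio.charset.StandardCharsets;

public class Mojibake {
    // What the bad pages do: the server sends UTF-8 bytes, but the browser
    // decodes them as ISO-8859-1 ("Western European (ISO)").
    public static String misdecode(String original) {
        byte[] wireBytes = original.getBytes(StandardCharsets.UTF_8);
        return new String(wireBytes, StandardCharsets.ISO_8859_1);
    }

    // What switching the browser to UTF-8 does: ISO-8859-1 maps each byte to
    // exactly one char, so the garbled string still holds the original bytes,
    // and decoding them as UTF-8 restores the text.
    public static String recover(String garbled) {
        byte[] wireBytes = garbled.getBytes(StandardCharsets.ISO_8859_1);
        return new String(wireBytes, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String padded = "\u00BF\u00A1Save";   // inverted ? and ! followed by "Save"
        String garbled = misdecode(padded);
        // Each padding char becomes two: A-circumflex plus the original symbol.
        System.out.println(garbled.equals("\u00C2\u00BF\u00C2\u00A1Save")); // true
        System.out.println(recover(garbled).equals(padded));                 // true
    }
}
```

Because no information is lost in the garbled view, switching the browser's encoding to UTF-8 repairs the page at once, which points at a labeling problem rather than corrupted data.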
It appears that WebSphere is telling the browser, “I know what’s best. Ignore the UTF-8 in the meta charset and handle this page as ISO,” and the browser obliges.
Even more maddeningly, this happens not on every page but only on some pages in the site. All pages in the site (so I’m told) are created identically.
This happens in both Firefox and IE. The engineers have experimented with Tomcat, which does not misbehave this way, but we need to make WebSphere work.
Have you ever seen this? Any ideas on what could be tricking the browser?
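My first guess is that your HTTP Content-Type header says ISO-8859-1, even though your HTML “meta http-equiv” header says UTF-8. When the two disagree, the browser follows the HTTP header, which would explain both the “Western European (ISO)” label and why manually switching to UTF-8 fixes the display. I’d suggest first using a tool to look at the HTTP headers you are actually receiving, then looking at the WebSphere settings that control which HTTP headers it sends.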
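If you want to script that check, here is a sketch using only the JDK (`HttpURLConnection`). The class name, the regular expressions, and the simplified precedence rule (header charset first, then the meta tag, ignoring byte-order marks and user overrides) are assumptions for illustration, not a line-for-line copy of what any browser implements. Run it against one good page and one bad page and compare:

```java
import java.io.IOException;
import java.io.InputStream;
import java.net.HttpURLConnection;
import java.net.URI;
import java.nio.charset.StandardCharsets;
import java.util.Locale;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class CharsetCheck {
    // charset parameter of a Content-Type header value, optionally quoted.
    private static final Pattern HEADER_CHARSET =
        Pattern.compile("charset\\s*=\\s*\"?([^\";\\s]+)", Pattern.CASE_INSENSITIVE);
    // charset inside a <meta> tag: <meta charset=...> or the http-equiv content attribute.
    private static final Pattern META_CHARSET =
        Pattern.compile("<meta[^>]+charset\\s*=\\s*[\"']?([^\"'>;\\s/]+)", Pattern.CASE_INSENSITIVE);

    public static String headerCharset(String contentType) {
        if (contentType == null) return null;
        Matcher m = HEADER_CHARSET.matcher(contentType);
        return m.find() ? m.group(1).toUpperCase(Locale.ROOT) : null;
    }

    public static String metaCharset(String html) {
        Matcher m = META_CHARSET.matcher(html);
        return m.find() ? m.group(1).toUpperCase(Locale.ROOT) : null;
    }

    // Simplified precedence: a charset in the HTTP header beats the meta tag.
    public static String effectiveCharset(String contentType, String html) {
        String fromHeader = headerCharset(contentType);
        return fromHeader != null ? fromHeader : metaCharset(html);
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            System.err.println("usage: java CharsetCheck <page-url>");
            return;
        }
        HttpURLConnection conn = (HttpURLConnection) URI.create(args[0]).toURL().openConnection();
        String contentType = conn.getContentType();
        String html;
        try (InputStream in = conn.getInputStream()) {
            // ISO-8859-1 decodes any bytes losslessly; enough to find an ASCII meta tag.
            html = new String(in.readAllBytes(), StandardCharsets.ISO_8859_1);
        } finally {
            conn.disconnect();
        }
        System.out.println("HTTP Content-Type: " + contentType);
        System.out.println("meta charset:      " + metaCharset(html));
        System.out.println("effective charset: " + effectiveCharset(contentType, html));
    }
}
```

If the bad pages report ISO-8859-1 in the Content-Type header while the good ones report UTF-8 (or no charset at all), that would point to WebSphere's response headers rather than the page templates.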
The W3C tutorial on character encodings at http://www.w3.org/International/O-charset has good information and links about what the HTTP and HTML should say.