GNOME Website Localization

Strategy

Problem

Whenever a new GNOME web site has been proposed in the past, one requirement that basically everybody has agreed on is that the web site content should be localized into different languages. The problem is that almost every CMS that supports localization does so in a different way, almost always implementing localization from scratch and reintroducing problems that were solved many years ago in other areas. Unfortunately, many web hackers working on CMS products that implement localization seem to know little about localization, and often underestimate the problems that arise from attacking the problem from the wrong end and reinventing a (bad) wheel.

At the same time, GNOME is a desktop that has several dozen high-quality localizations produced by existing teams of several hundred translators. When selecting a web site tool we should take advantage of that, and promote that fact.

This is why it is extremely important that when we choose a tool, we choose one that helps solve the larger problems and lets us integrate with the existing GNOME localization efforts. Putting a check mark next to "this CMS supports localization" is not sufficient. The important question is how it supports localization. If it does so in an extremely backwards way that does not integrate with other GNOME localization efforts, then the CMS might very well be the wrong tool for the job, no matter how good it is in other areas.

Already Decided Stuff

What should be localized?
The consensus was that on the www.gnome.org web page, only the existing and official content should be localized. Local communities can continue to use their own web sites to promote their community.

Remaining Questions

  1. Who should do the localization?
  2. How should they be notified of changes?
  3. How do we make it easy and attractive to translate?
  4. How do we ensure translation quality?
  5. How should we make sure enough content is translated into a particular language?

Proposed Solution

My proposal (ChristianRose) is to use the GNOME Translation Project for the web site localization, i.e. the project that already localizes the GNOME applications and their documentation. That would address all the points above, because the project already has tools, policy, and infrastructure for this. The last point could be handled by some additional policy that we may choose at any time.

However, using the GNOME Translation Project requires that we can integrate with the tools that they use:

Requirements
  1. As little content as possible should be inside images. Use SVG if you cannot avoid that.
  2. As many links as possible should stay the same regardless of the localization used, so that we do not need to provide "localized" links unnecessarily (some external links may require localization though, so as to point to the correct localized version of the external site).
  3. The web site should support RTL (right to left) content for the languages that require that.
  4. The translatable content of the web site should be provided in po file format. If the application does not support gettext/po files natively, it should be able to export the translatable content as XML files, so that we may extract the translatable content and subsequently merge the translations back into the XML in a build step. This requirement is extremely important -- if we cannot export content into and import translations from po files, we cannot use the existing GNOME translation efforts.
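
The extract-and-merge round trip in requirement 4 can be sketched roughly as follows. This is only an illustration under simplifying assumptions: the function names are hypothetical, only element text is treated as translatable (attributes and text following child elements are ignored), and the generated entries omit the header that real po files carry. In practice an existing tool such as intltool would do this work.

```python
import xml.etree.ElementTree as ET


def po_escape(text):
    """Escape a string for use inside a double-quoted po entry."""
    return text.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")


def extract_messages(xml_text):
    """Collect unique, non-empty element texts in document order."""
    root = ET.fromstring(xml_text)
    messages = []
    for element in root.iter():
        text = (element.text or "").strip()
        if text and text not in messages:
            messages.append(text)
    return messages


def to_pot(messages):
    """Render messages as untranslated msgid/msgstr pairs (no po header)."""
    entries = ['msgid "{}"\nmsgstr ""\n'.format(po_escape(m)) for m in messages]
    return "\n".join(entries)


def merge_translations(xml_text, catalog):
    """Return the XML with element texts replaced by their translations.

    `catalog` maps original strings to translated strings; untranslated
    strings (missing or empty) are left in the original language.
    """
    root = ET.fromstring(xml_text)
    for element in root.iter():
        text = (element.text or "").strip()
        if text in catalog and catalog[text]:
            element.text = catalog[text]
    return ET.tostring(root, encoding="unicode")


page = "<page><title>Home</title><p>GNOME is free software.</p></page>"
print(to_pot(extract_messages(page)))
print(merge_translations(page, {"Home": "Hem"}))
```

Strings without a translation stay in English, so a partially translated po file still yields a complete page.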

Technical Stuff

Already Decided Stuff

How do we choose the correct localization?
The consensus seemed to be that we should parse the browser's settings, but provide a method for the visitor to override the default selection. In practice this usually means, in order:
  1. See if there is a language cookie or GET request parameter. If there is, use that as the language selection.
  2. Parse the HTTP Accept-Language header until we find a match, and use that as the language selection.
  3. Fall back to the default language (English).
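
The three steps above can be sketched as follows. This is a minimal illustration, not code from any GNOME site: the function names and the `override` parameter (standing in for the cookie or GET value) are assumptions, and the header parsing is deliberately simplified.

```python
def parse_accept_language(header):
    """Return the language tags of an Accept-Language header, best first."""
    choices = []
    for position, item in enumerate(header.split(",")):
        parts = item.strip().split(";")
        tag = parts[0].strip().lower()
        if not tag:
            continue
        quality = 1.0  # a tag without a q parameter has full weight
        for param in parts[1:]:
            name, _, value = param.strip().partition("=")
            if name.strip().lower() == "q":
                try:
                    quality = float(value)
                except ValueError:
                    quality = 0.0  # treat a malformed weight as "not wanted"
        if quality > 0:
            # Sort by descending quality, then by position in the header.
            choices.append((-quality, position, tag))
    return [tag for _, _, tag in sorted(choices)]


def choose_language(available, override=None, accept_language="", default="en"):
    """Pick a language: explicit override, then browser settings, then default."""
    available = {lang.lower() for lang in available}
    # 1. Cookie or GET parameter set by the visitor (ignored if unsupported).
    if override and override.lower() in available:
        return override.lower()
    # 2. First Accept-Language entry we have; "sv-SE" also matches plain "sv".
    for tag in parse_accept_language(accept_language):
        if tag in available:
            return tag
        primary = tag.split("-")[0]
        if primary in available:
            return primary
    # 3. Nothing matched.
    return default


print(choose_language(["en", "sv", "sr"], accept_language="hr, sr;q=0.8, en;q=0.5"))
```

An override naming a language the site does not offer is ignored here rather than treated as an error, so a stale cookie cannot lock a visitor out of the browser-based selection.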

Comments

I built an interesting solution for my SoC'06 project, library.gnome.org. It uses Apache's MultiViews to choose the right page, falling back to English if no localized page exists (or it was outdated and has been removed) for any of the languages listed in the HTTP Accept-Language header.

The nice things are:

  • If you find a link somewhere on the web and follow it, you will get the localized version of the page (if it exists). The same goes for searching the web with an English keyword.
  • You can set up an ordered list of preferred languages in your browser. For me, it is Serbian (sr), Serbian (hsb), Croatian (hr), English (en). With a custom solution (like Wikipedia's) you need to check by hand, for every article, whether it exists in any of these languages. This way it is done automatically.
  • Direct links to the displayed page in other languages let you switch over quickly.
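
The fallback behaviour described above can be approximated in a few lines. This is a simplified illustration, not Apache's actual negotiation algorithm: the function name is hypothetical, and the "page.lang" file naming (for example index.html.sr) is an assumption about how the variants are stored.

```python
def pick_variant(page, existing_files, preferred_languages, default="en"):
    """Return the first existing localized variant of a page, or None.

    Languages are tried in the visitor's order of preference, and the
    default language is tried last as the fallback.
    """
    for lang in list(preferred_languages) + [default]:
        candidate = "{}.{}".format(page, lang)
        if candidate in existing_files:
            return candidate
    return None


files = {"index.html.en", "index.html.hr"}
print(pick_variant("index.html", files, ["sr", "hsb", "hr", "en"]))
```

With the preference list from the example above and only English and Croatian variants on disk, the Croatian page is chosen without the visitor having to look for it.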

As you may know, this approach has problems when you don't want to reconfigure your browser, or when you want to see the same page in another language and then follow links to other pages. My solution is an option to override the HTTP Accept-Language header with a cookie, set from JavaScript. With a cookie, only one language can be selected (no list of preferred languages), but it does the trick.

The only remaining problems arise if you have JavaScript disabled (I didn't want to add server-side processing) and want to see the web site in a language other than the one in your browser's setup, or if you are using a broken caching proxy with JavaScript disabled. In both cases (which are rare and broken), you will need to change the language by clicking a link on every page you load.

You can see a demonstration with code at http://goranrakic.com/lgo/

As for notifications about updates or broken links: the POT file will change after such an edit, and intltool-update will then mark the affected messages in the PO files as fuzzy. Translators already know how to handle this: they update and resubmit the PO file. I still think there is a need for something like "site-push" to regenerate all updated web site pages and all localized pages, rather than having server-side dynamic parsing of MO files for every page request. -- (GoranRakic)


Yes, dynamic lookups into mo files may be unnecessary most of the time, so a static merge (if it's from XML, we can use intltool for that) under a "build" step is probably the best way. We can probably make that trigger whenever there's a content change. -- (ChristianRose)
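
One way such a build step could decide what to regenerate is to compare modification times, much as make does. The sketch below is only an assumption about how this might work; the function name and the shape of its arguments are hypothetical.

```python
def pages_to_rebuild(source_mtime, po_mtimes, output_mtimes):
    """Return the languages whose generated page is missing or out of date.

    source_mtime:  modification time of the untranslated source page
    po_mtimes:     {language: modification time of that language's po file}
    output_mtimes: {language: modification time of the generated page},
                   with no entry for pages that were never built
    """
    stale = []
    for lang, po_mtime in sorted(po_mtimes.items()):
        built = output_mtimes.get(lang)
        # Rebuild when either input is newer than the generated page.
        if built is None or built < max(source_mtime, po_mtime):
            stale.append(lang)
    return stale


# The Serbian po file (120) is newer than its generated page (110).
print(pages_to_rebuild(100, {"sv": 90, "sr": 120}, {"sv": 110, "sr": 110}))
```

A content change updates the source page's timestamp, which makes every localized page stale at once, while a new translation only triggers a rebuild for its own language.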


GnomeWeb/Localization (last edited 2008-02-03 14:45:23 by anonymous)