What should be localized?

A common source of questions is what should be localized and marked for translation. This is a broad question with no exact, always-applicable answer. A good rule of thumb is to reverse the question: localize and mark for translation everything, with a few exceptions. Such exceptions generally include debug messages intended only for the developers themselves, and data such as file names, variable names, binary names, etc.

In this context, a debug message is one that exists purely for the developers' own debugging and would be incomprehensible to any other technical person who has not looked at the code in question. Such messages should not be marked for translation. On the other hand, messages that could be of use to any user or administrator other than the developers should be marked for translation. Fictitious examples of the first category could be:

  "Entered do_funcy_stuff () loop..."
  "Division by zero when calculating sector_size variable"

Messages like these are clearly debugging output by nature, and are most likely useful to no one but the developers themselves. As such, they shouldn't be marked for translation. Examples of the second category could be:

  "Couldn't read the media -- make sure the appropriate kernel modules are loaded"
  "The device couldn't be mounted writable because of a NFS permission issue"
  "Call to cdrecord returned with an error"
  "apm wasn't found"

These messages are technical in nature, yet they are useful not only to developers when debugging; they may also help other technical people resolve an issue themselves. As such, these messages should be marked for translation, since in many parts of the world being skilled at administering computers and being fluent in technical English do not necessarily go together.

Sometimes there is a danger that feedback from users will not be understandable to the developers, because the error messages being reported are translated. However, this can often be avoided by using error codes, which still allow the descriptive portion of an error message to be translated.
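
As a minimal sketch of the error-code approach, the following C fragment keeps a stable, untranslated code in front of a translatable description. The enum values, the function names and the "E%03d" prefix are hypothetical and chosen only for illustration, and the _() macro here is a pass-through stand-in for the usual gettext marker so that the fragment compiles on its own.

```c
#include <stdio.h>

/* Stand-in for the usual gettext marker. A real program would normally
 * define this as gettext(String) after including <libintl.h>. */
#ifndef _
#define _(String) (String)
#endif

/* Hypothetical error codes, for illustration only. */
enum app_error {
    APP_ERR_MEDIA_UNREADABLE = 101,
    APP_ERR_MOUNT_READONLY   = 102
};

/* Return the description for a code. Only this part is marked for
 * translation. */
const char *app_error_description(enum app_error code)
{
    switch (code) {
    case APP_ERR_MEDIA_UNREADABLE:
        return _("Couldn't read the media");
    case APP_ERR_MOUNT_READONLY:
        return _("The device couldn't be mounted writable");
    }
    return _("Unknown error");
}

/* Write "E<code>: <description>" into buf. The code prefix is left
 * untranslated on purpose, so a developer can identify the error even
 * when the description arrives in a language they cannot read.
 * Returns what snprintf returns. */
int app_error_format(char *buf, size_t size, enum app_error code)
{
    return snprintf(buf, size, "E%03d: %s",
                    (int)code, app_error_description(code));
}
```

With the pass-through macro, calling app_error_format() for APP_ERR_MEDIA_UNREADABLE yields a string that starts with "E101: ". A user who quotes that prefix in a bug report gives the developers something to search for regardless of the locale.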

Innocent unmarked messages

Almost anything that may appear to the user in one way or another needs to be marked for translation. Even a very innocent message like a simple "%d" may need to be localized for some languages (like Persian, which localizes it to "%Id" to use localized digits in the user interface). Following is a list of examples that need to be marked for translation, but in some cases were not:

  • "%d", "%f", etc.: Some languages, like Persian, use different forms of digits, which have their own Unicode characters. Translators for these languages localize such strings to "%Id", which is a gettext/glibc extension for using locale-dependent localized digits. Also, if you sprintf numbers into a buffer using these format strings, make sure that you allocate a large enough buffer, as a single localized digit may take up to four bytes. If you can, use g_strdup_printf instead and free the returned string when you are done. When marking these for translation, though, you must use a context prefix to disambiguate them, so that the translator can decide for each use case separately whether it should be localized. If you mark all the "%d"s in your application for normal translation by using _("%d"), there will be just one such string for the translator to translate, and that translation will affect every integer you have marked this way. Instead, use contexts and mark one, for example, as Q_("calendar:day:digits|%d"), which lets the translator use Latin digits for calendar days while using localized digits for other things. See the section on translation contexts above for details.

  • ",", ";", etc: Not every language uses the same separators used in English. While some languages may have a different preference, some use totally different characters as a list separator (Arabic and the languages using the Arabic script have their own comma, semicolon, question mark, etc).

  • "<b>%s</b>": This is an innocent way to mark something as boldface in the interface, to emphasize importance or make it a header. But not every language has a concept of modern boldface typefaces, and even where such fonts exist, they may not be the preferred way to express that kind of emphasis.

  • "%s <%s>": Let's say the first %s is the name of a contact person and the second one is that person's email address. Let's also assume that every locale in the world uses angle brackets to enclose the email address. Even in this case, the string still needs to be marked for translation for bidirectional languages like Hebrew and Arabic, where translators may need to insert bidirectional control characters here and there to make sure that the angle brackets appear in the right direction and that the name and email address appear in the right order.

  • "%d KB": Not every language uses the same abbreviation, KB, for kilobytes. Some languages even lack an abbreviation for kilobyte, so make sure there is some room on the screen for the cases where the two letters are expanded to a full translation of "kilobyte".

    • Also, "KB" is incorrect because it would expand to "Kelvin Bytes", which is nonsensical. The correct abbreviation for the kilo- multiplier is "k", hence kilobytes must be abbreviated as "kB". -- EmmanueleBassi
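
The "%d" advice above can be sketched in plain C as follows. This is an illustration under stated assumptions, not GLib's implementation: strip_context() is a hypothetical stand-in that mimics what Q_() returns when no translation exists (the text after the "|"), and format_int_alloc() is a hypothetical helper that lets snprintf compute the required size, in the spirit of g_strdup_printf, so that multi-byte localized digits cannot overflow a fixed buffer.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for the untranslated case of Q_(): return the
 * part of msgid after the first '|', or msgid itself if there is none. */
const char *strip_context(const char *msgid)
{
    const char *bar = strchr(msgid, '|');
    return bar != NULL ? bar + 1 : msgid;
}

/* Format one integer with a (possibly translated) format string and
 * return a newly allocated result. The size is measured first, so the
 * buffer is always large enough however many bytes each digit takes.
 * The caller frees the result. Returns NULL on failure. */
char *format_int_alloc(const char *format, int value)
{
    int needed = snprintf(NULL, 0, format, value);
    if (needed < 0)
        return NULL;

    char *out = malloc((size_t)needed + 1);
    if (out == NULL)
        return NULL;

    snprintf(out, (size_t)needed + 1, format, value);
    return out;
}

/* The context prefix gives translators a separate entry for calendar
 * days, independent of any other "%d" in the application. */
char *format_calendar_day(int day)
{
    return format_int_alloc(strip_context("calendar:day:digits|%d"), day);
}
```

In a GLib program the same effect would normally come from Q_() and g_strdup_printf directly; the stand-ins are only there so that the fragment compiles without GLib.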

Important Note: Conversely, make sure that things that may need to be processed by other programs later (saved in configuration files or logs, sent over the wire, etc.) are not marked for localization. Also, if a certain localization must follow a certain non-trivial format, make sure you check the returned localization for errors and fall back to a sane alternative if the translator has made a mistake.
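
One way to check a returned localization is to compare the printf conversions in the translated format string against those in the original and fall back to the original when they differ. The following is a rough sketch only; the function names are hypothetical. It is deliberately conservative: it ignores flags (including glibc's I flag), and it rejects positional arguments such as "%1$s" and "*" widths or precisions rather than trying to interpret them.

```c
#include <string.h>

/* Copy the length modifier and conversion character of the next
 * conversion in *p into out ("d", "ld", "s", ...), and advance *p.
 * out is "" when there are no more conversions and "!" when the
 * conversion is malformed or not handled by this sketch. */
static void next_conversion(const char **p, char out[4])
{
    const char *s = *p;
    out[0] = '\0';

    while ((s = strchr(s, '%')) != NULL) {
        s++;                                  /* skip the '%' itself */
        if (*s == '%') {                      /* "%%" is a literal percent */
            s++;
            continue;
        }
        s += strspn(s, "-+ #0'I");            /* flags, including glibc's I */
        s += strspn(s, "0123456789");         /* field width */
        if (*s == '.') {                      /* precision */
            s++;
            s += strspn(s, "0123456789");
        }
        size_t n = strspn(s, "hlLqjzt");      /* length modifier */
        if (n > 2 || s[n] == '\0' || s[n] == '*' || s[n] == '$') {
            strcpy(out, "!");
            *p = s + n;
            return;
        }
        memcpy(out, s, n + 1);                /* modifier plus conversion */
        out[n + 1] = '\0';
        *p = s + n + 1;
        return;
    }
    *p += strlen(*p);                         /* nothing left to scan */
}

/* Return 1 if both strings contain the same conversions in the same
 * order, 0 otherwise. */
int formats_compatible(const char *original, const char *translated)
{
    char a[4], b[4];

    for (;;) {
        next_conversion(&original, a);
        next_conversion(&translated, b);
        if (a[0] == '!' || b[0] == '!' || strcmp(a, b) != 0)
            return 0;
        if (a[0] == '\0')
            return 1;
    }
}

/* Use the translation only if it is safe to pass to printf with the
 * original arguments; otherwise fall back to the original format. */
const char *checked_format(const char *original, const char *translated)
{
    return formats_compatible(original, translated) ? translated : original;
}
```

For example, with the original "%d KB", a translation of "%Id kB" is accepted, while "%s KB" is rejected and the original is used instead. Because positional arguments are rejected, a translation that reorders arguments would also fall back to the original; a fuller check would need to understand them.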

TranslationProject/DevGuidelines/What should be localized (last edited 2008-02-03 14:47:22 by anonymous)