Avoid markup wherever possible

This is one case of "don't mark things for translation that shouldn't be translated". The cases where markup occurs in messages can be divided into two types. The first type is messages similar to this example:

   msgid "This text is <b>bold</b>."

In this case, the markup carries positional information: only the text "bold" should be bold, and the translator has to place the tags around the corresponding words in the translation. This use of markup is necessary.

The second type is messages where the entire message, or entire paragraphs or headings, are surrounded with markup, as in these examples:

   msgid "<b>Home Page Preferences</b>"
   
   msgid "<span size=\"medium\"><b>No file</b></span>"
   
   msgid ""
   "<span weight=\"bold\" size=\"larger\">What do you want to do with this "
   "file?\n"
   "</span>\n"
   "It's not possible to view this file type directly in the browser:"

In this type of message, the markup carries no relevant information for the translator, since all the translatable content is enclosed in the markup (the last example above could just as well have been split into two separate messages, so this still applies). Instead, the markup is an extra source of errors (when it gets accidentally "mistranslated"), a nuisance, and a lot of unnecessary work for translators. Whenever the markup for a message changes even slightly, all translations have to be updated, even though no translatable content changed. Whenever a new message is added with surrounding markup, it has to be "translated" again, even if the exact same message without that markup was translated before. This adds up, and it is not uncommon to find all of the following in the same po file:

   msgid "<b>Home Page Preferences</b>"
   
   msgid "<i>Home Page Preferences</i>"

   msgid "<span size=\"larger\"><b>Home Page Preferences</b></span>"

In short, every combination of the same message with different, irrelevant surrounding markup has to be "translated" separately, instead of a single "Home Page Preferences" message. This is more than a nuisance that slows down translation. Sometimes the surrounding markup is much longer than the actual message, which confuses gettext's fuzzy-matching: it either considers the message entirely new and doesn't fuzzy-mark it with a previous similar translation, or it fuzzy-matches on the markup instead of on the actual message. Fuzzy-matching is an important time-saver and helps keep terminology consistent across translations, so when it fails, the same terms end up translated differently.
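
To get a rough feel for how much of such a message is markup, the following sketch counts the characters that sit inside tags. It is only an illustration: count_markup_chars is a hypothetical helper, not part of gettext, and it does not reproduce gettext's fuzzy-matching algorithm.

```c
#include <stddef.h>

/* Hypothetical helper (not part of gettext): counts the characters of
 * msgid that belong to markup tags, i.e. every character from a '<' up
 * to and including the next '>'.
 * Simplifying assumption: the text contains no literal '<' or '>'. */
size_t
count_markup_chars (const char *msgid)
{
  size_t count = 0;
  int in_tag = 0;

  for (const char *p = msgid; *p != '\0'; p++)
    {
      if (*p == '<')
        in_tag = 1;   /* a tag starts here */
      if (in_tag)
        count++;      /* this character is part of a tag */
      if (*p == '>')
        in_tag = 0;   /* the tag ends here */
    }

  return count;
}
```

For the third msgid above, this counts 34 markup characters in a 55-character string, against only 21 characters of translatable text.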

The solution to these problems is to separate markup from gettext calls, so that the markup isn't passed through _(). This doesn't apply to the first type of message described above, but it certainly applies to the second. The examples above should be rewritten so that they appear in the po file like this:

   msgid "Home Page Preferences"

   msgid "No file"

   msgid "What do you want to do with this file?"

   msgid "It's not possible to view this file type directly in the browser:"

This can be accomplished by adding the markup in code, around the translated string, like this:

   str = g_strdup_printf ("<b>%s</b>",
                          _("My translated string"));
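
The same idea lets several differently styled labels share one msgid. The sketch below is an illustration under stated assumptions: the _() macro is a stand-in so that the code compiles without gettext, wrap_in_markup and the two build_* functions are hypothetical helpers, and plain snprintf replaces g_strdup_printf only to avoid the GLib dependency.

```c
#include <stdio.h>
#include <string.h>

/* Stand-in for gettext's _() macro so the sketch compiles on its own;
 * a real program would get _() from its gettext setup. */
#define _(msgid) (msgid)

/* Hypothetical helper: wraps an already translated string in markup.
 * Returns 0 on success, -1 if the result would not fit in buf. */
int
wrap_in_markup (char *buf, size_t size,
                const char *open_tag, const char *close_tag,
                const char *translated)
{
  size_t needed = strlen (open_tag) + strlen (translated)
                  + strlen (close_tag) + 1;

  if (needed > size)
    return -1;

  snprintf (buf, size, "%s%s%s", open_tag, translated, close_tag);
  return 0;
}

/* Both variants mark only the plain text for translation, so the po
 * file needs a single msgid "Home Page Preferences". */
int
build_bold_heading (char *buf, size_t size)
{
  return wrap_in_markup (buf, size, "<b>", "</b>",
                         _("Home Page Preferences"));
}

int
build_italic_heading (char *buf, size_t size)
{
  return wrap_in_markup (buf, size, "<i>", "</i>",
                         _("Home Page Preferences"));
}
```

Changing the styling later only touches the tag arguments, so no translation has to be updated. In GLib code, g_markup_printf_escaped is probably a better fit than g_strdup_printf here, since it also escapes characters such as & or < that a translation might contain.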

TranslationProject/DevGuidelines/Avoid markup wherever possible (last edited 2008-02-03 14:47:30 by anonymous)