Localize using gettext and intltool
This page consists of some parts of Christian Rose's L10N Guidelines for Developers document that could not be moved anywhere else. It does not provide enough information on localizing your application, and may be outdated. Please look for additional resources for developers on the TranslationProject wiki pages.
Today's applications typically store messages that should be translated in many different file formats apart from real source code files. Such files may be .desktop files, .soundlist files, XML-based file formats, and so on.
Using intltool solves many problems that are present when managing and editing translations of message entries in such files directly. Below I'll use .desktop files as the example, but all of this also applies to all the other non-code file formats that intltool supports.
intltool integrates the message entries in .desktop files into the application's normal pot file, and then later merges back the translations into the .desktop file at build time. By using this strategy, it solves the following important problems:
Trackability. With intltool, the completeness of .desktop entry translations is tracked together with the rest of the application's status on, for example, translation status web pages. With direct .desktop manipulation, the status of these translations has to be tracked separately (and usually isn't tracked at all).
Visibility. With intltool, the .desktop entries to translate are immediately visible to the translator in the same central place as all other messages (the .po files), so it is close to impossible for the translator to forget them. With direct .desktop file manipulation, the danger of forgetting them is exceptionally big: every application names and places its .desktop files differently, some applications have many .desktop files and some have none, and this is impossible to keep track of when you handle the translations of a large number of applications.
Notification. With intltool, changes and additions to the .desktop entries propagate directly into the pot file and get marked the same way as other message additions and changes, so the translator is notified of them. With direct .desktop file manipulation, changes in the original aren't tracked or marked, so messages may easily have incorrect translations forever.
Translation re-use. With intltool, the messages benefit from the translation re-use that po files provide. A message that appears both in the .desktop file and somewhere else in the application only needs to be translated once. Also, similar messages get fuzzy-matched, so the translator only needs to make the appropriate changes instead of translating the whole message again.
Stability. With intltool, each translator only edits his or her own .po file. With direct .desktop file manipulation, every translator needs to edit the same file, which may result in problems with different encodings and other accidents that ruin all translations. Separate .po files also greatly reduce the danger of Git conflicts and file access conflicts.
How to localize using gettext and intltool
This document does not try to explain how to enable gettext and intltool support in an application. There are other documents that describe that process better, and Malcolm Tredinnick has written an excellent document explaining the process of adding translation support to an existing application.
The PO format and the PO files
Even as a developer, knowing some elementary things about the PO format, the format of the actual translations, is very useful.
The PO format is a really simple format, which probably at least partly explains its success and widespread use. The format is essentially a list of key/value pairs consisting of msgid and msgstr entries, with the msgid being the original English string and key, and the msgstr being its translated value. As the English string is the key, all instances of the exact same English string in the code are represented by exactly one key/value pair, referred to as a message, in the PO file. Usually this is not a problem but a benefit of the format, as the translator won't have to translate the exact same string more than once. Below is an example of a message.
#: gedit/dialogs/gedit-plugin-program-location-dialog.c:78
#: gedit/dialogs/program-location-dialog.glade2.h:2
msgid "Set program location..."
msgstr "Ställ in programplats..."
In addition to the msgid and msgstr parts, a message usually also has lines starting with #: that tell which source files and lines the string used as msgid was extracted from. These lines have no syntactic value. They are only there to help translators and developers know where a message came from. For all PO parsing tools, the value of the msgid is what's used as the key and what actually distinguishes individual messages.
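The key/value nature of the format can be illustrated with a minimal sketch in Python. The parse_po function below is a hypothetical helper written for this page, not part of gettext or intltool. It assumes a simplified PO file without plural forms or message contexts, and it approximates the C-style string escapes of PO files with Python's own literal syntax.

```python
import ast

def parse_po(text):
    """Return {msgid: msgstr} for a simplified PO file (no plurals, no msgctxt)."""
    catalog = {}
    entry = {}
    current = None  # which field a continuation line belongs to

    def flush():
        # Store the message collected so far, if it is complete.
        if "msgid" in entry and "msgstr" in entry:
            catalog[entry["msgid"]] = entry["msgstr"]
        entry.clear()

    for raw in text.splitlines():
        line = raw.strip()
        if line.startswith("msgid "):
            flush()
            entry["msgid"] = ast.literal_eval(line[len("msgid "):])
            current = "msgid"
        elif line.startswith("msgstr "):
            entry["msgstr"] = ast.literal_eval(line[len("msgstr "):])
            current = "msgstr"
        elif line.startswith('"') and current is not None:
            # Continuation line: the quoted string is appended to the field.
            entry[current] += ast.literal_eval(line)
        else:
            # Blank lines and comments (including "#:" references) carry
            # no key or value; they only end the current message.
            flush()
            current = None
    flush()
    return catalog

SAMPLE = '''\
#: gedit/dialogs/gedit-plugin-program-location-dialog.c:78
#: gedit/dialogs/program-location-dialog.glade2.h:2
msgid "Set program location..."
msgstr "Ställ in programplats..."
'''

catalog = parse_po(SAMPLE)
print(catalog["Set program location..."])
```

Note that the two #: lines do not appear in the resulting dictionary at all: only the msgid identifies the message.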
A message in a PO file can be in one of essentially three different states. The message can be translated, fuzzy, or untranslated. A message counts as translated as soon as the msgstr part of it is non-empty. In a similar manner, an untranslated message is one where the msgstr is empty. The fuzzy state is special and essentially means that there is a translation in the msgstr part, but that this translation is most likely not entirely correct, and that it thus needs manual attention by a translator. A message can become fuzzy in one of two ways:
- The original string that the msgid represents was changed in the source code. A typo in the string may have been fixed or the string altered in some other way. The translator needs to check that the msgstr is still valid and make changes if necessary.
- A new string has been added to the source, and the string is very similar, but not identical, to the msgid of an already existing, translated message. Then the msgstr of that message will be automatically reused for the new message, but the new message will also at the same time be marked fuzzy so that the translator knows there is some difference that he or she needs to adapt the translation to match.
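The three states can be summarized in a small Python sketch. The message_state function is a hypothetical helper for illustration only. It relies on the fact that PO files record the fuzzy state as a flag on a comment line starting with "#," (for example "#, fuzzy") placed above the msgid; the flags argument stands for the flags found on that line.

```python
def message_state(msgstr, flags=()):
    """Classify a PO message as 'translated', 'fuzzy', or 'untranslated'."""
    if not msgstr:
        # An empty msgstr always means there is no translation yet.
        return "untranslated"
    if "fuzzy" in flags:
        # A translation exists but needs review by a translator.
        return "fuzzy"
    return "translated"

print(message_state("Ställ in programplats..."))
print(message_state("Ställ in programplats...", flags=("fuzzy",)))
print(message_state(""))
```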
There is always one special message in each valid PO file: the PO file header. It is encoded with the msgid for the empty string ("") as the key, and the actual header values are in the msgstr part. This unfortunately means that if you mark an empty string for translation, you will get the entire PO file header back as the "translation". In almost all cases this is probably not what you want. Hence, do not mark empty strings for translation.
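The effect can be reproduced with Python's standard gettext module, used here only as a convenient stand-in for the C library an application would normally use. The build_mo function is a hypothetical helper that writes a minimal compiled catalog (the binary .mo layout) in memory, and the header values are made up for the example.

```python
import gettext
import io
import struct

def build_mo(messages):
    """Serialize {msgid: msgstr} into a minimal little-endian GNU .mo file."""
    keys = sorted(messages)
    ids = b""
    strs = b""
    spans = []  # (id offset, id length, str offset, str length)
    for key in keys:
        kb = key.encode("utf-8")
        vb = messages[key].encode("utf-8")
        spans.append((len(ids), len(kb), len(strs), len(vb)))
        ids += kb + b"\0"
        strs += vb + b"\0"
    n = len(keys)
    ids_start = 7 * 4 + 16 * n  # 7-word header plus two index tables
    strs_start = ids_start + len(ids)
    table = []
    for id_off, id_len, _, _ in spans:
        table += [id_len, ids_start + id_off]
    for _, _, str_off, str_len in spans:
        table += [str_len, strs_start + str_off]
    # magic, version, count, msgid table offset, msgstr table offset,
    # hash table size and offset (unused here)
    header = struct.pack("<7I", 0x950412DE, 0, n, 7 * 4, 7 * 4 + 8 * n, 0, 0)
    return header + struct.pack("<%dI" % len(table), *table) + ids + strs

mo_data = build_mo({
    # The header is stored under the empty msgid (values are illustrative).
    "": "Project-Id-Version: demo\nContent-Type: text/plain; charset=UTF-8\n",
    "Set program location...": "Ställ in programplats...",
})
translations = gettext.GNUTranslations(io.BytesIO(mo_data))

print(translations.gettext("Set program location..."))
# Looking up the empty string returns the header, not an empty string.
print(repr(translations.gettext("")))
```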
Another good thing to know about PO files is that the validity of any particular PO file can always be checked by running msgfmt -cv file.po on it. This will also display the translation status for that particular PO file. To get the translation status for all PO files of a module, intltool-update can be used. Just run this inside the po subdirectory:
intltool-update --report
This will refresh all PO files against the current state of the code and display their current translation status. Just remember not to commit these altered files afterwards. Please keep in mind that the PO files are the domain of the translators, and developers committing updated PO files usually just clutter the Git history and increase the danger of accidental Git conflicts. GNOME translators take care of updating their PO files themselves by always running intltool-update on their PO file before updating the translation itself. Thus, there's usually no need for a developer to update the PO files, even though make dist usually wants to do just that. Please do not commit to Git any PO files that have been altered by anything other than changes to the actual translation. If you do need to, please ask the translators in advance.
POTFILES.in and POTFILES.skip
The file po/POTFILES.in specifies which source files should be used for building the .pot and .po files. It should list the file names, with paths relative to the project root, each on a single line.
In a similar way, a file po/POTFILES.skip can be added that specifies the files with marked-up messages that for some reason shouldn't be translated and hence shouldn't be in POTFILES.in. The format is the same as POTFILES.in.
Since it's the developers who usually know which files are used in the project and which ones aren't (and hence shouldn't be translated even though they contain marked-up messages), it's the responsibility of the developers to keep these files up to date. This is usually an easy task: intltool-update (part of the intltool package) has a --maintain option which allows for easy checking of files that have accidentally been left out. So to check whether POTFILES.in and POTFILES.skip are up to date, all one has to do is run this inside the po subdirectory of a fresh Git checkout:
intltool-update --maintain
Please make sure that both POTFILES.in and POTFILES.skip are up to date. Since it's a rather common mistake to forget to add files to POTFILES.in, any files with translatable content not present in either of these files will likely be treated as an accidental POTFILES.in omission and corrected as such.
Please remember that only files that are present in a fresh Git checkout should be listed in POTFILES.in or POTFILES.skip. This means that these lists should not contain any generated files. The reason is that translators need to be able to work on a fresh Git checkout without having to build anything. Listing generated files would cause errors or useless warnings when running intltool-update on a fresh Git checkout.
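A quick way to spot entries that would be missing from a fresh checkout is sketched below in Python. Both functions are hypothetical helpers, not part of intltool, and the file names in the demonstration are invented. The sketch assumes one path per line relative to the project root, skips blank lines and lines starting with #, and strips a leading bracketed tag such as [encoding: UTF-8] or [type: ...] if one is present.

```python
import os
import re
import tempfile

def listed_files(lines):
    """Yield the file paths named in POTFILES-style lines."""
    for line in lines:
        # Drop an optional leading "[...]" tag, then surrounding whitespace.
        entry = re.sub(r"^\[[^\]]*\]\s*", "", line.strip())
        if entry and not entry.startswith("#"):
            yield entry

def missing_files(lines, root):
    """Return listed paths that do not exist as files under root."""
    return [path for path in listed_files(lines)
            if not os.path.isfile(os.path.join(root, path))]

# Demonstration with a throwaway directory standing in for a fresh checkout.
with tempfile.TemporaryDirectory() as root:
    os.makedirs(os.path.join(root, "src"))
    with open(os.path.join(root, "src", "main.c"), "w") as handle:
        handle.write("/* placeholder */\n")

    potfiles = [
        "[encoding: UTF-8]",
        "# files with translatable strings",
        "src/generated-marshal.c",  # hypothetical generated file, not checked in
        "src/main.c",
    ]
    missing = missing_files(potfiles, root)

print(missing)
```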
Please also keep the POTFILES.in and POTFILES.skip files sorted alphabetically, using the C collating order if possible (LC_COLLATE=C). This helps to catch duplicates in the listings, and it helps manual inspection when comparing the content of these files with directory listings.
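The sorting and duplicate check can be sketched in a few lines of Python. The check_listing function is a hypothetical helper, and the file names are invented for the example. It assumes the entries have already been read into a list, and it compares the UTF-8 bytes of each entry, which corresponds to the byte-wise comparison used by the C collating order.

```python
def check_listing(entries):
    """Return (is_sorted, duplicates) for a list of file names."""
    def c_order(name):
        # Byte-wise comparison, as in the C locale.
        return name.encode("utf-8")

    is_sorted = entries == sorted(entries, key=c_order)
    duplicates = sorted({e for e in entries if entries.count(e) > 1}, key=c_order)
    return is_sorted, duplicates

# Uppercase letters sort before lowercase ones in the C collating order.
good = ["data/app.desktop.in", "src/Window.c", "src/main.c"]
bad = ["src/main.c", "src/Window.c", "src/main.c"]

print(check_listing(good))
print(check_listing(bad))
```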