GLib Unicode Support

GLib includes enough Unicode support to meet the needs of Pango and GTK+. This includes:

  • Conversion between UTF-8, UCS-2, UCS-4 and various other encodings
  • Character composition and normalization
  • Case conversion
  • Character properties such as script, character type, break type, combining class and width

The data tables behind these APIs are directly derived from the data files of the Unicode standard.
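UnicodeData.txt, one of those data files, holds one semicolon-separated record per code point. Several of its fields correspond to the features listed above: general category (character type), canonical combining class, decomposition mapping (composition and normalization) and the simple case mappings. The sketch below uses a hypothetical helper, ucd_fields, which is not part of GLib, to print those fields for a single code point. It assumes a POSIX shell, awk and the standard UnicodeData.txt field order. It only illustrates what the generated tables are built from and does not describe how gen-unicode-tables.pl actually parses the file.

```shell
#!/bin/sh
# Hypothetical helper (not part of GLib): print selected UnicodeData.txt
# fields for one code point. Usage: ucd_fields CODEPOINT FILE
# CODEPOINT must be written as in the file: uppercase hex, at least four
# digits (e.g. 00C5). Code points that only fall inside a range (such as
# CJK ideographs listed as <..., First>/<..., Last> pairs) are not matched.
ucd_fields() {
    awk -F';' -v cp="$1" '
        # Compare as strings: values such as "00E9" would otherwise be
        # read as numbers (0e9 == 0) and match the wrong record.
        ($1 "") == (cp "") {
            # Field numbers follow the UnicodeData.txt layout, counted
            # from 1 here because awk fields start at $1.
            print "name=" $2
            print "general_category=" $3
            print "combining_class=" $4
            print "decomposition=" $6
            print "uppercase=" $13
            print "lowercase=" $14
            print "titlecase=" $15
        }' "$2"
}
```

For example, `ucd_fields 00C5 ~/ucd/UnicodeData.txt` shows the canonical decomposition of Å into A plus a combining ring above (0041 030A) and its lowercase mapping, 00E5.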

Updating to a new version of the Unicode standard

To update GLib to a new version of the Unicode standard (for example, version 9.0.0), follow these steps:

  • Download the Unicode Character Database data files (UCD.zip) and unpack them into a directory such as ~/ucd
  • In glib/, regenerate gunidecomp.h, gunichartables.h, gunibreak.h and gscripttable.h by running:

    ./gen-unicode-tables.pl -both 9.0.0 ~/ucd
  • Update the GUnicodeScript enum in gunicode.h and the iso15924_tags array in guniprop.c with any newly added scripts. Hint: the four-character aliases for scripts (their ISO 15924 codes) are listed in ~/ucd/PropertyValueAliases.txt
  • Expand the test cases in glib/tests/unicode.c to cover the new additions
  • In tests/, regenerate casefold.txt and casemap.txt by running:

    ./gen-casefold-txt.pl 9.0.0 ~/ucd/CaseFolding.txt > casefold.txt
    ./gen-casemap-txt.pl 9.0.0 ~/ucd/UnicodeData.txt ~/ucd/SpecialCasing.txt > casemap.txt
  • Mention the newly supported version of the Unicode standard in NEWS
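The steps above can be collected into a few shell functions. This is a sketch, not a script shipped with GLib. All function names are hypothetical. It assumes that curl and unzip are installed, that UCD.zip can be downloaded from https://www.unicode.org/Public/<version>/ucd/UCD.zip, and that the helpers are run from the top of a GLib checkout with an absolute path for the UCD directory. The new_scripts helper lists Script values whose aliases appear in a new PropertyValueAliases.txt but not in an old one. Those are candidates for new GUnicodeScript values and iso15924_tags entries, but the enum and the array still have to be edited by hand.

```shell
#!/bin/sh
# Sketch of the Unicode update steps. All function names are hypothetical;
# curl, unzip and the UCD download URL below are assumptions.

# Download UCD.zip for VERSION (e.g. 9.0.0) and unpack it into DIR.
# Usage: fetch_ucd VERSION DIR
fetch_ucd() {
    mkdir -p "$2" &&
        curl -fsSL -o "$2/UCD.zip" \
            "https://www.unicode.org/Public/$1/ucd/UCD.zip" &&
        unzip -oq "$2/UCD.zip" -d "$2"
}

# Regenerate the generated files with the commands from the steps above.
# Run from the top of a GLib checkout; DIR must be an absolute path
# because each step first changes into a subdirectory.
# Usage: regen_unicode_files VERSION DIR
regen_unicode_files() {
    (cd glib && ./gen-unicode-tables.pl -both "$1" "$2") &&
        (cd tests &&
            ./gen-casefold-txt.pl "$1" "$2/CaseFolding.txt" > casefold.txt &&
            ./gen-casemap-txt.pl "$1" "$2/UnicodeData.txt" \
                "$2/SpecialCasing.txt" > casemap.txt)
}

# Print "Alias Long_Name" for every Script (sc) entry in a
# PropertyValueAliases.txt file, sorted in the C locale.
list_script_aliases() {
    awk -F' *; *' '$1 == "sc" { print $2, $3 }' "$1" | LC_ALL=C sort
}

# Print the Script entries present in NEW but not in OLD.
# Usage: new_scripts OLD_ALIASES_FILE NEW_ALIASES_FILE
new_scripts() {
    old_list=$(mktemp) && new_list=$(mktemp) || return 1
    list_script_aliases "$1" > "$old_list"
    list_script_aliases "$2" > "$new_list"
    LC_ALL=C comm -13 "$old_list" "$new_list"
    rm -f "$old_list" "$new_list"
}
```

For example, with the 8.0.0 data unpacked in ~/ucd-8 and the 9.0.0 data in ~/ucd, `new_scripts ~/ucd-8/PropertyValueAliases.txt ~/ucd/PropertyValueAliases.txt` should list the scripts added in Unicode 9.0, such as Adlam (Adlm). A script whose long name changed between versions would also appear in that list, so review the output before editing gunicode.h.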

Projects/GLib/UnicodeSupport (last edited 2015-10-06 11:05:29 by MatthiasClasen)