An Open Discussion on Architecting A Unified Text Layout Engine for FLOSS Systems
This Open Discussion is to be held at the Gnome Live! 2006 Boston Text Layout Summit. Please note that the Text Layout Summit is not Gnome-specific in any shape or form, although of course the Gnome community is participating and hosting the event. We encourage all stakeholders in the FreeDesktop and worldwide FLOSS community to send in suggestions and participate.
As this topic has garnered a lot of interest, a significant amount of time will be set aside for the Discussion. At the present time it is still not decided whether the Discussion will take place on Saturday or on Sunday, although some voices are suggesting devoting a significant fraction of Sunday to such a discussion. This will also permit some folks who will not be arriving until late on Saturday to participate as well.
Ed Trager has volunteered to moderate the discussion.
At present, a given FLOSS-based desktop system will often have different programs using different text layout engines. Many GTK-based programs, including The Gimp and Inkscape, use Pango as the text layout engine; OpenOffice.org uses IBM's ICU text layout classes; and KDE programs use Qt's layout engine. Many other applications use homegrown text layout code: KOffice uses Qt's native shaping but has its own code for paragraph layout, and Scribus currently has its own code for the whole layout process, although the development team is thinking about using a forked version of Qt's shaper.
Because of differences in the layout engines operating “behind the scenes”, different software can exhibit differing levels of support for complex text layout (CTL) scripts like Arabic or Kannada. Some scripts, like Myanmar, are currently hardly supported at all in the majority of software on FLOSS systems today. The situation is confusing if you are a software developer who is new to the FLOSS world. It is even more confusing if you are an end-user trying to fathom why, for example, Arabic is rendered correctly in Inkscape, shows weird rendering artifacts on the diacritics in OpenOffice, and isn't even shaped correctly at all in Scribus — even when using the very same OpenType font in all three programs.
Having multiple approaches is good in that it facilitates experimentation, but only if we then evaluate those other approaches and ensure that users have the best of several approaches.
In theory, development of a unified text layout engine, independent of any one development toolkit, would allow developers to focus first on rendering text in all modern human scripts correctly, and secondly, on doing so efficiently.
Moreover, as scripts in the Unicode Standard are revised, and as other new scripts are added to the Unicode Standard, collaboration on a unified text layout engine would guarantee that script and advanced typographical functionality would be “rolled out” in FLOSS systems in a unified manner: all software would become competent in handling new and revised scripts or advanced typographical features at the same time, instead of in the piecemeal, hit-or-miss fashion of today.
While the theoretical benefits are clear, overcoming the obstacles of differing APIs and even just thinking about all of the work that might be required to re-engineer a host of popular and rapidly maturing FLOSS software may give developers second thoughts, headaches, or both!
On the other hand, exciting developments are afoot in the world of Open Source typography that cannot be ignored. SIL's Graphite technology is maturing rapidly, even though Graphite-enabled fonts remain scarce and the technology is arguably under-utilized. Pango has recently become capable of vertical text layout. Numerous fonts for world scripts are being developed and released under Open Source licenses like the GPL and the new Open Font License (OFL) from SIL. And programs like OpenOffice, Scribus, Firefox, the Gimp, and Inkscape have become serious challengers to proprietary counterparts. Governments and other organizations around the world are taking serious interest in FLOSS systems and special initiatives such as the One Laptop Per Child (OLPC) program.
The confluence of all of these developments and more makes it even more important, and more exciting, that the FLOSS community gets it right when designing the text layout engine for future FLOSS systems. Please contribute by outlining the categories and topics for discussion that are important to you and the community at large:
- Required features of a modern text layout engine:
- Carefully designed OO API which is easy for people to understand, use, and extend.
- Existing APIs of Pango, Qt and probably ICU must be maintained: otherwise it won't be a unified text layout engine, just yet another text layout engine.
Do we need a completely new API (application programming interface) at all? The above introductory text identifies problems with the capabilities of existing text layout engines, but doesn't identify any problems with the APIs of existing layout engines. Only a few items below need any API change: justification (enabling, and exposing some options such as use of kashida vs space) and line-breaking (exposing options such as speed vs æsthetics).
- What sort of extending do you have in mind?
- Good API documentation so people can get up to speed and use the API quickly.
- See above re degree of need for new API.
- What discussion at the summit is necessary for achieving good documentation?
- Support for correct shaping for all modern scripts:
- Including shaping for currently neglected scripts like Mongolian
- Modular shaping engine to make it easy to add new scripts as they become encoded (e.g., Lanna)
- Support for text justification:
- JALT table support
- Kashida justification for Arabic script
- Support for advanced justification algorithms for Japanese and mixed / multilingual text
- Enabling/disabling optional ligatures
- Support for cursor positioning:
- Support for JSTF table
- Support for "smart cursor" positioning to highlight and edit character components within ligatures in a seamless manner for both OpenType and Graphite fonts.
- Support for line breaking:
- Carefully designed API that makes it easy to subclass / extend to handle additional scripts and language needs.
- Support for word/line breaking in non-spaced Southeast Asian scripts:
- Support for dictionary-based word segmentation algorithms (Thai, Khmer, Myanmar)
- Support for rule-based word segmentation algorithms (Lao, others?)
- Support for sophisticated hyphenation algorithms:
- Support for more than just one sophisticated hyphenation algorithm
- Locale- and language-aware hyphenation (e.g. American & UK English have different rules)
- Support for vertical text:
- Support for vertical Japanese and Chinese
- Support for vertical traditional Mongolian
- Support for script/language specific layout (e.g. GSUB table, digit substitution)
- Support for extendable script properties (e.g. to select different stylistic forms)
- Support for adjusting a text layout to different metrics (e.g. when replaying a meta file on a system where the original font is not available)
- Maximum metric compatibility with legacy systems
- Maximum binary API compatibility between versions
- Status Quo:
- ICU layout engine
- Qt's layout engine
- Architecture / layout pipeline:
- Itemizer -- scan Unicode text and find font runs, script runs and bidi runs.
- Hyphenator -- scan Unicode text and determine possible break positions and their types (hyphen/no hyphen)
- Shaper -- determine the used glyphs, their position and their metrics for one or more items.
- Apply optional features.
- Layouter -- break paragraphs into lines and apply justification. Reshape some items as needed.
- Renderer -- render given lines on screen or to PS / PDF
- Required features for a *unified* API:
- Most important is unification of the shaper since differences in glyph layout are the most annoying ones for a user. Also there's usually only one “right” way to do it.
- Conflicting requirements for a layouter: some apps (DTP & visual design) require sophisticated layout, while others (e.g. on PDAs) want something small & fast (though they may require correct shaping for at least a couple of scripts).
- Large things must be separable.
- Shapers, if in the aggregate they take up lots of space. Allowing shapers to be separate may also make it easier to add/upgrade a shaper independently. (so long
- Sophisticated bits range from fast, simple & small to high-quality layout.
- Separating sophisticated line breakers allows for experimentation.
- Must have some control over CPU use. Some things are obvious, like defaulting to not doing expensive things like river avoidance (desirable for DTP but too slow for most use). Other choices are harder: what if we find we can speed things up by not doing proper shaping? (E.g. there is some anecdotal evidence that LaTeX is much faster than Pango for a large English-language document.)
- Unification is not necessary for renderers. They only need to render text using glyph indices (instead of Unicode code points)
- Ability to reuse (partial) results of a previous itemizer run
- Ability to rerun the shaper on some arbitrary substring
- Ability to (de)select any feature when running the shaper
- Use of unified data structures: font faces (e.g. FT_Face), styles (using feature lists?), Unicode strings (UCS4?) and char attributes, glyph strings and position / metrics information.
- The shaper needs to return the following information about an item:
- glyph indices and their number
- logical clusters (which glyph belongs to which char)
- x/y advance
- (x,y) offset
- ascent and descent (might be calculated outside the shaper from glyph metrics and y offset)
- rotation (at least 0°, 90° and 270°) and flip
- hor/vert scaling (offset, rotations and scaling could be combined into an affine transform)
The last two items will probably evoke some discussion.
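As a starting point for that discussion, the list of shaper results above could be sketched as a pair of small data structures. This is only an illustration under stated assumptions: the names `ShapedGlyph`, `ShapedItem` and `transform` are hypothetical and belong to no existing engine, and applying scaling and flip first, then rotation, then the offset is just one possible convention for folding them into a single affine transform.

```python
import math
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class ShapedGlyph:
    """One glyph as returned by a (hypothetical) shaper."""
    glyph_index: int        # index into the font's glyph table
    cluster: int            # index of the first character this glyph belongs to
    x_advance: float
    y_advance: float
    x_offset: float = 0.0
    y_offset: float = 0.0
    rotation_deg: int = 0   # at least 0, 90 and 270 are needed
    flip: bool = False      # mirror horizontally
    x_scale: float = 1.0
    y_scale: float = 1.0

    def transform(self) -> Tuple[float, float, float, float, float, float]:
        """Fold scale, flip, rotation and offset into (a, b, c, d, e, f),
        where x' = a*x + c*y + e and y' = b*x + d*y + f."""
        sx = -self.x_scale if self.flip else self.x_scale
        sy = self.y_scale
        theta = math.radians(self.rotation_deg)
        # Round away floating-point noise so 90 degrees gives exact 0 and 1.
        cos_t = round(math.cos(theta), 12)
        sin_t = round(math.sin(theta), 12)
        return (cos_t * sx, sin_t * sx, -sin_t * sy, cos_t * sy,
                self.x_offset, self.y_offset)


@dataclass
class ShapedItem:
    """Shaper output for one item (one run of a single font, script and direction)."""
    glyphs: List[ShapedGlyph] = field(default_factory=list)

    @property
    def glyph_count(self) -> int:
        return len(self.glyphs)

    def advance(self) -> Tuple[float, float]:
        """Total x/y advance of the item."""
        return (sum(g.x_advance for g in self.glyphs),
                sum(g.y_advance for g in self.glyphs))

    def glyphs_for_char(self, char_index: int) -> List[int]:
        """Glyph indices whose logical cluster starts at char_index."""
        return [g.glyph_index for g in self.glyphs if g.cluster == char_index]
```

With such a structure a renderer would only ever see glyph indices and transforms, never Unicode code points. Ascent and descent are left out here on the assumption that they are computed outside the shaper.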
Shaper interface requirements for line-breaking
Shaping can depend in part on line-break choices:
- if the break appears in the middle of a ligature. (Most common if doing hyphenation, though it might also occur with spaceless scripts like Thai, Lao, Khmer, Myanmar. In principle this could happen even in English if a font has a ligature involving space or other punctuation, though I don't know of any fonts that do this.)
- if glyphs (& spacing) change in response to available space (JALT & FALT tables). This occurs mostly with justified text, though it might be desirable for ragged text to prevent a word from sticking out past the ends of nearby lines.
Different responses to this:
- The simplest (involving least work for shaper writers) is to do shaping in two passes, where the first just gives width metrics (so we can work out where to break into separate lines) and the second pass gives the string of glyphs and positions given those line-break choices.
- For simple layout cases (no ligatures across line-break opportunities, no hyphenation, no space adjustment), a speedup is available by doing just a single pass that gives word widths, glyphs and glyph positions.
- Can extend the above a little by doing a single pass but giving diff info to cover some of the cases.
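The two-pass response could look roughly like the sketch below. Everything in it is a toy assumption made for illustration: every glyph is one unit wide, the only ligature is "fi", breaks are only allowed at spaces, and the function names (`shape`, `measure`, `break_lines`, `layout`) are hypothetical rather than taken from any existing engine.

```python
from typing import List

GLYPH_WIDTH = 1.0                 # toy metric: every glyph is one unit wide
LIGATURES = {"fi": "<fi>"}        # toy ligature table, not a real font


def shape(text: str) -> List[str]:
    """Second-pass shaper: return the glyph string for `text`.
    Ligatures can only form inside the text that is passed in."""
    glyphs = []
    i = 0
    while i < len(text):
        pair = text[i:i + 2]
        if pair in LIGATURES:
            glyphs.append(LIGATURES[pair])
            i += 2
        else:
            glyphs.append(text[i])
            i += 1
    return glyphs


def measure(text: str) -> float:
    """First-pass shaper: width only, no glyph data handed back."""
    return len(shape(text)) * GLYPH_WIDTH


def break_lines(words: List[str], max_width: float) -> List[str]:
    """Greedy line breaking that relies only on first-pass widths."""
    lines, current = [], []
    for word in words:
        candidate = " ".join(current + [word])
        if current and measure(candidate) > max_width:
            lines.append(" ".join(current))
            current = [word]
        else:
            current.append(word)
    if current:
        lines.append(" ".join(current))
    return lines


def layout(text: str, max_width: float) -> List[List[str]]:
    lines = break_lines(text.split(), max_width)   # pass 1: widths -> breaks
    return [shape(line) for line in lines]         # pass 2: glyphs per line
```

Note that `measure("fi")` is 1.0 while `measure("f") + measure("i")` is 2.0, which is exactly why a break inside a ligature forces the affected text to be shaped again once the break is known.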
Shaper interface requirements for justification
When a line contains runs of different scripts, we need to find out how much extension/shrinking opportunity there is within each run. Different opportunities have different priorities: e.g. in English text we want to space out/condense word spaces more than letter spaces. Based on http://www.w3.org/TR/css3-text/#text-justify (or the more recent draft at http://fantasai.inkedblade.net/style/specs/css3-text/scratchpad, which mentions Tibetan), it appears that deciding spacing requires knowing:
- The total width of inter-word spaces (and other inter-word markers).
- The number of kashida insertion points
- The amount of extension and condensing available from glyph changes (e.g. applying JALT, FALT tables).
- (Unfinished: lunch time...)
Some rules for justification of Tibetan: http://www.tug.org/pipermail/xetex/2005-April/002147.html. Mentions a "space after ga" option not mentioned by the above-referenced CSS3 draft.
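To make the idea of prioritised opportunities concrete, here is a minimal sketch of how a layouter might spread the extra width of a line over the opportunities reported by the shaper. The `Opportunity` structure, the `distribute` function and the particular priority order used in the example (word spaces, then kashida, then letter spaces) are assumptions for illustration only; real rules differ per script and would have to come out of the discussion.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class Opportunity:
    kind: str         # e.g. "word-space", "kashida", "letter-space"
    priority: int     # lower value = used first
    count: int        # number of adjustment points on the line
    max_each: float   # maximum extra width allowed per point


def distribute(extra: float,
               opportunities: List[Opportunity]) -> Tuple[Dict[str, float], float]:
    """Fill opportunities in priority order.
    Returns ({kind: extra width per point}, width that could not be placed).
    Assumes each kind appears at most once in `opportunities`."""
    per_point = {}
    remaining = extra
    for opp in sorted(opportunities, key=lambda o: o.priority):
        capacity = opp.count * opp.max_each
        used = min(remaining, capacity)
        per_point[opp.kind] = used / opp.count if opp.count else 0.0
        remaining -= used
    return per_point, remaining
```

For example, with 10 units to place, four word spaces that can each grow by 2 units absorb 8 units, and the remaining 2 units go to the next priority level.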
Consumer Device Support
Consumer devices running FOSS software are becoming increasingly important and prevalent. Many also support network connectivity and some sort of open browsing software. With these capabilities, the need to support advanced text layout in a large number of languages becomes important. A goal of the layout engine should be to provide a small kernel of core services without extensive dependencies on external libraries. In addition, in an embedded device the object API is often provided by the device software, so it would be nice if this API were object-ready but procedural in nature, to allow direct wrapping with an externally defined API. An example of an object-ready procedural API is one that accepts a user data pointer and passes it back through callbacks. Also, for interfaces, the structure or class implementations used by the API should be provided by the user instead of being defined internally. This API should consist of the following:
- An interface to provide an external font list for resolving glyphs
- The ability to flow the text without actually retrieving the glyph data but instead just provide the glyph indexes.
- Minimal line breaking support
- A text file format, preferably XML, for adding additional languages and locales
A number of assumptions could be made. For example, the font interface from FreeType could be adopted for the font list. The actual data structures for the font list, layout results, and line breaks should be user-supplied, with the API providing the base structure; this allows additional implementation data to be added. Next, a text file format will allow systems that don't support dynamically loaded code, or don't allow unsigned dynamic code, to ship with a small number of supported locales but add additional ones on demand. Glyph rendering, font management, text storage, etc. are handled outside the layout engine.
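The "object-ready but procedural" style described above might look like the following sketch. It is written in Python only for brevity; on a device this would more likely be a C interface with a `void *` user data pointer. All names (`flow_text`, `LayoutResultBase`, `DeviceFontList`, `device_lookup`) are hypothetical, and the fixed glyph count per line stands in for real width-based line breaking.

```python
from typing import Any, Callable, List, Optional

# Callback supplied by the device software: (user_data, code_point) -> glyph index,
# or None when the external font list has no glyph for that code point.
GlyphLookup = Callable[[Any, int], Optional[int]]


class LayoutResultBase:
    """Base structure provided by the engine; users extend it with their own data."""
    def __init__(self) -> None:
        self.lines: List[List[int]] = []   # glyph indexes per line


def flow_text(text: str, max_glyphs: int, lookup: GlyphLookup,
              user_data: Any, result: LayoutResultBase) -> LayoutResultBase:
    """Flow text into lines of glyph indexes without touching glyph data.
    `user_data` is opaque to the engine and is only handed back to `lookup`."""
    line: List[int] = []
    for ch in text:
        if ch == "\n":                     # minimal line breaking: hard breaks...
            result.lines.append(line)
            line = []
            continue
        glyph = lookup(user_data, ord(ch))
        if glyph is None:
            glyph = 0                      # glyph 0 is conventionally the missing glyph
        if len(line) == max_glyphs:        # ...and a fixed glyph count per line
            result.lines.append(line)
            line = []
        line.append(glyph)
    if line:
        result.lines.append(line)
    return result


# --- device side: everything below is owned by the caller, not the engine ---

class DeviceFontList:
    def __init__(self, cmap):
        self.cmap = cmap                   # code point -> glyph index


def device_lookup(user_data, code_point):
    return user_data.cmap.get(code_point)


class DeviceLayoutResult(LayoutResultBase):
    def __init__(self, widget_id):
        super().__init__()
        self.widget_id = widget_id         # extra implementation data added by the user
```

Because the engine never inspects `user_data` or the extra fields on the result, the device software is free to wrap these calls in whatever object model it already provides.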