PeterMoulder – GNOME Wiki Archive

Currently working with Andreas Vox on what is needed from shapers to allow paragraph-at-a-time (or page-at-a-time) line breaking to be done.

There are many ways of breaking paragraphs into lines. Let's start with what a simple dynamic programming implementation needs for ragged-end text (i.e. not fully justified, no condensing or expansion of text to make it fit better). It is assumed that we have already gone over the text to find possible line break points (the language-dependent hyphenation points and normal word breaks). The line breaker then needs to know:

The total number of line-break opportunities in the paragraph. (I.e. the number of “words” in the paragraph, so long as ‘word’ is defined in terms of line break opportunities.)
The width of a line consisting of words a..b. It wants this in O(1) time, but every query for a..b is preceded by a query for a..(b-1) (or a..b is a single word).
The intrinsic cost of a particular break point. This is mainly just to avoid hyphenation when we can, by saying that hyphenation points have a higher cost than normal word breaks.
What breaks are mandatory (forced line break characters); though this could be included in the "line break penalty" above.

If we are also allowed to shrink/extend text to make it better fit the available space (kashidas, JALT table (and FALT table?), perhaps enabling/disabling optional ligatures), then instead of a single width for a given line a..b, we need:

The minimum width of that line.
The “natural” width of the line, i.e. the best width (or the least of the best widths if it's possible for different widths to look equally good).
Information about the “cost” of condensing/extending the line, so that we know whether it looks better to have fewer words on one line and more words on another.

Effect of optical margin adjustment on line-breaking

We haven't yet thought how to handle optical margins (OPBD table) well for doing “width of line a..b” calculations in O(1) time, while handling bidi correctly. (Bidi makes it harder to work out which is the leftmost/rightmost word: bidi reordering ususally takes Ω(n) time in the number of words. But maybe we can still work out the revised leftmost & rightmost word in O(1) time if we're only adding one word since the last query.)

However, assuming that optical margin adjustments can only reduce the needed width of a line rather than increase it (or if at least we know an upper bound as to how much the width can increase), then a reasonable fallback plan is just to ignore these optical margin adjustments for the width calculations when making line-breaking decisions. This would mean that we would occasionally wrongly conclude that a line a..b is too tight to fit well in the available space; it has most effect on line-breaking choices for very narrow columns (e.g. newspapers, tables).