Geary Full-text Search Strategies
Geary 0.10 introduces a new full-text search algorithm to improve the search experience for users. Prior complaints about search revolved around Geary returning too many results with highlighted terms seemingly unrelated to the search query. Although not all search problems are fixed in 0.10, Geary should be more conservative about displaying results that match the user's query.
This new strategy may still be unsatisfying for some, and so Geary 0.10 offers a way for power users to customize the experience via GSettings. There are no plans to expose this setting in the Geary user interface, as we think it will merely confuse novice users.
Since quotation marks are significant in Geary search terms, this text sets search terms apart from the surrounding prose with square brackets. In other words, if this is typed into Geary's search box:
    root beer
It's described in this text as [root beer]. This will find all emails with words starting with the characters root (including roots, rooting, and rooted) and words starting with the characters beer (including beery).
If this is typed into Geary's search box:
    "root beer"
It's described in this text as ["root beer"], which will find all emails containing the exact phrase root beer.
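The difference between the two forms can be illustrated with a minimal Python sketch. It is not Geary's implementation: the function names parse_query and matches are hypothetical, and the sketch assumes that every term must match and that matching is case-insensitive.

```python
import re

def parse_query(query):
    """Split a query into (text, is_exact) pairs.

    Quoted runs become exact phrases; bare words become prefix terms.
    """
    terms = []
    for quoted, bare in re.findall(r'"([^"]*)"|(\S+)', query):
        if quoted:
            terms.append((quoted.lower(), True))
        elif bare:
            terms.append((bare.lower(), False))
    return terms

def matches(terms, text):
    """Return True if every term matches the text (assumed AND semantics)."""
    lowered = text.lower()
    words = re.findall(r"\w+", lowered)
    for term, is_exact in terms:
        if is_exact:
            # Exact phrase: the words must appear whole and in order.
            if not re.search(r"\b" + re.escape(term) + r"\b", lowered):
                return False
        else:
            # Prefix term: at least one word must start with the term.
            if not any(word.startswith(term) for word in words):
                return False
    return True
```

With this sketch, [root beer] matches a message containing "Rooting for beery ales", while ["root beer"] does not; both match "I like root beer."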
Prior to 0.10, Geary's full-text search used a Porter stemmer (based on Snowball) to improve search results. (The Snowball package offers a variety of stemmers for different languages, and Geary selected one based on the user's locale settings.) Broadly speaking, Geary used the stemmer to search for "the root" of each search term. For example, typing [accounts] would prefix-match on all emails with words starting with account, including accounts, accounting, and accountant. This behavior is most likely what the user wants.
The problem is, stemmers can overstem. The stem for [accountancy] is also account. Since Geary uses prefix matching by default, emails with account, accounts, accounting, and accountant would be displayed with those terms highlighted. These results make sense if the user enters [accounts] but seem like overkill if the user enters [accountancy].
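The effect can be reproduced with a small Python sketch. The STEMS table below is a hypothetical stand-in for a real stemmer's output and contains only the two stems given in this text; the function name highlighted and the word list are likewise invented for illustration.

```python
# Hypothetical stem table standing in for a real stemmer.
# Only the two stems mentioned in the text are included.
STEMS = {
    "accounts": "account",
    "accountancy": "account",
}

def highlighted(term, words):
    """Return the words that prefix-match the stem of the search term."""
    stem = STEMS.get(term, term)
    return [word for word in words if word.startswith(stem)]

# A made-up vocabulary drawn from a mailbox.
words = ["account", "accounts", "accounting", "accountant",
         "accountancy", "invoice"]
```

Here highlighted("accounts", words) and highlighted("accountancy", words) return the same list, because both terms collapse to the same stem before prefix matching. That is reasonable for the first query and surprising for the second.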
The changes in 0.10 are an attempt to curb overstemming without dropping stemming entirely.
Instead of a single search algorithm that simply uses the stemmed variant of each search term, Geary now employs heuristics to restrain the effects of overstemming. These heuristics have four settings (called strategies):
- EXACT: No stemming is performed with EXACT searches. Typing [accountancy] will only show emails containing words starting with those letters. EXACT does not mean prefix matching is turned off: typing [accountancy] will also find emails containing the made-up word accountancymess. (Prefix matching can be turned off by putting the word in quotes, e.g. ["accountancy"].)
- CONSERVATIVE: This is the default setting. Geary will only stem "longer" search terms (six or more characters), and the stemmed variant cannot be significantly shorter than the original word. For example, [accounts] will stem to account, but [accountancy] will not stem at all. CONSERVATIVE is designed to allow only stemmed words that are "close" to their originals into the search.
- AGGRESSIVE: Like CONSERVATIVE, this setting limits when to stem and which stems are allowed, but it is more generous on both counts. [accountancy] will stem to account with AGGRESSIVE.
- HORIZON: This setting stems all words. It is the closest to Geary's search algorithm prior to 0.10.
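The four strategies can be sketched in Python as a pair of limits per strategy. This is an illustration under stated assumptions, not Geary's code: only the six-character minimum for CONSERVATIVE comes from this text. The other numbers, and the names Limits, LIMITS, and effective_term, are hypothetical values chosen so that the examples above come out as described.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Limits:
    allow_stemming: bool
    min_term_length: int   # shortest term that may be stemmed
    max_length_drop: int   # most characters a stem may remove

# Assumed thresholds; only CONSERVATIVE's minimum of 6 is from the text.
LIMITS = {
    "EXACT": Limits(False, 0, 0),
    "CONSERVATIVE": Limits(True, 6, 2),
    "AGGRESSIVE": Limits(True, 4, 4),
    "HORIZON": Limits(True, 0, 10**9),
}

def effective_term(term, stem, strategy):
    """Return the stem if the strategy accepts it, else the original term."""
    limits = LIMITS[strategy]
    if not limits.allow_stemming:
        return term
    if len(term) < limits.min_term_length:
        return term  # too short to stem
    if len(term) - len(stem) > limits.max_length_drop:
        return term  # stem is too far from the original
    return stem
```

Under these assumed limits, effective_term("accounts", "account", "CONSERVATIVE") yields the stem, while effective_term("accountancy", "account", "CONSERVATIVE") keeps the original term, because the stem drops four characters. AGGRESSIVE and HORIZON both accept the stem for accountancy, and EXACT never does.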
There are no plans to expose these settings in Geary's user interface. Users who are not happy with the CONSERVATIVE strategy may change it in GSettings:
- Run dconf-editor.
- Find /org/yorba/geary in the sidebar.
- Change the "search-strategy" string setting to one of the four strategies above. Case is ignored. Unrecognized strings are treated as CONSERVATIVE.
Geary does not need to be restarted for the change to take effect. If a search is already displayed, simply click on the search edit box at the top of the window and press Enter to run the search again with the new strategy.
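The way the setting's value is interpreted (case ignored, unrecognized strings treated as CONSERVATIVE) can be sketched in a few lines of Python. The function name parse_strategy is hypothetical and this is not Geary's own parsing code.

```python
STRATEGIES = ("EXACT", "CONSERVATIVE", "AGGRESSIVE", "HORIZON")

def parse_strategy(value):
    """Map a setting string to a strategy name, ignoring case.

    Anything that is not one of the four names falls back to the default.
    """
    name = value.upper()
    return name if name in STRATEGIES else "CONSERVATIVE"
```

For example, parse_strategy("Aggressive") gives AGGRESSIVE, and a misspelled value such as "agressive" gives CONSERVATIVE.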
Future versions of Geary may tweak the search heuristics in ways not described or anticipated here. Future designs should do their best to keep to the spirit of the above strategies so that users are minimally affected by the change. It's also possible that future changes will simply ignore this setting. For these reasons, the setting should be treated as advisory.
This setting only affects searches of the local database.
The same search under different strategies may produce exactly the same results or wildly different ones. The scope of the results is highly dependent on the search terms and on the full set of words in your email database.