Finding and Reminding

It is important for a computing environment to provide a simple and effective way to find, organize, and be reminded about content and data. This is, of course, a complicated and difficult problem, one the industry has been struggling with for almost 40 years. We must acknowledge up front that we will not solve it completely or in a single shot. That said, as we break from the past and from some of the most difficult design and conceptual constraints, we stand an excellent chance of dramatically improving the situation.

Introduction

By finding we are actually referring to re-finding something that has already been seen: intentional, conscious retrieval. This differs from searching the web for something new. Reminding refers to the establishment and use of cues (usually visual) that assist or supplement memory: opportunistic retrieval. Some have described the problem as "keeping found things found."

Most computing environments offer a number of tools that attempt to address aspects of the general problem. These often include:

"Desktop" (folder)
"Places" (folders or collections)
Search
Recently used lists
File Manager (e.g. Explorer or Finder)
File open/save dialogs

When such a variety of tools is involved, a functional evaluation is often useful for analyzing their effectiveness and suitability. Such evaluations can help explain why some tools are better than others for certain purposes, and can help guide, or predict the likely success of, future designs. We'll discuss three different functional evaluations and how they may be used to analyze the effectiveness of GNOME and provide guidance for the GNOME 3 designs.

Functional Evaluation Frameworks

Nardi and Barreau

This well-known evaluation was first published in 1995 as "Finding and Reminding: File Organization from the Desktop." It classifies information into three types:

Ephemeral - "has a short shelf life and includes items such as (some) electronic mail messages, 'to do' lists, note pads, memos, calendars, and news articles downloaded from databases."
Working - "is frequently-used information that is relevant to the user's current work needs and that has a shelf life of weeks or months."
Archived - "has a shelf life of months or years, but is only indirectly relevant to the user's current work. It is infrequently accessed."

Dumais et al

In a variety of papers since 2001, Dumais et al., through their studies of web browsing and bookmarking behavior, have identified a number of factors that may be used to evaluate information retrieval systems:

Portability of information - "users can take the information with them wherever they go"
Number of access points - "the user can access the information from multiple places"
Persistence of information - "the Web page, with the same URL and content will remain the same into the future"
Preservation of information in its current state - "the interactive, design and layout qualities of the Website will be preserved"
Currency of information - "the information can be refreshed to reflect the most current updates to the content"
Context - "the user is able to establish a context for why a Website was kept in the form of reminders about its use and likely future use"
Reminding - "the user is reminded about the information by the keeping method"
Ease of integration - "the method helps the user to integrate new information or new references with ongoing projects and existing organizational schemes"
Communication and information sharing - "the method makes it easier to share information with others"
Ease of maintenance - "the user will need to maintain and update his or her personal information collection"

Whittaker et al

In 1996, Whittaker et al. described how e-mail, although originally designed as a communications application, was increasingly being used for task management and personal archiving, something any Gmail user can attest to. Their functional analysis of e-mail usage is relevant to any discussion of finding and reminding and provides a great deal of insight into the problem.

They specifically identify several types of information that users kept around rather than dealing with immediately. These include:

To-do items - "require the user to execute some action"
To-read items - "take time and effort to read, and users often delay reading them"
Items of indeterminate status - "unsure of the significance"
Ongoing items over a time period - "ongoing, but incomplete"
Record or history - "an archive", "a reminder to the user"

The unifying principle is that all of these are forms of incompleteness: "Art is never finished, only abandoned" (Leonardo da Vinci). These types may be viewed as another formulation of the Nardi analysis: ephemeral (to-do, to-read, indeterminate status), working (indeterminate status, ongoing), and archived (record or history).

They also identified three types of users:

No Filers - never file or categorize information into folders
Spring Cleaners - attempt to file information (often ineffectively) after the system had broken down
Frequent Filers - make strenuous efforts to organize information

They also discovered a number of problems with filing in general:

It is a cognitively difficult task
Desire to postpone judgment
Folders may be too small
Folders may be too big
Folders may be too numerous
Filing drastically reduces the reminding function ("out of sight, out of mind")

Summary

It may be useful to think in terms of the following categories of information:

On hand (grip) - should remain easily and quickly accessible while relevant
Under foot (trip) - should be visible to facilitate opportunistic finding and reminding
Out of sight (slip) - when the shelf life of the first two types of information expires, it should slip out of view

Information that is frequently used or currently relevant should be kept around and readily available. Information that is incomplete or needs attention or action should be kept in a place where it may be tripped over, to allow opportunistic finding. Other information that may no longer be immediately relevant should remain available but out of the way, where it does not interfere, clutter, or confuse. The distance at which out-of-the-way information is kept should grow as its relevance declines (likely measured by the time since it was last used).
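The grip/trip/slip idea can be made concrete with a small classifier. The sketch below is a hypothetical illustration, not part of any GNOME design: it assumes each item carries a last-used time and an "incomplete" flag, and the shelf lives (7 and 30 days) are made-up placeholders. Incomplete items stay under foot until their shelf life expires, recently used items stay on hand, and everything else slips out of sight, ordered so that older items sit further away.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from enum import Enum


class Zone(Enum):
    GRIP = "on hand"       # currently relevant: keep easily accessible
    TRIP = "under foot"    # incomplete: keep visible for reminding
    SLIP = "out of sight"  # shelf life expired: available but out of the way


@dataclass
class Item:
    name: str
    last_used: datetime
    incomplete: bool  # hypothetical flag: the item still needs attention or action


# Hypothetical shelf lives; real values would need to come from user research.
GRIP_SHELF_LIFE = timedelta(days=7)
TRIP_SHELF_LIFE = timedelta(days=30)


def classify(item: Item, now: datetime) -> Zone:
    """Place an item in a zone based on its status and time since last use."""
    age = now - item.last_used
    # "Used" does not mean "complete": incomplete items stay under foot.
    if item.incomplete and age <= TRIP_SHELF_LIFE:
        return Zone.TRIP
    if age <= GRIP_SHELF_LIFE:
        return Zone.GRIP
    return Zone.SLIP


def arrange(items: list[Item], now: datetime) -> dict[Zone, list[str]]:
    """Group items by zone, most recently used first, so that within the
    slip zone the distance from view grows with time since last use."""
    zones: dict[Zone, list[str]] = {zone: [] for zone in Zone}
    for item in sorted(items, key=lambda i: i.last_used, reverse=True):
        zones[classify(item, now)].append(item.name)
    return zones
```

Under these assumptions, a PDF downloaded for later reading would be marked incomplete and remain under foot for up to a month, then slip out of view if it is never opened. Using recency alone for distance is the simplest reading of "likely time of last use"; frequency of use could be folded in as well.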

To be most useful, the information retrieval system should be:

Available - ubiquitous access
Persistent - can be relied upon as a record
Current - up to date and relevant
Contextual - includes information in context
Present - serves a useful reminding function
Shareable - easily shared between people
Transparent - doesn't require effort to maintain

The system should be designed with the following types of users in mind:

No Filers
Spring Cleaners
Frequent Filers

Brief Analysis of GNOME 2

GNOME 2 lacks a credible or usable search framework, but it does offer the following:

Desktop
Places
Recently used lists
File Manager
File Chooser (open/save dialogs)

The Desktop is at once the most prominent and the most hidden tool for information retrieval. It holds both ephemeral and working-set information and often provides access to filed and archived material. Over time the system breaks down: there is a constant stream of things to do, a constant remainder that does not get done, an unwillingness to categorize and archive manually, and a solution that does not scale because it is spatially bound. On top of this (so to speak), the data lives underneath all of the current activities on the computer and is therefore very difficult to reach, which further reduces its effectiveness for finding and reminding.

The Desktop does not typically have any intrinsic notion of "shelf life" for the information displayed in it. Manual organization and filing is required in order to maintain the efficacy of the system.

Tool	Grip	Trip	Slip	Available	Persistent	Current	Contextual	Present	Shareable	Transparent
Desktop	med	med	low	low	high	low	low	high	low	low
Places	low	low	low	low	high	low	low	low	low	low
Recently Used	high?	low	high?	low	low	low	low	low	low	med
File Manager	low	low	low	low	high	low	low	low	low	low
File Chooser	med	low	low	low	high	low	low	med	low	med

Overall, GNOME 2 does not perform well in a number of critical areas. It requires too much work, is too isolated, lacks context, and doesn't keep the right information on hand or in sight.

Applications to Future Designs

If we want to address some of these failures as we design GNOME 3, what might we consider?

Principles

Avoid elaborate filing schemes
- Optimize for the case where influx / volume outpaces the ability to triage and file content
- Do not rely on spatial finding
Avoid explicit categorization
- "Information does not fall happily into neat categorization structures." - Lansdale
- It is impossible to generate unambiguous category names
- "Real categories, from the user's perspective, are often overlapping and fuzzy, making unambiguous partitioning impossible." - Dumais and Landauer, 1983
- Don't make me think.
- It may be difficult to assess importance a priori, so make it easy to note later
Newer information is usually more relevant
- "For most people old information is not, in general, useful information." - Nardi and Barreau, 1997
- "Use of piles has an implicit element of ordering by time: the most recent documents are near the top of the piles." - Lansdale, 1988
- Information should have a shelf-life
The world is a web
- Content is not only local
- Limit or restrict exposure to filesystem and organizational hierarchies
- Allow access to cloud-based content
- Content exists in a social context
It is a busy world
- Keep indeterminate information separate from noted or kept information
- "Used" does not mean "complete"

Use Cases

Ricardo has downloaded 4 PDFs that he needs to read for a class next week. He doesn't have the time now, but he wants to make sure he gets to them before the due date.

Tess received a draft of a presentation from a colleague over email. She needs to make edits and return it, but first she needs to find it.

Louis has been working for the last few days on a document in Mooble's online document filing system. He forgot to add a note about the last quarter's expenses and needs to find it quickly.

Vicki is a video editing wiz. She needs to put the finishing touches on a new commercial and send it out to the client by tomorrow night.

David is a prolific photographer and he has a USB disk full of images he needs to process and upload to his photo site.

Liddy doesn't use files that much but she does like to save web pages to keep track of them.

Proposals

Original proposal

Comments

PaoloBorelli: one of my more frequent use cases is opening files "related" to the ones in the recent list, but not in the recent list themselves. Usually related means "in the same folder" (source files of a project, downloaded files, ...). What I'd like is a keyboard shortcut that works like Alt-Tab in the GNOME Shell and brings up a list of the recent files, but with two levels: when you move the selection to an item in the list, a secondary contextual list pops up containing a selection of related files and actions like "open containing folder".
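The second level of the switcher suggested in the comment above could be populated along these lines. This is a hypothetical sketch, not an existing GNOME Shell API: it takes "related" to mean "in the same folder", as the comment does, lists sibling files that are not already in the recent list (newest first, capped by an assumed `limit`), and attaches the "open containing folder" action.

```python
from pathlib import Path


def related_files(recent: list[str], limit: int = 5) -> dict[str, dict[str, list[str]]]:
    """For each recent file, list up to `limit` sibling files that are not
    themselves in the recent list, plus the contextual actions to offer.

    "Related" is approximated as "in the same folder" (an assumption).
    """
    recent_set = {Path(p).resolve() for p in recent}
    menu = {}
    for path in map(Path, recent):
        folder = path.resolve().parent
        siblings = sorted(
            (p for p in folder.iterdir()
             if p.is_file() and p.resolve() not in recent_set),
            key=lambda p: p.stat().st_mtime,  # most recently modified first
            reverse=True,
        )
        menu[str(path)] = {
            "related": [p.name for p in siblings[:limit]],
            "actions": ["Open containing folder"],
        }
    return menu
```

Scanning only the immediate folder keeps the secondary list short and predictable; richer notions of "related" (same project, same download session) would need metadata the filesystem does not provide.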

(SorooshYazdani): Following the discussion on http://blogs.gnome.org/mccann/, there are a few objections that we should keep in mind:

Abstraction makes users more helpless when something goes wrong. (I'm not sure if I agree with this issue.)
How does the metadata transfer from one system to another (say Windows)? (somewhat related, will tar still work?)
When the tree structure works (for those frequent filers), it works pretty well. A tagging system at first seems to be a step in the wrong direction for them.
Specifically, allowing for more (looser?) categorization will make finding documents more difficult in the long run.

I think the latter two issues are things that can be dealt with fairly easily. When I hear about what is being proposed, I imagine a system where creating a new tag takes just a bit more energy than using one of the more recent tags available. Most implementations I can think of actually follow this, with varying degrees of success, and I think this is something that needs to be fine-tuned down the road. But with the right balance, I think we can avoid the tag explosion that would make this system useless.

The fear raised in the third objection, though, is something we should not underestimate. I imagine that if the tags had some sort of hierarchy, that would solve the problem. However, very few systems I know of actually implement that (F-Spot is the only one I can think of). For example, it is natural for any music file to have multiple tags keeping track of the album name, the band name, the genre, etc. However, it makes more sense for the album name to be a subtag (?) of the band name, which is itself a subtag of the genre, which is a subtag of music. Now, if the user is just browsing his/her computer, they see only the music tag rather than all of the genre tags, band names, etc. When they double-click on music, they see some of these other tags. The paradigm suggested here can then be applied: the tags that are used most often are displayed first, followed by some of the less used tags. (I admit this part is a bit hairy and probably needs some trial and error.) One can also imagine the computer giving hints on organizing your tags. For example, if all of your pictures that have your cousin in them also have the "Europe" tag, the computer could suggest (incorrectly in this case) that your cousin is a subtag of Europe. (I guess a better example: if almost all of them have the "Family" tag, it could suggest your cousin as a subtag of Family.)

A hierarchical tag system would not be that far from the old folder-based system, so in theory it could also ease the second objection. The biggest drawback in all of this, though, is that it is unclear to me how much work this is and how much of the infrastructure for it is already in place.
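The subtag hints described in the comment above can be derived from tag co-occurrence. The following is a hypothetical sketch (not F-Spot's or any GNOME component's actual behavior): it suggests `child` as a subtag of `parent` when at least a `threshold` fraction of items tagged `child` also carry `parent`, and `parent` is used more widely. The `threshold` and `min_support` values are illustrative assumptions.

```python
from collections import Counter
from itertools import permutations


def suggest_subtags(tagged_items: dict[str, set[str]],
                    threshold: float = 0.9,
                    min_support: int = 3) -> list[tuple[str, str]]:
    """Return sorted (child, parent) pairs suggested from co-occurrence.

    `child` is suggested under `parent` when at least `threshold` of the
    items tagged `child` also carry `parent`, and `parent` appears on more
    items than `child` (so it is plausibly the broader tag). Tags used on
    fewer than `min_support` items are ignored as too sparse to judge.
    """
    tag_counts: Counter = Counter()
    pair_counts: Counter = Counter()
    for tags in tagged_items.values():
        tag_counts.update(tags)
        pair_counts.update(permutations(tags, 2))  # every ordered pair
    suggestions = []
    for (child, parent), both in pair_counts.items():
        if tag_counts[child] < min_support:
            continue
        if tag_counts[parent] > tag_counts[child] and both / tag_counts[child] >= threshold:
            suggestions.append((child, parent))
    return sorted(suggestions)
```

With a photo collection where every picture of the cousin is also tagged "family" and "europe", this heuristic suggests both parents, which reproduces the comment's point: co-occurrence alone cannot tell the meaningful hint from the coincidental one, so suggestions should be confirmed by the user rather than applied automatically.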

Nikolaos Georgosopoulos: How about the Semantic Desktop? Why not let the OS do the simple thing it is meant to do (i.e. organize files in directories/folders) but add an extra layer that does the semantic organization in a semiautomatic way? Semiautomatic in that the user can enable or disable it, but it assigns tags mostly automatically. Mostly, because the user will of course be able to add and edit the metadata and its structure, while the system recognizes patterns and applies metadata (from the existing structures) as files appear. In that scenario, dates and usernames are metadata in the semantic model with a specific purpose, and the user can create such notions as he goes to tell the system what to do with them. Describing an mp3 as "music" is a simple act of association, but in many cases documents are filed with a specific file name that reflects the document's purpose; if a pattern arises, the user can define that pattern and have the system recognize it and do the job.

Of course, the integration of the semantic desktop will need to be tighter than what is available today, and it will have to become "smart" about when to use the semantics to provide information and when not to (i.e. if search is enabled, the semantics will be used by default, yet the user can disable them to simply follow the folder structure).

Last but not least, while hierarchies are straightforward to follow, many things are connected more like a graph than a simple tree, and a "thesaurus"-like search is more convenient for a user who does not remember the exact term. I usually forget most details of what I am searching for except its concept, let alone the term I categorized it under. With RDF technologies booming, I think it will be an interesting exercise, if nothing else, to see something like this in the coming years. The key factor will be integrating semantic desktop technologies into the OS, freeing the user from having to set all the values manually (something that after a point becomes tedious and boring, and is thus abandoned). GNOME 3 (and mostly the Shell) is all about asking the user to think less, and in this respect that means the OS does more of the thinking.

I'm a simple user of GNOME, so many details are missing, such as how much work this would be and how much of it could be implemented without even changing the OS. But I read about the integration of semantic search in GNOME some time ago (I haven't tried it yet), and if that is possible, then maybe the rest is not so far-fetched.
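The user-defined filename patterns from the Semantic Desktop comment above might look something like the following. This is a hypothetical sketch, not an existing GNOME or Tracker feature: each rule pairs a regular expression with tags, and any named groups in the pattern (such as a year) become extra metadata. The two example rules are illustrative assumptions.

```python
import re
from pathlib import PurePath

# Hypothetical user-defined rules: each regex is matched against the file
# name; on a match, its tags are applied and any named groups become metadata.
RULES = [
    (re.compile(r"\.(mp3|ogg|flac)$", re.IGNORECASE), {"music"}),
    (re.compile(r"^invoice[-_](?P<year>\d{4})", re.IGNORECASE), {"invoice"}),
]


def auto_tag(path: str, rules=RULES) -> tuple[set[str], dict[str, str]]:
    """Return (tags, metadata) inferred from a file name by the given rules."""
    name = PurePath(path).name
    tags: set[str] = set()
    metadata: dict[str, str] = {}
    for pattern, rule_tags in rules:
        match = pattern.search(name)
        if match:
            tags |= rule_tags
            metadata.update(match.groupdict())  # only named groups are kept
    return tags, metadata
```

Keeping the rules as plain data leaves the "semiautomatic" part with the user: rules can be added, edited, or switched off, while the folder structure underneath stays untouched.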