Shotwell Architecture Overview: Data Structures

This section does not deal with all data structures in Shotwell, only two primary ones and their children. These structures are used and monitored throughout the application, and thus have a very specific set of design goals.

Although Shotwell has many data structures for many types of objects (cameras, pages, caches, thumbnails, etc.), it has a set of primary data objects which almost all of the others key off of for their duties. The collections of these data objects may be quite large, and different subsystems in the application may have very different requirements for manipulating and observing these collections. These data objects are generally stored in the database (one object per row), as they need to persist and are constantly accessed.

1. Background

In earlier versions of Shotwell, these objects did not inherit from a common ancestor, although they were being manipulated in similar (or identical) ways. Additionally, when a collection of these objects was updated (e.g. one was removed from a list), other classes would need to be notified that this had occurred so they could reflect that internally or to the user. In some cases, an object needed to know when the state of any object of a particular type had changed. The solution was either a hardcoded special-case call or connecting to the signal of every object, which could grow into the tens of thousands. These (and other) aspects of the design were highly unsatisfactory and led to a major redesign of the data structures.

Other situations encouraged other design choices. For example, the collection classes offer methods for batch operations on multiple items. The reason for this is to aggregate all the state changes into a single signal call (e.g. a bunch of items have been removed) rather than a signal call on every item. It’s not that signalling is so expensive, but if the signal handler is doing something expensive due to the state change (e.g. doing a complete reflow of the user-interface), it’s better that one signal is fired than one thousand. Workarounds for this usually entailed using the Idle or Timeout queues.

Three notes before proceeding:

Although this design may sound like a generic collections class implementation, it is not intended as such. If Shotwell needs functionality that is specific to a photo organizer and not generally useful elsewhere, it will be added. Conversely, although a generic collections library (such as Gee) meets some of these design goals, the need for the others led to this implementation. (Note that Gee is used internally by these classes.)

Related to the first, there are plenty of situations in Shotwell where it might be tempting to use these classes for, say, a temporary collection of a particular type of object. This may be overkill; often a Gee.Collection is perfectly suited and should be used.

This design is not immutable. As feature requirements change, these classes will reflect them. That said, the spirit of the design should be kept in mind as they evolve.

2. Design Goals

The design of this architecture was motivated by the following goals:

Generic object querying, manipulation, and observation (i.e. signals)
Generic object collection querying, manipulation, and observation (again, through signals)
Enforce uniqueness of objects with a backing store while maintaining multiple representations of them for the user and for user-facing container classes (i.e. Tag, Event)
Encourage batch operations on objects in a collection (rather than one at a time)
Generic implementations of common visibility and ordering operations, i.e. selection, hide/unhide, sorting, etc.
Signal aggregation
Signal reflection
More obvious signal ordering for child classes and observers
Collection monitoring and mirroring (to fully synchronize one collection with another)

The particular meaning of some of these terms is explained below.

3. Objects

The base class for all objects in this system is DataObject. A DataObject has a single public member of common use, get_name (). Unlike other signals, notify_altered () does not fire an "altered" signal for the DataObject itself. This was deemed too expensive in common use, particularly when a number of objects are being altered at once. Instead, DataObject contacts its owning DataCollection (explained below), which collects the notifications and fires a single "items-altered" signal. For each object that has been altered, it provides an Alteration object.

An Alteration is an immutable object with one or more subject:detail pairs. Subject may be something broad, like image (indicating the visible nature of the photo has changed) or metadata (suggesting data about the photo has changed). Detail is finer-grained information, such as color or exposure-time. An Alteration may contain more than one subject:detail pair. Alterations may also be compressed; for example, compressing an Alteration of image:color and an Alteration of metadata:time-taken results in a new single Alteration of image:color, metadata:time-taken. This allows for a single altered signal to be fired at once, another aspect of signal aggregation.
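The compression described above can be modelled in a few lines. This is a minimal sketch in Python rather than Shotwell's Vala, and every name in it (the pairs field, of, compress, has_subject, has_detail) is an assumption made for illustration, not Shotwell's actual API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Alteration:
    """Immutable set of (subject, detail) pairs (hypothetical model)."""
    pairs: frozenset

    @staticmethod
    def of(subject, detail):
        return Alteration(frozenset({(subject, detail)}))

    def compress(self, other):
        # Neither operand is mutated; a new Alteration holding both is returned.
        return Alteration(self.pairs | other.pairs)

    def has_subject(self, subject):
        return any(s == subject for s, _ in self.pairs)

    def has_detail(self, subject, detail):
        return (subject, detail) in self.pairs


color = Alteration.of("image", "color")
time_taken = Alteration.of("metadata", "time-taken")
merged = color.compress(time_taken)
```

An observer that only cares about visible changes could then test merged.has_subject("image") and ignore the metadata pair.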

DataObject has two immediate children: DataSource and DataView. The distinction between these is important.

3.1. Source Objects

A data source (often referred to simply as source in the code) represents a backing object (object in the general sense, such as a file or a row in the database). Each source is individuated and unique; that is, while two files may have the exact same contents, they are not the same file. A write operation on one does not change the other.

This uniqueness principle is important to the concept of a DataSource. Just as the file or a database row is unique, each in-memory DataSource is likewise unique. If two DataSources exist that point to the same file, there is a problem.

A DataSource adds one additional concept to DataObject: it can be destroyed (which is different from simply losing all its references and being finalized). Destruction means the source is removed from persistence and (possibly) its backing object (e.g. a file) is deleted. Thus, when a DataSource for a photo is destroyed, the file may not be deleted, but Shotwell does not remember it either, and it must be re-imported to be a DataSource again.

Although the file may not be altered (as in the case of a photo file), if some persisting aspect of that file changes (i.e. a transformation is added to the database), the DataSource is altered and fires a signal.

Note that the DataSource for photos (PhotoSource) is more closely tied to a row in the database than the file on disk (although it obviously works with both). The distinction becomes more interesting in direct-edit mode, where an in-memory database is used and the file is handled more directly.

Some container DataSources (i.e. Events and Tags) may be dehydrated () and reconstituted () via a SourceProxy. This abstract base class can store an in-memory representation of the DataSource before it’s destroyed. If the DataSource needs to be re-created (for example, because of an undo operation), the DataSource is reconstituted and re-inserted in its SourceCollection. The SourceProxy will fire a "broken" signal if some event occurs that makes it impossible to reconstitute the DataSource (for example, if a photo in a dehydrated Event is destroyed, the Event cannot properly be reconstructed). DataSources that wish to implement this need to implement save_snapshot (), implement a SourceSnapshot and SourceProxy, and make the proxy available.

DataSources may also be unlinked (and relinked) from their SourceCollection. Unlinking is a particular use case; it means the DataSource wishes to be removed from the SourceCollection but may return to the SourceCollection in the future. Because this act is completely signalled, other observers can react to it. In particular, ContainerDataSources (DataSources that hold or aggregate other DataSources, e.g. an Event is composed of many Photos) can serialize backlinks to the DataSource. When (and if) the DataSource is relinked to its SourceCollection, the links can be restored. If the ContainerDataSource’s last contained DataSource is unlinked, the ContainerDataSource will evaporate itself. This is how the Trash works in Shotwell: if a Photo is moved to the trash, the Event object gives it a backlink to store. The Photo is no longer in the main SourceCollection and appears "gone" to everyone in the system. At this point, a number of possible outcomes exist:

If the Photo is restored from the trash (i.e. undeleted), the backlink is re-established and the Photo becomes part of the Event again.
If the Photo is destroyed or the Event is destroyed, the backlink is destroyed.
If the last Photo of the Event is moved to the trash, the Event is evaporated and appears to be destroyed in the eyes of the user.
If the Photo is restored from the trash when the Event is evaporated, the Event rehydrates itself.
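The outcomes listed above can be traced with a toy model. The following is a sketch in Python under simplifying assumptions: the Library class stands in for both the main SourceCollection and the trash, backlinks are plain event names, and none of the class or method names are taken from Shotwell's code.

```python
class Event:
    def __init__(self, name):
        self.name = name
        self.photos = set()
        self.evaporated = False


class Library:
    """Hypothetical stand-in for the main SourceCollection plus the trash."""

    def __init__(self):
        self.linked = set()
        self.trash = {}   # photo -> backlink (event name) or None
        self.events = {}  # event name -> Event

    def add(self, photo, event_name):
        self.linked.add(photo)
        event = self.events.setdefault(event_name, Event(event_name))
        event.photos.add(photo)

    def unlink(self, photo):
        # The photo leaves the main collection; its Event hands it a backlink.
        self.linked.remove(photo)
        backlink = None
        for event in self.events.values():
            if photo in event.photos:
                event.photos.remove(photo)
                backlink = event.name
                if not event.photos:
                    event.evaporated = True  # last photo gone
        self.trash[photo] = backlink

    def relink(self, photo):
        # Restoring from the trash re-establishes the stored backlink.
        backlink = self.trash.pop(photo)
        self.linked.add(photo)
        if backlink is not None:
            event = self.events[backlink]
            event.photos.add(photo)
            event.evaporated = False  # the Event rehydrates itself


library = Library()
library.add("a.jpg", "Holiday")
library.add("b.jpg", "Holiday")
holiday = library.events["Holiday"]

library.unlink("a.jpg")
evaporated_after_first = holiday.evaporated
library.unlink("b.jpg")
evaporated_after_second = holiday.evaporated
library.relink("b.jpg")
evaporated_after_restore = holiday.evaporated
```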

Note that the terms evaporate and rehydrate are used both for the SourceProxy system and the unlinking/relinking system, because both operations are analogous. The important distinction between the two can be summarized like this: SourceProxy is only for in-memory use within an application session (i.e. the Undo/Redo system), while the unlinking/relinking system persists between sessions (and is used for the Trash can and for Offline media).

When a DataSource is unlinked but should be available to the system in its unlinked state, use a SourceHoldingTank. This class understands linking and the requirements of a DataSource. The trash and offline photos are held in SourceHoldingTanks.

3.2. View Objects

A data view represents an object that shadows a data source. It is aware of its DataSource in special ways, but it also can maintain its own state. While each data source must be unique, there may be (and probably will be) several DataViews of that source object. The obvious use of a DataView is for a user-visible representation of a particular DataSource, although there are other uses of the DataView class. In short, view is not synonymous with user-interface. It’s better to think of it as a viewer (or observer, or subscriber) of a source, and that there may be multiple viewers of any single source.

DataView is designed to be the base class of user-interface objects. However, as GObject only allows single inheritance and it might be desirable to use a Gtk.Widget for the interface, DataView can be held by the widget and observed via its signals. Going from a DataView to its Widget container is not handled in the current implementation of Shotwell, but could be added.

DataView does offer a handful of states that are normally associated with user-interfaces, including selection and visibility, and signals for its view and geometry changing. Note that these states are manipulated through the object’s collection and not directly. This rule was selected because (1) synchronization is vital to this architecture, and allowing for both would increase code complexity, and (2) more often than not, the caller is dealing with a list of objects to begin with (e.g. a list of selected items). This rule also encourages using the Marker class (explained below) to its fullest, meaning batch state changes can be grouped into one call.

A future refactoring may be to break out concepts from DataView and ViewCollection that are specific to user-interfaces (e.g. selection and some of the signals) into specific subclasses, e.g. UserView and UserCollection.

3.3. Source and View Relationship

When a DataView is created it registers as a subscriber to its DataSource. When the DataSource is altered in some way, all its subscribers are informed. For example, if the user crops a photo (altering the DataSource), it’s vital that all its representations (DataViews) know of this. Some will need to redraw, some will drop their cached pixbufs, and so on. Likewise, if a photo is destroyed, all DataViews need to be removed (as there is nothing backing them any more). This is one aspect of signal reflection.
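The subscription relationship can be sketched as follows. This is an illustrative Python model, not Shotwell's Vala code; the method names subscribe, on_source_altered and on_source_destroyed, and the cached_pixbuf field, are assumptions chosen for the example.

```python
class DataSource:
    def __init__(self, name):
        self.name = name
        self._subscribers = []

    def subscribe(self, view):
        self._subscribers.append(view)

    def notify_altered(self, what):
        # Every view shadowing this source hears about the change.
        for view in list(self._subscribers):
            view.on_source_altered(what)

    def destroy(self):
        # Nothing backs the views any more, so they are all told to go away.
        for view in list(self._subscribers):
            view.on_source_destroyed()
        self._subscribers.clear()


class DataView:
    def __init__(self, source):
        self.source = source
        self.cached_pixbuf = "pixels"
        self.alive = True
        source.subscribe(self)  # registration happens at construction

    def on_source_altered(self, what):
        if what == "image":
            self.cached_pixbuf = None  # drop the stale cache, redraw later

    def on_source_destroyed(self):
        self.alive = False


photo = DataSource("beach.jpg")
thumbnail = DataView(photo)
fullscreen = DataView(photo)

photo.notify_altered("image")  # e.g. the user cropped the photo
caches_after_crop = (thumbnail.cached_pixbuf, fullscreen.cached_pixbuf)
photo.destroy()
```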

4. Collections

The base class for all collections of DataObjects is DataCollection. Unlike DataObject (a thin class), DataCollection is full-featured. Its immediate children (SourceCollection and ViewCollection) deal with their respective contained objects (DataSource and DataView).

4.1. DataCollection

A DataCollection groups DataObjects. DataSources and DataViews are not grouped together; this is enforced in DataCollection’s children.

DataCollection offers the usual collection-y operations, including add, remove, size, iterable, clear, and so forth. A couple of distinct differences exist between the usual collection class and DataCollection. One is that an object may not be directly removed from DataCollection. Instead, DataObjects are marked via a Marker object and then removed all at once (via remove_marked). A generic method (act_on_marked) exists for other batch operations on the collection.

The motivation for this is twofold. First, earlier versions of the Gee iterator didn’t offer a remove () method, but would assert () if the Gee.Collection was altered while iterating. (Newer versions of Gee solve this problem.) Another issue was being notified of the removal; as explained below, all collection alterations are signalled. Marker allows for this to be trapped and signalled. Using a standard collection would mean writing a custom iterator, which is just moving the problem around. Also, it’s not always obvious when something is removed from a collection. It is possible for a collection to be iterated, calling methods on the objects, which then turn around and remove themselves from the list being iterated over. Marker, again, solves this problem.
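The mark-then-remove pattern can be illustrated with a small model. This is a sketch in Python, not the Vala implementation: remove_marked is named in the text above, while start_marking, mark, add_many and the removed_signals log are assumptions introduced for the example.

```python
class Marker:
    """Remembers which items the caller wants to act on later."""

    def __init__(self):
        self.marked = []

    def mark(self, item):
        if item not in self.marked:
            self.marked.append(item)


class DataCollection:
    def __init__(self):
        self._items = []
        # One entry per "items-removed" emission (the list that was passed).
        self.removed_signals = []

    def add_many(self, items):
        self._items.extend(items)

    def get_all(self):
        return list(self._items)

    def start_marking(self):
        return Marker()

    def remove_marked(self, marker):
        removed = [item for item in marker.marked if item in self._items]
        for item in removed:
            self._items.remove(item)
        if removed:
            # A single aggregated notification for the whole batch.
            self.removed_signals.append(removed)


collection = DataCollection()
collection.add_many([1, 2, 3, 4, 5, 6])

# Marking does not alter the collection, so it is safe during iteration.
marker = collection.start_marking()
for item in collection.get_all():
    if item % 2 == 0:
        marker.mark(item)

collection.remove_marked(marker)
remaining = collection.get_all()
```

Because the removal happens in one place, the collection has a single point at which to fire its signal, however many items were marked.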

DataCollection also offers signals to observe various state changes in the collection, including "items-added", "items-removed", "contents-altered" and "ordering-changed". DataCollection also works with DataObject for the other half of signal reflection, its "items-altered" signal. This allows for an observer to be notified whenever any DataObject in the collection is altered.

DataCollection offers the ability to freeze and thaw its signals. Not all signals are currently affected; this may change in the future. The ones that are affected are documented in the code. When a DataCollection is frozen, signals coming from its member DataObjects are collected and held. When the collection is thawed, the appropriate signal is fired once, passing a list of all objects that fired in the interim. (For example, if a collection is frozen and iterated over, and some of the objects are altered in the process, when the collection is thawed a single signal is fired listing all the objects that altered.) This is another aspect of signal aggregation, meaning that observers of a collection aren’t called, say, 1000 times when altering 1000 objects in a loop, but only once.
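A minimal model of freeze and thaw, written in Python for illustration only. The nesting counter, the pending store and the emissions log are assumptions of this sketch; which signals Shotwell actually aggregates is documented in its code, as noted above.

```python
class DataCollection:
    def __init__(self):
        self._freeze_count = 0
        self._pending = {}    # insertion-ordered set of altered items
        self.emissions = []   # one entry per "items-altered" emission

    def freeze(self):
        self._freeze_count += 1

    def thaw(self):
        self._freeze_count -= 1
        if self._freeze_count == 0 and self._pending:
            batch = list(self._pending)
            self._pending = {}
            self.emissions.append(batch)  # fired once for the whole batch

    def notify_altered(self, item):
        if self._freeze_count > 0:
            self._pending[item] = None    # held until the thaw
        else:
            self.emissions.append([item])


collection = DataCollection()
collection.freeze()
for index in range(1000):
    collection.notify_altered(f"photo-{index}")
emissions_while_frozen = len(collection.emissions)
collection.thaw()
```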

DataCollections also have generic properties which all DataObjects may query for and monitor for changes. When a property is added or modified, the DataCollection is frozen while all objects are notified, and then thawed. This provides a generic way to change some setting the individual DataObjects understand (such as a flag to display their title, or the scaling size of their thumbnail).

4.2. SourceCollection

SourceCollection is a thin layer over DataCollection. It merely provides signal reflection for DataSource’s destroy signal. It also ensures that when an object is destroyed it’s removed from the collection. DataSources may be unlinked and relinked from their SourceCollection; this is where that operation occurs.

4.3. DatabaseSourceCollection

A DatabaseSourceCollection is a utility SourceCollection. Since most SourceCollections deal in DataSources from the database, this class offers functionality to convert a database ID to the corresponding DataSource.

4.4. ViewCollection

Unlike SourceCollection, ViewCollection is a rich class adding many features to DataCollection. Akin to SourceCollection, it provides signal reflection for all the signals DataView provides for selection, visibility, and view/geometry alterations.

ViewCollection also offers SourceCollection monitoring. Most (if not all) ViewCollections have simple criteria for which DataSources they are interested in representing. For example, an EventPage’s ViewCollection wants to display all PhotoSources for that event (i.e. those with the same database EventID). Rather than trapping all the appropriate signals on the SourceCollection and manually adding/removing DataViews as necessary, ViewCollection.monitor_source_collection () will do the work. The caller provides a ViewManager object, which merely has a predicate function (include_in_view) and an object creation function (create_view). Thus, as changes occur to the DataSources, the ViewCollection is always up-to-date.
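The monitoring arrangement might look roughly like this. The sketch is in Python, not Vala; include_in_view, create_view and monitor_source_collection are the names used in the text, while Photo, View, EventViewManager, the observer callbacks and the names helper are hypothetical.

```python
class Photo:
    def __init__(self, name, event_id):
        self.name = name
        self.event_id = event_id


class View:
    def __init__(self, source):
        self.source = source


class SourceCollection:
    def __init__(self):
        self.items = []
        self.observers = []

    def add(self, source):
        self.items.append(source)
        for observer in self.observers:
            observer.on_source_added(source)

    def remove(self, source):
        self.items.remove(source)
        for observer in self.observers:
            observer.on_source_removed(source)

    def notify_altered(self, source):
        for observer in self.observers:
            observer.on_source_altered(source)


class EventViewManager:
    def __init__(self, event_id):
        self.event_id = event_id

    def include_in_view(self, source):
        return source.event_id == self.event_id

    def create_view(self, source):
        return View(source)


class ViewCollection:
    def __init__(self):
        self.views = {}  # source -> view

    def monitor_source_collection(self, sources, manager):
        self.manager = manager
        sources.observers.append(self)
        for source in sources.items:  # pick up what is already there
            self.on_source_added(source)

    def on_source_added(self, source):
        if source not in self.views and self.manager.include_in_view(source):
            self.views[source] = self.manager.create_view(source)

    def on_source_removed(self, source):
        self.views.pop(source, None)

    def on_source_altered(self, source):
        # Re-evaluate the predicate: the source may have moved in or out.
        if self.manager.include_in_view(source):
            self.on_source_added(source)
        else:
            self.on_source_removed(source)

    def names(self):
        return sorted(source.name for source in self.views)


library = SourceCollection()
beach, city = Photo("beach.jpg", 1), Photo("city.jpg", 2)
library.add(beach)

page_views = ViewCollection()
page_views.monitor_source_collection(library, EventViewManager(1))
library.add(city)                       # wrong event, so no view is created
after_add = page_views.names()

city.event_id = 1
library.notify_altered(city)            # now it qualifies
after_alter = page_views.names()

library.remove(beach)
after_remove = page_views.names()
```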

Similarly, a ViewFilter may be installed. Rather than adding/removing DataViews from the ViewCollection, a ViewFilter allows for DataViews’ visibility to be defined. Thus, if something is hidden it is still part of the collection; it’s simply not discovered through the standard get/iterator methods. The filter works even when DataViews report they’ve been altered or when they’re added.
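The difference between removing a view and filtering it can be shown with a toy collection. This Python sketch makes two simplifying assumptions: the filter is a plain predicate, and visibility is recomputed on each query rather than tracked as state. The method names install_view_filter, get_all and get_all_unfiltered are hypothetical.

```python
class ViewCollection:
    def __init__(self):
        self._views = []
        self._filter = None

    def add(self, view):
        self._views.append(view)

    def install_view_filter(self, predicate):
        self._filter = predicate

    def get_all(self):
        # Standard access only discovers views that pass the filter.
        if self._filter is None:
            return list(self._views)
        return [view for view in self._views if self._filter(view)]

    def get_all_unfiltered(self):
        # Hidden views are still members of the collection.
        return list(self._views)


views = ViewCollection()
views.add({"name": "a.jpg", "rating": 5})
views.add({"name": "b.jpg", "rating": 1})
views.install_view_filter(lambda view: view["rating"] >= 3)

views.add({"name": "c.jpg", "rating": 4})  # filter applies to later additions
visible = [view["name"] for view in views.get_all()]
total = len(views.get_all_unfiltered())
```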

Like remove_marked in DataCollection, selection and visibility changes are made via select_marked/unselect_marked and hide_marked/show_marked.

ViewCollections may be mirrored, that is, when an element is added or removed from one ViewCollection, the mirror is automatically updated. This is used in the Tag pages; the Tag object maintains an internal ViewCollection of all photos associated with that tag. Each TagPage mirrors the appropriate Tag’s ViewCollection, displaying a user-interface object for each Photo.
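Mirroring can be modelled as one collection subscribing to another and creating its own kind of view for each element. This is a Python sketch under assumptions: the mirror method's signature, the remove_source helper and the View/Thumbnail classes are invented for illustration and are not Shotwell's API.

```python
class View:
    def __init__(self, source):
        self.source = source


class Thumbnail(View):
    """Stands in for a user-interface object on a TagPage."""


class ViewCollection:
    def __init__(self):
        self.views = []
        self._mirrors = []  # (mirroring collection, view factory) pairs

    def add(self, view):
        self.views.append(view)
        for mirror, create_view in self._mirrors:
            mirror.add(create_view(view.source))

    def remove_source(self, source):
        self.views = [view for view in self.views if view.source != source]
        for mirror, _ in self._mirrors:
            mirror.remove_source(source)

    def mirror(self, original, create_view):
        # Copy what already exists, then follow future additions and removals.
        for view in original.views:
            self.add(create_view(view.source))
        original._mirrors.append((self, create_view))

    def sources(self):
        return [view.source for view in self.views]


tag_views = ViewCollection()      # the Tag object's internal collection
tag_views.add(View("beach.jpg"))

page_views = ViewCollection()     # the TagPage's collection
page_views.mirror(tag_views, Thumbnail)

tag_views.add(View("city.jpg"))
after_add = page_views.sources()
tag_views.remove_source("beach.jpg")
after_remove = page_views.sources()
```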

ViewCollections may also be locked. This is similar in concept to freezing a collection, but instead of signal aggregation, locking a ViewCollection prevents items from being removed or hidden. This allows for the collection to be iterated over and modified without fear of items going away (which can be confusing to the user). When the ViewCollection is unlocked, the items are removed or hidden all at once. (This is deprecated functionality and will hopefully be removed in the near future.)

Finally, a ViewCollection offers basic controller aspects (get_first/get_last/get_next/get_previous). These are useful when the user is moving through the collection, but not a good choice for a coded iteration.

5. Signal Ordering

Although GObject provides several mechanisms to specify what order signal handlers will execute, it’s a bit complex for the needs of this architecture. Also, as Shotwell is 100% Vala, we wanted to provide a system that was easily coded in that language.

There’s strict signal ordering and loose signal ordering. This architecture uses a looser ordering model. Specifically, it’s assumed that the subclass of a DataObject will need for its internal state to be updated at a particular moment, while the ordering of observers is far less rigid. For example, a ViewCollection maintains a list of all items visible in its collection. A child class (which is more intimately tied to its parent than observers, just as in real life) may need to update its state or perform an operation before or after the observers are notified. However, observers should have no expectations of when they’re notified in relation to other observers. If one observer (i.e. signal handler) really needs to be notified prior to another observer, that indicates a design problem.

Best practices when working with this architecture are as follows:

No signal is fired directly, whether by the signalled class, a subclass, or an observer. Instead, all signals are declared virtual and have corresponding protected virtual notify_* methods (e.g. notify_items_added). The notify_* method fires the signal (e.g. "items-added"). These methods are protected for a reason; only that class or a subclass should call them directly.
A subclass that needs to update before the signal is fired overrides the notify_* method, being sure to call its base handler.
A subclass that needs to update after the signal is fired overrides the virtual signal handler, being sure to call its base handler.
Observers are notified between the two.

This means any block of code should know when it’s executing, although observers cannot be sure of their order.
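The resulting call order can be traced with a small model. This Python sketch imitates a virtual signal by calling the connected observers first and the class's own overridable handler last; that emulation, and every name other than notify_items_added, is an assumption of the example rather than a description of how Vala or GObject dispatches signals internally.

```python
class DataCollection:
    def __init__(self):
        self._observers = []
        self.log = []

    def connect_items_added(self, handler):
        self._observers.append(handler)

    def notify_items_added(self, items):
        # The only place the "items-added" signal is fired from.
        for handler in self._observers:
            handler(items)
        self.items_added(items)  # the signal's own virtual handler

    def items_added(self, items):
        # Default handler: nothing to do in the base class.
        pass


class TrackingCollection(DataCollection):
    def notify_items_added(self, items):
        # Update internal state BEFORE any observer sees the signal.
        self.log.append("subclass: before signal")
        super().notify_items_added(items)

    def items_added(self, items):
        # Runs AFTER the observers have been notified.
        super().items_added(items)
        self.log.append("subclass: after observers")


collection = TrackingCollection()
collection.connect_items_added(lambda items: collection.log.append("observer"))
collection.notify_items_added(["beach.jpg"])
order = collection.log
```

In this model the log reads: subclass before, observer, subclass after, which matches the rule that observers are notified between the two overrides.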

6. In Practice

Today there are five SourceCollections in Shotwell: LibraryPhoto.global (for all photos in library mode), DirectPhoto.global (all photos in direct-edit mode), Event.global (all events), Tag.global (all tags), and Video.global (all videos). They are static members of these classes, reflecting the uniqueness of their members. (They are analogous to the facilities some object-oriented languages have to find all instances of a class, although Shotwell’s SourceCollections are manually maintained. They also do not use weak references, as they are the first and last word on the availability of an instance.)

Every Page has a ViewCollection. In general, Page’s children either install ViewManagers to monitor their appropriate SourceCollection, which means they always display UI representations for whatever they’re interested in, or they mirror a Source object’s internal ViewCollection (as with Tag). Future pages can be set up similarly. For example, for a Page to display all photos taken in the current year, the implementation would inherit from CollectionPage (which provides a checkerboard page of Thumbnails for all photos in its ViewCollection) and create a ViewManager to filter on the date. Even if a photo’s date is changed on a separate page, it will be reflected in this hypothetical CurrentYearPage’s ViewCollection.

As mentioned, Tag and Event each use an internal ViewCollection to track the PhotoSources assigned to their TagID (or EventID). ViewCollection source monitoring allows for this list to always be synchronized. Future work includes having EventPage use ViewCollection’s mirror feature to keep itself synchronized.

CheckerboardLayout, a widget within the CheckerboardPage, monitors the supplied ViewCollection to update the display in ways appropriate to what’s changed (specifically to avoid a reflow whenever possible). Because the concept of selection is generic to ViewCollection, CheckerboardLayout can implement a drag select on its DrawingArea with other observers of the ViewCollection updating their state when appropriate.

7. Further Work

While much of this architecture is in place, some design decisions were put off until more features either necessitated them or clarified their implementation.

The signal ordering system (notify_* methods) has not been fully implemented in the collection classes. Some of the subclass signal overrides need to be moved to their notify_* counterparts. Some have a notify_* method but no signal, because it is understood that no code exists subscribing to that signal and there is performance overhead with firing it.