Shotwell Architecture Overview: Source Code Organization

1. Background

For eight releases, Shotwell used a simple (to be charitable) model of source code organization. All source code was stored in a single directory (src/) in files with more-or-less descriptive names, usually (but not always) the name of a major class in that file. Almost every file held multiple classes. There was some attempt to keep code with a related purpose together in the same file, but it did not always succeed. This was fine when Shotwell was under 30,000 lines of code (including whitespace and comments), but it began to bulge at the seams thereafter. Shotwell stands today at approximately 100,000 lines of code. The old model was inadequate.

As discussed on the Shotwell mailing list, there were a number of factors in deciding how to best organize the source code. Other factors were not discussed but were implicit or required by certain conditions. In sum these include:

We decided from the outset that we would not use autotools for Shotwell’s build system. Any changes to the source code organization had to work with our homebrew Makefile.
Unlike many other languages, Vala compiles all the code at once, producing .c files which may be compiled in parallel. This makes strategies like recursive Makefiles difficult.
We want the source code organization to maximize programmer productivity and to enable (or encourage!) outside contributors to make changes easily.
Programmers were complaining that it was painful to modify the Makefile. The Makefile had grown to 500 lines of baroque code.
It was also painful to add new source files, both because it required modifying the Makefile and because of the general nuisance of preparing one. This led to existing source files growing without bound.
Namespaces were not embraced by Shotwell code early on. They could be used now, but at the expense of updating a lot of existing code.
Initialization was a persistent problem. Many subsystems had to be launched at startup in a particular order, which was manually enforced in main (). Additionally, these systems’ init () and terminate () functions made no checks that they were not called more than once.
One topic of concern was directory and filename naming schemes. Because Vala is a relatively young language, a single standard has not come to the fore.
There is a desire to avoid a deep directory structure. If the code can be organized by one-deep directory nesting, I consider that a win.
Automation is good, unless it’s bad. Complexity is bad, unless it’s good.

On 11 January 2011 (1/11/11!) the initial change to a new organization model was committed to the repository. This document attempts to explain some of the thinking behind it and how to grow Shotwell’s code in this model.

2. Units

There now exists in Shotwell unitized code and ununitized code. Ununitized code is simply code written and organized under the old model: all symbols in the global namespace, manual initialization/termination, files placed directly under the top-level src directory. As of today, most of Shotwell’s code remains ununitized.

Unitized code has several features:

Units are named in CamelCase.
Each unit has a master unit file that is the name of the unit with the Vala source extension, e.g. Xyzzy.vala.
The master unit file has initialization and termination methods (init() and terminate()) that are called once and only once, at program startup and shutdown, in the main (i.e. UI) thread context.
The master unit may also have a preconfigure() method that is called prior to init(). This allows required parameters, e.g. the database filename, to be passed to a unit prior to initialization. Note that this call must be made manually in main.
The master unit declares these methods in the unit’s CamelCased namespace. Note that not all code in the unit is required to be in the unit’s namespace. This allows for a migration path for existing code. (More thoughts on namespaces are below.)
Units are stored in subdirectories under src with the unit name in underscored lowercase, e.g. src/xyzzy/.

The vision is that once most or all code has been unitized, the programmer can quickly find what they need.

2.1. Unit Files

The only source code requirement for a unit is that it have a master unit file matching the name of the unit, with the init/terminate/preconfigure methods declared within the unit’s namespace. From there on, programmers can (and should) add new source files to the unit. Any name is acceptable. However, the scheme Shotwell tends to follow is that the filename matches the name of the class (which is CamelCased) or describes the grouping of functionality within the file (e.g. ColorAdjustments.vala). If more than one class is in the file, the "major" class name is chosen for the filename.

Unlike Java, which allows only one public top-level class per file, Vala has no hard requirement that each file hold exactly one class. I’d like to see more code broken out in Shotwell, but files may hold more than one class.

New code should (within reason) start using the unit’s namespace. For now, this is decided on a case-by-case basis. More discussion is in Namespaces, below.

2.2. The mk Directory

Each unit has a directory named mk. In it is a file named after the unit’s directory name with a .mk extension (e.g. db.mk). This is a Makefile that’s included at build time.

The unit’s mk file holds all the particulars for the unit, including its directory name, unit name, a list of the source files to be compiled into the project, a list of other units this unit relies upon, and a list of resource files that should be included in the distributed tarball (but are not necessarily used at compile time). Some of these fields are explained in more detail below.

The final line in these mk files is to include unitize.mk, which processes the unit’s mk file for the master Makefile to use.
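The naming rules so far are mechanical enough to write down. The following is a minimal sketch in Python, used here purely for illustration (nothing like it ships with Shotwell), that derives a unit's directory, master unit file, and mk file from its CamelCase name. The function names are hypothetical, and the conversion assumes simple CamelCase names such as AlienDb.

```python
import re
from pathlib import PurePosixPath


def unit_dir_name(unit_name: str) -> str:
    """Convert a CamelCase unit name to underscored lowercase."""
    # Insert an underscore before every capital letter except the first,
    # then lowercase the result (AlienDb -> Alien_Db -> alien_db).
    return re.sub(r"(?<!^)(?=[A-Z])", "_", unit_name).lower()


def unit_layout(unit_name: str, src: str = "src") -> dict:
    """Return the paths a unit with this name is expected to own."""
    directory = PurePosixPath(src) / unit_dir_name(unit_name)
    return {
        "directory": str(directory),
        # Master unit file: the unit name plus the Vala extension.
        "master_file": str(directory / f"{unit_name}.vala"),
        # The mk file is named after the unit's directory, not the unit.
        "mk_file": str(directory / "mk" / f"{directory.name}.mk"),
    }


print(unit_layout("AlienDb"))
```

For the AlienDb unit this yields src/alien_db, src/alien_db/AlienDb.vala, and src/alien_db/mk/alien_db.mk.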

2.3. The Master units.mk File

The units.mk file (in Shotwell’s root) holds a list of all the units in the source tree. This must be updated manually when a new unit is added. This list is used to compile Shotwell.

It also holds variables that describe which units are required for various run modes (library or direct-edit). These lists are used to initialize only the units that are required for a particular mode.

2.4. The Unit-unit

All support code and template files for the unit system are stored in src/unit, the unit-unit. This unit is automatically a prerequisite for all units and does not need to be listed in UNIT_USES.

3. In Practice

3.1. Adding or Modifying Existing Code

If the code change merely requires adding or modifying existing source files, there’s nothing special to do. Make a patch!

3.2. Updating a Unit

If additional files are being added to the project, there are some steps to follow.

If the file is some kind of a resource that is required for the tarball or is used by Shotwell at run-time, add it to the unit’s rc directory (if it does not exist, create it). In the unit’s mk file, add it to the UNIT_RC variable.

Note: Some resource files are located in special locations in the source tree, such as the .ui and icon files. These (in particular, the .ui files) may be migrated into the proper unit, but for now place those types of files in those directories.

If the file is a source file, create the file in the unit directory and add it to the UNIT_FILES variable in the unit’s mk file.

In the root of the Shotwell source tree is a new script, mkvala. This script file automates the task of creating a new source file. Run the script with no parameters to see its usage. Currently this does nothing more than create a file of the appropriate name in the unit’s directory with the Yorba license at the top.

The script does not add the new file to the unit’s mk file. This must be done by hand. The script does minimal error checking. It will overwrite existing files.

3.3. Creating a Unit

If a new unit needs to be built, or ununitized code needs to be moved into a new unit, use the mkunit script in Shotwell’s root directory. Do not do it by hand. You’ll probably get it wrong.

The script will create the unit directory, the unit’s mk directory and .mk file, and the master unit file with the init and terminate methods prepared. The script uses m4 to insert the proper names into the files.
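To give a feel for what the script produces, here is a rough Python sketch of the same idea. It is not the mkunit script: it swaps m4 for Python's string.Template, and both templates are hypothetical stand-ins for the real ones. UNIT_FILES, UNIT_USES, and UNIT_RC are the variables named in this document; UNIT_NAME and UNIT_DIR are assumed names for the unit-name and directory-name fields.

```python
from string import Template

# Hypothetical template for a unit's .mk file. UNIT_NAME and UNIT_DIR
# are assumed variable names; the other three come from this document.
MK_TEMPLATE = Template("""\
UNIT_NAME := $unit_name
UNIT_DIR := $unit_dir

# Source files compiled into the project.
UNIT_FILES :=

# Other units this unit relies upon.
UNIT_USES :=

# Resource files shipped in the tarball.
UNIT_RC :=

include unitize.mk
""")

# Hypothetical template for the master unit file.
MASTER_TEMPLATE = Template("""\
namespace $unit_name {

public void init() {
}

public void terminate() {
}

}
""")


def make_unit_files(unit_name: str, unit_dir: str) -> dict:
    """Return {relative path: contents} for a freshly created unit."""
    names = {"unit_name": unit_name, "unit_dir": unit_dir}
    return {
        f"src/{unit_dir}/mk/{unit_dir}.mk": MK_TEMPLATE.substitute(names),
        f"src/{unit_dir}/{unit_name}.vala": MASTER_TEMPLATE.substitute(names),
    }


for path, text in make_unit_files("AlienDb", "alien_db").items():
    print(f"--- {path}\n{text}")
```

The point of the template approach is that the unit name appears in several places (namespace, filenames, mk variables) and a single substitution keeps them consistent.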

Once the unit has been created, add the unit name to the units.mk file. Be sure to add it to the list for each run mode in which it will be used (library mode, direct-edit mode, or both).

3.4. UI and Glade Files

If your unit contains UI classes that are referenced in a Glade file, the class reference in the object tag needs to be the fully qualified name of the class without the dots. For example the AlienDatabaseImportDialog class in the AlienDb unit is referenced like this in the Glade file:

<object class="AlienDbAlienDatabaseImportDialog" id="alien-db-import_dialog">
    ...
</object>

Signal handlers also need to reference the fully qualified name but using a lowercase, underscore separated name, like the example below from the same dialog:

<signal name="file_set"
        handler="alien_db_alien_database_import_dialog_on_file_chooser_file_set"
        object="alien-db-import_dialog"/>
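Both names follow mechanically from the fully qualified Vala name. As a small illustration, this Python sketch (the helper names are hypothetical, and it assumes plain CamelCase identifiers) derives the two strings used in the example above:

```python
import re


def _underscored(camel: str) -> str:
    """AlienDatabaseImportDialog -> alien_database_import_dialog"""
    return re.sub(r"(?<!^)(?=[A-Z])", "_", camel).lower()


def glade_class_name(qualified_name: str) -> str:
    """Fully qualified class name with the dots removed."""
    return qualified_name.replace(".", "")


def glade_handler_name(qualified_name: str, method: str) -> str:
    """Lowercase, underscore-separated name of a signal handler method."""
    parts = [_underscored(part) for part in qualified_name.split(".")]
    return "_".join(parts + [method])


name = "AlienDb.AlienDatabaseImportDialog"
print(glade_class_name(name))
print(glade_handler_name(name, "on_file_chooser_file_set"))
```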

4. At Compilation Time: Unitizing

When all the files are in place and the .mk files are properly updated, the build process can unitize the units. Like other aspects of the build process, this is automatic and only needs to be done once (unless a make clean is performed).

Unitizing code means processing a unit’s .mk file and generating temporary build files that handle the unit’s requirements. These temporary files are held in the .unitize directory (created under the src directory). These files are auto-generated .vala files.

Some people moan about auto-generated files. I’m one of them. I don’t love them. Automation is good, except when it’s bad. Here, I think they’re good since (a) they perform operations that are common among all units and are easily broken when hand-coded, and (b) any bugs in the technique can easily be fixed without updating a lot of files.

4.1. Initialization and Termination

The first type of auto-generated file is a unit’s internal file (e.g. _DbInternals.vala). This file holds the real initialization and termination points. The generated code uses an incremented counter to check if it’s been called more than once. This ensures that every unit’s init() and terminate() methods are called once and only once.

This file also calls (once and only once) the initialization and termination methods of all of this unit’s required units (listed in the unit’s mk file under UNIT_USES) prior to the unit’s own methods. This ensures that all prerequisite units are initialized and terminated in proper order.
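The counter and the prerequisite ordering are easier to see in a toy model. The following Python sketch is not the generated Vala code; the class, the init_entry name, and the Events and Tags units are all hypothetical. It models only the initialization side: each unit counts calls to its entry point, does nothing after the first, and initializes its prerequisites before itself.

```python
class Unit:
    """Toy model of one unit's generated initialization entry point."""

    def __init__(self, name, uses=()):
        self.name = name
        self.uses = list(uses)  # prerequisite units, as listed in UNIT_USES
        self.init_count = 0     # incremented on every init_entry() call

    def init_entry(self, log):
        self.init_count += 1
        if self.init_count > 1:
            return                # already initialized: do nothing
        for unit in self.uses:
            unit.init_entry(log)  # prerequisites are initialized first
        log.append(self.name)     # stands in for the unit's own init()


db = Unit("Db")
events = Unit("Events", uses=[db])
tags = Unit("Tags", uses=[db, events])

log = []
# Deliberately call the entry points out of order and more than once.
for unit in (tags, db, events, tags):
    unit.init_entry(log)
print(log)  # ['Db', 'Events', 'Tags']
```

However the entry points are called, each unit's own initialization runs exactly once and never before its prerequisites.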

4.2. unitize_init and unitize_terminate

The second type of auto-generated file holds the *_unitize_init () and *_unitize_terminate () functions for each run mode (library or direct-edit). So, to initialize all units for library mode, main merely calls library_unitize_init (), which initializes every unit that mode requires. library_unitize_terminate () performs the symmetric task.
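As a rough picture of what such a generated file might contain, this Python sketch emits the text of an init function for one run mode from that mode's unit list. It is an illustration only: the real generator is driven by the Makefile, the per-unit init_entry() name is an assumption, and the Photos unit is hypothetical.

```python
def generate_unitize_init(mode: str, units: list) -> str:
    """Return Vala source text for <mode>_unitize_init() (a sketch)."""
    lines = [f"public void {mode}_unitize_init() {{"]
    # One call per unit required by this run mode. Each unit's own
    # entry point is assumed to guard against repeated calls.
    lines += [f"    {unit}.init_entry();" for unit in units]
    lines.append("}")
    return "\n".join(lines) + "\n"


print(generate_unitize_init("library", ["Db", "Photos"]))
```

The matching terminate function would be generated the same way from the same list.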

5. Namespaces

Namespaces introduce a simple way to organize code by name. They can also become nightmarishly complex when slavishly followed.

Because we want to migrate to this new system rather than introduce a major new model all at once, existing code is being moved to units in waves. Some of that code will be added to the unit’s namespace as well. Some won’t. The call is largely subjective, depending on factors like how much code is affected by using a namespace and whether the change requires moving other code into the namespace as well.

In the case of old code, I tend to prefer erring on the side of caution when moving it into a namespace. In the case of new code, I’m less conservative.

In general, I feel the file and directory organization of the new model is the larger win. Namespaces are not as pressing.