Code Overview

This page briefly describes the rationale and architecture of some of the core Conduit elements. For information on writing your own DataProvider, see 'Writing a Data Provider'.

DataProviders

This section does not describe the role of all the DataProvider methods. For that information, please see 'Writing a Data Provider'.

MODULES dictionary

The MODULES dictionary stores the following information:

MODULES = {
        "NameOfModuleClass" : { "type": "dataprovider" }          
}

Notes:

  1. The outer key is the name of the class.

Valid inner dict properties include:

Key     Value
type    One of dataprovider, converter, dataprovider-factory
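
For example, a module file that ships both a converter and a dynamic dataprovider factory might declare (the class names here are hypothetical):

MODULES = {
        "FooConverterClass" : { "type": "converter" },
        "FooFactoryClass"   : { "type": "dataprovider-factory" }
}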

Class properties

Most information about a dataprovider and its capabilities is communicated through class properties. This allows a dataprovider's properties to be inspected without instantiating the class, which decreases memory usage and improves startup time.

class UnisonTwoWay(DataProvider.TwoWay):

    _name_ = "Unison"
    _description_ = "Synchronize remote files using Unison"
    _category_ = DataProvider.CATEGORY_LOCAL
    _module_type_ = "twoway"
    _in_type_ = "file"
    _out_type_ = "file"
    _icon_ = "image-x-generic"

Valid class properties include:

Property          Value
_name_            The localised display name
_description_     The localised display description
_category_        A DataProviderCategory or one of the built-in categories
_module_type_     One of source, sink, twoway
_in_type_         The datatype the dataprovider accepts (N/A for datasources)
_out_type_        The datatype the dataprovider emits (N/A for datasinks)
_icon_            The icon name (not the filename)

Dynamic Dataprovider Support

If you wish to implement a dynamic dataprovider (one that becomes available at runtime), first create a derived DataProviderFactory. When a new dataprovider becomes available, call self.emit_added(klass, initargs, category); if the dataprovider later becomes unavailable, call self.emit_removed(klass, initargs, category).

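A minimal sketch of such a factory, assuming a hypothetical device that is probed at runtime. The callback names and init arguments are illustrative; only the emit_added()/emit_removed() calls are the mechanism described above.

class ExampleFactory(DataProvider.DataProviderFactory):

    def _device_connected(self, mountpoint):
        #hypothetical callback, fired when the device appears at runtime
        self.emit_added(
                ExampleTwoWay,              #klass
                (mountpoint,),              #initargs
                ExampleTwoWay._category_)   #category

    def _device_disconnected(self, mountpoint):
        #hypothetical callback, fired when the device goes away again
        self.emit_removed(
                ExampleTwoWay,
                (mountpoint,),
                ExampleTwoWay._category_)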

Dataprovider Categories

A category is just a name and an icon used to group related dataproviders together in the treeview. You are free to use the supplied categories or define your own, as is done in the iPod Dataprovider Factory.
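
For example, a factory might define its own category along these lines. The DataProviderCategory constructor arguments shown are an assumption for illustration; check the supplied categories in the Conduit source for the exact signature.

#a category is little more than a display name plus an icon name
CATEGORY_EXAMPLE = DataProviderCategory.DataProviderCategory(
                        "My Device",                #name shown in the treeview
                        "drive-removable-media")    #icon name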

DataType

A datatype is the basic representation of something to be synchronized. Conduit ships with a number of built-in datatypes, including:

  • File

    • A gnomevfs compatible representation of a file. Because it uses gnomevfs, this supports all of the backends which gnomevfs supports, including local files, SSH, WebDav, SMB, and http/ftp.

  • Note

    • A basic representation of a note. Contains a title and contents.

  • Email

    • A representation of an email. Uses Python's built-in email class.

  • Contact

    • Represents a contact. Uses vobject internally so that it can deal with the vcard format.

  • Event

    • Represents a calendar event. Uses vobject internally to deal with the ical format.

  • Text

    • A simple wrapper around a string.

It is not expected that these datatypes will be sufficient for all purposes; indeed, that is not the design goal. Because Conduit's strength is its dynamic nature and its ability to convert between types, it is often easier to define a more specific type suited to a dataprovider, and then provide the conversion functions that make sense.

Type Inheritance

Conduit supports, and recommends, that specific types inherit from these generic types where possible. The type name must also reflect this inheritance.

For example, the Video type inherits from the File type, so its type name is file/video.

Type inheritance is good for the following reasons:

  1. It promotes good OO design, as the derived Video type can utilize the functionality provided by the File type.

  2. It provides a sensible place for Conduit to check that a File instance is a Video instance.

(2) requires some more explanation. Let's say the user has a folder of data, not just videos but also audio, text, etc. They want to put the videos onto their n800. The n800 _in_type_ = file/video and the Folder _out_type_ = file. In order to put only videos on the n800, Conduit calls the file --> file/video conversion which, by checking the file's mimetype, has the opportunity to reject the file because it is not a video type.
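
A sketch of what such a conversion function might look like. The function, the mimetype helper, and the Video constructor are illustrative assumptions; the real converter lives in Conduit's converter modules, and how rejection is signalled may differ.

def file_to_video(f, **kwargs):
    #check the mimetype and refuse to convert anything that is not a video
    mimetype = f.get_mimetype()
    if not mimetype.startswith("video/"):
        #assumption: returning None rejects the file from the sync
        return None
    #otherwise wrap the file in the more specific Video datatype
    return Video.Video(URI=f.get_local_uri())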

Implementable Functions

Datatypes should implement a few functions, including:

  • def compare(self, B):

    • This should compare the datatype with B and return one of:

    • conduit.datatypes.NEWER: the data is newer than B

    • conduit.datatypes.EQUAL: the data and B are equal

    • conduit.datatypes.OLDER: the data is older than B

    • conduit.datatypes.UNKNOWN: it could not be determined which is newer/older

  • def get_hash(self):

    • This should return a hash representing the state of the data, so that Conduit can detect whether the data has been modified since it was last seen.

  • def get_snippet(self):

    • Datatypes may wish to override this function with one which returns a small text representation of the data they hold. The snippet is not used as part of the conversion or synchronization process, but may be displayed to the user to help them compare data in a conflict situation.
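
Putting these together, a sketch of a minimal datatype. The class, its fields, and the base-class constructor call are illustrative assumptions:

import hashlib

import conduit.datatypes.DataType as DataType
from conduit import datatypes

class ExampleNote(DataType.DataType):

    def __init__(self, title, contents):
        DataType.DataType.__init__(self)
        self.title = title
        self.contents = contents

    def compare(self, B):
        #a real implementation would also consult mtimes so it
        #could return NEWER or OLDER where possible
        if self.get_hash() == B.get_hash():
            return datatypes.EQUAL
        return datatypes.UNKNOWN

    def get_hash(self):
        #hash the fields that define the data, so modifications are detected
        return hashlib.md5(self.title + self.contents).hexdigest()

    def get_snippet(self):
        #short text shown to the user when resolving a conflict
        return self.title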

Functions to Call on DataType Instances

The base DataType class provides the following functions. Usually these functions should not need to be overridden, but they should be called on an instance of a datatype before returning it from the dataprovider's get() function, as they are used to encode information about where the datatype instance originated.

  • def set_UID(LUID)

    • LUID is a locally unique identifier representing this instance. It can be whatever makes the most sense, and need not be globally unique; it only needs to be unique on a per-dataprovider basis. For example, the Gmail dataprovider uses the message ID as the LUID. This is probably not unique across all of Gmail, but it is certainly unique within a logged-in user's account (and hence within the configured Gmail dataprovider for that user).

  • def set_mtime(datetime)

    • This function attaches a modification time to the instance. Not all dataproviders support the concept of a modification time, so it is not required that this function be invoked on datatype instances. Datatypes may override this function and apply it to the actual instance (for example the File datatype sets the file's mtime on disk to this value).

  • def set_open_URI(uri)

    • If this is called on a returned instance then, in the case of a conflict, Conduit will attempt to open this URI using the appropriate desktop-wide application (e.g. gnome-open on GNOME). This means that, unlike the UID, the open_URI must make sense at a desktop-wide level.

These functions will typically be called in the DataProvider get() function in the following way (taken from Conduit/WritingADataProvider):

    def get(self, LUID):
        DataProvider.TwoWay.get(self, LUID)
        data = self._get_data(LUID)
        data.set_UID(LUID)
        data.set_mtime(datetime.datetime(1983, 8, 16))
        data.set_open_URI("file:///home/john/file.txt")
        return data

Type Conversion

Conduit has a rich and powerful type conversion system. If two dataproviders' input and output types differ, Conduit checks whether one or more conversions exist which can transform the data into the necessary format.

As explained in the Type Inheritance section above, conversion will be made from the parent to the child type, but not in the other direction.

Conduit will also take the most specific conversion available. For example, if you provide a foo/bar->baz/bob conversion, then Conduit will use this conversion function instead of converting foo->baz followed by baz->baz/bob. The following table provides some additional examples of the sequence of conversions performed when converting between derived and non-derived types.

Output type    Input type    Sequence of conversions

foo            foo           1) foo->foo
foo            foo/bar       1) foo->foo/bar
foo/bar        foo           1) foo->foo
foo            baz           1) foo->baz
foo            baz/bob       1) foo->baz  2) baz->baz/bob
foo/bar        baz/bob       1) foo->baz  2) baz->baz/bob
baz/bob        baz/bob       1) baz/bob->baz/bob
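
The registration of conversion functions is not shown above. As a rough sketch, a converter module declares itself with "type": "converter" in MODULES and maps "fromtype,totype" strings to functions. The exact registration scheme shown here is an assumption; consult the converter modules that ship with Conduit.

MODULES = {
        "ExampleConverter" : { "type": "converter" }
}

class ExampleConverter(object):

    def __init__(self):
        #assumed scheme: map "fromtype,totype" to the function performing
        #the conversion; kwargs carries any conversion arguments
        self.conversions = {
                "foo,baz"     : self.foo_to_baz,
                "baz,baz/bob" : self.baz_to_bob,
        }

    def foo_to_baz(self, data, **kwargs):
        pass    #transform a foo into a baz

    def baz_to_bob(self, data, **kwargs):
        pass    #specialise a baz into a baz/bob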

Conversion Arguments

You will notice that the table above includes a call to a conversion function even when the input and output types are the same. This call is actually only made if there are conversion arguments. Conversion arguments are a dictionary that gets passed to the last converter in the chain. This allows things like transcoding videos or resizing photos according to parameters configured in the DataProvider.

Conversion arguments are returned from the dataprovider by implementing the get_input_conversion_args() function.
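
For example, a photo datasink configured to resize images might implement it as follows (the argument keys are illustrative; they simply have to be understood by the converter that receives them):

    def get_input_conversion_args(self):
        #passed to the last converter in the chain
        return {
                "size"   : "640x480",
                "format" : "jpeg",
        }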

The Synchronization Process

Synchronization occurs on a per-conduit basis, meaning that Conduit attempts to synchronize all items within a conduit. If there are multiple DataSinks then only one-way synchronization is supported. If there is a single DataSource and a single DataSink within the conduit then the user may select one-way or two-way synchronization.

Getting What's Changed

The synchronization process begins with getting the LUIDs of all data that has been added, modified, or deleted from the dataprovider since the last sync.

This can be accomplished in two ways. The common case is that the dataprovider cannot detect whether data has been added, modified or deleted without comparing it to its last known state. In this case the dataprovider need only implement get_all(), which returns the LUIDs of all data in the dataprovider. The result is proxied by the DeltaProvider object, which sequentially get()s each piece of data and sorts it into added, modified or deleted lists, depending on each data's Rid.

The uncommon case allows the dataprovider to return the LUIDs of data which has been added, modified or deleted without having to get each piece of data first. This case is currently unused within Conduit; however, it would be the most efficient way to implement synchronization with a remote webservice, if that webservice were able to record all data modifications since the last completed sync.
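
In the common case, the dataprovider therefore only implements get_all(). A sketch, assuming a hypothetical backing store:

    def get_all(self):
        DataProvider.TwoWay.get_all(self)
        #return the LUIDs of everything we hold; the DeltaProvider will
        #get() each one and work out what was added, modified or deleted
        return [str(uid) for uid in self._db.list_all_uids()]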

The following pseudo code represents this process:

    def _get_changes(self, source, sink):
        try:
            added, modified, deleted = source.module.get_changes()
        except NotImplementedError:
            delta = DeltaProvider.DeltaProvider(source, sink)
            added, modified, deleted = delta.get_changes()
        return added, modified, deleted

One-way Synchronization

In this situation the DataSource on the left is referred to as the source and the DataSink on the right is referred to as the sink, although in actuality either or both of them could be a DataProviderTwoWay.

The following pseudo code represents the steps in one-way sync:

    def one_way_sync(self, source, sink):
        #get all the changed data
        added, modified, deleted = self._get_changes(source, sink)

        #handle deleted data (matchingUID is the corresponding LUID in the
        #sink; its lookup in the mapping database is omitted here)
        for d in deleted:
            self._apply_deleted_policy(source, d, sink, matchingUID)

        #one way sync treats added and modified the same; both get transferred
        items = added + modified
        for i in items:
            #transfer the data
            data = self._get_data(source, sink, i)
            data = self._convert_data(source, sink, data)
            self._put_data(source, sink, data)

Two-way Synchronization

In two-way sync, naming one dataprovider the source and the other the sink is just for convention's sake. There is no difference as far as the logic is concerned.

Two-way sync logic is generally similar to one-way sync; however, we must explicitly check for the following special cases, all of which are associated with detecting unexpected situations (conflicts):

  1. The same data has been modified in both the source and the sink --> Conflict

  2. The same data has been modified in one dataprovider and deleted in the other --> Conflict

  3. The same data has been deleted in both the source and the sink --> OK

The following pseudo code represents the basic steps in two-way sync:

    def two_way_sync(self, source, sink):
        def list_in_list(list1, list2):
            #return the items common to both lists
            found = []
            for i in list1:
                if i in list2:
                    found.append(i)
            return found

        sourceAdded, sourceModified, sourceDeleted = self._get_changes(source, sink)
        sinkAdded, sinkModified, sinkDeleted = self._get_changes(sink, source)

        toput = []
        todelete = []

        #added data can be put right away
        toput += [(source, i, sink) for i in sourceAdded]
        toput += [(sink, i, source) for i in sinkAdded]

        #as can deleted data
        todelete += [(source, i, sink) for i in sourceDeleted]
        todelete += [(sink, i, source) for i in sinkDeleted]

        #now check the special cases
        bothModified = list_in_list(sourceModified, sinkModified)
        bothDeleted = list_in_list(sourceDeleted, sinkDeleted)
        modifiedAndDeleted = list_in_list(sourceModified, sinkDeleted)

        #PHASE TWO: TRANSFER DATA
        #deleted data (data deleted in both places needs no action;
        #matchingUID and dataRid stand for values looked up in the
        #mapping database, which this pseudo code omits)
        todelete = [t for t in todelete if t[1] not in bothDeleted]
        for sourcedp, dataUID, sinkdp in todelete:
            self._apply_deleted_policy(sourcedp, dataUID, sinkdp, matchingUID)

        #added data
        for sourcedp, dataUID, sinkdp in toput:
            data = self._get_data(sourcedp, sinkdp, dataUID)
            data = self._convert_data(sourcedp, sinkdp, data)
            self._put_data(sourcedp, sinkdp, data, dataRid)

        #modified, or special case, data must be compared before transfer
        tocompare = [(source, i, sink, matchingUID) for i in bothModified + modifiedAndDeleted]
        for dp1, data1UID, dp2, data2UID in tocompare:
            data1 = self._get_data(dp1, dp2, data1UID)
            data2 = self._get_data(dp2, dp1, data2UID)
            data1 = self._convert_data(dp1, dp2, data1)
            comparison = data1.compare(data2)
            self._apply_conflict_policy(dp2, dp1, comparison, data2, data1)

Conflicts

Some conflicts are easy to detect, such as when the same data has been modified in two corresponding dataproviders. Other cases are harder to detect and require direct intervention from the dataprovider. Conduit accommodates both cases.

For example, consider a fresh Conduit installation with no knowledge of previous invocations. If the user attempts a file synchronization to a location where a file with the same name already exists, this is something that Conduit itself cannot know about. The dataprovider, however, has full knowledge of the location where the file will be put, and before overwriting it should check that the existing file is not newer than the file being put. If the incoming file is not equal or newer, then the datasink should raise a SynchronizeConflictError.

In both cases, upon detection of a conflict, the SyncManager first applies the user's policy (i.e. skip on conflict, ask the user, etc.). If the policy says to ask the user then the conflict is passed further on to the ConflictResolution system. If the user is running Conduit from the GUI then the conflict resolution widget is shown. If the appropriate datatype has implemented get_snippet() then the widget shows a small text snippet to the user.

If the datatype had previously specified an open_URI then the user is able to inspect the data at this URI using the appropriate desktop means (e.g. gnome-open on GNOME).

Once the user has decided which of the two pieces of data is newer, the conflict is resolved by calling the appropriate put() function with overwrite set to True.

It should also be noted that, depending on the user's preference, deletions can be treated as conflicts. For example, if the user accidentally deletes data from a folder and then attempts to sync it, Conduit will notice the deletion and not go ahead and delete the corresponding copy.

With reference to the one-way and two-way synchronization code above, this logic is implemented in the _apply_conflict_policy() and _apply_deleted_policy() functions.

Using Conduit from Your Application

Conduit provides a DBus interface that other applications may use to synchronize their data. For an external application to create new synchronizations using Conduit, one of two things is required:

  • Conduit is aware of the external application and provides a dataprovider to suit, OR

  • The external application uses Conduit's built-in datatypes in such a way that the datatype can be reconstructed from a UID.

This is best considered with the following examples:

  • Application FOO wants to export / sync a file with the Amazon S3 storage service. Because a file can be globally identified by a UID (a URI, in the file case), at least within the logged-in user's session, all the external application needs to do is set up a File <--> Amazon S3 conduit and charge the File dataprovider with the list of URIs to be exported (a sketch of this flow over DBus follows after this list).

  • Application BAR wants to synchronize its notes with Tomboy. Because it stores its notes in an internal database, it must implement its own dataprovider (BarDataSource) and datatype (BarNoteDataType) which are aware of the meaning of BAR's UID scheme. This is possible because Application BAR and BarDataSource share the same understanding of what a BAR UID means. For example, BarDataSource may use a DB identifier to represent a note. Consider the following sequence of events:

    1. Application BAR wants to synchronize BarDataSource with Tomboy.

    2. BAR passes a list of UIDs to Conduit over DBus.

    3. Conduit then populates BarDataSource with these UIDs. Presumably BarDataSource creates a BarNoteDataType for each of these UIDs and stores them internally.

    4. BarDataSource then returns these to Conduit via get().

    5. The power of the synchronization framework is that if BAR also wants to export notes to an iPod or Backpackit.com, then all BarDataSource needs to do is provide a means to convert from BarNoteDataType to the standard Note datatype.
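
To make the FOO example concrete, the flow over DBus might look roughly like the sketch below. Every Conduit-side service, path, and method name here is a hypothetical placeholder, not the real Conduit DBus API; only the dbus-python calls are real.

import dbus

bus = dbus.SessionBus()

#NOTE: "org.conduit.Application", "/" and the method names are placeholders
app = bus.get_object("org.conduit.Application", "/")

#ask for a File datasource and an S3 datasink (assume object paths come back)
source_path = app.GetDataProvider("FileSource")
sink_path   = app.GetDataProvider("S3Sink")

#charge the File dataprovider with the URI to be exported
source = bus.get_object("org.conduit.Application", source_path)
source.AddData("file:///home/john/photo.jpg")

#build a conduit from the pair and synchronize it
conduit_path = app.BuildConduit(source_path, sink_path)
conduit = bus.get_object("org.conduit.Application", conduit_path)
conduit.Sync()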
