1. Code Overview
This page briefly describes the rationale and architecture of some of the core Conduit elements. For information on writing your own DataProvider, see 'Writing a Data Provider'.
1.1. DataProviders
This section does not describe the role of all the DataProvider methods. For that information, please see 'Writing a Data Provider'.
1.1.1. MODULES dictionary
The MODULES dictionary stores the following information
MODULES = { "NameOfModuleClass" : { "type": "dataprovider" } }
Notes;
- Outer key is the name of the class
Valid inner dict properties include;
key | Value
type | One of dataprovider, converter, dataprovider-factory
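As a rough illustration, a module file might declare its classes as follows. The class names ExampleSource and ExampleConverter are hypothetical, and the classes_of_type helper is only an assumption about how a loader could read the dictionary; it is not Conduit's actual loader.

```python
# Hypothetical module file; only the MODULES layout comes from the text above.
class ExampleSource:
    """Stand-in for a real dataprovider class."""

class ExampleConverter:
    """Stand-in for a real converter class."""

MODULES = {
    "ExampleSource": {"type": "dataprovider"},
    "ExampleConverter": {"type": "converter"},
}

VALID_TYPES = {"dataprovider", "converter", "dataprovider-factory"}

def classes_of_type(modules, namespace, wanted):
    """Return the classes in `namespace` whose MODULES entry has type `wanted`."""
    found = []
    for class_name, info in modules.items():
        if info["type"] not in VALID_TYPES:
            raise ValueError("unknown module type: %r" % info["type"])
        if info["type"] == wanted:
            found.append(namespace[class_name])
    return found

# The outer key is the class name, so it can be used to look the class up.
namespace = {cls.__name__: cls for cls in (ExampleSource, ExampleConverter)}
providers = classes_of_type(MODULES, namespace, "dataprovider")
```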
1.1.2. Class properties
Most information about a dataprovider and its capabilities is communicated through class properties. This allows a dataprovider's properties to be inspected without instantiating the class, which reduces memory usage and startup time.
Valid class properties include;
Property | Value
_name_ | The localised display name
_description_ | The localised display description
_category_ | A DataProviderCategory or one of the built-in categories
_module_type_ | One of source, sink, twoway
_in_type_ | Datatype the dataprovider accepts (N/A for datasources)
_out_type_ | Datatype the dataprovider emits (N/A for datasinks)
_icon_ | The icon name (not filename)
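The sketch below shows why class properties are convenient: they can be read from the class object itself. ExampleTwoWay and the describe helper are hypothetical, and the property values are placeholders rather than values used by any real dataprovider.

```python
class ExampleTwoWay:
    """Hypothetical dataprovider; the values are illustrative only."""
    _name_ = "Example"
    _description_ = "Synchronize example notes"
    _category_ = "Notes"  # assumed placeholder for a category object
    _module_type_ = "twoway"
    _in_type_ = "note"
    _out_type_ = "note"
    _icon_ = "text-x-generic"

    def __init__(self):
        # Expensive setup would live here; reading class properties never runs it.
        raise RuntimeError("instantiation is not needed for inspection")

def describe(klass):
    """Read display information from the class, without instantiating it."""
    return "%s (%s): %s -> %s" % (
        klass._name_, klass._module_type_, klass._in_type_, klass._out_type_)

summary = describe(ExampleTwoWay)
```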
1.1.3. Dynamic Dataprovider Support
If you wish to implement a dynamic dataprovider (a DP that becomes available at runtime), first create a class derived from DataProviderFactory. When a new dataprovider becomes available, call self.emit_added(klass, initargs, category). If the dataprovider later becomes unavailable, call self.emit_removed(klass, initargs, category).
For more information please see
- HALFactory
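A minimal sketch of this pattern is shown below. The DataProviderFactory class here is a stand-in that merely records the calls, whereas the real factory notifies Conduit; PlayerSink, PlayerFactory and the device_plugged / device_unplugged methods are hypothetical names.

```python
class DataProviderFactory:
    """Stand-in base class that only records what was announced."""
    def __init__(self):
        self.available = []

    def emit_added(self, klass, initargs, category):
        self.available.append((klass, initargs, category))

    def emit_removed(self, klass, initargs, category):
        self.available.remove((klass, initargs, category))

class PlayerSink:
    """Hypothetical dataprovider for a removable media player."""
    def __init__(self, mount_point):
        self.mount_point = mount_point

class PlayerFactory(DataProviderFactory):
    """Hypothetical factory reacting to device plug and unplug events."""
    CATEGORY = "Media Players"

    def device_plugged(self, mount_point):
        # A new dataprovider is now available.
        self.emit_added(PlayerSink, (mount_point,), self.CATEGORY)

    def device_unplugged(self, mount_point):
        # The same arguments identify the dataprovider that went away.
        self.emit_removed(PlayerSink, (mount_point,), self.CATEGORY)

factory = PlayerFactory()
factory.device_plugged("/media/player")
after_plug = list(factory.available)
factory.device_unplugged("/media/player")
```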
1.1.4. Dataprovider Categories
A category is just a name and an icon used to group related dataproviders in the treeview. You are free to use the supplied categories or define your own, as is done in the iPod Dataprovider Factory.
1.2. DataType
A datatype is the basic representation of something to be synchronized. Conduit ships with a number of built-in datatypes, including:
File
- A gnomevfs compatible representation of a file. Because it uses gnomevfs, it supports all the backends that gnomevfs supports, including local files, SSH, WebDAV, SMB and HTTP/FTP.
Note
- A basic representation of a note. Contains a title and contents.
Email
- A representation of an email. Uses Python's built-in email class.
Contact
- Represents a contact. Uses vobject internally so that it can deal with the vCard format.
Event
- Represents a calendar event. Uses vobject internally to deal with the iCal format.
Text
- A simple wrapper around a string.
It is not expected that these datatypes will be sufficient for all purposes, and indeed this is not the design goal. Because Conduit's strength is in its dynamic nature and its ability to convert between types, it is often easier to define a more specific type better suited to a dataprovider and then provide those conversion functions that make sense.
1.2.1. Type Inheritance
Conduit also supports, and recommends, that specific types inherit from these generic types where possible. The type name must also reflect this inheritance.
For example, the Video type inherits from the File type. Its type name is file/video.
Type inheritance is good for the following reasons:
(1) It promotes good OO design, as the derived Video type can utilize the functionality provided by the File type.
(2) It provides a sensible place for Conduit to check that a File instance is a Video instance.
Reason (2) requires some more explanation. Let's say the user has a folder of data containing not just videos but also audio, text, etc. They want to put the videos onto their n800. The n800 _in_type_ is file/video. The Folder _out_type_ is file. In order to put only videos on the n800, Conduit calls the file --> file/video conversion which, by checking the file's mimetype, has the opportunity to reject the file because it is not a video.
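The rejection step can be sketched as follows. This is an illustration only: file_to_video is a hypothetical conversion function, it guesses the mimetype from the file name using Python's mimetypes module, and returning None to signal rejection is an assumption rather than Conduit's documented behaviour.

```python
import mimetypes

def file_to_video(path):
    """Hypothetical file --> file/video conversion.

    Returns the path unchanged if it looks like a video, or None to reject it.
    """
    mimetype, _encoding = mimetypes.guess_type(path)
    if mimetype is not None and mimetype.startswith("video/"):
        return path
    return None

# A folder emitting plain files; only the videos survive the conversion.
folder = ["holiday.mp4", "notes.txt", "song.mp3"]
videos = [p for p in folder if file_to_video(p) is not None]
```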
1.2.2. Implementable Functions
Datatypes should implement a few functions, including:
def compare(self, B):
- This should compare the datatype with B and return one of
conduit.datatypes.NEWER This means the data is newer than B
conduit.datatypes.EQUAL This means the data is equal to B
conduit.datatypes.OLDER This means the data is older than B
conduit.datatypes.UNKNOWN This means it could not be determined which is newer or older
def get_hash(self):
- Datatypes may wish to override this function with one which returns a hash of the data they hold.
def get_snippet(self):
- Datatypes may wish to override this function with one which returns a small text representation of the data they hold. This snippet is not used as part of the conversion or synchronization process, but may be displayed to the user to help them compare data in a conflict situation.
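As an illustration, a datatype that compares by modification time could look like the sketch below. ExampleNote is hypothetical, and the four constants are local stand-ins for conduit.datatypes.NEWER, EQUAL, OLDER and UNKNOWN so that the example runs on its own.

```python
import datetime

# Stand-ins for conduit.datatypes.NEWER, EQUAL, OLDER and UNKNOWN.
NEWER, EQUAL, OLDER, UNKNOWN = "newer", "equal", "older", "unknown"

class ExampleNote:
    """Hypothetical datatype that compares instances by modification time."""

    def __init__(self, title, contents, mtime=None):
        self.title = title
        self.contents = contents
        self.mtime = mtime  # a datetime, or None if unknown

    def compare(self, B):
        # Without both modification times there is nothing to compare.
        if self.mtime is None or B.mtime is None:
            return UNKNOWN
        if self.mtime > B.mtime:
            return NEWER
        if self.mtime < B.mtime:
            return OLDER
        return EQUAL

    def get_snippet(self):
        # Short, human-readable summary shown when resolving a conflict.
        return "%s: %s" % (self.title, self.contents[:20])

old = ExampleNote("Shopping", "milk, eggs", datetime.datetime(2007, 1, 1))
new = ExampleNote("Shopping", "milk, eggs, bread", datetime.datetime(2007, 1, 2))
```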
1.2.3. Functions to Call on DataType Instances
The base Datatype class provides the following functions. Usually these functions do not need to be overridden, but they should be called on an instance of a datatype before returning it to the user, as they are used to encode information about where the datatype instance originated.
def set_UID(LUID)
- LUID should represent this instance; it is a locally unique identifier. This can be whatever best makes sense and need not be globally unique; it should only be unique on a per-dataprovider basis. For example, the Gmail dataprovider uses the message ID as the LUID. This is probably not unique across all of Gmail but is certainly unique within a logged-in user's account (and hence within the configured Gmail dataprovider for that user).
def set_mtime(datetime)
- This function attaches a modification time to the instance. Not all dataproviders support the concept of modification time, so it is not required that this function be invoked on datatype instances. Sometimes datatypes may override this function and apply it to the actual data (such as the File datatype, which sets the file mtime on disk to this value).
def set_open_URI(uri)
- If called on a returned instance then, in the case of a conflict, Conduit will attempt to open this URI using the appropriate desktop-wide application (e.g. gnome-open on GNOME). This means that, unlike the UID, the open_URI must make sense at a desktop-wide level.
These functions will typically be called in the DataProvider get() function; see Conduit/WritingADataProvider for an example.
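As a minimal sketch of that pattern, the example below uses stand-in classes so that it runs on its own. The DataType stub only records what is set, and ExampleNoteSource, its in-memory storage and the file:// URI layout are all hypothetical.

```python
import datetime

class DataType:
    """Stand-in for the base Datatype class; it only records what is set."""
    def set_UID(self, LUID):
        self.uid = LUID

    def set_mtime(self, mtime):
        self.mtime = mtime

    def set_open_URI(self, uri):
        self.open_uri = uri

class Note(DataType):
    def __init__(self, title, contents):
        self.title = title
        self.contents = contents

class ExampleNoteSource:
    """Hypothetical dataprovider backed by an in-memory dict."""
    def __init__(self, notes):
        self.notes = notes  # LUID -> (title, contents, mtime)

    def get(self, LUID):
        title, contents, mtime = self.notes[LUID]
        note = Note(title, contents)
        # Record where the instance came from before handing it back.
        note.set_UID(LUID)
        note.set_mtime(mtime)
        note.set_open_URI("file:///tmp/notes/%s.txt" % LUID)
        return note

source = ExampleNoteSource(
    {"42": ("Shopping", "milk, eggs", datetime.datetime(2007, 1, 1))})
note = source.get("42")
```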
1.2.4. Type Conversion
Conduit has a rich and powerful type conversion system. If the output type of one dataprovider differs from the input type of the other, Conduit checks whether one or more conversions exist which can transform the data into the necessary format.
As explained in the Type Inheritance section, conversion will be made from the parent to the child type, but not in the other direction.
Conduit will also take the most specific conversion, if available. For example, if you provide a conversion foo/bar->baz/bob then Conduit will use this conversion function instead of converting foo->baz followed by converting baz->baz/bob. The following table provides some additional examples of the sequence of conversions performed when converting between derived and non-derived types.
Output type | Input type | Sequence of conversions
foo | foo | 1) foo->foo
foo | foo/bar | 1) foo->foo/bar
foo/bar | foo | 1) foo->foo
foo | baz | 1) foo->baz
foo | baz/bob | 1) foo->baz 2) baz->baz/bob
foo/bar | baz/bob | 1) foo->baz 2) baz->baz/bob
baz/bob | baz/bob | 1) baz/bob->baz/bob
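One way to reproduce the sequences in the table is sketched below. This is not Conduit's actual implementation: conversion_sequence is a hypothetical helper, and the rule it applies (prefer an exact converter, otherwise go through the base types) is inferred from the table rows alone.

```python
def conversion_sequence(from_type, to_type, converters=()):
    """Return the (from, to) conversion steps for the given types.

    `converters` holds the (from, to) pairs for which a specific conversion
    function has been provided.
    """
    # The most specific conversion wins; identical types map to themselves.
    if (from_type, to_type) in converters or from_type == to_type:
        return [(from_type, to_type)]
    from_base = from_type.split("/")[0]
    to_base = to_type.split("/")[0]
    if from_base == to_base:
        # Same family: convert from the parent type to the requested type.
        return [(from_base, to_type)]
    # Different families: convert between the parents, then down to the child.
    steps = [(from_base, to_base)]
    if to_type != to_base:
        steps.append((to_base, to_type))
    return steps

two_step = conversion_sequence("foo/bar", "baz/bob")
direct = conversion_sequence("foo/bar", "baz/bob",
                             converters={("foo/bar", "baz/bob")})
```

Here two_step is the foo->baz, baz->baz/bob chain from the table, while direct uses the single foo/bar->baz/bob conversion because a specific converter was supplied.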
1.2.4.1. Conversion Arguments
You will notice that the table above includes a call to a conversion function even if the input and output types are the same. This call is actually only made if there are conversion arguments. Conversion arguments are a dictionary that gets passed to the last converter in the chain. This allows things like transcoding videos or resizing photos according to parameters configured in the DataProvider.
Conversion arguments are returned from the dataprovider by implementing the get_input_conversion_args() function.
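The sketch below illustrates the idea with a photo resize. SmallPhotoSink, photo_to_photo and the max_width / max_height keys are hypothetical, and passing the dictionary to the converter as keyword arguments is an assumption about the calling convention.

```python
class SmallPhotoSink:
    """Hypothetical datasink that wants photos no larger than 640x480."""
    def get_input_conversion_args(self):
        return {"max_width": 640, "max_height": 480}

def photo_to_photo(photo, **kwargs):
    """Hypothetical photo->photo converter; `photo` is a (width, height) tuple."""
    width, height = photo
    max_width = kwargs.get("max_width", width)
    max_height = kwargs.get("max_height", height)
    # Shrink to fit within the limits, never enlarge.
    scale = min(1.0, max_width / width, max_height / height)
    return (int(width * scale), int(height * scale))

args = SmallPhotoSink().get_input_conversion_args()
resized = photo_to_photo((1280, 960), **args)
```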
1.3. The Synchronization Process
Synchronization occurs on a per-conduit basis. This means that Conduit attempts to synchronize all items within a conduit. If there are multiple DataSinks then only one-way synchronization is supported. If there is a single DataSource and a single DataSink within the conduit then the user may select either one-way or two-way synchronization.
1.3.1. Getting What's Changed
The synchronization process begins with getting the LUIDs of all data that has been added, modified, or deleted from the dataprovider since the last sync.
This can be accomplished in two ways. The common case is that the dataprovider cannot detect whether data has been added, modified or deleted without comparing it to its last known state. If this is the case, the dataprovider need only implement get_all(), which returns the LUIDs of all data in the dataprovider. This result is proxied by the DeltaProvider object, which sequentially get()s each piece of data and sorts it into added, modified or deleted lists, depending on each data's Rid.
The uncommon case allows the dataprovider to return the LUIDs of data which has been added, modified or deleted without having to get each piece of data first. This case is currently unused within Conduit; however, it would be the most efficient way to implement synchronization with a remote webservice, if that webservice were able to record all data modifications since the last completed sync.
The following pseudo code represents this process
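A minimal Python sketch of the common case is given below, under stated assumptions. ExampleProvider and get_changes are hypothetical, and each item's mtime stands in for the Rid that the DeltaProvider compares against the last known state.

```python
class ExampleProvider:
    """Hypothetical dataprovider backed by a dict of LUID -> (contents, mtime)."""
    def __init__(self, items):
        self.items = items

    def get_all(self):
        return list(self.items)

    def get(self, luid):
        return self.items[luid]

def get_changes(provider, last_known):
    """Sort LUIDs into added, modified and deleted lists.

    `last_known` maps each LUID seen at the previous sync to its mtime.
    """
    added, modified = [], []
    current = provider.get_all()
    for luid in current:
        _contents, mtime = provider.get(luid)
        if luid not in last_known:
            added.append(luid)
        elif last_known[luid] != mtime:
            modified.append(luid)
    # Anything known from the last sync but missing now has been deleted.
    deleted = [luid for luid in last_known if luid not in current]
    return added, modified, deleted

last_known = {"a": 1, "b": 1, "c": 1}
provider = ExampleProvider(
    {"a": ("alpha", 1), "b": ("beta v2", 2), "d": ("delta", 1)})
added, modified, deleted = get_changes(provider, last_known)
```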
1.3.2. One-way Synchronization
In this situation the DataSource on the left is referred to as the source, and the DataSink on the right is referred to as the sink although in actuality either or both of them could be DataProviderTwoWay.
The following pseudo code represents the steps in one way sync
def one_way_sync(self, source, sink):
    # get all the data that changed since the last sync
    added, modified, deleted = self._get_changes(source, sink)

    # handle deleted data
    for d in deleted:
        self._apply_deleted_policy(source, d, sink, matchingUID)

    # one-way sync treats added and modified the same; both get transferred
    items = added + modified
    for i in items:
        # transfer the data
        data = self._get_data(source, sink, i)
        data = self._convert_data(source, sink, data)
        self._put_data(source, sink, data)
1.3.3. Two-way Synchronization
In two-way sync the naming of one dataprovider as the source and the other as the sink is just for convention's sake. There is no difference as far as the logic is concerned.
Two-way sync logic is generally similar to one-way sync; however, we must explicitly check for the following special cases, which are all associated with unexpected situations (conflicts):
- The same data has been modified in both the source and the sink --> Conflict
- The same data has been modified in one and deleted in the other --> Conflict
- The same data has been deleted in both the source and the sink --> OK
The following pseudo code represents the basic steps in two way sync
def two_way_sync(self, source, sink):
    def list_in_list(list1, list2):
        # return the items of list1 that also appear in list2
        found = []
        for i in list1:
            if i in list2:
                found.append(i)
        return found

    sourceAdded, sourceModified, sourceDeleted = self._get_changes(source, sink)
    sinkAdded, sinkModified, sinkDeleted = self._get_changes(sink, source)

    toput = []
    todelete = []

    # added data can be put right away
    toput += [(source, i, sink) for i in sourceAdded]
    toput += [(sink, i, source) for i in sinkAdded]

    # as can deleted data
    todelete += [(source, i, sink) for i in sourceDeleted]
    todelete += [(sink, i, source) for i in sinkDeleted]

    # now check the special cases
    bothModified = list_in_list(sourceModified, sinkModified)
    bothDeleted = list_in_list(sourceDeleted, sinkDeleted)
    modifiedAndDeleted = list_in_list(sourceModified, sinkDeleted)

    # PHASE TWO: TRANSFER DATA
    # deleted data; data deleted on both sides needs no further action
    todelete = [t for t in todelete if t[1] not in bothDeleted]
    for sourcedp, dataUID, sinkdp in todelete:
        self._apply_deleted_policy(sourcedp, dataUID, sinkdp, matchingUID)

    # added data
    for sourcedp, dataUID, sinkdp in toput:
        data = self._get_data(sourcedp, sinkdp, dataUID)
        data = self._convert_data(sourcedp, sinkdp, data)
        self._put_data(sourcedp, sinkdp, data, dataRid)

    # modified, or special case data
    tocompare = bothModified + modifiedAndDeleted
    for dp1, data1UID, dp2, data2UID in tocompare:
        data1 = self._get_data(dp1, dp2, data1UID)
        data2 = self._get_data(dp2, dp1, data2UID)
        data1 = self._convert_data(dp1, dp2, data1)
        comparison = data1.compare(data2)
        self._apply_conflict_policy(dp2, dp1, comparison, data2, data1)
1.3.4. Conflicts
Some conflicts are easy to detect, such as when the same data has been modified in two corresponding dataproviders. Other cases are harder to detect and require direct intervention from the dataprovider. Conduit accommodates both cases.
For example, consider a fresh Conduit installation with no knowledge of previous invocations. If the user attempts a file synchronization to a location where a file with the same name already exists, this is something that Conduit itself cannot know. The dataprovider, however, has full knowledge of the location where the file will be put, and before overwriting it should check that the existing file is not newer than the file being put. If the file being put is neither equal to nor newer than the existing file, the datasink should raise a SynchronizeConflictError.
In both cases, upon detection of a conflict, the SyncManager first applies user policy (i.e. skip on conflict, ask user, etc). If user policy says to ask the user then the conflict is passed on to the ConflictResolution system. If the user is running Conduit from the GUI then the conflict resolution widget is shown. If the appropriate datatype has implemented get_snippet() then the widget shows a small text snippet to the user.
If the datatype had previously specified an open_URI then the user is able to inspect the data at this uri using the appropriate desktop means (e.g. gnome-open on GNOME).
Once the user has resolved the conflict by deciding which of the two pieces of data is newer, the conflict is resolved by calling the appropriate put() function with overwrite set to True.
It should also be noted that, depending on the user's preference, deletions can be treated as conflicts. For example, if the user accidentally deletes data from a folder and then attempts to sync it, Conduit will notice the deletion and will not go ahead and delete the corresponding copy.
With reference to the one-way and two-way synchronization code above, this logic is implemented in the _apply_conflict_policy() and _apply_deleted_policy() functions.
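The interplay between a datasink that detects a conflict and the policy applied afterwards can be sketched as follows. Everything here is a simplified stand-in: the SynchronizeConflictError class is defined locally, FolderSink stores only file names and integer mtimes, and the "skip" and "replace" policy names are hypothetical.

```python
class SynchronizeConflictError(Exception):
    """Local stand-in for the exception a datasink raises on conflict."""

class FolderSink:
    """Hypothetical datasink storing file name -> mtime."""
    def __init__(self):
        self.files = {}

    def put(self, name, mtime, overwrite=False):
        existing = self.files.get(name)
        if existing is not None and not overwrite:
            # The incoming file is neither equal to nor newer than the
            # existing one, so refuse to overwrite it silently.
            if mtime < existing:
                raise SynchronizeConflictError(name)
        self.files[name] = mtime

def put_with_policy(sink, name, mtime, policy):
    """Try a normal put; on conflict apply the (hypothetical) user policy."""
    try:
        sink.put(name, mtime)
        return "put"
    except SynchronizeConflictError:
        if policy == "replace":
            # Resolve the conflict by putting again with overwrite set.
            sink.put(name, mtime, overwrite=True)
            return "replaced"
        return "skipped"

sink = FolderSink()
sink.put("report.txt", 10)
skipped = put_with_policy(sink, "report.txt", 5, "skip")
mtime_after_skip = sink.files["report.txt"]
replaced = put_with_policy(sink, "report.txt", 5, "replace")
```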
1.4. Using Conduit from Your Application
Conduit provides a DBus interface that other applications may use to synchronize their data. For an external application to create new synchronizations using Conduit, one of two things is required:
- Conduit is aware of the external application and provides a dataprovider to suit, OR
- The external application uses Conduit's built-in datatypes in such a way that the datatype can be reconstructed using a UID.
This is best considered with an example
Application FOO wants to export / sync a file with the Amazon S3 storage service. Because a file can be globally (at least within the logged-in user's session) identified by a UID (= URI in the file case), all the external application needs to do is set up a File <--> Amazon S3 conduit and supply the File dataprovider with the list of URIs to be exported.
Application BAR wants to synchronize its notes with Tomboy. Because it stores its notes in an internal database, it must implement its own dataprovider (BarDataSource) and datatype (BarNoteDataType) that are aware of the meaning of BAR's UID scheme. This is possible because application BAR and BarDataSource have the same understanding of what a Bar UID means. For example, BarDataSource may use a DB identifier to represent a note. Consider the following sequence of events:
- Application BAR wants to synchronize BarDataSource with Tomboy.
- BAR passes a list of UIDs to Conduit over DBus.
- Conduit then populates BarDataSource with these UIDs. Presumably BarDataSource creates a BarNoteDataType for each of these UIDs and stores them internally.
- BarDataSource then returns these to Conduit via get().
The power of the synchronization framework is that if BAR also wants to be able to export notes to an iPod or Backpackit.com, then all BarDataSource needs to do is provide a means to convert from BarNoteDataType to the standard Note datatype.
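The BAR example might be sketched as follows, leaving the DBus transport out entirely. All classes are simplified stand-ins: the Note class here is not Conduit's, set_UIDs is a hypothetical name for the step where Conduit populates the dataprovider, and the dict plays the role of BAR's internal database.

```python
class Note:
    """Stand-in for the standard Note datatype (a title and contents)."""
    def __init__(self, title, contents):
        self.title = title
        self.contents = contents

class BarNoteDataType:
    """Hypothetical datatype identified by a BAR database id."""
    def __init__(self, db_id, title, body):
        self.db_id = db_id
        self.title = title
        self.body = body

class BarDataSource:
    """Hypothetical dataprovider; its UIDs are BAR database identifiers."""
    def __init__(self, database):
        self.database = database  # db_id -> (title, body)
        self.notes = {}

    def set_UIDs(self, uids):
        # Build one BarNoteDataType per UID and keep it for later get() calls.
        for uid in uids:
            title, body = self.database[uid]
            self.notes[uid] = BarNoteDataType(uid, title, body)

    def get(self, uid):
        return self.notes[uid]

def bar_note_to_note(bar_note):
    """Conversion that lets BAR notes reach any sink accepting Note."""
    return Note(bar_note.title, bar_note.body)

source = BarDataSource({7: ("Todo", "write docs"), 9: ("Ideas", "sync more")})
source.set_UIDs([7])
converted = bar_note_to_note(source.get(7))
```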