WORK IN PROGRESS

1. Application developers manual

This manual is intended for application developers who want to take advantage of Tracker or, more generally, of a generic storage on the desktop. The introduction explains what a "generic storage" means and what its advantages are. The solution is based on ontologies, which are the topic of sections 2 and 3. All of this content is largely storage-independent and should be valid for any back-end.

The Tracker-specific features, and how they affect the architecture and design of client applications, are covered in sections 4 and 5. The sixth and last section covers the practicalities of installing and working with the latest Tracker.

1.1. Authors

Ivan Frade <ivan.frade@nokia.com>

Feel free to correct/extend/improve the content of this page.

1.2. Introduction

On the current desktop, information is organized on a per-application basis. A basic set of standards (e.g. the freedesktop directory conventions) guarantees a certain order on the desktop, and a few other specifications (e.g. thumbnails or icons) provide very basic cross-application data, usually to solve desktop integration issues.

In any case, each program saves its data in a (more or less) custom format in a (more or less) custom location. User data remains in application silos. This makes it especially difficult (if not impossible) to combine data between applications, which is what we need for a desktop-wide search application, for mash-up applications, to show recently used files or to provide a desktop-wide tagging solution.

The solution we propose for sharing data between applications consists of a central storage where different sources push data following a well-known "schema" (called an ontology), and different applications ask for that information following the same schema.

There will be applications that set and retrieve their data only to/from this storage (e.g. a notes application). Other applications (e.g. an RSS reader) would just ask for information, because other independent providers (e.g. an RSS daemon) push the data when it is needed (taking care of synchronizations and updates).

1.3. Ontologies

1.3.1. What is an Ontology: Concepts, properties and instances

An ontology is a model of a domain: an abstraction that represents some aspect of the world. The model is built from "Concepts" (a.k.a. Classes or Categories) and "Properties". A concept is an entity (real or abstract), and a property links two concepts, or a concept with a basic type (such as string or integer). Every property must have a domain and a range: the domain says which concepts can have the property, and the range says which concepts or types the property can take as its value.

Once the ontology is defined, it is populated with instances (concrete data). For readers with some database background, the ontology will sound like a DB schema and the instances like the data. Note that an ontology is a declarative description: it does not enforce the type of the objects, and an instance is not required to define a value for all of its properties. Although it is possible to set some restrictions in the ontology (like "an instance of A can never be an instance of B at the same time"), that is out of the scope of this tutorial.

Let's see an easy example: a model for RSS feeds. We can identify two concepts here: the Feed Channel (the URI we usually subscribe to) and the Feed Message (the entries in that RSS). We also need a property to link them: has entry (domain FeedChannel, range FeedMessage). Informally (no formal notation), the result is a graph with two nodes, FeedChannel and FeedMessage, joined by an arc labeled hasEntry.

We can also add a couple of properties with basic types as range, such as title and creationDate for the entries.

Finally, we also add an instance. It is a blog post titled "GuPnp 0.12.8 released", posted at the URL http://zee-nix.blogspot.com/2009/06/gupnp-0128-released.html, that we got from the channel "planet maemo", identified by the URI http://maemo.org/news/planet-maemo.
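As a rough illustration, the model and the instance described above can be written down as plain Python data. This is only a sketch: the Concept, Property and Instance classes and the add_value and fits helpers are hypothetical names invented for this example, and they are not part of Tracker or of any RDF library.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Concept:
    name: str


@dataclass(frozen=True)
class Property:
    name: str
    domain: Concept   # concept whose instances may carry this property
    range: object     # a Concept, or a basic type such as str


@dataclass
class Instance:
    uri: str
    concept: Concept
    values: dict = field(default_factory=dict)  # property name -> list of values


def add_value(subject, prop, value):
    """Attach a value; values are optional and nothing is enforced."""
    subject.values.setdefault(prop.name, []).append(value)


def fits(subject, prop, value):
    """Report (without enforcing) whether a statement matches domain and range."""
    if subject.concept != prop.domain:
        return False
    if isinstance(prop.range, Concept):
        return isinstance(value, Instance) and value.concept == prop.range
    return isinstance(value, prop.range)


# The two concepts and three properties of the RSS example.
FeedChannel = Concept("FeedChannel")
FeedMessage = Concept("FeedMessage")
hasEntry = Property("hasEntry", FeedChannel, FeedMessage)
title = Property("title", FeedMessage, str)
creationDate = Property("creationDate", FeedMessage, str)

# The instances: the channel and one blog post received from it.
planet = Instance("http://maemo.org/news/planet-maemo", FeedChannel)
post = Instance("http://zee-nix.blogspot.com/2009/06/gupnp-0128-released.html", FeedMessage)

add_value(post, title, "GuPnp 0.12.8 released")
add_value(planet, hasEntry, post)
```

The post deliberately has no creationDate value here, which mirrors the point that an instance does not need a value for every property.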

1.3.1.1. Inheritance

In the ontology we can define inheritance relations between concepts and between properties, but with different restrictions. In general, a subconcept/subproperty is a more refined description than its superconcept/superproperty.

Concepts: We can define a concept as a "subconcept" of one or more other concepts, and it will have all its own properties plus the properties inherited from the superconcepts. The subconcept is usually something more concrete. E.g. Audio and Video are two subconcepts of the concept Media.

We can extend our example to offer an abstraction of IM messaging and feeds. We add a superconcept CommunicationChannel, with two subconcepts, FeedChannel (an RSS channel or podcast) and IMChannel (a window in your favourite IM program). We add a similar structure on the Message side: a generic Message concept, with two more specific concepts, FeedMessage (an entry in a blog or podcast) and IMMessage (one line sent in an instant messaging program). The resulting ontology has two parallel hierarchies: CommunicationChannel with FeedChannel and IMChannel, and Message with FeedMessage and IMMessage.

In practice, superconcepts are useful to group similar things in queries. E.g. we can ask for all instances of the concept FeedMessage (and we get all feed messages), or we can ask for all instances of Message. In this second case, we get all instances of Message and its subconcepts: IM messages and feed messages (and emails, because Email is another subclass of Message, although it is not included in the example). So we could show all those instances in the same view, sorted by a common criterion.
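The grouping effect of superconcepts can be sketched in a few lines of Python. The tables and function names below (SUPERCONCEPTS, INSTANCES, is_a, instances_of) and the instance identifiers are made up for illustration; a real store would answer this kind of question through its query language.

```python
# Hypothetical table: concept -> direct superconcepts (one or more are allowed).
SUPERCONCEPTS = {
    "Message": [],
    "FeedMessage": ["Message"],
    "IMMessage": ["Message"],
    "CommunicationChannel": [],
    "FeedChannel": ["CommunicationChannel"],
    "IMChannel": ["CommunicationChannel"],
}

# Hypothetical instances: identifier -> the concept it was declared with.
INSTANCES = {
    "post-1": "FeedMessage",
    "chat-line-1": "IMMessage",
    "planet-maemo": "FeedChannel",
}


def is_a(concept, ancestor):
    """True if concept is ancestor or one of its direct or indirect subconcepts."""
    if concept == ancestor:
        return True
    return any(is_a(parent, ancestor) for parent in SUPERCONCEPTS[concept])


def instances_of(concept):
    """All instances of concept, including instances of its subconcepts."""
    return sorted(name for name, declared in INSTANCES.items() if is_a(declared, concept))


print(instances_of("FeedMessage"))  # only the feed message
print(instances_of("Message"))      # feed and IM messages together
```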

Properties can also have inheritance, but with a few restrictions: there can be only one superproperty, and the subproperty can only refine (restrict) the range or domain of the superproperty. That is, the domain and range of a subproperty must be the same concepts as the domain and range of the superproperty, or subconcepts of them. E.g. if in the previous example we want to add a subproperty of communicationChannel (domain CommunicationChannel, range Message), the subproperty must have CommunicationChannel or any of its subclasses as domain, and Message or any of its subclasses as range.

Here is an example where we define a generic superclass InformationElement and two subclasses, Document and Media. The superclass has a property usageCount. We use inheritance to refine the meaning of that property: we create a subproperty viewCount for Document and a subproperty playCount for Media.

We usually use subproperties to give a property a more precise meaning. Subproperties "override" the superproperties, i.e. setting playCount to 3 for a song means that the usageCount of that song is also 3. In queries, asking for a superproperty also includes its subproperties, i.e. if we query for all things with usageCount greater than 3, we get all instances with usageCount greater than 3, all Media with playCount greater than 3 and all Documents with viewCount greater than 3.
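The query behaviour described above can be imitated with a small Python sketch. The resource names and counts are invented, and SUPERPROPERTY, implies and query_greater_than are hypothetical helpers written only to show the idea; they do not describe how any particular store implements it.

```python
# Each subproperty has exactly one superproperty (None marks the top).
SUPERPROPERTY = {
    "usageCount": None,
    "viewCount": "usageCount",   # refinement for Document
    "playCount": "usageCount",   # refinement for Media
}

# Hypothetical instance data: resource -> {property: value}.
DATA = {
    "song.mp3": {"playCount": 5},
    "report.odt": {"viewCount": 2},
    "photo.jpg": {"usageCount": 4},
}


def implies(prop, target):
    """True if a value stated for prop also counts as a value for target."""
    while prop is not None:
        if prop == target:
            return True
        prop = SUPERPROPERTY[prop]
    return False


def query_greater_than(target, threshold):
    """Resources whose target property, or any subproperty of it, exceeds threshold."""
    return sorted({
        resource
        for resource, values in DATA.items()
        for prop, value in values.items()
        if implies(prop, target) and value > threshold
    })


# Asking for the superproperty also matches values stored under playCount.
print(query_greater_than("usageCount", 3))
# Asking for the subproperty only matches values stored under playCount itself.
print(query_greater_than("playCount", 3))
```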

With all these ingredients we can write an ontology, but before we start adding real data to it, we must talk about URIs and naming conventions.

1.3.2. Serializing an ontology (URIs, triplets, RDF)

Note: This section is a brief and hopefully useful summary of the first section of the RDF Primer document (statements about resources). That document is very well written and full of good examples, so reading that first section (statements, RDF Model, ...) is strongly recommended. The later sections of the RDF Primer use XML as the serialization language, while in Tracker we chose Turtle as the serialization format, so they are not really helpful for our purposes.

First of all, every element in the ontology is identified by a URI. Every concept, property and instance should have a unique name following the URI specification. Usually each ontology has its own namespace (base URL), and the concept name is appended to it. E.g. for our previous examples, we could use the namespace http://www.semanticdesktop.org/ontologies/nmo/#, and then the real name of Message is http://www.semanticdesktop.org/ontologies/nmo/#Message.

Nothing prevents us from having more than one concept with the same name, as long as the namespace is different. Although it is usually a good practice to use real web addresses as namespaces, it is not mandatory (the parsers don't access any external address to validate the ontologies).

Now we come to the subject of triplets. An ontology is a directed graph, and a graph can be decomposed into triplets, with a node as the first component (subject), an arc starting at that node as the second component (predicate) and the destination of that arc (a basic type or another node) as the third component (object). This idea is the core of RDF (Resource Description Framework). We could say that we are describing a Resource (the subject) using properties and values.
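To make the decomposition concrete, here is a minimal Python sketch that flattens a small graph into (subject, predicate, object) tuples. The nested-dictionary layout, the to_triplets helper and the http://a.org/ namespace are assumptions chosen for this illustration only.

```python
def to_triplets(graph):
    """Flatten {subject: {predicate: [objects]}} into (subject, predicate, object) tuples."""
    return [
        (subject, predicate, obj)
        for subject, arcs in graph.items()
        for predicate, objects in arcs.items()
        for obj in objects
    ]


CHANNEL = "http://maemo.org/news/planet-maemo"
POST = "http://zee-nix.blogspot.com/2009/06/gupnp-0128-released.html"

# Nodes are keys; each arc leaving a node is a predicate with its destinations.
graph = {
    CHANNEL: {"http://a.org/hasEntry": [POST]},
    POST: {
        "http://a.org/title": ["GuPnp 0.12.8 released"],
        "http://a.org/creationDate": ["2009-06-03T05:17:00"],
    },
}

for triplet in to_triplets(graph):
    print(triplet)
```

Each printed tuple is one statement about a resource: the first element is the node, the second the arc, and the third either another node or a basic value.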

Here is an example of serialization into triplets, using the instances of a previous example and complete URIs for the properties:

# These are just triplets. This is NOT Turtle or any other defined format.
#
# Subject                                 Property/predicate            Object
# --------------------------------------  ----------------------------  ---------------------------------------
<http://maemo.org/news/planet-maemo/>     <http://a.org/hasEntry>       <http://zee-nix.b.../...-released.html>
<http://zee-nix.b.../...-released.html>   <http://a.org/creationDate>   "2009-06-03T05:17:00"
<http://zee-nix.b.../...-released.html>   <http://a.org/title>          "GuPnp 0.12.8 released"

Using triplets we can also define the classes and properties of the ontology (using a few pre-defined RDF classes such as "Class" and "Property", and pre-defined properties such as "type"). Here is the serialization of our first example (using "http://a.org/" as the namespace of the concepts).

# FeedChannel is a Concept (Class in RDF)
#   Replacing "http://www.w3.org/2000/01/rdf-schema#" with "rdfs:"
#   and "http://www.w3.org/1999/02/22-rdf-syntax-ns#" with "rdf:"
<http://a.org/FeedChannel>      rdf:type      <http://www.w3.org/2000/01/rdf-schema#Class>

# FeedMessage is a Concept (Class in RDF)
<http://a.org/FeedMessage>      rdf:type      <http://www.w3.org/2000/01/rdf-schema#Class>

# hasEntry is a Property, with FeedChannel as domain, and FeedMessage as range
<http://a.org/hasEntry>         rdf:type      <http://www.w3.org/1999/02/22-rdf-syntax-ns#Property>
<http://a.org/hasEntry>         rdfs:domain   <http://a.org/FeedChannel>
<http://a.org/hasEntry>         rdfs:range    <http://a.org/FeedMessage>

And we can link instances with classes:

# Planet maemo (url) is a FeedChannel.
#   Replacing "http://www.w3.org/1999/02/22-rdf-syntax-ns#" with "rdf:"
<http://maemo.org/news/planet-maemo/>  rdf:type  <http://a.org/FeedChannel>

1.3.3. Turtle format

The ontology is a model that can be represented as triplets. Now we need a format to write those triplets. The official W3C format for RDF is RDF/XML, but there are alternatives (like TriG or Notation3). In Tracker we chose a format called Turtle because 1) it is easier to read, 2) it is easier and faster to parse (no need to load the whole document in memory or create a DOM tree) and 3) it is appendable.

It is a simple text file, with a syntax close to our previous examples: just one triplet per line, plus some syntactic sugar to make things clearer. The specification at http://www.w3.org/TeamSubmission/turtle/ is short and simple, so we only repeat here the most common things you need to know to understand a Turtle file.

URIs: Write the full URI enclosed in '< >', or define a prefix and use prefix:element without '< >'.

 # NOTE this is not a valid turtle file
 
 # This is a URI
 <file:///home/ivan/a.mp3> 
 
 # This is the same URI, written with a prefix
 # (the prefix must end with '/' so that prefix + name gives the full URI)
 @prefix example: <file:///home/ivan/> .
 example:a.mp3
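Prefix expansion is plain string concatenation, which is why the namespace has to end with its separator ('/' or '#'). The following Python sketch shows the idea; expand and PREFIXES are hypothetical names for this example, and the function ignores details a real Turtle parser has to handle (escaping, relative URIs, literals).

```python
def expand(name, prefixes):
    """Return the full URI for '<uri>' or 'prefix:local' notation."""
    if name.startswith("<") and name.endswith(">"):
        # Full URI: just drop the angle brackets.
        return name[1:-1]
    # Prefixed name: split at the first ':' and glue namespace and local part.
    prefix, _, local = name.partition(":")
    return prefixes[prefix] + local


PREFIXES = {"example": "file:///home/ivan/"}

print(expand("<file:///home/ivan/a.mp3>", PREFIXES))
print(expand("example:a.mp3", PREFIXES))
```

Both calls yield the same URI, so the two notations identify the same resource.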

One triplet per line, ending the line with a '.' (dot).

 <file:///home/ivan/a.mp3> a nmm:MusicPiece .
 <file:///home/ivan/a.mp3> nmm:artistName "Test song 1" .
 <file:///home/ivan/a.mp3> nmm:length 120 .

Several triplets with the same subject can be abbreviated: a ';' at the end of the line means that the following line shares the same subject.

 # This is the same information as in the previous example
 <file:///home/ivan/a.mp3> a nmm:MusicPiece ;
                           nmm:artistName "Test song 1" ;
                           nmm:length 120 .

Several triplets with the same subject and the same predicate can also be abbreviated: a ',' after an object means that the next object shares the same subject and predicate.

 # Same as the previous example, plus a second class for the same subject and predicate
 <file:///home/ivan/a.mp3> a nmm:MusicPiece, nfo:FileDataObject ;
                           nmm:artistName "Test song 1" ;
                           nmm:length 120 .
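The two abbreviations can be produced mechanically by grouping triplets first by subject and then by predicate. The sketch below does that in Python; to_turtle is a hypothetical helper written for this page, and it assumes every term is already written in Turtle syntax, so it performs no escaping or validation.

```python
def to_turtle(triplets):
    """Group (subject, predicate, object) tuples using ';' and ',' abbreviations.

    Terms are assumed to be already written in Turtle syntax (e.g. '<uri>',
    'nmm:length', '"text"'), so they are copied verbatim.
    """
    grouped = {}
    for subject, predicate, obj in triplets:
        grouped.setdefault(subject, {}).setdefault(predicate, []).append(obj)

    statements = []
    for subject, predicates in grouped.items():
        # ',' joins objects that share both subject and predicate.
        parts = [f"{predicate} {', '.join(objects)}" for predicate, objects in predicates.items()]
        # ';' ends a line whose successor shares the same subject.
        separator = " ;\n" + " " * (len(subject) + 1)
        statements.append(f"{subject} {separator.join(parts)} .")
    return "\n".join(statements)


SONG = "<file:///home/ivan/a.mp3>"
triplets = [
    (SONG, "a", "nmm:MusicPiece"),
    (SONG, "a", "nfo:FileDataObject"),
    (SONG, "nmm:artistName", '"Test song 1"'),
    (SONG, "nmm:length", "120"),
]

print(to_turtle(triplets))
```

Running it on the four triplets of the example prints one statement of three lines, using ',' for the two classes and ';' between the predicates.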

There is one special predicate, "a", which indicates that a resource is an instance of a class (it is shorthand for rdf:type).

 # this URI (an mp3 file in the local file system) is an instance of nmm:MusicPiece
 <file:///home/ivan/a.mp3> a nmm:MusicPiece .

And that's it! Check the tracker ontology files, written in turtle, as an example of real data in this format.

1.4. Related presentations

Why (and how) tracker 0.7: Tracker's presentation in Gran Canaria Desktop Summit.