Zeitgeist 0.2 DB Design
This page contains the documentation for the database used in the Zeitgeist engine version 0.2.
Design Principles
- Stick close to the data models of Xesam and Nepomuk to enable close integration with them in the future (ease development of a Tracker backend etc.)
- Be extensible. Developers should be able to be free and innovative in their use of the engine. The underlying DB should be able to support this, not lock developers into some specific mindset. This point of course relies heavily on the actual API that is exposed.
- Support everything found in the recent DB draft, and a lot more!
Relation Tables
uri
We store the uri/id map in a separate table (i.e. not in the item table) because we might not always have data associated with a URI. Applications should feel free to refer to purely virtual objects (or stuff that has been deleted).
value VARCHAR
id INT
source
Map of source categories. For a description of what a source is see the explanation below.
value VARCHAR
id INT
content
Map of content categories. For a description of what a content is, see the explanation below.
value VARCHAR
id INT
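The three relation tables share the same value/id shape, so they can be sketched together. The following is a minimal sketch assuming SQLite through Python's sqlite3 module. The helper names `create_relation_tables` and `lookup_id`, the UNIQUE constraint, and the auto-assigned integer keys are assumptions made for illustration and are not part of the design above.

```python
import sqlite3

# Table names come from the design above; the constraints are assumptions.
RELATION_TABLES = ("uri", "source", "content")


def create_relation_tables(conn):
    """Create the uri, source and content value/id maps."""
    for table in RELATION_TABLES:
        conn.execute(
            f"CREATE TABLE IF NOT EXISTS {table} ("
            "id INTEGER PRIMARY KEY, value VARCHAR UNIQUE NOT NULL)"
        )


def lookup_id(conn, table, value):
    """Return the id mapped to value, inserting a new row if it is unknown."""
    if table not in RELATION_TABLES:
        raise ValueError(f"unknown relation table: {table}")
    conn.execute(f"INSERT OR IGNORE INTO {table} (value) VALUES (?)", (value,))
    row = conn.execute(
        f"SELECT id FROM {table} WHERE value = ?", (value,)
    ).fetchone()
    return row[0]


conn = sqlite3.connect(":memory:")
create_relation_tables(conn)

# Asking for the same URI twice yields the same id; a new URI gets a new id.
first = lookup_id(conn, "uri", "file:///home/rainct/readme.txt")
again = lookup_id(conn, "uri", "file:///home/rainct/readme.txt")
other = lookup_id(conn, "uri", "zeitgeist://tags/images")
```

Because a URI can be registered without any row in the item table, this shape also covers references to purely virtual or deleted objects.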
Item Tables
item
Base class for everything in the data model
id INT # uri.id
content_id INT # content.id
source_id INT # source.id
origin VARCHAR # URL the item can be said to originate from
text VARCHAR # the title or the name of the item (first thing the user sees)
mimetype VARCHAR
icon VARCHAR
payload BLOB # free-for-use array of raw bytes
item.id Integer id mapping to the item's URI in uri.id
item.content_id Integer id mapping to the id of the content type in content.id. The content type is a formal namespaced URI designating the conceptual content of the item, as perceived by the user. If the abstract interpretation of an item is "this is a document", then the content type would be http://freedesktop.org/standards/xesam/1.0/core#Document
item.source_id Integer id mapping to the id of the source type in source.id. The source type is a formal namespaced URI designating the abstract origin or location of the item, as perceived by the user. This could be "this is from my web history" in which case the source type would be http://gnome.org/zeitgeist/schema/1.0/core#WebHistory
item.origin A URL pointing to the actual place the item came from. If the item is from my web history, when I watched http://www.youtube.com/watch?v=_ZSbC09qgLI, then the origin could be http://www.youtube.com. It is up to the individual data providers to apply sensible origins to their items.
item.text Free form text field with usage depending on the content type of the item. Tags will use item.text for their labels and Notes/Comments will use item.text for their body text.
item.mimetype The mimetype of the item
item.icon Zeitgeist can guess a relevant icon to use for an item from the metadata but sometimes it is useful to override this behaviour. This column is used for overriding the default icon chosen by Zeitgeist. It may contain either a full URL to an icon file or simply a string designating an icon name from the theme
item.payload Free form binary content that can be controlled by applications. Zeitgeist will never tamper with data in this column, however, one should be warned that other apps might do so.
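The item.icon override described above can be illustrated with a small helper. This is only a sketch: the function name `classify_icon`, its return values, and the rule "anything with a URL scheme is a URL, anything else is a theme icon name" are assumptions, not behaviour taken from Zeitgeist.

```python
from urllib.parse import urlparse


def classify_icon(icon, guessed_icon_name):
    """Decide which icon to use for an item.

    icon is the value of item.icon (may be None or empty);
    guessed_icon_name stands in for whatever the engine would guess
    from the item's metadata.
    """
    if not icon:
        # No override stored: fall back to the guessed icon.
        return ("guessed", guessed_icon_name)
    if urlparse(icon).scheme:
        # A scheme such as file:// or http:// marks a full URL to an icon file.
        return ("url", icon)
    # Otherwise treat the string as an icon name from the theme.
    return ("theme", icon)
```

For example, `classify_icon("image-x-generic", "text-x-generic")` is treated as a theme name, while an empty override falls back to the guessed icon.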
annotation
An annotation is a subtype of item. In an object oriented mindset, think that Annotation extends Item. This way you can annotate your annotations and add new annotation types at whim.
id INT # item.id
subject_id INT # uri.id
event
An event is a subtype of item and inherits its content and source types from the item table. In an object oriented mindset, think that Event extends Item.
id INT # item.id - the id of the event itself
subject_id INT # uri.id - the subject of the event, e.g. the file being changed
start INT # timestamp
end INT # timestamp
app INT # app.id
app
An application is a subtype of item
id INT # item.id
info VARCHAR # URI of the .desktop file
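The "extends Item" relationship can be sketched as tables that share their id with a row in item. The sketch below assumes SQLite through Python's sqlite3 module. The column names follow the tables above, while the concrete types, the primary keys, and the quoting of the start and end columns are assumptions for illustration. The annotation table deliberately has no unique key on id, since one annotation may point at several subjects.

```python
import sqlite3

SCHEMA = """
CREATE TABLE item (
    id INTEGER PRIMARY KEY,   -- uri.id
    content_id INTEGER,       -- content.id
    source_id INTEGER,        -- source.id
    origin VARCHAR,
    text VARCHAR,
    mimetype VARCHAR,
    icon VARCHAR,
    payload BLOB
);
CREATE TABLE annotation (
    id INTEGER,               -- item.id
    subject_id INTEGER        -- uri.id
);
CREATE TABLE event (
    id INTEGER PRIMARY KEY,   -- item.id
    subject_id INTEGER,       -- uri.id
    "start" INTEGER,          -- timestamp
    "end" INTEGER,            -- timestamp, quoted because END is an SQL keyword
    app INTEGER               -- app.id
);
CREATE TABLE app (
    id INTEGER PRIMARY KEY,   -- item.id
    info VARCHAR              -- URI of the .desktop file
);
"""

conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)

# "Event extends Item": the event row reuses the id of its row in item,
# which is where its content and source types live.
conn.execute("INSERT INTO item (id, content_id, source_id) VALUES (5, 4, 1)")
conn.execute(
    'INSERT INTO event (id, subject_id, "start", "end", app) '
    "VALUES (5, 2, 1244984333, NULL, 6)"
)

# Joining on the shared id recovers the inherited types together with
# the event-specific columns.
row = conn.execute(
    "SELECT item.content_id, item.source_id, event.subject_id, event.\"start\" "
    "FROM event JOIN item ON item.id = event.id"
).fetchone()
```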
Content, Source, and Mimetype Explained
The content category of a data object refers to the abstract way a user perceives the item, so this could be "a document", "an image" etc. The source of an item refers to where the item originates: "a file", "online", etc. Lastly, the mimetype represents the physical format of the binary datastream, for example "this is a JPEG 2000 image" or "this is a zip file". You can read more about these concepts at http://xesam.org/main/XesamOntology100
For example, an image from my digital camera would have the following characteristics:
- Content: Image
- Source: File
- Mimetype: image/jpeg
The actual types are specified as namespaced URIs, so that instead of just writing "Tag" we write "http://freedesktop.org/standards/xesam/1.0/core#Tag".
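As a small illustration of the namespaced form, the camera image example could be described as follows. The helper `xesam` and the dictionary layout are hypothetical; only the namespace and the type names come from this page.

```python
# Namespace used on this page for Xesam content and source types.
XESAM = "http://freedesktop.org/standards/xesam/1.0/core#"


def xesam(name):
    """Expand a short type name such as "Image" into its namespaced URI."""
    return XESAM + name


def short_name(type_uri):
    """Return the part after '#', e.g. for display purposes."""
    return type_uri.rsplit("#", 1)[-1]


# The image from the digital camera described above.
camera_image = {
    "content": xesam("Image"),
    "source": xesam("File"),
    "mimetype": "image/jpeg",
}
```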
Events
Event Content- and Source Types
The content type of an event specifies "what happened" and the source type "what triggered it".
Pre-defined Event source types:
http://gnome.org/zeitgeist/schema/1.0/core#UserActivity
http://gnome.org/zeitgeist/schema/1.0/core#UserNotification
Pre-defined Event content types:
http://gnome.org/zeitgeist/schema/1.0/core#CreateEvent
http://gnome.org/zeitgeist/schema/1.0/core#ModifyEvent
http://gnome.org/zeitgeist/schema/1.0/core#VisitEvent
http://gnome.org/zeitgeist/schema/1.0/core#LinkEvent
http://gnome.org/zeitgeist/schema/1.0/core#ReceiveEvent
http://gnome.org/zeitgeist/schema/1.0/core#WarnEvent
http://gnome.org/zeitgeist/schema/1.0/core#ErrorEvent
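The pre-defined types can be collected into constants so that code can tell a well-known event type from an application-specific one. This is a sketch only: the `Event` class and its field names are hypothetical and do not mirror the engine's API. Unknown types are not rejected, in keeping with the extensibility principle.

```python
from dataclasses import dataclass
from typing import Optional

ZG = "http://gnome.org/zeitgeist/schema/1.0/core#"

# "What triggered it"
EVENT_SOURCE_TYPES = frozenset(
    ZG + name for name in ("UserActivity", "UserNotification")
)

# "What happened"
EVENT_CONTENT_TYPES = frozenset(
    ZG + name
    for name in (
        "CreateEvent", "ModifyEvent", "VisitEvent", "LinkEvent",
        "ReceiveEvent", "WarnEvent", "ErrorEvent",
    )
)


@dataclass
class Event:
    subject_uri: str            # the subject of the event
    content: str                # content type URI
    source: str                 # source type URI
    start: int                  # timestamp
    end: Optional[int] = None   # timestamp, may be unknown

    def is_predefined(self):
        """True if both types are among the pre-defined ones listed above."""
        return (
            self.content in EVENT_CONTENT_TYPES
            and self.source in EVENT_SOURCE_TYPES
        )


visit = Event(
    subject_uri="http://www.youtube.com/watch?v=_ZSbC09qgLI",
    content=ZG + "VisitEvent",
    source=ZG + "UserActivity",
    start=1244984333,
)
```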
Annotations
The annotation system provides the underpinnings for stuff such as tags, bookmarks, and user comments. The system is designed such that new annotation types fit right into the model.
To add an annotation to an item, create a new annotation (with its own unique item.id) and set the annotation.subject_id to the id of the annotated item.
Bookmarks
Set item.content to http://freedesktop.org/standards/xesam/1.0/core#Bookmark and disregard item.text. Apps can now add a little star next to the items that have a bookmark annotation.
Tags
Set item.content to http://freedesktop.org/standards/xesam/1.0/core#Tag and item.text to the user-defined tag label.
Comments
Set item.content to http://gnome.org/zeitgeist/schema/1.0/core#Comment and set item.text to the user-defined comment string.
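Tagging can be sketched end to end: the tag is an item with its own id, and each annotation row links that id to one subject. The sketch assumes SQLite through Python's sqlite3 module and a reduced set of columns. The helpers `get_id`, `add_tag` and `tags_for`, as well as the example URIs, are hypothetical.

```python
import sqlite3

TAG = "http://freedesktop.org/standards/xesam/1.0/core#Tag"

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE uri (id INTEGER PRIMARY KEY, value VARCHAR UNIQUE);
CREATE TABLE content (id INTEGER PRIMARY KEY, value VARCHAR UNIQUE);
CREATE TABLE item (id INTEGER PRIMARY KEY, content_id INTEGER, text VARCHAR);
CREATE TABLE annotation (id INTEGER, subject_id INTEGER);
""")


def get_id(table, value):
    """Map a value to its id in the uri or content table, creating it if needed."""
    if table not in ("uri", "content"):
        raise ValueError(f"unknown relation table: {table}")
    conn.execute(f"INSERT OR IGNORE INTO {table} (value) VALUES (?)", (value,))
    return conn.execute(
        f"SELECT id FROM {table} WHERE value = ?", (value,)
    ).fetchone()[0]


def add_tag(tag_uri, label, subject_uri):
    """Create the tag item if it is new, then annotate the subject with it."""
    tag_id = get_id("uri", tag_uri)
    conn.execute(
        "INSERT OR IGNORE INTO item (id, content_id, text) VALUES (?, ?, ?)",
        (tag_id, get_id("content", TAG), label),
    )
    conn.execute(
        "INSERT INTO annotation (id, subject_id) VALUES (?, ?)",
        (tag_id, get_id("uri", subject_uri)),
    )


def tags_for(subject_uri):
    """Return the labels of all Tag annotations pointing at the subject."""
    rows = conn.execute(
        "SELECT item.text FROM annotation "
        "JOIN item ON item.id = annotation.id "
        "JOIN content ON content.id = item.content_id "
        "JOIN uri ON uri.id = annotation.subject_id "
        "WHERE uri.value = ? AND content.value = ? ORDER BY item.text",
        (subject_uri, TAG),
    )
    return [row[0] for row in rows]


add_tag("zeitgeist://tags/images", "kitten", "file:///home/rainct/Images/kitty.png")
add_tag("zeitgeist://tags/images", "kitten", "http://www.youtube.com/watch?v=_ZSbC09qgLI")
```

Bookmarks and comments would follow the same pattern with a different content type, and with item.text ignored or used for the comment body respectively.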
Annotation Content- and Source Types
Pre-defined Annotation content types:
http://freedesktop.org/standards/xesam/1.0/core#Tag
http://freedesktop.org/standards/xesam/1.0/core#Bookmark
http://gnome.org/zeitgeist/schema/1.0/core#Comment
See http://xesam.org/main/XesamOntology100#xesamTag for more information about the Xesam content and source types.
Source types for Annotations:
http://gnome.org/zeitgeist/schema/1.0/core#UserActivity - user defined tags
Icons
Icons should be generated and cached according to the Freedesktop.org thumbnail specification. (Is there already an implementation of http://live.gnome.org/ThumbnailerSpec which we can use?)
Implementation
Currently based on the Storm ORM, written in Python. When the Zeitgeist engine API has been frozen, a C-based implementation will be started, unless Tracker is clearly the way to go at that point. Even assuming Tracker has gone gold, a simple C-based backend is still useful.
Example Database
uri
id  value
1   file:///home/rainct/Images/kitty.png
2   zeitgeist://tags/images
3   http://www.youtube.com/watch?v=_ZSbC09qgLI
4   file:///home/rainct/readme.txt
5   zeitgeist://events/UserActivity/1244984333#2
6   file:///usr/share/applications/eog.desktop
source
id  value
1   http://gnome.org/zeitgeist/schema/1.0/core#UserActivity
2   http://gnome.org/zeitgeist/schema/1.0/core#WebHistory
3   http://freedesktop.org/standards/xesam/1.0/core#File
4   http://gnome.org/zeitgeist/schema/1.0/core#SystemResource
content
id  value
1   http://freedesktop.org/standards/xesam/1.0/core#Image
2   http://freedesktop.org/standards/xesam/1.0/core#Tag
3   http://freedesktop.org/standards/xesam/1.0/core#TextDocument
4   http://gnome.org/zeitgeist/schema/1.0/core#CreateEvent
5   http://gnome.org/zeitgeist/schema/1.0/core#Application
item
id (uri.id)  content_id  source_id  origin                      text    mimetype               icon  payload
1            1           3          file:///home/rainct/Images  NULL    image/png              NULL  NULL
2            2           1          NULL                        kitten  NULL                   NULL  NULL
3            3 (*)       2          http://youtube.com          NULL    text/html              NULL  NULL
4            3           3          NULL                        NULL    text/plain             NULL  NULL
5            4           1          NULL                        NULL    NULL                   NULL  NULL
6            5 (**)      4          NULL                        NULL    application/x-desktop  NULL  NULL
Here the first row points at the PNG file for uri.id = 1. The second row represents a Tag, as is evident from content_id = 2, which points at a content type of http://freedesktop.org/standards/xesam/1.0/core#Tag in the content table. The third row is the video on YouTube and the fourth row is a readme text file.
I think both cells marked with (*) and (**) above are wrong: -- MarkusKorn 2009-06-19 12:42:03
I changed it to be content_id=3, i.e. a document, which is what an HTML page is semantically speaking -- MikkelKamstrup 2009-06-19 13:32:00
(**) this does not match the actual implementation in lp:zeitgeist. There Application is a source and not a content
Yeah, the approach I think makes sense is to set the content as "Application" and the source as "SystemResource". Other content types for system resources could be man pages and help pages, basically "stuff" installed in system-scope. This is also how I modelled it in datamodel.py -- MikkelKamstrup 2009-06-19 13:32:00
annotation
item_id  subject_id
2        1
2        3
Here we see that item 2 (the Tag from the item table above) points at two subjects: uri.id = 1, our PNG image file, and uri.id = 3, our kitten video from YouTube. This means that we have tagged the file /home/rainct/Images/kitty.png and the kitten YouTube video with the tag "kitten".
event
item_id  subject_id  start       end   app_id
5        2           1244984333  NULL  6
We created the "kitten" tag at timestamp 1244984333 with EOG.
app
id  info
6   /usr/share/applications/eog.desktop
Suggestions
We should consider creating category annotations, e.g. YouTube videos are of category Web, Videos, Forums -- SeifLotfy
I am not totally against the idea, but it does come at a cost. I am particularly worried about performance because we are introducing a heck of a lot of many-to-many relations and I don't know how well sqlite scales in that direction. Besides, the whole model is simpler if items have exactly one content+source, and that counts for something. No matter how hard we try we can only approximate reality. -- MikkelKamstrup 2009-06-22 06:35:05