Zeitgeist 0.2 DB Design

This page contains the documentation for the database used in version 0.2 of the Zeitgeist engine.

Design Principles

  • Stick close to the data models of Xesam and Nepomuk to enable close integration with them in the future (ease development of a Tracker backend etc.)
  • Be extensible. Developers should be able to be free and innovative in their use of the engine. The underlying DB should be able to support this, not lock developers into some specific mindset. This point of course relies heavily on the actual API that is exposed

  • Support everything found in the recent DB draft, and a lot more!

Relation Tables

uri

We store the uri/id map in a separate table (i.e. not in the item table) because we might not always have data associated with a URI. Applications should feel free to refer to purely virtual objects (or things that have been deleted).

value VARCHAR
id    INT

source

Map of source categories. For a description of what a source is see the explanation below.

value VARCHAR
id    INT

content

Map of content categories. For a description of what a content type is, see the explanation below.

value VARCHAR
id    INT
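
Since the three relation tables share the same two columns, they can be sketched with one helper. This is only a minimal sketch using Python's sqlite3 module. The UNIQUE constraint, the integer primary key, and the helper names create_relation_tables and get_or_create_id are assumptions made for illustration, not part of the design above.

```python
import sqlite3

RELATION_TABLES = ("uri", "source", "content")

def create_relation_tables(conn):
    """Create the uri, source and content tables (value/id maps)."""
    for name in RELATION_TABLES:
        conn.execute(
            f"CREATE TABLE IF NOT EXISTS {name} ("
            "id INTEGER PRIMARY KEY, "
            "value VARCHAR UNIQUE NOT NULL)"  # UNIQUE is an assumption
        )

def get_or_create_id(conn, table, value):
    """Return the id mapped to value, inserting a new row if needed."""
    if table not in RELATION_TABLES:
        raise ValueError(f"unknown relation table: {table}")
    row = conn.execute(
        f"SELECT id FROM {table} WHERE value = ?", (value,)
    ).fetchone()
    if row is not None:
        return row[0]
    cursor = conn.execute(f"INSERT INTO {table} (value) VALUES (?)", (value,))
    return cursor.lastrowid

conn = sqlite3.connect(":memory:")
create_relation_tables(conn)
first = get_or_create_id(conn, "uri", "file:///home/rainct/readme.txt")
again = get_or_create_id(conn, "uri", "file:///home/rainct/readme.txt")
```

Because the URI is stored without any item data, a purely virtual or deleted object can still be referred to by its id.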

Item Tables

item

Base class for everything in the data model

id          INT       # uri.id
content_id  INT       # content.id
source_id   INT       # source.id
origin      VARCHAR   # url the item can be said to originate from
text        VARCHAR   # the title or the name of the item (first thing the user sees)
mimetype    VARCHAR   # the mimetype of the item
icon        VARCHAR   # overrides the default icon: URL of an icon file or a theme icon name
payload     BLOB      # free-for-use array of raw bytes

  • item.id Integer id mapping to the item's URI in uri.id

  • item.content_id Integer id mapping to the id of the content type in content.id. The content type is a formal namespaced URI designating the conceptual content of the item, as perceived by the user. If the abstract interpretation of an item is "this is a document", then the content type would be http://freedesktop.org/standards/xesam/1.0/core#Document

  • item.source_id Integer id mapping to the id of the source type in source.id. The source type is a formal namespaced URI designating the abstract origin or location of the item, as perceived by the user. This could be "this is from my web history" in which case the source type would be http://gnome.org/zeitgeist/schema/1.0/core#WebHistory

  • item.origin A URL pointing to the actual place the item came from. If the item is from my web history, recorded when I watched http://www.youtube.com/watch?v=_ZSbC09qgLI, then the origin could be http://www.youtube.com. It is up to the individual data providers to apply sensible origins to their items

  • item.text Free form text field with usage depending on the content type of the item. Tags will use item.text for their labels and Notes/Comments will use item.text for their body text.

  • item.mimetype The mimetype of the item

  • item.icon Zeitgeist can guess a relevant icon to use for an item from the metadata but sometimes it is useful to override this behaviour. This column is used for overriding the default icon chosen by Zeitgeist. It may contain either a full URL to an icon file or simply a string designating an icon name from the theme

  • item.payload Free form binary content that can be controlled by applications. Zeitgeist will never tamper with data in this column, however, one should be warned that other apps might do so.
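
To make the column descriptions above concrete, here is a hedged sketch that stores the web history example as an item and resolves its ids back to URIs. It uses Python's sqlite3 module. The REFERENCES clauses, the UNIQUE constraints, the text/html mimetype, and the helper name intern_value are assumptions for illustration only.

```python
import sqlite3

DOCUMENT = "http://freedesktop.org/standards/xesam/1.0/core#Document"
WEB_HISTORY = "http://gnome.org/zeitgeist/schema/1.0/core#WebHistory"

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE uri     (id INTEGER PRIMARY KEY, value VARCHAR UNIQUE);
CREATE TABLE content (id INTEGER PRIMARY KEY, value VARCHAR UNIQUE);
CREATE TABLE source  (id INTEGER PRIMARY KEY, value VARCHAR UNIQUE);
CREATE TABLE item (
    id         INTEGER PRIMARY KEY REFERENCES uri(id),
    content_id INTEGER REFERENCES content(id),
    source_id  INTEGER REFERENCES source(id),
    origin     VARCHAR,
    text       VARCHAR,
    mimetype   VARCHAR,
    icon       VARCHAR,
    payload    BLOB
);
""")

def intern_value(table, value):
    """Hypothetical helper: return the id for value, adding it if missing."""
    conn.execute(f"INSERT OR IGNORE INTO {table} (value) VALUES (?)", (value,))
    return conn.execute(
        f"SELECT id FROM {table} WHERE value = ?", (value,)
    ).fetchone()[0]

# The web history example from the column descriptions above.
item_id = intern_value("uri", "http://www.youtube.com/watch?v=_ZSbC09qgLI")
conn.execute(
    "INSERT INTO item (id, content_id, source_id, origin, mimetype) "
    "VALUES (?, ?, ?, ?, ?)",
    (item_id, intern_value("content", DOCUMENT),
     intern_value("source", WEB_HISTORY),
     "http://www.youtube.com", "text/html"),
)

# Resolve the integer ids back to their URIs.
resolved = conn.execute("""
    SELECT uri.value, content.value, source.value, item.origin
    FROM item
    JOIN uri     ON uri.id = item.id
    JOIN content ON content.id = item.content_id
    JOIN source  ON source.id = item.source_id
""").fetchone()
```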

annotation

An annotation is a subtype of item. In an object-oriented mindset, think of it as Annotation extends Item. This way you can annotate your annotations and add new annotation types at will.

id         INT       # item.id
subject_id INT       # uri.id

event

An event is a subtype of item and inherits its content and source types from the item table. In an object-oriented mindset, think of it as Event extends Item.

id         INT       # item.id - the id of the event itself
subject_id INT       # uri.id - the subject of the event, eg. the file being changed etc.
start      INT       # timestamp
end        INT       # timestamp
app        INT       # app.id

app

An application is a subtype of item

id      INT       # item.id
info    VARCHAR   # URI of the .desktop file

Content, Source, and Mimetype Explained

The content category of a data object refers to the abstract way a user perceives the item, so this could be "a document", "an image", etc. The source of an item refers to where the item originates: "a file", "online", etc. Lastly, the mimetype represents the physical format of the binary datastream, for example "this is a JPEG 2000 image" or "this is a zip file". You can read more about these concepts at http://xesam.org/main/XesamOntology100

For example, an image from my digital camera would have the following characteristics:

  • Content: Image
  • Source: File
  • Mimetype: image/jpeg

The actual types are specified as namespaced URIs, so that instead of just writing "Tag" we write "http://freedesktop.org/standards/xesam/1.0/core#Tag".
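
As a small illustration, the camera image above could be described like this in Python. The ItemTypes tuple and the short_name helper are hypothetical names introduced only for this sketch; the Image and File URIs are the Xesam ones used in the example database further down.

```python
from typing import NamedTuple

XESAM = "http://freedesktop.org/standards/xesam/1.0/core#"

class ItemTypes(NamedTuple):
    """The three independent classifications of an item."""
    content: str   # how the user perceives the item
    source: str    # where the item originates
    mimetype: str  # physical format of the binary datastream

camera_image = ItemTypes(
    content=XESAM + "Image",
    source=XESAM + "File",
    mimetype="image/jpeg",
)

def short_name(type_uri):
    """Return the part after '#', e.g. 'Image' for the Xesam Image URI."""
    return type_uri.rsplit("#", 1)[-1]
```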

Events

Event Content- and Source Types

The content type of an event specifies "what happened" and the source type "what triggered it".

Pre-defined Event source types:

  • http://gnome.org/zeitgeist/schema/1.0/core#UserActivity

  • http://gnome.org/zeitgeist/schema/1.0/core#UserNotification

Pre-defined Event content types:

  • http://gnome.org/zeitgeist/schema/1.0/core#CreateEvent

  • http://gnome.org/zeitgeist/schema/1.0/core#ModifyEvent

  • http://gnome.org/zeitgeist/schema/1.0/core#VisitEvent

  • http://gnome.org/zeitgeist/schema/1.0/core#LinkEvent

  • http://gnome.org/zeitgeist/schema/1.0/core#ReceiveEvent

  • http://gnome.org/zeitgeist/schema/1.0/core#WarnEvent

  • http://gnome.org/zeitgeist/schema/1.0/core#ErrorEvent
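
The following sketch shows how an event might be recorded with one of the pre-defined types: the event gets its own item row carrying "what happened" and "what triggered it", plus an event row pointing at the subject. It assumes Python's sqlite3 module and a reduced item table with only the columns needed here. The record_event and intern_value helpers and the event URI are hypothetical.

```python
import sqlite3

ZG = "http://gnome.org/zeitgeist/schema/1.0/core#"

conn = sqlite3.connect(":memory:")
conn.executescript('''
CREATE TABLE uri     (id INTEGER PRIMARY KEY, value VARCHAR UNIQUE);
CREATE TABLE content (id INTEGER PRIMARY KEY, value VARCHAR UNIQUE);
CREATE TABLE source  (id INTEGER PRIMARY KEY, value VARCHAR UNIQUE);
CREATE TABLE item    (id INTEGER PRIMARY KEY, content_id INT, source_id INT);
CREATE TABLE event   (id INTEGER PRIMARY KEY, subject_id INT,
                      "start" INT, "end" INT, app INT);
''')

def intern_value(table, value):
    """Hypothetical helper: return the id for value, adding it if missing."""
    conn.execute(f"INSERT OR IGNORE INTO {table} (value) VALUES (?)", (value,))
    return conn.execute(
        f"SELECT id FROM {table} WHERE value = ?", (value,)
    ).fetchone()[0]

def record_event(event_uri, subject_uri, content_type, source_type,
                 start, end=None, app_uri=None):
    """Insert the item row for the event, then the event row itself."""
    event_id = intern_value("uri", event_uri)
    conn.execute(
        "INSERT INTO item (id, content_id, source_id) VALUES (?, ?, ?)",
        (event_id, intern_value("content", content_type),
         intern_value("source", source_type)),
    )
    app_id = intern_value("uri", app_uri) if app_uri else None
    conn.execute(
        'INSERT INTO event (id, subject_id, "start", "end", app) '
        "VALUES (?, ?, ?, ?, ?)",
        (event_id, intern_value("uri", subject_uri), start, end, app_id),
    )
    return event_id

event_id = record_event(
    event_uri="zeitgeist://events/example/1",  # hypothetical event URI
    subject_uri="file:///home/rainct/readme.txt",
    content_type=ZG + "ModifyEvent",    # what happened
    source_type=ZG + "UserActivity",    # what triggered it
    start=1244984333,
)
```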

Annotations

The annotation system provides the underpinnings for stuff such as tags, bookmarks, and user comments. The system is designed such that new annotation types fit right into the model.

To add an annotation to an item, create a new annotation (with its own unique item.id) and set the annotation.subject_id to the id of the annotated item.
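
A minimal sketch of that step, assuming Python's sqlite3 module and a reduced item table: the annotate and intern_value helpers, the tag and comment URIs, and the comment text are all hypothetical. Because an annotation is itself an item, the last call annotates an annotation.

```python
import sqlite3

XESAM = "http://freedesktop.org/standards/xesam/1.0/core#"
ZG = "http://gnome.org/zeitgeist/schema/1.0/core#"

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE uri        (id INTEGER PRIMARY KEY, value VARCHAR UNIQUE);
CREATE TABLE content    (id INTEGER PRIMARY KEY, value VARCHAR UNIQUE);
CREATE TABLE item       (id INTEGER PRIMARY KEY, content_id INT, text VARCHAR);
CREATE TABLE annotation (id INT, subject_id INT);
""")

def intern_value(table, value):
    """Hypothetical helper: return the id for value, adding it if missing."""
    conn.execute(f"INSERT OR IGNORE INTO {table} (value) VALUES (?)", (value,))
    return conn.execute(
        f"SELECT id FROM {table} WHERE value = ?", (value,)
    ).fetchone()[0]

def annotate(annotation_uri, subject_uri, content_type, text=None):
    """Create the annotation's item row (if new) and link it to a subject."""
    annotation_id = intern_value("uri", annotation_uri)
    conn.execute(
        "INSERT OR IGNORE INTO item (id, content_id, text) VALUES (?, ?, ?)",
        (annotation_id, intern_value("content", content_type), text),
    )
    conn.execute(
        "INSERT INTO annotation (id, subject_id) VALUES (?, ?)",
        (annotation_id, intern_value("uri", subject_uri)),
    )
    return annotation_id

TAG_URI = "zeitgeist://tags/kitten"  # hypothetical URI for the tag item
tag_id = annotate(TAG_URI, "file:///home/rainct/Images/kitty.png",
                  XESAM + "Tag", "kitten")
# Annotating an annotation: a comment whose subject is the tag itself.
comment_id = annotate("zeitgeist://comments/1", TAG_URI,
                      ZG + "Comment", "pictures of my cat")
```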

Bookmarks

Set the content type (item.content_id) to http://freedesktop.org/standards/xesam/1.0/core#Bookmark and disregard item.text. Apps can now add a little star next to the items that have a bookmark annotation.

Tags

Set the content type (item.content_id) to http://freedesktop.org/standards/xesam/1.0/core#Tag and item.text to the user-defined tag label.

Comments

Set the content type (item.content_id) to http://gnome.org/zeitgeist/schema/1.0/core#Comment and item.text to the user-defined comment string.
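
The three rules above can be summarised in a small lookup. This is only a sketch; the annotation_fields function and the short kind names ("bookmark", "tag", "comment") are hypothetical and exist only for this illustration.

```python
XESAM = "http://freedesktop.org/standards/xesam/1.0/core#"
ZG = "http://gnome.org/zeitgeist/schema/1.0/core#"

ANNOTATION_CONTENT_TYPES = {
    "bookmark": XESAM + "Bookmark",
    "tag": XESAM + "Tag",
    "comment": ZG + "Comment",
}

def annotation_fields(kind, text=None):
    """Return (content type URI, item.text) for a new annotation item."""
    content_type = ANNOTATION_CONTENT_TYPES[kind]
    if kind == "bookmark":
        # Bookmarks disregard item.text.
        return content_type, None
    if not text:
        raise ValueError(f"a {kind} annotation needs a non-empty text")
    # Tags store their label and comments their body in item.text.
    return content_type, text
```

For example, annotation_fields("tag", "kitten") yields the Xesam Tag URI together with the label to store in item.text.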

Annotation Content- and Source Types

Pre-defined Annotation content types:

  • http://freedesktop.org/standards/xesam/1.0/core#Tag

  • http://freedesktop.org/standards/xesam/1.0/core#Bookmark

  • http://gnome.org/zeitgeist/schema/1.0/core#Comment

See http://xesam.org/main/XesamOntology100#xesamTag for more information about the Xesam content and source types

Source types for Annotations:

  • http://gnome.org/zeitgeist/schema/1.0/core#UserActivity - user defined tags

Icons

Icons should be generated and cached according to the Freedesktop.org thumbnail specification. (Is there already an implementation of http://live.gnome.org/ThumbnailerSpec which we can use?)
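
For reference, the Freedesktop.org thumbnail specification names each cached thumbnail after the MD5 hash of the item's URI. The sketch below computes that location in Python; the function name thumbnail_path is hypothetical, and it assumes the current cache layout under $XDG_CACHE_HOME/thumbnails (older versions of the specification used ~/.thumbnails instead).

```python
import hashlib
import os

def thumbnail_path(uri, size="normal"):
    """Return where a thumbnail for uri would be cached.

    size is the size directory of the thumbnail spec, e.g. "normal" or "large".
    """
    cache_home = os.environ.get("XDG_CACHE_HOME") or os.path.expanduser("~/.cache")
    # The file name is the hex MD5 digest of the URI plus ".png".
    digest = hashlib.md5(uri.encode("utf-8")).hexdigest()
    return os.path.join(cache_home, "thumbnails", size, digest + ".png")

path = thumbnail_path("file:///home/rainct/Images/kitty.png")
```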

Implementation

The implementation is currently based on the Storm ORM and written in Python. When the Zeitgeist engine API has been frozen, a C-based implementation will be started, unless Tracker is clearly the way to go at that point. Even assuming Tracker has gone gold, a simple C-based backend is still useful.

Example Database

uri

id  value
1   file:///home/rainct/Images/kitty.png
2   zeitgeist://tags/images
3   http://www.youtube.com/watch?v=_ZSbC09qgLI
4   file:///home/rainct/readme.txt
5   zeitgeist://events/UserActivity/1244984333#2
6   file:///usr/share/applications/eog.desktop

source

id  value
1   http://gnome.org/zeitgeist/schema/1.0/core#UserActivity
2   http://gnome.org/zeitgeist/schema/1.0/core#WebHistory
3   http://freedesktop.org/standards/xesam/1.0/core#File
4   http://gnome.org/zeitgeist/schema/1.0/core#SystemResource

content

id  value
1   http://freedesktop.org/standards/xesam/1.0/core#Image
2   http://freedesktop.org/standards/xesam/1.0/core#Tag
3   http://freedesktop.org/standards/xesam/1.0/core#TextDocument
4   http://gnome.org/zeitgeist/schema/1.0/core#CreateEvent
5   http://gnome.org/zeitgeist/schema/1.0/core#Application

item

id (uri.id)  content_id  source_id  origin                      text    mimetype               icon  payload
1            1           3          file:///home/rainct/Images  NULL    image/png              NULL  NULL
2            2           1          NULL                        kitten  NULL                   NULL  NULL
3            3 (*)       2          http://youtube.com          NULL    text/html              NULL  NULL
4            3           3          NULL                        NULL    text/plain             NULL  NULL
5            4           1          NULL                        NULL    NULL                   NULL  NULL
6            5 (**)      4          NULL                        NULL    application/x-desktop  NULL  NULL

Here the first row points at the PNG file for uri.id = 1. The second row represents a Tag as evident from the content_id = 2 which points at a content type of http://freedesktop.org/standards/xesam/1.0/core#Tag in the content table. The third row is the video on YouTube and the fourth row is a readme text file.

  • I think both cells marked with (*) and (**) above are wrong: -- MarkusKorn 2009-06-19 12:42:03

    • (*) the content with content_id == 2 is a tag, which is not the case here; maybe this should be 3
      • I changed it to content_id=3, i.e. a document, which is what an HTML page is, semantically speaking -- MikkelKamstrup 2009-06-19 13:32:00

    • (**) this does not match the actual implementation in lp:zeitgeist. There, Application is a source and not a content

      • Yeah, the approach I think makes sense is to set the content as "Application" and the source as "SystemResource". Other content types for system resources could be man pages and help pages, basically "stuff" installed in system scope. This is also how I modelled it in datamodel.py -- MikkelKamstrup 2009-06-19 13:32:00

annotation

item_id  subject_id
2        1
2        3

Here we see that item 2 (the Tag from the item table above) has two subjects: uri.id = 1, our PNG image file, and uri.id = 3, our kitten video from YouTube. This means that we have tagged both the file /home/rainct/Images/kitty.png and the kitten YouTube video with the tag "kitten".
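
The lookup described here can be sketched as a query over the example rows, using Python's sqlite3 module. The function name subjects_tagged is hypothetical, and the annotation column is called id as in the table definition earlier on this page.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE uri        (id INTEGER PRIMARY KEY, value VARCHAR);
CREATE TABLE content    (id INTEGER PRIMARY KEY, value VARCHAR);
CREATE TABLE item       (id INTEGER PRIMARY KEY, content_id INT, text VARCHAR);
CREATE TABLE annotation (id INT, subject_id INT);

-- Only the example rows needed for this query.
INSERT INTO uri VALUES (1, 'file:///home/rainct/Images/kitty.png');
INSERT INTO uri VALUES (2, 'zeitgeist://tags/images');
INSERT INTO uri VALUES (3, 'http://www.youtube.com/watch?v=_ZSbC09qgLI');
INSERT INTO content VALUES (2, 'http://freedesktop.org/standards/xesam/1.0/core#Tag');
INSERT INTO item VALUES (2, 2, 'kitten');
INSERT INTO annotation VALUES (2, 1);
INSERT INTO annotation VALUES (2, 3);
""")

TAG = "http://freedesktop.org/standards/xesam/1.0/core#Tag"

def subjects_tagged(label):
    """Return the URIs of all subjects carrying a Tag annotation with label."""
    rows = conn.execute("""
        SELECT subject.value
        FROM annotation
        JOIN item    ON item.id = annotation.id
        JOIN content ON content.id = item.content_id
        JOIN uri AS subject ON subject.id = annotation.subject_id
        WHERE content.value = ? AND item.text = ?
        ORDER BY subject.id
    """, (TAG, label))
    return [row[0] for row in rows]

tagged = subjects_tagged("kitten")
```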

event

item_id  subject_id  start       end   app_id
5        2           1244984333  NULL  6

We created the "kitten" tag at 1244984333 o'clock with EOG.
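
To read that event row back in a friendlier form, one could join it with the item and app tables as sketched below in Python. This assumes that start holds seconds since the Unix epoch, which the page does not state explicitly, and it uses the column name app from the table definition above.

```python
import sqlite3
from datetime import datetime, timezone

conn = sqlite3.connect(":memory:")
conn.executescript('''
CREATE TABLE item  (id INTEGER PRIMARY KEY, text VARCHAR);
CREATE TABLE event (id INT, subject_id INT, "start" INT, "end" INT, app INT);
CREATE TABLE app   (id INT, info VARCHAR);

-- Only the example rows needed for this query.
INSERT INTO item  VALUES (2, 'kitten');
INSERT INTO event VALUES (5, 2, 1244984333, NULL, 6);
INSERT INTO app   VALUES (6, '/usr/share/applications/eog.desktop');
''')

label, start, desktop_file = conn.execute('''
    SELECT item.text, event."start", app.info
    FROM event
    JOIN item ON item.id = event.subject_id
    JOIN app  ON app.id = event.app
''').fetchone()

# Assumption: timestamps are seconds since the Unix epoch (UTC).
started_at = datetime.fromtimestamp(start, tz=timezone.utc)
```

Under that assumption the timestamp corresponds to 2009-06-14 12:58:53 UTC.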

app

id

info

6

/usr/share/applications/eog.desktop

Suggestions

  • We should consider creating category annotations, e.g. YouTube videos belong to the categories Web, Videos, and Forums -- SeifLotfy

    • I've already added a content category for Videos to datamodel.py. What you are suggesting is basically that we make the content/source relations many-to-many instead of the current one-to-many. As it is now, each item has exactly one content category, and a content category can have many members, but what you want is basically to say that an item can have several content categories.
    • I am not totally against the idea, but it does come at a cost. I am particularly worried about performance, because we would be introducing a heck of a lot of many-to-many relations and I don't know how well sqlite scales in that direction. Besides, the whole model is simpler if each item has exactly one content and one source, and that counts for something. No matter how hard we try, we can only approximate reality. -- MikkelKamstrup 2009-06-22 06:35:05
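
For discussion purposes, the many-to-many variant could look roughly like the Python/sqlite3 sketch below. The item_content link table and the example: category URIs are purely hypothetical; nothing like this exists in the design above. It mainly shows the extra join that every category lookup would need.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE content (id INTEGER PRIMARY KEY, value VARCHAR UNIQUE);
CREATE TABLE item    (id INTEGER PRIMARY KEY);
-- Hypothetical link table replacing the single item.content_id column.
CREATE TABLE item_content (
    item_id    INT REFERENCES item(id),
    content_id INT REFERENCES content(id),
    PRIMARY KEY (item_id, content_id)
);

INSERT INTO content VALUES (1, 'example:Web');    -- placeholder URIs
INSERT INTO content VALUES (2, 'example:Video');
INSERT INTO item VALUES (3);
INSERT INTO item_content VALUES (3, 1);
INSERT INTO item_content VALUES (3, 2);
""")

def categories_of(item_id):
    """Return all content categories of an item (needs one extra join)."""
    rows = conn.execute("""
        SELECT content.value
        FROM item_content
        JOIN content ON content.id = item_content.content_id
        WHERE item_content.item_id = ?
        ORDER BY content.id
    """, (item_id,))
    return [row[0] for row in rows]

categories = categories_of(3)
```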

Projects/Zeitgeist/Document/DatabaseDesign0.2 (last edited 2013-12-03 14:54:41 by WilliamJonMcCann)