


Zeitgeist 0.2 DB Design

This page contains the documentation for the database used in version 0.2 of the Zeitgeist engine.

Design Principles

Relation Tables

uri

We store the uri/id map in a separate table (i.e. not in the item table) because we might not always have data associated with a URI. Applications should feel free to refer to purely virtual objects (or to things that have been deleted).

value VARCHAR
id    INT

source

Map of source categories. For a description of what a source is see the explanation below.

value VARCHAR
id    INT

content

For a description of what a content is see the explanation below.

value VARCHAR
id    INT
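The three relation tables share one shape, a value mapped to an integer id, so a single helper can serve all of them. The following is a minimal sketch using Python's built-in sqlite3 module. The PRIMARY KEY and UNIQUE constraints and the helper names `create_relation_tables` and `get_or_create_id` are assumptions made for illustration; the page itself only lists the two columns.

```python
import sqlite3

# Assumed DDL: the page lists only "value VARCHAR" and "id INT" per table;
# the PRIMARY KEY and UNIQUE constraints are additions for this sketch.
RELATION_TABLES = ("uri", "source", "content")


def create_relation_tables(conn):
    """Create the uri, source and content tables with the same two columns."""
    for table in RELATION_TABLES:
        conn.execute(
            f"CREATE TABLE IF NOT EXISTS {table} ("
            "id INTEGER PRIMARY KEY, value VARCHAR UNIQUE NOT NULL)"
        )


def get_or_create_id(conn, table, value):
    """Return the id mapped to `value` in `table`, inserting a row if needed."""
    if table not in RELATION_TABLES:  # table names cannot be bound as parameters
        raise ValueError(f"unknown relation table: {table}")
    row = conn.execute(
        f"SELECT id FROM {table} WHERE value = ?", (value,)
    ).fetchone()
    if row is not None:
        return row[0]
    cur = conn.execute(f"INSERT INTO {table} (value) VALUES (?)", (value,))
    return cur.lastrowid


conn = sqlite3.connect(":memory:")
create_relation_tables(conn)
first = get_or_create_id(conn, "uri", "file:///home/rainct/readme.txt")
again = get_or_create_id(conn, "uri", "file:///home/rainct/readme.txt")
other = get_or_create_id(conn, "uri", "zeitgeist://tags/images")
```

Because the uri row exists independently of any item row, a URI can be referenced (for example as the subject of an event) before, or without ever, having item data attached.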

Item Tables

item

Base class for everything in the data model

id          INT       # uri.id
content_id  INT       # content.id
source_id   INT       # source.id
origin      VARCHAR   # url the item can be said to originate from
text        VARCHAR   # the title or the name of the item (first thing the users sees)
mimetype    VARCHAR   #
icon        VARCHAR   # 
payload     BLOB      # Free-for-use array of raw bytes 

annotation

An annotation is a subtype of item; in an object-oriented mindset, think of it as Annotation extends Item. This way you can annotate your annotations and add new annotation types at whim.

id         INT       # item.id
subject_id INT       # uri.id

event

An event is a subtype of item and inherits its content and source types from the item table. In an object-oriented mindset, think of it as Event extends Item.

id         INT       # item.id - the id of the event itself
subject_id INT       # uri.id - the subject of the event, eg. the file being changed etc.
start      INT       # timestamp
end        INT       # timestamp
app        INT       # app.id

app

An application is a subtype of item

id      INT       # item.id
info    VARCHAR   # Uri of .desktop file
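The "subtype of item" relationship described above can be sketched in SQLite by letting each subtype table reuse the item id as its own key. This is an illustration only, not the engine's actual DDL: the constraints, the REFERENCES clauses, the composite key on annotation, and the helper name `open_db` are assumptions. The column names and types follow the listings above.

```python
import sqlite3

SCHEMA = """
CREATE TABLE uri     (id INTEGER PRIMARY KEY, value VARCHAR UNIQUE);
CREATE TABLE content (id INTEGER PRIMARY KEY, value VARCHAR UNIQUE);
CREATE TABLE source  (id INTEGER PRIMARY KEY, value VARCHAR UNIQUE);

CREATE TABLE item (
    id         INTEGER PRIMARY KEY REFERENCES uri(id),
    content_id INTEGER REFERENCES content(id),
    source_id  INTEGER REFERENCES source(id),
    origin     VARCHAR,
    text       VARCHAR,
    mimetype   VARCHAR,
    icon       VARCHAR,
    payload    BLOB
);

-- Subtype tables reuse item.id ("X extends Item").
-- Assumption: the key is the (id, subject_id) pair, so that one annotation
-- item can point at several subjects.
CREATE TABLE annotation (
    id         INTEGER REFERENCES item(id),
    subject_id INTEGER REFERENCES uri(id),
    PRIMARY KEY (id, subject_id)
);

CREATE TABLE app (
    id   INTEGER PRIMARY KEY REFERENCES item(id),
    info VARCHAR
);

-- "start" and "end" are quoted because END is an SQL keyword.
CREATE TABLE event (
    id         INTEGER PRIMARY KEY REFERENCES item(id),
    subject_id INTEGER REFERENCES uri(id),
    "start"    INTEGER,
    "end"      INTEGER,
    app        INTEGER REFERENCES app(id)
);
"""


def open_db(path=":memory:"):
    """Open a database and create the sketch schema in it."""
    conn = sqlite3.connect(path)
    conn.executescript(SCHEMA)
    return conn


conn = open_db()
tables = sorted(
    name
    for (name,) in conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table'"
    )
)
```

Reading a complete event then means joining event to item on the shared id, which is the relational counterpart of accessing inherited fields.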

Content, Source, and Mimetype Explained

The content category of a data object refers to the abstract way a user perceives the item, so this could be "a document", "an image", etc. The source of an item refers to where the item originates: "a file", "online", etc. Lastly, the mimetype represents the physical format of the binary data stream, e.g. "this is a JPEG 2000 image" or "this is a zip file". You can read more about these concepts at http://xesam.org/main/XesamOntology100.

For example, an image from my digital camera would have the following characteristics:

The actual types are specified as namespaced URIs, so that instead of just writing "Tag" we write "http://freedesktop.org/standards/xesam/1.0/core#Tag".

Events

Event Content- and Source Types

The content type of an event specifies "what happened" and the source type "what triggered it".

Pre-defined Event source types:

Pre-defined Event content types:

Annotations

The annotation system provides the underpinnings for stuff such as tags, bookmarks, and user comments. The system is designed such that new annotation types fit right into the model.

To add an annotation to an item, create a new annotation (with its own unique item.id) and set the annotation.subject_id to the id of the annotated item.
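The step described above can be sketched as follows. The tables are cut down to the columns needed here, and the helper names `lookup_id` and `annotate` are hypothetical; the real engine goes through the Storm ORM rather than raw SQL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE uri        (id INTEGER PRIMARY KEY, value VARCHAR UNIQUE);
CREATE TABLE content    (id INTEGER PRIMARY KEY, value VARCHAR UNIQUE);
CREATE TABLE item       (id INTEGER PRIMARY KEY, content_id INTEGER, text VARCHAR);
CREATE TABLE annotation (id INTEGER, subject_id INTEGER);
""")


def lookup_id(conn, table, value):
    """Map a value to its id in `uri` or `content`, creating the row if missing."""
    if table not in ("uri", "content"):
        raise ValueError(f"unexpected table: {table}")
    conn.execute(f"INSERT OR IGNORE INTO {table} (value) VALUES (?)", (value,))
    return conn.execute(
        f"SELECT id FROM {table} WHERE value = ?", (value,)
    ).fetchone()[0]


def annotate(conn, annotation_uri, subject_uri, content_type, text=None):
    """Create an annotation item and point it at the subject's uri.id."""
    item_id = lookup_id(conn, "uri", annotation_uri)   # the annotation's own id
    subject_id = lookup_id(conn, "uri", subject_uri)   # the annotated item
    conn.execute(
        "INSERT INTO item (id, content_id, text) VALUES (?, ?, ?)",
        (item_id, lookup_id(conn, "content", content_type), text),
    )
    conn.execute(
        "INSERT INTO annotation (id, subject_id) VALUES (?, ?)",
        (item_id, subject_id),
    )
    return item_id


TAG = "http://freedesktop.org/standards/xesam/1.0/core#Tag"
tag_id = annotate(
    conn,
    "zeitgeist://tags/images",
    "file:///home/rainct/readme.txt",
    TAG,
    text="kitten",
)
```

Since the annotation is itself an item with a uri.id, the same function could be called with the annotation's URI as the subject, which is how annotations on annotations would work.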

Bookmarks

Set item.content to http://freedesktop.org/standards/xesam/1.0/core#Bookmark and disregard item.text. Apps can now add a little star next to the items that have a bookmark annotation.

Tags

Set item.content to http://freedesktop.org/standards/xesam/1.0/core#Tag and item.text to the user-defined tag label.

Comments

Set item.content to http://gnome.org/zeitgeist/schema/1.0/core#Comment and set item.text to the user-defined comment string.
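Since bookmarks, tags, and comments differ only in the content type of the annotation item, a reader can sort a subject's annotations by that one field. The sketch below assumes cut-down tables and hypothetical sample rows (the `zeitgeist://example/...` URIs and the function name `annotations_for` are made up for illustration).

```python
import sqlite3

BOOKMARK = "http://freedesktop.org/standards/xesam/1.0/core#Bookmark"
TAG = "http://freedesktop.org/standards/xesam/1.0/core#Tag"
COMMENT = "http://gnome.org/zeitgeist/schema/1.0/core#Comment"

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE uri        (id INTEGER PRIMARY KEY, value VARCHAR UNIQUE);
CREATE TABLE content    (id INTEGER PRIMARY KEY, value VARCHAR UNIQUE);
CREATE TABLE item       (id INTEGER PRIMARY KEY, content_id INTEGER, text VARCHAR);
CREATE TABLE annotation (id INTEGER, subject_id INTEGER);
""")
# Hypothetical sample rows: one file carrying a bookmark, a tag and a comment.
conn.executemany("INSERT INTO uri VALUES (?, ?)", [
    (1, "file:///home/rainct/readme.txt"),
    (2, "zeitgeist://example/bookmark"),
    (3, "zeitgeist://example/tag"),
    (4, "zeitgeist://example/comment"),
])
conn.executemany("INSERT INTO content VALUES (?, ?)",
                 [(1, BOOKMARK), (2, TAG), (3, COMMENT)])
conn.executemany("INSERT INTO item VALUES (?, ?, ?)",
                 [(2, 1, None), (3, 2, "docs"), (4, 3, "Needs an update")])
conn.executemany("INSERT INTO annotation VALUES (?, ?)",
                 [(2, 1), (3, 1), (4, 1)])


def annotations_for(conn, subject_uri):
    """Group a subject's annotations by kind, keyed on the item's content type."""
    rows = conn.execute("""
        SELECT c.value, i.text
        FROM annotation a
        JOIN uri u     ON u.id = a.subject_id
        JOIN item i    ON i.id = a.id
        JOIN content c ON c.id = i.content_id
        WHERE u.value = ?
        ORDER BY i.id
    """, (subject_uri,)).fetchall()
    result = {"bookmarked": False, "tags": [], "comments": []}
    for content_type, text in rows:
        if content_type == BOOKMARK:
            result["bookmarked"] = True   # item.text is disregarded for bookmarks
        elif content_type == TAG:
            result["tags"].append(text)
        elif content_type == COMMENT:
            result["comments"].append(text)
    return result


info = annotations_for(conn, "file:///home/rainct/readme.txt")
```

An application could use the `bookmarked` flag to decide whether to draw the star mentioned above.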

Annotation Content- and Source Types

Pre-defined Annotation content types:

Note: see http://xesam.org/main/XesamOntology100#xesamTag for more information about the Xesam content and source types.

Source types for Annotations:

Icons

Icons should be generated and cached according to the Freedesktop.org thumbnail specification. (Is there already an implementation of http://live.gnome.org/ThumbnailerSpec which we can use?)

Implementation

The current implementation is written in Python and based on the Storm ORM. When the Zeitgeist engine API has been frozen, a C-based implementation will be started, unless Tracker is clearly the way to go at that point. Even assuming Tracker has gone gold, a simple C-based backend is still useful.

Example Database

uri

id  value
1   file:///home/rainct/Images/kitty.png
2   zeitgeist://tags/images
3   http://www.youtube.com/watch?v=_ZSbC09qgLI
4   file:///home/rainct/readme.txt
5   zeitgeist://events/UserActivity/1244984333#2
6   file:///usr/share/applications/eog.desktop

source

id  value
1   http://gnome.org/zeitgeist/schema/1.0/core#UserActivity
2   http://gnome.org/zeitgeist/schema/1.0/core#WebHistory
3   http://freedesktop.org/standards/xesam/1.0/core#File
4   http://gnome.org/zeitgeist/schema/1.0/core#SystemResource

content

id  value
1   http://freedesktop.org/standards/xesam/1.0/core#Image
2   http://freedesktop.org/standards/xesam/1.0/core#Tag
3   http://freedesktop.org/standards/xesam/1.0/core#TextDocument
4   http://gnome.org/zeitgeist/schema/1.0/core#CreateEvent
5   http://gnome.org/zeitgeist/schema/1.0/core#Application

item

id (uri.id)  content_id  source_id  origin                      text    mimetype               icon  payload
1            1           3          file:///home/rainct/Images  NULL    image/png              NULL  NULL
2            2           1          NULL                        kitten  NULL                   NULL  NULL
3            3 (*)       2          http://youtube.com          NULL    text/html              NULL  NULL
4            3           3          NULL                        NULL    text/plain             NULL  NULL
5            4           1          NULL                        NULL    NULL                   NULL  NULL
6            5 (**)      4          NULL                        NULL    application/x-desktop  NULL  NULL

Here the first row points at the PNG file for uri.id = 1. The second row represents a Tag as evident from the content_id = 2 which points at a content type of http://freedesktop.org/standards/xesam/1.0/core#Tag in the content table. The third row is the video on YouTube and the fourth row is a readme text file.

  • I think both cells marked with (*) and (**) above are wrong: -- MarkusKorn 2009-06-19 12:42:03

    • (*) the content with content_id == 2 is a tag, which is not the case here; maybe this should be 3
      • I changed it to content_id=3, i.e. a document, which is what an HTML page is semantically speaking -- MikkelKamstrup 2009-06-19 13:32:00

    • (**) this does not match the actual implementation in lp:zeitgeist. There, Application is a source and not a content

      • Yeah, the approach I think makes sense is to set the content as "Application" and the source as "SystemResource". Other content types for system resources could be man pages and help pages, basically "stuff" installed in system scope. This is also how I modelled it in datamodel.py -- MikkelKamstrup 2009-06-19 13:32:00

annotation

item_id  subject_id
2        1
2        3

Here we see that item 2 (the Tag from the item table above) points at two subjects: uri.id = 1, i.e. our PNG image file, and uri.id = 3, our kitten video from YouTube. This means that we have tagged both the file /home/rainct/Images/kitty.png and the kitten YouTube video with the tag "kitten".
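The lookup described above, from a tag label to the URIs it annotates, can be sketched as a three-way join. The rows are an abridged copy of the example database (the file URI is written in its three-slash `file:///` form); the constant `TAG_CONTENT_ID` and the function name `subjects_tagged` are assumptions for illustration.

```python
import sqlite3

TAG_CONTENT_ID = 2  # content.id of the xesam Tag type in the example database

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE uri        (id INTEGER PRIMARY KEY, value VARCHAR);
CREATE TABLE item       (id INTEGER PRIMARY KEY, content_id INTEGER, text VARCHAR);
CREATE TABLE annotation (item_id INTEGER, subject_id INTEGER);
""")
conn.executemany("INSERT INTO uri VALUES (?, ?)", [
    (1, "file:///home/rainct/Images/kitty.png"),
    (2, "zeitgeist://tags/images"),
    (3, "http://www.youtube.com/watch?v=_ZSbC09qgLI"),
])
conn.execute("INSERT INTO item VALUES (2, ?, 'kitten')", (TAG_CONTENT_ID,))
conn.executemany("INSERT INTO annotation VALUES (?, ?)", [(2, 1), (2, 3)])


def subjects_tagged(conn, label):
    """Return the URIs of all subjects annotated with the tag `label`."""
    rows = conn.execute("""
        SELECT u.value
        FROM item i
        JOIN annotation a ON a.item_id = i.id
        JOIN uri u        ON u.id = a.subject_id
        WHERE i.content_id = ? AND i.text = ?
        ORDER BY u.id
    """, (TAG_CONTENT_ID, label))
    return [value for (value,) in rows]


tagged = subjects_tagged(conn, "kitten")
```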

event

item_id  subject_id  start       end   app_id
5        2           1244984333  NULL  6

We created the "kitten" tag at 1244984333 o'clock with EOG.
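The page does not state the unit of the start and end columns. Assuming they hold Unix timestamps in seconds, which the magnitude of the example value suggests, the event time can be made readable as in this small sketch (the function name `event_time` is hypothetical):

```python
from datetime import datetime, timezone


def event_time(timestamp):
    """Interpret an event start/end value as Unix seconds (an assumption)."""
    if timestamp is None:  # e.g. the NULL `end` of the example event
        return None
    return datetime.fromtimestamp(timestamp, tz=timezone.utc)


start = event_time(1244984333)
end = event_time(None)
print(start.isoformat())
```

Under that assumption the example event falls on 14 June 2009 (UTC), a few days before the discussion comments on this page.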

app

id

info

6

/usr/share/applications/eog.desktop

Suggestions

  • We should consider creating category annotations, e.g. YouTube videos are of the categories Web, Videos, Forums -- SeifLotfy

    • I've already added a content category for Videos to datamodel.py. What you are suggesting is basically that we make the content/source relations many-to-many instead of the current one-to-many. That is, as it is now each item has exactly one content category, and a content category can have many members, but what you want is basically to say that an item can have several content categories.
    • I am not totally against the idea, but it does come at a cost. I am particularly worried about performance, because we would be introducing a heck of a lot of many-to-many relations and I don't know how well sqlite scales in that direction. Besides, the whole model is simpler if each item has exactly one content and one source, and that counts for something. No matter how hard we try, we can only approximate reality. -- MikkelKamstrup 2009-06-22 06:35:05
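To make the trade-off in this discussion concrete, here is a sketch of the two layouts side by side. The junction table name `item_content` is hypothetical and nothing like it exists in the schema above; the category labels are plain placeholders taken from the suggestion rather than namespaced URIs.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE content (id INTEGER PRIMARY KEY, value VARCHAR UNIQUE);

-- Current design: exactly one content category per item, stored inline.
CREATE TABLE item (
    id         INTEGER PRIMARY KEY,
    content_id INTEGER REFERENCES content(id)
);

-- Suggested alternative (hypothetical): one row per (item, category) pair.
CREATE TABLE item_content (
    item_id    INTEGER REFERENCES item(id),
    content_id INTEGER REFERENCES content(id),
    PRIMARY KEY (item_id, content_id)
);
""")
conn.executemany("INSERT INTO content VALUES (?, ?)",
                 [(1, "Web"), (2, "Videos"), (3, "Forums")])
conn.execute("INSERT INTO item VALUES (3, 2)")  # inline: a single category
conn.executemany("INSERT INTO item_content VALUES (?, ?)", [(3, 1), (3, 2)])


def categories_one(conn, item_id):
    """Current layout: one join, at most one category per item."""
    rows = conn.execute(
        "SELECT c.value FROM item i JOIN content c ON c.id = i.content_id "
        "WHERE i.id = ?",
        (item_id,),
    )
    return [value for (value,) in rows]


def categories_many(conn, item_id):
    """Many-to-many layout: the junction table adds a join per lookup."""
    rows = conn.execute(
        "SELECT c.value FROM item_content ic "
        "JOIN content c ON c.id = ic.content_id "
        "WHERE ic.item_id = ? ORDER BY c.id",
        (item_id,),
    )
    return [value for (value,) in rows]


single = categories_one(conn, 3)
several = categories_many(conn, 3)
```

The sketch only shows the structural difference; it says nothing about how either layout performs in sqlite at scale, which is the open question raised above.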


2024-10-23 11:37