Log-database Schema Changes (SUPERSEDED)

(!) This blueprint has been implemented in manner resembling solutions 2. and .3. Consider this blueprint obsolete

Launchpad blueprint: database-schema-changes

During the 0.2 cycles we have identified a collection of changes we want to do to the database schema. This blueprint suggests three solution models of which one should be picked.

Solution 1: Small Changes

Documentation on the database design used in the 0.2 can be found in Zeitgeist/Document/DatabaseDesign0.2

Relational Mimetype

In the 0.2 series we have the mimetype of an item as a VARCHAR column in the db. We should the item.mimetype column to be an INT and introduce a new table called mimetype looking exactly like the uri, content, and source tables. Ie. create the table like:

CREATE TABLE IF NOT EXISTS mimetype
                        (id INTEGER PRIMARY KEY, value VARCHAR UNIQUE)
CREATE UNIQUE INDEX IF NOT EXISTS mimetype_value
                        ON mimetype(value)

Remove event.end

The event.end column is not being used and should be removed

Relational Origin

Just like the item.mimetype was split out in its own table, item.origin should simply point into the uri table (so no need for a new table).

Multiple Subjects Per Event

Right now the event table uses item_id as primary key, meaning that we can only have one subject per event. This might be a very bad idea considering that the user can very well delete 1000 selected files with 1 mouse click, or apply the tag "puppies" to the same 1000 files with one click from Nautilus. If we use a compound PK like the annotation-table already does then we could have multiple subjects per event. Specifically the event table should be created as

CREATE TABLE IF NOT EXISTS event (item_id INTEGER ,
                                  subject_id INTEGER, start INTEGER,
                                  end INTEGER, app_id INTEGER,
                                  PRIMARY KEY (item_id, subject_id))

(we use almost the same now, except that item_id is declared as PK)

Questionable: Origin moved to uri Table?

Siegfried has an idea - I don't completely get it :-) https://bugs.launchpad.net/zeitgeist/+bug/425258

Questionable: Remove item.origin?

Is item.origin being used, and is it useful?

Questionable: Remove the app table all together?

When we reference applications we use the app table. It currently has two columns id and info. We not just make applications normal items and set item.content = Content.APPLICATION? One less table can save us a SQL JOIN.

Solution 2: Separate Tables

This model is more invasive than solution 1, but addresses the same issues and then some. The idea is to apply solution 1 and then also not make the event and annotation tables "inherit data" from the item table. Specifically the item table will be exactly as from solution 1 - the event and annotation tables will look exactly like the item table but adding an extra column subject_id:

id          INT       # uri.id, PRIMARY KEY
subject_id  INT       # uri.id, the subject of the event or annotation
content_id  INT       # content.id
source_id   INT       # source.id
origin      INT       # uri.id, url the item can be said to originate from
text        VARCHAR   # the title or the name of the item (first thing the users sees)
mimetype    INT       # mimetype.id
icon        VARCHAR   # 
payload     BLOB      # Free-for-use array of raw bytes 

Solution 3: Split Databases

This model is more invasive than solutions 1 and 2, and addresses the same issues. In order to better fit with having items stored in an external storage (such as Tracker or DesktopCouch) we should split items, events, and annotations not just into separate tables but different SQLite database files on disk. This probably means that we can no longer do JOINs between these bases - perhaps, haven't checked the SQLite docs yet.

Perhaps keep the items+annotations in one DB and the events in another...

Projects/Zeitgeist/Blueprint/DatabaseSchemaChanges (last edited 2013-12-03 14:54:41 by WilliamJonMcCann)