Zeitgeist Engine API Draft

This page is only a draft and is still a work in progress.

This page presents a new DBus API for the Zeitgeist engine. To help implement the API, a data model as described in the Zeitgeist database design doc will be needed (unless, of course, the API is implemented on top of a full triple store like Tracker or KDE's Nepomuk work).

Design Ideas

Be compact: don't specify any more methods than are strictly necessary. Convenience methods go in helper libs
Minimize the number of DBus roundtrips
Make it possible to split the item+metadata repository implementation from the event logging framework. This way data and metadata can be stored in Tracker, CouchDB, or whatever, while the event log is implemented in another way

Problem: Sorting

We have to carefully consider how we want to sort the results... There are three cases:

When we take a list of URIs as input we can return the results in the same order as the input URIs
Sort by mtime - this probably requires an item.mtime column (which raises the question of whether we should have item.ctime and item.atime too)
Sort by use frequency - this might get tricky because we have to sum the use counts for each URI from the event table

Before we proceed much more with the design we should agree on a solution here...
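
To make the third case concrete, here is a minimal sketch of summing use counts per URI. It assumes a SQLite-backed event log with a hypothetical table event(subject, timestamp); neither the table layout nor the function name most_used_uris is part of this draft.

```python
import sqlite3

def most_used_uris(conn, limit=10):
    """Return (uri, use_count) pairs, most frequently used first.

    Assumes a hypothetical table event(subject, timestamp) holding one
    row per logged event. Ties are broken by URI so the order is stable.
    """
    return conn.execute(
        "SELECT subject, COUNT(*) AS uses FROM event "
        "GROUP BY subject ORDER BY uses DESC, subject ASC LIMIT ?",
        (limit,),
    ).fetchall()

# Illustrative data only.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE event (subject TEXT, timestamp INTEGER)")
conn.executemany(
    "INSERT INTO event VALUES (?, ?)",
    [("file:///a.txt", 1), ("file:///b.txt", 2), ("file:///a.txt", 3)],
)
print(most_used_uris(conn))
```

The aggregation runs inside the store, so only the top entries would have to cross DBus. Whether this stays cheap on a large event table is exactly the open question raised above.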

Problem: Repository/Log Joins

While it is nice to separate the item and metadata repository from the event log, it does incur some problems. We will likely need to do something equivalent to an SQL join between the two, i.e. select items based on given log criteria and/or select events for a set of items meeting certain criteria.

We really want to avoid forcing ourselves to filter through large result sets. We also want to minimize the number of DBus roundtrips.
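
One possible shape for such a join when the two stores are separate is sketched below: filter the log first, then fetch the matching items with a single batched lookup. The function name find_items_by_log, the dict standing in for the repository, and the sample URIs are all assumptions made for illustration.

```python
def find_items_by_log(events, repository, action, time_start, time_end):
    """Select items whose URI is the subject of a matching logged event.

    events     -- iterable of (uri, actor, action, subject, timestamp) tuples
    repository -- dict mapping item URI to item data; a stand-in for the
                  item repository, which may live in another store
    """
    # Step 1: filter the log and collect distinct subjects, first seen first.
    subjects = []
    seen = set()
    for _uri, _actor, ev_action, subject, timestamp in events:
        if ev_action != action or not (time_start < timestamp < time_end):
            continue
        if subject not in seen:
            seen.add(subject)
            subjects.append(subject)
    # Step 2: one batched lookup, standing in for a single GetItems call.
    return [repository[uri] for uri in subjects if uri in repository]

# Illustrative data only; the actor and action URIs are placeholders.
ACTOR = "file:///usr/share/applications/editor.desktop"
MODIFIED = "http://example.org/event#Modified"
OPENED = "http://example.org/event#Opened"
events = [
    ("", ACTOR, MODIFIED, "file:///a.txt", 100),
    ("", ACTOR, MODIFIED, "file:///b.txt", 200),
    ("", ACTOR, OPENED, "file:///c.txt", 150),
    ("", ACTOR, MODIFIED, "file:///a.txt", 180),
]
repository = {
    "file:///a.txt": {"text": "a"},
    "file:///b.txt": {"text": "b"},
    "file:///c.txt": {"text": "c"},
}
print(find_items_by_log(events, repository, MODIFIED, 50, 190))
```

This keeps the join at two steps regardless of result size, but the intermediate list of subjects can still grow large, which is the filtering cost the text above warns about.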

Interface: org.gnome.zeitgeist.Repository

The Zeitgeist engine needs detailed knowledge of all entities involved in the user's daily workflow in order to make proper guesses about usage patterns.

FIXME: This interface needs to be completely reconsidered!

Item Struct

We will use I as shorthand for the DBus signature of an item, sssssssay. The fields are:

uri - Item URI encoded as a string
content - The content type of the item, uniquely identified by the URI of the content type as found in the Nepomuk ontology
source - The source type of the item, uniquely identified by the URI of the source type as found in the Nepomuk ontology
origin - The URL from which the item originates
text - Human readable label for the item
mimetype - Mimetype of the item - if applicable
icon - Name of stock item as found in the XDG icon naming spec, or URL to icon file
payload - Freeform bytearray
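
As a rough illustration, the item struct could be mirrored on the client side like this. The class name Item, the helper signature_of, and the example.org URIs are hypothetical; only the field order and the signature sssssssay come from this draft.

```python
from typing import NamedTuple

class Item(NamedTuple):
    """Client-side mirror of the item struct, in DBus field order."""
    uri: str
    content: str
    source: str
    origin: str
    text: str
    mimetype: str
    icon: str
    payload: bytes

def signature_of(item):
    """Derive the DBus signature of the fields: 's' per string, 'ay' for bytes."""
    parts = []
    for value in item:
        if isinstance(value, str):
            parts.append("s")
        elif isinstance(value, bytes):
            parts.append("ay")
        else:
            raise TypeError("unsupported field type: %r" % type(value))
    return "".join(parts)

# Placeholder values; real content and source URIs would come from the ontology.
item = Item(
    uri="file:///home/user/notes.txt",
    content="http://example.org/ontology#Document",
    source="http://example.org/ontology#UserActivity",
    origin="file:///home/user",
    text="notes.txt",
    mimetype="text/plain",
    icon="text-x-generic",
    payload=b"",
)
print(signature_of(item))
```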

GetItems

GetItems(in as uri_list, in b expand_annotations, out a(IaI) results)

Retrieve a collection of items given by their URIs, optionally expanding all annotations on each item. The returned values are in the same order as in uri_list.

FindAnnotations

FindAnnotations(in as target_filter, in as content_filter, in as source_filter, in as origin_filter, in as mimetype_filter, out FIXME)

Any empty filter denotes a wildcard

FindItems

FindItems(in as content_filter, in as source_filter, in as origin_filter, in as mimetype_filter, out FIXME)

Any empty filter denotes a wildcard

SetItems

SetItems(in FIXME item_list)

DeleteItems

DeleteItems(in as uri_list)

Interface: org.gnome.zeitgeist.Log

This is the primary interface to access the Zeitgeist event log. It should be noted that events can be "promoted" to real items in the repository, in which case they can be the targets of annotations and other events!

Log Event Struct

For brevity we will use the signature E to describe (ssssu). This denotes an "event struct" containing:

uri - The URI of the event
actor - The program or entity responsible for generating the event. Applications should use the URI to their .desktop file.
- FIXME: For other types of events such as those generated by webapps (Google Docs etc.) and system notifications we need to figure out what to do.
action - A string identifying the type of event that happened. "Sent email", "Modified file", etc. This is encoded as a URI signifying some formal type in an Event Ontology we need to define.
subject - String URI of the item being affected by the event
timestamp - Unix epoch as unsigned int

LogEvents

LogEvents(in aE event_list, out as event_uris)

event_list : A list of event structs with empty strings as URIs. All fields of the struct must be valid and the engine will return an error if they are not. In this case none of the supplied events will be logged. If the URI field of the events is set it will simply be ignored.
Returns: A list containing the URIs the Zeitgeist engine assigned to the logged events
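
The all-or-nothing behaviour described above might look like the following sketch. The class EventLog, its in-memory dict, and the exact validation rules are assumptions; the URI pattern is the one suggested under Event URI Construction.

```python
import itertools
import time

class EventLog:
    """In-memory stand-in for the engine's event log (illustration only)."""

    def __init__(self):
        self._events = {}
        self._sequence = itertools.count(1)

    def log_events(self, event_list):
        """Log all events or none; return the URIs assigned by the engine."""
        # Validate the whole batch before storing anything.
        for event in event_list:
            if len(event) != 5:
                raise ValueError("event struct must have 5 fields")
            _uri, actor, action, subject, timestamp = event
            if not (actor and action and subject):
                raise ValueError("actor, action and subject must be non-empty")
            if not isinstance(timestamp, int) or timestamp < 0:
                raise ValueError("timestamp must be an unsigned integer")
        # Any URI supplied by the caller is ignored and replaced.
        uris = []
        for _uri, actor, action, subject, timestamp in event_list:
            uri = "zeitgeist://event/%d/%d" % (int(time.time()), next(self._sequence))
            self._events[uri] = (uri, actor, action, subject, timestamp)
            uris.append(uri)
        return uris

# Placeholder actor and action URIs.
log = EventLog()
assigned = log.log_events([
    ("", "file:///usr/share/applications/editor.desktop",
     "http://example.org/event#Modified", "file:///a.txt", 1000),
    ("ignored", "file:///usr/share/applications/editor.desktop",
     "http://example.org/event#Opened", "file:///b.txt", 1001),
])
print(assigned)
```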

GetEvents

GetEvents(in as event_uris, out aE events)

Look up a set of events given their URIs

FindEvents

FindEvents(in as actor_filter, in as action_filter, in as subject_filter, in u time_start, in u time_end, in u offset, in u count, out aE)

For the *_filter arguments an empty string denotes a wildcard.

actor_filter - A list of actors that the events must be logged for
action_filter - A list of actions that the events must be logged as
subject_filter - A list of subject URIs that the events must be logged on
item_content_filter - A list of content types the subject items must belong to
item_source_filter - A list of source types the subject items must belong to
time_start - The events must be logged with a timestamp later than this value
time_end - The events must be logged with a timestamp earlier than this value
offset - The offset into the result set from which to return events. For most uses this will be 0. To page through large result sets, increase this argument step by step, pulling in something like 20-100 results at a time
count - The number of events to return
Returns: A list of event structs matching the provided filters
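
The filter semantics can be sketched in a few lines. This version follows the method signature (time_start, time_end) and leaves out the item_content_filter and item_source_filter arguments. It assumes that an empty filter list is the wildcard and that results come back in log order; both are assumptions, as is the function name find_events.

```python
def find_events(events, actor_filter, action_filter, subject_filter,
                time_start, time_end, offset, count):
    """Return one page of event structs (uri, actor, action, subject, timestamp)."""

    def matches(value, allowed):
        # Assumption: an empty filter list matches everything.
        return not allowed or value in allowed

    hits = [
        event for event in events
        if matches(event[1], actor_filter)
        and matches(event[2], action_filter)
        and matches(event[3], subject_filter)
        and time_start < event[4] < time_end
    ]
    return hits[offset:offset + count]

# Illustrative log: five events from two placeholder actors.
events = [
    ("zeitgeist://event/10/1", "app-a", "modified", "file:///a.txt", 10),
    ("zeitgeist://event/20/2", "app-b", "opened", "file:///b.txt", 20),
    ("zeitgeist://event/30/3", "app-a", "opened", "file:///a.txt", 30),
    ("zeitgeist://event/40/4", "app-a", "modified", "file:///c.txt", 40),
    ("zeitgeist://event/50/5", "app-b", "modified", "file:///a.txt", 50),
]

# Page through all events two at a time.
offset = 0
while True:
    page = find_events(events, [], [], [], 0, 100, offset, 2)
    if not page:
        break
    print(offset, [event[0] for event in page])
    offset += len(page)
```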

Event URI Construction

We should assign a URI to each logged event. The event URI should probably contain a timestamp of some sort. The catch here is that we might have more than one event in any given millisecond, so the timestamp alone won't make the event URI unique.

The event URIs do not have to include a lot of information. Their prime purpose is to be unique ids for each event. Something like the below should do:

zeitgeist://event/${system_time}/${global_sequence_number}

Here global_sequence_number is simply a long that is incremented by 1 each time we log an event.
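
A minimal sketch of this scheme follows. The class name EventUriFactory, the millisecond resolution of system_time, and the injectable clock are assumptions made for illustration.

```python
import itertools
import threading
import time

class EventUriFactory:
    """Builds zeitgeist://event/<system_time>/<global_sequence_number> URIs."""

    def __init__(self, clock=time.time):
        self._clock = clock                 # injectable so the sketch is testable
        self._counter = itertools.count(1)  # the global sequence number
        self._lock = threading.Lock()

    def next_uri(self):
        # The lock keeps sequence numbers unique across threads.
        with self._lock:
            sequence = next(self._counter)
        millis = int(self._clock() * 1000)
        return "zeitgeist://event/%d/%d" % (millis, sequence)

# Two events in the same millisecond still get distinct URIs.
factory = EventUriFactory(clock=lambda: 1234.5)
first = factory.next_uri()
second = factory.next_uri()
print(first)
print(second)
```

Since the sequence number alone already guarantees uniqueness within one engine instance, the timestamp mostly serves readability; the counter would need to be persisted to stay unique across restarts.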

Iterator Idea

The Find* methods could use the following approach to do cursoring over large result sets (we don't want to send 1M objects in one DBus message). Each find method takes an object path as an extra argument; the object at that path is used as an iterator from which more items can be requested, while the find method itself returns the first batch of items. The object path is passed into the method rather than returned from it so that the caller can "pipeline" DBus requests: it does not have to wait for the response to Find*, and can simply issue a NextPage() on the iterator object straight after calling Find*.

Example:

FindFoo (in o iter_path, in s what_foo, out as first_page)

The server will then bind an object on the path iter_path with the following interface:

org.gnome.zeitgeist.Iterator.NextPage(in i max, out as values, out i offset)
org.gnome.zeitgeist.Iterator.Close()

A client could use this interface like this:

function receive_results(as values, i offset) {
   print "Got new page with data %s starting at %s" % (values, offset)
}

iter = CreateDBusProxy("/my/app/iter27", "org.gnome.zeitgeist.Iterator")
zeitgeist.FindFoo("/my/app/iter27", "this foo", async_result=receive_results)
// Don't wait for a reply, just continue fetching results!
iter.NextPage(20, async_result=receive_results)
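
On the server side, the iterator object could be backed by something like the sketch below. It leaves out all DBus plumbing and only models the paging state; the names PageIterator and find_foo, the first-page size, and the reading of the returned offset as the index of the first value in the page are assumptions.

```python
class PageIterator:
    """Paging state behind NextPage(max) and Close(), without DBus plumbing."""

    def __init__(self, values, start=0):
        self._values = list(values)
        self._offset = start
        self._closed = False

    def next_page(self, max_items):
        """Return (values, offset), where offset is the index of values[0]."""
        if self._closed:
            raise RuntimeError("iterator is closed")
        start = self._offset
        page = self._values[start:start + max_items]
        self._offset += len(page)
        return page, start

    def close(self):
        # Drop the result set so the server can free it.
        self._values = []
        self._closed = True

def find_foo(all_matches, first_page_size=2):
    """Return the first page plus an iterator positioned just after it."""
    first_page = all_matches[:first_page_size]
    return first_page, PageIterator(all_matches, start=len(first_page))

first_page, iterator = find_foo(["a", "b", "c", "d", "e"])
print(first_page)
print(iterator.next_page(2))
print(iterator.next_page(2))
print(iterator.next_page(2))
iterator.close()
```

An empty page signals that the result set is exhausted, at which point the client should call Close() so the server can release the iterator.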