Zeitgeist Event Ontology Draft

Launchpad blueprint: events-ontology

Bzr branch: lp:~zeitgeist/zeitgeist/ontology_definition

Hierarchical Types

Our current type system for events and subjects is flat. To fit in better with the rest of the semantic world we should support a hierarchical type system, for example so that USER_ACTIVITY and USER_NOTIFICATION are both sub types of a general EVENT type. (I am using our variable names from datamodel.py here, but we will of course use formal URIs to identify the types.)

Hierarchical types affect the way we perform queries against the DB. If I query for everything of type EVENT, my result should contain events of any type, e.g. USER_ACTIVITY, USER_NOTIFICATION, or any other event sub type we have.

Event Ontology

The event ontology should be written in TriG; see for example the Nepomuk File Ontology in TriG.

Technical Stuff

Query Expansion

To handle hierarchical types we must perform query expansion, i.e. recursively expand the type tree into a flat list of OR clauses:

event.interpretation='http://foo.com/types#UserActivity' OR event.interpretation='http://foo.com/types#UserNotification'

(!) Note: Query expansion requires that we know the entire ontology in advance, so that we can parse it into a tree structure in memory. We can still handle unknown fields, but they will not be expanded.
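The expansion step can be sketched roughly as follows. This is only an illustration under assumptions: TYPE_TREE, expand_type and build_where are made-up names, the Event URI is invented to match the placeholder namespace above, and the clause uses '?' placeholders instead of inlined string literals so that the URIs can be passed to the DB as parameters.

```python
# Hypothetical in-memory type tree: parent URI -> list of direct child URIs.
# The URIs reuse the placeholder namespace from the example above.
TYPE_TREE = {
    "http://foo.com/types#Event": [
        "http://foo.com/types#UserActivity",
        "http://foo.com/types#UserNotification",
    ],
}


def expand_type(uri, tree):
    """Return uri followed by all of its descendants, depth first."""
    expanded = [uri]
    for child in tree.get(uri, []):
        expanded.extend(expand_type(child, tree))
    return expanded


def build_where(uri, tree, column="event.interpretation"):
    """Build a parameterised OR clause covering uri and its sub types."""
    uris = expand_type(uri, tree)
    clause = " OR ".join("%s=?" % column for _ in uris)
    return clause, uris


clause, params = build_where("http://foo.com/types#Event", TYPE_TREE)
```

A URI that is missing from the tree is returned unchanged, which matches the note above: unknown values still work, they are just not expanded.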

Runtime Loading of Ontologies

We need to load the ontologies at runtime somehow, and we need to do so efficiently. I don't know what Tracker et al. do, but here are some ideas:

  1. Depend on the shared-desktop-ontologies package (which may or may not be packaged for <insert favorite distro>) and parse and load them when the engine starts

  2. Distribute .trig files with the engine and parse and load them when the engine starts

  3. At release/build time we compile ontologies into pickled Python objects and include these in the Zeitgeist package
  4. At release/build time we compile ontologies into Python code and include these in the Zeitgeist package
  5. Variations of these, e.g. caching pickled ontologies compiled from system-wide desktop ontologies

I dislike 1. and 2. because they will increase our IO load at startup a fair bit, as we will probably have to parse a handful of ontologies.
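For comparison, option 3 could look something like the sketch below. The dictionary layout, the function names and the file name are all assumptions made for illustration; the point is only that startup reads one pre-built file instead of parsing .trig sources.

```python
import os
import pickle
import tempfile

# Hypothetical parsed ontology: parent type name -> direct child type names.
ontology = {"EVENT": ["USER_ACTIVITY", "USER_NOTIFICATION"]}


def compile_ontology(tree, path):
    """Build-time step: serialise the parsed tree to a pickle file."""
    with open(path, "wb") as handle:
        pickle.dump(tree, handle, protocol=pickle.HIGHEST_PROTOCOL)


def load_ontology(path):
    """Startup step: read the tree back without touching any .trig file."""
    with open(path, "rb") as handle:
        return pickle.load(handle)


# A temporary directory stands in for the installed package data directory.
pickle_path = os.path.join(tempfile.mkdtemp(), "zeitgeist.pickle")
compile_ontology(ontology, pickle_path)
loaded = load_ontology(pickle_path)
```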

Kamstrup's Proposed Solution

In a nutshell: Use the Python module called rdflib to parse .trig files and write out Python modules. This will be managed at build-time by some autotools magic.

Ubuntu ships rdflib in main, and it is installed by default. I don't know about the other major distros.

  • (!) If you are on Ubuntu, look in /usr/share/doc/python-rdflib/examples/swap_primer.py for an example using rdflib

Concrete Details on Kamstrup's Proposal

Ontology Generation

  • We keep our ontology in data/ontology/zeitgeist.trig. We also keep a .trig version of the required Nepomuk ontologies in data/ontology (e.g. nie.trig and nfo.trig).

  • In zeitgeist/ontology/ we keep a list of stub files for our ontologies, for example zeitgeist/ontology/zeitgeist.py.in

  • On 'make': for each stub ontology zeitgeist/ontology/*.py.in, create zeitgeist/ontology/*.py based on data/ontology/*.trig (with some tool we've written using rdflib)

    • I have already made good progress on this last point, see the branch linked above -- MikkelKamstrup 2010-03-04 21:51:54
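The generation step might be sketched like this. It is not the tool from the branch: the stub text, the "@CHILDREN@" placeholder and render_module are assumptions, and the (child, parent) pairs are hard-coded here where the real tool would extract them from the .trig files with rdflib.

```python
from collections import defaultdict
from pprint import pformat

# Hypothetical stub content; "@CHILDREN@" is an assumed placeholder name.
STUB = "# Generated file, do not edit.\nCHILDREN = @CHILDREN@\n"

# (child, parent) pairs as they might be extracted from a .trig file.
SUBCLASS_PAIRS = [
    ("UserNotification", "Event"),
    ("UserActivity", "Event"),
]


def render_module(stub, pairs):
    """Fill the stub with a parent -> sorted children mapping."""
    children = defaultdict(list)
    for child, parent in pairs:
        children[parent].append(child)
    table = {parent: sorted(kids) for parent, kids in children.items()}
    return stub.replace("@CHILDREN@", pformat(table))


source = render_module(STUB, SUBCLASS_PAIRS)
namespace = {}
exec(source, namespace)  # check that the generated text is valid Python
```

Sorting the children keeps the generated module stable between builds, so regenerating it does not produce spurious diffs.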

Generated Python Classes

  • The generated ontology Python modules contain an instance of a zeitgeist.datamodel.SymbolCollection (this class already exists in trunk)

  • We must add methods to the zeitgeist.datamodel.Symbol class to return a list of all children (recursively). These methods are then called at query time to do query expansion to all subtypes.
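One possible shape for those methods is sketched below. The class is a simplified stand-in written for this example, not the existing zeitgeist.datamodel.Symbol; the constructor signature, the method names get_children and get_all_children, and the FocusEvent URI are assumptions.

```python
class Symbol(object):
    """Hypothetical stand-in, not the real zeitgeist.datamodel.Symbol."""

    def __init__(self, uri, parent=None):
        self.uri = uri
        self._children = []
        if parent is not None:
            parent._children.append(self)

    def get_children(self):
        """Return the direct sub types only."""
        return list(self._children)

    def get_all_children(self):
        """Return every sub type, recursively, in depth-first order."""
        result = []
        for child in self._children:
            result.append(child)
            result.extend(child.get_all_children())
        return result


EVENT = Symbol("http://foo.com/types#Event")
USER_ACTIVITY = Symbol("http://foo.com/types#UserActivity", parent=EVENT)
# Made-up grandchild, only here to exercise the recursion.
FOCUS_EVENT = Symbol("http://foo.com/types#FocusEvent", parent=USER_ACTIVITY)
USER_NOTIFICATION = Symbol("http://foo.com/types#UserNotification", parent=EVENT)

all_uris = [symbol.uri for symbol in EVENT.get_all_children()]
```

At query time the engine would match against the queried symbol's own URI plus every URI in this list.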

Projects/Zeitgeist/Blueprint/EventOntology (last edited 2013-12-03 14:54:41 by WilliamJonMcCann)