Web history - Bookmarks handling in tracker

Summary

In Tracker we are interested in storing the WebHistory and the bookmarks to offer the user a better "search experience". This data can provide very valuable information for finding "related items".

On the Epiphany side, the developers need efficient storage for this information. They have a clear set of requirements and have documented their work on this page

Web history - Bookmarks in tracker

Relevant classes and properties

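  • nfo:WebHistory: An instance of this class is one entry in the WebHistory. The examples below use the following properties:

    • nie:title "Title of the visited page"
    • nie:contentCreated "date of the visit"
    • nfo:domain "Domain of the visited site"
    • nfo:uri "The actual URL that was visited"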

  • nfo:Bookmark: An instance of this class represents one bookmark, with the following properties:

    • nie:title "Text we want to associate with the link"
    • nie:contentCreated "date the bookmark was created"
    • nfo:bookmarks: The actual URL that we want to bookmark

  • nao:Tag: Any of the previous items can be tagged by linking it (through nao:hasTag) to a nao:Tag instance.

    • nao:prefLabel The property with the actual tag text

Examples of data in turtle format

The user has visited http://www.semanticdesktop.org/ontologies/nie/

<urn:uuid:1331264514> a nfo:WebHistory;
        nie:title "Nepomuk information ontology";
        nie:contentCreated "2009-03-12T15:02:53";
        nfo:domain "http://www.semanticdesktop.org";
        nfo:uri "http://www.semanticdesktop.org/ontologies/nie/".

A bookmark to "Git for GNOME developers" tagged with "tutorial" and "git"

<urn:uuid:2094738855> a nfo:Bookmark;
        nie:title "Git for GNOME developers - Gnome Live!";
        nie:contentCreated "2008-04-12T14:17:54";
        nao:hasTag [a nao:Tag; nao:prefLabel "tutorial"];
        nao:hasTag [a nao:Tag; nao:prefLabel "git"];
        nfo:bookmarks <http://live.gnome.org/GitForGnomeDevelopers>.

Examples of insertions and queries

Creating the previous WebHistory item. This is the SPARQL sent to Tracker (using the SparqlUpdate method):

INSERT {
<urn:uuid:1331264514> a nfo:WebHistory;
        nie:title "Nepomuk information ontology";
        nie:contentCreated "2009-03-12T15:02:53";
        nfo:domain "http://www.semanticdesktop.org";
        nfo:uri "http://www.semanticdesktop.org/ontologies/nie/".
}
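
A browser would normally generate such a statement rather than write it by hand. The following is a minimal Python sketch of how that could look, not code from Tracker or Epiphany. The helper names sparql_string and build_history_insert are hypothetical, the domain is derived from the URL in the same scheme-plus-host form as the example above, and actually sending the statement through SparqlUpdate is left out.

```python
import uuid
from urllib.parse import urlsplit


def sparql_string(text):
    """Quote text as a SPARQL string literal, escaping special characters."""
    escaped = (
        text.replace("\\", "\\\\")
        .replace('"', '\\"')
        .replace("\n", "\\n")
        .replace("\r", "\\r")
    )
    return '"' + escaped + '"'


def build_history_insert(title, url, visited, urn=None):
    """Return a SPARQL INSERT statement for one nfo:WebHistory entry.

    `visited` is a datetime object. `urn` defaults to a new urn:uuid
    (assumption: any unique URN is acceptable as the resource name).
    """
    parts = urlsplit(url)
    domain = parts.scheme + "://" + parts.netloc
    if urn is None:
        urn = "urn:uuid:" + str(uuid.uuid4())
    stamp = visited.strftime("%Y-%m-%dT%H:%M:%S")
    lines = [
        "INSERT {",
        f"<{urn}> a nfo:WebHistory;",
        f"        nie:title {sparql_string(title)};",
        f'        nie:contentCreated "{stamp}";',
        f"        nfo:domain {sparql_string(domain)};",
        f"        nfo:uri {sparql_string(url)}.",
        "}",
    ]
    return "\n".join(lines)
```

Escaping the title matters because page titles come from arbitrary websites and may contain quotes or backslashes that would otherwise break the statement.
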
  • Viewing history/bookmarks items by visit date (e.g. today, yesterday, last week, last month, more than a month ago, etc.).

SELECT ?title ?date WHERE {
      ?entry a nfo:WebHistory ;
         nie:title ?title ;
         nie:contentCreated ?date .
} ORDER BY ?date LIMIT 10
  • For yesterday/last week/last month/... (untested query, but it should look very similar; "last_week_day" is a placeholder for the cutoff date)

SELECT ?entry ?title WHERE {
      ?entry a nfo:WebHistory ;
         nie:title ?title ;
         nie:contentCreated ?date .
      FILTER (?date > "last_week_day")
} ORDER BY ?date 
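
The "last_week_day" value has to be computed by the application. As a hedged illustration, the Python sketch below derives a cutoff timestamp for a few named periods and substitutes it into the query above. The function names cutoff_for and history_since_query are hypothetical, a month is approximated as 30 days, and the timestamp uses the same format as the nie:contentCreated values in the Turtle examples.

```python
from datetime import datetime, timedelta

# Assumption: each period is counted back from midnight of the current day.
PERIOD_OFFSETS = {
    "today": timedelta(days=0),
    "yesterday": timedelta(days=1),
    "last week": timedelta(days=7),
    "last month": timedelta(days=30),  # approximation, not a calendar month
}


def cutoff_for(period, now=None):
    """Return the datetime at which the named period starts."""
    if now is None:
        now = datetime.now()
    midnight = now.replace(hour=0, minute=0, second=0, microsecond=0)
    return midnight - PERIOD_OFFSETS[period]


def history_since_query(cutoff):
    """Return the history query with the cutoff filled into the FILTER."""
    stamp = cutoff.strftime("%Y-%m-%dT%H:%M:%S")
    lines = [
        "SELECT ?entry ?title WHERE {",
        "      ?entry a nfo:WebHistory ;",
        "         nie:title ?title ;",
        "         nie:contentCreated ?date .",
        f'      FILTER (?date > "{stamp}")',
        "} ORDER BY ?date",
    ]
    return "\n".join(lines)
```

Note that each query returns everything visited since the start of the period, so "yesterday" also includes today's entries; an upper bound would need a second FILTER condition.
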
  • Searching history/bookmark items using page URL and title keywords (and optionally full page text)
    • Searching for the word "design" anywhere in the WebHistory entry

SELECT ?entry ?title WHERE {
      ?entry a nfo:WebHistory ;
             nie:title ?title ;
             fts:match "design" .
}
  • Viewing history items grouped by websites
    • Last 100 entries in the WebHistory grouped by domain:

SELECT ?entry ?title WHERE {
      ?entry a nfo:WebHistory ;
         nie:title ?title ;
         nfo:domain ?domain.
} GROUP BY ?domain LIMIT 100
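
If the grouping is done in the application instead of in the query, it could look like the Python sketch below. This assumes a variant of the query that also selects ?domain and has no GROUP BY clause, and it assumes the results arrive as (entry, title, domain) tuples of strings. Both the row shape and the function name group_by_domain are hypothetical.

```python
def group_by_domain(rows):
    """Group (entry, title, domain) rows by domain.

    Domains appear in the order they are first seen, and entries keep
    their original order within each domain.
    """
    groups = {}
    for entry, title, domain in rows:
        groups.setdefault(domain, []).append((entry, title))
    return groups
```

Because the rows keep their order, sorting by date in the query and grouping afterwards gives each website's entries in visit order.
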
  • Viewing bookmarked items grouped by topics
    • Grouping by tag (assuming tags = topics)

SELECT ?entry ?title WHERE {
      ?entry a nfo:Bookmark ;
         nie:title ?title ;
         nao:hasTag ?tag.
} GROUP BY ?tag
  • Sorting history/bookmark items by visit date or relevance (e.g. typed or clicked pages are given more weight)

SELECT ?entry ?title WHERE {
      ?entry a nfo:WebHistory ;
         nie:title ?title ;
         nie:contentCreated ?date.
} ORDER BY ?date
  • It would be possible to add a "hit" property (set and updated by the browser; nao:hit below is not an existing property) and then do something like:

SELECT ?entry ?title WHERE {
      ?entry a nfo:WebHistory ;
         nie:title ?title ;
         nao:hit ?hit.
} ORDER BY DESC(?hit)

How to test this (Development)

  • Development ongoing in the git branch 'vstore' in git://git.codethink.co.uk/git/tracker
  • Check out, compile and install. WARNING: Completely remove your old tracker installation!

  • Use tracker-sparql to send queries to the daemon
  • Join #tracker on GIMPNet and the mailing list

Please note that this code is under HEAVY development and is not ready for release.

Some remarks

  • Applications are responsible for removing old/invalid entries
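
As an illustration of that responsibility, an application could periodically pick out the history entries that are older than some retention period. The Python sketch below only selects the candidates; how they are then deleted from Tracker is not covered here. The function name expired_entries, the (urn, date) row shape and the 90-day default are all assumptions.

```python
from datetime import datetime, timedelta


def expired_entries(rows, now, max_age_days=90):
    """Return the URNs of entries older than max_age_days.

    `rows` holds (urn, date) pairs, with the date formatted like the
    nie:contentCreated values on this page, e.g. "2009-03-12T15:02:53".
    """
    cutoff = now - timedelta(days=max_age_days)
    expired = []
    for urn, created in rows:
        when = datetime.strptime(created, "%Y-%m-%dT%H:%M:%S")
        if when < cutoff:
            expired.append(urn)
    return expired
```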

Attic/Tracker/WebHistory (last edited 2023-08-14 12:50:31 by CarlosGarnacho)