Allow Tracker to query remote services

What is available: the APIs and concepts

What we're trying to build is a bridge between Tracker and several websites. The idea is to pull the data from remote sites directly into Tracker, so that there's no delay when the user runs a query. The bridges insert the pulled data directly into Tracker's store, using its DBus interface.
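As an illustration of what a bridge might send to the store, here is a minimal sketch that builds a SPARQL update for one remote photo. The function names are hypothetical, and the classes and properties used (nmm:Photo, nfo:RemoteDataObject, nie:url, nie:title) are an assumed mapping rather than the one the bridges actually use. It also assumes the store already knows the ontology prefixes. The DBus call itself is left out on purpose.

```python
def escape_literal(value: str) -> str:
    """Escape a string for use inside a double-quoted SPARQL literal."""
    return (
        value.replace("\\", "\\\\")
        .replace('"', '\\"')
        .replace("\n", "\\n")
        .replace("\r", "\\r")
    )


def build_photo_insert(url: str, title: str) -> str:
    """Return a SPARQL update describing one remote photo.

    Hypothetical helper: the ontology terms below are assumptions,
    not necessarily what the real bridges insert.
    """
    # Reject characters that cannot appear inside a SPARQL <IRI>.
    if any(ch in url for ch in '<>" \n\t'):
        raise ValueError("URL is not safe to use as a SPARQL IRI")
    return (
        "INSERT { "
        f"<{url}> a nmm:Photo, nfo:RemoteDataObject ; "
        f'nie:url "{escape_literal(url)}" ; '
        f'nie:title "{escape_literal(title)}" . '
        "}"
    )
```

The resulting string is what would be handed to Tracker's update method over DBus. Escaping the title matters because it comes straight from the remote site.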

Flickr

Flickr has a rather complete API, which allows searching and listing albums quite easily. It can do full text search in tags, title, author...
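To give an idea of what a text search looks like at the HTTP level, here is a small sketch that builds a request URL for the flickr.photos.search method of Flickr's REST API. It only illustrates the API: the bridge itself goes through a C library, and the function name and the placeholder API key are made up for the example.

```python
from urllib.parse import urlencode

FLICKR_REST_ENDPOINT = "https://api.flickr.com/services/rest/"


def build_search_url(api_key: str, text: str, page: int = 1) -> str:
    """Build the URL of a free-text photo search (hypothetical helper)."""
    params = {
        "method": "flickr.photos.search",
        "api_key": api_key,
        "text": text,          # free-text search term
        "page": page,
        "format": "json",
        "nojsoncallback": 1,   # ask for plain JSON, not a JSONP wrapper
    }
    return FLICKR_REST_ENDPOINT + "?" + urlencode(params)
```

For example, build_search_url("YOUR_API_KEY", "sunset beach") gives a URL that can be fetched with any HTTP client. A real key is needed for the request to succeed.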

Facebook

Facebook also offers a complete API to access pictures, friends etc. Facebook also has a query language called FQL, although I don't know what's possible with it yet.

Google Documents

You can access many features of Google Docs using the GData API. The API allows full text query on documents, as well as queries on metadata.

How we do it: the bridges, and the bridge manager

The bridges are standalone processes, and communicate with Tracker via DBus. This approach has several advantages:

  • we limit the impact on Tracker's codebase
  • it makes developing bridges and shipping them easier
  • if a bridge crashes, the other services are not affected
  • bridges can be written in any language

Of course, there are also a few drawbacks, among them a slight overhead due to IPC. Memory usage is kept to a minimum by making all the common functions available in a library shared by all the bridges.

All the bridges are controlled by a bridge manager, which is part of Tracker. Bridges are started via DBus, and turned off as soon as they're not needed anymore. The manager asks the bridge to pull data every N minutes, or when it knows some new data is available (for example when Tracker indexes a mail/IM mentioning an online resource).
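The pull policy described above (every N minutes, or earlier when new data is known to be available) can be sketched as follows. This is not the manager's actual code: the class and method names are hypothetical, the real manager would trigger the pull over DBus, and treating a bridge that has never pulled as immediately due is my own assumption.

```python
import time
from typing import Callable, Dict, List, Optional, Set


class PullScheduler:
    """Decide which bridges should pull now (hypothetical sketch)."""

    def __init__(self, interval_seconds: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._interval = interval_seconds
        self._clock = clock
        self._last_pull: Dict[str, Optional[float]] = {}
        self._pending: Set[str] = set()

    def register(self, bridge: str) -> None:
        """Start tracking a bridge; it has not pulled anything yet."""
        self._last_pull.setdefault(bridge, None)

    def notify_new_data(self, bridge: str) -> None:
        """Record that new remote data is known to exist for a bridge."""
        if bridge in self._last_pull:
            self._pending.add(bridge)

    def due_bridges(self) -> List[str]:
        """Return the bridges that should pull now and mark them as pulled."""
        now = self._clock()
        due = []
        for bridge, last in self._last_pull.items():
            interval_elapsed = last is None or now - last >= self._interval
            if bridge in self._pending or interval_elapsed:
                due.append(bridge)
                self._last_pull[bridge] = now
        self._pending.clear()
        return due
```

The clock is injectable so the policy can be exercised without waiting for real time to pass. The manager would call due_bridges() from its main loop and ask each returned bridge to pull.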

The common lib provides the bridges with credential storage functions. Any backend can be plugged in, although so far gnome keyring is the only one.
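A pluggable credential store can be sketched like this. All the names are hypothetical and do not reflect the common lib's real API. The in-memory backend only stands in for a real one such as gnome keyring, and must not be used to hold actual secrets.

```python
from abc import ABC, abstractmethod
from typing import Dict, Optional, Tuple


class CredentialBackend(ABC):
    """Interface every storage backend implements (hypothetical)."""

    @abstractmethod
    def store(self, service: str, username: str, secret: str) -> None:
        """Save the credentials for a service."""

    @abstractmethod
    def lookup(self, service: str) -> Optional[Tuple[str, str]]:
        """Return (username, secret) for a service, or None if unknown."""


class InMemoryBackend(CredentialBackend):
    """Stand-in backend for illustration only; nothing is persisted."""

    def __init__(self) -> None:
        self._entries: Dict[str, Tuple[str, str]] = {}

    def store(self, service: str, username: str, secret: str) -> None:
        self._entries[service] = (username, secret)

    def lookup(self, service: str) -> Optional[Tuple[str, str]]:
        return self._entries.get(service)


class CredentialStore:
    """What a bridge talks to; the backend is chosen at construction."""

    def __init__(self, backend: CredentialBackend) -> None:
        self._backend = backend

    def save(self, service: str, username: str, secret: str) -> None:
        self._backend.store(service, username, secret)

    def get(self, service: str) -> Optional[Tuple[str, str]]:
        return self._backend.lookup(service)
```

A bridge would only ever see CredentialStore, so swapping the backend does not touch bridge code.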

The technical stuff

Let's not reinvent the wheel

As much as possible, I use existing libraries. Developing everything from scratch would be time-consuming, and the result would be harder to maintain. I'll have to spend some time evaluating the different available solutions, and the potential modifications they would require.

Flickr

There is a lib to access the Flickr API based on Curl, called flickcurl [1]. According to its webpage, it covers the whole Flickr API, and Curl is a well-known and stable piece of software. Ideally, I'd now like to ditch flickcurl for librest, but a few bugs still prevent me from doing so.

Facebook

A little googling didn't turn up any C library to access Facebook. I therefore use librest to access Facebook's API.

Google Documents

There's an ongoing effort, libgdata [2], to provide a C library binding Google's API. Thibault Saunier implemented GDocs support for his 2009 GSOC project.

The bridge manager will use NetworkManager if possible to take care of switching off the bridges when the user goes offline. Every bridge has an associated desktop file to describe it (DBus name and path, name, icon...).
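Here is a sketch of how such a desktop file could be read. The X-Bridge-DBusName and X-Bridge-DBusPath keys, the org.example names, and the helper names are invented for the example; the real key names used by the bridges may differ.

```python
import configparser
from dataclasses import dataclass

# Hypothetical bridge description; the X-Bridge-* keys are assumptions.
EXAMPLE_DESKTOP_FILE = """\
[Desktop Entry]
Name=Flickr
Icon=flickr
X-Bridge-DBusName=org.example.Bridge.Flickr
X-Bridge-DBusPath=/org/example/Bridge/Flickr
"""


@dataclass
class BridgeDescription:
    name: str
    icon: str
    dbus_name: str
    dbus_path: str


def parse_bridge_description(text: str) -> BridgeDescription:
    """Extract the fields the manager needs from a desktop file."""
    parser = configparser.ConfigParser(interpolation=None)
    parser.optionxform = str  # desktop file keys are case-sensitive
    parser.read_string(text)
    entry = parser["Desktop Entry"]
    return BridgeDescription(
        name=entry["Name"],
        icon=entry["Icon"],
        dbus_name=entry["X-Bridge-DBusName"],
        dbus_path=entry["X-Bridge-DBusPath"],
    )
```

With this, the manager could list the available bridges by scanning a directory of desktop files, without starting any bridge process.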

Ontologies

So far, I only index remote pictures, so I use nmo. The Nepomuk ontologies provide everything I need to store the metadata I can retrieve online.

The DBus interface

DBus is used for communication between the bridge manager and the bridges. Basically, it's how Tracker talks to the bridges. Bridges are started through DBus activation, and turned off when they're not needed.

Current limitations

  • The Tracker extractor cannot index a data stream directly: the data to index has to be a valid gvfs URL. We need a way for the extractor either to index a data stream, or to index a file and let us modify the resource identifier afterwards. SOLVED: we can use the same functions Tracker uses when files are renamed.

Bookmarks

I want to play!

I'm in the process of including my work in Tracker's git. Meanwhile, you can have a look at the code at http://git.mymadcat.com/gsoc

Remarks

  • The flickcurl bindings are very incomplete, because I have to write them by hand, which is not that much fun. I'll add missing functions when I need them.
  • I use a custom version of the gnome-keyring vala bindings to fix some issues. I'll push it upstream very soon.

Attic/Tracker/TrackerWebService (last edited 2023-08-14 12:50:11 by CarlosGarnacho)