Allow Tracker to query remote services
What is available: the APIs and concepts
What we're trying to build is a bridge between Tracker and several websites. The idea is to pull the data from remote sites directly into Tracker, so that there's no delay when the user runs a query. The bridges insert the pulled data directly into Tracker's store, using its DBus interface.
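As a rough illustration of what a bridge might send to the store, here is a minimal sketch that builds a SPARQL update string for one remote picture. The function names are hypothetical, and the vocabulary terms (nfo:RemoteDataObject, nie:title) and the bare INSERT form are assumptions, not necessarily what the bridges use. The DBus call that would carry the string to Tracker is deliberately left out.

```python
def sparql_literal(value: str) -> str:
    """Quote a Python string as a SPARQL string literal."""
    escaped = (
        value.replace("\\", "\\\\")
        .replace('"', '\\"')
        .replace("\n", "\\n")
    )
    return f'"{escaped}"'


def build_insert(url: str, title: str) -> str:
    """Build an update describing one remote resource (hypothetical helper).

    The class and property names below are assumptions chosen for
    illustration; a real bridge would use whatever ontology terms it needs.
    """
    if any(ch in url for ch in '<>" \n'):
        raise ValueError("URL contains characters not allowed inside <...>")
    return (
        "INSERT { "
        f"<{url}> a nfo:RemoteDataObject ; "
        f"nie:title {sparql_literal(title)} "
        "}"
    )


if __name__ == "__main__":
    print(build_insert("http://example.org/photo/1", "Holidays"))
```

Escaping the title matters because it comes from a remote site and may contain quotes or backslashes.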
Flickr has a rather complete API, which allows searching and listing albums quite easily. It can do full text search in tags, title, author...
Facebook also offers a complete API to access pictures, friends etc. It also has a query language called FQL, although I don't know yet what is possible with it.
You can access many features of Google Docs using the GData API. The API allows full text query on documents, as well as queries on metadata.
How we do it: the bridges, and the bridge manager
The bridges are standalone processes, and communicate with Tracker via DBus. This approach has several advantages:
- we limit the impact on Tracker's codebase
- it makes developing bridges and shipping them easier
- if a bridge crashes, the other services are not affected
- bridges can be written in any language
Of course, there are also a few drawbacks, among them a slight overhead due to IPC. Memory usage is kept to a minimum by making all the common functions available in a library shared by all the bridges.
All the bridges are controlled by a bridge manager, which is part of Tracker. Bridges are started via DBus, and turned off as soon as they're not needed anymore. The manager asks the bridge to pull data every N minutes, or when it knows some new data is available (for example when Tracker indexes a mail/IM mentioning an online resource).
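The pull policy described above (every N minutes, or earlier when new data is known to exist) can be sketched as follows. This is only an illustration: the class and function names are hypothetical, time is passed in explicitly so the logic can be tested, and starting or stopping bridges over DBus is not shown.

```python
from dataclasses import dataclass


@dataclass
class BridgeSchedule:
    """Tracks when one bridge should next be asked to pull data."""

    name: str
    interval_s: float                 # the "N minutes" of the text, in seconds
    last_pull: float = float("-inf")  # never pulled yet
    dirty: bool = False               # True when new remote data is known

    def notify_new_data(self) -> None:
        """Called e.g. when an indexed mail mentions an online resource."""
        self.dirty = True

    def is_due(self, now: float) -> bool:
        """A pull is due if data is pending or the interval has elapsed."""
        return self.dirty or (now - self.last_pull) >= self.interval_s

    def mark_pulled(self, now: float) -> None:
        """Record a completed pull and clear the pending flag."""
        self.last_pull = now
        self.dirty = False


def due_bridges(schedules, now):
    """Return the names of the bridges the manager should ask to pull."""
    return [s.name for s in schedules if s.is_due(now)]


if __name__ == "__main__":
    flickr = BridgeSchedule("flickr", interval_s=600)
    facebook = BridgeSchedule("facebook", interval_s=600)
    flickr.mark_pulled(0)
    facebook.mark_pulled(0)
    facebook.notify_new_data()
    print(due_bridges([flickr, facebook], now=60))
```

In a real manager the `now` value would come from a monotonic clock, and a bridge that is not due could be left switched off until its next pull.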
The common lib provides the bridges with credential storage functions. Any backend can be plugged in; so far, gnome-keyring is the only one.
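One way to picture the pluggable credential storage is an abstract backend interface with interchangeable implementations. The sketch below is hypothetical throughout: none of these names come from the actual common lib, and an in-memory backend stands in for gnome-keyring so that the example runs without a keyring daemon.

```python
from abc import ABC, abstractmethod
from typing import Dict, Optional, Tuple


class CredentialBackend(ABC):
    """Interface a storage backend would implement (hypothetical)."""

    @abstractmethod
    def store(self, service: str, username: str, secret: str) -> None:
        ...

    @abstractmethod
    def lookup(self, service: str) -> Optional[Tuple[str, str]]:
        ...


class InMemoryBackend(CredentialBackend):
    """Stand-in backend for illustration; keeps secrets in a dict."""

    def __init__(self) -> None:
        self._items: Dict[str, Tuple[str, str]] = {}

    def store(self, service: str, username: str, secret: str) -> None:
        self._items[service] = (username, secret)

    def lookup(self, service: str) -> Optional[Tuple[str, str]]:
        return self._items.get(service)


class CredentialStore:
    """What a bridge would call; it only knows about the interface."""

    def __init__(self, backend: CredentialBackend) -> None:
        self._backend = backend

    def save(self, service: str, username: str, secret: str) -> None:
        self._backend.store(service, username, secret)

    def get(self, service: str) -> Optional[Tuple[str, str]]:
        return self._backend.lookup(service)


if __name__ == "__main__":
    store = CredentialStore(InMemoryBackend())
    store.save("flickr", "alice", "token-123")
    print(store.get("flickr"))
```

Because bridges only see `CredentialStore`, swapping the backend would not require changing any bridge code.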
The technical stuff
Let's not reinvent the wheel
As much as possible, I use existing libraries. Developing everything from scratch would be time consuming, and harder to maintain. I'll have to spend some time evaluating the different available solutions, and the potential modifications they would require.
There is a library to access the Flickr API based on Curl, called flickcurl. According to its webpage, it covers all of the Flickr API, and Curl is a well-known and stable piece of software. Ideally, I'd now like to ditch flickcurl for librest, but a few bugs still prevent me from doing so.
A little googling didn't turn up any C library to access Facebook. I therefore use librest to access Facebook's API.
There's an ongoing effort, libgdata, to provide a C library binding Google's API. Thibault Saunier implemented GDocs support for his 2009 GSoC project.
The bridge manager will use NetworkManager if possible to take care of switching off the bridges when the user goes offline. Every bridge has an associated desktop file to describe it (DBus name and path, name, icon...).
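To give an idea of how the manager could read such a description, here is a minimal sketch that parses a desktop-style file with Python's standard configparser. The key names X-Example-DBusName and X-Example-DBusPath, the example values, and the helper names are all made up for illustration; the real desktop files may use different keys.

```python
import configparser
from dataclasses import dataclass

# Hypothetical bridge description; key names and values are assumptions.
EXAMPLE = """\
[Desktop Entry]
Name=Flickr
Icon=flickr
X-Example-DBusName=org.example.Bridge.Flickr
X-Example-DBusPath=/org/example/Bridge/Flickr
"""


@dataclass
class BridgeDescription:
    name: str
    icon: str
    dbus_name: str
    dbus_path: str


def parse_bridge_description(text: str) -> BridgeDescription:
    """Extract the fields the manager needs from a desktop-style file."""
    parser = configparser.ConfigParser(interpolation=None)
    parser.optionxform = str  # keep key case; desktop keys are case sensitive
    parser.read_string(text)
    entry = parser["Desktop Entry"]
    return BridgeDescription(
        name=entry["Name"],
        icon=entry["Icon"],
        dbus_name=entry["X-Example-DBusName"],
        dbus_path=entry["X-Example-DBusPath"],
    )


if __name__ == "__main__":
    print(parse_bridge_description(EXAMPLE))
```

A missing key raises `KeyError` here, which is one simple way to reject an incomplete bridge description.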
So far, I only index remote pictures, so I use nmo. The Nepomuk ontologies provide everything I need to store the metadata I can retrieve online.
The DBus interface
DBus is used for communication between the bridge manager and the bridges. Basically, it's how Tracker talks to the bridges. Bridges are started through DBus activation, and turned off when they're not needed.
The Tracker extractor cannot index a data stream directly: the data to index has to be a valid gvfs URL. We need a way for the extractor either to index a data stream, or to index a file and let us modify the resource identifier afterwards. --> SOLVED: we can use the same functions Tracker uses when files are renamed.
http://git.gnome.org/cgit/gio-strigi/ : GIO bridge for streamanalyzer
I want to play!
I'm in the process of including my work in Tracker's git. Meanwhile, you can have a look at the code at http://git.mymadcat.com/gsoc
- The flickcurl bindings are very incomplete, because I have to write them by hand, which is not that much fun. I'll add the missing functions as I need them.
- I use a custom version of the gnome-keyring Vala bindings to fix some issues. I'll push it upstream very soon.