Tracker Roadmap and Wishlist

Roadmap

Tracker doesn't have a full-time development team. We, the maintainers, welcome your contributions to improve the project and will volunteer our time to review them, but a formal roadmap would be dishonest -- this blog post explains why.

Our general goals are...

Wishlist

The maintainers use GitLab issues for tasks which are small and/or high priority. This wishlist records things we would like to see, but which aren't actively being worked on. If you're interested in working on something here, please contact us -- we would love to help you get started.

The website currently has no logo, and we would like a new one! You can see the older logo here. And what about a more distinctive website design?

Better file system monitoring

Currently tracker-miner-fs uses the inotify kernel API. Inotify cannot watch a directory tree recursively, so Tracker must create one watch for each directory that is monitored. Each inotify watch has a performance impact on the kernel. When indexing large content collections, Tracker might use all the available inotify watches, breaking other apps.

See also: http://wingolog.org/archives/2018/05/21/correct-or-inotify-pick-one
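
To get a rough sense of how many watches a recursive monitor would need, the sketch below counts the directories under a root and reads the per-user limit that Linux exposes in /proc/sys/fs/inotify/max_user_watches. It is only an illustration, not how tracker-miner-fs accounts for its watches, and the function names are hypothetical.

```python
import os

# Linux exposes the per-user inotify watch limit here.
MAX_WATCHES_PATH = "/proc/sys/fs/inotify/max_user_watches"


def count_directories(root):
    """Count root plus every directory below it.

    With inotify, each of these directories needs its own watch.
    Symbolic links to directories are not followed.
    """
    total = 0
    for _dirpath, _dirnames, _filenames in os.walk(root):
        total += 1
    return total


def read_watch_limit(path=MAX_WATCHES_PATH):
    """Return the per-user watch limit, or None if it cannot be read."""
    try:
        with open(path) as f:
            return int(f.read().strip())
    except (OSError, ValueError):
        return None
```

Comparing count_directories() for an indexed location against read_watch_limit() shows how much of the shared per-user budget a single monitored tree would consume.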

Since Linux 5.1, the fanotify kernel API allows recursive directory watches. We should try to use this for Tracker's monitoring, and see if it works better.

Content apps need to index custom locations

See: https://gitlab.gnome.org/GNOME/gnome-music/-/issues/69

Totem wants a similar change, and Photos may too.

Content-addressed file IDs

Tracker separates the concepts of Data object (a file) and Information element (contents of a file). We refer to the file by a URI such as <file:///home/sam/Photos/Cat.jpg>. We refer to the contents with a UUID such as <urn:bnode:123456ABCDE>.

Apps use the latter ID to refer to the file. For example, Photos may create a "cat pictures" album that contains <urn:bnode:123456ABCDE>. The good thing is that if Cat.jpg is renamed, the UUID remains the same and the album doesn't need to be updated. The downside is that if the user runs tracker3 reset --hard, the filesystem will be reindexed from scratch and the Cat.jpg file will have a different UUID -- effectively losing the photo albums. (This could be mitigated by exporting the albums before resetting the database.)

Instead of generating a UUID, we could use the hash of the file to refer to its contents. The 'database reset' problem would no longer exist as the files would have a stable ID.

This has some open questions:

  • Can we hash every file with acceptable IO performance?
    • We could hash a low-resolution version of photos - maybe even the thumbnail if it already exists.
  • What do we do when the contents of the file change?
    • For some MIME types, the file can be considered immutable; e.g. video and audio content will almost never change. If a photo changes, then it can be considered to be an entirely different thing. Most photo editors (from Darktable to GNOME Photos) are non-destructive these days, so this is probably mostly a problem for editable documents.
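
As a minimal sketch of the idea, the following derives an ID from a file's bytes, so the ID survives both a rename and a database reset but changes whenever the contents change. The "urn:sha256:" prefix and the function name are assumptions made for illustration; Tracker does not define such a scheme.

```python
import hashlib


def content_id(path, chunk_size=1024 * 1024):
    """Return an ID derived from the SHA-256 of the file's contents.

    The file is read in chunks so large media files are not loaded
    into memory at once. The "urn:sha256:" prefix is hypothetical.
    """
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return "urn:sha256:" + digest.hexdigest()
```

Hashing the full file this way is exactly the IO cost raised in the first open question; hashing a thumbnail instead would trade that cost for a weaker notion of identity.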

Help when formulating SPARQL queries

It's easy to accidentally write SPARQL queries that return no matches. A simple example:

SELECT ?url ?mime {
  ?url a nfo:FileDataObject ;
     nie:mimeType ?mime .
  FILTER (tracker:uri-is-descendant('file:///home/sam', ?url))
}

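There are two problems with this query -- 'nie:mimeType' should be 'nie:interpretedAs/nie:mimeType', because the MIME type is a property of the information element that the file is interpreted as, not of the file itself; and 'file:///home/sam' should be <file:///home/sam>.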

Writing queries would be better if Tracker warned the user in case of mistakes like these. SPARQL itself allows queries to return unbound variables, but I think we could (optionally) warn in these cases.
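
One of the two mistakes above, a URI written as a quoted string, can be spotted with a simple text check. The sketch below is a hypothetical lint helper and not part of Tracker: it flags string literals that look like URIs, without parsing the query, so it can produce false positives where a string is really intended.

```python
import re

# A quoted literal whose content starts with "scheme://".
URI_LIKE_LITERAL = re.compile(
    r"""(['"])([A-Za-z][A-Za-z0-9+.-]*://[^'"]*)\1"""
)


def find_uri_like_literals(query):
    """Return the quoted strings in query that look like URIs."""
    return [match.group(2) for match in URI_LIKE_LITERAL.finditer(query)]


def warnings_for(query):
    """Build one human-readable warning per suspicious literal."""
    return [
        "'%s' is a string literal; did you mean <%s>?" % (uri, uri)
        for uri in find_uri_like_literals(query)
    ]
```

A check for the first mistake would need knowledge of the ontology (which class each property applies to), so it is beyond what a text-level helper like this can do.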

Index the whole home directory

Tracker defaults to indexing a limited set of directories. This leads to the problem described here where search will not return any results for filenames in ignored locations.

We don't content-index the whole home directory by default, because...

  • first-time indexing so many files may be expensive
  • the index may grow very big (can we store 1,000,000+ nfo:FileDataObject resources comfortably?)
  • media content from unexpected places may be indexed, e.g. a video game folder containing assets in .mp3 and .jpeg format that Tracker processes as if they were part of the user's music and photo collection.

We should investigate if there's a way to do 'filenames-only' indexing across the whole home directory, so that search can return all the user's files even if it doesn't do full-text search on everything.
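
To make the 'filenames-only' idea concrete, here is a small sketch that builds an in-memory index from directory listings alone. It never opens a file, so no content extraction takes place. It is an illustration under simple assumptions (everything fits in memory, no ignore rules) and does not reflect how Tracker stores its data; the function names are hypothetical.

```python
import os
from collections import defaultdict


def build_filename_index(root):
    """Map lowercased file names to the paths where they occur.

    Only directory listings are read; file contents are never opened.
    """
    index = defaultdict(list)
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            index[name.lower()].append(os.path.join(dirpath, name))
    return index


def search(index, term):
    """Return sorted paths whose file name contains term, ignoring case."""
    term = term.lower()
    return sorted(
        path
        for name, paths in index.items()
        if term in name
        for path in paths
    )
```

Because only names are recorded, game assets in .mp3 or .jpeg format would appear in filename search results without being treated as part of the user's music or photo collection.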

Investigate `serd` library for Turtle reading and writing

Tracker contains code to read and write Turtle and TriG syntax. The https://gitlab.com/drobilla/serd library is a small, fast implementation that could do this for us.

Tracker's Turtle parser reports errors without line number information. Serd has better error handling.

Implement XESAM/Recoll query language

The tracker CLI currently supports SPARQL queries and keyword searches. Other tools, such as Recoll, support a more powerful syntax (see this documentation for example). We could implement the same thing in Tracker.

SPARQL templates

Generating SPARQL queries through string concatenation is common and problematic. Escaping of special characters needs to be done correctly by the caller. And the query is hard to read when spread throughout code.

A better approach is to write the query as a template, which can later be expanded. The TrackerSparqlStatement class goes some way towards this, but it's quite limited:

  • Only works for SELECT, not INSERT / UPDATE
  • Only works for simple literal values, not lists or triples
  • Does not work for GRAPH or SERVICE names

All of these limitations are because TrackerSparqlStatement maps directly to sqlite3_prepare(), so the underlying SQL query cannot change between statements.

In GNOME Photos we added a separate 'sparql template' class, using {{ }} template substitutions. This worked well and could be expanded. We need to avoid confusion with TrackerSparqlStatement, however.
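
The following is a minimal sketch of what {{ }} substitution with explicit escaping could look like. It is not the GNOME Photos class and not a Tracker API; all names are hypothetical. Values are rendered as SPARQL terms before substitution, so the same mechanism works for updates and for GRAPH names.

```python
import re

_STRING_ESCAPES = {
    "\\": "\\\\", '"': '\\"', "\n": "\\n", "\r": "\\r", "\t": "\\t",
}
# Characters that the SPARQL grammar does not allow inside <...>.
_INVALID_IRI_CHARS = re.compile(r'[<>"{}|^`\\\x00-\x20]')
_PLACEHOLDER = re.compile(r"\{\{\s*(\w+)\s*\}\}")


def sparql_string(value):
    """Render value as a double-quoted, escaped SPARQL string literal."""
    return '"' + "".join(_STRING_ESCAPES.get(ch, ch) for ch in value) + '"'


def sparql_iri(value):
    """Render value as <value>, rejecting characters invalid in an IRI."""
    if _INVALID_IRI_CHARS.search(value):
        raise ValueError("invalid character in IRI: %r" % value)
    return "<" + value + ">"


def expand(template, **terms):
    """Replace each {{ name }} with the already-rendered term for name.

    Raises KeyError if the template uses a name that was not supplied.
    """
    return _PLACEHOLDER.sub(lambda match: terms[match.group(1)], template)


ADD_ALBUM = """INSERT DATA {
  GRAPH {{ graph }} { {{ album }} nie:title {{ title }} }
}"""

query = expand(
    ADD_ALBUM,
    graph=sparql_iri("urn:example:albums"),
    album=sparql_iri("urn:example:album:1"),
    title=sparql_string('Cat "pictures"'),
)
```

Keeping the whole query in one template string addresses the readability problem, and routing every value through sparql_string() or sparql_iri() puts the escaping in one place instead of leaving it to each caller.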

Update Tracker Wikipedia page

It's rather out of date. See https://en.wikipedia.org/wiki/Tracker_(search_software)


Old TODO items are available in Projects/Tracker/Roadmap/Old/0.x



---

Projects/Tracker/Roadmap (last edited 2021-03-12 13:44:26 by DebarshiRay)