DataStore 2013 BOF

Attendees

Use Cases

  • RSS Reader
  • Mail
  • Shotwell
  • Geary

Data Format Choices

  • GVariant
  • JSON
  • BSON
  • XML
  • SQLite

Based on the discussion, we decided that GVariant is the ideal choice, since it is increasingly used across the desktop.

Disk Format

The disk format will contain several kinds of items:

  • Rows will be stored as GVariants.
  • Collections of rows will live within a single file.
  • The file will be divided into sections, each with an associated version and collection.
  • A set of size-based freelists will help locate existing free space.
  • Compaction will occur by building a new file.
  • GVariant serialization uses the system's native byte order, so byte-order translation may be needed.
  • We do not want a version number per record, since that would make GVariant storage less space-efficient.
  • We want an fsck tool to find corruption (backpointers, "I am <here>" self-location markers, checking that all free space is accounted for).
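To make the fsck idea concrete, here is a minimal Python sketch under assumed conventions: each record begins with a hypothetical 16-byte header holding its own file offset (the "I am <here>" marker) and its payload length, and free space is tracked separately as (offset, length) extents. The check confirms that every record sits where its header says it does and that records plus free extents cover the file exactly once. `HEADER`, `write_record` and `fsck` are illustrative names, not part of any agreed design.

```python
import struct

# Hypothetical record header: self-offset ("I am <here>") and payload length,
# both little-endian unsigned 64-bit integers.
HEADER = struct.Struct("<QQ")


def write_record(buf: bytearray, offset: int, payload: bytes) -> int:
    """Write a record at `offset` and return the offset just past it."""
    end = offset + HEADER.size + len(payload)
    if len(buf) < end:
        buf.extend(b"\0" * (end - len(buf)))
    HEADER.pack_into(buf, offset, offset, len(payload))
    buf[offset + HEADER.size:end] = payload
    return end


def fsck(buf: bytes, record_offsets, free_extents):
    """Return a list of problems; an empty list means the file looks consistent."""
    problems = []
    extents = []
    for off in record_offsets:
        if off + HEADER.size > len(buf):
            problems.append(f"record at {off}: header past end of file")
            continue
        self_off, length = HEADER.unpack_from(buf, off)
        if self_off != off:
            problems.append(f"record at {off}: header claims to be at {self_off}")
        extents.append((off, HEADER.size + length))
    extents.extend(free_extents)

    # Every byte must be covered exactly once by a record or a free extent.
    pos = 0
    for off, length in sorted(extents):
        if off > pos:
            problems.append(f"bytes {pos}..{off} are unaccounted for")
        elif off < pos:
            problems.append(f"extent at {off} overlaps the previous extent")
        pos = max(pos, off + length)
    if pos < len(buf):
        problems.append(f"bytes {pos}..{len(buf)} are unaccounted for")
    return problems
```

For example, two records separated by an 11-byte free extent pass the check, while forgetting to list that free extent is reported as unaccounted space.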

  • Always write to a new location (and mark the old location as free).
  • Indexes and collections/rows/etc. live within the same file.
  • We will use one large file, with minimal preallocation as new chunks of data are added.
  • OIDs will be monotonically increasing 64-bit unsigned integers.
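The allocation rules above (size-based freelists, always writing to a new location, monotonically increasing OIDs) can be modelled together in a small in-memory Python sketch. It assumes power-of-two size classes with a 16-byte minimum and uses a plain dict in place of the on-disk OID index; persistence, preallocation and compaction are left out. The `Store` class and its methods are hypothetical.

```python
class Store:
    """Toy copy-on-write row store with size-class freelists (hypothetical)."""

    def __init__(self):
        self.data = bytearray()   # stands in for the single large file
        self.freelists = {}       # size class -> list of free offsets
        self.index = {}           # oid -> (offset, length)
        self.next_oid = 1         # monotonically increasing, never reused

    @staticmethod
    def size_class(n: int) -> int:
        """Round a length up to a power of two, with a 16-byte minimum."""
        c = 16
        while c < n:
            c *= 2
        return c

    def _allocate(self, n: int) -> int:
        free = self.freelists.get(self.size_class(n))
        if free:
            return free.pop()                 # reuse existing free space
        offset = len(self.data)               # otherwise grow the file
        self.data.extend(b"\0" * self.size_class(n))
        return offset

    def _write(self, oid: int, payload: bytes) -> None:
        # Always write to a new location, then mark the old one as free.
        offset = self._allocate(len(payload))
        self.data[offset:offset + len(payload)] = payload
        old = self.index.get(oid)
        self.index[oid] = (offset, len(payload))
        if old is not None:
            old_offset, old_len = old
            self.freelists.setdefault(self.size_class(old_len), []).append(old_offset)

    def insert(self, payload: bytes) -> int:
        oid = self.next_oid
        self.next_oid += 1
        self._write(oid, payload)
        return oid

    def update(self, oid: int, payload: bytes) -> None:
        if oid not in self.index:
            raise KeyError(oid)
        self._write(oid, payload)

    def get(self, oid: int) -> bytes:
        offset, n = self.index[oid]
        return bytes(self.data[offset:offset + n])
```

Because the new copy is allocated before the old slot is released, an update never overwrites the only copy of a row, and the freed slot becomes available to the next allocation in the same size class.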

Implementation Discussions

  • Use pwrite()/pread(). Using mmap() for reads is hard because of the GVariant API and reference counting: it requires keeping the mmap() region open for as long as any GVariant references it, which may be past the lifecycle we know about.
  • Registration of GVariant translations.
  • Versions could possibly be stored in a record header.
  • Do we need a schema/catalog? (The GVariant type information might be all we need.)
  • What about migrations? Should we migrate everything ahead of time, or allow the application to specify the version? It sounds like we might have chunks of versioned variants. When changing them, the API can update the version or specify a translation callback for us.
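One way to combine these ideas (pwrite()/pread() for I/O, a version in the record header, byte-order translation, and registered translation callbacks for lazy migration) is sketched below in Python. Everything here is an assumption for illustration: the 12-byte header layout (a byte-order flag, padding, a version and a payload length), the `register_translation` helper, and the use of JSON as a stand-in for GVariant serialization, since GVariant is not available from the Python standard library. `os.pread`/`os.pwrite` are Unix-only.

```python
import json
import os
import struct
import sys

# Hypothetical record header: 1-byte byte-order flag, 3 padding bytes, then
# version and payload length as 32-bit unsigned ints in the writer's native order.
NATIVE_FLAG = b"l" if sys.byteorder == "little" else b"B"
FORMATS = {b"l": "<", b"B": ">"}
HEADER_SIZE = 12

_translations = {}  # version -> callback upgrading a record to version + 1


def register_translation(from_version, func):
    """Register a callback that upgrades a record from `from_version` to the next version."""
    _translations[from_version] = func


def write_record(fd, offset, version, record):
    payload = json.dumps(record).encode()  # stand-in for GVariant serialization
    header = NATIVE_FLAG + b"\0\0\0" + struct.pack("=II", version, len(payload))
    return os.pwrite(fd, header + payload, offset)


def read_record(fd, offset, want_version):
    header = os.pread(fd, HEADER_SIZE, offset)
    order = FORMATS[header[:1]]  # decode with the writer's byte order
    version, length = struct.unpack(order + "II", header[4:12])
    record = json.loads(os.pread(fd, length, offset + HEADER_SIZE))
    while version < want_version:  # migrate lazily, one version at a time
        record = _translations[version](record)
        version += 1
    return record
```

With this shape, the per-record cost is one small header rather than a version field inside every GVariant, and old records are only translated when they are read.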

API High Level

  • The high-level API should deal with GObjects to/from storage.
  • The underlying storage will use GVariant.
  • The API will be synchronous; callers can deal with asynchrony themselves if necessary.
  • Signals might be needed to hook up things like GtkTreeModel efficiently; however, signals have overhead too.

  • Consumers register functions to (de)serialize between GObjects and the underlying database (using GVariant).
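A minimal sketch of the registration idea, written in Python for brevity: a dataclass stands in for a GObject class, a tuple stands in for a GVariant row of type `(ssb)`, and a dict stands in for storage. `DataStore`, `register_type`, `save`, `load` and the `on_inserted` callback list (standing in for a signal) are hypothetical names, not a proposed API. All calls are synchronous, as described above.

```python
from dataclasses import dataclass


@dataclass
class FeedItem:  # stand-in for a GObject subclass
    title: str
    url: str
    unread: bool = True


class DataStore:
    def __init__(self):
        self._codecs = {}      # type -> (type name, serialize, deserialize)
        self._by_name = {}     # type name -> type
        self._rows = {}        # oid -> (type name, serialized row)
        self._next_oid = 1
        self.on_inserted = []  # callbacks, standing in for a GObject signal

    def register_type(self, cls, name, serialize, deserialize):
        self._codecs[cls] = (name, serialize, deserialize)
        self._by_name[name] = cls

    def save(self, obj):
        name, serialize, _ = self._codecs[type(obj)]
        oid = self._next_oid
        self._next_oid += 1
        self._rows[oid] = (name, serialize(obj))
        for callback in self.on_inserted:
            callback(oid)
        return oid

    def load(self, oid):
        name, row = self._rows[oid]
        _, _, deserialize = self._codecs[self._by_name[name]]
        return deserialize(row)
```

A consumer such as an RSS reader would register `lambda o: (o.title, o.url, o.unread)` and `lambda row: FeedItem(*row)` once, then call `save()` and `load()`; the storage layer never needs to know the object's class.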

Indexing

  • Basic B+Tree indexing to start.
  • Indexes will also likely contain GVariant data, but each index will use a single type so that all of its keys are the same type.
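For the "basic B+Tree" starting point, here is a compact in-memory Python sketch. It assumes all keys in an index share one comparable type (mirroring the single-typed index keys above), keeps values only in leaves, and links leaves for range scans. The node order and class names are illustrative; deletion and on-disk paging are omitted.

```python
import bisect

ORDER = 4  # maximum number of keys per node (illustrative)


class Leaf:
    def __init__(self):
        self.keys, self.values, self.next = [], [], None


class Internal:
    def __init__(self, keys, children):
        self.keys, self.children = keys, children


class BPlusTree:
    def __init__(self):
        self.root = Leaf()

    def _find_leaf(self, key):
        node = self.root
        while isinstance(node, Internal):
            node = node.children[bisect.bisect_right(node.keys, key)]
        return node

    def get(self, key, default=None):
        leaf = self._find_leaf(key)
        i = bisect.bisect_left(leaf.keys, key)
        if i < len(leaf.keys) and leaf.keys[i] == key:
            return leaf.values[i]
        return default

    def insert(self, key, value):
        split = self._insert(self.root, key, value)
        if split:  # the root split: grow the tree by one level
            sep, right = split
            self.root = Internal([sep], [self.root, right])

    def _insert(self, node, key, value):
        """Insert into a subtree; return (separator, new right node) if it split."""
        if isinstance(node, Leaf):
            i = bisect.bisect_left(node.keys, key)
            if i < len(node.keys) and node.keys[i] == key:
                node.values[i] = value  # overwrite an existing key
                return None
            node.keys.insert(i, key)
            node.values.insert(i, value)
            if len(node.keys) <= ORDER:
                return None
            mid = len(node.keys) // 2
            right = Leaf()
            right.keys, node.keys = node.keys[mid:], node.keys[:mid]
            right.values, node.values = node.values[mid:], node.values[:mid]
            right.next, node.next = node.next, right
            return right.keys[0], right  # leaf separator is copied up
        i = bisect.bisect_right(node.keys, key)
        split = self._insert(node.children[i], key, value)
        if split is None:
            return None
        sep, child = split
        node.keys.insert(i, sep)
        node.children.insert(i + 1, child)
        if len(node.keys) <= ORDER:
            return None
        mid = len(node.keys) // 2
        up = node.keys[mid]  # internal separator is moved up
        right = Internal(node.keys[mid + 1:], node.children[mid + 1:])
        node.keys, node.children = node.keys[:mid], node.children[:mid + 1]
        return up, right

    def range(self, lo, hi):
        """Yield (key, value) pairs with lo <= key < hi by walking linked leaves."""
        leaf = self._find_leaf(lo)
        while leaf is not None:
            for k, v in zip(leaf.keys, leaf.values):
                if k >= hi:
                    return
                if k >= lo:
                    yield k, v
            leaf = leaf.next
```

Linking the leaves is what makes ordered scans (for example, listing mail by date) cheap: after one descent from the root, the scan only follows `next` pointers.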

Journaling

  • A write-ahead log is likely to be used. A traditional journal is also possible to get started.
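To make the write-ahead idea concrete, here is a hypothetical Python sketch. Each change is appended to a log as a length-prefixed, CRC-32-checked entry and flushed with `os.fsync` before it is applied to the data, and recovery replays complete entries, stopping at the first torn or corrupt one. The entry format, the `key=value` payloads, and the function names are assumptions for illustration; a real implementation would also checkpoint and truncate the log.

```python
import os
import struct
import zlib

ENTRY_HEADER = struct.Struct("<II")  # payload length, CRC-32 of payload


def log_append(log, payload: bytes) -> None:
    """Append one entry and force it to stable storage before returning."""
    log.write(ENTRY_HEADER.pack(len(payload), zlib.crc32(payload)) + payload)
    log.flush()
    os.fsync(log.fileno())


def replay(log):
    """Yield payloads of complete, intact entries from the start of the log."""
    log.seek(0)
    while True:
        header = log.read(ENTRY_HEADER.size)
        if len(header) < ENTRY_HEADER.size:
            return  # clean end of log, or a torn header
        length, crc = ENTRY_HEADER.unpack(header)
        payload = log.read(length)
        if len(payload) < length or zlib.crc32(payload) != crc:
            return  # torn or corrupt tail: stop here
        yield payload


def apply_change(data: dict, log, key: str, value: str) -> None:
    # Write-ahead: the log entry is durable before the data is modified.
    log_append(log, f"{key}={value}".encode())
    data[key] = value


def recover(log) -> dict:
    """Rebuild the data by replaying every intact log entry in order."""
    data = {}
    for payload in replay(log):
        key, _, value = payload.decode().partition("=")
        data[key] = value
    return data
```

A crash in the middle of `log_append` leaves at most one incomplete entry at the tail, which `replay` ignores, so recovery yields the state as of the last fully logged change.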

Hackfests/DataStore2013 (last edited 2013-08-06 13:39:04 by ChristianHergert)