Camel DataBase Summary

It has been a long time since Notzed blogged about, wrote, or hacked on Evolution/CamelDS. It was proposed as a design that would reduce Evolution's memory consumption. One of the prime problems for Evolution currently is that the mail summaries (flags, From, To, Cc, whatever you see in, or need to build, the message list) are kept in memory. You can't afford to read from the summary file every time you need to query the summary for specific messages (uids). Trash/Junk and any other vfolders query these summaries for their updates. Even if you don't have vfolders, Trash/Junk would ref your summary whenever you open a folder. This means that if you switch between 5 different folders, all 5 folders' summaries are in memory, because your Trash/Junk would need them.

My Thought/Design

Remove the current file based summary and make it a DB based one (I used sqlite in my prototype).

Every store has a folders.db file, which holds information about every folder, with one table from which the folderinfo of all the folders can be fetched.

The special table is called "folders". The table structure is
CREATE TABLE folders (folder TEXT PRIMARY KEY, version INTEGER, lflags INTEGER, nextuid INTEGER, time INTEGER, savedcount INTEGER, unread INTEGER, deleted INTEGER, junk INTEGER, bdata TEXT)

  • folder - folder name
  • lflags - store flags
  • nextuid - next uid
  • time - time
  • savedcount - number of messages
  • unread - unread count
  • deleted - deleted count
  • junk - junk count
  • bdata - string list of information that the derived class wants to store. (In imap it can be server flags/version etc.)
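
As a rough illustration of how the folders table could be used, here is a small Python sketch using the standard sqlite3 module (the prototype itself is C). The in-memory database, the "INBOX" row, its counts, and the helper name folder_counts are all made up for the example. It creates the table exactly as above and reads back the folderinfo counts without touching any per-message data.

```python
import sqlite3

# In-memory stand-in for a store's folders.db (assumption for this sketch).
db = sqlite3.connect(":memory:")
db.execute(
    "CREATE TABLE folders (folder TEXT PRIMARY KEY, version INTEGER, "
    "lflags INTEGER, nextuid INTEGER, time INTEGER, savedcount INTEGER, "
    "unread INTEGER, deleted INTEGER, junk INTEGER, bdata TEXT)"
)

# Hypothetical sample row: 1000 messages, 40 unread, 100 deleted, 100 junk.
db.execute(
    "INSERT INTO folders VALUES (?, ?, ?, ?, ?, ?, ?, ?, ?, ?)",
    ("INBOX", 1, 0, 1001, 0, 1000, 40, 100, 100, ""),
)


def folder_counts(conn, name):
    """Return (savedcount, unread, deleted, junk) for one folder, or None."""
    return conn.execute(
        "SELECT savedcount, unread, deleted, junk FROM folders WHERE folder = ?",
        (name,),
    ).fetchone()


counts = folder_counts(db, "INBOX")
```

The point of the sketch is only that the folder tree and its counts can be built from this one small table.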


Apart from that, every folder is a table from which the message infos can be queried. The table structure is as follows.

CREATE TABLE %s (uid TEXT UNIQUE PRIMARY KEY, gflags INTEGER, eunread INTEGER, ldeleted INTEGER, kjunk INTEGER, isize INTEGER, dsent INTEGER, dreceived INTEGER, jsubject TEXT, ffrom TEXT, tto TEXT, cc TEXT, mlist TEXT, part TEXT, userflags TEXT, usertags TEXT, cinfo TEXT, bdata TEXT)

The odd prefixes are there to avoid a strcmp: with them, most column names can be told apart by their first character.

  • uid - message uid
  • gflags - message flags
  • eunread - unread status (for easy querying of junk/trash)
  • ldeleted - deleted status (for easy querying of junk/trash)
  • kjunk - junk status
  • isize - message size
  • dsent - date sent in unix time format
  • dreceived - date received
  • jsubject - subject
  • ffrom/tto/cc - From/To/CC
  • mlist - message list specific fields
  • part - a string holding space-separated part ids with their counts (maybe it needs more design about how we want to represent it and restore it)
  • userflags - string list of userflags; "[COUNT] [STRLEN]String [STRLEN]String [STRLEN]String" is the format
  • usertags - string list of usertags
  • cinfo - content info, stored in the same way as part.
  • bdata - string list of data that the derived class wants to store as part of every message info. (It could be frompos in mbox or some imap things in IMAP, etc)
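
The userflags format above can be sketched as follows. This is a literal reading of "[COUNT] [STRLEN]String ..." written in Python for brevity. The function names are hypothetical, lengths are counted in characters (the same as C's strlen for ASCII only), and the decoder assumes no string starts with a digit, since the format as written has no separator between the length and the string.

```python
def encode_string_list(items):
    """Encode strings as "[COUNT] [STRLEN]String [STRLEN]String ..."."""
    parts = [str(len(items))]
    for item in items:
        parts.append(f"{len(item)}{item}")
    return " ".join(parts)


def decode_string_list(text):
    """Inverse of encode_string_list (assumes no item starts with a digit)."""
    count_str, _, rest = text.partition(" ")
    items, pos = [], 0
    for _ in range(int(count_str)):
        end = pos
        # Read the decimal length prefix.
        while end < len(rest) and rest[end].isdigit():
            end += 1
        length = int(rest[pos:end])
        items.append(rest[end:end + length])
        pos = end + length + 1  # skip the separating space
    return items


encoded = encode_string_list(["important", "to do"])
decoded = decode_string_list(encoded)
```

Because every string carries its length, a flag containing spaces (like "to do") survives the round trip.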


In the base approach, only what is shown in the message list will be in memory. This means that even the VFolder/trash/junk contents shouldn't be.

Whenever a folder is loaded, the db is loaded and, to speed up startup, the message infos of all the folders are loaded into memory (this needs to be redesigned). The summary also has two new things:

  1. uids - which will have all the uids of the messages that it has (both in cache + unsaved in the hash table)
  2. message_uids - which will have the actual message infos.


On summary construction these two will be in memory, with the message infos holding a ref count of 1. All APIs like camel_folder_get_array etc. that return the list of message infos will be broken to just return a list of uids. The caller then has to use camel_folder_summary_uid or camel_folder_summary_index to get the actual info, which will be reffed and given to the caller, and the caller has to unref it. When the message list completes, it will have reffed just the things shown in the list.

For example, if you have a folder of 1000 messages, with 100 junk and 100 deleted, the message list would show 800 (in hide-deleted mode). Those 800 will be reffed and kept in memory.

In my current prototype I have a periodic timer that runs through message_uids and, if an info's ref count is 1, frees it. (Need to implement save to summary if flagged.) This needs to be better designed, to have a time-to-live for a ref: something like every info has to live for x minutes.


Which means that after x minutes you would lose the 200 message infos from memory. I'll come to how trash/junk works further down.
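
A minimal sketch of the ref-counting scheme and the periodic sweep, in Python with an explicit ref_count field standing in for the C ref counts. The class and method names (SummaryCache, get, unref, sweep), the dirty flag, and the save callback are assumptions for illustration, not the prototype's API. It replays the 1000/800/200 example from above.

```python
class MessageInfo:
    def __init__(self, uid):
        self.uid = uid
        self.ref_count = 1   # the summary cache's own reference
        self.dirty = False   # flags changed since the last save


class SummaryCache:
    def __init__(self, uids, save):
        self.uids = list(uids)   # item 1: every uid in the folder
        self.message_uids = {}   # item 2: uid -> loaded MessageInfo
        self.save = save         # callback that writes one info to the db

    def get(self, uid):
        """Return the info for uid with an extra ref; the caller must unref."""
        info = self.message_uids.get(uid)
        if info is None:
            info = MessageInfo(uid)  # stand-in for loading the row from the db
            self.message_uids[uid] = info
        info.ref_count += 1
        return info

    def unref(self, info):
        info.ref_count -= 1

    def sweep(self):
        """Timer body: drop infos that only the cache itself still references."""
        for uid, info in list(self.message_uids.items()):
            if info.ref_count == 1:
                if info.dirty:
                    self.save(info)  # sync modified infos before freeing
                del self.message_uids[uid]


saved = []
cache = SummaryCache([str(i) for i in range(1000)], saved.append)
infos = [cache.get(uid) for uid in cache.uids]
for info in infos[800:]:     # the 200 junk/deleted ones are not shown
    cache.unref(info)
infos[999].dirty = True      # pretend one of them had its flags changed
cache.sweep()
```

After the sweep only the 800 infos still reffed by the message list stay loaded, while the uid list keeps all 1000 entries.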

CamelVeeMessageInfo is a CamelMessageInfo that will be in memory for VFolders/trash/Junk, but it is not the actual info. (This still needs to be designed, so don't rant about this.)
The current structure is broken down to

 typedef struct {
    CamelMessageInfo info;
    CamelFolderSummary *s;
    char *uid;   /* the real uid */
 } CamelVeeMessageInfo;


So when the viewed folder gets added to junk/trash (for the first time), all the junk/trash messages are queried and their uids are returned. The uids are made into CamelVeeMessageInfos, but the original infos won't be reffed. This means that when you go to another folder, all the contents of the previous one are unloaded from memory. When you go to trash, you would still see the deleted items from the first folder you viewed. Any folder change delivers uids, which are stored the same way. (FIXME: There is a performance hit when having vfolders over huge folders. Since the viewed items aren't in memory and are fetched from the respective summary in the db, it is a bit slow. I think I will go back to just reffing the viewed ones when they are viewed, not when they are updated. That means I have to change the structure a bit, but that is all future work for when I implement it.)
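
To make the junk/trash query concrete, here is a sketch in Python with the standard sqlite3 module. The table is the per-folder table from above with %s replaced by a sample folder name. The rows, the helper name vee_uids, and the choice to represent each entry as a (folder, real uid) pair are assumptions for the example.

```python
import sqlite3

db = sqlite3.connect(":memory:")
# Per-folder table as defined earlier, for a hypothetical folder "INBOX".
db.execute(
    'CREATE TABLE "INBOX" (uid TEXT UNIQUE PRIMARY KEY, gflags INTEGER, '
    "eunread INTEGER, ldeleted INTEGER, kjunk INTEGER, isize INTEGER, "
    "dsent INTEGER, dreceived INTEGER, jsubject TEXT, ffrom TEXT, tto TEXT, "
    "cc TEXT, mlist TEXT, part TEXT, userflags TEXT, usertags TEXT, "
    "cinfo TEXT, bdata TEXT)"
)
# Sample rows: (uid, ldeleted, kjunk); all other columns stay NULL.
rows = [("1", 0, 0), ("2", 1, 0), ("3", 0, 1), ("4", 1, 0)]
db.executemany('INSERT INTO "INBOX" (uid, ldeleted, kjunk) VALUES (?, ?, ?)', rows)


def vee_uids(conn, folder, column):
    """Return (folder, real uid) pairs for rows whose status column is set."""
    if column not in ("ldeleted", "kjunk"):
        raise ValueError(column)
    quoted = '"' + folder.replace('"', '""') + '"'
    cur = conn.execute(f"SELECT uid FROM {quoted} WHERE {column} = 1 ORDER BY uid")
    return [(folder, uid) for (uid,) in cur]


trash = vee_uids(db, "INBOX", "ldeleted")
junk = vee_uids(db, "INBOX", "kjunk")
```

Only uids cross from the db into memory here, so the real infos of the source folder never need to be reffed to populate trash or junk.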

When the timeout triggers, it removes from the cache the expired items with a refcount of 1. All modified message infos are synced and freed, and the others are just freed. (FIXME: I need to keep a list of the uids of modified messages, just to sync upstream in case of IMAP/GW/Exchange etc.)

In my prototype, all providers have to implement:

  • message_info_decode = message_info_load
  • message_info_encode = message_info_save
  • summary_header_decode = summary_header_load
  • summary_header_encode = summary_header_save


camel_folder_db_summary_save/load would call these the same way camel_folder_summary_save/load did. They all use a MessageInfoRecord and a SummaryInfoRecord, which camel_folder_db_summary_save/load fetches from and saves to the DB.

In the same way, clear also clears from the db using a similar call.

I think all these have to be refactored and designed better in terms of code organization. (My prototype was just a hack for my proof of concept.)
More to be updated...

Things to be decided/thought about

  • Currently vfolders don't have a persistent summary. Having one could even speed things up, but we may have to see how that changes things.
  • How to have the CamelVeeMessageInfo's real info reffed only when they are viewed.

  • How to determine the interval at which to remove items from the cache. It would be bad to have frequent disk/db reads. (LRU algorithm)
  • Have indexed tables (currently I have it indexed just on uids)
  • Refactor the code so that junk/trash are fetched efficiently from the summary rather than by an all-load-and-test.
  • Move the search from in-memory to the db, so that it can be fast and need no extra memory
  • Remote view: map only the message-list view + top buffer + bottom buffer in memory rather than the entire message-list contents. This can be a little slower but should use very little memory.
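
Two of the items above (indexed tables and searching in the db) could look roughly like this. It is a Python/sqlite3 sketch on a reduced version of the per-folder table. The index name, the sample rows, and the helper search_subject are hypothetical, and whether SQLite actually uses the index depends on its query planner.

```python
import sqlite3

db = sqlite3.connect(":memory:")
# Reduced per-folder table: only the columns this sketch needs.
db.execute(
    'CREATE TABLE "INBOX" (uid TEXT UNIQUE PRIMARY KEY, dreceived INTEGER, '
    "jsubject TEXT, ffrom TEXT)"
)
db.executemany(
    'INSERT INTO "INBOX" VALUES (?, ?, ?, ?)',
    [
        ("1", 300, "Camel summary design", "a@example.org"),
        ("2", 100, "Lunch?", "b@example.org"),
        ("3", 200, "Re: camel summary design", "c@example.org"),
    ],
)
# Hypothetical extra index; it may let SQLite return rows in date order
# without a separate sort step.
db.execute('CREATE INDEX inbox_dreceived ON "INBOX" (dreceived)')


def search_subject(conn, text):
    """Return uids whose subject contains text, oldest first.

    The match runs inside the db, so no message info is loaded into memory.
    """
    cur = conn.execute(
        'SELECT uid FROM "INBOX" WHERE jsubject LIKE ? ORDER BY dreceived',
        (f"%{text}%",),
    )
    return [uid for (uid,) in cur]


hits = search_subject(db, "summary")
```

The caller gets back only uids and can then ref just the infos it actually displays.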


Data with my Prototype


Valgrind-expl.png

Comments during the discussion with Fejj on the design:

  • Naming of columns: Have the actual names (well, for any given column name, there's none that are gonna have more than say 2 equal characters)
  • Should nextuid really be an integer? Exchange/GW have them as alphanumeric?
  • Persistent summary for vfolders can speed things up; store vuid (8-char hash + real uid). Need to design an appropriate table structure for the vfolders' summary
  • Look at and use gmime/utils/cache.[c,h] - LRU cache implementation using EMemChunk (MemChunk)
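
For reference, the LRU idea mentioned above can be sketched in a few lines of Python with collections.OrderedDict. This is not the gmime cache.[c,h] API. The class name, the capacity parameter, and the load callback are made up for illustration, and the ref-count check from the prototype is left out to keep the eviction logic visible.

```python
from collections import OrderedDict


class LRUCache:
    """Keep at most `capacity` message infos, evicting the least recently used."""

    def __init__(self, capacity, load):
        self.capacity = capacity
        self.load = load            # callback that fetches one info from the db
        self.items = OrderedDict()  # uid -> info, oldest first

    def get(self, uid):
        if uid in self.items:
            self.items.move_to_end(uid)  # mark as most recently used
            return self.items[uid]
        info = self.load(uid)
        self.items[uid] = info
        if len(self.items) > self.capacity:
            self.items.popitem(last=False)  # evict the least recently used
        return info


loads = []


def fake_load(uid):
    """Stand-in for a db read; records which uids had to be fetched."""
    loads.append(uid)
    return uid.upper()


cache = LRUCache(2, fake_load)
for uid in ["a", "b", "a", "c"]:
    cache.get(uid)
```

With a capacity of 2, fetching "c" evicts "b" rather than "a", because "a" was touched more recently. Compared with a fixed timer, this bounds memory by item count instead of by age.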

...

Apps/Evolution/CamelDBSummary (last edited 2013-08-08 22:50:08 by WilliamJonMcCann)