(Note that this page is outdated and is kept here for historical purposes)

Distributed SCM in GNOME

This page is for the discussion and planning of a possible move to a distributed source code management system for the GNOME project. Please back up opinions with citations or at least a reasoned argument. Try not to use vacuous and possibly heated words like "Wicked", "PERFORMANCE!", "Very Robust", etc.

Hard requirements

Needs to be able to import GNOME (cvs2svn) SVN:

Need to have server side hooks:

  • Must have ability to reject the commit / push
  • Must have ability to explain rejection via custom message
  • MAINTAINERS file check
  • svn-commits-list
  • CIA (via email, not xml-rpc)
  • Able to determine affected branch(es) and send short email to library.gnome.org and l10n.gnome.org
  • PO file sanity check
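
As a rough illustration of the hook requirements above, the sketch below assumes a git-style pre-receive input (one "old new refname" line per updated ref). The function names, the "no branch deletion" policy and the message text are hypothetical and only show how a hook could reject a push, explain why, and work out the affected branches.

```python
import sys

ZERO = "0" * 40  # git-style "no commit" id; an assumption of this sketch


def parse_updates(lines):
    """Parse 'old new refname' lines into (old, new, refname) tuples."""
    updates = []
    for line in lines:
        parts = line.split()
        if len(parts) == 3:
            updates.append(tuple(parts))
    return updates


def affected_branches(updates):
    """Return the sorted branch names touched by a push."""
    prefix = "refs/heads/"
    return sorted({ref[len(prefix):] for _, _, ref in updates
                   if ref.startswith(prefix)})


def check_push(updates):
    """Return (accepted, message). Hypothetical policy: branches may not be deleted."""
    for old, new, ref in updates:
        if ref.startswith("refs/heads/") and new == ZERO:
            return False, "Push rejected: deleting %s is not allowed." % ref
    return True, "Push accepted for: " + ", ".join(affected_branches(updates))


def main(stream=sys.stdin):
    """Hook entry point: print the message for the pusher, return the exit status."""
    accepted, message = check_push(parse_updates(stream))
    print(message, file=sys.stderr)  # shown to the person pushing
    return 0 if accepted else 1      # a non-zero exit status rejects the push
```

A real hook script would end with `sys.exit(main())`. Checks such as the MAINTAINERS file test or the PO file sanity check would slot into `check_push` as further conditions.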

Need to log the GNOME userid making the push somehow

Must not allow (malicious/unintended) user to corrupt the repository

  • No VFS methods in the $RCS.gnome.org daemon
  • Nothing that could wipe out history on $RCS.gnome.org

Must allow+understand some sort of 'ACL'

  • e.g. simply follow the rwx (e.g. restrict writing to sysadmin/'gnomecvs' group)
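
One way to read "simply follow the rwx" is to let the classic Unix permission bits decide who may write to a repository. The helper below is a minimal sketch of that idea. The function name and the example uid/gid numbers are made up, and it ignores details such as root's override and extended ACLs.

```python
import stat


def may_write(mode, owner_uid, owner_gid, uid, gids):
    """Decide write access from classic Unix permission bits.

    mode      -- st_mode of the repository directory (as from os.stat)
    owner_uid -- st_uid of the directory
    owner_gid -- st_gid of the directory (e.g. the 'gnomecvs' group)
    uid, gids -- the pushing user's uid and the set of group ids they belong to
    """
    if uid == owner_uid:
        return bool(mode & stat.S_IWUSR)   # owner write bit
    if owner_gid in gids:
        return bool(mode & stat.S_IWGRP)   # group write bit
    return bool(mode & stat.S_IWOTH)       # everyone else


# Example: directory mode rwxrwxr-x owned by uid 0, group 500 (hypothetical ids).
print(may_write(0o40775, 0, 500, 1001, {500}))  # member of the group
print(may_write(0o40775, 0, 500, 1002, {600}))  # not a member
```

In practice the `mode`, `owner_uid` and `owner_gid` values would come from `os.stat()` on the repository directory.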

Soft requirements

Should be able to commit / push via SSH (to reuse existing infrastructure)

Should allow generic (GNOME-wide) server side hooks together with module specific ones

  • e.g. in SVN, the module specific ones call the generic ones per default

Have DSCM expert also be sysadmin

  • I don't think it will be good to use something that can't be supported by sysadmin (e.g. like ${RCS}master@gnome.org should be able to provide assistance and fix things / provide maintenance of repos, etc)

  • perhaps by making expert a sysadmin (requires person to be a sysadmin type person)

Extra points wishes (?)

Note: Not yet fully decided if these are good. More Blue Sky.

Allow signing via GPG

  • Optional! More for sysadmin modules to ensure $RCS.gnome.org wasn't hacked.

Somehow allow a bot to sit in between, so that the real commit/push only happens after a successful run of the test suite

  • Perhaps bad idea due to slowing down development
  • Perhaps bad idea due to needing latest software every time on $RCS.gnome.org

Something that allows to combine/split up repository (SVN terms)

  • e.g. like libgweather into separate module
  • e.g. to move separate module into existing repository (build -> gtk+.. bad example, but was needed before)

Allow people to have their own branches (of GNOME modules) hosted by GNOME

  • e.g. like gitweb.freedesktop.org (users part), or bzr.launchpad.org/~someuser
  • should be visible to the world
  • extra points: efficient in storage (share with main GNOME module.. so if 100 users branch gtk+, only the differences are stored)
  • people should be able to branch again from these 'people branches'

D-SCM Implementations

An important part of the migration discussion is selecting which distributed system to use. At the time of this writing, there are really only three clear contenders that are mature enough and meet the feature set we will most likely need. They are:

  • Bazaar (bzr)
  • Git
  • Mercurial (hg)

There really needs to be a serious evaluation of each, with regards to what exactly the GNOME Project needs. Given the size of such an undertaking, I would like to get a list of pros and cons for each from the community.

For Bazaar

  • Good Performance. Bzr status in a tree of 5,000 files takes just 0.5 seconds (http://bazaar-vcs.org/).

  • Very Robust Feature Set. (Elaborate please)
    • bzr has a very extensive testsuite

  • Used by many projects, including MySQL
  • Extensible via plugins.

  • Bisect feature provided as a plugin.
  • Nautilus Integration — Great for usability and integration with GNOME platform.

  • Quite user friendly. (Elaborate please). Examples:
    • Reasonably decent tab completion, no executable sub-commands left in your $PATH like git-*
    • Friendly error messages: Failed bzr pull suggests using bzr merge instead
    • Easy transition from cvs/svn commands
    • Submit your changes to the maintainer with bzr send - sends a diff, history data, and cover letter

    • Default bzr log shows which revisions merged which others, unique(?) to bzr

  • Remembers pull location separately from push location, for easy tracking of read-only repositories.
  • Repository format can automatically be pulled from over http (if put in the user's ~/public_html/), which allows users to push suggested branches over sftp even if the remote server doesn't have bzr installed
  • Bound branches (makes bzr feel/look/smell like svn)

  • Solid development community and commitment to upstream developments. As this is Ubuntu's SCM, we can be confident that the project will be maintained.
  • Written in Python (Python is everywhere in Gnome, so adding Gnome-specific features would be very possible.)
    • Regular releases
    • API stability policy
  • Eclipse/Emacs/vim/Monodevelop/Text Mate/Visual studio/gedit/pida integration
  • Avahi/Zeroconf plugin (GUADEC nicety): http://blogs.gnome.org/jamesh/2008/02/19/bzr-avahi/

  • Can be used with any host without requiring the installation of bzr on that host (Host agnostic)
  • Good Windows support, (with GUI support)
  • Can associate a commit with a bugnr, e.g. bzr commit --fixes bgo:12345 -m "Properly close the connection"
  • Supports lightweight checkouts — For those times when you really don't need the whole history.
    • FIXME: Would appreciate more detail on above. Only a file? Only a directory?
      • Stores no history, but checks out the whole tree. "Filtered views" are in development and allow checking out the whole tree.

Against Bazaar

  • Relatively slower compared to Git or Mercurial. Some very simple tasks (like branches of a remote repo) can take a very long time. 1 (Outdated ?)

    • A new pack format was introduced in 1.0 and fixes most of the performance issues for network operations. Local operations are quite fast.
    • for operations like branching from a remote repo, this can be improved if you use a smart server

  • In particular, the bzr benchmarks at http://bazaar-vcs.org/Benchmarks are run without any project history, which is unrealistic. GNU Emacs is in the process of moving to bzr, and has found that once there is extensive history, some commands which use it become unreasonably slow. However, the bzr developers have been responsive to these problems, and are working to fix them. See for example: https://lists.ubuntu.com/archives/bazaar/2008q1/039313.htm

  • bzr-gtk needs a UI usability study, and more compliance to the HIG.
    • Some patches have been posted for greater HIG compliance
  • Doesn't do EOL conversion / file encodings (win32/mac os x?)
  • FIXME: Saw something about not handling large files?

    • Currently limited to files that fit in memory (say 2-500MB)
  • Doesn't support nested trees yet (svn:externals)
  • Repositories are much larger than their Git or Mercurial counterparts. See here

  • 'bzr push --overwrite' allows history to be rewritten, and can potentially wipe out history even on the server

Bazaar FYI

  • Disk usage is the same as a packed Git/Hg (bzr 1.0+)
  • SVN support is a plugin. Not as robust/reliable as git-svn (Citation please, or remove this *opinion*)
    • Although it works in most scenarios, the svn plugin contains a race condition which can lead to unintentional reversion of other people's commits (https://bugs.launchpad.net/bzr-svn/+bug/281460).

    • From my experience bzr-svn is truly reliable and robust, it is a bit slow though.
    • Old versions of bzr-svn leak memory badly while importing a large number of revisions (>= 800), due to a bug in the Subversion Python bindings (http://subversion.tigris.org/issues/show_bug.cgi?id=3052). This bug has been fixed in bzr-svn 0.4.11, which ships its own svn bindings, and also in Subversion 1.4.7 and 1.5.

  • Already in wide use at Ubuntu/Launchpad
  • SVN support as a plugin: truly integrated with bzr (svn repo and bzr repo are treated the same)

For Git

  • Relatively fast Benchmarks 1 (Benchmarks too outdated for comparison purposes)

  • Solid development community and commitment to upstream developments.
  • SVN migration tools (http://git.or.cz/course/svn.html)

  • git commands have individual man pages that, while semi-technical, explain deeply and often have a very instructive examples section.
  • True distributed merging. (what does that mean ? octopus-merge ? and on top any usable DRCS has a plugable merge strategy)
  • Bugzilla Integration (Link is giving a 404 error as of 18-Feb-2008)

  • Has hosted X.org since ~2005 (?) and Linux since June 2005, as well as a number of Freedesktop.org projects (it's the default fd.o SCM). These cases suggest it works well with very large projects, and it also means there is a large pool of migration experience to draw from.

  • giggle, a nice GNOME GUI application.

  • Network effect:
    • fd.o and many core GNOME hackers already use it
    • probably more extra tools/plugins/websites/etc due to this? (NEED FACT CHECKING)
  • Integration for Emacs, Vim and Eclipse available; TextMate is a work-in-progress

  • preserves history when moving parts of files (functions) between files. For example, git blame still knows which line was written by whom.

  • Provides CVS emulation layer (via git-cvsserver) to allow users to use a CVS client instead of git (useful for translators who don't need to learn git)
  • Very efficient repository storage, see here.

Against Git

  • Mediocre to Bad Windows Support
  • Complicated to use:
    • Git uses command names that aren't quite the same as those used by SVN, CVS, Bazaar or most other VCSes. This could raise the barrier to entry for those used to other systems who want to contribute. For people new to version control, Git is much more difficult to grasp at first. Operations like rebasing, cherry-picking and packing are as useful as they are difficult for newcomers to fully understand. In Git it's not enough that it works; it also has to be clear how it works, or you'll end up shooting yourself in the foot.

    • Publishing a git branch to your public webspace is not obvious
      • you need to run git-update-server-info for plain HTTP to work: if you cannot run it on the remote host you have to run it on a local repo and then copy the repo with sftp
  • Mixture of C, Perl and bash script, which makes it far less obvious to port to other systems while maintaining the same feature set.
    • This is not true anymore. Almost all scripts that remain are very obscure. Even without them git would still cover everything the other VCSes cover.
  • No true extension mechanism (beside writing shell scripts ...) (Elaborate please. How is "true extension" better?)
    • It is not possible (as far as I know, correct me if I am wrong) to override/extend of an existing command (unless you overwrite the executable)
      • Aliases can be created (git youralias). They may not override existing commands, but they can add new commands that alias a more complicated git command, a new script, or another tool entirely.
    • Not all commands can be customized to use option flags differing from their default values.
      • most of the commands have customizable configuration options which can be tweaked on a per machine, per user or per repository basis.
  • Needs periodic repacking due to disk usage exponential growth, to avoid performance degradation.
  • I'm not sure if it's allowed to checkout only a part of history in Git (known as "history horizon"?). If true, this could be a major drawback.
    • Shallow clones, checking out just a few (or one) revision, are available since git 1.4 -- see the --depth argument to git-clone
    • Git provides for 'filter-branch', a tool that can split up a repository both by subdirectories or by history.
  • tags are not recorded in the history
    • security implications: they are not covered by the same crypto (SHA1) guarantees like the rest of the history
    • robustness implications: if one accidentally moves a tag the old value will not be recorded anywhere, making more difficult to revert the change
      • There is no command for moving tags. You can delete and recreate them, that action is recorded in a so-called reflog (by default for 90 days).

    • in general, one cannot say when a tag has been introduced, who introduced it or if it has been changed, but a lot of Git users say this is the Right Thing(TM). It's not clear how to handle conflicting tags, though.
  • Partial checkouts (only certain sub-directories) are impossible in git. Submodules are not a good solution (see blogs Elijah).

Git FYI

  • Guaranteed content history (Strong SHA1 hashes mean that all content is checked and rechecked, corruptions in the repository are almost impossible. In addition, the system was designed so that git's repair/recovery tools would be effective. The safety of data was one of Linus' top priorities in design)
    • Monotone got stronger checks, and Mercurial got the same level of checks
      • bzr stores sha1s of the tree and revision contents and can detect corruptions in the database.
        • moved to FYI as it seems that most have the same checks (NEED FACT CHECKING FOR BZR!)
  • The license probably won't move to GPLv3, given Linus's bias against it. This becomes an important point if GNOME decides to start moving to GPLv3 along with the whole GNU project. Linking to parts of Git and releasing under GPLv3 would be difficult, I believe. The importance of this issue remains to be assessed, though.
    • Git is GPLv2, so it's "free enough" for GNOME usage
  • Some stuff that makes initial import easier (this isn't pro Git -- one time thing):
    • Graft points are a unique feature of git. No idea what it is used for (page isn't clear). For some reason this is useful for migrating repositories to git. X and Linux used them.

      • Grafts are "virtual" links in the git history. So if you have a git tree with versions gnome 0 -- gnome 2.22, and one for post-conversion development from gnome 2.23.0 -> future, these would be separate repositories. For some uses and the history-interested, you can join them in a large repository with a graft (virtual link) between the top of the old one to the start of the new one. Otherwise git would require rewriting the new part (see Guaranteed content history).

    • git-filter-branch is extremely useful while importing a repository. It enables all kinds of cosmetics to the newly imported history.
  • SVN integration through git-svn
  • Core feature set includes history destroying commands.
    • OlavVitters: IIRC this can be disabled in the central repo by rejecting non-fast-forward pushes. What people do locally is of no concern.

    • UlrikSverdrup: You can fix mistakes to ease code review etc
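
The point about non-fast-forward pushes can be made concrete. A push is a fast-forward when the branch's current tip is an ancestor of the proposed new tip, so no existing commit becomes unreachable. The sketch below models history as a plain dictionary from commit id to parent ids. The data structure and names are illustrative only and are not git's actual implementation.

```python
def is_ancestor(ancestor, commit, parents):
    """Return True if `ancestor` is reachable from `commit` via parent links."""
    seen = set()
    stack = [commit]
    while stack:
        current = stack.pop()
        if current == ancestor:
            return True
        if current in seen:
            continue
        seen.add(current)
        stack.extend(parents.get(current, ()))
    return False


def is_fast_forward(old_tip, new_tip, parents):
    """A push old_tip -> new_tip keeps all history iff old_tip is an ancestor of new_tip."""
    return is_ancestor(old_tip, new_tip, parents)


# Toy history: a <- b <- c on the server, and a rewritten line a <- x.
history = {"a": [], "b": ["a"], "c": ["b"], "x": ["a"]}
print(is_fast_forward("b", "c", history))  # extends existing history
print(is_fast_forward("c", "x", history))  # would drop b and c
```

A central repository that only accepts pushes for which `is_fast_forward` holds cannot have published history removed by a push, whatever contributors do in their local clones.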

For Mercurial

  • Simple: it is possible to use it effectively with knowledge of only a few commands, i.e. commit, pull, update and merge. This can be learned by reading a couple of pages in the hgbook, and summarized into recipes for e.g. translators in even less.
  • Reliable: uses the same model as Git, with changesets identified by SHA1 hashes
    • The internal storage is append-only, and after an interrupted transaction a simple truncate() is done to restore the repo's coherence
  • Fast: the internal storage is designed to minimize the disk seeks done to checkout a revision
  • Quite user friendly. It displays helpful messages after command execution. For instance, when a changeset is pulled from an external branch, it will tell you that the changes are fetched, but not yet applied to the repository, suggesting the update command which will do that.
  • Mostly Written in Python, with just a small extension in C
  • Great performance on both Windows and Linux (Citation please)
  • Very Small disk usage.
    • See the evaluation report prepared for Opensolaris. It provides some evidence about low disk usage and performance.

  • Lots of extensions that add extra functionality (keyword expansion, incremental converters, patch queues management, graphical viewers...)

  • Hooks to implement custom checks, either with external scripts or with in-process Python functions

  • Packaged for Windows, MacOSX and UNIX systems
  • Excellent documentation (http://hgbook.red-bean.com/ and http://www.selenic.com/mercurial/wiki/index.cgi)

  • Used for large projects, such as Mozilla, Opensolaris, OpenJDK, Xen
  • Easy to share changes (other than with the usual 'hg pull'/'hg push'):
    • 'hg import'/'hg export' for diff in GNU and git format
    • 'hg serve' to quickly instantiate a local HTTP server for others to pull from
    • 'hg bundle'/'hg unbundle' to safely and efficiently send groups of changesets via email
    • 'hg email' to send the diffs for a group of changesets for review
  • Uses a smart protocol over HTTP with a simple cgi, thus gaining efficient communications even through HTTP proxies
    • With the cgi you can pull from the same address used for visualizing the repo (e.g. http://www.selenic.com/hg/) reducing the chances for confusion

    • It's even possible to allow push over HTTP(S)
    • Static HTTP is supported, but obviously very slow
      • A more efficient, but less automated, way of publishing some changes with static-only HTTP hosting is offering bundles to be manually fetched from other users
  • Bugzilla Integration

  • quilt-like changeset editing with Mercurial Queues (usable only for unpublished changes, like git-rebase, though you can publish these queues)

  • simple tools for managing authorization and access control to repository collections using ssh-keys

  • improved submodule support integrated into the mercurial core is in the works for upcoming versions
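
The append-only storage mentioned above lends itself to a very simple recovery scheme: remember each file's length before a transaction and truncate back to it if the transaction is interrupted. The class below is a toy sketch of that idea only. The class and method names are invented, and this is not Mercurial's actual transaction code.

```python
import os


class AppendOnlyTransaction:
    """Record file sizes before appending so a failed write can be undone."""

    def __init__(self):
        self._journal = {}  # path -> size before the transaction started

    def append(self, path, data):
        """Append bytes to a file, noting its original size on first touch."""
        if path not in self._journal:
            size = os.path.getsize(path) if os.path.exists(path) else 0
            self._journal[path] = size
        with open(path, "ab") as handle:
            handle.write(data)

    def commit(self):
        """Keep everything written so far and forget the recorded sizes."""
        self._journal.clear()

    def rollback(self):
        """Truncate every touched file back to its pre-transaction size."""
        for path, size in self._journal.items():
            with open(path, "r+b") as handle:
                handle.truncate(size)
        self._journal.clear()
```

Because data is only ever appended, rollback never has to restore overwritten bytes, which is what keeps the recovery step this small.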

Against Mercurial

  • "Limited" core feature set (when compared with other 2) (extra functionality is added by extensions). Features that are not in core normally fall in these two categories: they alter history or they're needed by few users.
  • Preliminary SVN write-access functionality, currently being worked on with a GSoC project to improve SVN conversion tools

    • It needs the history rewriting support also being worked on for another GSoC project to implement a rebase feature for mercurial. The rebase feature is already done and distributed along with main mercurial.

Mercurial FYI

  • Tags in Mercurial are part of the history, in a simple text file called .hgtags (this could be put in the "For" section, but a lot of Git users say that Mercurial got tags *wrong*)
    • local non-versioned tags are created with the --local switch which puts them in .hg/localtags
    • conflicts in .hgtags are handled in a special way by the merge logic
    • the set of current tags is computed by merging the information from the .hgtags from every head in the repo plus the local tags
  • There is no explicit "resolve" command; it's up to the user to avoid committing files with conflict markers
    • This is no longer true, and a core resolve command exists for some time now.

  • If non-trivial conflicts are found, the merge is handled by the external 'hgmerge' shell script which defaults to an internal merger or meld/kdiff3 or some other commonly found tools. It can be tweaked to the desired merge strategy.
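
The description of how the current tag set is computed can be sketched as follows. This is a deliberately simplified model: it assumes each .hgtags line has the form "<node> <tag name>", processes heads in the order given with later entries winning, and applies local tags last. Real Mercurial resolves disagreements between heads using the tag's history, which is not modelled here, and the function names are hypothetical.

```python
def parse_tags(text):
    """Parse '.hgtags'-style text: one '<node> <tag name>' entry per line."""
    tags = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or " " not in line:
            continue  # skip blank or malformed lines
        node, name = line.split(" ", 1)
        tags[name] = node  # a later line replaces an earlier one for the same tag
    return tags


def current_tags(hgtags_per_head, local_tags_text=""):
    """Merge tag files from every head, then apply local tags last (an assumption)."""
    merged = {}
    for text in hgtags_per_head:
        merged.update(parse_tags(text))
    merged.update(parse_tags(local_tags_text))
    return merged


# Two heads with slightly different .hgtags contents, plus one local tag.
heads = ["aaa 2.20.0\nbbb 2.22.0\n", "ccc 2.22.0\nddd 2.23.1\n"]
print(current_tags(heads, "eee my-local\n"))
```

Because the tag file is versioned like any other file, different heads can disagree about a tag, which is why some merging step like this is needed at all.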

Other Information

DSCMClientsBenchmark

There are some nice Wiki articles BzrForGnomeDevelopers and GitForGnomeDevelopers.

Other Comparisons

Consider building up a list of desired features and seeing how the various systems compare for GNOME's purposes. This can be a lot more focused than just a feature list. The most useful template is probably FreeBSD's but there are several:

From the above notes here's a start at such DSCM features list:

  • Ease of use:
    • What is the basic command set? (This might be similar across all SCMs. It's at least: clone/branch, update/checkout, commit, diff, pull, push, merge. Allow for variations in command names for particular operations.)
    • How large is the full command set?
    • How mature is the documentation?
    • How hard is it for an inexperienced user to "just get started"?
  • Efficiency:
    • How fast are the basic commands?
  • Power:
    • Are there features that no one else has? (This is obviously flame territory, but there are things that each of them does particularly well.)
  • Extensions:
    • How hard is it to add new features?
    • Are new features added as plugins?
  • Portability:
    • Does it run on Linux? On Windows? On MacOS [does GNOME even run on MacOS??]
  • Server support:
    • How hard is it to run a local server to make branches available?
  • GNOME integration:
    • Are there graphical interfaces over the tool for GNOME?
  • SVN integration:
    • How easy is it to use the tool against a Subversion repository?
    • Is push supported as well as pull?
  • Tool integration:
    • How well integrated is it with other tools? Examples are Bugzilla, quilt and so on.

Discarded/Invalid arguments

Instead of deleting, move obvious non-arguments here. This will probably end up as a flamefest ;)

  • (For Bazaar) Not as slow as it used to be. (this is to remind that the benchmarks are not very accurate)
    • So?
  • (Against Git) Does "over hyped" count as an argument against ? (for example mercurial should get more attention than what it gets).
    • No, "over hyped" is emotional.
  • (For/Against Bzr) Launchpad integration isn't something to skim over.
    • Launchpad is not Free Software. Why does it matter?
      • Launchpad provides useful services (just like gmail, flickr, etc); if you don't want to use it for political reasons, then don't.
    • Disk space is cheap. How relevant is this?
  • (For Git) Some relevant Git documentation (How is this relevant for comparison purposes?)

  • Back and forth about C / Python vs the C&&bash&&perl :

    • This applies to any SCM not written in C.
      • I wouldn't tend to agree, both bzr and mercurial are written in python and they provide the same set of functionality across the supported platforms, without involving hacks.
      • And since when is Perl a "hack"? There are a couple of different releases of Perl for win32, including the commercially supported ActiveState Perl (http://www.activestate.com/). As for bash, it's been out there for ages, so I don't consider it a "hack". If you depend on a non-compiled language then you'll have to depend on the availability of the interpreter on the target architecture.

        • I think you are misunderstanding what was meant by "hack": I don't think it means that Perl or bash are hacks. The question is: how much hacking around do you need to do to get the tool running on a new system? Having the system be in multiple languages is a potential porting hazard. In the case of git the main issue is bash, which for Windows presumably means it needs Cygwin to run. That's not true of the C and Perl parts.

and the winner is...

Git

Attic/DistributedSCM (last edited 2013-11-22 23:18:38 by WilliamJonMcCann)