Request for Comments: Choosing a distributed version control system for PHP

  • Version: 0.1
  • Date: 2011-07-30
  • Author: David Soria Parra <dsp at php dot net>
  • Status: Draft
  • First Published at: http://wiki.php.net/rfc/dvcs

Introduction

PHP currently uses Subversion (SVN) as its version control system. This has a few drawbacks, some of which a decentralized version control system can solve. This RFC aims to provide the information needed to choose one of the proposed decentralized version control systems (DVCS).

Current Situation

Subversion is used to host the main PHP repository and sub projects such as PEAR and PECL. Access is granted through an implemented Karma System (KARMA). Using Subversion has drawbacks:

  • requires network access
  • slow log/annotate commands
  • large checkout sizes
  • single point of failure
  • painful merging
  • no implicit consistency checks

A decentralized version control system can solve these:

  • better merging support
  • consistency checks using SHA1 checksums
  • local repository, no network access required to commit
  • network access only for push/pull
  • no single point of failure, easy to set up multiple hosting
  • advanced features such as rebase to linearise history, bisect to find regression bugs
  • “social coding platforms” enabling developers to easily submit patches.
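
As a minimal sketch of the bisect feature mentioned above, the following script builds a throwaway repository in which the seventh of ten commits introduces a regression and lets git bisect run locate it. The file names, commit messages and the grep-based check are made up for illustration; a real check would more likely build PHP and run a test.

```shell
set -eu

# Throwaway repository, for illustration only
repo=$(mktemp -d)
cd "$repo"
git init -q .
git config user.name "Example User"
git config user.email "user@example.com"

# Ten commits; from commit 7 onwards status.txt says "fail"
for i in 1 2 3 4 5 6 7 8 9 10; do
    if [ "$i" -ge 7 ]; then
        echo "fail" > status.txt
    else
        echo "pass" > status.txt
    fi
    echo "$i" > counter.txt
    git add status.txt counter.txt
    git commit -q -m "Commit $i"
done

first=$(git rev-list HEAD | tail -n 1)    # oldest commit, known to be good

# Mark HEAD as bad and the oldest commit as good, then let Git search.
# The check command exits 0 for a good revision and non-zero for a bad one.
git bisect start HEAD "$first" > /dev/null
git bisect run grep -q pass status.txt > /dev/null

culprit=$(git log -1 --format=%s refs/bisect/bad)
git bisect reset > /dev/null 2>&1
echo "First bad commit: $culprit"
```

Mercurial offers a comparable hg bisect command.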

Decentralized version control systems have some drawbacks:

  • no partial checkout of subdirectories
  • no empty directories, .keep file needed
  • no globally unique incrementing revision numbers; SHA1 hashes serve as globally unique revision identifiers
  • no svn:externals, no svn:eol-style

Overview Competitors

Git

Git was written by Linus Torvalds as a replacement for BitKeeper that was used for Linux Kernel development until 2005. Git is used by various large Open Source Projects, including Perl, VLC and Gnome. It is considered the fastest Open Source DVCS.

Git is written in C, Shell and Perl. It runs under Linux, BSD and Mac OS X. A Windows version based on msys is available through the msysgit project.

URL:          http://git-scm.org
Mailinglist:  http://vger.kernel.org/vger-lists.html#git
Wiki:         https://git.wiki.kernel.org/
Version:      1.7.6

Mercurial

Mercurial was written by Matt Mackall in 2005. It is used by large Open Source Projects like OpenJDK, Python and Mozilla. It is written in Python with some modules written in C for performance reasons. Mercurial is available for Linux, BSD, Mac OS X and Windows. Mercurial's command line name is 'hg', a reference to the symbol of the chemical element mercury.

URL:          http://mercurial.selenic.com
Mailinglist:  http://mercurial.selenic.com/wiki/MailingLists
Wiki:         http://mercurial.selenic.com/wiki/
Version:      1.9

Concepts

While every version control system has its own terminology, some terms are used in every decentralized version control system. Here is a list of common definitions:

repository

  A collection of revisions, organized into branches

clone

  A complete copy of a branch or repository

commit

  To record a revision in a repository

merge

  Apply all the changes and history from one branch or repository to another

pull

  To update a checkout/clone from the original branch/repository, which can be
  remote or local

push

  To copy revisions from one repository to another

Revision Model

Git and Mercurial use string representations of SHA1 checksums to identify a changeset. Both version control systems offer reserved names to access frequently used changesets such as the topmost commit. In both systems a user can specify only a part of the full SHA1, as long as this part identifies a single changeset.
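
A short sketch of this for Git, using a throwaway repository: HEAD is the reserved name for the topmost commit of the current branch, and an abbreviated SHA1 resolves to the full one. The repository contents and variable names are illustrative only.

```shell
set -eu

# Throwaway repository, for illustration only
repo=$(mktemp -d)
cd "$repo"
git init -q .
git config user.name "Example User"
git config user.email "user@example.com"
echo "first" > file.txt
git add file.txt
git commit -q -m "First commit"

full=$(git rev-parse HEAD)            # reserved name -> full SHA1
short=$(git rev-parse --short HEAD)   # abbreviated, still unambiguous SHA1
resolved=$(git rev-parse "$short")    # abbreviation -> full SHA1 again

echo "full:     $full"
echo "short:    $short"
echo "resolved: $resolved"
```

In Mercurial the corresponding reserved name for the most recently added changeset is tip.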

In addition to global revision numbers, Mercurial offers local revision numbers. They are incrementing integers that can be used to identify a changeset. Multiple repositories of the same project do not necessarily have the same local revision numbers.

Branching Model

Git and Mercurial have fundamental differences in their branching model.

Git uses pointers to a changeset to define a branch. Every ancestor of a changeset that is marked that way is part of the branch. If you delete the pointer, the name of the branch is gone and can only be recovered using the so-called reflog, provided the entry has not yet expired. This means that you cannot bring back the name of a branch after a few years.
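
To make the pointer model concrete, here is a small sketch in a throwaway repository: a branch is created, deleted, and then re-created from the reflog of HEAD. The branch names experiment and recovered are arbitrary, and the reflog position HEAD@{1} is only correct for exactly this sequence of commands.

```shell
set -eu

# Throwaway repository, for illustration only
repo=$(mktemp -d)
cd "$repo"
git init -q .
git config user.name "Example User"
git config user.email "user@example.com"
echo "base" > file.txt
git add file.txt
git commit -q -m "Base commit"
main_branch=$(git symbolic-ref --short HEAD)   # whatever the default branch is called

# Create a branch and commit to it
git checkout -q -b experiment
echo "change" >> file.txt
git commit -q -a -m "Work on experiment"
tip=$(git rev-parse experiment)    # the branch name is just a pointer to this commit

# Deleting the branch removes the pointer, not the commit
git checkout -q "$main_branch"
git branch -q -D experiment

# HEAD@{0} is the checkout above, HEAD@{1} the commit made on experiment
lost=$(git rev-parse 'HEAD@{1}')
git branch recovered "$lost"
echo "recovered branch points at $(git rev-parse recovered)"
```

Once the reflog entry has expired and the commit has been garbage collected, this recovery is no longer possible.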

Mercurial, on the other hand, records the name of the branch in the changeset itself. Once you've committed to a branch, the branch name will stay. You can close a branch, but you cannot remove the branch name without altering history. The drawback of this approach is that branches are not well suited to small, short-lived test branches, as naming conflicts can occur. Mercurial offers so-called bookmarks and anonymous branches, which work similarly to Git's branching model, to solve this.

Workflows

The following section describes typical workflows. Note that not all Subversion workflows translate one-to-one to a DVCS.

Setup

Git
  git config --global user.name "David Soria Parra"
  git config --global user.email "dsp@php.net"
Mercurial

Edit ~/.hgrc

  [ui]
  username = David Soria Parra <dsp@php.net>

Checkout and Patch

Git
  $ git clone git://git.php.net/php-src.git
  $ cd php-src
  $ git checkout PHP_5_4
  ... hack Zend/zend.c ...
  $ git commit Zend/zend.c
  $ git push
Mercurial
  $ hg clone http://hg.php.net/php-src
  $ cd php-src
  $ hg update PHP_5_4
  ... hack Zend/zend.c ...
  $ hg commit
  $ hg push

Port patches across branches

Git
  $ git checkout master
  $ git merge PHP_5_4
  or
  $ git checkout master
  $ git cherry-pick a32ba2 # assuming a32ba2 is the commit to port
  
Mercurial
  $ hg update default
  $ hg merge PHP_5_4
  or with the transplant extension installed
  $ hg update default
  $ hg transplant a32ba2

Releasing a version

Git
  $ git checkout PHP_5_4
  $ git tag --sign v5.4.1
  $ git push origin v5.4.1
Mercurial
  $ hg update PHP_5_4
  $ hg tag v5.4.1
  $ hg push

Backport Patch

Git

To backport a changeset you can use the git cherry-pick feature.

In some circumstances this can lead to duplicated commits that can cause trouble during merges, so backporting a feature is discouraged. Instead, try to apply a patch to the oldest currently maintained branch and merge this branch into the newer maintained release branches.

  $ git checkout master
  .. hack hack ..
  .. commit rev 3ab3f
  $ git checkout PHP_5_4
  $ git cherry-pick 3ab3f
Mercurial

To backport a changeset you can use the hg transplant feature from the transplant extension that is shipped with Mercurial.

In some circumstances this can lead to duplicated commits that can cause trouble during merges, so backporting a feature is discouraged. Instead, try to apply a patch to the oldest currently maintained branch and merge this branch into the newer maintained release branches.

  $ hg update default
  .. hack hack ..
  .. commit rev 3ab3f
  $ hg update PHP_5_4
  $ hg transplant 3ab3f

Moving extension from/to core to/from pecl

We will use separate repositories for PECL and PEAR modules. php-src will be a separate module. We need a mechanism to move extensions from PECL to core and vice versa.

Both Mercurial and Git support subrepositories (called submodules in Git). These are external references to repositories. The advantage of this approach is that it is very easy to add and remove modules by just modifying the external references. The drawback is that you will not have a combined history log of all subrepositories. Commits across multiple subrepositories will lead to separate commits. Mercurial and Git will not know that these commits are related.

An alternative to this approach is the use of subtree merges. You can merge a repository into a subdirectory of another repository. This way the merged history becomes a full part of the repository's history. The drawback of this approach is that you need in-depth knowledge to perform such merges or to split repositories again. A similar approach can be used with Mercurial by using the convert extension and merge.
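
A minimal sketch of a subtree merge in Git, using two throwaway repositories. The names ext-repo, core, pecl_example and the target directory ext/example are made up for illustration. The --allow-unrelated-histories option is an assumption about the installed Git: it is required by Git 2.9 and later and does not exist in the 1.7.x versions discussed in this RFC, where it has to be left out.

```shell
set -eu

work=$(mktemp -d)
cd "$work"

# Stand-in for a PECL extension repository
git init -q ext-repo
cd ext-repo
git config user.name "Example User"
git config user.email "user@example.com"
echo "dnl config" > config.m4
git add config.m4
git commit -q -m "Initial extension commit"
ext_branch=$(git symbolic-ref --short HEAD)
cd ..

# Stand-in for php-src
git init -q core
cd core
git config user.name "Example User"
git config user.email "user@example.com"
echo "core" > README
git add README
git commit -q -m "Initial core commit"

# Fetch the extension history and record a merge without taking its files yet
git remote add pecl_example ../ext-repo
git fetch -q pecl_example
git merge -q -s ours --no-commit --allow-unrelated-histories "pecl_example/$ext_branch"

# Read the extension tree into a subdirectory and conclude the merge
git read-tree --prefix=ext/example/ -u "pecl_example/$ext_branch"
git commit -q -m "Merge extension into ext/example"

git log --format=%s
```

After the merge, the log of the core repository contains the commits of the extension as well as its own.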

In our use case it makes more sense to use subtree merges. Moving extensions from or to core doesn't happen frequently, and the overhead of performing the merges and splits is worth the benefit of having one php-src repository that contains all extensions including their full history.

Tools and Platform Support

Operating Systems

Mercurial is available for all major platforms: Linux, BSD, Mac OS X, Windows. All core features are available on supported platforms.

Git is available for Linux, BSD and Mac OS X. Windows binaries based on msys are provided by the msysgit project. As early Git versions primarily targeted Linux, some commands can still be slower or even non-existent on Windows.

CRLF -> LF

Git supports CRLF to LF conversion. This can be configured using the variables core.autocrlf and core.safecrlf and through gitattributes.
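
As a hedged illustration of these settings, the following script configures a throwaway repository and asks Git which line-ending attribute applies to a path. The file patterns are examples only and are not a proposal for php-src.

```shell
set -eu

# Throwaway repository, for illustration only
repo=$(mktemp -d)
cd "$repo"
git init -q .

git config core.autocrlf input   # convert CRLF to LF on commit, no conversion on checkout
git config core.safecrlf true    # refuse conversions that could not be reversed

# Per-path rules live in a .gitattributes file
printf '%s\n' \
    '*.c text eol=lf' \
    '*.bat text eol=crlf' \
    '*.png binary' > .gitattributes

# check-attr reports the attribute value that applies to a given path
c_eol=$(git check-attr eol -- main.c)
bat_eol=$(git check-attr eol -- build.bat)
echo "$c_eol"
echo "$bat_eol"
```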

Mercurial supports CRLF to LF conversion using the EOL extension.

GUI

Mercurial: Various GUI tools are available: TortoiseHg, hgk, MacHg, Eclipse, Emacs, etc

Git: Various GUI tools are available: TortoiseGit, gitk, git-cola, qgit, Eclipse, Emacs, etc

Web

Unique features

Git

Index

Git implements an index (also called staging area) between repository and working directory that keeps track of changed files. Only changes that are tracked in the index are part of the next changeset upon commit. This makes it possible to "stage" only parts of the changes made to the working directory (see: git add -p). The drawback of this approach is that you have to manually stage a change or use the --all switch in git commit. While this is a powerful feature, it can confuse people coming from other version control systems.
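
A small sketch of the index in action, using a throwaway repository with made-up file names: two files are modified, but only the one that was staged ends up in the commit. It stages whole files; git add -p works the same way on individual hunks, but is interactive.

```shell
set -eu

# Throwaway repository, for illustration only
repo=$(mktemp -d)
cd "$repo"
git init -q .
git config user.name "Example User"
git config user.email "user@example.com"
echo "a" > a.txt
echo "b" > b.txt
git add a.txt b.txt
git commit -q -m "Add a.txt and b.txt"

# Modify both files, but stage only one of them
echo "changed" >> a.txt
echo "changed" >> b.txt
git add a.txt
git commit -q -m "Change a.txt only"

committed=$(git diff --name-only HEAD~1 HEAD)   # files changed by the last commit
pending=$(git diff --name-only)                 # files still modified but not staged
echo "committed: $committed"
echo "still modified: $pending"
```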

Separate Author and Committer

Git separates author and committer and records both in a changeset. A commit can have a different author than committer. This is useful for PHP, as a patch from the mailing list will be committed with the original author and the information about who committed it, making it easier to identify who wrote the patch initially.
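
The following sketch shows how this looks in practice. The names and addresses are placeholders; the committer is taken from the repository configuration, while the author is passed explicitly with --author.

```shell
set -eu

# Throwaway repository, for illustration only
repo=$(mktemp -d)
cd "$repo"
git init -q .
git config user.name "Core Committer"
git config user.email "committer@example.com"

# Commit a patch on behalf of somebody else
echo "fix" > fix.txt
git add fix.txt
git commit -q --author="Patch Author <author@example.com>" \
    -m "Apply patch from the mailing list"

author=$(git log -1 --format='%an <%ae>')
committer=$(git log -1 --format='%cn <%ce>')
echo "author:    $author"
echo "committer: $committer"
```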

Mercurial

Local Revision Numbers

Both Git and Mercurial use SHA1 to identify a changeset and ensure it's globally unique. The string representation of a SHA1 is the revision number. Incremented integers cannot identify a changeset globally but they are useful shortcuts in local repositories.

Mercurial uses incremented integers, similar to SVN revision numbers on changesets. These can be used on a local repository to identify a revision. Git only supports SHA1.

Changeset

  changeset:   75485:61e266b471e4
  branch:      PHP_5_4
  tag:         tip
  parent:      75482:b5e860dc2f05
  user:        sixd@c90b9560-bf6c-de11-be94-00142212c4b1
  date:        Mon Jul 25 17:30:09 2011 +0000
  summary:     Patch r313663 and r313665 to allow PECL builds to work with earlier releases

has the local revision number 75485 and the global revision number 61e266b471e4.

Revsets and Filesets

Mercurial has a powerful query language to select changesets or files to include in log, diff and similar commands. For example, to search for the first tagged revision that includes a given changeset (written as $1 below) you can run the following query:

  hg log -r 'limit(descendants($1) and tagged(), 1)'

Extensions

Mercurial is written in Python and supports loading and executing Mercurial extensions written in Python that can access the internal Mercurial API.

Mercurial extensions are a common way to implement additional features such as rebase, commit signatures and access control. Mercurial ships with a set of core extensions. A full list of extensions can be found in the Mercurial wiki.

Benchmarks

The tests were done with Git v1.7.6 and Mercurial 1.9 on a Thinkpad X201, Intel i5 M540 @ 2.53 GHz, Samsung 128 GB SSD, 4 GB RAM.

For the php-src repository (not the complete repository):

Benchmark                          Git      Mercurial
Repository size                    120 MB   197 MB (with hg 1.9 generaldelta)
Switching branch trunk -> PHP_5_4  0.182s   1.328s
Annotate file (Zend/zend.c)        2.936s   0.745s
Log over the last 1000 commits     0.752s   1.111s

Hosting Infrastructure

The main repository will be hosted at php.net. This will make implementing infrastructure or scripts easier and gives us full power over our development environment.

Other hosting sites can be used to attract more developers. DVCS make it easy to push a repository to different locations and keep them in sync.

Git / Github

The most popular hosting site for Git based projects is github.com. Github encourages people to interact with hosted projects by making it easy to clone a repository and send a “pull request” to the upstream project.

Github reserved a PHP user for the PHP project with unlimited public repositories. We can use this account to create a repository and use pull requests from github to integrate contributions into PHP.

A typical workflow for example is:

1. Pull the pull request from github.com
2. Merge locally
3. Push merge to git.php.net
4. Automatic sync between git.php.net <-> github will push the changes
   to github. The pull request will be closed automatically.
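
Steps 1 to 3 can be sketched with local repositories standing in for the real hosts. Here central.git plays the role of git.php.net and fork the role of a contributor's repository on github.com; all names, including the fix-typo branch, are made up. Step 4, the automatic sync, is not shown.

```shell
set -eu

work=$(mktemp -d)
cd "$work"

# Stand-in for git.php.net: a bare repository fed by the maintainer
git init -q --bare central.git
git init -q maintainer
cd maintainer
git config user.name "Maintainer"
git config user.email "maintainer@example.com"
echo "core" > README
git add README
git commit -q -m "Initial commit"
branch=$(git symbolic-ref --short HEAD)
git remote add origin ../central.git
git push -q origin "$branch"
cd ..

# Stand-in for a contributor's fork with a proposed change
git clone -q central.git fork
cd fork
git config user.name "Contributor"
git config user.email "contributor@example.com"
git checkout -q -b fix-typo "origin/$branch"
echo "fix" >> README
git commit -q -a -m "Fix typo in README"
cd ..

# 1. Pull the proposed change, 2. merge it locally, 3. push to the main repository
cd maintainer
git fetch -q ../fork fix-typo
git merge -q FETCH_HEAD
git push -q origin "$branch"
cd ..

central_head=$(git --git-dir=central.git rev-parse "$branch")
fork_head=$(git --git-dir=fork/.git rev-parse fix-typo)
echo "central: $central_head"
echo "fork:    $fork_head"
```

Because the contributor's branch is a direct descendant of the main branch, the merge in this sketch is a fast-forward; diverged histories would produce a merge commit instead.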

Subversion Integration

Github supports an SVN bridge. You can check out and commit to a repository using SVN. Advanced Subversion features such as properties do not work.

Mercurial / Bitbucket

Bitbucket is a popular hosting site for Mercurial projects. It is built similarly to github, although the number of hosted projects is smaller than on github.

A typical workflow for example is:

1. Pull the pull request from bitbucket.org
2. Merge locally
3. Push merge to hg.php.net
4. Automatic sync between hg.php.net <-> bitbucket will push the changes
   to bitbucket. The pull request will be closed automatically.

Subversion Integration

Bitbucket supports an SVN bridge. You can check out and commit to a repository using SVN. The Subversion integration is in beta phase, and commits can fail under certain circumstances.

Implementation

The implementation details are handled in subsequent RFCs. Important issues that can influence the decision are outlined in this RFC.

We will not convert the whole SVN repository at once. As Git and Mercurial do not allow checkouts of subdirectories, we have to split up the repository into modules. To guarantee minimal downtime and a smooth transition, we will move one module after another. The first module will be php-src and the karma system. Other modules like systems will follow.

Migration RFC

The migration of modules is handled through the following RFCs:

  1. Migration of php-src and KARMA

more to follow.

Discussion

In this section I'll try to outline some qualitative evaluation based on discussion that I had about this topic with people from the PHP community and with other users of Git and Mercurial.

It seems that Mercurial is considered easier to learn when coming from Subversion. Its extension system keeps the core commands simple while advanced users can add more features. This extensibility can be useful when it comes to implementing the Karma system or documentation-related translation synchronization. Mercurial comes with excellent Windows support. It also has very capable HTTP support, making it easy for people to pull and push through proxies. The UI is kept simple and Mercurial does not expose low-level commands. For people without much knowledge about version control systems and people coming from Subversion, Mercurial is more suitable than Git.

Git is considered the most used DVCS in the Open Source community. Git does not have a plugin system, but the Karma system can be implemented through push and pull hooks. Git offers HTTP support that can be used to push and pull through a proxy. Newer Git versions offer a smart HTTP protocol that can be considered as good as the Mercurial HTTP support. Due to its history concepts and the large set of commands, Git has a steeper learning curve than similar systems. Git is widely used by Open Source projects written in PHP, such as Zend Framework, Symfony 2, phpBB and xdebug. The most important argument in favour of Git, however, is not Git itself, but Github. It has a large user base and makes it very easy for people to participate in a project. It's far more popular than Bitbucket.

Asked for an evaluation of which system to choose in the end, I personally would lean towards Git. It seems that a lot of developers already know Git. Also, Github is a very important factor in enabling people to contribute to PHP.

Further Readings and References

[1] http://code.google.com/p/msysgit/

Further Readings:

Changelog

rfc/dvcs.1312746122.txt.gz · Last modified: 2017/09/22 13:28 (external edit)