Defying Classification

by Malcolm Tredinnick

Fri 2 Nov 2007

Development experiences with version control

Posted at 23:34 +1100

I recently received an email from somebody asking about my experiences using git to do development for Django. It turned out they weren't interested specifically in git, but in how well it works to use a different system from the upstream master repository.

Well, I can write a few words about that. I've written about some of my previous experiences here and here, so this is an update based on accumulated experience.

Frankly, I don't really care which version control system people choose to use. Smart developers are going to be vaguely familiar with most and very familiar with the one or two they mostly work with. The barrier to exchanges between different systems is annoying at times, but survivable. The patch program exists for a reason and is still quite adequate for its purpose. So this isn't going to be a post about why I'm right and others are wrong. It's just not interesting to me. At the base level, most distributed version control systems are very similar, despite all the heat generated about their differences. I'm not working in an area where the differences matter that much, so I can afford to pay attention out of interest but not care too much. I've made a choice, but it's not necessarily the right one for other people.

Although, as some of my earlier posts indicated, I have used cogito and st-git in combination with git in the past, these days I am primarily just using git. Most of the cogito features have been made easier in git directly and it turns out I just don't manage patches in a way that st-git helps; creating branches and rebasing them against the master branch is easier for me (see below for the branch discussion).

My requirements

For a project like Django, I have a number of different hats to wear. These include developing new, non-trivial features and applying patches from others. The latter case may be as simple as applying the patch and giving it a quick polish (I've written before on mailing lists that only about two out of every five submitted patches I review go in unchanged on a good day and that seems to match other maintainers' experiences, too), or taking the idea in the patch and substantially rewriting it to be more robust or maintainable, or fill in some holes, or make it a more complete solution.

So I need to be able to track one or more branches from the subversion repository and keep my local changes in sync with them. I'm often working on more than one item at a time, so I'll have a number of local branches going. I'll often have code that takes anywhere from a few days to a few weeks or more (currently, one branch is over a year old) before it lands upstream.

I also tend to work primarily at my desktop machine: reasonable chair, reasonable monitor, decent keyboard. Laptops simply don't provide good ergonomics (the keyboard is attached to the base of the screen, so unless you're an elf, that simply encourages bad posture and eye movement); I choose not to use mine as my primary machine. Still, I do travel a lot and don't mind the occasional lazy morning spent in a coffee shop, pecking away at a problem. So I need easy synchronisation between my local work on the desktop machine and my laptop. Again, any distributed VCS will do here. But the key is "distributed" -- requiring a third, central repository is too much overhead.

Lessons learnt

Names

Sensible naming of branches has become important to me. Git encourages creating branches quickly for temporary work or development that is unsynchronised with whatever "head" might mean at the moment. However, this temporary work can sometimes take on a life of its own or be interrupted by something else, so using a good name that makes sense tomorrow or next week is a sensible precaution.

Because git allows branches to be named like directories, I've gotten into the habit of using prefixes like local/ for branches that are strictly for local development and that I would never push to a public repository (not so relevant for Django work, but important for cases where I publish a repository of branches and I like to be consistent across all projects). Similarly, local branches based off a particular subversion branch will tend to have names that look like the subversion branch.
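A naming convention like this is easy to act on mechanically. As a minimal sketch (the function name and the example branch local/scratch are made up for illustration, not something from my setup), filtering out the private branches before publishing might look like this:

```python
def publishable(branches, private_prefix="local/"):
    """Return the branch names that do not start with the private prefix.

    The prefix follows the local/ convention described above; the
    function itself is only an illustration.
    """
    return [name for name in branches if not name.startswith(private_prefix)]


# "local/scratch" is a hypothetical branch name used only for this example.
print(publishable(["master", "local/scratch", "qs/optimise"]))
```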

Right now, my local git repository has these local branches:

autoescape
field-subclassing
i18n-fix
master
qs/optimise
qs/refactor-master
qs/select-related
urlresolver

plus a number of remote branches:

git-svn
git-svn-queryset-refactor
laptop/autoescape
laptop/field-subclassing
laptop/master
laptop/qs/optimise
laptop/qs/refactor-master
laptop/qs/select-related

Without even revealing the secrets of my naming scheme, you can probably puzzle out which of the remote branches are for syncing with the laptop and which two are the upstream subversion branches I'm tracking. In the local list, it should be clear which of my local branches are related to the queryset-refactor work, just from the prefix.
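To make the point about prefixes concrete, here is a small sketch that groups the local branch names listed above by whatever comes before the first slash. The helper name is an assumption for illustration; only the branch names come from the list above.

```python
from collections import defaultdict

# Local branch names copied from the list above.
LOCAL_BRANCHES = [
    "autoescape", "field-subclassing", "i18n-fix", "master",
    "qs/optimise", "qs/refactor-master", "qs/select-related", "urlresolver",
]


def group_by_prefix(branches):
    """Group branch names by the text before the first '/'.

    Names without a '/' are collected under the empty-string key.
    """
    groups = defaultdict(list)
    for name in branches:
        prefix, sep, _rest = name.partition("/")
        groups[prefix if sep else ""].append(name)
    return dict(groups)


for prefix, names in sorted(group_by_prefix(LOCAL_BRANCHES).items()):
    print(prefix or "(no prefix)", names)
```

The three queryset-refactor branches end up together under the qs key, which is all the naming scheme is meant to achieve.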

Branch freely

Bill de hÓra wrote last month about the advantages he's seeing in being able to branch easily for both permanent and temporary purposes. I've seen similar benefits in my work.

Being able to cherry-pick particular commits from one branch to merge into another branch is something I wouldn't be without now. The ability to merge both with history (merging branch A into branch B along with the commit messages from branch A in the right chronological places) and in a "squashed" version (merging branch A into branch B as a single commit, as though it were just one patch) is also handy.
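For readers who haven't met these operations, the sketch below builds the corresponding git command lines without running them. The helper names and the commit id abc1234 are placeholders for illustration; the commands themselves (git cherry-pick, git merge, git merge --squash) are as documented for current git releases, and older releases may spell them slightly differently.

```python
def cherry_pick(commit):
    """Command that copies one commit onto the current branch."""
    return ["git", "cherry-pick", commit]


def merge(branch, squash=False):
    """Command that merges `branch` into the current branch.

    With squash=True the changes are staged as one combined change and
    git stops before committing, so a separate `git commit` is needed.
    """
    cmd = ["git", "merge"]
    if squash:
        cmd.append("--squash")
    cmd.append(branch)
    return cmd


# "abc1234" is a placeholder commit id, not a real one.
print(" ".join(cherry_pick("abc1234")))
print(" ".join(merge("qs/optimise")))
print(" ".join(merge("qs/optimise", squash=True)))
```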

I recently posted a screenshot of some commercial work I was doing where git was helping me with my local development. I could juggle four or five simultaneous feature developments and have them tested and approved in pieces. By the end of two weeks with the client, I was back down to everything merged into the master branch. That kind of work is just harder if branches aren't effectively zero cost to work with.

Subversion integration

Here's where things get a little git-specific, I guess. Git has fairly good subversion integration in two formats. One is git-svnimport, for maintaining a read-only copy of a repository. The other, git-svn, allows both committing and updating from the subversion repository. I only use git-svn these days.

I used to just do a subversion checkout and then use git to manage local branches, but I had a number of near misses when I typed svn commit instead of cg commit (I was mostly using cogito, a layer over git, in those days) and I needed a safer way. Fortunately, git-svn is actively developed within the core git source and works acceptably well.

There are a few problems in the transition phase. Notably, working out where two upstream branches were merged is difficult. You can tell git to graft the two local copies at a particular revision, but it seems there are still some bugs to be worked out and I'm not convinced it will always commit to the right branch the next time I run git svn dcommit. I am terrified of accidentally pushing to the wrong upstream branch or screwing things up in other ways upstream, so until I've had a chance to test much more thoroughly locally, I'm staying away from grafting. It's not too hard to do things manually at the moment.
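One way to take some of the fear out of dcommit is to look before leaping. The sketch below is an assumption about how one might wrap it, not a description of my actual workflow: it runs git svn dcommit with the --dry-run option (documented in current git-svn releases) and only runs the real command after an explicit confirmation. The function name and the injectable run and confirm parameters are made up for illustration, and exist so the logic can be exercised without touching a repository.

```python
import subprocess


def safe_dcommit(run=subprocess.run, confirm=input):
    """Show what dcommit would send upstream, then push only on confirmation.

    Returns True if the real dcommit was run, False if it was declined.
    """
    # First pass: list the commits that would be sent, without sending them.
    run(["git", "svn", "dcommit", "--dry-run"], check=True)
    answer = confirm("Push these commits upstream? [y/N] ")
    if answer.strip().lower() != "y":
        return False
    # Second pass: actually commit to the subversion repository.
    run(["git", "svn", "dcommit"], check=True)
    return True
```

Calling safe_dcommit() with no arguments inside a git-svn checkout would use the real subprocess.run and prompt on the terminal. It doesn't verify which upstream branch is the target; reading the dry-run output is still up to the person at the keyboard.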

Merging between two subversion branches (copying the changes from trunk to a branch, mostly) is a little fiddly still. There are scripts to help make things a little easier, but they're not perfect yet. I use a small variation on this one, but it still can't handle commits that change nothing but subversion properties (see below), which show up as empty commits in git. So sometimes you have to hit things with hammers. Rare enough to not be a showstopper. Annoying enough that I don't look forwards to it.

One of the things I think subversion got wrong in their original design is that they got rid of the .cvsignore equivalent and decided to store all properties in the .svn directories. There are some advantages to this, but it's slightly over-engineered for the common case. You can't easily supply a patch to update subversion properties, yet you absolutely need to be able to set svn:eol-style, and it's handy to be able to set svn:ignore on certain files. If properties were a text file stored as part of the code, it would be easy to work with other version control tools. You could diff .cvsignore, for example. As it is, git can't work easily with subversion properties at the moment and you can't just ignore that for real-world projects.

The good news is that git-svn development is currently focusing on this area, so there's hope for the near future. The bad news is that right now it's a pain in the hindquarters and the one thing I really hate about cross-VCS work.

Portability

As I mentioned above, I chose git because it works for me. I don't publish any git repositories at the moment, although that's mostly out of laziness. I should publish a lot more. The point is, though, that cross-platform portability isn't a big issue for me. It's annoying, though.

There was a Google Summer of Code project for git to try and improve this a bit. It seemed to have moved along to a certain point — the idea being to port everything to use C, rather than a mix of C, Perl and Bash — but I didn't get the feeling that it was a top priority.

Some of the features I rely heavily on, such as git-svn, use Perl modules underneath to do the heavy lifting. So I might not be in a position to use a C-only version for a while. I'm periodically seriously tempted to help out on this porting effort. I like C programming and need to stay in practice. But it's a lot of work and I'm not overrun with free time at the moment, so my commitment would be intermittent at best.

Still, the fact that I can't share git repositories easily with Windows users is a downer. I don't approve of their choice of what we'll charitably call an operating system (for purely technical reasons having nothing to do with "open source politics"), but often they don't have a choice, so I won't hold them hostage over it.

I do have to smile when people ask why I'm not using Mercurial, because it's written in Python. Apparently, in their world, one is only permitted to know a single programming language and one must always be prepared to fix bugs in the software they use. Nonsense! Firstly, only programming in one language must be very limiting. As I said, I like C. It has its place in the world and not knowing it would be very limiting for lots of things that I'm interested in. I also know Perl and shell. So if I wanted to hack on git, the languages wouldn't be a barrier to entry. But I'm also not really that interested in working on it. If I find a bug, I can understand enough to make a valid report, but mostly it works fine for my purposes. I don't have time to write new features, and, again, my choice of VCS is hardly going to change that. I can add extensions to git if I need to and it's designed to be usable in scripts, so incorporating it into other processes is very easy.

Size and speed

This is an area where I haven't done an in-depth comparison between systems. Git is fine for my requirements. You periodically have to run git gc to compact the repository, but that takes a few seconds to complete on a slow day, so big deal. As I mentioned in an earlier article, git's storage requirements are minuscule compared to subversion. And I'm storing a lot more information in git than I am in a single subversion checkout (the whole revision history for each branch, for a start). My current Django repository is 30 MB, with the .git/ directory taking up 15 of those megabytes. That's for all the branches combined.
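If you want to check the same numbers for your own repository, a rough sketch along these lines will do. It is not how I measured the figures above; the function name is an assumption for illustration, and it simply sums file sizes, so it ignores filesystem block overhead.

```python
import os


def tree_size(root, exclude=()):
    """Total size in bytes of the regular files under `root`.

    Directories whose name appears in `exclude` are skipped at any
    depth. Symbolic links are not followed or counted.
    """
    total = 0
    for dirpath, dirnames, filenames in os.walk(root):
        # Prune excluded directories in place so os.walk doesn't descend.
        dirnames[:] = [d for d in dirnames if d not in exclude]
        for name in filenames:
            path = os.path.join(dirpath, name)
            if not os.path.islink(path):
                total += os.path.getsize(path)
    return total
```

Run from the top of a checkout, tree_size(".git") gives the size of the repository data and tree_size(".", exclude=(".git",)) gives the size of the working files alone.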

Speed isn't an issue for me. As usual, "fast enough" is the requirement here. For the things I do, annotating files for changes, viewing logs, bisecting to find the source of an error... everything feels as fast as I might wish.

Topics: software/django, software/version control