Archive

Posts Tagged ‘mercurial’

The “D” in DVCS stands for “Different People”

September 6, 2010 4 comments

Someone on Stack Overflow disagreed with me about using centralized version control for a solo project. As I predicted, of course; as I said in my last post, DVCS is a fad and it will have many converts who support it in a knee-jerk fashion. I think the person who disagreed may just be misinformed, or not thinking about this hard enough. I may not have made things clear in my last post, however, about why centralized version control makes more sense for a solo developer. Also, I admit that it’s paradoxical that someone would use centralized version control for a solo project. However, as I’ll show, centralized version control does make more sense (not only that, but I’m not the only one who thinks so).

Consider this: if you are a solo developer working on the same machine all the time (e.g. a laptop), a DVCS repository in your current working directory and a Subversion repository in your home directory are practically the same. The repository is always online and you can always make commits. This is what makes the “commit while offline” argument of DVCS proponents so weak: for certain situations you can do that with Subversion, or most of the time you won’t need to.

Now consider that if you are a solo developer working on multiple machines, DVCS only creates an extra step in your development. I always work on my big ol’ workstation during the day. If I were using git or hg for my most-frequently-worked-on projects, I would need to remember to push my changes from one repository to the other. With Subversion I just don’t have that step. All other steps are identical with the two workflows.

All through those above arguments, I am only one developer, and this is the crux of the idea. I think people often confuse the situation of multiple computers with that of multiple people. In the former argument, I posed that with one computer there is no difference between using file URLs and using a DVCS repository in the current working directory. However, there would be a difference if there were multiple developers. Then you have to configure access to a single machine (either for cloning or checking out) for multiple users; this is when things get more complex.

The “D” in DVCS stands for “Different People.” What I mean by that is that the “forking is fundamental” argument posed by DVCS proponents doesn’t apply when there’s a single developer or author working on a project. If by myself I want to create branches and work on different features, that is perfectly easy to do with Subversion, and so is merging. What distributed version control is for different people maintaining different features and seeing how they work. I completely reject the idea that centralized version control is obsolete and I will keep recommending it to solo developers.

Distributed version control: I get it, but I don’t need to

September 4, 2010 4 comments

Over the past few months I’ve been working with new tools to enable me to work on several machines and keep the same important data, such as configurations and bookmarks. The impetus for this was that I’ve started to use a laptop: I never wanted one, but then somebody on the TriLUG mailing list offered one for sale for $20, and I just bought it. It’s really useful for when I’m watching the kids or when travelling: I don’t have to use someone’s antiquated, slow Windows computer just because I’m visiting them. Also my wife needs to be on our main machine after we put the kids to bed. Short of acting out my fantasy and installing a server with LTSP terminals throughout the house, the laptop is good for me to keep working. Unfortunately the usual routine of setting up my shell, Emacs, Firefox, email and my Org Mode agenda files seemed so laborious that I realized “there’s potential for automation here.”

Sync Tools

To tackle email, I switched from MH to using IMAP. For Firefox I started using Weave. I was using Delicious for a long time, but Delicious is not free software so I decided I didn’t trust it. Weave is as free as Firefox, so when I heard about it I decided to go for it and it’s worked really well. I rarely used the social aspect of Delicious and mostly used it for portable bookmarks.

For the other two areas, Emacs and Org Mode files, the solution was less clear. I had tried using ssh urls with tramp to have my agenda on multiple machines, then I saw Carsten Dominik’s Tech Talk about Org Mode, where he described using git to manage his Org files on multiple computers. For config files (shell and Emacs) I had tried using rsync to mirror them on different machines, using a Makefile with rsync commands. However, different needs of different machines would always screw things up. Then I remembered that version control might be the right tool. I had tried that before with Subversion (SVN), my main version-control system (VCS), but things had not gone much better than with rsync. Then I thought perhaps a distributed version-control system (DVCS) would make more sense.

Using DVCS

My first impetus for using something other than Subversion was that I’ve discovered having one project per repository makes the most sense; that way I can make branches and not worry about confusing anybody. So I have a repository for my webpages, and a repository for my biggest project. That works well. However, I also started working on a book (i.e. it became a book) and it really didn’t fit in either repository. I came down to a choice between adding another Subversion repository, with all the Apache setup, or using something else that would be more convenient. Although setting up Apache is not hard after the third or fourth time you’ve done it, I still felt like it was unnecessary. I knew I would be the only person working on this, and therefore something that I could configure to use ssh made the most sense.

This is the most compelling argument for distributed version control: it’s easy to set up a repository and it’s easy to configure access for local system users. With Mercurial (similar for git and bzr), you just do

joel@chondestes: ~/tmp > hg init repo
joel@chondestes: ~/tmp > cd repo
joel@chondestes: ~/tmp/repo > touch myself
joel@chondestes: ~/tmp/repo > hg add myself
joel@chondestes: ~/tmp/repo > hg status
A myself

That to me is really compelling. Setting up a Subversion repository is also pretty easy, depending on where you want to put it, but configuring access from the internet is not as simple. I can access the above-initialized Mercurial repository just using an ssh url argument to its clone or pull commands.

Another thing that is really good about DVCS systems (all that I’ve tried) is that they’re really good at merging. They do not have a monopoly on this, however; Subversion has been good at merging for as long as I’ve been using it. Again, I may be different from the people writing the manuals in that I don’t work on large projects with lots of contributors.

For some reason, however, the biggest advantage of distributed version control touted by its proponents is that you can make commits when you don’t have internet access. Wow! That is huge. Oh wait, that’s sarcasm. I am in this situation pretty often working on my Subversion projects and it really doesn’t bother me. If I’ve come to a point where I should make a commit and I don’t have network access I can do one of two things: I can make a ChangeLog comment and keep working, or I can stop working. I always have other things I can do. Seriously I don’t see this as an advantage, especially when if what you want is to update another repository you have to push your changes anyway, and that requires network access. Committing changes to my Org-files would be useless unless I could push them to the computer I know I’ll be sitting at in the morning.

Another ridiculously inflated advantage proponents mention is that you don’t have to worry about breaking the build when you commit, because you make your commits to your local repository. I have spent another blog posting on this concept already, but again this is not a distinct advantage. I commit things that are broken when they’re broken already, but not if I’m adding a new feature. If you want to commit something on a new feature that might screw other things up, the proper way to do it with centralized version control is to make a branch. It seems like some people don’t think this is possible with Subversion, but I’ve been doing it since the beginning. Not only is it possible, it’s recommended.

There are two more big problems I have with distributed version control. First: it’s a fad. I don’t mean that like it’s something that is overblown and bound to die out like MC Hammer. However, it seems like everyone is switching to it and citing the same bad reasons. That to me seems like a warning. The rest of us who know how to use Subversion will just keep on going doing it the right way, reading the manual.

My second big problem that people who like DVCS seem to love is this “fork happens” idea. Forking is when there’s some kind of split in the development philosophy or goals between project members that leads to factionalism. The most famous example is the creating of XEmacs. The author of The Definitive Guide to Mercurial uses socially-oriented rhetoric (thank you, Eric S. Raymond) to justify distributed version control. He says basically that forking is natural and we’ve all been in denial by using centralized version control. Using DVCS on the other hand, brings us out of our comfort zone into some new promised-land where we all have our own fork.

This argument doesn’t really hold up. As others have pointed out, the idea that you’re always working on your own fork is kinda ridiculous. Unless you’re happy to just keep your own version, your contribution to the larger piece of software will always have to be transmitted to someone else. Why would you develop software in a vacuum?

The same Definitive Guide author says that some people use centralized version control because it gives them an illusion of authority, knowing that people won’t be off-putting out their code as someone else’s. While that’s possible, it certainly goes against the tone of the Subversion book, which encourages administrators to let developers develop their own branches freely. And again, even if you don’t have the illusion of authority over a project, you’re going to have to express authority at some point and decide whose changes to merge into the mainline. Now who’s delusional? People don’t want to download some dude’s fork, they want to download from a central point, where all the developers and users meet to discuss and code, and decide which changes should make the software the best.

Which DVCS?

My previous experience with distributed version control was using git to maintain my webpages. There was so much about it that didn’t make sense that I decided its creator was not using the Rule of Least Surprise. He claims to have never used CVS for Linux, so I can understand him not using CVS-like commands. However, the creators of Mercurial and Bazaar seem to have noticed that a lot of people who aren’t Linus Torvalds have also been writing software over the past few years: these two DVC systems do use syntax that is mostly familiar to a habitual Subversion user like me.

I got pretty psyched about Bazaar and read a lot about it over the past few weeks. However, despite the claims made on the bzr website, bzr is really slow. No matter what I do, it’s really slow. I was just playing around with it by cloning a repository on the same machine (again, that is a selling point) and no matter what it was deathly slow with just a few text files and minor changes. I’m not the only one who thinks it’s slow: Emacs maintainer Staffan Monnier recently wrote Emacs-devel to say that some minor pushes had taken over 30 minutes. I liked the bzr interface, but considering that hg has pretty much the same interface and is way faster I decided to stick with using hg for my Org Files. [Update: I am using bzr for a project on LaunchPad.net] The only remaining task is to figure out how to maintain forks of my configuration files on different machines. I think I have just been ignoring very basic rules of usage for merging, so that should not be hard.

Conclusions?

My conclusion is that using Mercurial is a good idea for my needs, and perhaps I can make it work using configuration files again. However, it is not the panacea that its proponents advertise, nor do we need to necessarily rethink version control completely. Version control is good for many things and making bold statements like “forking is fundamental” is really uncalled for. Those sorts of conclusions are specific to the particular needs of the developers involved and not necessarily for me. I’m not going to convert any of my major SVN projects to bzr as I originally intended because as I see it, DVCS does not offer any major advantages over centralized version control. Maybe it does for Linux kernel developers, Firefox developers, etc. For me it’s not a major improvement and it’s not going to help me work any better. I’m going to keep using Subversion and Mercurial, and we’ll see what happens.

Back up your Subversion and Mercurial repositories

June 28, 2010 1 comment

This morning I cooked up two scripts to back up my VCS repositories. I have two Subversion repositories locally, one for my main research project and another for my homepage. I have three hg repositories at the moment, one for local shell scripts (like the ones I was writing), one for my configuration files and another for an extended book project I’m working on.

The svn-oriented script takes either command-line arguments or reads repository paths from a file (~/.svnbk), each one on a single line. It makes (not “takes”) an svnadmin dump to stdout, which is compressed with the best available compression program, in the specified directory.

#!/bin/sh

# svnbk.sh: A tool for backing up my SVN repositories
#
# Copyright (C) 2010 Joel James Adamson
#
# This program is free software; you can redistribute it and/or modify
# it under the terms of the GNU General Public License as published by
# the Free Software Foundation; either version 3, or (at your option)
# any later version.
#
# This program is distributed in the hope that it will be useful,
# but WITHOUT ANY WARRANTY; without even the implied warranty of
# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
# GNU General Public License for more details.
#
# You should have received a copy of the GNU General Public License
# along with this program; if not, write to the Free Software
# Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA
# 02110-1301, USA.

# Usage:
#  svnbk.sh [repositories] bkupdir
#
# If `repositories' is empty then svnbk.sh will look in ~/.svnbk for a
# list of repositories to back up; repositories will be dumped to
# compressed files in the directory `bkupdir'

# If available, use xz compression; otherwise use bzip2; if
# unavailable use gzip
XZ=$(which xz)
BZ2=$(which bzip2)
GZIP=$(which gzip)

if [ -x $XZ ]; then
    ZIP="$XZ"
    EXT=".xz"
elif [ -x $BZ2 ]; then
    ZIP="$BZ2"
    EXT=".bz2"
elif [ -x $GZIP ]; then
    ZIP="$GZIP"
    EXT=".gz"
else
    printf "Compression unavailable; dumping to uncompressed dump files\n";
    ZIP="none"
    EXT=""
fi

# Command-line variables
declare -a argv
argv=("$@")
# number of command-line args
argc="${#argv[@]}"
LAST=$(($argc - 1))
BKDIR=${argv[$LAST]}
# now to get a list of repositories from the command-line, we copy
# argv, then destroy the last element:
declare -a REPOS
REPOS=(${argv[@]})
unset REPOS[$LAST]
if [[ -z $REPOS ]]; then
    while read repo ; do
	FILE="$BKDIR/svn_$(basename $repo).dump$EXT"
	svnadmin dump $repo | $ZIP - > $FILE
    done  $FILE
    done
fi

The Mercurial backup requires you to initialize remote repositories, which is easy if you’re backing up to a mounted CIFS or NFS partition. Unfortunately, hg init does not create the directories on the CIFS partition I’m using, so I used the following:

joel@cifshost > mkdir hgbak
joel@cifshost > cd hgbak
joel@cifshost > mkdir bin
joel@cifshost > cd bin
joel@cifshost > hg init

After that the hg push should work just fine. The config file for this one has the local repository followed by the remote repository on each line. It does not use any compression:

#!/bin/sh
#
# Copyright (C) 2010 Joel J. Adamson
#
# This program is free software; you can redistribute it and/or modify
# it under the terms of the GNU General Public License as published by
# the Free Software Foundation; either version 3, or (at your option)
# any later version.
#
# This program is distributed in the hope that it will be useful,
# but WITHOUT ANY WARRANTY; without even the implied warranty of
# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
# GNU General Public License for more details.
#
# You should have received a copy of the GNU General Public License
# along with this program; if not, write to the Free Software
# Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA
# 02110-1301, USA.

# hgbk.sh: A tool for backing up my hg repositories
#
# Usage:
#  hgbk.sh [repositories] bkupdir
#
# Read ~/.hgbk for a list of repositories to back up; the syntax of
# the file specifies
#
# [local repository] [remote (backup) repository]

# now to get a list of repositories from the command-line, we copy
# argv, then destroy the last element:
while read -a repo ; do
    cd ${repo[0]}		# full pathname!
    hg push ${repo[1]}
done < $HOME/.hgbk

Pretty easy, but saves a lot of time. The only non-automated part is the initial remote repository. I then activate them with the following script, placed in ~/.cron.daily/:

#!/bin/sh

ERRLOG=$(mktemp)
/home/joel/bin/svnbk.sh /home/joel/bioark 2>| $ERRLOG
SVNRC=$?
/home/joel/bin/hgbk.sh >> $ERRLOG
HGRC=$?
if [ $SVNRC -ne 0 ] || [ $HGRC -ne 0 ]; then
    mail joel < $ERRLOG
fi

rm $ERRLOG

You can download these from my homepage. I welcome comments and improvements.

%d bloggers like this: