Recently I decided that maintaining my homepage in HTML was getting too laborious: the primary problem was things like lists and hierarchies. I have used Emacs’ Org Mode for my daily agenda for almost five years, and decided that it was the right tool for organizing these structures. Org Mode allows you to view “your life in plain text,” which is, of course, the most versatile way to do so. What Org can also do is export your hierarchical documents to HTML, LaTeX and many other formats (including formatted ascii, which is very nice). Along with this is the feature org-publish that uses Tramp to transfer a set of exported HTML files (and other files) to another location.
Read the Org Manual’s section on org-publish: you can find a simple example there. A single variable called `org-publish-project-alist’ configures all the stuff you need to publish an entire website. Here’s mine:
(setq org-publish-project-alist
      '(("mysite"
         :base-directory "~/Documents/web/"
         :base-extension "org"
         :recursive t
         :section-numbers nil
         :table-of-contents nil
         :publishing-directory "/ssh:firstname.lastname@example.org:~/public_html"
         :style "")
        ("imgs"
         :base-directory "~/Documents/web/imgs/"
         :base-extension "jpg\\|gif\\|png"
         :publishing-directory "/ssh:email@example.com:~/public_html/imgs"
         :publishing-function org-publish-attachment
         :recursive t)
        ("etc"
         :base-directory "~/Documents/web/"
         :base-extension "css\\|bib\\|el"
         :publishing-directory "/ssh:firstname.lastname@example.org:~/public_html"
         :publishing-function org-publish-attachment)
        ("docs"
         :base-directory "~/Documents/web/docs/"
         :base-extension "html\\|tex\\|bib"
         :publishing-directory "/ssh:email@example.com:~/public_html/docs"
         :publishing-function org-publish-attachment)
        ("thewholedamnshow"
         :components ("mysite" "imgs" "etc" "docs"))))
After a few days of having this in my .emacs I decided this needed its own file, which I called “project.el” and placed in the home directory of my project.
Each one of the members of this list is a “project.” Projects can include other projects by using the “:components” property. Suppose my website’s files are in the directory “~/Documents/web/”. This is where I keep the actual Org Mode files, CSS files and any other files I want to publish. The property “:publishing-directory” puts the exported files in the specified location, which is a Tramp URL. The trick is really the property “:publishing-function,” which tells `org-publish’ how to treat the files. If it is omitted, the files are translated into HTML. For .css files and other stuff you might link to (e.g. my .bib or .tex files, or images) you can use the function `org-publish-attachment’, which copies the files without translating them.
The crucial part of this variable is then the last “project,” which has only a “:components” property. This includes all the other projects, and hence when I publish “thewholedamnshow” using `org-publish’ my entire set of files is exported and uploaded.
Now I have all the sources for my website in one directory. Before, I had used a highly hierarchical setup that made links very complicated. After realizing that I didn’t actually have that much content, I now have all the Org files in the top-level directory, with two subdirectories: one for images and one for special documents that are not in Org Mode. These are essays or LaTeX documents that are already finished works, and I do not expect to change them.
I keep all the Org Mode source files in Bazaar. This greatly simplifies things. With project.el included along with the website, I can work on this on any machine as long as I evaluate that variable before I upload using `org-publish’.
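For a machine where I don’t want to open a full Emacs session, the “evaluate project.el, then publish” step could probably be wrapped in a small shell function. This is only a sketch: the function name `publish_site` is made up, it assumes project.el lives in the site directory, and it assumes a reasonably recent Org in which `org-publish’ accepts a project name as a string. Since the publishing directories are Tramp URLs, it also assumes ssh can log in without prompting (e.g. with a key).

```shell
#!/bin/sh
# Hypothetical helper: publish an Org project from the command line.
# Assumes project.el (which sets org-publish-project-alist) is in the site root.
publish_site() {
    site_dir=${1:-"$HOME/Documents/web"}   # directory holding project.el
    project=${2:-thewholedamnshow}         # project name from the alist
    # Load the project definitions, then publish the named project.
    emacs --batch \
          -l "$site_dir/project.el" \
          --eval "(org-publish \"$project\")"
}

# Usage (not run here):
#   publish_site ~/Documents/web thewholedamnshow
```

After checking out the Bazaar branch on a new machine, one call to this function would load the variable and publish in a single step.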
A huge advantage is that now everything (including my CV, publications, and my ever-expanding academic FAQ) is in Org Mode. This means that changes are super-easy, even structural changes that I wouldn’t have attempted with HTML. So now when I need to update my CV, or add an FAQ, all I do is edit in Org Mode, something that I am very familiar with because I do it most of the day every day. I actually just categorized my FAQ using Org Mode in a matter of minutes. Linking with Org Mode is also incredibly easy, and the exporter knows how to handle links to files, headlines within files and internet urls. Also, since these documents are now in Org Mode, if someone wants a PDF version, all it takes is a few keystrokes to produce it.
The weirdest thing is how easy this is once I figured it all out. After only a week of tinkering, I now have a website that I can update or make major changes to in a matter of minutes. It looks better, is easier to maintain and easier to configure.
Lately you’ve heard me say that my feelings toward laptops have changed. Since getting my new laptop, some of my feelings toward reading have changed as well. I love paper, and I love the look of printed letters, and well typeset text on the page. That won’t change. However, I noticed that most of the texts that I read (journal articles) I can read in online versions without missing much of the content. I’ve started exclusively reading current articles either online or in PDF form on my laptop and I’m glad to be conserving paper.
One thing that hasn’t changed is GNU manuals, which I’ve always read on my computer. GNU manuals are written in an ingenious format called Texinfo, which enables the author to produce appropriate output for several different ways of reading: PDF, HTML and the online Info format, most easily read in Emacs. If you’re running GNU/Linux, you will find tons of manuals in this format by typing “info” into a terminal. Within Emacs, type “F1 i” (that’s press and release F1, then press and release ‘i’). Either way you should get a menu of topics, each covered by its own Info manual.
Since deciding on Sunday that my programming goal should be better programming, rather than learning a new language, I started reading about advanced topics in Unix/GNU programming: processes, pipes, IPC, etc. I was thinking “Man, I need to get that classic book Advanced Programming in the Unix Environment.” Unfortunately this book is HUGE, and I wouldn’t carry it around with me, as reducing back strain is currently high on my agenda. It also dates from 1992 (around the time I first used Unix) and some things have changed since then. Most of the things the book is about have not changed, but most texts show their age in one way or another. Most Unix texts from this time look like casualties of the Unix wars, with more than half their content explaining incompatibilities between different versions of Unix, and the pitfalls of writing portable programs.
So of course, I went for (what I thought was) the next best thing, something I already had and could carry around with me at no extra weight: the GNU C Library Manual (in the info menu, type “m” and then enter “libc” and hit Enter). I have been reading about the basics of IPC and processes for a while off and on, and there were things that I just didn’t get about them. I get them now, having read them in the Libc manual. For example, I didn’t understand that a child process and its parent process receive different return values from fork(); the Libc Manual spells this out so clearly I wonder why I didn’t think of it before. I didn’t get how the child process and parent process’ distinct code portions were triggered, but that was only because I hadn’t read the f’ing manual.
These manuals don’t read like terse manpages, they read like manuals that you would actually want to read. The Libc manual and the Emacs manual both repeatedly surprise me. Emacs users often joke about learning new “features” of Emacs that have actually existed for decades. Whenever I am frustrated with Emacs in some way, I’ll usually find a workaround, and then months later I’ll be reading the manual for some unrelated cause and find a solution to my problem. It was right there the whole time! You can imagine how empowering reading these manuals is.
The weird thing is that although I’ve repeatedly had this experience with GNU Manuals, they aren’t the first thing that I go to. I need to change that habit. We often treat reference materials as though we shouldn’t sit and read them, we should instead browse through them until we find what we need and then put them away. That’s what manpages are for. GNU Manuals are different. GNU Manuals actually tell you what’s going on and what to do: they are great for beginning programmers. I’m not going to waste my time going to the library; I’m going to read the Libc Manual.
eBooks, rms and DRM
Recently ebooks outsold paperbacks on Amazon.com. People may be treating this as the final sign that the death of paper is coming, but I don’t. For one thing, Amazon has been set up as an ebook store from the very beginning, i.e. they’re on the friggin’ web, so it’s obvious they would try to compete by delivering their content as quickly and conveniently as possible. I’ve always seen it as a goal of theirs, although I think back in the 90s most of us thought ebooks would just be webpages, rather than something you’d actually carry around, i.e. we thought they would be different enough from regular books to combat the problems of regular books.
Amazon however has a different idea: they and their competitors would like you to think of ebooks as the same as regular books, just lighter weight, and easier to pay for. Their ridiculous idea of “e-lending” is so stupidly backward that I laughed out loud when I heard of it:
They have managed to recreate, in the palm of a reader’s hand, the thrill of tracking down a call number deep in the library stacks only to find its spot occupied by empty space. With a clever arrangement of bytes, they have enabled users to experience the equivalent of being without their books while their friends’ dogs chew on them. Maybe if we’re lucky, next they’ll implement the feature that allows two electronic pages to be stuck together as if by gum, or that translates coffee spilled on the screen into equivalent damage to the digital pages. (John Sullivan, “Lending: A solved problem”)
They’ve done this with DRM or “Digital Restrictions Management.” Its practitioners call it “Digital Rights Management,” which I think is sinister enough: do you want your rights digitally managed? They’ve managed to make ebooks just as problematic as paper books, and why?
The question of their motives becomes so much clearer when we consider that not only did Richard Stallman create great free books about computing, like the Emacs Manual and the GNU Libc manual, he also helped create the best ebook reader out there (info), and all with the goal that it will facilitate user freedom. The choice is yours: do you want ebooks to be as inconvenient as regular books? Or would you rather have convenient, indexed, hyperlinked text written by people who care about you and your freedom? The choice is clear to me.
Some Further Reading
The history is about as interesting as the books themselves. Some people think that ebooks (or the concept) are new, just as they think tablet computers and touch screens are. Both touch screens and “ebooks” are about as old as computing itself. If you’re skeptical about that, think of how simple an idea it is: many, many books that you can carry around in your pocket at no additional weight. “Hey let’s use computers,” is a pretty simple solution. Computers were almost built for the task. The only new idea is making ebooks as inconvenient as paper books. I’m reminded of Douglas Adams’s explanation that if a hitch-hiker wanted to carry a paper copy of The Hitchhiker’s Guide to the Galaxy (a text that bears a strange resemblance to Wikipedia), he would have to carry several enormous buildings with him.
A few weeks ago I migrated two major projects to distributed version control systems (DVCS), leaving only one project in Subversion, the one hosted on Savannah. As you can read in my prior posts, I have resisted switching over to DVCS. However, recently I’ve understood the benefits propounded by DVCS adherents, and I’ve found that it has more features than most tutorials let on.
Why Did I Resist?
I resisted DVCS so strongly for a few reasons:
- Most arguments for DVCS I encountered were actually anti-Subversion arguments, many of them based on incorrect information about Subversion and CVS
- Much of what I read sounded like knee-jerk trendiness: it sounded like people were doing it just because Linus Torvalds says Subversion is stupid
- I had an important project (my dissertation!) in Subversion, managed with Trac. I didn’t want to lose all that history by doing a crappy conversion.
When the anti-Subversion arguments didn’t hold up, I ignored them. I thought maybe my working conditions were just different or other people just weren’t reading the manual. Those are still possibilities, but the harder thing to examine was my second reason for dismissal: I assumed that anyone who said these things was a total newbie, who had just been told that DVCS was better. I’ve talked about how object-oriented programming proponents often just sound inexperienced with programming. I figured the same was true of DVCS proponents.
However, two things happened that really changed my mind. The first was realizing that the most annoying thing about somebody questioning my decisions is the feeling that they think my decision is poorly considered when it is deliberate, careful and took me weeks of preparation. It’s very easy to take that attitude with people online: when I don’t hear or see people, I don’t have that mirror held up to me. It’s very easy to just brush something off and say that the other person “just isn’t thinking about it.” Realizing how much it pisses me off when people take that attitude with me, I’ve thought a little more about how I consider people’s attitudes online.
Many experienced hackers have switched
The second thing was realizing that people whose opinions I know I can value, people who definitely have done their homework, have switched major projects to DVCS. Emacs, my favorite piece of software, which I am using right now to write this, is kept in Bazaar now. I know the people who made that decision were doing their homework, not going by knee-jerk reaction, and certainly not just copying Linus Torvalds. Bazaar is also part of the GNU Project.
What about my revisions?
svn2bzr answered my third concern. svn2bzr is a featureful-enough tool that will create Bazaar branches or repositories from SVN repository dumps. It’s really freakin’ easy to create whatever configuration you want:
> python ~/.bazaar/plugins/svn2bzr/svn2bzr.py --prefix=subdir svndump newrepo
This will create a new Bazaar repository in the directory `newrepo’ that contains all the revisions in the subdirectory `subdir’ of the svn repository. This is where Bazaar’s concept of repositories shows its difference.
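The `svndump’ argument in that command is a Subversion dump file. Assuming you have filesystem access to the repository itself (not just a working copy), a minimal sketch of producing one with `svnadmin dump’ might look like this; the function name and the example paths are made up for illustration.

```shell
#!/bin/sh
# Hypothetical helper: write a dump of a local Subversion repository to a file,
# ready to be handed to a conversion tool.
dump_svn_repo() {
    repo_path=$1    # filesystem path to the repository, not a working copy
    dump_file=$2    # where to write the dump
    svnadmin dump "$repo_path" > "$dump_file"
}

# Usage (not run here; /var/svn/myrepo is a placeholder path):
#   dump_svn_repo /var/svn/myrepo svndump
```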
In a Bazaar repository you can have many branches beneath the repository in the filesystem, and you import a branch by branching into a subdirectory. I didn’t get this for a few weeks, so let me give you an example. Suppose I have a branch called `branch’ located at `~/Public/src/branch’ and a repository called `repo’:
> cd repo
> bzr branch ~/Public/src/branch here
That creates a branch within the repository called `here’. Now I can create other branches, merge them, etc. The only tricky thing about getting my revisions into a place where Trac could use them was that I needed a repository hosted on HTTP. Then I used the TracBzr plugin to add the repository to Trac. I realized that changeset links are only used in Trac tickets, and since I had so few of those referencing current revisions, changes in the revision numbers wouldn’t matter that much.
Features of DVCS
I heard many, many anti-Subversion arguments and some really bogus arguments for DVCS. People have said “you can’t merge,” “you can’t make branches,” “Subversion causes brain damage” and on and on. The bogus pro-arguments I heard were that you can commit without a network connection, “forking is fundamental,” and that DVCS is “modern.” Answering these arguments is simple: committing without a network connection is not a big deal. On the other hand updating without a network connection is impossible, and it’s a situation I’ve found myself in more often, especially working with a laptop, instead of just two workstations. This is where DVCS was nice. Updating is a bigger problem than committing.
As to “you can’t merge” and “you can’t make branches,” we all know that’s baloney. However, what you can do much better with DVCS systems like git and Bazaar is edit directory structure and rename files. This is a huge advantage of DVCS systems. Bazaar, for instance, keeps track of all renames and copies in its history. Subversion, on the other hand, does renames with a DELETE operation and an ADD operation. Not so smooth. It’s a good way to get something better than CVS, but not the best.
Furthermore, DVCS systems are very good at merging. That doesn’t mean you can’t merge with Subversion — I’ve been doing that for years. However, merging between two branches in Bazaar is much simpler than merging in Subversion. I don’t have to read the help when I’m merging with Bazaar; merging with Subversion is not hard, but it’s not as simple. Simplicity is the name of the game, baby.
A Stupid Git Realization
I had tried using git before and didn’t enjoy it. I’m glad to say I was using it wrong. I had tried using it to manage my webpages, but whenever I pushed my local changes to my remote webpage tree on UNC’s servers, I would get messages about not updating the local tree and stuff like that. It was just confusing. It didn’t really make sense. I wasn’t interested in trying git again, hence using Bazaar for some new projects.
I had a weird realization one night: I was working with the git tree of Guile, and someone on irc had told me that the most updated git source had a known problem. I didn’t want to go get the tarball for Guile 1.9-13, so I thought “Wait, I have the git tree, so I should be able to generate whatever release version I want. How do I do that?”
> git tag -l
> git checkout release_1-9-13
and there I had it. Wow! That is cool.
I also followed a simple tutorial to get my webpages working with a hook that would update the local tree (the one served as my homepage) every time I push. It seems a simple idea: make a repository in a different directory, and check out to it every time I push to that repository. Why hadn’t that occurred to me before?
Conversion from SVN to git was insanely simple:
> sudo yum install git-svn
> git svn clone http://path/to/repo webgit
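Going back to the push-to-deploy hook mentioned above: I won’t reproduce the tutorial here, but the usual form of the idea is a `post-receive’ hook in a bare repository that checks the pushed files out into the served directory. The following is a sketch under that assumption, not necessarily the exact setup I ended up with; `install_deploy_hook’ and both paths are hypothetical names.

```shell
#!/bin/sh
# Hypothetical helper: install a post-receive hook into a bare repository so
# that every push checks the files out into a separate web directory.
install_deploy_hook() {
    bare_repo=$1    # e.g. ~/repos/web.git, created with: git init --bare
    web_root=$2     # the directory served as the homepage
    hook="$bare_repo/hooks/post-receive"
    mkdir -p "$bare_repo/hooks"
    # Unquoted heredoc: $web_root is expanded now, at install time.
    cat > "$hook" <<EOF
#!/bin/sh
# Check the pushed files out into the served directory.
GIT_WORK_TREE="$web_root" git checkout -f
EOF
    chmod +x "$hook"    # git only runs hooks that are executable
}

# Usage (not run here; paths are placeholders):
#   install_deploy_hook ~/repos/web.git ~/public_html
```

The point of keeping the repository and the served tree in different directories is that pushing never touches a checked-out working tree directly, which is what produced the confusing messages I got the first time around.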
I think I’m done with Subversion. DVCS, at least git and Bazaar, can do a hell of a lot and I really like their features. I wouldn’t mind using Subversion for an existing project, but I think I’m not going to start any new projects with it. I’m also going to take it easy on people who disagree with me online. I’ve seen that at least some of them were speaking from the same position I hope to.
You may have noticed that my RSS feed from Connotea has gone comatose. Unfortunately Connotea itself went totally comatose while I was preparing a poster this week. All they’ve said is that they’re aware of the problem and will be fixing it soon. I’m glad they noticed.
Someone on Stack Overflow disagreed with me about using centralized version control for a solo project. As I predicted, of course; as I said in my last post, DVCS is a fad and it will have many converts who support it in a knee-jerk fashion. I think the person who disagreed may just be misinformed, or not thinking about this hard enough. I may not have made things clear in my last post, however, about why centralized version control makes more sense for a solo developer. Also, I admit that it’s paradoxical that someone would use centralized version control for a solo project. However, as I’ll show, centralized version control does make more sense (not only that, but I’m not the only one who thinks so).
Consider this: if you are a solo developer working on the same machine all the time (e.g. a laptop), a DVCS repository in your current working directory and a Subversion repository in your home directory are practically the same. The repository is always online and you can always make commits. This is what makes the “commit while offline” argument of DVCS proponents so weak: for certain situations you can do that with Subversion, or most of the time you won’t need to.
Now consider that if you are a solo developer working on multiple machines, DVCS only creates an extra step in your development. I always work on my big ol’ workstation during the day. If I were using git or hg for my most-frequently-worked-on projects, I would need to remember to push my changes from one repository to the other. With Subversion I just don’t have that step. All other steps are identical with the two workflows.
All through those above arguments, I am only one developer, and this is the crux of the idea. I think people often confuse the situation of multiple computers with that of multiple people. In the former argument, I posed that with one computer there is no difference between using file URLs and using a DVCS repository in the current working directory. However, there would be a difference if there were multiple developers. Then you have to configure access to a single machine (either for cloning or checking out) for multiple users; this is when things get more complex.
The “D” in DVCS stands for “Different People.” What I mean by that is that the “forking is fundamental” argument posed by DVCS proponents doesn’t apply when there’s a single developer or author working on a project. If by myself I want to create branches and work on different features, that is perfectly easy to do with Subversion, and so is merging. What distributed version control is for is different people maintaining different features and seeing how they work. I completely reject the idea that centralized version control is obsolete and I will keep recommending it to solo developers.
Over the past few months I’ve been working with new tools to enable me to work on several machines and keep the same important data, such as configurations and bookmarks. The impetus for this was that I’ve started to use a laptop: I never wanted one, but then somebody on the TriLUG mailing list offered one for sale for $20, and I just bought it. It’s really useful for when I’m watching the kids or when travelling: I don’t have to use someone’s antiquated, slow Windows computer just because I’m visiting them. Also my wife needs to be on our main machine after we put the kids to bed. Short of acting out my fantasy and installing a server with LTSP terminals throughout the house, the laptop is good for me to keep working. Unfortunately the usual routine of setting up my shell, Emacs, Firefox, email and my Org Mode agenda files seemed so laborious that I realized “there’s potential for automation here.”
To tackle email, I switched from MH to using IMAP. For Firefox I started using Weave. I was using Delicious for a long time, but Delicious is not free software so I decided I didn’t trust it. Weave is as free as Firefox, so when I heard about it I decided to go for it and it’s worked really well. I rarely used the social aspect of Delicious and mostly used it for portable bookmarks.
For the other two areas, Emacs and Org Mode files, the solution was less clear. I had tried using ssh urls with tramp to have my agenda on multiple machines, then I saw Carsten Dominik’s Tech Talk about Org Mode, where he described using git to manage his Org files on multiple computers. For config files (shell and Emacs) I had tried using rsync to mirror them on different machines, using a Makefile with rsync commands. However, different needs of different machines would always screw things up. Then I remembered that version control might be the right tool. I had tried that before with Subversion (SVN), my main version-control system (VCS), but things had not gone much better than with rsync. Then I thought perhaps a distributed version-control system (DVCS) would make more sense.
My first impetus for using something other than Subversion was that I’ve discovered having one project per repository makes the most sense; that way I can make branches and not worry about confusing anybody. So I have a repository for my webpages, and a repository for my biggest project. That works well. However, I also started working on a book (i.e. it became a book) and it really didn’t fit in either repository. I came down to a choice between adding another Subversion repository, with all the Apache setup, or using something else that would be more convenient. Although setting up Apache is not hard after the third or fourth time you’ve done it, I still felt like it was unnecessary. I knew I would be the only person working on this, and therefore something that I could configure to use ssh made the most sense.
This is the most compelling argument for distributed version control: it’s easy to set up a repository and it’s easy to configure access for local system users. With Mercurial (similar for git and bzr), you just do
joel@chondestes: ~/tmp > hg init repo
joel@chondestes: ~/tmp > cd repo
joel@chondestes: ~/tmp/repo > touch myself
joel@chondestes: ~/tmp/repo > hg add myself
joel@chondestes: ~/tmp/repo > hg status
A myself
That to me is really compelling. Setting up a Subversion repository is also pretty easy, depending on where you want to put it, but configuring access from the internet is not as simple. I can access the above-initialized Mercurial repository just using an ssh url argument to its clone or pull commands.
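To make the ssh part concrete, here is a minimal sketch of cloning such a repository from another machine. The function name is made up, and the path is assumed to be relative to the remote user’s home directory, which is how Mercurial treats a single slash after the host in an ssh URL.

```shell
#!/bin/sh
# Hypothetical helper: clone a Mercurial repository from another machine
# over ssh. Needs only a shell account on the remote host.
clone_over_ssh() {
    user_host=$1    # e.g. joel@chondestes
    repo_path=$2    # relative to the remote home directory, e.g. tmp/repo
    hg clone "ssh://$user_host/$repo_path"
}

# Usage (not run here):
#   clone_over_ssh joel@chondestes tmp/repo
```

No server configuration is involved: if I can log in over ssh, I can clone and pull.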
Another thing that is really good about DVCS systems (all that I’ve tried) is that they’re really good at merging. They do not have a monopoly on this, however; Subversion has been good at merging for as long as I’ve been using it. Again, I may be different from the people writing the manuals in that I don’t work on large projects with lots of contributors.
For some reason, however, the biggest advantage of distributed version control touted by its proponents is that you can make commits when you don’t have internet access. Wow! That is huge. Oh wait, that’s sarcasm. I am in this situation pretty often working on my Subversion projects and it really doesn’t bother me. If I’ve come to a point where I should make a commit and I don’t have network access I can do one of two things: I can make a ChangeLog comment and keep working, or I can stop working. I always have other things I can do. Seriously I don’t see this as an advantage, especially when if what you want is to update another repository you have to push your changes anyway, and that requires network access. Committing changes to my Org-files would be useless unless I could push them to the computer I know I’ll be sitting at in the morning.
Another ridiculously inflated advantage proponents mention is that you don’t have to worry about breaking the build when you commit, because you make your commits to your local repository. I have spent another blog posting on this concept already, but again this is not a distinct advantage. I commit things that are broken when they’re broken already, but not if I’m adding a new feature. If you want to commit something on a new feature that might screw other things up, the proper way to do it with centralized version control is to make a branch. It seems like some people don’t think this is possible with Subversion, but I’ve been doing it since the beginning. Not only is it possible, it’s recommended.
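For anyone who hasn’t seen it, branching in Subversion is just a cheap server-side copy. A minimal sketch, assuming the conventional trunk/ and branches/ layout and run from inside a working copy of trunk (the function name and the example URL are hypothetical):

```shell
#!/bin/sh
# Hypothetical helper: create a feature branch in a Subversion repository
# and point the current working copy at it.
start_feature_branch() {
    repo_url=$1    # repository root URL, assumed to contain trunk/ and branches/
    name=$2        # name of the new branch
    # Server-side copy of trunk; this is how Subversion makes a branch.
    svn copy "$repo_url/trunk" "$repo_url/branches/$name" -m "Create branch $name"
    # Switch the working copy so further commits go to the branch.
    svn switch "$repo_url/branches/$name"
}

# Usage (not run here; the URL is a placeholder):
#   start_feature_branch http://svn.example.org/myproject new-feature
```

Commits on the branch can be as broken as I like without touching trunk; when the feature works, I switch back to trunk and merge the branch in.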
There are two more big problems I have with distributed version control. First: it’s a fad. I don’t mean that it’s something that is overblown and bound to die out like MC Hammer. However, it seems like everyone is switching to it and citing the same bad reasons. That to me seems like a warning. The rest of us who know how to use Subversion will just keep on doing it the right way, reading the manual.
My second big problem is the “fork happens” idea that people who like DVCS seem to love. Forking is when there’s some kind of split in the development philosophy or goals between project members that leads to factionalism. The most famous example is the creation of XEmacs. The author of The Definitive Guide to Mercurial uses socially-oriented rhetoric (thank you, Eric S. Raymond) to justify distributed version control. He says basically that forking is natural and we’ve all been in denial by using centralized version control. Using DVCS, on the other hand, brings us out of our comfort zone into some new promised land where we all have our own fork.
This argument doesn’t really hold up. As others have pointed out, the idea that you’re always working on your own fork is kinda ridiculous. Unless you’re happy to just keep your own version, your contribution to the larger piece of software will always have to be transmitted to someone else. Why would you develop software in a vacuum?
The same Definitive Guide author says that some people use centralized version control because it gives them an illusion of authority, knowing that people won’t be off putting out their code as someone else’s. While that’s possible, it certainly goes against the tone of the Subversion book, which encourages administrators to let developers develop their own branches freely. And again, even if you don’t have the illusion of authority over a project, you’re going to have to express authority at some point and decide whose changes to merge into the mainline. Now who’s delusional? People don’t want to download some dude’s fork, they want to download from a central point, where all the developers and users meet to discuss and code, and decide which changes should make the software the best.
My previous experience with distributed version control was using git to maintain my webpages. There was so much about it that didn’t make sense that I decided its creator was not using the Rule of Least Surprise. He claims to have never used CVS for Linux, so I can understand him not using CVS-like commands. However, the creators of Mercurial and Bazaar seem to have noticed that a lot of people who aren’t Linus Torvalds have also been writing software over the past few years: these two DVC systems do use syntax that is mostly familiar to a habitual Subversion user like me.
I got pretty psyched about Bazaar and read a lot about it over the past few weeks. However, despite the claims made on the bzr website, bzr is really slow. No matter what I do, it’s really slow. I was just playing around with it by cloning a repository on the same machine (again, that is a selling point) and no matter what it was deathly slow with just a few text files and minor changes. I’m not the only one who thinks it’s slow: Emacs maintainer Stefan Monnier recently wrote to emacs-devel to say that some minor pushes had taken over 30 minutes. I liked the bzr interface, but considering that hg has pretty much the same interface and is way faster I decided to stick with using hg for my Org files. [Update: I am using bzr for a project on LaunchPad.net] The only remaining task is to figure out how to maintain forks of my configuration files on different machines. I think I have just been ignoring very basic rules of usage for merging, so that should not be hard.
My conclusion is that using Mercurial is a good idea for my needs, and perhaps I can make it work using configuration files again. However, it is not the panacea that its proponents advertise, nor do we need to necessarily rethink version control completely. Version control is good for many things and making bold statements like “forking is fundamental” is really uncalled for. Those sorts of conclusions are specific to the particular needs of the developers involved and not necessarily for me. I’m not going to convert any of my major SVN projects to bzr as I originally intended because as I see it, DVCS does not offer any major advantages over centralized version control. Maybe it does for Linux kernel developers, Firefox developers, etc. For me it’s not a major improvement and it’s not going to help me work any better. I’m going to keep using Subversion and Mercurial, and we’ll see what happens.