Archive

Posts Tagged ‘programming’

What no one has told me about programming

November 15, 2011

I had another interesting breakthrough yesterday in how I think about programming, or rather about creating applications using programming. Over the past couple of years of building an application that seemed simple at the outset (a simple number-crunching program!), I’ve learned that the really hard parts of programming are not the parts people typically write about. The hard parts are things that become obvious once you hit them, yet nobody talks about them, perhaps because they are so challenging. That brings me to another question: how should I handle those really hard parts?

Yesterday I toyed again with the idea of learning C++, and decided against it yet again. I’d heard that C++ has a number of features that are good for numerical programs, like vectors, and I recently heard of a new matrix library called Eigen. However, I’ve avoided C++ because I still believe object-oriented programming is one of those bad habits people pick up in programming classes, and it doesn’t seem to offer any advantages over C. C is good; I mean C is The Right Thing.

There was still something else nagging me, however. The books full of trivial examples (things you could do easily with a pocket calculator) don’t match up with the way the languages are designed. Even books on Haskell don’t seem to say as much as they should about what’s really important in designing an application. That was the realization: what you’re supposed to do with a programming language is build an application. The “program” part (that is, the algorithm) is really immaterial. After exploring many programming languages, I have found that very few really differ in what they offer for implementing algorithms. You can write the same algorithm in almost any language and have it perform pretty well on most hardware these days. So what’s missing? What are all the manuals full of? Why does every programming language have a preoccupation with strings?

Let me use an analogy: I play the banjo. I took over a year of lessons, read tons of books and have probably spent over 3,000 hours playing and practicing, and even after getting into a really good band, having great people to jam with, and practicing well, there was still something about playing that was so difficult. I just kept saying “I don’t know what to play,” or “I can’t make the notes fit there!” After I started graduate school and my second son was born, I needed to shift back to listening and, if I played at all, to playing a quieter instrument. So I started doing things with a guitar that I’d never done with my banjo: playing scales, picking out melodies, and listening very carefully to my favorite guitar players. Listening to Steve Stevens, Jerry Garcia, David Gilmour and Kurt Cobain, I noticed something: these guys don’t play notes, they play phrases.

Why had absolutely no one mentioned playing phrases to me? Was I not listening? Did no one just say “Melodies, counter-melodies, rhythms, etc., i.e. music (dude!) is composed of phrases. You can construct phrases in many ways, but the key is punctuation”? When I learned to play the banjo, I learned the punctuation marks (licks). I learned how to move my fingers, and I learned chord formations. But I never learned that the fundamental thing about music is phrasing. After I figured this out, my brother told me how a famous drummer sat him down at a workshop, pointed his finger and said “One thing is important: phrasing.” Luckily this was when my brother was fifteen. Since I’m not a pro like him, I can understand why I didn’t get that opportunity, but still, come on! This is hugely important. Why did nobody mention it?

And why has nobody mentioned, in any programming book that I’ve ever found, that the crucial thing — the hard thing — about designing a program is the user interface? There are books about user interfaces, certainly, but they are concerned with superficialities, like what color the frame should be. Who cares? The difficult part is deciding how your program should interact with its user. Eric Raymond does spend a whole chapter on this, but he doesn’t start with it. I’d like to read a book that starts with “You can figure out all that stuff about your algorithms: you have the equations, you have the data structures, you know what it’s going to do; spend your time thinking about how a user would get it to do that well.”

So my realization yesterday is that the reason the C standard library is full of string functions, the reason Lisp programmers are so concerned with reading program text and the reason that there are so many programming languages and libraries and plugins is that the really hard part is between the user and the algorithm. My inclination is to say that the simplest interface is best. The simplest interface would be “stick something in and see what comes out.” That’s called Unix. Even in Unix you can’t just do that: you have to mediate somehow between the algorithm in a world of numbers, and the user who lives in a world of text. This is easiest on Unix, but it’s still not easy.
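
To make that concrete, here is a minimal sketch of the kind of mediation I mean. The program and file names are hypothetical: a number-cruncher reads parameters as text on standard input, writes numbers as text on standard output, and the ordinary tools handle the rest.

joel@chondestes: ~/project > ./crunch < params.txt > results.dat
joel@chondestes: ~/project > sort -n -k 2 results.dat | head -n 5
joel@chondestes: ~/project > awk '$2 > 0.05 { print $1 }' results.dat > interesting.txt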

There are other schools of thought: your user interface should be a pane full of buttons and pretty colors to dazzle your user into thinking they’re doing something useful, or a monolithic shell that does everything with the computer. I don’t really buy either of those, because I know how to use stream editing and Make to tie things together. However, sometimes I need a program that I don’t have to re-run all the time. I would like something in between: something where I can run a simulation, look at the results, tweak it a little and run it again, then set it up in batch mode to produce a huge pile of results that I can analyze. There’s no reason all that has to be in one huge program; it could be several, but the point is that the algorithm contained in there would be the same for all those steps. There are languages like this, such as R, Octave and Scilab. However, I don’t like programming in any of their languages. Maybe I can come to like it, since they make the hard parts easy.

The approach I should take with my next program is “How do I write a language for running a simulation?”

Which Programming Language Should I Learn Next?

January 30, 2011

Fairly often I see people asking in online communities “which programming language should I learn next?” I have asked this question often recently. I want to learn something new. I always enjoy learning new things. It’s what I do. I’m a scientist, and my current occupation (what I put on forms) is “student,” but I think of myself as a student in a much more holistic way. I always enjoy learning and I seek it out on a moment-to-moment basis. Programming is a large part of what I do: in a way, it’s also been my “occupation” for at least the past five years. I programmed in Stata for my job before I came to graduate school and now I use C, Scheme, bash, Emacs Lisp and a few other “languages” every day.

I feel like I reached a plateau of sorts a couple of years ago, after studying languages at a rate of about two per month for at least two years. By that I mean I studied Emacs Lisp and Python for a month, then things shifted to Scheme and R, or Perl and Common Lisp, the next month. I think I intensely studied about ten languages over three years, including various Unix shells and a few specialty languages (like Mathematica: yuck!). There are still a whole bunch of languages I would say I’m conversant in, and some, like TeX, that are a fairly essential part of my work and that I might use better if I knew them better. As my graduate school research picked up, however, I settled on C and Scheme as my main languages.

I found this plateau somewhat dismaying: as I said, I always want to learn new things, and there seem to be really cool languages out there that I could learn. For about two years I’ve been casually reading about, and doing minor coding in, the ML family and Haskell. However, in each case I’ve found that there are reasons I shouldn’t bother. Here are my conclusions:

  1. My needs as a programmer are different from the vast majority of people who put the title of this posting into Google
  2. Most programs people want to write are quite different from the ones that I want to write
  3. I really like the Unix workflow

Other Programmers Learn For Jobs

The most common answer I see to “What programming language…?” is “You should know C, C++, Lisp, Python, Javascript,…so that when you go to your interview…” That’s when I stop reading. The authors assume (and often rightly so) that I’m asking because I’m looking for a job. I’m not, as it turns out, but I’m sure a lot of other people are. I’m a scientist, I have a job (in a way), and I wouldn’t wish a corporate job on my worst enemy. As I said, I had a job that had a large programming component, but the interviewers didn’t really care that I didn’t have experience specifically with that language (Stata). What they cared about was whether I was good at learning. In my mind that should always be more important and I will always hold these people in high esteem for doing things that way. I remember the person who became my closest colleague interviewing me and saying “Stata’s pretty easy, it has a very simple syntax, you should be able to pick it up pretty fast.” He was right.

In my discussion of object-oriented programming I got the comment quite often that “You need to know object-oriented programming because it controls complexity, and is therefore essential in corporate programming environments, so if you want a job…” End of discussion. Don’t believe the hype. If you want such a job, then by all means, learn Java. If you’re more like me, and you realize that programming is not the hardest part of most jobs then focus on those other parts, and get good at using whichever programming paradigm is most well-suited to the task at hand. Don’t obsess about which programming paradigm is most suited to having people fire you easily.

Other Programmers Write Monolithic, Interactive Programs

The programming task I’m most often engaged in is numerical analysis, the oldest programming task in the universe — the one that pre-dates computers. I conclude that the source of my confusion with many programming texts is that other programmers are interested in (or at least authors are trying to interest them in) designing large, monolithic, interactive programs. In my mind there are only a few good examples of such programs, and they are already written: Emacs, the shell, window managers and file managers, and a web browser (which is really a noninteractive program dressed up as an interactive one). I’m not going to write one of those. It seems to me that most people learning Haskell, for example, are coming from writing monolithic programs in C++ or Java, and probably on Microsoft Windows.

What’s particularly funny to me is that this split goes back a few decades to the “Worse is better” controversy of the early nineties. Unix’s detractors generally believed in writing monolithic programs, and their favorite development environments were eclipsed by Unix and the hardware it came with (workstations). I guess Microsoft and Apple were able to steer people away from Unix once again; now people come to Unix-like systems from environments where they are used to building these monolithic programs, and they never find out that there’s another way to use a computer. I started using Unix when I was thirteen: I guess this means I’m old. I’d rather be an old Unix user than a young anything.

There are a few other reasons I’m not writing such big programs: an interactive environment for numerical operations only makes sense up to a point. It’s great for experimenting. However, even in Stata I ended up writing scripts in a programmatic style and executing them in batch mode, carefully writing the important results to output and saving graphs as files. Either those programs have already been written and are awesome, or I don’t need monolithic, interactive programs to do the things I’m doing. I have a different perspective on how people should use computers.

Unix Philosophy Works For Me

I often read that the Unix philosophy can be summed up as “Do one thing and do it well.” Other people seem to want to start a program, work with just that program for a long time, and then do something else using a different huge, monolithic program. I think that’s a waste of time; it sounds extremely limiting, especially when I have a whole bunch of tools available to integrate my work into a common whole. I often read the derisive aphorism “When all you’ve got is a hammer, everything starts to look like a nail.” The remark is meant as a criticism, but when you’re talking about Unix tools it works the other way around. Yes, when you have Make, everything starts to look like targets and dependencies. When you have sed and awk, everything becomes text processing.
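
Here is a hypothetical sketch of what that looks like in practice (the file names and contents are made up for illustration): the whole analysis is just targets and dependencies, and one “make” rebuilds whatever is out of date.

joel@chondestes: ~/analysis > cat Makefile
crunch: crunch.c
	cc -O2 -o crunch crunch.c -lm

results.dat: crunch params.txt
	./crunch < params.txt > results.dat

summary.txt: results.dat
	awk '{ sum += $2 } END { print "mean:", sum / NR }' results.dat > summary.txt
joel@chondestes: ~/analysis > make summary.txt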

Consequently all I need is an editor to make me happy. I use Emacs, which becomes a whole “working environment,” but I could get by using vi with shell access (however much it hurts me to say that). Everything becomes editing when you take the view that to use a computer is to write programs, and you know which tools can glue those programs together. Then all you need is a single command (e.g. “make”) to get the ball rolling. Given this perspective, learning new languages just becomes a matter of fitting them into an existing workflow. I generally think of programs as “input-output”; it’s okay if that input is a program, but it shouldn’t try to be its own environment and supersede Unix.

The language that fits in best with the Unix philosophy and GNU tools is C. Not only does C fit in, the GNU system is built around it, including a huge number of tools that make using C really, really easy. Automake, Autoconf and the other autotools mean that all I have to do is write a little program, write a little Makefile.am, run autoscan and a few other things, and “make” builds me a program. Writing programs for simple number-crunching also means that most of the problems people associate with C are not my problems. I don’t have memory leaks in my programs; they just don’t happen. Therefore I don’t really need a language with garbage collection. Everybody’s screaming about making programs faster with parallel execution, but that’s for web servers, databases that get called by web servers, and other things that I’m not writing. C is extremely fast for number crunching, and I can run independent jobs in parallel with “make -j” or GNU Parallel. C is just fine.
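
Roughly, here is what that looks like for a one-file program. The file names are hypothetical, and autoscan’s output still needs a little hand-editing (rename configure.scan to configure.ac, fill in AC_INIT, and add AM_INIT_AUTOMAKE), but that’s the whole ritual:

joel@chondestes: ~/crunch > cat Makefile.am
bin_PROGRAMS = crunch
crunch_SOURCES = crunch.c
joel@chondestes: ~/crunch > autoscan
joel@chondestes: ~/crunch > mv configure.scan configure.ac    # then edit as described above
joel@chondestes: ~/crunch > autoreconf --install
joel@chondestes: ~/crunch > ./configure && make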

Am I the only one out there interested in using something other than Fortran for number-crunching? Probably yes, but I can use C. I don’t need Haskell. I like the mathematical cleanliness of Haskell, but that doesn’t matter when I already know a functional language (Scheme), can already write bloody-fast numerical algorithms in C, and can run parallel jobs with Make. I read a lot of stuff about writing parallel programs and other features of supposedly “modern” languages, but they are almost always things important for writing web servers or GUIs, things that I’m not doing.
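
As a sketch (file names hypothetical), either of these runs independent simulations on all my cores without the program itself knowing anything about threads:

joel@chondestes: ~/sims > make -j 4 run01.dat run02.dat run03.dat run04.dat
joel@chondestes: ~/sims > parallel './crunch < {} > {.}.out' ::: params-*.txt

Each run of the program is just an ordinary Make target like the ones above; GNU Parallel substitutes each params file for {} and strips the extension for {.}.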

Languages

I’m still tempted to learn certain languages: here’s a run-down of why.

C++

C++ is still tantalizing because so many people know it. In addition, it seems to have a very mature set of standard libraries. However, especially when I hear people say things like “Many C++ programmers write in C, but just don’t know it,” it seems even less necessary. C++ has a large community and GNU development tools, and it seems like I’d have to change very little of how I do my work in order to learn it. All I would have to learn is the language.

D

D is an interesting language because it includes well-implemented features from some other language platforms, like garbage collection. D seems basically like C with some features added on, plus the ability to extend its programming paradigms. I haven’t taken the steps to see what kind of development tools are available for D, so I haven’t given it a full evaluation yet. Unfortunately, it doesn’t seem to have a large enough user community to fit fully in with GNU yet, which is a critical factor.

Haskell

The big thing Haskell has going for it is that Fedora has a full complement of development tools to form a Haskell environment. Haskell has just about as huge a network of tools as Lisp (close enough to look that way), so it would be easy to get going. The problems with Haskell are that it seems too difficult to get started with, it seeks to be its own environment (i.e. it doesn’t fit in with my working environment), it seems suited to doing things other than what I would do with it, and I don’t need to learn it. I would really like to learn it, but all these things add up to me saying “I don’t have the time.” That’s fewer words than all that other stuff I said.

Javascript

I keep thinking I should learn Javascript. I feel like if I know Emacs Lisp, and I use Firefox as much as I use Emacs, I should know a scripting language for Firefox and be able to add the features that I need. However, all the learning materials for Javascript seem job-focused, aimed at doing things I wouldn’t be interested in.

What Makes a Language Usable or Worth Learning?

This is a common question I see people discuss: most often I’ve seen it in the “Common Lisp vs. Scheme” discussions common in Lisp forums. The question there seems directed at why Common Lisp has been so much more popular than Scheme. That’s a dubious premise, seeing that many people learn Scheme in college CS classes; at least that’s my impression (as I said, I’ve never taken such a class). The real premise of the question is “Why does Common Lisp have so many libraries, whereas Scheme makes you recreate format?” Paul Graham’s creation of Arc was driven by this contention: people say “If you want to actually get work done, use Common Lisp,” but Scheme is so cool, right? I have come to a different question, which is “How does this language fit into my workflow?” This was also a critical part of choosing a Scheme implementation. There are tons of them, but they are all designed for slightly different purposes, or they are someone’s proof-of-concept compiler. Guile is a great example of the reasons I would put time into learning to use a particular language.

I find the relevant factors in choosing to spend time with a language are (a) fitting in with the Unix workflow/mindset, (b) a good community (hopefully aligned with GNU), (c) libraries, utilities and functions that have the features I expect, (d) development and workflow tools and (e) good learning materials. I have found that certain languages or implementations have all these features, and some have some but not others. The best is obviously C, which has all these qualities. Guile is the best Scheme implementation because it has all these qualities. Guile even integrates with C; I think my next big project will be C with Guile fully integrated. Python has a great community, but it’s quite distinct from the GNU community, the community I prefer. I’m less likely to find a fellow Unix-minded friend in the Python community. Haskell has good support on Fedora, but I haven’t found a good text for learning it. Pike looks thoroughly Unixy in its syntax, but its development tools, and even its interactive environment, are not available in the Fedora repositories. I’ve found the tools that work for me, and I suppose the best thing is to learn how to use them better.

Categories: Programming

Distributed version control: I get it, but I don’t need to

September 4, 2010

Over the past few months I’ve been working with new tools that let me work on several machines and keep the same important data, such as configurations and bookmarks. The impetus for this was that I’ve started to use a laptop: I never wanted one, but then somebody on the TriLUG mailing list offered one for sale for $20, and I just bought it. It’s really useful when I’m watching the kids or when travelling: I don’t have to use someone’s antiquated, slow Windows computer just because I’m visiting them. Also my wife needs to be on our main machine after we put the kids to bed. Short of acting out my fantasy of installing a server with LTSP terminals throughout the house, the laptop lets me keep working. Unfortunately the usual routine of setting up my shell, Emacs, Firefox, email and my Org Mode agenda files seemed so laborious that I realized “there’s potential for automation here.”

Sync Tools

To tackle email, I switched from MH to using IMAP. For Firefox I started using Weave. I was using Delicious for a long time, but Delicious is not free software so I decided I didn’t trust it. Weave is as free as Firefox, so when I heard about it I decided to go for it and it’s worked really well. I rarely used the social aspect of Delicious and mostly used it for portable bookmarks.

For the other two areas, Emacs and Org Mode files, the solution was less clear. I had tried using ssh URLs with tramp to have my agenda on multiple machines; then I saw Carsten Dominik’s Tech Talk about Org Mode, where he described using git to manage his Org files on multiple computers. For config files (shell and Emacs) I had tried using rsync to mirror them on different machines, using a Makefile with rsync commands. However, the different needs of different machines would always screw things up. Then I remembered that version control might be the right tool. I had tried that before with Subversion (SVN), my main version-control system (VCS), but things had not gone much better than with rsync. Then I thought perhaps a distributed version-control system (DVCS) would make more sense.

Using DVCS

My first impetus for using something other than Subversion was that I’ve discovered having one project per repository makes the most sense; that way I can make branches and not worry about confusing anybody. So I have a repository for my webpages, and a repository for my biggest project. That works well. However, I also started working on a book (i.e. it became a book) and it really didn’t fit in either repository. I came down to a choice between adding another Subversion repository, with all the Apache setup, or using something else that would be more convenient. Although setting up Apache is not hard after the third or fourth time you’ve done it, I still felt like it was unnecessary. I knew I would be the only person working on this, and therefore something that I could configure to use ssh made the most sense.

This is the most compelling argument for distributed version control: it’s easy to set up a repository and it’s easy to configure access for local system users. With Mercurial (similar for git and bzr), you just do

joel@chondestes: ~/tmp > hg init repo
joel@chondestes: ~/tmp > cd repo
joel@chondestes: ~/tmp/repo > touch myself
joel@chondestes: ~/tmp/repo > hg add myself
joel@chondestes: ~/tmp/repo > hg status
A myself

That to me is really compelling. Setting up a Subversion repository is also pretty easy, depending on where you want to put it, but configuring access from the internet is not as simple. I can access the repository initialized above just by giving an ssh URL as the argument to the clone or pull commands.
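
For example, to get at the repository created above from another machine (the laptop hostname here is hypothetical), all it takes is:

joel@laptop: ~ > hg clone ssh://joel@chondestes/tmp/repo
joel@laptop: ~ > cd repo
joel@laptop: ~/repo > hg pull -u    # later, grab and apply new changesets

Mercurial treats the path in an ssh URL as relative to the remote home directory, so this points at the ~/tmp/repo created above.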

Another thing that is really good about the DVCS systems (all that I’ve tried) is that they’re very good at merging. They do not have a monopoly on this, however; Subversion has been good at merging for as long as I’ve been using it. Again, I may be different from the people writing the manuals in that I don’t work on large projects with lots of contributors.

For some reason, however, the biggest advantage of distributed version control touted by its proponents is that you can make commits when you don’t have internet access. Wow! That is huge. Oh wait, that’s sarcasm. I am in this situation pretty often working on my Subversion projects and it really doesn’t bother me. If I’ve come to a point where I should make a commit and I don’t have network access, I can do one of two things: I can make a ChangeLog comment and keep working, or I can stop working. I always have other things I can do. Seriously, I don’t see this as an advantage, especially since, if what you want is to update another repository, you have to push your changes anyway, and that requires network access. Committing changes to my Org files would be useless unless I could push them to the computer I know I’ll be sitting at in the morning.

Another ridiculously inflated advantage proponents mention is that you don’t have to worry about breaking the build when you commit, because you make your commits to your local repository. I have already spent another blog posting on this concept, but again this is not a distinct advantage. I commit things that are broken when they’re already broken, but not if I’m adding a new feature. If you want to commit something on a new feature that might screw other things up, the proper way to do it with centralized version control is to make a branch. It seems like some people don’t think this is possible with Subversion, but I’ve been doing it since the beginning. Not only is it possible, it’s recommended.
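
For the record, making such a branch is one command; the repository URL here is made up:

joel@chondestes: ~/project > svn copy http://svn.example.com/repos/project/trunk \
      http://svn.example.com/repos/project/branches/new-feature \
      -m "Create a branch for the new feature"
joel@chondestes: ~/project > svn switch http://svn.example.com/repos/project/branches/new-feature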

There are two more big problems I have with distributed version control. First: it’s a fad. I don’t mean that it’s overblown and bound to die out like MC Hammer. However, it seems like everyone is switching to it and citing the same bad reasons. That to me seems like a warning. The rest of us who know how to use Subversion will just keep going, doing it the right way and reading the manual.

My second big problem is with something that people who like DVCS seem to love: this “fork happens” idea. Forking is when some kind of split in the development philosophy or goals between project members leads to factionalism. The most famous example is the creation of XEmacs. The author of The Definitive Guide to Mercurial uses socially-oriented rhetoric (thank you, Eric S. Raymond) to justify distributed version control. He says basically that forking is natural and we’ve all been in denial by using centralized version control. Using DVCS, on the other hand, brings us out of our comfort zone into some new promised land where we all have our own fork.

This argument doesn’t really hold up. As others have pointed out, the idea that you’re always working on your own fork is kinda ridiculous. Unless you’re happy to just keep your own version, your contribution to the larger piece of software will always have to be transmitted to someone else. Why would you develop software in a vacuum?

The same Definitive Guide author says that some people use centralized version control because it gives them an illusion of authority, knowing that people won’t be off putting out their code as someone else’s. While that’s possible, it certainly goes against the tone of the Subversion book, which encourages administrators to let developers develop their own branches freely. And again, even if you don’t have the illusion of authority over a project, you’re going to have to exercise authority at some point and decide whose changes to merge into the mainline. Now who’s delusional? People don’t want to download some dude’s fork; they want to download from a central point, where all the developers and users meet to discuss and code, and decide which changes will make the software the best.

Which DVCS?

My previous experience with distributed version control was using git to maintain my webpages. There was so much about it that didn’t make sense that I decided its creator was not following the Rule of Least Surprise. He claims to have never used CVS for Linux, so I can understand him not using CVS-like commands. However, the creators of Mercurial and Bazaar seem to have noticed that a lot of people who aren’t Linus Torvalds have also been writing software over the past few years: these two DVCS systems do use syntax that is mostly familiar to a habitual Subversion user like me.

I got pretty psyched about Bazaar and read a lot about it over the past few weeks. However, despite the claims made on the bzr website, bzr is really slow. No matter what I do, it’s really slow. I was just playing around with it by cloning a repository on the same machine (again, that is a selling point) and no matter what, it was deathly slow with just a few text files and minor changes. I’m not the only one who thinks it’s slow: Emacs maintainer Stefan Monnier recently wrote to emacs-devel to say that some minor pushes had taken over 30 minutes. I liked the bzr interface, but considering that hg has pretty much the same interface and is way faster, I decided to stick with using hg for my Org files. [Update: I am using bzr for a project on LaunchPad.net] The only remaining task is to figure out how to maintain forks of my configuration files on different machines. I think I have just been ignoring very basic rules of usage for merging, so that should not be hard.

Conclusions?

My conclusion is that using Mercurial is a good idea for my needs, and perhaps I can make it work for my configuration files again. However, it is not the panacea that its proponents advertise, nor do we necessarily need to rethink version control completely. Version control is good for many things, and making bold statements like “forking is fundamental” is really uncalled for. Those sorts of conclusions are specific to the particular needs of the developers involved, and not necessarily relevant to me. I’m not going to convert any of my major SVN projects to bzr as I originally intended, because as I see it DVCS does not offer any major advantages over centralized version control. Maybe it does for Linux kernel developers, Firefox developers, etc. For me it’s not a major improvement and it’s not going to help me work any better. I’m going to keep using Subversion and Mercurial, and we’ll see what happens.

Subversion Causes Brain Damage (according to Joel Spolsky)

June 22, 2010

I just read part of a Mercurial tutorial by Joel Spolsky and I’ve been thinking about it since. Some people just don’t get it. In this case Spolsky doesn’t seem to get what Subversion is for. He claims that people who use Subversion are “brain-damaged” for the following reason:

Subversion team members often go days or weeks without checking anything in. In Subversion teams, newbies are terrified of checking any code in, for fear of breaking the build, or pissing off Mike, the senior developer, or whatever. Mike once got so angry about a checkin that broke the build that he stormed into an intern’s cubicle and swept off all the contents of his desk and shouted, “This is your last day!” (It wasn’t, but the poor intern practically wet his pants.)

Okay, so his point is made, but the problem is that he and these people obviously haven’t read The Subversion Book. What he’s describing is people checking in code that they haven’t tested, which is really stupid. What the book says is to always test before you commit, and if you’re trying something radical, make a private branch (which is bloody easy with Subversion) and do your experimenting there. Then you merge that branch into a working copy of the trunk, test the changes, and commit if it works. The book generally encourages boldness in the name of collaboration; its authors certainly discourage the despotic “lead developer Mike” depicted above. If you find out someone’s committed a bug, the mature thing to do is revert the change, then go to the person (or send him an email) and say “Please test before committing.”
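
Spelled out, the workflow the book recommends looks something like this sketch (the URLs are hypothetical): branch, experiment and commit freely on the branch, then merge into a trunk working copy, test, and commit only when it works.

joel@chondestes: ~/src > svn copy http://svn.example.com/repos/proj/trunk \
      http://svn.example.com/repos/proj/branches/radical-idea \
      -m "Private branch for a radical experiment"
joel@chondestes: ~/src > svn checkout http://svn.example.com/repos/proj/branches/radical-idea radical-idea
(hack, test and commit on the branch as often as you like, then:)
joel@chondestes: ~/src > cd trunk
joel@chondestes: ~/src/trunk > svn merge --reintegrate http://svn.example.com/repos/proj/branches/radical-idea
joel@chondestes: ~/src/trunk > make check    # test the merged result before it goes anywhere
joel@chondestes: ~/src/trunk > svn commit -m "Merge the radical-idea branch into trunk"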

I’ll admit that I’ve not worked at the same level of collaboration as Mr. Spolsky, but I have read the (friendly) manual. Version control is all about the ability to track changes, make branches, and do thorough testing. And most of all it’s about the ability to maintain the entire history of your project so you can undo big mistakes. I’ll also say that my attempts so far to demonstrate Subversion to people have resulted in people simply not getting it. However, what they didn’t get was version control, not centralized version control. The situation would have been no different if I’d shown them Mercurial.

Furthermore, every Subversion (or git, or bzr, or hg, or CVS) repository I’ve ever gotten code from has a big disclaimer that says “This might be broken, don’t expect it to work.” Even Emacs has that and it has always worked. I guess this sort of thing is expected in the free software community whereas it is part of a huge scheme of denial in the non-free software community (if you can call it a community).

The other thing that Spolsky doesn’t seem to get is Unix. What Subversion does is model a Unix filesystem hierarchy and record the changes to that filesystem hierarchy. One of the beautiful things about Unix is that the filesystem hierarchy is significant, i.e. it means something. Subversion knows that and it does a damn good job of exploiting all the beauty of it. Those of us who can tell that Unix is The Right Thing are always going to have a hard time explaining how simply beautiful it is. Subversion has obviously spread beyond the “worse is better” community, and people out there often just don’t get it.

A little background

If you’re one of my non-programming friends or you’ve never used version control, or are thinking about it: Version control is a scheme for recording changes to a set of files. For example you can keep track of changes to documents, program code, or basic directory structures. If someone makes a change to a program and introduces a bug, you can undo it, in a way.

For a long time the only game in town was a set of programs called CVS (the Concurrent Versions System). CVS was cool in its time but had a number of problems, so instead of improving it beyond all recognition, a team of developers created a beefed-up, slightly more sophisticated system called Subversion.

With CVS and Subversion, you download (“check out”) code from a centralized repository (a database system on a server somewhere) that everybody checks their code into and out of. The repository is a centralized place for development. This model does have some shortcomings, though none that are completely damning; it’s still good for many types of projects.

As an alternative, the Linux kernel developers and a few others developed distributed version control systems, where every developer has a local “repository” where he or she checks in changes to code. To communicate changes to others, or to get theirs, you “push” and “pull” changes, respectively. This has a number of advantages, but again it’s not a panacea. It’s good for some projects whereas Subversion is better for others.

For the kind of programming I do, Subversion is my choice. Also for my homepage, I use Subversion because the directory structure of my webpages is very important. However, for my configuration files and a writing project that I expect is years away from publication, I use Mercurial.

The unfortunate thing about distributed version control is that it’s a fad, and therefore everybody’s raving about it and dissing the perfectly good system of centralized version control. I think Mercurial and git are better than a fad, but I don’t think they solve every problem the way some people think they do. Those people have started referring to decentralized systems as “modern.”

About the author

Joel Adamson is a graduate student in evolutionary biology; he uses Subversion and Mercurial for managing his projects.
