Recently I decided that maintaining my homepage in HTML was getting too laborious: the primary problem was things like lists and hierarchies. I have used Emacs’ Org Mode for my daily agenda for almost five years, and decided that it was the right tool for organizing these structures. Org Mode allows you to view “your life in plain text,” which is, of course, the most versatile way to do so. What Org can also do is export your hierarchical documents to HTML, LaTeX and many other formats (including formatted ascii, which is very nice). Along with this is the feature org-publish that uses Tramp to transfer a set of exported HTML files (and other files) to another location.
Read the Org Manual’s section on org-publish: you can find a simple example there. A single variable called `org-publish-project-alist’ configures all the stuff you need to publish an entire website. Here’s mine:
(setq org-publish-project-alist '(("mysite" :base-directory "~/Documents/web/" :base-extension "org" :recursive t :section-numbers nil :table-of-contents nil :publishing-directory "/ssh:firstname.lastname@example.org:~/public_html" :style "") ("imgs" :base-directory "~/Documents/web/imgs/" :base-extension "jpg\\|gif\\|png" :publishing-directory "/ssh:email@example.com:~/public_html/imgs" :publishing-function org-publish-attachment :recursive t) ("etc" :base-directory "~/Documents/web/" :base-extension "css\\|bib\\|el" :publishing-directory "/ssh:firstname.lastname@example.org:~/public_html" :publishing-function org-publish-attachment) ("docs" :base-directory "~/Documents/web/docs/" :base-extension "html\\|tex\\|bib" :publishing-directory "/ssh:email@example.com:~/public_html/docs" :publishing-function org-publish-attachment) ("thewholedamnshow" :components ("mysite" "imgs" "etc" "docs"))))
After a few days of having this in my .emacs I decided this needed its own file, which I called “project.el” and placed in the home directory of my project.
Each one of the members of this list is a “project.” Projects can include other projects by using the “:components” property. Suppose my website’s files are in the directory “~/Documents/web/”. This is where I keep the actual org-mode files, css files and any other files I want to publish. The property “:publishing-directory” puts the exported files in the specified location, which is a tramp url. The trick is really the property “:publishing-function,” which tells `org-publish’ how to treat the files. If left blank, this will translate the files into HTML. For .css files and other stuff you might link to (e.g. my .bib or tex .files, or images) you can use the function `org-publish-attachment’, which does no translation.
The crucial part of this variable is then the last “project,” which has only a “:components” property. This includes all the other projects, and hence when I publish “thewholedamnshow” using `org-publish’ my entire set of files is exported and uploaded.
Now I have all the sources for my website in one directory. Before I had used a highly hierarchical setup that made links very complicated. After realizing that I didn’t have actually that much content, I now have all the org files in the toplevel directory, with two subdirectories: one for images and one for special documents that are not in Org Mode. These are essays or LaTeX documents that are already finished works and I do not expect to change them.
I keep all the Org Mode source files in Bazaar. This greatly simplifies things. With project.el included along with the website, I can work on this on any machine as long as I evaluate that variable before I upload using `org-publish’.
A huge advantage is that now everything (including my CV, publications, and my ever-expanding academic FAQ) is in Org Mode. This means that changes are super-easy, even structural changes that I wouldn’t have attempted with HTML. So now when I need to update my CV, or add an FAQ, all I do is edit in Org Mode, something that I am very familiar with because I do it most of the day every day. I actually just categorized my FAQ using Org Mode in a matter of minutes. Linking with Org Mode is also incredibly easy, and the exporter knows how to handle links to files, headlines within files and internet urls. Also, since these documents are now in Org Mode, if someone wants a PDF version, all it takes is a few keystrokes to produce it.
The weirdest thing is how easy this is once I figured it all out. After only a week of tinkering, I now have a website that I can update or make major changes to in a matter of minutes. It looks better, is easier to maintain and easier to configure.
Most people think “programming is for programmers,” and by “programmers” they mean people who earn a living writing software, i.e. “end-user” software: software that people will buy, or that will be used in some big company. However, recently I’ve overheard a lot of talk from people in the business world about what those large companies do, and much of it sounds like it could be done by simple computer programs. The problem is that people don’t learn programming, nor do they learn to think of their problems as amenable to programming. I surmise that for most people, a programming solution doesn’t even enter into their thinking.
At a recent breakfast conversation, my brother told me that at his company most of the problems that come up result from people not thinking of something if a notification doesn’t come up on their computer screens and ask them. Even if they know there’s a problem, they won’t do anything about it if they don’t see it right there in front of their faces. They won’t even get up and walk five feet over to the guy in charge of that problem to ask him. These people and their tasks could be replaced with simple programs. He also told me that the corporation he works for uses none of the operations research or systems design theory that he learned in business school. Everything is just left up to guessing at the best solution and sticking with it for years, regardless of how much it costs or the alternatives.
I also sat next to some people in the airport who were using Excel and mentioned SAP (which my brother tells me is basically a money-milking machine; the companies who buy it are the cows). One of these people said her current project was “organizing [inaudible] into categories based on characteristics, including horsepower…they weren’t labeled with horsepower.” She was doing it “by hand.” One of my missions in my last job and in graduate school is to intercede whenever I hear the phrase “by hand.” We have computers. “By hand” should be a thing of the past. This young woman apparently didn’t think of her task algorithmically. Why would she when it’s unlikely any of her education included the word “algorithm?”
These patterns highlight the current failings of commercial computing. Commercial computing has one goal: sell computers. For most of the history of computing, this approach has been focused on hardware, but now people mostly see it as software. Commercial computing’s current goals are to sell people software as if it were hardware and then walk away, wagging your finger when the customer comes back complaining that it doesn’t work. Eric Raymond calls this the “manufacturing delusion.” Programs aren’t truly manufactured because they have zero marginal costs (it costs only as much to make a billion copies of a program as it does to make one copy). Commercial computing focuses on monolithic hardware and software, i.e. programs that try to do everything the user might need, and funneling everyone’s work through that program. That doesn’t work.
Academic computing, on the other hand, has the perspective that if something doesn’t work the way you need it to work, you rewire it, you combine it with something else, or build onto it so that it will accomplish a specific task. People literally rewired computers up until twenty-five years ago, when it became cheaper to buy a new machine (if anyone can correct me on that date, please let me know). Similarly for software, if the software you have doesn’t do the job you need, you write the software to do the job you need. If you have several programs that decompose the problem, you tie them together into a workflow. Suppose you have a specific problem, even one that you will only do once, and might take you one day to program — potentially saving you a week of “by hand” — then you write a program for it. Then if you ever have to do it again, you already have a program. You might also have a new problem that is very similar. So you broaden the scope of the previous program. Recently I wrote a script that inserted copyright notices with the proper licenses into a huge number of files. I had to insert the right notice, either for the GPL or All Rights Reserved based on the content of those files. On the other hand, if you have a program that generally does what you want, e.g. edits text, and you want it to do something specific, you extend that program to do what you need.
Basically I see a dichotomy between the thinking that certain people should make money, and solving problems only to the extent that solving their problems makes those people a lot of money, versus actually solving problems. If you disagree that this dichotomy exists, let me know and I’ll show you academic computing in action.
The solution for all these problems is teaching people to think algorithmically. Algorithmic thinking is inherent in the use of certain software and therefore that software should be used to teach algorithmic thinking. Teaching people algorithmic thinking using Excel is fine, but Excel is not free software, and thus should not be considered “available” to anyone. Teaching these skills in non-computer classes will get the point across to people that they will be able to apply these skills in any job. Teaching this to high school students will give them the skills that they need to streamline their work: they will be able to do more work, do the work of more people, communicate better and think through problems instead of just sitting there. People will also know when someone else is bullshitting them, trying to sell them something that they don’t need. Make no mistake, I’m not saying that teaching programming will get rid of laziness, but it might make it a lot harder to tolerate laziness. If you know that you can replace that lazy person with a very small shell script then where will the lazy people work?
If you teach biology, or any field that is not “computer science,” then I urge you to start teaching your students to handle their problems algorithmically. Teach them programming! I am going to try to create a project to teach this to undergraduate students. I have in mind a Scheme interpreter tied to a graphics engine, or perhaps teaching people using R, since it has graphics included. Arrgh…Scheme is just so much prettier. Teaching them the crucial ideas behind Unix and Emacs will go a long way. Unix thinking is workflow thinking. Unix (which most often these days is actually GNU/Linux) really shines when you take several programs and link them together, each doing its task to accomplish a larger goal. Emacs thinking is extension-oriented thinking. Both are forms of algorithmic thinking.
If you are a scientist, then stop procrastinating and learn a programming language. To be successful you will have to learn how to program a computer for specific tasks at some point in your career. Every scientist I know spends a huge amount of time engaged in programming. Whenever I visit my grad student friends, their shelves and desks are littered with books on Perl, MySQL, Python, R and Ruby. I suggest learning Scheme, but if you have people around you programming in Python, then go for it. I also suggest learning the basics of how to write shell-scripts: a lot of people use Perl when they should use shell-scripts. Learn to use awk, sed and grep and you will be impressed with what you can do. The chapters of Linux in a Nutshell should be enough to get you going. Classic Shell Scripting is an excellent book on the subject. Use Emacs and you’ll get a taste of just how many things “text editing” can be.
Every profession today is highly data-oriented. Anybody hoping to gain employment in any profession will benefit from this sort of learning. Whether people go into farming, business, science or anything else, they will succeed for several reasons. There are the obvious benefits of getting more work done, but there are also social reasons we should teach people algorithmic thinking. The biggest social reason is that teaching algorithmic thinking removes the divide between “customers” and “programmers.” This is why it bothers me to hear “open source” commentators constantly referring to “enterprise” and “customers” and “consumers.” If a farmer graduates from high school knowing that he can program his way to more efficient land use, then he will not be at the mercy of someone who wants to sell him a box containing the supposed secret. Again, you can teach algorithmic thinking using Excel, but then you already have the divide between Microsoft and the user. With free software that divide just doesn’t exist.
Last weekend I thought I just couldn’t wait any longer for Gnome-shell and Gnome 3, so I tried to install Fedora Rawhide on my laptop. After trying to yum update the system, a weird thing happened and the machine turned off spontaneously. Then when I rebooted the Live USB I’d made with Unetbootin, a mysterious mixture of Gnome 3 and Gnome 2 became the default desktop (I am not joking, there was a Gnome 2 panel sitting right on top of Gnome-shell). This was too weird. I decided to just give that up until Tuesday when I knew the Fedora Alpha was coming out.
However, I also knew about a relatively new distro called #! (Crunchbang) that I really wanted to try. The only thing holding me back was that it’s based on Debian. I have had seriously bad times with Ubuntu in the past, and my few attempts at installing Debian had not gone well (the first time the machine completely froze up the first time I opened synaptic). Despite my difficulties with Ubuntu and Debian, I’ve always acknowledged that Debian has a lot going for it, and Crunchbang’s “philosophy” certainly agreed with me. This laptop is not “underpowered” but sometimes I’ve felt like Gnome is a bit of overkill; the only reason I use Gnome is because my two main applications (or “shells”) are Emacs and Firefox, both GTK applications. Much to my delight, I found that Crunchbang has an Xfce version.
I thought I’d give it a try. I download the .iso for the Xfce 4.4 version, made a bootable USB with Unetbootin (no CD drive) and cranked it up. I selected an encrypted system this time, another reason I wanted to reinstall; the university is pretending to enforce rules about keeping student information private, so I’d like to be able to tell them my laptop is encrypted. The only surprise was that the installer took about two hours to erase my partitions before it started the actual install. Once that was done, however, the installation was really nice. The installer asks which features you want to install (Java, web server, development tools like version control systems and the autotools) — this was already much nicer than most other distros I’ve tried — all in a text interface running in a terminal emulator. This is nice because I’d rather just install that stuff before I need it, and while the installer knows which packages those are. In Fedora I can install package groups, but it’s just much nicer to take care of it at install time.
Just about everything that I need on a daily basis works really well with Crunchbang. The laptop speakers work, surprisingly without monkeying around with anything. I had to install the wireless drivers from the Realtek website again, but surprisingly the next morning when I booted the machine the wireless card worked without having to copy any firmware or anything. Nice!
There were two interesting surprises: the default web browser is Chromium. I really like Firefox, so I installed Iceweasel, and have had no problems; I really need Firefox because I use Zotero. Another major surprise is Youtube using gnash works really well. Of course, it’s not perfect, in fact the BAcksliders froze the whole machine when I tried to put it on 1080p.
Remarkably I’ve had no problems yet with package management. Of the three or four times I’ve installed Ubuntu it was only a matter of time before something got really screwed up in the normal course of updating the system. It was pathetic. Once I got a stale file handle of all things and there was nothing I could do to update the system (this would require a fresh install to fix). Nothing’s broken yet. I’ll give it a little time, but I’m taking the blame back from Debian and putting it squarely on Ubuntu.
Probably the nicest thing about Crunchbang is that it lives up its advertisement. I get annoyed when the promo for a software project says their main application or distro is “light and fast”; everybody says their thing is “light and fast.” Crunchbang’s web site merely says Crunchbang offers
a great blend of speed, style and substance. Using the nimble Openbox window manager, it is highly customisable and provides a modern, full-featured GNU/Linux system without sacrificing performance.–Home Page
This also means it’s the antithesis of Ubuntu: I can’t do anything with Ubuntu without it sending a notification my way saying “You can’t do that,” or “Wouldn’t you rather do this?” If I wanted that I would use a Mac.
What I haven’t done
As amazing as the fact that I’ve done all my daily work with Crunchbang for the past week, and only turned on my home workstation for use as a DVD player, is what I haven’t done. I have not changed the theme, I have not changed the wallpaper, I have not changed the … anything. And these are things that I compulsively tinker with, so that’s saying a lot.
If you’ve been hesitating trying Crunchbang for any of the reasons I was (mistaking it for Ubuntu, just not needing to try another distro), I encourage you to try it. It had already changed my mind about swearing off Debian-based distros. I am loving it.
Lately you’ve heard me say that my feelings toward laptops have changed. Since getting my new laptop, some of my feelings toward reading have changed as well. I love paper, and I love the look of printed letters, and well typeset text on the page. That won’t change. However, I noticed that most of the texts that I read (journal articles) I can read in online versions without missing much of the content. I’ve started exclusively reading current articles either online or in PDF form on my laptop and I’m glad to be conserving paper.
One thing that hasn’t changed — things that I’ve always read on my computer — are GNU manuals. GNU manuals are written in an ingenious format called TeXinfo which enables the author to produce appropriate output for several different ways of reading: PDF, HTML and the online info format, most easily read in Emacs. If you’re running GNU/Linux, you will find tons of manuals in this format by typing “info” into a terminal. Within Emacs, type “F1 h” (that’s press and release F1, then press and release ‘h’). Either way you should get a menu of topics, each covered by its own info manual.
Since deciding on Sunday that my programming goal should be better programming, rather than learning a new language, I started reading advanced topics in Unix/GNU programming: processes, pipes, IPC, etc. I was thinking “Man, I need to get that classic book Advanced Programming in the Unix Environment.” Unfortunately this book is HUGE, I wouldn’t carry it around with me, as reducing back strain is currently high on my agenda. It also dates from 1992 (around the time I first used Unix) and some things have changed since then. Most of the things the book is about have not changed, but most texts show their age in one way or another. Most Unix texts from this time look like casualties of the Unix wars, with more than half their content explaining incompatibilities between different version of Unix, and the pitfalls of writing portable programs.
So of course, I went for (what I thought was) the next best thing, something I already had and could carry around with me at no extra weight: the GNU C Library Manual (in the info menu, type “m” and then enter “libc” and hit Enter). I have been reading about the basics of IPC and processes for a while off and on, and there were things that I just didn’t get about them. I get them now, having read them in the Libc manual. For example, I didn’t understand that a child process and its parent process receive different return values from fork(); the Libc Manual spells this out so clearly I wonder why I didn’t think of it before. I didn’t get how the child process and parent process’ distinct code portions were triggered, but that was only because I hadn’t read the f’ing manual.
These manuals don’t read like terse manpages, they read like manuals that you would actually want to read. The Libc manual and the Emacs manual both repeatedly surprise me. Emacs users often joke about learning new “features” of Emacs that have actually existed for decades. Whenever I am frustrated with Emacs in some way, I’ll usually find a workaround, and then months later I’ll be reading the manual for some unrelated cause and find a solution to my problem. It was right there the whole time! You can imagine how empowering reading these manuals is.
The weird thing is that although I’ve repeatedly had this experience with GNU Manuals, they aren’t the first thing that I go to. I need to change that habit. We often treat reference materials as though we shouldn’t sit and read them, we should instead browse through them until we find what we need and then put them away. That’s what manpages are for. GNU Manuals are different. GNU Manuals actually tell you what’s going on and what to do: they are great for beginning programmers. I’m not going to waste my time going to the library; I’m going to read the Libc Manual.
eBooks, rms and DRM
Recently ebooks outsold paperbacks on Amazon.com. People may be treating this as the final sign that the death of paper is coming, but I don’t, for one considering that Amazon has been set up as an ebook store from the very beginning, i.e. they’re on the friggin’ web — it’s obvious they would try to compete by delivering their content as quickly and conveniently as possible. I’ve always seen it as a goal of theirs, although I think back in the 90s most of us thought ebooks would just be webpages, rather than something you’d actually carry around, i.e. we thought they would be different enough from regular books to combat the problems of regular books.
Amazon however has a different idea: they and their competitors would like you to think of ebooks as the same as regular books, just lighter weight, and easier to pay for. Their ridiculous idea of “e-lending” is so stupidly backward that I laughed out loud when I heard of it:
They have managed to recreate, in the palm of a reader’s hand, the thrill of tracking down a call number deep in the library stacks only to find its spot occupied by empty space. With a clever arrangement of bytes, they have enabled users to experience the equivalent of being without their books while their friends’ dogs chew on them. Maybe if we’re lucky, next they’ll implement the feature that allows two electronic pages to be stuck together as if by gum, or that translates coffee spilled on the screen into equivalent damage to the digital pages.–John Sullivan, Lending: A solved problem
They’ve done this with DRM or “Digital Restrictions Management.” Its practitioners call it “Digital Rights Management,” which I think is sinister enough: do you want your rights digitally managed? They’ve managed to make ebooks just as problematic as paper books, and why?
The question of their motives becomes so much clearer when we consider that not only did Richard Stallman create great free books about computing, like the Emacs Manual and the GNU Libc manual, he also helped create the best ebook reader out there (info), and all with the goal that it will facilitate user freedom. The choice is yours: do you want ebooks to be as inconvenient as regular books? Or would you rather have convenient, indexed, hyperlinked text written by people who care about you and your freedom? The choice is clear to me.
Some Further Reading
The history is about as interesting as the books themselves. Some people think that ebooks (or the concept) is new, just as they think about tablet computers and touch screens. Both touch screens and “ebooks” are about as old as computing itself. If you’re skeptical about that, think of how simple an idea it is: many, many books that you can carry around in your pocket at no additional weight. “Hey let’s use computers,” is a pretty simple solution. Computers were almost built for the task. The only new idea is making ebooks as inconvenient as paper books. I’m reminded of Douglas Adams‘ explanation that if a hitch-hiker wanted to carry a paper copy of The Hitchhiker’s Guide to the Galaxy (a text that bears a strange resemblance to Wikipedia), he would have to carry several enormous buildings with him.
Fairly often I see people asking in online communities “which programming language should I learn next?” I have asked this question often recently. I want to learn something new. I always enjoy learning new things. It’s what I do. I’m a scientist, and my current occupation (what I put on forms) is “student,” but I think of myself as a student in a much more holistic way. I always enjoy learning and I seek it out on a moment-to-moment basis. Programming is a large part of what I do: in a way, it’s also been my “occupation” for at least the past five years. I programmed in Stata for my job before I came to graduate school and now I use C, Scheme, bash, Emacs Lisp and a few other “languages” every day.
I feel like I reached a plateau of sorts a couple of years ago, after studying languages at a rate of about two per month for at least two years. By that I mean I studied Emacs Lisp and Python for a month, then things seemed to shift to Scheme and R, or Perl and Common Lisp for the next month. I think I intensely studied about ten languages over three years, including various Unix shells and a few specialty languages (like Mathematica: yuck!) . There’s still a whole bunch that I would say I’m conversant in, and some even that I use as a fairly essential part of my work, that I might be able to use better if I knew them better, like TeX. As my graduate school research picked up, however, I settled on C and Scheme as my main languages.
I found this plateau somewhat dismaying: as I said I always want to learn new things, and there seem to be really cool languages out there that I could learn. For about two years I’ve been casually reading about, and doing minor coding in the ML family and Haskell. However in each case I’ve found that there are reasons I shouldn’t bother. Here are my conclusions:
- My needs as a programmer are different from the vast majority of people who put the title of this posting into Google
- Most programs people want to write are quite different from the ones that I want to write
- I really like the Unix workflow
Other Programmers Learn For Jobs
In my discussion of object-oriented programming I got the comment quite often that “You need to know object-oriented programming because it controls complexity, and is therefore essential in corporate programming environments, so if you want a job…” End of discussion. Don’t believe the hype. If you want such a job, then by all means, learn Java. If you’re more like me, and you realize that programming is not the hardest part of most jobs then focus on those other parts, and get good at using whichever programming paradigm is most well-suited to the task at hand. Don’t obsess about which programming paradigm is most suited to having people fire you easily.
Other Programmers Write Monolithic, Interactive Programs
The programming task that I’m most often using is numerical analysis, the oldest programming task in the universe — the one that pre-dates computers. I conclude that the source of my confusion with many programming texts and the explanations given is that other programmers are interested in (or at least authors are trying to interest them in) designing large, monolithic, interactive programs. In my mind there are only a few good examples of such programs, and they are already written: Emacs, the shell, window managers and file managers, and a web-browser (which is really a noninteractive program dressed up as an interactive one). I’m not going to write one of those. Seems to me like most people learning Haskell, for example, are coming from writing monolithic programs in C++ or Java, and probably on Microsoft Windows.
What’s particularly funny to me about this is that this split goes back a few decades to the “Worse is better” controversy of the early nineties. Unix’ detractors generally believed in writing monolithic programs and their favorite development environments were eclipsed by Unix and the hardware it came with (workstations). I guess Microsoft and Apple were able to steer people away from Unix once again; now people come from environments where they are used to building these monolithic programs to Unix-like systems, and they don’t find out they can use computers a particular way. I started using Unix when I was thirteen: I guess this means I’m old. I’d rather be an old Unix-user than a young anything.
There are a few other reasons I’m not writing such big programs: an interactive environment for numerical operations only makes sense up to a point. It’s great for experimenting. However, even in Stata I ended up writing scripts, in a programmatic style, and executing them in batch mode, carefully writing the important results to output, and saving graphs as files. Either those programs have been written and are awesome, or I don’t need monolithic, interactive programs to do the things I’m doing. I have a different perspective on how people should use computers.
Unix Philosophy Works For Me
I often read that the Unix philosophy became “Do one thing and do it well.” Other people seem to want to start a program, work with just that program for a long time, and then do something else using a different huge, monolithic program. I think that’s a waste of time. It sounds extremely limiting. Especially when I have a whole bunch of tools available to integrate my work into a common whole. I often read the derisive aphorism “When all you’ve got is a hammer, everything starts to look like a nail.” I think the supposed wisdom of that remark is placed elsewhere, but it has the opposite meaning when speaking about using Unix tools. Yes, when you have Make, everything starts to look like targets and dependencies. When you have sed and awk, everything becomes text processing.
Consequently all I need is an editor to make me happy. I use Emacs, which becomes a whole “working environment,” but I could get by using vi with shell access (however much it hurts me to say that). Everything becomes editing when you have the idea that to use a computer is to write programs, and you know which tools can glue those programs together. Then all you need is a single command (e.g. “make”) to get the ball rolling. Given this perspective, learning new languages just becomes a matter of fitting those into an existing workflow. I generally think of programs as “input-output” and it’s okay if that input is a program, but it shouldn’t try to be its own environment and supersede Unix.
The language that fits in best with Unix philosophy and GNU Tools is C. Not only does C fit in, the GNU system is built around it, including a huge number of tools that make using C really, really easy. Automake, autoconf and the other auto-tools mean that all I have to do is write a little program, write a little Makefile.am, use autoscan and a few other things, and “make” builds me a program. Writing programs for simple number-crunching also means that most of the problems people associate with C are not my problems. I don’t have memory leaks in my programs, they just don’t happen. Therefore I don’t really need to care about having a language with garbage collection. Everybody’s screaming about making programs faster with parallel execution, but that’s for web-servers, databases that get called by web-servers, and other things that I’m not writing. C is extremely fast for number crunching, and we can make the kernel run parallel jobs using “make -j” or GNU Parallel. C is just fine.
Am I the only one out there interested in using something other than Fortran for number-crunching? Probably yes, but I can use C. I don’t need Haskell. I like the mathematical cleanliness of Haskell, but that doesn’t matter when I already know a functional language (Scheme), can already write bloody-fast numerical algorithms in C, and can run parallel jobs with Make. I read a lot of stuff about writing parallel programs and other features of supposedly “modern” languages, but they are almost always things important for writing web servers or GUIs, things that I’m not doing.
I’m still tempted to learn certain languages: here’s a run-down of why.
C++ is still tantalizing because so many people know it. In addition to that, it seems to have a very mature set of standard libraries. However, especially when I hear people say stuff like “Many C++ programmers write in C, but just don’t know it,” it seems still more unnecessary. C++ has a large community, GNU development tools, and seems like I’d have to change very little of how I do my work in order to learn it. All I would have to learn is the language.
D is an interesting language because it includes well-implemented features of some other language platforms, like garbage collection. D seems basically like C with some added on features, and the ability to extend its programming paradigms. I haven’t taken the steps to see what kind of development tools are available for D, so I haven’t given it the full evaluation yet. Unfortunately, it doesn’t seem to have a large enough user community to fit fully in with GNU yet, which is a critical factor.
The big thing Haskell has going for it is that Fedora has a full complement of development tools to form a Haskell environment. Haskell has just as huge a network of tools as Lisp (close enough to look that way), so that would make it easy to get going. I think the problems with Haskell are that it seems too difficult to get going with, it seeks to be its own environment (i.e. doesn’t fit in with my working environment), seems suited to doing other things than I would do with it, and I don’t need to learn it. I would really like to learn it, but all these things just add up to me saying “I don’t have the time.” That’s fewer words than all that other stuff I said.
What Makes a Language Usable or Worth Learning?
This is a common question I see people discuss: most often I’ve seen it in “Common Lisp vs. Scheme” discussions common in Lisp forums. The question there seems directed at why Common Lisp has been so much more popular than Scheme. That’s a dubious premise, seeing that many people learn Scheme in college CS classes, at least that’s my impression (as I said, I’ve never taken such a class). The real premise of the question is “Why does Common Lisp have so many libraries, whereas Scheme makes you recreate format?” Paul Graham’s creation of Arc was driven by this contention: people say “If you want to actually get work done, use Common Lisp,” but Scheme is so cool, right? I have come to a different question which is “How does this language fit into my workflow?” This was also a critical part of choosing a Scheme implementation. There are tons of them, but they are all designed for slightly different purposes, or they are someone’s proof-of-concept compiler. Guile is a great example of the reasons I would put time into learning to use a particular language.
I find the relevant factors in choosing to spend time with a language are (a) fitting in with Unix workflow/mindset, (b) a good community (hopefully aligned with GNU), (c) libraries, utilities and functions that have the features I expect, (d) development and workflow tools and (e) good learning materials. I have found that certain languages or implementations fit all these features, and some fit some, but not others. The best is obviously C, which has all these qualities. Guile is the best Scheme implementation because it has all these qualities. Guile even integrates with C; I think my next big project will be C with Guile fully integrated. Python has a great community, but it’s quite distinct from the GNU community, the community I prefer. I’m less likely to find a fellow Unix-minded friend in the Python community. Haskell has good support on Fedora, but I haven’t found a good text for learning it. Pike looks thoroughly Unixy in its syntax, but its development materials, or even its interactive environment are not available in Fedora repositories. I’ve found the tools that work for me, and I suppose the best thing is to learn how to use them better.
Over the past few months I’ve been working with new tools to enable me to work on several machines and keep the same important data, such as configurations and bookmarks. The impetus for this was that I’ve started to use a laptop: I never wanted one, but then somebody on the TriLUG mailing list offered one for sale for $20, and I just bought it. It’s really useful for when I’m watching the kids or when travelling: I don’t have to use someone’s antiquated, slow Windows computer just because I’m visiting them. Also my wife needs to be on our main machine after we put the kids to bed. Short of acting out my fantasy and installing a server with LTSP terminals throughout the house, the laptop is good for me to keep working. Unfortunately the usual routine of setting up my shell, Emacs, Firefox, email and my Org Mode agenda files seemed so laborious that I realized “there’s potential for automation here.”
To tackle email, I switched from MH to using IMAP. For Firefox I started using Weave. I was using Delicious for a long time, but Delicious is not free software so I decided I didn’t trust it. Weave is as free as Firefox, so when I heard about it I decided to go for it and it’s worked really well. I rarely used the social aspect of Delicious and mostly used it for portable bookmarks.
For the other two areas, Emacs and Org Mode files, the solution was less clear. I had tried using ssh urls with tramp to have my agenda on multiple machines, then I saw Carsten Dominik’s Tech Talk about Org Mode, where he described using git to manage his Org files on multiple computers. For config files (shell and Emacs) I had tried using rsync to mirror them on different machines, using a Makefile with rsync commands. However, different needs of different machines would always screw things up. Then I remembered that version control might be the right tool. I had tried that before with Subversion (SVN), my main version-control system (VCS), but things had not gone much better than with rsync. Then I thought perhaps a distributed version-control system (DVCS) would make more sense.
My first impetus for using something other than Subversion was that I’ve discovered having one project per repository makes the most sense; that way I can make branches and not worry about confusing anybody. So I have a repository for my webpages, and a repository for my biggest project. That works well. However, I also started working on a book (i.e. it became a book) and it really didn’t fit in either repository. I came down to a choice between adding another Subversion repository, with all the Apache setup, or using something else that would be more convenient. Although setting up Apache is not hard after the third or fourth time you’ve done it, I still felt like it was unnecessary. I knew I would be the only person working on this, and therefore something that I could configure to use ssh made the most sense.
This is the most compelling argument for distributed version control: it’s easy to set up a repository and it’s easy to configure access for local system users. With Mercurial (similar for git and bzr), you just do
joel@chondestes: ~/tmp > hg init repo joel@chondestes: ~/tmp > cd repo joel@chondestes: ~/tmp/repo > touch myself joel@chondestes: ~/tmp/repo > hg add myself joel@chondestes: ~/tmp/repo > hg status A myself
That to me is really compelling. Setting up a Subversion repository is also pretty easy, depending on where you want to put it, but configuring access from the internet is not as simple. I can access the above-initialized Mercurial repository just using an ssh url argument to its clone or pull commands.
Another thing that is really good about DVCS systems (all that I’ve tried) is that they’re really good at merging. They do not have a monopoly on this, however; Subversion has been good at merging for as long as I’ve been using it. Again, I may be different from the people writing the manuals in that I don’t work on large projects with lots of contributors.
For some reason, however, the biggest advantage of distributed version control touted by its proponents is that you can make commits when you don’t have internet access. Wow! That is huge. Oh wait, that’s sarcasm. I am in this situation pretty often working on my Subversion projects and it really doesn’t bother me. If I’ve come to a point where I should make a commit and I don’t have network access I can do one of two things: I can make a ChangeLog comment and keep working, or I can stop working. I always have other things I can do. Seriously I don’t see this as an advantage, especially when if what you want is to update another repository you have to push your changes anyway, and that requires network access. Committing changes to my Org-files would be useless unless I could push them to the computer I know I’ll be sitting at in the morning.
Another ridiculously inflated advantage proponents mention is that you don’t have to worry about breaking the build when you commit, because you make your commits to your local repository. I have spent another blog posting on this concept already, but again this is not a distinct advantage. I commit things that are broken when they’re broken already, but not if I’m adding a new feature. If you want to commit something on a new feature that might screw other things up, the proper way to do it with centralized version control is to make a branch. It seems like some people don’t think this is possible with Subversion, but I’ve been doing it since the beginning. Not only is it possible, it’s recommended.
There are two more big problems I have with distributed version control. First: it’s a fad. I don’t mean that like it’s something that is overblown and bound to die out like MC Hammer. However, it seems like everyone is switching to it and citing the same bad reasons. That to me seems like a warning. The rest of us who know how to use Subversion will just keep on going doing it the right way, reading the manual.
My second big problem that people who like DVCS seem to love is this “fork happens” idea. Forking is when there’s some kind of split in the development philosophy or goals between project members that leads to factionalism. The most famous example is the creating of XEmacs. The author of The Definitive Guide to Mercurial uses socially-oriented rhetoric (thank you, Eric S. Raymond) to justify distributed version control. He says basically that forking is natural and we’ve all been in denial by using centralized version control. Using DVCS on the other hand, brings us out of our comfort zone into some new promised-land where we all have our own fork.
This argument doesn’t really hold up. As others have pointed out, the idea that you’re always working on your own fork is kinda ridiculous. Unless you’re happy to just keep your own version, your contribution to the larger piece of software will always have to be transmitted to someone else. Why would you develop software in a vacuum?
The same Definitive Guide author says that some people use centralized version control because it gives them an illusion of authority, knowing that people won’t be off-putting out their code as someone else’s. While that’s possible, it certainly goes against the tone of the Subversion book, which encourages administrators to let developers develop their own branches freely. And again, even if you don’t have the illusion of authority over a project, you’re going to have to express authority at some point and decide whose changes to merge into the mainline. Now who’s delusional? People don’t want to download some dude’s fork, they want to download from a central point, where all the developers and users meet to discuss and code, and decide which changes should make the software the best.
My previous experience with distributed version control was using git to maintain my webpages. There was so much about it that didn’t make sense that I decided its creator was not using the Rule of Least Surprise. He claims to have never used CVS for Linux, so I can understand him not using CVS-like commands. However, the creators of Mercurial and Bazaar seem to have noticed that a lot of people who aren’t Linus Torvalds have also been writing software over the past few years: these two DVC systems do use syntax that is mostly familiar to a habitual Subversion user like me.
I got pretty psyched about Bazaar and read a lot about it over the past few weeks. However, despite the claims made on the bzr website, bzr is really slow. No matter what I do, it’s really slow. I was just playing around with it by cloning a repository on the same machine (again, that is a selling point) and no matter what it was deathly slow with just a few text files and minor changes. I’m not the only one who thinks it’s slow: Emacs maintainer Staffan Monnier recently wrote Emacs-devel to say that some minor pushes had taken over 30 minutes. I liked the bzr interface, but considering that hg has pretty much the same interface and is way faster I decided to stick with using hg for my Org Files. [Update: I am using bzr for a project on LaunchPad.net] The only remaining task is to figure out how to maintain forks of my configuration files on different machines. I think I have just been ignoring very basic rules of usage for merging, so that should not be hard.
My conclusion is that using Mercurial is a good idea for my needs, and perhaps I can make it work using configuration files again. However, it is not the panacea that its proponents advertise, nor do we need to necessarily rethink version control completely. Version control is good for many things and making bold statements like “forking is fundamental” is really uncalled for. Those sorts of conclusions are specific to the particular needs of the developers involved and not necessarily for me. I’m not going to convert any of my major SVN projects to bzr as I originally intended because as I see it, DVCS does not offer any major advantages over centralized version control. Maybe it does for Linux kernel developers, Firefox developers, etc. For me it’s not a major improvement and it’s not going to help me work any better. I’m going to keep using Subversion and Mercurial, and we’ll see what happens.