Fairly often I see people asking in online communities “which programming language should I learn next?” I have asked this question often recently. I want to learn something new. I always enjoy learning new things. It’s what I do. I’m a scientist, and my current occupation (what I put on forms) is “student,” but I think of myself as a student in a much more holistic way. I always enjoy learning and I seek it out on a moment-to-moment basis. Programming is a large part of what I do: in a way, it’s also been my “occupation” for at least the past five years. I programmed in Stata for my job before I came to graduate school and now I use C, Scheme, bash, Emacs Lisp and a few other “languages” every day.
I feel like I reached a plateau of sorts a couple of years ago, after studying languages at a rate of about two per month for at least two years. By that I mean I studied Emacs Lisp and Python for a month, then things seemed to shift to Scheme and R, or Perl and Common Lisp for the next month. I think I intensely studied about ten languages over three years, including various Unix shells and a few specialty languages (like Mathematica: yuck!). There’s still a whole bunch that I would say I’m conversant in, and some, like TeX, that I use as a fairly essential part of my work and could use better if I knew them more deeply. As my graduate school research picked up, however, I settled on C and Scheme as my main languages.
I found this plateau somewhat dismaying: as I said, I always want to learn new things, and there seem to be really cool languages out there that I could learn. For about two years I’ve been casually reading about, and doing minor coding in, the ML family and Haskell. However, in each case I’ve found that there are reasons I shouldn’t bother. Here are my conclusions:
- My needs as a programmer are different from the vast majority of people who put the title of this posting into Google
- Most programs people want to write are quite different from the ones that I want to write
- I really like the Unix workflow
Other Programmers Learn For Jobs
In my discussion of object-oriented programming I got the comment quite often that “You need to know object-oriented programming because it controls complexity, and is therefore essential in corporate programming environments, so if you want a job…” End of discussion. Don’t believe the hype. If you want such a job, then by all means, learn Java. If you’re more like me, and you realize that programming is not the hardest part of most jobs, then focus on those other parts, and get good at using whichever programming paradigm is best suited to the task at hand. Don’t obsess about which programming paradigm is best suited to having people fire you easily.
Other Programmers Write Monolithic, Interactive Programs
The programming task I’m most often doing is numerical analysis, the oldest programming task in the universe: the one that pre-dates computers. I conclude that the source of my confusion with many programming texts and the explanations given is that other programmers are interested in (or at least authors are trying to interest them in) designing large, monolithic, interactive programs. In my mind there are only a few good examples of such programs, and they are already written: Emacs, the shell, window managers and file managers, and a web-browser (which is really a noninteractive program dressed up as an interactive one). I’m not going to write one of those. Seems to me like most people learning Haskell, for example, are coming from writing monolithic programs in C++ or Java, and probably on Microsoft Windows.
What’s particularly funny to me about this is that this split goes back a few decades to the “Worse is better” controversy of the early nineties. Unix’s detractors generally believed in writing monolithic programs, and their favorite development environments were eclipsed by Unix and the hardware it came with (workstations). I guess Microsoft and Apple were able to steer people away from Unix once again; now people come from environments where they are used to building these monolithic programs to Unix-like systems, and they never find out that there’s a different way to use computers. I started using Unix when I was thirteen: I guess this means I’m old. I’d rather be an old Unix-user than a young anything.
There are a few other reasons I’m not writing such big programs: an interactive environment for numerical operations only makes sense up to a point. It’s great for experimenting. However, even in Stata I ended up writing scripts, in a programmatic style, and executing them in batch mode, carefully writing the important results to output, and saving graphs as files. Either those programs have been written and are awesome, or I don’t need monolithic, interactive programs to do the things I’m doing. I have a different perspective on how people should use computers.
Unix Philosophy Works For Me
I often read that the Unix philosophy became “Do one thing and do it well.” Other people seem to want to start a program, work with just that program for a long time, and then do something else using a different huge, monolithic program. I think that’s a waste of time. It sounds extremely limiting. Especially when I have a whole bunch of tools available to integrate my work into a common whole. I often read the derisive aphorism “When all you’ve got is a hammer, everything starts to look like a nail.” The supposed wisdom of that remark belongs elsewhere; applied to Unix tools, it takes on the opposite, positive meaning. Yes, when you have Make, everything starts to look like targets and dependencies. When you have sed and awk, everything becomes text processing.
Consequently all I need is an editor to make me happy. I use Emacs, which becomes a whole “working environment,” but I could get by using vi with shell access (however much it hurts me to say that). Everything becomes editing when you have the idea that to use a computer is to write programs, and you know which tools can glue those programs together. Then all you need is a single command (e.g. “make”) to get the ball rolling. Given this perspective, learning new languages just becomes a matter of fitting those into an existing workflow. I generally think of programs as “input-output” and it’s okay if that input is a program, but it shouldn’t try to be its own environment and supersede Unix.
The language that fits in best with Unix philosophy and GNU Tools is C. Not only does C fit in, the GNU system is built around it, including a huge number of tools that make using C really, really easy. Automake, autoconf and the other auto-tools mean that all I have to do is write a little program, write a little Makefile.am, use autoscan and a few other things, and “make” builds me a program. Writing programs for simple number-crunching also means that most of the problems people associate with C are not my problems. I don’t have memory leaks in my programs, they just don’t happen. Therefore I don’t really need to care about having a language with garbage collection. Everybody’s screaming about making programs faster with parallel execution, but that’s for web-servers, databases that get called by web-servers, and other things that I’m not writing. C is extremely fast for number crunching, and we can make the kernel run parallel jobs using “make -j” or GNU Parallel. C is just fine.
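As a sketch of how little machinery that takes, here is a hypothetical Makefile (the sim binary, its source file, and the parameter values are my own inventions for illustration) in which each simulation run is an independent target, so “make -j4” runs four of them in parallel:

```make
PARAMS := 0.1 0.2 0.3 0.4
OUTPUTS := $(PARAMS:%=run-%.out)

all: $(OUTPUTS)

# Each run is an independent target, so "make -j4" executes
# the four simulations in parallel with no threading code at all.
run-%.out: sim
	./sim $* > $@

sim: sim.c
	$(CC) -O2 -o $@ $<
```

No threads, no locks: the parallelism lives entirely in Make’s job scheduler and the kernel.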
Am I the only one out there interested in using something other than Fortran for number-crunching? Probably yes, but I can use C. I don’t need Haskell. I like the mathematical cleanliness of Haskell, but that doesn’t matter when I already know a functional language (Scheme), can already write bloody-fast numerical algorithms in C, and can run parallel jobs with Make. I read a lot of stuff about writing parallel programs and other features of supposedly “modern” languages, but they are almost always things important for writing web servers or GUIs, things that I’m not doing.
I’m still tempted to learn certain languages: here’s a run-down of why.
C++ is still tantalizing because so many people know it. In addition to that, it seems to have a very mature set of standard libraries. However, especially when I hear people say stuff like “Many C++ programmers write in C, but just don’t know it,” it seems still more unnecessary. C++ has a large community, GNU development tools, and seems like I’d have to change very little of how I do my work in order to learn it. All I would have to learn is the language.
D is an interesting language because it includes well-implemented features of some other language platforms, like garbage collection. D seems basically like C with some added-on features, and the ability to extend its programming paradigms. I haven’t taken the steps to see what kind of development tools are available for D, so I haven’t given it the full evaluation yet. Unfortunately, it doesn’t seem to have a large enough user community to fit fully in with GNU yet, which is a critical factor.
The big thing Haskell has going for it is that Fedora has a full complement of development tools to form a Haskell environment. Haskell has just as huge a network of tools as Lisp (close enough to look that way), so that would make it easy to get going. I think the problems with Haskell are that it seems too difficult to get going with, it seeks to be its own environment (i.e. doesn’t fit in with my working environment), seems suited to doing other things than I would do with it, and I don’t need to learn it. I would really like to learn it, but all these things just add up to me saying “I don’t have the time.” That’s fewer words than all that other stuff I said.
What Makes a Language Usable or Worth Learning?
This is a common question I see people discuss: most often I’ve seen it in “Common Lisp vs. Scheme” discussions common in Lisp forums. The question there seems directed at why Common Lisp has been so much more popular than Scheme. That’s a dubious premise, seeing that many people learn Scheme in college CS classes, at least that’s my impression (as I said, I’ve never taken such a class). The real premise of the question is “Why does Common Lisp have so many libraries, whereas Scheme makes you recreate format?” Paul Graham’s creation of Arc was driven by this contention: people say “If you want to actually get work done, use Common Lisp,” but Scheme is so cool, right? I have come to a different question which is “How does this language fit into my workflow?” This was also a critical part of choosing a Scheme implementation. There are tons of them, but they are all designed for slightly different purposes, or they are someone’s proof-of-concept compiler. Guile is a great example of the reasons I would put time into learning to use a particular language.
I find the relevant factors in choosing to spend time with a language are (a) fitting in with Unix workflow/mindset, (b) a good community (hopefully aligned with GNU), (c) libraries, utilities and functions that have the features I expect, (d) development and workflow tools and (e) good learning materials. I have found that certain languages or implementations fit all these features, and some fit some, but not others. The best is obviously C, which has all these qualities. Guile is the best Scheme implementation because it has all these qualities. Guile even integrates with C; I think my next big project will be C with Guile fully integrated. Python has a great community, but it’s quite distinct from the GNU community, the community I prefer. I’m less likely to find a fellow Unix-minded friend in the Python community. Haskell has good support on Fedora, but I haven’t found a good text for learning it. Pike looks thoroughly Unixy in its syntax, but its development materials, or even its interactive environment are not available in Fedora repositories. I’ve found the tools that work for me, and I suppose the best thing is to learn how to use them better.
I’m grappling with understanding what XML is really good for. I guess I understand that if I wanted to make an RSS reader or something like that, I would understand it right away. Mostly I’m not trying to understand what it’s good for but how it could help me. I understand that XML is for portable document (broadly-defined) exchange over the web, but it’s the meaning of “document” and “exchange” that require illumination.
I have a few big obstacles. For one, documents describing XML are only about syntax, which for me is the really really really really really really really really really really really really simple part. I mean, how much of a brain does it take to understand well-formedness and hierarchy? What’s hard to understand is poor-formedness and non-hierarchy (i.e. HTML). I reiterate that I come from a Unix background, where it’s impossible to think of things in a non-hierarchical manner, although some bad habits are creeping into the community from outside.
The next big obstacle is that many documents describing XML say that its big advantage is storing data in plain text files. My reaction is always “What idiot stores data in an opaque binary that requires a special program to read it?” Again, coming from a Unix background, this idea is hardly revolutionary. So it seems to me that XML is a way to bring a simple idea out of the Unix world, with just enough annoying complexity to satisfy people who insist on complexity. If it were too simple, they wouldn’t recognize it. So we have a potentially simple thing — data stored in plain text — in a completely verbose form: XML.
That verbosity brings to mind the common complaint about XML, which is that it’s really Lisp syntax with angle brackets and a bunch of stuff inside them. Consequently Lisp is a natural way to deal with XML. Again, this is bringing the ideas of one programming community (the Lisp community) to another (web programming).
I have thought of a few things I could do with XML, but most of all I don’t know why I feel the urge to learn more about it. One would be a graphing program that uses a web server and a browser to view results as SVG. That might be cool. Again, however, I have this voice inside that says “just tie together the right tools and automate them with a Makefile!”
Please share your thoughts.
Now this is something I’m going to check out. The creator of Gnus and Gmane has now brought us an RSS reader that uses NNTP.
For the second or third time in the course of a very large programming project I’m working on, I have discovered that a big runtime problem I was having was due not to program or computational complexity, but to something very basically stupid that I did. I’ve been programming a population genetic simulation that is, by its nature and rationale, very complex. One of the biggest problems in the history of population genetics is that programming multilocus (and in some cases, multi-allele) problems results in hugely costly memory use. Programs that seek to model systems with non-free recombination (recombination rates other than 0.5) must hold a huge number of variables.
Part of this problem I’ve solved by deploying haploid. However, the whole point of programming that algorithm was so that I could code even more complex things, like populations with age-structure, the project I’m talking about now. Yet in trying to get this program to run I’ve had a string of problems that looked like the program was running out of resources. I have been using OpenMP to speed up the computations, but it seemed that I was running into either a race condition or some bizarre side-effect of parallelism where threads were waiting for threads that were otherwise blocked. In other words, I’d run my program, on my quad-core workstation, or on the university’s big cluster, and it would just stop after a while, despite the processors continuing to be totally occupied. A few times I had to actually turn my machine off; setting OMP_NUM_THREADS didn’t seem to help either. I was really puzzled.
Then I met with my advisor and she suggested that instead of running a huge number of initial values, I should just use one initial value (0.1) for one of my variables. I did this and got the same hangup behavior from the program. I was totally irritated by this point, especially because my much-wiser advisor (she’s my sensei) looked at me with furrowed brow and said “It shouldn’t be taking that long.”
So I ran the program in gdb (the GNU Debugger). This had screwed me before, and it turned out gdb had a bug. I had spent all day chasing what was not a bug: when I ran the program it looked like the pointer I passed to a function was not the pointer evaluated when gdb entered the function. Then I realized that not only had gcc failed to produce good debugging code, but I had just given the program incorrect input values.
In this case I came upon something else weird: a check that should have stopped the iteration instead evaluated to false. I was really confused until I looked at the types of the variables:
_Bool keep_going_p;
keep_going_p = ... ; /* set the value */
if (keep_going_p == ERANGE)
  raise hell;
else
  return library_book;
See the problem?
ERANGE is an integer. Booleans are stored as integers (i.e. you can assign an int to a _Bool), but a _Bool only ever holds true or false: 0 is false, and any nonzero value collapses to 1 (true). So the moment I assigned ERANGE to keep_going_p, the value actually stored was just 1, and the comparison with ERANGE could never succeed. I changed keep_going_p to an int and the program now completes (with up to 6 loci, that’s 64 genotypes to iterate over up to a billion iterations on four processors) in less than five minutes.
C Respects the Programmer’s Intellect
So one explanation for all this freedom to make stupid mistakes is that you just have to be a really smart programmer to use a language like C. C is supposed to be good for creating fast programs that run at the system level, possibly in very resource-sparse environments (like a PDP-11). In other words, C’s design favors the creation of good, simple compilers. The typical example is that array accesses can run past their boundaries without provoking a compiler error, resulting only in (sometimes) hard-to-pin-down runtime errors. In other words, you won’t know that you gave the array the wrong number of elements until you run the program and it returns garbage. That can take a long time, considering that the part of the program containing the bug might not be exercised until years after the program is released to the public.
However, what that means is that you have to know how the compiler deals with things inside the computer; you have to think of the memory of the computer in the way that the compiler deals with it. This is not a restriction, but a liberation, because it forces you to think in terms of how the computer actually works. Not as tightly as assembly language, but still pretty darn close. I see this as forcing me to think in terms of how the computer really works, and therefore coming up with better algorithms. In other words: I like it.
On top of this, this attitude of C compiler-writers fits in with the rest of the Unix philosophy: computer programs should be written for users who see themselves fundamentally as programmers. They are essentially people who know as much about the machine as the programmer, and it is completely immoral for the programmer to program in a way that insults the user by presuming the user to be a stupid and unsophisticated person.
Inevitably when someone who’s an adherent of a different philosophy hears me talking about one of these stupid mistakes, they say I should be using Java, C# or some other garbage-collected or dynamically-typed language. However, that disregards the nature of the stupid mistake, and disregards a basic problem with programming in general: I could have made the same mistake in any language with a type system. If I had done that in Lisp, or Java or whatever, the compiler still would not have caught it. There’s a difference between dynamically-typed languages and strongly-typed languages. Almost every programming language is strongly-typed, and it has nothing to do with when those types are decided (compile-time versus run-time). And none of the stupid mistakes I have made have involved dynamic allocation, so garbage-collection wouldn’t matter either. People often say we should all be using interpreted languages like Matlab, which I find objectionable on both philosophical (programming) grounds and moral grounds.
There’s just no beating the right tool for the job, which in the case of complex simulations that need to be portable and freely distributable, is C.
Here’s an excellent Emacs minor mode that I discovered last week, as did our friend at Passing Thoughts:
via Passing Thoughts
I thought of entitling this post “Object-oriented programming is the pesto of programming paradigms,” to paraphrase the philosopher Costanza. Since I was in middle school, I’ve been hearing people say “object-oriented” with a strange air in their voices as though they were discussing Camus or Beckett; they also often have the annoying habit of throwing in “modern” when talking about their favorite languages.
When I really started programming a few years ago, it came up right away when I was looking for reading material. I studied quite a bit of Python, but since Python is a multi-paradigm language (in a way), I wasn’t forced to learn how to program in an object-oriented style. This past week I decided that since I’m doing real work in Python, I should give real object-oriented programming a try to see what its benefits are. Here I report my conclusions.
1 My Programming Background
A couple things you should understand about me and my programming projects are that (a) I’m a Unix guy and (b) my “programs” are typically along the lines of “input some values and spit out some different values.” I’m not building large programs with complex user interfaces; I have made somewhat complex interfaces in the form of command-line arguments and runtime configuration, but they are still batch-oriented, non-interactive programs. I get the impression that nowadays these are no longer typical. Although my impression is that most programming books only include batch-oriented programs as examples, most people I encounter on the web, and most programming reference guides, seem oriented towards either web programming (where the user interface is a browser) or GUI programs.
As for (a) I don’t like limits and labels, but nine-hundred and ninety-nine times out of a thousand, the solution I end up using follows Unix philosophy and GNU standards. I don’t think this is a concerted effort on my part, but pretty often when I think I need to use a language like Python, I end up writing a shell script or make target that gets the job done, often in less than twenty minutes.
2 What is Object-oriented Programming?
No one has given me a satisfactory answer. Alan Kay should know. I certainly don’t know. You will not get a good definition from someone who goes around saying “object” and “object-oriented” with his nose in the air. Nor will you get a good definition from someone who has done nothing but object-oriented programming. No one who understands both procedural and object-oriented programming has ever given me a consistent idea of what an object-oriented program actually does. If you’re reading this, please comment!
The usual definition goes something like this:
OO Zealot: objects have attributes and methods.
Me: Wait, I asked what object-oriented programming is, not what objects are; I know what objects are, every programming language has pieces of data that procedures operate on
OO Zealot: Yeah, but in an object-oriented language, you can define your own data types
Me: I define new data types in C all the time, they’re called structures
OO Zealot: Yeah, but in Java these data types have attributes that they carry with them, that are like procedures, but they’re called methods
Me: So a method is just a procedure that operates on a particular data type; I can write those in C too
OO Zealot: Yeah, but in C you can’t tie a function to a piece of data like you can with object-oriented programming
Me: Sure you can; you can define a structure with a pointer to a function, whose other fields are the arguments of that function. Then you can create a function that takes the structure as an argument and evaluates the function pointed to. In Scheme you can also create what’s called a closure, where you basically return a procedure that encapsulates the environment at the time the closure is created. You can use these closures to evaluate the same argument in many different encapsulated environments; you can make whole lists of closures fairly easily. In Scheme you can also travel backward in program execution by capturing the continuation of an evaluation, kinda like a throw-catch.
OO Zealot: Oh.
I don’t mean to alienate people who really know what object-oriented programming is, but that is based on real conversations I’ve had. The people who advocate object-oriented programming at me have seemed at times to be really narrow-minded and totally ignorant of other kinds of programming. They think object-oriented programming is the only way, and seem like they’d have trouble learning any other kind of programming. That does not mean I think that anybody who advocates object-oriented programming is a bad programmer, but the people who have advocated it to me have sounded less knowledgeable about programming than they thought they were. Is the reason lots of languages phrase their documentation in terms of object-oriented programming because there are a bunch of people out there who can’t do anything else?
Furthermore, none of the definitions of object-oriented programming given by these devotees, or that I have found go beyond defining what “object” means in the phrase “object-oriented.” Some of them also say that “objects pass messages to each other” or “the program is a set of interacting objects,” but that doesn’t really say what goes into the program text of an object-oriented program.
I had a little Eureka! moment when I realized that sure, you could define a program with “endowed objects” just by instantiating a certain set of such objects, and maybe calling one or two methods. I headed over to my favorite programming showcase, 99 Bottles of Beer, to get some examples of object-oriented programs in languages that are the biggest sellers of the object-oriented lifestyle. Sure enough, what I found was exactly what I refer to above; however, it looked an awful lot like a plain old imperative program. What I see in both the C++ and Python versions said to be “object-oriented” is that the programmer defines a data type, within that data type he defines some procedures (called “methods”), and then he calls those methods on an instance of that data type. Read that again: he defines a variable and calls procedures with that variable. That is no bloody different from imperative programming.
So on a cycling trip across Durham I closed the intellectual book on object-oriented programming again, thinking that I personally have no need for it. Maybe it comes in handy for really huge, really complex programs, but as I said, I try to avoid that at all costs, and I’m not designing a new web browser for anybody. Maybe it has some applications, like in video games, where these “objects” will interact in unpredictable ways, but that’s not what I’m doing either.
And maybe it makes somebody a lot of money, but I’m in it to learn, not to make money.
3 I gave it a try
I still had this challenge of using object-oriented programming languages, especially their libraries. Python is an excellent multi-paradigm language, but its libraries work within the object-oriented paradigm. This makes perfect sense because one of the advantages of object-oriented programming is that the datatypes from the library come with their procedures pre-defined. That’s cool. You still have to look up what those procedures are and what they do, and every library that comes with custom data types comes with functions that take advantage of them. Therefore this isn’t unique to object-oriented programming either: think about how useless a library would be if it didn’t work this way. I would like to build some more complex programs in Python, and use its web, XML-processing, and database libraries, so I really should learn how to use some object-oriented programming techniques.
What I had already was a functional program, in Python, which executed a population genetic simulation. For iterating equations, functional programming makes the most sense, and so far Python is the best functional (multiparadigm) language that works in a Unix-like environment. Basically I had a function that did the iteration, that took a function and its argument as arguments. Nice, clean and simple.
What I tried to do in Python was to define a class that did the iteration, and a derived class that defined a specific function to do the iteration. Here’s what I ended up with:
#============begin satire============
self.self.self.self.self.self.self.self.self.self.bungle ()
#=============end satire============
I’m not an experienced Python programmer, but I felt like this wasn’t really capturing the essence of the problem like OO advocates said it would. Well, a program can only be as good as the programmer. But here’s a more critical question: can you take a mediocre programmer, and give her a C++ or Java project and say “Here, use object-oriented programming” and have the program come out better?
4 Ask the experts
After that half-hearted (is that the right body part?) attempt at object-oriented programming, I thought I should see what some of my favorite hackers had to say.
Paul Graham said this:
Object-oriented programming is exciting if you have a statically-typed language without lexical closures or macros. To some degree, it offers a way around these limitations.
Immediately I thought “Oh yeah, that’s true, I’ve never had that problem.” In other words, I’ve done a lot of programming in Lisp, Scheme and using Python in a functional style. You can even program C in a somewhat functional style, as long as you keep track of the scope of allocated versus stack objects. In C you have to remember to think about things the way the computer does, whereas in Lisp you have a huge pile of abstractions already there. However, it’s pretty easy to build up abstractions in C without moving to C++ or Objective-C.
Graham writes more:
Object-oriented programming is popular in big companies, because it suits the way they write software. At big companies, software tends to be written by large (and frequently changing) teams of mediocre programmers.
So this answers my question from above. Giving a mediocre programmer a confining paradigm to work within will not produce better code, but it may produce a better-behaved programmer. In my case, it won’t do me any good whether I’m good or mediocre. Graham’s point is supported by evidence collected by Eric Raymond:
… inspection of open-source archives (in which choice of language reflects developers’ judgments rather than corporate mandates) reveals that C++ usage is still heavily concentrated in GUIs, multimedia toolkits and games (the major success areas for OO design), and little used elsewhere.
Richard Stallman had this to say:
Emacs Lisp is powerful enough. Adding OOP to Emacs is not clearly an improvement; I used OOP when working on the Lisp Machine window systems, and I disagree with the usual view that it is a superior way to program.
Emacs does, by the way, have an object-oriented programming framework called eieio.
As it stands, I cannot justify learning an object-oriented programming style. I may want to learn C++, since it seems to have a lot of interesting standard library datatypes that might be useful in my research. However, anyone I’ve asked whether I should learn C++ has said “Don’t bother,” for one reason or another.
I would really like to know what you all think: it’s been at least four years that I’ve been trying to understand the value of object-oriented programming when functional programming and imperative programming seem so simple and obviously useful to me. A critical piece of my own story is that I have made all my decisions about what to learn on my own: no one told me to learn C++ for a job. I decided to learn Scheme, C, Python, Perl and many other languages because I did research and found out they were right for one reason or another. The one programming language I learned for my job was Stata, which is a special purpose language (incidentally it does have a class system). On the other hand, most of the people advocating object-oriented programming at me have been either given the task of using an object-oriented language (e.g. Java), or learned it as the pinnacle of a programming class. I’ve never taken a programming class.