Why Scientists Should Learn Some Computer Science

A Prescription for Undergraduate Science Curricula

Rahul Simha


I've lost count of the number of times when, at a science seminar where the laptop or projector needs adjustment, science colleagues look towards me, expecting that I, the computer scientist, would know the right commands in Windows XP to adjust the resolution. I've likewise grown accustomed to requests to help resurrect a crashed disk or to explain what went wrong during a software installation. Memo to science faculty: that's not Computer Science, and your students ought to find out why.

Science departments already require their undergraduate students to take courses in other science disciplines. Some of these are obviously justified, such as an Organic Chemistry prerequisite to Biochemistry. Others are thought to be good for the soul: many faculty believe that no student, science or otherwise, ought to graduate without a calculus course regardless of whether they actually use calculus in their disciplinary courses. Thus, the current collection of required non-disciplinary courses reflects a range of needs and a range of opinion. Whatever one might think about the value of individual courses in this collection, one thing is clear: there is hardly any computer science in today's science requirements.

Let's first consider some of the more obvious, practical reasons for requiring computer science courses. Some exposure to computer science, and I don't mean an intellectually insipid "Word, Excel and the Internet" literacy course, helps students understand how software and hardware work, and creates a type of student who will be comfortable with the kinds of software tools increasingly essential to research in any science discipline. Moreover, these students will also be able to make better use of software tools, accommodate new interfaces, understand their limitations, and even customize open-source software to their needs. They become the computationally savvy scientists that every lab depends on today.

I want to focus on the deeper, intellectual reasons for teaching computer science to science students. The first of these is to arm them with a major new problem-solving approach: computational problem-solving. Let me illustrate with an example. A biology colleague once approached me with the following problem:

"I've devised a probe to test for the existence of certain genes. Each time the probe is applied, the output shows the occurrence of one gene from among a set of 20. Sometimes it's gene 1, sometimes it's gene 5 ... etc. The chances of getting gene 1 are quite high, about a 20% chance. The chances of seeing gene 2 are about 5% ... I know these percentages approximately for each of the 20 genes. How many probes do I have to apply in an experiment to be sure that I've seen every one of the 20 genes at least twice?"

If you're a mathematician reading this, you might, before you read on, want to spend a few minutes trying to cast this into a problem you are familiar with. Ready? It's called the coupon-collector problem, one of a large class of discrete statistical problems known as urn models. The homogeneous case (the simpler problem in which all genes occur with equal probability) can be solved analytically, although it's not pretty. Not much is known about the general case; indeed, the little that has been published [1] is not applicable to our problem above.

Now, what's important is not that it's hard to solve mathematically, but that it's hard to recognize, even for a mathematician. Even after identification, the homogeneous case is not easy to find in books; one has to know where to look. So it is unrealistic to think that a biologist or her grad student would plough through advanced texts in statistics to find the problem, or that even if given a paper such as [1], would be able to work through it and extract something useful.

What does this have to do with computer science? On a whim, I gave this problem to a few of my sophomore Computer Science (CS) students and asked them to write a program to solve the problem. Note that these students were into their third CS course, having taken the two intro CS courses (we call them CS1 and CS2 in nationwide discussions on undergrad CS) in their freshman year. Moreover, these students were, like all too many CS students we encounter, generally not disposed to taking math courses. Two of them produced a correct solution in about 45 minutes. (This occurred in a lab where some of the better students had completed their assignments early and were just sitting around.) For reference, here's the program for the homogeneous case, written in the Java programming language. The general case is not much harder; I estimate it would need another 15 minutes.
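The students' program is not reproduced in this text, so what follows is only a minimal sketch of how such a simulation might look, not their actual code. It repeats the experiment many times on the computer and reports both the average number of probes and the number that sufficed in a chosen fraction of the simulated experiments. The class and method names, the trial count, the 95% level, and the non-uniform probabilities in main (20% and 5% for the first two genes, the remainder shared equally among the other 18) are all assumptions made for illustration.

```java
import java.util.Arrays;
import java.util.Random;

class CouponCollector {

    // Draws one gene index according to probs (assumed positive, summing to 1).
    static int sample(double[] probs, Random rng) {
        double u = rng.nextDouble();
        double cumulative = 0.0;
        for (int i = 0; i < probs.length; i++) {
            cumulative += probs[i];
            if (u < cumulative) {
                return i;
            }
        }
        return probs.length - 1; // guard against floating-point rounding
    }

    // One simulated experiment: apply probes until every gene has been
    // seen at least minCount times; returns the number of probes used.
    static int probesNeeded(double[] probs, int minCount, Random rng) {
        int[] seen = new int[probs.length];
        int genesDone = 0;
        int probes = 0;
        while (genesDone < probs.length) {
            int gene = sample(probs, rng);
            probes++;
            seen[gene]++;
            if (seen[gene] == minCount) {
                genesDone++;
            }
        }
        return probes;
    }

    // Average number of probes over many simulated experiments.
    static double averageProbes(double[] probs, int minCount, int trials, long seed) {
        Random rng = new Random(seed);
        long total = 0;
        for (int t = 0; t < trials; t++) {
            total += probesNeeded(probs, minCount, rng);
        }
        return (double) total / trials;
    }

    // Number of probes that was enough in the given fraction of simulated
    // experiments (an empirical quantile, e.g. confidence = 0.95).
    static int probesForConfidence(double[] probs, int minCount,
                                   double confidence, int trials, long seed) {
        Random rng = new Random(seed);
        int[] results = new int[trials];
        for (int t = 0; t < trials; t++) {
            results[t] = probesNeeded(probs, minCount, rng);
        }
        Arrays.sort(results);
        int index = (int) Math.ceil(confidence * trials) - 1;
        index = Math.max(0, Math.min(trials - 1, index));
        return results[index];
    }

    public static void main(String[] args) {
        // Homogeneous case: all 20 genes equally likely.
        double[] uniform = new double[20];
        Arrays.fill(uniform, 1.0 / 20);
        System.out.println("Uniform, average probes: "
                + averageProbes(uniform, 2, 100_000, 42L));

        // General case with hypothetical probabilities (illustration only).
        double[] general = new double[20];
        Arrays.fill(general, 0.75 / 18);
        general[0] = 0.20;
        general[1] = 0.05;
        System.out.println("General, probes for 95% of experiments: "
                + probesForConfidence(general, 2, 0.95, 100_000, 42L));
    }
}
```

Note that the general case differs from the homogeneous one only in the array of probabilities handed to the same methods.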

The example quite nicely illustrates the power of computational problem-solving: many problems that are hard mathematically are actually quite easy to solve numerically on a computer. In many cases, the programs tend to be small, easy to modify, and do not require a degree in CS to create. There is a large class of problems amenable to computational problem-solving in this manner, including those solved by simulation, Monte-Carlo type estimation, numerical integration and optimization, for example. A few years ago, one of my students simulated the spread of an epidemic and studied the effect of vaccinations on the rate of spread. This is another example of a problem that can be solved analytically with heavy-duty math provided simplifying assumptions are made, but that was computationally solved without those assumptions by a student who'd taken just three CS courses. Indeed, many such problems share the common characteristic of a "macro" property that is the result of thousands of tiny non-linear interactions; they are difficult, if not impossible, to analyse mathematically while at the same time amenable to computational solution.
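To give a flavor of the epidemic example, here is a minimal sketch of one way such a simulation could be written. It is not the student's program, and every modeling choice in it is an assumption made for illustration: a well-mixed population, a fixed number of random contacts per infected person per day, a fixed per-contact transmission probability, a fixed infectious period, and vaccination that confers complete immunity before the outbreak starts.

```java
import java.util.Random;

class EpidemicSim {
    static final int SUSCEPTIBLE = 0;
    static final int INFECTED = 1;
    static final int IMMUNE = 2; // recovered or vaccinated

    // Runs one outbreak to completion and returns how many people were
    // ever infected, including the initial cases.
    static int totalEverInfected(int population, int initialInfected,
                                 double vaccinatedFraction, int contactsPerDay,
                                 double transmissionProb, int infectiousDays,
                                 long seed) {
        Random rng = new Random(seed);
        int[] state = new int[population];
        int[] daysLeft = new int[population];

        // The first initialInfected people start out infected.
        for (int i = 0; i < initialInfected; i++) {
            state[i] = INFECTED;
            daysLeft[i] = infectiousDays;
        }
        // Everyone else is vaccinated independently with the given probability.
        for (int i = initialInfected; i < population; i++) {
            if (rng.nextDouble() < vaccinatedFraction) {
                state[i] = IMMUNE;
            }
        }

        int everInfected = initialInfected;
        int currentlyInfected = initialInfected;
        while (currentlyInfected > 0) {
            // Each infected person meets contactsPerDay random people today.
            boolean[] newlyInfected = new boolean[population];
            for (int i = 0; i < population; i++) {
                if (state[i] != INFECTED) {
                    continue;
                }
                for (int c = 0; c < contactsPerDay; c++) {
                    int other = rng.nextInt(population);
                    if (state[other] == SUSCEPTIBLE
                            && rng.nextDouble() < transmissionProb) {
                        newlyInfected[other] = true;
                    }
                }
            }
            // People infected before today move one day closer to recovery.
            for (int i = 0; i < population; i++) {
                if (state[i] == INFECTED && --daysLeft[i] <= 0) {
                    state[i] = IMMUNE;
                    currentlyInfected--;
                }
            }
            // Today's new infections become infectious from tomorrow.
            for (int i = 0; i < population; i++) {
                if (newlyInfected[i]) {
                    state[i] = INFECTED;
                    daysLeft[i] = infectiousDays;
                    everInfected++;
                    currentlyInfected++;
                }
            }
        }
        return everInfected;
    }

    public static void main(String[] args) {
        // Compare outbreak size across vaccination levels (hypothetical values).
        for (double v : new double[] {0.0, 0.3, 0.6, 0.9}) {
            int total = totalEverInfected(2000, 10, v, 4, 0.15, 5, 7L);
            System.out.println("vaccinated " + v + " -> ever infected " + total);
        }
    }
}
```

Dropping a simplifying assumption, such as giving each person a different number of contacts, means changing a few lines of the loop rather than re-deriving a model.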

Beyond computational problem-solving, a second intellectual reason to teach computer science to science students is to have students exposed to algorithmic thinking [2]. While taking a few CS courses may open the doors to computational problem-solving, exposure to a course in algorithms teaches students how computer scientists think about problems: how they formulate problems, how they've solved classic problems in their field, and how they've abstracted principles out of computational problem-solving into problem-solving paradigms.

I have deliberately focused on undergraduate education here because of the third major reason to teach CS to science students: students learning computer science undergo a learning experience quite different from traditional science. CS students are accustomed to active learning, in which a short burst of lecture is accompanied by a hands-on programming exercise that reinforces a concept, and is followed by a lengthy programming assignment that forces students to work through all kinds of challenging detail. Science curricula can use this teaching style to their advantage. A physics course, instead of relying on a few toy pen-and-paper problems, can have students write programs to solve realistic problems using actual data. Similarly, a biology course, instead of struggling with predator-prey differential equations that are themselves limited in their power, can have students simulate all kinds of populations with more credible interactions and parameters. Learning can become visual, realistic and exciting. Indeed, computational problem-solving can complement the learning of difficult concepts from mathematics and statistics.

So how much CS is enough? For the sake of discussion, let me identify three levels of CS skill: (1) Freshman: the now-standard CS1 and CS2 intro courses; (2) Minor: CS1, CS2 and 3-4 additional courses, including one in Algorithms and one in Scientific Computing; and (3) Dual-major: a combination of a science major with one in CS. Probably, what makes sense is for students in Biology, Chemistry, Geology and Experimental Psychology to take the two Freshman-level courses and then see if they wish to pursue more quantitative academic career tracks. The more quantitative disciplines such as Physics, Mathematics and Statistics ought to consider prodding their students to take courses up to the level of a minor. Such a student should have no trouble solving the above coupon-collector problem computationally. Science departments should also consider facilitating dual-majors with computer science, to create the kinds of future interdisciplinary scientists described in the recent high-profile reports by the NAS or NIH [3].

Some science faculty will gasp at the suggestion that their curricula be modified to incorporate more than two CS courses. This reaction is exacerbated by perceptions that their own curricula are too tightly constrained as it is, with hardly any room for their own special topics courses. One reaction I've heard goes along the lines of: "Can't all the relevant CS be compressed into one or two courses?" Unfortunately, programming and computational problem-solving together form a slowly acquired mental skill, comparable to mastering a musical instrument or chess, and one that is taught to students without the benefit that math departments have of relying on years of gradual K-12 inculcation. It is also the reason that the umpteen books with titles like "Teach Yourself Java in 7 Days" consistently fail to produce this skill [4]. CS departments could, and sometimes do, teach a "Here's what CS is all about" overview course about the discipline, but such a course leaves students without useful skills, in the same way that a "History of Music" course can't produce even garage-band competence. Using this analogy, I must caution that, realistically, the average student coming out of the CS1/CS2 sequence can manage nursery rhymes and jingles, and even understand some rudimentary theory, but is far from being able to compose a symphony, the level of skill required for full-fledged software development.

Aside from prescribing CS courses, science departments should, instead of settling for the ancient numerical methods course that's probably already on the books, collaborate with their CS colleagues in designing a useful Scientific Computing course that follows the first few CS courses. In addition to some ODE/PDE methods, this course should cover number representation and simulation. It should focus on programming inside sophisticated tools such as Matlab and SAS, and on developing some skill with scripting languages like Python that will help scientists perform simple operations on data. At the same time, science departments should be cautious about going overboard and requiring all their students to take specialized courses in databases, networks or architecture.

Because Computer Science, about 50 years old, is a new kid on the science block, it's understandable that CS courses haven't found their way into science curricula as much as math or stats courses have. But now that CS curricula have standardized, now that computational techniques are essential in science, and now that computational science has established itself as a third way of doing science, there's no excuse for any further delay. So meet with your CS colleagues and re-fashion your curriculum to include more CS, at the very least to include computationally intensive tracks through your major. And don't ask them to fix your projector problems.


References

  1. L. Holst. Extreme Value Distributions for Random Coupon Collector and Birthday Problems. Extremes, 4, 129-145, 2001.
  2. F. Olsen. Computer Scientist Says All Students Should Learn to Think 'Algorithmically'. Chronicle of Higher Education, March 22, 2000.
  3. See these two key national reports. (1) BIO2010: Transforming Undergraduate Education for Future Research Biologists. http://books.nap.edu/catalog/10497.html (2) NIH Roadmap: http://nihroadmap.nih.gov/. See also: W. Bialek and D. Botstein. Introductory Science and Mathematics Education for 21st-Century Biologists, Science, 303:5659, pp. 788-790, 2004.
  4. P. Norvig. Teach Yourself Programming in Ten Years.

Acknowledgements: Thanks to Rhys Price-Jones (GW) and Joel Adams (Calvin College) for their valuable suggestions.