Rahul Simha
Professor of Computer Science


Department homepage

What makes a complete computer scientist?
Advice for CS students

What skills and experiences make a strong foundation in CS? How do you know what's needed, what courses to choose? I'm going to address what can be acquired in the educational system, as opposed to the experience one gains developing large software projects in industry (which we can't teach in academia).

I believe you aren't a strong graduating computer scientist (BS/MS/PhD) unless:

  • You are comfortable programming in a high-level language. You don't need to be familiar with particular APIs, but you ought to be confident writing tens of thousands of lines of robust code.

  • You have a strong understanding of how hardware works. That is, you've written assembly code at least once, you understand how a CPU works, the basic instruction fetch-execute cycle, what code might look like in binary. You also have a sense of how it works at the low-level (gates, combinational and sequential logic, a basic state-machine, microprogrammed control) and at the high-level (virtual memory, caching, TLBs, disks, DMA, bus protocols).

  • You understand how compilers and language runtimes work. You don't need to have written a compiler, but it certainly helps. You should have written a parser for something, preferably using a parser generator, understanding both lexical analysis (and its relationship to regular expressions), and parsing (and its relationship to grammars). And you should have seen how code is generated, and optimized. And you need to understand the runtime support, such as garbage collection, needed by modern languages.

  • You have a strong understanding of memory as its used in higher-level languages. For example, you should understand exactly what's global, on the stack, and on the heap at any time during the execution of:
         int foo (SomeObj s, int n)
            if (n == 0) return 1;
            int x = s.getX ();
            return  x * foo(s,n-1);
         public static void main (String[] argv)
            SomeObj obj = new SomeObj (10);
            print (foo (obj), 5);
    It might seem odd to single out memory from a general understanding of languages and architecture, but it is fundamental to efficient programming and successful debugging, at least in today's popular high-level languages.

  • You can think algorithmically. You are familiar with order-notation, and you are able to quickly assess the running time of most snippets of code. You know the core data structures well, know how they are built, and how to use them in problems. You've been exposed to the classic algorithms in a standard course, the theory of combinatorial problems, and know how to write pseudocode with just the right amount of detail so that someone else can code up your algorithmic idea.

  • You have a reasonable understanding of our corner of math. The CS corner of the mathematical world, beyond the basic discrete math needed as background, is the stretch of material ranging from finite automata to Turing machines. That is, you should be able to: (1) describe a finite (deterministic or non-D) automaton, its connection to regular expressions and regular grammars/languages, and its use in hardware; (2) do the same for pushdown automata and context-free grammars; (3) describe the language hierarchy; (4) explain how a Turing machine works, and what it has to do with the halting problem, and what it means for a problem to be undecidable. No, you do not need to be math wiz and be able to prove theorems on your own, but you ought to be able to decipher notation, and explain the key ideas. I haven't listed the math topics needed in some CS specializations, such as number theory for crypto, or parts of the calc-diffeq sequence for robotics. These can be acquired based on interest.

  • You've had good courses in at least one (and ideally two) of these three CS-related areas of math (in order of importance):
    • Probability. Here I don't mean the insipid intro stats course taught across the university, but a real probability course where you will work with random variables, distributions, joint distributions, the CLT, and ideally, also see Markov chains. Core probability shows up in so many places, for example: machine learning, stats, algorithms, computer systems.
    • Linear algebra. The not-ideal course is the standard math department course that uses this material to introduce students to proofs. This makes the linear algebra course abstract; many students who get an A in such a course say they still can't tell me what the course was about. Far better is a course that shows how linear algebra is applied, either through computational or engineering examples.
    • Logic. Logic courses often come in one of three flavors: (1) the classic logic course that traces a path from propositional logic to undecidability; (2) the very computer science-y course that combines logic, semantics and programming languages, often with supporting theorem proving tools; (3) an algorithmic course that includes fast SAT solvers, verification, LTL/CTL, theorem proving search strategies, and the like. It would be nice to have a single course combining elements of all three.

  • You know how computer systems work and have done some systems programming. For example, you could programming an embedded device, or develop a distributed application on top of sockets, or write a piece of an OS.

  • You understand and are comfortable programming in the dominant paradigm of the day. Today's dominant paradigm is the web front-end, dbase backend. This requires understanding in detail the whole sequence from a browser click, to retrieving tables, computing joins, and returning results aggregated into a page. And being able to build such a simple web application using a modern framework.

  • You appreciate how difficult it is to build robust complex software. This means you've completed at least one big project on your own or a team project in which you've had a major role, and in which you've been through several stages of design (high-level, UI, specs, details, testing) and gotten complex packages to work together. But it also means you've developed a healthy respect for the work ethic needed to develop good software: good design principles, an attention to detail, pride in craftsmanship, plenty of defensive code, strong use of powerful dev tools, and the humility to know that your code does have bugs.

  • You understand our profession and your professional obligations. This means you know not just your little corner of web development, but what other areas of CS are like, are familiar with other career paths through CS, and the interactions between CS and other disciplines. This means being familiar with our rich history, our professional societies and standards, the ethics of our discipline, understanding that software development is a team effort, and that mere coding mastery is only one piece of the large puzzle of software creation, and only one piece of being a complete computer scientist.