Posted on June 8, 2017 by Gabe Parmer
Quite a bit of information exists on the internet about how you should perform Pull-Requests (PRs), and how to conduct a code review. This post discusses one vision of how these software engineering practices fit into a research group that is building a significant body of code. I’m not an expert in any of these areas, but I have seen what works, and what doesn’t, in support of sustained research.
It is rarely obvious why software engineering practices are necessary or beneficial to developers who haven’t yet worked in large teams. These practices provide “processes” that restrict the methods and means of software development in a team. Let me say clearly: they are necessary, and they are beneficial. When someone is maturing as a software developer, they often realize that simply writing code isn’t the hard part. The hard part is writing code that is understandable, debuggable, and maintainable in the longer term. The goal is often for this code to be part of our shared infrastructure. Importantly, the code you write will not be used only by yourself. Thus, you’re writing code that needs to be understandable, debuggable, and maintainable by others. That is a pretty high bar.
A big part of software engineering can be seen as a set of conventions and processes overlaid onto our normal process of coding. They are designed to produce code, documentation, and development histories that are high quality, and that can be productively used by every member of the development team. At least that’s the goal.
In academia, we have a set of constraints that work against code quality (races for paper deadlines, and lots of perceived throw-away evaluation code), documentation (it doesn’t feed into someone’s PhD), and development histories (no mandated version-control usage patterns, and little training in how to use version control). So I’m going to discuss a process that we apply to our group’s development, and how it attempts to enable a long-term, sustainable infrastructure. We don’t do this perfectly, but this post is meant to give the rationale and expectations behind this process.
Many development teams in industry have extensive processes for development aside from the actual code. These include a fairly strict regime of commit, PR, and code-review guidelines; continuous integration and unit testing; automated bug tracking and lifetime management; scrum and structured meetings to set the short- and long-term goals of the group and of individuals; etc. Though some of these have been adopted by academia (for example, scrum for academia), we are not necessarily creating a production-stable API as research requires agility, and we are a small team whose members often work on separate projects, so we don’t necessarily need as much cross-team coordination. However, the notions of curating our code through structured PRs, code reviews, and unit tests/continuous integration are necessary for the code-base to have any longevity. This document discusses the first two of these.
For a software infrastructure to have any longevity beyond a single student, a group needs to be involved in the shared development of the system. When you have implemented a feature that improves the infrastructure, it is important to get some buy-in and quality control from the group before the code is pushed to mainline. The main goal of producing infrastructural code is to generate maintainable and readable code. Thus, it is naturally quite useful to get others from the group involved in the process of generating your code. This is a way to get a sanity check that the code you wrote to be readable actually is readable.
The main goals of our code reviews are to check that code written to be read by others actually is readable, to maintain the conventions and craftsmanship of the shared code-base, and to spread understanding of the code beyond its author.
Given this, how does one actually conduct a code review? I typically think of it as requiring two passes.
Code review first pass. When doing a code review, what should you focus on? What are you looking for, and what should you spend your effort on when doing a review? The first pass should focus on the surface of the code: adherence to the style and conventions of the CSG, error checking, test coverage, and the assumptions the code makes. Are assertions added to check those assumptions? Try to understand these, and ask the author questions about them.
Code review second pass. It is often surprising for people to learn that the core of a review actually revolves around the first pass. Any additional effort often focuses on the structure of the code. Should files be broken up? Should functions be broken up? Should an abstraction be refactored? This list is obviously much smaller than the first, but it requires that you go through all of the code before you can build a reasonable mental model of both the code’s structure and what other potential structures might be. I’m going to avoid going into what might constitute a better structure here, and leave that for the CSG and for a future post.
Put simply, reviews provide neither testing, nor thorough evaluation of logic. It is expected that every time someone does a PR, it is for well-tested code that follows the guidelines for code craftsmanship and the conventions in the CSG. Solely reading code will not yield a deep enough understanding of it to ferret out complex logic problems, or to ensure thorough testing. Instead, a code review is a process that focuses on issues (style, error checking, test coverage, and complexity) that are a proxy for refined, well-tested code. In this way, reviews are complementary to proper testing.
Before a code review is performed, a PR must be generated. It is very important to understand the responsibilities of a researcher submitting a PR, and the goals of the PR. To understand many of the goals and conventions around PRs, you have to understand how to use version control. See the Appendix for tips on using git.
PR intentions and responsibilities. Using git “correctly” is one thing, but submitting code to be included in the main repository is another. You should submit a PR on your work when you reach a point of tested completion. You should plan any high-level feature that you’re adding as a set of successive, more manageably sized PRs. It does take some thought to figure out how to break your task into smaller constituent tasks, but this is something you really have to be thinking about anyway. It isn’t feasible to do a reasonable code review on a large PR because it requires too much accumulated mental state. This is similar to why we can’t just write thousands of lines of code at once; we must break the work into smaller, sequential problems. So if we take code reviews into account, it becomes clear that we aren’t just writing code; we’re writing code in a digestible manner.
Studies have found that defects are caught most successfully when code reviews are performed on 400 lines of code or fewer. It is not a good idea to send a huge PR and expect a good result. The likely result will be “please break this up into separate PRs that can be reviewed separately”.
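As a rough illustration of keeping PRs reviewable (the branch and remote names here are hypothetical), you can estimate the size of a prospective PR before submitting it:

```sh
git fetch origin
# Summarize what the PR would change relative to mainline:
git diff --stat origin/master...my-feature       # per-file change summary
git diff --shortstat origin/master...my-feature  # one-line total; aim well under ~400 changed lines
```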
It is very important to understand that when a researcher submits a PR, they are asking other people to spend their time on the author’s behalf (i.e. to help improve the code). Given this, any perceived laziness on the part of the author is often not received well. Not spending the time to go through their own diffs before submission to ensure that obvious errors are purged is disrespectful of the time of the people doing the review. It is inherently saying that the reviewer’s time is valued less than the author’s. This is, quite obviously, not OK. Don’t do this.
Given this perspective on code reviews and PRs, our research group has a number of policies:
- @-mention your fellow researchers. A good general rule is to @-mention at least two colleagues.
- If you are @-mentioned, and no one else has done a review, strongly consider doing it. If someone is working in your area (an area where you’re familiar with the code, or want to learn about it), then ask to be @-mentioned.

What tools should you always consider when making code modifications and additions, and when integrating with others’ changes? With git, the obvious commands that you must integrate into your development cycle include:
- git diff, git diff --stat, and git status, which all let you understand, at different granularities, what changes have been made. Before you commit code, you should likely execute all of these commands. You do this for two reasons. First, you want to make sure that only the code modifications that you want in the commit are in it. Each commit should be relatively focused, so it is important to catch this. Second, it is a good way to remind yourself what the commit contains so that you can better summarize it in the commit message. It is also useful to know that you can execute diff on ranges of commits to see what changed (see git log to see the list of commits). A sketch of this pre-commit routine follows this list.
- git commit -a, obviously. Do not, do not, use the -m flag. You should write a “full form” message, not a single line. The focus of your message should be on 1. summarizing the commit in a single line (which is displayed along with the commit in GitHub), and 2. an explanation of the details of the commit. The latter should explain the intention, and often includes an itemized list of the main aspects of the commit. #s should be used to reference any issues it addresses. A hypothetical example message follows this list.
- git branch <b> and git co <b> to use and manage your branches. Get used to using branches. Branches are useful when you’re trying to debug a problem, or implement a new feature. To see how they’re useful, it is important to realize that it is not uncommon to get distracted, and have to redirect your attention to another feature or bug. Without branches (and the ability to roll back to before your work on the feature), you end up with a commit history of half-finished features intertwined with bug fixes and other features. This makes your PRs schizophrenic and difficult to review. So by default, you should always be developing on a non-mainline branch. When you go fix a bug, or work on a new feature, ask yourself if it deserves its own branch (see the branching sketch after this list).
- git stash is an acknowledgment that we mess up with our branches. If you find that you’re somewhat accidentally developing a new feature, or fixing a new bug, but have a bunch of unstaged changes on the current branch, you can stash them, create a new branch, and stash pop the changes into the new branch (sketched after this list). I’m sure there are other ways to do this, but this example at the very least demonstrates how stash can be used to set aside a current set of changes for later.
- git rebase enables you to update a branch to an updated master, to “squash” multiple commits, and to clean up your commit history. When the master progresses beyond the point where your branch split from it, you generally want to fast-forward merge your changes onto the new master. git rebase master is your friend here. It is generally superior to git merge as it maintains a cleaner (linear) history. git rebase -i allows you to rewrite history by combining different commits. This is very useful as it enables you to have a set of commits that tell a story, and don’t have a number of “in progress” commits (see the rebase sketch after this list). My suggestion is that you label commits that you’ll likely want to get rid of in the future with TODO as a header on the title line. More information about this can be found here and here.
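To make the list above concrete, here is a rough sketch of the pre-commit inspection routine; the commit range and counts are only illustrative.

```sh
git status             # which files are modified, staged, or untracked?
git diff --stat        # per-file summary of the pending changes
git diff               # the full diff: is everything here really meant for this commit?
git log --oneline -5   # recent commits, handy for picking a range
git diff HEAD~2..HEAD  # what changed across the last two commits?
```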
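Next, a hypothetical full-form commit message of the suggested shape; the bullet contents and issue number NN are placeholders, not a prescribed format.

```sh
# git commit -a opens your editor; the message might be shaped like:
#
#   Summarize the change in a single line
#
#   - First main aspect of the change, and the intention behind it
#   - Second main aspect, e.g. error checks and assertions that were added
#
#   Addresses #NN.
git commit -a
```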
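A minimal branching sketch; the branch name is made up, and git co assumes you have aliased co to checkout (otherwise use git checkout).

```sh
git checkout -b fix-timer-race  # create and switch to a branch for this one task
# ... edit, test, commit on this branch ...
git branch                      # list branches; '*' marks the one you are on
git checkout master             # switch back to mainline when the work is PR-ready
```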
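A sketch of using stash to move half-finished work onto its own branch (branch name again hypothetical):

```sh
# You notice you've started a new feature on top of unrelated, unstaged changes:
git stash                    # set the uncommitted changes aside
git checkout -b new-feature  # create the branch this work should live on
git stash pop                # re-apply the stashed changes there and continue
```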
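And a rebase sketch for keeping a feature branch current and squashing “in progress” commits before a PR (assuming your local master tracks the mainline, and a hypothetical branch name):

```sh
git checkout my-feature
git rebase master     # replay your commits on top of the updated master (linear history)
git rebase -i master  # interactively squash or drop the commits labeled TODO
```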