Software Engineering in Academia

Posted on June 8, 2017 by Gabe Parmer

Quite a bit of information exists on the internet for how you should perform Pull-Requests (PRs), and how to conduct a code review. This post discusses one vision about how these software engineering practices fit into a research group that is building a significant body of code. I’m not an expert in any of these areas, but I have seen what works, and what doesn’t work in support of sustained research.

Why Software Engineering Exists

It is rarely obvious why software engineering practices are necessary or beneficial to developers that haven’t yet worked in large teams. These practices provide “processes” that restrict the methods and means of software development in a team. Let me say clearly: they are necessary, and they are beneficial. When someone is maturing as a software developer, they often realize that simply writing code isn’t the hard part. The hard part is writing code that is understandable, debuggable, and maintainable in the longer term. The goal is often for this code to be part of our shared infrastructure. Importantly, it is important to realize that the code you write will not be used only by yourself. Thus, you’re writing code that needs to be understandable, debuggable, and maintainable by others. That is a pretty high bar.

A big part of software engineering can be seen as a set of conventions and processes overlayed onto our normal process of coding. They are designed to produce code, documentation, and development histories that are high quality, and can be productively used by every member of the development team. At least that’s the goal.

In academia, we have a set of constraints that work against code quality (races for paper deadlines, and lots of perceived throw-away evaluation code), documentation (it doesn’t feed into someone’s PhD), and development histories (no mandated version control usage patterns, and little training in how to use version control). So I’m going to discuss a process that we apply to our group’s development, and how it attempts to enable a long-term, sustainable infrastructure. We don’t do this perfectly, but this post is meant to give the rational and expectations behind this process.

Many development teams in industry have extensive processes for development aside from the actual code. These include a fairly strict regime including commit, PR, and code-review guidelines, continuous integration and unit testing, automated bug tracking and lifetime management, scrum and structured meetings to structure short and long term goals of the group and individuals, etc… Though some of these have been adopted by academia (for example, scrum for academia), we are not necessarily creating a production-stable API as research requires agility, and we are a small team with members that are often working on separate projects, so don’t necessarily need as much cross-team coordination. However, the notions of curating our code through structured PRs, code reviews, and unit tests/continuous integration are necessary for the code-base to have any longevity. This document discusses the first two of these.

Team Cohesion, Code Quality, and Code Reviews

For a software infrastructure to have any longevity beyond a single student, a group needs to be involved in the shared development of the system. Having implemented a feature that improves the infrastructure, it is important to get some buy-in and quality control from the group before the code is pushed to mainline. The main goal of producing infrastructural code is to generate maintainable and readable code. Thus, it is naturally quite useful to get others from the group involved in the process of generating your code. This is a way to get a sanity check that the code that you wrote to be readable, actually is readable.

The main goals of our code reviews are to:

Given this, how does one actually conduct a code review? I typically think of it as requiring two passes.

Code review first pass. When doing a code review, what should you focus on? What are you looking for, and what should you spend your effort on when doing a review? The first pass should focus on the following.

Code review second pass. It is often surprising for people to know that the core of a review actually revolves around the first pass. Any additional effort often focuses on the structure of the code. Should files be broken up? Should functions be broken up? Should an abstraction be refactored? This list is obviously much smaller than the first, but it requires that you go through all of the code before you can make a reasonable mental model about both the code’s structure, and what other potential structures might be. I’m going to avoid going into what might constitute a better structure here, and leave that for the CSG and for a future post.

What Doesn’t a Code Review Provide?

Put simply, reviews provide neither testing, nor thorough evaluation of logic. It is expected that every time someone does a PR, it is for well-tested code that follows the guidelines for code craftsmanship and the conventions in the CSG. Solely reading code will not yield a deep enough understanding of the code to fess out complex logic problems, or ensure thorough testing. In this way, a code review is a process that focuses on issues (style, error checking, test coverage, and complexity) that are a proxy for refined, well-tested code. In this way, they are complementary to proper testing.

How to Think About Commits and Pull Requests

Before a code review is performed, a PR must be generated. It is very important to understand the responsibilities of a researcher submitting a PR, and the goals of the PR. To understand many of the goals and conventions around PRs, you have to understand how to use version-control. See the Appendix for tips on using git.

PR intentions and responsibilities. Using git “correctly” is one thing, but submitting code to be included in the main repository is another. You should submit a PR on your work when you reach a point of tested completion. You should plan any high-level feature that you’re adding as a set of successive, more manage-ably sized PRs. It does take some thought to figure out how to break your task into smaller constituent tasks, but this is something you really have to be thinking about anyway. It isn’t feasible to do a reasonable code review on a large PR because it requires too much accumulated mental state. This is similar to why we can’t just write thousands of lines of code at once; we must consume smaller, sequential problems. So if we take the code reviews into account, it becomes clear that we aren’t just writing code; we’re writing code in a digestible manner.

Studies have been performed that found that defects were most successfully found if code reviews are performed on 400 lines of code or fewer. It is not a good idea to send a huge PR and expect a good result. The likely result will be “please break this up into separate PRs that can be reviewed separately”.

It is very important to understand that when a researcher submits a PR, they are asking other people to spend their time on the author’s behalf (i.e. to help improve the code). Given this, any perceived laziness on behalf of the author is often not received well. Not spending the time to go through a researchers own diffs before submission to ensure that obvious errors are purged is disrespectful of the time of the people doing the review. It is inherently saying that the reviewer’s time is valued less than the researcher’s. This is, quite obviously, not OK. Don’t do this.

Our Group’s Pull Request Policies

Given this perspective on code reviews and PRs, our research group has a number of policies:

Appendix: Git Tips

What tools should you always consider when making code modifications and additions, and when integrating with other’s changes? With git, the obvious commands that you must integrate into your development cycle include:

Updates