Note 12: Software Testing

Adapted with permission from CSCI 2113 (Dr. Kinga Dobolyi)

Testing - Introduction

Debugging

Let’s review what it means to debug your code, which is really a process of fault localization:

  • You observe a program output that doesn’t match the expected output for some input.
  • You need to fix one (or more) lines of code to get your program to work.
  • Typically, this involves tracing back from the line of code producing the incorrect output to where the problem actually was.

Some items to keep in mind for this class, and your careers going forward:

  • Except for trivially small programs (which these are not), it is virtually impossible for you (or the TA/professor) to look at the code and simply analyze what’s going wrong.
  • Instead, you will need to use debug print statements (or a debugger) to print out values from memory and/or see what path your inputs took through your code to arrive at the problem.
  • There is nothing magical about using print statements to debug; this is typically what your professors are also doing for you.

We’ve had a good amount of practice finding the faults in our code (using print statements, debuggers, and/or tracing). Now let’s explore designing our own test cases.

Test Cases

In homework assignments for this class, and for other classes, you have test cases provided for you. Your homework (and coding exams) are graded using test cases. Now, it is our turn to write test cases. Perhaps it will be wise to approach the subject cautiously.

It is possible to write your tests before you write code; this is called Test Driven Development (TDD). Writing tests before writing code has several benefits:

  • Your tests are not biased by your code
  • Writing tests helps you think about how you might write code
  • Writing tests helps you think about the requirements of your system
  • In some sense, we’ve been doing TDD this semester, since you have been writing code to pass existing test cases.
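Here is a tiny sketch of the TDD order. It uses a hypothetical requirement ("a person can vote at 18 or older") and hypothetical names `is_voting_age` and `test_is_voting_age`. The test is written first and states the requirement. Running it before step 2 exists would fail with a NameError, which is the expected first failing step. Only then do we write just enough code to make it pass.

```python
# Step 1 (written first): the test states the requirement before any code exists.
def test_is_voting_age():
    assert is_voting_age(18) is True
    assert is_voting_age(17) is False


# Step 2 (written second): just enough code to make the test pass.
def is_voting_age(age: int) -> bool:
    return age >= 18


test_is_voting_age()  # raises AssertionError if the requirement is not met
print("test passed")
```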

At the lowest level (closest to your code), developers typically write unit tests (preferably before they write code). We usually write unit tests for each method or function. Then, we might write tests for checking how classes or modules work together (module or integration testing), and eventually we might do very high level system testing.

Verification and Validation

There are also different flavors of testing: verification and validation. Verification checks that your implementation meets the requirements as written, while validation checks whether the requirements themselves are what should have been written. For example, imagine someone writes requirements that explain how a self-driving car should navigate the roads around Foggy Bottom. Verification might mean making sure that the car stops at all stop signs. Validation might mean making sure that the car, which has already been tested to drive on the left-hand side of the road, is actually supposed to drive on the left-hand side (as in the UK). In other words, you could have software that meets all its stated requirements but is still wrong, because the requirements aren’t valid.

The tests we will cover will be verification tests.

Test Coverage Criteria

The goal of software testing is to increase confidence that your code works. There is no way to “prove” that your code is perfect, or correct, outside of some formal methods you might learn in your upper-level courses. This is because it is virtually impossible to test your software on every possible combination of inputs. Instead, developers identify common, representative, and/or interesting inputs or paths through their code, and test those. Below are some example test coverage criteria commonly used in software testing.

Input Domain Partitioning

With Input Domain Partitioning, developers divide the possible input space into a number of non-overlapping classes and then draw test inputs from each class (for functions with several parameters, from each combination of classes). This is most easily explained with an example. Imagine you have the following simple function that is meant to return child, adult, or senior depending on the age of a person:

age_group.py
def get_age_group(age: int): 
  if age >= 18 and age < 65:
    return "adult"
  if age < 18:
    return "child"
  return "senior"

In terms of the possible input space (which is an effectively unbounded range of integers), the following categories of human age might be reasonable to test:

  • Children, between 1 and 17 years old
  • Adults who are between 18 and 64 years old
  • Seniors who are 65 or older

But this is not yet a partition of all integers; we also need to consider:

  • The age 0: is this a baby who was recently born, or an invalid input?
  • Negative numbers, which are invalid inputs
  • Do we want, for error checking purposes, to do something if we get an age over 120? And what does that mean for how our code will be useful over time?

Finally, while this code is type-hinted for the argument age to be an integer, Python does not check type hints at runtime by default. Unless we use a separate type checker, the function can still be called with a non-integer argument.
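To see that the `int` hint is not enforced, we can call the original function with non-integer arguments. A float runs straight through the comparisons and gets an answer. A string fails, but only because Python 3 refuses to compare a `str` with an `int` inside the function, not because of the hint.

```python
def get_age_group(age: int):
    # Same logic as age_group.py above.
    if age >= 18 and age < 65:
        return "adult"
    if age < 18:
        return "child"
    return "senior"


# The int hint is not enforced: a float is accepted and classified.
print(get_age_group(35.5))  # adult

# A string fails only because comparing str with int raises TypeError.
try:
    get_age_group("thirty five")
except TypeError as err:
    print("TypeError:", err)
```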


Let’s say we decide to count 0-year-olds as babies (children), and to return the string "invalid age" for negative ages and for anyone older than 120. Then the bullets above collapse into a partition of the input space, in which every possible integer maps to exactly one class:

  • Invalid inputs: age less than 0 or greater than 120
  • Children: ages 0 through 17, inclusive
  • Adults: ages 18 through 64, inclusive
  • Seniors: ages 65 through 120, inclusive
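A quick way to sanity-check that these four classes really form a partition (no gaps, no overlaps) is to write them down as data and confirm that every integer in a window lands in exactly one class. This is a minimal sketch. The `CLASSES` table and the `classes_containing` helper are hypothetical names, and `math.inf` stands in for "no bound" on the two pieces of the invalid class.

```python
import math

# Inclusive (low, high) bounds for each class in the list above.
# The invalid class has two pieces: below 0 and above 120.
CLASSES = {
    "invalid": [(-math.inf, -1), (121, math.inf)],
    "child": [(0, 17)],
    "adult": [(18, 64)],
    "senior": [(65, 120)],
}


def classes_containing(age: int) -> list:
    """Return the name of every class whose ranges include age."""
    return [
        name
        for name, ranges in CLASSES.items()
        if any(low <= age <= high for low, high in ranges)
    ]


# In a partition, every value belongs to exactly one class.
# Spot-check a window of integers that spans all the boundaries.
for age in range(-50, 200):
    assert len(classes_containing(age)) == 1, age
print("each age from -50 to 199 is in exactly one class")
```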

Measured in integers, some of these partitions are effectively infinite in size, while others contain only a few dozen values. It’s not feasible to test every single possible input here, primarily because we’d have to specify the expected output for each one. Instead, we will use the following strategy for selecting useful and interesting test cases from these partitions:

  • Choose one element from the middle of each partition
  • Choose edge cases for each partition, where it may “touch” another partition(s)

For example, while it’s not foolproof (especially since we may be writing tests without seeing the code), it’s likely safe to select some middle inputs as follows:

  • Invalid inputs: -2, 122
  • Children: 9
  • Adults: 45
  • Seniors: 80

We could certainly pick more arbitrary examples, but based on how we expect our code to shape up, it’s not likely to bring us much more value.

In addition, we should pick edge cases for each partition, where it touches the other partitions:

  • Invalid inputs: -1, 121
  • Children: 0, 17
  • Adults: 18, 64
  • Seniors: 65, 120

These are arguably the most important test cases in this example, as they may catch common “off by one” errors when using the if statements above.
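The edge-case selection above is mechanical enough to automate. For each valid class, take its two edges plus the values just outside them. Those outside values are exactly the edges of the neighboring classes, including the invalid one. This is a sketch with hypothetical helper names `boundary_values` and `middle_values`.

```python
# Inclusive bounds of the three valid classes; the invalid class is
# everything outside 0..120, so its edges are the values just outside these.
VALID_CLASSES = {"child": (0, 17), "adult": (18, 64), "senior": (65, 120)}


def boundary_values(classes: dict) -> list:
    """Each class's two edges plus the values just outside them."""
    values = set()
    for low, high in classes.values():
        values.update({low - 1, low, high, high + 1})
    return sorted(values)


def middle_values(classes: dict) -> dict:
    """One value from the middle of each class."""
    return {name: (low + high) // 2 for name, (low, high) in classes.items()}


print(boundary_values(VALID_CLASSES))  # [-1, 0, 17, 18, 64, 65, 120, 121]
print(middle_values(VALID_CLASSES))    # {'child': 8, 'adult': 41, 'senior': 92}
```

The computed edges match the hand-picked list exactly. The middle values differ from 9, 45, and 80, but any interior value serves equally well.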

With these specified, we now need to write the tests. We’re going to use the pytest testing framework (other testing frameworks include doctest and unittest; we’ll see more about doctest later).

Using pytest

Our tests want to assert that a given function input yields a specific function output. In Python, we do this with assert statements:

def test_get_age_group_0():
  assert get_age_group(23) == "adult"

Note that the assertion is inside of a function and the name of the function starts with test – this is important: it is how pytest finds the tests.

If we add the test to the end of age_group.py, we can now run:

pytest -q age_group.py

and all tests in the file will be run:

1 passed in 0.00s  

Let’s add another test:

def test_get_age_group_1():
  assert get_age_group(-2) == "invalid age"

Now when we run our tests, we get a test failure:

    def test_get_age_group_1():
>     assert get_age_group(-2) == "invalid age"
E     AssertionError: assert 'child' == 'invalid age'
E
E       - invalid age
E       + child  

(If the output looks familiar to you, that’s because you’ve seen it before: the autograder for CSCI 6201 uses pytest!)

You will add more tests in the practice problems.
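Rather than writing one test function per input, pytest's `@pytest.mark.parametrize` decorator runs a single test body once per (input, expected output) pair and reports each pair as its own test. The sketch below covers the middle and edge inputs chosen for the three valid classes. Because the original `get_age_group` already handles those correctly, all nine cases should pass. The invalid-input cases are left for Practice Problem 12.1. The test name `test_get_age_group_valid` is hypothetical.

```python
import pytest


def get_age_group(age: int):
    # Original version from age_group.py.
    if age >= 18 and age < 65:
        return "adult"
    if age < 18:
        return "child"
    return "senior"


# Middle and edge values for the three valid classes.
@pytest.mark.parametrize(
    "age, expected",
    [
        (0, "child"), (9, "child"), (17, "child"),
        (18, "adult"), (45, "adult"), (64, "adult"),
        (65, "senior"), (80, "senior"), (120, "senior"),
    ],
)
def test_get_age_group_valid(age, expected):
    assert get_age_group(age) == expected
```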

Working With Types and Raising Exceptions

Because Python is dynamically typed (type hints are not enforced when the code runs), it’s useful to write tests that handle “unexpected” input types.

A basic tool for this is the type function:

if type(x) != int:
  # x is not an int; handle that case here
  print("expected an int")
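Another common check is `isinstance(x, int)`. The two checks differ in one case worth knowing: `bool` is a subclass of `int`, so `True` fails the `type` check but passes `isinstance`. The hypothetical helper `check_types` below reports, for a few sample values, whether each check would reject them. If booleans should be rejected as ages, the stricter `type()` comparison does that.

```python
def check_types(values):
    """For each value: (value, rejected by type() check, rejected by isinstance())."""
    return [(v, type(v) != int, not isinstance(v, int)) for v in values]


for row in check_types([35, 35.0, "35", True]):
    print(row)
# (35, False, False)
# (35.0, True, True)
# ('35', True, True)
# (True, True, False)
```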

So far, we’ve experienced exceptions when we have made errors. We can also raise our own exceptions deliberately, to indicate that something has gone wrong:

if type(x) != int:
  raise TypeError("this function expects an int")
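A raised exception travels back to whoever called the function, and the caller can catch it with `try`/`except` (in a test, `pytest.raises` plays that role). Here is a minimal sketch that wraps the check above in a hypothetical helper, `require_int`.

```python
def require_int(x):
    """Hypothetical helper: return x unchanged, or raise TypeError if it is not an int."""
    if type(x) != int:
        raise TypeError("this function expects an int")
    return x


print(require_int(35))  # 35

try:
    require_int("thirty five")
except TypeError as err:
    # The message passed to TypeError(...) is available as str(err).
    print("caught:", err)  # caught: this function expects an int
```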

Using pytest, we can test to see if the errors we expect are raised:

import pytest

def test_non_int():
  with pytest.raises(TypeError):
    get_age_group("thirty five")

You will write test cases that check for exceptions (and change your function to raise exceptions) in the practice problems.

Practice

Practice Problem 12.1

Complete the tests we’ve specified for age_group.py, and then fix the get_age_group function so that it passes the tests.

Practice Problem 12.2

First, write additional tests to handle non-integer inputs.

After you have written the tests, modify the get_age_group function to pass the tests.

Practice Problem 12.3

Write test cases to check that your function raises a TypeError if the input is not an integer, and a ValueError if the input is negative or greater than 120.

After you have written the tests, modify the get_age_group function to pass the tests.