Note 12: Software Testing

Adapted with permission from CSCI 2113 (Dr. Kinga Dobolyi)

Testing - Introduction

Debugging

Let’s review what it means to debug your code, which is really performing fault localization:

You observe a program output that doesn’t match the expected output for some input.
You need to fix one (or more) lines of code to get your program to work.
Typically, this involves tracing back from the line of code producing the incorrect output to where the problem actually was.

Some items to keep in mind for this class, and your careers going forward:

Except for trivially small programs (which these are not), it is virtually impossible for you (or the TA/professor) to look at the code and simply analyze what’s going wrong.
Instead, you will need to use debug print statements (or a debugger) to print out values from memory and/or see what path your inputs took through your code to arrive at the problem.
There is nothing magical about using print statements to debug; this is typically what your professors are also doing for you.

We’ve had a good amount of practice finding the faults in our code (using print statements, debuggers, and/or tracing). Now let’s explore designing our own test cases.

Test Cases

In homework assignments for this class, and for other classes, you have test cases provided for you. Your homework (and coding exams) are graded using test cases. Now, it is our turn to write test cases. Perhaps it will be wise to approach the subject cautiously.

It is possible to write your tests before you write code; this is called Test Driven Development (TDD). Writing tests before writing code has several benefits:

Your tests are not biased by your code
Writing tests helps you think about how you might write code
Writing tests helps you think about the requirements of your system
In some sense, we’ve been doing TDD this semester, since you were writing code to pass against existing test cases.

At the lowest level (closest to your code), developers typically write unit tests (preferably before they write code). We usually write unit tests for each method or function. Then, we might write tests for checking how classes or modules work together (module or integration testing), and eventually we might do very high level system testing.

Verification and Validation

There are also different flavors of testing: verification and validation. The former checks that your implementation meets the requirements that were written, while the latter measures if the requirements themselves where what should have been written. For example, imagine someone writes requirements that explain how a self-driving car should navigate the roads around Foggy Bottom. Verification might mean making sure that the car stops at all stop signs. Validation might mean, making sure that the car, which has already been tested to drive on the left hand side of the road, is actually supposed to drive on the left hand side (like in the UK). In other words, you could have software that meets all its stated requirements, but is still wrong, because the requirements aren’t valid.

The tests we will cover will be verification tests.

Test Coverage Criteria

The goal of software testing is to increase confidence that your code works. There is no way to “prove” that your code is perfect, or correct, outside of some formal methods you might learn in your upper level courses. This is because it is virtually impossible to test your software on every possible combination of inputs. Instead, developers try to approximate various common, representative, and/or interesting inputs or paths through their code, and these those. Below are some example test coverage criteria commonly used in software testing.

Input Domain Partitioning

Using Input Domain Partitioning, developers partition the possible input space into some number of classes, and then draw samples from those classes in all possible combinations with respect to classes. This is most easily explained with an example. Imagine you have the following simple method that is meant to return child, adult, or senior depending on the age of a person:

age_group.py

def get_age_group(age: int): 
  if age >= 18 and age < 65:
    return "adult"
  if age < 18:
    return "child"
  return "senior"

In terms of the possible input space (which is an almost infinite range of intergers) the following categories of human age might be reasonable to test:

Children, between 1 and 17 years old
Adults who are between 18 and 64 years old
Seniors who are 65 or older

But this is not yet a partition of all integers; we also need to consider:

The age 0; is this a baby who was recently, born, or an invalid input?
Negative numbers, which are invalid inputs
Do we want, for error checking purposes, to do something if we get an age over 120? And what does that mean for how our code will be useful over time?

Finally, while this code is typehinted for the argument age to be an integer, if we don’t use formal type checking, the function could be called with a non-integer argument. Python does not, by default, check type.

Let’s say we decide to allow 0-year-olds to count as babies, and should return string `‘invalid input’`` for anyone older than 120. Then, all six of the previous bullets become our partition of the input space; every possible integer is mapped to a specific value:

Invalid inputs: age less than 0 or greater than 120
Children: ages 0 through 17, inclusive
Adults: ages 18 through 64, inclusive
Seniors: ages 65 through 120, inclusive

In terms of integers, some of these partitions are effectively infinite in size, some have a few dozen possibilities, and some are rather small. It’s not feasible to test every single possible input here, primarily because we’d have to specify the expected outputs. Instead, we will follow the following algorithm for selecting useful and interesting test cases from these partitions:

Choose one element from the middle of each partition
Choose edge cases for each partition, where it may “touch” another partition(s)

For example, while it’s not foolproof (especially since we may be writing tests without seeing the code), it’s likely safe to select some middle inputs as follows:

Invalid inputs: -2, 122
Children: 9
Adults: 45
Seniors: 80

We could certainly pick more arbitrary examples, but based on how we expect our code to shape up, it’s not likely to bring us much more value.

Instead, we should pick edge cases for each partition that touch the other partitions:

Invalid inputs: -1, 121
Children: 0, 17
Adults: 18, 64
Seniors: 65, 120

These are arguably the most important test cases in this example, as they may catch common “off by one” errors when using the if statements above.

With these specified, we now need to write the tests. We’re going to use the pytest testing framework (other testing frameworks include doctest and unittest.).

Using `pytest`

Our tests want to assert that a given function input yields a specific function output. In Python, we do this with assert statements:

def test_get_age_group_0():
  assert get_age_group(23) == "adult"

Note that the assertion is inside of a function and the name of the function starts with test – this is important: it is how pytest finds the tests.

If we add the test into the Python script, we can now run:

pytest -q age_group.py

and all tests in the file will be run:

1 passed in 0.00s

Let’s add another test:

def test_get_age_group_1():
  assert get_age_group(-2) == "invalid age"

Now when we run our tests, we get a test failure:

    def test_age_group_1():
>     assert get_age_group(-1) == "invalid age"
E     AssertionError: assert 'child' == 'invalid age'
E
E       - invalid age
E       + child

(If the output looks familar to you, that’s because you’ve seen it before: the autograder for CSCI 6201 uses pytest!)

Add more tests in the practice problems

Working With Types and Raising Exceptions

Because Python is not strongly typed, it’s useful to write tests that handle “unexpected” input types.

A basic tool for this is the type function:

if type(x) != int:
  # do something

So far, we’ve experienced exceptions when we have made errors. We can also raise our own exceptions deliberately, to indicate that something has gone wrong:

if type(x) != int:
  raise TypeError("this function expects an int")

Using pytest, we can test to see if the errors we expect are raised:

import pytest

def test_non_int():
  with pytest.raises(TypeError):
    get_age_group("thirty five")

Write test cases that check for exceptions (and change your function to raise exceptions) in the practice problems

Practice

Practice Problem 12.1

Complete the tests we’ve specified for age_group.py, and then fix the get_age_group function so that it passes the tests.

Practice Problem 12.2

First, write additional tests to handle non-integer inputs.

After you have written the tests, modify the get_age_group function to pass the tests.

Practice Problem 12.3

Write test cases to except your function will raise a TypeError if the input is not an integer, and raise a ValueError if the input is negative or greater than 120.

After you have written the tests, modify the get_age_group function to pass the tests.

Homework

Homework problems should always be your individual work. Please review the collaboration policy and ask the course staff if you have questions. Remember: Put comments at the start of each problem to tell us how you worked on it.
Double check your file names and return values. These need to be exact matches for you to get credit.
For this homework, don’t use any built-in functions that find maximum, find minimum, or sort. Don’t use any built-ins or libraries that implement trees or heaps.
These problems are intended to be completed in sequence. Each builds on the previous.

Homework Problem 12.1

Homework Problem 12.1 (20 pts)

Recall fondly the underliner problems from Note 10.

Write test cases for the multi_underliner function: the function should match all examples a substring (second arg) in a string (first arg). Assume the multi_underliner function will be imported from multi_underliner.py.

Each of your tests should have a single assert statement that tests for a single case.

Submit as test_underliner.py

Homework Problem 12.2

Homework Problem 12.2 (50 pts)

Consider the triangle.

A Python function triangulator is specified to solve for the angles of a triangle when given the sides of a triangle.

triangulator.py

def triangulator(sides: tuple[float, float, float]) -> tuple[float, float, float]:
  # logic
  return angles

Sides are expressed in arbitrary units; angles are expressed in degrees, and will be rounded to the closest tenth of a degree.

If the input argument does not represent a valid triangle, a Value Error should be raised.

Write unit tests to validate the function’s behavior. Assume the function will be imported from triangulator.py.

Each of your tests should have a single assert statement that tests for a single case.

Submit as test_triangulator.py

Homework Problem 12.3

Homework Problem 12.3 (30 pts)

Write exhaustive test cases to validate the functionality of your JSON parser from Note 6.

Assume the parse_json function will be imported from parse_json.py.

Each of your tests should have a single assert statement that tests for a single case.

Submit as test_parse_json.py.

Testing - Introduction

Debugging

Test Cases

Test Coverage Criteria

Input Domain Partitioning

Using pytest

Working With Types and Raising Exceptions

Practice

Practice Problem 12.1

Practice Problem 12.2

Practice Problem 12.3

Homework

Homework Problem 12.1

Homework Problem 12.2

Homework Problem 12.3

Using `pytest`