CS 4364/6364

Introduction to Machine Learning, Spring 2022

GWU Computer Science


Project 3: Machine Learning models for real-world use (due 04/21 11:59pm)

In this project, we'll take the CNN project we've worked on this semester and complete three analyses: generating test cases, examining biases in the modeling, and exploring model uncertainty and explainability.

Let's get started!






Part 1: Generating test cases

In this section, we're going to design some datasets to convince ourselves that our model is giving us reasonable results. First, we'll generate some toy datasets from the Stanford dogs dataset where we know what the correct outcome should be. Complete the items below:

Please copy each line of the grading rubric (including its number) into a markdown cell in your Jupyter notebook next to the cell that completes it, so we don't miss anything during grading :-)

GRADING RUBRIC for Part 1:

1. Generate a dataset of just three images, one for each class, and show that your model correctly labels them (display each image in your notebook, pass it to your model, and print the prediction). (3 points)
2. Generate three datasets from your inputs, each containing only two of the classes. What do you predict the performance should be for three binary classifiers trained on these three datasets? Re-train your model on each dataset and discuss your results. (5 points)
3. Generate a dataset from your original dataset in which 20% of the images in one class are mis-labelled as the remaining two classes. How do you think your model's performance will be impacted? Re-train your model on this dataset and discuss your results. (3 points)
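As a starting point for item 3, here is a minimal sketch of injecting label noise, assuming your labels are stored as a NumPy integer array with classes numbered 0–2 (the function name `corrupt_labels` and the array layout are illustrative assumptions, not part of the assignment):

```python
import numpy as np

def corrupt_labels(labels, target_class, frac=0.20, seed=0):
    """Relabel `frac` of the images in `target_class` as one of the
    remaining classes, chosen uniformly at random."""
    rng = np.random.default_rng(seed)
    labels = labels.copy()
    idx = np.flatnonzero(labels == target_class)
    n_flip = int(round(frac * len(idx)))
    flip = rng.choice(idx, size=n_flip, replace=False)
    other = [c for c in np.unique(labels) if c != target_class]
    labels[flip] = rng.choice(other, size=n_flip)
    return labels

# toy check: 100 images each of classes 0, 1, 2; corrupt class 0
y = np.concatenate([np.zeros(100, int), np.ones(100, int), np.full(100, 2)])
y_noisy = corrupt_labels(y, target_class=0)
print((y_noisy[:100] != 0).sum())  # → 20
```

Keeping the RNG seeded makes the corrupted dataset reproducible across re-training runs, so your discussion can reference a fixed split.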


Part 2: Biases in the modeling

Complete the following items.

GRADING RUBRIC for Part 2:

4. Take a look at the images in each class individually. What aspects of the images (such as backgrounds) might be influencing the model's decision-making, besides the dogs themselves? (3 points)
5. Calculate the "average image" across all pixels for each class in your training dataset. Are your results consistent with the previous item? (5 points)
6. Is the data biased in any way that could impact your results? Why or why not? (3 points)
7. If you noted potential biases in the modeling/dataset above, discuss how you could help mitigate them (you don't need to implement anything, just discuss). If you didn't note any biases in this dataset, discuss what biases there could have been, and how the dataset designers might have helped mitigate them. (3 points)
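One way to sketch the "average image" in item 5, assuming your images for a class are stacked in a NumPy array of shape (N, H, W, C) (adapt this to however your project actually loads images):

```python
import numpy as np

def average_image(images):
    """Pixel-wise mean over a stack of images with shape (N, H, W, C),
    returned as uint8 for easy display with plt.imshow."""
    stack = np.asarray(images, dtype=np.float64)
    return stack.mean(axis=0).astype(np.uint8)

# toy check: 10 identical mid-gray 8x8 RGB images
imgs = np.full((10, 8, 8, 3), 128, dtype=np.uint8)
avg = average_image(imgs)
print(avg.shape, avg[0, 0, 0])  # → (8, 8, 3) 128
```

Averaging in float64 before casting back to uint8 avoids the overflow you would get by summing uint8 pixels directly.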


Part 3: Model uncertainty and explainability

We discussed earlier this semester how deep learning models can be black boxes; it's hard to tell what the models really learned, that is, what are they using to make their decisions?

Overall, we got incredibly good performance earlier with just a couple hundred images per class. Let's investigate how much pre-training with ImageNet helped us achieve those results. First, re-train your model, but this time don't use a pre-trained version (you can set the pre-training flag to False). What happened? Include a discussion in your notebook.

Next, let's see what happens in two cases:

1. a dataset containing just the heads of the dogs (with the backgrounds cropped away), and
2. a dataset containing just the backgrounds, without the dogs.

Discuss two hypotheses for the two datasets above -- how do you think each should perform, and why? If you have time, you can manually build the two datasets and re-train your model, but you're not required to.

Pixel importance with saliency maps

Finally, let's see how much each pixel contributes to the final decision of the model using saliency mapping. Although saliency mapping has its limitations and is not the be-all-end-all of this type of analysis, it will at least get us started this semester towards a discussion of model explainability.

Write a loop that goes through every image in your dataset and prints out the original image and a saliency map for each image. Use any resources you find online (other than asking someone to do the work for you!) to generate such a saliency map; you can probably do this in about 20 lines of code without needing to import any more libraries, but feel free to get fancy! If you get stuck, ask us for help on Ed. Make sure you cite any resources you used!
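For reference, a minimal gradient-based saliency map takes the gradient of the predicted class's score with respect to the input pixels. The sketch below uses PyTorch and a tiny stand-in CNN purely for illustration (the helper name `saliency_map` and the toy model are assumptions; plug in your own trained model and real images):

```python
import torch

def saliency_map(model, image):
    """Absolute gradient of the top class score w.r.t. the input pixels;
    the max over channels gives one heat value per pixel."""
    model.eval()
    x = image.unsqueeze(0).clone().requires_grad_(True)  # (1, C, H, W)
    score = model(x).max()        # score of the predicted class
    score.backward()
    return x.grad.abs().squeeze(0).max(dim=0).values     # (H, W)

# toy check with a tiny random CNN and a random "image"
model = torch.nn.Sequential(
    torch.nn.Conv2d(3, 4, 3, padding=1),
    torch.nn.ReLU(),
    torch.nn.AdaptiveAvgPool2d(1),
    torch.nn.Flatten(),
    torch.nn.Linear(4, 3),
)
sal = saliency_map(model, torch.randn(3, 32, 32))
print(sal.shape)  # torch.Size([32, 32])
```

The resulting (H, W) tensor can be shown next to the original image with `plt.imshow(sal, cmap="hot")` inside your loop.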

Finally, include a discussion of the output of your saliency analysis for this dataset -- what do you think this means in terms of your model performance?

GRADING RUBRIC for Part 3:
8. Discussion of how the model should behave if trained just on the heads of dogs. (2 points)
9. Discussion of how the model should behave if trained just on backgrounds without dogs. (2 points)
10. Discussion of model performance without pre-trained weights vs. with pre-trained weights. (2 points)
11. Correctly implementing saliency maps for all images. (5 points)
12. Discussion of saliency mapping results. (2 points)


All done! Where to go next?

Great work on this project! If you're interested in learning more about how researchers are testing ML models, or how saliency maps are useful (or not) to users, check out:
1. Evaluating Saliency Map Explanations for Convolutional Neural Networks: A User Study
2. Testing and validating machine learning classifiers by metamorphic testing

Extra credit:

Discuss paper 1 in one paragraph. (5 points)
Discuss paper 2 in one paragraph. (5 points)