Introduction to Machine Learning, Fall 2021
GWU Computer Science
In this option, you will work on a single semester-long project on a topic and dataset of your choice. This option is meant for students in CS6364 and those with previous machine learning experience. The goal is to produce work that could form the foundation of an academic article or similar. Students selecting this option must submit the proposal below by 01/19 to be approved. Selected projects need to be complex. This assignment must be completed in teams of two or three; project complexity must scale with the number of team members specified.
Your project must make use of at least one deep learning model.
Note: please be mindful of where you host and source your data. We will not allow students to work on projects that contain objectionable and/or illegal material prohibited by the school's computing policy (which includes, but is not limited to, pornographic material). You and your team are solely responsible for ensuring that the data you are working with abides by GW's computing policy as well as the law.
Please fill out the sections below giving details of your chosen project. Then, review the grading rubric here and explain how your project will meet each existing requirement, or suggest an equivalent, alternative-in-spirit item for professor approval.
Please copy each line of the grading rubric (including its number) into a markdown element in your Jupyter notebook that matches the cell that completes it, so we don't miss anything during grading :-)
1. | Load data correctly and show contents in a cell | 2 points |
2. | Holdout dataset split as specified | 2 points |
3. | Correct explanation of how such a holdout split supports generalization. | 5 points |
4. | Printout of dataset distribution, including missing data. For imagery datasets, provide the "average image" for each class. For tabular data, use value_counts() and describe(). For textual data, show the distribution of your labels/targets. | 2 points |
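For tabular data, item 4 could look something like the sketch below. The DataFrame here is a toy stand-in, not your dataset; substitute your own loading step.

```python
import pandas as pd

# Hypothetical toy dataset; replace with your own load step
df = pd.DataFrame({
    "label": ["cat", "dog", "cat", "cat", None],
    "weight": [4.1, 9.8, 3.7, None, 5.0],
})

counts = df["label"].value_counts(dropna=False)  # class distribution, incl. missing labels
summary = df["weight"].describe()                # summary statistics for a numeric feature
missing = df.isna().sum()                        # missing values per column

print(counts, summary, missing, sep="\n")
```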
5. | Discussion of how the dataset distribution can/will affect your modeling. | 5 points |
6. | Handle any missing data. For imagery/text datasets, discuss what records/items you might drop and why. | 2 points |
7. | The holdout dataset also contains missing data/bad images/text. Discuss how you handled this in your holdout, or why it was not a problem for you. | 5 points |
8. | Discuss (and implement if applicable) whether or not you need to scale/normalize your features, and which ones, if any, for tabular data or imagery. For textual data, display the outputs of the word embeddings and discuss why they look the way they do. | 5 points |
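For item 8 on tabular data, one standard approach is z-score standardization fitted on the training split only, then applied unchanged to validation and holdout (to avoid leakage). A sketch with toy features:

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Toy training features on very different scales
X_train = np.array([[1.0, 200.0], [2.0, 400.0], [3.0, 600.0]])

scaler = StandardScaler().fit(X_train)   # fit on training data only
X_scaled = scaler.transform(X_train)     # each column now has mean 0, std 1

# Later: scaler.transform(X_holdout) reuses the training statistics
```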
9. | If your dataset has categorical features: discuss and implement if you will encode them as ordinal numbers, or one-hot encode them, and why you chose to do so for each such feature. If you are using images/text, discuss whether you are performing classification or regression on your dataset and why (instead of the other one). | 5 points |
10. | Give an example of an ordinal feature that you've seen used by others when it should have been treated as categorical. | 2 points |
11. | For tabular data: Use a heatmap to show the correlation between all feature pairs. Discuss, if any, which features you would recommend dropping from your model. Also discuss why you would want to drop them (what is the expected benefit?). For imagery/text: Show a histogram of the distribution of pixels or word embeddings across your dataset. | 5 points |
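A minimal correlation-heatmap sketch for item 11, using synthetic features (one deliberately redundant) rather than your dataset; seaborn's heatmap would work equally well in place of imshow:

```python
import numpy as np
import pandas as pd
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; not needed inside a notebook
import matplotlib.pyplot as plt

rng = np.random.default_rng(0)
x = rng.normal(size=200)
df = pd.DataFrame({
    "x": x,
    "x_copy": x * 2 + 0.1,              # perfectly correlated with x - a drop candidate
    "noise": rng.normal(size=200),      # unrelated feature
})

corr = df.corr()  # pairwise Pearson correlations

fig, ax = plt.subplots()
im = ax.imshow(corr.values, vmin=-1, vmax=1, cmap="coolwarm")
ax.set_xticks(range(len(corr)), corr.columns)
ax.set_yticks(range(len(corr)), corr.columns)
fig.colorbar(im)
fig.savefig("corr_heatmap.png")
```

Highly correlated pairs (like x and x_copy above) carry redundant information; dropping one can simplify the model and reduce overfitting risk.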
12. | Discuss what feature you would engineer (and implement) if using tabular data, what customized dataset augmentation you would use (not required to implement) if images, or what non-standard pre-processing might help, if text | 5 points |
13. | Separate your training data into features and labels. | 2 points |
14. | Discuss and implement how you will handle any dataset imbalance. | 5 points |
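One common way to address item 14 is class weighting, which up-weights the loss contribution of rare classes. A sketch with toy imbalanced labels:

```python
import numpy as np
from sklearn.utils.class_weight import compute_class_weight

y = np.array([0] * 90 + [1] * 10)  # toy 90/10 imbalance

# "balanced" weight = n_samples / (n_classes * count_per_class)
weights = compute_class_weight(class_weight="balanced",
                               classes=np.array([0, 1]), y=y)
```

These weights can be passed to many scikit-learn estimators via `class_weight`, or to PyTorch's `CrossEntropyLoss(weight=...)`. Oversampling the minority class is a reasonable alternative.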
15. | Instantiate a model of your choosing. | 2 points |
16. | Define a grid to tune at least three different hyperparameters with at least two different values each. Discuss why you think these parameter values might be useful for this dataset. | 5 points |
17. | Set up a GridSearchCV with 5-fold cross validation (scikit-learn) or equivalent in PyTorch. Discuss what accuracy metric you chose and why. | 5 points |
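Items 16-18 together could look like the sketch below; the dataset, model, and parameter values are placeholders for your own choices:

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV

X, y = load_iris(return_X_y=True)  # stand-in for your dataset

# Grid: three hyperparameters, two values each, as the rubric asks
param_grid = {
    "n_estimators": [50, 100],
    "max_depth": [3, None],
    "min_samples_split": [2, 4],
}

search = GridSearchCV(
    RandomForestClassifier(random_state=0),
    param_grid,
    cv=5,                # 5-fold cross validation
    scoring="f1_macro",  # macro-F1 weights all classes equally; justify your own choice
)
search.fit(X, y)
print(search.best_params_, search.best_score_)
```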
18. | Train your model using grid search (or equivalent), and report the best performing hyperparameters. | 2 points |
19. | Calculate accuracy, precision and recall on the holdout dataset. Discuss which metric you think is most meaningful for this dataset, and why. | 5 points |
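The metrics in item 19 are one-liners with scikit-learn; the toy labels below stand in for your holdout labels and predictions:

```python
from sklearn.metrics import accuracy_score, precision_score, recall_score

y_true = [0, 0, 1, 1, 1]  # hypothetical holdout labels
y_pred = [0, 1, 1, 1, 0]  # hypothetical model predictions

acc = accuracy_score(y_true, y_pred)    # fraction of correct predictions
prec = precision_score(y_true, y_pred)  # of predicted positives, how many are right
rec = recall_score(y_true, y_pred)      # of actual positives, how many were found
```

For multi-class problems, pass `average="macro"` (or another averaging mode) to precision and recall.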
20. | Discuss how the model performance on holdout compares to the model performance during training. Do you think your model will generalize well? Why or why not? | 5 points |
21. | Generate a confusion matrix and discuss your results. | 5 points |
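A minimal confusion-matrix sketch for item 21, again with toy labels (scikit-learn's `ConfusionMatrixDisplay` can render it graphically):

```python
from sklearn.metrics import confusion_matrix

y_true = [0, 0, 1, 1, 1]
y_pred = [0, 1, 1, 1, 0]

# Rows are true classes, columns are predicted classes
cm = confusion_matrix(y_true, y_pred)
print(cm)
```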
22. | Train and tune another type of model on your training dataset. Using the best performing hyperparameters, test this model on your holdout. How did it perform, compared to your earlier model? Do you think your results will generalize? | 5 points |
23. | Next, repeat training and tuning on the same data with a third model, dissimilar from the other two. Do you need to do any additional feature cleaning or scaling here? Why or why not? | 5 points |
24. | For images, define a list of image transformations to be used during training, passing them to transforms.Compose(). For text and tabular data, discuss what pre-processing you used. Discuss why you think these transformations might help. | 5 points |
25. | Repeat the step above for test and validation transformations. | 2 points |
26. | Correctly set up DataLoaders for the three folders (train, validation, holdout). Discuss what options you chose for these loaders, and why (including batch size, shuffling, and dropping last). | 5 points |
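A DataLoader sketch for item 26; the tensor dataset below is a toy stand-in for your three image folders (for which you would typically use `torchvision.datasets.ImageFolder`):

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

# Toy stand-in datasets; in practice each split gets its own dataset object
ds = TensorDataset(torch.randn(100, 3), torch.randint(0, 2, (100,)))

# Shuffle training batches; drop the last partial batch for stable batch statistics
train_loader = DataLoader(ds, batch_size=16, shuffle=True, drop_last=True)

# Evaluation loaders: no shuffling needed, keep every sample
val_loader = DataLoader(ds, batch_size=16, shuffle=False)
holdout_loader = DataLoader(ds, batch_size=16, shuffle=False)

xb, yb = next(iter(val_loader))  # first validation batch
```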
27. | Instantiate any pre-trained model. Discuss why you chose it amongst the others. | 5 points |
28. | Write code to freeze/unfreeze the pretrained model layers. | 2 points |
29. | Replace the head of the model with sequential layer(s) to predict however many classes you need. | 2 points |
30. | What activation function did you use in the step above? Why? | 5 points |
31. | Did you use dropout in the step above? Why or why not? | 5 points |
32. | Did you use batch normalization in the step above? Why or why not? | 5 points |
33. | Choose and instantiate an optimizer. Discuss your choice. | 5 points |
34. | Choose and instantiate a loss function. Discuss your choice. | 5 points |
35. | Write code that places the model on the GPU if one exists, otherwise on the CPU. | 2 points |
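Item 35 is the standard device-selection idiom; the small linear layer here is just a stand-in for your model:

```python
import torch

# Use the GPU when available, otherwise fall back to the CPU
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model = torch.nn.Linear(4, 2).to(device)  # stand-in for your model
```

Remember to move each batch to the same device (`xb.to(device)`) inside your training loop.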
36. | Correctly set up your model to train over 20 epochs. | 2 points |
37. | Correctly set up your model to use your batches for training. | 2 points |
38. | Correctly make predictions with your model (the predictions can be wrong). | 2 points |
39. | Correctly choose a loss function and back-propagate its results. | 2 points |
40. | Use the optimizer correctly to update weights/gradients. | 2 points |
41. | Correctly record training losses for each epoch. | 2 points |
42. | Correctly set up validation at each epoch. | 2 points |
43. | Correctly record validation losses for each epoch. | 2 points |
44. | Correctly record training and validation accuracies for each epoch. | 2 points |
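Items 36-44 fit into one training loop. The sketch below uses a toy linear model and random data so it runs end to end, and trains for 5 epochs instead of the 20 the rubric asks for; swap in your model, loaders, and loss:

```python
import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset

torch.manual_seed(0)  # toy setup so the loop is runnable
train_loader = DataLoader(
    TensorDataset(torch.randn(64, 4), torch.randint(0, 2, (64,))), batch_size=16)
val_loader = DataLoader(
    TensorDataset(torch.randn(32, 4), torch.randint(0, 2, (32,))), batch_size=16)
model = nn.Linear(4, 2)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-2)
criterion = nn.CrossEntropyLoss()

history = {"train_loss": [], "val_loss": [], "train_acc": [], "val_acc": []}
for epoch in range(5):  # use 20 epochs in your project
    model.train()
    total, correct, loss_sum = 0, 0, 0.0
    for xb, yb in train_loader:           # iterate over batches
        optimizer.zero_grad()
        logits = model(xb)                # make predictions
        loss = criterion(logits, yb)
        loss.backward()                   # back-propagate the loss
        optimizer.step()                  # update weights
        loss_sum += loss.item() * len(xb)
        correct += (logits.argmax(1) == yb).sum().item()
        total += len(xb)
    history["train_loss"].append(loss_sum / total)
    history["train_acc"].append(correct / total)

    model.eval()
    total, correct, loss_sum = 0, 0, 0.0
    with torch.no_grad():                 # no gradients during validation
        for xb, yb in val_loader:
            logits = model(xb)
            loss_sum += criterion(logits, yb).item() * len(xb)
            correct += (logits.argmax(1) == yb).sum().item()
            total += len(xb)
    history["val_loss"].append(loss_sum / total)
    history["val_acc"].append(correct / total)
```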
45. | Graph training versus validation loss using matplotlib.pyplot (or other). Was your model overfitting, underfitting, or neither? | 5 points |
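A plotting sketch for item 45; the loss values below are invented for illustration, so substitute the per-epoch lists you recorded:

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; not needed inside a notebook
import matplotlib.pyplot as plt

# Hypothetical per-epoch losses; use your recorded history instead
train_losses = [0.9, 0.6, 0.4, 0.3, 0.25]
val_losses = [0.95, 0.7, 0.55, 0.5, 0.52]

fig, ax = plt.subplots()
ax.plot(train_losses, label="train")
ax.plot(val_losses, label="validation")
ax.set_xlabel("epoch")
ax.set_ylabel("loss")
ax.legend()
fig.savefig("loss_curves.png")
```

Validation loss flattening (or rising) while training loss keeps falling is the classic signature of overfitting.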
46. | Make a list of reasons why your model may have under-performed. | 5 points |
47. | Make a list of ways you could improve your model performance (you don't have to implement these unless you want to). | 5 points |
48. | Graph training versus validation accuracy using matplotlib.pyplot (or other). Score your model on its predictions on the holdout. Discuss why you think your results will or will not generalize. | 5 points |
49. | Generate a dataset of just three items, one for each class, and show your model correctly labels them. (display each item in your notebook, pass it to your model, and then print the prediction). | 5 points |
50. | Generate three datasets of your inputs, where each has only two of the classes. What do you predict the performance should be for three binary classifiers trained on these three datasets? Re-train your model on these three datasets, and discuss your results. | 5 points |
51. | Generate a dataset from your original dataset where 20% of the items in one class are mis-labelled as the remaining two classes. How do you think your model performance will be impacted? Re-train your model on this modified dataset, and discuss your results. | 5 points |
52. | Take a look at each of the items in all classes individually. What aspects of the items (such as backgrounds) might be influencing the decision-making of the model, besides the salient parts themselves? | 5 points |
53. | Is the data biased in any way that could impact your results? Why or why not? | 5 points |
54. | If you noted some potential biases in the modeling/dataset above, discuss how you could help mitigate these biases (you don't need to implement, just discuss). If you didn't note any biases in this dataset, discuss what biases there could have been, and how the dataset designers might have helped mitigate them. | 5 points |
55. | Correctly train your model without pre-training (and discuss how this affects performance). | 5 points |
56. | Correctly implement saliency maps for all images. If doing text or tabular data, discuss feature importances or other metric. | 5 points |
57. | Discussion of saliency mapping or other metric from above. | 5 points |
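One simple saliency-map approach for items 56-57 is vanilla gradient saliency: back-propagate the predicted class score to the input and take the absolute gradient per pixel. The tiny model and random image below are toy stand-ins:

```python
import torch
import torch.nn as nn

# Toy model and input image; substitute your trained model and real images
model = nn.Sequential(nn.Flatten(), nn.Linear(3 * 8 * 8, 2))
img = torch.randn(1, 3, 8, 8, requires_grad=True)

logits = model(img)
score = logits[0, logits.argmax()]  # score of the predicted class
score.backward()                    # gradients w.r.t. the input pixels

# Saliency: absolute gradient, reduced over the channel dimension
saliency = img.grad.abs().max(dim=1).values  # shape (1, H, W)
```

High-saliency pixels are those where small changes most affect the prediction; visualizing them (e.g. with `plt.imshow`) shows what the model attends to.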
Presentations are expected to be about 15 minutes, with all group members present and speaking.
1. | Discussion of motivation for work, including explanation of related work. | 5 points |
2. | Discussion of dataset acquisition and preparation. | 5 points |
3. | Discussion of model selection and hyperparameter options chosen/hypothesized. | 5 points |
4. | Discussion of results. | 5 points |
5. | Good use of PowerPoint, and presentation meets length requirements. | 5 points |
Provide a list of milestones and due dates tailored to your project. You may use the grading rubric above, or the milestones can be more holistic. Include at least three milestones with dates.
Extra credit: Choose up to four papers related to your work, and summarize each one in a paragraph.
Discuss paper 1 in one paragraph. | 5 points |
Discuss paper 2 in one paragraph. | 5 points |