GWU

CS 4364/6364

Introduction to Machine Learning, Spring 2022

GWU Computer Science


Project0: Environment Setup (due 01/17 11:59pm)

For the assignments in this class, you are free to use whatever environment, IDE, and programming language you feel most comfortable with. The following are the tools that we'll be providing support for; if you deviate from this list, we will try to help you, but we may not be familiar with that language/toolkit/environment:


You are free to use whatever text editor you wish; we will only use Jupyter Notebook for certain programming projects.






Part 1: Downloading Anaconda

We're going to be using Anaconda to download and install python packages needed for this course. Follow the instructions to download and install conda, which we'll use as a command-line tool.

Check that you have installed conda correctly by running it in the terminal.
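The quickest check is to type conda --version in the terminal and confirm that a version number is printed. The same check can be sketched in Python, as below. The helper name tool_version is made up for this illustration, and the sketch assumes conda is reachable through your PATH, which may not hold in every shell setup.

```python
import shutil
import subprocess


def tool_version(command):
    """Return the output of `<command> --version`, or None if it is not on PATH."""
    path = shutil.which(command)
    if path is None:
        return None
    result = subprocess.run(
        [path, "--version"], capture_output=True, text=True, check=False
    )
    # Some tools print their version to stderr instead of stdout.
    return (result.stdout or result.stderr).strip()


version = tool_version("conda")
if version is None:
    print("conda was not found on PATH; re-check the installation steps.")
else:
    print("Found", version)
```

The same helper can be pointed at other command-line tools (for example "jupyter") once they are installed.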

If you feel comfortable, we encourage you to use virtual environments via conda as a best practice.

Note: if you are using a Windows machine, you'll want to launch the Anaconda PowerShell prompt (a terminal) with administrator privileges (right-click the application and choose "Run as administrator").






Part 2: Download and install Jupyter Notebook

Jupyter Notebooks, for better or for worse, have become an industry standard in the fields of data science and machine learning. Notebooks were designed to facilitate the sharing of data analyses, which includes not only the code used to create visualizations and train models, but also markup for human-readable commentary. However, they have many limitations from a software engineering perspective, so it's important not to rely on them; this is less of an issue for people with a computer science and programming background than for the many others who enter the field of machine learning from other STEM subjects (or beyond!).

For the projects in this class, we'll use Jupyter Notebooks because they are a convenient way to display visualizations and English commentary alongside Python code.

Download and install Jupyter Notebook.

Once you have finished installation, run the terminal command jupyter notebook to check your installation, and make sure it opens a browser window. On the right-hand side of that window, you can click on "New" to create a new Python 3 notebook to use for an assignment.
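As a first cell in the new notebook, something like the following sketch can confirm which Python interpreter the notebook is actually running. The function name describe_kernel is made up for this illustration. The CONDA_DEFAULT_ENV lookup assumes the notebook was launched from an activated conda environment; otherwise it simply shows None.

```python
import os
import sys


def describe_kernel():
    """Collect basic facts about the Python interpreter running this cell."""
    return {
        "python_version": ".".join(str(part) for part in sys.version_info[:3]),
        "executable": sys.executable,
        # conda sets CONDA_DEFAULT_ENV when an environment is activated;
        # it is absent if Python was started outside conda.
        "conda_env": os.environ.get("CONDA_DEFAULT_ENV"),
    }


for key, value in describe_kernel().items():
    print(f"{key}: {value}")
```

If the executable path does not point inside your Anaconda installation, the notebook is probably using a different Python than the one conda manages.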






Part 3: Installing python libraries

As we work on assignments through this course, we'll often need to download and install libraries (like scikit-learn). There are many ways to do this, but we'll keep it simple: try running your notebook, and if it complains about a missing library, use conda to install the package (you can google "conda <package name>" and it will take you to the correct way to download the desired library -- just be wary of typosquatting!).

Note that you'll need to restart your Jupyter notebook in order for it to recognize a freshly installed package.

If you get stuck, ask us for help on Ed!

Practice this process by opening a new notebook and running a cell with the following import statement (it should fail if the library hasn't been installed yet; a full Anaconda install may already include pandas, in which case the import simply succeeds):
import pandas

Now, google "conda pandas" and find the Anaconda page in those search results (https://anaconda.org/anaconda/pandas). You'll see the command there to install pandas using conda: conda install -c anaconda pandas. Run that command, restart your notebook, and try re-running the cell to verify the library is now available in your jupyter notebook environment.
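For later projects, a cell like the following sketch can report which libraries are still missing before you run the rest of a notebook. The helper name missing_packages and the contents of the needed list are placeholders for illustration. The check works on top-level import names, which sometimes differ from the conda package name (scikit-learn, for example, is imported as sklearn).

```python
import importlib.util


def missing_packages(names):
    """Return the subset of `names` that cannot be imported in this environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


# Hypothetical list; replace it with whatever a given project imports.
needed = ["pandas"]
missing = missing_packages(needed)
if missing:
    print("Not installed yet:", ", ".join(missing))
else:
    print("All requested packages are importable.")
```

Anything reported as missing can then be installed with conda as described above, followed by a notebook restart.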






Part 4: GPU access

We are going to try to be ambitious this semester and have the whole class do at least some deep learning on GPUs.

GPUs are needed to run deep learning libraries to get results in a "reasonable" timeframe. Unfortunately, GPUs are expensive, and in general we don't have the resources to provide each student with their own dedicated GPU or cluster. Therefore, we will find a few workarounds for this course on our deep learning assignments: