Week 12: NumPy

Reading: Python for Data Analysis Chapter 4, through “Boolean Indexing”

Notes

The NumPy Array

Python did not originally have dedicated features for numerical computing: these were added later in a library called NumPy (or numpy).

Check to see if you have numpy installed by importing it:

import numpy as np

The convention import numpy as np exists entirely to save us some typing. It is widespread: almost every reference on numpy will use it, so we will use it here.


The basic data type in numpy is the array. It’s similar to a list, but better for doing math.

Recall that the + operator between lists concatenates lists:

x = [2, 3, 4]
y = [1, 2, 3]
x + y
[2, 3, 4, 1, 2, 3]

We create numpy arrays by using the np.array function called on a list:

x_arr = np.array(x)
type(x)
list
type(x_arr)
numpy.ndarray

Note: ndarray means “n-dimensional array” - we will just call it an array.


Arrays are very useful for mathematical operations. Note how the + operator adds the two arrays, element by element:

y_arr = np.array(y)
x_arr
array([2, 3, 4])
y_arr
array([1, 2, 3])
x_arr + y_arr
array([3, 5, 7])
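
To see what the array version of + saves us, here is a minimal sketch that does the same element-by-element addition with plain lists. The names list_sum and array_sum are ours, chosen only for this illustration.

```python
import numpy as np

x = [2, 3, 4]
y = [1, 2, 3]

# With lists, we have to pair up the elements and add them ourselves
list_sum = [a + b for a, b in zip(x, y)]

# With arrays, + does the pairing for us
array_sum = np.array(x) + np.array(y)

print(list_sum)   # [3, 5, 7]
print(array_sum)  # [3 5 7]
```

Both give the same values; the array version is shorter and is usually faster on large inputs.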

Arrays also support other mathematical operations:

x_arr - y_arr
array([1, 1, 1])
x_arr * y_arr
array([ 2,  6, 12])
x_arr / y_arr
array([2.        , 1.5       , 1.33333333])

Arrays can be used for math with regular numbers (scalars):

x_arr + 2
array([4, 5, 6])

The number is broadcast across the entire array.
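
Broadcasting a number works with the other operators too. A short sketch (the names doubled, squared, and halved are ours):

```python
import numpy as np

x_arr = np.array([2, 3, 4])

doubled = x_arr * 2    # every element multiplied by 2
squared = x_arr ** 2   # every element squared
halved = x_arr / 2     # division always gives floats

print(doubled)  # [4 6 8]
print(squared)  # [ 4  9 16]
print(halved)   # [1.  1.5 2. ]
```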

If you’ve had a linear algebra class, you might want to use numpy arrays for matrix operations.

  • The @ operator performs matrix multiplication
  • The .T attribute of an array is its transpose
np.array([[1, 2],[3, 4]]) @ np.array([[1, 2]]).T
array([[ 5],
       [11]])
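
It is easy to mix up * and @ on two-dimensional arrays. The sketch below, using two small matrices of our own choosing, shows that * multiplies matching entries while @ performs matrix multiplication:

```python
import numpy as np

a = np.array([[1, 2], [3, 4]])
b = np.array([[5, 6], [7, 8]])

elementwise = a * b      # multiplies entries in the same position
matrix_product = a @ b   # rows of a times columns of b

print(elementwise)
print(matrix_product)
```

Here elementwise holds 5, 12, 21, 32 and matrix_product holds 19, 22, 43, 50.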

Numpy arrays can also be indexed similarly to lists:

x_arr[1]
3
x_arr[1] += 1
x_arr[1]
4
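
Other list-style indexing works as well, and arrays add one more option: indexing with a comparison, which the reading calls Boolean indexing. A brief sketch, using a slightly longer array than the one above:

```python
import numpy as np

x_arr = np.array([2, 3, 4, 5])

print(x_arr[-1])         # 5, the last element
print(x_arr[1:3])        # [3 4], the elements at positions 1 and 2
print(x_arr[x_arr > 3])  # [4 5], only the elements greater than 3
```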

Numpy arrays have some substantial differences from lists. For instance, you cannot use .append with a numpy array.
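
If we do need a longer array, one option is the function np.append. The sketch below shows that it leaves the original array alone and returns a new one (the name longer is ours):

```python
import numpy as np

x_arr = np.array([2, 3, 4])

# np.append does not change x_arr; it builds and returns a new array
longer = np.append(x_arr, 5)

print(x_arr)                     # [2 3 4]
print(longer)                    # [2 3 4 5]
print(hasattr(x_arr, "append"))  # False: arrays have no .append method
```

Because a new array is built each time, growing an array inside a loop is slow. It is often simpler to collect values in a list and convert once at the end with np.array.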

Randomness and Simulation

Many early and contemporary uses of computing involve simulation: estimating a value by imitating a process many times and observing the outcomes. Random numbers let us imitate events whose outcomes are uncertain.

We can generate a random number with numpy:

np.random.random()
0.25241441390830555

np.random.random() gives us a random number between 0 and 1: it can be exactly 0, but it is always less than 1.
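
Two related features are worth knowing. Passing a number to np.random.random returns that many random values as an array, and np.random.seed makes the "random" values repeat from run to run, which helps when debugging. A minimal sketch (the seed value 12 is arbitrary, and the name samples is ours):

```python
import numpy as np

np.random.seed(12)  # any fixed integer gives repeatable results

samples = np.random.random(5)  # an array of 5 random numbers

print(samples.shape)                                 # (5,)
print((samples >= 0).all() and (samples < 1).all())  # True
```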

Let’s use this to create a “coin”:

coin.py
import numpy as np

def flip():
    rand_number = np.random.random()
    if rand_number > 0.5:
        return "Heads"
    else:
        return "Tails"

print(flip())
Tails

Let’s test our “coin” (include all of this in one file):

trials = 10000
results = 0
for j in range(trials):
    if flip() == "Heads":
        results += 1
print(results/trials)
0.501

About 50% - it works!
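
The same test can be written without a loop by generating all the random numbers at once. This is a sketch of an alternative, not a replacement for flip(); the names rand_numbers and is_heads are ours.

```python
import numpy as np

trials = 10000

# One random number per flip, all generated at once
rand_numbers = np.random.random(trials)

# Comparing an array with a number gives an array of True/False values
is_heads = rand_numbers > 0.5

# True counts as 1 and False as 0, so the mean is the fraction of heads
print(is_heads.mean())
```

The printed value changes from run to run, but it should again be close to 0.5.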

We can also randomly choose an element from a list:

np.random.choice(["bricks", "lumber", "cement"])
'bricks'
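
np.random.choice also accepts a size argument when we want several choices at once. A small sketch (the names materials and picks are ours):

```python
import numpy as np

materials = ["bricks", "lumber", "cement"]

# size=5 asks for five independent choices, returned as an array
picks = np.random.choice(materials, size=5)

print(picks)
```

By default the same element can be chosen more than once.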

Let’s use randomness to simulate a simple problem: estimating the value of \(\pi\).

Assume:

  • We know that the equation of a circle of radius \(r\) centered at the origin is \(x^2 + y^2 = r^2\)
  • We know the area of a square is \(s^2\)
  • We know the area of a circle is \(\pi \cdot r^2\)
  • We don’t know the value of \(\pi\)

First, let’s use our random number generator to get values in between -1 and 1.

By default, np.random.random() gives values between 0 and 1. To get values between -1 and 1:

  • Double the default random values
    • This places them between 0 and 2 instead of 0 and 1
  • Subtract one from the doubled value
    • This places them between -1 and 1
( np.random.random() * 2 ) - 1
-0.13069645606873692
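
The double-then-subtract steps are one case of a general recipe: multiply by the width of the target range, then add its lower end. The sketch below applies the recipe to 1000 values at once and compares it with np.random.uniform, which performs the same stretch and shift. The names low, high, scaled, and uniform are ours.

```python
import numpy as np

low, high = -1, 1

# Stretch [0, 1) to the width of the new range, then shift it to start at low
scaled = np.random.random(1000) * (high - low) + low

# np.random.uniform does the same stretch-and-shift for us
uniform = np.random.uniform(low, high, 1000)

print(scaled.min(), scaled.max())
print(uniform.min(), uniform.max())
```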

Now let’s write a function that generates one point and returns it:

def generate_point():
    x = np.random.random() * 2 - 1
    y = np.random.random() * 2 - 1
    return x, y
generate_point()
(0.049670796033088216, -0.8184505471766754)

We can use another function to check if a point is inside the circle:

def check_point(x, y):
    # The comparison is already True or False, so we can return it directly
    return x**2 + y**2 <= 1
check_point(0, 0)
True
check_point(1, 1)
False
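
The same comparison also works on whole arrays of coordinates, giving one True or False per point. A sketch with four hand-picked points (the names xs, ys, and inside are ours):

```python
import numpy as np

xs = np.array([0.0, 1.0, 0.5, -0.9])
ys = np.array([0.0, 1.0, 0.5, 0.8])

# One True/False value per (x, y) pair
inside = xs**2 + ys**2 <= 1

print(inside)
print(inside.sum())  # 2, because True counts as 1
```

The first and third points are inside the circle; the second and fourth are not.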

Now let’s generate a lot of points, check all of them, and count the ones inside the circle:

num_points = 1000000
in_circle = 0
for j in range(num_points):
    x, y = generate_point()
    result = check_point(x, y)
    if result:
        in_circle += 1
print(in_circle)
786027

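Finally, let’s check the ratio of points in the circle to total points. Because the points are spread evenly over the square, this ratio approximates the area of the circle divided by the area of the square. The side of the “square” we are using is \(2 \cdot r\), so the ratio is: \[\frac{\pi \cdot r^2}{(2\cdot r)^2} = \frac{\pi \cdot r^2}{4 \cdot r^2} = \frac{\pi}{4}\] Multiplying the ratio by four therefore gives us an estimate of \(\pi\).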

ratio = in_circle/num_points
print(ratio * 4)
3.144108

The answer is reasonably close to \(\pi\), and if we make the number of points larger, the result will tend to become more accurate, although any single run can still be off by chance.
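
To watch the estimate change with the number of points, we can wrap the whole calculation in a function and call it with different sizes. This is a sketch that uses arrays instead of the loop above; the function name estimate_pi is ours.

```python
import numpy as np

def estimate_pi(num_points):
    """Estimate pi from num_points random points in the square from -1 to 1."""
    x = np.random.random(num_points) * 2 - 1
    y = np.random.random(num_points) * 2 - 1
    # Count the points that fall inside the circle of radius 1
    in_circle = (x**2 + y**2 <= 1).sum()
    return 4 * in_circle / num_points

for n in [100, 10000, 1000000]:
    print(n, estimate_pi(n))
```

The output differs on every run, but the estimates for larger n should usually land closer to 3.14159.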

Plotting

We will use Python to plot charts of our results, using a library called seaborn.

Plotting is a good way to communicate numerical information visually. We’ll show you a few things about plotting, but this course won’t require you to make plots or test you on plotting.


To follow along, you can install seaborn by typing pip install seaborn at your terminal.

Plotting with Python is much more powerful and expressive than plotting with a spreadsheet program such as Microsoft Excel. We will present an ‘extra’ lesson on plotting at the end of the course.

Let’s plot the results. First, we need to import seaborn, along with matplotlib, which seaborn is built on and which we use to display the plot:

import seaborn as sns
import matplotlib.pyplot as plt

Next, we need to remember the x and y values we generated, so we modify our loop to do so:

num_points = 1000000
x_circ, y_circ = [], [] # lists for values in circle
x_no_circ, y_no_circ = [], [] # lists for values out of circle
for j in range(num_points):
    x, y = generate_point()
    result = check_point(x, y)
    if result:
        x_circ.append(x)
        y_circ.append(y)
    else:
        x_no_circ.append(x)
        y_no_circ.append(y)
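
Boolean indexing, covered in the reading, offers another way to split the points. The following is a sketch rather than a replacement for the loop: it assumes we generate all the points as arrays first, uses a smaller num_points to keep it quick, and introduces the name inside for the array of True/False values.

```python
import numpy as np

num_points = 1000

x = np.random.random(num_points) * 2 - 1
y = np.random.random(num_points) * 2 - 1

inside = x**2 + y**2 <= 1  # True where the point is in the circle

# Indexing with a bool array keeps only the elements where it is True;
# ~inside flips True and False
x_circ, y_circ = x[inside], y[inside]
x_no_circ, y_no_circ = x[~inside], y[~inside]

print(len(x_circ) + len(x_no_circ))  # 1000
```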

We can check the result: this time, we’ll look at the length of one of the lists of “in circle” values:

ratio = len(x_circ)/num_points
print(ratio * 4)
3.14418

Now we’ll plot the result:

sns.set_theme(rc={'figure.figsize':(6,6)})
sns.scatterplot(x=x_circ, y=y_circ)
sns.scatterplot(x=x_no_circ, y=y_no_circ)
plt.show()

Practice

Practice Problem 12.1

Write a function to_numpy that takes as argument a list and returns:

  • A numpy array of the same values, if every element of the list is numeric
  • The bool False if any element of the list is non-numeric.

Practice Problem 12.2

Write a function compare_lists that takes two lists of ints and compares each pair of elements, returning a list of bools corresponding to whether the first list’s element is greater:

  • compare_lists([2, 3, 4], [3, 1, 2]) returns [False, True, True]
    • 2 is not greater than 3, 3 is greater than 1, 4 is greater than 2

Practice Problem 12.3

Write a function list_ceil that takes two arguments:

  • The first argument is a list of ints
  • The second argument is a single int

The function should return a new list, identical to the original list, except no int in the new list is greater than the second argument.

  • list_ceil([3, 5, 2, 1], 4) returns [3, 4, 2, 1]
  • list_ceil([-1, 0, 6, 1], 2) returns [-1, 0, 2, 1]

Practice Problem 12.4

Write a function array_mean that takes as argument a numpy array, calculates the mean of its elements, subtracts that mean from every element, and returns the resulting array.

Practice Problem 12.5

Estimate the value of \(\pi\) with a circle of radius 2 (our example had radius 1).