Week 12: NumPy
Reading: Python for Data Analysis Chapter 4, through “Boolean Indexing”
Notes
The NumPy Array
Python did not originally have dedicated features for numerical computing: these were added later in a library called NumPy (or numpy).
Check to see if you have numpy installed by importing it:
import numpy as np
The convention import numpy as np exists entirely to save us some typing. It is widespread: almost every reference on numpy will use it, so we will use it here.
The basic data type in numpy is the array. It’s similar to a list, but better for doing math.
Recall that the + operator between lists concatenates lists:
x = [2, 3, 4]
y = [1, 2, 3]
x + y
[2, 3, 4, 1, 2, 3]
We create numpy arrays by calling the np.array function on a list:
x_arr = np.array(x)
type(x)
list
type(x_arr)
numpy.ndarray
Note: ndarray means “n-dimensional array” - we will just call it an array.
Arrays are very useful for mathematical operations. Note how the + operator adds the two arrays, element by element:
y_arr = np.array(y)
x_arr
array([2, 3, 4])
y_arr
array([1, 2, 3])
x_arr + y_arr
array([3, 5, 7])
Arrays also support other mathematical operations:
x_arr - y_arr
array([1, 1, 1])
x_arr * y_arr
array([ 2,  6, 12])
x_arr / y_arr
array([2.        , 1.5       , 1.33333333])
Arrays can be used for math with regular numbers (scalars):
x_arr + 2
array([4, 5, 6])
The number is broadcast across the entire array.
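As a small sketch of the same idea, the scalar below is applied to every element for other operators as well. The array `prices` and the result names are made up for this illustration and do not appear elsewhere in these notes.

```python
import numpy as np

# Hypothetical example values, chosen only for illustration
prices = np.array([2, 3, 4])

# In each case the scalar is applied to every element of the array
doubled = prices * 2      # array([4, 6, 8])
squared = prices ** 2     # array([ 4,  9, 16])
shifted = prices - 0.5    # array([1.5, 2.5, 3.5])

print(doubled, squared, shifted)
```

Note that subtracting 0.5 produces an array of floats even though `prices` holds integers.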
If you’ve had a linear algebra class, you might want to use numpy arrays for tensor operations.
- The @ operator performs matrix multiplication
- The .T attribute of an array is its transpose
np.array([[1, 2],[3, 4]]) @ np.array([[1, 2]]).T
array([[ 5],
       [11]])
Numpy arrays can also be indexed similarly to lists:
x_arr[1]
3
x_arr[1] += 1
x_arr[1]
4
Numpy arrays have some substantial differences from lists. For instance, you cannot use .append with a numpy array.
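The difference can be seen in a short sketch. It uses the function `np.append`, which is not covered in these notes: it builds a new, longer array rather than changing the original. The names `x_list` and `longer` are chosen for this illustration.

```python
import numpy as np

x_list = [2, 3, 4]
x_arr = np.array(x_list)

x_list.append(5)                 # lists grow in place
print(hasattr(x_arr, "append"))  # False: arrays have no .append method

# np.append returns a new array; x_arr itself is unchanged
longer = np.append(x_arr, 5)
print(x_arr)
print(longer)
```

Because every call to `np.append` copies the array, growing an array one element at a time in a loop is usually slower than collecting values in a list and converting once at the end.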
Randomness and Simulation
Many early and contemporary uses of computing involve simulation: calculating a value approximately by simulating events. Events can be simulated through randomness.
We can generate a random number with numpy:
np.random.random()
0.25241441390830555
np.random.random() gives us a random number between 0 and 1 (0 is a possible value, 1 is not).
Let’s use this to create a “coin”:
coin.py
Tails
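A minimal sketch of what such a coin could look like, assuming a function named `flip` (the name used by the test loop that follows) that returns "Heads" when the random draw falls below 0.5 and "Tails" otherwise. The 0.5 threshold is an assumption for a fair coin.

```python
import numpy as np

def flip():
    # np.random.random() is uniform between 0 and 1, so each
    # branch is taken about half of the time
    if np.random.random() < 0.5:
        return "Heads"
    else:
        return "Tails"

print(flip())
```

Each call makes a fresh random draw, so repeated calls give a mix of "Heads" and "Tails".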
Let’s test our “coin” (include all of this in one file):
trials = 10000
results = 0
for j in range(trials):
    if flip() == "Heads":
        results += 1
print(results/trials)
0.501
About 50% - it works!
We can also randomly choose an element from a list:
np.random.choice(["bricks", "lumber", "cement"])
'bricks'
Let’s use randomness to simulate a simple problem: estimating the value of \(\pi\).
Assume:
- We know that the equation of a circle of radius \(r\) centered at the origin is \(x^2 + y^2 = r^2\)
- We know the area of a square is \(s^2\)
- We know the area of a circle is \(\pi \cdot r^2\)
- We don’t know the value of \(\pi\)
First, let’s use our random number generator to get values in between -1 and 1.
By default, np.random.random() gives values between 0 and 1. To get values between -1 and 1:
- Double the default random values
  - This places them between 0 and 2 instead of 0 and 1
- Subtract one from the doubled value
  - This places them between -1 and 1
(np.random.random() * 2) - 1
-0.13069645606873692
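The same rescaling works on many values at once. The sketch below relies on the optional size argument of `np.random.random`, which returns an array of draws instead of a single number; the name `samples` is chosen for this illustration.

```python
import numpy as np

# Draw five values between 0 and 1, then rescale them to lie between -1 and 1
samples = np.random.random(5) * 2 - 1

print(samples)
print(samples.min() >= -1, samples.max() < 1)
```

The multiplication and subtraction are applied to every element, so no loop is needed.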
Now let’s write a function that generates one point and returns it:
def generate_point():
    x = np.random.random() * 2 - 1
    y = np.random.random() * 2 - 1
    return x, y
generate_point()
(0.049670796033088216, -0.8184505471766754)
We can use another function to check if a point is inside the circle:
def check_point(x, y):
    if x**2 + y**2 <= 1:
        return True
    else:
        return False
check_point(0, 0)
True
check_point(1, 1)
False
Now let’s generate a lot of points, check all of them, and count the ones inside the circle:
num_points = 1000000
in_circle = 0
for j in range(num_points):
    x, y = generate_point()
    result = check_point(x, y)
    if result:
        in_circle += 1
print(in_circle)
786027
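Finally, let’s check the ratio of points in the circle to total points. This ratio approximates the area of the circle divided by the area of the square. The side of the “square” we are using is 2 (that is, \(2 \cdot r\) with \(r = 1\)), so the ratio of the areas is \(\pi/4\), and we multiply by four to estimate \(\pi\): \[\frac{\pi \cdot r^2}{(2\cdot r)^2} = \frac{\pi \cdot r^2}{4 \cdot r^2} = \frac{\pi}{4}\]
ratio = in_circle/num_points
print(ratio * 4)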
3.144108
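The answer is reasonably close to \(\pi\). Because the estimate is random, it will not improve on every single run, but as the number of points grows the error tends to shrink, roughly in proportion to \(1/\sqrt{N}\) for \(N\) points.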
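To see how the estimate changes with the number of points, here is a sketch that draws all the points at once as arrays instead of looping. The function name `estimate_pi`, its `seed` parameter, and the use of `np.random.default_rng` (a seeded generator, so runs are repeatable) are assumptions of this example rather than part of the code above.

```python
import numpy as np

def estimate_pi(num_points, seed=0):
    # A seeded generator makes the run repeatable
    rng = np.random.default_rng(seed)
    # Draw all x and y coordinates at once, each between -1 and 1
    x = rng.random(num_points) * 2 - 1
    y = rng.random(num_points) * 2 - 1
    # Boolean array: True where the point lies inside the unit circle
    inside = x**2 + y**2 <= 1
    # The mean of a boolean array is the fraction of True values
    return 4 * inside.mean()

for n in [100, 10_000, 1_000_000]:
    print(n, estimate_pi(n))
```

Working on whole arrays replaces the million-iteration Python loop with a handful of array operations, which typically runs much faster.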
We will use Python to plot charts of our results, using a library called seaborn.
Plotting is a good way to communicate numerical information visually. We’ll show you a few things about plotting, but this course won’t require you to make plots or test you on plotting.
To follow along, you can install seaborn by typing pip install seaborn at your terminal.
Plotting with Python is much more powerful and expressive than plotting with a spreadsheet program such as Microsoft Excel. We will present an ‘extra’ lesson on plotting at the end of the course.
Let’s plot the results. First, we need to import a plotting library:
import seaborn as sns
import matplotlib.pyplot as plt
Next, we need to remember the x and y values we generated, so we modify our loop to do so:
num_points = 1000000
x_circ, y_circ = [], [] # lists for values in circle
x_no_circ, y_no_circ = [], [] # lists for values out of circle
for j in range(num_points):
    x, y = generate_point()
    result = check_point(x, y)
    if result:
        x_circ.append(x)
        y_circ.append(y)
    else:
        x_no_circ.append(x)
        y_no_circ.append(y)
We can check the result: this time, we’ll look at the length of one of the lists of “in circle” values:
ratio = len(x_circ)/num_points
print(ratio * 4)
3.14418
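The split into "in circle" and "out of circle" values can also be sketched with boolean indexing, the topic that closes this week's reading. This is an alternative to the loop, not a replacement required by the course; the seeded generator from `np.random.default_rng` and the smaller number of points are assumptions made to keep the example quick and repeatable.

```python
import numpy as np

rng = np.random.default_rng(0)   # seeded so the run is repeatable
num_points = 10_000              # fewer points than above, for a quick run

x = rng.random(num_points) * 2 - 1
y = rng.random(num_points) * 2 - 1

inside = x**2 + y**2 <= 1        # boolean array, one entry per point

# Boolean indexing keeps the entries where the mask is True;
# ~inside flips the mask to select the points outside the circle
x_circ, y_circ = x[inside], y[inside]
x_no_circ, y_no_circ = x[~inside], y[~inside]

print(len(x_circ) / num_points * 4)
```

The four resulting arrays play the same role as the four lists built by the loop.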
Now we’ll plot the result:
sns.set_theme(rc={'figure.figsize':(6,6)})
sns.scatterplot(x=x_circ, y=y_circ)
sns.scatterplot(x=x_no_circ, y=y_no_circ)
plt.show()