Week 12: Modules and Classes
Reading: Python for Data Analysis Chapter 4, through “Boolean Indexing”
Notes
The NumPy Array
Python did not originally have dedicated features for numerical computing: these were added later in a library called NumPy (or numpy).
Check to see if you have NumPy installed by importing it:
import numpy as np
The convention import numpy as np exists entirely to save us some typing. It is widespread: almost every reference on numpy will use it, so we will use it here.
The basic data type in numpy is the array. It’s similar to a list, but better for doing math. It’s also, formally, an object of the class numpy.ndarray.
Note: ndarray means “n-dimensional array” - we will just call it an array.
Recall that the + operator between lists concatenates lists:
x = [2, 3, 4]
y = [1, 2, 3]
x + y
[2, 3, 4, 1, 2, 3]
We create numpy arrays by using the np.array function called on a list:
x_arr = np.array(x)
type(x)
list
type(x_arr)
numpy.ndarray
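Every class has a function that creates a new instance of the class (a new object). That function has the same name as the class, and is formally called the constructor. (Strictly speaking, np.array is a convenience function rather than the constructor itself: the class is numpy.ndarray, and its constructor is rarely called directly.)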
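Constructors are easiest to recognize with classes you already know. The following is a small sketch using only built-in classes; the variable names are just for illustration:

```python
# Each built-in class can be called like a function to build a new object.
number = int("42")      # int constructor: builds an int from a string
letters = list("abc")   # list constructor: builds a list from a string
text = str(3.5)         # str constructor: builds a string from a float

print(type(number), type(letters), type(text))
```

In each case the function being called has the same name as the class of the object it returns.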
Arrays are very useful for mathematical operations. Note how the + operator adds the two arrays, element by element:
y_arr = np.array(y)
x_arr
array([2, 3, 4])
y_arr
array([1, 2, 3])
x_arr + y_arr
array([3, 5, 7])
Arrays also support other mathematical operations:
x_arr - y_arr
array([1, 1, 1])
x_arr * y_arr
array([ 2,  6, 12])
x_arr / y_arr
array([2.        , 1.5       , 1.33333333])
Arrays can be used for math with regular numbers (scalars):
x_arr + 2
array([4, 5, 6])
The number is broadcast across the entire array.
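Broadcasting a scalar works the same way for the other arithmetic operators. Here is a minimal sketch that starts from the same values as x_arr above; the names doubled, halved, and squared are only illustrative:

```python
import numpy as np

x_arr = np.array([2, 3, 4])

# The scalar is applied to every element of the array.
doubled = x_arr * 2    # elementwise multiplication
halved = x_arr / 2     # elementwise division (the result holds floats)
squared = x_arr ** 2   # elementwise exponentiation

print(doubled)
print(halved)
print(squared)
```

None of these operations change x_arr itself: each one produces a new array.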
If you’ve had a linear algebra class, you might want to use numpy arrays for matrix operations.
- The @ operator performs matrix multiplication
- The .T attribute of an array is its transpose
np.array([[1, 2],[3, 4]]) @ np.array([[1, 2]]).T
array([[ 5],
       [11]])
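It is easy to confuse @ with *, which multiplies matching entries instead. The sketch below puts the two side by side; the names a, b, matrix_product, and elementwise are chosen here for illustration:

```python
import numpy as np

a = np.array([[1, 2], [3, 4]])
b = np.array([[1, 2]]).T   # transpose turns the (1, 2) row into a (2, 1) column

matrix_product = a @ b     # each row of a combined with the column b
elementwise = a * a        # each entry multiplied by the matching entry

print(matrix_product)
print(elementwise)
```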
Numpy arrays can also be indexed similarly to lists:
x_arr[1]
np.int64(3)
x_arr[1] += 1
x_arr[1]
np.int64(4)
Numpy arrays have some substantial differences from lists. For instance, you cannot use .append with a numpy array.
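One way to see this difference is sketched below. It also uses np.append, a numpy function not covered in these notes, which builds a new array rather than changing the existing one:

```python
import numpy as np

x_arr = np.array([2, 3, 4])

# Arrays have no .append method, so this check comes back False.
has_append = hasattr(x_arr, "append")
print(has_append)

# np.append returns a new, longer array; x_arr itself is unchanged.
longer = np.append(x_arr, 5)
print(longer)
print(x_arr)
```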
Properties of Objects
Numpy arrays are objects, which collect variables and functions together logically. When functions are associated with objects, we refer to them as methods.
We can contrast these objects with “primitive” types like ints and floats, which we have mostly treated as plain values (strictly speaking, these are objects in Python too).
You’ve seen some object-like properties already with strings:
x = "George"
y = x.lower()
print(y)
george
The string class has a method lower() associated with it - you call lower() on a string with a period between the name of the variable and the name of the method.
Numpy arrays have a great number of useful methods.
Consider the task of finding the index of the largest value in a list. The list [3, 7, 4, 2] has the largest value at index 1 (the second item). Numpy arrays have a method, argmax (argument maximum), for this:
x = np.array([3, 7, 4, 2])
y = x.argmax()
print(y)
1
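For comparison, here is a sketch of the same task done with a plain list and with an array. The names values, list_index, and array_index are illustrative:

```python
import numpy as np

values = [3, 7, 4, 2]

# Plain list: find the largest value, then look up its position.
list_index = values.index(max(values))

# Array: argmax does both steps in a single method call.
array_index = np.array(values).argmax()

print(list_index, array_index)
```

Both approaches give the same index for this list.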
Another method, reshape, changes the shape of an array:
x = np.array([1, 2, 3, 4])
z = x.reshape(2, 2)
print(z)
[[1 2]
 [3 4]]
…but what is the “shape” of an array? Objects in Python have variables associated with them, called attributes. We can access these like other variables, associating them with the object using a period.
x = np.array([[1, 2, 3]])
print("x:\n", x)
print("x.shape:\n", x.shape)
y = x.reshape(3, 1)
print("y:\n", y)
print("y.shape:\n", y.shape)
x:
[[1 2 3]]
x.shape:
(1, 3)
y:
[[1]
[2]
[3]]
y.shape:
(3, 1)
Numpy arrays also have a size, the total number of elements in the array:
x = np.array([[10, 11, 20, 21]])
print("x:\n", x)
print("x.size:\n", x.size)
y = x.reshape(2, 2)
print("y:\n", y)
print("y.size:\n", y.size)
x:
[[10 11 20 21]]
x.size:
4
y:
[[10 11]
[20 21]]
y.size:
4
There’s a relationship between size and shape: the size is the product of the dimensions in the shape. Some shapes are invalid: for instance, an array of size 7 can’t be reshaped to (4, 2), because there’s no logical way to put seven things into eight places.
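Both points can be checked directly. The sketch below uses np.arange, a numpy function not introduced in these notes, to build arrays of a chosen size; the names seven, grid, and reshape_failed are illustrative:

```python
import numpy as np

seven = np.arange(7)               # the seven values 0 through 6
grid = np.arange(8).reshape(4, 2)  # eight values fit a (4, 2) shape

# The size equals the product of the entries in the shape.
print(grid.size, grid.shape[0] * grid.shape[1])

# Reshaping seven values into a (4, 2) array raises a ValueError.
try:
    seven.reshape(4, 2)
    reshape_failed = False
except ValueError as err:
    reshape_failed = True
    print("reshape failed:", err)
```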
Practice
Practice Problem 12.1
Practice Problem 12.2
Practice Problem 12.3
Practice Problem 12.4
Homework
Homework problems should always be your individual work. Please review the collaboration policy and ask the course staff if you have questions. Remember: Put comments at the start of each problem to tell us how you worked on it.
Double check your file names and return values. These need to be exact matches for you to get credit.
This homework includes 20 bonus points - you can earn extra credit on this assignment.
Homework Problem 12.1
Homework Problem 12.2
- What is the size of the input array?
- What numbers can be multiplied together to yield the size?
- Recall the greatest_factor problem from earlier in the course.
  - You will want to check a series of possible values with a loop.
  - Those values will always be greater than or equal to 1, and less than or equal to the array size.
Note: There are “one dimensional” numpy arrays that have shapes like (4,), which is different from (4, 1). Don’t worry about these!