= [ [ [1,2], [3,4], [5,6] ], [ [7,8], [9,10], [11,12] ] ]
A
print(A[0])
[[1, 2], [3, 4], [5, 6]]
The goal of this module is to introduce numpy arrays, for use when working with numeric data.
Recall a basic list:
# A list of integers:
evens = [2, 4, 6, 8, 10]
# A list of strings:
greetings = ['hello', 'hi', 'howdy', 'aloha']
# Access list elements using square brackets and index
x = evens[1] + evens[3]
print(x)
# We can change the value at an individual position
evens[0] = x
# Recall: len() gives us the length of the list
print('length:', len(evens), 'contents:', evens)
# Example of using **in** to search inside a list:
if (not 'hey' in greetings):
print('missing hey')
# Add something new to the end of a list
evens.append(12)
# Write code here to increment each element by 2
print(evens)
# Should print: [12, 4, 6, 8, 10, 12]
First, in list_example.py
, type up the above to see what it prints (without including the missing code). Then, in list_example2.py
, add the missing code to increment each list element by 2.
Let’s recall a few things we learned about lists via this example:
Why are lists useful? - Using a loop to create elements, as in:
Lists allow both index iteration as above but also content iteration:
As it turns out, we can make a list of lists.
That is, a list whose elements are themselves lists.
For example:
A = [ [2,4,6,8,10], [1,3,5,7,9] ]
x = A[1] # The 2nd element is a list
print(x) # Prints [1,3,5,7,9]
y = A[1][3] # 4-element of 2nd list
print(y) # 7
print(len(A)) # 2
print(len(A[0])) # 5
A[1]
refers to the 2nd element of the outer list, which means A[1]
is the 2nd inner listA[1]
is a list itself, we can access its elements using an additional set of square brackets, e.g., A[1][3] = 7
len()
function applied to the whole list will give 2, while applying it to one of the constituent lists will give that list’s lengthConsider the following code:
A = [[1,2,3,4], [4,5,6,7], [8,9,10,11,12]]
x = A[?][?]
print(x) # Should print 7
# Write code to increment every element using a nested `for` loop:
print(A)
# Output should be: [2, 3, 4, 5], [5, 6, 7, 8], [9, 10, 11, 12, 13]]
In list_example3.py
, add the correct numbers to replace the question marks. Then, write a nested for
loop to increment every element of every constituent list.
Think of a single list as one dimensional:
In a one-dimensional list, we need a single number to access a data value in the list: print(A [3])
A list of lists is two dimensional:
In a two-dimensional list, we need two numbers to access a data value in the list:, hence print(A[0][2])
Think of a list of lists of lists as three-dimensional, which means three numbers fix the position of a element. For example:
It’s a bit hard to see the list of lists of lists:
We can look at indexing list elements by looking at each element:
Get the outermost element:
Get the third element of the outermost element:
Get the second element of the third element of the outermost element:
Python was created as a general-purpose programming language and did not originally have support for numerical computing. A library called NumPy (numpy
) was created to address this.
While lists are useful and easy to use, they are a bit inefficient “under the hood”:
Very large lists (million of elements or more) can slow down a program. A list-of-lists is even slower for large sizes, and takes up a lot of memory.
NumPy arrays (we will often call them simply “arrays”) were created as a separate structure in Python to enable efficient processing of lists of numbers, especially multidimensional lists.
Some of the most compelling uses involve the array equivalent of a list-of-lists-of-lists: an image. As we will see, a color image will can be represented as athree dimensional array while a black-and-white image only requires a two-dimensional array.
Because arrays are part of NumPy and not a “built-in” part of Python, the syntax around arrays is a bit different, for example:
import numpy as np
A = np.array([1,2,3,4])
print(type(A)) # What does this print?
A[1] = 5 # Replace 2nd element
print(A) # [1 5 3 4]
print(A.shape[0]) # 4
print('len(A)=',len(A)) # 4
# A[4] = 9
# A.append(9)
Type up the above in array_example.py
. Try un-commenting in turn each of the two commented-out lines at the end to see what kind of errors this yields. (Restore your program by commenting out both.)
Let’s point out a few things: To gain efficiency, arrays trade away some flexibility and ease of use. For example, we now need to import this special package called numpy
:
Once we do this, the syntax for making an array with actual data is, as we’ve seen:
np
?
The convention to import the numpy
library as np
is strictly for convenience, using the keyword as
to create a shortcut:
This could also be written as:
There is no difference in the function of the code by creating the shorcut, but because we tend to use a lot of numpy
arrays, the np
shortcut saves time in writing code and makes the code more readable. It has become a convention in the Python/NumPy community, and you will rarely encounter import numpy
without the np
shortcut.
numpy``array
function as a list:The actual array so created is assigned to the variable A
. To work with elements in the array, we use square brackets with the variable A
, just like a list:
The standard function len()
also works the same as with lists:
However, the array has a feature that is more general called shape:
shape
has.shape[0]
has the first dimension (the length of the array along the first dimension).shape[1]
has the length along the second dimension, and so on.shape[0]
.This creates a new array with the added element. The difference between what we have done here and modifying a list in place is subtle, but important. Typically most scientific applications do not change sizes on the fly, and so, this is not a serious restriction.
Numpy has powerful features that simplify manipulation of numeric arrays.
For example, consider:
Type up the above in array_example2.py
and add a line that multiplies the arrays A
and B
element-by-element, and prints the result: ([ 4 10 18])
.
In list_version.py
, let’s remind ourselves about how lists work. Start by examining what happens with:
X = [1, 2, 3]
Y = [4, 5, 6]
Z = X + Y
print(Z) # What does this print?
# Write code here to compute Z as element-by-element addition
# of X and Y (to give [5, 7, 9])
print(Z)
Then add code to perform element-by-element addition.
Numpy also has a number of functions that act on arrays and return arrays, for example:
One can apply a function like square-root element-by-element:
Numpy can create an array with random elements, as in:
This produces an array of size 20 with each element randomly chosen from among the numbers 1,2,3,4,5,6.
Numpy has its own random-generation tool: np.random
This has a function randint()
that takes the desired range (inclusive of first, excluding the last), and the desired size of the array.
One can also test membership using the in
operator: For example, suppose we roll a die 20 times and want to know whether a 6 occured:
In dice_problem.py
fill in code below to estimate the chances that you get a total of 7 at least once when rolling a pair of dice 10 times.
successes = 0
num_trials = 1000
for n in range(num_trials):
# Fill your code here
print( successes/num_trials )
To do so, generate one array called A
of length 10 with random numbers representing one die (selected from 1 through 6). Then generate a second array called B
that represents the 10 rolls of the second die. A success occurs when A[i]+B[i]
is 7 for some i
. Can you solve this without accessing individual array elements with a loop?
Here, 2D is short for two-dimensional. Let’s begin with a conceptual depiction of a 1D (one-dimensional) array:
Because we need a way to use our keyboard to enter elements, we use a particular kind of syntax, comma-separation with square-brackets to specify the elements. We use a similar type of syntax to access a particular element in this array, as in:
We can also change an element in an array:
which will result in the visualization:
To explain how a 2D array works, let’s start with its conceptual visualization, via an example:
Consider this visualization of a 2D array:
We use the term row to describe the contents going across one of the series of boxes going left to right:
And the term column (shortened to col in our pictures) to describe the series of boxes going vertically top to bottom:
A = np.array([ [50, 55, 60, 65, 70],
[100, 105, 110, 120, 125],
[150, 155, 160, 165, 170],
[200, 205, 210, 215, 220] ])
Here, we’ve added whitespace (that’s allowed) to line up the rows so that it’s visually organized. To access a particular element, we need the row number and column number, as in:
Important: Unlike a list-of-lists, arrays can use comma separation. For comparison:
# List of lists:
X = [ [2,4,6,8,10], [1,3,5,7,9] ]
print(X[0][2])
# 2D array:
X = np.array([ [2,4,6,8,10], [1,3,5,7,9] ])
print(X[0,2])
Arrays allow box separation as well (like lists) but this causes problems in other array operations (slicing): so please use comma-separation with a single set of square brackets for arrays.
Just as we used a for
loop for a single array, it is very typical to use a nested for
loop for a 2D array:
For comparison, let’s look at a 1D array:
A = np.array( [1, 4, 9, 16] )
for i in range(A.shape[0]): # Recall: A.shape[0] is the size
print(A[i])
The equivalent for a 2D array is:
A = np.array([ [50, 55, 60, 65, 70],
[100, 105, 110, 120, 125],
[150, 155, 160, 165, 170],
[200, 205, 210, 215, 220] ])
for i in range(A.shape[0]): # number of rows
for j in range(A.shape[1]): # number of columns
print(A[i,j])
To make the code a bit more readable, we could write
num_rows = A.shape[0]
num_cols = A.shape[1]
for i in range(num_rows):
for j in range(num_cols):
print(A[i,j])
Consider this conceptual 2D array: In 2D_array.py
, write code to create the array, and then a nested loop to print the array so that the output has one row on each line, with two spaces between elements, as in:
10 12 15
6 8 10
2 -1 -5
-4 4 5
In 2D_array2.py
, use the same array above and structure a nested loop to compute the sum of elements down each column so that the output is:
Column 0 total is 14
Column 1 total is 23
Column 2 total is 25
About 2D arrays:
Consider the following program:
from drawtool import DrawTool
import numpy as np
dt = DrawTool()
dt.set_XY_range(0,10, 0,10)
dt.set_aspect('equal')
greypixels = np.array([ [50, 55, 60, 65, 70],
[100, 105, 110, 120, 125],
[150, 155, 160, 165, 170],
[200, 205, 210, 215, 220] ])
dt.set_axes_off()
dt.draw_greyimage(greypixels)
dt.display()
Type up the above in image_example.py
and download drawtool.py into the same folder. Run the program and examine the image output as compared to the input array greypixels
.
What is a greyscale image? By greyscale, we mean an image without colors, but rather shades of grey (often 256 shades). Consider this illustration showing an image on the left with a small part of it zoomed in:
Any digital image is really a 2D arrangement of small squares called pixels, in rows and columns (just like an array). In a greyscale image, each pixel is colored a shade of grey. In standard (“8-bit”) greyscale images, there are 256 shades of grey numbered 0 through 255, where 0 represents black and 255 represents white.
Now let’s go back to the code and examine what we wrote:
greypixels = np.array([ [50, 55, 60, 65, 70],
[100, 105, 110, 120, 125],
[150, 155, 160, 165, 170],
[200, 205, 210, 215, 220] ])
Let’s now work with an actual image:
from drawtool import DrawTool
import numpy as np
dt = DrawTool()
dt.set_XY_range(0,10, 0,10)
dt.set_aspect('equal')
greypixels = dt.read_greyimagefile('eniac.jpg')
# greypixels is a 2D array
dt.set_axes_off()
dt.draw_greyimage(greypixels)
dt.display()
# Add code to print the number of rows, number of columns
# Should print: rows = 189 columns = 267
Type up the above in image_example3.py
and download eniac.jpg. Add code to print the number of rows and number of columns.
Image formats:
.jpg
in eniac.jpg
) tells you the format.Let’s now modify a greyscale image:
from drawtool import DrawTool
import numpy as np
dt = DrawTool()
dt.set_XY_range(0,10, 0,10)
dt.set_aspect('equal')
greypixels = dt.read_greyimagefile('eniac.jpg')
greypixels2 = np.copy(greypixels)
num_rows = greypixels2.shape[0]
num_cols = greypixels2.shape[1]
lightness_factor = 10
for i in range(num_rows):
for j in range(num_cols):
value = greypixels[i,j] + lightness_factor
if value > 255:
value = 255
greypixels2[i,j] = value
dt.set_axes_off()
dt.draw_greyimage(greypixels2)
# To save an image, use the save_greyimage() function:
# dt.save_greyimage(greypixels2,'eniac-light.jpg')
dt.display()
Type up the above in image_example4.py
and try different values (in the range 10 to 100) of the lightness_factor
. There is more than one “correct” answer!
In image_example5.py
write code to create the “negative” of a greyscale image (black turns to white, white to black, light grey to dark grey, dark grey to light grey, and so on). For example, applying this to the eniac.jpg image should result in eniac-negative.jpg.
In a color image, each pixel will have a color instead of a “greyness” value.
Unfortunately, one cannot easily represent colors with a single number.
There are many ways of using multiple numbers to encode colors.
We’ll use the most popular one: specify the strengths of the three primary colors (Red, Green, Blue).
This approach is so popular that we refer to it simply as “RGB.””
The “amount” of red is a number between 0 and 255, the amount of green is another such number, as is the amount of blue.
Each color is therefore a group of three numbers, for example:
(255,0,0)
is all red (no green, no blue) (0,255,0)
is all green (no red, no blue) (0,0,255)
is all blue (no red, no green) Let’s try a few more:
(255,255,0)
(100,255,255)
(200,200,200)
(grey is R,G,B all equal) (0,0,0)
(255,255,255)
is white When each pixel needs three numbers and there’s a grid of pixels,how do we store the numbers?
Let’s look at an example:
from drawtool import DrawTool
import numpy as np
dt = DrawTool()
dt.set_XY_range(0,10, 0,10)
dt.set_aspect('equal')
pixels = np.array(
[ [ [255,0,0], [200,0,0], [150,0,0], [50,0,0] ],
[ [255,50,0], [200,100,0], [150,150,0], [50,200,0] ],
[ [255,50,50], [200,100,100], [150,150,150], [50,200,200] ],
[ [0,50,50], [0,100,100], [0,150,150], [0,200,200] ],
[ [0,0,50], [0,0,100], [0,0,150], [0,0,200] ]
])
dt.set_axes_off()
dt.draw_image(pixels)
dt.display()
Let’s point out the structure inherent in the above 3D array:
Next, let’s work with actual color images with an example application: converting color to greyscale.
from drawtool import DrawTool
import numpy as np
dt = DrawTool()
dt.set_XY_range(0,10, 0,10)
dt.set_aspect('equal')
# The image file is expected to be in the same folder
pixels = dt.read_imagefile('washdc.jpg')
num_rows = pixels.shape[0]
num_cols = pixels.shape[1]
greypixels = dt.make_greypixel_array(num_rows, num_cols)
for i in range(num_rows):
for j in range(num_cols):
# Average of red/green/blue
avg_rgb = (pixels[i,j,0] + pixels[i,j,1] + pixels[i,j,2]) / 3
# Convert to int
value = int(avg_rgb)
greypixels[i,j] = value
dt.set_axes_off()
dt.draw_greyimage(greypixels)
# Notice: saving to a different image format (PNG):
dt.save_greyimage(greypixels, 'washdc-grey.png')
dt.display()
Type up the above in color_example3.py
and download washdc.jpg. Compare the file size of the original versus the new greyscale file you created.
Next, consider the following program:
from drawtool import DrawTool
import numpy as np
dt = DrawTool()
dt.set_XY_range(0,10, 0,10)
dt.set_aspect('equal')
pixels = dt.read_imagefile('washdc.jpg')
num_rows = pixels.shape[0]
num_cols = pixels.shape[1]
for i in range(num_rows):
for j in range(num_cols):
if ( (pixels[i,j,1] > pixels[i,j,0])
and (pixels[i,j,2] < 0.5*pixels[i,j,1]) ):
pixels[i,j,0] = 0
pixels[i,j,1] = 0
pixels[i,j,2] = 255
dt.set_axes_off()
dt.draw_image(pixels)
dt.display()
Type up the above in color_example4.py
. You already have washdc.jpg.
What did we just do?
In color_example5.py
, try to add additional rules to capture the remaining greenery (the trees) and set those pixels to blue as well.
Slicing can be applied to arrays in the same way that we used them earlier for lists with one major difference, as we’ll point out. For example:
import numpy as np
print('list slicing')
A = [1, 4, 9, 16, 25, 36]
B = A[1:3] # B has [4, 9]
print(B)
B[0] = 5 # B now has [5, 9]
print(B)
print(A) # What does this print?
print('array slicing')
A = np.array( [1, 4, 9, 16, 25, 36] )
B = A[1:3] # B "sees" [4, 9]
print(B)
B[0] = 5 # What happens now?
print(B)
print(A) # What does this print?
Type up the above in slicing_example.py
. Did you get the results you expected?
The slicing expression 1:3
in A[1:3]
refers to all the elements from position 1 (inclusive) to just before position 3 (so, not including position 3). With lists, a new list is created with these elements:
Here, array B refers to the segment (that’s still in A) from positions 2 to 3. This is why, if you make a change to B, you are actually changing A. Why did the authors of NumPy do this?
Slicing is a big sub-topic so we will point out a few useful things via an example:
# Color image:
A = np.array(
[ [ [255,0,0], [200,0,0], [150,0,0], [50,0,0] ],
[ [255,50,0], [200,100,0], [150,150,0], [50,200,0] ],
[ [255,50,50], [200,100,100], [150,150,150], [50,200,200] ],
[ [0,50,50], [0,100,100], [0,150,150], [0,200,200] ],
[ [0,0,50], [0,0,100], [0,0,150], [0,0,200] ]
])
B = A[4:5,:,: ] # The fifth row
print(B)
C = A[:,1:2,:] # The second column
print(C)
D = A[:3,:2,:] # The pixels in the first three rows and first two columns
print(D)
B = A[4:5,:,:]
D = A[:3,:2,:]
Let’s apply slicing to creating a cropped image:
In slicing_example2.py
change the cropping so that the Washington Monument shows up centered in your cropped image, with little else around it.
Suppose we want to write a function that computes both the square and cube of a number. One option is to write two separate functions:
We can alternatively write one function that computes and returns two things:
def square_cube(x):
square = x*x
cube = square*x
return (square, cube)
x = 5
(y, z) = square_cube(x)
print(x, y, z)
return (square, cube)
.(y, z) = square_cube(x)
We can go beyond a pair to any number of such “grouped” variables:
def powers(x):
square = x*x
cube = square*x
fourth = cube*x
fifth = fourth*x
return (square, cube, fourth, fifth) # returns four grouped variables
x = 5
(a, b, c, d) = powers(x) # expects four grouped variables from the function
print(x, a, b, c, d)
Such a grouping of variables is called a tuple.
Tuples are similar to lists in many ways, but different in one crucial aspect: tuples are immutable.
First, let’s examine how to write the same square_cube()
function above but using lists:
def square_cube_list(x):
square = x*x
cube = square*x
return [square, cube]
x = 5
[y, z] = square_cube_list(x)
print(x, y, z)
This works just fine. Another way in which a tuple is like a list is in using square-brackets and position indices to access individual elements, as in:
# List version:
L = square_cube_list(x)
print(L[0], L[1]) # L[0] has the square, L[1] has the cube
# Tuple version:
t = square_cube(x)
print(t[0], t[1]) # t[0] has the square, t[1] has the cube
However, here’s the difference:
# List version:
L = square_cube_list(x)
L[0] = 0 # This is allowed
# Tuple version:
t = square_cube(x)
t[0] = 0 # This is NOT allowed
Thus, you can replace a list element but you cannot replace a tuple element. This is in fact a bit subtle, as this example shows:
Once the tuple is instantiated (that’s the technical term for “made”) then the tuple’s value cannot be changed. You can of course assign a different tuple value to a tuple variable as in:
Here, we’re simply replacing one fixed-value tuple with another. Hence, tuples are immutable (so are strings).
Why use tuples at all? It’s to allow programmers to signal clearly that their tuples shouldn’t be changed. This turns out to be convenient for mathematical tuples (like points on a graph), which are similar.
Groups of tuples can be combined into lists and other data structures. It’s very useful in working with points (the mathematical point you draw with coordinates) and other mathematical structures that need more than one number to describe.
For example, here’s a program that, given a list of points, finds the leftmost point (the one with the smallest x value).
def leftmost(L):
leftmost_guess = L[0]
for q in L:
if q[0] < leftmost_guess[0]:
leftmost_guess = q
return leftmost_guess
list_of_points = [(3,4), (1,2), (3,2), (4,3), (5,6)]
(x,y) = leftmost(list_of_points)
print('leftmost:', (x,y) )
leftmost: (1, 2)
Trace through several iterations of the above, then type in tuple_example.py
to confirm the trace.
Consider the following:
import math
def distance(p, q):
return math.sqrt( (p[0]-q[0])**2 + (p[1]-q[1])**2 )
def find_closest_point(p, L):
# Write your code here to find the closest point in L to p
# and to return both the point and the distance to it from p.
list_of_points = [(3,4), (1,2), (3,2), (4,3), (5,6)]
query_point = (5,4)
(c, d) = find_closest_point(query_point, list_of_points)
print('closest:',c,' at distance', d)
# Should print:
# closest: (4, 3) at distance 1.4142135623730951
In tuple_example2.py
, fill in the missing code to find the closest point in a list of points to a given query point. Return both the closest point and the distance to the query point as a tuple.
Consider this problem: We have a data file that looks like this:
This might represent, for example, a record of sales at a fruit stand. We’d like to count how many of each fruit. One way would be to define a counter for each kind:
num_apples = 0
num_bananas = 0
num_pears = 0
num_kiwis = 0
num_oranges = 0
with open('fruits.txt','r') as data_file:
line = data_file.readline()
while line != '':
fruit = line.strip()
if fruit == 'apple':
num_apples += 1
elif fruit == 'banana':
num_bananas += 1
elif fruit == 'pear':
num_pears += 1
elif fruit == 'kiwi':
num_kiwis += 1
elif fruit == 'orange':
num_oranges += 1
else:
print('unknown fruit:', fruit)
line = data_file.readline()
print('# apples:', num_apples)
print('# bananas:', num_bananas)
print('# pears:', num_pears)
print('# kiwi:', num_kiwis)
print('# oranges:', num_oranges)
Type up the above in fruits.py
and use the data file fruits.txt to confirm. Next, in fruits2.py
, change the program to accommodate the additional fruits in fruits2.txt.
Aside from being tedious, this approach has other issues:
- One would like to be able to write a general program that does not need to know which fruits are in a file. - What if there were a thousand different kinds of items? - A single mistake in a variable can cause the counts to be wrong.
Fortunately, the use of dictionaries will make it easy:
# Make an empty dictionary
counters = dict()
with open('fruits.txt','r') as data_file:
line = data_file.readline()
while line != '':
fruit = line.strip()
if fruit in counters.keys():
# If we've seen the fruit before, increment.
counters[fruit] += 1
else:
# If this is the first time, set the counter to 1
counters[fruit] = 1
line = data_file.readline()
print(counters)
Type up the above in fruits3.py
and first apply it to fruits.txt and then to fruits2.txt.
In this case, we’re associating
d = {'apple': 3, 'banana': 3, 'pear': 1, 'kiwi': 2, 'orange': 4}
print(d['apple']) # Prints 3
d['banana'] = 0
# Which changes the value associated with 'banana' to 3
The above is an example of a dictionary that’s already built (after we’ve processed the data). To process data on-the-fly, we need an additional operation that an English dictionary does not really have: we need to be able to add something that’s not already there. To add a new key, we simply use it as an index:
With this understanding we can now revisit the code in the fruit example.
# Make an empty dictionary
counters = dict()
with open('fruits.txt','r') as data_file:
line = data_file.readline()
while line != '':
fruit = line.strip()
if fruit in counters.keys():
# If we've seen the fruit before, increment.
counters[fruit] += 1
else:
# If this is the first time, set the counter to 1
counters[fruit] = 1
line = data_file.readline()
print(counters)
if fruit in counters.keys():
checks to see if we have seen the fruit before by seeing if value of fruit
is present as a key in the dictionary
counters[fruit] += 1
increments the count for that fruit by 1else
statement), counters[fruit] = 1
adds a new key to the dictionary for that fruit, with value equal to 1 (since it has been seen for the first time)In stopwords2.py
, write code that uses a dictionary to compare the relative occurrence of stopwords in alice.txt and darwin.txt. These texts are Lewis Carroll’s Alice In Wonderland and Charles Darwin’s On the Origin of Species (the complete work for both cases). You will also need wordtool.py and wordsWithPOS.txt.
In particular, we’d like to know: what percentage of the stopword occurence can be attributed to ‘the’, what percentage to ‘and’, and so on.
Here’s some starter code:
import wordtool as wt
# The 25 most common stopwords
stopwords = {'the': 0,'be': 0,'to': 0,'of': 0,'and': 0,'a': 0,'in': 0,'that': 0,
'have': 0,'I': 0,'it': 0,'for': 0,'not': 0,'on': 0,'with': 0,'he': 0,
'she': 0,'you': 0,'do': 0,'as': 0,'at': 0,'his': 0,'her': 0,'they': 0,
'by': 0}
wt.open_file_byword('alice.txt')
s = wt.next_word()
'''
your code goes here
update the values of the dictionary based
on the occurrences of the stopwords in the text
'''
for key in stopwords.keys():
print('Stopword "' + key + '" is ' + str(stopwords[key]) + '% of stopwords')
For alice.txt
the output should look like this (not necessarily in the same order) using the string formatting from the previous section.
Compare the outputs for the two texts: do the stopwords occur with similar relative frequencies across the two texts?
Unit 2 will lightly sketch a few advanced topics to introduce ideas and show some examples, without expecting mastery of all the details.
You might ask: what’s left in Python to learn?
Quite a lot. - Like many modern programming languages, Python is large enough that one needs a few courses to experience all of it. - Some concepts are advanced enough to need weeks to cover (example: objects). - Others need a background in data structures to understand how they work (example: dictionaries). - Yet others involve library functions and external packages.
Do you need to learn more? Is what you’ve learned enough to put Python to work? We’ll have more to say about this shortly.
Full credit is 100 pts. There is no extra credit.
Write a function third_shortest
that takes as input a string and returns the third-shortest word length. Words in the string will be separated by spaces. Do not include punctuation.
As an example, third_shortest('You should not judge, you should understand.')
returns 6
. (Note that by removing punctuation, judge
is counted as length 5.)
Hint: Use The np.min
function, the str.split
function, and dicts.
Submit as third_shortest.py
.
Write a function binner
that takes as input a list of numbers and:
For instance, [1, 1, 1, 6]
would have these five bins: 1-2, 2-3, 3-4, 4-5, 5-6. There are three numbers in the first bin, none in the second, third, and fourth, and one number in the last. (If a number falls exactly between two bins, it can go in either of these two, but not both.)
So: binner([1, 1, 1, 6])
would return: {'1.0-2.0': 3, '2.0-3.0': 0, '3.0-4.0': 0, '4.0-5.0': 0, '5.0-6.0': 1}
.
You can call the keys for the bins whatever you like, as long as the values are correct.
Submit as binner.py
.