Note 4: Strings and Lists

Reading: Think Python Chapters 8, 10

Strings

You have used strings for some basic operations: creating them, concatenating two strings with +, and printing them. Strings in Python can do substantially more than that.

Strings As Sequences

  • The shortest possible string is the empty string: ''
  • The next-shortest possible string is a single character
    • 'a' is a character
    • 'A' is a different character
    • ' ' (space) is also a character
  • Longer strings consist of multiple characters
  • We can get the length of a string by calling the built-in function len on the string:
len("")
0
len(' ')
1
len("same")
4

Indexing

We can also retrieve individual characters from a string, based on the position of the character.

Counting From 0

Counting in Python starts at 0. Just as the range function starts at 0 by default, the first character of a string is at position 0.

"same"[0]
's'
"same"[1]
'a'

Pay attention to the syntax: the string, followed by an integer in square brackets. The square brackets indicate we are indexing the string; the integer is the position.

s = "meals"
s[2]
'a'

Note the variable assignment and indexing on the variable name.

b = "fast"
c = b[2]
print(c)
s

Strings can also be sliced: rather than retrieving a single character, a substring is retrieved. Slicing uses two or three integers separated by colons: a start position, a stop-before position, and an optional spacing. This is very similar to range.

"thoughtful"[0:7]
'thought'
'kilogram'[2:5]
'log'
"tadpole"[2:8:2]
'doe'
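
The parallel with range can be checked directly. As a minimal sketch (the names word, positions, and picked are illustrative, not from the original notes), the positions a slice visits are the same integers that range produces from the same three arguments:

```python
word = "tadpole"

# range(2, 8, 2) produces the positions 2, 4, 6
positions = list(range(2, 8, 2))

# Collect the characters at those positions one at a time
picked = ""
for i in positions:
    picked = picked + word[i]

print(positions)      # [2, 4, 6]
print(picked)         # doe
print(word[2:8:2])    # doe
```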

Master the basic slicing syntax first.

We will use a simple list for these examples:

x = [1, 2, 3, 4, 5, 6, 7, 8]
  • A slice can leave a value to one side of the colon blank, such as x[1:] or x[:5]
    • Leaving the left side blank starts the slice at the first element
    • x[:5] yields [1, 2, 3, 4, 5]
    • Leaving the right side blank continues the slice through the last element
    • x[2:] yields [3, 4, 5, 6, 7, 8]
  • Using a negative index in a slice will index from the end.
    • -1 is the last element, -2 is the second-to-last, etc.
    • This can be used for single values: x[-2] yields 7
    • x[2:-2] yields [3, 4, 5, 6] (the second-to-last element, 7, is not included)
    • x[-5:] yields [4, 5, 6, 7, 8]
  • Using a second colon slices using an interval greater than one:
    • x[1:5:2] yields [2,4]
    • A negative interval steps backwards through the list, from the end toward the beginning
    • Going backwards includes the starting value and doesn’t include the stopping value.
    • x[3:0:-1] yields [4, 3, 2]
    • x[1:5:-1] yields [] - an empty list - going backwards from 1 already starts out after the stopping point!
    • x[::-1] yields [8, 7, 6, 5, 4, 3, 2, 1]
    • x[::] yields [1, 2, 3, 4, 5, 6, 7, 8] - but it’s faster to just use x 🙃

Expressions

Expressions involving strings conform to the same rules we have seen for expressions: they are components that can be used to compose larger expressions and assign values to variables.

  • len returns an integer and can be used wherever integers are used
s = 'parhelion'
s[len(s)-4:len(s)]
'lion'
  • A string slice is an expression that results in a string
s[3:6] + s[5] + s[7]
'hello'

Methods

Strings contain built-in functions associated with the string. These associated functions are called methods.

An example is the <str>.upper() method. Note the syntax: <str> indicates the method is available for any string:

'maybe'.upper()
'MAYBE'

<str>.lower() is similar:

a_string = "Maybe?"
a_string.lower()
'maybe?'

<str>.find() takes an argument:

a_string.find('e')
4

Like everything else we’ve seen, these methods can compose expressions. What’s happening here?

a_string[a_string.find('e') + 1]
'?'

Iterating Through Strings

  • We have learned how to index individual characters of a string using positions
    • Those positions are integers
  • We have learned how to iterate over integers with for and while loops

We can combine these techniques to iterate over characters in a string:
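
The original example is not reproduced here. A minimal sketch, assuming a for loop over range(len(s)) and an illustrative accumulator named rebuilt, might look like this:

```python
s = "meals"

rebuilt = ""
# i takes the values 0, 1, 2, 3, 4: every valid position in s
for i in range(len(s)):
    print(i, s[i])
    rebuilt = rebuilt + s[i]   # build the string back up, one character at a time

print(rebuilt)   # meals
```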

All of the syntax here is syntax you have already learned!


We could also do this with a while loop:
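
Again as a sketch rather than the original example (the names i and rebuilt are illustrative), the while version manages the position variable by hand:

```python
s = "meals"

i = 0                  # start at the first position
rebuilt = ""
while i < len(s):      # stop once i runs past the last position
    print(i, s[i])
    rebuilt = rebuilt + s[i]
    i = i + 1          # move to the next position

print(rebuilt)         # meals
```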

What is shown above steps the loop variable through integer position values and is often called “value iteration.”

It’s also possible to iterate directly over the characters in a string. This can only be done with a for loop, and is called “content iteration” (because it iterates through the string’s content).

The syntax is very simple:

for j in s:
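
A complete loop built on that header could look like the following sketch. The string and the counter variable count are illustrative choices:

```python
s = "meals"

count = 0
for j in s:        # j is each character in turn: 'm', 'e', 'a', 'l', 's'
    print(j)
    count = count + 1

print(count)       # 5
```

Note that j is a character here, not a position, so there is no indexing inside the loop.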

Mutability

Strings in Python cannot be changed. They are immutable.

We have seen string concatenation:

s = "allow"
s[0] + "new"
'anew'

The result of concatenation is a new string.

If we were to try to reassign an individual character of a string, we would get an error:

s[0] = "e"
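
The error raised is a TypeError. As a sketch, assuming we want the program to keep running so the result can be inspected, the assignment can be wrapped in try/except (a construct these notes have not introduced; the name message is illustrative):

```python
s = "allow"

try:
    s[0] = "e"               # strings do not support item assignment
except TypeError as err:
    message = str(err)
    print("TypeError:", message)

print(s)                     # allow (the string is unchanged)
```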

Instead of changing a string, we can compose a new string using characters from the old string:

e = "mole"
e[0:len(e)-1] + 'ar'
'molar'

in

The in keyword between two strings will return True or False depending on whether the first string is in the second string:

'I' in 'team'
False
'u' in 'truth'
True

This is a very different use of in than you have seen in for loops. Trace through this example to see the difference. Note how the conditional is nested inside the loop!
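
The example referred to here is not reproduced. A hypothetical stand-in (the names vowels, word, and found are assumptions) that uses in both ways, with the conditional nested inside the loop:

```python
vowels = "aeiou"
word = "truth"

found = ""
for letter in word:          # 'in' as part of a for loop: visits each character
    if letter in vowels:     # 'in' as a test: evaluates to True or False
        found = found + letter
        print(letter, "is a vowel")

print(found)                 # u
```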

Bin, Oct, Hex

Because computers represent all numbers in memory as binary, it’s common to work with numbers in binary format (base 2), as well as octal (base 8) and hexadecimal (base 16) formats:

  • Binary uses digits 0 and 1
  • Octal uses 0, 1, 2, 3, 4, 5, 6, and 7
  • Hex uses 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, a, b, c, d, e, and f (‘a’ is 10 in base 10, ‘f’ is 15 in base 10, etc.)

Binary (bin), octal (oct), and hexadecimal (hex) are often distinguished from decimal (base 10) with a prefix:

  • 0b is used for binary
  • 0o is used for octal
  • 0x is used for hex

Python can convert from base-10 int to these other formats using bin, oct, and hex functions. The result is a string that includes the prefix:

x = bin(25)
print(x)
print(type(x))
0b11001
<class 'str'>
y = hex(421)
print(y)
print(type(y))
0x1a5
<class 'str'>

To convert back, use the int function with the base as its second parameter:

z = int('0b11011', 2)
print(z)
27
b = int('0x1beef2beef', 16)
print(b)
119973002991
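
The oct function follows the same pattern. A short round-trip sketch, with illustrative variable names:

```python
n = 25

as_octal = oct(n)              # '0o31'  (3*8 + 1 = 25)
back = int(as_octal, 8)        # 25

print(as_octal, type(as_octal))
print(back, type(back))
```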

Lists

  • Lists are sequences.
  • Lists can contain a mix of different types
  • Much of what you learned about strings as sequences is also true of lists.
  • Lists are delimited with square brackets:
L = [2, 4, 'resonant', -1.2]
  • Lists are indexed with square brackets, identical to string indexing:
L[2]
'resonant'
L[1:3]
[4, 'resonant']
  • Unlike strings, lists are mutable. Values can be assigned into lists:
L[1] = 'dampened'
L
[2, 'dampened', 'resonant', -1.2]

The len function also works for lists:

len(L)
4

Operators, Functions, Methods

Operators work with lists somewhat similarly to how they work with strings:

  • + between two lists concatenates the lists and returns a new list
[4, "3"] + [2, 1]
[4, '3', 2, 1]
  • A list can be “multiplied” with an integer using *
[0] * 6
[0, 0, 0, 0, 0, 0]

Lists have built-in methods, similar to strings. One of the most useful of these is <list>.append()

fruit = ['apple', 'pear', 'mango']
fruit.append('tomato')
fruit
['apple', 'pear', 'mango', 'tomato']

<list>.sort() will sort numeric values as you would expect:

# numerical_sort.py
number_list = [1, 5, 6, 10, 2]
number_list.sort()
print(number_list)
[1, 2, 5, 6, 10]

It sorts strings and characters a bit differently:

# alpha_sort.py
alpha_list = ['apple','Banana', 'Pear', 'mango', ' CHICKEN']
alpha_list.sort()
print(alpha_list)
[' CHICKEN', 'Banana', 'Pear', 'apple', 'mango']

What happened? In Python, every character is associated with an integer. For the characters used here, this mapping is known as ASCII (the full mapping Python uses, Unicode, extends it). The functions ord() and chr() convert between characters and numbers.

ord('A')
65
ord('b')
98
ord('@')
64
chr(99)
'c'

Comparing and sorting strings/characters uses these values. You will never need to memorize them.

Iterating

Iterating through lists is very similar to iterating through strings:
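
The original example is not reproduced here. A minimal sketch of both styles applied to a list (the name total_letters is illustrative):

```python
fruit = ['apple', 'pear', 'mango']

# Value iteration: loop over positions, then index the list
for i in range(len(fruit)):
    print(i, fruit[i])

# Content iteration: loop directly over the elements
total_letters = 0
for item in fruit:
    print(item)
    total_letters = total_letters + len(item)

print(total_letters)   # 14
```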


There are some differences, largely because lists are mutable:

Why was no 8 printed?
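
The example this question refers to is not reproduced here. As a hypothetical illustration only (the list and names below are assumptions, not the original example), removing an element while a loop is running shifts the remaining elements left, so the loop can skip one:

```python
numbers = [6, 7, 8, 9]
printed = []

for n in numbers:
    if n == 7:
        numbers.remove(n)    # the list becomes [6, 8, 9]; 8 slides into 7's old position
    else:
        print(n)
        printed.append(n)

print(printed)   # [6, 9]
print(numbers)   # [6, 8, 9]
```

In this sketch 8 is still in the list, but the loop had already moved past its new position, so it was never visited.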


Very importantly, when you iterate directly through the elements of a list with content iteration, the looping variable is a temporary variable that is assigned each list element in turn. Assigning a new value to that temporary variable does not change the list!

On the other hand, using value iteration, you can directly access and change list elements.
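
The contrast can be seen side by side. This is a sketch with illustrative names (values, after_content), not the original example:

```python
values = [1, 2, 3]

# Content iteration: v is a temporary variable; reassigning it leaves the list alone
for v in values:
    v = v * 10
after_content = values[:]        # copy of the list at this point

# Value iteration: values[i] names the list element itself
for i in range(len(values)):
    values[i] = values[i] * 10

print(after_content)   # [1, 2, 3]
print(values)          # [10, 20, 30]
```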

References

When a ‘primitive’ type is assigned to a variable, the value is assigned directly to the variable. When a list is assigned to a variable, the variable becomes a reference to the list. If two variables are assigned to the same list, changing one variable effectively changes the other, since you are changing the list that both variables reference.

Here’s an example:
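
The example referred to here is not reproduced. A minimal sketch, with illustrative names first, second, n, and m, might be:

```python
first = [1, 2, 3]
second = first          # second references the same list; no copy is made

second[0] = 'changed'   # change the list through one variable...

print(first)            # ['changed', 2, 3]  ...and the other sees it too
print(second)           # ['changed', 2, 3]

n = 5
m = n
m = m + 1               # assigns m a new value; n still holds 5
print(n, m)             # 5 6
```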

You can check if two variables reference the same list with the is keyword:

a = [1, 2]
b = [1, 2]
c = a
a == b
True
a is b
False
a is c
True

The == operator between lists checks if the contents of two lists are the same; the is keyword checks if two variables reference the same list. This is a subtle but important difference.

Lists and Functions

Similarly, passing a list as an argument to a function passes a reference to this list. If your function modifies the referenced list inside the function, the list is modified outside the function as well.
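
The original add_to_list example is not reproduced here. A minimal sketch, assuming the function takes a single list parameter and appends a fixed value (the body, the parameter name some_list, and the groceries list are all assumptions):

```python
def add_to_list(some_list):
    # Modifies the list the caller passed in; there is no return statement
    some_list.append('added')

groceries = ['milk', 'eggs']
result = add_to_list(groceries)

print(groceries)   # ['milk', 'eggs', 'added']
print(result)      # None
```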

Note how the add_to_list function modified the list without returning anything.

Practice

Practice Problem 4.1

Write a function one_star that takes one argument, a string.

Your function should return a new string, identical to the argument but with the second character changed to the '*' character.

  • one_star('vote') returns 'v*te'
  • one_star('12345') returns '1*345'

Practice Problem 4.2

Write a function many_stars that takes one argument, a string.

Your function should return a new string, identical to the argument but with every character except the first and last changed to the '*' character.

  • many_stars('vote') returns 'v**e'
  • many_stars('12345') returns '1***5'

Practice Problem 4.3

Write a function shorter_string that takes two arguments, both strings, and returns whichever string is shorter, or the first string if both are the same length.

  • shorter_string('too much', 'many') returns 'many'
  • shorter_string('same', 'okay') returns 'same'
  • shorter_string('vowel', 'consonant') returns 'vowel'

Practice Problem 4.4

Write a function shared_character that takes two arguments, both strings, and returns True if the strings have at least one character in common, or False if they have no characters in common.

  • shared_character('true', 'story') returns True
  • shared_character('no', 'dice') returns False
  • shared_character('blue', 'lagoon') returns True
  • shared_character('yes?', 'how?') returns True

Practice Problem 4.5

Write a function appendix that takes as argument a list, appends integer 9 to the list, and returns the list.

  • appendix(['cat', 'dog']) should return ['cat', 'dog', 9]

Practice Problem 4.6

Write a function round_floats that takes as argument a list of floats. The function should create a new list, consisting of the elements of the argument list, rounded to the nearest integer. Return the new list.

  • round_floats([1.2, 1.7, 3.1]) should return [1, 2, 3]
  • round_floats([-1.2, 2.4, 2.5]) should return [-1, 2, 3]

Practice Problem 4.7

Write a function round_only_floats that takes as argument a list. The function should create a new list, consisting of the elements of the argument list, with any floats rounded to the nearest integer, and other elements (which are not floats) unchanged. Return the new list.

  • round_only_floats([1.2, 1.7, 3.1]) should return [1, 2, 3]
  • round_only_floats([-1.2, 'clouds', 2.5]) should return [-1, 'clouds', 3]
  • round_only_floats(['mice', 'rats']) should return ['mice', 'rats']

Practice Problem 4.8

Write a function positives that takes as argument a list of ints. The function should create a new list, consisting of the positive elements of the argument list, in the same order. Return the list.

  • positives([3, -1, 2]) should return [3, 2]
  • positives([-4, 5, -2, 5]) should return [5, 5]
  • positives([-1, -8]) should return [] (an empty list)

Practice Problem 4.9

Write a function total_length that takes as argument a list of strings, and returns the total length of all the strings combined.

  • total_length(['cat', 'dog']) should return 6
  • total_length(['house', 'shed']) should return 9
  • total_length(['3', '2']) should return 2

Homework

  • Homework problems should always be your individual work. Please review the collaboration policy and ask the course staff if you have questions. Remember: Put comments at the start of each problem to tell us how you worked on it.

  • Double check your file names and printed output. These need to be exact matches for you to get credit.

  • For this homework, don’t use any built-in functions that find maximum, find minimum, or sort. Also don’t use built-in string methods split, replace, or find.

Homework Problem 4.1

Homework Problem 4.1 (25 pts)

Write a function shared_stem that takes two arguments, both strings, and checks whether the beginnings of the strings are identical. Whatever portion is identical is returned, stopping as soon as any characters are not identical.

  • shared_stem('look', 'lot') returns 'lo'
  • shared_stem('meeting', 'memory') returns 'me'
  • shared_stem('late', 'later') returns 'late'
  • shared_stem('poor', 'Yorick') returns '' (an empty string)

Hint: One approach involves iterating over the length of the shorter string, checking both strings to see if the characters are identical, continuing until a non-identical character is found.

Submit as shared_stem.py.

Homework Problem 4.2

Homework Problem 4.2 (25 pts)

Write a function remove_vowels that takes one argument, a string, and returns a new string with all vowels (lowercase or uppercase: a, e, i, o, u, A, E, I, O, U) removed.

  • remove_vowels('made') returns 'md'
  • remove_vowels('valid') returns 'vld'
  • remove_vowels('HANDLE') returns 'HNDL'
  • remove_vowels('hymn') returns 'hymn'

Submit as remove_vowels.py.

Homework Problem 4.3

Homework Problem 4.3 (25 pts)

Write a function alpha_cat that takes two arguments, both strings, and returns a single string consisting of the two input strings concatenated in alphabetical order, separated by a space. The input strings will always consist of only lowercase letters.

  • alpha_cat('two', 'three') returns 'three two'
  • alpha_cat('aa', 'aaaa') returns 'aa aaaa'

Hint: The comparison operators > and < between strings check for alphabetical order for lowercase letters.

Submit as alpha_cat.py.

Homework Problem 4.4

Homework Problem 4.4 (25 pts)

Write a function before_comma that takes one argument, a string. If the string contains any commas, return the portion of the string prior to the comma (or if there are multiple commas, before the first comma). If the string contains no commas, simply return the string.

  • before_comma("okay, sure") returns 'okay'
  • before_comma("Thanks, I think.") returns 'Thanks'
  • before_comma("no commas here") returns 'no commas here'

Hint: You can check for equality between two strings with ==.

Submit as before_comma.py.

Homework Problem 4.5

Homework Problem 4.5 (20 pts)

Write a function list_reverse that takes as argument a list, and returns a list with the same elements, in reversed order.

  • list_reverse([2, 3, 5]) returns [5, 3, 2]
  • list_reverse(["maple", "cherry", "ash", "oak"]) returns ["oak", "ash", "cherry", "maple"]
  • list_reverse([]) returns [] (an empty list)

Submit as list_reverse.py.

Homework Problem 4.6

Homework Problem 4.6 (20 pts)

Write a function filter_less that takes as argument (1) a list of integers and (2) an integer and returns a list containing only the integers from the input list that are less than or equal to the provided integer. The order of elements in the returned list should be the same as their order in the input list.

  • filter_less([0, 2, 4, 6], 6) returns [0, 2, 4, 6]
  • filter_less([1, 2, 3, 4, 5], 1) returns [1]
  • filter_less([700], 699) returns []

Submit as filter_less.py.

Homework Problem 4.7

Homework Problem 4.7 (20 pts)

Write a function numbers_first that takes as argument a list of numbers and/or strings, and returns a list with the same contents, but all numbers come before all strings. Original order should otherwise be preserved.

(“Numbers” here are either ints or floats.)

  • numbers_first([1, 3.0, 'soap', 0]) returns [1, 3.0, 0, 'soap']
  • numbers_first(['brooms', 4, 'towels']) returns [4, 'brooms', 'towels']
  • numbers_first([15, 2, 'sponge']) returns [15, 2, 'sponge']
  • numbers_first([1, 2, 0]) returns [1, 2, 0]
  • numbers_first(['three', 'two']) returns ['three', 'two']

Submit as numbers_first.py.

Homework Problem 4.8

Homework Problem 4.8 (20 pts)

Write a function zip_finder that takes as argument a list. Exactly one element of the list will be a zip code: a positive five-digit integer. Return that integer.

Type hint: the return value should be an integer

  • zip_finder([1, 4, 'Alan', 20052]) returns 20052
  • zip_finder([False, 20003, -5, "DC"]) returns 20003
  • zip_finder(["Carl", 854.22, 10001.2, -2000, 86004]) returns 86004

Submit as zip_finder.py.

Homework Problem 4.9

Homework Problem 4.9 (20 pts)

Write a function flip_smallest that takes as argument a list of strings. Find the shortest of these strings and reverse it.

Your function should modify the argument list in place and does not need to return anything. You can assume there will only be one “shortest” string in each list, and that the shortest string will be at least two characters in length.

Examples:

a = ['easy', 'to', 'say']
flip_smallest(a)
print(a)
['easy', 'ot', 'say']
b = ['contains', 'plastic', 'parts']
flip_smallest(b)
print(b)
['contains', 'plastic', 'strap']
c = ['some', 'kinds', 'are', 'easier', 'than', 'others']
flip_smallest(c)
print(c)
['some', 'kinds', 'era', 'easier', 'than', 'others']

Submit as flip_smallest.py.

Hint: One approach is to work in two stages:

  • Find the shortest string:
    • Visit each element in the list
    • Use the len() function to check lengths
    • Use a variable to remember the shortest length you have found
    • Use a second variable to remember the index associated with that length
    • Update both variables if you find a new shorter length
  • Now that you know where the string is (the index you found):
    • Create a new string, the reverse of the original
    • Assign it into the list in the correct position