Week 7: Strings

Reading: Think Python Chapter 8

Notes

You have used strings for some basic operations: creating them, concatenating two strings with +, and printing them. Strings in Python can do substantially more: they are sequences of characters that can be measured, indexed, sliced, searched, and iterated over.

Strings As Sequences

  • The shortest possible string is the empty string: ''
  • The next-shortest possible string is a single character
    • 'a' is a character
    • 'A' is a different character
    • ' ' (space) is also a character
  • Longer strings consist of multiple characters
  • We can get the length of a string by calling the built-in function len on the string:
len("")
0
len(' ')
1
len("same")
4

Indexing

We can also retrieve individual characters from a string, based on the position of the character.

Counting From 0

Counting in Python starts at 0. Just as the range function starts at 0 by default, the first character of a string is at position 0.

"same"[0]
's'
"same"[1]
'a'

Pay attention to the syntax: the string, followed by an integer in square brackets. The square brackets indicate we are indexing the string; the integer is the position.

s = "meals"
s[2]
'a'

Note the variable assignment and indexing on the variable name.

b = "fast"
c = b[2]
print(c)
s
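Because counting starts at 0, the last character of a string sits at position len(s) - 1, not len(s). A minimal sketch of this, using the names first and last purely for illustration:

```python
s = "meals"

first = s[0]              # position 0 is the first character
last = s[len(s) - 1]      # the last position is one less than the length

print(first)   # m
print(last)    # s
```

Indexing at s[len(s)] would raise an IndexError, because there is no character at that position.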

Strings can also be sliced: rather than retrieving a single character, a substring is retrieved. Slicing uses two or three integers: a start position, a stop-before position, and a spacing. This is very similar to range.

"thoughtful"[0:7]
'thought'
'kilogram'[2:5]
'log'
"tadpole"[2:8:2]
'doe'
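To make the comparison with range concrete, here is a small sketch (the names word, positions, and picked are made up for illustration). It collects the positions that range(2, 8, 2) produces and shows that indexing at those positions gives the same characters as the slice:

```python
word = "tadpole"

# The positions visited by range(2, 8, 2) are 2, 4, and 6.
positions = list(range(2, 8, 2))

# Build a string from the characters at those positions.
picked = ""
for i in positions:
    picked = picked + word[i]

print(positions)      # [2, 4, 6]
print(picked)         # doe
print(word[2:8:2])    # doe
```

One difference worth knowing: a slice quietly stops at the end of the string, while indexing a single position past the end raises an error.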

Master the basic slicing syntax first. The variations below work the same way on strings and on lists.

We will use a simple list for these examples:

x = [1, 2, 3, 4, 5, 6, 7, 8]
  • A slice can leave a value to one side of the colon blank, such as x[1:] or x[:5]
    • A blank left side means the slice starts at the first element
    • x[:5] yields [1, 2, 3, 4, 5]
    • A blank right side means the slice continues through the last element
    • x[2:] yields [3, 4, 5, 6, 7, 8]
  • Using a negative index in a slice will index from the end.
    • -1 is the last element, -2 is the second-to-last, etc.
    • This can be used for single values: x[-2] yields 7
    • x[2:-2] yields [3, 4, 5, 6] (the second-to-last element, 7, is not included)
    • x[-5:] yields [4, 5, 6, 7, 8]
  • Using a second colon slices using an interval greater than one:
    • x[1:5:2] yields [2,4]
    • A negative interval steps through the list backwards
    • Going backwards includes the starting value and doesn’t include the stopping value.
    • x[3:0:-1] yields [4, 3, 2]
    • x[1:5:-1] yields [] - an empty list - going backwards from 1 already starts out after the stopping point!
    • x[::-1] yields [8, 7, 6, 5, 4, 3, 2, 1]
    • x[::] yields [1, 2, 3, 4, 5, 6, 7, 8] - but it’s faster to just use x 🙃
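The list examples above carry over to strings unchanged. As a rough sketch (the string "tadpole" and the variable names are chosen only for illustration):

```python
word = "tadpole"

first_three = word[:3]     # left side blank: start from the beginning
last_three = word[-3:]     # negative start: count from the end
middle = word[1:-1]        # drop the first and last characters
backwards = word[::-1]     # negative interval: step backwards

print(first_three)   # tad
print(last_three)    # ole
print(middle)        # adpol
print(backwards)     # elopdat
```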

Expressions

Expressions involving strings conform to the same rules we have seen for expressions: they are components that can be used to compose larger expressions and assign values to variables.

  • len returns an integer and can be used wherever integers are used
s = 'parhelion'
s[len(s)-4:len(s)]
'lion'
  • A string slice is an expression that results in a string
s[3:6] + s[5] + s[7]
'hello'

Methods

Strings come with built-in functions that are called on a particular string. These associated functions are called methods.

An example is the <str>.upper() method. Note the syntax: <str> stands for any string, followed by a period, the method name, and parentheses:

'maybe'.upper()
'MAYBE'

<str>.lower() is similar:

a_string = "Maybe?"
a_string.lower()
'maybe?'

<str>.find() takes an argument, the string to search for, and returns the position where it first appears:

a_string.find('e')
4

Like everything else we’ve seen, these methods can compose expressions. What’s happening here?

a_string[a_string.find('e') + 1]
'?'
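One way to trace the expression above is to split it into steps and give each intermediate value a name. This is only a sketch; the names position, next_position, and next_character are invented for illustration:

```python
a_string = "Maybe?"

# Step 1: find returns the position of the first 'e'.
position = a_string.find('e')

# Step 2: add 1 to get the position just after it.
next_position = position + 1

# Step 3: index the string at that position.
next_character = a_string[next_position]

print(position)         # 4
print(next_position)    # 5
print(next_character)   # ?

# find returns -1 when the argument does not appear in the string.
print(a_string.find('z'))   # -1
```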

Iterating Through Strings

  • We have learned how to index individual characters of a string using positions
    • Those positions are integers
  • We have learned how to iterate over integers with for and while loops

We can combine these techniques to iterate over characters in a string:
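A minimal sketch of such a loop, assuming a sample string "meals" and a loop variable named i (both chosen for illustration):

```python
s = "meals"

# range(len(s)) produces the positions 0, 1, 2, 3, 4.
for i in range(len(s)):
    # s[i] is the character at position i.
    print(i, s[i])
```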

All of the syntax here is syntax you have already learned!


We could also do this with a while loop:
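A comparable sketch with while, again assuming the sample string "meals" and a counter named i. The counter has to be created before the loop and advanced by hand inside it:

```python
s = "meals"

i = 0                  # start at the first position
while i < len(s):      # stop once i reaches the length of the string
    print(i, s[i])
    i = i + 1          # move to the next position
```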

This approach iterates over integer positions and uses each one to index the string. It is often called “value iteration.”

It’s also possible to iterate directly over the characters in a string. This can only be done with a for loop, and is called “content iteration” (because it iterates through the string’s content).

The syntax is very simple:

for j in s:
    print(j)

Mutability

Strings in Python cannot be changed. They are immutable.

We have seen string concatenation:

s = "allow"
s[0] + "new"
'anew'

The result of concatenation is a new string.

If we were to try to reassign an individual character of a string, we would get an error (a TypeError):

s[0] = "e"

Instead of changing a string, we can compose a new string using characters from the old string:

e = "mole"
e[0:len(e)-1] + 'ar'
'molar'
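The two ideas above can be seen side by side in a short sketch. The names word, new_word, and assignment_failed are hypothetical; the try and except statements are used here only so the program keeps running after the error:

```python
word = "mole"
assignment_failed = False

try:
    word[0] = "h"            # strings do not support item assignment
except TypeError as error:
    assignment_failed = True
    print("TypeError:", error)

# Compose a new string instead of changing the old one.
new_word = "h" + word[1:]

print(new_word)   # hole
print(word)       # mole (the original string is unchanged)
```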

in

The in keyword between two strings will return True or False depending on whether the first string is in the second string:

'I' in 'team'
False
'u' in 'truth'
True

This is a very different use of in than you have seen in for loops. Trace through this example to see the difference. Note how the conditional is nested inside the loop!
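A sketch of the kind of example meant here, with both uses of in appearing together (the strings and the names vowels, word, letter, and found are assumptions made for illustration):

```python
vowels = "aeiou"
word = "truth"
found = ""

# Here `in` is part of the for loop: letter takes each character in turn.
for letter in word:
    # Here `in` is a test that evaluates to True or False.
    if letter in vowels:
        print(letter, "is a vowel")
        found = found + letter
    else:
        print(letter, "is not a vowel")

print(found)   # u
```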

Practice

Practice Problem 7.1

Write a function one_star that takes one argument, a string.

Your function should return a new string, identical to the argument but with the second character changed to the '*' character.

  • one_star('vote') returns 'v*te'
  • one_star('12345') returns '1*345'

Practice Problem 7.2

Write a function many_stars that takes one argument, a string.

Your function should return a new string, identical to the argument but with every character except the first and last changed to the '*' character.

  • many_stars('vote') returns 'v**e'
  • many_stars('12345') returns '1***5'

Practice Problem 7.3

Write a function shorter_string that takes two arguments, both strings, and returns whichever string is shorter, or the first string if both are the same length.

  • shorter_string('too much', 'many') returns 'many'
  • shorter_string('same', 'okay') returns 'same'
  • shorter_string('vowel', 'consonant') returns 'vowel'

Practice Problem 7.4

Write a function shared_character that takes two arguments, both strings, and returns True if the strings have at least one character in common, or False if they have no characters in common.

  • shared_character('true', 'story') returns True
  • shared_character('no', 'dice') returns False
  • shared_character('blue', 'lagoon') returns True
  • shared_character('yes?', 'how?') returns True

Homework

  • Homework problems should always be your individual work. Please review the collaboration policy and ask the course staff if you have questions.

  • Double check your file names and printed output. These need to be exact matches for you to get credit.

Homework Problem 7.1 (25 pts)

Write a function shared_stem that takes two arguments, both strings, and checks whether the beginnings of the strings are identical. The function returns whatever leading portion is identical, stopping at the first position where the characters differ.

  • shared_stem('look', 'lot') returns 'lo'
  • shared_stem('meeting', 'memory') returns 'me'
  • shared_stem('late', 'later') returns 'late'
  • shared_stem('poor', 'Yorick') returns '' (an empty string)

Hint: One approach involves iterating over the length of the shorter string, checking both strings to see if the characters are identical, continuing until a non-identical character is found.

Submit as shared_stem.py.

Homework Problem 7.2 (25 pts)

Write a function end_swap that takes one argument, a string, and returns a new string with the first and last characters exchanged.

  • end_swap('made') returns 'eadm'
  • end_swap('valid') returns 'daliv'
  • end_swap('handle') returns 'eandlh'

Submit as end_swap.py.

Homework Problem 7.3 (25 pts)

Write a function alpha_cat that takes two arguments, both strings, and returns a single string consisting of the two input strings concatenated in alphabetical order, separated by a single space. The input strings will always consist of only lowercase letters.

  • alpha_cat('two', 'three') returns 'three two'

Hint: The comparison operators > and < between strings check for alphabetical order for lowercase letters.

Submit as alpha_cat.py.

Homework Problem 7.4 (25 pts)

Write a function before_comma that takes one argument, a string. If the string contains any commas, return the portion of the string prior to the comma (or if there are multiple commas, before the first comma). If the string contains no commas, simply return the string.

  • before_comma("okay, sure") returns 'okay'
  • before_comma("Thanks, I think.") returns 'Thanks'
  • before_comma("no commas here") returns 'no commas here'

Hint: You can check for equality between two strings with ==.

Submit as before_comma.py.