Module 0.6 - Strings and Characters

Objectives

By the end of this module, for simple “Hello World”-like programs, you will be able to:

  • Write simple code that works with strings and characters (letters, digits, symbols like $)
  • Identify some syntax errors related to characters and strings.
  • Have some stress relieving fun after that long integer module.

0.6.0 - Strings

About strings:

We have already seen examples of strings, as in:

print('Hello')

Here, whatever is in between the quotes is treated as one thing: a sequence of letters, digits or symbols. Here are examples with digits, symbols and spaces:

print('What is your email address?')
print('Mine is omar.1998@gmail.com')
  • The entire sequence of letters, digits, punctuation etc from the W in What to the ? in address is one string.
  • Strings can contain letters, numbers, and special characters.

Just as integer values can be placed in variables, so can strings. Example:

s = 'The quick brown fox jumps over the lazy dog'
print(s)
  • Here, the variable s has the string The quick brown fox jumps over the lazy dog
  • If you’re wondering how a variable (we used a box as an analogy) can fit so many letters, that is a somewhat advanced topic.
  • For now, let’s proceed with the notion that we can do this.

Consider this example:

# Make a string and print it:
s = 'The quick brown fox jumps over the lazy dog'
print(s)

# Extract the length of the string and print that:
k = len(s)
print(k)

Here, we are using a function called len to extract the length of a string. The function len is like print in one respect: there is something that goes in between the parentheses:

[Figure: how the len function works]

But it is different in another respect: something comes out of the function after it runs (is returned) and, in this case, gets placed into the variable k.

  • We will later look further into how we can write our own functions that have this property of “making something and giving the result” to a variable.
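
As a small sketch of the idea that len gives something back, the following uses a few strings of our own choosing (the variable names are only for illustration) and stores each returned length in a variable:

```python
# len returns an integer that we can store in a variable and reuse.
word = 'Hello'
empty = ''
email = 'omar.1998@gmail.com'

a = len(word)     # 5 letters
b = len(empty)    # nothing between the quotes, so the length is 0
c = len(email)    # digits and symbols such as . and @ count too

print(a)
print(b)
print(c)

# The returned values are ordinary integers, so arithmetic works on them:
print(a + c)
```

Notice that every character between the quotes counts towards the length, not just letters.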

0.6.1 - String Concatenation

Strings would be of limited use if there were no way of combining them, just as integers would be if there were no way of performing arithmetic. The joining of two strings end to end is called concatenation.

Consider this program:

x = 'The'
y = 'quick'
z = 'brown'

s = x + y + z
print(s)

Exercise 0.6.1

Type up the above in string_example2.py. What is the output? Change the program to add spaces between the words - try to do this only by changing line 5.

About concatenation:

  • The same + that we used for integer addition is what’s used to concatenate strings.
  • Multiple usage of symbols in a programming language is common: we’ll see other examples of a single symbol or function serving multiple purposes.
  • How come Python doesn’t get confused and think that x, y, z are integers wanting to be added?
    • Python is smart about context, and understands that when + is used with strings, the only reasonable thing to do is to concatenate.
    • Likewise, with numbers, Python will add them.

You may have noticed the words all strung together without a space. So let’s add the spaces:

x = 'Sphinx'
y = 'of'
z = 'black'

s = x + ' ' + y + ' ' + z + ' quartz, judge my vow'
print(s)

print(len(s))

Notice how multiple strings, some from variables, and some just written into the statement, are concatenated:

[Figure: string concatenation]

Exercise 0.6.2

Type up the above in string_example3.py. Change Line 5 so that the sentence ends with a period - the length should increase by 1.

We also introduced something new:

print(len(s))

Here’s how to read this line:

  • First look at print and notice that there’s something between the parentheses: print( len(s) )
  • Think to yourself: something is going to be given to print to get printed.
  • Now look at what’s going to print: len(s)
  • Here we see that the length of the string s is being computed. What you should be thinking is:
  • First, the length will get computed.
  • Second, the result (36, in this case) will be sent to print.
  • print then prints it to the output, which is what we see.
  • One way to think about this is to use a term, nesting, that we’ve seen before:
    • Here, the function invocation to len is nested in the function invocation to print
    • The innermost in this case executes first.
    • In case you were wondering: yes, one can nest deeply with one function invocation inside another, inside another etc. But we won’t need that any time soon.
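
To see that a nested invocation is just a shorthand, here is a small sketch (with strings of our own choosing) that writes the same computation both with and without nesting:

```python
x = 'Sphinx'
y = 'quartz'

# Nested: len(x) runs first, and its result is handed to print.
print(len(x))

# The same thing written without nesting:
n = len(x)
print(n)

# Each inner invocation is replaced by its result before the outer one runs,
# so this prints 6 + 6:
print(len(x) + len(y))
```

Both styles produce the same output; nesting merely saves us a variable.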

Often we want to concatenate strings with numbers, or other kinds of things:

For example: consider

k = 26
s = str(k)
t = 'A pangram must have all ' + s + ' letters'
print(t)

Here, the value in k is an integer. Prior to concatenation with a string, we first need to make a string out of the integer: s = str(k). We do this by sending the integer k to the str function, which builds a string version of the integer and gives that back. The string so computed is then placed into the variable s above. This string s gets concatenated with the other strings to produce the final result.
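
The need for str becomes clearer when we compare what + does with integers and with strings. The following is a small sketch with values of our own choosing:

```python
k = 26

# With two integers, + adds:
total = 2 + 6
print(total)          # 8

# With two strings, + concatenates:
joined = '2' + '6'
print(joined)         # 26

# Mixing a string and an integer, as in 'all ' + k, is an error
# (a TypeError), so we convert the integer to a string first:
t = 'all ' + str(k) + ' letters'
print(t)              # all 26 letters
```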

Exercise 0.6.3

Type up the above in string_example4.py to confirm.

Let’s examine a small variation:

k = 26
# You can also build a string in the print statement itself:
print('A pangram must have all ' + str(k) + ' letters')

Exercise 0.6.4

Type up the above in string_example5.py to see the resulting output.

0.6.2 - More String Concatenation

We will occasionally introduce programs we’ve written that both simplify your programming and allow for interesting examples.

If all we did was compute with integers, it would be boring. You’ve already seen one example of such a tool: drawtool. We’ll now use wordtool, another tool that you will use by calling appropriate functions. You are welcome and are encouraged to “look inside” by skimming over the code in any tool.

Let’s look at an example that will also introduce some new ideas:

import wordtool

# Invoke functions in wordtool to pick random words:
adj = wordtool.get_random_adjective()
noun = wordtool.get_random_noun()
noun2 = wordtool.get_random_noun()
verb = wordtool.get_random_verb()
prep = wordtool.get_random_preposition()

# Build a sentence with these random words:
sentence = (
    'The ' + adj + ' ' + noun + ' ' + verb + 's' +
    ' ' + prep + ' a ' + noun2)
print(sentence)
  • To run the above code, you must first download wordtool.py and wordsWithPOS.txt into the same folder as your program.
  • Now run the program several times.

Let’s point out:

  • wordtool.py is merely another Python program, like the ones you’ve been writing, just a bit more complex.
  • wordsWithPOS.txt is plain text data (about English words, and parts-of-speech).
  • wordtool.py is written to read the data and make some functionality available, one of which is to randomly pick words from amongst the nouns, adjectives, and so on.
  • To use functions in wordtool.py in your program, you need (and this is a new thing we’ve introduced) the import statement at the top of your program:

import wordtool

(Remember: the .py part is not in the import statement.) Then, to use a function defined in that other file, we use syntax like this:

adj = wordtool.get_random_adjective()

Here, adj is a string variable that we made. The combination of wordtool, a period, and the desired function get_random_adjective(), is what’s needed to invoke that particular function. In this case, it results in a randomly selected adjective (from the thousands in the data) being copied into the adj variable. Similarly, after getting a random noun, verb, and so on, we put those together to make a sentence, perhaps with amusing results. We’ll point out one other new thing:

sentence = (
    'The ' + adj + ' ' + noun + ' ' + verb + 's' +
    ' ' + prep + ' a ' + noun2)
  • Here, we are concatenating many strings into a long one.
  • However, if it’s unwieldy to type them all in one line, we can spill over into multiple lines.
  • One way to do that is to use parentheses as shown above.
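
The same two ideas (import, then a period and a function name; and parentheses to spill over multiple lines) also work with modules that come with Python. The sketch below uses the standard random module and two tiny hand-made word lists as stand-ins. The lists and their contents are our own assumptions for illustration, and this is not meant to show how wordtool.py is actually written:

```python
import random

# Hypothetical stand-ins for the word data (not from wordsWithPOS.txt):
adjectives = ['sleepy', 'purple', 'grumpy']
nouns = ['zebra', 'teapot', 'robot']

# module name, a period, then the function we want to invoke:
adj = random.choice(adjectives)
noun = random.choice(nouns)

# Parentheses let one long concatenation span several lines:
sentence = (
    'The ' + adj + ' ' + noun +
    ' sat down.')
print(sentence)
```

Run it a few times: the sentence changes because random.choice picks a different word on different runs.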

Exercise 0.6.5

Open wordtool.py and examine the functions within. Then, make a longer random sentence in random_sentence2.py.

0.6.3 - Input From the Terminal

Thus far we have printed (output) to the screen but have not taken in any input.

Thus, we haven’t written any programs that interact with potential users of our programs. There’s a limited market for programs that only execute once with no input whatsoever, right? So, let’s do something more interesting by asking the user to type in a string:

import wordtool

# We will get the user to type their name:
name = input('Enter your name: ')

# We'll use that and make rudimentary conversation:
print('Hi ' + name + '!')

Exercise 0.6.6

Try out the above program in random_conversation.py. Since you are writing this in the same folder, you won’t need to download wordtool.py and its data again.

Note: The function input is exactly what it sounds like: it gets input (from the user typing), as in name = input('Enter your name: '). Here, there’s a string that goes into the input function. This string is displayed as a prompt in the output:

Enter your name:

Of course, you aren’t writing this tiny program to send to someone who will run your program and type in their name. You are playing both roles: programmer and intended user. Whatever the user types in (from the keyboard) becomes a single string that’s placed in the variable we’ve called name. Then, we’ve concatenated whatever gets into name with two other strings and printed the result: print('Hi ' + name + '!')

Next, let’s make it more interesting:

import wordtool

name = input('Enter your name: ')

print('Hi ' + name + '!')

adv = wordtool.get_random_adverb()
adj = wordtool.get_random_adjective()
sentence = name + ', you are ' + adv + ' ' + adj
print(sentence)

Exercise 0.6.7

Improve on the above program by writing your version of a longer conversation random_conversation2.py. Allow the user to type in something at least three or four times. This will require using input multiple times to go back and forth with the user. Be creative.

0.6.4 - Strings and for Loops

Consider this program:

n = 6
s = ''

for j in range(1, n):
    s = s + '*'

print(s)

Note: We can initialize the value of s to the empty string '' (nothing between the quotes). The loop successively concatenates an asterisk ('*') onto the gradually growing string s.

Try to trace the execution of this loop. Confirm the final output in your trace by running the program.
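
If you would like to check a trace, one option is to print the string inside the loop. Here is a sketch of that idea, using a different symbol and a different value of n so that the trace above is still yours to do:

```python
n = 4
s = ''

for j in range(1, n):
    s = s + '#'
    # Show the loop variable and the string built so far:
    print('j=' + str(j) + ' s=' + s)

print('Final: ' + s)
print(len(s))
```

Each printed line corresponds to one row of a tracing table.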

Next, let’s use a nested loop to output a triangle of asterisks:

n = 5
s = ''

for i in range(1, n+1):
    for j in range(0, i):
        s = s + '*'
    s = s + '\n'

print('A triangle with base=' + str(n))
print(s)

Exercise 0.6.8

Before typing up the program, trace the execution of the loop, showing the contents of the string s in each iteration of the outer for loop. Then confirm the final output in your trace by typing it up in triangle.py. Use the tabular tracing approach as in the example from Module 3.

Next, we’ll make this more interactive:

s = ''

n_str = input('Enter triangle base size: ')
n = int(n_str)

for i in range(1, n+1):
    for j in range(0, i):
        s = s + '*'
    s = s + '\n'

print('A triangle with base=' + str(n))
print(s)

Exercise 0.6.9

Type up the above in triangle2.py.

We’ve introduced some new concepts above. Since everything typed as input initially is made into a string, the actual input, even if it’s an integer, is a string. This is a little strange but it’s how Python works. Consider this example:

n = 42       # This is an integer
s = '42'     # This is a string

k = 5        # Integer
t = '5'      # String

Thus, when the user types in what they intend to be an integer, the input function makes a string out of it:

n_str = input('Enter triangle base size: ')

Here, the variable n_str will have a string. We need to convert that string version of an integer into an actual integer using the int function: n = int(n_str). Here the variable n will have the actual integer, which we can use in loops, in arithmetic, and so on. Notice that, since we want the loop to run n times, we’ve begun the outer loop at 1, running through to n (inclusive). This means using range(1, n+1) in the outer loop.
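
The difference between the string and the integer can be seen in a small sketch. To keep it runnable without typing, we assume the user typed 4 and place that string directly into n_str:

```python
# Stand-in for input: pretend the user typed 4 at the prompt.
n_str = '4'
n = int(n_str)

print(n_str + n_str)    # 44 (string concatenation)
print(n + n)            # 8  (integer addition)

# range(1, n+1) runs the loop n times, with i going 1, 2, ..., n:
count = 0
for i in range(1, n+1):
    count = count + 1
print(count)            # 4
```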

Exercise 0.6.10

Trace (using a table) through what happens in the above program when the user enters 4.

Exercise 0.6.11

What happens when the conversion from string to integer is not done? Find out by trying this:

n = input('Enter an integer: ')
k = 5 * n
print(k)

Fix the issue by converting the string that’s in n and making an integer. Type your code in conversion.py.

0.6.5 - The Useful Relationship Between Characters and Integers

Consider these three strings:

x = 'hello'
y = 'h'
z = '$'

Note: The strings in variables y and z are fundamentally different from the one in x in that the strings in y and z have only one letter (or symbol) in them. We call such a single-letter or single-symbol string a character.

There is a special relationship between characters and some integers. For example:

  • The character ‘a’ is represented by the integer 97.
  • The character ‘b’ is represented by the integer 98.
  • There are other examples. Think of this as a “secret code”, used by Python, that associates a number with every letter. The technical term for this “secret code” is ASCII. We often use the shorter term “char” instead of “character”.
  • So, every char has an ASCII code (pronounced “ask-ey”).

ASCII stands for American Standard Code for Information Interchange and was developed in the 1960s for telecommunications use. It has been updated several times.

More recently, another standard, Unicode, has superseded ASCII, but the “basic” characters of the Latin alphabet, along with Arabic numerals, have the same values in ASCII and Unicode.

Unicode supports many more characters, including Latin characters used in foreign languages, such as ő and ç, non-Latin script such as Є, И, عَ, हिं, and か, and special characters such as emoji 😌.

Consider this program:

first_letter = 'a'
last_letter = 'z'

k = ord(first_letter)
print(k)

n = ord(last_letter)
print(n)

Exercise 0.6.12

Type up the above in char_example.py.

Note:

  • We’ve used longer, more descriptive variable names like first_letter. We’ll say more about this in a separate section below.
  • The ord function takes a char and produces the corresponding ASCII code: k = ord(first_letter)

Going the other way: from ASCII code to char. Consider this program:

k = 97
s = chr(k)
print(s)    # Prints the char 'a'

Exercise 0.6.13

Type up the above in char_example2.py. Then, change 97 to 98. What should the value of k be to print the char ‘z’ (last lowercase letter)?

You can save with k set to either 97 or 98.

The value of knowing the ASCII code is that we can iterate over numbers and use that to iterate over letters. For example:

for i in range(97, 123):
    s = chr(i)
    print(s, end='')
print()
  • Inside a computer, all characters are actually stored as integers, and merely interpreted as characters when the occasion calls for it.
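
To make the connection between chars and integers concrete, here is a small sketch that sends a few characters through ord and back through chr. The particular characters are our own picks for illustration; any single character would do:

```python
# ord goes from char to integer code; chr goes back the other way.
code_a = ord('a')         # lowercase a
code_A = ord('A')         # uppercase A has a different code
code_c = ord('ç')         # a Latin letter outside ASCII
print(code_a)             # 97
print(code_A)             # 65
print(code_c)             # 231

# A round trip through ord and chr gives back the original char,
# including for non-Latin scripts:
kana = 'か'
same = chr(ord(kana))
print(same)
```

Note that uppercase and lowercase versions of a letter are different characters with different codes.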

Exercise 0.6.14

Type up the above in char_example3.py. What does it print? What is the significance of the number 123?

0.6.6 - Variable Names

We have often used single-letter variable names, for example:

i = 7
j = 15 * i
print(j)

x = 'Hello' 
y = 'World'
z = x + ' ' + y
print(z)

Let’s rewrite the above with more meaningful variable names:

days_in_a_week = 7
days_in_a_semester = 15 * days_in_a_week
print(days_in_a_semester)

first_greeting_word = 'Hello' 
second_greeting_word = 'World'
full_greeting = first_greeting_word + ' ' + second_greeting_word
print(full_greeting)

About variable names:

  • First, let’s review the very notion of a name in Python, along with other kinds of “words” that are allowed in Python programs.
  • Reserved words or keywords: Some words in the language belong formally to the language itself. These are words like for, in, def, and others. In fact, just to complete this, here’s the set of 33 reserved words (newer versions, from Python 3.7 on, add async and await to the list):

and as assert break class continue def del elif else except False finally for from global if import in is lambda None nonlocal not or pass raise return True try while with yield

  • Every one of these must be used in very specific ways, in statements or in more complex structures.
  • We will learn more about these as we proceed in the course.
  • We won’t learn about all of them, because full mastery of the Python language takes more than one course.

At the other end of the spectrum, there are words that we freely create for our use, as in these examples:

def print_hello():
    print('Hello') 
    
print_hello() 

x  = 'How have you been?'
print(x)

for i in range(1,10):
    print(i)   
    

Here:

  • We made a function and decided to call it print_hello
  • We decided to call the string variable x, and to call the loop variable i
  • These are called identifiers.

Since we get to choose them, we could rewrite the above program as:

def say_greeting():
    print('Hello') 
    
say_greeting() 

follow_up = 'How have you been?'
print(follow_up)

for loop_variable in range(1,10):
    print(loop_variable)  

So, what should dictate our choice of these names?

  • Generally, for numbers, loop-variables and short calculations, we prefer single-letter names like x and i.
  • Sometimes, variable names should be longer when we want the code to be readable long after it’s written.
  • Function names should carry some meaning so that anyone else who needs to use them should be able to make sense of them.
    • For example, it would not make sense for Python to use steganographic_mysteries instead of print.
  • The use of underscores:
    • You’ve noticed by now the presence of underscores in our variable and function names.
    • An underscore is a convenient visual aid to help us read.
    • Writing days_in_a_week is better than daysinaweek because it’s easier to see.
    • Important: we cannot use spaces, so days in a week would be incorrect as a variable name.
  • Generally, we shouldn’t go overboard and make unnecessarily long names either, as in the_number_of_days_in_a_week.
  • Another important rule: use only letters, digits and underscores, and always start with a letter. Do NOT use other symbols like % or $ in variable names.
  • Finally, there are some function and variable names “already taken”. That is, we can’t (or shouldn’t) use them because Python already uses them.
    • For example: print is a function name. As are chr and ord and various math functions.
    • While we could technically use these names, we should avoid the conflict because it breaks convention and can cause hard-to-find problems.
    • Here is a list of “avoid” names. Do NOT use these as your own variable or function names.
  abs, all, any, ascii, bin, bool, bytearray, bytes, callable, 
  chr, classmethod, compile, complex, delattr, dict, dir, divmod, 
  enumerate, eval, exec, filter, float, format, frozenset, getattr, 
  globals, hasattr, hash, help, hex, id, input, int, isinstance, 
  issubclass, iter, len, list, locals, map, max, memoryview, min, 
  next, object, oct, open, ord, pow, print, property, range, repr, 
  reversed, round, set, setattr, slice, sorted, str, sum, super, 
  tuple, type, vars, zip
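
Python itself can tell us whether a word is reserved or whether a name follows the spelling rules. The sketch below uses the standard keyword module and the isidentifier string function; neither is needed for this module, and the example names are our own:

```python
import keyword

# Reserved words cannot be used as names at all:
print(keyword.iskeyword('for'))      # True
print(keyword.iskeyword('print'))    # False: print is a built-in function, not a keyword

# isidentifier checks the spelling rules for names:
ok_name = 'days_in_a_week'.isidentifier()
spaces = 'days in a week'.isidentifier()
symbol = 'price$'.isidentifier()
digit_first = '7days'.isidentifier()
print(ok_name)        # True
print(spaces)         # False: spaces are not allowed
print(symbol)         # False: $ is not allowed
print(digit_first)    # False: a name cannot start with a digit
```

Keep in mind that a name such as print passes both checks and is still on the “avoid” list above, so these checks do not replace good judgment.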

Exercise 0.6.15

Let’s see what can go wrong if we use one of the words “already taken”. Write this program as variable_name4.py:

s = 'Hello'
len = 5       # This is a BAD idea
print(len)
t = 'I could erase the weight of remorse'
k = len(t)
print(k)

The error, 'int' object is not callable, occurs because we have replaced the built-in len function with an integer. Fix the error so that the program prints out

5
35

0.6.7 - A Problem-Solving Example

Possibly the hardest aspect of programming is problem-solving: typically, we’re given an English description of a problem, with the goal of writing a program to solve the problem.

  • The hard part is not the typing, and remembering syntactic details like parentheses.
  • The hard part is figuring out what to write in code to solve the problem.

Let’s work through an example problem and solve it.

Before that, we’ll learn one more string function:

s = 'The quick brown fox jumps over the lazy dog'
n = s.count('a')     # How many a's occur in the string s?
print(n)

1

Note: We’ve introduced a new feature of strings: the ability to count occurrences of a letter in that string. Notice the unusual way by which the function must be used: n = s.count('a') and NOT n = count(s, 'a').

That is, the function count appears to be part of the string variable s. This is a somewhat advanced topic, so we’ll just use it and be glad we have this feature. (There are other such functions we will use.) This is possible for any string, such as:

x = 'helloooooooo'
print(x.count('o'))    # Number of o's in string x
y = 'mississippi'
print(y.count('i'))    # Number of i's in string y

8
4

Exercise 0.6.16

Confirm by typing up the above in letter_count.py.

And now, the problem we’re going to solve:

A pangram is an English sentence that contains all 26 letters. We’ve already seen two examples:

The quick brown fox jumps over the lazy dog
Sphinx of black quartz, judge my vow

There’s an informal competition, running for over a hundred years, to find the shortest grammatically correct English sentence that’s a pangram.

Our smaller problem: given a sentence, print the number of a’s, the number of b’s … and so on. This could be useful in judging such a competition.

Let’s work towards a solution: given a candidate pangram, we could count the number of a’s:

s = 'The quick brown fox jumps over the lazy dog'
print(s.count('a'))

Then, we could also count the number of b’s:

s = 'The quick brown fox jumps over the lazy dog'
print(s.count('a'))
print(s.count('b'))

If we repeated this 26 times, we’d have a count for each letter. But the moment we see a bunch of repetition, our computational problem-solving instincts should kick in: What is the nature of iteration in this problem? We are iterating through the letters a to z. And we already know how to iterate over the letters:

for i in range(97, 123):
    letter = chr(i)       
    # Recall: We're going from ASCII code to letter

Could we combine this with the counting of occurrences inside the given string? So, the idea would be:

for i in range(97, 123):
    letter = chr(i)       
    # Now somehow use that to do the counting of occurrences

Combining:

s = 'The quick brown fox jumps over the lazy dog'

for i in range(97, 123):
    letter = chr(i)       
    k = s.count(letter)
    print('Number of occurrences of ' + letter + ' is ' + str(k))
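
One detail worth checking when counting letters: count treats uppercase and lowercase letters as different characters. The sketch below shows this, and also uses lower, a standard string function (not introduced in this module) that gives back a lowercase copy of the string:

```python
s = 'The quick brown fox jumps over the lazy dog'

lower_t = s.count('t')     # only the t in 'the'
upper_t = s.count('T')     # only the T in 'The'
print(lower_t)             # 1
print(upper_t)             # 1

# Counting in a lowercase copy catches both:
lowered = s.lower()
both = lowered.count('t')
print(both)                # 2
```

For judging a pangram, counting in the lowercase copy is probably what we want, since a capital letter at the start of the sentence should still count.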

Exercise 0.6.17

Type up the above idea in count_problem.py. Then, find another pangram and apply the program to that pangram.

Exercise 0.6.18

This can be taken a step further. Instead of typing the pangram in the program itself (as was done above), ask the user to enter it as input. In count_problem2.py, read a string from the user and print the letter counts. When you test your program, use the pangram you found as input.

Lastly, we’ll point out a slightly more elegant way of iterating over the 26 letters:

ascii_a = ord('a')
ascii_z = ord('z')
for i in range(ascii_a, ascii_z + 1):
    letter = chr(i)       
    # Now somehow use that to do the counting of occurrences

Note: Instead of typing in the numbers 97 and 123 (which we’d have to remember), we’re using the ord function itself to identify the limits of the loop. An even more compact (but harder to read) way is to write:

for i in range(ord('a'), ord('z') + 1):
    letter = chr(i)

We’ll understand this better once we see functions in more detail.

0.6.8 - When Things Go Wrong

In each of the exercises below, try to identify the error before typing it up and confirming. Then, fix the error in the code, using the specified program (.py) name.

Exercise 0.6.19

x = 'Hello'
y = 'World'
s = 'x' + ' ' + 'y'

(This should print Hello World, with a space in between). Fix the error in error1.py.

Exercise 0.6.20

k = 8
s = 'There are ' + k + ' planets in our solar system'
print(s)

Fix the error in error2.py.

Exercise 0.6.21

long_sentence = 'How' + ' ' + 'vexingly' + ' ' +
   'quick' + ' ' + 'daft' + ' ' + 'zebras' + ' ' + 'jump'

Fix the error in error3.py. (This program does not need to print, but there should not be any error when it runs.)

Exercise 0.6.22

x = input('Enter a number between 1 and 10')
y = 2 * x
print(y)

Fix the error in error4.py. The result should print out whatever number you input, multiplied by two.

Exercise 0.6.23

for i in range(1, 6):
    s = s + '**'
print(s)

(This should print a string with 10 asterisks). Fix the error in error5.py.

Exercise 0.6.24

miles = 10
feet per mile = 5280
feet in ten miles = miles * feet per mile
print(feet in ten miles)

Fix the error in error6.py.

End-Of-Module Problems

Full credit is 100 pts (complete at least two problems). There is no extra credit.

These problems make use of return statements in functions. Remember: using return is different from printing!

Problem 0.6.1 (75 pts)

The function greet takes as argument a string your_name. The starter program defines an empty string greeting. Write code to generate a greeting of the format shown below. The starter program already returns the string and will print out the output. All you need to do is ensure that the string greeting is correct at the end of the function, before the return statement.

def greet(your_name):
    greeting = ""
    # Write code here to build the greeting
    return greeting

print(greet("Otis"))
print(greet("Carl"))
print(greet("Frances"))
print(greet("Yvan"))

The output should be:

Hello, Otis.
Hello, Carl.
Hello, Frances.
Hello, Yvan.

Submit as greeting_function.py.

Problem 0.6.2 (75 pts)

The function temperature_report takes as argument an integer temperature. The starter program defines an empty string report. Write code to generate a temperature report of the format shown below. The starter program already returns the string and will print out the output. All you need to do is ensure that the string report is correct at the end of the function, before the return statement.

def temperature_report(temperature):
    report = ""
    # Write code here to build the report
    return report

print(temperature_report(26))
print(temperature_report(49))
print(temperature_report(85))

The output should be:

The average temperature today will be 26 degrees.
The average temperature today will be 49 degrees.
The average temperature today will be 85 degrees.

Submit as report_function.py.

Problem 0.6.3 (75 pts)

The function alphabets takes as argument an integer length. The starter program defines an empty string alphabet_string. Write code to generate a string of sequential alphabetical characters as shown below. The starter program already returns this string and will print out the output. All you need to do is ensure that the string alphabet_string is correct at the end of the function, before the return statement.

def alphabets(length):
    alphabet_string = ""
    # Write code here to build the alphabet string
    return alphabet_string

print(alphabets(5))
print(alphabets(10))
print(alphabets(27))
print(alphabets(29))

The output should be:

abcde
abcdefghij
abcdefghijklmnopqrstuvwxyza
abcdefghijklmnopqrstuvwxyzabc

Submit as alphabet_strings.py.