Module 1 Supplement: Pangram
The Obvious Strategy
There are a variety of strategies that we can use to solve the Pangram problem, so start by considering your first instinct.
The most straightforward strategy/algorithm looks something like this:
- Start with the assumption that the string in question is a pangram, and attempt to disprove this assumption. How? If we find a character in the alphabet that is not in the string, then we have proven it is not a pangram. But if we look through the entire alphabet and find every alphabet character in the string, then it must be a pangram.
- Pick a character from the alphabet. Let's call it the 'current alpha character' or 'current alpha'. To keep things simple, start with 'A' since it is the beginning of the alphabet and stop with 'Z' since it is the end of the alphabet. Any time we pick the 'next' alpha, select the alpha character following the current alpha character and make it the current alpha character. In other words, the next alpha after 'A' is 'B', the next alpha after 'M' is 'N', and so forth.
- We will 'scan' through the string. This involves selecting the string character at the start of the string. Assuming we continue scanning, we select the subsequent string character on the next pass and so forth until we reach the end of the string. We will call the currently selected character in the string the 'current string character'.
- If the current alpha matches the current string character, we have 'found' the current alpha in the string.
- If the current alpha does NOT match the current string character, we need to continue scanning the string at the next character for the current alpha character.
- If we scan past the end of the string and the current alpha is not found, we have proven the string is not a pangram and the process has failed.
- Otherwise, if the current alpha is found, advance to the next alpha character and repeat the above process by scanning again from the start of the string.
- If we reach the end of the alphabet and the above process has never failed, then the string must be a pangram (our original assumption holds).
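The steps above can be sketched in Python as follows. This is a minimal sketch, not a definitive implementation: the function name `is_pangram_obvious` is hypothetical, and the sketch assumes the check is case-insensitive (the string is uppercased first), which the description above does not specify.

```python
import string


def is_pangram_obvious(text):
    """Return True if every letter 'A'..'Z' appears in text (nested-loop version)."""
    text = text.upper()  # assumption: treat 'a' and 'A' as the same letter
    for alpha in string.ascii_uppercase:  # current alpha: 'A', 'B', ..., 'Z'
        found = False
        for ch in text:  # scan again from the start of the string
            if ch == alpha:
                found = True
                break  # current alpha found, stop scanning
        if not found:
            return False  # scanned past the end: not a pangram
    return True  # never failed: the original assumption holds
```

For example, `is_pangram_obvious("The quick brown fox jumps over the lazy dog")` returns `True`, while `is_pangram_obvious("Hello world")` returns `False`.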
Analysis of the Obvious Strategy
While this is a perfectly fine algorithm that accomplishes the goal, it is not an optimal approach to the problem. Take a step back and generalize this algorithm so that it searches for some list of symbols in an arbitrary sequence of symbols. If our list of symbols or the sequence in which we search grows larger, the algorithm will become significantly slower because we are nesting a loop within another loop. Can we quantify this so that we have an objective measure of how 'fast' the program is?
We can approximate how many operations the program will take by multiplying the number of symbols by the length of the sequence. For our pangram, this would be 26n, where n is the length of the string. For our abstract problem, it would be mn, where m is the number of symbols in our 'alphabet' and n is the length of our string.
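One way to see this estimate in practice is to count the character comparisons the nested loops actually make. The sketch below is illustrative only: the name `count_comparisons` is hypothetical, and a "comparison" is taken to be one test of a string character against the current alpha. Because the inner scan stops as soon as a match is found, the real count is usually below mn, so mn is best read as an upper bound.

```python
import string


def count_comparisons(text, alphabet=string.ascii_uppercase):
    """Run the nested-loop search and count character comparisons.

    Returns a tuple (is_pangram, comparisons).
    """
    comparisons = 0
    for alpha in alphabet:
        found = False
        for ch in text:
            comparisons += 1  # one comparison of ch against the current alpha
            if ch == alpha:
                found = True
                break
        if not found:
            return False, comparisons
    return True, comparisons


text = "THE QUICK BROWN FOX JUMPS OVER THE LAZY DOG"
ok, used = count_comparisons(text)
bound = len(string.ascii_uppercase) * len(text)  # m * n with m = 26
print(ok, used, "<=", bound)
```

Here the string has 43 characters, so the bound is 26 × 43 = 1118 comparisons.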
So, is there a faster way?
The Fast Strategy
This strategy will scan through the string only once:
- Again start with the assumption that the string is a pangram.
- In addition, allocate an array of integers containing 26 'buckets' with each bucket initialized to zero. (Use this array to count the number of times each alpha character appears in the string. Index 0 will map to the count of 'A's in the string, index 1 will map to the count of 'B's, and so forth up to index 25, which will map to the count of 'Z's. We initialize all values in the array to zero because we don't know how many instances of each character appear in the string until we scan through and count them.)
- Select the string character at the start of the string. We will scan through and repeat the following process for all subsequent string characters after examining the first.
- Compute the zero-based ordinal position in the alphabet for the current string character. If the current string character is an 'A', its zero-based ordinal position is 0; if it is a 'B', its ordinal position is 1; and so forth up to 'Z', whose ordinal position is 25.
- Increment the count in the array cell indexed by the computed ordinal position. For example, if the current string character is an 'A', add 1 to the value in the bucket at index 0; if it is a 'B', add 1 to the bucket at index 1; and so forth up to 'Z', where we add 1 to the bucket at index 25.
- Repeat the above steps until we scan past the end of the string.
- Scan through the array starting at index 0, the count of 'A's, and ending at index 25, the count of 'Z's.
- If any bucket contains a zero, the string must not be a pangram.
- If each cell in the array is non-zero, then our assumption holds and the string must be a pangram.
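The counting steps above can be sketched in Python as follows. This is a sketch under stated assumptions rather than a definitive implementation: the name `is_pangram_fast` is hypothetical, the string is uppercased so that case is ignored, and characters outside 'A' to 'Z' (spaces, digits, punctuation) are skipped. The description above does not say how to handle either case.

```python
def is_pangram_fast(text):
    """Return True if every letter 'A'..'Z' appears in text (single-scan version)."""
    counts = [0] * 26  # one bucket per letter, all initialized to zero

    for ch in text.upper():  # assumption: case-insensitive
        if "A" <= ch <= "Z":  # assumption: skip anything that is not a letter
            index = ord(ch) - ord("A")  # zero-based ordinal: 'A' -> 0, 'Z' -> 25
            counts[index] += 1

    for count in counts:  # scan the 26 buckets
        if count == 0:
            return False  # some letter never appeared
    return True  # every bucket is non-zero
```

Subtracting `ord("A")` from the character's code is one common way to compute the ordinal position; a lookup table would work equally well.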
Analysis of the Fast Strategy
Why is this a faster algorithm?
We have broken the problem up into a number of separate, individual tasks that we can perform quickly:
- We scan through the string sequence only once to count the occurrences of each symbol.
- We iterate over the counts and check for a zero count.
In the obvious algorithm, the scan through the string is nested within the outer loop that steps through the alphabet. In this algorithm, checking the letters is an entirely separate process that operates on the data stored in the array. As a result, this program takes 26 + n operations, where n is the number of symbols in our string sequence.
If we generalize this problem as we did with the obvious algorithm, we will find that this algorithm takes m + n operations, where m is the number of symbols in our alphabet and n is the number of symbols in our string. m + n is no larger than mn whenever m and n are both at least 2, and it is much smaller once both are large, so this algorithm is faster than the obvious algorithm.
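The generalized version might be sketched as below. Note the substitution: an arbitrary symbol has no natural ordinal position, so this sketch uses a Python dictionary for the buckets instead of the fixed-size array described earlier. The name `contains_all_symbols` is hypothetical. Dictionary lookups take constant time on average, so the roughly m + n operation count still applies.

```python
def contains_all_symbols(symbols, sequence):
    """Return True if every symbol in symbols occurs at least once in sequence."""
    counts = {symbol: 0 for symbol in symbols}  # m buckets, all zero

    for item in sequence:  # one pass over the sequence: n steps
        if item in counts:  # ignore items that are not in the alphabet
            counts[item] += 1

    # one pass over the buckets: m steps
    return all(count > 0 for count in counts.values())
```

For example, `contains_all_symbols("abc", "cabbage")` returns `True`, while `contains_all_symbols([1, 2, 3], [1, 1, 2])` returns `False` because 3 never appears.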
© 2006-2020, Rahul Simha & James Taylor (revised 2021)