Module 5: Problem Solving with Chars and Strings


Objectives

 

Problem solving, using computer programming of course, is the quintessential skill in computer science.

It is also a vague, slow to acquire skill that takes time and patience. And with lots of examples, and practice.

What exactly does this mean how is it different from programming?

 

The last point is worth elaborating:

 

By the end of this module, for simple programs, you will be able to:

 

5.0 Audio:
 


5.0    Palindromes

 

Suppose we want to check whether sentences like

  char[] sentence = {'n','e','v','e','r',' ','o','d','d',' ','o','r',' ','e','v','e','n'};
are palindromes (without counting the spaces).
 

How do we get started?

The wrong way is to start programming right away:

  • "Oh, I'll use a for-loop to ..."
  • "Looks like I'll need to define an array variable ..."
 

Instead, let's use a higher-level approach:

  • See if a simpler version of the problem can be posed.

  • Then see if that can be used or modified easily.

In this case, a simpler version might be: identifying whether or not a char array (without spaces) is a palindrome.

Furthermore, suppose this sub-problem is already solved. That is, we have a method, say, that does this:

     static boolean isPalindrome (char[] letters)
     {
         // Assumption: letters has no spaces

         // ... some code ...
     }
     
 

Can we now solve the larger problem? We will use these steps:

   1.  Make a new array with the spaces removed.
       Thus "never odd or even" will become "neveroddoreven"
   2.  Then call our already written isPalindrome() method.
 

Is the first step straightforward? Here are some issues:

  • How do we know the size of the second array (without spaces)?
           ⇒ It's the length of the former minues the number of spaces.
  • How do we know how many spaces there are in the original?
           ⇒ We can do a count (for-loop) looking for spaces.
 

So, a slightly more detailed higher-level approach is:

   1.  Make a new array with the spaces removed.
       1.1   Count the number of spaces in the array.
       1.2.  Use this to determine the size of the new array.
       1.3.  Make the new array.
       1.4   Copy over the non-space characters from the old to the new.
   2.  Then call our already written isPalindrome() method.
 

We can now begin to outline some code.

  • First, in main:
         public static void main (String[] argv)
         {
             // Data:
             char[] sentence = {'n','e','v','e','r',' ','o','d','d',' ','o','r',' ','e','v','e','n'};
    
             // Step 1: create a new array with spaces removed:
             char[] withoutBlanks = removeBlanks (sentence);
    
             // Step 2: apply the usual palindrome checking:
             if ( isPalindrome (withoutBlanks) ) {
                 System.out.println ("Yup, it's a palindrome alright");
             }
         }
         
  • Now let's flesh out the space removal a little:
         static char[] removeBlanks (char[] A) 
         {
             // 1.1 Count the number of spaces.
             // 1.2 Calculate the size of the smaller array.
             // 1.3 Make the new array.
             // 1.4 Copy over the non-space chars from the old to the new.
         }
         
  • Notice how we've merely copied over the high-level approach as comments.

  • It is now much easier to do each one of these one by one.
 

Here's the method removeBlanks:

     static char[] removeBlanks (char[] A) 
     {
         // 1.1 Count the number of spaces.
         int numBlanks = 0;
         for (int i=0; i<A.length; i++) {
             if (A[i] == ' ') {
                 numBlanks ++;
             }
         }

         // 1.2 Calculate the size of the smaller array.
         int size = A.length - numBlanks;

         // 1.3 Make the new array.
         char[] B = new char [size];

         // 1.4 Copy over the non-space chars from the old to the new.

         // ... fill in the code needed ... 

     }
     
 

5.1 Exercise: In CheckPalindrome.java write out the rest of the method and test it out by printing the resulting smaller array.
 

Next, let's write isPalindrome():

     static boolean isPalindrome (char[] letters)
     {
         int n = letters.length;
         for (int i=0; i<(n-1)/2; i++) {
	     if (letters[i] != letters[n-i-1]) {
		 return false;
	     }
         }
	 return true;
     }
     
 

5.2 Exercise: In module5.pdf trace through the above program when the word is "river". And again when the word is "kayak". Why does the for-loop stop at (n-1)/2? Will the loop work for both odd and even sizes?
 

5.3 Exercise: Put the pieces together so that you have a complete program in CheckPalindrome.java and test with the following:

  char[] sentence = {'n','e','v','e','r',' ','o','d','d',' ','o','r',' ','e','v','e','n'};
  char[] studentPlea = {'C', 'a', 'n', ' ', 'i', ' ', 'a', 't', 't', 'a', 'i', 'n', ' ', 'a', ' ', 'C'};
  char[] yourFavorite = ...
Also, include a string that's not a palindrome.
 

5.4 Audio:
 


5.1    Approximate word matching

 

Let's start by testing whether two char arrays are equal:

  • Notice that the return value of isEqual is boolean:

  • Next, before even comparing array contents, we check that the lengths are equal:

    Clearly, if the lengths aren't equal, we can return false immediately.

  • One could use a "flag" variable:

  • Notice that, in main, we directly printing the return value of the method isEqual().
 

When are two words approximately equal?

    For numbers, we could say that 3.14 is approximately equal to 3.15. What about words?

  • We will use wildcard matching for approximate comparison.

  • Suppose an asterisk * represents "match any character".

  • Then, "rive*" matches "river" and "rivet".

  • Similarly, "*i*er" matches "river" and "tiger".

  • Thus, the problem we wish to solve: given two words with asterisks in them, establish whether they are equal.

  • Here's the program:

 

5.5 Exercise: In module5.pdf trace through the above program. Why is the condition B[i]!='*' needed? If you remove it, does the comparison fail for the above examples? Can you find examples where it would fail?
 

5.6 Exercise: In MyEqualWords.java replace the nested-if with a single if-statement that combines the conditions.
 

5.7 Video:

 


5.2    Searching for a word within a sentence

 

In this next example, we will look for a word (a search term) within a sentence and identify the position where it occurs (if it does):

  • Thus, if the sentence is "never odd or even" and the search word is "odd", we want to return 6.

    For the same sentence and search word "prime", we want to return -1 (not found).

  • Let's write a method called wordsearch that has the following signature:

  • If that's the case, we can use it as follows:

  • Here's one attempt:

 

5.8 Exercise: In module5.pdf trace through the program when the sentence is "never odd or even" and the search word is "eve"? What does this say about our wordsearch solution?
 

5.9 Exercise: In MyWordSearch.java modify the wordSearch method so that it returns a proper index only if a full word is found to match. That is, a search for "eve" in "never odd or even" should return -1, whereas searches for "never", "odd", "or", or "even" should return the correct position. Then, in the main method, print out the return values only for four calls to wordSearch using "Never", "even", "odd", and "ever" as the word to search for in the string "Never odd or even".
 

5.10 Audio:
 


5.3    Common prefixes

 

Next, let's consider the prefix problem: given two words, find their common prefix:

  • Example: rive is the common prefix shared by river and rivet.

  • Our goal is to write a method with this signature:
            static char[] commonPrefix (char[] A, char[] B)
    
    so that it takes two char arrays and returns the common prefix in a char array.

  • Then, we can use this method as follows:
        char[] word = {'r','i','v','e','r'};
        char[] word2 = {'r','i','v','e','t'};
        char[] prefix = commonPrefix (word, word2);
        // Should return "rive" in array prefix.
    

  • Let's start with this idea:
       1.  Suppose the two arrays are A and B.
       2.  Start with i=0 at the leftmost position of each array.
       3.  As long as A[i]==B[i], advance along the array.
       4.  The moment A[i] != B[i], stop.
    

  • Consider this attempt:

 

5.11 Exercise: In module5.pdf trace through the program when the words are "river" and "rivet". Why are we computing the minimum of the lengths of the two arrays? What would go wrong if we used A.length instead as the for-loop limit? Change the words to "river" and "rover". What do you notice? Trace through the program. Why doesn't it work?
 

Let's address the problem as follows:

 

5.12 Exercise: In module5.pdf trace through the program with the above fix. Why does this work?
 


5.4    Cryptography Part I: The Caesar cipher

 

From times ancient rulers and rebels found the need to communicate in code.

This meaning of code is different: something encrypted to prevent understanding.

The essence of cryptography:

  • Although there is lots of two-way communication, in essence, it boils down to a Sender that needs to send a message to a Receiver in the presence of an enemy interceptor.

  • Goals:
    • It should be hard for the enemy to decrypt.
    • It should be easy for sender to encrypt, and for the receiver to decrypt.

  • A cipher is an encryption technique.

  • The original text or message to be encrypted is called the plaintext.

  • The encrypted message or text is called: ciphertext.
 

One of the oldest ciphers is the Caesar cipher:

  • To simplify, let's assume: lowercase letters only (no whitespace or punctuation).

  • For encryption: each letter is shifted by a fixed amount (e.g., 2).

  • For decryption: shift back by the same amount.

  • Important: use wraparound.

 

5.13 Audio:
 

5.14 Exercise: In module5.pdf explain how you would use the integer representation of chars and the % (remainder) operator to compute the Caesar shift of a char. To simplify, first consider only lowercase letters 'a' through 'z' and suppose 'a' is represented by 0, 'b' is represented by 1 ... etc. Next, suppose we do a Caesar shift by the amount s. Write an expression for computing the shifted character. Show how it works for 'z'.
 

Next, let's write some code to implement the Caesar cipher:

  • To begin, let's write the test code in main:

  • Notice that the method caesarEncrypt takes a char[] array (the plaintext), and shift as parameters:

  • And returns a char[] array:

  • To print the array after encryption, we have used a method called Arrays.toString from the Java library:

  • To be able to use some library methods, we need to import the library:

  • import statements need to appear at the top, outside the class:

  • An import statement uses the import reserved word:

  • And names the library, in this case java.util:

  • The asterisk simply asks to import everything in the java.util library.

  • Alternatively, we could have imported just the class we need:

 

Now, let's write the code inside the method caesarEncrypt:

 

5.15 Exercise: In module5.pdf trace through the above method if it's called with with the letters "abc" and a shift of 2.
 

5.16 Exercise: In MyCaesar.java write the method caesarDecrypt so that the whole program works correctly.
 

5.17 Exercise: Now suppose k is an integer. What is the relationship between k % 26 and (k+26) % 26? Try out a few examples and see if you can infer a rule. Write it out in module5.pdf.
 

Now, let's look at an alternative way of writing the same program:

  • It seems that we should be able to use the almost-symmetry between encryption in the following way:

  • Thus, decryption is merely a negative shift (shift in the reverse direction):

  • To do this, we'll use the property of the mod operator that we explored earlier:

  • We've added 26 so that the number being mod'ed is positive.

 

5.18 Exercise: In MyCaesar2.java fill in the remaining code (in method caesarShift) to make the program work.
 

5.19 Audio:
 


5.5    Cryptography Part II: The Vigenere cipher

 

Improving on the Caeser cipher:

  • As we've seen, the Caesar cipher is rather easy to break.

  • The problem is: there are only 26 possible shifts
           ⇒ the same shift is used for each plaintext letter.

  • It's convenient of course: the receiver only needs to know the shift once
           ⇒ It can be used for many messages.

  • An idea for strengthening the encryption: use a different shift for each char in the text

  • Unfortunately, there are several downsides:
    • Both sender and receiver have to have communicated the sequence of "shifts" earlier.
      In this case, for example, both would have to "know"
      
        2 - 1 - 0 - 3 - 10 - 8 - 3 - 1 - 4 - 7 - 7 - 9  
                
    • Because it's hard to remember, it would have to be stored somewhere, risking discovery.
    • Messages can't be longer than the shift-pattern.
 

The Vigenere cipher addresses this problem:

  • First, note that shifts are numbers between 0 and 25.
           ⇒ A shift can be encoded as a letter!

  • Thus, a "shift of 3" can be expressed as a "shift by 'd'".

  • But is a pattern like
    
      c  b  a  d  k  i  d  b  e  h  h  j  
              
    any easier to remember than
    
      2 - 1 - 0 - 3 - 10 - 8 - 3 - 1 - 4 - 7 - 7 - 9?
              

  • Also, it is just as susceptible to discovery if written somewhere.

  • The "key" insight of Monsieur Vigenere:
    • Both sender and receiver agree on a reasonable-size, easy-to-memorize key
             ⇒ A string of letters.
    • Example: "bad cafe".

  • Then, the key is repeated (concatenated with itself) as much as needed to match the length of the text.

  • Then, above each text letter is a key letter.
           ⇒ The key letter specifies the shift.
 

5.20 Exercise: What is the 3-letter Vigenere key such that encryption with this key will result in the equivalent ciphertext that you get with a Caesar shift by 2?
 

Next, we'll work towards an implementation:

  • Let's start with the "test" code in main

  • Next, consider vigenereEncrypt:
        static char[] vigenereEncrypt (char[] text, char[] key)
        {
    	int n = text.length;
    
    	// Make the space for the encrypted text.
    	char[] encrypted = new char [n];
    	
    	// Repeat the key up to n chars and get the shifts
    	int[] shift = computeShifts (key, n);
    
    	// Apply the shift for each text letter
    	for (int i=0; i<n; i++) {
    	    encrypted[i] = shiftChar (text[i], shift[i]);
    	}
    	return encrypted;
        }
    

  • Notice we have used the same shiftChar method used with the Caesar cipher.

  • Decryption is symmetric, with the same key:
        static char[] vigenereDecrypt (char[] text, char[] key)
        {
    	int n = text.length;
    	char[] encrypted = new char [n];
    	int[] shift = computeShifts (key, n);
    	for (int i=0; i<n; i++) {
    	    encrypted[i] = shiftChar (text[i], -shift[i]);
    	}
    	return encrypted;
        }
    

  • What's left: extending the key by repetition and returning the corresponding array of shifts.
 

5.21 Exercise: In MyVigenere.java write the code for computeShifts and complete the whole program so that both encryption and decryption work.
 


5.6    Cryptography Part III: Transposition ciphers

 

The ciphers we've seen so far use letter substitutions: they are called substitution ciphers.

In a transposition cipher, all the original plaintext letters are used, but they are shuffled.

For example, one could reverse the characters:

     nwadtakcatta
  

Clearly, this is not as effective as other possible shuffles.

Let's consider a two-interleaving:

  • Here all the letters are array indices 0, 2, 4, ... are clustered together on the left in the order in which they occur, while the odd ones are to the right.

  • Now let's write encryption:
        static char[] transposeEncrypt (char[] text)
        {
    	int n = text.length;
    	char[] encrypted = new char [n];
    
    	// First copy over the chars located at even positions.
    	int k = 0;
    	for (int i=0; i<n; i+=2) {
    	    encrypted[k++] = text[i];
    	}
    
    	// Now copy over the chars at odd positions.
    
            // FILL IN THIS PART
    
    	return encrypted;
        }
    

  • Notice how the letters in the even-numbered positions get copied into the first half of the encrypted text:
    	int k = 0;
    	for (int i=0; i<n; i+=2) {
    	    encrypted[k++] = text[i];
    	}
    
    Here, we use two array index variables:
    • The variable i traverses the text array.
    • And k traverses the encrypted array.
    • Because both traverse differently, we increment them differently.

  • Next, observe how we incremented k:
    	    encrypted[k++] = text[i];
    
    Here
    • k is used as the array index and THEN incremented.
    • It has exactly the same effect as:
      	    encrypted[k] = text[i];
                  k++;
      
    • It's used as a "short cut" because it's more compact.
 

5.22 Exercise: In MyTranspose.java complete the rest of the encrypted (copying over from the odd-numbered position in the text).
 

5.23 Exercise: Write the code for decryption and complete the program (in MyTranspose.java).
 


5.7    Cryptography Part IV: Combining ciphers

 

One can easily encrypt already-encrypted text, e.g.

Notice how each successive encryption is fed as the plaintext to the next encryption:

 

5.24 Exercise: Pool together your code from the three ciphers and apply the above encryption in MyCombinedCipher.java What is the resulting ciphertext? Then, show that the ciphertext decrypts when the decryption is applied in reverse order.
 

5.25 Exercise: With the same ciphers in the same order but using "java" as the Vigenere key, the following is the result of encryption: "mgfcmymfmiqqddufmyowomyhfgbhssrvpdzb" Decrypt to find the original plaintext.
 

About cryptography:

  • Modern ciphers operate at the bit-level, not on characters, but the same principles apply.

  • Modern ciphers apply several types of encryption in sequence.

  • We have barely scratched the surface of cryptography.

  • "Crypto", as it's informally called by its practitioners, is vast subject with many deep and fascinating results.

  • Many key people involved in cryptography were or are computer scientists - such as Alan Turing.

  • Crypto is a part of a larger field called computer security, which also involves secure computing systems and networks.
 

Let's step back for a moment:

  • With the cryptography examples, we've gone beyond problem-solving to actually developing an application.

  • It would be challenging to implement all the ideas in one go.

  • Instead, by breaking it down into manageable (yet still challenging) pieces, we've developed a fairly sophisticated program.
 

5.26 Audio:
 


5.8    Computational Biology: Protein "strings"

 

Having scratched the surface of crypto, we'll now scratch another: computational biology.

What we'll do:

  • Understand that proteins (an important type of bio molecule) can be represented using strings.
           ⇒ the strings are called protein sequences.

  • Perform some types of operations on these protein sequences that are useful in biology.
 

Let's start by understanding what a protein is.
 

5.27 Exercise:

  • Look up proteins - what is a protein and what does it do?
  • Look up insulin. What is its purpose? What diseases are associated with insulin? What Nobel prizes were associated with insulin?
 

What we need to know about proteins:

  • Although a protein has a 3D shape, it can be considered a long chain of "units".
           ⇒ The 3D shape results from this chain folding on itself.

  • Each unit in the chain is determined by a particular kind of small (or sub-) molecule called an amino-acid.

  • There are 20 types of amino acids.

  • These are the 20:

    # Amino Acid Symbol
    1 Alanine A
    2 Arginine R
    3 Asparagine N
    4 Aspartic acid D
    5 Cysteine C
    6 Glutamic acid E
    7 Glutamine Q
    8 Glycine G
    9 Histidine H
    10 Isoleucine I
    11 Leucine L
    12 Lysine K
    13 Methionine M
    14 Phenylalanine F
    15 Proline P
    16 Serine S
    17 Threonine T
    18 Tryptophan W
    19 Tyrosine Y
    20 Valine V

  • Think of the 20 amino acids as represented by their (20) letters:
          A R N D C E Q G H I L K M F P S T W Y V
        

  • Then, a protein is a "word" made from letters in this 20-letter alphabet.

  • For example: here is part of the insulin protein:
    	F V N Q H L C G S H L V E A L Y L V C G E R G F F Y T P K T 
        
    • It's length is 30.
    • It happens to be called the B-chain in human insulin.
 

To start with, let's understand a couple of things about insulin:

  • Insulin is made from a precursor molecule call preproinsulin.

  • Preproinsulin is made directly from the gene for insulin.

  • For example, here is the 110-letter sequence for preproinsulin:
    MALWMRLLPLLALLALWGPDPAAAFVNQHLCGSHLVEALYLVCGERGFFYTPKTRREAEDLQVGQVELGGGPGAGSLQPLALEGSLQKRGIVEQCCTSICSLYQLENYCN
    

  • Parts of this are called the A-chain, B-chain, and C-chain respectively.

  • For example, this is the 30-letter B-chain above:
    MALWMRLLPLLALLALWGPDPAAAFVNQHLCGSHLVEALYLVCGERGFFYTPKTRREAEDLQVGQVELGGGPGAGSLQPLALEGSLQKRGIVEQCCTSICSLYQLENYCN
    

  • And the 21-letter A-chain:
    MALWMRLLPLLALLALWGPDPAAAFVNQHLCGSHLVEALYLVCGERGFFYTPKTRREAEDLQVGQVELGGGPGAGSLQPLALEGSLQKRGIVEQCCTSICSLYQLENYCN
    

  • The sequence in between is called the C-chain:
    MALWMRLLPLLALLALWGPDPAAAFVNQHLCGSHLVEALYLVCGERGFFYTPKTRREAEDLQVGQVELGGGPGAGSLQPLALEGSLQKRGIVEQCCTSICSLYQLENYCN
    

  • The real insulin molecule is actually a protein complex that is formed from the A and B chains.
 

Let's now work on some string problems related to insulin.

Before doing so, it'll help to review some methods available for strings in Java.
 

5.28 Exercise: Go back to Module 4 and review these methods in String: charAt, indexOf, length, substring, equals.
 

To start with, let's look for the three chains in preproinsulin:

  • Suppose we are given:
    1. The sequence for preproinsulin.
    2. The A-chain and B-chain.

    The goal: find where in the preproinsulin they occur, and extract the C-chain.

  • Here is the program:
    	String preproSeq = "MALWMRLLPLLALLALWGPDPAAAFVNQHLCGSHLVEALYLVCGERGFFYTPKTRREAEDLQVGQVELGGGPGAGSLQPLALEGSLQKRGIVEQCCTSICSLYQLENYCN";
    	String aChain = "GIVEQCCTSICSLYQLENYCN";
    	String bChain = "FVNQHLCGSHLVEALYLVCGERGFFYTPKT";
    
    	// Find where the A and B chains occur in preproinsulin.
    	int aIndex = preproSeq.indexOf (aChain);
    	int bIndex = preproSeq.indexOf (bChain);
    	System.out.println ("A chain starts at position " + aIndex);
    	System.out.println ("B chain starts at position " + bIndex);
    
    	// Extract the C-chain.
    	int cIndex = bIndex + bChain.length();
    	String cChain = preproSeq.substring (cIndex, aIndex);
    	System.out.println ("C-chain: " + cChain);
         
  • Note: we've used a version of the method indexOf() that takes a String as parameter and returns the start index where it occurs.
 

5.29 Exercise: The following are the preproinsulin sequences for gorilla and owl monkey. Use the equals method to see whether they are the same as human preproinsulin. That is, write a short program in MyInsulin.java to do this testing in main.

MALWMRLLPLLALLALWGPDPAAAFVNQHLCGSHLVEALYLVCGERGFFYTPKTRREAEDLQVGQVELGGGPGAGSLQPLALEGSLQKRGIVEQCCTSICSLYQLENYCN

MALWMHLLPLLALLALWGPEPAPAFVNQHLCGPHLVEALYLVCGERGFFYAPKTRREAEDLQVGQVELGGGSITGSLPPLEGPMQKRGVVDQCCTSICSLYQLQNYCN
Download sequences.txt, which has the above two sequences for copy-paste, as well as the sequences needed for the exercises below.
 

Next, let's consider the problem of approximate string matching:

  • Suppose we don't know the identity of the A and B chains of the owl monkey.

  • We do know the human ones.

  • Thus, we'll solve this problem:
    • We're given the preproinsulin sequence of the owl monkey:
      MALWMHLLPLLALLALWGPEPAPAFVNQHLCGPHLVEALYLVCGERGFFYAPKTRREAEDLQVGQVELGGGSITGSLPPLEGPMQKRGVVDQCCTSICSLYQLQNYCN
      
    • We know the human B-chain:
      FVNQHLCGSHLVEALYLVCGERGFFYTPKT
      
    • We want to know: which substring of the owl-monkey matches the human B-chain most closely.

  • To do this, we will develop the idea of distance between two strings.
 

Consider the following computation of string distance:

 

5.30 Exercise: Write code in main above to apply this distance measure to compare "bold" with each of the following strings: "bald", "boy", "old".
 

About string distance:

  • In general, there are many definitions of string distance.

  • The simple definition above failed for a simple "shifted" string.

  • In fact, in computational biology, there are some fairly complicated definitions that make use of statistics.
 

We'll use our simple distance measure to find the best B-chain match within the owl monkey chain.

  • For example, suppose we compare the human B-chain at the 5-th position of the owl-monkey preproinsulin:
    MALWMHLLPLLALLALWGPEPAPAFVNQHLCGPHLVEALYLVCGERGFFYAPKTRREAEDLQVGQVELGGGSITGSLPPLEGPMQKRGVVDQCCTSICSLYQLQNYCN
         FVNQHLCGSHLVEALYLVCGERGFFYTPKT
    

  • To use the distance method we wrote earlier, we'd have to extract the sub-string from the preproinsulin sequence.

  • Here's a program that makes the comparison at the 5-th position.
    	String owlMonkeyPrepro = "MALWMHLLPLLALLALWGPEPAPAFVNQHLCGPHLVEALYLVCGERGFFYAPKTRREAEDLQVGQVELGGGSITGSLPPLEGPMQKRGVVDQCCTSICSLYQLQNYCN";
    	String humanB = "FVNQHLCGSHLVEALYLVCGERGFFYTPKT";
    
    	int i = 5;
    	String sub = owlMonkeyPrepro.substring (i, i + humanB.length());
    	int d = distance (sub, humanB);
    	System.out.println ("Distance=" + d + " at position=" + i);
    	System.out.println ("Human B  : " + humanB);
    	System.out.println ("Owl match: " + sub);
        
 

5.31 Exercise: In StringDistance2.java, expand on the above code to search all possible positions in the owl-monkey preproinsulin and find the best possible match (with the least distance). Print the position where this occurs, and the distance.
 

5.32 Exercise: Now try the same thing with a few non-primates. Before running your code, which of the three do you expect to produce a result closest to human? Farthest?

Hummingbird:
IQSLPLLALLALSGPGTSHAAVNQHLCGSHLVEALYLVCGERGFFYSPKARRDAEHPLVNGPLHGEVGDLPFQQEEFEKVKRGIVEQCCHNTCSLYQLENYCN

Zebrafish:
MAVWLQAGALLVLLVVSSVSTNPGTPQHLCGSHLVDALYLVCGPTGFFYNPKRDVEPLLGFLPPKSAQETEVADFAFKDHAELIRKRGIVEQCCHKPCSIFELQNYCN

Fruitfly:
SSSSSSKAMMFRSVIPVLLFLIPLLLSAQAANSLRACGPALMDMLRVACPNGFNSMFAKRGTLGLFDYEDHLADLDSSESHHMNSLSSIRRDFRGVVDSCCRKSCSFSTLRAYCDS
 

Even when a mismatch looks very difficult, it is possible to identify reasonable matches.
       ⇒ This is the subject of advanced techniques in computational biology.
 

5.33 Video:

 


5.9    Meta

 

This module stretched us a bit. Let's ask ourselves: why was it challenging?

  • The issue is: when applying programming to a domain, we need to spend effort in understanding the domain. Doing both, programming and wading into the details of a domain, can be challenging.

  • Over time, as you get better at programming, you will worry less about the programming and concentrate on the domain of interest.

  • Another challenge is that our programs have gotten longer because they are "doing more".

  • Again, we will get better at reading and writing. After a few more units, this module will seem much simpler.

  • Pushing yourself through challenging material like this will build confidence for later.

  • After all, if you can handle cryptography and computational biology, can anything else be difficult?
 

5.34 Audio:


On to Module 6



© 2017, Rahul Simha