\( \newcommand{\eqb}[1]{\begin{eqnarray*}#1\end{eqnarray*}} \newcommand{\eqbn}[1]{\begin{eqnarray}#1\end{eqnarray}} \newcommand{\bb}[1]{\mathbf{#1}} \newcommand{\nchoose}[2]{\left(\begin{array}{c} #1 \\ #2 \end{array}\right)} \newcommand{\defn}{\stackrel{\vartriangle}{=}} \newcommand{\eql}{\;\; = \;\;} \definecolor{dkblue}{RGB}{0,0,120} \definecolor{dkred}{RGB}{120,0,0} \definecolor{dkgreen}{RGB}{0,120,0} \newcommand{\bigsp}{\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;} \newcommand{\plss}{\;\;+\;\;} \newcommand{\miss}{\;\;-\;\;} \newcommand{\implies}{\Rightarrow\;\;\;\;\;\;\;\;\;\;\;\;} \)


Module 3: Search Trees


Module objectives

 

By the end of this module, you will be able to:

 


3.1   Introduction: What does "search" mean?

 

The general search problem:

  • Given a collection of data and a "search term", find matching data from the collection.

Examples:

  • Word search in a dictionary
    • Dictionary: collection of words (strings), with accompanying "meanings"
    • Search term: a word
    • Goal: find the word's "meaning", if it exists in the dictionary.
    • Similar examples: symbol table search.

  • Pattern search in a text
    • Text: a large blob of characters.
    • Pattern: boolean or regular expression (e.g., "Find 'Smit*' OR 'J*n*s'")
    • Goal: find all occurrences of the pattern in the text.
    • Similar examples: the Unix grep utility.

  • Document retrieval
    • A large collection of documents.
    • Input: a search string, boolean conditions
    • Goal: retrieve all documents containing the string.

  • Geometric searching
    • Data: list of points
    • Input: a query rectangle
    • Goal: find all points that lie in the query rectangle
    • Similar examples: nearest-point, intersecting objects

  • Database search
    • Data: collection of relational tables
    • Input: a database query (SQL)
    • Goal: compute query results (find matching data)
 

We will divide search problems into the following categories:

  • Ordered Search:

    • Example data: a set of strings
    • Input: a query string
    • Main goals: is the query string in the collection? Find minimum, maximum.
    • Other goals: support insertion, deletion, sorting, find-nearest (in order).
    • Variations: other data types (numbers, Comparable-objects).
    • Typical (Java) interface:
      
      public interface OrderedSearchAlgorithm {
      
        public void insert (Comparable key, Object value);
        
        public Object search (Comparable key);
      
        public Object minimum ();
      
        public Object maximum ();
      
      }
             

  • Equality Search:

    • Example data: a set of images
    • Input: a query image
    • Main goal: is the query image in the collection?
            ⇒ Faster search if "order" is not important.
    • Other goals: support insertion, deletion.
    • Variations: other data types (numbers, Comparable-objects).
    • Typical interface:
      
      public interface EqualitySearchAlgorithm {
      
        public void insert (Object key, Object value);
        
        public Object delete (Object key);
        
        public Object search (Object key);
      
      }
             

  • Pattern Search:

    • Data: a large text.
    • Input: a query string.
    • Goal: find the first occurrence of the pattern in the text.
    • Variations: find all occurrences.
 

Algorithms vs. data structures:

  • Search "algorithm" usually implies underlying data structures
          ⇒ distinction is blurred.

  • Search_solution = data_structure + algorithm_collection
 

Maps and Sets:

  • It is useful to distinguish between a map and a set.

  • Set:
    • Usually a collection of objects.
    • Typical interface:
      
      public interface Set {
      
        public void add (Object obj);
      
        public boolean search (Object obj);
      
        public void delete (Object obj);
      
        public Enumeration getEnumeration ();
      
      }
           
    • Note: no notion of a "key".
    • Examples: collections of strings, collections of integers.
    • Java API implementations: HashSet, TreeSet.

  • Map:
    • Supports key-value pairs (the "mapping").
    • Associate a key with every object.
    • Example:
      • Suppose we need to store complex (compound) objects, e.g.

      • Associate a key with each object:

      • Store both keys and objects:

    • Typical interface:
      
      public interface Map {
      
        public void add (Comparable key, Object value);
      
        public Object search (Comparable key);
      
        public void delete (Comparable key);
      
        public Enumeration getKeyEnumeration ();
      
        public Enumeration getObjectEnumeration ();
      
      }
           
    • Java API implementations: Hashtable, HashMap, TreeMap.

  • Note:
    • Most often we will study maps instead of sets.
    • "key-value" pair is also used instead of "key-object" pair.
    • Often, we will ignore the "value" part since only the "keys" play a role in searching.
      We will say "insert a key" where we mean "insert a key-value pair".
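The map/set distinction shows up directly in the Java library. Here is a minimal sketch using HashSet and HashMap (the class name MapSetSketch and the sample data are illustrative):

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

public class MapSetSketch {
    public static void main(String[] args) {
        // A Set holds bare objects: membership is the only question.
        Set<String> words = new HashSet<>();
        words.add("apple");
        words.add("banana");
        System.out.println(words.contains("apple"));   // true
        System.out.println(words.contains("cherry"));  // false

        // A Map holds key-value pairs: the key locates the value.
        Map<String, String> meanings = new HashMap<>();
        meanings.put("apple", "a fruit");
        meanings.put("banana", "another fruit");
        System.out.println(meanings.get("apple"));     // a fruit
        System.out.println(meanings.get("cherry"));    // null
    }
}
```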

Preview:

  • Ordered search:
    • Uses natural ordering among items
      (e.g., alphabetically-ordered strings).
    • For fast navigation: split up data according to order:
            ⇒ tree structure
    • Examples: binary tree, AVL-tree, multi-way tree, self-adjusting binary tree, tries

  • Equality search:
    • Examples: self-adjusting binary tree, hashing

  • Pattern search:
    • String searches in large texts.
    • Boyer-Moore algorithm, finite-automata searching.
 

In-Class Exercise 3.1: Download MapSetExample.java and add code:

  • Create an instance of a set data structure from the Java library, add a few elements to the set, and print those elements at the end.
  • Create a similar example of a map data structure, add a few elements to the map, and print those elements at the end.
 


3.2   Binary search trees

 

What is a binary search tree?

  • A Binary Search Tree (BST) is a binary tree in which, at every node, all keys in the left subtree are smaller than the node's key and all keys in the right subtree are larger (so an in-order traversal visits the keys in sorted order).

  • The following ordered-search operations are supported:
    • Insertion: insert a key-value pair.
    • Search: given a key, return corresponding key-value pair if found.
    • Min, max: return minimal and maximal keys in structure.
    • Deletion: given a key, delete the corresponding key-value pair (if found).

  • Additional operations: successor, predecessor of keys.

  • BST's are implemented using binary trees.
 

Insertion:

  • Key ideas:
    • Search to see if it's already there.
    • If not, navigate tree looking for the "right" place to insert.

    • Go left if key is less than current node, right otherwise.
     

    In-Class Exercise 3.2: Write pseudocode for search, both a recursive and an iterative version.
     

  • Pseudocode for insertion:
    
    Algorithm: insert (key, value)
    Input: key-value pair
    
    1.   if tree is empty
    2.     create new root with key-value pair;
    3.     return
    4.   endif
    
    5.   treeNode = search (key)
    
    6.   if treeNode not null
           // It's already there.      
    7.     Replace old with new;
    8.     return
    9.   endif
    
         // Otherwise, use recursive insertion 
    10.  recursiveInsert (root, key, value)
      

    
    Algorithm: recursiveInsert (node, key, value)
    Input: a tree node (root of subtree), key-value pair 
    
    1.   if key < node.key
    
    2.     if node.left not empty
    3.       recursiveInsert (node.left, key, value)
    4.       return
    5.     endif
    
           // Otherwise, append key here.      
    6.     Create new node and append to node.left;
    
         // Otherwise, go right. 
    7.   else
    
    8.     if node.right not empty
    9.       recursiveInsert (node.right, key, value)
    10.      return
    11.    endif
    
           // Otherwise, append key here. 
    12.    Create new node and append to node.right;
    
    13.  endif
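The two routines above can be sketched in Java. This is a minimal sketch assuming a simple inner Node class; the name BSTSketch and its fields are illustrative, not from any course code:

```java
public class BSTSketch {
    static class Node {
        Comparable key; Object value;
        Node left, right;
        Node(Comparable k, Object v) { key = k; value = v; }
    }

    Node root;

    public void insert(Comparable key, Object value) {
        if (root == null) { root = new Node(key, value); return; }
        recursiveInsert(root, key, value);
    }

    void recursiveInsert(Node node, Comparable key, Object value) {
        int cmp = key.compareTo(node.key);
        if (cmp == 0) { node.value = value; return; }    // already there: replace old with new
        if (cmp < 0) {
            if (node.left != null) recursiveInsert(node.left, key, value);
            else node.left = new Node(key, value);       // append here
        } else {
            if (node.right != null) recursiveInsert(node.right, key, value);
            else node.right = new Node(key, value);      // append here
        }
    }

    // Iterative search: walk left or right until found or null.
    public Object search(Comparable key) {
        Node n = root;
        while (n != null) {
            int cmp = key.compareTo(n.key);
            if (cmp == 0) return n.value;
            n = (cmp < 0) ? n.left : n.right;
        }
        return null;
    }
}
```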
      

  • Example: we will insert the (single-letter) keys "D A C B E F G".
    Insert D: "D" becomes the root.
    • Inserting key = "A"

    • Inserting key = "C"

    • Inserting key = "B"

    • Inserting key = "E"

    • Inserting key = "F"

 

In-Class Exercise 3.3: Insert the following (single-character) elements into a binary tree and draw the binary tree: "A B C D E F G".
 

Min (and max keys)

  • To find the minimum value:
    • Start at the root.
    • Keep going left until you can't anymore.
    • The stopping point has the least value.

  • Example (see above tree).

  • To find the minimum value in a subtree:
    treat the root of the subtree as a root.
 

In-Class Exercise 3.4:

  • Write pseudocode (high-level and detailed) for computing Max, both recursive and iterative versions.
  • Review pre-, in- and post-order traversal of binary trees. Write pseudocode for recursive versions of each.
 

Successors (and predecessors):

  • The successor to a particular key is the smallest key in the tree that is larger than the given key.

  • The successor to a particular node is the successor to the node's key.

  • Example: the successor to "E" above is "F".

  • Finding the successor to a node:
    • If the node has a right child, successor is
          ⇒ minimum value in subtree rooted by right child.
      (Intuition: if a node has "stuff" to the right-and-below, the successor has to be in there).

    • Note: we already know how to find the minimum element in a sub-tree (see above).

    • If the node does NOT have a right child, there are two cases:
      • Case 1: the node is the root
              ⇒ no successor.
      • Case 2: the node is not the root
              ⇒ successor is between the node and the root.
              ⇒ need to walk back up until a "right turn" occurs.
        (Intuition: if you are walking up "right links" you get smaller values, until the turn occurs.)
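The two successor cases above can be sketched in Java. This sketch assumes each node stores a parent pointer (an assumption: the insertion pseudocode earlier does not maintain one):

```java
public class SuccessorSketch {
    static class Node {
        Comparable key;
        Node left, right, parent;
        Node(Comparable k) { key = k; }
    }

    // Minimum of the subtree rooted at node: keep going left.
    static Node minimum(Node node) {
        while (node.left != null) node = node.left;
        return node;
    }

    static Node successor(Node node) {
        // Case 1: a right child exists => minimum of the right subtree.
        if (node.right != null) return minimum(node.right);
        // Case 2: walk up until we arrive from a left child (a "right turn").
        Node p = node.parent;
        while (p != null && node == p.right) { node = p; p = p.parent; }
        return p;   // null when node held the maximum key
    }
}
```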

 

Deletion:

  • Deletion is a little more involved.

  • Case 1: node is a leaf (no children)
    • Easy case: simply delete leaf and set pointer to null in parent.

  • Case 2: node has one child
          ⇒ "splice out" node
          ⇒ make the node's child a direct child of node's parent

  • Case 3: node has two children:
    • Find the node's successor.
    • The successor will have at most one child.
       

      In-Class Exercise 3.5: Why is this true?

    • Splice out the successor using Case 1 or 2.
    • Replace the node with the successor, setting pointers correctly.

  • Example:
    • Delete key="C" (Case 1: no children ⇒ direct deletion)

    • Delete key="F" (Case 2: one child ⇒ replace with child "G")

    • Delete key="D" (Case 3: two children ⇒ splice out and replace with successor "E")

    • Final tree:

 

Analysis:

  • What are we analysing?
          ⇒ Time taken for operations: insert, search, delete.

  • Let h be the length of the longest path from the root to a leaf.

  • Time per insert: O(h).

  • Time for n insert's: O(nh).

  • Time per search: O(h).

  • Time per delete:
    • O(h) time to find node.
    • O(h) time to find successor (if needed)
    • O(1) (constant) time to splice, replace.
    ⇒ total time: O(h).

  • Worst case: h = n.

  • Best case: h = log(n).
 

In-Class Exercise 3.6:

  • For a complete and balanced binary tree with n nodes, show that the height is \(O(\log_2 n)\). Start by computing the number of nodes in a balanced binary tree when its height is h.
  • For a complete and balanced ternary tree (three-way branching at each node) with n nodes, what is the height?
 


3.3   AVL trees

 

Motivation:

  • Binary search trees can get very "unbalanced" quite easily
    e.g., when data inserted is almost-sorted.

  • AVL idea: hey, why not enforce balance?
 

Key ideas:

  • Use tree rotations to force balancing.

  • A rotation re-organizes the tree locally.
    (at the root of a subtree).

  • Maintain left-height and right-height in each node:
    left-height = worst height of (longest path in) left subtree
    right-height = worst height of (longest path in) right subtree

  • During insertion: if heights are unbalanced, restore balance with rotations.

    Note: AVL = Adelson-Velskii and Landis (authors of original paper)

 

Rotations:

  • The (sub) tree rooted at G is unbalanced.
          ⇒ left-height is larger.
          ⇒ rotate left-child of G into root position:

    Note:

    • Rotation preserves in-order.
    • \(S_1, S_2, S_3\) refer to entire subtrees, whereas E and G are single nodes (keys).

  • Example of larger right-height (left-rotate):

  • A case where it fails:

  • To restore, use double rotation: (doubleRotateLeftChild operation)

    followed by

  • Note: We've revealed parts of the subtree labeled \(S_2\).
    • The root of that subtree is the key "F".

  • Example of double-rotation on the right side: (doubleRotateRightChild operation)

    followed by

 

In-Class Exercise 3.7:

  1. Consider the following unbalanced tree:

    Perform a left rotation of node 8. Then perform a rotation of node 8 into the root position.

  2. For review purposes (this part does not need to be submitted), write down the (recursive) pseudocode for binary-search-tree insertion, without looking at any notes. Then insert D G B C A E F in that order. How does the stack change when inserting F?
 

Insertion in an AVL tree:

  • Pseudocode:
    
    Algorithm: insert (key, value)
    Input: key-value pair
    1.   if tree is empty
    2.     create new root with key-value pair;
    3.     return
    4.   endif
         // Otherwise use recursive insertion 
    5.   root = recursiveInsert (root, key, value)
      

    
    Algorithm: recursiveInsert (node, key, value)
    Input: subtree root (node), key-value pair
         // Bottom out of recursion 
    1.   if node is empty
    2.     create new node with key-value pair and return it;
    3.   endif
         // Otherwise, insert on appropriate side 
    4.   if key < node.key
           // First insert. 
    5.     node.left = recursiveInsert (node.left, key, value)
    6.     // Then, check whether to balance. 
    7.     if height(node.left) - height(node.right) > 1
    8.       if insertion occurred on left side of node.left
    9.         node = rotateLeftChild (node)
    10.      else
    11.        node = doubleRotateLeftChild (node)
    12.      endif
    13.    endif
    14.  else
           // Insert on right side 
    15.    node.right = recursiveInsert (node.right, key, value)
           // Check balance. 
    16.    if height(node.right) - height(node.left) > 1
    17.      if insertion occurred on right side of node.right
    18.        node = rotateRightChild (node)
    19.      else
    20.        node = doubleRotateRightChild (node)
    21.      endif
    22.    endif
    23.  endif
    24.  recomputeHeight (node);
    25.  return node;
    Output: pointer to subtree
      

  • Example: insert D A C B E F G

  • A red-outlined node indicates the currently active node during recursion (not every step is shown).
    • Inserting "A"

    • Inserting "C"

    • Inserting "C" (recursion back up at "D")

    • Inserting "C": Rebalance required at "D", leftheight=2, rightheight=0, double-rotate left-child

    • Inserting "C": after double rotation

    • Inserting "B":

    • Inserting "B": recursion back up at "C" (no rebalancing needed)

    • After inserting "E" and then "F":

    • Inserting "F": recursion back up at "D" (requires rebalancing: rotate right child)

    • Inserting "F": after rotation

    • Final tree after inserting "G"

Search: same as in binary search tree.

Deletion: complicated (not covered here).
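The rotations and height bookkeeping used by the insertion routine above can be sketched in Java. The Node class, the names, and the convention that a leaf has height 0 (with an empty subtree counted as -1) are illustrative assumptions:

```java
public class AVLRotationSketch {
    static class Node {
        Comparable key;
        Node left, right;
        int height;   // height of the subtree rooted here (leaf = 0)
        Node(Comparable k) { key = k; }
    }

    static int height(Node n) { return (n == null) ? -1 : n.height; }

    static void recomputeHeight(Node n) {
        n.height = 1 + Math.max(height(n.left), height(n.right));
    }

    // Rotate the left child into n's position (left side too tall).
    static Node rotateLeftChild(Node n) {
        Node l = n.left;
        n.left = l.right;     // l's right subtree moves under n
        l.right = n;
        recomputeHeight(n);   // n first: it is now l's child
        recomputeHeight(l);
        return l;             // new subtree root
    }

    // Mirror image: rotate the right child into n's position.
    static Node rotateRightChild(Node n) {
        Node r = n.right;
        n.right = r.left;
        r.left = n;
        recomputeHeight(n);
        recomputeHeight(r);
        return r;
    }

    // Double rotation: first rotate inside the left child, then at n.
    static Node doubleRotateLeftChild(Node n) {
        n.left = rotateRightChild(n.left);
        return rotateLeftChild(n);
    }
}
```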
 

Analysis:

  • Cost of balancing adds no more than O(h), where h is the worst tree height.
  • Because the AVL tree remains balanced, insertion and search require O(log(n)) time per operation.
  • Extra space in each node required for storing height.
          ⇒ O(n) extra space.

In practice:

  • AVL trees are rarely used because of the difficulty of deletion, and because of the height-maintenance overhead.
  • Options: multiway trees, red-black trees, or self-adjusting binary trees.
 


3.4   Multiway trees

 

About multiway trees:

  • In a binary tree: only one key per node.

  • In a multiway tree: many keys per node (and many pointers).

  • We will study a specific kind of multiway tree where all leaf nodes will be at the same level.
          ⇒ always balanced.

  • Associate a number m, called the degree of the tree.

  • Node size: each node can contain at most 2m-1 keys.

  • Each node (except the root) must contain at least m-1 keys.
          ⇒ important for efficiency.

  • About the root:
    • May have fewer than m-1 keys.
    • Must have at least one key.

  • An internal node with k keys contains k+1 pointers.
    (Note: k < 2m)

  • Leaf nodes do not contain pointers.

  • New elements get inserted at the leaf level.
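One possible node layout matching the rules above, sketched in Java (field and class names are illustrative; the degree m is passed to the constructor):

```java
public class MultiwayNode {
    int numEntries;            // current number of keys, at most 2m-1
    Comparable[] keys;         // keys[0..numEntries-1], in sorted order
    Object[] values;           // parallel array of values
    MultiwayNode[] children;   // numEntries+1 child pointers (unused in a leaf)
    boolean isLeaf;

    MultiwayNode(int m, boolean leaf) {
        keys = new Comparable[2 * m - 1];
        values = new Object[2 * m - 1];
        children = new MultiwayNode[2 * m];  // one more pointer than keys
        isLeaf = leaf;
    }

    // Index of the first key >= searchKey (step 1 of recursiveSearch);
    // also the child pointer to follow when the key is not in this node.
    int findIndex(Comparable searchKey) {
        int i = 0;
        while (i < numEntries && keys[i].compareTo(searchKey) < 0) {
            i++;
        }
        return i;
    }
}
```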
 

In-Class Exercise 3.8: Suppose a multiway tree of degree m has n keys.

  1. Argue that the maximum possible height is obtained when each node except the root has m-1 keys.
  2. What is the maximum possible height in terms of n?
  3. What is the maximum possible height when \(n = 10^9\) and \(m = 10\)?
 

Searching:

  • Start at root node and search recursively.

  • Search current node's keys.
    If search-key is found, done.
    Else, identify child node, search recursively.
    If child-pointer is null, key is not in tree.

  • Example: (m = 2)
    Search for `6' in this tree:


    Search for `14':

 

Insertion:

  • We will learn insertion via an example:
    • m = 2.
    • Key values are integers in this example.
    • Insertion order: 1,7,8,10,9,3,2,5,4,6,11.

    Note:

    • When m=2: at least m-1=1 value per node.
    • When m=2: at most 2m-1=3 values per node.
    • "Median" value is 2nd value.
 

Insertion example:

  • Initially, create (empty) root node:

  • After insertion of `1', `7' and `8':

  • Insert `10':
    • Root node is full ⇒ must split root.
    • Median element `7' is bumped up one level (into new root).
    • New element is added in appropriate split node.
    Step 1: split node:

    Step 2: Add new key `10':

    Step 3: Insert median element `7' with left and right pointers in level above.
          ⇒ in this case, create new root.

  • Key ideas in insertion:
    • First, find correct leaf for insertion (wherever search ends).
    • If space is available, insert in node.
    • Else, split node and "push up" median:
            ⇒ insert median element into parent node (with pointers).
    • Add new element to left or right split nodes.
    • If parent is full, split that ...etc recursively.

  • (Example continued) After insertion of `9', `3' and `2':

  • Insert `5':
    • Search for correct leaf ⇒ the 1-2-3 node.
    • Node is full ⇒ split (median is `2').
    • Add new key to correct child: the `3' node

    • Insert `2' into parent:

  • After insertion of `4' and `6':

    Note:

    • The key `7' and everything to the right of it (including pointers) are shifted to the right.
    • The pointer between `2' and `7' is overwritten by `4' and its pointers.

    Final tree after inserting `11':

Pseudocode:

  • Search:
    
    Algorithm: search (key)
    Input: a search key
    1.   Initialize stack;
    2.   found = false
    3.   recursiveSearch (root, key)
    4.   if found
    5.     node = stack.pop()
    6.     Extract value from node;
           // For insertions: 
    7.     stack.push (node)
    8.     return value
    9.   else
    10.    return null
    11.  endif
    Output: value corresponding to key, if found.
      
    
    Algorithm: recursiveSearch (node, key)
    Input: tree node, search key
    1.   Find i such that i-th key in node is smallest key larger than or equal to key.
    2.   stack.push (node)
    3.   if found in node
    4.     found = true
    5.     return
    6.   endif
         // If we're at a leaf, the key wasn't found. 
    7.   if node is a leaf
    8.     return
    9.   endif
         // Otherwise, follow pointer down to next level
    10.  recursiveSearch (node.child[i], key)
      

  • Insertion:
    
    Algorithm: insert (key, value)
    Input: key-value pair
    1.   if tree is empty
    2.     create new root and add key-value;
    3.     return
    4.   endif
         // Otherwise, search: stack contains path to leaf. 
    5.   search (key)
    6.   if found
    7.     Handle duplicates;
    8.     return
    9.   endif
    10.  recursiveInsert (key, value, null, null)
      
    
    Algorithm: recursiveInsert (key, value, left, right)
    Input: key-value pair, left and right pointers
         // First, get the node from the stack. 
    1.   node = stack.pop()
    2.   if node.numEntries < 2m-1
           // Space available. 
    3.     insert in correct position in node, along with left and right pointers.
    4.     return
    5.   endif
    6.   // Otherwise, a split is needed. 
    7.   medianKey = m-th key in node;
    8.   medianValue = m-th value in node;
    9.   Create newRightNode;
    10.  Place keys-values-pointers from 1,...,m-1 in current (left) node.
    11.  Place keys-values-pointers from m+1,...,2m-1 in newRightNode;
    12.  Put key and value in appropriate node (node or newRightNode).
    13.  if node == root
    14.    createNewRoot (medianKey, medianValue, node, newRightNode)
    15.  else
    16.    recursiveInsert (medianKey, medianValue, node, newRightNode)
    17.  endif
      
 


3.5   Red-Black trees

 

Recall the advantages of a multiway tree:

  • The path length to any leaf is the same
        ⇒ the tree is always balanced

  • This guarantees \(O(\log n)\) operations (search, insert).

But the disadvantages:

  • If the nodes are mostly half-empty, that wastes space.

  • There are two parts to searching:
    • Finding the right node (tree search)
    • Searching within a node

  • We've lost the simplicity (search, insert) of the binary tree.
 

This raises the question: is there a hybrid between a multiway tree and a binary tree?

The answer is yes, a red-black tree.
 

A red-black tree is really a binary-tree representation of a multiway tree of degree 2:

  • A red-black tree is equivalent to a multiway tree of degree 2.

  • But each node in a red-black tree is like a binary tree node but with a color (red or black):
    • Three pointers: left, right, parent.
    • A color: red or black.

  • Think of black nodes as representing the nodes in the equivalent multiway tree.

  • Think of red nodes as additional nodes needed when values inside a multiway node are pulled out to become binary tree nodes:

  • The links are not colored, but we have drawn them so because:
    • A black node with one or two red children (and therefore red links) should be thought of as one multiway node.
    • It's as if the red links are elastic and can be retracted to pull the red nodes into a glommed-together multiway node.

  • Searching in a red-black tree is the same as in a regular binary tree, in contrast to a multiway tree:

 

Insertion is the tricky operation, so let's take a closer look:

  • We of course do not maintain a multiway tree and then convert to red-black:
        ⇒ We're only showing them side-by-side for conceptual understanding

  • Also, recall the easy and hard parts of insertion in multiway tree:
    • When it's easy: insertion occurs in a leaf node that already has space.
    • When it's hard: when the leaf node is full, and the node is split, which can percolate up all the way to the root.

  • Correspondingly, the two easy/hard cases for a red-black tree are:
    • Easy: when insertion occurs as a child of a black leaf.
    • Hard: the other case.

  • An example of the easy case: Consider inserting "0" into

    • In the multiway equivalent, this is just plain insertion into the leaf containing "1" but where "1" is now the center value:

    • In the red-black version:

  • We'll do the hard case in two parts:
    1. When the multiway splits just once.
    2. When the multiway percolates up further towards the root.

  • Consider inserting "5" into

    For the multiway tree:

    • We first need to split the "1-2-3" node and add "5" to the right child:

    • Then "2" gets into the node above (which happens to be the root here):

  • What is the equivalent of "moving up" in the red-black tree?
        ⇒ A rotation!
    (And, sometimes, a change of color)

  • Let's follow the insertion of "5" into the equivalent red-black tree:

    • After searching, we insert "5" as the right child of "3":

    • Now consider what we did with the multiway: we split the node containing "1-2-3".
    • Here, "5" is the new center value of the equivalent multiway right-side node.
    • Accordingly, to make "5" a center (black) value, we rotate "5" into its parent's position in the red-black:

    • This gives:

    • Lastly, in the multiway tree, "2" moves into the node with "7".
    • To make this happen in the red-black, we need to change the color of "2" to red (to make it part of "7"):

 

Can we boil down the red-black cases into some simple "rules"?

  • When the newly inserted node (example: insert '0') has a black parent:

    • Make the new node red.
    • Insertion is complete (no need to rotate).

  • When the newly inserted node ('5' in the example below) has both a red parent and red "uncle" (meaning, a full equivalent multiway node that needs splitting), perform a rotation:

  • This, in turn causes the new parent "2" (after rotation) to change color:

  • Finally, we need to consider that, whenever a value is being rotated up, the next level could be "full" (in the multiway sense): red parent, red uncle

    In this case, rotation continues upwards.

 

In-Class Exercise 3.9: Draw the red-black tree corresponding to this multiway tree:

Then, insert the value "11" and, side-by-side, show the steps in both the multiway tree and the red-black tree.
 

In-Class Exercise 3.10: Argue that the number of black nodes counted along a path from the root to any leaf is the same for all leaves. Also, why is it true that every red node must have a black parent?
 

Analysis:

  • Since the number of black nodes along any path from root to leaf is the same, the tree is balanced as far as black nodes are concerned.

  • Thus, the path length in terms of black nodes is at most \(\log n\) for n keys in the tree.

  • Each red node can lengthen the depth of a black node by at most 1.

  • Thus, the worst path length is no more than \(2\log n\) .

  • Search, which is a single walk down the tree, therefore takes at most \(O(\log n)\).

  • An insert operation includes a search and possibly rotations on the way back up to the root, with a constant number of operations per rotation.

  • Which means insert overall takes at most \(O(\log n)\).
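The height bound used above can be written out as a short derivation. If every root-to-leaf path contains the same number \(b\) of black nodes, then the black nodes by themselves form a balanced tree, so \(n \geq 2^b - 1\), i.e., \(b \leq \log_2(n+1)\). Because every red node has a black parent, red nodes can at most double a path's length, giving \(h \leq 2b \leq 2\log_2(n+1) = O(\log n)\).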
 


3.6   Self-adjusting binary trees

 

What does "self-adjusting" mean?

  • Consider this linked list example:
    • The list stores the 26 keys "A" ... "Z".
    • For each letter, we store associated "value" information.
    • Thus we have 26 key-value pairs.

  • Typical linked-list access:
    • To access a particular key: start from the front and walk down the list until the key is found.
    • The time taken to find a particular key is the number of keys before it in the list.

  • Suppose we store the keys in the order "A" ... "Z":
    • Is this the optimal order?
    • What about the order "E" "T" "A" "I" "O" "N" ... ?
 

In-Class Exercise 3.11: Suppose the frequency of access for letter "A" is \(f_A\), the frequency of access for letter "B" is \(f_B\), and so on. What is the best way to order the list?
 

  • In many applications, the frequencies are not known.
  • Yet: successive accesses provide some frequency information.
  • Idea: why not re-organize the list with each access?
 

Self-adjusting linked lists:

  • Move-To-Front strategy:

    • Whenever an item is accessed, move the item to the front of the list.
    • Cost of move: O(1) (constant).
    • More frequently-accessed items will tend to be found towards the front.

  • Transpose strategy:

    • Whenever an item is accessed, swap it with the one just ahead of it
      (i.e., move it towards the front one place)
    • Cost of move: O(1).
    • More frequently-accessed items will tend to bubble towards the front.
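The Move-To-Front strategy above is easy to sketch using the library LinkedList (the class and method names here are illustrative):

```java
import java.util.LinkedList;

public class MoveToFrontList {
    private final LinkedList<String> list = new LinkedList<>();

    public void add(String key) { list.addLast(key); }

    // Search for key; on a hit, move it to the front of the list.
    public boolean search(String key) {
        int i = list.indexOf(key);   // access cost: key's position in the list
        if (i < 0) return false;
        list.remove(i);
        list.addFirst(key);          // the self-adjusting step
        return true;
    }

    public String front() { return list.getFirst(); }
}
```

The Transpose strategy would differ only in the self-adjusting step: instead of addFirst, swap the found item with its predecessor.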
 

In-Class Exercise 3.12: Consider the following 5-element list with initial order: "A" "B" "C" "D" "E".

  1. For the Move-To-Front strategy with the access pattern "E" "E" "D" "E" "B" "E", show the state of the list after each access.
  2. Do the same for the Transpose strategy with the above access pattern.
  3. Create an access pattern that works well for Move-To-Front but in which Transpose performs badly.
 

About self-adjusting linked lists:

  • It can be shown that Transpose performs better than Move-To-Front on average.

  • It can be shown that no strategy can result in 50% lower access costs than Move-To-Front.

  • The analysis of self-adjusting linked lists initiated the area of amortized analysis: analysis over a sequence of accesses (operations).
 

Self-adjusting binary trees:

  • Where should frequently-accessed items be moved?
          ⇒ near the root.

  • But ... re-arranging the tree might destroy the "in-order" property?

  • To "move" an element towards the root: use rotations.
          ⇒ this keeps the "in-order" property intact.
 

Using simple rotations:

  • A rotation can move a node into its parent's position.

  • Goal: use successive rotations to move a node into root position.

  • However:
    • Consider this example:
      Tree before rotating "C" to the root:

       

      In-Class Exercise 3.13: Show the intermediate trees obtained in rotating "C" to the root position.


      After rotating "C" to the root:


      ⇒ simple rotations can push the rest of the tree down one side.

    • It can be shown that simple rotations can leave the tree unbalanced.
 

Using splaystep operations:

  • The splaystep is similar to the AVL tree's double rotation, but is a little more complicated.

  • Note: Below, \(S_1, S_2, \ldots\) refer to whole subtrees and are not keys, whereas E and G are single nodes (keys).

  • There are 5 cases:
    • CASE 1: the target node (to move) is a child of the root.
      • CASE 1(a): target node (E, in this case) is the root's left child:
              rotate node into root position.
              ⇒ rotate left child of root (in AVL terms).

      • CASE 1(b): target node (G, in this case) is the root's right child:
              rotate node into root position.
              ⇒ rotate right child of root.

    • CASE 2: the target (E) and the target's parent (G) are both left children
      • First, rotate the parent into its parent's (grandparent's) position.
              This will place the target at its parent's original level.
              To do this: rotate the left child of the grandparent.
      • Now rotate the target into the next level
              This will place the target at its grandparent's original level.
              To do this: rotate the left child of the parent (which is now in the grandparent's old position)

    • CASE 3: the target (J) and the target's parent (G) are both right children
      • First, rotate the parent into its parent's (grandparent's) position.
              ⇒ rotate the right child of the grandparent.
      • Now rotate the target into the next level
              ⇒ rotate the right child of the parent

    • CASE 4: the target (G) is a right child, the parent (E) is a left child.
      • First, rotate the target into the parent's old position.
              ⇒ rotate the parent's right child.
      • Now the target is a (left) child of the original grandparent.
      • Next, rotate the target into the next level.
              ⇒ rotate left child of grandparent.

    • CASE 5: the target (G) is a left child, the parent (J) is a right child.
      • First, rotate the target into the parent's old position.
              ⇒ rotate the parent's left child.
      • The target is now a (right) child of the original grandparent.
      • Next, rotate the target into the next level.
              ⇒ rotate right child of grandparent.

  • Thus, a target node may move up one or two levels in a single splaystep operation.

  • Through successive splaystep operations a target node can be moved into the root position.
 

Search:

  • Recursively traverse the tree, comparing with the input key, as in binary search tree.

  • If the key is found, move the target node (where the key was found) to the root position using splaysteps.

  • Pseudocode:
    
    Algorithm: search (key)
    Input: a search-key
    1.   found = false;
    2.   node = recursiveSearch (root, key)
    3.   if found
    4.     Move node to root-position using splaysteps;
    5.     return value
    6.   else
    7.     return null
    8.   endif
    Output: value corresponding to key, if found.
       
    
    Algorithm: recursiveSearch (node, key)
    Input: tree node, search-key
    1.   if key = node.key
    2.     found = true
    3.     return node
    4.   endif
         // Otherwise, traverse further 
    5.   if key < node.key
    6.     if node.left is null
    7.       return node
    8.     else
    9.       return recursiveSearch (node.left, key)
    10.    endif
    11.  else
    12.    if node.right is null
    13.      return node
    14.    else
    15.      return recursiveSearch (node.right, key)
    16.    endif
    17.  endif
    Output: pointer to node where found; if not found, pointer to node for insertion.
       
 

Insertion:

  • Find the appropriate node to insert input key as a child, as in a binary search tree.

  • Once inserted, move the newly created node to the root position using splaysteps.

  • Pseudocode:
    
    Algorithm: insert (key, value)
    Input: a key-value pair
    1.   if tree is empty
    2.     Create new root and insert key-value pair;
    3.     return
    4.   endif
         // Otherwise, use search to find appropriate position 
    5.   node = recursiveSearch (root, key)
    6.   if found
    7.     Handle duplicates;
    8.     return
    9.   endif
         // Otherwise, node returned is correct place to insert. 
    10.  Create new tree-node and insert as child of node;
    11.  Move newly created tree-node to root position using splaysteps;
        
 

Deletion:

  • Same as in binary search tree.

Analysis:

  • It is possible to show that occasionally the tree gets unbalanced
          ⇒ particular operations may take O(n) time.

  • Consider a sequence of m operations (among: insert, search, delete).

  • One can show: O(m log(n)) time for the m operations overall.

  • Thus, some operations may take long, but others will be short
          ⇒ amortized time is O(log(n)) per operation.

  • Note: amortized O(log(n)) time is a stronger (better) guarantee than statistically-average O(log(n)) time.
 

In-Class Exercise 3.14: Show the intermediate trees and final tree after "J" is moved into the root position using splaysteps. In each case, identify which "CASE" is used.

 

Lastly, are there other kinds of useful "tree" data structures?

Yes. Some examples (not covered in this course):

  • Binomial heap, skew heap (both as priority queues)
  • kd-tree, R-tree, quadtree (all three for geometric data)



© 2001, Rahul Simha