Analyzing Binary Search Trees by simulation


So now let's examine the tree we get from the text of Hamlet.


Exercise 4

What is the height of the tree resulting from inserting every word from Hamlet? How many nodes does the tree contain? What is the log (base 2) of the number of nodes in the tree? (Use your program from the previous exercise.)
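If you want something to check your own program against, here is a minimal sketch in Python. It is not the program from the previous exercise. The names Node, insert, count_and_height and build_from_text are made up for illustration, the rule for splitting text into words is an assumption, and so is the convention that height counts the nodes on the longest path from the root to a leaf. Your own answers may differ if you split words or define height differently.

```python
import math
import re


class Node:
    """One node of an unbalanced binary search tree (illustrative name)."""

    def __init__(self, key):
        self.key = key
        self.left = None
        self.right = None


def insert(root, key):
    """Insert key without recursion; duplicate keys are ignored. Returns the root."""
    if root is None:
        return Node(key)
    node = root
    while True:
        if key == node.key:
            return root  # already present, so no new node
        if key < node.key:
            if node.left is None:
                node.left = Node(key)
                return root
            node = node.left
        else:
            if node.right is None:
                node.right = Node(key)
                return root
            node = node.right


def count_and_height(root):
    """Return (number of nodes, height) using an explicit stack.

    Assumed convention: height is the number of nodes on the longest
    root-to-leaf path, so an empty tree has height 0.
    """
    count, height = 0, 0
    stack = [(root, 1)] if root is not None else []
    while stack:
        node, depth = stack.pop()
        count += 1
        height = max(height, depth)
        for child in (node.left, node.right):
            if child is not None:
                stack.append((child, depth + 1))
    return count, height


def build_from_text(text):
    """Build a tree from text. Assumption: a word is a lower-cased run of letters or apostrophes."""
    root = None
    for word in re.findall(r"[a-z']+", text.lower()):
        root = insert(root, word)
    return root


nodes, height = count_and_height(build_from_text("to be or not to be that is the question"))
print(nodes, height, math.log2(nodes))
```

To try it on a whole play, read the file into a string and pass that string to build_from_text in place of the short quotation.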

One swallow does not a summer make (Aristotle). Neither does one input file suffice to demonstrate that a data structure has O(log(n)) performance. So:


Exercise 5

Find five different text files of widely differing sizes, all between 20KB and 20MB. Build a binary search tree from each one. Count the nodes and measure the height of each tree. Plot a graph of tree height (y axis) against number of nodes (x axis). Then plot a graph of tree height (y axis) against the log (base 2) of the number of nodes (x axis). Formulate and state a reasonable hypothesis linking the height to the number of nodes in a binary search tree.
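Before you plot, it can help to tabulate your measurements. The sketch below assumes you have already collected one (nodes, height) pair per file. The function name height_vs_log is made up for illustration, and no sample data is included because the numbers depend on the files you choose.

```python
import math


def height_vs_log(measurements):
    """For each (nodes, height) pair return (nodes, height, log2(nodes), height / log2(nodes)).

    Pairs with fewer than 2 nodes are skipped, because log2(1) is 0
    and the ratio would be undefined.
    """
    rows = []
    for nodes, height in measurements:
        if nodes < 2:
            continue
        log_n = math.log2(nodes)
        rows.append((nodes, height, log_n, height / log_n))
    return rows
```

The third column is the x value for your second graph. The last column is one way to compare files of different sizes: look at how it behaves as the number of nodes grows, and let that inform your hypothesis.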

The zipped directory that you downloaded also contains a file named mystery.txt. It's not much of a mystery, since you are perfectly at liberty to view its contents. Doing so will definitely help you solve the next exercise, so you may want to avoid looking at the file until after you have tried the exercise.


Exercise 6

Load the contents of the file mystery.txt into your binary search tree. (Even if you overflow your stack, you should still be able to answer the following questions. If you have trouble, use an editor to select a small part of mystery.txt instead of using the whole thing.) How many nodes are in the resulting tree? What is the height? Do these values fit well on the graph from the previous exercise? Write a short paragraph explaining your observations.
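If you would rather not cut the file down by hand in an editor, a few lines of code can take just the first part of it. This is a minimal sketch: the function name first_words is made up, and it assumes one word per line and a file that can be read as UTF-8 (undecodable bytes are replaced rather than raising an error).

```python
from itertools import islice


def first_words(path, limit):
    """Return up to `limit` non-empty lines from the file at `path`, stripped of whitespace."""
    with open(path, encoding="utf-8", errors="replace") as handle:
        stripped = (line.strip() for line in handle)
        non_empty = (word for word in stripped if word)
        return list(islice(non_empty, limit))
```

Calling first_words("mystery.txt", 1000) would give you the first 1000 words to insert into your tree. The limit of 1000 is arbitrary; try a few values and watch what happens to the height.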

The next exercise requires some investigation and experimentation as well as thought.


Exercise 7

Optional -- extra credit. The file mystery.txt has 69904 lines, each containing a single word. It should have given you a tree with 69904 nodes, right? In fact, I'm guessing you got more like 67879 nodes. Explain this discrepancy. Furthermore, the words in the file are in dictionary order. So how come the height of the tree is not the same as the number of nodes? Why is it significantly less than 67879?


rhyspj@gwu.edu