School of Engineering and Applied Science Department of Computer Science CSci 133 -- Algorithms and Data Structures I http://www.seas.gwu.edu/~csci133/fall05 Prof. Michael B. Feldman mfeldman@gwu.edu

BINARY TREES

Michael B. Feldman, Software Construction and Data Structures, Chapter 11 (Addison-Wesley, 1996)

A binary tree is a tree all of whose vertices have out-degree < 2. Furthermore, the subtrees of a binary tree are ordered in the sense that there is a left child and a right child. If a vertex has only one child, it must be clearly identified as left or right. The two trees shown in Figure 1a are different binary trees; so are the two trees in Figure 1b.

Figure 1. Binary Trees Have Ordered Children

(a) These are different binary trees

(b) So are these

Here are two important properties of binary trees.

Strictly Binary

T is a strictly binary tree iff each of its vertices has out-degree = 0 or out-degree = 2. Vertices with out-degree = 1 are not allowed in strictly binary trees.

Balanced

Intuitively, a binary tree is balanced if it is not “heavy” on either side. There are several alternative balance conditions, each with its areas of application. For our purposes here, we will say that T is a balanced (sometimes called height-balanced) binary tree iff for each vertex v in T, the depths of v’s right and left subtrees differ by at most one. If one subtree is null, the other subtree must either be null or be a leaf.

It is important to understand that for a tree to be balanced, the property must hold for every vertex in the tree, not just its root. For all the trees in Figure 2, make sure you know why each is either balanced or not balanced.

Figure 2. Balanced (or Height-Balanced) Binary Trees

(a) These binary trees are height-balanced

(b) These binary trees are notheight-balanced

The definition of balance can also be stated recursively:

1. A binary tree consisting of a single vertex is balanced.
2. A vertex with a single subtree is balanced iff that subtree is a leaf.
3. A binary tree is balanced iff its left and right subtrees are balanced and their depths differ by at most 1.
Balance is an important property of binary search trees.

Traversals of Binary Trees

Many applications require traversing a tree in a particular way so that all the vertices are visited in a certain order. Three traversal algorithms are particularly useful in dealing with binary trees. These are sometimes called preorder, inorder, and postorder, but different authors occasionally disagree on what these three terms should mean. To avoid any confusion, we will call these algorithms by names that are more descriptive once they are understood, namely TraverseNLR, TraverseLNR, and TraverseLRN, respectively. In these names, L stands for “left subtree,” R stands for “right subtree,” and N stands for “node.” The order of the letters indicates the traversal order. For example, in TraverseNLR, a node is visited, then its left subtree is traversed, then its right subtree is traversed; in TraverseLNR, the left subtree is traversed before the node is visited, then the right subtree is traversed.

How do these algorithms work? Since every subtree of a binary tree is a binary tree, a traversal defined for a tree is also defined for any subtree. We can thus write the three traversal algorithms recursively in pseudocode, as shown in Figure 3. In each one, the details of the Visit operation are deferred, because precisely what Visit should accomplish is application-dependent. Often Visit is simply a display of the information in the node.

Figure 3. Three Recursive Binary Tree Traversal Algorithms

TraverseNLR (tree T)
{
if (T is not null)
{
Visit (T);
TraverseNLR (left subtree of T);
TraverseNLR (right subtree of T);
}
}

TraverseLNR (tree T)
{
if (T is not null)
{
TraverseLNR (left subtree of T);
Visit (T);
TraverseLNR (right subtree of T);
}
}

TraverseLRN (tree T)
{
if (T is not null)
{
TraverseLRN (left subtree of T);
TraverseLRN (right subtree of T);
Visit (T);
}
}

Figure 4 shows the steps in performing these three traversals for the given tree.

Expression Trees

One common application of binary trees is in interpreters or compilers for programming languages, where the statements of a source program are converted into trees so that the structure of the statements is apparent. As a simple case of this type, we will consider expression trees, which are transformations of arithmetic expressions into binary trees.

We will use, for simplicity, a restricted expressions that consists of single-letter identifiers or variable names, one-digit integer constants, the four arithmetic operators +, -, *, and /, and parentheses.

Constructing Expression Trees

The next section shows how to construct a scanner or parser program that can construct an expression tree for these simple expressions. For now, let us just see how to construct an expression tree manually.

We consider first only fully parenthesized expressions. An expression tree always has an operator at its root and identifiers or constants at its leaves. (The exception is an expression consisting only of a single identifier or constant; here, there is just one vertex, both root and leaf.) The root operator is the “main” operator of the expression—that is, the operator that is performed last as the expression is evaluated. Interior vertices are the operators of subexpressions.

To give a few examples, Figure 5 shows the expression trees for A, A-B, (A-B)+C, A-(B+C), and (A+B)*(C-D). Notice carefully how these trees are constructed  and make sure you understand well how (A-B)+C and A-(B+C) give rise to different trees. In (A-B)+C, the + is the main operation, since it is performed last; in A-(B+C), it is the - that is the main operation.

Try building expression trees from (A*B)-(C+(D/E)) and ((A-B)+(C/D))*E to make sure you understand how these trees are produced.

Let us now relax the condition that expressions must be fully parenthesized. We use the usual association and priority rules: + and - are priority 2 operators, * and / are priority 1 operators, and adjacent operators of equal priority associate left-to-right. The expression A+B*C will be treated as though it were parenthesized A+(B*C); A/B-C will be evaluated as though it were parenthesized (A/B)-C. So in the first expression the main operator is + and in the second it is -. Their expression trees are as shown in Figure 6. Using the left-to-right rule in the case of equal-priority operators, A-B-C is treated as though it were written (A-B)-C and A/B*C is treated as though it were written (A/B)*C.

Using the left-to-right rule in the case of equal-priority operators, A-B-C is treated as though it were written (A-B)-C and A/B*C is treated as though it were written (A/B)*C.

Now let’s look at expressions containing a mixture of parentheses and operators of both priorities. Consider first A+B-C+D. Since adjacent operators of equal priority are handled left-to-right, we treat it as though it were ((A+B)-C)+D. Now look at A-(B+C)*D. As before, the two operators of interest are - and * (the + doesn’t count because it’s inside a subexpression) and the * is done first because its priority is 1. So this expression is handled as though it were A-((B+C)*D). These trees are shown in Figure 7. Try A-B*C/(D-E) and A*B-(C+D)+E.

Traversing Expression Trees

Figure 8, 9, and 10 show the three traversals TraverseNLR, TraverseLNR, and TraverseLRN performed on the given expression trees. It is interesting that TraverseNLR produces the forward Polish, or prefix, form of the original expression, and TraverseLRN produces the RPN form.

What about TraverseLNR? This traversal turns out not to be terribly useful for expression trees, since it produces an infix form of the expression with the parentheses removed. Thus, it can lead to ambiguities, since, for example, the expressions (A-(B-C)) and ((A-B)-C), which clearly have different expression trees, have the same TraverseLNR infix form, namely A-B-C. Indeed, if numerical values were substituted for A, B, and C, the two original expressions would evaluate to different results, only one of which would result from evaluating the infix form.

Note that no similar ambiguities arise in the prefix and postfix cases. Even though TraverseLNR is not very useful for expression trees, it does have a very useful application in binary search trees.

You have seen that there is an intimate relationship between an infix expression, its tree, and its forward and reverse Polish forms. In compiler applications, some form of the expression tree is often used as a convenient intermediate internal representation of a program. An expression tree is a structure that can easily be manipulated by a program and can even be restructured to optimize the object-program instructions that are generated.

Building an Expression Tree

The algorithm to produce an expression tree is very similar to the one to produce an RPN expression; the decision process for pushing and popping operators on and off the operator stack is exactly the same. There is a difference, though.

In the RPN case, when an operand (letter or number) is scanned, it is immediately output (concatenated to the RPN string). Similarly, an operator that is popped from the stack is immediately output. In this situation, however, we need to retain those operands and operators, connecting them together in a tree. We do this by maintaining a separate stack for intermediate tree results, letting items in the stack be pointers to subtrees instead of just characters. Our operator stack is also converted to hold pointers to tree nodes; an operator is placed in such a node before being pushed.

At the end of the algorithm, a pointer to the root of the resultant tree is left on top of the node stack. Let's examine the algorithm in pseudocode. First let's define a subalgorithm popConnectPush:

PopConnectPush:
{
pop the top node off the operator stack and call it N;
pop the top node off the tree stack and make it N's right child;
pop the top node off the tree stack and make it N's left child;
push N back into the tree stack;
}

Here is the full algorithm:

Convert Expression to Tree:
{

initialize operator and tree stacks;
while (there are tokens remaining in the expression)
{
T = next token from expression;
if (T == '(')
{
create a node and store T in it;
push the node onto the operator stack;
}
else if (T is a variable or numeric literal)
{
create a node and store T in it;
push the node onto the tree stack;
}
else if (T is '+', '-', '+', or '/')
{
create a node and store T in it;
if ((operator stack is empty) or
(the value at the top of the operator stack is '(') or
(priority(operator at top of stack) < priority(T)))
{
push the node onto the operator stack;
}
else // clear operator stack and push new one onto it
{
do
{
PopConnectPush;
}
while ((the operator stack is not empty) and
(the top of the operator stack is not '(') and
(priority(T) < priority(operator at top of stack)));

create a node and store T in it;
push the node onto operator stack;
}
else if (T is ')')  // clear operator stack back to the '('
{
while (top of operator stack is not '(')
{
PopConnectPush;
}
}
else
{
}
}

// no more tokens left in expression
while (operator stack is not empty)
{
PopConnectPush;
}

// pointer to root of final tree is on top of the tree stack

}

Figure 11 shows an example of converting an expression to a tree. All the details of the nodes are illustrated.

Evaluating an Expression from Its Expression Tree

Finding the value of an expression is easy when we have a representation of that expression in tree form. The algorithm is a short and straightforward recursive one:

Evaluate(tree T)
{
if (T contains a numeric literal value)
{
return this value;
}
else if (T contains a variable)
{
return the result of evaluating this variable
}
else // T contains an operator O
{
switch (O)
{
case '+':
return Evaluate(left subtree of T) + Evaluate(right subtree of T);
case '-':
return Evaluate(left subtree of T) - Evaluate(right subtree of T);
case '*':
return Evaluate(left subtree of T) * Evaluate(right subtree of T);
case '/':
return Evaluate(left subtree of T) / Evaluate(right subtree of T);
}
}
}

(end of article)