Data Structures



I.

Preliminaries

II.

Overview of Stacks and Queues

III.

Records and Pointers

IV.

Linked Lists

V.

Graphs

VI.

Trees

VII.

Binary Search Trees

VIII.

Perfect Trees and Almost Complete Trees

IX.

Heaps

X.

Sets and Union-Find


I. Preliminaries

  1. What is a data structure?

  2. The general problem of data structure design
  3. When do we need a data structure?

II. Overview of Stacks and Queues

  1. A stack is a data structure S that supports three operations:

  2. The pushing (insertion) and popping (detetion) are done at one and the same end of the stack, namely, its top.

  3. One implementation is an array S[1:N] and a global index I pointing to the top of the stack, assuming that S[I] is empty but S[I-1] is full
    Procedure Push (S,a)
    begin
    S[I] := a;
    I := I+1;
    end

    function Pop(S)
    begin
    I := I-1;
    return(S[I]);
    end

    function Top(S)
    begin
    return(S[I-1]);
    end

  4. A queue is a data structure Q that supports two operations:

  5. One implementation is a circular array Q[0:N-1] and two global indexes H and T, pointing to the head and tail elements of Q.

    Q[H] is considered full, Q[T] is considered empty.

    By circular we mean: the element before Q[0] is Q[N-1],
    and the element after Q[N-1] is Q[0].
    Procedure enqueue(Q,a)
    begin
    Q[T] := a;
    T = T - 1 modulo N;
    end

    function dequeue(Q)
    begin
    H := H-1 modulo N;
    return (Q[H+1 modulo N]);
    end


III. Records and Pointers

  1. Records: A record is a built-in structure data type. It is an aggregate of several elements called fields, where each field is a variable of a standard type or of a record type.

  2. Syntax of a record:


    record name
    begin
    field declarations;
    field declarations;
    ...
    field declarations;
    end


    Example:
    record employee
    begin
    char name[1:30]
    integer SSN;
    char address[1:100]
    real salary;
    end


    Once a record has been defined, it becomes a new data type that can be used in declaring record variables:

    employee X;
    employee Y[1:100];
    record company
    begin
    char name[1:50]
    char address[1:100]
    employee Z[1:1000]
    end

    to access an individual field F of a record R, use
    R.F

    Examples: X.SSN := 123456789;
    X.salary := X.salary + 1000;

  3. Pointers data types
    A pointer is a data type whose contents are addresses of other variables of a specified type.


    Examples: charpointer p; intpointer q[1:n]; employeepointer r;
    charptr p; intptr q[1:n]; employeeptr r;

    to access the field F of a record pointed to by a record pointer r, use
    r ---> F

  4. A self-referential record is a record that has a pointer field that points to a record of the same type.

  5. Example:
    record employee
    begin
    char name[1:30]
    integer SSN;
    employeeptr next;
    end

IV. Linked Lists

  1. Singly Linked Lists:

  2. Doubly linked lists: these are like single linked lists except that each record has a field that points to the previous record, and another field that points to the next record.


V. Graphs

  1. Definition: A graph G=(V,E) consists of a finite set V, whose elements are called nodes, and a set E, which is a subset of V x V. The elements of E are called edges.

  2. Directed vs. undirected graphs: If the directions of the edges are of significance, that is, (x,y) is different from (y,x), the graph is called directed. Otherwise, the graph is called undirected.

  3. Graph concepts:

  4. Graph representations: Adjacency lists and adjacency matrices
    Given a graph G=(V,E) of n nodes, G can be represented by a matrix A[1:n,1:n], where
    A[i,j]=1 if (i,j) is an edge;
    A[i,j]=0 if (i,j) is not an edge.
    The nodes are assumed to be conveniently labeled 1,2,...,n.
    A is called an adjacency matrix.

    G can also be represented by what's called adjacency lists: an array P[1:n] of pointers, where P[i] points to the linked list of all the nodes ajdacent from node i. Each one of those nodes is represented by a simple record of two fields: the node label and the next pointer.

VI. Trees

  1. Definition: a tree is a connected acyclic graph (i.e., it has no cycles).
  2. Rooted trees: A rooted tree is a tree where one node is designated as root. By holding the tree at its root, and letting the other nodes descend from it, we get a hierarchical structure. In that structure, every node has a parent, and some nodes have one or more children.
  3. Tree concepts:
  4. Binary trees: rooted trees where every node has at most two children

  5. Representation of binary trees: every node s represented by a record that has at least three fields: the node label (or contents), a pointer to the left child, and a pointer to the right child. The tree is pointed to by the pointer to the root node.

VII. Binary Search Trees

  1. A binary search trees (BST) is a binary tree T where

  2. BST supports three operations: insert, delete, and search
  3. In the three algorithms below, T is a pointer to the tree root, and a a is a key.


    The search function returns a pointer to the record containing a, if any. Otherwise, it returns nil.

    functionsearch(T,a)
    begin
    nodeptr p;
    p=T;
    while (p != nil and p --> key != a) do
    if a < p --> key then
    p := p --> left;
    else
    p := p --> right;
    endif
    endwhile
    return (p);
    end search

    The insert procedure inserts the key a into T.
    procedure insert(T,a)
    begin
         nodeptr p;
         p=T;
         Bool done = false;
         while not done do
            if a <= p --> key then
                 if p --> left != nil then
                     p = p --> left;
                 else
                     p --> left = new(node);
                     p --> left --> key := a;
                     done =true;
                 endif
             else
                 if p ---> right != nil then
                     p = p --> right;
                 else
                     p --> right = new(node);
                     p --> right -- > := a;
                     done = true;
                 endif
             endif
         endwhile
    end insert

    procedure  delete(T,a)	 
    begin 
    	nodeptr  p,q,r,s; 
    	integer  direction; 
    	p = T; 
    	while  (p != nil and p-->key != a) do 
    		if a < p-->key then 
    			q := p; 
                           	p := p-->left; 
    		 	direction := 0; 
                 	else 
    		 	q := p; 
                    	p := p-->right; 
    			direction := 1; 
                 	endif 
            endwhile 
    	if p == nil then 	 
    	 	return; 
    	elseif p-->left == nil and p-->right == nil then 
    		/* p has no children */ 
    		/* delete node pointed to by p and exit */ 
    		if direction == 0 
    			q-->left = nil; 
    		else 
    			q-->right = nil 
    		endif 
    		free (p); 
    	elseif p-->left == nil then 	
    		/* p has only one child, the right one */ 
    		if direction == 0 then 
    			q-->left := p-->right; 	
    			/* shortcut from parent to grandchild */ 
    		else 
    			q-->right := p-->right; 
    		endif 
    	elseif p-->right == nil then 
    		/* p has only one child, the left one */ 
    		if direction == 0 then 
    			q-->left := p-->left; 
    		else 
    			q-->right := p-->left; 
    		endif
    	else	/* p has two children*/ 
    		/* find the maximum node in the left subtree*/
    		/* subtree of p */ 
    		s := p-->left; 
    		q := p;	
    		/* now q will be the parent of s , and direction*/
    		/* will indicate the type of child s is to q*/ 
    		direction = 0; 
    		while s-->right != nil do 
    			q := s; 
    			s := s-->right; 
    			direction := 1; 
    		endwhile 
    		/*Now s points to the maximum node in the */
    		/*left subtree of p*/ 
    		p-->key := s-->key; 
    		/* now node s must be deleted. But since s has */
    		/* no right child, the deletion is done by */ 
    		/* deletion or shortcutting */ 
    		if s-->left == nil then 	/* s is a leaf*/ 
    			if direction == 0 then 
    				q-->left := nil; 
    			else 
    				q-->right := nil; 
    			endif 
    			free (s); 
    			return; 
    		else		/* s has a left child */ 
    			if direction == 0 then 
    				q-->left := s-->left; 
                           	else 
                                   	q-->right := s-->left; 
    			endif 
    			free(s) 
    			return; 
    		endif 
    	endif 
    end delete 
    


  4. the time complexity of insert, delete, and search is O(h) each, where h is the height of the tree.

VIII. Perfect Trees and Almost Complete Trees

  1. A perfect binary tree is a binary tree where every non-leaf has two children and all the leaves are at the same level.

  2. Exercises: show that the number of nodes in level i of a perfect binary tree is 2i. Show also that a perfect binary tree of height h has 2h+1 - 1 nodes.

  3. The canonical labeling of a perfect binary tree is a labeling of the nodes from top to bottom, left to right, starting with the root being labeled 1.

  4. Observe that the labels of the children of node i are 2i and 2i+1. The label of the parent of i is [i/2] (integer division).

  5. An almost complete binary tree of n nodes is the binary tree consisting of the first n nodes of a perfect binary tree.

  6. Observe that if the bottom level of an almost complete binary tree is removed, we are left with a perfect tree. Also, the nodes in the bottom level are packed to the left end of the tree without any "holes"
  7. an almost complete binary tree of n nodes can be implemented with an array A[1:n] of n elements, where A[i] contains the contents of node of label i.

IX. Heaps (min-heaps and max-heaps)

  1. Definition: a min-heap is an almost complete binary tree where every node has a value smaller than the values of its children

  2. Operations on a min-heap: delete-min and insert
    function delete-min(H) /* H is the heap*/
    begin
        x= root of H;
    a=key of x; /* to be returned at the end*/
    remove a from node x;
    take the last node (node n), remove its key (call it b), and store b in the root;
    remove node n;
    /* now we have to restor the heap*/
        while (x has a key bigger than one of its children) do
    swap x with the smaller child;
    make x point to that child;
    endwhile
        /* note the while loop will stop when x becomes smaller */
        /* than both its children, or when x becomes a leaf*/
    end delete-min

    Time of delete-min: O(h)=O(log n)


    procedure insert(H,a)         /* inserts the key value a into the heap*/ begin
        create a node of label n+1, and insert a into that node;
    let x point to that node;
    while (x is smaller than its parent) do
               swap the key of x with key of the parent of x;
    let x point to its parent;
        endwhile
        /* this will loop until either x becomes larger than its parent, or it becomes the root*/
    end insert
    Time of insert: O(h)=O(log n)

  3. Note that with insert(), we can create a heap from scratch by n insertions into an intially empty heap. This takes time =
    O(log 1 + log 2 + ... + log n) = O(log(n!))= O(nlog n).
    There is a procedure for constructing a heap in O(n) time.

  4. By starting with a heap, and calling delete-min n times, we sort the heap in time =
    O(logn + log(n-1) + ... + log 1) = O(nlog n).

X. Sets and Union-Find

  1. General implementations of sets
  2. Union-Find problem: Given n sets, where set i has a single element {i}, we need to perform O(n) operations of union (U) and find (F):

  3. note that the sets are mutually disjoint at all times.

  4. the problem is to design a data structure so that O(n) calls to U and F take as little time as possible.

  5. we will carry out the design by having two different representations of the sets: one conceptual and one physical.

  6. the conceptual representation of each set is a rooted tree (not necessarily binary). The nodes are labeld with the elements of the corresponding set. The label of the root serves the extra purpose of being the name of the set.

  7. the physical representation of all the trees (at once) is a single array called PARENT[1:n].

  8. if i is not a root, PARENT[i] is the parent of node i in whatever tree i happens to be.

  9. if i is a root, PARENT[i]= -(number of nodes in the tree rooted at i).
  10. three different implementations of the Union-Find

  11. first implementation
    U(i,j) is carried out by simply making i the parent of j, thus combining the two trees into a single tree, and at once eliminating the old trees as distinct sets.

    F(x) returns the root of the tree containing x by walking up that tree from x, using the PARENT array, until a node is reached which has no parent. That is the root.

    procedure U(i,j)
    begin
         PARENT[i]=PARENT[i]+PARENT[j]; /* this computes the total number of nodes in the new tree */
    PARENT[j]=i;
    end

    Time: O(1), that is, constant.




    function F(x)
    begin
         integer r=x;
    while (PARENT[r] > 0) do
    r = PARENT[r];
    endwhile
         /*now r is a root*/
    return(r);
    end

    Time: O(h) where h is the height of the tree containing x.
    time for O(n) U's and F's: up to O(n2). The proof is an exercise.

  12. Second Implementation
    in U(i,j), make the root of the heavier tree become the parent of the other root. By heavier we mean "has more nodes".

    F(x) is not modified here.

    procedure U(i,j)
    begin
         if |PARENT[i]| >= |PARENT[j]| then
                  PARENT[i]=PARENT[i]+PARENT[j];
    PARENT[j]=i;
         else
                  PARENT[j]=PARENT[i]+PARENT[j];
    PARENT[i]=j;
         endif
    end
    time: O(1)
    Time of an O(n) sequence of U's and F's: O(nlog n).
    This is because every tree is of height <= log(its number of nodes).
    The proof for that is by induction on the number of U's performed.
    Therefore, every tree is of height <= O(log n).
    Hence, every F(x) takes O(log n) time.
    As a result, O(n) U's takes O(n), and O(n) F's take O(nlog n).
    The total time for O(n) U's and F's is then O(nlog n).

  13. Third implementation
    U will remain the same as in the second implementation.
    F will be changed using a technique called path compression. In particular, F(x) first finds the root. Afterwards, in a second walk from x to the root, every node on that path is made a direct child of the root.
    function F(x)
    begin
         integer r,s,t;
    r=x;
    while PARENT[r] > 0
                 r = PARENT[r];
         endwhile
    /*now r is a root*/
    s=x;
    while s != r do
                 t := s;
    s := PARENT[s];
    PARENT[t] := r;
         endwhile
    return (r);
    end

    Time of F: O(h) where h is the height of the tree containing x.
    Time of O(n) sequence of U's and F's: O(nG(n)), where
    G(n) <= 5 for all n <= 265000. That is, for all practical values of n, the time is O(n).
    Abdou Youssef