\( \newcommand{\blah}{blah-blah-blah} \newcommand{\eqb}[1]{\begin{eqnarray*}#1\end{eqnarray*}} \newcommand{\eqbn}[1]{\begin{eqnarray}#1\end{eqnarray}} \newcommand{\bb}[1]{\mathbf{#1}} \newcommand{\mat}[1]{\begin{bmatrix}#1\end{bmatrix}} \newcommand{\nchoose}[2]{\left(\begin{array}{c} #1 \\ #2 \end{array}\right)} \newcommand{\defn}{\stackrel{\vartriangle}{=}} \newcommand{\rvectwo}[2]{\left(\begin{array}{c} #1 \\ #2 \end{array}\right)} \newcommand{\rvecthree}[3]{\left(\begin{array}{r} #1 \\ #2\\ #3\end{array}\right)} \newcommand{\rvecdots}[3]{\left(\begin{array}{r} #1 \\ #2\\ \vdots\\ #3\end{array}\right)} \newcommand{\vectwo}[2]{\left[\begin{array}{r} #1\\#2\end{array}\right]} \newcommand{\vecthree}[3]{\left[\begin{array}{r} #1 \\ #2\\ #3\end{array}\right]} \newcommand{\vecfour}[4]{\left[\begin{array}{r} #1 \\ #2\\ #3\\ #4\end{array}\right]} \newcommand{\vecdots}[3]{\left[\begin{array}{r} #1 \\ #2\\ \vdots\\ #3\end{array}\right]} \newcommand{\eql}{\;\; = \;\;} \definecolor{dkblue}{RGB}{0,0,120} \definecolor{dkred}{RGB}{120,0,0} \definecolor{dkgreen}{RGB}{0,120,0} \newcommand{\bigsp}{\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;} \newcommand{\plss}{\;\;+\;\;} \newcommand{\miss}{\;\;-\;\;} \newcommand{\implies}{\Rightarrow\;\;\;\;\;\;\;\;\;\;\;\;} \)


Module 11: Eigenworld

Part I: Eigenvectors, eigenvalues and eigen-applications


Module objectives

 
By the end of this module, you should be able to

 


11.0   Some examples and definitions

 

As we've seen, there are two interpretations of \({\bf Ax}={\bf b}\):

  1. The numbers in \({\bf x}\) are the scalars in the linear combination of the columns of \({\bf A}\), which results in vector \({\bf b}\).

  2. Matrix \({\bf A}\) "acts" on vector \({\bf x}\) to produce vector \({\bf b}\).

We will focus on the second interpretation.
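To make these two views concrete, here is a minimal Java sketch (not one of the downloadable examples; the matrix, vector, and class name are just illustrative) that computes \({\bf b}={\bf Ax}\) both ways, as a combination of columns and as \({\bf A}\) "acting" row-by-row on \({\bf x}\):

      public class MatrixActionSketch {

          public static void main(String[] args) {
              double[][] A = { {5, -2},
                               {0,  1} };
              double[] x = {1, 1};

              // Interpretation 1: b is the linear combination x[0]*(column 0) + x[1]*(column 1).
              double[] b1 = new double[2];
              for (int j = 0; j < 2; j++)
                  for (int i = 0; i < 2; i++)
                      b1[i] += x[j] * A[i][j];

              // Interpretation 2: A "acts" on x, one row at a time (row i dotted with x).
              double[] b2 = new double[2];
              for (int i = 0; i < 2; i++)
                  for (int j = 0; j < 2; j++)
                      b2[i] += A[i][j] * x[j];

              // Both produce the same vector b.
              System.out.println("b via columns: (" + b1[0] + "," + b1[1] + ")");
              System.out.println("b via rows:    (" + b2[0] + "," + b2[1] + ")");
          }
      }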
 

Let's start with some examples of a matrix transforming a vector.
 

In-Class Exercise 1: Download MatrixVectorExample.java and examine the code. Notice that a matrix \({\bf A}\) multiplies the vector \({\bf u}\) to produce the vector \({\bf v}\); both vectors are then plotted. Observe what happens when

  • \({\bf u}=(0,1)\)
  • \({\bf u}=(1,2)\)
  • \({\bf u}=(1,0)\)
  • \({\bf u}=(1,0)\) and \({\bf A}[0][0] = -5\).
 

We observe that the same matrix can sometimes:

  • Change a vector's direction and length.
  • Have no effect on either direction or length.
  • Change only the length, keeping the direction.
  • Flip the direction (while perhaps also changing the length).
 

We'll now explore the idea of applying a matrix repeatedly, in an iteration (a small Java sketch of this loop appears after the list below):

  • That is,
        u = some initial vector
        draw (u)
        for i=0 ...
           u = A u
           draw (u)
        endfor
      

  • Mathematically, we would describe this as the iteration \({\bf u}^{(k)} = {\bf A}{\bf u}^{(k-1)}\):
    • Let \({\bf u}^{(0)}\) be an initial vector.
    • The superscript denotes the iteration number.
    • After the first iteration, we get \({\bf u}^{(1)} = {\bf A}{\bf u}^{(0)}\)
    • After the second iteration, we get \({\bf u}^{(2)} = {\bf A}{\bf u}^{(1)}\)
    • And so on.
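Here is a minimal Java sketch of this loop (it prints the vectors instead of drawing them, and is not MatrixVectorExample2.java; the matrix, starting vector, and iteration count are illustrative):

      public class IterateSketch {

          // Multiply a 2x2 matrix into a 2-vector.
          static double[] multiply(double[][] A, double[] u) {
              return new double[] {
                  A[0][0]*u[0] + A[0][1]*u[1],
                  A[1][0]*u[0] + A[1][1]*u[1]
              };
          }

          public static void main(String[] args) {
              double[][] A = { {5, -2},
                               {0,  1} };
              double[] u = {0.1, 1};                 // u^(0): some initial vector

              for (int k = 1; k <= 10; k++) {
                  u = multiply(A, u);                // u^(k) = A u^(k-1)
                  System.out.println("u(" + k + ") = (" + u[0] + "," + u[1] + ")");
              }
          }
      }

The magnitudes grow, but the ratio of the two components settles down: the direction is converging.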
 

In-Class Exercise 2: Download MatrixVectorExample2.java and examine the code to verify that an initial vector is iteratively multiplied by a matrix. (You will also need DrawTool.java). Observe what happens when we change the starting vector to:

  • \({\bf u}=(1,1)\)
  • \({\bf u}=(0,1)\)
  • \({\bf u}=(0.1,1)\)
  • \({\bf u}=(1,2)\)
  • \({\bf u}=(1.1,2)\)
  • \({\bf u}=(1.01,2)\)
 

Observe that:

  • In all but one case, the iteration converges to the same direction (possibly flipped in sign).
          \(\rhd\) Namely, to the (1,0) direction.

  • The only case when it's different is when \({\bf u}^{(0)} = (1,2)\).
 

Note: we will consider "same direction" to mean either positive or negative (flipped).
 

Eigenvalues and eigenvectors:

  • Definition: A vector \({\bf x}\) is called an eigenvector of matrix \({\bf A}\) if $${\bf Ax}=\lambda{\bf x}$$ for some number \(\lambda\).

  • Definition: The scalar \(\lambda\) is called the eigenvalue associated with the eigenvector \({\bf x}\).

  • In other words, when \({\bf A}\) multiplies \({\bf x}\), we get a result vector \(\lambda{\bf x}\) in the same direction.

  • Example: $$ \mat{5 & -2\\ 0 & 1} \vectwo{1}{0} \eql \vectwo{5}{0} \eql 5 \vectwo{1}{0} $$ Here \({\bf x}=(1,0)\) is an eigenvector with eigenvalue \(\lambda=5\).
     

    In-Class Exercise 3: Show that \((1,2)\) is an eigenvector for the above matrix. What is the associated eigenvalue?
     

    In-Class Exercise 4: Suppose \({\bf Ax}=\lambda{\bf x}\). Let \({\bf y}=-2{\bf x}\). Show that \({\bf y}\) is an eigenvector of \({\bf A}\). What is the associated eigenvalue?

  • Thus, an eigenvector is not unique: given an eigenvector \({\bf x}\), any other vector \(\alpha{\bf x}\), for any \(\alpha \neq 0\), is also an eigenvector. (A small numerical check of the definition appears after this list.)

  • Note: the matrix \({\bf A}\) in \({\bf Ax}=\lambda{\bf x}\) needs to be square:
    • Consider \({\bf A}_{m\times n} {\bf x}_{n\times 1}\).
    • This produces a vector of dimension \(m\times 1\).
    • The only way \({\bf Ax}=\lambda{\bf x}\) is if \(m=n\).
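As a quick numerical companion to the definition, here is a hedged Java sketch (made up for illustration; not one of the course files) that checks whether a given vector is an eigenvector of a matrix and, if so, reports the eigenvalue:

      public class EigenCheckSketch {

          // Returns lambda if A x = lambda x (within a small tolerance); otherwise returns NaN.
          static double eigenvalueIfEigenvector(double[][] A, double[] x) {
              int n = x.length;
              double[] Ax = new double[n];
              for (int i = 0; i < n; i++)
                  for (int j = 0; j < n; j++)
                      Ax[i] += A[i][j] * x[j];

              // Candidate lambda from the first nonzero component of x.
              double lambda = 0;
              for (int i = 0; i < n; i++)
                  if (Math.abs(x[i]) > 1e-12) { lambda = Ax[i] / x[i]; break; }

              // Every component of Ax must equal lambda times the corresponding component of x.
              for (int i = 0; i < n; i++)
                  if (Math.abs(Ax[i] - lambda * x[i]) > 1e-9) return Double.NaN;
              return lambda;
          }

          public static void main(String[] args) {
              double[][] A = { {5, -2},
                               {0,  1} };
              System.out.println(eigenvalueIfEigenvector(A, new double[]{1, 0}));   // prints 5.0
              System.out.println(eigenvalueIfEigenvector(A, new double[]{1, 1}));   // prints NaN: not an eigenvector
          }
      }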
 

In-Class Exercise 5: Show by hand that \({\bf x}_1=(\frac{1}{\sqrt{2}},\frac{1}{\sqrt{2}})\) and \({\bf x}_2=(\frac{1}{\sqrt{2}},-\frac{1}{\sqrt{2}})\) are eigenvectors of \({\bf A}=\mat{2 & 1\\1 & 2}\). What are the associated eigenvalues?
 

In-Class Exercise 6: Download MatrixVectorExample3D.java, compile and execute to see an example of repeated iteration (\({\bf u} \leftarrow {\bf Au}\)) in 3D.
 


11.1   The power method

 

Iterative application of \({\bf A}\):

  • We saw earlier that iterative multiplication leads to convergence to one of the eigenvectors.

  • We will see momentarily why this works, and to which eigenvector.

  • Suppose \({\bf A}\) has eigenvectors \({\bf x}_1, {\bf x}_2, \ldots, {\bf x}_n\) with associated eigenvalues \(\lambda_1, \lambda_2, \ldots, \lambda_n\).
    • \(\lambda_i\) is the eigenvalue for \({\bf x}_i\).
    • In other words: \({\bf Ax}_i=\lambda_i {\bf x}_i\).

  • Suppose also that $$ |\lambda_1| \geq |\lambda_2| \geq \ldots \geq |\lambda_n| $$ If not, we'll re-label the eigenvectors (that's our choice) so that it's true.

  • Example:
    • We saw earlier that $$ {\bf A} \eql \mat{5 & -2\\ 0 & 1} $$ has two eigenvectors \((1,2)\) and \((1,0)\) with eigenvalues \(1\) and \(5\).
    • Here, we would label $$\eqb{ {\bf x}_1 & \eql & \vectwo{1}{0}\\ {\bf x}_2 & \eql & \vectwo{1}{2} }$$ with $$\eqb{ \lambda_1 & \eql & 5\\ \lambda_2 & \eql & 1\\ }$$

  • It will turn out that iterative multiplication will lead to the eigenvector with the largest eigenvalue \(\lambda_1\).

  • Recall our algorithm, now with the vector normalized to unit length at each step (so that the numbers don't grow without bound):
        u = some initial vector
        for k=0 ...
           v = A u
           u = v / |v|
        endfor
      

  • Or, mathematically, $$\eqb{ {\bf u}^{(0)} & \eql & \mbox{ some initial vector }\\ {\bf v}^{(k)} & \eql & {\bf A} {\bf u}^{(k-1)} \\ {\bf u}^{(k)} & \eql & \frac{1}{|{\bf v}^{(k)}|} {\bf v}^{(k)} }$$

  • Consider a few iterations with this example:
    • Suppose we start with $$ {\bf A} \eql \mat{5 & -2\\ 0 & 1} \;\;\;\;\;\;\; {\bf u}^{(0)} \eql (1,1) $$
    • After one iteration: $$\eqb{ {\bf v}^{(1)} & \eql & {\bf A}{\bf u}^{(0)} & \eql & (3,1)\\ {\bf u}^{(1)} & \eql & \frac{1}{|{\bf v}^{(1)}|} {\bf v}^{(1)} & \eql & (0.949,0.316)\\ }$$
    • Then, the second iteration: $$\eqb{ {\bf v}^{(2)} & \eql & {\bf A}{\bf u}^{(1)} & \eql & (4.111,0.316)\\ {\bf u}^{(2)} & \eql & \frac{1}{|{\bf v}^{(2)}|} {\bf v}^{(2)} & \eql & (0.997,0.077)\\ }$$
    • Thus, we can see rapid convergence towards the eigenvector \((1,0)\) in just a few iterations.

  • Let's now analyze the iterations:
    • Recall the first iteration: $$\eqb{ {\bf v}^{(1)} & \eql & {\bf A} {\bf u}^{(0)} \\ {\bf u}^{(1)} & \eql & \frac{1}{|{\bf v}^{(1)}|} {\bf v}^{(1)} }$$
    • Second iteration: $$\eqb{ {\bf v}^{(2)} & \eql & {\bf A} {\bf u}^{(1)} \\ {\bf u}^{(2)} & \eql & \frac{1}{|{\bf v}^{(2)}|} {\bf v}^{(2)} }$$
    • Let's substitute for \({\bf u}^{(1)}\) from the first iteration: $$\eqb{ {\bf v}^{(2)} & \eql & {\bf A} \frac{1}{|{\bf v}^{(1)}|} {\bf v}^{(1)} \\ {\bf u}^{(2)} & \eql & \frac{{\bf A} {\bf A} {\bf u}^{(0)}}{ |{\bf v}^{(2)}| |{\bf v}^{(1)}|} }$$
    • Thus, \({\bf u}^{(3)}\) will have three multiplications of \({\bf A}\): $$\eqb{ {\bf u}^{(3)} & \eql & \frac{{\bf A}^3 {\bf u}^{(0)}}{ |{\bf v}^{(3)}||{\bf v}^{(2)}| |{\bf v}^{(1)}|} }$$

  • Continuing, $$\eqb{ {\bf u}^{(k)} & \eql & \frac{{\bf A}^k {\bf u}^{(0)}}{ |{\bf v}^{(k)}|\ldots |{\bf v}^{(2)}| |{\bf v}^{(1)}|} }$$

  • To simplify the expression, let's define $$ \nu_k \eql |{\bf v}^{(k)}|\ldots |{\bf v}^{(2)}| |{\bf v}^{(1)}| $$

  • Then, $$ {\bf u}^{(k)} \eql \frac{1}{\nu_k}{\bf A}^k {\bf u}^{(0)} $$

  • Next, suppose that \({\bf u}^{(0)}\) can be expressed as a linear combination of the eigenvectors: $$ {\bf u}^{(0)} \eql c_1 {\bf x}_1 + c_2 {\bf x}_2 + \ldots + c_n {\bf x}_n $$

  • Let's substitute in the expression for \({\bf u}^{(k)}\): $$\eqb{ {\bf u}^{(k)} & \eql & \frac{1}{\nu_k}{\bf A}^k {\bf u}^{(0)} \\ & \eql & \frac{1}{\nu_k}{\bf A}^k (c_1 {\bf x}_1 + c_2 {\bf x}_2 + \ldots + c_n {\bf x}_n) \\ & \eql & \frac{1}{\nu_k} (c_1 {\bf A}^k {\bf x}_1 + c_2 {\bf A}^k {\bf x}_2 + \ldots + c_n {\bf A}^k{\bf x}_n) \\ }$$

  • But for any eigenvector \({\bf x}_i\): $$ {\bf A}^k {\bf x}_i \eql \lambda_i^k {\bf x}_i $$
     

    In-Class Exercise 7: Show that this is true.
     

  • Thus, substituting in the expression we derived for \({\bf u}^{(k)}\): $$\eqb{ {\bf u}^{(k)} & \eql & \frac{1}{\nu_k} (c_1 {\bf A}^k {\bf x}_1 + c_2 {\bf A}^k {\bf x}_2 + \ldots + c_n {\bf A}^k{\bf x}_n) \\ & \eql & \frac{1}{\nu_k} (c_1 \lambda_1^k {\bf x}_1 + c_2 \lambda_2^k {\bf x}_2 + \ldots + c_n \lambda_n^k{\bf x}_n) \\ & \eql & \frac{\lambda_1^k}{\nu_k} (c_1 {\bf x}_1 + c_2 (\frac{\lambda_2}{\lambda_1})^k {\bf x}_2 + \ldots + c_n (\frac{\lambda_n}{\lambda_1})^k{\bf x}_n) \\ }$$ where we've divided by \(\lambda_1^k\).

  • Now consider what happens as \(k\rightarrow\infty\).

  • First, notice that $$ (\frac{\lambda_i}{\lambda_1})^k \;\rightarrow \; 0 $$ for any \(i\neq 1\).
     

    In-Class Exercise 8: Why is this true?
     

  • This means that, for large \(k\), $$ {\bf u}^{(k)} \;\approx\; \frac{c_1\lambda_1^k}{\nu_k} {\bf x}_1 $$

  • That is, \({\bf u}^{(k)}\) converges to the direction of the eigenvector \({\bf x}_1\).

  • By design (normalization) $$ |{\bf u}^{(k)}| \eql 1 $$ and so $$ \lim_{k\rightarrow\infty} {\bf u}^{(k)} \eql \alpha {\bf x}_1 $$ for some constant \(\alpha\).

  • What we've shown is that the iterative process yields the eigenvector with the largest eigenvalue.

  • We can also recover the eigenvalue corresponding to that eigenvector:
    • Suppose \({\bf x}\) is an eigenvector of \({\bf A}\): $$ {\bf Ax} \eql \lambda {\bf x} $$
    • Then, $$ {\bf Ax}\cdot {\bf x} \eql \lambda {\bf x} \cdot {\bf x} $$
    • And so, $$ \lambda \eql \frac{{\bf Ax}\cdot {\bf x}}{{\bf x} \cdot {\bf x}} $$

  • In pseudocode (a Java rendering of this appears after the next exercise):
        Algorithm: powerMethod (A)
        Input:  a matrix A
        1.   u = some initial vector
        2.   for k=0 ...
        3.      v = A u
        4.      u = v / |v|
        5.   endfor
        6.   λ = (A u · u) / (u · u) 
        7.   return λ, u
      
 

In-Class Exercise 9: Download PowerMethodExample.java, compile and execute. For illustration, the code also computes \({\bf A}^k {\bf u}^{(0)}\). Is the resulting vector the same eigenvector as computed in the loop? Explain.
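For reference, here is one way the pseudocode above could be rendered in Java (a minimal sketch, not PowerMethodExample.java; the all-ones starting vector, the fixed iteration count, and the way the result is packaged are arbitrary choices). As in the derivation, it assumes \(|\lambda_1| > |\lambda_2|\) and that the starting vector has a nonzero component along \({\bf x}_1\):

      public class PowerMethodSketch {

          static double[] multiply(double[][] A, double[] u) {
              int n = u.length;
              double[] v = new double[n];
              for (int i = 0; i < n; i++)
                  for (int j = 0; j < n; j++)
                      v[i] += A[i][j] * u[j];
              return v;
          }

          static double dot(double[] x, double[] y) {
              double s = 0;
              for (int i = 0; i < x.length; i++) s += x[i] * y[i];
              return s;
          }

          // Returns {lambda, x_1, ..., x_n}: the dominant eigenvalue followed by a unit eigenvector.
          static double[] powerMethod(double[][] A, int numIterations) {
              int n = A.length;
              double[] u = new double[n];
              java.util.Arrays.fill(u, 1.0);                         // some initial vector

              for (int k = 0; k < numIterations; k++) {
                  double[] v = multiply(A, u);                       // v = A u
                  double norm = Math.sqrt(dot(v, v));
                  for (int i = 0; i < n; i++) u[i] = v[i] / norm;    // u = v / |v|
              }

              double lambda = dot(multiply(A, u), u) / dot(u, u);    // lambda = (Au.u)/(u.u)

              double[] result = new double[n + 1];
              result[0] = lambda;
              System.arraycopy(u, 0, result, 1, n);
              return result;
          }

          public static void main(String[] args) {
              double[][] A = { {5, -2},
                               {0,  1} };
              double[] r = powerMethod(A, 50);
              // Should report lambda close to 5 and an eigenvector close to (1,0) for this matrix.
              System.out.println("lambda = " + r[0] + "   eigenvector = (" + r[1] + "," + r[2] + ")");
          }
      }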
 


11.2   The page rank algorithm, part I: web and surfer model

 

Modeling the web as a graph: each page is a node, a link from page \(i\) to page \(j\) is a directed edge \((i,j)\), and \(N(i)\) denotes the set of nodes that page \(i\) links to.

 

Graph structure:

  • We will make some assumptions about the graph structure (not the original web structure).

  • Reachability assumption: It is possible to start at any node and reach any other node through link (edge) following.

  • Of course, this is not a realistic assumption for the web and so, we will add links to the graph to induce reachability.
          \(\rhd\) We will show later how this is done.

  • For the moment, let's assume reachability.
 

Random surfer model:

  • Imagine a surfer starting at some node.

  • The surfer picks a random link to follow. And repeats this process.

  • Thus, the high-level idea is:
        i = initial node
        for k=0 to ...
            j = randomly selected node from N(i)
            i = j
        endfor
        

  • This process of walking around a graph is sometimes called a random walk on graphs.

  • Next, we'll count how often each node is visited (a standalone Java simulation sketch appears at the end of this section):
    • Suppose we have simulated random surfing for \(t\) steps.
    • Let \(v_i(t) = \) the total number of times we land on node i during steps \(0,\ldots,t\).
     

    In-Class Exercise 12: Click "start" (which begins at node 0), and then "next" 10 times, observing the node reached at each step. Write down \(v_i(10)\) for each node \(i\). Then, click "run" to watch a longer simulation of the random walk.
     

  • The number $$ \frac{v_i(t)}{t} $$ for some large \(t\) is an estimate of the fraction of time a random walk (random surfer) spends on page \(i\).
     

    In-Class Exercise 13: Click "start" and "fast-run", and let it run for a full minute before clicking on "stop". The values \(\frac{v_i(t)}{t}\) are printed out.
     

  • Define $$ \pi_i \eql \lim_{t\rightarrow\infty} \frac{v_i(t)}{t} $$ and the vector $$ {\bf \pi} \eql (\pi_1,\pi_2,\ldots,\pi_i,\ldots, \pi_n) $$

  • If we could compute \(\pi_i\) for each page \(i\) analytically, without simulation, we would have a way to rank pages:
          \(\rhd\) The higher \(\pi_i\) is, the higher the rank of page \(i\).

  • Then, the page rank algorithm is:
        Algorithm: pageRank (G)
        Input: a directed, reachable graph G
         1.    π = computePageProbabilities (G)
          2.    sort pages according to π_i's
        

  • Page ranks are recomputed only occasionally, depending on how often the web structure is assumed to change.

  • Given a search query like "Hello world", the search proceeds as follows:
        Algorithm: search (s)
        Input: a search query s
         1.    Find subset of pages relevant to s
          2.    Sort subset pages according to π_i's
         3.    Report list in order of rank (highest first)
        
     

    In-Class Exercise 14: How does one find the subset of pages relevant to a search query?
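To close this section, here is a small self-contained Java simulation of the random-surfer model (a hedged sketch on a made-up 4-node reachable graph; it is not the applet used in the exercises above):

      import java.util.Random;

      public class RandomSurferSketch {

          public static void main(String[] args) {
              // A small, hypothetical reachable graph: outLinks[i] lists the nodes that node i links to.
              int[][] outLinks = {
                  {1, 2},      // node 0 links to nodes 1 and 2
                  {2},         // node 1 links to node 2
                  {0, 3},      // node 2 links to nodes 0 and 3
                  {0}          // node 3 links to node 0
              };
              int n = outLinks.length;

              int t = 1000000;                 // number of surfing steps
              int[] visits = new int[n];       // visits[i] = v_i(t)
              Random rand = new Random();

              int i = 0;                       // start at node 0
              for (int step = 0; step < t; step++) {
                  int[] N = outLinks[i];       // pick a random node from N(i) and move there
                  i = N[rand.nextInt(N.length)];
                  visits[i]++;
              }

              // Estimate pi_i as the fraction of time spent at node i.
              for (int j = 0; j < n; j++)
                  System.out.println("pi_" + j + " is approximately " + (double) visits[j] / t);
          }
      }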
     

 


11.3   The page rank algorithm, part II: an eigenvalue problem

 

We'll now develop some of the theory to compute \({\bf \pi}\), the vector of page (node) probabilities:

  • First, recall how we defined \({\bf \pi}\):
    • Each $$ \pi_i \eql \lim_{t\rightarrow\infty} \frac{v_i(t)}{t} $$ which is the long-term fraction of time that the random surfer spends on page \(i\).
    • \({\bf \pi}\) is merely the vector of these probabilities: $$ {\bf \pi} \eql (\pi_1,\pi_2,\ldots,\pi_i,\ldots, \pi_n) $$

  • Next, define the transition probabilities $$ P_{ij} \eql \mbox{ the probability of jumping from i to j} $$ and the transition probability matrix $$ {\bf P} \eql \mat{P_{11} & P_{12} & \ldots & P_{1n}\\ P_{21} & P_{22} & \ldots & P_{2n}\\ \vdots & \vdots & \vdots & \vdots \\ P_{n1} & P_{n2} & \ldots & P_{nn} } $$
     

    In-Class Exercise 15: For our running example, write down the transition probability matrix. Then, un-comment Section B of the code in PageRank, compile and execute. Compare your matrix with the one printed out. Explain the relationship between the entries of the matrix and \(N(i)\).
     

  • We claim that $$ {\bf \pi}^T \eql {\bf \pi}^T {\bf P} $$ (here \({\bf \pi}\) is treated as a column vector, so \({\bf \pi}^T\) is a row vector), which we will explain below.

  • If this were true, we could take transposes of both sides: $$ {\bf \pi} \eql {\bf P}^T {\bf \pi} $$

  • Define the matrix \({\bf A} = {\bf P}^T\).

  • Then, we've rewritten this as $$ {\bf A} {\bf \pi} \eql {\bf \pi} $$ Or, in our more familiar form: $$ {\bf A} {\bf x} \eql {\bf x} $$

  • Clearly, this makes \({\bf \pi}\) an eigenvector of the matrix \({\bf A}\).
     

    In-Class Exercise 16: What is the corresponding eigenvalue?
     

  • Furthermore, if it so happens that \({\bf \pi}\) is the eigenvector corresponding to the highest eigenvalue, then we can use the power method.
     

    In-Class Exercise 17: Un-comment Section C in the code to run the power method. Examine the code in the power method to confirm that we are indeed applying the power method. Compare the resulting probabilities \({\bf \pi}_i\) with the estimates \(\frac{v_i(t)}{t}\) printed out (after simulating a few minutes).
     

  • We thus have a way to compute the vector \({\bf \pi}\) as a first step towards ranking the pages:
        Algorithm: pageRank (G)
        Input: a directed, reachable graph G
         1.    P = makeTransitionMatrix (G)
          2.    A = P^T
          3.    π = powerMethod (A)
          4.    sort pages according to π_i's
        

  • The transition matrix is of course $$ {\bf P}_{ij} \eql \frac{1}{|N(i)|} \;\;\;\; \mbox{whenever (i,j) is an edge} $$ and \({\bf P}_{ij} = 0\) otherwise.

  • For example, if node \(i\) has three outgoing edges, then each of those three entries in row \(i\) of \({\bf P}\) is \(\frac{1}{3}\), and the remaining entries in that row are 0. (A small Java sketch below builds \({\bf P}\) for a 4-node graph and runs the power method.)
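Here is a hedged Java sketch of the whole pipeline on the made-up 4-node graph from the random-surfer sketch earlier (this is not the PageRank code from the exercises, and the method names are invented):

      public class PageRankSketch {

          // Build the transition matrix P for a graph given as out-link lists:
          // P[i][j] = 1/|N(i)| whenever (i,j) is an edge, and 0 otherwise.
          static double[][] makeTransitionMatrix(int[][] outLinks) {
              int n = outLinks.length;
              double[][] P = new double[n][n];
              for (int i = 0; i < n; i++)
                  for (int j : outLinks[i])
                      P[i][j] = 1.0 / outLinks[i].length;
              return P;
          }

          public static void main(String[] args) {
              int[][] outLinks = { {1, 2}, {2}, {0, 3}, {0} };
              int n = outLinks.length;

              double[][] P = makeTransitionMatrix(outLinks);

              // A = P^T
              double[][] A = new double[n][n];
              for (int i = 0; i < n; i++)
                  for (int j = 0; j < n; j++)
                      A[i][j] = P[j][i];

              // Power method: pi = A pi. We rescale by the sum (not the length) so that pi stays a
              // probability vector; since the largest eigenvalue is 1, the sum is already 1 apart
              // from round-off, so this rescaling is essentially a no-op.
              double[] pi = new double[n];
              java.util.Arrays.fill(pi, 1.0 / n);
              for (int k = 0; k < 100; k++) {
                  double[] v = new double[n];
                  for (int i = 0; i < n; i++)
                      for (int j = 0; j < n; j++)
                          v[i] += A[i][j] * pi[j];
                  double sum = 0;
                  for (double x : v) sum += x;
                  for (int i = 0; i < n; i++) pi[i] = v[i] / sum;
              }

              for (int i = 0; i < n; i++)
                  System.out.println("pi_" + i + " = " + pi[i]);
          }
      }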

 

We have two things remaining to prove:

  1. Explain why \({\bf \pi}^T \eql {\bf \pi}^T {\bf P}\).
  2. Show that \(\lambda=1\) is the largest eigenvalue, which would make \({\bf \pi}\) the eigenvector that results from the power method.
 


11.4   The page rank algorithm, part III: the theory

 

Why \({\bf \pi}^T \eql {\bf \pi}^T {\bf P}\):

  • Consider the i-th component of \({\bf \pi}\), that is, \(\pi_i\).

  • This is the product of \({\bf \pi}^T\) and the i-th column of \({\bf P}\): $$ \pi_i \eql \pi_1 {\bf P}_{1,i} + \pi_2 {\bf P}_{2,i} + \pi_3 {\bf P}_{3,i} + \ldots + \pi_n {\bf P}_{n,i} $$

  • To show that this is true, first define $$ q_i(t) \eql Pr[\mbox{you land on page i in step t}] $$

  • Then, we'll argue that $$ q_i(t+1) \eql q_1(t) {\bf P}_{1,i} + q_2(t) {\bf P}_{2,i} + q_3(t) {\bf P}_{3,i} + \ldots + q_n(t) {\bf P}_{n,i} $$ Why should this be true?
    • Suppose we are on page 1 at time \(t\).
    • The probability of getting to page i at \(t+1\) is the "jump probability" from 1 to i: $$ Pr[\mbox{go from 1 to i}] \eql {\bf P}_{1,i} $$
    • Similarly, if we are on page \(j\), $$ Pr[\mbox{go from j to i}] \eql {\bf P}_{j,i} $$
    • Thus, the probability of finding oneself on page i comes from accounting for all possible previous states: $$ Pr[\mbox{on page i at t+1}] \eql \sum_{j} Pr[\mbox{on page j at t}] {\bf P}_{j,i} $$
    • In other words, $$ q_i(t+1) \eql \sum_j q_j(t) {\bf P}_{j,i} $$

  • Now consider the limit at \(t\rightarrow\infty\): $$\eqb{ \lim_{t\rightarrow\infty} q_i(t+1) & \eql & \pi_i\\ \lim_{t\rightarrow\infty} q_j(t) & \eql & \pi_j }$$

  • And so, $$ \pi_i \eql \sum_j \pi_j {\bf P}_{j,i} $$ which is the i-th component of $$ {\bf \pi}^T \eql {\bf \pi}^T {\bf P} $$
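  • For example, consider a made-up two-page web in which page 1 links to itself and to page 2, while page 2 links only to page 1: $$ {\bf P} \eql \mat{\frac{1}{2} & \frac{1}{2}\\ 1 & 0} $$ The equations \(\pi_i = \sum_j \pi_j {\bf P}_{j,i}\) become $$\eqb{ \pi_1 & \eql & \frac{1}{2}\pi_1 + \pi_2 \\ \pi_2 & \eql & \frac{1}{2}\pi_1 }$$ which, together with \(\pi_1 + \pi_2 = 1\), gives \(\pi_1 = \frac{2}{3}\) and \(\pi_2 = \frac{1}{3}\): the surfer spends two-thirds of the time on page 1.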
 

The largest eigenvalue of \({\bf A}={\bf P}^T\):

  • First, observe that the sum of elements of a row in \({\bf P}\) is 1.
     

    In-Class Exercise 18: Why is this true? Hint: look at the matrix you wrote for the running example.
     

  • This means each column sum of \({\bf A}\) is 1.
          \(\rhd\) This also follows intuitively from the argument about \(q_i(t+1)\) above.

  • Next, suppose that \(\lambda\) is some eigenvalue of \({\bf A}\) larger than 1.

  • We've seen that for any eigenvalue \(\lambda\) and corresponding eigenvector \({\bf x}\) $$ {\bf A}^k {\bf x} \eql \lambda^k {\bf x} $$ by repeated application of \({\bf A}\).

  • Now, in our application \({\bf x}\) is a probability vector, so all components are non-negative.

  • Next, let \(x_j\) be the largest element of \({\bf x}\); since \({\bf x}\) is nonzero and has non-negative components, \(x_j > 0\).

  • Then, for some large enough \(k\), it will be true that $$ \lambda^k x_j > 1 $$

  • But \({\bf A}^k {\bf x}\) is a probability vector, whose components cannot exceed 1, and so we have a contradiction.

  • Hence, \(\lambda\) cannot be larger than 1.

  • Finally, we've already seen that \(\lambda=1\) is an eigenvalue, and therefore it's the largest.
 


11.5   The page rank algorithm, part IV: practicalities

 

We'll address two practical issues:

  1. What to do about dangling nodes?
  2. The matrix is very, very large.
 

Dangling nodes:

  • These are pages that don't link to anything.

  • Just as bad are subgraphs that cause the random surfer to loop endlessly within a tiny part of the graph.
     

    In-Class Exercise 19: Draw an example of a 6-vertex graph with both of the above problems.
     

  • To address these problems, we'll use a simple trick.

  • At each page, the surfer decides between
    1. Following one of the outgoing edges, selected randomly.
    2. Randomly jumping to any node in the whole graph.

  • To decide between the two, the surfer uses a biased coin with \(Pr[\mbox{heads}] = \alpha\): heads means follow a random outgoing edge, tails means jump to a random node anywhere in the graph. (A small Java sketch later in this subsection builds the resulting transition matrix from \({\bf P}\).)

  • If the old transition matrix was \({\bf P}\), the new transition matrix then becomes $$ {\bf G} \eql \alpha \mat{ P_{11} & \ldots & P_{1n} \\ P_{21} & \ldots & P_{2n} \\ \vdots & \vdots & \vdots \\ P_{n1} & \ldots & P_{nn} \\ } \plss (1-\alpha) \mat{ \frac{1}{n} & \ldots & \frac{1}{n} \\ \frac{1}{n} & \ldots & \frac{1}{n} \\ \vdots & \vdots & \vdots \\ \frac{1}{n} & \ldots & \frac{1}{n} \\ } $$
     

    In-Class Exercise 20: Explain why this is true. That is, why is \({\bf G}_{ij}\) the probability of jumping from i to j with the new two-step surfing rule?
     

    In-Class Exercise 21: Why does this two-step rule solve the dangling node problem?
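Here is a minimal Java sketch of this construction (a made-up 3-node example; \(\alpha = 0.85\) is just a commonly used value). It simply mixes \({\bf P}\) with the all-\(\frac{1}{n}\) matrix, entry by entry:

      public class GoogleMatrixSketch {

          // Build G = alpha*P + (1-alpha)*(1/n)*J, where J is the all-ones matrix.
          static double[][] makeG(double[][] P, double alpha) {
              int n = P.length;
              double[][] G = new double[n][n];
              for (int i = 0; i < n; i++)
                  for (int j = 0; j < n; j++)
                      G[i][j] = alpha * P[i][j] + (1 - alpha) / n;
              return G;
          }

          public static void main(String[] args) {
              // A tiny made-up transition matrix P (each row sums to 1).
              double[][] P = {
                  {0,   1,   0  },
                  {0.5, 0,   0.5},
                  {1,   0,   0  }
              };
              double[][] G = makeG(P, 0.85);
              for (double[] row : G) {
                  for (double g : row) System.out.printf("%6.3f ", g);
                  System.out.println();
              }
          }
      }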
     

 

Now for the size problem:

  • Let's examine the power method iteration with \({\bf G}^T\).

  • Since \({\bf G}^T\) is a full matrix (every entry is nonzero), that's a huge number of entries to store.

  • However, there is a simplification that works.

  • First, write the transpose in terms of \({\bf P}^T\) $$ {\bf G}^T \eql \alpha {\bf P}^T \plss (1-\alpha) \mat{ \frac{1}{n} & \ldots & \frac{1}{n} \\ \frac{1}{n} & \ldots & \frac{1}{n} \\ \vdots & \vdots & \vdots \\ \frac{1}{n} & \ldots & \frac{1}{n} \\ } $$

  • Next, define the vector $$ {\bf h} \eql \vecfour{1}{1}{\vdots}{1} $$

  • Then we can write the matrix on the right as $$ \mat{ \frac{1}{n} & \ldots & \frac{1}{n} \\ \frac{1}{n} & \ldots & \frac{1}{n} \\ \vdots & \vdots & \vdots \\ \frac{1}{n} & \ldots & \frac{1}{n} \\ } \eql \frac{1}{n}{\bf hh}^T $$ where, remember, that \({\bf h}^T\) is a single-row matrix.
     

    In-Class Exercise 22: Show that this is the case.
     

  • Then, we can write $$ {\bf G}^T \eql \alpha {\bf P}^T + \frac{(1-\alpha)}{n} {\bf hh}^T $$

  • And the power method iteration becomes $$\eqb{ {\bf x}^{(k)} & \eql & {\bf G}^T {\bf x}^{(k-1)} \\ & \eql & \alpha {\bf P}^T {\bf x}^{(k-1)} + \frac{(1-\alpha)}{n} {\bf hh}^T {\bf x}^{(k-1)} \\ & \eql & \alpha {\bf P}^T {\bf x}^{(k-1)} + \frac{(1-\alpha)}{n} {\bf h} ({\bf h}^T {\bf x}^{(k-1)}) \\ & \eql & \alpha {\bf P}^T {\bf x}^{(k-1)} + \frac{(1-\alpha)}{n} {\bf h} }$$ where the last simplification comes from $$ {\bf h}^T {\bf x}^{(k-1)} \eql 1 $$
     

    In-Class Exercise 23: Why is this last step true? Hint: remember that \({\bf x}^{(k-1)}\) are probabilities that sum to 1. What happens when you left multiply by \({\bf h}^T\)?
     

  • Thus, the only real "work" that takes time is multiplying the matrix \({\bf P}^T\) into the vector \({\bf x}^{(k-1)}\) (see the sketch at the end of this section).

  • Here, the matrix \({\bf P}^T\) is very sparse.
     

    In-Class Exercise 24: Why is \({\bf P}^T\) sparse?
     

  • Fortunately, there are some simple tricks to store and multiply sparse matrices.

  • For example, store each row as a linked list.
          \(\rhd\) Each row will only have a few entries.
     

    In-Class Exercise 25: Explain how matrix-vector multiplication would work. That is, write pseudocode to multiply a matrix row represented as a linked-list into a vector stored as an array.
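To make the savings concrete, here is a hedged Java sketch of the iteration \({\bf x}^{(k)} = \alpha {\bf P}^T {\bf x}^{(k-1)} + \frac{(1-\alpha)}{n} {\bf h}\) that never builds \({\bf G}^T\), or even \({\bf P}^T\), explicitly (the graph and \(\alpha\) are illustrative, and the out-links are stored as plain arrays rather than the linked lists of the exercise above):

      public class SparsePageRankSketch {

          public static void main(String[] args) {
              // The same hypothetical 4-node graph, stored only as out-links.
              int[][] outLinks = { {1, 2}, {2}, {0, 3}, {0} };
              int n = outLinks.length;
              double alpha = 0.85;

              // Start with the uniform probability vector.
              double[] x = new double[n];
              java.util.Arrays.fill(x, 1.0 / n);

              for (int k = 0; k < 100; k++) {
                  // y = alpha * P^T x + ((1-alpha)/n) * h, computed without any full matrix:
                  double[] y = new double[n];
                  java.util.Arrays.fill(y, (1 - alpha) / n);
                  for (int j = 0; j < n; j++) {
                      double share = alpha * x[j] / outLinks[j].length;   // x_j spread over N(j)
                      for (int i : outLinks[j]) y[i] += share;
                  }
                  x = y;
              }

              for (int i = 0; i < n; i++)
                  System.out.println("pi_" + i + " = " + x[i]);
          }
      }

The work per iteration is proportional to the number of edges, not to \(n^2\).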
     

 


11.6   The QR algorithm

 

So far, we have a method to find one eigenvector: the one corresponding to the largest eigenvalue
      \(\rhd\) The power method
 

What about the other eigenvectors?

  • It turns out: this is quite a difficult problem.

  • It took decades to develop techniques to be able to do this reliably, and for large matrices.

  • There's a variety of quite different techniques today
          \(\rhd\) The subject of numerical linear algebra

  • We could easily spend a whole semester on these methods.

  • But since we capped off the study of orthogonality with the QR-decomposition, we'll describe an algorithm based on it.

  • The algorithm is quite simple to implement, but impossibly difficult to intuit.

  • What's nice about it: it gives us all the eigenvalues (and eventually eigenvectors) at once.
 

The QR algorithm:

  • First recall the QR-decomposition: any matrix \({\bf A}\) with independent columns can be decomposed into the product $$ {\bf A} \eql {\bf QR} $$ where \({\bf Q}\) is orthogonal and \({\bf R}\) is upper-triangular and invertible.

  • Let's describe the steps in the iterative QR algorithm:
    • Start by decomposing \({\bf A}\): $$ {\bf A} \eql {\bf Q}_0 {\bf R}_0 $$
    • Then, create a new matrix $$ {\bf A}_1 \; \defn \; {\bf R}_0 {\bf Q}_0 $$
    • Then, do the QR for \({\bf A}_1\) $$ {\bf A}_1 \eql {\bf Q}_1 {\bf R}_1 $$
    • Then, create \({\bf A}_2\) $$ {\bf A}_2 \; \defn \; {\bf R}_1 {\bf Q}_1 $$
    • And so on.

  • In pseudocode (a Java rendering appears at the end of this list):
         Algorithm: QRmethod (A)
         Input: a square matrix A with independent columns
         1.   for k=0 to ...
         2.       (Q, R) = QR-decomposition (A)
         3.       A = R Q
         4.   endfor
         5.   return diagonal (A)             // Eigenvalues are on the diagonal 
         

  • It turns out that this iteration converges. Let's give the limiting matrix a name: $$ \lim_{k\rightarrow\infty} {\bf A}_k \eql {\bf A}^\prime $$

  • Next, define $$ \mbox{diag}({\bf A}^\prime) \eql \mbox{ the diagonal entries of } {\bf A}^\prime $$

  • It turns out, quite surprisingly, that: $$ \mbox{diag}({\bf A}^\prime) \eql (\lambda_1,\lambda_2,\ldots,\lambda_n) $$ That is, the diagonal entries of \({\bf A}^\prime\) are the eigenvalues of \({\bf A}\)!

  • This is not at all trivial to prove. We will refer to the text by Olver and Shakiban (Chapter 10) for a proof.

  • The iteration only provides the eigenvalues.

  • But once we know an eigenvalue \(\lambda\), the corresponding eigenvector can be found from the RREF, as follows:
    • If \({\bf x}\) is the eigenvector corresponding to \(\lambda\) then $$ {\bf Ax} = \lambda{\bf x} $$
    • We can write the right side as $$ {\bf Ax} = \lambda {\bf I} {\bf x} $$
    • Which means $$ {\bf Ax} - \lambda {\bf I} {\bf x} \eql {\bf 0} $$ Or $$ ({\bf A} - \lambda {\bf I}) {\bf x} \eql {\bf 0} $$
    • Since we know \(\lambda\), we know the matrix $$ {\bf B} \eql {\bf A} - \lambda {\bf I} $$
    • Which means, to find \({\bf x}\) we need to solve the equations $$ {\bf Bx} \eql {\bf 0} $$
    • Which we know how to do quite well.
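Here is a minimal Java sketch of the pseudocode above, using (modified) Gram-Schmidt for the QR step (this is not QRExample.java; the test matrix and the fixed 50 iterations are illustrative, and no convergence test or eigenvector recovery is included):

      public class QRAlgorithmSketch {

          // Modified Gram-Schmidt QR decomposition: A = Q R, with Q orthogonal and R upper-triangular.
          // Returns {Q, R}. Assumes the columns of A are independent.
          static double[][][] qr(double[][] A) {
              int n = A.length;
              double[][] Q = new double[n][n];
              double[][] R = new double[n][n];
              for (int j = 0; j < n; j++) {
                  // Start with column j of A.
                  double[] v = new double[n];
                  for (int i = 0; i < n; i++) v[i] = A[i][j];
                  // Subtract the projections onto the previously computed q's.
                  for (int k = 0; k < j; k++) {
                      double r = 0;
                      for (int i = 0; i < n; i++) r += Q[i][k] * v[i];
                      R[k][j] = r;
                      for (int i = 0; i < n; i++) v[i] -= r * Q[i][k];
                  }
                  // Normalize what's left to get column j of Q.
                  double norm = 0;
                  for (int i = 0; i < n; i++) norm += v[i] * v[i];
                  norm = Math.sqrt(norm);
                  R[j][j] = norm;
                  for (int i = 0; i < n; i++) Q[i][j] = v[i] / norm;
              }
              return new double[][][] {Q, R};
          }

          static double[][] multiply(double[][] X, double[][] Y) {
              int n = X.length;
              double[][] Z = new double[n][n];
              for (int i = 0; i < n; i++)
                  for (int j = 0; j < n; j++)
                      for (int k = 0; k < n; k++)
                          Z[i][j] += X[i][k] * Y[k][j];
              return Z;
          }

          public static void main(String[] args) {
              double[][] A = { {2, 1},
                               {1, 2} };
              for (int k = 0; k < 50; k++) {
                  double[][][] QR = qr(A);
                  A = multiply(QR[1], QR[0]);      // A = R Q
              }
              // The eigenvalues appear (approximately) on the diagonal.
              for (int i = 0; i < A.length; i++)
                  System.out.println("lambda_" + (i + 1) + " is approximately " + A[i][i]);
          }
      }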
 

In-Class Exercise 26: Review your work in Exercise 5 showing that \({\bf x}_1=(\frac{1}{\sqrt{2}},\frac{1}{\sqrt{2}})\) and \({\bf x}_2=(\frac{1}{\sqrt{2}},-\frac{1}{\sqrt{2}})\) are eigenvectors of \({\bf A}=\mat{2 & 1\\1 & 2}\). What were the eigenvalues? Download, compile and execute QRExample.java to see the QR algorithm find the eigenvalues.
 

Note:

  • We have described the above algorithm just for completeness, to show that there are ways of computing a full set of eigenvalues.
  • This is not really an "exam" topic or anything critical to understanding the remaining material.
  • The general problem of computing eigenvalues efficiently is difficult and an advanced topic.
 


© 2016, Rahul Simha