\( \newcommand{\blah}{blah-blah-blah} \newcommand{\eqb}[1]{\begin{eqnarray*}#1\end{eqnarray*}} \newcommand{\eqbn}[1]{\begin{eqnarray}#1\end{eqnarray}} \newcommand{\bb}[1]{\mathbf{#1}} \newcommand{\mat}[1]{\begin{bmatrix}#1\end{bmatrix}} \newcommand{\nchoose}[2]{\left(\begin{array}{c} #1 \\ #2 \end{array}\right)} \newcommand{\defn}{\stackrel{\vartriangle}{=}} \newcommand{\rvectwo}[2]{\left(\begin{array}{c} #1 \\ #2 \end{array}\right)} \newcommand{\rvecthree}[3]{\left(\begin{array}{r} #1 \\ #2\\ #3\end{array}\right)} \newcommand{\rvecdots}[3]{\left(\begin{array}{r} #1 \\ #2\\ \vdots\\ #3\end{array}\right)} \newcommand{\vectwo}[2]{\left[\begin{array}{r} #1\\#2\end{array}\right]} \newcommand{\vecthree}[3]{\left[\begin{array}{r} #1 \\ #2\\ #3\end{array}\right]} \newcommand{\vecfour}[4]{\left[\begin{array}{r} #1 \\ #2\\ #3\\ #4\end{array}\right]} \newcommand{\vecdots}[3]{\left[\begin{array}{r} #1 \\ #2\\ \vdots\\ #3\end{array}\right]} \newcommand{\eql}{\;\; = \;\;} \definecolor{dkblue}{RGB}{0,0,120} \definecolor{dkred}{RGB}{120,0,0} \definecolor{dkgreen}{RGB}{0,120,0} \newcommand{\bigsp}{\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;} \newcommand{\plss}{\;\;+\;\;} \newcommand{\miss}{\;\;-\;\;} \newcommand{\implies}{\Rightarrow\;\;\;\;\;\;\;\;\;\;\;\;} \)


Module 5: Solving Equations

Matrices that transform other matrices, the RREF, and the row view of \({\bf A x} = {\bf b}\)


Module objectives

 
There are few things more boring than solving simultaneous linear equations. But ...

So, let's roll up our sleeves and dig in.
 
By the end of this module, you should be able to
 

But, first, a little practice exercise multiplying matrices.
 

In-Class Exercise 1: Work out the following matrix products: $$\eqb{ \mbox{(1)} & \mat{0 & 1\\1 & 0} \mat{1 & 3 \\ 4 & 2} \\ \mbox{(2)} & \mat{\alpha & 0\\0 & 1} \mat{1 & 3 \\ 4 & 2}\\ \mbox{(3)} & \mat{1 & 0\\-4 & 1} \mat{1 & 3 \\ 4 & 2} \\ }$$
 


5.0   Matrices that transform matrices

 

Matrices that transform matrices:

  • Thus far, we've seen that a matrix multiplying a vector transforms that vector into another vector.

  • Thus, in \({\bf Ax} = {\bf b}\), the matrix \({\bf A}\) transforms the vector \({\bf x}\) into the vector \({\bf b}\).
          \(\rhd\) \({\bf A}\) "acts" on \({\bf x}\) to produce \({\bf b}\).

  • For example, the matrix \({\bf A} = \mat{1 & 3 \\ 4 & 2}\) transforms \({\bf x}=(1.5, 2)\) into \((7.5,10)\) because $$ \mat{1 & 3 \\ 4 & 2} \vectwo{1.5}{2} \eql \vectwo{7.5}{10} $$

  • In a similar manner, we could interpret the matrix multiplication \({\bf B A} = {\bf C}\) as the matrix \({\bf B}\) transforming the matrix \({\bf A}\) into the matrix \({\bf C}\).

  • Thus, for example, \({\bf B}=\mat{1 & 0\\-4 & 1}\) multiplied into \({\bf A}=\mat{1 & 3\\4 & 2}\) produces \({\bf C}=\mat{1 & 3\\0 & -10}\) $$ {\bf B A} \eql \mat{1 & 0\\-4 & 1} \mat{1 & 3\\4 & 2} \eql \mat{1 & 3\\0 & -10} \eql {\bf C} $$ We could say that multiplication by \({\bf B}\) transformed \({\bf A}\) into \({\bf C}\).

  • This view of matrix-matrix multiplication will later be useful in understanding how the inverse of a matrix is computed.
 

Some useful transformations:

  • Consider the matrix \(\mat{0 & 1\\1 & 0}\) acting on \(\mat{1 & 3 \\ 4 & 2}\): $$ \mat{0 & 1\\1 & 0} \mat{1 & 3 \\ 4 & 2} \eql \mat{4 & 2 \\ 1 & 3} $$ This has the effect of swapping the rows of the second matrix.
     

    In-Class Exercise 2: Design a matrix to swap rows 2 and 3 of a \(3\times 3\) matrix, and show how it works on an example. Then design a matrix to swap rows 3 and 2. What happens when you multiply the matrix that does the 2-3 swap with the one that does the 3-2 swap? What is the matrix that swaps rows \(i\) and \(j\) of an \(n\times n\) matrix? What happens when you multiply this matrix by itself?
     

  • Next, consider the matrix \(\mat{\alpha & 0\\0 & 1}\) applied to \(\mat{1 & 3 \\ 4 & 2}\): $$ \mat{\alpha & 0\\0 & 1} \mat{1 & 3 \\ 4 & 2} \eql \mat{\alpha & 3\alpha \\ 4 & 2} $$ This has the effect of multiplying the first row of \(\mat{1 & 3 \\ 4 & 2}\) by \(\alpha\).
     

    In-Class Exercise 3: Design a matrix to divide row 3 of a \(3\times 3\) matrix by 2 and show how it works on an example. What is the matrix that divides row \(i\) of an \(n\times n\) matrix by the number \(\beta\)? What is the inverse of this matrix?
     

  • We'll now look at a slightly more complicated, but extremely useful transformation.

  • But before that, let's introduce some notation:
    • Let \({\bf r_A}(i)\) denote the \(i\)-th row of matrix \({\bf A}\).
    • Example: if $$ {\bf A} \eql \mat{1 & 2 & 3\\4 & 5 & 6\\7 & 8 & 9} $$ then \({\bf r_A}(2) = (4,5,6)\) and \({\bf r_A}(3) = (7,8,9)\).
    • \({\bf r_A}(i)\) is not really a vector, but since it's an ordered collection of numbers, we'll occasionally treat it like a vector for purposes of multiplication.
     

    In-Class Exercise 4: What is \({\bf r_A}(2) + {\bf r_A}(3) \) in the above example?
     

  • Now for the important transformation. Consider \(\mat{1 & 0\\-4 & 1}\) applied to \(\mat{1 & 3 \\ 4 & 2}\): $$ \mat{1 & 0\\-4 & 1} \mat{1 & 3 \\ 4 & 2} \eql \mat{1 & 3 \\ 0 & -10} $$
    • This has the effect of replacing row 2 with the sum of: row 2 and \(-4 \times\) row 1.
    • We can separately work out the arithmetic as: $$\eqb{ 4 & \;\;\;2 & \;\;\;\;\;\;\;\mbox{row 2, notated as } r_{\bf A}(2)\\ -4 & \; -12 & \;\;\;\;\;\;\;\mbox{this is -4 times row 1 denoted by } -4 \times r_{\bf A}(1)\\\hline 0 & \;-10 & \;\;\;\;\;\;\;\mbox{the result, the new } r_{\bf A}(2)\\ }$$
    • Thus, we are achieving \({\bf r_A}(2) \leftarrow {\bf r_A}(2) - 4 {\bf r_A}(1)\)
    • You can read \({\bf r_A}(2) \leftarrow {\bf r_A}(2) - 4 {\bf r_A}(1)\) as:
      • (Left side) Replace row 2 with ...
      • (On the right) Take the 2nd row, and subtract 4 times the first row.
      • The result is a different 2nd row, after we do this.
     

    In-Class Exercise 5: What is the transformation matrix that achieves \({\bf r_A}(1) \leftarrow {\bf r_A}(1) - 3 {\bf r_A}(2)\) for a \(2\times 2\) matrix? What is the effect when applied to \(\mat{1 & 3\\0 & 1}\)?
     

     

    In-Class Exercise 6: What is the transformation matrix that achieves \({\bf r_A}(i) \leftarrow {\bf r_A}(i) + \alpha {\bf r_A}(j)\) for an \(n\times n\) matrix? What is the inverse of this matrix?
     

  • To summarize, we've seen three row operations achieved through matrix multiplication:
    • Row operation #1: swap two rows
    • Row operation #2: multiply or divide a row by a constant.
    • Row operation #3: replace a row by subtracting from it a multiple of another row.

  • All three matrices (the ones that transform) have inverses that respectively "undo" the row operations. The short code sketch below illustrates all three operations.
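
To make the row operations concrete, here is a small Java sketch (hypothetical code, not one of the course's provided files) that builds the three \(2\times 2\) elementary matrices above and applies each by ordinary matrix multiplication:

    public class RowOpDemo {
        // Multiply two square matrices: returns C = B A.
        static double[][] multiply(double[][] B, double[][] A) {
            int n = A.length;
            double[][] C = new double[n][n];
            for (int i = 0; i < n; i++)
                for (int j = 0; j < n; j++)
                    for (int k = 0; k < n; k++)
                        C[i][j] += B[i][k] * A[k][j];
            return C;
        }

        public static void main(String[] args) {
            double[][] A = {{1, 3}, {4, 2}};
            double[][] swap    = {{0, 1}, {1, 0}};    // row op #1: swap rows 1 and 2
            double[][] scale   = {{0.5, 0}, {0, 1}};  // row op #2: multiply row 1 by 0.5
            double[][] combine = {{1, 0}, {-4, 1}};   // row op #3: r2 <- r2 - 4 r1
            System.out.println(java.util.Arrays.deepToString(multiply(swap, A)));     // [[4.0, 2.0], [1.0, 3.0]]
            System.out.println(java.util.Arrays.deepToString(multiply(scale, A)));    // [[0.5, 1.5], [4.0, 2.0]]
            System.out.println(java.util.Arrays.deepToString(multiply(combine, A)));  // [[1.0, 3.0], [0.0, -10.0]]
        }
    }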
 

What good are such transformations? Why would we want to transform a matrix?
 

To see, let's first review inverse and identity:

  • We will restrict our discussion to square matrices for now.

  • The number 1 is called the multiplicative identity for numbers because any number \(x\) times 1 is \(x\).

  • The equivalent identity matrix \({\bf I}\) has the property that, for any (multiply-compatible) matrix \({\bf A}\) $$ {\bf A I} \eql {\bf A} $$ and $$ {\bf I A} \eql {\bf A} $$

  • The identity matrix of size \(n \times n\) is: $$ {\bf I} \eql \mat{ 1 & 0 & \cdots & 0 \\ 0 & 1 & \cdots & 0\\ \vdots & \vdots & \ddots & \vdots\\ 0 & 0 & \cdots & 1 } $$

  • Thus, in this \(2 \times 2\) example: $$ \mat{1 & 0 \\ 0 & 1} \mat{1 & 3 \\ 4 & 2} \eql \mat{1 & 3 \\ 4 & 2} $$ and $$ \mat{1 & 3 \\ 4 & 2} \mat{1 & 0 \\ 0 & 1} \eql \mat{1 & 3 \\ 4 & 2} $$
     

    In-Class Exercise 7: Prove that \({\bf I}\) as defined above is indeed the multiplicative identity for matrices for any size \(n \times n\). What is the geometric intuition for why this must be so?
     

  • With numbers, the multiplicative inverse of a number \(x\) is a number \(y\) such that \(yx = 1\): that is, the number that multiplies into \(x\) to produce the multiplicative identity.

  • For regular number multiplication, we know that if \(yx = 1\) then \(xy = 1\).

  • The notation \(x^{-1}\) is used to denote the multiplicative inverse of the number \(x\).

  • So, for a matrix \({\bf A}\), the inverse will be denoted by \({\bf A^{-1}}\).

  • The inverse of a square matrix is defined as the matrix \({\bf A^{-1}}\) such that $$ {\bf A^{-1} A} \eql {\bf I} \eql {\bf A A^{-1}} $$

  • For example, if \({\bf A} = \mat{1 & 3\\4 & 2}\), then \({\bf A^{-1}}\) turns out to be \(\mat{-0.2 & 0.3\\0.4 & -0.1}\) because $$ \mat{-0.2 & 0.3\\0.4 & -0.1} \mat{1 & 3\\4 & 2} \eql \mat{1 & 0 \\ 0 & 1} $$
     

    In-Class Exercise 8: Confirm the above by doing the calculations by hand.
     

  • The inverse is usually written as multiplying on the left (pre-multiplying): \({\bf A^{-1} A}\).

  • Note: we could have defined the inverse solely using left multiplication:
    • If \({\bf A^{-1} A} = {\bf I}\), then one can show that \({\bf A A^{-1}} = {\bf I}\).
    • This is not obvious or trivial to prove (try it).

  • In case you were wondering, yes, there are equivalent definitions of additive identity and additive inverse, but these are rarely used.

  • As we will see, not every square matrix has an inverse.
 

Now for an important observation:

  • Suppose there exist matrices \({\bf R}_1, {\bf R}_2, \ldots, {\bf R}_k\) such that $$ {\bf R}_k {\bf R}_{k-1} \cdots {\bf R}_2 {\bf R}_1 {\bf A} \eql {\bf I} $$ That is, a series of transformations that turn \({\bf A}\) into the identity matrix \({\bf I}\).

  • Then, let $$ {\bf R} \defn {\bf R}_k {\bf R}_{k-1} \cdots {\bf R}_2 {\bf R}_1 $$

  • Which means $$ {\bf R} {\bf A} \eql {\bf I} $$ which by the definition of matrix inverse means that $$ {\bf R} \eql {\bf A}^{-1} $$

  • Thus, interestingly, the product of the transformation matrices is the inverse itself!

  • Furthermore, we can see that $$ {\bf R I } \eql {\bf A}^{-1} {\bf I} \eql {\bf A}^{-1} $$ and so, expanding the combined transformation back into its sequence: $$ {\bf R}_k {\bf R}_{k-1} \cdots {\bf R}_2 {\bf R}_1 {\bf I} \eql {\bf A}^{-1} $$

  • That is, applying the same transformations to the identity matrix produces the inverse.

  • Thus, if we can figure out which transformations reduce a matrix to the identity, we can compute the inverse using the same transformations applied to \({\bf I}\).

  • So, for the all-important matrix-vector equation \({\bf A x} = {\bf b}\), if we've computed \({\bf A}^{-1}\), we see that $$\eqb{ & {\bf A x} & \eql & {\bf b}\\ \Rightarrow \;\;\; & {\bf A}^{-1} {\bf A x} & \eql & {\bf A}^{-1} {\bf b}\\ \Rightarrow \;\;\;& {\bf I x} & \eql & {\bf A}^{-1} {\bf b}\\ \Rightarrow \;\;\; & {\bf x} & \eql & {\bf A}^{-1} {\bf b}\\ }$$ Note: you can read the double-arrow symbol on the left as "which implies that".

  • That is, we will have solved our most important equation: \({\bf Ax}={\bf b}\).

  • Let's make this more concrete:
    • Suppose we have the equations $$\eqb{ 2x_1 & - & 3x_2 & + & 2x_3 & \eql & 5\\ -2x_1 & + & x_2 & & & \eql & -1\\ x_1 & + & x_2 & - & x_3 & \eql & 0 }$$
    • In matrix form, this becomes $$ \mat{2 & -3 & 2\\ -2 & 1 & 0\\ 1 & 1 & -1} \vecthree{x_1}{x_2}{x_3} \eql \vecthree{5}{-1}{0} $$
    • Suppose we've found a way to compute the inverse $$ \mat{0.5 & 0.5 & 1\\ 1 & 2 & 2\\ 1.5 & 2.5 & 2} $$
    • Then, the solution to the simultaneous linear equations is: $$ \vecthree{x_1}{x_2}{x_3} \eql \mat{0.5 & 0.5 & 1\\ 1 & 2 & 2\\ 1.5 & 2.5 & 2} \vecthree{5}{-1}{0} $$

  • So, it appears we have a way of solving \({\bf Ax}={\bf b}\) (see the short code sketch after the questions below).

  • But there are some questions too:
    • Exactly how do we decide which transformations to make?
    • Is it guaranteed we'll be able to find transformations to compute the inverse?
    • Is this the fastest way?
    • What about equations that don't have a solution or have multiple solutions?
    • What is this mysterious and guttural sounding "RREF"? And what does it have to do with solving equations?
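
As a preview, here is a small Java sketch (hypothetical code, with the inverse hard-coded from the example above) of that last step: once \({\bf A}^{-1}\) is known, solving is a single matrix-vector multiplication, and the same inverse can be reused for any right-hand side.

    public class InverseSolveDemo {
        // Compute y = M x for a square matrix M and a vector x.
        static double[] multiply(double[][] M, double[] x) {
            double[] y = new double[M.length];
            for (int i = 0; i < M.length; i++)
                for (int j = 0; j < x.length; j++)
                    y[i] += M[i][j] * x[j];
            return y;
        }

        public static void main(String[] args) {
            // The inverse of the coefficient matrix from the example above.
            double[][] Ainv = {{0.5, 0.5, 1}, {1, 2, 2}, {1.5, 2.5, 2}};
            double[] b = {5, -1, 0};
            // x = Ainv b = (2, 3, 5)
            System.out.println(java.util.Arrays.toString(multiply(Ainv, b)));
        }
    }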

Before we get to these questions, let's revisit the traditional way of solving the equations.
 


5.1   Solving by hand: some insights

 

Let's look at two examples for some insight.

Example 1:

  • Suppose the equations are: $$\eqb{ x_1 & + & 3x_2 & \eql & 7.5 & \bigsp (1) \\ 4x_1 & + & 2x_2 & \eql & 10 & \bigsp (2) }$$
     

    In-Class Exercise 9: Write the above equations in matrix form.
     

  • There are many ways to proceed:
    • We could eliminate \(x_1\) by multiplying the first equation by 4.
    • We could eliminate \(x_1\) by dividing the second equation by 4.
    • We could multiply the first by 2, the second by 3, and eliminate \(x_2\).

  • We will multiply equation (1) by 4 and write equation (2) below it: $$\eqb{ 4x_1 & + & 12x_2 & \eql & 30 & \bigsp (3) \\ 4x_1 & + & 2x_2 & \eql & 10 & \bigsp (2) }$$

  • This gives us a different set of equations.

  • What's most important to realize:
    • We've changed the set of equations.
    • But the solution to the new set is the same as the solution to the original equations.
     

    In-Class Exercise 10: Why? What is the precise reasoning?
     

    In-Class Exercise 11: Write the new set in matrix form.
     

  • We will now subtract equation (2) from equation (3) to get a new set of equations: $$\eqb{ 4x_1 & + & 12x_2 & \eql & 30 & \bigsp (3) \\ & & 10x_2 & \eql & 20 & \bigsp (4) }$$
     

    In-Class Exercise 12: Write the new set (3), (4) in matrix form.
     

  • At last, we have a single-variable equation \(10x_2=20\) which we can trivially solve: \(x_2=2\).

  • Then, we can go back and substitute \(x_2=2\) in any of the equations (1), (2) or (3).
 

We could say we've solved the equations, but ... we will do something unusual:

  • First, let's write (1) and (4) $$\eqb{ x_1 & + & 3x_2 & \eql & 7.5 & \bigsp (1) \\ & & 10x_2 & \eql & 20 & \bigsp (4) }$$

  • Then, divide (4) by 10: $$\eqb{ x_1 & + & 3x_2 & \eql & 7.5 & \bigsp (1) \\ & & x_2 & \eql & 2 & \bigsp (5) }$$
     

    In-Class Exercise 13: Write the new set (1), (5) in matrix form.
     

  • Then, multiply (5) by 3: $$\eqb{ x_1 & + & 3x_2 & \eql & 7.5 & \bigsp (1) \\ & &3x_2 & \eql & 6 & \bigsp (6) }$$

  • Subtract (6) from (1): $$\eqb{ x_1 & + & 0 & \eql & 1.5 & \bigsp (7) \\ 0 & + &3x_2 & \eql & 6 & \bigsp (6) }$$

  • Now write (7) and (5) $$\eqb{ x_1 & + & 0 & \eql & 1.5 & \bigsp (7) \\ 0 & + & x_2 & \eql & 2 & \bigsp (5) }$$

  • We can read off the solution directly from these equations.
     

    In-Class Exercise 14: Write the new set (7), (5) in matrix form.
     

 

There's a minor twist that will be useful:

  • Remember how we multiplied equation (1) by 4 to get: $$\eqb{ 4x_1 & + & 12x_2 & \eql & 30 & \bigsp (3) \\ 4x_1 & + & 2x_2 & \eql & 10 & \bigsp (2) }$$

  • We then replaced (2) with (3)-(2).

  • This is the same as starting from $$\eqb{ x_1 & + & 3x_2 & \eql & 7.5 & \bigsp (1) \\ 4x_1 & + & 2x_2 & \eql & 10 & \bigsp (2) }$$ and replacing (2) with (2) - 4\(\times\)(1).

  • In other words, we're using the same row operations as described earlier for transforming matrices.

  • In this example, we did not need the row-swap operation.
 

Let's look at another, \(3\times 3\) example:

  • The equations: $$\eqb{ 2x_1 & \miss & 3x_2 & \plss & 2x_3 & \eql 5\\ -2x_1 & \plss & x_2 & & & \eql -1\\ x_1 & \plss & x_2 & \miss & x_3 & \eql 0\\ }$$

  • We'll start with \(x_1\) and make its coefficient 1 in the first equation:
    • The coefficient is currently 2 $$\eqb{ {\bf 2}x_1 & \miss & 3x_2 & \plss & 2x_3 & \eql 5\\ -2x_1 & \plss & x_2 & & & \eql -1\\ x_1 & \plss & x_2 & \miss & x_3 & \eql 0\\ }$$
    • We'll need to divide the row by 2: $$\eqb{ x_1 & \miss & \frac{3}{2}x_2 & \plss & x_3 & \eql \frac{5}{2}\\ -2x_1 & \plss & x_2 & & & \eql -1\\ x_1 & \plss & x_2 & \miss & x_3 & \eql 0\\ }$$

  • Next, we'll eliminate \(x_1\) from row 2:
    • To do this, add twice row 1.
    • That is, \({\bf r}_2 \leftarrow {\bf r}_2 + 2{\bf r}_1\) $$\eqb{ {\bf x_1} & \miss & \frac{3}{2}x_2 & \plss & x_3 & \eql \frac{5}{2}\\ 0 & \miss & 2x_2 & \plss & 2x_3 & \eql 4\\ x_1 & \plss & x_2 & \miss & x_3 & \eql 0\\ }$$

  • Next, we'll eliminate \(x_1\) from row 3 using \({\bf r}_3 \leftarrow {\bf r}_3 - {\bf r}_1\) $$\eqb{ {\bf x_1} & \miss & \frac{3}{2}x_2 & \plss & x_3 & \eql \frac{5}{2}\\ 0 & \miss & 2x_2 & \plss & 2x_3 & \eql 4\\ 0 & \plss & \frac{5}{2}x_2 & \miss & 2x_3 & \eql -\frac{5}{2}\\ }$$ Note: we've marked \(x_1\) in bold to draw attention to the "column" below it.
     

    In-Class Exercise 15: Write down the corresponding five matrices in sequence: the starting matrix, and then the resulting matrices after each row operation.
     

  • We'll now turn to \(x_2\) $$\eqb{ x_1 & \miss & \frac{3}{2}x_2 & \plss & x_3 & \eql \frac{5}{2}\\ 0 & \miss & {\bf 2x_2} & \plss & 2x_3 & \eql 4\\ 0 & \plss & \frac{5}{2}x_2 & \miss & 2x_3 & \eql -\frac{5}{2}\\ }$$ And we'll now eliminate it from the rows below the second row.

  • First, we will make the coefficient unity: \({\bf r}_2 \leftarrow -\frac{1}{2}{\bf r}_2\) $$\eqb{ x_1 & \miss & \frac{3}{2}x_2 & \plss & x_3 & \eql \frac{5}{2}\\ 0 & \plss & {\bf x_2} & \miss & x_3 & \eql -2\\ 0 & \plss & \frac{5}{2}x_2 & \miss & 2x_3 & \eql -\frac{5}{2}\\ }$$

  • Next, we'll eliminate \(x_2\) from row 3: \({\bf r}_3 \leftarrow {\bf r}_3 - \frac{5}{2} {\bf r}_2\) $$\eqb{ x_1 & \miss & \frac{3}{2}x_2 & \plss & x_3 & \eql \frac{5}{2}\\ 0 & \plss & {\bf x_2} & \miss & x_3 & \eql -2\\ 0 & \plss & 0 & \plss & \frac{1}{2}x_3 & \eql \frac{5}{2}\\ }$$

  • Next, we'll make the coefficient of \(x_3\) into 1: $$\eqb{ x_1 & \miss & \frac{3}{2}x_2 & \plss & x_3 & \eql \frac{5}{2}\\ 0 & \plss & x_2 & \miss & x_3 & \eql -2\\ 0 & \plss & 0 & \plss & {\bf \frac{1}{2}x_3} & \eql \frac{5}{2}\\ }$$ We'll do this via \({\bf r}_3 \leftarrow 2 {\bf r}_3\) $$\eqb{ x_1 & \miss & \frac{3}{2}x_2 & \plss & x_3 & \eql \frac{5}{2}\\ 0 & \plss & x_2 & \miss & x_3 & \eql -2\\ 0 & \plss & 0 & \plss & {\bf x_3} & \eql 5\\ }$$

  • At this point, we have solved for \(x_3\).

  • We use \(x_3=5\) in row 2 and obtain \(x_2\). Then, we could use both to get \(x_1\) from the first row.

  • This approach is called backwards substitution:
    • The last row gives us a single-variable equation with the solution.
    • Working backwards, each equation going up has one unknown.

  • But we will stubbornly plough onwards, changing more coefficients.
     

    In-Class Exercise 16: Write down the corresponding three matrices in sequence after applying the above three row operations.
     

  • We'll now return to \(x_2\) and eliminate that from all equations above row 2 $$\eqb{ x_1 & \miss & \frac{3}{2}x_2 & \plss & x_3 & \eql \frac{5}{2}\\ 0 & \plss & {\bf x_2} & \miss & x_3 & \eql -2\\ 0 & \plss & 0 & \plss & x_3 & \eql 5\\ }$$ In this set, there's only one equation above (row 1)
          \(\rhd\) We need \({\bf r}_1 \leftarrow {\bf r}_1 + \frac{3}{2} {\bf r}_2\), which gives us $$\eqb{ x_1 & \plss & 0 & \miss & \frac{1}{2}x_3 & \eql -\frac{1}{2}\\ 0 & \plss & {\bf x_2} & \miss & x_3 & \eql -2\\ 0 & \plss & 0 & \plss & x_3 & \eql 5\\ }$$

  • Next, we'll eliminate \(x_3\) from rows 1 and 2: $$\eqb{ x_1 & \plss & 0 & \miss & \frac{1}{2}x_3 & \eql -\frac{1}{2}\\ 0 & \plss & x_2 & \miss & x_3 & \eql -2\\ 0 & \plss & 0 & \plss & {\bf x_3} & \eql 5\\ }$$
    • Applying \({\bf r}_1 \leftarrow {\bf r}_1 + \frac{1}{2} {\bf r}_3\) gives $$\eqb{ x_1 & \plss & 0 & \plss & 0 & \eql 2\\ 0 & \plss & x_2 & \miss & x_3 & \eql -2\\ 0 & \plss & 0 & \plss & {\bf x_3} & \eql 5\\ }$$
    • Then, applying \({\bf r}_2 \leftarrow {\bf r}_2 + {\bf r}_3\) gives $$\eqb{ x_1 & \plss & 0 & \plss & 0 & \eql 2\\ 0 & \plss & x_2 & \plss & 0 & \eql 3\\ 0 & \plss & 0 & \plss & x_3 & \eql 5\\ }$$

  • We can now just read off the solution on the right side.
     

    In-Class Exercise 17: Write down the corresponding three matrices in sequence after applying the above three row operations.
     

 

Let's summarize the approach so far:

  • Start with the coefficient of \(x_1\) and divide to make the coefficient \(1\).

  • Then, use row operations to eliminate \(x_1\) in the first column, below the first row.
          \(\rhd\) Which is the same thing as making those coefficients \(0\).

  • Then, go to \(x_2\) in row 2, column 2
    • Divide to make its coefficient \(1\).
    • Eliminate \(x_2\) in rows 3 onwards.
            \(\rhd\) Make the coefficients \(0\) in those rows (in that column).

  • Then, \(x_3, x_4, \ldots\) etc.

  • Note: each row operation is applied to the whole row, including the right-hand-side of the equation.
 

This is the general idea but, as it turns out, we'll need to tweak it to address some issues
      \(\rhd\) For example, when the coefficient of some variable is already zero.

Before that, let's look at another example, but in matrix form.
 


5.2   Solving by hand using an augmented matrix

 

The augmented matrix:

  • Consider an example of a row operation like the following:
    • Apply \({\bf r}_2 \leftarrow {\bf r}_2 + {\bf r}_3\) to $$\eqb{ x_1 & \plss & 0 & \plss & 0 & \eql 2\\ {\bf 0} & \plss & {\bf x_2} & \miss & {\bf x_3} & \eql {\bf -2}\\ 0 & \plss & 0 & \plss & x_3 & \eql 5\\ }$$ to give $$\eqb{ x_1 & \plss & 0 & \plss & 0 & \eql 2\\ {\bf 0} & \plss & {\bf x_2} & \plss & {\bf 0} & \eql {\bf 3}\\ 0 & \plss & 0 & \plss & x_3 & \eql 5\\ }$$
    • Here, the row operation applies to both sides of the second equation.

  • In matrix form, at the step before applying the row operation, it would look like this: $$ \mat{1 & 0 & 0\\ {\bf 0} & {\bf 1} & {\bf -1}\\ 0 & 0 & 1} \vecthree{x_1}{x_2}{x_3} \eql \vecthree{2}{{\bf -2}}{5} $$

  • Thus, for row operations, we're really treating the right hand side numbers the same way as we treat any column of the matrix.

  • For convenience of both programming and visualization, we'll put the right-hand-side into a column of a larger matrix: $$ \left[ \begin{array}{ccc|c} 1 & 0 & 0 & 2\\ {\bf 0} & {\bf 1} & {\bf -1} & {\bf -2}\\ 0 & 0 & 1 & 5 \end{array} \right] $$ To make the artificial last column obvious, we'll draw a vertical line just before it.

  • Now applying \({\bf r}_2 \leftarrow {\bf r}_2 + {\bf r}_3\) gives $$ \left[ \begin{array}{ccc|c} 1 & 0 & 0 & 2\\ {\bf 0} & {\bf 1} & {\bf 0} & {\bf 3}\\ 0 & 0 & 1 & 5 \end{array} \right] $$

  • Definition: Given \({\bf Ax}={\bf b}\), an initial augmented matrix is the matrix \({\bf A}_{+}\) obtained by adding \({\bf b}\) as an additional, rightmost column to \({\bf A}\).

  • Definition: An augmented matrix is any matrix obtained from row operations applied in sequence to an initial augmented matrix.
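
Building the initial augmented matrix is straightforward in code. Here is a minimal Java sketch (the method name matches the pseudocode later in this module; the wrapper class is an assumption):

    class AugmentedMatrix {
        // Build the initial augmented matrix [A | b]: b becomes the rightmost column.
        static double[][] computeAugmentedMatrix(double[][] A, double[] b) {
            int rows = A.length, cols = A[0].length;
            double[][] Aplus = new double[rows][cols + 1];
            for (int i = 0; i < rows; i++) {
                for (int j = 0; j < cols; j++)
                    Aplus[i][j] = A[i][j];   // copy A
                Aplus[i][cols] = b[i];       // append b as the last column
            }
            return Aplus;
        }
    }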
 

About augmented matrices:

  • They're defined for convenience in programming and for explaining examples.

  • They are not used as matrices to multiply into anything.

  • Thus, they're more a data structure than a real matrix.
 

Let's use this notation and work through a complete example with both the augmented matrix and the identity: $$\eqb{ 2x_1 & \miss & 3x_2 & \plss & 2x_3 & \eql 5\\ -2x_1 & \plss & x_2 & & & \eql -1\\ x_1 & \plss & x_2 & \miss & x_3 & \eql 0\\ }$$

  • Start with the augmented matrix and the identity side-by-side $$ \left[ \begin{array}{ccc|c} 2 & -3 & 2 & 5\\ -2 & 1 & 0 & -1\\ 1 & 1 & -1 & 0 \end{array} \right] \;\;\;\;\;\;\;\;\;\;\;\; \mat{1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1} $$

  • Apply \({\bf r}_1 \leftarrow \frac{{\bf r}_1}{2} \) to both matrices $$ \left[ \begin{array}{ccc|c} 1 & -1.5 & 1 & 2.5\\ -2 & 1 & 0 & -1\\ 1 & 1 & -1 & 0 \end{array} \right] \;\;\;\;\;\;\;\;\;\;\;\; \mat{0.5 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1} $$

  • Apply \({\bf r}_2 \leftarrow {\bf r}_2 + 2{\bf r}_1\): $$ \left[ \begin{array}{ccc|c} 1 & -1.5 & 1 & 2.5\\ 0 & -2 & 2 & 4\\ 1 & 1 & -1 & 0 \end{array} \right] \;\;\;\;\;\;\;\;\;\;\;\; \mat{0.5 & 0 & 0 \\ 1 & 1 & 0 \\ 0 & 0 & 1} $$

  • Apply \({\bf r}_3 \leftarrow {\bf r}_3 - {\bf r}_1\): $$ \left[ \begin{array}{ccc|c} 1 & -1.5 & 1 & 2.5\\ 0 & -2 & 2 & 4\\ 0 & 2.5 & -2& -2.5 \end{array} \right] \;\;\;\;\;\;\;\;\;\;\;\; \mat{0.5 & 0 & 0 \\ 1 & 1 & 0 \\ -0.5 & 0 & 1} $$

  • Apply \({\bf r}_2 \leftarrow \frac{{\bf r}_2}{-2}\): $$ \left[ \begin{array}{ccc|c} 1 & -1.5 & 1 & 2.5\\ 0 & {\bf 1} & -1 & -2\\ 0 & 2.5 & -2 & -2.5 \end{array} \right] \;\;\;\;\;\;\;\;\;\;\;\; \mat{0.5 & 0 & 0 \\ -0.5 & -0.5 & 0 \\ -0.5 & 0 & 1} $$

  • To set zeroes below the element \({\bf 1}\), apply \({\bf r}_3 \leftarrow {\bf r}_3 - 2.5{\bf r}_2\): $$ \left[ \begin{array}{ccc|c} 1 & -1.5 & 1 & 2.5\\ 0 & 1 & -1 & -2\\ 0 & 0 & 0.5 & 2.5 \end{array} \right] \;\;\;\;\;\;\;\;\;\;\;\; \mat{0.5 & 0 & 0 \\ -0.5 & -0.5 & 0 \\ 0.75 & 1.25 & 1} $$

  • Next, set the coefficient in the third column, third row to be \({\bf 1}\) with \({\bf r}_3 \leftarrow 2{\bf r}_3\): $$ \left[ \begin{array}{ccc|c} 1 & -1.5 & 1 & 2.5\\ 0 & 1 & -1 & -2\\ 0 & 0 & {\bf 1} & 5 \end{array} \right] \;\;\;\;\;\;\;\;\;\;\;\; \mat{0.5 & 0 & 0 \\ -0.5 & -0.5 & 0 \\ 1.5 & 2.5 & 2} $$

  • There are no further rows. Observe that both boldfaced 1's have non-zeroes above them: $$ \left[ \begin{array}{ccc|c} 1 & -1.5 & 1 & 2.5\\ 0 & {\bf 1} & -1 & -2\\ 0 & 0 & {\bf 1} & 5 \end{array} \right] $$ We will use row operations to make these values zero as well.

  • Start with the 1 in row 2, column 2 and apply \({\bf r}_1 \leftarrow {\bf r}_1 + 1.5{\bf r}_2\): $$ \left[ \begin{array}{ccc|c} 1 & 0 & -0.5 & -0.5\\ 0 & {\bf 1} & -1 & -2\\ 0 & 0 & 1 & 5 \end{array} \right] \;\;\;\;\;\;\;\;\;\;\;\; \mat{-0.25 & -0.75 & 0 \\ -0.5 & -0.5 & 0 \\ 1.5 & 2.5 & 2} $$

  • Now examine the entries above the 1 in row 3, column 3: $$ \left[ \begin{array}{ccc|c} 1 & 0 & -0.5 & -0.5\\ 0 & 1 & -1 & -2\\ 0 & 0 & {\bf 1} & 5 \end{array} \right] $$

  • Apply \({\bf r}_1 \leftarrow {\bf r}_1 + 0.5{\bf r}_3\): $$ \left[ \begin{array}{ccc|c} 1 & 0 & 0 & 2\\ 0 & 1 & -1 & -2\\ 0 & 0 & {\bf 1} & 5 \end{array} \right] \;\;\;\;\;\;\;\;\;\;\;\; \mat{0.5 & 0.5 & 1 \\ -0.5 & -0.5 & 0 \\ 1.5 & 2.5 & 2} $$

  • Next, zero out the entry in row 2, column 3 by applying \({\bf r}_2 \leftarrow {\bf r}_2 + {\bf r}_3\): $$ \left[ \begin{array}{ccc|c} 1 & 0 & 0 & 2\\ 0 & 1 & 0 & 3\\ 0 & 0 & {\bf 1} & 5 \end{array} \right] \;\;\;\;\;\;\;\;\;\;\;\; \mat{0.5 & 0.5 & 1 \\ 1 & 2 & 2 \\ 1.5 & 2.5 & 2} $$

  • This completes the process.

  • We are left with the identity matrix plus an augmented column. $$ \left[ \begin{array}{ccc|c} 1 & 0 & 0 & 2\\ 0 & 1 & 0 & 3\\ 0 & 0 & 1 & 5 \end{array} \right] $$ This has the solution, as we can see from breaking apart the augmentation, and rewriting in the form \({\bf Ax} = {\bf b}\): $$ \mat{ 1 & 0 & 0\\ 0 & 1 & 0\\ 0 & 0 & 1} \; \vecthree{x_1}{x_2}{x_3} \eql \vecthree{2}{3}{5} $$ Multiplying, we get the solution: $$ \vecthree{x_1}{x_2}{x_3} \eql \vecthree{2}{3}{5} $$

  • The other matrix to which we applied row operations is more interesting: $$ \mat{0.5 & 0.5 & 1 \\ 1 & 2 & 2 \\ 1.5 & 2.5 & 2} $$ This is the inverse \({\bf A}^{-1}\).

  • Confirm by computing $$ {\bf A}^{-1} {\bf A} \eql \mat{0.5 & 0.5 & 1 \\ 1 & 2 & 2 \\ 1.5 & 2.5 & 2} \mat{2 & -3 & 2 \\ -2 & 1 & 0 \\ 1 & 1 & -1} \eql \mat{1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1} $$ That is, \({\bf A}^{-1} {\bf A} = {\bf I}\), as we would expect.

  • We can also confirm getting the solution by applying the inverse to the right-hand-side of the equations: $$ {\bf A}^{-1} {\bf b} \eql \mat{0.5 & 0.5 & 1 \\ 1 & 2 & 2 \\ 1.5 & 2.5 & 2} \vecthree{5}{-1}{0} \eql \vecthree{2}{3}{5} $$

  • What's really useful, of course, is that once we've computed the inverse, we can apply it to a different right-hand-side:
    • So, if the right-hand-side of the equations changed to: $$\eqb{ 2x_1 & \miss & 3x_2 & \plss & 2x_3 & \eql {\bf 0}\\ -2x_1 & \plss & x_2 & & & \eql {\bf 2}\\ x_1 & \plss & x_2 & \miss & x_3 & \eql {\bf 1}\\ }$$
    • We would then simply compute $$ {\bf A}^{-1} \vecthree{0}{2}{1} \eql \mat{0.5 & 0.5 & 1 \\ 1 & 2 & 2 \\ 1.5 & 2.5 & 2} \vecthree{0}{2}{1} \eql \vecthree{2}{6}{7} $$
    to solve these new equations.
 

To summarize:

  • Got equations? $$\eqb{ 2x_1 & \miss & 3x_2 & \plss & 2x_3 & \eql 5\\ -2x_1 & \plss & x_2 & & & \eql -1\\ x_1 & \plss & x_2 & \miss & x_3 & \eql 0\\ }$$

  • Write 'em in augmented-matrix form alongside \({\bf I}\): $$ \left[ \begin{array}{ccc|c} 2 & -3 & 2 & 5\\ -2 & 1 & 0 & -1\\ 1 & 1 & -1 & 0 \end{array} \right] \;\;\;\;\;\;\;\;\;\;\;\; \mat{1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1} $$

  • Apply row operations to both matrices trying to make the left one the identity matrix (except for the augmented column): $$ \left[ \begin{array}{ccc|c} 1 & 0 & 0 & 2\\ 0 & 1 & 0 & 3\\ 0 & 0 & 1 & 5 \end{array} \right] \;\;\;\;\;\;\;\;\;\;\;\; \mat{0.5 & 0.5 & 1 \\ 1 & 2 & 2 \\ 1.5 & 2.5 & 2} $$

  • We can read off the solution from the left, and have the inverse handy (on the right) for future use.
 


5.3   Pivots, REF, RREF

 

Let's work through a \(4 \times 4\) example:

  • Consider this set of equations: $$ \mat{1 & 2 & 0 & -5\\ -1 & -1 & 1 & 4\\ 3 & 7 & 2 & -14\\ -2 & -2 & 4 & 13} \vecfour{x_1}{x_2}{x_3}{x_4} \eql \vecfour{-5}{6}{-9}{23} $$

  • First, we form the initial augmented matrix: $$ \left[ \begin{array}{cccc|c} 1 & 2 & 0 & -5 & -5\\ -1 & -1 & 1 & 4 & 6\\ 3 & 7 & 2 & -14 & -9\\ -2 & -2 & 4 & 13 & 23 \end{array} \right] $$

  • Now we look at the first element in row 1 and divide to make it \(1\) $$ \left[ \begin{array}{cccc|c} {\bf 1} & 2 & 0 & -5 & -5\\ -1 & -1 & 1 & 4 & 6\\ 3 & 7 & 2 & -14 & -9\\ -2 & -2 & 4 & 13 & 23 \end{array} \right] $$ Clearly, it's \(1\) and so, really, we're dividing the whole row by \(1\).

  • Once it's been made \(1\), the element will be called a pivot:

    Definition: a pivot is a coefficient that's been reduced to \(1\) through division or multiplication before other row ops are performed in that column.

  • Now use the pivot to zero out the elements in its column that are below the pivot:
    • \({\bf r}_2 \leftarrow {\bf r}_2 + {\bf r}_1\) for row 2.
    • \({\bf r}_3 \leftarrow {\bf r}_3 - 3{\bf r}_1\) for row 3.
    • \({\bf r}_4 \leftarrow {\bf r}_4 + 2{\bf r}_1\) for row 4.
    This produces: $$ \left[ \begin{array}{cccc|c} {\bf 1} & 2 & 0 & -5 & -5\\ 0 & 1 & 1 & -1 & 1\\ 0 & 1 & 2 & 1 & 6\\ 0 & 2 & 4 & 3 & 13 \end{array} \right] $$

  • The next pivot is in row 2, column 2: $$ \left[ \begin{array}{cccc|c} 1 & 2 & 0 & -5 & -5\\ 0 & {\bf 1} & 1 & -1 & 1\\ 0 & 1 & 2 & 1 & 6\\ 0 & 2 & 4 & 3 & 13 \end{array} \right] $$ This too doesn't need to be divided.

  • We now need to create zeroes below the pivot in the same column.
    • \({\bf r}_3 \leftarrow {\bf r}_3 - {\bf r}_2\) for row 3.
    • \({\bf r}_4 \leftarrow {\bf r}_4 - 2{\bf r}_2\) for row 4.
    Which produces $$ \left[ \begin{array}{cccc|c} 1 & 2 & 0 & -5 & -5\\ 0 & {\bf 1} & 1 & -1 & 1\\ 0 & 0 & 1 & 2 & 5\\ 0 & 0 & 2 & 5 & 11 \end{array} \right] $$

  • The next pivot is in row 3, column 3: $$ \left[ \begin{array}{cccc|c} 1 & 2 & 0 & -5 & -5\\ 0 & 1 & 1 & -1 & 1\\ 0 & 0 & {\bf 1} & 2 & 5\\ 0 & 0 & 2 & 5 & 11 \end{array} \right] $$ Again, we got lucky and did not have to divide through.

  • We need the single row op \({\bf r}_4 \leftarrow {\bf r}_4 - 2{\bf r}_3\) for row 4, which produces $$ \left[ \begin{array}{cccc|c} 1 & 2 & 0 & -5 & -5\\ 0 & 1 & 1 & -1 & 1\\ 0 & 0 & {\bf 1} & 2 & 5\\ 0 & 0 & 0 & 1 & 1 \end{array} \right] $$

  • Finally, the 4th column already has a 1 in the 4th row: $$ \left[ \begin{array}{cccc|c} 1 & 2 & 0 & -5 & -5\\ 0 & 1 & 1 & -1 & 1\\ 0 & 0 & 1 & 2 & 5\\ 0 & 0 & 0 & {\bf 1} & 1 \end{array} \right] $$

  • In this form, the matrix has the following properties:
    • All pivots are 1.
    • All elements below a pivot in its column are zero.
    • One can work backwards to solve the equations.
    This form, an intermediate step before the final identity-like form, is called the Row Echelon Form (REF).

  • We'll now continue, and produce zeroes above each pivot.

  • The first pivot has nothing above: $$ \left[ \begin{array}{cccc|c} {\bf 1} & 2 & 0 & -5 & -5\\ 0 & 1 & 1 & -1 & 1\\ 0 & 0 & 1 & 2 & 5\\ 0 & 0 & 0 & 1 & 1 \end{array} \right] $$

  • Next, look at the second pivot: $$ \left[ \begin{array}{cccc|c} 1 & 2 & 0 & -5 & -5\\ 0 & {\bf 1} & 1 & -1 & 1\\ 0 & 0 & 1 & 2 & 5\\ 0 & 0 & 0 & 1 & 1 \end{array} \right] $$ The row op \({\bf r}_1 \leftarrow {\bf r}_1 - 2{\bf r}_2\) produces $$ \left[ \begin{array}{cccc|c} 1 & 0 & -2 & -3 & -7\\ 0 & {\bf 1} & 1 & -1 & 1\\ 0 & 0 & 1 & 2 & 5\\ 0 & 0 & 0 & 1 & 1 \end{array} \right] $$

  • The next pivot is in row 3, column 3: $$ \left[ \begin{array}{cccc|c} 1 & 0 & -2 & -3 & -7\\ 0 & 1 & 1 & -1 & 1\\ 0 & 0 & {\bf 1} & 2 & 5\\ 0 & 0 & 0 & 1 & 1 \end{array} \right] $$ The row ops \({\bf r}_1 \leftarrow {\bf r}_1 + 2{\bf r}_3\) and \({\bf r}_2 \leftarrow {\bf r}_2 - {\bf r}_3\) produce $$ \left[ \begin{array}{cccc|c} 1 & 0 & 0 & 1 & 3\\ 0 & 1 & 0 & -3 & -4\\ 0 & 0 & {\bf 1} & 2 & 5\\ 0 & 0 & 0 & 1 & 1 \end{array} \right] $$

  • Finally, applying \({\bf r}_1 \leftarrow {\bf r}_1 - {\bf r}_4\), \({\bf r}_2 \leftarrow {\bf r}_2 + 3{\bf r}_4\) and \({\bf r}_3 \leftarrow {\bf r}_3 - 2{\bf r}_4\) produces $$ \left[ \begin{array}{cccc|c} 1 & 0 & 0 & 0 & 2\\ 0 & 1 & 0 & 0 & -1\\ 0 & 0 & 1 & 0 & 3\\ 0 & 0 & 0 & {\bf 1} & 1 \end{array} \right] $$

  • The augmented column now contains the solution \((x_1,x_2,x_3,x_4) = (2,-1,3,1)\).

  • This final form of the augmented matrix is called the Row Reduced Echelon Form (RREF) of the augmented matrix.

  • If we remove the augmented (last) column of the augmented RREF, we will have the RREF of the original (non-augmented) matrix.

  • For emphasis, let's identify all the pivots: $$ \left[ \begin{array}{cccc|c} {\bf 1} & 0 & 0 & 0 & 2\\ 0 & {\bf 1} & 0 & 0 & -1\\ 0 & 0 & {\bf 1} & 0 & 3\\ 0 & 0 & 0 & {\bf 1} & 1 \end{array} \right] $$ Ignoring the augmented column, every row has exactly one pivot, as does every column.

  • In this particular \(4\times 4\) RREF, there are exactly \(4\) pivots.
 

In-Class Exercise 18: Apply the same operations in the same order to the \(4\times 4\) identity matrix to compute the inverse of the coefficient matrix. Then enter the inverse matrix in FourVariableExample.java and check that \({\bf A}^{-1} {\bf A} = {\bf I}\).
 

 

In-Class Exercise 19: Apply the method above, step-by-step, to solve $$ \mat{2 & 1 & 1 & 3\\ 1 & -1 & 1 & 0\\ 0 & 0 & 2 & 1\\ 1 & 0 & 1 & 0} \vecfour{x_1}{x_2}{x_3}{x_4} \eql \vecfour{6}{-2}{-2}{1} $$ Enter your solution in FourVariableExample2.java and check that \({\bf A x} = {\bf b}\). How many pivots were identified?
 

Let's summarize the approach through pseudocode:

  1.   A+ = computeAugmentedMatrix (A, b)
  2.   A-1  = identity matrix of size n
  3.   for p=1 to n
          // First, make the coefficient of the pivot equal to 1
  4.      α = A+[p][p]
  5.      for j=1 to n+1              // n+1: include the augmented column of A+
  6.         A+[p][j] = A+[p][j] / α
  7.         if j ≤ n then A-1[p][j] = A-1[p][j] / α
  8.      endfor
 
          // Zero out the coefficients below the pivot    
  9.      for r=p+1 to n
  10.         apply the appropriate row-operation to row r of A+
  11.         do the same for A-1
  12.     endfor
  13.  endfor
  
       // This produces the REF in A+, but nothing useful (yet) in A-1
       // Continue to the RREF ...

  14.  for p=1 to n
          // Zero out the coefficients above the pivot
  15.     for r=p-1 downto 1
  16.         apply the appropriate row-operation to row r of A+
  17.         do the same for A-1
  18.     endfor
  19.  endfor
  
  20.  x = last column of A+
  21.  return x, A-1
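
Here is a minimal Java rendering of the pseudocode above: a sketch under the same assumption the pseudocode makes, namely that every pivot encountered is non-zero (the refined algorithm in Section 5.5 removes that assumption).

    public class SimpleGaussJordan {
        // Solve Ax = b for a square A, assuming every pivot is non-zero.
        // Returns the solution x; on return, Ainv holds the inverse of A.
        static double[] solve(double[][] A, double[] b, double[][] Ainv) {
            int n = A.length;
            double[][] Aplus = new double[n][n + 1];     // augmented matrix [A | b]
            for (int i = 0; i < n; i++) {
                System.arraycopy(A[i], 0, Aplus[i], 0, n);
                Aplus[i][n] = b[i];
                Ainv[i][i] = 1;                          // start Ainv as the identity
            }
            for (int p = 0; p < n; p++) {
                double alpha = Aplus[p][p];              // make the pivot equal to 1
                for (int j = 0; j <= n; j++) Aplus[p][j] /= alpha;
                for (int j = 0; j < n; j++) Ainv[p][j] /= alpha;
                for (int r = p + 1; r < n; r++)          // zero out below the pivot
                    rowOp(Aplus, Ainv, r, p);
            }
            for (int p = n - 1; p >= 0; p--)             // continue to the RREF:
                for (int r = p - 1; r >= 0; r--)         // zero out above the pivot
                    rowOp(Aplus, Ainv, r, p);
            double[] x = new double[n];
            for (int i = 0; i < n; i++) x[i] = Aplus[i][n];  // last column of A+
            return x;
        }

        // Apply r_r <- r_r - (multiplier) r_p to both matrices, where the
        // multiplier is chosen to zero out the entry in row r, column p.
        static void rowOp(double[][] Aplus, double[][] Ainv, int r, int p) {
            double factor = Aplus[r][p];
            for (int j = 0; j < Aplus[r].length; j++) Aplus[r][j] -= factor * Aplus[p][j];
            for (int j = 0; j < Ainv[r].length; j++) Ainv[r][j] -= factor * Ainv[p][j];
        }

        public static void main(String[] args) {
            double[][] A = {{2, -3, 2}, {-2, 1, 0}, {1, 1, -1}};
            double[] b = {5, -1, 0};
            double[][] Ainv = new double[3][3];
            System.out.println(java.util.Arrays.toString(solve(A, b, Ainv)));  // [2.0, 3.0, 5.0]
            System.out.println(java.util.Arrays.deepToString(Ainv));           // the inverse from Section 5.2
        }
    }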
 

A question arises:

  • Why compute the inverse at all, when we are already computing the solution \({\bf x}\)?

  • In many applications, it turns out, the coefficients remain the same while the right-hand-side changes.

  • Then, we compute the inverse just once, and apply it to each different right-hand-side: \({\bf x} = {\bf A}^{-1} {\bf b}\).
 

Thus, it appears we have an algorithm to solve \({\bf A x} = {\bf b}\).

Unfortunately, we need to refine the algorithm to handle a few cases:

  • When a pivot happens to be zero.

  • When no solution exists.

  • When multiple solutions exist.
 


5.4   Handling special cases: no-solution, multiple-solutions

 

Row swaps:

  • Suppose, after row operations for the first column, we end up with $$ \left[ \begin{array}{ccc|c} {\bf 1} & 2 & -1 & 3\\ 0 & 0 & 2 & 10\\ 0 & 3 & -1 & 4 \end{array} \right] $$ Here, the pivot is 1, and the column below has zeroes, as desired.

  • The next pivot should be in row 2, column 2: $$ \left[ \begin{array}{ccc|c} 1 & 2 & -1 & 3\\ 0 & {\bf 0} & 2 & 10\\ 0 & 3 & -1 & 4 \end{array} \right] $$ But this entry is zero: no multiple of row 2 can make it 1 or remove the 3 below it.

  • In this situation, we look further down the column and swap up the nearest row below that has a non-zero element in that column: $$ \left[ \begin{array}{ccc|c} 1 & 2 & -1 & 3\\ 0 & {\bf 3} & -1 & 4\\ 0 & 0 & 2 & 10 \end{array} \right] $$

  • As we've seen, swapping rows has no effect on the solution.

  • Thus, we're going to include the possibility of looking further below in the next column to find the next pivot.

  • What happens if all the elements below (in the next column) are zero?
          \(\rhd\) That column will have no pivot.
 

The case when a column has no pivot:

  • Consider this example: $$ \left[ \begin{array}{ccc|c} 1 & 2 & -1 & 3\\ 0 & {\bf 0} & 2 & 10\\ 0 & {\bf 0} & -1 & 4 \end{array} \right] $$

  • The first column's pivot has been found.

  • The second column has no pivot because, below row 1, all the elements are zero.

  • In this case, there will be no pivot in the 2nd column. We move on to the next column: $$ \left[ \begin{array}{ccc|c} 1 & 2 & -1 & 3\\ 0 & 0 & {\bf 2} & 10\\ 0 & 0 & -1 & 4 \end{array} \right] $$

  • We still continue and complete the RREF, allowing that some columns will be skipped and won't have pivots.
 

What happens when there is no solution:

  • Suppose, after row operations for the first column, we end up with $$ \left[ \begin{array}{ccc|c} {\bf 1} & 2 & -1 & 3\\ 0 & 1 & 2 & 13\\ 0 & 2 & 4 & 20 \end{array} \right] $$ Here, the pivot is 1, and the column below has zeroes, as desired.

  • The next pivot is in row 2, column 2: $$ \left[ \begin{array}{ccc|c} 1 & 2 & -1 & 3\\ 0 & {\bf 1} & 2 & 13\\ 0 & 2 & 4 & 20 \end{array} \right] $$

  • Applying \({\bf r}_3 \leftarrow {\bf r}_3 - 2{\bf r}_2\) we get $$ \left[ \begin{array}{ccc|c} 1 & 2 & -1 & 3\\ 0 & {\bf 1} & 2 & 13\\ 0 & 0 & 0 & -6 \end{array} \right] $$ The last row is obviously a contradiction. (Write the equation and see.)

  • This is because no solution is possible for this system.
     

    In-Class Exercise 20: How do we know this? Could it be possible that some row swaps applied first could result in a normal sequence of row operations that produce a solution?
     

  • Notice that even though this problem has no solutions, there are two pivots: $$ \left[ \begin{array}{ccc|c} {\bf 1} & 2 & -1 & 3\\ 0 & {\bf 1} & 2 & 13\\ 0 & 0 & 0 & -6 \end{array} \right] $$
          \(\rhd\) This will be significant later when we understand linear dependence.
 

In-Class Exercise 21: Apply row operations to $$ \left[ \begin{array}{ccc|c} 2 & -3 & 2 & 5\\ -3 & 7 & -5 & -9\\ 1 & 1 & -1 & 0 \end{array} \right] $$ Identify the pivots.
 

Multiple solutions:

  • Suppose we end up with $$ \left[ \begin{array}{ccc|c} 1 & 2 & -1 & 3\\ 0 & {\bf 1} & 2 & 13\\ 0 & 2 & 4 & 26 \end{array} \right] $$

  • Now applying \({\bf r}_3 \leftarrow {\bf r}_3 - 2{\bf r}_2\) we get $$ \left[ \begin{array}{ccc|c} 1 & 2 & -1 & 3\\ 0 & {\bf 1} & 2 & 13\\ 0 & 0 & 0 & 0 \end{array} \right] $$ We've now run out of rows.

  • Nonetheless, continuing, let's zero out above the pivot by applying \({\bf r}_1 \leftarrow {\bf r}_1 - 2{\bf r}_2\) to get $$ \left[ \begin{array}{ccc|c} 1 & 0 & -5 & -23\\ 0 & {\bf 1} & 2 & 13\\ 0 & 0 & 0 & 0 \end{array} \right] $$

  • To get some insight, let's write out the equations: $$\eqb{ {\bf x_1} & & & \miss & 5x_3 & \eql -23\\ & \plss & {\bf x_2} & \plss & 2x_3 & \eql 13\\ }$$ Here, we've emphasized the variables corresponding to the two pivots.

  • Observe that setting \(x_3=0\) gives a solution: \((x_1,x_2,x_3) = (-23,13,0)\).

  • So does \(x_3=1\), which gives a different solution: \((x_1,x_2,x_3) = (-18,11,1)\).

  • Any value of \(x_3\) results in a solution.

  • Terminology:
    • Variables like \(x_3\) that were not pivots are called free variables.
    • Variables corresponding to pivots are called basic variables.

  • In a system with multiple solutions, each setting of the free variables will give a different solution for the basic variables.
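
In this example we can write down every solution at once: setting the free variable \(x_3 = t\) and reading off the two pivot equations gives $$ (x_1, x_2, x_3) \eql (-23 + 5t, \; 13 - 2t, \; t) \;\;\;\;\;\; \mbox{for any number } t. $$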
 

In-Class Exercise 22: Apply row operations to $$ \left[ \begin{array}{ccc|c} 2 & -3 & 2 & 5\\ -3 & 7 & -5 & -10\\ 1 & 1 & -1 & 0 \end{array} \right] $$ Identify the pivots and one possible solution.
 

A couple more special cases:

  • As the cases above make clear, the RREF is a tool for analyzing a matrix.

  • Thus, we will also be interested in analyzing non-square matrices, with \(m\) rows and \(n\) columns.

  • For example, consider this \(m > n\) example: $$ \mat{2 & 0\\ 0 & 3\\ 1 & 1\\ 3 & 0} \vectwo{x_1}{x_2} \eql \vecfour{4}{9}{5}{6} $$
    • The RREF turns out to be $$ \left[ \begin{array}{cc|c} {\bf 1} & 0 & 2\\ 0 & {\bf 1} & 3\\ 0 & 0 & 0\\ 0 & 0 & 0 \end{array} \right] $$
    • There are two pivots, and because the other rows have no contradictions, there is a solution \((x_1,x_2) = (2,3)\).
     

    In-Class Exercise 23: Work through the steps to compute the RREF in the example above.
     

  • Consider this variation: $$ \mat{2 & 0\\ 0 & 3\\ 1 & 1\\ 3 & 0} \vectwo{x_1}{x_2} \eql \vecfour{4}{9}{5}{4} $$ Here, the RREF turns out to be $$ \left[ \begin{array}{cc|c} {\bf 1} & 0 & 2\\ 0 & {\bf 1} & 3\\ 0 & 0 & 0\\ 0 & 0 & -2 \end{array} \right] $$ Which does contain a contradiction row (the last one), and so, there is no solution.
 

The number of pivots of a matrix seems like a fundamental property of the matrix, and it is:

  • Definition: the number of pivots in a matrix is called the rank of the matrix.
    For now, we'll just state the definition. The next module will show how critical this is.
 


Summary of the RREF procedure:

  • If a pivot is zero, try to swap with a row below that has a non-zero element in the same column.

  • If the next column does not have any candidate pivot, look at columns further to the right.
          \(\rhd\) This means the skipped column will not have a pivot

  • If any row has all zeroes except for the last column, we have a contradiction
          \(\rhd\) There is no solution.

  • If there is no contradiction, but a row has all zeroes, there might be multiple solutions.
    • The non-pivot variables are the free variables; setting them (to zero, for example) yields a particular solution.
    • The pivot variables are the basic variables, whose solution depends on setting the free variables.

  • The columns with pivots will turn out to be special later
          \(\rhd\) They are the linearly independent columns.
 

In-Class Exercise 24: Why is it true that a non-pivot row has all zeroes except possibly in the last (augmented) column?
 

 


5.5   An algorithm

 

Let's now refine our RREF-based algorithm to handle the special cases:

Algorithm: computeRREF (A, b)
Input:: matrix A, vector b

  1.   A+ = computeAugmentedMatrix (A, b)
  2.   A-1  = identity matrix of size n
  3.   currentRow = 1, currentCol = 1

       // Now search for the next column that has a non-zero element
       // below the currentRow
  4.   (r,c) = findNextPivot (currentRow, currentCol)

  5.   while r > 0 and c > 0

          // Swap into currentRow if needed
  6.      if r > currentRow
  7.         swap (r, currentRow)
  8.      endif

  9.      currentCol = c

          // The next pivot is now at (currentRow, currentCol)
  10.     isPivotColumn[currentCol] = true
  11.     numPivots = numPivots + 1

          // Make the coefficient of the pivot equal to 1
  12.     α = A+[currentRow][currentCol]
  13.     for j=1 to n+1             // n+1: include the augmented column of A+
  14.        A+[currentRow][j] = A+[currentRow][j] / α
  15.        if j ≤ n then A-1[currentRow][j] = A-1[currentRow][j] / α
  16.     endfor
 
          // Zero out the coefficients below the pivot    
  17.     for R=currentRow+1 to n
  18.         apply the appropriate row-operation to row R of A+
  19.         do the same for A-1
  20.     endfor

          // All pivots will be in successive rows from the top, but
          // need not occur in successive columns.
  21.     currentRow = currentRow + 1    
  22.     currentCol = currentCol + 1
  23.     (r,c) = findNextPivot (currentRow, currentCol)

  24.  endwhile
  
       // This produces the REF in A+, but nothing useful (yet) in A-1
       // Continue to the RREF ...

  25.  for k=1 to numPivots
          // Zero out the coefficients above the pivot
  26.     (rp,cp) = findPivot (k)
  27.     for r=rp-1 downto 1
  28.         apply the appropriate row-operation to row r of A+
  29.         do the same for A-1
  30.     endfor
  31.  endfor
  
       // Now analyze RREF
  32.  if existsContradictionRow ()
  33.     return no solution
  34.  else if existsAllZeroRows ()
  35.     // ... handle multiple solutions ...
  36.  else
  37.     x = last column of A+
  38.     return x, A-1
  39.  endif
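
Two helpers above are new relative to the earlier pseudocode. Here is a hypothetical Java sketch of them (the names follow the pseudocode; the wrapper class and the floating-point tolerance are assumptions):

    class RrefHelpers {
        static final double EPS = 1e-9;   // treat tiny values as zero (floating point)

        // Scan columns from startCol rightward; in each column, look at rows
        // startRow and below for a non-zero entry. Returns {row, col}, or
        // {-1, -1} if no further pivot exists.
        static int[] findNextPivot(double[][] Aplus, int startRow, int startCol) {
            int cols = Aplus[0].length - 1;            // exclude the augmented column
            for (int c = startCol; c < cols; c++)
                for (int r = startRow; r < Aplus.length; r++)
                    if (Math.abs(Aplus[r][c]) > EPS)
                        return new int[] {r, c};
            return new int[] {-1, -1};
        }

        // A contradiction row is all zeroes except in the augmented column.
        static boolean existsContradictionRow(double[][] Aplus) {
            int cols = Aplus[0].length - 1;
            for (double[] row : Aplus) {
                boolean allZero = true;
                for (int j = 0; j < cols; j++)
                    if (Math.abs(row[j]) > EPS) { allZero = false; break; }
                if (allZero && Math.abs(row[cols]) > EPS) return true;
            }
            return false;
        }
    }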
 

Some terminology and further definitions:

  • The term Gaussian elimination is used to describe the combination of:
    • Applying row ops to find the REF (steps 1-24 above).
    • Then, using backwards substitution to actually produce a solution.

  • The term Gauss-Jordan elimination is used to describe the full sequence of steps leading to the RREF.

  • Now for an important observation about computing the RREF:
    • None of the row operation decisions depended on the last, augmented column.
    • Thus, we would apply the same row ops no matter what the value of \({\bf b}\).
    • What this means: we could compute the RREF of the matrix \({\bf A}\) without augmentation.
    • Of course, we would not have a solution without applying the same operations to the augmented column.

  • For some of the theoretical arguments, it will be useful to consider the RREF of the matrix without augmentation.

  • Notation:
    • \({\bf A}^\prime = \mbox{RREF}({\bf A})\), the RREF of the non-augmented \({\bf A}\).
    • \({\bf A}_{+}^\prime = \mbox{RREF}({\bf A}_{+})\), the RREF of the augmented matrix \({\bf A}_{+}\).

  • Recall, we used the notation \({\bf A}_{+}\) for the augmented matrix.

  • All of the theoretical arguments below will refer to the non-augmented RREF.
          \(\rhd\) We will explicitly use the term augmented when needed.

  • Definition: A matrix, following Gauss-Jordan elimination without augmentation, is in Row Reduced Echelon Form (RREF) if:
    1. No pivot shares a row or column with another pivot.
    2. All entries below and above a pivot are zero.
    3. Except for the first one, a pivot's column is to the right of the previous row's pivot's column.
    4. All the pivot rows are bunched together at the top.
    5. The remaining non-pivot rows are all zero.

  • An example: $$ \left[ \begin{array}{cccccc} {\bf 1} & * & 0 & * & * & 0\\ 0 & 0 & {\bf 1} & * & * & 0\\ 0 & 0 & 0 & 0 & 0 & {\bf 1}\\ 0 & 0 & 0 & 0 & 0 & 0\\ 0 & 0 & 0 & 0 & 0 & 0\\ 0 & 0 & 0 & 0 & 0 & 0\\ \end{array} \right] $$ Note:
    • Three pivots above are in columns 1, 3, and 6.
    • All other elements in pivot columns are zero.
    • Because we search for pivots in the next row, skipping past zero elements, all elements to the left of a pivot are zero (otherwise, we'd have picked one of them as pivot).

  • Definition: An \(n\times n\) RREF matrix has full-rank if every row has a pivot.
          \(\rhd\) A full-rank RREF is the identity matrix, i.e., \({\bf A}^\prime = {\bf I}\).

  • Definition: We will use the term partial-rank for a square RREF matrix that is not full-rank.
          \(\rhd\) A partial-rank RREF has at least one all-zero row.

  • Finally, some linear algebra jargon:

    Definition: a square matrix \({\bf A}\) that has an inverse \({\bf A}^{-1}\) where \({\bf A}^{-1} {\bf A} = {\bf I} = {\bf A}{\bf A}^{-1}\) is called a nonsingular matrix.

  • Thus, a singular matrix has no inverse.

  • Sometimes we say a matrix is invertible if it has an inverse.
 


5.6   Why it works: some theory for \(n \times n\) matrices

 

How do we know (prove) that the above approach will work for every \({\bf Ax}={\bf b}\) problem?
 

What we'd like to prove, at the very least:

  • The RREF algorithm produces a matrix that satisfies the RREF definition.

  • When the RREF completes with a pivot on every row, the same row operations applied to \({\bf I}\) produce the inverse.

  • When the RREF fails to be full-rank (that is, there's at least one all-zero row), the inverse does not exist.

  • Can \({\bf A}\) have more than one inverse, and if so, which one is computed by the procedure?
 

It turns out, there is a nice relationship between the existence of an inverse and uniqueness of solution:

  • \({\bf x}={\bf A}^{-1}{\bf b}\) is the unique solution when the inverse exists.

  • If a unique solution exists, so does the inverse.

  • The inverse is itself unique (so, we say "the" inverse instead of "an" inverse).
 

Let's do the easy ones first:

  • Proposition 5.1: The RREF algorithm produces a matrix that satisfies the RREF definition.
    Proof: There are five aspects to the definition, as we've seen:
    1. No pivot shares a row or column with another pivot.
    2. All entries below and above a pivot are zero.
    3. Except for the first one, a pivot's column is to the right of the previous row's pivot's column.
    4. All the pivot rows are bunched together at the top.
    5. The remaining non-pivot rows are all zero.
    Clearly, just by the way the algorithm works, none of the pivots share a row or column, and every successive pivot occurs to the right of the previous one. By row swapping, all the pivot rows are bunched at the top. All that remains is to prove that the non-pivot rows have only zeroes. Suppose a non-pivot row has a non-pivot element \(c \neq 0\) in column \(k\), as in this example: $$ \left[ \begin{array}{cccccc} {\bf 1} & * & 0 & * & * & 0\\ 0 & 0 & {\bf 1} & * & * & 0\\ 0 & 0 & 0 & 0 & 0 & {\bf 1}\\ 0 & 0 & 0 & 0 & d & 0\\ 0 & 0 & 0 & 0 & 0 & 0\\ 0 & 0 & 0 & 0 & c & 0\\ \end{array} \right] $$ When it was column \(k\)'s turn to look for a pivot, element \(c\) was ignored as a pivot. This means there was a pivot \(d\) above it. But that means, the pivot \(d\) would have eliminated \(c\) through row reduction. Therefore we have a contradiction (to the purported existence of \(c\)).    \(\Box\)

  • Proposition 5.2: When the RREF completes with a pivot on every row, the same row operations applied to \({\bf I}\) produce the inverse.
    Proof:
    • We already proved this, but let's repeat for clarity.
    • If the RREF completes with a full set of pivots, this means there were row operations and corresponding row-operation matrices \({\bf R}_1, {\bf R}_2, \ldots, {\bf R}_k\) such that $$ {\bf R}_k {\bf R}_{k-1} \cdots {\bf R}_2 {\bf R}_1 {\bf A} \eql {\bf I} $$
    • Let $$ {\bf R} \defn {\bf R}_k {\bf R}_{k-1} \cdots {\bf R}_2 {\bf R}_1 $$

    • Which means $$ {\bf R} {\bf A} \eql {\bf I} $$ And so, \({\bf R} \eql {\bf A}^{-1}\). Technically, since at this juncture we don't yet know that the inverse is unique, we should say that \({\bf R}\) is an inverse.

    • Next multiply each side by the identity to see that \( {\bf R} {\bf I} \eql {\bf A}^{-1} \)    \(\Box\)

  • Proposition 5.3: \({\bf x}={\bf A}^{-1}{\bf b}\) is the unique solution when the inverse exists.
    Proof: If there are two solutions \({\bf Ax}={\bf b}\) and \({\bf Ay}={\bf b}\), then because the inverse exists (as assumed), we'll multiply both sides by the inverse: \({\bf A}^{-1}{\bf Ax}={\bf A}^{-1}{\bf b}\) and \({\bf A}^{-1}{\bf Ay}={\bf A}^{-1}{\bf b}\). Which means \({\bf I x}={\bf A}^{-1}{\bf b}\) and \({\bf I y}={\bf A}^{-1}{\bf b}\). Or both \({\bf x}\) and \({\bf y}\) are equal (to \({\bf A}^{-1}{\bf b}\)).    \(\Box\)

  • Proposition 5.4: The inverse is unique.
    Proof: Suppose there are two matrices \({\bf B}\) and \({\bf C}\) such that \({\bf BA} = {\bf I}\) and \({\bf AC} = {\bf I}\). Then, $$\eqb{ {\bf B} & \eql & {\bf B I}\\ & \eql & {\bf B} ({\bf AC}) \\ & \eql & ({\bf BA}) {\bf C}\\ & \eql & {\bf I} {\bf C}\\ & \eql & {\bf C}\\ }$$ Notice that we replaced \({\bf I}\) with \({\bf AC}\) above, which assumes that \({\bf C}\) multiplying on the right is an inverse. If our definition of inverse was only a left inverse, we'd need an alternative proof.    \(\Box\)
 

Now for the harder ones.
 

It will help to consider the special case \({\bf b}={\bf 0}\):

  • Recall: \({\bf 0}\) is the zero vector: \({\bf 0} = (0,0,\ldots,0)\)

  • The equation \({\bf A x} = {\bf 0}\) is of special importance in the theory.

  • First, observe that \({\bf A x} = {\bf 0}\) always has at least one solution, \({\bf x} = {\bf 0}\), for any matrix \({\bf A}\).

  • Proposition 5.5: Consider the equation \({\bf A}^\prime {\bf x} = {\bf 0}\) where \({\bf A}^\prime\) is an RREF matrix. Then:
    1. if \({\bf A}^\prime\) is full-rank, \({\bf x}={\bf 0}\) is the only solution.
    2. if \({\bf A}^\prime\) is not full-rank, there is at least one non-zero solution \({\bf x} \neq {\bf 0}\)
 

In-Class Exercise 25: First, create example \(3\times 3\) RREFs (just the final RREFs, without the starting equations) for each of the two cases. Show how Proposition 5.5 is true for your example RREFs. Then prove Proposition 5.5.
 


We'll now prove the remaining results:

  • Proposition 5.6: When the RREF fails to be full-rank because of zero rows, the inverse does not exist.
    Proof: Suppose, to the contrary, that the inverse \({\bf A}^{-1}\) exists. Then consider \({\bf Ax} = {\bf 0}\). $$\eqb{ \implies & {\bf A}^{-1} {\bf Ax} & \eql & {\bf A}^{-1} {\bf 0}\\ \implies & {\bf I x} & \eql & {\bf 0}\\ \implies & {\bf x} & \eql & {\bf 0}\\ }$$ Thus, if \({\bf x}\neq {\bf 0}\) is a solution, the inverse does not exist. But proposition 5.5 showed that the only time \({\bf x}\neq {\bf 0}\) is when the RREF is not full-rank. Thus, if it's not full-rank, the inverse does not exist.    \(\Box\)

  • The first part of Proposition 5.6 can be strengthened and will shortly be useful.
    • The proof of 5.6 showed that if \({\bf A}^{-1}\) exists then \({\bf Ax}={\bf 0}\) has the unique solution \({\bf x}={\bf 0}\)
    • The other direction turns out to be true as well: if \({\bf Ax}={\bf 0}\) has the unique solution \({\bf x}={\bf 0}\) then \({\bf A}^{-1}\) exists.

  • Proposition 5.7: The inverse \({\bf A}^{-1}\) exists if and only if \({\bf Ax}={\bf 0}\) has the unique solution \({\bf x}={\bf 0}\).
    Proof: Proposition 5.6 proved the forward direction. Let's now take as a given that \({\bf Ax}={\bf 0}\) has the unique solution \({\bf x}={\bf 0}\). The RREF process applied to \({\bf Ax}={\bf 0}\) results in an RREF \({\bf A}^\prime\) with \({\bf 0}\) in the augmented column because no row reduction changes an all-zero column. If \({\bf A}^\prime\) were not full-rank, then by Proposition 5.5, there would be a non-zero solution, so \({\bf A}^\prime\) is full-rank. Then, because \({\bf A}^\prime\) is full-rank, by Proposition 5.2 above, the inverse exists.    \(\Box\)

  • Proposition 5.8: If \({\bf A}\) does have an inverse, the procedure always finds it.
    Proof: What we need to worry about is the case where there is an inverse but somehow the algorithm finds an RREF that's not full-rank. But Proposition 5.6 above showed that if the RREF is not full-rank, the inverse does not exist. We also showed that the algorithm always produces an RREF (Proposition 5.1), which means that the only RREF produced when an inverse exists is the full-rank one.    \(\Box\)

  • Proposition 5.9 If a unique solution to \({\bf Ax}={\bf b}\) exists, the inverse \({\bf A}^{-1}\) exists.
    Proof: Suppose the unique solution to \({\bf Ax}={\bf b}\) is \({\bf x}^\prime\). From Proposition 5.7, if the inverse does not exist, then we know that \({\bf Ax}={\bf 0}\) has a non-zero solution \({\bf y}\neq {\bf 0}\). Let \({\bf z} = {\bf x}^\prime + {\bf y}\). Because \({\bf y}\neq {\bf 0}\), it must be that \({\bf z} \neq {\bf x}^\prime\). Now, $$\eqb{ {\bf A z} & \eql & {\bf A} ( {\bf x}^\prime + {\bf y} ) \\ & \eql & {\bf A} {\bf x}^\prime + {\bf A}{\bf y} \\ & \eql & {\bf b} + {\bf A}{\bf y} \\ & \eql & {\bf b} + {\bf 0} \\ & \eql & {\bf b} }$$ Which means that \({\bf z}\) is a different (from the supposed unique) solution to \({\bf Ax}={\bf b}\), a contradiction. The contradiction followed from assuming the inverse did not exist, which means that it does.    \(\Box\)
 


There's a neat implication of the above results:

  • All of the following are equivalent (each one implies any of the others):
    • \({\bf A}^{-1}\) exists.
    • \({\bf Ax}={\bf b}\) has a unique solution.
    • \({\bf Ax}={\bf 0}\) has the unique solution \({\bf x}={\bf 0}\).
    • Each row and column of the RREF has exactly one pivot, with all other entries zero
            \(\rhd\) That is, \(RREF({\bf A}) = {\bf I}\).
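These equivalences can be checked mechanically. Here's a small sketch using Python's sympy library (the two matrices are our own examples, one invertible and one singular): rref() computes the RREF, and nullspace() returns a basis for the solutions of \({\bf Ax}={\bf 0}\).

    from sympy import Matrix, eye

    A = Matrix([[1, 3], [4, 2]])    # invertible
    S = Matrix([[1, 2], [2, 4]])    # singular: row 2 is twice row 1

    for M in (A, S):
        R, pivot_cols = M.rref()
        print(R == eye(2), M.nullspace())

    # Output:
    #   True []                       RREF = I, and only the zero solution
    #   False [Matrix([[-2], [1]])]   RREF != I, and a non-zero solution exists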
 


Finally, some meta-level observations about the proofs above:

  • One wonders why the proofs seemed so roundabout, and why solutions to \({\bf Ax}={\bf 0}\) were important.

  • There are a couple of key ideas.

  • First, the uniqueness of the solution to the RREF equation \({\bf A^\prime x}={\bf 0}\) (in Proposition 5.5) is reasoned directly from the structure of the RREF.

  • Because row ops don't change the solution, this means the same result applies to \({\bf A x}={\bf 0}\).

  • Also, uniqueness tells us whether the inverse exists: uniqueness implies \(RREF({\bf A})={\bf I}\), which in turn means the inverse exists.

  • Lastly, uniqueness of the \({\bf A x}={\bf 0}\) solution translates into uniqueness of \({\bf A x}={\bf b}\), via the algebraic argument in Proposition 5.9.
 


5.7   Non-square matrices

 

Intuition suggests the following:

  • If there are too few equations (more columns than rows), we are likely to have multiple solutions.

  • If there are too many equations (more rows than columns), it's unlikely we'll find a solution that satisfies all of them.

This is generally the case but there are exceptions:
  • We saw earlier that $$ \mat{2 & 0\\ 0 & 3\\ 1 & 1\\ 3 & 0} \vectwo{x_1}{x_2} \eql \vecfour{4}{9}{5}{6} $$ has a solution \((x_1,x_2)=(2,3)\).

  • Observe that the 3rd and 4th rows are linear combinations of the first two: row 3 \(= \frac{1}{2}\)(row 1) \(+\; \frac{1}{3}\)(row 2), and row 4 \(= \frac{3}{2}\)(row 1), with the same relations holding on the right-hand sides.

  • This suggests redundancy in the equations but consistency (since a solution exists).
 


What can we say about non-square matrices and our solution approach?

  • The RREF algorithm will still run to completion.

  • When the RREF of the augmented matrix contains a contradiction (an all-zero row on the left with a non-zero entry in the augmented column), there is no solution.

  • If the number of pivots is exactly the same as the number of variables (with the other rows zero), there is a unique solution
          \(\rhd\) See the example above.

  • If there are fewer pivots than variables, there are multiple solutions with basic and free variables.
          \(\rhd\) Basic variables correspond to the pivot columns.
 


Let's look at an example:

  • Consider the equations: $$\eqb{ 2x_1 & \plss & x_2 & \plss & 4x_3 & \miss & x_4 & \eql 4\\ x_1 & & & \miss & 2x_3 & \plss & 4x_4 & \eql 1\\ 3x_1 & \plss & 2x_2 & \plss & 10x_3 & \miss & 6x_4 & \eql 7\\ }$$

  • Which becomes the augmented matrix $$ \left[ \begin{array}{cccc|c} 2 & 1 & 4 & -1 & 4\\ 1 & 0 & -2 & 4 & 1\\ 3 & 2 & 10 & -6 & 7\\ \end{array} \right] $$

  • Whose RREF is: $$ \left[ \begin{array}{cccc|c} {\bf 1} & 0 & -2 & 4 & 1\\ 0 & {\bf 1} & 8 & -9 & 2\\ 0 & 0 & 0 & 0 & 0\\ \end{array} \right] $$
     

    In-Class Exercise 26: Fill in the steps from augmented matrix to RREF above.
     

  • Notice the structure of the RREF:
    • The pivot rows are at the top.
    • Note: although the pivot columns here happen to be the leftmost columns, this need not always be the case.
            \(\rhd\) In general, the pivot columns can occur anywhere among the columns.
    • Rows without pivots are either all zero or contain a contradiction (this example has none).
    • Entries in a pivot column are zero except for the pivot itself.
    • Entries in a pivot row can be non-zero in the non-pivot columns.

  • What does the RREF tell us about solutions?
    • There are multiple solutions.
    • There are two basic variables corresponding to the first two columns
            \(\rhd\) \(x_1,x_2\) are basic variables.
    • Thus, \(x_3, x_4\) are free variables.
            \(\rhd\) Any choice of values for the free variables fixes the values of \(x_1,x_2\).
    • Example: if \(x_3=1, x_4=1\), then \(x_1=-1, x_2=3\).

  • A slightly different and useful perspective comes from setting \({\bf Ax} = {\bf 0}\), which results in the RREF $$ \left[ \begin{array}{cccc|c} {\bf 1} & 0 & -2 & 4 & 0\\ 0 & {\bf 1} & 8 & -9 & 0\\ 0 & 0 & 0 & 0 & 0\\ \end{array} \right] $$
    • Once again, setting any values for free variables \(x_3, x_4\) results in fixing the values for \(x_1, x_2\)
    • Example: if \(x_3=1, x_4=1\), then \(x_1=-2, x_2=1\).
    • What this illustrates: \({\bf Ax} = {\bf 0}\) has the non-zero solution \((x_1,x_2,x_3,x_4) = (-2,1,1,1)\); a mechanical check of this example follows below.
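Here's that mechanical check, as a sketch using Python's sympy library (a tool choice of ours, not part of the module): rref() reproduces the RREF above and reports the pivot columns, and substituting the free variables reproduces the values worked out above.

    from sympy import Matrix

    # The augmented matrix of the 3-equation, 4-variable system above
    aug = Matrix([[2, 1,  4, -1, 4],
                  [1, 0, -2,  4, 1],
                  [3, 2, 10, -6, 7]])

    R, pivot_cols = aug.rref()
    print(pivot_cols)    # (0, 1): pivots in the first two columns, so x1, x2 are basic
    print(R)             # matches the RREF shown above

    # Set the free variables x3 = 1, x4 = 1 and read off the basic variables:
    x3, x4 = 1, 1
    x1 = 1 + 2*x3 - 4*x4    # from row 1:  x1 - 2 x3 + 4 x4 = 1
    x2 = 2 - 8*x3 + 9*x4    # from row 2:  x2 + 8 x3 - 9 x4 = 2
    print(x1, x2)           # -1 3, as in the example above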
 


5.8   Alternative pivoting

 

We don't always have to pick the very next row when seeking a pivot:

  • In the \(4\times 4\) example earlier, after dealing with the first pivot in column 1, we arrived at: $$ \left[ \begin{array}{cccc|c} 1 & 2 & 0 & -5 & 2\\ 0 & {\bf 1} & 1 & -1 & 1\\ 0 & 1 & 2 & 1 & 6\\ 0 & 2 & 4 & 3 & 13 \end{array} \right] $$

  • Now, instead of the above pivot in the second row, we could swap any other row into second place, e.g., $$ \left[ \begin{array}{cccc|c} 1 & 2 & 0 & -5 & 2\\ 0 & {\bf 2} & 4 & 3 & 13\\ 0 & 1 & 2 & 1 & 6\\ 0 & 1 & 1 & -1 & 1\\ \end{array} \right] $$

  • In fact, swapping in any row at or below the pivot row that has a non-zero entry in the pivot column will do.

  • This choice leads to the question: are some pivot choices better than others?
 

Consider this well-known example:

  • Suppose we have the equations $$\eqb{ 10^{-k} x_1 & \plss & x_2 & \eql 1\\ x_1 & \miss & x_2 & \eql 0 }$$

  • Solving by hand gives \(x_1 = x_2 = 1/(1+10^{-k}) \approx 1\), if \(k\) is large.

  • But finite-precision arithmetic can cause a problem, as we'll see.

  • The starting point is the augmented matrix $$ \left[ \begin{array}{cc|c} 10^{-k} & 1 & 1\\ 1 & -1 & 0 \end{array} \right] $$

  • Scaling the first row (to make the pivot 1) results in $$ \left[ \begin{array}{cc|c} {\bf 1} & 10^k & 10^k\\ 1 & -1 & 0 \end{array} \right] $$

  • Zeroing out the 1 below the pivot gives $$ \left[ \begin{array}{cc|c} {\bf 1} & 10^k & 10^k\\ 0 & -1-10^k & -10^k \end{array} \right] $$ For large \(k\), finite precision rounds \(-1-10^k\) to \(-10^k\) (see the quick check after this example), so this is represented as $$ \left[ \begin{array}{cc|c} {\bf 1} & 10^k & 10^k\\ 0 & -10^k & -10^k \end{array} \right] $$ which, after scaling the second row by \(-10^{-k}\), reduces to $$ \left[ \begin{array}{cc|c} {\bf 1} & 10^k & 10^k\\ 0 & 1 & 1 \end{array} \right] $$

  • This gets solved as \((x_1,x_2) = (0,1)\), drastically different from the true solution.

  • The problem was that a tiny pivot, through scaling, resulted in a huge effect on other entries.

  • Instead, if we'd swapped rows initially, we'd have $$ \left[ \begin{array}{cc|c} 1 & -1 & 0\\ 10^{-k} & 1 & 1\\ \end{array} \right] $$ which reduces to $$ \left[ \begin{array}{cc|c} 1 & -1 & 0\\ 0 & 1+10^{-k} & 1\\ \end{array} \right] $$ which gets rounded to $$ \left[ \begin{array}{cc|c} 1 & -1 & 0\\ 0 & 1 & 1\\ \end{array} \right] $$

  • This gets reduced to: $$ \left[ \begin{array}{cc|c} 1 & 0 & 1\\ 0 & 1 & 1\\ \end{array} \right] $$ and the solution \((x_1,x_2) = (1,1)\).
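The rounding claimed above is genuine, not hypothetical: in standard double-precision arithmetic, adding or subtracting 1 has no effect on a number of size \(10^{20}\). A one-line check in Python:

    # float64 cannot distinguish -1 - 10^20 from -10^20: the 1 is rounded away
    print(-1.0 - 1e20 == -1e20)    # prints True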
 

Partial pivoting:

  • In this approach, when facing a choice of pivots, one finds the row whose entry in the pivot column is largest in absolute value, and swaps that row into the pivot position.

  • This strategy has been found to be effective in practice, as the sketch following this list illustrates.

  • This pivoting option is known as partial pivoting.

  • Changing the pivot strategy will change the REF so obtained, but interestingly, not the RREF.
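Here's a minimal sketch of the \(2\times 2\) example above in Python with numpy, with a flag to switch partial pivoting on and off (the function solve_2x2 and its structure are our own illustration, not a production solver):

    import numpy as np

    def solve_2x2(aug, partial_pivot=False):
        # Gaussian elimination on a 2x2 augmented matrix, in float64
        A = np.array(aug, dtype=float)
        if partial_pivot and abs(A[1, 0]) > abs(A[0, 0]):
            A[[0, 1]] = A[[1, 0]]                   # swap: larger pivot on top
        A[1] = A[1] - (A[1, 0] / A[0, 0]) * A[0]    # eliminate below the pivot
        x2 = A[1, 2] / A[1, 1]                      # back-substitute
        x1 = (A[0, 2] - A[0, 1] * x2) / A[0, 0]
        return float(x1), float(x2)

    k = 20
    aug = [[10.0 ** (-k), 1, 1], [1, -1, 0]]
    print(solve_2x2(aug))                        # (0.0, 1.0)  tiny pivot: wrong
    print(solve_2x2(aug, partial_pivot=True))    # (1.0, 1.0)  correct

With the tiny pivot, the multiplier \(10^{20}\) wipes out the other entries in the second row; swapping first keeps every multiplier at most 1 in absolute value, which is the point of partial pivoting.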
 


5.9   Uniqueness of the RREF

 

First, note that our particular algorithm above is deterministic, and so, for any matrix, there's only one output.

However, an alternative RREF algorithm could use a different pivoting strategy.

Nonetheless, as we'll show, there's one and only one RREF for a matrix.
 

To start, for illustration, let's violate the RREF rules:

  • Consider the (not augmented) matrix $$ \mat{ 1 & 1 & 1 & 0 & 3\\ -1 & 0 & 1 & 1 & -1\\ 0 & 1 & 2 & 1 & 2\\ 0 & 0 & 0 & 0 & 0 } $$

  • Suppose we violate the rules and start with the second column: $$ \mat{ 1 & {\bf 1} & 1 & 0 & 3\\ -1 & 0 & 1 & 1 & -1\\ 0 & 1 & 2 & 1 & 2\\ 0 & 0 & 0 & 0 & 0 } $$ Then \({\bf r}_3 \leftarrow {\bf r}_3 - {\bf r}_1\) produces $$ \mat{ 1 & {\bf 1} & 1 & 0 & 3\\ -1 & 0 & 1 & 1 & -1\\ -1 & 0 & 1 & 1 & -1\\ 0 & 0 & 0 & 0 & 0 } $$ (note that the new row 3 equals row 2).
  • Next, suppose we use the third column for the next pivot: $$ \mat{ 1 & {\bf 1} & 1 & 0 & 3\\ -1 & 0 & {\bf 1} & 1 & -1\\ -1 & 0 & 1 & 1 & -1\\ 0 & 0 & 0 & 0 & 0 } $$ Which results, after two row ops (\({\bf r}_1 \leftarrow {\bf r}_1 - {\bf r}_2\) and \({\bf r}_3 \leftarrow {\bf r}_3 - {\bf r}_2\)), in $$ \mat{ 2 & {\bf 1} & 0 & -1 & 4\\ -1 & 0 & {\bf 1} & 1 & -1\\ 0 & 0 & 0 & 0 & 0\\ 0 & 0 & 0 & 0 & 0 } $$

  • This is not in RREF form.

  • What should be clear is that the left-to-rightness of the RREF procedure matters.
          \(\rhd\) Pivots are created starting from the left and going only as far right as needed.

  • Thus, it's only the pivoting strategy (the choice of which row to swap into the pivot position) that could possibly change the RREF.
 

Uniqueness:

  • If an RREF is full-rank, it's already unique because it's the identity matrix \({\bf I}\).

  • So, any non-uniqueness, if it occurs, must occur in the non-pivot columns.

  • The intuition for why the RREF is unique can be seen by examining two successive pivot columns with a gap between, for example: $$ \mat{ \ldots & 0 & * & * & 0 & \ldots \\ \ldots & {\bf 1} & * & * & 0 & \ldots \\ \ldots & 0 & * & * & {\bf 1} & \ldots \\ \ldots & 0 & 0 & 0 & 0 & \ldots \\ \ldots & 0 & 0 & 0 & 0 & \ldots } $$

  • Note that, no matter what pivoting strategy was used, the zeroes in the columns between the two pivot columns above are unaffected.

  • Thus, the location of the second pivot is the same, even if a different pivoting strategy was used within that column (via row swaps).

  • This means that the pivots occur in the same columns, irrespective of pivoting strategy.

  • Then, all we need to worry about are the numbers in other columns. Are they the same if different pivoting is used?
    • These numbers are in non-pivot columns.
    • They are completely determined by the pivots to their left.
    • Once a pivot is determined and scaled to 1, the row ops used to eliminate numbers above the pivot are the same.
    • This means the numbers in non-pivot columns will be the same, no matter what pivoting strategy is used (see the computational check below).

  • A more formal but less intuitive proof can be found in Hefferon's book (pages 58-59).
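As a computational sanity check on uniqueness, here's a sketch in Python with sympy (our own illustration, using the matrix from the example above): a row swap is itself a row op, so every reordering of the rows must lead to the same RREF, and indeed it does.

    from itertools import permutations
    from sympy import Matrix

    A = Matrix([[ 1, 1, 1, 0,  3],
                [-1, 0, 1, 1, -1],
                [ 0, 1, 2, 1,  2],
                [ 0, 0, 0, 0,  0]])

    # Compute the RREF after every possible reordering of the rows
    rrefs = {tuple(A[list(p), :].rref()[0]) for p in permutations(range(4))}
    print(len(rrefs))    # 1: all 24 row orderings produce the same RREF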
 


5.10   Summary, some questions, and some tips

 

A high-level summary:

  • It is possible to systematically solve a large system of simultaneous linear equations.

  • There are three basic row operations: scale, swap, and add a multiple of another row.

  • Any of the three fundamental row operations can be expressed as a matrix that transforms the target matrix (the one row ops are applied to).

  • The series of row ops, when applied to the identity matrix \({\bf I}\), produces the inverse \({\bf A}^{-1}\) (provided a unique solution exists).

  • The augmented matrix is a convenient data structure for applying row operations to both sides of the equations.

  • For the simplest \(n\times n\) case with a unique solution, the approach is straightforward
          \(\rhd\) Each column and each row will have a pivot.

  • There are special cases to be handled by expanding the basic algorithm.

  • For a square matrix \({\bf A}\), the following are all equivalent:
    • \({\bf A}\) has an inverse.
    • \(RREF({\bf A}) = {\bf I}\)
    • \({\bf Ax}={\bf 0}\) has \({\bf x}={\bf 0}\) as the only solution.
    • \({\bf Ax}={\bf b}\) has a unique solution.

  • One important implication: if \({\bf Ax}={\bf 0}\) has non-zero solution \({\bf x}\neq {\bf 0}\), the inverse does not exist.

  • For non-square matrices, all bets are off, but we can say this:
    • If there are fewer rows than columns (too few equations), there will very likely be multiple solutions.
    • If there are more rows than columns (too many equations), there will very likely be no solution.

  • In practice, there are pivot-selection strategies, such as partial pivoting, that improve the accuracy of computed solutions.
 

Some unanswered questions:

  • In an RREF with fewer pivots than columns, what is special about the rows and columns that have pivots?

  • Even if the above approach works, is it the best way to solve \({\bf Ax}={\bf b}\)?

  • What options do we have for approximating a solution when no solution is possible?
 

Tips:

  • Working out examples by hand is tediously boring, but needed to "see what's going on".

  • It is especially important to reflect on what each particular example illustrates as you proceed.

  • Take an earlier example, and then work through the algorithm step by step.


© 2016, Rahul Simha