Module objectives
There are few things more boring than solving simultaneous
linear equations. But ...
- We'll see that solving equations is the same thing
as "inverting a matrix".
Of all the linear algebra applications, this (solving
equations) is probably the most used.
- It turns out that the method we will study (Gaussian
elimination) is not actually used in practice but it
plays a key role in developing the theory.
- And, most importantly, in understanding the theory.
- Thinking through the details of an algorithm
will force us to handle cases with
no solutions, multiple solutions etc.
So, let's roll up our sleeves and dig in.
By the end of this module, you should be able to
- Work through a systematic approach to solving \({\bf A x} = {\bf b}\).
- Explain why matrix-matrix transformations lead to computing
the inverse.
- Describe and apply row operations on augmented matrices.
- Solve problems by hand based on the RREF and REF.
- Describe an algorithm that analyzes \({\bf A x} = {\bf b}\),
reporting a solution if one exists, or multiple solutions,
or knowing when none exists.
- Explain the terms pivot, pivot row, and pivot column.
But, first, a little practice exercise multiplying matrices.
In-Class Exercise 1:
Work out the following matrix products:
$$\eqb{
\mbox{(1)} & \mat{0 & 1\\1 & 0} \mat{1 & 3 \\ 4 & 2} \\
\mbox{(2)} & \mat{\alpha & 0\\0 & 1} \mat{1 & 3 \\ 4 & 2}\\
\mbox{(3)} & \mat{1 & 0\\-4 & 1} \mat{1 & 3 \\ 4 & 2} \\
}$$
5.0
Matrices that transform matrices
Matrices that transform matrices:
- Thus far, we've seen that a matrix multiplying a vector
transforms that vector into another vector.
- Thus, in \({\bf Ax} = {\bf b}\), the matrix \({\bf A}\)
transforms the vector \({\bf x}\) into the vector \({\bf b}\).
\(\rhd\)
\({\bf A}\) "acts" on \({\bf x}\) to produce \({\bf b}\).
- For example, the matrix \({\bf A} = \mat{1 & 3 \\ 4 & 2}\)
transforms \({\bf x}=(1.5, 2)\) into \((7.5,10)\) because
$$
\mat{1 & 3 \\ 4 & 2} \vectwo{1.5}{2} \eql \vectwo{7.5}{10}
$$
- In a similar manner, we could interpret the matrix multiplication
\({\bf B A} = {\bf C}\) as the matrix \({\bf B}\) transforming
the matrix \({\bf A}\) into the matrix \({\bf C}\).
- Thus, for example, \({\bf B}=\mat{1 & 0\\-4 & 1}\)
multiplied into \({\bf A}=\mat{1 & 3\\4 & 2}\)
produces \({\bf C}=\mat{1 & 3\\0 & -10}\)
$$
{\bf B A}
\eql
\mat{1 & 0\\-4 & 1}
\mat{1 & 3\\4 & 2}
\eql
\mat{1 & 3\\0 & -10}
\eql
{\bf C}
$$
We could say that multiplication by \({\bf B}\) transformed
\({\bf A}\) into \({\bf C}\).
- This view of matrix-matrix multiplication
will later be useful in understanding how the inverse
of a matrix is computed.
Some useful transformations:
- Consider the matrix \(\mat{0 & 1\\1 & 0}\) acting on
\(\mat{1 & 3 \\ 4 & 2}\):
$$
\mat{0 & 1\\1 & 0} \mat{1 & 3 \\ 4 & 2}
\eql
\mat{4 & 2 \\ 1 & 3}
$$
This has the effect of swapping the rows of the second matrix.
In-Class Exercise 2:
Design a matrix to swap rows 2 and 3 of a \(3\times 3\) matrix,
and show how it works on an example. Then design a matrix to
swap rows 3 and 2. What happens when you multiply the matrix
that does the 2-3 swap with the one that does the 3-2 swap?
What is the matrix
that swaps rows \(i\) and \(j\) of an \(n\times n\) matrix?
What happens when you multiply this matrix by itself?
- Next, consider the matrix \(\mat{\alpha & 0\\0 & 1}\)
applied to \(\mat{1 & 3 \\ 4 & 2}\):
$$
\mat{\alpha & 0\\0 & 1} \mat{1 & 3 \\ 4 & 2}
\eql
\mat{\alpha & 3\alpha \\ 4 & 2}
$$
This has the effect of multiplying the first row of
\(\mat{1 & 3 \\ 4 & 2}\) by \(\alpha\).
In-Class Exercise 3:
Design a matrix to divide row 3 of a \(3\times 3\) matrix
by 2 and show how it works on an example. What is the matrix
that divides row \(i\) of an \(n\times n\) matrix by the number
\(\beta\)? What is the inverse of this matrix?
- We'll now look at a slightly more complicated, but
extremely useful transformation.
- But before that, let's introduce some notation:
- Let \({\bf r_A}(i)\) denote the i-th row of matrix \({\bf A}\).
- Example: if
$$
{\bf A} \eql
\mat{1 & 2 & 3\\4 & 5 & 6\\7 & 8 & 9}
$$
then \({\bf r_A}(2) = (4,5,6)\) and \({\bf r_A}(3) = (7,8,9)\).
- \({\bf r_A}(i)\) is not really a vector, but since it's
an ordered collection of numbers, we'll occasionally treat
it like a vector for purposes of multiplication.
In-Class Exercise 4:
What is \({\bf r_A}(2) + {\bf r_A}(3) \) in the above example?
- Now for the important transformation.
Consider \(\mat{1 & 0\\-4 & 1}\)
applied to \(\mat{1 & 3 \\ 4 & 2}\):
$$
\mat{1 & 0\\-4 & 1} \mat{1 & 3 \\ 4 & 2}
\eql
\mat{1 & 3 \\ 0 & -10}
$$
- This has the effect of replacing row 2
with the sum of: row 2 and \(-4 \times\) row 1.
- We can separately work out the arithmetic as:
$$\eqb{
4 & \;\;\;2 & \;\;\;\;\;\;\;\mbox{row 2, notated as } r_{\bf A}(2)\\
-4 & \; -12 & \;\;\;\;\;\;\;\mbox{this is -4 times row 1
denoted by } -4 \times r_{\bf A}(1)\\\hline
0 & \;-10 & \;\;\;\;\;\;\;\mbox{the result, the new } r_{\bf A}(2)\\
}$$
- Thus, we are achieving
\({\bf r_A}(2) \leftarrow {\bf r_A}(2) - 4 {\bf r_A}(1)\)
- You can read
\({\bf r_A}(2) \leftarrow {\bf r_A}(2) - 4 {\bf r_A}(1)\)
as:
- (Left side) Replace row 2 with ...
- (On the right) Take the 2nd row, and subtract
4 times the first row.
- The result is a different 2nd row, after we do this.
In-Class Exercise 5:
What is the transformation matrix that achieves
\({\bf r_A}(1) \leftarrow {\bf r_A}(1) - 3 {\bf r_A}(2)\)
for a \(2\times 2\) matrix? What is the effect when applied
to \(\mat{1 & 3\\0 & 1}\)?
In-Class Exercise 6:
What is the transformation matrix that achieves
\({\bf r_A}(i) \leftarrow {\bf r_A}(i) + \alpha {\bf r_A}(j)\)
for an \(n\times n\) matrix? What is the inverse of this matrix?
- To summarize, we've seen three row operations achieved
through matrix multiplication:
- Row operation #1: swap two rows
- Row operation #2: multiply or divide a row by a constant.
- Row operation #3: replace a row by subtracting from it a multiple of another row.
- All three matrices (the ones that transform) have inverses
that "undo" the row operations, respectively.
What good are such transformations? Why would we want to transform
a matrix?
To see, let's first review inverse and identity:
- We will restrict our discussion to square matrices for now.
- The number 1 is called the multiplicative identity
for numbers because any number \(x\) times 1 is \(x\).
- The equivalent identity matrix \({\bf I}\)
has the property that, for any (multiply-compatible) matrix
\({\bf A}\)
$$
{\bf A I} \eql {\bf A}
$$
and
$$
{\bf I A} \eql {\bf A}
$$
- The identity matrix of size \(n \times n\) is:
$$
{\bf I} \eql
\mat{ 1 & 0 & \cdots & 0 \\
0 & 1 & \cdots & 0\\
\vdots & \vdots & \ddots & \vdots\\
0 & 0 & \cdots & 1
}
$$
- Thus, in this \(2 \times 2\) example:
$$
\mat{1 & 0 \\ 0 & 1}
\mat{1 & 3 \\ 4 & 2}
\eql
\mat{1 & 3 \\ 4 & 2}
$$
and
$$
\mat{1 & 3 \\ 4 & 2}
\mat{1 & 0 \\ 0 & 1}
\eql
\mat{1 & 3 \\ 4 & 2}
$$
In-Class Exercise 7:
Prove that \({\bf I}\) as defined above is indeed
the multiplicative identity for matrices of any size \(n \times n\).
What is the geometric intuition for why this must be so?
- With numbers, the multiplicative inverse of a
number \(x\) is a number \(y\) such that \(yx = 1\).
- For regular number multiplication, we know that
if \(yx = 1\) then \(xy = 1\).
- That is, the number that multiplies into \(x\) to produce
the multiplicative identity.
- The notation \(x^{-1}\) is used to denote the multiplicative
inverse of the number \(x\).
- So, for a matrix \({\bf A}\), the inverse will be denoted
by \({\bf A^{-1}}\).
- The inverse of a square matrix
is defined as the matrix \({\bf A^{-1}}\) such that
$$
{\bf A^{-1} A} \eql {\bf I} \eql {\bf A A^{-1}}
$$
- For example, if \({\bf A} = \mat{1 & 3\\4 & 2}\),
then \({\bf A^{-1}}\) turns out to be \(\mat{-0.2 & 0.3\\0.4 & -0.1}\)
because
$$
\mat{-0.2 & 0.3\\0.4 & -0.1}
\mat{1 & 3\\4 & 2}
\eql
\mat{1 & 0 \\ 0 & 1}
$$
In-Class Exercise 8:
Confirm the above by doing the calculations by hand.
- The inverse is usually written as multiplying on the left
(pre-multiplying): \({\bf A^{-1} A}\).
- Note: we could have defined the inverse solely using
left multiplication:
- If \({\bf A^{-1} A} = {\bf I}\), then one can show that
\({\bf A A^{-1}} = {\bf I}\).
- This is not obvious or trivial to prove (try it).
- In case you were wondering, yes, there are equivalent
definitions of additive identity and additive
inverse, but these are rarely used.
- As we will see, not every square matrix has an inverse.
Now for an important observation:
- Suppose there exist matrices \({\bf R}_1, {\bf R}_2, \ldots,
{\bf R}_k\) such that
$$
{\bf R}_k {\bf R}_{k-1} \cdots {\bf R}_2 {\bf R}_1 {\bf A}
\eql
{\bf I}
$$
That is, a series of transformations that turn \({\bf A}\)
into the identity matrix \({\bf I}\).
- Then, let
$$
{\bf R} \defn {\bf R}_k {\bf R}_{k-1} \cdots {\bf R}_2 {\bf R}_1
$$
- Which means
$$
{\bf R} {\bf A}
\eql
{\bf I}
$$
which by the definition of matrix inverse means that
$$
{\bf R}
\eql
{\bf A}^{-1}
$$
- Thus, interestingly, the product of the transformation
matrices is the inverse itself!
- Furthermore, we can see that
$$
{\bf R I }
\eql
{\bf A}^{-1} {\bf I}
\eql
{\bf A}^{-1}
$$
and so, replacing the combined transformation \({\bf R}\) with the sequence that produced it:
$$
{\bf R}_k {\bf R}_{k-1} \cdots {\bf R}_2 {\bf R}_1 {\bf I}
\eql
{\bf A}^{-1}
$$
- That is, applying the same transformations to the identity
matrix produces the inverse.
- Thus, if we can figure out which transformations reduce
a matrix to the identity, we can compute the inverse using
the same transformations applied to \({\bf I}\).
- So, for the all-important matrix-vector equation
\({\bf A x} = {\bf b}\), if we've computed \({\bf A}^{-1}\),
we see that
$$\eqb{
& {\bf A x} & \eql & {\bf b}\\
\Rightarrow \;\;\; & {\bf A}^{-1} {\bf A x} & \eql & {\bf A}^{-1} {\bf b}\\
\Rightarrow \;\;\;& {\bf I x} & \eql & {\bf A}^{-1} {\bf b}\\
\Rightarrow \;\;\; & {\bf x} & \eql & {\bf A}^{-1} {\bf b}\\
}$$
Note: you can read the double-arrow symbol on the left as "which implies that".
- That is, we will have solved our most important equation:
\({\bf Ax}={\bf b}\).
- Let's make this more concrete:
- Suppose we have the equations
$$\eqb{
2x_1 & - & 3x_2 & + & 2x_3 & \eql & 5\\
-2x_1 & + & x_2 & & & \eql & -1\\
x_1 & + & x_2 & - & x_3 & \eql & 0
}$$
- In matrix form, this becomes
$$
\mat{2 & -3 & 2\\
-2 & 1 & 0\\
1 & 1 & -1}
\vecthree{x_1}{x_2}{x_3}
\eql
\vecthree{5}{-1}{0}
$$
- Suppose we've found a way to compute the inverse
$$
\mat{0.5 & 0.5 & 1\\
1 & 2 & 2\\
1.5 & 2.5 & 2}
$$
- Then, the solution to the simultaneous linear equations is:
$$
\vecthree{x_1}{x_2}{x_3}
\eql
\mat{0.5 & 0.5 & 1\\
1 & 2 & 2\\
1.5 & 2.5 & 2}
\vecthree{5}{-1}{0}
$$
- So, it appears we have a way of solving \({\bf Ax}={\bf b}\).
- But there are some questions too:
- Exactly how do we decide which transformations to make?
- Is it guaranteed we'll be able to find transformations
to compute the inverse?
- Is this the fastest way?
- What about equations that don't have a solution or have
multiple solutions?
- What is this mysterious and guttural sounding "RREF"? And
what does it have to do with solving equations?
Before we get to these questions, let's revisit the traditional
way of solving the equations.
5.1
Solving by hand: some insights
Let's look at two examples for some insight.
Example 1:
- Suppose the equations are:
$$\eqb{
x_1 & + & 3x_2 & \eql & 7.5 & \bigsp (1) \\
4x_1 & + & 2x_2 & \eql & 10 & \bigsp (2)
}$$
In-Class Exercise 9:
Write the above equations in matrix form.
- There are many ways to proceed:
- We could eliminate \(x_1\) by multiplying the first equation
by 4.
- We could eliminate \(x_1\) by dividing the second equation
by 4.
- We could multiply the first by 2, the second by 3, and
eliminate \(x_2\).
- We will multiply the equation (1) by 4 and write equation
(2) below:
$$\eqb{
4x_1 & + & 12x_2 & \eql & 30 & \bigsp (3) \\
4x_1 & + & 2x_2 & \eql & 10 & \bigsp (2)
}$$
- This gives us a different set of equations.
- What's most important to realize:
- We've changed the set of equations.
- But the solution to the new set is the same
as the solution to the original equations.
In-Class Exercise 10:
Why? What is the precise reasoning?
In-Class Exercise 11:
Write the new set in matrix form.
- We will now subtract equation (2) from equation (3) to get
a new set of equations:
$$\eqb{
4x_1 & + & 12x_2 & \eql & 30 & \bigsp (3) \\
& & 10x_2 & \eql & 20 & \bigsp (4)
}$$
In-Class Exercise 12:
Write the new set (3), (4) in matrix form.
- At last, we have a single-variable equation
\(10x_2=20\) which we can trivially solve: \(x_2=2\).
- Then, we can go back and substitute \(x_2=2\) in any
of the equations (1), (2) or (3).
We could say we've solved the equations, but ...
we will do something unusual:
- First, let's write (1) and (4)
$$\eqb{
x_1 & + & 3x_2 & \eql & 7.5 & \bigsp (1) \\
& & 10x_2 & \eql & 20 & \bigsp (4)
}$$
- Then, divide (4) by 10:
$$\eqb{
x_1 & + & 3x_2 & \eql & 7.5 & \bigsp (1) \\
& & x_2 & \eql & 2 & \bigsp (5)
}$$
In-Class Exercise 13:
Write the new set (1), (5) in matrix form.
- Then, multiply (5) by 3:
$$\eqb{
x_1 & + & 3x_2 & \eql & 7.5 & \bigsp (1) \\
& &3x_2 & \eql & 6 & \bigsp (6)
}$$
- Subtract (6) from (1):
$$\eqb{
x_1 & + & 0 & \eql & 1.5 & \bigsp (7) \\
0 & + &3x_2 & \eql & 6 & \bigsp (6)
}$$
- Now write (7) and (5)
$$\eqb{
x_1 & + & 0 & \eql & 1.5 & \bigsp (7) \\
0 & + & x_2 & \eql & 2 & \bigsp (5)
}$$
- We can read off the solution directly from these equations.
In-Class Exercise 14:
Write the new set (7), (5) in matrix form.
There's a minor twist that will be useful:
- Remember how we multiplied equation (1) to get:
$$\eqb{
4x_1 & + & 12x_2 & \eql & 30 & \bigsp (3) \\
4x_1 & + & 2x_2 & \eql & 10 & \bigsp (2)
}$$
- We then replaced (2) with (3)-(2).
- This is the same as starting from
$$\eqb{
x_1 & + & 3x_2 & \eql & 7.5 & \bigsp (1) \\
4x_1 & + & 2x_2 & \eql & 10 & \bigsp (2)
}$$
and replacing (2) with (2) - 4\(\times\)(1), which is (4) multiplied by
\(-1\), and so has the same solution.
- In other words, we're using the same row operations
as described earlier for transforming matrices.
- In this example, we did not need the row-swap operation.
Let's look at another, \(3\times 3\) example:
- The equations:
$$\eqb{
2x_1 & \miss & 3x_2 & \plss & 2x_3 & \eql 5\\
-2x_1 & \plss & x_2 & & & \eql -1\\
x_1 & \plss & x_2 & \miss & x_3 & \eql 0\\
}$$
- We'll start with \(x_1\) and make its coefficient 1 in the first
equation:
- The coefficient is currently 2
$$\eqb{
{\bf 2}x_1 & \miss & 3x_2 & \plss & 2x_3 & \eql 5\\
-2x_1 & \plss & x_2 & & & \eql -1\\
x_1 & \plss & x_2 & \miss & x_3 & \eql 0\\
}$$
- We'll need to divide the row by 2:
$$\eqb{
x_1 & \miss & \frac{3}{2}x_2 & \plss & x_3 & \eql \frac{5}{2}\\
-2x_1 & \plss & x_2 & & & \eql -1\\
x_1 & \plss & x_2 & \miss & x_3 & \eql 0\\
}$$
- Next, we'll eliminate \(x_1\) from row 2:
- To do this, add twice row 1.
- That is, \({\bf r}_2 \leftarrow {\bf r}_2 + 2{\bf r}_1\)
$$\eqb{
{\bf x_1} & \miss & \frac{3}{2}x_2 & \plss & x_3 & \eql \frac{5}{2}\\
0 & \miss & 2x_2 & \plss & 2x_3 & \eql 4\\
x_1 & \plss & x_2 & \miss & x_3 & \eql 0\\
}$$
- Next, we'll eliminate \(x_1\) from row 3 using
\({\bf r}_3 \leftarrow {\bf r}_3 - {\bf r}_1\)
$$\eqb{
{\bf x_1} & \miss & \frac{3}{2}x_2 & \plss & x_3 & \eql \frac{5}{2}\\
0 & \miss & 2x_2 & \plss & 2x_3 & \eql 4\\
0 & \plss & \frac{5}{2}x_2 & \miss & 2x_3 & \eql -\frac{5}{2}\\
}$$
Note: we've marked \(x_1\) in bold to draw attention to the
"column" below it.
In-Class Exercise 15:
Write down the corresponding five matrices in sequence:
the starting matrix, and then the resulting matrices after
each row operation.
- We'll now turn to \(x_2\)
$$\eqb{
x_1 & \miss & \frac{3}{2}x_2 & \plss & x_3 & \eql \frac{5}{2}\\
0 & \miss & {\bf 2x_2} & \plss & 2x_3 & \eql 4\\
0 & \plss & \frac{5}{2}x_2 & \miss & 2x_3 & \eql -\frac{5}{2}\\
}$$
And we'll now eliminate it from the rows below the second row.
- First, we will make the coefficient unity:
\({\bf r}_2 \leftarrow -\frac{1}{2}{\bf r}_2\)
$$\eqb{
x_1 & \miss & \frac{3}{2}x_2 & \plss & x_3 & \eql \frac{5}{2}\\
0 & \plss & {\bf x_2} & \miss & x_3 & \eql -2\\
0 & \plss & \frac{5}{2}x_2 & \miss & 2x_3 & \eql -\frac{5}{2}\\
}$$
- Next, we'll eliminate \(x_2\) from row 3:
\({\bf r}_3 \leftarrow {\bf r}_3 - \frac{5}{2} {\bf r}_2\)
$$\eqb{
x_1 & \miss & \frac{3}{2}x_2 & \plss & x_3 & \eql \frac{5}{2}\\
0 & \plss & {\bf x_2} & \miss & x_3 & \eql -2\\
0 & \plss & 0 & \plss & \frac{1}{2}x_3 & \eql \frac{5}{2}\\
}$$
- Next, we'll make the coefficient of \(x_3\) into 1:
$$\eqb{
x_1 & \miss & \frac{3}{2}x_2 & \plss & x_3 & \eql \frac{5}{2}\\
0 & \plss & x_2 & \miss & x_3 & \eql -2\\
0 & \plss & 0 & \plss & {\bf \frac{1}{2}x_3} & \eql \frac{5}{2}\\
}$$
We'll do this via
\({\bf r}_3 \leftarrow 2 {\bf r}_3\)
$$\eqb{
x_1 & \miss & \frac{3}{2}x_2 & \plss & x_3 & \eql \frac{5}{2}\\
0 & \plss & x_2 & \miss & x_3 & \eql -2\\
0 & \plss & 0 & \plss & {\bf x_3} & \eql 5\\
}$$
- At this point, we have solved for \(x_3\).
- We use \(x_3=5\) in row 2 and obtain \(x_2\).
Then, we could use both to get \(x_1\) from the first row.
- This approach is called backwards substitution:
- The last row gives us a single-variable equation with the solution.
- Working backwards, each equation going up has one unknown.
- But we will stubbornly plough onwards, changing more
coefficients.
In-Class Exercise 16:
Write down the corresponding three matrices in sequence
after applying the above three row operations.
- We'll now return to \(x_2\) and eliminate that from
all equations above row 2
$$\eqb{
x_1 & \miss & \frac{3}{2}x_2 & \plss & x_3 & \eql \frac{5}{2}\\
0 & \plss & {\bf x_2} & \miss & x_3 & \eql -2\\
0 & \plss & 0 & \plss & x_3 & \eql 5\\
}$$
In this set, there's only one equation above (row 1)
\(\rhd\)
We need
\({\bf r}_1 \leftarrow {\bf r}_1 + \frac{3}{2} {\bf r}_2\)
Which gives us
$$\eqb{
x_1 & \plss & 0 & \miss & \frac{1}{2}x_3 & \eql -\frac{1}{2}\\
0 & \plss & {\bf x_2} & \miss & x_3 & \eql -2\\
0 & \plss & 0 & \plss & x_3 & \eql 5\\
}$$
- Next, we'll eliminate \(x_3\) from rows 1 and 2:
$$\eqb{
x_1 & \plss & 0 & \miss & \frac{1}{2}x_3 & \eql -\frac{1}{2}\\
0 & \plss & x_2 & \miss & x_3 & \eql -2\\
0 & \plss & 0 & \plss & {\bf x_3} & \eql 5\\
}$$
- Applying \({\bf r}_1 \leftarrow {\bf r}_1 + \frac{1}{2} {\bf
r}_3\) gives
$$\eqb{
x_1 & \plss & 0 & \plss & 0 & \eql 2\\
0 & \plss & x_2 & \miss & x_3 & \eql -2\\
0 & \plss & 0 & \plss & {\bf x_3} & \eql 5\\
}$$
- Then, applying
\({\bf r}_2 \leftarrow {\bf r}_2 + {\bf r}_3\) gives
$$\eqb{
x_1 & \plss & 0 & \plss & 0 & \eql 2\\
0 & \plss & x_2 & \plss & 0 & \eql 3\\
0 & \plss & 0 & \plss & x_3 & \eql 5\\
}$$
- We can now just read off the solution on the right side.
In-Class Exercise 17:
Write down the corresponding three matrices in sequence
after applying the above three row operations.
Let's summarize the approach so far:
- Start with the coefficient of \(x_1\) and divide
to make the coefficient \(1\).
- Then, use row operations to eliminate \(x_1\) in the
first column, below the first row.
\(\rhd\)
Which is the same thing as making those coefficients \(0\).
- Then, go to \(x_2\) in row 2, column 2
- Divide to make its coefficient \(1\).
- Eliminate \(x_2\) in rows 3 onwards.
\(\rhd\)
Make the coefficients \(0\) in those rows (in that column).
- Then, \(x_3, x_4, \ldots\) etc.
- Note: each row operation is applied to the whole row,
including the right-hand-side of the equation.
This is the general idea, but it turns out, we'll need to tweak
it to address some issues
\(\rhd\)
For example, when the coefficient of some variable is already zero.
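Setting those issues aside for a moment, the general idea can be written as a
short program. The sketch below (in Java; all names are ours, for illustration
only) does forward elimination followed by backwards substitution, and assumes
the simple case where every coefficient we divide by happens to be non-zero.

public class ForwardElimination {

    // Solve A x = b by forward elimination + backwards substitution.
    // Assumes every divisor encountered is non-zero (so no row swaps are needed).
    static double[] solve (double[][] A, double[] b) {
        int n = A.length;
        for (int p = 0; p < n; p++) {
            // Make the coefficient in row p, column p equal to 1.
            double alpha = A[p][p];
            for (int j = p; j < n; j++) A[p][j] /= alpha;
            b[p] /= alpha;
            // Eliminate that variable from the rows below.
            for (int r = p + 1; r < n; r++) {
                double factor = A[r][p];
                for (int j = p; j < n; j++) A[r][j] -= factor * A[p][j];
                b[r] -= factor * b[p];
            }
        }
        // Backwards substitution: the last equation has one unknown; work upwards.
        double[] x = new double[n];
        for (int r = n - 1; r >= 0; r--) {
            double sum = b[r];
            for (int j = r + 1; j < n; j++) sum -= A[r][j] * x[j];
            x[r] = sum;                      // the diagonal coefficient is already 1
        }
        return x;
    }

    public static void main (String[] args) {
        double[][] A = {{2, -3, 2}, {-2, 1, 0}, {1, 1, -1}};
        double[] b = {5, -1, 0};
        double[] x = solve (A, b);
        System.out.println (x[0] + " " + x[1] + " " + x[2]);    // prints 2.0 3.0 5.0
    }
}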
Before that, let's look at another example, but in matrix form.
5.2
Solving by hand using an augmented matrix
The augmented matrix:
- Consider an example of a row operation like the following:
- Apply
\({\bf r}_2 \leftarrow {\bf r}_2 + {\bf r}_3\) to
$$\eqb{
x_1 & \plss & 0 & \plss & 0 & \eql 2\\
{\bf 0} & \plss & {\bf x_2} & \miss & {\bf x_3} & \eql {\bf -2}\\
0 & \plss & 0 & \plss & x_3 & \eql 5\\
}$$
to give
$$\eqb{
x_1 & \plss & 0 & \plss & 0 & \eql 2\\
{\bf 0} & \plss & {\bf x_2} & \plss & {\bf 0} & \eql {\bf 3}\\
0 & \plss & 0 & \plss & x_3 & \eql 5\\
}$$
- Here, the row operation applies to both sides of the
second equation.
- In matrix form, at the step before applying the row
operation, it would look like this:
$$
\mat{1 & 0 & 0\\
{\bf 0} & {\bf 1} & {\bf -1}\\
0 & 0 & 1}
\vecthree{x_1}{x_2}{x_3}
\eql
\vecthree{2}{{\bf -2}}{5}
$$
- Thus, for row operations,
we're really treating the right hand side numbers
the same way as we treat any column of the matrix.
- For convenience of both programming and visualization,
we'll put the right-hand-side into a column of a larger
matrix:
$$
\left[
\begin{array}{ccc|c}
1 & 0 & 0 & 2\\
{\bf 0} & {\bf 1} & {\bf -1} & {\bf -2}\\
0 & 0 & 1 & 5
\end{array}
\right]
$$
To make the added (artificial) column obvious, we'll draw a vertical
line just before it.
- Now applying
\({\bf r}_2 \leftarrow {\bf r}_2 + {\bf r}_3\)
gives
$$
\left[
\begin{array}{ccc|c}
1 & 0 & 0 & 2\\
{\bf 0} & {\bf 1} & {\bf 0} & {\bf 3}\\
0 & 0 & 1 & 5
\end{array}
\right]
$$
- Definition: Given \({\bf Ax}={\bf b}\),
an initial augmented matrix is the matrix \({\bf A}_{+}\)
obtained by adding \({\bf b}\) as an additional,
rightmost column to \({\bf A}\).
- Definition: An augmented matrix is any
matrix obtained from row operations applied in sequence
to an initial augmented matrix.
About augmented matrices:
- They're defined for convenience in programming and for
explaining examples.
- They are not used as matrices to multiply into anything.
- Thus, they're more a data structure than a real matrix.
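As a data structure, an initial augmented matrix is easy to build: copy
\({\bf A}\) and append \({\bf b}\) as one extra column. A minimal Java sketch
(the method name computeAugmentedMatrix anticipates the pseudocode later in
this module; the rest is ours):

public class Augment {

    // Build the initial augmented matrix A+ from an n x n matrix A and a vector b.
    static double[][] computeAugmentedMatrix (double[][] A, double[] b) {
        int n = A.length;
        double[][] Aplus = new double[n][n + 1];
        for (int i = 0; i < n; i++) {
            for (int j = 0; j < n; j++)
                Aplus[i][j] = A[i][j];       // copy the coefficients
            Aplus[i][n] = b[i];              // the rightmost, augmented column
        }
        return Aplus;
    }

    public static void main (String[] args) {
        double[][] A = {{2, -3, 2}, {-2, 1, 0}, {1, 1, -1}};
        double[] b = {5, -1, 0};
        double[][] Aplus = computeAugmentedMatrix (A, b);
        System.out.println (Aplus[0][3]);    // prints 5.0, the first right-hand-side value
    }
}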
Let's use this notation and work through a complete example
with both the augmented matrix and the identity:
$$\eqb{
2x_1 & \miss & 3x_2 & \plss & 2x_3 & \eql 5\\
-2x_1 & \plss & x_2 & & & \eql -1\\
x_1 & \plss & x_2 & \miss & x_3 & \eql 0\\
}$$
- Start with the augmented matrix and the identity side-by-side
$$
\left[
\begin{array}{ccc|c}
2 & -3 & 2 & 5\\
-2 & 1 & 0 & -1\\
1 & 1 & -1 & 0
\end{array}
\right]
\;\;\;\;\;\;\;\;\;\;\;\;
\mat{1 & 0 & 0 \\
0 & 1 & 0 \\
0 & 0 & 1}
$$
- Apply \({\bf r}_1 \leftarrow \frac{{\bf r}_1}{2} \) to
both matrices
$$
\left[
\begin{array}{ccc|c}
1 & -1.5 & 1 & 2.5\\
-2 & 1 & 0 & -1\\
1 & 1 & -1 & 0
\end{array}
\right]
\;\;\;\;\;\;\;\;\;\;\;\;
\mat{0.5 & 0 & 0 \\
0 & 1 & 0 \\
0 & 0 & 1}
$$
- Apply \({\bf r}_2 \leftarrow {\bf r}_2 + 2{\bf r}_1\):
$$
\left[
\begin{array}{ccc|c}
1 & -1.5 & 1 & 2.5\\
0 & -2 & 2 & 4\\
1 & 1 & -1 & 0
\end{array}
\right]
\;\;\;\;\;\;\;\;\;\;\;\;
\mat{0.5 & 0 & 0 \\
1 & 1 & 0 \\
0 & 0 & 1}
$$
- Apply \({\bf r}_3 \leftarrow {\bf r}_3 - {\bf r}_1\):
$$
\left[
\begin{array}{ccc|c}
1 & -1.5 & 1 & 2.5\\
0 & -2 & 2 & 4\\
0 & 2.5 & -2& -2.5
\end{array}
\right]
\;\;\;\;\;\;\;\;\;\;\;\;
\mat{0.5 & 0 & 0 \\
1 & 1 & 0 \\
-0.5 & 0 & 1}
$$
- Apply \({\bf r}_2 \leftarrow \frac{{\bf r}_2}{-2}\):
$$
\left[
\begin{array}{ccc|c}
1 & -1.5 & 1 & 2.5\\
0 & {\bf 1} & -1 & -2\\
0 & 2.5 & -2 & -2.5
\end{array}
\right]
\;\;\;\;\;\;\;\;\;\;\;\;
\mat{0.5 & 0 & 0 \\
-0.5 & -0.5 & 0 \\
-0.5 & 0 & 1}
$$
- To set zeroes below the element \({\bf 1}\),
apply \({\bf r}_3 \leftarrow {\bf r}_3 - 2.5{\bf r}_2\):
$$
\left[
\begin{array}{ccc|c}
1 & -1.5 & 1 & 2.5\\
0 & 1 & -1 & -2\\
0 & 0 & 0.5 & 2.5
\end{array}
\right]
\;\;\;\;\;\;\;\;\;\;\;\;
\mat{0.5 & 0 & 0 \\
-0.5 & -0.5 & 0 \\
0.75 & 1.25 & 1}
$$
- Next, set the coefficient in the third column, third row
to be \({\bf 1}\) with \({\bf r}_3 \leftarrow 2{\bf r}_3\):
$$
\left[
\begin{array}{ccc|c}
1 & -1.5 & 1 & 2.5\\
0 & 1 & -1 & -2\\
0 & 0 & {\bf 1} & 5
\end{array}
\right]
\;\;\;\;\;\;\;\;\;\;\;\;
\mat{0.5 & 0 & 0 \\
-0.5 & -0.5 & 0 \\
1.5 & 2.5 & 2}
$$
- There are no further rows. Observe that both boldfaced 1's
have non-zeroes above them:
$$
\left[
\begin{array}{ccc|c}
1 & -1.5 & 1 & 2.5\\
0 & {\bf 1} & -1 & -2\\
0 & 0 & {\bf 1} & 5
\end{array}
\right]
$$
We will use row operations to make these values zero as well.
- Start with the 1 in row 2, column 2 and
apply \({\bf r}_1 \leftarrow {\bf r}_1 + 1.5{\bf r}_2\):
$$
\left[
\begin{array}{ccc|c}
1 & 0 & -0.5 & -0.5\\
0 & {\bf 1} & -1 & -2\\
0 & 0 & 1 & 5
\end{array}
\right]
\;\;\;\;\;\;\;\;\;\;\;\;
\mat{-0.25 & -0.75 & 0 \\
-0.5 & -0.5 & 0 \\
1.5 & 2.5 & 2}
$$
- Now examine the entries above the 1 in row 3, column 3:
$$
\left[
\begin{array}{ccc|c}
1 & 0 & -0.5 & -0.5\\
0 & 1 & -1 & -2\\
0 & 0 & {\bf 1} & 5
\end{array}
\right]
$$
- Apply \({\bf r}_1 \leftarrow {\bf r}_1 + 0.5{\bf r}_3\):
$$
\left[
\begin{array}{ccc|c}
1 & 0 & 0 & 2\\
0 & 1 & -1 & -2\\
0 & 0 & {\bf 1} & 5
\end{array}
\right]
\;\;\;\;\;\;\;\;\;\;\;\;
\mat{0.5 & 0.5 & 1 \\
-0.5 & -0.5 & 0 \\
1.5 & 2.5 & 2}
$$
- Next, zero out the entry in row 2, column 3 by
applying
\({\bf r}_2 \leftarrow {\bf r}_2 + {\bf r}_3\):
$$
\left[
\begin{array}{ccc|c}
1 & 0 & 0 & 2\\
0 & 1 & 0 & 3\\
0 & 0 & {\bf 1} & 5
\end{array}
\right]
\;\;\;\;\;\;\;\;\;\;\;\;
\mat{0.5 & 0.5 & 1 \\
1 & 2 & 2 \\
1.5 & 2.5 & 2}
$$
- This completes the process.
- We are left with the identity matrix plus an augmented column.
$$
\left[
\begin{array}{ccc|c}
1 & 0 & 0 & 2\\
0 & 1 & 0 & 3\\
0 & 0 & 1 & 5
\end{array}
\right]
$$
This has the solution, as we can see from breaking apart the augmentation,
and rewriting in the form \({\bf Ax} = {\bf b}\):
$$
\mat{
1 & 0 & 0\\
0 & 1 & 0\\
0 & 0 & 1}
\;
\vecthree{x_1}{x_2}{x_3}
\eql
\vecthree{2}{3}{5}
$$
Multiplying, we get the solution:
$$
\vecthree{x_1}{x_2}{x_3}
\eql
\vecthree{2}{3}{5}
$$
- The other matrix to which we applied row operations is
more interesting:
$$
\mat{0.5 & 0.5 & 1 \\
1 & 2 & 2 \\
1.5 & 2.5 & 2}
$$
This is the inverse \({\bf A}^{-1}\).
- Confirm by computing
$$
{\bf A}^{-1} {\bf A}
\eql
\mat{0.5 & 0.5 & 1 \\
1 & 2 & 2 \\
1.5 & 2.5 & 2}
\mat{2 & -3 & 2 \\
-2 & 1 & 0 \\
1 & 1 & -1}
\eql
\mat{1 & 0 & 0 \\
0 & 1 & 0 \\
0 & 0 & 1}
$$
That is, \({\bf A}^{-1} {\bf A} = {\bf I}\), as we would expect.
- We can also confirm getting the solution by applying the
inverse to the right-hand-side of the equations:
$$
{\bf A}^{-1} {\bf b}
\eql
\mat{0.5 & 0.5 & 1 \\
1 & 2 & 2 \\
1.5 & 2.5 & 2}
\vecthree{5}{-1}{0}
\eql
\vecthree{2}{3}{5}
$$
- What's really useful, of course, is that once we've
computed the inverse, we can apply it to a different
right-hand-side:
- So, if the right-hand-side of the equations changed to:
$$\eqb{
2x_1 & \miss & 3x_2 & \plss & 2x_3 & \eql {\bf 0}\\
-2x_1 & \plss & x_2 & & & \eql {\bf 2}\\
x_1 & \plss & x_2 & \miss & x_3 & \eql {\bf 1}\\
}$$
- We would then simply compute
$$
{\bf A}^{-1} \vecthree{0}{2}{1}
\eql
\mat{0.5 & 0.5 & 1 \\
1 & 2 & 2 \\
1.5 & 2.5 & 2}
\vecthree{0}{2}{1}
\eql
\vecthree{2}{6}{7}
$$
to solve these new equations.
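To emphasize the reuse: once the nine numbers of \({\bf A}^{-1}\) are known,
each new right-hand-side costs only one matrix-vector multiplication. A quick
Java check of the two right-hand-sides above (class and method names are ours):

public class ReuseInverse {

    // y = M v, a single matrix-vector multiplication.
    static double[] apply (double[][] M, double[] v) {
        double[] y = new double[M.length];
        for (int i = 0; i < M.length; i++)
            for (int j = 0; j < v.length; j++)
                y[i] += M[i][j] * v[j];
        return y;
    }

    public static void main (String[] args) {
        double[][] Ainverse = {{0.5, 0.5, 1}, {1, 2, 2}, {1.5, 2.5, 2}};
        double[] x1 = apply (Ainverse, new double[] {5, -1, 0});   // the original b
        double[] x2 = apply (Ainverse, new double[] {0, 2, 1});    // the changed b
        System.out.println (x1[0] + " " + x1[1] + " " + x1[2]);    // prints 2.0 3.0 5.0
        System.out.println (x2[0] + " " + x2[1] + " " + x2[2]);    // prints 2.0 6.0 7.0
    }
}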
To summarize:
- Got equations?
$$\eqb{
2x_1 & \miss & 3x_2 & \plss & 2x_3 & \eql 5\\
-2x_1 & \plss & x_2 & & & \eql -1\\
x_1 & \plss & x_2 & \miss & x_3 & \eql 0\\
}$$
- Write 'em in augmented-matrix form alongside \({\bf I}\):
$$
\left[
\begin{array}{ccc|c}
2 & -3 & 2 & 5\\
-2 & 1 & 0 & -1\\
1 & 1 & -1 & 0
\end{array}
\right]
\;\;\;\;\;\;\;\;\;\;\;\;
\mat{1 & 0 & 0 \\
0 & 1 & 0 \\
0 & 0 & 1}
$$
- Apply row operations to both matrices trying to make
the left one the identity matrix (except for the augmented column):
$$
\left[
\begin{array}{ccc|c}
1 & 0 & 0 & 2\\
0 & 1 & 0 & 3\\
0 & 0 & 1 & 5
\end{array}
\right]
\;\;\;\;\;\;\;\;\;\;\;\;
\mat{0.5 & 0.5 & 1 \\
1 & 2 & 2 \\
1.5 & 2.5 & 2}
$$
- We can read off the solution from the left, and have
the inverse handy (on the right) for future use.
5.3
Pivots, REF, RREF
Let's work through a \(4 \times 4\) example:
- Consider this set of equations:
$$
\mat{1 & 2 & 0 & -5\\
-1 & -1 & 1 & 4\\
3 & 7 & 2 & -14\\
-2 & -2 & 4 & 13}
\vecfour{x_1}{x_2}{x_3}{x_4}
\eql
\vecfour{-5}{6}{-9}{23}
$$
- First, we form the initial augmented matrix:
$$
\left[
\begin{array}{cccc|c}
1 & 2 & 0 & -5 & -5\\
-1 & -1 & 1 & 4 & 6\\
3 & 7 & 2 & -14 & -9\\
-2 & -2 & 4 & 13 & 23
\end{array}
\right]
$$
- Now we look at the first element in row 1 and
divide to make it \(1\)
$$
\left[
\begin{array}{cccc|c}
{\bf 1} & 2 & 0 & -5 & -5\\
-1 & -1 & 1 & 4 & 6\\
3 & 7 & 2 & -14 & -9\\
-2 & -2 & 4 & 13 & 23
\end{array}
\right]
$$
Clearly, it's \(1\) and so, really, we're dividing the
whole row by \(1\).
- Once it's been made \(1\) the element will be called
a pivot:
Definition: a pivot is a coefficient that's
been reduced to \(1\) through division or multiplication
before other row ops are performed in that column.
- Now use the pivot to zero out the elements in its column
that are below the pivot:
- \({\bf r}_2 \leftarrow {\bf r}_2 + {\bf r}_1\) for
row 2.
- \({\bf r}_3 \leftarrow {\bf r}_3 - 3{\bf r}_1\) for
row 3.
- \({\bf r}_4 \leftarrow {\bf r}_4 + 2{\bf r}_1\) for
row 4.
This produces:
$$
\left[
\begin{array}{cccc|c}
{\bf 1} & 2 & 0 & -5 & -5\\
0 & 1 & 1 & -1 & 1\\
0 & 1 & 2 & 1 & 6\\
0 & 2 & 4 & 3 & 13
\end{array}
\right]
$$
- The next pivot is in row 2, column 2:
$$
\left[
\begin{array}{cccc|c}
1 & 2 & 0 & -5 & -5\\
0 & {\bf 1} & 1 & -1 & 1\\
0 & 1 & 2 & 1 & 6\\
0 & 2 & 4 & 3 & 13
\end{array}
\right]
$$
This too doesn't need to be divided.
- We now need to create zeroes below the pivot in the same column.
- \({\bf r}_3 \leftarrow {\bf r}_3 - {\bf r}_2\) for row 3.
- \({\bf r}_4 \leftarrow {\bf r}_4 - 2{\bf r}_2\) for row 4.
Which produces
$$
\left[
\begin{array}{cccc|c}
1 & 2 & 0 & -5 & -5\\
0 & {\bf 1} & 1 & -1 & 1\\
0 & 0 & 1 & 2 & 5\\
0 & 0 & 2 & 5 & 11
\end{array}
\right]
$$
- The next pivot is in row 3, column 3:
$$
\left[
\begin{array}{cccc|c}
1 & 2 & 0 & -5 & -5\\
0 & 1 & 1 & -1 & 1\\
0 & 0 & {\bf 1} & 2 & 5\\
0 & 0 & 2 & 5 & 11
\end{array}
\right]
$$
Again, we got lucky and did not have to divide through.
- We need the single row op
\({\bf r}_4 \leftarrow {\bf r}_4 - 2{\bf r}_3\) for row 4,
which produces
$$
\left[
\begin{array}{cccc|c}
1 & 2 & 0 & -5 & -5\\
0 & 1 & 1 & -1 & 1\\
0 & 0 & {\bf 1} & 2 & 5\\
0 & 0 & 0 & 1 & 1
\end{array}
\right]
$$
- Finally, the 4th column already has a 1 in the 4th row:
$$
\left[
\begin{array}{cccc|c}
1 & 2 & 0 & -5 & -5\\
0 & 1 & 1 & -1 & 1\\
0 & 0 & 1 & 2 & 5\\
0 & 0 & 0 & {\bf 1} & 1
\end{array}
\right]
$$
- In this form, the matrix has the following properties:
- All pivots are 1.
- All elements below a pivot in its column are zero.
- One can work backwards to solve the equations.
This form, an intermediate step before the final identity-like
form, is called the Row Echelon Form (REF).
- We'll now continue, and produce zeroes above each pivot.
- The first pivot has nothing above:
$$
\left[
\begin{array}{cccc|c}
{\bf 1} & 2 & 0 & -5 & -5\\
0 & 1 & 1 & -1 & 1\\
0 & 0 & 1 & 2 & 5\\
0 & 0 & 0 & 1 & 1
\end{array}
\right]
$$
- Next, look at the second pivot:
$$
\left[
\begin{array}{cccc|c}
1 & 2 & 0 & -5 & -5\\
0 & {\bf 1} & 1 & -1 & 1\\
0 & 0 & 1 & 2 & 5\\
0 & 0 & 0 & 1 & 1
\end{array}
\right]
$$
The row op
\({\bf r}_1 \leftarrow {\bf r}_1 - 2{\bf r}_2\)
produces
$$
\left[
\begin{array}{cccc|c}
1 & 0 & -2 & -3 & -7\\
0 & {\bf 1} & 1 & -1 & 1\\
0 & 0 & 1 & 2 & 5\\
0 & 0 & 0 & 1 & 1
\end{array}
\right]
$$
- The next pivot is in row 3, column 3:
$$
\left[
\begin{array}{cccc|c}
1 & 0 & -2 & -3 & -7\\
0 & 1 & 1 & -1 & 1\\
0 & 0 & {\bf 1} & 2 & 5\\
0 & 0 & 0 & 1 & 1
\end{array}
\right]
$$
The row ops
\({\bf r}_1 \leftarrow {\bf r}_1 + 2{\bf r}_3\)
and
\({\bf r}_2 \leftarrow {\bf r}_2 - {\bf r}_3\)
produce
$$
\left[
\begin{array}{cccc|c}
1 & 0 & 0 & 1 & 3\\
0 & 1 & 0 & -3 & -4\\
0 & 0 & {\bf 1} & 2 & 5\\
0 & 0 & 0 & 1 & 1
\end{array}
\right]
$$
- Finally, applying
\({\bf r}_1 \leftarrow {\bf r}_1 - {\bf r}_4\),
\({\bf r}_2 \leftarrow {\bf r}_2 + 3{\bf r}_4\)
and
\({\bf r}_3 \leftarrow {\bf r}_3 - 2{\bf r}_4\)
produces
$$
\left[
\begin{array}{cccc|c}
1 & 0 & 0 & 0 & 2\\
0 & 1 & 0 & 0 & -1\\
0 & 0 & 1 & 0 & 3\\
0 & 0 & 0 & {\bf 1} & 1
\end{array}
\right]
$$
- The augmented column now contains the solution
\((x_1,x_2,x_3,x_4) = (2,-1,3,1)\).
- This final form of the augmented matrix is called
the Row Reduced Echelon Form (RREF) of the augmented matrix.
- If we remove the augmented (last) column of the augmented
RREF, we will have the RREF of the original (non-augmented) matrix.
- For emphasis, let's identify all the pivots:
$$
\left[
\begin{array}{cccc|c}
{\bf 1} & 0 & 0 & 0 & 2\\
0 & {\bf 1} & 0 & 0 & -1\\
0 & 0 & {\bf 1} & 0 & 3\\
0 & 0 & 0 & {\bf 1} & 1
\end{array}
\right]
$$
Ignoring the augmented column,
every row has exactly one pivot, as does every column.
- In this particular \(4\times 4\) RREF, there
are exactly \(4\) pivots.
In-Class Exercise 18:
Apply the same operations in the same order to the
\(4\times 4\) identity matrix to compute the
inverse of the coefficient matrix. Then
enter the inverse matrix in
FourVariableExample.java
and check that \({\bf A}^{-1} {\bf A} = {\bf I}\).
In-Class Exercise 19:
Apply the method above, step-by-step, to solve
$$
\mat{2 & 1 & 1 & 3\\
1 & -1 & 1 & 0\\
0 & 0 & 2 & 1\\
1 & 0 & 1 & 0}
\vecfour{x_1}{x_2}{x_3}{x_4}
\eql
\vecfour{6}{-2}{-2}{1}
$$
Enter your solution in
FourVariableExample2.java
and check that \({\bf A x} = {\bf b}\).
How many pivots were identified?
Let's summarize the approach through pseudocode:
1. A+ = computeAugmentedMatrix (A, b)
2. A-1 = identity matrix of size n
3. for p=1 to n
// First, make the coefficient of the pivot equal to 1
4. α = A+[p][p]
5. for j=1 to n+1     // n+1 because the augmented column must be divided too
6. A+[p][j] = A+[p][j] / α
7. A-1[p][j] = A-1[p][j] / α     // only for j ≤ n, since A-1 has n columns
8. endfor
// Zero out the coefficients below the pivot
9. for r=p+1 to n
10. apply the appropriate row-operation to row r of A+
11. do the same for A-1
12. endfor
13. endfor
// This produces the REF in A+, but nothing useful (yet) in A-1
// Continue to the RREF ...
14. for p=1 to n
// Zero out the coefficients above the pivot
15. for r=p-1 to 1
16. apply the appropriate row-operation to row r of A+
17. do the same for A-1
18. endfor
19. endfor
20. x = last column of A+
21. return x, A-1
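Here is a direct Java rendering of this pseudocode, as a sketch under the same
assumption: every A+[p][p] is non-zero when its turn comes, so no swaps or
special cases arise. The same row operations are applied to the augmented
matrix and to a matrix that starts out as the identity; the latter ends up
holding \({\bf A}^{-1}\). (All names are ours, for illustration.)

public class SimpleGaussJordan {

    // Solve A x = b and compute A^{-1}, assuming no zero pivots are encountered.
    // Ainv must be passed in as an n x n array of zeroes; it is filled in here.
    static double[] solve (double[][] A, double[] b, double[][] Ainv) {
        int n = A.length;
        double[][] Aplus = new double[n][n + 1];                // the augmented matrix A+
        for (int i = 0; i < n; i++) {
            for (int j = 0; j < n; j++) Aplus[i][j] = A[i][j];
            Aplus[i][n] = b[i];
            Ainv[i][i] = 1;                                     // start Ainv as the identity
        }
        for (int p = 0; p < n; p++) {
            // Make the pivot coefficient 1 by dividing the whole row (augmented column too).
            double alpha = Aplus[p][p];
            for (int j = 0; j <= n; j++) Aplus[p][j] /= alpha;
            for (int j = 0; j < n; j++)  Ainv[p][j]  /= alpha;
            // Zero out the coefficients below the pivot.
            for (int r = p + 1; r < n; r++) {
                double factor = Aplus[r][p];
                for (int j = 0; j <= n; j++) Aplus[r][j] -= factor * Aplus[p][j];
                for (int j = 0; j < n; j++)  Ainv[r][j]  -= factor * Ainv[p][j];
            }
        }
        // This is the REF; continue to the RREF by zeroing out above each pivot.
        for (int p = 0; p < n; p++) {
            for (int r = p - 1; r >= 0; r--) {
                double factor = Aplus[r][p];
                for (int j = 0; j <= n; j++) Aplus[r][j] -= factor * Aplus[p][j];
                for (int j = 0; j < n; j++)  Ainv[r][j]  -= factor * Ainv[p][j];
            }
        }
        double[] x = new double[n];
        for (int i = 0; i < n; i++) x[i] = Aplus[i][n];         // the last column of A+
        return x;
    }

    public static void main (String[] args) {
        double[][] A = {{2, -3, 2}, {-2, 1, 0}, {1, 1, -1}};
        double[][] Ainv = new double[3][3];
        double[] x = solve (A, new double[] {5, -1, 0}, Ainv);
        System.out.println (x[0] + " " + x[1] + " " + x[2]);                   // prints 2.0 3.0 5.0
        System.out.println (Ainv[0][0] + " " + Ainv[0][1] + " " + Ainv[0][2]); // prints 0.5 0.5 1.0
    }
}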
A question arises:
- Why compute the inverse at all, when we are already computing
the solution \({\bf x}\)?
- In many applications, it turns out, the coefficients remain
the same while the right-hand-side changes.
- Then, we compute the inverse just once, and apply it to
each different right-hand-side: \({\bf x} = {\bf A}^{-1} {\bf b}\).
Thus, it appears we have an algorithm to solve \({\bf A x} = {\bf b}\).
Unfortunately, we need to refine the algorithm to handle a few
cases:
- When a pivot happens to be zero.
- When no solution exists.
- When multiple solutions solve the problem.
5.4
Handling special cases: no-solution, multiple-solutions
Row swaps:
- Suppose, after row operations for the first column,
we end up with
$$
\left[
\begin{array}{ccc|c}
{\bf 1} & 2 & -1 & 3\\
0 & 0 & 2 & 10\\
0 & 3 & -1 & 4
\end{array}
\right]
$$
Here, the pivot is 1, and the column below has zeroes, as desired.
- The next pivot should be in row 2, column 2:
$$
\left[
\begin{array}{ccc|c}
1 & 2 & -1 & 3\\
0 & {\bf 0} & 2 & 10\\
0 & 3 & -1 & 4
\end{array}
\right]
$$
But with a \(0\) in the pivot position, no multiple of row 2 can remove
the 3 below it.
- In this situation, we look further down the column and swap in
the nearest row below that has a non-zero element in
that column:
$$
\left[
\begin{array}{ccc|c}
1 & 2 & -1 & 3\\
0 & {\bf 3} & -1 & 4\\
0 & 0 & 2 & 10
\end{array}
\right]
$$
- As we've seen, swapping rows has no effect on the solution.
- Thus, we're going to include the possibility of
looking further below in the next column to find the next pivot.
- What happens if all the elements below (in the next
column) are zero?
\(\rhd\)
That column will have no pivot.
The case when a column has no pivot:
- Consider this example:
$$
\left[
\begin{array}{ccc|c}
1 & 2 & -1 & 3\\
0 & {\bf 0} & 2 & 10\\
0 & {\bf 0} & -1 & 4
\end{array}
\right]
$$
- The first column's pivot has been found.
- The second column has no pivot because, below
row 1, all the elements are zero.
- In this case, there will be no pivot in the 2nd column.
We move on to the next column:
$$
\left[
\begin{array}{ccc|c}
1 & 2 & -1 & 3\\
0 & 0 & {\bf 2} & 10\\
0 & 0 & -1 & 4
\end{array}
\right]
$$
- We still continue and complete the RREF, allowing that
some columns will be skipped and won't have pivots.
What happens when there is no solution:
In-Class Exercise 21:
Apply row operations to
$$
\left[
\begin{array}{ccc|c}
2 & -3 & 2 & 5\\
-3 & 7 & -5 & -9\\
1 & 1 & -1 & 0
\end{array}
\right]
$$
Identify the pivots.
Multiple solutions:
- Suppose we end up with
$$
\left[
\begin{array}{ccc|c}
1 & 2 & -1 & 3\\
0 & {\bf 1} & 2 & 13\\
0 & 2 & 4 & 26
\end{array}
\right]
$$
- Now applying
\({\bf r}_3 \leftarrow {\bf r}_3 - 2{\bf r}_2\)
we get
$$
\left[
\begin{array}{ccc|c}
1 & 2 & -1 & 3\\
0 & {\bf 1} & 2 & 13\\
0 & 0 & 0 & 0
\end{array}
\right]
$$
We've now run out of rows.
- Nonetheless, continuing, let's zero out above the pivot
by applying
\({\bf r}_1 \leftarrow {\bf r}_1 - 2{\bf r}_2\)
to get
$$
\left[
\begin{array}{ccc|c}
1 & 0 & -5 & -23\\
0 & {\bf 1} & 2 & 13\\
0 & 0 & 0 & 0
\end{array}
\right]
$$
- To get some insight, let's write out the equations:
$$\eqb{
{\bf x_1} & & & \miss & 5x_3 & \eql -23\\
& \plss & {\bf x_2} & \plss & 2x_3 & \eql 13\\
}$$
Here, we've emphasized the variables corresponding to the
two pivots.
- Observe that setting \(x_3=0\) gives a solution:
\((x_1,x_2,x_3) = (-23,13,0)\).
- So does \(x_3=1\), which gives a different solution:
\((x_1,x_2,x_3) = (-18,11,1)\).
- Any value of \(x_3\) results in a solution (we'll write out the whole family below).
- Terminology:
- Variables like \(x_3\) that were not pivots are called
free variables.
- Variables corresponding to pivots are called basic variables.
- In a system with multiple solutions, each setting of the
free variables gives a different solution for the basic variables.
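Using this terminology in the example above: set the free variable
\(x_3 = t\), where \(t\) is any number. The two equations then give
\(x_1 = -23 + 5t\) and \(x_2 = 13 - 2t\), so the entire family of
solutions can be written as
$$
\vecthree{x_1}{x_2}{x_3}
\eql
\vecthree{-23}{13}{0} + t \vecthree{5}{-2}{1}
$$
The two solutions listed earlier are the cases \(t=0\) and \(t=1\).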
In-Class Exercise 22:
Apply row operations to
$$
\left[
\begin{array}{ccc|c}
2 & -3 & 2 & 5\\
-3 & 7 & -5 & -10\\
1 & 1 & -1 & 0
\end{array}
\right]
$$
Identify the pivots and one possible solution.
The number of pivots of a matrix seems like a fundamental
property of the matrix, and it is:
- Definition: the number of pivots in a matrix is called
the rank of the matrix.
For now, we'll just state the definition. The next module will show
how critical this is.
Summary of the RREF procedure:
- If a pivot is zero, try to swap in a row from below with a non-zero
element in the same column.
- If the next column does not have any candidate pivot, look
at columns further to the right.
\(\rhd\)
This means the skipped column will not have a pivot.
- If any row has all zeroes except for a non-zero entry in the last
(augmented) column, we have a contradiction
\(\rhd\)
There is no solution.
- If there is no contradiction, but a row has all zeroes,
there might be multiple solutions.
- The non-pivot variables, which are free variables, are typically
set to zero.
- The pivot variables are the basic variables, whose solution
depends on setting the free variables.
- The columns with pivots will turn out to be special later
\(\rhd\)
They are the linearly independent columns.
In-Class Exercise 24:
Why is it true that a non-pivot row has all zeroes
except possibly in the last (augmented) column?
5.5
An algorithm
Let's now refine our RREF-based algorithm to handle the special cases:
Algorithm: computeRREF (A, b)
Input:: matrix A, vector b
1. A+ = computeAugmentedMatrix (A, b)
2. A-1 = identity matrix of size n
3. currentRow = 1, currentCol = 1
// Now search for the next column that has a non-zero element
// below the currentRow
4. (r,c) = findNextPivot (currentRow, currentCol)
5. while r > 0 and c > 0
// Swap into currentRow if needed
6. if r > currentRow
7. swap (r, currentRow)
8. endif
9. currentCol = c
// The next pivot is now at (currentRow, currentCol)
10. isPivotColumn[currentCol] = true
11. numPivots = numPivots + 1
// Make the coefficient of the pivot equal to 1
12. α = A+[currentRow][currentCol]
13. for j=1 to n+1     // n+1 because the augmented column must be divided too
14. A+[currentRow][j] = A+[currentRow][j] / α
15. A-1[currentRow][j] = A-1[currentRow][j] / α     // only for j ≤ n
16. endfor
// Zero out the coefficients below the pivot
17. for R=currentRow+1 to n
18. apply the appropriate row-operation to row R of A+
19. do the same for A-1
20. endfor
// All pivots will be in successive rows from the top, but
// need not occur in successive columns.
21. currentRow = currentRow + 1
22. currentCol = currentCol + 1
23. (r,c) = findNextPivot (currentRow, currentCol)
24. endwhile
// This produces the REF in A+, but nothing useful (yet) in A-1
// Continue to the RREF ...
25. for k=1 to numPivots
// Zero out the coefficients above the pivot
26. (rp,cp) = findPivot (k)
27. for r=rp-1 to 1
28. apply the appropriate row-operation to row r of A+
29. do the same for A-1
30. endfor
31. endfor
// Now analyze RREF
32. if existsContradictionRow ()
33. return no solution
34. else if existsAllZeroRows ()
35. // ... handle multiple solutions ...
36. else
37. x = last column of A+
38. return x, A-1
39. endif
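Below is a Java sketch of the pivot-search part of this refined algorithm: it
fills in findNextPivot, performs the row swap, and classifies the outcome from
the RREF. To stay short it zeroes out a pivot's whole column in one pass
(below and above) rather than in the two passes of the pseudocode (the
resulting RREF is the same), it omits the \({\bf A}^{-1}\) bookkeeping, and it
only reports that multiple solutions exist rather than describing them. All
names are ours, and the round-off issues discussed later are ignored.

public class ComputeRREF {

    static final double EPSILON = 1e-9;    // treat anything smaller than this as zero

    // Reduce the augmented matrix [A | b] (n rows, n+1 columns) to RREF in place,
    // then report: unique solution, no solution, or multiple solutions.
    static void solve (double[][] Aplus) {
        int n = Aplus.length;
        int numCols = n;                    // columns of A; the extra last column is b
        int currentRow = 0;
        int numPivots = 0;

        for (int c = 0; c < numCols && currentRow < n; c++) {
            // findNextPivot: look down column c, from currentRow, for a non-zero element.
            int r = -1;
            for (int i = currentRow; i < n; i++)
                if (Math.abs (Aplus[i][c]) > EPSILON) { r = i; break; }
            if (r < 0) continue;            // no pivot in this column; move right

            // Swap the found row into currentRow if needed.
            double[] temp = Aplus[currentRow];  Aplus[currentRow] = Aplus[r];  Aplus[r] = temp;

            // Scale so the pivot becomes 1, then zero out the rest of its column.
            double alpha = Aplus[currentRow][c];
            for (int j = 0; j <= numCols; j++) Aplus[currentRow][j] /= alpha;
            for (int i = 0; i < n; i++) {
                if (i == currentRow) continue;
                double factor = Aplus[i][c];
                for (int j = 0; j <= numCols; j++) Aplus[i][j] -= factor * Aplus[currentRow][j];
            }
            numPivots++;
            currentRow++;
        }

        // Analyze the RREF: the non-pivot rows are the ones below the last pivot.
        for (int i = numPivots; i < n; i++)
            if (Math.abs (Aplus[i][numCols]) > EPSILON) {
                System.out.println ("No solution (contradiction row)");
                return;
            }
        if (numPivots < numCols) {
            System.out.println ("Multiple solutions (" + numPivots + " pivots, so free variables exist)");
            return;
        }
        System.out.print ("Unique solution:");
        for (int i = 0; i < n; i++) System.out.print (" " + Aplus[i][numCols]);
        System.out.println ();
    }

    public static void main (String[] args) {
        double[][] Aplus = {
            {2, -3, 2, 5},
            {-2, 1, 0, -1},
            {1, 1, -1, 0}
        };
        solve (Aplus);                      // prints: Unique solution: 2.0 3.0 5.0
    }
}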
Some terminology and further definitions:
- The term Gaussian elimination is used to describe
the combination of:
- Applying row ops to find the REF (steps 1-24 above).
- Then, using backwards substitution to actually
produce a solution.
- The term Gauss-Jordan elimination is used to
describe the full sequence of steps leading to the RREF.
- Now for an important observation about computing the RREF:
- None of the row operation decisions depended on
the last, augmented column.
- Thus, we would apply the same row ops no matter
what the value of \({\bf b}\).
- What this means: we could compute the RREF of the matrix
\({\bf A}\) without augmentation.
- Of course, we would not have a solution without applying
the same operations to the augmented column.
- For some of the theoretical arguments, it will be useful
to consider the RREF of the matrix without augmentation.
- Notation:
- \({\bf A}^\prime = \mbox{RREF}({\bf A})\), the RREF of the
non-augmented \({\bf A}\).
- \({\bf A}_{+}^\prime = \mbox{RREF}({\bf A}_{+})\), the RREF of the
augmented matrix \({\bf A}_{+}\).
- Recall, we used the notation \({\bf A}_{+}\) for the
augmented matrix.
- All of the theoretical arguments below will refer
to the non-augmented RREF.
\(\rhd\)
We will explicitly use the term augmented when needed.
- Definition: A matrix, following Gauss-Jordan
elimination without augmentation,
is in Row Reduced Echelon Form (RREF) if:
- No pivot shares a row or column with another pivot.
- All entries below and above a pivot are zero.
- Except for the first one, a pivot's column is to the
right of the previous row's pivot's column.
- All the pivot rows are bunched together at the top.
- The remaining non-pivot rows are all zero.
- An example:
$$
\left[
\begin{array}{cccccc}
{\bf 1} & * & 0 & * & * & 0\\
0 & 0 & {\bf 1} & * & * & 0\\
0 & 0 & 0 & 0 & 0 & {\bf 1}\\
0 & 0 & 0 & 0 & 0 & 0\\
0 & 0 & 0 & 0 & 0 & 0\\
0 & 0 & 0 & 0 & 0 & 0\\
\end{array}
\right]
$$
Note:
- Three pivots above are in columns 1, 3, and 6.
- All other elements in pivot columns are zero.
- Because we search for pivots in the next row, skipping
past zero elements, all elements to the left of a pivot
are zero (otherwise, we'd have picked one of them as pivot).
- Definition: An \(n\times n\)
RREF matrix has full-rank if every row has a pivot.
\(\rhd\)
A full-rank RREF is the identity matrix, i.e.,
\({\bf A}^\prime = {\bf I}\).
- Definition:
We will use the term partial-rank for a square RREF
matrix that is not full-rank.
\(\rhd\)
A partial-rank RREF has at least one all-zero row.
- Finally, some linear algebra jargon:
Definition: a square
matrix \({\bf A}\) that has an inverse
\({\bf A}^{-1}\) where \({\bf A}^{-1} {\bf A} = {\bf I} = {\bf A}{\bf A}^{-1}\)
is called a nonsingular matrix.
- Thus, a singular matrix has no inverse.
- Sometimes we say a matrix is invertible
if it has an inverse.
5.6
Why it works: some theory for \(n \times n\) matrices
How do we know (prove) that the above approach will work
for every \({\bf Ax}={\bf b}\) problem?
What we'd like to prove, at the very least:
- The RREF algorithm produces a matrix that satisfies
the RREF definition.
- When the RREF completes with a pivot on every row,
the same row operations applied to \({\bf I}\)
produce the inverse.
- When the RREF fails to be full-rank
(that is, there's at least one all-zero row), the inverse does not exist.
- Can \({\bf A}\) have more than one inverse, and if so,
which one is computed by the procedure?
It turns out, there is a nice relationship between the
existence of an inverse and uniqueness of solution:
- \({\bf x}={\bf A}^{-1}{\bf b}\) is the unique
solution when the inverse exists.
- If a unique solution exists, so does the inverse.
- The inverse is itself unique (so, we say "the" inverse
instead of "an" inverse).
Let's do the easy ones first:
- Proposition 5.1:
The RREF algorithm produces a matrix that satisfies
the RREF definition.
Proof:
There are five aspects to the definition, as we've seen:
- No pivot shares a row or column with another pivot.
- All entries below and above a pivot are zero.
- Except for the first one, a pivot's column is to the
right of the previous row's pivot's column.
- All the pivot rows are bunched together at the top.
- The remaining non-pivot rows are all zero.
Clearly, just by the way the algorithm works,
none of the pivots share a row or column, and every successive
pivot occurs to the right of the previous one. By row
swapping, all the pivot rows are bunched at the top.
All that remains is to prove that the non-pivot rows have
only zeroes. Suppose a non-pivot row has a non-pivot element
\(c \neq 0\) in column \(k\), as in this example:
$$
\left[
\begin{array}{cccccc}
{\bf 1} & * & 0 & * & * & 0\\
0 & 0 & {\bf 1} & * & * & 0\\
0 & 0 & 0 & 0 & 0 & {\bf 1}\\
0 & 0 & 0 & 0 & d & 0\\
0 & 0 & 0 & 0 & 0 & 0\\
0 & 0 & 0 & 0 & c & 0\\
\end{array}
\right]
$$
When it was column \(k\)'s
turn in the search for a pivot, the algorithm either found a non-zero
element in that column (such as \(d\)) and made it the pivot, or it
found none and skipped the column. In the first case, that pivot would
have eliminated \(c\) through row reduction; in the second, \(c\) must
have been zero at that time, and later row operations (which only
subtract multiples of pivot rows that have a zero in column \(k\))
cannot make it non-zero. Either way, we have
a contradiction (to the purported existence of \(c\)).
\(\Box\)
- Proposition 5.2:
When the RREF completes with a pivot on every row,
the same row operations applied to \({\bf I}\)
produces the inverse.
Proof:
- We already proved this, but let's repeat for clarity.
- If the RREF completes with a full set of pivots, this means
there were row operations and corresponding
row-operation matrices \({\bf R}_1, {\bf R}_2, \ldots,
{\bf R}_k\) such that
$$
{\bf R}_k {\bf R}_{k-1} \cdots {\bf R}_2 {\bf R}_1 {\bf A}
\eql
{\bf I}
$$
- Let
$$
{\bf R} \defn {\bf R}_k {\bf R}_{k-1} \cdots {\bf R}_2 {\bf R}_1
$$
- Which means
$$
{\bf R} {\bf A}
\eql
{\bf I}
$$
And so, \({\bf R} \eql {\bf A}^{-1}\). Technically, since at this
juncture we don't yet know that the inverse is unique,
we should say that \({\bf R}\) is an inverse.
- Next multiply each side by the identity to see that
\(
{\bf R} {\bf I}
\eql
{\bf A}^{-1}
\)
\(\Box\)
- Proposition 5.3: \({\bf x}={\bf A}^{-1}{\bf b}\) is the unique
solution when the inverse exists.
Proof:
If there are two solutions \({\bf Ax}={\bf b}\) and
\({\bf Ay}={\bf b}\), then because the inverse exists
(as assumed), we'll multiply both sides by
the inverse:
\({\bf A}^{-1}{\bf Ax}={\bf A}^{-1}{\bf b}\)
and
\({\bf A}^{-1}{\bf Ay}={\bf A}^{-1}{\bf b}\).
Which means
\({\bf I x}={\bf A}^{-1}{\bf b}\)
and
\({\bf I y}={\bf A}^{-1}{\bf b}\).
Or both \({\bf x}\) and \({\bf y}\) are equal (to
\({\bf A}^{-1}{\bf b}\)).
\(\Box\)
- Proposition 5.4: The inverse is unique.
Proof:
Suppose there are two matrices \({\bf B}\) and \({\bf C}\) such
that \({\bf BA} = {\bf I}\) and \({\bf CA} = {\bf I}\).
Then,
$$\eqb{
{\bf B}
& \eql &
{\bf B I}\\
& \eql &
{\bf B} ({\bf AC}) \\
& \eql &
({\bf BA}) {\bf C}\\
& \eql &
{\bf I} {\bf C}\\
& \eql &
{\bf C}\\
}$$
Notice that we replaced \({\bf I}\) with \({\bf AC}\) above,
which assumes that \({\bf C}\) multiplying on the right
is an inverse. If our definition of inverse was only a left
inverse, we'd need an alternative proof.
\(\Box\)
Now for the harder ones.
It will help to consider the special case \({\bf b}={\bf 0}\):
- Recall: \({\bf 0}\) is the zero vector:
\({\bf 0} = (0,0,\ldots,0)\)
- The equation \({\bf A x} = {\bf 0}\) is of special
importance in the theory.
- First, observe that \({\bf A x} = {\bf 0}\) always
has at least one solution, \({\bf x} = {\bf 0}\), for
any matrix \({\bf A}\).
- Proposition 5.5:
Consider the equation \({\bf A}^\prime {\bf x} = {\bf 0}\)
where \({\bf A}^\prime\) is an RREF matrix. Then:
- if \({\bf A}^\prime\) is full-rank, \({\bf x}={\bf 0}\) is the only
solution.
- if \({\bf A}^\prime\) is not full-rank, there is at least
one non-zero solution \({\bf x} \neq {\bf 0}\)
In-Class Exercise 25:
First, create example \(3\times 3\) RREFs (just the final RREFs, without the
starting equations) for each of the two cases. Show how
Proposition 5.5 is true for your example RREFs.
Then prove Proposition 5.5.
We'll now prove the remaining results:
- Proposition 5.6:
When the RREF fails to be full-rank
because of zero rows, the inverse does not exist.
Proof:
Suppose, to the contrary, that the inverse \({\bf A}^{-1}\) exists.
Then consider \({\bf Ax} = {\bf 0}\).
$$\eqb{
\implies & {\bf A}^{-1} {\bf Ax} & \eql & {\bf A}^{-1} {\bf 0}\\
\implies & {\bf I x} & \eql & {\bf 0}\\
\implies & {\bf x} & \eql & {\bf 0}\\
}$$
Thus, if \({\bf x}\neq {\bf 0}\) is a solution, the inverse
does not exist. But Proposition 5.5 showed that when the RREF is not
full-rank, \({\bf A}^\prime {\bf x}={\bf 0}\) has a non-zero solution,
and (because row operations do not change solutions) so does
\({\bf Ax}={\bf 0}\).
Thus, if the RREF is not full-rank, the inverse does not exist.
\(\Box\)
- The observation inside the proof of Proposition 5.6 can be strengthened,
and will shortly be useful.
- The proof of 5.6 showed that if \({\bf A}^{-1}\) exists
then \({\bf Ax}={\bf 0}\) has the unique solution \({\bf x}={\bf 0}\)
- The other direction turns out to be true as well:
if \({\bf Ax}={\bf 0}\) has the unique solution \({\bf
x}={\bf 0}\) then \({\bf A}^{-1}\) exists.
- Proposition 5.7:
The inverse \({\bf A}^{-1}\) exists if and only if \({\bf Ax}={\bf 0}\)
has the unique solution \({\bf x}={\bf 0}\).
Proof:
Proposition 5.6 proved the forward direction. Let's now
take as a given that \({\bf Ax}={\bf 0}\)
has the unique solution \({\bf x}={\bf 0}\).
The RREF
process applied to \({\bf Ax}={\bf 0}\) results in an RREF
\({\bf A}^\prime\) with \({\bf 0}\) in the augmented column
because no row reduction changes an all-zero column.
If \({\bf A}^\prime\) were not full-rank, then by Proposition 5.5,
there would be a non-zero solution, so \({\bf A}^\prime\)
is full-rank. Then, because
\({\bf A}^\prime\) is full-rank, by Proposition 5.2 above,
the inverse exists.
\(\Box\)
- Proposition 5.8:
If \({\bf A}\) does have an inverse, the procedure
always finds it.
Proof:
What we need to worry about is the case where there is an inverse
but somehow the algorithm finds an RREF that's not full-rank.
But Proposition 5.6 above showed that if the RREF is not
full-rank, the inverse does not exist. We also showed that
the algorithm always produces an RREF (Proposition 5.1), which
means that the only RREF produced when an inverse exists is
the full-rank one.
\(\Box\)
- Proposition 5.9
If a unique solution to \({\bf Ax}={\bf b}\) exists, the
inverse \({\bf A}^{-1}\) exists.
Proof:
Suppose the unique solution to \({\bf Ax}={\bf b}\)
is \({\bf x}^\prime\).
From Proposition 5.7, if the inverse does not exist, then we know that
\({\bf Ax}={\bf 0}\)
has a non-zero solution \({\bf y}\neq {\bf 0}\).
Let \({\bf z} = {\bf x}^\prime + {\bf y}\).
Because \({\bf y}\neq {\bf 0}\), it must be that
\({\bf z} \neq {\bf x}^\prime\).
Now,
$$\eqb{
{\bf A z}
& \eql &
{\bf A} ( {\bf x}^\prime + {\bf y} ) \\
& \eql &
{\bf A} {\bf x}^\prime + {\bf A}{\bf y} \\
& \eql &
{\bf b} + {\bf A}{\bf y} \\
& \eql &
{\bf b} + {\bf 0} \\
& \eql &
{\bf b}
}$$
Which means that \({\bf z}\) is a different (from
the supposed unique) solution
to \({\bf Ax}={\bf b}\), a contradiction.
The contradiction followed from assuming the inverse did not
exist, which means that it does.
\(\Box\)
There's a neat implication of the above results:
- All of the following are equivalent (each one implies any of
the others):
- \({\bf A}^{-1}\) exists.
- \({\bf Ax}={\bf b}\) has a unique solution.
- \({\bf Ax}={\bf 0}\) has the unique solution \({\bf x}={\bf 0}\).
- Each row and column of the RREF has exactly one pivot, with
all other entries zero
\(\rhd\)
That is, \(RREF({\bf A}) = {\bf I}\).
Finally, some meta-level observations about the proofs above:
- One wonders why the proofs seemed so roundabout, and why
solutions to \({\bf Ax}={\bf 0}\) were important.
- There are a couple of key ideas.
- First, the uniqueness of solution to the RREF equation
\({\bf A^\prime x}={\bf 0}\) (in 5.5) is reasoned directly from the RREF.
- Because row ops don't change the solution, this means
the same result applies to \({\bf A x}={\bf 0}\).
- Also, the uniqueness tells us whether or not the inverse
exists since uniqueness implies the RREF=\({\bf I}\).
- Lastly, uniqueness of the \({\bf A x}={\bf 0}\) solution
translates into uniqueness of \({\bf A x}={\bf b}\), via
the algebraic argument in Proposition 5.9.
5.7
Non-square matrices
Intuition suggests the following:
- If there are too few equations (more columns than rows),
we are likely to have multiple solutions.
- If there are too many equations, it's unlikely we'll find
a solution (because there are too many equations to satisfy).
This is generally the case but there are exceptions:
- We saw earlier that
$$
\mat{2 & 0\\
0 & 3\\
1 & 1\\
3 & 0}
\vectwo{x_1}{x_2}
\eql
\vecfour{4}{9}{5}{6}
$$
has a solution \((x_1,x_2)=(2,3)\).
- Observe that the 3rd and 4th rows can be derived from
linear combinations of the first two.
- This suggests redundancy in the equations but
consistency (since a solution exists).
What can we say about non-square matrices and our solution approach?
- The RREF algorithm will still run to completion.
- When an augmented matrix has a contradiction, there will
be no solution.
- If the number of pivots is exactly the same as the number
of variables (with the other rows zero), there is a unique solution
\(\rhd\)
See the example above.
- If there are fewer pivots than variables, there are multiple
solutions with basic and free variables.
\(\rhd\)
Basic variables correspond to the pivot columns.
5.8
Alternative pivoting
We don't always have to pick the very next row when seeking
a pivot:
- In the \(4\times 4\) example earlier, after
dealing with the first pivot in column 1, we arrived at:
$$
\left[
\begin{array}{cccc|c}
1 & 2 & 0 & -5 & -5\\
0 & {\bf 1} & 1 & -1 & 1\\
0 & 1 & 2 & 1 & 6\\
0 & 2 & 4 & 3 & 13
\end{array}
\right]
$$
- Now, instead of the above pivot in the second row, we could
swap any other row into second place, e.g.,
$$
\left[
\begin{array}{cccc|c}
1 & 2 & 0 & -5 & -5\\
0 & {\bf 2} & 4 & 3 & 13\\
0 & 1 & 2 & 1 & 6\\
0 & 1 & 1 & -1 & 1\\
\end{array}
\right]
$$
- In fact, any choice of a non-zero number in the pivot
column that's below the original choice will do.
- This choice leads to the question: are some pivot
choices better than others?
Consider this well-known example:
- Suppose we have the equations
$$\eqb{
10^{-k} x_1 & \plss & x_2 & \eql 1\\
x_1 & \miss & x_2 & \eql 0
}$$
- Solving by hand gives \(x_1 = x_2 = 1/(1+10^{-k}) \approx 1\),
if \(k\) is large.
- But finite arithmetic can cause a problem, as we'll see.
- The starting point is the augmented matrix
$$
\left[
\begin{array}{cc|c}
10^{-k} & 1 & 1\\
1 & -1 & 0
\end{array}
\right]
$$
- The first row scaled results in
$$
\left[
\begin{array}{cc|c}
{\bf 1} & 10^k & 10^k\\
1 & -1 & 0
\end{array}
\right]
$$
- Zeroing out the 1 below the pivot
$$
\left[
\begin{array}{cc|c}
{\bf 1} & 10^k & 10^k\\
0 & -1-10^k & -10^k
\end{array}
\right]
$$
For large \(k\), this will be represented by
$$
\left[
\begin{array}{cc|c}
{\bf 1} & 10^k & 10^k\\
0 & -10^k & -10^k
\end{array}
\right]
$$
which reduces to
$$
\left[
\begin{array}{cc|c}
{\bf 1} & 10^k & 10^k\\
0 & 1 & 1
\end{array}
\right]
$$
- This gets solved as \((x_1,x_2) = (0,1)\), drastically
different.
- The problem was that a tiny pivot, through scaling,
resulted in a huge effect on other entries.
- Instead, if we'd swapped rows initially, we'd have
$$
\left[
\begin{array}{cc|c}
1 & -1 & 0\\
10^{-k} & 1 & 1\\
\end{array}
\right]
$$
which reduces to
$$
\left[
\begin{array}{cc|c}
1 & -1 & 0\\
0 & 1+10^{-k} & 1\\
\end{array}
\right]
$$
which gets rounded to
$$
\left[
\begin{array}{cc|c}
1 & -1 & 0\\
0 & 1 & 1\\
\end{array}
\right]
$$
- This gets reduced to:
$$
\left[
\begin{array}{cc|c}
1 & 0 & 1\\
0 & 1 & 1\\
\end{array}
\right]
$$
and the solution \((x_1,x_2) = (1,1)\).
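This effect is easy to reproduce in ordinary double-precision arithmetic. The
Java sketch below (ours, for illustration) runs the elimination twice on the
same \(2\times 2\) system with \(10^{-k} = 10^{-20}\): once with the tiny
pivot first, once with the rows swapped.

public class TinyPivotDemo {

    // Eliminate and back-substitute for a 2 x 2 augmented matrix, in the given row order.
    static double[] solve (double[][] Aplus) {
        // Scale row 1 so its leading coefficient is 1.
        double alpha = Aplus[0][0];
        for (int j = 0; j < 3; j++) Aplus[0][j] /= alpha;
        // Zero out the leading coefficient of row 2.
        double factor = Aplus[1][0];
        for (int j = 0; j < 3; j++) Aplus[1][j] -= factor * Aplus[0][j];
        // Backwards substitution.
        double x2 = Aplus[1][2] / Aplus[1][1];
        double x1 = Aplus[0][2] - Aplus[0][1] * x2;
        return new double[] {x1, x2};
    }

    public static void main (String[] args) {
        double tiny = 1e-20;
        // Tiny pivot first: the round-off in (-1 - 10^20) -> -10^20 destroys the answer.
        double[] bad  = solve (new double[][] {{tiny, 1, 1}, {1, -1, 0}});
        // Rows swapped first (what partial pivoting would do): an accurate answer.
        double[] good = solve (new double[][] {{1, -1, 0}, {tiny, 1, 1}});
        System.out.println (bad[0]  + " " + bad[1]);     // prints 0.0 1.0
        System.out.println (good[0] + " " + good[1]);    // prints 1.0 1.0
    }
}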
Partial pivoting:
- In this approach, when facing a choice of pivots, one finds
the pivot with the largest absolute value and swaps that row.
- This strategy has been found to be effective in practice.
- This pivoting option is known as partial pivoting.
- Changing the pivot strategy will change the REF so obtained,
but interestingly, not the RREF.
5.9
Uniqueness of the RREF
First, note that our particular algorithm above is deterministic,
and so, for any matrix, there's only one output.
However, an alternative RREF algorithm could use a different
pivoting strategy.
Nonetheless, as we'll show, there's one and only one RREF
for a matrix.
First, for illustration, let's violate the RREF rules:
- Consider the (not augmented) matrix
$$
\mat{
1 & 1 & 1 & 0 & 3\\
-1 & 0 & 1 & 1 & -1\\
0 & 1 & 2 & 1 & 2\\
0 & 0 & 0 & 0 & 0
}
$$
- Suppose we violate the rules and start with the
second column:
$$
\mat{
1 & {\bf 1} & 1 & 0 & 3\\
-1 & 0 & 1 & 1 & -1\\
0 & 1 & 2 & 1 & 2\\
0 & 0 & 0 & 0 & 0
}
$$
Then \({\bf r}_3 \leftarrow {\bf r}_3 - {\bf r}_1\) produces
$$
\mat{
1 & {\bf 1} & 1 & 0 & 3\\
-1 & 0 & 1 & 1 & -1\\
-1 & 0 & 1 & 1 & -1\\
0 & 0 & 0 & 0 & 0
}
$$
- Next, suppose we use the third column for the next pivot:
$$
\mat{
1 & {\bf 1} & 1 & 0 & 3\\
-1 & 0 & {\bf 1} & 1 & -1\\
-1 & 0 & 1 & 1 & -1\\
0 & 0 & 0 & 0 & 0
}
$$
Which results, after two row ops, in
$$
\mat{
2 & {\bf 1} & 0 & -1 & 4\\
-1 & 0 & {\bf 1} & 1 & -1\\
0 & 0 & 0 & 0 & 0\\
0 & 0 & 0 & 0 & 0
}
$$
- This is not in RREF form.
- What should be clear is that the left-to-rightness of
the RREF procedure matters.
\(\rhd\)
Pivots are created starting from the left and going only
as far right as needed.
- Thus, it's only the pivoting strategy that can possibly
change an RREF.
Uniqueness:
- If an RREF is full rank, it's already unique because it's
the identity matrix \({\bf I}\).
- So, any non-uniqueness, if it occurs, must occur
in the non-pivot columns.
- The intuition for why the RREF is unique can be seen by
examining two successive pivot columns with a gap between,
for example:
$$
\mat{
\ldots & 0 & * & * & 0 & \ldots \\
\ldots & {\bf 1} & * & * & 0 & \ldots \\
\ldots & 0 & * & * & {\bf 1} & \ldots \\
\ldots & 0 & 0 & 0 & 0 & \ldots \\
\ldots & 0 & 0 & 0 & 0 & \ldots
}
$$
- Note that no matter what pivoting strategy was used, it
does not affect the zeroes in the columns in between the
two pivot columns above.
- Thus, the location of the second pivot is the
same, even if a different pivoting strategy was used
within that column (via row swaps).
- This means that the pivots occur in the same columns,
irrespective of pivoting strategy.
- Then, all we need to worry about are the numbers in other
columns. Are they the same if different pivoting is used?
- These numbers are in non-pivot columns.
- They are completely determined by the pivots to their left.
- Once a pivot is determined and scaled to 1, the row ops
used to eliminate numbers above the pivot are the same.
- This means numbers in non-pivot columns will be the same, no
matter what pivoting strategy is used.
- A more formal but less intuitive proof can be found
in Hefferon's book (pages 58-59).
5.10
Summary, some questions, and some tips
A high-level summary:
- It is possible to systematically solve a large system of
simultaneous linear equations.
- There are three basic row operations: scale, swap, and
add a multiple of another row.
- Any of the three fundamental row operations can be expressed
as a matrix that transforms the target matrix (the one row ops are
applied to).
- The series of row ops, when applied to the identity
matrix \({\bf I}\), produces the inverse \({\bf A}^{-1}\)
(provided a unique solution exists).
- The augmented matrix is a convenient data structure to be
able to apply row operations to both sides of the equations.
- For the simplest \(n\times n\) case with a unique solution,
the approach is straightforward
\(\rhd\)
Each column and each row will have a pivot.
- There are special cases to be handled by expanding the
basic algorithm.
- For a square matrix \({\bf A}\), the following are all equivalent:
- \({\bf A}\) has an inverse.
- \(RREF({\bf A}) = {\bf I}\)
- \({\bf Ax}={\bf 0}\) has \({\bf x}={\bf 0}\) as the only solution.
- \({\bf Ax}={\bf b}\) has a unique solution.
- One important implication: if
\({\bf Ax}={\bf 0}\) has a non-zero solution \({\bf x}\neq {\bf 0}\),
the inverse does not exist.
- For non-square matrices, all bets are off, but we can say
this:
- If there are fewer rows than columns (too few equations),
there will very likely be multiple solutions.
- If there are more rows than columns (too many equations),
there will very likely be no solution.
- In practice, there are ways of selecting pivots that
improve the numerical accuracy of the computed solution.
Some unanswered questions:
- In an RREF with fewer pivots than columns, what is special
about the rows and columns that have pivots?
- Even if the above approach works, is it the best way
to solve \({\bf Ax}={\bf b}\)?
- What options do we have for approximating a solution
when no solution is possible?
Tips:
- Working out examples by hand is tediously boring, but
needed to "see what's going on".
- It is especially important to reflect on the impact
of each particular example as you proceed.
- Take an earlier example, and then work through the algorithm
step by step.