In-Class Exercise 18:
Apply the same operations in the same order to the
4×4 identity matrix to compute the
inverse of the coefficient matrix. Then
enter the inverse matrix in
FourVariableExample.java
and check that A−1A=I.
In-Class Exercise 19:
Apply the method above, step-by-step, to solve
[21131−11000211010][x1x2x3x4]=[6−2−21]
Enter your solution in
FourVariableExample2.java
and check that Ax=b.
How many pivots were identified?
Let's summarize the approach through pseudocode:
1. A+ = computeAugmentedMatrix (A, b)
2. A-1 = identity matrix of size n
3. for p=1 to n
// First, make the coefficient of the pivot equal to 1
4. α = A+[p][p]
5. for j=1 to n
6. A+[p][j] = A+[p][j] / α
7. A-1[p][j] = A-1[p][j] / α
8. endfor
// Zero out the coefficients below the pivot
9. for r=p+1 to n
10. apply the appropriate row-operation to row r of A+
11. do the same for A-1
12. endfor
13. endfor
// This produces the REF in A+, but nothing useful (yet) in A-1
// Continue to the RREF ...
14. for p=1 to n
// Zero out the coefficients above the pivot
15. for r=p-1 to 1
16. apply the appropriate row-operation to row r of A+
17. do the same for A-1
18. endfor
19. endfor
20. x = last column of A+
21. return x, A-1
A question arises:
- Why compute the inverse at all, when we are already computing
the solution x?
- In many applications, it turns out, the coefficients remain
the same while the right-hand-side changes.
- Then, we compute the inverse just once, and apply it to
each different right-hand-side: x=A−1b.
Thus, it appears we have an algorithm to solve Ax=b.
Unfortunately, we need to refine the algorithm to handle a few
cases:
- When a pivot happens to be zero.
- When no solution exists.
- When multiple solutions solve the problem.
5.4
Handling special cases: no-solution, multiple-solutions
Row swaps:
- Suppose, after row operations for the first column, we
we end up with
[12−130021003−14]
Here, the pivot is 1, and the column below has zeroes, as desired.
- The next pivot should be in row 2, column 2:
[12−130021003−14]
But no multiplier can remove the 3 that's below this pivot.
- In this situation, we look further below in the column and
swap the nearest row below that has a non-zero element in
the same column:
[12−1303−1400210]
- As we've seen, swapping rows has no effect on the solution.
- Thus, we're going to include the possibility of
looking further below in the next column to find the next pivot.
- What happens if all the elements below (in the next
column) are zero?
⊳
That column will have no pivot.
The case when a column has no pivot:
- Consider this example:
[12−130021000−14]
- The first column's pivot has been found.
- The second column has no pivot because, below
row 1, all the elements are zero.
- In this case, there will be no pivot in the 2nd column.
We move onto the next column:
[12−130021000−14]
- We still continue and complete the RREF, allowing that
some columns will skipped and won't have pivots.
What happens when there is no solution:
In-Class Exercise 21:
Apply row operations to
[2−325−37−5−911−10]
Identify the pivots.
Multiple solutions:
- Suppose we end up with
[12−130121302426]
- Now applying
r3←r3−2r2
we get
[12−13012130000]
We've now run out of rows.
- Nonetheless, continuing, let's zero out above the pivot
by applying
r1←r1−2r2
to get
[10−5−23012130000]
- To get some insight, let's write out the equations:
x1−5x3=−23+x2+2x3=13
Here, we've emphasized the variables corresponding to the
two pivots.
- Observe that setting x3=0 gives a solution:
(x1,x2,x3)=(−23,13,0).
- So does x3=1, which gives a different solution:
(x1,x2,x3)=(−18,11,1).
- Any value of x3 results in a solution.
- Terminology:
- Variables like x3 that were not pivots are called
free variables.
- Variables corresponding to pivots are called basic variables.
- In a system with multiple solutions, all each setting of the
free variables will give a different solution for the basic variables.
In-Class Exercise 22:
Apply row operations to
[2−325−37−5−1011−10]
Identify the pivots and one possible solution.
A couple more special cases:
The number of pivots of a matrix seems like a fundamental
proprerty of the matrix, and it is:
- Definition: the number of pivots in a matrix is called
the rank of the matrix.
For now, we'll just state the definition. The next module will show
how critical this is.
Summary of the RREF procedure:
- If a pivot is zero, try to swap a row below with a non-zero
element in the same columns.
- If the next column does not have any candidate pivot, look
at columns further to the right.
⊳
This means the skipped column will not have a pivot
- If any row has all zeroes except for the last column, we have
a contradiction
⊳
There is no solution.
- If there is no contradiction, but a row has all zeroes
there might be multiple solutions.
- The non-pivot variables, which are free variables, typically
set to zero.
- The pivot variables are the basic variables, whose solution
depends on setting the free variables.
- The columns with pivots will turn out to be special later
⊳
They are the linearly independent columns.
In-Class Exercise 24:
Why is true that a non-pivot row has all zeroes
in that row except possibly the last (augmented) one?
5.5
An algorithm
Let's now refine our RREF-based algorithm to handle the special cases:
Algorithm: computeRREF (A, b)
Input:: matrix A, vector b
1. A+ = computeAugmentedMatrix (A, b)
2. A-1 = identity matrix of size n
3. currentRow = 1, currentCol = 1
// Now search for the next column that has a non-zero element
// below the currentRow
4. (r,c) = findNextPivot (currentRow, currentCol)
5. while r > 0 and c > 0
// Swap into currentRow if needed
6. if r > currentRow
7. swap (r, currentRow)
8. endif
9. currentCol = c
// The next pivot is now at (currentRow, currentCol)
10. isPivotColumn[currentCol] = true
11. numPivots = numPivots + 1
// Make the coefficient of the pivot equal to 1
12. α = A+[currentRow][currentCol]
13. for j=1 to n
14. A+[currentRow][j] = A+[currentRow][j] / α
15. A-1[currentRow][j] = A-1[currentRow][j] / α
16. endfor
// Zero out the coefficients below the pivot
17. for R=currentRow+1 to n
18. apply the appropriate row-operation to row R of A+
19. do the same for A-1
20. endfor
// All pivots will be in successive rows from the top, but
// need not occur in successive columns.
21. currentRow = currentRow + 1
22. currentCol = currentCol + 1
23. (r,c) = findNextPivot (currentRow, currentCol)
24. endwhile
// This produces the REF in A+, but nothing useful (yet) in A-1
// Continue to the RREF ...
25. for k=1 to numPivots
// Zero out the coefficients above the pivot
26. (rp,cp) = findPivot (k)
27. for r=rp-1 to 1
28. apply the appropriate row-operation to row r of A+
29. do the same for A-1
30. endfor
31. endfor
// Now analyze RREF
32. if existsContradictionRow ()
33. return no solution
34. else if existsAllZeroRows ()
35. // ... handle multiple solutions ...
36. else
37. x = last column of A+
38. return x, A-1
40. endif
Some terminology and further definitions:
- The term Gaussian elimination is used to describe
the combination of:
- Applying row ops to find the REF (steps 1-24 above).
- Then, using backwards substitution to actually
produce a solution.
- The term Gauss-Jordan elimination is used to
describe the full sequence of steps leading to the RREF.
- Now for an important observation about computing the RREF:
- None of the row operation decisions depended on
the last, augmented column.
- Thus, we would apply the same row ops no matter
what the value of b.
- What this means: we could compute the RREF of the matrix
A without augmentation.
- Of course, we would not have a solution without applying
the same operations to the augmented column.
- For some of the theoretical arguments, it will be useful
to consider the RREF of the matrix without augmentation.
- Notation:
- A′=RREF(A), the RREF of the
non-augmented A.
- A′+=RREF(A+), the RREF of the
augmented matrix A+.
- Recall, we used the notation A+ for the
augmented matrix.
- All of the theoretical arguments below will refer
to the non-augmented RREF.
⊳
We will explicitly use the term augmented when needed.
- Definition: A matrix, following Gauss-Jordan
elimination without augmentation,
is in Row Reduced Echelon Form (RREF) if:
- No pivot shares a row or column with another pivot.
- All entries below and above a pivot are zero.
- Except for the first one, a pivot's column is to the
right of the previous row's pivot's column.
- All the pivot rows are bunched together at the top.
- The remaining non-pivot rows are all zero.
- An example:
[1∗0∗∗0001∗∗0000001000000000000000000]
Note:
- Three pivots above are in columns 1, 3, and 6.
- All other elements in pivot columns are zero.
- Because we search for pivots in the next row, skipping
past zero elements, all elements to the left of a pivot
are zero (otherwise, we'd have picked one of them as pivot).
- Definition: An n×n
RREF matrix has full-rank if every row has a pivot.
⊳
A full-rank RREF is the identity matrix, i.e.,
A′=I.
- Definition:
We will use the term partial-rank for a square RREF
matrix that is not full-rank.
⊳
A partial-rank RREF has at least one all-zero row.
- Finally, some linear algebra jargon:
Definition: a square
matrix A that has an inverse
A−1 where A−1A=I=AA−1
is called a nonsingular matrix.
- Thus, a singular matrix has no inverse.
- Sometimes we say a matrix is invertible
if it has an inverse.
5.6
Why it works: some theory for n×n matrices
How do we know (prove) that the above approach will work
for every Ax=b problem?
What we'd like to prove, at the very least:
- The RREF algorithm produces a matrix that satisfies
the RREF definition.
- When the RREF completes with a pivot on every row,
the same row operations applied to I
produce the inverse.
- When the RREF fails to be full-rank
(that is, there's least one all-zero row), the inverse does not exist.
- Can A have more than one inverse, and if so,
which one is computed by the procedure?
It turns out, there is a nice relationship between the
existence of an inverse and uniqueness of solution:
- x=A−1b is the unique
solution when the inverse exists.
- If a unique solution exists, so does the inverse.
- The inverse is itself unique (so, we say "the" inverse
instead of "an" inverse).
Let's do the easy ones first:
- Proposition 5.1:
The RREF algorithm produces a matrix that satisfies
the RREF definition.
Proof:
There are five aspects to the definition, as we've seen:
- No pivot shares a row or column with another pivot.
- All entries below and above a pivot are zero.
- Except for the first one, a pivot's column is to the
right of the previous row's pivot's column.
- All the pivot rows are bunched together at the top.
- The remaining non-pivot rows are all zero.
Clearly, just by the way the algorithm works,
none of the pivots share a row or column, and every successive
pivot occurs to the right of the previous one. By row
swapping, all the pivot rows are bunched at the top.
All that remains is to prove that the non-pivot rows have
only zeroes. Suppose a non-pivot row has a non-pivot element
c≠0 in column k, as in this example:
[1∗0∗∗0001∗∗00000010000d00000000000c0]
When it was column k's
turn to look for a pivot, element c was ignored
as a pivot. This means there was a pivot d above it.
But that means, the pivot d would have
eliminated c through row reduction. Therefore we have
a contradiction (to the purported existence of c).
◻
- Proposition 5.2:
When the RREF completes with a pivot on every row,
the same row operations applied to I
produces the inverse.
Proof:
- We already proved this, but let's repeat for clarity.
- If the RREF completes with a full set of pivots, this means
there were row operations and corresponding
row-operation matrices R1,R2,…,Rk such that
RkRk−1⋯R2R1A=I
- Let
R△=RkRk−1⋯R2R1
- Which means
RA=I
And so, R=A−1. Technically, since at this
juncture we don't yet know that the inverse is unique,
we should say that R is an inverse.
- Next multiply each side by the identity to see that
RI=A−1
◻
- Proposition 5.3: x=A−1b is the unique
solution when the inverse exists.
Proof:
If there are two solutions Ax=b and
Ay=b, then because the inverse exists
(as assumed), we'll multiply both sides by
the inverse:
A−1Ax=A−1b
and
A−1Ay=A−1b.
Which means
Ix=A−1b
and
Iy=A−1b.
Or both x and y are equal (to
A−1b).
◻
- Proposition 5.4: The inverse is unique.
Proof:
Suppose there are two matrices B and C such
that BA=I and CA=I.
Then,
B=BI=B(AC)=(BA)C=IC=C
Notice that we replaced I with AC above,
which assumes that C multiplying on the right
is an inverse. If our definition of inverse was only a left
inverse, we'd need an alternative proof.
◻
Now for the harder ones.
It will help to consider the special case b=0:
- Recall: 0 is the zero vector:
0=(0,0,…,0)
- The equation Ax=0 is of special
importance in the theory.
- First, observe that Ax=0 always
has at least one solution, x=0, for
any matrix A.
- Proposition 5.5:
Consider the equation A′x=0
where A′ is an RREF matrix. Then:
- if A′ is full-rank, x=0 is the only
solution.
- if A′ is not full-rank, there is at least
one non-zero solution x≠0
In-Class Exercise 25:
First, create example 3×3 RREFs (just the final RREFs, without the
starting equations) for each of the two cases. Show how
Proposition 5.5 is true for your example RREFs.
Then prove Proposition 5.5.
We'll now prove the remaining results:
- Proposition 5.6:
When the RREF fails to be full-rank
because of zero rows, the inverse does not exist.
Proof:
Suppose, to the contrary, that the inverse A−1 exists.
Then consider Ax=0.
⇒A−1Ax=A−10⇒Ix=0⇒x=0
Thus, if x≠0 is a solution, the inverse
does not exist. But proposition 5.5 showed that the only time
x≠0 is when the RREF is not full-rank.
Thus, if it's not full-rank, the inverse does not exist.
◻
- The first part of Proposition 5.6 can be strengthened
and will shortly be useful.
- The proof of 5.6 showed that if A−1 exists
then Ax=0 has the unique solution x=0
- The other direction turns out to be true as well:
if Ax=0 has the unique solution x=0 then A−1 exists.
- Proposition 5.7:
The inverse A−1 exists if and only if Ax=0
has the unique solution x=0.
Proof:
Proposition 5.6 proved the forward direction. Let's now
take as a given that Ax=0
has the unique solution x=0.
The RREF
process applied to Ax=0 results in an RREF
A′ with 0 in the augmented column
because no row reduction changes an all-zero column.
If A′ were not full-rank, then by Proposition 5.5,
there would be a non-zero solution, so A′
is full-rank. Then, because
A′ is full-rank, by Proposition 5.2 above,
the inverse exists.
◻
- Proposition 5.8:
If A does have an inverse, the procedure
always finds it.
Proof:
What we need to worry about is the case where there is an inverse
but somehow the algorithm finds an RREF that's not full-rank.
But Proposition 5.6 above showed that if the RREF is not
full-rank, the inverse does not exist. We also showed that
the algorithm always produces an RREF (Proposition 5.1), which
means that the only RREF produced when an inverse exists is
the full-rank one.
◻
- Proposition 5.9
If a unique solution to Ax=b exists, the
inverse A−1 exists.
Proof:
Suppose the unique solution to Ax=b
is x′.
From Proposition 5.7, if the inverse does not exist, then we know that
Ax=0
has a non-zero solution y≠0.
Let z=x′+y.
Because y≠0, it must be that
z≠x′.
Now,
Az=A(x′+y)=Ax′+Ay=b+Ay=b+0=b
Which means that z is a different (from
the supposed unique) solution
to Ax=b, a contradiction.
The contradiction followed from assuming the inverse did not
exist, which means that it does.
◻
There's a neat implication of the above results:
- All of the following are equivalent (each one implies any of
the others):
- A−1 exists.
- Ax=b has a unique solution.
- Ax=0 has the unique solution x=0.
- Each row and column of the RREF has exactly one pivot, with
all other entries zero
⊳
That is, RREF(A)=I.
Finally, some meta-level observations about the proofs above:
- One wonders why the proofs seemed so roundabout, and why
solutions to Ax=0 were important.
- There are a couple of key ideas.
- First, the uniqueness of solution to the RREF equation
A′x=0 (in 5.5) is reasoned directly from the RREF.
- Because row ops don't change the solution, this means
the same result applies to Ax=0.
- Also, the uniqueness tells us whether or not the inverse
exists since uniqueness implies the RREF=I.
- Lastly, uniqueness of the Ax=0 solution
translates into uniqueness of Ax=b, via
the algebraic argument in Proposition 5.9.
5.7
Non-square matrices
Intuition suggests the following:
- If there are too few equations (more columns than rows),
we are likely to have multiple solutions.
- If there are too many equations, it's unlikely we'll find
a solution (because there are too many equations to satisfy).
This is generally the case but there are exceptions:
- We saw earlier that
[20031130][x1x2]=[4956]
has a solution (x1,x2)=(2,3).
- Observe that the 3rd and 4th rows can be derived from
linear combinations of the first two.
- This suggests redundancy in the equations but
consistency (since a solution exists).
What can we say about non-square matrices and our solution approach?
- The RREF algorithm will still run to completion.
- When an augmented matrix has a contradiction, there will
be no solution.
- If the number of pivots is exactly the same as the number
of variables (with the other rows zero), there is a unique solution
⊳
See the example above.
- If there are fewer pivots than variables, there are multiple
solutions with basic and free variables.
⊳
Basic variables correspond to the pivot columns.
Let's look at an example:
5.8
Alternative pivoting
We don't always have to pick the very next row when seeking
a pivot:
- In the 4×4 example earlier, after
dealing with the first pivot in column 1, we arrived at:
[120−52011−1101216024313]
- Now, instead of the above pivot in the second row, we could
swap any other row into second place, e.g.,
[120−5202431301216011−11]
- In fact, any choice of a non-zero number in the pivot
column that's below the original choice will do.
- This choice leads to the question: are some pivot
choices better than others?
Consider this well-known example:
- Suppose we have the equations
10−kx1+x2=1x1−x2=0
- Solving by hand gives x1=x2=1/(1+10−k)≈1,
if k is large.
- But finite arithmetic can cause a problem, as we'll see.
- The starting point is the augmented matrix
[10−k111−10]
- The first row scaled results in
[110k10k1−10]
- Zeroing out the 1 below the pivot
[110k10k0−1−10k−10k]
For large k, this will be represented by
[110k10k0−10k−10k]
which reduces to
[110k10k011]
- This gets solved as (x1,x2)=(0,1), drastically
different.
- The problem was that a tiny pivot, through scaling,
resulted in a huge effect on other entries.
- Instead, if we'd swapped rows initially, we'd have
[1−1010−k11]
which reduces to
[1−1001+10−k1]
which gets rounded to
[1−10011]
- This gets reduced to:
[101011]
and the solution (x1,x2)=(1,1).
Partial pivoting:
- In this approach, when facing a choice of pivots, one finds
the pivot with the largest absolute value and swaps that row.
- This strategy has been found to be effective in practice.
- This pivoting option is known as partial pivoting.
- Changing the pivot strategy will change the REF so obtained,
but interestingly, not the RREF.
5.9
Uniqueness of the RREF
First, note that our particular algorithm above is deterministic,
and so, for any matrix, there's only one output.
However, an alternative RREF algorithm could use a different
pivoting strategy.
Nonetheless, as we'll show, there's one and only one RREF
for a matrix.
First, for illustration, let's violate the RREF rules:
- Consider the (not augmented) matrix
[11103−1011−10121200000]
- Suppose we violate the rules and start with the
second column:
[11103−1011−10121200000]
Then r3←r3−r1 produces
[11103−1011−10011−100000]
- Next, suppose we use the third column for the next pivot:
[11103−1011−10011−100000]
Which results, after two row ops, in
[210−13−1011−10000000000]
- This is not in RREF form.
- What should be clear is that the left-to-rightness of
the RREF procedure matters.
⊳
Pivots are created starting from the left and going only
as far right as needed.
- Thus, it's only the pivoting strategy that can possibly
change an RREF.
Uniqueness:
- If an RREF is full rank, it's already unique because it's
the identity matrix I.
- So, any non-uniqueness, if it occurs, must occur
in the non-pivot columns.
- The intuition for why the RREF is unique can be seen by
examining two successive pivot columns with a gap between,
for example:
[…0∗∗0……1∗∗0……0∗∗1……0000……0000…]
- Note that no matter what pivoting strategy was used, it
does not affect the zeroes in the columns in between the
two pivot columns above.
- Thus, the location of the second pivot is the
same, even if a different pivoting strategy was used
within that column (via row swaps).
- This means that the pivots occur in the same columns,
irrespective of pivoting strategy.
- Then, all we need to worry about are the numbers in other
columns. Are they the same if different pivoting is used?
- These numbers are in non-pivot columns.
- They are completely determined by the pivots to their left.
- Once a pivot is determined and scaled to 1, the row ops
used to eliminate numbers above the pivot are the same.
- This means numbers in non-pivot columns will be the same, no
matter what pivoting strategy is used.
- A more formal but less intuitive proof can be found
in Hefferon's book (pages 58-59).
5.10
Summary, some questions, and some tips
A high-level summary:
- It is possible to systematically solve a large system of
simultaneous linear equations.
- There are three basic row operations: scale, swap, and
add a multiple of another row.
- Any of the three fundamental row operations can be expressed
as a matrix that transforms the target matrix (the one row ops are
applied to).
- The series of row ops, when applied to the identity
matrix I produce the inverse A−1
(provided a unique solution exists).
- The augmented matrix is a convenient data structure to be
able to apply row operations to both sides of the equations.
- For the simplest n×n case with a unique solution,
the approach is straightforward
⊳
Each column and each row will have a pivot.
- There are special cases to be handled by expanding the
basic algorithm.
- For a square matrix A, the following are all equivalent:
- A has an inverse.
- RREF(A)=I
- Ax=0 has x=0 has the only solution.
- Ax=b has a unique solution.
- One important implication: if
Ax=0 has non-zero solution x≠0, the inverse does not exist.
- For non-square matrices, all bets are off, but we can say
this:
- If there are fewer rows than columns (too few equations),
there will very likely be multiple solutions.
- If there are more rows than columns (too many equations),
there will very likely be no solution.
- In practice, there are ways of selecting pivots for
improving computational solution.
Some unanswered questions:
- In an RREF with fewer pivots than columns, what is special
about the rows and columns that have pivots?
- Even if the above approach works, is it the best way
to solve Ax=b?
- What options do we have for approximating a solution
when no solution is possible?
Tips:
- Working out examples by hand is tediously boring, but
needed to "see what's going on".
- It is especially important to reflect on the impact
of each particular example as you proceed.
- Take an earlier example, and then work through the algorithm
step by step.