r.5 Matrix-matrix multiplication and what it means
Multiplying one matrix by another produces a third matrix, but
why is the product defined the way it is, and what is it good for?
There are generally four contexts where we see matrix-matrix multiplication:
- Context #1.
When the matrices represent vector-transformations and they
multiply to create one matrix that does the combined transformation.
- Context #2.
When the associative rule is used to simplify terms.
- Context #3.
When a matrix's inverse multiplies into it to produce \({\bf I}\).
- Context #4.
When one matrix transforms another matrix (which is useful in
understanding row operations).
Before proceeding, let's address multiply-compatibility for
matrix-matrix and matrix-vector multiplication:
- Two matrices \({\bf A}_{m\times n}\) and \({\bf B}_{n\times k}\)
can be multiplied as \({\bf AB}\) to produce \({\bf C}_{m\times k}\).
$$
{\bf A}_{m\times n} {\bf B}_{n\times k} \eql {\bf C}_{m\times k}
$$
- Thus, \({\bf B}\) should have as many rows as \({\bf A}\) has columns.
- The same rule works for matrix-vector compatibility if
we treat a column vector of dimension \(n\) as having
\(n\) rows:
$$
{\bf A}_{m\times n} {\bf x}_{n\times 1} \eql {\bf b}_{m\times 1}
$$
Let's start with the first context:
- Suppose \({\bf A}\) is a matrix that rotates a vector by
\(60^\circ\), and \({\bf B}\) is a matrix that reflects a vector
about the origin.
- Applying \({\bf A}\) to \({\bf x}=(2,3)\) gives
$$\eqb{
{\bf Ax}
& \eql &
\mat{\cos(60) & -\sin(60)\\
\sin(60) & \cos(60)}
\vectwo{2}{3} \\
& \eql &
\mat{0.5 & -0.866\\
0.866 & 0.5}
\vectwo{2}{3} \\
& \eql &
\vectwo{-1.59}{3.23} \\
& \defn &
{\bf y}
}$$
Here, we've named the result \({\bf y}\).
The "equals with the triangle" means "which we're defining as".
- Now, let's apply \({\bf B}\) to \({\bf y}\):
$$\eqb{
{\bf By}
& \eql &
\mat{-1 & 0\\
0 & -1}
\vectwo{-1.59}{3.23}
& \eql &
\vectwo{1.59}{-3.23}
& \defn &
{\bf z}
}$$
- We can write both products as:
$$
{\bf z} \eql {\bf B} ({\bf Ax})
$$
What this means:
- First apply \({\bf A}\) to \({\bf x}\).
- This produces a vector (we called it \({\bf y}\)).
- Then apply \({\bf B}\) to \({\bf y}\).
- The obvious question is: is there a matrix \({\bf C}\)
that will achieve the same thing directly:
$$
{\bf z} \eql {\bf Cx}?
$$
- If there was one, it would be natural to say that
$$
{\bf C} \eql {\bf BA}
$$
because we'd write
$$\eqb{
{\bf z} & \eql & {\bf B} ({\bf Ax})\\
& \eql & ({\bf B} {\bf A}) {\bf x}\\
& \eql & {\bf C} {\bf x}\\
}$$
- In fact, this is how the details of matrix multiplication
came about. If matrix-multiplication were done some other way, we
would not have the above result.
- In the above example, it turns out that
$$
{\bf BA} \eql
\mat{-1 & 0\\
0 & -1}
\mat{0.5 & -0.866\\
0.866 & 0.5}
\eql
\mat{-0.5 & 0.866\\
-0.866 & -0.5}
$$
So
$$
{\bf Cx} \eql
\mat{-0.5 & 0.866\\
-0.866 & -0.5}
\vectwo{2}{3}
\eql
\vectwo{1.59}{-3.23}
$$
as expected.
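As a quick numerical check of this example, here is a small numpy sketch (the variable names are ours, purely for illustration):
```python
import numpy as np

theta = np.radians(60)
A = np.array([[np.cos(theta), -np.sin(theta)],   # rotate by 60 degrees
              [np.sin(theta),  np.cos(theta)]])
B = np.array([[-1.0,  0.0],                      # reflect about the origin
              [ 0.0, -1.0]])
x = np.array([2.0, 3.0])

y = A @ x          # first transformation
z = B @ y          # then the second
C = B @ A          # the combined transformation

print(y)           # approx [-1.598  3.232]
print(z)           # approx [ 1.598 -3.232]
print(C @ x)       # the same as z, as expected
```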
When we work out the algebra, it turns out that there are
two useful ways of describing matrix-matrix multiplication:
- We will describe \({\bf BA}\).
- The standard way is:
$$
\mat{
b_{11} & b_{12} & \cdots & b_{1n} \\
b_{21} & b_{22} & \cdots & b_{2n} \\
\vdots & \vdots & & \vdots \\
b_{n1} & b_{n2} & \cdots & b_{nn}
}
\mat{
a_{11} & a_{12} & \cdots & a_{1n} \\
a_{21} & a_{22} & \cdots & a_{2n} \\
\vdots & \vdots & & \vdots \\
a_{n1} & a_{n2} & \cdots & a_{nn}
}
\eql
\mat{
c_{11} & c_{12} & \cdots & c_{1n} \\
c_{21} & c_{22} & \cdots & c_{2n} \\
\vdots & \vdots & & \vdots \\
c_{n1} & c_{n2} & \cdots & c_{nn}
}
$$
where
$$
c_{ij} \eql \sum_{k=1}^n b_{ik} a_{kj}
$$
(this is the element in row \(i\), column \(j\) of \({\bf C}\)).
- If we "unpack" the matrix \({\bf A}\) into its columns
\({\bf a}_1, {\bf a}_2, \ldots \) then
this is one useful way to describe it:
$$
{\bf B}
\mat{a_{11} & a_{12} & \cdots & a_{1n}\\
a_{21} & a_{22} & \cdots & a_{2n}\\
a_{31} & a_{32} & \cdots & a_{3n}\\
\vdots & \vdots & \vdots & \vdots\\
a_{n1} & a_{n2} & \cdots & a_{nn}
}
\eql
{\bf B}
\mat{ & & & \\
\vdots & \vdots & \vdots & \vdots\\
{\bf a}_1 & {\bf a}_2 & \cdots & {\bf a}_n\\
\vdots & \vdots & \vdots & \vdots\\
& & &
}
\eql
\mat{ & & & \\
\vdots & \vdots & \vdots & \vdots\\
{\bf B a}_1 & {\bf B a}_2 & \cdots & {\bf B a}_n\\
\vdots & \vdots & \vdots & \vdots\\
& & &
}
$$
Here, the k-th column of the resulting \({\bf C}\) matrix
is the vector \({\bf Ba}_k\) that results from
the matrix \({\bf B}\) multiplying vector \({\bf a}_k\).
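Both descriptions are easy to verify numerically. Here is a minimal numpy sketch on made-up matrices (not from the text) that computes the product both ways and compares against numpy's built-in matrix product:
```python
import numpy as np

n = 3
rng = np.random.default_rng(0)
B = rng.integers(-3, 4, size=(n, n)).astype(float)
A = rng.integers(-3, 4, size=(n, n)).astype(float)

# Standard element-level description: c_ij = sum_k b_ik * a_kj
C1 = np.zeros((n, n))
for i in range(n):
    for j in range(n):
        for k in range(n):
            C1[i, j] += B[i, k] * A[k, j]

# Column view: the k-th column of C is B applied to the k-th column of A
C2 = np.column_stack([B @ A[:, k] for k in range(n)])

print(np.allclose(C1, B @ A), np.allclose(C2, B @ A))   # True True
```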
Next, let's look at the second context:
- Matrix multiplication turned out (happily) to be associative:
$$
{\bf A} ({\bf BC}) \eql ({\bf AB}) {\bf C}
$$
This associativity turns out to be extremely useful in establishing
later results.
- Matrix multiplication, unfortunately, is not commutative.
That is, in general, \({\bf AB} \neq {\bf BA}\).
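A small numpy sketch illustrating both facts with made-up random matrices:
```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.standard_normal((3, 3))
B = rng.standard_normal((3, 3))
C = rng.standard_normal((3, 3))

print(np.allclose(A @ (B @ C), (A @ B) @ C))   # True: associativity always holds
print(np.allclose(A @ B, B @ A))               # almost always False: not commutative
```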
Next, let's say a few things about matrix inverses:
- The inverse of a matrix \({\bf A}\), if it exists, is a
matrix \({\bf A}^{-1}\) such that
$$
{\bf A}^{-1} {\bf A} \eql {\bf I}
$$
where \({\bf I}\) is the identity matrix
$$
\mat{1 & 0 & \cdots & 0\\
0 & 1 & \cdots & 0\\
\vdots & \vdots & \vdots & \vdots\\
0 & 0 & \cdots & 1}
$$
Thus, \({\bf I}\) has 1's on the diagonal and 0's everywhere else.
- Here are some things to remember about \({\bf A}^{-1}\)
- It is defined only for square matrices.
- It's also true that \({\bf A} {\bf A}^{-1} = {\bf I}\)
- \({\bf A}^{-1}\) exists only if \({\bf A}\) is full-rank.
- We compute \({\bf A}^{-1}\) this way:
- Identify the row operations that reduce \({\bf A}\) to
its RREF.
- Apply the same row operations to \({\bf I}\). The
resulting matrix is \({\bf A}^{-1}\).
- Of course the RREF procedure can reveal that \({\bf A}\)
is not full-rank (fewer pivots than columns), in which case
\({\bf A}^{-1}\) doesn't exist.
- What do we do when \({\bf A}\) is not square, or if \({\bf
A}^{-1}\) does not exist (cannot be computed)?
- One way out is least-squares, when we typically have too many
equations (more rows than columns).
- In this case, \({\bf A}^T {\bf A}\) is square and likely to
have an inverse.
- Recall the least-squares solution:
$$
\hat{\bf x}
\eql
({\bf A}^T {\bf A})^{-1}
{\bf A}^T {\bf b}
$$
Here we need the inverse \(({\bf A}^T {\bf A})^{-1}\) to exist.
- When there are many more rows than columns,
\({\bf A}_{m\times n}\) has only a few column vectors, each with many components.
The likelihood that such columns are independent is high.
- A somewhat difficult proof showed that if
the columns of \({\bf A}\) are independent then
\(({\bf A}^T {\bf A})^{-1}\) does indeed exist.
- The implication is that least-squares is very likely to work.
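Here is a minimal numpy sketch of this least-squares recipe on made-up data. In practice one would prefer np.linalg.lstsq (or a QR-based solver) over explicitly forming \(({\bf A}^T{\bf A})^{-1}\), but the normal-equation form mirrors the formula above:
```python
import numpy as np

# An overdetermined system: 6 equations, 2 unknowns (made-up data).
rng = np.random.default_rng(2)
A = rng.standard_normal((6, 2))
b = rng.standard_normal(6)

# Least-squares via the normal equations:
x_hat = np.linalg.inv(A.T @ A) @ A.T @ b

# numpy's built-in least-squares solver should agree:
x_ls, *_ = np.linalg.lstsq(A, b, rcond=None)
print(np.allclose(x_hat, x_ls))   # True
```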
Finally, let's look at the fourth context:
matrix-multiplication as a "matrix transformer"
- When we multiply one matrix by another we get a third:
$$
{\bf AB} \eql {\bf C}
$$
- We could interpret this the same way we provided the
alternative interpretation of matrix-vector multiplication:
$$
{\bf Ax} \eql {\bf y}
$$
That is: matrix \({\bf A}\) transforms vector \({\bf x}\) into
vector \({\bf y}\).
- Thus, we could interpret
$$
{\bf AB} \eql {\bf C}
$$
as: \({\bf A}\) transforms matrix \({\bf B}\) into matrix \({\bf C}\).
- Such an interpretation does turn out to be useful for
theoretical purposes, as we'll see when reasoning about RREFs.
- So, now let's review key ideas in constructing an RREF.
- Let's start with an example:
$$\eqb{
2x_1 & \miss & 3x_2 & \plss & 2x_3 & \eql 5\\
-2x_1 & \plss & x_2 & & & \eql -1\\
x_1 & \plss & x_2 & \miss & x_3 & \eql 0\\
}$$
- We perform row operations to change the coefficients of
variables.
- For example, to change the coefficient of \(x_1\) in the
first equation to 1, we divide the entire first equation by 2:
$$\eqb{
{\bf 1} x_1 & \miss & {\bf 1.5} x_2 & \plss & {\bf 1} x_3 & \eql {\bf 2.5}\\
-2x_1 & \plss & x_2 & & & \eql -1\\
x_1 & \plss & x_2 & \miss & x_3 & \eql 0\\
}$$
- Of course, since all the manipulations are about
coefficients, it's cleaner if we work just with them,
incorporating the right side as well in the augmented matrix.
- Our starting point is the initial augmented matrix and
the identity matrix side-by-side:
$$
\left[
\begin{array}{ccc|c}
2 & -3 & 2 & 5\\
-2 & 1 & 0 & -1\\
1 & 1 & -1 & 0
\end{array}
\right]
\;\;\;\;\;\;\;\;\;\;\;\;
\mat{1 & 0 & 0 \\
0 & 1 & 0 \\
0 & 0 & 1}
$$
- So, applying the divide-by-2 to the first row:
$$
\left[
\begin{array}{ccc|c}
{\bf 1} & {\bf -1.5} & {\bf 1} & {\bf 2.5}\\
-2 & 1 & 0 & -1\\
1 & 1 & -1 & 0
\end{array}
\right]
\;\;\;\;\;\;\;\;\;\;\;\;
\mat{{\bf 0.5} & 0 & 0 \\
0 & 1 & 0 \\
0 & 0 & 1}
$$
Here, we're applying the exact same operation to the identity
matrix as well (for reasons we'll see).
- Since we replaced the first row with the original divided by
2, we write this as: \({\bf r}_1 \leftarrow \frac{{\bf r}_1}{2}\).
There are three types of row operations:
- Scale: divide one row by a number (like we did above)
$$
{\bf r}_i \leftarrow \frac{{\bf r}_i}{\alpha}
$$
- Swap: swap two rows
$$\eqb{
{\bf\mbox{temp}} & \leftarrow &{\bf r}_i\\
{\bf r}_i & \leftarrow & {\bf r}_j\\
{\bf r}_j & \leftarrow & {\bf\mbox{temp}}
}$$
- Replace a row by adding a multiple of another row.
$$
{\bf r}_i \leftarrow {\bf r}_i + \alpha {\bf r}_j
$$
You could devise others but these are enough.
There is also an order in which we change
coefficients:
- Always start with row 1, column 1, trying to make that
element a pivot.
- When we're successful making an element into a pivot,
we change coefficients below the pivot to zero.
- The search for the next pivot moves to the next row, next column.
- A search may not be successful, in which case we move to
the next column (same row).
- After the final pivot we go back to the first one and start
the process of creating zeroes above each pivot.
Here's what we get after making zeroes below the first
pivot (via \({\bf r}_2 \leftarrow {\bf r}_2 + 2{\bf r}_1\) and
\({\bf r}_3 \leftarrow {\bf r}_3 - {\bf r}_1\)):
$$
\left[
\begin{array}{ccc|c}
1 & -1.5 & 1 & 2.5\\
0 & -2 & 2 & 4\\
0 & 2.5 & -2 & -2.5
\end{array}
\right]
\;\;\;\;\;\;\;\;\;\;\;\;
\mat{0.5 & 0 & 0 \\
1 & 1 & 0 \\
-0.5 & 0 & 1}
$$
Continuing in this manner, we get the RREF (and by its side,
if the RREF is full-rank, the inverse):
$$
\left[
\begin{array}{ccc|c}
1 & 0 & 0 & 2\\
0 & 1 & 0 & 3\\
0 & 0 & 1 & 5
\end{array}
\right]
\;\;\;\;\;\;\;\;\;\;\;\;
\mat{0.5 & 0.5 & 1 \\
1 & 2 & 2 \\
1.5 & 2.5 & 2}
$$
What we see:
- A full-rank RREF (a pivot in every column), which tells us
the equations have a unique solution.
- Notice: the non-augmented part of the full-rank RREF is the
identity matrix of that size.
- The solution itself is in the augmented column:
\(x_1=2, x_2=3, x_3=5\)
- We know this is the solution because we can convert back to
equations by looking at
$$
\left[
\begin{array}{ccc|c}
1 & 0 & 0 & 2\\
0 & 1 & 0 & 3\\
0 & 0 & 1 & 5
\end{array}
\right]
$$
- We get the inverse matrix on the right.
$$
{\bf A}^{-1} \eql
\mat{0.5 & 0.5 & 1 \\
1 & 2 & 2 \\
1.5 & 2.5 & 2}
$$
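Here is a minimal Gauss-Jordan sketch of the procedure just described, run on this example. The function name is ours, a row swap is added for robustness, and real code would simply call np.linalg.inv or np.linalg.solve:
```python
import numpy as np

def rref_with_inverse(A):
    """Row-reduce [A | I]; if A is full-rank, the right half ends up as A^{-1}.
       A teaching sketch of the hand procedure, not production code."""
    A = A.astype(float)
    n = A.shape[0]
    M = np.hstack([A, np.eye(n)])                        # work on [A | I]
    for col in range(n):
        pivot = col + np.argmax(np.abs(M[col:, col]))    # find a usable pivot row
        if np.isclose(M[pivot, col], 0.0):
            raise ValueError("A is not full-rank: no inverse")
        M[[col, pivot]] = M[[pivot, col]]                # swap
        M[col] /= M[col, col]                            # scale: make the pivot 1
        for r in range(n):                               # replace: zero out the column
            if r != col:
                M[r] -= M[r, col] * M[col]
    return M[:, :n], M[:, n:]                            # (RREF of A, A^{-1})

A = np.array([[2, -3, 2], [-2, 1, 0], [1, 1, -1]])
R, Ainv = rref_with_inverse(A)
print(Ainv)                                # matches the inverse obtained above
print(np.allclose(A @ Ainv, np.eye(3)))    # True
```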
So now let's turn to: why does this work?
- Clearly, manipulating the augmented matrix is the same thing
as manipulating the coefficients of the equations.
- So, if we end up with a full-rank RREF we know we just have
a different version of the same equations with the same solution
as the original.
- Seeing that the right-side matrix is the inverse needs some reasoning.
Consider the starting non-augmented matrix for the
equations example we've seen:
$$
{\bf A} \eql
\mat{
2 & -3 & 2 \\
-2 & 1 & 0 \\
1 & 1 & -1}
$$
- Define the matrix
$$
{\bf R}_1 \eql
\mat{0.5 & 0 & 0\\
0 & 1 & 0\\
0 & 0 & 1}
$$
- Left-multiply \({\bf A}\) by \({\bf R}_1\) :
$$
{\bf R}_1 {\bf A} \eql
\mat{0.5 & 0 & 0\\
0 & 1 & 0\\
0 & 0 & 1}
\mat{
2 & -3 & 2 \\
-2 & 1 & 0 \\
1 & 1 & -1}
\eql
\mat{
1 & -1.5 & 1 \\
-2 & 1 & 0 \\
1 & 1 & -1}
$$
- This shows that multiplication by \({\bf R}_1\)
achieves "multiply first row by 0.5" (or divide it by 2,
which is what we seek).
- In this way, \({\bf R}_1\) transforms \({\bf A}\) into
another matrix (making a pivot, which was our goal).
As we've seen, all three types of row operations can be
achieved by a matrix multiplication that "transforms".
So, when we get an RREF at the end of \(k\) row operations
(transformations) we can describe it as:
$$
\mbox{RREF}({\bf A}) \eql
{\bf R}_k {\bf R}_{k-1} \ldots {\bf R}_2 {\bf R}_1 {\bf A}
$$
where the first row op is achieved by \({\bf R}_1\), the second
is represented by \({\bf R}_2\) applied to the result, and so on.
Recall: we applied these exact same transformations to the
identity matrix (in parallel, on the right side of our long list of
row ops).
- So, the question is: what do we get when applying
$$
{\bf R}_k {\bf R}_{k-1} \ldots {\bf R}_2 {\bf R}_1 {\bf I}
\eql ?
$$
- Let's go back to
$$
\mbox{RREF}({\bf A}) \eql
{\bf R}_k {\bf R}_{k-1} \ldots {\bf R}_2 {\bf R}_1 {\bf A}
$$
and do two things:
- Recognize that, since \({\bf A}\) turned out to be full-rank, \( \mbox{RREF}({\bf A}) = {\bf I}\)
- Multiply all the \({\bf R}_i\) matrices into one matrix:
$$
{\bf R} \defn
{\bf R}_k {\bf R}_{k-1} \ldots {\bf R}_2 {\bf R}_1
$$
- Then, the reduction to RREF can be written as:
$$
{\bf I} \eql {\bf R} {\bf A}
$$
which means that \({\bf R}\) is the inverse of \({\bf A}\)!
In practice, we don't separately multiply out the row-operation
matrices in order to compute \({\bf R}\).
- Instead, we apply all the row ops directly to \({\bf I}\), which computes:
$$
{\bf R} {\bf I}
$$
- But this results in
$$
{\bf R} {\bf I} \eql {\bf R}
$$
So, the result at the end on the right side is the inverse
of \({\bf A}\) because \({\bf R}={\bf A}^{-1}\).
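To see this concretely, here is a small numpy sketch that builds an elementary matrix for one possible sequence of row operations reducing the example \({\bf A}\) to \({\bf I}\). The particular sequence and helper names are ours; any valid sequence gives the same product \({\bf R}\):
```python
import numpy as np

def scale(i, a, n=3):
    """Elementary matrix for r_i <- a * r_i (0-based index i)."""
    E = np.eye(n); E[i, i] = a; return E

def add_multiple(i, j, a, n=3):
    """Elementary matrix for r_i <- r_i + a * r_j."""
    E = np.eye(n); E[i, j] = a; return E

A = np.array([[2., -3., 2.], [-2., 1., 0.], [1., 1., -1.]])

ops = [scale(0, 0.5),             # r1 <- r1 / 2
       add_multiple(1, 0, 2),     # r2 <- r2 + 2 r1
       add_multiple(2, 0, -1),    # r3 <- r3 - r1
       scale(1, -0.5),            # r2 <- r2 / (-2)
       add_multiple(2, 1, -2.5),  # r3 <- r3 - 2.5 r2
       scale(2, 2),               # r3 <- r3 / 0.5
       add_multiple(1, 2, 1),     # r2 <- r2 + r3
       add_multiple(0, 2, -1),    # r1 <- r1 - r3
       add_multiple(0, 1, 1.5)]   # r1 <- r1 + 1.5 r2

R = np.eye(3)
for E in ops:
    R = E @ R                     # accumulates R_k ... R_2 R_1

print(np.allclose(R @ A, np.eye(3)))   # True: RREF(A) = I
print(R)                               # R is exactly A^{-1}
```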
One last point: if we've already solved for the variables, why do we
need the inverse?
- Let's state this question as follows:
- We started with wanting to solve \({\bf Ax}={\bf b}\)
for some \({\bf A}\) and some \({\bf b}\), the given equations.
- We applied row reductions to get \(\mbox{RREF}({\bf A})\).
- That gave us the solution \({\bf x}\).
- Isn't that enough? Why also compute \({\bf A}^{-1}\) on
the right side?
- In applications, it turns out that very often the equations
stay the same but the right side changes to \({\bf Ax}={\bf c}\)
- In this case, we solve once to get \({\bf A}^{-1}\) and
apply it to a new right side:
$$
{\bf x} \eql {\bf A}^{-1} {\bf c}
$$
- In other applications like least-squares, we need
to compute the inverse of matrices like \(({\bf A}^T{\bf A})\).
- In fact, in general, linear algebra applications usually
care more about inverses than solving equations.
r.6 Spaces, span, independence, basis
Let's consider span first:
- We'll do this by example.
- Consider the 3D vectors
$$
{\bf u} \eql \vecthree{4}{0}{2}
\;\;\;\;\;\;\;\;
{\bf v} \eql \vecthree{0}{5}{3}
$$
- The span is all the vectors you can get by linear
combinations, which we formally write as:
$$
\mbox{span}({\bf u,v}) \eql \{{\bf z}: {\bf z} = \alpha{\bf u}
+ \beta{\bf v}, \mbox{ for some } \alpha,\beta\}
$$
How to read aloud: "all z such that z can be expressed as a linear
combination of u and v"
- These are 3D vectors, so what does the span look like?
Let's draw some vectors in the span:
- All the vectors in the span lie in the plane containing
\({\bf u}\) and \({\bf v}\).
- The vector \((0,0,10)\), for example, which runs along the z-axis
is not in the span.
- Now look at the plane above and ask: is there any vector
in that plane that's not in \(\mbox{span}({\bf u,v})\)?
- No, because we can surely find a linear combination to reach it.
- Now for another important question: pick any three
vectors in that plane, and call them \({\bf x,y,z}\)
- Is every linear combination of these three
in the same plane?
- It seems obvious that the answer is yes because all
their parallelograms will lie in the plane.
- The implication is that the plane shown above is "complete"
in that it contains all linear combinations of any subset of
vectors.
- Such a collection is called a subspace.
- More formally: a subspace is a collection
of vectors where the linear combination of any two vectors is
in the collection.
- Now consider the vectors \({\bf y}\) and \({\bf w}\) in this picture:
- Clearly all linear combinations of the two lie on the
dotted line, and so these two cannot
linearly generate every vector in the grey plane.
- The question we ask is: what is the minimum number of
vectors needed to span the grey plane?
- The answer is: two.
- These could be any two non-collinear (independent!)
vectors like \({\bf u}\) and \({\bf v}\):
- We are led to the definition of a basis:
- A basis for a subspace is any minimal
collection of vectors whose span is the subspace.
- Observe: two vectors are not enough for all of 3D space.
- Could \({\bf u}\), \({\bf v}\) and \({\bf y}\) be a
basis for all of 3D space?
- A basis for 3D space needs three vectors, and three that are not
co-planar (that is, not all lying in one plane).
- We are often interested in an orthogonal basis:
- Are \({\bf u}\) and \({\bf v}\) orthogonal?
- Let's check:
$$
{\bf u} \cdot {\bf v} \eql (4,0,2) \cdot (0,5,3) \eql 6 \neq 0
$$
So, no, they are not orthogonal and thus even though
\({\bf u}\) and \({\bf v}\) are a basis, they are not
an orthogonal basis.
- Consider
\({\bf u}=(4,0,2)\) and \({\bf r}=(3,-\frac{25}{2}, -6)\)
$$
{\bf u} \cdot {\bf r} \eql (4,0,2) \cdot (3,-\frac{25}{2}, -6)
\eql 0
$$
So, these two are orthogonal; and since \({\bf r}\) also lies in the
grey plane and two independent vectors are enough for it,
\({\bf u}\) and \({\bf r}\) form an orthogonal basis for that plane.
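A quick numpy check of these dot products, and of the fact that \({\bf r}\) really lies in the plane spanned by \({\bf u}\) and \({\bf v}\):
```python
import numpy as np

u = np.array([4.0, 0.0, 2.0])
v = np.array([0.0, 5.0, 3.0])
r = np.array([3.0, -12.5, -6.0])

print(np.dot(u, v))    # 6.0 -> u and v are NOT orthogonal
print(np.dot(u, r))    # 0.0 -> u and r ARE orthogonal

# r lies in the grey plane: r = 0.75 u - 2.5 v
print(np.allclose(0.75 * u - 2.5 * v, r))   # True
```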
Orthogonal spaces:
- Let's go back to the vectors \({\bf y}\) and \({\bf w}\)
above, both of which were on the same line:
- Now, \(\mbox{span}({\bf y,w})\) is a subspace,
because all linear combinations of any two vectors on this
line are on the line.
- Consider all the vectors orthogonal to \({\bf y}\).
- These are all going to be on a plane to which the line
is perpendicular.
- Example vectors on this perpendicular plane are shown above.
- Let \({\bf S}\) be the subspace of vectors on this
perpendicular plane (perpendicular to the line).
- Notice: every vector in \({\bf S}\) is perpendicular
to every vector in \(\mbox{span}({\bf y,w})\).
- Thus, \({\bf S}\) and \(\mbox{span}({\bf y,w})\)
are orthogonal subspaces.
- There's notation for this:
$$
{\bf S}^\perp \eql \mbox{span}({\bf y,w})
$$
and
$$
{\bf S} \eql \mbox{span}({\bf y,w})^\perp
$$
- Orthogonal subspaces aren't really seen in practice, but
they are useful in proofs.
Let's review independence:
- Vectors \({\bf v}_1, {\bf v}_2, \ldots, {\bf v}_n\)
are linearly independent if the only
solution to the equation
$$
x_1 {\bf v}_1 + x_2 {\bf v}_2 + \ldots + x_n {\bf v}_n
\eql {\bf 0}
$$
is \(x_1 = x_2 = \ldots = x_n = 0\).
- Here, the \({\bf v}_i\)'s are vectors (any dimension: 2D,
3D, whatever).
- Every vector in the collection needs to be non-zero
(otherwise, trivially, a zero vector's coefficient can be
anything, and the definition will fail).
- The \(x_i\)'s are numbers (scalars in the linear combination)
- Notice the right side: that's the zero vector of the same
dimension as any of the \({\bf v}_i\)'s
$$
{\bf 0} \eql (0,0,\ldots,0)
$$
- Consider these three vectors:
$$\eqb{
{\bf u} & \eql & (4,0,2) \\
{\bf v} & \eql & (0,5,3) \\
{\bf r} & \eql & (3,-\frac{25}{2},-6) \\
}$$
Are they independent?
- Observe that
$$\eqb{
\frac{3}{4} {\bf u} - \frac{5}{2}{\bf v} - {\bf r}
& \eql &
\frac{3}{4}(4,0,2) - \frac{5}{2}(0,5,3) - (-3,-\frac{25}{2},-6)\\
& \eql & (0,0,0)
}$$
- So, in setting
$$
x_1 {\bf u} + x_2 {\bf v} + x_3 {\bf r} \eql {\bf 0}
$$
we know that \((x_1,x_2,x_3) = (\frac{3}{4}, -\frac{5}{2}, -1)\)
is one possible solution (there are others).
- This is a non-zero solution and hence the three are not
independent.
- Which makes sense since all three lie on the plane (from earlier):
- How can we check, given a collection of vectors, that they
are independent?
- Recall, we are asking for the solution to
$$
x_1 {\bf v}_1 + x_2 {\bf v}_2 + \ldots + x_n {\bf v}_n
\eql {\bf 0}
$$
- Place the vectors as columns of a matrix \({\bf A}\):
$$
{\bf A} \eql
\mat{
& & & \\
\vdots & \vdots & \ldots & \vdots \\
{\bf v}_1 & {\bf v}_2 & \ldots & {\bf v}_n\\
\vdots & \vdots & \ldots & \vdots \\
& & &
}
$$
- So,
$$
x_1 {\bf v}_1 + x_2 {\bf v}_2 + \ldots + x_n {\bf v}_n
\eql {\bf 0}
$$
becomes
$$
\mat{
& & & \\
\vdots & \vdots & \ldots & \vdots \\
{\bf v}_1 & {\bf v}_2 & \ldots & {\bf v}_n\\
\vdots & \vdots & \ldots & \vdots \\
& & &
}
\mat{x_1\\
x_2\\
x_3\\
\vdots\\
x_n}
\eql
\mat{0\\ 0\\ 0\\ \vdots\\ 0}
$$
- Or, more compactly: solve \({\bf Ax} = {\bf 0}\).
- This is something we know how to do (compute the RREF etc).
- From Theorem 8.1 we know that:
- When \(\mbox{RREF}({\bf A}) = {\bf I}\) (i.e., the RREF is
full-rank), then \({\bf x}={\bf 0}\) is the only solution to
\({\bf Ax} = {\bf 0}\).
- Thus, the vectors will be independent if the RREF is
full-rank (a pivot in every column).
- This gives us a means to identify whether a
collection of vectors is independent: put them in a matrix and
check its RREF.
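In code, the quickest way to run this check is to ask for the rank of that matrix (a pivot in every column is the same as full column rank). A minimal numpy sketch using the vectors above:
```python
import numpy as np

u = np.array([4.0, 0.0, 2.0])
v = np.array([0.0, 5.0, 3.0])
r = np.array([3.0, -12.5, -6.0])

A = np.column_stack([u, v, r])            # the vectors as columns
rank = np.linalg.matrix_rank(A)
print(rank, A.shape[1])                   # 2 3
print(rank == A.shape[1])                 # False -> u, v, r are NOT independent

# Dropping r leaves an independent pair:
print(np.linalg.matrix_rank(np.column_stack([u, v])) == 2)   # True
```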
Lastly, let's review the rowspace and colspace of a matrix:
- Consider a matrix like
$$
{\bf A}
\eql
\mat{1 & 1 & 1 & 0 & 3\\
-1 & 0 & 1 & 1 & -1\\
0 & 1 & 2 & 1 & 2\\
0 & 0 & 0 & 0 & 0}
$$
- Let's treat the rows as vectors and name them:
$$\eqb{
{\bf r}_1 & \eql & (1,1,1,0,3)\\
{\bf r}_2 & \eql & (-1,0,1,1,-1)\\
{\bf r}_3 & \eql & (0,1,2,1,2)\\
{\bf r}_4 & \eql & (0,0,0,0,0)
}$$
- Then the span of these vectors is the rowspace of the matrix:
$$
\mbox{rowspace}({\bf A}) \eql \mbox{span}({\bf r}_1, {\bf r}_2,
{\bf r}_3, {\bf r}_4)
$$
- That is, any linear combination of the four row vectors is in
the rowspace.
- Example:
$$\eqb{
2{\bf r}_1 + 3 {\bf r}_2 + 0 {\bf r}_3 +5 {\bf r}_4
& \eql &
2(1,1,1,0,3) + 3(-1,0,1,1,-1) + 0(0,1,2,1,2) + 5(0,0,0,0,0)\\
& \eql &
(-1,2,5,3,3)
}$$
So \((-1,2,5,3,3)\) is a vector in the rowspace.
- Similarly, if we name the 5 columns \({\bf c}_1, {\bf c}_2,
{\bf c}_3,{\bf c}_4,{\bf c}_5\) then the colspace is their span:
$$
\mbox{colspace}({\bf A}) \eql \mbox{span}({\bf c}_1, {\bf c}_2,
{\bf c}_3,{\bf c}_4,{\bf c}_5)
$$
- What's of interest here is:
- What is the dimension of the
rowspace, that is, how many vectors are needed for a basis of the rowspace?
- (same question for colspace)
- Now the RREF turns out to be:
$$
\mat{{\bf 1} & 0 & -1 & -1 & 1\\
0 & {\bf 1} & 2 & 1 & 2\\
0 & 0 & 0 & 0 & 0\\
0 & 0 & 0 & 0 & 0}
$$
- Clearly, the rowspace of the RREF is the span of
the first two rows.
- Now for the key observation: the row operations we performed
do not change the rowspace of the matrices along the way from
\({\bf A}\) to its RREF.
- Why?
- All row operations are linear combinations of
rows.
- To see why, just recall how we wrote them. Example:
\({\bf r}_1 \leftarrow {\bf r}_1 - 2{\bf r}_2\).
- Unfortunately, it is not true that
\(\mbox{colspace}({\bf A}) = \mbox{colspace}(RREF({\bf A}))\).
- What is true is that the size of the basis for
\(\mbox{colspace}({\bf A})\)
is the same as the number of pivot columns.
- Because the number of pivots simultaneously counts the independent
rows and the pivot columns, we get the remarkable result that the
dimension of the rowspace is the same as the dimension of the colspace.
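A quick numpy check of this equality for the example matrix (numpy's matrix_rank reports the number of pivots, which is both dimensions at once):
```python
import numpy as np

A = np.array([[ 1, 1, 1, 0,  3],
              [-1, 0, 1, 1, -1],
              [ 0, 1, 2, 1,  2],
              [ 0, 0, 0, 0,  0]])

print(np.linalg.matrix_rank(A))      # 2  (dimension of the colspace)
print(np.linalg.matrix_rank(A.T))    # 2  (dimension of the rowspace: the same)
```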
- All of this is summarized in a picture we saw before:
r.7 Orthogonality
There are two subtopics of orthogonality that commonly appear in
applications:
- Orthogonal vectors and matrices.
- Projections.
Let's start with orthogonal vectors:
- Two vectors \({\bf u}\) and \({\bf v}\) are orthogonal if
$$
{\bf u} \cdot {\bf v} \eql 0
$$
Note:
- The right side is the number 0 (since the dot product
gives us a number).
- This is unlike the linear independence definition where the
\({\bf 0}\) on the right side is the zero vector.
- The definition comes from the angle between them, which
is a right-angle:
$$
\cos(\theta) \eql \frac{ {\bf u} \cdot {\bf v}}{ |{\bf u}||{\bf v}|}
$$
Since the lengths aren't zero, if the dot product is zero, that
implies a zero cosine, or \(\theta=90^\circ\).
- This definition extends to a collection of vectors:
- Consider vectors \({\bf v}_1,{\bf v}_2,{\bf v}_3\).
- The collection is orthogonal if
$$\eqb{
{\bf v}_1 \cdot {\bf v}_2 & \eql & 0\\
{\bf v}_1 \cdot {\bf v}_3 & \eql & 0\\
{\bf v}_2 \cdot {\bf v}_3 & \eql & 0
}$$
Since dot-product is commutative we don't need to also list
products like \({\bf v}_2 \cdot {\bf v}_1 = 0\).
- Example: \({\bf v}_1=(4,4,4), {\bf v}_2=(4,-8,4), {\bf v}_3=(6,0,-6)\)
$$\eqb{
{\bf v}_1 \cdot {\bf v}_2
& \eql & (4,4,4) \cdot (4,-8,4) & \eql & 0\\
{\bf v}_1 \cdot {\bf v}_3
& \eql & (4,4,4) \cdot (6,0,-6) & \eql & 0\\
{\bf v}_2 \cdot {\bf v}_3
& \eql & (4,-8,4) \cdot (6,0,-6) & \eql & 0\\
}$$
- Now consider this equation
$$
\alpha_1 {\bf v}_1 + \alpha_2 {\bf v}_2 + \alpha_3 {\bf v}_3
= {\bf 0}
$$
Does this imply that all the \(\alpha\)'s are zero, and therefore the
\({\bf v}\)'s are independent?
- Multiply (dot product) both sides by \({\bf v}_1\):
$$
{\bf v}_1 \cdot (\alpha_1 {\bf v}_1 + \alpha_2 {\bf v}_2 + \alpha_3 {\bf v}_3)
\eql {\bf v}_1 \cdot {\bf 0}
$$
- The right side becomes the number 0.
- Do the algebra on the left side, passing the dot into the parens:
$$
\alpha_1 ({\bf v}_1 \cdot {\bf v}_1) + \alpha_2 ({\bf v}_1 \cdot {\bf v}_2) + \alpha_3 ({\bf v}_1 \cdot {\bf v}_3)
\eql 0
$$
- Because the vectors are orthogonal, the only possibly non-zero
dot product is the first one, so
$$
\alpha_1 ({\bf v}_1 \cdot {\bf v}_1) + 0 + 0
\eql 0
$$
- Now apply the dot product
$$
{\bf v}_1 \cdot {\bf v}_1 \eql |{\bf v}_1| |{\bf v}_1| \cos(0)
$$
- And so, we get
$$
\alpha_1 |{\bf v}_1| |{\bf v}_1|
\eql 0
$$
- Which, because \(|{\bf v}_1| \neq 0\), implies that
$$
\alpha_1 \eql 0
$$
- To conclude, an orthogonal collection of non-zero vectors is
linearly independent.
- This makes sense geometrically too.
- For example, with the three vectors in the example above:
- \({\bf v}_3\) is not in the plane of
\({\bf v}_1\) and \({\bf v}_2\), and so, cannot
be expressed as a linear combination of \({\bf v}_1\) and \({\bf v}_2\).
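A quick numpy check that the example collection is pairwise orthogonal and, as just argued, independent:
```python
import numpy as np

v1 = np.array([4,  4,  4])
v2 = np.array([4, -8,  4])
v3 = np.array([6,  0, -6])

# Pairwise orthogonal ...
print(v1 @ v2, v1 @ v3, v2 @ v3)       # 0 0 0

# ... and therefore independent: full column rank
V = np.column_stack([v1, v2, v3])
print(np.linalg.matrix_rank(V) == 3)   # True
```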
What about complex vectors?
- The visualization problem with complex vectors (even 2D complex vectors) is
that we can't draw them.
- For example, consider these two 3-component complex vectors:
$$\eqb{
{\bf u} & \eql & (1 + 0i, \; -0.5 + 0.866i, \; -0.5 - 0.866i)\\
{\bf v} & \eql & (1 + 0i, \; 1 + 0i, \; 1 + 0i)
}$$
- We can't really visualize because there's no convenient way
of drawing a complex vector like this, and so, there's no geometric
way to define an angle between them.
- However, their dot-product is:
$$
(1 + 0i)(1 - 0i) + (-0.5+0.866i)(1-0i) + (-0.5 - 0.866i)(1 - 0i)
\eql 0
$$
(Recall: we conjugate the elements of the second vector.)
- Thus, we define orthogonality between two complex
vectors to mean: their dot-product is zero.
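A small numpy sketch of this complex dot product. Note that numpy's vdot conjugates its first argument, whereas the convention here conjugates the second, so we also spell the sum out explicitly:
```python
import numpy as np

u = np.array([1 + 0j, -0.5 + 0.866j, -0.5 - 0.866j])
v = np.array([1 + 0j,  1 + 0j,        1 + 0j])

# Dot product with the convention used here: conjugate the second vector.
print(np.sum(u * np.conj(v)))    # approximately 0 -> orthogonal

# np.vdot conjugates its FIRST argument; for a zero result the two conventions agree.
print(np.vdot(v, u))             # also approximately 0
```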
Orthogonal matrices
- First, let's sort out orthogonal vs orthonormal.
- Two vectors \({\bf u}\) and \({\bf v}\)
are orthonormal if
- They are orthogonal.
- They have unit length: \(|{\bf u}| = 1\), \(|{\bf v}| = 1\)
- There is, unfortunately, a bit of confusing nomenclature
regarding orthogonal matrices.
- An orthogonal matrix is a matrix whose columns are
pairwise orthonormal.
- Note: pairwise means "all possible pairs" like
column 2 and column 5.
- For example, consider
$$
{\bf A} \eql
\mat{4 & 4 & 6\\
4 & -8 & 0\\
4 & 4 & -6}
$$
You can check that the columns are pairwise orthogonal
but not orthonormal.
- For example: \(|(4,4,4)| = \sqrt{4^2+4^2+4^2} = 6.92820\).
- However, the following matrix, obtained by scaling each of the
above columns to unit length, is orthonormal:
$$
{\bf Q} \eql
\mat{0.577 & 0.408 & 0.707\\
0.577 & -0.816 & 0\\
0.577 & 0.408 & -0.707}
$$
The columns are pairwise orthogonal and all have unit length.
- What's nice about an orthonormal matrix is this:
$$
{\bf Q}^T {\bf Q} \eql {\bf I}
$$
- With the above example:
$$
{\bf Q}^T {\bf Q} \eql
\mat{0.577 & 0.577 & 0.577\\
0.408 & -0.816 & 0.408\\
0.707 & 0 & -0.707}
\mat{0.577 & 0.408 & 0.707\\
0.577 & -0.816 & 0\\
0.577 & 0.408 & -0.707}
\eql
\mat{1 & 0 & 0\\
0 & 1 & 0\\
0 & 0 & 1}
$$
To see why:
- The row i, column j entry of the product is
the dot product of row i from \({\bf Q}^T\)
and column j from \({\bf Q}\)
- But row i from \({\bf Q}^T\) is just column i from \({\bf Q}\)
- When \(i\neq j\),
column i and column j of \({\bf Q}\) are orthogonal and
so their dot-product is 0.
- When i=j, it's the dot-product of a column with itself,
which is 1.
- The important implication is that \({\bf Q}^T\) is the
inverse of \({\bf Q}\).
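A small numpy sketch that builds \({\bf Q}\) by normalizing the columns of the earlier matrix and checks both properties (the construction is ours, for illustration):
```python
import numpy as np

A = np.array([[4.,  4.,  6.],
              [4., -8.,  0.],
              [4.,  4., -6.]])

# Normalize each column of A to get an orthonormal matrix Q:
Q = A / np.linalg.norm(A, axis=0)

print(np.round(Q, 3))                       # the matrix Q shown above
print(np.allclose(Q.T @ Q, np.eye(3)))      # True
print(np.allclose(np.linalg.inv(Q), Q.T))   # True: Q^T is the inverse of Q
```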
r.8 Projections
Let's start with a 2D example:
- Here, \({\bf w}\) and \({\bf v}\) are any two vectors.
- The projection of \({\bf w}\) on \({\bf v}\) is
the vector \({\bf y} = \alpha{\bf v}\) along \({\bf v}\) that makes the difference
\({\bf z} = {\bf w} - \alpha{\bf v}\) perpendicular to \({\bf v}\) (dot product zero):
$$
({\bf w} - \alpha{\bf v}) \cdot {\bf v} \eql 0
$$
That is, there is some stretch \(\alpha {\bf v}\) of
\({\bf v}\) that makes the difference \({\bf z}\) perpendicular
to \({\bf v}\).
- We can solve for the number \(\alpha\):
$$
\alpha \eql \frac{{\bf w} \cdot {\bf v}}{{\bf v} \cdot {\bf v}}
$$
- When, for example,
$$\eqb{
{\bf w} & \eql & (4,3) \\
{\bf v} & \eql & (6,2)
}$$
we get
$$
\alpha \eql \frac{(4,3) \cdot (6,2)}{(6,2) \cdot (6,2)}
\eql 0.75
$$
- Although we have drawn the above with 2D vectors, a
projection from one vector to another works in any number of
dimensions, for example:
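A minimal numpy sketch of this projection, using the 2D example above (the same function works unchanged in any number of dimensions; the function name is ours):
```python
import numpy as np

def project(w, v):
    """Projection of w onto v: alpha * v, with alpha = (w.v)/(v.v)."""
    return (np.dot(w, v) / np.dot(v, v)) * v

w = np.array([4.0, 3.0])
v = np.array([6.0, 2.0])

alpha = np.dot(w, v) / np.dot(v, v)
y = project(w, v)
print(alpha)              # 0.75, as computed above
print(y)                  # [4.5 1.5]
print(np.dot(w - y, v))   # 0.0: the difference is perpendicular to v
```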
Next, let's look at a 3D vector that's projected onto a plane whose
basis is orthogonal:
Let's make sense of this busy picture:
- Think of \({\bf w}\) as a regular 3D vector sticking out into
3D space.
- Next, let \({\bf v}_1\) and \({\bf v}_2\) be two
orthogonal vectors in the x-y plane.
- We want to ask: what's the projection of \({\bf w}\)
onto each of \({\bf v}_1\) and \({\bf v}_2\) and what do
those individual projections have to do with \({\bf y}\), the
projection of \({\bf w}\) on the span of the two vectors?
- The individual projections are:
$$\eqb{
\alpha_1 {\bf v}_1 \\
\alpha_2 {\bf v}_2 \\
}$$
where
$$\eqb{
\alpha_1 & \eql & \frac{{\bf w} \cdot {\bf v}_1}{{\bf v}_1 \cdot {\bf v}_1}\\
\alpha_2 & \eql & \frac{{\bf w} \cdot {\bf v}_2}{{\bf v}_2 \cdot {\bf v}_2}\\
}$$
- But the sum of these is exactly \({\bf y}\):
$$
{\bf y} \eql \alpha_1{\bf v}_1 + \alpha_2{\bf v}_2
$$
- Note: you cannot reconstruct the original vector \({\bf w}\)
knowing only the individual projections
\(\alpha_1{\bf v}_1\) and \(\alpha_2{\bf v}_2\).
- That's because \({\bf w}\) does not lie in the span of
\({\bf v}_1\) and \({\bf v}_2\).
So, finally, let's look at the case when
\({\bf w}\) is in the span of
\({\bf v}_1\) and \({\bf v}_2\).
In this case:
- The individual projections add up to the original vector \({\bf w}\):
$$
{\bf w} \eql \alpha_1{\bf v}_1 + \alpha_2{\bf v}_2
$$
- Thus, when \({\bf w}\) is in the span of
\({\bf v}_1\) and \({\bf v}_2\),
we can fully reconstruct \({\bf w}\) knowing only the
individual projections on the basis vectors.
- Note: although we have used a basis for the x-y plane,
the above reasoning applies to any subspace and any orthogonal basis
for that subspace.
- Thus, in the general case we'd say that
$$
{\bf w} \eql \alpha_1{\bf v}_1 + \alpha_2{\bf v}_2
\; \ldots \; + \alpha_n{\bf v}_n
$$
where
$$
\alpha_i \eql \frac{{\bf w} \cdot {\bf v}_i}{{\bf v}_i \cdot {\bf v}_i}
$$
A further simplification when the basis is orthonormal:
- When the \({\bf v}_i\)'s are orthonormal, they have unit length
and so
$$
{\bf v}_i \cdot {\bf v}_i \eql |{\bf v}_i|^2 \eql 1.
$$
Which means
$$
\alpha_i \eql {\bf w} \cdot {\bf v}_i
$$
- Then
$$
{\bf w} \eql ({\bf w} \cdot {\bf v}_1) {\bf v}_1 +
\; \ldots \; + ({\bf w} \cdot {\bf v}_n) {\bf v}_n
$$
This is worth remembering: many applications use orthonormal bases.
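A small numpy sketch of this reconstruction, using a made-up orthonormal basis for the x-y plane inside 3D:
```python
import numpy as np

# A made-up orthonormal basis for the x-y plane inside 3D:
v1 = np.array([1.0,  1.0, 0.0]) / np.sqrt(2)
v2 = np.array([1.0, -1.0, 0.0]) / np.sqrt(2)

# A vector that lies in that plane is fully reconstructed:
w = np.array([2.0, -3.0, 0.0])
alphas = [np.dot(w, v) for v in (v1, v2)]                 # alpha_i = w . v_i
print(np.allclose(alphas[0] * v1 + alphas[1] * v2, w))    # True

# A vector NOT in the plane only gives back its projection onto the plane:
w3 = np.array([2.0, -3.0, 5.0])
print(sum(np.dot(w3, v) * v for v in (v1, v2)))           # [ 2. -3.  0.], not w3
```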