Module objectives
By the end of this module, you should be able to
- Describe and interpret a few key theoretical results
in linear algebra.
- Describe how one of the results applies to least-squares.
- Describe a linear transformation and its relationship
to matrices.
Before we start, let's review the two different meanings
of dimension:
- The dimension of a vector is the number of
components in that vector.
- This is the same sense as in the commonly used phrases 2D or 3D.
- Examples:
$$
\vectwo{1}{-2}
$$
is two-dimensional whereas
$$
\vecthree{-1.5}{2}{4.6}
$$
is three-dimensional.
- The dimension of a subspace (which is a set, not a
single thing) is the number
of vectors in a basis for that subspace.
- This asks: what is the minimum number of vectors
needed to span the given subspace?
- Example: suppose \({\bf W} = \{(x_1,x_2,0): x_i\mbox{ is real}\}\)
- \({\bf W}\) is a space consisting of three-component vectors.
- How many vectors are needed to span \({\bf W}\)?
- The two vectors \({\bf u}=(1,0,0)\) and \({\bf v}=(0,1,0)\)
are sufficient but any one vector all by itself is not.
- Thus the dimension of \({\bf W}\) is 2.
- We write this as \(\mbox{dim}\;{\bf W} = 2\).
If this were not confusing enough, one also speaks of the
dimensions (plural) of a matrix: the number of rows and the
number of columns.
8.0
What is a fundamental theorem?
In-Class Exercise 1:
Type the search term "fundamental theorem of a" to see
suggested completions, then try "fundamental theorem of b",
and so on. What is the first letter that fails?
About fundamental theorems:
- Clearly, the world is rife with fundamental theorems.
- Somehow, a fundamental theorem gets its appellation from
an informal consensus within its field.
- Linear algebra is no different. Various books appear to
pick one result (see below) as the fundamental theorem of
linear algebra.
- Even though at least a few such results deserve equal mention.
- We will thus elevate a few of these and call them all
fundamental theorems, one of them designated
The fundamental theorem.
- Each such major theorem actually exists in many forms.
\(\rhd\)
For example, stated for real matrices or, most generally, for complex ones.
- We will present the most "practical" version (with real
numbers) where possible.
8.1
A few definitions as background
The following terms and notions will be needed, some
of which we've seen before.
- Let \({\bf A}_{m\times n}\) be a real matrix with rows
$$
\mat{
\ldots & {\bf r}_1 & \ldots\\
\ldots & {\bf r}_2 & \ldots\\
& \vdots & \\
\ldots & {\bf r}_m & \ldots\\
}
$$
and columns
$$
\mat{
& & & \\
\vdots & \vdots & & \vdots \\
{\bf c}_1 & {\bf c}_2 & \ldots & {\bf c}_n\\
\vdots & \vdots & & \vdots \\
& & &
}
$$
- Definition: The rowspace is the span of the
rows (treating the rows as vectors):
$$
\mbox{rowspace}({\bf A}) \eql
\mbox{span}({\bf r}_1, {\bf r}_2, \ldots, {\bf r}_m)
$$
- Definition: The colspace is the span of the
columns (treating the columns as vectors):
$$
\mbox{colspace}({\bf A}) \eql
\mbox{span}({\bf c}_1, {\bf c}_2, \ldots, {\bf c}_n)
$$
- Definition: The rank is the dimension of the
rowspace or the colspace (which we know are the same):
$$
\mbox{rank}({\bf A}) \eql
\mbox{dim}(\;\mbox{colspace}({\bf A})\; )
$$
- Definition: The nullspace is the set of all
vectors that solve \({\bf Ax}={\bf 0}\)
$$
\mbox{nullspace}({\bf A}) \eql
\{{\bf x}: {\bf Ax}={\bf 0} \}
$$
A useful picture to help recall the definitions and their meanings: (figure omitted)
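To make the nullspace definition concrete, here is a minimal plain-Java sketch (not part of the course code; the matrix and vectors in main are made up for illustration) that tests membership in the nullspace by checking whether \({\bf Ax}={\bf 0}\):

```java
// A minimal sketch of the nullspace definition:
// a vector x is in nullspace(A) exactly when Ax = 0.
public class NullspaceCheck {
    // Multiply an m x n matrix A by an n-vector x.
    static double[] multiply(double[][] A, double[] x) {
        double[] result = new double[A.length];
        for (int i = 0; i < A.length; i++)
            for (int j = 0; j < x.length; j++)
                result[i] += A[i][j] * x[j];
        return result;
    }

    // Check whether Ax is (numerically) the zero vector.
    static boolean inNullspace(double[][] A, double[] x) {
        for (double yi : multiply(A, x))
            if (Math.abs(yi) > 1e-10) return false;
        return true;
    }

    public static void main(String[] args) {
        double[][] A = { {1, 2}, {2, 4} };                        // made-up rank-1 matrix
        System.out.println(inNullspace(A, new double[]{2, -1}));  // true: (2,-1) solves Ax = 0
        System.out.println(inNullspace(A, new double[]{1,  1}));  // false
    }
}
```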
8.2
The theorem on invertibility and \({\bf Ax}={\bf b}\)
We've seen the first one before, but let's restate it in full
glory.
Theorem 8.1:
The following are equivalent (each
implies any of the others) for a real matrix \({\bf A}_{n\times n}\):
- \({\bf A}\) is invertible (the inverse exists).
- \({\bf A}^T\) is invertible.
- \(RREF({\bf A}) \eql {\bf I}\)
- \(\mbox{rank}({\bf A}) = n\).
- The rows are linearly independent.
- The columns are linearly independent.
- \(\mbox{nullspace}({\bf A})=\{{\bf 0}\}\)
\(\rhd\)
\({\bf Ax}={\bf 0}\) has \({\bf x}={\bf 0}\) as the only solution.
- \({\bf Ax}={\bf b}\) has a unique solution for every \({\bf b}\).
- \(\mbox{colspace}({\bf A})=\mbox{rowspace}({\bf A})=\mathbb{R}^n\)
Note:
- We saw most of this in the proofs relating to the RREF (Module 5)
and the results following the definition of linear independence
(Module 6).
- All of the proofs essentially boil down to understanding
what happens with the RREF.
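As an illustration of the rank condition in Theorem 8.1, here is a small, self-contained Java sketch (assumptions: plain arrays, no course libraries, made-up example matrices) that counts pivots via Gaussian elimination and declares a square matrix invertible exactly when the count reaches \(n\):

```java
// A hedged sketch connecting two items of Theorem 8.1: a square matrix is
// invertible exactly when its rank is n, i.e., elimination produces n pivots.
public class InvertibilityCheck {
    // Count pivots via Gaussian elimination with partial pivoting.
    static int rank(double[][] M) {
        int m = M.length, n = M[0].length, pivots = 0;
        double[][] A = new double[m][];
        for (int i = 0; i < m; i++) A[i] = M[i].clone();   // work on a copy
        for (int col = 0; col < n && pivots < m; col++) {
            int best = pivots;
            for (int r = pivots; r < m; r++)
                if (Math.abs(A[r][col]) > Math.abs(A[best][col])) best = r;
            if (Math.abs(A[best][col]) < 1e-10) continue;   // no pivot in this column
            double[] tmp = A[pivots]; A[pivots] = A[best]; A[best] = tmp;
            for (int r = pivots + 1; r < m; r++) {
                double factor = A[r][col] / A[pivots][col];
                for (int c = col; c < n; c++) A[r][c] -= factor * A[pivots][c];
            }
            pivots++;
        }
        return pivots;
    }

    static boolean isInvertible(double[][] A) {
        return A.length == A[0].length && rank(A) == A.length;
    }

    public static void main(String[] args) {
        double[][] good = { {2, 1}, {1, 3} };    // independent rows => invertible
        double[][] bad  = { {1, 2}, {2, 4} };    // dependent rows   => not invertible
        System.out.println(isInvertible(good));  // true
        System.out.println(isInvertible(bad));   // false
    }
}
```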
8.3
The rank-nullity theorem
Theorem 8.2:
For a real matrix \({\bf A}_{m\times n}\),
\(
\mbox{rank}({\bf A}) + \mbox{dim}(\mbox{nullspace}({\bf A})) \eql n
\)
Interpretation and proof:
- Here, \(n\) = the number of columns.
- Recall: \(\mbox{nullspace}({\bf A}) =
\{{\bf x}: {\bf Ax}={\bf 0} \}\)
- That is, the set of all vectors \({\bf x}\) that satisfy
\({\bf Ax}={\bf 0}\).
- We first need to understand how the nullspace can have
a dimension.
- If a few linearly independent vectors within the nullspace
span the nullspace, then those vectors form a basis
for the nullspace.
- The number of such linearly independent vectors
is the dimension of the nullspace.
- The theorem says that the rank of the matrix and this
dimension are related.
- Paradoxically, this will be easier to
understand after the proof.
- But the proof itself will be easier to understand once
we look at an example.
- Consider the matrix
$$
{\bf A} \eql
\mat{
2 & 1 & 3 & 0 & 3\\
1 & 0 & 1 & 1 & 2\\
3 & 2 & 5 & -1 & 4\\
}
$$
- In solving \({\bf Ax}={\bf 0}\), we get the RREF
$$
\left[
\begin{array}{ccccc|c}
{\bf 1} & 0 & 1 & 1 & 2 & 0\\
0 & {\bf 1} & 1 & -2 & -1 & 0\\
0 & 0 & 0 & 0 & 0 & 0\\
\end{array}
\right]
$$
In-Class Exercise 2:
Use the RREF to write down the corresponding equations with
the basic variables on the left and free variables on the right.
Express \(x_1\) as a linear combination of \(x_3,x_4,x_5\).
Then express \(x_2\) as a linear combination of \(x_3,x_4,x_5\).
What are the solutions when:
- \(x_3=1,x_4=0,x_5=0\)?
- \(x_3=0,x_4=1,x_5=0\)?
- \(x_3=0,x_4=0,x_5=1\)?
- Clearly, any setting of values to the three free
variables will result in fixing the values of the basic variables.
- In particular consider the vectors
$$
{\bf v}_1 \eql \mat{-1 \\ -1\\ 1\\ 0\\ 0}
\;\;\;\;\;\;
{\bf v}_2 \eql \mat{-1 \\ 2\\ 0\\ 1\\ 0}
\;\;\;\;\;\;
{\bf v}_3 \eql \mat{-2 \\ 1\\ 0\\ 0\\ 1}
$$
Note:
- We want to argue that these three vectors span
the space of all possible solutions to \({\bf Ax}={\bf 0}\).
- Consider any solution \({\bf y}\) to \({\bf Ax}={\bf 0}\).
- Then, the 3D vector \((y_3, y_4, y_5)\)
can be expressed as a linear combination of
\((1,0,0), (0,1,0), (0,0,1)\), i.e.
$$
\vecthree{y_3}{y_4}{y_5}
\eql
\alpha
\vecthree{1}{0}{0}
+ \beta
\vecthree{0}{1}{0}
+ \gamma
\vecthree{0}{0}{1}
$$
for some choice of \(\alpha, \beta, \gamma\).
- But notice that
$$\eqb{
y_1 & \eql & (-1)\cdot y_3 + (-1)\cdot y_4 + (-2)\cdot y_5
& \;\;\;\;\;\;
\eql \mbox{ a linear combination of } y_3,y_4,y_5\\
y_2 & \eql & (-1)\cdot y_3 + (2) \cdot y_4 + (1)\cdot y_5
& \;\;\;\;\;\;
\eql \mbox{ a linear combination of } y_3,y_4,y_5
}$$
- Thus,
$$
\mat{y_1\\ y_2\\ y_3\\ y_4\\ y_5}
\eql
\alpha
\mat{-1 \\ -1\\ 1\\ 0\\ 0}
+ \beta
\mat{-1 \\ 2\\ 0\\ 1\\ 0}
+ \gamma
\mat{-2 \\ 1\\ 0\\ 0\\ 1}
\eql
\alpha {\bf v}_1 + \beta {\bf v}_2 + \gamma {\bf v}_3
$$
where \(\alpha=y_3\), \(\beta=y_4\), \(\gamma=y_5\). That is, every
solution \({\bf y}\) is a linear combination of
\({\bf v}_1, {\bf v}_2, {\bf v}_3\).
- What is the intuition here?
- The basic variable \(y_1\) is determined by whatever
choice of values we make for free variables \(y_3,y_4,y_5\).
- The form of this determination is a
linear combination.
- For example, we saw that
$$
y_1 \eql (-1)\cdot y_3 + (-1)\cdot y_4 + (-2)\cdot y_5
$$
- Thus, any choice of values for \(y_3,y_4,y_5\)
determines \(y_1,y_2\) such that
\({\bf Ay}={\bf 0}\).
- Which means the dimension of this subspace of solutions
to \({\bf Ay}={\bf 0}\) is the number of
vectors needed to span all possible \(y_3,y_4,y_5\).
- In other words, the dimension is 3 (the number of free variables).
- Thus, the dimension of the nullspace is the same as
the number of non-pivot columns.
- Proof of Theorem 8.2:
Consider the RREF of \({\bf A}\). As we saw in Module 5, the pivot
columns correspond to the basic variables, and the number of pivot
columns is the rank. As the argument above shows, the dimension of the
nullspace is exactly the number of non-pivot columns. The two counts sum
to \(n\), the total number of columns. \(\;\;\;\;\Box\)
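As a quick numeric sanity check of the example above, here is a plain-Java sketch (using the same \({\bf A}\) and \({\bf v}_1,{\bf v}_2,{\bf v}_3\) from this section) confirming that each \({\bf v}_i\) solves \({\bf Av}={\bf 0}\), consistent with rank \(2\) plus nullspace dimension \(3\) equaling \(n=5\):

```java
// A small check of the worked example: each of the vectors v1, v2, v3 read
// off from the RREF's free variables satisfies Av = 0, so they sit in the
// nullspace, consistent with rank 2 + nullity 3 = 5 columns.
public class RankNullityExample {
    static double[] multiply(double[][] A, double[] x) {
        double[] y = new double[A.length];
        for (int i = 0; i < A.length; i++)
            for (int j = 0; j < x.length; j++)
                y[i] += A[i][j] * x[j];
        return y;
    }

    public static void main(String[] args) {
        double[][] A = {
            {2, 1, 3,  0, 3},
            {1, 0, 1,  1, 2},
            {3, 2, 5, -1, 4}
        };
        double[][] nullspaceBasis = {
            {-1, -1, 1, 0, 0},   // v1
            {-1,  2, 0, 1, 0},   // v2
            {-2,  1, 0, 0, 1}    // v3
        };
        for (double[] v : nullspaceBasis)
            System.out.println(java.util.Arrays.toString(multiply(A, v)));
        // Each printed vector should be (0, 0, 0).
    }
}
```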
8.4
The Fundamental Theorem of linear algebra
To understand The fundamental theorem, we'll first need to understand the
orthogonal complement of a subspace:
- Think about the following two subspaces in 3D:
- \({\bf W}_{xy}\) = all vectors in the (x,y)-plane.
- \({\bf W}_{z}\) = all vectors along the z-axis.
- Clearly, every vector \({\bf u}\in {\bf W}_{xy}\) is
perpendicular to every vector \({\bf v}\in {\bf W}_{z}\).
- More importantly, if \({\bf v}\) is any vector that
is orthogonal to every \({\bf u}\in {\bf W}_{xy}\), then
\({\bf v}\in {\bf W}_{z}\).
- Thus, \({\bf W}_{z}\) consists of exactly those vectors that are
orthogonal to every vector in \({\bf W}_{xy}\).
\(\rhd\)
We say that \({\bf W}_{z}\) is the orthogonal complement
of \({\bf W}_{xy}\).
- For our 3D example, the statements above are obvious and
don't need proof.
- But in general, we will define orthogonal complement
as follows.
- Definition:
A vector \({\bf v}\) is orthogonal to a subspace \({\bf W}\)
if it is orthogonal to every vector in \({\bf W}\).
- Definition:
The orthogonal complement of a subspace \({\bf W}\), denoted
by \({\bf W}^\perp\), is the set of all vectors orthogonal to
\({\bf W}\).
- Clearly, \({\bf W} = ({\bf W}^\perp)^\perp\).
Theorem 8.3: (The fundamental theorem of linear algebra)
Let \({\bf A}_{m\times n}\) be a matrix with rank \(r\).
- \(\mbox{dim}(\mbox{colspace}({\bf A})) \eql r\)
- \(\mbox{dim}(\mbox{rowspace}({\bf A})) \eql r\)
- \(\mbox{dim}(\mbox{nullspace}({\bf A})) \eql n-r\)
- \(\mbox{dim}(\mbox{nullspace}({\bf A}^T)) \eql m-r\)
- \((\mbox{rowspace}({\bf A}))^\perp \eql \mbox{nullspace}({\bf A})\)
- \((\mbox{colspace}({\bf A}))^\perp \eql \mbox{nullspace}({\bf A}^T)\)
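As a concrete illustration (using the \(3\times 5\) matrix \({\bf A}\) from the previous section, whose RREF had two pivots, so \(r=2\)): the rowspace and colspace each have dimension \(2\), the nullspace has dimension \(n-r=5-2=3\) (spanned by the three vectors \({\bf v}_1,{\bf v}_2,{\bf v}_3\) we found), and the nullspace of \({\bf A}^T\) has dimension \(m-r=3-2=1\).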
Interpretation and proof:
- We've already proved items 1-4:
- Parts 1-2 are Theorem 6.6.
- Part 3 follows directly from Theorem 8.2.
- Part 4 follows from applying 8.2 to the transpose.
- So, all that's new is Part 5 (because Part 6 is the same
thing applied to the transpose).
- Let's understand what it's saying:
- Every vector that's perpendicular to the row space is
in the nullspace (and vice-versa).
- The proof of Part 5 is surprisingly simple.
- Proof of Theorem 8.3:
We've already proved parts 1-4. For Part 5, let \({\bf x}\) be
any vector that is perpendicular to the row space. Then, the
dot product with every row is zero, which means \({\bf Ax}={\bf 0}\).
To go the other way, if \({\bf x}\) satisfies \({\bf Ax}={\bf 0}\),
that is, \({\bf x}\in \mbox{nullspace}({\bf A})\), then
the dot product with every row is zero, which means the dot
product with any linear combination of rows is also zero.
Thus, the dot product with any vector in the row space is zero,
or \({\bf x}\) is orthogonal to the row space.
\(\;\;\;\;\Box\)
We will explore two implications of Theorem 8.3
- A high-level conceptual one.
- Theorem 7.2, whose proof we postponed in Module 7
because we needed Theorem 8.3.
Recall:
Theorem 7.2: If the columns of \({\bf A}_{m\times n}\)
are linearly independent, then \(({\bf A}^T {\bf A})^{-1}\) exists.
First, the conceptual one:
- We'll start with a small result that will be needed.
- Proposition 8.4:
If \({\bf W}\) is a subspace of \(\mathbb{R}^n\) and
\({\bf x}\in \mathbb{R}^n\) then \({\bf x}\) can be
expressed uniquely as \({\bf x}={\bf y}+{\bf z}\) where
\({\bf y}\in{\bf W}\) and \({\bf z}\in{\bf W}^\perp\)
In-Class Exercise 3:
Before we get to the proof, draw an example in 3D with
\({\bf W}\) = the (x,y)-plane.
- We will do the proof in steps.
- Proof: (Step 1)
We need to show both that such vectors
\({\bf y}\) and \({\bf z}\) exist and that they are unique.
Suppose \(\mbox{dim}({\bf W})=r\). Let
\({\bf v}_1, {\bf v}_2,\ldots,{\bf v}_r\) be a basis
for \({\bf W}\) and
\({\bf v}_{r+1}, {\bf v}_{r+2},\ldots,{\bf v}_n\)
be a basis for \({\bf W}^\perp\).
In-Class Exercise 4:
How do we know that \({\bf W}^\perp\) has
dimension \(n-r\)?
(Step 2)
Suppose
$$
\alpha_1{\bf v}_{1} + \alpha_2{\bf v}_{2} + \ldots +
\alpha_r{\bf v}_{r} + \alpha_{r+1}{\bf v}_{r+1} +
\alpha_{r+2}{\bf v}_{r+2} + \ldots +
\alpha_n{\bf v}_{n} \eql {\bf 0}
$$
We need to show that this implies all the \(\alpha_i\)'s are zero.
Define
$$\eqb{
{\bf y} & \eql & \alpha_1{\bf v}_{1} + \alpha_2{\bf v}_{2} + \ldots +
\alpha_r{\bf v}_{r}\\
{\bf z} & \eql & \alpha_{r+1}{\bf v}_{r+1} +
\alpha_{r+2}{\bf v}_{r+2} + \ldots +
\alpha_n{\bf v}_{n}
}$$
Then \({\bf y} + {\bf z} = {\bf 0}\) and thus
\({\bf y} = -{\bf z}\), so each of \({\bf y}\) and \({\bf z}\) is
a scalar multiple of the other and they lie in
the same subspaces. But \({\bf y}\in {\bf W}\) because
it's a linear combination of \({\bf W}\)'s basis vectors, and
similarly \({\bf z}\in {\bf W}^\perp\). The only way
this can happen is if
$$
{\bf y},{\bf z}\in {\bf W}\cap{\bf W}^\perp=\{{\bf 0}\}
$$
(A vector in both \({\bf W}\) and \({\bf W}^\perp\) must be
orthogonal to itself, so its length is zero and it is \({\bf 0}\).)
Since \({\bf y}={\bf 0}\) and \({\bf v}_1, {\bf v}_2,\ldots,{\bf v}_r\)
is a basis (hence linearly independent) for \({\bf W}\),
$$
\alpha_1{\bf v}_1 + \alpha_2{\bf v}_2 + \ldots +
\alpha_r{\bf v}_r \eql {\bf 0}
\;\;\;\;\;\;\;\;
\Rightarrow \alpha_1=\alpha_2=\ldots=\alpha_r=0
$$
Similarly,
$$
\alpha_{r+1}{\bf v}_{r+1} + \alpha_{r+2}{\bf v}_{r+2} + \ldots +
\alpha_n{\bf v}_n \eql {\bf 0}
\;\;\;\;\;\;\;\;
\Rightarrow \alpha_{r+1}=\alpha_{r+2}=\ldots=\alpha_n=0
$$
Therefore, all the \(\alpha_i\)'s are zero.
(Step 3) We've shown that \({\bf v}_1, {\bf v}_2,\ldots,{\bf v}_n\)
are linearly independent. Since the dimension of \(\mathbb{R}^n\)
is \(n\), then \({\bf v}_1, {\bf v}_2,\ldots,{\bf v}_n\) is
a basis for \(\mathbb{R}^n\). This means that any
\({\bf x}\in \mathbb{R}^n\) can be expressed as the
sum \({\bf x}={\bf y}+{\bf z}\) above with
\({\bf y}\in {\bf W}\) and \({\bf z}\in {\bf W}^\perp\).
(Step 4) We still need to show uniqueness. After all, could
a different choice of the two pieces (for example, built from
different bases) result in a different sum?
Assume that \({\bf x}={\bf y}^\prime+{\bf z}^\prime\) with
\({\bf y}^\prime\in{\bf W}\) and \({\bf z}^\prime\in{\bf W}^\perp\).
Then, because \({\bf x}={\bf y}+{\bf z}\), we equate and see that
\({\bf y}-{\bf y}^\prime={\bf z}^\prime-{\bf z}\), and therefore
both sides are the same vector.
But \({\bf y}-{\bf y}^\prime \in {\bf W}\) and
\({\bf z}^\prime-{\bf z} \in {\bf W}^\perp\), so this common vector
lies in \({\bf W}\cap{\bf W}^\perp=\{{\bf 0}\}\).
The only way this can happen is if
\({\bf y}-{\bf y}^\prime={\bf 0} = {\bf z}^\prime-{\bf z}\),
or \({\bf y}={\bf y}^\prime\) and \({\bf z}={\bf z}^\prime\).
\(\;\;\;\;\Box\)
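For instance, in the setting of the earlier exercise (\({\bf W}\) = the (x,y)-plane in \(\mathbb{R}^3\)), the decomposition guaranteed by Proposition 8.4 is simply
$$
\vecthree{x_1}{x_2}{x_3}
\eql
\vecthree{x_1}{x_2}{0} + \vecthree{0}{0}{x_3}
$$
with the first piece in \({\bf W}\) and the second in \({\bf W}^\perp\), and no other such split is possible.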
- Whew! That was quite a long proof.
- We still haven't gotten around to our high-level conceptual
idea yet.
- Consider \({\bf Ax}={\bf b}\) where \({\bf A}_{m\times n}\) has
rank \(r\).
- Let \({\bf W}=\mbox{rowspace}({\bf A})\).
Now, Theorem 8.3 tells us that
\({\bf W}^\perp=\mbox{nullspace}({\bf A})\).
- Proposition 8.4 says that any \({\bf x}\in\mathbb{R}^n\)
can be written
uniquely as \({\bf x}={\bf y} + {\bf z}\) where
$$\eqb{
{\bf y} & \in & \mbox{rowspace}({\bf A})\\
{\bf z} & \in & \mbox{nullspace}({\bf A})\\
}$$
- Then,
$$\eqb{
{\bf b} & \eql & {\bf Ax} \\
& \eql & {\bf A}({\bf y} + {\bf z}) \\
& \eql & {\bf Ay} + {\bf Az} \\
& \eql & {\bf Ay} \\
}$$
In-Class Exercise 5:
Why is \({\bf Az}={\bf 0}\)?
- Now stare at
$$\eqb{
{\bf b} & \eql & {\bf Ax} \\
& \eql & {\bf Ay} \\
}$$
for a moment.
- Now for the important high-level observation:
- We started with \({\bf b} = {\bf Ax}\) where
\({\bf x}\in\mathbb{R}^n\) is any vector.
- Then we showed that \({\bf b} = {\bf Ay}\)
where \({\bf y}\) is a unique vector in the rowspace
corresponding to \({\bf x}\).
- Yet \({\bf b}\) is in the column space.
- This means that every matrix defines a mapping that takes each
vector in its rowspace to a vector in its column space.
- Another way of saying the same thing:
- Take any vector \({\bf b}\) in the column space.
- There is a unique vector \({\bf y}\) in the rowspace such
that \({\bf Ay}={\bf b}\).
\(\rhd\)
Why unique? If \({\bf Ay}={\bf Ay}^\prime={\bf b}\) with both
\({\bf y},{\bf y}^\prime\) in the rowspace, then \({\bf y}-{\bf y}^\prime\)
lies in both the rowspace and the nullspace, and so must be \({\bf 0}\).
- Thus, simple matrix-vector multiplication
provides a 1-1 mapping between the rowspace
and the column space.
- This is far from obvious and quite unexpected.
\(\rhd\)
Merely glancing through the proofs up to this point should
convince you that it's not obvious.
Finally, let's get to the proof of Theorem 7.2, which we
left pending from Module 7:
- Theorem 7.2: If the columns of \({\bf A}_{m\times n}\)
are linearly independent, then \(({\bf A}^T {\bf A})^{-1}\) exists.
- As a first step, let's prove the following.
- Proposition 8.5:
$$
\mbox{nullspace}({\bf A}) \eql \mbox{nullspace}({\bf A}^T {\bf A})
$$
Proof:
Consider the solution to
$$
({\bf A}^T {\bf A}){\bf x}={\bf 0}
$$
We'll first write this as
$$
{\bf A}^T ({\bf A}{\bf x})={\bf 0}
$$
Which, if we define \({\bf y}={\bf A}{\bf x}\), means that
\({\bf A}^T{\bf y}={\bf 0}\) and therefore
$$
{\bf y}\in\mbox{nullspace}({\bf A}^T).
$$
Now \({\bf y}={\bf A}{\bf x}\), which means that
$$
{\bf y}\in\mbox{colspace}({\bf A}).
$$
Theorem 8.3 then implies that
$$
{\bf y}\in\mbox{nullspace}({\bf A}^T)^\perp
$$
So,
$$
{\bf y}\in\mbox{nullspace}({\bf A}^T)^\perp \cap
\mbox{nullspace}({\bf A}^T) \eql \{{\bf 0}\}.
$$
Which means \({\bf y}={\bf A}{\bf x}={\bf 0}\), and so
\(\mbox{nullspace}({\bf A}^T{\bf A})\subseteq\mbox{nullspace}({\bf A})\).
For the other direction, if \({\bf A}{\bf x}={\bf 0}\) then
\(({\bf A}^T{\bf A}){\bf x}={\bf A}^T{\bf 0}={\bf 0}\), so
\(\mbox{nullspace}({\bf A})\subseteq\mbox{nullspace}({\bf A}^T{\bf A})\).
\(\;\;\;\;\Box\)
- Now we will complete the proof of 7.2.
- Proof of Theorem 7.2:
Since we've assumed that \({\bf A}\) has full column rank
(its columns are linearly independent),
the only solution to \({\bf A}{\bf x}={\bf 0}\) is
\({\bf x}={\bf 0}\). By Proposition 8.5,
\(({\bf A}^T {\bf A}){\bf x}={\bf 0}\) implies \({\bf A}{\bf x}={\bf 0}\),
and therefore \({\bf x}={\bf 0}\).
From Theorem 8.1, applied to the \(n\times n\) matrix \({\bf A}^T{\bf A}\),
this means \(({\bf A}^T {\bf A})^{-1}\) exists.
\(\;\;\;\;\Box\)
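As a small numeric illustration of Theorem 7.2 (a sketch only, with a made-up \(3\times 2\) matrix; this is the situation that arises in least-squares, where \({\bf A}^T{\bf A}\) must be inverted), the following plain-Java code forms \({\bf A}^T{\bf A}\) and checks that its determinant is nonzero:

```java
// A sketch: for a tall matrix with linearly independent columns, A^T A is
// square and invertible. Here we form A^T A for a made-up 3x2 example and
// check its 2x2 determinant.
public class NormalMatrixCheck {
    static double[][] multiplyTransposeA(double[][] A) {
        int n = A[0].length;
        double[][] AtA = new double[n][n];
        for (int i = 0; i < n; i++)
            for (int j = 0; j < n; j++)
                for (int k = 0; k < A.length; k++)
                    AtA[i][j] += A[k][i] * A[k][j];   // (A^T A)_{ij} = sum_k A_{ki} A_{kj}
        return AtA;
    }

    public static void main(String[] args) {
        double[][] A = { {1, 1}, {1, 2}, {1, 3} };   // columns are independent
        double[][] AtA = multiplyTransposeA(A);
        double det = AtA[0][0] * AtA[1][1] - AtA[0][1] * AtA[1][0];
        System.out.println("det(A^T A) = " + det);   // nonzero, so (A^T A)^{-1} exists
    }
}
```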
A related useful result:
Lastly, let's use Theorem 8.3 to point out something that
seems obvious but needs proof:
- The set of all real-valued n-component vectors is called
\(\mathbb{R}^n\), pronounced as "R n".
- For example, \(\mathbb{R}^3\) is the set of
all 3D vectors \((x_1, x_2, x_3)\).
- Clearly, we know that \(n\) vectors are sufficient to
form a basis for \(\mathbb{R}^n\) because we can use the
\(n\) standard basis vectors.
- In 3D that would be:
$$
\vecthree{x_1}{x_2}{x_3}
\eql
x_1 \vecthree{1}{0}{0}
+
x_2 \vecthree{0}{1}{0}
+
x_3 \vecthree{0}{0}{1}
$$
- One could ask: can we do with fewer than \(n\)
vectors for a basis?
- Theorem 8.3 says "no":
- If it were possible, say with \(m \lt n\) basis vectors, then
we could put these as rows into a matrix \({\bf A}_{m\times n}\) whose
RREF would have at most \(m\) pivot rows, so \(\mbox{rank}({\bf A})\leq m\).
- Then, Theorem 8.3 says the nullspace has dimension
at least \(n-m \gt 0\), so it contains nonzero vectors.
- These nonzero nullspace vectors are orthogonal to the rowspace
vectors (and thus to the supposed basis).
- Which means such nullspace vectors can't be reached by
any linear combination of the basis: a contradiction.
- To summarize, \(\mathbb{R}^n\) needs exactly \(n\)
vectors for a basis.
- Of course, a non-standard non-orthogonal basis could be used
but it would have to have \(n\) vectors.
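To see the counting argument concretely in \(\mathbb{R}^3\): try to make do with the \(m=2\) rows \((1,0,0)\) and \((0,1,0)\). Every linear combination has the form \((\alpha,\beta,0)\), while the nullspace of the resulting \(2\times 3\) matrix contains \((0,0,1)\), which is orthogonal to both rows and therefore unreachable. So two vectors cannot span \(\mathbb{R}^3\).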
8.5
An important fundamental theorem: the singular
value decomposition (SVD)
The SVD is easily one of the most useful results in linear algebra,
and a candidate for "The Second Fundamental Theorem of Linear Algebra".
First, let's state it and then explore.
Theorem 8.7:
Any real matrix \({\bf A}_{m\times n}\) with rank \(r\)
can be written as the product
$$
{\bf A} \eql
{\bf U}
{\bf \Sigma}
{\bf V}^T
$$
where \({\bf U}_{m\times m}\) and \({\bf V}_{n\times n}\) are real orthogonal matrices and
\({\bf \Sigma}_{m\times n}\) is a real diagonal matrix of the form
$$
{\bf \Sigma} \eql
\mat{
\sigma_1 & 0 & \ldots & 0 & \ldots & 0\\
0 & \sigma_2 & \ldots & 0 & \ldots & 0\\
\vdots & & \ddots & & & \vdots\\
0 & 0 & \ldots & \sigma_r & \ldots & 0\\
\vdots & & & & \ddots & \vdots\\
0 & 0 & \ldots & 0 & \ldots & 0\\
}
$$
Let's start by exploring the middle matrix \({\bf \Sigma}\):
Next, let's explore \({\bf U}\) and \({\bf V}\):
- Recall from Theorem 7.4 (about orthogonal matrices)
- Multiplying a vector by an orthogonal matrix
does not change its length.
- An orthogonal matrix is equivalent to a series of reflections
or rotations.
- As we did in Module 4, we will
generate a series of \(N\) equally-spaced (by angle) vectors for
various angles \(\theta \in [0,2\pi]\).
- For example, with \(N=8\):
- The blue vectors on the left are generated in increments of
\(\frac{2\pi}{8}\).
- Let \(\theta_i\) be the angle of the i-th vector.
- Each blue vector is multiplied by an orthogonal matrix to produce
a corresponding green vector.
- The lengths are unchanged.
- Let \(\alpha_i\) be the angle of the i-th resulting vector.
- (All angles are measured from the rightward x-axis).
In-Class Exercise 9:
Draw a graph of \(\alpha_i\) vs. \(\theta_i\) for the above example.
In-Class Exercise 10:
Download SVDExplore.java
and compare the effect of transformation by
a random \({\bf A}\) and then its \({\bf U}\) and \({\bf V}\)
matrices when \({\bf A}={\bf U\Sigma V}^T\). In all three
cases, a graph of \(\alpha_i\) vs. \(\theta_i\) is plotted.
Execute a few different times for each case: each execution
creates a new random matrix.
You will also need lintool,
UniformRandom.java,
Function.java,
and SimplePlotPanel.java.
- Thus, we see that:
- The graph is complicated and non-linear for \({\bf A}\).
- There is a clear linear or inverse-linear relationship
when the matrix is orthogonal.
About the SVD:
- The SVD is one of the most powerful and practically useful results in
all of linear algebra.
- For example, we will see that SVD will give us something
called the pseudoinverse:
- Recall: not every matrix has an inverse.
- However, every matrix \({\bf A}\)
will turn out to have a pseudoinverse \({\bf A}^+\).
- To solve \({\bf Ax}={\bf b}\) approximately, compute
\({\bf A}^+\) and set \({\bf x} = {\bf A}^+ {\bf b}\).
- We will later see how to do this.
- The SVD is useful in applications:
- Image-search and topic identification in text.
- Compression.
- Low-rank approximations for the movie-rating problem.
- The SVD is a reliable way to determine the approximate rank
of a matrix.
- Computing the RREF of very large matrices can result in compounding
numerical errors (from the many row transformations).
In-Class Exercise 11:
Create a 3x3 matrix with rank=1.
Download SVDRank.java
and enter your matrix to confirm the rank.
Observe that the remainder of the code generates random
matrices to compute the average rank. Increase the
number of such matrices. What is the intuition for why
it's hard to generate a random 3x3 matrix with rank less than 3?
[Hint: think about the columns, their independence, and use
a geometric interpretation].
One interpretation of the SVD:
- Recall: every matrix has a SVD:
$$
{\bf A} \eql
{\bf U}
{\bf \Sigma}
{\bf V}^T
$$
where \({\bf U}, {\bf V}\) are orthogonal and \({\bf \Sigma}\) is
diagonal.
- This means any transformation \({\bf Ax}\) of a vector \({\bf x}\)
by a matrix \({\bf A}\) can be decomposed into a sequence of
three transformations:
- An "almost" rotation/reflection (multiplication by \({\bf V}^T\)).
- A scaling (multiplication by \({\bf \Sigma}\)).
- A second "almost" rotation/reflection (multiplication by \({\bf U}\)).
- This is quite unexpected: every matrix transformation
is really just three basic transformations applied in sequence!
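Below is a minimal plain-Java sketch of this reading (the decomposition is made up: two rotations and a diagonal scaling; we build \({\bf A}={\bf U\Sigma V}^T\) from them and then check that applying \({\bf V}^T\), \({\bf \Sigma}\), \({\bf U}\) one after another gives the same result as multiplying by \({\bf A}\)):

```java
// A sketch of the "three transformations" reading of the SVD, built from a
// made-up decomposition (rotation, scaling, rotation) of a 2x2 matrix.
public class SVDInterpretation {
    static double[][] rotation(double theta) {
        return new double[][] {
            { Math.cos(theta), -Math.sin(theta) },
            { Math.sin(theta),  Math.cos(theta) }
        };
    }
    static double[][] transpose(double[][] M) {
        return new double[][] { { M[0][0], M[1][0] }, { M[0][1], M[1][1] } };
    }
    static double[][] product(double[][] A, double[][] B) {
        double[][] C = new double[2][2];
        for (int i = 0; i < 2; i++)
            for (int j = 0; j < 2; j++)
                for (int k = 0; k < 2; k++) C[i][j] += A[i][k] * B[k][j];
        return C;
    }
    static double[] apply(double[][] M, double[] x) {
        return new double[] { M[0][0]*x[0] + M[0][1]*x[1], M[1][0]*x[0] + M[1][1]*x[1] };
    }

    public static void main(String[] args) {
        double[][] U = rotation(Math.PI / 6);        // orthogonal
        double[][] V = rotation(Math.PI / 3);        // orthogonal
        double[][] Sigma = { {2, 0}, {0, 0.5} };     // diagonal scaling
        double[][] A = product(U, product(Sigma, transpose(V)));

        double[] x = { 1, -2 };
        double[] direct   = apply(A, x);                                    // A x
        double[] stepwise = apply(U, apply(Sigma, apply(transpose(V), x))); // U (Sigma (V^T x))
        System.out.println(java.util.Arrays.toString(direct));
        System.out.println(java.util.Arrays.toString(stepwise));            // same, up to rounding
    }
}
```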
8.6
Linear transformations
Consider what it means to transform a vector:
- So far, the only way we know is through multiplication by
a matrix.
- That is, we start with some vector \({\bf x}\), and multiply
by a matrix \({\bf A}\) to get a vector \({\bf y}\):
$$
{\bf y} \eql {\bf Ax}
$$
In-Class Exercise 12:
What is the relationship between the dimension of
\({\bf x}\) and the dimension of \({\bf y}\)?
- We can think of this as a general transformation that takes
in a vector and produces a vector:
- Let's use the notation
\({\bf y} = T({\bf x})\) for general vector-to-vector transformations.
- We can imagine other ways of making a transformation.
In-Class Exercise 13:
Design a transformation, as weird as you would like,
that is obviously not related to matrix multiplication.
Linearity in transformations:
- Definition: A linear transformation \(T\) has
the property that for all vectors \({\bf x}, {\bf y}\) and
scalars \(\alpha, \beta\) the following is true:
$$
T(\alpha{\bf x} + \beta{\bf y}) \eql
\alpha T({\bf x}) + \beta T({\bf y})
$$
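For a quick made-up instance, the scaling transformation \(T({\bf x}) = 2{\bf x}\) is linear, since
$$
T(\alpha{\bf x} + \beta{\bf y}) \eql 2(\alpha{\bf x} + \beta{\bf y})
\eql \alpha (2{\bf x}) + \beta (2{\bf y})
\eql \alpha T({\bf x}) + \beta T({\bf y})
$$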
In-Class Exercise 14:
Is the transformation you designed in the earlier exercise a
linear transformation?
In-Class Exercise 15:
Show that:
- Matrix-vector multiplication \({\bf Ax}\) is a linear
transformation.
- If \(T\) is a linear transformation then
\(T({\bf 0}) = {\bf 0}\).
- Use the above result to show that a geometric
translation cannot be a linear transformation.
(Remember, in the second case, both the argument and result are
the zero vector).
And now for a fundamental result that is surprisingly easy to prove:
- Theorem 8.9:
For every linear transformation \(T\) there is an equivalent matrix
\({\bf A}\) such that \(T({\bf x}) = {\bf Ax}\) for all \({\bf x}\).
- Before proving, let's understand what it's saying:
- In essence, a linear transformation can only be a matrix
multiplication, nothing else.
- Even if a linear transformation looks very not-matrix-like,
it is equivalent to a matrix.
- Proof of Theorem 8.9: Let's do this in steps.
Suppose the transformation \(T\) maps n-dimensional vectors into
m-dimensional vectors.
- Consider the n standard vectors of dimension n:
\({\bf e}_1, {\bf e}_2, \ldots, {\bf e}_n\).
In-Class Exercise 16:
What are the standard vectors for \(n=4\)?
- Define the vectors
$$
{\bf c}_1 \eql T({\bf e}_1),
\;\;\;\;\;\;
{\bf c}_2 \eql T({\bf e}_2),
\;\;\;\;\;\;
\ldots,
\;\;\;\;\;\;
{\bf c}_n \eql T({\bf e}_n),
$$
In-Class Exercise 17:
What is the dimension of each \(T({\bf e}_i)\)?
- Put the \(n\) \({\bf c}_i\)'s as columns of a matrix \({\bf A}\).
- Any vector \({\bf x}=(x_1,x_2,\ldots,x_n)\) can
be written as
$$
x_1 {\bf e}_1 + x_2 {\bf e}_2 + \ldots + x_n {\bf e}_n
$$
In-Class Exercise 18:
Why is this true? Give an example with \(n=4\).
- Now apply the transformation \(T\):
$$\eqb{
T({\bf x})
& \eql &
T(x_1 {\bf e}_1 + x_2 {\bf e}_2 + \ldots + x_n {\bf e}_n) \\
& \eql &
x_1 T({\bf e}_1) + x_2 T({\bf e}_2) + \ldots + x_n T({\bf e}_n)\\
& \eql &
x_1 {\bf c}_1 + x_2 {\bf c}_2 + \ldots + x_n {\bf c}_n\\
}$$
In-Class Exercise 19:
The third step follows from the definition of the
\({\bf c}_i\)'s, but what lets us conclude the second step from
the first?
- Finally, the last step above is a linear combination of the
columns of \({\bf A}\) using the elements of \({\bf
x}\).
- In other words:
$$
x_1 {\bf c}_1 + x_2 {\bf c}_2 + \ldots + x_n {\bf c}_n
\eql {\bf Ax}
$$
- And so, \(T({\bf x}) = {\bf Ax}\).
\(\;\;\;\;\;\Box\)
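To see the construction from the proof in code, here is a hedged plain-Java sketch (the transformation \(T\) is made up and written directly as a lambda) that builds the matrix column by column from \(T({\bf e}_j)\) and checks \(T({\bf x})={\bf Ax}\) on a sample vector:

```java
// A sketch of the construction in the proof of Theorem 8.9: build the matrix
// whose columns are T(e_1),...,T(e_n), then check T(x) = Ax.
import java.util.Arrays;
import java.util.function.Function;

public class TransformationToMatrix {
    // A made-up linear transformation from R^3 to R^2:
    // T(x1, x2, x3) = (x1 + 2*x3, x2 - x1).
    static Function<double[], double[]> T =
        x -> new double[] { x[0] + 2 * x[2], x[1] - x[0] };

    static double[] multiply(double[][] A, double[] x) {
        double[] y = new double[A.length];
        for (int i = 0; i < A.length; i++)
            for (int j = 0; j < x.length; j++) y[i] += A[i][j] * x[j];
        return y;
    }

    public static void main(String[] args) {
        int n = 3, m = 2;
        double[][] A = new double[m][n];
        for (int j = 0; j < n; j++) {            // j-th column of A = T(e_j)
            double[] e = new double[n];
            e[j] = 1;
            double[] col = T.apply(e);
            for (int i = 0; i < m; i++) A[i][j] = col[i];
        }
        double[] x = { 3, -1, 2 };
        System.out.println(Arrays.toString(T.apply(x)));      // T(x)
        System.out.println(Arrays.toString(multiply(A, x)));  // A x -- same vector
    }
}
```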
An interpretation of Theorem 8.9:
- Imposing linearity on a transformation is actually quite
restrictive.
- To see why, let's go back to the step where
$$\eqb{
T({\bf x})
& \eql &
T(x_1 {\bf e}_1 + x_2 {\bf e}_2 + \ldots + x_n {\bf e}_n) \\
& \eql &
x_1 T({\bf e}_1) + x_2 T({\bf e}_2) + \ldots + x_n T({\bf e}_n)\\
}$$
- Once a linear transformation has taken a linear combination
of the standard vectors, it is forced to produce the
same linear combination of the transformed standard vectors.
- Thus, a linear transformation's action on the standard vectors
completely determines its action on any vector.
- It so happens, of course, that a linear combination of
a matrix's columns can be written as a matrix-vector product.
\(\rhd\)
This is how matrix-vector multiplication was defined.
- Thus, it is now perhaps not so surprising that linear
transformations and matrix-vector multiplications are the same thing.
8.7
Highlights and study tips
Highlights:
- The purpose of this module was to give you a high-level
overview of some of the major theoretical results in linear algebra.
- For a square matrix \({\bf A}\), several conditions turn out to be
equivalent, so each implies the others:
- \({\bf A}\) has an inverse.
- \({\bf A}\)'s rows and columns are linearly independent.
- \({\bf Ax}={\bf b}\) has a unique solution for every \({\bf b}\).
- There are two obvious spaces related to a matrix
\({\bf A}\):
- Row space: the span of its rows.
- Column space: the span of its columns.
- The less apparent ones are the nullspace and the
nullspace of the transpose.
- The Fundamental Theorem of Linear Algebra finds two
different relationships between these spaces:
- In terms of dimension: for example, the sum of
dimensions of the rowspace and nullspace is the number of columns.
- In terms of orthogonality: the rowspace and
nullspace are orthogonal complements.
- The orthogonality of spaces is useful in some proofs.
- The SVD should really be called the Second Fundamental Theorem of
Linear Algebra.
- The SVD's internals and uses are still to be described
(later modules).
- A linear transformation and a matrix are one and the same thing.
Study tips:
- The most important thing to learn in this module is how to
read and interpret the theorems.
- In general, in the future, learning to
understand what a result is saying is much more important
than learning to reproduce the proof.
- Start by breaking a theorem down into pieces. Are the pieces
understandable on their own?
- Create an example to go hand-in-hand with understanding a
theorem, as we did with Theorem 8.2.