\( \newcommand{\blah}{blah-blah-blah} \newcommand{\eqb}[1]{\begin{eqnarray*}#1\end{eqnarray*}} \newcommand{\eqbn}[1]{\begin{eqnarray}#1\end{eqnarray}} \newcommand{\bb}[1]{\mathbf{#1}} \newcommand{\mat}[1]{\begin{bmatrix}#1\end{bmatrix}} \newcommand{\nchoose}[2]{\left(\begin{array}{c} #1 \\ #2 \end{array}\right)} \newcommand{\defn}{\stackrel{\vartriangle}{=}} \newcommand{\rvectwo}[2]{\left(\begin{array}{c} #1 \\ #2 \end{array}\right)} \newcommand{\rvecthree}[3]{\left(\begin{array}{r} #1 \\ #2\\ #3\end{array}\right)} \newcommand{\rvecdots}[3]{\left(\begin{array}{r} #1 \\ #2\\ \vdots\\ #3\end{array}\right)} \newcommand{\vectwo}[2]{\left[\begin{array}{r} #1\\#2\end{array}\right]} \newcommand{\vecthree}[3]{\left[\begin{array}{r} #1 \\ #2\\ #3\end{array}\right]} \newcommand{\vecfour}[4]{\left[\begin{array}{r} #1 \\ #2\\ #3\\ #4\end{array}\right]} \newcommand{\vecdots}[3]{\left[\begin{array}{r} #1 \\ #2\\ \vdots\\ #3\end{array}\right]} \newcommand{\eql}{\;\; = \;\;} \definecolor{dkblue}{RGB}{0,0,120} \definecolor{dkred}{RGB}{120,0,0} \definecolor{dkgreen}{RGB}{0,120,0} \newcommand{\bigsp}{\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;} \newcommand{\plss}{\;\;+\;\;} \newcommand{\miss}{\;\;-\;\;} \newcommand{\implies}{\Rightarrow\;\;\;\;\;\;\;\;\;\;\;\;} \)


Review: Part I


 


r.1    What are vectors and what do you do with them?

 

Let's start with: what are vectors and how did they come about?

 

Notation for many vectors:

 

About that "column style" of writing a vector:

 


r.2    Dot products, lengths, angles, orthogonal vectors

 

The dot-product conveniently gives us two nice results: the length of a vector, since \({\bf v}\cdot{\bf v} = |{\bf v}|^2\), and the angle between two vectors, since \({\bf u}\cdot{\bf v} = |{\bf u}||{\bf v}|\cos\theta\) (which makes \({\bf u}\cdot{\bf v}=0\) the test for orthogonal vectors):

 


r.3    Matrix-vector multiplication and what it means

 

Let's go back to the linear combination of three 3D vectors from earlier: $$\eqb{ \alpha \vecthree{u_1}{u_2}{u_3} \plss \beta \vecthree{v_1}{v_2}{v_3} \plss \gamma \vecthree{w_1}{w_2}{w_3} & \eql & \vecthree{\alpha u_1 + \beta v_1 + \gamma w_1}{\alpha u_2 + \beta v_2 + \gamma w_2}{\alpha u_3 + \beta v_3 + \gamma w_3} }$$

  • Think of the left side as the linear combination and the right side as the vector that results from the linear combination.

  • Matrices came about by visually reorganizing the left side into $$\eqb{ \alpha \vecthree{u_1}{u_2}{u_3} \plss \beta \vecthree{v_1}{v_2}{v_3} \plss \gamma \vecthree{w_1}{w_2}{w_3} & \eql & \mat{ u_1 & v_1 & w_1 \\ u_2 & v_2 & w_2 \\ u_3 & v_3 & w_3 \\ } \vecthree{\alpha}{\beta}{\gamma} }$$

  • Since it's the same left side, just visually reorganized, it must equal the original right side: $$ \mat{ u_1 & v_1 & w_1 \\ u_2 & v_2 & w_2 \\ u_3 & v_3 & w_3 \\ } \vecthree{\alpha}{\beta}{\gamma} \eql \vecthree{\alpha u_1 + \beta v_1 + \gamma w_1}{\alpha u_2 + \beta v_2 + \gamma w_2}{\alpha u_3 + \beta v_3 + \gamma w_3} $$

  • Let's give each of these names: $$\eqb{ {\bf A} & \defn & \mat{ u_1 & v_1 & w_1 \\ u_2 & v_2 & w_2 \\ u_3 & v_3 & w_3 \\ } \\ {\bf x} & \defn & \vecthree{\alpha}{\beta}{\gamma}\\ {\bf b} & \defn & \vecthree{\alpha u_1 + \beta v_1 + \gamma w_1}{\alpha u_2 + \beta v_2 + \gamma w_2}{\alpha u_3 + \beta v_3 + \gamma w_3} }$$

  • Then, written in these symbols, we have $$ {\bf Ax} \eql {\bf b} $$

  • What we have is a "matrix times a vector gives a vector".
        ⇒ The matrix \({\bf A}\) times vector \({\bf x}\) gives the vector \({\bf b}\)
 

There are two meanings of \({\bf Ax} = {\bf b}\).
 

Let's look at the first one, what we just did above:

  • Here, \({\bf x}\) is only incidentally a vector (because it looks like one).

  • The only thing that matters about \({\bf x}\) is that it contains the scalars in the linear combination.

  • And the linear combination of what? The columns of \({\bf A}\), each of which is a vector.

  • The result is the right side vector, remembering that a linear combination of vectors is a vector.

  • So, when we "unpack" \({\bf Ax} = {\bf b}\), the left side unpacks into: $$ \mat{ u_1 & v_1 & w_1 \\ u_2 & v_2 & w_2 \\ u_3 & v_3 & w_3 \\ } \vecthree{\alpha}{\beta}{\gamma} \eql \alpha \vecthree{u_1}{u_2}{u_3} \plss \beta \vecthree{v_1}{v_2}{v_3} \plss \gamma \vecthree{w_1}{w_2}{w_3} $$

  • Whereas the right side is the result vector (of the linear combination): $$ \vecthree{\alpha u_1 + \beta v_1 + \gamma w_1}{\alpha u_2 + \beta v_2 + \gamma w_2}{\alpha u_3 + \beta v_3 + \gamma w_3} $$

  • Let's look at a 2D example:
    • First, examine the linear combination $$ 3 \vectwo{1}{4} \plss 1 \vectwo{3}{-2} \eql \vectwo{6}{10} $$
    • We've learned that the left side can be written in matrix form as: $$\eqb{ \mat{1 & 3\\ 4 & -2} \vectwo{3}{1} & \eql & \vectwo{6}{10}\\ {\bf A} \;\;\;\;\;\;\;\; {\bf x} \;\;\; & \eql & \;\;\;\; {\bf b} }$$
    • Now let's draw these:

  • Let's write "linear combination of columns" generally for the case of \(n\) dimensions to see what it looks like: $$\eqb{ {\bf Ax} & \eql & \mat{\vdots & \vdots & \cdots & \vdots\\ {\bf c}_1 & {\bf c}_2 & \cdots & {\bf c}_n\\ \vdots & \vdots & \cdots & \vdots} \mat{x_1 \\ x_2 \\ \vdots \\ x_n}\\ & \eql & x_1 {\bf c}_1 \plss x_2 {\bf c}_2 \plss \ldots \plss x_n {\bf c}_n }$$ Thus, the \({\bf x}\) vector, using its component values (which are numbers) creates a linear combination of the columns of \({\bf A}\).

  • When we ask whether there is an \({\bf x}\) vector such that \({\bf Ax}={\bf b}\), we are asking whether there is some linear combination of the column vectors to produce the vector \({\bf b}\).
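To make the column picture concrete, here is a minimal Python sketch (assuming numpy is available) that checks, for the 2D example above, that \({\bf Ax}\) and the linear combination of \({\bf A}\)'s columns give the same vector:

    import numpy as np

    A = np.array([[1.0, 3.0],
                  [4.0, -2.0]])
    x = np.array([3.0, 1.0])

    # Matrix-vector product.
    b = A @ x

    # The same vector, built as a linear combination of A's columns.
    b_combo = x[0] * A[:, 0] + x[1] * A[:, 1]

    print(b)        # [ 6. 10.]
    print(b_combo)  # [ 6. 10.]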
 

The second interpretation: a matrix transforms a vector

  • When seeing $$\eqb{ \mat{1 & 3\\ 4 & -2} \vectwo{3}{1} & \eql & \vectwo{6}{10}\\ {\bf A} \;\;\;\;\;\;\;\; {\bf x} \;\;\; & \eql & \;\;\;\; {\bf b} }$$ think of
    • \({\bf x}\) as just some vector $$ {\bf x} \eql \vectwo{3}{1} $$
    • \({\bf A}\) as a matrix that "does something" to \({\bf x}\) to produce another vector \({\bf b}\).

  • Pictorially:

    • Here, we've actually drawn \({\bf x}\) as a vector.
 

There is one more useful way to view matrix-vector multiplication:

  • Suppose we think of the matrix \({\bf A}\)'s rows as vectors.

  • Then, the calculation of the resulting vector \({\bf b}\)'s components can be written as a row of \({\bf A}\) times \({\bf x}\).

  • In our 2D example, the first element of \({\bf b}\) is: $$ \mat{{\bf 1} & {\bf 3}\\ 4 & -2} \vectwo{{\bf 3}}{{\bf 1}} \eql \vectwo{{\bf 1}\times {\bf 3} \; + \; {\bf 3}\times {\bf 1}}{10} \eql \vectwo{{\bf 6}}{10} $$ and the second element of \({\bf b}\) is computed by multiplying the second row of \({\bf A}\) times \({\bf x}\): $$ \mat{1 & 3\\ {\bf 4} & {\bf -2}} \vectwo{{\bf 3}}{{\bf 1}} \eql \vectwo{6}{{\bf 4}\times {\bf 3} \; {\bf -} \; {\bf 2}\times {\bf 1}} \eql \vectwo{6}{{\bf 10}}\\ $$

  • But the row-of-\({\bf A}\) times \({\bf x}\) can be written as a dot product. For example: $$ \mat{1 & 3\\ {\bf 4} & {\bf -2}} \vectwo{{\bf 3}}{{\bf 1}} \eql \vectwo{6}{(4,-2)\cdot (3,1)} \eql \vectwo{6}{{\bf 10}}\\ $$

  • Suppose we give names to the rows of \({\bf A}\). Then: $$ \mat{\cdots & {\bf r}_1 & \cdots\\ \cdots & {\bf r}_2 & \cdots} {\bf x} \eql \vectwo{{\bf r}_1 \cdot {\bf x}}{{\bf r}_2 \cdot {\bf x}} $$

  • This works for any size matrix-vector multiplication: $$ \mat{\cdots & {\bf r}_1 & \cdots\\ \cdots & {\bf r}_2 & \cdots\\ \vdots & \vdots & \vdots\\ \cdots & {\bf r}_m & \cdots} {\bf x} \eql \mat{ {\bf r}_1 \cdot {\bf x} \\ {\bf r}_2 \cdot {\bf x} \\ \vdots \\ {\bf r}_m \cdot {\bf x}} $$

  • This interpretation of matrix-vector multiplication does not come with geometric meaning.
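The row picture is just as easy to verify numerically; this sketch (again assuming numpy) computes each component of \({\bf b}\) as the dot product of a row of \({\bf A}\) with \({\bf x}\):

    import numpy as np

    A = np.array([[1.0, 3.0],
                  [4.0, -2.0]])
    x = np.array([3.0, 1.0])

    # Component i of b is (row i of A) dotted with x.
    b = np.array([np.dot(A[i, :], x) for i in range(A.shape[0])])
    print(b)  # [ 6. 10.]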
 


r.4    Solving \({\bf Ax} = {\bf b}\) exactly and approximately

 

Let's start with some equations: $$ \eqb{ x_1 + 3x_2 & = 7.5 \\ 4x_1 + 2x_2 & = 10 \\ } $$

  • In "vector stretch" form, they become: $$ x_1 \vectwo{1}{4} + x_2 \vectwo{3}{2} \eql \vectwo{7.5}{10} $$

  • So, asking to solve the equations is the same as asking: "is there a linear combination (values of \(x_1,x_2\)) such that the linear combination on the left yields the vector on the right?"

  • Thus \(x_1=1.5, \;\; x_2=2\) is a solution: $$ 1.5 \vectwo{1}{4} + 2 \vectwo{3}{2} \eql \vectwo{7.5}{10} $$
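A standard solver recovers the same solution; a minimal sketch, assuming numpy:

    import numpy as np

    A = np.array([[1.0, 3.0],
                  [4.0, 2.0]])
    b = np.array([7.5, 10.0])

    # Solve Ax = b for the unique solution.
    x = np.linalg.solve(A, b)
    print(x)  # [1.5 2. ]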
 

Next, let's examine a case where no solution exists: $$ \eqb{ x_1 + 3x_2 & = 7.5 \\ 2x_1 + 6x_2 & = 10 \\ } $$

  • In "vector stretch" form, this is asking if there exist scalars \(x_1,x_2\) such that: $$ x_1 \vectwo{1}{2} + x_2 \vectwo{3}{6} \eql \vectwo{7.5}{10} $$

  • Pictorially, it's easy to see that there's no solution:

  • Digging a bit deeper, notice that the vector \((3,6)\), which is the second column, is in fact a multiple of the first column \((1,2)\).
    • Thus, it so happens that $$ 3 \vectwo{1}{2} - \vectwo{3}{6} \eql \vectwo{0}{0} $$
    • In other words, the special equation $$ x_1 \vectwo{1}{2} + x_2 \vectwo{3}{6} \eql \vectwo{0}{0} $$ has a non-zero solution \((x_1,x_2) = (3,-1)\).
    • This means that the two columns are not independent (by the definition of linear independence).
    • Thus, the matrix $$ \mat{1 & 3\\ 2 & 6} $$ is not full-rank.
    • As an exercise: work out the RREF to see that there's only one pivot.
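To check the exercise by machine, sympy's rref() (assuming sympy is installed) shows the single pivot:

    import sympy as sp

    A = sp.Matrix([[1, 3],
                   [2, 6]])

    # rref() returns the reduced row echelon form and the pivot columns.
    rref, pivots = A.rref()
    print(rref)    # Matrix([[1, 3], [0, 0]])
    print(pivots)  # (0,)  -- one pivot, so the matrix is not full-rank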
 

Now, let's look at the case where multiple solutions exist: $$ \eqb{ x_1 + 3x_2 & = 4 \\ 2x_1 + 6x_2 & = 8 \\ } $$

  • In "stretch form": $$ x_1 \vectwo{1}{2} + x_2 \vectwo{3}{6} \eql \vectwo{4}{8} $$

  • Pictorially:

  • So, for example \(x_1=1, x_2=1\) is one solution.

  • So are \(x_1=0.7, x_2=1.1\), \(x_1=0.667, x_2=1.111\), and countless others.
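A quick numeric check (a sketch, assuming numpy) confirms that different choices of \({\bf x}\) all land on the same right side:

    import numpy as np

    A = np.array([[1.0, 3.0],
                  [2.0, 6.0]])

    # Any (x1, x2) with x1 + 3*x2 = 4 is a solution; here are two of them.
    for x in [np.array([1.0, 1.0]), np.array([0.7, 1.1])]:
        print(x, A @ x)   # both produce [4. 8.]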
 

Let's examine a 3D example:

  • Consider $$ \mat{6 & 4 & 8\\ 2 & 3 & 1\\ 0 & 0 & 0} \vecthree{x_1}{x_2}{x_3} \eql \vecthree{5}{6}{0} $$

  • Let's draw them:

  • It turns out that \(x_1=-0.9, x_2=2.6, x_3=0\) is one solution:

  • Now consider this problem: $$ \mat{6 & 4 & 8\\ 2 & 3 & 1\\ 0 & 0 & 0} \vecthree{x_1}{x_2}{x_3} \eql \vecthree{5}{6}{3} $$

  • We are of course asking for values of \(x_1,x_2,x_3\) such that the linear combination $$ x_1 \vecthree{6}{2}{0} + x_2\vecthree{4}{3}{0} + x_3\vecthree{8}{1}{0} \eql \vecthree{5}{6}{3} $$

  • This is of course impossible:

    And therefore there is no solution to the corresponding equations: every linear combination of the columns has a zero third component, while the target's third component is 3.

  • If we were to write them out as: $$\eqb{ 6x_1 & + & 4x_2 & + & 8x_3 & \eql & 5 \\ 2x_1 & + & 3x_2 & + & x_3 & \eql & 6\\ 0 & + & 0 & + & 0 & \eql & 3 }$$ Here, too, it's evident that there is no way to satisfy the last equation.
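One standard machine test for inconsistency: appending \({\bf b}\) as an extra column raises the rank exactly when no solution exists. A sketch, assuming numpy:

    import numpy as np

    A = np.array([[6.0, 4.0, 8.0],
                  [2.0, 3.0, 1.0],
                  [0.0, 0.0, 0.0]])
    b = np.array([5.0, 6.0, 3.0])

    # If rank([A | b]) > rank(A), then b is outside the column span.
    print(np.linalg.matrix_rank(A))                        # 2
    print(np.linalg.matrix_rank(np.column_stack([A, b])))  # 3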
 

What do we do when no solution is possible?

  • Consider \({\bf Ax}={\bf b}\) and \({\bf A}\)'s columns: $$ {\bf A} \eql \mat{\vdots & \vdots & \ldots & \vdots\\ {\bf c}_1 & {\bf c}_2 & \ldots & {\bf c}_n\\ \vdots & \vdots & \ldots & \vdots} $$
  • In general, if \({\bf Ax}={\bf b}\) does not have a solution, it means right-side vector \({\bf b}\) is not in the span of the columns of \({\bf A}\).
    • Recall: the span of a set of vectors is all the vectors you can reach by linear combinations.
    • Thus, there is no linear combination such that $$ x_1 {\bf c}_1 + x_2{\bf c}_2 + \ldots + x_n{\bf c}_n \eql {\bf b} $$

  • Suppose we were to seek a "next best" or approximate solution \(\hat{\bf x}\).

  • The first thing to realize is that an approximate solution is still a linear combination of the columns: $$ \hat{x}_1 {\bf c}_1 + \hat{x}_2{\bf c}_2 + \ldots + \hat{x}_n{\bf c}_n \eql ? $$

  • Thus, in our example, we seek $$ \hat{x}_1 \vecthree{6}{2}{0} + \hat{x}_2\vecthree{4}{3}{0} + \hat{x}_3\vecthree{8}{1}{0} $$ that's closest to the desired target \((5,6,3)\).

  • Our example provides an intuition:

    • The difference vector between the closest vector and \({\bf b}\) is perpendicular to every vector in the span of the columns.
    • This is what the least-squares method exploits.

  • The least squares method: Suppose $$ {\bf y} \eql \mbox{ the closest vector in the colspan} $$ Let $$\eqb{ {\bf z} & \eql & \mbox{ the difference vector}\\ & \eql & {\bf b} - {\bf y} }$$ Here, \({\bf z}\) is the red vector in the diagram above.

  • Since \({\bf z}\) is perpendicular to the span of the columns, it's perpendicular to the columns themselves: $$\eqb{ {\bf c}_1 \cdot {\bf z} & \eql & 0 \\ {\bf c}_2 \cdot {\bf z} & \eql & 0 \\ & \vdots & \\ {\bf c}_n \cdot {\bf z} & \eql & 0 }$$

  • The rest of least squares stems from the observation that the columns of \({\bf A}\) are rows of \({\bf A}^T\):
    • If we were to multiply \({\bf A}^T\) into \({\bf z}\): $$ {\bf A}^T {\bf z} \eql \mat{\ldots & {\bf c}_1 & \ldots\\ \ldots & {\bf c}_2 & \ldots\\ \vdots & \vdots & \vdots\\ \ldots & {\bf c}_n & \ldots} {\bf z} \eql \mat{ {\bf c}_1 \cdot {\bf z} \\ {\bf c}_2 \cdot {\bf z} \\ \vdots \\ {\bf c}_n \cdot {\bf z}} \eql \mat{ 0 \\ 0\\ \vdots\\ 0} $$
    • Now plug in $$ {\bf y} \eql {\bf A} \hat{{\bf x}} $$ in $$ {\bf z} \eql {\bf b} - {\bf y} $$ and do the algebra to get $$ {\bf A}^T {\bf b} - {\bf A}^T {\bf A}\hat{\bf x} \eql {\bf 0} $$ after which we get $$ \hat{\bf x} \eql ({\bf A}^T {\bf A})^{-1} {\bf A}^T {\bf b} $$ provided the inverse exists.
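As a check on the whole derivation, here is a least-squares sketch for the 3D example, assuming numpy. Note that for this particular \({\bf A}\) the matrix \({\bf A}^T{\bf A}\) is singular (the columns are not independent), so the sketch uses np.linalg.lstsq, which solves the same minimization via the pseudoinverse instead of the explicit inverse:

    import numpy as np

    A = np.array([[6.0, 4.0, 8.0],
                  [2.0, 3.0, 1.0],
                  [0.0, 0.0, 0.0]])
    b = np.array([5.0, 6.0, 3.0])

    # Least-squares solution x_hat minimizing |A x - b|.
    x_hat, _, rank, _ = np.linalg.lstsq(A, b, rcond=None)

    y = A @ x_hat   # closest vector in the column span: [5. 6. 0.]
    z = b - y       # the difference vector: [0. 0. 3.]

    # z is perpendicular to every column of A:
    print(A.T @ z)  # approximately [0. 0. 0.]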
 

Go to Part II
 


© 2020, Rahul Simha