\( \newcommand{\blah}{blah-blah-blah} \newcommand{\eqb}[1]{\begin{eqnarray*}#1\end{eqnarray*}} \newcommand{\eqbn}[1]{\begin{eqnarray}#1\end{eqnarray}} \newcommand{\bb}[1]{\mathbf{#1}} \newcommand{\mat}[1]{\begin{bmatrix}#1\end{bmatrix}} \newcommand{\nchoose}[2]{\left(\begin{array}{c} #1 \\ #2 \end{array}\right)} \newcommand{\defn}{\stackrel{\vartriangle}{=}} \newcommand{\rvectwo}[2]{\left(\begin{array}{c} #1 \\ #2 \end{array}\right)} \newcommand{\rvecthree}[3]{\left(\begin{array}{r} #1 \\ #2\\ #3\end{array}\right)} \newcommand{\rvecdots}[3]{\left(\begin{array}{r} #1 \\ #2\\ \vdots\\ #3\end{array}\right)} \newcommand{\vectwo}[2]{\left[\begin{array}{r} #1\\#2\end{array}\right]} \newcommand{\vecthree}[3]{\left[\begin{array}{r} #1 \\ #2\\ #3\end{array}\right]} \newcommand{\vecfour}[4]{\left[\begin{array}{r} #1 \\ #2\\ #3\\ #4\end{array}\right]} \newcommand{\vecdots}[3]{\left[\begin{array}{r} #1 \\ #2\\ \vdots\\ #3\end{array}\right]} \newcommand{\eql}{\;\; = \;\;} \definecolor{dkblue}{RGB}{0,0,120} \definecolor{dkred}{RGB}{120,0,0} \definecolor{dkgreen}{RGB}{0,120,0} \newcommand{\bigsp}{\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;} \newcommand{\plss}{\;\;+\;\;} \newcommand{\miss}{\;\;-\;\;} \newcommand{\implies}{\Rightarrow\;\;\;\;\;\;\;\;\;\;\;\;} \)


Summary

What we learned: through the 10 applications

First, let's review the 10 applications from Module 1, in the order we encountered them there:

  1. Solving equations.
    • This is the founding problem of linear algebra (over 1000 years old).
    • We expressed this as \({\bf Ax}={\bf b}\) and saw it as the question: is there a linear combination of the columns of \({\bf A}\) (which are vectors) that results in the vector \({\bf b}\)?
    • With that view, we could see when there is exactly one solution, when there is no solution, and when there are infinitely many. (A small numerical sketch appears after the end of this list.)
    • Then in Module 5, we developed the RREF approach to get actual solutions.
    • The RREF itself turned out to be the starting point for the theory of linear algebra, with important concepts like pivot rows/columns.
  2. Curve fitting.
    • If \({\bf Ax}={\bf b}\) cannot be solved because \({\bf b}\) is not in \({\bf A}\)'s colspace, then we solve approximately using the "closest" vector in the colspace.
    • This is the least-squares method, credited to Gauss, who used it to fit an ellipse to data. (A line-fitting sketch appears after the end of this list.)
  3. Bezier curves.
    • We saw that linear combinations of Bernstein polynomials can be used to approximate any continuous function.
    • When we use two such linear combinations, one to determine \(x(t)\) and one to determine \(y(t)\), the curve traced by the points \((x(t), y(t))\) is a Bezier curve, a workhorse of computer graphics.
    • The coefficients in the linear combination are conveniently rendered as control points for the curve, making it easy for graphic designers to work with.
  4. 3D view.
    • The basic matrix-vector equation can be interpreted as an expression of coordinates in a different basis (the basis formed by the columns of the matrix).
    • This lets us design matrices to achieve transformations.
    • To go the other way, and retrieve coordinates when we know the transformations, we use the inverses of the matrices involved.
  5. Compression.
    • We saw two types of compression: one from the SVD, and the other from the DFT.
    • The DFT compression throws out high frequencies, betting that, for an audio application, the highest frequencies won't be missed by the ear.
    • The SVD compression throws out terms in the outer-product expansion that correspond to small \(\sigma_i\)'s on the diagonal of \(\Sigma\). The reconstruction of the data via the SVD is then approximate, but close enough. (A rank-\(k\) SVD sketch appears after the end of this list.)
  6. Face recognition.
    • This involves recasting the data in new coordinates, using the basis vectors in \({\bf U}\); this is exactly what PCA does.
    • The intuition is that better separation will ease the search problem.
    • One trick we had to use: converting a 2D array of pixels into a 1D array (a vector).
  7. Google's page-rank algorithm.
    • This models webpage importance with the random-surfer approach.
    • The probabilistic solution to the random-surfer model is the eigenvector with the largest eigenvalue, once we set up the transition matrix in the right way. (A power-iteration sketch appears after the end of this list.)
    • A few more tricks were needed to deal with large sizes and dangling nodes.
  8. Text search.
    • This is PCA/SVD applied to documents, once each document is represented as a word-vector.
  9. Recommendation engine.
    • In this problem, we're given a matrix (user-movie ratings) with missing entries.
    • The straightforward way is to try different approximations with the SVD.
    • A slightly more advanced way treats the row and column factors in an SVD-like factorization as variables, and optimizes those.
  10. Audio filtering.
    • This involved linear combinations of complex vectors.
    • When the vectors are built from the N-th roots of unity, we get the DFT matrix. (A short sketch appears after the end of this list.)
    • The DFT is possibly one of the most widely used algorithms, most often in its optimized FFT form.
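To make a few of the items above concrete, here are some small numerical sketches using numpy; the matrices and data in them are invented purely for illustration. First, item 1 (solving equations): asking whether \({\bf b}\) is a linear combination of the columns of \({\bf A}\), and recovering the coefficients when it is.

```python
import numpy as np

# A small invertible system Ax = b (entries made up for illustration).
A = np.array([[2.0, 1.0],
              [1.0, 3.0]])
b = np.array([5.0, 10.0])

# Is b a linear combination of A's columns? For a square, invertible A,
# there is exactly one such combination:
x = np.linalg.solve(A, b)
print(x)        # the coefficients of the combination
print(A @ x)    # reconstructs b
```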
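Item 2 (curve fitting): when \({\bf Ax}={\bf b}\) cannot be solved exactly, least squares finds the \({\bf x}\) that minimizes the distance between \({\bf Ax}\) and \({\bf b}\). A sketch that fits a line to four made-up data points:

```python
import numpy as np

# Fit a line y = c0 + c1*t to noisy data (invented for illustration).
t = np.array([0.0, 1.0, 2.0, 3.0])
b = np.array([1.1, 1.9, 3.2, 3.9])

# Columns of A: a column of 1s (for the intercept) and the t values.
A = np.column_stack([np.ones_like(t), t])

# lstsq minimizes ||Ax - b|| over x.
x, residuals, rank, sv = np.linalg.lstsq(A, b, rcond=None)
print(x)   # [intercept, slope] of the best-fit line
```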
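Item 5 (compression): the SVD version keeps only the outer-product terms with the \(k\) largest \(\sigma_i\)'s. A sketch, with a random matrix standing in for real data:

```python
import numpy as np

# A random matrix stands in for real data (image, ratings, etc.).
rng = np.random.default_rng(0)
M = rng.standard_normal((50, 40))

U, s, Vt = np.linalg.svd(M, full_matrices=False)

k = 10                                         # keep the k largest sigma_i's
M_k = U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]    # sum of k outer-product terms

# Relative error of the rank-k reconstruction:
print(np.linalg.norm(M - M_k) / np.linalg.norm(M))
```

For data with real structure, even a small \(k\) often captures most of the content; for a purely random matrix like this one, the error stays comparatively large.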
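Item 7 (page-rank): the random surfer's long-run distribution is the dominant eigenvector of a (suitably modified) transition matrix, which power iteration finds. A sketch for a tiny three-page web whose link structure is invented for illustration:

```python
import numpy as np

# Column-stochastic transition matrix for a tiny 3-page web:
# column j holds the probabilities of the surfer moving from page j.
P = np.array([[0.0, 0.5, 0.5],
              [0.5, 0.0, 0.5],
              [0.5, 0.5, 0.0]])

alpha = 0.85                          # damping ("teleport") factor
n = P.shape[0]
G = alpha * P + (1 - alpha) / n       # adds a small uniform jump everywhere

r = np.ones(n) / n                    # start with a uniform distribution
for _ in range(100):                  # power iteration
    r = G @ r
    r /= r.sum()

print(r)                              # the page-rank scores
```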
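Item 10 (the DFT): calling the FFT is the fast way of multiplying by the DFT matrix, after which high frequencies can be zeroed out, the idea behind both the compression and the filtering mentioned above. A sketch on a synthetic two-frequency signal:

```python
import numpy as np

# A synthetic signal: a low frequency plus a small high-frequency component.
N = 64
t = np.arange(N)
signal = np.sin(2 * np.pi * 3 * t / N) + 0.3 * np.sin(2 * np.pi * 25 * t / N)

spectrum = np.fft.fft(signal)            # same as multiplying by the DFT matrix
spectrum[10:-9] = 0                      # crude compression: drop high frequencies
filtered = np.fft.ifft(spectrum).real    # back to the time domain

# 'filtered' is now (approximately) just the low-frequency sine wave.
```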
 


What we learned: through \({\bf Ax}={\bf b}\)

 

 


What we learned: through one picture

 

Look back at Module 1 (Section 1.14) to see all the major topics laid out in a picture.

Let's annotate that picture to summarize some key lessons:

 


Where to go from here

 

Now that you have a basic understanding of linear algebra, what comes next? Where can you go from here?

  • In computer graphics:
    • Most transformation matrices are \(4\times 4\): the affine (homogeneous-coordinate) enhancement of 3D space, which lets translation also be expressed as a matrix multiplication.
    • You can learn how to implement common transformations efficiently, including, interestingly, one that creates perspective.
    • It turns out that many curves, including Beziers, are conveniently represented with small groups of matrices.
    • See, for example, the basic collection in a popular graphics library like OpenGL.
    • A lot of object geometry also involves matrices, but in a different (equation-solving) way: for example, computing intersections to determine whether two moving objects collide.

  • In computer vision and image processing:
    • Working with images (or video) requires operations on arrays of pixels, many of which are described and implemented using matrices.
    • Example applications: filtering, scaling, watermarking, compression, noise removal.
    • One operation worth learning about: convolution. (A small sketch appears after the end of this list.)

  • Robotics, and graphics involving motion:
    • Matrices play a useful role in compactly describing moving objects, whether objects inside a 3D virtual world or actual robots.
    • From physics, you might remember that motion involves velocity (the first derivative of position) and acceleration (the second derivative). When there are multiple objects, each with a vector describing its position in 3D space, it's convenient to represent the derivatives with matrices called Jacobians and Hessians.

  • Data science:
    • We've already seen regression (curve-fitting) via least-squares. Many other statistical methods are conveniently represented with matrix operations.
    • Beyond PCA/SVD, there are non-linear methods that are useful in dimension reduction.
    • The missing-entry problem has seen recent advances with techniques like compressive sensing.

  • Graphs:
    • In computer science, when we think about graphs, we often think about popular algorithms like shortest paths, depth-first search, and the like.
    • Surprisingly, one can represent a graph using an adjacency matrix (of 0s and 1s) and use its eigenvalues for some graph problems. (A small sketch appears after the end of this list.)

  • Quantum computing: the starting point for quantum computing is a combination of probability, and the linear algebra of complex vectors.

  • Computational linear algebra:
    • This is a fairly specialized subdiscipline, and quite involved.
    • There are sophisticated techniques now for matrix inversion, SVD, and eigenvalue computations.
    • Some of this field involves linear-algebra computations on specialized hardware (parallel, GPU).

  • The most important use: reading
    • Many algorithms are presented in the language of linear algebra.
    • Thus, even when the key ideas don't rely on deeper linear algebra concepts (like independence or orthogonality), the presentation often uses the compact boldface matrix/vector notation, so it pays to be able to read it fluently.
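Two of the bullets above point to sketches. First, convolution (from the image-processing bullet): every output pixel is a weighted sum of the input pixels around it. The tiny image and kernel below are invented for illustration; since the kernel is symmetric, the loop gives the same result whether or not the kernel is flipped, as a strict convolution would require.

```python
import numpy as np

# A tiny 6x6 "image" and a 3x3 averaging (blur) kernel.
image = np.arange(36, dtype=float).reshape(6, 6)
kernel = np.ones((3, 3)) / 9.0

H, W = image.shape
out = np.zeros((H - 2, W - 2))           # "valid" region only: no padding

for i in range(H - 2):
    for j in range(W - 2):
        # Each output pixel: elementwise product of the 3x3 neighborhood
        # with the kernel, summed up (a dot product in disguise).
        out[i, j] = np.sum(image[i:i+3, j:j+3] * kernel)

print(out)
```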
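Second, the adjacency-matrix bullet under graphs: a sketch that computes the eigenvalues of the adjacency matrix of a 4-node cycle. Spectra like this are the starting point of spectral graph theory.

```python
import numpy as np

# Adjacency matrix of the 4-node cycle 0-1-2-3-0 (1 means "edge present").
A = np.array([[0, 1, 0, 1],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [1, 0, 1, 0]], dtype=float)

# The matrix is symmetric, so its eigenvalues are real.
eigenvalues = np.linalg.eigvalsh(A)
print(eigenvalues)    # [-2, 0, 0, 2]: the largest equals the degree (2)
```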
 

What you should consider doing as next steps:

  • Mark your calendar a year from now, setting aside 8-10 hours.
  • When that time comes, return to the Review section of this course, and to your own narrative review, and spend those hours reviewing. How much do you remember? How much of the modules do you need to re-read? This is how you build the long-term knowledge that stays with you.
  • When you next encounter linear algebra, don't just skim lightly. Dig a little deeper and see if you can understand finer details.
 

Lastly: to complete your mathematical toolset for computer science, spend at least twice the time on probability that you spent on linear algebra. And I mean probability, not statistics, although some knowledge of the latter is useful too. Probability is bigger, less intuitive, and has far more CS applicability. See my thoughts on what math is good for CS and what constitutes a complete CS education.
 


© 2020, Rahul Simha