$ \newcommand{\blah}{blah-blah-blah} \newcommand{\eqb}[1]{\begin{eqnarray*}#1\end{eqnarray*}} \newcommand{\eqbn}[1]{\begin{eqnarray}#1\end{eqnarray}} \newcommand{\bb}[1]{\mathbf{#1}} \newcommand{\mat}[1]{\begin{bmatrix}#1\end{bmatrix}} \newcommand{\nchoose}[2]{\left(\begin{array}{c} #1 \\ #2 \end{array}\right)} \newcommand{\defn}{\stackrel{\vartriangle}{=}} \newcommand{\rvectwo}[2]{\left(\begin{array}{c} #1 \\ #2 \end{array}\right)} \newcommand{\rvecthree}[3]{\left(\begin{array}{r} #1 \\ #2\\ #3\end{array}\right)} \newcommand{\rvecdots}[3]{\left(\begin{array}{r} #1 \\ #2\\ \vdots\\ #3\end{array}\right)} \newcommand{\vectwo}[2]{\left[\begin{array}{r} #1\\#2\end{array}\right]} \newcommand{\vecthree}[3]{\left[\begin{array}{r} #1 \\ #2\\ #3\end{array}\right]} \newcommand{\vecdots}[3]{\left[\begin{array}{r} #1 \\ #2\\ \vdots\\ #3\end{array}\right]} \newcommand{\eql}{\;\; = \;\;} \definecolor{dkblue}{RGB}{0,0,120} \definecolor{dkred}{RGB}{120,0,0} \definecolor{dkgreen}{RGB}{0,120,0} $

Module 1: Introduction

What is this course about?

Module objectives

By the end of this module, you will:

Explore 10 different applications of linear algebra.

Explain the high-level purpose of each application.

Be able describe the input-output requirements of each application.

Be able to articulate the high-level goals of the course.

1.0 Application demos

In-Class Exercise 1: We will explore 10 different applications of linear algebra. Start by following these instructions.

1.1 Application #1: the oldest application

Consider these simultaneous linear equations: $$ \eqb{ x + 3y & = 7.5 \\ 4x + 2y & = 10 \\ } $$ which we can also write as $$ \eqb{ x_1 + 3x_2 & = 7.5 \\ 4x_1 + 2x_2 & = 10 \\ } $$

In-Class Exercise 2: Solve these equations by hand using a systematic approach. Describe your "algorithm" in plain English. Next, verify your solution:

Enter the equationsDemo directory.

Copy the plain text file problem1.txt to equations.txt.

Execute SolveEquations.

Examine the file problem1.txt and explain how the data matches the above problem.

Next, consider these equations $$ \eqb{ x_1 & + 3x_2 & -2x_3 & = 9.5 \\ 4x_1 & + 2x_2 & & = 10 \\ 2x_1 & - x_2 & + x_3 & = 0 } $$

In-Class Exercise 3: Show how your algorithm applies to the above equations. Verify by using the data in problem2.txt. (To use the demo on problem2.txt copy over problem2.txt to equations.txt, which is the input file to the program.)

In-Class Exercise 4: What are the two parameters to the solveEquations() method? What is returned from the method?

Next, consider the equations $$ \eqb{ x_1 + 3x_2 & = 7.5 \\ 2x_1 + 6x_2 & = 10 \\ } $$

In-Class Exercise 5: Show what happens when your algorithm is applied to these equations. What do you conclude? Verify by using problem3.txt.

In-Class Exercise 6: Examine the data in problem4.txt and write down the corresponding equations. Why does the problem have multiple solutions?

In-Class Exercise 7: Examine the data in problem5.txt, write down the corresponding equations and apply your algorithm. Do these equations have a solution? What do you see when you run the demo? Compare the equations to the first data set (in problem1.txt).

In-Class Exercise 8: Run the demo on the data in problem6.txt. Write down, then compare these equations to the equations in problem5.txt. What is dissatifying about the demo's output?

In-Class Exercise 9: What are desirable features in a general-purpose solver for simultaneous linear equations?

Some unanswered questions so far:

Is it always possible to tell whether a particular set of equations has a solution?
What is common to problems that have a solution, or that don't have a solution?
Are approximate solutions possible when there isn't a solution?
Does the solution method depend on the number of variables? The number of equations?

Any others?

1.2 Application #2: Gauss' discovery

Carl Friedrich Gauss, considered one of the greatest mathematicians of all time, developed a method for predicting orbits based on observations:

Ceres ("say rees") is the largest known minor planet in the asteroid belt.

Discovered by astronomers in 1801.

Difficult to track, owing to Sun's glare, except for a few observations.

Thought to be impossible to predict at the time.

Gauss, at age 24, developed a method and correctly predicted the orbit.

The method had two parts, both remarkable: (1) Solving some astronomy problems; (2) figuring out how to use "too many equations" for the best approximation.

The latter led to the linear algbera method of least squares.

In-Class Exercise 10: In the curveFittingDemo, compile and execute CurveFittingDemo.java. Then, try each of the four examples in turn as directed in main(). Read the code in the ellipse() method.

How many parameters does an ellipse have?

What is the minimum number of data points needed?

Note: the original (true) curve is plotted in red, whereas the fitted curve is the blue one.

At this point, several questions arise:

What is the relationship between data points and what linear algebra needs?
Does accuracy always increase with more data points?
Does the method always produce some result or approximation?
Amongst all approximate solutions, how does the linear algebra solution fare?
If an exact solution is possible, does the approach produce it?

1.3 Application #3: A car designer's dilemma

Any graphical design tool needs to allow designers to draw curves and then to store them.

Three options:

Approximate with many straight lines.

Find a mathematical equation to represent each curve of interest.

Bezier (or similar) curves.

In-Class Exercise 11: What are comparative advantages and disadvantages of A and B above?

In-Class Exercise 12: Run the BezierDemo. Draw an interesting curve with four or more control points. What are the storage requirements for such a curve using a traditional (type A) approach? What are the storage requirements using the Bezier approach? Read the code. What are the inputs to linear algebra?

About Bezier curves:

Originate in Bernstein polynomials (which we will also visit) from early 1900's.
Developed as a curve-representation approach by Pierre Bezier (at Renault) and Paul de Casteljau (at Citroen).
They have useful computational and numerical properties.
Generalize to surfaces.
Bezier curves or surfaces, and their variations, are now the standard in any kind of graphical design, fonts etc.

Questions at this time:

How exactly does it work? What are the limitations?
What is the computational cost?
What does it have to do with linear algebra?

1.4 Application #4: 3D to 2D

Our viewing devices are all 2D. Which makes it a challenge to represent a 3D object.

In-Class Exercise 13: Consider a cuboid with corners at these coordinates: (5,1,8), (5,3,8), (9,3,8), (9,1,8) (5,1,11), (5,3,11), (9,3,11), (9,1,11). On a sheet of paper, draw x,y,z axes with the z axis going up, the x axis going roughly south-west, and the y axis southeast. Then draw the cube. Now consider the left edge of the paper as the y-axis of a 2D plane; the lower edge of the paper is the z-axis. What are the coordinates of the cuboid's corners in this 2D world?

In-Class Exercise 14: Run the Coordinates demo in coordinatesDemo directory. Then examine the code. Why is the eye position necessary?

In general, linear algebra provides a systematic approach to many standard geometric transformations:

Transformations within a coordinate system: rotations, reflections, translations.

Transformations between coordinate systems: 3D to 2D, camera-to-realworld, perspective.

1.5 Application #5: less is more

The goal of nearly every media format is to compress the raw data into something smaller.

There are fundamentally two types of compression:

Lossless compression which does not lose any information when "un-compressed" and tries to exploit redundancy.

Lossy compression which stands to gain in compression by accepting some non-human-detectable loss.

Let's look at image compression.

In-Class Exercise 15: Examine the simple image testimage.png in the compressionDemo directory. Now study the code in ImageExample.java, then compile and execute.

What is the size (total number of pixels) of the image?

What is the size of the png file?

Is this an example of lossless or lossy compression?

Linear algebra turns out to play a key role in lossy compression.

Let's take a look at an example.

In-Class Exercise 16: Try out the three demos in CompressionDemo.java one by one. (The first one is a one-dimensional image: a 1D array). In each case, compare the un-compressed image with the original to get a sense of how "lossy" the compression was.

1.6 Application #6: who is this person?

In the face recognition problem:

We have a database of face images (photos).

For a given query image (another face), we wish to find the "closest" face in the database.

This is merely a special case of the more general image search problem: find the closest image to given a query image in a database of images.

In-Class Exercise 17: Enter the eigenImageSearchDemo directory and examine the example images in trainingImages, and then the query images in queryImages. What are the closest images to the query images?

In-Class Exercise 18: Run the ImageSearchDemo. Compare the images found by the eigen algorithm with the images you found earlier. Now examine the images in the eigenImages directory. What do you notice? Examine the code. How is a query image transformed before providing it as input to linear algebra?

1.7 Application #7: genesis of a search engine

Consider a hyperlink graph:

Each node is a webpage.

If webpage A links to webpage B, the graph has a directed edge from A to B.

Now consider the page importance problem:

Among all the webpages that satisfy a certain search criterion, how to list them in order of importance or usefulness?

The general principle: a webpage that other webpages link to must be important.

In-Class Exercise 19: Suppose a webpage's importance is defined as the number of incoming edges in the hyperlink graph. Describe some advantages and disadvantages of this approach.

Consider instead a random walk approach:

Imagine a surfer following hyperlinks randomly, and for a very long time.
A webpage's importance is defined as the fraction of visits the surfer makes to that page.
⇒ the probability a surfer visits a node

In-Class Exercise 20: In the pageRankDemo directory, copy the file network1.txt to network.txt.

Draw the network by hand. Which node has the most incoming edges?

Run PageRankDemo. Click "start", and then "next" several times to see where the random surfer goes (yellow). The amount of red in a node is the visit fraction (an estimate of the visit probability).

Click "fast-run" and let it run for a minute, then "stop".

Examine the node visit probabilities (by estimate, and calculation). Which node has the highest visit probability?

In-Class Exercise 21: Copy the file network2.txt to network.txt.

Draw the network. In what way is it fundamentally different from network1?

Click "start" and then "fast-run" for a minute, then "stop". What do you observe?

Next, "googolize", then "start, fast-run etc". What do you observe? (We'll ignore the $alpha$ parameter for now, and explain that later.) What does "googolize" do?

In-Class Exercise 22: Examine the code. Where in the code does there appear to be linear algebra at work?

1.8 Application #8: is this text relevant?

The next step in search, beyond mere keyword-matching, is to find relevant text that matches the topic of interest
⇒ webpages whose text matches the topic of the search text.

In-Class Exercise 23: Provide an example of two chunks of text with the same topic but where a simple keyword match might not work to identify them as similar.

In-Class Exercise 24: Examine the files in wikinews/trainingText in the latentSADemo directory and group them into topics. Create by hand a 7 x 7 table, and in table cell (i,j), write down what you think is the correlation (a number between -1 and 1) in topic between text i and text j.

In-Class Exercise 25: Run the LSA demo and compare your correlations with the "before" and "after" correlations found by the demo. What are "stop words" and why do we remove them? Read through the code to see where linear algebra is used.

1.9 Application #9: if you liked that movie, try this one

The goal of a movie recommendation engine is to examine user ratings of movies and to make recommendations that accommodate individual preferences as revealed by the ratings.

For example:

Here:

There are M users and N movies in a table of ratings.

Users are rows, movies are columns.

Every movie rating is a number between 1 and 5.

Unknown ratings are represented by ?

We would like to predict these entries as part of developing a recommendation engine ("if you like X, we think you'll like Y"), by using the predicted missing entries.

In real world data, the ratings matrix is typically very sparse (most entries are missing).

In-Class Exercise 26: In the collabFilteringDemo, examine the file ratings.txt with 9 users and 6 movies.

What fraction of ratings are unknown?
Which movie is most popular?
What would go wrong if we merely sorted movies by their average rating and offered those as recommendations?
Can you identify users who are similar to each other in their ratings?

In-Class Exercise 27: Run CollabFilteringDemo and examine the predicted ratings. Read the code. What is the input and output when linear algebra is used to predict ratings?

1.10 Application #10: more bass, less treble

Signal processing is the engineering discipline devoted to detecting and processing electrical signals of all kinds, especially audio.

A type of audio processing we're all familiar with: adjusting bass or treble.

The Discrete Fourier Transform (DFT) is a key technique used in many signal processing applications.

And linear algebra provides the theoretical foundation for the DFT.

Let's see how decreasing the "treble" content (a high note) changes the sound.

In-Class Exercise 28: Enter the signalProcDemo, compile and execute DFT_Demo.java. Then, comment out playUnFiltered(), compile playFiltered, and execute.

1.11 Why study linear algebra?

Here's why:

The language of linear algebra (matrices, vectors) is widely used and assumed:

Used without explanation in other theoretical work.
Sometimes the use is simple, such as matrix inversion.
⇒ it's just assumed that you know the notation and meaning.

Linear algebra's applications stand on their own, as we've seen.

Linear algebra is a constructive theory
⇒ Most (but not all) results result in a useful algorithm

Linear algebra is a great yet accessible example of a one-course math theory:

It develops in digestible pieces.
The proofs need very little background
⇒ High-school math
It has both small, easy results, as well as deeper, beautiful results.
You can see how meta-level questions about a theory arise. For example:
- Basic question: how does one invert a matrix?
- Higher-level theoretical question: what types of matrices have inverses?
- Even more high-level: if a matrix doesn't have an inverse, is there a next-best thing to an inverse?

Linear algebra can be learned both symbolically (the usual math way) and computationally
⇒ writing code that "does linear algebra" is a fun way to learn

In-Class Exercise 29: Use a search to see how many PDF documents have the word "matrix". How many of these are scientific publications?

1.11 Sizes of problems

In practice, many of the applications above feature large amounts of data.

Consider this simple and common example:

This is a graphics/robotics application where:

A 3D object's surface (here, a legged robot) is modeled by thousands of polygons.
When objects interact, the application needs to compute points of contact and interaction forces.
The computation needs to be quick to simulate real time.

A key part of computing forces and motion involves solving simultaneous equations in many variables.

In the above example: approximately 500 variables in 500 equations every 10 milliseconds.
$\rhd$ Here's the data

Larger examples:

The movie ratings data set released by Netflix had a 100 million ratings from about half-million users.
Google routinely computes page rankings for more than a billion webpages.

1.12 The landscape of linear algebra

There are broadly three bodies of work in linear algebra:

Basic linear algebra. Core concepts of vectors and matrices, orthogonality, eigenvalues and SVD.
⇒ This course.

Numerical linear algebra. How to perform key computations (matrix inverse, eigenvalues, SVD) accurately and fast, especially for large data sets

This is generally quite difficult.
⇒ Even someone like Von Neumann admitted being challenged.
It's a specialization by itself.
It took 50+ years to develop robust matrix inversion.

Advanced theory. This is of purely mathematical interest, mostly theory for theory's sake.

1.13 The architecture of linear algebra

In a nutshell:

Linear algebra will feature two recurring themes.

The first one: linear combinations:

Suppose $a_1, a_2, \ldots, a_n$ are real numbers.

Consider this expression: $$ a_1 \boxed{\color{white}{\text{LyL}}}_1 + a_2 \boxed{\color{white}{\text{LyL}}}_2 + \ldots + a_n \boxed{\color{white}{\text{LyL}}}_n $$

For example: $$ 0.7 \boxed{\color{white}{\text{LyL}}}_1 + 4.23 \boxed{\color{white}{\text{LyL}}}_2 - 6.18 \boxed{\color{white}{\text{LyL}}}_3 + 0.0003 \boxed{\color{white}{\text{LyL}}}_4 $$

Here, we are taking a weighted combination of $n$ things.
⇒ This is a linear combination.

Linear algebra is really about such linear combinations.

Sometimes the boxes contain numbers.

At other times, they contain other mathematical objects: vectors, complex numbers, functions.

What is astonishing is that such a simple operation has so many applications and a rich theory.

The second one: linear transformations:

Think of a transformation as a function that takes a mathematical object and produces another (perhaps of the same type): $$ T \left( \boxed{\color{white}{\text{LyL}}} \right) = \boxed{\color{white}{\text{LyL}}}\; ' $$
What if the transformation had this property: $$ T \left( \alpha\; \boxed{\color{white}{\text{LyL}}} \right) = \alpha\; T \left( \boxed{\color{white}{\text{LyL}}} \right) $$
This is one notion of linearity.
More generally, a transformation is considered linear if $$ T \left( \alpha\; \boxed{\color{white}{\text{LyL}}}_1 + \beta\; \boxed{\color{white}{\text{LyL}}}_2 \right) = \alpha\; T \left( \boxed{\color{white}{\text{LyL}}}_1 \right) + \beta\; T \left( \boxed{\color{white}{\text{LyL}}}_2 \right) $$

The idea of a linear transformation will turn out to be an important abstraction for the theory in linear algebra. (Although, for practical applications, we won't need to use the idea.)

1.14 Goals of this course

By the end of this course, you should:

Be comfortable with the language of linear algebra
⇒ basic concepts, the standard notation

Get immersed in the core ideas and concepts of basic linear algebra
⇒ left alone on a deserted island, you should be able to reconstruct the theory

Develop an intuition for the main ideas.

Even though proofs are short, linear algebra is not necessarily intuitive.
Combining the symbolic with the computational should strengthen intuition.

Grapple with the proofs of the main results.

Understand exactly how the core concepts are used in the above applications.

Understand the architecture of linear algebra: how the different "chapters" fit together.

Be able to, 10 years from now, recognize linear algebra when you see it, and be able to review it easily.

Fall in love.
⇒ With the beauty of linear algebra, that is.

1.15 The ultimate payoff: QM and QC, the two killer apps of linear algebra

QM = Quantum Mechanics

Quantum mechanics is the theory of light and matter at the microscopic level.

Both light and matter have strange, probabilistic properties that cannot be explained by conventional (Newtonian) physics.

Enter QM, based on the linear algebra of complex vectors.

QM has been spectacularly successful in mathematically explaining (and predicting) the properties of light, matter, and their interactions all the way down to the zoo of sub-atomic particles (like quarks).

Some of the linear algebraic elements:

The state of an entity (like an electron) is described by a complex vector.
The coefficients in the linear combination relate to probabilities.
Observations are defined by special kinds of matrices
When observed, the resulting state is an eigenvector of the observation matrix.

QC = Quantum Computing

Quantum computing is about using microscopic hardware that exploits the strange quantum properties of light and matter.
Instead of regular bits, one builds qubits, whose values can go beyond 0 and 1 to any linear combination.
By cleverly arranging for interactions between qubits, one can compute.
One surprising application: integers can be factored fast enough to break conventional cryptography.
A related application: one can use 'flying qubits' to implement cryptographically unbreakable communication.
Other fascinating features of QC: quantum teleportation, quantum search that's faster than conventional search.
All of QC is based on linear algebra:
- Qubits are complex vectors.
- Operations on qubits are (unitary) matrices.
- Observations of qubits are (Hermitian) matrices.
- Results of observations are eigenvectors of observation matrices.
Coming soon ... a course on QC.