Least-Squares Approximation

Problem Statement

Problem: You are given a set of points.

You want to model this physical phenomenon by building a function that behaves like the underlying function we believe generated the points.

We’ve already done this with polynomial interpolation. Polynomial interpolation has its issues, though: the curve must pass through every point exactly (noise included), and high-degree polynomials oscillate wildly.

If you have 100 points, using Newton’s polynomial to get a degree-99 polynomial would be crazy.

We want a curve that is “closest to” all the points.


Deriving the System of Equations:

\text{Cumulative Distance Function: } \Psi = \sum_i | f(x_i) - y_i |

In the end, we’ll be defining the solution as a linear combination of functions with a vector of coefficients, but for this example it’s just a single function.

This function is just the sum of the absolute differences between the estimated values f(x_i) and the actual values y_i.

Suppose f(x)=ax + b.

We want to minimize this.

\Psi = \sum_i | ax_i + b - y_i |

Instead of minimizing the distances, we can minimize the squares of the distances; the square is differentiable everywhere, while the absolute value is not.

\Psi = \sum_i | ax_i + b - y_i |^2
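This objective translates directly into code. A quick sketch in Python (the sample points here are made up for illustration):

```python
# Sum of squared residuals for the line f(x) = a*x + b.
def psi(a, b, points):
    return sum((a * x + b - y) ** 2 for x, y in points)

# Hypothetical sample points lying near y = 2x + 1.
points = [(0, 1.1), (1, 2.9), (2, 5.2), (3, 6.8)]

# A line close to the data scores lower than a line far from it.
print(psi(2, 1, points))   # small
print(psi(0, 0, points))   # much larger
```

Minimizing Ψ means finding the a and b that make this number as small as possible.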

The partial derivative of \Psi with respect to a is:

\frac{\partial \Psi}{\partial a} = \sum_i 2 ( ax_i + b - y_i ) x_i = \sum_i 2(ax_i^2 + bx_i - x_i y_i)

\frac{\partial \Psi}{\partial b} = \sum_i 2 ( ax_i + b - y_i ) \cdot 1

To minimize, we set each partial derivative equal to zero. From \partial \Psi / \partial b:

0 = \sum_i 2(ax_i + b - y_i) \implies a \sum_i x_i + b \sum_i 1 = \sum_i y_i

and likewise from \partial \Psi / \partial a.

Now we have a system of equations:

a \sum_i x_i^2 + b \sum_i x_i = \sum_i y_i x_i

a \sum_i x_i + b \sum_i 1 = \sum_i y_i

Solving this system gives the a and b that minimize the sum of squared differences.

We can generalize this to fit any function, giving the regression of that function; more next class.

Then we can put the system in an augmented matrix, do Gaussian elimination on it, and we have our coefficients.

TODO Example augmented matrix.


Deriving I

Example:

Q:

(0,3.5), (1,5.2), (2, 7.8), (3, 12.6), (4,19.2)

A:

As you remember, the system looks like:

\left[ \begin{array}{cc|c} \sum_i x_i^2 & \sum_i x_i & \sum_i x_i y_i \\ \sum_i x_i & n & \sum_i y_i \end{array} \right]

Let’s calculate all of the sums.

\sum_i x_i = 0 + 1 + 2 + 3 + 4 = 10 \\ \sum_i y_i = 3.5 + 5.2 + 7.8 + 12.6 + 19.2 = 48.3 \\ \sum_i x_i y_i = 0 + 5.2 + 15.6 + 37.8 + 76.8 = 135.4 \\ \sum_i x_i^2 = 0 + 1 + 4 + 9 + 16 = 30 \\ n = 5

Therefore, the matrix is:

\left[ \begin{array}{cc|c} 30 & 10 & 135.4 \\ 10 & 5 & 48.3 \end{array} \right]

Doing Gaussian elimination (or in this case, just doing simple arithmetic), we get:

a = 3.88 \quad b = 1.9

(Subtracting twice the second row from the first gives 10a = 38.8, so a = 3.88; then b = (48.3 - 38.8)/5 = 1.9.)

Therefore, y=3.88x + 1.9
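The whole example can be checked with a few lines of Python (a sketch; it solves the 2×2 system directly rather than by Gaussian elimination):

```python
# The five data points from the example.
points = [(0, 3.5), (1, 5.2), (2, 7.8), (3, 12.6), (4, 19.2)]

# Build the sums that appear in the system of equations.
n   = len(points)
sx  = sum(x for x, y in points)      # sum of x_i
sy  = sum(y for x, y in points)      # sum of y_i
sxy = sum(x * y for x, y in points)  # sum of x_i * y_i
sxx = sum(x * x for x, y in points)  # sum of x_i^2

# Solve  sxx*a + sx*b = sxy
#        sx*a  + n*b  = sy   by Cramer's rule.
det = sxx * n - sx * sx
a = (sxy * n - sx * sy) / det
b = (sxx * sy - sx * sxy) / det
print(a, b)  # a ≈ 3.88, b ≈ 1.9
```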

This is called linear regression.

We can see that f(x) = ax + b is a linear combination of two functions: x and 1.


Deriving II:

Basis Functions: the set of functions you use when you represent a function as a linear combination.

How:

For the function:

f(x)=c_1 g_1(x) + c_2 g_2 (x) + ... + c_n g_n(x)

The basis is: \{ g_1, g_2, \dots, g_n \}

The problem is now to find the constants c_1, \dots, c_n that minimize the distance between the points and the function.

We can write the solution as:

f(x) = \sum_{j=1}^{n} c_j g_j(x)

Applying least squares for m points, we want to minimize

\Phi = \sum_{i=1}^{m} \left( f(x_i) - y_i \right)^2 = \sum_{i=1}^{m} \left( \sum_{j=1}^{n} c_j g_j(x_i) - y_i \right)^2

Taking the partial derivative with respect to one coefficient c_k, everything that doesn’t involve c_k is a constant, so from the inner sum only the factor g_k(x_i) survives.

Thus,

\frac{\partial \Phi}{\partial c_k} = \sum_{i=1}^{m} 2 \left( \sum_{j=1}^{n} c_j g_j(x_i) - y_i \right) g_k(x_i)

Like before, we make this zero. But this time, we’ll have n derivatives (because we have n functions).

Now, the system of equations becomes

\sum_{j=1}^{n} c_j \left( \sum_{i=1}^{m} g_j(x_i)\, g_k(x_i) \right) = \sum_{i=1}^{m} y_i\, g_k(x_i), \quad k = 1, \dots, n

Which, in an augmented matrix is

\left[ \begin{array}{cccc|c} \sum_i g_1(x_i) g_1(x_i) & \sum_i g_2(x_i) g_1(x_i) & \cdots & \sum_i g_n(x_i) g_1(x_i) & \sum_i y_i g_1(x_i) \\ \sum_i g_1(x_i) g_2(x_i) & \sum_i g_2(x_i) g_2(x_i) & \cdots & \sum_i g_n(x_i) g_2(x_i) & \sum_i y_i g_2(x_i) \\ \vdots & \vdots & \ddots & \vdots & \vdots \\ \sum_i g_1(x_i) g_n(x_i) & \sum_i g_2(x_i) g_n(x_i) & \cdots & \sum_i g_n(x_i) g_n(x_i) & \sum_i y_i g_n(x_i) \end{array} \right]
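The general recipe translates directly to code. Below is a sketch in Python: build the system of equations for any list of basis functions, then solve the augmented matrix by Gaussian elimination (the function names are my own, not from the lecture):

```python
def gaussian_solve(M):
    """Solve an augmented matrix [A | rhs] in place by Gaussian elimination
    with partial pivoting; returns the solution vector."""
    n = len(M)
    for col in range(n):
        # Pivot: swap in the row with the largest entry in this column.
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        # Eliminate this column from the rows below.
        for row in range(col + 1, n):
            factor = M[row][col] / M[col][col]
            for k in range(col, n + 1):
                M[row][k] -= factor * M[col][k]
    # Back-substitution.
    c = [0.0] * n
    for row in range(n - 1, -1, -1):
        c[row] = (M[row][n] - sum(M[row][k] * c[k]
                                  for k in range(row + 1, n))) / M[row][row]
    return c

def fit_least_squares(basis, points):
    """Return coefficients c_j minimizing sum_i (sum_j c_j g_j(x_i) - y_i)^2."""
    # Row k: entries sum_i g_j(x_i) g_k(x_i), augmented with sum_i y_i g_k(x_i).
    M = [[sum(gj(x) * gk(x) for x, y in points) for gj in basis]
         + [sum(y * gk(x) for x, y in points)]
         for gk in basis]
    return gaussian_solve(M)

# Linear regression is the special case basis = {x, 1}.
points = [(0, 3.5), (1, 5.2), (2, 7.8), (3, 12.6), (4, 19.2)]
a, b = fit_least_squares([lambda x: x, lambda x: 1], points)
print(a, b)  # a ≈ 3.88, b ≈ 1.9
```

Running it on the earlier example reproduces the same line, since that example is just the basis \{x, 1\}.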

===

ASIDE

Suppose the function I want to approximate is a quadratic function.

THen clearly, my basis will be

G = \{ 1, x, x^2\}

So f(x) now looks like this:

f(x)=ax^2 + bx + c

(P.S. We may wish to name the constants c_i to avoid running out of variable names, but for this example we shan’t.)

Put the function basis vertically, to the left of the augmented matrix, to signal what is multiplied by what.

TODO Copy Diagram 1 in from my paper

===

Example: The first real example of Quadratic Regression

Q:

(1,2.1), (2,4.6), (3,9.8), (4, 17.3), (5, 27.2)

A:

Aside: Unlike polynomial interpolation, we can’t just plug the points into the function, because the regression curve is not required to pass through them.

Use the basis G = \{ 1, x ,x^2\}

The sums are n = 5, \sum x_i = 15, \sum x_i^2 = 55, \sum x_i^3 = 225, \sum x_i^4 = 979, \sum y_i = 61.0, \sum x_i y_i = 245.9, \sum x_i^2 y_i = 1065.5, so the augmented matrix is:

\left[ \begin{array}{ccc|c} 979 & 225 & 55 & 1065.5 \\ 225 & 55 & 15 & 245.9 \\ 55 & 15 & 5 & 61.0 \end{array} \right]

Note: If we knew more about polynomial regression we could figure out the basis mathematically, but for now we’ll either be told the basis or be expected to figure it out by eyeballing the data.

Speeding ahead, we get a \approx 1.221, b \approx -1.039, c = 1.88

So, the answer is f(x) = 1.221x^2 - 1.039x + 1.88
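The arithmetic here is easy to get wrong by hand, so it’s worth re-checking numerically. A self-contained Python sketch that builds and solves the 3×3 system for this example:

```python
# The five data points from the quadratic-regression example.
points = [(1, 2.1), (2, 4.6), (3, 9.8), (4, 17.3), (5, 27.2)]

# Sums for the basis {1, x, x^2}; we fit f(x) = a*x^2 + b*x + c.
n    = len(points)
sx   = sum(x         for x, y in points)
sx2  = sum(x ** 2    for x, y in points)
sx3  = sum(x ** 3    for x, y in points)
sx4  = sum(x ** 4    for x, y in points)
sy   = sum(y         for x, y in points)
sxy  = sum(x * y     for x, y in points)
sx2y = sum(x * x * y for x, y in points)

# Augmented matrix of the system of equations.
M = [[sx4, sx3, sx2, sx2y],
     [sx3, sx2, sx,  sxy],
     [sx2, sx,  n,   sy]]

# Gaussian elimination (forward pass) ...
for col in range(3):
    for row in range(col + 1, 3):
        factor = M[row][col] / M[col][col]
        for k in range(col, 4):
            M[row][k] -= factor * M[col][k]

# ... then back-substitution.
coeffs = [0.0] * 3
for row in range(2, -1, -1):
    coeffs[row] = (M[row][3] - sum(M[row][k] * coeffs[k]
                                   for k in range(row + 1, 3))) / M[row][row]
a, b, c = coeffs
print(a, b, c)  # roughly 1.221, -1.039, 1.88
```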

Tip: The final will not use a polynomial basis and will be a 4x4 system, so program my calculator to solve systems of equations.