Goal: Approximate a function from a set of points.
Yes, we’ve already “done this” with polynomial interpolation, though polynomial interpolation has its issues.
Example — If you have 100 points that follow a simple quadratic function like y=x^2, using Newton's divided differences to construct a crackpot 99-degree interpolating polynomial has limited use.
LSQA works in the other direction: instead of forcing the curve through every point, it just finds the right a, b, c for ax^2 + bx + c (or the m, b for a line, etc.) that minimize the distance between the curve and the data points.
On Notation — I’m always loosey-goosey with my notation, and this page is especially loose. Every \sum on this page runs from i=1 to n.
Least Squares Approximation:
\def\arraystretch{1.4} \boxed{ \begin{array}{cc} & \begin{matrix} \quad g_0 \quad & \quad g_1 \quad & \dots & \quad y \quad \end{matrix} \\ \begin{matrix} g_0 \\ g_1 \\ \vdots \end{matrix} & \left[ \begin{array}{ccc|c} \sum g_0 g_0 & \sum g_0 g_1 & \dots & \sum g_0 y \\ \sum g_1 g_0 & \sum g_1 g_1 & \dots & \sum g_1 y \\ \vdots & \vdots & \ddots & \vdots \end{array} \right] \end{array} }
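To make the boxed form concrete, here’s a small Python sketch (the helper name `normal_equations` and the callable-basis setup are my own, not from any particular library) that builds the augmented matrix \sum g_j g_k \,|\, \sum g_j y for any list of basis functions:

```python
import numpy as np

def normal_equations(basis, xs, ys):
    """Build the augmented least-squares matrix [ sum g_j*g_k | sum g_j*y ].

    basis: list of callables g_0, g_1, ... (each takes an x, returns a number)
    xs, ys: the data points
    """
    xs = np.asarray(xs, dtype=float)
    ys = np.asarray(ys, dtype=float)
    # G[j][i] = g_j(x_i): each row is one basis function evaluated at every x
    G = np.array([[g(x) for x in xs] for g in basis])
    A = G @ G.T    # A[j][k] = sum_i g_j(x_i) * g_k(x_i)
    b = G @ ys     # b[j]    = sum_i g_j(x_i) * y_i
    return np.column_stack([A, b])
```

For the quadratic basis below, `normal_equations([lambda x: x**2, lambda x: x, lambda x: 1], xs, ys)` reproduces the \sum x^4, \sum x^3, \dots table entry by entry.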
Q:
Construct the matrix for the basis G = \{ x^2, x, 1 \}
A:
\def\arraystretch{1.5} \begin{array}{cc} & \begin{matrix} \quad x^2 \quad & \quad x \quad & \quad 1 \quad & \quad y \quad \end{matrix} \\ \begin{matrix} x^2 \\ x \\ 1 \end{matrix} & \left[ \begin{array}{ccc|c} \phantom{\sum x_i^4} & \phantom{\sum x_i^3} & \phantom{\sum x_i^2} & \phantom{\sum x_i^2 y_i} \\ & & & \\ & & & \end{array} \right] \end{array} \\~\\ \downarrow \text{(multiply rows and columns)} \downarrow \\~\\ \begin{array}{cc} & \begin{matrix} \quad x^2 \quad & \quad x \quad & \quad 1 \quad & \quad y \quad \end{matrix} \\ \begin{matrix} x^2 \\ x \\ 1 \end{matrix} & \left[ \begin{array}{ccc|c} \sum x_i^4 & \sum x_i^3 & \sum x_i^2 & \sum x_i^2 y_i \\ \sum x_i^3 & \sum x_i^2 & \sum x_i & \sum x_i y_i \\ \sum x_i^2 & \sum x_i & n & \sum y_i \end{array} \right] \end{array}
Note — The order of the basis functions doesn’t matter as long as y stays in the augmented column, because x^2 + x = 10 is the same as x + x^2 = 10
At this point, if we had a real dataset, we’d calculate the summations and just solve the linear system of equations.
Q:
Find the quadratic regression of the following dataset:
(1,2.1), (2,4.6), (3,9.8), (4, 17.3), (5, 27.2)
A:
We will use the basis G = \{ x^2, x, 1 \}.
First, calculate the necessary summations (n=5):
\sum x_i = 15, \quad \sum x_i^2 = 55, \quad \sum x_i^3 = 225, \quad \sum x_i^4 = 979, \quad \sum y_i = 61.0, \quad \sum x_i y_i = 245.9, \quad \sum x_i^2 y_i = 1065.5
Now, construct the augmented matrix using the structure from the previous section:
\def\arraystretch{1.4} \left[ \begin{array}{ccc|c} \sum x^4 & \sum x^3 & \sum x^2 & \sum x^2 y \\ \sum x^3 & \sum x^2 & \sum x & \sum x y \\ \sum x^2 & \sum x & n & \sum y \end{array} \right] \Rightarrow \left[ \begin{array}{ccc|c} 979 & 225 & 55 & 1065.5 \\ 225 & 55 & 15 & 245.9 \\ 55 & 15 & 5 & 61.0 \end{array} \right]
Solving this system, we get: a \approx 1.221, b \approx -1.039, c = 1.88
So, the answer is: f(x) \approx 1.221x^2 - 1.039x + 1.88
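A quick numpy sketch to double-check that arithmetic (variable names are just for illustration; `np.polyfit` does the whole least-squares fit in one call):

```python
import numpy as np

x = np.array([1, 2, 3, 4, 5], dtype=float)
y = np.array([2.1, 4.6, 9.8, 17.3, 27.2])

# Normal equations for the basis {x^2, x, 1}
A = np.array([
    [np.sum(x**4), np.sum(x**3), np.sum(x**2)],
    [np.sum(x**3), np.sum(x**2), np.sum(x)],
    [np.sum(x**2), np.sum(x),    len(x)],
])
rhs = np.array([np.sum(x**2 * y), np.sum(x * y), np.sum(y)])

print(np.linalg.solve(A, rhs))   # ~[ 1.2214, -1.0386,  1.88 ]
print(np.polyfit(x, y, 2))       # same coefficients, as a cross-check
```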
Tip: The final exam question will likely not use a polynomial basis and will be a 4x4 system, so I should program my calculator to solve systems of equations.
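For what it’s worth, here’s roughly the routine I’d program in, written as a Python sketch (`solve_augmented` is my own name for it): plain Gaussian elimination with partial pivoting on the augmented matrix.

```python
def solve_augmented(M):
    """Solve an n x (n+1) augmented system [A | b] by Gaussian elimination
    with partial pivoting; returns the solution as a list."""
    M = [row[:] for row in M]   # work on a copy
    n = len(M)
    for col in range(n):
        # partial pivoting: swap in the row with the largest entry in this column
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        # eliminate this column from every row below
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            M[r] = [M[r][k] - factor * M[col][k] for k in range(n + 1)]
    # back-substitution
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][k] * x[k] for k in range(r + 1, n))) / M[r][r]
    return x
```

Feeding it the 3x3 augmented matrix from the example, `solve_augmented([[979, 225, 55, 1065.5], [225, 55, 15, 245.9], [55, 15, 5, 61.0]])`, reproduces a \approx 1.221, b \approx -1.039, c = 1.88; a 4x4 works exactly the same way.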
We turn the problem into an error-minimizing problem (where the error is the distance between the curve and data points) and get a neat general form.
In some more (but not that much) detail, we: define the error as the sum of squared differences between the curve and the data points, take the partial derivative of that error with respect to each coefficient, set each derivative to zero, and end up with the linear system in the box at the top.
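Written out for a general basis \{ g_0, g_1, \dots \} with coefficients a_0, a_1, \dots (my symbols, same i = 1 \dots n summation convention as the rest of the page), the derivation is roughly:
E(a_0, a_1, \dots) = \sum \Big( {\textstyle\sum_j} a_j\, g_j(x_i) - y_i \Big)^2
\frac{\partial E}{\partial a_k} = 2 \sum \Big( {\textstyle\sum_j} a_j\, g_j(x_i) - y_i \Big)\, g_k(x_i) = 0 \quad \Rightarrow \quad \sum_j a_j \left( \sum g_k g_j \right) = \sum g_k\, y
Each choice of k gives one row of the boxed matrix, with \sum g_k y landing on the right-hand side of the bar.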
Application is pretty easy, we just: pick a basis, build the summation matrix (one row per basis function, plus the y column), compute the sums from the data, and solve the resulting linear system for the coefficients.